Can Deep Reinforcement Learning Improve Inventory Management? Performance on Dual Sourcing, Lost Sales and Multi-Echelon Problems
Abstract
We study the effectiveness of Deep Reinforcement Learning (DRL) algorithms on classic inventory problems in operations. Despite the excitement about DRL in industries such as gaming, robotics, and self-driving cars, DRL applications in operations and supply chain management remain rather scarce. We identify this gap and provide a rigorous performance evaluation of DRL on three classic and intractable inventory problems: dual sourcing, lost sales, and multi-echelon inventory management. We model each inventory problem as a Markov Decision Process and apply a state-of-the-art DRL algorithm, Asynchronous Advantage Actor-Critic (A3C). We show how to apply and tune the A3C algorithm to achieve good performance across all three sets of inventory problems for a variety of parameter settings. We demonstrate that the A3C algorithm can match the performance of many state-of-the-art heuristics and other approximate dynamic programming methods, with limited changes to the tuning parameters across all studied problems. Yet tuning DRL algorithms remains computationally burdensome, and the resulting policies often lack interpretability. Generating structural policy insight or designing specialized policies that are (ideally provably) near optimal thus remains indispensable. DRL nonetheless provides a promising research avenue, especially when problem-dependent heuristics are lacking; in such cases, it may be used to set new benchmarks or to provide insight into the development of new heuristics.