In recent years, online retailers have been experimenting with and implementing innovative dynamic pricing strategies and inventory policies to better match demand with supply. In many cases, the decision maker does not know the demand distribution of a given product a priori, and can only collect observed sales data (that is, censored demand data) over time. The key challenge is that the collected data depend on the decision maker's own operational decisions, and those data in turn shape the decision maker's understanding of the underlying system when making subsequent decisions.
In this talk, we propose new nonparametric learning algorithms for several fundamental models with unknown demand functions under censored demand information, including the periodic-review perishable inventory problem, the lost-sales inventory problem, and the joint pricing and inventory control problem. The performance measure is regret, defined as the difference in revenue or cost between a feasible learning algorithm and the clairvoyant (full-information) benchmark. We show that the proposed algorithms converge to the clairvoyant optimal policies as the planning horizon increases, and we establish the convergence rate of the regret. The techniques developed are effective for learning in stochastic systems with complex dynamics, in which decisions have a lasting impact.
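As a minimal illustration of censored demand and of regret as a performance measure, the following Python sketch computes both for a toy lost-sales setting. It is not one of the algorithms from the talk: the function names, the cost parameters, and the demand numbers are all hypothetical, and the benchmark is simplified to a single fixed stock level evaluated on one sample path.

```python
from typing import Sequence


def censored_sales(demand: float, stock: float) -> float:
    """Sales actually observed: demand above the stock level is lost and unseen."""
    return min(demand, stock)


def period_cost(demand: float, stock: float,
                holding: float = 1.0, lost_sale: float = 3.0) -> float:
    """Single-period cost: holding cost on leftovers plus penalty on unmet demand.

    The cost rates are illustrative assumptions, not values from the talk.
    """
    leftover = max(stock - demand, 0.0)
    unmet = max(demand - stock, 0.0)
    return holding * leftover + lost_sale * unmet


def regret(demands: Sequence[float], stocks: Sequence[float],
           benchmark_stock: float) -> float:
    """Cumulative cost of the learner's stock levels minus the benchmark's cost."""
    learner = sum(period_cost(d, s) for d, s in zip(demands, stocks))
    benchmark = sum(period_cost(d, benchmark_stock) for d in demands)
    return learner - benchmark


# Hypothetical sample path: true demands and the learner's stock levels.
demands = [4.0, 7.0, 5.0, 9.0]
stocks = [2.0, 5.0, 6.0, 7.0]

observed = [censored_sales(d, s) for d, s in zip(demands, stocks)]
total_regret = regret(demands, stocks, benchmark_stock=7.0)
```

In the first two periods and the last, the learner stocks out, so the observed sales equal the stock level rather than the true demand. That is the censoring effect: the data available for learning depend on the stocking decisions themselves.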
This talk is based on joint work with Xiuli Chao, Beryl Chen, and Huanan Zhang.