Call for Abstracts
Firms often have to decide sequentially how to allocate a finite budget among competing actions. For example, advertisers must decide which of several ad copies to show to the next website visitor, and online retailers must decide which products to recommend next. Learning in such situations is not trivial because choosing one alternative improves estimates of its success probability but takes resources away from a potentially better alternative. When the firm is simultaneously interested in learning and in revenue, i.e., in ‘learning while earning’, these problems are referred to as Multi-Armed Bandit (MAB) problems. Algorithms for such problems draw on tools and techniques from statistical inference, predictive analytics, and machine learning to address the exploration-exploitation trade-off inherent in them.
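To make the trade-off concrete, the sketch below runs Thompson sampling, one classic MAB strategy, on a toy advertising problem; the three ad copies, their click-through rates, and the horizon are all hypothetical illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setting: three ad copies with unknown click-through rates.
true_ctr = np.array([0.04, 0.05, 0.06])  # unknown to the advertiser
n_arms = len(true_ctr)

# Beta(1, 1) priors over each arm's success probability.
successes = np.ones(n_arms)
failures = np.ones(n_arms)

for t in range(10_000):
    # Explore and exploit in one step: sample a plausible CTR for each
    # arm from its posterior and show the ad with the highest draw.
    sampled = rng.beta(successes, failures)
    arm = int(np.argmax(sampled))

    # Observe a click (1) or no click (0) and update that arm's posterior.
    reward = rng.random() < true_ctr[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward

print("posterior mean CTR per arm:", successes / (successes + failures))
```

Early on, the posterior draws vary widely and all arms get shown (exploration); as evidence accumulates, the draws concentrate and the best ad is shown most of the time (exploitation), which is exactly the ‘learning while earning’ behavior described above.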
Topics & Domains
The topics of the workshop include, but are not limited to, the following:
- Applications to online advertising, consumer search, news and product recommendation, energy markets, clinical trials, experimental design, portfolio management, website design, and many other domains
- Theoretical aspects of the exploration-exploitation trade-off
- Adaptive learning algorithms, broadly defined
- Novel statistical machine-learning methods for informing sequential decision making
- Partially observable Markov decision processes (POMDPs) with different types of rewards (e.g., terminal or cumulative), state (in)dependence, and policies
- Optimality and convergence
- New approaches to thorny practical challenges, such as the curse of dimensionality, scalability, or latency (in online problems), when computing, approximating, and learning optimal and near-optimal policies
- Diverse approaches, including regret minimization, dynamic allocation indices, confidence bounds (see the sketch below), dynamic programming, sequential experimentation, look-ahead, and others
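As one illustration of the confidence-bound family named in the last item, here is a minimal UCB1 sketch on a toy Bernoulli bandit; the arm means and horizon are hypothetical, and UCB1 is only one member of this family.

```python
import math
import random

random.seed(1)

# Hypothetical Bernoulli arms; the true means are unknown to the learner.
true_means = [0.3, 0.5, 0.4]
n_arms = len(true_means)

counts = [0] * n_arms    # pulls per arm
values = [0.0] * n_arms  # running mean reward per arm

def ucb1_arm(t):
    # Play each arm once, then pick the arm with the highest
    # upper confidence bound: mean + sqrt(2 ln t / n_pulls).
    for a in range(n_arms):
        if counts[a] == 0:
            return a
    return max(range(n_arms),
               key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))

for t in range(1, 5001):
    arm = ucb1_arm(t)
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

print("pulls per arm:", counts)
```

The bonus term shrinks as an arm is pulled more often, so under-explored arms are tried occasionally while the empirically best arm receives most of the pulls; this is the mechanism behind the logarithmic regret guarantees studied in this literature.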