Sequential learning

Instructors: Pierre Gaillard (INRIA) and Remy Degenne (INRIA)

Course objectives:

In online learning, data are acquired and processed on the fly; feedback is received and algorithms are updated on the fly. The field has received a lot of attention recently because of its possible applications on the internet. These include choosing which ads to display, repeated auctions, spam detection, expert/algorithm aggregation (and boosting), etc.
The objective of the course (taught in English) is to introduce and study the main concepts of online learning (regret, calibration, etc.), construct algorithms, and show connections with game theory.

We will also cover the bandit setting (cf. the Reinforcement Learning course) and its generalization, partial monitoring.

Topics covered:

* Regret minimization
* Calibration
* Exponential weights algorithms
* Stochastic Optimization
* Game Theory
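
To make the first and third topics concrete, here is a minimal sketch of an exponential weights forecaster in the prediction-with-expert-advice setting, together with its regret against the best expert in hindsight. It is an illustration only, not course material: the function names `exponential_weights` and `regret`, the loss-matrix representation, and the learning rate `eta = sqrt(8 ln K / T)` are assumptions chosen for the example, and losses are assumed to lie in [0, 1].

```python
import math
import random


def exponential_weights(losses, eta):
    """Run exponential weights on a T x K matrix of losses in [0, 1].

    Returns the learner's cumulative expected loss and the list of
    cumulative losses of the K experts.
    """
    n_experts = len(losses[0])
    cum_losses = [0.0] * n_experts
    learner_loss = 0.0
    for round_losses in losses:
        # Weight each expert by exp(-eta * cumulative loss).
        # Subtracting the minimum keeps the exponentials numerically stable
        # and does not change the normalized probabilities.
        min_loss = min(cum_losses)
        weights = [math.exp(-eta * (c - min_loss)) for c in cum_losses]
        total = sum(weights)
        probs = [w / total for w in weights]
        # The learner suffers the expected loss under its distribution.
        learner_loss += sum(p * l for p, l in zip(probs, round_losses))
        # Feedback is revealed only after the prediction (full information).
        cum_losses = [c + l for c, l in zip(cum_losses, round_losses)]
    return learner_loss, cum_losses


def regret(learner_loss, cum_losses):
    """Regret against the best single expert in hindsight."""
    return learner_loss - min(cum_losses)


if __name__ == "__main__":
    random.seed(0)
    T, K = 1000, 5  # hypothetical horizon and number of experts
    losses = [[random.random() for _ in range(K)] for _ in range(T)]
    eta = math.sqrt(8 * math.log(K) / T)
    learner_loss, cum_losses = exponential_weights(losses, eta)
    print("regret:", regret(learner_loss, cum_losses))
    print("bound sqrt(T ln K / 2):", math.sqrt(T * math.log(K) / 2))
```

With this choice of `eta`, the classical analysis bounds the regret by sqrt(T ln K / 2) for any loss sequence in [0, 1], which is the kind of guarantee meant by regret minimization.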

Prerequisites:

Basic notions of probability and optimization.

Course organization:

Six blackboard lectures.

Assessment:

Homework assignment
Final exam

References:

Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. Cambridge University Press.

Bubeck, S., & Cesa-Bianchi, N. (2012). Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1), 1-122.

Perchet, V. (2014). Approachability, regret and calibration: Implications and equivalences. Journal of Dynamics and Games, 1, 181-254.

Lattimore, T., & Szepesvári, C. (2020). Bandit algorithms. Cambridge University Press.