Introduction to reinforcement learning
Course hours: 24
Course content:
1. Multiarmed bandits, Markov Decision Processes and other models
2. Planning: finite and infinite horizon problems, the value function, Bellman equations, dynamic programming, value and policy iteration
3. Probabilistic and statistical tools for RL: Bayesian models, relative entropy and hypothesis testing, concentration inequalities, linear regression, the stochastic approximation algorithm
4. RL algorithms for multiarmed bandits: the exploration vs. exploitation trade-off, bandit algorithms vs. A/B testing, UCB, Thompson sampling, contextual bandits
5. RL algorithms for Markov Decision Processes: off-policy and on-policy learning, Q-learning, SARSA, Monte Carlo tree search
Learning outcomes:
This introductory course provides the main methodological building blocks of reinforcement learning. Some basic notions of probability theory are required to follow the course. The course involves some work on simple implementations of the algorithms and assumes familiarity with a common scientific computing language.
Bibliography and recommended reading:
M. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 1994.
R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
C. Szepesvari. Algorithms for Reinforcement Learning. Morgan & Claypool Publishers, 2010.
J. Myles White. Bandit Algorithms for Website Optimization. O'Reilly, 2012.
T. Lattimore and C. Szepesvari. Bandit Algorithms. Cambridge University Press, 2019. downloads.tor-lattimore.com/banditbook/book.pdf