
Reinforcement Learning

LU 1 - Introduction

Dr. Joschka Boedecker
AG Maschinelles Lernen und Natürlichsprachliche Systeme

Albert-Ludwigs-Universität Freiburg

[email protected]

Acknowledgement: slides courtesy of Martin Riedmiller and Martin Lauer

Prof. Dr. M. Riedmiller, Dr. M. Lauer, Dr. J. Boedecker Machine Learning Lab, University of Freiburg Reinforcement Learning (1)

Organisational issues

Dr. Joschka Boedecker
Room 00010, building [email protected]
Office hours: Tuesday 2 - 3 pm

No script; slides available online:
http://ml.informatik.uni-freiburg.de/teaching/ws1516/rl


Dates winter term 2015/2016

3+1
Lecture: Monday, 14:00 (c.t.) - 15:30, SR 02-017, building 052
Wednesday, 16:00 (s.t.) - 17:30, SR 02-017, building 052

Exercise sessions on Wednesday, 16:00 - 17:30, interleaved with lecture
Starting at Oct. 28
Held by Jan Wülfing, [email protected]


Goal of this lecture

Introduction to the learning problem type Reinforcement Learning: the mathematical basics of an independently learning system.


Goal of the first unit

Motivation, definition and differentiation

Outline

- Examples
- Solution approaches
- Machine Learning
- Reinforcement Learning
- Overview


Example Backgammon

Can a program independently learn Backgammon?

Learning from success (win) and failure (loss)

Neuro-Backgammon: playing at world-champion level (Tesauro, 1992)


Example pole balancing (control engineering)

Can a program independently learn balancing?

Learning from success and failure

Neural RL controller: noise, inaccuracies, unknown behaviour, non-linearities, ... (Riedmiller et al.)


Example robot soccer

Can programs independently learn how to cooperate?

Learning from success and failure

Cooperative RL agents: complexity, distributed intelligence, ... (Riedmiller et al.)


Example: Autonomous (e.g. humanoid) robots

Task: movement control similar to humans (walking, running, playing soccer, cycling, skiing, ...)
Input: image from camera
Output: control signals to the joints

Problems:

- very complex
- consequences of actions hard to predict
- interference / noise


Example: Maze


The ’Agent Concept’

[Russell and Norvig 1995, page 33]: "An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through effectors."

examples:

- a human
- a robot arm
- an autonomous car
- a motor controller
- ...
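The perceive-act loop behind this definition can be sketched in a few lines. The thermostat below is a made-up stand-in (not from the lecture): its sensor is a temperature reading, its effector a heater switch.

```python
# Minimal sketch of the agent concept: an agent perceives its environment
# through sensors and acts on it through effectors. All names here are
# illustrative assumptions, not part of the lecture material.

class ThermostatAgent:
    """Trivial agent: sensor = temperature, effector = heater switch."""
    def __init__(self, target):
        self.target = target

    def act(self, percept):
        # percept: current temperature (sensor reading)
        return "heat_on" if percept < self.target else "heat_off"

def run(agent, temperature, steps):
    """Simulate the agent-environment loop for a fixed number of steps."""
    history = []
    for _ in range(steps):
        action = agent.act(temperature)                  # act via effector
        temperature += 0.5 if action == "heat_on" else -0.5
        history.append(action)
    return history, temperature

actions, final_temp = run(ThermostatAgent(target=21.0), temperature=18.0, steps=10)
```

Any of the listed examples (robot arm, autonomous car, motor controller) fits the same loop; only the percepts and effectors change.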


Solution approaches in ’Artificial Intelligence’ (AI)

- Planning / search (e.g. A*, backtracking)
- Deduction (e.g. logic programming, predicate logic)
- Expert systems (e.g. knowledge provided by experts)
- Fuzzy control systems (fuzzy logic)
- Genetic algorithms (evolution of solutions)
- Machine Learning (e.g. reinforcement learning)


Types of learning (in humans)

- Learning from a teacher
- Structuring of objects
- Learning from experience


Types of Machine Learning (ML)

- Learning with a teacher (Supervised Learning): examples of input / (target) output pairs. Goal: generalization (in general, not simply memorization).
- Structuring / recognition of correlations (Unsupervised Learning). Goal: clustering of similar data points, e.g. for preprocessing.
- Learning through reward / penalty (Reinforcement Learning). Prerequisite: specification of the target goal (or of events to be avoided).
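The three paradigms differ only in the training signal they receive. A purely illustrative toy sketch (made-up data, deliberately minimal methods, not what the lecture will use):

```python
# Supervised learning: (input, target) pairs are given; fit a slope a for
# the model y = a*x by least squares (toy data, illustrative only).
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.0)]
a = sum(x * y for x, y in data) / sum(x * x for x, _ in data)

# Unsupervised learning: inputs only, no targets; find structure, here by
# splitting 1-D points into two groups around their mean.
xs = [0.1, 0.2, 5.0, 5.1]
split = sum(xs) / len(xs)
clusters = ([x for x in xs if x < split], [x for x in xs if x >= split])

# Reinforcement learning: no targets at all, only an evaluative reward per
# trial; the reward table is unknown to the learner and probed by acting.
rewards = {"left": -1.0, "right": +1.0}
best = max(rewards, key=lambda act: rewards[act])   # found by trial and error
```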



Machine Learning: ’ingredients’

1. Type of the learning problem (what is given / what is sought)

2. Representation of the learned solution knowledge:
table, rules, linear mapping, neural network, ...

3. Solution process (observed data → solution):
(heuristic) search, gradient descent, optimization techniques, ...

Not at all: 'For this problem I need a neural network'


Emphasis of the lecture: Reinforcement Learning

- No information regarding the solution strategy is required
- Independent learning of a strategy by smart trial of solutions ('trial and error')
- The biggest challenge for a learning system
- Representation of solution knowledge by a function approximator (e.g. tables, linear models, neural networks)


RL using the example of autonomous robots

bad: damage (fall, ...)
good: task done successfully
better: fast / low-energy / smooth movements / ...
⇒ optimization!


Reinforcement Learning (RL)

Also: learning from evaluations, autonomous learning, neuro-dynamic programming

- Defines a learning type, not a method! Central feature: an evaluative training signal, e.g. 'good' / 'bad'
- RL with immediate evaluation: decision → evaluation. Example: parameters for a basketball throw
- RL with rewards delayed in time: decision, decision, ..., decision → evaluation. Substantially harder; interesting because of its versatile applications
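RL with immediate evaluation can be sketched as plain trial and error over a single decision. The throw model below is a hypothetical stand-in for the basketball example (the optimal angle of 45 degrees is an assumption of the toy model, unknown to the learner):

```python
import random

random.seed(0)

# Hypothetical throw model: the reward arrives immediately after the single
# decision and depends only on the chosen angle (toy assumption).
def reward(angle):
    return -abs(angle - 45.0)

# Trial and error with immediate evaluation: try angles, keep the best.
best_angle, best_reward = None, float("-inf")
for _ in range(1000):
    angle = random.uniform(0.0, 90.0)   # decision
    r = reward(angle)                   # immediate evaluation
    if r > best_reward:
        best_angle, best_reward = angle, r
```

With delayed rewards this simple scheme breaks down, because a whole sequence of decisions shares one evaluation; that is the temporal credit assignment problem introduced next.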


Delayed RL

- Decision, decision, ..., decision → evaluation
- Examples: robotics, control systems, games (chess, backgammon)
- Basic problem: temporal credit assignment
- Basic architecture: actor-critic system


Multistage decision problems


Actor-critic system (Barto, Sutton, 1983)

Actor: in situation s, choose action u (strategy π : S → U)
Critic: 'distribution' of the external evaluation signal onto the single actions
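A minimal tabular actor-critic sketch on a toy chain (own illustrative example, not the original 1983 system): the critic learns state values by temporal differences, and its TD error is what 'distributes' the single delayed reward onto the actor's individual action preferences.

```python
import math, random

random.seed(1)

# Toy chain: states 0..4; action 0 ('left') moves -1 (floor at 0),
# action 1 ('right') moves +1. Reward +1 only on reaching state 4,
# so credit for success must flow back over many decisions.
N, GOAL = 5, 4
V = [0.0] * N                             # critic: state-value estimates
pref = [[0.0, 0.0] for _ in range(N)]     # actor: preferences (left, right)
alpha, beta, gamma = 0.1, 0.1, 0.95

def policy(s):
    # softmax over the actor's two action preferences
    e = [math.exp(p) for p in pref[s]]
    return 0 if random.random() < e[0] / (e[0] + e[1]) else 1

for _ in range(2000):                     # episodes
    s = 0
    while s != GOAL:
        a = policy(s)
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == GOAL else 0.0
        td = r + gamma * V[s2] - V[s]     # TD error (V[GOAL] stays 0)
        V[s] += alpha * td                # critic update
        pref[s][a] += beta * td           # actor update: credit this action
        s = s2
```

After training, the actor prefers 'right' everywhere and the critic's values increase toward the goal, i.e. the delayed reward has been apportioned to the earlier decisions.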


Reinforcement Learning

- 1959 Samuel's checker player: temporal difference (TD) methods
- 1968 Michie and Chambers: BOXES
- 1983 Barto and Sutton's AHC/ACE; 1987 Sutton's TD(λ)
- Early 1990s: connection between dynamic programming (DP) and RL: Werbos, Sutton, Barto, Watkins, Singh, Bertsekas
- DP: classic optimization technique (late 1950s: Bellman); too much effort for large tasks. Advantage: clean mathematical formulation, convergence results
- 2000 policy gradient methods (Sutton et al., Peters et al., ...)
- 2005 Fitted Q (batch DP method) (Ernst et al., Riedmiller, ...)
- Since then, many examples of successful and practically relevant applications


Other examples

field             | input           | output (actions) | goal            | example
------------------|-----------------|------------------|-----------------|------------------------------
games             | board situation | valid move       | winning         | backgammon, chess
robotics          | sensor data     | control variable | reference value | pendulum, robot soccer
sequence planning | state           | candidate        | gain            | assembly line, mobile network
benchmark         | state           | direction        | goal position   | maze


Goal: Autonomous learning system


Approach - rough outline

- Formulation of the learning problem as an optimization task
- Solution by learning, based on the optimization technique of Dynamic Programming
- Difficulties:
  - very large state space
  - process behaviour unknown
- Application of approximation techniques (e.g. neural networks, ...)
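The Dynamic Programming core of this approach can be previewed with value iteration on a tiny deterministic chain MDP. The states, transitions, and rewards below are made up for illustration; the method itself is covered in part 2.

```python
# Value iteration on a toy deterministic 4-state chain MDP
# (states 0..3, goal state 3; all numbers are illustrative assumptions).
N, GOAL, gamma = 4, 3, 0.9
actions = {"right": +1, "left": -1}

def step(s, a):
    """Deterministic transition model; reward 1 on entering the goal."""
    s2 = min(max(s + actions[a], 0), N - 1)
    r = 1.0 if s2 == GOAL else 0.0
    return s2, r

def backup(s, V):
    """Bellman optimality backup: best one-step lookahead value."""
    return max(r + gamma * V[s2]
               for s2, r in (step(s, a) for a in actions))

V = [0.0] * N
for _ in range(50):   # iterate the backup synchronously until convergence
    V = [0.0 if s == GOAL else backup(s, V) for s in range(N)]
```

The values converge to V = [0.81, 0.9, 1.0, 0.0]: each step away from the goal discounts the achievable reward by γ. The two listed difficulties show up directly here: the table V does not scale to very large state spaces, and `step` assumes the process behaviour is known; both motivate the approximation techniques above.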



Outline of lecture

Part 1: Introduction

Part 2: Dynamic Programming
Markov Decision Problems, backwards DP, value iteration, policy iteration

Part 3: Approximate DP / Reinforcement Learning
Monte Carlo methods, stochastic approximation, TD(λ), Q-learning

Part 4: Advanced methods of Reinforcement Learning
policy gradient methods, hierarchical methods, POMDPs, relational Reinforcement Learning

Part 5: Applications of Reinforcement Learning
robot soccer, pendulum, RL competition



Further courses on machine learning

- Lecture: Machine Learning (summer term)
- Lab course: Deep Learning (Wed., 10-12)
- Bachelor's / Master's theses, team projects


Further readings

D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts, 1996.

R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, Massachusetts, 1998.

M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, New York, 1994.

L. P. Kaelbling, M. L. Littman, and A. W. Moore. Reinforcement Learning: A survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.

M. Wiering (ed.). Reinforcement Learning: State-of-the-Art. Springer, 2012.

WWW:

- http://www-all.cs.umass.edu/rlr/
- http://richsutton.com/RL-FAQ.html
