2009 SSCI ADPRL Tutorial

Tutorial ADPRL-1 - Adaptive Dynamic Programming and Reinforcement Learning for Feedback Control of Dynamical Systems

Tuesday, March 31, 8:30AM-10:30AM, Room: Two Rivers

Presenter: Frank L. Lewis
Automation and Robotics Research Institute
University of Texas at Arlington Riverbend
USA

Optimal Control design techniques have provided very effective feedback controllers for modern systems in aerospace, vehicle systems, industrial process control, robotics, mobile robots, wireless sensor networks, and elsewhere. Optimal control design is fundamentally a backwards-in-time procedure based on dynamic programming, specifically on Bellman’s Optimality Principle. This means that most existing optimal control design methods must be carried out off-line. Moreover, the full system dynamical description must generally be known to compute optimal controllers using well-known techniques, such as Riccati equation design.
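As a rough illustration of why Riccati equation design is an off-line, backwards-in-time procedure, the sketch below runs the standard finite-horizon Riccati recursion for a scalar discrete-time linear system. The system x[k+1] = a*x[k] + b*u[k], the cost weights q and r, the horizon, and the function name `riccati_backward` are all assumptions chosen for illustration; none of them come from the tutorial itself. Note that the recursion starts at the final time and needs the model parameters a and b at every step.

```python
def riccati_backward(a, b, q, r, horizon, p_final=0.0):
    """Backward Riccati recursion for the scalar system x[k+1] = a*x[k] + b*u[k]
    with stage cost q*x^2 + r*u^2 (hypothetical example values, not from the talk).

    Returns the cost-to-go coefficients p[0..horizon] and the feedback gains
    gains[0..horizon-1], where the optimal control is u[k] = -gains[k] * x[k].
    """
    p = [0.0] * (horizon + 1)
    gains = [0.0] * horizon
    p[horizon] = p_final  # terminal condition: the recursion starts at the END

    # Sweep backwards in time; each step needs the model parameters a and b.
    for k in range(horizon - 1, -1, -1):
        p_next = p[k + 1]
        denom = r + b * b * p_next
        gains[k] = a * b * p_next / denom
        p[k] = q + a * a * p_next - (a * b * p_next) ** 2 / denom
    return p, gains


# Assumed example: a = b = q = r = 1 over a 50-step horizon.
p, gains = riccati_backward(a=1.0, b=1.0, q=1.0, r=1.0, horizon=50)
print(p[0], gains[0])
```

For these assumed values the recursion settles, far from the final time, at the positive root of P^2 - P - 1 = 0, which is the steady-state algebraic Riccati solution for this particular scalar example.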

In this talk we show how to implement optimal feedback controllers on-line, forward in time, for systems whose dynamical description is unknown or only partially known. In effect, a family of on-line Optimal Adaptive Controllers is provided, whereby adaptive learning techniques are used to learn the optimal control strategy in real time from measured system input-output data. Both discrete-time and continuous-time systems are covered. New results in ADP for continuous-time systems are given.

These Optimal Adaptive Controllers are based on Werbos’ Approximate Dynamic Programming (ADP) and Q-learning. Reinforcement Learning is a method for on-line learning of control policies based on stimuli received from the environment in response to the current control policy. Such methods were used by I.P. Pavlov to study learning in mammals. Particularly interesting are the actor-critic structures, including those based on policy iteration and those based on value iteration. The ADP structures are a special case of value iteration. Q-learning is an actor-critic reinforcement learning method that does not require any knowledge of the system dynamics, yet finds optimal control policies on-line in real time. ADP and Q-learning have been well developed by the Computational Intelligence Community, but have not been fully explored for feedback control purposes within the Control Systems Community.
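To make the model-free character of Q-learning concrete, here is a minimal sketch of the textbook tabular Q-learning update on a small, made-up problem. It is not the actor-critic controller presented in the tutorial: the five-state chain, the unit step cost, the random exploration policy, and all names (`step`, `q_learning`, `q_table`) are assumptions for illustration only. The learner never reads the transition rule inside `step`; it only sees sampled (state, action, cost, next state) data, and costs are minimized to match the optimal-control convention.

```python
import random

N_STATES = 5      # hypothetical chain of states 0..4
GOAL = 4          # terminal state with zero cost-to-go
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Simulated plant (assumed for illustration); the learner treats it as a black box."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    return next_state, 1.0  # unit cost for every step taken


def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=50, seed=0):
    """Tabular Q-learning with a purely random exploration policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(GOAL)  # random non-terminal start state
        for _ in range(max_steps):
            if s == GOAL:
                break
            a = rng.choice(ACTIONS)
            s_next, cost = step(s, a)
            # Temporal-difference target built from observed data only.
            target = cost + gamma * min(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


q_table = q_learning()
# Greedy (cost-minimizing) policy for each non-terminal state.
policy = [min(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)
```

In this toy problem the learned greedy policy moves right from every state, which is the shortest path to the goal. Feedback control problems have continuous states and inputs, so a lookup table like this one would have to be replaced by a function approximator.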

In this talk we show how ADP and Q-learning can be used to design a highly effective class of adaptive controllers that converge on-line to optimal control solutions for prescribed cost functions. Results are shown for feedback control of linear and nonlinear systems. H-infinity 2-player Nash game controllers are designed, as well as modified and improved Receding Horizon Controllers.

IEEE SSCI 2009     March 30 – April 2, 2009     Sheraton Music City Hotel, Nashville, TN, USA