Sessions plan

This is both a record of what we have done and a draft plan for future sessions. Completed sessions are in black, future sessions in blue.

The number of Ph.D. students at the first session was 12, plus 2 faculty. One additional Ph.D. student has written to say that he will be attending in the future. Two students are attending the course for credit.

I will try to tailor the material to your interests and needs once I know you better. 


  1. Monday January 21 8.00-10.40

    Dynamic Programming: The principle of optimality. State-structured models. Shortest path problem. Feedback, open-loop, and closed-loop controls. Markov decision processes. Who Wants to be a Millionaire?

    Examples of Dynamic Programming: Useful tricks to solve dynamic programming problems. The idea that it can be useful to model things in terms of time to go. Managing spending and savings. Exercising a stock option. The secretary problem. Inventory control models.
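
    To make backward induction concrete, here is a minimal sketch in Python (my own illustration, not code from the course): it iterates the optimality equation indexed by time to go, for a deterministic plant in which states, actions, cost and step are placeholders to be supplied.

      def backward_induction(states, actions, T, cost, step):
          """Solve F_t(s) = min_a [cost(s, a) + F_{t-1}(step(s, a))],
          where t counts time to go and F_0 = 0."""
          F = {s: 0.0 for s in states}          # terminal condition, time to go 0
          policy = []
          for t in range(1, T + 1):             # work backwards, time to go 1..T
              Fn, pi = {}, {}
              for s in states:
                  best = min(actions, key=lambda a: cost(s, a) + F[step(s, a)])
                  Fn[s], pi[s] = cost(s, best) + F[step(s, best)], best
              F, policy = Fn, [pi] + policy
          return F, policy                      # value and decision rule per stage

    The stochastic version replaces F[step(s, a)] by an expectation over the next state.
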
  • Problems sheet 1 was distributed (containing questions on sessions 1 and 2).
  • Students were assigned to do questions 1-4 on Problems sheet 1.
  • Students were asked to send me an email to comment on their background, interests and how they found the pace and interest of the first session.

  2. Monday January 28 8.00-10.40

    Dynamic Programming over the Infinite Horizon: Cases of discounted, negative and positive dynamic programming. Validity of the optimality equation over the infinite horizon. Job scheduling problem. Asset selling problem.

    Discussion of Problems 1-4.

    Positive Programming: Special theory for maximizing nonnegative rewards. The possibility that there is no optimal policy. If a given policy has a value function that satisfies the optimality equation, then that policy is optimal. Value iteration algorithm. Optimal gambling. Searching for a moving target. Pharmaceutical trials.
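
    For the optimal gambling example, value iteration is easy to sketch in Python (my own toy version; the win probability p and target N are made up). The positive-programming theory says the iterates increase to the optimal value from below.

      import numpy as np

      p, N = 0.4, 10                        # subfair game, target fortune (made up)
      F = np.zeros(N + 1)
      F[N] = 1.0                            # reward 1 on reaching the target
      for _ in range(200):                  # value iteration: F <- LF
          Fn = F.copy()
          for i in range(1, N):
              # stake a in 1..min(i, N - i): win -> i + a, lose -> i - a
              Fn[i] = max(p * F[i + a] + (1 - p) * F[i - a]
                          for a in range(1, min(i, N - i) + 1))
          F = Fn
      print(F)                              # approximates the optimal value function

    For p < 1/2 the limit is achieved by bold play, i.e. staking min(i, N - i).
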
  • Students gave in work on questions 1-4 of Problems sheet 1.
  • Students were asked to do questions 5-8 on Problems sheet 1.

  3. Monday February 4 8.00-10.40

    Negative Programming: A partially observed Markov decision process. Stationary policies. The special theory of minimizing nonnegative costs. We see that the action that extremizes the right-hand side of the optimality equation defines an optimal policy. Optimal stopping problems with finite horizon. Optimality of the one-step look-ahead rule. Optimal parking.

    Discussion of Problems 5-8.

    Optimal Stopping Problems: Bruss’s odds algorithm. Secretary problem revisited. Optimal stopping over an infinite horizon. Stopping a random walk. Sequential probability ratio test. Two-armed bandit. Prospecting.
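
    Bruss’s odds algorithm is short enough to state as code; here is my own Python sketch (the n = 5 secretary example at the end is only a sanity check). Sum the odds r_k = p_k/(1 - p_k) backwards until they first reach 1, then stop on the first success from that index onwards.

      def odds_algorithm(p):
          R, Q, s = 0.0, 1.0, 0
          for k in range(len(p) - 1, -1, -1):   # accumulate odds from the end
              R += p[k] / (1 - p[k])
              Q *= 1 - p[k]
              if R >= 1.0:                      # odds sum first reaches 1 here
                  s = k
                  break
          return s, Q * R                       # stopping index, win probability

      # Secretary problem, n = 5: candidate k is relatively best w.p. 1/k.
      # (The loop stops before k = 0, where p = 1 would make the odds infinite.)
      print(odds_algorithm([1 / k for k in range(1, 6)]))   # (2, 0.4333) = 13/30
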
  • Students gave in work on questions 5-8 of Problems sheet 1.
  • Students were assigned questions 1-4 on Problems sheet 2 (containing questions on sessions 3-5).

  4. Monday February 11 8.00-10.40

    Bandit Processes and the Gittins Index: The multi-armed bandit problem. General bandit processes. The Gittins index theorem. Calibration. Clinical trials. Applications to scheduling problems.
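
    Calibration suggests a direct computation. In this Python sketch (my own, with made-up data), the Gittins index of state i is found by bisection as the retirement rate lambda at which one is indifferent between continuing the bandit and retiring on lambda/(1 - beta) forever; the inner loop is value iteration with a retirement option.

      import numpy as np

      def gittins_index(i, r, P, beta, tol=1e-6):
          lo, hi = r.min(), r.max()             # the index lies between these
          while hi - lo > tol:
              lam = (lo + hi) / 2
              M = lam / (1 - beta)              # total reward from retiring now
              V = np.full(len(r), M)
              for _ in range(500):              # value iteration with retirement
                  V = np.maximum(M, r + beta * (P @ V))
              if V[i] > M + tol:                # continuing is strictly better
                  lo = lam
              else:
                  hi = lam
          return (lo + hi) / 2

      r = np.array([1.0, 0.0])                  # state rewards (made up)
      P = np.array([[0.5, 0.5], [0.5, 0.5]])    # transition matrix (made up)
      print(gittins_index(0, r, P, beta=0.9))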

    Discussion of Problems 2:1-4.

    Applications of Bandit Processes: Stochastic scheduling. Playing golf with more than one ball. Target processes. Weitzman's problem. [Searching for a stationary hidden object.]
  • Students handed in work on questions 1-4 of Problems sheet 2.
  • Students were assigned questions 5-7 on Problems sheet 2.

  5. Friday February 15 15.45-18.30

    Average-cost Programming: The average-cost optimality equation. Value iteration bounds. Policy improvement algorithm. Acceptance/rejection of offered jobs.

    Continuous-time Markov Decision Processes: Stochastic scheduling. Lady’s nylon stocking problem. Makespan and flowtime. Control problems in a continuous-time stochastic setting. Markov jump processes when the state space is discrete. Uniformization. Admission control at a queue.
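
    Uniformization fits in a few lines of Python (my own example with an arbitrary generator): sampling the jump process at the points of a Poisson process of rate B >= max_i q_i turns it into a discrete-time chain with transition matrix P = I + Q/B.

      import numpy as np

      Q = np.array([[-3.0,  3.0,  0.0],     # arbitrary generator matrix
                    [ 2.0, -5.0,  3.0],
                    [ 0.0,  2.0, -2.0]])
      B = np.max(-np.diag(Q))               # uniformization constant
      P = np.eye(len(Q)) + Q / B            # a stochastic matrix: rows sum to 1
      print(P)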

  6. Monday February 18 8.00-10.40

    Restless Bandits: Examples. Whittle index. Indexability. Asymptotic optimality of the Whittle index.

    Discussion of Problems 2:5-7.

    Sequential Assignment and Allocation Problems: Sequential stochastic assignment problem. Investment problem. Bomber and fighter problems. Online bin packing.
  • Students handed in work on questions 5-7 of Problems sheet 2.
  • Students were assigned questions 1-3 on Problems sheet 3.

  7. Monday February 25 8.00-10.40

    LQ Regulation: LQ regulation model in discrete and continuous time. The Riccati equation and its validity in the model with additive white noise. Linearization of nonlinear models.
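
    The discrete-time Riccati recursion can be sketched directly (my own Python, with a made-up example), for dynamics x_{t+1} = A x_t + B u_t and cost the sum of x'Rx + u'Qu:

      import numpy as np

      def riccati(A, B, R, Q, T):
          Pi = R.copy()                     # terminal cost matrix
          gains = []
          for _ in range(T):                # recurse backwards in time
              K = np.linalg.solve(Q + B.T @ Pi @ B, B.T @ Pi @ A)
              Pi = R + A.T @ Pi @ A - A.T @ Pi @ B @ K
              gains.append(K)               # the optimal control is u = -K x
          return Pi, gains[::-1]

      A = np.array([[1.0, 1.0], [0.0, 1.0]])   # made-up example system
      B = np.array([[0.0], [1.0]])
      Pi, K = riccati(A, B, np.eye(2), np.eye(1), 50)

    Over a long horizon Pi converges, under conditions discussed in later sessions, to a solution of the algebraic Riccati equation.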

    Discussion of Problems 3:1-3.

    Controllability, Observability & Stabilizability: Controllability and observability in discrete and continuous time. Broom balancing. Stabilizability. Stabilizing a pendulum.
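
    The rank test for controllability is quicker to code than to state; a small Python check (mine, with the double integrator as a stand-in for the broom-balancing linearization):

      import numpy as np

      def is_controllable(A, B):
          n = A.shape[0]
          blocks = [B]
          for _ in range(n - 1):
              blocks.append(A @ blocks[-1])   # next block A^k B
          M = np.hstack(blocks)               # [B, AB, ..., A^{n-1} B]
          return np.linalg.matrix_rank(M) == n

      A = np.array([[0.0, 1.0], [0.0, 0.0]])  # double integrator
      B = np.array([[0.0], [1.0]])
      print(is_controllable(A, B))            # True
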
  • Students handed in work on questions 1-3 of Problems sheet 3.
  • Students were assigned questions 4-7 on Problems sheet 3.

  8. Monday March 4 8.00-10.40

    Infinite Horizon Limits and Observability: Conditions for observability. Conditional distributions of jointly Gaussian random vectors.

    Kalman Filter and Certainty Equivalence: The LQG model. The Kalman filter. Certainty equivalence. Separation principle.
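
    One predict/update step of the Kalman filter, sketched in Python (my notation: plant noise covariance N, observation noise covariance M; the notes may use different symbols):

      import numpy as np

      def kalman_step(xhat, S, y, A, C, N, M):
          # predict: push mean and covariance through x_{t+1} = A x_t + w_t
          xpred = A @ xhat
          Spred = A @ S @ A.T + N
          # update: correct using the observation y = C x + v
          K = Spred @ C.T @ np.linalg.inv(C @ Spred @ C.T + M)   # Kalman gain
          xnew = xpred + K @ (y - C @ xpred)
          Snew = (np.eye(len(xhat)) - K @ C) @ Spred
          return xnew, Snew

    Certainty equivalence then says the optimal LQG control applies the deterministic LQ gain to xhat.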

    Dynamic Programming in Continuous Time: The Hamilton-Jacobi-Bellman equation for dynamic programming in continuous time. Sustainable fishing problem.
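
    For reference, the equation itself (my statement of the standard form; the notes' notation may differ): for dynamics dx/dt = a(x, u) and cost to minimize of the form integral of c(x, u) dt plus terminal cost K(x(T)), the value function F(x, t) satisfies

      \[
        \frac{\partial F}{\partial t}
        + \inf_u \left\{ c(x,u) + \frac{\partial F}{\partial x}\, a(x,u) \right\} = 0,
        \qquad F(x,T) = K(x).
      \]
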
  • Students handed in work on questions 4-7 of Problems sheet 3.
  • Students were assigned questions 1-3 on Problems sheet 4.

  9. Monday March 11 8.00-10.40

    Pontryagin’s Maximum Principle: Optimization of consumption. Pontryagin’s maximum principle. Parking a rocket car. Adjoint variables as Lagrange multipliers. Transversality conditions.

    Applications of the Maximum Principle: Examples of typical arguments for synthesizing a solution to an optimal control problem using Pontryagin’s maximum principle. Insects as optimizers. Problems in which time appears explicitly. Monopolist. Neoclassical economic growth. Turnpike theory.
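
    In brief, and in one common sign convention (the course's may differ): to minimize the integral of c(x, u) dt plus K(x(T)) subject to dx/dt = a(x, u), define the Hamiltonian and necessary conditions

      \[
        H(x,u,\lambda) = \lambda^\top a(x,u) - c(x,u), \qquad
        \dot{\lambda}^\top = -\frac{\partial H}{\partial x}, \qquad
        u(t) \in \arg\max_u H(x,u,\lambda),
      \]

    with the transversality condition that lambda(T) equals minus the gradient of K at x(T).
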
  • Students handed in work on questions 1-3 of Problems sheet 4.
  • Students were assigned questions 4, 5, 7 and 8 on Problems sheet 4.

  10. Friday March 15 15.45-18.30

    I will not be setting any exam questions on topics in this final session.

    Controlled Diffusion Processes: Brief introduction to controlled continuous-time stochastic models with a continuous state space, i.e. controlled diffusion processes.

    Risk Sensitive Optimal Control: Whittle risk-sensitivity, the risk-sensitive certainty-equivalence principle, large deviations, the stochastic maximum principle, optimization of consumption with uncertain lifetime.

  11. No session.
    Students to hand in work on questions 4, 5, 7 and 8 of Problems sheet 4.

  12. Final exam: ALH304, from 09:30 to 12:30 on Thursday April 25