Monday, February 11, 2013

Session 4

In today's lecture I used a portion of my slides from a talk on the Gittins Index. You may enjoy looking through the entire talk. It will look better if you download the .pdf file to your computer and read the presentation in a .pdf viewer such as Acrobat, rather than trying to read it within your browser.

I have slightly changed Problems Sheet 2. The assignment for this week is to do questions 5-7.

Subsequent to today's lecture I have made some changes to Sections 8.5 and 8.7, now using $C_i$ to mean the completion time of job $i$. I did not discuss Section 8.8 of the notes. It concerns an application of the Gittins index theorem to a problem to which the theorem at first seems not to apply, because the states of the bandits that are not continued are not frozen.

It is remarkable how many interesting problems can be recast as bandit problems. We saw today several ways to massage problems into the form of bandit processes (Weitzman's problem, the target and tax problems, and scheduling in an $M/G/1$ queue). In particular, we saw that the $\beta\to 1$ limit can be meaningful. You will explore this a bit in Problem 6 on Problems Sheet 2.
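If you would like to experiment with indices yourself, here is a minimal numerical sketch (not something from the lecture) of how one might compute the Gittins index of a single Markov chain bandit with discount factor $\beta<1$, using the restart-in-state characterization of Katehakis and Veinott, in which $G(x)=(1-\beta)V_x(x)$ where $V_x$ is the value function of the problem that allows restarting from $x$. The transition matrix, rewards and $\beta=0.9$ below are invented purely for illustration.

```python
import numpy as np

def gittins_index(P, r, beta, tol=1e-10):
    """Gittins index G(x) for each state x of a bandit with transition
    matrix P, reward vector r and discount factor beta < 1, computed by
    value iteration on the restart-in-state problem (Katehakis-Veinott)."""
    n = len(r)
    G = np.zeros(n)
    for x in range(n):
        V = np.zeros(n)
        while True:
            continue_val = r + beta * P @ V        # play on from state y
            restart_val = r[x] + beta * P[x] @ V   # restart the bandit from x
            V_new = np.maximum(continue_val, restart_val)
            if np.max(np.abs(V_new - V)) < tol:
                break
            V = V_new
        G[x] = (1 - beta) * V[x]                   # G(x) = (1-beta) V_x(x)
    return G

# A toy 3-state bandit (illustrative numbers only).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.3, 0.7]])
r = np.array([1.0, 0.5, 0.1])
print(gittins_index(P, r, beta=0.9))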

The proof of the Gittins index theorem is actually fairly easy. Yet it is one of the most beautiful results in the field of Markov decision processes. Its discovery and proof in 1974 are due to John Gittins. The proof I have given in today's lecture is very different to Gittins's original proof. It was first presented in my paper, On the Gittins index for multiarmed bandits, Ann. Appl. Prob. 2, 1024-33, 1992.