Dynamic Programming and Optimal Control: Session 6

In today's session we looked at some more advanced topics, which previous sessions have prepared us to understand. I focused on some problems that I have studied in my own research.

With hindsight, I may have been too ambitious in imagining what I might cover this morning. I hope that my notes will help fill in gaps if you wish to learn more.

In the end, we covered the notion of Whittle indices for restless bandits. We saw that the policy they define is optimal, or near optimal, in the limit that the number of identical bandits $n$ tends to infinity. This was by the fact that in large-$n$ systems the proportions of bandits in the states have a "fluid model", which is linear within polyhedral regions.

In the second part we looked at stochastic sequential allocation and assignment problems. The assignment problem (SSAP) has a surprising answer: that the optimal strategy can be described in terms of thresholds that do not depend on the values of the $p_i$. The proof is cute and is worth understanding.

I mentioned Rhoda Righter's nice treatment of a SSAP with arrivals, in which the incentives for individual (selfish) behaviour by workers can be made to coincide with the socially optimal behaviour. This is also true in the SSAP without arrivals, if the jobs arr offered to the workers in decreasing order of their $p_i$. For example, with $n$ workers present, the $i$th will accept the job iff it has been declined by workers $1,2,\dotsc,i-1$ and then is value $X$ is at least $a_{i,n}$, which is the expected value of the job that will be assigned to $i$th worker if she rejects this currently offered job. It will be declined by the previous workers precisely if $X < a_{i-1,n}$.

There was not time to discuss all problems. I did not fully discuss all the solutions to Problem Sheet 2 5-7. The model solutions are now on the web site.

Some reference about the Bomber Problem and its relatives are in the previous post. The slides I used to talk to are here.

Dynamic Programming and Optimal Control

Monday, February 18, 2013

Session 6

Statcounter