I will be using this blog space to talk about some extra things. Sometimes upon leaving a lecture I think, "I wish I had said ...". This blog gives me a place to say it. Or I can use this space to talk about a question that a student has asked during a lecture.
There is a commenting space after each post for you to write comments or ask questions using the
IntenseDebate commenting system. Please do not be shy to do so, as other students will probably benefit.
Please notice that I am writing this blog using LaTeX, which is interpreted into a pretty form by
MathJax. I discovered MathJax a short while ago, and am impressed by how well it works in a web page. However, your browser must have JavaScript enabled for it to work.
In the first session I gave you definitions and notation for talking about state, control, history, value function, etc., and with which to develop
dynamic programming equations for a very general case, a state-structured case, and the Markov decision process case. Please be patient with the notation. It is not as complex as it may first appear. Things like $a(x,u,t)$, $F(x_t,t)$, $u_t$, and $U_t$ will become old friends once you have used them a few times.
From this first session you should be taking away the key idea of dynamic programming, and the fact that problems in stages (with separable cost and a finite time horizon) can often be solved by working backwards from the final stage. The minimum-length path (stagecoach) problem is trivial, but should have made these ideas very intuitive.
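The backwards-working idea can be sketched in a few lines of Python. The little stage graph and costs below are invented purely for illustration (they are not the example from lectures); the point is the Bellman step, computed from the final stage back to the start.

```python
# A minimal backward-induction sketch for a stagecoach-style shortest-path
# problem. The stage graph and costs are invented for illustration.

# stages[t] maps node -> {successor: arc cost}; "F" is the terminal node.
stages = [
    {"A": {"B": 2, "C": 4}},                          # stage 0
    {"B": {"D": 7, "E": 3}, "C": {"D": 1, "E": 5}},   # stage 1
    {"D": {"F": 4}, "E": {"F": 6}},                   # stage 2
]

def solve_backwards(stages, terminal="F"):
    """Compute the value function by working back from the final stage."""
    F = {terminal: 0}      # value of being at the terminal node
    policy = {}
    for stage in reversed(stages):
        F_new = {}
        for node, arcs in stage.items():
            # Bellman step: F(x,t) = min_u [ c(x,u) + F(next(x,u), t+1) ]
            u, cost = min(arcs.items(), key=lambda kv: kv[1] + F[kv[0]])
            F_new[node] = cost + F[u]
            policy[node] = u
        F = F_new
    return F, policy

F, policy = solve_backwards(stages)
print(F["A"])    # minimum total cost from the start node A
print(policy)    # optimal successor at each node
```

Here the optimal path is A → C → D → F with total cost 9; notice that the forward-greedy choice at A (the cheap arc to B) would be wrong, which is exactly why we work backwards.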
As an example of a problem in which the optimal policy should be determined by working backwards from the end, I mentioned that I had once appeared on ITV's Who Wants to be a Millionaire (October 2003). I once wrote a nice Cambridge exam question on this, including the model solution. You might like to look at this now – simply to see the sort of mathematical problem that this course will enable you to solve. You can also view the overhead slides for a little presentation called A Mathematician Plays "Who Wants to Be a Millionaire?" which I once gave to a Royal Institution workshop for school children. You might like to see how I made best use of my "Ask the audience" lifeline by employing an idea from statistics. Basically, I asked members of the audience not to vote if they felt at all unsure of the right answer.
In the second hour today I discussed three examples, the last of which was the Secretary Problem. This is the most famous problem in the field of optimal stopping. It is credited to Merrill M. Flood in 1949, who called it the fiancée problem. It gained wider publicity when it appeared in Martin Gardner's column in Scientific American in 1960. There is an interesting Wikipedia article about it. One of the interesting things said in this article is that in behavioural studies people tend to stop too soon (i.e. marry too soon, make a purchase too soon). See The devil is in the details: Incorrect intuitions in optimal search.
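For concreteness, here is a quick Monte Carlo check (a sketch of my own, not part of the lecture material) of the classical optimal rule: observe roughly the first $n/e$ candidates without hiring, then hire the first candidate who is better than all seen so far. The choices $n=100$ and 20,000 trials are arbitrary; the estimated success probability should come out near $1/e \approx 0.368$.

```python
import math, random

def simulate_secretary(n, cutoff, trials=20000, seed=1):
    """Estimate P(select the best candidate) under the rule: observe the
    first `cutoff` candidates without hiring, then hire the first candidate
    better than all seen so far (hiring the last one if none is)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        ranks = list(range(n))          # 0 = the best candidate
        rng.shuffle(ranks)              # candidates arrive in random order
        best_seen = min(ranks[:cutoff], default=n)
        chosen = ranks[-1]              # default: forced to take the last
        for r in ranks[cutoff:]:
            if r < best_seen:           # first candidate beating all so far
                chosen = r
                break
        wins += (chosen == 0)
    return wins / trials

n = 100
p = simulate_secretary(n, cutoff=round(n / math.e))   # reject the first ~n/e
print(p)   # close to 1/e ≈ 0.368
```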
The optimal solution to the secretary problem can also be found as a special case application of the odds-algorithm, due to F. Thomas Bruss.
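Here is a minimal sketch of the odds-algorithm (the code and variable names are mine). Given independent events with success probabilities $p_1,\dotsc,p_n$, one sums the odds $r_j=p_j/(1-p_j)$ backwards from the end until the sum first reaches 1; the optimal rule for stopping on the last success is then to stop at the first success from that index on. Applied to the secretary problem, where candidate $j$ is a best-so-far record with probability $1/j$, it recovers the familiar cutoff rule; for $n=8$ it says to reject the first 3 candidates.

```python
def odds_algorithm(p):
    """Bruss's odds-algorithm: given success probabilities p[0..n-1] of
    independent events, return the 0-based index s from which one should
    stop at the first success (to maximize the chance of stopping on the
    LAST success), together with the win probability of that rule."""
    R = 0.0      # running sum of odds, accumulated from the end
    Q = 1.0      # running product of failure probabilities
    s = 0
    for j in range(len(p) - 1, -1, -1):
        R += p[j] / (1 - p[j])
        Q *= 1 - p[j]
        if R >= 1:
            s = j
            break
    return s, Q * R

# Secretary problem with n = 8: candidate j (1-based) is a record with
# probability 1/j. p[0] = 1 has infinite odds, but the backward sum
# reaches 1 before index 0, so it is never touched.
n = 8
p = [1 / j for j in range(1, n + 1)]
s, win = odds_algorithm(p)
print(s, win)   # s = 3: reject the first 3 candidates; win ≈ 0.4098
```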
Variations of the secretary problem can be created by changing the assumptions about whether or not, in addition to relative ranks, values $X_1,\dotsc,X_h$ can be observed, and whether the objective is to maximize the probability of choosing the best candidate, to minimize the expected rank, or to maximize the expected value of the candidate selected.
A variation that has never been completely solved is the so-called Robbins' problem. In this problem we do observe values of candidates, say $X_1,\dotsc, X_h$, and these are assumed to be independent, identically distributed uniform$[0,1]$ random variables. The objective is to minimize the expected rank of the candidate that is selected (best = rank 1, second-best = rank 2, etc). It is known only that, as $h$ goes to infinity, the expected rank that can be achieved under an optimal policy lies between 1.908 and 2.329. This problem is much more difficult than the usual secretary problem because the decision as to whether or not to hire candidate $t$ must depend upon all the values of $X_1,\dotsc, X_t$, not just upon how $X_t$ ranks amongst them.
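To get a feel for the problem, here is a small simulation of a crude memoryless threshold rule: accept the first $X_t \le 2/(h-t+2)$. The thresholds are a heuristic choice of my own (certainly not the optimal policy, which is unknown), picked so that at time $t$ the expected number of future candidates falling below an accepted value is about 1; at $t=h$ the threshold is 1, so we are forced to stop.

```python
import random

def robbins_heuristic(h, trials=20000, seed=1):
    """Monte Carlo estimate of the expected rank achieved by a crude
    memoryless threshold rule for Robbins' problem: accept the first
    X_t <= 2/(h - t + 2). These thresholds are a heuristic, not optimal."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.random() for _ in range(h)]     # i.i.d. uniform[0,1]
        for t, x in enumerate(xs, start=1):
            if x <= 2 / (h - t + 2):              # at t = h: forced stop
                # rank among all h candidates (best = rank 1)
                total += 1 + sum(y < x for y in xs)
                break
    return total / trials

avg = robbins_heuristic(100)
print(avg)
```

Even this simple rule already achieves a fairly small expected rank, which makes the narrowness of the gap between 1.908 and 2.329 all the more tantalising.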
At the end of the class today one of you asked me if there are any interesting problems in which candidates are not homogeneous. I have not thought about this before, but it sounds like it could be interesting. We might suppose, for example, that there are 8 candidates. Their ranks are determined by the ranks of some hidden independent
$N(\mu_i,1)$ random variables, where the $\mu_i$ are known to us. At each stage we must choose which candidate we want to interview next. Clearly we should be able to do better than with 8 homogeneous candidates ($\mu_1=\cdots=\mu_8$). I wonder, how much better, and how? Is it better to start off by seeing a strong candidate (large $\mu_i$) or a weaker candidate (small $\mu_i$)? Of course this is just one possible way to model non-homogeneous candidates. What other models might we make for non-homogeneous candidates? As MS/OR people we find it natural to think about tweaking models until we find one that is both solvable, and interesting.
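As a very first experiment, one could simulate this model directly. Everything below is my own tentative set-up: hidden independent $N(\mu_i,1)$ scores, an interview order we fix in advance, only relative ranks observed, and the homogeneous-case rule of rejecting the first 3 candidates and then taking the first best-so-far. The particular $\mu_i$ values are arbitrary.

```python
import random

def success_prob(mus, order, cutoff=3, trials=20000, seed=1):
    """P(select the overall-best candidate) when candidates have hidden
    independent N(mu_i, 1) scores, are interviewed in the given order,
    only relative ranks are observed, and we reject the first `cutoff`
    then take the first best-so-far (the last candidate if none)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        scores = [rng.gauss(mu, 1) for mu in mus]
        seq = [scores[i] for i in order]       # scores in interview order
        best_seen = max(seq[:cutoff])
        chosen = seq[-1]                       # default: forced to take last
        for s in seq[cutoff:]:
            if s > best_seen:                  # first best-so-far record
                chosen = s
                break
        wins += (chosen == max(scores))
    return wins / trials

mus = [0, 0, 0, 0, 1, 1, 2, 2]                 # a toy spread of abilities
homog = success_prob([0] * 8, list(range(8)))  # baseline: all mu equal
strong_first = success_prob(mus, [7, 6, 5, 4, 3, 2, 1, 0])
weak_first = success_prob(mus, [0, 1, 2, 3, 4, 5, 6, 7])
print(homog, strong_first, weak_first)
```

The homogeneous baseline should reproduce the known optimum for $n=8$ (about 0.41); comparing the strong-first and weak-first orderings then gives a first, crude answer to the question above, though of course the cutoff rule is no longer optimal once the $\mu_i$ differ.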