Monday, January 28, 2013

Search for a Moving Target

In Section 4.5 I talked about a simple problem of searching for a moving target that moves back and forth randomly between two locations (think, for example, of a destroyer hunting for a submarine that moves between two depths).

In the following more general form this problem has never been solved! Suppose that it costs $c_i$ to search location $i$. If the object (or target) is in location $i$ and we search there, then we find it with probability $\alpha_i$. If we do not find it, then the object moves according to some two-state Markov chain with known transition matrix $P=(p_{ij})$. The aim is to minimize the expected cost of searching until we find the object.

Obviously we can always compute $p_t$, the probability that the object is in location 1, given the searches made so far without finding it. Everyone believes that the optimal strategy is of the form "search location 1 iff $p_t\geq p^*$, for some $p^*$." The number $p^*$ will depend on all the parameters of the problem.

The optimality of this form of policy has been proved for the special case of $c_1=c_2$. However, it remains unproved for general $c_1,c_2$.
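Even without knowing $p^*$, the belief $p_t$ itself is easy to maintain: condition on the unsuccessful search by Bayes' rule, then apply one step of the Markov chain. Here is a minimal sketch in Python; the function name and representation are my own, not notation from the course.

```python
def belief_update(p, search_loc, alpha, P):
    """One step of the posterior update for the moving-target search.

    p          -- current probability that the object is in location 1
    search_loc -- 1 or 2, the location just searched (unsuccessfully)
    alpha      -- (alpha_1, alpha_2), the detection probabilities
    P          -- 2x2 transition matrix, P[i][j] = p_{ij} (0-indexed)
    """
    # Bayes' rule: condition on the unsuccessful search
    if search_loc == 1:
        q = p * (1 - alpha[0]) / (p * (1 - alpha[0]) + (1 - p))
    else:
        q = p / (p + (1 - p) * (1 - alpha[1]))
    # The object then makes one move of the Markov chain
    return q * P[0][0] + (1 - q) * P[1][0]
```

The conjectured policy is then simply: search location 1 iff the current belief is at least $p^*$.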

Problems 1-4

I discussed the solutions to Problems 1-4. The text of my model solutions is now on the web site, under the "Resources" heading in the right hand column.

These are fairly straightforward questions. I think that one of the most important things at this stage is to become comfortable with formulating optimality equations. We have now seen quite a lot of examples. Some have been a little non-standard, like the matrix-multiplication question and job scheduling.

Next time we will discuss Problems 5-8, and I will hand out Problems sheet 2.

Session 2

I started today by showing you the centipede game. The analysis of this game is by a backward induction, similar to that which we use in finite-horizon dynamic programming.

In this class we had an important theorem (Theorem 3.1). It had an easy but nontrivial proof. Theorem 3.1 states that $F(x)$ satisfies a DP equation (3.7). It holds under the assumption of D (discounted), N (negative) or P (positive) programming. I explained what these three assumptions mean.

I gave an example, $c(x,u) = x$, $a(x,u) = -x$, $\beta = 1$, for which $F(x)$ does not exist. Note that a problem with this data fails to satisfy D, N or P.

The example in Section 3.5 (of selling a tulip bulb collection) is very much like the secretary problem in Section 2.3. The difference is that now (i) we observe values (not just ranks), (ii) wish to maximize the expected value of the selected candidate (rather than the probability of choosing the best), and (iii) the number of offers is infinite, but with a discount factor $\beta$.

One important way in which discounting by a factor $\beta$ can naturally arise is when a catastrophe can occur, with probability $1-\beta$, bringing the problem to an end. We discussed the fact that the policy does not change if offers remain open.
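In such a selling problem the optimal policy is to accept the first offer exceeding a reservation value $x^*$ satisfying $x^* = \beta\,E\max(X, x^*)$. As a small illustration of my own (assuming, purely for concreteness, offers uniform on $[0,1]$, so $E\max(X,x^*) = (1+x^{*2})/2$), the fixed point is easy to find by iteration:

```python
def reservation_value(beta, tol=1e-12):
    """Fixed-point iteration for x* = beta * E[max(X, x*)],
    assuming offers X ~ Uniform[0,1], so E[max(X, x*)] = (1 + x*^2)/2.
    The map is a contraction on [0,1], so iteration converges."""
    x = 0.0
    while True:
        x_new = beta * (1 + x * x) / 2
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
```

For this uniform case the quadratic can also be solved in closed form, giving $x^* = (1-\sqrt{1-\beta^2})/\beta$, which is a handy check on the iteration.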

In Section 4.3 (optimal gambling) we saw that timid play is optimal in the gambling problem when the game is favorable to the gambler ($p \geq 0.5$).

Consider this question: "A gambler has £2 and needs to increase it to £10 in a hurry. The gambler decides to use a bold strategy in which he stakes all his money if he has £5 or less, and otherwise stakes just enough to increase his capital, if he wins, to £10."

When $p \leq 0.5$ bold play maximizes the probability that the gambler reaches N pounds. However, bold play may not be uniquely optimal. 

When it was first discovered that bold play is not uniquely optimal, this was contrary to the accepted wisdom of the time (1955). It impressed Leonard Savage when it was presented to him by a Ph.D. student named Lester Dubins. They developed a collaboration culminating in the famous monograph How to Gamble if You Must (Inequalities for Stochastic Processes). See the fourth paragraph of this obituary of Lester Dubins.

If $p=0.5$ all strategies are optimal. How could we prove that? Easy. Simply show that given any policy $\pi$ the value function is $F(\pi,i)=i/N$ and that this satisfies the DP equation. Then apply Theorem 4.1. You can read more about these so-called red and black games at the Virtual Laboratories in Probability and Statistics.
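A quick numerical check of this claim (a sketch of my own, not code from the course): iterate the DP equation for any fixed stake policy and observe that, when $p=0.5$, the success probability from capital $i$ comes out as $i/N$ for bold and timid play alike.

```python
def success_prob(p, N, policy):
    """Probability of reaching capital N from each i under a fixed
    stake policy; policy(i) returns the stake at capital i, with
    1 <= stake <= min(i, N - i).  Solved by iterating the DP equation
    F(i) = p F(i + s) + (1 - p) F(i - s) to convergence."""
    F = [0.0] * (N + 1)
    F[N] = 1.0                      # reaching N counts as success
    for _ in range(10000):
        for i in range(1, N):
            s = policy(i)
            F[i] = p * F[i + s] + (1 - p) * F[i - s]
    return F

N = 10
bold = lambda i: min(i, N - i)      # stake everything, or just enough
timid = lambda i: 1                 # always stake 1
Fb = success_prob(0.5, N, bold)
Ft = success_prob(0.5, N, timid)
```

Both vectors satisfy the DP equation with value $i/N$, which together with Theorem 4.1 is exactly the argument sketched above.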

In Section 4.5 (pharmaceutical trials) I introduced an important class of very practical problems. One obvious generalization is to a problem with $k$ drugs, each of which has a different unknown probability of success, and about which we will learn as we make trials of the drugs. This is called a multi-armed bandit problem. The name comes from thinking about a gaming machine (or fruit machine) having $k$ arms that you might pull, one at a time. The arms have different unknown probabilities of generating payouts. In today's lecture we considered a special case of the two-armed bandit problem in which one of the arms has a known success probability, $p$, and the second arm has an unknown success probability, $\theta$. I will say more about these problems in later classes. The table on page 18 was computed by value iteration. This table is from the book Multi-armed Bandit Allocation Indices (of which I am one of the authors). The amazon.co.uk page will let you look inside this book.
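To give a flavour of how such a table can be computed, here is a value-iteration sketch of my own (not the book's code, and with a uniform prior on $\theta$ assumed for concreteness). The state is the pair (successes, failures) observed on the unknown arm; once the known arm is preferred it is pulled forever, earning $p/(1-\beta)$.

```python
def one_armed_bandit(p, beta, depth=200):
    """Value iteration for the bandit with one known arm.  With a
    uniform prior, after s successes and f failures the posterior mean
    of theta is (s+1)/(s+f+2).  The recursion is truncated at 'depth'
    pulls of the unknown arm, where the value is approximated by the
    known arm's value p/(1-beta) (a lower bound, since continuing to
    experiment can only be worth more).  Returns the value V(0, 0)."""
    known = p / (1 - beta)
    V = {}
    for n in range(depth, -1, -1):          # work backwards over s + f = n
        for s in range(n + 1):
            f = n - s
            if n == depth:
                V[s, f] = known             # truncation boundary
            else:
                m = (s + 1) / (s + f + 2)   # posterior mean of theta
                pull = m * (1 + beta * V[s + 1, f]) + (1 - m) * beta * V[s, f + 1]
                V[s, f] = max(known, pull)
    return V[0, 0]
```

The optimal decision at each state is simply whichever branch achieves the max, so the same recursion yields the switching table.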

Monday, January 21, 2013

Find the greatest

I mentioned today a problem of minimizing the expected number of questions required to identify the greatest of $X_1,\dotsc,X_n$, where these are i.i.d. $U[0,1]$.

We pick a set $A(1) \subset [0,1]$ and ask which $X_i \in A(1)$. Depending on the response, we pick another subset $A(2)$ and ask which $X_i \in A(2)$, and so on, until we identify the largest $X_i$.

It turns out that it is optimal to ask questions only of the special form "Which $X_i$ are bigger than $a(1)$?", "Which $X_i$ are bigger than $a(2)$?", etc.

If you would like to see how this is proved, the paper is An optimal strategy for a conflict resolution problem, by V. Anantharam and P. Varaiya, Systems and Control Letters, 1986, pp. 329–332.

A fact about this problem which is initially surprising is that the minimum expected number of questions required remains bounded as $n\to\infty$. Just make the first question of the form "Which $X_i$ are bigger than $n/(n+1)$?"
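You can see the boundedness empirically with a simple threshold strategy (my own illustration, not the optimal strategy of the paper): with $m$ candidates known to lie in an interval, ask which exceed its $m/(m+1)$ quantile; recurse on those above if any, and on the smaller interval otherwise.

```python
import random

def questions_needed(xs, lo=0.0, hi=1.0):
    """Number of threshold questions asked by the simple strategy,
    given candidates xs known to lie in the interval (lo, hi)."""
    m = len(xs)
    if m == 1:
        return 0                            # only one candidate left: done
    t = lo + (hi - lo) * m / (m + 1)        # the m/(m+1) quantile of (lo, hi)
    above = [x for x in xs if x > t]
    if not above:                           # nobody above: same m, shorter interval
        return 1 + questions_needed(xs, lo, t)
    return 1 + questions_needed(above, t, hi)

random.seed(0)
mean = sum(questions_needed([random.random() for _ in range(100)])
           for _ in range(2000)) / 2000
```

Even for $n=100$ the average number of questions stays small, and it does not grow with $n$.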

Problems sheet 1

I handed out Problems sheet 1 this morning and as Assignment 1 I have asked students who are doing the course for credit to do problems 1-4 this week. Of course I hope other participants will also enjoy the questions. I have chosen the questions quite carefully, as ones that are pedagogically helpful to do.

You may hand-write your solutions, or write them up in LaTeX. It is essential to know LaTeX if you are doing anything at all mathematical. Almost all research papers in MS/OR are prepared that way. So this might be a good opportunity to practice if you have not done so already.

If you are able, you can scan your pages, or prepare a .pdf file that can be emailed to me. I do not insist on this, but if you can email your work to me in the afternoon on Sunday, then I can probably return your work to you at our class on Monday.

Next time I will talk briefly about the solutions to problems 1-4. Some people may want to also do Problems 5-8. Problem #7 needs a special theorem that will come next time. You may, however, be able to guess its answer.

Session 1

I will be using this blog space to talk about some extra things. Sometimes upon leaving a lecture I think, "I wish I had said ...". This blog gives me a place to say it. Or I can use this space to talk about a question that a student has asked during a lecture.

There is a commenting space after each post for you to write comments or ask questions using the IntenseDebate commenting system. Please do not be shy to do so, as other students will probably benefit.

Please notice that I am writing this blog using LaTeX, which is interpreted into a pretty form by MathJax. I discovered MathJax a short while ago, and am impressed how well it works in a web page. However, your browser must be running Javascript for it to work.

In the first session I gave you definitions and notation for talking about state, control, history, value function, etc., and used these to develop dynamic programming equations for a very general case, a state-structured case, and the Markov decision process case. Please be patient with the notation. It is not as complex as it may first appear. Things like $a(x,u,t)$, $F(x_t,t)$, $u_t$, and $U_t$ will become old friends once you have used them a few times.

From this first session you should be taking away the key idea of dynamic programming, and the fact that problems in stages (with separable cost and a finite time horizon) can often be solved by working backwards from the final stage. The minimum length path (stage coach) problem is trivial, but should have made these ideas very intuitive.
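The backward-working idea can be captured in a few lines of Python. This is a minimal sketch with made-up costs, not the network used in lectures:

```python
def stagecoach(costs):
    """Backward induction for a stage-coach (minimum length path)
    problem.  costs[t][i][j] is the cost of going from node i in stage
    t to node j in stage t+1.  Returns the minimum total cost from
    node 0 of stage 0."""
    # F[j] = minimum cost-to-go from node j of the current stage;
    # at the terminal stage there is nothing left to pay.
    F = [0.0] * len(costs[-1][0])
    for stage in reversed(costs):           # work backwards from the final stage
        F = [min(c + F[j] for j, c in enumerate(row)) for row in stage]
    return F[0]

# Hypothetical 3-stage network: 2 nodes in each interior stage
costs = [
    [[2, 4], [3, 1]],     # stage 0 -> stage 1 (we start at node 0)
    [[5, 1], [2, 6]],     # stage 1 -> stage 2
    [[3], [2]],           # stage 2 -> the single terminal node
]
```

Each sweep of the loop is one application of the optimality equation: the cost-to-go at a node is the cheapest arc out of it plus the already-computed cost-to-go at the arc's head.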

As an example of a problem in which the optimal policy should be determined by working backwards from the end, I mentioned that I had once appeared on ITV's Who Wants to be a Millionaire (October 2003). I once wrote a nice Cambridge exam question on this, including the model solution. You might like to look at this now – simply to see the sort of mathematical problem that this course will enable you to solve. You can also view the overhead slides for a little presentation called A Mathematician Plays "Who Wants to Be a Millionaire?" which I once gave to a Royal Institution workshop for school children. You might like to see how I made best use of my "Ask the audience" lifeline by employing an idea from statistics. Basically, I asked members of the audience not to vote if they felt at all unsure of the right answer.

In the second hour today I discussed three examples, the last of which was the Secretary Problem. This is the most famous problem in the field of optimal stopping. It is credited to Merrill M. Flood in 1949, who called it the fiancée problem. It gained wider publicity when it appeared in Martin Gardner's column of Scientific American in 1960. There is an interesting Wikipedia article about it. One of the interesting things said in this article is that in behavioural studies people tend to stop too soon (i.e. marry too soon, make a purchase too soon). 
See The devil is in the details: Incorrect intuitions in optimal search.
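The well-known optimal rule is to let roughly $n/e$ candidates pass and then accept the first candidate better than all seen so far, succeeding with probability about $1/e \approx 0.368$. A short Monte Carlo check (a sketch of the standard argument, not code from the course):

```python
import math, random

def secretary_trial(n, rng):
    """One run of the classical 1/e rule: observe the first n/e
    candidates without stopping, then accept the first candidate
    better than everything seen so far.  Returns True iff the best
    of all n candidates is the one selected."""
    values = [rng.random() for _ in range(n)]
    k = int(n / math.e)                     # length of the observation phase
    best_seen = max(values[:k]) if k else float("-inf")
    for i in range(k, n):
        if values[i] > best_seen:           # first candidate beating the record
            return values[i] == max(values)
    return values[-1] == max(values)        # forced to take the last candidate

rng = random.Random(1)
prob = sum(secretary_trial(50, rng) for _ in range(5000)) / 5000
```

The estimated success probability hovers near $1/e$, in line with the theory, and (consistent with the behavioural finding above) stopping earlier than $n/e$ only hurts.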

The optimal solution to the secretary problem can also be found as a special-case application of Bruss's odds-algorithm, due to Thomas Bruss.

Variations of the secretary problem can be created by changing the assumptions about whether, in addition to relative ranks, values $X_1,\dotsc,X_h$ can be observed, and whether the objective is to maximize the probability of choosing the best candidate, to maximize the expected rank, or to maximize the expected value of the candidate selected.

A variation that has never been completely solved is the so-called Robbins' Problem. In this problem we do observe values of candidates, say $X_1,\dotsc, X_h$, and these are assumed to be independent, identically distributed uniform$[0,1]$ random variables. The objective is to maximize the expected rank of the candidate that is selected (best = rank 1, second-best = rank 2, etc.). It is known only that, as $h$ goes to infinity, the expected rank that can be achieved under an optimal policy lies between 1.908 and 2.329. This problem is much more difficult than the usual secretary problem because the decision as to whether or not to hire candidate $t$ must depend upon all the values of $X_1,\dotsc, X_t$, not just upon how $X_t$ ranks amongst them.

At the end of the class today one of you asked me if there are any interesting problems in which candidates are not homogeneous. I have not thought about this before, but it sounds like it could be interesting. We might suppose, for example, that there are 8 candidates. Their ranks are determined by the ranks of some hidden i.i.d. $N(\mu_i,1)$ random variables, where the $\mu_i$ are known to us. At each stage we must choose which candidate we want to interview next. Clearly we should be able to do better than with 8 homogeneous candidates ($\mu_1=\cdots=\mu_8$). I wonder, how much better, and how? Is it better to start off by seeing a strong candidate (large $\mu_i$) or a weaker candidate (small $\mu_i$)? Of course this is just one possible way to model non-homogeneous candidates. What other models might we make? As MS/OR people we find it natural to think about tweaking models until we find one that is both solvable and interesting.

Course starts January 21, 2013

Our sessions will be in Room ALH-304
  1. Monday 21 January, 08.00 – 10.40.
  2. Monday 28 January, 08.00 – 10.40.
  3. Monday 4 February, 08.00 – 10.40.
  4. Monday 11 February, 08.00 – 10.40.
  5. Friday 15 February, 15.45 – 18.30.
  6. Monday 18 February, 08.00 – 10.40.
  7. Monday 25 February, 08.00 – 10.40.
  8. Monday 4 March, 08.00 – 10.40.
  9. Monday 11 March, 08.00 – 10.40.
  10. Friday 15 March, 15.45 – 18.30.

    There is an extra session in reserve, if needed:
  11. Monday 18 March, 08.00 – 10.40.

    and
  12. Friday 22 March, 15.45 – 18.30: FINAL EXAM.