Friday, March 15, 2013

Session 10

I have only been able to say a few things about Brownian motion and the Wiener process. The Wikipedia articles referenced in the previous sentence are good and contain some nice video demonstrations of things like self-similar scaling. I reviewed only the very basics, but enough for us to develop a simple stochastic differential equation that corresponds to the optimality equation of dynamic programming in the case of continuous time and a continuous state space. Although Brownian motion is a subtle thing, one can infer (or at least guess) many of its properties by thinking about the simple discrete-time symmetric random walk, of which it is a limit.
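For instance, here is a quick numerical illustration (my own sketch, not from the notes) of how the scaled walk $W_n(t)=S_{\lfloor nt\rfloor}/\sqrt{n}$ approximates the Wiener process: its time-1 value is approximately $N(0,1)$, and its quadratic variation over $[0,1]$ is 1.

    import numpy as np

    # A sketch (mine, not from the course notes) of the random-walk scaling
    # W_n(t) = S_{floor(nt)} / sqrt(n) that converges to the Wiener process.
    rng = np.random.default_rng(1)

    def scaled_walk(n, T=1.0):
        """Return W_n sampled at times k/n for k = 0, 1, ..., nT."""
        steps = rng.choice([-1.0, 1.0], size=int(n * T))  # symmetric +/-1 steps
        return np.concatenate(([0.0], np.cumsum(steps))) / np.sqrt(n)

    W = scaled_walk(n=10**6)
    print("W(1) =", round(W[-1], 3))  # approximately N(0,1)
    print("quadratic variation on [0,1]:", (np.diff(W) ** 2).sum())  # exactly 1 here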

There are many interesting things to be said about Brownian motion. Paul Lévy's Forgery Theorem for Brownian motion implies that in two dimensions almost surely every arc of the path contains the complete works of Shakespeare in the handwriting of Bacon.

I mentioned in the lecture that it is initially surprising to see that in Example 19.4 the cost is greatest when there is no noise. Let us try to understand why. I guess it is because $C_0$ and $C_1$ are almost the same, and $L$ is small compared to $Q$, and so we can let the noise take us to the endpoints, expending only minimal effort (which is costed at $Qu^2$). I guess that if $C_0$ were a lot less than $C_1$, or if $L$ were bigger compared to $Q$, then noise would not be helpful and we would expect Figure 4 to look different.

The last part of today's session was about risk-sensitive optimal control. I have corrected the small sign errors that were in the notes. If you would like to learn more about this then I recommend this paper:

P. Whittle. Risk-sensitivity, large deviations and stochastic control, European Journal of Operational Research 73 (1994) 295-303.

I think it is the easiest one to start with. There are other references given at the end of this paper.

Thursday, March 14, 2013

Proof by Genie, continued

I am interested in collecting more examples of Proof by Genie, as mentioned in a previous post. If you come across an example of this, please let me know.

I've now remembered another example of proof by genie, actually from one of my own recent papers. Two people are trying to find each other on the vertices of a triangle. Initially they start at two different vertices, and at each step $t=1,2,\dotsc$ each player either stays put or moves to another vertex, trying to minimize the expected number of steps until they meet. In working out the best symmetric rendezvous strategy it helps to assume that the players are given a common notion of what is "clockwise" on the triangle. However, once the optimal strategy is found, it turns out that a common notion of clockwise is not needed to implement it. If you would like to learn more about this problem see


R. R. Weber, Optimal symmetric rendezvous search on three locations, Math Oper Res., 37(1): 111-122, 2012.

or seminar slides. One of the slides begins with the words "Suppose the problem is made a bit easier: the players are provided with the same notion of clockwise. (We will see later that this does not actually help the players.)"

Final exam

The course final exam has been rescheduled to Thursday April 25, in ALH304, from 09:30 to 12:30.

Tuesday, March 12, 2013

Broom balancing in the 21st century

Have a look at this link to see what optimal control can do when it is coupled to today's fast microprocessors and feedback controllers:

Quadrocopter pole acrobatics

Thanks to Felix Horns for finding this link.

Monday, March 11, 2013

Final homework

The final homework is Problems Sheet 4, Questions 4,5,7,8. (We have already done 6.) In all these questions the strategy is to use the necessary conditions of the PMP to synthesize the optimal solution. What I mean by "synthesize" is that you write down all the information that the PMP says is necessary (in terms of $u(t)$ maximizing $H(x,u,\lambda)$, the adjoint equation $\dot \lambda=-\partial H/\partial x$, and any boundary and transversality conditions) until you convince yourself that the optimal policy can only be one thing. Question 5 is a bit like proving that the shortest distance between two points is a straight line. The answer is fairly obvious, but we can prove it is correct by applying the necessary conditions of the PMP to rule out the possibility of any other solution.

Please send the solutions to me by next Monday.

Session 9

The maximum principle that I talked about today is due to Lev Pontryagin. It is remarkable that despite being blind he was one of the greatest mathematicians of his generation. The key thing to grasp is that the PMP provides necessary conditions. We use the fact that an adjoint trajectory $\lambda$ exists to deduce properties of, or completely determine, the optimal control and optimally controlled trajectory. To my thinking, the PMP is notoriously badly explained in most books in which it appears. I hope I have been able to make it seem more intuitive. I think that Lagrange multipliers give a helpful interpretation, as does differentiation of the infinitesimal version of the optimality equation.

The rocket car example is a celebrated problem of optimal control theory that is nicely solved using the PMP. There is a nice interactive demo of it that you can try. I am pleased to have finally tracked down the fact that this problem was first solved by D.W. Bushaw, Differential Equations with a Discontinuous Forcing Term, Ph.D. Thesis, Princeton, 1952.

In the obituary of Donald W. Bushaw (1926-2012) it is stated that "Don’s Ph.D. thesis is recognized as the beginning of modern optimal control theory."

The name of A. A. Feldbaum (a Russian) is also mentioned in connection with this problem, which he solved at about the same time. Pontryagin came up with his maximum principle a few years later, in 1956.

Today's session ended with an example of turnpike theory. This is an important concept in economics and is concerned with the optimal path of capital accumulation. Can you see intuitively why the optimal turnpike does not depend on the utility function $g$ of the agent? The label "turnpike" is due to Dorfman, Samuelson and Solow. I have mentioned it in the blog post What makes for a beautiful problem in Science?

My personal experience of the power of PMP came in solving the problem about searching for a moving target that I have previously mentioned.

R. R. Weber. Optimal search for a randomly moving object. J. Appl. Prob. 23:708-717, 1986.

The problem that is solved in this paper in continuous time is still an open problem in discrete time. That shows the power of the PMP: with it we can solve problems in continuous time that cannot (yet) be solved in discrete time.

Friday, March 8, 2013

What makes for a beautiful problem in Science?

In a paper with the title What makes for a beautiful problem in Science? (Journal of Political Economy, Vol. 78, No. 6, 1970), Paul Samuelson writes:

What is the shortest distance between two points? A straight line. Can you prove it?

What is the longest distance between two points? Surely, no answer is possible. But consider two points a mile apart on a horizontal line in the plane. Suppose that the admissible paths cannot have an absolute slope exceeding one million. Then it is not (sic) hard to envisage that the longest distance is a bit over a million miles. But it takes the methods of Pontryagin and not those of the classical theory to handle such a problem in which the control variables are constrained within a closed set.

Following Session 9 you might like to see if you can use Pontryagin's maximum principle to prove what Samuelson claims. You must start by deciding what he means by an admissible path. I expect he is thinking about a curve between $(0,0)$ and $(1,0)$ that is a graph $y(x)$ determined by $y'(x)=u(x)$, $0\leq x \leq 1$. I have added (sic) (for sic erat scriptum) because I think the word "not" should be deleted. He is trying to say that the maximum distance is a bit over a million miles, in fact $\sqrt{10^{12}+1}$.
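Here is a sketch of how the calculation might go (my reconstruction, not Samuelson's). Take paths with $y(0)=y(1)=0$, $y'(x)=u(x)$ and $|u(x)|\leq M=10^6$, and maximize $\int_0^1\sqrt{1+u(x)^2}\,dx$. The Hamiltonian $H=\lambda u+\sqrt{1+u^2}$ is convex in $u$, so it is maximized over the closed set $[-M,M]$ at an extreme point $u=\pm M$: the optimal control is bang-bang, something the classical calculus of variations cannot produce. Any path that zigzags with $|u|=M$, switching sign so as to return to $y=0$ at $x=1$, has length $\int_0^1\sqrt{1+M^2}\,dx=\sqrt{10^{12}+1}$ miles, indeed a bit over a million miles. Samuelson goes on: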

A scientific problem gains in aesthetic interest if you can explain it to your neighbor at a dinner party. 

That no map requires more than four colors is such a problem. Can the turnpikes of Professor Chakravarty pass this test? In a system in which everything produces everything, a critical set of proportions at which balanced growth is most rapid seems reasonable. So the general notion of the von Neumann turnpike is conveyed. But suppose we begin away from this highway, and also are to end up away from it. Still, if the journey is to be a sufficiently long one, as seen we shall do well to move in the beginning toward that fast highway; catch a fast ride along it, staying so long there as to leave behind in all directions any rival who sticks to any slower path; then, toward the end of our journey, we can swing off the turnpike to our final rendezvous.

We will see this in Session 9 (the turnpike of economic growth, Section 15.6).

I cannot resist one more excerpt from Samuelson's paper, in which he muses on the way in which an optimal path may sometimes seem strange.

Begin with durable hammers and robots that can make new hammers or robots with the same symmetrical input requirements for the one as for the other. Each input is subject to familiar diminishing returns; but increasing both inputs proportionally leads to constant returns to scale. I begin with 10 robots and 5 hammers and must end up in minimum time with (at least) 1,000 robots and 5,000 hammers. How should I proceed? Surely, I begin by producing only hammers, since the redundant robots are subject to diminishing returns. When I reach 10 of each input, I shift gears to produce equal amounts of both, proceeding along what is known as a von Neumann-DOSSO turnpike. You might think I go along it until 1,000 of each are attained, thereafter producing only the 4,000 more hammers. And you might be right. But you might be wrong. Depending upon the exact technology, it might pay to produce more of both along the turnpike, ceasing to produce more robots only after they are somewhere between 1,000 and 5,000 in number. We end up with more than we needed of robots. Irrational, since we don't want them and they cost something? Not necessarily, since they may have been worthwhile for their help in producing the extra hammers we do want.

Monday, March 4, 2013

Kalman Filter in economics and finance

To follow up a question this morning from Yiangos: I asked my colleague Lord Eatwell, a famous economist and President of my college, "Is the Kalman filter much used in economics?" He immediately replied, "Yes, all the time".

Eatwell is one of the compilers of the Palgrave Dictionary of Economics. I searched in Palgrave for the Kalman filter, and read:

The Kalman filter

The Kalman filter deals with state-space representations where the transition and measurement equations are linear and where the shocks to the system are Gaussian. The procedure was developed by Kalman (1960) to transform (‘filter’) some original observables $y_t$ into Wold innovations $a_t$ and estimates of the state $x_t$. With the innovations, we can build the likelihood function of the dynamic model. With the estimates of the states, we can forecast and smooth the stochastic process.

The use of unobserved components opens up a new range of possibilities for economic modelling. Furthermore, it provides insights and a unified approach to many other problems. The examples below give a flavour.

The local linear trend model generalizes (1) by the introduction of a stochastic slope, $\beta_t$, which itself follows a random walk. Thus ...

I recommend the New Palgrave as a good place to go if you need to find a short article on any economics topic. My short answer to Yiangos's question is that econometricians are always building linear models in which some variables are explained by other variables (in some sort of regression model). As the values of variables become known over time, one wants to update estimates of other variables. The machinery for doing this is provided by the Kalman filter. Notice that the Kalman filter has nothing to do with the quadratic cost (the Q) of our LQG model. Only the Linear and Gaussian parts are relevant.

I tried a Google search for "Kalman filter and finance". Many things turn up. For example, here is a talk, Kalman Filtering in Mathematical Finance.

Session 8

I have added to the notes, on page 77, the argument by which we can verify that in the fishing problem the value function $F(x)$ is concave.

The homework for next week is just three questions: the first three on Problems Sheet 4. See hints below.

The name "Kalman filter" refers to the estimation equation (16.1) and takes its name from Rudolf Kalman (1930 –), who developed it in the years 1958-64. He also coined the terms controllable and observable, and gave the criteria that we have seen in previous lectures. The fact that a system is controllable iff the matrix $[B\ AB\ \cdots\ A^{n-1}B]$ is of full rank is sometimes called Kalman's criteria. In the IEEE biography of Kalman it is stated
The Kalman filter, and its later extensions to nonlinear problems, represents perhaps the most widely applied by-product of modern control theory. It has been used in space vehicle navigation and control (e.g. the Apollo vehicle), radar tracking algorithms for ABM applications, process control, and socioeconomic systems.
The theory in this lecture is admittedly quite tricky - partly because of the notation. As a test of memory, can you say what roles in the theory are taken by each of these?

 $x_t$, $\hat x_t$, $y_t$, $u_t$, $\epsilon_t$, $\eta_t$, $\Delta_t$, $\xi_t$, $\zeta_t$, $A$, $B$, $C$, $R$, $S$, $Q$, $K_t$, $\Pi_t$, $N$, $L$, $M$, $H_t$,  $V_t$. 

You will understand the ideas better once you have worked through the details of a scalar example (in which $n=m=p=1$). You do this in Problems Sheet 4, Question 1. When you do this question, start by supposing that $\hat x_t=\hat x_{t-1}+u_{t-1}+h_t(y_t-\hat x_{t-1})$, and then find the value of $h_t$ that minimizes the variance of $\hat x_t$. You can start by subtracting $x_t=x_{t-1}+u_{t-1}+3\epsilon_t$ and using $y_t=x_{t-1}+2\eta_t$. I also think that Question 2 is helpful in gaining an appreciation of the duality between control and estimation problems. One of the things you will learn from this question is a sufficient condition for the variance of $\hat x_t$ (that is, $V_t$) to tend to a finite limit. Question 3 is about solving a HJB equation (easier than the fishing example).
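If, after doing the algebra, you would like a numerical check, here is a little sketch of my own (it assumes $\epsilon_t,\eta_t$ are independent $N(0,1)$; the gain and variance recursions in the comments are what I obtain for this model):

    import numpy as np

    # A sketch (mine) of the scalar Kalman filter of Question 1.
    # Model assumed: x_t = x_{t-1} + u_{t-1} + 3 eps_t,  y_t = x_{t-1} + 2 eta_t,
    # with estimate xhat_t = xhat_{t-1} + u_{t-1} + h_t (y_t - xhat_{t-1}).
    # The error variance V_t = (1 - h_t)^2 V_{t-1} + 9 + 4 h_t^2 is minimized
    # by h_t = V_{t-1}/(V_{t-1} + 4), giving V_t = 9 + 4 V_{t-1}/(V_{t-1} + 4).
    rng = np.random.default_rng(0)

    x, xhat, V = 0.0, 0.0, 1.0
    for t in range(50):
        u = -xhat                    # any control known to the estimator will do
        eps, eta = rng.standard_normal(2)
        y = x + 2 * eta              # noisy observation of the previous state
        x = x + u + 3 * eps          # the state evolves
        h = V / (V + 4)              # variance-minimizing gain
        xhat = xhat + u + h * (y - xhat)
        V = 9 + 4 * V / (V + 4)      # Riccati recursion, running forward in time

    print(xhat, x, V)                # V tends to 12, the root of V = 9 + 4V/(V+4)

Notice that $V_t$ does not depend on the observed data, so the gains $h_t$ can be computed in advance, just as the control gains can.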

Notice that the Riccati equation for $V_t$, i.e. $V_t = g\, V_{t-1}$ runs in the opposite time direction to the one we had for $\Pi_t$ in Chapter 13, where $\Pi_{t-1} = f\, \Pi_t$. We are given $V_0$ and $\Pi_h$.

Thursday, February 28, 2013

Coda on the puzzle

Here is an amusing sporting fact that resonates with our puzzle, in which a decision that is the same under conditions of A true and A false may not be the right decision when the truth status of A is unknown.

John Howard (London School of Economics) tells me that in cricket it can happen that a batsman attempts to drive the ball but appears to miss (or he may have snicked it). The ball then hits his pad, bounces off, and is caught by a slip fielder. There is an appeal for lbw (leg before wicket). The umpire would give him out "lbw" if he were sure he had not hit the ball. He would be out "caught" if he had hit the ball. But a batsman cannot be given out unless a definite reason is given, and if the umpire is not sure which of these two it was, then the batsman is not out.

You can read more about this in the article, Ricky Ponting and the Judges, by Ian Rumfitt, who writes:

"In the first innings of the final Ashes Test of 2009, Ricky Ponting faced a ball which was deflected, off something, into the wicket-keeper’s hands. The English XI appealed, and in the agonizingly long time that it took Asad Rauf to decide, Jonathan Agnew (commentating on Test Match Special) reasoned as follows: ‘Either the ball hit Ponting’s bat or it hit his pads. If it hit his bat, he is out caught behind. If it hit his pads, he is out lbw. So, either way, he is out’. Rauf, however, appeared to disagree with Agnew’s reasoning, and Ponting stayed at the wicket."

Monday, February 25, 2013

Stabilizability - a personal anecdote

I post this on the evening of my 60th birthday. My birthday began with Session 7 (about controllability and stabilizability) with a class of Ph.D. students at London Business School. When I was born we lived just north of Regent's Park, in Acacia Road, St John's Wood, which is just 0.9 miles from London Business School's location. I find it remarkable to have returned so close to my geographic origins after 60 years.

Who solved the Secretary Problem?

I thought you might enjoy this.

In his paper (Who Solved the Secretary Problem? Statist. Sci., Volume 4, Number 3 (1989), 282-289) Thomas Ferguson tells the following funny story.

"When the celebrated German astronomer, Johannes Kepler (1571-1630), lost his first wife to cholera in 1611, he set about finding a new wife using the same methodical thoroughness and careful consideration of the data that he used in finding the orbit of Mars to be an ellipse. His first, not altogether happy, marriage had been arranged for him, and this time he was determined to make his own decision. In a long letter to a Baron Strahlendorf on October 23, 1613, written after he had made his selection, he describes in great detail the problems he faced and the reasons behind each of the decisions he made. He arranged to interview and to choose from among no fewer than eleven candidates for his hand. The process consumed much of his attention and energy for nearly 2 years, what with the investigations into the virtues and drawbacks of each candidate, her dowry, negotiations with her parents, natural hesitations, the advice of friends, etc. The book of Arthur Koestler (1960) contains an entertaining and insightful exposition of the process. The book of Carola Baumgardt (1951) contains much supplementary information.

Suffice it to say that of the eleven candidates interviewed, Kepler eventually decided on the fifth. It may be noted that when $n = 11$, the function $\phi_n(r)$ of (2.1) takes on its maximum value when $r = 5$. Perhaps, if Kepler had been aware of the theory of the secretary problem, he could have saved himself a lot of time and trouble." Fortunately,
"His new wife, whose education, as he says in his letter, must take the place of a dowry, bore him seven children, ran his household efficiently, and seems to have provided the necessary tranquil homelife for his erratic genius to flourish."
Ferguson has many other interesting things in his paper. It created some controversy and amusement when published because his answer to the question "Who Solved the Secretary Problem?" is "Nobody"! He then goes on to "solve" it for the first time! (at least a particular version he likes). He also says:
"As historians, we should take as the secretary problem, the problem as it first appeared in print, in Martin Gardner's February 1960 column in Scientific American, where it was called the game of googol and described as follows.
Ask someone to take as many slips of paper as he pleases, and on each slip write a different positive number. The numbers may range from small fractions of 1 to a number the size of a googol (1 followed by a hundred 0's) or even larger. These slips are turned face down and shuffled over the top of a table. One at a time you turn the slips face up. The aim is to stop turning when you come to the number that you guess to be the largest of the series. You cannot go back and pick up a previously turned slip. If you turn over all slips, then of course you must pick the last one turned.
The astute reader may notice that this is not the simple form of the secretary problem described in Section 2. The actual values of the numbers are revealed to the decision maker in violation of condition 4. Also, there is this "someone" who chooses the numbers, presumably to make your problem of selecting the largest as difficult as possible. The game of googol is really a two-person game."

Proof by Genie

Sometimes arguments like that used in the puzzle can be useful in proofs. Remember, I told you about the Find the Greatest problem? In their paper, An optimal strategy for a conflict resolution problem (Venkat Anantharam and Pravin Varaiya, Systems and Control Letters, 1986, pp. 329–332), the authors make an argument like this:

We postulate the following genie:
  
The genie tells us which of the sets $A$ and $A^c$ ...

I omit the detail (see their paper if you like). I focus here on the idea that the genie is providing extra information that can make the optimal policy easier to figure out. The authors continue:

By postulating a genie we mean that we permit ourselves to use different strategies on events on which the genie gives us different answers. Clearly we can do no better without the genie than we can with it.

They then show that given the information provided by the genie, the optimal policy is one that could have been implemented without having that information, and which takes a special form. So they can conclude the optimal policy (without the genie) must also take this special form (i.e. that it suffices to ask questions of the simple form "Whose number is bigger than $x$?").

We see, however, that one has to be careful with this type of proof. In the puzzle we are drawn into thinking about a genie who tells us whether A is true or false. At issue is the fact that the policy which is optimal, given what the genie tells us about A, may have identical $u_0$, whatever the genie says, but this is only because we can act differently for $u_1, u_2, \dotsc$.

Answer to the puzzle

The intuitive answer to this puzzle is yes ($u_0=1$ is optimal). That might make you suspicious that I am going to argue for the non-intuitive answer, no ($u_0=1$ is not necessarily optimal). I wonder if you will like my answer, or think I am cheating?

I said nothing about $u_0$ being the only decision. I make no apology, because these lectures have been all about taking a sequence of decisions (or controls) $u_0,u_1,u_2,\dotsc$ over time.  That possibility should have been in your mind, and I was giving you a hint by denoting the decision as $u_0$ (the first of a sequence of decisions).

Suppose we are faced with a stopping problem in which $u_0=1$ means "continue" and $u_0=0$ means "stop". We should consider the possibility that by knowing that A is true, or by knowing A is false, it is optimal to continue. But if we do not know whether A is true or false then it is optimal to stop.

It is not completely obvious that this can happen. So let's see how to construct an example in which it does. Consider a version of the secretary problem in which we will see 3 candidates. They are "best", "middle" and "worst" (B,M,W). Of course we can only compare them with one another, so if they are presented in the order M,B,W, we would see this as $(x_1,x_2,x_3)=(1, 1, 0)$, where $x_i=1$ if the $i$th candidate is the best so far. Suppose that the candidates will be presented in one of the following four orders, with the consequent sequences of 0s and 1s, and with the probabilities given:
  1. B,M,W (1,0,0) with probability 0.2
  2. M,B,W (1,1,0) with probability 0.3
  3. B,W,M (1,0,0) with probability 0.2
  4. W,M,B (1,1,1) with probability 0.3
It is a bit artificial that all 6 possible orders are not equally likely, but for the purpose of this discussion we are not trying to think about a really practical problem.

We wish to maximize the probability of stopping on the last 1.

Now suppose the unknown "A" is whether "the worst candidate will be seen last".

If "A" is true it will be 1 or 2, but if A is false it will be 3 or 4.

You can now verify that given A is true it is optimal not to stop on the first 1 ($u_0=1=$ "continue"). The same decision is optimal if A is false. In both cases we win with probability 0.6.

However, if we do not know whether A is true or false, then it is optimal to stop on the first 1 ($u_0=0$, and we win with probability 0.4). If we do not stop, then with probability 0.6 we reach the state 1, 1, and from there we can win with probability only 0.5. So our win probability would be only 0.3.
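These claims are easy to verify by brute force. Here is a little script (my own) that enumerates every stopping rule over the observable histories:

    from fractions import Fraction
    from itertools import chain, combinations

    # Brute-force check (mine) of the puzzle above. Each order of candidates
    # yields a 0/1 sequence of "best so far" indicators; we win by stopping
    # on the last 1. A = "the worst candidate will be seen last".
    cases = [  # (sequence, probability, A true?)
        ((1, 0, 0), Fraction(2, 10), True),   # B,M,W
        ((1, 1, 0), Fraction(3, 10), True),   # M,B,W
        ((1, 0, 0), Fraction(2, 10), False),  # B,W,M
        ((1, 1, 1), Fraction(3, 10), False),  # W,M,B
    ]
    prefixes = [(1,), (1, 0), (1, 1), (1, 0, 0), (1, 1, 0), (1, 1, 1)]

    def value(stop_set, subset):
        # win probability when we stop at the first history found in stop_set
        won = Fraction(0)
        for seq, prob, _ in subset:
            for t in range(1, 4):
                if seq[:t] in stop_set:
                    if seq[t - 1] == 1 and 1 not in seq[t:]:
                        won += prob          # stopped on the last 1
                    break
        return won / sum(prob for _, prob, _ in subset)

    def best(subset):
        policies = chain.from_iterable(combinations(prefixes, r)
                                       for r in range(len(prefixes) + 1))
        return max(value(set(pol), subset) for pol in policies)

    print(best([c for c in cases if c[2]]))      # given A true:  3/5
    print(best([c for c in cases if not c[2]]))  # given A false: 3/5
    print(best(cases))                           # A unknown:     2/5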

Do you think this example is a cheat (in that it does not really match up with the way I posed the original puzzle), or have you learned something interesting?

I was thinking about this recently in connection with a similar question. I wonder if any of you can invent a neater example to illustrate this point, or one with a better "back-story".

The question I have been thinking about concerns an application of Bruss's odds theorem in circumstances in which we do not know the odds. Recall that if $X_1,\dotsc,X_n$ are independent, with $X_i=1$ or $0$ with probabilities $p_i$ and $q_i=1-p_i$, then we maximize the probability of stopping on the last 1 if we stop at the first 1 that we find amongst $X_s,X_{s+1},\dotsc,X_n$, where $s$ is the greatest integer $i$ such that

$\frac{p_i}{q_i}+\cdots+\frac{p_n}{q_n}\geq 1$.
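In code the odds algorithm is only a few lines. Here is a sketch of my own (not from Bruss's paper); applied to the classical secretary problem, where $p_i=1/i$, it recovers the familiar cutoff near $n/e$ and win probability near $1/e$:

    from math import inf, prod

    def odds_algorithm(p):
        # Bruss's odds theorem: stop at the first success among X_s,...,X_n,
        # where s is the greatest i at which the tail sum of odds reaches 1.
        q = [1 - pi for pi in p]
        r = [inf if qi == 0 else pi / qi for pi, qi in zip(p, q)]
        s, tail = 1, 0.0
        for i in range(len(p) - 1, -1, -1):   # sum the odds from the back
            tail += r[i]
            if tail >= 1:
                s = i + 1                     # 1-indexed stopping threshold
                break
        win = prod(q[s - 1:]) * sum(r[s - 1:])
        return s, win

    n = 100
    s, win = odds_algorithm([1 / i for i in range(1, n + 1)])
    print(s, round(win, 4))                   # s = 38, win probability about 0.371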

Now suppose we do not know the actual values of $p_1,\dotsc,p_n$. But we have seen $X_{i-1}=1$ and somehow know that $p_i/q_i+\cdots+p_n/q_n=1$. Can we say that it is optimal to stop? Yes. And since sum-of-odds $=1$ is the borderline case in the odds theorem, is it also optimal not to stop? Yes.

But suppose we know only that $E[p_i/q_i+\cdots+p_n/q_n] > 1$. Is it optimal to stop?

Session 7

Subsequent to the class I have expanded my solution to question 3.2.

The linear-quadratic regulator (LQR) that I discussed today is one part of the solution to the linear-quadratic-Gaussian control problem (LQG). The LQG problem is perhaps the most important problem in control theory. It has a full solution: given in terms of the Kalman filter (a linear-quadratic estimator) and the linear-quadratic regulator.

Following today's lecture you can do the assignment of Questions 4-7 on Problem Sheet 3. Here is an important hint. Do not try to solve these problems by plugging values into the general solution of the Riccati equation. It is always better to figure out the solution from scratch, by the method of backwards induction. Conjecture for yourself that the optimal value function, $F(x,t)$, is of a form that is quadratic in the state variable, plus some $\gamma_t$ (if it is a problem with white noise). Then find the explicit form of the recurrences, working backwards inductively from a terminal time $h$.

In general, solutions to a Riccati equation can be computed numerically, but not algebraically. However, one can find a full solution in the special case that $x_t$ and $u_t$ are both one-dimensional (Question 4). There is also a fully-solvable special case in which $x_t$ is two-dimensional and $u_t$ is one-dimensional (Question 6). A useful trick in some problems is to observe that $\Pi_t^{-1}$ satisfies a recurrence relation (Question 6).
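To make the backwards induction concrete, here is a toy scalar example of my own (not one of the problem-sheet questions), with dynamics $x_{t+1}=ax_t+bu_t$ and cost $\sum_{t=0}^{h-1}(rx_t^2+qu_t^2)+\Pi_h x_h^2$:

    # Backward induction for a toy scalar LQR problem (my own sketch).
    # Conjecturing F(x,t) = Pi_t x^2 and minimizing over u gives
    #   u_t = -K_t x_t with K_t = a b Pi_{t+1} / (q + b^2 Pi_{t+1}),
    #   Pi_t = r + a^2 q Pi_{t+1} / (q + b^2 Pi_{t+1}).
    a, b, r, q = 1.2, 1.0, 1.0, 0.5   # made-up parameter values
    h, Pi = 20, 0.0                   # horizon, terminal Pi_h = 0

    K = [0.0] * h
    for t in range(h - 1, -1, -1):    # work backwards from the terminal time
        K[t] = a * b * Pi / (q + b**2 * Pi)
        Pi = r + a**2 * q * Pi / (q + b**2 * Pi)

    print("Pi_0 =", round(Pi, 4))     # minimal cost from x_0 is Pi_0 x_0^2
    print("first gains:", [round(k, 4) for k in K[:3]])
    # Far from the horizon, Pi_t and K_t approach stationary values,
    # the fixed point of the Riccati recursion.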

No one enjoys memorizing the Riccati equation. The important thing is to remember enough of what it is for, how it looks, and how it is derived, so that you could reconstruct its derivation in the context of a particular problem.

The idea of controllability is straightforward and we can understand it using ideas of linear algebra.

Consider again the broom balancing problem. I mentioned that two upright brooms cannot be simultaneously controlled around their equilibria if they are of the same length, but it is possible if the lengths are different. This is because of the forms:

$A=\begin{pmatrix}0 & 1 & 0 & 0\\ \alpha & 0 & 0& 0\\ 0 & 0 & 0& 1\\ 0 & 0& \beta & 0\end{pmatrix}\quad B=\begin{pmatrix}0 \\-\alpha\\0\\-\beta\end{pmatrix}\quad M_4=\begin{pmatrix}0 & -\alpha & 0 & -\alpha^2\\ -\alpha & 0 & -\alpha^2& 0\\ 0 & -\beta & 0& -\beta^2\\ -\beta & 0& -\beta^2 & 0\end{pmatrix}$

where $\alpha = g/L_1$, $\beta=g/L_2$. So $M_4$ is of rank 4 iff $\alpha$ and $\beta$ are different. However, this assumes that as you move your hand you must keep it at constant height from the ground. What do you think might happen if you can move your hand up and down also?
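Here is a quick numerical check of the rank condition (my own sketch, using arbitrary broom lengths):

    import numpy as np

    # Rank of M_4 = [B  AB  A^2 B  A^3 B] for the two-broom system (my sketch).
    def ctrb_rank(alpha, beta):
        A = np.array([[0, 1, 0, 0],
                      [alpha, 0, 0, 0],
                      [0, 0, 0, 1],
                      [0, 0, beta, 0]], dtype=float)
        B = np.array([0, -alpha, 0, -beta], dtype=float)
        M4 = np.column_stack([np.linalg.matrix_power(A, k) @ B for k in range(4)])
        return np.linalg.matrix_rank(M4)

    g = 9.81
    print(ctrb_rank(g / 1.0, g / 1.5))  # different lengths: rank 4, controllable
    print(ctrb_rank(g / 1.0, g / 1.0))  # equal lengths: rank 2, not controllable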

Sunday, February 24, 2013

Control of queues

In Section 10.3 we looked at a problem of controlling the arrival rate at an $M/M/1$ queue. Question 3 on Problem Sheet 3 is about controlling the service rate. In both cases, I was using this to illustrate the policy improvement algorithm. In 8.7 I introduced the topic of optimal scheduling in a multi-class $M/G/1$ queue. 

Problems about controlling service and/or arrivals, and scheduling, in single queues, or networks of queues, crop up again and again in OR models in communications, supply chains, etc. There is a vast research literature. I have written a few papers about this myself, such as 

R. R. Weber and S. Stidham, Jr. Optimal control of service rates in networks of queues. Adv. Appl. Prob. 19, 202-218, 1987.

S. Stidham, Jr and R. R. Weber. Monotonic and insensitive optimal policies for the control of queues with undiscounted costs. Operations Research 37, 611-625, 1989.

S. Stidham and R. Weber. A survey of Markov decision models for control of networks of queues. Queueing Systems 13, 291-314, 1993. 

My co-author of those papers, Sandy Stidham, has a huge knowledge of this field. He has written a nice article, containing 349 references, which gives a bird's-eye overview of queueing control and related topics: Applied Probability in Operations Research: A Retrospective, 2001.

If in your own research you ever find that you have reason to want to know more about this subject then I suggest you begin by reading Sandy's retrospective, or take a look at one of the papers above. The first one is about a problem in which both service and arrival rates can be controlled within a simple network of queues.

Friday, February 22, 2013

A Puzzle

Here is a puzzle to test your understanding and intuition.

You are faced with a problem in which you need to make a decision. For example, you might be choosing $u_0$. You can choose $u_0=0$ or $u_0=1$. One of these is optimal (better than the other).

There is something "A" which you do not know. It could be either true or false.

If you were to know A is true then $u_0=1$ is definitely optimal (i.e. better than $u_0=0$).
If you were to know A is false then $u_0=1$ is also definitely optimal.

Can you therefore conclude that $u_0=1$ is optimal?

See Answer to the puzzle, Proof by Genie, Coda on the puzzle, and Proof by Genie, continued.

Monday, February 18, 2013

Homework

The homework assignment due on 25/2 is Problem Sheet 3: 1-3. I have already given some hints on doing these questions in the blog for Session 5.

Session 6

In today's session we looked at some more advanced topics, which previous sessions have prepared us to understand. I focused on some problems that I have studied in my own research.

With hindsight, I may have been too ambitious in imagining what I might cover this morning. I hope that my notes will help fill in gaps if you wish to learn more.

In the end, we covered the notion of Whittle indices for restless bandits. We saw that the policy they define is optimal, or near optimal, in the limit that the number of identical bandits $n$ tends to infinity. This follows from the fact that in large-$n$ systems the proportions of bandits in the states follow a "fluid model", which is linear within polyhedral regions.

In the second part we looked at stochastic sequential allocation and assignment problems. The stochastic sequential assignment problem (SSAP) has a surprising answer: the optimal strategy can be described in terms of thresholds that do not depend on the values of the $p_i$. The proof is cute and is worth understanding.

I mentioned Rhonda Righter's nice treatment of a SSAP with arrivals, in which the incentives for individual (selfish) behaviour by workers can be made to coincide with the socially optimal behaviour. This is also true in the SSAP without arrivals, if the jobs are offered to the workers in decreasing order of their $p_i$. For example, with $n$ workers present, the $i$th will accept the job iff it has been declined by workers $1,2,\dotsc,i-1$ and its value $X$ is at least $a_{i,n}$, which is the expected value of the job that will be assigned to the $i$th worker if she rejects the currently offered job. It will be declined by the previous workers precisely if $X < a_{i-1,n}$.

There was not time to discuss all problems. I did not fully discuss all the solutions to Problems Sheet 2, Questions 5-7. The model solutions are now on the web site.

Some references about the Bomber Problem and its relatives are in the previous post. The slides from which I talked are here.

ABCs of the Bomber Problem

I briefly spoke today about the infamous Bomber Problem. If you would like to read more, see ABCs of the Bomber Problem and its Relatives, Weber, 2011. There is also a talk about this here. This problem has been open for over 35 years, and has been said by researchers who know it to be highly addictive. It is very infuriating that for a problem with such a very simple dynamic programming equation an "obvious" fact can be neither proved nor disproved.

$F(n,t)$ is the probability that a bomber plane, which is potentially attacked by enemy fighters $t$ times, can successfully defend itself against all such attacks, when starting with a supply of $n$ air-to-air missiles. The probability of surviving an attack in which it uses $k$ missiles is $(1-\theta^k)$, where $0 < \theta < 1$. In the dynamic programming equation below, $p$ is the probability that an attack occurs in a given period and $q=1-p$:

$F(n,t) = q F(n,t-1) + p \max_{k\in\{1,\ldots,n\}}(1-\theta^k)F(n-k,t-1)$

$F(n,0)=1$

Conjecture B is that the maximizing $k$ is nondecreasing in $n$.
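It is easy to explore Conjecture B numerically. Here is a sketch of my own (with illustrative values of $\theta$ and $p$); the conjecture appears to hold in such computed tables, which is exactly what makes the lack of a proof so tantalizing:

    # The Bomber Problem recursion, to explore Conjecture B (my own sketch).
    theta, p = 0.5, 0.7                 # assumed illustrative values
    q = 1 - p
    N, T = 30, 30

    F = [[1.0] * (T + 1) for _ in range(N + 1)]    # F[n][t], with F[n][0] = 1
    k_opt = [[0] * (T + 1) for _ in range(N + 1)]
    for t in range(1, T + 1):
        F[0][t] = q * F[0][t - 1]       # no missiles: survive only if never attacked
        for n in range(1, N + 1):
            k, v = max(((k, (1 - theta**k) * F[n - k][t - 1])
                        for k in range(1, n + 1)), key=lambda kv: kv[1])
            F[n][t] = q * F[n][t - 1] + p * v
            k_opt[n][t] = k

    # Conjecture B: for each fixed t, the maximizing k is nondecreasing in n.
    ok = all(k_opt[n][t] >= k_opt[n - 1][t]
             for t in range(1, T + 1) for n in range(2, N + 1))
    print("k(n,t) nondecreasing in n:", ok)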

Many problems very similar to this are easy to solve, such as the stochastic sequential investment problem (of Derman, Lieberman and Ross, described on page 34 of the notes). Typically the proof methods are similar to that in Problem 7 of Problems Sheet 2.

Klinger and Brown's 1968 paper is here.

Saturday, February 16, 2013

Session 5

You will find that the policy improvement algorithm is useful in answering Problems 1 and 2 on Problems Sheet 3. In Problem 1, you will find $\lambda$ and $\phi$ for a certain policy ("On seeing a filling station, stop and fill the tank"), and then look for a condition under which the policy improvement algorithm will not improve it (or, equivalently, under which this $\lambda$ and $\phi$ satisfy the optimality equation).

In Problem 2 you will use the policy improvement idea in the context of a discounted-cost problem. You find $F(x)$ for a simple policy (in which the engineer allocates his time randomly), and then improve it by a step of the policy improvement algorithm. This leads to a policy in which the engineer puts all his maintenance effort into the machine with the greatest value of $c_i(x_i+q_i)/(1-\beta b_i)$. This policy is better, but may not yet be optimal.

I have a personal fondness for some things I talked about in the second half today. Theorem 10.1 (and extensions of it to other distributions beyond exponential) was one of the things that I proved in my Ph.D. dissertation in 1980. The Lady's Nylon Stocking Problem was a fairly famous unsolved problem. It gave me something to talk about with non-mathematically inclined fellows when I first became a research fellow. The paper is Scheduling jobs with stochastic processing requirements on parallel machines to minimize makespan or flow time. J. Appl. Prob. 19, 167-182, 1982.

The method of uniformization that I described in this lecture seems to have been introduced by Jensen (1953), but it was first used extensively in solving queueing problems in the late 1970s (by Grassmann (1977) and Keilson (1979)).
Here is a quote from a 2011 paper by Rhonda Righter, Stochastic sequential assignment problem with arrivals, Probability in the Engineering and Informational Sciences, 25, 2011, 477–485:

Proof. We consider the equivalent discrete-time problem by uniformizing with uniformization rate $\lambda + \gamma + \alpha = 1$, and we use value iteration (i.e., induction on a finite time horizon, $n$).

The Wikipedia article has more about it and summarises the method in these words:
The method involves the construction of an analogous discrete time Markov chain, where transitions occur according to an exponential distribution with the same parameter in every state.
I have used this idea many dozens of times in my own research. It is usually the best first step to make when tackling a continuous-time Markov decision problem.
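For those who have not seen it, here is uniformization in miniature (a sketch of my own, for an M/M/1 queue truncated at 5 customers):

    import numpy as np

    # Uniformization (my sketch): turn a CTMC with generator Q into the
    # discrete-time chain P = I + Q/Lambda, where Lambda bounds every total
    # transition rate; transitions then occur at exponential times of rate
    # Lambda, some of them being fictitious self-loops.
    lam, mu, N = 1.0, 2.0, 5            # arrival rate, service rate, truncation
    Q = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        if i < N:
            Q[i, i + 1] = lam           # an arrival
        if i > 0:
            Q[i, i - 1] = mu            # a service completion
        Q[i, i] = -Q[i].sum()

    Lambda = lam + mu                   # the uniformization rate
    P = np.eye(N + 1) + Q / Lambda
    assert np.allclose(P.sum(axis=1), 1) and (P >= 0).all()
    print(P.round(3))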

Tuesday, February 12, 2013

Scheduling in the Multiclass M/G/1 queue

Here is a bit more about this for those who are interested. Shyam has asked me about this section of the last lecture.

The result in Section 8.7 is originally due to Klimov in this paper: Klimov, G. P. (1974). Time-sharing service systems I, Theory Probab. Appl., 19, 532–551.

However, you will find it easier to read something like this: On the Gittins index in the M/G/1 queue, Aalto, Ayesta and Righter, Queueing Systems, 63, 437-458, 2009, and also to see that researchers are still thinking about this problem even 35 years later!


Abstract: For an M/G/1 queue with the objective of minimizing the mean number of jobs in the system, the Gittins index rule is known to be optimal among the set of non-anticipating policies. We develop properties of the Gittins index. For a single-class queue it is known that when the service time distribution is of type Decreasing Hazard Rate (New Better than Used in Expectation), the ...

Perhaps you wondered about my calculation of a Gittins index as

         $\frac{r_i + \alpha t_i r_1}{t_i + \alpha t_i t_1}$

This is when we are trying to find the bandit of second highest priority after job class 1. In calculating its index in the undiscounted case we want to maximize the quantity (reward up to $\tau$)/$\tau$. The stopping time $\tau$ is the first time that we have again cleared the system of type 1 jobs, i.e. of all jobs of priority greater than that of type $i$ (assuming hypothetically for now that $i$ is the job type of second greatest index).

Although it is not actually needed, we can find the value of $\alpha$ as follows. The quantity $\alpha t_i$ is the expected number of type 1 jobs that must be cleared after we process a single type $i$ job, and so $\alpha t_i t_1$ is the expected length of the busy period of type 1 jobs so created.

Processing the type $i$ job takes time $t_i$. During that time suppose $M$ type 1 jobs arrive. To clear one of these, and the other type 1 jobs that arrive while doing so (and while doing those, etc.), takes an expected time $x_1$, where

$x_1 = t_1 + (\lambda_1 t_1) x_1$ 

Think about this. While we process one class 1 job, other class 1 jobs arrive, the expected number of which is $\lambda_1 t_1$. So $x_1 = t_1/(1-\lambda_1 t_1)$. So we will take time $M x_1$ to clear type 1 jobs after finishing a type $i$ job. The expected value of $M$ is $\lambda_1 t_i$. So

$\alpha = \lambda_1/(1-\lambda_1 t_1)$.

We also get a reward $r_1$ for each type 1 job cleared, and this is why the numerator has $\alpha t_i r_1$.

A similar calculation will be carried out when we have identified the job class of second highest priority, say class 2, and are now looking for the one of third highest priority. Now $\alpha$ will be determined by the time needed to clear all jobs of classes 1 and 2.
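As a tiny numerical illustration (my own made-up parameters), here is the index computation for two candidate classes, given that class 1 has already been identified as having the highest priority:

    # Index calculation for the class of second-highest priority (my sketch).
    lam1, t1, r1 = 0.3, 1.0, 10.0       # made-up class-1 parameters
    alpha = lam1 / (1 - lam1 * t1)      # class-1 jobs cleared per unit of t_i

    def index(r_i, t_i):
        cleared = alpha * t_i           # expected number of class-1 jobs cleared
        return (r_i + cleared * r1) / (t_i + cleared * t1)

    for name, (r, t) in {"class 2": (6.0, 1.0), "class 3": (8.0, 2.0)}.items():
        print(name, round(index(r, t), 3))
    # The candidate class with the larger index has the second-highest priority.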

Monday, February 11, 2013

Session 4

In today's lecture I used a portion of my slides from a talk on the Gittins Index. You may enjoy looking through the entire talk. It will look better if you download the .pdf file to your computer and read the presentation with a .pdf viewer such as Acrobat, rather than trying to read it within your browser.

I have slightly changed Problems sheet 2. The assignment for this week is to do questions 5-7.

Subsequent to today's lecture I have made some changes to sections 8.5 and 8.7, now using $C_i$ to mean the completion time of job $i$. I did not discuss Section 8.8 of the notes. This is about an application of the Gittins index theorem to a problem in which it at first does not seem to apply because the states of the bandits that are not continued are not frozen.

It is remarkable how many interesting problems can be recast as bandit problems. We saw today several ways to massage some problems into the form of bandit processes (Weitzman's problem, target and tax problems, and scheduling in an $M/G/1$ queue). In particular, we saw that the $\beta\to 1$ limit can be meaningful. You will explore this a bit in Problem 6 on Problems Sheet 2.

The proof of the Gittins index theorem is actually fairly easy. Yet it is one of the most beautiful results in the field of Markov decision processes. Its discovery and proof in 1974 are due to John Gittins. The proof I have given in today's lecture is very different to Gittins's original proof. It was first presented in my paper On the Gittins index for multiarmed bandits. Ann. Appl. Prob. 2, 1024-33, 1992.

Monday, February 4, 2013

Session 3

I started today by talking about the solution to the burglar problem. This is a subtle question and a very good one for testing your understanding. My solutions for the problems are now on the web site.

I said more about the problem with which we ended Session 2, namely, the famous unsolved problem concerned with searching for a randomly moving hidden object. But I realized later that I forgot to make the comment that this is an example of a partially observed Markov decision process (POMDP). In a POMDP the decision-maker cannot directly observe the underlying state. Instead, he must maintain a probability distribution over the set of possible states, based on his observations, and the underlying MDP. This distribution is updated using the usual Bayesian calculations. That is what we did by defining $x_i=P(\text{object is in location $i$})$.

I mentioned that this problem has not yet been solved in full generality. However, the problem has been solved in continuous time. See R. R. Weber. Optimal search for a randomly moving object. J. Appl. Prob. 23:708-717, 1986. The reason the problem is easier in continuous time is that the state $x_1$ (the probability that the object is in location 1) changes continuously, rather than in jumps.

Let me emphasize that in Session 2 (Chapter 4), we used $F_s(x)$ to mean the value function of an MDP over $s$ steps, with $F_0(x)=0$. In the case of $c(x,u)\geq 0$, $F_s(x)$ is clearly nondecreasing in $s$. By contrast, in this lecture, $F_s(x)$ was used to mean the minimum cost in a stopping problem in which we must stop within $s$ steps, and $F_0(x)=k(x)$. $F_s(x)$ is now clearly nonincreasing in $s$. That is because as $s$ increases we are being given more flexibility as to when we must stop. In both cases $F_s(x)$ satisfies an optimality equation and $F_s(x)\to$ a limit, say $F_\infty(x)$, but from different directions.

In Lecture 5 we looked at some stopping problems, which are a type of problem that crop up all over the place. Those of you working in economics and finance are almost certainly familiar with the theory of option pricing (Black-Scholes). The holder of an American option is allowed to exercise the right to buy (or sell) the underlying asset at a predetermined price at any time before or at the expiry date. The valuation of American options is essentially an optimal stopping problem.

You can read more about Bruss's odds algorithm in his paper: "Sum the odds to one and stop", Annals of Probability, Vol. 28, 1384–1391, 2000. I showed you (and have now added to the notes) how to use this to address a secretary problem in which the candidates arrive in groups, of sizes $t_1,t_2,\dotsc,t_h$. This partially addresses a problem asked by Ken Chang about secretary problems in which candidates are heterogeneous.

I decided not to talk about Example 6.2. You might like to look at that in the notes. This is a classic problem, which was addressed by some of the mathematicians who did the earliest work on optimal stopping problems.

Let me finish with some hints for the first four questions of Problems Sheet 2. Question 1 can be solved by realizing that there are only two stationary Markov policies. You simply need to decide which is best. Question 2 is a simple stopping problem that can be solved using an OSLA rule. When you do Question 3 you might think about using Bruss's odds algorithm. Although you cannot apply the algorithm directly, you should be able to solve this problem using the same scheme of proof as we used to prove Bruss's result in 6.1. Question 4 is an important one because it should convince you that it is possible to solve stochastic dynamic programming equations using linear programming (at least if both the state space and action space are finite). If there are $n$ states and $m$ actions (controls) available in each state then we will have an LP in $n$ variables, with $nm$ linear inequality constraints.
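In case it helps with Question 4, here is the shape of that LP in code (a sketch of my own, with made-up data, using scipy):

    import numpy as np
    from scipy.optimize import linprog

    # Solving a discounted-cost MDP by LP (my sketch, made-up data):
    # maximize sum_x F(x) subject to F(x) <= c(x,u) + beta sum_y P(y|x,u) F(y).
    beta = 0.9
    c = np.array([[1.0, 4.0],           # c[x, u]: 2 states, 2 actions
                  [3.0, 2.0]])
    P = np.array([[[0.8, 0.2],          # P[x, u, y]
                   [0.3, 0.7]],
                  [[0.5, 0.5],
                   [0.1, 0.9]]])

    n, m = c.shape
    A_ub, b_ub = [], []
    for x in range(n):
        for u in range(m):
            row = -beta * P[x, u]       # F(x) - beta sum_y P(y|x,u) F(y) <= c(x,u)
            row[x] += 1.0
            A_ub.append(row)
            b_ub.append(c[x, u])

    res = linprog(-np.ones(n), A_ub=np.array(A_ub), b_ub=b_ub,
                  bounds=[(None, None)] * n)
    F = res.x
    print("value function:", F.round(4))
    print("optimal actions:", (c + beta * P @ F).argmin(axis=1))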

Monday, January 28, 2013

Search for a Moving Target

In Section 4.5 I talked about a simple problem of searching for a moving target that moves back and forth randomly between two locations (e.g. think of a destroyer hunting for a submarine that moves between two depths).

In the following more general form this problem has never been solved! Suppose that it costs $c_i$ to search location $i$. If the object (or target) is in location $i$ and we search there, then we will find it with probability $\alpha_i$. If we do not find it, then the object moves according to some two-state Markov chain with known transition matrix $P=(p_{ij})$. The aim is to minimize the expected cost spent searching until we find the object.

Obviously we can always compute $p_t$, the probability that the object is in location 1, given the searches we have made so far without finding it. Everyone believes that the optimal strategy is of the form "search location 1 iff $p_t\geq p^*$", for some $p^*$. The number $p^*$ will depend on all the parameters of the problem.

The optimality of this form of policy has been proved for the special case of $c_1=c_2$. However, it remains unproved for general $c_1,c_2$.
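For concreteness, here is the posterior recursion in code (a sketch of my own, with made-up numbers), implementing the Bayesian update followed by the Markov move:

    # Posterior update for the moving-target search (my sketch).
    # p = P(object in location 1). An unsuccessful search of location 1 gives
    # p <- p(1 - a1)/(1 - p a1) by Bayes; of location 2, p <- p/(1 - (1-p) a2).
    # Then the object moves according to the transition matrix P.
    def update(p, box, a1, a2, P):
        if box == 1:
            p = p * (1 - a1) / (1 - p * a1)
        else:
            p = p / (1 - (1 - p) * a2)
        return p * P[0][0] + (1 - p) * P[1][0]

    P = [[0.8, 0.2], [0.3, 0.7]]        # made-up transition matrix
    p, a1, a2 = 0.5, 0.4, 0.4
    for t in range(5):
        box = 1 if p >= 0.5 else 2      # a threshold rule, with p* = 0.5 assumed
        p = update(p, box, a1, a2, P)
        print(t, box, round(p, 3))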

Problems 1-4

I discussed the solutions to Problems 1-4. The text of my model solutions is now on the web site, under the "Resources" heading in the right hand column.

These are fairly straightforward questions. I think that one of the most important things at this stage is to become comfortable with formulating optimality equations. We've now seen quite a lot of examples. Some have been a little non-standard, like the multiplying matrices question, and job scheduling. 

Next time we will discuss Problems 5-8, and I will hand out Problems sheet 2.

Session 2

I started today by showing you the centipede game. The analysis of this game is by a backward induction, similar to that which we use in finite-horizon dynamic programming.

In this class we had an important theorem (Theorem 3.1). It had an easy but nontrivial proof. Theorem 3.1 states that $F(x)$ satisfies a DP equation (3.7). It holds under the assumption of D (discounted), N (negative) or P (positive) programming. I explained what these three assumptions mean.

I gave an example, $c(x,u) = x$, $a(x,u) = -x$, $\beta = 1$, for which $F(x)$ does not exist. Note that a problem with this data fails to satisfy D, N or P.

The example in Section 3.5 (of selling a tulip bulb collection) is very much like the secretary problem in Section 2.3. The difference is that now (i) we observe values (not just ranks), (ii) we wish to maximize the expected value of the selected candidate (rather than the probability of choosing the best), and (iii) the number of offers is infinite, but with a discount factor $\beta$.

One important way in which discounting by a factor $\beta$ can naturally arise is when a catastrophe can occur, with probability $1-\beta$, bringing the problem to an end. We discussed the fact that the policy does not change if offers remain open.

In Section 4.3 (optimal gambling) we saw that timid play is optimal in the gambling problem when the game is favorable to the gambler ($p \geq 0.5$).

Consider this question: "A gambler has £2 and needs to increase it to £10 in a hurry. The gambler decides to use a bold strategy in which he stakes all his money if he has £5 or less, and otherwise stakes just enough to increase his capital, if he wins, to £10."

When $p \leq 0.5$ bold play maximizes the probability that the gambler reaches $N$ pounds. However, bold play may not be uniquely optimal.

When it was first discovered that bold play is not uniquely optimal, it was contrary to the accepted wisdom of the time (1955). It impressed Leonard Savage when it was presented to him by a Ph.D. student named Lester Dubins. They developed a collaboration culminating in the famous monograph How to Gamble if You Must (Inequalities for Stochastic Processes). See the fourth paragraph of this obituary of Lester Dubins.

If $p=0.5$ all strategies are optimal. How could we prove that? Easy. Simply show that given any policy $\pi$ the value function is $F(\pi,i)=i/N$ and that this satisfies the DP equation. Then apply Theorem 4.1. You can read more about these so-called red and black games at the Virtual Laboratories in Probability and Statistics.
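Here is how such a check looks numerically (a sketch of my own, via iteration of the DP equation):

    import numpy as np

    # Win probabilities in the red-and-black game (my sketch): from capital i,
    # stake s(i), reach N before 0, winning each bet with probability p.
    def win_probs(p, N, stake):
        F = np.zeros(N + 1)
        F[N] = 1.0
        for _ in range(5000):           # iterate F(i) = p F(i+s) + (1-p) F(i-s)
            for i in range(1, N):
                s = stake(i)
                F[i] = p * F[i + s] + (1 - p) * F[i - s]
        return F

    N, p = 10, 0.4
    timid = win_probs(p, N, lambda i: 1)
    bold = win_probs(p, N, lambda i: min(i, N - i))  # all, or enough to reach N
    print(round(timid[2], 4), round(bold[2], 4))     # bold is far better, p < 0.5
    print(win_probs(0.5, N, lambda i: 1).round(3))   # p = 0.5: F(i) = i/N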

In Section 4.5 (pharmaceutical trials) I introduced an important class of very practical problems. One obvious generalization is to a problem with $k$ drugs, each of which has a different unknown probability of success, and about which we will learn as we make trials of the drugs. This is called a multi-armed bandit problem. The name comes from thinking about a gaming machine (or fruit machine) having $k$ arms that you might pull, one at a time. The arms have different unknown probabilities of generating payouts. In today's lecture we considered a special case of the two-armed bandit problem in which one of the arms has a known success probability, $p$, and the second arm has an unknown success probability, $\theta$. I will say more about these problems in later classes. The table on page 18 was computed by value iteration. This table is from the book Multi-armed Bandit Allocation Indices (of which I am one of the authors). The amazon.co.uk page will let you look inside this book.

Monday, January 21, 2013

Find the greatest

I mentioned today a problem of minimizing the expected number of questions required to identify the greatest of $X_1,\dotsc,X_n$, where these are i.i.d. $U[0,1]$.

We pick a set $A(1) \subset [0, 1]$ and ask which $X_i \in A(1)$. Depending on the response, we pick another subset $A(2)$ and ask which $X_i \in A(2)$, and so on, until we identify the largest $X_i$.

It turns out that it is optimal to ask questions only of the special form "Which $X_i$ are bigger than $a(1)$?", "Which $X_i$ are bigger than $a(2)$?", etc.

If you would like to see how this is proved, the paper is An optimal strategy for a conflict resolution problem, by V. Anantharam and P. Varaiya, Systems and Control Letters, 1986, pp. 329–332.

A fact about this problem which is initially surprising is that the minimum expected number of questions required remains bounded as $n\to\infty$. Just make the first question of the form "Which $X_i$ are bigger than $n/(n+1)$?"
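To see why this works (my gloss, not the argument of the paper): the number of $X_i$ exceeding $n/(n+1)$ is Binomial$(n,\frac{1}{n+1})$, which for large $n$ is approximately Poisson with mean $\frac{n}{n+1}\approx 1$. So one question typically reduces the field to about one candidate, however large $n$ is; and if no $X_i$ exceeds the threshold (probability about $e^{-1}$) we have lost little and can simply ask again with a lower threshold.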

Problems sheet 1

I handed out Problems sheet 1 this morning and as Assignment 1 I have asked students who are doing the course for credit to do Problems 1-4 this week. Of course I hope other participants will also enjoy the questions. I have chosen the questions quite carefully, as ones that are pedagogically helpful to do.

You may hand-write your solutions, or write them up in LaTeX. It is essential to know LaTeX if you are doing anything at all mathematical. Almost all research papers in MS/OR are prepared that way. So this might be a good opportunity to practise if you have not done so already.

If you are able, you can scan your pages, or prepare a .pdf file that can be emailed to me. I do not insist on this, but if you can email your work to me on Sunday afternoon, then I can probably return it to you at our class on Monday.

Next time I will talk briefly about the solutions to problems 1-4. Some people may want to also do Problems 5-8. Problem #7 needs a special theorem that will come next time. You may, however, be able to guess its answer.

Session 1

I will be using this blog space to talk about some extra things. Sometimes upon leaving a lecture I think, "I wish I had said ...". This blog gives me a place to say it. Or I can use this space to talk about a question that a student has asked during a lecture.

There is a commenting space after each post for you to write comments or ask questions using the IntenseDebate commenting system. Please do not be shy to do so, as other students will probably benefit.

Please notice that I am writing this blog using LaTeX, which is interpreted into a pretty form by MathJax. I discovered MathJax a short while ago, and am impressed how well it works in a web page. However, your browser must be running Javascript for it to work.

In the first session I gave you definitions and notation for talking about state, control, history, value function, etc, and in which to develop dynamic programming equations for a very general case, a state-structured case, and the Markov decision process case. Please be patient with the notation. It is not as complex as it may first appear. Things like $a(x,u,t)$, $F(x_t,t)$, $u_t$, and $U_t$ will become old friends once you have used them a few times.

From this first session you should be taking away the key idea of dynamic programming, and the fact that problems in stages (with separable cost and a finite time horizon) can often be solved by working backwards from the final stage. The minimum length path (stage coach) problem is trivial, but should have made these ideas very intuitive.
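If you would like to see that backwards induction in its simplest possible form, here is a sketch of my own (with a made-up stage graph):

    # Backwards induction on a tiny minimum-length path problem (my sketch).
    # F(x) = min over successors y of [ d(x, y) + F(y) ], with F(D) = 0.
    arcs = {                            # made-up arc lengths, in stages
        "A": {"B1": 2, "B2": 4},
        "B1": {"C1": 7, "C2": 3},
        "B2": {"C1": 1, "C2": 6},
        "C1": {"D": 5},
        "C2": {"D": 4},
    }

    F, choice = {"D": 0}, {}
    for x in ["C1", "C2", "B1", "B2", "A"]:   # work backwards from the end
        choice[x], F[x] = min(((y, d + F[y]) for y, d in arcs[x].items()),
                              key=lambda pair: pair[1])

    path = ["A"]
    while path[-1] in choice:
        path.append(choice[path[-1]])
    print(F["A"], " -> ".join(path))          # 9  A -> B1 -> C2 -> D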

As an example of a problem in which the optimal policy should be determined by working backwards from the end, I mentioned that I had once appeared on ITV's Who Wants to be a Millionaire (October 2003). I once wrote a nice Cambridge exam question on this, including the model solution. You might like to look at this now – simply to see the sort of mathematical problem that this course will enable you to solve. You can also view the overhead slides for a little presentation called A Mathematician Plays "Who Wants to Be a Millionaire?" which I once gave to a Royal Institution workshop for school children. You might like to see how I made best use of my "Ask the audience" lifeline by employing an idea from statistics. Basically, I asked members of the audience not to vote if they felt at all unsure of the right answer.

In the second hour today I discussed three examples, the last of which was the Secretary Problem. This is the most famous problem in the field of optimal stopping. It is credited to Merrill M. Flood in 1949, who called it the fiancée problem. It gained wider publicity when it appeared in Martin Gardner's column of Scientific American in 1960. There is an interesting Wikipedia article about it. One of the interesting things said in this article is that in behavioural studies people tend to stop too soon (i.e. marry too soon, make a purchase too soon). 
See The devil is in the details: Incorrect intuitions in optimal search.

The optimal solution to the secretary problem can also be found as a special case application of Bruss's odds algorithm, due to Thomas Bruss.

Variations of the secretary problem can be created by changing the assumptions about whether or not, in addition to relative ranks, there are values $X_1,\dotsc,X_h$ that can be observed, and whether the objective is to maximize the probability of choosing the best candidate, to minimize the expected rank, or to maximize the expected value of the candidate selected.

A variation that has never been completely solved is the so-called Robbins' Problem. In this problem we do observe values of candidates, say $X_1,\dotsc, X_h$, and these are assumed to be independent, identically distributed uniform$[0,1]$ random variables. The objective is to minimize the expected rank of the candidate that is selected (best = rank 1, second-best = rank 2, etc). It is known only that, as $h$ goes to infinity, the expected rank that can be achieved under an optimal policy lies between 1.908 and 2.329. This problem is much more difficult than the usual secretary problem because the decision as to whether or not to hire candidate $t$ must depend upon all the values of $X_1,\dotsc, X_t$, not just upon how $X_t$ ranks amongst them.

At the end of the class today one of you asked me if there are any interesting problems in which candidates are not homogeneous. I have not thought about this before, but it sounds like it could be interesting. We might suppose, for example, that there are 8 candidates. Their ranks are determined by the ranks of some hidden i.i.d. $N(\mu_i,1)$ random variables, where the $\mu_i$ are known to us. At each stage we must choose which candidate we want to interview next. Clearly we should be able to do better than with 8 homogeneous candidates ($\mu_1=\cdots=\mu_8$). I wonder, how much better, and how? Is it better to start off by seeing a strong candidate (large $\mu_i$) or a weaker candidate (small $\mu_i$)? Of course this is just one possible way to model non-homogeneous candidates. What other models might we make? As MS/OR people we find it natural to think about tweaking models until we find one that is both solvable and interesting.

Course starts January 21, 2013

Our sessions will be in Room ALH-304
  1. Monday 21 January,  08.00 – 10.40. 
  2. Monday 28 January, 08.00 – 10.40.
  3. Monday 4 February, 08.00 – 10.40.
  4. Monday 11 February, 08.00 – 10.40.
  5. Friday 15 February, 15.45 – 18.30.
  6. Monday 18 February, 08.00 – 10.40.
  7. Monday 25 February, 08.00 – 10.40.
  8. Monday 4 March, 08.00 – 10.40.
  9. Monday 11 March, 08.00 – 10.40.
  10. Friday 15 March, 15.45 – 18.30.

    There is an extra session in reserve, if needed:
  11. Monday 18 March, 08.00 – 10.40.

    and
  12. Friday 22 March, 15.45 – 18.30, FINAL EXAM