Friday, March 15, 2013

Session 10

I have only been able to say a few things about Brownian motion and the Wiener process. The Wikipedia articles referenced in the previous sentence are good and contain some nice video demonstrations of things like self-similar scaling. I reviewed only the very basics, but enough for us to develop a simple stochastic differential equation that corresponds to the optimality equation of dynamic programming in the case of continuous time and a continuous state space. Although Brownian motion is a subtle thing, one can infer (or at least guess) many of its properties by thinking about the simple discrete-time symmetric random walk, of which it is a limit.
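You can see this limit numerically. Here is a minimal sketch of my own (not from the notes): scale an $n$-step symmetric random walk in space by $1/\sqrt{n}$ and in time by $1/n$, and the endpoint behaves like $B(1)$, with mean 0 and variance 1.

```python
import random

# A symmetric +/-1 random walk of n steps, with space scaled by 1/sqrt(n)
# and time by 1/n, approximates standard Brownian motion on [0, 1].
def scaled_walk(n, rng):
    s, path = 0, [0.0]
    for _ in range(n):
        s += rng.choice((-1, 1))
        path.append(s / n**0.5)
    return path

rng = random.Random(1)
# The endpoint approximates B(1), which has mean 0 and variance 1.
endpoints = [scaled_walk(500, rng)[-1] for _ in range(2000)]
var = sum(e * e for e in endpoints) / len(endpoints)
print(round(var, 2))  # close to 1
```

The same scaling, applied to the whole path, is what produces the self-similar pictures in the videos.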

There are many interesting things to be said about Brownian motion. Paul Lévy's Forgery Theorem for Brownian motion implies that in two dimensions almost surely every arc of the path contains the complete works of Shakespeare in the handwriting of Bacon.

I mentioned in the lecture that it is initially surprising to see that in Example 19.4 the cost is greatest when there is no noise. This is a little bit counterintuitive and so we should try to understand why it is. I think it is because $C_0$ and $C_1$ are almost the same, and $L$ is small compared to $Q$, so we can let the noise take us to the endpoints while expending only minimal effort (which is costed at $Qu^2$). If $C_0$ were a lot less than $C_1$, or if $L$ were bigger compared to $Q$, then noise would not be helpful and we would expect Figure 4 to look different.

The last part of today's session was about Risk-sensitive optimal control. I have corrected the small sign errors that were in the notes. If you would like to learn more about this then I recommend this paper:

P. Whittle. Risk-sensitivity, large deviations and stochastic control, European Journal of Operational Research 73 (1994) 295-303.

I think it is the easiest one to start with. There are other references given at the end of this paper.

Thursday, March 14, 2013

Proof by Genie, continued

I am interested in collecting more examples of Proof by Genie, as mentioned in a previous post. If you come across an example of this, please let me know.

I've now remembered another example of proof by genie, actually from one of my own recent papers. Two people are trying to find each other on the vertices of a triangle. Initially they start at two different vertices, and at each step $t=1,2,\dotsc$ each player either stays put or moves to another vertex, trying to minimize the expected number of steps until they meet. In working out the best symmetric rendezvous strategy it helps to assume that the players are given a common notion of what is "clockwise" on the triangle. However, once the optimal strategy is found, it turns out that a common notion of clockwise is not needed to implement it. If you would like to learn more about this problem see


R. R. Weber, Optimal symmetric rendezvous search on three locations, Math Oper Res., 37(1): 111-122, 2012.

or seminar slides. One of the slides begins with the words "Suppose the problem is made a bit easier: the players are provided with the same notion of clockwise. (We will see later that this does not actually help the players.)"

Final exam

The course final exam has been rescheduled to Thursday April 25, in ALH304, from 09:30 to 12:30.

Tuesday, March 12, 2013

Broom balancing in the 21st century

Have a look at this link to see what optimal control can do when it is coupled to today's fast microprocessors and feedback controllers:

Quadrocopter pole acrobatics

Thanks to Felix Horns for finding this link.

Monday, March 11, 2013

Final homework

The final homework is Problems Sheet 4, Questions 4, 5, 7 and 8. (We have already done 6.) In all these questions the strategy is to use the necessary conditions of PMP to synthesize the optimal solution. What I mean by "synthesize" is that you write down all the information that PMP says is necessary (in terms of $u(t)$ maximizing $H(x,u,\lambda)$, $\dot \lambda=-\partial H/\partial x$, and any boundary and transversality conditions) until you convince yourself that the optimal policy can only be one thing. Question 5 is a bit like proving that the shortest distance between two points is a straight line. The answer is fairly obvious, but we can prove it is correct by applying the necessary conditions of PMP to rule out the possibility of any other solution.
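For reference, here is a sketch of the conditions to write down, using the convention of a plant equation $\dot x = a(x,u)$ and instantaneous cost $c(x,u)$ (do check the signs against your own notes):

```latex
% PMP necessary conditions along an optimal trajectory (sketch)
\begin{align*}
H(x,u,\lambda) &= \lambda^{\top} a(x,u) - c(x,u) && \text{(Hamiltonian)}\\
u(t) &\in \operatorname*{arg\,max}_u \, H(x(t),u,\lambda(t)) && \text{(maximum condition)}\\
\dot\lambda &= -\partial H/\partial x && \text{(adjoint equation)}
\end{align*}
```

together with any boundary and transversality conditions on $\lambda$.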

Please send the solutions to me by next Monday.

Session 9

The maximum principle that I talked about today is due to Lev Pontryagin. It is remarkable that despite being blind he was one of the greatest mathematicians of his generation. The key thing to grasp is that the PMP provides necessary conditions. We use the fact that an adjoint trajectory $\lambda$ exists to deduce properties of, or completely determine, the optimal control and optimally controlled trajectory. To my thinking, the PMP is notoriously badly explained in most books in which it appears. I hope I have been able to make it seem more intuitive. I think that Lagrange multipliers give a helpful interpretation, as does differentiation of the infinitesimal version of the optimality equation.

The rocket car example is a celebrated problem of optimal control theory that is nicely solved using the PMP. There is a nice interactive demo of it that you can try. I am pleased to have finally tracked down the fact that this problem was first solved by D. W. Bushaw, Differential Equations with a Discontinuous Forcing Term, Ph.D. Thesis, Princeton, 1952.
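If you want to experiment yourself, here is a small simulation of the bang-bang solution. This is a sketch under the usual formulation $\ddot x = u$, $|u|\leq 1$, minimizing the time to bring the car to rest at the origin; the switching curve $x = -v|v|/2$ is the standard one for this problem.

```python
# Time-optimal "rocket car": bring x'' = u, |u| <= 1, to rest at the origin.
# PMP gives bang-bang control with at most one switch, determined by the
# switching curve x = -v|v|/2.
def bang_bang_u(x, v):
    s = x + 0.5 * v * abs(v)   # s > 0: state lies above the switching curve
    if s > 1e-9:
        return -1.0            # full braking
    if s < -1e-9:
        return 1.0             # full thrust
    return -1.0 if v > 0 else 1.0  # on the curve: drive along it to the origin

dt, t, x, v = 1e-4, 0.0, 1.0, 0.0   # start at rest, one unit from the origin
while (abs(x) > 1e-3 or abs(v) > 1e-3) and t < 5.0:
    u = bang_bang_u(x, v)
    x, v, t = x + v * dt, v + u * dt, t + dt

print(round(t, 2))  # theory: minimum time from (1, 0) is 2*sqrt(1) = 2
```

Starting from $(x,v)=(1,0)$ the control is $u=-1$ until the trajectory hits the switching curve at $(1/2,-1)$, then $u=+1$ until the origin, for a total time of 2.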

In the obituary of Donald W. Bushaw (1926-2012) it is stated that "Don’s Ph.D. thesis is recognized as the beginning of modern optimal control theory."

The name of A. A. Feldbaum (a Russian) is also mentioned in connection with this problem, which he solved at about the same time. Pontryagin came up with his maximum principle a few years later, in 1956.

The final example today illustrated turnpike theory. This is an important concept in economics, concerned with the optimal path of capital accumulation. Can you see intuitively why the optimal turnpike does not depend on the utility function $g$ of the agent? The label "turnpike" is due to Dorfman, Samuelson and Solow. I have mentioned it in the blog post What makes for a beautiful problem in Science?

My personal experience of the power of PMP came in solving the problem about searching for a moving target that I have previously mentioned.

R. R. Weber. Optimal search for a randomly moving object. J. Appl. Prob. 23:708-717, 1986.

The problem that is solved in this paper in continuous time is still an open problem in discrete time. That shows the power of PMP: with it we can solve a problem in continuous time whose discrete-time analogue remains unsolved.

Friday, March 8, 2013

What makes for a beautiful problem in Science?

In a paper with the title What makes for a beautiful problem in Science? (Journal of Political Economy, Vol. 78, No. 6, 1970), Paul Samuelson writes:

What is the shortest distance between two points? A straight line. Can you prove it?

What is the longest distance between two points? Surely, no answer is possible. But consider two points a mile apart on a horizontal line in the plane. Suppose that the admissible paths cannot have an absolute slope exceeding one million. Then it is not (sic) hard to envisage that the longest distance is a bit over a million miles. But it takes the methods of Pontryagin and not those of the classical theory to handle such a problem in which the control variables are constrained within a closed set.

Following Session 9 you might like to see if you can use Pontryagin's maximum principle to prove what Samuelson claims. You must start by deciding what he means by an admissible path. I expect he is thinking about a curve between $(0,0)$ and $(1,0)$ that is a graph $y(x)$ determined by $y'(x)=u(x)$, $0\leq x \leq 1$. I have added (sic) (for sic erat scriptum) because I think the word "not" should be deleted. He is trying to say that the maximum distance is a bit over a million miles, in fact $\sqrt{10^{12}+1}$. Samuelson goes on:
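Samuelson's number is easy to check. Any admissible path that uses the maximal slope everywhere (a sawtooth with slopes $\pm 10^6$) has length $\int_0^1 \sqrt{1+y'(x)^2}\,dx = \sqrt{1+10^{12}}$, since the integrand is constant:

```python
import math

M = 10**6  # bound on the admissible |slope|
# A sawtooth from (0,0) to (1,0) with |y'(x)| = M everywhere has length
# integral of sqrt(1 + y'(x)^2) dx = sqrt(1 + M^2), whatever the zigzag pattern.
longest = math.sqrt(1 + M**2)
print(longest - M)  # the "bit over a million miles": about 1/(2M) = 5e-7
```

So the longest path exceeds a million miles by only about half a micro-mile.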

A scientific problem gains in aesthetic interest if you can explain it to your neighbor at a dinner party. 

That no map requires more than four colors is such a problem. Can the turnpikes of Professor Chakravarty pass this test? In a system in which everything produces everything, a critical set of proportions at which balanced growth is most rapid seems reasonable. So the general notion of the von Neumann turnpike is conveyed. But suppose we begin away from this highway, and also are to end up away from it. Still, if the journey is to be a sufficiently long one, as seen we shall do well to move in the beginning toward that fast highway; catch a fast ride along it, staying so long there as to leave behind in all directions any rival who sticks to any slower path; then, toward the end of our journey, we can swing off the turnpike to our final rendezvous.

We will see this in Session 9 (the turnpike of economic growth, Section 15.6).

I cannot resist one more excerpt from Samuelson's paper, in which he muses on the way in which an optimal path may sometimes seem strange.

Begin with durable hammers and robots that can make new hammers or robots with the same symmetrical input requirements for the one as for the other. Each input is subject to familiar diminishing returns; but increasing both inputs proportionally leads to constant returns to scale. I begin with 10 robots and 5 hammers and must end up in minimum time with (at least) 1,000 robots and 5,000 hammers. How should I proceed? Surely, I begin by producing only hammers, since the redundant robots are subject to diminishing returns. When I reach 10 of each input, I shift gears to produce equal amounts of both, proceeding along what is known as a von Neumann-DOSSO turnpike. You might think I go along it until 1,000 of each are attained, thereafter producing only the 4,000 more hammers. And you might be right. But you might be wrong. Depending upon the exact technology, it might pay to produce more of both along the turnpike, ceasing to produce more robots only after they are somewhere between 1,000 and 5,000 in number. We end up with more than we needed of robots. Irrational, since we don't want them and they cost something? Not necessarily, since they may have been worthwhile for their help in producing the extra hammers we do want.

Monday, March 4, 2013

Kalman Filter in economics and finance

To follow up on a question this morning from Yiangos: I asked my colleague Lord Eatwell, a famous economist and President of my college, "Is the Kalman filter much used in economics?" He immediately replied, "Yes, all the time".

Eatwell is one of the compilers of the New Palgrave Dictionary of Economics. I searched in Palgrave for the Kalman filter, and read:

The Kalman filter

The Kalman filter deals with state-space representations where the transition and measurement equations are linear and where the shocks to the system are Gaussian. The procedure was developed by Kalman (1960) to transform (‘filter’) some original observables $y_t$ into Wold innovations $a_t$ and estimates of the state $x_t$. With the innovations, we can build the likelihood function of the dynamic model. With the estimates of the states, we can forecast and smooth the stochastic process.

The use of unobserved components opens up a new range of possibilities for economic modelling. Furthermore, it provides insights and a unified approach to many other problems. The examples below give a flavour.

The local linear trend model generalizes (1) by the introduction of a stochastic slope, $\beta_t$, which itself follows a random walk. Thus ...

I recommend the New Palgrave as a good place to go if you need a short article on any economics topic. My short answer to Yiangos's question is that econometricians are always building linear models in which some variables are explained by other variables (in some sort of regression model). As the values of variables become known over time, one wants to update estimates of other variables. The machinery for doing this is provided by the Kalman filter. Notice that the Kalman filter does not have anything to do with the Quadratic cost assumption of our LQG model; only the Linear and Gaussian parts are relevant.

I tried a Google search for "Kalman filter and finance". Many things turn up. For example, here is a talk, Kalman Filtering in Mathematical Finance.

Session 8

I have added to the notes, on page 77, the argument by which we can verify that in the fishing problem the value function $F(x)$ is concave.

The homework for next week is just three questions: the first three on Problems Sheet 4. See hints below.

The name "Kalman filter" refers to the estimation equation (16.1) and takes its name from Rudolf Kalman (1930–), who developed it in the years 1958–64. He also coined the terms controllable and observable, and gave the criteria that we have seen in previous lectures. The fact that a system is controllable iff the matrix $[B\ AB\ \cdots\ A^{n-1}B]$ is of full rank is sometimes called Kalman's criterion. In the IEEE biography of Kalman it is stated:
The Kalman filter, and its later extensions to nonlinear problems, represents perhaps the most widely applied by-product of modern control theory. It has been used in space vehicle navigation and control (e.g. the Apollo vehicle), radar tracking algorithms for ABM applications, process control, and socioeconomic systems.
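The rank criterion mentioned above is easy to check numerically. Here is a minimal sketch (the example matrices are my own, chosen for illustration):

```python
import numpy as np

# Kalman's rank criterion: x' = Ax + Bu is controllable iff the matrix
# [B  AB  ...  A^{n-1}B] has full rank n.
def controllable(A, B):
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return bool(np.linalg.matrix_rank(np.hstack(blocks)) == n)

# Double integrator (force controls velocity, velocity controls position):
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
print(controllable(A, B))   # True

# Two decoupled modes with the input entering only the first:
A2 = np.diag([1.0, 2.0])
B2 = np.array([[1.0], [0.0]])
print(controllable(A2, B2))  # False
```

In the second example the input cannot influence the second state at all, so no control can steer it, and the criterion detects this as a rank deficiency.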
The theory in this lecture is admittedly quite tricky, partly because of the notation. As a test of memory, can you say what roles in the theory are taken by each of these?

 $x_t$, $\hat x_t$, $y_t$, $u_t$, $\epsilon_t$, $\eta_t$, $\Delta_t$, $\xi_t$, $\zeta_t$, $A$, $B$, $C$, $R$, $S$, $Q$, $K_t$, $\Pi_t$, $N$, $L$, $M$, $H_t$,  $V_t$. 

You will understand the ideas better once you have worked through the details of a scalar example (in which $n=m=p=1$). You do this in Problems Sheet 4, Question 1. When you do this question, start by supposing that $\hat x_t=\hat x_{t-1}+u_{t-1}+h_t(y_t-\hat x_{t-1})$, and then find the value of $h_t$ that minimizes the variance of $\hat x_t$. Subtract $x_t=x_{t-1}+u_{t-1}+3\epsilon_t$ and use $y_t=x_{t-1}+2\eta_t$. I also think that Question 2 is helpful in gaining an appreciation of the duality between control and estimation problems. One of the things you will learn from this question is a sufficient condition for the variance of $\hat x_t$ (that is, $V_t$) to tend to a finite limit. Question 3 is about solving a HJB equation (easier than the fishing example).
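Carrying out that calculation (my own algebra, assuming $\epsilon_t$ and $\eta_t$ have unit variance, so do check it against your solution) gives $h_t = V_{t-1}/(V_{t-1}+4)$ and the recursion $V_t = 4V_{t-1}/(V_{t-1}+4)+9$, whose limit you can explore numerically:

```python
# Variance recursion for the scalar example (derived by hand; check it):
#   h_t = V_{t-1} / (V_{t-1} + 4),   V_t = 4 V_{t-1} / (V_{t-1} + 4) + 9.
V = 1.0  # an arbitrary V_0; the limit does not depend on it
for t in range(20):
    h = V / (V + 4)
    V = 4 * V / (V + 4) + 9
print(V, h)  # V tends to the positive root of V^2 - 9V - 36 = 0, i.e. 12
```

At the fixed point the gain settles at $h = 12/16 = 3/4$, which is the sense in which the variance of $\hat x_t$ tends to a finite limit.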

Notice that the Riccati equation for $V_t$, i.e. $V_t = g\, V_{t-1}$ runs in the opposite time direction to the one we had for $\Pi_t$ in Chapter 13, where $\Pi_{t-1} = f\, \Pi_t$. We are given $V_0$ and $\Pi_h$.