Monday, February 4, 2013

Session 3

I started today by talking about the solution to the burglar problem. This is a subtle question and a very good one for testing your understanding. My solutions for the problems are now on the web site.

I said more about the problem with which we ended Session 2, namely, the famous unsolved problem of searching for a randomly moving hidden object. I realized later that I forgot to comment that this is an example of a partially observed Markov decision process (POMDP). In a POMDP the decision-maker cannot directly observe the underlying state. Instead, he must maintain a probability distribution over the set of possible states, based on his observations and the underlying MDP. This distribution is updated using the usual Bayesian calculations. That is what we did by defining $x_i=P(\text{object is in location $i$})$.
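To make the Bayesian update concrete, here is a minimal sketch in Python. It is my own toy illustration, not from the notes: the motion matrix `P`, the detection probabilities `alpha`, and the greedy search rule are all illustrative assumptions.

```python
import numpy as np

# Illustrative two-location model (all numbers are assumptions).
P = np.array([[0.8, 0.2],      # P[i, j] = P(object moves from i to j)
              [0.3, 0.7]])
alpha = np.array([0.9, 0.9])   # alpha[i] = P(detect | search i, object in i)

def update_belief(x, i):
    """Belief over locations after an unsuccessful search of location i,
    followed by one step of the object's motion."""
    x = np.array(x, dtype=float)
    x[i] *= 1 - alpha[i]       # Bayes: condition on not finding it in i
    x /= x.sum()               # renormalize
    return x @ P               # the object then moves according to P

x = np.array([0.5, 0.5])       # prior: x_i = P(object is in location i)
for _ in range(3):
    x = update_belief(x, i=int(np.argmax(x)))   # search the likelier location
    print(x)
```

Note that the searcher's belief $x$ is exactly the state of the equivalent fully observed MDP.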

I mentioned that this problem has not yet been solved in full generality. However, the problem has been solved in continuous time. See R. R. Weber. Optimal search for a randomly moving object. J. Appl. Prob. 23:708–717, 1986. The reason the problem is easier in continuous time is that the state $x_1$ (the probability that the object is in location 1) changes continuously, rather than in jumps.

Let me emphasize that in Session 2 (Chapter 4) we used $F_s(x)$ to mean the value function of an MDP over $s$ steps, with $F_0(x)=0$. In the case of $c(x,u)\geq 0$, $F_s(x)$ is clearly nondecreasing in $s$. By contrast, in this lecture $F_s(x)$ was used to mean the minimum cost in a stopping problem in which we must stop within $s$ steps, and $F_0(x)=k(x)$. Now $F_s(x)$ is clearly nonincreasing in $s$, because as $s$ increases we are given more flexibility as to when we must stop. In both cases $F_s(x)$ satisfies an optimality equation and tends to a limit, say $F_\infty(x)$, but from opposite directions.
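Here is a toy numerical check of the stopping-problem direction; the states, costs, and random-walk dynamics are my own illustrative choices, not from the notes. Starting from $F_0=k$ and iterating the optimality equation, each iterate lies pointwise below the previous one (the MDP case, started from $F_0=0$, is monotone the other way).

```python
import numpy as np

N = 6
k = np.array([5.0, 3.0, 4.0, 1.0, 6.0, 2.0])   # illustrative stopping costs
c = 0.5                                         # per-step continuation cost

def EF(F):
    """Expected next value under a symmetric walk, reflecting at the ends."""
    i = np.arange(N)
    return 0.5 * (F[np.maximum(i - 1, 0)] + F[np.minimum(i + 1, N - 1)])

F = k.copy()                               # F_0(x) = k(x): must stop at once
for s in range(1, 6):
    F_new = np.minimum(k, c + EF(F))       # F_s = min{k, c + E F_{s-1}}
    assert np.all(F_new <= F + 1e-12)      # nonincreasing in s
    F = F_new
    print(s, F.round(3))
```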

In Lecture 5 we looked at some stopping problems, a type of problem that crops up all over the place. Those of you working in economics and finance are almost certainly familiar with the theory of option pricing (Black-Scholes). The holder of an American option may exercise the right to buy (or sell) the underlying asset at a predetermined price at any time before or at the expiry date. The valuation of American options is essentially an optimal stopping problem.
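To see the connection, here is a minimal sketch of valuing an American put by backward induction on a binomial (Cox-Ross-Rubinstein) tree. The parameters are illustrative, and this is the standard textbook scheme rather than anything specific to our notes. At each node the option value is the maximum of the immediate exercise payoff and the discounted expected continuation value, which is exactly the optimality equation of a finite-horizon stopping problem (with max in place of min, since the holder maximizes value).

```python
import math

def american_put(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0, n=200):
    """Value an American put on an n-step binomial tree (illustrative)."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))    # up factor
    d = 1 / u                              # down factor
    p = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up probability
    disc = math.exp(-r * dt)

    # At expiry we must "stop": value = exercise payoff.
    V = [max(K - S0 * u**j * d**(n - j), 0.0) for j in range(n + 1)]

    # Backward induction: value = max(exercise now, continue).
    for i in range(n - 1, -1, -1):
        for j in range(i + 1):
            cont = disc * (p * V[j + 1] + (1 - p) * V[j])
            exercise = max(K - S0 * u**j * d**(i - j), 0.0)
            V[j] = max(exercise, cont)
    return V[0]

print(american_put())   # roughly 6.09 with these illustrative parameters
```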

You can read more about Bruss's odds algorithm in his paper: F. T. Bruss. Sum the odds to one and stop. Annals of Probability, 28:1384–1391, 2000. I showed you (and have now added to the notes) how to use this to address a Secretary Problem in which the candidates arrive in groups, of sizes $t_1,t_2,\dotsc,t_h$. This partially addresses a problem asked by Ken Chang about secretary problems in which candidates are heterogeneous.
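Here is a minimal sketch of the algorithm itself (my own implementation of the rule in Bruss's paper): given success probabilities $p_1,\dotsc,p_n$ of independent events, sum the odds $r_k=p_k/(1-p_k)$ from the back until they first reach 1, and from that index onwards stop on the first success.

```python
def odds_algorithm(p):
    """Return the threshold index s (1-based) and the probability of
    stopping on the last success, per Bruss's odds theorem."""
    n = len(p)
    odds_sum, s = 0.0, 1
    for k in range(n - 1, -1, -1):         # sum the odds backwards...
        if p[k] >= 1.0:                    # infinite odds: threshold is here
            s = k + 1
            break
        odds_sum += p[k] / (1 - p[k])
        if odds_sum >= 1:                  # ...to one, and stop
            s = k + 1
            break
    q_prod, r_sum = 1.0, 0.0               # win prob = (prod q_k)(sum r_k), k >= s
    for k in range(s - 1, n):
        q_prod *= 1 - p[k]
        r_sum += p[k] / (1 - p[k])
    return s, q_prod * r_sum

# Classical secretary problem: the k-th candidate is the best so far
# with probability p_k = 1/k, so the odds are r_k = 1/(k-1).
n = 100
print(odds_algorithm([1.0 / k for k in range(1, n + 1)]))
# -> (38, 0.3710...): threshold near n/e, win probability near 1/e
```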

I decided not to talk about Example 6.2. You might like to look at that in the notes. This is a classic problem, which was addressed by some of the mathematicians who did the earliest work on optimal stopping problems.

Let me finish with some hints for the first four questions of Problems Sheet 2. Question 1 can be solved by realizing that there are only two stationary Markov policies; you simply need to decide which is better. Question 2 is a simple stopping problem that can be solved using a one-step look-ahead (OSLA) rule. When you do Question 3 you might think about using Bruss's odds algorithm. Although you cannot apply the algorithm directly, you should be able to solve this problem using the same scheme of proof as we used to prove Bruss's result in 6.1. Question 4 is an important one because it should convince you that it is possible to solve stochastic dynamic programming equations using linear programming (at least when both the state space and the action space are finite). If there are $n$ states and $m$ actions (controls) available in each state then we will have an LP in $n$ variables, with $nm$ linear inequality constraints.
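To illustrate the point of Question 4 (though this is not a solution to the question itself), here is a sketch of the LP approach on a randomly generated discounted-cost toy MDP; it assumes scipy is available, and all model data are illustrative. We maximize $\sum_x F(x)$ subject to $F(x)\leq c(x,u)+\beta\sum_y P(y\mid x,u)F(y)$ for every state-action pair, and the optimum is the value function.

```python
import numpy as np
from scipy.optimize import linprog

n, m, beta = 3, 2, 0.9                         # illustrative sizes and discount
rng = np.random.default_rng(0)
cost = rng.uniform(0, 1, size=(n, m))          # cost[x, u]: one-step costs
P = rng.dirichlet(np.ones(n), size=(n, m))     # P[x, u, :]: transition probs

# One constraint per state-action pair:
# F(x) - beta * sum_y P(y|x,u) F(y) <= c(x,u).
A = np.zeros((n * m, n))
b = np.zeros(n * m)
for x in range(n):
    for u in range(m):
        row = -beta * P[x, u]
        row[x] += 1.0
        A[x * m + u], b[x * m + u] = row, cost[x, u]

# Maximize sum_x F(x): linprog minimizes, so negate the objective.
res = linprog(c=-np.ones(n), A_ub=A, b_ub=b, bounds=[(None, None)] * n)
print("LP value function:    ", res.x.round(4))

# Sanity check against value iteration.
F = np.zeros(n)
for _ in range(500):
    F = (cost + beta * P @ F).min(axis=1)
print("value iteration gives:", F.round(4))
```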