Thursday, February 28, 2013

Coda on the puzzle

Here is an amusing sporting fact that resonates with our puzzle, in which a decision that is optimal both when A is true and when A is false may not be the right decision when the truth status of A is unknown.

John Howard (London School of Economics) tells me that in cricket it can happen that a batsman attempts to drive the ball but appears to miss (or he may have snicked it). The ball then hits his pad, bounces off, and is caught by a slip fielder. There is an appeal for lbw (leg before wicket). The umpire would give him out "lbw" if he were sure he had not hit the ball; he would be out "caught" if he had hit the ball. But a batsman cannot be given out unless a definite reason is given, and if the umpire is not sure which of these two it was, then the batsman is not out.

You can read more about this in the article, Ricky Ponting and the Judges, by Ian Rumfitt, who writes:

"In the first innings of the final Ashes Test of 2009, Ricky Ponting faced a ball which was deflected, off something, into the wicket-keeper’s hands. The English XI appealed, and in the agonizingly long time that it took Asad Rauf to decide, Jonathan Agnew (commentating on Test Match Special) reasoned as follows: ‘Either the ball hit Ponting’s bat or it hit his pads. If it hit his bat, he is out caught behind. If it hit his pads, he is out lbw. So, either way, he is out’. Rauf, however, appeared to disagree with Agnew’s reasoning, and Ponting stayed at the wicket."

Monday, February 25, 2013

Stabilizability - a personal anecdote

I post this on the evening of my 60th birthday. My birthday began with Session 7 (about controllability and stabilizability) with a class of Ph.D. students at London Business School. When I was born we lived just north of Regent's Park, in Acacia Road, St John's Wood, which is just 0.9 miles from London Business School's location. I find it remarkable to have returned so close to my geographic origins after 60 years.

Who solved the Secretary Problem?

I thought you might enjoy this.

In his paper (Who Solved the Secretary Problem? Statist. Sci., Volume 4, Number 3 (1989), 282-289) Thomas Ferguson tells the following funny story.

"When the celebrated German astronomer, Johannes Kepler (1571-1630), lost his first wife to cholera in 1611, he set about finding a new wife using the same methodical thoroughness and careful consideration of the data that he used in finding the orbit of Mars to be an ellipse. His first, not altogether happy, marriage had been arranged for him, and this time he was determined to make his own decision. In a long letter to a Baron Strahlendorf on October 23, 1613, written after he had made his selection, he describes in great detail the problems he faced and the reasons behind each of the decisions he made. He arranged to interview and to choose from among no fewer than eleven candidates for his hand. The process consumed much of his attention and energy for nearly 2 years, what with the investigations into the virtues and drawbacks of each candidate, her dowry, negotiations with her parents, natural hesitations, the advice of friends, etc. The book of Arthur Koestler (1960) contains an entertaining and insightful exposition of the process. The book of Carola Baumgardt (1951) contains much supplementary information.

Suffice it to say that of the eleven candidates interviewed, Kepler eventually decided on the fifth. It may be noted that when $n = 11$, the function $\phi_n(r)$ of (2.1) takes on its maximum value when $r = 5$. Perhaps, if Kepler had been aware of the theory of the secretary problem, he could have saved himself a lot of time and trouble." Fortunately,
"His new wife, whose education, as he says in his letter, must take the place of a dowry, bore him seven children, ran his household efficiently, and seems to have provided the necessary tranquil homelife for his erratic genius to flourish."
Ferguson's paper contains many other interesting things. It created some controversy and amusement when published, because his answer to the question "Who Solved the Secretary Problem?" is "Nobody"! He then goes on to "solve" it (at least a particular version that he likes) for the first time. He also says:
"As historians, we should take as the secretary problem, the problem as it first appeared in print, in Martin Gardner's February 1960 column in Scientific American, where it was called the game of googol and described as follows.
Ask someone to take as many slips of paper as he pleases, and on each slip write a different positive number. The numbers may range from small fractions of 1 to a number the size of a googol (1 followed by a hundred 0's) or even larger. These slips are turned face down and shuffled over the top of a table. One at a time you turn the slips face up. The aim is to stop turning when you come to the number that you guess to be the largest of the series. You cannot go back and pick up a previously turned slip. If you turn over all slips, then of course you must pick the last one turned.
The astute reader may notice that this is not the simple form of the secretary problem described in Section 2. The actual values of the numbers are revealed to the decision maker in violation of condition 4. Also, there is this "someone" who chooses the numbers, presumably to make your problem of selecting the largest as difficult as possible. The game of googol is really a two-person game."

Proof by Genie

Sometimes arguments like the one used in the puzzle can be useful in proofs. Remember, I told you about the Find the Greatest problem? In their paper, An optimal strategy for a conflict resolution problem (Venkat Anantharam and Pravin Varaiya, Systems and Control Letters, 1986, pp. 329-332), the authors make an argument like this:

We postulate the following genie:

The genie tells us which of the sets $A$ and $A^c$ ...

I omit the details (see their paper if you like). I focus here on the idea that the genie is providing extra information that can make the optimal policy easier to figure out. The authors continue:

By postulating a genie we mean that we permit ourselves to use different strategies on events on which the genie gives us different answers. Clearly we can do no better without the genie than we can with it.

They then show that given the information provided by the genie, the optimal policy is one that could have been implemented without having that information, and which takes a special form. So they can conclude the optimal policy (without the genie) must also take this special form (i.e. that it suffices to ask questions of the simple form "Whose number is bigger than $x$?").

We see, however, that one has to be careful with this type of proof. In the puzzle we are drawn into thinking about a genie who tells us whether A is true or false. The catch is that the policy which is optimal, given what the genie tells us about A, may prescribe the same $u_0$ whatever the genie says, but only because we are free to act differently at $u_1, u_2, \dotsc$ according to the answer.

Answer to the puzzle

The intuitive answer to this puzzle is yes ($u_0=1$ is optimal). That might make you suspicious that I am going to argue for the non-intuitive answer, no ($u_0=1$ is not necessarily optimal). I wonder if you will like my answer, or think I am cheating?

I said nothing about $u_0$ being the only decision. I make no apology, because these lectures have been all about taking a sequence of decisions (or controls) $u_0,u_1,u_2,\dotsc$ over time.  That possibility should have been in your mind, and I was giving you a hint by denoting the decision as $u_0$ (the first of a sequence of decisions).

Suppose we are faced with a stopping problem in which $u_0=1$ means "continue" and $u_0=0$ means "stop". We should consider the possibility that by knowing that A is true, or by knowing A is false, it is optimal to continue. But if we do not know whether A is true or false then it is optimal to stop.

It is not completely obvious that this can happen, so let's see how to construct an example in which it does. Consider a version of the secretary problem in which we will see 3 candidates. They are "best", "middle" and "worst" (B,M,W). Of course we can only compare them with one another, so if they are presented in the order M,B,W, we would see this as $(x_1,x_2,x_3)=(1, 1, 0)$, where $x_i=1$ if the $i$th candidate is the best so far. Suppose that the candidates will be presented in one of the following four orders, with consequent sequence of 0s and 1s, with the probabilities given:
  1. B,M,W (1,0,0) with probability 0.2
  2. M,B,W (1,1,0) with probability 0.3
  3. B,W,M (1,0,0) with probability 0.2
  4. W,M,B (1,1,1) with probability 0.3
It is a bit artificial that all 6 possible orders are not equally likely, but for the purpose of this discussion we are not trying to think about a really practical problem.

We wish to maximize the probability of stopping on the last 1.

Now suppose the unknown "A" is whether "the worst candidate will be seen last".

If "A" is true it will be 1 or 2, but if A is false it will be 3 or 4.

You can now verify that given A is true it is optimal not to stop on the first 1 ($u_0=1=$ "continue"). The same decision is optimal if A is false. In both cases we win with probability 0.6.

However, if we do not know whether A is true or false, then it is optimal to stop on the first 1 ($u_0=0$, and win with probability 0.4). If we do not stop, then with probability 0.6 we reach 1, 1 and from there can win with probability of only 0.5. So our win probability is only $0.6\times 0.5 = 0.3$.
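If you would like to check these numbers, here is a minimal brute-force sketch (in Python); the encoding of policies as maps from observation histories to stop/continue decisions is my own device.

```python
from itertools import product

# The four orders, their observed sequences (x1, x2, x3), probabilities,
# and whether A = "the worst candidate is seen last" holds.
scenarios = [((1, 0, 0), 0.2, True),    # B,M,W
             ((1, 1, 0), 0.3, True),    # M,B,W
             ((1, 0, 0), 0.2, False),   # B,W,M
             ((1, 1, 1), 0.3, False)]   # W,M,B

def best_win_prob(scen):
    """Max P(stop on the last 1), over all history-dependent policies."""
    hists = sorted({s[:t] for s, _, _ in scen for t in (1, 2, 3)})
    best = 0.0
    for stops in product([False, True], repeat=len(hists)):
        rule = dict(zip(hists, stops))
        win = 0.0
        for seq, prob, _ in scen:
            last = max(t for t in range(3) if seq[t] == 1)
            for t in (1, 2, 3):
                if rule[seq[:t]]:          # policy says: stop now
                    if t - 1 == last:      # we stopped on the last 1
                        win += prob
                    break
        best = max(best, win)
    return best / sum(prob for _, prob, _ in scen)

print(best_win_prob([s for s in scenarios if s[2]]))      # A true:  0.6
print(best_win_prob([s for s in scenarios if not s[2]]))  # A false: 0.6
print(best_win_prob(scenarios))                           # unknown: 0.4
```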

Do you think this example is a cheat (in that it does not really match up with the way I posed the original puzzle), or have you learned something interesting?

I was thinking about this recently in connection with a similar question. I wonder if any of you can invent a neater example to illustrate this point, or one with a better "back-story".

The question I have been thinking about concerns an application of Bruss's odds theorem in circumstances that we do not know the odds. Recall that if $X_1,\dotsc,X_n$ are independent, with $X_i=1$ or $0$ with probabilities $p_i$ and $q_i=1-p_i$, then we maximize the probability of stopping on the last 1 if we stop at the first 1 that we find amongst $X_s,X_{s+1},\dotsc,X_n$, where $s$ is the greatest integer $i$ such that

$\frac{p_i}{q_i}+\cdots+\frac{p_n}{q_n}\geq 1$.
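In code the odds algorithm is a single backwards pass. A minimal sketch (in Python), checked against the classical secretary problem, where $p_i=1/i$ and so the odds are $p_i/q_i=1/(i-1)$:

```python
def odds_threshold(p):
    """Greatest i (1-indexed) such that sum_{j >= i} p_j/q_j >= 1, else 1."""
    total = 0.0
    for i in range(len(p) - 1, -1, -1):
        total += p[i] / (1.0 - p[i])
        if total >= 1.0:
            return i + 1
    return 1

# Secretary problem with n = 11: p_i = 1/i.  (The backwards pass returns
# before reaching p_1 = 1, where the odds would be infinite.)
p = [1.0] + [1.0 / i for i in range(2, 12)]
print(odds_threshold(p))   # prints 5: the cutoff for n = 11 candidates
```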

Now suppose we do not know the actual values of $p_1,\dotsc,p_n$, but we have seen $X_{i-1}=1$ and somehow know that $p_i/q_i+\cdots+p_n/q_n=1$. Can we say that it is optimal to stop? Yes. Since sum-of-odds $=1$ is the borderline case in the odds theorem, is it also optimal not to stop? Yes.

But suppose we know only that $E[p_i/q_i+\cdots+p_n/q_n] > 1$. Is it optimal to stop?

Session 7

Subsequent to the class I have expanded my solution to question 3.2.

The linear-quadratic regulator (LQR) that I discussed today is one part of the solution to the linear-quadratic-Gaussian (LQG) control problem. The LQG problem is perhaps the most important problem in control theory. It has a full solution: given in terms of the Kalman filter (a linear-quadratic estimator) and the linear-quadratic regulator.

Following today's lecture you can do the assignment of Questions 4-7 on Problem Sheet 3. Here is an important hint. Do not try to solve these problems by plugging values into the general solution of the Riccati equation. It is always better to figure out the solution from scratch, by the method of backwards induction. Conjecture for yourself that the optimal value function, $F(x,t)$, is of a form that is quadratic in the state variable, plus some $\gamma_t$ (if it is a problem with white noise). Then find the explicit form of the recurrences, working backwards inductively from a terminal time $h$.

In general, solutions to a Riccati equation can be computed numerically, but not algebraically. However, one can find a full solution in the special case that $x_t$ and $u_t$ are both one-dimensional (Question 4). There is also a fully-solvable special case in which $x_t$ is two-dimensional and $u_t$ is one-dimensional (Question 6). A useful trick in some problems is to observe that $\Pi_t^{-1}$ satisfies a recurrence relation (Question 6).
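For instance, in the one-dimensional case the backwards induction takes only a few lines. A minimal sketch (in Python) for the generic scalar problem with dynamics $x_{t+1}=ax_t+bu_t$ and cost $\sum_t (qx_t^2+ru_t^2)$; the numerical values are illustrative assumptions of mine, not the data of Question 4.

```python
# Conjecture F(x, t) = Pi_t x^2 and work backwards from Pi_h = 0.
a, b, q, r = 1.1, 0.5, 1.0, 0.2   # made-up values
h = 20                            # horizon
Pi = 0.0                          # Pi_h = 0: no terminal cost
for t in range(h - 1, -1, -1):
    # Minimizing q x^2 + r u^2 + Pi (a x + b u)^2 over u gives linear
    # feedback u = K x and the scalar Riccati recurrence for Pi_t.
    K = -a * b * Pi / (r + b * b * Pi)
    Pi = q + a * a * Pi - (a * b * Pi) ** 2 / (r + b * b * Pi)
print(Pi, K)   # Pi_0, and the optimal feedback gain at time 0
```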

No one enjoys memorizing the Riccati equation. The important thing is to remember enough of what it is for, how it looks, and how it is derived, so that you could reconstruct its derivation in the context of a particular problem.

The idea of controllability is straightforward and we can understand it using ideas of linear algebra.

Consider again the broom balancing problem. I mentioned that two upright brooms balanced simultaneously are not controllable around their equilibrium if they have the same length, but are controllable if the lengths are different. This is because of the forms:

$A=\begin{pmatrix}0 & 1 & 0 & 0\\ \alpha & 0 & 0& 0\\ 0 & 0 & 0& 1\\ 0 & 0& \beta & 0\end{pmatrix}\quad B=\begin{pmatrix}0 \\-\alpha\\0\\-\beta\end{pmatrix}\quad M_4=\begin{pmatrix}0 & -\alpha & 0 & -\alpha^2\\ -\alpha & 0 & -\alpha^2& 0\\ 0 & -\beta & 0& -\beta^2\\ -\beta & 0& -\beta^2 & 0\end{pmatrix}$

where $\alpha = g/L_1$, $\beta=g/L_2$. So $M_4$ is of rank 4 iff $\alpha$ and $\beta$ are different. However, this assumes that as you move your hand you must keep it at constant height from the ground. What do you think might happen if you can move your hand up and down also?
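As for the rank claim, it is easy to verify numerically. A minimal sketch (in Python), with illustrative broom lengths:

```python
import numpy as np

def controllability_matrix(A, B):
    # M_n = [B, AB, A^2 B, ..., A^(n-1) B]
    cols = [B]
    for _ in range(A.shape[0] - 1):
        cols.append(A @ cols[-1])
    return np.hstack(cols)

def two_brooms_rank(alpha, beta):
    A = np.array([[0, 1, 0, 0],
                  [alpha, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, beta, 0]], dtype=float)
    B = np.array([[0], [-alpha], [0], [-beta]], dtype=float)
    return np.linalg.matrix_rank(controllability_matrix(A, B))

g = 9.8
print(two_brooms_rank(g / 1.0, g / 1.5))   # different lengths: rank 4
print(two_brooms_rank(g / 1.0, g / 1.0))   # equal lengths: rank 2
```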

Sunday, February 24, 2013

Control of queues

In Section 10.3 we looked at a problem of controlling the arrival rate at an $M/M/1$ queue. Question 3 on Problem Sheet 3 is about controlling the service rate. In both cases, I was using this to illustrate the policy improvement algorithm. In Section 8.7 I introduced the topic of optimal scheduling in a multi-class $M/G/1$ queue.

Problems about controlling service and/or arrivals, and scheduling, in single queues, or networks of queues, crop up again and again in OR models in communications, supply chains, etc. There is a vast research literature. I have written a few papers about this myself, such as 

R. R. Weber and S. Stidham, Jr. Optimal control of service rates in networks of queues. Adv. Appl. Prob. 19, 202-218, 1987.

S. Stidham, Jr. and R. R. Weber. Monotonic and insensitive optimal policies for the control of queues with undiscounted costs. Operations Research 37, 611-625, 1989.

S. Stidham and R. Weber. A survey of Markov decision models for control of networks of queues. Queueing Systems 13, 291-314, 1993. 

My co-author on those papers, Sandy Stidham, has a huge knowledge of this field. He has written a nice article containing 349 references, entitled Applied Probability in Operations Research: A Retrospective (2001), which gives a bird's eye overview of queueing control and related topics.

If in your own research you ever find reason to want to know more about this subject, then I suggest you begin by reading Sandy's retrospective, or take a look at one of the papers above. The first is about a problem in which both service and arrival rates can be controlled within a simple network of queues.

Friday, February 22, 2013

A Puzzle

Here is a puzzle to test your understanding and intuition.

You are faced with a problem in which you need to make a decision. For example, you might be choosing $u_0$. You can choose $u_0=0$ or $u_0=1$. One of these is optimal (better than the other).

There is something "A" which you do not know. It could be either true or false.

If you were to know A is true then $u_0=1$ is definitely optimal (i.e. better than $u_0=0$).
If you were to know A is false then $u_0=1$ is also definitely optimal.

Can you therefore conclude that $u_0=1$ is optimal?

See Answer to the puzzle, Proof by Genie, Coda on the puzzle, and Proof by Genie, continued.

Monday, February 18, 2013

Homework

The homework assignment due on 25/2 is Problem Sheet 3: 1-3. I have already given some hints on doing these questions in the blog for Session 5.

Session 6

In today's session we looked at some more advanced topics, which previous sessions have prepared us to understand. I focused on some problems that I have studied in my own research.

With hindsight, I may have been too ambitious in imagining what I might cover this morning. I hope that my notes will help fill in gaps if you wish to learn more.

In the end, we covered the notion of Whittle indices for restless bandits. We saw that the policy they define is optimal, or nearly optimal, in the limit as the number of identical bandits $n$ tends to infinity. This follows from the fact that in large-$n$ systems the proportions of bandits in each of the states evolve according to a "fluid model", which is linear within polyhedral regions.

In the second part we looked at stochastic sequential allocation and assignment problems. The stochastic sequential assignment problem (SSAP) has a surprising answer: the optimal strategy can be described in terms of thresholds that do not depend on the values of the $p_i$. The proof is cute and is worth understanding.

I mentioned Rhonda Righter's nice treatment of a SSAP with arrivals, in which the incentives for individual (selfish) behaviour by workers can be made to coincide with the socially optimal behaviour. This is also true in the SSAP without arrivals, if the jobs are offered to the workers in decreasing order of their $p_i$. For example, with $n$ workers present, the $i$th will accept the job iff it has been declined by workers $1,2,\dotsc,i-1$ and its value $X$ is at least $a_{i,n}$, which is the expected value of the job that will be assigned to the $i$th worker if she rejects the currently offered job. It will have been declined by the previous workers precisely if $X < a_{i-1,n}$.

There was not time to discuss everything. In particular, I did not fully discuss the solutions to Problem Sheet 2, Questions 5-7. The model solutions are now on the web site.

Some references about the Bomber Problem and its relatives are in the previous post. The slides I used are here.

ABCs of the Bomber Problem

I briefly spoke today about the infamous Bomber Problem. If you would like to read more, see ABCs of the Bomber Problem and its Relatives, Weber, 2011. There is also a talk about this here. This problem has been open for over 35 years, and has been said by researchers who know it to be highly addictive. It is very infuriating that for a problem with such a simple dynamic programming equation an "obvious" fact can be neither proved nor disproved.

$F(n,t)$ is the probability that a bomber plane, which is potentially attacked by enemy fighters $t$ times, can successfully defend itself against all such attacks, when starting with a supply of $n$ air-to-air missiles. The probability of surviving an attack by using $k$ missiles is $(1-\theta^k)$, where $0 < \theta < 1$. With $p$ the probability of an attack at each of the $t$ epochs, and $q=1-p$, the dynamic programming equation is

$F(n,t) = q F(n,t-1) + p \max_{k\in\{1,\ldots,n\}}(1-\theta^k)F(n-k,t-1)$

$F(n,0)=1$

Conjecture B is that the maximizing $k$ is nondecreasing in $n$.
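The DP equation is trivial to compute, which makes the conjecture all the more tantalizing. A minimal sketch (in Python), with illustrative values of $p$ and $\theta$:

```python
import functools

p, theta = 0.5, 0.8      # made-up values
q = 1 - p

@functools.lru_cache(maxsize=None)
def F(n, t):
    """P(survive t potential attacks with n missiles), playing optimally."""
    if t == 0:
        return 1.0
    if n == 0:           # out of missiles: survive only if never attacked
        return q * F(0, t - 1)
    return q * F(n, t - 1) + p * max(
        (1 - theta ** k) * F(n - k, t - 1) for k in range(1, n + 1))

def k_opt(n, t):
    """Smallest maximizing k in the DP equation."""
    return max(range(1, n + 1),
               key=lambda k: (1 - theta ** k) * F(n - k, t - 1))

# Conjecture B says each of these rows is nondecreasing; no numerical
# search of this kind has ever turned up a counterexample.
for t in range(1, 6):
    print(t, [k_opt(n, t) for n in range(1, 11)])
```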

Many problems very similar to this are easy to solve, such as the stochastic sequential investment problem (of Derman, Lieberman and Ross, described on page 34 of the notes). Typically the proof methods are similar to that in Problem 7 of Problems Sheet 2.

Klinger and Brown's 1968 paper is here.

Saturday, February 16, 2013

Session 5

You will find that the policy improvement algorithm is useful in answering Problems 1 and 2 on Problems Sheet 3. In Problem 1, you will find $\lambda$ and $\phi$ for a certain policy ("On seeing a filling station, stop and fill the tank"), and then look for a condition under which the policy improvement algorithm cannot improve it (or, equivalently, under which this $\lambda$ and $\phi$ satisfy the optimality equation).

In Problem 2 you will use the policy improvement idea in the context of a discounted-cost problem. You find $F(x)$ for a simple policy (in which the engineer allocates his time randomly), and then improve it by a step of the policy improvement algorithm. This leads to a policy in which the engineer puts all his maintenance effort into the machine with the greatest value of $c_i(x_i+q_i)/(1-\beta b_i)$. This policy is better, but may not yet be optimal.
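To recall the mechanics, here is a minimal sketch (in Python) of a single policy improvement step for a discounted-cost finite MDP; the two-state data are made up and are not those of the engineer problem.

```python
import numpy as np

beta = 0.9                                      # discount factor
c = np.array([[1.0, 4.0], [0.0, 2.0]])          # c[s, a]: cost (made up)
T = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])        # T[s, a]: P(next state | s, a)

def evaluate(policy):
    # Solve F = c_pi + beta P_pi F for the current stationary policy.
    P = np.array([T[s, policy[s]] for s in range(2)])
    cost = np.array([c[s, policy[s]] for s in range(2)])
    return np.linalg.solve(np.eye(2) - beta * P, cost)

def improve(policy):
    F = evaluate(policy)
    # In each state take the action minimizing c(s, a) + beta E[F(next)].
    return np.array([int(np.argmin(c[s] + beta * T[s] @ F)) for s in range(2)])

policy = np.array([0, 0])
print(improve(policy))     # the (weakly better) policy after one step
```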

I have a personal fondness for some things I talked about in the second half today. Theorem 10.1 (and extensions of it to distributions beyond the exponential) was one of the things that I proved in my Ph.D. dissertation in 1980. The Lady's Nylon Stocking Problem was a fairly famous unsolved problem. It gave me something to talk about with non-mathematically inclined fellows when I first became a research fellow. The paper is Scheduling jobs with stochastic processing requirements on parallel machines to minimize makespan or flow time. J. Appl. Prob. 19, 167-182, 1982.

The method of uniformization that I described in this lecture seems to have been introduced by Jensen (1953), but it was first used extensively in solving queueing problems in the late 1970s (by Grassmann (1977) and Keilson (1979)).
Here is a quote from a recent paper by Rhonda Righter, Stochastic sequential assignment problem with arrivals, Probability in the Engineering and Informational Sciences, 25, 477-485, 2011:

Proof. We consider the equivalent discrete-time problem by uniformizing with uniformization rate $\lambda + \gamma + \alpha = 1$, and we use value iteration (i.e., induction on a finite time horizon, $n$).

The Wikipedia article has more about it and summarises the method in these words:
The method involves the construction of an analogous discrete time Markov chain, where transitions occur according to an exponential distribution with the same parameter in every state.
I have used this idea many dozens of times in my own research. It is usually the best first step to make when  tackling a continuous-time Markov decision problem.
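To make the construction concrete: pick $\Lambda \geq \max_i (-q_{ii})$ and set $P = I + Q/\Lambda$; the discrete-time chain with transition matrix $P$, run at the epochs of a Poisson process of rate $\Lambda$, reproduces the law of the original chain. A minimal sketch (in Python), with a made-up generator:

```python
import numpy as np

# Generator of a small CTMC (rows sum to zero; rates are made up).
Q = np.array([[-3.0,  2.0,  1.0],
              [ 4.0, -5.0,  1.0],
              [ 0.0,  2.0, -2.0]])

Lam = max(-Q.diagonal())      # uniformization rate (any larger value works too)
P = np.eye(3) + Q / Lam       # transition matrix of the uniformized chain

assert np.allclose(P.sum(axis=1), 1.0) and (P >= 0).all()
print(P)                      # note the self-loops in states 0 and 2
```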

Tuesday, February 12, 2013

Scheduling in the Multiclass M/G/1 queue

Here is a bit more about this for those who are interested. Shyam has asked me about this section of the last lecture.

The result in Section 8.7 is originally due to Klimov, in this paper: Klimov, G. P. (1974). Time-sharing service systems I, Theory Probab. Appl., 19, 532-551.

However, you will find it easier to read something like this: On the Gittins index in the M/G/1 queue, Aalto, Ayesta and Righter, Queueing Systems, 63, 437-458, 2009, and also to see that researchers are still thinking about it even 35 years later!


Abstract: For an M/G/1 queue with the objective of minimizing the mean number of jobs in the system, the Gittins index rule is known to be optimal among the set of non-anticipating policies. We develop properties of the Gittins index. For a single-class queue it is known that when the service time distribution is of type Decreasing Hazard Rate (New Better than Used in Expectation), the ...

Perhaps you wondered about my calculation of a Gittins index as

$\frac{r_i + \alpha t_i r_1}{t_i + \alpha t_i t_1}$

This is for when we are trying to find the job class of second highest priority, after class 1. In calculating its index in the undiscounted case we want to maximize the ratio (expected reward up to $\tau$)/(expected value of $\tau$). The stopping time $\tau$ is the first time at which we have again cleared the system of type 1 jobs, i.e. of all jobs of priority greater than class $i$'s (assuming hypothetically for now that $i$ is the job type of second greatest index).

Although it is not actually needed, we can compute the value of $\alpha$ as follows. The quantity $\alpha t_i$ is the expected number of type 1 jobs that must be cleared as a consequence of the type 1 arrivals occurring while we process a single type $i$ job.

Processing the type $i$ job takes time $t_i$. During that time suppose $M$ type 1 jobs arrive. To clear one of these, and the other type 1 jobs that arrive while doing so (and while doing those, etc.), takes an expected time $x_1$, where

$x_1 = t_1 + (\lambda_1 t_1) x_1$ 

Think about this: while we process one class 1 job, other class 1 jobs arrive, the expected number of which is $\lambda_1 t_1$. So $x_1 = t_1/(1-\lambda_1 t_1)$. So we will take time $M x_1$ to clear type 1 jobs after finishing a type $i$ job, during which the expected number of type 1 jobs served is $M x_1/t_1$. The expected value of $M$ is $\lambda_1 t_i$. So

$\alpha = \lambda_1/(1-\lambda_1 t_1)$, so that $\alpha t_i$ is the expected number of type 1 jobs cleared, and $\alpha t_i t_1 = E[M] x_1$ is the expected time taken to clear them.

We also get $r_1$ for each type 1 job cleared, and since the expected number cleared is $\alpha t_i$, this is why the numerator has $\alpha t_i r_1$.

A similar calculation can be carried out when we have identified the job class of second highest priority, say class 2, and are now looking for the one of third highest priority. Now $\alpha$ will be determined by the time needed to clear all jobs of classes 1 and 2.
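Here is a minimal sketch (in Python) of the whole calculation for the class of second highest priority. The arrival rates, mean service times and rewards are made up, and class 1 is assumed to be the class of highest index.

```python
lam = {1: 0.3, 2: 0.2, 3: 0.1}   # arrival rates (made up)
t   = {1: 1.0, 2: 2.0, 3: 1.5}   # mean service times
r   = {1: 5.0, 2: 6.0, 3: 4.0}   # rewards

# alpha * t_i = expected number of type 1 jobs cleared as a consequence of
# the type 1 arrivals during t_i, each spawning 1/(1 - lam_1 t_1) services.
alpha = lam[1] / (1 - lam[1] * t[1])

def second_priority_index(i):
    # (expected reward up to tau) / (expected tau), where tau is the first
    # time the system is again clear of type 1 jobs.
    return (r[i] + alpha * t[i] * r[1]) / (t[i] + alpha * t[i] * t[1])

for i in (2, 3):
    print(i, second_priority_index(i))   # the larger wins second priority
```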

Monday, February 11, 2013

Session 4

In today's lecture I used a portion of my slides from a talk on the Gittins index. You may enjoy looking through the entire talk. It will look better if you download the .pdf file to your computer and read the presentation with a .pdf viewer such as Acrobat, rather than trying to read it within your browser.

I have slightly changed Problems Sheet 2. The assignment for this week is to do Questions 5-7.

Subsequent to today's lecture I have made some changes to sections 8.5 and 8.7, now using $C_i$ to mean the completion time of job $i$. I did not discuss Section 8.8 of the notes. This is about an application of the Gittins index theorem to a problem in which it at first does not seem to apply because the states of the bandits that are not continued are not frozen.

It is remarkable how many interesting problems can be recast as bandit problems. We saw today several ways to massage problems into the form of bandit processes (Weitzman's problem, target and tax problems, and scheduling in an $M/G/1$ queue). In particular, we saw that the $\beta\to 1$ limit can be meaningful. You will explore this a bit in Problem 6 on Problems Sheet 2.

The proof of the Gittins index theorem is actually fairly easy, yet it is one of the most beautiful results in the field of Markov decision processes. Its discovery and proof in 1974 are due to John Gittins. The proof I have given in today's lecture is very different from Gittins's original proof. It was first presented in my paper On the Gittins index for multiarmed bandits. Ann. Appl. Prob. 2, 1024-33, 1992.

Monday, February 4, 2013

Session 3

I started today by talking about the solution to the burglar problem. This is a subtle question and a very good one for testing your understanding. My solutions for the problems are now on the web site.

I said more about the problem with which we ended Session 2, namely, the famous unsolved problem concerned with searching for a randomly moving hidden object. But I realized later that I forgot to make the comment that this is an example of a partially observed Markov decision process (POMDP). In a POMDP the decision-maker cannot directly observe the underlying state. Instead, he must maintain a probability distribution over the set of possible states, based on his observations, and the underlying MDP. This distribution is updated using the usual Bayesian calculations. That is what we did by defining $x_i=P(\text{object is in location $i$})$.
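To make the updating concrete, here is a minimal sketch (in Python) of the Bayesian calculation for a two-location search; the detection probability and the object's movement matrix are illustrative assumptions of mine.

```python
d = 0.8                        # P(find object | we search its true location)
M = [[0.7, 0.3], [0.4, 0.6]]   # M[i][j] = P(object moves to j | now at i)

def update(x, search_loc1):
    """New x1 = P(object in location 1), after an unsuccessful search."""
    if search_loc1:            # searched location 1, did not find it
        x = x * (1 - d) / (x * (1 - d) + (1 - x))
    else:                      # searched location 2, did not find it
        x = x / (x + (1 - x) * (1 - d))
    return x * M[0][0] + (1 - x) * M[1][0]   # then the object moves

x = 0.5
for _ in range(5):
    # Searching the likelier location is a natural rule, though proving
    # what is optimal in general is precisely the open problem.
    x = update(x, search_loc1=(x >= 0.5))
    print(x)
```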

I mentioned that this problem has not yet been solved in full generality. However, the problem has been solved in continuous time. See R. R. Weber. Optimal search for a randomly moving object. J. Appl. Prob. 23:708-717, 1986. The reason the problem is easier in continuous time is that the state $x_1$ (the probability that the object is in location 1) changes continuously, rather than in jumps.

Let me emphasize that in Session 2 (Chapter 4) we used $F_s(x)$ to mean the value function of an MDP over $s$ steps, with $F_0(x)=0$. In the case of $c(x,u)\geq 0$, $F_s(x)$ is clearly nondecreasing in $s$. By contrast, in this lecture $F_s(x)$ was used to mean the minimum cost in a stopping problem in which we must stop within $s$ steps, and $F_0(x)=k(x)$. $F_s(x)$ is now clearly nonincreasing in $s$, because as $s$ increases we are given more flexibility as to when we must stop. In both cases $F_s(x)$ satisfies an optimality equation and $F_s(x)$ tends to a limit, say $F_\infty(x)$, but from different directions.

In Lecture 5 we looked at some stopping problems, which are a type of problem that crop up all over the place. Those of you working in economics and finance are almost certainly familiar with the theory of option pricing (Black-Scholes). The holder of an American option is allowed to exercise the right to buy (or sell) the underlying asset at a predetermined price at any time before or at the expiry date. The valuation of American options is essentially an optimal stopping problem.

You can read more about Bruss's odds algorithm in his paper: "Sum the odds to one and stop", Annals of Probability, 28, 1384-1391, 2000. I showed you (and have now added to the notes) how to use this to address a Secretary Problem in which the candidates arrive in groups, of sizes $t_1,t_2,\dotsc,t_h$. This partially addresses a problem asked by Ken Chang about secretary problems in which candidates are heterogeneous.

I decided not to talk about Example 6.2. You might like to look at that in the notes. This is a classic problem, which was addressed by some of the mathematicians who did the earliest work on optimal stopping problems.

Let me finish with some hints for the first four questions of Problems Sheet 2. Question 1 can be solved by realizing that there are only two stationary Markov policies; you simply need to decide which is better. Question 2 is a simple stopping problem that can be solved using an OSLA rule. When you do Question 3 you might think about using Bruss's odds algorithm. Although you cannot apply the algorithm directly, you should be able to solve this problem using the same scheme of proof as we used to prove Bruss's result in 6.1. Question 4 is an important one, because it should convince you that it is possible to solve stochastic dynamic programming equations using linear programming (at least if both state space and action space are finite). If there are $n$ states and $m$ actions (controls) available in each state, then we will have an LP in $n$ variables, with $nm$ linear inequality constraints.
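To see Question 4's claim concretely: for a discounted-cost problem, the optimal value function is the largest $F$ satisfying $F(x) \leq c(x,u) + \beta \sum_y P(y \mid x,u) F(y)$ for every $x$ and $u$, and finding it is an LP. A minimal sketch (in Python, using scipy), with made-up data for $n=m=2$:

```python
import numpy as np
from scipy.optimize import linprog

beta = 0.9
c = np.array([[1.0, 4.0], [0.0, 2.0]])          # c[s, a] (made up)
T = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])        # T[s, a] = P(. | s, a)
n, m = 2, 2

# Maximize sum_x F(x) subject to F(x) - beta * sum_y P(y|x,u) F(y) <= c(x,u):
# n variables and n*m inequality constraints, as promised.
A_ub, b_ub = [], []
for s in range(n):
    for a in range(m):
        row = -beta * T[s, a]
        row[s] += 1.0
        A_ub.append(row)
        b_ub.append(c[s, a])

res = linprog(c=-np.ones(n), A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * n)
print(res.x)   # the optimal value function F
```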