Here is a bit more about this for those who are interested. Shyam has asked me about this section of the last lecture.
The result in Section 8.7 is originally due to Klimov in this paper: Klimov, G.P. (1974). Time-sharing service systems I, Theory Probab. Appl.,19, 532–51.
However, you will find easier to read something like this: On the Gittins index in the M/G/1 queue, Aalto, Ayesta and Righter, Queueing Systems, 63, 437-458, 2009, and also to see that researchers are still thinking about it even 35 years later!
Abstract: For an M/G/1 queue with the objective of minimizing the mean number of jobs in the system, the Gittins index rule is known to be optimal among the set of non-anticipating policies. We develop properties of the Gittins index. For a single-class queue it is known that when the service time distribution is of type Decreasing Hazard Rate (New Better than Used in Expectation), the ...
$\alpha = \lambda t_1/(1-\lambda_1 t_1 )$.
The result in Section 8.7 is originally due to Klimov in this paper: Klimov, G.P. (1974). Time-sharing service systems I, Theory Probab. Appl.,19, 532–51.
However, you will find easier to read something like this: On the Gittins index in the M/G/1 queue, Aalto, Ayesta and Righter, Queueing Systems, 63, 437-458, 2009, and also to see that researchers are still thinking about it even 35 years later!
Abstract: For an M/G/1 queue with the objective of minimizing the mean number of jobs in the system, the Gittins index rule is known to be optimal among the set of non-anticipating policies. We develop properties of the Gittins index. For a single-class queue it is known that when the service time distribution is of type Decreasing Hazard Rate (New Better than Used in Expectation), the ...
Perhaps you wondered about my calculation of a Gittins index as
$\frac{r_i + \alpha t_i r_1}{t_i + α t_i t_1}$
This is when we are trying to find the
bandit of second highest priority after job class 1. In
calculating its index in an undiscounted case we will be
wanting to maximize the quantity of (reward up to $\tau$)/$\tau$. The stopping
time $\tau$ will be the first time that we have again cleared
the system of type 1 jobs, i.e. all jobs of priority greater
than $i$ (assuming hypothetically for now that $i$ is the job
type of second greatest index).
Although it is not actually needed, we can value of $\alpha$ as follows.
It is the length of a busy period of type 1 jobs that is created
while we process a single type $i$ job.
Processing the type $i$ job takes time $t_i$. During
that time suppose $M$ type 1 jobs arrive. To clear one of these, and
other type 1 jobs that arrive while doing so (and while doing thoses, etc) takes an expected time $x_1$, where
$x_1 = t_1 + (\lambda_1 t_1) x_1$
Think about this. While we process one class 1 job, other class 1 jobs arrive, the expected number of which is $\lambda_1 t_1$. So $x_1 = t_1/(1-\lambda_1 t_1)$. So we will
take time $M x_1$ to clear type 1 jobs after after finishing a type $i$ job. The expected value of $M$ is $\lambda_1 t_i$. So
$\frac{r_i + \alpha t_i r_1}{t_i + α t_i t_1}$
$\alpha = \lambda t_1/(1-\lambda_1 t_1 )$.
We also get a $r_1$ for each type 1 job
cleared, and so this is why the numerator has $\alpha t_i r_1$.
A similar calculation will be carried out when we have identified the job class of second highest priority, say class 2, and and are now looking for the one of third highest priority. Now $\alpha$ will be determined by the time needed to clear all jobs of classes 1 and 2.