# 10 Expectation

[latex]\newcommand{\pr}[1]{P(#1)} \newcommand{\var}[1]{\mbox{var}(#1)} \newcommand{\mean}[1]{\mbox{E}(#1)} \newcommand{\sd}[1]{\mbox{sd}(#1)}[/latex]

Probabilities tell us how likely particular outcomes are to occur. In this way a probability is the long-run relative frequency of an outcome. We can similarly define the long-run average for a random variable. This expected value is what we would expect to get as we average more and more outcomes of the variable. It can be calculated easily from the random variable’s probability function.

# Expected Value

A classic application of expected values is in analysing a game of chance like Keno. The table below shows the probability function for the simple one-number game introduced in Chapter 8.

## Probability function for Keno winnings

| [asciimath]x[/asciimath] | [asciimath]0[/asciimath] | [asciimath]3[/asciimath] |
|---|---|---|
| [asciimath]P(X=x)[/asciimath] | [asciimath]\frac{3}{4}[/asciimath] | [asciimath]\frac{1}{4}[/asciimath] |

So if you play this Keno game once then you will either win $3 or win $0. But what would you **expect** to win if you played the game over and over again? For example, playing the game 10 times you might get lucky and win in 4 of them. This gives total winnings of $12, an average of $1.20 per game. Now if you played 100 times or 1000 times, what would you expect this average to be?

You can calculate this by using the probabilities as long-term relative frequencies. The probability of winning $3 is [latex]\frac{1}{4}[/latex], so in the long run we expect to win $3 for every 4 games, an average of $0.75 per game. The other outcome doesn't add anything, but the reasoning is the same: we get $0 with probability [latex]\frac{3}{4}[/latex], so we expect to win [latex]3 \times \$0 = \$0[/latex] every 4 games, an average of $0.00 per game. The total from the two possibilities is $0.75 per game. We call this the **expected value** of the random variable [latex]X[/latex], written [latex]\mean{X}[/latex]. In general we calculate it by

\[ \mean{X} = \sum_x \pr{X = x} x. \]

Here we calculated

\[ \mean{X} = \frac{3}{4} \times 0 + \frac{1}{4} \times 3 = 0.75. \]

In the long term we will get $0.75 for each game we play. Unfortunately Keno costs $1 per game and so in the long term we will actually lose $0.25 per game. What makes gambling exciting is the **variability** in the results, rather than the long term result. We will define the standard deviation of a random variable later in this chapter.
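As a quick sketch of this calculation in Python (a language choice made here purely for illustration), the probability function can be stored as a dictionary of outcome-probability pairs and the expected value is just the weighted sum:

```python
# Probability function for the Keno winnings X: outcome -> probability.
pmf = {0: 3/4, 3: 1/4}

# E(X) = sum over outcomes x of P(X = x) * x.
expected = sum(p * x for x, p in pmf.items())
print(expected)  # 0.75
```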

## Population Mean

In the sciences we are usually not directly interested in games of Keno. Our main focus has been on the idea of sampling from a population. It turns out in this case that the expected value has a simple interpretation.

Suppose a finite population has [latex]N[/latex] individuals with values for some variable of [latex]x_1, x_2, \ldots, x_N[/latex]. Let the random variable [latex]X[/latex] be the process of picking an individual at random and recording the value of the variable for that individual. If this is a simple random sample then each individual has the same probability of being chosen and since all the probabilities must add up to 1 this probability must be [latex]1/N[/latex]. The expected value of [latex]X[/latex] is thus

\begin{eqnarray*}
\mean{X} & = & \frac{1}{N} x_1 + \frac{1}{N} x_2 + \cdots + \frac{1}{N} x_N \\
& = & \frac{x_1 + x_2 + \cdots + x_N}{N}.
\end{eqnarray*}

This should look familiar – it is the formula for the sample mean but instead of just a sample we have calculated it for the whole population. Thus for sampling from a finite population the expected value is just the **population mean**. Instead of [latex]\mean{X}[/latex] we will sometimes write [latex]\mu_X[/latex] to reflect this.

## Law of Large Numbers

The **law of large numbers** simply states that, as the number of trials increases, sample proportions get closer to probabilities and the sample mean gets closer to the expected value.

A common misinterpretation of this idea is that, for example, if you toss a coin five times and you get heads each time then the next toss is more likely to give tails, to balance things up and get closer to the probability of 0.5. Of course this is not true. If the tosses are independent then the sixth toss is still a 50-50 chance of heads or tails. We could get a hundred tails in a row and that would not matter because after a million more tosses the hundred tails would only be a small wrinkle. The law of large numbers only talks about the long-term, not short-term, behaviour. For a nice example of improbable coin tossing, see the beginning of the film *Rosencrantz and Guildenstern Are Dead* (1990), based on the play by Tom Stoppard.
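The law can be illustrated with a simple simulation. This Python sketch plays the one-number Keno game repeatedly and prints the average winnings per game for increasing numbers of games; the averages settle down towards the expected value of $0.75 (the seed is arbitrary, chosen only to make the run reproducible):

```python
import random

random.seed(1)  # arbitrary seed for a reproducible run

def play_keno():
    """One game: win $3 with probability 1/4, otherwise $0."""
    return 3 if random.random() < 0.25 else 0

# As the number of games grows, the average winnings per game
# drift towards the expected value of $0.75.
for n in [100, 10_000, 1_000_000]:
    average = sum(play_keno() for _ in range(n)) / n
    print(n, average)
```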

The phrase “law of large numbers” was first introduced in 1837 by Siméon Poisson, a French mathematician who we will meet again in Chapter 11.

# Variance

An expected value is the long-run mean of a random variable so it is natural to quantify variability using squared deviations about this mean, just as we did for samples. Essentially we want the expected squared deviation. In the Keno example if we win $3 then the squared deviation from the mean is [latex](3 - 0.75)^2 = 5.0625[/latex]. This happens with probability [latex]\frac{1}{4}[/latex] so the long-term contribution to the squared deviation is [latex]\frac{1}{4} \times 5.0625 = 1.2656[/latex]. Similarly, if we win $0 then the squared deviation is [latex](0 - 0.75)^2 = 0.5625[/latex]. This happens with probability [latex]\frac{3}{4}[/latex] so on average it contributes [latex]\frac{3}{4} \times 0.5625 = 0.4219[/latex] to the squared deviation. Adding these gives the long-run squared deviation, the **variance** of [latex]X[/latex],

\[ \var{X} = 1.2656 + 0.4219 = 1.6875. \]

In general,

\[ \var{X} = \sum_x \pr{X = x} (x - \mu_X)^2. \]

The units of the variance are squared so as for samples we take the square root to get back to the original units, giving the **standard deviation**

\[ \sd{X} = \sqrt{ \sum_x \pr{X = x} (x - \mu_X)^2 }. \]

For the Keno example, [latex]\sd{X} = \$1.30[/latex]. This is more than the expected value of $0.75 and that is what makes the gambling so exciting; the average return is low but there is a lot of variability.

Note that another way of writing the formula for variance is

\[ \var{X} = \mean{(X - \mu_X)^2}, \]

so variance really is the expected long-run squared deviation from the mean.
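These two weighted sums are easy to carry out by hand or by machine. As a sketch in Python (with the probability function stored as a dictionary), the variance is the weighted sum of squared deviations and the standard deviation is its square root:

```python
import math

# Probability function for the Keno winnings X.
pmf = {0: 3/4, 3: 1/4}

mu = sum(p * x for x, p in pmf.items())               # E(X) = 0.75
var = sum(p * (x - mu) ** 2 for x, p in pmf.items())  # var(X) = 1.6875
sd = math.sqrt(var)                                   # sd(X), about 1.30
print(mu, var, sd)
```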

## Population Standard Deviation

As before, suppose a finite population has [latex]N[/latex] individuals with values for some variable of [latex]x_1, x_2, \ldots, x_N[/latex]. Let the random variable [latex]X[/latex] be the process of picking an individual at random and recording the value of the variable for that individual. Each individual’s probability of being chosen is [latex]1/N[/latex]. The standard deviation of [latex]X[/latex] is thus

\begin{eqnarray*}
\sd{X} & = & \sqrt{\frac{1}{N} (x_1 - \mu_X)^2 + \frac{1}{N} (x_2 - \mu_X)^2 + \cdots + \frac{1}{N} (x_N - \mu_X)^2} \\
& = & \sqrt{\frac{\sum (x_j - \mu_X)^2}{N}}.
\end{eqnarray*}

This is the **population standard deviation** and we often write [latex]\sigma_X[/latex] instead of [latex]\sd{X}[/latex]. It is a little different from the sample standard deviation formula since we divide by [latex]N[/latex] instead of [latex]N-1[/latex], giving a slightly smaller measure of variability. This is reasonable since we know the mean we are calculating squared deviations around and so we can be more certain of the result. This is mainly of theoretical interest since for data analysis applications it is unlikely we know the population mean.

## Precision

To give an example of how we will think about population means and standard deviations, the simple game of Keno was played 100 times with a computer (so we didn’t lose any real money). We won 28 of the games, a total of $84 in winnings. The sample mean for each game was $0.84 and the sample standard deviation was $1.35.

For this particular example we know the population mean and standard deviation. In dealing with variables for human or animal populations it will be unlikely that we know these parameters. Instead we use our sample statistics to estimate these unknown values. The sample proportion of wins, 0.28, is pretty close to the population probability 0.25. The sample mean $0.84 is close to the population mean of $0.75. The sample standard deviation $1.35 is close to the population standard deviation of $1.30.

Here we know the population values and so we can see how **precise** our sample statistics are. If we don’t know the population values then how can we be sure our sample statistics are as close as we would like to the unknown values? Estimating the precision of sample statistics is a vital part of statistical analysis and will be discussed starting in Chapter 13.
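A run like the one described can be mimicked with a short simulation. This Python sketch plays 100 games and computes the sample mean and the sample standard deviation (with the usual [latex]n-1[/latex] denominator); the seed, and hence the particular numbers produced, are arbitrary:

```python
import math
import random

random.seed(42)  # arbitrary seed; a different seed gives different sample statistics

# Play 100 games of the one-number Keno: win $3 with probability 1/4.
games = [3 if random.random() < 0.25 else 0 for _ in range(100)]

n = len(games)
wins = sum(1 for g in games if g == 3)
xbar = sum(games) / n                                         # sample mean
s = math.sqrt(sum((g - xbar) ** 2 for g in games) / (n - 1))  # sample sd
print(wins, xbar, s)
```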

# Continuous Random Variables

So far we have motivated and defined the expected value and standard deviation of a discrete random variable. These were sums of outcomes and of squared deviations, respectively, each weighted by its probability. Since all the examples given involved finite numbers of outcomes, these values could be calculated directly as finite sums.

For continuous variables this is not possible since there are an uncountable number of outcomes and the probability of any individual outcome is 0. However, calculus provides a notion of sum, the **integral**, which can be used to make analogous definitions for continuous variables.

Suppose a continuous random variable [latex]X[/latex] comes from a distribution with probability density function [latex]f(x)[/latex]. Here [latex]f(x)[/latex] plays the role of the probability of [latex]X = x[/latex] occurring. Even though that probability is actually 0, we can use [latex]f(x)[/latex] to weight the outcome [latex]x[/latex] for calculating the average

\[ \mean{X} = \mu_X = \int_{-\infty}^{+\infty} f(x) x dx, \]

the **expected value** of [latex]X[/latex]. The integral sign, [latex]\int[/latex], is an elongated ‘S’ for ‘sum’, reflecting its relationship with the discrete sum we’ve seen before.

The **variance** is then

\[ \var{X} = \int_{-\infty}^{+\infty} f(x) (x - \mu_X)^2 dx, \]

giving [latex]\sd{X} = \sqrt{\var{X}}[/latex].

These formulas can be difficult to work out by hand, especially if you haven’t done much calculus before. However, many software packages and graphics calculators can now be used to do these calculations exactly, with algebra, or approximately.
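For instance, the integrals can be approximated numerically with a simple midpoint rule. This Python sketch uses the uniform density [latex]f(x) = 1[/latex] on [latex][0, 1][/latex], a density chosen here purely for illustration, whose exact mean and variance are [latex]\frac{1}{2}[/latex] and [latex]\frac{1}{12}[/latex]:

```python
import math

def f(x):
    """Uniform probability density on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

# Midpoint rule: chop [0, 1] into many small strips of width dx
# and sum f(x) * x * dx, mirroring the integral definition.
n = 100_000
dx = 1.0 / n
midpoints = [(i + 0.5) * dx for i in range(n)]

mu = sum(f(x) * x * dx for x in midpoints)               # exact answer: 1/2
var = sum(f(x) * (x - mu) ** 2 * dx for x in midpoints)  # exact answer: 1/12
print(mu, var, math.sqrt(var))
```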

# Combining Variables

We are frequently interested in the behaviour of combinations of random variables. For example, obtaining a sample mean involves the sum of [latex]n[/latex] random outcomes and we would like to be able to describe the resulting variability of the statistic itself. We will not prove any of the formulas below but most of them are intuitive.

## Shifting and Scaling

Suppose [latex]X[/latex] is the random variable representing the winnings in the game of Keno in the first section of this chapter. Suppose also that the casino has a promotion where they double all your winnings. That is, your winnings now come from the random variable [latex]Y = 2X[/latex], with possible outcomes of $0 and $6. You could work out [latex]\mean{Y}[/latex] from scratch but it is easy to see that if your winnings all double then your expected value will also double. Here [latex]\mean{Y} = 2\times\$0.75 = \$1.50[/latex]. Similarly, the standard deviation will also double, with [latex]\sd{Y} = 2\times\$1.30 = \$2.60[/latex]. In general, if [latex]a[/latex] is any number then

\[ \mean{aX} = a\mean{X} \; \mbox{ and } \; \sd{aX} = |a|\sd{X}. \]

We put the absolute value signs around [latex]a[/latex] because standard deviation can never be negative. For example, if we changed $3 to -$3 in the Keno game, so if you win you pay the casino $3 instead of them paying you, then the new expected value is -$0.75. That is, in the long-run you pay the casino $0.75. However the standard deviation is still $1.30. Even though the direction of payments has changed there is no change in the amount of variability.

Now suppose the casino gives us an extra $2 each time we play, so our winnings come from [latex]Y = X + 2[/latex]. Since the mean of [latex]X[/latex] was $0.75 then the mean of [latex]Y[/latex] is $2.75, very generous. But what is [latex]\sd{Y}[/latex]? This is still the same as [latex]\sd{X}[/latex]. Shifting the distribution does not change how spread out it is. In general, if [latex]b[/latex] is any number then

\[ \mean{X + b} = \mean{X} + b \; \mbox{ and } \; \sd{X + b} = \sd{X}. \]
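Both rules can be checked directly from the probability function. In this Python sketch, scaling or shifting the outcomes of the Keno winnings and recomputing the mean and standard deviation reproduces [latex]\mean{aX} = a\mean{X}[/latex], [latex]\sd{aX} = |a|\sd{X}[/latex], [latex]\mean{X+b} = \mean{X}+b[/latex] and [latex]\sd{X+b} = \sd{X}[/latex]:

```python
import math

def mean_sd(pmf):
    """Mean and standard deviation of a discrete probability function."""
    mu = sum(p * x for x, p in pmf.items())
    sd = math.sqrt(sum(p * (x - mu) ** 2 for x, p in pmf.items()))
    return mu, sd

pmf = {0: 3/4, 3: 1/4}        # Keno winnings X
mu_x, sd_x = mean_sd(pmf)     # 0.75 and about 1.30

doubled = {2 * x: p for x, p in pmf.items()}  # Y = 2X
shifted = {x + 2: p for x, p in pmf.items()}  # Y = X + 2

print(mean_sd(doubled))  # mean doubles to 1.50, sd doubles to about 2.60
print(mean_sd(shifted))  # mean shifts to 2.75, sd is unchanged
```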

## Two Random Variables

Suppose we now look at our total winnings from two games of Keno. This is not the same as the [latex]Y = 2X[/latex] example above since that only involved one random outcome, the [latex]X[/latex]. Instead we now have [latex]Y = X_1 + X_2[/latex], where [latex]X_1[/latex] is the winnings on the first game and [latex]X_2[/latex] is the winnings on the second game.

We expect to win $0.75 on the first game and $0.75 on the second game, so in total we expect to win $1.50. In general,

\[ \mean{X_1 + X_2} = \mean{X_1} + \mean{X_2}. \]

Expected values generally work in this simple intuitive way, giving the same answer as [latex]2\times\$0.75[/latex]. However the standard deviation of [latex]Y[/latex] is not [latex]2\times\$1.30 = \$2.60[/latex]. It turns out that we cannot add standard deviations together directly but we can add together squared deviations.

That is,

\[ \var{X_1 + X_2} = \var{X_1} + \var{X_2}. \]

For two games of Keno we have

\[ \var{X_1 + X_2} = 1.30^2 + 1.30^2 = 3.38, \]

so [latex]\sd{X_1 + X_2} = \sqrt{3.38} = \$1.84[/latex]. This is actually less than $2.60 so adding up several measurements gives a random variable with less variability than the sum of the variabilities. This is a key observation and is what makes **replication** an important part of science. It is also what allows a casino to run, giving reliable returns when averaged over thousands or millions of games.
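This can be verified directly by building the probability function of [latex]Y = X_1 + X_2[/latex] from the joint outcomes of the two games; a sketch in Python:

```python
import math

pmf = {0: 3/4, 3: 1/4}  # winnings on a single game

# Probability function of Y = X1 + X2 for two independent games:
# multiply the probabilities of each pair of outcomes and collect by total.
pmf_y = {}
for x1, p1 in pmf.items():
    for x2, p2 in pmf.items():
        pmf_y[x1 + x2] = pmf_y.get(x1 + x2, 0.0) + p1 * p2

mu = sum(p * y for y, p in pmf_y.items())               # 1.5
var = sum(p * (y - mu) ** 2 for y, p in pmf_y.items())  # 3.375, twice var(X)
print(pmf_y, mu, var, math.sqrt(var))                   # sd is about 1.84
```

The exact variance is [latex]2 \times 1.6875 = 3.375[/latex]; the slightly different 3.38 above comes from using the rounded standard deviation of $1.30.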

There are two important subtleties to be aware of when adding variance. Firstly, variance involves a squaring process and so anything negative becomes positive. Thus

\[ \var{X_1 - X_2} = \var{X_1} + \var{X_2}. \]

If you have two sources of variability then when you combine them you will always have more variability, even if you are subtracting them. This will be a common use of variance, looking at the difference between two populations.

Secondly, suppose we take a random sample of 10 people and let [latex]X[/latex] be the proportion of males and [latex]Y[/latex] be the proportion of females. The kind of data we might get when doing this are given in the table below.

## Sample data of 10 people

| Sample | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| [asciimath]x[/asciimath] | 0.4 | 0.3 | 0.5 | 0.4 | 0.5 |
| [asciimath]y[/asciimath] | 0.6 | 0.7 | 0.5 | 0.6 | 0.5 |

There is clearly variability in [latex]X[/latex] and in [latex]Y[/latex] from sample to sample, so [latex]\var{X} \gt 0[/latex] and [latex]\var{Y} \gt 0[/latex], but what is [latex]\var{X + Y}[/latex]? It will be 0 since [latex]X + Y[/latex] is always 1. This is not what our formula above predicts and the reason is that this [latex]X[/latex] and [latex]Y[/latex] are not **independent**. In fact we can only add variances when the two random variables are independent, so the original equation should be qualified as

\[ \var{X_1 + X_2} = \var{X_1} + \var{X_2}, \; \mbox{ if } X_1\mbox{ and }X_2\mbox{ are independent}. \]

When we come to compare populations using statistical analysis we will make extensive use of this formula. Thus our samples will always have to be independent.
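The male/female example can be mimicked with a small simulation. This Python sketch draws each person's sex as a 50-50 chance (an assumption made purely for illustration): the proportions [latex]X[/latex] and [latex]Y[/latex] each vary from sample to sample, but their total never does, so [latex]\var{X+Y} = 0[/latex] rather than [latex]\var{X} + \var{Y}[/latex]:

```python
import random
import statistics

random.seed(7)  # arbitrary seed for a reproducible run

# For each sample of 10 people, X is the proportion of males
# and Y the proportion of females, so X + Y is always 1.
xs, ys = [], []
for _ in range(1000):
    males = sum(random.random() < 0.5 for _ in range(10))
    xs.append(males / 10)
    ys.append((10 - males) / 10)

totals = [x + y for x, y in zip(xs, ys)]
print(statistics.pvariance(xs), statistics.pvariance(ys))  # both clearly positive
print(statistics.pvariance(totals))                        # essentially 0
```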

## Summary

- The expected value of a random variable gives the long-run mean we would expect to see from it.
- The variance of a random variable gives the expected squared deviation of the variable.
- For the random process of sampling from a population the expected value is the population mean and the standard deviation is the population standard deviation.
- Simple formulas allow us to determine the expected value and standard deviation of more complicated random variables without having to work them out from scratch.

## Exercise 1

Calculate the expected value and standard deviation of the random variable [latex]X[/latex] whose probability distribution is given in this table in the next chapter.

## Exercise 2

Suppose [latex]Y = X_1 + X_2 + \cdots + X_n[/latex] and that [latex]\mean{X_j} = \mu[/latex] and [latex]\sd{X_j} = \sigma[/latex]. What are [latex]\mean{Y}[/latex] and [latex]\sd{Y}[/latex]?

## Exercise 3

If you know some calculus, evaluate [latex]\mean{X}[/latex] and [latex]\sd{X}[/latex] when [latex]X[/latex] is the continuous random variable having the density curve given in this previous figure.

## Exercise 4

As in the section on combining variables, suppose we play two games of Keno and let [latex]X_1[/latex] be the winnings on the first game and [latex]X_2[/latex] be the winnings on the second game. Let the total winnings be \[Y = X_1 + X_2.\]

- What are the possible outcomes of [latex]Y[/latex]?
- Assuming the games are independent, determine the probability function for [latex]Y[/latex] based on the probability function for [latex]X_j[/latex] in the previous table.
- Calculate [latex]\mean{Y}[/latex] and [latex]\sd{Y}[/latex] from the probability function to verify the formulas given in the aforementioned section.

## Exercise 5

Suppose [latex]X[/latex] is the number of coffees purchased in a day by a random university student with probability function as given in the following table.

## Probability function for daily coffees

| [asciimath]x[/asciimath] | [asciimath]0[/asciimath] | [asciimath]1[/asciimath] | [asciimath]2[/asciimath] | [asciimath]3[/asciimath] |
|---|---|---|---|---|
| [asciimath]P(X=x)[/asciimath] | [asciimath]0.3[/asciimath] | [asciimath]0.4[/asciimath] | [asciimath]0.2[/asciimath] | [asciimath]0.1[/asciimath] |

- What are the expected value and standard deviation of [latex]X[/latex]?
- Suppose a café aims to service the coffee needs of 100 students. What are the expected value and standard deviation of the total number of coffees they will sell each day?