# 14 Confidence Intervals

[latex]\newcommand{\pr}[1]{P(#1)} \newcommand{\var}[1]{\mbox{var}(#1)} \newcommand{\mean}[1]{\mbox{E}(#1)} \newcommand{\sd}[1]{\mbox{sd}(#1)} \newcommand{\Binomial}[3]{#1 \sim \mbox{Binomial}(#2,#3)} \newcommand{\Student}[2]{#1 \sim \mbox{Student}(#2)} \newcommand{\Normal}[3]{#1 \sim \mbox{Normal}(#2,#3)} \newcommand{\Poisson}[2]{#1 \sim \mbox{Poisson}(#2)} \newcommand{\se}[1]{\mbox{se}(#1)}[/latex]

Consider the following sequence of reasoning based on what we have seen so far.

- The sample mean has roughly a Normal distribution with mean [latex]\mu[/latex] and standard deviation [latex]\sigma/\sqrt{n}[/latex].
- In a Normal distribution about 95% of observations occur within 1.96 standard deviations of the mean.
- So in 95% of samples the sample mean will be within [latex]1.96\sigma/\sqrt{n}[/latex] of [latex]\mu[/latex].
- Reversing this, in 95% of samples [latex]\mu[/latex] will be within [latex]1.96\sigma/\sqrt{n}[/latex] of [latex]\overline{x}[/latex].

This tells us that when we use the sample mean to estimate the population mean, we can also give an idea of how far away the population mean could be from our estimate. We say we are 95% **confident** that the population mean is

\[ \overline{x} \pm 1.96\frac{\sigma}{\sqrt{n}}, \]

or that it is in the interval

\[ \left(\overline{x} - 1.96\frac{\sigma}{\sqrt{n}}, \overline{x} + 1.96\frac{\sigma}{\sqrt{n}} \right). \]

We call this range a **confidence interval** for the population mean.

This is great, allowing us to say something concrete about a population based on our sample. Unfortunately this expression for a confidence interval is also useless in practice, since there is typically no reason why we would actually know [latex]\sigma[/latex], the population standard deviation. Before we can calculate confidence intervals from real data we need to deal with this issue.
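As a concrete sketch of the formula above, the following Python snippet (standard library only, with made-up numbers for illustration) computes the known-[latex]\sigma[/latex] interval:

```python
import math

def normal_ci(xbar, sigma, n, z_star=1.96):
    """95% confidence interval for the mean when sigma is known."""
    margin = z_star * sigma / math.sqrt(n)
    return (xbar - margin, xbar + margin)

# Hypothetical example: a sample of n = 25 heights with mean 170 cm,
# from a population whose standard deviation is known to be 10 cm.
low, high = normal_ci(170, 10, 25)
print(round(low, 2), round(high, 2))  # 166.08 173.92
```

Of course, this only works because we pretended to know [latex]\sigma[/latex]; the rest of the chapter deals with the realistic case where we do not.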

# Standard Error

In Chapter 13 we saw that if [latex]\overline{X}[/latex] is the mean of a random sample of size [latex]n[/latex] from a population with mean [latex]\mu[/latex] and standard deviation [latex]\sigma[/latex] then

\[ Z = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}} \]

has (approximately) the standard Normal distribution. The standard deviation of the sample mean, [latex]\sigma/\sqrt{n}[/latex], quantifies the precision of the sample mean as an estimate of the population mean [latex]\mu[/latex], as in the confidence interval above.

So how can we calculate the precision of the sample mean when we don’t know [latex]\sigma[/latex]? The simple solution is to estimate [latex]\sigma[/latex] by the sample standard deviation, [latex]s[/latex]. This gives an **estimated standard deviation** of the sample mean,

\[ \se{\overline{x}} = \frac{s}{\sqrt{n}}. \]

The estimated standard deviation of a statistic is its **standard error**.

Note that the standard error of the sample mean, [latex]\se{\overline{x}}[/latex], will be different for each sample, since [latex]s[/latex] will be different for each sample. We have used a lowercase [latex]\overline{x}[/latex] to emphasise that it only makes sense to talk about the standard error of a particular sample mean, [latex]\overline{x}[/latex], rather than of the sample mean process, [latex]\overline{X}[/latex].
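The standard error calculation can be sketched in a few lines of Python (standard library only; the data values are hypothetical):

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the sample mean: s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return s / math.sqrt(len(sample))

# Hypothetical sample of five pulse-rate increases (bpm).
data = [12, 25, 8, 19, 15]
print(round(standard_error(data), 3))
```

Note that `statistics.stdev` uses the [latex]n-1[/latex] divisor, matching the sample standard deviation [latex]s[/latex] used throughout this chapter.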

# Student’s t Distribution

Using the standard error to standardise gives a statistic

\[ \frac{\overline{X} - \mu}{S/\sqrt{n}}, \]

where [latex]\overline{X}[/latex] and [latex]S[/latex] are the sample mean and sample standard deviation. [latex]S[/latex] is a random variable, since it is different for each sample, and so this statistic will have more variability than just the variability from the sampling distribution of [latex]\overline{X}[/latex]. Does it still have a Normal distribution?

To check this we drew 10,000 samples of 5 adult males from Arcadia, recording the mean height in each sample. Here we actually know the entire population, the 1273 adult males in the town, and so we can calculate the population mean [latex]\mu[/latex] = 178.3 cm and standard deviation [latex]\sigma[/latex] = 7.00 cm. The figure below shows a density plot of the standardised [latex]z[/latex] values with the standard Normal distribution overlaid on it. The agreement is close: most values lie between -3 and 3, as expected for a Normal distribution.

Now instead of standardising by the known standard deviation, let us use the sample standard deviations, different for each sample. The distribution is shown in the figure below, again with the standard Normal distribution overlaid.

The difference is subtle but important. There is extra variability in the standardised values, which now range from about -10 to 10 instead of -3 to 3. We say the distribution has “fatter” tails than the standard Normal since the density is heavier in the tails. This is more clearly illustrated in the following figure, where the “S” shape of the Normal probability plot suggests the distribution is stretched out in both directions compared to a Normal density.

The sample standard deviation is a random quantity and has added extra uncertainty to our standardised values. We need to compensate for this when working with sample means and their standard errors. For example, a confidence interval based on 1.96 standard errors no longer gives 95% confidence; for samples of size 5 it produces a correct interval only about 88% of the time. We need to go out more standard errors to maintain the same level of confidence.
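A small simulation makes this loss of coverage visible. The sketch below (standard library only, with a hypothetical standard Normal population) repeatedly builds the interval [latex]\overline{x} \pm 1.96 s/\sqrt{n}[/latex] from samples of size 5 and counts how often it contains the true mean:

```python
import math
import random
import statistics

random.seed(1)  # reproducible illustration

def interval_covers(n, mu=0.0, sigma=1.0, z_star=1.96):
    """Draw a sample of size n and check whether the interval
    xbar +/- 1.96 s/sqrt(n) contains the true mean mu."""
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return abs(xbar - mu) <= z_star * se

trials = 20000
coverage = sum(interval_covers(5) for _ in range(trials)) / trials
print(round(coverage, 3))  # noticeably below 0.95
```

The estimated coverage comes out well short of the nominal 95%, exactly the shortfall the [latex]t[/latex] distribution is designed to repair.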

If [latex]\overline{X}[/latex] and [latex]S[/latex] are the sample mean and sample standard deviation from a random sample then the random variable

\[ T = \frac{\overline{X} - \mu}{S/\sqrt{n}} \]

has the distribution known as **Student’s t distribution**. This distribution was developed by William Gosset, a chemist who worked at the Guinness Brewery in Dublin from 1899. He was not a statistician by training but encountered many small data sets in his work at the Brewery. Over the years he developed methods for working with such data, and published details of the [latex]t[/latex] distribution in 1908 under the pen-name “Student” (Student, 1908).

There are actually many [latex]t[/latex] distributions. To see why, consider the distribution in the following figure. This shows the results of a similar simulation to before but with samples of size 25 instead of size 5.

The distribution is now closer to Normal so the [latex]t[/latex] distribution with [latex]n = 25[/latex] appears to be different to the [latex]t[/latex] distribution with [latex]n = 5[/latex] in our earlier figure. With samples of only 5 people the sample standard deviation [latex]s[/latex] is a pretty unreliable estimate of the unknown [latex]\sigma[/latex], but for samples of 25 people it is more accurate and so the [latex]t[/latex] value calculation is closer to the original [latex]z[/latex] calculation.

In fact we will have a different [latex]t[/latex] distribution for every sample size. What is important is the amount of information we have about the population variability, the quantity we called **degrees of freedom** in Chapter 5. We call the distribution of [latex]T[/latex] from samples of size 5 the [latex]t[/latex] distribution with 4 degrees of freedom, written [latex]t(4)[/latex]. In general, the [latex]t[/latex] values calculated from samples of size [latex]n[/latex] will have the [latex]t(n-1)[/latex] distribution.

## T(4) Distribution

[latex]t[/latex] | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 |
---|---|---|---|---|---|---|---|---|---|---|
0.0 | 0.500 | 0.496 | 0.493 | 0.489 | 0.485 | 0.481 | 0.478 | 0.474 | 0.470 | 0.466 |
0.1 | 0.463 | 0.459 | 0.455 | 0.451 | 0.448 | 0.444 | 0.440 | 0.437 | 0.433 | 0.429 |
0.2 | 0.426 | 0.422 | 0.418 | 0.415 | 0.411 | 0.407 | 0.404 | 0.400 | 0.397 | 0.393 |
0.3 | 0.390 | 0.386 | 0.382 | 0.379 | 0.375 | 0.372 | 0.369 | 0.365 | 0.362 | 0.358 |
0.4 | 0.355 | 0.351 | 0.348 | 0.345 | 0.341 | 0.338 | 0.335 | 0.331 | 0.328 | 0.325 |
0.5 | 0.322 | 0.318 | 0.315 | 0.312 | 0.309 | 0.306 | 0.303 | 0.300 | 0.297 | 0.293 |
0.6 | 0.290 | 0.287 | 0.284 | 0.281 | 0.278 | 0.276 | 0.273 | 0.270 | 0.267 | 0.264 |
0.7 | 0.261 | 0.258 | 0.256 | 0.253 | 0.250 | 0.247 | 0.245 | 0.242 | 0.239 | 0.237 |
0.8 | 0.234 | 0.232 | 0.229 | 0.227 | 0.224 | 0.222 | 0.219 | 0.217 | 0.214 | 0.212 |
0.9 | 0.210 | 0.207 | 0.205 | 0.203 | 0.200 | 0.198 | 0.196 | 0.193 | 0.191 | 0.189 |
1.0 | 0.187 | 0.185 | 0.183 | 0.181 | 0.179 | 0.176 | 0.174 | 0.172 | 0.170 | 0.168 |
1.1 | 0.167 | 0.165 | 0.163 | 0.161 | 0.159 | 0.157 | 0.155 | 0.153 | 0.152 | 0.150 |
1.2 | 0.148 | 0.146 | 0.145 | 0.143 | 0.141 | 0.140 | 0.138 | 0.136 | 0.135 | 0.133 |
1.3 | 0.132 | 0.130 | 0.129 | 0.127 | 0.126 | 0.124 | 0.123 | 0.121 | 0.120 | 0.118 |
1.4 | 0.117 | 0.116 | 0.114 | 0.113 | 0.112 | 0.110 | 0.109 | 0.108 | 0.106 | 0.105 |
1.5 | 0.104 | 0.103 | 0.102 | 0.100 | 0.099 | 0.098 | 0.097 | 0.096 | 0.095 | 0.094 |
1.6 | 0.092 | 0.091 | 0.090 | 0.089 | 0.088 | 0.087 | 0.086 | 0.085 | 0.084 | 0.083 |
1.7 | 0.082 | 0.081 | 0.080 | 0.079 | 0.078 | 0.078 | 0.077 | 0.076 | 0.075 | 0.074 |
1.8 | 0.073 | 0.072 | 0.071 | 0.071 | 0.070 | 0.069 | 0.068 | 0.067 | 0.067 | 0.066 |
1.9 | 0.065 | 0.064 | 0.064 | 0.063 | 0.062 | 0.061 | 0.061 | 0.060 | 0.059 | 0.059 |
2.0 | 0.058 | 0.057 | 0.057 | 0.056 | 0.055 | 0.055 | 0.054 | 0.054 | 0.053 | 0.052 |
2.1 | 0.052 | 0.051 | 0.051 | 0.050 | 0.050 | 0.049 | 0.048 | 0.048 | 0.047 | 0.047 |
2.2 | 0.046 | 0.046 | 0.045 | 0.045 | 0.044 | 0.044 | 0.043 | 0.043 | 0.042 | 0.042 |
2.3 | 0.041 | 0.041 | 0.041 | 0.040 | 0.040 | 0.039 | 0.039 | 0.038 | 0.038 | 0.038 |
2.4 | 0.037 | 0.037 | 0.036 | 0.036 | 0.036 | 0.035 | 0.035 | 0.034 | 0.034 | 0.034 |
2.5 | 0.033 | 0.033 | 0.033 | 0.032 | 0.032 | 0.032 | 0.031 | 0.031 | 0.031 | 0.030 |
2.6 | 0.030 | 0.030 | 0.029 | 0.029 | 0.029 | 0.028 | 0.028 | 0.028 | 0.028 | 0.027 |
2.7 | 0.027 | 0.027 | 0.026 | 0.026 | 0.026 | 0.026 | 0.025 | 0.025 | 0.025 | 0.025 |
2.8 | 0.024 | 0.024 | 0.024 | 0.024 | 0.023 | 0.023 | 0.023 | 0.023 | 0.023 | 0.022 |
2.9 | 0.022 | 0.022 | 0.022 | 0.021 | 0.021 | 0.021 | 0.021 | 0.021 | 0.020 | 0.020 |
3.0 | 0.020 | 0.020 | 0.020 | 0.019 | 0.019 | 0.019 | 0.019 | 0.019 | 0.018 | 0.018 |
3.1 | 0.018 | 0.018 | 0.018 | 0.018 | 0.017 | 0.017 | 0.017 | 0.017 | 0.017 | 0.017 |
3.2 | 0.016 | 0.016 | 0.016 | 0.016 | 0.016 | 0.016 | 0.016 | 0.015 | 0.015 | 0.015 |
3.3 | 0.015 | 0.015 | 0.015 | 0.015 | 0.014 | 0.014 | 0.014 | 0.014 | 0.014 | 0.014 |
3.4 | 0.014 | 0.014 | 0.013 | 0.013 | 0.013 | 0.013 | 0.013 | 0.013 | 0.013 | 0.013 |
3.5 | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 | 0.011 |
3.6 | 0.011 | 0.011 | 0.011 | 0.011 | 0.011 | 0.011 | 0.011 | 0.011 | 0.011 | 0.011 |
3.7 | 0.010 | 0.010 | 0.010 | 0.010 | 0.010 | 0.010 | 0.010 | 0.010 | 0.010 | 0.010 |
3.8 | 0.010 | 0.009 | 0.009 | 0.009 | 0.009 | 0.009 | 0.009 | 0.009 | 0.009 | 0.009 |
3.9 | 0.009 | 0.009 | 0.009 | 0.009 | 0.008 | 0.008 | 0.008 | 0.008 | 0.008 | 0.008 |

This table gives [latex]\pr{T \ge t}[/latex] for [latex]\Student{T}{4}[/latex].

The table above gives probabilities for the [latex]t(4)[/latex] distribution, with a similar structure to the table for the standard Normal distribution. However, since the distribution depends on the degrees of freedom, in practice we would need a whole book of such tables to cater for different sample sizes. A table such as the one below is used instead. This just gives the **critical values** for the distribution, the [latex]t[/latex] scores that have a certain probability to their right. This is exactly what is needed for forming confidence intervals and can also be used to decide significance at certain levels for hypothesis tests, as we will see in Chapter 15.

For large samples there is essentially no difference between the [latex]t[/latex] distribution and the Normal distribution, as you can see from the table below. You may see books referring to using the [latex]t[/latex] distribution as “small sample” methods and using the Normal distribution as “large sample” methods. In fact, whenever you use a sample standard deviation you use a [latex]t[/latex] distribution. It is just that for large samples it does not really matter.
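To get a feel for how quickly the [latex]t[/latex] distribution approaches the Normal, the sketch below compares a hypothetical selection of 95% critical values, read from the chapter's table of critical values, against the Normal value of 1.96:

```python
# Critical values t* for p = 0.025 (the two-sided 95% level), read from
# the table of critical values of Student's t distribution.
t_star = {4: 2.776, 9: 2.262, 30: 2.042, 100: 1.984}
z_star = 1.960  # the Normal value (infinite degrees of freedom)

# How much wider is the t-based margin of error than the Normal one?
widening = {df: round(t / z_star, 3) for df, t in t_star.items()}
print(widening)
```

With 4 degrees of freedom the interval is over 40% wider than the Normal one; by 100 degrees of freedom the difference is barely 1%.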

## Critical values of Student's T distribution

df | 0.25 | 0.10 | 0.05 | 0.025 | 0.01 | 0.005 | 0.001 | 0.0005 | 0.0001 |
---|---|---|---|---|---|---|---|---|---|
1 | 1.000 | 3.078 | 6.314 | 12.71 | 31.82 | 63.66 | 318.3 | 636.6 | 3183.1 |
2 | 0.816 | 1.886 | 2.920 | 4.303 | 6.965 | 9.925 | 22.33 | 31.60 | 70.70 |
3 | 0.765 | 1.638 | 2.353 | 3.182 | 4.541 | 5.841 | 10.21 | 12.92 | 22.20 |
4 | 0.741 | 1.533 | 2.132 | 2.776 | 3.747 | 4.604 | 7.173 | 8.610 | 13.03 |
5 | 0.727 | 1.476 | 2.015 | 2.571 | 3.365 | 4.032 | 5.893 | 6.869 | 9.678 |
6 | 0.718 | 1.440 | 1.943 | 2.447 | 3.143 | 3.707 | 5.208 | 5.959 | 8.025 |
7 | 0.711 | 1.415 | 1.895 | 2.365 | 2.998 | 3.499 | 4.785 | 5.408 | 7.063 |
8 | 0.706 | 1.397 | 1.860 | 2.306 | 2.896 | 3.355 | 4.501 | 5.041 | 6.442 |
9 | 0.703 | 1.383 | 1.833 | 2.262 | 2.821 | 3.250 | 4.297 | 4.781 | 6.010 |
10 | 0.700 | 1.372 | 1.812 | 2.228 | 2.764 | 3.169 | 4.144 | 4.587 | 5.694 |
11 | 0.697 | 1.363 | 1.796 | 2.201 | 2.718 | 3.106 | 4.025 | 4.437 | 5.453 |
12 | 0.695 | 1.356 | 1.782 | 2.179 | 2.681 | 3.055 | 3.930 | 4.318 | 5.263 |
13 | 0.694 | 1.350 | 1.771 | 2.160 | 2.650 | 3.012 | 3.852 | 4.221 | 5.111 |
14 | 0.692 | 1.345 | 1.761 | 2.145 | 2.624 | 2.977 | 3.787 | 4.140 | 4.985 |
15 | 0.691 | 1.341 | 1.753 | 2.131 | 2.602 | 2.947 | 3.733 | 4.073 | 4.880 |
16 | 0.690 | 1.337 | 1.746 | 2.120 | 2.583 | 2.921 | 3.686 | 4.015 | 4.791 |
17 | 0.689 | 1.333 | 1.740 | 2.110 | 2.567 | 2.898 | 3.646 | 3.965 | 4.714 |
18 | 0.688 | 1.330 | 1.734 | 2.101 | 2.552 | 2.878 | 3.610 | 3.922 | 4.648 |
19 | 0.688 | 1.328 | 1.729 | 2.093 | 2.539 | 2.861 | 3.579 | 3.883 | 4.590 |
20 | 0.687 | 1.325 | 1.725 | 2.086 | 2.528 | 2.845 | 3.552 | 3.850 | 4.539 |
21 | 0.686 | 1.323 | 1.721 | 2.080 | 2.518 | 2.831 | 3.527 | 3.819 | 4.493 |
22 | 0.686 | 1.321 | 1.717 | 2.074 | 2.508 | 2.819 | 3.505 | 3.792 | 4.452 |
23 | 0.685 | 1.319 | 1.714 | 2.069 | 2.500 | 2.807 | 3.485 | 3.768 | 4.415 |
24 | 0.685 | 1.318 | 1.711 | 2.064 | 2.492 | 2.797 | 3.467 | 3.745 | 4.382 |
25 | 0.684 | 1.316 | 1.708 | 2.060 | 2.485 | 2.787 | 3.450 | 3.725 | 4.352 |
26 | 0.684 | 1.315 | 1.706 | 2.056 | 2.479 | 2.779 | 3.435 | 3.707 | 4.324 |
27 | 0.684 | 1.314 | 1.703 | 2.052 | 2.473 | 2.771 | 3.421 | 3.690 | 4.299 |
28 | 0.683 | 1.313 | 1.701 | 2.048 | 2.467 | 2.763 | 3.408 | 3.674 | 4.275 |
29 | 0.683 | 1.311 | 1.699 | 2.045 | 2.462 | 2.756 | 3.396 | 3.659 | 4.254 |
30 | 0.683 | 1.310 | 1.697 | 2.042 | 2.457 | 2.750 | 3.385 | 3.646 | 4.234 |
40 | 0.681 | 1.303 | 1.684 | 2.021 | 2.423 | 2.704 | 3.307 | 3.551 | 4.094 |
50 | 0.679 | 1.299 | 1.676 | 2.009 | 2.403 | 2.678 | 3.261 | 3.496 | 4.014 |
60 | 0.679 | 1.296 | 1.671 | 2.000 | 2.390 | 2.660 | 3.232 | 3.460 | 3.962 |
70 | 0.678 | 1.294 | 1.667 | 1.994 | 2.381 | 2.648 | 3.211 | 3.435 | 3.926 |
80 | 0.678 | 1.292 | 1.664 | 1.990 | 2.374 | 2.639 | 3.195 | 3.416 | 3.899 |
90 | 0.677 | 1.291 | 1.662 | 1.987 | 2.368 | 2.632 | 3.183 | 3.402 | 3.878 |
100 | 0.677 | 1.290 | 1.660 | 1.984 | 2.364 | 2.626 | 3.174 | 3.390 | 3.862 |
[latex]\infty[/latex] | 0.674 | 1.282 | 1.645 | 1.960 | 2.326 | 2.576 | 3.090 | 3.291 | 3.719 |

This table gives [latex]t^{*}[/latex] such that [latex]\pr{T \ge t^{*}} = p[/latex], where [latex]\Student{T}{\mbox{df}}[/latex].

# Confidence Interval for a Mean

We can now calculate a confidence interval for a population mean [latex]\mu[/latex], using the [latex]t[/latex] distribution in place of the Normal distribution. The general formula is

\[ \overline{x} \pm t_{n-1}^{*} \frac{s}{\sqrt{n}}, \]

where [latex]t_{n-1}^{*}[/latex] is the number of standard errors required for the desired confidence in the [latex]t(n-1)[/latex] distribution. We’ll see how this comes together in an example.

## Caffeine

Consider the increases in pulse rate of the 10 subjects in Chapter 2 who drank the caffeinated cola. These have mean [latex]\overline{x}[/latex] = 15.80 bpm and standard deviation [latex]s = 8.324[/latex] bpm. Our estimate of the mean increase in pulse rate for this population is thus 15.80 bpm. The standard error of this estimate is

\[ \se{\overline{x}} = \frac{8.324}{\sqrt{10}} = 2.632 \mbox{ bpm}, \]

with [latex]10 - 1 = 9[/latex] degrees of freedom.

For a Normal distribution, 95% of outcomes occur within 1.96 standard deviations of the mean. For the [latex]t[/latex] distributions, the table of Student’s T distribution gives the [latex]t[/latex] scores that have certain areas to the right. The figure below shows that for 95% confidence we require 5% area in the tails.

The [latex]t[/latex] distributions are symmetric and so this means we need 2.5% area to the right. Thus we look under the [latex]p = 0.025[/latex] column of the table of Student’s T distribution. With 9 degrees of freedom the **critical value** is 2.262 standard errors. Note that this is higher than 1.96, reflecting the fact that we need a wider interval to account for the uncertainty that comes from not knowing [latex]\sigma[/latex].

We can now assemble our confidence interval. Based on the sample of 10 subjects we are 95% confident that the mean increase in pulse rate from drinking 250 mL of caffeinated diet cola is

\begin{eqnarray*}
15.80 \pm 2.262 \frac{8.324}{\sqrt{10}} & = & 15.80 \pm 2.262 \times 2.632 \\
& = & 15.80 \pm 5.95 \mbox{ bpm}.
\end{eqnarray*}

Thus we are 95% confident that the mean increase is between about 9.8 bpm and 21.8 bpm.
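The whole calculation can be checked with a short Python sketch (standard library only), using the summary values from the example:

```python
import math

# Values from the caffeine example: n = 10 subjects with mean increase
# 15.80 bpm, sample standard deviation 8.324 bpm, and t*(9) = 2.262.
n, xbar, s, t_star = 10, 15.80, 8.324, 2.262
se = s / math.sqrt(n)
margin = t_star * se
print(round(se, 3), round(margin, 2))                    # 2.632 5.95
print(round(xbar - margin, 1), round(xbar + margin, 1))  # 9.8 21.8
```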

Reflect on where this interval comes from. In particular, note that the 95% is saying that an interval generated in this way will contain the true mean 95% of the time. A confidence interval is a **random interval** and the confidence level gives the probability of this interval containing the value of interest.

The **margin of error** for this interval was 5.95 bpm. This can be used as a measure of the **precision** of the estimate. Confidence intervals can be written either as the actual interval, such as (9.8, 21.8) or 9.8 to 21.8, or as the estimate and margin of error, such as

\[ 15.80 \pm 5.95. \]

If you are interested in seeing whether a particular value for the mean is plausible, then the interval is more useful. If you don’t care about particular values but just want to make clear how accurate your estimate is, then the second form is better.

From the calculation of margin of error, you can see that the estimate can be made more precise by increasing the sample size [latex]n[/latex]. We can also improve precision by accepting lower confidence levels. For example, if we were happy with 90% confidence then the table of Student’s T distribution shows that 1.833 standard errors would be fine. The following figure shows the spectrum of confidence intervals for the mean increase in pulse rate across a range of confidence levels. The three levels most commonly used, 90%, 95% and 99%, are indicated by vertical lines.

# Choosing Sample Size

The margin of error for the increase in pulse rate was 5.95 bpm. Suppose we wanted to be able to estimate the mean increase in pulse rate with a margin of error of 2 bpm. If we keep the 95% confidence level, our only choice is to alter sample sizes.

In an ideal world where we knew the population standard deviation, [latex]\sigma[/latex], the margin of error is

\[ m = 1.96\frac{\sigma}{\sqrt{n}}. \]

Rearranging this gives

\[ n = \left(\frac{1.96\sigma}{m}\right)^2, \]

a simple formula for calculating the required sample size. In practice we need an estimate of [latex]\sigma[/latex] in order to use this formula. For the caffeine example we can treat the data observed as coming from a **pilot study**. This is one reason for pilot studies, to give initial estimates of variability in order to help plan the main study for meeting certain objectives. Here our pilot study estimates [latex]\sigma[/latex] with [latex]s[/latex] = 8.324 and we would like our main study to estimate the mean to within [latex]m[/latex] = 2 bpm at 95% confidence. Thus we require

\[ n = \left(\frac{1.96 s}{m}\right)^2 = \left(\frac{1.96 \times 8.324}{2.0}\right)^2 = 66.5, \]

so at least 67 subjects are needed in a trial to obtain the desired accuracy.
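This sample-size formula is easy to wrap in a small helper; the sketch below (standard library only) reproduces the calculation, rounding up since a fractional subject is not possible:

```python
import math

def required_n(sigma, margin, z_star=1.96):
    """Smallest sample size giving the desired margin of error,
    treating sigma as known (here, estimated from a pilot study)."""
    return math.ceil((z_star * sigma / margin) ** 2)

# Pilot estimate s = 8.324 bpm, desired margin 2 bpm at 95% confidence.
print(required_n(8.324, 2.0))  # 67
# Halving the margin of error roughly quadruples the sample size.
print(required_n(8.324, 1.0))  # 267
```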

Given that [latex]z^* = 1.96[/latex] and [latex]s = 8.324[/latex] are fixed in this calculation, the sample size is proportional to the square of the reciprocal of the desired margin of error,

\[ n \propto \left(\frac{1}{m}\right)^2. \]

Thus in general if you want to halve the margin of error you will need to increase the sample size by a factor of four. The figure below shows this relationship in terms of the ratio of the desired margin of error to the standard deviation. For example, wanting a margin of error of 2 bpm when our estimated standard deviation was 8.324 bpm gives a ratio of [latex]2/8.324 = 0.24[/latex]. The height of the plot at 0.24 is around 67, the required sample size.

Note that the 1.96 in the above calculations could also be replaced by the appropriate [latex]t^*[/latex], though this is hard since we don’t know what [latex]n-1[/latex] will be. An iterative method can be used but in this setting we are usually planning large studies for which the difference between [latex]t^*[/latex] and 1.96 will be negligible. (For [latex]n = 67[/latex], [latex]t^*[/latex] is less than 2.00.) The amount of variability in [latex]s[/latex] between the pilot study and the main study will almost certainly overshadow this small difference in the number of standard errors.

# Prediction Intervals

A confidence interval gives a range of plausible values for the mean of a population. For example, the [latex]n[/latex] = 10 subjects who drank caffeinated cola had a mean increase in pulse rate of [latex]\overline{x}[/latex] = 15.80 bpm and standard deviation [latex]s[/latex] = 8.324 bpm. The 95% confidence interval for the mean increase in pulse rate was

\[ 15.80 \pm 5.95 \mbox{ bpm}. \]

Again this means we are 95% confident that the mean increase in pulse rate is between 9.8 bpm and 21.8 bpm.

Suppose instead we want to predict the increase in pulse rate experienced by another person who drinks 250 mL of caffeinated cola. As always, there are two questions we need to ask and you should take the time to think about the answers before reading on:

- What value should we use for our prediction?
- What is the precision of this prediction likely to be?

If we want to “predict” a population mean then we have the answers to these. We use the sample mean as our prediction and can use its standard error, in conjunction with the [latex]t[/latex] distribution, to quantify its accuracy. In this case we usually say we are “estimating” a population mean; we will use “predict” to refer to making predictions about individual outcomes.

What value should we use to predict the increase for a new person from this population? If we don’t have any other information then our best bet is to use the sample mean, 15.80 bpm. One reason for doing this is that the sample mean is the **mode** of the Normal density curve we might use to describe increases in pulse rate and so is the most likely outcome.

So our prediction for an individual increase is simply the sample mean, 15.80 bpm. To measure how precise this prediction is we need to take into account two sources of variability. Firstly, there is the variability in the sample mean as an estimate. This is what the standard error measures, with a value here of 2.632 bpm, and is what we have used in our confidence intervals. To predict an individual outcome we must also add the natural variability in the population, estimated by the sample standard deviation, 8.324 bpm. For the prediction error we combine these two sources, adding their squares, to obtain

\[ \sqrt{2.632^2 + 8.324^2} = 8.730 \mbox{ bpm}. \]

We use [latex]t_{9}^{*}[/latex] as before, so a 95% **prediction interval** is

\[ 15.80 \pm t_{9}^{*} \; \times 8.730 = 15.80 \pm 19.75 \mbox{ bpm}. \]

This is quite a wide interval, which is not surprising since wanting to be 95% sure of the increase in pulse rate is going to include most possible increases.

A general formula for a prediction interval is

\[ \overline{x} \pm t_{n-1}^{*} s \sqrt{1 + \frac{1}{n}}. \]

Such intervals are rarely used in this context. We will return to them when talking about predictions from least-squares fits, a more common use, in Chapter 18.
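The general prediction-interval formula can be checked against the caffeine numbers with a one-line calculation (Python, standard library only):

```python
import math

# Caffeine numbers again: n = 10, s = 8.324 bpm, t*(9) = 2.262.
n, xbar, s, t_star = 10, 15.80, 8.324, 2.262
margin = t_star * s * math.sqrt(1 + 1 / n)
print(round(margin, 2))  # 19.75
```

Note how the `1` inside the square root, the population variability, dominates the `1/n` term, the sampling variability of the mean, so prediction intervals barely narrow as the sample grows.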

# Assumptions for t Methods

Recall that the [latex]t[/latex] distribution arose from a Normal distribution where we didn’t know the standard deviation. If the population we are sampling from has a Normal distribution then the sample mean has a Normal distribution too. Thus we can use the [latex]t[/latex] methods and all the probabilities will be correct. That is, 95% of the time we make a 95% confidence interval it will contain the population mean.

What if the population is not Normal? In that case the Central Limit Theorem says that the sample mean is approximately Normal and that the approximation gets better as the sample size increases. As long as this approximation is okay then it is fine to use the [latex]t[/latex] methods. Wild and Seber (2000) give the following recommendations as to how big [latex]n[/latex] should be in order for the approximation to be reasonable.

- For small samples (roughly [latex]n \lt 15[/latex]) only use the [latex]t[/latex] methods if the data are close to symmetric and there are no outliers.
- For moderate samples ([latex]15 \le n \le 40[/latex]) use the [latex]t[/latex] methods as long as there are no outliers or strong skewness in the data.
- For large samples (roughly [latex]n \gt 40[/latex]) use the [latex]t[/latex] methods even in the presence of strong skewness, though outliers may still affect results.

Normal probability plots (see Chapter 12) are a useful means for checking for skewness and outliers.

If we cannot use the [latex]t[/latex] procedures then there are various options. One is to use **nonparametric** methods, as we will discuss in Chapter 24. These make different assumptions about the data and so can work where the [latex]t[/latex] methods are not appropriate.

# Transforming Data

The other general approach is to **transform** the data in some way to help satisfy the [latex]t[/latex] assumptions. For example, taking logarithms can make a skewed distribution more symmetric. Analysis can be carried out on the log values and then conclusions can be exponentiated to return to the original units.

## Maximum Drug Concentration

In Chapter 6 we gave a time plot of drug concentrations from a single subject in a bioequivalence study. The maximum drug concentration during the 8 hours for this subject was 627 [latex]\mu[/latex]g/L for the reference formulation and 704 [latex]\mu[/latex]g/L for the test formulation. The maximum concentrations for all 24 subjects are shown in the following table, along with the time taken to reach that concentration. One condition for bioequivalence is that the maximum concentration of the test formulation should be between 80% and 125% of the maximum concentration of the reference formulation.

## Maximum drug concentration ([asciimath]\mu[/asciimath]g/L) and time to maximum (hours)

Subject | Max (Test) | Max (Reference) | Time (Test) | Time (Reference) |
---|---|---|---|---|
1 | 704 | 627 | 1.33 | 4.00 |
2 | 767 | 618 | 0.98 | 1.02 |
3 | 530 | 336 | 0.98 | 2.00 |
4 | 1322 | 1369 | 1.67 | 1.35 |
5 | 796 | 984 | 4.48 | 1.33 |
6 | 576 | 423 | 0.98 | 1.00 |
7 | 3097 | 2212 | 1.33 | 2.00 |
8 | 1429 | 1696 | 1.33 | 2.33 |
9 | 626 | 690 | 2.67 | 1.33 |
10 | 433 | 430 | 1.33 | 1.33 |
11 | 4136 | 3088 | 1.00 | 2.02 |
12 | 527 | 458 | 2.00 | 1.33 |
13 | 1223 | 1516 | 3.00 | 2.00 |
14 | 1267 | 674 | 1.67 | 2.02 |
15 | 374 | 603 | 1.65 | 2.00 |
16 | 1026 | 926 | 1.33 | 1.67 |
17 | 2265 | 2062 | 1.67 | 1.32 |
18 | 445 | 461 | 1.32 | 1.35 |
19 | 774 | 690 | 1.33 | 4.00 |
20 | 796 | 984 | 4.48 | 1.33 |
21 | 1226 | 1499 | 3.00 | 2.00 |
22 | 553 | 481 | 2.00 | 1.33 |
23 | 1284 | 1592 | 3.00 | 2.00 |
24 | 1330 | 655 | 1.67 | 1.67 |

The figure below shows a comparison of the maximum drug concentrations between the test and reference groups.

To check the bioequivalence condition we need a confidence interval estimating the difference between maximum concentrations; we can then see whether the condition holds in this study.

However, the previous figure suggests that the distributions of maximum concentrations are highly skewed, undermining the applicability of a [latex]t[/latex] confidence interval. It is standard to take logarithms in this case to help make the data more symmetric: the figure below shows the same data plotted on a log scale. The transformed data are not perfectly symmetric, but they are much closer and, given the sample size, a [latex]t[/latex] procedure is now appropriate.

The second step is to notice that these are not independent sets of measurements, since they come from the same patients. The two previous figures are somewhat misleading in suggesting a comparison between two groups. What we are interested in is whether there is a difference between the maximum concentrations within each patient, so the simple approach is to take the difference between each pair of transformed measurements. These values are shown in the table below for you to check.

## Differences in log-transformed maximum drug concentrations

0.050 | 0.094 | 0.198 | -0.015 | -0.092 | 0.134 |
0.146 | -0.074 | -0.042 | 0.003 | 0.127 | 0.061 |
-0.093 | 0.274 | -0.207 | 0.045 | 0.041 | -0.015 |
0.050 | -0.092 | -0.087 | 0.061 | -0.093 | 0.308 |

This is now a one-sample problem. The sample mean of the [latex]n = 24[/latex] observations is 0.0326 with standard deviation 0.12448. This gives a 95% confidence interval (for the population difference between the transformed maximum concentrations) of

\[ 0.0326 \pm t_{23}^{*} \frac{0.12448}{\sqrt{24}} = 0.0326 \pm 0.0525, \]

a range of -0.0199 to 0.0851. This is hard to interpret directly, but recall that

\[ \log(\mbox{Test}) - \log(\mbox{Reference}) = \log \left(\frac{\mbox{Test}}{\mbox{Reference}}\right), \]

so undoing our transformation will give information about the **ratio** of the maximum concentrations. In fact, the values in the previous table could have been calculated by taking the logs of the ratios, rather than the differences in the logs. For example, the first subject had a maximum concentration of 704 [latex]\mu[/latex]g/L for the test formulation compared to 627 [latex]\mu[/latex]g/L for the reference formulation. This gives a ratio of [latex]704/627 = 1.123[/latex], and then [latex]\log(1.123) = 0.050[/latex], as before.
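The check for the first subject is a one-liner (Python, standard library only; note the logarithms in this example are base 10):

```python
import math

# Subject 1 from the table: test 704 and reference 627 (µg/L).
ratio = 704 / 627
print(round(ratio, 3), round(math.log10(ratio), 3))  # 1.123 0.05
```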

Thus the estimated mean ratio is

\[ 10^{0.0326} = 1.078, \]

so we estimate that the test formulation is giving about a 7.8% higher maximum concentration on average than the reference formulation. We can do the same for the confidence interval bounds, giving a 95% interval for the true ratio to be

\[ (10^{-0.0199}, 10^{0.0851}) = (0.955, 1.216). \]

Note that the middle of this range is 1.086, which is different to the estimated mean, 1.078. All the confidence intervals we have seen so far have been symmetric about their estimate, but when you use a transformation like this you may end up with an **asymmetric** interval.

The upper range of this interval gives a ratio with the test maximum 21.6% higher than the reference maximum. This is below the prescribed limit of 125%, so this suggests that the two formulations are bioequivalent in terms of maximum drug concentration.
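The whole back-transformed interval can be reproduced from the summary values (Python, standard library only):

```python
import math

# Summary values from the example: n = 24 log10 ratios with mean
# 0.0326, standard deviation 0.12448, and t*(23) = 2.069.
n, mean_log, s_log, t_star = 24, 0.0326, 0.12448, 2.069
margin = t_star * s_log / math.sqrt(n)
low, high = mean_log - margin, mean_log + margin
# Back-transform to get a confidence interval for the ratio itself.
print(round(10 ** low, 3), round(10 ** high, 3))
```

The asymmetry of the back-transformed interval about the estimated ratio 1.078 falls straight out of the exponentiation.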

## Geometric Mean

Suppose we have [latex]n[/latex] observations [latex]x_1, x_2, \ldots, x_n[/latex] and we work out the sample mean of their log-transformed values,

\begin{eqnarray*}
\frac{\log(x_1) + \log(x_2) + \cdots + \log(x_n)}{n} & = & \frac{\log(x_1 \times x_2 \times \cdots \times x_n)}{n} \\
& = & \log\left( \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n}\right).
\end{eqnarray*}

In the above example we did this calculation for [latex]n=24[/latex] observations and found the sample mean of 0.0326 for transformed ratios. However, we wanted an estimate of the mean ratio and so we removed the logarithm by calculating [latex]10^{0.0326} = 1.078[/latex]. Doing this for the general formula gives

\[ \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n}. \]

You may have seen this formula before. It is the **geometric mean** of the [latex]n[/latex] values. Thus the mean ratio we calculate using the log-transformed ratios is actually the geometric mean, not the regular sample mean (also known as the **arithmetic mean**).

This highlights one of the roles of the geometric mean. It is never greater than the arithmetic mean, and so for right-skewed data it may be a more appropriate measure of the centre of the distribution. Indeed that is why we were using the log-transformed data, overcoming the strong skewness present in the data.
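The equivalence between "back-transformed mean of the logs" and "nth root of the product" is easy to see numerically (Python, standard library only, with toy values):

```python
import math

def geometric_mean(values):
    """The nth root of the product, via the mean of the log10 values."""
    return 10 ** (sum(math.log10(v) for v in values) / len(values))

data = [2, 8]  # tiny illustration
# The geometric mean (4) falls below the arithmetic mean (5).
print(geometric_mean(data), sum(data) / len(data))
```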

# Summary

- Confidence intervals make statements about a population parameter based on the known sampling distribution of a statistic.
- For confidence intervals based on the sample mean we need to use Student’s [latex]t[/latex] distribution to calculate the margin of error. For a single sample mean this distribution has degrees of freedom [latex]n-1[/latex].
- The margin of error can be controlled by changing the confidence level or by choosing an appropriate sample size.
- Prediction intervals involve greater uncertainty since they have to include the underlying variability of the response, in addition to the sampling variability of the mean estimate.
- For small samples the [latex]t[/latex] confidence intervals are only valid when the data are roughly symmetric and free of outliers. If these assumptions are not met then one option is to transform the data, using logarithms or other transformations.

## Exercise 1

Check the critical values for [latex]t(4)[/latex] given in the table of Student’s T distribution against the probabilities given in the [latex]t(4)[/latex] distribution.

## Exercise 2

Based on the data from the oxytocin study, calculate a 95% confidence interval for the mean basal plasma oxytocin level for single women.

## Exercise 3

Calculate a 95% confidence interval for the mean resting pulse rate of all Islanders based on the sample in the survey data. Would you have any concerns about using this interval?