# 17 Inferences for Proportions

[latex]\newcommand{\pr}[1]{P(#1)} \newcommand{\var}[1]{\mbox{var}(#1)} \newcommand{\mean}[1]{\mbox{E}(#1)} \newcommand{\sd}[1]{\mbox{sd}(#1)} \newcommand{\Binomial}[3]{#1 \sim \mbox{Binomial}(#2,#3)} \newcommand{\Student}[2]{#1 \sim \mbox{Student}(#2)} \newcommand{\Normal}[3]{#1 \sim \mbox{Normal}(#2,#3)} \newcommand{\Poisson}[2]{#1 \sim \mbox{Poisson}(#2)} \newcommand{\se}[1]{\mbox{se}(#1)} \newcommand{\prbig}[1]{P\left(#1\right)}[/latex]

# Binomial Tests

Suppose a coin is tossed 10 times and each time it comes up heads. Should we be suspicious? How suspicious? One way of quantifying this is to calculate the probability that this could happen for a fair coin. If [latex]X[/latex] is the number of heads then [latex]\Binomial{X}{10}{0.5}[/latex] if the coin is fair and [latex]\pr{X = 10}[/latex] can be found from the binomial distribution table or by

\[ \pr{X = 10} = 0.5 \times 0.5 \times \cdots \times 0.5 = (0.5)^{10} = 0.001. \]

This is pretty unlikely, only a 1 in 1000 chance, so our suspicions are justified.

What assumptions did we use in our probability calculation? An important one is **independence**, that each toss is independent of the others. We will always assume that our samples are independent.

The other assumption is that the coin is fair and so the binomial probability is [latex]p = 0.5[/latex], where [latex]p[/latex] is the probability of a head. We now have evidence to suspect that this is wrong.

In this example our null hypothesis was that the coin was fair, so we write

[latex]H_{0}: p = 0.5[/latex].

In this example, before we saw the coin being tossed we would have had no reason to suspect the tossing was biased in a particular direction, so our alternative is two-sided. In symbols we would write [latex]H_1: p \ne 0.5[/latex]. In this case we would have been just as suspicious if [latex]X = 0[/latex], seeing all tails. The [latex]P[/latex]-value is then the probability that [latex]X=10[/latex] or [latex]X=0[/latex], that is, 0.001 + 0.001 = 0.002. In most cases the **null distribution**, the distribution of our count or statistic if [latex]H_0[/latex] is true, is symmetric. This means that the two-sided [latex]P[/latex]-value is just double the one-sided [latex]P[/latex]-value.

On the other hand, suppose the person tossing the coin had claimed beforehand that they had a special skill in always getting heads. For example, the former magician Persi Diaconis was able to toss a coin so that it rotated the same number of times, always landing the same side up. He went on to become a professor of statistics at Harvard and is now at Stanford University. In such a case we would be specifically looking to see whether there was a high proportion of heads, so we would use the one-sided alternative [latex]H_1: p \gt 0.5[/latex]. The one-sided [latex]P[/latex]-value is [latex]\pr{X = 10}[/latex] = 0.001.

In both of these cases we would say we **reject** the null hypothesis in favour of the alternative. In the two-sided case we would say “there is very strong evidence to suggest that the coin tossing is not fair”. In the one-sided case we would say “there is very strong evidence to suggest that the coin comes up heads more often than a fair coin should.”

Suppose that instead of seeing all 10 heads, the coin came up heads 9 times out of 10. What would the one-sided [latex]P[/latex]-value be now? Is it [latex]\pr{X = 9}[/latex]? Not quite. In general we want to know how likely something **as extreme or more extreme** as that observed could happen by chance. Getting 10 heads would be more extreme than 9, so our [latex]P[/latex]-value here is

\[ \pr{X \ge 9} = \pr{X = 9} + \pr {X = 10} = 0.010 + 0.001 = 0.011. \]

The probability of getting 9 heads or something even more unusual just by chance is 0.011. This is still quite small, about 1 in 100, so seeing 9 heads is still evidence to reject [latex]H_0[/latex].
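These tail probabilities can be computed directly from the Binomial formula. A minimal sketch in Python, using only the standard library (the function names here are ours, not from any particular package):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(k, n, p):
    """P(X >= k), the 'as extreme or more extreme' probability."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))

# 10 heads in 10 tosses of a fair coin
p_ten = upper_tail(10, 10, 0.5)   # one-sided P-value, about 0.001
p_two_sided = 2 * p_ten           # doubled since the null distribution is symmetric

# 9 or more heads in 10 tosses
p_nine = upper_tail(9, 10, 0.5)   # about 0.011
```

The same two functions cover every exact binomial test in this section by changing [latex]k[/latex], [latex]n[/latex] and [latex]p[/latex].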

One reason for using this approach is that the probability of individual outcomes is often very small. If we toss a fair coin 20 times then we expect to get 10 heads on average, but the probability of getting exactly 10 heads is only 0.176. It is worse for continuous variables where the probability of any outcome is 0. To carry out hypothesis tests we work with extreme ranges instead.

## Binomial Power

Suppose this time you toss a coin 20 times. How many heads or tails would you have to observe to reject the null hypothesis of a fair coin at the 5% level?

To answer this, let [latex]X[/latex] be the number of heads from 20 tosses of a fair coin. From the cumulative binomial distribution table, [latex]\pr{X \ge 15} = 0.021[/latex]. So if you got 15 heads (or tails) the two-sided [latex]P[/latex]-value would be 0.042, giving evidence at the 5% level. (If you only got 14 heads or tails the [latex]P[/latex]-value would be 0.116, giving no evidence.)

Thus you need at least 15 heads or tails to get evidence against the coin being fair. Suppose in fact the coin is weighted so that the probability of heads is 0.2. What is the probability we get at least 15 heads or tails then?

The cumulative binomial distribution table shows that the probability of getting 15 or more heads if [latex]p = 0.2[/latex] is pretty much 0. What is the probability of at least 15 tails? This would be 5 or fewer heads. Using the binomial distribution, this probability is

\[ \pr{X \le 5} = 0.012 + 0.058 + 0.137 + 0.205 + 0.218 + 0.175 = 0.805. \]

Thus the total probability of obtaining at least 15 heads or tails, if the probability of heads is actually 0.2, is 0.805. This means we will reject the null hypothesis of a fair coin 80.5% of the time, so the power of this test procedure is 80.5%.
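The power calculation can be checked by summing exact Binomial probabilities. A sketch; note the table values above are rounded to three decimals, so the exact total comes out slightly under 0.805:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Rejection region for H0: p = 0.5 with n = 20 tosses: X >= 15 or X <= 5 heads.
# Power when the true probability of heads is 0.2:
power = (sum(binom_pmf(k, 20, 0.2) for k in range(15, 21))  # at least 15 heads, ~0
         + sum(binom_pmf(k, 20, 0.2) for k in range(6)))    # at least 15 tails, ~0.804
```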

# Normal Approximation

We know that the sample count has a Binomial distribution. The sample proportion has a distribution of the same shape as the sample count, since it is obtained by scaling by [latex]\frac{1}{n}[/latex]. However, the sample proportion can also be written as

\[ \hat{P} = \frac{B_{1} + B_{2} + \cdots + B_{n}}{n}, \]

where the [latex]B_{j}[/latex] are random variables corresponding to the individual trials, with [latex]B_j = 1[/latex] for success and [latex]B_j = 0[/latex] for failure, as in Chapter 11. This formula should look familiar — it is simply a sample mean. The Central Limit Theorem says that any sample mean will tend to have a Normal distribution as [latex]n[/latex] gets large, so the sample proportion and sample count are both approximately Normal for large [latex]n[/latex].

The figure below shows the Binomial(10, 0.5) distribution of counts with its Normal approximation, using the mean and standard deviation from the above formulas.

Since Binomial distributions with [latex]p = 0.5[/latex] are symmetric, the symmetric Normal distribution approximates them easily. Compare this to the figure below, which shows the Binomial(10, 0.1) distribution.

This distribution is very skewed to the right and so the Normal distribution is not such a good approximation. If [latex]p[/latex] is far from 0.5 then we need larger sample sizes before the Normal approximation is appropriate. The figure below shows the Binomial(50, 0.1) distribution. This is still a little skewed but the Normal approximation looks more reasonable.

A general rule of thumb is that the Normal approximation can be used as long as

\[ np \ge 10 \; \mbox{ and } \; n(1-p) \ge 10. \]

If [latex]p = 0.5[/latex] then [latex]n = 20[/latex] is large enough while for [latex]p = 0.1[/latex] or [latex]p = 0.9[/latex] you should have at least [latex]n = 100[/latex].

As an example, suppose we want to find the probability [latex]\pr{X \ge 16}[/latex] for [latex]\Binomial{X}{20}{0.5}[/latex], a borderline case for our rule. The binomial distribution gives

\begin{eqnarray*}
\pr{X \ge 16} & = & \pr{X = 16} + \pr{X = 17} + \cdots + \pr{X = 20} \\
& = & 0.005 + 0.001 \\
& = & 0.006,
\end{eqnarray*}

or the cumulative binomial distribution table can be used to directly find [latex]\pr{X \ge 16} = 0.006[/latex].

For the Normal approximation we use [latex]\mean{X} = np = 10[/latex] and

\[ \sd{X} = \sqrt{20(0.5)(1-0.5)} = 2.236. \]

Then

\[ \pr{X \ge 16} \simeq P\left(Z \ge \frac{16 - 10}{2.236}\right) = \pr{Z \ge 2.68} = 0.004, \]

using the normal distribution table.

## Continuity Correction

This approximation is close but we can actually do a bit better. In a Binomial distribution we have

\[ \pr{X \le 15} + \pr{X \ge 16} = 1, \]

since there are no possibilities between 15 and 16. However, a continuous approximation to this will leave a gap between 15 and 16. That is, if [latex]X[/latex] was considered to be continuous then

\[ \pr{X \le 15} + \pr{X \ge 16} \lt 1, \]

since there are a range of continuous values between 15 and 16. A simple way to fix this and improve the approximation is to fill the gap by adding everything up to 15.5 to the left and everything down to 15.5 on the right, giving

\[ \pr{X \le 15.5} + \pr{X \ge 15.5} = 1. \]

This statement is still true if [latex]X[/latex] is discrete. This process is called a **continuity correction**. We can now estimate [latex]\pr{X \ge 16}[/latex] again by

\[ \pr{X \ge 16} \simeq P\left(Z \ge \frac{15.5 - 10}{2.236}\right) = \pr{Z \ge 2.46} = 0.007, \]

closer to the exact value of 0.006 than before.
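Both approximations can be reproduced from the standard Normal CDF. A sketch using only Python's standard library (the `phi` helper is ours):

```python
from math import erf, sqrt

def phi(z):
    """Standard Normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 20, 0.5
mu = n * p                       # E(X) = 10
sigma = sqrt(n * p * (1 - p))    # sd(X) = 2.236

# P(X >= 16) without the continuity correction
plain = 1 - phi((16 - mu) / sigma)        # about 0.004

# P(X >= 16) with the continuity correction
corrected = 1 - phi((15.5 - mu) / sigma)  # about 0.007, closer to the exact 0.006
```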

## Confidence Intervals

Since sample proportions are approximately Normal we can use the same reasoning we saw in Chapter 14 to calculate confidence intervals for population proportions. Our estimate for the population proportion, [latex]p[/latex], is the sample proportion, [latex]\hat{p}[/latex], with standard deviation

\[ \sd{\hat{P}} = \sqrt{\frac{p(1-p)}{n}}. \]

We can’t use this directly since the standard deviation involves [latex]p[/latex], the unknown parameter we are trying to estimate. Instead we also estimate [latex]p[/latex] in the formula by [latex]\hat{p}[/latex], giving the **standard error**

\[ \se{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}. \]

Thus the general formula for a confidence interval for a population proportion is

\[ \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}. \]

Note that this is not related to the [latex]t[/latex] distribution in any way, so we get our **critical value**, [latex]z^{*}[/latex], from the Normal distribution. For example, for 95% confidence we use the original value of 1.96 standard errors (see the [latex]\infty[/latex] row in Student’s T distribution). Whenever we work with proportions or other discrete variables we will use [latex]z[/latex] instead of [latex]t[/latex].

Sex Ratio

The 60 Islanders in the survey data are a sample from the people in the three large towns. The observed proportion who were female was [latex]\hat{p}[/latex] = 26/60 = 0.433.

To give a 95% confidence interval for the proportion of females in the population we are sampling from we use

\[ 0.433 \pm 1.96 \sqrt{\frac{0.433(1-0.433)}{60}} = 0.433 \pm 0.125. \]

It is worth emphasising that these are numbers with units. For instance, the margin of error is 12.5%, or 12.5 percentage points. This is different from the 95% confidence level, which is a probability statement about the sampling process. Combining proportions with confidence levels and [latex]P[/latex]-values can get a bit confusing, so try to keep the distinction clear in your mind.

Our confidence interval is then 43.3% [latex]\pm[/latex] 12.5%, or (30.8%, 55.8%). That is, we are 95% sure that the proportion of females in the population we are sampling from is between 30.8% and 55.8%. This is quite a wide interval, largely because the sample size of [latex]n = 60[/latex] is small. When dealing with proportions you need very large samples to get margins of error of just a few percentage points. For example, based on this estimate, to obtain a margin of error of [latex]m = 0.01[/latex] (1%) we would need a sample of size

\[ n = \left(\frac{z^{*}}{m}\right)^2 \hat{p}(1-\hat{p}) = \left(\frac{1.96}{0.01}\right)^2 0.433(1-0.433) = 9432. \]
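Both steps are easy to script. A sketch, using the rounded [latex]\hat{p} = 0.433[/latex] in the sample-size step so the result matches the working above:

```python
from math import sqrt, ceil

def proportion_ci(p_hat, n, z_star=1.96):
    """Normal-approximation confidence interval for a population proportion."""
    se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of p-hat
    return p_hat - z_star * se, p_hat + z_star * se

lo, hi = proportion_ci(26 / 60, 60)      # about (0.308, 0.559)

# Sample size for a 1% margin of error, using the rounded estimate 0.433
n_needed = ceil((1.96 / 0.01) ** 2 * 0.433 * (1 - 0.433))  # 9432
```

Note that a sample size is always rounded up, since rounding down would give a margin of error slightly larger than required.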

## Hypothesis Tests

The general method for a hypothesis test of [latex]H_0: p = p_0[/latex] is to use the [latex]z[/latex] statistic

\[ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p(1-p)}{n}}}, \]

and compare it to the Normal distribution. There are two ways of dealing with the unknown [latex]p[/latex] in the standard deviation, as shown in the following examples.

Mendel

Campbell and Reece (2002) describe Mendel’s experiments beginning in the late 1850s with crossing pea varieties having different traits. For example, starting with two homozygous varieties, one with purple flowers and one with white flowers, the first generation of plants all had purple flowers while the second generation had 705 plants with purple flowers and 224 plants with white flowers. The theory Mendel developed would suggest a 3:1 ratio of purple to white, if purple was the dominant trait. Are his experimental results consistent with this theory?

To phrase this as a hypothesis test, let [latex]p[/latex] be the probability of a plant in the second generation having purple flowers. The null hypothesis is that there is a 3:1 ratio so

\[ H_0 : p = 0.75. \]

The alternative is two-sided since too many or too few purple flowers would be evidence against the theory, so

\[ H_1 : p \ne 0.75. \]

We estimate [latex]p[/latex] by the sample proportion

\[ \hat{p} = \frac{705}{705 + 224} = \frac{705}{929} = 0.7589. \]

This seems pretty close to the value predicted by the theory. There is actually some controversy about Mendel’s results because as a whole they seem to fit the theory too well. His experiments and results were discussed in detail by Fisher (1936a). Letters to *Nature*, such as those by Van Valen (1987) and Edwards (1987), highlight the issues in the debate. So in practical terms his results seem consistent with the theory. Is there any statistical evidence against the theory though? We should compare the difference in terms of standard errors, where

\[ \se{\hat{p}} = \sqrt{\frac{0.7589(1 - 0.7589)}{929}} = 0.0140, \]

so that the number of standard errors from the predicted value is

\[ z = \frac{0.7589 - 0.75}{0.0140} = 0.64. \]

From the Normal table the one-sided [latex]P[/latex]-value for this [latex]z[/latex] score is 0.261, giving a two-sided [latex]P[/latex]-value of 0.522. Thus there is no evidence that the data do not match the theory.

## Hypothesised Standard Deviation

Confidence intervals and hypothesis tests for a population mean were identical in terms of the standard errors they used. For proportions the standard deviation formula,

\[ \sd{\hat{P}} = \sqrt{\frac{p(1-p)}{n}}, \]

involves the proportion [latex]p[/latex]. In the above examples we estimated [latex]p[/latex] with the sample proportion [latex]\hat{p}[/latex] to give the standard error. However we always calculate [latex]P[/latex]-values assuming that the null hypothesis is true and here [latex]H_0[/latex] gives a value for the unknown [latex]p[/latex]. As before we denote this by [latex]p_0[/latex] since it is the value hypothesised by [latex]H_0[/latex]. We have then hypothesised a value for the standard deviation,

\[ \sd{\hat{P}} = \sqrt{\frac{p_0(1-p_0)}{n}}, \]

so for hypothesis testing for a single proportion we don’t need to use the standard error. Note that this doesn’t happen for testing means because the null hypothesis doesn’t say anything about the standard deviation.

For Mendel’s data the standard deviation using the hypothesised value is

\[ \sqrt{\frac{0.75(1 - 0.75)}{929}} = 0.0142, \]

very similar to before since the observed [latex]\hat{p}[/latex] is so close to [latex]p_0[/latex]. The [latex]z[/latex] value becomes

\[ z = \frac{0.7589 - 0.75}{0.0142} = 0.63, \]

giving a one-sided [latex]P[/latex]-value of 0.264 and a two-sided [latex]P[/latex]-value of 0.528. The conclusions are the same as before.
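Both versions of the one-proportion test can be sketched as follows; at full precision the [latex]z[/latex] values differ slightly in the last digit from the table-rounded working above:

```python
from math import erf, sqrt

def phi(z):
    """Standard Normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p_hat, p0, n = 705 / 929, 0.75, 929

# Test statistic using the standard error (p estimated by p-hat)
se = sqrt(p_hat * (1 - p_hat) / n)
z_se = (p_hat - p0) / se

# Test statistic using the hypothesised standard deviation (p = p0 under H0)
sd0 = sqrt(p0 * (1 - p0) / n)
z_sd = (p_hat - p0) / sd0

p_two_sided = 2 * (1 - phi(z_sd))   # roughly 0.53, no evidence against H0
```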

# Comparing Two Proportions

Nicotine Inhalers

Bolliger et al. (2000) describe a randomised double-blind experiment on the effectiveness of oral nicotine inhalers in reducing smoking. This involved 400 volunteers who had smoked at least 15 cigarettes a day for at least 3 years, and who had tried to reduce their smoking but had failed to do so. The subjects were given an oral inhaler to use as needed, for up to 18 months, and were encouraged to limit their smoking as much as possible. Nicotine inhalers were randomly assigned to half of the subjects while the other half received a placebo inhaler.

After 4 months the researchers recorded which subjects had sustained a reduction of at least 50% in the number of cigarettes smoked each day. The table below gives a two-way table of these results, with 26% of the nicotine group achieving a smoking reduction compared to only 9% for the placebo group.

## Sustained reductions after 4 months of inhaler use

|  | Nicotine | Placebo |
|---|---|---|
| Reduction | 52 | 18 |
| No Reduction | 148 | 182 |
| Total | 200 | 200 |

Is there any evidence that a nicotine inhaler helped smokers sustain a reduction in smoking?

We can answer this question by comparing the proportions of all smokers who sustain a reduction in smoking using the nicotine inhaler ([latex]p_1[/latex]) and using the placebo inhaler ([latex]p_2[/latex]).

As with a comparison of two population means, to compare [latex]p_1[/latex] and [latex]p_2[/latex] we look at their difference, [latex]p_1 - p_2[/latex], and estimate it using [latex]\hat{p}_1 - \hat{p}_2[/latex]. Here we have [latex]\hat{p}_1 = 0.26[/latex] from [latex]n_1 = 200[/latex], and [latex]\hat{p}_2 = 0.09[/latex] from [latex]n_2 = 200[/latex]. The difference in proportions is 0.26 - 0.09 = 0.17, so the success proportion for the nicotine group seems to be 17% higher than for the placebo group.

This estimate can be thought of as an outcome of the sampling process [latex]\hat{P}_1 - \hat{P}_2[/latex], the process of taking two random samples and then returning the difference between their sample proportions. Both [latex]\hat{P}_1[/latex] and [latex]\hat{P}_2[/latex] have approximately Normal distributions so the difference is approximately Normal with

\[ \mean{\hat{P}_1 - \hat{P}_2} = \mean{\hat{P}_1} - \mean{\hat{P}_2} = p_1 - p_2, \]

and

\[ \sd{\hat{P}_1 - \hat{P}_2} = \sqrt{ \frac{p_{1}(1 - p_{1})}{n_1} + \frac{p_{2}(1 - p_{2})}{n_2}}. \]

As usual, there is no reason why we would know [latex]p_1[/latex] or [latex]p_2[/latex], since they are the values we are trying to estimate, so we must use the standard error instead

\[ \se{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}_{1}(1 - \hat{p}_{1})}{n_1} + \frac{\hat{p}_{2}(1 - \hat{p}_{2})}{n_2}}. \]

## Confidence Intervals

For a confidence interval, these formulas give

\[ (\hat{p}_1 - \hat{p}_2) \pm z^{*} \sqrt{\frac{\hat{p}_{1}(1 - \hat{p}_{1})}{n_1} + \frac{\hat{p}_{2}(1 - \hat{p}_{2})}{n_2}}, \]

where [latex]z^{*}[/latex] is the number of standard deviations in the Normal distribution for the required confidence.

For the nicotine inhaler study we can calculate a 95% confidence interval for the difference between the proportions as

\begin{eqnarray*}
& & 0.17 \pm 1.96 \sqrt{\frac{0.26(1 - 0.26)}{200} + \frac{0.09(1 - 0.09)}{200}} \\
& = & 0.17 \pm 1.96 \times 0.0370 \\
& = & 0.17 \pm 0.0725.
\end{eqnarray*}

Thus we are 95% confident that the underlying difference is between about 10% and 24%, suggesting that the presence of nicotine in the inhaler has an effect.
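The interval calculation can be sketched as a short function (the name is ours):

```python
from math import sqrt

def two_prop_ci(p1, n1, p2, n2, z_star=1.96):
    """Normal-approximation CI for a difference of proportions p1 - p2."""
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Nicotine inhaler study
lo, hi = two_prop_ci(0.26, 200, 0.09, 200)   # about (0.097, 0.243)
```

Since the whole interval is above 0, the data are inconsistent with the two proportions being equal.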

## Hypothesis Tests

It is clear from the confidence interval that the nicotine inhaler has been beneficial in reducing smoking since a 0 difference is a long way out of the 95% interval. However, we can also frame this kind of question as a test of [latex]H_0: p_1 = p_2[/latex]. To carry out this test we use the [latex]z[/latex] statistic

\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{ \frac{p_{1}(1 - p_{1})}{n_1} + \frac{p_{2}(1 - p_{2})}{n_2}}}, \]

finding [latex]P[/latex]-values from the Normal distribution.

As usual, we do not know [latex]p_1[/latex] and [latex]p_2[/latex] in the denominator but we can estimate them with [latex]\hat{p}_1[/latex] and [latex]\hat{p}_2[/latex].

For the inhaler experiment, a test of [latex]H_0: p_1 = p_2[/latex] against [latex]H_1: p_1 \gt p_2[/latex] then uses the statistic

\[ z = \frac{0.17 - 0}{0.0370} = 4.59, \]

giving a [latex]P[/latex]-value close to 0. Again, this is very strong evidence in favour of [latex]H_1[/latex], suggesting that the nicotine inhalers have increased the probability of sustaining a reduction in smoking.

## Pooled Sample Proportion

As with the one-sample case, when doing a hypothesis test we have another way of dealing with the unknown population proportions in the standard deviation. We always calculate [latex]P[/latex]-values assuming the null hypothesis is true. Here the hypothesis is that [latex]p_1 = p_2[/latex], so we assume the samples come from distributions with the same proportion. Calling this common proportion [latex]p[/latex], instead of two separate samples of sizes [latex]n_1[/latex] and [latex]n_2[/latex] we effectively have a single sample of [latex]n_1 + n_2[/latex] observations for estimating [latex]p[/latex]. To estimate [latex]p[/latex], we combine the counts to give the **pooled sample proportion**

\[ \hat{p} = \frac{X_1 + X_2}{n_1 + n_2}. \]

For the inhaler data,

\[ \hat{p} = \frac{52 + 18}{200 + 200} = 0.175. \]

We can use this value to estimate both [latex]p_1[/latex] and [latex]p_2[/latex] in the formula for standard deviation, giving the (slightly) simpler [latex]z[/latex] statistic

\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{ \hat{p} (1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}. \]

Again, for the inhaler data,

\[ z = \frac{0.17 - 0}{ \sqrt{0.175(1 - 0.175) \left(\frac{1}{200} + \frac{1}{200}\right)}} = 4.47. \]

This is almost identical to before, again giving strong evidence that the nicotine inhalers have been beneficial.
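Both forms of the two-proportion test can be sketched from the table counts:

```python
from math import erf, sqrt

def phi(z):
    """Standard Normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

x1, n1, x2, n2 = 52, 200, 18, 200
p1, p2 = x1 / n1, x2 / n2

# Unpooled standard error, as in the first test
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_unpooled = (p1 - p2) / se                  # about 4.59

# Pooled sample proportion under H0: p1 = p2
p_pool = (x1 + x2) / (n1 + n2)               # 0.175
se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_pooled = (p1 - p2) / se_pool               # about 4.47

p_value = 1 - phi(z_pooled)                  # one-sided, close to 0
```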

# Odds Ratios

An alternative way of analysing two groups in terms of how likely some outcome is to occur is through an **odds ratio**. We saw how to calculate the odds of an outcome in Chapter 8. For example, the odds for a reduction in smoking for subjects with nicotine inhalers are 0.26/0.74 = 0.3514 to 1, while in the placebo group the odds are 0.09/0.91 = 0.0989 to 1. This gives an odds ratio of

\[ \mbox{OR } = \frac{0.3514}{0.0989} = 3.55. \]

That is, the odds of sustaining a reduction in smoking after 4 months are 3.55 times higher if someone is using a nicotine inhaler. Notice that this sentence uses an odds ratio but mentions only one treatment. This is typical of what you will find when reading research articles, so you should always ask yourself what the reference group for the odds ratio is. It will often be a control group of some form, such as the placebo treatment used in this study.

## Confidence Intervals

Finding an odds ratio of 3.55 seems to suggest that there is evidence that nicotine inhalers are beneficial in assisting the sustained reduction of smoking. However, it should be clear by now that we are not happy with an estimate by itself. We need some measure of precision. Could it be that there is really no effect and the ratio of 3.55 was just due to sampling variability?

We can determine a confidence interval for the true odds ratio in a similar way to those we have already calculated for means and proportions. The main difference arises from the fact that odds ratios can never be negative but they can be arbitrarily large. It is no surprise then that the sampling distribution of odds ratios is skewed to the right, so our methods based on the Normal distribution are not going to be appropriate. The following figure shows a density curve of the odds ratio from 10000 samples where the null hypothesis is true (and where we have assumed [latex]p_1 = p_2 = 0.175[/latex] for both groups). As expected, the median value of the odds ratio when the null hypothesis is true is 1, with half of the values above 1 and half below, but the tail to the right is longer than to the left.

An obvious solution is to transform our statistic so that its sampling distribution is more symmetric. It turns out that if you take the (natural) logarithm of the odds ratio then you get a statistic where the sampling distribution can be approximated by the Normal distribution. The figure below illustrates the effect.

For the inhaler study we find [latex]\ln(\mbox{OR}) = \ln(3.55) = 1.267[/latex].

All we need now is the standard error of this statistic. The formula for this is given by

\[ \se{\ln(\mbox{OR})} = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}, \]

where the counts [latex]a[/latex], [latex]b[/latex], [latex]c[/latex], and [latex]d[/latex] are the four entries in the two-way table. From the previous table (from the example on Nicotine Inhalers), we have [latex]a = 52[/latex], [latex]b = 18[/latex], [latex]c = 148[/latex], and [latex]d = 182[/latex], giving

\[ \se{\ln(\mbox{OR})} = \sqrt{\frac{1}{52} + \frac{1}{18} + \frac{1}{148} + \frac{1}{182}} = 0.2950. \]

Now a 95% confidence interval for [latex]\ln(\mbox{OR})[/latex] is

\[ 1.267 \pm 1.96 \times 0.2950 = 1.267 \pm 0.5782, \]

giving the range [latex](0.6888, 1.845)[/latex] for [latex]\ln(\mbox{OR})[/latex]. This is not what we want though, since we are interested in the odds ratio itself, rather than its logarithm. We can obtain the confidence interval for the odds ratio by raising [latex]e[/latex] to the power of each endpoint. This gives

\[ (e^{0.6888}, e^{1.845}) = (1.991, 6.328). \]

Thus we are 95% confident that the odds of a sustained reduction in smoking are between 1.99 and 6.33 times higher when using a nicotine inhaler.
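The whole calculation, from table counts to interval, can be sketched as one function (the name is ours; at full precision the endpoints differ slightly from the rounded working above):

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z_star=1.96):
    """CI for an odds ratio from a 2x2 table, working on the log scale."""
    or_hat = (a / c) / (b / d)                   # sample odds ratio
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)     # se of ln(OR)
    log_or = log(or_hat)
    return or_hat, exp(log_or - z_star * se), exp(log_or + z_star * se)

# Nicotine inhaler table: a, b = reductions; c, d = no reductions
or_hat, lo, hi = odds_ratio_ci(52, 18, 148, 182)   # about 3.55, (1.99, 6.33)
```

Exponentiating the endpoints at the last step converts the symmetric interval for [latex]\ln(\mbox{OR})[/latex] into the asymmetric interval for the odds ratio itself.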

If we were testing a null hypothesis that the nicotine inhaler had no effect on the reduction of smoking then we would expect an odds ratio of 1. This would mean that the odds were the same for both groups. Since 1 is outside the confidence interval we have found, we have evidence against this null hypothesis, suggesting that nicotine inhalers are effective.

# Relative Risk

A simpler comparison between proportions is given by **relative risk**. This is the ratio of the two probabilities of a certain outcome between two groups, rather than the ratio of the odds. For example, the relative “risk” of seeing a reduction in smoking between the nicotine and placebo groups is

\[ \mbox{RR } = \frac{\hat{p}_1}{\hat{p}_2} = \frac{0.26}{0.09} = 2.89. \]

That is, people using a nicotine inhaler are 2.89 times as likely to sustain a reduction in smoking for 4 months.

Like odds ratios, a relative risk has a skewed distribution and so to make inferences with it we work with the logarithm of relative risk instead. For example, the log relative risk for the inhaler study is [latex]\ln(\mbox{RR}) = \ln(2.89) = 1.061[/latex].

This statistic has an approximately Normal distribution with standard error

\[ \se{\ln(\mbox{RR})} = \sqrt{\frac{1 - \hat{p}_1}{\hat{p}_1 n_1} + \frac{1 - \hat{p}_2}{\hat{p}_2 n_2} } \]

(Agresti, 1990). For the inhaler study this is

\[ \se{\ln(\mbox{RR})} = \sqrt{\frac{1 - 0.26}{0.26 \times 200} + \frac{1 - 0.09}{0.09 \times 200} } = 0.2545. \]

A 95% confidence interval for [latex]\ln(\mbox{RR})[/latex] is thus

\[ (1.061 - 1.96 \times 0.2545, 1.061 + 1.96 \times 0.2545) = (0.5622, 1.560). \]

Taking exponentials gives the 95% confidence interval for the relative risk as

\[ (e^{0.5622}, e^{1.560}) = (1.75, 4.76), \]

so we are 95% confident that having the nicotine inhaler makes it between 1.75 and 4.76 times as likely to see a reduction in smoking after 4 months. It is also useful to think of a proportion as a rate. That is, the rate of reduction in smoking is between 1.75 and 4.76 times higher with the nicotine inhaler.
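The relative-risk interval can be sketched the same way, again working on the log scale (function name is ours):

```python
from math import exp, log, sqrt

def relative_risk_ci(p1, n1, p2, n2, z_star=1.96):
    """CI for a relative risk p1/p2, working on the log scale."""
    rr = p1 / p2                                            # sample relative risk
    se = sqrt((1 - p1) / (p1 * n1) + (1 - p2) / (p2 * n2))  # se of ln(RR)
    log_rr = log(rr)
    return rr, exp(log_rr - z_star * se), exp(log_rr + z_star * se)

# Nicotine inhaler study
rr, lo, hi = relative_risk_ci(0.26, 200, 0.09, 200)   # about 2.89, (1.75, 4.76)
```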

While relative risks are perhaps easier to interpret, it turns out that odds ratios have an intimate relationship with **logistic regression**, a topic we will come to in Chapter 23. For this reason you will more likely see odds ratios used in the literature for comparing groups in terms of categorical outcomes.

It is also the case that relative risk and odds ratio are close to each other for rare diseases. Supposing [latex]\hat{p}_1[/latex] and [latex]\hat{p}_2[/latex] are both close to 0, so that [latex]1-\hat{p}_1[/latex] and [latex]1-\hat{p}_2[/latex] are both close to 1, we have that

\[ \mbox{OR} = \frac{\hat{p}_1/(1-\hat{p}_1)}{\hat{p}_2/(1-\hat{p}_2)} \approx \frac{\hat{p}_1/1}{\hat{p}_2/1} = \frac{\hat{p}_1}{\hat{p}_2} = \mbox{ RR}. \]

Researchers often use this to interpret an odds ratio as a relative risk. You should be careful about doing this since the approximation becomes poor for more common outcomes. For example, the following figure shows the correspondence between relative risk and odds ratio with changing [latex]p_1[/latex] for fixed values of [latex]p_2[/latex]. The dashed line is the identity, where [latex]\mbox{RR } = \mbox{ OR}[/latex], and you can see that the match is often far from this line.

Summary

- The Normal approximation to the Binomial distribution can be used to calculate confidence intervals and hypothesis tests for a population proportion based on a sample proportion.
- The difference in a categorical response between groups can be compared using a difference of proportions, an odds ratio, or a relative risk.
- The odds ratio and relative risk have skewed distributions so intermediate calculations are performed with their natural logarithms.

Exercise 1

What is the smallest number of coin tosses you could do that would give any evidence against a fair coin at the 5% level? Corresponding to this number, what is the smallest deviation from [latex]p = 0.5[/latex] that you could hope to detect with 80% power?

Exercise 2

Use the Normal approximation with the continuity correction to estimate [latex]\pr{X = 10}[/latex] for [latex]\Binomial{X}{15}{0.4}[/latex]. Compare your answer with the exact value given by the binomial distribution.

Exercise 3

Suppose you want to conduct a survey that estimates a proportion with a margin of error of 4%. What sample size should you use?

Exercise 4

In Chapter 2 we used a sample of 10000 randomisations to estimate the [latex]P[/latex]-value in the test of whether caffeine increases pulse rate. Is this a suitable number of trials? For example, if we were interested in testing whether the [latex]P[/latex]-value was 0.05 or lower, how many trials are needed to be 95% confident of the [latex]P[/latex]-value within 0.005?

Exercise 5

Based on the survey data, calculate a 95% confidence interval for the proportion of Islanders who would say ‘Yes’ to the sensitive survey question.

Exercise 6

The interval in the previous exercise includes random ‘Yes’ responses. Use the same process as given in Chapter 9 to transform the limits of the interval into estimates of the proportion of Islanders who would approve of kissing on the first date. What happens to the margin of error?

Exercise 7

Rothwell et al. (2011) examined the results of a number of large studies to investigate the effect of daily aspirin on the long-term risk of death due to cancer. In one of the studies, the British Doctors Aspirin Trial (Peto et al., 1988), there were 3429 subjects taking daily aspirin with 75 deaths due to cancer recorded. At the same time there were 1710 subjects taking a placebo with 47 deaths due to cancer. Calculate the odds ratio of death due to cancer between the aspirin and control groups. Is there evidence of a benefit from daily aspirin?