Categorical Data

Michael Bulmer

22 Categorical Data

[latex]\newcommand{\pr}[1]{P(#1)} \newcommand{\var}[1]{\mbox{var}(#1)} \newcommand{\mean}[1]{\mbox{E}(#1)} \newcommand{\sd}[1]{\mbox{sd}(#1)} \newcommand{\Binomial}[3]{#1 \sim \mbox{Binomial}(#2,#3)} \newcommand{\Student}[2]{#1 \sim \mbox{Student}(#2)} \newcommand{\Normal}[3]{#1 \sim \mbox{Normal}(#2,#3)} \newcommand{\Poisson}[2]{#1 \sim \mbox{Poisson}(#2)} \newcommand{\se}[1]{\mbox{se}(#1)} \newcommand{\prbig}[1]{P\left(#1\right)} \newcommand{\degc}{$^{\circ}$C}[/latex]

Testing Randomness

In contrast to the algorithmic random digits seen in Chapter 2, the table following this chapter gives 1800 digits from a human asked to create a random sequence. Are these digits random?

We will look at the first half of this sequence, the first 900 digits, leaving an analysis of the second half as an exercise. There are actually many different criteria for a sequence being “random” in this context, one of which is that the outcomes should all be equally likely. Here we would expect each digit to appear 900/10 = 90 times. The observed values are given in the table below with a bar chart of these shown in the following figure. This is a one-way table, giving observed counts for a single categorical variable.

Observed and expected counts for the first 900 digits

Digit	0	1	2	3	4	5	6	7	8	9
Observed	47	101	109	90	145	111	132	75	50	40
Expected	90	90	90	90	90	90	90	90	90	90

Bar chart of observed frequency for the first 900 digits

There are certainly deviations from what we would expect, ranging from only 40 occurrences for ‘9’ up to 145 for ‘4’. Even if the digits were truly random we would not expect to get exactly 90 of each one. But are the observed deviations plausible if the digits were truly random? This is a standard hypothesis test setting. We want to know the probability of getting counts as far from the expected values as those observed (or further) purely by chance, if the digits really were equally likely.

This is a good chance to reflect on the basic ideas of hypothesis testing. We are not estimating a parameter here, and so will not be talking about confidence intervals. Instead we write [latex]H_0[/latex] in words as

[latex]H_0[/latex]: observations follow hypothesised distribution.

The alternative is the very general statement

[latex]H_1[/latex]: observations do not follow hypothesised distribution.

This is known as a goodness-of-fit test and we need some way of measuring how close the observed counts are to the expected counts. An obvious measure is to add up all the differences between the observed and expected counts, since we would expect this to be bigger if there were bigger deviations. However this sum is always 0 because the positive and negative differences always cancel out. (Why?) We could fix this by adding up the absolute differences, but as usual we add up the squared differences, just as we did for the sample standard deviation. This gives the statistic
\[ \sum (\mbox{observed} - \mbox{expected})^2, \]
where the sum is over all the categories (the 10 digits).
This, however, is not perfect as it does not take into account the relative size of deviations. For example, an observed value of 20 would be the same distance from an expected value of 10 as an observed value of 1010 would be from an expected value of 1000. However the first is much more significant since the observation was double the expected, while the second is not much of a difference at all. To capture this we divide each squared difference by its expected value, giving
\[ \chi^2 = \sum \frac{(\mbox{observed} - \mbox{expected})^2}{\mbox{expected}}. \]
Here [latex]\chi[/latex] is the Greek letter chi, and this statistic is called the chi-square statistic. If there is evidence against the null hypothesis then we would expect [latex]\chi^2[/latex] to be large. Here we find
\[ \chi^2 = \frac{(47 - 90)^2}{90} + \frac{(101 - 90)^2}{90} + \cdots + \frac{(40 - 90)^2}{90} = 132.07. \]
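As a quick check of this arithmetic, the statistic can be computed directly from the observed counts in the table above. The following is a minimal Python sketch; the function name `chi_square_statistic` is our own choice for illustration and is not defined anywhere in the text.

```python
# Observed counts for the digits 0-9 in the first 900 human-generated digits.
observed = [47, 101, 109, 90, 145, 111, 132, 75, 50, 40]

# Under H0 each digit is equally likely, so every expected count is 900/10 = 90.
expected = [sum(observed) / len(observed)] * len(observed)


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


statistic = chi_square_statistic(observed, expected)
print(round(statistic, 2))  # 132.07
```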

How do we know if 132.07 could simply be due to sampling variability? We need to know the sampling distribution of this statistic, assuming that [latex]H_0[/latex] is true. This distribution is called the chi-square distribution. Like the [latex]t[/latex] distribution, there is a different chi-square distribution for each number of categories. Here we have 10 categories but the sum of the differences between observed and expected is always 0, so there are only 9 free differences in the analysis. As before, we call this the degrees of freedom of the chi-square statistic.

The figure below shows the [latex]\chi^2_9[/latex] distribution, the chi-square distribution with 9 degrees of freedom, along with the [latex]\chi^2_1[/latex], [latex]\chi^2_4[/latex] and [latex]\chi^2_8[/latex] distributions for comparison. Since we are squaring everything the value of [latex]\chi^2[/latex] can never be negative but there is no real limit on how big [latex]\chi^2[/latex] can be, so this is a rather skewed distribution.

[latex]\chi^2_1[/latex], [latex]\chi^2_4[/latex], [latex]\chi^2_8[/latex] and [latex]\chi^2_9[/latex] distributions

Like the other continuous distributions we have seen, there is no simple way of working out areas under the [latex]\chi^2_9[/latex] density curve. The table below gives the areas under the [latex]\chi^2_1[/latex] distribution as an example, but it is impractical to provide tables for every number of degrees of freedom, and also unnecessary since computer packages can provide these areas easily.

[latex]\chi^2(1)[/latex] distribution

	First decimal place of [latex]x[/latex]
[latex]x[/latex]	0	1	2	3	4	5	6	7	8	9
0.0	1.000	0.752	0.655	0.584	0.527	0.480	0.439	0.403	0.371	0.343
1.0	0.317	0.294	0.273	0.254	0.237	0.221	0.206	0.192	0.180	0.168
2.0	0.157	0.147	0.138	0.129	0.121	0.114	0.107	0.100	0.094	0.089
3.0	0.083	0.078	0.074	0.069	0.065	0.061	0.058	0.054	0.051	0.048
4.0	0.046	0.043	0.040	0.038	0.036	0.034	0.032	0.030	0.028	0.027
5.0	0.025	0.024	0.023	0.021	0.020	0.019	0.018	0.017	0.016	0.015
6.0	0.014	0.014	0.013	0.012	0.011	0.011	0.010	0.010	0.009	0.009
7.0	0.008	0.008	0.007	0.007	0.007	0.006	0.006	0.006	0.005	0.005
8.0	0.005	0.004	0.004	0.004	0.004	0.004	0.003	0.003	0.003	0.003
9.0	0.003	0.003	0.002	0.002	0.002	0.002	0.002	0.002	0.002	0.002
10.0	0.002	0.001	0.001	0.001	0.001	0.001	0.001	0.001	0.001	0.001
11.0	0.001	0.001	0.001	0.001	0.001	0.001	0.001	0.001	0.001	0.001
12.0	0.001	0.001

This table gives [latex]\pr{X^2 \ge x}[/latex] where [latex]X^2 \sim \chi^2_1[/latex].

The following table provides the critical values for a range of degrees of freedom so you can see their general pattern. Unlike the [latex]t[/latex] distributions, the critical values here keep getting higher as the degrees of freedom increase, not surprising since [latex]\chi^2[/latex] is the sum of more and more terms.

[latex]\chi^2[/latex] distribution

	Probability [latex]p[/latex]
df	0.975	0.95	0.25	0.10	0.05	0.025	0.01	0.005	0.001
1	0.001	0.004	1.323	2.706	3.841	5.024	6.635	7.879	10.83
2	0.051	0.103	2.773	4.605	5.991	7.378	9.210	10.60	13.82
3	0.216	0.352	4.108	6.251	7.815	9.348	11.34	12.84	16.27
4	0.484	0.711	5.385	7.779	9.488	11.14	13.28	14.86	18.47
5	0.831	1.145	6.626	9.236	11.07	12.83	15.09	16.75	20.52
6	1.237	1.635	7.841	10.64	12.59	14.45	16.81	18.55	22.46
7	1.690	2.167	9.037	12.02	14.07	16.01	18.48	20.28	24.32
8	2.180	2.733	10.22	13.36	15.51	17.53	20.09	21.95	26.12
9	2.700	3.325	11.39	14.68	16.92	19.02	21.67	23.59	27.88
10	3.247	3.940	12.55	15.99	18.31	20.48	23.21	25.19	29.59
11	3.816	4.575	13.70	17.28	19.68	21.92	24.72	26.76	31.26
12	4.404	5.226	14.85	18.55	21.03	23.34	26.22	28.30	32.91
13	5.009	5.892	15.98	19.81	22.36	24.74	27.69	29.82	34.53
14	5.629	6.571	17.12	21.06	23.68	26.12	29.14	31.32	36.12
15	6.262	7.261	18.25	22.31	25.00	27.49	30.58	32.80	37.70
16	6.908	7.962	19.37	23.54	26.30	28.85	32.00	34.27	39.25
17	7.564	8.672	20.49	24.77	27.59	30.19	33.41	35.72	40.79
18	8.231	9.390	21.60	25.99	28.87	31.53	34.81	37.16	42.31
19	8.907	10.12	22.72	27.20	30.14	32.85	36.19	38.58	43.82
20	9.591	10.85	23.83	28.41	31.41	34.17	37.57	40.00	45.31
21	10.28	11.59	24.93	29.62	32.67	35.48	38.93	41.40	46.80
22	10.98	12.34	26.04	30.81	33.92	36.78	40.29	42.80	48.27
23	11.69	13.09	27.14	32.01	35.17	38.08	41.64	44.18	49.73
24	12.40	13.85	28.24	33.20	36.42	39.36	42.98	45.56	51.18
25	13.12	14.61	29.34	34.38	37.65	40.65	44.31	46.93	52.62
26	13.84	15.38	30.43	35.56	38.89	41.92	45.64	48.29	54.05
27	14.57	16.15	31.53	36.74	40.11	43.19	46.96	49.64	55.48
28	15.31	16.93	32.62	37.92	41.34	44.46	48.28	50.99	56.89
29	16.05	17.71	33.71	39.09	42.56	45.72	49.59	52.34	58.30
30	16.79	18.49	34.80	40.26	43.77	46.98	50.89	53.67	59.70
40	24.43	26.51	45.62	51.81	55.76	59.34	63.69	66.77	73.40
50	32.36	34.76	56.33	63.17	67.50	71.42	76.15	79.49	86.66
60	40.48	43.19	66.98	74.40	79.08	83.30	88.38	91.95	99.61
70	48.76	51.74	77.58	85.53	90.53	95.02	100.4	104.2	112.3
80	57.15	60.39	88.13	96.58	101.9	106.6	112.3	116.3	124.8
90	65.65	69.13	98.65	107.6	113.1	118.1	124.1	128.3	137.2
100	74.22	77.93	109.1	118.5	124.3	129.6	135.8	140.2	149.4

This table gives [latex]x^{*}[/latex] such that [latex]\pr{X^2 \ge x^{*}} = p[/latex], where [latex]X^2 \sim \chi^2(\mbox{df})[/latex].

The [latex]P[/latex]-value we want is [latex]\pr{X^2 \ge 132.07}[/latex], where [latex]X^2 \sim \chi^2_9[/latex]. For 9 degrees of freedom in the table above this [latex]P[/latex]-value is far below 0.001. Note that this is a two-sided [latex]P[/latex]-value already, since positive and negative deviations have been squared and combined, so there is no need to multiply it by 2. This is very strong evidence against [latex]H_0[/latex] so in conclusion there is very strong evidence to suggest that the human-generated numbers are not uniformly distributed.
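To see how a computer might produce such tail areas, here is a hedged sketch using only the Python standard library. It relies on the standard closed-form expressions for the upper tail of a chi-square distribution with whole-number degrees of freedom; the function name `chi2_upper_tail` is our own, and in practice a statistics package would normally supply this calculation.

```python
import math


def chi2_upper_tail(x, df):
    """P(X^2 >= x) for a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: a Normal tail term plus a finite correction sum.
    tail = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= x / (2 * r + 1)
    return tail


# The critical value 16.92 for 9 df should give a tail area close to 0.05.
print(round(chi2_upper_tail(16.92, 9), 3))

# P-value for the random digits example.
p_value = chi2_upper_tail(132.07, 9)
print(p_value < 0.001)  # True
```

The first printed value can be compared against the 0.05 column of the table of critical values as a sanity check.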

Assumptions

Note that the observed counts in our data are discrete and so the [latex]\chi^2[/latex] statistic is also discrete, even though it might look like a continuous decimal number. However, the [latex]\chi^2[/latex] distribution we are using for hypothesis testing is continuous, so the underlying assumption is that this continuous approximation to the real discrete distribution is a good one. This is analogous to using the Normal distribution to approximate the Binomial distribution for proportion tests.

To satisfy this assumption we use the rule of thumb that all expected counts should be at least 1 and 80% of them should be at least 5. In the random digits example this is satisfied, with all expected counts equal to 90. In the section below we’ll see an example where we need to combine groups to satisfy the assumption.
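The rule of thumb is easy to automate. The sketch below is one possible way of writing it in Python; the function name `expected_counts_ok` is our own, and the second set of expected counts is made up purely to show a failing case.

```python
def expected_counts_ok(expected):
    """Rule of thumb: every expected count >= 1 and at least 80% of them >= 5."""
    all_at_least_one = all(e >= 1 for e in expected)
    share_at_least_five = sum(e >= 5 for e in expected) / len(expected)
    return all_at_least_one and share_at_least_five >= 0.8


# Random digits example: ten categories, each with expected count 90.
print(expected_counts_ok([90] * 10))  # True

# Hypothetical expected counts (not from the text) with one value below 1.
print(expected_counts_ok([0.4, 12.0, 30.0, 25.0, 8.0]))  # False
```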

Correlation Test of Randomness

Before we continue discussing categorical data, note that there are other requirements that a genuinely random sequence of numbers should satisfy. One important one is that consecutive numbers should be independent. This is particularly important if we were using a random number generator on a calculator or computer to help choose random samples for an experiment. All of the statistical tests we have described assume that samples are independent of each other and so a poor random number generator could undermine our studies.

One way of testing for independence here is to tally occurrences of the 100 possible pairs of digits. If the outcomes were independent then these 100 pairs should be equally likely and we could use a chi-square statistic with 99 degrees of freedom to test this uniformity.

Another method is to make a scatter plot where each point represents a digit and the digit that followed it in the sequence. Such a plot is shown in the following figure, where jittering has been used to separate points which would otherwise be obscured (since this is discrete data). If one digit and the next were independent then there should be no association present in this plot. However, there are noticeable gaps (an ‘8’ was never followed by a ‘1’) and very dense combinations (‘6’ was often followed by ‘5’). This leads to a smoothed line suggesting that perhaps there is a positive association between one digit and the next.

Jittered scatter plot of consecutive digits

We can support the visual impression by calculating the correlation coefficient, [latex]r = +0.232[/latex]. This is not particularly large but it is significantly different from 0 ([latex]p \lt 0.001[/latex]). This gives strong evidence that low digits tend to be followed by low digits and high digits tend to be followed by high digits. Thus there is further evidence that the digits produced are not genuinely random.
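The calculation behind this correlation can be sketched as follows. To keep the example short we use only the first row (60 digits) of the human random digits table at the end of this chapter, so the value printed will not match the [latex]r[/latex] quoted above for 900 digits. The helper `pearson_r` is our own illustrative implementation of the usual correlation coefficient.

```python
import math

# First row of the human random digits table (60 digits), for illustration only.
row = "45632 68450 63215 64789 62354 56121 33654 12126 44789 50112 35641 13254"
digits = [int(d) for d in row.replace(" ", "")]


def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Pair each digit with the digit that follows it in the sequence.
current, following = digits[:-1], digits[1:]
r = pearson_r(current, following)
print(round(r, 3))
```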

Parametric Distributions

Poisson Yeast Cells

The table below shows the counts of yeast cells made by Student (1907), along with the expected counts from the Poisson distribution with [latex]\lambda = 4.68[/latex]. Is there any evidence to suggest that the Poisson distribution is not appropriate for this data?

Observed and expected counts of yeast cells

Yeast Cells	0-1	2	3	4	5	6	7	8	9+
Observed	20	43	53	86	70	54	37	18	19
Expected	21.1	40.6	63.4	74.2	69.4	54.2	36.2	21.2	19.7

For [latex]\Poisson{X}{4.68}[/latex] we have [latex]\pr{X=0} = 0.009279[/latex], so with 400 observations in total the expected count for 0 is [latex]400 \times 0.009279 = 3.71[/latex]. Since this is quite low we have combined the 0 and 1 counts together to satisfy the assumptions given earlier. Similarly, the Poisson probabilities all get small for large [latex]x[/latex], so we combine 9 and over into a single group. The expected value for this group can be calculated using complements since
\[ \pr{X \ge 9} = 1 - \pr{X \le 8} = 1 - \sum_{x=0}^8 \frac{e^{-\lambda} \lambda^x}{x!}. \]

We calculate the [latex]\chi^2[/latex] statistic
\[ x = \frac{(20 - 21.1)^2}{21.1} + \frac{(43 - 40.6)^2}{40.6} + \cdots + \frac{(19 - 19.7)^2}{19.7} = 4.31. \]
The basic degrees of freedom are 9 – 1 = 8. However, for this example we did not actually know the theoretical distribution before we looked at the data, unlike the previous example where we could specify the uniform expected values beforehand. To get the expected values here we first needed to estimate the Poisson parameter [latex]\lambda[/latex] from the data. When we do this we lose one degree of freedom, just as when we calculated the sample standard deviation we lost one degree of freedom because we had to estimate the mean, or when we calculated the residual standard error we lost two degrees of freedom because we had to estimate the intercept and slope. Hence the degrees of freedom for this test are 9 – 2 = 7.

From the [latex]\chi^2[/latex] table we find the [latex]P[/latex]-value is greater than 0.25. This gives no evidence against the null hypothesis and so the observed counts are consistent with a Poisson(4.68) distribution.
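The expected counts and the statistic for the yeast data can be reproduced with a short Python sketch. The value of [latex]\lambda[/latex] and the grouping are taken from the text; the variable and function names are our own. Because the code uses unrounded expected counts, the statistic it prints may differ slightly in the second decimal place from the value obtained with the rounded counts in the table.

```python
import math

lam = 4.68  # Poisson mean estimated from the data (value given in the text)
observed = [20, 43, 53, 86, 70, 54, 37, 18, 19]  # groups 0-1, 2, 3, ..., 8, 9+
n = sum(observed)  # 400 observations in total


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


# Probabilities for the combined 0-1 group, then 2 to 8 individually.
probs = [poisson_pmf(0, lam) + poisson_pmf(1, lam)]
probs += [poisson_pmf(x, lam) for x in range(2, 9)]
# P(X >= 9) by complement, so that the probabilities sum to 1.
probs.append(1 - sum(probs))

expected = [n * p for p in probs]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# One degree of freedom lost for the fixed total, one for estimating lambda.
df = len(observed) - 1 - 1

print([round(e, 1) for e in expected])
print(round(statistic, 2), df)
```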

Relationship to Proportion Test

Mendel Revisited

In Chapter 17 we analysed Mendel’s experiment concerning the inheritance of pea plant flower colours. The two counts, 705 purple and 224 white, can also be written as a one-way table. In the table below we have put the counts together with the expected counts based on Mendel’s theory of a 3:1 ratio.

Observed and expected counts for Mendel's experiment

Colour	Purple	White
Observed	705	224
Expected	696.75	232.25

To test the significance of the deviations from the expected values we calculate the [latex]\chi^2[/latex] statistic
\[ x = \frac{(705 - 696.75)^2}{696.75} + \frac{(224 - 232.25)^2}{232.25} = 0.39. \]
This statistic has 1 degree of freedom. From the [latex]\chi^2(1)[/latex] table we see the [latex]P[/latex]-value is around 0.527, giving no evidence that the observed results differ from the theory.

Note that this is almost identical to the result we obtained in Chapter 17 when we used the hypothesised value of [latex]p[/latex] to estimate the standard deviation of [latex]\hat{p}[/latex]. In fact the [latex]z[/latex] value there was 0.626 and [latex]0.626^2[/latex] = 0.39, the value of [latex]x[/latex]. The [latex]\chi_1^2[/latex] distribution is just the distribution of the square of a standard Normal variable. The chi-square test for one-way tables is thus a generalisation of the one-sample test of a proportion using the Normal distribution, allowing us to test a distribution with more than one free proportion. This is similar to the relationship we saw in Chapter 19, where the [latex]F[/latex] test for comparing two means gave identical results to the pooled two-sample [latex]t[/latex] test.
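The equivalence between the two tests can be checked numerically. The sketch below, with variable names of our own choosing, computes both statistics for Mendel's counts and confirms that [latex]z^2[/latex] equals the chi-square statistic and that the two-sided Normal [latex]P[/latex]-value matches the [latex]\chi^2_1[/latex] upper tail area.

```python
import math

observed = [705, 224]  # purple, white
n = sum(observed)
p0 = 0.75  # hypothesised proportion of purple under the 3:1 ratio

# Chi-square statistic from the one-way table.
expected = [n * p0, n * (1 - p0)]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# One-sample z statistic, using p0 in the standard deviation of p-hat.
p_hat = observed[0] / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Two-sided Normal P-value and chi-square (1 df) upper tail area.
p_from_z = math.erfc(abs(z) / math.sqrt(2))
p_from_chi2 = math.erfc(math.sqrt(chi2 / 2))

print(round(chi2, 2), round(z ** 2, 2))  # both 0.39
print(abs(p_from_z - p_from_chi2) < 1e-9)  # True
```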

Two-Way Tables

In Chapter 17 we looked at the effect of nicotine inhalers on smoking reduction using a comparison between two proportions. We can try to determine whether the inhalers are beneficial by testing for an association in the two-way table of counts. The following table shows this data again, with marginal totals included. Our null hypothesis is that the inhaler contents and the reduction outcome are independent, while the alternative hypothesis is simply that they are not.

Sustained reductions after 4 months of inhaler use

	Nicotine	Placebo	Total
Reduction	52	18	70
No Reduction	148	182	330
Total	200	200	400

From the marginal distributions we see that 200/400 = 0.5 of the subjects had the nicotine inhaler, while 70/400 = 0.175 of the subjects had a reduction. If there was no association between inhaler contents and reduction then these outcomes should be independent of each other. We should then be able to multiply their proportions together to estimate the proportion of subjects having nicotine and having a reduction,
\[ 0.5 \times 0.175 = 0.0875. \]
Thus we would expect 8.75% of all subjects to have this combination. Now 8.75% of 400 is 35, compared to the observed value of 52. Is this a significant difference? We can use a chi-square test to find out.

Firstly, we calculate the other three expected counts. Note that we divide by 400 twice in getting our proportions but then multiply by it at the end. We can save one step and give the simple formula
\[ \mbox{expected count } = \frac{\mbox{row total } \times \mbox{ column total}}{\mbox{total}}. \]
For example, we would expect the count of subjects in the placebo group who don’t sustain a reduction to be
\[ \frac{200 \times 330}{400} = 165. \]
The table below gives all the expected counts.

Expected counts for inhaler data

	Nicotine	Placebo
Reduction	35	35
No Reduction	165	165

We can simply work out the chi-square statistic as before,
\[ \chi^2 = \frac{(52 - 35)^2}{35} + \cdots + \frac{(182 - 165)^2}{165} = 20.02. \]
This statistic has a [latex]\chi^2_{\mbox{df}}[/latex] distribution with
\[ \mbox{df } = (\mbox{rows } - 1) \times (\mbox{columns } - 1), \]
since the degrees of freedom from the two variables multiply in the same way that you multiply the number of rows and columns to find the number of cells. For this example, df = 1. The [latex]\chi^2(1)[/latex] table or the [latex]\chi^2[/latex] table tells us the [latex]P[/latex]-value is very close to 0. Thus there is very strong evidence of an association between the type of inhaler and whether a reduction was sustained.
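The whole calculation for a two-way table (expected counts, statistic and degrees of freedom) can be sketched in a few lines of Python. The function `two_way_chi_square` is our own illustrative helper; it takes the table of observed counts without the marginal totals.

```python
def two_way_chi_square(table):
    """Return (expected counts, chi-square statistic, df) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    # expected count = row total x column total / overall total
    expected = [[r * c / total for c in col_totals] for r in row_totals]

    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df


# Rows: Reduction, No Reduction. Columns: Nicotine, Placebo.
inhaler = [[52, 18], [148, 182]]
expected, statistic, df = two_way_chi_square(inhaler)
print(expected)                  # [[35.0, 35.0], [165.0, 165.0]]
print(round(statistic, 2), df)   # 20.02 1
```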

Note that [latex]\sqrt{20.02} = 4.47[/latex], the [latex]z[/latex] statistic we found in Chapter 17 with the pooled sample proportion, so this test for association is identical to the two-sample proportion test using the Normal approximation. The advantage of the chi-square test is that it can be applied to tables with more than two rows or columns.

Pizza Preference and Sex

The table below gives the two-way table of counts of 200 Islanders by pizza preference and sex that we first saw in Chapter 6.

Counts of preferred pizza by sex

	Mushroom	Pineapple	Prawns	Sausage	Spinach	Total
Female	10	39	17	13	23	102
Male	18	10	13	36	21	98
Total	28	49	30	49	44	200

The table below shows the expected counts if the two variables were independent.

Expected counts of preferred pizza by sex

	Mushroom	Pineapple	Prawns	Sausage	Spinach	Total
Female	14.3	25.0	15.3	25.0	22.4	102
Male	13.7	24.0	14.7	24.0	21.6	98
Total	28	49	30	49	44	200

The [latex]\chi^2[/latex] statistic is [latex]30.8[/latex] with 4 degrees of freedom (2 rows and 5 columns). From the [latex]\chi^2[/latex] table we find the [latex]P[/latex]-value is less than 0.001, giving substantial evidence from this data to suggest that pizza preference differs between males and females.
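It can be informative to see which cells contribute most to a large statistic. The following sketch, with names of our own choosing, breaks the pizza statistic into its ten cell contributions, each of the form (observed - expected)^2 / expected.

```python
flavours = ["Mushroom", "Pineapple", "Prawns", "Sausage", "Spinach"]
counts = {
    "Female": [10, 39, 17, 13, 23],
    "Male": [18, 10, 13, 36, 21],
}

col_totals = [sum(col) for col in zip(*counts.values())]
total = sum(col_totals)

# Contribution of each cell to the chi-square statistic.
contributions = {}
for sex, row in counts.items():
    row_total = sum(row)
    for flavour, observed, col_total in zip(flavours, row, col_totals):
        expected = row_total * col_total / total
        contributions[(sex, flavour)] = (observed - expected) ** 2 / expected

statistic = sum(contributions.values())
largest = max(contributions, key=contributions.get)

print(round(statistic, 1))  # 30.8
print(largest)              # ('Male', 'Pineapple')
```

In this table the Pineapple and Sausage columns account for most of the statistic, while the Spinach column contributes almost nothing.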

Fisher’s Exact Test

For small sample sizes the test based on the [latex]\chi^2[/latex] distribution will usually give a poor approximation to the [latex]P[/latex]-values. An alternative is to use Fisher’s exact test (Glantz, 2002). Assuming no association, this procedure enumerates all the possible tables that would be at least as unusual as the one obtained, or simulates the generation of such tables if the number of possibilities is too big. Either method is straightforward but can require a lot of calculation and so is usually left to a computer.
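For a 2 × 2 table the enumeration can be sketched directly. With the margins held fixed, the top-left count follows a hypergeometric distribution under no association, and one common convention for the two-sided [latex]P[/latex]-value (assumed here) is to add up the probabilities of all tables that are no more likely than the observed one. The function name and the small 2 × 2 example are our own and purely illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denominator = comb(n, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(row1, k) * comb(row2, col1 - k) / denominator

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables at least as unusual as the observed one
    # (small tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical small table (not from the text): P-value is 34/70.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857

# Inhaler data: Reduction (52, 18), No Reduction (148, 182).
print(fisher_exact_two_sided(52, 18, 148, 182) < 0.001)  # True
```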

Simpson’s Paradox

Appleton et al. (1996) surveyed women twenty years after they were part of a study in 1972-1974 on thyroid and heart disease. The following table shows the survival status at the time of the second survey of the 1314 women who had been classified as either a current smoker or as never having smoked in the original survey.

Two-way table of smoking and survival

	Smoking
Survival	Yes	No	Total
Dead	139	230	369
Alive	443	502	945
Total	582	732	1314

Of the 582 women who smoked, 443 were still alive after twenty years, a survival rate of 76%. Of the 732 women who didn’t smoke, 502 were still alive, a survival rate of 69%. That is interesting: it seems that, for the population these women came from, smoking might actually help survival. A chi-square test of association gives a [latex]P[/latex]-value of 0.003, strong evidence that smoking status and survival are related.

This seems like good news for smokers! But now consider the table below which shows the same data but as a three-way table with an extra variable for age group.

Three-way table of age group, smoking and survival

	Age Group
	18-44		45-64		65+
	Smoking		Smoking		Smoking
Survival	Yes	No	Yes	No	Yes	No	Total
Dead	19	13	78	52	42	165	369
Alive	269	327	167	147	7	28	945
Total	288	340	245	199	49	193	1314

We can now look at the relationship between smoking status and survival for the different age groups.

For 18-44 year olds, 269 out of 288 smokers survived (93%) compared to 327 out of 340 nonsmokers (96%). For this age group it was better to be a nonsmoker.
For 45-64 year olds, 167 out of 245 smokers survived (68%) compared to 147 out of 199 nonsmokers (74%). For this age group it was also better to be a nonsmoker.
For women 65 and over, 7 out of 49 smokers survived (14%) compared to 28 out of 193 nonsmokers (15%). For this age group it was also slightly better to be a nonsmoker.

So for each age group the survival rate was higher for nonsmokers. That seems odd since above we saw that altogether it was smokers who had the better survival rate. The same data are giving us completely opposite conclusions!
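The reversal can be verified directly from the three-way table. This short sketch, with a data layout of our own choosing, computes the survival rate for smokers and nonsmokers within each age group and then for all ages combined.

```python
# (alive, total) counts by age group and smoking status, from the three-way table.
data = {
    "18-44": {"smoker": (269, 288), "nonsmoker": (327, 340)},
    "45-64": {"smoker": (167, 245), "nonsmoker": (147, 199)},
    "65+": {"smoker": (7, 49), "nonsmoker": (28, 193)},
}

# Survival rates within each age group.
by_age = {
    age: {status: alive / total for status, (alive, total) in groups.items()}
    for age, groups in data.items()
}

# Survival rates with the age groups combined.
overall = {}
for status in ("smoker", "nonsmoker"):
    alive = sum(groups[status][0] for groups in data.values())
    total = sum(groups[status][1] for groups in data.values())
    overall[status] = alive / total

for age, rates in by_age.items():
    print(age, round(rates["smoker"], 2), round(rates["nonsmoker"], 2))
print("All", round(overall["smoker"], 2), round(overall["nonsmoker"], 2))  # All 0.76 0.69
```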

This phenomenon is known as Simpson’s paradox. We will leave you to figure out which is the correct conclusion and why the other one gives the opposite result.

Summary

The chi-square test is a general procedure for comparing observed and expected counts.
For a one-way table the expected counts need to come from some hypothetical distribution.
Comparing counts to a uniform distribution is a simple test of randomness.
For a two-way table the expected counts come from the null hypothesis of no association between the two variables.
Simpson’s paradox is an example where ignoring a variable can dramatically change conclusions from a study.

Exercise 1

Consider the last 900 digits of the human random digits. Are these digits uniformly random?

Exercise 2

This table in the Appendix gives 1800 decimal places of [latex]e[/latex]. Are the decimal digits of [latex]e[/latex] uniformly random?

Exercise 3

This table in the Appendix gives 1800 decimal places of [latex]\pi[/latex]. Are the decimal digits of [latex]\pi[/latex] uniformly random?

Exercise 4

The small town of Shinobi is located near a lake that has been associated with mystical powers. Nathan Yamada, a resident of Shinobi, rolled a die 120 times to give the results shown in the table below. Is there evidence of anything unusual with the dice rolls?

Outcomes of 120 dice rolls in the Island village of Shinobi

65612	64662	32226	35535	22565	25525
16132	12451	33635	66521	35553	41332
42565	21541	35113	55624	55362	65635
14232	11532	52635	56661	54544	44152

Exercise 5

Is there any evidence that the data collected for Exercise 9 of Chapter 11 does not follow a Poisson distribution?

Exercise 6

A table in Chapter 6 gives a three-way table of pizza preference by sex and island. Collapse this data into a two-way table of pizza preference and island. Is there any evidence that pizza preference differs between the two islands?

Exercise 7

Discuss the smoking and survival data from the Simpson’s Paradox section of this chapter. What is the correct conclusion to make and why does the other table suggest the opposite conclusion?

Exercise 8

Green roofs are becoming a standard way of introducing vegetation in dense urban areas. Fernandez-Cañero et al. (2013) conducted a survey to assess attitudes towards green roof systems. From responses to 450 questionnaires they obtained the data shown in the table below. Is there evidence of an age difference in the interest in green roof systems?

Interest in green roof systems by age group

Age group	Under 18	18-25	26-40	Over 40
Interested	136	42	48	75
Not interested	83	18	31	17

Human random digits

45632	68450	63215	64789	62354	56121	33654	12126	44789	50112	35641	13254
46877	46521	11254	45789	64423	65789	65121	21523	65498	95632	45630	12186
65452	64478	96542	36512	45879	86614	23211	25354	40708	96362	13354	68755
21246	57252	01230	65456	98726	32548	96542	36544	21224	31675	98382	01645
82053	46725	64352	16546	53411	54653	82861	54653	60452	56563	74124	61365
67553	42116	56535	21454	62351	07467	53210	34411	32124	56467	67589	54624
22132	12466	72580	86549	79889	98764	52013	62656	05463	21154	84643	79164
58360	43461	25648	70586	32132	54060	96735	64945	46538	64594	65043	62165
04257	46686	44213	74663	46564	57986	93568	21326	45821	26432	80246	42467
63421	54679	47365	03253	61213	45767	67360	03431	26342	46157	28240	64560
84327	20542	32727	32373	16314	32789	21327	09127	13297	15321	41461	16420
45214	65542	57046	44141	44501	47147	68234	26146	50434	23516	46382	46151
46594	67246	15179	73986	73246	18467	97237	53488	74797	94653	15464	95267
34641	67897	97978	46121	21346	45797	91521	34543	14123	03451	32751	32761
76213	76842	73683	72014	00154	07404	17010	19480	76210	07104	46045	40174
00147	90545	71747	47448	07106	70075	15746	72727	07501	10457	10471	24178
10220	47579	46210	76867	51376	15746	12374	41645	32873	79783	14618	42556
42150	64274	06754	71419	87672	99874	62135	16143	45619	78495	64315	12036
43458	77847	31648	54546	41818	79798	46598	45146	12121	30615	44949	74825
36416	54894	97494	94530	32041	49050	74694	16731	45106	06085	05074	61810
58150	64154	51241	41079	78707	40079	15460	24641	87914	24645	27300	41948
46421	46745	42767	23154	68442	16427	35149	18246	15645	12465	16541	24376
54914	87216	45461	24681	94216	24900	97040	60504	05030	60405	12400	64867
91046	84142	70160	16421	97364	76915	61464	35247	97586	45356	45163	25798
83164	51356	48135	60804	50604	08780	98075	02032	06504	25010	20560	87875
77814	54110	74870	14749	74070	45781	07181	48779	87114	00413	21457	46512
49687	46541	32786	15762	13761	65745	16740	01567	84270	76491	84017	94047
59407	44974	51654	74974	97497	79712	04176	75746	54401	46404	46752	28346
41516	07288	58462	52346	75465	98497	72561	52432	64578	52612	40145	76761
24204	65413	12768	21846	31043	54196	84657	31224	68149	84216	45124	63149

These digits were generated by a human asked to type a sequence of 1800 random digits.

Licence

A Portable Introduction to Data Analysis Copyright © 2024 by The University of Queensland is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.