3 Data weighting
Christopher Sexton
Sample compared to population
The baseline Australian Oral Health Workforce Cohort Study collected demographic variables that were used to weight the sample to registry data from Ahpra. These data included counts by:
- Gender (Male/Female)
- Age (years)
- Practitioner division
- Primary state of practice
These non-identifiable data were requested from Ahpra through a freedom of information request. The population-level data were compared with the eligible sample to assess how well it represents the population of registered oral health practitioners in these divisions. A comparison of the percentages by gender, age, state and division of practice shows that the survey respondents are similar to the Ahpra population (Table 3.1).
Table 3.1. Australian oral health workforce survey respondent characteristics compared to data from the Australian Health Practitioner Regulation Agency.
| Characteristic | Survey, N = 431, n (%) | Ahpra, N = 5,405, n (%) |
|---|---|---|
| Gender | | |
| Male | 33 (7.8) | 467 (8.6) |
| Female | 391 (92.2) | 4,938 (91.4) |
| Unknown | 7 | 0 |
| Age (years) | | |
| Less than 25 | 35 (8.3) | 478 (8.8) |
| 25 – 29 | 69 (16.3) | 910 (16.8) |
| 30 – 34 | 93 (21.9) | 1,075 (19.9) |
| 35 – 39 | 62 (14.6) | 816 (15.1) |
| 40 – 44 | 50 (11.8) | 545 (10.1) |
| 45 – 49 | 32 (7.5) | 426 (7.9) |
| 50 – 54 | 32 (7.5) | 396 (7.3) |
| 55 – 59 | 33 (7.8) | 337 (6.2) |
| 60+ | 18 (4.3) | 422 (7.8) |
| Unknown | 7 | 0 |
| Primary state of practice | | |
| NSW | 91 (21.6) | 1,406 (26.0) |
| VIC | 79 (18.8) | 1,192 (22.1) |
| QLD | 104 (24.7) | 968 (17.9) |
| SA | 82 (19.5) | 754 (14.0) |
| WA | 42 (10.0) | 852 (15.8) |
| TAS | 9 (2.1) | 89 (1.6) |
| ACT | 7 (1.7) | 99 (1.8) |
| NT | 7 (1.7) | 45 (0.8) |
| Unknown | 10 | 0 |
| Practitioner division | | |
| DH | 133 (31.4) | 1,446 (26.8) |
| DT | 34 (8.0) | 667 (12.3) |
| OHT | 225 (53.1) | 2,787 (51.6) |
| DT/DH | 25 (5.9) | 382 (7.1) |
| Other combinations | 7 (1.7) | 123 (2.3) |
| Unknown | 7 | 0 |
The minor differences between the sample and the Ahpra population were:
- Males were slightly under-represented;
- Age groups followed a similar pattern, but the age range of respondents was narrower, as no respondents were older than 69;
- There were few respondents aged 60 or older;
- QLD and SA respondents were over-represented, and NSW, VIC and WA respondents were under-represented;
- TAS, ACT and NT had few respondents in total;
- DHs were over-represented; and
- DTs, DT/DH and other combinations of divisions had few respondents.
Ten respondents did not provide sufficient data on gender, age, state of practice or practitioner division. These respondents were excluded from weighting.
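The over- and under-representation judgements above come from comparing percentages that exclude unknown values. A minimal sketch of that comparison follows; the helper names are hypothetical, and the example uses the gender counts from Table 3.1.

```python
def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of each known category; 'Unknown' is excluded from the base."""
    known = {k: v for k, v in counts.items() if k != "Unknown"}
    total = sum(known.values())
    return {k: 100 * v / total for k, v in known.items()}


def representation(sample: dict[str, int], population: dict[str, int]) -> dict[str, float]:
    """Percentage-point difference (sample minus population) per category.

    Positive values suggest over-representation in the sample,
    negative values under-representation.
    """
    s, p = percentages(sample), percentages(population)
    return {k: round(s[k] - p[k], 1) for k in p}


# Gender counts from Table 3.1.
sample_gender = {"Male": 33, "Female": 391, "Unknown": 7}
ahpra_gender = {"Male": 467, "Female": 4938}
result = representation(sample_gender, ahpra_gender)
print(result)
```

A percentage-point difference is only a descriptive screen. It does not test whether a gap is larger than sampling variation would produce.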
Recoding data for weighting
The sample and population datasets were prepared for calculating weights that make the sample representative of the population, through the following changes:
- The sample contained no respondents older than 69, so practitioners aged 70 or older were excluded from the Ahpra population;
- Age groups were recoded for the sample and the Ahpra population to: less than 30, 30 to 39, 40 to 49, and 50 or more;
- TAS, ACT and NT were combined into one group due to small numbers in the sample; and
- Divisions were recoded for the sample and the Ahpra population to DH, OHT, and all other combinations.
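The recoding rules can be expressed as small functions applied identically to the sample and the Ahpra counts. This is a minimal sketch. The function names and the raw category labels (for example "DT/DH") are hypothetical assumptions about how the source data are coded.

```python
from typing import Optional


def recode_age(age: int) -> Optional[str]:
    """Collapse single years of age into the four weighting groups.

    Returns None for practitioners aged 70 or older, who are dropped
    because the sample contains nobody older than 69.
    """
    if age >= 70:
        return None
    if age < 30:
        return "Less than 30"
    if age < 40:
        return "30 - 39"
    if age < 50:
        return "40 - 49"
    return "50 or more"


def recode_state(state: str) -> str:
    """Merge the three jurisdictions with few respondents into one group."""
    return "TAS/ACT/NT" if state in {"TAS", "ACT", "NT"} else state


def recode_division(division: str) -> str:
    """Keep DH and OHT; pool DT, DT/DH and any other combination."""
    return division if division in {"DH", "OHT"} else "All other combinations"


print(recode_age(34), recode_state("NT"), recode_division("DT/DH"))
```

Using the same functions on both datasets ensures that the cells being compared are defined in exactly the same way.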
Base weights are the inverse of the probability of being included in the sample. To estimate these probabilities, the sample and the reference population must be mutually exclusive. Therefore, the number of sample respondents in each combination of gender, age, state and division was subtracted from the corresponding Ahpra population count. The Australian Oral Health Workforce Cohort 2023 sample was labelled as respondents, and the remaining Ahpra population as non-respondents (Table 3.2).
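The subtraction step can be sketched with cell counts keyed by (gender, age group, state, division). This is a minimal illustration. The function name is hypothetical, and the two cells below are toy counts, not survey data.

```python
from collections import Counter


def split_population(population: Counter, sample: Counter) -> Counter:
    """Non-respondent count per cell: population count minus sample count.

    A sample cell that is absent from the population, or larger than it,
    signals a coding mismatch between the datasets, so an error is raised.
    """
    unknown = set(sample) - set(population)
    if unknown:
        raise ValueError(f"sample cells missing from population: {unknown}")
    non_respondents = Counter()
    for cell, n_population in population.items():
        remaining = n_population - sample[cell]  # Counter returns 0 for absent cells
        if remaining < 0:
            raise ValueError(f"sample exceeds population in cell {cell}")
        non_respondents[cell] = remaining
    return non_respondents


# Toy counts for two cells (hypothetical, for illustration only).
population = Counter({("Female", "Less than 30", "NSW", "OHT"): 120,
                      ("Male", "50 or more", "SA", "DH"): 15})
sample = Counter({("Female", "Less than 30", "NSW", "OHT"): 9,
                  ("Male", "50 or more", "SA", "DH"): 2})
non_respondents = split_population(population, sample)
print(non_respondents)
```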
Table 3.2. Mutually exclusive groups of the Australian oral health workforce survey respondents compared to non-respondents from the Australian Health Practitioner Regulation Agency.
| Characteristic | Respondent, N = 431, n (%) | Non-respondent, N = 3,907, n (%) |
|---|---|---|
| Gender | | |
| Male | 33 (7.8) | 189 (4.8) |
| Female | 391 (92.2) | 3,718 (95.2) |
| Unknown | 7 | 0 |
| Age (years) | | |
| Less than 30 | 104 (24.5) | 1,147 (29.4) |
| 30 – 39 | 155 (36.6) | 1,560 (39.9) |
| 40 – 49 | 82 (19.3) | 653 (16.7) |
| 50 or more | 83 (19.6) | 547 (14.0) |
| Unknown | 7 | 0 |
| Primary state of practice | | |
| NSW | 91 (21.6) | 1,127 (28.8) |
| VIC | 79 (18.8) | 864 (22.1) |
| QLD | 104 (24.7) | 671 (17.2) |
| SA | 82 (19.5) | 590 (15.1) |
| WA | 42 (10.0) | 578 (14.8) |
| TAS, ACT and NT | 23 (5.5) | 77 (2.0) |
| Unknown | 10 | 0 |
| Practitioner division | | |
| DH | 133 (31.4) | 1,113 (28.5) |
| OHT | 225 (53.1) | 2,345 (60.0) |
| All other combinations | 66 (15.6) | 449 (11.5) |
| Unknown | 7 | 0 |
Modelling pseudo-inclusion probabilities
Quasi-randomisation models the pseudo-inclusion probabilities of a non-probability sample in order to correct for selection bias. The inverses of these pseudo-inclusion probabilities are the base weights for the sample. The probabilities were estimated using logistic regression, modelling inclusion in the sample (respondent versus non-respondent) as a function of the covariates. This process is similar to propensity score adjustment.
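A minimal sketch of the idea follows, assuming grouped counts of respondents and non-respondents per cell. It fits a binomial logistic regression by Newton-Raphson (a standard fitting method; the report does not state which software or algorithm was used), then inverts the fitted probabilities. The function names are hypothetical. The toy model uses gender alone, with the Table 3.2 counts, so the fitted probabilities can be checked against the raw proportions. The report's final model also included state and practitioner division, so these toy weights are not the report's base weights.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_grouped_logit(rows, respondents, totals, iterations=25):
    """Binomial logistic regression on grouped counts via Newton-Raphson.

    rows: design vectors, each starting with 1.0 for the intercept.
    respondents: sample count per cell; totals: respondents + non-respondents.
    """
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, y, n in zip(rows, respondents, totals):
            p = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            for j in range(k):
                score[j] += (y - n * p) * x[j]
                for c in range(k):
                    info[j][c] += n * p * (1.0 - p) * x[j] * x[c]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def inclusion_probability(beta, x):
    return 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))


# Toy model: intercept + male indicator, gender counts from Table 3.2.
rows = [[1.0, 1.0], [1.0, 0.0]]            # [intercept, male]
respondents = [33, 391]
totals = [33 + 189, 391 + 3718]
beta = fit_grouped_logit(rows, respondents, totals)
p_male = inclusion_probability(beta, rows[0])
p_female = inclusion_probability(beta, rows[1])
base_weight_male, base_weight_female = 1 / p_male, 1 / p_female
print(round(base_weight_male, 3), round(base_weight_female, 3))
```

With only one categorical predictor the model is saturated, so each fitted probability equals the observed response proportion in its cell. With several main effects, as in the report's final model, the probabilities no longer reduce to cell proportions.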
Univariate logistic regression models were first fitted with gender, age, state and practitioner division each as a predictor of inclusion in the sample. The model with the lowest Akaike’s Information Criterion (AIC) was selected as the starting model. Further factors were then added, and likelihood ratio tests were used to assess whether each addition improved model fit. The parsimonious model that minimised AIC and maximised the log-likelihood included main effects for state, practitioner division and gender to predict the probability of responding to the survey.
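The two selection criteria can be written out directly: AIC = 2k − 2 log L, and the likelihood ratio statistic 2(log L_full − log L_reduced) compared with a chi-square distribution whose degrees of freedom equal the number of added parameters. The sketch below is illustrative. The log-likelihood values are hypothetical, not results from this study.

```python
import math


def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike's Information Criterion, 2k - 2 log L; lower values are preferred."""
    return 2 * n_parameters - 2 * log_likelihood


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with `df` degrees of freedom.

    Uses the series for the regularised lower incomplete gamma function
    P(df/2, x/2), which is adequate for moderate test statistics.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-16 and n < 10_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(ll_reduced: float, ll_full: float, extra_df: int):
    """Return the likelihood ratio statistic and its chi-square p-value."""
    statistic = 2 * (ll_full - ll_reduced)
    return statistic, chi2_sf(statistic, extra_df)


# Hypothetical fits: intercept + gender (1) + division (2) = 4 parameters;
# adding state with 6 groups adds 5 more parameters.
ll_gender_division = -1300.0
ll_plus_state = -1290.0
print(aic(ll_gender_division, 4), aic(ll_plus_state, 9))
stat, p_value = likelihood_ratio_test(ll_gender_division, ll_plus_state, 5)
print(stat, round(p_value, 5))
```

Here the larger model has the lower AIC and a small p-value, so both criteria would favour adding state in this hypothetical case.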
The predicted probabilities for each combination of state, practitioner division, gender and age group were inverted to form the propensity score base weights for the survey data. These base weights are not scaled to match the sample size of the survey.
The base weights were then adjusted by the formula:
This adjustment maintains the weighted percentages from the base weights but scales the weighted sample to approximate the unadjusted respondent sample size (Table 3.3). The adjustment is necessary because the sample is approximately one-tenth the size of the Ahpra reference data.
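The adjustment formula itself is not reproduced in this section. A minimal sketch, assuming one common form with the stated effect, multiplies every base weight by n / Σ w_base, where n is the number of respondents. A constant multiplier leaves all weighted percentages unchanged. This is an assumption: the adjusted total reported in Table 3.3 (424 rather than 421) suggests the report's exact formula may differ. The function name and weights below are hypothetical.

```python
def rescale_to_sample_size(base_weights: list[float]) -> list[float]:
    """Scale weights by a constant so that they sum to the number of respondents.

    Because every weight is multiplied by the same factor, each weighted
    percentage is unchanged; only the weighted total changes.
    """
    n = len(base_weights)
    factor = n / sum(base_weights)
    return [w * factor for w in base_weights]


base = [1.06, 1.11, 1.79, 1.13]  # hypothetical base weights
adjusted = rescale_to_sample_size(base)
print([round(w, 3) for w in adjusted], round(sum(adjusted), 6))
```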
Calibration to reference data
The next stage of adjustment calibrates the weights so that the weighted marginal percentages of the sample covariates match the percentages of the target population. Generalised regression (GREG) raking estimators were used to calibrate the sample weights to the marginal percentages of the Ahpra population of OHPs.
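Raking can be illustrated with iterative proportional fitting, one common way of computing raking weights. This is a minimal sketch, not necessarily the algorithm the report's software used. Each pass rescales the weights so that one variable's weighted totals match its population margins, cycling until every adjustment factor is close to 1. The function name, the four respondents and the two-group division margin are hypothetical. The gender margins are the Ahpra counts from Table 3.3.

```python
def rake(weights, categories, targets, max_sweeps=100, tol=1e-10):
    """Rake weights to population margins by iterative proportional fitting.

    weights: starting weight per respondent.
    categories: {variable: [category of each respondent]}.
    targets: {variable: {category: population count}}.
    Every target category must contain at least one respondent.
    """
    w = list(weights)
    for _ in range(max_sweeps):
        largest_change = 0.0
        for var, cats in categories.items():
            totals = {}
            for wi, c in zip(w, cats):
                totals[c] = totals.get(c, 0.0) + wi
            factors = {c: targets[var][c] / t for c, t in totals.items()}
            largest_change = max([largest_change] + [abs(f - 1.0) for f in factors.values()])
            w = [wi * factors[c] for wi, c in zip(w, cats)]
        if largest_change < tol:  # all margins already matched on this sweep
            break
    return w


weights = [1.0, 1.0, 1.0, 1.0]
categories = {
    "gender": ["Male", "Female", "Female", "Female"],
    "division": ["DH", "DH", "Other", "Other"],
}
targets = {
    "gender": {"Male": 466, "Female": 4923},   # Ahpra margins (Table 3.3)
    "division": {"DH": 1438, "Other": 3951},    # hypothetical two-group split
}
raked = rake(weights, categories, targets)
print([round(w, 1) for w in raked])
```

Because both margins sum to the same population total, the raked weights also sum to that total. This mirrors how the raked weights in this report sum to the Ahpra population size.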
Some practitioners received substantially higher weights than others, which could give them undue influence on subsequent estimates. For this reason, calibration was repeated with a maximum weight enforced: weights were trimmed at wᵢ = 25, and the excess above this cap was redistributed iteratively across the remaining sample to maintain the Ahpra population size.
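A minimal sketch of trimming with iterative redistribution follows. It assumes the excess is shared in proportion to the uncapped weights. This is one common choice; the report does not specify how the excess was shared. The function name and the example weights are hypothetical.

```python
def trim_weights(weights, cap=25.0, max_iter=100):
    """Cap weights at `cap` and redistribute the excess to the uncapped weights.

    The excess is shared in proportion to the uncapped weights, so the total
    (e.g. the population size) is preserved. Redistribution can push other
    weights above the cap, so the step is repeated until none exceed it.
    """
    w = list(weights)
    if sum(w) > cap * len(w):
        raise ValueError("cap too low: the total weight cannot be preserved")
    for _ in range(max_iter):
        excess = sum(wi - cap for wi in w if wi > cap)
        if excess <= 0:
            break
        w = [min(wi, cap) for wi in w]
        free = sum(wi for wi in w if wi < cap)
        w = [wi if wi >= cap else wi * (1 + excess / free) for wi in w]
    return w


raked_weights = [41.0, 5.0, 10.0, 20.0]  # hypothetical raked weights
trimmed = trim_weights(raked_weights)
print([round(w, 3) for w in trimmed], round(sum(trimmed), 6))
```

In this example the first pass pushes the weight of 20 above the cap, so a second pass is needed. This is why the redistribution is iterative.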
Trimming is expected to reduce the variance of estimates at the cost of some bias. Table 3.4 illustrates this trade-off. With trimmed weights, some point estimates depart from the Ahpra percentages (for example, 4.2% rather than 5.6% for practitioners aged 60 – 64), while some confidence intervals are narrower (for the same age group, 2.3 to 7.7 rather than 3.0 to 10.3).
Table 3.3. Australian oral health workforce survey respondent characteristics, unweighted and weighted by base weights and by propensity score adjusted weights. Australian Health Practitioner Regulation Agency data are provided for comparison.
| Characteristic | Sample, n (%), N = 421 | Base weights, % (95% CI), N = 476 | Propensity score adjusted weights, % (95% CI), N = 424 | Ahpra population, n (%), N = 5,389 |
|---|---|---|---|---|
| Gender | | | | |
| Male | 33 (7.8) | 8.6 (6.18, 11.9) | 8.6 (6.18, 11.9) | 466 (8.6) |
| Female | 388 (92.2) | 91.4 (88.1, 93.8) | 91.4 (88.1, 93.8) | 4,923 (91.4) |
| Age (years) | | | | |
| Less than 30 | 101 (24.0) | 23.6 (19.8, 27.9) | 23.6 (19.8, 27.9) | 1,388 (25.8) |
| 30 – 39 | 155 (36.8) | 36.8 (32.3, 41.6) | 36.8 (32.3, 41.6) | 1,891 (35.1) |
| 40 – 49 | 82 (19.5) | 19.5 (15.9, 23.6) | 19.5 (15.9, 23.6) | 971 (18.0) |
| 50 or more | 83 (19.7) | 20.1 (16.5, 24.2) | 20.1 (16.5, 24.2) | 1,139 (21.1) |
| Primary state of practice | | | | |
| NSW | 91 (21.6) | 20.8 (17.2, 24.9) | 20.8 (17.2, 24.9) | 1,404 (26.1) |
| VIC | 79 (18.8) | 18.2 (14.8, 22.1) | 18.2 (14.8, 22.1) | 1,189 (22.1) |
| QLD | 104 (24.7) | 25.3 (21.3, 29.7) | 25.3 (21.3, 29.7) | 964 (17.9) |
| SA | 82 (19.5) | 19.7 (16.2, 23.9) | 19.7 (16.2, 23.9) | 751 (13.9) |
| WA | 42 (10.0) | 9.6 (7.13, 12.7) | 9.6 (7.13, 12.7) | 849 (15.8) |
| TAS, ACT and NT | 23 (5.5) | 6.4 (4.32, 9.52) | 6.4 (4.32, 9.52) | 232 (4.3) |
| Division of practice | | | | |
| DH | 132 (31.4) | 31.5 (27.2, 36.2) | 31.5 (27.2, 36.2) | 1,438 (26.7) |
| OHT | 223 (53.0) | 52.0 (47.2, 56.8) | 52.0 (47.2, 56.8) | 2,787 (51.7) |
| All other combinations | 66 (15.7) | 16.4 (13.1, 20.4) | 16.4 (13.1, 20.4) | 1,164 (21.6) |
Table 3.4. Australian oral health workforce survey respondent characteristics weighted by raked weights and by raked and trimmed weights. Australian Health Practitioner Regulation Agency data are provided for comparison.
| Characteristic | Raked, % (95% CI), N = 5,389 | Raked and trimmed, % (95% CI), N = 5,389 | Ahpra population, n (%), N = 5,389 |
|---|---|---|---|
| Gender | | | |
| Male | 8.6 (6.1, 12.1) | 8.7 (6.2, 12.2) | 466 (8.6) |
| Female | 91.4 (87.9, 93.9) | 91.3 (87.8, 93.8) | 4,923 (91.4) |
| Age (years) | | | |
| Less than 25 | 8.9 (6.31, 12.3) | 9.0 (6.4, 12.5) | 478 (8.9) |
| 25 – 29 | 16.9 (13.3, 21.2) | 17.0 (13.4, 21.3) | 910 (16.9) |
| 30 – 34 | 19.9 (16.3, 24.2) | 20.3 (16.6, 24.5) | 1,075 (19.9) |
| 35 – 39 | 15.1 (11.8, 19.2) | 15.4 (12.1, 19.5) | 816 (15.1) |
| 40 – 44 | 10.1 (7.60, 13.3) | 10.3 (7.8, 13.6) | 545 (10.1) |
| 45 – 49 | 7.9 (5.5, 11.2) | 8.0 (5.6, 11.3) | 426 (7.9) |
| 50 – 54 | 7.3 (5.1, 10.5) | 7.4 (5.2, 10.6) | 396 (7.3) |
| 55 – 59 | 6.3 (4.3, 9.0) | 6.3 (4.4, 9.1) | 337 (6.3) |
| 60 – 64 | 5.6 (3.0, 10.3) | 4.2 (2.3, 7.7) | 302 (5.6) |
| 65 – 69 | 1.9 (0.9, 4.1) | 2.0 (0.9, 4.1) | 104 (1.9) |
| Primary state of practice | | | |
| NSW | 26.1 (21.6, 31.0) | 25.9 (21.6, 30.8) | 1,404 (26.1) |
| VIC | 22.1 (18.0, 26.8) | 22.1 (18.1, 26.7) | 1,189 (22.1) |
| QLD | 17.9 (14.7, 21.6) | 18.2 (15.0, 21.8) | 964 (17.9) |
| SA | 13.9 (11.1, 17.3) | 14.1 (11.3, 17.5) | 751 (13.9) |
| WA | 15.8 (11.9, 20.6) | 15.6 (11.8, 20.4) | 849 (15.8) |
| TAS | 1.6 (0.82, 3.2) | 1.7 (0.8, 3.3) | 88 (1.6) |
| ACT | 1.8 (0.7, 4.6) | 1.6 (0.7, 3.5) | 99 (1.8) |
| NT | 0.8 (0.4, 1.9) | 0.9 (0.4, 2.0) | 45 (0.8) |
| Practitioner division | | | |
| DH | 26.7 (22.6, 31.2) | 27.1 (23.0, 31.6) | 1,438 (26.7) |
| DT | 12.3 (8.8, 17.0) | 11.5 (8.3, 15.8) | 664 (12.3) |
| OHT | 51.7 (46.5, 56.9) | 52.2 (47.1, 57.3) | 2,787 (51.7) |
| DT/DH | 7.0 (4.6, 10.4) | 6.8 (4.6, 10.0) | 377 (7.0) |
| Other combinations | 2.3 (1.1, 4.8) | 2.3 (1.1, 4.86) | 123 (2.3) |
Quality of survey weights
Table 3.5 summarises the quality of the survey weights at each stage of the calculation. The weighted numbers show how each stage changed the estimated sample size, until the raked weights sum to the number of Ahpra-registered oral health practitioners (5,389). The mean and standard deviation (SD) of the weights show how the weights changed after each calculation and adjustment. The SD also fell, from 5.370 to 4.578, when the extreme raked weights were trimmed.
Overall, the reduction in variation (SD) after trimming did not materially improve the precision of the percentage estimates. Therefore, the untrimmed raked weights were used for all weighted analyses in this report.
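The statistics in Table 3.5 can be computed with Python's standard library. This is a minimal sketch. The function name is hypothetical, and the choice of the sample SD and the "inclusive" quartile method is an assumption, since the report does not state which definitions were used. The example compares hypothetical weights before and after trimming at 25.

```python
import statistics


def weight_summary(weights):
    """Summary statistics of the kind reported in Table 3.5."""
    q1, _, q3 = statistics.quantiles(weights, n=4, method="inclusive")
    return {
        "individuals": len(weights),
        "weighted_total": sum(weights),
        "minimum": min(weights),
        "maximum": max(weights),
        "mean": statistics.mean(weights),
        "sd": statistics.stdev(weights),      # sample standard deviation
        "median": statistics.median(weights),
        "iqr": q3 - q1,
    }


untrimmed = [41.0, 5.0, 10.0, 20.0]           # hypothetical raked weights
trimmed = [25.0, 26 / 3, 52 / 3, 25.0]         # the same weights trimmed at 25
untrimmed_summary = weight_summary(untrimmed)
trimmed_summary = weight_summary(trimmed)
for name, s in [("untrimmed", untrimmed_summary), ("trimmed", trimmed_summary)]:
    print(name, round(s["weighted_total"], 3), round(s["mean"], 3), round(s["sd"], 3))
```

As in Table 3.5, trimming leaves the weighted total and mean unchanged but lowers the maximum and the SD.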
Table 3.5. Summary statistics of staged weight calculations.
| Weighting type | Individuals | Weighted numbers | Minimum | Maximum | Mean | SD | Median | IQR |
|---|---|---|---|---|---|---|---|---|
| Base | 421 | 476 | 1.059 | 1.786 | 1.131 | 0.075 | 1.113 | 0.067 |
| Adjusted | 421 | 424 | 0.942 | 1.589 | 1.007 | 0.066 | 0.990 | 0.060 |
| Raked | 421 | 5,389 | 4.164 | 41.011 | 12.800 | 5.370 | 11.923 | 7.152 |
| Raked (trimmed) | 421 | 5,389 | 4.388 | 25.000 | 12.800 | 4.578 | 12.148 | 7.152 |
Abbreviations
- Ahpra: Australian Health Practitioner Regulation Agency
- DH: Dental Hygienists
- DT: Dental Therapists
- OHT: Oral Health Therapists
- ACT: Australian Capital Territory
- NSW: New South Wales
- NT: Northern Territory
- QLD: Queensland
- SA: South Australia
- TAS: Tasmania
- VIC: Victoria
- WA: Western Australia
- AIC: Akaike’s Information Criterion
- GREG: Generalised regression
- OHP: Oral health practitioner
- SD: Standard deviation
- IQR: Interquartile range