- 1 Introduction
- 2 Cleaning of time use diary data
- 3 Report on Adapted PPVT-III and 'Who Am I?'
- 4 Imputations to solve missing data problems in Wave 2.5
- 5 Review of main educational program of 4-5 year olds
- 6 Cleaning of income data
- 7 Height differences
- 8 Data issues in Wave 3.5
- 9 Data issues in Wave 4
- 10 Data issues in Wave 5
- 11 Smoking inside the household
- 12 Missing data for Wave 6 items
- 13 Issues with breadwinner questions
- 14 Date of birth corrections
- 15 Minor changes for weight, BMI and height percentiles and z-scores
- 16 Body fat percentage data corrections
- 17 Wave 4 salary and wages
- 18 Study children allergies (issues with Wave 6 and 7 data)
- 19 After school care issue Wave 7 B cohort
- 20 Who is mother/father issue
- 21 Repeated a year level issue
- 22 Executive functioning - CogState - missing data Wave 7
- 23 Expected/received child support per child
- 24 Reason for change in education institution - SC CAI 6.5 (pc44c3b1)
- 25 Child support - parent living elsewhere PLE 20.8 (pe21p5)
- 26 Informant indicator in LSAC variable naming convention: Approach in Wave 7 and subsequent Waves
- 27 Desired occupation sequencing issue
- 28 Inconsistent placement of SC question
- 29 Difference in health status of household members across waves of LSAC
- 30 Academic Rating Scale score in Wave 7
- 31 Gambling data inconsistencies
- 32 References
- Appendix A: Item-person map
- Appendix B: Principal component analysis
5 Review of main educational program of 4-5 year olds
5.1 K cohort
In investigating the quality of the data for the child's educational program type (cpc06a4) at Wave 1, concerns were raised in regard to the consistency of responses to this item with other information from the face-to-face interview and the teacher questionnaire. It was decided to provide a corrected version of cpc06a4 as well as the original version. The correction involved two processes:
1. If teacher data was present and contradicted the value given by Parent 1, the value indicated by the teacher data was used instead.
2. If no teacher data was present, a number of checks were performed on the consistency of the parent's response with other data given (e.g. number of hours in care, the age of the study child, etc.). If a majority of cases with teacher data were corrected when they had the same combination of the original response and number of inconsistencies, then those without teacher data were corrected to the majority value. For example, it was found that among those cases with two or more inconsistencies whose original response was 'Pre-year 1 in a school', more than 50% of the teacher data, where available, indicated that the true response was 'Preschool in a school'. This value was therefore assumed to be most likely for these cases in the absence of teacher data.
More information on this process is provided in the Data User Guide.
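The two-step correction described above can be sketched as follows. This is a minimal illustration only, not the procedure's actual implementation: the field names (`original`, `teacher`, `n_inconsistent`) and both function names are hypothetical, and grouping by the exact inconsistency count (rather than by a band such as "two or more") is an assumption.

```python
from collections import Counter, defaultdict


def build_majority_rules(cases_with_teacher):
    """Map (original response, inconsistency count) -> majority teacher value.

    A rule is kept only when more than half of the cases with teacher data
    in that group share one value that differs from the original response.
    """
    groups = defaultdict(Counter)
    for case in cases_with_teacher:
        key = (case["original"], case["n_inconsistent"])
        groups[key][case["teacher"]] += 1

    rules = {}
    for key, counts in groups.items():
        value, n = counts.most_common(1)[0]
        if value != key[0] and n > sum(counts.values()) / 2:
            rules[key] = value
    return rules


def correct(case, rules):
    """Step 1: use teacher data if present. Step 2: fall back to the majority rule."""
    if case.get("teacher") is not None:
        return case["teacher"]
    key = (case["original"], case["n_inconsistent"])
    return rules.get(key, case["original"])
```

Cases without teacher data keep their original response unless a majority rule exists for their combination of response and inconsistency count.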
At Wave 3, respondents were asked to confirm the details of the educational program the child was in at the time of the Wave 2 interview two years earlier. They were then asked about the child's educational programs in each earlier year, from three up to six years prior, working backwards until the child either was not in an educational program or was in preschool/kindergarten.
Table 11 shows the information captured for each year. This section suggests improvements to the imputation based on this new data.
To determine how best to use these data, their quality first has to be assessed. As an initial check, the recall data were tested for internal consistency. The data were considered unreliable if the gap in year level between two time points was greater than the number of years between them, or if it was smaller and there was no indication that a year level had been repeated. This check showed that 14.5% of cases were unreliable. The data collected at Wave 3 for these cases were not used to impute cpc06a4.
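The reliability rule can be expressed as a small function. The sketch below is illustrative only: it assumes year levels are coded as consecutive integers (for example, 0 for pre-Year 1 and 1 for Year 1), and the function and parameter names are hypothetical. As a simplification, it accepts any smaller-than-expected gap when a repeat is indicated.

```python
def recall_is_reliable(level_early, level_late, years_between, repeated=False):
    """Return True when two recalled year levels are mutually consistent.

    level_early / level_late: integer year-level codes (assumed coding).
    years_between: number of years between the two time points.
    repeated: whether the respondent indicated a repeated year level.
    """
    gap = level_late - level_early
    if gap > years_between:
        return False      # advanced more levels than years elapsed
    if gap < years_between:
        return repeated   # fell behind: plausible only with a repeated year
    return True
```

For example, a child recalled as being in pre-Year 1 and then Year 2 two years later passes the check, whereas one recalled as being in Year 3 two years later does not.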
The data were then examined to quantify the number of inconsistencies with other data items from the Wave 1 questionnaire. The following circumstances were considered to be inconsistent:
- The child was in a 'pre-year 1 program' at school and was:
  - attending this program fewer than five days/week
  - attending this program less than 30 hours/week
  - younger than 55 months of age at Wave 1
  - in 'Year 1' in Wave 2 unless indicated they had repeated a grade level.
- The child was attending a 'preschool' (other than in day care) and was:
  - attending this program for 30+ hours/week
  - more than 62 months of age at Wave 1
  - in 'Year 2' at Wave 2.
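The checks listed above amount to counting how many unlikely conditions hold for each case. A minimal sketch is given below; the dictionary keys (`program`, `days_per_week`, `hours_per_week`, `age_months`, `wave2_level`, `repeated`) and the program labels are hypothetical stand-ins, not LSAC variable names.

```python
def count_inconsistencies(case):
    """Count the unlikely combinations for one Wave 1 case."""
    program = case["program"]
    checks = []
    if program == "pre-year 1":
        checks = [
            case["days_per_week"] < 5,
            case["hours_per_week"] < 30,
            case["age_months"] < 55,
            # Still in Year 1 two years later, with no repeated grade reported
            case["wave2_level"] == "Year 1" and not case["repeated"],
        ]
    elif program == "preschool":
        checks = [
            case["hours_per_week"] >= 30,
            case["age_months"] > 62,
            case["wave2_level"] == "Year 2",
        ]
    # Each True counts as 1; other program types have no checks defined here
    return sum(checks)
```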
Three different versions of this information were compared using these checks:
1. 'Original' - the original value entered from the Wave 1 face-to-face interview
2. 'Teacher' - the original data corrected when it disagreed with data obtained from the Wave 1 teacher questionnaire
3. 'Recall' - the information as recalled by the respondents at Wave 3 for four years prior.
Among those cases that had teacher data at Wave 1 and reliable recall data at Wave 3, 81% were found to have no inconsistencies when using the recall data. This compares with 65% with no inconsistencies using the original data and 84% when using the teacher data. The teacher data therefore remain the most consistent indicator of the true value, although the recall data are also reasonably consistent.
To determine how best to use these data in the imputation, two different methods were tried. In the first, the recall data were substituted for the original data automatically. The value created using this scheme agreed with the one created using the teacher questionnaire data in 76% of cases.
For the second approach to imputation, the recall data (when reliable) were used as an additional check to those listed above and imputations were made on the basis of the number of unlikely combinations of data.
Under the second scheme, the following corrections were made:
1. Children in 'Year 1' were automatically recoded to 'pre-Year 1'.
2. Children in 'pre-Year 1' with two or more inconsistencies were recoded to 'preschool in a school'.
3. Children attending a 'preschool in a school' with two or more inconsistencies were recoded to 'pre-Year 1'.
4. Children attending a 'preschool at a non-school centre' with two or more inconsistencies were recoded as being in a 'day care centre with a preschool program'.
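The four corrections listed above can be written as a simple recode. The sketch below is illustrative: the function name and the `RECODES` table are hypothetical, and the category labels are taken from the list rather than from the dataset's actual value codes.

```python
# Recodes applied only when a case has two or more inconsistencies
RECODES = {
    "pre-Year 1": "preschool in a school",
    "preschool in a school": "pre-Year 1",
    "preschool at a non-school centre": "day care centre with a preschool program",
}


def apply_second_scheme(program, n_inconsistent):
    """Return the corrected program type for one case."""
    if program == "Year 1":
        return "pre-Year 1"              # rule 1: applied unconditionally
    if n_inconsistent >= 2 and program in RECODES:
        return RECODES[program]          # rules 2 to 4
    return program                       # otherwise leave the value unchanged
```

Here `n_inconsistent` is assumed to count the unlikely combinations for the case, including the recall-data check where reliable recall data exist.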
For cases with Wave 1 teacher data, the data generated by these corrections matched the teacher data in 80% of cases, better than using the recall data by themselves, and better than the correction scheme used prior to the recall data becoming available (which matched in 73% of cases). This approach has therefore been taken.
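The match rates quoted here are simple agreement proportions over the cases that have teacher data. A minimal sketch of that calculation follows; the function name and the use of `None` to mark missing teacher data are assumptions made for illustration.

```python
def agreement_rate(candidate, teacher):
    """Share of cases where a candidate value equals the teacher value.

    Cases without teacher data (None) are excluded from the denominator.
    Returns None when no case has teacher data.
    """
    pairs = [(c, t) for c, t in zip(candidate, teacher) if t is not None]
    if not pairs:
        return None
    return sum(c == t for c, t in pairs) / len(pairs)
```

Computing this rate for each candidate version against the same teacher values gives the kind of comparison reported in this section.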
5.2 B cohort
Given the problems experienced for the K cohort at Wave 1, a different set of questions on educational programs was developed for the B cohort at Wave 3 (see Table 12). In Wave 1 for the K cohort, the data collected from the face-to-face interview on educational programs differed from that collected in the teacher questionnaire in 29% of cases. In Wave 3, for the B cohort, there were differences in 13% of cases.
However, when the consistency of the teacher data and the parent data was tested against other answers in the Wave 3 interview, neither version was found to have many inconsistencies, although the teacher-corrected version had slightly more (3% versus 2.7%).
In the seven cases (so far) with inconsistencies when the teacher data were used, the teacher's response was 'pre-Year 1 school program' while the parent's was 'preschool program in a school'. These cases may represent programs that don't fall neatly into either category (e.g. classes at a pre-Year 1 level that children attend part-time), although there is no consistency in terms of state of residence of the children or the organisational basis of the school (e.g. independent versus state versus Catholic). Whatever the situation is with these cases, there seems to be little reason to correct the parent data or teacher data when there is little indication of which is correct.
1. Teacher data still to be used to correct parent data when available in determining educational program at Wave 1 for the K cohort.
2. Recall data to be used as an extra consistency check within the existing process when imputing this information when teacher data is absent.
3. No imputation to be performed on Wave 3 B cohort educational program data.
2 That is, excluding the 14.5% of cases whose recall data were found to be unreliable, as mentioned above.