Data Issues

Waves 1 to 7
Data issues - Waves 1 to 7 – February 2019

5 Review of main educational program of 4-5 year olds

5.1 K cohort

In investigating the quality of the data for the child's educational program type (cpc06a4) at Wave 1, concerns were raised in regard to the consistency of responses to this item with other information from the face-to-face interview and the teacher questionnaire. It was decided to provide a corrected version of cpc06a4 as well as the original version. The correction involved two processes:

1. If teacher data was present and contradicted the value given by Parent 1, the value indicated by the teacher data was used instead.

Or

2. If no teacher data was present, a number of checks were performed on the consistency of the parent's response with other data given (e.g. number of hours in care, the age of the study child, etc.). If a majority of cases with teacher data were corrected when they had the same combination of the original response and number of inconsistencies, then those without teacher data were corrected to the majority value. For example, it was found that among those cases with two or more inconsistencies whose original response was 'Pre-year 1 in a school', more than 50% of the teacher data, where available, indicated that the true response was 'Preschool in a school'. This value was therefore assumed to be most likely for these cases in the absence of teacher data.
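The two processes above can be sketched in code. This is a minimal illustration under stated assumptions, not the study's actual procedure: the record keys (`original`, `teacher`, `n_inconsistent`) and the function name are hypothetical, and "majority" is read here as the single most common teacher value accounting for more than 50% of comparable cases. The inconsistency count may in practice be binned (e.g. "two or more").

```python
from collections import Counter, defaultdict

def correct_program(records):
    """Return one corrected program value per record.

    Each record is a dict with hypothetical keys:
      'original'       - Parent 1's response
      'teacher'        - teacher-reported value, or None if absent
      'n_inconsistent' - number of failed consistency checks
    """
    # Tally teacher values for each (original response, inconsistency count)
    # combination, using only the cases that have teacher data.
    teacher_votes = defaultdict(Counter)
    for r in records:
        if r["teacher"] is not None:
            key = (r["original"], r["n_inconsistent"])
            teacher_votes[key][r["teacher"]] += 1

    corrected = []
    for r in records:
        if r["teacher"] is not None:
            # Process 1: teacher data replaces the parent response.
            corrected.append(r["teacher"])
            continue
        # Process 2: no teacher data, so borrow the majority teacher value
        # from comparable cases, if a strict majority exists (assumption).
        votes = teacher_votes.get((r["original"], r["n_inconsistent"]))
        value = r["original"]
        if votes:
            top, count = votes.most_common(1)[0]
            if top != r["original"] and count > sum(votes.values()) / 2:
                value = top
        corrected.append(value)
    return corrected
```

Cases whose combination of response and inconsistency count never occurs among the teacher-data cases are left unchanged in this sketch.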

More information on this process is provided in the Data User Guide.

At Wave 3, respondents were asked to confirm the details of the educational program the child was in at the time of the Wave 2 interview two years prior, and were then asked about the child's educational programs for each year from three up to six years prior (working backwards until either the child wasn't in an educational program or was in preschool/kindergarten).

Table 11 shows the information captured for each year. This section suggests improvements to the imputation based on these new data.

To determine how best to use these data, their quality first had to be assessed. As an initial check, the recall data were tested for internal consistency. The data were considered unreliable if the gap in year level between time points was greater than the number of years between them, or if it was smaller without any indication that a year level had been repeated. This check found that 14.5% of cases were unreliable. The data collected at Wave 3 for these cases were not used to impute cpc06a4.
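The reliability rule can be expressed as a small function. This is a sketch only: the ordinal coding of year levels (e.g. 0 = pre-Year 1, 1 = Year 1, 2 = Year 2) and the argument names are assumptions made for illustration and do not come from the data set.

```python
def recall_is_reliable(level_then, level_now, years_between, repeated=False):
    """Apply the internal-consistency rule to one pair of time points.

    Levels use a hypothetical ordinal scale in which each school year
    level is one step above the previous one.
    """
    gap = level_now - level_then
    if gap > years_between:
        # More levels gained than years elapsed.
        return False
    if gap < years_between and not repeated:
        # Fewer levels gained than years elapsed, with no repeated year.
        return False
    return True
```

For example, a child recalled as being in pre-Year 1 and then in Year 2 two years later passes the check, while a child recalled as being in Year 1 and then in Year 2 two years later passes only if a repeated year was indicated.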

The data were then examined to quantify the number of inconsistencies with other data items from the Wave 1 questionnaire. The following circumstances were considered to be inconsistent:

  • The child was in a 'pre-year 1 program' at school and was:
    • attending this program fewer than five days/week
    • attending this program less than 30 hours/week
    • younger than 55 months of age at Wave 1

    or

    • in 'Year 1' in Wave 2 unless indicated they had repeated a grade level.
  • The child was attending a 'preschool' (other than in day care) and was:
    • attending this program for 30+ hours/week
    • more than 62 months of age at Wave 1

    or

    • in 'Year 2' at Wave 2.
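
The checks listed above lend themselves to a simple counting function. The sketch below is illustrative rather than the code used for the study: the argument names and program labels are hypothetical, and each circumstance is assumed to count as one inconsistency.

```python
def count_inconsistencies(program, days_per_week, hours_per_week,
                          age_months, wave2_level, repeated=False):
    """Count how many of the listed circumstances apply to one child."""
    n = 0
    if program == "pre-year 1":
        n += int(days_per_week < 5)       # fewer than five days/week
        n += int(hours_per_week < 30)     # less than 30 hours/week
        n += int(age_months < 55)         # younger than 55 months at Wave 1
        # In Year 1 at Wave 2 without having repeated a grade level.
        n += int(wave2_level == "Year 1" and not repeated)
    elif program == "preschool":          # preschool other than in day care
        n += int(hours_per_week >= 30)    # 30+ hours/week
        n += int(age_months > 62)         # more than 62 months at Wave 1
        n += int(wave2_level == "Year 2") # in Year 2 at Wave 2
    return n
```

Programs other than the two covered by the checks return zero, since no circumstances are listed for them.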

Table 11: Variables capturing previous years' educational programs for the K cohort at Wave 3

Questions and responses (→ next item)

1) What program did child attend the year before, that is in (3 years prior)?
  • Year 1 (Grade 1) → 3
  • Pre-year 1 program → 3
  • Preschool/kindergarten program → 3
  • Long day care → 3
  • Home-schooled → 3
  • Other → 2
  • Child did not attend an educational program → End of recall

2) Other specify → Previous year

3) Was that located in a school?
  • Yes → Previous year
  • No → epc59d?

4) Was it a …?
  • Preschool/kindergarten only centre → End of recall items
  • Preschool/kindergarten in a long day care centre → End of recall items
  • Mobile pre-school → End of recall items
  • Long day care centre → End of recall items
  • Other → End of recall items

Three different versions of this information were compared using these checks:

1. 'Original' - the original value entered from the Wave 1 face-to-face interview

2. 'Teacher' - the original data corrected when it disagreed with data obtained from the Wave 1 teacher questionnaire

3. 'Recall' - the information as recalled by the respondents at Wave 3 for four years prior.

Among those cases that had teacher data at Wave 1 and had reliable recall data at Wave 3,[2] 81% were found to have no inconsistencies when using the recall data. This compares with 65% with no inconsistencies using the original data and 84% when using the teacher data. The teacher data therefore still appear to be the most consistent indicator of the true value, although the recall data are also reasonably consistent.

To determine how best to use these data in the imputation, two different methods were tried. In the first, the recall data were substituted for the original data automatically. The value created using this scheme agreed with the one based on the teacher questionnaire data in 76% of cases.

For the second approach to imputation, the recall data (when reliable) were used as an additional check to those listed above and imputations were made on the basis of the number of unlikely combinations of data.

Under the second scheme, the following corrections were made:

1. Children in 'Year 1' were automatically recoded to 'pre-Year 1'.

2. Children in 'pre-year 1' with two or more inconsistencies were recoded to 'preschool in a school'.

3. Children attending a 'preschool in a school' with two or more inconsistencies were recoded to 'pre-year 1'.

4. Children attending a 'preschool at a non-school centre' with two or more inconsistencies were recoded as being in a 'day care centre with a preschool program'.
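
The four corrections above amount to a lookup applied to each case. The following sketch assumes a single pass per case and an inconsistency count that already includes the recall check; the names `RECODE_IF_INCONSISTENT` and `apply_second_scheme` are hypothetical.

```python
# Recodes applied only when a case has two or more inconsistencies.
RECODE_IF_INCONSISTENT = {
    "pre-year 1": "preschool in a school",
    "preschool in a school": "pre-year 1",
    "preschool at a non-school centre": "day care centre with a preschool program",
}

def apply_second_scheme(program, n_inconsistent):
    """Return the corrected program value for one case."""
    if program == "Year 1":
        # Correction 1: recoded regardless of the inconsistency count.
        return "pre-year 1"
    if n_inconsistent >= 2 and program in RECODE_IF_INCONSISTENT:
        # Corrections 2 to 4: recoded only with two or more inconsistencies.
        return RECODE_IF_INCONSISTENT[program]
    return program
```

Whether a case recoded from 'Year 1' is then re-examined against the other rules is not stated in the text; this sketch does not do so.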

For cases with Wave 1 teacher data, the data generated by these corrections matched the teacher data in 80% of cases, better than using the recall data by themselves, and better than the correction scheme used prior to the recall data becoming available (which matched in 73% of cases). This approach has therefore been taken.
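
The match rates quoted for each scheme are simple agreement proportions over the cases with teacher data. A minimal sketch of that calculation follows, with a hypothetical function name and with `None` assumed to mark a missing teacher value.

```python
def agreement_rate(candidate, teacher):
    """Share of cases with teacher data where the candidate value matches it."""
    pairs = [(c, t) for c, t in zip(candidate, teacher) if t is not None]
    if not pairs:
        raise ValueError("no cases with teacher data")
    return sum(c == t for c, t in pairs) / len(pairs)
```

Running this once per candidate version (original, recall-substituted, or rule-corrected) against the same teacher values gives directly comparable rates.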

5.2 B cohort

Given the problems experienced for the K cohort at Wave 1, a different set of questions on educational programs was developed for the B cohort at Wave 3 (see Table 12). In Wave 1 for the K cohort, the data collected from the face-to-face interview on educational programs differed from that collected in the teacher questionnaire in 29% of cases. In Wave 3, for the B cohort, there were differences in 13% of cases.

However, when the consistency of the teacher data and the parent data was tested against other answers in the Wave 3 interview, neither version had many inconsistencies, although the teacher-corrected version had slightly more (3% versus 2.7%).

In the seven cases (so far) with inconsistencies when the teacher data were used, the teacher's response was 'pre-Year 1 school program' while the parent's was 'preschool program in a school'. These cases may represent programs that don't fall neatly into either category (e.g. classes at a pre-Year 1 level that children attend part-time), although there is no consistency in terms of state of residence of the children or the organisational basis of the school (e.g. independent versus state versus Catholic). Whatever the situation is with these cases, there seems to be little reason to correct the parent data or teacher data when there is little indication of which is correct.

Outcomes

1. Teacher data still to be used to correct parent data when available in determining educational program at Wave 1 for the K cohort.

2. Recall data to be used as an extra consistency check within the existing process when imputing this information when teacher data is absent.

3. No imputation to be performed on Wave 3 B cohort educational program data.

Table 12: Variables capturing current educational programs for the B cohort at Wave 3

Questions and responses (→ next item)

1) (Thinking about the arrangement the child uses for the most hours per week) is this located in a school?
  • Yes → 1
  • No → 5

2) What class or program does child attend?
  • Year 1 (Grade 1) → 4
  • Pre-year 1 program → 4
  • Preschool/kindergarten program → 4
  • Long day care → 4
  • Other, e.g. multi-age classes, early intervention → 3

3) Other specify → 4

4) Does child attend this program at
  • A government school? → Further items
  • A Catholic school? → Further items
  • An independent or private school? → Further items

5) Which of the following best describes where child goes?
  • Preschool/kindergarten only centre → Further items
  • Preschool/kindergarten in a long day care centre → Further items
  • Mobile preschool → Further items
  • Long day care centre → Further items
  • Other → 6

6) Other specify → Further items

2 That is, minus the 14.5% mentioned above.