
The Longitudinal Study of Australian Children: An Australian Government Initiative
Data User Guide – August 2018

10. Data imputation

Limited imputation of data is undertaken in LSAC. In general, imputation occurs only when there is a clear contradiction between data items and good reason to believe one item over the other. The principles applied are described below.

10.1 Virtual roll-forward

'Roll-forward' is the term in computer-assisted interviewing (CAI) design for using data from a previous wave of data collection to determine which questions need to be asked in a subsequent wave. In wave 2, a limited set of data was rolled forward, largely to assist with the household composition module. Time and resource constraints meant that roll-forward could not be used in some other parts of the questionnaire where it might have reduced respondent burden.

For example, in wave 2 respondents were again asked the age at which the child stopped being breastfed, to capture this information for children who were still being breastfed at the time of wave 1. When the question was re-asked, some respondents gave answers that differed from their wave 1 responses. Because respondents' recollection is likely to be more accurate closer to the event (i.e. the cessation of breastfeeding), it was decided that where wave 1 data exist, the wave 1 value is taken as correct and the wave 2 value is ignored (i.e. as if the wave 1 data had been rolled forward and the question had not been asked in wave 2). As a result, a single variable is produced that represents the best estimate from the two waves of data. (Users can tell at which wave the timing data were collected by referring to each wave's question asking whether the child is still being breastfed.)
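The "virtual roll-forward" rule amounts to preferring the wave 1 value whenever one exists and falling back to the wave 2 value otherwise. The Python sketch below illustrates that rule only; it is not LSAC's processing code. The function name `virtual_roll_forward`, the use of `None` for a missing value, and the example records are all hypothetical.

```python
from typing import Optional


def virtual_roll_forward(wave1_value: Optional[float],
                         wave2_value: Optional[float]) -> Optional[float]:
    """Return a single best-estimate value from two waves (hypothetical helper).

    If wave 1 recorded a value, it is kept and the wave 2 answer is ignored,
    as if the wave 1 value had been rolled forward and the question not
    re-asked. Otherwise the wave 2 value (which may itself be None) is used.
    """
    if wave1_value is not None:
        return wave1_value
    return wave2_value


# Hypothetical records: (age in months breastfeeding stopped, as reported at wave 1, at wave 2)
records = [
    (6.0, 7.0),    # conflicting answers: wave 1 value kept
    (None, 14.0),  # still breastfed at wave 1: wave 2 answer used
    (None, None),  # never reported at either wave
]
combined = [virtual_roll_forward(w1, w2) for w1, w2 in records]
print(combined)  # [6.0, 14.0, None]
```

Because the rule never averages or reconciles the two answers, the combined variable on its own does not record which wave supplied the value; that has to be recovered from the "still being breastfed" questions, as noted above.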

Note: From wave 3 onwards, roll-forward is used more extensively, which reduces the number of situations in which such conflicts can occur.

10.2 Longitudinal contradictions

Another contradiction can arise when respondents report at a later wave that an event occurred before an earlier wave, even though the earlier wave's data indicated that the event had not yet happened.

In these cases, the time of the previous wave is treated as the time of the event. For example, if a parent reported at wave 2 that the child stopped being breastfed at two months of age, but at wave 1 the child was three months old and was reported as still being breastfed, the age of breastfeeding cessation would be set to three months.

This strategy for fixing the time of an event is also used for the:

  • date when new members joined the household
  • length of attendance at a particular child care facility
  • date wave 1 members and temporary members left the household (bf14m1, bf14m2, etc.)
  • age stopped breastfeeding (zf05c)
  • age first had non-breast milk (zhb07)
  • age first had solid food (zhb10)
  • age entered child care arrangements (bpc11a, bpc11b, etc.)
  • age last lived with two biological parents (bpe23c).
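The rule for longitudinal contradictions can be expressed as a small function: if the previous wave showed that the event had not yet happened, a later-reported time earlier than that wave is moved forward to the time of that wave. The Python sketch below is illustrative only and assumes that both times are on the same scale (e.g. the child's age in months); the function name `fix_event_time` and its parameters are hypothetical and do not correspond to LSAC variables.

```python
from typing import Optional


def fix_event_time(reported_time: Optional[float],
                   prev_wave_time: float,
                   happened_by_prev_wave: bool) -> Optional[float]:
    """Resolve a longitudinal contradiction in the timing of an event (hypothetical helper).

    reported_time: when the event occurred, as reported at the later wave
        (e.g. child's age in months when breastfeeding stopped); None if missing.
    prev_wave_time: the same time scale measured at the previous wave
        (e.g. the child's age in months at the wave 1 interview).
    happened_by_prev_wave: whether the previous wave's data showed that the
        event had already occurred.
    """
    if reported_time is None:
        return None
    if not happened_by_prev_wave and reported_time < prev_wave_time:
        # The event cannot have happened before a wave at which it was
        # reported as not yet having happened, so use that wave's time.
        return prev_wave_time
    return reported_time


# The breastfeeding example: reported at wave 2 as stopping at 2 months,
# but still breastfed at the wave 1 interview at 3 months of age.
print(fix_event_time(2, 3, False))  # 3
# No contradiction: a reported time after the previous wave is kept.
print(fix_event_time(5, 3, False))  # 5
```

The same function would apply to each item in the list above, provided the reported time and the previous wave's time are expressed on a common scale (an age or a date).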

10.3 Other imputations

Inspection of the data revealed problems in a small number of items. These were addressed using imputation and are listed below.

  • Employment status: Where values were missing, some assumptions are made to help code the parent as employed, unemployed or not in the labour force.
  • Type of educational program (K cohort, wave 1): There appeared to be some confusion among parents and interviewers as to whether the child was in pre-school or pre-Year 1 at school. The type of educational program variable was amended based on the teacher data and other information provided in the questionnaire.
  • Parental income: Outlying values were adjusted, particularly where responses to other questions (e.g. categorical income, sources of income) made the reported income value appear incorrect. For further information about imputations related to parental income, see LSAC Technical Paper No. 14, Imputing income in the Longitudinal Study of Australian Children.
  • Parental height: Some parents of study children showed changes in height between waves. While most changes were minor (most likely due to estimation error), some were more substantial and called into question the reliability of between-wave differences in body mass index.
  • Time use diary data: Responses were recorded by marking an oval to indicate whether an activity or situation occurred in each 15-minute time period. A number of 'false positives' were discovered in the wave 1 time use diary data, and imputation was used to reduce them. A number of imputations were also performed to improve data quality in all three waves.

Further details of these imputations are given in Data Issues: Waves 1 to 7.