Data user guide
- 1. Introduction
- 2. What is LSAC?
- 3. Instruments
- 4. The LSAC data release
- 5. File structure
- 6. Variable naming conventions
- 7. Documentation
- 8. Data transformations
- 9. Confidentialisation
- 10. Data imputation
- 11. Survey methodology
- 12. Important issues for data analysis
- 13. User support and training
- Appendix: LSAC variable naming conventions
Table 1 summarises the data collection instruments used in each wave.
The following methods are used to collect study data.
- The face-to-face interview (F2F) is conducted with P1 and the study child (although, in Wave 1, P2 could complete some sections if this was more convenient). This component is undertaken with all participating families at each wave. Some interviews may be conducted entirely by telephone; for example, with participating families in remote areas (see section 11.3.7).
- The P1 during interview questionnaire (P1D) consisted of self-complete items for which it was considered important to achieve high response rates. In Wave 4 it became a computer-assisted self-interview (CASI).
- The P1 leave-behind questionnaire (P1L) consisted of lower-priority self-complete items. Efforts are made to obtain these data from P1 while the interviewer is in the home. This form became part of the CASI.
- The P2 leave-behind questionnaire (P2L) consists of self-complete items. Efforts are made to obtain this data from Parent 2 while the interviewer is in the home. If this is not possible the questionnaire is left for completion at a later time.
- The child self-report interview (CSR) consists of items answered by the study child. For children younger than 10 years it was administered by an interviewer. For children aged 10 to 15 years (K cohort, Waves 4, 5 and 6; B cohort, Wave 6) it was administered via an audio computer-assisted self-interview (ACASI), and for those aged 16 to 17 years (K cohort, Wave 7) via a computer-assisted self-interview (CASI). As part of the interview, physical measurements are taken and other assessments (such as measures of cognition or achievement) are administered to the study child.
- The study child completes an audio computer-assisted self-interview (ACASI) or a computer-assisted self-interview (CASI) on their own. This method allows the child to answer sensitive content in total anonymity.
- The time use diary (TUD) documents a 24-hour period of the child's life. In Waves 1, 2 and 3, the child's family was asked to complete two TUDs, one for a weekday and one for a weekend day. A different procedure was used in Wave 4, when the study child (K cohort only) was asked to complete one TUD. A TUD form, with instructions on how and when to fill it in, was sent to the study child before the interview, and the child was asked to fill it in on the day before the interview. During the interview the next day, the interviewer asked the child to describe 'yesterday' using the TUD form. The day the diary referred to could therefore be any day of the week, depending on when the interview was scheduled.
- The parent living elsewhere questionnaire (PLE) was first included in Wave 2 as a mail-back questionnaire. In Wave 3 it became a computer-assisted telephone interview (CATI).
- The RAP study child is a study child respondent who lives away from their parents (from Wave 7, K cohort). The study child (RAP) and P1 (RAP) each complete a home interview in their own separate homes. The P2 (RAP) and PLE (RAP) instruments are administered to the RAP study child's parents in the same way as for other participants.
- The home-based carer questionnaire (HBC) is for children aged 0-1 and 2-3 years who receive child care in a home environment, most commonly from a grandparent.
- The centre-based carer questionnaire (CBC) is for children aged 0-1 and 2-3 years who receive child care from long day care programs in centres, schools, occasional care programs, multi-purpose centres and other arrangements.
- The teacher questionnaire (TQ) is for children aged 4-5 years and older who attend a school or, for some 4-5 year olds, a preschool or long day care centre.
Notes: The indicator variable can be used to check whether data are present for a particular instrument (for more information see sections 8.6 and 8.7). The [*] in the indicator variable should be replaced by the age indicator (a, c, d, e, f, g, h, i) as discussed below. In-between waves were administered using mail-out surveys for Waves 1.5, 2.5 and 3.5. Waves 4.5 and 5.5 used online web forms to update contact details.
- With the respondent's permission, interviewers make observations (IOBS) about the interview, the state of the house where the interview was conducted, and the characteristics of the neighbourhood where the respondent lives.
- In Wave 1 the Australian Early Development Census (AEDC) was included as a nested study, which involved the AEDC questionnaire being sent with the LSAC K cohort teacher questionnaire in Victoria, Queensland and Western Australia. The AEDC is a community-level measure of young children's development based on a teacher-completed checklist. It consists of over 100 questions measuring five developmental domains: language and cognitive skills; emotional maturity; physical health and wellbeing; communication skills and general knowledge; and social competence. For more information visit the Australian Early Development Census website.
- The family contact form (FCF) recorded information about any contact between the interviewer and the family of each of the selected children at the time of Wave 1, regardless of whether they agreed to participate in the study or not. The information was mainly used by the fieldwork agency, with the only information from the FCF available in the publicly released dataset being the information on the family's home and neighbourhood. In subsequent waves, this information was included as part of the interviewer observations of the face-to-face interview.
- Between-wave questionnaires (Waves 1.5, 2.5 and 3.5) are brief questionnaires sent to respondents to complete and return in the year between main waves of data collection. They help to maintain contact with study families and to collect information about activities and development in the year between main waves. For Waves 4.5 and 5.5, online web forms were used to update study participants' contact details.
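The note on indicator variables says that the [*] placeholder is replaced by an age indicator. A minimal sketch of that substitution follows. The template string `[*]xyz` and the function name `expand_template` are hypothetical (the real indicator variable names are listed in Table 1); the letters are exactly those given in the note.

```python
# Age indicators as listed in the note on indicator variables.
AGE_INDICATORS = ("a", "c", "d", "e", "f", "g", "h", "i")


def expand_template(template: str, ages: tuple[str, ...] = AGE_INDICATORS) -> list[str]:
    """Substitute each age indicator for the '[*]' placeholder in a variable-name template."""
    if "[*]" not in template:
        raise ValueError("template must contain the '[*]' placeholder")
    return [template.replace("[*]", age) for age in ages]


# '[*]xyz' is a hypothetical template, not a real LSAC variable name.
print(expand_template("[*]xyz"))
```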
3.1.1 Physical measurements
For the B cohort in Wave 1, the child's weight was obtained by calculating the difference between the weight of Parent 1 (or another adult) with the child and the weight of the parent/other adult on their own. For the B cohort at all subsequent waves, and the K cohort at all waves, the child's weight was measured directly.
In Wave 1 the scales used were Salter Australia glass bathroom scales (150 kg capacity, 50 g graduations). In Waves 2 and 3, these scales were used along with HoMedics digital BMI bathroom scales (180 kg capacity, 100 g graduations). In Waves 4, 5, 6 and 7, Tanita body fat scales were used.
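The Wave 1 B cohort method derives the child's weight by subtraction. A minimal sketch of that arithmetic follows; the function name and the readings are hypothetical.

```python
def child_weight_kg(adult_with_child_kg: float, adult_alone_kg: float) -> float:
    """Child's weight = reading of the adult holding the child minus the adult's own reading."""
    difference = adult_with_child_kg - adult_alone_kg
    if difference <= 0:
        raise ValueError("the combined reading must exceed the adult's own reading")
    # Round to 2 decimals to remove floating-point noise (readings are in 50 g steps).
    return round(difference, 2)


# Hypothetical readings: 72.40 kg holding the child, 64.15 kg alone.
print(child_weight_kg(72.40, 64.15))  # 8.25
```

Because the result is a difference of two readings, rounding error in each reading adds up. Assuming the scales round to the nearest 50 g step, each reading can be off by up to 25 g, so the derived child weight can be off by up to about 50 g.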
Height is measured for children aged two years and older. In Waves 1, 2 and 3, height was measured using an Invicta stadiometer, from Modern Teaching Aids. In Waves 4, 5, 6 and 7 a laser stadiometer was used. Two measurements were taken, and if the two measurements differed by 0.5 cm or more, a third measurement was taken. The average of the two closest measures was included on the data file.
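The repeated-measurement rule (also used for waist and head circumference) can be sketched as follows. This illustrates the protocol as described, not the fieldwork agency's software. The readings are hypothetical, and breaking ties by reading order when two pairs are equally close is an assumption, because the protocol does not say.

```python
from itertools import combinations

TOLERANCE_CM = 0.5  # first two readings this far apart (or more) trigger a third


def needs_third_measurement(first_cm: float, second_cm: float) -> bool:
    """True when the first two readings differ by 0.5 cm or more."""
    # Round the difference so floating-point noise cannot flip the comparison at the boundary.
    return round(abs(first_cm - second_cm), 6) >= TOLERANCE_CM


def recorded_value(readings: list[float]) -> float:
    """Average of the two closest readings out of two or three."""
    if len(readings) not in (2, 3):
        raise ValueError("expected two or three readings")
    # Pick the pair with the smallest absolute difference; ties go to the earlier pair
    # (an assumption, as the protocol does not specify tie-breaking).
    a, b = min(combinations(readings, 2), key=lambda pair: round(abs(pair[0] - pair[1]), 6))
    return (a + b) / 2


first, second = 120.3, 120.9
readings = [first, second]
if needs_third_measurement(first, second):
    readings.append(120.5)  # hypothetical third reading
print(round(recorded_value(readings), 1))  # 120.4
```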
Waist circumference is measured for children aged two years and older using a non-stretch dressmaker's tape positioned horizontally over the navel. In all waves, two measurements were taken, and if these differed by 0.5 cm or more, a third measurement was taken. The average of the two closest measures was recorded on the data file.
A body fat measurement was included in Waves 4, 5, 6 and 7, with the reading provided by the same scales used for weight (Tanita body fat scales). Issues with the body-fat measurement are outlined in the Data Issues Paper.
Head circumference was measured only for the B cohort in Wave 1, using an Abbott head circumference tape. Two measurements were taken, and if these differed by 0.5 cm or more, a third measurement was taken. The average of the two closest measures was included on the data file.
Blood pressure was measured for the K cohort in Waves 4 and 5 and for the B cohort in Waves 6 and 7 using the A&D Digital Blood Pressure Monitor (Model UA-767). The interviewer took two measurements, one minute apart, and both readings are included in the data file.
3.1.2 'Who am I?' (WAI)2
The 'Who am I?' (WAI) assessment is a direct child assessment measure that requires children to copy shapes (a circle, triangle, cross, square and diamond) and write numbers, letters, words and sentences. For LSAC, WAI Item 11 was changed: 'This is a picture of me' was replaced with a sentence to be copied, 'John is big.' The WAI was used for children aged 4-5 years (the K cohort in Wave 1 and the B cohort in Wave 3) to assess the general cognitive abilities needed for beginning school.
The study child was given his/her own answer booklet to draw and write in. What they wrote/drew was assessed by experienced researchers at the Australian Council for Educational Research (ACER). See Data Issues Waves 1 to 7 for details of the Rasch Modelling used to score the WAI.
3.1.3 Peabody Picture Vocabulary Test (PPVT)3
A short form of the Peabody Picture Vocabulary Test (PPVT-III), a test designed to measure a child's knowledge of the meaning of spoken words and his or her receptive vocabulary for Standard American English, was developed for use in the study. This adaptation is based on work done in the USA for the Head Start Impact Study, with a number of changes made for use in Australia.
Different versions of the PPVT containing different, although overlapping, sets of items of appropriate difficulty were used for the children at ages 4-5, 6-7 and 8-9 years. A book with 40 plates of display pictures was used. The child points to (or says the number of) a picture that best represents the meaning of the word read out by the interviewer.
Scores are created via Rasch Modelling so that changes in scores represent real changes in functioning, rather than just changes in position relative to peers. See Data Issues Waves 1 to 7 for more details.
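In the Rasch model, the probability of a correct answer is a logistic function of the difference between the child's ability (theta) and the item's difficulty (b), both in logits: P = 1 / (1 + exp(-(theta - b))). A minimal sketch of estimating one child's ability by maximum likelihood follows, assuming the item difficulties are already known. The difficulties are hypothetical, not the calibrations used for LSAC. This shows the general technique only, not the scoring procedure documented in the Data Issues paper.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Rasch model: probability of a correct response for ability theta on an item."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def estimate_ability(responses: list[int], difficulties: list[float],
                     tol: float = 1e-8, max_iter: int = 100) -> float:
    """Maximum-likelihood ability estimate (logits) given known item difficulties.

    Newton-Raphson on the log-likelihood, with each step capped at 1 logit for stability.
    """
    if len(responses) != len(difficulties):
        raise ValueError("one difficulty is needed per response")
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        # The likelihood has no finite maximum for zero or perfect scores.
        raise ValueError("ML estimate is undefined for zero or perfect scores")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_probability(theta, b) for b in difficulties]
        gradient = raw - sum(probs)                    # first derivative of the log-likelihood
        information = sum(p * (1 - p) for p in probs)  # negative second derivative
        step = max(-1.0, min(1.0, gradient / information))
        theta += step
        if abs(step) < tol:
            return theta
    raise RuntimeError("estimate did not converge")


# Hypothetical difficulties (logits) for five items.
difficulties = [-1.5, -0.5, 0.0, 0.5, 1.5]
print(round(estimate_ability([1, 1, 0, 1, 0], difficulties), 3))
print(round(estimate_ability([1, 0, 1, 0, 1], difficulties), 3))  # same raw score, same estimate
```

The estimate depends only on the number correct and the difficulties of the items actually administered. Children given different but overlapping item sets (as with the PPVT versions described above) can therefore be placed on a common scale, provided the item difficulties have been calibrated together.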
3.1.4 Matrix Reasoning4
Children completed the Matrix Reasoning (MR) test from the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV) at ages 6-7, 8-9 and 10-11 years. This test of non-verbal intelligence presents the child with an incomplete set of diagrams (an item) and requires them to select, from five options, the picture that completes the set. The data file includes raw scores (number of correct responses) and scaled scores based on age norms given in the WISC-IV manual. The instrument comprises 35 items of increasing complexity. Children start on the item corresponding to their age-appropriate start point. If a child answers either of the first two start-point items incorrectly, the examiner should administer the two items preceding the age-appropriate start point (called 'reverse administration'). Reverse administration was not implemented in the LSAC instrument. See the discussion of this issue in the Data Issues Waves 1 to 7.
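Because reverse administration was not implemented in LSAC, users may want to identify the children whose responses would have triggered it. A minimal sketch follows, applying the rule exactly as described above. The record layout, start points and item numbers are hypothetical, and a missing response is treated as incorrect (an assumption). The full administration rules are in the WISC-IV manual.

```python
from dataclasses import dataclass, field

TOTAL_ITEMS = 35  # number of items in the instrument


@dataclass
class MatrixReasoningRecord:
    """Hypothetical layout: start point plus item-level scores (1 correct, 0 incorrect)."""
    start_item: int
    scores: dict[int, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not 1 <= self.start_item <= TOTAL_ITEMS:
            raise ValueError("start item must be between 1 and 35")

    def raw_score(self) -> int:
        """Raw score = number of correct responses recorded."""
        return sum(self.scores.values())

    def reverse_administration_triggered(self) -> bool:
        """True if either of the first two start-point items was not answered correctly."""
        first = self.scores.get(self.start_item, 0)       # missing counts as incorrect
        second = self.scores.get(self.start_item + 1, 0)
        return first != 1 or second != 1

    def reverse_items(self) -> list[int]:
        """The two items before the start point that reverse administration would add."""
        return [i for i in (self.start_item - 2, self.start_item - 1) if i >= 1]


record = MatrixReasoningRecord(start_item=7, scores={7: 1, 8: 0, 9: 1, 10: 1})
print(record.raw_score())                         # 3
print(record.reverse_administration_triggered())  # True
print(record.reverse_items())                     # [5, 6]
```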
3.1.5 Executive functioning (EXEC/CogState)5
The executive functioning of children in the K cohort was tested at Wave 6 using three Cogstate cognitive tests: the Identification task (IDNT), the One-back test (ONBT) and the Groton Maze Learning Test (GML). In Wave 7, the same battery of tests was used to examine the executive functioning of the P1s of K cohort children. In the data files, the outcome variables are contained in the CogState dataset, which draws on a series of cognitive testing batteries customised for use in LSAC. Each row of a CogState dataset represents one task in the CogState test battery for one study subject in one test session.
The IDNT tests choice reaction time and overt attention. The subject is shown a playing card on the screen and asked to respond as quickly as possible to the question: “Is the card red?”, pressing the appropriate button depending on the card's colour. Rapid and accurate responding requires the subject to attend to the colour of the card, but not its suit or number. The ONBT is a working memory task in which the subject must remember the image of the last item seen on the screen and compare it with the next stimulus.
The GML test contains five learning trials (i.e. the subject repeats the same task five times) in which the subject is shown a 10 x 10 grid of tiles on a computer touchscreen. A 28-step pathway is hidden among these 100 possible locations. The subject is instructed to move one step from the start location and then to continue, one tile at a time, toward the end. On each repetition the subject tries to remember the pathway just completed, learning the 28-step pathway through the maze from trial-and-error feedback. Performance is scored as the total number of errors made in attempting to learn the same hidden pathway; a lower score indicates better performance.
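Since each CogState row holds one task for one subject in one session, analyses that compare tasks usually need one record per subject and session. A minimal long-to-wide sketch follows. The field names (`subject_id`, `session`, `task`, `outcome`) and all values are hypothetical, not the LSAC CogState variable names.

```python
from collections import defaultdict

# Hypothetical long-format rows: one task, one subject, one test session per row.
rows = [
    {"subject_id": "S001", "session": 1, "task": "IDNT", "outcome": 2.68},
    {"subject_id": "S001", "session": 1, "task": "ONBT", "outcome": 2.85},
    {"subject_id": "S001", "session": 1, "task": "GML", "outcome": 41},
    {"subject_id": "S002", "session": 1, "task": "IDNT", "outcome": 2.71},
]


def long_to_wide(records: list[dict], value_key: str = "outcome") -> dict:
    """Pivot to one entry per (subject, session), with one field per task."""
    wide: defaultdict = defaultdict(dict)
    for rec in records:
        key = (rec["subject_id"], rec["session"])
        if rec["task"] in wide[key]:
            raise ValueError(f"duplicate {rec['task']} row for {key}")
        wide[key][rec["task"]] = rec[value_key]
    return dict(wide)


for key, tasks in long_to_wide(rows).items():
    print(key, tasks)
```

Tasks a subject did not complete are simply absent from that subject's entry (S002 above has only IDNT), so decide explicitly how missing tasks are handled before analysis.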
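The GML outcome (total errors across the learning trials) can be sketched as follows. The error rule is a simplification assumed for illustration: any touch other than the next tile on the hidden path counts as one error. Cogstate's own scoring rules are described in Technical Paper No. 19 and the Cogstate materials and may differ. The four-tile path is a toy example, far shorter than the real 28-step pathway.

```python
Tile = tuple[int, int]  # (row, column) on the 10 x 10 grid


def trial_errors(path: list[Tile], touches: list[Tile]) -> int:
    """Simplified error count for one trial: each touch that is not the next path tile."""
    errors = 0
    next_index = 1  # path[0] is the start location
    for tile in touches:
        if next_index == len(path):
            break  # pathway completed; ignore any further touches
        if tile == path[next_index]:
            next_index += 1
        else:
            errors += 1
    return errors


def total_errors(path: list[Tile], trials: list[list[Tile]]) -> int:
    """Total errors summed over the learning trials (lower is better)."""
    return sum(trial_errors(path, touches) for touches in trials)


toy_path = [(0, 0), (0, 1), (1, 1), (1, 2)]
first_trial = [(1, 0), (0, 1), (0, 2), (1, 1), (1, 2)]  # two wrong touches
perfect_trial = [(0, 1), (1, 1), (1, 2)]
print(total_errors(toy_path, [first_trial] + [perfect_trial] * 4))  # 2
```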
Further information about the instruments used is available in the 'Instruments' section of this guide, and in LSAC Technical Paper No. 19, Executive Functioning - Use of Cogstate measures in the Longitudinal Study of Australian Children [PDF 1.4 MB].
3.1.6 Rice Test of Grammaticality Judgement (GJT/SLI)6
As children grow older, different methods are needed to assess the presence or absence of specific language impairment (SLI), that is, to identify whether children are meeting expected performance levels in achieving the adult standard of English grammar. In early waves, LSAC could identify children with poor language performance but could not distinguish those with SLI from those without it. The Rice Grammaticality Judgement Task (GJ Task) was therefore introduced in Wave 6 for children of the K cohort.
The GJ Task is a short, automated task (administered by ACASI) that requires the study child to distinguish between grammatical and ungrammatical utterances known to be vulnerable to SLI in English-speaking children (Rice, Hoffman & Wexler, 2009). The study child listens through earphones to 20 pre-recorded items and enters a response by clicking the appropriate radio button (1 for 'Right', 5 for 'Not so good' and 9 for 'Hear again'). Its sensitivity and specificity for SLI are .70, with an area under the ROC curve of approximately .85.
The numbers and percentages of survey instruments of each type completed at each wave are shown in Table 2. More detailed information on non-response is available in the technical papers on weighting and non-response.
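Sensitivity and specificity are proportions computed from a 2 x 2 classification table. A minimal sketch follows. The validation counts are hypothetical, chosen only to reproduce the .70 figures quoted above.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Proportion of children with SLI whom the task flags."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Proportion of children without SLI whom the task does not flag."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts: 100 children with SLI, 200 without.
print(sensitivity(true_positives=70, false_negatives=30))   # 0.7
print(specificity(true_negatives=140, false_positives=60))  # 0.7
```

A single sensitivity/specificity pair describes one cut-off score. The area under the ROC curve summarises discrimination across all possible cut-offs, so it cannot be computed from these two numbers alone.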
Notes: SC ACASI = B cohort and SC CASI = K cohort. In Wave 6 the CSR instrument was used, and in Wave 7 the CAI was used. a Questionnaire acronyms are detailed above in section 3, Table 1: Data collection modes by wave. b 'Eligible' means the number of LSAC children for whom a questionnaire was applicable (e.g. children are eligible for a HBC questionnaire if they attend their main care arrangement for 8 hours or more per week and this is home-based care). c 'Actual' means the number of respondents for whom a form was returned. d Response rates for Waves 2 to 7 are expressed as a proportion of Wave 1 families. e Represents instances where a child interview was completed but the main interview with the parents was not. Specifically, in Wave 4 there were five cases (K cohort). In Wave 5 there were eight cases for the K cohort and four cases for the B cohort. In Wave 6 there were 11 cases for the K cohort and four cases for the B cohort. In Wave 7 there were seven cases for the B cohort and 41 cases for the K cohort. f CSR/CAI = Child Self-report and Computer Assisted Interview (introduced for the first time in the K cohort); both are interviewer-administered surveys with the study child. N/A = Not administered. Also in Wave 7, an 'in-between' wave activity (W7.25) was developed to address the increase in refusals.
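Notes b to d use two different denominators: an instrument's response rate divides 'Actual' by 'Eligible', while the wave-level rates for Waves 2 to 7 divide by the number of Wave 1 families. A minimal sketch follows; the function name and all counts are hypothetical.

```python
def response_rate(responded: int, base: int) -> float:
    """Percentage of the base (eligible respondents, or Wave 1 families) who responded."""
    if base <= 0:
        raise ValueError("base must be positive")
    return round(100 * responded / base, 1)


# Instrument level: 'Actual' / 'Eligible' (hypothetical counts).
print(response_rate(3_000, 4_000))  # 75.0
# Wave level: responding families / Wave 1 families (hypothetical counts).
print(response_rate(3_600, 5_000))  # 72.0
```

Because the eligible base can change from wave to wave while the Wave 1 base is fixed, the two kinds of rate are not directly comparable.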
3.2.1 Parent 1 questionnaires
In Wave 1, interviewers encouraged the parents to complete the P1L and P2L forms while the interviewer was in the home. Interviewers were also able to pick up forms in some cases, when forms were left behind. Forms not given to interviewers were mailed back. Two reminders were made for forms that were not returned.
In Wave 2, P1 had two forms to complete. Interviewers were instructed that the P1D form must be completed when they were in the home (resulting in a high response rate). The P1L was generally left behind to be mailed back, as there was not enough time for these to be completed. Interviewers were generally not required to pick up the forms. Up to four reminders were made for forms that were not returned; however, the P1L forms showed lower response rates in Wave 2 compared with Wave 1. This may have been because P1 had already completed one form or because interviewers did not generally pick up forms.
For Wave 3, there was only one P1 self-complete form. Interviewers were instructed that this form must be completed while the interviewer was in the home. However, only two thirds of parents were able to do so. Three reminders were sent for forms not returned.
In Wave 4, P1 was asked to complete a CASI, which resulted in a response rate of 99% of eligible respondents. This was higher than the response rate of 88% of eligible respondents achieved in Wave 3 using the self-complete form.
In Wave 5, response rates were very similar to those obtained in Wave 4, reflecting the absence of mode changes and the tapering off of attrition.
In Wave 6, response rates were similar to previous waves using the same mode, although K cohort completion of the CASI decreased slightly, from 98% in Wave 5 to 96% in Wave 6.
In Wave 7, B cohort completion of the CASI decreased very slightly, from 98% in Wave 6 to 97% in Wave 7, while K cohort completion increased slightly, from 96% in Wave 6 to 99% in Wave 7.
3.2.2 Parent 2, TUD and teacher forms
Response rates to the P2L and the TUD were broadly similar across Waves 1, 2 and 3 (between 67% and 79%), while carer and teacher questionnaire response rates were much improved in Wave 2, with similar rates at Wave 3. In Wave 4 the TUD response rate was 96%. The higher response rate could be attributed to changes in both the procedure and the informant: in Waves 4, 5 and 6 the interviewer collected the TUD information from the child instead of the parent, as part of the interview, rather than leaving a diary that respondent families previously had to complete and return by mail after the visit. In Wave 7 hard-copy questionnaires were collected from P2 for both the B and K cohorts; however, TUDs and teacher forms were collected for B cohort children only.
3.2.3 PLE response
The PLE questionnaire was introduced in Wave 2 and applies to children who see their 'parent living elsewhere' (PLE) at least once a year. There are three stages at which non-response can occur: (1) obtaining contact details from P1; (2) obtaining permission from P1; and (3) receiving a response from the PLE.
In Wave 2, contact details were given for 69% of cases for the B cohort and 70% of cases for the K cohort, and responses were received from 35% of PLEs sent a questionnaire for the B cohort and 47% for the K cohort.
Due to the relatively low response in Wave 2 to the mail-out questionnaire, a change in methodology was introduced in Wave 3. Where P1 had provided contact details, PLEs were telephoned and asked to respond to a computer-assisted telephone interview (CATI). The response from PLEs who were approached was very positive. Of the 856 PLEs that interviewers attempted to contact, interviews were achieved with 675 (79%) PLEs and only 53 (6%) PLEs refused an interview. Most of the remaining non-responses were due to not being able to contact the PLE.
In Wave 3, P1 was explicitly asked for their permission to contact the PLE. Therefore, it was easy for P1 to refuse to provide any information about the PLE or refuse the PLE's participation. This meant that no information was obtained for 260 (18%) PLEs.
It is worth noting that from Wave 4 onwards there was no direct question asking P1 for permission to contact the PLE. However, some P1 respondents refused the PLE's participation by not providing contact details.
Table 3 summarises the PLE response rates from Waves 3 to 7.
|PLE identified during P1 interview|B cohort|K cohort|Total|
|Wave 3|578|837|1,415|
|Wave 4|674|878|1,552|
|Wave 5|773|911|1,684|
|Wave 6|778|817|1,595|
|Wave 7|732|756**|1,488|
Note: *The PLE is considered eligible when: (1) the PLE satisfies the parental requirements; i.e. PLEs who see the study child at least once a year; (2) the PLE's contact details are available; (3) P1 did not explicitly refuse permission to contact the PLE. ** There were 19 (RAP) PLEs identified during P1 interview and 9 (RAP) identified as Eligible PLE* in the K cohort.
3.2.4 Wave 7 RAP response
Delays in enumeration hindered the identification of populations such as RAP children, RAP parents and RAP PLEs in Wave 7. This had flow-on effects on contacting these respondents and on the time available for tracking and follow-up. During Wave 7 enumeration, 24 RAP parent records were generated. Of these, 13 parents (54.2%) undertook an interview, three (12.5%) refused and eight (33.3%) could not be contacted.
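The percentages above, and the PLE figures in section 3.2.3, are shares of a total number of records. A small helper reproduces them; it is a sketch, and the outcome labels are illustrative. The "other non-response" count for PLEs (128) is simply 856 minus the 675 interviews and 53 refusals.

```python
def disposition_percentages(counts: dict[str, int], decimals: int = 1) -> dict[str, float]:
    """Each outcome's share of all records, as a rounded percentage."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records")
    return {outcome: round(100 * n / total, decimals) for outcome, n in counts.items()}


# Wave 7 RAP parent records (section 3.2.4).
rap_parents = {"interviewed": 13, "refused": 3, "not contactable": 8}
print(disposition_percentages(rap_parents))
# {'interviewed': 54.2, 'refused': 12.5, 'not contactable': 33.3}

# Wave 3 PLEs that interviewers attempted to contact (section 3.2.3), whole percentages.
ple = {"interviewed": 675, "refused": 53, "other non-response": 856 - 675 - 53}
print(disposition_percentages(ple, decimals=0))
```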
Table 4 summarises the final RAP response rates for Wave 7.
Note: * Includes avoidance
3.2.5 Wave 7.25 response
The fully responding rate for the K cohort was significantly lower than for the B cohort, because a full response required collecting the respondent engagement questions from both the P1 and the SC, as well as all of the CATI Wave 7 catch-up questions from the SC.
For both the B and K cohorts, non-contact was by far the largest category, with almost 50% of all records unable to be contacted. Interviewers were advised to make no more than three call attempts before finalising selections (as is standard for follow-up refusal workloads), which is likely to have limited their ability to reach respondents.
Table 5 summarises the final response rates for Wave 7.25.
|Field response|Cohort B|Cohort K|Total|
Notes: * Respondent engagement questions only (i.e. no CATI catch-up questions). ** For Ks, both the P1 and SC refused to take part or P1 refused for themselves and the SC.
2 The 'Who Am I?' is copyrighted by the Australian Council for Educational Research, Melbourne, 1999.
3 The Peabody Picture Vocabulary Test, Third Edition (PPVT-III) Form IIA is copyright by Lloyd Dunn, Leota Dunn, Douglas Dunn, & American Guidance Service, Inc., 1997, and published exclusively by AGS Publishing. Permission to adapt and create a short form for LSAC was granted by the publisher. The PPVT-III - LSAC Australian Short-form was developed by S. Rothman, Australian Council for Educational Research (ACER), Melbourne, from the Peabody Picture Vocabulary Test, Third Edition (PPVT-III), Form IIA, English edition.
4 The Wechsler Intelligence Scale for Children, Fourth Edition is copyrighted by Harcourt Assessment, Inc., 2004.
5 Executive functioning was assessed via direct cognitive assessment using the Cogstate cognitive testing battery. The Cogstate program produces a variety of cognitive tests, which can be found at www.Cogstate.com/
6 Test of Early Grammatical Impairment. United States: The Psychological Corporation, A Harcourt Company.