Data Issues

Waves 1 to 7
Data issues - Waves 1 to 7 – February 2019

10 Data issues in Wave 5

The first four waves of LSAC data included geography items such as postcodes and various levels of the Australian Standard Geographical Classification (ASGC) that were generated from geocoding of the residential addresses of study families. In Wave 1, the geocodes were based on global positioning system (GPS) coordinates obtained by I-view interviewers at the time of interview, while in Waves 2-4, they were based on residential addresses collected by Australian Bureau of Statistics (ABS) interviewers.

In July 2011, the ABS introduced a new statistical geography framework called the Australian Statistical Geography Standard (ASGS) to replace the ASGC. The main purpose of the ASGS is to disseminate geographically classified statistics. It provides a common framework of statistical geography enabling the publication of statistics that are comparable and spatially integrated.

Improved data sources and technology have allowed the ABS to create a better geography optimised for the release of ABS statistics. A new robust and stable structure means that changes over time are minimised, assisting in the maintenance of quality time-series data. In addition, the ASGS, together with improved methods of calculation, allows for more accurate correspondences to translate ABS data to non-ABS administrative and geographic regions.

For further information on this new standard refer to 1270.0.55.001 - Australian Statistical Geography Standard (ASGS): Volume 1 - Main Structure and Greater Capital City Statistical Areas, July 2011.

To take advantage of this more comprehensive, flexible and consistent way of defining Australia's statistical geography, the ASGS is included on LSAC releases from Wave 5 onwards. To ensure that there is a common geographical standard across waves, the decision was made to:

  • dual-code Wave 5 residential addresses to ASGC and ASGS, enabling the comparison of old and new classifications; and
  • back-code Waves 1-4 residential addresses to the new standard ASGS.

The new variables added to the general release file for each wave are shown in Table 15.

Table 15: New geography variables included from Wave 5
Without age variable name | Label
Gccsa | Australian Statistical Geography Standard (ASGS) - Edition 2011 - Greater Capital City Statistical Area Structure
Sos | Australian Statistical Geography Standard (ASGS) - Edition 2011 - Section of State
sa22011 | Australian Statistical Geography Standard (ASGS) - Edition 2011 - SA2
sa32011 | Australian Statistical Geography Standard (ASGS) - Edition 2011 - SA3
sa42011 | Australian Statistical Geography Standard (ASGS) - Edition 2011 - SA4
absra | Australian Statistical Geography Standard (ASGS) - Edition 2011 - Remoteness Area (ABS)

Most addresses were auto-coded using ASGS address coders, which link addresses to geographical areas. However, in some cases, addresses were either incomplete, had spelling errors or, more rarely, were identical addresses in the same suburb. In these cases, addresses were manually cleaned to reduce the number of records with missing geocodes. After these steps, there were still some records unable to be geocoded to ASGS (level SA2). These numbers for Waves 1-5 are provided in Table 16.

Table 16: Number of records missing SA2 by wave
Wave | Number of responding records not coded to SA2
1 | 20
2 | 12
3 | 13
4 | 34
5 | 2

To enable coding to the ASGS, many addresses needed cleaning to ensure accurate data. As a result, some records now have an ASGC Statistical Local Area (SLA) where there was none previously, and others have been coded to a different SLA.

The 2011 Census and Socio-Economic Indexes for Areas (SEIFA) data are available in the new ASGS classifications. However, although ASGS classifications can be provided for Waves 1-5, Census and SEIFA data for 2001 and 2006 are not available for the ASGS.

From Wave 7 onwards only ASGS geography variables will be output on the files.

10.2 Occupation

LSAC data include variables for the occupation of Parent 1 (P1) and Parent 2 (P2). In recent waves, the occupation of Parent Living Elsewhere (PLE) and the parents of P1/P2/PLE (i.e. the study child's grandparents) are also included. These were coded using the Australian Standard Classification of Occupations (ASCO). The ANU4 scale - a scale of occupational status calculated using ASCO, which is an occupational classification system that classifies jobs according to skill level and skill specialisation - is also provided to data users for Waves 1-4.

Since Wave 2, LSAC occupation data have also been coded to the newer occupation standard, which is the Australian and New Zealand Standard Classification of Occupations (ANZSCO). ANZSCO was introduced in 2006 and was a product of a development program between the ABS, Statistics New Zealand and the Australian Government Department of Employment and Workplace Relations.

For further information on this standard, refer to 1220.0 - ANZSCO - Australian and New Zealand Standard Classification of Occupations, First Edition, 2006.

The latest release of ASCO was in 1997, reducing its applicability to the current Australian workforce. Therefore, from Wave 5 onwards, only ANZSCO codes will be produced. To enable the transition to using ANZSCO, the study has:

  • added ANZSCO codes to the Waves 2-4 data files, as these codes were already generated during these waves, and is investigating the possibility of providing ANZSCO for Wave 1 through a correction code;
  • replaced the ANU4 scale from Wave 5 onwards with the Australian Socioeconomic Index 2006 (AUSEI06) (McMillan, Beavis, & Jones, 2009), the latest in the series of occupation status scales developed by the ANU; and
  • provided AUSEI06 for Waves 2-4, and is investigating the possibility of adding it to Wave 1 through a correction code.

The new variables added to the general release file are in Table 17.

Table 17: New occupation variables included from Wave 5
Question ID | Label
pw08_5 | Current occupation (ANZSCO code)
pw08_6 | Current or most recent occupation (ANZSCO code)
pw08_7 | Current occupation (AUSEI06 code)

The SEP variable (Z score for socio-economic position among all LSAC families) was calculated for Waves 1 to 4 using ASCO classifications. Because ASCO is unavailable for Wave 5, the SEP variable has not been calculated and hence is not available in the Wave 5 dataset. Further work will investigate ways to calculate the SEP using the ANZSCO classifications, and a new or revised SEP variable may be available in the future.

10.3 ACIR data issue (all waves)

Analysis of the Australian Childhood Immunisation Register (ACIR) data previously supplied showed that immunisation rates in LSAC did not reflect national rates. Investigation with the data provider found that the data extractions up to Wave 5 had not captured all the required records. The data have since been rectified, and data users should not use the previous version of the ACIR data.

10.4 Changes to household files

Addition of 'Person Type' to the files

In Wave 5, Person Type (f21a) is available on the Waves 1-5 files for the first time, with a code attached to each household member and wave. This item is derived from information collected in the P1 interview and amended where needed during processing. A list of the person types and a description of each is shown in Table 18.

Table 18: Person Type descriptors
Code | Person Type | Description
1 | Study child | The study children are the focus of the study, and consist of two cohorts (B cohort aged 8-9 years and K cohort aged 12-13 years in Wave 5).
2 | Parent 1 | Parent or guardian who provides the greatest role in caring for the study child and is therefore likely to be the most reliable informant on the health, development and care of the study child. Parent 1 must live with the study child.
3 | Parent 2 | Study child's other resident parent/guardian, or the married or de facto partner of Parent 1. Another person in the household can be considered as Parent 2 if they are acting as a significant parental figure who helps to care for the child and is a stable member of the child's residential family unit.
4 | Usual resident | A person other than the study child and the study child's resident parent(s) who usually lives in the study child's house (e.g. siblings of the study child).
5 | Non-resident | A person other than a parent who has previously been a resident of the household but no longer lives in the same household as the study child.
6 | Parent living elsewhere | A parent of the study child who does not live in the same household as Parent 1 and the study child. This person may previously have been a Parent 2 (or a Parent 1).
7 | Temporary member | Includes people who, in-between waves, joined the study child's household for more than three months but have since left.
8 | Empty row | In the household files, row/member number 3 is always used for Parent 2 at Wave 1. When there was no P2 in the house at Wave 1, this row is left as an empty row. Also used when duplicate members are picked up.
9 | Deceased | A person who was previously recorded as a resident of the household but has died.
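As an illustration of how the Person Type codes in Table 18 might be used, the following Python sketch maps the codes to names and filters household rows. The grouping of codes 1-4 as "current residents" and the `(member_number, person_type_code)` row layout are assumptions made for this example, not part of the LSAC file specification.

```python
from enum import IntEnum


class PersonType(IntEnum):
    """Person Type (f21a) codes as listed in Table 18."""
    STUDY_CHILD = 1
    PARENT_1 = 2
    PARENT_2 = 3
    USUAL_RESIDENT = 4
    NON_RESIDENT = 5
    PARENT_LIVING_ELSEWHERE = 6
    TEMPORARY_MEMBER = 7
    EMPTY_ROW = 8
    DECEASED = 9


# Assumption for illustration: these types are treated as living in the
# study child's household at the wave in question.
CURRENT_RESIDENT_TYPES = {
    PersonType.STUDY_CHILD,
    PersonType.PARENT_1,
    PersonType.PARENT_2,
    PersonType.USUAL_RESIDENT,
}


def current_residents(rows):
    """Return member numbers whose person type is a current-resident type.

    `rows` is a hypothetical iterable of (member_number, person_type_code).
    """
    return [member for member, code in rows
            if PersonType(code) in CURRENT_RESIDENT_TYPES]


# Hypothetical household: row 3 is an empty row (no Parent 2 at Wave 1).
household = [(1, 1), (2, 2), (3, 8), (4, 4), (5, 6), (6, 9)]
print(current_residents(household))
```

Filtering out code 8 in this way avoids counting the placeholder row 3 as a household member.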

Changes in relationship to study child information for household members

For Waves 1-4, the household file carried forward the relationship to study child for each member in the household from Wave 1 or the subsequent wave for members entering the household after Wave 1. This means that for an existing household member, the relationship information in the household file is generally the same across waves. In some cases, this will not reflect changes in the relationships within the household. Relationship changes that we know did occur include:

  • a step-parent changing to adopted parent
  • an unrelated adult changing to step-parent
  • a foster sibling changing to adopted sibling.

From the Wave 5 interview onwards the relationship of existing household members to the study child can be updated during the interview for household members present in previous waves.

As a result, from Wave 5 onwards there will be differences in the relationships between study children and household members between waves.

Inclusion of two waves of household data in the PLE person grid

The person grid is a list of people associated with the study child, together with their demographics. Some members may still reside with the study child and others may have left. The Wave 5 parent living elsewhere survey instrument included roll-forward person grid data from Wave 4, so two waves of household data are now available for ongoing responding PLEs. Including Wave 4 details of a PLE's household in the survey instrument enables comparisons of the PLE's household circumstances between waves.

Concordance between people on main and PLE person grids

The concordance between the main household and the PLE's household has been provided for the first time in Wave 5. This enables the identification of who is the same person between the two files, who is on the main file only, and who is on the PLE file only. Table 19 provides a list of variables provided in the concordance file.

Table 19: Concordance file variables
Question ID | Label
MID5 | Wave 5 Main Household Member Number
PLEID5 | Wave 5 PLE Household Member Number
HHTYPE_5 | Wave 5 Household Type
CHHFLOOP | Wave 5 Combined Household Row Number

The values for HHTYPE_5 are:

  • 0 = Not present at Wave 5
  • 1 = Wave 5 main household member only
  • 2 = Wave 5 PLE household member only
  • 3 = Wave 5 main and PLE household member

For example:

  • If main household member number 4 was present at Wave 5 and was also present at Wave 5 in the PLE household, where they were recorded as member number 3, the variables that link these records would contain the following values: MID5 = 4; PLEID5 = 3; HHTYPE_5 = 3.
  • If main household member number 4 was in the main household only at Wave 5, the values would be: MID5 = 4; PLEID5 = -9; HHTYPE_5 = 1.
  • If PLE household member number 3 was in the PLE household only at Wave 5, the values would be: MID5 = -9; PLEID5 = 3; HHTYPE_5 = 2.

The values in MID5 and PLEID5 correspond to the member number in the data files, so this will enable you to find demographic information and link it to the files if required.
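The linking logic described above can be sketched in Python as follows. The dictionary-per-record layout and the function name `shared_members` are hypothetical; only the variable names, the HHTYPE_5 codes and the -9 value come from the text. The three records reproduce the three alternative scenarios in the examples and are placed in one list purely for illustration.

```python
# HHTYPE_5 codes as listed in the text.
HHTYPE_LABELS = {
    0: "Not present at Wave 5",
    1: "Wave 5 main household member only",
    2: "Wave 5 PLE household member only",
    3: "Wave 5 main and PLE household member",
}

# Value used in the examples when a person is not on one of the grids.
NOT_ON_GRID = -9

# The three example scenarios from the text, as hypothetical records.
examples = [
    {"MID5": 4, "PLEID5": 3, "HHTYPE_5": 3},
    {"MID5": 4, "PLEID5": NOT_ON_GRID, "HHTYPE_5": 1},
    {"MID5": NOT_ON_GRID, "PLEID5": 3, "HHTYPE_5": 2},
]


def shared_members(concordance):
    """Return (MID5, PLEID5) pairs for people on both person grids."""
    return [(rec["MID5"], rec["PLEID5"])
            for rec in concordance
            if rec["HHTYPE_5"] == 3]


for rec in examples:
    print(rec["MID5"], rec["PLEID5"], HHTYPE_LABELS[rec["HHTYPE_5"]])

print(shared_members(examples))
```

Each returned pair gives the member number to look up in the main household file and in the PLE household file respectively.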

Child report of whether at school

At the start of both the study child's audio-computer-assisted self-interview (ACASI) module and the face-to-face Child Self-Report K (CSRK) module, the interviewer records whether the study child is attending school, using response options of Yes and No. If the study child does not attend a school, some questions about schooling are not asked. These questions are directly related to the school environment and therefore are not relevant to study children not attending school. Parent 1 is also asked a question about whether the child:

  • attends a government school
  • attends a Catholic school
  • attends an independent or private school
  • is not in school.

In total, the number of K cohort children coded as not in school as a result of the P1 interview was 33, whereas from the child interview the combined number was 218. Table 20 demonstrates that there were 191 records where the responses about whether the child was in school conflicted between the two interview components.

Table 20: Whether in school according to Parent 1 and study child components
Rows: Parent 1 (EDUC14). Columns: Study child (ACASI02/CSRK02).
Parent 1 (EDUC14) | In school | Not at school (either question) | No study child interview | Neither question answered | Total
In school | 3,639 | 189 | 51 | 38 | 3,917
Not at school | 2 | 29 | 2 | 0 | 33
No P1 interview | 4 | 0 | 0 | 0 | 4
Question not answered | 1 | 0 | 1 | 0 | 2
Total | 3,646 | 218 | 54 | 38 | 3,956
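The count of 191 conflicting records can be checked from the cells of Table 20 where both Parent 1 and the study child gave a response. The short Python sketch below does this; the dictionary layout is an illustrative choice, and the cell counts are copied from the table.

```python
# Table 20 cells keyed by (Parent 1 response, study child response),
# restricted to records where both components gave an answer.
table20 = {
    ("In school", "In school"): 3639,
    ("In school", "Not at school"): 189,
    ("Not at school", "In school"): 2,
    ("Not at school", "Not at school"): 29,
}

# Records are in conflict when the two reports disagree.
conflicting = sum(count for (p1, child), count in table20.items()
                  if p1 != child)
agreeing = sum(count for (p1, child), count in table20.items()
               if p1 == child)

print(conflicting)  # 191
print(agreeing)     # 3668
```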

Table 21 cross-tabulates possible reasons for the discrepancy against school type, as recorded in the P1 interview, for the 189 records where Parent 1 reported the child as in school but the interviewer recorded the child as not attending school. Around 44% of the difference seems to be accounted for by the interview taking place on a weekend or in school holidays.

Table 21: Characteristics of child or interview for children entered as not attending school by the interviewers
School attended | Interview date in school holidays | Interview date on weekend (not school holidays) | Interview date is school day | Total
Government school | 32 | 16 | 61 | 109
Catholic school | 11 | 5 | 23 | 39
Independent or private school | 12 | 7 | 22 | 41
Total | 55 | 28 | 106 | 189
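The "around 44%" figure can be reproduced from Table 21 by adding the school-holiday and weekend interviews and dividing by the 189 records. The following Python sketch shows the arithmetic; the dictionary keys are illustrative names, and the counts are copied from the table.

```python
# Table 21 counts by school type and timing of the interview.
table21 = {
    "Government school": {"holidays": 32, "weekend": 16, "school_day": 61},
    "Catholic school": {"holidays": 11, "weekend": 5, "school_day": 23},
    "Independent or private school": {"holidays": 12, "weekend": 7, "school_day": 22},
}

total = sum(sum(row.values()) for row in table21.values())
non_school_day = sum(row["holidays"] + row["weekend"]
                     for row in table21.values())
share = non_school_day / total

print(total)            # 189
print(non_school_day)   # 83
print(f"{share:.1%}")   # 43.9%
```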

To improve the quality of reporting and to clear up any confusion, school attendance in Wave 6 was recorded in the same way in both the child interview and the Parent 1 interview. In the child interview, the response categories of government school, Catholic school, independent or private school, and not in school were provided instead of Yes/No responses. This change makes it clearer that the study is asking about usual school attendance and not whether school was attended on the interview date. This point was further highlighted in interviewer training.