Data user guide
Two types of data are available to data users:
- restricted release data
- general release data.
In the restricted release (in-confidence) data, the only information not included is the name, address and other contact details of the child, family, child care agency and teacher or carer. Access to the in-confidence datasets may be granted where data users can demonstrate a genuine need for the additional data and meet the necessary additional security requirements. Note that from wave 7, in-confidence data are known as restricted release data.
In the general release data, in addition to the information removed from the in-confidence file, some other items have been removed, and some items have been transformed, had response categories collapsed or been top-coded (i.e. outlying values recoded to a less extreme value).
The following items are removed:
- qualitative data provided by respondents
- census and postcode data for the location of carers and schools.
The following items are transformed:
- postcode - postcodes are given an indicator so that all children selected in the same postcode can be identified
- date left hospital after birth - number of days between birth and departure.
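The two transformations above can be sketched in Python as follows. This is only an illustration of the idea, not the procedure LSAC uses: the function names, the integer labels and the sample values are all hypothetical.

```python
from datetime import date

def postcode_indicators(postcodes):
    """Replace each postcode with an arbitrary integer label.

    Records that share a postcode share a label, so clustering can still
    be analysed without releasing the postcode itself. Labels are assigned
    in order of first appearance (an assumption made for this sketch).
    """
    labels = {}
    return [labels.setdefault(pc, len(labels) + 1) for pc in postcodes]

def days_until_left_hospital(birth, left_hospital):
    """Return the number of days between birth and departure from hospital."""
    return (left_hospital - birth).days

# Hypothetical sample values
indicators = postcode_indicators(["3000", "2600", "3000"])
stay = days_until_left_hospital(date(2004, 3, 1), date(2004, 3, 5))
```

In both cases the released value keeps the analytically useful information (shared area, length of stay) while dropping the identifying detail (actual postcode, actual dates).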
The following items have response categories collapsed (i.e. response categories combined to form an aggregate category):
- parents' occupation - output at the two-digit Australian and New Zealand Standard Classification of Occupations (ANZSCO) level, or rounded to the nearest five for ANU4 ratings of occupational prestige
- occupation in previous job - output at two-digit ANZSCO level
- Socio-Economic Index for Areas (SEIFA) variables - rounded to the nearest 10
- country of birth (coded as 0 if fewer than five contributors)
- religion (coded as 0 if fewer than five contributors)
- language other than English (LOTE) (coded as 0 if fewer than five respondents).
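The collapsing rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the LSAC implementation: the function names and sample codes are hypothetical, ties in rounding are assumed to round up, and the two-digit occupation level is assumed to be the leading two digits of a longer code.

```python
import math
from collections import Counter

def round_to_nearest(value, base):
    """Round value to the nearest multiple of base (ties round up; an assumption)."""
    return math.floor(value / base + 0.5) * base

def two_digit_level(occupation_code):
    """Keep only the leading two digits of an occupation code."""
    return str(occupation_code)[:2]

def suppress_small_categories(codes, min_count=5):
    """Recode any category with fewer than min_count contributors to 0."""
    counts = Counter(codes)
    return [c if counts[c] >= min_count else 0 for c in codes]

# Hypothetical sample values
seifa = round_to_nearest(1004, 10)        # rounded to the nearest 10
prestige = round_to_nearest(48, 5)        # rounded to the nearest five
occupation = two_digit_level("253111")
countries = suppress_small_categories([1101] * 6 + [2100] * 3)
```

Here the category with six contributors keeps its code, while the category with only three contributors is recoded to 0.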
The following data items are top-coded:
- housing costs
- child support paid by Parent 2
- children's and parents' current height, weight and waist circumference
- number of hours spent in child care.
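Top-coding can be sketched in Python as follows. The source does not state the thresholds applied to each item, so the cap and the sample values below are purely hypothetical.

```python
def top_code(values, cap):
    """Recode values above cap to cap; missing values (None) are left unchanged."""
    return [v if v is None else min(v, cap) for v in values]

# Hypothetical weekly hours in child care, with an assumed cap of 60
hours = top_code([20, 45, 90, None], cap=60)
```

Only the outlying value (90) changes; values at or below the cap are released as recorded.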
Topics relating to study child offspring information are suppressed. LSAC assessed the disclosure risk of the study child offspring information available in the K cohort (fewer than five cases). The topics considered highly vulnerable to privacy risk were family demographics, health behaviour and risk factors, health status, home education environment, offspring program characteristics, paid work, parenting and relationships. This information is available in the restricted release data file, whereas it is presented as missing in the general release data file.