Data User FAQs

Getting access to LSAC Data
Ethics
Data Linkage
Postcodes
Specific Data Questions

Getting access to LSAC Data

How do I access LSAC data?

Information on how to access LSAC Data can be found on our accessing data page.

What are the main differences between the Restricted Release and the General Release Datasets?

The General Release includes data which has more sensitive information removed, such as:

names
addresses (including postcodes)
date of birth.

Some other information has been confidentialised by various methods such as top coding and the application of classification codes at a more general level.

The Restricted Release includes data which has names and addresses removed but some information such as postcodes, date of birth and other data are provided at a more detailed level than the General Release datasets. Because of this, access arrangements to Restricted Release data are more rigorous than for General Release data.

Can I access the Restricted Release if I am based overseas?

Unfortunately, no. As an overseas researcher, you will only have access to General Release data. You may have the opportunity to access Restricted Release data if you are working with Australian-based collaborators.

Can access to LSAC data be shared amongst people from the same institution?

No. Only individual users can apply for access to LSAC data. Individuals will need to apply for access through the Australian Data Archive (ADA). More detailed information on how to access LSAC data is available on our accessing data page.

How do I get access to the data from the CheckPoint dataset?

You can apply for and download CheckPoint data directly from the LSAC Dataverse.

I am a student, can I submit/lead my own project?

Yes, however you must specify your current level of study and provide the name, organisation and email address of your supervisor or instructor.

I have followed all of the steps outlined on the 'Accessing LSAC data' page, but I haven't received an outcome yet. How can I check the status of my request to access LSAC data?

Please contact ADA Dataverse.

How much does it cost to access LSAC data?

LSAC data is available to use free of charge.

My project is complete, what do I need to do to close the project?

Upon completion of your project, you must follow the steps to relinquish access to the datasets. Please refer to the DSS Longitudinal Studies Data Access and Use Guidelines.

When is the next LSAC Data Workshop?

The date of the next workshop is to be confirmed. Please subscribe to our mailing list or view our Data User updates. You can also register your interest for the next Data User Workshop.

Ethics

Do I need separate ethical approval to use LSAC data?

Separate ethics approval is not required for the use of LSAC data, but use of the data and the nature of the data will need to be included in your original research ethics application to your relevant ethics committee.

My ethics application has requested details about consent. Can you provide a copy of the consent forms used in LSAC?

We are unable to provide specific consent forms, but we can provide sufficient documentation for your ethics application.

The research methodology and survey content of Growing Up in Australia: The Longitudinal Study of Australian Children (LSAC) are reviewed and approved by the Australian Institute of Family Studies Ethics Committee, which is a Human Research Ethics Committee registered with the National Health and Medical Research Council (NHMRC). The Ethics Committee ensures that Growing Up in Australia meets the ethical standards outlined in the National Statement on Ethical Conduct in Research Involving Humans.

Data Linkage

What linked health and education records are available?

At each Wave, the LSAC study data is linked to a broad range of education- and health-related administrative data, such as Medicare (MBS/PBS), NAPLAN and AEDC. More information on LSAC data linkage administration is available.

Postcodes

Where can I find postcodes and geographical information?

Postcodes and ABS geographical indicators:
The LSAC datasets contain geographical indicators coded using Statistical Local Area (SLA) from Wave 1 to Wave 6 (2011) and Australian Statistical Geography Standard (ASGS) from Wave 1 to Wave 9C2. Several editions of the SLA or ASGS have been used over the course of the study. The edition that was used to code each variable is specified in the Variable Label in the LSAC Data Dictionary.

Information on the latest and previous editions of the ASGS is available from the Australian Bureau of Statistics (ABS). SLA has been retired but the metadata are available from the Australian Institute of Health and Welfare (AIHW).

Metadata:
Postcode and SLA/ASGS variables are listed in the Data Dictionary.

Where and how to access the data:
Postcodes and finer-level ASGS variables (SA2-SA4) are only available in Restricted Release datasets. Broader-level ASGS variables are also available in the General Release. Access to the Restricted Release dataset can be requested via the LSAC Dataverse.

Please note, the General Release dataset contains confidentialised geographical information (postcodes, LGA and ASGS (SA2-SA4)). This means that the actual values are replaced with an indicator, so that data users can identify which respondents share the same postcode/LGA/ASGS area within the General Release dataset.

Is it possible to know the number of study participants from a specified list of Local Government Areas (LGAs) across multiple Waves?

We have Australian Statistical Geography Standard (ASGS) LGA for Waves 9C1 and 9C2. Otherwise, LGA could potentially be worked out from postcode data in other Waves. See the question about geographical indicators above for more info.

Note that these data would only be available in the Restricted Release.

Specific Data Questions

How do I calculate the duration of activities in the Time Use Diary (TUD)?

For the K cohort, Time Use Diary data were collected differently in Waves 1-3 compared with Waves 4-6. Thus, durations are calculated differently depending on the Wave.

In Waves 1-3 the activities were recorded in 15-minute intervals, so one activity might span several intervals. The duration of an activity is the sum of all consecutive 15-minute intervals in which the same activity occurs.
In Waves 4-6 the activities were recorded by main activity, with variable start times. The duration of an activity would be the elapsed time between that activity’s start time and the next activity’s start time. The order of activities on a given day is indicated by the variable loop (before bed is 1-99; after bed is 100-199).
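
The two calculations described above can be sketched roughly as follows. This is an illustrative sketch only, not LSAC code: the function names, the record layout (loop, activity, start time) and the assumption that start times are stored as minutes elapsed since the start of the diary day are all hypothetical. Check the Data Dictionary for the actual TUD variable names and time formats.

```python
from itertools import groupby


def durations_from_intervals(activities, interval_minutes=15):
    """Waves 1-3 style: one activity code per consecutive 15-minute interval.

    Returns (activity, minutes) for each run of identical consecutive codes.
    """
    return [
        (code, interval_minutes * sum(1 for _ in run))
        for code, run in groupby(activities)
    ]


def durations_from_start_times(episodes):
    """Waves 4-6 style: hypothetical (loop, activity, start_minute) records.

    Episodes are ordered by loop; each duration is the next episode's start
    time minus this episode's start time. The last episode has no following
    start time, so its duration is left as None.
    """
    ordered = sorted(episodes, key=lambda e: e[0])
    result = []
    for current, following in zip(ordered, ordered[1:]):
        result.append((current[1], following[2] - current[2]))
    if ordered:
        result.append((ordered[-1][1], None))
    return result


# Example with made-up activity labels
print(durations_from_intervals(["sleep", "sleep", "eat", "sleep"]))
print(durations_from_start_times([(2, "eat", 450), (1, "sleep", 0), (3, "school", 480)]))
```

Sorting by loop rather than by start time follows the note above that loop indicates the order of activities on a given day.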

More information on data issues relating to the TUD is available here: (section 2) Data Issues: Waves 1 to 9 | Growing Up in Australia [PDF, 3.73 MB].

Why have the individual questions from the Social Skills Rating Scale (SSRS) and Social Skills Improvement Scale (SSIS) been left out of the dataset?

SSIS and SSRS are licensed products and the intellectual property (IP) of Pearson. LSAC used these products to measure self-control, empathy, cooperation, responsibility, and assertion from multiple informants (SC/Parent 1 and Teacher) across different Waves. Pearson does not allow details of individual SSIS and SSRS input items (wording and response frame) to be published in publications and reporting without permission. Due to these constraints, the individual SSIS and SSRS input items are not available in LSAC data products such as labelled questionnaires, data dictionaries, data frequencies, rationale reports and associated variables in the unit record data files. Therefore, only derived variables measuring self-control, empathy, cooperation, responsibility, and assertion are provided in LSAC data products.

Regarding the MBS dataset, why are there multiple values in the Hospital Indicator variable that mean the same thing?

Two different agencies were involved in the extraction of Medicare data over the years. Therefore, there are some code frame differences.

Rules:

If the data was extracted by Services Australia, the value ‘H’ means the service was carried out as a private patient in a public hospital.
In Services Australia data, a blank cell means the service was carried out outside the hospital setting.
The value ‘A’ means pre-admission or post-discharge. You may not encounter it because AIFS has aligned the data code frames.
If the data was extracted by the Department of Health, the value ‘N’ means the service was carried out outside the hospital setting and the value ‘Y’ means the service was carried out as a private patient in a public hospital.

Please note that we're aligning the H and Y values in the next data release only (9.1C2), by changing H values to Y. 'A' values will remain the same. Out of hospital services will still be missing/null in Services Australia data and 'N' in Department of Health data.
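
If you are working with files that mix both code frames, the rules above could be applied with a small recode along the following lines. This is a sketch under assumptions: the function name and the output labels are hypothetical, and the sketch treats 'A' the same way regardless of which agency extracted the data.

```python
def hospital_setting(value):
    """Map a raw Hospital Indicator value to a single harmonised label.

    Based on the FAQ rules: 'H' (Services Australia) and 'Y' (Department of
    Health) both mean private patient in a public hospital; blank/null
    (Services Australia) and 'N' (Department of Health) both mean out of
    hospital; 'A' means pre-admission or post-discharge.
    """
    code = (value or "").strip().upper()
    if code in ("H", "Y"):
        return "private patient in public hospital"
    if code in ("", "N"):
        return "out of hospital"
    if code == "A":
        return "pre-admission or post-discharge"
    return "unrecognised"  # anything not covered by the rules above


for raw in ["H", "Y", "N", None, "A"]:
    print(repr(raw), "->", hospital_setting(raw))
```

Mapping by value alone works here because no value carries a different meaning in the two code frames.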

How do the sample weights work?

Please refer to the latest Technical Paper on Weighting and Non-Respondents [PDF, 1 MB].

I am calculating weights for the data and there are missing values under one of the variables. How do I handle this for calculating weights?

Please notify the LSAC Team by emailing [email protected]

Can weightings still be used to obtain representativeness when looking at data from a single state?

It’s our understanding that the relevant weightings are still applicable when looking at a particular State (e.g., Tasmania).

The sampling of the two LSAC cohorts was stratified by State and the data are benchmarked to the target populations for each state (see below):
Benchmarks for children in the B and K cohorts for each state by capital city/rest of state area were drawn from the ABS Estimated Resident Population as at March 2004. Benchmarks for households by language spoken at home and mother’s education level within each region were generated using proportions taken from the 2001 Census (p.5, LSAC Technical Paper No. 26: Wave 9C2 Weighting and non response [PDF, 1 MB]).

So, relevant weights should be applied when analysing the data for a specific state. However, it’s important to check whether the sample size is sufficient for the intended analysis.
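
As a rough illustration of applying weights to a single-state subset and checking the sample size at the same time, a weighted mean could be computed as below. This is a sketch only: the record keys ("state", "weight", "score") and the values are made up, and the appropriate LSAC weight variable for a given analysis should be taken from the Technical Papers. It produces a point estimate only and does not calculate standard errors.

```python
def weighted_mean(records, value_key, weight_key):
    """Return (weighted mean, number of records used).

    Records with a missing value or a missing weight are left out.
    """
    usable = [
        r for r in records
        if r.get(value_key) is not None and r.get(weight_key) is not None
    ]
    total_weight = sum(r[weight_key] for r in usable)
    if total_weight == 0:
        raise ValueError("no usable records with non-zero weight")
    mean = sum(r[value_key] * r[weight_key] for r in usable) / total_weight
    return mean, len(usable)


# Made-up records; keys are hypothetical, not LSAC variable names
records = [
    {"state": "TAS", "weight": 2.0, "score": 10.0},
    {"state": "TAS", "weight": 1.0, "score": 4.0},
    {"state": "NSW", "weight": 1.5, "score": 7.0},
]
tasmania = [r for r in records if r["state"] == "TAS"]
mean, n = weighted_mean(tasmania, "score", "weight")
print(f"weighted mean = {mean}, unweighted n = {n}")
```

Returning the unweighted count alongside the estimate makes it easy to see when a state subset is too small for the intended analysis.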

More information on weighting in particular waves can be found in other LSAC Technical Papers.

Why are there a high number of observations with the code "-9"?

Section 5.8 of the Data User Guide describes the missing value naming conventions.

A "-9" means the question wasn't asked. It does not include the participant intentionally skipping a question that was asked of them (this would be -3 in interviews, blank/null in self-complete forms).
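
Before analysis it can help to tabulate why values are missing and to set the missing codes aside. The sketch below covers only the codes mentioned in this answer (-9, -3 and blank/null); other codes listed in Section 5.8 of the Data User Guide are not handled, and the function names are hypothetical.

```python
from collections import Counter

# Only the codes described in this FAQ answer; see Section 5.8 for the rest
MISSING_REASONS = {
    -9: "not asked",
    -3: "skipped in interview",
}


def classify(value):
    """Label a raw value as 'valid', 'blank', or a missing-value reason."""
    if value is None or value == "":
        return "blank"
    return MISSING_REASONS.get(value, "valid")


def summarise(values):
    """Count how many values fall under each label."""
    return Counter(classify(v) for v in values)


def valid_only(values):
    """Keep only values that are not blank and not a known missing code."""
    return [v for v in values if classify(v) == "valid"]


raw = [1, -9, -9, -3, None, 4]
print(summarise(raw))
print(valid_only(raw))
```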

Is there a data dictionary for the Household Form (HHF)?

There is no data dictionary available for household form (HHF) data because the information is mostly repeated across Waves. Questions asked via the HHF are not included in the labelled questionnaires, and are only reflected in the main survey data dictionary. The HHF is a vast database capturing household events across Waves. Some key information from HHF files is added to the main survey data files for data users for simplicity.

The data for X Wave isn't there/the wrong data is in the data file.

LSAC data files are named using the minimum age of participants at that Wave, not the Wave number. For example, data file lsacgrb14 is the General Release file for the B cohort at age 14-15 (Wave 8). For the age of the cohorts at each Wave see p2 of the LSAC Data User Guide.
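
To avoid mixing up ages and Wave numbers, a file name can be split into its parts. The pattern below is an assumption inferred from the single example above (lsac + two-letter release code + cohort letter + minimum age); only "gr" is confirmed here as meaning General Release, so any other release code is returned unchanged. Use the Data User Guide to convert the age to a Wave number.

```python
import re

# Assumed layout, inferred from the example name "lsacgrb14"
FILE_PATTERN = re.compile(r"^lsac(?P<release>[a-z]{2})(?P<cohort>[bk])(?P<age>\d+)$")

# Only the release code confirmed in this FAQ
RELEASES = {"gr": "General Release"}


def parse_file_name(name):
    """Split a data file name into release, cohort and minimum age.

    Returns None when the name does not fit the assumed layout.
    """
    match = FILE_PATTERN.match(name.lower())
    if match is None:
        return None
    return {
        "release": RELEASES.get(match["release"], match["release"]),
        "cohort": match["cohort"].upper(),
        "min_age": int(match["age"]),
    }


print(parse_file_name("lsacgrb14"))
```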
