Data user guide

The Longitudinal Study of Australian Children: An Australian Government Initiative
Data User Guide – December 2018

5. File structure

For the Wave 7 data general release, the following datasets are available.

Table 6: Data release for waves and cohorts
Number of datasets | Description of datasets | Dataset names | Data type
14 | Main datasets for each wave and cohort | lsacgrb0*, lsacgrb2, lsacgrb4, lsacgrb6, lsacgrb8, lsacgrb10, lsacgrb12, lsacgrk4*, lsacgrk6, lsacgrk8, lsacgrk10, lsacgrk12, lsacgrk14, lsacgrk16 | Main
2 | Study child household | hhgrb, hhgrk | Supplementary
1 | P1 RAP household | p1raphhgrk16 | Supplementary
8 | PLE household | plehhgrb6, plehhgrb8, plehhgrb10, plehhgrb12, plehhgrk10, plehhgrk12, plehhgrk14, plehhgrk16 | Supplementary
3 | Event history calendar | ehcegrk16, ehcrgrk16, ehcsgrk16 | Supplementary
2 | Executive functioning | execksc, execkp1 | Supplementary
23 | Time use diary | tudb10, tudb12, tudk10, tudk12, tudk14; plus, for each cohort at Waves 1, 2 and 3: one cleaned data file with problematic cases deleted (diaryb0, diaryb2, etc.); one data file with the cases deleted from the cleaned files (poortudsb0, poortudsb2, etc.); and one data file with all cases and no data cleaning performed (ucdiaryb0, ucdiaryb2, etc.) | Supplementary
2 | Wave 2.5 | lsacgrb3, lsacgrk7 | Supplementary
2 | Wave 3.5 | lsacgrb5, lsacgrk9 | Supplementary
7 | Medicare Australia | mbssc, pbssc, mbsp1, mbsp2, pbsp1, pbsp2, acir | Linked
1 | NAPLAN | lsacnaplan | Linked
1 | MySchool | lsacmyschool | Linked
1 | AEDC | aedc^ | Linked
3 | Centrelink welfare | isp_summary, ftb_summary, concession_cards^ | Linked
1 | Child Health CheckPoint | lsacgrcp^ | Substudy

Notes: * Wave 1.5 datasets have been added to the Wave 1 datasets. This was possible because all participants who responded at Wave 1.5 had to complete a Wave 1 interview. This is not the case with the other between-wave mailouts, as respondents may have completed any prior combination of interviews. This structure has been used to reduce the size of the main datasets and because some data are formatted using more than one record for each child. ^ This is available with additional approval.

5.1 Main dataset

The main dataset consists of the data from all questionnaires except the time use diary, Wave 2.5, Wave 3.5, Wave 4.5, Wave 5.5, some household composition information and linked datasets. Data from the instruments are presented in the following order:

  • FCF (Wave 1 files only)
  • F2F
  • P1 self-complete (except Wave 1 files)
  • P2 self-complete
  • PLE self-complete/interview (except Wave 1 files)
  • Teacher/Carer questionnaire
  • Wave 1.5 data (Wave 1 files only)

A number of derived variables are included in the output dataset alongside the raw responses used in their derivation. Additionally, the main datasets contain status variables (e.g. date of interview, whether each type of form was returned, etc.), ABS Population Census and NCAC data, and weights.

5.1.1 Australian Bureau of Statistics Census of Population and Housing data

Public data from the Australian Bureau of Statistics Census of Population and Housing have been added to the file to enhance the range of neighbourhood characteristics available for analysis with the LSAC data. Census data are available for the child's residence from Waves 1 to 7.

The items currently included are:

  • SEIFA (rounded to the nearest 10 on the general release file)
  • remoteness area classification
  • percentage of persons aged under 5, 10 and 18 years
  • percentage of persons born in Australia
  • percentage of persons speaking English-only at home
  • percentage of persons with Aboriginal and Torres Strait Islander (ATSI) origins
  • percentage of persons who completed Year 12 schooling
  • percentage of persons in above-median income category
  • percentage of persons working
  • percentage of households with internet capacity (in 2006 Census only)
  • percentage of households with broadband (in 2006 Census only).

Census data are linked at the Statistical Local Area (SLA) level before 2011 and at the Australian Statistical Geography Standard (ASGS) level from 2011 or, where this was not available, by the child's postcode. One estimate is provided for each time point, representing a linear interpolation of the data from the censuses either side of that time point. For example, if an SLA had 4.2% of people with ATSI origins in 2001 and 6.5% with ATSI origins in 2006, then the estimate for the proportion in 2004 would be:

$$4.2\% + \frac{2004 - 2001}{2006 - 2001} \times (6.5\% - 4.2\%) = 4.2\% + 0.6 \times 2.3\% = 5.58\%$$

If data are only available for one of the censuses, no interpolation is performed. A 'link type' variable is included to tell data users whether the linkage was performed at the statistical area level or by postcode, and which of the 2001, 2006, 2011 and 2016 censuses were used.
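As a rough illustration of the interpolation rule described above, the following Python sketch estimates a census percentage for a year between two censuses and returns the single available value unchanged when only one census has data. The function name and arguments are hypothetical, and the sketch interpolates on calendar year only; the actual derivation may use exact census and interview dates.

```python
from typing import Optional


def interpolate_census(year: float,
                       year_a: int, value_a: Optional[float],
                       year_b: int, value_b: Optional[float]) -> Optional[float]:
    """Linearly interpolate a census percentage for `year`.

    Hypothetical helper: if only one census value is available it is
    returned unchanged (no interpolation), mirroring the rule in the text.
    """
    if value_a is None and value_b is None:
        return None
    if value_a is None:
        return value_b
    if value_b is None:
        return value_a
    fraction = (year - year_a) / (year_b - year_a)
    return value_a + fraction * (value_b - value_a)


# Example figures from the text: 4.2% in 2001 and 6.5% in 2006, estimated for 2004.
estimate = interpolate_census(2004, 2001, 4.2, 2006, 6.5)
print(round(estimate, 2))  # 5.58
```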

5.1.2 National Childcare Accreditation Council data

A key research question in LSAC relates to the effect of child care on children's developmental outcomes over time. While LSAC collected parent-reported information on children's child care histories and carer reports on the child care environment, relatively little systematic information was collected on the quality of child care.

The National Childcare Accreditation Council Inc. (NCAC), as it was then known, held quality assurance data on every long day care (LDC) centre, some family day care (FDC) schemes and some before- and after-school care providers. The LSAC dataset includes linked NCAC data for most children using LDC or FDC at Wave 1, where contact details of this care were obtained and matched with NCAC data. The match rate obtained during the linkage process was 78% for Wave 1, 82% for Wave 2, 84% for Wave 3 and 92% for Wave 4.

One complication in using the NCAC data is the change of accreditation systems for both FDC and LDC. In Wave 1, all FDC cases were assessed under the guidelines laid out in the second edition of the FDCQA Quality Practices Guide (NCAC, 2004), while from Wave 2 onwards all cases have been assessed under the third edition of this guide, introduced in July 2005. The revised guidelines contain the same quality areas (though some have been combined), but the number of principles used to assess these areas has been reduced from 35 to 30: the old scheme had 10 quality areas assessed by 35 principles, while the new scheme has seven quality areas assessed by 30 principles.

For LDC, all Wave 1 centres were assessed under the QIAS Validation Report, 2nd Edition (NCAC, 2003). From July 2006, accreditation decisions were made under the QIAS Quality Practices Guide, 1st Edition. As a consequence, some of the Wave 2 and 3 accreditations were made under the new scheme, while some were made under the old scheme.

Before- and after-school care arrangements were assessed by the guidelines laid out in the OSHCQA Quality Practices Guide, 1st Edition (NCAC, 2003). In Waves 2 and 3, some accreditations were made under the new scheme, while some were made under the old scheme.

The variables included are:

  • date of accreditation
  • date of validation
  • accreditation status
  • LDC v1 Quality area 1: Relationships with children
  • LDC v1 Quality area 2: Respect for children
  • LDC v1 Quality area 3: Partnerships with families
  • LDC v1 Quality area 4: Staff interactions
  • LDC v1 Quality area 5: Planning and evaluation
  • LDC v1 Quality area 6: Learning and development
  • LDC v1 Quality area 7: Protective care
  • LDC v1 Quality area 8: Health
  • LDC v1 Quality area 9: Safety
  • LDC v1 Quality area 10: Managing to support quality
  • LDC v2 Quality area 1: Staff relationships with children and peers
  • LDC v2 Quality area 2: Partnerships with families
  • LDC v2 Quality area 3: Programming and evaluation
  • LDC v2 Quality area 4: Children's experiences and learning
  • LDC v2 Quality area 5: Protective care and safety
  • LDC v2 Quality area 6: Health, nutrition and wellbeing
  • LDC v2 Quality area 7: Managing to support quality
  • FDC Quality area 1: Interactions
  • FDC Quality area 2: Physical environment
  • FDC Quality area 3: Children's experiences, learning and development
  • FDC Quality area 4: Health, hygiene, nutrition, safety and wellbeing
  • FDC Quality area 5: Carers and coordination unit staff
  • FDC Quality area 6: Management and administration
  • OHS Quality area 1: Respect for children
  • OHS Quality area 2: Staff interactions and relationships with children
  • OHS Quality area 3: Partnerships with families and community links
  • OHS Quality area 4: Programming and evaluation
  • OHS Quality area 5: Play and development
  • OHS Quality area 6: Health, nutrition and wellbeing
  • OHS Quality area 7: Protective care and safety
  • OHS Quality area 8: Managing to support quality
  • Demographic data

The data used to develop the quality areas were collected from six sources:

  • a self-study report prepared by centre management
  • a validation survey completed by the director
  • a validation survey completed by staff
  • a validation survey completed by families
  • a validation report completed by an independent peer
  • a set of moderation ratings completed by independent moderators.

Data on 35 principles were collected. Each principle was related to one of the 10 quality areas. Response categories for each principle were: 'unsatisfactory', 'satisfactory', 'good quality' and 'high quality'. Proportionally weighted factor-score regression coefficients for principle ratings were calculated to determine the extent to which each principle contributed to a quality area. For further information, see Rowe (2006).

As no data about the child was obtained, no consent was required from parents to collect this data (although parents did need to give details about their carers to assist in the linking).

5.2 Supplementary files

5.2.1 Household composition data

The study child household: At each wave of data collection, detailed information about every member of the household where the study child resides is collected. Information is collected about people currently residing in the study child's household, as well as people who have come and gone between waves but lived with the study child for at least three months. This information is usually collected from Parent 1 only. However, from Wave 7 onwards, if a study child has moved out of the parental household, this information is collected directly from the study child. Parent 1 is still asked to provide information on their own household (P1 RAP).

The main household dataset for each cohort contains one record for each study child, detailing the composition of their household from their recruitment to the study to the most recent data collection. This dataset also includes ex-household members (with a variable indicating that they are no longer resident), such as parents living elsewhere who were resident at a previous wave. The details collected about the study child, P1 and P2 are included in each main dataset, along with a number of derived variables on household composition.

The study child's household is always the household where the study child resides. When the study child resides with parents, the information is collected about the parental household and saved in the household file 'hhgrb/k'. When the study child moves out of the parental household (SC RAP) to another household (independent living) the information is collected about all members of the household the study child moves to and is saved in the household file 'hhgrb/k'.

PLE household: PLE household composition data is released from Wave 4 and contains detailed information about every member of the household in which the parent living elsewhere lives. The household data file is wave specific and released cross-sectionally at every wave, one record per study child.

P1 RAP household: Another household composition data file available in Wave 7 for the K cohort is the P1 RAP. This file contains detailed information about every member of the P1 RAP household and is saved in the file 'p1raphhgrb/k'. The P1 RAP household is a parental household of study children who were living away from P1 during the Wave 7 interview.

5.2.2 Event history calendar

The event history calendar (EHC) was introduced in Wave 7 to collect retrospective reports of events and the timings of those events from the K cohort children. The primary focus of the EHC was to capture information on the residential living arrangements, study and employment domains. Three data files are available, each corresponding to a specific domain (ehcrgrk16 - resident living away, ehcegrk16 - employment and ehcsgrk16 - study). The files are structured as long format data, allowing multiple reports of events per child where applicable. The EHC data file names are wave specific, with the keyword 'K16' representing the age (16 years) of K cohort respondents. The EHC captured all changes that had occurred in these domains since the Wave 6 interview or, if the respondent was not interviewed in Wave 6, in the two years preceding the date of the Wave 7 interview.

5.2.3 Executive functioning

Executive functioning data was collected from K cohort study children in Wave 6 and the parents (P1) of K cohort study children in Wave 7 interviews. This information is available in two separate data files (execksc and execkp1). The data file names are respondent-specific with keywords KSC and KP1 representing study children and parents of the K cohort. However, the first letter of variable names in these data files represents wave-specific/child age indicator information. Further information about Cogstate data collection is available in LSAC Technical Paper No. 19, Executive Functioning: Use of Cogstate Measures in the Longitudinal Study of Australian Children [PDF 401 KB].

5.2.4 Time use diary data

In Waves 1-3, responding families were given two time use diaries (TUDs) to complete at each wave. Each record in the TUD data relates to a single diary; that is, each child can have up to two records (one for each TUD).

This paper form TUD gathered information on children's activities and the context of 96 15-minute periods in each 24-hour block. In addition to these variables, the TUD data includes the child's unique identification number in order to allow linkage with the main dataset. It also includes the following general descriptors:

  • date diary should be completed
  • day of week diary should be completed
  • diet of the study child on the day in question (Waves 2 and 3)
  • relationship of the diary writer to the child
  • over what duration the diary was completed
  • actual day and date of completion
  • hours of work done by respondent on day of completion (Waves 2 and 3)
  • the kind of day described in the diary.

Due to scanning problems in Wave 1, and other data quality issues that are likely to apply equally across waves, a number of imputations and corrections have been applied to the TUD data (for more details, see Data Issues: Waves 1 to 7). So that researchers can determine the effect of these imputations and corrections on any analysis, an uncorrected version of the TUD data is also provided, as well as files containing the imputed/corrected versions of cases that were considered unsuitable for data analysis even after correction.

LSAC Technical Paper 4 Children's time use in the Longitudinal Study of Australian Children: Data quality and analytical issues in the 4-year cohort [PDF 840 KB] and Technical Paper 13 The Times of Their Lives: Collecting time use data from children in the Longitudinal Study of Australian Children [PDF 1.5 MB] include detailed discussions of issues that should be considered when using the time use data.

In Wave 4 a new methodological approach was undertaken due to a shift from the parent being the informant to the study child being the informant. In Waves 4-6 only the K cohort completed the TUD, which was substantially different from the TUDs that the parents had completed in earlier waves. With the child being the informant, the interviewer was directly involved in working with the child to transfer information from the diary into a computer instrument. In Wave 7 the TUD was collected only for the B cohort. In Waves 4-7 the TUD took the form of an 'ABS Activity Episode' diary. These data are stored as a long file, whereas the earlier diaries were stored as wide files.
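To illustrate the difference between the two layouts, the Python sketch below collapses one wide diary record (one value per 15-minute period) into a long list of episodes (one row per run of consecutive periods with the same activity). The activity codes and the function name are hypothetical and do not reflect the LSAC variable coding; the released Wave 4-7 files are already in long form.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def wide_to_episodes(slots: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-period activity codes into (activity, first_period, n_periods).

    Periods are numbered from 1, as in the 96-period paper diary.
    """
    episodes = []
    period = 1
    for activity, run in groupby(slots):
        length = len(list(run))
        episodes.append((activity, period, length))
        period += length
    return episodes


# Hypothetical wide record: 96 periods with made-up activity codes.
wide_record = ["sleep"] * 28 + ["eat"] * 2 + ["school"] * 26 + ["play"] * 20 + ["sleep"] * 20
for activity, first_period, n_periods in wide_to_episodes(wide_record):
    print(activity, first_period, n_periods * 15, "minutes")
```

In the wide layout every child has the same number of columns; in the long layout the number of rows per child varies with the number of episodes reported.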

Example analysis
SAS

The following code gives the proportion of children eating or drinking while watching a TV, video, DVD or movie at any time of day for the B cohort at Wave 1. Statements 1 and 2 tell SAS to create a new dataset beginning with the data in the mtud.diary2 file (you will need to use your own library name). The third statement tells SAS to treat the time use data as a multidimensional array (x) containing 96 rows of 40 columns each. The next statement tells SAS to set up a new array of 96 variables (TVeat) into which the data for eating in front of the TV will be derived.

Statements 5-8 contain a do loop, which runs across all 96 time periods. Statement 5 tells SAS to create a variable 'i' to keep track of which time period is being worked on, and to give it the values 1-96 in turn. Statement 6 tells SAS to allocate the value 100 at the position in the 'TVeat' array for the current time period if the child was eating or drinking (column 4 in the array 'x') and was watching a TV, etc. (column 12 in 'x'). Statement 7 says the value of 0 will be assigned if the child either wasn't eating or drinking or wasn't watching TV, etc., and the diarist was sure of the child's activities for the time period. This means that cases where the diarist wasn't sure, or didn't fill any information in for activities in this time period, will have missing data. Statement 8 finishes the do loop, and statement 9 finishes the data step so SAS runs the above statements.

Statements 10-13 produce the means of the variables in the 'TVeat' array (which SAS gives the names TVeat1 to TVeat96 by default). The mean here will be the percentage of children for whom an activity was known that ate or drank in front of the TV, etc., at each time period. Statement 12 uses the day weight variable 'bweightd' to ensure the proportion is representative of the population and represents each day of the week equally.

data diary2;
  set mtud.diary2;
  array x [96,40] b2da0101--b2de0196;
  array Tveat [96];
  do i=1 to 96;
    if x[i,4]=1 and x[i,12]=1 then Tveat[i]=100;
    else if (x[i,4]=0 or x[i,12]=0) and x[i,1]^=1 then Tveat[i]=0;
  end;
run;

proc means data=diary2;
  var Tveat1-Tveat96;
  weight bweightd;
run;

This data can be used to produce a graph known as a tempogram.

Figure 2 shows the data produced by the example program along with the equivalent data for the K cohort at Waves 1 and 2. It shows that children did more of this as they got older, and that this activity was most common in the early mornings.

Figure 2: Tempogram of children watching TV, video, DVD or movie while eating or drinking by wave and cohort.


SPSS

The equivalent code to derive the TVeat variable in SPSS would appear as:

* Flag periods where the child ate or drank while watching TV (1 = yes, 0 = no).
do repeat
  eat = b2da0401 b2da0402 … b2da0496 /
  tv = b2da1201 b2da1202 … b2da1296 /
  dk = b2da0101 b2da0102 … b2da0196 /
  tve = tveat1 to tveat96.
if (eat=1 and tv=1) tve=1.
if ((eat=0 or tv=0) and dk=0) tve=0.
end repeat.

STATA

The equivalent code to derive the TVeat variable in STATA would look like:

* Periods 1-9: variable names carry a leading zero before the period number
foreach n of numlist 1/9 {
    gen tveat`n'=1 if (b2da040`n'==1 & b2da120`n'==1)
    replace tveat`n'=0 if ((b2da040`n'==0 | b2da120`n'==0) & b2da010`n'==0)
}

* Periods 10-96
foreach n of numlist 10/96 {
    gen tveat`n'=1 if (b2da04`n'==1 & b2da12`n'==1)
    replace tveat`n'=0 if ((b2da04`n'==0 | b2da12`n'==0) & b2da01`n'==0)
}
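
For readers working outside these packages, the same derivation can be sketched in Python. This is an illustration only: it assumes each diary record has been loaded as a dictionary keyed by variable name, follows the naming pattern visible in the SPSS and STATA examples (b2da04NN for eating or drinking, b2da12NN for watching TV, b2da01NN for the diarist not being sure, where NN is the time period), follows the SAS example in scoring 100/0 so that the mean is a percentage, and uses made-up records and weights.

```python
from typing import Dict, List, Optional

Record = Dict[str, float]


def tv_eat(record: Record, period: int) -> Optional[int]:
    """Return 100 if the child ate/drank while watching TV in this period,
    0 if not (and the diarist was sure), or None if the activity is unknown."""
    suffix = f"{period:02d}"
    eat = record.get(f"b2da04{suffix}")
    tv = record.get(f"b2da12{suffix}")
    unsure = record.get(f"b2da01{suffix}")
    if eat == 1 and tv == 1:
        return 100
    if (eat == 0 or tv == 0) and unsure != 1:
        return 0
    return None


def weighted_percent(records: List[Record], period: int,
                     weight_var: str = "bweightd") -> Optional[float]:
    """Weighted mean of tv_eat over records with a known activity."""
    total = weight_sum = 0.0
    for record in records:
        value = tv_eat(record, period)
        if value is not None:
            total += record[weight_var] * value
            weight_sum += record[weight_var]
    return total / weight_sum if weight_sum else None


# Made-up records for time period 1 only.
diaries = [
    {"b2da0401": 1, "b2da1201": 1, "b2da0101": 0, "bweightd": 1.5},
    {"b2da0401": 0, "b2da1201": 1, "b2da0101": 0, "bweightd": 0.5},
    {"b2da0401": 0, "b2da1201": 0, "b2da0101": 1, "bweightd": 2.0},  # diarist unsure
]
print(weighted_percent(diaries, 1))  # 75.0
```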

5.2.5 Wave 2.5 data

The data from the Wave 2.5 mailout is included in two separate datasets. Unlike Wave 1.5 in relation to Wave 1, families that responded to Wave 2.5 did not necessarily respond to Wave 2. Merging these with the Wave 2 datasets would have resulted in a number of largely blank cases on the data file.

The data in the Wave 2.5 file consists of questionnaire items, a small number of derived items and linked census data based on the postcodes of responding families at the time of Wave 2.5. Unfortunately, formatting of the questionnaires resulted in some respondents skipping items that they should have answered. Imputation has been performed on some items where it was possible to infer the data for these questions based on responses to other questions. See Data Issues: Waves 1 to 7 for further information.

5.2.6 Wave 3.5 data

The data from the Wave 3.5 mailout is included in a separate dataset, in the same way that data from Wave 2.5 was included.

The data in the Wave 3.5 file consists of questionnaire items, a small number of derived items and linked census data based on the postcodes of responding families at the time of Wave 3.5. Imputation has been performed on some items where it was possible to infer the data for these questions based on responses to other questions. See Data Issues: Waves 1 to 7 for further information.

5.2.7 Distance to coast data

Distance to coast has been generated for every residential address in Waves 1-7 by geocoding latitude and longitude information. The distance to the coast data for each cohort (B and K) are stored in a separate data file. The dataset contains one record per study child with multiple distance-related variables representing different waves of data collection as denoted by the first letter of the variable name. See Distance to coast data information [PDF, 503 KB], providing information on distance calculation and confidentialisation strategy. Distance to coast data are only available with restricted release data files.

5.3 Linked data

5.3.1 Medicare Australia data

In Wave 1, 97% of parents of study children gave consent for their children's data to be linked with Medicare Australia data on an ongoing basis. This includes data from the Medicare Benefits Schedule (MBS), the Pharmaceutical Benefits Scheme (PBS) and the Australian Childhood Immunisation Register (ACIR). Data from these sources provide details of the usage history of MBS, PBS and ACIR services. Linkage was successful for 93% of children (incomplete consent forms resulted in data not being released for about 400 children).

Since the child's use of medical services is ongoing, the Medicare Australia data are not broken into waves but are provided as three separate files:

ACIR: Each record in the file represents an immunisation that the child has had.

MBS: Each record on this file represents a benefit claim.

PBS: Each record represents a benefit claim.

In Wave 7, Parent 1 and Parent 2 also consented to the linkage of their own MBS, PBS and Repatriation Pharmaceutical Benefits Scheme (RPBS) data.

ACIR file

Records are currently available for payments received from birth to early 2013. The following variables are included on the file:

  • child identification number
  • vaccination code
  • vaccination name
  • scrambled provider ID
  • date of receipt of payment
  • date of immunisation.

Some of the vaccination codes contain dose numbers, indicating a vaccine that has been received in a series of doses. The sequence of doses for these has been included in the dataset (i.e. 1st, 2nd, etc.). If a dose is missing, it means that it was either not reported to ACIR or it was missed.

MBS file

Records are currently available for services between January 2002 (or birth for the B cohort) and early 2015. The following variables are included on this file:

  • child identification number
  • item number
  • item name
  • amount of benefit paid
  • hospital indicator
  • scrambled provider ID
  • date of payment
  • date of service.

Some cases have very small or negative benefit amounts. In relation to negative benefits, this indicates that an adjustment has been made to the Medicare benefit records. There are several reasons why this may happen:

  • It is a correction of a data entry made against the wrong individual reference number on a Medicare card (i.e. service is initially incorrectly recorded against someone else on the same card).
  • The provider has issued an amended account.
  • A new cheque has been issued to replace lost/stolen/unpresented cheques.

In relation to small benefits:

  • There are a number of item numbers that have small benefits; for example, many pathology-related claims.
  • There are also small amounts for things such as bulk bill incentives (generally around $5-6).
  • The claimant had reached the Medicare Safety Net (MSN) threshold. Once the threshold has been reached, the family's out-of-pocket expenses are tallied and a payment is calculated for a percentage of the substantiated amounts. In effect, there can be two payments made for the same doctor's visit - one to the doctor for the service and one to the claimant for MSN purposes.

PBS file

The last of these datasets contains the PBS data. Again, each record represents a benefit claim. Records are available for medications supplied between May 2002 (or birth for the B cohort) and early 2015. The following information is included for each record:

  • child identification number
  • item code
  • item name
  • quantity
  • benefit paid
  • prescription type (original, repeat or unknown)
  • payment category
  • payment status
  • date of payment
  • date of supply.

Example derivations

There are simple techniques in SAS, SPSS and STATA to summarise across multiple records to create derived items from the Medicare datasets. The following code samples create a variable (ben07) for the amount of PBS benefits paid for a child in 2007. Note that this variable will initially be missing for cases that had no PBS claims in 2007 as well as those for which data linkage was unsuccessful. The 'match' file can be used to distinguish between these cases and set ben07 to 0 for those with no claims. This file contains a variable called 'medicare', which is 1 if linkage is successful for a case and 0 otherwise.

SAS

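/* Sum PBS benefits supplied during 2007 for each child */
proc means data=m.pbs nway sum;
  class hicid;
  var benefit;
  where datesupp>=mdy(1,1,2007) and datesupp<=mdy(12,31,2007);
  output out=temp sum=ben07;
run;

/* Set ben07 to 0 for linked children with no claims in 2007 */
data temp;
  merge temp m3.match;
  by hicid;
  if medicare=1 and ben07=. then ben07=0;
run;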

SPSS

temporary.
select if (datesupp >= date.dmy(1,1,2007) & datesupp <= date.dmy(31,12,2007)).
aggregate
  /outfile='/temp.sav'
  /break=hicid
  /ben07=sum(benefit).

get
  file='/temp.sav'.

match files /file=*
  /file='/match.sav'
  /by hicid.

if (medicare=1 & missing(ben07)) ben07=0.
execute.

STATA

Note that the collapse command will delete all other data than hicid and ben07. Ensure it is saved to a new file.

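* Sum PBS benefits supplied during 2007 for each child
collapse (sum) ben07=benefit if (datesupp>=mdy(1,1,2007) & datesupp<=mdy(12,31,2007)), by(hicid)
merge hicid using match
* Linked children with no claims in 2007 get 0 rather than missing
replace ben07=0 if (medicare==1 & ben07==.)
keep if ben07!=.
sort hicid
save temp, replace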
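
The same summarise-then-merge pattern can be sketched in Python using only the standard library. The records below are made up, and the field names (hicid, benefit, datesupp, medicare) simply follow the examples above; this is an illustration of the logic rather than a supported workflow.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Optional

# Made-up PBS claim records (one per benefit claim).
pbs = [
    {"hicid": 1, "benefit": 12.50, "datesupp": date(2007, 3, 14)},
    {"hicid": 1, "benefit": 7.25, "datesupp": date(2007, 11, 2)},
    {"hicid": 1, "benefit": 30.00, "datesupp": date(2008, 1, 1)},  # outside 2007
    {"hicid": 2, "benefit": 5.00, "datesupp": date(2006, 6, 30)},  # outside 2007
]
# Made-up 'match' file: medicare is 1 if linkage succeeded, 0 otherwise.
match = [{"hicid": 1, "medicare": 1}, {"hicid": 2, "medicare": 1}, {"hicid": 3, "medicare": 0}]


def pbs_benefit_for_year(claims: List[dict], links: List[dict],
                         year: int) -> Dict[int, Optional[float]]:
    """Sum benefits supplied in `year` per child; 0 if linked with no claims,
    None (missing) if linkage was unsuccessful."""
    totals: Dict[int, float] = defaultdict(float)
    for claim in claims:
        if claim["datesupp"].year == year:
            totals[claim["hicid"]] += claim["benefit"]
    result: Dict[int, Optional[float]] = {}
    for link in links:
        hicid = link["hicid"]
        if hicid in totals:
            result[hicid] = totals[hicid]
        else:
            result[hicid] = 0.0 if link["medicare"] == 1 else None
    return result


ben07 = pbs_benefit_for_year(pbs, match, 2007)
print(ben07)  # {1: 19.75, 2: 0.0, 3: None}
```

Filtering on the calendar year of the supply date avoids the off-by-one risk of hand-written date bounds.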

5.3.2 AEDC data

Every three years since 2009, the Australian Government has undertaken a census of all children in their first year of full-time schooling. For more information, see What the AEDC means for parents.

The Australian Early Development Census (AEDC) data for B cohort children were obtained from the Department of Education. The Department of Education is responsible for the AEDC. The Social Research Centre manages the data. The data contains no variable labels or value labels but these can be found in the Data Dictionary provided on the AEDC website.

5.3.3 NAPLAN data

In Wave 3, 81% of parents of K cohort children gave consent for their child's data to be linked with NAPLAN data for the duration of the study. Linkage was successful for 96% of these children. For 4% of children, the NAPLAN data were not found, either because these children had not sat NAPLAN tests yet or they sat the NAPLAN tests in 2008 or 2009 but a match was not found. Families who did not give consent or who did not participate at Wave 3 were asked again at Wave 4. Out of 964 families who were followed up in Wave 4, 847 gave consent to link NAPLAN results.

The Wave 6 LSAC NAPLAN release includes B cohort and K cohort NAPLAN results for 2008-14.

LSAC Technical Paper 8 Using National Assessment Program - Literacy and Numeracy (NAPLAN) data in the Longitudinal Study of Australian Children (LSAC) [PDF 1.4 MB] includes a detailed discussion of the NAPLAN data linkage process and data issues, and should be considered when using the LSAC NAPLAN data.

5.3.4 ACARA MySchool data

Data has been obtained from ACARA, which is responsible for collating NAPLAN data received from Australian schools, collecting school characteristics and managing the MySchool website. Some of the data ACARA collects and collates on Australian schools is publicly available on the MySchool website. Data about the schools LSAC participants attend has been linked to the LSAC survey datasets and is available to data users. See the Technical papers for further information.

5.3.5 Centrelink welfare data

During Wave 7 enumeration, consent was collected from the K cohort study child's parents (P1 and P2) to link their Centrelink welfare benefits back to 1 January 1999, and from the K cohort study child to link back to their 16th birthday. The data includes information on income support payments, Family Tax Benefit, Carer Allowance and concession cards. The data released with Wave 7 are extracted up until the end of the 2016/17 financial year (30 June 2017), apart from the Family Tax Benefit data, which is only extracted up until 30 June 2015 as it is based on entitlement calculated after reconciliation with tax data.

The linked Centrelink data is provided in separate datasets from the main LSAC data files, and there are both general release and restricted release versions. These files are not supplied automatically with the LSAC data files and have to be explicitly requested. The general release version of the Centrelink data can be applied for, at no additional cost, by data users applying for either the general release or the restricted version of the main LSAC files. The restricted version of the Centrelink data can also be applied for by users of the general release LSAC file.

Applicants for the restricted Centrelink files will need to present a project rationale for access to the restricted data making it clear why this data is essential for their research. This will entail either specifying why particular data items are required or why the research questions require access to episodic income support data. See below for a description of the data available in the two versions of the Centrelink files.

General release Centrelink files

ISP_Summary: Contains data for income support payments receipt (ISP) aggregated at financial-year level. For each participant who has received an income support payment in a particular year there will be a single observation. The following information is included in the summary file:

  • benefit type received by the participant for the greatest duration during the year
  • number of days that the participant received an income support payment and duration they received the primary benefit type
  • duration in receipt of rent assistance, home ownership status and rent type
  • number of days the participant received other income while in receipt of an income support payment
  • number of days the participant was partnered
  • indicators for receipt of carer allowance payment and low income card

FTB_Summary: Contains data for Family Tax Benefit (FTB) aggregated at financial-year level, based on a participant's reconciled eligibility and entitlement determined after receipt of their taxable income provided by the ATO. Information is only provided up to two years prior to the extraction date, at which point the data is considered 'mature'; that is, the vast majority have tax data against which their entitlement can be reconciled. The information provided includes:

  • no. of days the participant was eligible for FTB (in total), FTB-A and FTB-B
  • no. of days the participant was eligible for an ISP while eligible for FTB
  • no. of days the participant was partnered with a primary partner while eligible for FTB
  • no. of days the participant was partnered with ex-partners while eligible for FTB
  • count of children assessed as FTB children
  • total validated adjusted taxable income (customer + primary partner + ex-partners).

Concession_cards: Contains concession card episodes for participants who held a concession card. Because a participant can hold more than one concession card over the same period, this file contains overlapping episodes for some participants. Information includes the benefit type that qualified the participant for a concession, the concession card type and the number of dependent children.
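Because episodes in this file can overlap, summing episode lengths can count the same day more than once. The following is a minimal Python sketch of one way to count the distinct days on which a participant held at least one card. The function name, the (start, end) layout and the example dates are illustrative assumptions, not the actual variables in the Concession_cards file.

```python
from datetime import date

def days_covered(episodes):
    """Count distinct days covered by at least one episode.

    `episodes` is an iterable of (start, end) date pairs, both inclusive.
    Overlapping or directly adjacent episodes are merged first, so no
    day is counted twice. (Hypothetical layout, for illustration only.)
    """
    total = 0
    current_start = current_end = None
    for start, end in sorted(episodes):
        if current_end is None:
            current_start, current_end = start, end
        elif (start - current_end).days <= 1:
            # Overlaps or directly follows the running episode: extend it.
            current_end = max(current_end, end)
        else:
            # Gap found: close the running episode and start a new one.
            total += (current_end - current_start).days + 1
            current_start, current_end = start, end
    if current_end is not None:
        total += (current_end - current_start).days + 1
    return total

# Made-up episodes for one participant: two cards overlap in March 2015.
cards = [
    (date(2015, 1, 1), date(2015, 3, 31)),  # card type A
    (date(2015, 3, 1), date(2015, 6, 30)),  # card type B
    (date(2016, 1, 1), date(2016, 1, 31)),  # a later, separate episode
]

total_days = days_covered(cards)                               # distinct days
naive_days = sum((e - s).days + 1 for s, e in cards)           # double counts March
```

In this made-up example the naive sum exceeds the merged total by the 31 days of March 2015, which fall within both of the first two episodes.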

Restricted release Centrelink files

ISP_Episodic: Holds the information for each episode of ISP receipt. In addition to the variables in the summary file, the following information is provided:

  • entitlement rate
  • activity requirements
  • reason for end of payment
  • earnings amount and work hours
  • educational details - student status, course level and type, highest educational level before episode
  • rent amount
  • homelessness
  • medical conditions (currently a binary indicator pending confidentialisation) and impairment rating
  • vulnerability indicator.
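To illustrate how episode-level records of this kind relate to financial-year aggregates such as those in ISP_Summary, the sketch below splits episodes across Australian financial years (1 July to 30 June) and derives the total days on payment and the benefit type held for the greatest duration. It is an illustration only and not the method used to build the released files; the tuple layout, the benefit labels and the convention of labelling a financial year by the calendar year in which it ends are all assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta

def financial_year(d):
    """Australian financial year (1 July to 30 June).

    Assumed convention: label the year by the calendar year in which
    it ends, so 1 July 2014 to 30 June 2015 is 2015.
    """
    return d.year + 1 if d.month >= 7 else d.year

def summarise(episodes):
    """Aggregate (start, end, benefit_type) episodes to financial-year level.

    Dates are inclusive. Assumes one participant's episodes do not overlap.
    Returns {fy: {"total_days", "primary_benefit", "primary_days"}}.
    """
    days = defaultdict(lambda: defaultdict(int))
    for start, end, benefit in episodes:
        d = start
        while d <= end:
            # Assign each day on payment to its financial year.
            days[financial_year(d)][benefit] += 1
            d += timedelta(days=1)
    summary = {}
    for fy, by_benefit in sorted(days.items()):
        # Primary benefit: the type received for the most days in the year.
        primary = max(by_benefit, key=by_benefit.get)
        summary[fy] = {
            "total_days": sum(by_benefit.values()),
            "primary_benefit": primary,
            "primary_days": by_benefit[primary],
        }
    return summary

# Made-up episodes: the first spans the 30 June / 1 July boundary.
episodes = [
    (date(2014, 5, 1), date(2014, 8, 31), "benefit_A"),
    (date(2014, 9, 1), date(2014, 9, 30), "benefit_B"),
]
result = summarise(episodes)
```

Here the first episode contributes 61 days to the year ending June 2014 and 62 days to the following year, where benefit_A remains the primary benefit type ahead of the 30 days on benefit_B.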

FTB_Customer_Reconciled: Has the same structure as the ISP_Summary file. Additional information provided includes:

  • age, citizenship, Indigenous indicator, overseas indicator, preferred written language, remoteness area
  • no. of days eligible for FTB-A (by rate type)
  • no. of days eligible for FTB-B
  • FTB-A and FTB-B pre-reconciliation eligibility amounts (paid and notional)
  • FTB-A and FTB-B post-reconciliation entitlement amounts
  • maintenance income (MI) and amount of FTB-A not paid due to the MI test
  • no. of days overseas
  • count of FTB shared care children
  • no. of days also eligible for an ISP
  • adjusted taxable income broken down by components.

FTB_Child_Reconciled: This file holds the reconciled data for the FTB children for whom a participant received FTB payments in an entitlement year. The data contains one observation for each combination of FTB customer and FTB child in each entitlement year during which the participant/customer received an FTB payment for that child. Details for children aged 16 or over are not included due to privacy considerations. Information includes:

  • age, gender, overseas indicator and duration
  • post-reconciliation durations for FTB-A and FTB-B
  • regular and shared care durations
  • FTB-A supplement amount.
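Since this file has one observation per customer, child and entitlement year, it usually needs to be collapsed before it can be compared with customer-level records. The sketch below counts the distinct children recorded for each customer in each entitlement year. The field names (customer_id, child_id, entitlement_year) and the rows are hypothetical and are not the released variable names.

```python
from collections import defaultdict

# Made-up rows in the shape described above: one per
# (customer, child, entitlement year). Field names are illustrative only.
rows = [
    {"customer_id": "P1", "child_id": "C1", "entitlement_year": 2015},
    {"customer_id": "P1", "child_id": "C2", "entitlement_year": 2015},
    {"customer_id": "P1", "child_id": "C1", "entitlement_year": 2016},
    {"customer_id": "P2", "child_id": "C3", "entitlement_year": 2015},
]

def children_per_customer_year(rows):
    """Count distinct children for each (customer, entitlement year)."""
    children = defaultdict(set)
    for row in rows:
        key = (row["customer_id"], row["entitlement_year"])
        children[key].add(row["child_id"])
    return {key: len(ids) for key, ids in children.items()}

counts = children_per_customer_year(rows)
```

Because details for children aged 16 or over are omitted from this file, a count derived this way may be lower than the count of FTB children reported in the customer-level files.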

5.4 CheckPoint Health data

A comprehensive, one-off physical health and biomarker module, known as the Child Health CheckPoint, was added for the B cohort between LSAC Waves 6 and 7. B cohort families who took part in a LSAC Wave 6 home interview were eligible for the Child Health CheckPoint module. In 2015-16, the B cohort child and one of their parents participated in a comprehensive clinic appointment or shorter home visit. A second parent was also invited to provide a genetic sample. The study child was aged 11-12 years at the time of assessment. The aim of this additional phase was to learn more about the health of young Australians between childhood and adolescence. Further information about Child Health CheckPoint is available from the study website.

Ideally, a physical health and biomarker module would have been offered to both the B and K cohorts. However, because the CheckPoint was funded by a national competitive grant scheme, there were only sufficient funds to assess one of the two LSAC cohorts. The B cohort was chosen over the K cohort because the younger cohort had early-life data collected prospectively; its study children were commencing puberty, which was important to many CheckPoint measures; and they were at an age where they were less likely to become disengaged or too busy to participate.

During the LSAC Wave 6 home visit, the interviewer briefly introduced the Child Health CheckPoint and collected written consent from families to pass their contact details to the CheckPoint team solely for the purpose of recruitment to the CheckPoint module. The majority of the Wave 6 interviews took place in March-September 2014. Permission for contact was received from 3,513 families (93% of Wave 6 families and 69% of the original cohort). See the Child Health CheckPoint website for further information.