Data user guide

The Longitudinal Study of Australian Children: An Australian Government Initiative
Data User Guide – August 2018

6. Variable naming conventions

The variable naming convention was developed so that variables have predictable names across waves and informants, and so that thematically linked variables have similar names wherever possible. A two-page 'help sheet' is provided in this Data User Guide (see Appendix A) to help users learn these conventions.

6.1 Questionnaire variables

Variable names follow the standard format in most cases. Exceptions to this naming convention (derived items and household composition variables) are explained in the sections that follow.

Standard format: A tt xxxxx

Where:

A = child age indicator

tt = topic indicator

xxxxx = specific question identifier.

6.1.1 Child age indicator (alpha)

The child age indicator is an alpha symbol that indicates the child's age, allowing for comparisons between the cohorts where data have been collected for both cohorts at that age. For instance:

a  indicates the child is aged 0-1 years (which is the B cohort in Wave 1)

b  indicates the child is aged 2-3 years (which is the B cohort in Wave 2)

c  indicates the child is aged 4-5 years (which is the B cohort in Wave 3, and the K cohort in Wave 1)

d  indicates the child is aged 6-7 years (which is the B cohort in Wave 4, and the K cohort in Wave 2)

e  indicates the child is aged 8-9 years (which is the B cohort in Wave 5, and the K cohort in Wave 3)

f  indicates the child is aged 10-11 years (which is the B cohort in Wave 6, and the K cohort in Wave 4)

g  indicates the child is aged 12-13 years (which is the B cohort in Wave 7, and the K cohort in Wave 5)

h  indicates the child is aged 14-15 years (which is the K cohort in Wave 6)

i  indicates the child is aged 16-17 years (which is the K cohort in Wave 7)

This is an example of how the child age indicator is used for the item 'Parent 1 rating of parenting self-efficacy':

Wave 1 B cohort: apa01a

Wave 2 B cohort: bpa01a

Wave 3 B cohort: cpa01a

Wave 4 B cohort: dpa01a

Wave 5 B cohort: epa01a

Wave 6 B cohort: fpa01a

Wave 7 B cohort: gpa01a

Wave 1 K cohort: cpa01a

Wave 2 K cohort: dpa01a

Wave 3 K cohort: epa01a

Wave 4 K cohort: fpa01a

Wave 5 K cohort: gpa01a

Wave 6 K cohort: hpa01a

Wave 7 K cohort: ipa01a

Those items of information that do not change (e.g. details of birth, age child began or stopped something, etc.) are given the age indicator z so that they have a consistent variable name across cohorts regardless of the age of the child when the information was obtained. For example, zhs03a indicates 'birth weight of the study child' regardless of whether the information was collected when the child was aged 0-1 years, as for the B cohort, or aged 4-5 years, as for the K cohort.
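The cohort and wave mapping above can be sketched in a few lines of Python. This is an illustration only, not part of the LSAC tooling: the function names age_indicator and wave_variable are hypothetical, and the sketch assumes the pattern holds only for Waves 1-7 as listed in this section.

```python
# Age ranges for each child age indicator, as listed in section 6.1.1.
AGE_RANGES = {
    "a": "0-1", "b": "2-3", "c": "4-5", "d": "6-7", "e": "8-9",
    "f": "10-11", "g": "12-13", "h": "14-15", "i": "16-17",
}


def age_indicator(cohort: str, wave: int) -> str:
    """Return the age indicator for cohort 'B' or 'K' at Waves 1-7 (hypothetical helper)."""
    if cohort not in ("B", "K") or not 1 <= wave <= 7:
        raise ValueError("expected cohort 'B' or 'K' and a wave from 1 to 7")
    # The B cohort starts at 'a' in Wave 1; the K cohort starts two letters later, at 'c'.
    offset = 0 if cohort == "B" else 2
    return "abcdefghi"[wave - 1 + offset]


def wave_variable(cohort: str, wave: int, stem: str) -> str:
    """Prefix a name stem (for example 'pa01a') with the age indicator for that cohort and wave."""
    return age_indicator(cohort, wave) + stem


print(wave_variable("B", 1, "pa01a"))  # apa01a
print(wave_variable("K", 6, "pa01a"))  # hpa01a
```

Items with the unchanging age indicator z (such as zhs03a) are deliberately outside this helper, since their names do not depend on cohort or wave.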

6.1.2 Topic indicator (alpha)

The topic indicator is taken from the topic field of the data dictionary. Efforts were made to make the abbreviations used meaningful (e.g. family demographics is fd).

A list of topics and their abbreviations is provided in Table 7.

Table 7: Topics used in LSAC datasets
Abbrev. | Topic | Scope
ce | Centrelink data | Statistical information about payments and services
fd | Family demographics | Demographic information relating to the family such as education, ethnicity and religion
fn | Finances | Financial information such as income and use of government benefits
ed | Education | Scales that measure the effect of study on parenting
gd | General development | Scales that contain items from multiple domains of child development
hb | Health behaviour and risk factors | Behaviours and other risk factors that potentially impinge upon the health of the study child or his/her family. These include behaviours such as parental smoking and drinking as well as risk factors such as a parent experiencing diabetes during pregnancy.
he | Home education environment | Information on factors likely to impinge on the child's learning while at home such as parental support for education, number of books in the home and TV use. Also contains information on parent interaction with teachers such as parent teacher interviews including from the teacher's perspective
ho | Housing | Information on housing such as number of bedrooms, tenure type and payments
hs | Health status | Information about the physical and mental health status of the study child or his/her family such as body mass index, diagnosis of conditions and number of hospital stays
id | Identifiers | Questionnaire process variables such as sequence guides, consents and details of proxy respondents
lc | Learning and cognition outcomes | Information on the child's development in the areas of learning and cognition including language, literacy and numeracy
pa | Parenting | Information on parenting styles and other information affecting parenting such as self-efficacy
pc | Program characteristics | Characteristics of the educational or child care program such as type of program, number of days or hours the child attends and staff satisfaction
pe | Parent living elsewhere | Details of the child's PLE such as the relationship to study child, interactions with resident parents and child support
pl | Parental leave in Australia | Data from the Parental Leave in Australia Survey - a nested study
pw | Paid work | Information on work status such as employment, occupation and work/family interactions
re | Relationships | Information on the quality of relationships primarily focused on the relationship between Parent 1 and Parent 2, but also on broader family harmony
sc | Social capital | Information on social capital such as attitudes to neighbours and the neighbourhood and use of services
se | Social and emotional outcomes | Information relevant to the social and emotional development of the child such as temperament, behaviour and emotional states
tp | Teaching practices | Practices employed by teachers and child care workers in their work such as time use, use of resources and general philosophies

For example:

apa01a (P1 rating of self-efficacy) has 'pa' as the second and third letters as its topic is 'Parenting'; and

zhs03a (Birth weight of study child) has 'hs' as the second and third letters as its topic is 'Health status'.
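As a rough illustration of the standard format (A tt xxxxx), the sketch below splits a questionnaire variable name into its three parts. The names StandardName and split_standard_name are hypothetical, the TOPICS dictionary holds only a few entries copied from Table 7, and the sketch assumes the name really is in the standard format (derived and household composition variables follow different conventions).

```python
from typing import NamedTuple

# A few topic abbreviations copied from Table 7 (not the full list).
TOPICS = {
    "pa": "Parenting",
    "hs": "Health status",
    "se": "Social and emotional outcomes",
}


class StandardName(NamedTuple):
    age: str       # child age indicator (first character)
    topic: str     # topic indicator (second and third characters)
    question: str  # specific question identifier (remaining characters)


def split_standard_name(name: str) -> StandardName:
    """Split a standard-format variable name; assumes the name follows section 6.1."""
    if not 3 <= len(name) <= 8:
        raise ValueError("standard names have between 3 and 8 characters")
    return StandardName(age=name[0], topic=name[1:3], question=name[3:])


parts = split_standard_name("zhs03a")
print(parts.age, TOPICS[parts.topic], parts.question)  # z Health status 03a
```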

6.1.3 Specific question identifier (alphanumeric)

The characters after the topic indicator (up to five) make up the specific question identifier (if required). These characters contain whatever information is necessary to uniquely identify each item. Each has an arbitrary two-digit question number, not related to the questionnaire positioning. Items of related content are grouped together as much as possible.

For example:

bhs12a is whether P1 is concerned about the child's weight

bhs12b is whether P1 considers the child to be 'underweight', 'normal weight', 'somewhat overweight' or 'very overweight'.

The sixth character of the variable name can also be an informant or subject indicator where a question is asked of or about more than one person. The indicators used are:

a  Parent 1

b  Parent 2

c  Study child

m  Mother

f  Father (or family home for census data)

t  Teacher/Carer

i  In-between waves respondent

p  Parent living elsewhere

y  Study Child Offspring (ya = 1st offspring, yb = 2nd offspring, yc = 3rd offspring)

x  Other biological parent of the Study Child Offspring (xa = other biological parent of 1st child, xb = other biological parent of 2nd child, xc = other biological parent of 3rd child)

For example:

bhs13a is Parent 1's rating of their own overall health status.

bhs13b is Parent 2's rating of their own overall health status.

bhs13c is Parent 1's rating of the study child's overall health status.

bhs13p is the PLE's rating of their own overall health status.

bhs13m is the mother's rating of their own overall health status.

bhs13f is the father's rating of their own overall health status.

An exception to the above rule is in the area of child care and education (variables with topic indicators pc and tp). Here the indicators a, b, c, d and e are used to mean different things at each wave depending on the options available to the child at that age (see Table 8).

All items that form a scale have a single question number. Where applicable, the name of the item also indicates the relevant subscale or sub-subscale (please note that this is done only where it is possible to do so, due to the eight-character limit for the name of an item).

An example of how this is applied is shown with the Conduct Problems and Peer Problems subscales of the Strengths and Difficulties Questionnaire (see Table 9). These are subscales that both P1 and the teacher filled out in Waves 1 and 2 for the K cohort.

As shown:

  • The 6th character in the variable name in this case represents an informant indicator: 'a' is for Parent 1, 't' is for teacher.
  • The 7th character indicates the subscale: 4 for Conduct, 5 for Peer. (Note: the subscales 1 for Prosocial, 2 for Hyperactivity and 3 for Emotional are also available as part of the SDQ.)
  • The final character uniquely identifies each item. (Note: different items were used for the Conduct subscale in Waves 1 and 2 due to the change in the child's age).

Table 8: Subject indicators for education and childcare variables
Indicator | Age 0-1 | Age 2-3 | Age 4-5 | Age 6-7 | Age 8-9 | Age 10-11 | Age 12-13 | Age 14-15 | Age 16-17
a | 1st child care | 1st child care | Main educational program | Main educational program | Main educational program | Main educational program | Main educational program | Main educational program | Main educational program
b | 2nd child care | 2nd child care | 1st child care | Before-school care | Before-school care | Before-school care | Before-school care | |
c | 3rd child care | 3rd child care | 2nd child care | After-school care | After-school care | After-school care | After-school care | |
d | | Other child care | 3rd child care | | Child care at other times | | Other child care | Other child care |
e | | | | Program child would attend if attending school | Program child would attend if attending school | Program child would attend if attending school | | |
o | | Any extra care | Any extra care | Any extra care | | Any extra care | | |

Table 9: Variable names of SDQ(a) conduct and peer problems subscales
Item | Wave 1, Parent 1 (K cohort name) | Wave 1, Teacher (K cohort name) | Wave 2, Parent 1 (K cohort name) | Wave 2, Teacher (K cohort name)
Conduct problems
Often loses temper | cse03a4a | cse03t4a | dse03a4a | dse03t4a
Generally, well behaved, usually does what adults request | cse03a4b | cse03t4b | dse03a4b | dse03t4b
Often fights with other children or bullies them | cse03a4c | cse03t4c | dse03a4c | dse03t4c
Often argumentative with adults | cse03a4d | cse03t4d | N/A | N/A
Can be spiteful to others | cse03a4e | cse03t4e | N/A | N/A
Often lies or cheats | N/A | N/A | dse03a4f | dse03t4f
Steals from home, school or elsewhere | N/A | N/A | dse03a4g | dse03t4g
Peer problems
Rather solitary, tends to play alone | cse03a5a | cse03t5a | dse03a5a | dse03t5a
Has at least one good friend | cse03a5b | cse03t5b | dse03a5b | dse03t5b
Generally liked by other children | cse03a5c | cse03t5c | dse03a5c | dse03t5c
Picked on or bullied by other children | cse03a5d | cse03t5d | dse03a5d | dse03t5d
Gets on better with adults than with other children | cse03a5e | cse03t5e | dse03a5e | dse03t5e

Note: (a) The SDQ is copyrighted by Robert Goodman, UK, 1999.
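To make the structure of the SDQ item names in Table 9 concrete, the following sketch decodes one name with a regular expression. It is an illustration under assumptions: the function name decode_sdq_item is hypothetical, only the informants 'a' and 't' shown in Table 9 are handled, and the pattern is assumed (not confirmed by this guide) to apply to age indicators other than c and d.

```python
import re

# Subscale numbers as described in section 6.1.3.
SUBSCALES = {"1": "Prosocial", "2": "Hyperactivity", "3": "Emotional",
             "4": "Conduct", "5": "Peer"}
# Only the two informants that appear in Table 9.
INFORMANTS = {"a": "Parent 1", "t": "Teacher"}

# age indicator, 'se03', informant (6th char), subscale (7th char), item letter (8th char)
SDQ_ITEM = re.compile(r"^([a-i])se03([at])([1-5])([a-z])$")


def decode_sdq_item(name: str):
    """Return the parts of an SDQ item name, or None if the name does not fit the pattern."""
    match = SDQ_ITEM.match(name)
    if match is None:
        return None
    age, informant, subscale, item = match.groups()
    return {
        "age": age,
        "informant": INFORMANTS[informant],
        "subscale": SUBSCALES[subscale],
        "item": item,
    }


print(decode_sdq_item("dse03t5b"))
```

For dse03t5b this yields age indicator d, the teacher as informant, the Peer subscale and item b, which matches the 'Has at least one good friend' row of Table 9.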

6.2 Derived variables

The derived items start with an age indicator, as outlined in section 6.1.1, followed by an informant or subject indicator and then a mnemonic that relates to the subject matter of the derived item. For example, the Peer subscale of the SDQ for the K cohort teacher in Wave 2 is dtpeer, where d = child aged 6-7 years, t = teacher and peer = Peer subscale of SDQ.
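A minimal sketch of this convention is shown below. The helper name split_derived_name is hypothetical, and the sketch assumes a single-character informant or subject indicator (the two-letter offspring indicators such as ya or xa from section 6.1.3 are not handled).

```python
def split_derived_name(name: str) -> tuple:
    """Split a derived variable name into (age indicator, informant/subject, mnemonic)."""
    if len(name) < 3:
        raise ValueError("a derived name needs an age indicator, an informant and a mnemonic")
    return name[0], name[1], name[2:]


age, informant, mnemonic = split_derived_name("dtpeer")
print(age, informant, mnemonic)  # d t peer
```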

6.3 Study child household composition variables

In order to keep the variable names within eight characters, it was necessary to have a slightly different convention in the Wave 2 data release. Household composition variables have the following structure:

A f ##xmmm

Where:

A = Child age indicator

f = f (for 'family')

## = Question number (numeric)

x = Sub-question indicator (optional)

mmm = person identifier

Note:

The age indicator above is as described in section 6.1.1.

'f' is a constant to indicate that it is the household composition that is being described.

The question number and sub-question indicator indicate the question being responded to.

The person identifier indicates the member number, or other identification information. For every household, the study child is member 1, the Wave 1 P1 is member 2, and the Wave 1 P2 is member 3 (or will be missing if there is no P2 at Wave 1). Any additional people in the household at the time of Wave 1 are given member numbers 4 through to whatever is required. Each household member retains the same member number throughout the study, even if they leave and re-enter the study child's home.

Due to the requirements of the CAI instrument, some families have 'gaps' in member numbering; for example, where someone is Member 5 but Member 4 has never been assigned.

Member 1 is denoted by 'm1' in the above convention, member 2 as 'm2' and so on as required.

As families change from Wave 2 on, the new P1, P2, mother or father could have any member number apart from 1. For this reason, an extra set of variables has been derived to give the details for the P1, P2, mother and father at any age. The person identifier for these variables is an age indicator followed by 'p1', 'p2', 'm' or 'f'.

A set of indicator variables tracks the household member number of P1, P2, mother and father at each wave. For example, bp2mn tells you the household member number of P2 when the child is aged 2-3, while cmmn gives the member number of the mother when the child is aged 4-5.
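One way these member number variables might be used is to build the name of the matching member-level variable. The sketch below is an assumption-laden illustration: member_variable is a hypothetical helper, the record values are made up, and question number 03 (age) is taken from Table 10.

```python
def member_variable(age: str, question: str, member: int) -> str:
    """Build a household composition name of the form A f ## m<member number>."""
    return f"{age}f{question}m{member}"


# Made-up wide record for one study child at age 2-3.
record = {"bp2mn": 3, "bf03m3": 34}

p2_member = record["bp2mn"]                       # household member number of P2
p2_age_name = member_variable("b", "03", p2_member)
print(p2_age_name, record[p2_age_name])           # bf03m3 34
```

In real data bp2mn will be missing where there is no P2, so a lookup like this would need to handle that case first.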

Some examples:

zf02m1 is the gender of the study child (z = unchanging characteristic, f = 'Family', 02 = gender, m1 = study child)

bf01m2 is whether the Wave 1 P1 is present in the household when the child is aged 2-3 (b = child aged 2-3, f = 'family', 01 = present for wave, m2 = Wave 1 P1)

cf01m3 is whether the Wave 1 P2 is present when the child was aged 4-5 (or whether there was a P2 at all in Wave 1 for the K cohort) (c = child aged 4-5, f = 'family', 01 = present for wave, m3 = Wave 1 P2)

af08am is the relationship of the mother to the study child when the child was aged 0-1 (a = child aged 0-1, f = 'family', 08 = relationship to study child, am = mother of child at age 0-1)

df01cp1 is whether the P1 of the child when aged 4-5 is present in the household when the child is aged 6-7. (d = child aged 6-7, f = 'family', 01 = present for wave, cp1 = child's P1 when child is aged 4-5)

cf13dp2 is whether the P2 of the child when aged 6-7 had a medical condition or disability at the time the child was 4-5 (c = child aged 4-5, f = 'family', 13 = whether person has a disability, dp2 = P2 when child is aged 6-7).
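The examples above can be parsed with a single regular expression, sketched below. This is not an official parser: the name parse_household_name is hypothetical, and the pattern assumes that a person identifier is either 'm' plus a member number or an age indicator plus 'p1', 'p2', 'm' or 'f', as described in this section.

```python
import re

HOUSEHOLD_NAME = re.compile(
    r"^(?P<age>[a-iz])"                       # child age indicator (or z)
    r"f"                                      # constant 'f' for 'family'
    r"(?P<question>\d{2})"                    # two-digit question number
    r"(?P<sub>[a-z]?)"                        # optional sub-question indicator
    r"(?P<person>m\d+|[a-i](?:p1|p2|m|f))$"   # member number, or parent at a given age
)


def parse_household_name(name: str):
    """Return the parts of a household composition name, or None if it does not match."""
    match = HOUSEHOLD_NAME.match(name)
    return match.groupdict() if match else None


for example in ("zf02m1", "af08am", "df01cp1", "cf13dp2"):
    print(example, parse_household_name(example))
```

Because a parent-type identifier always starts with an age indicator, the optional sub-question letter cannot be confused with it: for af08am the expression backtracks and reads 'am' as the person identifier rather than treating 'a' as a sub-question.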

Table 10 shows the information that is available for each person.

Table 10: Question numbers used in variable names for household member characteristics
Question number Question
01 Present for wave
02 Gender
03 Age
04 Date of birth
05 Temporarily away from home (as per Wave 1 question)
06 Relationship to Parent 1
07 Relationship to Parent 2
08 Relationship to study child
08z Relationship to study child partner
09 Country of birth
10 Year of first arrival in Australia
11 Language other than English spoken at home
12 ATSI status
13 Has a condition or disability for six months or more (as per Wave 1 question)
13a 1st specific condition
13b 2nd specific condition
14 Date stopped living with study child
15 Reason stopped living with study child
16 Temporarily away from home (as per Wave 2 question)
16o Temporarily away from home (other) (as per Wave 2 question)
17 Has a condition or disability for six months or more (as per Wave 2 question)
17a Has sight problems (as per Wave 2 question)
17b Has hearing problems (as per Wave 2 question)
17c Has speech problems (as per Wave 2 question)
17d Has blackouts, etc. (as per Wave 2 question)
17e Has difficulty learning (as per Wave 2 question)
17f Limited use of arms or fingers (as per Wave 2 question)
17g Difficulty gripping (as per Wave 2 question)
17h Limited use of legs and feet (as per Wave 2 question)
17i Other physical condition (as per Wave 2 question)
17j Other disfigurement (as per Wave 2 question)
17k None of the above conditions (as per Wave 2 question)
18 Restricted in everyday activities
18a Has difficulty breathing (as per Wave 2 question)
18b Has chronic pain (as per Wave 2 question)
18c Has nervous condition requiring treatment (as per Wave 2 question)
18d Has mental illness requiring supervision (as per Wave 2 question)
18e Has head injury (as per Wave 2 question)
18f Has other long-term condition (as per Wave 2 question)
18g Has other condition requiring treatment (as per Wave 2 question)
18h None of the above restrictions (as per Wave 2 question)
19 Date began living with the study child
20 Household member was in the household for at least three months but moved in and left between current and previous waves
21 Person type
22 Young carer activities
23 Migration status

6.4 PLE household composition variables

From Wave 4, the household information for the child's parent living elsewhere (PLE) has been collected. PLE household composition variables have a similar structure to that of the study child household composition variables:

A f ##xple#

Where:

A = Child age indicator

f = f (for 'family')

## = Question number (numeric)

x = Sub-question indicator (optional)

ple# = person identifier within PLE household with ple (for Parent Living Elsewhere) and # member number

Note:

The age indicator is as described in section 6.1.1.

'f' is a constant to indicate that it is the household composition that is being described.

The question number and sub-question indicator indicate the question being responded to.

The person identifier comprises the constant 'ple', to indicate that it is the PLE household, and the member number. For every PLE household, the study child is member 1 (ple1) and the PLE is member 2 (ple2). For example, the variable ff02ple2 refers to the PLE's gender when the study child is 10-11 years old. Any additional member in the household is assigned a PLE member number that remains the same throughout the study, even if they leave and re-enter the PLE's home.

Table 11 shows the information that is available for each PLE.

Table 11: Question numbers used in variable names for PLE household member characteristics
Question number Question
01 Present for wave
02 Gender
03 Age
04 Date of birth
05 Temporarily away from home (as per Wave 1 question)
06a Relationship to PLE
08 Relationship to study child
09 Country of birth
10 Year of first arrival in Australia
11 Main language spoken at home
12 ATSI status

A PLE household file also includes the following variables (the asterisk refers to the child age indicator):

*datplec - date of PLE CATI interview

*plepar - whether PLE has a partner

*pleparmn - PLE partner member number in PLE household

*dfd02p3 - date of recent PLE marriage

*dfd02p4 - date of PLE cohabitation.

6.5 Age invariant indicator variables

There are five variables at the start of each of the main data files that contain no age indicator. These are:

hicid - unique identifier assigned when child was selected by Medicare Australia

cohort

wave

stratum - stratum at the time of selection

pcodes - postcode at the time of selection

Users wishing to create long datasets should note the presence of these variables when removing age indicators.
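The pitfall is that several of these names begin with a letter that is also an age indicator (hicid, cohort), so blindly dropping the first character would corrupt them. A minimal sketch of a safer renaming step follows; the helper name strip_age_indicator and the record values are made up, and leaving z-prefixed names untouched is a design choice of this sketch rather than an LSAC rule.

```python
# The five variables that carry no age indicator (section 6.5).
INVARIANT = frozenset({"hicid", "cohort", "wave", "stratum", "pcodes"})


def strip_age_indicator(name: str) -> str:
    """Drop the leading age indicator, leaving invariant and z-prefixed names unchanged."""
    if name in INVARIANT or name.startswith("z"):
        return name
    return name[1:]


# Made-up wide record for a K cohort child at Wave 1 (age indicator c).
wide = {"hicid": 52112345, "cohort": "K", "wave": 1, "cpa01a": 4, "zhs03a": 3400}
long_row = {strip_age_indicator(key): value for key, value in wide.items()}
print(long_row)
```

After renaming, rows from different waves share column names (pa01a here) and can be stacked, with wave or an age column distinguishing them.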

6.5.1 Study child unique identifier

Each study child has a single, unique identification variable to ensure matching and merging across instruments, files and waves. This number was allocated at the time of selection by Medicare Australia.

The first digit indicates which cohort the child is in (1-4 = Infant; 5-8 = Child) and what fieldwork phase (see 'Methodology' section for more detail) the child was selected to be part of in Wave 1 (phase 1 = 1 and 5, phase 2 = 2 and 6, etc.).

The second is the state the child was selected from (1 = NSW, 2 = Vic., etc.).

The third indicates the part of the state the child was selected from (1-2 = capital city; 3-4 = rest of state).

The remaining five digits are a random number allocated by Medicare Australia.

Note that the stratum for selection may differ from the location of the child at interview and that the fieldwork phase may change from wave to wave.
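The digit layout described above can be decoded as in the sketch below. The function decode_hicid is hypothetical and the identifier used is made up; the sketch assumes an eight-digit identifier (three coded digits plus five random digits) and returns the state as a bare code because only codes 1 and 2 are spelled out in this guide.

```python
def decode_hicid(hicid) -> dict:
    """Decode the first three digits of a study child identifier (assumed to have 8 digits)."""
    digits = str(hicid)
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError("expected an eight-digit identifier")
    first, state, area = int(digits[0]), int(digits[1]), int(digits[2])
    if not 1 <= first <= 8:
        raise ValueError("first digit must be between 1 and 8")
    if not 1 <= area <= 4:
        raise ValueError("third digit must be between 1 and 4")
    return {
        "cohort": "Infant" if first <= 4 else "Child",   # 1-4 = Infant, 5-8 = Child
        "phase": (first - 1) % 4 + 1,                     # phase 1 = 1 and 5, phase 2 = 2 and 6, ...
        "state_code": state,                              # 1 = NSW, 2 = Vic., etc.
        "area": "capital city" if area <= 2 else "rest of state",
    }


print(decode_hicid(62312345))  # made-up identifier
```

As noted above, the decoded values describe the child at selection; the location at interview and the fieldwork phase in later waves may differ.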

6.6 Indicator variables

There are indicator variables in the main data files that indicate which parts of an interview were incomplete. These variables were created to flag to data users (through yes/no values) that no data, or only partial data, exists for an instrument (e.g. the CASI) or an informant (e.g. Parent 1). The data may be incomplete due to a number of different reasons. There may be no data if a self-complete form was not returned; the parent/child did not provide consent to obtain/provide the data; one of the informants refused to participate; or when the interview was only partially completed.

For example, on the day of the interview the parent may consent to the child participating but refuse to participate themselves. In this example, there would be data for the sections where the study child is the informant; however, there would be no data for the sections where P1 is the informant. To identify these cases, a data user can use the indicator variable *nopar (* refers to the age indicator). Another example is a teacher's responses: to identify cases where a teacher form was not returned, a data user can examine the variable *tcd. A data user can also examine *partresp to identify cases that were incomplete because the interview stopped partway through (as opposed to certain sections being refused), or *hhresp to identify cases where the household interview was completed.

There are a large number of indicator variables. Data users are encouraged to investigate the reasons for data being incomplete through these variables. Note that the indicator variables do not follow the general variable naming conventions described above. Some indicator variables are listed in Table 1. Indicator variables can be found in the data dictionary under the topic 'Identifiers', along with other variables that fall under that topic. For more information refer to the data dictionary.

6.7 Variable labelling convention

The labels used for the variable dataset take the following general form:

(Age) - (Informant/subject) - (Questionnaire position) - (Construct label)

Age is a label for the age indicator from the variable name, so:

a = 0/1

b = 2/3

c = 4/5

d = 6/7

e = 8/9

f = 10/11

g = 12/13

h = 14/15

i = 16/17

If no age indicator is present in the variable name, or the age indicator is z, then this part of the variable label will not be included.

For example:

label zf04m1 = 'SC - DOB', here no age is associated with the variable because it doesn't change with time, hence no age indicator is included.

label df03m1 = '6/7 - SC - Age', this variable changes over time, so the age indicator is required in order to establish when the question was answered.

Informant/subject gives the informant or subject of the question as contained in the variable name. For household composition variables involving P1, P2, mother or father, the age of the study child at which the person's status as parent is determined will also be indicated (e.g. M@0/1 is the mother when the child is aged 0-1 years old). If the information only exists for one subject or informant in the study this part of the variable label will not be included.

Questionnaire position indicates the location of the question the data was obtained from within the LSAC questionnaires (e.g. F2F H2 is question H2 of the face-to-face interview). This part of the variable label is left blank for derived items such as scales and other non-input items but included for mother/father variables where the location of both the P1 and the P2 variables are given.

Construct label provides a description of what information is actually contained in the variable (e.g. 'Sex', 'Birthweight', etc.). This part of the variable label will be consistent for each variable representing the same construct for a different subject/informant or wave.

For example:

Parent 1's rating of their own health quality at Wave 1 for the B cohort (ahs13a) has the variable label '0/1 - P1 - P1L D1 - Global Health Measure' (0/1 is the age indicator, P1 is the informant/subject indicator, P1L D1 indicates that the variable comes from the first question of section D of the P1 leave-behind questionnaire, and 'Global Health Measure' is the construct label).

The total score for the P2 parental warmth scale for the K cohort at Wave 2 (dbwarm) is '6/7 - P2 - warm parenting' (6/7 is the age indicator, P2 is the informant indicator, there is no questionnaire position as the variable is calculated from multiple questions, and 'warm parenting' is the construct label).
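The general label form can be reproduced with a small helper, sketched here for illustration. The function build_label is hypothetical; it takes the age indicator explicitly (pass an empty string or 'z' when there is none) and simply omits any part that is blank, which is how the examples in this section behave.

```python
# Age labels for each age indicator, as listed in section 6.7.
AGE_LABELS = {
    "a": "0/1", "b": "2/3", "c": "4/5", "d": "6/7", "e": "8/9",
    "f": "10/11", "g": "12/13", "h": "14/15", "i": "16/17",
}


def build_label(age_indicator: str, informant: str, position: str, construct: str) -> str:
    """Join the non-empty label parts with ' - '; 'z' or '' means no age part."""
    age = AGE_LABELS.get(age_indicator, "")
    parts = [age, informant, position, construct]
    return " - ".join(part for part in parts if part)


print(build_label("z", "SC", "", "DOB"))                              # SC - DOB
print(build_label("d", "SC", "", "Age"))                              # 6/7 - SC - Age
print(build_label("a", "P1", "P1L D1", "Global Health Measure"))      # 0/1 - P1 - P1L D1 - Global Health Measure
```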

6.8 Missing value conventions

Missing data are coded as follows:
-1 Not applicable (when explicitly available as an option in the questionnaire)
-2 Don't know
-3 Refused or not answered
-4 Section refused
-9 Not asked due to one of the following reasons:
(a) A question was skipped due to the answer to a preceding question (e.g. if a child never repeated a grade, the following question regarding what grade the child repeated was not asked/skipped).
(b) A form was not returned or consent to participate was not given (e.g. if a teacher form was not returned then the teacher's responses for this hicid are set to -9. To identify cases for which a form was not returned/or consent was not provided a data user can use an indicator variable (see Table 1 for details)).
(c) One of the informants refused to participate (e.g. if a parent refused to participate but not a child then the parent's responses are set to -9. To identify cases when the parent refused to participate, a data user can use the *nopar indicator variable).
(d) A form was partially completed (e.g. P1 completed the interview over the phone (P1 CATI) but the face-to-face component did not occur. To identify these cases, a data user can use the *partresp indicator variable). (See section 6.6 for more detail.)
-99 Missing data - data not collected where it might be expected (e.g. the respondent skipped a question they should have answered in a self-complete form), or made missing due to an unreliable value (e.g. weight of P1 recorded as 800 kg).
Other specific examples of (-99)
(a) Negative income (loss)
(b) Before baby's birth
(c) No set amount

For further details about how missing LSAC income data is treated, see Technical Paper No. 14, Imputing income in the Longitudinal Study of Australian Children.
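Before analysis, the missing value codes listed above usually need to be set aside so they are not treated as real values. The sketch below shows one possible approach in plain Python; the names MISSING_CODES, clean_value and missing_reason are hypothetical, and the sketch assumes the variable being cleaned has no legitimate values equal to any of these codes.

```python
# Missing value codes from section 6.8, with shortened descriptions.
MISSING_CODES = {
    -1: "Not applicable",
    -2: "Don't know",
    -3: "Refused or not answered",
    -4: "Section refused",
    -9: "Not asked",
    -99: "Missing data",
}


def clean_value(value):
    """Return None for a missing value code, otherwise the value unchanged."""
    return None if value in MISSING_CODES else value


def missing_reason(value):
    """Return the description of a missing value code, or None for a real value."""
    return MISSING_CODES.get(value)


responses = [2, -2, 5, -99]  # made-up responses to one item
print([clean_value(v) for v in responses])   # [2, None, 5, None]
print(missing_reason(-9))                    # Not asked
```

Keeping the reason alongside the cleaned value can be useful, since -9 (not asked) and -3 (refused) often call for different handling in analysis.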

6.9 Renamed variables

Approximately 1,900 variables across Waves 1-6 in the B and K cohorts were renamed during Wave 7 data processing. Some variables from earlier waves were renamed to include an informant indicator that differentiates data received from parents from data received from the study child. The vast majority of the renaming affects Parent 1 variables from earlier waves. These changes make it possible to align future study child content with the earlier parent content while still distinguishing the informant when the study child is asked the same questions that parents were asked in earlier waves. The Data Issue Paper: Waves 1-7 describes the variable renaming in detail.