Data user guide
8. Data transformations
- 8.1 Transformations to ensure consistency
- 8.2 Transformations to update information
- 8.3 Summary measures for scales
- 8.4 Outcome Index measures
Responses to many of the questions have been transformed to assist data users.
8.1 Transformations to ensure consistency
LSAC contains a number of items that have been asked slightly differently across waves. Where it is logically supportable, these items are recoded to match the variables produced from other waves. The recoded versions are provided in addition to the original item responses. Some examples:
- Income is generally collected as a continuous variable; however, for the PLE in wave 2 it was collected using five categories. To assist users in comparing the responses of different informants, an additional variable containing the continuous income information recoded into these five categories is added wherever income has been collected continuously.
- In wave 1, respondents were asked if the child received any regular child care from a grandparent. In wave 2, respondents were given the option of this being a maternal or paternal grandparent. In addition to the two variables giving this information separately for maternal and paternal grandparents, an extra variable has been added for whether the child is being cared for by a grandparent.
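The two recodes described above can be illustrated with a minimal sketch. The category boundaries in `INCOME_BOUNDS` and both function names are hypothetical, chosen only for illustration; the actual five income categories and variable names are given in the LSAC data dictionary, not here.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical lower bounds of categories 2 to 5 (NOT the real LSAC cut-points).
INCOME_BOUNDS = [500, 1000, 1500, 2000]


def income_category(income: Optional[float]) -> Optional[int]:
    """Recode a continuous income into one of five ordered categories (1-5)."""
    if income is None:
        return None  # missing stays missing
    # bisect_right counts how many bounds are <= income.
    return bisect_right(INCOME_BOUNDS, income) + 1


def any_grandparent_care(
    maternal: Optional[bool], paternal: Optional[bool]
) -> Optional[bool]:
    """Collapse the wave 2 maternal/paternal items into the wave 1 style item."""
    if maternal or paternal:
        return True  # care from either grandparent is enough
    if maternal is None or paternal is None:
        return None  # cannot rule out care when one answer is missing
    return False
```

In this sketch the original values are left untouched and the recoded value is stored alongside them, mirroring the practice of supplying recoded variables in addition to the original responses.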
8.2 Transformations to update information
From wave 2 onwards, a number of questions ask respondents what has happened since the last interview (or in the last two years if the study child is living in a new household). For example, in wave 1, P1 was asked how many homes the study child had lived in since birth, while in subsequent waves P1 was asked how many homes the study child had lived in since the last interview. The datasets for the subsequent waves contain both the number of homes since the last interview and a tally of all the homes the study child has ever lived in.
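As a rough illustration of how such a running tally could be built, the sketch below accumulates per-wave counts. It assumes that each later-wave count records only additional homes since the previous interview, and that a missing count makes the tally unknown from that wave on; both are assumptions for this example rather than documented LSAC rules, and `homes_ever` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def homes_ever(homes_by_wave: Sequence[Optional[int]]) -> List[Optional[int]]:
    """Return the cumulative number of homes at each wave.

    homes_by_wave[0] is the wave 1 count (homes since birth); later entries
    are counts since the last interview. None marks a missing response.
    """
    tally: List[Optional[int]] = []
    running: Optional[int] = 0
    for count in homes_by_wave:
        if running is None or count is None:
            running = None  # the total is unknown once any wave is missing
        else:
            running += count
        tally.append(running)
    return tally
```

For example, `homes_ever([2, 1, 0])` gives `[2, 3, 3]`, while `homes_ever([2, None, 1])` gives `[2, None, None]`.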
8.3 Summary measures for scales
The appropriate summary measure for each scale is included, based on advice from the Consortium Advisory Group. Where either a mean or a sum score could logically be implemented for a psychological scale or subscale, the Consortium Advisory Group preferred the calculation of means, except in cases where convention dictates another scoring system. Using means allows a scale score to be derived when a construct is measured by multiple items and some of those items are missing, whereas a sum score would have led to the exclusion of cases with any missing data. All items contributing to these scales are included in the datasets.
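The difference between the two scoring approaches can be seen in a small sketch. The `min_valid` parameter is an assumption added for illustration; the guide does not state how many items must be present before a mean is calculated.

```python
from typing import Optional, Sequence


def mean_score(
    items: Sequence[Optional[float]], min_valid: int = 1
) -> Optional[float]:
    """Mean of the non-missing items, or None if too few items were answered."""
    valid = [x for x in items if x is not None]
    if len(valid) < min_valid:  # min_valid is a hypothetical threshold
        return None
    return sum(valid) / len(valid)


def sum_score(items: Sequence[Optional[float]]) -> Optional[float]:
    """Sum of the items; any missing item makes the whole score missing."""
    if any(x is None for x in items):
        return None
    return sum(items)
```

With responses `[1, 2, None, 3]`, `mean_score` returns `2.0` while `sum_score` returns `None`, so the case is retained under mean scoring and lost under sum scoring.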
For scales where there are different sets of items for children at different ages or for different informants, multiple versions of the same scale are calculated based on just those items shared between two versions of the scale. For example, the parenting hostility scale began as a five-item measure for 0-1 year olds but had one item dropped for children aged 4-7 years, and a further item dropped for children aged 8-9 years. On the file for 0-1 year olds, three different versions of the scale are calculated: one using all five items, another using just the four items included for children aged 4-7 years, and another using just those three items used for children aged 8-9 years. As a general rule, data users should select the variable containing the greatest number of contributing items that is appropriate for their purpose. So, for analyses using just the hostility scale at age 0-1 years, or for those comparing the hostility scale at ages 0-1 and 2-3 years, analysts should use the five-item version. Analysts comparing hostility between the ages of 0 and 7 years should use the four-item version, and analysts comparing hostility between the ages of 0 and 9 years should use the three-item version.
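The following sketch shows one way to derive the three versions of a scale from shared item subsets and to apply the selection rule described above. The item names `h1` to `h5`, the choice of which items are dropped, and the function names are all hypothetical; the real item lists are in the LSAC data dictionary.

```python
from typing import Dict, List, Optional

# Hypothetical item sets: which items were actually dropped is not stated here.
ITEM_SETS: Dict[str, List[str]] = {
    "five_item": ["h1", "h2", "h3", "h4", "h5"],  # ages 0-1 and 2-3
    "four_item": ["h1", "h2", "h3", "h4"],        # shared up to ages 4-7
    "three_item": ["h1", "h2", "h3"],             # shared up to ages 8-9
}


def scale_versions(responses: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    """Mean score for each version, using only that version's items."""
    scores: Dict[str, Optional[float]] = {}
    for name, items in ITEM_SETS.items():
        valid = [responses[i] for i in items if responses.get(i) is not None]
        scores[name] = sum(valid) / len(valid) if valid else None
    return scores


def version_for_oldest_age(oldest_age_years: int) -> str:
    """Pick the version with the most items that covers every age compared."""
    if oldest_age_years <= 3:
        return "five_item"
    if oldest_age_years <= 7:
        return "four_item"
    if oldest_age_years <= 9:
        return "three_item"
    raise ValueError("ages above 9 are not covered by this example")
```

An analyst comparing hostility at ages 0-1 and 6-7 would call `version_for_oldest_age(7)` and use the four-item score at both ages.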
8.4 Outcome Index measures
A unique component of the derivation and analysis work was the development and derivation of the LSAC Outcome Index, a composite measure that indicates how children are developing. LSAC tracks the development of children across multiple domains, and the Outcome Index provides a means of summarising this complex information for policy makers, the media and the general public, as well as data users.
In contrast to some other indices, which focus on problems or negative outcomes, the LSAC Outcome Index, wherever possible, incorporates both positive and negative outcomes, reflecting the fact that most children have good developmental outcomes. Thus, the Outcome Index has the ability to distinguish groups of children developing poorly from those developing satisfactorily.
The rationale and methodology used to develop the Outcome Index are described in LSAC Technical Paper No. 2, Summarising children's wellbeing: the LSAC Outcome Index. Papers on the derivation of the wave 2 and wave 3 Outcome Index are forthcoming. Any users planning to use the Outcome Index are strongly advised to read the technical papers, as they contain important information about the correct use of the variable. From wave 4 onwards, the Outcome Index is not calculated.
When undertaking longitudinal analysis involving the Outcome Index, analysts should be cautious about using outcome indices from different waves in a pooled data file, as different measures may have been used at different waves to create the sub-domains.