Data Issues

Waves 1 to 7
Data issues - Waves 1 to 7 – February 2019

2 Cleaning of time use diary data

2.1 Background

The LSAC time use diary (TUD) is a diary consisting of 96 15-minute time intervals or bubbles with pre-coded activity (e.g. sleeping, eating, bathing) and context (e.g. where they were and who they were with) information. Parents are asked to mark which of the pre-coded activities were done during each of the 96 time intervals. The diary begins at 4 am. Time interval 1 is from 4 am to 4:14 am, time interval 2 is from 4:15 am to 4:29 am, etc. For the B cohort at Wave 1 there were 22 pre-coded activities, five context locations and seven 'who else was present' context options. Additionally, the diarists were asked whether they had paid for the activity that the child was doing. For the Wave 1 B cohort the total matrix size was 3,360, consisting of the 35 activities and context descriptors by 96 time intervals. The Wave 1 K cohort had 26 pre-coded activities. Otherwise, the diary was the same as for the B cohort. The matrix size for the K cohort was 3,744.
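As a minimal sketch of the indexing and matrix arithmetic described above, the following Python maps a 1-based time interval to its clock window and reproduces the two matrix sizes. The function and constant names are illustrative assumptions and are not LSAC variable names.

```python
from datetime import datetime, timedelta

INTERVALS_PER_DAY = 96                      # 24 hours x four 15-minute bubbles
DIARY_START = datetime(2000, 1, 1, 4, 0)    # arbitrary date; only the 4 am start matters


def interval_window(interval: int) -> tuple[str, str]:
    """Return (first minute, last minute) of a 1-based time interval as HH:MM."""
    if not 1 <= interval <= INTERVALS_PER_DAY:
        raise ValueError("interval must be between 1 and 96")
    start = DIARY_START + timedelta(minutes=15 * (interval - 1))
    end = start + timedelta(minutes=14)
    return start.strftime("%H:%M"), end.strftime("%H:%M")


def matrix_size(activities: int, where: int = 5, who: int = 7, paid: int = 1) -> int:
    """Cells per diary: (activities + context descriptors) x 96 intervals."""
    return (activities + where + who + paid) * INTERVALS_PER_DAY


print(interval_window(1))   # ('04:00', '04:14')
print(interval_window(2))   # ('04:15', '04:29')
print(matrix_size(22))      # 3360 (B cohort, Wave 1)
print(matrix_size(26))      # 3744 (K cohort, Wave 1)
```

The default arguments encode the five location options, seven 'who else was present' options and the single 'paid' item, which together with 22 activities give the 35 descriptors quoted for the B cohort.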

The data entry used scanning technology. For Wave 1, few checks were made at the time of data entry and subsequently it has been found that the scanner was sensitive to rub outs and other marks that appeared in the bubbles on the paper files. This resulted in false data (false positives) that exists in the electronic data files but does not exist on the paper files.

For Wave 2 various procedures were implemented to ensure that these problems did not recur. These procedures involved changes to the data capture and data validation stages.

To reduce problems associated with capturing the data, changes were made to the scanner settings. Through the Intelligent Forms Processing (IFP) system it is possible to define the minimum character/mark size that will be registered by the system. Testing of TUD capture confirmed that oversensitivity of scanning equipment can produce a high rate of false positive responses on the TUD. Following iterative testing of LSAC dress rehearsal (DR) TUD forms, it was determined that the character size for the TUD 'bubbles' should be increased from 2 x 2 pixels to 2 x 5 pixels. Testing showed that this setting allowed the IFP system to disregard very small specks of dust, etc. (thereby greatly reducing false positives) without any impact on the false negative rate.

A second setting that impacts on the registration of marks is the size of field that is 'scanned' for a character/mark. Often this field size is expanded slightly beyond the expected capture area to allow marks falling slightly outside the response box to be registered. However, in the case of the TUD, the extremely close arrangement of the response bubbles meant that such an expansion led to false positives from slight (and unintended) continuation of marks beyond one response bubble but not quite into the subsequent one. For this reason, horizontal margins for the capture area have been strictly limited to the intended response area (i.e. the border of the response bubble). However, vertical (top/bottom) margins of 6 pixels outside the response bubble have been retained to ensure that marks made slightly above/below a response bubble are captured.

Following capture, forms are forwarded for inspection and repair by a trained operator. The process outlined below is performed on all TUD forms, with the majority expected to contain at least one response mark that will need to be investigated.

The first repair process conducted on scanned forms is the on-screen inspection of mosaics of scanned response marks known as carpets. Carpets display images of all marks from the same form that have been recognised by the system. Depending on the system's confidence in the validity of a particular response, the mark will be displayed in a green, yellow or red shaded box. At this stage the operator is able to confirm or correct a response or, alternatively, select responses for further investigation through the form process (see below).

In the example below (Figure 1) the carpet displays images of eight response marks, all of which were confidently identified by the system as valid responses. In two cases the respondent has clearly attempted to correct a response by crossing out the original mark. While the Optical Mark Reader (OMR) scanner is unable to distinguish these responses from the valid responses, the analysis of carpets gives the operator an opportunity to correct the data, thereby avoiding potential false positives. Operators conducting repair of LSAC forms are trained to examine carpets for these types of responses.

Figure 1: Example of 'carpets' from time use diary scanning


Forms containing at least one mark queried by the system or the operator progress to the forms stage of repair. At this stage the operator is able to see the queried response in the context of the form and other responses.

In the test example below (Figure 2) the operator has queried the two marks highlighted in red during the earlier examination of carpets for this form. The ability to examine these marks in context enables the operator to make a more informed decision regarding their validity.

Figure 2: Example of in-context diary data display


The processes outlined above have been developed as a result of the experience of the Australian Bureau of Statistics (ABS) in the capture of LSAC TUD forms during the Wave 2 dress rehearsal and more recent testing of final forms. It should be noted, however, that these processes do not address data quality issues associated with respondent error nor will they fully overcome capture difficulties associated with formatting features of the TUD such as the extremely close arrangement of response bubbles and the sheer volume of information recorded on each page.

The rest of this section reports on the extent of the Wave 1 false positives and provides a description of attempts to electronically remove the false positives, as well as other measures to improve data quality. A number of strategies were used to recode the data, working off the premise that most implausible data combinations are likely to be false positives. For example, if it is late in the evening or early in the morning, a child is likely to be either in bed or asleep, not simultaneously sleeping and walking (not for extended periods of time at least). The rules for the electronic recodes are outlined in more detail below. These corrections were applied only to the early wave TUD data.1

In recoding the data it is important that the amount of real data being incorrectly removed (false negatives) is minimised. It is expected that this incorrect coding was most likely to occur when there were transitions between activities. In order to protect against this, sequences of events were often considered. That is, comparisons were often made with the preceding and following time interval. However, it must be realised that from time to time diarists unintentionally provide information on implausible events.

In addition to the recoding of false positives, other data cleaning strategies or imputations are employed. These recodes are potentially important, as it is common in time use analysis to exclude data with more than 90 minutes of missing data. Thus, if the number of missing bubbles can be minimised, less data is lost. These imputations are performed on the data from all three waves.

2.2 Assessing the extent of the false positives

In order to estimate the extent of the false positives and to ascertain whether corrections could be made electronically, a random sample was drawn from both the B cohort (n = 51) and the K cohort (n = 49). One diary was excluded from the B cohort as the diary was returned blank, and two diaries were excluded from the K cohort, one because only one activity was given for the whole diary and the other because it did not match the electronic file at all. These forms were manually checked against the electronic records so that false positives were identified and given a unique code on the electronic record. These files are known throughout this report as the 'corrected files'. The files of these cases before they are checked are known as the 'original files'.

K cohort, Wave 1

A summary of false positives by data type for the K cohort is presented in Table 1. Over the random sample of 47 children and the 96 time intervals, there were 16,242 positive responses in the original file. This file had 876 more bubbles, or units of data, than the 'corrected file'. Thus, the false positive rate is 6% (876/(876 + 15,366)*100). It should be noted that one diary had a false positive rate of 30%, while the next highest figure was 14%. If these cases were to be excluded, the false positive rate would drop to 5%.

For the K cohort, the highest aggregate false positive rate was for the 'with whom' context data (7%), while for the activity data and 'where' context data, the false positive rate was around 5%. The light diary also asked whether someone was paid for the activity. There was only one false positive associated with this data.

The general trend was that the more real data there was the greater the number of false positives. This is not surprising given that much of the false positive data was due to rub-outs.

Table 1: Summary of false positives based on comparing original and corrected file of 47 cases (K cohort, Wave 1).
Source True positives False positives False positive rate (%)
Total 15,366 876 5.7
Activity 5,241 274 5.2
Where 3,738 195 5.2
Who 6,231 406 6.5
Paid 156 1 0.6

B cohort, Wave 1

A summary of false positives by data type for the B cohort is given in Table 2. Over the random sample of 50 children and the 96 time intervals, there were 17,703 units or bubbles of data in the original file. This file had 723 false positives, that is, 723 more bubbles or units of data than the 'corrected file'. Thus, the total false positive rate was 4% (723/17,703*100).

Table 2: Summary of false positives based on comparing original and corrected file of 50 cases (B cohort, Wave 1)
Source True positives False positives False positive rate (%)
Total 16,980 723 4.1
Activity 5,898 311 5.0
Where 4,235 107 2.5
Who 6,732 302 4.3
Paid 115 3 2.5
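The rate arithmetic behind the two summary tables can be sketched as below. The function names are assumptions made for illustration. Run on the Table 2 counts, the first function reproduces the printed rates to one decimal place. For the Table 1 total row, the same formula gives about 5.4, whereas false positives divided by true positives alone gives the printed 5.7. The source does not state which denominator was intended for Table 1, so the second function is offered only as a possible reading.

```python
def fp_share_of_positives(tp: int, fp: int) -> float:
    """False positives as a percentage of all positive responses (TP + FP)."""
    return 100 * fp / (tp + fp)


def fp_relative_to_true(tp: int, fp: int) -> float:
    """False positives as a percentage of true positives only."""
    return 100 * fp / tp


# (true positives, false positives) copied from Table 2 (B cohort, Wave 1)
TABLE_2 = {
    "Total": (16980, 723),
    "Activity": (5898, 311),
    "Where": (4235, 107),
    "Who": (6732, 302),
    "Paid": (115, 3),
}

for source, (tp, fp) in TABLE_2.items():
    print(f"{source:<8} {fp_share_of_positives(tp, fp):.1f}")

# Total row of Table 1 (K cohort, Wave 1) under both definitions
print(f"{fp_share_of_positives(15366, 876):.1f}")   # 5.4
print(f"{fp_relative_to_true(15366, 876):.1f}")     # 5.7
```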

2.3 Recoding to reduce false positives

K cohort, Wave 1

A number of recodes were trialled to reduce the rate of false positives. Table 3 summarises the impact of the electronic recodes on the original file, both in reducing the number of false positives and in introducing false negatives into the data. These recodes reduced the false positive rate for the test file to 5%. The recodes are described in greater detail below.

Table 3: Summary of electronic corrections made to original file of 47 cases (4-5 year olds)
  Imputed true negatives (a) Imputed false negatives (b)
If sleeping (recode other activities = 0)  
Early morning (4 am to 9 am) 64 6
Late evening (9 pm to 4 am) 20 6
If awake in bed (recode other activities = 0) 1 0
If one activity and implausible context data  
Travel (recode home inside or alone) 12 11
Walk/ride (recode inside a home) 11 1
Television and computer (recode outside) 1 0
If sleeping or awake in bed (recode outside) 12 0
If at day care centre outside plausible hours and give other location (recode day care) 4 0
Total activities and context data 125 24

Notes: (a) Imputed true negatives are those cells that were imputed as '0' where the corrected file had indicated that the positive response was a false positive. (b) Imputed false negatives are those cells that were imputed to '0' where the corrected file indicated that the positive response was a true positive.
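The two counts defined in the notes can be obtained by comparing three versions of the same cells: the original scan, the output of a recode, and the manually corrected file. The sketch below is one way to do this, assuming each file has been flattened to a sequence of 0/1 cells in the same order. The function name and data layout are hypothetical.

```python
def score_recode(original, recoded, corrected):
    """Count cells a recode switched from 1 to 0, split by the manual check.

    Each argument is a flat sequence of 0/1 cells in the same order.
    Returns (imputed_true_negatives, imputed_false_negatives).
    """
    true_neg = false_neg = 0
    for o, r, c in zip(original, recoded, corrected):
        if o == 1 and r == 0:          # the recode removed a positive response
            if c == 0:
                true_neg += 1          # manual check: it was a false positive
            else:
                false_neg += 1         # manual check: it was real data
    return true_neg, false_neg


original  = [1, 1, 1, 0, 1]
corrected = [1, 0, 1, 0, 0]   # manually checked against the paper form
recoded   = [1, 0, 0, 0, 1]   # output of some electronic rule
print(score_recode(original, recoded, corrected))   # (1, 1)
```

In the toy data the rule removes one genuine false positive and one real response, and leaves one false positive untouched. Cells the rule does not change do not contribute to either count.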

Correction 1: Being asleep and doing other activities at the same time

In order to reduce the likelihood of recoding a transitional phase, the child had to be asleep in a given interval and in the preceding and following intervals. Three separate time periods were tested: morning (4 am to 9 am), nighttime (9 pm to 4 am) and daytime (9 am to 9 pm).

For the morning and nighttime periods, if an activity occurred simultaneously with sleep, non-sleep activity time was coded as zero. The recoding of activities occurring simultaneously with sleep in the morning and in the evening was relatively successful, with an aggregate number of 84 correct recodes and 12 incorrect recodes. In the full Wave 1 file (i.e. n = 7,449) this resulted in 23,216 recodes of positive responses. This correction was not performed on the Wave 2 data (n = 6,906); however, if it had been it would have only resulted in 4,498 recodes. This provides further evidence that this correction removed many more false positives than true ones.

In the period between 9 am and 9 pm, children were most likely to be periodically transitioning between sleep and other activities. Attempting to recode activities occurring simultaneously with sleep, in this period, yielded no corrections to false positives and resulted in eight true positives being recoded. Alternatively, an attempt to recode sleep resulted in recoding four false positives and five true positives. Given that both these alternatives resulted in more incorrect recodes than correct ones, neither was performed on the main data file.
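A minimal sketch of Correction 1 follows, assuming a diary is held as a dictionary that maps an activity name to a list of 96 flags. The activity names, the data layout and the treatment of the first and last intervals (skipped, because they lack a preceding or following interval within the diary) are assumptions. The source does not say how those edge intervals were handled.

```python
MORNING = range(0, 20)    # 0-based intervals covering 4:00 am to 8:59 am
NIGHT = range(68, 96)     # 0-based intervals covering 9:00 pm to 3:59 am


def recode_sleep_overlap(diary, sleep="sleep", windows=(MORNING, NIGHT)):
    """Return a copy of the diary with non-sleep activities zeroed where the
    child is asleep in the interval and in both neighbouring intervals."""
    out = {name: list(flags) for name, flags in diary.items()}
    asleep = diary[sleep]
    for i in range(1, len(asleep) - 1):        # edge intervals are skipped
        if not any(i in window for window in windows):
            continue                           # daytime is left untouched
        if asleep[i - 1] and asleep[i] and asleep[i + 1]:
            for name in out:
                if name != sleep:
                    out[name][i] = 0
    return out


diary = {"sleep": [1] * 96, "walk": [0] * 96}
diary["walk"][5] = 1      # 5:15 am, surrounded by sleep: treated as a false positive
diary["walk"][30] = 1     # 11:30 am, daytime: not recoded
cleaned = recode_sleep_overlap(diary)
print(cleaned["walk"][5], cleaned["walk"][30])   # 0 1
```

Requiring sleep in the neighbouring intervals is what protects transitions: an activity marked in the interval just before the child wakes is kept, because the following interval is not sleep.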

Correction 2: Being awake in bed and doing other unlikely activities at the same time

Other unlikely activities are defined as:

  • bathing, dressing, hair care, health care
  • using computer/computer games
  • walking for travel or fun
  • riding bicycle, trike, etc. (travel or fun)
  • other exercise - swim/dance/run about
  • travelling in pusher or on bicycle seat
  • travelling in car/other household vehicle
  • travelling on public transport, ferry, plane
  • taken places with adult (e.g. shopping)
  • organised lessons activities.

If the children were doing these activities as well as being awake in bed, other activities were coded as zero. Again, in order to reduce the likelihood of recoding a transitional phase, the child had to be awake in bed in a given interval, while in the following and preceding interval they had to be either asleep or awake in bed.

The impact of recoding activities occurring simultaneously with the child being awake in bed was relatively minor, with only one false positive being recoded in the 47 diaries in the original file, and 1,225 positive responses being recoded in the full file. This is not surprising given that 4-5 year old children are not often awake in bed for long periods of time unless they are ill or are having trouble getting to sleep. In Wave 2, 1,077 positive responses would have been recoded due to this correction.

Correction 3: A child cannot be travelling and be inside at home or be alone

A child cannot be travelling (travelling in a pusher/ travelling in a car/ travelling on public transport/taken places with an adult) and be simultaneously at home inside (or in someone else's home) or be alone, if travelling was their sole activity for the time period. Recoding the context data as 0 where this occurred resulted in slightly more false positives being altered than true ones. In the full file this resulted in 4,111 positive responses being recoded. In Wave 2, 1,359 responses would have been recoded.

Correction 4: A child cannot be walking/riding and be inside a home

A child cannot be walking for travel or fun or riding a bike or trike, etc. and be inside their own or someone else's home if this is their only activity for the time period. Recoding all instances of being inside as zero resulted in many more false positives being altered than true ones. In the full file this resulted in 1,996 positive responses being recoded. In Wave 2, only 282 responses would have been recoded.
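Corrections 3 and 4 share one pattern: when a single activity is marked and the context is implausible for it, the context is cleared. A hedged sketch of that pattern is given below for one time interval, with activities and context held as sets of codes. All code names (for example "own_home_inside") are hypothetical labels, not LSAC variable names.

```python
TRAVEL = {"pusher", "car", "public_transport", "taken_places"}
WALK_RIDE = {"walk", "ride"}
INSIDE_HOME = {"own_home_inside", "other_home_inside"}

# (activity group, context codes cleared when the sole activity is in that group)
RULES = [
    (TRAVEL, INSIDE_HOME | {"alone"}),    # Correction 3
    (WALK_RIDE, INSIDE_HOME),             # Correction 4
]


def recode_context(activities: set[str], context: set[str]) -> set[str]:
    """Return the context for one interval with implausible codes removed."""
    for group, implausible in RULES:
        if len(activities) == 1 and activities <= group:
            return context - implausible
    return set(context)


print(recode_context({"car"}, {"own_home_inside", "with_mother"}))   # {'with_mother'}
print(recode_context({"walk"}, {"alone"}))                           # {'alone'}
```

Keeping the rules in a table makes the sole-activity condition explicit: an interval with travel plus any second activity is left unchanged.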

Correction 5: A child cannot be watching television or using a computer and be outside

A child cannot be watching television or using the computer and be outside. Recoding being outside as zero in this situation resulted in only one false positive correction for these 47 diaries but no alterations to true positives. In the full file this correction resulted in the recoding of 430 positive responses. In Wave 2, 157 responses would have been recoded.

Correction 6: A child cannot be sleeping or awake in bed and be outside

If a child was awake in bed or asleep and this was their only activity they cannot be outside. While this does exclude any children who were camping (assuming a tent doesn't count as indoors), in the test cases available this correction eliminated 12 false positives without altering a true positive. In the full file this resulted in 2,459 positive responses being recoded. In Wave 2, 535 responses would have been recoded.

Correction 7: A child cannot be at a day care centre/play group outside the hours of 7 am to 7 pm

If a response was given outside of these hours it was recoded to zero. In the original file this resulted in four corrections to false positives without creating any false negatives, while in the full file 1,417 responses were recoded by this correction. In Wave 2, 610 responses would have been recoded.
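Because the diary starts at 4 am, clock hours have to be converted to interval positions before a rule like this can be applied. The sketch below shows one way to do that. The helper names are assumptions, and so is the choice to treat the interval beginning at 7:00 pm as outside the plausible hours.

```python
def interval_index(hour: int, minute: int = 0) -> int:
    """0-based interval index for a 24-hour clock time in a diary that starts at 4 am."""
    return ((hour - 4) % 24) * 4 + minute // 15


OPEN_FROM = interval_index(7)     # 7:00 am
OPEN_UNTIL = interval_index(19)   # 7:00 pm (exclusive)


def recode_day_care(day_care: list[int]) -> list[int]:
    """Zero any day care flag that falls outside 7 am to 7 pm."""
    return [
        flag if OPEN_FROM <= i < OPEN_UNTIL else 0
        for i, flag in enumerate(day_care)
    ]


print(OPEN_FROM, OPEN_UNTIL)            # 12 60
print(sum(recode_day_care([1] * 96)))   # 48 intervals (12 hours) survive
```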

B cohort, Wave 1

Table 4 summarises the impact of the electronic recodes on the original file for the infants, both in reducing the number of false positives and in introducing false negatives into the data. In summary, the recoding removed 121 of the 723 false positives, with few false negatives created. As a result of these recodes, the false positive rate fell from 4% to 3%. The recodes are described in greater detail below.

Table 4: Summary of electronic corrections made to original file of 50 cases (B cohort, Wave 1)
  n Imputed true negatives (a) Imputed false negatives (b)
If sleeping (recode other activities = 0)      
4 am to 7 am 50 28 8
If alone/sleeping then can't be with others      
4 am to 3 pm 50 38 15
10 pm to 4 am 50 22 0
Travelling and at home or alone 50 27 9
If one activity is breastfeeding, bathing, being held or read to, recode alone 50 4 1
If at day care centre outside of the hours of 7 am to 7 pm and give other location (recode day care) 50 2 0
Total activities and context data 50 121 33

Notes: (a) Imputed true negatives are those cells that were imputed as '0' where the corrected file had indicated that the positive response was a false positive. (b) Imputed false negatives are those cells that were imputed to '0' where the corrected file indicated that the positive response was a true positive.

Correction 1: Being asleep and doing other activities at the same time

Children should not be asleep and also be active in another activity in the same time interval unless the child was in transition between activities. For this recode, activities that were indicated in the same time period of sleep were recoded for intervals where the child was also asleep in the preceding and following interval. This recode was tried for a number of different time periods but was only successful at the start of the day between 4 am and 7 am, where it recoded 28 false positives and eight true ones. In the full file (n = 7,782) this recode resulted in 11,278 positive responses being altered. As for the K cohort, these corrections were not repeated in Wave 2; however, if they had been, only 1,464 responses would have been recoded, suggesting that most Wave 1 responses recoded were false positives.

Correction 2: Being asleep alone and with someone at the same time

If the child was sleeping alone in a time period as well as the one preceding and following it, all other data in the 'in the same room as' section was recoded to zero. The only time period this didn't work for was the evening between 3 pm and 10 pm, so this period was excluded from this recode. Outside of these times it resulted in 60 corrections to false positives while introducing only 15 false negatives. In the full file this recode resulted in 29,016 responses being altered. In Wave 2, 8,618 responses would have been recoded.

Correction 3: A child cannot be travelling and be inside at home or be alone

A child cannot be travelling (travelling in a pusher/ travelling in a car/ travelling on public transport/taken places with an adult) and be simultaneously at home inside or be alone. A child was identified as travelling if they are travelling in a given interval and the preceding and following interval. In this situation, the 'at home' or 'alone' response would be removed. This correction removed 27 false positives while introducing nine false negatives. In the full file, it led to 6,303 positive responses being removed. In Wave 2, 1,098 positive responses would have been removed.

Correction 4: Being alone and with improbable activities

If a child's only activities are breastfeeding, being held, having personal grooming tasks performed, or being read a story or talked or sung to, any response that the child was alone for the period was recoded to zero. Correcting this removed four false positives and produced one false negative, while in the full file this recode led to the removal of 1,031 responses of 'alone'. In Wave 2, 420 responses would have been removed.

Correction 5: Being at a day care centre/playgroup at improbable hours

Any response indicating that the child was at a day care centre outside the hours of 7 am to 7 pm was recoded to zero. This recode corrected two false positives and produced no false negatives, while in the full file, 593 positive responses were removed. In the Wave 2 file, 301 responses would have been removed.

2.4 Coding to improve data quality

A number of further recodes were undertaken to improve other aspects of data quality, such as reducing missing or contradictory data.

B and K cohorts

These operations were performed on all three waves of diary data for both cohorts.

Improvement 1: Recoding not sure when other activities given in the same time interval

Ideally, respondents should only have given 'not sure' as a response if they were unable to report any of the child's activities in a 15-minute block. Where 'not sure' was marked alongside other activities in the same time interval, the 'not sure' response was coded to zero. For Wave 1, this removed 12,770 'not sure' responses from the K cohort file and 10,026 'not sure' responses from the full B cohort file. For Wave 2, these figures were 4,678 for the K cohort and 3,717 for the B cohort and, in Wave 3, they were 5,380 for the K cohort and 3,759 for the B cohort.
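For a single time interval, this rule reduces to a one-line check. The sketch below assumes the interval is a dictionary of 0/1 flags keyed by activity name, with a hypothetical "not_sure" key.

```python
def clear_not_sure(interval: dict[str, int]) -> dict[str, int]:
    """Return a copy of one interval with 'not_sure' zeroed if any other activity is marked."""
    out = dict(interval)
    other_marked = any(flag for name, flag in interval.items() if name != "not_sure")
    if out.get("not_sure") and other_marked:
        out["not_sure"] = 0
    return out


print(clear_not_sure({"not_sure": 1, "eating": 1}))   # {'not_sure': 0, 'eating': 1}
print(clear_not_sure({"not_sure": 1, "eating": 0}))   # {'not_sure': 1, 'eating': 0}
```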

Improvement 2: Imputing not sure or missing activity data as sleep in the early morning

If the parent was not sure of what the child was doing or activity data was missing in the early morning (4 am to 9 am) and the sequence of not sure/missing ended with either the child being awake in bed or sleeping, the not sure/missing was recoded as sleep. In Wave 1, these changes created an extra 984 sleep responses in the full K cohort file and 1,292 extra sleep responses in the full B cohort file. In Wave 2, these figures were 1,036 for the K cohort and 1,002 for the B cohort and, in Wave 3, they were 943 for the K cohort and 566 for the B cohort.

Improvement 3: Imputing not sure or missing activity data as sleep at nighttime

If the parent was not sure of what the child was doing or activity data was missing at nighttime (9 pm to 4 am) and this sequence began following the child being either awake in bed or sleeping, the not sure/missing data was recoded as sleep. This created an extra 2,540 sleep responses in the full K cohort file and 4,101 in the full B cohort file. In Wave 2, the figures were 2,681 for the K cohort and 3,933 for the B cohort and, in Wave 3, they were 2,229 for the K cohort and 2,328 for the B cohort.
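Improvements 2 and 3 can be sketched together, assuming each of the 96 intervals has first been reduced to a single state label. The labels ("sleep", "awake_in_bed", "unknown", "other") are hypothetical, with "unknown" standing for either 'not sure' or missing activity data. The morning pass walks backwards so that a whole run of unknown intervals ending in sleep or awake in bed is filled. The night pass walks forwards so that a run beginning after sleep or awake in bed is filled.

```python
REST = {"sleep", "awake_in_bed"}
UNKNOWN = "unknown"   # stands for 'not sure' or missing activity data


def impute_sleep(states: list[str]) -> list[str]:
    """Return a copy of a 96-interval state list with sleep imputed per Improvements 2 and 3."""
    if len(states) != 96:
        raise ValueError("expected 96 time intervals")
    out = list(states)
    for i in range(19, -1, -1):        # early morning: 4:00 am to 8:59 am, backwards
        if out[i] == UNKNOWN and out[i + 1] in REST:
            out[i] = "sleep"
    for i in range(68, 96):            # nighttime: 9:00 pm to 3:59 am, forwards
        if out[i] == UNKNOWN and out[i - 1] in REST:
            out[i] = "sleep"
    return out


day = [UNKNOWN] * 3 + ["sleep"] * 5 + ["other"] * 60 + ["sleep"] + [UNKNOWN] * 27
filled = impute_sleep(day)
print(filled[:3], filled[-1])   # ['sleep', 'sleep', 'sleep'] sleep
```

Unknown intervals in the daytime, or morning runs that end in some other activity, are left as they are.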

Improvement 4: Other missing data

If there was a single time period with missing activity data and the child remained in the same location, then either the activity before or after the missing bubble was randomly allocated to the missing bubble. This improvement imputed activities in 2,517 time periods in the full K cohort file and 3,022 time periods in the B cohort file. For Wave 2, these figures were 1,956 for the K cohort and 2,389 for the B cohort and, in Wave 3, they were 1,553 for the K cohort and 1,767 for the B cohort.
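A hedged sketch of this imputation follows. It assumes one activity label per interval with `None` marking missing data, and it reads 'remained in the same location' as the location being identical in the preceding, missing and following intervals. Both are assumptions about details the source does not spell out. A seeded random generator can be passed in to make the allocation reproducible.

```python
import random


def fill_single_gaps(activity, location, rng=None):
    """Fill isolated missing activity intervals from a randomly chosen neighbour."""
    rng = rng or random.Random()
    out = list(activity)
    for i in range(1, len(activity) - 1):
        is_gap = activity[i] is None
        bounded = activity[i - 1] is not None and activity[i + 1] is not None
        same_place = location[i - 1] == location[i] == location[i + 1]
        if is_gap and bounded and same_place:
            out[i] = rng.choice([activity[i - 1], activity[i + 1]])
    return out


activity = ["eat", None, "play", None, None, "tv"]
location = ["home"] * 6
print(fill_single_gaps(activity, location, random.Random(0)))
```

Only the isolated gap at position 1 is filled (with either "eat" or "play"); the two-interval gap is left missing because neither missing interval has known activities on both sides.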

Improvement 5: Missing 'who' information in child care

If a child's 'where' information includes 'day care centre/playgroup', it can reasonably be assumed they are in the presence of other children and other adults when alternative information is missing. This improvement imputed data in 17,294 time periods in the full K cohort file and 3,169 time periods in the B cohort file. As might be expected given the rise in time in non-parental care for the children as they get older, these numbers were higher in Wave 2 with 35,097 time periods for the K cohort and 12,712 for the B cohort and, in Wave 3, they were 23,045 for the K cohort and 15,835 for the B cohort.

2.5 Exclusion of cases

It is common practice when analysing time use diary data to exclude cases with poor quality data, often indicated by rules of thumb such as more than 90 minutes of missing information (e.g. Egerton and Gershuny, 2004; Fisher, 2002). The LSAC time use diaries use a different response format than many other similar instruments (i.e. the use of scanned responses rather than coding of text responses) and this may have an effect on the quality of the diary data and on which cases should be excluded. Cases considered to be of poor quality were removed from the main diary dataset and placed in a separate file so that they could be re-included for any analysis where the user thought they might be valuable.

2.6 Criteria for exclusion

Three criteria were used to exclude cases from the dataset.

Cases with large amounts of missing data

As mentioned above, it is common practice to remove cases with more than 90 minutes missing activity data from analyses. However, analyses of the LSAC data suggested that using this rule of thumb might be inappropriate as children who spent time away from their parents (e.g. in child care) were more likely to have greater levels of missing activity data. Instead, a diary was deleted from the file if it had no data of any kind for more than 90 minutes (or six time intervals). In Wave 1, this criterion excluded 239 diaries (3%) from the B cohort file and 368 diaries (5%) from the K cohort file. For Wave 2, 235 (4%) diaries were deleted from the B cohort file and 268 (4%) were deleted from the K cohort file. For Wave 3, 147 (3%) diaries were deleted from the B cohort file and 233 (4%) were deleted from the K cohort file.

Cases with large numbers of simultaneous activities

Most time use diaries request respondents to describe their main activity for each time period, with limited opportunities to describe secondary activities. The format of the LSAC time use diary meant that a number of activities could be specified separately; however, where numbers were large, it often indicated that the respondent had trouble understanding the task. As such, it was decided to exclude any respondent that gave more than five simultaneous activities for more than six time periods. In Wave 1, this criterion excluded 78 diaries (1%) from the B cohort file and 55 diaries (1%) from the K cohort file, while in Wave 2, 26 (0.4%) B cohort diaries and 16 (0.2%) K cohort diaries were deleted. For Wave 3, 11 (0.2%) diaries were deleted from the B cohort file and 11 (0.2%) were deleted from the K cohort file.

Cases with few changes in activities

Diaries with few changes in activities tended to occur when the parent either did not have a good idea of the child's activity (e.g. large amount of time in non-parental care) or was not able to fill in the diary in detail. It was decided that fewer than 10 different activities over the 24-hour period represented an unacceptable lack of detail. In Wave 1, this excluded 59 diaries (1%) from the B cohort file and 144 diaries (2%) from the K cohort file. In Wave 2, these figures were 110 (2%) and 171 (3%) respectively. For Wave 3, 120 (2%) diaries were deleted from the B cohort file and 159 (3%) were deleted from the K cohort file.
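The three criteria can be sketched as a single function that returns the reasons a diary would be excluded. The data layout (a list of 96 dictionaries holding sets of marked codes) and the reason labels are hypothetical. Two readings are assumed here and are not confirmed by the source: that the 90 minutes of empty intervals is a total rather than a consecutive run, and that 'different activities' means distinct activity codes marked anywhere in the diary.

```python
def exclusion_reasons(diary):
    """diary: list of 96 dicts with 'activities' and 'context' sets of marked codes."""
    reasons = []

    # Criterion 1: no data of any kind for more than six intervals (90 minutes)
    empty = sum(1 for iv in diary if not iv["activities"] and not iv["context"])
    if empty > 6:
        reasons.append("missing")

    # Criterion 2: more than five simultaneous activities in more than six intervals
    crowded = sum(1 for iv in diary if len(iv["activities"]) > 5)
    if crowded > 6:
        reasons.append("simultaneous")

    # Criterion 3: fewer than 10 different activities over the 24 hours
    distinct = set().union(*(iv["activities"] for iv in diary))
    if len(distinct) < 10:
        reasons.append("few_activities")

    return reasons


codes = [f"a{k}" for k in range(12)]
varied = [{"activities": {codes[i % 12]}, "context": {"home"}} for i in range(96)]
asleep = [{"activities": {"sleep"}, "context": {"home"}} for _ in range(96)]
print(exclusion_reasons(varied))   # []
print(exclusion_reasons(asleep))   # ['few_activities']
```

Returning a list rather than a single flag reflects the fact that some diaries were excluded for more than one reason.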

There were some diaries excluded for more than one reason, so in total for Wave 1, 330 diaries (4%) were excluded from the B cohort file and 490 (7%) were excluded from the K cohort file. The effect of the exclusion of these diaries on the socio-demographic composition of the time use diary sample can be seen in Table 5. The deleted diaries tended to come from lower socio-economic status families.

Table 5: Effect of deleting problem cases on socio-demographic composition of the Wave 1 LSAC TUD sample (unweighted data)
Wave 1 B cohort K cohort
Full LSAC sample (%) Full TUD sample (%) Reduced TUD sample (%) Full LSAC sample (%) Full TUD sample (%) Reduced TUD sample (%)
Gender
Male 51.2 51.4 51.6 50.9 51.2 51.6
Female 48.8 48.6 48.4 49.1 48.8 48.4
Age range of children (B cohort/K cohort)
3-5 months/ 51-53 months 11.2 11.1 11.2 10.6 10.6 10.7
6-11 months/ 54-59 months 73.2 73.3 73.4 72.1 72.1 72.8
12-14 months/ 60-62 months 14.7 14.8 14.8 16.1 16.1 15.7
15-19 months/ 63-67 months 1.0 0.8 0.6 1.3 1.3 0.8
Family type
Couple family: 90.7 91.5 93.0 86.0 87.0 88.9
both biological 90.1 91.0 92.5 82.9 84.3 86.6
other (e.g. step/blended) 0.6 0.6 0.5 3.1 2.8 2.3
Single parent family 9.3 8.5 7.0 14.0 12.9 11.0
Siblings
Only child 39.5 40.0 40.7 11.5 11.2 10.8
One sibling 36.8 36.8 36.9 48.4 49.4 50.7
Two or more siblings 23.7 23.2 22.3 40.1 39.4 38.9
Cultural background
Aboriginal or Torres Strait Islander 4.5 3.9 2.7 3.8 3.2 2.4
Mother speaks a language other than English at home 14.5 13.4 11.2 15.7 14.7 12.3
Work status
Both parents or lone parent work/s 47.9 48.8 50.5 55.5 56.0 57.2
One parent works (in couple family) 40.8 41.2 41.5 32.8 33.5 34.2
No parent works 11.3 10.1 8.0 11.6 10.5 8.6
Educational status
Mother completed Year 12 66.9 68.7 71.9 58.6 60.2 63.0
Father completed Year 12 58.5 59.4 60.8 52.7 53.3 54.6
Child care
Child has a regular care arrangement (including school) 35.9 35.7 35.4 96.7 97.1 97.6
State
New South Wales 31.6 30.8 30.0 31.6 31.2 30.8
Victoria 24.5 24.5 24.5 25.0 25.0 24.9
Queensland 20.6 20.9 21.2 19.8 20.1 20.4
South Australia 6.8 6.6 6.4 6.8 6.6 6.2
Western Australia 10.4 10.8 11.2 10.2 10.4 10.8
Tasmania 2.2 2.3 2.4 2.7 2.8 2.9
Northern Territory 1.7 1.8 1.8 1.7 1.6 1.5
Australian Capital Territory 2.1 2.3 2.4 2.3 2.3 2.4
Region
Capital city statistical division 62.5 62.8 63.1 62.1 62.0 61.6
Balance of state 37.5 37.2 36.9 37.9 38.0 38.4
Number of observations (n) a 5,107 8,858 7,452 4,983 8,565 6,959

Note: a TUD samples are larger than the LSAC sample as respondents were asked to complete two diaries.

In Wave 2, 335 (5%) were deleted from the B cohort file and 405 (6%) from the K cohort file. The effect of the exclusion of these diaries on the socio-demographic composition of the time use diary sample can be seen in Table 6. Again, the deleted diaries tended to come from lower socio-economic status families.

Table 6: Effect of deleting problem cases on socio-demographic composition of the Wave 2 LSAC TUD sample (unweighted data)
Wave 2 B cohort K cohort
Full LSAC sample (%) Full TUD sample (%) Reduced TUD sample (%) Full LSAC sample (%) Full TUD sample (%) Reduced TUD sample (%)
Gender
Male 51.1 51.3 51.2 51.0 51.7 51.9
Female 48.9 48.7 48.8 49.0 48.3 48.2
Age range of children (B cohort/K cohort)
27-29 months/ 75-77 months 6.3 6.6 6.7 7.1 7.5 7.6
30-35 months/ 78-83 months 64.8 66.1 66.5 63.7 64.7 64.9
36-38 months/ 84-86 months 23.5 23.0 22.8 23.8 23.3 23.2
39-43 months/ 87-91 months 5.4 4.3 3.9 5.4 4.5 4.3
Family type
Couple family: 89.0 91.6 92.0 85.2 88.2 88.9
both biological 88.0 90.5 91.0 81.3 85.2 85.9
other (e.g. step/blended) 1.0 1.1 1.1 3.9 3.1 2.9
Single parent family 11.0 8.4 8.0 14.8 11.8 11.1
Siblings
Only child 19.3 19.1 18.9 9.1 8.7 8.8
One sibling 49.1 51.4 51.9 45.2 47.7 48.1
Two or more siblings 31.6 29.6 29.2 45.7 43.6 43.1
Cultural background
Aboriginal or Torres Strait Islander 3.9 2.5 2.3 3.4 2.3 2.3
Mother speaks a language other than English at home 13.4 11.8 11.1 14.7 13.6 12.5
Work status
Both parents or lone parent work/s 56.9 58.0 58.5 65.4 67.6 68.3
One parent works (in couple family) 33.8 35.0 35.0 26.1 26.0 25.8
No parent works 9.3 7.0 6.5 8.6 6.5 5.9
Educational status
Mother completed Year 12 69.0 73.0 74.1 60.1 63.8 64.9
Father completed Year 12 59.7 62.2 62.6 53.2 55.5 56.0
Child care
Child has a regular care arrangement (including school) 70.4 71.3 71.5 99.7 99.7 99.6
State
New South Wales 31.1 30.9 30.9 31.1 30.9 31.0
Victoria 24.3 24.8 24.7 24.3 24.5 24.2
Queensland 21.4 21.1 21.1 21.4 20.4 20.8
South Australia 6.7 6.8 6.8 6.7 6.9 6.8
Western Australia 10.6 10.4 10.5 10.6 10.4 10.4
Tasmania 2.3 2.3 2.4 2.3 3.2 3.2
Northern Territory 1.4 1.3 1.3 1.4 1.3 1.2
Australian Capital Territory 2.2 2.3 2.3 2.2 2.4 2.4
Region
Capital city statistical division 61.9 62.6 62.5 61.6 61.4 61.4
Balance of state 38.1 37.4 37.5 38.4 38.6 38.6
Number of observations (n) a 4,606 6,917 6,582 4,464 6,858 6,483

Note: a TUD samples are larger than the LSAC sample as respondents were asked to complete two diaries.
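
A quick consistency check on tables of this kind is to confirm that the percentages within a category add up to roughly 100 in each column. The sketch below does this for the state shares in the first column of Table 6 (Wave 2 B cohort, full LSAC sample). The function name and the rounding tolerance are assumptions made for illustration and are not part of the LSAC processing.

```python
def shares_sum_to_100(shares, tolerance=0.3):
    """Return True if percentage shares add up to 100 within a rounding tolerance.

    The tolerance of 0.3 percentage points is an assumed allowance for
    rounding each share to one decimal place.
    """
    return abs(sum(shares) - 100.0) <= tolerance


# State shares, Wave 2 B cohort, full LSAC sample (first column of Table 6),
# listed from New South Wales to Australian Capital Territory.
state_shares_b_full = [31.1, 24.3, 21.4, 6.7, 10.6, 2.3, 1.4, 2.2]

print(shares_sum_to_100(state_shares_b_full))
```

A column that fails this check usually points to a transcription or typesetting error in one of its cells.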

In Wave 3, 228 diaries (4%) were deleted from the B cohort file and 339 (6%) from the K cohort file. Table 7 shows the effect of excluding these diaries on the socio-demographic composition of the time use diary sample. Again, the deleted diaries tended to come from families of lower socio-economic status.

Table 7: Effect of deleting problem cases on socio-demographic composition of the Wave 3 LSAC TUD sample (unweighted data)
Wave 3 B cohort K cohort
Full LSAC sample (%) Full TUD sample (%) Reduced TUD sample (%) Full LSAC sample (%) Full TUD sample (%) Reduced TUD sample (%)
Gender
Male 51.3 52.1 52.1 51.1 51.4 51.2
Female 48.7 47.9 47.9 48.9 48.6 48.8
Age range of children (B cohort/K cohort)
27-32 months/ 75-77 months 7.8 8.4 8.5 8.4 8.8 8.8
30-35 months/ 78-83 months 67.2 68.5 68.5 65.7 65.1 65.1
36-38 months/ 84-86 months 20.7 19.9 19.8 21.9 22.5 22.5
39-43 months/ 87-91 months 4.3 3.1 3.1 4.1 3.7 3.6
Family type      
Couple family: 88.9 91.6 91.8 85.6 86.4 86.7
both biological 85.8 89.0 89.3 78.8 79.7 80.2
other (e.g. step/blended) 3.0 2.6 2.3 6.8 6.7 6.5
Single parent family 11.1 8.4 8.2 14.4 13.6 13.3
Siblings      
Only child 10.4 10.4 10.2 8.2 8.6 8.6
One sibling 48.1 51.2 51.6 44.1 45.2 45.2
Two or more siblings 41.5 38.4 38.2 47.7 46.2 46.2
Cultural background      
Aboriginal or Torres Strait Islander 3.4 2.0 2.0 2.9 2.6 2.4
Mother speaks a language other than English at home 12.6 10.7 10.2 13.8 13.5 13.2
Work status      
Both parents or lone parent work/s 63.0 64.0 64.2 72.8 73.1 73.4
One parent works (in couple family) 29.7 30.9 30.9 20.7 20.7 20.7
No parent works 7.4 5.1 4.9 6.5 6.2 5.9
Educational status      
Mother completed Year 12 69.8 74.1 74.8 61.3 62.2 62.8
Father completed Year 12 60.4 63.3 63.7 54.0 54.9 55.1
Child care      
Child has a regular care arrangement (including school) 96.6 97.4 97.4 99.5 99.4 99.4
State      
New South Wales 30.1 29.6 29.6 30.8 31.2 31.5
Victoria 24.6 24.3 24.0 24.4 19.6 19.5
Queensland 22.0 21.4 21.5 20.8 23.2 23.1
South Australia 7.0 7.6 7.7 6.9 5.1 5.1
Western Australia 10.3 10.7 10.7 10.2 14.9 14.9
Tasmania 2.4 2.6 2.6 3.0 2.4 2.4
Northern Territory 1.2 1.3 1.3 1.4 1.6 1.5
Australian Capital Territory 2.4 2.6 2.6 2.5 2.0 2.1
Region      
Capital city statistical division 61.9 62.6 62.6 61.3 61.3 61.0
Balance of state 38.2 37.4 37.4 38.7 38.8 39.0
Number of observations (n) a 4,384 5,909 5,681 4,332 5,924 5,585

Note: a TUD samples are larger than the LSAC sample as respondents were asked to complete two diaries.
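
The deletion rates quoted for Waves 2 and 3 follow from the number of deleted diaries and the full TUD sample sizes reported in Tables 6 and 7. The sketch below reproduces that arithmetic; the function and variable names are illustrative only.

```python
def deletion_rate(deleted, full):
    """Percentage of diaries in the full TUD sample that were deleted."""
    return 100.0 * deleted / full


# (deleted diaries, full TUD sample size), taken from the text and
# from the "Full TUD sample" columns of Tables 6 and 7.
cases = {
    ("Wave 2", "B"): (335, 6917),
    ("Wave 2", "K"): (405, 6858),
    ("Wave 3", "B"): (228, 5909),
    ("Wave 3", "K"): (339, 5924),
}

for (wave, cohort), (deleted, full) in cases.items():
    rate = deletion_rate(deleted, full)
    print(f"{wave} {cohort} cohort: {rate:.1f}% of diaries deleted")
```

Rounded to whole numbers, these rates match the percentages given in the text.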

2.7 Summary

Correcting the data and deleting problem cases reduced the rate of false positives due to scanning errors in the corrected files (i.e. the cases that had been checked against the paper forms). When these improvements were applied to the corrected files, the false positive rate dropped to 5% for the K cohort file and 3% for the B cohort file. In Wave 2, the same recodes were applied to scanned files that had been checked more rigorously, and they changed far fewer responses. This is further evidence that the Wave 1 recodes were largely removing false positive responses. Tables 8a to 8f show the effect of the recodes and case deletions on estimates produced from the full data file. In the final Wave 1 B cohort file (i.e. with cases deleted and all corrections made), 88% of cases had at least one correction, compared with 84% of cases in the final K cohort file. In Wave 2, these proportions were much lower: 41% for the B cohort and 48% for the K cohort. In Wave 3, they were 42% for the B cohort and 46% for the K cohort.
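
The share of cases with at least one correction can be obtained by comparing each raw diary with its final version cell by cell. The following is a minimal sketch under the assumption that each diary is stored as a flat list of 0/1 cells (one per activity or context option per time interval); the data layout, names and toy values are hypothetical and do not reflect the actual LSAC file structure.

```python
def share_with_corrections(raw_diaries, final_diaries):
    """Percentage of diaries in which at least one cell was changed.

    Both arguments are lists of diaries in the same order; each diary
    is a list of 0/1 cells (an assumed layout for illustration).
    """
    changed = sum(
        1 for raw, final in zip(raw_diaries, final_diaries) if raw != final
    )
    return 100.0 * changed / len(raw_diaries)


# Toy data: three diaries with four cells each.
raw = [[1, 0, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0]]
final = [[1, 0, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0]]

print(f"{share_with_corrections(raw, final):.1f}% of diaries corrected")
```

In this toy example, the first and third diaries each have one cell recoded, so two of the three diaries count as corrected.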

Table 8a: Effect of recoding and case deletions on estimates of time use in number of minutes/day (B cohort, Wave 1)a
  Raw file After recodes After recodes and deletions
What was the child doing?   
Not sure what child was doing 42.5 19.8 19.0
Sleeping, napping 772.5 784.5 800.3
Awake in bed/cot 48.2 46.9 45.2
Looking around, doing nothing 28.2 27.6 25.8
Bathing/nappy change, dress/hair care 91.5 91.3 90.9
Breastfeeding 53.7 52.9 52.3
Other eating, drinking, being fed 129.4 128.9 129.1
Crying, upset 43.4 42.9 42.1
Destroying things, creating mess 22.6 22.3 21.1
Held, cuddled, comforted, soothed 129.8 127.5 128.0
Watching TV, video or DVD 37.3 37.3 36.0
Listening to tapes, CDs, radio, music 29.3 28.4 26.9
Read a story, talked/sung to, sing/talk 77.9 77.1 76.2
Colour/drawing, look at book, puzzles 8.5 8.5 7.8
Organised activities/playgroup 10.6 10.5 9.6
Crawl, climb, swing arms or legs 121.7 120.2 119.2
Other play, other activities 137.4 135.7 137.9
Visiting people, special event, party 41.2 40.3 39.1
Taken places with adult (e.g. shopping) 54.7 54.1 53.2
Travel
Taken out in pram or bicycle seat 36.8 36.4 35.3
Travel in car/other household vehicle 54.9 54.4 54.2
Travel on public transport, ferry, plane 3.0 2.9 2.1
Where was the child?
Own home (indoors) 1111.0 1100.9 1130.0
Other person's home (indoors) 68.4 68.4 68.7
Day care centre/playgroup 22.2 20.9 19.7
Other indoors 43.7 43.7 43.6
Other outdoors 66.3 66.3 67.2
In the same room, nearby if outside
Alone 370.5 366.4 380.2
Mother, step-mother 781.6 756.3 775.7
Father, step-father 405.2 393.9 403.7
Grandparent(s)/other adult relative 105.0 103.0 102.9
Brother(s), sister(s), other children 365.5 364.4 371.5
Other adult(s) 65.2 71.8 71.7
Dog, cat or other pet (not fish) 119.4 114.3 116.3
Payment   
Someone paid for this activity 21.9 21.9 21.4

Note: a Analysis uses weights that adjust for general LSAC non-response as well as weighting each day of the week equally in the analysis. These weights are recalculated when the poor-quality cases are deleted.
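
The note above says that the weights give each day of the week equal weight. The exact LSAC weighting procedure is not described here, so the following is only a sketch of one way such an adjustment could work: each diary's base weight is rescaled so that every day of the week carries the same total weight, while the overall total is preserved. The function name, data layout and toy values are assumptions.

```python
from collections import defaultdict


def equalise_days(diaries):
    """Rescale weights so each day of the week has the same total weight.

    diaries: list of (day_of_week, base_weight) tuples (assumed layout).
    Returns the adjusted weights in the same order as the input.
    """
    totals = defaultdict(float)
    for day, weight in diaries:
        totals[day] += weight
    # Each day present in the data receives an equal share of the total.
    target = sum(totals.values()) / len(totals)
    return [weight * target / totals[day] for day, weight in diaries]


# Toy data: Saturday diaries are over-represented relative to Monday.
diaries = [("Mon", 1.0), ("Mon", 1.0), ("Sat", 2.0), ("Sat", 1.0), ("Sat", 1.0)]
adjusted = equalise_days(diaries)
print(adjusted)
```

Because the adjustment depends on the set of diaries in the file, it has to be rerun after poor-quality cases are deleted, which is consistent with the note that the weights are recalculated at that point.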

Table 8b: Effect of recoding and case deletions on estimates of time use in number of minutes/day (K cohort, Wave 1) a
  Raw file After recodes After recodes and deletions
What was the child doing?
Not sure what child was doing 68.4 39.0 38.4
Sleeping, napping 626.6 634.7 650.8
Awake in bed 37.3 33.8 34.0
Eating, drinking, being fed 124.0 120.7 123.0
Bathing, dressing, hair care, health care 59.9 58.8 59.6
Do nothing, bored/restless 7.7 7.0 6.5
Crying, upset, tantrum 10.1 9.1 8.9
Destroy things, create mess 8.0 6.9 6.3
Held, cuddled, comforted, soothed 40.5 37.5 37.0
Being reprimanded, corrected 12.3 12.0 11.4
Watching TV, video, DVD, movie 124.8 121.1 122.6
Listening to tapes, CDs, radio, music 18.9 17.9 17.5
Use computer/computer games 16.6 15.9 15.5
Read a story, talk/sing, talked/sung to 61.3 59.7 59.8
Colour, look at book, educational game 43.0 42.2 41.7
Being taught to do chores, read, etc. 18.3 17.5 17.1
Walk for travel or for fun 13.7 13.1 12.8
Ride bicycle, trike, etc. (travel or fun) 17.9 17.1 16.5
Other exercise - swim/dance/run about 47.4 45.7 46.4
Visiting people, special event, party 50.4 46.9 47.2
Other play, other activities 109.6 105.1 108.1
Travel in pusher or on bicycle seat 3.9 3.4 3.0
Travel in car/other household vehicle 57.6 55.4 56.6
Travel on public transport, ferry, plane 5.5 4.7 4.3
Taken places with adult (e.g. shopping) 49.5 47.4 47.9
Organised lessons/activities 73.8 71.9 74.5
Where was the child?   
Own home (indoors) 917.4 906.5 941.2
Other person's home (indoors) 65.4 64.9 64.7
Day care centre/playgroup 108.6 104.9 106.3
Other indoors 73.0 73.0 75.6
Other outdoors 107.3 102.2 105.2
In the same room, nearby if outside
Alone 226.4 225.8 236.9
Mother, step-mother 593.4 593.4 616.3
Father, step-father 343.5 343.5 357.7
Grandparent(s)/other adult relative 92.8 92.8 92.7
Brother(s), sister(s), other children 662.8 711.0 738.8
Other adult(s) 123.8 172.0 175.5
Dog, cat or other pet (not fish) 127.8 127.8 131.7
Payment
Someone paid for this activity 74.1 74.1 77.2

Note: a Analysis uses weights that adjust for general LSAC non-response as well as weighting each day of the week equally in the analysis. These weights are recalculated when the poor-quality cases are deleted.
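
The estimates in these tables are expressed in minutes per day. Since each diary interval covers 15 minutes, an estimate of this kind can be sketched as a weighted mean of the number of marked intervals multiplied by 15. The code below illustrates that calculation under the assumption that each diary is summarised as a weight and a count of marked intervals for one activity; the names and toy values are hypothetical.

```python
INTERVAL_MINUTES = 15  # each diary interval covers 15 minutes


def mean_minutes_per_day(diaries):
    """Weighted mean minutes per day spent on one activity.

    diaries: list of (weight, marked_intervals) tuples, where
    marked_intervals is the number of 15-minute intervals in which
    the activity was marked (assumed layout for illustration).
    """
    total_weight = sum(weight for weight, _ in diaries)
    weighted_minutes = sum(
        weight * intervals * INTERVAL_MINUTES for weight, intervals in diaries
    )
    return weighted_minutes / total_weight


# Toy data: three diaries with 40, 44 and 48 intervals marked as sleeping.
sleep = [(1.0, 40), (2.0, 44), (1.0, 48)]
print(mean_minutes_per_day(sleep))
```

Applying the same function to the raw, recoded and reduced files would give the three columns of each table, provided the weights are recalculated for the reduced file.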

Table 8c: Effect of recoding and case deletions on estimates of time use in number of minutes/day (B cohort, Wave 2)a
  Raw file After recodes After recodes and deletions
What was the child doing?
Not sure what child was doing 60.2 48.6 47.4
Sleeping, napping 662.3 673.4 686.6
Awake in bed 40.7 41.0 41.0
Eating, drinking, being fed 117.7 119.1 120.6
Bathing, dressing, hair care, health care 54.0 54.7 55.2
Doing nothing, bored/restless 5.9 6.0 5.7
Crying, upset, tantrum 12.1 12.3 12.3
Arguing, fighting 5.6 5.7 5.4
Destroy things, create mess 7.1 7.2 6.5
Being reprimanded 9.7 9.8 9.1
Being held, cuddled, comforted, soothed 45.8 46.2 45.8
Watching TV, video, DVD, movie 94.5 95.1 95.9
Listening to tapes, CDs, radio, music 20.4 20.5 20.6
Using computer, computer game 4.3 4.3 4.1
Read a story, told a story, sung to 33.6 33.9 34.3
Colour/draw, look at book, educational game 36.2 36.5 36.3
Quiet free play 76.6 76.9 78.7
Active free play 88.8 89.1 90.4
Being taught to do chores 11.7 11.7 11.4
Visiting people, special event, party 72.0 72.1 73.4
Organised lessons/activities 15.8 15.8 16.0
Travel
Walking 13.0 13.1 12.5
Ride bicycle/trike 9.4 9.5 9.0
Travel in car 51.7 52.0 52.0
Travel in a pusher/bicycle seat 5.3 5.3 5.1
Travel on public transport 1.5 1.5 1.3
Taken places with adult (e.g. shopping) 34.5 34.6 34.7
Where was the child?
Own home (indoors) 944.8 944.8 974.8
Other person's home (indoors) 61.2 61.2 61.7
Day care centre/playgroup 87.5 87.5 85.4
Other indoors 85.5 85.5 86.9
Other outdoors 68.6 68.6 70.4
In the same room, nearby if outside
Alone 298.3 298.3 311.2
Mother, step-mother 677.7 677.7 699.3
Father, step-father 368.2 368.2 379.2
Grandparent(s)/other adult relative 94.0 94.0 94.9
Brother(s), sister(s), other children 551.3 590.0 606.4
Other adult(s) 91.1 129.8 129.0
Dog, cat or other pet (not fish) 113.9 113.9 118.3
Payment      
Someone paid for this activity 53.3 53.3 54.3

Note: a Analysis uses weights that adjust for general LSAC non-response as well as weighting each day of the week equally in the analysis. These weights are recalculated when the poor-quality cases are deleted.

Table 8d: Effect of recoding and case deletions on estimates of time use in number of minutes/day (K cohort, Wave 2)a
  Raw file After recodes After recodes and deletions
What was the child doing?
Not sure what child was doing 99.2 86.6 86.1
Sleeping, napping 598.4 607.5 620.2
Awake in bed 30.8 31.1 30.5
Eating and drinking 95.8 96.9 98.7
Bathing, dressing, hair care, health care 49.6 50.2 51.0
Do nothing, bored/restless 4.1 4.1 3.7
Crying, upset, tantrum 3.0 3.0 2.8
Arguing, fighting, destroy things 3.8 3.8 3.5
Held, cuddled, comforted, soothed 18.9 19.1 18.3
Being reprimanded, corrected 6.8 6.9 6.6
Watching TV, video, DVD, movie 91.2 91.7 92.9
Listening to tapes, CDs, radio, music 12.3 12.4 12.2
Use computer/computer games 18.4 18.6 18.5
Read a story, talk/sing, talked/sung to 16.9 17.1 17.1
Reading/looking at book by self 21.0 21.2 21.3
Quiet free play 47.8 47.9 49.5
Active free play 65.6 65.8 67.6
Helping with chores/jobs 19.0 19.2 19.0
Visiting people, special event, party 59.7 59.8 59.9
Organised sport/physical activity 18.7 18.8 18.8
Other organised lessons/activities 17.3 17.4 18.0
Travel
Walk for travel or for fun 9.6 9.6 9.4
Ride bicycle, trike, etc. (travel or fun) 11.4 11.5 11.5
Travel in car 47.8 48.0 49.0
Travel on public transport 4.7 4.7 4.7
Taken places with adult (e.g. shopping) 20.4 20.5 20.7
Where was the child?
Own home (indoors) 784.2 784.2 812.8
Own home (outdoors) 56.7 56.7 57.0
School, after/before school care 219.8 219.8 223.1
Other indoors 78.1 78.1 78.1
Other outdoors 65.5 65.5 67.3
In the same room, nearby if outside
Alone 247.4 247.4 259.3
Mother, step-mother 479.1 479.1 494.1
Father, step-father 303.3 303.3 312.7
Grandparent(s)/other adult relative 65.5 65.5 65.4
Brother(s), sister(s), other children 649.3 759.6 781.9
Other adult(s) 141.6 251.9 257.2
Dog, cat or other pet (not fish) 126.2 126.2 130.2
Homework   
Activity done as part of homework 11.7 11.7 11.5

Note: a Analysis uses weights that adjust for general LSAC non-response as well as weighting each day of the week equally in the analysis. These weights are recalculated when the poor-quality cases are deleted.

Table 8e: Effect of recoding and case deletions on estimates of time use in number of minutes/day (B cohort, Wave 3)a
  Raw file After recodes After recodes and deletions
What was the child doing?
Not sure what child was doing 60.0 48.5 47.7
Sleeping, napping 626.8 635.2 646.9
Awake in bed 35.9 36.2 35.9
Eating, drinking, being fed 106.5 107.7 109.5
Bathing, dressing, hair care, health care 53.4 54.0 55.0
Doing nothing, bored/restless 3.7 3.8 3.6
Crying, upset, tantrum 4.8 4.9 4.9
Arguing, fighting 5.7 5.8 5.7
Destroy things, create mess 3.3 3.3 3.2
Being reprimanded, corrected 6.6 6.7 6.6
Being held, cuddled, comforted, soothed 28.4 28.7 29.3
Watching TV, video, DVD, movie 99.1 99.7 100.9
Listening to tapes, CDs, radio, music 15.9 16.0 15.9
Using computer, computer game 14.0 14.1 14.3
Read a story, told a story, sung to 45.9 46.3 47.2
Colour/draw, look at book, educational game 39.4 39.6 39.9
Quiet free play 65.5 65.8 67.3
Active free play 82.0 82.3 83.8
Being taught to do chores 15.3 15.4 15.4
Visiting people, special event, party 63.5 63.5 64.2
Organised lessons/activities 54.5 54.5 55.6
Travel
Walking 9.7 9.8 9.7
Travel in a pusher/bicycle seat 2.4 2.4 2.3
Travel in car 49.5 49.8 50.7
Travel on public transport 3.3 3.3 3.3
Taken places with adult (e.g. shopping) 24.8 24.8 25.1
Ride bicycle/trike, etc. 8.1 8.1 8.1
Where was the child?
Own home (indoors) 906.1 906.1 929.9
Other person's home (indoors) 57.8 57.8 57.3
Day care centre/playgroup/pre-school/school 151.9 151.9 152.7
Other indoors 43.4 43.4 44.4
Other outdoors 86.8 86.8 89.2
In the same room, nearby if outside
Alone 212.7 212.7 219.0
Mother, step-mother 690.5 690.5 707.8
Father, step-father 412.6 412.6 422.6
Grandparent(s)/other adult relative 85.4 85.4 87.1
Brother(s), sister(s), other children 719.0 778.1 798.9
Other adult(s) 132.6 191.7 194.0
Dog, cat or other pet (not fish) 160.3 160.3 165.0
Payment
Someone paid for this activity 67.3 67.3 67.8

Note: a Analysis uses weights that adjust for general LSAC non-response as well as weighting each day of the week equally in the analysis. These weights are recalculated when the poor-quality cases are deleted.

Table 8f: Effect of recoding and case deletions on estimates of time use in number of minutes/day (K cohort, Wave 3)a
  Raw file After recodes After recodes and deletions
What was the child doing?   
Not sure what child was doing 91.2 75.4 75.1
Sleeping, napping 584.1 592.4 606.6
Awake in bed 34.6 34.8 34.2
Eating and drinking 95.8 96.7 97.8
Bathing, dressing, hair care, health care 48.8 49.4 50.0
Do nothing, bored/restless 5.1 5.2 4.3
Sulking, upset 3.0 3.0 2.5
Arguing, fighting 5.2 5.3 4.8
Being hugged, comforted, etc. 11.6 11.7 11.2
Being reprimanded, corrected 6.8 6.9 6.2
Watching TV, video, DVD, movie 103.5 104.1 105.0
Listening to tapes, CDs, radio, music 11.9 12.0 11.3
Use computer/computer games 30.2 30.4 30.2
Read a story, talk/sing, talked/sung to 9.9 10.0 9.4
Reading/looking at book by self 24.7 24.9 25.0
Quiet free play 42.6 42.8 43.7
Active free play 62.6 62.8 64.0
Helping with chores/jobs 22.8 22.9 22.8
Visiting people, special event, party 60.6 60.7 60.9
Organised sport/physical activity 25.4 25.4 25.9
Other organised lessons/activities 22.5 22.6 23.0
Travel   
Walk for travel or for fun 10.1 10.2 9.8
Ride bicycle, trike, etc. (travel or fun) 12.0 12.0 11.3
Travel in car 49.0 49.3 49.2
Travel on public transport 5.5 5.5 5.5
Taken places with adult (e.g. shopping) 20.9 21.0 20.6
Where was the child?
Own home (indoors) 809.9 809.9 838.1
Own home (outdoors) 53.6 53.6 53.4
School, after/before school care 209.2 209.2 215.5
Other indoors 91.3 91.3 92.6
Other outdoors 65.7 65.7 67.4
In the same room, nearby if outside   
Alone 247.3 247.3 260.6
Mother, step-mother 524.3 524.3 542.9
Father, step-father 344.4 344.4 355.8
Grandparent(s)/other adult relative 65.1 65.1 66.0
Brother(s), sister(s) 599.6 599.6 622.0
Other children 219.1 306.1 316.6
Other adult(s) 161.8 248.8 256.7
Dog, cat or other pet (not fish) 165.5 165.5 172.6
Homework   
Activity done as part of homework 14.1 14.1 13.7

Note: a Analysis uses weights that adjust for general LSAC non-response as well as weighting each day of the week equally in the analysis. These weights are recalculated when the poor-quality cases are deleted.

Acknowledgement

This chapter is largely based on the work of Jude Brown and Michael Bittman of the University of New England. David Zago of the Australian Bureau of Statistics provided the information on the process used to scan Waves 2 and 3 forms.

1 Note that at the time of the release of Wave 2 data the Wave 2 TUDs also had these processes applied to them in order to maximise consistency. For the release of the Wave 3 data this decision was reversed. It was felt that the corrections excluded some combinations of activities that were unlikely but possible, particularly as the children became older (e.g. sleeping outside).