Resources for Analysis of Family Data » Merging HRS Household-Member/Child Records Longitudinally

Background

The information in this document is provided to assist analysts who need information about children or household members that was collected in an earlier wave of the study. Although this document is an early version (March 17, 2008), we thought it might be of use in its present form. Your comments, via the HRS Help Desk, are welcome.

There are currently no plans to release a revised version of the Longitudinal Other Person Number (LOPN) dataset. However, you can merge household-member/child files longitudinally without using the LOPN files, as described below. This procedure yields much the same result as using the LOPN files; the differences are that it uses 2006 and 2004 data and final-release versions of earlier data, and does not use exit files.

Evolution of Structure

When working longitudinally with household-member/child files, you should keep in mind that the structure of household-member/child files has evolved over the course of the study.

In 2002 and subsequent waves, the household-member/child files contain a separate record for each child, child's spouse/partner, and other household member. All records in the household-member/child files from 2002 on are individual records.
In prior waves (1993, 1995, 1996, 1998, and 2000), information about a non-resident child's spouse/partner is contained in the non-resident child's record, while each resident, whether a child, spouse/partner of a child, or other resident, has a separate record. In other words, for non-resident children the records in these files are couple records, while for residents they are individual records. During these waves, if a non-resident child died, the surviving non-resident spouse was assigned the deceased spouse's OPN.

The HRS household-member/child identifiers were primarily designed to link the records with other records in a given wave; they were not optimized for merging records longitudinally across waves nor were they subjected to cross-wave consistency checks. Errors in identifiers have crept in across time.

Given this and depending on how your analytic needs can be formulated, you have a number of options in approaching these data:

Begin by merging (summary) information from single-wave household-member/child records to single-wave respondent records and then merge the resulting respondent records longitudinally. Respondent records can reliably be merged longitudinally using HHID and PN. We recommend this option if it accommodates your research plan.
The household-member/child records may be merged longitudinally, without using the LOPN files, as described below. However, caveats described in the LOPN documentation do remain. The technique of matching household-member/child records by HHID, previous-wave SUBHH, and OPN to track children and household-members across waves of the study is limited:
- for persons who assumed the OPN number of their deceased spouse or partner during the 1993 to 2000 waves,
- for spouses or partners assigned a new OPN in 2002,
- for persons with more than one OPN or for OPNs used by more than one person.
You should carefully consider how these limitations might affect your proposed analysis. You can, with reasonable reliability, create records with historical data for matching with a given wave's household-member/child records.
We do not advise that you use the master file as a stand-alone file, as counting on flawless links of household-member/child records across all waves of the study can be fraught with peril and could lead to significant misinterpretations.
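
As a rough illustration of the first option, the SAS sketch below summarizes toy household-member/child records to one record per household and attaches that summary to respondent records. The data values, the RESIDENT indicator, and the data set names (kids06, resp06, kidsum06, resp06k) are hypothetical and are not taken from the HRS files. The sketch also assumes that, within a wave, household-member/child records link to respondent records by HHID and the current-wave SUBHH (here KSUBHH).

```sas
*     Toy 2006 household-member/child records (hypothetical values);
data kids06;
     length hhid $6 ksubhh $1 opn $3;
     input hhid $ ksubhh $ opn $ resident;
     datalines;
090123 1 101 0
090123 1 151 1
090123 1 152 1
090123 2 101 0
;
run;

*     Toy 2006 respondent records (hypothetical values);
data resp06;
     length hhid $6 pn $3 ksubhh $1;
     input hhid $ pn $ ksubhh $;
     datalines;
090123 010 1
090123 011 1
090123 020 2
;
run;

*     Summarize child records to one record per HHID KSUBHH;
proc sort data=kids06; by hhid ksubhh; run;

proc means data=kids06 noprint;
     by hhid ksubhh;
     var resident;
     output out=kidsum06(drop=_type_ _freq_) n=nkids sum=nresident;
run;

*     Attach the household summary to each respondent record;
proc sort data=resp06; by hhid ksubhh; run;

data resp06k;
     merge resp06(in=r) kidsum06;
     by hhid ksubhh;
     if r;
run;
```

A respondent-level file built this way for each wave could then be merged across waves by HHID and PN.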

Master File Creation

A process to create a master file with all information about all household-members/children for all sub-samples for all the waves of the study is described below.

For the HRS sub-sample [1], interviewed in eight waves, 1992, 1994, 1996, 1998, 2000, 2002, 2004, and 2006, the merging is a seven-step process.

1. Link 2006 child records with 2004 child records using HHID, JSUBHH and OPN for HRS/AHEAD/CODA/WB/EBB.
2. Link the 2006/2004 child records with 2002 child records using HHID, HSUBHH and OPN for HRS/AHEAD/CODA/WB/EBB.
3. Link the 2006/2004/2002 child records with 2000 child records using HHID, GSUBHH and OPN for HRS/AHEAD/CODA/WB.
4. Link the 2006/2004/2002/2000 child records with 1998 child records using HHID, FSUBHH and OPN for HRS/AHEAD/CODA/WB.
5. Link HRS records from the 2006/2004/2002/2000/1998 child records with 1996 child records using HHID, ESUBHH and OPN for HRS.
6. Link the 2006/2004/2002/2000/1998/1996 child records with 1994 child records using HHID, CSUBHH and OPN for HRS.
7. Link the 2006/2004/2002/2000/1998/1996/1994 child records to 1992 child records using HHID, ASUBHH and OPN for HRS.

For the AHEAD sub-sample [2], interviewed in seven waves, 1993, 1995, 1998, 2000, 2002, 2004, and 2006, it is a six-step process: steps 1 to 4 are the same as above, followed by two additional steps.

1. Link 2006 child records with 2004 child records using HHID, JSUBHH and OPN for HRS/AHEAD/CODA/WB/EBB.
2. Link the 2006/2004 child records with 2002 child records using HHID, HSUBHH and OPN for HRS/AHEAD/CODA/WB/EBB.
3. Link the 2006/2004/2002 child records with 2000 child records using HHID, GSUBHH and OPN for HRS/AHEAD/CODA/WB.
4. Link the 2006/2004/2002/2000 child records with 1998 child records using HHID, FSUBHH and OPN for HRS/AHEAD/CODA/WB.
5. Link AHEAD records from the 2006/2004/2002/2000/1998 child records with 1995 child records using HHID, DSUBHH and OPN for AHEAD.
6. Link the 2006/2004/2002/2000/1998/1995 child records with 1993 child records using HHID, BSUBHH and OPN for AHEAD.

For the CODA [3] and WB [4] sub-samples, interviewed in five waves, 1998, 2000, 2002, 2004, and 2006, it is a four-step process, steps 1 to 4 above.

1. Link 2006 child records with 2004 child records using HHID, JSUBHH and OPN for HRS/AHEAD/CODA/WB/EBB.
2. Link the 2006/2004 child records with 2002 child records using HHID, HSUBHH and OPN for HRS/AHEAD/CODA/WB/EBB.
3. Link the 2006/2004/2002 child records with 2000 child records using HHID, GSUBHH and OPN for HRS/AHEAD/CODA/WB.
4. Link the 2006/2004/2002/2000 child records with 1998 child records using HHID, FSUBHH and OPN for HRS/AHEAD/CODA/WB.

For the EBB sub-sample [5], interviewed in three waves, 2002, 2004, and 2006, it is a two-step process, steps 1 and 2 above.

1. Link 2006 child records with 2004 child records using HHID, JSUBHH and OPN for HRS/AHEAD/CODA/WB/EBB.
2. Link the 2006/2004 child records with 2002 child records using HHID, HSUBHH and OPN for HRS/AHEAD/CODA/WB/EBB.

   Notes:
  [1] The HRS sub-sample may be identified by specifying WHERE SUBSTR(HHID,1,1) EQ '0'.
  [2] The AHEAD sub-sample may be identified by specifying WHERE SUBSTR(HHID,1,2) EQ '20'.
  [3] The CODA sub-sample may be identified by specifying WHERE SUBSTR(HHID,1,2) EQ '21'.
  [4] The WB sub-sample may be identified by specifying WHERE SUBSTR(HHID,1,1) EQ '1'.
  [5] The EBB sub-sample may be identified by specifying WHERE SUBSTR(HHID,1,1) EQ '5'.
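
The WHERE conditions in the notes above can also be folded into a single classification variable, which may be convenient for checking counts by sub-sample. The sketch below is only an illustration: the HHID values are made up, and the data set name (toyhh) and variable name (subsamp) are assumptions, not names used in the HRS files.

```sas
*     Classify toy HHIDs into sub-samples using the rules in notes 1 to 5;
data toyhh;
     length hhid $6 subsamp $5;
     input hhid $;
     if substr(hhid,1,1) eq '0' then subsamp='HRS';
     else if substr(hhid,1,2) eq '20' then subsamp='AHEAD';
     else if substr(hhid,1,2) eq '21' then subsamp='CODA';
     else if substr(hhid,1,1) eq '1' then subsamp='WB';
     else if substr(hhid,1,1) eq '5' then subsamp='EBB';
     else subsamp='?';
     datalines;
090123
200456
210789
150321
500654
;
run;

proc freq data=toyhh; table subsamp / missing nopercent; run;
```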

At each step, if a record has no match in the earlier wave, the earlier wave's previous-wave xSUBHH is filled in with the value of the xSUBHH used for the current link, so that linking can continue in the next step. For example, for children or household-members for whom a report was obtained in 2006 and 2002, but not 2004, when merging 2006 data with 2004 data, HSUBHH (the 2002 SUBHH) is assigned the value of JSUBHH (the 2004 SUBHH), so that the next step can link to the 2002 data. The tables below list the current-wave and previous-wave SUBHH for each wave of the study.

HRS/AHEAD/CODA/WB/EBB sub-samples
Wave	SubHH	Previous SubHH
2006	KSUBHH	JSUBHH
2004	JSUBHH	HSUBHH
2002	HSUBHH	GSUBHH

HRS/AHEAD/CODA/WB sub-samples
Wave	SubHH	Previous SubHH
2000	GSUBHH	FSUBHH
1998	FSUBHH	ESUBHH or DSUBHH

HRS sub-sample
Wave	SubHH	Previous SubHH
1996	ESUBHH	CSUBHH
1994	CSUBHH	ASUBHH
1992	ASUBHH	...

AHEAD sub-sample
Wave	SubHH	Previous SubHH
1995	DSUBHH	BSUBHH
1993	BSUBHH	...
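
The carry-forward of SUBHH across a skipped wave can be traced on a small toy example. In the sketch below, child 101 is reported in 2006 and 2002 but not in 2004. All data values and the data set names (t06, t04, t02, t0604, t0602) are hypothetical, and only the identifiers and presence flags are kept.

```sas
*     Toy records: child 101 has no 2004 record (hypothetical values);
data t06;
     length hhid $6 jsubhh ksubhh $1 opn $3;
     input hhid $ jsubhh $ opn $ ksubhh $;
     da06=1;
     datalines;
090123 0 101 0
090123 0 102 0
;
run;

data t04;
     length hhid $6 jsubhh hsubhh $1 opn $3;
     input hhid $ jsubhh $ opn $ hsubhh $;
     da04=1;
     datalines;
090123 0 102 0
;
run;

data t02;
     length hhid $6 hsubhh $1 opn $3;
     input hhid $ hsubhh $ opn $;
     da02=1;
     datalines;
090123 0 101
090123 0 102
;
run;

*     Step 1: link 2006 to 2004 and carry JSUBHH into HSUBHH when 2004 is missing;
proc sort data=t06; by hhid jsubhh opn; run;
proc sort data=t04; by hhid jsubhh opn; run;

data t0604;
     merge t06 t04(in=a);
     by hhid jsubhh opn;
     if not a then hsubhh=jsubhh;
run;

*     Step 2: link to 2002 using the (possibly carried-forward) HSUBHH;
proc sort data=t0604; by hhid hsubhh opn; run;
proc sort data=t02; by hhid hsubhh opn; run;

data t0602;
     merge t0604 t02;
     by hhid hsubhh opn;
run;

proc print data=t0602 noobs;
     var hhid opn ksubhh jsubhh hsubhh da06 da04 da02;
run;
```

In the printed result, child 101 should show DA06=1, a missing DA04, and DA02=1, because HSUBHH was filled from JSUBHH before the second link.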

Finally, to create a master file, concatenate merged records for all sub-samples:

for EBB, select the EBB records from the 2006/2004/2002 file,
for CODA/WB, select the CODA/WB records from the 2006/2004/2002/2000/1998 file,
for HRS, use the entire 2006/2004/2002/2000/1998/1996/1994/1992 file, and
for AHEAD, use the entire 2006/2004/2002/2000/1998/1995/1993 file.

From this master file you can subset records and create variables with information from earlier waves. SAS code to create a master file containing household-member/child records for all waves for all sub-samples is provided at the end of this document. You can modify it to include variables of analytic interest to you. If you are using SPSS or Stata, you would want to specify equivalent statements.

From 1996 to 2006, children's education was obtained only for children under 30 or new children. In order to categorize the 2006 children by education you would need to merge files from all the way back to 1992, linking household-member/child records across time. You can create a 2006 historical file from the master file by selecting records with data from 2006, e.g., WHERE DA06 EQ 1, that will match a 2006 household-member/child record and assign values for "historical" variables with code like that illustrated below. For instance, you can create an education variable by taking the most recent available education report (and taking into account different, if any, code frames over the years). See sample code at the end of this document.

If you are doing analysis of pre-2006 household-member/child records, for instance 2000, you can create a 2000 historical file by selecting records with data from 2000 from the master file, e.g., WHERE DA00 EQ 1, and eliminating duplicate records that are an artifact of a split in the household in a subsequent wave. As described below, duplicate records may occur because the household split in a subsequent wave and provided more than one report for a child, e.g., a dad-says record and a mom-says record. The resulting file, with duplicates omitted, can be matched by HHID GSUBHH OPN, one-to-one, to the 2000 household-member/child files. See sample code at the end of this document.

When a couple splits after the first wave of the study, a single child may have two records in later waves, one reported by each former partner. Thus, in a given wave, one HHID OPN combination can represent EITHER one person OR two different people. The following illustration (a couple divorces, one respondent remarries, and both split-off households gain new members) may help you visualize the situation. (Names are for illustration only.) In 1994, after the split, OPN 101 represents two reports of one person (Susan, reported by both Joe and Carol), while OPN 151 represents two different people: Joe's new stepchildren were assigned a CSUBHH of 1 and OPNs of 151 and 152, and Carol's mother was assigned a CSUBHH of 2 and an OPN of 151.

HHID and LOPN uniquely identify a single child or household-member across all waves of the study. The first digit of the LOPN variable is the SUBHH in which the child or household-member entered the study. The remaining three digits are the individual's OPN. LOPN is not needed to merge files for most types of analysis. For most types of merging you will need to use HHID, xSUBHH and OPN.

Example

Wave 1 - 1992
Household records
  HHID=090123 ASUBHH=0
Respondent records
  HHID=090123 PN=010 ASUBHH=0 (Joe)
  HHID=090123 PN=020 ASUBHH=0 (Carol)
Household member/child records
  HHID=090123 ASUBHH=0 OPN=101 (Susan daughter)

Wave 2 - 1994
Household records
  HHID=090123 CSUBHH=1
  HHID=090123 CSUBHH=2
Respondent records
  HHID=090123 PN=010 CSUBHH=1 (Joe)
  HHID=090123 PN=011 CSUBHH=1 (Amy - new wife)
  HHID=090123 PN=020 CSUBHH=2 (Carol)
Household member/child records
  HHID=090123 CSUBHH=1 ASUBHH=0 OPN=101 (Susan daughter - Joe's report)
  HHID=090123 CSUBHH=1 ASUBHH=0 OPN=151 (Joe's stepchild)
  HHID=090123 CSUBHH=1 ASUBHH=0 OPN=152 (Joe's stepchild)
  HHID=090123 CSUBHH=2 ASUBHH=0 OPN=101 (Susan daughter - Carol's report)
  HHID=090123 CSUBHH=2 ASUBHH=0 OPN=151 (Carol's mom)

Merged Master file
  HHID=090123 OPN=101 LOPN=0101 DA92=1 DA94=1 ASUBHH=0 CSUBHH=1 (Susan daughter - Joe's report)
  HHID=090123 OPN=151 LOPN=1151 DA92=. DA94=1 ASUBHH=0 CSUBHH=1 (Joe's stepchild)
  HHID=090123 OPN=152 LOPN=1152 DA92=. DA94=1 ASUBHH=0 CSUBHH=1 (Joe's stepchild)
  HHID=090123 OPN=101 LOPN=0101 DA92=1 DA94=1 ASUBHH=0 CSUBHH=2 (Susan daughter - Carol's report)
  HHID=090123 OPN=151 LOPN=2151 DA92=. DA94=1 ASUBHH=0 CSUBHH=2 (Carol's mom)
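
The 1994 records in the example above can be used to see why HHID and OPN alone do not identify a person. The sketch below is only an illustration: it retypes the five example records as toy data, derives LOPN from the entry-wave SUBHH, and counts records and distinct LOPN values for each HHID OPN pair. The data set name (kids94) and the variables nrec and npersons are assumptions made for this illustration.

```sas
*     The five 1994 example records, with DA92=1 for the child first reported in 1992;
data kids94;
     length hhid $6 csubhh asubhh $1 opn $3 lopn $4;
     input hhid $ csubhh $ asubhh $ opn $ da92;
     *     LOPN is the entry-wave SUBHH followed by OPN;
     if da92 eq 1 then lopn=asubhh||opn;
     else lopn=csubhh||opn;
     datalines;
090123 1 0 101 1
090123 1 0 151 .
090123 1 0 152 .
090123 2 0 101 1
090123 2 0 151 .
;
run;

*     List HHID OPN pairs that occur more than once in the wave;
proc sql;
     select hhid, opn,
            count(*) as nrec,
            count(distinct lopn) as npersons
     from kids94
     group by hhid, opn
     having count(*) > 1;
quit;
```

Here OPN 101 has two records but a single LOPN (0101), while OPN 151 has two records and two LOPNs (1151 and 2151).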

Sample Code

SAS code is provided below. If you are using SPSS or Stata, you will need to specify equivalent statements.

*************************************************************************;
*     Merging household-member/child records
     for all sub-samples 2006 - 1992;
*************************************************************************;
*        Create annual data files including HHID, OPN, current-wave SUBHH,
     previous-wave SUBHH, an assigned variable (DAyy) to indicate presence
     of a record, and variables of analytic interest, renamed if desired;
*************************************************************************;

data da06;
     set in06.h06e_mc;
     keep hhid opn ksubhh jsubhh da06 KE029;
     da06=1;
     rename KE029 = edu06;
run;

data da04;
     set in04.h04e_mc;
     keep hhid opn jsubhh hsubhh da04 JE029;
     da04=1;
     rename JE029 = edu04;
run;

data da02;
     set in02.h02e_mc;
     keep hhid opn hsubhh gsubhh da02 HE029;
     da02=1;
     rename HE029 = edu02;
run;

data da00;
     set in00.h00d_mc;
     keep hhid opn gsubhh fsubhh da00 G2008;
     da00=1;
     rename G2008 = edu00;
run;

data da98;
     set in98.h98d_mc;
     keep hhid opn fsubhh dsubhh esubhh da98 F1792;
     da98=1;
     rename F1792 = edu98;
run;

data da96;
     set in96.h96d_mc;
     keep hhid opn esubhh csubhh da96 E1372;
     da96=1;
     rename E1372 = edu96;
run;

data da95;
     set in95.a95d_mc;
     keep hhid opn dsubhh bsubhh da95 D1402;
     da95=1;
     rename D1402 = edu95;
run;

data da94;
     set in94.W2KIDS;
     keep hhid opn csubhh asubhh da94 W8010;
     da94=1;
     rename W8010 = edu94;
run;

data da93;
     set in93.BOP21;
     keep hhid opn bsubhh da93 V423;
     da93=1;
     rename V423 = edu93;
run;

data da92;
     set in92.kids;
     keep hhid opn asubhh da92 V8009;
     da92=1;
     rename V8009 = edu92;
run;

*************************************************************************;
*     Combine annual data files;
*************************************************************************;

*........................................................................;
*     For HRS/AHEAD/CODA/WB/EBB;
*........................................................................;

*     Step 1;
proc sort data=da06; by hhid jsubhh opn; run;
proc sort data=da04; by hhid jsubhh opn; run;

data da0604;
     merge da06 da04(in=a);
     by hhid jsubhh opn;
     if not a then hsubhh=jsubhh;
run;

*     Step 2;
proc sort data=da0604; by hhid hsubhh opn; run;
proc sort data=da02; by hhid hsubhh opn; run;

data da0602;
     merge da0604 da02(in=a);
     by hhid hsubhh opn;
     if not a then gsubhh=hsubhh;
run;

*........................................................................;
*     For HRS/AHEAD/CODA/WB;
*........................................................................;

*     Step 3;
*     Omit EBB records;
data da0602hacw;
     set da0602;
     where (substr(hhid,1,1) ne '5');
run;

proc sort data=da0602hacw; by hhid gsubhh opn; run;
proc sort data=da00; by hhid gsubhh opn; run;

data da0600;
     merge da0602hacw da00(in=a);
     by hhid gsubhh opn;
     if not a then fsubhh=gsubhh;
run;

*     Step 4;
proc sort data=da0600; by hhid fsubhh opn; run;
proc sort data=da98; by hhid fsubhh opn; run;

data da0698;
     merge da0600 da98(in=a);
     by hhid fsubhh opn;
     *     Carry FSUBHH forward for both the HRS (ESUBHH) and AHEAD (DSUBHH) links;
     if not a then do;
          esubhh=fsubhh;
          dsubhh=fsubhh;
     end;
run;

*........................................................................;
*     For HRS;
*........................................................................;

*     Step 5;
*     Select HRS records;
data da0698h;
     set da0698;
     where (substr(hhid,1,1) eq '0');
run;

proc sort data=da0698h; by hhid esubhh opn; run;
proc sort data=da96; by hhid esubhh opn; run;

data da0696;
     merge da0698h da96(in=a);
     by hhid esubhh opn;
     if not a then csubhh=esubhh;
run;

*     Step 6;
proc sort data=da0696; by hhid csubhh opn; run;
proc sort data=da94; by hhid csubhh opn; run;

data da0694;
     merge da0696 da94(in=a);
     by hhid csubhh opn;
     if not a then asubhh=csubhh;
run;

*     Step 7;
proc sort data=da0694; by hhid asubhh opn; run;
proc sort data=da92; by hhid asubhh opn; run;

data da0692;
     merge da0694 da92;
     by hhid asubhh opn;
run;

*........................................................................;
*     For AHEAD;
*........................................................................;

*     Step 8;
*     Select AHEAD records;
data da0698a;
     set da0698;
     where (substr(hhid,1,2) eq '20');
run;

proc sort data=da0698a; by hhid dsubhh opn; run;
proc sort data=da95; by hhid dsubhh opn; run;

data da0695;
     merge da0698a da95(in=a);
     by hhid dsubhh opn;
     if not a then bsubhh=dsubhh;
run;

*     Step 9;
proc sort data=da0695; by hhid bsubhh opn; run;
proc sort data=da93; by hhid bsubhh opn; run;

data da0693;
     merge da0695 da93;
     by hhid bsubhh opn;
run;

*************************************************************************;
*     Finally concatenate final-stage data for all sub-samples
     to create a master file;

*     Select EBB records;
data da0602e;
     set da0602;
     where substr(hhid,1,1) eq '5';
run;

*     Select CODA/WB records from the 2006-1998 file;
data da0698cw;
     set da0698;
     where substr(hhid,1,1) eq '1' or substr(hhid,1,2) eq '21';
run;

data master;
     set
     da0602e
     da0698cw
     da0692
     da0693;

     *     assign LOPN;
     lopn='----';
     if da92 then lopn=asubhh||opn;
     else if da93 then lopn=bsubhh||opn;
     else if da94 then lopn=csubhh||opn;
     else if da95 then lopn=dsubhh||opn;
     else if da96 then lopn=esubhh||opn;
     else if da98 then lopn=fsubhh||opn;
     else if da00 then lopn=gsubhh||opn;
     else if da02 then lopn=hsubhh||opn;
     else if da04 then lopn=jsubhh||opn;
     else if da06 then lopn=ksubhh||opn;

     *     blank out assigned xSUBHH;
     if da92 eq . and da94 eq . then asubhh='';
     if da93 eq . and da95 eq . then bsubhh='';
     if da94 eq . and da96 eq . then csubhh='';
     if da95 eq . and da98 eq . then dsubhh='';
     if da96 eq . and da98 eq . then esubhh='';
     if da98 eq . and da00 eq . then fsubhh='';
     if da00 eq . and da02 eq . then gsubhh='';
     if da02 eq . and da04 eq . then hsubhh='';
     if da04 eq . and da06 eq . then jsubhh='';
run;


*************************************************************************;
*     Check distribution of historical analytic variables;

proc freq data=da06; table edu06 / missing nopercent; run;
proc freq data=da04; table edu04 / missing nopercent; run;
proc freq data=da02; table edu02 / missing nopercent; run;
proc freq data=da00; table edu00 / missing nopercent; run;
proc freq data=da98; table edu98 / missing nopercent; run;
proc freq data=da96; table edu96 / missing nopercent; run;
proc freq data=da95; table edu95 / missing nopercent; run;
proc freq data=da94; table edu94 / missing nopercent; run;
proc freq data=da93; table edu93 / missing nopercent; run;
proc freq data=da92; table edu92 / missing nopercent; run;

*************************************************************************;
*     Depending on your analytic needs,
     create wave-specific historical files
     for merging with wave-specific household-member/child files

     Two examples, for 2006 and for 2000, are given below

     The number of records in the historical file should match
     exactly the number of records in the corresponding wave-specific
     household-member/child file
     ;

*************************************************************************;
*     Create historical file for 2006;

data hist2006;
     set master;
     *     Select records that will match a 2006 record;
     where da06 eq 1;

     *   Assign education - most recent;
     edu=-1;
     if edu06 ne . then edu=edu06;
     else if edu04 ne . then edu=edu04;
     else if edu02 ne . then edu=edu02;
     else if edu00 ne . then edu=edu00;
     else if edu98 ne . then edu=edu98;
     else if edu96 ne . then edu=edu96;
     else if edu95 ne . then edu=edu95;
     else if edu94 ne . then edu=edu94;
     else if edu93 ne . then edu=edu93;
     else if edu92 ne . then edu=edu92;

     if edu eq .D then edu=98;
     if edu eq .R then edu=99;
run;

*************************************************************************;
*     Create historical file for 2000;

data hist2000a;
     set master;
     *     Select records that will match a 2000 record;
     where da00 eq 1;

     *   Assign education - most recent;
     edu=-1;
     if edu00 ne . then edu=edu00;
     else if edu98 ne . then edu=edu98;
     else if edu96 ne . then edu=edu96;
     else if edu95 ne . then edu=edu95;
     else if edu94 ne . then edu=edu94;
     else if edu93 ne . then edu=edu93;
     else if edu92 ne . then edu=edu92;

     if edu eq .D then edu=98;
     if edu eq .R then edu=99;

     *     Drop variables from "future" waves;
     drop edu02 edu04 edu06 da02 da04 da06 hsubhh jsubhh ksubhh;
run;

*     Eliminate duplicate records (from split household in subsequent wave);
proc sort
     data=hist2000a
     out=hist2000 nodupkey;
     by hhid gsubhh opn;
run;

*************************************************************************;
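
One way to check the record-count condition noted in the comments above is to match a de-duplicated historical file against the wave's household-member/child file and tabulate where each key was found. The sketch below uses toy data so that it can be run on its own. The data set names (t_kids00, t_hist00a, t_hist00, t_check00), the flag variables (inkids, inhist), and all values are hypothetical.

```sas
*     Toy stand-in for the 2000 household-member/child file;
data t_kids00;
     length hhid $6 gsubhh $1 opn $3;
     input hhid $ gsubhh $ opn $;
     datalines;
090123 1 101
090123 1 151
090123 2 101
;
run;

*     Toy stand-in for records selected from the master file,
     with one duplicate caused by a later household split;
data t_hist00a;
     length hhid $6 gsubhh $1 opn $3;
     input hhid $ gsubhh $ opn $ edu;
     datalines;
090123 1 101 12
090123 1 101 12
090123 1 151 16
090123 2 101 12
;
run;

*     Eliminate duplicates, then match one-to-one by HHID GSUBHH OPN;
proc sort data=t_hist00a out=t_hist00 nodupkey; by hhid gsubhh opn; run;
proc sort data=t_kids00; by hhid gsubhh opn; run;

data t_check00;
     merge t_kids00(in=k) t_hist00(in=h);
     by hhid gsubhh opn;
     inkids=k;
     inhist=h;
run;

*     Every record should fall in the cell inkids=1 and inhist=1;
proc freq data=t_check00; tables inkids*inhist / list missing; run;
```

If any records appear with inkids=0 or inhist=0, the two files do not match one-to-one and the selection or de-duplication step should be reviewed.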