
Maintaining Respondent Privacy and Anonymity: Guidelines for HRS Restricted Data Users

Introduction

Researchers who qualify for access to restricted data from the Health and Retirement Study (HRS) are contractually obligated to maintain respondent anonymity. This document helps those researchers meet that requirement by providing guidelines for implementing their own disclosure limitation review process.

Goals of the Disclosure Limitation Review Process

  • Prevent disclosure of confidential information
  • Reduce the likelihood of respondent re-identification
  • Provide useful data resources to researchers
  • Ensure that the results of the review process are acceptable to both the researcher and the provider(s) of the restricted data

Methods Used to Protect Confidentiality in HRS Data Products

  • All HRS public and restricted files are directly or indirectly based on sample survey methodology.[1]
  • Public file variables containing indirect identifiers such as industry, occupation, and geographic information have been collapsed.[2]
  • Microdata files derived from SSA administrative data (e.g., Earnings, Benefits, and SSI records) have been subjected to rounding and top-coding in accordance with the governing Memorandum of Understanding.[3]
  • Direct respondent identifiers such as name, address, SSN, Medicare/Medicaid identifier, and place of birth have been removed from all public microdata products, and limitations have been placed on access to detailed geographic information.[4]
  • Data items at the respondent level related to sample design, such as PSU, segment, and line, are not distributed to the public.
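As an illustration of the rounding and top-coding mentioned above, the following minimal sketch caps each value at a ceiling and then rounds it to a fixed unit. The function name `round_and_top_code` and the parameters `round_to=100` and `top_code=50_000` are hypothetical placeholders chosen for this example. The actual rules are set by the governing Memorandum of Understanding and are not stated in this document.

```python
def round_and_top_code(values, round_to=100, top_code=50_000):
    """Top-code each value at `top_code`, then round to the nearest `round_to`.

    Assumes non-negative integer inputs. Both parameters are hypothetical
    placeholders, not the values used for any HRS or SSA file.
    """
    masked = []
    for value in values:
        capped = min(value, top_code)  # top-coding: cap extreme values
        # Integer rounding to the nearest multiple of round_to (halves round up).
        rounded = (capped + round_to // 2) // round_to * round_to
        masked.append(rounded)
    return masked


earnings = [12345, 12350, 75000]
masked_earnings = round_and_top_code(earnings)
```

Top-coding is applied before rounding here so that no masked value can exceed the ceiling. That ordering is a design choice of this sketch, not a documented HRS procedure.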

Protecting Confidentiality During Analysis

  • Researchers should only publish statistical summary values (frequency tabulations, magnitude tabulations, means, variances, regression coefficients, and correlation coefficients) that do not permit the identification of any individual person, family, household, employer, or benefit provider.
  • File(s) that result from any merge process which includes restricted data input should be treated as restricted.
  • Researchers should not publish the results of any analysis that can potentially identify respondents, either directly or inferentially.
  • Researchers are prohibited from publishing results that identify geographic areas below the level of Census Division. Under certain circumstances restricted data users with access to state-level geographic information may wish to report state-level summary information. In such cases, analysis results must be submitted to the Health and Retirement Study for review and approval prior to presentation or publication.
  • When producing tabulations for distribution, the following guidelines should be employed:
    • Magnitude Data: Ensure that no cells/strata with n < 3 are produced.[5]
    • Frequency Data: Apply a marginal threshold of n >= 5 and cell threshold of n >= 3 to all tabulations.[6]
  • Certain types of cross-category merges (e.g., State-level geographic data with Social Security Administrative data) are not allowed under the standard restricted data agreement (see Merge Rule Cross-Reference Table, below). Researchers are reminded that geographic information may not be used in conjunction with files derived from Social Security administrative data without written permission from the Social Security Administration.
  • Analysis results containing merged area data based on geographic information may be reported if there is no direct identification of geographic areas, if geographic areas are reported using the same grouping characteristics as public files, or if special approval has been granted by the HRS Data Confidentiality Committee. When using geocodes to link respondent information to area data, make sure that respondent privacy is not inadvertently compromised by reporting unique area data values (e.g., including census tracts with unusual environmental characteristics in data analysis reports).
  • Researchers may wish to recode or collapse certain high visibility variables such as Cause of Death or Medical Condition before reporting analysis results using such variables.[7]
  • All published research resulting from restricted data analysis should be reviewed according to the terms of the Agreement For Use of Restricted Data From the Health and Retirement Study.
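The tabulation thresholds listed above (cells with n >= 3 and marginals with n >= 5 for frequency data, and no cells or strata with n < 3 for magnitude data) can be checked mechanically before a table is distributed. The sketch below is one possible way to do so. The function names and the input layout are assumptions made for this example, and the sketch flags every cell below the threshold, including empty cells, which is an interpretation the guidelines do not spell out.

```python
CELL_MIN = 3      # cell threshold from the guidelines (n >= 3)
MARGINAL_MIN = 5  # marginal threshold from the guidelines (n >= 5)


def check_frequency_table(table):
    """Return (bad_cells, bad_rows, bad_cols) for a two-way table of counts.

    `table` is a list of equal-length rows of non-negative integers
    (a hypothetical layout chosen for this sketch).
    """
    bad_cells = [(i, j)
                 for i, row in enumerate(table)
                 for j, n in enumerate(row)
                 if n < CELL_MIN]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    bad_rows = [i for i, total in enumerate(row_totals) if total < MARGINAL_MIN]
    bad_cols = [j for j, total in enumerate(col_totals) if total < MARGINAL_MIN]
    return bad_cells, bad_rows, bad_cols


def check_magnitude_cells(cell_sizes):
    """Return labels of magnitude-table cells/strata with fewer than 3 cases.

    `cell_sizes` maps a cell label to the number of respondents in that cell.
    """
    return [label for label, n in cell_sizes.items() if n < CELL_MIN]


counts = [[12, 2],
          [1, 3]]
bad_cells, bad_rows, bad_cols = check_frequency_table(counts)
small_strata = check_magnitude_cells({"A": 10, "B": 2})
```

A table that returns any flagged cell, row, or column would need to be collapsed or suppressed before release. A check of this kind supplements, and does not replace, the review required by the restricted data agreement.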

Merge Rule Cross-Reference Table[8]

 

|  | HRS Public Data | Geographic Information | SSA Administrative Data Sets | Medicare Summary and Claim Records | National Death Index | Other HRS Restricted Data Sets |
| --- | --- | --- | --- | --- | --- | --- |
| HRS Public Data | Unrestricted | Class 1 | Class 2 | Class 2 | Class 2 | Class 1 |
| Geographic Information | Class 1 | n.a. | Class 3 | Class 2 | Class 2 | Class 1 |
| SSA Administrative Data Sets[9] | Class 2 | Class 3 | n.a. | Class 3 | Class 2 | Class 2 |
| Medicare Summary and Claim Records | Class 2 | Class 2 | Class 3 | n.a. | Class 2 | Class 2 |
| National Death Index | Class 2 | Class 2 | Class 2 | Class 2 | n.a. | Class 2 |
| Other HRS Restricted Data Sets | Class 1 | Class 1 | Class 2 | Class 2 | Class 2 | n.a. |

Unrestricted: Public data sets (including Sensitive Health Data products) provided by the Health and Retirement Study that may be merged with any restricted data set. Sensitive Health Data products are provided to researchers under terms of a data use agreement signed by the researcher and HRS.
Class 1: Restricted data sets provided by the Health and Retirement Study for merging with HRS public data sets under terms of HRS Data Confidentiality Rules.
Class 2: Restricted data sets provided to HRS for merging with HRS public data sets under terms of a negotiated Memorandum of Understanding (MOU) or data use agreement (DUA).
Class 3: Special case. Merging procedures and disclosure review rules are based on negotiations among HRS, the restricted data provider(s), and the researcher.
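For researchers who script their merge workflows, the cross-reference table above can be encoded as a small lookup so that a planned merge is classified before any files are combined. The sketch below is illustrative only. The names `CATEGORIES`, `MERGE_CLASSES`, and `merge_class` are hypothetical, and the values are transcribed from the table in this document.

```python
CATEGORIES = (
    "HRS Public Data",
    "Geographic Information",
    "SSA Administrative Data Sets",
    "Medicare Summary and Claim Records",
    "National Death Index",
    "Other HRS Restricted Data Sets",
)

# One row per category, in the same order as CATEGORIES.
MERGE_CLASSES = (
    ("Unrestricted", "Class 1", "Class 2", "Class 2", "Class 2", "Class 1"),
    ("Class 1", "n.a.", "Class 3", "Class 2", "Class 2", "Class 1"),
    ("Class 2", "Class 3", "n.a.", "Class 3", "Class 2", "Class 2"),
    ("Class 2", "Class 2", "Class 3", "n.a.", "Class 2", "Class 2"),
    ("Class 2", "Class 2", "Class 2", "Class 2", "n.a.", "Class 2"),
    ("Class 1", "Class 1", "Class 2", "Class 2", "Class 2", "n.a."),
)


def merge_class(first, second):
    """Look up the merge class for a pair of data categories.

    Raises ValueError if either name is not one of CATEGORIES.
    """
    return MERGE_CLASSES[CATEGORIES.index(first)][CATEGORIES.index(second)]


planned = merge_class("Geographic Information", "SSA Administrative Data Sets")
needs_negotiation = planned == "Class 3"
```

A "Class 3" result signals that the merge is a special case requiring negotiation among HRS, the data provider(s), and the researcher before it may proceed.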


Notes

[1] Report on Statistical Disclosure Limitation Methodology (Working Paper 22), Federal Committee on Statistical Methodology, Office of Management and Budget, May 1994, Chapter II.C.1.
[2] ibid., Chapter II.C.2.
[3] ibid., Chapter II.E.2.a.
[4] ibid., Chapter II.E.1.
[5] ibid., Chapter III.B.1, Summary of Agency Practices.
[6] ibid., Chapter III.B.1, Summary of Agency Practices.
[7] ibid., Chapter II.E.2.
[8] See the HRS File Merge Cross-Reference Table for further details.
[9] HRS cohort (respondents): Covered Earnings, RDSI Benefits, SSI Benefits, Summary of Earnings and Projected Benefits file.
AHEAD cohort (respondents and deceased spouses): Covered Earnings, RDSI Benefits, SSI Benefits, and Wage and Self-Employment Income.
CODA/War Baby cohorts (respondents and deceased spouses): Covered Earnings, RDSI Benefits, Payment History, and Wage and Self-Employment Income.
Early Boomers cohort (respondents and deceased spouses): Covered Earnings, RDSI Benefits, Payment History, and Wage and Self-Employment Income.
2006 (and thereafter) Permissions (respondents and deceased spouses): Covered Earnings, RDSI Benefits, Payment History, and Wage and Self-Employment Income.
