Maintaining Respondent Privacy and Anonymity: Guidelines for HRS Restricted Data Users
Introduction
Researchers who qualify for access to restricted data from the Health and
Retirement Study (HRS) are contractually obligated to maintain respondent
anonymity. This document is designed to help those researchers meet that
requirement by providing guidelines for implementing their own disclosure
limitation review process.
Goals of the Disclosure Limitation Review Process
- Prevent disclosure of confidential information
- Reduce the likelihood of respondent re-identification
- Provide useful data resources to researchers
- Ensure that the results of the review process are
acceptable to both the researcher and the provider(s) of the restricted
data.
Methods Used to Protect Confidentiality in HRS Data Products
- All HRS public and restricted files are directly
or indirectly based on sample survey methodology.[1]
- Public file variables containing indirect identifiers
such as industry, occupation, and geographic information have been collapsed.[2]
- Microdata files derived from SSA administrative
data (Wage and Self-Employment Income, Earnings, Benefits, and SSI records) have been subjected to rounding
and top-coding in accordance with the governing Memorandum of Understanding.[3]
- Direct respondent identifiers such as name, address,
SSN, Medicare/Medicaid identifier, and place of birth have been removed
from all public microdata products, and limitations have been placed on
access to detailed geographic information.[4]
- Data items at the respondent level related to sample
design, such as PSU, segment, and line, are not distributed to users.
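To illustrate what rounding and top-coding do to a single dollar amount, here is a minimal Python sketch. The cap and rounding base are hypothetical placeholders: the actual values are set by the governing Memorandum of Understanding and are not stated in this document. The function name and the order of operations (cap first, then round) are also assumptions.

```python
def top_code_and_round(amount: int, cap: int, base: int) -> int:
    """Top-code a non-negative whole-dollar amount at `cap`, then round it
    to the nearest multiple of `base` (halves round up).

    `cap` and `base` are illustrative parameters, not the values used by HRS.
    """
    if amount < 0 or cap < 0 or base <= 0:
        raise ValueError("amount and cap must be >= 0 and base must be > 0")
    capped = min(amount, cap)  # top-coding: values above the cap become the cap
    # Integer arithmetic avoids the round-half-to-even behavior of round().
    return (capped + base // 2) // base * base


# Hypothetical parameters chosen only for this example.
EXAMPLE_CAP = 100_000
EXAMPLE_BASE = 100

print(top_code_and_round(45_678, EXAMPLE_CAP, EXAMPLE_BASE))   # 45700
print(top_code_and_round(123_456, EXAMPLE_CAP, EXAMPLE_BASE))  # 100000
```

Both operations coarsen the data: top-coding hides extreme values that could single out a respondent, and rounding removes exact amounts that could be matched against external records.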
Protecting Confidentiality During Analysis
- Researchers should only publish statistical summary
information (frequency tabulations, magnitude tabulations, means, variances,
regression coefficients, and correlation coefficients) that does not permit
the identification of any individual person, family, household, employer
or benefit provider.
- Any file that results from a merge process that
includes restricted data input should be treated as restricted.
- Researchers should not publish the results of any
analysis that can potentially identify respondents, either directly or inferentially.
- Researchers are prohibited from publishing results
that identify geographic areas below the level of Census Division. Under
certain circumstances restricted data users with access to state-level geographic
information may wish to report state-level summary information. In such
cases, analysis results must be submitted to the Health and Retirement Study
for review and approval prior to presentation or publication.
- When producing tabulations for distribution, the
following guidelines should be employed:
  - Magnitude Data: Ensure that no cells/strata with
  n < 3 are produced.[5]
  - Frequency Data: Apply a marginal threshold of
  n >= 5 and a cell threshold of n >= 3 to all tabulations.[6]
- Certain types of cross-category merges (e.g., state-level geographic data
with Social Security administrative data) are not allowed
under the standard restricted data agreement (see Merge Rule Cross-Reference
Table, below). Researchers are reminded that geographic information may
not be used in conjunction with files derived from Social Security administrative
data without written permission from the Social Security Administration.
- Analysis results containing merged area data based
on geographic information may be reported if there is no direct identification
of geographic areas, if geographic areas are reported using the same grouping
characteristics as public files, or if special approval has been granted
by the HRS Data Confidentiality Committee. When using geocodes to link respondent
information to area data, make sure that respondent privacy is not inadvertently
compromised by reporting unique area data values (e.g., including census tracts
with unusual environmental characteristics in data analysis reports).
- Researchers may wish to recode or collapse certain
high visibility variables such as Cause of Death or Medical Condition before
reporting analysis results using such variables.[7]
- All published research resulting from restricted
data analysis should be reviewed according to the terms of the Agreement
For Use of Restricted Data From the Health and Retirement Study.
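The frequency-tabulation thresholds listed above (cell n >= 3, marginal n >= 5) lend themselves to an automated pre-check before a table is submitted for review. The following Python sketch shows one way to do that for a two-way table of counts. The function name and return format are hypothetical, and flagging empty cells (n = 0) as below threshold is an assumption that should be confirmed with HRS.

```python
from typing import List, Tuple

CELL_MIN = 3      # cell threshold stated in the guidelines (n >= 3)
MARGINAL_MIN = 5  # marginal threshold stated in the guidelines (n >= 5)


def find_threshold_violations(counts: List[List[int]]) -> List[Tuple[str, object, int]]:
    """Return (kind, position, n) for every cell or marginal below threshold.

    `counts` is a rectangular two-way frequency table given as a list of rows.
    Assumption: empty cells (n = 0) are flagged like any other small cell.
    """
    violations: List[Tuple[str, object, int]] = []

    # Check each interior cell against the cell threshold.
    for r, row in enumerate(counts):
        for c, n in enumerate(row):
            if n < CELL_MIN:
                violations.append(("cell", (r, c), n))

    # Check row totals against the marginal threshold.
    for r, row in enumerate(counts):
        total = sum(row)
        if total < MARGINAL_MIN:
            violations.append(("row", r, total))

    # Check column totals against the marginal threshold.
    for c, col in enumerate(zip(*counts)):
        total = sum(col)
        if total < MARGINAL_MIN:
            violations.append(("column", c, total))

    return violations


table = [[3, 1],
         [3, 3]]
print(find_threshold_violations(table))
# [('cell', (0, 1), 1), ('row', 0, 4), ('column', 1, 4)]
```

An empty result only means the table passes these two numeric checks. It does not replace the review required under the restricted data agreement.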
Merge Rule Cross-Reference Table[8]
| | HRS Public Data | Geographic Information | SSA Administrative Data Sets | Medicare Summary and Claim Records | National Death Index | Other HRS Restricted Data Sets |
|---|---|---|---|---|---|---|
| HRS Public Data | Unrestricted | Class 1 | Class 2 | Class 2 | Class 2 | Class 1 |
| Geographic Information | Class 1 | n.a. | Class 3 | Class 2 | Class 2 | Class 1 |
| SSA Administrative Data Sets[9] | Class 2 | Class 3 | n.a. | Class 3 | Class 2 | Class 2 |
| Medicare Summary and Claim Records | Class 2 | Class 2 | Class 3 | n.a. | Class 2 | Class 2 |
| National Death Index | Class 2 | Class 2 | Class 2 | Class 2 | n.a. | Class 2 |
| Other HRS Restricted Data Sets | Class 1 | Class 1 | Class 2 | Class 2 | Class 2 | n.a. |
Unrestricted: Public data sets (including Sensitive Health Data products) provided by the Health
and Retirement Study that may be merged with any restricted data set. Sensitive Health Data products
are provided to researchers under terms of a data use agreement signed by the researcher and HRS.
Class 1: Restricted data sets provided by the Health and Retirement Study for merging
with HRS public data sets under terms of HRS Data Confidentiality Rules.
Class 2: Restricted data sets provided to HRS for merging with HRS public data sets
under terms of a negotiated MOU or DUA.
Class 3: Special case. Merging procedures and disclosure review rules are
based on negotiations among HRS, restricted data provider(s), and researcher.
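Because the cross-reference table is symmetric, it can be encoded as a lookup keyed on unordered pairs of data categories. The Python sketch below transcribes the table for use in a project's own pre-merge checks. The constant and function names are hypothetical, and the sketch is only a convenience lookup: the table and the governing agreements remain the authority.

```python
from typing import Dict, FrozenSet

PUBLIC = "HRS Public Data"
GEO = "Geographic Information"
SSA = "SSA Administrative Data Sets"
MEDICARE = "Medicare Summary and Claim Records"
NDI = "National Death Index"
OTHER = "Other HRS Restricted Data Sets"

CATEGORIES = (PUBLIC, GEO, SSA, MEDICARE, NDI, OTHER)

# Off-diagonal cells of the table, one entry per unordered pair.
MERGE_CLASS: Dict[FrozenSet[str], str] = {
    frozenset({PUBLIC, GEO}): "Class 1",
    frozenset({PUBLIC, SSA}): "Class 2",
    frozenset({PUBLIC, MEDICARE}): "Class 2",
    frozenset({PUBLIC, NDI}): "Class 2",
    frozenset({PUBLIC, OTHER}): "Class 1",
    frozenset({GEO, SSA}): "Class 3",
    frozenset({GEO, MEDICARE}): "Class 2",
    frozenset({GEO, NDI}): "Class 2",
    frozenset({GEO, OTHER}): "Class 1",
    frozenset({SSA, MEDICARE}): "Class 3",
    frozenset({SSA, NDI}): "Class 2",
    frozenset({SSA, OTHER}): "Class 2",
    frozenset({MEDICARE, NDI}): "Class 2",
    frozenset({MEDICARE, OTHER}): "Class 2",
    frozenset({NDI, OTHER}): "Class 2",
}


def merge_class(a: str, b: str) -> str:
    """Return the table entry for merging category `a` with category `b`."""
    for name in (a, b):
        if name not in CATEGORIES:
            raise KeyError(f"unknown data category: {name!r}")
    if a == b:
        # Diagonal cells: only public-with-public has an entry other than n.a.
        return "Unrestricted" if a == PUBLIC else "n.a."
    return MERGE_CLASS[frozenset({a, b})]


print(merge_class(GEO, SSA))     # Class 3
print(merge_class(OTHER, PUBLIC))  # Class 1
```

Storing each pair once keeps the lookup consistent regardless of argument order, which mirrors the symmetry of the table.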
[1] Report on Statistical Disclosure Limitation Methodology (Working Paper 22),
Federal Committee on Statistical Methodology, Office of Management and Budget, May 1994, Chapter II.C.1.
[2] ibid., Chapter II.C.2
[3] ibid., Chapter II.E.2.a
[4] ibid., Chapter II.E.1
[5] ibid., Chapter III.B.1, Summary of Agency Practices.
[6] ibid., Chapter III.B.1, Summary of Agency Practices.
[7] ibid., Chapter II.E.2
[8] See the HRS File Merge Cross-Reference Table for further details.
[9] HRS cohort (respondents): Covered Earnings, RDSI Benefits, SSI Benefits, Summary of Earnings and Projected Benefits file.
AHEAD cohort (respondents and deceased spouses): Covered Earnings, RDSI Benefits, SSI Benefits, and Wage and Self-Employment Income.
CODA/War Baby cohorts (respondents and deceased spouses): Covered Earnings, RDSI Benefits, Payment History, and Wage and Self-Employment Income.