To assess the utility of imputing race/ethnicity using U.S. Census race/ethnicity, residential address, and surname information compared to standard missing data methods in a pediatric cohort.
Data Sources/Study Setting
Electronic health record data from 30 pediatric practices, restricted to patients with known race/ethnicity.
In a simulation experiment, we constructed dichotomous and continuous outcomes with prespecified associations with known race/ethnicity. Bias was introduced by nonrandomly setting race/ethnicity to missing. We compared typical methods for handling missing race/ethnicity (multiple imputation using clinical factors alone, complete case analysis, indicator variables) to multiple imputation incorporating surname and address information.
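The bias-induction step of such a design can be illustrated with a minimal sketch. It is not the study's code: the two-group simplification of race/ethnicity, the sample size, the effect size of 0.5, and the missingness probabilities are all assumptions chosen for illustration, and only complete case analysis is shown (no imputation, no Census data).

```python
import numpy as np

# All numeric settings below are illustrative assumptions, not values from the study.
rng = np.random.default_rng(0)
n = 20_000
TRUE_EFFECT = 0.5  # assumed prespecified association between group and outcome

# Hypothetical two-level stand-in for race/ethnicity (1 = group of interest).
group = rng.binomial(1, 0.3, size=n)

# Continuous outcome constructed with the prespecified association.
y = 1.0 + TRUE_EFFECT * group + rng.normal(0.0, 1.0, size=n)


def mean_difference(outcome, grp, keep):
    """Difference in mean outcome between groups, using only rows where keep is True."""
    return outcome[keep & (grp == 1)].mean() - outcome[keep & (grp == 0)].mean()


# Nonrandom missingness: the group label is more likely to be missing for
# members of group 1 with high outcome values (depends on group and outcome).
p_missing = np.where((group == 1) & (y > 1.0), 0.6, 0.1)
observed = rng.random(n) >= p_missing

full_estimate = mean_difference(y, group, np.ones(n, dtype=bool))
cc_estimate = mean_difference(y, group, observed)  # complete case analysis

print(f"true effect:            {TRUE_EFFECT:.3f}")
print(f"full-data estimate:     {full_estimate:.3f}")
print(f"complete-case estimate: {cc_estimate:.3f}")
```

Under these assumed settings, the complete-case estimate falls below the full-data estimate because high-outcome members of one group are preferentially dropped. A missing data method is then judged by how closely it recovers the prespecified effect.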
Imputation using U.S. Census information reduced bias for both continuous and dichotomous outcomes.
Multiple imputation incorporating surname and address information reduces bias when race/ethnicity is partially and nonrandomly missing.