To evaluate the effects of specification choices on the accuracy of estimates in difference‐in‐differences models.
Process‐of‐care quality data from Hospital Compare between 2003 and 2009.
We performed a Monte Carlo simulation experiment to estimate the effect of an imaginary policy on quality. The experiment was performed for three different scenarios in which the probability of treatment was (1) unrelated to pre‐intervention performance; (2) positively correlated with pre‐intervention levels of performance; and (3) positively correlated with pre‐intervention trends in performance. We estimated alternative models that varied with respect to the choice of data intervals, the comparison group, and the method of statistical inference. We assessed estimator bias as the mean absolute deviation between estimated program effects and their true values. We evaluated the accuracy of inferences through statistical power and rates of false rejection of the null hypothesis.
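The logic of such an experiment can be illustrated with a minimal sketch. It is not the study's code: the data-generating process (a permanent unit effect plus transitory noise), the logistic assignment rule, the `selection` parameter, and all default values are assumptions chosen for illustration, and simulated units stand in for the Hospital Compare data. The function `simulate_did` is a hypothetical name. It collapses each unit to a single pre-to-post change, computes the difference-in-differences estimate, and records the mean absolute deviation from the true effect and the rate at which a nominal 5% test rejects the null of no effect.

```python
import numpy as np


def simulate_did(n_units=200, n_pre=3, n_post=3, true_effect=0.0,
                 selection=0.0, n_reps=500, seed=0):
    """Monte Carlo sketch for a two-group difference-in-differences estimator.

    selection = 0: treatment is unrelated to pre-intervention performance.
    selection > 0: units with higher pre-intervention levels are more likely
    to be treated. All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_reps)
    rejected = np.empty(n_reps, dtype=bool)
    for r in range(n_reps):
        # Hypothetical outcome model: permanent unit effect + transitory noise.
        unit_effect = rng.normal(0.0, 1.0, size=n_units)
        noise = rng.normal(0.0, 1.0, size=(n_units, n_pre + n_post))
        y = unit_effect[:, None] + noise

        # Treatment probability is logistic in the standardized pre-period mean.
        pre_mean = y[:, :n_pre].mean(axis=1)
        z = (pre_mean - pre_mean.mean()) / pre_mean.std()
        p_treat = 1.0 / (1.0 + np.exp(-selection * z))
        treated = rng.random(n_units) < p_treat

        # The imaginary policy shifts post-period outcomes of treated units.
        y[treated, n_pre:] += true_effect

        # Collapse to one pre-to-post change per unit, then difference the groups.
        change = y[:, n_pre:].mean(axis=1) - pre_mean
        d1, d0 = change[treated], change[~treated]
        estimates[r] = d1.mean() - d0.mean()

        # Unequal-variance standard error; two-sided test of a zero effect.
        se = np.sqrt(d1.var(ddof=1) / d1.size + d0.var(ddof=1) / d0.size)
        rejected[r] = abs(estimates[r]) / se > 1.96

    return {
        "mad": float(np.mean(np.abs(estimates - true_effect))),
        "rejection_rate": float(rejected.mean()),
    }


if __name__ == "__main__":
    print("random assignment:", simulate_did(selection=0.0))
    print("selection on pre-intervention levels:", simulate_did(selection=2.0))
```

With `true_effect=0.0`, the rejection rate is a false rejection rate; with a nonzero effect, it is statistical power. In this sketch, selection on levels biases the estimate through regression to the mean in the transitory noise, which is one possible mechanism and not necessarily the one operating in the study data.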
Performance of alternative specifications varied dramatically when the probability of treatment was correlated with pre‐intervention levels or trends. In these cases, propensity score matching resulted in much more accurate point estimates. The use of permutation tests resulted in lower false rejection rates for the highly biased estimators, but the use of clustered standard errors resulted in slightly lower false rejection rates for the matching estimators.
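A permutation test of the kind referred to here can be sketched as follows. This is a generic illustration under stated assumptions, not the study's procedure: it takes one pre-to-post change per unit, reassigns treatment labels at random, and compares the observed difference-in-differences estimate with the resulting placebo distribution. The function name `permutation_p_value` and the default of 999 permutations are hypothetical choices.

```python
import numpy as np


def permutation_p_value(change, treated, n_perm=999, seed=0):
    """Two-sided permutation p-value for a difference in mean changes.

    change  : one pre-to-post outcome change per unit
    treated : boolean treatment indicator per unit
    """
    rng = np.random.default_rng(seed)
    change = np.asarray(change, dtype=float)
    treated = np.asarray(treated, dtype=bool)

    observed = change[treated].mean() - change[~treated].mean()

    exceed = 0
    for _ in range(n_perm):
        # Shuffle labels, keeping the number of treated units fixed.
        shuffled = rng.permutation(treated)
        placebo = change[shuffled].mean() - change[~shuffled].mean()
        if abs(placebo) >= abs(observed):
            exceed += 1

    # Add-one correction counts the observed assignment as one permutation.
    return (1 + exceed) / (1 + n_perm)
```

Shuffling labels uniformly assumes that every unit was equally likely to be treated. When treatment depends on pre-intervention levels or trends, that assumption fails, so a test like this addresses inference only and does not remove bias in the point estimate.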
When treatment and comparison groups differed on pre‐intervention levels or trends, our results supported specifications that include matching to obtain more accurate point estimates and that use clustered standard errors or permutation tests to obtain better inference. Based on our findings, we propose a checklist for analysis.