# Decomposition analysis to identify intervention targets for reducing disparities

There has been considerable interest in using decomposition methods in epidemiology (mediation analysis) and economics (Oaxaca-Blinder decomposition) to understand how health disparities arise and how they might change upon intervention. It has not been clear when estimates from the Oaxaca-Blinder decomposition can be interpreted causally because its implementation does not explicitly address potential confounding of target variables. While mediation analysis does explicitly adjust for confounders of target variables, it does so in a way that entails equalizing confounders across racial groups, which may not reflect the intended intervention. Revisiting prior analyses in the National Longitudinal Survey of Youth on disparities in wages, unemployment, incarceration, and overall health with test scores, taken as a proxy for educational attainment, as a target intervention, we propose and demonstrate a novel decomposition that controls for confounders of test scores (measures of childhood SES) while leaving their association with race intact. We compare this decomposition with others that use standardization (to equalize childhood SES alone), mediation analysis (to equalize test scores within levels of childhood SES), and one that equalizes both childhood SES and test scores. We also show how these decompositions, including our novel proposals, are equivalent to causal implementations of the Oaxaca-Blinder decomposition.

**Comments:**John Jackson is Assistant Professor in the Departments of Epidemiology and Mental Health at the Johns Hopkins Bloomberg School of Public Health and Tyler VanderWeele is Professor in the Departments of Epidemiology and Biostatistics at the Harvard T.H. Chan School of Public Health. Correspondence to john.jackson@jhu.edu

## Similar Publications

We describe a simple and effective technique, the Eigenvector Method for Umbrella Sampling (EMUS), for accurately estimating small probabilities and expectations with respect to a given target probability density. In EMUS, we apply the principle of stratified survey sampling to Markov chain Monte Carlo (MCMC) simulation: We divide the support of the target distribution into regions called strata, we use MCMC to sample (in parallel) from probability distributions supported in each of the strata, and we weight the data from each stratum to assemble estimates of general averages with respect to the target distribution. We demonstrate by theoretical results and computational examples that EMUS can be dramatically more efficient than direct Markov chain Monte Carlo when the target distribution is multimodal or when the goal is to compute tail probabilities. Read More

Unwanted variation, including hidden confounding, is a well-known problem in many fields, particularly large-scale gene expression studies. Recent proposals to use control genes --- genes assumed to be unassociated with the covariates of interest --- have led to new methods to deal with this problem. Going by the moniker Removing Unwanted Variation (RUV), there are many versions --- RUV1, RUV2, RUV4, RUVinv, RUVrinv, RUVfun. Read More

We propose a fast method with statistical guarantees for learning an exponential family density model where the natural parameter is in a reproducing kernel Hilbert space, and may be infinite dimensional. The model is learned by fitting the derivative of the log density, the score, thus avoiding the need to compute a normalization constant. We improved the computational efficiency of an earlier solution with a low-rank, Nystr\"om-like solution. Read More

We propose an adaptive confidence interval procedure (CIP) for the coefficients in the normal linear regression model. This procedure has a frequentist coverage rate that is constant as a function of the model parameters, yet provides smaller intervals than the usual interval procedure, on average across regression coefficients. The proposed procedure is obtained by defining a class of CIPs that all have exact $1-\alpha$ frequentist coverage, and then selecting from this class the procedure that minimizes a prior expected interval width. Read More

Quantile regression, the prediction of conditional quantiles, finds applications in various fields. Often, some or all of the variables are discrete. The authors propose two new quantile regression approaches to handle such mixed discrete-continuous data. Read More

There have been some major advances in the theory of optimal designs for interference models. However, the majority of them focus on one-dimensional layout of the block and the study for two-dimensional interference model is quite limited partly due to technical difficulties. This paper tries to fill this gap. Read More

Modern applications require methods that are computationally feasible on large datasets but also preserve statistical efficiency. Frequently, these two concerns are seen as contradictory: approximation methods that enable computation are assumed to degrade statistical performance relative to exact methods. In applied mathematics, where much of the current theoretical work on approximation resides, the inputs are considered to be observed exactly. Read More

Effect modification occurs when the effect of the treatment variable on an outcome varies according to the level of other covariates and often has important implications in decision making. When there are hundreds of covariates, it becomes necessary to use the observed data to select a simpler model for effect modification and then make appropriate statistical inference. A two stage procedure is proposed to solve this problem. Read More

Variable selection is a widely studied problem in high dimensional statistics, primarily since estimating the precise relationship between the covariates and the response is of great importance in many scientific disciplines. However, most of theory and methods developed towards this goal for the linear model invoke the assumption of iid sub-Gaussian covariates and errors. This paper analyzes the theoretical properties of Sure Independence Screening (SIS) (Fan and Lv [J. Read More

Nonlinear models are frequently applied to determine the optimal supply natural gas to a given residential unit based on economical and technical factors, or used to fit biochemical and pharmaceutical assay nonlinear data. In this article we propose PRESS statistics and prediction coefficients for a class of nonlinear beta regression models, namely $P^2$ statistics. We aim at using both prediction coefficients and goodness-of-fit measures as a scheme of model select criteria. Read More