# Causal Inference through the Method of Direct Estimation

The intersection of causal inference and machine learning is a rapidly advancing field. We propose a new approach, the method of direct estimation, that draws on both traditions in order to obtain nonparametric estimates of treatment effects. The approach focuses on estimating the effect of fluctuations in a treatment variable on an outcome. A tensor-spline implementation enables rich interactions between functional bases allowing for the approach to capture treatment/covariate interactions. We show how new innovations in Bayesian sparse modeling readily handle the proposed framework, and then document its performance in simulation and applied examples. Furthermore we show how the method of direct estimation can easily extend to structural estimators commonly used in a variety of disciplines, like instrumental variables, mediation analysis, and sequential g-estimation.

**Comments:**Unpublished manuscript

## Similar Publications

The cardinality constraint is an intrinsic way to restrict the solution structure in many domains, for example, sparse learning, feature selection, and compressed sensing. To solve a cardinality constrained problem, the key challenge is to solve the projection onto the cardinality constraint set, which is NP-hard in general when there exist multiple overlapped cardiaiality constraints. In this paper, we consider the scenario where overlapped cardinality constraints satisfy a Three-view Cardinality Structure (TVCS), which reflects the natural restriction in many applications, such as identification of gene regulatory networks and task-worker assignment problem. Read More

Thermodynamic integration (TI) for computing marginal likelihoods is based on an inverse annealing path from the prior to the posterior distribution. In many cases, the resulting estimator suffers from high variability, which particularly stems from the prior regime. When comparing complex models with differences in a comparatively small number of parameters, intrinsic errors from sampling fluctuations may outweigh the differences in the log marginal likelihood estimates. Read More

Convex sparsity-promoting regularizations are ubiquitous in modern statistical learning. By construction, they yield solutions with few non-zero coefficients, which correspond to saturated constraints in the dual optimization formulation. Working set (WS) strategies are generic optimization techniques that consist in solving simpler problems that only consider a subset of constraints, whose indices form the WS. Read More

Many problems in image processing and computer vision (e.g. colorization, style transfer) can be posed as 'manipulating' an input image into a corresponding output image given a user-specified guiding signal. Read More

It is generally accepted that all models are wrong -- the difficulty is determining which are useful. Here, a useful model is considered as one that is capable of combining data and expert knowledge, through an inversion or calibration process, to adequately characterize the uncertainty in predictions of interest. This paper derives conditions that specify which simplified models are useful and how they should be calibrated. Read More

Variational inference methods for latent variable statistical models have gained popularity because they are relatively fast, can handle large data sets, and have deterministic convergence guarantees. However, in practice it is unclear whether the fixed point identified by the variational inference algorithm is a local or a global optimum. Here, we propose a method for constructing iterative optimization algorithms for variational inference problems that are guaranteed to converge to the $\epsilon$-global variational lower bound on the log-likelihood. Read More

Current approaches for Knowledge Distillation (KD) either directly use training data or sample from the training data distribution. In this paper, we demonstrate effectiveness of 'mismatched' unlabeled stimulus to perform KD for image classification networks. For illustration, we consider scenarios where this is a complete absence of training data, or mismatched stimulus has to be used for augmenting a small amount of training data. Read More

We study primal-dual type stochastic optimization algorithms with non-uniform sampling. Our main theoretical contribution in this paper is to present a convergence analysis of Stochastic Primal Dual Coordinate (SPDC) Method with arbitrary sampling. Based on this theoretical framework, we propose Optimality Violation-based Sampling SPDC (ovsSPDC), a non-uniform sampling method based on Optimality Violation. Read More

Recent advances in deep learning for object recognition in natural images has prompted a surge of interest in applying a similar set of techniques to medical images. Most of the initial attempts largely focused on replacing the input to such a deep convolutional neural network from a natural image to a medical image. This, however, does not take into consideration the fundamental differences between these two types of data. Read More

The recently developed variational autoencoders (VAEs) have proved to be an effective confluence of the rich representational power of neural networks with Bayesian methods. However, most work on VAEs use a rather simple prior over the latent variables such as standard normal distribution, thereby restricting its applications to relatively simple phenomena. In this work, we propose hierarchical nonparametric variational autoencoders, which combines tree-structured Bayesian nonparametric priors with VAEs, to enable infinite flexibility of the latent representation space. Read More