Luke W. Miratrix - University of California, Berkeley; Department of Statistics

Contact Details

Luke W. Miratrix
University of California, Berkeley; Department of Statistics
United States

Pub Categories

Statistics - Methodology (10)
Statistics - Applications (7)
Computer Science - Computation and Language (3)
Statistics - Theory (3)
Mathematics - Statistics (3)
Computer Science - Information Retrieval (2)
Computer Science - Learning (1)

Publications Authored By Luke W. Miratrix

Regression discontinuity designs (RDDs) are natural experiments where treatment assignment is determined by a covariate value (or "running variable") being above or below a predetermined threshold. Because the treatment effect will be confounded by the running variable, RDD analyses focus on the local average treatment effect (LATE) at the threshold. The most popular methodology for estimating the LATE in an RDD is local linear regression (LLR), which is a weighted linear regression that places larger weight on units closer to the threshold.
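As a rough illustration of the LLR idea described above, the following sketch fits a separate weighted line on each side of the threshold and takes the difference of the two fitted values at the threshold. The triangular kernel, the fixed bandwidth, and the function names `weighted_linear_fit` and `llr_late` are assumptions made for this example; the abstract does not specify them.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares of y on (1, x); returns (intercept, slope)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx  # needs at least two distinct x values with weight > 0
    return ybar - slope * xbar, slope


def llr_late(running, outcome, threshold=0.0, bandwidth=1.0):
    """Estimate the jump in the outcome at the threshold (hypothetical helper)."""
    # Collect centered running variable, outcome, and kernel weight per side.
    sides = {True: ([], [], []), False: ([], [], [])}
    for r, y in zip(running, outcome):
        d = r - threshold
        # Triangular kernel (an assumed choice): weight shrinks linearly
        # with distance from the threshold and is zero beyond the bandwidth.
        w = max(0.0, 1.0 - abs(d) / bandwidth)
        if w > 0:
            xs, ys, ws = sides[d >= 0]
            xs.append(d)
            ys.append(y)
            ws.append(w)
    # Because x is centered at the threshold, each intercept is the fitted
    # outcome at the threshold itself.
    above, _ = weighted_linear_fit(*sides[True])
    below, _ = weighted_linear_fit(*sides[False])
    return above - below


running = [-0.8, -0.5, -0.2, 0.1, 0.4, 0.7]
outcome = [1 + 2 * r + (3 if r >= 0 else 0) for r in running]
print(llr_late(running, outcome, threshold=0.0, bandwidth=1.0))
```

On this noiseless toy data the outcome is exactly linear on each side with a jump of 3 at the threshold, so the estimate should equal 3 up to floating-point error. Real analyses also need a principled bandwidth choice and a variance estimate, which this sketch omits.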

The popularity of online surveys has increased the prominence of sampling weights in claims of representativeness. Yet, much uncertainty remains regarding how these weights should be employed in the analysis of survey experiments: Should they be used or ignored? If they are used, which estimators are preferred? We offer practical advice, rooted in the Neyman-Rubin model, for researchers producing and working with survey experimental data. We examine simple, efficient estimators (Horvitz-Thompson, Hájek, "double-Hájek", and post-stratification) for analyzing these data, along with formulae for biases and variances.
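To make the weighted estimators named above more concrete, here is a minimal sketch of the textbook Horvitz-Thompson and Hájek estimators of a mean, plus a difference of two Hájek-style means across treatment arms. The function names are hypothetical, and the last function is only our reading of what a weight-normalized treatment-effect estimator could look like; the abstract does not give the paper's exact definitions.

```python
def horvitz_thompson_mean(y, w, population_size):
    """Weighted total divided by a known population size.

    w holds sampling weights (inverse inclusion probabilities).
    """
    return sum(wi * yi for wi, yi in zip(w, y)) / population_size


def hajek_mean(y, w):
    """Weighted total divided by the sum of the weights in the sample."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def hajek_difference(y, z, w):
    """Difference of weight-normalized means between arms (assumed form).

    z is 1 for treated units and 0 for control units.
    """
    y_t = [yi for yi, zi in zip(y, z) if zi == 1]
    w_t = [wi for wi, zi in zip(w, z) if zi == 1]
    y_c = [yi for yi, zi in zip(y, z) if zi == 0]
    w_c = [wi for wi, zi in zip(w, z) if zi == 0]
    return hajek_mean(y_t, w_t) - hajek_mean(y_c, w_c)


y = [1, 3, 2, 6]
z = [1, 1, 0, 0]
w = [1, 3, 2, 2]
print(hajek_difference(y, z, w))
```

The Horvitz-Thompson form requires the population size, while the Hájek form normalizes by the realized sum of weights, which is why the two can disagree when the weights do not sum to the population size.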

There are two general views in causal analysis of experimental data: the super population view that the units are an independent sample from some hypothetical infinite population, and the finite population view that the potential outcomes of the experimental units are fixed and the randomness comes solely from the physical randomization of the treatment assignment. These two views differ conceptually and mathematically, resulting in different sampling variances of the usual difference-in-means estimator of the average causal effect. Practically, however, these two views result in identical variance estimators.
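For reference, the difference-in-means estimator and the variance estimator usually paired with it (the sum of the within-arm sample variances, each divided by its arm size) can be sketched as follows. This is the standard textbook form, assumed here to be the estimator the abstract has in mind; the function name `neyman_estimate` is hypothetical.

```python
from statistics import mean, variance


def neyman_estimate(treated, control):
    """Return (difference in means, estimated variance of that difference)."""
    estimate = mean(treated) - mean(control)
    # statistics.variance uses the n - 1 denominator (sample variance).
    var_hat = variance(treated) / len(treated) + variance(control) / len(control)
    return estimate, var_hat


estimate, var_hat = neyman_estimate([3, 5, 7], [1, 2, 3])
print(estimate, var_hat)
```

Each arm needs at least two observations for the sample variance to be defined.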

Estimating treatment effects for subgroups defined by post-treatment behavior (i.e., estimating causal effects in a principal stratification framework) can be technically challenging and heavily reliant on strong assumptions.

Latent Dirichlet Allocation (LDA) models trained without stopword removal often produce topics with high posterior probabilities on uninformative words, obscuring the underlying corpus content. Even when canonical stopwords are manually removed, uninformative words common in that corpus will still dominate the most probable words in a topic. We propose a simple strategy for automatically promoting terms with domain relevance and demoting these domain-specific stop words.

Researchers addressing post-treatment complications in randomized trials often turn to principal stratification to define relevant assumptions and quantities of interest. One approach for estimating causal effects in this framework is to use methods based on the "principal score," typically assuming that stratum membership is as-good-as-randomly assigned given a set of covariates. In this paper, we clarify the key assumption in this context, known as Principal Ignorability, and argue that versions of this assumption are quite strong in practice.

Two common concerns raised in analyses of randomized experiments are (i) appropriately handling issues of non-compliance, and (ii) appropriately adjusting for multiple tests (e.g., on multiple outcomes or subgroups).

Understanding and characterizing treatment effect variation in randomized experiments has become essential for going beyond the "black box" of the average treatment effect. Nonetheless, traditional statistical approaches often ignore or assume away such variation. In the context of a randomized experiment, this paper proposes a framework for decomposing overall treatment effect variation into a systematic component that is explained by observed covariates, and a remaining idiosyncratic component.

Principal stratification is a widely used framework for addressing post-randomization complications in a principled way. After using principal stratification to define causal effects of interest, researchers are increasingly turning to finite mixture models to estimate these quantities. Unfortunately, standard estimators of the mixture parameters, like the MLE, are known to exhibit pathological behavior.

We propose a general framework for topic-specific summarization of large text corpora, and illustrate how it can be used for analysis in two quite different contexts: an OSHA database of fatality and catastrophe reports (to facilitate surveillance for patterns in circumstances leading to injury or death) and legal decisions on workers' compensation claims (to explore relevant case law). Our summarization framework, built on sparse classification methods, is a compromise between simple word frequency based methods currently in wide use, and more heavyweight, model-intensive methods such as Latent Dirichlet Allocation (LDA). For a particular topic of interest (e. ...

In randomized experiments with noncompliance, tests may focus on compliers rather than on the overall sample. Rubin (1998) put forth such a method, and argued that testing for the complier average causal effect and averaging permutation based p-values over the posterior distribution of the compliance status could increase power, as compared to general intent-to-treat tests. The general scheme is to repeatedly do a two-step process of imputing missing compliance statuses and conducting a permutation test with the completed data.

We consider the conditional randomization test as a way to account for covariate imbalance in randomized experiments. The test accounts for covariate imbalance by comparing the observed test statistic to the null distribution of the test statistic conditional on the observed covariate imbalance. We prove that the conditional randomization test has the correct significance level and introduce original notation to describe covariate balance more formally.
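One simple way to picture the conditioning step described above is with a single discrete covariate: the null distribution is built only from assignments that keep the same number of treated units within each covariate level as was actually observed. The sketch below enumerates those assignments exactly for a small sample. Treating "observed covariate imbalance" as the per-level treated counts, using the absolute difference in means as the test statistic, and the function names are all assumptions made for this illustration, not the paper's formal definitions.

```python
from itertools import combinations, product


def diff_in_means(y, z):
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    control = [yi for yi, zi in zip(y, z) if zi == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def conditional_randomization_pvalue(y, z, stratum):
    """Exact two-sided p-value under the sharp null of no effect.

    Only assignments with the observed number of treated units in each
    stratum are considered (the assumed notion of covariate balance).
    """
    observed = abs(diff_in_means(y, z))
    # Group unit indices by covariate level.
    groups = {}
    for i, s in enumerate(stratum):
        groups.setdefault(s, []).append(i)
    # For each level, list every way to pick the observed number of treated.
    choices = [
        list(combinations(idx, sum(z[i] for i in idx)))
        for idx in groups.values()
    ]
    extreme = total = 0
    for treated_sets in product(*choices):
        treated = {i for group in treated_sets for i in group}
        z_star = [1 if i in treated else 0 for i in range(len(y))]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(diff_in_means(y, z_star)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


y = [5, 6, 1, 2, 7, 8, 3, 4]
z = [1, 1, 0, 0, 1, 1, 0, 0]
stratum = [0, 0, 0, 0, 1, 1, 1, 1]
print(conditional_randomization_pvalue(y, z, stratum))
```

Exact enumeration is only feasible for small samples; with more units one would typically sample assignments at random from the same restricted set.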

Applied researchers are increasingly interested in whether and how treatment effects vary in randomized evaluations, especially variation not explained by observed covariates. We propose a model-free approach for testing for the presence of such unexplained variation. To use this randomization-based approach, we must address the fact that the average treatment effect, generally the object of interest in randomized experiments, actually acts as a nuisance parameter in this setting.

"M-Bias," as it is called in the epidemiologic literature, is the bias introduced by conditioning on a pretreatment covariate due to a particular "M-Structure" between two latent factors, an observed treatment, an outcome, and a "collider." This potential source of bias, which can occur even when the treatment and the outcome are not confounded, has been a source of considerable controversy. We here present formulae for identifying under which circumstances biases are inflated or reduced. Read More

In this paper we propose a general framework for topic-specific summarization of large text corpora and illustrate how it can be used for the analysis of news databases. Our framework, concise comparative summarization (CCS), is built on sparse classification methods. CCS is a lightweight and flexible tool that offers a compromise between simple word frequency based methods currently in wide use and more heavyweight, model-intensive methods such as latent Dirichlet allocation (LDA).

Affiliations: 1University of California, Berkeley; School of Information, 2University of California, Berkeley; Department of Statistics, 3University of California, Berkeley; Department of Statistics, 4Marin County, California; Registrar of Voters, 5Marin County, California; Registrar of Voters, 6Yolo County, California; County Clerk/Recorder, 7Santa Cruz County, California; County Clerk, 8Santa Cruz County, California; County Clerk, 9Yolo County, California; County Clerk/Recorder, 10Santa Cruz County, California; County Clerk

Risk-limiting post-election audits limit the chance of certifying an electoral outcome if the outcome is not what a full hand count would show. Building on previous work, we report on pilot risk-limiting audits in four elections during 2008 in three California counties: one during the February 2008 Primary Election in Marin County and three during the November 2008 General Elections in Marin, Santa Cruz and Yolo Counties. We explain what makes an audit risk-limiting and how existing and proposed laws fall short.