# Smooth Image-on-Scalar Regression for Brain Mapping

Brain mapping, which learns the statistical links between brain images and subject-level features, is an increasingly important tool in neurology and psychiatry research for realizing data-driven personalized medicine in the big data era. Taking images as responses raises several challenges: the images are high-dimensional, the number of samples is relatively small, and measurements in medical images are noisy. In this paper we propose a novel method, *Smooth Image-on-scalar Regression* (SIR), for recovering the true association between an image outcome and scalar predictors. The estimator is obtained by minimizing a mean squared error with a total variation (TV) regularization term on the predicted mean image across all subjects. It denoises the images from all subjects and at the same time returns estimates of the coefficient maps. We propose an algorithm to solve this optimization problem, which is efficient when combined with recent advances in graph fused lasso solvers. The statistical consistency of the estimator is shown via an oracle inequality. Simulation results demonstrate that the proposed method outperforms existing methods with separate denoising and regression steps. In particular, SIR shows a clear advantage in recovering signals in small regions. We apply SIR to Alzheimer's Disease Neuroimaging Initiative data and produce interpretable brain maps relating the PET image to patient-level features, including age, gender, genotype and disease group.
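
As a rough illustration of the estimator described above, the objective might be written as follows. This is only a sketch under assumed notation that does not appear in the abstract: $y_i \in \mathbb{R}^V$ is the image of subject $i$ over $V$ voxels, $x_i \in \mathbb{R}^p$ is the vector of scalar predictors, $B \in \mathbb{R}^{p \times V}$ stacks the coefficient maps, $E$ is the edge set of the voxel adjacency graph, and $\lambda \ge 0$ is a tuning parameter. The exact scaling and weighting used in the paper may differ.

$$
\hat{B} \in \arg\min_{B \in \mathbb{R}^{p \times V}} \; \frac{1}{2n} \sum_{i=1}^{n} \left\| y_i - B^\top x_i \right\|_2^2 \; + \; \lambda \sum_{i=1}^{n} \mathrm{TV}\!\left(B^\top x_i\right),
\qquad
\mathrm{TV}(u) = \sum_{(s,t) \in E} \left| u_s - u_t \right| .
$$

Under this reading, the penalty acts on each subject's predicted mean image $B^\top x_i$ rather than on the coefficient maps directly, which is consistent with the statement that the method denoises all subjects' images while estimating the coefficient maps.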

**Comments:** 18 pages

## Similar Publications

We describe a simple and effective technique, the Eigenvector Method for Umbrella Sampling (EMUS), for accurately estimating small probabilities and expectations with respect to a given target probability density. In EMUS, we apply the principle of stratified survey sampling to Markov chain Monte Carlo (MCMC) simulation: We divide the support of the target distribution into regions called strata, we use MCMC to sample (in parallel) from probability distributions supported in each of the strata, and we weight the data from each stratum to assemble estimates of general averages with respect to the target distribution. We demonstrate by theoretical results and computational examples that EMUS can be dramatically more efficient than direct Markov chain Monte Carlo when the target distribution is multimodal or when the goal is to compute tail probabilities.

Unwanted variation, including hidden confounding, is a well-known problem in many fields, particularly large-scale gene expression studies. Recent proposals to use control genes (genes assumed to be unassociated with the covariates of interest) have led to new methods to deal with this problem. Going by the moniker Removing Unwanted Variation (RUV), there are many versions: RUV1, RUV2, RUV4, RUVinv, RUVrinv, RUVfun.

We propose a fast method with statistical guarantees for learning an exponential family density model where the natural parameter is in a reproducing kernel Hilbert space, and may be infinite dimensional. The model is learned by fitting the derivative of the log density, the score, thus avoiding the need to compute a normalization constant. We improved the computational efficiency of an earlier solution with a low-rank, Nyström-like solution.

We propose an adaptive confidence interval procedure (CIP) for the coefficients in the normal linear regression model. This procedure has a frequentist coverage rate that is constant as a function of the model parameters, yet provides smaller intervals than the usual interval procedure, on average across regression coefficients. The proposed procedure is obtained by defining a class of CIPs that all have exact $1-\alpha$ frequentist coverage, and then selecting from this class the procedure that minimizes a prior expected interval width.

Quantile regression, the prediction of conditional quantiles, finds applications in various fields. Often, some or all of the variables are discrete. The authors propose two new quantile regression approaches to handle such mixed discrete-continuous data.

There have been some major advances in the theory of optimal designs for interference models. However, the majority of them focus on one-dimensional block layouts, and the study of two-dimensional interference models is quite limited, partly due to technical difficulties. This paper tries to fill this gap.

Modern applications require methods that are computationally feasible on large datasets but also preserve statistical efficiency. Frequently, these two concerns are seen as contradictory: approximation methods that enable computation are assumed to degrade statistical performance relative to exact methods. In applied mathematics, where much of the current theoretical work on approximation resides, the inputs are considered to be observed exactly.

Effect modification occurs when the effect of the treatment variable on an outcome varies according to the level of other covariates, and it often has important implications in decision making. When there are hundreds of covariates, it becomes necessary to use the observed data to select a simpler model for effect modification and then make appropriate statistical inference. A two-stage procedure is proposed to solve this problem.

Variable selection is a widely studied problem in high-dimensional statistics, primarily because estimating the precise relationship between the covariates and the response is of great importance in many scientific disciplines. However, most of the theory and methods developed towards this goal for the linear model invoke the assumption of iid sub-Gaussian covariates and errors. This paper analyzes the theoretical properties of Sure Independence Screening (SIS) (Fan and Lv [J.

Nonlinear models are frequently applied to determine the optimal supply of natural gas to a given residential unit based on economic and technical factors, or used to fit nonlinear data from biochemical and pharmaceutical assays. In this article we propose PRESS statistics and prediction coefficients, namely $P^2$ statistics, for a class of nonlinear beta regression models. We aim to use both prediction coefficients and goodness-of-fit measures as model selection criteria.