# James E. Johndrow

## Contact Details

Name: James E. Johndrow


## Pub Categories

- Mathematics - Statistics (5)
- Statistics - Theory (5)
- Statistics - Methodology (4)
- Statistics - Computation (2)
- Mathematics - Probability (1)
- Computer Science - Computational Complexity (1)
- Statistics - Machine Learning (1)
- Statistics - Applications (1)
- Computer Science - Learning (1)

## Publications Authored By James E. Johndrow

There has been substantial recent interest in record linkage: grouping the records that pertain to the same entity in a large database lacking unique identifiers. This can be viewed as a type of "microclustering," with few observations per cluster and a very large number of clusters. A variety of methods have been proposed, but there is a lack of literature providing theoretical guarantees on performance.

Predictive modeling is increasingly being employed to assist human decision-makers. One purported advantage of replacing or augmenting human judgment with computer models in high-stakes settings (such as sentencing, hiring, policing, college admissions, and parole decisions) is the perceived "neutrality" of computers. It is argued that because computer models do not hold personal prejudice, the predictions they produce will be equally free from prejudice.

Data augmentation is a common technique for building tuning-free Markov chain Monte Carlo algorithms. Although these algorithms are very popular, autocorrelations are often high in large samples, leading to poor computational efficiency. This phenomenon has been attributed to a discrepancy between Gibbs step sizes and the rate of posterior concentration.
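A canonical instance of such a sampler is the Albert-Chib data-augmentation Gibbs scheme for probit regression: latent Gaussian utilities are imputed from truncated normals, after which the coefficient update is a conjugate normal draw. The sketch below is a standard textbook construction on simulated data, not code from the paper; all data and settings are illustrative.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

# Toy probit data (hypothetical; chosen only to exercise the sampler).
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

XtX_inv = np.linalg.inv(X.T @ X)
beta = np.zeros(p)
draws = []
for it in range(500):
    # Data augmentation step: impute latent utilities z_i ~ N(x_i' beta, 1),
    # truncated to be positive when y_i = 1 and negative when y_i = 0.
    mu = X @ beta
    lo = np.where(y == 1, -mu, -np.inf)  # standardized lower bound
    hi = np.where(y == 1, np.inf, -mu)   # standardized upper bound
    z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
    # Conjugate update: beta | z is normal under a flat prior, so no tuning.
    m = XtX_inv @ X.T @ z
    beta = rng.multivariate_normal(m, XtX_inv)
    draws.append(beta)
draws = np.array(draws)
```

The slow mixing the abstract describes would show up here as high autocorrelation in `draws` when `n` is large.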

Capture-recapture methods aim to estimate the size of a closed population on the basis of multiple incomplete enumerations of individuals. In many applications, the individual probability of being recorded is heterogeneous in the population. Previous studies have suggested that it is not possible to reliably estimate the total population size when capture heterogeneity exists.
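For context, the classical two-sample setup can be summarized by Chapman's bias-corrected version of the Lincoln-Petersen estimator. It assumes homogeneous capture probability, which is precisely the assumption heterogeneity violates; the counts below are made up for illustration.

```python
def chapman_estimate(n1, n2, m):
    """Chapman's bias-corrected two-sample population size estimate.

    n1, n2: number of individuals caught in samples 1 and 2;
    m: number caught in both (marked recaptures).
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Example: 100 caught and marked, 80 caught in the second sample,
# 20 of which were marked.
print(chapman_estimate(100, 80, 20))  # -> 388.57142857142856
```

When capture probabilities vary across individuals, this estimator is biased, which motivates the heterogeneity models the abstract refers to.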

Many modern applications collect categorical data with large sample sizes and highly imbalanced categories, some of which are relatively rare. Bayesian hierarchical models are well motivated in such settings, providing an approach for borrowing information to combat data sparsity while quantifying uncertainty in estimation. However, a fundamental problem is scaling up posterior computation to massive sample sizes.

In applications where extreme dependence at different spatial locations is of interest, data are almost always time-indexed. When extremes do not occur contemporaneously, existing methods for inference and modeling in this setting often choose window sizes or introduce dependence in parameters with the goal of preserving temporal information. We propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that both the strength of tail dependence at different locations and the temporal structure of this dependence are encoded in the waiting times between exceedances of high thresholds at those locations.
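A minimal sketch of the waiting-time idea on simulated data (illustrative only; the series, thresholds, and summary are assumptions, not the paper's model): record when each series exceeds a high quantile, then measure the time from each exceedance at one location to the next exceedance at the other.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two dependent toy time series standing in for a process observed at
# two spatial locations.
n = 10_000
common = rng.standard_normal(n)
x = common + 0.5 * rng.standard_normal(n)
y = common + 0.5 * rng.standard_normal(n)

# High thresholds (98th empirical percentile) and exceedance times.
u_x, u_y = np.quantile(x, 0.98), np.quantile(y, 0.98)
t_x = np.flatnonzero(x > u_x)  # exceedance times at location 1
t_y = np.flatnonzero(y > u_y)  # exceedance times at location 2

# Waiting time from each exceedance at location 1 to the next exceedance
# at location 2; short waits suggest strong (possibly lagged) tail dependence.
idx = np.searchsorted(t_y, t_x)
valid = idx < len(t_y)
waits = t_y[idx[valid]] - t_x[valid]
print(len(waits), waits.mean())
```

Under independence, waits would look geometric with mean near 1 / (exceedance rate); clustering of short waits is the signal exploited here.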

In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive.

The Markov chain Monte Carlo method is the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov chains.
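To illustrate why approximating the kernel matters, the toy random-walk Metropolis-Hastings sketch below runs one chain with the exact log-density and one with a slightly perturbed surrogate standing in for a cheap approximate kernel. This is a generic illustration, not the paper's construction; the target, perturbation, and settings are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def mh_chain(logpdf, n_iter=20_000, step=1.0):
    """Random-walk Metropolis-Hastings; returns the sampled chain."""
    x, out = 0.0, np.empty(n_iter)
    lp = logpdf(x)
    for i in range(n_iter):
        prop = x + step * rng.standard_normal()
        lp_prop = logpdf(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject step
            x, lp = prop, lp_prop
        out[i] = x
    return out

# Exact kernel: targets N(0, 1).
exact = mh_chain(lambda x: -0.5 * x**2)
# Approximate kernel: the log-density carries a small systematic bias,
# as when a surrogate likelihood replaces the exact one.
approx = mh_chain(lambda x: -0.5 * (x - 0.05) ** 2)
print(exact.mean(), approx.mean())
```

The approximate chain converges, but to a shifted stationary distribution; quantifying that kind of estimation error is the question the abstract raises.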

Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a low-rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms.
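The low-rank tensor factorization can be made concrete with a toy latent class model: the joint pmf of several categorical variables is a mixture of product-multinomial kernels, i.e. a rank-k probability tensor. The parameters below are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Latent class model for 3 categorical variables with d = 4 levels each:
#   p(c1, c2, c3) = sum_h lam[h] * psi[0][h, c1] * psi[1][h, c2] * psi[2][h, c3]
k, d = 2, 4
lam = rng.dirichlet(np.ones(k))               # mixture weights over classes
psi = rng.dirichlet(np.ones(d), size=(3, k))  # psi[j, h] is a pmf over levels

# Assemble the full d x d x d probability tensor via an einsum contraction.
P = np.einsum('h,ha,hb,hc->abc', lam, psi[0], psi[1], psi[2])
print(P.shape, P.sum())  # a valid joint pmf over 4 x 4 x 4 cells
```

Here k controls the tensor rank, whereas a sparse log-linear model would instead zero out interaction terms; relating these two notions of parsimony is the abstract's subject.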