# Alok Choudhary

## Contact Details

Name: Alok Choudhary

## Pub Categories

- Computer Science - Distributed, Parallel, and Cluster Computing (4)
- Computer Science - Data Structures and Algorithms (3)
- Instrumentation and Methods for Astrophysics (2)
- Astrophysics of Galaxies (2)
- Computer Science - Computer Vision and Pattern Recognition (2)
- Physics - Computational Physics (1)
- Computer Science - Information Retrieval (1)
- Computer Science - Performance (1)
- Physics - Materials Science (1)

## Publications Authored By Alok Choudhary

Many scientific data sets contain temporal dimensions: they store information for the same spatial locations at different time stamps. Some of the largest temporal data sets are produced by parallel computing applications such as simulations of climate change and fluid dynamics.

A very active area of materials research is to devise methods that use machine learning to automatically extract predictive models from existing materials data. While prior examples have demonstrated successful models for some applications, many more applications exist where machine learning can make a strong impact. To enable faster development of machine-learning-based models for such applications, we have created a framework capable of being applied to a broad range of materials data.

Connected Component Labeling (CCL) is an important step in pattern recognition and image processing. It assigns labels to the pixels such that adjacent pixels sharing the same features are assigned the same label. Typically, CCL requires several passes over the data.
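To make the "several passes" remark concrete, here is a minimal sketch of the classic two-pass labeling scheme with a union-find structure. It is a textbook illustration, not the algorithm from the paper; the function name `label_components`, the binary list-of-rows image format, and the choice of 4-connectivity are all assumptions made for this example.

```python
def label_components(image):
    """Two-pass connected component labeling with 4-connectivity.

    `image` is a list of rows of 0/1 values; 0 is background.
    Returns a grid of the same shape with labels 1..k (0 for background).
    """
    parent = [0]  # parent[i] is the union-find parent of provisional label i

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]

    # Pass 1: assign provisional labels and record label equivalences.
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up and left:
                labels[r][c] = min(up, left)
                union(up, left)
            elif up or left:
                labels[r][c] = up or left
            else:
                parent.append(len(parent))  # new label is its own root
                labels[r][c] = len(parent) - 1

    # Pass 2: replace each provisional label with a compact final label.
    final = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = final.setdefault(root, len(final) + 1)
    return labels
```

The first pass can give two different provisional labels to parts of the same component (for example, the two arms of a U shape), which is why a second pass over the data is needed to merge them.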

The "gravitational million-body problem," to model the dynamical evolution of a self-gravitating, collisional N-body system with N ~10^6 over many relaxation times, remains a major challenge in computational astrophysics. Unfortunately, current techniques to model such a system suffer from severe limitations. A direct N-body simulation with more than 10^5 particles can require months or even years to complete, while an orbit-sampling Monte Carlo approach cannot adequately treat the details of the core dynamics, particularly in the presence of many black holes.

We explore the redundancy of parameters in deep neural networks by replacing the conventional linear projection in fully-connected layers with the circulant projection. The circulant structure substantially reduces memory footprint and enables the use of the Fast Fourier Transform to speed up the computation. Considering a fully-connected neural network layer with d input nodes and d output nodes, this method improves the time complexity from O(d^2) to O(d log d) and the space complexity from O(d^2) to O(d).
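The complexity claim rests on a standard identity: multiplying by a circulant matrix is a circular convolution, which the FFT turns into an element-wise product. The sketch below illustrates only that identity, not the paper's layer or training procedure. The function names, the convention that the matrix is defined by its first column `c`, and the restriction to power-of-two lengths are assumptions made for this example.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.

    The inverse transform is left unnormalized (the caller divides by n).
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circulant_matvec_fft(c, x):
    """Multiply the circulant matrix with first column c by x in O(d log d).

    Only the d entries of c are stored, never the full d x d matrix.
    """
    d = len(c)
    spectrum = [a * b for a, b in zip(fft(c), fft(x))]
    return [v.real / d for v in fft(spectrum, inverse=True)]


def circulant_matvec_dense(c, x):
    """Reference O(d^2) product: entry (i, j) of the matrix is c[(i - j) % d]."""
    d = len(c)
    return [sum(c[(i - j) % d] * x[j] for j in range(d)) for i in range(d)]
```

Comparing `circulant_matvec_fft(c, x)` against `circulant_matvec_dense(c, x)` on small real-valued inputs should give matching results up to floating-point error.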

The maximum clique problem is a well-known NP-hard problem with applications in data mining, network analysis, information retrieval, and many other areas related to the World Wide Web. Several algorithms for the problem have acceptable runtimes on certain classes of graphs, but many of them are infeasible for massive graphs. We present a new exact algorithm that employs novel pruning techniques and is able to find maximum cliques in very large, sparse graphs quickly.

The maximum clique problem is a well-known NP-hard problem with applications in data mining, network analysis, informatics, and many other areas. Although several algorithms have acceptable runtimes on certain classes of graphs, many of them are infeasible for massive graphs. We present a new exact algorithm that employs novel pruning techniques to very quickly find maximum cliques in large sparse graphs.
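As a rough illustration of what pruning means in an exact clique search, the sketch below is a plain branch-and-bound that discards branches which cannot beat the best clique found so far. The abstracts do not spell out the authors' pruning rules, so the two rules used here (a size bound and a degree bound) are generic assumptions, as are the function name `max_clique` and the adjacency-dictionary input format.

```python
def max_clique(adj):
    """Exact maximum clique via branch and bound with simple pruning.

    `adj` maps each vertex to the set of its neighbours (undirected graph).
    Returns one maximum clique as a list of vertices.
    """
    best = []

    def expand(clique, candidates):
        nonlocal best
        if not candidates:
            # No vertex can extend the clique further; keep it if it is larger.
            if len(clique) > len(best):
                best = list(clique)
            return
        while candidates:
            # Size pruning: even taking every candidate cannot beat `best`.
            if len(clique) + len(candidates) <= len(best):
                return
            v = candidates.pop()
            # Degree pruning: a vertex in a clique larger than `best`
            # needs at least len(best) neighbours.
            nxt = {w for w in candidates & adj[v] if len(adj[w]) >= len(best)}
            clique.append(v)
            expand(clique, nxt)
            clique.pop()

    expand([], set(adj))  # copy the vertex set so `adj` is not mutated
    return best
```

On sparse graphs most vertices have low degree, so both bounds tend to cut large parts of the search tree once a reasonably large clique has been found. The worst case remains exponential.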

We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N~10^7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and introducing a parallel random number generation scheme, as well as a parallel sorting algorithm, required to find nearest neighbors for interactions and to compute the gravitational potential.

Dataset storage, exchange, and access play a critical role in scientific applications. For such purposes netCDF serves as a portable and efficient file format and programming interface, which is popular in numerous scientific application domains. However, the original interface does not provide an efficient mechanism for parallel data storage and access.

With the tremendous advances in processor and memory technology, I/O has become the bottleneck in high-performance computing for many applications. The development of parallel file systems has helped to narrow the performance gap, but I/O remains an area needing significant performance improvement. Research has found that noncontiguous I/O access patterns in scientific applications, combined with current file system methods for performing these accesses, lead to unacceptable performance for large data sets.

Many scientific applications are I/O intensive and generate or access large data sets, spanning hundreds or thousands of "files." Management, storage, efficient access, and analysis of this data present an extremely challenging task. We have developed a software system, called Scientific Data Manager (SDM), that uses a combination of parallel file I/O and database support for high-performance scientific data management.