Andrew Prout - MIT

Andrew Prout
Are you Andrew Prout?

Claim your profile, edit publications, add additional information:

Contact Details

Andrew Prout

Pubs By Year

Pub Categories

Computer Science - Distributed; Parallel; and Cluster Computing (10)
Computer Science - Databases (6)
Instrumentation and Methods for Astrophysics (2)
Computer Science - Computational Engineering; Finance; and Science (1)
Computer Science - Mathematical Software (1)
Quantitative Biology - Quantitative Methods (1)
Computer Science - Performance (1)
Computer Science - Cryptography and Security (1)

Publications Authored By Andrew Prout

In the rapidly expanding field of parallel processing, job schedulers are the "operating systems" of modern big data architectures and supercomputing systems. Job schedulers allocate computing resources and control the execution of processes on those resources. Historically, job schedulers were the domain of supercomputers, and job schedulers were designed to run massive, long-running computations over days and weeks. Read More

SciDB is a scalable, computational database management system that uses an array model for data storage. The array data model of SciDB makes it ideally suited for storing and managing large amounts of imaging data. SciDB is designed to support advanced analytics in database, thus reducing the need for extracting data for analysis. Read More

The map-reduce parallel programming model has become extremely popular in the big data community. Many big data workloads can benefit from the enhanced performance offered by supercomputers. LLMapReduce provides the familiar map-reduce parallel programming model to big data users running on a supercomputer. Read More

Job schedulers are a key component of scalable computing infrastructures. They orchestrate all of the work executed on the computing infrastructure and directly impact the effectiveness of the system. Recently, job workloads have diversified from long-running, synchronously-parallel simulations to include short-duration, independently parallel high performance data analysis (HPDA) jobs. Read More

HPC systems traditionally allow their users unrestricted use of their internal network. While this network is normally controlled enough to guarantee privacy without the need for encryption, it does not provide a method to authenticate peer connections. Protocols built upon this internal network must provide their own authentication. Read More

The ability to collect and analyze large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity and variety. Numerous tools exist that allow users to store, query and index these massive quantities of data. Read More

Data processing systems impose multiple views on data as it is processed by the system. These views include spreadsheets, databases, matrices, and graphs. There are a wide variety of technologies that can be used to store and process data through these different steps. Read More

High Performance Computing (HPC) is intrinsically linked to effective Data Center Infrastructure Management (DCIM). Cloud services and HPC have become key components in Department of Defense and corporate Information Technology competitive strategies in the global and commercial spaces. As a result, the reliance on consistent, reliable Data Center space is more critical than ever. Read More

The MIT SuperCloud database management system allows for rapid creation and flexible execution of a variety of the latest scientific databases, including Apache Accumulo and SciDB. It is designed to permit these databases to run on a High Performance Computing Cluster (HPCC) platform as seamlessly as any other HPCC job. It ensures the seamless migration of the databases to the resources assigned by the HPCC scheduler and centralized storage of the database files when not running. Read More

Non-traditional, relaxed consistency, triple store databases are the backbone of many web companies (e.g., Google Big Table, Amazon Dynamo, and Facebook Cassandra). Read More

The Apache Accumulo database is an open source relaxed consistency database that is widely used for government applications. Accumulo is designed to deliver high performance on unstructured data such as graphs of network data. This paper tests the performance of Accumulo using data from the Graph500 benchmark. Read More