Big Data in HEP: A comprehensive use case study

2017Mar
Affiliations: 1Fermi National Accelerator Laboratory, 2Fermi National Accelerator Laboratory, 3Princeton University, 4Fermi National Accelerator Laboratory, 5Fermi National Accelerator Laboratory, 6Princeton University, 7Fermi National Accelerator Laboratory, 8Fermi National Accelerator Laboratory now Johns Hopkins University, 9Princeton University, 10Fermi National Accelerator Laboratory

Experimental Particle Physics has been at the forefront of analyzing the worlds largest datasets for decades. The HEP community was the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems collectively called Big Data technologies have emerged to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches and promise a fresh look at analysis of very large datasets and could potentially reduce the time-to-physics with increased interactivity. In this talk, we present an active LHC Run 2 analysis, searching for dark matter with the CMS detector, as a testbed for Big Data technologies. We directly compare the traditional NTuple-based analysis with an equivalent analysis using Apache Spark on the Hadoop ecosystem and beyond. In both cases, we start the analysis with the official experiment data formats and produce publication physics plots. We will discuss advantages and disadvantages of each approach and give an outlook on further studies needed.

Comments: Proceedings for 22nd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2016)

Similar Publications

We describe a high-performance implementation of the lattice-Boltzmann method (LBM) for sparse geometries on graphic processors. In our implementation we cover the whole geometry with a uniform mesh of small tiles and carry out calculations for each tile independently with a proper data synchronization at tile edges. For this method we provide both the theoretical analysis of complexity and the results for real implementations for 2D and 3D geometries. Read More


Snafu, or Snake Functions, is a modular system to host, execute and manage language-level functions offered as stateless (micro-)services to diverse external triggers. The system interfaces resemble those of commercial FaaS providers but its implementation provides distinct features which make it overall useful to research on FaaS and prototyping of FaaS-based applications. This paper argues about the system motivation in the presence of already existing alternatives, its design and architecture, the open source implementation and collected metrics which characterise the system. Read More


Graph spanners have been studied extensively, and have many applications in algorithms, distributed systems, and computer networks. For many of these application, we want distributed constructions of spanners, i.e. Read More


This paper introduces PriMaL, a general PRIvacy-preserving MAchine-Learning method for reducing the privacy cost of information transmitted through a network. Distributed sensor networks are often used for automated classification and detection of abnormal events in high-stakes situations, e.g. Read More


In this paper we consider the problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles. This is a common problem that arises in many agent-based simulation studies, and is of central importance in the context of High Level Architecture (HLA), where it is at the core of the Data Distribution Management (DDM) service. Several realizations of the DDM service have been proposed; however, many of them are either inefficient or inherently sequential. Read More


The singular value decomposition (SVD) is a widely used matrix factorization tool which underlies plenty of useful applications, e.g. recommendation system, abnormal detection and data compression. Read More


These are the proceedings of the 14th International Workshop on Formal Engineering approaches to Software Components and Architectures (FESCA). The workshop was held on April 22, 2017 in Uppsala (Sweden) as a satellite event to the European Joint Conference on Theory and Practice of Software (ETAPS'17). The aim of the FESCA workshop is to bring together junior researchers from formal methods, software engineering, and industry interested in the development and application of formal modelling approaches as well as associated analysis and reasoning techniques with practical benefits for software engineering. Read More


The multiway rendezvous introduced in Theoretical CSP is a powerful paradigm to achieve synchronization and communication among a group of (possibly more than two) processes. We illustrate the advantages of this paradigm on the production cell benchmark, a model of a real metal processing plant, for which we propose a compositional software controller, which is written in LNT and LOTOS, and makes intensive use of the multiway rendezvous. Read More


This paper focuses on a passivity-based distributed reference governor (RG) applied to a pre-stabilized mobile robotic network. The novelty of this paper lies in the method used to solve the RG problem, where a passivity-based distributed optimization scheme is proposed. In particular, the gradient descent method minimizes the global objective function while the dual ascent method maximizes the Hamiltonian. Read More


As technology proceeds and the number of smart devices continues to grow substantially, need for ubiquitous context-aware platforms that support interconnected, heterogeneous, and distributed network of devices has given rise to what is referred today as Internet-of-Things. However, paving the path for achieving aforementioned objectives and making the IoT paradigm more tangible requires integration and convergence of different knowledge and research domains, covering aspects from identification and communication to resource discovery and service integration. Through this chapter, we aim to highlight researches in topics including proposed architectures, security and privacy, network communication means and protocols, and eventually conclude by providing future directions and open challenges facing the IoT development. Read More