Computer Science - Distributed; Parallel; and Cluster Computing Publications (50)


Computer Science - Distributed; Parallel; and Cluster Computing Publications

Matlab/Simulink is a wide-spread tool for model-based design of embedded systems. Supporting hierarchy, domain specific building blocks, functional simulation and automatic code-generation, makes it well-suited for the design of control and signal processing systems. In this work, we propose an automated translation methodology for a subset of Simulink models to Synchronous dataflow Graphs (SDFGs) including the automatic code-generation of SDF-compatible embedded code. Read More

Fog computing is a promising architecture to provide economic and low latency data services for future Internet of things (IoT)-based network systems. It relies on a set of low-power fog nodes that are close to the end users to offload the services originally targeting at cloud data centers. In this paper, we consider a specific fog computing network consisting of a set of data service operators (DSOs) each of which controls a set of fog nodes to provide the required data service to a set of data service subscribers (DSSs). Read More

ADMM is a popular algorithm for solving convex optimization problems. Applying this algorithm to distributed consensus optimization problem results in a fully distributed iterative solution which relies on processing at the nodes and communication between neighbors. Local computations usually suffer from different types of errors, due to e. Read More

Timing and power consumption play an important role in the design of embedded systems. Furthermore, both properties are directly related to the safety requirements of many embedded systems. With regard to availability requirements, power considerations are of uttermost importance for battery operated systems. Read More

In this paper we consider programmable matter that consists of computationally limited devices (which we call particles) that are able to self-organize in order to achieve a desired collective goal without the need for central control or external intervention. Particles can establish and release bonds and can actively move in a self-organized way. We investigate the feasibility of solving fundamental problems relevant for programmable matter. Read More

Convolutional neural nets (CNNs) have become a practical means to perform vision tasks, particularly in the area of image classification. FPGAs are well known to be able to perform convolutions efficiently, however, most recent efforts to run CNNs on FPGAs have shown limited advantages over other devices such as GPUs. Previous approaches on FPGAs have often been memory bound due to the limited external memory bandwidth on the FPGA device. Read More

Time-triggered switched networks are a deterministic communication infrastructure used by real-time distributed embedded systems. Due to the criticality of the applications running over them, developers need to ensure that end-to-end communication is dependable and predictable. Traditional approaches assume static networks that are not flexible to changes caused by reconfigurations or, more importantly, faults, which are dealt with in the application using redundancy. Read More

In this work, we propose two-level space-time domain decomposition preconditioners for parabolic problems discretized using finite elements. They are motivated as an extension to space-time of balancing domain decomposition by constraints preconditioners. The key ingredients to be defined are the sub-assembled space and operator, the coarse degrees of freedom (DOFs) in which we want to enforce continuity among subdomains at the preconditioner level, and the transfer operator from the sub-assembled to the original finite element space. Read More

Obtaining good performance when programming heterogeneous computing platforms poses significant challenges. We present a program transformation environment, implemented in Haskell, where architecture-agnostic scientific C code with semantic annotations is transformed into functionally equivalent code better suited for a given platform. The transformation steps are represented as rules that can be fired when certain syntactic and semantic conditions are fulfilled. Read More

A common method to define a parallel solution for a computational problem consists in finding a way to use the Divide and Conquer paradigm in order to have processors acting on its own data and scheduled in a parallel fashion. MapReduce is a programming model that follows this paradigm, and allows for the definition of efficient solutions by both decomposing a problem into steps on subsets of the input data and combining the results of each step to produce final results. Albeit used for the implementation of a wide variety of computational problems, MapReduce performance can be negatively affected whenever the replication factor grows or the size of the input is larger than the resources available at each processor. Read More

Elasticity is one of the key features of cloud computing that attracts many SaaS providers to minimize their services' cost. Cost is minimized by automatically provision and release computational resources depend on actual computational needs. However, delay of starting up new virtual resources can cause Service Level Agreement violation. Read More

Scalability is an important characteristic of cloud computing. With scalability, cost is minimized by provisioning and releasing resources according to demand. Most of current Infrastructure as a Service (IaaS) providers deliver threshold-based auto-scaling techniques. Read More

Asynchronous parallel computing and sparse recovery are two areas that have received recent interest. Asynchronous algorithms are often studied to solve optimization problems where the cost function takes the form $\sum_{i=1}^M f_i(x)$, with a common assumption that each $f_i$ is sparse; that is, each $f_i$ acts only on a small number of components of $x\in\mathbb{R}^n$. Sparse recovery problems, such as compressed sensing, can be formulated as optimization problems, however, the cost functions $f_i$ are dense with respect to the components of $x$, and instead the signal $x$ is assumed to be sparse, meaning that it has only $s$ non-zeros where $s\ll n$. Read More

Proceedings of the Workshop on High Performance Energy Efficient Embedded Systems (HIP3ES) 2017. Stockholm, Sweden, January 25th. Collocated with HIPEAC 2017 Conference. Read More

This paper considers the recovery of group sparse signals over a multi-agent network, where the measurements are subject to sparse errors. We first investigate the robust group LASSO model and its centralized algorithm based on the alternating direction method of multipliers (ADMM), which requires a central fusion center to compute a global row-support detector. To implement it in a decentralized network environment, we then adopt dynamic average consensus strategies that enable dynamic tracking of the global row-support detector. Read More

Weighted finite automata and transducers (including hidden Markov models and conditional random fields) are widely used in natural language processing (NLP) to perform tasks such as morphological analysis, part-of-speech tagging, chunking, named entity recognition, speech recognition, and others. Parallelizing finite state algorithms on graphics processing units (GPUs) would benefit many areas of NLP. Although researchers have implemented GPU versions of basic graph algorithms, limited previous work, to our knowledge, has been done on GPU algorithms for weighted finite automata. Read More

We present PFDCMSS, a novel message-passing based parallel algorithm for mining time-faded heavy hitters. The algorithm is a parallel version of the recently published FDCMSS sequential algorithm. We formally prove its correctness by showing that the underlying data structure, a sketch augmented with a Space Saving stream summary holding exactly two counters, is mergeable. Read More

Message broadcasting in networks could be carried over spanning trees. A set of spanning trees in the same network is node independent if two conditions are satisfied. First, all trees are rooted at node $r$. Read More

Vehicle-to-vehicle (V2V) communication is a crucial component of the future autonomous driving systems since it enables improved awareness of the surrounding environment, even without extensive processing of sensory information. However, V2V communication is prone to failures and delays, so a distributed fault-tolerant approach is required for safe and efficient transportation. In this paper, we focus on the intersection crossing (IC) problem with autonomous vehicles that cooperate via V2V communications, and propose a novel distributed IC algorithm that can handle an unknown number of communication failures. Read More

In parallel computing, a valid graph coloring yields a lock-free processing of the colored tasks, data points, etc., without expensive synchronization mechanisms. However, coloring is not free and the overhead can be significant. Read More


We introduce the Ants Nearby Treasure Search (ANTS) problem, which models natural cooperative foraging behavior such as that performed by ants around their nest. In this problem, k probabilistic agents, initially placed at a central location, collectively search for a treasure on the two-dimensional grid. The treasure is placed at a target location by an adversary and the agents' goal is to find it as fast as possible as a function of both k and D, where D is the (unknown) distance between the central location and the target. Read More

Highly-available datastores are widely deployed for online applications. However, many online applications are not contented with the simple data access interface currently provided by highly-available datastores. Distributed transaction support is demanded by applications such as large-scale online payment used by Alipay or Paypal. Read More

Kernel matrices appear in machine learning and non-parametric statistics. Given $N$ points in $d$ dimensions and a kernel function that requires $\mathcal{O}(d)$ work to evaluate, we present an $\mathcal{O}(dN\log N)$-work algorithm for the approximate factorization of a regularized kernel matrix, a common computational bottleneck in the training phase of a learning task. With this factorization, solving a linear system with a kernel matrix can be done with $\mathcal{O}(N\log N)$ work. Read More

This paper presents a comprehensive literature review on applications of economic and pricing models for resource management in cloud networking. To achieve sustainable profit advantage, cost reduction, and flexibility in provisioning of cloud resources, resource management in cloud networking requires adaptive and robust designs to address many issues, e.g. Read More

Graphlets are induced subgraphs of a large network and are important for understanding and modeling complex networks. Despite their practical importance, graphlets have been severely limited to applications and domains with relatively small graphs. Most previous work has focused on exact algorithms, however, it is often too expensive to compute graphlets exactly in massive networks with billions of edges, and finding an approximate count is usually sufficient for many applications. Read More

Cyber-Physical Systems (CPS) revolutionize various application domains with integration and interoperability of networking, computing systems, and mechanical devices. Due to its scale and variety, CPS faces a number of challenges and opens up a few research questions in terms of management, fault-tolerance, and scalability. We propose a software-defined approach inspired by Software-Defined Networking (SDN), to address the challenges for a wider CPS adoption. Read More

Cloud Computing (CC) is a model for enabling on-demand access to a shared pool of configurable computing resources. Testing and evaluating the performance of the cloud environment for allocating, provisioning, scheduling, and data allocation policy have great attention to be achieved. Therefore, using cloud simulator would save time and money, and provide a flexible environment to evaluate new research work. Read More

We present a randomized distributed algorithm that in radio networks with collision detection broadcasts a single message in $O(D+\log^2 n)$ time slots, with high probability. In view of the lower-bound $\Omega(D+\log^2 n)$, our algorithm is optimal in the considered model answering the decades-old question of Alon, Bar-Noy, Linial and Peleg [JCSS 1991]. Read More

Sequence alignment algorithms are a basic and critical component of many bioinformatics fields. With rapid development of sequencing technology, the fast growing reference database volumes and longer length of query sequence become new challenges for sequence alignment. However, the algorithm is prohibitively high in terms of time and space complexity. Read More

In data centers, data replication is the primary method used to ensure availability of customer data. To avoid correlated failure, cloud storage infrastructure providers model hierarchical failure domains using a tree, and avoid placing a large number of data replicas within the same failure domain (i.e. Read More

Multisplit is a broadly useful parallel primitive that permutes its input data into contiguous buckets or bins, where the function that categorizes an element into a bucket is provided by the programmer. Due to the lack of an efficient multisplit on GPUs, programmers often choose to implement multisplit with a sort. One way is to first generate an auxiliary array of bucket IDs and then sort input data based on it. Read More

For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. Read More

Here, we present the concept of an open virtual prototyping framework for maritime systems and operations that enables its users to develop re-usable component or subsystem models, and combine them in full-system simulations for prototyping, verification, training, and performance studies. This framework consists of a set of guidelines for model coupling, high-level and low-level coupling interfaces to guarantee interoperability, a full-system simulation software, and example models and demonstrators. We discuss the requirements for such a framework, address the challenges and the possibilities in fulfilling them, and aim to give a list of best practices for modular and efficient virtual prototyping and full-system simulation. Read More

Novel hardware-aided trusted execution environments, as provided by Intel's Software Guard Extensions (SGX), enable to execute applications in a secure context that enforces confidentiality and integrity of the application state even when the host system is misbehaving. While this paves the way towards secure and trustworthy cloud computing, essential system support to protect persistent application state against rollback and forking attacks is missing. In this paper we present LCM - a lightweight protocol to establish a collective memory amongst all clients of a remote application to detect integrity and consistency violations. Read More

The purpose of this book is to help you program shared-memory parallel machines without risking your sanity. We hope that this book's design principles will help you avoid at least some parallel-programming pitfalls. That said, you should think of this book as a foundation on which to build, rather than as a completed cathedral. Read More

In an endeavor to reach the vision of ubiquitous computing where users are able to use pervasive services without spatial and temporal constraints, we are witnessing a fast growing number of mobile and sensor-enhanced devices becoming available. However, in order to take full advantage of the numerous benefits offered by novel mobile devices and services, we must address the related security issues. In this paper, we present results of a systematic literature review (SLR) on security-related topics in ubiquitous computing environments. Read More

Neural networks are a revolutionary but immature technique that is fast evolving and heavily relies on data. To benefit from the newest development and newly available data, we want the gap between research and production as small as possibly. On the other hand, differing from traditional machine learning models, neural network is not just yet another statistic model, but a model for the natural processing engine --- the brain. Read More

Recently, distributed processing of large dynamic graphs has become very popular, especially in certain domains such as social network analysis, Web graph analysis and spatial network analysis. In this context, many distributed/parallel graph processing systems have been proposed, such as Pregel, GraphLab, and Trinity. These systems can be divided into two categories: (1) vertex-centric and (2) block-centric approaches. Read More

The in-memory graph layout or organization has a considerable impact on the time and energy efficiency of distributed memory graph computations. It affects memory locality, inter-task load balance, communication time, and overall memory utilization. Graph layout could refer to partitioning or replication of vertex and edge arrays, selective replication of data structures that hold meta-data, and reordering vertex and edge identifiers. Read More

Cloud radio access networks (C-RAN) and Mobile Cloud Computing (MCC) have emerged as promising candidates for the next generation access network techniques. MCC enables resource limited mobile devices to offload computationally intensive tasks to the cloud, while C-RAN offers a technology that addresses the increasing mobile traffic. In this paper, we propose a protocol for task offloading and for managing resources in both C-RAN and mobile cloud together using a centralised controller. Read More

In this paper, we present a novel multi-objective ant colony system algorithm for virtual machine (VM) consolidation in cloud data centers. The proposed algorithm builds VM migration plans, which are then used to minimize over-provisioning of physical machines (PMs) by consolidating VMs on under-utilized PMs. It optimizes two objectives that are ordered by their importance. Read More

Distributed storage systems such as Hadoop File System or Google File System (GFS) ensure data availability and durability using replication. This paper is focused on the analysis of the efficiency of replication mechanism that determines the location of the copies of a given file at some server. The variability of the loads of the nodes of the network is investigated for several policies. Read More

We study broadcasting on multiple access channels with dynamic packet arrivals and jamming. The communication environments is represented by adversarial models which specify constraints on packet arrivals and jamming. We consider deterministic distributed broadcast algorithms and give upper bounds on the worst-case packet latency and the number of queued packets in relation to the parameters defining adversaries. Read More

This paper presents a distributed approach that scales up to segment tree crowns within a LiDAR point cloud representing an arbitrarily large forested area. The approach uses a single-processor tree segmentation algorithm as a building block in order to process the data delivered in the shape of tiles in parallel. The distributed processing is performed in a master-slave manner, in which the master maintains the global map of the tiles and coordinates the slaves that segment tree crowns within and across the boundaries of the tiles. Read More

This manuscript serves as a correctness proof of the Hierarchical MCS locks with Timeout (HMCS-T) described in our paper titled "An Efficient Abortable-locking Protocol for Multi-level NUMA Systems" appearing in the proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. HMCS-T is a very involved protocol. The system is stateful; the values of prior acquisition efforts affect the subsequent acquisition efforts. Read More

In this paper, we identify a new form of attack, called the Balance attack, against proof-of-work blockchain systems. The novelty of this attack consists of delaying network communications between multiple subgroups of nodes with balanced mining power. Our theoretical analysis captures the precise tradeoff between the network delay and the mining power of the attacker needed to double spend in Ethereum with high probability. Read More

We consider a dynamical process in a network which distributes all tokens located at a node among its neighbors, in a round-robin manner. We show that in the recurrent state of this dynamics, the number of particles located on a given edge, averaged over an interval of time, is tightly concentrated around the average particle density in the system. Formally, for a system of $k$ particles in a graph of $m$ edges, during any interval of length $T$, this time-averaged value is $k/m \pm \widetilde{O}(1/T)$, whenever $\gcd(m,k) = \widetilde{O}(1)$ (and so, e. Read More

The cloud computing model is rapidly transforming the IT landscape. Cloud computing is a new computing paradigm that delivers computing resources as a set of reliable and scalable internet-based services allowing customers to remotely run and manage these services. Infrastructure-as-a-service (IaaS) is one of the popular cloud computing services. Read More

This paper considers a distributed convex optimization problem with inequality constraints over time-varying unbalanced digraphs, where the cost function is a sum of local objectives, and each node of the graph only knows its local objective and inequality constraints. Although there is a vast literature on distributed optimization, most of them require the graph to be balanced, which is quite restrictive and not necessary. Very recently, the unbalanced problem has been resolved only for either time-invariant graphs or unconstrained optimization. Read More

Digital predistortion (DPD) is a widely adopted baseband processing technique in current radio transmitters. While DPD can effectively suppress unwanted spurious spectrum emissions stemming from imperfections of analog RF and baseband electronics, it also introduces extra processing complexity and poses challenges on efficient and flexible implementations, especially for mobile cellular transmitters, considering their limited computing power compared to basestations. In this paper, we present high data rate implementations of broadband DPD on modern embedded processors, such as mobile GPU and multicore CPU, by taking advantage of emerging parallel computing techniques for exploiting their computing resources. Read More