Gossiping with Latencies

Consider the classical problem of information dissemination: one (or more) nodes in a network have some information that they want to distribute to the remainder of the network. In this paper, we study the cost of information dissemination in networks where edges have latencies, i.e., sending a message from one node to another takes some amount of time. We first generalize the idea of conductance to weighted graphs, defining $\phi_*$ to be the "weighted conductance" and $\ell_*$ to be the "critical latency." One goal of this paper is to argue that $\phi_*$ characterizes the connectivity of a weighted graph with latencies in much the same way that conductance characterizes the connectivity of unweighted graphs. We give near tight upper and lower bounds on the problem of information dissemination, up to polylogarithmic factors. Specifically, we show that in a graph with (weighted) diameter $D$ (with latencies as weights), maximum degree $\Delta$, weighted conductance $\phi_*$ and critical latency $\ell_*$, any information dissemination algorithm requires at least $\Omega(\min(D+\Delta, \ell_*/\phi_*))$ time. We show several variants of the lower bound (e.g., for graphs with small diameter, graphs with small max-degree, etc.) by reduction to a simple combinatorial game. We then give nearly matching algorithms, showing that information dissemination can be solved in $O(\min((D + \Delta)\log^3{n}), (\ell_*/\phi_*)\log(n))$ time. This is achieved by combining two cases. When nodes do not know the latency of the adjacent edges, we show that the classical push-pull algorithm is (near) optimal when the diameter or maximum degree is large. For the case where the diameter and maximum degree are small, we give an alternative strategy in which we first discover the latencies and then use an algorithm for known latencies based on a weighted spanner construction.

Similar Publications

This work presents an extension to MPI supporting the one-sided communication model and window allocations in storage. Our design transparently integrates with the current MPI implementations, enabling applications to target MPI windows in storage, memory or both simultaneously, without major modifications. Initial performance results demonstrate that the presented MPI window extension could potentially be helpful for a wide-range of use-cases and with low-overhead. Read More

In this manuscript we present a fast GPU implementation for tomographic reconstruction of large datasets using data obtained at the Brazilian synchrotron light source. The algorithm is distributed in a cluster with 4 GPUs through a fast pipeline implemented in C programming language. Our algorithm is theoretically based on a recently discovered low complexity formula, computing the total volume within O(N3logN) floating point operations; much less than traditional algorithms that operates with O(N4) flops over an input data of size O(N3). Read More

The Adapteva Epiphany many-core architecture comprises a scalable 2D mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. Whereas such a processor offers high computational energy efficiency and parallel scalability, developing effective programming models that address the unique architecture features has presented many challenges. We present here a distributed shared memory (DSM) model supported in software transparently using C++ templated metaprogramming techniques. Read More

Hardware accelerators have become a de-facto standard to achieve high performance on current supercomputers and there are indications that this trend will increase in the future. Modern accelerators feature high-bandwidth memory next to the computing cores. For example, the Intel Knights Landing (KNL) processor is equipped with 16 GB of high-bandwidth memory (HBM) that works together with conventional DRAM memory. Read More

Idle periods on different processes of Message Passing applications are unavoidable. While the origin of idle periods on a single process is well understood as the effect of system and architectural random delays, yet it is unclear how these idle periods propagate from one process to another. It is important to understand idle period propagation in Message Passing applications as it allows application developers to design communication patterns avoiding idle period propagation and the consequent performance degradation in their applications. Read More

Next-generation supercomputers will feature more hierarchical and heterogeneous memory systems with different memory technologies working side-by-side. A critical question is whether at large scale existing HPC applications and emerging data-analytics workloads will have performance improvement or degradation on these systems. We propose a systematic and fair methodology to identify the trend of application performance on emerging hybrid-memory systems. Read More

Decentralized systems are a subset of distributed systems where multiple authorities control different components and no authority is fully trusted by all. This implies that any component in a decentralized system is potentially adversarial. We revise fifteen years of research on decentralization and privacy, and provide an overview of key systems. Read More

The computability power of a distributed computing model is determined by the communication media available to the processes, the timing assumptions about processes and communication, and the nature of failures that processes can suffer. In a companion paper we showed how dynamic epistemic logic can be used to give a formal semantics to a given distributed computing model, to capture precisely the knowledge needed to solve a distributed task, such as consensus. Furthermore, by moving to a dual model of epistemic logic defined by simplicial complexes, topological invariants are exposed, which determine task solvability. Read More

This paper considers the problem of decentralized optimization with a composite objective containing smooth and non-smooth terms. To solve the problem, a proximal-gradient scheme is studied. Specifically, the smooth and nonsmooth terms are dealt with by gradient update and proximal update, respectively. Read More

The model of population protocols refers to the growing in popularity theoretical framework suitable for studying pairwise interactions within a large collection of simple indistinguishable entities, frequently called agents. In this paper the emphasis is on the space complexity in fast leader election via population protocols governed by the random scheduler, which uniformly at random selects pairwise interactions within the population of n agents. The main result of this paper is a new fast and space optimal leader election protocol. Read More