Replicable Parallel Branch and Bound Search

Branch and bound searches are a common technique for solving global optimisation and decision problems, yet their irregularity, search-order dependence, and the need to share bound information globally make it challenging to implement them in parallel and to reason about their parallel performance. We identify three key parallel search properties for replicable branch and bound implementations: Sequential Lower Bound, Non-increasing Runtimes, and Repeatability. We define a formal model for parallel branch and bound search problems and demonstrate its generality by using it to define three benchmarks: finding a Maximum Clique in a graph, 0/1 Knapsack, and Travelling Salesperson (TSP). We present a Generic Branch and Bound search API that conforms to the model. For reusability we encapsulate the search behaviours as a pair of algorithmic skeletons in a distributed-memory parallel Haskell. Crucially, the Ordered skeleton is designed to guarantee the parallel search properties, potentially at a performance cost compared with the Unordered skeleton. We compare the sequential performance of the skeletons with a class-leading C++ search implementation. We then use relative speedups to evaluate the skeletons for 40 benchmark instances on a cluster using 200 workers. The Ordered skeleton preserves the Sequential Lower Bound for all benchmark instances, while the Unordered skeleton violates the property for 5 TSP instances. The Ordered skeleton preserves Non-increasing Runtimes for all benchmark instances, while the Unordered skeleton violates the property for many instances of all three benchmarks. The Ordered skeleton delivers far more repeatable performance than the Unordered skeleton (Repeatability property), with median relative standard deviations (RSD) of 1.78% vs 5.56%, 1.83% vs 87.56%, and 1.96% vs 8.61% for all Maximum Clique, Knapsack and TSP instances respectively.
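The paper's skeletons and Generic API are not reproduced here, but the core branch and bound behaviour they encapsulate can be sketched as a minimal sequential solver for one of the benchmarks, 0/1 Knapsack, in Haskell. All names, the `Item` type, and the deliberately crude optimistic bound below are illustrative assumptions, not the paper's API:

```haskell
-- Minimal sequential branch-and-bound sketch for 0/1 Knapsack.
-- The Item type, function names, and the crude optimistic bound are
-- illustrative assumptions, not the paper's Generic Branch and Bound API.

data Item = Item { value :: Int, weight :: Int }

-- Optimistic (admissible) bound: assume every remaining item still fits.
bound :: [Item] -> Int -> Int
bound rest acc = acc + sum (map value rest)

-- Depth-first search, pruning subtrees that cannot beat the incumbent.
knapsack :: [Item] -> Int -> Int
knapsack items capacity = go items capacity 0 0
  where
    go [] _ acc best = max best acc
    go (i:is) cap acc best
      | bound (i:is) acc <= best = best   -- prune: cannot improve on incumbent
      | otherwise =
          let skipBest = go is cap acc best                        -- branch: skip item i
          in if weight i <= cap
               then go is (cap - weight i) (acc + value i) skipBest -- branch: take item i
               else skipBest

main :: IO ()
main = print (knapsack [Item 60 10, Item 100 20, Item 120 30] 50)  -- prints 220
```

The incumbent `best` threaded through the recursion plays the role of the globally shared bound: the Ordered and Unordered skeletons in the paper differ in how parallel workers expand such a tree and propagate that bound, whereas this sequential sketch is order-deterministic by construction.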

Comments: 38 pages, 12 figures, submitted to the Journal of Parallel and Distributed Computing

Similar Publications

Distributed actor languages are an effective means of constructing scalable reliable systems, and the Erlang programming language has a well-established and influential model. While the Erlang model conceptually provides reliable scalability, it has some inherent scalability limits, and these force developers to depart from the model at scale. This article establishes the scalability limits of Erlang systems and reports work to improve the language's scalability.

We adapt a recent algorithm by Ghaffari [SODA'16] for computing a Maximal Independent Set in the LOCAL model, so that it works in the significantly weaker BEEP model. For networks with maximum degree $\Delta$, our algorithm terminates locally within time $O((\log \Delta + \log (1/\epsilon)) \cdot \log(1/\epsilon))$, with probability at least $1 - \epsilon$. The key idea of the modification is to replace explicit messages about transmission probabilities with estimates based on the number of received messages.

Session types offer a type-based discipline for enforcing communication protocols in distributed programming. We have previously formalized simple session types in the setting of multi-threaded $\lambda$-calculus with linear types. In this work, we build upon our earlier work by presenting a form of dependent session types (of DML-style).

ROOT provides a flexible format used throughout the HEP community. The number of use cases - from an archival data format to end-stage analysis - has required a number of tradeoffs to be exposed to the user. For example, a high "compression level" in the traditional DEFLATE algorithm will result in a smaller file (saving disk space) at the cost of slower decompression (costing CPU time when read).

In this article, we present a novel approach for block-structured adaptive mesh refinement (AMR) that is suitable for extreme-scale parallelism. All data structures are designed such that the size of the metadata in each distributed processor memory remains bounded, independent of the number of processors. In all stages of the AMR process, we use only distributed algorithms.

In this paper, the fundamental problem of distribution and proactive caching of computing tasks in fog networks is studied under latency and reliability constraints. In the proposed scenario, computing can be executed either locally at the user device or offloaded to an edge cloudlet. Moreover, cloudlets exploit both their computing and storage capabilities by proactively caching popular task computation results to minimize computing latency.

Many cluster management systems (CMSs) have been proposed to share a single cluster with multiple distributed computing systems. However, none of the existing approaches can handle distributed machine learning (ML) workloads given the following criteria: high resource utilization, fair resource allocation and low sharing overhead. To solve this problem, we propose a new CMS named Dorm, incorporating a dynamically-partitioned cluster management mechanism and a utilization-fairness optimizer.

This paper presents the pessimistic time complexity analysis of the parallel algorithm for minimizing the fleet size in the pickup and delivery problem with time windows. We show how to estimate the pessimistic complexity step by step. This approach can be easily adapted to other parallel algorithms for solving complex transportation problems.

With the surge of multi- and manycores, much research has focused on algorithms for mapping and scheduling on these complex platforms. Large classes of these algorithms face scalability problems. This is why diverse methods are commonly used for reducing the search space.

In asynchronous distributed systems it is very hard to assess whether one of the processes taking part in a computation is operating correctly or has failed. To overcome this problem, distributed algorithms are created using unreliable failure detectors that capture, in an abstract way, the timing assumptions necessary to assess the operating status of a process. One particular type of failure detector performs leader election, indicating a single process that has not failed.