GPU parallel simulation algorithm of Brownian particles with excluded volume using Delaunay triangulations

A novel parallel simulation algorithm on the GPU, implemented in CUDA and C++, is presented for the simulation of Brownian particles that display excluded volume repulsion and interact with long- and short-range forces. When an explicit Euler-Maruyama integration step is performed to account for the pairwise forces and Brownian motion, particles can overlap. The excluded volume property makes it necessary to correct these overlaps as they happen, since the random displacement of Brownian particles makes predicting them infeasible. The proposed solution maintains a Delaunay triangulation of the particle positions at each time step, because it allows overlaps between particles to be resolved efficiently by checking only each particle's neighborhood. The algorithm starts by generating a Delaunay triangulation of the initial particle positions on the CPU, but from then on the triangulation is always kept in GPU memory. We used a parallel edge-flip implementation to keep the triangulation up to date at each time step, after first checking that the particle displacements had not rendered the triangulation invalid. The algorithm is validated with two models of active colloidal particles. In tests of a simulation with long-range forces, the parallel implementation achieves a performance improvement of up to two orders of magnitude over the previously existing sequential solution.
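The per-step operations described in the abstract can be illustrated with a minimal, sequential C++ sketch; it is not the authors' CUDA code. It assumes 2D disks of equal radius r, overdamped dynamics with a hypothetical mobility mu and diffusion coefficient D, and triangles stored as counter-clockwise index triples. All names (Vec2, Tri, euler_maruyama_step, orient2d, in_circle, needs_flip, all_triangles_ccw, resolve_overlap) are illustrative, and the half-and-half overlap correction is an assumed rule that may differ from the one used in the paper.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical 2D position type.
struct Vec2 { double x, y; };

// Hypothetical triangle record: three particle indices in counter-clockwise order.
struct Tri { int v[3]; };

// One explicit Euler-Maruyama step for overdamped Brownian particles:
//   x_{n+1} = x_n + mu * F(x_n) * dt + sqrt(2 * D * dt) * xi,  xi ~ N(0, 1)
// mu (mobility), D (diffusion coefficient) and dt are assumed parameters.
void euler_maruyama_step(std::vector<Vec2>& pos, const std::vector<Vec2>& force,
                         double mu, double D, double dt, std::mt19937& rng) {
    std::normal_distribution<double> gauss(0.0, 1.0);
    const double noise = std::sqrt(2.0 * D * dt);
    for (std::size_t i = 0; i < pos.size(); ++i) {
        pos[i].x += mu * force[i].x * dt + noise * gauss(rng);
        pos[i].y += mu * force[i].y * dt + noise * gauss(rng);
    }
}

// Twice the signed area of (a, b, c): positive if the points are counter-clockwise.
double orient2d(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// In-circle determinant: positive if d lies strictly inside the circumcircle
// of the counter-clockwise triangle (a, b, c). Plain floating point, so it is
// not robust for nearly degenerate inputs.
double in_circle(Vec2 a, Vec2 b, Vec2 c, Vec2 d) {
    const double adx = a.x - d.x, ady = a.y - d.y;
    const double bdx = b.x - d.x, bdy = b.y - d.y;
    const double cdx = c.x - d.x, cdy = c.y - d.y;
    const double ad = adx * adx + ady * ady;
    const double bd = bdx * bdx + bdy * bdy;
    const double cd = cdx * cdx + cdy * cdy;
    return adx * (bdy * cd - bd * cdy)
         - ady * (bdx * cd - bd * cdx)
         + ad * (bdx * cdy - bdy * cdx);
}

// Edge (a, b), shared by CCW triangles (a, b, c) and (b, a, d), is not locally
// Delaunay, and is a candidate for flipping to (c, d), when d is inside
// the circumcircle of (a, b, c).
bool needs_flip(Vec2 a, Vec2 b, Vec2 c, Vec2 d) {
    return in_circle(a, b, c, d) > 0.0;
}

// Displacements can invert triangles. The usual edge-flip procedure assumes a
// valid (non-inverted) triangulation, so this check should pass before flipping.
bool all_triangles_ccw(const std::vector<Vec2>& pos, const std::vector<Tri>& tris) {
    for (const Tri& t : tris) {
        if (orient2d(pos[t.v[0]], pos[t.v[1]], pos[t.v[2]]) <= 0.0) return false;
    }
    return true;
}

// Assumed correction rule: push two overlapping disks of radius r apart along
// the line joining their centres, each by half the penetration depth.
// Returns true if an overlap was corrected.
bool resolve_overlap(Vec2& p, Vec2& q, double r) {
    assert(r > 0.0);
    const double dx = q.x - p.x, dy = q.y - p.y;
    const double dist = std::sqrt(dx * dx + dy * dy);
    const double overlap = 2.0 * r - dist;
    if (overlap <= 0.0 || dist == 0.0) return false;  // no overlap, or undefined direction
    const double ux = dx / dist, uy = dy / dist;
    p.x -= 0.5 * overlap * ux;
    p.y -= 0.5 * overlap * uy;
    q.x += 0.5 * overlap * ux;
    q.y += 0.5 * overlap * uy;
    return true;
}
```

In this sketch only pairs of particles joined by a triangulation edge would be passed to resolve_overlap, which is what makes the neighborhood check cheap. A GPU port would plausibly assign one thread per particle for the integration step and one per edge for the flip test, but flips on edges that share a triangle conflict and would need coordinating, for example by processing them in rounds; this is an assumption about the design, not a description of the paper's kernels.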

Comments: Due to the limit "The abstract field cannot be longer than 1,920 characters", the abstract shown here is slightly shorter than the one in the PDF file.

Similar Publications

With the surge of multi- and manycores, much research has focused on algorithms for mapping and scheduling on these complex platforms. Large classes of these algorithms face scalability problems. This is why diverse methods are commonly used for reducing the search space.


In asynchronous distributed systems it is very hard to assess whether one of the processes taking part in a computation is operating correctly or has failed. To overcome this problem, distributed algorithms are created using unreliable failure detectors that capture, in an abstract way, the timing assumptions necessary to assess the operating status of a process. One particular type of failure detector is leader election, which indicates a single process that has not failed.


The celebrated Time Hierarchy Theorem for Turing machines states, informally, that more problems can be solved given more time. The extent to which a time hierarchy-type theorem holds in the distributed LOCAL model has been open for many years. It is consistent with previous results that all natural problems in the LOCAL model can be classified according to a small constant number of complexities, such as $O(1),O(\log^* n), O(\log n), 2^{O(\sqrt{\log n})}$, etc.


Boolean networks are a well-established formalism for modelling biological systems. A vital challenge in analysing a Boolean network is to identify all of its attractors. This becomes more challenging for large asynchronous Boolean networks, due to the asynchronous updating scheme.


Grids allow users flexible on-demand usage of computing resources through remote communication networks. A remarkable example of a Grid in High Energy Physics (HEP) research is the one used in the ALICE experiment at the European Organization for Nuclear Research (CERN). Physicists can submit jobs to process the huge amount of particle collision data produced by the Large Hadron Collider (LHC).


The $CONGEST$ model for distributed network computing is well suited for analyzing the impact of limiting the throughput of a network on its capacity to solve tasks efficiently. For many "global" problems there exists a lower bound of $\Omega(D + \sqrt{n/B})$, where $B$ is the number of bits that can be exchanged between two nodes in one round of communication, $n$ is the number of nodes and $D$ is the diameter of the graph. Typically, upper bounds are given only for the case $B=O(\log n)$, or for the case $B = +\infty$.


We consider the problem of routing in presence of faults in undirected weighted graphs. More specifically, we focus on the design of compact name-independent fault-tolerant routing schemes, where the designer of the scheme is not allowed to assign names to nodes, i.e.


On the one hand, the correctness of routing protocols in networks is an issue of utmost importance for guaranteeing the delivery of messages from any source to any target. On the other hand, a large collection of routing schemes has been proposed during the last two decades, with the objective of transmitting messages along short routes while keeping the routing tables small. Regrettably, all these schemes share the property that an adversary may modify the content of the routing tables with the objective of, e.


The paper is devoted to an analytical study of the "master-worker" framework scalability on multiprocessors with distributed memory. A new model of parallel computations called BSF is proposed. The BSF model is based on BSP and SPMD models.


In the context of distributed synchronous computing, processors perform in rounds, and the time-complexity of a distributed algorithm is classically defined as the number of rounds before all computing nodes have output. Hence, this complexity measure captures the running time of the slowest node(s). In this paper, we are interested in the running time of the ordinary nodes, to be compared with the running time of the slowest nodes.