Computer Science - Data Structures and Algorithms Publications (50)


We study the strong duality of non-convex matrix factorization: we show that, under certain dual conditions, non-convex matrix factorization and its dual have the same optimum. This has been well understood for convex optimization, but little was known for matrix factorization. We formalize the strong duality of matrix factorization through a novel analytical framework, and show that the duality gap is zero for a wide class of matrix factorization problems. Read More

We show that the integrality gap of the bidirected cut relaxation for the Steiner tree problem is at most 6/5 via a primal-dual schema based algorithm. Read More

The study of the Dense-$3$-Subhypergraph problem was initiated in Chlamtáč et al. [Approx'16]. The input is a universe $U$ and a collection ${\cal S}$ of subsets of $U$, each of size $3$, and a number $k$. Read More

Finding central nodes is a fundamental problem in network analysis. Betweenness centrality is a well-known measure which quantifies the importance of a node based on the fraction of shortest paths going through it. Due to the dynamic nature of many of today's networks, algorithms that quickly update centrality scores have become a necessity. Read More
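To make the definition of betweenness concrete, here is a minimal static sketch in the style of Brandes' algorithm for unweighted, undirected graphs. It is not the dynamic update algorithm the abstract refers to; the function name `betweenness` and the adjacency-dictionary input are assumptions made for illustration.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph.

    adj maps each node to an iterable of its neighbours (hypothetical format).
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph every unordered pair was counted twice.
    return {v: c / 2 for v, c in score.items()}
```

On a path a-b-c, the only pair whose shortest path has an interior node is (a, c), so only b receives a nonzero score.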

Re-Pair is an efficient grammar compressor that operates by recursively replacing high-frequency character pairs with new grammar symbols. The most space-efficient linear-time algorithm computing Re-Pair uses $(1+\epsilon)n+\sqrt n$ words on top of the re-writable text (of length $n$ and stored in $n$ words), for any constant $\epsilon>0$; in practice however, this solution uses complex sub-procedures preventing it from being practical. In this paper, we present an implementation of the above-mentioned result making use of more practical solutions; our tool further improves the working space to $(1. Read More
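The pair-replacement idea behind Re-Pair can be sketched naively as follows. This is a quadratic-time illustration only, not the linear-time, space-efficient algorithm discussed in the abstract; the names `repair` and `expand` and the `("N", id)` nonterminal encoding are assumptions.

```python
from collections import Counter

def repair(text):
    """Naive Re-Pair: repeatedly replace a most frequent adjacent pair.

    Returns (compressed sequence, rules), where rules maps each new
    symbol to the pair it replaced.
    """
    seq = list(text)
    rules = {}
    next_id = 0
    while True:
        # Counts include overlapping occurrences; good enough for a sketch.
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        symbol = ("N", next_id)  # hypothetical nonterminal naming
        next_id += 1
        rules[symbol] = pair
        # Left-to-right scan replacing non-overlapping occurrences.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def expand(seq, rules):
    """Invert repair() by recursively expanding nonterminals."""
    out = []
    for s in seq:
        if s in rules:
            out.extend(expand(rules[s], rules))
        else:
            out.append(s)
    return "".join(out)
```

For the input "abababab", the sketch first replaces "ab" and then the resulting repeated nonterminal, leaving a sequence of two symbols and two rules.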

The paper develops a new technique to extract a characteristic subset from a random source that repeatedly samples from a set of elements. Here a characteristic subset is a set that when containing an element contains all elements that have the same probability. With this technique at hand the paper looks at the special case of the tournament isomorphism problem that stands in the way towards a polynomial-time algorithm for the graph isomorphism problem. Read More

Through the development of efficient algorithms, data structures and preprocessing techniques, real-world shortest path problems in street networks are now very fast to solve. But in reality, the exact travel times along each arc in the network may not be known. This led to the development of robust shortest path problems, where all possible arc travel times are contained in a so-called uncertainty set of possible outcomes. Read More

For a positive parameter $\beta$, the $\beta$-bounded distance between a pair of vertices $u,v$ in a weighted undirected graph $G = (V,E,\omega)$ is the length of the shortest $u-v$ path in $G$ with at most $\beta$ edges, aka {\em hops}. For $\beta$ as above and $\epsilon>0$, a {\em $(\beta,\epsilon)$-hopset} of $G = (V,E,\omega)$ is a graph $G' =(V,H,\omega_H)$ on the same vertex set, such that all distances in $G$ are $(1+\epsilon)$-approximated by $\beta$-bounded distances in $G\cup G'$. Hopsets are a fundamental graph-theoretic and graph-algorithmic construct, and they are widely used for distance-related problems in a variety of computational settings. Read More
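The $\beta$-bounded distance defined above can be computed by running Bellman-Ford for exactly $\beta$ rounds. The following sketch assumes vertices numbered $0,\ldots,n-1$ and an edge list of `(u, v, w)` triples; the function name `bounded_distance` is hypothetical, and the code illustrates the definition only, not a hopset construction.

```python
import math

def bounded_distance(n, edges, source, beta):
    """Return dist, where dist[v] is the length of the shortest
    source-v path that uses at most beta edges (hops).

    edges: list of (u, v, w) triples of an undirected weighted graph.
    """
    dist = [math.inf] * n
    dist[source] = 0.0
    for _ in range(beta):
        # Relax only from the previous round so each round adds one hop.
        new = dist[:]
        for u, v, w in edges:
            if dist[u] + w < new[v]:
                new[v] = dist[u] + w
            if dist[v] + w < new[u]:
                new[u] = dist[v] + w
        dist = new
    return dist
```

With edges 0-1 and 1-2 of weight 1 and a direct edge 0-2 of weight 5, the 1-bounded distance from 0 to 2 is 5, while the 2-bounded distance is 2.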

We consider the communication complexity of finding an approximate maximum matching in a graph in a multi-party message-passing communication model. The maximum matching problem is one of the most fundamental graph combinatorial problems, with a variety of applications. The input to the problem is a graph $G$ that has $n$ vertices and the set of edges partitioned over $k$ sites, and an approximation ratio parameter $\alpha$. Read More

A novel landmark-based oracle (CFLAT) is presented, which provides earliest-arrival-time route plans in time-dependent road networks. To our knowledge, this is the first oracle that preprocesses combinatorial structures (collections of time-stamped min-travel-time-path trees) rather than travel-time functions. The preprocessed data structure is exploited by a new query algorithm (CFCA) which computes (and pays for it), apart from earliest-arrival-time estimations, the actual connecting path that preserves the theoretical approximation guarantees. Read More

In data-parallel computing frameworks, intermediate parallel data is often produced at various stages which needs to be transferred among servers in the datacenter network (e.g. the shuffle phase in MapReduce). Read More

We consider relative error low rank approximation of {\it tensors} with respect to the Frobenius norm: given an order-$q$ tensor $A \in \mathbb{R}^{\prod_{i=1}^q n_i}$, output a rank-$k$ tensor $B$ for which $\|A-B\|_F^2 \leq (1+\epsilon)$OPT, where OPT $= \inf_{\textrm{rank-}k~A'} \|A-A'\|_F^2$. Despite the success in obtaining relative error low rank approximations for matrices, no such results were known for tensors. One structural issue is that there may be no rank-$k$ tensor $A_k$ achieving the above infimum. Read More

We introduce a new dynamic data structure for maintaining the strongly connected components (SCCs) of a directed graph (digraph) under edge deletions, so as to answer a rich repertoire of connectivity queries. Our main technical contribution is a decremental data structure that supports sensitivity queries of the form "are $ u $ and $ v $ strongly connected in the graph $ G \setminus w $?", for any triple of vertices $ u, v, w $, while $ G $ undergoes deletions of edges. Our data structure processes a sequence of edge deletions in a digraph with $n$ vertices in $O(m n \log{n})$ total time and $O(n^2 \log{n})$ space, where $m$ is the number of edges before any deletion, and answers the above queries in constant time. Read More

We study the problem of finding the cycle of minimum cost-to-time ratio in a directed graph with $ n $ nodes and $ m $ edges. This problem has a long history in combinatorial optimization and has recently seen interesting applications in the context of quantitative verification. We focus on strongly polynomial algorithms to cover the use-case where the weights are relatively large compared to the size of the graph. Read More

We say that an algorithm is stable if small changes in the input result in small changes in the output. Algorithm stability plays an important role when analyzing and visualizing time-varying data. However, so far, there are only a few theoretical results on the stability of algorithms, possibly due to a lack of theoretical analysis tools. Read More

We consider the family of $\Phi$-Subset problems, where the input consists of an instance $I$ of size $N$ over a universe $U_I$ of size $n$ and the task is to check whether the universe contains a subset with property $\Phi$ (e.g., $\Phi$ could be the property of being a feedback vertex set for the input graph of size at most $k$). Read More

We consider the problem of querying a string (or, a database) of length $N$ bits to determine all the locations where a substring (query) of length $M$ appears either exactly or is within a Hamming distance of $K$ from the query. We assume that sketches of the original signal can be computed off line and stored. Using the sparse Fourier transform computation based approach introduced by Pawar and Ramchandran, we show that all such matches can be determined with high probability in sub-linear time. Read More
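For reference, the matching problem itself (not the sub-linear sparse Fourier transform approach) can be stated as a brute-force baseline that runs in $O(NM)$ time. The function name `hamming_matches` and the use of Python strings for bit strings are assumptions.

```python
def hamming_matches(text, query, k):
    """Return all 0-based start positions where query occurs in text
    with Hamming distance at most k (k = 0 gives exact matches)."""
    m = len(query)
    return [
        i
        for i in range(len(text) - m + 1)
        if sum(a != b for a, b in zip(text[i:i + m], query)) <= k
    ]
```

A baseline like this is mainly useful for checking the output of a faster, probabilistic method on small inputs.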

Small depth networks arise in a variety of network related applications, often in the form of maximum flow and maximum weighted matching. Recent works have generalized such methods to include costs arising from concave functions. In this paper we give an algorithm that takes a depth $D$ network and strictly increasing concave weight functions of flows on the edges and computes a $(1 - \epsilon)$-approximation to the maximum weight flow in time $mD \epsilon^{-1}$ times an overhead that is logarithmic in the various numerical parameters related to the magnitudes of gradients and capacities. Read More

We consider the problem of summarizing a multi set of elements in $\{1, 2, \ldots , n\}$ under the constraint that no element appears more than $\ell$ times. The goal is then to answer \emph{rank} queries --- given $i\in\{1, 2, \ldots , n\}$, how many elements in the multi set are smaller than $i$? --- with an additive error of at most $\Delta$ and in constant time. For this problem, we prove a lower bound of $\mathcal B_{\ell,n,\Delta}\triangleq$ $\left\lfloor{\frac{n}{\left\lceil{\Delta / \ell}\right\rceil}}\right\rfloor $ $\log\big({\max\{\left\lfloor{\ell / \Delta}\right\rfloor,1\} + 1}\big)$ bits and provide a \emph{succinct} construction that uses $\mathcal B_{\ell,n,\Delta}(1+o(1))$ bits. Read More
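The lower bound $\mathcal B_{\ell,n,\Delta}$ can be evaluated numerically for concrete parameters. The sketch below assumes positive integer $\ell$, $n$, $\Delta$ and a base-2 logarithm (the abstract measures space in bits but does not state the base explicitly); the function name `rank_lower_bound_bits` is hypothetical.

```python
import math

def rank_lower_bound_bits(ell, n, delta):
    """Evaluate floor(n / ceil(delta/ell)) * log2(max(floor(ell/delta), 1) + 1).

    Assumes positive integers and log base 2 (an assumption, see lead-in).
    """
    ceil_ratio = -(-delta // ell)          # integer ceiling of delta / ell
    blocks = n // ceil_ratio               # floor(n / ceil(delta / ell))
    levels = max(ell // delta, 1) + 1      # max(floor(ell / delta), 1) + 1
    return blocks * math.log2(levels)
```

For example, with $\ell = 1$, $n = 1000$ and $\Delta = 10$ the bound evaluates to 100 bits: one bit for each block of 10 consecutive universe elements.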

Achieving the goals in the title (and others) relies on a cardinality-wise scanning of the ideals of the poset. Specifically, the relevant numbers attached to the k+1 element ideals are inferred from the corresponding numbers of the k-element (order) ideals. Crucial in all of this is a compressed representation (using wildcards) of the ideal lattice. Read More

Principal component analysis (PCA) is a fundamental dimension reduction tool in statistics and machine learning. For large and high-dimensional data, computing the PCA (i.e. Read More

The aim of process discovery, originating from the area of process mining, is to discover a process model based on business process execution data. A majority of process discovery techniques relies on an event log as an input. An event log is a static source of historical data capturing the execution of a business process. Read More

In a weighted sequence, for every position of the sequence and every letter of the alphabet a probability of occurrence of this letter at this position is specified. Weighted sequences are commonly used to represent imprecise or uncertain data, for example, in molecular biology where they are known under the name of Position-Weight Matrices. Given a probability threshold $\frac1z$, we say that a string $P$ of length $m$ matches a weighted sequence $X$ at starting position $i$ if the product of probabilities of the letters of $P$ at positions $i,\ldots,i+m-1$ in $X$ is at least $\frac1z$. Read More
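The matching condition above translates directly into a product check. In this sketch the weighted sequence is assumed to be a list of dictionaries mapping letters to probabilities, positions are 0-based (the abstract's indexing may differ), and the function name `matches_at` is hypothetical.

```python
def matches_at(X, P, i, z):
    """Return True if string P matches weighted sequence X at 0-based
    start position i with probability at least 1/z.

    X: list of dicts, X[j][c] = probability of letter c at position j.
    """
    if i < 0 or i + len(P) > len(X):
        return False
    prob = 1.0
    for offset, letter in enumerate(P):
        # Letters missing from a position are treated as probability 0.
        prob *= X[i + offset].get(letter, 0.0)
    return prob >= 1.0 / z
```

This check costs $O(m)$ per starting position; it only illustrates the definition and says nothing about the indexing or matching algorithms studied in the paper.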

We consider the well-studied Hospital Residents (HR) problem in the presence of lower quotas (LQ). The input instance consists of a bipartite graph $G = (\mathcal{R} \cup \mathcal{H}, E)$ where $\mathcal{R}$ and $\mathcal{H}$ denote sets of residents and hospitals respectively. Every vertex has a preference list that imposes a strict ordering on its neighbors. Read More

In this paper, we consider a coverage problem for uncertain points in a tree. Let $T$ be a tree containing a set $P$ of $n$ (weighted) demand points, and the location of each demand point $P_i\in P$ is uncertain but is known to appear in one of $m_i$ points on $T$ each associated with a probability. Given a covering range $\lambda$, the problem is to find a minimum number of points (called centers) on $T$ to build facilities for serving (or covering) these demand points in the sense that for each uncertain point $P_i\in P$, the expected distance from $P_i$ to at least one center is no more than $\lambda$. Read More

String Kernel (SK) techniques, especially those using gapped $k$-mers as features (gk), have obtained great success in classifying sequences like DNA, protein, and text. However, the state-of-the-art gk-SK runs extremely slowly when we increase the dictionary size ($\Sigma$) or allow more mismatches ($M$). This is because current gk-SK uses a trie-based algorithm to calculate co-occurrence of mismatched substrings, resulting in a time cost proportional to $O(\Sigma^{M})$. Read More

Osborne's iteration is a method for balancing $n\times n$ matrices which is widely used in linear algebra packages, as balancing preserves eigenvalues and stabilizes their numerical computation. The iteration can be implemented in any norm over $\mathbb{R}^n$, but it is normally used in the $L_2$ norm. The choice of norm not only affects the desired balance condition, but also defines the iterated balancing step itself. Read More

Betweenness centrality is an important index widely used in different domains such as social networks, traffic networks and the world wide web. However, even for mid-size networks that have only a few hundred thousand vertices, it is computationally expensive to compute exact betweenness scores. Therefore, in recent years, several approximate algorithms have been developed. Read More

Monotonic surfaces spanning finite regions of $Z^d$ arise in many contexts, including DNA-based self-assembly, card-shuffling and lozenge tilings. One method that has been used to uniformly generate these surfaces is a Markov chain that iteratively adds or removes a single cube below the surface during a step. We consider a biased version of the chain, where we are more likely to add a cube than to remove it, thereby favoring surfaces that are "higher" or have more cubes below them. Read More

Symbolic regression is an important but challenging research topic in data mining. It can detect the underlying mathematical models. Genetic programming (GP) is one of the most popular methods for symbolic regression. Read More

For a fixed collection of graphs ${\cal F}$, the ${\cal F}$-M-DELETION problem consists in deciding, given a graph $G$ and an integer $k$, whether there exists $S \subseteq V(G)$ with $|S| \leq k$ such that $G \setminus S$ does not contain any of the graphs in ${\cal F}$ as a minor. We are interested in the parameterized complexity of ${\cal F}$-M-DELETION when the parameter is the treewidth of $G$, denoted by $tw$. Our objective is to determine, for a fixed ${\cal F}$, the smallest function $f_{{\cal F}}$ such that ${\cal F}$-M-DELETION can be solved in time $f_{{\cal F}}(tw) \cdot n^{O(1)}$ on $n$-vertex graphs. Read More

We give algorithms with running time $2^{O({\sqrt{k}\log{k}})} \cdot n^{O(1)}$ for the following problems. Given an $n$-vertex unit disk graph $G$ and an integer $k$, decide whether $G$ contains (1) a path on exactly/at least $k$ vertices, (2) a cycle on exactly $k$ vertices, (3) a cycle on at least $k$ vertices, (4) a feedback vertex set of size at most $k$, and (5) a set of $k$ pairwise vertex-disjoint cycles. For the first three problems, no subexponential time parameterized algorithms were previously known. Read More

Disjoint-Set forests, consisting of Union-Find trees, are data structures having a widespread practical application due to their efficiency. Despite them being well-known, no exact structural characterization of these trees is known (such a characterization exists for Union trees which are constructed without using path compression) for the case assuming union-by-rank strategy for merging. In this paper we provide such a characterization by means of a simple push operation and show that the decision problem whether a given tree (along with the rank info of its nodes) is a Union-Find tree is NP-complete, complementing our earlier similar result for the union-by-size strategy. Read More
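For readers unfamiliar with the structure being characterized, here is a minimal sketch of a disjoint-set forest with union-by-rank and path compression. The class and method names are assumptions; the code illustrates the standard data structure, not the paper's push operation or its NP-completeness result.

```python
class DisjointSet:
    """Disjoint-set forest over elements 0..n-1."""

    def __init__(self, n):
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n           # upper bound on the tree height

    def find(self, x):
        # First pass: locate the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: path compression, point every visited node at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Merge the sets of a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        # Union by rank: attach the lower-rank root under the higher one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True
```

The trees stored in `parent` after a sequence of `union` and `find` calls are the Union-Find trees whose recognition problem the abstract discusses.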

We introduce and investigate reroutable flows, a robust version of network flows in which link failures can be mitigated by rerouting the affected flow. Given a capacitated network, a path flow is reroutable if after failure of an arbitrary arc, we can reroute the interrupted flow from the tail of that arc to the sink, without modifying the flow that is not affected by the failure. Similar types of restoration, which are often termed "local", were previously investigated in the context of network design, such as min-cost capacity planning. Read More

We consider the problem of online Min-cost Perfect Matching with Delays (MPMD) introduced by Emek et al. (STOC 2016). In this problem, an even number of requests appear in a metric space at different times and the goal of an online algorithm is to match them in pairs. Read More

The interval subset sum problem (ISSP) is a generalization of the well-known subset sum problem. Given a set of intervals $\left\{[a_{i,1},a_{i,2}]\right\}_{i=1}^n$ and a target integer $T,$ the ISSP is to find a set of integers, at most one from each interval, such that their sum best approximates the target $T$ but cannot exceed it. In this paper, we first study the computational complexity of the ISSP. Read More

Let $G=(V,E)$ be a graph with $n$ vertices and $m$ edges, with a designated set of $\sigma$ sources $S\subseteq V$. The fault tolerant subgraph for any graph problem maintains a sparse subgraph $H$ of $G$, such that for any set $F$ of $k$ failures, the solution for the graph problem on $G\setminus F$ is maintained in $H\setminus F$. We address the problem of maintaining a fault tolerant subgraph for the Breadth First Search tree (BFS) of the graph from a single source $s\in V$ (referred to as $k$ FT-BFS) or multiple sources $S\subseteq V$ (referred to as $k$ FT-MBFS). Read More

A recently proposed exact algorithm for the maximum independent set problem is analyzed. The typical running time is improved exponentially in some parameter regions compared to simple binary search. The algorithm also overcomes the core transition point, where the conventional leaf removal algorithm fails, and works up to the replica symmetry breaking (RSB) transition point. Read More

In this paper, we consider the problems for covering multiple intervals on a line. Given a set B of m line segments (called "barriers") on a horizontal line L and another set S of n horizontal line segments of the same length in the plane, we want to move all segments of S to L so that their union covers all barriers and the maximum movement of all segments of S is minimized. Previously, an O(n^3 log n)-time algorithm was given for the problem but only for the special case m = 1. Read More

We investigate the Robust Multiperiod Network Design Problem, a generalization of the classical Capacitated Network Design Problem that additionally considers multiple design periods and provides solutions protected against traffic uncertainty. Given the intrinsic difficulty of the problem, which proves challenging even for state-of-the-art commercial solvers, we propose a hybrid primal heuristic based on the combination of ant colony optimization and an exact large neighborhood search. Computational experiments on a set of realistic instances from the SNDlib show that our heuristic can find solutions of extremely good quality with a low optimality gap. Read More

The problem of ranking a set of items is fundamental in today's data-driven world. Ranking algorithms lie at the core of applications such as search engines, news feeds, and recommendation systems. However, recent events have pointed to the fact that algorithmic bias in rankings, which results in decreased fairness or diversity in the type of content presented, can promote stereotypes and propagate injustices. Read More

In this paper, we mathematically model the multi-hop Peer-to-Peer (P2P) ride-matching problem as a binary program. We formulate this problem as a many-to-many problem in which a rider can travel by transferring between multiple drivers, and a driver can carry multiple riders. We propose a pre-processing procedure to reduce the size of the problem, and devise a decomposition algorithm to solve the original ride-matching problem to optimality by means of solving multiple smaller problems. Read More

We introduce the adaptive cuckoo filter (ACF), a data structure for approximate set membership that extends cuckoo filters by reacting to false positives, removing them for future queries. As an example application, in packet processing queries may correspond to flow identifiers, so a search for an element is likely to be followed by repeated searches for that element. Removing false positives can therefore significantly lower the false positive rate. Read More

We study quantum algorithms on search trees of unknown structure, in a model where the tree can be discovered by local exploration. That is, we are given the root of the tree and access to a black box which, given a vertex $v$, outputs the children of $v$. We construct a quantum algorithm which, given such access to a search tree of depth at most $n$, estimates the size of the tree $T$ within a factor of $1\pm \delta$ in $\tilde{O}(\sqrt{nT})$ steps. Read More

It has long been known that Feedback Vertex Set can be solved in time $2^{\mathcal{O}(w\log w)}n^{\mathcal{O}(1)}$ on graphs of treewidth $w$, but it was only recently that this running time was improved to $2^{\mathcal{O}(w)}n^{\mathcal{O}(1)}$, that is, to single-exponential parameterized by treewidth. We investigate which generalizations of Feedback Vertex Set can be solved in a similar running time. Formally, for a class of graphs $\mathcal{P}$, the Bounded $\mathcal{P}$-Block Vertex Deletion problem asks, given a graph $G$ on $n$ vertices and positive integers $k$ and $d$, whether $G$ contains a set $S$ of at most $k$ vertices such that each block of $G-S$ has at most $d$ vertices and is in $\mathcal{P}$. Read More

This paper presents the pessimistic time complexity analysis of the parallel algorithm for minimizing the fleet size in the pickup and delivery problem with time windows. We show how to estimate the pessimistic complexity step by step. This approach can be easily adapted to other parallel algorithms for solving complex transportation problems. Read More

In insertion-only streaming, one sees a sequence of indices $a_1, a_2, \ldots, a_m\in [n]$. The stream defines a sequence of $m$ frequency vectors $x^{(1)},\ldots,x^{(m)}\in\mathbb{R}^n$ with $(x^{(t)})_i = |\{j : j\in[t], a_j = i\}|$. That is, $x^{(t)}$ is the frequency vector after seeing the first $t$ items in the stream. Read More
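The sequence of frequency vectors defined above can be generated with a single pass over the stream. This sketch assumes the indices are 1-based, as in $[n] = \{1,\ldots,n\}$, and the generator name `frequency_vectors` is hypothetical; storing every vector is done here only to mirror the definition, whereas a streaming algorithm would keep a small summary instead.

```python
def frequency_vectors(stream, n):
    """Yield x^(1), ..., x^(m), where x^(t)[i-1] counts how often
    index i appears among the first t stream items (indices are 1-based)."""
    x = [0] * n
    for a in stream:
        x[a - 1] += 1
        yield tuple(x)  # snapshot after seeing this item
```

For the stream 2, 1, 2 over $n = 3$, the vectors are (0, 1, 0), (1, 1, 0) and (1, 2, 0).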

Base station cooperation (BSC) has recently arisen as a promising way to increase the capacity of a wireless network. Implementing BSC adds a new design dimension to the classical wireless network design problem: how to define the subset of base stations (clusters) that coordinate to serve a user. Though the problem of forming clusters has been extensively discussed from a technical point of view, there is still a lack of effective optimization models for its representation and algorithms for its solution. Read More

We show that by restricting the degrees of the vertices of a graph to an arbitrary set $ \Delta $, the threshold point $ \alpha(\Delta) $ of the phase transition for a random graph with $ n $ vertices and $ m = \alpha(\Delta) n $ edges can be either accelerated (e.g., $ \alpha(\Delta) \approx 0. Read More