Computer Science - Data Structures and Algorithms Publications (50)

Search

Computer Science - Data Structures and Algorithms Publications

If f is a Boolean function given by a BDD then it is well known how to calculate the number of models (i.e. bitstrings x with f(x)=1). Read More


Let $(\{1,2,\ldots,n\},d)$ be a metric space. We analyze the expected value and the variance of $\sum_{i=1}^{\lfloor n/2\rfloor}\,d({\boldsymbol{\pi}}(2i-1),{\boldsymbol{\pi}}(2i))$ for a uniformly random permutation ${\boldsymbol{\pi}}$ of $\{1,2,\ldots,n\}$, leading to the following results: (I) Consider the problem of finding a point in $\{1,2,\ldots,n\}$ with the minimum sum of distances to all points. We show that this problem has a randomized algorithm that (1) always outputs a $(2+\epsilon)$-approximate solution in expected $O(n/\epsilon^2)$ time and that (2) inherits Indyk's~\cite{Ind99, Ind00} algorithm to output a $(1+\epsilon)$-approximate solution in $O(n/\epsilon^2)$ time with probability $\Omega(1)$, where $\epsilon\in(0,1)$. Read More


In this paper we present a new error bound on sampling algorithms for frequent itemsets mining. We show that the new bound is asymptotically tighter than the state-of-art bounds, i.e. Read More


In the communication problem $\mathbf{UR}$ (universal relation) [KRW95], Alice and Bob respectively receive $x$ and $y$ in $\{0,1\}^n$ with the promise that $x\neq y$. The last player to receive a message must output an index $i$ such that $x_i\neq y_i$. We prove that the randomized one-way communication complexity of this problem in the public coin model is exactly $\Theta(\min\{n, \log(1/\delta)\log^2(\frac{n}{\log(1/\delta)})\})$ bits for failure probability $\delta$. Read More


Let $G$ be an $n$-node simple directed planar graph with nonnegative edge weights. We study the fundamental problems of computing (1) a global cut of $G$ with minimum weight and (2) a~cycle of $G$ with minimum weight. The best previously known algorithm for the former problem, running in $O(n\log^3 n)$ time, can be obtained from the algorithm of \Lacki, Nussbaum, Sankowski, and Wulff-Nilsen for single-source all-sinks maximum flows. Read More


We initiate the study of distance-sensitive hashing, a generalization of locality-sensitive hashing that seeks a family of hash functions such that the probability of two points having the same hash value is a given function of the distance between them. More precisely, given a distance space $(X, \text{dist})$ and a "collision probability function" (CPF) $f\colon \mathbb{R}\rightarrow [0,1]$ we seek a distribution over pairs of functions $(h,g)$ such that for every pair of points $x, y \in X$ the collision probability is $\Pr[h(x)=g(y)] = f(\text{dist}(x,y))$. Locality-sensitive hashing is the study of how fast a CPF can decrease as the distance grows. Read More


The Local Computation Algorithms (LCA) model is a computational model aimed at problem instances with huge inputs and output. For graph problems, the input graph is accessed using probes: strong probes (SP) specify a vertex $v$ and receive as a reply a list of $v$'s neighbors, and weak probes (WP) specify a vertex $v$ and a port number $i$ and receive as a reply $v$'s $i^{th}$ neighbor. Given a local query (e. Read More


This paper describes a method for clustering data that are spread out over large regions and which dimensions are on different scales of measurement. Such an algorithm was developed to implement a robotics application consisting in sorting and storing objects in an unsupervised way. The toy dataset used to validate such application consists of Lego bricks of different shapes and colors. Read More


In recent years crowdsourcing has become the method of choice for gathering labeled training data for learning algorithms. Standard approaches to crowdsourcing view the process of acquiring labeled data separately from the process of learning a classifier from the gathered data. This can give rise to computational and statistical challenges. Read More


Graph spanners have been studied extensively, and have many applications in algorithms, distributed systems, and computer networks. For many of these application, we want distributed constructions of spanners, i.e. Read More


We study the problem of constructing synthetic graphs that resemble real-world directed graphs in terms of their degree correlations. We define the problem of directed 2K construction (D2K) that takes as input the directed degree sequence (DDS) and a joint degree and attribute matrix (JDAM) so as to capture degree correlation specifically in directed graphs. We provide necessary and sufficient conditions to decide whether a target D2K is realizable, and we design an efficient algorithm that creates realizations with that target D2K. Read More


This paper introduces and approximately solves a multi-component problem where small rectangular items are produced from large rectangular bins via guillotine cuts. An item is characterized by its width, height, due date, and earliness and tardiness penalties per unit time. Each item induces a cost that is proportional to its earliness and tardiness. Read More


In the Tree Augmentation problem we are given a tree $T=(V,F)$ and an additional set $E \subseteq V \times V$ of edges, called "links", with positive integer costs $\{c_e:e \in E\}$. The goal is to augment $T$ by a minimum cost set of links $J \subseteq E$ such that $T \cup J$ is $2$-edge-connected. Let $M$ denote the maximum cost of a link. Read More


The two-dimensional non-oriented bin packing problem with due dates packs a set of rectangular items, which may be rotated by 90 degrees, into identical rectangular bins. The bins have equal processing times. An item's lateness is the difference between its due date and the completion time of its bin. Read More


In this paper we analyze the practical implications of Szemer\'edi's regularity lemma in the preservation of metric information contained in large graphs. To this end, we present a heuristic algorithm to find regular partitions. Our experiments show that this method is quite robust to the natural sparsification of proximity graphs. Read More


In this paper we consider the problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles. This is a common problem that arises in many agent-based simulation studies, and is of central importance in the context of High Level Architecture (HLA), where it is at the core of the Data Distribution Management (DDM) service. Several realizations of the DDM service have been proposed; however, many of them are either inefficient or inherently sequential. Read More


This study investigates whether reoptimization can help in solving the closest substring problem. We are dealing with the following reoptimization scenario. Suppose, we have an optimal l-length closest substring of a given set of sequences S. Read More


Singular values of a data in a matrix form provide insights on the structure of the data, the effective dimensionality, and the choice of hyper-parameters on higher-level data analysis tools. However, in many practical applications such as collaborative filtering and network analysis, we only get a partial observation. Under such scenarios, we consider the fundamental problem of recovering spectral properties of the underlying matrix from a sampling of its entries. Read More


In this paper, we offer and discuss three efficient structural solutions for the hardware-oriented implementation of discrete quaternion Fourier transform basic operations with reduced implementation complexities. The first solution: a scheme for calculating sq product, the second solution: a scheme for calculating qt product, and the third solution: a scheme for calculating sqt product, where s is a so-called i-quaternion, t is an j-quaternion, and q is an usual quaternion. The direct multiplication of two usual quaternions requires 16 real multiplications (or two-operand multipliers in the case of fully parallel hardware implementation) and 12 real additions (or binary adders). Read More


Distance-based indices, including closeness centrality, average path length, eccentricity and average eccentricity, are important tools for network analysis. In these indices, the distance between two vertices is measured by the size of shortest paths between them. However, this measure has shortcomings. Read More


Various real-life planning problems require making upfront decisions before all parameters of the problem have been disclosed. An important special case of such problem especially arises in scheduling and staff rostering problems, where a set of tasks needs to be assigned to an available set of resources (personnel or machines), in a way that each task is assigned to one resource, while no task is allowed to share a resource with another task. In its nominal form, the resulting computational problem reduces to the well-known assignment problem that can be modeled as matching problems on bipartite graphs. Read More


A common problem in large-scale data analysis is to approximate a matrix using a combination of specifically sampled rows and columns, known as CUR decomposition. Unfortunately, in many real-world environments, the ability to sample specific individual rows or columns of the matrix is limited by either system constraints or cost. In this paper, we consider matrix approximation by sampling predefined blocks of columns (or rows) from the matrix. Read More


In a seminal paper of Charikar et al.~on the smallest grammar problem, the authors derive upper and lower bounds on the approximation ratios for several grammar-based compressors. Here we improve the lower bound for the famous {\sf RePair} algorithm from $\Omega(\sqrt{\log n})$ to $\Omega(\log n/\log\log n)$. Read More


In this work we present the first practical $\left(\frac{1}{e}-\epsilon\right)$-approximation algorithm to maximise a general non-negative submodular function subject to a matroid constraint. Our algorithm is based on combining the decreasing-threshold procedure of Badanidiyuru and Vondrak (SODA 2014) with a smoother version of the measured continuous greedy algorithm of Feldman et al. (FOCS 2011). Read More


In this paper, we investigate the parametric weight knapsack problem, in which the item weights are affine functions of the form $w_i(\lambda) = a_i + \lambda \cdot b_i$ for $i \in \{1,\ldots,n\}$ depending on a real-valued parameter $\lambda$. The aim is to provide a solution for all values of the parameter. It is well-known that any exact algorithm for the problem may need to output an exponential number of knapsack solutions. Read More


Ortho-Radial drawings are a generalization of orthogonal drawings to grids that are formed by concentric circles and straight-line spokes emanating from the circles' center. Such drawings have applications in schematic graph layouts, e.g. Read More


We introduce the Connection Scan Algorithm (CSA) to efficiently answer queries to timetable information systems. The input consists, in the simplest setting, of a source position and a desired target position. The output consist is a sequence of vehicles such as trains or buses that a traveler should take to get from the source to the target. Read More


Finding maximum-cardinality matchings in undirected graphs is arguably one of the most central graph problems. For general m-edge and n-vertex graphs, it is well-known to be solvable in $O(m \sqrt{n})$ time. We develop the first linear-time algorithm to find maximum-cardinality matchings on cocomparability graphs, a prominent subclass of perfect graphs that contains interval graphs as well as permutation graphs. Read More


In this brief paper, we go through the theoretical steps of the spectral clustering on quantum computers by employing the phase estimation and the amplitude amplification algorithms. To speed-up the amplitude amplification, we introduce a biased version of the phase estimation algorithm. In addition, when the circuit representation of a data matrix of order $N$ is produced through an ancilla based circuit in which the matrix is written as a sum of $L$ number of Householder matrices; we show that the computational complexity of the whole process is bound by $O(c2^mLN)$ number of quantum gates. Read More


Given a traveling salesman problem (TSP) tour $H$ in graph $G$ a $k$-move is an operation which removes $k$ edges from $H$, and adds $k$ edges of $G$ so that a new tour $H'$ is formed. The popular $k$-OPT heuristics for TSP finds a local optimum by starting from an arbitrary tour $H$ and then improving it by a sequence of $k$-moves. Until 2016, the only known algorithm to find an improving $k$-move for a given tour was the naive solution in time $O(n^k)$. Read More


This paper severs as a user guide to the mapping framework VieM (Vienna Mapping and Sparse Quadratic Assignment). We give a rough overview of the techniques used within the framework and describe the user interface as well as the file formats used. Read More


We consider $k$ mobile agents of limited energy that are initially located at vertices of an edge-weighted graph $G$ and have to collectively deliver data from a source vertex $s$ to a target vertex $t$. The data are to be collected by an agent reaching $s$, who can carry and then hand them over another agent etc., until some agent with the data reaches $t$. Read More


Constructing a sparse \emph{spanning subgraph} is a fundamental primitive in graph theory. In this paper, we study this problem in the Centralized Local model, where the goal is to decide whether an edge is part of the spanning subgraph by examining only a small part of the input; yet, answers must be globally consistent and independent of prior queries. Unfortunately, maximally sparse spanning subgraphs, i. Read More


We study the problem of testing unateness of functions $f:\{0,1\}^d \to \mathbb{R}.$ We give a $O(\frac{d}{\epsilon} \cdot \log\frac{d}{\epsilon})$-query nonadaptive tester and a $O(\frac{d}{\epsilon})$-query adaptive tester and show that both testers are optimal for a fixed distance parameter $\epsilon$. Previously known unateness testers worked only for Boolean functions, and their query complexity had worse dependence on the dimension both for the adaptive and the nonadaptive case. Read More


Log-linear models are arguably the most successful class of graphical models for large-scale applications because of their simplicity and tractability. Learning and inference with these models require calculating the partition function, which is a major bottleneck and intractable for large state spaces. Importance Sampling (IS) and MCMC-based approaches are lucrative. Read More


For a (possibly infinite) fixed family of graphs F, we say that a graph G overlays F on a hypergraph H if V(H) is equal to V(G) and the subgraph of G induced by every hyperedge of H contains some member of F as a spanning subgraph.While it is easy to see that the complete graph on |V(H)| overlays F on a hypergraph H whenever the problem admits a solution, the Minimum F-Overlay problem asks for such a graph with the minimum number of edges.This problem allows to generalize some natural problems which may arise in practice. Read More


Deciding whether a given graph has a square root is a classical problem that has been studied extensively both from graph theoretic and from algorithmic perspectives. The problem is NP-complete in general, and consequently substantial effort has been dedicated to deciding whether a given graph has a square root that belongs to a particular graph class. There are both polynomial-time solvable and NP-complete cases, depending on the graph class. Read More


In evolutionary biology, phylogenetic networks are constructed to represent the evolution of species in which reticulate events are thought to have occurred, such as recombination and hybridization. It is therefore useful to have efficiently computable metrics with which to systematically compare such networks. Through developing an optimal algorithm to enumerate all trinets displayed by a level-1 network (a type of network that is slightly more general than an evolutionary tree), here we propose a cubic-time algorithm to compute the trinet distance between two level-1 networks. Read More


The constrained LCS problem asks one to find a longest common subsequence of two input strings $A$ and $B$ with some constraints. The STR-IC-LCS problem is a variant of the constrained LCS problem, where the solution must include a given constraint string $C$ as a substring. Given two strings $A$ and $B$ of respective lengths $M$ and $N$, and a constraint string $C$ of length at most $\min\{M, N\}$, the best known algorithm for the STR-IC-LCS problem, proposed by Deorowicz~({\em Inf. Read More


The Planar Graph Metric Compression Problem is to compactly encode the distances among $k$ nodes in a planar graph of size $n$. Two na\"ive solutions are to store the graph using $O(n)$ bits, or to explicitly store the distance matrix with $O(k^2 \log{n})$ bits. The only lower bounds are from the seminal work of Gavoille, Peleg, Prennes, and Raz [SODA'01], who rule out compressions into a polynomially smaller number of bits, for {\em weighted} planar graphs, but leave a large gap for unweighted planar graphs. Read More


The Container Relocation Problem (CRP) is concerned with finding a sequence of moves of containers that minimizes the number of relocations needed to retrieve all containers, while respecting a given order of retrieval. However, the assumption of knowing the full retrieval order of containers is particularly unrealistic in real operations. This paper studies the stochastic CRP (SCRP), which relaxes this assumption. Read More


Minwise hashing is a fundamental and one of the most successful hashing algorithm in the literature. Recent advances based on the idea of densification~\cite{Proc:OneHashLSH_ICML14,Proc:Shrivastava_UAI14} have shown that it is possible to compute $k$ minwise hashes, of a vector with $d$ nonzeros, in mere $(d + k)$ computations, a significant improvement over the classical $O(dk)$. These advances have led to an algorithmic improvement in the query complexity of traditional indexing algorithms based on minwise hashing. Read More


Given a rectilinear domain $\mathcal{P}$ of $h$ pairwise-disjoint rectilinear obstacles with a total of $n$ vertices in the plane, we study the problem of computing bicriteria rectilinear shortest paths between two points $s$ and $t$ in $\mathcal{P}$. Three types of bicriteria rectilinear paths are considered: minimum-link shortest paths, shortest minimum-link paths, and minimum-cost paths where the cost of a path is a non-decreasing function of both the number of edges and the length of the path. The one-point and two-point path queries are also considered. Read More


In this work, we study theoretical models of \emph{programmable matter} systems. The systems under consideration consist of spherical modules, kept together by magnetic forces and able to perform two minimal mechanical operations (or movements): \emph{rotate} around a neighbor and \emph{slide} over a line. In terms of modeling, there are $n$ nodes arranged in a 2-dimensional grid and forming some initial \emph{shape}. Read More


A graph is $k$-connected if it has $k$ internally-disjoint paths between every pair of nodes. A subset $S$ of nodes in a graph $G$ is a $k$-connected set if the subgraph $G[S]$ induced by $S$ is $k$-connected; $S$ is an $m$-dominating set if every $v \in V \setminus S$ has at least $m$ neighbors in $S$. If $S$ is both $k$-connected and $m$-dominating then $S$ is a $k$-connected $m$-dominating set, or $(k,m)$-cds for short. Read More


We provide a polynomial time reduction from Bayesian incentive compatible mechanism design to Bayesian algorithm design for welfare maximization problems. Unlike prior results, our reduction achieves exact incentive compatibility for problems with multi-dimensional and continuous type spaces. The key technical barrier preventing exact incentive compatibility in prior black-box reductions is that repairing violations of incentive constraints requires understanding the distribution of the mechanism's output. Read More


We study data structures for storing a set of polygonal curves in ${\rm R}^d$ such that, given a query curve, we can efficiently retrieve similar curves from the set, where similarity is measured using the discrete Fr\'echet distance or the dynamic time warping distance. To this end we devise the first locality-sensitive hashing schemes for these distance measures. A major challenge is posed by the fact that these distance measures internally optimize the alignment between the curves. Read More


Several papers have achieved time $O(\sqrt n m)$ for cardinality matching, starting from first principles. This results in a long derivation. We simplify the task by employing well-known concepts for maximum weight matching. Read More