Computer Science - Data Structures and Algorithms Publications (50)


In this paper, we study the problem of constructing a network by observing ordered connectivity constraints, which we define herein. These ordered constraints are made to capture realistic properties of real-world problems that are not reflected in previous, more general models. We give hardness of approximation results and nearly-matching upper bounds for the offline problem, and we study the online problem in both general graphs and restricted sub-classes. Read More

We resolve a number of long-standing open problems in online graph coloring. More specifically, we develop tight lower bounds on the performance of online algorithms for fundamental graph classes. An important contribution is that our bounds also hold for randomized online algorithms, for which hardly any results were known. Read More

Bipartite matching, where agents on one side of a market are matched to agents or items on the other, is a classical problem in computer science and economics, with widespread application in healthcare, education, advertising, and general resource allocation. A practitioner's goal is typically to maximize a matching market's economic efficiency, possibly subject to some fairness requirements that promote equal access to resources. A natural balancing act exists between fairness and efficiency in matching markets, and has been the subject of much research. Read More

We show that the problem of finding an optimal bundle-pricing for a single additive buyer is #P-hard, even when the distributions have support size 2 for each item and the optimal solution is guaranteed to be a simple one: the seller picks a price for the grand bundle and a price for each individual item; the buyer can purchase either the grand bundle at the given price or any bundle of items at their total individual prices. We refer to this simple and natural family of pricing schemes as discounted item-pricings. In addition to the hardness result, we show that when the distributions are i. Read More
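
The buyer's side of a discounted item-pricing can be illustrated with a minimal sketch. It assumes an additive buyer whose value for a bundle is the sum of the item values; the function name `buyer_best_response`, the example numbers, and the tie-breaking rule (ties go to item-by-item purchase, and an item is bought when its value equals its price) are assumptions for illustration and are not taken from the paper.

```python
def buyer_best_response(values, item_prices, bundle_price):
    """Return (payment, utility) for an additive buyer facing a discounted item-pricing.

    The buyer compares two options:
      1. buy the grand bundle at bundle_price;
      2. buy any subset of items at the sum of their individual prices
         (the best subset is every item whose value covers its price).
    Tie-breaking in favour of option 2 is an assumption of this sketch.
    """
    bundle_utility = sum(values) - bundle_price

    # Best item-by-item purchase: take each item with non-negative surplus.
    chosen = [i for i, (v, p) in enumerate(zip(values, item_prices)) if v >= p]
    items_payment = sum(item_prices[i] for i in chosen)
    items_utility = sum(values[i] - item_prices[i] for i in chosen)

    if bundle_utility > items_utility:
        return bundle_price, bundle_utility
    return items_payment, items_utility


# Hypothetical valuations and prices.
values = [4, 1, 3]
item_prices = [3, 2, 2]

cheap_bundle = buyer_best_response(values, item_prices, bundle_price=5)
pricey_bundle = buyer_best_response(values, item_prices, bundle_price=7)
```

With the bundle priced at 5 the buyer takes the grand bundle; at 7 the buyer instead buys the first and third items separately. The hardness result concerns the seller's problem of choosing these prices optimally over a distribution of buyers, which this sketch does not attempt.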

While greedy algorithms have long been observed to perform well on a wide variety of problems, up to now approximation ratios have only been known for their application to problems having submodular objective functions $f$. Since many practical problems have non-submodular $f$, there is a critical need to devise new techniques to bound the performance of greedy algorithms in the case of non-submodularity. Our primary contribution is the introduction of a novel technique for estimating the approximation ratio of the greedy algorithm for maximization of monotone non-decreasing functions based on the curvature of $f$ without relying on the submodularity constraint. Read More

We design approximation algorithms for Unique Games when the constraint graph admits a good low-diameter graph decomposition. For the ${\sf Max2Lin}_k$ problem in $K_r$-minor free graphs, when there is an assignment satisfying a $1-\varepsilon$ fraction of constraints, we present an algorithm that produces an assignment satisfying a $1-O(r\varepsilon)$ fraction of constraints, with the approximation ratio independent of the alphabet size. Read More

Given a set of $n$ points $P$ in the plane, the first layer $L_1$ of $P$ is formed by the points that appear on $P$'s convex hull. In general, a point belongs to layer $L_i$ if it lies on the convex hull of the set $P \setminus \bigcup_{j<i} L_j$. Read More
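
The layer definition can be sketched as repeated "peeling": compute the convex hull of the remaining points, record it as the next layer, remove it, and repeat. The sketch below uses Andrew's monotone chain for the hull step, which is a swapped-in textbook technique and not necessarily what the paper uses. It also assumes that only strict hull vertices belong to a layer, so points lying in the interior of a hull edge fall into a later layer; the function names are hypothetical.

```python
def cross(o, a, b):
    """Cross product of vectors OA and OB; positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Strict hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def convex_layers(points):
    """Peel hulls until no points remain; layers[i - 1] corresponds to L_i."""
    remaining = set(points)
    layers = []
    while remaining:
        hull = convex_hull(list(remaining))
        layers.append(hull)
        remaining -= set(hull)
    return layers


# Hypothetical input: two nested squares and a centre point.
points = [(0, 0), (4, 0), (4, 4), (0, 4),
          (1, 1), (3, 1), (3, 3), (1, 3),
          (2, 2)]
layers = convex_layers(points)
```

For this input the outer square forms the first layer, the inner square the second, and the centre point the third. Peeling this way recomputes a hull per layer, so it is meant only to make the definition concrete.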

For each integer $n$ we present an explicit formulation of a compact linear program, with $O(n^3)$ variables and constraints, which determines the satisfiability of any 2SAT formula with $n$ boolean variables by a single linear optimization. This contrasts with the fact that the natural polytope for this problem, formed from the convex hull of all satisfiable formulas and their satisfying assignments, has superpolynomial extension complexity. Our formulation is based on multicommodity flows. Read More

A semiorder is a model of preference relations where each element $x$ is associated with a utility value $\alpha(x)$, and there is a threshold $t$ such that $y$ is preferred to $x$ iff $\alpha(y) > \alpha(x)+t$. These are motivated by the notion that there is some uncertainty in the utility values we assign an object or that a subject may be unable to distinguish a preference between objects whose values are close. However, they fail to model the well-known phenomenon that preferences are not always transitive. Read More
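
The threshold rule can be made concrete with a short sketch. The utilities, the threshold value, and the function names below are hypothetical and chosen only for illustration.

```python
def prefers(alpha_y, alpha_x, t):
    """True when y is preferred to x, i.e. alpha(y) > alpha(x) + t."""
    return alpha_y > alpha_x + t


def indifferent(alpha_x, alpha_y, t):
    """True when neither element is preferred to the other."""
    return not prefers(alpha_x, alpha_y, t) and not prefers(alpha_y, alpha_x, t)


# Hypothetical utility values and threshold.
alpha = {"a": 0.0, "b": 0.6, "c": 1.2}
t = 1.0

c_over_a = prefers(alpha["c"], alpha["a"], t)        # 1.2 > 0.0 + 1.0
a_like_b = indifferent(alpha["a"], alpha["b"], t)    # values within t of each other
b_like_c = indifferent(alpha["b"], alpha["c"], t)    # values within t of each other
a_like_c = indifferent(alpha["a"], alpha["c"], t)    # False: c is preferred to a
```

Here `a` and `b` are indistinguishable, as are `b` and `c`, yet `c` is preferred to `a`, so indifference is not transitive. Strict preference itself remains transitive for any threshold $t \ge 0$: if $\alpha(z) > \alpha(y)+t$ and $\alpha(y) > \alpha(x)+t$, then $\alpha(z) > \alpha(x)+2t \ge \alpha(x)+t$.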

Listing all triangles in an undirected graph is a fundamental graph primitive with numerous applications. It is trivially solvable in time cubic in the number of vertices. It has seen a significant body of work contributing to both theoretical aspects (e. Read More
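
The cubic-time baseline mentioned above can be sketched by testing every vertex triple against an adjacency matrix. The function name and the example graph are hypothetical; vertices are assumed to be numbered `0` to `n - 1`.

```python
from itertools import combinations


def list_triangles(n, edges):
    """List all triangles of an undirected graph by checking every vertex triple.

    Runs in time cubic in the number of vertices n.
    """
    adjacent = [[False] * n for _ in range(n)]
    for u, v in edges:
        adjacent[u][v] = adjacent[v][u] = True

    return [
        (a, b, c)
        for a, b, c in combinations(range(n), 3)
        if adjacent[a][b] and adjacent[b][c] and adjacent[a][c]
    ]


# Hypothetical graph: one triangle {0, 1, 2} plus a pendant edge (2, 3).
triangles = list_triangles(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
```

Each triangle is reported once, as a sorted triple, because `combinations` yields vertex triples in increasing order.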

Hyperbolicity measures, in terms of (distance) metrics, how close a given graph is to being a tree. Due to its relevance in modeling real-world networks, hyperbolicity has seen intensive research over the last years. Unfortunately, the best known algorithms for computing the hyperbolicity number of a graph (the smaller, the more tree-like) have running time $O(n^4)$, where $n$ is the number of graph vertices. Read More

We consider the task of enumerating and counting answers to $k$-ary conjunctive queries against relational databases that may be updated by inserting or deleting tuples. We exhibit a new notion of q-hierarchical conjunctive queries and show that these can be maintained efficiently in the following sense. During a linear time preprocessing phase, we can build a data structure that enables constant delay enumeration of the query results; and when the database is updated, we can update the data structure and restart the enumeration phase within constant time. Read More

In an effort to increase the versatility of finite element codes, we explore the possibility of automatically creating the Jacobian matrix necessary for the gradient-based solution of nonlinear systems of equations. Particularly, we aim to assess the feasibility of employing the automatic differentiation tool TAPENADE for this purpose on a large Fortran codebase that is the result of many years of continuous development. As a starting point we will describe the special structure of finite element codes and the implications that this code design carries for an efficient calculation of the Jacobian matrix. Read More

We consider the NP-hard Tree Containment problem that has important applications in phylogenetics. The problem asks if a given leaf-labeled network contains a subdivision of a given leaf-labeled tree. We develop a fast algorithm for the case that the input network is indeed a tree in which multiple leaves might share a label. Read More

Understanding the influence of a product is crucially important for making informed business decisions. This paper introduces a new type of skyline queries, called uncertain reverse skyline, for measuring the influence of a probabilistic product in uncertain data settings. More specifically, given a dataset of probabilistic products P and a set of customers C, an uncertain reverse skyline of a probabilistic product q retrieves all customers c in C which include q as one of their preferred products. Read More

The {\em maximum duo-preservation string mapping} ({\sc Max-Duo}) problem is the complement of the well studied {\em minimum common string partition} ({\sc MCSP}) problem, both of which have applications in many fields including text compression and bioinformatics. $k$-{\sc Max-Duo} is the restricted version of {\sc Max-Duo}, where every letter of the alphabet occurs at most $k$ times in each of the strings, which is readily reduced into the well known {\em maximum independent set} ({\sc MIS}) problem on a graph of maximum degree $\Delta \le 6(k-1)$. In particular, $2$-{\sc Max-Duo} can then be approximated arbitrarily close to $1. Read More

We obtain the first polynomial-time algorithm for exact tensor completion that improves over the bound implied by reduction to matrix completion. The algorithm recovers an unknown 3-tensor with $r$ incoherent, orthogonal components in $\mathbb R^n$ from $r\cdot \tilde O(n^{1.5})$ randomly observed entries of the tensor. Read More

We give faster algorithms for producing sparse approximations of the transition matrices of $k$-step random walks on undirected, weighted graphs. These transition matrices also form graphs, and arise as intermediate objects in a variety of graph algorithms. Our improvements are based on a better understanding of processes that sample such walks, as well as tighter bounds on key weights underlying these sampling processes. Read More

Online algorithms process their inputs piece by piece, taking irrevocable decisions for each data item. This model is too restrictive for most partitioning problems, since data that is yet to arrive may render it impossible to extend partial partitionings to the entire data set reasonably well. In this work, we show that preemption might be a potential remedy. Read More

For MSO$_2$-expressible problems like Edge Dominating Set or Hamiltonian Cycle, it was open for a long time whether there is an algorithm which given a clique-width $k$-expression of an $n$-vertex graph runs in time $f(k) \cdot n^{\mathcal{O}(1)}$ for some function $f$. Recently, Fomin et al. (\emph{SIAM. Read More

This paper proposes an alternative way to identify nodes with high betweenness centrality. It introduces a new metric, k-path centrality, and a randomized algorithm for estimating it, and shows empirically that nodes with high k-path centrality have high node betweenness centrality. The randomized algorithm runs in time $O(\kappa^{3}n^{2-2\alpha}\log n)$ and outputs, for each vertex v, an estimate of its k-path centrality up to additive error of $\pm n^{1/2+ \alpha}$ with probability $1-1/n^2$. Read More

The present paper deals with the discrete inverse problem of reconstructing binary matrices from their row and column sums under additional constraints on the number and pattern of entries in specified minors. While the classical consistency and reconstruction problems for two directions in discrete tomography can be solved in polynomial time, it turns out that these window constraints cause various unexpected complexity jumps back and forth from polynomial-time solvability to $\mathbb{N}\mathbb{P}$-hardness. Read More

We study the following version of cut sparsification. Given a large edge-weighted network $G$ with $k$ terminal vertices, compress it into a small network $H$ with the same terminals, such that the minimum cut in $H$ between every bipartition of the terminals approximates the corresponding one in $G$ within factor $q\geq 1$, called the quality. We provide two new insights about the structure of cut sparsifiers, and then apply them to obtain improved cut sparsifiers (and data structures) for planar graphs. Read More

Binary search finds a given element in a sorted array with an optimal number of $\log n$ queries. However, binary search fails even when the array is only slightly disordered or access to its elements is subject to errors. We study the worst-case query complexity of search algorithms that are robust to imprecise queries and that adapt to perturbations of the order of the elements. Read More
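
The fragility described above is easy to reproduce with a minimal sketch of textbook binary search. This is the classical algorithm, not the robust search studied in the paper, and the arrays are hypothetical.

```python
def binary_search(arr, target):
    """Classical binary search; returns an index of target or -1 if not found.

    Correctness relies on arr being sorted in non-decreasing order.
    """
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


sorted_arr = [1, 3, 5, 7, 9, 11, 13]
# The same elements with one adjacent pair (5 and 7) swapped.
disordered = [1, 3, 7, 5, 9, 11, 13]

found_sorted = binary_search(sorted_arr, 7)      # index 3
found_disordered = binary_search(disordered, 7)  # -1, although 7 is at index 2
```

A single adjacent swap is enough to make the search miss an element that is present: the first probe sees 5 at the midpoint, discards the left half, and never looks at index 2.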

Multi-label submodular Markov Random Fields (MRFs) have been shown to be solvable using max-flow based on an encoding of the labels proposed by Ishikawa, in which each variable $X_i$ is represented by $\ell$ nodes (where $\ell$ is the number of labels) arranged in a column. However, this method in general requires $2\,\ell^2$ edges for each pair of neighbouring variables. This makes it inapplicable to realistic problems with many variables and labels, due to excessive memory requirement. Read More

In this paper we initiate the study of whether or not sparse estimation tasks can be performed efficiently in high dimensions, in the robust setting where an $\varepsilon$-fraction of samples are corrupted adversarially. We study the natural robust version of two classical sparse estimation problems, namely, sparse mean estimation and sparse PCA in the spiked covariance model. For both of these problems, we provide the first efficient algorithms that provide non-trivial error guarantees in the presence of noise, using only a number of samples which is similar to the number required for these problems without noise. Read More

We provide evidence that computing the maximum flow value between every pair of nodes in a directed graph with $n$ nodes, $m$ edges, and capacities in the range $[1..n]$, which we call the All-Pairs Max-Flow problem, cannot be solved in time that is significantly faster (i. Read More

We show how to find canonical representations for circular-arc (CA) graphs by computing certain subsets of vertices called flip sets. A flip set enables one to convert a CA graph into an interval matrix in a reversible way. Since canonical representations for interval matrices can be computed in logspace this essentially means that the problem of finding canonical representations for CA graphs is logspace-reducible to computing 'canonical' flip sets. Read More

A celebrated technique for finding near neighbors for the angular distance involves using a set of \textit{random} hyperplanes to partition the space into hash regions [Charikar, STOC 2002]. Experiments later showed that using a set of \textit{orthogonal} hyperplanes, thereby partitioning the space into the Voronoi regions induced by a hypercube, leads to even better results [Terasawa and Tanaka, WADS 2007]. However, no theoretical explanation for this improvement was ever given, and it remained unclear how the resulting hypercube hash method scales in high dimensions. Read More

The Vertex Separation Minimization Problem (VSMP) consists of finding a layout of a graph $G = (V,E)$ that minimizes the maximum vertex cut, or separation, of the layout. It is NP-complete in general, and metaheuristic techniques can be applied to find near-optimal solutions. VSMP has applications in VLSI design, graph drawing, and compiler design. Read More

Adaptivity is known to play a crucial role in property testing. In particular, there exist properties for which there is an exponential gap between the power of \emph{adaptive} testing algorithms, wherein each query may be determined by the answers received to prior queries, and their \emph{non-adaptive} counterparts, in which all queries are independent of answers obtained from previous queries. In this work, we investigate the role of adaptivity in property testing at a finer level. Read More

Given an $n \times d$ matrix $A$, its Schatten-$p$ norm, $p \geq 1$, is defined as $\|A\|_p = \left (\sum_{i=1}^{\textrm{rank}(A)}\sigma_i(A)^p \right )^{1/p}$, where $\sigma_i(A)$ is the $i$-th largest singular value of $A$. These norms have been studied in functional analysis in the context of non-commutative $\ell_p$-spaces, and recently in data stream and linear sketching models of computation. Basic questions on the relations between these norms, such as their embeddability, are still open. Read More
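
As a worked instance of the definition, the familiar special cases and a small diagonal example are written out below. The example matrix is an assumption chosen so that the singular values can be read off directly.

```latex
% Special cases of the Schatten-p norm
\|A\|_1 = \sum_{i=1}^{\mathrm{rank}(A)} \sigma_i(A)
  \quad \text{(nuclear / trace norm)}, \qquad
\|A\|_2 = \Bigl( \sum_{i=1}^{\mathrm{rank}(A)} \sigma_i(A)^2 \Bigr)^{1/2} = \|A\|_F
  \quad \text{(Frobenius norm)}, \qquad
\lim_{p \to \infty} \|A\|_p = \sigma_1(A)
  \quad \text{(operator norm)}.

% Example: A = diag(3, 4) has singular values \sigma_1 = 4, \sigma_2 = 3
\|A\|_1 = 4 + 3 = 7, \qquad
\|A\|_2 = \sqrt{4^2 + 3^2} = 5, \qquad
\lim_{p \to \infty} \|A\|_p = 4.
```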

Nowadays, various sensors collect, store and transmit tremendous amounts of trajectory data, and it is known that raw trajectory data seriously wastes storage, network bandwidth and computing resources. Line simplification (LS) algorithms are an effective approach to this issue: they compress the data points in a trajectory into a set of continuous line segments, and are commonly used in practice. However, existing LS algorithms are not sufficient for the needs of sensors in mobile devices. Read More

We study the problem of enumerating the satisfying valuations of a circuit while bounding the delay, i.e., the time needed to compute each successive valuation. Read More

Given a graph, the sparsest cut problem asks for a subset of vertices whose edge expansion (the normalized cut given by the subset) is minimized. In this paper, we study a generalization of this problem that seeks $k$ disjoint subsets of vertices (clusters), all of whose edge expansions are small, such that the number of vertices left outside the subsets (outliers) is also small. We prove that although this problem is NP-hard for trees, it can be solved in polynomial time for all weighted trees, provided that we restrict the search space to subsets which induce connected subgraphs. Read More

LCLs or locally checkable labelling problems (e.g. maximal independent set, maximal matching, and vertex colouring) in the LOCAL model of computation are very well-understood in cycles (toroidal 1-dimensional grids): every problem has a complexity of $O(1)$, $\Theta(\log^* n)$, or $\Theta(n)$, and the design of optimal algorithms can be fully automated. Read More

Collaborative filtering is a broad and powerful framework for building recommendation systems that has seen widespread adoption. Over the past decade, the propensity of such systems to favor popular products, and thus create echo chambers, has been observed. This has given rise to an active area of research that seeks to diversify recommendations generated by such algorithms. Read More

The time complexity of data clustering has been viewed as fundamentally quadratic, slowing with the number of data items, as each item is compared for similarity to preceding items. Clustering of large data sets has been infeasible without resorting to probabilistic methods or to capping the number of clusters. Here we introduce MIMOSA, a novel class of algorithms which achieve linear time computational complexity on clustering tasks. Read More

Cell nuclei segmentation is one of the most important tasks in the analysis of biomedical images. With ever-growing sizes and amounts of three-dimensional images to be processed, there is a need for better and faster segmentation methods. Graph-based image segmentation has seen a rise in popularity in recent years, but is seen as very costly with regard to computational demand. Read More

Computational topology is an area that revisits topological problems from an algorithmic point of view, and develops topological tools for improved algorithms. We survey results in computational topology that are concerned with graphs drawn on surfaces. Typical questions include representing surfaces and graphs embedded on them computationally, deciding whether a graph embeds on a surface, solving computational problems related to homotopy, optimizing curves and graphs on surfaces, and solving standard graph algorithm problems more efficiently in the case of surface-embedded graphs. Read More

Betweenness is a well-known centrality measure that ranks the nodes according to their participation in the shortest paths of a network. In several scenarios, having a high betweenness can have a positive impact on the node itself. Hence, in this paper we consider the problem of determining how much a vertex can increase its centrality by creating a limited number of new edges incident to it. Read More

We study the propagation of comparative ideas in social networks. A full characterization of submodularity in the comparative independent cascade (Com-IC) model of two-idea cascades is given, for competing ideas and complementary ideas respectively. We further introduce the One-Shot model, where agents show less patience toward ideas, and show that in the One-Shot model, only the stronger idea spreads with submodularity. Read More

Understanding the interactions between different combinatorial optimisation problems in real-world applications is a challenging task. Recently, the traveling thief problem (TTP), as a combination of the classical traveling salesperson problem and the knapsack problem, has been introduced to study these interactions in a systematic way. We investigate the underlying non-linear packing while traveling (PWT) problem of the TTP where items have to be selected along a fixed route. Read More

Calude et al. have given the first algorithm for solving parity games in quasi-polynomial time, where previously the best algorithms were mildly subexponential. We combine the succinct counting technique of Calude et al. Read More

In this paper, we study a very general type of online network design problem, and generalize two different previous algorithms, one for an online network design problem due to Berman and Coulston [4] and one for (offline) general network design problems due to Goemans and Williamson [9]; we give an O(log k)-competitive algorithm, where k is the number of nodes that must be connected. We also consider a further generalization of the problem that allows us to pay penalties in exchange for violating connectivity constraints; we give an online O(log k)-competitive algorithm for this case as well. Read More

Most STM systems are poorly equipped to support libraries of concurrent data structures. One reason is that they typically detect conflicts by tracking transactions' read sets and write sets, an approach that often leads to false conflicts. A second is that existing data structures and libraries often need to be rewritten from scratch to support transactional conflict detection and rollback. Read More

Maximum flow is a fundamental problem in Combinatorial Optimization that has numerous applications in both theory and practice. In this paper, we study the flow network simplification problem, which asks to remove all the useless arcs from the graph. To be precise, an arc is useless if it does not participate in any simple s, t-path. Read More
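
The definition of a useless arc can be checked directly on small instances by enumerating every simple s, t-path. The sketch below is a brute-force illustration with exponential worst-case running time, not the algorithm studied in the paper; the function names and the example network are hypothetical.

```python
def useful_arcs(arcs, s, t):
    """Arcs that lie on at least one simple s-t path (exhaustive DFS enumeration)."""
    successors = {}
    for u, v in arcs:
        successors.setdefault(u, []).append(v)

    useful = set()

    def extend(node, visited, path_arcs):
        if node == t:
            # A simple s-t path ends the first time it reaches t.
            useful.update(path_arcs)
            return
        for nxt in successors.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                path_arcs.append((node, nxt))
                extend(nxt, visited, path_arcs)
                path_arcs.pop()
                visited.remove(nxt)

    extend(s, {s}, [])
    return useful


def useless_arcs(arcs, s, t):
    """Arcs that participate in no simple s-t path."""
    return set(arcs) - useful_arcs(arcs, s, t)


# Hypothetical network: the only simple s-t path is s -> a -> t.
arcs = [("s", "a"), ("a", "t"), ("a", "b"), ("b", "a"), ("t", "s")]
useless = useless_arcs(arcs, "s", "t")
```

In this example the arc `("a", "b")` lies on the walk s, a, b, a, t, but that walk repeats `a`, so the arc is still useless. Plain reachability from `s` and to `t` is therefore not enough to decide usefulness.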

Community detection in networks is a topical and important field of research with applications in many areas. However, as the amount of data to be processed keeps growing, existing algorithms need to be adapted for very large graphs. The objective of this project was to parallelise the Synchronised Louvain Method, a community detection algorithm developed by Arnaud Browet, in order to improve its performance in terms of computation time and thus detect communities in very large graphs faster. Read More

We present a simple deterministic single-pass $(2+\epsilon)$-approximation algorithm for the maximum weight matching problem in the semi-streaming model. This improves upon the currently best known approximation ratio of $(3.5+\epsilon)$. Read More

We study the space complexity of querying languages over data streams in the sliding window model. The algorithm has to answer at any point of time whether the content of the sliding window belongs to a fixed regular language. For regular languages, a trichotomy is shown: For every regular language the optimal space requirement is asymptotically either constant, logarithmic, or linear in the size of the sliding window. Read More