# Qin Zhang

## Contact Details

NameQin Zhang |
||

Affiliation |
||

Location |
||

## Pubs By Year |
||

## Pub CategoriesComputer Science - Data Structures and Algorithms (19) Physics - Mesoscopic Systems and Quantum Hall Effect (7) Physics - Materials Science (3) Physics - Other (2) Computer Science - Databases (2) Computer Science - Computational Complexity (2) Physics - Instrumentation and Detectors (2) Computer Science - Distributed; Parallel; and Cluster Computing (1) Mathematics - Analysis of PDEs (1) Quantum Physics (1) Physics - Superconductivity (1) Mathematics - Operator Algebras (1) Mathematics - Numerical Analysis (1) Physics - Fluid Dynamics (1) Computer Science - Learning (1) Computer Science - Networking and Internet Architecture (1) Computer Science - Computer Vision and Pattern Recognition (1) Physics - Popular Physics (1) Physics - Optics (1) |

## Publications Authored By Qin Zhang

This letter adopts long short-term memory(LSTM) to predict sea surface temperature(SST), which is the first attempt, to our knowledge, to use recurrent neural network to solve the problem of SST prediction, and to make one week and one month daily prediction. We formulate the SST prediction problem as a time series regression problem. LSTM is a special kind of recurrent neural network, which introduces gate mechanism into vanilla RNN to prevent the vanished or exploding gradient problem. Read More

We consider the communication complexity of finding an approximate maximum matching in a graph in a multi-party message-passing communication model. The maximum matching problem is one of the most fundamental graph combinatorial problems, with a variety of applications. The input to the problem is a graph $G$ that has $n$ vertices and the set of edges partitioned over $k$ sites, and an approximation ratio parameter $\alpha$. Read More

Recent years have witnessed an increasing popularity of algorithm design for distributed data, largely due to the fact that massive datasets are often collected and stored in different locations. In the distributed setting communication typically dominates the query processing time. Thus it becomes crucial to design communication efficient algorithms for queries on distributed data. Read More

Clustering large datasets is a fundamental problem with a number of applications in machine learning. Data is often collected on different sites and clustering needs to be performed in a distributed manner with low communication. We would like the quality of the clustering in the distributed setting to match that in the centralized setting for which all the data resides on a single site. Read More

We study the problem of edit similarity joins, where given a set of strings and a threshold value $K$, we need to output all the pairs of strings whose edit distances are at most $K$. Edit similarity join is a fundamental operation in numerous applications such as data cleaning/integration, bioinformatics, natural language processing, and has been identified as a primitive operator for database systems. This problem has been studied extensively in the literature. Read More

In this paper we study the extraction of representative elements in the data stream model in the form of submodular maximization. Different from the previous work on streaming submodular maximization, we are interested only in the recent data, and study the maximization problem over sliding windows. We provide a general reduction from the sliding window model to the standard streaming model, and thus our approach works for general constraints as long as there is a corresponding streaming algorithm in the standard streaming model. Read More

In this paper we study skyline queries in the distributed computational model, where we have $s$ remote sites and a central coordinator (the query node); each site holds a piece of data, and the coordinator wants to compute the skyline of the union of the $s$ datasets. The computation is in terms of rounds, and the goal is to minimize both the total communication cost and the round cost. Viewing data objects as points in the Euclidean space, we consider both the horizontal data partition case where each site holds a subset of points, and the vertical data partition case where each site holds one coordinate of all the points. Read More

Linear sketching algorithms have been widely used for processing large-scale distributed and streaming datasets. Their popularity is largely due to the fact that linear sketches can be naturally composed in the distributed model and be efficiently updated in the streaming model. The errors of linear sketches are typically expressed in terms of the sum of coordinates of the input vector excluding those largest ones, or, the mass on the tail of the vector. Read More

Forwarding information base (FIB) scalability is a fundamental problem of numerous new network architectures that propose to use location-independent network names. We propose Concise, a FIB design that uses very little memory to support fast query of a large number of location-independent names. Concise makes use of minimal perfect hashing and relies on the SDN framework and supports fast name classification. Read More

We show that in the document exchange problem, where Alice holds $x \in \{0,1\}^n$ and Bob holds $y \in \{0,1\}^n$, Alice can send Bob a message of size $O(K(\log^2 K+\log n))$ bits such that Bob can recover $x$ using the message and his input $y$ if the edit distance between $x$ and $y$ is no more than $K$, and output "error" otherwise. Both the encoding and decoding can be done in time $\tilde{O}(n+\mathsf{poly}(K))$. This result significantly improves the previous communication bounds under polynomial encoding/decoding time. Read More

We undertake a systematic study of sketching a quadratic form: given an $n \times n$ matrix $A$, create a succinct sketch $\textbf{sk}(A)$ which can produce (without further access to $A$) a multiplicative $(1+\epsilon)$-approximation to $x^T A x$ for any desired query $x \in \mathbb{R}^n$. While a general matrix does not admit non-trivial sketches, positive semi-definite (PSD) matrices admit sketches of size $\Theta(\epsilon^{-2} n)$, via the Johnson-Lindenstrauss lemma, achieving the "for each" guarantee, namely, for each query $x$, with a constant probability the sketch succeeds. (For the stronger "for all" guarantee, where the sketch succeeds for all $x$'s simultaneously, again there are no non-trivial sketches. Read More

In this paper we give a first set of communication lower bounds for distributed clustering problems, in particular, for k-center, k-median and k-means. When the input is distributed across a large number of machines and the number of clusters k is small, our lower bounds match the current best upper bounds up to a logarithmic factor. We have designed a new composition framework in our proofs for multiparty number-in-hand communication complexity which may be of independent interest. Read More

We study the problem of compressing a weighted graph $G$ on $n$ vertices, building a "sketch" $H$ of $G$, so that given any vector $x \in \mathbb{R}^n$, the value $x^T L_G x$ can be approximated up to a multiplicative $1+\epsilon$ factor from only $H$ and $x$, where $L_G$ denotes the Laplacian of $G$. One solution to this problem is to build a spectral sparsifier $H$ of $G$, which, using the result of Batson, Spielman, and Srivastava, consists of $O(n \epsilon^{-2})$ reweighted edges of $G$ and has the property that simultaneously for all $x \in \mathbb{R}^n$, $x^T L_H x = (1 \pm \epsilon) x^T L_G x$. The $O(n \epsilon^{-2})$ bound is optimal for spectral sparsifiers. Read More

The electrical performance of a tunnel field-effect transistor depends critically on the band offset at their semiconductor heterojunction interface. Historically, it has been difficult to experimentally determine how the electronic bands align at the heterojunction interface. We report here on experimental methods to ascertain a complete energy band alignment of a broken-gap tunnel field-effect transistor based on an InAs/GaSb hetero-junction. Read More

Modern data management systems often need to deal with massive, dynamic and inherently distributed data sources. We collect the data using a distributed network, and at the same time try to maintain a global view of the data at a central coordinator using a minimal amount of communication. Such applications have been captured by the distributed monitoring model which has attracted a lot of attention in recent years. Read More

A direct electronics printing technique through atomized spraying for patterning room temperature liquid metal droplets on desired substrate surfaces is proposed and experimentally demonstrated for the first time. This method has generalized purpose and is highly flexible and capable of fabricating electronic components on any desired target objects, with either flat or rough surfaces, made of different materials, or different orientations from 1-D to 3-D geometrical configurations. With a pre-designed mask, the liquid metal ink can be directly deposited on the substrate to form various specific patterns which lead to the rapid prototyping of electronic devices. Read More

A roller-ball pen enabled direct writing electronics via room temperature liquid metal ink was proposed. With the rolling to print mechanism, the metallic inks were smoothly written on flexible polymer substrate to form conductive tracks and electronic devices. The contact angle analyzer and scanning electron microscope were implemented to probe the inner property of the obtained electronics. Read More

Oblivious low-distortion subspace embeddings are a crucial building block for numerical linear algebra problems. We show for any real $p, 1 \leq p < \infty$, given a matrix $M \in \mathbb{R}^{n \times d}$ with $n \gg d$, with constant probability we can choose a matrix $\Pi$ with $\max(1, n^{1-2/p}) \poly(d)$ rows and $n$ columns so that simultaneously for all $x \in \mathbb{R}^d$, $\|Mx\|_p \leq \|\Pi Mx\|_{\infty} \leq \poly(d) \|Mx\|_p.$ Importantly, $\Pi M$ can be computed in the optimal $O(\nnz(M))$ time, where $\nnz(M)$ is the number of non-zero entries of $M$. Read More

We consider a number of fundamental statistical and graph problems in the message-passing model, where we have $k$ machines (sites), each holding a piece of data, and the machines want to jointly solve a problem defined on the union of the $k$ data sets. The communication is point-to-point, and the goal is to minimize the total communication among the $k$ machines. This model captures all point-to-point distributed computational models with respect to minimizing communication costs. Read More

The outstanding electrical and optical properties of graphene make it an excellent alternative as a transparent electrode. Here we demonstrate the application of graphene as collector material in internal photoemission (IPE) spectroscopy; enabling the direct observation of both electron and hole injections at a Si/Al2O3 interface and successfully overcoming the long-standing difficulty of detecting holes injected from a semiconductor emitter in IPE measurements. The observed electron and hole barrier heights are 3. Read More

We resolve several fundamental questions in the area of distributed functional monitoring, initiated by Cormode, Muthukrishnan, and Yi (SODA, 2008). In this model there are $k$ sites each tracking their input and communicating with a central coordinator that continuously maintain an approximate output to a function $f$ computed over the union of the inputs. The goal is to minimize the communication. Read More

Given a stream of items each associated with a numerical value, its edit distance to monotonicity is the minimum number of items to remove so that the remaining items are non-decreasing with respect to the numerical value. The space complexity of estimating the edit distance to monotonicity of a data stream is becoming well-understood over the past few years. Motivated by applications on network quality monitoring, we extend the study to estimating the edit distance to monotonicity of a sliding window covering the $w$ most recent items in the stream for any $w \ge 1$. Read More

We show that randomization can lead to significant improvements for a few fundamental problems in distributed tracking. Our basis is the {\em count-tracking} problem, where there are $k$ players, each holding a counter $n_i$ that gets incremented over time, and the goal is to track an $\eps$-approximation of their sum $n=\sum_i n_i$ continuously at all times, using minimum communication. While the deterministic communication complexity of the problem is $\Theta(k/\eps \cdot \log N)$, where $N$ is the final value of $n$ when the tracking finishes, we show that with randomization, the communication cost can be reduced to $\Theta(\sqrt{k}/\eps \cdot \log N)$. Read More

In this paper we prove lower bounds on randomized multiparty communication complexity, both in the \emph{blackboard model} (where each message is written on a blackboard for all players to see) and (mainly) in the \emph{message-passing model}, where messages are sent player-to-player. We introduce a new technique for proving such bounds, called \emph{symmetrization}, which is natural, intuitive, and often easy to use. For example, for the problem where each of $k$ players gets a bit-vector of length $n$, and the goal is to compute the coordinate-wise XOR of these vectors, we prove a tight lower bounds of $\Omega(nk)$ in the blackboard model. Read More

We explore the effects of metal contacts on the operation and scalability of 2D Graphene Field-Effect-Transistors (GFETs) using detailed numerical device simulations based on the non-equilibrium Green's function formalism self-consistently solved with the Poisson equation at the ballistic limit. Our treatment of metal-graphene (M-G) contacts captures: (1) the doping effect due to the shift of the Fermi level in graphene contacts, (2) the density-of-states (DOS) broadening effect inside graphene contacts due to Metal-Induced-States (MIS). Our results confirm the asymmetric transfer characteristics in GFETs due to the doping effect by metal contacts. Read More

We extend the noncommutative L1-maximal ergodic inequality for semifinite von Neumann algebras established by Yeadon in 1977 to the framework of noncommutative L1-spaces associated with sigma-finite von Neumann algebras. Since the semifnite case of this result is one of the two essential parts in the proof of noncommutative maximal ergodic inequality for tracial Lp-spaces (1

Read More

In this paper, we study the MapReduce framework from an algorithmic standpoint and demonstrate the usefulness of our approach by designing and analyzing efficient MapReduce algorithms for fundamental sorting, searching, and simulation problems. This study is motivated by a goal of ultimately putting the MapReduce framework on an equal theoretical footing with the well-known PRAM and BSP parallel models, which would benefit both the theory and practice of MapReduce algorithms. We describe efficient MapReduce algorithms for sorting, multi-searching, and simulations of parallel algorithms specified in the BSP and CRCW PRAM models. Read More

A general solution for the electrostatic potential in an atomic-thin-body (ATB) field-effect transistor geometry is presented. The effective electrostatic scaling length, {\lambda}eff, is extracted from the analytical model, which cannot be approximated by the lowest order eigenmode as traditionally done in SOI-MOSFETs. An empirical equation for the scaling length that depends on the geometry parameters is proposed. Read More

We consider the {\em clustering with diversity} problem: given a set of colored points in a metric space, partition them into clusters such that each cluster has at least $\ell$ points, all of which have distinct colors. We give a 2-approximation to this problem for any $\ell$ when the objective is to minimize the maximum radius of any cluster. We show that the approximation ratio is optimal unless $\mathbf{P=NP}$, by providing a matching lower bound. Read More

Current mesh reduction techniques, while numerous, all primarily reduce mesh size by successive element deletion (e.g. edge collapses) with the goal of geometric and topological feature preservation. Read More

We consider the the problem of tracking heavy hitters and quantiles in the distributed streaming model. The heavy hitters and quantiles are two important statistics for characterizing a data distribution. Let $A$ be a multiset of elements, drawn from the universe $U=\{1,. Read More

Hash tables are one of the most fundamental data structures in computer science, in both theory and practice. They are especially useful in external memory, where their query performance approaches the ideal cost of just one disk access. Knuth gave an elegant analysis showing that with some simple collision resolution strategies such as linear probing or chaining, the expected average number of disk I/Os of a lookup is merely $1+1/2^{\Omega(b)}$, where each I/O can read a disk block containing $b$ items. Read More

A theory is developed for interband tunneling in semiconducting carbon nanotube and graphene nanoribbon p-n junction diodes. Characteristic length and energy scales that dictate the tunneling probabilities and currents are evaluated. By comparing the Zener tunneling processes in these structures to traditional group IV and III-V semiconductors, it is proved that for identical bandgaps, carbon based 1D structures have higher tunneling probabilities. Read More

We analyze several mechanisms leading to errors in a course of measurement of a superconducting flux-biased phase qubit. Insufficiently long measurement pulse may lead to nonadiabatic transitions between qubit states $|1>$ and $|0>$, before tunneling through a reduced barrier is supposed to distinguish the qubit states. Finite (though large) ratio of tunneling rates for these states leads to incomplete discrimination between $|1>$ and $|0>$. Read More

We have analyzed theoretically the operation of the Bayesian quantum feedback of a solid-state qubit, designed to maintain perfect coherent oscillations in the qubit for arbitrarily long time. In particular, we have studied the feedback efficiency in presence of dephasing environment and detector nonideality. Also, we have analyzed the effect of qubit parameter deviations and studied the quantum feedback control of an energy-asymmetric qubit. Read More

Pattern-matching-based document-compression systems (e.g. for faxing) rely on finding a small set of patterns that can be used to represent all of the ink in the document. Read More