Dongyan Zhao

Pub Categories

Computer Science - Computation and Language (6)
Computer Science - Databases (4)
Computer Science - Distributed, Parallel, and Cluster Computing (2)
Computer Science - Learning (1)
Computer Science - Data Structures and Algorithms (1)
Computer Science - Human-Computer Interaction (1)
Computer Science - Artificial Intelligence (1)
Computer Science - Neural and Evolutionary Computing (1)
Computer Science - Information Retrieval (1)

Publications Authored By Dongyan Zhao

Distant supervision significantly reduces the human effort needed to build training data for many classification tasks. While promising, this technique often introduces noise into the generated training data, which can severely affect model performance. In this paper, we take a deep look at the application of distant supervision in relation extraction.

Sentence simplification reduces semantic complexity to benefit people with language impairments. Previous simplification studies at the sentence level and the word level have achieved promising results but also face significant challenges. In sentence-level studies, the simplified sentences are fluent but are sometimes not actually simpler.

Keyword search gives ordinary users an easy-to-use interface for querying RDF data. In this paper, given the input keywords, we study how to assemble a query graph that represents the user's query intention accurately and efficiently. Based on the input keywords, we first obtain the elementary query graph building blocks, such as entity/class vertices and predicate edges.

Open-domain human-computer conversation has been attracting increasing attention over the past few years. However, there is no standard automatic evaluation metric for open-domain dialog systems; researchers usually resort to human annotation for model evaluation, which is time- and labor-intensive. In this paper, we propose RUBER, a Referenced metric and Unreferenced metric Blended Evaluation Routine, which evaluates a reply by taking into consideration both a ground-truth reply and a query (the previous user utterance).

Open-domain human-computer conversation has attracted much attention in the field of NLP. In contrast to rule- or template-based domain-specific dialog systems, open-domain conversation usually requires data-driven approaches, which can be roughly divided into two categories: retrieval-based and generation-based systems. Retrieval systems look up a user-issued utterance (called a query) in a large database and return the reply that best matches the query.

In this paper, we propose a data structure, the quadruple neighbor list (QN-list, for short), to support real-time queries for all longest increasing subsequences (LIS) and for LIS with constraints over sequential data streams. The QN-list built by our algorithm requires $O(w)$ space, where $w$ is the time window size. Building the initial QN-list takes $O(w\log w)$ time.
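For readers unfamiliar with the LIS problem, the following is a minimal sketch of the textbook binary-search technique, which finds one longest strictly increasing subsequence of a single window of size $w$ in $O(w\log w)$ time and $O(w)$ space. It is not the QN-list described in the abstract: it does not enumerate all LIS, handle constraints, or update incrementally as the window slides. The function name `longest_increasing_subsequence` is a hypothetical choice for illustration.

```python
from bisect import bisect_left


def longest_increasing_subsequence(window):
    """Return one longest strictly increasing subsequence of `window`.

    Textbook binary-search technique; NOT the QN-list from the abstract.
    """
    tail_index = []   # tail_index[k]: position of the smallest tail among
                      # increasing subsequences of length k + 1 seen so far
    tail_value = []   # values at those positions, always sorted ascending
    prev = [-1] * len(window)  # back-pointers used to rebuild one LIS

    for i, x in enumerate(window):
        # First slot whose tail is >= x; x either extends or improves it.
        k = bisect_left(tail_value, x)
        if k == len(tail_value):
            tail_index.append(i)
            tail_value.append(x)
        else:
            tail_index[k] = i
            tail_value[k] = x
        prev[i] = tail_index[k - 1] if k > 0 else -1

    # Walk the back-pointers from the tail of the longest subsequence.
    result = []
    i = tail_index[-1] if tail_index else -1
    while i != -1:
        result.append(window[i])
        i = prev[i]
    return result[::-1]
```

Each element costs one binary search, which gives the $O(w\log w)$ total. Recomputing from scratch on every window slide is the baseline that an incremental structure would aim to avoid.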

Existing knowledge-based question answering systems often rely on small amounts of annotated training data. While shallow methods such as relation extraction are robust to data scarcity, they are less expressive than deep meaning representation methods such as semantic parsing, and thus fail to answer questions involving multiple constraints. Here we alleviate this problem by empowering a relation extraction method with additional evidence from Wikipedia.

As the volume of RDF data grows increasingly large, it is essential to design a distributed database system to manage it. In distributed RDF data design, it is common to partition the RDF data into parts, called fragments, which are then distributed. Thus, the distribution design consists of two steps: fragmentation and allocation.

Syntactic features play an essential role in identifying relations in a sentence. Previous neural network models often suffer from irrelevant information introduced when subjects and objects are far apart. In this paper, we propose to learn more robust relation representations from the shortest dependency path through a convolutional neural network.

We propose techniques for processing SPARQL queries over a large RDF graph in a distributed environment. We adopt a "partial evaluation and assembly" framework. Answering a SPARQL query Q is equivalent to finding subgraph matches of the query graph Q over the RDF graph G.
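To illustrate the idea of answering a query by matching a query graph against an RDF graph, here is a minimal single-machine sketch. It is an assumption-laden toy, not the distributed "partial evaluation and assembly" framework from the abstract: triples are plain Python tuples, variables are strings starting with `?`, and the function name `match_patterns` is hypothetical.

```python
def match_patterns(patterns, triples, binding=None):
    """Yield each variable binding that maps every pattern onto a graph triple.

    `patterns` and `triples` are lists of (subject, predicate, object) tuples.
    A pattern term starting with "?" is a variable; anything else is a constant.
    Naive backtracking on one machine; NOT the distributed method in the abstract.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return

    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = dict(binding)
        consistent = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if extended.setdefault(term, value) != value:
                    consistent = False
                    break
            elif term != value:
                consistent = False
                break
        if consistent:
            # This triple matches the first pattern; match the remaining ones.
            yield from match_patterns(rest, triples, extended)


# Hypothetical toy graph G and query graph Q.
graph = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("bob", "livesIn", "Beijing"),
    ("carol", "livesIn", "Beijing"),
]
query = [("?x", "knows", "?y"), ("?y", "livesIn", "Beijing")]
matches = list(match_patterns(query, graph))
```

Each element of `matches` is a dictionary from variable names to graph terms. In a distributed setting the graph is split across sites, so a match may span several fragments, which is the case a framework like the one in the abstract has to handle.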

Although SPARQL has been the predominant query language over RDF graphs, some query intentions cannot be captured well using SPARQL syntax alone. Keyword search, on the other hand, enjoys widespread usage because of its intuitive way of specifying information needs, but it suffers from low precision. To maximize the advantages of both SPARQL and keyword search, we introduce a novel paradigm that combines the two and propose a hybrid query (called an SK query) that integrates SPARQL and keyword search.