Zhiyuan Tang

Zhiyuan Tang
Are you Zhiyuan Tang?

Claim your profile, edit publications, add additional information:

Contact Details

Name
Zhiyuan Tang
Affiliation
Location

Pubs By Year

Pub Categories

 
Computer Science - Computation and Language (9)
 
Computer Science - Learning (8)
 
Computer Science - Neural and Evolutionary Computing (7)
 
Quantum Physics (4)
 
Statistics - Machine Learning (3)
 
Computer Science - Sound (2)
 
Physics - Optics (1)

Publications Authored By Zhiyuan Tang

Recently deep neural networks (DNNs) have been used to learn speaker features. However, the quality of the learned features is not sufficiently good, so a complex back-end model, either neural or probabilistic, has to be used to address the residual uncertainty when applied to speaker verification, just as with raw features. This paper presents a convolutional time-delay deep neural network structure (CT-DNN) for speaker feature learning. Read More

Deep neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID). However, the phonetic information has been largely overlooked by most of existing neural LID methods, although this information has been used in the conventional phonetic LID systems with a great success. We present a phonetic temporal neural model for LID, which is an LSTM-RNN LID system but accepts phonetic features produced by a phone-discriminative DNN as the input, rather than raw acoustic features. Read More

Pure acoustic neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID). However, the phonetic information has been largely overlooked by most of existing neural LID models, although this information has been used in the conventional phonetic LID systems with a great success. We present a phone-aware neural LID architecture, which is a deep LSTM-RNN LID system but accepts output from an RNN-based ASR system. Read More

Research on multilingual speech recognition remains attractive yet challenging. Recent studies focus on learning shared structures under the multi-task paradigm, in particular a feature sharing structure. This approach has been found effective to improve performance on each individual language. Read More

We present the OC16-CE80 Chinese-English mixlingual speech database which was released as a main resource for training, development and test for the Chinese-English mixlingual speech recognition (MixASR-CHEN) challenge on O-COCOSDA 2016. This database consists of 80 hours of speech signals recorded from more than 1,400 speakers, where the utterances are in Chinese but each involves one or several English words. Based on the database and another two free data resources (THCHS30 and the CMU dictionary), a speech recognition (ASR) baseline was constructed with the deep neural network-hidden Markov model (DNN-HMM) hybrid system. Read More

Recurrent neural networks (RNNs) have shown clear superiority in sequence modeling, particularly the ones with gated units, such as long short-term memory (LSTM) and gated recurrent unit (GRU). However, the dynamic properties behind the remarkable performance remain unclear in many applications, e.g. Read More

This paper presents a unified model to perform language and speaker recognition simultaneously and altogether. The model is based on a multi-task recurrent neural network where the output of one task is fed as the input of the other, leading to a collaborative learning framework that can improve both language and speaker recognition by borrowing information from each other. Our experiments demonstrated that the multi-task model outperforms the task-specific models on both tasks. Read More

We present a silicon optical transmitter for polarization-encoded quantum key distribution (QKD). The chip was fabricated in a standard silicon photonic foundry process and integrated a pulse generator, intensity modulator, variable optical attenuator, and polarization modulator in a 1.3 mm $\times$ 3 mm die area. Read More

Although highly correlated, speech and speaker recognition have been regarded as two independent tasks and studied by two communities. This is certainly not the way that people behave: we decipher both speech content and speaker traits at the same time. This paper presents a unified model to perform speech and speaker recognition simultaneously and altogether. Read More

Measurement-device-independent quantum key distribution (MDI-QKD), which is immune to all detector side-channel attacks, is the most promising solution to the security issues in practical quantum key distribution systems. Though several experimental demonstrations of MDI QKD have been reported, they all make one crucial but not yet verified assumption, that is there are no flaws in state preparation. Such an assumption is unrealistic and security loopholes remain in the source. Read More

Pre-training is crucial for learning deep neural networks. Most of existing pre-training methods train simple models (e.g. Read More

Recurrent neural networks (RNNs), particularly long short-term memory (LSTM), have gained much attention in automatic speech recognition (ASR). Although some successful stories have been reported, training RNNs remains highly challenging, especially with limited training data. Recent research found that a well-trained model can be used as a teacher to train other child models, by using the predictions generated by the teacher model as supervision. Read More

Decoy-state quantum key distribution (QKD) is a standard technique in current quantum cryptographic implementations. Unfortunately, existing experiments have two important drawbacks: the state preparation is assumed to be perfect without errors and the employed security proofs do not fully consider the finite-key effects for general attacks. These two drawbacks mean that existing experiments are not guaranteed to be secure in practice. Read More

We demonstrate the first implementation of polarization encoding measurement-device-independent quantum key distribution (MDI-QKD), which is immune to all detector side-channel attacks. Active phase randomization of each individual pulse is implemented to protect against attacks on imperfect sources. By optimizing the parameters in the decoy state protocol, we show that it is feasible to implement polarization encoding MDI-QKD over large optical fiber distances. Read More