KU-ISPL Language Recognition System for NIST 2015 i-Vector Machine Learning Challenge

In language recognition, differentiating closely spaced languages from acoustically distant ones remains a major challenge. For confusable, closely spaced languages, the system needs longer test input to obtain sufficient information to distinguish between them. Conversely, if languages are distinct and not acoustically or linguistically similar to others, duration is not a sufficient remedy. We explore duration distribution analysis for near and far languages based on the Language Recognition i-Vector Machine Learning Challenge 2015 (LRiMLC15) database. Using this knowledge, we propose a likelihood ratio based fusion approach that leverages both score and duration information. The experimental results show that fusing duration and score information improves language recognition performance by 5% relative in terms of the LRiMLC15 cost.
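The abstract does not give the fusion formula, so the following is only a minimal sketch of what a likelihood ratio based fusion of score and duration could look like. It assumes (hypothetically) that the score and the log-duration are independent and Gaussian under the target and non-target hypotheses; the data, function names, and the Gaussian model are illustrative assumptions, not the authors' method.

```python
import math
from statistics import mean, stdev


def fit_gaussian(values):
    """Return (mean, sample standard deviation) of a list of floats."""
    return mean(values), stdev(values)


def log_gauss(x, mu, sigma):
    """Log density of a univariate Gaussian at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def fit(trials):
    """Fit per-feature Gaussians to (score, duration in seconds) pairs."""
    return {
        "score": fit_gaussian([s for s, _ in trials]),
        "logdur": fit_gaussian([math.log(d) for _, d in trials]),
    }


def fused_llr(score, duration, target, nontarget):
    """Sum the score LLR and the log-duration LLR (independence is assumed)."""
    llr = 0.0
    for x, key in ((score, "score"), (math.log(duration), "logdur")):
        llr += log_gauss(x, *target[key]) - log_gauss(x, *nontarget[key])
    return llr


# Toy, made-up development trials: (score, duration in seconds).
target_trials = [(2.1, 30.0), (1.4, 12.0), (2.8, 45.0), (1.9, 20.0)]
nontarget_trials = [(-1.2, 8.0), (0.3, 5.0), (-0.7, 15.0), (-1.9, 6.0)]

target_model = fit(target_trials)
nontarget_model = fit(nontarget_trials)

high = fused_llr(2.0, 25.0, target_model, nontarget_model)
low = fused_llr(-1.0, 7.0, target_model, nontarget_model)
print(high > low)
```

Working in the log domain keeps the fusion additive, so each information source contributes its own log-likelihood ratio and either term can be inspected or reweighted separately.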


Similar Publications

A new system for i-vector speaker recognition based on a variational autoencoder (VAE) is investigated. The VAE is a promising approach for developing accurate deep nonlinear generative models of complex data. Experiments show that the VAE provides speaker embeddings and can be effectively trained in an unsupervised manner.


Growing interest in automatic speaker verification (ASV) systems has led to significant quality improvement of spoofing attacks on them. Many research works confirm that, despite their low equal error rate (EER), ASV systems are still vulnerable to spoofing attacks. In this work we overview different acoustic feature spaces and classifiers to determine reliable and robust countermeasures against spoofing attacks.


This paper presents the Speech Technology Center (STC) replay attack detection systems proposed for the Automatic Speaker Verification Spoofing and Countermeasures Challenge 2017. In this study we focused on comparing different spoofing detection approaches: GMM-based methods, high-level feature extraction with a simple classifier, and deep learning frameworks.


We study the problem of dictionary learning for signals that can be represented as polynomials or polynomial matrices, such as convolutive signals with time delays or acoustic impulse responses. Recently, we developed a method for polynomial dictionary learning based on the fact that a polynomial matrix can be expressed as a polynomial with matrix coefficients, where the coefficient of the polynomial at each time lag is a scalar matrix. However, a polynomial matrix can equally be represented as a matrix with polynomial elements.


Automatic music transcription (AMT) aims to infer a latent symbolic representation of a piece of music (piano-roll), given a corresponding observed audio recording. Transcribing polyphonic music (when multiple notes are played simultaneously) is a challenging problem, due to highly structured overlapping between harmonics. We study whether the introduction of physically inspired Gaussian process (GP) priors into audio content analysis models improves the extraction of patterns required for AMT.


Time-frequency representations are important for the analysis of time series. We have developed an online time-series analysis system and equipped it to reliably handle re-alignment in the time-frequency plane. The system can deal with issues like invalid regions in time-frequency representations and discontinuities in data transmissions, making it suitable for online processing in real-world situations.


In large-scale wireless acoustic sensor networks (WASNs), many of the sensors will only have a marginal contribution to a certain estimation task. Involving all sensors increases the energy budget unnecessarily and decreases the lifetime of the WASN. Using microphone subset selection, also termed sensor selection, the most informative sensors can be chosen from a set of candidate sensors to achieve a prescribed inference performance.


There is increasing interest in the use of animal-like robots in applications such as companionship and pet therapy. However, in the majority of cases it is only the robot's physical appearance that mimics a given animal. In contrast, MiRo is the first commercial biomimetic robot to be based on a hardware and software architecture that is modelled on the biological brain.


A serious problem for automated music generation is to propose a model that can reproduce sophisticated temporal and melodic patterns corresponding to the style of the training input. We propose a new artificial neural network architecture that helps to deal with such tasks. The proposed approach is based on a long short-term memory language model combined with a variational recurrent autoencoder.


A short overview demystifying the MIDI audio format is presented. The goal is to explain the file structure and how the instructions are used to produce a music signal, for both monophonic and polyphonic signals.