Computer Science - Learning Publications (50)


Computer Science - Learning Publications

A variety of real-world processes (over networks) produce sequences of data whose complex temporal dynamics need to be studied. More especially, the event timestamps can carry important information about the underlying network dynamics, which otherwise are not available from the time-series evenly sampled from continuous signals. Moreover, in most complex processes, event sequences and evenly-sampled times series data can interact with each other, which renders joint modeling of those two sources of data necessary. Read More

Catastrophic forgetting is a problem which refers to losing the information of the first task after training from the second task in continual learning of neural networks. To resolve this problem, we propose the incremental moment matching (IMM), which uses the Bayesian neural network framework. IMM assumes that the posterior distribution of parameters of neural networks is approximated with Gaussian distribution and incrementally matches the moment of the posteriors, which are trained for the first and second task, respectively. Read More

Improving distant speech recognition is a crucial step towards flexible human-machine interfaces. Current technology, however, still exhibits a lack of robustness, especially when adverse acoustic conditions are met. Despite the significant progress made in the last years on both speech enhancement and speech recognition, one potential limitation of state-of-the-art technology lies in composing modules that are not well matched because they are not trained jointly. Read More

The Tabu Search (TS) metaheuristic has been proposed for K-Means clustering as an alternative to Lloyd's algorithm, which for all its ease of implementation and fast runtime, has the major drawback of being trapped at local optima. While the TS approach can yield superior performance, it involves a high computational complexity. Moreover, the difficulty in parameter selection in the existing TS approach does not make it any more attractive. Read More

Under normality and homoscedasticity assumptions, Linear Discriminant Analysis (LDA) is known to be optimal in terms of minimising the Bayes error for binary classification. In the heteroscedastic case, LDA is not guaranteed to minimise this error. Assuming heteroscedasticity, we derive a linear classifier, the Gaussian Linear Discriminant (GLD), that directly minimises the Bayes error for binary classification. Read More

The nearest neighbor method together with the dynamic time warping (DTW) distance is one of the most popular approaches in time series classification. This method suffers from high storage and computation requirements for large training sets. As a solution to both drawbacks, this article extends learning vector quantization (LVQ) from Euclidean spaces to DTW spaces. Read More

A recurring problem faced when training neural networks is that there is typically not enough data to maximize the generalization capability of deep neural networks(DNN). There are many techniques to address this, including data augmentation, dropout, and transfer learning. In this paper, we introduce an additional method which we call Smart Augmentation and we show how to use it to increase the accuracy and reduce overfitting on a target network. Read More

Robot vision is a fundamental device for human-robot interaction and robot complex tasks. In this paper, we use Kinect and propose a feature graph fusion (FGF) for robot recognition. Our feature fusion utilizes RGB and depth information to construct fused feature from Kinect. Read More

Augmenting an agent's control with useful higher-level behaviors called options can greatly reduce the sample complexity of reinforcement learning, but manually designing options is infeasible in high-dimensional and abstract state spaces. While recent work has proposed several techniques for automated option discovery, they do not scale to multi-level hierarchies and to expressive representations such as deep networks. We present Discovery of Deep Options (DDO), a policy-gradient algorithm that discovers parametrized options from a set of demonstration trajectories, and can be used recursively to discover additional levels of the hierarchy. Read More

The paper first reports an experimentally identified list of benchmark data sets that are hard for representative classification and feature selection methods. This was done after systematically evaluating a total of 54 combinations of methods, involving nine state-of-the-art classification algorithms and six commonly used feature selection methods, on 129 data sets from the UCI repository (some data sets with known high classification accuracy were excluded). In this paper, a data set for classification is called hard if none of the 54 combinations can achieve an AUC over 0. Read More

Deep convolutional neural networks are generally regarded as robust function approximators. So far, this intuition is based on perturbations to external stimuli such as the images to be classified. Here we explore the robustness of convolutional neural networks to perturbations to the internal weights and architecture of the network itself. Read More

Unsupervised segmentation and clustering of unlabelled speech are core problems in zero-resource speech processing. Most competitive approaches lie at methodological extremes: some follow a Bayesian approach, defining probabilistic models with convergence guarantees, while others opt for more efficient heuristic techniques. Here we introduce an approximation to a segmental Bayesian model that falls in between, with a clear objective function but using hard clustering and segmentation rather than full Bayesian inference. Read More

We present a novel diffusion scheme for online kernel-based learning over networks. So far, a major drawback of any online learning algorithm, operating in a reproducing kernel Hilbert space (RKHS), is the need for updating a growing number of parameters as time iterations evolve. Besides complexity, this leads to an increased need of communication resources, in a distributed setting. Read More

Text content can have different visual presentation ways with roughly similar characters. While conventional text image retrieval depends on complex model of OCR-based text recognition and text similarity detection, this paper proposes a new learning-based approach to text image retrieval with the purpose of finding out the original or similar text through a query text image. Firstly, features of text images are extracted by the CNN network to obtain the deep visual representations. Read More

Despite the remarkable progress recently made in distant speech recognition, state-of-the-art technology still suffers from a lack of robustness, especially when adverse acoustic conditions characterized by non-stationary noises and reverberation are met. A prominent limitation of current systems lies in the lack of matching and communication between the various technologies involved in the distant speech recognition process. The speech enhancement and speech recognition modules are, for instance, often trained independently. Read More

Traditional image clustering methods take a two-step approach, feature learning and clustering, sequentially. However, recent research results demonstrated that combining the separated phases in a unified framework and training them jointly can achieve a better performance. In this paper, we first introduce fully convolutional auto-encoders for image feature learning and then propose a unified clustering framework to learn image representations and cluster centers jointly based on a fully convolutional auto-encoder and soft $k$-means scores. Read More

In recent years, Deep Learning has become the go-to solution for a broad range of applications, often outperforming state-of-the-art. However, it is important, for both theoreticians and practitioners, to gain a deeper understanding of the difficulties and limitations associated with common approaches and algorithms. We describe four families of problems for which some of the commonly used existing algorithms fail or suffer significant difficulty. Read More

Synapses in real neural circuits can take discrete values, including zero (silent or potential) synapses. The computational role of zero synapses in unsupervised feature learning of unlabeled noisy data is still unclear, yet important to understand how the sparseness of synaptic activity is shaped during learning and its relationship with receptive field formation. Here, we formulate this kind of sparse feature learning by statistical mechanics approach. Read More

When using reinforcement learning (RL) algorithms to evaluate a policy it is common, given a large state space, to introduce some form of approximation architecture for the value function (VF). The exact form of this architecture can have a significant effect on the accuracy of the VF estimate, however, and determining a suitable approximation architecture can often be a highly complex task. Consequently there is a large amount of interest in the potential for allowing RL algorithms to adaptively generate (i. Read More

Machine learning techniques are being increasingly used as flexible non-linear fitting and prediction tools in the physical sciences. Fitting functions that exhibit multiple solutions as local minima can be analysed in terms of the corresponding machine learning landscape. Methods to explore and visualise molecular potential energy landscapes can be applied to these machine learning landscapes to gain new insight into the solution space involved in training and the nature of the corresponding predictions. Read More

While modern day web applications aim to create impact at the civilization level, they have become vulnerable to adversarial activity, where the next cyber-attack can take any shape and can originate from anywhere. The increasing scale and sophistication of attacks, has prompted the need for a data driven solution, with machine learning forming the core of many cybersecurity systems. Machine learning was not designed with security in mind, and the essential assumption of stationarity, requiring that the training and testing data follow similar distributions, is violated in an adversarial domain. Read More

We describe and analyze a simple random feature scheme (RFS) from prescribed compositional kernels. The compositional kernels we use are inspired by the structure of convolutional neural networks and kernels. The resulting scheme yields sparse and efficiently computable features. Read More

In this work, we propose several online methods to build a \emph{learning curriculum} from a given set of target-task-specific training tasks in order to speed up reinforcement learning (RL). These methods can decrease the total training time needed by an RL agent compared to training on the target task from scratch. Unlike traditional transfer learning, we consider creating a sequence from several training tasks in order to provide the most benefit in terms of reducing the total time to train. Read More

We report the results of our classification-based machine translation model, built upon the framework of a recurrent neural network using gated recurrent units. Unlike other RNN models that attempt to maximize the overall conditional log probability of sentences against sentences, our model focuses a classification approach of estimating the conditional probability of the next word given the input sequence. This simpler approach using GRUs was hoped to be comparable with more complicated RNN models, but achievements in this implementation were modest and there remains a lot of room for improving this classification approach. Read More

The least-squares support vector machine is a frequently used kernel method for non-linear regression and classification tasks. Here we discuss several approximation algorithms for the least-squares support vector machine classifier. The proposed methods are based on randomized block kernel matrices, and we show that they provide good accuracy and reliable scaling for multi-class classification problems with relatively large data sets. Read More

We propose the first multistage intervention framework that tackles fake news in social networks by combining reinforcement learning with a point process network activity model. The spread of fake news and mitigation events within the network is modeled by a multivariate Hawkes process with additional exogenous control terms. By choosing a feature representation of states, defining mitigation actions and constructing reward functions to measure the effectiveness of mitigation activities, we map the problem of fake news mitigation into the reinforcement learning framework. Read More

We consider the problem of a robot learning the mechanical properties of objects through physical interaction with the object, and introduce a practical, data-efficient approach for identifying the motion models of these objects. The proposed method utilizes a physics engine, where the robot seeks to identify the inertial and friction parameters of the object by simulating its motion under different values of the parameters and identifying those that result in a simulation which matches the observed real motions. The problem is solved in a Bayesian optimization framework. Read More

We study the problem of learning to partition users into groups, where one must learn the compatibilities between the users to achieve optimal groupings. We define four natural objectives that optimize for average and worst case compatibilities and propose new algorithms for adaptively learning optimal groupings. When we do not impose any structure on the compatibilities, we show that the group formation objectives considered are $NP$ hard to solve and we either give approximation guarantees or prove inapproximability results. Read More

Health care is one of the most exciting frontiers in data mining and machine learning. Successful adoption of electronic health records (EHRs) created an explosion in digital clinical data available for analysis, but progress in machine learning for healthcare research has been difficult to measure because of the absence of publicly available benchmark data sets. To address this problem, we propose four clinical prediction benchmarks using data derived from the publicly available Medical Information Mart for Intensive Care (MIMIC-III) database. Read More

We provide new results concerning noise-tolerant and sample-efficient learning algorithms under $s$-concave distributions over $\mathbb{R}^n$ for $-\frac{1}{2n+3}\le s\le 0$. The new class of $s$-concave distributions is a broad and natural generalization of log-concavity, and includes many important additional distributions, e.g. Read More

Finding features that disentangle the different causes of variation in real data is a difficult task, that has nonetheless received considerable attention in static domains like natural images. Interactive environments, in which an agent can deliberately take actions, offer an opportunity to tackle this task better, because the agent can experiment with different actions and observe their effects. We introduce the idea that in interactive environments, latent factors that control the variation in observed data can be identified by figuring out what the agent can control. Read More

We present UBEV, a simple and efficient reinforcement learning algorithm for fixed-horizon episodic Markov decision processes. The main contribution is a proof that UBEV enjoys a sample-complexity bound that holds for all accuracy levels simultaneously with high probability, and matches the lower bound except for logarithmic terms and one factor of the horizon. A consequence of the fact that our sample-complexity bound holds for all accuracy levels is that the new algorithm achieves a sub-linear regret of O(sqrt(SAT)), which is the first time the dependence on the size of the state space has provably appeared inside the square root. Read More

The ability to predict and therefore to anticipate the future is an important attribute of intelligence. It is also of utmost importance in real-time systems, e.g. Read More

This paper describes a method for clustering data that are spread out over large regions and which dimensions are on different scales of measurement. Such an algorithm was developed to implement a robotics application consisting in sorting and storing objects in an unsupervised way. The toy dataset used to validate such application consists of Lego bricks of different shapes and colors. Read More

We study the use of randomized value functions to guide deep exploration in reinforcement learning. This offers an elegant means for synthesizing statistically and computationally efficient exploration with common practical approaches to value function learning. We present several reinforcement learning algorithms that leverage randomized value functions and demonstrate their efficacy through computational studies. Read More

In this paper we analyze the gate activation signals inside the gated recurrent neural networks, and find the temporal structure of such signals is highly correlated with the phoneme boundaries. This correlation is further verified by a set of experiments for phoneme segmentation, in which better results compared to standard approaches were obtained. Read More

Multivariate binary distributions can be decomposed into products of univariate conditional distributions. Recently popular approaches have modeled these conditionals through neural networks with sophisticated weight-sharing structures. It is shown that state-of-the-art performance on several standard benchmark datasets can actually be achieved by training separate probability estimators for each dimension. Read More

Large-scale deep convolutional neural networks (CNNs) are widely used in machine learning applications. While CNNs involve huge complexity, VLSI (ASIC and FPGA) chips that deliver high-density integration of computational resources are regarded as a promising platform for CNN's implementation. At massive parallelism of computational units, however, the external memory bandwidth, which is constrained by the pin count of the VLSI chip, becomes the system bottleneck. Read More

We investigate different strategies for active learning with Bayesian deep neural networks. We focus our analysis on scenarios where new, unlabeled data is obtained episodically, such as commonly encountered in mobile robotics applications. An evaluation of different strategies for acquisition, updating, and final training on the CIFAR-10 dataset shows that incremental network updates with final training on the accumulated acquisition set are essential for best performance, while limiting the amount of required human labeling labor. Read More

In recent years crowdsourcing has become the method of choice for gathering labeled training data for learning algorithms. Standard approaches to crowdsourcing view the process of acquiring labeled data separately from the process of learning a classifier from the gathered data. This can give rise to computational and statistical challenges. Read More

In all but the most trivial optimization problems, the structure of the solutions exhibit complex interdependencies between the input parameters. Decades of research with stochastic search techniques has shown the benefit of explicitly modeling the interactions between sets of parameters and the overall quality of the solutions discovered. We demonstrate a novel method, based on learning deep networks, to model the global landscapes of optimization problems. Read More

Learning in models with discrete latent variables is challenging due to high variance gradient estimators. Generally, approaches have relied on control variates to reduce the variance of the REINFORCE estimator. Recent work (Jang et al. Read More

The cardinality constraint is an intrinsic way to restrict the solution structure in many domains, for example, sparse learning, feature selection, and compressed sensing. To solve a cardinality constrained problem, the key challenge is to solve the projection onto the cardinality constraint set, which is NP-hard in general when there exist multiple overlapped cardiaiality constraints. In this paper, we consider the scenario where overlapped cardinality constraints satisfy a Three-view Cardinality Structure (TVCS), which reflects the natural restriction in many applications, such as identification of gene regulatory networks and task-worker assignment problem. Read More

Imitation learning has been commonly applied to solve different tasks in isolation. This usually requires either careful feature engineering, or a significant number of samples. This is far from what we desire: ideally, robots should be able to learn from very few demonstrations of any given task, and instantly generalize to new situations of the same task, without requiring task-specific engineering. Read More

Convex sparsity-promoting regularizations are ubiquitous in modern statistical learning. By construction, they yield solutions with few non-zero coefficients, which correspond to saturated constraints in the dual optimization formulation. Working set (WS) strategies are generic optimization techniques that consist in solving simpler problems that only consider a subset of constraints, whose indices form the WS. Read More

The most data-efficient algorithms for reinforcement learning (RL) in robotics are based on uncertain dynamical models: after each episode, they first learn a dynamical model of the robot, then they use an optimization algorithm to find a policy that maximizes the expected return given the model and its uncertainties. It is often believed that this optimization can be tractable only if analytical, gradient-based algorithms are used; however, these algorithms require using specific families of reward functions and policies, which greatly limits the flexibility of the overall approach. In this paper, we introduce a novel model-based RL algorithm, called Black-DROPS (Black-box Data-efficient RObot Policy Search) that: (1) does not impose any constraint on the reward function or the policy (they are treated as black-boxes), (2) is as data-efficient as the state-of-the-art algorithm for data-efficient RL in robotics, and (3) is as fast (or faster) than analytical approaches when several cores are available. Read More

Many problems in image processing and computer vision (e.g. colorization, style transfer) can be posed as 'manipulating' an input image into a corresponding output image given a user-specified guiding signal. Read More

Current approaches for Knowledge Distillation (KD) either directly use training data or sample from the training data distribution. In this paper, we demonstrate effectiveness of 'mismatched' unlabeled stimulus to perform KD for image classification networks. For illustration, we consider scenarios where this is a complete absence of training data, or mismatched stimulus has to be used for augmenting a small amount of training data. Read More