Dinh Phung

Dinh Phung
Are you Dinh Phung?

Claim your profile, edit publications, add additional information:

Contact Details

Dinh Phung

Pubs By Year

Pub Categories

Statistics - Machine Learning (27)
Computer Science - Learning (27)
Computer Science - Information Retrieval (10)
Statistics - Methodology (5)
Computer Science - Computer Vision and Pattern Recognition (4)
Statistics - Applications (3)
Computer Science - Artificial Intelligence (3)
Mathematics - Optimization and Control (1)
Computer Science - Computers and Society (1)
Computer Science - Neural and Evolutionary Computing (1)
Computer Science - Computation and Language (1)

Publications Authored By Dinh Phung

Multiple instance data are sets or multi-sets of unordered elements. Using metrics or distances for sets, we propose an approach to several multiple instance learning tasks, such as clustering (unsupervised learning), classification (supervised learning), and novelty detection (semi-supervised learning). In particular, we introduce the Optimal Sub-Pattern Assignment metric to multiple instance learning so as to provide versatile design choices. Read More

The goal of data clustering is to partition data points into groups to minimize a given objective function. While most existing clustering algorithms treat each data point as vector, in many applications each datum is not a vector but a point pattern or a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is not available in advance. Read More

While Multiple Instance (MI) data are point patterns -- sets or multi-sets of unordered points -- appropriate statistical point pattern models have not been used in MI learning. This article proposes a framework for model-based MI learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Read More

Clustering is one of the most common unsupervised learning tasks in machine learning and data mining. Clustering algorithms have been used in a plethora of applications across several scientific fields. However, there has been limited research in the clustering of point patterns - sets or multi-sets of unordered elements - that are found in numerous applications and data sources. Read More

Point patterns are sets or multi-sets of unordered elements that can be found in numerous data sources. However, in data analysis tasks such as classification and novelty detection, appropriate statistical models for point pattern data have not received much attention. This paper proposes the modelling of point pattern data via random finite sets (RFS). Read More

In this paper, we consider the patient similarity matching problem over a cancer cohort of more than 220,000 patients. Our approach first leverages on Word2Vec framework to embed ICD codes into vector-valued representation. We then propose a sequential algorithm for case-control matching on this representation space of diagnosis codes. Read More

To date, the instability of prognostic predictors in a sparse high dimensional model, which hinders their clinical adoption, has received little attention. Stable prediction is often overlooked in favour of performance. Yet, stability prevails as key when adopting models in critical areas as healthcare. Read More

Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. While it holds a great promise to produce a better accuracy than non-collective classifiers, collective classification is computational challenging and has not leveraged on the recent breakthroughs of deep learning. Read More

Outlier detection amounts to finding data points that differ significantly from the norm. Classic outlier detection methods are largely designed for single data type such as continuous or discrete. However, real world data is increasingly heterogeneous, where a data point can have both discrete and continuous attributes. Read More

A major contributing factor to the recent advances in deep neural networks is structural units that let sensory information and gradients to propagate easily. Gating is one such structure that acts as a flow control. Gates are employed in many recent state-of-the-art recurrent models such as LSTM and GRU, and feedforward models such as Residual Nets and Highway Networks. Read More

Preterm births occur at an alarming rate of 10-15%. Preemies have a higher risk of infant mortality, developmental retardation and long-term disabilities. Predicting preterm birth is difficult, even for the most experienced clinicians. Read More

Acquiring labels are often costly, whereas unlabeled data are usually easy to obtain in modern machine learning applications. Semi-supervised learning provides a principled machine learning framework to address such situations, and has been applied successfully in many real-word applications and industries. Nonetheless, most of existing semi-supervised learning methods encounter two serious limitations when applied to modern and large-scale datasets: computational burden and memory usage demand. Read More

Accurate prediction of suicide risk in mental health patients remains an open problem. Existing methods including clinician judgments have acceptable sensitivity, but yield many false positives. Exploiting administrative data has a great potential, but the data has high dimensionality and redundancies in the recording processes. Read More

One of the most challenging problems in kernel online learning is to bound the model size and to promote the model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade the performance. Read More

We introduce a deep multitask architecture to integrate multityped representations of multimodal objects. This multitype exposition is less abstract than the multimodal characterization, but more machine-friendly, and thus is more precise to model. For example, an image can be described by multiple visual views, which can be in the forms of bag-of-words (counts) or color/texture histograms (real-valued). Read More

We introduce Neural Choice by Elimination, a new framework that integrates deep neural networks into probabilistic sequential choice models for learning to rank. Given a set of items to chose from, the elimination strategy starts with the whole item set and iteratively eliminates the least worthy item in the remaining subset. We prove that the choice by elimination is equivalent to marginalizing out the random Gompertz latent utilities. Read More

Recommender systems play a central role in providing individualized access to information and services. This paper focuses on collaborative filtering, an approach that exploits the shared structure among mind-liked users and similar items. In particular, we focus on a formal probabilistic framework known as Markov random fields (MRF). Read More

Personalized predictive medicine necessitates the modeling of patient illness and care processes, which inherently have long-term temporal dependencies. Healthcare observations, recorded in electronic medical records, are episodic and irregular in time. We introduce DeepCare, an end-to-end deep dynamic neural network that reads medical records, stores previous illness history, infers current illness states and predicts future medical outcomes. Read More

In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work our model does not impose a prior on the rate at which documents are added to the corpus nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, splitting, and merging. Read More

Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140 character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. Read More

Considering the raising socio-economic burden of autism spectrum disorder (ASD), timely and evidence-driven public policy decision making and communication of the latest guidelines pertaining to the treatment and management of the disorder is crucial. Yet evidence suggests that policy makers and medical practitioners do not always have a good understanding of the practices and relevant beliefs of ASD-afflicted individuals' carers who often follow questionable recommendations and adopt advice poorly supported by scientific data. The key goal of the present work is to explore the idea that Twitter, as a highly popular platform for information exchange, could be used as a data-mining source to learn about the population affected by ASD -- their behaviour, concerns, needs etc. Read More

In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work our model does not impose a prior on the rate at which documents are added to the corpus nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, and splitting and merging. Read More

Modern datasets are becoming heterogeneous. To this end, we present in this paper Mixed-Variate Restricted Boltzmann Machines for simultaneously modelling variables of multiple types and modalities, including binary and continuous responses, categorical options, multicategorical choices, ordinal assessment and category-ranked preferences. Dependency among variables is modeled using latent binary variables, each of which can be interpreted as a particular hidden aspect of the data. Read More

Deep architecture such as hierarchical semi-Markov models is an important class of models for nested sequential data. Current exact inference schemes either cost cubic time in sequence length, or exponential time in model depth. These costs are prohibitive for large-scale problems with arbitrary length and depth. Read More

Ranking over sets arise when users choose between groups of items. For example, a group may be of those movies deemed $5$ stars to them, or a customized tour package. It turns out, to model this data type properly, we need to investigate the general combinatorics problem of partitioning a set and ordering the subsets. Read More

Ordinal data is omnipresent in almost all multiuser-generated feedback - questionnaires, preferences etc. This paper investigates modelling of ordinal data with Gaussian restricted Boltzmann machines (RBMs). In particular, we present the model architecture, learning and inference procedures for both vector-variate and matrix-variate ordinal data. Read More

We introduce Thurstonian Boltzmann Machines (TBM), a unified architecture that can naturally incorporate a wide range of data inputs at the same time. Our motivation rests in the Thurstonian view that many discrete data types can be considered as being generated from a subset of underlying latent continuous variables, and in the observation that each realisation of a discrete type imposes certain inequalities on those variables. Thus learning and inference in TBM reduce to making sense of a set of inequalities. Read More

The \emph{maximum a posteriori} (MAP) assignment for general structure Markov random fields (MRFs) is computationally intractable. In this paper, we exploit tree-based methods to efficiently address this problem. Our novel method, named Tree-based Iterated Local Search (T-ILS) takes advantage of the tractability of tree-structures embedded within MRFs to derive strong local search in an ILS framework. Read More

Recommender systems are important to help users select relevant and personalised information over massive amounts of data available. We propose an unified framework called Preference Network (PN) that jointly models various types of domain knowledge for the task of recommendation. The PN is a probabilistic model that systematically combines both content-based filtering and collaborative filtering into a single conditional Markov random field. Read More

The recent wide adoption of Electronic Medical Records (EMR) presents great opportunities and challenges for data mining. The EMR data is largely temporal, often noisy, irregular and high dimensional. This paper constructs a novel ordinal regression framework for predicting medical risk stratification from EMR. Read More

Ranking is a key aspect of many applications, such as information retrieval, question answering, ad placement and recommender systems. Learning to rank has the goal of estimating a ranking model automatically from training data. In practical settings, the task often reduces to estimating a rank functional of an object with respect to a query. Read More

Learning structured outputs with general structures is computationally challenging, except for tree-structured models. Thus we propose an efficient boosting-based algorithm AdaBoost.MRF for this task. Read More

Stability in clinical prediction models is crucial for transferability between studies, yet has received little attention. The problem is paramount in high dimensional data which invites sparse models with feature selection capability. We introduce an effective method to stabilize sparse Cox model of time-to-events using clinical structures inherent in Electronic Medical Records. Read More

We present a Bayesian nonparametric framework for multilevel clustering which utilizes group-level context information to simultaneously discover low-dimensional structures of the group contents and partitions groups into clusters. Using the Dirichlet process as the building block, our model constructs a product base-measure with a nested structure to accommodate content and context observations at multiple levels. The proposed model possesses properties that link the nested Dirichlet processes (nDP) and the Dirichlet process mixture models (DPM) in an interesting way: integrating out all contents results in the DPM over contexts, whereas integrating out group-specific contexts results in the nDP mixture over content variables. Read More

Hierarchical beta process has found interesting applications in recent years. In this paper we present a modified hierarchical beta process prior with applications to hierarchical modeling of multiple data sources. The novel use of the prior over a hierarchical factor model allows factors to be shared across different sources. Read More

Collaborative filtering is an effective recommendation technique wherein the preference of an individual can potentially be predicted based on preferences of other members. Early algorithms often relied on the strong locality in the preference data, that is, it is enough to predict preference of a user on a particular item based on a small subset of other users with similar tastes or of other items with similar properties. More recently, dimensionality reduction techniques have proved to be equally competitive, and these are based on the co-occurrence patterns rather than locality. Read More

Inspired by the hierarchical hidden Markov models (HHMM), we present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of embedded undirectedMarkov chains tomodel complex hierarchical, nestedMarkov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partiallysupervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. Read More

This paper addresses the general problem of modelling and learning rank data with ties. We propose a probabilistic generative model, that models the process as permutations over partitions. This results in super-exponential combinatorial state space with unknown numbers of partitions and unknown ordering among them. Read More