Leonid Sigal

Leonid Sigal
Are you Leonid Sigal?

Claim your profile, edit publications, add additional information:

Contact Details

Leonid Sigal

Pubs By Year

Pub Categories

Computer Science - Computer Vision and Pattern Recognition (13)
Computer Science - Artificial Intelligence (6)
Statistics - Machine Learning (5)
Computer Science - Learning (4)
Computer Science - Multimedia (3)
Computer Science - Human-Computer Interaction (2)
Statistics - Theory (1)
Statistics - Methodology (1)
Mathematics - Statistics (1)
Computer Science - Graphics (1)
Statistics - Applications (1)
Computer Science - Computation and Language (1)

Publications Authored By Leonid Sigal

We propose a weakly-supervised approach that takes image-sentence pairs as input and learns to visually ground (i.e., localize) arbitrary linguistic phrases, in the form of spatial attention masks. Read More

We explore the power of spatial context as a self-supervisory signal for learning visual representations. In particular, we propose spatial context networks that learn to predict a representation of one image patch from another image patch, within the same image, conditioned on their real-valued relative spatial offset. Unlike auto-encoders, that aim to encode and reconstruct original image patches, our network aims to encode and reconstruct intermediate representations of the spatially offset patches. Read More

Generating and manipulating human facial images using high-level attributal controls are important and interesting problems. The models proposed in previous work can solve one of these two problems (generation or manipulation), but not both coherently. This paper proposes a novel model that learns how to both generate and modify the facial image from high-level semantic attributes. Read More

Researchers often summarize their work in the form of scientific posters. Posters provide a coherent and efficient way to convey core ideas expressed in scientific papers. Generating a good scientific poster, however, is a complex and time consuming cognitive task, since such posters need to be readable, informative, and visually aesthetic. Read More

Learning a joint language-visual embedding has a number of very appealing properties and can result in variety of practical application, including natural language image/video annotation and search. In this work, we study three different joint language-visual neural network model architectures. We evaluate our models on large scale LSMDC16 movie dataset for two tasks: 1) Standard Ranking for video annotation and retrieval 2) Our proposed movie multiple-choice test. Read More

Despite significant progress in object categorization, in recent years, a number of important challenges remain, mainly, ability to learn from limited labeled data and ability to recognize object classes within large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited sized class vocabularies and typically requires separation between supervised and unsupervised classes, allowing former to inform the latter but not vice versa. We propose the notion of semi-supervised vocabulary-informed learning to alleviate the above mentioned challenges and address problems of supervised, zero-shot and open set recognition using a unified framework. Read More

Researchers often summarize their work in the form of posters. Posters provide a coherent and efficient way to convey core ideas from scientific papers. Generating a good scientific poster, however, is a complex and time consuming cognitive task, since such posters need to be readable, informative, and visually aesthetic. Read More

Recently, attempts have been made to collect millions of videos to train CNN models for action recognition in videos. However, curating such large-scale video datasets requires immense human labor, and training CNNs on millions of videos demands huge computational resources. In contrast, collecting action images from the Web is much easier and training on images requires much less computation. Read More

Modern machine learning-based recognition approaches require large-scale datasets with large number of labelled training images. However, such datasets are inherently difficult and costly to collect and annotate. Hence there is a great and growing interest in automatic dataset collection methods that can leverage the web. Read More

Emotional content is a key element in user-generated videos. However, it is difficult to understand emotions conveyed in such videos due to the complex and unstructured nature of user-generated content and the sparsity of video frames that express emotion. In this paper, for the first time, we study the problem of transferring knowledge from heterogeneous external sources, including image and textual data, to facilitate three related tasks in video emotion understanding: emotion recognition, emotion attribution and emotion-oriented summarization. Read More

Learning from synthetic data has many important and practical applications. An example of application is photo-sketch recognition. Using synthetic data is challenging due to the differences in feature distributions between synthetic and real data, a phenomenon we term synthetic gap. Read More

We propose a method for using synthetic data to help learning classifiers. Synthetic data, even is generated based on real data, normally results in a shift from the distribution of real data in feature space. To bridge the gap between the real and synthetic data, and jointly learn from synthetic and real data, this paper proposes a Multichannel Autoencoder(MCAE). Read More

We present a hierarchical maximum-margin clustering method for unsupervised data analysis. Our method extends beyond flat maximum-margin clustering, and performs clustering recursively in a top-down manner. We propose an effective greedy splitting criteria for selecting which cluster to split next, and employ regularizers that enforce feature sharing/competition for capturing data semantics. Read More

We propose a method that learns a discriminative yet semantic space for object categorization, where we also embed auxiliary semantic entities such as supercategories and attributes. Contrary to prior work which only utilized them as side information, we explicitly embed the semantic entities into the same space where we embed categories, which enables us to represent a category as their linear combination. By exploiting such a unified model for semantics, we enforce each category to be represented by a supercategory + sparse combination of attributes, with an additional exclusive regularization to learn discriminative composition. Read More

The goal of temporal alignment is to establish time correspondence between two sequences, which has many applications in a variety of areas such as speech processing, bioinformatics, computer vision, and computer graphics. In this paper, we propose a novel temporal alignment method called least-squares dynamic time warping (LSDTW). LSDTW finds an alignment that maximizes statistical dependency between sequences, measured by a squared-loss variant of mutual information. Read More

The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection based on linear dependency between input features and output values. In this paper, we consider a feature-wise kernelized Lasso for capturing non-linear input-output dependency. Read More