Dahua Lin

Dahua Lin
Are you Dahua Lin?

Claim your profile, edit publications, add additional information:

Contact Details

Name
Dahua Lin
Affiliation
Location

Pubs By Year

Pub Categories

 
Computer Science - Computer Vision and Pattern Recognition (10)
 
Computer Science - Learning (2)
 
Computer Science - Computation and Language (1)
 
Statistics - Machine Learning (1)
 
Computer Science - Artificial Intelligence (1)

Publications Authored By Dahua Lin

Detecting activities in untrimmed videos is an important yet challenging task. In this paper, we tackle the difficulties of effectively locating the start and the end of a long complex action, which are often met by existing methods. Our key contribution is the structured segment network, a novel framework for temporal action detection, which models the temporal structure of each activity instance via a structured temporal pyramid. Read More

Relationships among objects play a crucial role in image understanding. Despite the great success of deep learning techniques in recognizing individual objects, reasoning about the relationships among objects remains a challenging task. Previous methods often treat this as a classification problem, considering each type of relationship (e. Read More

Despite the substantial progress in recent years, the image captioning techniques are still far from being perfect.Sentences produced by existing methods, e.g. Read More

Current action recognition methods heavily rely on trimmed videos for model training. However, it is very expensive and time-consuming to acquire a large-scale trimmed video dataset. This paper presents a new weakly supervised architecture, called UntrimmedNet, which is able to directly learn from untrimmed videos without the need of temporal annotations of action instances. Read More

Detecting activities in untrimmed videos is an important but challenging task. The performance of existing methods remains unsatisfactory, e.g. Read More

A number of studies have shown that increasing the depth or width of convolutional networks is a rewarding approach to improve the performance of image recognition. In our study, however, we observed difficulties along both directions. On one hand, the pursuit for very deep networks are met with diminishing return and increased training difficulty; on the other hand, widening a network would result in a quadratic growth in both computational cost and memory demand. Read More

Markov Random Fields (MRFs), a formulation widely used in generative image modeling, have long been plagued by the lack of expressive power. This issue is primarily due to the fact that conventional MRFs formulations tend to use simplistic factors to capture local patterns. In this paper, we move beyond such limitations, and propose a novel MRF model that uses fully-connected neurons to express the complex interactions among pixels. Read More

This paper presents the method that underlies our submission to the untrimmed video classification task of ActivityNet Challenge 2016. We follow the basic pipeline of temporal segment networks and further raise the performance via a number of other techniques. Specifically, we use the latest deep model architecture, e. Read More

Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident. This paper aims to discover the principles to design effective ConvNet architectures for action recognition in videos and learn these models given limited training samples. Read More

Binary representation is desirable for its memory efficiency, computation speed and robustness. In this paper, we propose adjustable bounded rectifiers to learn binary representations for deep neural networks. While hard constraining representations across layers to be binary makes training unreasonably difficult, we softly encourage activations to diverge from real values to binary by approximating step functions. Read More

This paper proposes a novel framework for generating lingual descriptions of indoor scenes. Whereas substantial efforts have been made to tackle this problem, previous approaches focusing primarily on generating a single sentence for each image, which is not sufficient for describing complex scenes. We attempt to go beyond this, by generating coherent descriptions with multiple sentences. Read More