Computer Science - Computer Vision and Pattern Recognition Publications (50)

Search

Computer Science - Computer Vision and Pattern Recognition Publications

This abstract briefly describes a segmentation algorithm developed for the ISIC 2017 Skin Lesion Detection Competition hosted at [ref]. The objective of the competition is to perform a segmentation (in the form of a binary mask image) of skin lesions in dermoscopic images as close as possible to a segmentation performed by trained clinicians, which is taken as ground truth. This project only takes part in the segmentation phase of the challenge. Read More


As the intermediate level task connecting image captioning and object detection, visual relationship detection started to catch researchers' attention because of its descriptive power and clear structure. It localizes the objects and captures their interactions with a subject-predicate-object triplet, e.g. Read More


Convolutional Neural Networks (Convnets) have achieved good results in a range of computer vision tasks the recent years. Though given a lot of attention, visualizing the learned representations to interpret Convnets, still remains a challenging task. The high dimensionality of internal representations and the high abstractions of deep layers are the main challenges when visualizing Convnet functionality. Read More


Mandible bone segmentation from computed tomography (CT) scans is challenging due to mandible's structural irregularities, complex shape patterns, and lack of contrast in joints. Furthermore, connections of teeth to mandible and mandible to remaining parts of the skull make it extremely difficult to identify mandible boundary automatically. This study addresses these challenges by proposing a novel framework where we define the segmentation as two complementary tasks: recognition and delineation. Read More


Cascade is a widely used approach that rejects obvious negative samples at early stages for learning better classifier and faster inference. This paper presents chained cascade network (CC-Net). In this CC-Net, the cascaded classifier at a stage is aided by the classification scores in previous stages. Read More


Skin cancer is a major public health problem, as is the most common type of cancer and represents more than half of cancer diagnoses worldwide. Early detection influences the outcome of the disease and motivates our work. We obtain the state of the art results for the ISBI 2016 Melanoma Classification Challenge (named Skin Lesion Analysis towards Melanoma Detection) facing the peculiarities of dealing with such a small, unbalanced, biological database. Read More


Increasing use of CT in modern medical practice has raised concerns over associated radiation dose. Reduction of radiation dose associated with CT can increase noise and artifacts, which can adversely affect diagnostic confidence. Denoising of low-dose CT images on the other hand can help improve diagnostic confidence, which however is a challenging problem due to its ill-posed nature, since one noisy image patch may correspond to many different output patches. Read More


Here we present a parametric model for dynamic textures. The model is based on spatiotemporal summary statistics computed from the feature representations of a Convolutional Neural Network (CNN) trained on object recognition. We demonstrate how the model can be used to synthesise new samples of dynamic textures and to predict motion in simple movies. Read More


Limited annotated data is available for the research of estimating facial expression intensities, which makes the training of deep networks for automated expression assessment very challenging. Fortunately, fine-tuning from a data-extensive pre-trained domain such as face verification can alleviate the problem. In this paper, we propose a transferred network that fine-tunes a state-of-the-art face verification network using expression-intensity labeled data with a regression layer. Read More


Person recognition aims at recognizing the same identity across time and space with complicated scenes and similar appearance. In this paper, we propose a novel method to address this task by training a network to obtain robust and representative features. A key observation is that traditional cross entropy loss only enforces the inter-class variation among samples and ignores to narrow down the similarity within each category. Read More


Activity recognition from first-person (ego-centric) videos has recently gained attention due to the increasing ubiquity of the wearable cameras. There has been a surge of efforts adapting existing feature descriptors and designing new descriptors for the first-person videos. An effective activity recognition system requires selection and use of complementary features and appropriate kernels for each feature. Read More


In this paper, we propose a new simple and learning-free deep learning network named MomentsNet, whose convolution layer, nonlinear processing layer and pooling layer are constructed by Moments kernels, binary hashing and block-wise histogram, respectively. Twelve typical moments (including geometrical moment, Zernike moment, Tchebichef moment, etc.) are used to construct the MomentsNet whose recognition performance for binary image is studied. Read More


Object reconstruction is one of the main problems in cultural heritage preservation. This problem is due to lack of data in documentation. Thus in this research we presented a method of 3D reconstruction using close-range photogrammetry. Read More


Visual question answering (VQA) has witnessed great progress since May, 2015 as a classic problem unifying visual and textual data into a system. Many enlightening VQA works explore deep into the image and question encodings and fusing methods, of which attention is the most effective and infusive mechanism. Current attention based methods focus on adequate fusion of visual and textual features, but lack the attention to where people focus to ask questions about the image. Read More


The United States spends more than $1B each year on the American Community Survey (ACS), a labor-intensive door-to-door study that measures statistics relating to race, gender, education, occupation, unemployment, and other demographic factors. Although a comprehensive source of data, the lag between demographic changes and their appearance in the ACS can exceed half a decade. As digital imagery becomes ubiquitous and machine vision techniques improve, automated data analysis may provide a cheaper and faster alternative. Read More


Colorization of grayscale images has been a hot topic in computer vision. Previous research mainly focuses on producing a colored image to match the original one. However, since many colors share the same gray value, an input grayscale image could be diversely colored while maintaining its reality. Read More


Photography usually requires optics in conjunction with a recording device (an image sensor). Eliminating the optics could lead to new form factors for cameras. Here, we report a simple demonstration of imaging using a bare CMOS sensor that utilizes computation. Read More


Machine learning techniques, namely convolutional neural networks (CNN) and regression forests, have recently shown great promise in performing 6-DoF localization of monocular images. However, in most cases image-sequences, rather only single images, are readily available. To this extent, none of the proposed learning-based approaches exploit the valuable constraint of temporal smoothness, often leading to situations where the per-frame error is larger than the camera motion. Read More


We explore design principles for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation. Convolutional predictors, such as the fully-convolutional network (FCN), have achieved remarkable success by exploiting the spatial redundancy of neighboring pixels through convolutional processing. Though computationally efficient, we point out that such approaches are not statistically efficient during learning precisely because spatial redundancy limits the information learned from neighboring pixels. Read More


We propose a novel label fusion technique as well as a crowdsourcing protocol to efficiently obtain accurate epithelial cell segmentations from non-expert crowd workers. Our label fusion technique simultaneously estimates the true segmentation, the performance levels of individual crowd workers, and an image segmentation model in the form of a pairwise Markov random field. We term our approach image-aware STAPLE (iaSTAPLE) since our image segmentation model seamlessly integrates into the well-known and widely used STAPLE approach. Read More


Unsupervised learning allows algorithms to adapt to different data thanks to the autonomous discovery of discriminating features during the training. When these algorithms are reducible to cost-function minimisation, better interpretations of their learning dynamics are possible. Recently, new Hebbian-like plasticity, bio-inspired, local and unsupervised learning rules for neural networks, have been shown to minimise a cost-function while performing online sparse representation learning. Read More


In this paper, we focus on fully automatic traffic surveillance camera calibration which we use for speed measurement of passing vehicles. We improve over a recent state-of-the-art camera calibration method for traffic surveillance based on two detected vanishing points. More importantly, we propose a novel automatic scene scale inference based on matching bounding boxes of rendered 3D models of vehicles with detected bounding boxes in the image. Read More


In this paper, we focus on traffic camera calibration and visual speed measurement from a single monocular camera, which is an important task of visual traffic surveillance. Existing methods addressing this problem are hard to compare due to lack of a common dataset with reliable ground truth. Therefore, it is not clear how the methods compare in various aspects and what are the factors affecting their performance. Read More


The event-based model (EBM) for data-driven disease progression modeling estimates the sequence in which biomarkers for a disease become abnormal. This helps in understanding the dynamics of disease progression and facilitates early diagnosis by staging patients on a disease progression timeline. Existing EBM methods are all generative in nature. Read More


Comparing images in order to recommend items from an image-inventory is a subject of continued interest. Added with the scalability of deep-learning architectures the once `manual' job of hand-crafting features have been largely alleviated, and images can be compared according to features generated from a deep convolutional neural network. In this paper, we compare distance metrics (and divergences) to rank features generated from a neural network, for content-based image retrieval. Read More


This paper proposes a branched residual network for image classification. It is known that high-level features of deep neural network are more representative than lower-level features. By sharing the low-level features, the network can allocate more memory to high-level features. Read More


Object detection in videos has drawn increasing attention recently with the introduction of the large-scale ImageNet VID dataset. Different from object detection in static images, temporal information in videos provides vital information for object detection. To fully utilize temporal information, state-of-the-art methods are therefore based on spatiotemporal tubelets, which are essentially sequences of associated bounding boxes across time. Read More


The empirical fact that classifiers, trained on given data collections, perform poorly when tested on data acquired in different settings is theoretically explained in domain adaptation through a shift among distributions of the source and target domains. Alleviating the domain shift problem, especially in the challenging setting where no labeled data are available for the target domain, is paramount for having visual recognition systems working in the wild. As the problem stems from a shift among distributions, intuitively one should try to align them. Read More


Food is an integral part of our life and what and how much we eat crucially affects our health. Our food choices largely depend on how we perceive certain characteristics of food, such as whether it is healthy, delicious or if it qualifies as a salad. But these perceptions differ from person to person and one person's "single lettuce leaf" might be another person's "side salad". Read More


This paper presents a novel approach for video-based person re-identification using multiple Convolutional Neural Networks (CNNs). Unlike previous work, we intend to extract a compact yet discriminative appearance representation from several frames rather than the whole sequence. Specifically, given a video, the representative frames are selected based on the walking profile of consecutive frames. Read More


One of the major challenges of model-free visual tracking problem has been the difficulty originating from the unpredictable and drastic changes in the appearance of objects we target to track. Existing methods tackle this problem by updating the appearance model on-line in order to adapt to the changes in the appearance. Despite the success of these methods however, inaccurate and erroneous updates of the appearance model result in a tracker drift. Read More


Object recognition is an important problem in computer vision, having diverse applications. In this work, we construct an end-to-end scene recognition pipeline consisting of feature extraction, encoding, pooling and classification. Our approach simultaneously utilize global feature descriptors as well as local feature descriptors from images, to form a hybrid feature descriptor corresponding to each image. Read More


This paper proposes a novel advanced motion model to handle the irregular motion for the cubic map projection of 360-degree video. Since the irregular motion is mainly caused by the projection from the sphere to the cube map, we first try to project the pixels in both the current picture and reference picture from unfolding cube back to the sphere. Then through utilizing the characteristic that most of the motions in the sphere are uniform, we can derive the relationship between the motion vectors of various pixels in the unfold cube. Read More


Multi-view registration is a fundamental but challenging problem in 3D reconstruction and robot vision. Although the original motion averaging algorithm has been introduced as an effective means to solve the multi-view registration problem, it does not consider the reliability and accuracy of each relative motion. Accordingly, this paper proposes a novel motion averaging algorithm for multi-view registration. Read More


Deep convolutional networks are well-known for their high computational and memory demands. Given limited resources, how does one design a network that balances its size, training time, and prediction accuracy? A surprisingly effective approach to trade accuracy for size and speed is to simply reduce the number of channels in each convolutional layer by a fixed fraction and retrain the network. In many cases this leads to significantly smaller networks with only minimal changes to accuracy. Read More


Researchers often summarize their work in the form of scientific posters. Posters provide a coherent and efficient way to convey core ideas expressed in scientific papers. Generating a good scientific poster, however, is a complex and time consuming cognitive task, since such posters need to be readable, informative, and visually aesthetic. Read More


We introduce an approach for the real-time (2Hz) creation of a dense map and alignment of a moving robotic agent within that map by rendering using a Graphics Processing Unit (GPU). This is done by recasting the scan alignment part of the dense mapping process as a rendering task. Alignment errors are computed from rendering the scene, comparing with range data from the sensors, and minimized by an optimizer. Read More


Recognizing human activities in a sequence is a challenging area of research in ubiquitous computing. Most approaches use a fixed size sliding window over consecutive samples to extract features---either handcrafted or learned features---and predict a single label for all samples in the window. Two key problems emanate from this approach: i) the samples in one window may not always share the same label. Read More


Feature extraction is a critical component of many applied data science workflows. In recent years, rapid advances in artificial intelligence and machine learning have led to an explosion of feature extraction tools and services that allow data scientists to cheaply and effectively annotate their data along a vast array of dimensions---ranging from detecting faces in images to analyzing the sentiment expressed in coherent text. Unfortunately, the proliferation of powerful feature extraction services has been mirrored by a corresponding expansion in the number of distinct interfaces to feature extraction services. Read More


Label distribution learning (LDL) is a general learning framework, which assigns a distribution over a set of labels to an instance rather than a single label or multiple labels. Current LDL methods have either restricted assumptions on the expression form of the label distribution or limitations in representation learning. This paper presents label distribution learning forests (LDLFs) - a novel label distribution learning algorithm based on differentiable decision trees, which have several advantages: 1) Decision trees have the potential to model any general form of label distributions by the mixture of leaf node predictions. Read More


In global models/priors (for example, using wavelet frames), there is a well known analysis vs synthesis dichotomy in the way signal/image priors are formulated. In patch-based image models/priors, this dichotomy is also present in the choice of how each patch is modeled. This paper shows that there is another analysis vs synthesis dichotomy, in terms of how the whole image is related to the patches, and that all existing patch-based formulations that provide a global image prior belong to the analysis category. Read More


We propose an extended framework for marginalized domain adaptation, aimed at addressing unsupervised, supervised and semi-supervised scenarios. We argue that the denoising principle should be extended to explicitly promote domain-invariant features as well as help the classification task. Therefore we propose to jointly learn the data auto-encoders and the target classifiers. Read More


Automatic segmentation of the liver and hepatic lesions is an important step towards deriving quantitative biomarkers for accurate clinical diagnosis and computer-aided decision support systems. This paper presents a method to automatically segment liver and lesions in CT and MRI abdomen images using cascaded fully convolutional neural networks (CFCNs) enabling the segmentation of a large-scale medical trial or quantitative image analysis. We train and cascade two FCNs for a combined segmentation of the liver and its lesions. Read More


Photographs taken through a glass surface often contain an approximately linear superposition of reflected and transmitted layers. Decomposing an image into these layers is generally an ill-posed task and the use of an additional image prior and user provided cues is presently necessary in order to obtain good results. Current annotation approaches rely on a strong sparsity assumption. Read More


Automatic non-invasive assessment of hepatocellular carcinoma (HCC) malignancy has the potential to substantially enhance tumor treatment strategies for HCC patients. In this work we present a novel framework to automatically characterize the malignancy of HCC lesions from DWI images. We predict HCC malignancy in two steps: As a first step we automatically segment HCC tumor lesions using cascaded fully convolutional neural networks (CFCN). Read More


The development of reliable imaging biomarkers for the analysis of colorectal cancer (CRC) in hematoxylin and eosin (H&E) stained histopathology images requires an accurate and reproducible classification of the main tissue components in the image. In this paper, we propose a system for CRC tissue classification based on convolutional networks (ConvNets). We investigate the importance of stain normalization in tissue classification of CRC tissue samples in H&E-stained images. Read More


We present a new approach for efficient approximate nearest neighbor (ANN) search in high dimensional spaces, extending the idea of Product Quantization. We propose a two-level product and vector quantization tree that reduces the number of vector comparisons required during tree traversal. Our approach also includes a novel highly parallelizable re-ranking method for candidate vectors by efficiently reusing already computed intermediate values. Read More


Multi-label image classification is a fundamental but challenging task in computer vision. Great progress has been achieved by exploiting semantic relations between labels in recent years. However, conventional approaches are unable to model the underlying spatial relations between labels in multi-label images, because spatial annotations of the labels are generally not provided. Read More


Multi-label submodular Markov Random Fields (MRFs) have been shown to be solvable using max-flow based on an encoding of the labels proposed by Ishikawa, in which each variable $X_i$ is represented by $\ell$ nodes (where $\ell$ is the number of labels) arranged in a column. However, this method in general requires $2\,\ell^2$ edges for each pair of neighbouring variables. This makes it inapplicable to realistic problems with many variables and labels, due to excessive memory requirement. Read More


Photos are becoming spontaneous, objective, and universal sources of information. This paper develops evolving situation recognition using photo streams coming from disparate sources combined with the advances of deep learning. Using visual concepts in photos together with space and time information, we formulate the situation detection into a semi-supervised learning framework and propose new graph-based models to solve the problem. Read More