Cognitive Mapping and Planning for Visual Navigation

We introduce a neural architecture for navigation in novel environments. Our proposed architecture learns to map from first-person viewpoints and plans a sequence of actions towards goals in the environment. The Cognitive Mapper and Planner (CMP) is based on two key ideas: a) a unified joint architecture for mapping and planning, such that the mapping is driven by the needs of the planner, and b) a spatial memory with the ability to plan given an incomplete set of observations about the world. CMP constructs a top-down belief map of the world and applies a differentiable neural net planner to produce the next action at each time step. The accumulated belief of the world enables the agent to track visited regions of the environment. Our experiments demonstrate that CMP outperforms both reactive strategies and standard memory-based architectures and performs well in novel environments. Furthermore, we show that CMP can also achieve semantically specified goals, such as 'go to a chair'.

Comments: Under review for CVPR 2017. Project webpage:
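
The map-then-plan loop described in the abstract above can be illustrated with a small toy example. This is only a sketch under stated assumptions, not the authors' implementation: the learned mapper is replaced by a simple max-fusion of local observations into a grid belief map, and the differentiable neural planner is replaced by plain value iteration on that grid. All names (`update_belief`, `plan_values`, `navigate`, the 3x3 observation window, the cost values) are hypothetical.

```python
import numpy as np

# Hypothetical action set: grid moves as (row, col) offsets.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def observe(true_map, position, radius=1):
    """Return the obstacle readings inside a small window around the agent."""
    mask = np.zeros(true_map.shape, dtype=bool)
    r, c = position
    mask[max(r - radius, 0):r + radius + 1, max(c - radius, 0):c + radius + 1] = True
    return np.where(mask, true_map, 0.0), mask


def update_belief(belief, observation, mask):
    """Fuse a partial observation into the accumulated top-down belief map."""
    fused = belief.copy()
    fused[mask] = np.maximum(belief[mask], observation[mask])
    return fused


def plan_values(belief, goal, iterations=50, gamma=0.95, obstacle_cost=1.0):
    """Value iteration on the grid (assumed stand-in for a learned planner)."""
    reward = -obstacle_cost * belief  # believed obstacles are penalised
    value = np.zeros(belief.shape)
    for _ in range(iterations):
        padded = np.pad(value, 1, constant_values=-np.inf)
        best_neighbor = np.max(
            np.stack([padded[:-2, 1:-1], padded[2:, 1:-1],
                      padded[1:-1, :-2], padded[1:-1, 2:]]),
            axis=0,
        )
        value = reward + gamma * best_neighbor
        value[goal] = 1.0  # the goal cell keeps a fixed positive value
    return value


def next_action(value, position):
    """Pick the in-bounds move that leads to the highest-valued neighbour."""
    best_name, best_value = None, -np.inf
    for name, (dr, dc) in ACTIONS.items():
        r, c = position[0] + dr, position[1] + dc
        in_bounds = 0 <= r < value.shape[0] and 0 <= c < value.shape[1]
        if in_bounds and value[r, c] > best_value:
            best_name, best_value = name, value[r, c]
    return best_name


def navigate(true_map, start, goal, max_steps=40):
    """Alternate mapping and planning; unexplored cells are assumed free."""
    belief = np.zeros(true_map.shape)
    position, path = start, [start]
    for _ in range(max_steps):
        if position == goal:
            break
        observation, mask = observe(true_map, position)
        belief = update_belief(belief, observation, mask)
        value = plan_values(belief, goal)
        dr, dc = ACTIONS[next_action(value, position)]
        position = (position[0] + dr, position[1] + dc)
        path.append(position)
    return path


if __name__ == "__main__":
    world = np.zeros((5, 5))
    world[0:4, 2] = 1.0  # a wall with a gap in the bottom row
    print(navigate(world, (0, 0), (0, 4)))
```

The agent replans at every step from whatever it has observed so far, which mirrors the idea of planning from an incomplete set of observations; in the architecture the abstract describes, both the mapping and the planning steps would be learned jointly rather than hand-coded as they are here.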

Similar Publications

This abstract briefly describes a segmentation algorithm developed for the ISIC 2017 Skin Lesion Detection Competition hosted at [ref]. The objective of the competition is to perform a segmentation (in the form of a binary mask image) of skin lesions in dermoscopic images as close as possible to a segmentation performed by trained clinicians, which is taken as ground truth. This project only takes part in the segmentation phase of the challenge.

As an intermediate-level task connecting image captioning and object detection, visual relationship detection has started to attract researchers' attention because of its descriptive power and clear structure. It localizes the objects and captures their interactions with a subject-predicate-object triplet, e.g.

Convolutional Neural Networks (Convnets) have achieved good results in a range of computer vision tasks in recent years. Although it has received a lot of attention, visualizing the learned representations to interpret Convnets remains a challenging task. The high dimensionality of internal representations and the high level of abstraction in deep layers are the main challenges when visualizing Convnet functionality.

Mandible bone segmentation from computed tomography (CT) scans is challenging due to the mandible's structural irregularities, complex shape patterns, and lack of contrast in joints. Furthermore, connections of the teeth to the mandible and of the mandible to the rest of the skull make it extremely difficult to identify the mandible boundary automatically. This study addresses these challenges by proposing a novel framework in which we define segmentation as two complementary tasks: recognition and delineation.

Cascading is a widely used approach that rejects obvious negative samples at early stages, both to learn a better classifier and to speed up inference. This paper presents the chained cascade network (CC-Net). In CC-Net, the cascaded classifier at each stage is aided by the classification scores from previous stages.

Skin cancer is a major public health problem, as it is the most common type of cancer and represents more than half of cancer diagnoses worldwide. Early detection influences the outcome of the disease and motivates our work. We obtain state-of-the-art results for the ISBI 2016 Melanoma Classification Challenge (named Skin Lesion Analysis towards Melanoma Detection), facing the peculiarities of dealing with such a small, unbalanced, biological database.

Increasing use of CT in modern medical practice has raised concerns over the associated radiation dose. Reducing the radiation dose can increase noise and artifacts, which can adversely affect diagnostic confidence. Denoising low-dose CT images, on the other hand, can help improve diagnostic confidence, but it is a challenging problem due to its ill-posed nature, since one noisy image patch may correspond to many different output patches.

Here we present a parametric model for dynamic textures. The model is based on spatiotemporal summary statistics computed from the feature representations of a Convolutional Neural Network (CNN) trained on object recognition. We demonstrate how the model can be used to synthesise new samples of dynamic textures and to predict motion in simple movies.

Limited annotated data is available for the research of estimating facial expression intensities, which makes the training of deep networks for automated expression assessment very challenging. Fortunately, fine-tuning from a data-extensive pre-trained domain such as face verification can alleviate the problem. In this paper, we propose a transferred network that fine-tunes a state-of-the-art face verification network using expression-intensity labeled data with a regression layer.

Person recognition aims at recognizing the same identity across time and space despite complicated scenes and similar appearances. In this paper, we propose a novel method to address this task by training a network to obtain robust and representative features. A key observation is that the traditional cross-entropy loss only enforces inter-class variation among samples and neglects to reduce the variation within each category.