Cognitive Mapping and Planning for Visual Navigation

We introduce a neural architecture for navigation in novel environments. Our proposed architecture learns to map from first-person viewpoints and plans a sequence of actions towards goals in the environment. The Cognitive Mapper and Planner (CMP) is based on two key ideas: a) a unified joint architecture for mapping and planning, such that the mapping is driven by the needs of the planner, and b) a spatial memory with the ability to plan given an incomplete set of observations about the world. CMP constructs a top-down belief map of the world and applies a differentiable neural net planner to produce the next action at each time step. The accumulated belief of the world enables the agent to track visited regions of the environment. Our experiments demonstrate that CMP outperforms both reactive strategies and standard memory-based architectures and performs well in novel environments. Furthermore, we show that CMP can also achieve semantically specified goals, such as 'go to a chair'.

Comments: Under review for CVPR 2017. Project webpage: https://sites.google.com/view/cognitive-mapping-and-planning/

Similar Publications

2017Mar
Affiliations: 1Medical Physics Unit, McGill University, Montréal, Canada, 2Radiation Oncology Division, Hôpital général juif, Montréal, Canada, 3Department of Radiation Oncology, Centre hospitalier universitaire de Sherbrooke, Montréal, Canada, 4Department of Radiation Oncology, Centre hospitalier de l'Université de Montréal, Montréal, Canada, 5Department of Radiation Oncology, Hôpital Maisonneuve-Rosemont, Montréal, Canada, 6Departments of Radiation Oncology & Radiology, Dana-Farber Cancer Institute, Boston, USA, 7Department of Radiation Oncology, Hôpital Maisonneuve-Rosemont, Montréal, Canada, 8Department of Radiation Oncology, Centre hospitalier de l'Université de Montréal, Montréal, Canada, 9Department of Radiation Oncology, Centre hospitalier universitaire de Sherbrooke, Montréal, Canada, 10Radiation Oncology Division, Hôpital général juif, Montréal, Canada, 11Medical Physics Unit, McGill University, Montréal, Canada, 12Department of Radiation Oncology, Physics Division, University of Michigan, Ann Arbor, USA

Quantitative extraction of high-dimensional mineable data from medical images is a process known as radiomics. Radiomics is foreseen as an essential prognostic tool for cancer risk assessment and the quantification of intratumoural heterogeneity. In this work, 1615 radiomic features (quantifying tumour image intensity, shape, texture) extracted from pre-treatment FDG-PET and CT images of 300 patients from four different cohorts were analyzed for the risk assessment of locoregional recurrences (LR) and distant metastases (DM) in head-and-neck cancer. Read More


Local deep neural networks have been recently introduced for gender recognition. Although, they achieve very good performance they are very computationally expensive to train. In this work, we introduce a simplified version of local deep neural networks which significantly reduces the training time. Read More


In the field of connectomics, neuroscientists seek to identify cortical connectivity comprehensively. Neuronal boundary detection from the Electron Microscopy (EM) images is often done to assist the automatic reconstruction of neuronal circuit. But the segmentation of EM images is a challenging problem, as it requires the detector to be able to detect both filament-like thin and blob-like thick membrane, while suppressing the ambiguous intracellular structure. Read More


One of the challenges in Content-Based Image Retrieval (CBIR) is to reduce the semantic gaps between low-level features and high-level semantic concepts. In CBIR, the images are represented in the feature space and the performance of CBIR depends on the type of selected feature representation. Late fusion also known as visual words integration is applied to enhance the performance of image retrieval. Read More


With a widespread use of digital imaging data in hospitals, the size of medical image repositories is increasing rapidly. This causes difficulty in managing and querying these large databases leading to the need of content based medical image retrieval (CBMIR) systems. A major challenge in CBMIR systems is the semantic gap that exists between the low level visual information captured by imaging devices and high level semantic information perceived by human. Read More


We investigate a principle way to progressively mine discriminative object regions using classification networks to address the weakly-supervised semantic segmentation problems. Classification networks are only responsive to small and sparse discriminative regions from the object of interest, which deviates from the requirement of the segmentation task that needs to localize dense, interior and integral regions for pixel-wise inference. To mitigate this gap, we propose a new adversarial erasing approach for localizing and expanding object regions progressively. Read More


Face recognition (FR) methods report significant performance by adopting the convolutional neural network (CNN) based learning methods. Although CNNs are mostly trained by optimizing the softmax loss, the recent trend shows an improvement of accuracy with different strategies, such as task-specific CNN learning with different loss functions, fine-tuning on target dataset, metric learning and concatenating features from multiple CNNs. Incorporating these tasks obviously requires additional efforts. Read More


Robot vision is a fundamental device for human-robot interaction and robot complex tasks. In this paper, we use Kinect and propose a feature graph fusion (FGF) for robot recognition. Our feature fusion utilizes RGB and depth information to construct fused feature from Kinect. Read More


Texture classification is a problem that has various applications such as remote sensing and forest species recognition. Solutions tend to be custom fit to the dataset used but fails to generalize. The Convolutional Neural Network (CNN) in combination with Support Vector Machine (SVM) form a robust selection between powerful invariant feature extractor and accurate classifier. Read More


Most existing person re-identification algorithms either extract robust visual features or learn discriminative metrics for person images. However, the underlying manifold which those images reside on is rarely investigated. That raises a problem that the learned metric is not smooth with respect to the local geometry structure of the data manifold. Read More