Cognitive Mapping and Planning for Visual Navigation

We introduce a neural architecture for navigation in novel environments. Our proposed architecture learns to map from first-person viewpoints and plans a sequence of actions towards goals in the environment. The Cognitive Mapper and Planner (CMP) is based on two key ideas: a) a unified joint architecture for mapping and planning, such that the mapping is driven by the needs of the planner, and b) a spatial memory with the ability to plan given an incomplete set of observations about the world. CMP constructs a top-down belief map of the world and applies a differentiable neural net planner to produce the next action at each time step. The accumulated belief of the world enables the agent to track visited regions of the environment. Our experiments demonstrate that CMP outperforms both reactive strategies and standard memory-based architectures and performs well in novel environments. Furthermore, we show that CMP can also achieve semantically specified goals, such as 'go to a chair'.
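The map-then-plan loop described above can be illustrated with a small sketch. This is not the CMP model: the abstract describes a learned, differentiable planner over a learned belief map, whereas the code below swaps in classical value iteration on a hand-built occupancy grid. All names (`update_belief`, `plan_values`, `next_action`), the four-action set, and the cost constants are hypothetical choices made for illustration only.

```python
import numpy as np

# Hypothetical 4-connected action set: name -> (row delta, column delta).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def update_belief(belief, observed_mask, observed_occupancy):
    """Overwrite the belief map wherever the agent currently observes the world."""
    updated = belief.copy()
    updated[observed_mask] = observed_occupancy[observed_mask]
    return updated


def plan_values(belief, goal, n_iters=50, gamma=0.95, obstacle_cost=10.0):
    """Classical value iteration on a top-down grid (a stand-in for a learned planner).

    belief holds per-cell obstacle probabilities in [0, 1]; goal is a (row, col) tuple.
    """
    rows, cols = belief.shape
    # Small cost per step plus a large cost for cells believed to be occupied.
    reward = -0.1 - obstacle_cost * belief
    values = np.zeros_like(belief, dtype=float)
    for _ in range(n_iters):
        # Pad with a very low value so that leaving the grid is never attractive.
        padded = np.pad(values, 1, mode="constant", constant_values=-1e3)
        # neighbours[k][r, c] is the value of the cell reached by action k from (r, c).
        neighbours = np.stack([
            padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            for dr, dc in ACTIONS.values()
        ])
        values = reward + gamma * neighbours.max(axis=0)
        values[goal] = 0.0  # the goal is pinned at the highest value
    return values


def next_action(values, position):
    """Greedy step: move to the in-bounds neighbour with the highest value."""
    rows, cols = values.shape
    best_name, best_value = None, -np.inf
    for name, (dr, dc) in ACTIONS.items():
        r, c = position[0] + dr, position[1] + dc
        if 0 <= r < rows and 0 <= c < cols and values[r, c] > best_value:
            best_name, best_value = name, values[r, c]
    return best_name
```

In use, the agent would call `update_belief` with each new observation, recompute `plan_values` for the current goal, and execute the result of `next_action`, repeating at every time step. Unobserved cells can keep an intermediate prior (for example 0.5), which lets the planner act on an incomplete map.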

Comments: Under review for CVPR 2017.

Similar Publications

The task of object viewpoint estimation has been a challenge since the early days of computer vision. To estimate the viewpoint (or pose) of an object, people have mostly looked at object-intrinsic features, such as shape or appearance. Surprisingly, informative features provided by other, extrinsic elements in the scene have so far mostly been ignored.

Location recognition is commonly treated as visual instance retrieval on "street view" imagery. The dataset items and queries are panoramic views, i.e.

In this paper, we propose a product quantization table (PQTable), a fast search method for product-quantized codes via hash tables. An identifier of each database vector is associated with a slot of a hash table by using its PQ-code as a key. For querying, an input vector is PQ-encoded and hashed, and the items associated with that code are then retrieved.

Precise delineation of organs at risk (OAR) is a crucial task in radiotherapy treatment planning, which aims at delivering a high dose to the tumour while sparing healthy tissues. In recent years, algorithms have shown high performance and the potential to automate this task for many OAR. However, for some OAR, precise delineation remains challenging.

We address personalization issues of image captioning, which have not yet been discussed in previous research. For a query image, we aim to generate a descriptive sentence, accounting for prior knowledge such as the user's active vocabulary in previous documents. As applications of personalized image captioning, we tackle two post automation tasks, hashtag prediction and post generation, on our newly collected Instagram dataset, consisting of 1.

Social relations are the foundation of human daily life. Developing techniques to analyze such relations from visual data bears great potential to build machines that better understand us and are capable of interacting with us at a social level. Previous investigations have remained partial due to the overwhelming diversity and complexity of the topic and consequently have only focused on a handful of social relations.

The use of color in QR codes brings extra data capacity, but also poses significant challenges for the decoding process due to chromatic distortion, cross-channel color interference and illumination variation. In particular, we discover a new type of chromatic distortion in high-density color QR codes, cross-module color interference, caused by the high density, which also makes geometric distortion correction more challenging. To address these problems, we propose two approaches, namely LSVM-CMI and QDA-CMI, which jointly model these different types of chromatic distortion.

This paper addresses the problem of online tracking and classification of multiple objects in an image sequence. Our proposed solution is to first track all objects in the scene without relying on object-specific prior knowledge, which in other systems can take the form of hand-crafted features or user-based track initialization. We then classify the tracked objects with a fast-learning image classifier that is based on a shallow convolutional neural network architecture and demonstrate that object recognition improves when this is combined with object state information from the tracking algorithm.

Most traditional convolutional neural networks (CNNs) implement a bottom-up (feedforward) approach for image classification. However, many scientific studies demonstrate that visual perception in primates relies on both bottom-up and top-down connections. Therefore, in this work, we propose a CNN with a feedback structure for solar power plant detection on low-resolution satellite images.

Symmetry is an important compositional feature, identified by finding similar sides within an image plane. It plays a crucial role in recognizing man-made or natural objects. Recent symmetry detection approaches used a smoothing kernel over different voting maps in the polar coordinate system to detect symmetry peaks, which splits the regions of symmetry axis candidates in an inefficient way.