View Synthesis by Appearance Flow

We address the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints. We approach this as a learning task but, critically, instead of learning to synthesize pixels from scratch, we learn to copy them from the input image. Our approach exploits the observation that the visual appearance of different views of the same instance is highly correlated, and such correlation could be explicitly learned by training a convolutional neural network (CNN) to predict appearance flows -- 2-D coordinate vectors specifying which pixels in the input view could be used to reconstruct the target view. Furthermore, the proposed framework easily generalizes to multiple input views by learning how to optimally combine single-view predictions. We show that for both objects and scenes, our approach is able to synthesize novel views of higher perceptual quality than previous CNN-based techniques.

Similar Publications

Numerous deep learning applications benefit from multi-task learning with multiple regression and classification objectives. In this paper we make the observation that the performance of such systems is strongly dependent on the relative weighting between each task's loss. Tuning these weights by hand is a difficult and expensive process, making multi-task learning prohibitive in practice. Read More

Computational photography encompasses a diversity of imaging techniques, but one of the core operations performed by many of them is to compute image differences. An intuitive approach to computing such differences is to capture several images sequentially and then process them jointly. Usually, this approach leads to artifacts when recording dynamic scenes. Read More

Given a large unlabeled set of images, how to efficiently and effectively group them into clusters based on the extracted visual representations remains a challenging problem. To address this problem, we propose a convolutional neural network (CNN) to jointly solve clustering and representation learning in an iterative manner. In the proposed method, given an input image set, we first randomly pick k samples and extract their features as initial cluster centroids using the proposed CNN with an initial model pre-trained from the ImageNet dataset. Read More

Cellular Automata (CA) theory is a discrete model that represents the state of each of its cells from a finite set of possible values which evolve in time according to a pre-defined set of transition rules. CA have been applied to a number of image processing tasks such as Convex Hull Detection, Image Denoising etc. but mostly under the limitation of restricting the input to binary images. Read More

Recently, deep convolutional neural network (DCNN) achieved increasingly remarkable success and rapidly developed in the field of natural image recognition. Compared with the natural image, the scale of remote sensing image is larger and the scene and the object it represents are more macroscopic. This study inquires whether remote sensing scene and natural scene recognitions differ and raises the following questions: What are the key factors in remote sensing scene recognition? Is the DCNN recognition mechanism centered on object recognition still applicable to the scenarios of remote sensing scene understanding? We performed several experiments to explore the influence of the DCNN structure and the scale of remote sensing scene understanding from the perspective of scene complexity. Read More

Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) automatic 3-D registration is implemented and validated for small animal image volumes so that the high-resolution anatomical MRI information can be fused with the low spatial resolution of functional PET information for the localization of lesion that is currently in high demand in the study of tumor of cancer (oncology) and its corresponding preparation of pharmaceutical drugs. Though many registration algorithms are developed and applied on human brain volumes, these methods may not be as efficient on small animal datasets due to lack of intensity information and often the high anisotropy in voxel dimensions. Therefore, a fully automatic registration algorithm which can register not only assumably rigid small animal volumes such as brain but also deformable organs such as kidney, cardiac and chest is developed using a combination of global affine and local B-spline transformation models in which mutual information is used as a similarity criterion. Read More

In this work, we explain in detail how receptive fields, effective receptive fields, and projective fields of neurons in different layers, convolution or pooling, of a Convolutional Neural Network (CNN) are calculated. While our focus here is on CNNs, the same operations, but in the reverse order, can be used to calculate these quantities for deconvolutional neural networks. These are important concepts, not only for better understanding and analyzing convolutional and deconvolutional networks, but also for optimizing their performance in real-world applications. Read More

Previous studies by our group have shown that three-dimensional high-frequency quantitative ultrasound methods have the potential to differentiate metastatic lymph nodes from cancer-free lymph nodes dissected from human cancer patients. To successfully perform these methods inside the lymph node parenchyma, an automatic segmentation method is highly desired to exclude the surrounding thin layer of fat from quantitative ultrasound processing and accurately correct for ultrasound attenuation. In high-frequency ultrasound images of lymph nodes, the intensity distribution of lymph node parenchyma and fat varies spatially because of acoustic attenuation and focusing effects. Read More

We describe the DeepMind Kinetics human action video dataset. The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken from a different YouTube video. Read More

In order to make hyperspectral image classification compu- tationally tractable, it is often necessary to select the most informative bands instead to process the whole data without losing the geometrical representation of original data. To cope with said issue, an improved un- supervised non-linear deep auto encoder (UDAE) based band selection method is proposed. The proposed UDAE is able to select the most infor- mative bands in such a way that preserve the key information but in the lower dimensions, where the hidden representation is a non-linear trans- formation that maps the original space to a space of lower dimensions. Read More