Improving Text Proposals for Scene Images with Fully Convolutional Networks

Text Proposals have emerged as a class-dependent version of object proposals - efficient approaches to reduce the search space of possible text object locations in an image. Combined with strong word classifiers, text proposals currently yield top state of the art results in end-to-end scene text recognition. In this paper we propose an improvement over the original Text Proposals algorithm of Gomez and Karatzas (2016), combining it with Fully Convolutional Networks to improve the ranking of proposals. Results on the ICDAR RRC and the COCO-text datasets show superior performance over current state-of-the-art.

Comments: 6 pages, 8 figures, International Conference on Pattern Recognition (ICPR) - DLPR (Deep Learning for Pattern Recognition) workshop

Similar Publications

Visual Question Answering (VQA) has received a lot of attention over the past couple of years. A number of deep learning models have been proposed for this task. However, it has been shown that these models are heavily driven by superficial correlations in the training data and lack compositionality -- the ability to answer questions about unseen compositions of seen concepts. Read More

We propose an effective framework for multi-phase image segmentation and semi-supervised data clustering by introducing a novel region force term into the Potts model. Assume the probability that a pixel or a data point belongs to each class is known a priori. We show that the corresponding indicator function obeys the Bernoulli distribution and the new region force function can be computed as the negative log-likelihood function under the Bernoulli distribution. Read More

This paper introduces a generalization of Convolutional Neural Networks (CNNs) from low-dimensional grid data, such as images, to graph-structured data. We propose a novel spatial convolution utilizing a random walk to uncover the relations within the input, analogous to the way the standard convolution uses the spatial neighborhood of a pixel on the grid. The convolution has an intuitive interpretation, is efficient and scalable and can also be used on data with varying graph structure. Read More

This paper provides an overview of the on-going compact descriptors for video analysis standard (CDVA) from the ISO/IEC moving pictures experts group (MPEG). MPEG-CDVA targets at defining a standardized bitstream syntax to enable interoperability in the context of video analysis applications. During the developments of MPEGCDVA, a series of techniques aiming to reduce the descriptor size and improve the video representation ability have been proposed. Read More

In this paper, we propose a novel learning based method for automated segmenta-tion of brain tumor in multimodal MRI images. The machine learned features from fully convolutional neural network (FCN) and hand-designed texton fea-tures are used to classify the MRI image voxels. The score map with pixel-wise predictions is used as a feature map which is learned from multimodal MRI train-ing dataset using the FCN. Read More

Being a task of establishing spatial correspondences, medical image registration is often formalized as finding the optimal transformation that best aligns two images. Since the transformation is such an essential component of registration, most existing researches conventionally quantify the registration uncertainty, which is the confidence in the estimated spatial correspondences, by the transformation uncertainty. In this paper, we give concrete examples and reveal that using the transformation uncertainty to quantify the registration uncertainty is inappropriate and sometimes misleading. Read More

Among the patch-based image denoising processing methods, smooth ordering of local patches (patch ordering) has been shown to give state-of-art results. For image denoising the patch ordering method forms two large TSPs (Traveling Salesman Problem) comprised of nodes in N-dimensional space. Ten approximate solutions of the two large TSPs are then used in a filtering process to form the reconstructed image. Read More

Classifiers trained on given databases perform poorly when tested on data acquired in different settings. This is explained in domain adaptation through a shift among distributions of the source and target domains. Attempts to align them have traditionally resulted in works reducing the domain shift by introducing appropriate loss terms, measuring the discrepancies between source and target distributions, in the objective function. Read More

This paper addresses deep face recognition (FR) problem under open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space. However, few existing algorithms can effectively achieve this criterion. To this end, we propose the angular softmax (A-Softmax) loss that enables convolutional neural networks (CNNs) to learn angularly discriminative features. Read More