Improving Text Proposals for Scene Images with Fully Convolutional Networks

Text Proposals have emerged as a class-dependent version of object proposals: efficient approaches to reduce the search space of possible text object locations in an image. Combined with strong word classifiers, text proposals currently yield state-of-the-art results in end-to-end scene text recognition. In this paper we propose an improvement over the original Text Proposals algorithm of Gomez and Karatzas (2016), combining it with Fully Convolutional Networks to improve the ranking of proposals. Results on the ICDAR RRC and the COCO-text datasets show superior performance over the current state of the art.

Comments: 6 pages, 8 figures, International Conference on Pattern Recognition (ICPR) - DLPR (Deep Learning for Pattern Recognition) workshop
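
The abstract does not say how the Fully Convolutional Network output is combined with the proposals, so the following is only an illustrative sketch of one plausible reading: each proposal box is scored by the average per-pixel text probability it covers, and proposals are re-ordered by that score. The function names `mean_text_score` and `rerank`, the (x, y, width, height) box format, and the toy heatmap are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Assumed box format (hypothetical): (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def mean_text_score(heatmap: List[List[float]], box: Box) -> float:
    """Average per-pixel text probability inside a box, clipped to the image."""
    x, y, w, h = box
    rows = heatmap[max(y, 0):max(y + h, 0)]
    values = [p for row in rows for p in row[max(x, 0):max(x + w, 0)]]
    # A box that lies fully outside the image covers no pixels and scores 0.
    return sum(values) / len(values) if values else 0.0


def rerank(proposals: List[Box], heatmap: List[List[float]]) -> List[Box]:
    """Order proposals so boxes covering high text-probability pixels come first."""
    return sorted(proposals, key=lambda b: mean_text_score(heatmap, b), reverse=True)


# Toy stand-in for an FCN text-probability map (values are made up).
heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.9, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
proposals = [(0, 0, 4, 4), (1, 1, 2, 2), (0, 0, 1, 1)]
ranked = rerank(proposals, heatmap)
print(ranked)
```

In this toy case the tight box around the high-probability region is ranked ahead of the full-image box, which is the kind of re-ordering a text-specific heatmap could provide. A real system would more likely blend such a score with the proposal generator's own ranking than replace it outright.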

Similar Publications

This abstract briefly describes a segmentation algorithm developed for the ISIC 2017 Skin Lesion Detection Competition hosted at [ref]. The objective of the competition is to perform a segmentation (in the form of a binary mask image) of skin lesions in dermoscopic images as close as possible to a segmentation performed by trained clinicians, which is taken as ground truth. This project only takes part in the segmentation phase of the challenge.

As an intermediate-level task connecting image captioning and object detection, visual relationship detection has started to attract researchers' attention because of its descriptive power and clear structure. It localizes the objects and captures their interactions with a subject-predicate-object triplet, e.g.

Convolutional Neural Networks (Convnets) have achieved good results in a range of computer vision tasks in recent years. Although it has received a lot of attention, visualizing the learned representations to interpret Convnets remains a challenging task. The high dimensionality of internal representations and the high abstractions of deep layers are the main challenges when visualizing Convnet functionality.

Mandible bone segmentation from computed tomography (CT) scans is challenging due to the mandible's structural irregularities, complex shape patterns, and lack of contrast in joints. Furthermore, connections of the teeth to the mandible and of the mandible to the remaining parts of the skull make it extremely difficult to identify the mandible boundary automatically. This study addresses these challenges by proposing a novel framework where we define the segmentation as two complementary tasks: recognition and delineation.

Cascade is a widely used approach that rejects obvious negative samples at early stages in order to learn a better classifier and speed up inference. This paper presents the chained cascade network (CC-Net). In CC-Net, the cascaded classifier at a stage is aided by the classification scores in previous stages.

Skin cancer is a major public health problem, as it is the most common type of cancer and represents more than half of cancer diagnoses worldwide. Early detection influences the outcome of the disease and motivates our work. We obtain state-of-the-art results for the ISBI 2016 Melanoma Classification Challenge (named Skin Lesion Analysis towards Melanoma Detection), facing the peculiarities of dealing with such a small, unbalanced, biological database.

Increasing use of CT in modern medical practice has raised concerns over the associated radiation dose. Reducing the radiation dose associated with CT can increase noise and artifacts, which can adversely affect diagnostic confidence. Denoising of low-dose CT images, on the other hand, can help improve diagnostic confidence. It is, however, a challenging problem due to its ill-posed nature, since one noisy image patch may correspond to many different output patches.

Here we present a parametric model for dynamic textures. The model is based on spatiotemporal summary statistics computed from the feature representations of a Convolutional Neural Network (CNN) trained on object recognition. We demonstrate how the model can be used to synthesise new samples of dynamic textures and to predict motion in simple movies.

Limited annotated data is available for the research of estimating facial expression intensities, which makes the training of deep networks for automated expression assessment very challenging. Fortunately, fine-tuning from a data-extensive pre-trained domain such as face verification can alleviate the problem. In this paper, we propose a transferred network that fine-tunes a state-of-the-art face verification network using expression-intensity labeled data with a regression layer.

Person recognition aims at recognizing the same identity across time and space in complicated scenes and among people of similar appearance. In this paper, we propose a novel method to address this task by training a network to obtain robust and representative features. A key observation is that the traditional cross-entropy loss only enforces the inter-class variation among samples and neglects to tighten the similarity within each category.