Reading Text Appearing in Natural Images

Author

Radhakrishna Achanta

Goal of Research

The goal of this research is to detect and read English text occurring in images of natural scenes.

Abstract of Current Research

Text present in digital images can provide useful information about the image and the group of images it belongs to. On this page we present some results of automatic detection and character recognition of text in natural scenes. The results are presented for three different approaches to text detection and reading. The important contribution in two of them is that no third-party Optical Character Recognition (OCR) software is needed to read the text.

For all three approaches, we first locate the text regions in the image. The first figure on the left presents initial results of text detection. The second picture shows the text region after combining the detection rectangles using a histogram-binning based method. After this we read the text by recognizing the characters.
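
The exact histogram-binning criterion is not reproduced here. As a rough illustration only, the Python sketch below merges rectangles whose centres fall into the same coarse spatial bin and replaces each group by its bounding box; the function name merge_rects, the bin_size parameter, and the merge-to-bounding-box rule are all assumptions for this sketch, not the exact method used.

    from collections import defaultdict

    def merge_rects(rects, bin_size=32):
        # Group detection rectangles (x, y, w, h) by the coarse spatial
        # bin their centre falls into, then replace each group by its
        # bounding box. Illustrative only; the actual histogram-binning
        # criterion may differ.
        bins = defaultdict(list)
        for (x, y, w, h) in rects:
            cx, cy = x + w // 2, y + h // 2
            bins[(cx // bin_size, cy // bin_size)].append((x, y, w, h))
        merged = []
        for group in bins.values():
            x0 = min(x for x, _, _, _ in group)
            y0 = min(y for _, y, _, _ in group)
            x1 = max(x + w for x, _, w, _ in group)
            y1 = max(y + h for _, y, _, h in group)
            merged.append((x0, y0, x1 - x0, y1 - y0))
        return merged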

  • In the first approach, character detection is done directly on the unprocessed text region detected in the previous stage, using a battery of 26 character recognizers, one for each capital letter of the English alphabet (a sketch of such a recognizer battery appears after this list).
  • In the second approach to text reading, we segment the image using a hill-climbing based k-means segmentation algorithm. We then perform connected-component analysis to find all components in the image. Using information about character size from the previous approach, we filter out (blacken) all non-character components based on the aspect ratio and area of characters, and retain (whiten) all character components (see the filtering sketch after this list). The result of this binarization is shown in the next image from the left. If need be, this binarized image can be fed to an Optical Character Recognition (OCR) engine to read the text.
  • In the third approach, we run the battery of 26 detectors again on each of the image regions containing a character. This further reduces any false detections that occur in the first stage.
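
To make the first and third approaches concrete, here is a minimal sketch of running a battery of per-letter recognizers over a raw text region. The window size, step, score threshold, and the recognizer interface (a callable returning a confidence score per window) are assumptions for illustration; the actual 26 detectors are trained classifiers.

    def sliding_windows(region, win_w, win_h, step=4):
        # Yield (window, (x, y)) crops scanned across the region.
        H, W = region.shape[:2]
        for y in range(0, H - win_h + 1, step):
            for x in range(0, W - win_w + 1, step):
                yield region[y:y + win_h, x:x + win_w], (x, y)

    def read_region(region, recognizers, win_w=16, win_h=24, threshold=0.5):
        # recognizers: dict mapping 'A'..'Z' to a callable that scores
        # how much a window looks like that letter (hypothetical interface).
        detections = []
        for window, pos in sliding_windows(region, win_w, win_h):
            scores = {ch: rec(window) for ch, rec in recognizers.items()}
            best = max(scores, key=scores.get)
            if scores[best] > threshold:
                detections.append((pos, best, scores[best]))
        return detections

The third approach then amounts to calling the same battery again on each cropped character box found in the first stage, which suppresses residual false detections.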
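
For the second approach, the character/non-character filtering step can be illustrated with SciPy's connected-component labelling. This is a minimal sketch, assuming a binary mask taken from one k-means segment; the area and aspect-ratio bounds are placeholders standing in for the thresholds derived from the character sizes found in the first approach.

    import numpy as np
    from scipy import ndimage

    def filter_components(mask, area_range=(30, 5000), aspect_range=(0.1, 1.5)):
        # Label connected components in the binary segmentation mask,
        # then whiten components whose area and width/height ratio fall
        # in plausible character ranges, blackening everything else.
        labels, num = ndimage.label(mask)
        out = np.zeros(mask.shape, dtype=np.uint8)
        for i, sl in enumerate(ndimage.find_objects(labels), start=1):
            comp = labels[sl] == i              # pixels of component i only
            area = int(comp.sum())
            h, w = comp.shape
            aspect = w / h
            if (area_range[0] <= area <= area_range[1]
                    and aspect_range[0] <= aspect <= aspect_range[1]):
                out[sl][comp] = 255             # retain: character-like
        return out                              # non-characters stay black

The resulting binary image corresponds to the "text region after binarization" figures below and could be handed to an OCR engine directly.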

The results are shown on the image dataset of the ICDAR 2005 text detection competition (the last image on the page does not belong to that dataset).

Figure captions (images omitted):

  • Text detection before filtering rectangles.
  • Text detection after filtering rectangles.
  • ‘P’ wrongly detected as ‘D’.
  • Text detection before filtering rectangles.
  • Text detection after filtering rectangles.
  • ‘E’ of ‘AGENTS’ and corner of window wrongly detected as ‘L’.
  • Text region after k-means segmentation.
  • Text region after binarization.
  • Final letter recognition and segmentation; here ‘C’ of ‘CASTLE’ was segmented but not recognized.
  • Text region after k-means segmentation.
  • Text region after binarization.
  • Final letter recognition and segmentation; here ‘N’ of ‘SAXONS’ was segmented but not recognized. Also, ‘G’ was identified as ‘O’.
  • Text detection before filtering rectangles.
  • Text detection after filtering rectangles.
  • Part of ‘S’ wrongly detected as ‘D’.
  • Text detection before filtering rectangles.
  • Text detection after filtering rectangles.
  • ‘O’ of ‘TO’ detected wrongly as ‘P’.
  • Text region after k-means segmentation.
  • Text region after binarization.
  • All letters correctly segmented and recognized.
  • Text region after k-means segmentation.
  • Text region after binarization.
  • The letter ‘W’ that borders the edge is detected as a ‘V’.

More results…

Collaborations

Sabine Süsstrunk
Olivier Küng

Funding

This work is supported by the National Competence Center in Research on Mobile Information and Communication Systems (NCCR-MICS), a center supported by the Swiss National Science Foundation under grant number 5005-67322, and by K-Space, the European Network of Excellence on Knowledge Space of Semantic Inference for Automatic Annotation and Retrieval of Multimedia Content.