Main Content

Recognize Text Using Optical Character Recognition (OCR)

This example shows how to use the ocr function from the Computer Vision Toolbox™ to perform optical character recognition.

Text Recognition Using the ocr Function

Recognizing text in images is useful in many computer vision applications such as image search, document analysis, and robot navigation. The ocr function provides an easy way to add text recognition functionality to a wide range of applications.

% Load an image.
I = imread("businessCard.png");

% Perform OCR.
results = ocr(I);

% Display one of the recognized words.
word = results.Words{2}
word = 
'MathWorks:'
% Location of the word in I
wordBBox = results.WordBoundingBoxes(2,:)
wordBBox = 1×4

   173    66   376    82

% Show the location of the word in the original image.
figure
Iname = insertObjectAnnotation(I,"rectangle",wordBBox,word);
imshow(Iname)

Figure contains an axes object. The axes object contains an object of type image.

Information Returned by the ocr Function

The ocr functions returns the recognized text, the recognition confidence, and the location of the text in the original image. You can use this information to identify the location of misclassified text within the image.

% Find 5 characters with least confidences.
[~ ,idx] = sort(results.CharacterConfidences);
lowConfidenceIdx = idx(1:5);

% Get the bounding box locations of the low confidence characters.
lowConfBBoxes = results.CharacterBoundingBoxes(lowConfidenceIdx,:);

% Get recognized characters.
lowConfChars = results.Text(lowConfidenceIdx)';

% Annotate image with low confidence characters.
str      = "Character = '" + lowConfChars + "'";
Ilowconf = insertObjectAnnotation(I,"rectangle",lowConfBBoxes,str);

figure
imshow(Ilowconf)
title("Character recognitions with the least confidence")

Figure contains an axes object. The axes object with title Character recognitions with the least confidence contains an object of type image.

Here, the logo in the business card is incorrectly classified as a text character. These kind of OCR errors can be identified using the confidence values before any further processing takes place.

Challenges Obtaining Accurate Results

ocr performs best when the text is located on a uniform background and is formatted like a document with dark text on a light background. When the text appears on a non-uniform dark background, additional pre-processing steps are required to get the best OCR results. In this part of the example, you will try to locate the digits on a keypad. Although, the keypad image may appear to be easy for OCR, it is actually quite challenging because the text is on a non-uniform dark background.

I = imread("keypad.jpg");
I = im2gray(I);

figure
imshow(I)

Figure contains an axes object. The axes object contains an object of type image.

% Run OCR on the image
results = ocr(I);

results.Text
ans = 
    ' 
     
     '

The empty results.Text indicates that no text is recognized. In the keypad image, the text is sparse and located on an irregular background. In this case, the heuristics used for document layout analysis within ocr might be failing to find blocks of text within the image, and, as a result, text recognition fails. In this situation, disabling the automatic layout analysis, using the LayoutAnalysis parameter, may help improve the results.

% Set LayoutAnalysis to "Block" to instruct ocr to assume the image
% contains just one block of text.
results = ocr(I,LayoutAnalysis="Block");

results.Text
ans =

  0x0 empty char array

What Went Wrong?

Adjusting the LayoutAnalysis parameter did not help. To understand why OCR continues to fail, you have to investigate the initial binarization step performed within ocr. You can use imbinarize to check this initial binarization step because both ocr and the default "global" method in imbinarize use Otsu's method for image binarization.

BW = imbinarize(I);

figure
imshowpair(I,BW,"montage")

Figure contains an axes object. The axes object contains an object of type image.

After thresholding, the binary image contains no text. This is why ocr failed to recognize any text in the original image. You can help improve the results by pre-processing the image to improve text segmentation. The next part of the example explores two useful pre-processing techniques.

Image Pre-processing Techniques To Improve Results

The poor text segmentation seen above is caused by the non-uniform background in the image, i.e. the light-gray keys surrounded by dark gray. You can use the following pre-processing technique to remove the background variations and improve the text segmentation. Additional details about this technique are given in the example entitled Correct Nonuniform Illumination and Analyze Foreground Objects.

% Remove keypad background.
Icorrected = imtophat(I,strel("disk",15));

BW1 = imbinarize(Icorrected);

figure 
imshowpair(I,BW1,"montage")

Figure contains an axes object. The axes object contains an object of type image.

After removing the background variation, the digits are now visible in the binary image. However, there are a few artifacts at the edge of the keys and the small text next to the digits that may continue to hinder accurate OCR of the whole image. Additional pre-processing using morphological reconstruction helps to remove these artifacts and produce a cleaner image for OCR.

% Perform morphological reconstruction and show binarized image.
marker = imerode(Icorrected,strel("line",10,0));
Iclean = imreconstruct(marker,Icorrected);

Ibinary = imbinarize(Iclean);

figure
imshowpair(Iclean,Ibinary,"montage")

Figure contains an axes object. The axes object contains an object of type image.

Now invert the clean binarized image to produce an image containing dark text on a light background for OCR.

BW2 = imcomplement(Ibinary);
figure
imshowpair(Ibinary,BW2,"montage")

Figure contains an axes object. The axes object contains an object of type image.

After these pre-processing steps, the digits are now well segmented from the background and ocr produces some results.

results = ocr(BW2,LayoutAnalysis="block");

results.Text
ans = 
    'ww] 2 x 3
     md ud wb
     on/ wB wm?
     -* . 0 #)
     
     '

The results look largely inaccurate except for few characters. This is due to difference in sizes of characters in the keypad which is causing the automatic layout analysis to fail.

One approach to improve the results is to leverage a priori knowledge about the text within the image. In this example, the text you are interested in contains only numeric digits and *# and ' characters. You can improve the results by constraining ocr to only select the best matches from the set "0123456789*#".

% Use the "CharacterSet" parameter to constrain OCR
results = ocr(BW2,CharacterSet="0123456789*#");

results.Text
ans = 
    '2 3
     4
     78
     
     *0
     
     '

The results are now better and contain only characters from the given character set. However, there are still few characters of interest in the image that are missing in the recognition results.

ROI-based Processing To Improve Results

To further improve the recognition results in this situation, identify specific regions in the image that ocr should process. In the keypad example image, these regions would be those that just contain the digits, *, and # characters. You may select the regions manually using imrect, or you can automate the process. For information about how to automatically detect text regions, see Automatically Detect and Recognize Text Using MSER and OCR and Automatically Detect and Recognize Text Using Pretrained CRAFT Network and OCR. In this example, you will use regionprops to find the characters of interest on the keypad.

% Use regionprops to find bounding boxes around text regions and measure their area.
cc = bwconncomp(Ibinary);
stats = regionprops(cc, ["BoundingBox","Area"]);

% Extract bounding boxes and area from the output statistics.
roi = vertcat(stats(:).BoundingBox);
area = vertcat(stats(:).Area);

% Show all the connected regions.
img = insertObjectAnnotation(I,"rectangle",roi,area,"LineWidth",3);
figure;
imshow(img);

Figure contains an axes object. The axes object contains an object of type image.

The smallest character of interest in this example is the digit "1". Use its area to filter any outliers.

% Define area constraint based on the area of smallest character of interest.
areaConstraint = area > 347;

% Keep regions that meet the area constraint.
roi = double(roi(areaConstraint,:));

% Show remaining bounding boxes after applying the area constraint.
img = insertShape(I,"rectangle",roi);
figure;
imshow(img);

Figure contains an axes object. The axes object contains an object of type image.

Further processing based on a region's aspect ratio is applied to identify regions that are likely to contain a single character. This helps to remove the smaller text characters that are jumbled together next to the digits. In general, the larger the text the easier it is for ocr to recognize.

% Compute the aspect ratio.
width  = roi(:,3);
height = roi(:,4);
aspectRatio = width ./ height;

% An aspect ratio between 0.25 and 1.25 is typical for individual characters
% as they are usually not very short and wide or very tall and skinny.
roi = roi( aspectRatio > 0.25 & aspectRatio < 1.25 ,:);

% Show regions after applying the area and aspect ratio constraints.
img = insertShape(I,"rectangle",roi);
figure;
imshow(img);

Figure contains an axes object. The axes object contains an object of type image.

The remaining regions can be passed into the ocr function, which accepts rectangular regions of interest as input. The size of the regions are increased slightly to include additional background pixels around the text characters. This helps to improve the internal heuristics used to determine the polarity of the text on the background (e.g. light text on a dark background vs. dark text on a light background).

numAdditionalPixels = 5;
roi(:,1:2) = roi(:,1:2) - numAdditionalPixels;
roi(:,3:4) = roi(:,3:4) + 2*numAdditionalPixels;

Disable the automatic layout analysis by setting LayoutAnalysis to "none". When ROI inputs are provided manually, setting LayoutAnalysis to "block",“word”, “textline”, “character” or “none” may help improve results. Empirical analysis is required to determine the optimal layout analysis value.

results = ocr(BW2,roi,CharacterSet="0123456789*#",LayoutAnalysis="none");

The recognized text can be displayed on the original image using insertObjectAnnotation. The deblank function is used to remove any trailing characters, such as white space or new lines.

text = deblank({results.Text});
img  = insertObjectAnnotation(I,"rectangle",roi,text);

figure; 
imshow(img)

Figure contains an axes object. The axes object contains an object of type image.

Although regionprops enabled you to find the digits in the keypad image, it may not work as well for images of natural scenes where there are many objects in addition to the text. For these types of images, the technique shown in the example Automatically Detect and Recognize Text Using Pretrained CRAFT Network and OCR may provide better text detection results.

Summary

This example showed how the ocr function can be used to recognize text in images, and how a seemingly easy image for OCR required extra pre-processing steps to produce good results.

References

[1] Ray Smith. Hybrid Page Layout Analysis via Tab-Stop Detection. Proceedings of the 10th international conference on document analysis and recognition. 2009.

Related Topics