Text Localization in OCR

Now we will learn how to locate text in an image and draw a bounding box around it.

To get the bounding-box data, we can run the ocr_data() function on the image:

bound_box = ocr_data(img)

Step 1: Load the libraries

R




install.packages(c("tesseract", "magick", "ggplot2"))
 
library(tesseract)
library(magick)
library(grid)      # for rasterGrob()
library(ggplot2)


Step 2: Load the image and generate the bounding-box data. The ocr_data() function takes an image and returns a data frame with one row per recognized word: the word itself, Tesseract's confidence score, and a bbox column holding the rectangle's corner coordinates as a comma-separated string "x1,y1,x2,y2", which we extract in a later step. The result is stored in the bound_box variable.

R




# load the image from the URL
img = image_read('https://media.w3wiki.org/wp-content/uploads/20190328185307/gfg28.png')
 
# getting word and bounding box
bound_box = ocr_data(img)


Step 3: Convert the coordinates from character to numeric by splitting each bbox string on commas and saving the four values as xmin, ymin, xmax, and ymax respectively.

R




# convert the result into a data frame and split the
# bbox string into its four numeric corner coordinates
bound_box = as.data.frame(bound_box)
bound_box$bbox <- strsplit(bound_box$bbox, ",")
bound_box$xmin <- sapply(bound_box$bbox, function(x) as.numeric(x[1]))
bound_box$ymin <- sapply(bound_box$bbox, function(x) as.numeric(x[2]))
bound_box$xmax <- sapply(bound_box$bbox, function(x) as.numeric(x[3]))
bound_box$ymax <- sapply(bound_box$bbox, function(x) as.numeric(x[4]))


Output:

      word confidence            bbox
1   w3wiki   92.04797     5,15,661,96
2        A   96.76034   48,124,71,150
3 computer   96.31223  82,126,237,158
4  science   96.52452 248,123,362,150
5   portal   96.56268 376,122,466,158
6      for   96.14149 480,122,524,150
7    geeks   96.14149 536,122,626,158
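As an aside, the comma-separated bbox strings can also be parsed in a single step with base R. A minimal sketch (using stand-in rows shaped like the ocr_data() output, not the actual image above) that builds the same four numeric columns:

```r
# stand-in rows shaped like the ocr_data() result
bound_box <- data.frame(word = c("A", "computer"),
                        bbox = c("48,124,71,150", "82,126,237,158"),
                        stringsAsFactors = FALSE)

# split every "x1,y1,x2,y2" string and bind the pieces into a numeric matrix
coords <- do.call(rbind, lapply(strsplit(bound_box$bbox, ","), as.numeric))
colnames(coords) <- c("xmin", "ymin", "xmax", "ymax")

# attach the numeric columns to the original data frame
bound_box <- cbind(bound_box, coords)
```

This avoids the four separate sapply() calls at the cost of an intermediate matrix.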

Step 4: Plot the image with the bounding boxes

R




# plot the image with bounding boxes and word labels
ggplot() +
  annotation_custom(rasterGrob(img)) +
  geom_rect(data = bound_box,
            aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
            color = "red", fill = NA) +
  geom_text(data = bound_box,
            aes(x = (xmin + xmax) / 2, y = ymax + 10, label = word),
            color = "red", size = 3) +
  theme_void() +
  scale_y_reverse()
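As an alternative to ggplot2, the magick package can draw directly on the image: image_draw() opens the image as a graphics device in pixel coordinates (origin at the top left, so the y values from ocr_data() can be used as-is), base rect() and text() draw on it, and dev.off() closes the device, leaving the annotated image. A sketch using a blank stand-in image and one hypothetical box; in the tutorial, img and the bound_box data frame from the steps above would be used instead:

```r
library(magick)

# stand-in image and bounding-box row (replace with the article's img / bound_box)
img <- image_blank(200, 100, "white")
bound_box <- data.frame(word = "demo", xmin = 10, ymin = 20, xmax = 90, ymax = 60)

# open the image as a drawing canvas
canvas <- image_draw(img)

# draw a red rectangle and a label for every detected word
rect(bound_box$xmin, bound_box$ymin, bound_box$xmax, bound_box$ymax,
     border = "red", lwd = 2)
text((bound_box$xmin + bound_box$xmax) / 2, bound_box$ymax + 10,
     labels = bound_box$word, col = "red", cex = 0.8)

# close the device; canvas now holds the annotated image
dev.off()
```

This keeps everything in image pixel coordinates, so no scale_y_reverse() trick is needed.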


Output:

The original image is displayed with a red rectangle around each detected word, labelled with the recognized text.
