Text Localization in OCR

Now we will learn how to locate text in an image and draw a bounding box around it.

To get the bounding-box data, we can run the ocr_data() function on the image:

bound_box = ocr_data(img)

Step 1: Load the libraries

R




install.packages(c("tesseract", "magick", "ggplot2"))
 
library(tesseract)
library(magick)
library(grid)      # for rasterGrob()
library(ggplot2)


Step 2: Load the image and generate the bounding-box data. The ocr_data() function takes an image and returns a data frame with one row per recognized word: the word itself, Tesseract's confidence score, and a bbox column holding the rectangle's corner coordinates as a comma-separated string "x1,y1,x2,y2", which we extract in a later step. The result is stored in the bound_box variable.

R




# load the image from the URL
img = image_read('https://media.w3wiki.org/wp-content/uploads/20190328185307/gfg28.png')
 
# getting word and bounding box
bound_box = ocr_data(img)


Step 3: Convert the coordinates from character to numeric by splitting each bbox string on commas and saving the four values as xmin, ymin, xmax, and ymax respectively.

R




# convert the result into a data frame and split the
# bbox string into its four numeric corner coordinates
bound_box = as.data.frame(bound_box)
bound_box$bbox <- strsplit(bound_box$bbox, ",")
bound_box$xmin <- sapply(bound_box$bbox, function(x) as.numeric(x[1]))
bound_box$ymin <- sapply(bound_box$bbox, function(x) as.numeric(x[2]))
bound_box$xmax <- sapply(bound_box$bbox, function(x) as.numeric(x[3]))
bound_box$ymax <- sapply(bound_box$bbox, function(x) as.numeric(x[4]))


Output:

      word confidence            bbox
1   w3wiki   92.04797     5,15,661,96
2        A   96.76034   48,124,71,150
3 computer   96.31223  82,126,237,158
4  science   96.52452 248,123,362,150
5   portal   96.56268 376,122,466,158
6      for   96.14149 480,122,524,150
7    geeks   96.14149 536,122,626,158
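As an aside, the comma-separated bbox strings can also be parsed in a single step with base R. A minimal sketch (using stand-in rows shaped like the ocr_data() output, not the actual image above) that builds the same four numeric columns:

```r
# stand-in rows shaped like the ocr_data() result
bound_box <- data.frame(word = c("A", "computer"),
                        bbox = c("48,124,71,150", "82,126,237,158"),
                        stringsAsFactors = FALSE)

# split every "x1,y1,x2,y2" string and bind the pieces into a numeric matrix
coords <- do.call(rbind, lapply(strsplit(bound_box$bbox, ","), as.numeric))
colnames(coords) <- c("xmin", "ymin", "xmax", "ymax")

# attach the numeric columns to the original data frame
bound_box <- cbind(bound_box, coords)
```

This avoids the four separate sapply() calls at the cost of an intermediate matrix.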

Step 4: Plot the image with the bounding boxes

R




# plot the image with bounding boxes and word labels
ggplot() +
  annotation_custom(rasterGrob(img)) +
  geom_rect(data = bound_box,
            aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
            color = "red", fill = NA) +
  geom_text(data = bound_box,
            aes(x = (xmin + xmax) / 2, y = ymax + 10, label = word),
            color = "red", size = 3) +
  theme_void() +
  scale_y_reverse()
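As an alternative to ggplot2, the magick package can draw directly on the image: image_draw() opens the image as a graphics device in pixel coordinates (origin at the top left, so the y values from ocr_data() can be used as-is), base rect() and text() draw on it, and dev.off() closes the device, leaving the annotated image. A sketch using a blank stand-in image and one hypothetical box; in the tutorial, img and the bound_box data frame from the steps above would be used instead:

```r
library(magick)

# stand-in image and bounding-box row (replace with the article's img / bound_box)
img <- image_blank(200, 100, "white")
bound_box <- data.frame(word = "demo", xmin = 10, ymin = 20, xmax = 90, ymax = 60)

# open the image as a drawing canvas
canvas <- image_draw(img)

# draw a red rectangle and a label for every detected word
rect(bound_box$xmin, bound_box$ymin, bound_box$xmax, bound_box$ymax,
     border = "red", lwd = 2)
text((bound_box$xmin + bound_box$xmax) / 2, bound_box$ymax + 10,
     labels = bound_box$word, col = "red", cex = 0.8)

# close the device; canvas now holds the annotated image
dev.off()
```

This keeps everything in image pixel coordinates, so no scale_y_reverse() trick is needed.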


Output:

The original image is displayed with a red rectangle around each detected word, labelled with the recognized text.
