Text Localization in OCR
Next, we locate each recognized word in the image and draw a bounding box around it.
To get the bounding boxes, we call the ocr_data() function from the tesseract package on the image.
bound_box = ocr_data(img)
Step 1: Load the libraries
R
# grid ships with base R, so it is not installed from CRAN
install.packages(c("png", "tesseract", "magick", "boundingbox", "magrittr", "ggplot2"))

library(png)
library(tesseract)
library(magick)
library(boundingbox)
library(grid)
library(magrittr)
library(ggplot2)
Step 2: Load the image and generate the bounding box data. The ocr_data() function takes an image and returns one row per recognized word, with the word, its confidence score, and its bounding box. The box is stored in the bbox column as a comma-separated string of the form x1,y1,x2,y2, which we split into separate coordinates in the next step. The result is stored in the bound_box variable.
R
# Load the PNG image
img = image_read('https://media.w3wiki.org/wp-content/uploads/20190328185307/gfg28.png')

# Get the words and their bounding boxes
bound_box = ocr_data(img)
Step 3: Convert the coordinates from character to numeric. Split each bbox string at the commas and save the four values as xmin, ymin, xmax and ymax, respectively.
R
# Convert the OCR result to a data frame
bound_box = as.data.frame(bound_box)

# Split each "x1,y1,x2,y2" string into its four parts
bound_box$bbox <- strsplit(bound_box$bbox, ",")

# Store each part as a numeric column
bound_box$xmin <- sapply(bound_box$bbox, function(x) as.numeric(x[1]))
bound_box$ymin <- sapply(bound_box$bbox, function(x) as.numeric(x[2]))
bound_box$xmax <- sapply(bound_box$bbox, function(x) as.numeric(x[3]))
bound_box$ymax <- sapply(bound_box$bbox, function(x) as.numeric(x[4]))
Output (the word, confidence and bbox columns, shown as returned by ocr_data() before the split):
      word confidence            bbox
1   w3wiki   92.04797     5,15,661,96
2        A   96.76034   48,124,71,150
3 computer   96.31223  82,126,237,158
4  science   96.52452 248,123,362,150
5   portal   96.56268 376,122,466,158
6      for   96.14149 480,122,524,150
7    geeks   96.14149 536,122,626,158
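The split in Step 3 can also be done in a single pass. The following is a minimal base-R sketch, assuming the bbox column holds strings of the form x1,y1,x2,y2 as in the output above. The data frame is rebuilt by hand from three of those rows so that the example runs without the image or the tesseract package, and the width and height columns are illustrative additions that are not part of the original steps.

```r
# Sample rows copied from the output above (no OCR call needed).
bound_box <- data.frame(
  word = c("A", "computer", "science"),
  confidence = c(96.76034, 96.31223, 96.52452),
  bbox = c("48,124,71,150", "82,126,237,158", "248,123,362,150"),
  stringsAsFactors = FALSE
)

# Split every "x1,y1,x2,y2" string and stack the pieces into a 4-column matrix.
coords <- do.call(rbind, lapply(strsplit(bound_box$bbox, ","), as.numeric))
colnames(coords) <- c("xmin", "ymin", "xmax", "ymax")

# Attach the numeric columns in place of the original string column.
bound_box <- cbind(bound_box[, c("word", "confidence")], coords)

# Illustrative extras (assumption, not in the original steps): box size in pixels.
bound_box$width  <- bound_box$xmax - bound_box$xmin
bound_box$height <- bound_box$ymax - bound_box$ymin

print(bound_box)
```

Building one matrix and binding it to the data frame avoids repeating a near-identical sapply() call for each coordinate.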
Step 4: Plot the image with the bounding boxes and word labels drawn over it.
R
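# Plot image with bounding boxes
ggplot() +
  # Draw the image as the plot background
  annotation_custom(rasterGrob(img)) +
  # One red, unfilled rectangle per word
  geom_rect(data = bound_box,
            aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
            color = "red", fill = NA) +
  # Label each box with its word, placed just past the box's ymax edge
  geom_text(data = bound_box,
            aes(x = (xmin + xmax) / 2, y = ymax + 10, label = word),
            color = "red", size = 3) +
  theme_void() +
  # Box coordinates count y downward from the top of the image, so reverse the y axis
  scale_y_reverse()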
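Output: the plot shows the image with a red box drawn for each recognized word and the word printed in red beneath its box.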
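The scale_y_reverse() call in Step 4 is needed because the box coordinates count y downward from the top edge of the image (in the bbox values above, the second line of text has larger y values than the first), while a ggplot2 y axis grows upward by default. An alternative is to flip the coordinates before plotting. The following is a minimal base-R sketch, assuming a hypothetical image height of 170 pixels; in practice the height would be read from the image itself, for example with magick's image_info(). The helper flip_y is an illustrative name, not part of any package.

```r
# Hypothetical image height in pixels (assumption for this sketch).
img_height <- 170

# Illustrative helper: convert a top-origin y value to a bottom-origin one.
flip_y <- function(y, height) height - y

# One box from the output above: "computer" at 82,126,237,158.
box <- c(xmin = 82, ymin = 126, xmax = 237, ymax = 158)

# After flipping, the old ymax becomes the lower edge and the old ymin the upper edge.
flipped <- c(
  xmin = box[["xmin"]],
  ymin = flip_y(box[["ymax"]], img_height),
  xmax = box[["xmax"]],
  ymax = flip_y(box[["ymin"]], img_height)
)

print(flipped)
```

With coordinates flipped this way, the boxes can be drawn on a regular, non-reversed y axis, and the box height stays the same as before the flip.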