Preprocessing the Images

The second step in training the model is preprocessing, which lets us feed the model only the data it actually needs. Preprocessing consists of steps such as cropping, noise reduction, grayscale conversion, and resizing, all of which help produce a better machine learning model. These steps simplify the input images, which makes it easier for the model to identify patterns in them.
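As a rough illustration of one of these steps, the sketch below converts an RGB array to grayscale and scales the pixel values to the range [0, 1] using NumPy only. The helper name `to_grayscale`, the synthetic 50x200 image, and the choice of ITU-R BT.601 luma weights are assumptions made for this example; the code later in this article relies on TensorFlow to decode and resize the real images.

```python
import numpy as np

# Luma weights from ITU-R BT.601 (an assumption for this sketch;
# other weightings exist).
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)


def to_grayscale(rgb):
    """Convert an (H, W, 3) uint8 image to an (H, W, 1) float32 image in [0, 1]."""
    scaled = rgb.astype(np.float32) / 255.0   # map 0..255 to 0..1
    gray = scaled @ LUMA_WEIGHTS              # weighted sum over the colour axis
    return gray[..., np.newaxis]              # keep an explicit channel axis


# Synthetic stand-in for a CAPTCHA image: 50 rows, 200 columns, 3 channels.
rng = np.random.default_rng(seed=0)
fake_rgb = rng.integers(0, 256, size=(50, 200, 3), dtype=np.uint8)

gray = to_grayscale(fake_rgb)
print(gray.shape, gray.dtype)
```

Collapsing three colour channels into one reduces the amount of input the model has to process, which is usually acceptable for CAPTCHAs where the characters are distinguishable by brightness alone.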

This step can be done with OpenCV or another library of your choice, in any supported programming language such as Python or R. OpenCV is a popular and reliable choice for preprocessing among both industry practitioners and hobbyists. First, though, we need to import the libraries we will use:

Python3

import os 
import numpy as np 
import matplotlib.pyplot as plt 
from pathlib import Path 
from collections import Counter 
import tensorflow as tf 
from tensorflow import keras 
from tensorflow.keras import layers 

The code is also available in notebook format. To access it, follow this link:

Python3

# Path to the Dataset 
direc = Path("ML/samples")  # forward slash avoids the invalid "\s" escape
  
dir_img = sorted(list(map(str, list(direc.glob("*.png"))))) 
img_labels = [img.split(os.path.sep)[-1]. 
              split(".png")[0] for img in dir_img] 
char_img = set(char for label in img_labels for char in label) 
char_img = sorted(list(char_img)) 
  
print("Number of dir_img found: ", len(dir_img)) 
print("Number of img_labels found: ", len(img_labels)) 
print("Number of unique char_img: ", len(char_img)) 
print("Characters present: ", char_img) 
  
# Batch Size of Training and Validation 
batch_size = 16
  
# Setting dimensions of the image 
img_width = 200
img_height = 50
  
# Setting downsampling factor 
downsample_factor = 4
  
# Setting the Maximum Length 
max_length = max([len(label) for label in img_labels]) 
  
# Char to integers 
char_to_num = layers.StringLookup( 
    vocabulary=list(char_img), mask_token=None
) 
  
# Integers to original characters 
num_to_char = layers.StringLookup( 
    vocabulary=char_to_num.get_vocabulary(), 
    mask_token=None, invert=True
) 
  
  
def data_split(dir_img, img_labels, 
               train_size=0.9, shuffle=True): 
    # Get the total size of the dataset 
    size = len(dir_img) 
    # Create an indices array and shuffle it if required 
    indices = np.arange(size) 
    if shuffle: 
        np.random.shuffle(indices) 
    # Calculate the size of training samples 
    train_samples = int(size * train_size) 
    # Split data into training and validation sets 
    x_train = dir_img[indices[:train_samples]] 
    y_train = img_labels[indices[:train_samples]] 
    x_valid = dir_img[indices[train_samples:]] 
    y_valid = img_labels[indices[train_samples:]] 
    return x_train, x_valid, y_train, y_valid 
  
  
# Split data into training and validation sets 
x_train, x_valid,\ 
    y_train, y_valid = data_split(np.array(dir_img), 
                                  np.array(img_labels)) 
  
  
def encode_sample(img_path, label): 
    # Read the image 
    img = tf.io.read_file(img_path) 
    # Converting the image to grayscale 
    img = tf.io.decode_png(img, channels=1) 
    img = tf.image.convert_image_dtype(img, tf.float32) 
    # Resizing to the desired size 
    img = tf.image.resize(img, [img_height, img_width]) 
    # Transposing the image 
    img = tf.transpose(img, perm=[1, 0, 2]) 
    # Mapping image label to numbers 
    label = char_to_num(tf.strings.unicode_split(label, 
                                                 input_encoding="UTF-8")) 
  
    return {"image": img, "label": label} 
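
To make the two lookup layers above more concrete, here is a plain-Python sketch of the mapping they perform. It is not the Keras implementation: the dictionaries, the helper names `encode_label` and `decode_label`, and the sample character set are assumptions for illustration. The sketch mirrors the convention that, with `mask_token=None` and the default single out-of-vocabulary slot, index 0 is reserved for unknown characters and the vocabulary starts at index 1.

```python
# Hypothetical character set; in the article it is derived from the file names.
chars = sorted(set("2b4f8n"))

OOV_TOKEN = "[UNK]"  # placeholder returned for indices outside the vocabulary

# Index 0 is reserved for unknown characters, so real characters start at 1.
char_to_index = {ch: i for i, ch in enumerate(chars, start=1)}
index_to_char = {i: ch for ch, i in char_to_index.items()}


def encode_label(label):
    """Map each character of a label to its integer index (0 if unknown)."""
    return [char_to_index.get(ch, 0) for ch in label]


def decode_label(indices):
    """Map integer indices back to characters, using OOV_TOKEN for unknowns."""
    return "".join(index_to_char.get(i, OOV_TOKEN) for i in indices)


encoded = encode_label("b2n4")
print(encoded)
print(decode_label(encoded))
```

Encoding followed by decoding returns the original label as long as every character is in the vocabulary, which is the property the model's predictions rely on when they are turned back into text.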

How to Break a CAPTCHA System with Machine Learning?

CAPTCHA, short for Completely Automated Public Turing test to tell Computers and Humans Apart, is a technology that helps distinguish humans from bots and protects your site from malicious activity. But this technology has begun to show its age. CAPTCHA was supposed to be a robust system, but artificial intelligence is rendering it almost useless. To break a CAPTCHA, we need a machine learning model, which we first have to train. Once it is trained, all that is required is to feed the model a CAPTCHA, which it will attempt to solve for you.

In this article, we will explore how one can break a CAPTCHA system with the help of machine learning and discuss the complete process in detail. We will also cover the limitations of this approach and the ethical and moral issues to consider when attempting it. It should be remembered that our intention in breaking CAPTCHAs is to educate ourselves and to highlight the system's inability to filter out non-humans. CAPTCHAs still protect sites from malicious attacks and help safeguard the internet. Using bots to break CAPTCHAs on websites without permission is therefore unethical at best and, depending on your location, may also be illegal.
