Data Preparation for Training

In this section, we will resize the given images and convert them into NumPy arrays of pixel values, because training a deep neural network on large images is expensive in both computation and time.

We will use the OpenCV and NumPy libraries of Python for this. Once all the images are in the desired format, we will split them into training and validation data so that we can evaluate the performance of our model.

Python3




IMG_SIZE = 256
SPLIT = 0.2
EPOCHS = 10
BATCH_SIZE = 64


These are some of the hyperparameters for the whole notebook, which we can tweak from here.
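To get a feel for how these values interact, the rough sketch below estimates how much memory the image array needs at different image sizes and how many images end up in each split. The total of 15,000 images is taken from the shapes printed later in this section; the helper name `dataset_mebibytes` is hypothetical and only serves the illustration.

```python
import numpy as np

IMG_SIZE = 256
SPLIT = 0.2
N_IMAGES = 15000  # assumption: total image count, matching the shapes printed later


def dataset_mebibytes(n_images, img_size, channels=3, dtype=np.uint8):
    """Approximate size in MiB of an (n_images, img_size, img_size, channels) array."""
    bytes_per_image = img_size * img_size * channels * np.dtype(dtype).itemsize
    return n_images * bytes_per_image / 1024 ** 2


# Memory grows with the square of the image side length
for size in (128, IMG_SIZE, 512):
    print(size, dataset_mebibytes(N_IMAGES, size), "MiB")

# Number of validation images implied by SPLIT (rounded up)
n_val = int(np.ceil(N_IMAGES * SPLIT))
n_train = N_IMAGES - n_val
print(n_train, n_val)
```

Doubling `IMG_SIZE` quadruples the memory needed for the pixel data alone, which is one reason to resize before training.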

Python3




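X = []
Y = []

# The index of each class in `classes` serves as its integer label
for i, cat in enumerate(classes):
  images = glob(f'{path}/{cat}/*.jpeg')

  for image in images:
    img = cv2.imread(image)  # returns None if the file cannot be read
    if img is None:
      continue

    # Resize so that every image has the shape (IMG_SIZE, IMG_SIZE, 3)
    X.append(cv2.resize(img, (IMG_SIZE, IMG_SIZE)))
    Y.append(i)

# Stack the images into one array and one-hot encode the integer labels
X = np.asarray(X)
one_hot_encoded_Y = pd.get_dummies(Y).values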


One-hot encoding lets us train a model that predicts a soft probability for each class, where ideally the highest probability goes to the class to which the image really belongs.
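As a small illustration of what the encoding looks like, the sketch below one-hot encodes a handful of made-up integer labels with pd.get_dummies and compares one target row with a hypothetical model output. The label list and the probability vector are invented for the example and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical integer labels for five images (three classes: 0, 1, 2)
Y = [0, 2, 1, 0, 2]

# One row per image, one column per class; exactly one 1 in each row
one_hot = pd.get_dummies(Y).values.astype('float32')
print(one_hot)

# The integer label can be recovered as the position of the 1
recovered = one_hot.argmax(axis=1)

# Hypothetical soft probabilities predicted for the second image (true class 2)
probs = np.array([0.1, 0.2, 0.7])
target = one_hot[1]

# Categorical cross-entropy only looks at the probability of the true class
loss = -np.sum(target * np.log(probs))
print(loss)
```

Note that pd.get_dummies only creates columns for labels that actually occur in `Y`, so every class must be present in the data for the columns to line up with the class indices.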

Python3




X_train, X_val, Y_train, Y_val = train_test_split(X, one_hot_encoded_Y,
                                                  test_size = SPLIT,
                                                  random_state = 2022)
print(X_train.shape, X_val.shape)


Output:

(12000, 256, 256, 3) (3000, 256, 256, 3)

This step also shuffles the data automatically, because the train_test_split function shuffles the samples by default before splitting them in the given ratio.
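The shuffling can be checked on a toy array. The sketch below uses invented data (30 samples, three balanced classes) rather than the images, and also shows the optional `stratify` argument of train_test_split, which the code above does not use; it is included only as a possible variation that keeps the class ratio identical in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 30 samples whose value equals their original position, 10 per class
X_demo = np.arange(30).reshape(30, 1)
y_demo = np.repeat([0, 1, 2], 10)

# Default behaviour: samples are shuffled before the 80/20 split
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=2022)
print(X_va.ravel())       # original positions of the validation samples
print(np.bincount(y_va, minlength=3))  # class counts are not guaranteed to be equal

# Optional variation: stratify on integer labels to preserve class balance
X_tr_s, X_va_s, y_tr_s, y_va_s = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=2022, stratify=y_demo)
print(np.bincount(y_va_s, minlength=3))
```

Fixing `random_state` makes the shuffle reproducible, so the same images land in the validation set on every run.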

Lung Cancer Detection using Convolutional Neural Network (CNN)

Computer Vision is one of the applications of deep neural networks that enables us to automate tasks that earlier required years of expertise, and one such use is predicting the presence of cancerous cells.

In this article, we will learn how to build a classifier using a simple Convolutional Neural Network that can distinguish normal lung tissue from cancerous tissue. This project was developed using Google Colab, and the dataset was taken from Kaggle; a link to it is provided as well.

The process which will be followed to build this classifier:

Flow Chart for the Project


Modules Used

Python libraries make it very easy for us to handle the data and perform typical and complex tasks with a single line of code....

Importing Dataset

...

Data Visualization

The dataset that we will use here has been taken from https://www.kaggle.com/datasets/andrewmvd/lung-and-colon-cancer-histopathological-images. This dataset includes 5000 images for each of three classes of lung conditions:...

Data Preparation for Training

...

Model Development

In this section, we will try to visualize some of the images which have been provided to us to build the classifier for each class....

Model Evaluation

...

Conclusion:

...
