Convolutional Neural Networks (CNN) for Sentence Classification

Sentence classification is the task of automatically assigning categories to sentences based on their content. This has broad applications like identifying spam emails, classifying customer feedback, or determining the topic of a news article. Convolutional Neural Networks (CNNs) have proven remarkably successful for this task. In this article, we will see how we can use convolutional neural networks for sentence classification.

Table of Contents

  • Why use CNN for sentence classification?
  • Implementation of Convolutional Neural Networks for Sentence Classification
    • Step 1: Importing Necessary Libraries
    • Step 2: Generate Sample Data
    • Step 3: Data Preprocessing
    • Step 4: Defining the Model
    • Step 5: Compiling and Training the Model
    • Step 6: Prediction

Why use CNN for sentence classification?

CNNs are best known for image classification, where their ability to detect local patterns is central. The same property makes them effective on text. Here is why CNNs are particularly suited to classifying sentences:

  1. Detection of Local Patterns: Unlike traditional models that may analyze text linearly or treat words individually, CNNs excel at capturing local contextual relationships within the text. By applying filters over the word embeddings, CNNs can detect phrases and combinations of words that carry significant meaning, making them good at understanding the syntactic and semantic nuances of language.
  2. Hierarchical Feature Learning: When several convolutional layers are stacked, each can recognize increasingly complex patterns. In sentence classification, lower layers might pick up short phrases, while deeper layers can combine them into longer constructs such as idiomatic expressions or technical jargon. The model can therefore take both the details and the bigger picture into account.
  3. Robustness to Sentence Length: CNNs are less sensitive to the length of the input sentence than some other models. Pooling operations such as global max pooling keep only the strongest response of each filter, so the resulting feature vector has the same size whatever the sentence length, and the most salient features are retained wherever they occur in the sentence.
  4. Efficiency and Speed: A convolution applies the same filter at every position, and those positions can be computed in parallel. This makes CNNs computationally efficient and suitable for applications that need rapid processing of large volumes of text, such as real-time content moderation or interactive language-based applications.
  5. Reduced Need for Manual Feature Engineering: CNNs have the capability to automatically learn significant features from the training data without extensive intervention or manual feature design. This autonomous feature extraction reduces the potential for human bias and error, while also simplifying the model development process.
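
The first and third points can be made concrete with a small NumPy sketch. It is an illustration under assumptions, not the Keras implementation: the function name `conv1d_global_max`, the random "embeddings", and the filter sizes are all invented for this example. It slides each filter over windows of consecutive word vectors, applies ReLU, and keeps the maximum response per filter, so sentences of different lengths yield feature vectors of the same size.

```python
import numpy as np

def conv1d_global_max(embeddings, filters, bias):
    """Slide each filter over the sentence and keep its strongest response.

    embeddings: (seq_len, embed_dim) matrix, one row per word
    filters:    (num_filters, kernel_size, embed_dim)
    bias:       (num_filters,)
    returns:    (num_filters,) feature vector, independent of seq_len
    """
    seq_len, _ = embeddings.shape
    num_filters, kernel_size, _ = filters.shape
    num_windows = seq_len - kernel_size + 1  # "valid" convolution, stride 1
    responses = np.empty((num_windows, num_filters))
    for start in range(num_windows):
        window = embeddings[start:start + kernel_size]  # (kernel_size, embed_dim)
        # Multiply every filter element-wise with the window and sum the result
        responses[start] = np.tensordot(filters, window, axes=([1, 2], [0, 1])) + bias
    responses = np.maximum(responses, 0.0)  # ReLU
    return responses.max(axis=0)            # global max pooling over positions

rng = np.random.default_rng(0)
filters = rng.normal(size=(4, 3, 8))  # 4 filters, each spanning 3 words of 8-dim vectors
bias = np.zeros(4)

short_sentence = rng.normal(size=(5, 8))   # stand-in for a 5-word sentence
long_sentence = rng.normal(size=(12, 8))   # stand-in for a 12-word sentence

print(conv1d_global_max(short_sentence, filters, bias).shape)  # (4,)
print(conv1d_global_max(long_sentence, filters, bias).shape)   # (4,)
```

Both calls return four features, one per filter, even though the inputs differ in length. In a trained network the filter weights are learned rather than random.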

Implementation of Convolutional Neural Networks for Sentence Classification

Here, we will implement a CNN model for Sentence Classification:

Step 1: Importing Necessary Libraries

First, we import the libraries required for our model.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

Step 2: Generate Sample Data

We now define a small sample dataset on which our model will be trained.

# Sample data
sentences = [
    "I love reading books",
    "The weather today is great",
    "TensorFlow makes machine learning easy",
    "I enjoy running in the park"
]

# Binary labels for simplicity: 0 for neutral, 1 for positive
labels = [1, 1, 1, 0]

Step 3: Data Preprocessing

We use Keras to prepare the text for training in three steps. Tokenization converts each sentence into a sequence of integers, one per word. Padding brings every sequence to the same length (10 here). Finally, the labels are converted to a NumPy array of floats, the format the model expects.

tokenizer = Tokenizer(num_words=100)
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)

# Pad sequences to ensure uniform input size
padded_sequences = pad_sequences(sequences, maxlen=10)

# Convert labels to a numpy array
labels = np.array(labels, dtype=np.float32)
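
To see what tokenization and padding produce, here is a simplified pure-Python sketch. It is not the Keras code: the helpers `build_word_index`, `to_sequence`, and `pad_left` are hypothetical names that imitate what `Tokenizer` and `pad_sequences` do with their default settings, as far as this example needs (the real `Tokenizer` also strips punctuation and honors `num_words`). Index 0 is reserved for padding, so word indices start at 1.

```python
from collections import Counter

sentences = [
    "I love reading books",
    "The weather today is great",
    "TensorFlow makes machine learning easy",
    "I enjoy running in the park",
]

def build_word_index(texts):
    """Give index 1 to the most frequent word, 2 to the next, and so on."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # sorted() is stable, so words with equal counts keep first-seen order
    ranked = sorted(counts.items(), key=lambda item: -item[1])
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}

def to_sequence(text, word_index):
    """Map words to integers; words missing from the vocabulary are skipped."""
    return [word_index[w] for w in text.lower().split() if w in word_index]

def pad_left(sequence, maxlen):
    """Left-pad with zeros; if too long, keep only the last maxlen items."""
    trimmed = sequence[-maxlen:]
    return [0] * (maxlen - len(trimmed)) + trimmed

word_index = build_word_index(sentences)
first = to_sequence(sentences[0], word_index)
print(first)                # [1, 3, 4, 5]
print(pad_left(first, 10))  # [0, 0, 0, 0, 0, 0, 1, 3, 4, 5]
```

The words "i" and "the" each occur twice in the corpus, so they receive the lowest indices (1 and 2). Every other word is numbered in order of first appearance.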

Step 4: Defining the Model

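We define a CNN for binary sentence classification using Keras, a high-level neural networks API that runs on top of TensorFlow. The model embeds each word index as a 16-dimensional vector, applies 32 convolution filters that each span 5 consecutive words, keeps the maximum response of each filter, and passes the result through two dense layers ending in a single sigmoid output.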

# Define the model
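model = Sequential([
    # Map each word index (0-99) to a 16-dimensional vector.
    # Note: input_length is deprecated in Keras 3 and can be omitted there.
    Embedding(input_dim=100, output_dim=16, input_length=10),
    # 32 filters, each spanning 5 consecutive words
    Conv1D(32, 5, activation='relu'),
    # Keep the strongest response of each filter across the sentence
    GlobalMaxPooling1D(),
    Dense(10, activation='relu'),
    Dense(1, activation='sigmoid')  # Binary classification
])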
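
It can help to trace the tensor shapes and parameter counts through this model by hand. The sketch below uses only the hyperparameters from the definition above plus the standard formulas for each layer type, and it assumes the Keras defaults for `Conv1D` ("valid" padding, stride 1). The variable names are chosen for this illustration.

```python
# Hyperparameters copied from the model definition
vocab_size, embed_dim, seq_len = 100, 16, 10
num_filters, kernel_size = 32, 5
hidden_units = 10

# With "valid" padding and stride 1, a window of 5 words fits
# seq_len - kernel_size + 1 times along a 10-word sequence.
conv_steps = seq_len - kernel_size + 1

# Output shape of each layer for a single sentence (batch dimension omitted)
shapes = {
    "Embedding": (seq_len, embed_dim),
    "Conv1D": (conv_steps, num_filters),
    "GlobalMaxPooling1D": (num_filters,),
    "Dense(10)": (hidden_units,),
    "Dense(1)": (1,),
}

# Trainable parameters: weights plus one bias per output unit or filter
params = {
    "Embedding": vocab_size * embed_dim,
    "Conv1D": kernel_size * embed_dim * num_filters + num_filters,
    "GlobalMaxPooling1D": 0,
    "Dense(10)": num_filters * hidden_units + hidden_units,
    "Dense(1)": hidden_units * 1 + 1,
}

for name in shapes:
    print(f"{name:<20} output {shapes[name]!s:<10} params {params[name]}")
print("total params:", sum(params.values()))  # 4533
```

These figures can be compared against the table printed by `model.summary()` once the model has been built.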

Step 5: Compiling and Training the Model

We now compile the model with the Adam optimizer and binary cross-entropy loss, tracking accuracy as a metric, and then train it on the padded sequences for 10 epochs.

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(padded_sequences, labels, epochs=10)
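
For intuition about the `binary_crossentropy` loss that training minimizes, here is a minimal NumPy sketch of the formula. The function below is a hand-written illustration rather than the Keras implementation, the clipping constant `eps` is an assumption, and the two prediction lists are hypothetical values chosen to show how the loss responds.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the batch."""
    y_true = np.asarray(y_true, dtype=np.float64)
    # Clip predictions so that log(0) never occurs
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return losses.mean()

labels = [1, 1, 1, 0]
undecided = [0.5, 0.5, 0.5, 0.5]  # hypothetical: model has learned nothing yet
confident = [0.9, 0.9, 0.9, 0.1]  # hypothetical: model leans the right way

print(binary_crossentropy(labels, undecided))  # about 0.693 (= ln 2)
print(binary_crossentropy(labels, confident))  # about 0.105 (= -ln 0.9)
```

The loss falls as the predicted probabilities move toward the true labels, which is the quantity the optimizer drives down during `model.fit`.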

Step 6: Prediction

Here we use the trained model to predict classes for new sentences. New text must pass through the same tokenizer and padding as the training data.

# Example of predicting new data
test_sentences = ["I dislike running", "Reading is enjoyable"]
test_sequences = tokenizer.texts_to_sequences(test_sentences)
test_padded = pad_sequences(test_sequences, maxlen=10)
predictions = model.predict(test_padded)
print(predictions) # Outputs a probability of belonging to class 1
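
One caveat with such a small vocabulary: a `Tokenizer` created without an `oov_token` skips words it did not see during `fit_on_texts`. The plain-Python sketch below shows which words of the test sentences survive. The helper `split_known_unknown` is a hypothetical name, and lower-casing with whitespace splitting only approximates the tokenizer's behavior.

```python
train_sentences = [
    "I love reading books",
    "The weather today is great",
    "TensorFlow makes machine learning easy",
    "I enjoy running in the park",
]
test_sentences = ["I dislike running", "Reading is enjoyable"]

# Every distinct lower-cased word seen during training
vocabulary = {word for s in train_sentences for word in s.lower().split()}

def split_known_unknown(sentence, vocabulary):
    """Return (words in the vocabulary, words that would be dropped)."""
    words = sentence.lower().split()
    known = [w for w in words if w in vocabulary]
    unknown = [w for w in words if w not in vocabulary]
    return known, unknown

for sentence in test_sentences:
    known, unknown = split_known_unknown(sentence, vocabulary)
    print(f"{sentence!r}: kept {known}, dropped {unknown}")
# 'I dislike running': kept ['i', 'running'], dropped ['dislike']
# 'Reading is enjoyable': kept ['reading', 'is'], dropped ['enjoyable']
```

The words that carry the sentiment, "dislike" and "enjoyable", never reach the model, so its predictions for these sentences rest on the remaining words alone.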

Putting all the steps together, the complete Python program is:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

# Sample data
sentences = [
    "I love reading books",
    "The weather today is great",
    "TensorFlow makes machine learning easy",
    "I enjoy running in the park"
]

# Binary labels for simplicity: 0 for neutral, 1 for positive
labels = [1, 1, 1, 0]

# Tokenize the data
tokenizer = Tokenizer(num_words=100)
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)

# Pad sequences to ensure uniform input size
padded_sequences = pad_sequences(sequences, maxlen=10)

# Convert labels to a numpy array
labels = np.array(labels, dtype=np.float32)

# Define the model
model = Sequential([
    Embedding(input_dim=100, output_dim=16, input_length=10),
    Conv1D(32, 5, activation='relu'),
    GlobalMaxPooling1D(),
    Dense(10, activation='relu'),
    Dense(1, activation='sigmoid')  # Binary classification
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(padded_sequences, labels, epochs=10)

# Example of predicting new data
test_sentences = ["I dislike running", "Reading is enjoyable"]
test_sequences = tokenizer.texts_to_sequences(test_sentences)
test_padded = pad_sequences(test_sequences, maxlen=10)
predictions = model.predict(test_padded)
print(predictions)  # Outputs a probability of belonging to class 1

Output:

[[0.53922826]
 [0.54247886]]

The two values, 0.539 and 0.542, are the predicted probabilities that each test sentence belongs to class 1. A value close to 1 means the model is confident the sentence belongs to class 1, a value close to 0 means it is confident the sentence belongs to class 0, and a value near 0.5, as here, means the model is essentially undecided. The exact numbers vary from run to run because the weights are initialized randomly.
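
The probabilities can be turned into class labels by thresholding. The sketch below reuses the two values from the sample output and assumes a cut-off of 0.5, a common default that is not prescribed by the article. The `certainty` score is an ad hoc measure introduced only for this illustration.

```python
import numpy as np

# Probabilities copied from the sample output; a fresh run will give other values
predictions = np.array([[0.53922826], [0.54247886]])

threshold = 0.5  # assumed cut-off between class 0 and class 1
predicted_classes = (predictions.ravel() >= threshold).astype(int)

# Distance from 0.5 rescaled to [0, 1]: 0 = undecided, 1 = fully certain
certainty = np.abs(predictions.ravel() - 0.5) * 2

print(predicted_classes)   # [1 1]
print(certainty.round(3))  # [0.078 0.085]
```

Both sentences land in class 1, but with a certainty below 0.1 the assignment is barely better than a coin flip.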

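Note: We have used a very small corpus due to computational limitations. Four training sentences are far too few for the model to learn anything meaningful, so the predictions above should be read as a demonstration of the workflow, not as real sentiment scores.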


