Implementing Sentiment Analysis with CatBoost
For this example, we will use the IMDb dataset, loaded through the Hugging Face datasets library. It contains 50,000 movie reviews labeled as positive or negative, split evenly into 25,000 training and 25,000 test reviews, which makes it a convenient benchmark for binary sentiment analysis.
Step 1: Install Necessary Libraries
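Install the CatBoost library and the Hugging Face datasets library with the commands below. The later steps also import scikit-learn, so install it as well if it is not already available:
pip install catboost
pip install datasets
pip install scikit-learn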
Step 2: Load Dataset
First, we load the IMDb dataset using the Hugging Face datasets library and separate it into training and test sets. Specifically, train_data contains the reviews and labels used for training, while test_data contains the reviews and labels used for evaluation.
from datasets import load_dataset
# Load the IMDb dataset
dataset = load_dataset('imdb')
train_data = dataset['train']
test_data = dataset['test']
Step 3: Text Preprocessing using TF-IDF
In the following code, we use TfidfVectorizer from the sklearn.feature_extraction.text module to convert the text data from the IMDb dataset into numerical feature vectors based on the TF-IDF scheme, limited to 5000 features. The fit_transform method is applied to the training data (train_data['text']) to learn the vocabulary and transform the text into TF-IDF features, while the transform method is applied to the test data (test_data['text']) to transform it using the same vocabulary. The labels for the training and test sets are extracted and stored in y_train and y_test, respectively, for use in model training and evaluation.
from sklearn.feature_extraction.text import TfidfVectorizer
# Vectorize text data
vectorizer = TfidfVectorizer(max_features=5000)
X_train = vectorizer.fit_transform(train_data['text'])
X_test = vectorizer.transform(test_data['text'])
y_train = train_data['label']
y_test = test_data['label']
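To make the difference between fit_transform and transform concrete, here is a minimal pure-Python sketch of the TF-IDF idea. It is not scikit-learn's implementation: the function names fit_idf and transform_doc and the toy corpus are hypothetical, tokenization is a plain whitespace split, and the weighting assumes the smoothed IDF formula ln((1 + n) / (1 + df)) + 1 followed by L2 normalization, which matches TfidfVectorizer's documented defaults.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Learn the vocabulary and smoothed IDF weights from training documents only."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document (document frequency).
        doc_freq.update(set(doc.lower().split()))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    return {term: math.log((1 + n_docs) / (1 + df)) + 1
            for term, df in doc_freq.items()}


def transform_doc(doc, idf):
    """Map one document to an L2-normalized TF-IDF vector (term -> weight).

    Terms that were not seen during fitting are silently dropped, which is
    why the test set must be transformed with the training vocabulary.
    """
    counts = Counter(t for t in doc.lower().split() if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


# Hypothetical toy corpus standing in for the training reviews.
toy_train_docs = ["a great movie", "a boring movie", "great acting"]
toy_idf = fit_idf(toy_train_docs)

# A "test" review containing a word the vocabulary has never seen.
toy_vec = transform_doc("great great unseen movie", toy_idf)
print(toy_vec)
```

The word "unseen" does not appear in the resulting vector because it was never part of the fitted vocabulary, and rarer training terms such as "boring" receive a larger IDF weight than terms that occur in several documents.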
Step 4: Model Training
Here, the code initializes a CatBoostClassifier with specified parameters (iterations, learning rate, depth, and verbosity) and fits the model to the TF-IDF transformed training data (X_train and y_train).
from catboost import CatBoostClassifier
# Initialize CatBoostClassifier
model = CatBoostClassifier(iterations=1000, learning_rate=0.1, depth=6, verbose=100)
# Fit the model
model.fit(X_train, y_train)
Step 5: Model Evaluation
After training the model, we predict sentiment labels for the test set and evaluate the model's performance using accuracy and a per-class classification report.
from sklearn.metrics import accuracy_score, classification_report
# Predict on the test set
y_pred = model.predict(X_test)
# Evaluate the performance
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
print(classification_report(y_test, y_pred))
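For readers who want to see what these metrics measure, the following is a small pure-Python sketch of accuracy, precision, recall, and F1 for a binary problem. The function name binary_report and the toy label lists are hypothetical and exist only for illustration; in the tutorial itself the values come from scikit-learn's accuracy_score and classification_report.

```python
def binary_report(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy labels: 1 = positive review, 0 = negative review.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 1, 1]

toy_report = binary_report(toy_true, toy_pred)
for name, value in toy_report.items():
    print(f"{name}: {value:.3f}")
```

In the classification report, each row applies the same calculation with that row's class treated as the positive one, and the support column is the number of true examples of that class.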
Complete Code for Sentiment Analysis using CatBoost
from catboost import CatBoostClassifier
from datasets import load_dataset
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report
# Load the IMDb dataset
dataset = load_dataset('imdb')
train_data = dataset['train']
test_data = dataset['test']
# Vectorize text data
vectorizer = TfidfVectorizer(max_features=5000)
X_train = vectorizer.fit_transform(train_data['text'])
X_test = vectorizer.transform(test_data['text'])
y_train = train_data['label']
y_test = test_data['label']
# Initialize CatBoostClassifier
model = CatBoostClassifier(iterations=1000, learning_rate=0.1, depth=6, verbose=100)
# Fit the model
model.fit(X_train, y_train)
# Predict on the test set
y_pred = model.predict(X_test)
# Evaluate the performance
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
print(classification_report(y_test, y_pred))
Output:
Accuracy: 0.8766
              precision    recall  f1-score   support

           0       0.89      0.86      0.88     12500
           1       0.87      0.89      0.88     12500

    accuracy                           0.88     25000
   macro avg       0.88      0.88      0.88     25000
weighted avg       0.88      0.88      0.88     25000
Sentiment Analysis using CatBoost
Sentiment analysis is crucial for understanding the emotional tone behind text data, making it invaluable for applications such as customer feedback analysis, social media monitoring, and market research. In this article, we will explore how to perform sentiment analysis using CatBoost.