Target Encoding

The code initializes a TargetEncoder object from the category_encoders library and restricts it to the feature columns, using col_names[:-1] to exclude the last column ('Class').
An OrdinalEncoder object (oe) is also initialized to convert the target variable ('Class') into integer codes, because the target encoder averages the target and therefore needs numeric values.
The target variable is encoded with oe to create y_train_oe and y_test_oe.
Copies of the training and testing feature sets (X_train and X_test) are created so that the original data stays unchanged.
The fit_transform method of the TargetEncoder object is applied to the training copy (X_train_te): it learns, for each category of each feature, a smoothed average of the encoded target y_train_oe and replaces the category with that number.
The transform method then applies the same learned mappings to the testing copy (X_test_te). Only the features need to be passed here: the mappings come entirely from the training data, and the test labels should play no part in encoding the test set.
The head() method displays the first few rows of the transformed training set. Each value is the smoothed average of the ordinal class codes (which start at 1) for that row's category, which is why every value is at least 1.

Python3

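import category_encoders as ce

# col_names, X_train, X_test, y_train and y_test come from the data preparation steps before this code
# Target-encode every feature; col_names[:-1] drops the last column, 'Class'
target_encoder = ce.TargetEncoder(cols=col_names[:-1])
# Map the string class labels to integer codes so the encoder can average them
oe = ce.OrdinalEncoder(cols=["Class"])
y_train_oe = oe.fit_transform(y_train)
y_test_oe = oe.transform(y_test)
# Work on copies so that X_train and X_test stay unchanged
X_train_te = X_train.copy()
X_test_te = X_test.copy()
# Learn the per-category target averages from the training data only
X_train_te = target_encoder.fit_transform(X_train_te, y_train_oe)
# Apply the learned mappings to the test set; the test labels are not needed
X_test_te = target_encoder.transform(X_test_te)
X_train_te.head()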

Output:


      Cost      Maintenance  Doors     Persons   Luggage boot  Safety
107   1.168639  1.159292     1.466667  1.596950  1.522777      1.738462
901   1.521127  1.159292     1.397661  1.627907  1.295896      1.513100
1709  1.684814  1.688623     1.466667  1.000000  1.522777      1.738462
706   1.264706  1.517045     1.450867  1.000000  1.421397      1.513100
678   1.264706  1.517045     1.397661  1.000000  1.421397      1.000000
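To see what a target encoder computes, here is a minimal hand-rolled sketch with pandas on a tiny invented example. It blends each category's mean target with the global mean using the weight n / (n + m), where n is the category's training count. This is one common smoothing scheme, not the exact formula of category_encoders' TargetEncoder, which uses a different, sigmoid-based weighting controlled by its smoothing and min_samples_leaf parameters, so its numbers will differ. The function name target_encode, the smoothing strength m, and the toy data are hypothetical choices for this illustration.

```python
import pandas as pd


def target_encode(train_col: pd.Series, y: pd.Series, test_col: pd.Series, m: float = 1.0):
    """Hypothetical helper: replace each category by a smoothed mean of the target.

    encoding = (n * category_mean + m * global_mean) / (n + m)
    where n is the category's count in the training data.
    """
    prior = y.mean()  # global mean of the (ordinal-coded) target
    stats = y.groupby(train_col).agg(["mean", "count"])
    encoding = (stats["count"] * stats["mean"] + m * prior) / (stats["count"] + m)
    train_enc = train_col.map(encoding)
    # Categories never seen during training fall back to the global mean
    test_enc = test_col.map(encoding).fillna(prior)
    return train_enc, test_enc


# Invented toy data: ordinal class codes, as an OrdinalEncoder would produce
safety_train = pd.Series(["low", "low", "high", "high", "med"])
y_train = pd.Series([1, 1, 3, 2, 2])
safety_test = pd.Series(["high", "med", "unknown"])

train_enc, test_enc = target_encode(safety_train, y_train, safety_test, m=1.0)
print(train_enc.round(3).tolist())
print(test_enc.round(3).tolist())
```

Only the training labels feed the mapping; the test column is encoded purely by lookup, which mirrors why transform on the test set needs no labels. Rare categories (small n) are pulled strongly toward the global mean, which limits overfitting to a handful of rows.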

How to fit categorical data types for random forest classification?

Categorical variables are an essential component of many datasets, representing qualitative characteristics rather than numerical values. Random forest classification is a powerful machine-learning technique, but most implementations, including scikit-learn's RandomForestClassifier, require numerical input. Encoding categorical variables into a numeric format is therefore a crucial step in preparing data for random forest classification. In this article, we'll explore different encoding methods and how to use them to fit categorical data for random forest classification.
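To make the numeric-input requirement concrete, here is a minimal sketch using scikit-learn and pandas on a small, made-up table. The column names (Safety, Doors, Class) are borrowed from this article's data, but the values are invented for illustration. The sketch shows RandomForestClassifier rejecting raw string features, then fitting successfully once the features are one-hot encoded with pandas.get_dummies, one of the simplest encoding methods.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Invented toy data: two categorical features and a string class label
X = pd.DataFrame({
    "Safety": ["low", "high", "med", "high", "low", "med"],
    "Doors": ["2", "4", "4", "2", "3", "4"],
})
y = pd.Series(["unacc", "acc", "acc", "vgood", "unacc", "acc"], name="Class")

model = RandomForestClassifier(n_estimators=50, random_state=0)

# Fitting on raw strings fails: scikit-learn cannot convert 'low' etc. to floats
try:
    model.fit(X, y)
    raw_fit_failed = False
except ValueError:
    raw_fit_failed = True

# One-hot encoding: each category becomes its own 0/1 column
X_onehot = pd.get_dummies(X, columns=["Safety", "Doors"], dtype=int)
model.fit(X_onehot, y)  # string class labels are fine for the target
predictions = model.predict(X_onehot)
print(X_onehot.columns.tolist())
print(predictions)
```

One-hot encoding adds one column per category, so features with many distinct values can inflate the column count considerably; target encoding avoids this by keeping a single numeric column per feature.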
