Example 1: Training a CatBoostClassifier with Snapshot Saving and Resuming
In this example, we’ll train a CatBoostClassifier on the Iris dataset. We’ll save the model’s snapshots during training and demonstrate how to resume training from a snapshot.
Step-by-Step Process
1. Install CatBoost:
pip install catboost
2. Load the Dataset and Prepare Data:
from catboost import CatBoostClassifier, Pool
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load Iris dataset
data = load_iris()
X = data.data
y = data.target
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
3. Initialize and Train the CatBoost Classifier:
# Convert to CatBoost Pool
train_pool = Pool(X_train, y_train)
test_pool = Pool(X_test, y_test)
# Initialize CatBoost Classifier with snapshot parameters
model = CatBoostClassifier(
    iterations=1000,
    learning_rate=0.03,
    depth=6,
    loss_function='MultiClass',
    save_snapshot=True,                 # write training progress to disk
    snapshot_file='catboost_snapshot',  # name of the snapshot file
    snapshot_interval=60                # seconds between snapshot saves
)
# Train the model
model.fit(train_pool, eval_set=test_pool, verbose=100)
# Save the model
model.save_model('catboost_model')
# Output Predictions
predictions = model.predict(test_pool)
print(predictions)
Output:
0: learn: 1.0835464 test: 1.0803546 best: 1.0803546 (0) total: 50ms remaining: 49.9s
100: learn: 0.0213311 test: 0.0385356 best: 0.0385356 (100) total: 1.24s remaining: 10.9s
...
900: learn: 0.0013542 test: 0.0383536 best: 0.0383536 (900) total: 10.6s remaining: 1.17s
999: learn: 0.0011300 test: 0.0383546 best: 0.0383536 (900) total: 11.7s remaining: 0us
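During training, CatBoost periodically writes the current training state to the snapshot file (here roughly every 60 seconds, as set by snapshot_interval), so that an interrupted run can pick up where it stopped.
4. Resume Training from Snapshot: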
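To confirm that a snapshot was actually written, you can look for the file on disk. This is a small sketch that uses only the standard library; find_snapshot is a hypothetical helper, and it assumes that a relative snapshot_file is placed inside CatBoost's training directory (catboost_info unless train_dir is set), falling back to the current directory in case your setup differs.

```python
from pathlib import Path


def find_snapshot(snapshot_file='catboost_snapshot', train_dir='catboost_info'):
    """Return the path of an existing snapshot file, or None if none is found.

    Assumption: relative snapshot names are resolved inside train_dir.
    The current working directory is checked as a fallback.
    """
    candidates = [Path(train_dir) / snapshot_file, Path(snapshot_file)]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


snapshot_path = find_snapshot()
if snapshot_path is None:
    print('No snapshot found; training would start from scratch.')
else:
    print(f'Snapshot found at {snapshot_path}; training can resume from it.')
```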
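If training is interrupted, rerun fit with the same training parameters and the same snapshot_file. CatBoost detects the existing snapshot and continues from the last saved iteration:
# Re-initialize CatBoost Classifier with the same snapshot parameters
model = CatBoostClassifier(
    iterations=1000,
    learning_rate=0.03,
    depth=6,
    loss_function='MultiClass',
    save_snapshot=True,
    snapshot_file='catboost_snapshot',
    snapshot_interval=60
)
# Resume training: the snapshot is picked up automatically.
# Do not pass the snapshot as init_model; that argument expects a trained
# model (or a saved model file), not a snapshot file.
model.fit(train_pool, eval_set=test_pool, verbose=100)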
# Output Predictions
predictions = model.predict(test_pool)
print(predictions)
Output:
[1 0 2 1 1 0 1 2 0 1 1 2 1 0 2 0 0 0 1 2 0 1 2 0 2 1 2 2 2 2]
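Printed class labels are easier to judge once they are compared against the true labels. The sketch below is a hypothetical accuracy helper; it flattens its inputs first because, for multiclass models, CatBoost's predict may return a column-shaped array of shape (n, 1) rather than a flat vector. The sample arrays are made-up values for illustration, not results from the model above.

```python
import numpy as np


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels.

    Both inputs are flattened so that column-shaped (n, 1) predictions
    compare correctly against a flat label vector.
    """
    preds = np.asarray(predictions).ravel()
    truth = np.asarray(labels).ravel()
    if preds.shape != truth.shape:
        raise ValueError('predictions and labels must have the same length')
    return float(np.mean(preds == truth))


# Illustrative data only: column-shaped predictions and flat labels.
column_preds = np.array([[1], [0], [2], [1]])
labels = np.array([1, 0, 2, 2])
print(accuracy(column_preds, labels))  # 0.75
```

With the objects from this example, the call would be accuracy(model.predict(test_pool), y_test).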
CatBoost Training, Recovering and Snapshot Parameters
CatBoost is short for "categorical boosting". It is an open-source machine learning library that implements gradient boosting on decision trees, known for its efficiency, accuracy, and native handling of categorical features. It is suitable for classification, regression, and ranking tasks. This guide covers the key concepts of CatBoost training, recovery from interruptions, and the snapshot parameters that keep long training runs resumable.