AUTOXGBOOST + Optuna

Rupak (Bob) Roy - II
Published in DataDrivenInvestor
4 min read · Dec 18, 2021


An auto version of XGBoost, one of the most powerful machine learning algorithms

Hi everyone! All good? Today we will go through a new way to automate XGBoost via AutoXGB.

We will start with regular XGBoost and then carry forward to the advancement of AutoXGB. Alright?

First, load the data

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('credit_data.csv', sep=",")
#drop the missing values
dataset = dataset.dropna()
X = dataset.iloc[:,1:4].values
y = dataset.iloc[:, 4].values
#------------------------------------------------------
#Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

Regular XGBoost

#---XGBOOST------------------
# Fitting XGBoost to the Training set
from xgboost import XGBClassifier
classifier = XGBClassifier()
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred_xg = classifier.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred_xg)
#evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred_xg))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred_xg))
print('Average Precision:',metrics.average_precision_score(y_test, y_pred_xg))
Sklearn evaluation metrics: XGBoost

Now let’s perform RandomizedSearchCV, an alternative to GridSearchCV (GridSearchCV uses brute force, so it takes a lot of processing power and time).

# Set parameter grid
xgb_params = {'max_depth': [3, 5, 6, 8, 9, 10, 11],
              'learning_rate': [0.01, 0.1, 0.2, 0.3, 0.5],
              'subsample': np.arange(0.4, 1.0, 0.1),
              'colsample_bytree': np.arange(0.3, 1.0, 0.1),
              'colsample_bylevel': np.arange(0.3, 1.0, 0.1),
              'n_estimators': np.arange(100, 600, 100),
              'gamma': np.arange(0, 0.7, 0.1)}
from sklearn.model_selection import RandomizedSearchCV
# Create RandomizedSearchCV instance
xgb_grid = RandomizedSearchCV(estimator=XGBClassifier(objective='binary:logistic',
                                                      tree_method='gpu_hist',  # use GPU
                                                      random_state=42,
                                                      eval_metric='aucpr'),  # AUC under the PR curve
                              param_distributions=xgb_params,
                              cv=5,
                              verbose=2,
                              n_iter=60,
                              scoring='average_precision')
# Run XGBoost grid search
xgb_grid.fit(X_train, y_train)
#to get the optimal setting for xgboost
xgb_grid.best_params_
xgb_grid.best_score_

The best parameters: xgb_grid.best_params_

{'subsample': 0.4,
'n_estimators': 500,
'max_depth': 3,
'learning_rate': 0.1,
'gamma': 0.4,
'colsample_bytree': 0.8000000000000003,
'colsample_bylevel': 0.7000000000000002}

and its score: 0.9934761095555403

Now fit the model with the best parameters

# Fit the model with the best estimator
classifier1 = XGBClassifier(subsample=0.4, n_estimators=500,
                            max_depth=3, learning_rate=0.1, gamma=0.4,
                            colsample_bytree=0.8000000000000003,
                            colsample_bylevel=0.7000000000000002)
classifier1.fit(X_train, y_train)
# Predicting the Test set results
y_pred_xg1 = classifier1.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred_xg1)
#evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred_xg1))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred_xg1))
print('Average Precision:',metrics.average_precision_score(y_test, y_pred_xg1))
XGBoost with optimized parameters

Now let’s put the scores with and without RandomizedSearchCV side by side.

Comparison with and without Optimized XGBoost Parameters
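If you want to tabulate the comparison yourself, here is a minimal sketch that reuses the predictions y_pred_xg and y_pred_xg1 from the two runs above (the row and column labels are just illustrative):

import pandas as pd
from sklearn import metrics
# Collect the same three metrics for both runs in one DataFrame
comparison = pd.DataFrame(
    {'Default XGBoost': [metrics.accuracy_score(y_test, y_pred_xg),
                         metrics.balanced_accuracy_score(y_test, y_pred_xg),
                         metrics.average_precision_score(y_test, y_pred_xg)],
     'Tuned XGBoost': [metrics.accuracy_score(y_test, y_pred_xg1),
                       metrics.balanced_accuracy_score(y_test, y_pred_xg1),
                       metrics.average_precision_score(y_test, y_pred_xg1)]},
    index=['Accuracy', 'Balanced Accuracy', 'Average Precision'])
print(comparison)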

Now let’s perform AutoXGBoost.

#AutoXGBoost
#The current version has a dependency bug, so install autoxgb without dependencies
pip install --no-deps autoxgb
#pip install autoxgb

Now set the optimal parameters

AUTOXGBOOST: Optimal Parameters

More parameter definitions are available from the developer, and you can also use the CLI to run and execute the script: https://github.com/abhishekkrthakur/autoxgb
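The parameter setup shown above follows the developer's README; a minimal sketch looks like this (the file name, target column, and trial counts below are placeholder values, not necessarily the ones used in the screenshot):

from autoxgb import AutoXGB
# AutoXGB works from CSV files, so point it at the raw dataset
train_filename = 'credit_data.csv'   # training data
output = 'axgb_output'               # folder where models and artifacts are written
axgb = AutoXGB(
    train_filename=train_filename,
    output=output,
    test_filename=None,            # optional hold-out file for predictions
    task=None,                     # None = infer classification/regression automatically
    targets=['default'],           # placeholder: name of the target column
    features=None,                 # None = use all remaining columns
    categorical_features=None,     # None = infer automatically
    use_gpu=False,
    num_folds=5,                   # cross-validation folds
    seed=42,
    num_trials=100,                # number of Optuna trials
    time_limit=360,                # time budget in seconds
    fast=False,
)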

Now we will train the model

AUTOXGBOOST: Training
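In code form, the training step from the screenshot is a single call on the axgb object defined above:

# Runs the Optuna search, trains the final XGBoost model,
# and saves every artifact to the output folder
axgb.train()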

That's it!

The reasons why AutoXGB performs better are as follows:

  1. It uses Bayesian optimization with Optuna for hyperparameter tuning, which is faster than RandomizedSearchCV or GridSearchCV because it uses information from previous iterations to find the best hyperparameters in fewer trials (see the sketch after this list).
  2. Thus, with Optuna it delivers better accuracy at greater speed.
  3. Its documented, optimized memory usage consumes about 8x less memory.
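To make point 1 concrete, here is a minimal sketch of Optuna-style Bayesian optimization for the same XGBoost classifier, reusing X_train and y_train from above (the search ranges and trial count are illustrative, not what AutoXGB uses internally):

import optuna
from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_score
def objective(trial):
    # Each trial samples hyperparameters informed by the results of previous trials
    params = {'max_depth': trial.suggest_int('max_depth', 3, 11),
              'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.5, log=True),
              'subsample': trial.suggest_float('subsample', 0.4, 1.0),
              'colsample_bytree': trial.suggest_float('colsample_bytree', 0.3, 1.0),
              'n_estimators': trial.suggest_int('n_estimators', 100, 500, step=100),
              'gamma': trial.suggest_float('gamma', 0.0, 0.7)}
    model = XGBClassifier(objective='binary:logistic', random_state=42, **params)
    return cross_val_score(model, X_train, y_train, cv=5,
                           scoring='average_precision').mean()
study = optuna.create_study(direction='maximize')   # TPE (Bayesian) sampler by default
study.optimize(objective, n_trials=60)
print(study.best_params, study.best_value)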

And the drawbacks:

  1. We lose significant control over parameter tuning.
  2. Another observable issue is the lack of control over the evaluation metrics.

Next, we will look into the rush to adopt Optuna, one of the best alternatives to the grid-search hyperparameter-tuning technique.

If you find this article useful, do browse my other techniques like Bagging Classifier, Voting Classifier, Stacking, and more; I guarantee you will like them too. See you soon with another interesting topic.

Some of my alternative internet presences: Facebook, Instagram, Udemy, Blogger, Issuu, and more.

Also available on Quora @ https://www.quora.com/profile/Rupak-Bob-Roy


Have a good day, Talk soon.




Things I write about frequently on Medium: Data Science, Machine Learning, Deep Learning, NLP, and many other random topics of interest. ~ Let’s stay connected!