The goal is to predict ‘TOTAL_GROSS’ earnings using an XGBoost regression model.
Data Preparation:
Features: [‘RETRO’, ‘DETAIL’, ‘OVERTIME’, ‘INJURED’, ‘QUINN_EDUCATION’, ‘REGULAR’]
Target: ‘TOTAL_GROSS’
Top Departments Selection:
The data is restricted to the 10 departments with the most employees; rows from all other departments are excluded from the analysis.
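As an illustration of this filtering step, here is a minimal pandas sketch. The column name ‘DEPARTMENT_NAME’, the helper `select_top_departments`, and the toy rows are assumptions made for the example and do not come from the original analysis. It also assumes one row per employee, so that row counts stand in for employee counts.

```python
import pandas as pd

# Toy stand-in for the payroll data; 'DEPARTMENT_NAME' is an assumed column name.
df = pd.DataFrame({
    "DEPARTMENT_NAME": ["Police", "Police", "Police", "Fire", "Fire", "Library"],
    "REGULAR": [70000.0, 65000.0, 80000.0, 60000.0, 62000.0, 40000.0],
    "TOTAL_GROSS": [95000.0, 88000.0, 120000.0, 75000.0, 79000.0, 41000.0],
})

def select_top_departments(frame, n=10, dept_col="DEPARTMENT_NAME"):
    """Keep only rows whose department is among the n largest by row count."""
    top = frame[dept_col].value_counts().nlargest(n).index
    return frame[frame[dept_col].isin(top)]

# n=2 only because the toy data has three departments; the analysis uses n=10.
top_df = select_top_departments(df, n=2)
print(sorted(top_df["DEPARTMENT_NAME"].unique()))  # ['Fire', 'Police']
```

With the real data, the call would use the default n=10 and whatever the department column is actually named.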
Train-Test Split:
The data is split into training and testing sets (80% training, 20% testing).
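An 80/20 split like this is commonly done with scikit-learn's `train_test_split`. The sketch below uses synthetic arrays in place of the real features and target, and the seed `random_state=42` is an assumption added for reproducibility; the original does not say which seed, if any, was used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))  # 6 columns, mirroring the six listed features
y = rng.normal(size=100)       # synthetic stand-in for TOTAL_GROSS

# test_size=0.2 holds out 20% of the rows; random_state is an assumed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (80, 6) (20, 6)
```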
XGBoost Regression Model:
An XGBoost Regressor is created and trained on the training set.
Prediction:
The model predicts ‘TOTAL_GROSS’ on the test set (‘X_test’).
Evaluation:
The R² score is calculated to assess the model’s performance on the test data.
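For reference, the coefficient of determination is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand on made-up numbers; the function name `r2_score_manual` is hypothetical, and in practice `sklearn.metrics.r2_score` computes the same quantity.

```python
def r2_score_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, computed from plain Python lists."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total variation
    return 1.0 - ss_res / ss_tot

# Made-up values: SS_res = 4, SS_tot = 20, so R^2 = 1 - 4/20.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [4.0, 4.0, 8.0, 8.0]
print(r2_score_manual(y_true, y_pred))  # 0.8
```

A score of 1.0 means perfect predictions, and 0.0 means the model does no better than always predicting the mean of the test targets.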
Output:
The R² score for the XGBoost model is 0.3542, meaning the model explains 35.42% of the variance in ‘TOTAL_GROSS’ on the test set and leaves the remaining 64.58% unexplained.