Predicting Bank Customer Attrition

Analyse patterns of customer attrition in a bank and predict which customers are likely to leave.

This project focuses on exploring a dataset related to bank customer attrition, specifically customers leaving their credit card services. The primary goal is to analyse key factors contributing to churn and develop a predictive model that can identify customers at risk of leaving. By doing so, bank staff can take proactive measures to retain valuable customers and reduce attrition.


Highlights

I exported the trained machine learning model and used it in an interactive page where you can enter customer data and see whether the customer would be classified as an existing or attrited customer. It is a simple demonstration of how the model can be used to predict customer attrition in a bank.

Have a look at the Tableau dashboard that I created to visualise the customer attrition dataset:

Bank Customer Attrition Dashboard


The Process

The aim of this project was to predict bank customer attrition using machine learning. The process began with data preprocessing: I cleaned and transformed the dataset to prepare it for modelling by handling missing values, encoding categorical variables, and scaling numerical features. Careful preprocessing was crucial for maintaining data integrity and giving the models consistent inputs to learn from.

Snapshot of Data Preprocessing
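
The three preprocessing steps named above (missing values, categorical encoding, numerical scaling) can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the column names and the toy rows are hypothetical, and median imputation is one reasonable choice rather than necessarily the one used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows with hypothetical column names; this is not the project's dataset.
df = pd.DataFrame({
    "customer_age": [45, 38, np.nan, 52],
    "credit_limit": [12000.0, 3500.0, 8000.0, np.nan],
    "income_category": ["60K-80K", "Less than 40K", "Less than 40K", "60K-80K"],
})

numeric_cols = ["customer_age", "credit_limit"]
categorical_cols = ["income_category"]

# Numeric columns: fill missing values with the median, then standardise.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        # Categorical columns: one indicator column per category.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

X = preprocessor.fit_transform(df)
print(X.shape)
```

Wrapping the steps in a `ColumnTransformer` means the same fitted transformations can later be applied to unseen customer records.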

Next, I conducted exploratory data analysis (EDA), using a range of visualisations to find trends and patterns in the dataset. Analysing factors such as customer age, credit limit, income category, and transaction behaviour allowed me to identify key indicators of customer attrition. These insights guided feature selection and helped me focus on the most relevant attributes.

Snapshot of Data Exploration
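
As a rough illustration of this kind of analysis, the sketch below compares attrition rates across income categories and transaction counts across attrition status using pandas. The column names and values are made up for the example and do not come from the project's dataset.

```python
import pandas as pd

# Hypothetical toy data: 1 = attrited customer, 0 = existing customer.
df = pd.DataFrame({
    "income_category": ["Less than 40K", "Less than 40K", "60K-80K",
                        "60K-80K", "60K-80K", "120K+"],
    "attrited": [1, 0, 0, 0, 1, 0],
    "total_trans_ct": [30, 70, 65, 80, 25, 90],
})

# Share of attrited customers within each income category.
attrition_rate = (
    df.groupby("income_category")["attrited"].mean().sort_values(ascending=False)
)

# Typical transaction count for existing (0) versus attrited (1) customers.
median_trans = df.groupby("attrited")["total_trans_ct"].median()

print(attrition_rate)
print(median_trans)
```

Summaries like these are a quick way to check whether a pattern seen in a chart also holds in the numbers.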

I then trained several machine learning models (Logistic Regression, Decision Tree, and XGBoost) to predict customer attrition. Comparing them on accuracy, precision, and recall, I found that XGBoost outperformed the others, with an accuracy of approximately 97%. Given its strong predictive performance and its ability to capture complex relationships within the data, I selected XGBoost as the final model.

Snapshot of Model Evaluation
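
A comparison of this kind could be set up roughly as shown below. The sketch makes several assumptions: the data is synthetic, the class ratio is arbitrary, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the example depends on scikit-learn alone (`xgboost.XGBClassifier` follows the same `fit`/`predict` interface and could be swapped in).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data; label 1 plays the role of "attrited".
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.85, 0.15], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    # Stand-in for XGBoost (assumption, see the note above).
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Reporting precision and recall alongside accuracy matters here because attrited customers are the minority class, so a model can score high accuracy while still missing many of them.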

To further improve model performance, I applied feature engineering, class imbalance handling, and hyperparameter tuning. Feature engineering produced more informative variables, improving the model's ability to distinguish customers likely to churn from those likely to stay. I then tuned XGBoost's hyperparameters using grid search and random search to find the best-performing combination.

Snapshot of Feature Importance for the selected model
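
The two tuning strategies mentioned above can be sketched with scikit-learn's `GridSearchCV` and `RandomizedSearchCV`. Everything specific here is an assumption for illustration: the synthetic data, the parameter ranges, the F1 scoring choice, and the use of `GradientBoostingClassifier` as a stand-in for XGBoost (whose `max_depth` and `learning_rate` parameters carry the same names).

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Small synthetic, imbalanced dataset so the searches finish quickly.
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.85, 0.15], random_state=0
)
base = GradientBoostingClassifier(n_estimators=30, random_state=0)

# Grid search: evaluate every combination in a fixed grid.
grid = GridSearchCV(
    base,
    param_grid={"max_depth": [2, 3], "learning_rate": [0.05, 0.2]},
    scoring="f1",  # assumption: F1 chosen because the classes are imbalanced
    cv=3,
)
grid.fit(X, y)

# Random search: sample a fixed number of combinations from distributions.
random_search = RandomizedSearchCV(
    base,
    param_distributions={
        "max_depth": randint(2, 5),            # integers 2, 3, or 4
        "learning_rate": uniform(0.01, 0.30),  # floats in [0.01, 0.31]
    },
    n_iter=5,
    scoring="f1",
    cv=3,
    random_state=0,
)
random_search.fit(X, y)

print("grid search:", grid.best_params_, round(grid.best_score_, 3))
print("random search:", random_search.best_params_, round(random_search.best_score_, 3))
```

Grid search is exhaustive but grows quickly with the number of parameters, while random search covers wider ranges for a fixed budget, which is why the two are often used together.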

To check the model's reliability, I validated its performance using k-fold cross-validation, which evaluates the model on several different train/test splits of the dataset. Consistent scores across the folds indicated that the model's accuracy was not simply the result of overfitting to one particular split.

K-Folds Cross Validation Results for the trained models
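
A minimal sketch of k-fold validation with scikit-learn is shown below. It assumes five folds and uses the stratified variant (`StratifiedKFold`), which keeps the class ratio the same in every fold. The data is synthetic and `GradientBoostingClassifier` again stands in for XGBoost.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in data.
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.85, 0.15], random_state=0
)
model = GradientBoostingClassifier(n_estimators=30, random_state=0)

# Five stratified folds: each fold serves once as the held-out test set.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("fold accuracies:", scores.round(3))
print("mean:", round(scores.mean(), 3), "std:", round(scores.std(), 3))
```

A small standard deviation across folds suggests the score does not depend heavily on which rows happened to land in the test set.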

Finally, I exported the trained model for future deployment and created an interactive visualisation dashboard using Tableau. The dashboard presents key insights such as the relationship between transaction amount/count and customer attrition, attrition status by income category, and how income and card category relate to the number of months a customer has been inactive. By combining machine learning predictions with intuitive visualisations, the project offers a data-driven tool that banks can use to address customer attrition proactively and improve their retention strategies.
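
Exporting a trained model and reloading it for predictions might look like the sketch below. It assumes `joblib` as the serialisation format, a hypothetical file name, synthetic data, and a scikit-learn stand-in model. The write-up does not state which export format the project actually used.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Train a small stand-in model on synthetic data.
X, y = make_classification(n_samples=300, n_features=6, random_state=1)
model = GradientBoostingClassifier(n_estimators=20, random_state=1).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "attrition_model.joblib"  # hypothetical file name
    joblib.dump(model, model_path)     # export the fitted model
    loaded = joblib.load(model_path)   # reload it, e.g. inside a web app

# Classify one "new" customer with the reloaded model.
new_customer = X[:1]
label = loaded.predict(new_customer)[0]
status = "Attrited Customer" if label == 1 else "Existing Customer"

# The reloaded model should reproduce the original model's predictions.
same_predictions = np.array_equal(model.predict(X), loaded.predict(X))
print(status, same_predictions)
```

An interactive page would load the saved file once at start-up and call `predict` on each submitted set of customer values.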

This project strengthened my understanding of data preprocessing, model selection, hyperparameter tuning, and validation techniques, and let me explore how predictive analytics can drive business decisions. Combining machine learning with visualisation helps make the results not only accurate but also actionable for stakeholders.