Predicting Bank Customer Attrition

A financial behavior analysis project that applies PCA, classification, and clustering to understand and predict customer churn in the banking sector.

Project Overview

This project analyzes customer attrition in the banking sector using a publicly available dataset. It applies Principal Component Analysis (PCA) to reduce feature complexity, Logistic Regression and K-Nearest Neighbors (KNN) to classify churned customers, and K-Means Clustering to segment clients based on their financial behavior. The objective is to uncover key patterns driving attrition and provide actionable insights to support retention strategies and decision-making.

Objectives

1. Dimensionality Reduction: Use PCA to reduce numerical features while retaining at least 90% of variance.
2. Classification: Compare Logistic Regression and KNN to predict customer churn.
3. Clustering: Segment customers using K-Means based on spending and credit usage.
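
The first objective can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn and NumPy are available, and it runs on synthetic data because the real feature matrix is not shown here. The helper names `reduce_with_pca` and `make_demo_features` are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def reduce_with_pca(X, variance_to_keep=0.90):
    """Standardize X, then keep the fewest components that reach the variance target."""
    X_scaled = StandardScaler().fit_transform(X)
    # A float in (0, 1) asks scikit-learn to choose the component count itself.
    pca = PCA(n_components=variance_to_keep, svd_solver="full")
    X_reduced = pca.fit_transform(X_scaled)
    return X_reduced, pca


def make_demo_features(n_samples=500, n_features=20, n_latent=6, seed=0):
    """Synthetic stand-in for the numeric columns (assumption: not the real dataset)."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_samples, n_latent))
    mixing = rng.normal(size=(n_latent, n_features))
    noise = 0.3 * rng.normal(size=(n_samples, n_features))
    return latent @ mixing + noise


if __name__ == "__main__":
    X = make_demo_features()
    X_reduced, pca = reduce_with_pca(X)
    print("components kept:", pca.n_components_)
    print("variance retained:", round(float(pca.explained_variance_ratio_.sum()), 3))
```

Passing a fraction instead of a fixed count lets the data decide how many components are needed, so the 90% threshold is stated directly rather than found by trial and error.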

Dataset

• Title: Bank Customer Churn Prediction Dataset
• Source: Kaggle
• Format: CSV, 10,000+ records, 23 attributes
• Target Variable: Attrition_Flag

Key Techniques

• Data cleaning: removed ID, encoded categoricals, scaled numerics
• PCA: Reduced from 20+ features to 12 while retaining 90% variance
• Classification:
  - Logistic Regression: 90.1% accuracy, better F1 for churners
  - KNN: 89.7% accuracy, slightly better precision
• Clustering: K=4, Silhouette Score=0.19
  - Profiles based on engagement, spending, and credit
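
The classification comparison above might look roughly like the sketch below. It assumes scikit-learn, uses an imbalanced synthetic dataset in place of the real one, and picks illustrative settings (an 80/20 stratified split, `n_neighbors=5`, `max_iter=1000`) that are not taken from the project. The function name `compare_classifiers` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_classifiers(X, y, seed=42):
    """Fit Logistic Regression and KNN on the same split and report test metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Scaling lives inside each pipeline so it is fit on the training split only.
    models = {
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # Precision and F1 are computed for the positive class (label 1, "churned").
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, zero_division=0),
            "f1": f1_score(y_test, y_pred, zero_division=0),
        }
    return results


if __name__ == "__main__":
    # Assumption: arbitrary synthetic data with a minority positive class.
    X, y = make_classification(
        n_samples=1000, n_features=12, weights=[0.8, 0.2], random_state=0
    )
    for name, metrics in compare_classifiers(X, y).items():
        print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Reporting precision and F1 alongside accuracy matters when churners are a minority, since a model can score high accuracy while missing most of them.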

Results Summary

• PCA reduced dimensionality to 12 components while retaining about 90% of the variance
• Best model: Logistic Regression
• 4 key clusters found with varying risk and value

Figures: Confusion Matrix (KNN); Accuracy Comparison Chart
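
The four-cluster segmentation and its silhouette score can be outlined as below. This is a sketch under assumptions: it relies on scikit-learn, substitutes synthetic blobs for the spending and credit features, and uses a hypothetical helper named `segment_customers`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def segment_customers(X, n_clusters=4, seed=42):
    """Scale features, run K-Means, and return cluster labels with the silhouette score."""
    X_scaled = StandardScaler().fit_transform(X)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(X_scaled)
    # Silhouette ranges from -1 to 1; higher means tighter, better separated clusters.
    score = silhouette_score(X_scaled, labels)
    return labels, score


if __name__ == "__main__":
    # Assumption: synthetic points standing in for the behavioral features.
    X, _ = make_blobs(n_samples=300, centers=4, n_features=3, random_state=0)
    labels, score = segment_customers(X)
    print("clusters found:", len(set(labels.tolist())))
    print("silhouette score:", round(float(score), 3))
```

A silhouette score close to 0 generally indicates overlapping clusters, so segments found at that level are better read as broad tendencies than as sharply separated groups.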