Machine Learning Models Evaluation and Feature Importance Analysis on NPL Dataset

Oct 26, 2021

The study addresses credit risk management in Ethiopian banking, focusing on predicting non-performing loans (NPLs) with machine learning. Traditional credit evaluation relies on payment history, demographics, and collateral, yet defaults persist. The authors benchmarked several ML algorithms (Random Forest, Decision Tree, KNN, SVM, XGBoost, and AdaBoost) on anonymized loan data from a private Ethiopian bank, aiming to improve prediction accuracy and support lending decisions. They also explored feature selection to identify the key predictors of default.

A major challenge in the dataset was class imbalance: defaulted loans were significantly fewer than performing ones. The researchers addressed this with resampling, using K-Means SMOTE and CTGAN for oversampling, as well as undersampling. Models were then trained and validated on each of these dataset variants. Evaluating every model on the same variants allowed a fair comparison and revealed how dataset balancing affects model performance.
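To make the oversampling idea concrete, here is a minimal sketch of SMOTE-style interpolation in plain Python. It is not the authors' pipeline and it is not K-Means SMOTE: the clustering step that decides where to generate samples is left out, and only the core idea of creating synthetic minority points between existing neighbours is shown. The function name, the toy values, and the class counts are all assumptions made for illustration.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Plain SMOTE-style interpolation only; K-Means SMOTE would first cluster
    the data and generate samples inside selected clusters.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical toy data (not from the study): (age, years_employed)
# for defaulted loans only.
defaults = [(25.0, 1.0), (31.0, 2.0), (28.0, 1.5), (40.0, 3.0)]
n_non_default = 10  # assumed size of the majority class

# Generate enough synthetic defaults to match the majority class size.
new_points = smote_like_oversample(defaults, n_new=n_non_default - len(defaults))
balanced_defaults = defaults + new_points
print(len(balanced_defaults))
```

In practice a maintained library would be the better choice; imbalanced-learn, for example, ships a `KMeansSMOTE` class. Resampling should be applied to the training split only, so that validation scores reflect the original class distribution.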

The standout finding was that SVM achieved the highest F1-score when trained on K-Means SMOTE oversampled data, outperforming more complex ensemble methods in this scenario (source: researchgate.net). In contrast, XGBoost was the best performer on some of the other dataset variants, whether left imbalanced or resampled differently. These results show that model effectiveness depends not only on model sophistication but also on data balance and feature handling.
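The F1-score matters here because accuracy can look healthy on imbalanced data even when a model never flags a default. The sketch below computes F1 for the default class from scratch. The labels and the two sets of predictions are hypothetical and chosen only to show the effect; they are not results from the study.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: 1 = default, 0 = performing (8 performing, 2 defaults).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# A model that never predicts default, and one that catches one default
# at the cost of one false alarm.
always_performing = [0] * 10
model_b = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

for name, pred in [("always_performing", always_performing), ("model_b", model_b)]:
    print(name, accuracy(y_true, pred), f1_score_binary(y_true, pred))
```

Both sets of predictions reach the same accuracy (0.8), but only the second has a non-zero F1 (0.5), which is why F1 is the more informative metric when defaults are rare.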

Crucially, the study found that borrower characteristics (age, years of employment, and total income) were stronger predictors of default than collateral-related features. This suggests that Ethiopian banks could improve their credit assessments by incorporating these demographic and financial indicators rather than relying on collateral value alone. Overall, the research demonstrates the potential of ML combined with careful preprocessing to reduce credit risk in developing economies.
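The summary does not say which feature importance method the authors used. One common, model-agnostic option is permutation importance: shuffle one feature column at a time and measure how much the model's score drops. The sketch below illustrates that idea under loud assumptions: `toy_model` is a hypothetical stand-in for a trained classifier, and the income and collateral values are synthetic, not taken from the study.

```python
import random


def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value.
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in for a trained classifier: predicts default (1)
# when income is low and ignores collateral entirely.
def toy_model(row):
    income, collateral = row
    return 1 if income < 30 else 0


# Synthetic rows of (income, collateral); labels depend on income only.
data_rng = random.Random(1)
rows = [(float(income), data_rng.uniform(0, 100)) for income in range(10, 50)]
labels = [1 if income < 30 else 0 for income, _ in rows]

importances = permutation_importance(toy_model, rows, labels)
for name, score in zip(["income", "collateral"], importances):
    print(name, round(score, 3))
```

Because the toy model ignores collateral, shuffling that column leaves accuracy unchanged and its importance is zero, while shuffling income degrades the predictions. With a real model, scikit-learn's `sklearn.inspection.permutation_importance` performs the same kind of measurement on held-out data.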

Copyright © chapa 2025