Machine Learning Models Evaluation and Feature Importance Analysis on NPL Dataset
Oct 26, 2021

The study addresses credit risk management in Ethiopian banking, focusing on predicting non-performing loans (NPLs) with machine learning. Traditional credit evaluation relies on payment history, demographics, and collateral, yet defaults persist. The authors benchmarked six ML algorithms (Random Forest, Decision Tree, KNN, SVM, XGBoost, and AdaBoost) on anonymized loan data from a private Ethiopian bank, aiming to improve prediction accuracy and support lending decisions. They also explored feature selection to identify key predictors of default.
A major challenge in the dataset was class imbalance: defaulted loans were far fewer than performing ones. The researchers addressed this with resampling, using the oversampling methods K-Means SMOTE and CTGAN as well as undersampling. Models were then trained and validated on each variant of the data, which allowed a like-for-like comparison and showed how dataset balancing affects model performance.
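To make the balancing step concrete, here is a minimal sketch of the simplest of the three approaches, undersampling. The summary does not say which undersampling variant the authors used, so plain random undersampling is an assumption, and the function name, toy rows, and labels are all hypothetical. In practice resampling is usually applied to the training split only, so that evaluation still reflects the real class ratio.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the rarest class."""
    rng = random.Random(seed)

    # Group rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is cut down to the size of the smallest one.
    target = min(len(group) for group in by_class.values())

    balanced = []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            balanced.append((row, label))

    rng.shuffle(balanced)  # avoid leaving the classes in contiguous blocks
    out_rows = [row for row, _ in balanced]
    out_labels = [label for _, label in balanced]
    return out_rows, out_labels


# Toy training data (made up): 1 = default (NPL), 0 = performing loan.
train_rows = [[i] for i in range(10)]
train_labels = [0] * 8 + [1] * 2

bal_rows, bal_labels = random_undersample(train_rows, train_labels)
class_counts = Counter(bal_labels)
print(class_counts)
```

Undersampling discards majority-class rows, which is why oversampling methods such as K-Means SMOTE and CTGAN, which synthesize extra minority-class rows instead, are often compared against it.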
The standout finding was that SVM achieved the highest F1-score when trained on K-Means SMOTE oversampled data, outperforming more complex ensemble methods in this scenario. In contrast, XGBoost emerged as the best performer on certain imbalanced or differently sampled datasets. These outcomes show that model effectiveness depends not only on model sophistication but also on data balance and feature handling.
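The reason F1 is a natural yardstick on imbalanced loan data can be shown with a small worked example. The sketch below computes binary F1 from first principles; the labels are made up for illustration and are not results from the study.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 9 performing loans (0) and 1 default (1).
y_true = [0] * 9 + [1]

# A model that never predicts default looks accurate but has zero F1.
always_performing = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, always_performing)) / len(y_true)
f1_naive = f1_score_binary(y_true, always_performing)

# A model that catches the default at the cost of one false alarm.
with_false_alarm = [0] * 8 + [1, 1]
f1_better = f1_score_binary(y_true, with_false_alarm)

print(accuracy, f1_naive, round(f1_better, 3))
```

Here the naive model scores 0.9 accuracy with an F1 of 0.0, while the second model reaches an F1 of about 0.667 (precision 0.5, recall 1.0), which is the behavior a default-prediction metric should reward.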
Crucially, the study found that borrower characteristics (age, years of employment, and total income) were stronger predictors of default than collateral-related features. This suggests that Ethiopian banks could improve their credit assessments by incorporating these demographic and financial indicators rather than over-relying on collateral value alone. Overall, the research demonstrates the potential of ML and careful preprocessing to reduce credit risk in developing economies.
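The summary does not say how the authors ranked features, so the following is only an illustration of one common, model-agnostic technique, permutation importance: shuffle one feature column at a time and measure how much accuracy drops. Everything here is hypothetical, including the feature names, the synthetic rows, and the hand-written `toy_model` rule. The rule ignores collateral by construction, so its zero score is a property of the toy, not evidence about real loans.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_features, repeats=20, seed=0):
    """Mean accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / repeats)
    return importances


# Hypothetical feature names and a hand-written stand-in for a trained model.
FEATURES = ["age", "total_income", "collateral_value"]


def toy_model(row):
    age, income, _collateral = row  # collateral is deliberately unused
    return 1 if (age < 30 and income < 50) else 0


# Synthetic rows; labels come from the toy rule itself, so baseline accuracy is 1.0.
data_rng = random.Random(42)
rows = [
    [data_rng.randint(20, 60), data_rng.randint(10, 100), data_rng.randint(0, 500)]
    for _ in range(200)
]
labels = [toy_model(r) for r in rows]

scores = dict(zip(FEATURES, permutation_importance(toy_model, rows, labels, len(FEATURES))))
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

With a real fitted classifier in place of `toy_model`, the same loop would rank features by how much the model actually relies on them, which is the kind of comparison behind the borrower-versus-collateral finding.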