Machine Learning Applications in Credit Risk Prediction

Title:	Machine Learning Applications in Credit Risk Prediction
Number:	25/08
Author(s):	Kübra Bölükbaş, Ertan Tok
Language:	English
Date:	July 2025
Abstract:	This study aims to identify the most effective model for predicting credit risk, defined as the likelihood that a commercial loan defaults (becomes a non-performing loan), in the Turkish banking sector, and to determine which firm and loan characteristics influence that risk. The analysis draws on an unbalanced dataset of 1.2 million firm-level observations for 2018–2023, combining financial ratios with detailed loan- and firm-specific information. Class imbalance is addressed through oversampling (including SMOTE) and multiple down-sampling schemes. Although the risk is assessed ex-ante, model performance is evaluated ex-post using the ROC-AUC metric. Among the conventional econometric and machine learning approaches tested, each combined with different sampling techniques, Extreme Gradient Boosting (XGBoost) with oversampling delivers the best result, with a ROC-AUC score of 0.914. Compared with logistic regression under the same sampling setup, this is a 4.9-percentage-point increase in test ROC-AUC, confirming the model’s superior predictive performance over conventional approaches. The study finds that the most influential predictors of credit risk are the industry and location in which a firm operates, its loan-restructuring status, loan cost and type (fixed vs. floating rate), the firm’s record of bad checks, and core ratios capturing profitability, liquidity and leverage.
Keywords:	Credit risk, Machine learning techniques, Financial ratios, Banking sector, Macro-financial stability, Feature importance
JEL Codes:	C52; C53; C55; G17; G2; G32; G33
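
Illustrative sketch (not from the paper): the abstract names two building blocks, oversampling of the minority (default) class and evaluation by ROC-AUC. The standard-library Python below shows one simple form of each, under stated assumptions. The function names roc_auc and random_oversample and the toy numbers are hypothetical. The oversampler duplicates existing minority rows at random; it is not SMOTE, which synthesizes new points by interpolation, and it is not claimed to be the authors' implementation. In practice, resampling is normally applied to the training split only, so that the test ROC-AUC reflects the original class balance.

```python
import random


def roc_auc(labels, scores):
    """ROC-AUC as the share of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest one (plain random oversampling)."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


# Toy data (hypothetical): 1 = default, 0 = performing loan.
toy_labels = [0, 0, 0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
print(roc_auc(toy_labels, toy_scores))  # 7 of 8 pairs ranked correctly: 0.875

balanced_rows, balanced_labels = random_oversample(
    [[1.0], [2.0], [3.0], [4.0], [5.0]], [0, 0, 0, 0, 1]
)
print(balanced_labels.count(0), balanced_labels.count(1))  # 4 4
```

The pairwise loop is quadratic in the number of observations and is meant only to make the definition concrete; on a dataset of the size described in the abstract, a rank-based library implementation would be the practical choice.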