Two percentage points may look like a minor change. However, they can mean savings of millions of euros, especially when they represent an improvement in the accuracy of predicting whether or not a customer will repay their loan. This is exactly the type of improvement that a new credit risk model may bring for loan applicants in Kazakhstan. Risk analysts Barbora Šicková and Hynek Hilbert are developing the model at Home Credit’s headquarters in Prague. Hynek is working with the traditional logistic regression model, which calculates the likelihood of the borrower repaying the loan based on the data provided by the client. Barbora is using a newer model known as XGBoost for the same purpose, and the two will then compare their results. XGBoost combines many “decision trees” into a single model and, while more demanding in terms of implementation, should yield more accurate results.
“The quality of the model may improve by just a few percent. But with so many loan applications and approved loans – around 40 million in 2019 alone – such a ‘minor’ change can save us from lending out a lot of money that otherwise might not come back from customers,” Hynek explains. In principle, a model can be improved in two ways: either by getting more information about the applicant, e.g. from scoring provided by external partners such as mobile operators and utility providers, or by using a more sophisticated rating method. The XGBoost model can extract more from the same data, and it can even detect hidden relationships. “Compared with external sources that have to be paid for and often yield improvements of half a percentage point to one percentage point, XGBoost clearly seems to be the better option,” says Hynek.
Home Credit has been using this model successfully for rating applicants in Russia and in China. The company now wants to incorporate XGBoost into its methodology in other countries as well. As part of this process, it’s necessary to select predictors, that is, the data with the highest relevance. The problem is that there are a lot of those to choose from and the “wrong” ones sometimes mask the “right” ones. “There are hidden predictors too, where two variables have no predictive power individually – only when they are combined. XGBoost can detect this on its own, whereas a model built on regression analysis has to be told,” Barbora explains.
For example, if a simple predictor like the “time of day” when an online application was made is taken into consideration, it suggests that those who apply during working hours are more likely to default, because they are probably not in regular employment, meaning they don’t have a stable source of income. “The same characteristic is found in applications made at night, although for different reasons; applying for a loan at night, while most people are sleeping, is simply non-standard behaviour,” says Bára. But when a second predictor, namely “day of the week”, is added into the equation, the “time of day” predictor no longer holds on its own, because it interacts with the other predictor: it’s perfectly reasonable for someone to apply for a loan between 9 am and 5 pm on a weekend, and people also tend to stay up later then.
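The idea of a hidden predictor can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the counts are invented (they are not Home Credit data), the two yes/no variables `working_hours` and `weekend` are simplified stand-ins for “time of day” and “day of the week”, and the hand-written logistic regression only demonstrates the regression side of the claim, not XGBoost itself. Each variable alone shows the same default rate in both of its groups, so a regression without an explicit interaction term predicts the same risk for everyone; once it is “told” about the combination, it recovers the real pattern.

```python
import math

# Invented counts for illustration only (not Home Credit data).
# Key: (working_hours, weekend) -> (number of loans, number of defaults)
CELLS = {
    (0, 0): (100, 10),
    (1, 0): (100, 30),  # working hours on a weekday: riskier
    (0, 1): (100, 30),
    (1, 1): (100, 10),  # working hours on a weekend: ordinary
}


def default_rate(cells, index, value):
    """Default rate among loans where one variable has the given value."""
    loans = sum(n for key, (n, d) in cells.items() if key[index] == value)
    defaults = sum(d for key, (n, d) in cells.items() if key[index] == value)
    return defaults / loans


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(cells, features, steps=20000, lr=1.0):
    """Fit logistic regression by gradient descent on the mean log-loss."""
    total = sum(n for n, _ in cells.values())
    weights = [0.0] * len(features((0, 0)))
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for key, (n, d) in cells.items():
            x = features(key)
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            for j, xj in enumerate(x):
                # n * p predicted defaults versus d observed defaults
                grad[j] += (n * p - d) * xj / total
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def predictions(cells, features):
    """Predicted default probability for each combination of the variables."""
    weights = fit_logistic(cells, features)
    return {
        key: sigmoid(sum(w * xi for w, xi in zip(weights, features(key))))
        for key in cells
    }


def plain(key):
    # Intercept plus the two variables on their own.
    return [1.0, key[0], key[1]]


def with_interaction(key):
    # Same as above, plus the product term the model "has to be told" about.
    return [1.0, key[0], key[1], key[0] * key[1]]


plain_pred = predictions(CELLS, plain)
inter_pred = predictions(CELLS, with_interaction)

for key in CELLS:
    print(key, round(plain_pred[key], 3), round(inter_pred[key], 3))
```

A decision tree can reach the same result without being given the product term, because it can split on one variable first and on the other within each branch; this is the sense in which a tree-based model can pick up such combinations on its own.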
The most relevant predictor is usually the information on whether or not the customer has had a loan before and whether they were able to stick to the repayment schedule. “Then comes the question whether it is better to look 12 months into the past, or just three months – whether we should take into consideration the customer’s current loans or the historic ones as well. XGBoost can decide which option is the best on its own,” Hynek adds.
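The look-back question can be sketched in Python as follows. This is only an illustration under stated assumptions: the repayment history is invented, and the function name `late_payment_count` and the tuple layout `(months_ago, was_late)` are hypothetical, not part of any Home Credit system. The point is that the analyst does not have to pick one window in advance; several candidate versions of the same predictor can be computed and offered to the model side by side.

```python
# Invented repayment history for one customer (illustration only).
# Each entry: (how many months ago the instalment was due, whether it was late)
history = [
    (1, False),
    (2, True),
    (5, False),
    (8, True),
    (11, True),
    (14, True),
]


def late_payment_count(payments, window_months=None):
    """Count late instalments within the look-back window.

    window_months=None means the whole history is used.
    """
    return sum(
        1
        for months_ago, was_late in payments
        if was_late and (window_months is None or months_ago <= window_months)
    )


# Candidate predictors handed to the model together, one per look-back window.
candidate_features = {
    "late_last_3_months": late_payment_count(history, 3),
    "late_last_12_months": late_payment_count(history, 12),
    "late_all_history": late_payment_count(history),
}

print(candidate_features)
```

With all three columns available, a tree-based model is free to split on whichever window separates good and bad payers best.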
The resulting model expresses, as a percentage, the probability that the client will not repay the loan. It is up to Home Credit to set the threshold, that is, the highest default probability at which it is still willing to provide a loan, for customers in the individual countries.
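How such a threshold works can be shown with a short Python sketch. The scores and the two thresholds are invented for illustration, and the function names are hypothetical; the article does not state what thresholds Home Credit actually uses.

```python
def approve(default_probability, threshold):
    """Approve the loan if the predicted default probability is within the threshold."""
    return default_probability <= threshold


def portfolio_summary(default_probabilities, threshold):
    """Return (approval rate, expected default rate among approved loans)."""
    approved = [p for p in default_probabilities if approve(p, threshold)]
    approval_rate = len(approved) / len(default_probabilities)
    expected_default_rate = sum(approved) / len(approved) if approved else 0.0
    return approval_rate, expected_default_rate


# Invented model outputs for eight applicants (illustration only).
scores = [0.02, 0.05, 0.08, 0.12, 0.20, 0.35, 0.50, 0.70]

strict = portfolio_summary(scores, 0.10)
lenient = portfolio_summary(scores, 0.25)

print("threshold 0.10:", strict)
print("threshold 0.25:", lenient)
```

In this made-up example, raising the threshold approves more applicants but also raises the expected share of defaults among the approved loans, which is the trade-off the company has to weigh in each country.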
A more reliable predictive model can have two effects. Either the number of problematic loans will decrease while the number of loans approved remains the same, or the number of loans approved will increase while the default rate remains the same. Choosing which path to take in the individual countries so that the business remains socially sustainable and profitable is up to the company, but the advanced analytical tools at its disposal like XGBoost are what make it possible.
Bára Šicková is a risk analyst at Home Credit’s headquarters in Prague. She graduated from the University of Economics in Prague in Econometrics and Operations Research. She joined Home Credit in 2018. Hynek Hilbert is a data scientist at Home Credit’s headquarters in Prague. He graduated from Charles University in Prague in Probability, Mathematical Statistics and Econometrics. He joined Home Credit in 2019.