A Comparative Analysis of Ensemble Learning-Based Classification Models for Explainable Term Deposit Subscription Forecasting

Zian Shin, Jihoon Moon, Seungmin Rho

Abstract


Predicting term deposit subscriptions is a representative financial marketing task in banks, and banks can build prediction models from a wide range of customer information. Many studies have applied machine learning techniques to improve the classification accuracy of term deposit subscription prediction. However, even when these models achieve satisfactory performance, they are difficult to use in industry if their decision-making process cannot be adequately explained. To address this issue, this paper proposes an explainable scheme for term deposit subscription forecasting. We first construct several classification models using decision tree-based ensemble learning methods that perform well on tabular data: random forest, gradient boosting machine (GBM), extreme gradient boosting (XGB), and light gradient boosting machine (LightGBM). We then analyze their classification performance in depth through 10-fold cross-validation. Next, we apply Shapley additive explanations (SHAP), an explainable artificial intelligence technique, to the best-performing classification model to provide a rationale for interpreting both the influence of customer information and the model's decision-making process. To verify the practicality and validity of our scheme, we conducted experiments on the bank marketing dataset provided by Kaggle: we applied SHAP to the GBM and LightGBM models, respectively, under different dataset configurations, and then analyzed and visualized the results to explain term deposit subscriptions.
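The SHAP step can be illustrated with a small sketch. A Shapley value credits each feature with its average marginal contribution over all coalitions S of the other features, weighted by |S|! (n - |S| - 1)! / n!. The Python below is a minimal brute-force version, not the authors' implementation and not the TreeSHAP algorithm that SHAP uses for tree ensembles. It assumes that a single fixed baseline customer supplies the values of "absent" features. The three features (call duration, age, prior campaign success) and the scoring function `toy_score` are hypothetical stand-ins for a trained GBM or LightGBM model.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for a single instance by enumerating all coalitions.

    Assumption for illustration: a feature outside the coalition takes its value
    from one fixed baseline instance. The loop visits 2**(n-1) coalitions per
    feature, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Coalition members keep the instance's value; all others use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! * (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(z):
    """Hypothetical log-odds-style score standing in for a trained tree ensemble."""
    duration, age, prev_success = z  # call duration (s), age (years), prior success (0/1)
    return (0.02 * duration + 0.01 * (age - 40) + 1.5 * prev_success
            + 0.01 * duration * prev_success)  # interaction term


if __name__ == "__main__":
    customer = [300, 35, 1]   # hypothetical customer
    baseline = [250, 40, 0]   # hypothetical reference customer
    phi = shapley_values(toy_score, customer, baseline)
    print([round(p, 4) for p in phi])  # -> [1.25, -0.05, 4.25]
    # Efficiency property: contributions sum to f(customer) - f(baseline).
    print(round(sum(phi), 4), round(toy_score(customer) - toy_score(baseline), 4))
```

In this toy case, the interaction term contributes 3.0 in total. The sketch splits that credit between call duration (0.25) and prior success (2.75), which shows how SHAP attributes joint effects while keeping the contributions additive. For real tree ensembles, the `shap` library's `TreeExplainer` computes these values in polynomial time rather than by enumerating coalitions.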

