Classification of Parent Company’s Downward Business Clients Using Random Forest: Focused on Value Chain at the Industry of Automobile Parts

Teajin Kim, Jeongshik Hong, Yunsu Jeon, Jongryul Park, Teayuk An


The value chain has been utilized as a strategic tool to improve competitive advantage, mainly at the enterprise level and at the industrial level. However, in order to conduct value chain analysis at the enterprise level, the client companies of the parent company must first be classified according to whether they belong to its value chain. Experts can readily establish a value chain for a single company, but building one that spans multiple companies takes considerable cost and time. Thus, this study proposes a model that automatically classifies the companies that form a value chain based on actual transaction data. A total of 19 transaction attribute variables were extracted from the transaction data and processed into input data for a machine learning method. The proposed model was constructed using the Random Forest algorithm. The experiment was conducted on an automobile parts company. The experimental results demonstrate that the proposed model can automatically classify the client companies of the parent company with 92% accuracy, a 76% F1-score, and 94% AUC. The empirical study also confirms that a few transaction attributes, such as transaction concentration, transaction amount, and total sales per customer, are the main characteristics representing the companies that form a value chain.
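As a rough illustration of the kind of pipeline the abstract describes, the following sketch trains a Random Forest on 19 numeric attributes per client and reports accuracy, F1-score, AUC, and variable importances. It is not the authors' implementation: the data are synthetic, the labeling rule and hyperparameters are assumptions made only for this example, and the library choice (scikit-learn) is ours.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the transaction data (hypothetical values):
# one row per client company, 19 transaction attribute variables.
n_clients, n_features = 600, 19
X = rng.normal(size=(n_clients, n_features))

# Hypothetical labeling rule, assumed only for this sketch: the first
# three columns drive value chain membership (1 = member, 0 = not).
signal = 1.5 * X[:, 0] + 1.0 * X[:, 1] + 0.8 * X[:, 2]
y = (signal + rng.normal(scale=0.5, size=n_clients) > 1.5).astype(int)

# Stratified split keeps the minority (member) share similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Hyperparameters are illustrative, not those used in the study.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

pred = model.predict(X_test)               # hard class labels
proba = model.predict_proba(X_test)[:, 1]  # probability of class 1

metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "f1": f1_score(y_test, pred),
    "auc": roc_auc_score(y_test, proba),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Impurity-based importances; indices of the three highest-ranked attributes.
top = np.argsort(model.feature_importances_)[::-1][:3]
print("top attribute indices:", top.tolist())
```

With real data, the columns of `X` would be the extracted transaction attributes and `y` the expert-assigned value chain labels. Because value chain members are typically a minority of all clients, the F1-score and AUC are more informative than accuracy alone, which may explain the gap between the 92% accuracy and the 76% F1-score reported above.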



