Empirical Study on Analyzing Training Data for CNN-based Product Classification Deep Learning Model

Nakyong Lee, Jooyeon Kim, Junho Shim

Abstract


In e-commerce, classifying products quickly and accurately from their product information is important, and recent advances in deep learning have been actively applied to this automatic product classification task. A well-performing deep learning model depends critically on two things: the quality of the training data and data preprocessing suited to the model. In this study, we extensively compare and analyze the effects of data preprocessing and of training-data selection when a deep learning model infers product categories from textual product data. We use our own CNN model as the example deep learning model. For the experimental analysis, we use real e-commerce data to validate the results. The empirical analysis and results presented here may serve as a useful reference for improving performance when developing deep learning product classification models.
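As a rough illustration of the two factors the study varies, the following Python sketch shows one possible preprocessing step and one possible training-data selection rule. The preprocessing step normalizes and tokenizes product titles, then maps them to fixed-length index sequences, the usual input shape for a text CNN. The selection rule drops sparse categories and caps the rest. The stopword list, the bracket-stripping rule, the thresholds, and all function names (`preprocess`, `build_vocab`, `encode`, `select_training_data`) are hypothetical. The abstract does not describe the authors' actual pipeline, and this is not their method.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical stopword list for illustration only (promotional words common in titles).
STOPWORDS = {"free", "shipping", "new", "sale"}


def preprocess(title: str) -> List[str]:
    """Normalize a product title and split it into tokens (a hypothetical rule set)."""
    text = title.lower()
    text = re.sub(r"\[[^\]]*\]", " ", text)  # drop bracketed tags such as "[sale]"
    text = re.sub(r"[^\w\s]", " ", text)     # punctuation -> spaces; \w is Unicode-aware
    return [tok for tok in text.split() if tok not in STOPWORDS]


def build_vocab(token_lists: Iterable[List[str]], min_count: int = 1) -> Dict[str, int]:
    """Map tokens to integer ids, most frequent first; ids 0 and 1 are reserved."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if c >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert tokens to ids, truncating or right-padding to exactly max_len."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


def select_training_data(
    samples: Iterable[Tuple[str, str]], min_per_class: int, max_per_class: int
) -> List[Tuple[str, str]]:
    """Hypothetical selection rule: drop categories with fewer than min_per_class
    samples and keep at most the first max_per_class titles of each remaining one."""
    by_cat: Dict[str, List[str]] = defaultdict(list)
    for title, cat in samples:
        by_cat[cat].append(title)
    selected = []
    for cat in sorted(by_cat):
        titles = by_cat[cat]
        if len(titles) >= min_per_class:
            selected.extend((t, cat) for t in titles[:max_per_class])
    return selected


# Toy usage with made-up titles (not data from the study).
samples = [
    ("[SALE] Nike Air Max 270 running shoes", "shoes"),
    ("Adidas Ultraboost running shoes free shipping", "shoes"),
    ("Samsung 55-inch 4K UHD TV", "tv"),
    ("Logitech wireless mouse", "mouse"),
]
train = select_training_data(samples, min_per_class=2, max_per_class=10)
token_lists = [preprocess(title) for title, _ in train]
vocab = build_vocab(token_lists)
X = [encode(toks, vocab, max_len=6) for toks in token_lists]
```

Here the two single-sample categories are dropped, and each remaining title becomes a length-6 id sequence. Re-running a model after changing the stopword list, `max_len`, or the selection thresholds is the kind of comparison the abstract describes. Whitespace splitting is a deliberate simplification; languages such as Korean are often tokenized with a morphological analyzer instead.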



