Development of An Automatic Classification System for Game Reviews Based on Word Embedding and Vector Similarity

Yu-Jeong Yang, Bo-Hyun Lee, Jin-Sil Kim, Ki Yong Lee


Given the nature of game software, it is important to quickly identify users’ needs after launch and reflect them in the game. However, most sites where users download games and post reviews, such as the Google Play Store, provide only a few ambiguous categories for classifying game reviews. In this paper, we therefore develop an automatic classification system that sorts game reviews into categories that are clearer and more useful to game providers. The system converts the words in each review into vectors using word2vec, a representative word embedding model, and assigns the review to the most relevant categories by measuring the similarity between those vectors and each category. Because the choice of similarity measure directly affects classification performance, we compared three representative measures, Euclidean similarity, cosine similarity, and extended Jaccard similarity, in a real environment. Furthermore, to allow a review to belong to more than one category, we use a threshold-based multi-category classification method. In experiments on real reviews collected from the Google Play Store, the system achieved up to 95% accuracy.
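The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The tiny two-dimensional vectors stand in for real word2vec embeddings, and the names `mean_vector`, `classify`, `embeddings`, and `category_vectors` are hypothetical. Representing a review by the average of its word vectors, representing each category by a single vector, and turning Euclidean distance into a similarity as 1 / (1 + distance) are all assumptions, since the abstract does not specify these details. The extended Jaccard (Tanimoto) similarity uses the standard form a·b / (|a|² + |b|² − a·b).

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    norm_a, norm_b = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_similarity(a, b):
    # Assumption: distance is mapped to a similarity in (0, 1] via 1 / (1 + d).
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)


def extended_jaccard_similarity(a, b):
    # Tanimoto form: a.b / (|a|^2 + |b|^2 - a.b)
    denominator = dot(a, a) + dot(b, b) - dot(a, b)
    return dot(a, b) / denominator if denominator else 0.0


def mean_vector(words, embeddings):
    # Assumption: a review is represented by the average of its known word vectors.
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def classify(words, category_vectors, embeddings, similarity, threshold):
    # Threshold-based multi-category assignment: keep every category whose
    # similarity to the review vector reaches the threshold.
    review_vector = mean_vector(words, embeddings)
    if review_vector is None:
        return []
    return sorted(
        name
        for name, vector in category_vectors.items()
        if similarity(review_vector, vector) >= threshold
    )


# Toy stand-ins for word2vec output (hypothetical values).
embeddings = {
    "lag": [1.0, 0.0],
    "crash": [0.9, 0.1],
    "price": [0.0, 1.0],
    "expensive": [0.1, 0.9],
}
category_vectors = {"bug": [1.0, 0.0], "payment": [0.0, 1.0]}

single = classify(["lag", "crash"], category_vectors, embeddings,
                  cosine_similarity, threshold=0.6)
multi = classify(["crash", "expensive"], category_vectors, embeddings,
                 cosine_similarity, threshold=0.6)
print(single, multi)
```

Swapping `cosine_similarity` for one of the other two functions changes only the scoring step, which mirrors the kind of comparison the abstract describes. The threshold would need to be tuned separately for each measure, because the three measures produce scores on different scales.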

