Quantitative and Qualitative Considerations to Apply Methods for Identifying Content Relevance between Knowledge Into Managing Knowledge Service

Keedong Yoo

Abstract


Identifying associated knowledge based on content relevance is a fundamental function in managing the service and security of core knowledge. This study compares two methods for identifying associated knowledge by content relevance, a keyword-based approach and a word-embedding approach, by evaluating how well each composes a network of associated documents, and examines which method performs better from quantitative and qualitative perspectives. The keyword-based approach performed better in core document identification and semantic information representation, while the word-embedding approach performed better in F1-score and accuracy, association intensity representation, and large-volume document processing. These results can support more realistic management of associated knowledge services that reflects the needs of companies and users.
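The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measures, thresholds, or data the study used, so everything here is an assumption made for illustration: Jaccard overlap stands in for the keyword-based approach, cosine similarity over toy vectors stands in for the word-embedding approach, and the documents, thresholds, and reference links are hypothetical.

```python
from itertools import combinations
from math import sqrt


def jaccard(a, b):
    """Keyword overlap between two keyword sets (assumed measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity between two document vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def build_network(items, similarity, threshold):
    """Link every document pair whose similarity reaches the threshold."""
    return {
        (a, b)
        for a, b in combinations(sorted(items), 2)
        if similarity(items[a], items[b]) >= threshold
    }


def score(predicted, actual, all_pairs):
    """Return (F1, accuracy) of predicted links against reference links."""
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = len(all_pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(all_pairs) if all_pairs else 0.0
    return f1, accuracy


# Hypothetical toy corpus: keyword sets and document vectors per document.
keywords = {
    "d1": {"knowledge", "map", "keyword"},
    "d2": {"knowledge", "keyword", "network"},
    "d3": {"embedding", "vector", "bert"},
}
vectors = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.8, 0.2, 0.1],
    "d3": [0.1, 0.9, 0.3],
}
actual = {("d1", "d2")}  # hypothetical reference links
all_pairs = set(combinations(sorted(keywords), 2))

# Thresholds are arbitrary illustrative values.
keyword_net = build_network(keywords, jaccard, threshold=0.4)
embedding_net = build_network(vectors, cosine, threshold=0.8)

print("keyword-based:", score(keyword_net, actual, all_pairs))
print("word-embedding:", score(embedding_net, actual, all_pairs))
```

Both approaches reduce to the same pipeline: score every document pair, keep the pairs above a threshold as links, and compare the resulting network with a reference set. Only the similarity function changes, which is what makes the two directly comparable on F1-score and accuracy.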




