A Tensor Space Model based Semantic Search Technique

Kee-Joo Hong, Han-Joon Kim, Jae-Young Chang, Jong-Hoon Chun

Abstract


Semantic search refers to a set of activities and techniques that improve search accuracy by clearly understanding users’ search intent without demanding significant cognitive effort from them. Semantic search engines usually require an ontology and semantic metadata to analyze user queries. However, building an ontology and semantic metadata for large amounts of data is a very time-consuming and costly task, which is why semantic search has rarely been commercialized in practice. To resolve this problem, we propose a novel semantic search method that takes advantage of our previously proposed semantic tensor space model. In this model, each term is represented as a 2nd-order ‘document-by-concept’ tensor (i.e., a matrix) and each concept as a 2nd-order ‘document-by-term’ tensor, so our proposed semantic search method does not require building an ontology. Through extensive experiments on the OHSUMED document collection and SCOPUS journal abstract data, we show that, despite requiring no ontology, our proposed method outperforms the vector space model-based search method.
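The abstract describes each term as a 2nd-order ‘document-by-concept’ tensor and each concept as a 2nd-order ‘document-by-term’ tensor. A minimal sketch follows. It assumes, and this is our assumption rather than something stated above, that both kinds of matrices can be read as slices of one sparse 3rd-order document × term × concept weight tensor. The toy weights, the concept labels, and the helper names (`term_matrix`, `concept_matrix`, `doc_term_vectors`, `vsm_search`) are hypothetical. The ranking function is only a plain cosine-similarity vector space model baseline of the kind the method is compared against. It is not the authors' tensor-based search method.

```python
from math import sqrt

# Hypothetical toy tensor: TENSOR[(doc, term, concept)] = weight.
# In the paper the concepts come from the authors' semantic tensor space model;
# these documents, concepts, and weights are illustrative only.
TENSOR = {
    ("d1", "jaguar", "Animal"): 0.9,
    ("d1", "habitat", "Animal"): 0.7,
    ("d2", "jaguar", "Car"): 0.8,
    ("d2", "engine", "Car"): 0.6,
    ("d3", "engine", "Car"): 0.5,
}


def term_matrix(tensor, term):
    """Slice for one term: a document-by-concept matrix as {doc: {concept: w}}."""
    m = {}
    for (d, t, c), w in tensor.items():
        if t == term:
            m.setdefault(d, {})[c] = w
    return m


def concept_matrix(tensor, concept):
    """Slice for one concept: a document-by-term matrix as {doc: {term: w}}."""
    m = {}
    for (d, t, c), w in tensor.items():
        if c == concept:
            m.setdefault(d, {})[t] = w
    return m


def doc_term_vectors(tensor):
    """Sum over the concept axis to get ordinary VSM document vectors."""
    vecs = {}
    for (d, t, _c), w in tensor.items():
        vec = vecs.setdefault(d, {})
        vec[t] = vec.get(t, 0.0) + w
    return vecs


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def vsm_search(tensor, query_terms):
    """Baseline VSM ranking: cosine between a binary query vector and each document."""
    query = {t: 1.0 for t in query_terms}
    docs = doc_term_vectors(tensor)
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)


if __name__ == "__main__":
    print(term_matrix(TENSOR, "jaguar"))    # the term's document-by-concept slice
    print(concept_matrix(TENSOR, "Car"))    # the concept's document-by-term slice
    print(vsm_search(TENSOR, ["jaguar"]))   # baseline ranking that ignores concepts
```

In this toy data, `term_matrix(TENSOR, "jaguar")` keeps the animal and car senses of “jaguar” under separate concepts. The baseline `vsm_search(TENSOR, ["jaguar"])` collapses the concept axis and ranks d2 ahead of d1 on term weights alone, which is the kind of limitation a concept-aware representation is meant to address.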



References


Baeza-Yates, R. and Ribeiro-Neto, B., Modern information retrieval: The Concepts and Technology behind Search, New York: ACM Press, Chapter 3, 2011.

Berlanga, R., Nebot, V., and Pérez, M., “Tailored semantic annotation for semantic search,” Web Semantics: Science, Services and Agents on the World Wide Web, Vol. 30, pp. 69-81, 2015.

Gantz, J. and Reinsel, D., “The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the far east,” IDC iView: IDC Analyze the Future 2007, pp. 1-16, 2012.

Heck, L. P., Hakkani-Tür, D., and Tür, G., “Leveraging knowledge graphs for web-scale unsupervised semantic parsing,” INTERSPEECH, pp. 1594-1598, 2013.

Kim, H. J. and Chang, J. Y., “A Semantic Text Model with Wikipedia-based Concept Space,” The Journal of Society for e-Business Studies, Vol. 19, No. 3, pp. 107-123, 2014.

Kim, H. J., Hong, K. J., and Chang, J. Y., “Semantically enriching text representation model for document clustering,” Proceedings of the 30th Annual ACM Symposium on Applied Computing, pp. 922-925, 2015.

Nadeau, D. and Sekine, S., “A survey of named entity recognition and classification,” Lingvisticae Investigationes, Vol. 30, No. 1, pp. 3-26, 2007.

Navigli, R., “Word sense disambiguation: A survey,” ACM Computing Surveys (CSUR), Vol. 41, No. 2, pp. 1-69, 2009.

Page, L., Brin, S., Motwani, R., and Winograd, T., “The PageRank citation ranking: bringing order to the Web,” 1999.

Rossi, R. G., Marcacini, R. M., and Rezende, S. O., “Benchmarking text collections for classification and clustering tasks,” Institute of Mathematics and Computer Sciences, University of Sao Paulo, 2013.

Salton, G., Wong, A., and Yang, C. S., “A vector space model for automatic indexing,” Communications of the ACM, Vol. 18, No. 11, pp. 613-620, 1975.

Sudeepthi, G., Anuradha, G., and Babu, M. S. P., “A survey on semantic web search engine,” IJCSI International Journal of Computer Science Issues, Vol. 9, No. 2, pp. 241-245, 2012.

Tablan, V., Bontcheva, K., Roberts, I., and Cunningham, H., “Mímir: An open-source semantic search framework for interactive information seeking and discovery,” Web Semantics: Science, Services and Agents on the World Wide Web, Vol. 30, pp. 52-68, 2015.

Yang, K. and Shahabi, C., “A PCA-based similarity measure for multivariate time series,” Proceedings of the 2nd ACM International Workshop on Multimedia Databases, pp. 65-74, 2004.

