A Minimization Technique of XML Path Comparison Based on Signature

Kyunghoon Jang, Byung-Yeon Hwang

Abstract


Because XML lets users define arbitrary tags, XML documents with widely varying structures have been created. Accordingly, many studies have addressed clustering and searching XML documents by path similarity in order to manage them efficiently. To retrieve XML documents with similar structures, the three-dimensional bitmap indexing technique uses the path as its indexing unit. When a path's structure changes, however, that technique treats it as an entirely new path. A technique for measuring the similarity between paths was therefore proposed.
To compute the similarity between two paths, that technique compares every node of both paths. This causes unnecessary comparisons of nodes that the two paths do not have in common. In this paper, we propose a new technique that uses signatures to minimize these comparisons, and we present the results of its performance evaluation. In the evaluation, the comparison speed of the proposed technique was 20 percent faster than that of the existing technique.
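The idea of using signatures to skip nodes that two paths cannot share can be illustrated with a minimal Python sketch. It is not the algorithm evaluated in the paper, whose details the abstract does not give. It assumes a simple superimposed-coding scheme in which each node name sets one bit of a fixed-width signature and a path signature is the bitwise OR of its node signatures. The names `SIG_BITS`, `node_signature`, `path_signature`, `common_nodes`, and `path_similarity`, the CRC32-based hash, and the similarity formula are all hypothetical choices made for illustration.

```python
import zlib

SIG_BITS = 64  # assumed signature width; not taken from the paper


def node_signature(name: str) -> int:
    """Map a node name to a one-bit signature (hypothetical hashing scheme)."""
    return 1 << (zlib.crc32(name.encode("utf-8")) % SIG_BITS)


def path_signature(path: list[str]) -> int:
    """Superimpose (bitwise OR) the signatures of all nodes in a path."""
    sig = 0
    for name in path:
        sig |= node_signature(name)
    return sig


def common_nodes(path_a: list[str], path_b: list[str]) -> list[str]:
    """Return the nodes of path_a that also occur in path_b.

    Nodes whose signature bit is absent from the intersection of the two
    path signatures cannot be shared, so they are never compared by name.
    """
    shared_bits = path_signature(path_a) & path_signature(path_b)
    if shared_bits == 0:
        return []  # no bit in common, so no node can be in common
    candidates_a = [n for n in path_a if node_signature(n) & shared_bits]
    candidates_b = {n for n in path_b if node_signature(n) & shared_bits}
    # Hash collisions can leave false candidates; the name check removes them.
    return [n for n in candidates_a if n in candidates_b]


def path_similarity(path_a: list[str], path_b: list[str]) -> float:
    """Illustrative similarity: shared nodes divided by the longer path length."""
    if not path_a or not path_b:
        return 0.0
    return len(common_nodes(path_a, path_b)) / max(len(path_a), len(path_b))
```

For example, `common_nodes(["book", "author", "name"], ["book", "title"])` returns `["book"]`. The signature test can admit false candidates when two names hash to the same bit, but it never discards a truly shared node, which is why the final comparison by name is still needed.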



