Thursday 25th of April 2024
 

Efficient structural similarity computation between XML documents


Ali Aitelhadj

This work describes a new approach for computing the structural similarity of XML documents. Most existing work on XML document clustering treats the tree structures of these documents as mere vectors and therefore ignores their hierarchical contexts. Furthermore, to compute structural similarity, most of these methods visit the nodes of the tree structures by depth-first traversal, most often a preorder tree walk. More recent studies present an alternative approach that does take the hierarchical contexts of these tree structures into account, but their computation of structural similarity has particularly high time complexity. In this paper, we propose a new method based on breadth-first traversal of these tree structures. The goal is to cluster XML documents that share similar structures more rapidly. The method is fast and also takes the hierarchical contexts of XML documents into account; reconciling these two requirements makes the proposed method more reliable. To validate our proposal, we conducted experiments on both real and synthetic XML data. The results demonstrate the viability of our approach.

Keywords: Clustering, Structural similarity, Hierarchical context, Tree level, Ancestor and descendant levels, Depth-first and breadth-first traversals.
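
The abstract does not spell out the similarity measure, so the following is only a minimal sketch of the general idea of a breadth-first, level-aware comparison, not the paper's algorithm. It assumes that each document is summarized by the set of element tags found at each tree level, and that two documents are compared by averaging the Jaccard overlap of those sets level by level. The function names `tags_by_level` and `level_similarity`, and the choice of Jaccard overlap, are illustrative assumptions.

```python
from collections import deque
import xml.etree.ElementTree as ET


def tags_by_level(xml_text):
    """Breadth-first traversal: return a list of tag sets, one per tree level."""
    root = ET.fromstring(xml_text)
    levels = []
    queue = deque([(root, 0)])  # (node, depth) pairs, root is at depth 0
    while queue:
        node, depth = queue.popleft()
        if depth == len(levels):
            # First node seen at this depth: open a new level.
            levels.append(set())
        levels[depth].add(node.tag)
        for child in node:
            queue.append((child, depth + 1))
    return levels


def level_similarity(xml_a, xml_b):
    """Average per-level Jaccard overlap of tag sets.

    Assumed measure for illustration only; the paper's actual
    similarity function is not given in the abstract.
    """
    levels_a = tags_by_level(xml_a)
    levels_b = tags_by_level(xml_b)
    depth = max(len(levels_a), len(levels_b))
    total = 0.0
    for i in range(depth):
        a = levels_a[i] if i < len(levels_a) else set()
        b = levels_b[i] if i < len(levels_b) else set()
        # The union is never empty: at least one document has level i.
        total += len(a & b) / len(a | b)
    return total / depth


doc_a = "<book><title/><author><name/></author></book>"
doc_b = "<book><title/><author><email/></author></book>"
score = level_similarity(doc_a, doc_b)
```

Here `doc_a` and `doc_b` agree on their first two levels and differ on the third, so `score` is 2/3, while comparing a document with itself gives 1.0. Each document is visited once, node by node, which is the kind of single-pass, level-aware traversal the abstract contrasts with preorder walks.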



ABOUT THE AUTHOR

Ali Aitelhadj
Mouloud Mammeri University of Tizi-Ouzou


About IJCSI

IJCSI is a refereed open access international journal for scientific papers dealing in all areas of computer science research...
