Self Organizing Map-based Document Clustering Using WordNet Ontologies


Tarek F. Gharib, Mohammed M. Fouad, Abdulfattah Mashat and Ibrahim Bidawi

With the rapid growth of web content, retrieving relevant information has become a difficult task, and efficient clustering algorithms are needed to improve retrieval results. Document clustering is the process of recognizing similarity or dissimilarity among given objects and forming subgroups that share common characteristics. In this paper, we propose a semantic text document clustering approach that uses WordNet lexical categories and Self Organizing Maps (SOM). The proposed approach uses WordNet to identify the importance of the concepts in a document, and the SOM is then used to cluster the documents. The approach aims to enhance the effectiveness of document clustering by taking advantage of both the semantics available in the knowledge base and the relationships between the words in the input documents. Experiments are performed to compare the efficiency of the proposed approach with recently reported approaches; they show an advantage for the proposed approach.

Keywords: Text Document Clustering; WordNet Lexical Categories; Self Organizing Map (SOM)
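
As a rough illustration of the pipeline outlined in the abstract (map words to lexical categories, weight each document by its categories, then cluster with a SOM), here is a minimal sketch. It is not the authors' implementation. The word-to-category table `TOY_CATEGORIES` is an invented stand-in for a real WordNet lookup, and the `TinySOM` class, its sample-based initialization, the fixed presentation order, and the linear decay schedules are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical stand-in for a WordNet lookup: word -> lexical category.
# A real system would query WordNet; this table is invented for illustration.
TOY_CATEGORIES = {
    "match": "sport", "goal": "sport", "team": "sport", "player": "sport",
    "stock": "finance", "bank": "finance", "market": "finance", "profit": "finance",
}
CATEGORIES = ["sport", "finance"]


def concept_vector(text):
    """Represent a document by its normalized lexical-category frequencies."""
    counts = Counter(
        TOY_CATEGORIES[w] for w in text.lower().split() if w in TOY_CATEGORIES
    )
    total = sum(counts.values()) or 1  # avoid division by zero for empty matches
    return [counts[c] / total for c in CATEGORIES]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TinySOM:
    """A very small self-organizing map on a rows x cols grid (assumed design)."""

    def __init__(self, rows, cols, samples):
        self.units = [(r, c) for r in range(rows) for c in range(cols)]
        # Sample initialization: unit k starts as a copy of samples[k % n].
        self.weights = [
            list(samples[k % len(samples)]) for k in range(len(self.units))
        ]

    def bmu(self, vec):
        """Return the grid position of the best-matching unit for vec."""
        best = min(
            range(len(self.units)), key=lambda k: sq_dist(self.weights[k], vec)
        )
        return self.units[best]

    def train(self, samples, epochs=40, lr0=0.5, sigma0=1.0):
        for t in range(epochs):
            frac = 1.0 - t / epochs
            lr = 0.05 + (lr0 - 0.05) * frac      # learning rate decays linearly
            sigma = 0.1 + (sigma0 - 0.1) * frac  # neighbourhood radius shrinks
            for vec in samples:
                br, bc = self.bmu(vec)
                for k, (r, c) in enumerate(self.units):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    # Gaussian neighbourhood around the winning unit.
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = self.weights[k]
                    for i in range(len(w)):
                        w[i] += lr * h * (vec[i] - w[i])


docs = [
    "the team scored a late goal in the match",
    "the bank reported a profit as the market rose",
    "a player left the team for a bank job",
    "stock market profit helped the team owner",
]
vectors = [concept_vector(d) for d in docs]
som = TinySOM(rows=1, cols=2, samples=vectors)
som.train(vectors)
labels = [som.bmu(v) for v in vectors]  # one grid cell (cluster) per document
print(labels)
```

In this toy run the two sport-heavy documents land on one map unit and the two finance-heavy documents on the other. A realistic setup would use a larger grid, many more categories, and a genuine WordNet-based weighting of concepts.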



ABOUT THE AUTHORS

Tarek F. Gharib
Prof. Tarek Fouad Gharib is a Professor of Information Systems at the Department of Information Systems, Ain Shams University, Cairo, Egypt. He received his Ph.D. degree in Theoretical Physics from the University of Ain Shams in 1994. His research interests include data mining techniques, bioinformatics, graph and sequential data mining, and information retrieval. He received the National Science Foundation Award in 2001. Prof. Gharib is currently on sabbatical at the Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia.

Mohammed M. Fouad
Mohammed M. Fouad obtained his B.Sc. (Honors) and M.Sc. in Computer Science from the Faculty of Computers and Information Sciences, Ain Shams University, in 2004 and 2009, respectively. His dissertation was in the field of pattern recognition and data mining with application to web document mining, in which he used semantic text analysis and fuzzy clustering. Currently, Mohammed is a Ph.D. student in Computer Science in the same faculty, working in the field of "Natural Scene Description using Stereo Vision Techniques". His research interests are in data mining, computer vision, image processing and pattern recognition.

Abdulfattah Mashat
Dr. Abdul Fattah Suleman Meshat is Dean of Admission and Registration, King Abdulaziz University, Jeddah, Saudi Arabia. He received his Ph.D. degree in Distributed Multimedia Systems from The University of Leeds, UK in 1999. His research interests include multimedia systems, e-learning, computer networks, ad-hoc and wireless networks, QoS support for multimedia traffic, and modeling and simulation of computer networks.

Ibrahim Bidawi
Dr. Ibrahim Albidewi is Dean of the Faculty of Computers and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia. He received his Ph.D. degree in Computer Vision from The University Swansea, UK in 1993. His research interests include computer vision, software engineering, e-learning, and artificial intelligence.


About IJCSI

IJCSI is a refereed open access international journal for scientific papers dealing in all areas of computer science research...
