Saturday 20th of April 2024
 

Document Representation and Clustering with WordNet Based Similarity Rough Set Model


Nguyen Chi Thanh and Koichi Yamada

Most studies on document clustering to date use the Vector Space Model (VSM) to represent documents, where each document is denoted by a vector in a term vector space. The standard VSM does not take into account the semantic relatedness between terms, so terms with some semantic similarity are treated in the same way as terms with none. Because this disregard for semantics reduces the quality of clustering results, many studies have proposed approaches that introduce knowledge of semantic relatedness into the VSM. Those approaches give better results than the standard VSM, but they still have issues of their own. To address these issues, we propose a new approach that combines two existing ones: one uses rough set theory and the co-occurrence of terms, and the other uses WordNet knowledge. Evaluation experiments show the advantage of the proposed approach over the others.

Keywords: document clustering, document representation, rough sets, text mining.
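
The limitation of the standard VSM described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the toy documents, the whitespace tokenizer, and the helper names `term_vector` and `cosine` are assumptions made here for illustration only. It shows that a plain term-frequency representation assigns the same similarity (zero) to a pair of documents with related terms as to a pair with unrelated terms.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (hypothetical helper, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy documents (assumed for illustration, not taken from the paper).
doc_a = term_vector("car engine")
doc_b = term_vector("automobile motor")  # semantically related to doc_a
doc_c = term_vector("banana bread")      # unrelated to doc_a

# With no shared surface terms, both pairs get the same similarity.
sim_related = cosine(doc_a, doc_b)
sim_unrelated = cosine(doc_a, doc_c)
```

Because the two pairs are indistinguishable in this representation, a clustering algorithm working on it has no basis for grouping `doc_a` with `doc_b`. This is the gap that semantic knowledge such as term co-occurrence or WordNet relations is meant to close.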



ABOUT THE AUTHORS

Nguyen Chi Thanh
PhD Student, Dept. of Management and Information Systems Science, Nagaoka University of Technology

Koichi Yamada
Professor, Dept. of Management and Information Systems Science, Nagaoka University of Technology


About IJCSI

IJCSI is a refereed open access international journal for scientific papers dealing in all areas of computer science research...
