International Journal of Computer Science Issues

Keyword Reduction for Text Categorization using Neighborhood Rough Sets

Si-Yuan Jing

Keyword reduction is a technique that removes less important keywords from the original dataset. Its aim is to decrease the training time of a learning machine and to improve the performance of text categorization. Some researchers have applied rough sets, a popular computational intelligence tool, to keyword reduction. However, the classical rough set model that is usually adopted can only handle nominal values. In this work, we apply neighborhood rough sets to the keyword reduction problem. A heuristic algorithm is proposed and compared with classical methods such as Information Gain, Mutual Information, and the CHI-square statistic. The experimental results show that the proposed method can outperform these methods.

Keywords: Text Categorization; Keyword Reduction; Neighborhood Rough Sets; Heuristic Algorithm
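
The abstract does not give the details of the heuristic algorithm, so the following is only a minimal sketch of how a neighborhood-rough-set reduction is commonly set up, not the paper's method. It assumes numeric keyword weights, a Euclidean neighborhood of radius `delta`, and a forward greedy search that stops when the dependency gain falls below `min_gain`. The names `dependency`, `greedy_reduct`, `delta`, and `min_gain` are hypothetical and chosen for illustration.

```python
from math import sqrt


def dependency(X, y, features, delta):
    """Fraction of samples whose delta-neighborhood, measured on `features`,
    contains only samples of the same class (the positive region)."""
    if not features:
        return 0.0
    consistent = 0
    for i, xi in enumerate(X):
        pure = all(
            y[j] == y[i]
            for j, xj in enumerate(X)
            # Euclidean distance restricted to the candidate keywords (assumption).
            if sqrt(sum((xi[f] - xj[f]) ** 2 for f in features)) <= delta
        )
        consistent += 1 if pure else 0
    return consistent / len(X)


def greedy_reduct(X, y, delta=0.2, min_gain=1e-9):
    """Forward greedy search: repeatedly add the keyword that raises the
    dependency the most; stop when no keyword adds more than `min_gain`."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        scores = [(dependency(X, y, selected + [f], delta), f) for f in remaining]
        top_score = max(s for s, _ in scores)
        # Break ties by the lowest keyword index to keep the result deterministic.
        top_feat = min(f for s, f in scores if s == top_score)
        if top_score - best <= min_gain:
            break
        selected.append(top_feat)
        remaining.remove(top_feat)
        best = top_score
    return selected, best


# Toy data (made up for illustration): rows are documents, columns are keyword weights.
X = [
    [0.9, 0.5, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.5, 0.1],
    [0.2, 0.1, 0.1],
]
y = ["sports", "sports", "politics", "politics"]

print(greedy_reduct(X, y, delta=0.2))
```

On this toy data only the first keyword separates the two classes, so the search keeps it and drops the other two. Unlike the classical rough set model, no discretization of the weights is needed, since the neighborhood radius `delta` works directly on numeric values.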


ABOUT THE AUTHOR

Si-Yuan Jing
Sichuan Province University Key Laboratory of Internet Natural Language Intelligent Processing, Leshan Normal University
