Keyword Reduction for Text Categorization using Neighborhood Rough Sets
Keyword reduction removes less important keywords from the original dataset in order to decrease the training time of a learning machine and improve the performance of text categorization. Some researchers have applied rough sets, a popular computational intelligence tool, to keyword reduction. However, the classical rough set model that is usually adopted can only deal with nominal values. In this work, we apply neighborhood rough sets to the keyword reduction problem. A heuristic algorithm is proposed and compared with several classical methods, such as Information Gain, Mutual Information, and the CHI-square statistic. The experimental results show that the proposed method can outperform the other methods.
Keywords: Text Categorization; Keyword Reduction; Neighborhood Rough Sets; Heuristic Algorithm
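The abstract does not spell out the heuristic algorithm, so the following is only an illustrative sketch of how a neighborhood rough set reduction is commonly set up, not the author's method. It assumes numeric keyword weights scaled to [0, 1], Euclidean distance, a neighborhood radius `delta`, and a forward greedy search driven by the dependency degree (the fraction of documents whose neighborhood contains a single class). The function names, the toy data, and the parameter values are all hypothetical.

```python
from math import sqrt


def dependency(X, y, features, delta):
    """Fraction of samples whose delta-neighborhood, measured on the
    chosen features only, contains no sample of a different class."""
    consistent = 0
    for i, xi in enumerate(X):
        pure = True
        for j, xj in enumerate(X):
            dist = sqrt(sum((xi[f] - xj[f]) ** 2 for f in features))
            if dist <= delta and y[j] != y[i]:
                pure = False  # a neighbor with another label: not in the positive region
                break
        consistent += pure
    return consistent / len(X)


def forward_reduct(X, y, delta=0.15, eps=1e-9):
    """Greedy forward search (an assumed heuristic): repeatedly add the
    keyword that raises the dependency degree most, stop when nothing helps."""
    remaining = list(range(len(X[0])))
    selected, best = [], 0.0
    while remaining:
        scored = [(dependency(X, y, selected + [f], delta), f) for f in remaining]
        score, f = max(scored, key=lambda t: t[0])
        if score - best <= eps:
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best


# Toy data (hypothetical): rows are documents, columns are keyword weights.
# Keyword 0 separates the classes, keyword 1 is noise, keyword 2 is constant.
X = [
    [0.90, 0.5, 0.1],
    [0.80, 0.1, 0.1],
    [0.85, 0.9, 0.1],
    [0.10, 0.5, 0.1],
    [0.20, 0.1, 0.1],
    [0.15, 0.9, 0.1],
]
y = [1, 1, 1, 0, 0, 0]

selected, gamma = forward_reduct(X, y)
print(selected, gamma)
```

Because neighborhoods are defined by a distance threshold rather than by exact equality, this formulation works directly on continuous keyword weights, which is the limitation of the classical rough set model that the abstract points out. The pairwise distance loop is quadratic in the number of documents, so a sketch like this is only practical for small collections.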
ABOUT THE AUTHOR
Si-Yuan Jing
Sichuan Province University Key Laboratory of Internet Natural Language Intelligent Processing, Leshan Normal University