International Journal of Computer Science Issues

An Advanced Concept-Based Mining Model to Enrich Text Clustering

M. Yasodha and P. Ponmuthuramalingam

Text mining techniques are based on the statistical analysis of a term, either a word or a phrase. Statistical analysis of term frequency captures the importance of a term within a document only. However, two terms can have the same frequency in their documents while one contributes more to the meaning of its sentences than the other. This paper introduces a new concept-based mining model that analyzes terms at the sentence, document, and corpus levels. The model can effectively discriminate between terms that are unimportant with respect to sentence semantics and terms that hold the concepts representing the sentence meaning. It consists of sentence-based concept analysis, document-based concept analysis, corpus-based concept analysis, and a concept-based similarity measure. The model can efficiently find significant matching concepts between documents according to the semantics of their sentences. The similarity between documents is calculated with a new concept-based similarity measure, which takes full advantage of the concept analysis measures at the sentence, document, and corpus levels. The experiments provide an extensive comparison between concept-based analysis and traditional analysis. The results demonstrate a substantial enhancement of clustering quality using the sentence-based, document-based, corpus-based, and combined concept analysis approaches.

Keywords: Concept-based mining model, sentence-based, document-based, corpus-based, concept analysis, conceptual term frequency, concept-based similarity.

ABOUT THE AUTHORS

M. Yasodha
M. Yasodha is an Assistant Professor in the Department of Computer Science, Dr. N.G.P. Arts and Science College, Coimbatore, and is pursuing a Ph.D. at Bharathiar University, Coimbatore. She completed her M.Phil. in the area of Data Mining and her postgraduate degree (MCA) at Bharathiar University, Coimbatore. She has presented and published a number of papers in reputed journals. She has four years of teaching and research experience, and her research interests include Data Mining, Web Mining, Semantic Web Mining, and Text Mining.

P. Ponmuthuramalingam
Dr. P. Ponmuthuramalingam received his Master's degree in Computer Science from Alagappa University, Karaikudi, in 1988 and his Ph.D. in Computer Science from Bharathiar University, Coimbatore. He is an Associate Professor and Head of the Department of Computer Science, Government Arts College (Autonomous), Coimbatore. His research interests include Text Mining, Semantic Web, Network Security, and Parallel Algorithms.
