Saturday 23rd of September 2017
 

Modified Pattern Extraction Algorithm for Efficient Semantic Similarity Measures between Words


Pushpa C N, Thriveni J, Venugopal K R and L M Patnaik

Semantic similarity measures play an important role in information retrieval, natural language processing and various web tasks such as relation extraction, community mining, document clustering, and automatic metadata extraction. In this paper, we propose a Modified Pattern Extraction Algorithm (MPEA) that computes the semantic similarity between words by combining the page-count method and the web-snippets method. In the page-count method, four association measures are computed from the page counts returned by web search engines. We use a Support Vector Machine (SVM) trained with Sequential Minimal Optimization (SMO) to find the optimal combination of the page-count-based similarity scores and the top-ranking patterns extracted from web snippets. The SVM is trained to classify word pairs as synonymous or non-synonymous. The proposed approach aims to improve correlation, precision, recall, and F-measure compared to existing methods. The proposed algorithm outperforms existing methods, achieving a correlation value of 89.8% on the Miller-Charles dataset and 75.3% on the Word Similarity dataset.
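The abstract does not name the four association measures or describe how patterns are extracted from snippets. Purely as an illustration, the Python sketch below assumes the four page-count measures commonly used in this line of work (WebJaccard, WebOverlap, WebDice and WebPMI), a hypothetical co-occurrence threshold `C` and index size `N`, and a simplified, hypothetical pattern extractor that keeps at most `max_gap` words between the two query words. It builds the kind of combined feature vector an SMO-trained SVM could be fit on; it is not the authors' MPEA.

```python
import math
import re
from collections import Counter

# Hypothetical constants (assumptions, not values taken from the paper):
N = 10**10  # assumed number of pages indexed by the search engine
C = 5       # assumed threshold: co-occurrence counts <= C are treated as noise


def web_jaccard(hp, hq, hpq, c=C):
    """Jaccard coefficient over page counts H(P), H(Q) and H(P AND Q)."""
    if hpq <= c:
        return 0.0
    return hpq / (hp + hq - hpq)


def web_overlap(hp, hq, hpq, c=C):
    """Overlap (Simpson) coefficient over page counts."""
    if hpq <= c:
        return 0.0
    return hpq / min(hp, hq)


def web_dice(hp, hq, hpq, c=C):
    """Dice coefficient over page counts."""
    if hpq <= c:
        return 0.0
    return 2 * hpq / (hp + hq)


def web_pmi(hp, hq, hpq, n=N, c=C):
    """Pointwise mutual information, with page counts turned into probabilities."""
    if hpq <= c:
        return 0.0
    return math.log2((hpq / n) / ((hp / n) * (hq / n)))


def extract_patterns(snippets, x, y, max_gap=4):
    """Count lexical patterns linking x and y (hypothetical, simplified extractor).

    The words between x and y (at most max_gap of them) are kept, and the two
    query words are replaced by the placeholders X and Y, preserving order.
    """
    counts = Counter()
    for snippet in snippets:
        tokens = re.findall(r"\w+", snippet.lower())
        for i, tok in enumerate(tokens):
            if tok not in (x, y):
                continue
            other = y if tok == x else x
            # Look ahead for the other query word within max_gap tokens.
            for j in range(i + 1, min(i + max_gap + 2, len(tokens))):
                if tokens[j] == other:
                    first = "X" if tok == x else "Y"
                    last = "Y" if first == "X" else "X"
                    counts[" ".join([first, *tokens[i + 1:j], last])] += 1
                    break
    return counts


def feature_vector(hp, hq, hpq, snippets, x, y, top_patterns):
    """Concatenate the four page-count scores with normalised pattern frequencies."""
    scores = [web_jaccard(hp, hq, hpq), web_overlap(hp, hq, hpq),
              web_dice(hp, hq, hpq), web_pmi(hp, hq, hpq)]
    pattern_counts = extract_patterns(snippets, x, y)
    total = sum(pattern_counts.values()) or 1  # avoid division by zero
    return scores + [pattern_counts[p] / total for p in top_patterns]
```

For example, `extract_patterns(["a car is an automobile", "automobile or car"], "car", "automobile")` counts the patterns `X is an Y` and `Y or X` once each. Feature vectors built this way for labelled synonymous and non-synonymous pairs could then be passed to any SVM implementation; the SVM training step itself is not shown here.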

Keywords: Information Retrieval, Semantic Similarity, Support Vector Machine, Web Mining, Web Search Engine, Web Snippets.



ABOUT THE AUTHORS

Pushpa C N
Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore, Karnataka, India

Thriveni J
Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore, Karnataka, India

Venugopal K R
Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore, Karnataka, India

L M Patnaik
Indian Institute of Science, Bangalore, Karnataka, India


About IJCSI

IJCSI is a refereed open access international journal for scientific papers dealing in all areas of computer science research...
