
An Accurate Arabic Root-Based Lemmatizer for Information Retrieval Purposes


Tarek El-Shishtawy and Fatma El-Ghannam

Despite its robust syntax, semantic cohesion, and lower ambiguity, lemma-level analysis and generation has not yet received focused attention in the Arabic NLP literature. In the current research, we propose the first accurate non-statistical Arabic lemmatizer algorithm that is suitable for information retrieval (IR) systems. The proposed lemmatizer makes use of different Arabic language knowledge resources to generate an accurate lemma form together with the relevant features that support IR purposes. Experimental results show that, as a POS tagger, the proposed algorithm achieves a maximum accuracy of 94.8%. For previously unseen documents, it achieves an accuracy of 89.15%, compared to 76.7% for the up-to-date Stanford accurate Arabic model on the same dataset.

Keywords: Arabic NLP, Information Retrieval, Arabic Lemmatizer, POS tagger


ABOUT THE AUTHORS

Tarek El-Shishtawy
Dr. Tarek El-Shishtawy has participated in many Arabic computational linguistics projects. The Large Scale Arabic Annotated Corpus (1995) was one of the important projects for the Egyptian Computer Society and the Academy of Scientific Research and Technology. He has many publications on Arabic corpora, machine translation, and text and data mining.

Fatma El-Ghannam
Fatma El-Ghannam has strong research interests in Arabic language generation and analysis. Currently, she is preparing for a Ph.D. degree in NLP.


About IJCSI

IJCSI is a refereed open access international journal for scientific papers dealing in all areas of computer science research...
