Wednesday 24th of April 2024
 

A New Machine Learning Approach for Arabic-English Documents Classification


Walid Mohamed Aly, Wafaa Hanna Sharaby and Hany Atef Kelleny

This paper develops a system for classifying unstructured Arabic and English documents in two consecutive phases. In the first phase, an incremental Automated Domain-Meta-Document Construction (ADC) algorithm is applied as a new automated machine learning approach. ADC constructs updatable, summarized Domain-Meta-Documents, which correspond to the classified training documents. The output is stored in a knowledge base to support the classification process. In the second phase, an enhanced supervised classification algorithm, based on automated calculation of a threshold value, uses the previously generated Domain-Meta-Documents to classify the incoming dataset. To evaluate the proposed approach, two experiments were conducted on two standard datasets, namely the Corpus of Contemporary Arabic (CCA) and 20 Newsgroups. The results showed that the proposed classification approach outperformed the compared classification algorithms (C4.5 and Back Propagation Neural Network) on different measures. The overall accuracy of the proposed system was found to be about 95%.

Keywords: Unstructured Documents, Machine Learning, Classification, Threshold.
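
The abstract does not give the details of ADC or of the threshold calculation, so the following is only a minimal sketch of the two-phase idea under stated assumptions, not the authors' algorithm. It assumes that a Domain-Meta-Document can be approximated by per-domain term counts, that cosine similarity is the matching measure, and that the threshold is the midpoint between the mean own-domain and mean cross-domain similarity of the training documents. All names (`DomainMetaDocuments`, `auto_threshold`, `classify`) and the toy data are hypothetical.

```python
from collections import Counter
import math
import re


def tokenize(text):
    # \w+ matches Unicode letters and digits, so Arabic and English words
    # are both split out (diacritic handling is ignored in this sketch).
    return re.findall(r"\w+", text.lower())


class DomainMetaDocuments:
    """Hypothetical stand-in for the knowledge base of per-domain summaries."""

    def __init__(self):
        self.meta = {}  # domain name -> Counter of term frequencies

    def update(self, domain, text):
        # Phase 1 (assumed form): incremental update, a new training
        # document only adds its term counts to its domain's summary.
        self.meta.setdefault(domain, Counter()).update(tokenize(text))

    def similarity(self, domain, text):
        # Cosine similarity between a document and a domain summary.
        doc = Counter(tokenize(text))
        ref = self.meta[domain]
        dot = sum(count * ref[term] for term, count in doc.items())
        norm = (math.sqrt(sum(c * c for c in doc.values()))
                * math.sqrt(sum(c * c for c in ref.values())))
        return dot / norm if norm else 0.0


def auto_threshold(kb, training):
    # Assumed rule: midpoint between the mean similarity of training
    # documents to their own domain and to all other domains.
    own, other = [], []
    for domain, text in training:
        for candidate in kb.meta:
            bucket = own if candidate == domain else other
            bucket.append(kb.similarity(candidate, text))

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return (mean(own) + mean(other)) / 2


def classify(kb, text, threshold):
    # Phase 2 (assumed form): pick the most similar domain, and reject
    # the document (return None) if it does not reach the threshold.
    best = max(kb.meta, key=lambda d: kb.similarity(d, text))
    return best if kb.similarity(best, text) >= threshold else None


# Toy training data, invented for illustration only.
training = [
    ("sports", "football match goal team"),
    ("sports", "team wins the match"),
    ("finance", "bank stock market shares"),
    ("finance", "market shares fall"),
]

kb = DomainMetaDocuments()
for domain, text in training:
    kb.update(domain, text)

threshold = auto_threshold(kb, training)
label_known = classify(kb, "the team scored a goal", threshold)
label_unknown = classify(kb, "weather is sunny today", threshold)
```

In this sketch a document that shares no vocabulary with any domain summary falls below the threshold and is left unclassified, which is one plausible role for an automatically derived threshold.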



ABOUT THE AUTHORS

Walid Mohamed Aly
received his Ph.D. from the Faculty of Engineering, Alexandria University, Egypt. He is currently an associate professor at the College of Computing and Information Technology (CCIT), Arab Academy for Science, Technology & Maritime Transport (AASTMT). His research interests include intelligent systems, soft computing, modeling, simulation and machine learning.

Wafaa Hanna Sharaby
graduated from the Faculty of Engineering, Ain Shams University, Egypt. He was awarded a Ph.D. in the area of Management Information Systems by the Academy of Economic Studies, Bucharest, Romania. He is a lecturer in the departments of Computer Science and Information Systems, Future Academy (The Higher Institute of Specialized Technological Studies (HISTS)), Egypt. He has more than 25 years of experience in programming, system analysis and design.

Hany Atef Kelleny
is a master's student at the College of Computing and Information Technology (CCIT), Arab Academy for Science, Technology & Maritime Transport (AASTMT). His research interests include soft computing and machine learning.


About IJCSI

IJCSI is a refereed open access international journal for scientific papers dealing in all areas of computer science research...
