A New Machine Learning Approach for Arabic-English Documents Classification
This paper develops a system capable of classifying unstructured Arabic and English documents in two consecutive phases. In the first phase, the incremental Automated Domain-Meta-Document Construction (ADC) algorithm is applied as a new automated machine learning approach. ADC constructs updatable, summarized Domain-Meta-Documents, which correspond to the classified training documents. The output is stored in a knowledge base to support the classification process.
In the second phase, an enhanced supervised classification algorithm, based on an automatically calculated threshold value, uses the previously generated Domain-Meta-Documents to classify the incoming dataset. To evaluate the proposed approach, two experiments were conducted on two standard datasets, the Corpus of Contemporary Arabic (CCA) and Newsgroup 20. The results showed that the proposed classification approach outperformed the compared classification algorithms (C4.5 and Back Propagation Neural Network) on different measures. The overall accuracy of the proposed system was found to be about 95%.
Keywords: Unstructured Documents, Machine Learning, Classification, Threshold.
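The abstract does not give the internals of ADC or of the threshold calculation, so the following Python sketch only illustrates the general two-phase idea under stated assumptions. It is not the authors' algorithm. Assumed here: a Domain-Meta-Document is approximated by an accumulated term-count summary per domain, similarity is plain cosine similarity, and the threshold for a domain is half the mean similarity observed while training that domain. The class name `MetaDocumentClassifier`, its methods, and the 0.5 factor are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # \w is Unicode-aware for str patterns, so Arabic and English words both match.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors stored as Counters.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MetaDocumentClassifier:
    """Hypothetical stand-in for the two phases described in the abstract."""

    def __init__(self, factor=0.5):
        self.factor = factor  # assumed scaling of the threshold, not from the paper
        self.meta = {}        # domain -> Counter acting as the meta-document
        self.sims = {}        # domain -> similarities recorded during training

    def learn(self, domain, text):
        # Phase 1 (assumed form): fold a labelled document into its domain
        # summary incrementally, without revisiting earlier documents.
        doc = Counter(tokenize(text))
        self.meta.setdefault(domain, Counter()).update(doc)
        self.sims.setdefault(domain, []).append(cosine(doc, self.meta[domain]))

    def threshold(self, domain):
        # Assumed rule: a fraction of the mean training similarity.
        values = self.sims[domain]
        return self.factor * sum(values) / len(values)

    def classify(self, text):
        # Phase 2 (assumed form): pick the most similar domain whose
        # threshold is met; return None when no domain qualifies.
        doc = Counter(tokenize(text))
        best, best_sim = None, 0.0
        for domain, meta in self.meta.items():
            sim = cosine(doc, meta)
            if sim >= self.threshold(domain) and sim > best_sim:
                best, best_sim = domain, sim
        return best


if __name__ == "__main__":
    clf = MetaDocumentClassifier()
    clf.learn("sports", "the team won the football match")
    clf.learn("economy", "the bank raised interest rates")
    print(clf.classify("the team won the match"))
```

Because each meta-document is only a running sum of term counts, new training documents can be added at any time without rebuilding the knowledge base, which mirrors the "updatable" property the abstract attributes to ADC. A real system would also need stop-word handling and language-specific normalization, which this sketch omits.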
ABOUT THE AUTHORS
Walid Mohamed Aly
received his Ph.D. from the Faculty of Engineering, Alexandria University, Egypt. He is currently an associate professor at the College of Computing and Information Technology (CCIT), Arab Academy for Science, Technology & Maritime Transport (AASTMT). His research interests include intelligent systems, soft computing, modeling, simulation and machine learning.
Wafaa Hanna Sharaby
graduated from the Faculty of Engineering, Ain Shams University, Egypt. He was awarded a Ph.D. in the area of Management Information Systems from the Academy of Economic Studies, Bucharest, Romania. He is a lecturer at the departments of Computer Science and Information Systems, Future Academy (The Higher Institute of Specialized Technological Studies (HISTS)), Egypt. He has more than 25 years of experience in programming and in system analysis and design.
Hany Atef Kelleny
is a master's student at the College of Computing and Information Technology (CCIT), Arab Academy for Science, Technology & Maritime Transport (AASTMT). His research interests include soft computing and machine learning.