Tuesday 23rd of April 2024
 

Parallelization of Maximum Entropy POS Tagging for Bahasa Indonesia with MapReduce


Arif Nurwidyantoro and Edi Winarko

In this paper, the MapReduce programming model is used to parallelize the training and tagging processes of maximum entropy part-of-speech (POS) tagging for Bahasa Indonesia. In the training process, MapReduce is applied to dictionary, tagtoken, and feature creation. In the tagging process, MapReduce is used to tag the lines of a document in parallel. The training experiments showed that training with MapReduce is faster, although the time needed to read back its results within the process slows down the total training time. The tagging experiments, run with different numbers of map and reduce processes, showed that the MapReduce implementation can speed up tagging. The fastest tagging result was obtained with the 1,000,000-word corpus and 30 map processes.
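To illustrate what "dictionary creation with MapReduce" can look like, here is a minimal sketch that simulates the map, shuffle, and reduce phases in a single Python process. It is not the authors' implementation. It assumes a training corpus in which each line holds whitespace-separated `word/TAG` tokens, and the function names and tag labels (`PRP`, `VBT`, `NN`, `JJ`) are hypothetical. The map step emits a `((word, tag), 1)` pair per token, the shuffle groups pairs by key, and the reduce step sums the counts into a word-to-tag frequency dictionary.

```python
from collections import defaultdict
from itertools import chain


def map_line(line):
    """Map phase: emit ((word, tag), 1) for each word/TAG token in one line.

    Assumes the hypothetical 'word/TAG' corpus format; tokens without a
    slash are skipped.
    """
    for token in line.split():
        word, _, tag = token.rpartition("/")
        if word and tag:
            yield (word.lower(), tag), 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: sum the partial counts for one (word, tag) key."""
    return key, sum(values)


def build_dictionary(lines):
    """Build a {word: {tag: count}} dictionary from tagged training lines."""
    # Each line could be mapped by a separate map task; here they run sequentially.
    mapped = chain.from_iterable(map_line(line) for line in lines)
    counts = dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())
    dictionary = defaultdict(dict)
    for (word, tag), n in counts.items():
        dictionary[word][tag] = n
    return dict(dictionary)


if __name__ == "__main__":
    corpus = ["Saya/PRP makan/VBT nasi/NN", "nasi/NN goreng/JJ"]
    print(build_dictionary(corpus))
```

In a real MapReduce deployment the map tasks would run on separate workers over splits of the corpus, and the framework would perform the shuffle; because each line is processed independently, the same decomposition also fits the line-parallel tagging described above.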

Keywords: POS tagging, Maximum Entropy, MapReduce



ABOUT THE AUTHORS

Arif Nurwidyantoro
Arif Nurwidyantoro received his bachelor's degree from Institut Pertanian Bogor, Indonesia, and his master's degree from Universitas Gadjah Mada, Indonesia, both in Computer Science. He currently works as a teaching assistant at Universitas Gadjah Mada. His interests are data mining, especially text and web mining, and large-scale data processing.

Edi Winarko
Edi Winarko received his bachelor's degree in Statistics from Universitas Gadjah Mada, Indonesia, his M.Sc. in Computer Science from Queen University, Canada, and his Ph.D. in Computer Science from Flinders University, Australia. He currently works as a lecturer at the Department of Computer Sciences and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada. His research interests are data warehousing, data mining, and information retrieval. He is a member of ACM and IEEE.


About IJCSI

IJCSI is a refereed open access international journal for scientific papers dealing in all areas of computer science research...
