Friday 26th of April 2024
 

An Approach of Chunk Alignment for French-Vietnamese Bilingual Corpora


Ngoc Tan Le, Ngoc Tien Le and Dien Dinh

Machine translation has developed and improved rapidly, but long sentences remain a problem in this domain. Using phrase chunking to reduce sentence length, and thereby improve translation quality, is therefore a promising approach. In this paper, we present an approach based on lexical analysis (phrase chunking) applied to French sentences in combination with a French-Vietnamese bilingual dictionary. We also define chunk boundaries to create a set of French-Vietnamese bilingual segments, in order to overcome the limitations caused by long sentences. We tested the system on a French-Vietnamese bilingual corpus of 10,000 sentence pairs and, after chunking, evaluated its output on a sample of 100 sentence pairs from this corpus. Our system achieved an accuracy of more than 90% and an F-measure of 91.61%.
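The abstract does not spell out the alignment procedure, so the following is only a minimal Python sketch under stated assumptions. It assumes the French sentence has already been chunked (the keywords mention Conditional Random Fields and the French Tree Bank, but how the paper uses them is not shown here), that the Vietnamese side is already tokenized, and that the bilingual dictionary is available as a word-to-translations mapping. The function names `align_chunks` and `precision_recall_f1`, the toy dictionary, and the gold alignment are all hypothetical and invented for illustration; they are not data or code from the paper. Each French chunk is projected onto the smallest Vietnamese span covering its dictionary translations, and the result is scored with precision, recall and F-measure, the metric reported above.

```python
from typing import Dict, List, Optional, Set, Tuple

# A Vietnamese span given as inclusive (start, end) token indices.
Span = Tuple[int, int]


def align_chunks(
    fr_chunks: List[List[str]],
    vi_tokens: List[str],
    lexicon: Dict[str, Set[str]],
) -> List[Optional[Span]]:
    """Hypothetical helper: map each French chunk to the smallest Vietnamese
    span covering every Vietnamese token that the lexicon lists as a
    translation of some word in the chunk. Chunks with no dictionary hit
    get None. Overlapping spans are not resolved in this sketch."""
    spans: List[Optional[Span]] = []
    for chunk in fr_chunks:
        # Collect all dictionary translations of the words in this chunk.
        translations: Set[str] = set()
        for word in chunk:
            translations |= lexicon.get(word.lower(), set())
        # Positions of Vietnamese tokens that match any of those translations.
        hits = [i for i, tok in enumerate(vi_tokens) if tok.lower() in translations]
        spans.append((min(hits), max(hits)) if hits else None)
    return spans


def precision_recall_f1(
    predicted: List[Optional[Span]], gold: List[Optional[Span]]
) -> Tuple[float, float, float]:
    """Hypothetical helper: score alignments as sets of (chunk_index, span) pairs."""
    pred = {(i, s) for i, s in enumerate(predicted) if s is not None}
    ref = {(i, s) for i, s in enumerate(gold) if s is not None}
    correct = len(pred & ref)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy example (invented, not from the paper's corpus):
# "le chat noir | mange | le poisson"  /  "con mèo đen ăn cá"
fr_chunks = [["le", "chat", "noir"], ["mange"], ["le", "poisson"]]
vi_tokens = ["con", "mèo", "đen", "ăn", "cá"]
lexicon = {"chat": {"mèo"}, "noir": {"đen"}, "mange": {"ăn"}, "poisson": {"cá"}}

spans = align_chunks(fr_chunks, vi_tokens, lexicon)
# Hypothetical gold alignment: the annotator also includes the classifier "con".
gold: List[Optional[Span]] = [(0, 2), (3, 3), (4, 4)]
p, r, f = precision_recall_f1(spans, gold)
print(spans)                             # [(1, 2), (3, 3), (4, 4)]
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.667 R=0.667 F=0.667
```

In the toy output, the first chunk misses the Vietnamese classifier "con", which has no dictionary entry, so that alignment counts as an error. The boundary rules described in the paper may handle such cases differently; this sketch only illustrates dictionary-driven span projection and the F-measure computation.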

Keywords: Bilingual corpus, machine translation, extraction of parallel corpus, chunk alignment, French Tree Bank corpus, Conditional Random Fields.



ABOUT THE AUTHORS

Ngoc Tan Le
Ngoc Tan LE is a lecturer in the Department of Computer Science at the Industrial University of Ho Chi Minh, Vietnam. He received his Master's degree in 2009 from the University of Lyon 1, France. His current research interests include Vietnamese NLP using machine learning of linguistic knowledge from French-Vietnamese bilingual corpora.

Ngoc Tien Le
Ngoc Tien LE is a lecturer in the Department of Computer Science at the Industrial University of Ho Chi Minh, Vietnam. He received his Master's degree in 2008 from the University of Natural Science of Ho Chi Minh, Vietnam. His current research interests include machine learning, machine translation and named entity recognition.

Dien Dinh
Dien DINH is an associate professor in Computer Science. He is currently deputy head of the Knowledge Engineering Department at the University of Natural Sciences, VNU-HCMC, Vietnam. He received a Ph.D. in Linguistics in 2005 from the University of Social Sciences & Humanities, VNU-HCMC, and a Ph.D. in Computer Science in 2002 from the University of Natural Sciences, VNU-HCMC. His research interests include Vietnamese NLP using machine learning of linguistic knowledge from English-Vietnamese bilingual corpora.


About IJCSI

IJCSI is a refereed open access international journal for scientific papers dealing in all areas of computer science research...
