Person Name Recognition for Uyghur Using Conditional Random Fields
This paper describes a person name recognition system for Uyghur, a highly agglutinative language, based on conditional random fields (CRFs). We report our experiments with various feature combinations for Uyghur and describe a method for building a Uyghur corpus from a set of hand-annotated sentences. Feature selection is an important factor in CRF-based person name recognition; the features we use include context words, word stems, the suffix and its length, whether a suffix exists, the first and last syllables of the word, part-of-speech (POS) information, and a dictionary feature. For evaluation, we perform several experiments with different feature settings. The resulting model achieves a recall of 81.86%, a precision of 88.79%, and an F-score of 85.19%.
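To make the feature list above concrete, here is a minimal Python sketch of per-token feature extraction for a CRF tagger. It is not the authors' implementation: it assumes Latin-script (ULY) input, a hypothetical tiny suffix inventory (`SUFFIXES`), a crude vowel-based syllabifier (`syllables`), a hypothetical name gazetteer (`name_dict`), and POS tags supplied by some external tagger. Each token is turned into a dictionary of named features, a common input format for CRF toolkits.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical, deliberately tiny inventory of Uyghur case suffixes in Latin
# script (ULY). A real system would use a full suffix list or morphological analyser.
SUFFIXES: Tuple[str, ...] = ("ning", "din", "tin", "gha", "qa", "ge", "ke",
                             "da", "de", "ta", "te", "ni")
# ULY vowel letters (é is assumed to be the precomposed character U+00E9).
VOWELS = set("aeéiouöü")


def split_suffix(word: str) -> Tuple[str, Optional[str]]:
    """Strip the longest listed suffix, keeping a stem of at least 2 letters."""
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suf) and len(word) - len(suf) >= 2:
            return word[: -len(suf)], suf
    return word, None


def syllables(word: str) -> List[str]:
    """Crude syllabifier: between two vowels, a single consonant starts the next syllable."""
    vowel_idx = [i for i, ch in enumerate(word) if ch in VOWELS]
    if len(vowel_idx) < 2:
        return [word]
    # Cut just before the consonant preceding each later vowel (or at the vowel
    # itself when two vowels are adjacent).
    cuts = [j - 1 if j - 1 > i else j for i, j in zip(vowel_idx, vowel_idx[1:])]
    pieces, start = [], 0
    for cut in cuts:
        pieces.append(word[start:cut])
        start = cut
    pieces.append(word[start:])
    return pieces


def token_features(tokens: Sequence[str], pos_tags: Sequence[str], i: int,
                   name_dict: Set[str], window: int = 2) -> Dict[str, object]:
    """Build the feature dictionary for token i of a sentence."""
    word = tokens[i].lower()
    stem, suffix = split_suffix(word)
    sylls = syllables(word)
    feats: Dict[str, object] = {
        "word": word,
        "stem": stem,
        "has_suffix": suffix is not None,          # whether a suffix exists
        "suffix": suffix or "",
        "suffix_len": len(suffix) if suffix else 0,
        "first_syll": sylls[0],                    # first syllable of the word
        "last_syll": sylls[-1],                    # last syllable of the word
        "pos": pos_tags[i],                        # POS tag from an external tagger
        "in_name_dict": stem in name_dict,         # dictionary (gazetteer) feature
    }
    # Context words in a +/- window, padded at sentence boundaries.
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        if j < 0:
            ctx = "<BOS>"
        elif j >= len(tokens):
            ctx = "<EOS>"
        else:
            ctx = tokens[j].lower()
        feats[f"word[{off:+d}]"] = ctx
    return feats
```

For example, `token_features(["bu", "Ehmetning", "kitabi"], ["PRON", "PROPN", "NOUN"], 1, {"ehmet"})` strips the genitive suffix `ning` to expose the stem `ehmet`, which the gazetteer then matches. Looking up the stem rather than the surface form is what lets a single dictionary entry cover every inflected form of a name in an agglutinative language.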
Keywords: NER, Uyghur language, person name recognition, CRF, feature
ABOUT THE AUTHORS
Muhtar Arkin
College of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang, 830046, P.R. China
Abdurahim Mahmut
College of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang, 830046, P.R. China
Askar Hamdulla
College of Software, Xinjiang University, Urumqi, Xinjiang, 830046, P.R. China