International Journal of Computer Science Issues

Information Extraction from Arabic News

Hala Elsayed and Tarek Elghazaly

Information Extraction (IE) is the task of finding specific facts in large collections of unstructured text, such as web pages and long documents. Named Entity Recognition (NER) is a sub-problem of information extraction. Research in information extraction is growing, and with it interest in NER, which helps extract the desired information from massive texts; extracting entities is therefore an important task in Natural Language Processing (NLP). The Arabic language needs more research in the information extraction domain, which motivates this work. The experiment is concerned with extracting entities and the relations between entities from Arabic text, using news from an Egyptian Arabic newswire. The paper introduces a method for extracting previously unknown information, using named entities and entity relations, from an Arabic corpus generated from the Egyptian Arabic newswire. The experiment covered nearly 625,368 entries and 36,423 sentences, from which a sample of about 3,400 sentences representing crime news was selected. The results yield information that can serve decision makers as a tool for analyzing the text.

Keywords: Information Extraction (IE), Natural Language Processing (NLP), Named Entity Recognition (NER), Corpus, Gazetteers.
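
As a rough illustration of the kind of gazetteer-based entity and entity-relation extraction the abstract refers to, the following minimal Python sketch tags tokens found in a lookup list and pairs entities that occur in the same sentence. It is not the authors' method: the gazetteer entries, the labels, the whitespace tokenization, the co-occurrence rule, and the function names `tag_entities` and `cooccurrence_relations` are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy gazetteer (assumption): the lists used in the paper
# are not given in the abstract.
GAZETTEER = {
    "القاهرة": "LOCATION",      # Cairo
    "الجيزة": "LOCATION",       # Giza
    "الشرطة": "ORGANIZATION",   # the police
}


def tag_entities(sentence):
    """Return (token, label) pairs for whitespace tokens found in the gazetteer."""
    return [(tok, GAZETTEER[tok]) for tok in sentence.split() if tok in GAZETTEER]


def cooccurrence_relations(sentence):
    """Pair every two tagged entities that appear in the same sentence.

    Co-occurrence is only a crude stand-in (assumption) for a real
    relation extractor.
    """
    return list(combinations(tag_entities(sentence), 2))


# "The police arrested the suspect in Cairo."
sentence = "ألقت الشرطة القبض على المتهم في القاهرة"
entities = tag_entities(sentence)
relations = cooccurrence_relations(sentence)

for token, label in entities:
    print(label, token)
for left, right in relations:
    print("RELATED:", left[0], "<->", right[0])
```

Exact-match lookup on whitespace tokens would miss Arabic words carrying attached prefixes or suffixes, so a practical system would likely need morphological normalization before the gazetteer lookup.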

ABOUT THE AUTHORS

Hala Elsayed
Computer and Information Sciences Dept., ISSR, Cairo University, Cairo, Egypt

Tarek Elghazaly
Computer and Information Sciences Dept., ISSR, Cairo University, Cairo, Egypt
