Language Specific Crawler for Myanmar Web Pages



With the enormous growth of the World Wide Web, search engines play a critical role in retrieving information from the borderless Web. Although many search engines can search content in numerous major languages, they cannot search pages in less-computerized languages such as Myanmar, because Myanmar Web pages use multiple non-standard encodings. Since the Web is a distributed, dynamic, and rapidly growing information resource, a normal Web crawler cannot download all pages. A language-specific search engine therefore needs a Language Specific Crawler (LSC) to collect the targeted pages. This paper presents an LSC implemented as multi-threaded objects that run concurrently with a language identifier. The LSC is capable of collecting as many Myanmar Web pages as possible. In experiments, the implemented algorithm collected Myanmar pages at a satisfactory level of coverage. The paper also discusses an evaluation of the LSC by two criteria, recall and precision, and a method for measuring the total number of Myanmar Web pages on the entire Web. Finally, it presents an analysis that determines the locations of the servers hosting Myanmar Web content.

Keywords: Language Specific Crawling, Myanmar, Web Search, Language Identification
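
The idea in the abstract, a multi-threaded crawler that works together with a language identifier and is scored by precision and recall, can be illustrated with a minimal Python sketch. It is not the paper's implementation. Everything in it is an assumption made for illustration: the in-memory `TOY_WEB` graph stands in for real HTTP fetching, the identifier only checks the share of characters in the Unicode Myanmar block (U+1000 to U+109F) and so ignores the non-standard encodings the paper targets, the rule of following links only from relevant pages is one possible focusing strategy, and precision and recall use the standard information-retrieval definitions.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would download pages over HTTP instead.
TOY_WEB = {
    "a": ("မြန်မာ စာမျက်နှာ", ["b", "c"]),
    "b": ("English only page", ["e"]),
    "c": ("မင်္ဂလာပါ", ["d"]),
    "d": ("နေကောင်းလား", []),
    "e": ("မြန်မာ", []),  # relevant, but reachable only through "b"
}


def myanmar_ratio(text):
    """Share of non-space characters inside the Unicode Myanmar block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum("\u1000" <= ch <= "\u109f" for ch in chars) / len(chars)


def is_myanmar(text, threshold=0.5):
    """Toy language identifier (assumed threshold, Unicode text only)."""
    return myanmar_ratio(text) >= threshold


def fetch(url):
    """Stand-in for a page download; returns (url, text, links)."""
    text, links = TOY_WEB[url]
    return url, text, links


def crawl(seeds, max_workers=4):
    """Level-by-level crawl; each level is fetched by a pool of threads.

    Links are followed only from pages the identifier accepts.
    Returns (collected relevant URLs, all downloaded URLs).
    """
    visited = set(seeds)
    frontier = list(seeds)
    collected = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            next_frontier = []
            for url, text, links in pool.map(fetch, frontier):
                if not is_myanmar(text):
                    continue  # irrelevant page: do not expand its links
                collected.add(url)
                for link in links:
                    if link in TOY_WEB and link not in visited:
                        visited.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return collected, visited


def precision_recall(collected, downloaded, all_relevant):
    """Precision: relevant share of downloads. Recall: share of relevant pages found."""
    precision = len(collected) / len(downloaded)
    recall = len(collected & all_relevant) / len(all_relevant)
    return precision, recall


if __name__ == "__main__":
    collected, downloaded = crawl(["a"])
    relevant = {url for url, (text, _) in TOY_WEB.items() if is_myanmar(text)}
    print(sorted(collected), precision_recall(collected, downloaded, relevant))
```

On this toy graph the crawl downloads four pages, keeps three of them, and misses page "e" because its only inbound link comes from an English page, so both precision and recall come out at 0.75. The example shows the trade-off such a strategy faces: pruning irrelevant pages saves downloads but can cut off relevant pages behind them.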
