
A framework for dynamic indexing from hidden web


Hasan Mahmud, Moumie Soulemane and Mohammad Rafiuzzaman

The proliferation of dynamic websites backed by databases means that many web pages are generated on the fly, which makes them too complex for most search engines to index. To crawl the contents of dynamic web pages, we propose a simple approach to indexing the large amounts of dynamic content hidden behind search forms. Our key contribution in this paper is the design and implementation of a simple framework to index dynamic web pages, together with the use of the Hadoop MapReduce framework to update and maintain the index. In our approach, starting from an initial URL, our crawler downloads both static and dynamic web pages, detects form interfaces, adaptively selects keywords to generate the most promising search results, automatically fills in the search form interfaces, submits the dynamic URL, and processes the results until some conditions are satisfied.
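The form-detection and form-filling steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `FormDetector`, `detect_forms`, and `build_dynamic_urls`, the example URL, and the fixed keyword list are all assumptions made for illustration, and only GET forms with a single text field are handled. Adaptive keyword selection and the Hadoop MapReduce index maintenance are out of scope here.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormDetector(HTMLParser):
    """Collects each <form> with its action, method, and text-input names."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # form being parsed, or None when outside a form

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "inputs": [],
            }
        elif tag == "input" and self._current is not None:
            # Inputs without a type attribute default to text fields in HTML.
            kind = (attrs.get("type") or "text").lower()
            if kind in ("text", "search") and attrs.get("name"):
                self._current["inputs"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def detect_forms(html):
    """Return a list of form descriptions found in an HTML string."""
    parser = FormDetector()
    parser.feed(html)
    parser.close()
    return parser.forms


def build_dynamic_urls(page_url, form, keywords):
    """Build one GET URL per keyword, filling the form's first text field.

    Simplification (assumption): the form action carries no query string.
    """
    if form["method"] != "get" or not form["inputs"]:
        return []
    field = form["inputs"][0]
    base = urljoin(page_url, form["action"])
    return [base + "?" + urlencode({field: kw}) for kw in keywords]


if __name__ == "__main__":
    # Hypothetical page; a real crawler would download this from a seed URL.
    page = ('<form action="search.php" method="GET">'
            '<input type="text" name="q"><input type="submit" value="Go">'
            '</form>')
    for form in detect_forms(page):
        for url in build_dynamic_urls("http://example.org/catalog/index.html",
                                      form, ["data mining", "hadoop"]):
            print(url)
```

In a full crawler, each generated URL would be fetched, its result page indexed, and new candidate keywords drawn from the returned content until a stopping condition is met.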

Keywords: Dynamic web pages, crawler, hidden web, index, Hadoop.



ABOUT THE AUTHORS

Hasan Mahmud
Hasan Mahmud received his Bachelor's degree in Computer Science and Information Technology (CIT) from the Islamic University of Technology (IUT), Bangladesh, in 2004. He then joined the Computer Science and Engineering (CSE) department at Stamford University Bangladesh as a faculty member. He completed his Master of Science degree in Computer Science (specialization in NetCentric Informatics) at the University of Trento (UniTN), Italy, in 2009. He received the University Guild Grant Scholarship for the two years (2007-2009) of his Master's study and was also awarded an early degree scholarship. He has published a number of research papers in international journals. He is currently working as an Assistant Professor in the department of Computer Science and Information Technology (CIT), IUT, Bangladesh. His current research interests are web mining, Human Computer Interaction, and Ubiquitous Computing.

Moumie Soulemane
Moumie Soulemane completed his Higher Diploma in Computer Science and Information Technology (CIT), with a specialization in web technology, at the Islamic University of Technology (IUT), Bangladesh, in 2010. He is currently a final-year student in the B.Sc. degree program in Computer Science and Information Technology (CIT) at the same university.

Mohammad Rafiuzzaman
Mohammad Rafiuzzaman is currently in the final year of his Bachelor's degree in Computer Science and Information Technology (CIT) at the Islamic University of Technology.


About IJCSI

IJCSI is a refereed open access international journal for scientific papers dealing in all areas of computer science research...
