International Journal of Computer Science Issues

Graph Theoretic and Genetic Algorithm-Based Model for Web Content Mining

Moses Akinjide Adelola, Sunday Olumide Adewale and Gabriel Babatunde Iwasokun

The World Wide Web (WWW) is arguably the largest and most heterogeneous repository of data, and it continues to grow in size and complexity. As it expands, retrieving required web pages and information has become a herculean task for web users because of information overload. Worse still, existing web content retrieval techniques have not been efficient enough in speed and accuracy. This paper presents a Graph Theoretic (GT) and Genetic Algorithm (GA)-based technique for mining web documents. The technique uses graph representations of document content to address the problems of initialization, convergence to local minima and failure to handle large datasets. It works in three phases, namely content extraction, preprocessing and database formulation, and uses the Maximum Common Sub-graph (MCS) to calculate the distance between clusters. A web-based experimental study was carried out on a Pentium 4 machine with a 2 GHz processor and 1 GB of RAM running the Windows 7 operating system, with a web scraper (import.io) as the front-end and PHP 6 and MySQL 5 as the back-ends. The results show the applicability of the new technique and its superiority over some existing ones.

Keywords: Web mining, graph theory, genetic algorithm, knowledge discovery
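
The abstract does not give the distance formula, so the following is only a minimal sketch of one common MCS-based distance, d(G1, G2) = 1 - |mcs(G1, G2)| / max(|G1|, |G2|). It rests on several assumptions that are not taken from the paper: each document is a directed graph whose nodes are unique terms and whose edges link adjacent terms, graph size is counted as nodes plus edges, and the names `DocGraph`, `build_graph`, `mcs` and `mcs_distance` are hypothetical. With unique node labels, the maximum common subgraph reduces to the intersection of the node sets and edge sets.

```python
from typing import FrozenSet, Iterable, NamedTuple, Tuple


class DocGraph(NamedTuple):
    """Hypothetical document graph: unique term nodes, directed adjacency edges."""
    nodes: FrozenSet[str]
    edges: FrozenSet[Tuple[str, str]]


def build_graph(tokens: Iterable[str]) -> DocGraph:
    """Build a graph from a token sequence (assumed representation)."""
    toks = list(tokens)
    nodes = frozenset(toks)
    # One directed edge for each pair of adjacent tokens.
    edges = frozenset(zip(toks, toks[1:]))
    return DocGraph(nodes, edges)


def size(g: DocGraph) -> int:
    """Graph size, assumed here to be node count plus edge count."""
    return len(g.nodes) + len(g.edges)


def mcs(g1: DocGraph, g2: DocGraph) -> DocGraph:
    """Maximum common subgraph when node labels are unique in each graph."""
    # An edge shared by both graphs has both endpoints in both node sets,
    # so plain set intersection yields a valid subgraph.
    return DocGraph(g1.nodes & g2.nodes, g1.edges & g2.edges)


def mcs_distance(g1: DocGraph, g2: DocGraph) -> float:
    """Distance in [0, 1]: 0 for identical graphs, 1 for disjoint graphs."""
    denom = max(size(g1), size(g2))
    if denom == 0:
        return 0.0  # two empty graphs are treated as identical
    return 1.0 - size(mcs(g1, g2)) / denom


if __name__ == "__main__":
    a = build_graph("web content mining".split())
    b = build_graph("web content retrieval".split())
    print(mcs_distance(a, b))
```

A distance of this form could serve as the dissimilarity measure inside a clustering or GA fitness function; how the paper combines it with the genetic algorithm is not stated in the abstract.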


ABOUT THE AUTHORS

Moses Akinjide Adelola
PhD Research student

Sunday Olumide Adewale
Professor of Computer Science

Gabriel Babatunde Iwasokun
Lecturer/Researcher
