Friday 23rd of February 2018

A Vision Based Approach for Web Data Extraction Using Enhanced Cocitation Algorithm

R.Vijay and K.Prasadh

The World Wide Web is backed by a large number of databases whose data records are retrieved through web query interfaces. Information that is hidden in such databases and can be reached only through dynamically generated script pages is termed deep web content. Deep web content is normally accessed through web queries, but extracting structured data from a web database is complex. To address this issue, Wei Liu et al. presented a programming-language-independent, vision-based approach that uses the visual features of deep web pages for web data extraction. The approach covers both data record extraction and data item extraction. However, Liu's approach, ViDE, leaves two issues unsolved: it processes only deep web pages that contain a single data region, and it consumes additional time to extract the visual information of web pages. To address these shortcomings of ViDE, a novel vision-based technique for deep web data extraction is presented. In this work, we describe a framework that processes deep web pages containing multiple data regions. The framework uses an enhanced co-citation algorithm that, instead of relying on a new set of APIs to extract visual information, retrieves the visual information of deep web pages directly from the web database. Empirical studies on a large set of databases demonstrate that the proposed vision-based approach (VBEC) offers high precision and accurate recall for similar queries, with lower time consumption than other extraction processes.

Keywords: Deep web data, vision based approach, multi data regions, co-citation algorithm, visual features, and web data extraction
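
The abstract evaluates extraction quality by precision and recall. As a point of reference, the short Python sketch below computes both measures for a set of extracted data records against a hand-labelled reference set. It uses the standard definitions only; the function name `precision_recall` and the record identifiers are illustrative assumptions and are not taken from the paper, which may compute its figures differently.

```python
def precision_recall(extracted, relevant):
    """Return (precision, recall) for extracted records vs. a reference set.

    Both arguments are iterables of hashable record identifiers.
    The identifiers used in the example are hypothetical.
    """
    extracted = set(extracted)
    relevant = set(relevant)
    # Records that were both extracted and actually relevant.
    true_positives = len(extracted & relevant)
    # Guard against division by zero when either set is empty.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Four records were extracted; three are in the reference set, two overlap.
p, r = precision_recall(["r1", "r2", "r3", "r4"], ["r1", "r2", "r5"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision falls when the extractor returns records that are not in the reference set, while recall falls when it misses records that are.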

R.Vijay is a Research Scholar at M.S University.

Dr K.Prasadh is the principal of V.K College of Engineering, Paaripally, Kerala.

IJCSI is a refereed open access international journal for scientific papers dealing in all areas of computer science research...
