Semantic Extraction from List Web Pages
Extracting structured information from web pages is a problem with many applications, and it has attracted growing interest in recent years.
We propose an approach for extracting the data contained in a list web page and describing it semantically. The approach is fully automatic and relies on a seed ontology that contains minimal information about the domain. It uses an instance-based classifier to characterize the attributes of the ontology. Unlike existing methods, our approach makes no assumptions about the design of web pages; it is entirely layout independent.
Experimental results on web pages drawn from several web sites across different domains show that the approach is effective.
Keywords: web information extraction; list web pages; probabilistic model; ontology
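To make the idea of an instance-based classifier over a seed ontology more concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract does not specify the features, the distance measure, or the probabilistic model used, so everything here is an assumption: the `SEED` dictionary, the character-class features, and the k-nearest-neighbour vote are hypothetical stand-ins. The one property it shares with the approach described above is that it looks only at the text of a value and never at the page layout.

```python
import math
from collections import Counter

# Hypothetical seed ontology: attribute name -> a few known example values.
# The real seed ontology and its contents are not given in the abstract.
SEED = {
    "price": ["$12.99", "$5.00", "$149.50"],
    "year": ["1999", "2007", "2012"],
    "title": ["The Old Man and the Sea", "Dune", "Brave New World"],
}


def features(text):
    """Map a string to a small feature vector that ignores page layout."""
    n = max(len(text), 1)
    digits = sum(c.isdigit() for c in text)
    letters = sum(c.isalpha() for c in text)
    spaces = sum(c.isspace() for c in text)
    other = len(text) - digits - letters - spaces
    # Shares of each character class, plus a capped, normalized length.
    return (digits / n, letters / n, spaces / n, other / n, min(len(text), 50) / 50)


def classify(text, seed=SEED, k=3):
    """Return the majority attribute among the k nearest seed instances."""
    target = features(text)
    scored = sorted(
        (math.dist(target, features(example)), attribute)
        for attribute, examples in seed.items()
        for example in examples
    )
    votes = Counter(attribute for _, attribute in scored[:k])
    return votes.most_common(1)[0][0]
```

With this toy seed, a call such as `classify("$7.25")` would be expected to vote for `price`, since its nearest stored instances are the price examples. A realistic system would need richer features and many more instances per attribute.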
ABOUT THE AUTHORS
Ismail Jellouli
He received a DESA degree in computer science in 2007 and is currently a Ph.D. student in computer science. He is an IEEE student member, and his research interests include information extraction, the semantic web, and reference reconciliation.
Mohammed El Mohajir
He holds a European Master in Environmental System Modeling (1992) and a Doctor of Science degree (1997). He is a professor at the department of computer sciences at the Faculty of Science Dhar Mahraz and vice-chair of the IEEE Morocco Section. His main research covers conceptual modeling; the design and development of decision-support information systems; ETL processes for data warehouses and SOLAP; distributed and parallel processing systems; and the semantic web.