<journal article>
Extraction of Informative Blocks from Deep Web Page Using Similar Layout Feature

Creator
Language
Publisher
Date
Source Title
Vol
Issue
First Page
Last Page
Publication Type
Access Rights
Abstract Due to the explosive growth and popularity of the deep web, information extraction from deep web pages has gained increasing attention. However, the HTML structure of web pages has become more complicated, making it difficult to recognize target content by analyzing the HTML source code alone. In this paper, we propose a method to extract the informative blocks from a deep web page using the layout feature. We consider the visual rectangular region of an HTML element as a visual block in a web page. We transform the layout of the elements in a visual block into a layout tree. By calculating the similarity of layout trees, we cluster the visual blocks that have similar layout features. Finally, the cluster with the largest area is extracted as the informative block cluster. The experimental results show that this method performs best when the layout tree similarity threshold is 0.4.
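
The pipeline described in the abstract (visual blocks, layout trees, similarity-based clustering, largest-area cluster) can be illustrated with a minimal sketch. The abstract does not define the similarity measure or the clustering procedure, so the code below makes assumptions: similarity is taken to be the Jaccard overlap of root-to-node tag paths, and clustering is a simple greedy pass that compares each block with the first member of each cluster. The names `Block`, `layout_paths`, `similarity`, `cluster_blocks`, and `informative_cluster` are hypothetical and are not taken from the paper. Only the 0.4 threshold comes from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A visual block: an element's tag, its rendered size, and its child blocks."""
    tag: str
    width: float
    height: float
    children: list = field(default_factory=list)

    @property
    def area(self) -> float:
        return self.width * self.height


def layout_paths(block, prefix=""):
    """Flatten a layout tree into the set of its root-to-node tag paths."""
    path = f"{prefix}/{block.tag}"
    paths = {path}
    for child in block.children:
        paths |= layout_paths(child, path)
    return paths


def similarity(a, b):
    """Assumed measure: Jaccard overlap of tag paths (not the paper's definition)."""
    pa, pb = layout_paths(a), layout_paths(b)
    return len(pa & pb) / len(pa | pb)


def cluster_blocks(blocks, threshold=0.4):
    """Greedy clustering: join the first cluster whose representative is similar enough."""
    clusters = []
    for block in blocks:
        for cluster in clusters:
            if similarity(block, cluster[0]) >= threshold:
                cluster.append(block)
                break
        else:
            # No existing cluster matched, so this block starts a new one.
            clusters.append([block])
    return clusters


def informative_cluster(blocks, threshold=0.4):
    """Return the cluster whose blocks cover the largest total area."""
    clusters = cluster_blocks(blocks, threshold)
    return max(clusters, key=lambda c: sum(b.area for b in c))
```

In practice the block rectangles would come from a rendering engine rather than from hand-built objects. With this sketch, several result records sharing the same `div > img, span` layout would fall into one cluster and outweigh a single larger advertisement block by total area.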

pdf hirokawa_19 pdf 599 KB 373  

Details

Record ID
Peer-Reviewed
Subject Terms
ISSN
Created Date 2015.12.03
Modified Date 2020.10.15
