Abstract
Due to the explosive growth and popularity of the deep web, information extraction from deep web pages has attracted increasing attention. However, the HTML structure of web pages has become more complicated, which makes it difficult to recognize target content by analyzing the HTML source code alone. In this paper, we propose a method that extracts the informative blocks from a deep web page using layout features. We treat the visual rectangular region of an HTML element as a visual block of the web page, and we transform the layout of the elements within each visual block into a layout tree. By calculating the similarity between layout trees, we cluster the visual blocks that have similar layout features. Finally, the cluster with the largest area is extracted as the informative block cluster. The experimental results show that the method performs best when the layout tree similarity threshold is 0.4.
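The abstract does not say how layout trees are encoded or how their similarity is measured, so the following is only a minimal sketch of the pipeline under stated assumptions. Each layout tree is modeled as an ordered labeled tree (`LayoutNode`, hypothetical). Similarity is the normalized simple tree matching score 2·STM(A, B) / (|A| + |B|), a standard technique from web data extraction that is swapped in here and is not necessarily the paper's measure. Clustering is a greedy single pass that compares each block with the first member of every existing cluster. The 0.4 threshold comes from the abstract; the block names, sizes, and tree labels in the usage part are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LayoutNode:
    """Hypothetical layout-tree node: a layout label plus ordered children."""
    label: str
    children: list[LayoutNode] = field(default_factory=list)


def tree_size(node: LayoutNode) -> int:
    return 1 + sum(tree_size(c) for c in node.children)


def simple_tree_matching(a: LayoutNode, b: LayoutNode) -> int:
    """Size of the maximum top-down matching between two ordered labeled trees."""
    if a.label != b.label:
        return 0
    m, n = len(a.children), len(b.children)
    # dp[i][j]: best matching using the first i children of a and first j of b
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(
                dp[i - 1][j],
                dp[i][j - 1],
                dp[i - 1][j - 1]
                + simple_tree_matching(a.children[i - 1], b.children[j - 1]),
            )
    return dp[m][n] + 1  # +1 for the matched roots


def layout_similarity(a: LayoutNode, b: LayoutNode) -> float:
    """Assumed similarity in [0, 1]; identical trees score 1.0."""
    return 2 * simple_tree_matching(a, b) / (tree_size(a) + tree_size(b))


@dataclass
class VisualBlock:
    """Hypothetical visual block: the rectangle of an HTML element plus its layout tree."""
    name: str
    width: float
    height: float
    layout: LayoutNode

    @property
    def area(self) -> float:
        return self.width * self.height


def cluster_blocks(blocks: list[VisualBlock], threshold: float = 0.4) -> list[list[VisualBlock]]:
    """Greedy pass: join the first cluster whose seed block is similar enough."""
    clusters: list[list[VisualBlock]] = []
    for block in blocks:
        for cluster in clusters:
            if layout_similarity(cluster[0].layout, block.layout) >= threshold:
                cluster.append(block)
                break
        else:
            clusters.append([block])
    return clusters


def informative_cluster(blocks: list[VisualBlock], threshold: float = 0.4) -> list[VisualBlock]:
    """Pick the cluster whose blocks cover the largest total area."""
    clusters = cluster_blocks(blocks, threshold)
    return max(clusters, key=lambda c: sum(b.area for b in c))


# Usage with made-up blocks: two record-like blocks and one navigation block.
def row(*kids: LayoutNode) -> LayoutNode:
    return LayoutNode("row", list(kids))


def cell() -> LayoutNode:
    return LayoutNode("cell")


record = LayoutNode("block", [row(cell(), cell()), row(cell(), cell())])
record2 = LayoutNode("block", [row(cell(), cell()), row(cell(), cell(), cell())])
nav = LayoutNode("block", [LayoutNode("col", [cell()])])

r1 = VisualBlock("item1", 600, 100, record)
r2 = VisualBlock("item2", 600, 120, record2)
n = VisualBlock("nav", 200, 500, nav)

best = informative_cluster([r1, r2, n])
print([b.name for b in best])  # ['item1', 'item2']
```

Comparing against a cluster's first member keeps the pass linear in the number of clusters per block; a different representative (for example, the most central member) or an agglomerative scheme could be substituted without changing the rest of the sketch.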