<Journal Article>
Extraction of Informative Blocks from Deep Web Page Using Similar Layout Feature

Creator
Language
Publisher
Date of issue
Journal title
Start page
End page
Publication type
Access rights
Abstract
Due to the explosive growth and popularity of the deep web, information extraction from deep web pages has gained increasing attention. However, the HTML structure of web pages has become more complicated, making it difficult to recognize target content by analyzing the HTML source code alone. In this paper, we propose a method to extract the informative blocks from a deep web page using layout features. We treat the visual rectangular region of an HTML element as a visual block in the web page. We transform the layout of the elements within a visual block into a layout tree. By calculating the similarity between layout trees, we cluster the visual blocks that have similar layout features. Finally, the cluster with the largest area is extracted as the informative block cluster. The experimental results show that the method performs best when the layout tree similarity threshold is 0.4.
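The pipeline in the abstract (visual blocks, layout trees, similarity-based clustering, largest-area cluster) can be illustrated with a minimal sketch. The abstract does not say how layout trees are labeled, how their similarity is computed, or which clustering algorithm is used, so everything below is an assumption: `LayoutNode`, `VisualBlock`, and the node labels are hypothetical; similarity is swapped in as Simple Tree Matching normalized to [0, 1], a common ordered-tree measure in web data extraction; and clustering is a simple greedy single pass. Only the 0.4 threshold comes from the abstract.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LayoutNode:
    """Hypothetical layout-tree node; `label` names a layout kind such as "row" or "col"."""
    label: str
    children: List["LayoutNode"] = field(default_factory=list)


@dataclass
class VisualBlock:
    """Hypothetical visual block: an element's rendered rectangle plus its layout tree."""
    name: str
    width: float
    height: float
    tree: LayoutNode

    @property
    def area(self) -> float:
        return self.width * self.height


def tree_size(node: LayoutNode) -> int:
    return 1 + sum(tree_size(c) for c in node.children)


def simple_tree_matching(a: LayoutNode, b: LayoutNode) -> int:
    """Number of nodes in the largest top-down matching of two ordered trees."""
    if a.label != b.label:
        return 0
    m, n = len(a.children), len(b.children)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(
                dp[i - 1][j],
                dp[i][j - 1],
                dp[i - 1][j - 1]
                + simple_tree_matching(a.children[i - 1], b.children[j - 1]),
            )
    return dp[m][n] + 1  # +1 for the matched roots


def layout_similarity(a: LayoutNode, b: LayoutNode) -> float:
    """Assumed normalization: 2 * matched nodes / (size(a) + size(b))."""
    return 2 * simple_tree_matching(a, b) / (tree_size(a) + tree_size(b))


def cluster_blocks(blocks: List[VisualBlock], threshold: float = 0.4) -> List[List[VisualBlock]]:
    """Greedy clustering: add each block to the first cluster whose seed is similar enough."""
    clusters: List[List[VisualBlock]] = []
    for block in blocks:
        for cluster in clusters:
            if layout_similarity(cluster[0].tree, block.tree) >= threshold:
                cluster.append(block)
                break
        else:  # no similar cluster found: start a new one
            clusters.append([block])
    return clusters


def informative_cluster(blocks: List[VisualBlock], threshold: float = 0.4) -> List[VisualBlock]:
    """Return the cluster whose blocks cover the largest total area."""
    clusters = cluster_blocks(blocks, threshold)
    return max(clusters, key=lambda c: sum(b.area for b in c))


# Usage: two result records share a layout; a navigation bar does not.
record = LayoutNode("col", [LayoutNode("title"),
                            LayoutNode("row", [LayoutNode("price"), LayoutNode("text")])])
nav = LayoutNode("row", [LayoutNode("link"), LayoutNode("link"), LayoutNode("link")])
blocks = [
    VisualBlock("result-1", 600, 120, record),
    VisualBlock("nav", 900, 40, nav),
    VisualBlock("result-2", 600, 120, record),
]
print([b.name for b in informative_cluster(blocks)])  # ['result-1', 'result-2']
```

Comparing each block only against a cluster's first member keeps the sketch short. A real implementation might instead use average similarity or a proper clustering algorithm, and would obtain block rectangles from a rendering engine rather than hard-coding them.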

Full-text files

hirokawa_19 pdf 599 KB 223  

Details

Record ID
Peer reviewed
Subject
ISSN
Registered 2015.12.03
Updated 2020.10.15
