<Conference Paper>
Automatic Wrapper Generation for Multilingual Web Resources

Abstract: We present a wrapper generation system that extracts the contents of semi-structured documents containing instances of a record. Generation is fully automatic and relies only on general assumptions about the structure of instances. The system outputs a set of pairs of left and right delimiters, each pair surrounding the instances of one field. In addition to the input documents, the system receives a set of symbols with which a delimiter must begin or end. It treats semi-structured documents simply as strings, so it does not depend on any markup language or natural language, and it requires no training examples showing where instances are located. We report experimental results on both static and dynamic pages gathered from 13 Web sites, marked up in HTML or XML, and written in four natural languages. In addition to the usual contents, the generated wrappers extract useful information hidden in comments or tags, which other wrapper generation algorithms ignore. Some generated delimiters contain whitespace or multibyte characters.
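As an illustration of the output described in the abstract, the following minimal sketch shows how a wrapper made of left and right delimiter pairs could be applied to a document treated as a plain string. It covers only the application of a wrapper, not the generation algorithm itself, which the abstract does not detail. The function names `extract_field` and `apply_wrapper`, the sample page, and the sample delimiters are all hypothetical and are not taken from the paper.

```python
def extract_field(document: str, left: str, right: str) -> list[str]:
    """Return every substring that lies between `left` and the next `right`.

    The document is handled as a raw string, so the delimiters may contain
    markup, whitespace, or multibyte characters.
    """
    if not left or not right:
        raise ValueError("delimiters must be non-empty")
    instances = []
    pos = 0
    while True:
        start = document.find(left, pos)
        if start == -1:
            break
        start += len(left)
        end = document.find(right, start)
        if end == -1:
            break
        instances.append(document[start:end])
        pos = end + len(right)
    return instances


def apply_wrapper(document: str, wrapper: dict[str, tuple[str, str]]) -> dict[str, list[str]]:
    """Apply one (left, right) delimiter pair per field (hypothetical wrapper format)."""
    return {
        field: extract_field(document, left, right)
        for field, (left, right) in wrapper.items()
    }


# Hypothetical page: one field is visible text, the other is hidden in a comment.
page = (
    "<li><b>東京</b> <!-- id=13 --></li>"
    "<li><b>Paris</b> <!-- id=75 --></li>"
)
# Hypothetical wrapper: field name -> (left delimiter, right delimiter).
wrapper = {
    "name": ("<li><b>", "</b>"),
    "id": ("<!-- id=", " -->"),
}

records = apply_wrapper(page, wrapper)
print(records)  # {'name': ['東京', 'Paris'], 'id': ['13', '75']}
```

Because the extraction is plain substring search, the same code works whether the delimiters are HTML tags, XML tags, comment markers, or text in any natural language.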

Full-text file

DS02 (PDF, 133 KB)

Details

Registration date 2009.04.22
Last updated 2020.11.02
