Extracting partial structures from HTML documents - Kyushu University Collections | Kyushu University Library

<Technical Report>
Extracting partial structures from HTML documents

Creator	Author ID: K000167; Name: Sakamoto, Hiroshi (坂本, 比呂志); Affiliation: Department of Informatics, Kyushu University
	Name: Arimura, Hiroki (有村, 博紀); Affiliation: Department of Informatics, Kyushu University
	Author ID: K000164; Name: Arikawa, Setsuo (有川, 節夫); Affiliation: Department of Informatics, Kyushu University
Language	English
Publisher	Department of Informatics, Kyushu University (九州大学大学院システム情報科学研究院情報理学部門)
Issue date	2000-11
Source title	DOI Technical Report
Volume	181
Publication type	Accepted Manuscript
Access rights	open access
Related information	DOI Technical Report || 181
Related URI	http://www.i.kyushu-u.ac.jp/research/report.html
Abstract	A new wrapper model for extracting text data from HTML documents is introduced. Kushmerick's wrapper class (Kushmerick 2000) may fail when sufficiently long delimiters cannot be found. The wrapper class introduced in this paper partially overcomes this difficulty by using the tree structures of HTML documents. The problem of learning such a wrapper program from given text is considered. Moreover, we try to extend our wrapper to extract portions of HTML, not only text attributes.
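To make the delimiter problem concrete, the following is a minimal Python sketch of how a wrapper from Kushmerick's LR class is executed: the wrapper is a list of (left, right) delimiter pairs, one per attribute, and each attribute is read as the text between the next occurrence of its left delimiter and the following right delimiter. This illustrates only the delimiter-based baseline the abstract refers to, not the tree-based wrapper proposed in this report; the choice of the LR class as representative, the function name `exec_lr`, and the sample page are all assumptions for illustration.

```python
from typing import List, Tuple


def exec_lr(page: str, delimiters: List[Tuple[str, str]]) -> List[Tuple[str, ...]]:
    """Extract tuples from `page` with an LR-style wrapper (hypothetical helper).

    `delimiters` is [(l1, r1), ..., (lK, rK)]; attribute k of each tuple is the
    text between the next occurrence of lk and the following occurrence of rk.
    """
    if not delimiters or not all(left and right for left, right in delimiters):
        raise ValueError("delimiters must be non-empty strings")
    tuples = []
    pos = 0
    first_left = delimiters[0][0]
    # Keep extracting tuples while the first left delimiter still occurs.
    while page.find(first_left, pos) != -1:
        row = []
        for left, right in delimiters:
            start = page.find(left, pos)
            if start == -1:
                return tuples  # page ends mid-tuple: stop extraction
            start += len(left)
            end = page.find(right, start)
            if end == -1:
                return tuples
            row.append(page[start:end])
            pos = end + len(right)  # resume scanning after the right delimiter
        tuples.append(tuple(row))
    return tuples


# Hypothetical page: a bold header followed by two list items.
page = "<P><B>Country Codes</B></P><LI><B>Congo</B> <I>242</I><LI><B>Egypt</B> <I>20</I>"

# The short delimiter "<B>" also matches the header.
print(exec_lr(page, [("<B>", "</B>"), ("<I>", "</I>")]))
# [('Country Codes', '242'), ('Egypt', '20')]

# A longer delimiter "<LI><B>" distinguishes the data rows.
print(exec_lr(page, [("<LI><B>", "</B>"), ("<I>", "</I>")]))
# [('Congo', '242'), ('Egypt', '20')]
```

With the short left delimiter, the header text is paired with Congo's code and the Congo row is lost; the longer delimiter avoids this only because the sample page happens to contain one. When no such distinguishing string exists in a real page, a purely delimiter-based wrapper has nothing to fall back on, which is the situation the abstract says tree structure can partially address.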

Full-text files

File	File type	Size	Views	Description
trcs181	pdf	97.2 KB	159
trcs181.ps	gz	135 KB	87

Details

Record ID	3040
Peer reviewed	No
Subject	data extraction
	wrapper induction
	semistructured data
	learning from examples
Type	Technical Report
Date registered	2009.04.22
Date updated	2018.08.31