Gathering Text Files Generated from Templates - Kyudai Collection | Kyushu University Library

<Conference Paper>
Gathering Text Files Generated from Templates

Creator	Author ID: 100021285; Name: Ikeda, Daisuke (池田, 大輔 / イケダ, ダイスケ); Affiliation: Kyushu University Library (九州大学附属図書館)
Creator	Author ID: L002646; Name: Yamada, Yasuhiro (山田, 泰寛 / ヤマダ, ヤスヒロ); Affiliation: Department of Informatics, Kyushu University (九州大学システム情報科学府)
Text language	Japanese
Date issued	2004-08-30
Published in	Proceedings of the 30th VLDB Conference
Publication type	Accepted Manuscript
Access rights	open access
Related URI	Proceedings of the 30th VLDB Conference
Related URI	http://www.i.kyushu-u.ac.jp/index.html
Related URI	Same as: http://cips.eas.asu.edu/iiwebfinalproceedings/14.pdf
Abstract	Information integration comprises three steps: data discovery, information extraction, and information integration. In this paper, we focus on the data discovery step, which is crucial for the following steps. We first define what data discovery is from the viewpoint of information extraction. Given a large number of files, the problem is to find sets of files such that the files in each set share a template. Each set corresponds to a template, and multiple templates could be hidden in the given files. We exploit a linear-time algorithm that was originally developed by the authors for the common parts detection problem. The algorithm found different templates in collected Web pages that included many noise files. We can cluster files according to the found templates. The files of a cluster are used as input data for an information extraction algorithm.

Full-text files

File	File type	Size	Views	Description
14	pdf	957 KB	545

Details

Record ID	6076
Peer review	Not peer-reviewed
Related URI	http://cips.eas.asu.edu/iiwebfinalproceedings/14.pdf
Subject	data discovery
	information extraction
	information integration
Note	The 30th VLDB Conference, 30 August 2004, Toronto, Canada
Type	Conference Paper
Date registered	2009.04.22
Date updated	2020.10.13