Gathering Text Files Generated from Templates - Collections | Kyushu University Library

＜conference paper＞
Gathering Text Files Generated from Templates

Creator	Ikeda, Daisuke (池田, 大輔; イケダ, ダイスケ); Author PID: K000021; Affiliation: Kyushu University Library (九州大学附属図書館)
Creator	Yamada, Yasuhiro (山田, 泰寛; ヤマダ, ヤスヒロ); Author PID: L002646; Affiliation: Department of Informatics, Kyushu University (九州大学システム情報科学府)
Language	Japanese
Date	2004-08-30
Source Title	Proceedings of the 30th VLDB Conference
Publication Type	Accepted Manuscript
Access Rights	open access
Related URI	Proceedings of the 30th VLDB Conference
Related URI	http://www.i.kyushu-u.ac.jp/index.html
Related URI	isIdenticalTo http://cips.eas.asu.edu/iiwebfinalproceedings/14.pdf
Abstract	Information integration comprises three steps: data discovery, information extraction, and information integration. In this paper, we focus on the data discovery step, which is crucial for the following steps. We first define what data discovery is from the viewpoint of information extraction. The problem is, given a large number of files, to find sets of files such that the files in each set share a template. Each set corresponds to a template, and multiple templates could be hidden in the given files. We exploit a linear-time algorithm that was originally developed by the authors for the common parts detection problem. The algorithm found different templates from collected Web pages including many noise files. We can cluster files according to the found templates. The files of a cluster are used as input data for an information extraction algorithm.

File	FileType	Size	Views	Description
14	pdf	957 KB	450

Details

Record ID	6076
Peer-Reviewed	Unrefereed
Related URI	http://cips.eas.asu.edu/iiwebfinalproceedings/14.pdf
Subject Terms	data discovery
	information extraction
	information integration
Notes	The 30th VLDB Conference, 30 August 2004, Toronto, Canada
Type	Conference Paper
Created Date	2009.04.22
Modified Date	2020.10.13