<Technical Report>
Efficient Substructure Discovery from Large Semi-structured Data

Author
Language
Publisher
Date issued
Journal name
Publication type
Access rights
Abstract In this paper, we consider a data mining problem for semi-structured data. Modeling semi-structured data as labeled ordered trees, we present an efficient algorithm for discovering frequent substructures from a large collection of semi-structured data. By extending the enumeration technique developed by Bayardo (SIGMOD'98) for discovering long itemsets, our algorithm scales almost linearly in the total size of the maximal tree patterns contained in an input collection, depending only mildly on the size of the longest pattern. We also develop several pruning techniques that significantly speed up the search. Experiments on Web data show that our algorithm, combined with the proposed pruning techniques, runs efficiently on real-life datasets over a wide range of parameters.

trcs200 pdf 493 KB 76  
trcs200.ps gz 474 KB 64  

Details

Record ID
Peer reviewed
Related information
Type
Date registered 2009.04.22
Date updated 2018.08.31