<Technical Report>
Optimized Substructure Discovery for Semi-structured Data

Author
Language
Publisher
Date of issue
Journal title
Publication type
Access rights
Abstract We address the problem of finding interesting substructures in a collection of semi-structured data such as XML or HTML. Our data mining framework is optimized pattern discovery, introduced by Fuku...da et al., where the goal of a mining algorithm is to discover a pattern that optimizes a given statistical measure, such as the information entropy, over a class of simple patterns. In this paper, modeling semi-structured data as labeled ordered trees, we study efficient algorithms for the optimized pattern discovery problem for this class. In a previous paper, we developed the rightmost expansion technique and the incremental occurrence update technique by generalizing the enumeration technique developed by Bayardo (SIGMOD'98) for discovering long itemsets, in order to implement an efficient frequent pattern miner for the class of labeled ordered trees. By combining these techniques with the pruning technique for optimized patterns of Morishita and Sese (PODS'00), we present an efficient algorithm for finding optimized patterns for labeled ordered trees of bounded size. Experimental results show that our algorithm performs well across a range of data sizes and parameter settings. We also show an approximation hardness result for labeled ordered trees of unbounded size.


trcs206 pdf 930 KB 105  
trcs206.ps gz 0.98 MB 119  

Details

Record ID
Peer reviewed
Related information
Type
Date registered 2009.04.22
Date updated 2018.08.31