Discovering Frequent Substructures in Large Unordered Trees - Kyushu University Collections | Kyushu University Library

＜Technical Report＞
Discovering Frequent Substructures in Large Unordered Trees

Creator	Name: Asai, Tatsuya (浅井, 達哉); Affiliation: Kyushu University (九州大学)
	Name: Arimura, Hiroki (有村, 博紀); Affiliation: Kyushu University (九州大学)
	Name: Uno, Takeaki (宇野, 毅明); Affiliation: National Institute of Informatics (国立情報学研究所)
	Name: Nakano, Shin-ichi (中野, 眞一); Affiliation: Gunma University (群馬大学)
Language	English
Publisher	Department of Informatics, Kyushu University (九州大学大学院システム情報科学研究院情報理学部門)
Date of issue	2003-06
Source title	DOI Technical Report
Volume	216
Publication type	Accepted Manuscript
Access rights	open access
Related information	DOI Technical Report, Vol. 216
Related URI	http://www.i.kyushu-u.ac.jp/research/report.html
Abstract	In this paper, we study the data mining problem of discovering frequent substructures in a large collection of semi-structured data, where both the patterns and the data are modeled by labeled unordered trees. An unordered tree is a directed acyclic graph with a specified node called the root, in which every node other than the root has exactly one parent. Each node is labeled by a symbol drawn from an alphabet. Such unordered trees can be seen either as a generalization of itemsets in relational databases or as an efficient specialization of attributed graphs in graph mining. They are also useful in various applications, such as the analysis of chemical compounds and the mining of hyperlink structures on the Web. Introducing novel definitions of the support and the canonical form for unordered trees, we present an efficient algorithm called Unot that computes all labeled unordered trees appearing in a collection of data trees with frequency above a user-specified threshold. We prove that the algorithm enumerates each frequent pattern $T$ in $O(kb^2 n)$ time per pattern, where $k$ is the size of $T$, $b$ is the branching factor of the data tree, and $n$ is the total number of occurrences of $T$ in the data trees. The keys to the algorithm are the efficient enumeration of all unordered trees in canonical form and the incremental computation of occurrences, based on a powerful design technique known as reverse search.

Full-text files

File	File type	Size	Views	Description
trcs216	pdf	237 KB	698

Details

Record ID	3055
Peer review	Not peer-reviewed
Type	Technical Report
Date registered	2009.04.22
Date updated	2018.08.31