作成者 |
|
|
|
|
本文言語 |
|
出版者 |
|
|
発行日 |
|
収録物名 |
|
巻 |
|
出版タイプ |
|
アクセス権 |
|
関連DOI |
|
|
関連URI |
|
|
関連情報 |
|
|
概要 |
In this paper, we study a data mining problem of discovering frequent substructures in a large collection of semi-structured data, where both of the patterns and the data are modeled by labeled unorde...red trees. An unordered tree is a directed acyclic graph with a specified node called the root, and all nodes but the root have at most one parent. Each node is labeled by a symbol drawn from an alphabet. Such unordered trees can be seen as either a generalization of itemsets in relational databases or an efficient specialization of attributed graphs in graph mining. They are also useful in various applications such as analysis of chemical compounds and mining hyperlink structures in Web. Introducing novel definitions of the support and the canonical form for unordered trees, we present an efficient algorithm called Unot that computes all labeled unordered trees appearing in a collection of data trees with frequency above a user-specified threshold. We prove that the algorithm enumerates each frequent pattern T in $ O(kb^2n) $ per pattern, where $ k $ is the size of $ T $, $ b $ is the branching factor of the data tree, and $ n $ is the total number of occurrences of $ T $ in the data trees. The keys of the algorithm are efficient enumerating all unordered trees in canonical form and incrementally computation of the occurrences based on a powerful design technique known as the reverse search.続きを見る
|