<テクニカルレポート>
A Template Discovery Algorithm by Substring Amplification

作成者
本文言語
出版者
発行日
雑誌名
出版タイプ
アクセス権
概要 In this paper, we consider to find a set of substrings common to given strings. We define this problem as the template discovery problem which is, given a set of strings generated by some fixed but un...known pattern, to find the constant parts of the pattern. A pattern is a string over constant and variable symbols. It generates strings by replacing variables into constant strings.We assume that the frequency distribution of replaced strings follow a power-law distribution. Although the longest common subsequence problem, which is one of the famous common part discovery problems, is well-known to be NP-complete, we show that the template discovery problem can be solved in linear time with high probability. This complexity is achieved due to the following our contributions: reformulation of the problem, using a set of substrings to express a string, and counting all occurrences $ F( f ) $ with frequency $ f $ instead of just frequency $ f $. We demonstrate the effectiveness of the proposed algorithm using data on the Web. Moreover, we show noise robustness and effectiveness even when input strings are generated by a union of patterns and pattern with the iterate operation.続きを見る

本文情報を非表示

trcs220 pdf 162 KB 49  

詳細

レコードID
査読有無
関連情報
注記
タイプ
登録日 2009.04.22
更新日 2017.01.18