A Common Pattern Discovery Algorithm Based on Substring Amplification - 九大コレクション | Kyushu University Library

<Journal Article>
A Common Pattern Discovery Algorithm Based on Substring Amplification (部分文字列増幅法による共通パターン発見アルゴリズム)

Creator	Author ID: 00294992; Name: 池田大輔 (IKEDA DAISUKE); Affiliation: (Present address) Kyushu University Library
	Name: 山田泰寛 (YAMADA YASUHIRO); Affiliation: Department of Informatics, Kyushu University
	Author ID: 40126785; Name: 廣川佐千男 (HIROKAWA SACHIO); Affiliation: Computing and Communications Center, Kyushu University
Language	Japanese
Publisher	Information Processing Society of Japan (IPSJ)
Date issued	2005-01-15
Journal title	情報処理学会論文誌. 数理モデル化と応用 (IPSJ Transactions on Mathematical Modeling and Its Applications)
Volume	46
Issue	2
Start page	56
End page	66
Publication type	Version of Record
Access rights	open access
Rights	(C) 2005 by the Information Processing Society of Japan
Related DOI
Related URI	Same as http://ci.nii.ac.jp/naid/110002768729
Related HDL
Abstract	In this paper, we consider the problem of finding the parts common to a set of given strings. We formulate it as the template discovery problem: given a set of strings generated by some pattern, find the constant parts of that pattern. A pattern is a string over constant and variable symbols, and it generates strings by replacing each variable with a constant string. We assume that the frequency distribution of substrings in the substituted strings follows a power law, and we construct an algorithm that solves the problem with high probability. Although the longest common subsequence problem, one of the best-known common-part discovery problems, is well known to be NP-complete, we show that the template discovery problem can be solved in O(n) time with high probability, where n is the total length of the input strings. This complexity is achieved through three contributions: a reformulation of the problem, the use of a set of substrings to represent a template, and the use of substring frequency and total number of occurrences to find substrings common to the input strings. Moreover, by applying the algorithm to real data from the Web, we show that it is robust to noise and remains effective when the input strings are generated by several different patterns.
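
The idea in the abstract of using substring frequency and total occurrence counts to find substrings common to all inputs can be illustrated with a minimal sketch. This is not the authors' substring amplification algorithm and does not achieve the O(n) bound; it is a simplified stand-in that counts fixed-length substrings (k-grams). The function name `common_kgrams`, the parameter `k`, and the sample strings are all hypothetical.

```python
from collections import Counter

def common_kgrams(strings, k):
    """Return the k-grams that occur in every input string,
    mapped to their total number of occurrences.

    Illustrative simplification only: the paper's method is not
    limited to a fixed substring length and runs in O(n) time.
    """
    total = Counter()     # total occurrences across all strings
    doc_freq = Counter()  # number of strings containing each k-gram
    for s in strings:
        grams = [s[i:i + k] for i in range(len(s) - k + 1)]
        total.update(grams)
        doc_freq.update(set(grams))
    return {g: total[g] for g in doc_freq if doc_freq[g] == len(strings)}

# Hypothetical inputs generated by the pattern "<b>x</b><i>y</i>",
# where x and y are variables replaced by constant strings.
strings = [
    "<b>apple</b><i>red</i>",
    "<b>kiwi</b><i>green</i>",
    "<b>plum</b><i>purple</i>",
]
common = common_kgrams(strings, 3)
```

Under these assumptions, k-grams drawn from the constant parts of the pattern (such as `<b>`) appear in every string, while k-grams from the substituted values (such as `app`) do not, so filtering by frequency separates candidate template fragments from variable content.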

Full-text file

File	File type	Size	Views	Description
hirokawa_220	pdf	260 KB	314

Details

Record ID	1316975
Peer review	Peer-reviewed
Related URI	http://ci.nii.ac.jp/naid/110002768729
ISSN	03875806
NCID	AA11464803
Notes	Use is limited to the scope permitted by copyright law
Date registered	2013.12.09
Date updated	2023.07.28