An Efficient Algorithm for Approximate String Matching - 九大コレクション (Kyushu University Collection)

<Conference Paper>
An Efficient Algorithm for Approximate String Matching

Creators	Author ID K000012; Name: 中藤, 哲也 (Nakatoh, Tetsuya); Affiliation: Computing and Communications Center, Kyushu University
	Name: 馬場, 謙介 (Baba, Kensuke); Affiliation: Japan Science and Technology Corporation
	Author ID L002646; Name: 山田, 泰寛 (Yamada, Yasuhiro); Affiliation: Graduate School of Information Science and Electrical Engineering, Kyushu University
	Author ID 100021285; Name: 池田, 大輔 (Ikeda, Daisuke); Affiliation: Computing and Communications Center, Kyushu University
	Author ID K000008; Name: 廣川, 佐千男 (Hirokawa, Sachio); Affiliation: Computing and Communications Center, Kyushu University
Language	Japanese
Publication date	2003-06
Publication type	Accepted Manuscript
Access rights	open access
Related URI	http://matu.cc.kyushu-u.ac.jp/
Abstract	The problem of finding a specific pattern in a string is called the string matching problem. It has a wide range of applications, such as searching information on the Web and searching DNA sequences for specific patterns. String matching with mismatches, a type of approximate string matching, has a wider range of applications than plain string matching and is also more difficult. We propose an efficient algorithm that uses the fast Fourier transform (FFT) to compute the solution of this problem quickly. …computes the matching score at every position, with no limit on the number of mismatches; that is, it is harder than the k-mismatch problem. Our algorithm is a randomized algorithm that uses k samples, and its time complexity is O(kn log m). Here k ranges over 1 ≤ k ≤ |Σ|, and with k = |Σ| the algorithm becomes deterministic and always returns the correct scores. The algorithm therefore lets the user freely choose the trade-off between accuracy and computation time, up to and including obtaining the exact score vector.
	The problem of finding a pattern in a string is called the string matching problem. It has a wide range of applications, such as text search, search from databases, and information extraction from the Web. In particular, string matching with mismatches is more difficult than string matching. We propose a new efficient algorithm that solves string matching with mismatches quickly by using the fast Fourier transform (FFT). It does not restrict the number of mismatches. It is a randomized algorithm with time complexity O(kn log m), where k is the number of randomly sampled estimations and ranges from 1 to |Σ|. With k = |Σ| we can compute the exact score vector; that is, the algorithm can also be deterministic. We can thus freely choose the balance between time complexity and precision.
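The abstract gives the complexity but not the algorithm itself. As an illustration only, the Python sketch below shows the standard FFT-convolution idea behind exact match counting: for each character c of the alphabet Σ, correlate the 0/1 indicator vectors of c in the text and in the pattern, then sum over Σ. That costs |Σ| convolutions, which corresponds to the deterministic k = |Σ| case. The function `sampled_scores` is a hypothetical k-sample variant (correlate only k randomly chosen characters and rescale by |Σ|/k); it is unbiased and becomes exact when k = |Σ|, but it is an assumption and not necessarily the estimator the authors use. All function names are hypothetical; n and m are assumed to be the text and pattern lengths, and the score at a position is assumed to be the number of matching characters (so the mismatch count is m minus the score). For simplicity the sketch runs one FFT over the whole text, costing O(n log n) per character; reaching O(n log m) per sample would require splitting the text into overlapping blocks of length about 2m.

```python
import cmath
import random


def fft(a, invert=False):
    """Iterative radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    a = list(a)
    n = len(a)
    # Reorder the input into bit-reversed index order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly passes over blocks of doubling length.
    length = 2
    while length <= n:
        angle = (2 if invert else -2) * cmath.pi / length
        w_len = cmath.exp(1j * angle)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
                w *= w_len
        length <<= 1
    if invert:
        a = [x / n for x in a]
    return a


def char_correlation(text, pattern, c):
    """For every alignment i, count j with text[i + j] == pattern[j] == c."""
    n, m = len(text), len(pattern)
    size = 1
    while size < n + m - 1:
        size <<= 1
    # 0/1 indicator of c in the text, and in the reversed pattern.
    t = [1.0 if ch == c else 0.0 for ch in text] + [0.0] * (size - n)
    r = [1.0 if ch == c else 0.0 for ch in reversed(pattern)] + [0.0] * (size - m)
    ft, fr = fft(t), fft(r)
    conv = fft([x * y for x, y in zip(ft, fr)], invert=True)
    # Convolution index i + m - 1 holds the correlation at alignment i;
    # rounding removes floating-point noise left by the FFT.
    return [round(conv[i + m - 1].real) for i in range(n - m + 1)]


def exact_scores(text, pattern, alphabet):
    """Deterministic: sum per-character correlations over the whole alphabet.

    The alphabet must contain every character that occurs in the text.
    """
    scores = [0] * (len(text) - len(pattern) + 1)
    for c in alphabet:
        for i, s in enumerate(char_correlation(text, pattern, c)):
            scores[i] += s
    return scores


def sampled_scores(text, pattern, alphabet, k, rng=random):
    """Hypothetical k-sample estimator: correlate k random characters, rescale.

    Each character is chosen with probability k / |alphabet|, so scaling by
    |alphabet| / k gives an unbiased estimate; k == |alphabet| is exact.
    """
    sigma = len(alphabet)
    chosen = rng.sample(sorted(alphabet), k)
    partial = [0] * (len(text) - len(pattern) + 1)
    for c in chosen:
        for i, s in enumerate(char_correlation(text, pattern, c)):
            partial[i] += s
    return [s * sigma / k for s in partial]


if __name__ == "__main__":
    text, pattern = "abracadabra", "abra"
    sigma = set(text) | set(pattern)
    print(exact_scores(text, pattern, sigma))  # [4, 0, 1, 1, 1, 1, 0, 4]
    # Estimate from 2 of the 5 characters; the values depend on the seed.
    print(sampled_scores(text, pattern, sigma, k=2, rng=random.Random(0)))
```

Setting k = |Σ| in `sampled_scores` reproduces `exact_scores`, mirroring the deterministic case described in the abstract; a smaller k performs fewer convolutions at the cost of a noisier estimate.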

Full-text files

File	File type	Size	Views	Description
2003_b_1	pdf	411 KB	313

Details

Record ID	2962
Peer reviewed	Yes
Subjects	string matching
	approximate string matching
	mismatch
	FFT
	convolution
	randomized algorithm
	deterministic algorithm
	pattern discovery and extraction
Note	The 14th Data Engineering Workshop (DEWS2003), 6-B-03, June 2003.
Type	Conference paper
Registered	2009.04.22
Updated	2018.08.31
