<テクニカルレポート>
Extracting Best Consensus Motifs from Positive and Negative Examples

作成者
本文言語
出版者
発行日
収録物名
出版タイプ
アクセス権
関連DOI
関連URI
関連情報
概要 We define the best consensus motif (BCM) problem motivated by the problem of extracting motifs from nucleic acid and amino acid sequences. A type over an alphabet $ Sigma $ is a family $ Omega $ of su...bsets of $ Sigma^* $. A motif ;$ p $ of type $ Sigma $ is a string $ pi = pi1... pi_n $ of motif components, each of which stands for an element in $ Omega $. The BCM problem for $ Omega $ is, given a yes-no sample $ S = left{(alpha ^(i), \beta ^(1), . . . ,(alpha^(m),\beta^(m))\right} $ of pairs of strings in $ Sigma^* $ with $ alpha ^(i) \eq \beta^(i) $ for $ 1leq i leq m, $ to find a motif $ pi $ of type $ Omega $ that maximizes the number of good pairs in S, where$ left (alpha^(i),\beta^(i) \right) $ is good for $ pi $ if $ pi $ accepts $alpha^(i) $ and rejects ,$ \beta ^(i) $ We prove that the BCM problem is NP-complete even for a very simple type $ Omega_i = left{z mid 0 \eq z subseteq Sigma \right} $, which is used, in practice, for describing protein motifs in the PROSITE database. We also show that the NP-completeness of the problem does not change for the type $ Omega_ infty = Omega _1 cup left{Sigma^[ij] mid 1 leq i leq j \right} $ , where $ Sigma^[ij] $ is the set of strings over $ Sigma $ of length between i and j. Furthermore, for the BCM problem for $ Omega $, we provide a polynomial-time greedy algorithm based on the probabilistic method. Its performance analysis shows an explicit approximation ratio of the algorithm.続きを見る

本文ファイル

pdf rifis-tr-115 pdf 0.98 MB 448  

詳細

レコードID
査読有無
主題
タイプ
登録日 2009.04.22
更新日 2021.12.13