<テクニカルレポート>
Extracting Best Consensus Motifs from Positive and Negative Examples

作成者
本文言語
出版者
発行日
雑誌名
出版タイプ
アクセス権
概要 We define the best consensus motif (BCM) problem motivated by the problem of extracting motifs from nucleic acid and amino acid sequences. A type over an alphabet $ Sigma $ is a family $ Omega $ of su...bsets of $ Sigma^* $. A motif ;$ p $ of type $ Sigma $ is a string $ pi = pi1... pi_n $ of motif components, each of which stands for an element in $ Omega $. The BCM problem for $ Omega $ is, given a yes-no sample $ S = left{(alpha ^(i), \beta ^(1), . . . ,(alpha^(m),\beta^(m))\right} $ of pairs of strings in $ Sigma^* $ with $ alpha ^(i) \ eq \beta^(i) $ for $ 1leq i leq m, $ to find a motif $ pi $ of type $ Omega $ that maximizes the number of good pairs in S, where$ left (alpha^(i),\beta^(i) \right) $ is good for $ pi $ if $ pi $ accepts $alpha^(i) $ and rejects ,$ \beta ^(i) $ We prove that the BCM problem is NP-complete even for a very simple type $ Omega_i = left{z mid 0 \ eq z subseteq Sigma \right} $, which is used, in practice, for describing protein motifs in the PROSITE database. We also show that the NP-completeness of the problem does not change for the type $ Omega_ infty = Omega _1 cup left{Sigma^[ij] mid 1 leq i leq j \right} $ , where $ Sigma^[ij] $ is the set of strings over $ Sigma $ of length between i and j. Furthermore, for the BCM problem for $ Omega $, we provide a polynomial-time greedy algorithm based on the probabilistic method. Its performance analysis shows an explicit approximation ratio of the algorithm.続きを見る

本文情報を非表示

rifis-tr-115 pdf 0.98 MB 66  

詳細

レコードID
査読有無
関連情報
主題
タイプ
登録日 2009.04.22
更新日 2017.01.20