作成者 |
|
|
|
本文言語 |
|
出版者 |
|
|
発行日 |
|
収録物名 |
|
巻 |
|
出版タイプ |
|
アクセス権 |
|
関連DOI |
|
|
関連URI |
|
|
関連情報 |
|
|
概要 |
We define the best consensus motif (BCM) problem motivated by the problem of extracting motifs from nucleic acid and amino acid sequences. A type over an alphabet $ Sigma $ is a family $ Omega $ of su...bsets of $ Sigma^* $. A motif ;$ p $ of type $ Sigma $ is a string $ pi = pi1... pi_n $ of motif components, each of which stands for an element in $ Omega $. The BCM problem for $ Omega $ is, given a yes-no sample $ S = left{(alpha ^(i), \beta ^(1), . . . ,(alpha^(m),\beta^(m))\right} $ of pairs of strings in $ Sigma^* $ with $ alpha ^(i) \eq \beta^(i) $ for $ 1leq i leq m, $ to find a motif $ pi $ of type $ Omega $ that maximizes the number of good pairs in S, where$ left (alpha^(i),\beta^(i) \right) $ is good for $ pi $ if $ pi $ accepts $alpha^(i) $ and rejects ,$ \beta ^(i) $ We prove that the BCM problem is NP-complete even for a very simple type $ Omega_i = left{z mid 0 \eq z subseteq Sigma \right} $, which is used, in practice, for describing protein motifs in the PROSITE database. We also show that the NP-completeness of the problem does not change for the type $ Omega_ infty = Omega _1 cup left{Sigma^[ij] mid 1 leq i leq j \right} $ , where $ Sigma^[ij] $ is the set of strings over $ Sigma $ of length between i and j. Furthermore, for the BCM problem for $ Omega $, we provide a polynomial-time greedy algorithm based on the probabilistic method. Its performance analysis shows an explicit approximation ratio of the algorithm.続きを見る
|