## ＜テクニカルレポート＞Extracting Best Consensus Motifs from Positive and Negative Examples

作成者 作成者名 所属機関 所属機関名 Department of Information Systems, Kyushu University 九州大学大学院総合理工学研究科情報システム学専攻 著者識別子 作成者名 所属機関 所属機関名 Department of Information Systems, Kyushu University 九州大学大学院総合理工学研究科情報システム学専攻 著者識別子 作成者名 所属機関 所属機関名 Research Institute of Fundamental Information Science Kyushu University 九州大学理学部附属基礎情報学研究施設 英語 Research Institute of Fundamental Information Science, Kyushu University 九州大学理学部附属基礎情報学研究施設 1995-06-08 115 Accepted Manuscript open access We define the best consensus motif (BCM) problem motivated by the problem of extracting motifs from nucleic acid and amino acid sequences. A type over an alphabet $Sigma$ is a family $Omega$ of su...bsets of $Sigma^*$. A motif ;$p$ of type $Sigma$ is a string $pi = pi1... pi_n$ of motif components, each of which stands for an element in $Omega$. The BCM problem for $Omega$ is, given a yes-no sample $S = left{(alpha ^(i), \beta ^(1), . . . ,(alpha^(m),\beta^(m))\right}$ of pairs of strings in $Sigma^*$ with $alpha ^(i) \ eq \beta^(i)$ for $1leq i leq m,$ to find a motif $pi$ of type $Omega$ that maximizes the number of good pairs in S, where$left (alpha^(i),\beta^(i) \right)$ is good for $pi$ if $pi$ accepts $alpha^(i)$ and rejects ,$\beta ^(i)$ We prove that the BCM problem is NP-complete even for a very simple type $Omega_i = left{z mid 0 \ eq z subseteq Sigma \right}$, which is used, in practice, for describing protein motifs in the PROSITE database. We also show that the NP-completeness of the problem does not change for the type $Omega_ infty = Omega _1 cup left{Sigma^[ij] mid 1 leq i leq j \right}$ , where $Sigma^[ij]$ is the set of strings over $Sigma$ of length between i and j. Furthermore, for the BCM problem for $Omega$, we provide a polynomial-time greedy algorithm based on the probabilistic method. Its performance analysis shows an explicit approximation ratio of the algorithm.続きを見る

### 本文情報を非表示

rifis-tr-115 pdf 0.98 MB 74

### 詳細

レコードID 3217 査読無 RIFIS Technical Report || 115 || p1-12 http://www.i.kyushu-u.ac.jp/research/report.html algorithms and computational complexity genome informatics テクニカルレポート 2009.04.22 2017.01.20