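Author |
|
Language |
|
Publisher |
|
|
Date of issue |
|
Source title |
|
Volume |
|
Issue |
|
Start page |
|
End page |
|
Publication type |
|
Access rights |
|
Crossref DOI |
|
Related DOI |
|
|
Related URI |
|
|
Related information |
|
|
Abstract |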
We develop a method for learning the optimal strategies of a two-person zero-sum Markov game under the expected average reward criterion. At each stage, the players play a modified matrix game with ...relation to each state and then receive information about the result of the game from a teacher. Using this information, the players generate, for each state, a pair of mixed strategies to be used at the next stage. The pair of mixed strategies generated in this way converges with probability one and in mean square to a pair of optimal stationary strategies. Furthermore, when the teacher stops the learning at some stage, we estimate the probability of error.
|
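
The abstract does not state the update rule, so the following is only a loosely related illustration and not the method of the paper. It shows the general idea of two players repeatedly playing a zero-sum matrix game and revising their mixed strategies from observed play, using classical fictitious play on a single matrix game as a stand-in. There are no states, no teacher, and no average-reward criterion here; the function name `fictitious_play` and the matching-pennies payoff matrix are assumptions chosen for the example.

```python
def fictitious_play(payoff, steps):
    """Fictitious play on a zero-sum matrix game (illustrative stand-in only).

    payoff[i][j] is the amount the column player pays the row player when
    the row player picks i and the column player picks j.
    Returns the empirical mixed strategies of both players.
    """
    m, n = len(payoff), len(payoff[0])
    # Action counts; each player starts with one arbitrary play of action 0.
    row_counts = [0] * m
    col_counts = [0] * n
    row_counts[0] = 1
    col_counts[0] = 1

    for _ in range(steps):
        # Payoff of each row action against the column player's past plays.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n)) for i in range(m)
        ]
        # Payoff conceded by each column action against the row player's past plays.
        col_values = [
            sum(payoff[i][j] * row_counts[i] for i in range(m)) for j in range(n)
        ]
        # Row player maximizes, column player minimizes (ties go to the lowest index).
        best_row = max(range(m), key=lambda i: row_values[i])
        best_col = min(range(n), key=lambda j: col_values[j])
        row_counts[best_row] += 1
        col_counts[best_col] += 1

    total = steps + 1  # initial play plus one play per step
    return [c / total for c in row_counts], [c / total for c in col_counts]


# Matching pennies: the unique optimal strategy for each player is (1/2, 1/2).
MATCHING_PENNIES = [[1, -1], [-1, 1]]
row_mix, col_mix = fictitious_play(MATCHING_PENNIES, 20000)
print(row_mix, col_mix)
```

In this toy setting the empirical frequencies approach the optimal mixed strategies as the number of plays grows. The convergence described in the abstract (with probability one and in mean square, for each state of a Markov game) is a stronger and different statement that this sketch does not reproduce.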