Abstract |
We develop a method for learning the optimal strategies of a two-person zero-sum Markov game under the expected average reward criterion. At each stage, the players play a modified matrix game with ...relation to each state and then receive information about the result of the game from a teacher. Using this information, the players generate, for each state, a pair of mixed strategies to be used at the next stage. The pairs of mixed strategies generated in this way converge with probability one and in mean square to a pair of optimal stationary strategies. Furthermore, when the teacher stops the learning at some stage, the probability of error is estimated.
|
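For readers unfamiliar with the criterion named in the abstract, the block below writes out the usual textbook form of the expected average reward and of optimality for a pair of stationary strategies. The record itself gives no notation, so every symbol here is an assumption introduced for illustration: states s_n, actions a_n and b_n of players 1 and 2, a one-stage reward r paid by player 2 to player 1, and strategies \pi and \sigma. The use of liminf is one common convention and may differ from the paper's.

```latex
% Expected average reward of the strategy pair (\pi, \sigma) from initial state s
% (assumed notation; not taken from the record)
\[
  \phi(s;\pi,\sigma)
  = \liminf_{N\to\infty} \frac{1}{N}\,
    \mathbb{E}_{\pi,\sigma}\!\left[\,\sum_{n=1}^{N} r(s_n,a_n,b_n) \;\middle|\; s_1 = s \right]
\]

% A pair of stationary strategies (\pi^*, \sigma^*) is optimal if it is a saddle point
% for every initial state s and all strategies \pi, \sigma:
\[
  \phi(s;\pi,\sigma^*) \;\le\; \phi(s;\pi^*,\sigma^*) \;\le\; \phi(s;\pi^*,\sigma)
\]
```

Under this reading, the convergence claim in the abstract says that the mixed strategies the players generate for each state approach such a saddle-point pair.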