<journal article>
A LEARNING ALGORITHM FOR COMMUNICATING MARKOV DECISION PROCESSES WITH UNKNOWN TRANSITION MATRICES

Abstract This study is concerned with finite Markov decision processes (MDPs) whose states are exactly observable but whose transition matrices are unknown. We develop a learning algorithm of the reward-penalty type... for the communicating case of multi-chain MDPs. An adaptively optimal policy and an asymptotic sequence of adaptive policies with nearly optimal properties are constructed under the average expected reward criterion. A numerical experiment is also given to show the practical effectiveness of the algorithm.
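The abstract does not spell out the reward-penalty algorithm itself, so the following is only a generic illustration of the setting it addresses, not the authors' method. It is a minimal Python sketch of a certainty-equivalence learner. The learner estimates the unknown transition matrices from observed transition counts. It then re-solves the estimated model under the average expected reward criterion with relative value iteration, and it takes exploratory actions with a vanishing probability. The reward table `r` is assumed known. All function names (`estimate_transitions`, `relative_value_iteration`, `adaptive_run`) are hypothetical choices for this sketch, as are the aperiodicity parameter `tau`, the uniform fallback for unvisited state-action pairs, and the exploration rate (t+1)^(-1/2).

```python
import random


def estimate_transitions(counts):
    """Empirical estimate p_hat[s][a][t] from visit counts counts[s][a][t].
    Unvisited (s, a) pairs fall back to a uniform row (an assumption)."""
    n_states = len(counts)
    p_hat = []
    for s in range(n_states):
        rows = []
        for a in range(len(counts[s])):
            total = sum(counts[s][a])
            if total == 0:
                rows.append([1.0 / n_states] * n_states)
            else:
                rows.append([c / total for c in counts[s][a]])
        p_hat.append(rows)
    return p_hat


def relative_value_iteration(p, r, iters=500, tau=0.5, ref=0):
    """Approximate the optimal average reward (gain) and a greedy policy for
    model p and rewards r. The aperiodicity transformation
    tau * P + (1 - tau) * I leaves the gain unchanged and avoids oscillation."""
    n_states = len(p)
    h = [0.0] * n_states  # relative values, kept with h[ref] == 0
    g = 0.0
    q = []
    for _ in range(iters):
        q = [[r[s][a]
              + tau * sum(p[s][a][t] * h[t] for t in range(n_states))
              + (1.0 - tau) * h[s]
              for a in range(len(p[s]))]
             for s in range(n_states)]
        new_h = [max(q[s]) for s in range(n_states)]
        g = new_h[ref]  # gain estimate: one-step increment at the reference state
        h = [v - g for v in new_h]
    policy = [max(range(len(q[s])), key=lambda a: q[s][a]) for s in range(n_states)]
    return g, policy


def adaptive_run(true_p, r, steps=2000, replan_every=100, seed=0):
    """Simulate the learner on the true (hidden) model true_p and return the
    empirical average reward and the last planned policy."""
    rng = random.Random(seed)
    n_states, n_actions = len(r), len(r[0])
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    policy = [0] * n_states
    s, total_reward = 0, 0.0
    for t in range(steps):
        if t % replan_every == 0:
            # Certainty equivalence: plan as if the current estimate were exact.
            _, policy = relative_value_iteration(estimate_transitions(counts), r)
        eps = (t + 1) ** -0.5  # hypothetical vanishing exploration rate
        a = rng.randrange(n_actions) if rng.random() < eps else policy[s]
        total_reward += r[s][a]
        nxt = rng.choices(range(n_states), weights=true_p[s][a])[0]
        counts[s][a][nxt] += 1
        s = nxt
    return total_reward / steps, policy
```

On a two-state communicating example where action 1 in state 0 moves to state 1 and action 0 in state 1 stays there earning reward 1, the optimal gain is 1 and the learner should settle on the policy `[1, 0]`. The exploration probability decays while its sum diverges, so in this small example every action keeps being tried. No claim is made here that this simple scheme attains the optimality properties established in the paper.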


Full text: bic039_p011 (PDF, 160 KB)

Details

Created Date 2010.03.11
Modified Date 2020.11.02
