A LEARNING ALGORITHM FOR COMMUNICATING MARKOV DECISION PROCESSES WITH UNKNOWN TRANSITION MATRICES

Creator	Iki, Tetsuichiro (伊喜, 哲一郎), Faculty of Education and Culture, Miyazaki University
	Horiguchi, Masayuki, General Education, Yuge National College of Maritime Technology
	Yasuda, Masami (安田, 正實), Faculty of Science, Chiba University
	Kurano, Masami (蔵野, 正美), Faculty of Education, Chiba University
Language	English
Publisher	Research Association of Statistical Sciences (統計科学研究会)
Date	2007-12
Source Title	Bulletin of informatics and cybernetics
Vol	39
First Page	11
Last Page	24
Publication Type	Version of Record
Access Rights	open access
Crossref DOI	https://doi.org/10.5109/16771
Relation	Bulletin of informatics and cybernetics, vol. 39, pp. 11-24
Related URI	http://bic.math.kyushu-u.ac.jp/
Abstract	This study is concerned with finite Markov decision processes (MDPs) whose states are exactly observable but whose transition matrices are unknown. We develop a learning algorithm of the reward-penalty type... for the communicating case of multi-chain MDPs. An adaptively optimal policy and an asymptotic sequence of adaptive policies with nearly optimal properties are constructed under the average expected reward criterion. A numerical experiment is also given to show the practical effectiveness of the algorithm.

File	FileType	Size	Views	Description
bic039_p011	pdf	160 KB	355

Details

PISSN	0286-522X
EISSN	2435-743X
NCID	AA10634475
Record ID	16771
Peer-Reviewed	Refereed
Subject Terms	Adaptive policy
	Average case
	Communicating case
	Learning algorithm
	Markov decision processes
	Reward-penalty type
	Unknown transition matrix
Type	Journal article
Created Date	2010.03.11
Modified Date	2020.11.02
