Policy Learning Using Modified Learning Vector Quantization for Reinforcement Learning Problems - Collections | Kyushu University Library

＜departmental bulletin paper＞
Policy Learning Using Modified Learning Vector Quantization for Reinforcement Learning Problems

Creator	Afif Mohd Faudzi, Ahmad. Affiliation: Department of Electrical and Electronic Engineering, Graduate School of Information and Electrical Engineering, Kyushu University; Department of Electrical and Electronic Engineering, Universiti Malaysia
Creator	Murata, Junichi (村田, 純一). Author PID: K000238. Affiliation: Professor, Department of Electrical Engineering, Faculty of Information Science and Electrical Engineering, Kyushu University
Language	English
Publisher	九州大学大学院システム情報科学研究院
Publisher	Faculty of Information Science and Electrical Engineering, Kyushu University
Date	2015-07-24
Source Title	Research reports on information science and electrical engineering of Kyushu University
Vol	20
Issue	2
First Page	39
Last Page	44
Publication Type	Version of Record
Access Rights	open access
JaLC DOI	https://doi.org/10.15017/1560523
Abstract	Reinforcement learning (RL) enables an agent to find an optimal solution to a problem by interacting with the environment. In previous research, Q-learning, one of the popular learning methods in RL, was used to generate a policy, from which an abstract policy was then extracted by the learning vector quantization (LVQ) algorithm. In this paper, the aim is to train the agent to learn an optimal policy from scratch and to generate the abstract policy in a single operation using the LVQ algorithm. When the LVQ algorithm is applied in an RL framework, an erroneous teaching signal sometimes causes the learning to fail or to end with a non-optimal solution. Here, a new LVQ algorithm is proposed to overcome this problem. The new LVQ algorithm introduces, first, a regular reward that the agent obtains autonomously based on its behavior and, second, a function that converts a regular reward into a new reward so that the learning system does not suffer from the undesirable effect of a small reward. Through these modifications, the agent is expected to find the optimal solution more efficiently.

File	FileType	Size	Views	Description
p039	pdf	439 KB	264

Details

PISSN	1342-3819
EISSN	2188-0891
NCID	AN10569524
Record ID	1560523
Peer-Reviewed	Refereed
Subject Terms	Policy learning
	Learning Vector Quantization
	Reinforcement learning and Abstraction
Created Date	2016.02.03
Modified Date	2020.10.12