Reinforcement learning accelerated by using state transition model with robotic applications
http://hdl.handle.net/2297/1847
| Item type | Journal Article (学術雑誌論文) |
|---|---|
| Publication date | 2017-10-03 |
| Title | Reinforcement learning accelerated by using state transition model with robotic applications |
| Language | eng |
| Resource type identifier | http://purl.org/coar/resource_type/c_6501 |
| Resource type | journal article |
| Authors | Senda, Kei; Fujii, Shinji; Mano, Syusuke |
| Contributor affiliation (Other) | Faculty of Engineering, Kanazawa University (金沢大学工学部) |
| Bibliographic information | 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vol. 4, pp. 3732-3737, issued 2004-09-01 |
| Relation type | isIdenticalTo |
| Identifier type | DOI |
| Related identifier | https://doi.org/10.1109/iros.2004.1389995 |
| Publisher | IEEE |
| Abstract | This paper discusses a method to accelerate reinforcement learning. Firstly defined is a concept that reduces the state space while conserving the policy. An algorithm is then given that calculates the optimal cost-to-go and the optimal policy in the reduced space from those in the original space. Using the reduced state space, learning convergence is accelerated. Its usefulness for both DP (dynamic programming) iteration and Q-learning is compared through a maze example. The convergence of the optimal cost-to-go in the original state space needs approximately N or more times as long as that in the reduced state space, where N is the ratio of the number of states in the original space to that in the reduced space. The acceleration effect for Q-learning is more remarkable than that for the DP iteration. The proposed technique is also applied to a robot manipulator working on a peg-in-hole task with geometric constraints. The state space reduction can be considered as a model of the change of observation, i.e., one of the cognitive actions. The obtained results explain that the change of observation is reasonable in terms of learning efficiency. |
| Rights information | ©2004 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. (2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vol. 4, 2004, pp. 3732-3737) |
| Version type | VoR (version of record) |
| Version type resource | http://purl.org/coar/version/c_970fb48d4fbd8a85 |
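The abstract above describes accelerating tabular reinforcement learning by reducing the state space while conserving the optimal policy. A minimal sketch of that idea, assuming a toy 1-D chain world and a hand-made state aggregation (the environment, the pairing of states, and all names are illustrative assumptions, not the authors' implementation):

```python
import random

random.seed(0)

def q_learning(n_states, step, goal, episodes=300, alpha=0.5, gamma=0.9, eps=0.1):
    # Tabular Q-learning on a chain world; actions: 0 = left, 1 = right.
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s, steps = 0, 0
        while s != goal and steps < 100:
            # epsilon-greedy action selection (ties broken toward "right")
            a = random.randrange(2) if random.random() < eps else (0 if Q[s][0] > Q[s][1] else 1)
            s2 = step(s, a)
            r = 1.0 if s2 == goal else 0.0
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s, steps = s2, steps + 1
    return Q

N = 8  # original chain: states 0..7, goal at 7
step_full = lambda s, a: max(0, min(N - 1, s + (1 if a == 1 else -1)))

# Toy reduction: merge adjacent state pairs (0,1)->0, ..., (6,7)->3.
# In this chain the optimal action ("go right") is identical within each
# pair, so the merge conserves the optimal policy.
M = N // 2
step_red = lambda s, a: max(0, min(M - 1, s + (1 if a == 1 else -1)))

Q_full = q_learning(N, step_full, goal=N - 1)
Q_red = q_learning(M, step_red, goal=M - 1)

# Greedy policy learned in the reduced space ("R" = right toward the goal)
print(["R" if q[1] >= q[0] else "L" for q in Q_red])
```

Merging state pairs halves the Q-table, so the same number of updates is spread over fewer entries and the greedy policy stabilizes sooner; this mirrors the roughly N-fold speed-up the abstract reports, where N is the ratio of original to reduced state counts.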