WEKO3



Item indexes:

  1. B. 理工学域; 数物科学類・物質化学類・機械工学類・フロンティア工学類・電子情報通信学類・地球社会基盤学類・生命理工学類
  2. b 10. 学術雑誌掲載論文 (Journal articles)
  3. 1. 査読済論文(工) (Peer-reviewed papers, Engineering)

Reinforcement learning accelerated by using state transition model with robotic applications

Handle: http://hdl.handle.net/2297/1847
Record ID: b25e9a60-3685-465e-9f14-4945874039c7
Name / File | License | Action
TE-PR-SENDA-K-IROS04_3732.pdf (463.2 kB)
Item type: 学術雑誌論文 / Journal Article (1)
Publication date: 2017-10-03
Title: Reinforcement learning accelerated by using state transition model with robotic applications
Language: eng
Resource type identifier: http://purl.org/coar/resource_type/c_6501
Resource type: journal article
Authors:

  • Senda, Kei (WEKO 9835; researcher number 60206662)
  • Fujii, Shinji (WEKO 10241)
  • Mano, Syusuke (WEKO 10242)
Contributor affiliation
Description type: Other
Description: 金沢大学工学部 (Faculty of Engineering, Kanazawa University)
Bibliographic information: 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), vol. 4, pp. 3732-3737, issued 2004-09-01
DOI
Relation type: isIdenticalTo
Identifier type: DOI
Related identifier: https://doi.org/10.1109/iros.2004.1389995
Publisher: IEEE
Abstract
Description type: Abstract
Description: This paper discusses a method to accelerate reinforcement learning. First, a state-space reduction that conserves the policy is defined. An algorithm is then given that calculates the optimal cost-to-go and the optimal policy in the reduced space from those in the original space. Using the reduced state space, learning convergence is accelerated. Its usefulness for both DP (dynamic programming) iteration and Q-learning is compared through a maze example. The convergence of the optimal cost-to-go in the original state space needs approximately N or more times as long as that in the reduced state space, where N is the ratio of the number of states in the original space to that in the reduced space. The acceleration effect for Q-learning is more remarkable than that for DP iteration. The proposed technique is also applied to a robot manipulator working on a peg-in-hole task with geometric constraints. The state-space reduction can be considered a model of the change of observation, i.e., one of the cognitive actions. The obtained results show that the change of observation is reasonable in terms of learning efficiency.
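The core idea in the abstract — a state-space reduction that conserves the policy yields the same optimal cost-to-go while iterating over fewer states — can be illustrated with a minimal sketch. This is an invented toy example, not the paper's maze or its exact algorithm: the original state is a (position, noise bit) pair where the bit affects neither dynamics nor cost, so the mapping that drops the bit is policy-conserving. All names (`value_iteration`, `trans_original`, `trans_reduced`) are illustrative assumptions.

```python
# Toy illustration (not the paper's algorithm) of a policy-conserving
# state-space reduction: merging states that the mapping phi(pos, bit) = pos
# identifies leaves the optimal cost-to-go unchanged, but DP iteration
# sweeps over half as many states (N = 2 in the abstract's terminology).

GOAL, GAMMA = 5, 0.95  # absorbing goal position on a chain 0..5, discount

def value_iteration(states, trans, sweeps=500):
    """Gauss-Seidel value iteration; trans(s, a) -> (next_state, step_cost)."""
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for s in V:
            V[s] = min(c + GAMMA * V[t]
                       for t, c in (trans(s, a) for a in (-1, +1)))
    return V

def trans_original(s, a):
    """Original space: (position, noise bit); the bit flips but is irrelevant."""
    pos, bit = s
    if pos == GOAL:                          # absorbing goal, zero cost
        return (pos, bit), 0.0
    return (min(max(pos + a, 0), GOAL), 1 - bit), 1.0

def trans_reduced(pos, a):
    """Reduced space: position only (the image of phi)."""
    if pos == GOAL:
        return pos, 0.0
    return min(max(pos + a, 0), GOAL), 1.0

V_orig = value_iteration([(p, b) for p in range(GOAL + 1) for b in (0, 1)],
                         trans_original)
V_red = value_iteration(range(GOAL + 1), trans_reduced)

# The cost-to-go lifted from the reduced space matches the original space.
for p in range(GOAL + 1):
    assert abs(V_orig[(p, 0)] - V_red[p]) < 1e-9
```

Because the reduction conserves the policy, the values computed in the reduced space lift exactly back to the original space, which is why (per the abstract) convergence in the original space takes roughly N or more times as long for the same result.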
Rights
Rights information: ©2004 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes, or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 4, 2004, pp. 3732-3737
Author version flag
Publication type: VoR
Publication type resource: http://purl.org/coar/version/c_970fb48d4fbd8a85

Powered by WEKO3