Titlebook: Reinforcement Learning; Richard S. Sutton Book 1992 Springer Science+Business Media New York 1992 agents.algorithms.artificial intelligenc

Views: 12558 | Replies: 43
Posted on 2025-3-21 16:35:22
Title: Reinforcement Learning
Editor: Richard S. Sutton
Series: The Springer International Series in Engineering and Computer Science
Description: Reinforcement learning is the learning of a mapping from situations to actions so as to maximize a scalar reward or reinforcement signal. The learner is not told which action to take, as in most forms of machine learning, but instead must discover which actions yield the highest reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward, but also the next situation, and through that all subsequent rewards. These two characteristics -- trial-and-error search and delayed reward -- are the most important distinguishing features of reinforcement learning. Reinforcement learning is both a new and a very old topic in AI. The term appears to have been coined by Minsky (1961), and independently in control theory by Waltz and Fu (1965). The earliest machine learning research now viewed as directly relevant was Samuel's (1959) checker player, which used temporal-difference learning to manage delayed reward much as it is used today. Of course learning and reinforcement have been studied in psychology for almost a century, and that work has had a very strong impact on the AI/engineering work. One could in fact consider all of reinforcement learning to
Publication date: Book 1992
Keywords: agents; algorithms; artificial intelligence; control; learning; machine learning; proving; reinforcement le
Edition: 1
DOI: https://doi.org/10.1007/978-1-4615-3618-5
ISBN (softcover): 978-1-4613-6608-9
ISBN (ebook): 978-1-4615-3618-5
Series ISSN: 0893-3405
Copyright: Springer Science+Business Media New York 1992
Publication information is being updated.

[Charts for "Reinforcement Learning" not reproduced: impact factor; impact factor subject ranking; online visibility; online visibility subject ranking; citation count; citation count subject ranking; annual citations; annual citations subject ranking; reader feedback; reader feedback subject ranking]

Single-choice poll, 1 participant in total
Perfect with Aesthetics: 0 votes (0.00%)
Better Implies Difficulty: 0 votes (0.00%)
Good and Satisfactory: 0 votes (0.00%)
Adverse Performance: 1 vote (100.00%)
Disdainful Garbage: 0 votes (0.00%)
Posted on 2025-3-22 14:24:18
Technical Note: ...the action-values are represented discretely. We also sketch extensions to the cases of non-discounted, but absorbing, Markov environments, and where many Q values can be changed each iteration, rather than just one.
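For readers unfamiliar with discretely represented action-values, here is a minimal sketch of a one-step tabular Q-learning update in its standard textbook form. The three-state chain environment (`toy_step`), the step size `alpha`, the discount `gamma`, and all function names are illustrative assumptions, not taken from the book; this version changes only one Q value per iteration.

```python
import random
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9, terminal=False):
    """Apply one tabular Q-learning update to Q[(s, a)] and return the new value."""
    # Value of the best action in the next state (0 if the episode has ended).
    best_next = 0.0 if terminal else max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


def toy_step(s, a):
    """Hypothetical chain 0 -> 1 -> 2; reaching state 2 ends the episode with reward 1."""
    s_next = s + 1 if a == "right" else max(s - 1, 0)
    done = s_next == 2
    return s_next, (1.0 if done else 0.0), done


def train(episodes=200, seed=0):
    """Learn Q values from purely random behaviour (Q-learning is off-policy)."""
    rng = random.Random(seed)
    actions = ["left", "right"]
    Q = defaultdict(float)  # one discrete table entry per (state, action) pair
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(actions)
            s_next, r, done = toy_step(s, a)
            q_update(Q, s, a, r, s_next, actions, terminal=done)
            s = s_next
    return Q
```

After training, `Q[(0, "right")]` should exceed `Q[(0, "left")]`: the reward arrives only at the end of the chain, yet the table entries for earlier states come to reflect it.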
Posted on 2025-3-22 21:41:06
Introduction: The Challenge of Reinforcement Learning: ...them. In the most interesting and challenging cases, actions may affect not only the immediate reward, but also the next situation, and through that all subsequent rewards. These two characteristics -- trial-and-error search and delayed reward -- are the two most important distinguishing features of reinforcement learning.
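One common way to make "delayed reward" concrete is the discounted return, which sums all subsequent rewards with later ones weighted less. The sketch below assumes a discount factor `gamma` and two made-up reward sequences; neither comes from the book.

```python
def discounted_return(rewards, gamma=0.9):
    """Return sum over t of gamma**t * rewards[t], accumulated from the last reward backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical reward sequences following two different first actions:
# one pays a small reward immediately, the other pays a larger reward later.
greedy_now = [1.0, 0.0, 0.0]
patient = [0.0, 0.0, 5.0]

better = "patient" if discounted_return(patient) > discounted_return(greedy_now) else "greedy_now"
```

With these assumed numbers the patient sequence has the higher return (roughly 4.05 against 1.0), so a learner that looked only at the immediate reward would choose the worse action.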