过度
Posted on 2025-3-26 23:42:16
Introduction of Reinforcement Learning (RL)
Reinforcement learning (RL) is a type of machine learning task in which decision-makers try to maximize long-term rewards or minimize long-term costs. In an RL task, decision-makers observe the environment and act according to their observations. After acting, they receive rewards or incur costs.
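To make the observe, act, and receive-reward cycle concrete, here is a minimal sketch of one episode. Everything in it is an assumption made for illustration and not taken from the book: the toy `CorridorEnv`, its reward values, the `run_episode` helper, and the use of a discounted sum (with discount factor `gamma`) as the measure of "long-term reward".

```python
class CorridorEnv:
    """Hypothetical toy environment: start at cell 0, episode ends at cell `goal`."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the observation is simply the current cell

    def step(self, action):
        # action 1 moves right, any other action moves left (never below cell 0)
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        done = self.position == self.goal
        reward = 1.0 if done else -0.1  # small cost per step, bonus at the goal
        return self.position, reward, done


def run_episode(env, policy, gamma=0.9, max_steps=100):
    """Run one episode and return the discounted sum of rewards."""
    observation = env.reset()
    discounted_return, discount = 0.0, 1.0
    for _ in range(max_steps):
        action = policy(observation)                   # act on the observation
        observation, reward, done = env.step(action)   # environment responds
        discounted_return += discount * reward
        discount *= gamma
        if done:
            break
    return discounted_return


def always_right(observation):
    return 1


episode_return = run_episode(CorridorEnv(), always_right)
```

A policy that reaches the goal sooner collects fewer step costs and discounts the final bonus less, which is what "maximizing long-term reward" rewards in this toy setting.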
长矛
Posted on 2025-3-27 02:33:02
http://reply.papertrans.cn/83/8260/825929/825929_32.png
garrulous
Posted on 2025-3-27 07:22:47
http://reply.papertrans.cn/83/8260/825929/825929_33.png
gorgeous
Posted on 2025-3-27 13:14:54
http://reply.papertrans.cn/83/8260/825929/825929_34.png
monopoly
Posted on 2025-3-27 14:15:29
http://reply.papertrans.cn/83/8260/825929/825929_35.png
画布
Posted on 2025-3-27 20:56:28
http://reply.papertrans.cn/83/8260/825929/825929_36.png
高谈阔论
Posted on 2025-3-28 01:08:03
PG: Policy Gradient
The policy optimization algorithms in Chaps. 2–6 use optimal value estimates to find the optimal policy, so they are called optimal value algorithms. However, estimating optimal values is not necessary for policy optimization.
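As a rough illustration of optimizing a policy without estimating optimal values, the sketch below applies a REINFORCE-style update to a softmax policy on a two-armed bandit. The bandit, the fixed arm rewards, the learning rate, and the names `softmax` and `reinforce_bandit` are all assumptions chosen for this example; the chapter itself may present the method differently.

```python
import math
import random


def softmax(preferences):
    """Turn action preferences into a probability distribution."""
    peak = max(preferences)
    exps = [math.exp(p - peak) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_rewards, steps=500, learning_rate=0.1, seed=0):
    """Adjust policy parameters directly from sampled rewards (no value table)."""
    rng = random.Random(seed)
    theta = [0.0] * len(arm_rewards)  # one preference per arm
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = arm_rewards[action]  # assumed deterministic reward per arm
        for k in range(len(theta)):
            # d/d theta[k] of log pi(action) for a softmax policy
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += learning_rate * reward * grad_log_pi
    return softmax(theta)


final_probs = reinforce_bandit([0.0, 1.0])
```

The update only ever touches the policy parameters `theta`; the policy shifts toward the better arm because rewarded actions have their log-probability increased.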
去才蔑视
Posted on 2025-3-28 04:46:22
AC: Actor–Critic
The actor–critic method combines the policy gradient method with bootstrapping. On the one hand, it uses the policy gradient theorem to compute the policy gradient and update the policy parameters; this part is called the actor. On the other hand, it estimates values and uses the value estimates to bootstrap; this part is called the critic.
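The two roles can be sketched as a single tabular update step. This is a simplified one-step variant written under assumptions: a softmax actor over per-state preferences `theta`, a state-value critic `values`, and hypothetical step sizes `actor_lr` and `critic_lr`. It is meant as an illustration, not as the book's exact algorithm.

```python
import math


def softmax(preferences):
    peak = max(preferences)
    exps = [math.exp(p - peak) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_update(theta, values, state, action, reward, next_state, done,
                        gamma=0.9, actor_lr=0.1, critic_lr=0.5):
    """Apply one actor-critic step in place and return the TD error."""
    # Critic: bootstrap from the current estimate of the next state's value.
    bootstrap = 0.0 if done else gamma * values[next_state]
    td_error = reward + bootstrap - values[state]
    probs = softmax(theta[state])  # policy before the update
    values[state] += critic_lr * td_error
    # Actor: policy gradient step, weighted by the critic's TD error.
    for k in range(len(probs)):
        grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
        theta[state][k] += actor_lr * td_error * grad_log_pi
    return td_error


theta = [[0.0, 0.0], [0.0, 0.0]]  # preferences: 2 states x 2 actions
values = [0.0, 0.0]               # critic's state-value estimates
td_error = actor_critic_update(theta, values, state=0, action=1,
                               reward=1.0, next_state=1, done=True)
```

A positive TD error means the action turned out better than the critic expected, so the actor raises that action's preference and the critic raises its estimate for the state.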
Encephalitis
Posted on 2025-3-28 07:25:49
http://reply.papertrans.cn/83/8260/825929/825929_39.png
薄膜
Posted on 2025-3-28 13:00:32
Maximum-Entropy RL
This chapter introduces maximum-entropy RL, which uses the concept of entropy from information theory to encourage exploration.
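One common way to express this idea is to add an entropy bonus to the reward, so that a policy spreading probability over several actions scores higher than a deterministic one. The sketch below assumes that form; the function names and the `temperature` weight are hypothetical and chosen only for illustration.

```python
import math


def policy_entropy(probs):
    """Shannon entropy (in nats) of an action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_augmented_reward(reward, probs, temperature=0.1):
    """Reward plus a bonus proportional to the entropy of the policy at this state."""
    return reward + temperature * policy_entropy(probs)


uniform_bonus = entropy_augmented_reward(1.0, [0.5, 0.5])
greedy_bonus = entropy_augmented_reward(1.0, [1.0, 0.0])
```

With equal raw rewards, the uniform policy receives the larger augmented reward, which is how the entropy term pushes the agent to keep exploring.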