过度 posted on 2025-3-26 23:42:16
Introduction of Reinforcement Learning (RL). RL is a type of machine learning task in which decision-makers try to maximize long-term rewards or minimize long-term costs. In an RL task, decision-makers observe the environment and act according to their observations. After acting, they receive rewards or incur costs.
长矛 posted on 2025-3-27 02:33:02
http://reply.papertrans.cn/83/8260/825929/825929_32.png
garrulous posted on 2025-3-27 07:22:47
http://reply.papertrans.cn/83/8260/825929/825929_33.png
gorgeous posted on 2025-3-27 13:14:54
http://reply.papertrans.cn/83/8260/825929/825929_34.png
monopoly posted on 2025-3-27 14:15:29
http://reply.papertrans.cn/83/8260/825929/825929_35.png
画布 posted on 2025-3-27 20:56:28
http://reply.papertrans.cn/83/8260/825929/825929_36.png
高谈阔论 posted on 2025-3-28 01:08:03
PG: Policy Gradient. The policy optimization algorithms in Chaps. 2–6 use optimal value estimates to find the optimal policy, so they are called optimal value algorithms. However, estimating optimal values is not necessary for policy optimization.
去才蔑视 posted on 2025-3-28 04:46:22
AC: Actor–Critic. The actor–critic method combines the policy gradient method with bootstrapping. On the one hand, it uses the policy gradient theorem to calculate the policy gradient and update the policy parameters; this part is called the actor. On the other hand, it estimates values and uses the value estimates to bootstrap; this part is called the critic.
Encephalitis posted on 2025-3-28 07:25:49
http://reply.papertrans.cn/83/8260/825929/825929_39.png
薄膜 posted on 2025-3-28 13:00:32
Maximum-Entropy RL. This chapter introduces maximum-entropy RL, which uses the concept of entropy from information theory to encourage exploration.
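
To make the entropy idea concrete, here is a minimal Python sketch. It is not taken from the chapter. It assumes a discrete action space, and the helper names `entropy` and `entropy_augmented_reward` and the weight `alpha` are hypothetical, chosen only for illustration. It shows one common way an entropy bonus can be added to a reward so that a more spread-out (more exploratory) policy scores higher than a deterministic one.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    # Terms with p == 0 contribute nothing, so they are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_augmented_reward(reward, probs, alpha=0.1):
    """Reward plus an entropy bonus; alpha is an assumed trade-off weight."""
    return reward + alpha * entropy(probs)


# Two hypothetical policies over four actions in the same state.
uniform = [0.25, 0.25, 0.25, 0.25]  # maximally exploratory
greedy = [1.0, 0.0, 0.0, 0.0]       # fully deterministic

print(entropy(uniform))  # equals log(4), the maximum for four actions
print(entropy(greedy))   # zero: no randomness, no bonus
print(entropy_augmented_reward(1.0, uniform))
print(entropy_augmented_reward(1.0, greedy))
```

With the same raw reward, the uniform policy receives the larger augmented reward, which is the sense in which the entropy term encourages exploration. A larger `alpha` would weight exploration more heavily against reward.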