Bravura posted on 2025-3-25 06:52:34
http://reply.papertrans.cn/19/1806/180555/180555_21.png
旧式步枪 posted on 2025-3-25 10:44:33
http://reply.papertrans.cn/19/1806/180555/180555_22.png
nephritis posted on 2025-3-25 12:48:55
http://reply.papertrans.cn/19/1806/180555/180555_23.png
Mri485 posted on 2025-3-25 16:02:48
http://reply.papertrans.cn/19/1806/180555/180555_24.png
芭蕾舞女演员 posted on 2025-3-25 21:50:05
The discount sequence: The particular discount sequence plays a critical role in any bandit or other decision problem. Various interpretations of discount sequences are discussed in this chapter. One purpose of the discussion is to aid a user in choosing an appropriate sequence; another is to motivate interest in the generality of discounting allowed in this monograph.
Minatory posted on 2025-3-26 04:04:49
Introduction: Information as to the effectiveness of the treatments accrues as they are used. The overall objective is to treat as many patients as effectively as possible. This seemingly innocent but important problem is surprisingly difficult, even when the responses are dichotomous, either success or failure. It is an example of a two-armed bandit problem.
树木中 posted on 2025-3-26 08:10:33
http://reply.papertrans.cn/19/1806/180555/180555_27.png
啤酒 posted on 2025-3-26 12:32:00
Two arms, one arm known: died in .., now abbreviated to ., the distribution of the random measure ... For arbitrary . we can, without loss, assume that arm 2 always produces the known observation . Since . is given by the pair (.), we now speak of the (., .; .)-bandit.
capsule posted on 2025-3-26 15:13:46
Two independent Bernoulli arms; uniform discounting: ... sequence $A$ has horizon $n$ and is uniform: $\alpha_1 = \cdots = \alpha_n = 1$ and $\alpha_{n+1} = \alpha_{n+2} = \cdots = 0$. Such uniform discounting has been considered extensively through examples in the first five chapters of this book, and in the literature generally. The objective implicit in uniform discounting is to maximize the expected sum of the first $n$ observations.
Projection posted on 2025-3-26 17:42:13
http://reply.papertrans.cn/19/1806/180555/180555_30.png
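
To make the uniform-discounting objective from the "Two independent Bernoulli arms" abstract above concrete, here is a minimal sketch of the backward-induction calculation for a finite discount sequence. It is not taken from the book: the function name `bandit_value`, the `alphas` argument, and the Beta priors with integer parameters on each arm's success probability are all assumptions made for illustration. Uniform discounting with horizon n corresponds to passing n ones.

```python
from fractions import Fraction
from functools import lru_cache


def bandit_value(alphas, prior1=(1, 1), prior2=(1, 1)):
    """Maximal expected discounted payoff for two independent Bernoulli arms.

    alphas: finite discount sequence (alpha_1, ..., alpha_n); later terms are 0.
    prior1, prior2: integer Beta(a, b) parameters per arm (assumed for this sketch).
    """
    alphas = tuple(Fraction(a) for a in alphas)

    @lru_cache(maxsize=None)
    def value(stage, a1, b1, a2, b2):
        # No discount weight remains past the horizon.
        if stage == len(alphas):
            return Fraction(0)
        alpha = alphas[stage]
        # Posterior mean success probability of each arm.
        p1 = Fraction(a1, a1 + b1)
        p2 = Fraction(a2, a2 + b2)
        # Expected payoff of pulling arm 1 now, then continuing optimally.
        pull1 = (p1 * (alpha + value(stage + 1, a1 + 1, b1, a2, b2))
                 + (1 - p1) * value(stage + 1, a1, b1 + 1, a2, b2))
        # Same for arm 2.
        pull2 = (p2 * (alpha + value(stage + 1, a1, b1, a2 + 1, b2))
                 + (1 - p2) * value(stage + 1, a1, b1, a2, b2 + 1))
        return max(pull1, pull2)

    return value(0, *prior1, *prior2)


# Uniform discounting with horizon 2 and uniform priors on both arms.
print(bandit_value((1, 1)))  # 13/12
```

Exact fractions are used so that small cases can be checked by hand. Passing a non-uniform sequence such as `(1, Fraction(1, 2))` evaluates the same recursion under a different discount sequence.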