Status Updates from Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
Showing 1-30 of 1,173
Pawan
is on page 460 of 552
Parameters such as α, n, γ, λ can be picked by trial and error (n ≈ 5 in most cases) instead of grid search, as they seem fairly independent. Methods can be specific to the problem (which may be prediction or control, among others), with tradeoffs between latency and memory. LLMs help with method selection, discretisation, and reduction of the state-action space to explore the feasibility of ideas.
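The "trials instead of grid search" idea above can be sketched as a coordinate-wise sweep: if the parameters are roughly independent, tuning each one while holding the others fixed needs far fewer runs than the full grid. Everything here is illustrative; `evaluate` is a hypothetical stand-in for "run the agent and return its average return", replaced by a dummy separable objective so the sketch is self-contained.

```python
# Sketch: coordinate-wise parameter trials, assuming rough independence.
# `evaluate` is a hypothetical placeholder for a real training/evaluation run.

def evaluate(alpha, n, gamma, lam):
    # Dummy separable objective with one interior optimum per parameter;
    # a real version would train the method and return mean return.
    return -((alpha - 0.1) ** 2 + (n - 5) ** 2 / 25
             + (gamma - 0.95) ** 2 + (lam - 0.9) ** 2)

candidates = {
    "alpha": [0.01, 0.05, 0.1, 0.5],
    "n":     [1, 3, 5, 10],        # n ~ 5 in most cases
    "gamma": [0.9, 0.95, 0.99],
    "lam":   [0.5, 0.8, 0.9, 1.0],
}

best = {"alpha": 0.05, "n": 3, "gamma": 0.9, "lam": 0.8}  # initial guesses
for name, values in candidates.items():       # one sweep per parameter,
    scores = []                               # others held at current best
    for v in values:
        trial = dict(best, **{name: v})
        scores.append((evaluate(**trial), v))
    best[name] = max(scores)[1]               # keep the best value found

print(best)  # 4 + 4 + 3 + 4 = 15 trials instead of 4 * 4 * 3 * 4 = 192
```

With four parameters this costs the sum of the candidate counts rather than their product, which is exactly the saving independence buys.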
Had to relearn backgammon.
— Apr 05, 2026 01:06PM
Pawan
is on page 357 of 552
This subject is rife with higher algebra (especially progressions and probability). The notation can cause mental havoc. Part III looks like interesting prose.
— Apr 04, 2026 10:58AM
Pawan
is on page 335 of 552
Tabular methods can be understood without programming by creating a scenario with a small state-action space, initialising the policy intuitively, and then iterating through the algorithms by hand, updating the variable values. This may not be possible for the more complex algorithms in the second part, especially off-policy methods, online search, and function approximation.
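The hand-iteration described above can be checked in a few lines: below is a minimal sketch of iterative policy evaluation on a hypothetical two-state MDP (states A and B, a fixed deterministic policy), small enough that every sweep can be verified with pencil and paper. The MDP itself is invented for illustration, not taken from the book.

```python
# Iterative policy evaluation on a tiny hypothetical 2-state MDP.
# Under the fixed policy: from A, reward 0 and move to B; from B, reward 1
# and stay in B. Each sweep applies V(s) <- r + gamma * V(s').

gamma = 0.9
transitions = {"A": ("B", 0.0), "B": ("B", 1.0)}  # s -> (next state, reward)

V = {"A": 0.0, "B": 0.0}                 # intuitive initialisation
for sweep in range(200):
    delta = 0.0
    for s, (s2, r) in transitions.items():
        v_new = r + gamma * V[s2]        # one tabular backup, doable by hand
        delta = max(delta, abs(v_new - V[s]))
        V[s] = v_new
    if delta < 1e-6:                     # stop once the sweep barely changes V
        break

print(V)  # converges toward the analytic fixed point V(B) = 10, V(A) = 9
```

Doing the first two or three sweeps on paper and comparing against the loop is exactly the kind of no-programming check the update describes; the fixed point V(B) = 1/(1 - γ) = 10 and V(A) = γ·V(B) = 9 confirms the arithmetic.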
It gets increasingly interesting. Need to revisit a few chapters.
— Mar 14, 2026 07:39PM
Pawan
is on page 255 of 552
Not as difficult to read now. Going to try model-free implementations in JAX, but first, brushing up on my kite equations using plotting libraries.
— Jan 11, 2026 08:57PM
Pawan
is on page 140 of 552
I had read a different edition of this book for coursework until a few chapters back; it was available online for free, found through none other than a Google search. My mind was too shallow back then to estimate values for bootstrapping. Time to dig in.
— Jan 05, 2026 09:16PM