Deep Reinforcement Learning Hands-On: A practical and easy-to-follow guide to RL from Q-learning and DQNs to PPO and RLHF

Maxim Lapan delivers intuitive explanations and gradual insights into complex reinforcement learning (RL) concepts, from the basics of RL on simple environments and tasks to modern state-of-the-art methods.



Purchase of the print or Kindle book includes a free PDF eBook.

Key Features

Learn with concise explanations, modern libraries, and diverse applications from games to stock trading and NLP chatbots
Speed up RL models using algorithmic and engineering approaches
New content on RL from human feedback (RLHF), MuZero, and transformers

Book Description

Reward yourself and take this journey into RL with the third edition of Deep Reinforcement Learning Hands-On. The book takes you from the basics of reinforcement learning to its latest use cases across a wide variety of applications, including discrete optimization, game playing, stock trading, and web browser navigation.

The book retains its strengths by providing concise and easy-to-follow explanations. You’ll work through practical and diverse examples, from grid environments and games to stock trading and RL agents in web environments, to give you a well-rounded understanding of reinforcement learning, its capabilities, and use cases. You’ll learn about key topics, such as deep Q-networks, policy gradient methods, continuous control problems, and highly scalable, non-gradient methods.

If you want to learn about RL using a practical approach with real-world applications, concise explanations, and the incremental development of topics, then Deep Reinforcement Learning Hands-On, Third Edition is your ideal companion.

This book will equip you with both the practical know-how of RL and the theoretical foundation to understand and implement most modern RL papers.

What you will learn

Stay on the cutting edge with new content on MuZero, RL with human feedback, and LLMs
Understand the deep learning context of RL and implement complex deep learning models
Evaluate RL methods, including cross-entropy, DQN, actor-critic, TRPO, PPO, DDPG, and D4PG
Implement RL algorithms using PyTorch and modern RL libraries
Apply deep RL to real-world scenarios, from board games to stock trading
Learn advanced exploration techniques for improved model performance

Who this book is for

This book is ideal for machine learning engineers, software engineers, and data scientists looking to apply deep reinforcement learning in practice. Both beginners and experienced practitioners will gain practical expertise in modern reinforcement learning techniques and their applications using PyTorch.

Table of Contents

What Is Reinforcement Learning?
OpenAI Gym
Deep Learning with PyTorch
The Cross-Entropy Method
Tabular Learning and the Bellman Equation
Deep Q-Networks
Higher-Level RL Libraries
DQN Extensions
Ways to Speed up RL
Stocks Trading Using RL
Policy Gradients – an Alternative
Actor-Critic Methods - A2C and A3C
The TextWorld Environment
Web Navigation
Continuous Action Space
Trust Regions – PPO, TRPO, ACKTR, and SAC
Black-Box Optimization in RL
Advanced Exploration

1091 pages, Kindle Edition

Published November 12, 2024

18 people are currently reading
11 people want to read

About the author

Maxim Lapan

4 books, 8 followers

Ratings & Reviews

Community Reviews

5 stars: 0 (0%)
4 stars: 3 (100%)
3 stars: 0 (0%)
2 stars: 0 (0%)
1 star: 0 (0%)
Displaying 1 of 1 review
2 reviews
February 14, 2025
After reading Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto, which primarily focuses on tabular methods and provides only a relatively superficial discussion of deep neural networks, I found this book to be a refreshing complement. I especially appreciated the implementations, which often helped deepen my understanding of the algorithms.

However, I did notice a potential issue in the RLHF chapter. The author correctly includes the loss function for training the reward model, but there is no mention of the modification to the reward function that accounts for the KL divergence penalty between the new and old policy (see the paper Fine-Tuning Language Models from Human Preferences). Additionally, I couldn't find this adjustment in the implementation. That said, I reviewed this in a physical copy, so I may have missed something. I hope the author can verify whether this omission was intentional or an oversight.
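
For reference, the adjustment described in that paper can be sketched as below. This is an illustrative sketch and not code from the book: the function name `kl_penalized_reward`, its parameters, and the default `beta` value are assumptions made for the example. The paper's penalized reward is R(x, y) = r(x, y) - beta * log(pi(y|x) / rho(y|x)), where r is the reward model score, pi is the policy being fine-tuned, and rho is the original pretrained language model.

```python
from typing import Sequence


def kl_penalized_reward(
    reward_model_score: float,
    policy_logprobs: Sequence[float],
    reference_logprobs: Sequence[float],
    beta: float = 0.1,  # assumed penalty weight, chosen only for illustration
) -> float:
    """Return r(x, y) - beta * log(pi(y|x) / rho(y|x)) for one sampled response.

    policy_logprobs and reference_logprobs hold per-token log-probabilities
    of the same sampled response y under the fine-tuned policy (pi) and the
    frozen reference model (rho), respectively.
    """
    if len(policy_logprobs) != len(reference_logprobs):
        raise ValueError("both models must score the same token sequence")
    # The sequence log-probability is the sum of the per-token log-probabilities,
    # so the log-ratio log(pi / rho) is the difference of the two sums.
    log_ratio = sum(policy_logprobs) - sum(reference_logprobs)
    return reward_model_score - beta * log_ratio
```

When the policy assigns the response the same probability as the reference model, the log-ratio is zero and the reward model score passes through unchanged. As the policy drifts toward responses the reference model finds unlikely, the penalty grows and the effective reward shrinks.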