Super Study Guide: Transformers & Large Language Models

247 pages, Paperback

Published August 3, 2024

About the author

Afshine Amidi

4 books · 1 follower

Ratings & Reviews

Community Reviews

5 stars: 11 (68%)
4 stars: 3 (18%)
3 stars: 2 (12%)
2 stars: 0 (0%)
1 star: 0 (0%)

João Panizzutti
89 reviews
October 4, 2025
Finally, a book that teaches you EVERYTHING you need to know about LLMs from first principles (if you already have a grasp on math).
This was a very complex book, as I tried to reimplement each of the concepts myself. But it finally made me understand how LLMs work at a conceptual level.
An amazing book for anyone who wants to learn how ChatGPT or any of the so-called AIs (which are just probabilistic models) work.

A bit of my overview of the attention mechanism from the book:
Transformers receive tokens, which are on average about 4 characters each.

Based on these tokens, the role of the transformer is to predict what the next token in the sentence will be.

First, each token gets a token embedding, to which a positional embedding is added; the attention layers then make these representations contextual.
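
A minimal NumPy sketch of that embedding step; all sizes, token ids, and (random) weights here are toy assumptions for illustration, not values from the book:

```python
import numpy as np

# Toy sizes: real models use vocabularies of ~100k tokens and much larger d_model
vocab_size, seq_len, d_model = 100, 4, 8
rng = np.random.default_rng(0)

token_embedding = rng.normal(size=(vocab_size, d_model))    # one learned vector per token id
positional_embedding = rng.normal(size=(seq_len, d_model))  # one vector per position

token_ids = np.array([5, 42, 7, 99])                   # a hypothetical tokenized sentence
x = token_embedding[token_ids] + positional_embedding  # (seq_len, d_model) input to the layers
```
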
# The Self-Attention Mechanism
This is the mechanism that teaches the model what each word means and how it relates to all the other ones.
Each word gets a Query, a Key, and a Value.
The Query is what the word looks for.
The Key is the answer to the query.
And the Value is the information of the word itself.
For every word, we take its query and compare it against every other word in the sentence, using the dot product with each word's key.

Then we divide each dot product by the square root of the key dimension (√d_k) so the scores don't get too large.

We pass those scaled scores through a softmax. That gives us a vector of attention weights that tells the model how the word relates to all the others in the phrase.

Based on this vector, we compute a new representation for the word: the weighted sum of all the value vectors, each multiplied by its "focus" weight. That is the new value of that word.

Then we do that for all the words.
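
Those steps fit in a few lines of NumPy. This is a minimal sketch with toy sizes and random weights, not the book's code or an optimized implementation:

```python
import numpy as np

def self_attention(x, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model)."""
    Q, K, V = x @ W_q, x @ W_k, x @ W_v      # queries, keys, values for every word
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # compare each query to every key, scaled by sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row is one word's "focus" over the others
    return weights @ V                              # weighted sum of values = new representation per word

rng = np.random.default_rng(0)
d_model = d_k = 8
x = rng.normal(size=(4, d_model))  # 4 hypothetical token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(x, W_q, W_k, W_v)  # shape (4, 8)
```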

# Multi-Head Attention
There are many heads that do this in parallel, so each of them can learn something different about the words. In the end, we concatenate the results.
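
Continuing the sketch above (same assumed toy sizes), multi-head attention just runs several independent heads and concatenates their outputs; a real transformer also applies a learned output projection afterwards:

```python
def multi_head_attention(x, heads):
    """heads: list of (W_q, W_k, W_v) triples; reuses self_attention() from the sketch above."""
    return np.concatenate([self_attention(x, *h) for h in heads], axis=-1)

# Two hypothetical heads, each projecting into a smaller per-head dimension
heads = [tuple(rng.normal(size=(d_model, 4)) for _ in range(3)) for _ in range(2)]
out = multi_head_attention(x, heads)  # shape (4, 8): the concatenated head outputs
```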

# Feed Forward
After that, there are feed-forward neural networks that extract additional information about the words that attention may have missed.
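
A sketch of that feed-forward block, continuing with the same assumed toy dimensions (real transformers also wrap each block in residual connections and layer normalization, which are omitted here):

```python
def feed_forward(x, W1, b1, W2, b2):
    """Position-wise feed-forward: two linear layers with a ReLU, applied to each word independently."""
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

d_ff = 32  # hidden width; commonly about 4x d_model
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
out = feed_forward(out, W1, b1, W2, b2)  # same shape in and out: (seq_len, d_model)
```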

# The Decoder
Basically, the decoder is the part that does all of this; it gives each word:
- Context awareness
- Position awareness
- Token awareness

When we use both encoders and decoders, we are trying to translate from one language to another, but if we are doing language modeling (like LLMs do), we only use decoders.

But LLMs are causal, which means each token can only attend to PREVIOUS tokens. This is called masked self-attention.
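
One common way to implement this (a sketch; the helper name is my own, and this is not necessarily how the book presents it) is to add a mask of -inf values to the attention scores before the softmax:

```python
import numpy as np

def causal_mask(seq_len):
    # -inf above the diagonal: token i cannot attend to tokens after i
    return np.triu(np.full((seq_len, seq_len), -np.inf), k=1)

print(causal_mask(3))
# [[  0. -inf -inf]
#  [  0.   0. -inf]
#  [  0.   0.   0.]]
# Adding this to the attention scores before the softmax drives the
# masked (future) positions' weights to zero.
```
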
# Probabilities
When we get the probabilities for the next word, we can either:
Use a deterministic (greedy) choice, picking the token that is the most likely,
Or use sampling, where we draw the next token at random according to its probability.
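
A tiny sketch of those two decoding strategies, over made-up logits:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0])       # hypothetical next-token scores from the model
probs = np.exp(logits) / np.exp(logits).sum()  # softmax -> probability per candidate token

greedy_token = int(np.argmax(probs))                  # deterministic: always the most likely token
sampled_token = int(rng.choice(len(probs), p=probs))  # sampling: draw according to the distribution
```
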
TK
110 reviews · 97 followers
December 29, 2025
I recommend this book if you have already studied these topics in depth and are looking for a quick summary and overview as a knowledge recap. If you're learning from scratch, I would suggest looking at other resources with more theoretical depth, like Understanding Deep Learning (book), Attention Is All You Need (paper), Language Modeling from Scratch from Stanford (course), and Transformers, The Tech Behind LLMs (DL course). If you're looking for a more practical guide, I suggest the Transformers from Scratch notebook on Kaggle and the How Transformer LLMs Work course by DeepLearning.AI, so you can implement it from scratch with Python and PyTorch.

As a follow-up guide, I would give a 4-star review. As a starting book, 3.