Status Updates from Super Study Guide: Transformers & Large Language Models
KN is on page 94 of 247
Logit: z = Wx + b
Softmax: p_i = e^{z_i} / \sum_j e^{z_j}
has the property
log(p_i / p_j) = z_i - z_j
Can generalize with a temperature T: softmax(z / T)
Default: T = 1
T increases -> flattened distribution -> more "creative" sampling
— Feb 09, 2026 10:11PM
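A minimal sketch (not from the book) of the temperature-scaled softmax described in the note above, checking both the log-ratio property and the flattening effect of a higher T:

```python
import math

def softmax(z, T=1.0):
    # Temperature-scaled softmax: p_i = exp(z_i / T) / sum_j exp(z_j / T)
    m = max(v / T for v in z)  # subtract max for numerical stability
    exps = [math.exp(v / T - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

z = [2.0, 1.0, 0.1]
p = softmax(z)  # default T = 1

# Log-ratio property: log(p_i / p_j) = z_i - z_j
assert abs(math.log(p[0] / p[1]) - (z[0] - z[1])) < 1e-9

flat = softmax(z, T=10.0)   # high T -> flatter distribution
sharp = softmax(z, T=0.1)   # low T -> peakier distribution
assert max(flat) < max(p) < max(sharp)
```

At T = 1 this is the standard softmax; as T grows the scaled logits z / T shrink toward each other, so the output probabilities approach uniform.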
KN is on page 86 of 247
Attention: Q, K, V
Transformer: unified vector encodes token + position (shared embedding)
Encoder: self-attention over the input sequence
Decoder: cross-attention to all encoder outputs
— Feb 08, 2026 01:22AM
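A minimal sketch (an assumption-laden illustration, not the book's code) of the Q, K, V attention mentioned above, using plain scaled dot-product attention; passing the same sequence as Q, K, and V gives encoder-style self-attention, while decoder cross-attention would take Q from the decoder and K, V from the encoder outputs:

```python
import math

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, row by row
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query against every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        m = max(scores)  # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output = attention-weighted sum of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Self-attention: Q, K, V all come from the same (toy) sequence X
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = scaled_dot_product_attention(X, X, X)
# Cross-attention would instead be scaled_dot_product_attention(dec_states, enc_out, enc_out)
```

Each output row is a convex combination of the value vectors, so every component stays within the range of the corresponding value column.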