KN’s Reviews > Super Study Guide: Transformers & Large Language Models > Status Update

KN
KN is on page 86 of 247
Attention: Q, K, V

Transformer input: one unified vector per token encoding token + position (token embedding + positional embedding added together)
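A minimal sketch of that unified input vector, assuming learned (lookup-table) positional embeddings; the sizes here are illustrative, not from the book:

```python
import numpy as np

# Hypothetical sizes for illustration
vocab_size, max_len, d_model = 100, 16, 8
rng = np.random.default_rng(0)
tok_emb = rng.normal(size=(vocab_size, d_model))  # token embedding table
pos_emb = rng.normal(size=(max_len, d_model))     # positional embedding table

token_ids = np.array([5, 17, 42])
positions = np.arange(len(token_ids))

# One unified vector per token: token embedding + positional embedding
x = tok_emb[token_ids] + pos_emb[positions]
print(x.shape)  # (3, 8)
```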

Encoder: Self-attention over the input sequence
Decoder: Masked self-attention + cross-attention to all encoder outputs
Feb 08, 2026 01:22AM


KN’s Previous Updates

KN
KN is on page 94 of 247
Logit: z = Wx + b
p_i = softmax(z)_i = e^{z_i} / \sum_j e^{z_j}
has the property
log(p_i / p_j) = z_i - z_j
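A quick numerical check of the log-ratio property (the shared normalizer cancels, so pairwise log-odds depend only on logit differences); the logit values are arbitrary:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # stable softmax
    return e / e.sum()

z = np.array([2.0, 0.5, -1.0])  # arbitrary logits for illustration
p = softmax(z)

# log(p_i / p_j) == z_i - z_j: the sum in the denominator cancels
i, j = 0, 2
print(np.isclose(np.log(p[i] / p[j]), z[i] - z[j]))  # True
```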

Can generalize to softmax(z / T) with temperature T
Default: T = 1
T increases -> flatter distribution -> more “creative” sampling
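The temperature effect in a few lines (logit values are arbitrary; the spread of the output shrinks as T grows):

```python
import numpy as np

def softmax_T(z, T=1.0):
    """Temperature-scaled softmax: softmax(z / T)."""
    z = np.asarray(z) / T
    e = np.exp(z - z.max())  # stable softmax
    return e / e.sum()

z = [2.0, 1.0, 0.0]  # arbitrary logits for illustration
print(softmax_T(z, T=1.0))  # default: standard softmax
print(softmax_T(z, T=5.0))  # higher T -> flatter, closer to uniform
print(softmax_T(z, T=0.1))  # lower T -> peaked, near argmax
```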
Feb 09, 2026 10:11PM
