Status Updates From Super Study Guide: Transformers & Large Language Models

Status Updates Showing 1-16 of 16

KN is on page 94 of 247
Logit: z = Wx + b
p_i = softmax(z)_i = e^{z_i} / \sum_j e^{z_j}
has the property
log(p_i / p_j) = z_i - z_j

Can generalize to softmax(z / T) with temperature T
Default: T = 1
T increases -> flatter distribution -> more “creative” outputs
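The note above can be checked numerically. A minimal sketch (variable names are illustrative, not from the book): the log-ratio property holds at any temperature, and raising T flattens the distribution.

```python
import math

def softmax(z, T=1.0):
    """Softmax with temperature: softmax(z / T)."""
    exps = [math.exp(zi / T) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

z = [2.0, 1.0, 0.1]
p = softmax(z)  # default T = 1

# log-ratio property: log(p_i / p_j) = z_i - z_j
assert abs(math.log(p[0] / p[1]) - (z[0] - z[1])) < 1e-9

# higher T -> flatter distribution (smaller spread between probabilities)
p_hot = softmax(z, T=10.0)
assert max(p_hot) - min(p_hot) < max(p) - min(p)
```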
Feb 09, 2026 10:11PM

KN is on page 86 of 247
Attention: Q, K, V

Transformer: Unified vector to encode token + position (shared embedding)

Encoder: self-attention over the input sequence
Decoder: cross-attention over the encoder outputs
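The Q, K, V mechanics noted above can be sketched as scaled dot-product attention, softmax(QKᵀ / √d_k)·V. A minimal NumPy sketch (shapes and names are illustrative, not from the book):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n_q, n_k) similarity scores
    # row-wise softmax (shifted by the row max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted sum of value rows

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))  # 2 queries, d_k = 4
K = rng.normal(size=(3, 4))  # 3 keys
V = rng.normal(size=(3, 4))  # 3 values
out = attention(Q, K, V)     # shape (2, 4): one output vector per query
```

In encoder self-attention Q, K, V all come from the same sequence; in decoder cross-attention Q comes from the decoder while K and V come from the encoder outputs.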
Feb 08, 2026 01:22AM

TK is on page 134 of 247
Dec 27, 2025 04:49AM

TK is on page 112 of 247
Dec 24, 2025 04:20AM

TK is on page 98 of 247
Dec 23, 2025 02:48AM

TK is on page 75 of 247
Dec 20, 2025 02:54PM

TK is on page 10 of 247
Nov 05, 2025 05:08AM

Rahul is on page 23 of 247
Oct 08, 2024 07:43AM