Yuchen’s Reviews > AI Engineering: Building Applications with Foundation Models > Status Update
Yuchen
is on page 405 of 532
Part of Ch8 feels like a repeat of Ch4 of another book, "Designing Machine Learning Systems", especially:
1. how to handle a lack of labels: use weak/semi-supervision or active learning (toy sketch after these notes)
2. data augmentation: perturbation, synthetic data
Some of the ideas are new:
1. distillation
2. instruction data synthesis
Data evaluation and verification are challenging; it is common to use model evaluation methods here too.
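As a toy illustration of the weak-supervision idea in point 1 (not the book's code; the labeling functions below are made up for a spam example), several noisy heuristics can vote on each label:

```python
# Toy weak supervision: hand-written labeling functions vote on each example,
# and the majority vote becomes a (noisy) training label.
from collections import Counter

ABSTAIN, SPAM, HAM = None, 1, 0

def lf_has_link(text):        # hypothetical heuristic
    return SPAM if "http://" in text or "https://" in text else ABSTAIN

def lf_mentions_prize(text):  # hypothetical heuristic
    return SPAM if "prize" in text.lower() else ABSTAIN

def lf_short_message(text):   # hypothetical heuristic
    return HAM if len(text.split()) < 5 else ABSTAIN

def weak_label(text, lfs=(lf_has_link, lf_mentions_prize, lf_short_message)):
    votes = [lf(text) for lf in lfs if lf(text) is not ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN

print(weak_label("Claim your prize at https://example.com"))  # -> 1 (SPAM)
```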
— Nov 27, 2025 03:36PM
Yuchen’s Previous Updates
Yuchen
is finished
Ch10 brings everything together
Step 1: use RAG, see Ch6
Step 2: guardrails, see Ch5 for attack models, Ch4 for qualifying failures
Step 3: router & gateway, layer stacking, Ch7
Step 4: caches: KV cache, prompt cache, SQL cache, semantic cache (sketch after these steps)
Step 5: agents, Ch6, write actions
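A rough sketch of what the semantic cache from Step 4 could look like (the embed function and the similarity threshold are assumptions, e.g. any sentence-embedding model; not the book's implementation):

```python
import numpy as np

class SemanticCache:
    """Return a cached response when a new query is close enough in embedding space."""
    def __init__(self, embed, threshold=0.92):
        self.embed = embed          # callable: str -> 1-D numpy vector (assumed)
        self.threshold = threshold  # cosine-similarity cutoff for a cache hit
        self.keys, self.values = [], []

    def get(self, query):
        if not self.keys:
            return None
        q = self.embed(query)
        sims = [float(np.dot(q, k) / (np.linalg.norm(q) * np.linalg.norm(k)))
                for k in self.keys]
        best = int(np.argmax(sims))
        return self.values[best] if sims[best] >= self.threshold else None

    def put(self, query, response):
        self.keys.append(self.embed(query))
        self.values.append(response)
```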
Re user feedback:
1. a conversational interface makes it easier to collect user feedback
2. AI engineering is closer to the product.
— Nov 27, 2025 09:59PM
Yuchen
is on page 448 of 532
Ch9 is one of the best. The inference optimization part has a lot of insights.
1. quantization is more common/useful than pruning (sketch below).
2. overcoming the decoding bottleneck: speculative decoding, inference with reference, parallel decoding
3. attention mechanism optimization: KV cache size, kernels & compilers.
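A toy sketch of point 1, symmetric per-tensor int8 weight quantization (a simplified illustration; real setups typically use per-channel scales, calibration data, etc.):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize_int8(q, s)).max())  # small reconstruction error
```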
— Nov 27, 2025 08:20PM
Yuchen
is on page 363 of 532
Part of Ch8 feels like a repeat of Ch4 of another book, "Designing Machine Learning Systems", especially:
1. how to handle a lack of labels: use weak/semi-supervision or active learning
2. data augmentation: perturbation, synthetic data
Some of the ideas are new:
1. distillation
2. instruction data synthesis
— Nov 27, 2025 03:33PM
Yuchen
is on page 363 of 532
Ch7 is a dense chapter about fine-tuning. One good summary: as model size increases, full fine-tuning becomes impractical because updating the entire model's weights is too costly.
One of the most important PEFT methods is LoRA. It is memory-efficient and modular: it is easy to fine-tune multiple LoRA adapters on top of the same base model.
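A minimal PyTorch sketch of the LoRA idea (assumptions: torch is available, the rank and alpha values are arbitrary; real implementations usually target the attention projection matrices and can merge the update back into the base weights for serving):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (B @ A)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # base weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Base output plus the scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # only A and B train
```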
— Nov 26, 2025 10:39PM
Yuchen
is on page 306 of 532
Good chapter about RAG.
1. Term-based retrievers: BM25 (toy sketch after this list)
2. Embedding-based retrievers: a lot of vector search algorithms
3. Ask the agent to use a function as a tool: usually the tool function has a description and a parameters section that go into the agent's context. You can either let the agent plan the work using the real function names, or use a translator to convert natural language into function names.
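A toy BM25 scorer for point 1 (simplified Okapi BM25; real term-based retrievers also handle tokenization, stemming, and inverted indexes):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc (a list of tokens) against a query (a list of tokens)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))   # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [["the", "cat", "sat"], ["dogs", "bark", "loudly"], ["cat", "videos"]]
print(bm25_scores(["cat"], docs))  # highest scores for the docs containing "cat"
```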
— Nov 24, 2025 10:26PM
Yuchen
is on page 253 of 532
A brief introduction to prompt engineering.
1. understand system and user prompts
2. best practices: adopt a persona, provide examples, specify the output format, break the task into subtasks and use code to connect them, instruct the model to use chain-of-thought
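A hypothetical chat payload combining those practices: persona in the system prompt, one few-shot example, and an explicit output format (the message content is made up; the exact API shape depends on the provider):

```python
# Hypothetical example for a support-ticket classifier; not from the book.
messages = [
    {"role": "system",
     "content": ("You are a support engineer. Classify each ticket and draft a reply. "
                 'Answer only in JSON: {"category": "...", "reply": "..."}. '
                 "Think through the problem step by step before answering.")},
    # one few-shot example
    {"role": "user", "content": "My invoice shows the wrong amount."},
    {"role": "assistant",
     "content": '{"category": "billing", "reply": "Sorry about that, we will correct it."}'},
    # the actual request
    {"role": "user", "content": "The app crashes when I upload a photo."},
]
```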
— Nov 23, 2025 10:37PM
Yuchen
is on page 211 of 532
Ch4 talks about how to evaluate a model. I can tell that this chapter is more about trying different things; there is no formal procedure for it.
1. how to test for hallucination: ask the model whether X is connected to Y when X has nothing to do with Y
2. a public benchmark might leak into the model's training set
3. use n-gram overlap or perplexity to test for data contamination (toy sketch below)
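A toy version of the n-gram overlap check in point 3 (the window size and whitespace tokenization are assumptions; 13-grams are a common choice in contamination studies):

```python
def ngram_overlap(benchmark_text, training_text, n=13):
    """Fraction of benchmark n-grams that also appear in the training text."""
    def ngrams(tokens, n):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    bench = ngrams(benchmark_text.split(), n)
    train = ngrams(training_text.split(), n)
    return len(bench & train) / max(len(bench), 1)

# A high overlap suggests the benchmark may have leaked into the training data.
```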
— Nov 22, 2025 10:22PM
Yuchen
is on page 159 of 532
Solid Ch3.
1. why are foundation models hard to evaluate? The language model component makes the evaluation open-ended, unlike classical ML classification problems.
2. AI as a judge is useful, but human evaluation still matters because it captures human preference. A model can achieve a perfect score on a benchmark, but human evaluation never gets saturated.
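A minimal AI-as-a-judge prompt sketch (the wording and response format are my assumptions, not the book's template):

```python
# Hypothetical pairwise-comparison judge prompt; a judge model returns "A" or "B".
judge_prompt = """You are an impartial judge. Given a question and two answers,
decide which answer is more helpful and correct. Respond with "A" or "B" only.

Question: {question}
Answer A: {answer_a}
Answer B: {answer_b}
Verdict:"""
```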
— Nov 22, 2025 02:36PM
Yuchen
is on page 112 of 532
Informative chapter:
1. Ch2 would be better if it explained the evolution from encoder/decoder RNNs, to dot-product attention, to scaled dot-product attention (which LLMs use; sketch below).
2. the transformer block explanation is confusing.
3. the discussions of pre-training, post-training, and sampling are very informative. Consider rereading them after finishing the book; Ch2 refers to other chapters a lot.
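For point 1, a minimal NumPy sketch of scaled dot-product attention (single head, no masking, batch dimension omitted):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_queries, n_keys)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # (n_queries, d_v)
```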
— Nov 20, 2025 10:30PM
Yuchen
is on page 49 of 532
Chapter 1:
Good summary and a great comparison between ML engineering and AI (application) engineering.
1. multiple techniques to get foundation models to generate what you want: prompt engineering, retrieval-augmented generation (RAG), and fine-tuning
2. with foundation models available today, it is possible to start with the product, then invest in data & models once the product shows promise
— Nov 15, 2025 04:34PM

