Yuchen’s Reviews > AI Engineering: Building Applications with Foundation Models > Status Update
Yuchen
is on page 405 of 532
Part of Ch8 feels like a repeat of Ch4 of another book, "Designing Machine Learning Systems", especially:
1. how to handle a lack of labels: use weak/semi-supervision or active learning (toy sketch after these notes)
2. data augmentation: perturbation, synthetic data
Some of the ideas are new:
1. distillation
2. instruction data synthesis
Data evaluation and verification are challenging; it is common to use model evaluation methods here too.
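As a toy illustration of the weak-supervision idea in point 1 (not the book's code; the labeling functions below are made up for a spam example), several noisy heuristics can vote on each label:

```python
# Toy weak supervision: hand-written labeling functions vote on each example,
# and the majority vote becomes a (noisy) training label.
from collections import Counter

ABSTAIN, SPAM, HAM = None, 1, 0

def lf_has_link(text):        # hypothetical heuristic
    return SPAM if "http://" in text or "https://" in text else ABSTAIN

def lf_mentions_prize(text):  # hypothetical heuristic
    return SPAM if "prize" in text.lower() else ABSTAIN

def lf_short_message(text):   # hypothetical heuristic
    return HAM if len(text.split()) < 5 else ABSTAIN

def weak_label(text, lfs=(lf_has_link, lf_mentions_prize, lf_short_message)):
    votes = [lf(text) for lf in lfs if lf(text) is not ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN

print(weak_label("Claim your prize at https://example.com"))  # -> 1 (SPAM)
```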
— Nov 27, 2025 03:36PM
Yuchen’s Previous Updates
Yuchen
is finished
Ch10 brings everything together
Step 1: use RAG, see Ch6
Step 2: guardrails, see Ch5 for attack models, Ch4 for qualifying failures
Step 3: router & gateway, layer stacking, Ch7
Step 4: caches: KV cache, prompt cache, SQL cache, semantic cache (sketch after these steps)
Step 5: agents, Ch6, write actions
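A rough sketch of what the semantic cache from Step 4 could look like (the embed function and the similarity threshold are assumptions, e.g. any sentence-embedding model; not the book's implementation):

```python
import numpy as np

class SemanticCache:
    """Return a cached response when a new query is close enough in embedding space."""
    def __init__(self, embed, threshold=0.92):
        self.embed = embed          # callable: str -> 1-D numpy vector (assumed)
        self.threshold = threshold  # cosine-similarity cutoff for a cache hit
        self.keys, self.values = [], []

    def get(self, query):
        if not self.keys:
            return None
        q = self.embed(query)
        sims = [float(np.dot(q, k) / (np.linalg.norm(q) * np.linalg.norm(k)))
                for k in self.keys]
        best = int(np.argmax(sims))
        return self.values[best] if sims[best] >= self.threshold else None

    def put(self, query, response):
        self.keys.append(self.embed(query))
        self.values.append(response)
```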
Re user feedback:
1. a conversational interface makes it easier to collect user feedback
2. AI engineering is closer to the product.
— Nov 27, 2025 09:59PM
Yuchen
is on page 448 of 532
Ch9 is one of the best. The inference optimization part has a lot of insights.
1. quantization is more common/useful than pruning (sketch below).
2. overcoming the decoding bottleneck: speculative decoding, inference with reference, parallel decoding
3. attention mechanism optimization: KV cache size, kernels & compilers.
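A toy sketch of point 1, symmetric per-tensor int8 weight quantization (a simplified illustration; real setups typically use per-channel scales, calibration data, etc.):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize_int8(q, s)).max())  # small reconstruction error
```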
— Nov 27, 2025 08:20PM
Yuchen
is on page 363 of 532
Part of Ch8 feels like a repeat of Ch4 of another book, "Designing Machine Learning Systems", especially:
1. how to handle a lack of labels: use weak/semi-supervision or active learning
2. data augmentation: perturbation, synthetic data
Some of the ideas are new:
1. distillation
2. instruction data synthesis
— Nov 27, 2025 03:33PM
Yuchen
is on page 363 of 532
Ch7 is a dense chapter about fine-tuning. One good summary: as model size increases, full fine-tuning becomes impractical because updating the entire model's weights is too costly.
One of the most important PEFT methods is LoRA. It is memory-efficient and modular: it is easy to fine-tune multiple LoRA adapters on top of the same base model.
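A minimal PyTorch sketch of the LoRA idea (assumptions: torch is available, the rank and alpha values are arbitrary; real implementations usually target the attention projection matrices and can merge the update back into the base weights for serving):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (B @ A)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # base weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Base output plus the scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # only A and B train
```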
— Nov 26, 2025 10:39PM
Yuchen
is on page 306 of 532
Good chapter about RAG.
1. Term-based retrievers: BM25 (toy sketch after this list)
2. Embedding-based retrievers: a lot of vector search algorithms
3. Ask the agent to use a function as a tool: usually the tool function has a description and a parameters section that go into the agent's context. You can either let the agent plan the work using the real function names, or use a translator to convert natural language into function names.
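A toy BM25 scorer for point 1 (simplified Okapi BM25; real term-based retrievers also handle tokenization, stemming, and inverted indexes):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc (a list of tokens) against a query (a list of tokens)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))   # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [["the", "cat", "sat"], ["dogs", "bark", "loudly"], ["cat", "videos"]]
print(bm25_scores(["cat"], docs))  # highest scores for the docs containing "cat"
```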
— Nov 24, 2025 10:26PM
Yuchen
is on page 253 of 532
A brief introduction to prompt engineering.
1. understand system and user prompts
2. best practices: adopt a persona, provide examples, specify the output format, break the task into subtasks and use code to connect them, instruct the model to use chain-of-thought
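A hypothetical chat payload combining those practices: persona in the system prompt, one few-shot example, and an explicit output format (the message content is made up; the exact API shape depends on the provider):

```python
# Hypothetical example for a support-ticket classifier; not from the book.
messages = [
    {"role": "system",
     "content": ("You are a support engineer. Classify each ticket and draft a reply. "
                 'Answer only in JSON: {"category": "...", "reply": "..."}. '
                 "Think through the problem step by step before answering.")},
    # one few-shot example
    {"role": "user", "content": "My invoice shows the wrong amount."},
    {"role": "assistant",
     "content": '{"category": "billing", "reply": "Sorry about that, we will correct it."}'},
    # the actual request
    {"role": "user", "content": "The app crashes when I upload a photo."},
]
```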
— Nov 23, 2025 10:37PM
Yuchen
is on page 211 of 532
Ch4 talks about how to evaluate a model. I can tell that this chapter is more about trying different things; there is no formal procedure for it.
1. how to test for hallucination: ask the model whether X is connected to Y when X has nothing to do with Y
2. a public benchmark might leak into the model's training set
3. use n-gram overlap or perplexity to test for data contamination (toy sketch below)
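A toy version of the n-gram overlap check in point 3 (the window size and whitespace tokenization are assumptions; 13-grams are a common choice in contamination studies):

```python
def ngram_overlap(benchmark_text, training_text, n=13):
    """Fraction of benchmark n-grams that also appear in the training text."""
    def ngrams(tokens, n):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    bench = ngrams(benchmark_text.split(), n)
    train = ngrams(training_text.split(), n)
    return len(bench & train) / max(len(bench), 1)

# A high overlap suggests the benchmark may have leaked into the training data.
```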
— Nov 22, 2025 10:22PM
Yuchen
is on page 159 of 532
Solid Ch3.
1. why are foundation models hard to evaluate? The language model component makes the evaluation open-ended, unlike classical ML classification problems.
2. AI as a judge is useful, but human evaluation still matters because it captures human preference. A model can achieve a perfect score on a benchmark, but human evaluation never gets saturated.
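A minimal AI-as-a-judge prompt sketch (the wording and response format are my assumptions, not the book's template):

```python
# Hypothetical pairwise-comparison judge prompt; a judge model returns "A" or "B".
judge_prompt = """You are an impartial judge. Given a question and two answers,
decide which answer is more helpful and correct. Respond with "A" or "B" only.

Question: {question}
Answer A: {answer_a}
Answer B: {answer_b}
Verdict:"""
```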
— Nov 22, 2025 02:36PM
Yuchen
is on page 112 of 532
Informative chapter:
1. Ch2 would be better if it explained the evolution from encoder/decoder RNNs, to dot-product attention, to scaled dot-product attention (which LLMs use; sketch below).
2. the transformer block explanation is confusing.
3. the discussions of pre-training, post-training, and sampling are very informative. Consider rereading them after finishing the book; Ch2 refers to other chapters a lot.
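For point 1, a minimal NumPy sketch of scaled dot-product attention (single head, no masking, batch dimension omitted):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_queries, n_keys)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # (n_queries, d_v)
```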
— Nov 20, 2025 10:30PM
Yuchen
is on page 49 of 532
Chapter 1:
Good summary and a great comparison between ML engineering and AI (application) engineering.
1. multiple techniques to get foundation models to generate what you want: prompt engineering, retrieval-augmented generation (RAG), and fine-tuning
2. with foundation models available today, it is possible to start with the product, then invest in data & models once the product shows promise
— Nov 15, 2025 04:34PM

