Title: Inference Engineering
Author(s): Philip Kiely
ISBN: 979-8-9943597-2-3
Publisher: Baseten Books
Publication Date Year: 2026
Publication Date Month: March
Publication Date Day: 5
Page count: 256
Format: Paperback, Hardcover, Ebook (PDF, EPUB)
Description:
A guide for engineers who want to understand the hardware, software, techniques, and infrastructure required to run AI models in production. Covers the full inference stack from CUDA and GPU architecture to frameworks (PyTorch, vLLM, SGLang, TensorRT-LLM) and production operations, including LLMs, image/video generation, speech, and multimodal models.
Language: English
Link to book page: https://www.baseten.co/inference-engi...
Link to book cover: https://philipkiely.com/images/infere...
Author's Goodreads page: https://www.goodreads.com/author/show...