AVX SIMD instructions are free parallelization hidden inside every modern x86 CPU. You can access these vectorized instructions for extra speed, without learning assembly language, by using C++ AVX intrinsics.
Key points:
- Introduction to AVX SIMD intrinsics
- Vectorization and horizontal reductions
- Low latency tricks and branchless programming
- Instruction-level parallelism and out-of-order execution
- Loop unrolling & double loop unrolling

Table of Contents

Part: AVX Optimizations
1. AVX Intrinsics
2. Simple AVX Example
3. CPU Platform Detection
4. Common Bugs & Slugs
5. One-Dimensional Vectorization
6. Horizontal Reductions
7. Vector Dot Product
8. Loop Optimizations
9. Softmax
10. Advanced AVX Techniques

Part: Low-Level Code Optimizations
11. Compile-Time Optimizations
12. Zero Runtime Cost Operations
13. Bitwise Operations
14. Floating-Point Computations
15. Arithmetic Optimizations
16. Branch Prediction
17. Instruction-Level Parallelism
18. Core Pinning
19. Cache Locality
20. Cache Warming
21. Contiguous Memory Blocks
22. False Sharing
23. Memory Pools

Appendix: Long List of Low Latency Techniques
Appendix: License Details