Increase the efficiency of CUDA C++ kernels for AI and high-performance computing on NVIDIA GPUs. Get more from your GPU investment with an efficient software layer.
Main Topics
- Speeding up CUDA C++ kernels
- Parallelization and vectorization
- Compute optimizations
- Memory access optimizations
Table of Contents
1. Parallel Programming
2. Optimizing CUDA Programs
3. Vectorization
4. AI Kernel Optimization
5. Profiling Tools
6. Compilers and Optimizers
7. Timing CUDA C++ Programs
8. Memory Optimizations
9. Coalescing and Striding
10. Data Transfer Optimizations
11. Heap Memory Allocation
12. Compute Optimizations
13. Warp Divergence
14. Grid Optimizations
15. Compile-Time Optimizations
16. Arithmetic Optimizations
17. Floating-Point Bit Tricks
18. Advanced Techniques
CUDA C++ Slugs