
cudaFlow: Modern C++ Programming Model for GPU Task Graph Parallelism - CppCon 2021


Presented by Tsung-Wei Huang & Dian-Lun Lin
---
The graphics processing unit (GPU) has become central to today's scientific computing applications, such as machine learning and simulation. As application complexity continues to grow, the need to quickly execute thousands of dependent GPU tasks has become a major bottleneck in the development flow. To overcome this challenge, modern CUDA has introduced CUDA Graph, which lets users offload an entire GPU task graph to the GPU and thereby minimize scheduling overheads. However, programming CUDA Graph is extremely tedious and involves many low-level details that are difficult to get right. In this paper, we therefore introduce cudaFlow, a modern C++ programming model that streamlines the building of large GPU workloads using CUDA Graph. cudaFlow enables efficient implementations of GPU decomposition strategies, together with incremental update methods, to express complex GPU algorithms that are hard to execute efficiently with mainstream stream-based models. We have demonstrated the simplicity and efficiency of cudaFlow on large-scale GPU applications composed of thousands of tasks and dependencies.
The talk will cover five major components:
1. What is the new CUDA Graph programming model?
2. Why do we need a C++ programming model for GPU task graph parallelism?
3. Designs, implementations, and deployments of the proposed cudaFlow programming model.
4. Real use cases of cudaFlow and its performance advantages in large GPU workloads.
5. Remarks and roadmap suggestions for the GPU programming community.
By the end of the presentation, the audience will know how to leverage the new GPU task graph parallelism to boost the performance of large-scale GPU applications, such as machine learning and scientific simulations.
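To make the idea of a task graph concrete, here is a minimal CPU-only sketch in standard C++. It is an analogy, not the cudaFlow or CUDA Graph API: the names `TaskGraph`, `add`, `precede`, `run`, and `run_diamond` are invented for this illustration, and the tasks are ordinary lambdas rather than GPU kernels. It shows only the pattern the abstract describes: declare tasks and their dependencies once, then hand the whole graph to an executor that runs it in a dependency-respecting order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical CPU-only stand-in for a task graph (illustration only).
class TaskGraph {
  struct Node {
    std::string name;
    std::function<void()> work;
    std::vector<std::size_t> successors;
    std::size_t num_dependencies;
  };
  std::vector<Node> nodes_;

public:
  // Adds a task and returns its handle (an index into the graph).
  std::size_t add(std::string name, std::function<void()> work) {
    nodes_.push_back(Node{std::move(name), std::move(work), {}, 0});
    return nodes_.size() - 1;
  }

  // Declares that task `before` must finish before task `after` starts.
  void precede(std::size_t before, std::size_t after) {
    assert(before != after && "a task cannot depend on itself");
    nodes_.at(before).successors.push_back(after);
    ++nodes_.at(after).num_dependencies;
  }

  // Runs every task once in topological order (Kahn's algorithm) and
  // returns the task names in the order they ran. The graph itself is
  // not modified, so it can be run repeatedly.
  std::vector<std::string> run() const {
    std::vector<std::size_t> pending(nodes_.size());
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < nodes_.size(); ++i) {
      pending[i] = nodes_[i].num_dependencies;
      if (pending[i] == 0) ready.push(i);
    }
    std::vector<std::string> order;
    while (!ready.empty()) {
      const std::size_t id = ready.front();
      ready.pop();
      nodes_[id].work();
      order.push_back(nodes_[id].name);
      for (std::size_t next : nodes_[id].successors) {
        if (--pending[next] == 0) ready.push(next);
      }
    }
    if (order.size() != nodes_.size()) {
      throw std::logic_error("task graph contains a cycle");
    }
    return order;
  }
};

// Builds a diamond-shaped graph that computes y = (x + 1) * (x * 2) for x = 3.
// If `order_out` is non-null, it receives the order in which tasks ran.
int run_diamond(std::vector<std::string>* order_out = nullptr) {
  int x = 0, a = 0, b = 0, y = 0;
  TaskGraph graph;
  const auto load = graph.add("load", [&] { x = 3; });
  const auto inc  = graph.add("inc",  [&] { a = x + 1; });
  const auto dbl  = graph.add("dbl",  [&] { b = x * 2; });
  const auto mul  = graph.add("mul",  [&] { y = a * b; });
  graph.precede(load, inc);
  graph.precede(load, dbl);
  graph.precede(inc, mul);
  graph.precede(dbl, mul);
  const auto order = graph.run();
  if (order_out != nullptr) *order_out = order;
  return y;
}
```

Calling `run_diamond()` from a `main` function returns 24, since "inc" yields 4 and "dbl" yields 6 before "mul" combines them. On a GPU, the two middle tasks have no dependency on each other and could execute concurrently; this sequential sketch only preserves their ordering constraints.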
---
Tsung-Wei Huang
As a university faculty member, a central theme of my research is to make parallel computing easier to handle. I am passionate about using modern C++ technology to solve parallel and heterogeneous computing problems. One such effort is my Taskflow project, A General-purpose Parallel and Heterogeneous Task Programming System using Modern C++, which I developed to help developers quickly write parallel and heterogeneous programs with both high performance and high productivity.
Dian-Lun Lin
PhD student, University of Utah
---
Videos Filmed & Edited by Bash Films
YouTube Channel Managed by Digital Medium Ltd