Onsite

High-Level GPU Programming

Modern HPC systems combine CPUs with accelerators such as GPUs or FPGAs, which makes optimizing code for each platform time-consuming. Cross-platform portability ecosystems provide a higher-level abstraction layer that simplifies parallel programming in shared-memory environments. Examples include SYCL, Kokkos, and standard C++ parallelism.

  • SYCL, an open standard by the Khronos Group, offers a unified, single-source C++ layer for diverse devices, enabling parallel execution on CPUs, GPUs, FPGAs, and other accelerators.
  • Kokkos Core, a C++ programming model, enables performance-portable applications across HPC platforms and addresses the challenges of complex node architectures. Kokkos supports several backend programming models, including CUDA, HIP, SYCL, HPX, OpenMP, and C++ threads.
  • Recent C++ standards (C++17 and later) provide parallel algorithms with execution policies, which some compilers can offload to GPUs.
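As a rough illustration of the standard C++ approach, the following sketch expresses SAXPY (y = a·x + y) and a dot product with C++17 parallel algorithms and the `std::execution::par_unseq` policy. Whether this code actually runs on a GPU depends on the toolchain: we assume a compiler with GPU offload support for standard parallelism (for example NVIDIA's `nvc++ -stdpar=gpu`), while an ordinary host compiler runs the same code on the CPU. The function names `saxpy` and `dot` are illustrative only and are not taken from the course material.

```cpp
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

// y = a * x + y using a standard parallel algorithm.
// par_unseq permits parallel and vectorized execution; the implementation
// decides how. With a GPU-capable toolchain (assumed, e.g. nvc++ -stdpar=gpu)
// this loop may be offloaded, otherwise it runs on the CPU.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    std::transform(std::execution::par_unseq,
                   x.begin(), x.end(),   // first input range
                   y.begin(),            // second input range
                   y.begin(),            // output written in place
                   [a](float xi, float yi) { return a * xi + yi; });
}

// Dot product as a parallel reduction: sum of x[i] * y[i].
// The default operations of transform_reduce are std::plus and std::multiplies.
float dot(const std::vector<float>& x, const std::vector<float>& y) {
    return std::transform_reduce(std::execution::par_unseq,
                                 x.begin(), x.end(), y.begin(), 0.0f);
}
```

For example, calling `saxpy(2.0f, x, y)` with `x = {1, 2, 3}` and `y = {1, 1, 1}` leaves `y = {3, 5, 7}`. SYCL and Kokkos express the same kind of data-parallel loop through their own kernel-launch APIs rather than through standard algorithms.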

This training introduces GPU programming with SYCL, Kokkos, and standard C++ for writing portable and performant accelerated applications. The course consists of lectures and hands-on sessions on the LUMI and Mahti supercomputers, which feature AMD and NVIDIA GPUs, respectively. At the end of the training, participants will also have the opportunity to apply the acquired knowledge to their own coding projects and real-world application scenarios.

Where and when

Wednesday 27th – Friday 29th November

This is an on-site event held at the CSC Training Facilities on the premises of CSC, Keilaranta 14, Espoo, Finland.

Learning outcome

At the end of this training, participants will be able to:

  • use SYCL, Kokkos, and standard C++ to write hardware-agnostic parallel code that runs on both CPUs and GPUs
  • manage memory across devices
  • do basic performance analysis
  • evaluate the trade-offs between different approaches to programming GPUs

Prerequisites

This course targets developers who know C++ and would like to learn how to program GPUs, as well as developers who already program GPUs with a non-portable approach such as CUDA or HIP and would like to write performant code that runs on various computing platforms. To follow the course, participants are expected to be familiar with basic C++ concepts such as raw pointers, classes, structures, templates, lambdas, and functors.

The content level of the course is broken down as follows: beginner: 70%, intermediate: 20%, advanced: 10%, community-targeted content: 0%.

Tentative program (coarse-grained):

Day 1, Wednesday 27.11, 9:00-17:00

09:00-11:00 Introduction to GPUs and GPU parallel programming model
11:00-12:00 Refresher of C++ concepts
12:00-13:00 Lunch break
13:00-16:45 SYCL I
16:45-17:00 Day 1 wrap-up

Day 2, Thursday 28.11, 9:00-17:00

09:00-12:00 SYCL II
12:00-13:00 Lunch break
13:00-15:00 SYCL III
15:00-16:45 Mahti and LUMI (SYCL installation, usage, and exercises), SYCLomatic, memory optimizations
16:45-17:00 Day 2 wrap-up

Day 3, Friday 29.11, 9:00-17:00

09:00-10:00 Interoperability with third-party libraries; multi-GPU and multi-node programming
10:00-12:00 Kokkos & Standard C++
12:00-13:00 Lunch break
13:00-16:45 Exercises & Bring your own code
16:45-17:00 Day 3 wrap-up & Course closing

Material from the previous edition of this training is available on GitHub.