Onsite

High Performance R

Description

Are you using R but not sure whether your code makes the best use of the available computing resources? Would you like to learn how to speed up R analyses with parallel computing, identify bottlenecks in your R scripts, or get tips on handling large datasets in R? Join our new course, which focuses on using R efficiently and making the most of R in a high performance computing environment.

The topics of this course include:

  • making use of the properties of R as a programming language to write efficient R code
  • exploring performance issues of R code by benchmarking and profiling processes and memory usage
  • parallel and distributed computing with R on both local and supercomputing resources

The topics will be covered using short lectures and/or demonstrations followed by hands-on exercises using RStudio and batch jobs on the supercomputer Puhti. Participants are welcome to bring their own R code (short script sections, not full projects) and a small dataset (maximum 5 GB) to use in some of the exercises. Note, however, that we will not solve problems with the code itself.

Target audience

This course is meant for anyone who is familiar with the basics of R and wants to learn how to make their R analyses more efficient and how to use R in a high performance computing environment. For example:

  • current users of RStudio in CSC’s Puhti web interface: move beyond RStudio and make the most of the computing resources of the supercomputer
  • R users running R on their own computer so far: use your computer’s resources efficiently and learn to use R in a high performance computing environment
  • experienced users of another programming language and/or high performance computing: get familiar with the functional nature of the R language and its resource management

Where & when

This is a two-day course from 9:00 to 16:00. The course will be offered on-site at the CSC Training Facilities (Keilaranta 14, Espoo, Finland). A Zoom link can be provided to participants who are unable to join on-site. Please note, however, that this is not a hybrid course, so online participants will be offered limited support. For participants joining the course on-site in Espoo, lunch and a snack are included in the price.

Learning outcomes

After attending this course, participants will be able to:

  • explore potential R code performance issues with benchmarking and profiling
  • understand the key properties of the R language and how they relate to the computer’s resource management
  • run R scripts with the batch job system on the supercomputer Puhti
  • get started with parallel and distributed computing with R

Prerequisites

Required:

  • basics of the R programming language
    • if you are a complete beginner with R and programming in general, we recommend the course Data Analysis with R instead

Useful to make the most of the course content but not required:

Lecturers

Billy Braithwaite and Heli Juottonen (CSC)

Registration deadline: 16.9.2024