Online BioMonth: CSC supercomputing and data management for bioscientists

BioWeek is now BioMonth! 

In March you can spend your Tuesday and Wednesday mornings learning how to use the Puhti supercomputer to process data or run simulations effectively, and how to handle your data not only during the analysis but also before and after it! Join the BioMonth 2021 classes to get the most out of our services* and get your data management in shape!

* The CSC services discussed in this course are free-of-charge for academic research, education and training purposes in Finnish higher education institutions and in state research institutes (subsidized by the Ministry of Education and Culture, Finland).  

With one registration you get six half-days (9:00-12:00) of interesting topics in March 2021:

16.3. and 17.3.
23.3. and 24.3.
30.3. and 31.3.


Puhti is CSC’s supercomputer. It comprises powerful CPU partitions with a wide range of memory sizes and local storage options. Puhti allows the user to reserve compute and memory resources flexibly, and the user can run anything from interactive single-core data processing to medium-scale simulations spanning multiple nodes. Puhti has a wide selection of scientific software installed.

Allas is CSC’s general-purpose research data storage server. It is part of the CSC storage portfolio and can be accessed from the CSC servers as well as from anywhere on the internet. Allas can be used both for static research data that needs to be available for analysis and for collecting and hosting accumulating or changing data.

Good research data management is the basis of successful research. Research data management (RDM) concerns the management and organisation of data during as well as after the active phase of a project. It is important to consider all stages of data management in a Data Management Plan (DMP), from collecting and processing the data to publishing and sharing it. This will increase the impact and visibility of your work and enable reuse of the data in the future.

Later in this course we will introduce Singularity containers, HPC-compatible containers that allow Puhti users to run their applications in a containerised environment. These containers can serve as an alternative to conda packages.

Puhti (16.-17.3. 9:00-12:00 EET (UTC+2))
Getting started with Puhti
Module system
Data storage in Puhti
Running sbatch jobs
Performance analysis
Running interactive jobs in Puhti

Data (23.-24.3. 9:00-12:00 EET (UTC+2))
What is Allas? 
Projects, clients and interfaces
Examples for storing, using and sharing small or large datasets
Examples for using Allas from Puhti and from your local environment
Research data management: what happens before and after computing?
Data management planning
Sensitive data services offered by CSC
Publishing and sharing data after the project

Containers (30.-31.3. 9:00-12:00 EET (UTC+2))
Introduction to Singularity
Running applications as Singularity containers
Building Singularity containers
Converting conda packages to Singularity applications
Workflows with Singularity containers


To get the maximum benefit from this course, you should be comfortable working on the Linux command line and able to use a text editor (e.g. vi, nano, emacs).

So, ideal candidates for this course are:

  • Bioinformaticians or computer scientists with some bio-background
  • Biologists with Linux skills

The course will include hands-on exercises on the Linux command line. You’ll need to be able, for example, to move around in the directory hierarchy (cd, ls, pwd), create directories and copy, rename and delete files (mkdir, cp, mv, rm), uncompress files (tar, unzip), edit files (any text editor, e.g. gedit, nano, emacs), look at file contents (more, less) and use some environment variables ($HOME, …).
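
If you want to check these skills quickly, the commands above can be tried out with a short practice session like the one below. This is only a minimal sketch, assuming a bash-like shell with `mktemp` and `tar` available; the directory and file names (`practice`, `notes.txt`, `renamed.txt`) are made up for illustration and are not part of the course material.

```shell
# Work in a throwaway directory so that nothing important is touched.
workdir=$(mktemp -d)
cd "$workdir"
pwd                                          # print the current directory

mkdir practice                               # create a directory
echo "hello Puhti" > practice/notes.txt      # create a small text file
cp practice/notes.txt practice/copy.txt      # copy a file
mv practice/copy.txt practice/renamed.txt    # rename a file
rm practice/notes.txt                        # delete a file
ls practice                                  # list the directory contents

tar -czf practice.tar.gz practice            # pack the directory into an archive
rm -r practice                               # remove the original directory
tar -xzf practice.tar.gz                     # uncompress the archive again

contents=$(cat practice/renamed.txt)         # read the file contents
echo "$contents"
echo "Home directory: $HOME"                 # use an environment variable

# Clean up: return to the previous directory and remove the practice area.
cd - > /dev/null
rm -r "$workdir"
```

In an interactive terminal you could replace `cat` with `less` or `more` to page through the file, and open `practice/renamed.txt` in your editor of choice to practise editing.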

Here are some links for easy self-study:

New to CSC? Worry not! We will organise a “Getting started” session online on Monday 15 March at 13:00 EET (UTC+2), before the course, where you can get help with getting your CSC credentials!