Practical Machine Learning with Spatial Data
This course gives a practical introduction to machine learning with spatial data, covering both shallow learning and deep learning models, including convolutional neural networks (CNNs).
The course consists of lectures and hands-on exercises in Python. We will use scikit-learn for the shallow learning exercises and keras for deep learning exercises.
Learning outcome
After the course, participants should have the skills and knowledge needed to start applying machine learning to different spatial data analysis tasks. In addition, participants will be able to make use of the GPU resources available on CSC's high-performance computers for training and deploying their own machine learning models.
Prerequisites
- Basics of geoinformatics, vector and raster data, coordinate systems.
- Basics of Python. The course will include a fair amount of reading Python code, so you should be able to follow Python syntax. If you need to refresh your Python skills, you can go through the materials of the Helsinki University GeoPython course.
- Basic Linux commands: cd, ls, mv, cp, rm, chmod, less, tail, echo, mkdir, pwd. If these are unfamiliar, take a look at, for example, the first two modules of LinuxSurvival.
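As a rough self-check for the Python prerequisite, the short sketch below shows the kind of code you should be able to read comfortably. It is not taken from the course materials: the function name segment_image and the synthetic image are made up for illustration, and it only assumes that numpy and scikit-learn are installed.

```python
import numpy as np
from sklearn.cluster import KMeans


def segment_image(image, n_clusters=2, seed=0):
    """Cluster the pixels of a (rows, cols, bands) array with k-means.

    Returns a (rows, cols) array of integer cluster labels.
    Hypothetical helper for illustration, not part of the course code.
    """
    rows, cols, bands = image.shape
    # scikit-learn expects one sample per row: (n_pixels, n_bands)
    pixels = image.reshape(rows * cols, bands)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(pixels)
    # Put the per-pixel labels back into the image layout
    return labels.reshape(rows, cols)


# Synthetic 3-band "image": dark left half, bright right half, small noise.
rng = np.random.default_rng(0)
image = rng.normal(0.0, 0.01, size=(20, 20, 3))
image[:, :10, :] += 0.2
image[:, 10:, :] += 0.8

segments = segment_image(image, n_clusters=2)
print(segments.shape)
print(np.unique(segments))
```

If you can follow how the array is reshaped before clustering and reshaped back afterwards, your Python skills are most likely sufficient for the exercises.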
The course is similar to the Practical machine learning for spatial data course held in autumn 2019 and 2020. The course exercise materials are available on GitHub: https://github.com/csc-training/GeoML/
Course organizers and lecturers: Kylli Ek, Samantha Wittke, Billy Braithwaite, Markus Koskela, Mats Sjöberg (all CSC)
Preliminary program
Monday 7.11.2022
9:00-12:00
- Introduction
- Lecture 1: Introduction to machine learning
- Exercise 1: Image segmentation using k-means with scikit-learn
12:00-13:00
Lunch break
13:00-16:15
- Lecture 2: Shallow machine learning models
- Lecture 3: Preparing spatial data for machine learning
- Exercise 2: Preparing vector data for regression
Tuesday 8.11.2022
9:00-12:00
- Exercise 4: Shallow regression with scikit-learn
- Exercise 5: Image classification using shallow classifiers, grid search with scikit-learn
- Lecture 4: Introduction to deep learning models
12:00-13:00
Lunch break
13:00-16:15
- Lecture 5: Fully connected neural networks
- Exercise 6: Fully connected regressor with keras
- Exercise 7: Fully connected classifier with keras
Wednesday 9.11.2022
9:00-12:00
- Lecture 6: Convolutional neural networks (CNN)
- Lecture 7: Puhti GPUs and batch jobs
- Exercise 8: Data preparations for CNN
12:00-13:00
Lunch break
13:00-16:00
- Exercise 8 continues: CNN-based image segmentation with keras
- Wrap-up and where to go from here
There will also be coffee breaks during the morning and afternoon sessions. Participants at CSC are provided with lunch and with refreshments during the coffee breaks.
Exercise set-up
All exercises will be done on CSC's supercomputer Puhti, using the Puhti web interface.
All participants will get Puhti training accounts.
Participants at CSC
The course is held in the CSC training classroom, where everybody has access to a training PC.
Online participants
Technical requirements, minimum:
- Zoom.
- Web browser.
Optional:
- ArcGIS or QGIS for viewing the spatial data files (GeoTIFF, Shapefile, GeoPackage).
The on-site places at the CSC office are fully booked, and on-site registration is currently possible only for the waiting list, so we recommend registering for the online version.
Time
7.11.2022 - 9.11.2022
09:00 - 16:00