Introducing CSC’s content retention policy in Services for Research and Education
The volume of digital data is ever-growing. It’s produced by research instruments, algorithms on supercomputers and citizen science. It may be videos, CSV files, scripts or text. Sometimes it includes personal data, as in interviews, or sensitive data, as in human genomes. These factors emphasise the importance of responsible and efficient data management.
The owner of the data needs to be aware of legislative issues, understand the meaning of good data management practices and make plans for the data lifecycle. GDPR also sets requirements on data. Personal data must always be managed and deleted according to published plans and consents.
The CSC Project Manager makes the data life-cycle decisions
CSC offers a wide range of computing and data management services for all stages of the data life cycle. During the project the data can be stored in Allas, from where it can be easily moved to Puhti for processing. The processed data may be moved to Fairdata IDA or EUDAT B2SHARE, or to some other domain-specific data repository, for publishing. Some of the data may be transferred to the Digital Preservation service for Research Data, which guarantees preservation of digital assets for several decades or even centuries, whereas some data have to be securely deleted at some point in the research.
Many of the data management, storage and computing services at CSC require creating a CSC project, where one user is assigned as the Project Manager. This Project Manager makes the decisions about the data life cycle management in CSC’s services on behalf of the other members of the CSC project.
Good planning and documentation make handling data responsibly easier
Creating and maintaining a data management plan is crucial for ensuring that all the specific requirements of the data are taken into consideration during the different phases of the project. When creating a data management plan, the data owner and other project members should think about the volume of data that is produced, where it is analyzed and stored, as well as the risks that should be taken into consideration at different stages of the research. A good data management plan also covers data deletion. As the data are cultivated and versioned, good data management means that, for example, redundant data, trivial data that can be easily reproduced and outdated data are deleted. There may also be GDPR-related reasons to delete data at a certain point in time. For an overview of the topic, see the University of Helsinki 5s method for cleaning data.
The data retention policy plays a role when your CSC project comes to an end
From our user surveys and discussions with various stakeholders, such as data management experts in Finnish higher education institutions and research organisations, we have identified development needs related to CSC projects and data life cycle management:
- CSC should make clearer when and how CSC projects are extended and ended, and when data are deleted
- CSC should clarify the related roles and responsibilities
We agree with our users that it is important to refine our policies to support better data management. We’re now moving towards a common content deletion process that applies when a CSC project comes to an end.
Starting from mid-2022 we’ll have one common policy for deleting a CSC project’s content (i.e. data, software, servers, systems or processes) from our data management, storage and computing services. The CSC project’s content will be deleted after project closure if the users have not deleted it themselves. This is to ensure that users’ content in our services is handled responsibly. The policy will apply to the following services: Puhti, Mahti, Allas, cPouta, ePouta, Rahti, SD Connect, SD Desktop and Fairdata IDA. In Fairdata IDA we additionally consult the Project Manager or the organisation that has granted the IDA storage space to ensure proper data life cycle management for published data.
We know that this can be a big change for our users. Therefore we want to be transparent and communicate this change actively. We will update the MyCSC customer portal to offer a better view of CSC project data life cycle management, and renew our user communications so that all CSC project members receive several reminders about the CSC project’s expiration, with the option to extend it.
We’re also looking into other areas of development to make service and data life cycle management easier in the future. For example, we are adding more details about each project’s service usage to the MyCSC customer portal, and we aim to utilize our users’ existing data management plans more efficiently.
Suvi Pousi
The author works as a product owner in data management services and is responsible for implementing CSC’s content retention policy in academic CSC projects.