Better tailored care through advanced data transfer and analysis methods
CSC is continuously developing its services, including secure data transfer and data analysis methods, in partnership with research organizations to meet the requirements of data-intensive research fields such as cancer research. Some of the most valuable research insights come from integrating complex datasets, which is why it is important to promote national computing and data storage services. These services create a strong foundation for data-intensive research in Finland.
CSC Sensitive Data (SD) services support national requirements for sensitive data
The full release versions of SD Desktop and SD Connect have been publicly available since March 30, 2022. The services are accessible on demand through a web user interface from the user's own computer. SD Connect is a service for collecting and storing encrypted sensitive research data during the active phase of a research project, while SD Desktop lets users directly access and manage that data in a virtual computing environment. Together, SD Connect and SD Desktop can serve as a workspace for collaborative research projects, facilitating data collection and sharing between organizations.
Developing national sequencing capacities
Developing and scaling sequence data management is essential to increase genetic knowledge and to answer more complex research questions. CSC's data services support sequencing data generation in Finland and are an essential part of the national life science research ecosystem. As an example of this cooperation, the joint sequencing capacity of the Helsinki University Hospital (HUS) and the Institute for Molecular Medicine Finland (FIMM) is built on compute and data management services provided by CSC. Moving sequence data directly from the sequencers to where the compute services are streamlines researchers' workflows.
Therefore, a development goal is to integrate and scale the sequence data management with CSC Sensitive Data services.
FIMM's sequencing data workflow has been improved with a direct connection to CSC's computing and data services. The sequencing facility can upload DNA sequences directly into researchers' workspaces in SD Connect. There, the encrypted data can easily be shared with other researchers via a URL or analyzed in SD Desktop. Other data types, such as imaging data, can also be shared securely.
CSC’s sensitive data services support cancer research
The development of SD Connect benefits organizations and research projects such as iCAN, a Finnish national R&D flagship programme funded by the Academy of Finland. Its founding hosts are the University of Helsinki and HUS. The aim of iCAN is to facilitate breakthrough discoveries leading to improved treatments and quality of life for cancer patients. The project combines cancer genetics, translational and clinical cancer research, biobanks, information technology and artificial intelligence in a completely novel way. Cancer research uses genetic and molecular data to develop new diagnostics and treatments.
iCAN uses SD Connect to transfer data from sequencing facilities to CSC and back to the HUS environment. Consented patient samples are sent to FIMM, which in turn uploads the sequencing data directly to SD Connect. The data is encrypted using Crypt4GH, a standard method for securely sharing human genetic data developed by the Global Alliance for Genomics & Health. As a result, the data is interoperable across the whole CSC SD family of services, and potentially also with other service providers handling similar data. The workflow is integrated with the HUS computing environment, where an automated process downloads the sequence data securely. Within our trusted research environment, it is possible to run analyses that combine sequencing and register data once the researcher has obtained the appropriate permissions. The analysis results are shared back with biobanks, where they can be used to improve patient care.
Novel measurement technologies such as single-cell transcriptomics and spatially resolved transcriptomics open unprecedented opportunities in cancer research. However, exploiting these technologies to their full potential requires efficient data analysis methods. In 2022, CSC organized three training courses on the analysis of single-cell and spatial data, and CSC also leads an international collaboration network in this fast-moving field. By working and developing together, we can make important advances in cancer diagnostics and treatments.
Expectations for future service development
Some of the most valuable insights generated by data-intensive research come from linking and analyzing complex and heterogeneous datasets. Data management is hard work, and it is essential to support the growth of data-intensive fields of science with a national-level data infrastructure.
The development of SD Connect has made it easier for researchers to manage, share and analyze their research data. With the support of development programmes such as ELIXIR Finland, CSC will continue to develop the family of SD services to meet researchers' needs. For instance, in addition to collecting and storing data in SD Connect, FIMM and iCAN may use SD Desktop to analyze the data directly. Once the data collection phase is over, researchers could spin up virtual computing clusters on their SD Desktop and analyze the data stored in SD Connect via data streaming. At the end of their research, they could also publish the original data for reuse under controlled access directly on the upcoming Federated EGA service, without needing to create extra copies.
However, the expected data management and data storage needs exceed the capacity currently available on existing infrastructures. For example, in iCAN alone, sequencing is scaling up and is expected to reach an annual data production rate of 3 PB in 2026. All of this data is needed to understand the molecular basis of cancer.
Unprecedented opportunities from e-Infrastructure for cancer research
National computing capacity needs to develop so that it can support data-intensive research, such as cancer research in Finland, during the active phases of research. The challenges of data-intensive computing have been widely recognised, and CSC's strategy is to respond to these needs.
Helena Lodenius
The author works as a project coordinator at CSC