What to take into consideration when compiling research material?

Image: Adobe Stock


Managing research material begins even before the data itself is collected. At the very least, you need to know who the target users are, how they will use the data, and how, and with what rights, they will be able to access it. The answers especially affect what kind of source data can be used, and where and how it can be acquired.

If the material is intended to contain previously published text or audiovisual data, copyright may impose significant limitations. Similarly, data collected through interviews, for example, can only be used if the agreements made with the informants allow it. A lot of otherwise useful data is gathering dust, physically or virtually, because it cannot be released under any license that permits research use.

Ideally, the platform on which the material will be published is known at the very beginning of the process. Most disciplines have their own repositories and archives, which typically have their own rules and guidelines. Many research infrastructures also offer various support services. The intended repository most likely has its own way of classifying deposited data, so it is a good idea to think ahead about how your material will fit those categories.

Publishing and maintaining research data requires technical skills, especially if the repository does not have a fixed publication process or the content producers do not have the necessary skills themselves. This is why the whole data life cycle needs to be considered when budgeting, and funding should be planned to cover the expenses of every stage of the work, including those in which you may no longer be directly involved with your material.

Another question to consider in the compilation phase is data quality. Here, too, the data life cycle comes into play: high-quality data not only serves its original purpose but can also interest another researcher in another project. Quality is affected by the data format, and the repository will most likely provide instructions for choosing one. Different fields have different requirements for data. For instance, a linguist may be satisfied with sentences presented out of context (one way in which published novels, for example, are made available for research), whereas the social sciences require the text as a whole.

In addition to quality, availability and accessibility also affect usability. If the data contains personal information or other sensitive data, offering it to end users will most likely require security procedures, including authentication and authorization. Access to the data may also require a personal permit granted on the basis of a detailed application, and the permit may only be valid for a fixed period. If the data is especially sensitive, many repositories may not even have a procedure for publishing it in a way that meets the security requirements. Data can be made less sensitive, for example by anonymization, but this may significantly reduce its usability. Audio and video data naturally contain personal data, because informants can be recognized by their voice or appearance.

For accessibility, it is crucial that users can actually find the data. This is easier if the data has clear and informative metadata that is available in the services researchers in the field use to search for data. Persistent identifiers are also needed, so that both the data and its metadata remain discoverable in the future, even if systems and locations change.

Even if a detailed data management plan is not required for compiling and publishing research data, it pays to plan the data life cycle in advance, both to avoid unpleasant surprises later on and to ensure the data can be used as intended.




Tero Aalto

The author is a language technologist and works with the Language Bank of Finland.