Archiving data and publishing data are different things. The difference depends mainly on the stage of the research cycle, since publication of data is the final step of the research data lifecycle. Archiving data means ensuring that a copy of one’s dataset is kept (usually by one’s host institution) in a secure location for the long term (10 years or more). Publishing data means making one’s data publicly available, whether to a restricted audience or to the open public. Decisions on what others may do with one’s data need to be set out in a licence.
The amount of data accumulated during research is often considerable, and not all of it should be archived and/or published. Archiving all digital data leads to high storage, maintenance and management costs. Any data that is archived or published should follow the FAIR Data Principles.
Normally, research data is published either by depositing it in a data repository or by publishing it in a data journal.
Data Repositories
Besides institutional repositories, there are two main categories of repository: general-purpose and discipline-specific. The role of a repository is to archive and/or publish data. A repository should ensure that data remains accessible in the years to come, even as technology advances.
Re3data is a global registry of research data repositories for different academic disciplines. It includes repositories that enable permanent storage of and access to data sets for researchers, funding bodies, publishers, and scholarly institutions. Re3data promotes a culture of sharing, increased access and better visibility of research data.
CoreTrustSeal is a certification indicating that a repository has complied with the CoreTrustSeal requirements and has demonstrated a high level of trustworthiness.
Choosing a Data Repository
When publishing one’s data, consider choosing an institutional data repository, a multi-disciplinary repository or a discipline-specific repository. Multi-disciplinary repositories can also offer good visibility for one’s data. An example of a multi-disciplinary repository is Zenodo, which is often used as a repository for EC-funded research. Other general-purpose repositories include Figshare, Mendeley Data, Dryad, and the Open Science Framework. A discipline-specific repository is appropriate when it supports the type of data to be shared in one’s subject area.
When choosing a repository for permanent research data storage, make sure that the repository complies with the FAIR Data Principles. A repository’s compliance with the FAIR principles can be checked on the FAIRsharing portal.
Questions that should be answered when choosing an appropriate repository include:
A list of thematic data repositories can be downloaded here: Thematic Data Repositories [PDF]
Data Journals
Research data can also be published as supplementary material or information accompanying a research paper in a scientific journal. Some scientific publishers have detailed policies for the publication of research data in addition to research papers.
Research data can also be published as an independent research output in journals that publish only research data, the so-called data journals. Because of this focus, such journals are not oriented towards data analysis and theoretical contributions, so a paper in a data journal usually consists of a summary, introduction, description of the data, methods and materials, conclusion and a note on the possibility of data reuse.
A list of thematic data journals can be downloaded here: Thematic Data Journals [PDF]
VU Amsterdam recommends that a dataset should consist of the following documents:
A data licence is a legal arrangement between the creator of the data and the end-user, specifying what users are permitted to do with the data. The type of licence one chooses defines the terms of use. It is advisable to use the least restrictive licence possible.
The funding body
The type of data
Creative Commons Licences (CC licences) provide a standardised way to grant public permission to use creative works and data under copyright law. From the re-user perspective, a Creative Commons licence on a copyrighted work provides details of how the data can be re-used.
Licences other than Creative Commons
The Open Data Commons Open Database License (ODbL) is a copyleft licence agreement intended to allow users to freely share, modify, and use a database while maintaining this same freedom for others.
Copyleft
Copyleft is the legal technique of granting certain freedoms over copies of copyrighted works with the requirement that the same rights be preserved in derivative works. In this sense, freedom refers to the use of the work for any purpose, and the ability to modify, copy, share, and redistribute the work.
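The share-alike requirement described above can be illustrated with a short Python sketch. This is a simplified model, not legal guidance: the function name and the two licence sets are hypothetical and chosen only for illustration, and the identifiers are SPDX short names for a small subset of licences. Real licences have further conditions (for example attribution, or lists of compatible licences) that this sketch ignores.

```python
from typing import Optional

# Illustrative subset of data licences, named by SPDX identifier.
# Assumption: only these four licences are modelled.
COPYLEFT_LICENCES = {"CC-BY-SA-4.0", "ODbL-1.0"}
NON_COPYLEFT_LICENCES = {"CC0-1.0", "CC-BY-4.0"}


def licence_required_for_derivative(source_licence: str) -> Optional[str]:
    """Return the licence a derivative work must carry.

    Returns the source licence itself for copyleft licences, and None
    when the source licence does not dictate the derivative's licence
    (other terms, such as attribution, may still apply).
    """
    if source_licence in COPYLEFT_LICENCES:
        return source_licence
    if source_licence in NON_COPYLEFT_LICENCES:
        return None
    raise ValueError(f"Licence not covered by this sketch: {source_licence}")


if __name__ == "__main__":
    for licence in ("ODbL-1.0", "CC-BY-4.0"):
        print(licence, "->", licence_required_for_derivative(licence))
```

In this model, a database derived from ODbL-licensed data must itself be released under the ODbL, while a dataset derived from CC BY data leaves the choice of licence to its creator.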