Penn State Data Management Plan Tutorial

3.1 Access and Availability of Data

Printer-friendly version

A major reason why federal funding agencies require DMPs is to encourage researchers to think as early in the project as possible about how and when data will be shared and made available. Funding agencies expect you to be clear in the DMP about your approach or policy for sharing and giving access to your data. Sharing also makes possible reuses and repurposing of data, as explained later in this tutorial.

Ways of sharing data can vary. They depend on the research domain, on the availability of services, on the size of the data set, and other factors. Many of these are covered later in this tutorial in the section which addresses long-term preservation of data. Preserving data ensures long-term access to it. What you want to make sure your DMP states is that you will deposit your data in a repository - whether it's a repository dedicated to data from your research domain or an institutional repository that accepts data sets. Penn State has such a repository - ScholarSphere, about which you may learn more here, in the tutorial.

You should note in the DMP when data will be made available - i.e., during the project, or afterward? Will there be any embargoes? If so, why and for how long? Will data be shared indefinitely, or will there be temporal constraints?

Will all the data be shared, or only a portion? Have you adequately addressed levels of access, e.g., who is allowed to use the data, etc.? (You may already have addressed this in the section of the plan related to data access.)

part of an example data management plan provided by DataOne. Click below caption for text description.
Part of an example data management plan provided by DataONE.
Click to expand to provide more information

3. Policies for Access and Sharing

After publication of manuscripts based on the data we collect, we will share our data and metadata with the NutNet community via data updates sent annually as .csv files from the existing central relational database. Other NutNet users will need to contact Lind for access to the data.

We will also submit both of our datasets (abundance and stoichiometry) to the U of M Digital Conservancy, an archive for digital preservaton. Borer has access to this resource as a faculty member. This will occur within a year of publication. The data will be publicly available via the Digital Conservancy, which provides a permanent URL for digital documents.

In DMPs, researchers often state that their findings will be published as articles and other genres relevant to their particular disciplines. Increasingly, publications are linking to data sets that have been deposited into repositories; this is another reason why depositing your data into repositories is important.

The NSF and other funding agencies that require DMPs now frown on the previously common practice of sharing data sets upon request, such as via email. Researchers' email addresses are not permanent, for one thing, and on-demand access unnecessarily burdens the researcher with the redundant act of locating and attaching, or pointing to, the requested data set. By submitting the data set to a repository, data become publicly discoverable, findable, and accessible.

While typically not mentioned in the DMP, the citation of data marks another method of sharing data and attributing the researchers who created them. The organization DataCite has examples of citing data. The most crucial component of such a citation is a Digital Object Identifier or DOI. DOIs are assigned to ensure persistent access. Even though the digital object may undergo changes over time, the DOI stays the same.

Another benefit to researchers of having your data made available in a repository is that it becomes more widely accessible and citable. Citation of data is increasingly required by editors and having a permanent URL or repository identifier helps others find your data, as Andrew Stephenson describes in the following video:


Link to YouTube video.

Making publicly funded data available is a common good, however, there are instances where it is important to restrict data. Thus, in a DMP researchers should note what data will be shared and whether there are any restrictions to sharing.

Red stamp on a page of text labeling it as restricted
A "restricted data" stamp from the 1950s on documentation about opacity classifications.
Credit: restricted data on Flickr. CC-BY 2.0.