Penn State Data Management Plan Tutorial

Components of a Typical Plan

A DMP typically consists of five parts, which address the following aspects of the data:

Part 1: The types of data to be collected or produced during the project, and the processes or methodology for doing so:

  • Types of data that will be generated by your research (e.g., human subjects related surveys, field data, samples, model output data)
  • Data format(s) and file types (e.g., .txt, .pdf, .xls, .csv, .jpeg, etc.)
  • How the data will be collected or accessed (if using existing data)

Part 2: The formats for the data and the standards that will be followed for documenting and describing the data:

  • Information about your data you will need to save (e.g., experimental design, environmental conditions, global positioning information)
  • What metadata standard you will use to document your data (some research domains have widely accepted standards; where none exists, you may describe how the project will decide on one)
  • How you plan to record your metadata
Record showing metadata in GenBank.
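
As one illustration of how metadata might be recorded, the sketch below saves a small descriptive record as a JSON file alongside each data file. It is a minimal example, not a prescribed method: the function names, the field names, and the ".metadata.json" naming convention are all hypothetical, and a project following an established metadata standard for its domain would use that standard's fields instead.

```python
import json
from pathlib import Path


def build_metadata_record(data_file, experimental_design,
                          environmental_conditions, latitude, longitude):
    """Collect descriptive information about one data file into a plain dict.

    All field names here are hypothetical placeholders, not a metadata standard.
    """
    path = Path(data_file)
    return {
        "data_file": path.name,
        "file_format": path.suffix.lstrip("."),
        "experimental_design": experimental_design,
        "environmental_conditions": environmental_conditions,
        "position": {"latitude": latitude, "longitude": longitude},
    }


def write_sidecar(record, directory):
    """Write the record as JSON next to the data file and return its path."""
    stem = Path(record["data_file"]).stem
    target = Path(directory) / (stem + ".metadata.json")
    target.write_text(json.dumps(record, indent=2, sort_keys=True),
                      encoding="utf-8")
    return target
```

For example, calling build_metadata_record("plot_07.csv", "randomized block", {"air_temp_c": 21.5}, 40.79, -77.86) and passing the result to write_sidecar would produce a file named plot_07.metadata.json in the chosen directory. Keeping the record in a plain, text-based format means it can be read without special software.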
Part 3: The availability of the data, including information about ways in which the data will be accessed, and whether there are any issues related to privacy and/or intellectual property:

  • Expected availability of the data during the project period
  • List and explain any ethical or privacy issues raised by the data
  • Address any intellectual property rights issues (e.g., who holds the rights to these data?)
Diagram listing potential concerns regarding sensitive data, as described in text above
If you are collecting sensitive data (e.g., data stemming from human-subject research), then sharing such data will likely require different types or levels of access. Are higher levels of security required? Will an embargo be needed?
Part 4: The guidelines, procedures, or policies for data reuse and/or redistribution, attribution, as well as for the creation of derivatives from the data:

  • What you will permit in terms of reuse and redistribution of the data, based on policies for access and sharing
  • Think about which other researchers (whether in your subject domain or others) may find your data useful
  • Identify the lead person or committee on the project who will make the decisions on redistribution on a case-by-case basis
  • Where the data will be deposited (e.g., data repository, repository service at your institution, etc.)

Part 5: The measures that will be taken to help ensure the long-term preservation of, and access to, the data, including possible mention of factors such as format migration and who will be responsible for managing the data for the duration of the project:

  • Will all of the data produced on your project be preserved, or only some?
  • Context for your data (e.g., tools, project documentation, metadata) required to make it accessible and understandable
  • Anticipated transformations of the data in order to deposit it and make it available
  • The length of time the repository will be available to the public and/or maintained (some directorates have a suggested minimum for the time after a project ends or after publication of certain data)

Again, remember: Data management plans submitted with NSF proposals cannot be longer than two pages.