Penn State Data Management Plan Tutorial

Summary

Since 2011, funding agencies such as the NSF, the NIH, and the NEH have required researchers applying for grant funding to include a data management plan (DMP) - a document that describes how the applicant will manage the research data generated over the course of the project.

There are many reasons why a DMP is necessary:

  1. A DMP gets researchers thinking about the data lifecycle before they start collecting data. It compels them to plan how they will gather, describe, analyze, and preserve their data, and how they will make the data accessible and usable so that other researchers can repurpose them or create derivatives from them.
  2. Data have intrinsic value that others can learn from and build on. A DMP helps ensure that data will be available for verifying research results, if not also for reproducing them.
  3. A DMP can help stave off data loss and data breaches, especially for sensitive or restricted data.
  4. Research projects funded by federal agencies are projects funded with taxpayer dollars, which means the data should be publicly available. A DMP is intended as additional assurance that such data will be accessible to the public.

Penn State's tools for data management planning include its repository service, ScholarSphere; guidance that integrates Penn State's research administration guidelines and policies for writing a plan; and boilerplate language stating the commitment from the University Libraries and Information Technology Services to preserve and make persistently accessible data sets that are deposited into ScholarSphere. Researchers are welcome to build on the language, in consultation with librarians and technologists at Penn State.

Penn State also has a login for the DMPTool, which lets researchers fill in the components of a DMP and then generates the plan once they are complete. Researchers are strongly advised to review the resulting DMP to make sure that it makes sense and does not exceed the two-page limit.

This brief video (4:40) shares a humorous data management and sharing snafu in three short acts:

 

Transcript of Data Sharing and Management Snafu in 3 Short Acts:

DR. JUDY BENIGN: Hello! My name is Dr. Judy Benign, I'm an oncologist at NYU School of Medicine.

BROWN BEAR: Hello, Dr. Judy Benign!

DR. JUDY BENIGN: I read your article on B-cell function. I think that I could use the data for my work on pancreatic cancer.

BROWN BEAR: I am not an oncologist!

DR. JUDY BENIGN: I know but I think I could use the data for my work on pancreatic cancer. Do you have the data?

BROWN BEAR: Everything you need to know is in the article!

DR. JUDY BENIGN: No. What I need is the data! Will you share your data?

BROWN BEAR: I am not sure that will be possible.

DR. JUDY BENIGN: But your work is in PubMed Central and was funded by NIH.

BROWN BEAR: That is true!

DR. JUDY BENIGN: ... and it was published in Science which requires that you share your data.

BROWN BEAR: I did publish in Science.

DR. JUDY BENIGN: Then I am requesting your data! Can I have a copy of your data?

BROWN BEAR: I am not sure where my data is!

DR. JUDY BENIGN: But surely you saved your data!

BROWN BEAR: I did, I saved it on a USB drive!

DR. JUDY BENIGN: Where is the USB drive?

BROWN BEAR: It is in a box... ... it is in a box at home... I just moved!

DR. JUDY BENIGN: But can I use your data?

BROWN BEAR: There are many boxes! So many boxes! I forgot to label the boxes.

[ON SCREEN TEXT: 7 months later]

DR. JUDY BENIGN: Hello again! Thank you for sending me a copy of your data on a USB drive, I received the envelope yesterday.

BROWN BEAR: You are welcome, but I will need that back when you are finished, that is my only copy!

DR. JUDY BENIGN: I did have a question.

BROWN BEAR: What is your question? You might find the answer in my article!

DR. JUDY BENIGN: No. I received the data, but when I opened it up it was in hexadecimal.

BROWN BEAR: Yes - that is right!

DR. JUDY BENIGN: I cannot read hexadecimal!

BROWN BEAR: You asked for my data and I gave it to you. I have done what you asked.

DR. JUDY BENIGN: But is there a way to read the hexadecimal?

BROWN BEAR: You will need the program that created the hexadecimal file!

DR. JUDY BENIGN: Yes, I will. What is the name of the program?

BROWN BEAR: "Cytosynth"

DR. JUDY BENIGN: I do not know this program.

BROWN BEAR: It was a very good program! The company that made it went bankrupt in 2007!

DR. JUDY BENIGN: Do you have a copy of the program?

BROWN BEAR: I do not use this program any more because the company that made it went bankrupt. Maybe you can buy a copy on eBay?

[ON SCREEN TEXT: 20 minutes later...]

DR. JUDY BENIGN: I have good news!

BROWN BEAR: You again!

DR. JUDY BENIGN: I talked to my colleague... she knew a person with a copy of the software!

BROWN BEAR: Then why do you need me? Everything you need to know about the data is in the article!

DR. JUDY BENIGN: I opened the data and I could not understand it!

BROWN BEAR: If you have the program you will find it is clear!

DR. JUDY BENIGN: Well... I noticed that you called your data fields "Sam"... Is that an abbreviation?

BROWN BEAR: Yes! It is an abbreviation of my co-author's name... His name is Samuel Lee, we call him "Sam".

DR. JUDY BENIGN: I see... and what is the content of the field called "Sam1"?

BROWN BEAR: Ah yes... "Sam1" is the level of CXCR4 expression.

DR. JUDY BENIGN: And what is the content of the field called "Sam2"?

BROWN BEAR: That is logical if you think about it!

DR. JUDY BENIGN: What is the content of the field called "Sam2"?

BROWN BEAR: I don't remember!

DR. JUDY BENIGN: What about "Sam3"?

DR. JUDY BENIGN: Is there a guide to the data anywhere?

BROWN BEAR: Yes, of course! It is the article that is published in Science!

DR. JUDY BENIGN: The article does not tell me what the field names mean. Is there any record of what these field names mean?

BROWN BEAR: Yes! My co-author knows what the content of Sam2 is... and Sam3... and Sam4.

DR. JUDY BENIGN: Can I talk to your co-author?

BROWN BEAR: I am not sure!

DR. JUDY BENIGN: I would very much like to talk to your co-author.

BROWN BEAR: Well, he was a graduate student. He went back to China 2 years ago.

DR. JUDY BENIGN: Can I have his contact information?

BROWN BEAR: He is in China... his name is "Sam Lee".

DR. JUDY BENIGN: I think I cannot use your data.

BROWN BEAR: You could check the article... to see if what you need is there!

DR. JUDY BENIGN: Please stop talking now!
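
The field-name confusion in the transcript ("Sam1", "Sam2", "Sam3") is the kind of problem that a data dictionary kept alongside the data is meant to prevent. As a minimal sketch in Python, assuming a comma-separated file whose first row holds the field names, the check below lists fields that lack a description. The function name `undocumented_fields`, the `DATA_DICTIONARY` mapping, and the sample header are hypothetical illustrations, not part of the video or of any Penn State tool; only the description of "Sam1" comes from the transcript.

```python
import csv
import io

# Hypothetical data dictionary: maps each field name to a plain-language
# description. Only "Sam1" is described in the transcript; the other
# fields are left out on purpose because their meaning is unknown.
DATA_DICTIONARY = {
    "Sam1": "Level of CXCR4 expression",
}


def undocumented_fields(csv_text, dictionary):
    """Return the header fields in csv_text that have no description."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    return [name for name in header if not dictionary.get(name, "").strip()]


# Illustrative header row only; no data values are invented here.
sample = "Sam1,Sam2,Sam3\n"

print(undocumented_fields(sample, DATA_DICTIONARY))  # ['Sam2', 'Sam3']
```

Running a check like this before depositing a data set gives a quick list of the fields that still need a description.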



Check Your Understanding

Why do researchers need a DMP?

(a) It is required by funding agencies.
(b) Having a plan helps ensure data sharing and access.
(c) Replicability of research results depends heavily on good data management.
(d) All of the above.

ANSWER: (d) All of the above. The key funding agencies, such as the NSF, NIH, and NEH, require DMPs to foster increased sharing of, and thus access to, research data. Since federal tax dollars fund such projects, the public has a right to access the data they generate. Having a plan for managing data throughout their lifecycle also aids the reproducibility of science.