Kim has come a long way in her understanding of data management planning. In the previous parts of this tutorial, she learned how to document data and about the importance of using standards in such documentation and description. She now appreciates the need to be clear in the DMP about how the data will be shared and made accessible, and what the time frame will be for making them available. Kim has also been discussing with Dr. Smart what kinds of derivatives from the data might be possible and interesting for related research communities. She is now in a position to finish the DMP by addressing storage and preservation of Dr. Smart's data.
Part 5 of this tutorial guides you through deciding where to store your data once your project concludes, so that the data are preserved for the long term and remain accessible. You'll be able to address the following in this part of the DMP:
Disciplinary data repositories that are applicable to the research data sets you will be collecting and sharing. (Note: some repositories have requirements in terms of types of data, descriptive standards, and size of data.)
Information about Penn State's repository service, ScholarSphere [1], where researchers have deposited data to share them and ensure persistent access to them.
Tips on how to store your data for safekeeping.
As you consider where to deposit your data, also think about how long you will keep them available and accessible after your project ends. In addition, how much of your data will you make available? All of it, from the raw files to the processed outputs? How often will you need to access the data? How will you enable other users to make use of them, particularly if using the data requires other tools or systems?
Ben Goldman discusses some of the challenges of preserving digital data and the resources available for doing so.
As mentioned earlier, there may be data repositories suitable for the data that your project will produce. To find such repositories, you may wish to consult Databib [4], a growing list of repositories for research data, primarily in the sciences and social sciences. Data sets stored in a disciplinary repository have some advantages, including a greater likelihood of discovery by other researchers. Another benefit of making your data available in a repository is that they become more widely accessible and citable. Andrew Stephenson attests to the advantages of data repositories in the following video.
Examples of disciplinary data repositories:
Dryad [6]
ChemXSeer [7]
Ecological Society of America (ESA) data registry [8]
ICPSR [9]
IEDA [10]
For help in selecting the appropriate data repository for your data, consult the Libraries’ Data Management mailing list, l-data-mgmt@lists.psu.edu [11].
Sometimes there is no disciplinary repository for your data, or the one that exists has deposit requirements that your data cannot meet, such as requirements for size, format, or documentation. In such a case, consider depositing your data to ScholarSphere [1], Penn State's institutional repository. ScholarSphere takes any file format, and there is no maximum amount of data that users can deposit (although there are upload maximums because deposit occurs via the Web).
As Andrew Stephenson describes it, ScholarSphere is a time saver as well.
ScholarSphere is a self-deposit repository service that ensures the long-term preservation of data for ongoing access. No registration or creation of an account is necessary. The service is available to anyone in the Penn State community; all that is required is a current Web Access ID.
To learn more about ScholarSphere, visit its Help page [13].
Depositing your data to a formal repository such as those mentioned in the previous section is a good practice for research projects.
However, preserving your data and making them accessible in only one place is not enough. A distributed approach to storing your data is highly recommended. By being part of a campus community, a researcher has options beyond local storage of her data. One should investigate options beyond campus as well. This is something librarians and archivists can help with, as described in the video below.
Below are ways you can distribute storage of your data (based on U. Minnesota Libraries' "Storing Data Securely" [15]):
This way, files are physically (geographically) dispersed for disaster recovery purposes.
In the last section of the DMP, be sure to discuss how the project will store and preserve the data. This entails mentioning not only any data repositories where the project will deposit data but also how, for the duration of the project, data storage will be handled, managed, and kept secure. A distributed approach to data storage is the standard to follow. It includes maintaining at least three copies of the data; keeping a "master" file for the sole purpose of making copies; and keeping files both on external hard drives and in remote storage or on remote servers.
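The distributed approach described above can be sketched in code. The following is a minimal Python sketch, not a prescribed procedure. It assumes the "master" file lives on local disk and that the other storage locations (for example, an external drive and a mounted remote share) are reachable as ordinary directories. The function names `sha256_of` and `replicate` and all paths are hypothetical, chosen for illustration, and the checksum comparison is an added safeguard that this tutorial does not itself require.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(master: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy the master file into each destination directory and verify it.

    Returns a mapping from each copy's path to True if the copy's
    checksum matches the master's, and False otherwise.
    """
    expected = sha256_of(master)
    results: dict[Path, bool] = {}
    for directory in destinations:
        # Create the destination directory if it does not exist yet.
        directory.mkdir(parents=True, exist_ok=True)
        copy_path = directory / master.name
        # copy2 copies the contents and also tries to preserve timestamps.
        shutil.copy2(master, copy_path)
        results[copy_path] = sha256_of(copy_path) == expected
    return results
```

Calling `replicate(Path("master/survey.csv"), [Path("/mnt/external/backup"), Path("/mnt/remote/backup")])` (hypothetical paths) would leave three copies in total, counting the master, and report whether each copy matches it. The master file is only ever read, which is consistent with keeping it for the sole purpose of making copies.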
Data sets deposited into a disciplinary repository have some advantages, including a greater chance of discovery by other researchers in your field because of their familiarity with such a repository. Examples of disciplinary data repositories (more of which can be found in Databib):
Occasionally, there is not a disciplinary data repository available for your data. In such cases, you should consider depositing your data sets to Penn State's repository service, ScholarSphere, which is a self-deposit service that takes any format and requires no creation of an account. The only requirement for deposit is that you have a current Penn State Web Access ID.
True or False: ScholarSphere is a free service for all Penn State researchers.
(a) True
(b) False
ANSWER: (a) True. ScholarSphere, Penn State’s repository service for all its faculty, students, and staff, does not charge any fees for usage. All that is required is a Penn State Web Access ID. Once you log into ScholarSphere, you automatically become a user (whether you deposit files or not). Currently, there is also no limit to the number of files you may deposit (i.e., no maximum on storage size for deposited files). ScholarSphere takes any file format; it can accommodate web uploads of single files up to 500 MB and a folder of files up to 1 GB. Since it has Dropbox integration, files larger than 500 MB may be uploaded to ScholarSphere via Dropbox. Finally, while the default access level is open access, users may adjust access to “Penn State only” or to “private.”
Links
[1] https://scholarsphere.psu.edu/
[2] http://commons.wikimedia.org/wiki/File:Open_Data_stickers.jpg
[3] http://www.youtube.com/watch?v=asVx9PFmefQ
[4] http://databib.org/
[5] http://www.youtube.com/watch?v=xMQKzU8mj6U
[6] http://datadryad.org/
[7] http://chemxseer.ist.psu.edu/
[8] http://data.esa.org/esa/style/skins/esa/index.jsp
[9] http://www.icpsr.umich.edu/icpsrweb/landing.jsp
[10] http://www.iedadata.org/
[11] mailto:l-data-mgmt@lists.psu.edu
[12] http://www.youtube.com/watch?v=6KfmjYJEreA
[13] https://scholarsphere.psu.edu/help/
[14] http://www.youtube.com/watch?v=8Hgy48S9t_A
[15] https://www.lib.umn.edu/datamanagement/storedata
[16] http://its.psu.edu/accounts/pass
[17] http://ait.its.psu.edu/services/storage/backup/tsm.html
[18] http://rcc.its.psu.edu/resources/hpc/
[19] http://rcc.its.psu.edu/
[20] http://aws.amazon.com/
[21] http://box.psu.edu/
[22] https://www.dropbox.com/
[23] http://home.elephantdrive.com/
[24] https://drive.google.com/
[25] https://www.jungledisk.com/
[26] https://spideroak.com/