Penn State Data Management Plan Tutorial

1.3 Data Formats

In addition to documenting the types of data likely to be collected, a DMP also describes the format(s) the data are likely to take. Experienced researchers know how often file formats and storage devices change. Dr. Stephenson describes his experience in the following video.


The formats that research data can take include, but are not necessarily limited to, the following (adapted from "Defining research data" (pp. 5-6) in the University of Edinburgh Data Library's Research Data Management Handbook):

  • Text files - MS Word docs, .txt files, PDF, RTF, XML (Extensible Markup Language)
  • Numerical - SPSS, Stata, Excel
  • Multimedia - jpg / jpeg, gif, tiff, png, mpeg, mp4, QuickTime
  • Models - 3D, statistical
  • Software - Java, C, Python
  • Discipline-specific formats - Flexible Image Transport System (FITS) in astronomy, Crystallographic Information File (CIF) in crystallography
  • Instrument-specific formats - Olympus Confocal Microscope Data Format, Carl Zeiss
  • Specimen collections

Another factor to consider when thinking about data formats is whether a format is proprietary or an open, community-supported standard. Some proprietary formats, such as .docx and .xlsx, are so widely used that they are likely to be around for a long time, which reduces the risk of format obsolescence.

Using open, well-documented formats that are widely adopted by researchers helps ensure that your data will remain accessible over the long term (from "File Formats for Long-Term Access," on MIT's Data Management and Publishing site). Ben Goldman describes some resources available to help researchers select appropriate formats in the following video.


