Data Management Best Practices: Citing Data

The hardest part of citing a dataset may simply be that you are not used to doing it. If you used a dataset that someone else collected and published, cite it as you would cite an article: in the references, works cited, or bibliography section of your work. Just as we give credit for articles and other documents that influenced our work, we should give credit to those who collected the datasets we analyzed.

How to Cite Data

Citing data is very much like citing anything else. You'll need to know: 

  • author/creator
  • date of publication
  • title, including version or edition
  • publisher or distributor (such as the name of the repository where the data was found)
  • URL, DOI or other persistent identifier
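As a rough illustration, the elements listed above can be assembled into a citation string programmatically. The sketch below is a minimal Python example, not an official implementation of any style guide: the class name `DatasetCitation`, its field names, and the simplified author-date layout are all assumptions made for illustration. Check the output against your style manual before using it.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetCitation:
    """Hypothetical container for the citation elements listed above."""
    authors: str      # author/creator, already formatted for the target style
    year: str         # date of publication
    title: str        # title of the dataset
    version: str      # version or edition
    publisher: str    # publisher or distributor (e.g. the repository)
    identifier: str   # URL, DOI, or other persistent identifier

    def missing_elements(self) -> list[str]:
        """Return the names of any elements that were left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def format(self) -> str:
        """Build a simplified author-date citation (an assumed layout, not official APA)."""
        missing = self.missing_elements()
        if missing:
            raise ValueError("missing citation elements: " + ", ".join(missing))
        return (
            f"{self.authors} ({self.year}). {self.title} ({self.version}) "
            f"[data file]. {self.publisher}. {self.identifier}"
        )


gss = DatasetCitation(
    authors="Smith, T.W., Marsden, P.V., & Hout, M.",
    year="2011",
    title="General social survey, 1972-2010 cumulative file",
    version="ICPSR31521-v1",
    publisher="Inter-university Consortium for Political and Social Research",
    identifier="doi: 10.3886/ICPSR31521.v1",
)
print(gss.format())
```

Raising an error when an element is empty is a deliberate choice in this sketch: it flags an incomplete citation early rather than producing one silently.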

Example Citations from IASSIST's Quick Guide to Data Citation

APA (6th edition)

Smith, T.W., Marsden, P.V., & Hout, M. (2011). General social survey, 1972-2010 cumulative file (ICPSR31521-v1) [data file and codebook]. Chicago, IL: National Opinion Research Center [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor]. doi: 10.3886/ICPSR31521.v1

MLA (7th edition)

Smith, Tom W., Peter V. Marsden, and Michael Hout. General Social Survey, 1972-2010 Cumulative File. ICPSR31521-v1. Chicago, IL: National Opinion Research Center [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2011. Web. 23 Jan 2012. doi:10.3886/ICPSR31521.v1

Chicago (16th edition) (author-date)

Smith, Tom W., Peter V. Marsden, and Michael Hout. 2011. General Social Survey, 1972-2010 Cumulative File. ICPSR31521-v1. Chicago, IL: National Opinion Research Center. Distributed by Ann Arbor, MI: Inter-university Consortium for Political and Social Research. doi:10.3886/ICPSR31521.v1

For further explanation of why we should cite data, see FORCE11's Data Citation Principles:

  • Importance - Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
  • Credit and Attribution - Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
  • Evidence - In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
  • Unique Identification - A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
  • Access - Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials, as are necessary for both humans and machines to make informed use of the referenced data.
  • Persistence - Unique identifiers, and metadata describing the data, and its disposition, should persist -- even beyond the lifespan of the data they describe.
  • Specificity and Verifiability - Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited.
  • Interoperability and Flexibility - Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.

Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.) San Diego CA: FORCE11; 2014 [https://www.force11.org/group/joint-declaration-data-citation-principles-final].
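
The Unique Identification and Access principles call for identifiers that are machine actionable. For DOIs such as the one in the example citations, one common way to make the identifier actionable is to turn it into a link through the doi.org resolver. The following Python sketch shows the idea; the function name `doi_to_url` and its light validation are assumptions for illustration, and it does not check that the DOI is actually registered.

```python
from urllib.parse import quote


def doi_to_url(doi: str) -> str:
    """Turn a DOI as written in a citation into a resolvable https://doi.org/ link.

    Hypothetical helper: it only strips an optional "doi:" label and checks
    that the identifier starts with the "10." DOI prefix.
    """
    cleaned = doi.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[len("doi:"):].strip()
    if not cleaned.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the "/" between prefix and suffix.
    return "https://doi.org/" + quote(cleaned, safe="/")


print(doi_to_url("doi: 10.3886/ICPSR31521.v1"))
```

Writing the identifier as a full link in a citation lets both readers and software follow it to the dataset's landing page.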