Data Reference: Citing Data

Citing Data

We cite data for the same reasons we cite anything else: to give credit to those who made the data available and to help our readers find the data we used. A good citation answers four basic questions:

  • Who collected, produced, or provided this resource? 
    • Authors, researchers, data collectors, and/or organizations that sponsored the research.
  • What is this resource? 
    • A title, the data's version or edition, the format or resource type.
  • When was this resource collected, created, or made available? 
    • A dataset may have a date-of-collection as well as a publication date.
  • Where can someone find this resource?
    • The website, organization, and/or publisher that provides access to the data. Whenever possible, this should include an identifier like a DOI or a URL for the data's website. Make sure your citation includes enough information for a reader to find the data easily!

The details you need to answer these questions will be different, depending on the data you are using.  It's not an exact science, but as a general rule, a good citation will answer these questions accurately and thoroughly.  It's better to have too much information than too little!
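As a rough illustration, the four questions can be treated as a checklist of fields to fill in before writing a citation. The sketch below is only one way to do that: the field names, the generic who-when-what-where ordering, and the sample dataset are all made up for illustration and do not follow any official citation style.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """The four questions a data citation should answer (field names are illustrative)."""
    who: str                           # who: authors, collectors, or sponsoring organization
    title: str                         # what: the dataset's title
    year: str                          # when: publication or collection date
    publisher: str                     # where: organization or site providing access
    version: Optional[str] = None      # what: version or edition, if any
    resource_type: str = "Data set"    # what: format or resource type
    identifier: Optional[str] = None   # where: DOI or URL, if available


def missing_elements(c: DataCitation) -> List[str]:
    """Return the names of basic elements that are blank or absent."""
    checks = {"who": c.who, "title": c.title, "year": c.year,
              "publisher": c.publisher, "identifier": c.identifier}
    return [name for name, value in checks.items() if not value or not value.strip()]


def format_citation(c: DataCitation) -> str:
    """Join the elements in a generic order (an assumption, not an official style)."""
    what = c.title
    if c.version:
        what += f" ({c.version})"
    what += f" [{c.resource_type}]"
    text = ". ".join([f"{c.who} ({c.year})", what, c.publisher]) + "."
    if c.identifier:
        text += f" {c.identifier}"
    return text


# A made-up dataset, used only to show the pieces fitting together.
example = DataCitation(
    who="Example Research Group",
    title="Hypothetical Household Survey",
    year="2020",
    publisher="Example Data Archive",
    version="Version 2",
    identifier="https://example.org/data/household-survey",
)
print(format_citation(example))
print(missing_elements(example))
```

Checking for missing elements before formatting reflects the advice above: it is easier to trim extra detail from a citation than to recover information you never recorded.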

Data Citation Instructions

Many datasets, databases, and data resources will give you a recommended citation, or suggest how you should cite data from that source.  

Sometimes this information appears on the page for an individual dataset. More often, the website or database where you found your data explains how to cite it in its FAQs, "About" page, or "How to Use" information.

Formatting Citations

The APA Style Guide provides a recommended citation format for datasets, with examples. Other style guides, including MLA and Chicago, do not, so you will have to create your own.

For citation styles that do not have a specific dataset format, base your citation on the closest equivalent format. If the dataset is online, use a format for online items. If it was created by multiple "authors," editors, or researchers, use a format for edited works or items with more than one author. Or simply base your citation on the general reference format in your style guide.

If a grant application, scholarly journal, or instructor has strict requirements for citations, they'll usually make this clear, and may provide examples.  If they don't offer an example for citing the kind of dataset you are using, don't be afraid to ask! 

Copyright & Use Restrictions

Copyright and Data

Under U.S. law, facts (and many collections of facts) are not protected by copyright; only original creative expressions are copyrightable. Even if someone claims that their data are copyrighted, the data themselves (the numbers, records, or data points) are not copyrightable. You can reuse those data in your own research without worrying about copyright infringement.

Copyright and Data Visualizations

Graphs, charts, and data visualizations can be more complicated. U.S. copyright law protects original works of authorship that are fixed in a tangible medium of expression. If a graph, chart, or data visualization doesn't have any originality (if the data are displayed in the only obvious way, or there are no creative choices beyond arbitrary colors in a graph), then it might not be protected by copyright at all. The more creative and original a visualization is, the more likely it is to be protected by copyright.

License and Use Agreements

Data may, however, be protected by licenses, contracts, or use agreements. To access and use some datasets, such as polling data from the Roper Center, you have to agree to the provider's terms of use. Those terms might limit how and where you use the data, and what you do with the data once you're done using them. Even if a dataset is not protected by copyright, a use agreement can still control how you use that data.

Subjects: Data
  • Last Updated: Sep 21, 2021 9:53 AM