Guides: Data Management Resources: Documentation

Data Management For Teams

If data management is important for the individual researcher, it is doubly important for a research team. Every action that leads to well-managed or poorly managed data is compounded by the number of people working together.

The two most valuable times to address data management are on-boarding and off-boarding, because these periods of transition are when data management has the most impact.

We created the following resources to help guide your group through a smooth transition and to support building good research practices into your workflow.

You may have heard people tell you to create metadata to go along with your data. The reason for this recommendation is so that your data will be understandable and usable in the future, either for you and your lab members or for a wider audience should you share your data outside the lab. There are many ways to document your data beyond using metadata, though, and more information on all of them is here. If you have questions, please ask!

Recommended Practice

Keep a file with information about your project in the same folder as your other files. A rule of thumb is to write as much information as necessary to understand your data.

Project Level

Title: Name of the dataset or research project that produced it
Creator: Names and addresses of the organizations or people who created the data; preferred format for personal names is surname first (e.g., Smith, Jane).
Identifier: Unique number used to identify the data, even if it is just an internal project reference number
Date: Key dates associated with the data, including: project start and end date; release date; time period covered by the data; and other dates associated with the data lifespan, such as maintenance cycle and update schedule; preferred format is yyyy-mm-dd, or yyyy.mm.dd-yyyy.mm.dd for a range
Method: How the data were generated, listing equipment and software used (including model and version numbers), formulae, algorithms, experimental protocols, and other things one might include in a lab notebook
Processing: How the data have been altered or processed (e.g., normalized)
Source: Citations to data derived from other sources, including details of where the source data is held and how it was accessed
Funder: Organizations or agencies who funded the research
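
As an illustration of the project-level fields above, here is a minimal Python sketch that writes them to a plain-text file kept alongside the data. The file name README.txt, the function names, and every field value are assumptions made for this example, not a required format; adapt the labels and layout to whatever your group agrees on.

```python
from datetime import date
from pathlib import Path


def format_date_range(start: date, end: date) -> str:
    """Format a range as yyyy.mm.dd-yyyy.mm.dd, the range style listed under Date."""
    return f"{start:%Y.%m.%d}-{end:%Y.%m.%d}"


def build_readme(fields: dict) -> str:
    """Render fields as 'Label: value' lines, one per field, in insertion order."""
    return "\n".join(f"{label}: {value}" for label, value in fields.items()) + "\n"


def write_readme(folder: Path, fields: dict) -> Path:
    """Write the rendered fields to README.txt (a hypothetical name) inside folder."""
    path = folder / "README.txt"
    path.write_text(build_readme(fields), encoding="utf-8")
    return path


# Every value below is a made-up placeholder, not a real project.
example_fields = {
    "Title": "Example stream temperature survey",
    "Creator": "Smith, Jane",
    "Identifier": "PROJ-0001",
    "Date": format_date_range(date(2023, 1, 15), date(2023, 6, 30)),
    "Method": "Handheld temperature logger, readings taken weekly",
    "Processing": "Readings normalized to degrees Celsius",
    "Source": "None; all data collected by the project",
    "Funder": "Example Funding Agency",
}
```

Calling write_readme(Path("my_project"), example_fields) would place the file next to the data, provided the folder already exists.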

File Level

Subject: Keywords or phrases describing the subject or content of the data
Place: All applicable physical locations
Language: All languages used in the dataset
Variable list: All variables in the data files, where applicable
Code list: Explanation of codes or abbreviations used in either the file names or the variables in the data files (e.g. '999 indicates a missing value in the data')
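
The variable list and code list can also be kept in a form that scripts can read. The sketch below is one possible approach in Python, using the '999 indicates a missing value' example from the code list entry; the column names, the sample data, and the function names are hypothetical.

```python
import csv
import io

# Hypothetical code list: each code that appears in the data, with its meaning.
CODE_LIST = {"999": "missing value"}

# Codes from the code list that should be treated as missing when reading data.
MISSING_CODES = {"999"}


def variable_list(rows):
    """Return the variable (column) names, taken from the first row."""
    return list(rows[0].keys()) if rows else []


def decode_missing(rows, missing_codes=MISSING_CODES):
    """Return copies of the rows with missing-value codes replaced by None."""
    return [
        {name: (None if value in missing_codes else value) for name, value in row.items()}
        for row in rows
    ]


# Made-up sample data standing in for a real data file.
sample = "site,temp_c\nA,12.5\nB,999\n"
rows = list(csv.DictReader(io.StringIO(sample)))
cleaned = decode_missing(rows)
```

Recording the codes once and applying them in code keeps the documentation and the analysis consistent: anyone reading CODE_LIST sees the same definitions the script actually uses.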

Technical Description

File inventory: All files associated with the project, including extensions (e.g. 'NWPalaceTR.WRL', 'stone.mov')
File formats: Formats of the data, e.g., FITS, SPSS, HTML, JPEG
File structure: Organization of the data file(s) and layout of the variables, where applicable
Version: Unique date/time stamp and identifier for each version
Checksum: A digest value computed for each file that can be used to detect changes; if a recomputed digest differs from the stored digest, the file must have changed
Necessary software: Names of any special-purpose software packages required to create, view, analyze, or otherwise use the data
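
The file inventory and checksum entries can be produced together. The following Python sketch walks a project folder, records each file's relative path, and computes a digest for it. SHA-256 is an assumption chosen for this example, since no particular algorithm is named above, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(folder: Path) -> dict:
    """Map each file's path (relative to folder) to its SHA-256 digest."""
    return {
        path.relative_to(folder).as_posix(): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def changed_files(stored: dict, current: dict) -> list:
    """List files whose digest differs, or that appear in only one inventory."""
    names = set(stored) | set(current)
    return sorted(name for name in names if stored.get(name) != current.get(name))
```

A typical use would be to save the result of build_inventory when the data are finalized, then rebuild the inventory later and pass both to changed_files to see which files were altered, added, or removed.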

Access

Rights: Any known intellectual property rights, statutory rights, licenses, or restrictions on use of the data
Access information: Where and how your data can be accessed by other researchers

Research Data Engineer

Lauren Phegley

she/her

Head of Research Data Services

Lynda Kellam

she/her

Subjects: Data & GIS