Skip to Main Content
It looks like you're using Internet Explorer 11 or older. This website works best with modern browsers such as the latest versions of Chrome, Firefox, Safari, and Edge. If you continue with this browser, you may see unexpected results.
- Top row should be headers with labels.
If labels aren't clear, include a ReadMe or a Data Dictionary (see Documentation Tab)
- Each row under that is a single record.
- Each column is a single variable.
- Every column should be consistent.
All numbers should have the same number of decimal places. All dates should be formatted consistently. All Text fields should use controlled vocabulary. All coded values should be consistent. Etc.
- Don't use color or comments to add meaning
Color and comments may not migrate into new formats and the meaning of colors are easily forgotten. Instead, add another column with the information you want to note
- Don't leave cells empty
Have a method for noting "No Answer", "Null", or "Missing" values so these cells are not mistaken as zeros or otherwise misinterpretted
- Put notes in a separate file
The point of a spreadsheet is to have your data organized neatly so you can run calculations on it easily and/or reorder and filter your data. If you put notes in the document you limit your ability to do this
- Double check your dates, numeric fields, and gene names
Excel has many bad habits related to dates and numeric fields. It auto-formats fields that it perceives to be dates. If you're using a Mac to open an Excel file that was created on a Windows machine, you might notice a 4-year variation in the dates. A recent study found a large number of paper have gene name errors where Excel converted the names of some genes to dates. Excel does not support dates prior to January 1, 1900 (or 1904 if using a Mac).
Read more about Excel on our Excel Page