Skip to main content
Click logo to go to Libraries homepage

Data Management Best Practices: Spreadsheets

Spreadsheet Best Practices 

  • Top row should be headers with labels.
    If labels aren't clear, include a ReadMe or a Data Dictionary (see Documentation Tab)
     
  • Each row under that is a single record.
     
  • Each column is a single variable.
     
  • Every column should be consistent.
    All numbers should have the same number of decimal places. All dates should be formatted consistently. All Text fields should use controlled vocabulary. All coded values should be consistent. Etc.
     
  • Don't use color or comments to add meaning
    Color and comments may not migrate into new formats and the meaning of colors are easily forgotten. Instead, add another column with the information you want to note
     
  • Don't leave cells empty
    Have a method for noting "No Answer", "Null", or "Missing" values so these cells are not mistaken as zeros or otherwise misinterpretted
     
  • Put notes in a separate file
    The point of a spreadsheet is to have your data organized neatly so you can run calculations on it easily and/or reorder and filter your data. If you put notes in the document you limit your ability to do this
     
  • Double check your dates, numeric fields, and gene names
    Excel has many bad habits related to dates and numeric fields. It auto-formats fields that it perceives to be dates. If you're using a Mac to open an Excel file that was created on a Windows machine, you might notice a 4-year variation in the dates. A recent study found a large number of paper have gene name errors where Excel converted the names of some genes to dates. Excel does not support dates prior to January 1, 1900 (or 1904 if using a Mac).
     

Read more about Excel on our Excel Page 

Alternative Tools

Excel and other spreadsheets might not be the best option for your research. Here are some alternative tools to collect, store, and analyze your data:

R

SPSS

STATA

SAS

Python (Lynda tutorial; PennKey required)

SQL (Lynda tutorial; PennKey required)

Access (Lynda tutorial; PennKey required)

NVivo

MATLAB

More Tutorials from Penn Libraries

More Tools for Digital Scholarship

Also search Research DataQ for answers