Data Management Resources: File Organization

File Organization Tools

Tools for File Renaming
- Mac: NameChanger

Tools for Workflow Management
- Pegasus

File Naming Guidelines

There are two general rules for file organization: be consistent and be descriptive. You want to make sure you and your colleagues can quickly find anything you are looking for. You'll need to figure out which specifics make the most sense to you and document your convention in a place everyone in your research group can find. Here are some guidelines to include in your convention:

Choose 2-3 descriptors to identify the project or collection the item belongs to and what the specific item is. Have a standard for your research group so things can easily be found and shared.
Use capitals (camel case) or underscores instead of periods or spaces. Example: surveyResponseData.csv or survey_response_data.csv
Use no more than 30 characters whenever possible.
Use date format ISO 8601: YYYY-MM-DD
- The year-first format makes it easy to find the newest or oldest files, because dates written this way sort chronologically. Wikipedia's ISO 8601 page provides additional information on the date and time standard.
Avoid special characters in a file name. Common ones to avoid are spaces and ampersands (&).
Document your naming convention so you remember what it is and your project collaborators know what it is.
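
The guidelines above can be sketched in a few lines of Python. This is only an illustration, not an official tool: the function names make_file_name and check_file_name are hypothetical, and the sketch assumes that the 30-character limit counts the whole name including the extension and that hyphens are allowed so ISO 8601 dates can appear in the name.

```python
import re
from datetime import date
from pathlib import PurePath

MAX_LENGTH = 30  # soft limit from the guidelines; assumed to include the extension


def make_file_name(descriptors, when, extension):
    """Join descriptors and an ISO 8601 date with underscores (hypothetical helper)."""
    # Replace runs of anything other than letters and digits with one underscore.
    parts = [re.sub(r"[^A-Za-z0-9]+", "_", d).strip("_") for d in descriptors]
    stem = "_".join(parts + [when.isoformat()])  # isoformat() gives YYYY-MM-DD
    return f"{stem}.{extension}"


def check_file_name(name):
    """Return a list of guideline problems found in a file name (hypothetical helper)."""
    problems = []
    stem = PurePath(name).stem  # the name without its final extension
    if re.search(r"[^A-Za-z0-9_-]", stem):
        problems.append("contains spaces, periods, or special characters")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    return problems


print(make_file_name(["psych342", "survey"], date(2024, 3, 5), "csv"))
# psych342_survey_2024-03-05.csv
print(check_file_name("survey response & data.csv"))
# ['contains spaces, periods, or special characters']
```

A research group could adapt the pattern and the length limit to match whatever convention it has documented.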

Feel free to use our File Naming Formula Template to help you and your team create meaningful file names that follow best practices. Once you fill it out, print it out and put it in a prominent place in your workplace. If you work on collaborative files, appoint someone to hold your team accountable for their file naming practices.

File Directory Organization

One of the key aspects of data management is keeping an organized file directory. If you have a streamlined and consistent way of organizing your files into directories, it saves you from having to continuously search through directories to find the right file. There is no 'best' way to organize your directories, but we have found that there are wrong ways. Below are the methods we have found that make for smoother directory organization:

Always keep original data files untouched. If you download a dataset or export it from a tool, keep it in a 'raw' file directory and then copy the file to edit or work with.
Structure your directories in a nested fashion, about 3-4 levels deep.
Use descriptive directory names to communicate what each directory holds.
For shared files, document what each directory holds and make sure everyone follows the organizational convention
Organize your directory by elements such as project, initiative, fiscal year, calendar year, laboratory, researcher, course, or specimen.
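
As a minimal sketch of the first point in the list above, the Python below copies a file out of a 'raw' directory so that edits happen only on the copy. The function name copy_for_editing and the directory names in the usage comment are hypothetical, chosen for illustration.

```python
import shutil
from pathlib import Path


def copy_for_editing(raw_file, working_dir):
    """Copy a raw data file into a working directory and return the copy's path."""
    raw_file = Path(raw_file)
    working_dir = Path(working_dir)
    working_dir.mkdir(parents=True, exist_ok=True)

    target = working_dir / raw_file.name
    if target.exists():
        # Refuse to overwrite work that may already be in progress.
        raise FileExistsError(f"{target} already exists; not overwriting")

    shutil.copy2(raw_file, target)  # copy2 also preserves timestamps
    return target


# Example usage (paths are placeholders):
# working_copy = copy_for_editing("raw/survey_2024-03-05.csv", "working")
```

The original in the raw directory is never opened for writing, so it stays available if an edit goes wrong.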

A good way to brainstorm the best directory structure for you is to identify the largest "buckets" of content you will have and what attributes they have in common. For example, to organize files for college courses, the buckets of content might be academic year, course, homework, and readings. Using the largest bucket first, create nested directories to hold the content.

Download our Template Directory Structure to look at how you might structure a complex and data-heavy research project. You can alter this template to suit your research needs and help map out the structure of your directory.

Example Directory Organization for College Courses (slash indicates directory):

\2023_Fall
\2024_Spring
- \PSYCH_342
  - Readings
  - Papers
  - Final_Projects
- \CRIM_400
  - Readings
  - Papers
  - Final_Projects
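
If you would rather create a structure like the one above with a script than by hand, a short Python sketch follows. The function name create_term_directories is hypothetical, and the folder names are simply taken from the example; substitute your own buckets.

```python
from pathlib import Path

# Sub-directories created inside every course directory (from the example above).
COURSE_FOLDERS = ["Readings", "Papers", "Final_Projects"]


def create_term_directories(root, term, courses):
    """Create root/term/course/folder for every course and return the new paths."""
    created = []
    for course in courses:
        for folder in COURSE_FOLDERS:
            path = Path(root) / term / course / folder
            # parents=True builds the term and course levels as needed;
            # exist_ok=True makes the script safe to run more than once.
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created


# Example usage (the root directory is a placeholder):
# create_term_directories("Courses", "2024_Spring", ["PSYCH_342", "CRIM_400"])
```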

Version Control

Version control is the practice of tracking changes and edits to files and directories. This allows you to revert to previous versions if you make a mistake or even delete something! This can be a key practice for success on complicated projects and on collaborative teams.

Even if you're tracking changes with the software you're using, you should always keep a copy of the original unedited data available and save a new version when substantial changes are made. It's like saving your progress in a video game along the way so you don't have to go back to the beginning after coming across an unexpected challenge.

There are two main approaches to version control:

Manual Version Control
- The process of personally saving versions of your files throughout the research process. This is a good option for those whose files do not cooperate with software version control (such as rich text files or media files). The file storage system (Box, OneDrive) or software (MS Word) you use may have some built-in version control, but that is not the main purpose of the tool. Be consistent about when you save another version and how you keep track of your system.
  - Manual Version Control Suggestions
  - Document Version Control
  - NIH Version Control Guidelines - example of how to outline a version control convention
Software Version Control Systems
- Systems specifically designed to version control code. These are more complex than manual version control, but they are more powerful and integrate into your research process more easily. These systems record the changes to your files in a history, so you do not need to save a separately named copy for each version.
  - Happy Git and GitHub for the useR - a guide for R users on using Git and GitHub for version control
  - Git and GitHub for Beginners [YouTube Video] - a crash course video on Git and GitHub from freeCodeCamp
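
For the manual approach described above, one way to stay consistent is to let a small script name each saved version. The Python below is a sketch under assumptions, not a recommended standard: the function name save_version is hypothetical, and the naming pattern (original name, ISO 8601 date, then a two-digit version number) is one possible convention among many.

```python
import re
import shutil
from datetime import date
from pathlib import Path


def save_version(file_path, versions_dir, today=None):
    """Copy a file into versions_dir as name_YYYY-MM-DD_vNN.ext and return the copy."""
    file_path = Path(file_path)
    versions_dir = Path(versions_dir)
    versions_dir.mkdir(parents=True, exist_ok=True)
    today = today or date.today()

    # Find the version numbers already used for this file, whatever their date.
    pattern = re.compile(
        re.escape(file_path.stem)
        + r"_\d{4}-\d{2}-\d{2}_v(\d+)"
        + re.escape(file_path.suffix)
    )
    numbers = [
        int(match.group(1))
        for existing in versions_dir.iterdir()
        if (match := pattern.fullmatch(existing.name))
    ]
    next_number = max(numbers, default=0) + 1

    new_name = f"{file_path.stem}_{today.isoformat()}_v{next_number:02d}{file_path.suffix}"
    target = versions_dir / new_name
    shutil.copy2(file_path, target)  # the working file itself is left in place
    return target


# Example usage (paths are placeholders):
# save_version("working/interview_notes.docx", "working/versions")
```

Because the version number always increases, earlier copies are never overwritten, and the date in the name shows when each version was saved.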

Research Data Engineer

Lauren Phegley

she/her

Head of Research Data Services

Lynda Kellam