

Prepare Dataset for Sharing

File Naming 

Use descriptive file names with no special characters or spaces. File names should be distinct enough that they cannot be confused with one another. All files should follow a consistent naming pattern, called a file naming schema, and your README document should explain exactly what that schema is. More information on file naming is available on the File Organization page of our Data Management Resources Guide, along with the File Naming Formula Template, which you can use to develop meaningful names.
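A file naming schema is easiest to follow consistently when it is checked automatically. The Python sketch below assumes a hypothetical schema, `<project>_<YYYYMMDD>_<description>_v<NN>.<ext>`; the schema, the function names, and the example values are illustrations only, not a ScholarlyCommons requirement. It builds names from their parts, checks existing names against the schema, and flags names that differ only by letter case.

```python
import re

# Hypothetical schema: <project>_<YYYYMMDD>_<description>_v<NN>.<ext>
# Parts may contain only letters, digits, and hyphens; underscores are
# reserved as separators, and spaces or other special characters are rejected.
SAFE_PART = re.compile(r"[A-Za-z0-9-]+")
SCHEMA = re.compile(r"[A-Za-z0-9-]+_\d{8}_[A-Za-z0-9-]+_v\d{2}\.[a-z0-9]+")


def build_name(project: str, date: str, description: str, version: int, ext: str) -> str:
    """Assemble a file name that follows the hypothetical schema."""
    for part in (project, description):
        if not SAFE_PART.fullmatch(part):
            raise ValueError(f"unsafe characters in {part!r}")
    if not re.fullmatch(r"\d{8}", date):
        raise ValueError(f"date must be YYYYMMDD, got {date!r}")
    return f"{project}_{date}_{description}_v{version:02d}.{ext.lower()}"


def follows_schema(name: str) -> bool:
    """Check whether an existing file name matches the schema."""
    return SCHEMA.fullmatch(name) is not None


def case_collisions(names: list[str]) -> set[str]:
    """Return names that differ only by letter case and are easily confused."""
    first_seen: dict[str, str] = {}
    clashes: set[str] = set()
    for name in names:
        key = name.lower()
        if key in first_seen and first_seen[key] != name:
            clashes.update({first_seen[key], name})
        first_seen.setdefault(key, name)
    return clashes


if __name__ == "__main__":
    print(build_name("soilsurvey", "20240115", "site-A-moisture", 1, "CSV"))
    # prints soilsurvey_20240115_site-A-moisture_v01.csv
    print(follows_schema("soil survey FINAL.xlsx"))  # prints False
```

Whatever schema you choose, the same idea applies: write the pattern down once, in the README, and check every file name against it.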

File Format

Files should be in non-proprietary (vendor-independent), open, standard file formats or, at the very least, in file types that are common in your field. ScholarlyCommons lists its preferred formats on the Preferred Formats page in the Policies section of this ScholarlyCommons Guide. Saving files in non-proprietary, open formats such as Comma-Separated Values (.csv), Plain Text (.txt), or Markdown (.md) gives your data the best chance of being preserved over time. In addition, files should not be encrypted or password protected, because either one prevents us from carrying out digital preservation. Lastly, if data needs to be compressed, use only a lossless compression format (such as GZIP). Ideally, avoid compressing files before submitting them to ScholarlyCommons.
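To illustrate what "lossless" means, the sketch below writes a small table as plain CSV, compresses it with gzip, decompresses it, and confirms the restored bytes are identical by comparing SHA-256 checksums. It uses only Python's standard library; the table contents and function names are hypothetical.

```python
import csv
import gzip
import hashlib
import io


def to_csv_bytes(rows: list[list[str]]) -> bytes:
    """Serialize rows as UTF-8 CSV text, an open, non-proprietary format."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue().encode("utf-8")


def gzip_roundtrip_is_lossless(data: bytes) -> bool:
    """Compress with gzip, decompress, and compare SHA-256 checksums."""
    restored = gzip.decompress(gzip.compress(data))
    return hashlib.sha256(restored).digest() == hashlib.sha256(data).digest()


if __name__ == "__main__":
    # Hypothetical example table
    table = [["site", "depth_cm", "moisture"], ["A", "10", "0.31"], ["B", "20", "0.27"]]
    print(gzip_roundtrip_is_lossless(to_csv_bytes(table)))  # prints True
```

A lossy format, by contrast, would fail this kind of check because the decompressed data would no longer match the original byte for byte.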

File Organization

If your data needs to be in a folder, rather than uploaded as ten or fewer individual items, you will need to package the files into an archive in a lossless format, such as .tar (an uncompressed archive) or .tar.gz (a tar archive compressed with gzip). Before you do so, organize the files into meaningful directories based on similar types, uses, or meaning. For example, executable code might go in one directory, cleaned or raw data in another, and analysis outputs in a third. In your README file, describe how the files are organized and how they relate to one another.
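A minimal sketch of packaging an organized dataset folder as a .tar.gz archive with Python's standard `tarfile` module. The folder layout (`code/`, `data/`, `outputs/`) and the function name are hypothetical examples; the function returns the archive's member list so you can check it against the file listing in your README.

```python
import tarfile
import tempfile
from pathlib import Path


def package_dataset(source_dir: str, archive_path: str) -> list[str]:
    """Bundle source_dir, including its subdirectories, into a .tar.gz archive.

    Returns the sorted member names stored in the archive. Keep archive_path
    outside source_dir so the archive does not try to include itself.
    """
    source = Path(source_dir)
    # "w:gz" writes a tar archive compressed with gzip (lossless)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the dataset's top-level folder
        tar.add(source, arcname=source.name)
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    # Hypothetical layout: code, cleaned data, and outputs in separate folders
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "my_dataset"
        for sub in ("code", "data", "outputs"):
            (root / sub).mkdir(parents=True)
        (root / "code" / "analysis.py").write_text("print('analysis')\n")
        (root / "data" / "clean.csv").write_text("site,moisture\nA,0.31\n")
        (root / "README.txt").write_text("Describe the folder layout here.\n")
        for name in package_dataset(str(root), str(Path(tmp) / "my_dataset.tar.gz")):
            print(name)
```

Using `arcname` keeps the archive free of the absolute paths from your own computer, so the extracted folder structure matches what the README describes.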

Prepare Documentation for Sharing


Documentation is information on the structure, contents, and layout of a dataset (ICPSR). Documentation must describe your dataset clearly enough that others can interpret and use your files. Good documentation is one of the deciding factors in effective data sharing.

It is important to provide all the documentation a reader needs to understand and use the dataset. This can take the form of README files, data dictionaries, codebooks, unsigned informed consent forms, survey instruments, and commented, runnable code.
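A data dictionary lists every variable in a file along with its meaning. As a hedged starting point, the sketch below drafts one entry per column of a CSV file, guessing a type from the first data row and leaving the description and units blank for you to fill in. The dictionary's headings (`variable`, `type`, `description`, `units`) and the function names are assumptions for illustration, not a required template.

```python
import csv
import io


def guess_type(value: str) -> str:
    """Rough type guess from a single sample value."""
    for cast, label in ((int, "integer"), (float, "decimal")):
        try:
            cast(value)
            return label
        except ValueError:
            pass
    return "text"


def data_dictionary_skeleton(csv_text: str) -> list[dict[str, str]]:
    """Draft one data-dictionary entry per column of a CSV file.

    Types are guessed from the first data row only; description and units
    are left blank for the researcher to complete by hand.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    first_row = next(reader, [""] * len(header))
    return [
        {"variable": name, "type": guess_type(sample), "description": "", "units": ""}
        for name, sample in zip(header, first_row)
    ]


if __name__ == "__main__":
    for entry in data_dictionary_skeleton("site,depth_cm,moisture\nA,10,0.31\n"):
        print(entry)
```

Because the guess uses only one row, treat the `type` column as a draft to verify, not as a final answer.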



During submission to ScholarlyCommons, the submitter provides high-level documentation such as the title, creator, and license. This type of documentation is also called metadata, which in its simplest definition is "data about data". Metadata makes your research discoverable and gives users critical information about the dataset.

Only certain fields are required, but the more metadata you provide during the submission process, the more FAIR (Findable, Accessible, Interoperable, and Reusable) your dataset will be. The required fields for the Dataset submission form are: Title, Author, Penn Collection(s), Abstract, Discipline, Subject, Data type, Distributor, and License.
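Before submitting, it can help to check a draft record against the required fields. In this sketch the field names are taken from the list of required Dataset submission fields on this page, while representing a record as a Python dictionary, and the function name itself, are assumptions for illustration.

```python
# Required fields for the Dataset submission form, as listed on this page
REQUIRED_FIELDS = (
    "Title", "Author", "Penn Collection(s)", "Abstract", "Discipline",
    "Subject", "Data type", "Distributor", "License",
)


def missing_required(record: dict[str, str]) -> list[str]:
    """Return the required fields that are absent or blank, in form order."""
    return [field for field in REQUIRED_FIELDS if not (record.get(field) or "").strip()]


if __name__ == "__main__":
    # Hypothetical draft record; "License" is present but blank
    draft = {"Title": "Soil moisture survey", "Author": "Doe, Jane", "License": " "}
    print(missing_required(draft))
```

A check like this only confirms that required fields are filled in; adding optional fields as well is what makes the record more FAIR.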

It is also important to connect the data to related publications, datasets, or software, because this helps people find more information related to the dataset and gives a clearer picture of the scope of the research. You can associate items in the 'Related publications or datasets' metadata field by adding each associated item's DOI or other persistent identifier (Handle, Permalink, etc.). If there is no persistent identifier, use a URL.
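The order of preference for related items (a DOI first, then another persistent identifier such as a Handle or permalink, then a plain URL) can be sketched as a small helper. The DOI check uses Crossref's recommended pattern for modern DOIs, which matches most but not all DOIs. Normalizing DOIs to the https://doi.org/ resolver form is an assumption here, not a stated ScholarlyCommons rule, and the function name is hypothetical.

```python
import re
from typing import Optional

# Crossref's recommended pattern for modern DOIs, with an optional
# "doi:" or resolver-URL prefix; it matches most, but not all, DOIs.
DOI_PATTERN = re.compile(
    r"(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/[-._;()/:A-Z0-9]+)",
    re.IGNORECASE,
)


def related_identifier(doi: Optional[str] = None,
                       other_pid: Optional[str] = None,
                       url: Optional[str] = None) -> str:
    """Pick what to enter in 'Related publications or datasets'.

    Preference order: DOI, then another persistent identifier
    (e.g. a Handle or permalink, passed through unchanged), then a URL.
    """
    if doi:
        match = DOI_PATTERN.fullmatch(doi.strip())
        if match is None:
            raise ValueError(f"not a recognizable DOI: {doi!r}")
        # Assumption: store DOIs in the doi.org resolver form
        return f"https://doi.org/{match.group(1)}"
    if other_pid:
        return other_pid.strip()
    if url:
        return url.strip()
    raise ValueError("no identifier supplied")


if __name__ == "__main__":
    print(related_identifier(doi="doi:10.1000/xyz123"))
    # prints https://doi.org/10.1000/xyz123
```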

For those who are curious, we use the Dublin Core Metadata Schema for our metadata fields, along with a few legacy and homegrown fields. You can view an item's full metadata record by clicking the "View all metadata" button at the bottom of the item's record page.
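For readers curious about what Dublin Core looks like, the sketch below serializes a few submission-form fields as `dc:` elements using Python's standard `xml.etree.ElementTree`. The field-to-element mapping is an assumption for illustration; ScholarlyCommons's actual mapping, including its legacy and homegrown fields, may differ.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Assumed mapping from submission-form fields to Dublin Core elements
FIELD_TO_DC = {
    "Title": "title",
    "Author": "creator",
    "Abstract": "description",
    "Subject": "subject",
    "Data type": "type",
    "Distributor": "publisher",
    "License": "rights",
    "Related publications or datasets": "relation",
}


def to_dublin_core(record: dict[str, list[str]]) -> str:
    """Serialize a record as a flat list of dc:* elements (one per value)."""
    root = ET.Element("metadata")
    for field, element in FIELD_TO_DC.items():
        for value in record.get(field, []):
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical record with two authors
    print(to_dublin_core({
        "Title": ["Soil moisture survey"],
        "Author": ["Doe, Jane", "Roe, Richard"],
        "Related publications or datasets": ["https://doi.org/10.1000/xyz123"],
    }))
```

Each value gets its own element, which is how Dublin Core represents repeatable fields such as multiple creators.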
