When you're collecting information about people, or any other type of sensitive information, it's important to consider how secure the data is at every stage of its life cycle - from collection to archiving. Penn has a number of resources available to help ensure your survey participants are protected as you do your research.
How you store your data depends on how sensitive it is. Consult with the IRB to determine whether the storage option you've chosen is appropriate, and ask what data can be stored in Penn+Box and Amazon Web Services.
For more information, see the Data Storage Best Practices guide and ISC's Information Security Policies and Procedures page.
Google Drive and non-Penn-affiliated Dropbox accounts are not recommended for storing sensitive or human subjects data, as these are not secure storage options.
Many of the above storage options support collaboration. Additionally, a relatively secure option for sharing data is LabArchives, when used through your Penn account. LabArchives is an online lab notebook, but it can also serve as a storage and sharing space for data and information collected in all disciplines.
For sharing more sensitive files and information, Penn also offers a Secure Share service that allows you to send encrypted files to other Penn researchers.
Some data may not be publishable due to its highly sensitive nature. However, there are options for even very sensitive data.
Datasets may be published in a repository designed for sensitive data. Because we are a member institution of ICPSR, Penn researchers can publish data in this social science repository. If the data should not be fully public, it can be limited to researchers at other member institutions or even kept as onsite only, where researchers would need to travel to Ann Arbor to view the data files in person.
If your data can't be published, you can still publish a metadata record (a record about your data) so that people know your dataset exists and roughly what information it contains. This record may also include information about how to access the dataset if it is available privately somewhere.
There may be other publishing options for your sensitive dataset. Contact us to discuss the specifics of your data.
Password protecting your computer and/or files is a great way to control access. A password that is too simple, however, offers little protection.
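As a rough illustration of what "not too simple" can look like in practice, the sketch below generates a long random password with Python's standard `secrets` module. The function name `generate_password`, the default length of 20, and the 12-character minimum are assumptions chosen for this example, not Penn policy; check your school's password requirements before relying on it.

```python
import secrets
import string


def generate_password(length: int = 20) -> str:
    """Return a random password drawn from letters, digits, and punctuation.

    The 12-character minimum is an illustrative assumption, not an official rule.
    """
    if length < 12:
        raise ValueError("use at least 12 characters")
    alphabet = string.ascii_letters + string.digits + string.punctuation
    # secrets.choice uses a cryptographically secure random source.
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print(generate_password())
```

A password manager can generate and store passwords like this so that you do not need to memorize them.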
Encryption converts your data into an unreadable code that requires a password or key to be read. You can encrypt data while it is stored on your hard drive or other storage medium, or you can encrypt the data while transferring it from one location to another.
SAS's Information Technology for Research has a very good explanation of these encryption processes.
Many researchers believe they can't share their data if it contains personally identifying information (PII). PII and other sensitive information should certainly not be shared, but de-identifying the data can help prevent re-identification of human subjects. If that's not possible, there are other ways to share your data while minimizing risks for your subjects.
Direct identifiers are things like names, addresses, phone numbers, PennKeys, pictures, or anything else that could, on its own, identify a research participant. Some variables that would not be direct identifiers in large datasets can be direct identifiers in small datasets if the response is rare.
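One common way to handle direct identifiers is to drop them and replace them with a study-specific participant code. The sketch below is a minimal example of that idea using only Python's standard library; the column names, the `pseudonymize` function, and the choice of a keyed hash (HMAC-SHA256) to derive the code are all assumptions made for illustration, and your IRB protocol may call for a different approach.

```python
import hashlib
import hmac

# Hypothetical column names, chosen only for illustration.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "pennkey"}


def pseudonymize(records, secret_key: bytes, id_field: str = "pennkey"):
    """Drop direct identifiers and add a keyed-hash participant code.

    The same participant always receives the same code, so records can
    still be linked across files without exposing the identifier itself.
    """
    cleaned = []
    for record in records:
        digest = hmac.new(
            secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        # Keep every column that is not listed as a direct identifier.
        row = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        row["participant_code"] = digest[:12]
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    survey = [
        {"name": "Pat Example", "pennkey": "pexample", "phone": "555-0100",
         "address": "1 Main St", "score": 7},
    ]
    print(pseudonymize(survey, secret_key=b"store-this-key-separately"))
```

The secret key needs to be stored securely and separately from the data, since anyone holding it could recompute the codes from known identifiers. Removing direct identifiers alone does not guarantee anonymity; the remaining columns may still act as indirect identifiers.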
Indirect identifiers are values that could be combined with other values to identify a participant. These identifiers could also be combined with other datasets or information to re-identify a participant. A person can be identified with very little information: in most cases, age, gender, and ZIP code are enough to identify a participant.
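To get a rough sense of how risky a combination of indirect identifiers is, you can count how many participants share each combination of values and flag the rare ones. The sketch below does this with Python's standard library, in the spirit of a k-anonymity check. The function name `small_groups`, the column names, and the default threshold of 5 are assumptions for illustration, not an established standard for your data.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=5):
    """Return each combination of indirect identifiers shared by fewer than k records."""
    counts = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


if __name__ == "__main__":
    # Made-up rows for demonstration only.
    rows = [
        {"age": 34, "gender": "F", "zip": "19104"},
        {"age": 34, "gender": "F", "zip": "19104"},
        {"age": 71, "gender": "M", "zip": "19104"},
    ]
    # With k=2, only the single 71-year-old respondent is flagged.
    print(small_groups(rows, ["age", "gender", "zip"], k=2))
```

Combinations that are flagged are candidates for coarsening (for example, reporting age ranges instead of exact ages) or suppression before the data is shared.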
Methods for De-identifying Quantitative Data
The Qualitative Data Repository at Syracuse University has some excellent additional advice on de-identification.