Text Mining at Penn Libraries

A guide to text mining resources at Penn Libraries

Existing Corpora

Building a corpus is a complex process, so researchers just getting started with text mining may want to rely on corpora that others have already compiled and rigorously validated. The corpora listed on this page were built with particular goals or research questions in mind and are likely to provide a solid foundation for a first text mining project. Some are very large and can be broken into subsets for answering questions about narrower time spans, genres, or geographic regions. Researchers interested in building entirely new corpora should consult the Sources of Text Data page instead.

Corpora Available to Penn Affiliates

Corpora Freely Available to the Public