This guide was created by Rachel Liu. She is the Research Data and Digital Scholarship Text and Data Mining Assistant at Van Pelt-Dietrich Library Center. Rachel is a graduate student in Learning Sciences & Technologies concentrating on Education Data Mining.
Software for Text Analysis
Once you have built your corpus, you will need specialized software to analyze it. Different kinds of software are suited to different disciplines and research questions. The tools listed below do not require programming knowledge.
Google Ngram Viewer: A tool that allows you to explore language usage trends over time.
Google Pinpoint: Part of Google’s Journalist Studio, this tool lets you search for keywords and identify entities in large amounts of text.
Voyant Tools: An open-source, web-based text reading and analysis environment.
AntConc: (Tutorial) A freeware corpus analysis toolkit for concordancing and finding clusters (frequency patterns of word sequences) or n-grams (sequences of n words within your corpus or document).
MALLET: MAchine Learning for LanguagE Toolkit is a Java-based software package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
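For readers curious about what concordancing and n-gram counting involve, the following is a minimal Python sketch of both ideas. It is not how AntConc or any tool above is implemented; the function names (tokenize, ngrams, concordance), the simple tokenization rule, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercase the text
    # and keep runs of letters and apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    # Every sequence of n consecutive words, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def concordance(tokens, keyword, window=3):
    # Keyword-in-context view: each hit with up to `window` words
    # on either side.
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Made-up sample text, standing in for a real corpus.
sample = "The cat sat on the mat. The cat slept on the mat."
tokens = tokenize(sample)

# Count how often each two-word sequence (bigram) occurs.
bigram_counts = Counter(ngrams(tokens, 2))

if __name__ == "__main__":
    for gram, count in bigram_counts.most_common(3):
        print(" ".join(gram), count)
    for left, word, right in concordance(tokens, "cat", window=2):
        print(f"{left:>10} [{word}] {right}")
```

A dedicated tool adds much more than this sketch (sorting, filtering, handling of large files, statistics), which is why the no-code options above are usually the better starting point.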