Research Data & Digital Scholarship (RDDS)

Text Analysis

A guide to text mining tools and methods

Voyant

Voyant is a web-based text analysis tool for visualizing and analyzing textual data. Its features help identify patterns and trends within a corpus. Researchers and scholars can also use the statistics Voyant provides, such as vocabulary density, distinctive words, and word frequencies, to better understand the structure of their data.
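
To make those statistics concrete, here is a minimal Python sketch of what word frequencies and vocabulary density measure. It is not Voyant's own code: the `summarize` function, its simple tokenizer, and the sample sentence are assumptions made for illustration, and vocabulary density is taken here to mean unique words divided by total words.

```python
import re
from collections import Counter


def summarize(text, top_n=5):
    """Return basic statistics similar in spirit to a corpus summary."""
    # Simplified tokenizer (an assumption): lowercase words, optional apostrophe part
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    unique = len(counts)
    return {
        "total_words": total,
        "unique_words": unique,
        # Vocabulary density: unique words divided by total words
        "vocabulary_density": unique / total if total else 0.0,
        "most_frequent": counts.most_common(top_n),
    }


sample = "the cat sat on the mat and the dog sat too"
stats = summarize(sample, top_n=2)
print(stats)
```

A density close to 1 means few words repeat; lower values indicate more repetition. Voyant's own tokenization and stopword handling may give different numbers for the same text.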

Developed by Stéfan Sinclair (McGill University) and Geoffrey Rockwell (University of Alberta)
Sinclair, Stéfan and Geoffrey Rockwell, 2016. Voyant Tools. Web. http://voyant-tools.org/.

● Free, open-source project
● Web-based text reading and analysis environment
● Lowers the barrier to entry for text analysis

  • No coding required
  • No login required

● Large, robust user community
● Consistently upgraded and supported infrastructure
● Balances user-friendliness with powerful functionality

  • 25+ visualization tools

Getting Started

Please follow the steps outlined below to get started with Voyant.

  1. Open a web browser and navigate to the Voyant website (https://voyant-tools.org/).
  2. Click on the "Upload" button beneath the search box to upload your text corpus. Supported document types include plain text, HTML, XML, PDF, and Microsoft Word documents. If you would like to explore the features of Voyant but do not have any specific textual data in mind, you can instead access a shared text corpus by clicking on the "Open" button.
  3. After you upload or select a corpus, Voyant will direct you to the main interface for detailed exploration.
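
Before uploading a folder of files, it can help to check which ones match the document types listed in step 2. The following Python sketch does that by file extension. The extension set and the `split_by_support` helper are assumptions for illustration; they are derived from the list above and are not an official or exhaustive list of formats Voyant accepts.

```python
from pathlib import Path

# Assumed mapping of the document types named in this guide to file extensions
SUPPORTED_EXTENSIONS = {".txt", ".html", ".htm", ".xml", ".pdf", ".doc", ".docx"}


def split_by_support(filenames):
    """Split file names into (supported, unsupported) lists by extension."""
    supported, unsupported = [], []
    for name in filenames:
        # Compare extensions case-insensitively, so ".DOCX" matches ".docx"
        if Path(name).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported


ok, skipped = split_by_support(["hamlet.txt", "notes.DOCX", "cover.png", "essay.pdf"])
print("Ready to upload:", ok)
print("Check before uploading:", skipped)
```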

Exploring Voyant

The powerful features of Voyant include:

  • Cirrus: The cirrus feature creates a word cloud of the words that appear most frequently in the corpus. You can use the slider below the cloud to adjust the number of terms displayed.

  • Trends: The trends feature generates a graph showing how the frequency of a particular word changes across the documents in the corpus (or across segments of a single document). You can change the type of plot (stacked bar, line plot, area plot, etc.) in the "Display" section.

  • Summary: The summary feature provides an overview of the text corpus, including word counts, vocabulary density, the most frequent words, and other statistics.

  • Reader: The reader feature allows users to read through their text corpus while highlighting specific words or phrases along the way.

  • Contexts: The contexts feature displays a concordance, showing each occurrence of a word or phrase together with its surrounding text. The neighboring Correlations tab calculates the correlation (and the significance of that correlation) between terms in the corpus.
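
For readers curious about what sits behind two of these views, the Python sketch below computes per-document relative frequencies (the kind of values a trends graph plots) and a simple keyword-in-context concordance. It is an illustration under stated assumptions, not Voyant's implementation: the function names, the tokenizer, the two-word context window, and the three sample "documents" are all hypothetical.

```python
import re


def tokenize(text):
    """Simplified tokenizer (an assumption): lowercase alphabetic words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def relative_frequencies(documents, term):
    """Occurrences of term divided by total words, for each document."""
    freqs = []
    for doc in documents:
        tokens = tokenize(doc)
        freqs.append(tokens.count(term) / len(tokens) if tokens else 0.0)
    return freqs


def keyword_in_context(text, term, window=2):
    """Return (left, keyword, right) tuples for each occurrence of term."""
    tokens = tokenize(text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok == term:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, tok, right))
    return rows


docs = [
    "love is blind",
    "the course of true love never did run smooth",
    "to be or not to be",
]
trend = relative_frequencies(docs, "love")
concordance = keyword_in_context(docs[1], "love")
print(trend)
print(concordance)
```

Plotting `trend` against document order gives a rough equivalent of a trends line for one term, while each tuple in `concordance` corresponds to one row of a concordance table.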

Bookmarking a Corpus

One of the most useful features of Voyant Tools is the ability to bookmark and share URLs that refer to your collection of texts. This is especially convenient if you want to work with the same texts across different sessions. To export a link for your corpus and the current set of tools, click on the "Export" (diskette) icon located in the blue bar at the top of the Voyant interface. Then choose the format in which you want to export your link and click on the "Export" button.

You can also export the tools and data as an HTML snippet and embed it in another web page. This may be useful for presentations and reports. For example, the built-in Voyant corpus "Shakespeare's Plays" can be embedded this way.
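
If you need to generate such a snippet yourself, for instance to embed several views in one report, the Python sketch below wraps a Voyant link in an HTML iframe. The `iframe_snippet` helper, the default dimensions, and the example URL (including its `corpus` and `view` parameters) are assumptions for illustration; in practice, paste the link produced by the "Export" dialog rather than constructing one by hand.

```python
from html import escape


def iframe_snippet(exported_url, width="100%", height="800px"):
    """Wrap a Voyant link in an HTML iframe element."""
    # escape(..., quote=True) makes &, <, >, and quotes safe inside attributes
    src = escape(exported_url, quote=True)
    style = escape(f"width: {width}; height: {height}", quote=True)
    return f'<iframe style="{style}" src="{src}"></iframe>'


# Illustrative URL only; use the link from Voyant's Export dialog instead
url = "https://voyant-tools.org/?corpus=shakespeare&view=Cirrus"
snippet = iframe_snippet(url)
print(snippet)
```

The resulting string can be pasted into any page that allows raw HTML. Escaping the URL matters because exported links usually contain `&`, which must appear as `&amp;` inside an HTML attribute.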
