
Japanese Text Analysis: Overview

A guide to tools, corpora, and other resources related to Japanese text analysis and natural language processing, with a focus on the digital humanities.


Help and Resources

Morphological Analysis


  • Voyant Tools now works with Japanese (including tokenizing your text)
  • Topic Modeling Tool also works out-of-the-box with pre-tokenized Japanese
  •'s Text Tools works with pre-tokenized Japanese (uncheck tokenize by character in the options)
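Written Japanese does not put spaces between words, which is why the tools above need text that has already been segmented. As a rough illustration only, and not the internals of any of these tools, the Python sketch below contrasts splitting a pre-tokenized string on whitespace with tokenizing it character by character. The function names and the sample sentence are assumptions made for this example.

```python
def split_pretokenized(text: str) -> list[str]:
    """Split text whose words are already separated by whitespace."""
    # str.split() with no argument splits on any Unicode whitespace,
    # including the full-width ideographic space (U+3000).
    return text.split()


def split_by_character(text: str) -> list[str]:
    """Treat every non-whitespace character as its own token."""
    return [ch for ch in text if not ch.isspace()]


# Hypothetical sample: "私は学生です" after word segmentation.
pretokenized = "私 は 学生 です"

print(split_pretokenized(pretokenized))  # ['私', 'は', '学生', 'です']
print(split_by_character(pretokenized))  # ['私', 'は', '学', '生', 'で', 'す']
```

Character tokenization breaks 学生 into two separate tokens, which is why word-level analysis needs the whitespace-based split on pre-tokenized text.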



Word segmentation, part-of-speech tagging, and more:

Geographic services:

  • GeoNLP: geographic name services and software from NII
  • GeoNames (English place names only): downloadable and web-searchable gazetteer



Modern usage:


NINJAL corpora:

Check out the Center for Corpus Development, NINJAL, for links to many corpora and databases. Here are some notable corpora.



Subject Guide

Molly Des Jardin
Van Pelt 527

Japanese WordNet

A WordNet has been created for Japanese; it includes synsets (synonym sets) only. Please visit the page to download the sqlite3 database of Japanese WordNet, then use one of the APIs, available in a variety of programming languages, to work with it in your own code. Here is a link to the Python API for Japanese WordNet.
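Because the download is an ordinary sqlite3 database, it can also be queried directly with Python's standard `sqlite3` module. The sketch below is a minimal illustration, not the Python API mentioned above. It assumes a `word` table (`wordid`, `lang`, `lemma`, `pos`) and a `sense` table (`synset`, `wordid`, `lang`); check the real schema of the file you download before relying on these names. To stay self-contained, it builds a tiny in-memory stand-in with made-up synset IDs.

```python
import sqlite3


def synonyms(conn: sqlite3.Connection, lemma: str, lang: str = "jpn") -> list[str]:
    """Return lemmas in `lang` that share at least one synset with `lemma`."""
    rows = conn.execute(
        """
        SELECT DISTINCT w2.lemma
        FROM word AS w1
        JOIN sense AS s1 ON s1.wordid = w1.wordid
        JOIN sense AS s2 ON s2.synset = s1.synset
        JOIN word  AS w2 ON w2.wordid = s2.wordid
        WHERE w1.lemma = ? AND w1.lang = ?
          AND w2.lang = ? AND w2.wordid != w1.wordid
        ORDER BY w2.lemma
        """,
        (lemma, lang, lang),
    ).fetchall()
    return [row[0] for row in rows]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database using the ASSUMED table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE word (wordid INTEGER PRIMARY KEY, lang TEXT, lemma TEXT, pos TEXT);
        CREATE TABLE sense (synset TEXT, wordid INTEGER, lang TEXT);
        INSERT INTO word VALUES
            (1, 'jpn', '犬', 'n'), (2, 'jpn', 'イヌ', 'n'),
            (3, 'jpn', '猫', 'n'), (4, 'eng', 'dog', 'n');
        -- Synset IDs here are invented for the demo.
        INSERT INTO sense VALUES
            ('demo-dog', 1, 'jpn'), ('demo-dog', 2, 'jpn'),
            ('demo-cat', 3, 'jpn'), ('demo-dog', 4, 'eng');
        """
    )
    return conn


conn = build_demo_db()
print(synonyms(conn, "犬"))  # ['イヌ']
```

With the real download, replace `build_demo_db()` with `sqlite3.connect(...)` pointing at the database file, and adjust the table and column names if the actual schema differs.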

Corpora ready for MALLET

These are plain-text files formatted for use with the topic modeling software MALLET. Each file contains the title of the work, the author, and the year, followed by the text with words separated by spaces. I have provided both lemmatized and raw text where available.
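As a rough sketch of how such a file could be produced or read back in Python: the description above gives the order of the fields but not the delimiter between them, so the tab separator used here is an assumption, as are the `Work` type and the function names. Inspect the actual files before reusing this.

```python
from typing import NamedTuple


class Work(NamedTuple):
    title: str
    author: str
    year: str
    tokens: list[str]


def format_line(work: Work) -> str:
    """Join metadata and space-separated tokens into one line.

    ASSUMPTION: fields are tab-separated; the real files may differ.
    """
    return "\t".join([work.title, work.author, work.year, " ".join(work.tokens)])


def parse_line(line: str) -> Work:
    """Inverse of format_line: split off three metadata fields, then tokens."""
    title, author, year, text = line.rstrip("\n").split("\t", 3)
    return Work(title, author, year, text.split())


# Illustrative record; the segmentation shown is for demonstration only.
sample = Work(
    title="こころ",
    author="夏目漱石",
    year="1914",
    tokens=["私", "は", "その", "人", "を", "常に", "先生", "と", "呼んで", "いた"],
)

line = format_line(sample)
print(line)
print(parse_line(line) == sample)  # True
```

Keeping the words space-separated is the essential part, since a tool that splits on whitespace then sees one token per word. Note that this simple layout cannot hold metadata containing tab characters.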