
Japanese Text Analysis: Overview

A guide to tools, corpora, and other resources related to Japanese text analysis and natural language processing, with a focus on the digital humanities.


Help and Resources

Morphological Analysis


  • Voyant Tools now works with Japanese (including tokenizing your text)
  • Topic Modeling Tool also works out of the box with pre-tokenized Japanese
  •'s Text Tools works with pre-tokenized Japanese (uncheck tokenize by character in the options)



Word segmentation, part-of-speech tagging, and more:

Geographic services:

  • GeoNLP geographic name services and software from NII
  • GeoNames (English place names only) downloadable and web-searchable gazetteer



Modern usage:


NINJAL corpora:

Check out the Center for Corpus Development, NINJAL for links to many corpora and databases. Here are some notable corpora.


Subject Guide

Molly Des Jardin
Van Pelt 527

Japanese WordNet

A WordNet has been created for Japanese; it contains synsets (synonym sets) only. Visit the Japanese WordNet page to download the sqlite3 database, then query it from your own code through one of the APIs, which are available for a variety of programming languages, including a Python API for Japanese WordNet.
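Because the download is an ordinary sqlite3 database, it can also be queried directly with Python's built-in sqlite3 module. The following is a minimal sketch, not the official API. It assumes a simplified layout with a word table (wordid, lang, lemma) and a sense table (synset, wordid, lang); check the table and column names against the database you actually download. The build_toy_db helper and its rows are hypothetical stand-ins so that the example runs without the real file.

```python
import sqlite3


def build_toy_db():
    """Build a tiny in-memory stand-in database (hypothetical data, assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE word (wordid INTEGER PRIMARY KEY, lang TEXT, lemma TEXT);
        CREATE TABLE sense (synset TEXT, wordid INTEGER, lang TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO word (wordid, lang, lemma) VALUES (?, ?, ?)",
        [(1, "jpn", "犬"), (2, "jpn", "イヌ"), (3, "eng", "dog"), (4, "jpn", "猫")],
    )
    # Synset identifiers here are made up for illustration.
    conn.executemany(
        "INSERT INTO sense (synset, wordid, lang) VALUES (?, ?, ?)",
        [
            ("toy-0001-n", 1, "jpn"),
            ("toy-0001-n", 2, "jpn"),
            ("toy-0001-n", 3, "eng"),
            ("toy-0002-n", 4, "jpn"),
        ],
    )
    return conn


def synonyms(conn, lemma, lang="jpn"):
    """Return lemmas in `lang` that share at least one synset with `lemma`."""
    query = """
        SELECT DISTINCT w2.lemma
        FROM word AS w1
        JOIN sense AS s1 ON s1.wordid = w1.wordid
        JOIN sense AS s2 ON s2.synset = s1.synset
        JOIN word AS w2 ON w2.wordid = s2.wordid
        WHERE w1.lemma = ? AND w2.lang = ? AND w2.lemma != ?
        ORDER BY w2.lemma
    """
    return [row[0] for row in conn.execute(query, (lemma, lang, lemma))]


conn = build_toy_db()
print(synonyms(conn, "犬"))
print(synonyms(conn, "犬", lang="eng"))
```

To try this against the real data, replace build_toy_db() with sqlite3.connect() pointed at the path of the database file you downloaded, and adjust the query if its schema differs from the one assumed here.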

Sample Segmented Corpora

These are plain-text files (compressed in zip format) of a couple of NINJAL corpora. The text is segmented with spaces between words so that it can be used with text-analysis software designed for Western languages, such as MALLET, Topic Modeling Tool, and Voyant.
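Space-segmented text can be processed with the same whitespace splitting used for English. The sketch below counts word frequencies in a short sample; the sample sentence and its segmentation are illustrative only and are not taken from the NINJAL files.

```python
from collections import Counter


def word_frequencies(text):
    """Count tokens in text that is already segmented with whitespace."""
    # str.split() with no arguments splits on any run of whitespace,
    # so spaces and line breaks both act as word boundaries.
    return Counter(text.split())


# Illustrative pre-segmented sample (segmentation chosen for this example).
sample = "吾輩 は 猫 で ある 。 名前 は まだ 無い 。"
freqs = word_frequencies(sample)

for word, count in freqs.most_common(3):
    print(word, count)
```

For a real corpus file, read it as UTF-8 text first (for example with open(path, encoding="utf-8"), where path is whichever unzipped file you are using) and pass its contents to word_frequencies.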