The Stanford NLP Group makes some of our
Natural Language Processing software
available to everyone! We provide statistical NLP,
deep learning NLP, and rule-based NLP tools for
major computational linguistics problems, which can be
incorporated into applications with human language technology needs.
These packages are widely used in industry, academia, and government.
Supported software distributions
This code is actively being developed, and we try to answer questions and
fix bugs on a best-effort basis.
All our supported software distributions are written in Java.
Current versions of our software from October 2014 forward require Java 8+.
(Versions from March 2013 to September 2014 required Java 1.6+; versions from
2005 to February 2013 required Java 1.5+. The Stanford Parser was
first written in Java 1.1.)
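An application that bundles these tools may want to fail early on a JVM that is too old. The following is a minimal sketch, assuming only the requirement stated above (Java 8 or later for current versions). The class and method names are hypothetical and are not part of any Stanford distribution. It reads the standard `java.specification.version` system property, which reports `1.8` on Java 8 and a plain number such as `11` on later releases.

```java
class JavaVersionCheck {

    // Minimum major version for current releases, per the text above.
    static final int REQUIRED_MAJOR = 8;

    /**
     * Extracts the major version from a specification version string.
     * Legacy strings look like "1.6" or "1.8"; newer ones look like "9" or "17".
     */
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.trim().split("\\.");
        int first = Integer.parseInt(parts[0]);
        // In the legacy "1.x" scheme the major version is the second component.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    /** Returns true when the given specification version is Java 8 or later. */
    static boolean isSupported(String specVersion) {
        return majorVersion(specVersion) >= REQUIRED_MAJOR;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        if (!isSupported(spec)) {
            System.err.println("Java " + REQUIRED_MAJOR + "+ is required, found " + spec);
            System.exit(1);
        }
        System.out.println("Java version OK: " + spec);
    }
}
```

Older releases of the software would need a lower threshold (6 or 5, following the dates given above), so the constant is the only thing to change.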
Distribution packages include components for command-line
invocation, jar files, a Java API, and source code. You can
also find us
A number of helpful people have extended our work, with bindings or
translations for other languages. As a result, much of this
software can also easily be used from Python (or Jython), Ruby,
These software distributions are open source,
licensed under the
General Public License (v3 or later for Stanford CoreNLP; v2 or later for the other releases).
Note that this is the full GPL,
which allows many free uses, but
does not allow its incorporation (even in part or in translation) into any type of
which you distribute.
Commercial licensing is also available; please contact us if you are interested.
Bug fixes and code contributions are very welcome; see the
on our GitHub site.
- Stanford CoreNLP
An integrated suite of natural language processing tools for
English, Spanish, and (mainland) Chinese in
Java, including tokenization, part-of-speech tagging, named
entity recognition, parsing, and coreference. See also:
Deterministic Coreference Resolution, the
demo, and the CoreNLP FAQ.
- Stanford Parser
Implementations of probabilistic natural language
parsers in Java: PCFG and dependency parsers,
a lexicalized PCFG parser, a super-fast neural-network dependency parser,
and a deep learning reranker. See also:
Online parser demo, the
Stanford Dependencies page,
neural-network dependency parser documentation,
and Parser FAQ.
- Stanford Named Entity Recognizer
A Conditional Random Field sequence
model, together with well-engineered features for Named Entity
Recognition in English, Chinese, German, and Spanish.
Online NER demo.
- Stanford POS Tagger
A maximum-entropy (CMM) part-of-speech
(POS) tagger for English, Arabic, Chinese, French, German, and Spanish, in Java.
- Stanford Word Segmenter
A CRF-based word segmenter in Java. Supports Arabic and Chinese.
- Stanford Classifier
A machine learning classifier, with good feature templates for text
categorization. Provides a softmax
(a.k.a., maximum entropy or multiclass
logistic regression) classifier, Naive Bayes, and other options.
- Tregex, Tsurgeon, and Semgrex
Tools for matching patterns in linguistic trees (following
the tgrep/tgrep2 tradition), a GUI for this, and a tree-transformation utility built
on top of this matching language. Also, a similar utility
for matching patterns in dependency graphs.
A state-of-the-art phrase-based machine translation system.
- Stanford EnglishTokenizer
- A fast tokenizer for English text (producing Penn Treebank
- Stanford TokensRegex
- A tool for matching regular expressions over tokens.
- Stanford Temporal Tagger (SUTime)
- A rule-based temporal tagger for English text.
Online SUTime demo.
- Stanford Pattern-based Information Extraction and Diagnostics (SPIED)
- A bootstrapped pattern-based entity extraction system.
- Stanford Relation Extractor
- A tool for extracting relations between entities.
- Stanford Open Information Extraction
- A tool for extracting open domain relation triples; e.g., "cats play with yarn" yields
(cats; play with; yarn).
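The softmax (maximum entropy, or multiclass logistic regression) model named in the Stanford Classifier entry above turns one linear score per class into a probability distribution over classes. The following is a minimal sketch of that computation in plain Java. It is not the Stanford Classifier's API or implementation, and the class name, method names, and toy weights are hypothetical.

```java
import java.util.Arrays;

class SoftmaxSketch {

    /** Computes one linear score per class: score[c] = weights[c] . features. */
    static double[] scores(double[][] weights, double[] features) {
        double[] result = new double[weights.length];
        for (int c = 0; c < weights.length; c++) {
            double dot = 0.0;
            for (int j = 0; j < features.length; j++) {
                dot += weights[c][j] * features[j];
            }
            result[c] = dot;
        }
        return result;
    }

    /** Converts raw scores into probabilities that sum to 1. */
    static double[] softmax(double[] scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("scores must be non-empty");
        }
        // Subtracting the maximum leaves the result unchanged but avoids overflow in exp.
        double max = Arrays.stream(scores).max().getAsDouble();
        double[] probs = new double[scores.length];
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            probs[i] = Math.exp(scores[i] - max);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    /** Returns the index of the largest value (the predicted class). */
    static int argmax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy model: 3 classes, 2 features. The weights are made up for illustration.
        double[][] weights = {{1.0, -0.5}, {0.2, 0.8}, {-1.0, 0.3}};
        double[] features = {1.0, 2.0};
        double[] probs = softmax(scores(weights, features));
        System.out.println("probabilities = " + Arrays.toString(probs));
        System.out.println("predicted class = " + argmax(probs));
    }
}
```

Training such a model means choosing the weights to maximize the likelihood of labeled examples, which this sketch leaves out.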
Other open source software distributions
- GloVe: Global Vectors for Word Representations
Software in C for learning state-of-the-art distributed word representations.
We also distribute a number of sets of pre-trained word vectors.
- Topic Modeling Toolbox (TMT)
A suite of topic modeling tools for social scientists and others
who wish to perform analysis on datasets that have a substantial
textual component. Unfortunately, this software is no
longer developed or supported.
- Stanford Biomedical Event Parser (SBEP)
Biomedical Event Extraction for the BioNLP 2009/2011 shared task.
Binary software distributions
These systems are not available as source code, but only as
compiled Java byte-code and libraries.
- Entailment-based MT evaluation
Software to predict the adequacy of MT system output. The
scoring is based on assessing the quality of entailment
between the system output and the reference translation.
This is software that we once distributed but no longer maintain,
either because we are unable to or because maintaining it no longer
seems useful. It's still here in case it's useful, but we won't
answer questions about it.
- FrameNet Reader software
- Support files for reading FrameNet XML files (as they
existed in 2002-03 - FrameNet version 0.75/1.0) into Java
- Simple manual annotation tool
- A simple tool for annotating spans of text with classes
suitable for supervised training of named entity recognition and
information extraction models. Works on plain text and HTML
documents. Click to download
Have a support question? Please ask us on Stack Overflow
using the tag stanford-nlp.
Feedback, questions, licensing issues, and bug reports / fixes can also be sent to our
mailing lists (see immediately below).
We have 3 mailing lists for
all of which are shared
with other JavaNLP tools (with the exception of the parser). Each address is
java-nlp-user This is the best list for sending feature requests,
making announcements, or holding discussions among JavaNLP
users. (Please ask support questions on
Stack Overflow using the
You have to subscribe to be able to use this list.
Join the list via this webpage or by emailing
firstname.lastname@example.org. (Leave the
subject and message body empty.) You can also
the list archives.
java-nlp-announce This list will be used only to announce
new versions of Stanford JavaNLP tools, so it will be very low volume (expect 2-4
messages a year). Join the list via this webpage or by emailing
email@example.com. (Leave the
subject and message body empty.)
java-nlp-support This list goes only to the software
maintainers. It's a good address for licensing questions, etc. For
general use and support questions, you're better off using Stack
Overflow or joining and using
You cannot join
java-nlp-support, but you can mail questions to