The Stanford NLP Group makes parts of its Natural Language Processing software available to the public. These are statistical NLP toolkits for major computational linguistics problems, and they can be incorporated into applications that need human language technology.

All the software we distribute is written in Java. All recent distributions require Sun/Oracle JDK 1.5+. Distribution packages include components for command-line invocation, jar files, a Java API, and source code.

Supported software distributions

This code is under active development, and we try to answer questions and fix bugs on a best-effort basis.

All these software distributions are open source, licensed under the GNU General Public License (v2 or later). Note that this is the full GPL, which allows many free uses but does not allow the software to be incorporated into any type of distributed proprietary software, even in part or in translation. Commercial licensing is also available; please contact us if you are interested.

Stanford CoreNLP
An integrated suite of natural language processing tools for English in Java, including tokenization, part-of-speech tagging, named entity recognition, parsing, and coreference.
The Stanford Parser
Implementations of probabilistic natural language parsers in Java: highly optimized PCFG and dependency parsers, and a lexicalized PCFG parser. Includes: Online parser demo, Stanford Dependencies page, and Parser FAQ.
The Stanford POS Tagger
A maximum-entropy (CMM) part-of-speech (POS) tagger for English, Arabic, Chinese, and German, in Java.
The Stanford Named Entity Recognizer
A Conditional Random Field sequence model, together with well-engineered features for Named Entity Recognition in English and German.
Stanford Chinese Word Segmenter
A CRF-based Chinese Word Segmenter in Java.
The Stanford Classifier
A machine learning classifier, directed at text categorization. A conditional loglinear classifier (a.k.a. a maximum entropy or multiclass logistic regression model).
Tregex and Tsurgeon
A Tgrep2-style utility for matching patterns in trees, and a tree-transformation utility built on top of this matching language.
Topic Modeling Toolbox
A suite of topic modeling tools for social scientists and others who wish to perform analysis on datasets that have a substantial textual component.
Phrasal
A state-of-the-art phrase-based machine translation system.
Stanford Biomedical Event Parser (SBEP)
Biomedical Event Extraction for the BioNLP 2009/2011 shared task.
Stanford EnglishTokenizer
A fast tokenizer for English text, producing (roughly) Penn Treebank tokenization.
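
As a rough illustration of what "conditional loglinear classifier" means in the Stanford Classifier entry above, the sketch below scores each class as a weighted sum of features and normalizes the exponentiated scores into probabilities. It uses only the Java standard library and is not the Stanford Classifier API; the class name `LoglinearSketch`, its methods, and the toy weights are all hypothetical.

```java
import java.util.Arrays;

// Minimal sketch of a conditional loglinear (maximum entropy) model.
// Hypothetical illustration only; not part of any Stanford distribution.
class LoglinearSketch {

    // weights[c][f] is the weight of feature f for class c.
    // Returns P(c | features) for every class c.
    static double[] classProbabilities(double[][] weights, double[] features) {
        int numClasses = weights.length;
        double[] scores = new double[numClasses];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < numClasses; c++) {
            double s = 0.0;
            for (int f = 0; f < features.length; f++) {
                s += weights[c][f] * features[f];
            }
            scores[c] = s;
            max = Math.max(max, s);
        }
        // Subtract the maximum score before exponentiating for numerical stability.
        double z = 0.0;
        for (int c = 0; c < numClasses; c++) {
            scores[c] = Math.exp(scores[c] - max);
            z += scores[c];
        }
        for (int c = 0; c < numClasses; c++) {
            scores[c] /= z;
        }
        return scores;
    }

    // Returns the index of the most probable class.
    static int predict(double[][] weights, double[] features) {
        double[] probs = classProbabilities(weights, features);
        int best = 0;
        for (int c = 1; c < probs.length; c++) {
            if (probs[c] > probs[best]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy model: two classes, two features (made-up weights).
        double[][] weights = {{1.0, 0.0}, {0.0, 1.0}};
        double[] features = {2.0, 0.5};
        System.out.println(Arrays.toString(classProbabilities(weights, features)));
        System.out.println("predicted class: " + predict(weights, features));
    }
}
```

In a text categorization setting the features would typically be word or n-gram indicators and the weights would be learned from labeled data; neither step is shown here.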

Binary software distributions

These systems are not available as source code, but only as compiled Java byte-code and libraries.

Entailment-based MT evaluation software
Software to predict the adequacy of MT system output. The scoring is based on assessing the quality of entailment between the system output and the reference translation.

End-of-life distributions

This is software that we once distributed but no longer maintain, either because we are unable to or because it is no longer useful to do so. It's still here in case it's useful, but we won't answer questions about it.

FrameNet Reader software
Support files for reading FrameNet XML files (as they existed in 2002-03; FrameNet version 0.75/1.0) into Java data structures.
Simple manual annotation tool
A simple tool for annotating spans of text with classes suitable for supervised training of named entity recognition and information extraction models. Works on plain text and HTML documents. Click to download stanford-manual-annotation-tool-2004-05-16.tar.gz.