The Stanford NLP Group

The Stanford NLP Group makes some of our Natural Language Processing software available to everyone! We provide statistical NLP, deep learning NLP, and rule-based NLP tools for major computational linguistics problems, which can be incorporated into applications with human language technology needs. These packages are widely used in industry, academia, and government.

This code is actively being developed, and we try to answer questions and fix bugs on a best-effort basis.

All our supported software distributions are written in Java. Current versions of our software from October 2014 forward require Java 8+. (Versions from March 2013 to September 2014 required Java 1.6+; versions from 2005 to February 2013 required Java 1.5+. The Stanford Parser was first written in Java 1.1.) Distribution packages include components for command-line invocation, jar files, a Java API, and source code. You can also find us on GitHub and Maven. A number of helpful people have extended our work, with bindings or translations for other languages. As a result, much of this software can also easily be used from Python (or Jython), Ruby, Perl, JavaScript, F#, and other .NET and JVM languages.

These software distributions are open source, licensed under the GNU General Public License (v3 or later for Stanford CoreNLP; v2 or later for the other releases). Note that this is the full GPL, which allows many free uses, but does not allow the software to be incorporated (even in part or in translation) into any type of proprietary software which you distribute. Commercial licensing is also available; please contact us if you are interested. Bug fixes and code contributions are very welcome; see the contributing page on our GitHub site.

Questions

Have a support question? Please ask us on Stack Overflow using the tag stanford-nlp.

Feedback, questions, licensing issues, and bug reports / fixes can also be sent to our mailing lists (see immediately below).

Mailing Lists

We have three mailing lists for this tool, all of which are shared with other JavaNLP tools (with the exception of the parser). Each address is at @lists.stanford.edu:

java-nlp-user: This is the best list to post to in order to send feature requests, make announcements, or for discussion among JavaNLP users. (Please ask support questions on Stack Overflow using the stanford-nlp tag.) You have to subscribe to be able to use this list. Join the list via this webpage or by emailing java-nlp-user-join@lists.stanford.edu. (Leave the subject and message body empty.) You can also look at the list archives.

java-nlp-announce: This list will be used only to announce new versions of Stanford JavaNLP tools, so it will be very low volume (expect 2-4 messages a year). Join the list via this webpage or by emailing java-nlp-announce-join@lists.stanford.edu. (Leave the subject and message body empty.)

java-nlp-support: This list goes only to the software maintainers. It's a good address for licensing questions, etc. For general use and support questions, you're better off using Stack Overflow or joining and using java-nlp-user. You cannot join java-nlp-support, but you can mail questions to java-nlp-support@lists.stanford.edu.

Stanford CoreNLP [backup download page]

An integrated suite of natural language processing tools for English, Spanish, and (mainland) Chinese in Java, including tokenization, part-of-speech tagging, named entity recognition, parsing, and coreference. See also: Stanford Deterministic Coreference Resolution, the online CoreNLP demo, and the CoreNLP FAQ.

Stanza

A Python natural language analysis package that provides implementations of fast neural network models for tokenization, multi-word token expansion, part-of-speech and morphological features tagging, lemmatization, and dependency parsing using the Universal Dependencies formalism. Pretrained models are provided for more than 70 human languages. In addition, it is able to call the CoreNLP Java package and inherits additional functionality from there, such as constituency parsing, coreference resolution, and linguistic pattern matching.
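
As a rough sketch of what calling Stanza from Python might look like, the snippet below builds a pipeline and collects one tuple per analyzed word. It assumes the stanza package is installed and that the models for the chosen language can be downloaded. The helper names word_rows and analyze are illustrative and are not part of Stanza.

```python
def word_rows(doc):
    """Collect (text, lemma, upos, head, deprel) for every word in a parsed document."""
    rows = []
    for sentence in doc.sentences:
        for word in sentence.words:
            rows.append((word.text, word.lemma, word.upos, word.head, word.deprel))
    return rows


def analyze(text, lang="en"):
    """Run a Stanza pipeline over `text` (needs `pip install stanza` and network access)."""
    import stanza  # imported lazily so word_rows works without Stanza installed

    stanza.download(lang)        # fetches the pretrained models on first use
    nlp = stanza.Pipeline(lang)  # default processors for the chosen language
    return word_rows(nlp(text))
```

Calling analyze("Cats play with yarn.") should return one tuple per word; the exact tags and heads depend on the model version that is downloaded.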

Stanford Parser

Implementations of probabilistic natural language parsers in Java: PCFG and dependency parsers, a lexicalized PCFG parser, a super-fast neural-network dependency parser, and a deep learning reranker. See also: Online parser demo, the Stanford Dependencies page, neural-network dependency parser documentation, and Parser FAQ.

Stanford POS Tagger

A maximum-entropy (CMM) part-of-speech (POS) tagger for English, Arabic, Chinese, French, German, and Spanish, in Java.

Stanford Named Entity Recognizer

A Conditional Random Field sequence model, together with well-engineered features for Named Entity Recognition in English, Chinese, German, and Spanish. Online NER demo.

Stanford RegexNER

Deterministically tag NER sequences with regular expressions.
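
To illustrate the general idea of deterministic, regex-driven tagging, here is a toy sketch in Python. It is not RegexNER's implementation or its rule file format: the rules, labels, and the function name tag_tokens are made up for this example. Each rule is a sequence of per-token regular expressions plus a label, and the first rule that matches at a position wins.

```python
import re

# Hypothetical rules: (list of per-token regexes, label).
RULES = [
    (["Bachelor", "of", "(Arts|Science|Laws)"], "DEGREE"),
    (["Stanford", "University"], "ORGANIZATION"),
]


def tag_tokens(tokens, rules=RULES, default="O"):
    """Label each token; the first rule that matches at a position wins."""
    compiled = [([re.compile(p) for p in pats], label) for pats, label in rules]
    labels = [default] * len(tokens)
    i = 0
    while i < len(tokens):
        for pats, label in compiled:
            window = tokens[i:i + len(pats)]
            if len(window) == len(pats) and all(
                p.fullmatch(t) for p, t in zip(pats, window)
            ):
                labels[i:i + len(pats)] = [label] * len(pats)
                i += len(pats)  # skip past the tagged span
                break
        else:
            i += 1  # no rule matched here; move to the next token
    return labels
```

Because the rules are applied in a fixed order with no learned weights, the same input always produces the same labels.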

Stanford Coreference Resolution

Rule-based, statistical, and neural models for nominal coreference resolution in Java.

Stanford Word Segmenter

A CRF-based word segmenter in Java. Supports Arabic and Chinese.

Stanford Classifier

A machine learning classifier, with good feature templates for text categorization. Provides a softmax (a.k.a., maximum entropy or multiclass logistic regression) classifier, Naive Bayes, and other options.
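
For readers unfamiliar with the term, the following minimal Python sketch shows how a softmax classifier turns per-class scores into probabilities. It is an illustration of the technique only, not the Stanford Classifier's code: the function names and the shape of the weights dictionary are assumptions made for this example.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights):
    """Score each class as the sum of its weights for the active features."""
    labels = sorted(weights)
    scores = [sum(weights[label].get(f, 0.0) for f in features) for label in labels]
    return dict(zip(labels, softmax(scores)))
```

With hypothetical weights such as {"sports": {"goal": 2.0}, "politics": {"vote": 2.0}}, a document containing the feature "goal" receives a higher probability for "sports" than for "politics".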

Stanford EnglishTokenizer

A fast tokenizer for English text (producing Penn Treebank tokenization, roughly).

Tregex, Tsurgeon, and Semgrex

Tools for matching patterns in linguistic trees (following the tgrep/tgrep2 tradition), a GUI for this, and a tree-transformation utility built on top of this matching language. Also, a similar utility for matching patterns in dependency graphs.

Stanford TokensRegex

A tool for matching regular expressions over tokens.

Stanford Temporal Tagger (SUTime)

A rule-based temporal tagger for English text. Online SUTime demo.

Stanford Pattern-based Information Extraction and Diagnostics (SPIED)

A bootstrapped pattern-based entity extraction system.

Stanford Relation Extractor

A tool for extracting relations between entities.

Stanford Neural Machine Translation

The latest research on neural machine translation (NMT) from the Stanford NLP Group. We release our codebase, which produces state-of-the-art results on various translation tasks such as English-German and English-Czech. In addition, to encourage reproducibility and increase transparency, we release our preprocessed data and pretrained models as well.

Stanford Natural Language Inference Corpus (SNLI)

The SNLI corpus is a collection of 570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral, supporting the task of natural language inference (NLI), also known as recognizing textual entailment (RTE).
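
A small Python sketch of reading labeled sentence pairs follows. It assumes a JSON Lines layout in which each record has the fields sentence1, sentence2, and gold_label, and in which a gold_label of "-" marks pairs without annotator consensus; check these assumptions against the files you download. The sample records are invented for illustration and are not taken from the corpus.

```python
import json
from collections import Counter

# Invented records in the assumed JSON Lines layout (not taken from the corpus).
SAMPLE = """\
{"sentence1": "A dog runs in a park.", "sentence2": "An animal is outside.", "gold_label": "entailment"}
{"sentence1": "A dog runs in a park.", "sentence2": "A cat sleeps indoors.", "gold_label": "contradiction"}
{"sentence1": "A dog runs in a park.", "sentence2": "The dog chases a ball.", "gold_label": "neutral"}
{"sentence1": "A man is cooking.", "sentence2": "A man is busy.", "gold_label": "-"}
"""


def read_pairs(lines):
    """Yield (premise, hypothesis, label), skipping pairs without a gold label."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record["gold_label"] == "-":  # assumed marker for "no consensus"
            continue
        yield record["sentence1"], record["sentence2"], record["gold_label"]


def label_counts(lines):
    """Count how many pairs carry each of the three labels."""
    return Counter(label for _, _, label in read_pairs(lines))
```

In practice, the lines argument would be an open file handle for one of the corpus splits rather than the in-memory sample.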

Semantic Parsing with Execution (SEMPRE)

SEMPRE is a toolkit for training semantic parsers, which map natural language utterances to denotations (answers) via intermediate logical forms.

Stanford Open Information Extraction

A tool for extracting open domain relation triples; e.g., "cats play with yarn" yields (cats; play with; yarn).

GloVe: Global Vectors for Word Representations

Software in C for learning state-of-the-art distributed word representations. We also distribute a number of sets of pre-trained word vectors.
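
Once word vectors are available, they can be compared from other languages. The Python sketch below assumes the vectors are stored as plain text, with one word per line followed by its space-separated components. The three-dimensional vectors and the helper names load_vectors and cosine are made up for this example.

```python
import io
import math

# Made-up 3-dimensional vectors in the assumed "word v1 v2 ..." text layout.
SAMPLE = """\
cat 0.9 0.1 0.0
dog 0.8 0.2 0.1
car 0.0 0.1 0.9
"""


def load_vectors(handle):
    """Read one word and its vector components per line into a dict."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


vectors = load_vectors(io.StringIO(SAMPLE))
```

With real vectors, the io.StringIO sample would be replaced by an open file, and words used in similar contexts would be expected to have higher cosine similarity.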

Deep Learning for Sentiment Analysis

This page provides a live demo of fine-grained sentiment analysis using recursive neural networks on the Stanford Sentiment Treebank.

Phrasal

A state-of-the-art phrase-based machine translation system.

Topic Modeling Toolbox (TMT)

A suite of topic modeling tools for social scientists and others who wish to perform analysis on datasets that have a substantial textual component. Unfortunately, this software is no longer developed or supported.

Stanford Biomedical Event Parser (SBEP)

Biomedical Event Extraction for the BioNLP 2009/2011 shared task.

Entailment-based MT Evaluation Software

Software to predict the adequacy of MT system output. The scoring is based on assessing the quality of entailment between the system output and the reference translation.

Simple manual annotation tool

A simple tool for annotating spans of text with classes suitable for supervised training of named entity recognition and information extraction models. Works on plain text and HTML documents. Click to download stanford-manual-annotation-tool-2004-05-16.tar.gz.