The Stanford NLP Group makes parts of our
Natural Language Processing software
available to the public. These are statistical NLP toolkits for
various major computational linguistics problems. They can be
incorporated into applications with human language technology needs.
All the software we distribute is written in Java. All recent
distributions require Sun/Oracle JDK 1.5+.
Distribution packages include components for command-line
invocation, jar files, a Java API, and source code.
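As a rough illustration of the JDK 1.5+ requirement above, the following sketch checks a Java specification version string against that minimum. The class and method names are hypothetical and are not part of any Stanford distribution; the sketch assumes the version string has the form reported by the `java.specification.version` system property ("1.5" through "1.8" on older runtimes, "9", "11", and so on for newer ones).

```java
/** Hypothetical helper, not part of any Stanford distribution. */
class JavaVersionCheck {

    /** Returns true if the given specification version is 1.5 or newer. */
    static boolean meetsMinimum(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        // Legacy scheme: "1.5" .. "1.8"; newer runtimes report "9", "11", ...
        return major > 1 || (major == 1 && minor >= 5);
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.specification.version");
        System.out.println("Java " + version + " meets the 1.5+ requirement: "
                + meetsMinimum(version));
    }
}
```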
Supported software distributions
This code is under active development, and we try to answer questions and
fix bugs on a best-effort basis.
All these software distributions are open source,
licensed under the
GNU
General Public License (v2 or later).
Note that this is the full GPL,
which allows many free uses, but
does not allow the software to be incorporated into any type of distributed
proprietary software,
even in part or in translation.
Commercial licensing is also available; please contact us if you are interested.
- Stanford CoreNLP
-
An integrated suite of natural language processing tools for English in
Java, including tokenization, part-of-speech tagging, named
entity recognition, parsing, and coreference.
Online CoreNLP demo
- Stanford Parser
-
Implementations of probabilistic natural language
parsers, both highly optimized PCFG and dependency parsers,
and a lexicalized PCFG parser in Java. Includes:
Online parser demo,
Stanford Dependencies page,
and Parser FAQ.
- Stanford POS Tagger
-
A maximum-entropy (CMM) part-of-speech
(POS) tagger for English, Arabic, Chinese, French, and German, in Java.
- Stanford Named Entity Recognizer
-
A Conditional Random Field sequence
model, together with well-engineered features for Named Entity
Recognition in English and German.
Online NER demo
- Stanford Word Segmenter
-
A CRF-based word segmenter in Java. Supports Arabic and Chinese.
- Stanford Classifier
-
A machine learning classifier, directed at text
categorization. A conditional loglinear classifier
(a.k.a. a maximum entropy or multiclass
logistic regression model).
- Tregex and
Tsurgeon
-
A Tgrep2-style utility for matching
patterns in trees, and a tree-transformation utility built
on top of this matching language.
- Phrasal
-
A state-of-the-art phrase-based machine translation system.
- Stanford Biomedical Event Parser (SBEP)
-
Biomedical Event Extraction for the BioNLP 2009/2011 shared task.
- Stanford EnglishTokenizer
- A fast tokenizer for English text (producing Penn Treebank
tokenization, roughly).
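To make the phrase "conditional loglinear classifier" in the Stanford Classifier entry above more concrete, here is a minimal sketch of the scoring step of a multiclass logistic regression (maximum entropy) model: each class gets a linear score over the features, and a softmax turns the scores into class probabilities. This is not the Stanford Classifier API. The class name, method names, weights, and feature values are all hypothetical and chosen only for illustration, and training is omitted entirely.

```java
import java.util.Arrays;

/** Illustrative sketch only; hypothetical names, not the Stanford Classifier API. */
class MaxEntSketch {

    /** Returns P(class | features) for each class: softmax over linear scores. */
    static double[] classProbabilities(double[][] weights, double[] features) {
        double[] scores = new double[weights.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < weights.length; c++) {
            double s = 0.0;
            for (int j = 0; j < features.length; j++) {
                s += weights[c][j] * features[j];   // linear score w_c . f(x)
            }
            scores[c] = s;
            max = Math.max(max, s);
        }
        double z = 0.0;
        for (int c = 0; c < scores.length; c++) {
            scores[c] = Math.exp(scores[c] - max);  // subtract max for numerical stability
            z += scores[c];
        }
        for (int c = 0; c < scores.length; c++) {
            scores[c] /= z;                         // normalize so the values sum to 1
        }
        return scores;
    }

    /** Returns the index of the most probable class. */
    static int predict(double[][] weights, double[] features) {
        double[] p = classProbabilities(weights, features);
        int best = 0;
        for (int c = 1; c < p.length; c++) {
            if (p[c] > p[best]) best = c;
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up weights: three classes, two features.
        double[][] weights = { {1.0, -1.0}, {-1.0, 1.0}, {0.0, 0.0} };
        double[] features = {2.0, 0.0};
        System.out.println(Arrays.toString(classProbabilities(weights, features)));
        System.out.println("predicted class index: " + predict(weights, features));
    }
}
```

In a real text categorization setting the feature vector would typically be sparse (for example, word or n-gram indicators), and the weights would be learned from labeled data rather than written by hand.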
Other open source software distributions
- Topic Modeling Toolbox
-
A suite of topic modeling tools for social scientists and others
who wish to perform analysis on datasets that have a substantial
textual component.
Binary software distributions
These systems are not available as source code, but only as
compiled Java byte-code and libraries.
- Entailment-based MT evaluation
software
-
Software to predict the adequacy of MT system output. The
scoring is based on assessing the quality of entailment
between the system output and the reference translation.
End-of-life distributions
This is software that we distributed at one point, but that we
are either unable to maintain or no longer consider useful to maintain.
It's still here in case it's useful, but we won't
answer questions about it.
- FrameNet Reader software
- Support files for reading FrameNet XML files (as they
existed in 2002-03 - FrameNet version 0.75/1.0) into Java
data structures.
- Simple manual annotation tool
- A simple tool for annotating spans of text with classes
suitable for supervised training of named entity recognition and
information extraction models. Works on plain text and HTML
documents. Click to download
stanford-manual-annotation-tool-2004-05-16.tar.gz.