Stanford NER (also known as CRFClassifier) is a Java implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names. The software provides a general (arbitrary order) implementation of linear chain Conditional Random Field (CRF) sequence models, coupled with well-engineered feature extractors for Named Entity Recognition. (CRF models were pioneered by Lafferty, McCallum, and Pereira (2001); see Sutton and McCallum (2006) for a better introduction.) Included with the download are good 3 class (PERSON, ORGANIZATION, LOCATION) named entity recognizers for English (in versions with and without additional distributional similarity features) and another pair of models trained on the CoNLL 2003 English training data. The distributional similarity features improve performance but the models require considerably more memory.
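For orientation, the first-order case of a linear chain CRF is usually written as below. This is the conventional textbook notation (as in the Sutton and McCallum introduction), not something taken from the package documentation: the symbols f_k (feature functions), \lambda_k (learned weights), and the start label y_0 are assumptions of this sketch, and the "arbitrary order" implementation described above generalizes it to longer label histories.

```latex
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'}
    \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Big)
```

Here \mathbf{x} is the word sequence, \mathbf{y} is the label sequence (for example PERSON, ORGANIZATION, LOCATION, or a background label), and Z(\mathbf{x}) normalizes over all label sequences of the same length.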
The CRF code is by Jenny Finkel. The feature extractors are by Dan Klein, Christopher Manning, and Jenny Finkel. Much of the documentation and usability is due to Anna Rafferty. The CRF sequence models provided here do not precisely correspond to any published paper, but the correct paper to cite for the software is:
Jenny Rose Finkel, Trond Grenager, and Christopher
Manning. 2005. Incorporating Non-local Information into Information
Extraction Systems by Gibbs Sampling. Proceedings of the 43rd Annual
Meeting of the Association for Computational Linguistics (ACL 2005),
pp. 363-370.
http://nlp.stanford.edu/~manning/papers/gibbscrf3.pdf
The software provided here is similar to the baseline local+Viterbi
model in that paper, but adds new
distributional similarity based features (in the -distSim
classifiers). The big models were trained on a mixture of CoNLL, MUC-6, MUC-7
and ACE named entity corpora, and as a result the models are fairly robust
across domains.
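As a rough illustration of the Viterbi decoding step in a "local+Viterbi" model, here is a minimal first-order sketch in Java. It is not the package's code: the class name `ViterbiSketch`, the label order, and every score in `EMIT` and `TRANS` are invented for the example. A real model would compute those scores from learned feature weights.

```java
/** Toy first-order Viterbi decoder. All names and scores here are hypothetical. */
final class ViterbiSketch {

    static final String[] LABELS = {"O", "PERSON", "LOCATION"};
    static final String[] WORDS = {"Jenny", "Finkel", "visited", "Stanford"};

    // EMIT[t][y]: made-up local score of label y at position t.
    static final double[][] EMIT = {
        {0.0, 2.0, 0.5},  // Jenny
        {0.5, 1.0, 1.0},  // Finkel: PERSON and LOCATION tie locally
        {3.0, 0.0, 0.0},  // visited
        {0.0, 0.5, 2.0},  // Stanford
    };

    // TRANS[p][y]: made-up score of moving from label p to label y.
    static final double[][] TRANS = {
        {0.5, 0.0, 0.0},   // from O
        {0.0, 1.0, -1.0},  // from PERSON
        {0.0, -1.0, 1.0},  // from LOCATION
    };

    /** Returns the label indices of the highest-scoring sequence. */
    static int[] decode(double[][] emit, double[][] trans) {
        int n = emit.length;
        if (n == 0) {
            return new int[0];
        }
        int k = trans.length;
        double[][] best = new double[n][k]; // best[t][y]: best score of a path ending in y at t
        int[][] back = new int[n][k];       // back[t][y]: previous label on that best path
        best[0] = emit[0].clone();
        for (int t = 1; t < n; t++) {
            for (int y = 0; y < k; y++) {
                int arg = 0;
                double max = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double s = best[t - 1][p] + trans[p][y];
                    if (s > max) {
                        max = s;
                        arg = p;
                    }
                }
                best[t][y] = max + emit[t][y];
                back[t][y] = arg;
            }
        }
        // Pick the best final label, then follow the back-pointers.
        int last = 0;
        for (int y = 1; y < k; y++) {
            if (best[n - 1][y] > best[n - 1][last]) {
                last = y;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }

    public static void main(String[] args) {
        int[] path = decode(EMIT, TRANS);
        StringBuilder sb = new StringBuilder();
        for (int t = 0; t < WORDS.length; t++) {
            if (t > 0) {
                sb.append(' ');
            }
            sb.append(WORDS[t]).append('/').append(LABELS[path[t]]);
        }
        System.out.println(sb); // Jenny/PERSON Finkel/PERSON visited/O Stanford/LOCATION
    }
}
```

In this toy input the second word scores equally as PERSON or LOCATION on local evidence alone; the transition scores are what make the decoder prefer PERSON after PERSON. That is the kind of sequence-level decision a purely per-word classifier cannot make.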
You can look at a Powerpoint Introduction to NER and the Stanford NER package [ppt] [pdf] or the FAQ, which has some information on training models. Further documentation is provided in the included README and in the javadocs.
Stanford NER is available for download, licensed under the GNU General Public License (v2 or later). Source is included. The package includes components for command-line invocation, running as a server, and a Java API. Stanford NER code is dual licensed (in a similar manner to MySQL, etc.). Open source licensing is under the full GPL, which allows many free uses. For distributors of proprietary software, commercial licensing with a ready-to-sign agreement is available. If you don't need a commercial license, but would like to support maintenance of these tools, we welcome gift funding.
There is also a list of Frequently Asked Questions (with answers!). Additional questions, feedback, and bug reports/fixes can be sent to our mailing lists.
We have 3 mailing lists for the Stanford Named Entity Recognizer, all of which are shared
with other JavaNLP tools (with the exclusion of the parser). Before writing, please check to see if your question has been answered in the FAQ. Each address is
at @lists.stanford.edu:
java-nlp-user: This is the best list to post to in order to ask questions, make announcements, or for discussion among JavaNLP users. You have to subscribe to be able to use it. Join the list via this webpage or by emailing java-nlp-user-join@lists.stanford.edu. (Leave the subject and message body empty.) You can also look at the list archives.
java-nlp-announce: This list will be used only to announce new versions of Stanford JavaNLP tools. So it will be very low volume (expect 1-3 messages a year). Join the list via this webpage or by emailing java-nlp-announce-join@lists.stanford.edu. (Leave the subject and message body empty.)
java-nlp-support: This list goes only to the software maintainers. It's a good address for licensing questions, etc. For general use and support questions, you're better off joining and using java-nlp-user. You cannot join java-nlp-support, but you can mail questions to java-nlp-support@lists.stanford.edu.
The download is a 66M zipped file (mainly consisting of classifier data objects). If you unpack that file, you should have everything needed. It includes batch files for running under Windows or Unix/Linux/MacOSX, a simple GUI, and the ability to run as a server. Stanford NER requires Java v1.6+.
| Model | Entity classes |
|---|---|
| 3 class | Location, Person, Organization |
| 4 class | Location, Person, Organization, Misc |
| 7 class | Time, Location, Organization, Person, Money, Percent, Date |
edu/stanford/nlp/models/.... You can run
jar -t to get the list of files in the jar file.
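The same listing can also be produced from Java with the standard `java.util.jar` API. The sketch below is only an illustration: the class name `JarListing` and the jar path passed on the command line are hypothetical, and it assumes Java 8 or later for the stream API (newer than the minimum version the package itself requires).

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

/** Lists the files in a jar whose names start with a given prefix. */
final class JarListing {

    static List<String> entriesUnder(String jarPath, String prefix) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                    .map(JarEntry::getName)
                    // Keep files under the prefix and skip directory entries.
                    .filter(name -> name.startsWith(prefix) && !name.endsWith("/"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java JarListing <path-to-jar> [prefix]");
            return;
        }
        // Default prefix is the models directory named in the text above.
        String prefix = args.length > 1 ? args[1] : "edu/stanford/nlp/models/";
        for (String name : entriesUnder(args[0], prefix)) {
            System.out.println(name);
        }
    }
}
```

Filtering by prefix is the only thing this adds over plain `jar -t`; it keeps the output to the model files rather than every class in the archive.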
| Version | Date | Description |
|---|---|---|
| 1.2.8 | | -nthreads option |
| 1.2.7 | | Add Chinese model, include Wikipedia data in 3-class English model |
| 1.2.6 | | Minor bug fixes |
| 1.2.5 | | Fix encoding issue |
| 1.2.4 | | Caseless versions of models supported |
| 1.2.3 | | Minor bug fixes |
| 1.2.2 | | Improved thread safety |
| 1.2.1 | | Models reduced in size but on average improved in accuracy (improved distsim clusters) |
| 1.2 | | Normal download includes 3, 4, and 7 class models. Updated for compatibility with other software releases. |
| 1.1.1 | | Minor bug and usability fixes, and changed API (in particular the methods to classify and output tagged text) |
| 1.1 | | Additional feature flags, various code updates |
| 1.0 | | Initial release |
Site design by Bill MacCartney