Stanford Named Entity Recognizer (NER)

About

CRFClassifier is a Java implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text that are the names of things, such as person and company names, or gene and protein names. The software provides a general (arbitrary order) implementation of linear-chain Conditional Random Field (CRF) sequence models, of the sort pioneered by Lafferty, McCallum, and Pereira (2001), coupled with well-engineered feature extractors for Named Entity Recognition. Included are a good 3-class (PERSON, ORGANIZATION, LOCATION) named entity recognizer for English (in versions with and without additional distributional similarity features) and another pair of models trained on the CoNLL 2003 English training data. The distributional similarity features improve performance, but the models that use them require considerably more memory.
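
To make the idea of labeling sequences of words concrete, here is a minimal sketch of how per-token labels could be grouped into named entities. It does not use the Stanford NER API: the class `EntitySpans`, its `group` method, and the use of "O" as the label for non-entity tokens are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups per-token NER labels into entity spans. */
final class EntitySpans {

    /** A run of consecutive tokens that share one entity label. */
    static final class Span {
        final String label;
        final String text;

        Span(String label, String text) {
            this.label = label;
            this.text = text;
        }

        @Override
        public String toString() {
            return label + ": " + text;
        }
    }

    /** Assumed background label for tokens that are not part of any entity. */
    static final String OUTSIDE = "O";

    /** Merges adjacent tokens carrying the same non-background label. */
    static List<Span> group(String[] tokens, String[] labels) {
        if (tokens.length != labels.length) {
            throw new IllegalArgumentException("tokens and labels differ in length");
        }
        List<Span> spans = new ArrayList<Span>();
        StringBuilder current = new StringBuilder();
        String currentLabel = OUTSIDE;
        for (int i = 0; i < tokens.length; i++) {
            if (!labels[i].equals(currentLabel)) {
                // The label changed: close the open entity, if there is one.
                if (!currentLabel.equals(OUTSIDE)) {
                    spans.add(new Span(currentLabel, current.toString()));
                }
                current.setLength(0);
                currentLabel = labels[i];
            }
            if (!labels[i].equals(OUTSIDE)) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(tokens[i]);
            }
        }
        // Close an entity that runs to the end of the text.
        if (!currentLabel.equals(OUTSIDE)) {
            spans.add(new Span(currentLabel, current.toString()));
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] tokens = {"Jenny", "Finkel", "works", "at", "Stanford", "University", "in", "California"};
        String[] labels = {"PERSON", "PERSON", "O", "O", "ORGANIZATION", "ORGANIZATION", "O", "LOCATION"};
        for (Span span : group(tokens, labels)) {
            System.out.println(span);
        }
    }
}
```

With the hand-written labels in `main`, the sketch yields three spans: a PERSON, an ORGANIZATION, and a LOCATION. Note that this simple grouping cannot separate two adjacent entities that share a label.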

The CRF code is by Jenny Finkel. The feature extractors are by Dan Klein, Christopher Manning, and Jenny Finkel. Much of the documentation and usability is due to Anna Rafferty. The CRF sequence models provided here do not precisely correspond to any published paper, but the correct paper to cite for the software is:

Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp. 363-370. http://nlp.stanford.edu/~manning/papers/gibbscrf3.pdf

The software provided here is similar to the baseline local+Viterbi model in that paper, but adds new distributional similarity based features (in the -distSim classifiers). The models were all trained on the union of the CoNLL, MUC-6, MUC-7 and ACE named entity corpora, and as a result the models are fairly robust across domains.
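
For readers unfamiliar with Viterbi decoding over a linear chain, the following is a minimal first-order sketch with made-up scores. It is not the Stanford NER implementation, which supports arbitrary order and computes its scores from learned feature weights. The class name `ViterbiSketch`, the score arrays, and the label indices (0 = O, 1 = PERSON) are assumptions for illustration.

```java
import java.util.Arrays;

/** Hypothetical first-order Viterbi decoder over additive (log-space) scores. */
final class ViterbiSketch {

    /**
     * @param emit  emit[t][y] is the local score of label y at position t
     * @param trans trans[p][y] is the score of moving from label p to label y
     * @return the highest-scoring label sequence, as label indices
     */
    static int[] decode(double[][] emit, double[][] trans) {
        int n = emit.length;
        if (n == 0) {
            return new int[0];
        }
        int k = emit[0].length;
        double[][] best = new double[n][k]; // best[t][y]: best score of a path ending in y at t
        int[][] back = new int[n][k];       // back[t][y]: previous label on that best path
        System.arraycopy(emit[0], 0, best[0], 0, k);

        for (int t = 1; t < n; t++) {
            for (int y = 0; y < k; y++) {
                int argmax = 0;
                double max = best[t - 1][0] + trans[0][y];
                for (int p = 1; p < k; p++) {
                    double score = best[t - 1][p] + trans[p][y];
                    if (score > max) {
                        max = score;
                        argmax = p;
                    }
                }
                best[t][y] = max + emit[t][y];
                back[t][y] = argmax;
            }
        }

        // Pick the best final label, then follow the back-pointers.
        int last = 0;
        for (int y = 1; y < k; y++) {
            if (best[n - 1][y] > best[n - 1][last]) {
                last = y;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }

    public static void main(String[] args) {
        // Toy scores for three tokens and two labels (0 = O, 1 = PERSON).
        double[][] emit = {{0.0, 2.0}, {1.0, 0.8}, {3.0, 0.0}};
        double[][] trans = {{0.5, 0.0}, {0.0, 1.0}};
        System.out.println(Arrays.toString(decode(emit, trans))); // prints [1, 1, 0]
    }
}
```

In this toy input the locally best label for the second token is O (1.0 versus 0.8), but the PERSON-to-PERSON transition score makes PERSON, PERSON, O the best sequence overall. That difference between per-token choices and whole-sequence decoding is what the dynamic program captures.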

You can look at a Powerpoint Introduction to NER and the Stanford NER package [ppt] [pdf] or the FAQ, which has some information on training models. Further documentation is provided in the included README and in the javadocs.

CRFClassifier requires Java 1.5. The software is licensed under the GNU GPL v2 or later. Note that this is the full GPL, which allows its use for research purposes, free software projects, etc., but does not allow its incorporation into any type of distributed proprietary software, even in part or in translation. Commercial licensing of Stanford NER is also available. Source is included. The package includes components for command-line invocation and a Java API.

Questions

There is also a list of Frequently Asked Questions (with answers!). Additional questions, feedback, and bug reports/fixes can be sent to our mailing lists.


Mailing Lists

We have three mailing lists for the Stanford Named Entity Recognizer, all of which are shared with the other JavaNLP tools (with the exception of the parser). Each address is at @lists.stanford.edu:

  1. java-nlp-user: This is the best list to post to in order to ask questions, make announcements, or hold discussions among JavaNLP users. You have to subscribe to be able to use it. Join the list via this webpage or by emailing java-nlp-user-join@lists.stanford.edu. (Leave the subject and message body empty.) You can also look at the list archives.
  2. java-nlp-announce: This list is used only to announce new versions of Stanford JavaNLP tools, so it is very low volume (expect 1-3 messages a year). Join the list via this webpage or by emailing java-nlp-announce-join@lists.stanford.edu. (Leave the subject and message body empty.)
  3. java-nlp-support: This list goes only to the software maintainers. It's a good address for licensing questions, etc. For general use and support questions, you're better off joining and using java-nlp-user. You cannot join java-nlp-support, but you can mail questions to java-nlp-support@lists.stanford.edu.

Download

Download Stanford Named Entity Recognizer version 1.1.1
Download Stanford Named Entity Recognizer version 1.1 (compatible with Stanford POS Tagger 1.6)

The download is a 54,477,828-byte gzipped tar file (mainly consisting of classifier data objects). If you unpack that file, you should have everything needed. It includes batch files for running under Windows or Unix/Linux/MacOSX, a simple GUI, and the ability to run as a server.

Extensions: Packages by others using Stanford NER


Release History


Version  Date        Description
1.1.1    2009-01-16  Minor bug and usability fixes, and changed API (in particular the methods to classify and output tagged text)
1.1      2008-05-07  Additional feature flags, various code updates
1.0      2006-09-18  Initial release