Interface | Description |
---|---|
KBPRelationExtractor | An interface for a KBP-style relation extractor. |
PriorModelFactory<IN extends CoreMap> | |
Class | Description |
---|---|
AbstractSequenceClassifier<IN extends CoreMap> | This class provides common functionality for (probabilistic) sequence models. |
ChineseMorphFeatureSets | A class for holding Chinese morphological features used for word segmentation and POS tagging. |
ChineseQuantifiableEntityNormalizer | A Chinese counterpart of QuantifiableEntityNormalizer that normalizes NUMBER, DATE, TIME, MONEY, PERCENT, and ORDINAL amounts expressed in Chinese. |
ClassifierCombiner<IN extends CoreMap & HasWord> | Merges the outputs of two or more AbstractSequenceClassifiers according to a simple precedence scheme: any given base classifier contributes only classifications of labels that do not exist in the base classifiers specified before it, and that do not have any token overlap with labels assigned by higher-priority classifiers. |
EmbeddingFeatureFactory | For features generated from word embeddings. |
EmpiricalNERPrior<IN extends CoreMap> | This was the empirical NER prior used for long-distance consistency in Finkel et al. (2005). |
EmpiricalNERPriorBIO<IN extends CoreMap> | |
EmpiricalNERPriorBIOFactory<IN extends CoreMap> | |
EmpiricalNERPriorFactory<IN extends CoreMap> | Used for creating an NER prior by reflection. |
EntityCachingAbstractSequencePrior<IN extends CoreMap> | This class keeps track of all labeled entities and updates its list whenever the label at a point gets changed. |
EntityCachingAbstractSequencePriorBIO<IN extends CoreMap> | This class keeps track of all labeled entities and updates its list whenever the label at a point gets changed. |
KBPBasicSpanishCorefSystem | Performs basic coreference for Spanish. |
KBPEnsembleExtractor | An ensemble of other KBP relation extractors. |
KBPRelationExtractor.Accuracy | A class to compute the accuracy of a relation extractor. |
KBPRelationExtractor.KBPInput | |
KBPSemgrexExtractor | A semgrex extractor for KBP. |
KBPStatisticalExtractor | A relation extractor to work with Victor's new KBP data. |
KBPTokensregexExtractor | A tokensregex extractor for KBP. |
KBPTokensregexExtractor.Object | IMPORTANT: Don't rename this class without updating the rules defs file. |
KBPTokensregexExtractor.Subject | IMPORTANT: Don't rename this class without updating the rules defs file. |
NERClassifierCombiner | Subclass of ClassifierCombiner that behaves like an NER, by copying the AnswerAnnotation labels to NERAnnotation. |
NERFeatureFactory<IN extends CoreLabel> | Features for Named Entity Recognition. |
NERFeatureFactory.FeatureCollector | This class handles collecting features into a set in a more memory-efficient way. |
NERGUI | |
NERServer | A named-entity recognizer server for Stanford's NER. |
NERServer.NERClient | This example sends material to the NER server one line at a time. |
NumberNormalizer | Provides functions for converting words to numbers. |
PresetSequenceClassifier<IN extends CoreMap> | Created by jebolton on 7/14/17. |
QuantifiableEntityNormalizer | Various methods for normalizing Money, Date, Percent, Time, Number, and Ordinal amounts. |
UniformPrior<IN extends CoreMap> | Uniform prior to be used for generic Gibbs inference in the ie.crf.CRFClassifier. |
UniformPriorFactory<IN extends CoreMap> | |
Enum | Description |
---|---|
KBPRelationExtractor.NERTag | A list of valid KBP NER tags. |
KBPRelationExtractor.RelationType | Known relation types (last updated for the 2013 shared task). |
KBPRelationExtractor.RelationType.Cardinality | |
NERClassifierCombiner.Language | |
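The precedence scheme that the ClassifierCombiner summary above describes can be sketched in plain Java. This is only an illustration of the merging idea over token-index spans, not the actual CoreNLP API (the Span record and merge method here are made up for the sketch):

```java
import java.util.*;

public class PrecedenceMerge {
    // A labeled span over token indices [start, end), e.g. (0, 2, "PERSON").
    record Span(int start, int end, String label) {}

    // Merge classifier outputs in priority order: a span from a lower-priority
    // classifier is kept only if (a) its label type was not emitted by any
    // higher-priority classifier and (b) it has no token overlap with a span
    // already kept from a higher-priority classifier.
    static List<Span> merge(List<List<Span>> outputsByPriority) {
        List<Span> merged = new ArrayList<>();
        Set<String> seenLabels = new HashSet<>();
        for (List<Span> output : outputsByPriority) {
            Set<String> labelsThisRound = new HashSet<>();
            for (Span s : output) {
                labelsThisRound.add(s.label());
                if (seenLabels.contains(s.label())) continue;   // label handled by earlier classifier
                boolean overlaps = merged.stream()
                    .anyMatch(m -> s.start() < m.end() && m.start() < s.end());
                if (!overlaps) merged.add(s);
            }
            seenLabels.addAll(labelsThisRound);
        }
        merged.sort(Comparator.comparingInt(Span::start));
        return merged;
    }

    public static void main(String[] args) {
        List<Span> ner = List.of(new Span(0, 2, "PERSON"), new Span(5, 6, "LOCATION"));
        List<Span> numbers = List.of(new Span(1, 3, "NUMBER"),   // overlaps PERSON: dropped
                                     new Span(7, 8, "NUMBER"));  // kept
        System.out.println(merge(List.of(ner, numbers)));
    }
}
```

The real ClassifierCombiner works over CoreMap token annotations rather than bare spans, but the precedence logic it documents is the same shape.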
This package implements various subpackages for information extraction. Some examples of use appear later in this description. At the moment, three types of information extraction are supported (where some of these have internal variants).
There are some demonstrations of the material here which you can run (and several other classes have main() methods which exhibit their functionality):
NERGUI is a simple GUI front-end to the NER tagging components.
crf/NERGUI is a simple GUI front-end to the CRF-based NER tagging components. This version only supports the CRF-based NER tagger.
demo/NERDemo is a simple class exemplifying the programmatic use of the CRF-based NER tagger.
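Programmatic use of the tagger, as in NERDemo, ultimately yields one label per token; a common follow-up step is grouping contiguous identically-labeled tokens into entity mentions. A minimal sketch of that grouping in plain Java (this is not the CoreNLP API, and the token/label pairs are invented for illustration):

```java
import java.util.*;

public class EntityGrouper {
    // Group contiguous tokens sharing the same non-"O" label into entity
    // mentions, mirroring what one does with per-token NER tagger output.
    static List<String> group(String[] tokens, String[] labels) {
        List<String> entities = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentLabel = "O";
        for (int i = 0; i <= tokens.length; i++) {
            String label = (i < tokens.length) ? labels[i] : "O";
            if (!label.equals(currentLabel)) {
                if (!currentLabel.equals("O")) {
                    entities.add(currentLabel + ": " + current);  // close previous mention
                }
                current.setLength(0);
                currentLabel = label;
            }
            if (i < tokens.length && !label.equals("O")) {
                if (current.length() > 0) current.append(' ');
                current.append(tokens[i]);
            }
        }
        return entities;
    }

    public static void main(String[] args) {
        String[] tokens = {"Christopher", "Manning", "works", "at", "Stanford"};
        String[] labels = {"PERSON", "PERSON", "O", "O", "ORGANIZATION"};
        System.out.println(group(tokens, labels));
        // [PERSON: Christopher Manning, ORGANIZATION: Stanford]
    }
}
```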
0. Setup: For all of these examples except 3., you need to be connected to the Internet, so that the application's web search module is able to connect to search engines. The web search functionality is provided by the supplied edu.stanford.nlp.web package. How web search works is controlled by a websearch.init file in your current directory (if none is present, you will get search results from AltaVista). If you are registered to use the Google API, you should probably edit this file so web queries can be done to Google using their SOAP interface. Even if not, you can specify additional or different search engines to access in websearch.init. A copy of this file is supplied in the distribution. The DescExtractor in 4. also requires another init file so that it can use the included part-of-speech tagger.
1. Corporate Contact Information. This illustrates simple information extraction from a web page. Using the included ExtractDemo.bat, or by hand, run:
java edu.stanford.nlp.ie.ExtractDemo
Open the ontology serialized-extractors/companycontact/Corporation-Information.kaon (in serialized-extractors/companycontact), select Corporation as the Concept to extract, press Extract, and look at the results. Pages and queries to try include:
http://www.ziatech.com/
http://www.cs.stanford.edu/
http://www.ananova.com/business/story/sm_635565.html
Audiovox Corporation
2. Corporate Contact Information merged. This illustrates the addition of information merging across web pages. Using the included MergeExtractDemo.bat, or similarly do:
java edu.stanford.nlp.ie.ExtractDemo -m
The ExtractDemo screen is similar, but adds a button to Select a Merger. Go to serialized-extractors/mergers and select the file unscoredmerger.obj. A query to try is again Audiovox Corporation.
3. Company names via direct use of an HMM information extractor. One can also train, load, and use HMM information extractors directly, without using any of the RDF-based KAON framework (http://kaon.semanticweb.org/) used by ExtractDemo. edu.stanford.nlp.ie.hmm.Tester illustrates the use of a pretrained HMM on data via the command line interface:
cd serialized-extractors/companycontact/
java edu.stanford.nlp.ie.hmm.Tester cisco.txt company company-name.hmm
java edu.stanford.nlp.ie.hmm.Tester EarningsReports.txt company company-name.hmm
java edu.stanford.nlp.ie.hmm.Tester companytest.txt company company-name.hmm
The first shows the HMM running on an unmarked-up file with a single document. The second shows a Corpus of several documents, separated with ENDOFDOC used as a document delimiter inside a Corpus. This second use of Tester normally expects to have an annotated corpus on which it can score its answers. Here, the corpus is unannotated, and so some of the output is inappropriate, but it shows what is selected as the company name for each document (it's mostly correct...). The final example shows it running on a corpus that does have answers marked in it. It does the testing with the XML elements stripped, but then uses them to evaluate correctness.
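The ENDOFDOC delimiter convention used by these corpora is simple to replicate. A hedged sketch in plain Java (this is not the actual Corpus class, just an illustration of the splitting convention) of breaking a corpus file's contents into documents:

```java
import java.util.*;

public class CorpusSplitter {
    // Split raw corpus text into documents, treating any line consisting of
    // ENDOFDOC as a delimiter, as in the hmm.Tester examples above.
    static List<String> split(String corpusText) {
        List<String> docs = new ArrayList<>();
        StringBuilder doc = new StringBuilder();
        for (String line : corpusText.split("\n", -1)) {
            if (line.trim().equals("ENDOFDOC")) {
                if (doc.toString().trim().length() > 0) docs.add(doc.toString().trim());
                doc.setLength(0);
            } else {
                doc.append(line).append('\n');
            }
        }
        // Flush the final document, which need not end with a delimiter.
        if (doc.toString().trim().length() > 0) docs.add(doc.toString().trim());
        return docs;
    }

    public static void main(String[] args) {
        String corpus = "Cisco reported earnings.\nENDOFDOC\nIntel shares rose.\n";
        System.out.println(split(corpus));
        // [Cisco reported earnings., Intel shares rose.]
    }
}
```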
To train your own extractor, prepare a corpus of marked-up training documents separated by lines with ENDOFDOC on them. Then one can train (and then test) as follows. Training an HMM (optimizing all its probabilities) takes a long time (it depends on the speed of the computer, but 10 minutes or so to adjust probabilities for a fixed structure, and often hours if one additionally attempts structure learning).
cd edu/stanford/nlp/ie/training/
java -server edu.stanford.nlp.ie.hmm.Trainer companydata.txt company mycompany.hmm
java edu.stanford.nlp.ie.hmm.HMMSingleFieldExtractor Company mycompany.hmm mycompany.obj
java edu.stanford.nlp.ie.hmm.Tester testdoc.txt company mycompany.hmm
The HMMSingleFieldExtractor step wraps the trained HMM as a serialized extractor (mycompany.obj) that can be loaded into ExtractDemo. Note that company in the second line must match the element name in the marked-up data that you will train on, while Company in the third line must match the relation name in the ontology over which you will extract with mycompany.obj. These two names need not be the same. The last step then runs the trained HMM on a file.
4. Extraction of descriptions (such as biographical information about a person or a description of an animal). This does extraction of such descriptions from a web page. This component uses a POS tagger, and looks for a path to it in the file descextractor.init in the current directory. So, you should be in the root directory of the current archive, which has such a file. Double-click on the included MergeExtractDemo.bat in that directory, or by hand one can equivalently do:
java edu.stanford.nlp.ie.ExtractDemo -m
Open the ontology serialized-extractors/description/Entity-NameDescription.kaon (in serialized-extractors/description), go to serialized-extractors/mergers and select the file unscoredmerger.obj, and choose Entity as the Concept to extract. Queries to try include:
Gareth Evans
Tawny Frogmouth
Christopher Manning
Joshua Nkomo