public abstract class AbstractSequenceClassifier<IN extends CoreMap>
extends java.lang.Object
implements java.util.function.Function<java.lang.String,java.lang.String>

List<IN> classify(List<IN> document);
List<IN> classifyWithGlobalInformation(List<IN> tokenSequence, final CoreMap document, final CoreMap sentence);
void train(Collection<List<IN>> docs, DocumentReaderAndWriter<IN> readerAndWriter);
void serializeClassifier(String serializePath);
void loadClassifier(ObjectInputStream in, Properties props) throws IOException, ClassCastException, ClassNotFoundException;
Triple<Counter<Integer>, Counter<Integer>, TwoDimensionalCounter<Integer,String>> printProbsDocument(List<IN> document);
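As orientation: AbstractSequenceClassifier is abstract, so it is normally used through a concrete subclass such as CRFClassifier. A minimal usage sketch follows; the model path is illustrative, and running it requires the Stanford NLP jar and a downloaded model file.

```java
import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

public class NerDemo {
  public static void main(String[] args) throws Exception {
    // Load a pretrained model; the path is an example, substitute your own.
    CRFClassifier<CoreLabel> classifier =
        CRFClassifier.getClassifier("english.all.3class.distsim.crf.ser.gz");

    // Tagged word/class output, in the style "Bill/PERSON Smith/PERSON died/O ./O"
    System.out.println(classifier.classifyToString("Bill Smith died."));

    // Inline XML output, in the style "<PERSON>Bill Smith</PERSON> died ."
    System.out.println(classifier.classifyWithInlineXML("Bill Smith died."));
  }
}
```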
Modifier and Type | Field and Description
---|---
Index<java.lang.String> | classIndex
java.util.List<FeatureFactory<IN>> | featureFactories - Support multiple feature factories (NERFeatureFactory, EmbeddingFeatureFactory) - Thang Sep 13, 2013.
SeqClassifierFlags | flags
protected MaxSizeConcurrentHashSet<java.lang.String> | knownLCWords - Different threads can add or query knownLCWords at the same time, so we need a concurrent data structure.
protected IN | pad
int | windowSize
Constructor and Description |
---|
AbstractSequenceClassifier(java.util.Properties props) - Construct a SeqClassifierFlags object based on the passed-in properties, and then call the other constructor. |
AbstractSequenceClassifier(SeqClassifierFlags flags) - Initialize the featureFactory and other variables based on the passed-in flags. |
Modifier and Type | Method and Description
---|---
java.lang.String | apply(java.lang.String in) - Maps a String input to an XML-formatted rendition of applying NER to the String.
java.lang.String | backgroundSymbol() - Returns the background class for the classifier.
abstract java.util.List<IN> | classify(java.util.List<IN> document) - Classify a List of something that extends CoreMap.
java.util.List<java.util.List<IN>> | classify(java.lang.String str) - Classify the tokens in a String.
Triple<java.lang.Double,java.lang.Double,java.lang.Double> | classifyAndWriteAnswers(java.util.Collection<java.util.List<IN>> documents, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores)
Triple<java.lang.Double,java.lang.Double,java.lang.Double> | classifyAndWriteAnswers(java.util.Collection<java.util.List<IN>> documents, java.io.PrintWriter printWriter, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores)
void | classifyAndWriteAnswers(java.lang.String textFile) - Load a text file, run the classifier on it, and then print the answers to stdout (with timing to stderr).
Triple<java.lang.Double,java.lang.Double,java.lang.Double> | classifyAndWriteAnswers(java.lang.String testFile, boolean outputScores) - Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr).
Triple<java.lang.Double,java.lang.Double,java.lang.Double> | classifyAndWriteAnswers(java.lang.String testFile, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) - Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr).
Triple<java.lang.Double,java.lang.Double,java.lang.Double> | classifyAndWriteAnswers(java.lang.String testFile, java.io.OutputStream outStream, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) - If the flag outputEncoding is defined, the output is written in that character encoding, otherwise in the system default character encoding.
Triple<java.lang.Double,java.lang.Double,java.lang.Double> | classifyAndWriteAnswers(java.lang.String baseDir, java.lang.String filePattern, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores)
void | classifyAndWriteAnswersKBest(ObjectBank<java.util.List<IN>> documents, int k, java.io.PrintWriter printWriter, DocumentReaderAndWriter<IN> readerAndWriter) - Run the classifier on the documents in an ObjectBank, and print the answers to a given PrintWriter (with timing to stderr).
void | classifyAndWriteAnswersKBest(java.lang.String testFile, int k, DocumentReaderAndWriter<IN> readerAndWriter) - Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr).
void | classifyAndWriteViterbiSearchGraph(java.lang.String testFile, java.lang.String searchGraphPrefix, DocumentReaderAndWriter<IN> readerAndWriter) - Load a test file, run the classifier on it, and then write a Viterbi search graph for each sequence.
java.util.List<java.util.List<IN>> | classifyFile(java.lang.String filename) - Classify the contents of a file.
void | classifyFilesAndWriteAnswers(java.util.Collection<java.io.File> textFiles) - Run the classifier on a collection of text files.
void | classifyFilesAndWriteAnswers(java.util.Collection<java.io.File> testFiles, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores)
Counter<java.util.List<IN>> | classifyKBest(java.util.List<IN> doc, java.lang.Class<? extends CoreAnnotation<java.lang.String>> answerField, int k) - Takes a list of tokens and provides the K best sequence labelings of these tokens with their scores.
java.util.List<java.util.List<IN>> | classifyRaw(java.lang.String str, DocumentReaderAndWriter<IN> readerAndWriter) - Classify the tokens in a String.
java.util.List<IN> | classifySentence(java.util.List<? extends HasWord> tokenSequence) - Classify a List of IN.
java.util.List<IN> | classifySentenceWithGlobalInformation(java.util.List<? extends HasWord> tokenSequence, CoreMap doc, CoreMap sentence) - Classify a List of IN using whatever additional information is passed in globalInfo.
void | classifyStdin()
void | classifyStdin(DocumentReaderAndWriter<IN> readerWriter)
java.util.List<Triple<java.lang.String,java.lang.Integer,java.lang.Integer>> | classifyToCharacterOffsets(java.lang.String sentences) - Classify the contents of a String to classified character offset spans.
java.lang.String | classifyToString(java.lang.String sentences) - Classify the contents of a String to a tagged word/class String.
java.lang.String | classifyToString(java.lang.String sentences, java.lang.String outputFormat, boolean preserveSpacing) - Classify the contents of a String to one of several String representations that show the classes.
abstract java.util.List<IN> | classifyWithGlobalInformation(java.util.List<IN> tokenSequence, CoreMap document, CoreMap sentence) - Classify a List of something that extends CoreMap using as additional information whatever is stored in the document and sentence.
java.lang.String | classifyWithInlineXML(java.lang.String sentences) - Classify the contents of a String.
boolean | countResults(java.util.List<IN> doc, Counter<java.lang.String> entityTP, Counter<java.lang.String> entityFP, Counter<java.lang.String> entityFN) - Count results using a method appropriate for the tag scheme being used.
static boolean | countResultsSegmenter(java.util.List<? extends CoreMap> doc, Counter<java.lang.String> entityTP, Counter<java.lang.String> entityFP, Counter<java.lang.String> entityFN)
DocumentReaderAndWriter<IN> | defaultReaderAndWriter() - This is the DocumentReaderAndWriter used for reading training and testing files.
void | dumpFeatures(java.util.Collection<java.util.List<IN>> documents) - Does nothing by default.
void | finalizeClassification(CoreMap document) - Classification is finished for the document.
java.util.Set<java.lang.String> | getKnownLCWords()
Sampler<java.util.List<IN>> | getSampler(java.util.List<IN> input)
SequenceModel | getSequenceModel(java.util.List<IN> doc)
java.util.Set<java.lang.String> | labels()
void | loadClassifier(java.io.File file)
void | loadClassifier(java.io.File file, java.util.Properties props) - Loads a classifier from the file specified.
void | loadClassifier(java.io.InputStream in) - Load a classifier from the specified InputStream.
void | loadClassifier(java.io.InputStream in, java.util.Properties props) - Load a classifier from the specified InputStream.
abstract void | loadClassifier(java.io.ObjectInputStream in, java.util.Properties props) - Load a classifier from the specified input stream.
void | loadClassifier(java.lang.String loadPath) - Loads a classifier from the file specified by loadPath.
void | loadClassifier(java.lang.String loadPath, java.util.Properties props) - Loads a classifier from the file, classpath resource, or URL specified by loadPath.
void | loadClassifierNoExceptions(java.io.File file)
void | loadClassifierNoExceptions(java.io.File file, java.util.Properties props)
void | loadClassifierNoExceptions(java.io.InputStream in, java.util.Properties props) - Loads a classifier from the given input stream.
void | loadClassifierNoExceptions(java.lang.String loadPath)
void | loadClassifierNoExceptions(java.lang.String loadPath, java.util.Properties props)
ObjectBank<java.util.List<IN>> | makeObjectBankFromFile(java.lang.String filename)
ObjectBank<java.util.List<IN>> | makeObjectBankFromFile(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter)
ObjectBank<java.util.List<IN>> | makeObjectBankFromFiles(java.util.Collection<java.io.File> files, DocumentReaderAndWriter<IN> readerAndWriter)
ObjectBank<java.util.List<IN>> | makeObjectBankFromFiles(java.lang.String[] trainFileList, DocumentReaderAndWriter<IN> readerAndWriter)
ObjectBank<java.util.List<IN>> | makeObjectBankFromFiles(java.lang.String baseDir, java.lang.String filePattern, DocumentReaderAndWriter<IN> readerAndWriter)
ObjectBank<java.util.List<IN>> | makeObjectBankFromReader(java.io.BufferedReader in, DocumentReaderAndWriter<IN> readerAndWriter) - Set up an ObjectBank that will allow one to iterate over a collection of documents obtained from the passed-in Reader.
ObjectBank<java.util.List<IN>> | makeObjectBankFromString(java.lang.String string, DocumentReaderAndWriter<IN> readerAndWriter) - Reads a String into an ObjectBank object.
DocumentReaderAndWriter<IN> | makePlainTextReaderAndWriter()
static <INN extends CoreMap> DocumentReaderAndWriter<INN> | makePlainTextReaderAndWriter(SeqClassifierFlags flags) - Makes a DocumentReaderAndWriter based on flags.plainTextReaderAndWriter.
DocumentReaderAndWriter<IN> | makeReaderAndWriter() - Makes a DocumentReaderAndWriter based on the flags the CRFClassifier was constructed with.
DocumentReaderAndWriter<IN> | plainTextReaderAndWriter() - This is the default DocumentReaderAndWriter used for reading text files for runtime classification.
protected void | printFeatureLists(IN wi, java.util.Collection<java.util.Collection<java.lang.String>> features) - Print the String features generated from a token.
protected void | printFeatures(IN wi, java.util.Collection<java.lang.String> features) - Print the String features generated from an IN.
void | printProbs(java.util.Collection<java.io.File> testFiles, DocumentReaderAndWriter<IN> readerWriter) - Takes the files, reads them in, and prints out the likelihood of each possible label at each point.
void | printProbs(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter) - Takes the file, reads it in, and prints out the likelihood of each possible label at each point.
Triple<Counter<java.lang.Integer>,Counter<java.lang.Integer>,TwoDimensionalCounter<java.lang.Integer,java.lang.String>> | printProbsDocument(java.util.List<IN> document)
void | printProbsDocuments(ObjectBank<java.util.List<IN>> documents) - Takes a List of documents and prints the likelihood of each possible label at each point.
static Triple<java.lang.Double,java.lang.Double,java.lang.Double> | printResults(Counter<java.lang.String> entityTP, Counter<java.lang.String> entityFP, Counter<java.lang.String> entityFN) - Given counters of true positives, false positives, and false negatives, prints out precision, recall, and F1 for each key.
protected void | reinit() - This method should be called after there have been changes to the flags (SeqClassifierFlags) variable, such as after deserializing a classifier.
java.util.List<java.lang.String> | segmentString(java.lang.String sentence) - Have a word segmenter segment a String into a list of words.
java.util.List<java.lang.String> | segmentString(java.lang.String sentence, DocumentReaderAndWriter<IN> readerAndWriter)
abstract void | serializeClassifier(java.io.ObjectOutputStream oos) - Serialize a sequence classifier to an object output stream.
abstract void | serializeClassifier(java.lang.String serializePath) - Serialize a sequence classifier to a file on the given path.
void | train() - Train the classifier based on values in flags.
void | train(java.util.Collection<java.util.List<IN>> docs) - Trains a classifier from a Collection of sequences.
abstract void | train(java.util.Collection<java.util.List<IN>> docs, DocumentReaderAndWriter<IN> readerAndWriter) - Trains a classifier from a Collection of sequences.
void | train(java.lang.String filename)
void | train(java.lang.String[] trainFileList, DocumentReaderAndWriter<IN> readerAndWriter)
void | train(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter)
void | train(java.lang.String baseTrainDir, java.lang.String trainFiles, DocumentReaderAndWriter<IN> readerAndWriter)
int | windowSize()
void | writeAnswers(java.util.List<IN> doc, java.io.PrintWriter printWriter, DocumentReaderAndWriter<IN> readerAndWriter) - Write the classifications of the Sequence classifier to a writer in a format determined by the DocumentReaderAndWriter used.
public SeqClassifierFlags flags
public Index<java.lang.String> classIndex
public java.util.List<FeatureFactory<IN extends CoreMap>> featureFactories
public int windowSize
protected MaxSizeConcurrentHashSet<java.lang.String> knownLCWords
public AbstractSequenceClassifier(java.util.Properties props)
Parameters:
props - See SeqClassifierFlags for known properties.

public AbstractSequenceClassifier(SeqClassifierFlags flags)
Parameters:
flags - A specification of the AbstractSequenceClassifier to construct.

public DocumentReaderAndWriter<IN> defaultReaderAndWriter()
Returns:
An edu.stanford.nlp.sequences.ColumnDocumentReaderAndWriter, which is suitable for reading CoNLL-style TSV files.

public DocumentReaderAndWriter<IN> plainTextReaderAndWriter()
Returns:
An edu.stanford.nlp.sequences.PlainTextDocumentReaderAndWriter, which is suitable for reading plain text files, in languages with a Tokenizer available. This reader is now allocated lazily when required, since many times (such as when using AbstractSequenceClassifiers in StanfordCoreNLP), these DocumentReaderAndWriters are never used. Synchronized for safe lazy initialization.

protected final void reinit()
This method should be called after there have been changes to the flags (SeqClassifierFlags) variable, such as after deserializing a classifier. Implementation note: At the moment this method doesn't set windowSize or featureFactory, since they are being serialized separately in the file, but we should probably stop serializing them and just reinitialize them from the flags.
public java.util.Set<java.lang.String> getKnownLCWords()
public DocumentReaderAndWriter<IN> makeReaderAndWriter()
public static <INN extends CoreMap> DocumentReaderAndWriter<INN> makePlainTextReaderAndWriter(SeqClassifierFlags flags)
public DocumentReaderAndWriter<IN> makePlainTextReaderAndWriter()
public java.lang.String backgroundSymbol()
public java.util.Set<java.lang.String> labels()
public java.util.List<IN> classifySentence(java.util.List<? extends HasWord> tokenSequence)
Parameters:
tokenSequence - The List of IN to be classified.
Returns:
The classified tokens, with the answer for each token stored in its CoreAnnotations.AnswerAnnotation field.

public java.util.List<IN> classifySentenceWithGlobalInformation(java.util.List<? extends HasWord> tokenSequence, CoreMap doc, CoreMap sentence)
Parameters:
tokenSequence - The List of IN to be classified.

public SequenceModel getSequenceModel(java.util.List<IN> doc)

public Counter<java.util.List<IN>> classifyKBest(java.util.List<IN> doc, java.lang.Class<? extends CoreAnnotation<java.lang.String>> answerField, int k)
Parameters:
doc - The List of tokens
answerField - The key for each token into which the label for the token will be written
k - The number of best sequence labelings to generate

public java.util.List<java.util.List<IN>> classify(java.lang.String str)
Parameters:
str - A String with tokens in one or more sentences of text to be classified.
Returns:
A List of classified sentences (each a List of something that extends CoreMap).

public java.util.List<java.util.List<IN>> classifyRaw(java.lang.String str, DocumentReaderAndWriter<IN> readerAndWriter)
Parameters:
str - A String with tokens in one or more sentences of text to be classified.
Returns:
A List of classified sentences (each a List of something that extends CoreMap).

public java.util.List<java.util.List<IN>> classifyFile(java.lang.String filename)
Parameters:
filename - Contains the sentence(s) to be classified.
Returns:
A List of classified List of IN.

public java.lang.String apply(java.lang.String in)
Specified by:
apply in interface java.util.function.Function<java.lang.String,java.lang.String>
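The slashTags and inlineXML output styles documented for classifyToString below can be imitated with plain Java. The following stand-alone sketch is not the library's implementation; it assumes the background label is "O" and shows how token/label pairs map to the two formats:

```java
import java.util.ArrayList;
import java.util.List;

public class FormatDemo {
  // Render tokens in the slashTags style: token/LABEL joined by single spaces.
  static String slashTags(List<String[]> tokens) {
    StringBuilder sb = new StringBuilder();
    for (String[] t : tokens) {
      if (sb.length() > 0) sb.append(' ');
      sb.append(t[0]).append('/').append(t[1]);
    }
    return sb.toString();
  }

  // Render tokens in the inlineXML style: runs of non-background labels are
  // wrapped in <LABEL>...</LABEL> tags; background ("O") tokens are left bare.
  static String inlineXML(List<String[]> tokens) {
    StringBuilder sb = new StringBuilder();
    String open = null;
    for (String[] t : tokens) {
      if (open != null && !open.equals(t[1])) {  // label run ended: close tag
        sb.append("</").append(open).append('>');
        open = null;
      }
      if (sb.length() > 0) sb.append(' ');
      if (open == null && !"O".equals(t[1])) {   // new entity run: open tag
        sb.append('<').append(t[1]).append('>');
        open = t[1];
      }
      sb.append(t[0]);
    }
    if (open != null) sb.append("</").append(open).append('>');
    return sb.toString();
  }

  public static void main(String[] args) {
    List<String[]> toks = new ArrayList<>();
    toks.add(new String[]{"Bill", "PERSON"});
    toks.add(new String[]{"Smith", "PERSON"});
    toks.add(new String[]{"died", "O"});
    toks.add(new String[]{".", "O"});
    System.out.println(slashTags(toks)); // Bill/PERSON Smith/PERSON died/O ./O
    System.out.println(inlineXML(toks)); // <PERSON>Bill Smith</PERSON> died .
  }
}
```

Note that the real classifier also handles tokenization, sentence splitting, and spacing preservation, which this sketch deliberately omits.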
public java.lang.String classifyToString(java.lang.String sentences, java.lang.String outputFormat, boolean preserveSpacing)
Classify the contents of a String to one of several String representations that show the classes. Plain text or XML input is expected and the PlainTextDocumentReaderAndWriter is used. The classifier will tokenize the text and treat each sentence as a separate document. The output can be specified to be in a choice of three formats: slashTags (e.g., Bill/PERSON Smith/PERSON died/O ./O), inlineXML (e.g., <PERSON>Bill Smith</PERSON> went to <LOCATION>Paris</LOCATION> .), or xml, for stand-off XML (e.g., <wi num="0" entity="PERSON">Sue</wi> <wi num="1" entity="O">shouted</wi>). There is also a binary choice as to whether the spacing between tokens of the original is preserved or whether the (tagged) tokens are printed with a single space (for inlineXML or slashTags) or a single newline (for xml) between each one.

Fine points: The slashTags and xml formats show tokens as transformed by any normalization processes inside the tokenizer, while inlineXML shows the tokens exactly as they appeared in the source text. When a period counts as both part of an abbreviation and as an end of sentence marker, it is included twice in the output String for slashTags or xml, but only once for inlineXML, where it is not counted as part of the abbreviation (or any named entity it is part of). For slashTags with preserveSpacing=true, there will be two successive periods such as "Jr..". The tokenized (preserveSpacing=false) output will have a space or a newline after the last token.

Parameters:
sentences - The String to be classified. It will be tokenized and divided into documents according to (heuristically determined) sentence boundaries.
outputFormat - The format to put the output in: one of "slashTags", "xml", "inlineXML", "tsv", or "tabbedEntities"
preserveSpacing - Whether to preserve the input spacing between tokens, which may sometimes be none (true), or whether to tokenize the text and print it with one space between each token (false)
Returns:
A String annotated with classification information.

public java.lang.String classifyWithInlineXML(java.lang.String sentences)
Classify the contents of a String. Plain text or XML is expected and the PlainTextDocumentReaderAndWriter is used by default. The classifier will treat each sentence as a separate document. Output is in inline XML format (e.g., <PERSON>Bill Smith</PERSON> went to <LOCATION>Paris</LOCATION> .)
Parameters:
sentences - The string to be classified
Returns:
A String annotated with classification information.

public java.lang.String classifyToString(java.lang.String sentences)
Classify the contents of a String to a tagged word/class String. The PlainTextDocumentReaderAndWriter is used by default. Output looks like: My/O name/O is/O Bill/PERSON Smith/PERSON ./O
Parameters:
sentences - The String to be classified

public java.util.List<Triple<java.lang.String,java.lang.Integer,java.lang.Integer>> classifyToCharacterOffsets(java.lang.String sentences)
Classify the contents of a String to classified character offset spans. Plain text or XML input text is expected and the PlainTextDocumentReaderAndWriter is used by default. Output is a (possibly empty, but not null) List of Triples. Each Triple is an entity name, followed by beginning and ending character offsets in the original String. Character offsets can be thought of as fenceposts between the characters, or, like certain methods in the Java String class, as character positions, numbered starting from 0, with the end index pointing to the position AFTER the entity ends. That is, end - start is the length of the entity in characters.
Fine points: Token offsets are true with respect to the source text, even though the tokenizer may internally normalize certain tokens to String representations of different lengths (e.g., " becoming `` or ''). When a period counts as both part of an abbreviation and as an end of sentence marker, and that abbreviation is part of a named entity, the reported entity string excludes the period.
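A stand-alone illustration of the offset convention (the entity and span here are made up): since the end index is exclusive, String.substring recovers the entity text directly and end - start is its length.

```java
public class OffsetDemo {
  public static void main(String[] args) {
    String text = "Bill Smith went to Paris.";
    // Suppose classifyToCharacterOffsets reported the span ("PERSON", 0, 10).
    int start = 0, end = 10;
    // The end index points AFTER the entity, so substring works unchanged.
    String entity = text.substring(start, end);
    System.out.println(entity);                         // Bill Smith
    System.out.println(end - start == entity.length()); // true
  }
}
```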
Parameters:
sentences - The string to be classified
Returns:
A List of Triples, each of which gives an entity type and the beginning and ending character offsets.

public java.util.List<java.lang.String> segmentString(java.lang.String sentence)
Parameters:
sentence - The string to be classified

public java.util.List<java.lang.String> segmentString(java.lang.String sentence, DocumentReaderAndWriter<IN> readerAndWriter)
public abstract java.util.List<IN> classify(java.util.List<IN> document)
Classify a List of something that extends CoreMap. The classifications are added in place to the items of the document, which is also returned by this method.

Warning: In many circumstances, you should not call this method directly. In particular, if you call this method directly, your document will not be preprocessed to add things like word distributional similarity class or word shape features that your classifier may rely on to work correctly. In such cases, you should call classifySentence instead.

Parameters:
document - A List of something that extends CoreMap.
Returns:
The same List, but with the elements annotated with their answers (stored under the CoreAnnotations.AnswerAnnotation key). The answers will be the class labels defined by the CRF Classifier. They might be things like entity labels (in BIO notation or not) or something like "1" vs. "0" on whether to begin a new token here or not (in word segmentation).

public abstract java.util.List<IN> classifyWithGlobalInformation(java.util.List<IN> tokenSequence, CoreMap document, CoreMap sentence)
Classify a List of something that extends CoreMap using as additional information whatever is stored in the document and sentence. This is needed for SUTime (NumberSequenceClassifier), which requires the document date to resolve relative dates.
Parameters:
tokenSequence - A List of something that extends CoreMap
document -
sentence -

public void finalizeClassification(CoreMap document)
Parameters:
document -
- public void train()
public void train(java.lang.String filename)
public void train(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter)
public void train(java.lang.String baseTrainDir, java.lang.String trainFiles, DocumentReaderAndWriter<IN> readerAndWriter)
public void train(java.lang.String[] trainFileList, DocumentReaderAndWriter<IN> readerAndWriter)
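The train overloads listed here are typically invoked on a concrete subclass. A minimal sketch, assuming a CoNLL-style two-column TSV training file and illustrative file names (train.tsv, my-model.ser.gz are not part of the API); running it requires the Stanford NLP jar:

```java
import java.util.Properties;
import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

public class TrainDemo {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    // Column map for the TSV reader: token in column 0, gold label in column 1.
    props.setProperty("map", "word=0,answer=1");
    CRFClassifier<CoreLabel> crf = new CRFClassifier<>(props);
    crf.train("train.tsv");                    // illustrative training file
    crf.serializeClassifier("my-model.ser.gz"); // illustrative output path
  }
}
```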
public void train(java.util.Collection<java.util.List<IN>> docs)
Parameters:
docs - An ObjectBank or a collection of sequences of IN

public abstract void train(java.util.Collection<java.util.List<IN>> docs, DocumentReaderAndWriter<IN> readerAndWriter)
Parameters:
docs - An ObjectBank or a collection of sequences of IN
readerAndWriter - A DocumentReaderAndWriter to use when loading test files

public ObjectBank<java.util.List<IN>> makeObjectBankFromString(java.lang.String string, DocumentReaderAndWriter<IN> readerAndWriter)
Parameters:
string - The String which will be the content of the ObjectBank

public ObjectBank<java.util.List<IN>> makeObjectBankFromFile(java.lang.String filename)
public ObjectBank<java.util.List<IN>> makeObjectBankFromFile(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter)
public ObjectBank<java.util.List<IN>> makeObjectBankFromFiles(java.lang.String[] trainFileList, DocumentReaderAndWriter<IN> readerAndWriter)
public ObjectBank<java.util.List<IN>> makeObjectBankFromFiles(java.lang.String baseDir, java.lang.String filePattern, DocumentReaderAndWriter<IN> readerAndWriter)
public ObjectBank<java.util.List<IN>> makeObjectBankFromFiles(java.util.Collection<java.io.File> files, DocumentReaderAndWriter<IN> readerAndWriter)
public ObjectBank<java.util.List<IN>> makeObjectBankFromReader(java.io.BufferedReader in, DocumentReaderAndWriter<IN> readerAndWriter)
The reader is determined by flags.documentReader, and for some reader choices, the column mapping given in flags.map.
Parameters:
in - Input data
addNEWLCWords - Do we add new lowercase words from this data to the word shape classifier

public void printProbs(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter)
Parameters:
filename - The path to the specified file

public void printProbs(java.util.Collection<java.io.File> testFiles, DocumentReaderAndWriter<IN> readerWriter)
Parameters:
testFiles - A Collection of files

public void printProbsDocuments(ObjectBank<java.util.List<IN>> documents)
Takes a List of documents and prints the likelihood of each possible label at each point. Also prints probability calibration information over the document collection.
Parameters:
documents - A List of List of something that extends CoreMap.

public void classifyStdin() throws java.io.IOException
Throws:
java.io.IOException

public void classifyStdin(DocumentReaderAndWriter<IN> readerWriter) throws java.io.IOException
Throws:
java.io.IOException
public Triple<Counter<java.lang.Integer>,Counter<java.lang.Integer>,TwoDimensionalCounter<java.lang.Integer,java.lang.String>> printProbsDocument(java.util.List<IN> document)
public void dumpFeatures(java.util.Collection<java.util.List<IN>> documents)
public void classifyAndWriteAnswers(java.lang.String textFile) throws java.io.IOException
Parameters:
textFile - The file to test on.
Throws:
java.io.IOException

public Triple<java.lang.Double,java.lang.Double,java.lang.Double> classifyAndWriteAnswers(java.lang.String testFile, boolean outputScores) throws java.io.IOException
Parameters:
testFile - The file to test on.
outputScores - Whether to calculate and then log performance scores (P/R/F1)
Throws:
java.io.IOException

public Triple<java.lang.Double,java.lang.Double,java.lang.Double> classifyAndWriteAnswers(java.lang.String testFile, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) throws java.io.IOException
Parameters:
testFile - The file to test on.
readerWriter - A reader and writer to use for the output
outputScores - Whether to calculate and then log performance scores (P/R/F1)
Throws:
java.io.IOException

public Triple<java.lang.Double,java.lang.Double,java.lang.Double> classifyAndWriteAnswers(java.lang.String testFile, java.io.OutputStream outStream, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) throws java.io.IOException
If the flag outputEncoding is defined, the output is written in that character encoding, otherwise in the system default character encoding.
Throws:
java.io.IOException

public Triple<java.lang.Double,java.lang.Double,java.lang.Double> classifyAndWriteAnswers(java.lang.String baseDir, java.lang.String filePattern, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) throws java.io.IOException
Throws:
java.io.IOException

public void classifyFilesAndWriteAnswers(java.util.Collection<java.io.File> textFiles) throws java.io.IOException
Parameters:
textFiles - A File Collection to process.
Throws:
java.io.IOException - For any IO error

public void classifyFilesAndWriteAnswers(java.util.Collection<java.io.File> testFiles, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) throws java.io.IOException
Throws:
java.io.IOException

public Triple<java.lang.Double,java.lang.Double,java.lang.Double> classifyAndWriteAnswers(java.util.Collection<java.util.List<IN>> documents, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) throws java.io.IOException
Throws:
java.io.IOException

public Triple<java.lang.Double,java.lang.Double,java.lang.Double> classifyAndWriteAnswers(java.util.Collection<java.util.List<IN>> documents, java.io.PrintWriter printWriter, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) throws java.io.IOException
Parameters:
documents -
printWriter -
readerWriter -
outputScores - Whether to calculate and output the performance scores (P/R/F1) of the classifier
Returns:
The P/R/F1 scores, or null. The scores are done on a 0-100 scale like percentages.
Throws:
java.io.IOException

public void classifyAndWriteAnswersKBest(java.lang.String testFile, int k, DocumentReaderAndWriter<IN> readerAndWriter) throws java.io.IOException
Parameters:
testFile - The name of the file to test on.
k - How many best to print
readerAndWriter - Class to be used for printing answers
Throws:
java.io.IOException

public void classifyAndWriteAnswersKBest(ObjectBank<java.util.List<IN>> documents, int k, java.io.PrintWriter printWriter, DocumentReaderAndWriter<IN> readerAndWriter) throws java.io.IOException
Parameters:
documents - The ObjectBank to test on.
Throws:
java.io.IOException

public void classifyAndWriteViterbiSearchGraph(java.lang.String testFile, java.lang.String searchGraphPrefix, DocumentReaderAndWriter<IN> readerAndWriter) throws java.io.IOException
Parameters:
testFile - The file to test on.
Throws:
java.io.IOException

public void writeAnswers(java.util.List<IN> doc, java.io.PrintWriter printWriter, DocumentReaderAndWriter<IN> readerAndWriter)
Parameters:
doc - Documents to write out
printWriter - Writer to use for output

public boolean countResults(java.util.List<IN> doc, Counter<java.lang.String> entityTP, Counter<java.lang.String> entityFP, Counter<java.lang.String> entityFN)
public static boolean countResultsSegmenter(java.util.List<? extends CoreMap> doc, Counter<java.lang.String> entityTP, Counter<java.lang.String> entityFP, Counter<java.lang.String> entityFN)
public static Triple<java.lang.Double,java.lang.Double,java.lang.Double> printResults(Counter<java.lang.String> entityTP, Counter<java.lang.String> entityFP, Counter<java.lang.String> entityFN)
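printResults derives precision, recall, and F1 from the TP/FP/FN counters. The following stand-alone sketch shows the standard arithmetic on the 0-100 scale used by this class, with made-up counts (it is not the library's implementation):

```java
public class PrfDemo {
  // Standard precision/recall/F1, reported on a 0-100 scale like percentages.
  static double[] prf1(double tp, double fp, double fn) {
    double p = tp + fp == 0 ? 0.0 : 100.0 * tp / (tp + fp);
    double r = tp + fn == 0 ? 0.0 : 100.0 * tp / (tp + fn);
    double f = p + r == 0 ? 0.0 : 2 * p * r / (p + r);
    return new double[]{p, r, f};
  }

  public static void main(String[] args) {
    // Hypothetical counts for one entity type: 8 TP, 2 FP, 4 FN.
    double[] prf = prf1(8, 2, 4);
    System.out.printf("P=%.2f R=%.2f F1=%.2f%n", prf[0], prf[1], prf[2]);
  }
}
```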
public abstract void serializeClassifier(java.lang.String serializePath)
Parameters:
serializePath - The path/filename to write the classifier to.

public abstract void serializeClassifier(java.io.ObjectOutputStream oos)

public void loadClassifierNoExceptions(java.io.InputStream in, java.util.Properties props)
Parameters:
in - The InputStream to read from

public void loadClassifier(java.io.InputStream in) throws java.io.IOException, java.lang.ClassCastException, java.lang.ClassNotFoundException
Parameters:
in - The InputStream to load the serialized classifier from
Throws:
java.io.IOException - If there are problems accessing the input stream
java.lang.ClassCastException - If there are problems interpreting the serialized data
java.lang.ClassNotFoundException - If there are problems interpreting the serialized data

public void loadClassifier(java.io.InputStream in, java.util.Properties props) throws java.io.IOException, java.lang.ClassCastException, java.lang.ClassNotFoundException
Parameters:
in - The InputStream to load the serialized classifier from
props - This Properties object will be used to update the SeqClassifierFlags which are read from the serialized classifier
Throws:
java.io.IOException - If there are problems accessing the input stream
java.lang.ClassCastException - If there are problems interpreting the serialized data
java.lang.ClassNotFoundException - If there are problems interpreting the serialized data

public abstract void loadClassifier(java.io.ObjectInputStream in, java.util.Properties props) throws java.io.IOException, java.lang.ClassCastException, java.lang.ClassNotFoundException
Parameters:
in - The InputStream to load the serialized classifier from
props - This Properties object will be used to update the SeqClassifierFlags which are read from the serialized classifier
Throws:
java.io.IOException - If there are problems accessing the input stream
java.lang.ClassCastException - If there are problems interpreting the serialized data
java.lang.ClassNotFoundException - If there are problems interpreting the serialized data

public void loadClassifier(java.lang.String loadPath) throws java.lang.ClassCastException, java.io.IOException, java.lang.ClassNotFoundException
Throws:
java.lang.ClassCastException
java.io.IOException
java.lang.ClassNotFoundException

public void loadClassifier(java.lang.String loadPath, java.util.Properties props) throws java.lang.ClassCastException, java.io.IOException, java.lang.ClassNotFoundException
Throws:
java.lang.ClassCastException
java.io.IOException
java.lang.ClassNotFoundException

public void loadClassifierNoExceptions(java.lang.String loadPath)

public void loadClassifierNoExceptions(java.lang.String loadPath, java.util.Properties props)

public void loadClassifier(java.io.File file) throws java.lang.ClassCastException, java.io.IOException, java.lang.ClassNotFoundException
Throws:
java.lang.ClassCastException
java.io.IOException
java.lang.ClassNotFoundException

public void loadClassifier(java.io.File file, java.util.Properties props) throws java.lang.ClassCastException, java.io.IOException, java.lang.ClassNotFoundException
Parameters:
file - Loads a classifier from this file.
props - Properties in this object will be used to overwrite those specified in the serialized classifier
Throws:
java.io.IOException - If there are problems accessing the input stream
java.lang.ClassCastException - If there are problems interpreting the serialized data
java.lang.ClassNotFoundException - If there are problems interpreting the serialized data

public void loadClassifierNoExceptions(java.io.File file)

public void loadClassifierNoExceptions(java.io.File file, java.util.Properties props)
protected void printFeatures(IN wi, java.util.Collection<java.lang.String> features)
protected void printFeatureLists(IN wi, java.util.Collection<java.util.Collection<java.lang.String>> features)
public int windowSize()