public abstract class AbstractSequenceClassifier<IN extends CoreMap>
extends java.lang.Object
implements java.util.function.Function<java.lang.String,java.lang.String>

List<IN> classify(List<IN> document);
List<IN> classifyWithGlobalInformation(List<IN> tokenSequence, final CoreMap document, final CoreMap sentence);
void train(Collection<List<IN>> docs, DocumentReaderAndWriter<IN> readerAndWriter);
void serializeClassifier(String serializePath);
void loadClassifier(ObjectInputStream in, Properties props) throws IOException, ClassCastException, ClassNotFoundException;
Triple<Counter<Integer>, Counter<Integer>, TwoDimensionalCounter<Integer,String>> printProbsDocument(List<IN> document);
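As orientation: AbstractSequenceClassifier is abstract, so it is normally used through a concrete subclass such as CRFClassifier. A minimal usage sketch follows; the model path is illustrative, and running it requires the Stanford NLP jar and a downloaded model file.

```java
import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

public class NerDemo {
  public static void main(String[] args) throws Exception {
    // Load a pretrained model; the path is an example, substitute your own.
    CRFClassifier<CoreLabel> classifier =
        CRFClassifier.getClassifier("english.all.3class.distsim.crf.ser.gz");

    // Tagged word/class output, in the style "Bill/PERSON Smith/PERSON died/O ./O"
    System.out.println(classifier.classifyToString("Bill Smith died."));

    // Inline XML output, in the style "<PERSON>Bill Smith</PERSON> died ."
    System.out.println(classifier.classifyWithInlineXML("Bill Smith died."));
  }
}
```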
Modifier and Type | Field and Description
---|---
Index<java.lang.String> | classIndex
java.util.List<FeatureFactory<IN>> | featureFactories - Support multiple feature factories (NERFeatureFactory, EmbeddingFeatureFactory) - Thang Sep 13, 2013.
SeqClassifierFlags | flags
protected MaxSizeConcurrentHashSet<java.lang.String> | knownLCWords - Different threads can add or query knownLCWords at the same time, so we need a concurrent data structure.
protected IN | pad
int | windowSize
Constructor and Description |
---|
AbstractSequenceClassifier(java.util.Properties props) - Construct a SeqClassifierFlags object based on the passed-in properties, and then call the other constructor. |
AbstractSequenceClassifier(SeqClassifierFlags flags) - Initialize the featureFactory and other variables based on the passed-in flags. |
Modifier and Type | Method and Description
---|---
java.lang.String | apply(java.lang.String in) - Maps a String input to an XML-formatted rendition of applying NER to the String.
java.lang.String | backgroundSymbol() - Returns the background class for the classifier.
abstract java.util.List<IN> | classify(java.util.List<IN> document) - Classify a List of something that extends CoreMap.
java.util.List<java.util.List<IN>> | classify(java.lang.String str) - Classify the tokens in a String.
Triple<java.lang.Double,java.lang.Double,java.lang.Double> | classifyAndWriteAnswers(java.util.Collection<java.util.List<IN>> documents, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores)
Triple<java.lang.Double,java.lang.Double,java.lang.Double> | classifyAndWriteAnswers(java.util.Collection<java.util.List<IN>> documents, java.io.PrintWriter printWriter, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores)
void | classifyAndWriteAnswers(java.lang.String textFile) - Load a text file, run the classifier on it, and then print the answers to stdout (with timing to stderr).
Triple<java.lang.Double,java.lang.Double,java.lang.Double> | classifyAndWriteAnswers(java.lang.String testFile, boolean outputScores) - Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr).
Triple<java.lang.Double,java.lang.Double,java.lang.Double> | classifyAndWriteAnswers(java.lang.String testFile, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) - Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr).
Triple<java.lang.Double,java.lang.Double,java.lang.Double> | classifyAndWriteAnswers(java.lang.String testFile, java.io.OutputStream outStream, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) - If the flag outputEncoding is defined, the output is written in that character encoding, otherwise in the system default character encoding.
Triple<java.lang.Double,java.lang.Double,java.lang.Double> | classifyAndWriteAnswers(java.lang.String baseDir, java.lang.String filePattern, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores)
void | classifyAndWriteAnswersKBest(ObjectBank<java.util.List<IN>> documents, int k, java.io.PrintWriter printWriter, DocumentReaderAndWriter<IN> readerAndWriter) - Run the classifier on the documents in an ObjectBank, and print the answers to a given PrintWriter (with timing to stderr).
void | classifyAndWriteAnswersKBest(java.lang.String testFile, int k, DocumentReaderAndWriter<IN> readerAndWriter) - Load a test file, run the classifier on it, and then print the answers to stdout (with timing to stderr).
void | classifyAndWriteViterbiSearchGraph(java.lang.String testFile, java.lang.String searchGraphPrefix, DocumentReaderAndWriter<IN> readerAndWriter) - Load a test file, run the classifier on it, and then write a Viterbi search graph for each sequence.
java.util.List<java.util.List<IN>> | classifyFile(java.lang.String filename) - Classify the contents of a file.
void | classifyFilesAndWriteAnswers(java.util.Collection<java.io.File> textFiles) - Run the classifier on a collection of text files.
void | classifyFilesAndWriteAnswers(java.util.Collection<java.io.File> testFiles, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores)
Counter<java.util.List<IN>> | classifyKBest(java.util.List<IN> doc, java.lang.Class<? extends CoreAnnotation<java.lang.String>> answerField, int k) - Takes a list of tokens and provides the K best sequence labelings of these tokens with their scores.
java.util.List<java.util.List<IN>> | classifyRaw(java.lang.String str, DocumentReaderAndWriter<IN> readerAndWriter) - Classify the tokens in a String.
java.util.List<IN> | classifySentence(java.util.List<? extends HasWord> tokenSequence) - Classify a List of IN.
java.util.List<IN> | classifySentenceWithGlobalInformation(java.util.List<? extends HasWord> tokenSequence, CoreMap doc, CoreMap sentence) - Classify a List of IN using whatever additional information is passed in globalInfo.
void | classifyStdin()
void | classifyStdin(DocumentReaderAndWriter<IN> readerWriter)
java.util.List<Triple<java.lang.String,java.lang.Integer,java.lang.Integer>> | classifyToCharacterOffsets(java.lang.String sentences) - Classify the contents of a String to classified character offset spans.
java.lang.String | classifyToString(java.lang.String sentences) - Classify the contents of a String to a tagged word/class String.
java.lang.String | classifyToString(java.lang.String sentences, java.lang.String outputFormat, boolean preserveSpacing) - Classify the contents of a String to one of several String representations that show the classes.
abstract java.util.List<IN> | classifyWithGlobalInformation(java.util.List<IN> tokenSequence, CoreMap document, CoreMap sentence) - Classify a List of something that extends CoreMap using as additional information whatever is stored in the document and sentence.
java.lang.String | classifyWithInlineXML(java.lang.String sentences) - Classify the contents of a String.
boolean | countResults(java.util.List<IN> doc, Counter<java.lang.String> entityTP, Counter<java.lang.String> entityFP, Counter<java.lang.String> entityFN) - Count results using a method appropriate for the tag scheme being used.
static boolean | countResultsSegmenter(java.util.List<? extends CoreMap> doc, Counter<java.lang.String> entityTP, Counter<java.lang.String> entityFP, Counter<java.lang.String> entityFN)
DocumentReaderAndWriter<IN> | defaultReaderAndWriter() - This is the DocumentReaderAndWriter used for reading training and testing files.
void | dumpFeatures(java.util.Collection<java.util.List<IN>> documents) - Does nothing by default.
void | finalizeClassification(CoreMap document) - Classification is finished for the document.
java.util.Set<java.lang.String> | getKnownLCWords()
Sampler<java.util.List<IN>> | getSampler(java.util.List<IN> input)
SequenceModel | getSequenceModel(java.util.List<IN> doc)
java.util.Set<java.lang.String> | labels()
void | loadClassifier(java.io.File file)
void | loadClassifier(java.io.File file, java.util.Properties props) - Loads a classifier from the file specified.
void | loadClassifier(java.io.InputStream in) - Load a classifier from the specified InputStream.
void | loadClassifier(java.io.InputStream in, java.util.Properties props) - Load a classifier from the specified InputStream.
abstract void | loadClassifier(java.io.ObjectInputStream in, java.util.Properties props) - Load a classifier from the specified input stream.
void | loadClassifier(java.lang.String loadPath) - Loads a classifier from the file specified by loadPath.
void | loadClassifier(java.lang.String loadPath, java.util.Properties props) - Loads a classifier from the file, classpath resource, or URL specified by loadPath.
void | loadClassifierNoExceptions(java.io.File file)
void | loadClassifierNoExceptions(java.io.File file, java.util.Properties props)
void | loadClassifierNoExceptions(java.io.InputStream in, java.util.Properties props) - Loads a classifier from the given input stream.
void | loadClassifierNoExceptions(java.lang.String loadPath)
void | loadClassifierNoExceptions(java.lang.String loadPath, java.util.Properties props)
ObjectBank<java.util.List<IN>> | makeObjectBankFromFile(java.lang.String filename)
ObjectBank<java.util.List<IN>> | makeObjectBankFromFile(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter)
ObjectBank<java.util.List<IN>> | makeObjectBankFromFiles(java.util.Collection<java.io.File> files, DocumentReaderAndWriter<IN> readerAndWriter)
ObjectBank<java.util.List<IN>> | makeObjectBankFromFiles(java.lang.String[] trainFileList, DocumentReaderAndWriter<IN> readerAndWriter)
ObjectBank<java.util.List<IN>> | makeObjectBankFromFiles(java.lang.String baseDir, java.lang.String filePattern, DocumentReaderAndWriter<IN> readerAndWriter)
ObjectBank<java.util.List<IN>> | makeObjectBankFromReader(java.io.BufferedReader in, DocumentReaderAndWriter<IN> readerAndWriter) - Set up an ObjectBank that will allow one to iterate over a collection of documents obtained from the passed-in Reader.
ObjectBank<java.util.List<IN>> | makeObjectBankFromString(java.lang.String string, DocumentReaderAndWriter<IN> readerAndWriter) - Reads a String into an ObjectBank object.
DocumentReaderAndWriter<IN> | makePlainTextReaderAndWriter()
static <INN extends CoreMap> DocumentReaderAndWriter<INN> | makePlainTextReaderAndWriter(SeqClassifierFlags flags) - Makes a DocumentReaderAndWriter based on flags.plainTextReaderAndWriter.
DocumentReaderAndWriter<IN> | makeReaderAndWriter() - Makes a DocumentReaderAndWriter based on the flags the CRFClassifier was constructed with.
DocumentReaderAndWriter<IN> | plainTextReaderAndWriter() - This is the default DocumentReaderAndWriter used for reading text files for runtime classification.
protected void | printFeatureLists(IN wi, java.util.Collection<java.util.Collection<java.lang.String>> features) - Print the String features generated from a token.
protected void | printFeatures(IN wi, java.util.Collection<java.lang.String> features) - Print the String features generated from an IN.
void | printProbs(java.util.Collection<java.io.File> testFiles, DocumentReaderAndWriter<IN> readerWriter) - Takes the files, reads them in, and prints out the likelihood of each possible label at each point.
void | printProbs(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter) - Takes the file, reads it in, and prints out the likelihood of each possible label at each point.
Triple<Counter<java.lang.Integer>,Counter<java.lang.Integer>,TwoDimensionalCounter<java.lang.Integer,java.lang.String>> | printProbsDocument(java.util.List<IN> document)
void | printProbsDocuments(ObjectBank<java.util.List<IN>> documents) - Takes a List of documents and prints the likelihood of each possible label at each point.
static Triple<java.lang.Double,java.lang.Double,java.lang.Double> | printResults(Counter<java.lang.String> entityTP, Counter<java.lang.String> entityFP, Counter<java.lang.String> entityFN) - Given counters of true positives, false positives, and false negatives, prints out precision, recall, and F1 for each key.
protected void | reinit() - This method should be called after there have been changes to the flags (SeqClassifierFlags) variable, such as after deserializing a classifier.
java.util.List<java.lang.String> | segmentString(java.lang.String sentence) - Have a word segmenter segment a String into a list of words.
java.util.List<java.lang.String> | segmentString(java.lang.String sentence, DocumentReaderAndWriter<IN> readerAndWriter)
abstract void | serializeClassifier(java.io.ObjectOutputStream oos) - Serialize a sequence classifier to an object output stream.
abstract void | serializeClassifier(java.lang.String serializePath) - Serialize a sequence classifier to a file on the given path.
void | train() - Train the classifier based on values in flags.
void | train(java.util.Collection<java.util.List<IN>> docs) - Trains a classifier from a Collection of sequences.
abstract void | train(java.util.Collection<java.util.List<IN>> docs, DocumentReaderAndWriter<IN> readerAndWriter) - Trains a classifier from a Collection of sequences.
void | train(java.lang.String filename)
void | train(java.lang.String[] trainFileList, DocumentReaderAndWriter<IN> readerAndWriter)
void | train(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter)
void | train(java.lang.String baseTrainDir, java.lang.String trainFiles, DocumentReaderAndWriter<IN> readerAndWriter)
int | windowSize()
void | writeAnswers(java.util.List<IN> doc, java.io.PrintWriter printWriter, DocumentReaderAndWriter<IN> readerAndWriter) - Write the classifications of the Sequence classifier to a writer in a format determined by the DocumentReaderAndWriter used.
public SeqClassifierFlags flags
public Index<java.lang.String> classIndex
public java.util.List<FeatureFactory<IN extends CoreMap>> featureFactories
public int windowSize
protected MaxSizeConcurrentHashSet<java.lang.String> knownLCWords
public AbstractSequenceClassifier(java.util.Properties props)
Parameters:
props - See SeqClassifierFlags for known properties.

public AbstractSequenceClassifier(SeqClassifierFlags flags)
Parameters:
flags - A specification of the AbstractSequenceClassifier to construct.

public DocumentReaderAndWriter<IN> defaultReaderAndWriter()
Returns:
An edu.stanford.nlp.sequences.ColumnDocumentReaderAndWriter, which is suitable for reading CoNLL-style TSV files.

public DocumentReaderAndWriter<IN> plainTextReaderAndWriter()
Returns:
An edu.stanford.nlp.sequences.PlainTextDocumentReaderAndWriter, which is suitable for reading plain text files, in languages with a Tokenizer available. This reader is now allocated lazily when required, since many times (such as when using AbstractSequenceClassifiers in StanfordCoreNLP), these DocumentReaderAndWriters are never used. Synchronized for safe lazy initialization.

protected final void reinit()
This method should be called after there have been changes to the flags (SeqClassifierFlags) variable, such as after deserializing a classifier. Implementation note: At the moment this method doesn't set windowSize or featureFactory, since they are being serialized separately in the file, but we should probably stop serializing them and just reinitialize them from the flags.
public java.util.Set<java.lang.String> getKnownLCWords()
public DocumentReaderAndWriter<IN> makeReaderAndWriter()
public static <INN extends CoreMap> DocumentReaderAndWriter<INN> makePlainTextReaderAndWriter(SeqClassifierFlags flags)
public DocumentReaderAndWriter<IN> makePlainTextReaderAndWriter()
public java.lang.String backgroundSymbol()
public java.util.Set<java.lang.String> labels()
public java.util.List<IN> classifySentence(java.util.List<? extends HasWord> tokenSequence)
Parameters:
tokenSequence - The List of IN to be classified.
Returns:
The classified tokens, with the answer for each token stored in its CoreAnnotations.AnswerAnnotation field.

public java.util.List<IN> classifySentenceWithGlobalInformation(java.util.List<? extends HasWord> tokenSequence, CoreMap doc, CoreMap sentence)
Parameters:
tokenSequence - The List of IN to be classified.

public SequenceModel getSequenceModel(java.util.List<IN> doc)

public Counter<java.util.List<IN>> classifyKBest(java.util.List<IN> doc, java.lang.Class<? extends CoreAnnotation<java.lang.String>> answerField, int k)
Parameters:
doc - The List of tokens
answerField - The key for each token into which the label for the token will be written
k - The number of best sequence labelings to generate

public java.util.List<java.util.List<IN>> classify(java.lang.String str)
Parameters:
str - A String with tokens in one or more sentences of text to be classified.
Returns:
A List of classified sentences (each a List of something that extends CoreMap).

public java.util.List<java.util.List<IN>> classifyRaw(java.lang.String str, DocumentReaderAndWriter<IN> readerAndWriter)
Parameters:
str - A String with tokens in one or more sentences of text to be classified.
Returns:
A List of classified sentences (each a List of something that extends CoreMap).

public java.util.List<java.util.List<IN>> classifyFile(java.lang.String filename)
Parameters:
filename - Contains the sentence(s) to be classified.
Returns:
A List of classified List of IN.

public java.lang.String apply(java.lang.String in)
Specified by:
apply in interface java.util.function.Function<java.lang.String,java.lang.String>
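The slashTags and inlineXML output styles documented for classifyToString below can be imitated with plain Java. The following stand-alone sketch is not the library's implementation; it assumes the background label is "O" and shows how token/label pairs map to the two formats:

```java
import java.util.ArrayList;
import java.util.List;

public class FormatDemo {
  // Render tokens in the slashTags style: token/LABEL joined by single spaces.
  static String slashTags(List<String[]> tokens) {
    StringBuilder sb = new StringBuilder();
    for (String[] t : tokens) {
      if (sb.length() > 0) sb.append(' ');
      sb.append(t[0]).append('/').append(t[1]);
    }
    return sb.toString();
  }

  // Render tokens in the inlineXML style: runs of non-background labels are
  // wrapped in <LABEL>...</LABEL> tags; background ("O") tokens are left bare.
  static String inlineXML(List<String[]> tokens) {
    StringBuilder sb = new StringBuilder();
    String open = null;
    for (String[] t : tokens) {
      if (open != null && !open.equals(t[1])) {  // label run ended: close tag
        sb.append("</").append(open).append('>');
        open = null;
      }
      if (sb.length() > 0) sb.append(' ');
      if (open == null && !"O".equals(t[1])) {   // new entity run: open tag
        sb.append('<').append(t[1]).append('>');
        open = t[1];
      }
      sb.append(t[0]);
    }
    if (open != null) sb.append("</").append(open).append('>');
    return sb.toString();
  }

  public static void main(String[] args) {
    List<String[]> toks = new ArrayList<>();
    toks.add(new String[]{"Bill", "PERSON"});
    toks.add(new String[]{"Smith", "PERSON"});
    toks.add(new String[]{"died", "O"});
    toks.add(new String[]{".", "O"});
    System.out.println(slashTags(toks)); // Bill/PERSON Smith/PERSON died/O ./O
    System.out.println(inlineXML(toks)); // <PERSON>Bill Smith</PERSON> died .
  }
}
```

Note that the real classifier also handles tokenization, sentence splitting, and spacing preservation, which this sketch deliberately omits.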
public java.lang.String classifyToString(java.lang.String sentences, java.lang.String outputFormat, boolean preserveSpacing)
Classify the contents of a String to one of several String representations that show the classes. Plain text or XML input is expected and the PlainTextDocumentReaderAndWriter is used. The classifier will tokenize the text and treat each sentence as a separate document. The output can be specified to be in a choice of three formats: slashTags (e.g., Bill/PERSON Smith/PERSON died/O ./O), inlineXML (e.g., <PERSON>Bill Smith</PERSON> went to <LOCATION>Paris</LOCATION> .), or xml, for stand-off XML (e.g., <wi num="0" entity="PERSON">Sue</wi> <wi num="1" entity="O">shouted</wi>). There is also a binary choice as to whether the spacing between tokens of the original is preserved or whether the (tagged) tokens are printed with a single space (for inlineXML or slashTags) or a single newline (for xml) between each one.

Fine points: The slashTags and xml formats show tokens as transformed by any normalization processes inside the tokenizer, while inlineXML shows the tokens exactly as they appeared in the source text. When a period counts as both part of an abbreviation and as an end of sentence marker, it is included twice in the output String for slashTags or xml, but only once for inlineXML, where it is not counted as part of the abbreviation (or any named entity it is part of). For slashTags with preserveSpacing=true, there will be two successive periods such as "Jr..". The tokenized (preserveSpacing=false) output will have a space or a newline after the last token.

Parameters:
sentences - The String to be classified. It will be tokenized and divided into documents according to (heuristically determined) sentence boundaries.
outputFormat - The format to put the output in: one of "slashTags", "xml", "inlineXML", "tsv", or "tabbedEntities"
preserveSpacing - Whether to preserve the input spacing between tokens, which may sometimes be none (true), or whether to tokenize the text and print it with one space between each token (false)
Returns:
A String annotated with classification information.

public java.lang.String classifyWithInlineXML(java.lang.String sentences)
Classify the contents of a String. Plain text or XML is expected and the PlainTextDocumentReaderAndWriter is used by default. The classifier will treat each sentence as a separate document. Output is in inline XML format (e.g., <PERSON>Bill Smith</PERSON> went to <LOCATION>Paris</LOCATION> .)
Parameters:
sentences - The string to be classified
Returns:
A String annotated with classification information.

public java.lang.String classifyToString(java.lang.String sentences)
Classify the contents of a String to a tagged word/class String. The PlainTextDocumentReaderAndWriter is used by default. Output looks like: My/O name/O is/O Bill/PERSON Smith/PERSON ./O
Parameters:
sentences - The String to be classified

public java.util.List<Triple<java.lang.String,java.lang.Integer,java.lang.Integer>> classifyToCharacterOffsets(java.lang.String sentences)
Classify the contents of a String to classified character offset spans. Plain text or XML input text is expected and the PlainTextDocumentReaderAndWriter is used by default. Output is a (possibly empty, but not null) List of Triples. Each Triple is an entity name, followed by beginning and ending character offsets in the original String. Character offsets can be thought of as fenceposts between the characters, or, like certain methods in the Java String class, as character positions, numbered starting from 0, with the end index pointing to the position AFTER the entity ends. That is, end - start is the length of the entity in characters.
Fine points: Token offsets are true with respect to the source text, even though the tokenizer may internally normalize certain tokens to String representations of different lengths (e.g., " becoming `` or ''). When a period counts as both part of an abbreviation and as an end of sentence marker, and that abbreviation is part of a named entity, the reported entity string excludes the period.
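A stand-alone illustration of the offset convention (the entity and span here are made up): since the end index is exclusive, String.substring recovers the entity text directly and end - start is its length.

```java
public class OffsetDemo {
  public static void main(String[] args) {
    String text = "Bill Smith went to Paris.";
    // Suppose classifyToCharacterOffsets reported the span ("PERSON", 0, 10).
    int start = 0, end = 10;
    // The end index points AFTER the entity, so substring works unchanged.
    String entity = text.substring(start, end);
    System.out.println(entity);                         // Bill Smith
    System.out.println(end - start == entity.length()); // true
  }
}
```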
Parameters:
sentences - The string to be classified
Returns:
A List of Triples, each of which gives an entity type and the beginning and ending character offsets.

public java.util.List<java.lang.String> segmentString(java.lang.String sentence)
Parameters:
sentence - The string to be classified

public java.util.List<java.lang.String> segmentString(java.lang.String sentence, DocumentReaderAndWriter<IN> readerAndWriter)
public abstract java.util.List<IN> classify(java.util.List<IN> document)
Classify a List of something that extends CoreMap. The classifications are added in place to the items of the document, which is also returned by this method.

Warning: In many circumstances, you should not call this method directly. In particular, if you call this method directly, your document will not be preprocessed to add things like word distributional similarity class or word shape features that your classifier may rely on to work correctly. In such cases, you should call classifySentence instead.

Parameters:
document - A List of something that extends CoreMap.
Returns:
The same List, but with the elements annotated with their answers (stored under the CoreAnnotations.AnswerAnnotation key). The answers will be the class labels defined by the CRF Classifier. They might be things like entity labels (in BIO notation or not) or something like "1" vs. "0" on whether to begin a new token here or not (in word segmentation).

public abstract java.util.List<IN> classifyWithGlobalInformation(java.util.List<IN> tokenSequence, CoreMap document, CoreMap sentence)
Classify a List of something that extends CoreMap using as additional information whatever is stored in the document and sentence. This is needed for SUTime (NumberSequenceClassifier), which requires the document date to resolve relative dates.
Parameters:
tokenSequence - A List of something that extends CoreMap
document -
sentence -

public void finalizeClassification(CoreMap document)
Parameters:
document -
- public void train()
public void train(java.lang.String filename)
public void train(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter)
public void train(java.lang.String baseTrainDir, java.lang.String trainFiles, DocumentReaderAndWriter<IN> readerAndWriter)
public void train(java.lang.String[] trainFileList, DocumentReaderAndWriter<IN> readerAndWriter)
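The train overloads listed here are typically invoked on a concrete subclass. A minimal sketch, assuming a CoNLL-style two-column TSV training file and illustrative file names (train.tsv, my-model.ser.gz are not part of the API); running it requires the Stanford NLP jar:

```java
import java.util.Properties;
import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

public class TrainDemo {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    // Column map for the TSV reader: token in column 0, gold label in column 1.
    props.setProperty("map", "word=0,answer=1");
    CRFClassifier<CoreLabel> crf = new CRFClassifier<>(props);
    crf.train("train.tsv");                    // illustrative training file
    crf.serializeClassifier("my-model.ser.gz"); // illustrative output path
  }
}
```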
public void train(java.util.Collection<java.util.List<IN>> docs)
Parameters:
docs - An ObjectBank or a collection of sequences of IN

public abstract void train(java.util.Collection<java.util.List<IN>> docs, DocumentReaderAndWriter<IN> readerAndWriter)
Parameters:
docs - An ObjectBank or a collection of sequences of IN
readerAndWriter - A DocumentReaderAndWriter to use when loading test files

public ObjectBank<java.util.List<IN>> makeObjectBankFromString(java.lang.String string, DocumentReaderAndWriter<IN> readerAndWriter)
Parameters:
string - The String which will be the content of the ObjectBank

public ObjectBank<java.util.List<IN>> makeObjectBankFromFile(java.lang.String filename)
public ObjectBank<java.util.List<IN>> makeObjectBankFromFile(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter)
public ObjectBank<java.util.List<IN>> makeObjectBankFromFiles(java.lang.String[] trainFileList, DocumentReaderAndWriter<IN> readerAndWriter)
public ObjectBank<java.util.List<IN>> makeObjectBankFromFiles(java.lang.String baseDir, java.lang.String filePattern, DocumentReaderAndWriter<IN> readerAndWriter)
public ObjectBank<java.util.List<IN>> makeObjectBankFromFiles(java.util.Collection<java.io.File> files, DocumentReaderAndWriter<IN> readerAndWriter)
public ObjectBank<java.util.List<IN>> makeObjectBankFromReader(java.io.BufferedReader in, DocumentReaderAndWriter<IN> readerAndWriter)
The reader is determined by flags.documentReader, and for some reader choices, the column mapping given in flags.map.
Parameters:
in - Input data
addNEWLCWords - Do we add new lowercase words from this data to the word shape classifier

public void printProbs(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter)
Parameters:
filename - The path to the specified file

public void printProbs(java.util.Collection<java.io.File> testFiles, DocumentReaderAndWriter<IN> readerWriter)
Parameters:
testFiles - A Collection of files

public void printProbsDocuments(ObjectBank<java.util.List<IN>> documents)
Takes a List of documents and prints the likelihood of each possible label at each point. Also prints probability calibration information over the document collection.
Parameters:
documents - A List of List of something that extends CoreMap.

public void classifyStdin() throws java.io.IOException
Throws:
java.io.IOException

public void classifyStdin(DocumentReaderAndWriter<IN> readerWriter) throws java.io.IOException
Throws:
java.io.IOException
public Triple<Counter<java.lang.Integer>,Counter<java.lang.Integer>,TwoDimensionalCounter<java.lang.Integer,java.lang.String>> printProbsDocument(java.util.List<IN> document)
public void dumpFeatures(java.util.Collection<java.util.List<IN>> documents)
public void classifyAndWriteAnswers(java.lang.String textFile) throws java.io.IOException
Parameters:
textFile - The file to test on.
Throws:
java.io.IOException

public Triple<java.lang.Double,java.lang.Double,java.lang.Double> classifyAndWriteAnswers(java.lang.String testFile, boolean outputScores) throws java.io.IOException
Parameters:
testFile - The file to test on.
outputScores - Whether to calculate and then log performance scores (P/R/F1)
Throws:
java.io.IOException

public Triple<java.lang.Double,java.lang.Double,java.lang.Double> classifyAndWriteAnswers(java.lang.String testFile, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) throws java.io.IOException
Parameters:
testFile - The file to test on.
readerWriter - A reader and writer to use for the output
outputScores - Whether to calculate and then log performance scores (P/R/F1)
Throws:
java.io.IOException

public Triple<java.lang.Double,java.lang.Double,java.lang.Double> classifyAndWriteAnswers(java.lang.String testFile, java.io.OutputStream outStream, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) throws java.io.IOException
If the flag outputEncoding is defined, the output is written in that character encoding, otherwise in the system default character encoding.
Throws:
java.io.IOException

public Triple<java.lang.Double,java.lang.Double,java.lang.Double> classifyAndWriteAnswers(java.lang.String baseDir, java.lang.String filePattern, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) throws java.io.IOException
Throws:
java.io.IOException

public void classifyFilesAndWriteAnswers(java.util.Collection<java.io.File> textFiles) throws java.io.IOException
Parameters:
textFiles - A File Collection to process.
Throws:
java.io.IOException - For any IO error

public void classifyFilesAndWriteAnswers(java.util.Collection<java.io.File> testFiles, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) throws java.io.IOException
Throws:
java.io.IOException

public Triple<java.lang.Double,java.lang.Double,java.lang.Double> classifyAndWriteAnswers(java.util.Collection<java.util.List<IN>> documents, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) throws java.io.IOException
Throws:
java.io.IOException

public Triple<java.lang.Double,java.lang.Double,java.lang.Double> classifyAndWriteAnswers(java.util.Collection<java.util.List<IN>> documents, java.io.PrintWriter printWriter, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) throws java.io.IOException
Parameters:
documents -
printWriter -
readerWriter -
outputScores - Whether to calculate and output the performance scores (P/R/F1) of the classifier
Returns:
The P/R/F1 scores, or null. The scores are done on a 0-100 scale like percentages.
Throws:
java.io.IOException

public void classifyAndWriteAnswersKBest(java.lang.String testFile, int k, DocumentReaderAndWriter<IN> readerAndWriter) throws java.io.IOException
Parameters:
testFile - The name of the file to test on.
k - How many best to print
readerAndWriter - Class to be used for printing answers
Throws:
java.io.IOException

public void classifyAndWriteAnswersKBest(ObjectBank<java.util.List<IN>> documents, int k, java.io.PrintWriter printWriter, DocumentReaderAndWriter<IN> readerAndWriter) throws java.io.IOException
Parameters:
documents - The ObjectBank to test on.
Throws:
java.io.IOException

public void classifyAndWriteViterbiSearchGraph(java.lang.String testFile, java.lang.String searchGraphPrefix, DocumentReaderAndWriter<IN> readerAndWriter) throws java.io.IOException
Parameters:
testFile - The file to test on.
Throws:
java.io.IOException

public void writeAnswers(java.util.List<IN> doc, java.io.PrintWriter printWriter, DocumentReaderAndWriter<IN> readerAndWriter)
Parameters:
doc - Documents to write out
printWriter - Writer to use for output

public boolean countResults(java.util.List<IN> doc, Counter<java.lang.String> entityTP, Counter<java.lang.String> entityFP, Counter<java.lang.String> entityFN)
public static boolean countResultsSegmenter(java.util.List<? extends CoreMap> doc, Counter<java.lang.String> entityTP, Counter<java.lang.String> entityFP, Counter<java.lang.String> entityFN)
public static Triple<java.lang.Double,java.lang.Double,java.lang.Double> printResults(Counter<java.lang.String> entityTP, Counter<java.lang.String> entityFP, Counter<java.lang.String> entityFN)
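printResults derives precision, recall, and F1 from the TP/FP/FN counters. The following stand-alone sketch shows the standard arithmetic on the 0-100 scale used by this class, with made-up counts (it is not the library's implementation):

```java
public class PrfDemo {
  // Standard precision/recall/F1, reported on a 0-100 scale like percentages.
  static double[] prf1(double tp, double fp, double fn) {
    double p = tp + fp == 0 ? 0.0 : 100.0 * tp / (tp + fp);
    double r = tp + fn == 0 ? 0.0 : 100.0 * tp / (tp + fn);
    double f = p + r == 0 ? 0.0 : 2 * p * r / (p + r);
    return new double[]{p, r, f};
  }

  public static void main(String[] args) {
    // Hypothetical counts for one entity type: 8 TP, 2 FP, 4 FN.
    double[] prf = prf1(8, 2, 4);
    System.out.printf("P=%.2f R=%.2f F1=%.2f%n", prf[0], prf[1], prf[2]);
  }
}
```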
public abstract void serializeClassifier(java.lang.String serializePath)
Parameters:
serializePath - The path/filename to write the classifier to.

public abstract void serializeClassifier(java.io.ObjectOutputStream oos)

public void loadClassifierNoExceptions(java.io.InputStream in, java.util.Properties props)
Parameters:
in - The InputStream to read from

public void loadClassifier(java.io.InputStream in) throws java.io.IOException, java.lang.ClassCastException, java.lang.ClassNotFoundException
Parameters:
in - The InputStream to load the serialized classifier from
Throws:
java.io.IOException - If there are problems accessing the input stream
java.lang.ClassCastException - If there are problems interpreting the serialized data
java.lang.ClassNotFoundException - If there are problems interpreting the serialized data

public void loadClassifier(java.io.InputStream in, java.util.Properties props) throws java.io.IOException, java.lang.ClassCastException, java.lang.ClassNotFoundException
Parameters:
in - The InputStream to load the serialized classifier from
props - This Properties object will be used to update the SeqClassifierFlags which are read from the serialized classifier
Throws:
java.io.IOException - If there are problems accessing the input stream
java.lang.ClassCastException - If there are problems interpreting the serialized data
java.lang.ClassNotFoundException - If there are problems interpreting the serialized data

public abstract void loadClassifier(java.io.ObjectInputStream in, java.util.Properties props) throws java.io.IOException, java.lang.ClassCastException, java.lang.ClassNotFoundException
Parameters:
in - The InputStream to load the serialized classifier from
props - This Properties object will be used to update the SeqClassifierFlags which are read from the serialized classifier
Throws:
java.io.IOException - If there are problems accessing the input stream
java.lang.ClassCastException - If there are problems interpreting the serialized data
java.lang.ClassNotFoundException - If there are problems interpreting the serialized data

public void loadClassifier(java.lang.String loadPath) throws java.lang.ClassCastException, java.io.IOException, java.lang.ClassNotFoundException
Throws:
java.lang.ClassCastException
java.io.IOException
java.lang.ClassNotFoundException

public void loadClassifier(java.lang.String loadPath, java.util.Properties props) throws java.lang.ClassCastException, java.io.IOException, java.lang.ClassNotFoundException
Throws:
java.lang.ClassCastException
java.io.IOException
java.lang.ClassNotFoundException

public void loadClassifierNoExceptions(java.lang.String loadPath)

public void loadClassifierNoExceptions(java.lang.String loadPath, java.util.Properties props)

public void loadClassifier(java.io.File file) throws java.lang.ClassCastException, java.io.IOException, java.lang.ClassNotFoundException
Throws:
java.lang.ClassCastException
java.io.IOException
java.lang.ClassNotFoundException

public void loadClassifier(java.io.File file, java.util.Properties props) throws java.lang.ClassCastException, java.io.IOException, java.lang.ClassNotFoundException
Parameters:
file - Loads a classifier from this file.
props - Properties in this object will be used to overwrite those specified in the serialized classifier
Throws:
java.io.IOException - If there are problems accessing the input stream
java.lang.ClassCastException - If there are problems interpreting the serialized data
java.lang.ClassNotFoundException - If there are problems interpreting the serialized data

public void loadClassifierNoExceptions(java.io.File file)

public void loadClassifierNoExceptions(java.io.File file, java.util.Properties props)
protected void printFeatures(IN wi, java.util.Collection<java.lang.String> features)
protected void printFeatureLists(IN wi, java.util.Collection<java.util.Collection<java.lang.String>> features)
public int windowSize()