public class WordsToSentencesAnnotator extends java.lang.Object implements Annotator
This annotator assumes that there is a List<CoreLabel> under the TokensAnnotation field, runs it through WordToSentenceProcessor, and puts the resulting List<Annotation> under the SentencesAnnotation field.
Fields inherited from interface Annotator: DEFAULT_REQUIREMENTS, STANFORD_CDC_TOKENIZE, STANFORD_CLEAN_XML, STANFORD_COLUMN_DATA_CLASSIFIER, STANFORD_COREF, STANFORD_COREF_MENTION, STANFORD_DEPENDENCIES, STANFORD_DETERMINISTIC_COREF, STANFORD_DOCDATE, STANFORD_ENTITY_MENTIONS, STANFORD_GENDER, STANFORD_KBP, STANFORD_LEMMA, STANFORD_LINK, STANFORD_MWT, STANFORD_NATLOG, STANFORD_NER, STANFORD_OPENIE, STANFORD_PARSE, STANFORD_POS, STANFORD_QUOTE, STANFORD_QUOTE_ATTRIBUTION, STANFORD_REGEXNER, STANFORD_RELATION, STANFORD_SENTIMENT, STANFORD_SSPLIT, STANFORD_TOKENIZE, STANFORD_TOKENSREGEX, STANFORD_TRUECASE, STANFORD_UD_FEATURES
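The data flow in the class description (a flat token list in, a list of sentences out) can be pictured with a small stdlib-only sketch. It is not CoreNLP code: plain strings stand in for CoreLabel tokens, each inner list stands in for one sentence under SentencesAnnotation, and both the class name ToySentenceSplitter and its fixed set of sentence-final punctuation are hypothetical simplifications of what WordToSentenceProcessor does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical toy model, NOT the CoreNLP implementation: strings stand in for
// CoreLabel tokens and each inner list stands in for one sentence annotation.
class ToySentenceSplitter {

    // Assumed boundary set; the real splitter is configured with a regex.
    private static final Set<String> SENTENCE_FINAL = Set.of(".", "!", "?");

    // Groups a flat token list into sentences, closing a sentence after each
    // sentence-final token (the boundary token itself stays in the sentence).
    static List<List<String>> split(List<String> tokens) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String token : tokens) {
            current.add(token);
            if (SENTENCE_FINAL.contains(token)) {
                sentences.add(current);
                current = new ArrayList<>();
            }
        }
        // Trailing tokens without final punctuation still form a sentence.
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }
}
```

For example, the tokens Hello, world, ., Bye, now yield two sentences: [Hello, world, .] and [Bye, now].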
Constructor and Description |
---|
WordsToSentencesAnnotator() |
WordsToSentencesAnnotator(boolean verbose) |
WordsToSentencesAnnotator(boolean verbose, java.lang.String boundaryTokenRegex, java.util.Set<java.lang.String> boundaryToDiscard, java.util.Set<java.lang.String> htmlElementsToDiscard, java.lang.String newlineIsSentenceBreak, java.lang.String boundaryMultiTokenRegex, java.util.Set<java.lang.String> tokenRegexesToDiscard) |
WordsToSentencesAnnotator(java.util.Properties properties) |
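The seven-argument constructor exposes the splitting rules directly. The sketch below is a hypothetical, stdlib-only model of three of those parameters, with semantics inferred from their names rather than taken from CoreNLP: a token matching boundaryTokenRegex ends a sentence and is kept, a token in boundaryToDiscard (for example a newline token) ends a sentence and is dropped, and a token matching any regex in tokenRegexesToDiscard is dropped without ending the sentence. The remaining parameters (verbose, htmlElementsToDiscard, newlineIsSentenceBreak, boundaryMultiTokenRegex) are ignored, and the class ConfigurableToySplitter is an illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical model of three WordsToSentencesAnnotator constructor parameters.
// Semantics are inferred from the parameter names; this is not CoreNLP code.
class ConfigurableToySplitter {
    private final Pattern boundaryPattern;       // from boundaryTokenRegex
    private final Set<String> discardBoundaries; // from boundaryToDiscard
    private final List<Pattern> discardPatterns = new ArrayList<>(); // from tokenRegexesToDiscard

    ConfigurableToySplitter(String boundaryTokenRegex,
                            Set<String> boundaryToDiscard,
                            Set<String> tokenRegexesToDiscard) {
        this.boundaryPattern = Pattern.compile(boundaryTokenRegex);
        this.discardBoundaries = boundaryToDiscard;
        for (String regex : tokenRegexesToDiscard) {
            discardPatterns.add(Pattern.compile(regex));
        }
    }

    List<List<String>> split(List<String> tokens) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String token : tokens) {
            if (discardBoundaries.contains(token)) {
                // Boundary to discard: end the sentence and drop the token.
                addIfNonEmpty(sentences, current);
                current = new ArrayList<>();
            } else if (!isDiscarded(token)) {
                current.add(token);
                if (boundaryPattern.matcher(token).matches()) {
                    // Boundary token: end the sentence but keep the token.
                    addIfNonEmpty(sentences, current);
                    current = new ArrayList<>();
                }
            }
            // Tokens matching a discard pattern fall through and are dropped.
        }
        addIfNonEmpty(sentences, current);
        return sentences;
    }

    // True if the token fully matches any of the discard patterns.
    private boolean isDiscarded(String token) {
        for (Pattern p : discardPatterns) {
            if (p.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }

    private static void addIfNonEmpty(List<List<String>> sentences, List<String> sentence) {
        if (!sentence.isEmpty()) {
            sentences.add(sentence);
        }
    }
}
```

With boundary regex `[.?!]`, a newline as the boundary to discard, and `<[^>]*>` as a discard pattern, the tokens Hi, <b>, there, ., Next, line, \n, End become [Hi, there, .], [Next, line] and [End].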
Modifier and Type | Method and Description |
---|---|
void | annotate(Annotation annotation): If setCountLineNumbers is set to true, line numbers are counted by telling the underlying splitter to return empty lists of tokens and then treating those empty lists as empty lines. |
static WordsToSentencesAnnotator | newlineSplitter(java.lang.String... nlToken): Returns a WordsToSentencesAnnotator that splits on newlines only; the newline tokens are then deleted. |
static WordsToSentencesAnnotator | nonSplitter(): Returns a WordsToSentencesAnnotator that never splits the token stream. |
java.util.Set<java.lang.Class<? extends CoreAnnotation>> | requirementsSatisfied(): Returns the set of requirements (annotation types) that this annotator provides. |
java.util.Set<java.lang.Class<? extends CoreAnnotation>> | requires(): Returns the set of requirements (annotation types) that must already be satisfied before this annotator can run. |
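The annotate() entry describes counting line numbers from empty token lists. The hypothetical sketch below shows the idea in plain Java: it assumes the splitter has already returned one token list per input line, with an empty list for each blank line, and it numbers the non-empty lists by the line they came from. The class LineCountingSketch, and the choice to skip empty lists once they have been counted, are assumptions of this illustration rather than documented CoreNLP behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of line counting from empty token lists.
// Assumes the splitter emitted one token list per input line, empty for blank lines.
class LineCountingSketch {

    // A sentence paired with the 1-based input line it came from.
    static final class NumberedSentence {
        final int lineNumber;
        final List<String> tokens;

        NumberedSentence(int lineNumber, List<String> tokens) {
            this.lineNumber = lineNumber;
            this.tokens = tokens;
        }
    }

    static List<NumberedSentence> numberLines(List<List<String>> perLineTokens) {
        List<NumberedSentence> result = new ArrayList<>();
        int line = 0;
        for (List<String> tokens : perLineTokens) {
            line++; // every list, empty or not, advances the line counter
            if (!tokens.isEmpty()) {
                // Empty lists only mark blank lines; they are not kept as sentences here.
                result.add(new NumberedSentence(line, tokens));
            }
        }
        return result;
    }
}
```

The empty list is what keeps the count honest: without it, the sentence after a blank line would be reported one line too early.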
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Annotator: exactRequirements, unmount
public WordsToSentencesAnnotator()
public WordsToSentencesAnnotator(java.util.Properties properties)
public WordsToSentencesAnnotator(boolean verbose)
public WordsToSentencesAnnotator(boolean verbose, java.lang.String boundaryTokenRegex, java.util.Set<java.lang.String> boundaryToDiscard, java.util.Set<java.lang.String> htmlElementsToDiscard, java.lang.String newlineIsSentenceBreak, java.lang.String boundaryMultiTokenRegex, java.util.Set<java.lang.String> tokenRegexesToDiscard)
public static WordsToSentencesAnnotator newlineSplitter(java.lang.String... nlToken)
Parameters:
nlToken - Zero or more newline tokens, which might be a \n or the fake newline tokens returned from the tokenizer.
public static WordsToSentencesAnnotator nonSplitter()
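The two factory methods differ only in where they allow breaks. The following stdlib-only sketch mirrors their descriptions: splitOnNewlines breaks only at the given newline tokens and deletes them, so a period does not end a sentence, while noSplit returns the whole stream as one sentence. The class FactorySplitterSketch is hypothetical, the token "*NL*" is used only as a placeholder for a tokenizer's fake newline token, and this sketch simply drops blank lines instead of tracking them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for newlineSplitter() and nonSplitter(); not CoreNLP code.
class FactorySplitterSketch {

    // Breaks only at the given newline tokens and deletes those tokens.
    static List<List<String>> splitOnNewlines(List<String> tokens, String... nlToken) {
        Set<String> newlines = new HashSet<>(Arrays.asList(nlToken));
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String token : tokens) {
            if (newlines.contains(token)) {
                if (!current.isEmpty()) {
                    sentences.add(current); // blank lines are dropped in this sketch
                }
                current = new ArrayList<>();
            } else {
                current.add(token); // punctuation does NOT end a sentence here
            }
        }
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    // Never splits: the whole (non-empty) token stream is a single sentence.
    static List<List<String>> noSplit(List<String> tokens) {
        List<List<String>> sentences = new ArrayList<>();
        if (!tokens.isEmpty()) {
            sentences.add(new ArrayList<>(tokens));
        }
        return sentences;
    }
}
```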
public void annotate(Annotation annotation)
public java.util.Set<java.lang.Class<? extends CoreAnnotation>> requires()
Specified by: requires in interface Annotator
public java.util.Set<java.lang.Class<? extends CoreAnnotation>> requirementsSatisfied()
Specified by: requirementsSatisfied in interface Annotator
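Together, requires() and requirementsSatisfied() let a pipeline check annotator order before running anything. The sketch below is a hypothetical model of that kind of check, loosely based on the class description (tokens in, sentences out): the marker classes TokensKey and SentencesKey stand in for TokensAnnotation and SentencesAnnotation, ToyAnnotator stands in for the Annotator interface, and the actual sets returned by WordsToSentencesAnnotator are not listed on this page.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the requires()/requirementsSatisfied() contract.
// The marker classes stand in for CoreAnnotation keys; none are CoreNLP types.
class RequirementCheckSketch {

    static final class TokensKey {}
    static final class SentencesKey {}

    interface ToyAnnotator {
        Set<Class<?>> requires();
        Set<Class<?>> requirementsSatisfied();
    }

    // Produces tokens and needs nothing.
    static final ToyAnnotator TOKENIZER = new ToyAnnotator() {
        public Set<Class<?>> requires() { return Set.of(); }
        public Set<Class<?>> requirementsSatisfied() { return Set.of(TokensKey.class); }
    };

    // Needs tokens and produces sentences, like the class description above.
    static final ToyAnnotator SENTENCE_SPLITTER = new ToyAnnotator() {
        public Set<Class<?>> requires() { return Set.of(TokensKey.class); }
        public Set<Class<?>> requirementsSatisfied() { return Set.of(SentencesKey.class); }
    };

    // True if each annotator's requirements are provided by annotators before it.
    static boolean isValidOrder(List<ToyAnnotator> pipeline) {
        Set<Class<?>> available = new HashSet<>();
        for (ToyAnnotator annotator : pipeline) {
            if (!available.containsAll(annotator.requires())) {
                return false;
            }
            available.addAll(annotator.requirementsSatisfied());
        }
        return true;
    }
}
```

Because the check is plain set containment, an out-of-order pipeline is rejected before any text is processed.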