public class TokenizerAnnotator extends java.lang.Object implements Annotator
List<CoreLabel>) under CoreAnnotation.TokensAnnotation.

Modifier and Type | Class and Description |
---|---|
static class | TokenizerAnnotator.TokenizerType: Enum to identify the different TokenizerTypes. |
Modifier and Type | Field and Description |
---|---|
static java.lang.String | EOL_PROPERTY |
static java.lang.String | KEEP_NL_OPTION |

Fields inherited from interface Annotator:
DEFAULT_REQUIREMENTS, STANFORD_CDC_TOKENIZE, STANFORD_CLEAN_XML, STANFORD_COLUMN_DATA_CLASSIFIER, STANFORD_COREF, STANFORD_COREF_MENTION, STANFORD_DEPENDENCIES, STANFORD_DETERMINISTIC_COREF, STANFORD_DOCDATE, STANFORD_ENTITY_MENTIONS, STANFORD_GENDER, STANFORD_KBP, STANFORD_LEMMA, STANFORD_LINK, STANFORD_MWT, STANFORD_NATLOG, STANFORD_NER, STANFORD_OPENIE, STANFORD_PARSE, STANFORD_POS, STANFORD_QUOTE, STANFORD_QUOTE_ATTRIBUTION, STANFORD_REGEXNER, STANFORD_RELATION, STANFORD_SENTIMENT, STANFORD_SSPLIT, STANFORD_TOKENIZE, STANFORD_TOKENSREGEX, STANFORD_TRUECASE, STANFORD_UD_FEATURES

Constructor and Description |
---|
TokenizerAnnotator(): Gives a non-verbose, English tokenizer. |
TokenizerAnnotator(boolean verbose) |
TokenizerAnnotator(boolean verbose, java.util.Properties props) |
TokenizerAnnotator(boolean verbose, java.util.Properties props, java.lang.String options) |
TokenizerAnnotator(boolean verbose, java.lang.String lang) |
TokenizerAnnotator(boolean verbose, java.lang.String lang, java.lang.String options) |
TokenizerAnnotator(boolean verbose, TokenizerAnnotator.TokenizerType lang) |
TokenizerAnnotator(java.util.Properties properties) |
TokenizerAnnotator(java.lang.String lang) |

Modifier and Type | Method and Description |
---|---|
static void | adjustFinalToken(java.util.List<CoreLabel> tokens) |
void | annotate(Annotation annotation): Does the actual work of splitting TextAnnotation into CoreLabels, which are then attached to the TokensAnnotation. |
Tokenizer<CoreLabel> | getTokenizer(java.io.Reader r): Returns a thread-safe tokenizer. |
java.util.Set<java.lang.Class<? extends CoreAnnotation>> | requirementsSatisfied(): Returns the set of requirements that this annotator can provide. |
java.util.Set<java.lang.Class<? extends CoreAnnotation>> | requires(): Returns the set of tasks that this annotator requires in order to run. |

Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface Annotator:
exactRequirements, unmount

public static final java.lang.String EOL_PROPERTY
public static final java.lang.String KEEP_NL_OPTION
public TokenizerAnnotator()
public TokenizerAnnotator(java.util.Properties properties)
public TokenizerAnnotator(boolean verbose)
public TokenizerAnnotator(java.lang.String lang)
public TokenizerAnnotator(boolean verbose, TokenizerAnnotator.TokenizerType lang)
public TokenizerAnnotator(boolean verbose, java.lang.String lang)
public TokenizerAnnotator(boolean verbose, java.lang.String lang, java.lang.String options)
public TokenizerAnnotator(boolean verbose, java.util.Properties props)
public TokenizerAnnotator(boolean verbose, java.util.Properties props, java.lang.String options)
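
The Properties-based constructors above take their configuration from a java.util.Properties object. A minimal sketch of building one follows. The keys tokenize.language and tokenize.options and the value ptb3Escaping=false do not appear on this page; they are assumptions about what the annotator reads and should be checked against the CoreNLP version in use. The class name TokenizerProps is hypothetical, and the constructor call is left in a comment so that the snippet compiles without CoreNLP on the classpath.

```java
import java.util.Properties;

public class TokenizerProps {

    /** Builds the Properties that would be handed to a TokenizerAnnotator constructor. */
    public static Properties englishTokenizerProps() {
        Properties props = new Properties();
        // Assumed key: the language the tokenizer should be built for.
        props.setProperty("tokenize.language", "en");
        // Assumed key and value: a comma-separated list of tokenizer options.
        props.setProperty("tokenize.options", "ptb3Escaping=false");
        return props;
    }

    public static void main(String[] args) {
        Properties props = englishTokenizerProps();
        // With CoreNLP on the classpath, the next step would be:
        //   TokenizerAnnotator annotator = new TokenizerAnnotator(false, props);
        System.out.println("tokenize.language = " + props.getProperty("tokenize.language"));
        System.out.println("tokenize.options = " + props.getProperty("tokenize.options"));
    }
}
```

Keeping the settings in a Properties object lets the same configuration be shared with other annotators, whereas the String-based constructors only name a language and, optionally, an options string.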
public Tokenizer<CoreLabel> getTokenizer(java.io.Reader r)
public static void adjustFinalToken(java.util.List<CoreLabel> tokens)
public void annotate(Annotation annotation)
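
annotate reads the text stored in the Annotation and attaches the resulting token list to it. The toy model below mirrors only that read, split, attach flow. It uses a plain Map with the hypothetical keys "text" and "tokens" in place of Annotation, Strings in place of CoreLabels, and a whitespace split in place of the real tokenizer, so it is not how TokenizerAnnotator itself tokenizes. Rejecting a missing text entry is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AnnotateSketch {

    /** Reads the "text" entry, splits it on whitespace, and stores the pieces under "tokens". */
    public static void annotate(Map<String, Object> annotation) {
        Object text = annotation.get("text");
        if (!(text instanceof String)) {
            // Sketch-only behaviour: refuse to run when there is nothing to tokenize.
            throw new IllegalArgumentException("annotation has no text to tokenize");
        }
        List<String> tokens = new ArrayList<>();
        for (String piece : ((String) text).trim().split("\\s+")) {
            if (!piece.isEmpty()) {  // an empty input yields one empty piece; drop it
                tokens.add(piece);
            }
        }
        // The input entry is left in place; the output is added alongside it.
        annotation.put("tokens", tokens);
    }

    public static void main(String[] args) {
        Map<String, Object> annotation = new HashMap<>();
        annotation.put("text", "Tokenizers split  text into tokens");
        annotate(annotation);
        System.out.println(annotation.get("tokens"));
        // prints [Tokenizers, split, text, into, tokens]
    }
}
```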
public java.util.Set<java.lang.Class<? extends CoreAnnotation>> requires()
Specified by: requires in interface Annotator
public java.util.Set<java.lang.Class<? extends CoreAnnotation>> requirementsSatisfied()
Specified by: requirementsSatisfied in interface Annotator
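
requires() and requirementsSatisfied() together let a caller check that annotators are run in a workable order: each annotator's requirements should already have been provided by the annotators before it. The sketch below shows one way such a check could be written. Stage, firstUnsatisfied, and the String requirement names are hypothetical stand-ins for Annotator and the Class<? extends CoreAnnotation> sets, and the sample requirement sets are illustrative assumptions, not values read from CoreNLP.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class RequirementCheck {

    /** Hypothetical stand-in for an annotator: a name plus its two requirement sets. */
    public static final class Stage {
        final String name;
        final Set<String> requires;   // what requires() would return
        final Set<String> satisfied;  // what requirementsSatisfied() would return

        public Stage(String name, Set<String> requires, Set<String> satisfied) {
            this.name = name;
            this.requires = requires;
            this.satisfied = satisfied;
        }
    }

    /** Returns the first stage whose requirements are not met by earlier stages, if any. */
    public static Optional<String> firstUnsatisfied(List<Stage> pipeline) {
        Set<String> available = new HashSet<>();
        for (Stage stage : pipeline) {
            if (!available.containsAll(stage.requires)) {
                return Optional.of(stage.name);
            }
            available.addAll(stage.satisfied);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative requirement sets only.
        Stage tokenize = new Stage("tokenize",
                Collections.emptySet(), Collections.singleton("tokens"));
        Stage ssplit = new Stage("ssplit",
                Collections.singleton("tokens"), Collections.singleton("sentences"));

        System.out.println(firstUnsatisfied(Arrays.asList(tokenize, ssplit))); // prints Optional.empty
        System.out.println(firstUnsatisfied(Arrays.asList(ssplit, tokenize))); // prints Optional[ssplit]
    }
}
```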