T - The type of the tokens returned by the Tokenizer

public interface TokenizerFactory<T> extends IteratorFromReaderFactory<T>
A TokenizerFactory implementation should also provide two static methods:
public static TokenizerFactory<? extends HasWord> newTokenizerFactory();
public static TokenizerFactory<Word> newWordTokenizerFactory(String options);
These are expected by certain JavaNLP code (e.g., LexicalizedParser),
which wants to produce a TokenizerFactory by reflection.

Modifier and Type | Method and Description |
---|---|
Tokenizer<T> | getTokenizer(java.io.Reader r): Get a tokenizer for this reader. |
Tokenizer<T> | getTokenizer(java.io.Reader r, java.lang.String extraOptions): Get a tokenizer for this reader. |
void | setOptions(java.lang.String options): Sets default options for how tokenizers built from this factory should behave. |
Methods inherited from interface IteratorFromReaderFactory: getIterator
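
The interface description mentions code that obtains a TokenizerFactory by reflection through a static newTokenizerFactory() method. The following is a minimal sketch of what such a lookup might look like using only java.lang.reflect. It is not the code used by LexicalizedParser; ReflectiveFactoryLookup and DemoTokenizerFactory are hypothetical names invented for this illustration, and the demo class does not implement the real interface.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical demo class; not part of JavaNLP.
final class ReflectiveFactoryLookup {

    // Stand-in for a class that follows the static-method convention.
    // A real implementation would also implement TokenizerFactory<T>.
    public static final class DemoTokenizerFactory {
        public static DemoTokenizerFactory newTokenizerFactory() {
            return new DemoTokenizerFactory();
        }
    }

    // Finds the public static no-argument method "newTokenizerFactory"
    // on the named class and invokes it.
    static Object newFactoryByReflection(String className) {
        try {
            Class<?> cls = Class.forName(className);
            Method m = cls.getMethod("newTokenizerFactory");
            if (!Modifier.isStatic(m.getModifiers())) {
                throw new IllegalStateException("newTokenizerFactory() must be static");
            }
            return m.invoke(null); // null receiver because the method is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create factory from " + className, e);
        }
    }

    public static void main(String[] args) {
        Object factory = newFactoryByReflection(DemoTokenizerFactory.class.getName());
        System.out.println(factory instanceof DemoTokenizerFactory); // prints "true"
    }
}
```

A class that omits the static method fails at the getMethod call, which is why the convention matters for any caller that only knows the class name.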
Tokenizer<T> getTokenizer(java.io.Reader r)
r - A Reader (which is assumed to already be buffered, if appropriate)

Tokenizer<T> getTokenizer(java.io.Reader r, java.lang.String extraOptions)
r - A Reader (which is assumed to already be buffered, if appropriate)
extraOptions - Options for how this tokenizer should behave

void setOptions(java.lang.String options)
options - Options for how this tokenizer should behave
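
To illustrate how the three methods fit together, here is a self-contained sketch of a factory that follows the same shape. Everything in it is an assumption made for illustration: the nested Tokenizer and TokenizerFactory interfaces are simplified stand-ins that mirror only the methods listed on this page (the real JavaNLP interfaces have more members), WhitespaceFactory is hypothetical, the "lowercase" option is made up, and the rule that extraOptions are appended to the defaults is a guess, since this page does not say how the two are combined.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

final class TokenizerFactorySketch {

    // Simplified stand-ins, NOT the real JavaNLP types.
    interface Tokenizer<T> extends Iterator<T> { }

    interface TokenizerFactory<T> {
        Tokenizer<T> getTokenizer(Reader r);
        Tokenizer<T> getTokenizer(Reader r, String extraOptions);
        void setOptions(String options);
    }

    // Hypothetical factory: whitespace-separated tokens, with an invented
    // "lowercase" option passed as a comma-separated option string.
    static final class WhitespaceFactory implements TokenizerFactory<String> {
        private String defaultOptions = "";

        @Override public void setOptions(String options) {
            defaultOptions = (options == null) ? "" : options;
        }

        @Override public Tokenizer<String> getTokenizer(Reader r) {
            return build(r, defaultOptions);
        }

        @Override public Tokenizer<String> getTokenizer(Reader r, String extraOptions) {
            // Assumption: extra options are appended to the factory defaults.
            String extra = (extraOptions == null) ? "" : extraOptions;
            return build(r, defaultOptions.isEmpty() ? extra : defaultOptions + "," + extra);
        }

        private static Tokenizer<String> build(Reader r, String options) {
            boolean lower = Arrays.asList(options.split(",")).contains("lowercase");
            Scanner sc = new Scanner(r); // splits on whitespace by default
            return new Tokenizer<String>() {
                @Override public boolean hasNext() { return sc.hasNext(); }
                @Override public String next() {
                    String tok = sc.next();
                    return lower ? tok.toLowerCase(Locale.ROOT) : tok;
                }
            };
        }
    }

    // Collects all remaining tokens into a list.
    static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) out.add(it.next());
        return out;
    }

    public static void main(String[] args) {
        WhitespaceFactory f = new WhitespaceFactory();
        System.out.println(drain(f.getTokenizer(new StringReader("The quick  Fox"))));              // [The, quick, Fox]
        System.out.println(drain(f.getTokenizer(new StringReader("The quick  Fox"), "lowercase"))); // [the, quick, fox]
        f.setOptions("lowercase"); // affects tokenizers built after this call
        System.out.println(drain(f.getTokenizer(new StringReader("Stanford NLP"))));                // [stanford, nlp]
    }
}
```

The sketch keeps per-call options separate from factory defaults, so a caller can override behavior for one Reader without changing what later getTokenizer(r) calls produce.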