public static class PTBTokenizer.PTBTokenizerFactory<T extends HasWord> extends java.lang.Object implements TokenizerFactory<T>
See PTBTokenizer for details of the parameters and options.
Type Parameters:
T - The class of the returned tokens
See Also:
PTBTokenizer, Serialized Form

Modifier and Type | Field and Description |
---|---|
protected LexedTokenFactory<T> | factory |
protected java.lang.String | options |

Modifier and Type | Method and Description |
---|---|
java.util.Iterator<T> | getIterator(java.io.Reader r): Returns a tokenizer wrapping the given Reader. |
Tokenizer<T> | getTokenizer(java.io.Reader r): Returns a tokenizer wrapping the given Reader. |
Tokenizer<T> | getTokenizer(java.io.Reader r, java.lang.String extraOptions): Get a tokenizer for this reader. |
static PTBTokenizer.PTBTokenizerFactory<CoreLabel> | newCoreLabelTokenizerFactory(java.lang.String options): Constructs a new PTBTokenizerFactory whose tokenizers return CoreLabel objects and use the options passed in. |
static PTBTokenizer.PTBTokenizerFactory<CoreLabel> | newPTBTokenizerFactory(boolean tokenizeNLs, boolean invertible) |
static <T extends HasWord> PTBTokenizer.PTBTokenizerFactory<T> | newPTBTokenizerFactory(LexedTokenFactory<T> tokenFactory, java.lang.String options): Constructs a new PTBTokenizerFactory whose tokenizers use the LexedTokenFactory and options passed in. |
static TokenizerFactory<Word> | newTokenizerFactory(): Constructs a new TokenizerFactory that returns Word objects and treats carriage returns as normal whitespace. |
static PTBTokenizer.PTBTokenizerFactory<Word> | newWordTokenizerFactory(java.lang.String options): Constructs a new PTBTokenizerFactory whose tokenizers return Word objects and use the options passed in. |
void | setOptions(java.lang.String options): Sets default options for how tokenizers built from this factory should behave. |

protected final LexedTokenFactory<T extends HasWord> factory
protected java.lang.String options
public static TokenizerFactory<Word> newTokenizerFactory()

public static PTBTokenizer.PTBTokenizerFactory<Word> newWordTokenizerFactory(java.lang.String options)
Parameters:
options - A String of options

public static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newCoreLabelTokenizerFactory(java.lang.String options)
Parameters:
options - A String of options. For the default, recommended options for PTB-style tokenization compatibility, pass in an empty String.

public static <T extends HasWord> PTBTokenizer.PTBTokenizerFactory<T> newPTBTokenizerFactory(LexedTokenFactory<T> tokenFactory, java.lang.String options)
Parameters:
tokenFactory - The LexedTokenFactory
options - A String of options

public static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newPTBTokenizerFactory(boolean tokenizeNLs, boolean invertible)
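
The static factory methods above differ mainly in which token type T the resulting factory produces. As a rough, self-contained illustration of that idea (not the Stanford implementation), the sketch below uses hypothetical stand-ins: TokenMaker<T> plays the role of LexedTokenFactory<T>, SketchFactory<T> plays the role of the factory, and plain whitespace splitting replaces real PTB tokenization. The makeToken(text, begin, length) signature and all class names are assumptions made for this sketch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class TokenFactorySketch {

    /** Hypothetical stand-in for LexedTokenFactory<T>: turns raw text into a token of type T. */
    public interface TokenMaker<T> {
        T makeToken(String text, int begin, int length);
    }

    /** Hypothetical token type that remembers its character offsets. */
    public static final class Span {
        public final String text;
        public final int begin;
        public final int end;

        public Span(String text, int begin, int end) {
            this.text = text;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Simplified factory: the TokenMaker passed in fixes the element type T. */
    public static final class SketchFactory<T> {
        private final TokenMaker<T> maker;
        private final String options; // stored only to mirror the documented field; unused here

        private SketchFactory(TokenMaker<T> maker, String options) {
            this.maker = maker;
            this.options = options;
        }

        /** Generic entry point: the caller chooses the token type via the TokenMaker. */
        public static <T> SketchFactory<T> newFactory(TokenMaker<T> maker, String options) {
            return new SketchFactory<T>(maker, options);
        }

        /** Convenience entry point that fixes T to String. */
        public static SketchFactory<String> newStringFactory(String options) {
            return new SketchFactory<String>((text, begin, length) -> text, options);
        }

        public String getOptions() {
            return options;
        }

        /** Splits on whitespace (NOT PTB rules) and wraps each piece using the TokenMaker. */
        public List<T> tokenize(Reader r) {
            StringBuilder sb = new StringBuilder();
            try {
                int c;
                while ((c = r.read()) != -1) {
                    sb.append((char) c);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            String s = sb.toString();
            List<T> out = new ArrayList<>();
            int i = 0;
            while (i < s.length()) {
                // Skip whitespace, then consume one run of non-whitespace characters.
                while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
                int start = i;
                while (i < s.length() && !Character.isWhitespace(s.charAt(i))) i++;
                if (i > start) {
                    out.add(maker.makeToken(s.substring(start, i), start, i - start));
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        SketchFactory<String> words = SketchFactory.newStringFactory("");
        System.out.println(words.tokenize(new StringReader("Hello  world")));

        SketchFactory<Span> spans = SketchFactory.<Span>newFactory(
                (text, begin, length) -> new Span(text, begin, begin + length), "");
        for (Span s : spans.tokenize(new StringReader("Hello  world"))) {
            System.out.println(s.text + " [" + s.begin + "," + s.end + ")");
        }
    }
}
```

The design point is that the tokenizing logic is written once, while the supplied token factory decides what each token looks like. In the real API, the Word and CoreLabel convenience methods presumably fix that choice for the caller.
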

public java.util.Iterator<T> getIterator(java.io.Reader r)
Specified by:
getIterator in interface IteratorFromReaderFactory<T extends HasWord>
Parameters:
r - Where to read objects from

public Tokenizer<T> getTokenizer(java.io.Reader r)
Specified by:
getTokenizer in interface TokenizerFactory<T extends HasWord>
Parameters:
r - A Reader (which is assumed to already be buffered, if appropriate)

public Tokenizer<T> getTokenizer(java.io.Reader r, java.lang.String extraOptions)
Description copied from interface: TokenizerFactory
Get a tokenizer for this reader.
Specified by:
getTokenizer in interface TokenizerFactory<T extends HasWord>
Parameters:
r - A Reader (which is assumed to already be buffered, if appropriate)
extraOptions - Options for how this tokenizer should behave

public void setOptions(java.lang.String options)
Description copied from interface: TokenizerFactory
Sets default options for how tokenizers built from this factory should behave.
Specified by:
setOptions in interface TokenizerFactory<T extends HasWord>
Parameters:
options - Options for how this tokenizer should behave
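
setOptions establishes factory-wide defaults, while the extraOptions argument of getTokenizer applies to a single tokenizer. This page does not say how the two strings are combined, so the following is only a sketch of one plausible rule, under stated assumptions: options are written as a comma-separated list of key=value pairs, a bare key means key=true, and a per-call option overrides a default with the same key. OptionsSketch and its methods are hypothetical helpers, not part of the library. The option names tokenizeNLs and invertible are borrowed from the parameters of newPTBTokenizerFactory(boolean, boolean).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OptionsSketch {

    /** Parses "k1=v1,k2=v2" into an ordered map; a bare key "k" is read as "k=true" (assumption). */
    public static Map<String, String> parse(String options) {
        Map<String, String> out = new LinkedHashMap<>();
        if (options == null || options.trim().isEmpty()) {
            return out;
        }
        for (String part : options.split(",")) {
            String p = part.trim();
            if (p.isEmpty()) {
                continue; // tolerate stray commas
            }
            int eq = p.indexOf('=');
            if (eq < 0) {
                out.put(p, "true");
            } else {
                out.put(p.substring(0, eq).trim(), p.substring(eq + 1).trim());
            }
        }
        return out;
    }

    /** Combines factory defaults with per-call extras; extras win on conflict (assumed rule). */
    public static Map<String, String> effective(String defaults, String extra) {
        Map<String, String> merged = parse(defaults);
        merged.putAll(parse(extra));
        return merged;
    }

    public static void main(String[] args) {
        // Defaults as they might be given to setOptions, extras as given to getTokenizer(r, extraOptions).
        System.out.println(effective("tokenizeNLs=false,invertible=true", "tokenizeNLs=true"));
    }
}
```

Under this assumed rule, a caller can keep one shared factory and still adjust individual tokenizers without mutating the factory's defaults. Consult the PTBTokenizer documentation for the option names and precedence the library actually uses.
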