Interface | Description |
---|---|
CoreTokenFactory<IN extends CoreMap> | To make tokens like CoreMap or CoreLabel. |
DocumentProcessor<IN,OUT,L,F> | Top-level interface for transforming Documents. |
LexedTokenFactory<T> | Constructs a token (of arbitrary type) from a String and its position in the underlying text. |
ListProcessor<IN,OUT> | An interface for things that operate on a List. |
SerializableFunction<T1,T2> | This interface is a conjunction of Function and Serializable, which is a bad idea from the perspective of the type system, but one that seems more palatable than other bad ideas until Java's type system is flexible enough to support type conjunctions. |
Tokenizer<T> | Tokenizers break up text into individual Objects. |
TokenizerFactory<T> | A TokenizerFactory is a factory that can build a Tokenizer (an extension of Iterator) from a java.io.Reader. |
TSVSentenceProcessor | An interface for running an action (a callback function) on each line of a TSV file representing a collection of sentences in a corpus. |
WordSegmenter | An interface for segmenting strings into words (in languages that are not word-segmented). |
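
The Tokenizer and TokenizerFactory rows describe a tokenizer as an Iterator that a factory builds from a java.io.Reader. The following is a minimal sketch of that shape using only the JDK. The names SketchWhitespaceTokenizer and SketchTokenizerFactory are hypothetical and are not part of this package, and the whitespace-splitting rule is an assumption chosen for brevity rather than the behavior of any tokenizer listed here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a tokenizer: an Iterator over tokens read from a Reader.
final class SketchWhitespaceTokenizer implements Iterator<String> {
    private final Reader reader;
    private String next; // look-ahead token, or null once the input is exhausted

    SketchWhitespaceTokenizer(Reader reader) {
        this.reader = reader;
        this.next = readToken();
    }

    // Skips leading whitespace, then collects characters up to the next
    // whitespace character or the end of the input.
    private String readToken() {
        try {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) {
                        break; // the current token is complete
                    }
                } else {
                    sb.append((char) c);
                }
            }
            return sb.length() > 0 ? sb.toString() : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String token = next;
        next = readToken();
        return token;
    }
}

// Hypothetical factory: builds a tokenizer from any java.io.Reader.
final class SketchTokenizerFactory {
    Iterator<String> getTokenizer(Reader reader) {
        return new SketchWhitespaceTokenizer(reader);
    }

    // Convenience helper for the sketch: drains a tokenizer into a list.
    static List<String> tokenize(String text) {
        Iterator<String> it = new SketchTokenizerFactory().getTokenizer(new StringReader(text));
        List<String> tokens = new ArrayList<>();
        while (it.hasNext()) {
            tokens.add(it.next());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Tokenizers break up  text.\n"));
    }
}
```

Separating the factory from the tokenizer lets calling code stay independent of the tokenization rule: it asks the factory for an Iterator and never names the concrete tokenizer class.
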
Class | Description |
---|---|
AbstractListProcessor<IN,OUT,L,F> | Class AbstractListProcessor |
AbstractTokenizer<T> | An abstract tokenizer. |
Americanize | Takes a HasWord or String and returns an Americanized version of it. |
AmericanizeFunction | |
ChineseDocumentToSentenceProcessor | Converts a Chinese Document into a List of sentence Strings. |
CodepointCoreLabelProcessor | Processor to add codepoint counts to tokens. In general this will be the same as the character offsets, but certain fancy characters such as 𝒚̂𝒊 will change that. |
CoreLabelProcessor | Abstract class for processing a List<CoreLabel>. |
CoreLabelTokenFactory | Constructs CoreLabels from Strings optionally with beginning and ending (character after the end) offset positions in an original text. |
DistSimClassifier | Maps a String to its distributional similarity class. |
DocumentPreprocessor | Produces a list of sentences from either a plain text or XML document. |
LexerTokenizer | |
LexerUtils | This class contains various static utility methods invoked by our JFlex NL lexers. |
LowercaseAndAmericanizeFunction | |
LowercaseFunction | |
Morphology | Morphology computes the base form of English words, by removing just inflections (not derivational morphology). |
ProcessMorphologyRequest | |
PTBEscapingProcessor<IN extends HasWord,L,F> | Produces a new Document of Words in which special characters of the PTB have been properly escaped. |
PTBTokenizer<T extends HasWord> | A fast, rule-based tokenizer implementation, which produces Penn Treebank style tokenization of English text. |
PTBTokenizer.PTBTokenizerFactory<T extends HasWord> | This class provides a factory which will vend instances of PTBTokenizer which wrap a provided Reader. |
Stemmer | Stemmer, implementing the Porter Stemming Algorithm. The Stemmer class transforms a word into its root form. |
StopList | Simple stoplist class. |
StoplistFilter<L,F> | Filter which removes stop-listed words. |
StripTagsProcessor<L,F> | A Processor whose process method deletes all SGML/XML/HTML tags (tokens starting with < and ending with >). |
TokenizerAdapter | This class adapts between a java.io.StreamTokenizer and an edu.stanford.nlp.process.Tokenizer. |
TransformXML<T> | Reads XML from an input file or stream and writes XML to an output file or stream, while transforming text appearing inside specified XML tags by applying a specified function. |
TransformXML.NoEscapingSAXInterface<T> | This version of the SAXInterface doesn't escape the text produced by the function. |
TransformXML.SAXInterface<T> | |
TSVSentenceIterator | Reads sentences from a TSV, provided a list of fields to populate. |
TSVUtils | A set of utilities for parsing TSV files into CoreMaps. |
WhitespaceTokenizer<T extends HasWord> | A WhitespaceTokenizer is a tokenizer that splits on and discards only whitespace characters. |
WhitespaceTokenizer.WhitespaceTokenizerFactory<T extends HasWord> | A factory which vends WhitespaceTokenizers. |
WordSegmentingTokenizer | A tokenizer that works by calling a WordSegmenter. |
WordShapeClassifier | Provides static methods which map any String to another String indicative of its "word shape" -- e.g., whether capitalized, numeric, etc. |
WordTokenFactory | Constructs a Word from a String. |
WordToSentenceProcessor<IN> | Transforms a List of words into a List of Lists of words (that is, a List of sentences), by grouping the words. |
WordToTaggedWordProcessor<IN extends HasWord,L,F> | Transforms a Document of Words into a document all or partly of TaggedWords by breaking words on a tag divider character. |
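
The WordToSentenceProcessor row describes turning a List of words into a List of Lists of words by grouping. The sketch below illustrates that grouping idea with plain strings and only the JDK. The class name SentenceGroupingSketch and the boundary set (".", "?", "!") are assumptions made for this example; the real class may use different and more elaborate rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of grouping a flat word list into sentences.
final class SentenceGroupingSketch {
    // Assumed sentence-final tokens, for this sketch only.
    private static final Set<String> BOUNDARY_TOKENS =
            new HashSet<>(Arrays.asList(".", "?", "!"));

    // Returns a list of sentences, each sentence being a list of words.
    static List<List<String>> group(List<String> words) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String word : words) {
            current.add(word);
            if (BOUNDARY_TOKENS.contains(word)) {
                // A boundary token closes the current sentence.
                sentences.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            // Keep trailing words that were not closed by a boundary token.
            sentences.add(current);
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("It", "rains", ".", "Stay", "inside", "!");
        System.out.println(group(words));
    }
}
```
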
Enum | Description |
---|---|
DocumentPreprocessor.DocType | |
LexerUtils.DashesEnum | |
LexerUtils.EllipsesEnum | |
LexerUtils.QuotesEnum | |
TSVSentenceIterator.SentenceField | A list of possible fields in the sentence table. |
TSVSentenceProcessor.SentenceField | A list of possible fields in the sentence table. |
WordToSentenceProcessor.NewlineIsSentenceBreak | |

Document process(Document) method which takes a document and returns another document, which may be parsed, stoplisted, stemmed, etc.
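
The document-in, document-out contract mentioned above lends itself to chaining. The sketch below shows one way such a chain could look, assuming a heavily simplified model in which a document is just a list of word strings. SketchProcessor, its andThen helper, and ProcessorChainSketch are hypothetical names introduced for illustration and are not this package's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical simplification: a "document" is a list of word strings.
interface SketchProcessor {
    List<String> process(List<String> document);

    // Runs this processor first, then feeds its output to the next one.
    default SketchProcessor andThen(SketchProcessor next) {
        return doc -> next.process(process(doc));
    }
}

final class ProcessorChainSketch {
    // Returns a processor that lowercases every word.
    static SketchProcessor lowercase() {
        return doc -> {
            List<String> out = new ArrayList<>();
            for (String word : doc) {
                out.add(word.toLowerCase(Locale.ROOT));
            }
            return out;
        };
    }

    // Returns a processor that drops every word found in the stop list.
    static SketchProcessor stoplist(Set<String> stopWords) {
        return doc -> {
            List<String> out = new ArrayList<>();
            for (String word : doc) {
                if (!stopWords.contains(word)) {
                    out.add(word);
                }
            }
            return out;
        };
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a"));
        SketchProcessor pipeline = lowercase().andThen(stoplist(stopWords));
        System.out.println(pipeline.process(Arrays.asList("The", "Cat", "sat")));
    }
}
```

Because each step returns a new document of the same type, steps can be reordered or combined without changing the calling code. Here lowercasing runs before stop-word removal, so "The" matches the lowercase stop list entry.
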