public class SentenceUtils
extends java.lang.Object
Modifier and Type | Method and Description |
---|---|
static <T> java.lang.String | extractNgram(java.util.List<T> list, int start, int end) Returns the substring of the sentence from start (inclusive) to end (exclusive). |
static <T extends HasWord> java.lang.String | listToOriginalTextString(java.util.List<T> list) Returns the sentence as a string, based on the original text and spacing prior to tokenization. |
static <T extends HasWord> java.lang.String | listToOriginalTextString(java.util.List<T> list, boolean printBeforeBeforeStart) Returns the sentence as a string, based on the original text and spacing prior to tokenization. |
static <T> java.lang.String | listToString(java.util.List<T> list) Returns the sentence as a string with a space between words. |
static <T> java.lang.String | listToString(java.util.List<T> list, boolean justValue) Returns the sentence as a string with a space between words. |
static <T> java.lang.String | listToString(java.util.List<T> list, boolean justValue, java.lang.String separator) As already described, but if separator is not null, it is used to separate the parts of objects such as TaggedWord. |
static <T extends CoreMap> java.lang.String | listToString(java.util.List<T> list, java.lang.String... keys) Pretty print CoreMap classes using the same semantics as the toShorterString method. |
static java.util.List<CoreLabel> | toCoreLabelList(java.util.List<? extends HasWord> words) Create a sentence as a List of CoreLabel objects from a List of other label objects. |
static java.util.List<CoreLabel> | toCoreLabelList(java.lang.String... words) Create a sentence as a List of CoreLabel objects from an array (or varargs) of String objects. |
static java.util.ArrayList<TaggedWord> | toTaggedList(java.util.List<java.lang.String> lex, java.util.List<java.lang.String> tags) Create an ArrayList as a list of TaggedWord from two lists of String, one for the words, and the second for the tags. |
static java.util.ArrayList<Word> | toUntaggedList(java.util.List<java.lang.String> lex) Create an ArrayList as a list of Word from a list of String. |
static java.util.ArrayList<Word> | toUntaggedList(java.lang.String... words) Create a Sentence as a list of Word objects from an array of String objects. |
static java.util.List<HasWord> | toWordList(java.lang.String... words) |
static <T> java.lang.String | wordToString(T o, boolean justValue) |
static <T> java.lang.String | wordToString(T o, boolean justValue, java.lang.String separator) |
public static java.util.ArrayList<TaggedWord> toTaggedList(java.util.List<java.lang.String> lex, java.util.List<java.lang.String> tags)
Create an ArrayList as a list of TaggedWord from two lists of String, one for the words, and the second for the tags.
lex - a list whose items are of type String and are the words
tags - a list whose items are of type String and are the tags

public static java.util.ArrayList<Word> toUntaggedList(java.util.List<java.lang.String> lex)
Create an ArrayList as a list of Word from a list of String.
lex - a list whose items are of type String and are the words

public static java.util.ArrayList<Word> toUntaggedList(java.lang.String... words)
Create a Sentence as a list of Word objects from an array of String objects.
words - The words to make it from

public static java.util.List<HasWord> toWordList(java.lang.String... words)
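
The pairing of a word list with a parallel tag list, as toTaggedList is described above, can be illustrated without the library. The following is only a sketch under stated assumptions: TaggedPairSketch and zip are hypothetical names, not part of SentenceUtils, and rejecting lists of different lengths is a choice made for this sketch, since the documentation above does not say how a mismatch is handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a word/tag pair; not the library's TaggedWord. */
final class TaggedPairSketch {
    final String word;
    final String tag;

    TaggedPairSketch(String word, String tag) {
        this.word = word;
        this.tag = tag;
    }

    @Override
    public String toString() {
        return word + "/" + tag;
    }

    /**
     * Pairs the i-th word with the i-th tag.
     * Assumption of this sketch: lists of different lengths are rejected.
     */
    static ArrayList<TaggedPairSketch> zip(List<String> lex, List<String> tags) {
        if (lex.size() != tags.size()) {
            throw new IllegalArgumentException("word and tag lists differ in length");
        }
        ArrayList<TaggedPairSketch> sentence = new ArrayList<>(lex.size());
        for (int i = 0; i < lex.size(); i++) {
            sentence.add(new TaggedPairSketch(lex.get(i), tags.get(i)));
        }
        return sentence;
    }

    public static void main(String[] args) {
        List<TaggedPairSketch> sentence =
            zip(Arrays.asList("the", "dog", "barks"), Arrays.asList("DT", "NN", "VBZ"));
        System.out.println(sentence); // [the/DT, dog/NN, barks/VBZ]
    }
}
```

The untagged variants follow the same shape with the tag list dropped: one output element per input string, in the same order.
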
public static java.util.List<CoreLabel> toCoreLabelList(java.lang.String... words)
Create a sentence as a List of CoreLabel objects from an array (or varargs) of String objects.
words - The words to make it from

public static java.util.List<CoreLabel> toCoreLabelList(java.util.List<? extends HasWord> words)
Create a sentence as a List of CoreLabel objects from a List of other label objects.
words - The words to make it from

public static <T> java.lang.String listToString(java.util.List<T> list)
Returns the sentence as a string with a space between words. It prints the value() of each item, which gives the expected answer for a short form representation of the "sentence" over a range of cases. It is equivalent to calling toString(true).
TODO: Sentence used to be a subclass of ArrayList, with this method as the toString. Therefore, there may be instances of ArrayList being printed that expect this method to be used.
list - The tokenized sentence to print out

public static <T> java.lang.String listToString(java.util.List<T> list, boolean justValue)
Returns the sentence as a string with a space between words.
list - The tokenized sentence to print out
justValue - If true and the elements are of type Label, return just the value() of the Label of each word; otherwise, call the toString() method on each item.

public static <T> java.lang.String listToString(java.util.List<T> list, boolean justValue, java.lang.String separator)
As already described, but if separator is not null, it is used to separate the parts of objects such as TaggedWord.
separator - The string used to separate Word and Tag in TaggedWord, etc.

public static <T extends CoreMap> java.lang.String listToString(java.util.List<T> list, java.lang.String... keys)
Pretty print CoreMap classes using the same semantics as the toShorterString method.
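
The effect of the justValue flag described above can be sketched in plain Java. This is an illustration rather than the library's code: Valued is a hypothetical stand-in for Label, TaggedToken is a hypothetical element type, and join is a hypothetical name for the space-joining step.

```java
import java.util.Arrays;
import java.util.List;

final class ListToStringSketch {
    /** Hypothetical stand-in for the library's Label type. */
    interface Valued {
        String value();
    }

    /** Hypothetical tagged token whose toString() includes the tag. */
    static final class TaggedToken implements Valued {
        private final String word;
        private final String tag;

        TaggedToken(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }

        @Override
        public String value() {
            return word;
        }

        @Override
        public String toString() {
            return word + "/" + tag;
        }
    }

    /** Joins the items with single spaces, using value() or toString() per item. */
    static <T> String join(List<T> list, boolean justValue) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < list.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            T item = list.get(i);
            if (justValue && item instanceof Valued) {
                sb.append(((Valued) item).value());
            } else {
                sb.append(item); // falls back to toString()
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<TaggedToken> sentence =
            Arrays.asList(new TaggedToken("the", "DT"), new TaggedToken("dog", "NN"));
        System.out.println(join(sentence, true));  // the dog
        System.out.println(join(sentence, false)); // the/DT dog/NN
    }
}
```

Elements that are not Valued are always rendered with toString(), so a mixed list still produces one space-separated string.
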
public static <T extends HasWord> java.lang.String listToOriginalTextString(java.util.List<T> list)
Returns the sentence as a string, based on the original text and spacing prior to tokenization. Accepts a List<HasWord>.
list - The sentence (List of tokens) to print out

public static <T extends HasWord> java.lang.String listToOriginalTextString(java.util.List<T> list, boolean printBeforeBeforeStart)
Returns the sentence as a string, based on the original text and spacing prior to tokenization. Accepts a List<HasWord>.
list - The sentence (List of tokens) to print out
printBeforeBeforeStart - Whether to print the BeforeAnnotation before the first token of the sentence. (In general, the BeforeAnnotation is the same as the AfterAnnotation of the preceding token. So, usually this is correct to do only for the first sentence of a text.)

public static <T> java.lang.String wordToString(T o, boolean justValue)

public static <T> java.lang.String wordToString(T o, boolean justValue, java.lang.String separator)
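
The role of printBeforeBeforeStart in listToOriginalTextString can be made concrete with a small stand-alone sketch. It assumes each token records the text before it, its original text, and the text after it. Tok and originalText are hypothetical names, and the concatenation order shown is an assumption of this sketch, not a statement of how the library implements the method.

```java
import java.util.Arrays;
import java.util.List;

final class OriginalTextSketch {
    /** Hypothetical token carrying the whitespace around it; not a CoreLabel. */
    static final class Tok {
        final String before;
        final String originalText;
        final String after;

        Tok(String before, String originalText, String after) {
            this.before = before;
            this.originalText = originalText;
            this.after = after;
        }
    }

    /**
     * Rebuilds the untokenized text from original text plus trailing spacing.
     * The leading spacing is emitted only for the first token, and only on request.
     */
    static String originalText(List<Tok> tokens, boolean printBeforeBeforeStart) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            Tok t = tokens.get(i);
            if (i == 0 && printBeforeBeforeStart) {
                sb.append(t.before);
            }
            sb.append(t.originalText);
            sb.append(t.after);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Each token's "before" equals the previous token's "after".
        List<Tok> tokens = Arrays.asList(
            new Tok("  ", "Hello", ""),
            new Tok("", ",", " "),
            new Tok(" ", "world", ""));
        System.out.println("[" + originalText(tokens, false) + "]"); // [Hello, world]
        System.out.println("[" + originalText(tokens, true) + "]");  // [  Hello, world]
    }
}
```

Because a token's leading spacing normally repeats the trailing spacing of the token before it, emitting it for every sentence of a text would duplicate whitespace at sentence boundaries, which is why the flag is usually set only for the first sentence.
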
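public static <T> java.lang.String extractNgram(java.util.List<T> list, int start, int end)
Returns the substring of the sentence from start (inclusive) to end (exclusive).
start - Leftmost index of the substring (inclusive)
end - Rightmost index of the ngram (exclusive)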
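
The half-open index convention of extractNgram, with start inclusive and end exclusive, is easy to get wrong by one. The sketch below shows the convention on a plain list. NgramSketch and ngram are hypothetical names, joining items with single spaces is an assumption carried over from listToString, and out-of-range indices are simply left to List.subList, which throws. The documentation above does not specify how the library treats them.

```java
import java.util.Arrays;
import java.util.List;

final class NgramSketch {
    /** Joins items start..end-1 with single spaces (end itself is excluded). */
    static <T> String ngram(List<T> list, int start, int end) {
        StringBuilder sb = new StringBuilder();
        List<T> window = list.subList(start, end); // throws on invalid ranges
        for (int i = 0; i < window.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(window.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> sentence = Arrays.asList("the", "quick", "brown", "fox");
        System.out.println(ngram(sentence, 1, 3)); // quick brown
        System.out.println(ngram(sentence, 0, 4)); // the quick brown fox
    }
}
```

With this convention the n-gram length is always end - start, and passing the list size as end reaches the last token.
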