public class LabeledChunkIdentifier
extends java.lang.Object
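Identifies chunks of tokens based on their labels. Labels have the form <tag>-<type>, where the tag is a prefix indicating where in the chunk the token is, and the type names the kind of chunk (PER or ORG in the example below).
Supports various encodings: IO, IOB, IOE, BILOU, SBEIO, []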
Example: Bill gave Xerox Bank of America shares

Encoding | Bill | gave | Xerox | Bank | of | America | shares |
---|---|---|---|---|---|---|---|
IO | I-PER | O | I-ORG | I-ORG | I-ORG | I-ORG | O |
IOB1 | I-PER | O | I-ORG | B-ORG | I-ORG | I-ORG | O |
IOB2 | B-PER | O | B-ORG | B-ORG | I-ORG | I-ORG | O |
IOE1 | I-PER | O | E-ORG | I-ORG | I-ORG | I-ORG | O |
IOE2 | E-PER | O | E-ORG | I-ORG | I-ORG | E-ORG | O |
BILOU | U-PER | O | U-ORG | B-ORG | I-ORG | L-ORG | O |
SBEIO | S-PER | O | S-ORG | B-ORG | I-ORG | E-ORG | O |

Modifier and Type | Class and Description |
---|---|
static class | LabeledChunkIdentifier.LabelTagType Class representing a label, tag and type. |
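To make the label/tag/type triple concrete, here is a minimal sketch of splitting a label such as B-ORG into its parts. `LabelParts` and `parse` are hypothetical names introduced for illustration; this is not the source of `LabelTagType`, and the handling of a bare label such as `O` is an assumption.

```java
// Illustrative only: LabelParts is a hypothetical stand-in, not CoreNLP's LabelTagType.
final class LabelParts {
    final String label; // full label, e.g. "B-ORG"
    final String tag;   // position prefix, e.g. "B"
    final String type;  // chunk type, e.g. "ORG"

    LabelParts(String label, String tag, String type) {
        this.label = label;
        this.tag = tag;
        this.type = type;
    }

    /**
     * Splits a label at the first '-'.
     * Assumption: a bare label such as "O" is used as both tag and type.
     */
    static LabelParts parse(String label) {
        int dash = label.indexOf('-');
        if (dash < 0) {
            return new LabelParts(label, label, label);
        }
        return new LabelParts(label, label.substring(0, dash), label.substring(dash + 1));
    }

    public static void main(String[] args) {
        for (String label : new String[] {"B-ORG", "U-PER", "O"}) {
            LabelParts p = parse(label);
            System.out.println(p.label + " -> tag=" + p.tag + ", type=" + p.type);
        }
    }
}
```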
Constructor and Description |
---|
LabeledChunkIdentifier() |
Modifier and Type | Method and Description |
---|---|
java.util.List<CoreMap> | getAnnotatedChunks(java.util.List<CoreLabel> tokens, int totalTokensOffset, java.lang.Class textKey, java.lang.Class labelKey) Find and annotate chunks. |
java.util.List<CoreMap> | getAnnotatedChunks(java.util.List<CoreLabel> tokens, int totalTokensOffset, java.lang.Class textKey, java.lang.Class labelKey, java.lang.Class tokenChunkKey, java.lang.Class tokenLabelKey) |
java.util.List<CoreMap> | getAnnotatedChunks(java.util.List<CoreLabel> tokens, int totalTokensOffset, java.lang.Class textKey, java.lang.Class labelKey, java.lang.Class tokenChunkKey, java.lang.Class tokenLabelKey, java.util.function.Predicate<Pair<CoreLabel,CoreLabel>> checkTokensCompatible) Find and annotate chunks. |
java.util.List<CoreMap> | getAnnotatedChunks(java.util.List<CoreLabel> tokens, int totalTokensOffset, java.lang.Class textKey, java.lang.Class labelKey, java.util.function.Predicate<Pair<CoreLabel,CoreLabel>> checkTokensCompatible) |
java.lang.String | getDefaultNegTag() |
java.lang.String | getDefaultPosTag() |
java.lang.String | getNegLabel() |
LabeledChunkIdentifier.LabelTagType | getTagType(java.lang.String label) |
static boolean | isEndOfChunk(LabeledChunkIdentifier.LabelTagType prev, LabeledChunkIdentifier.LabelTagType cur) Returns whether a chunk ended between the previous and current token. |
boolean | isIgnoreProvidedTag() |
static boolean | isStartOfChunk(LabeledChunkIdentifier.LabelTagType prev, LabeledChunkIdentifier.LabelTagType cur) Returns whether a chunk started between the previous and current token. |
void | setDefaultNegTag(java.lang.String defaultNegTag) |
void | setDefaultPosTag(java.lang.String defaultPosTag) |
void | setIgnoreProvidedTag(boolean ignoreProvidedTag) |
void | setNegLabel(java.lang.String negLabel) |
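The idea behind the getAnnotatedChunks overloads can be illustrated without the library. The sketch below is not the CoreNLP implementation: it works on plain strings, assumes well-formed IOB2 labels (B-X, I-X, O), assumes that totalTokensOffset is simply added to the token indices of each chunk, and assumes that a failed compatibility check splits a chunk. `AnnotatedChunkSketch`, `Chunk` and `findChunks` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Illustrative only: a plain-string analogue of chunk finding, not CoreNLP code.
final class AnnotatedChunkSketch {
    /** Hypothetical stand-in for the CoreMap that would hold one chunk. */
    static final class Chunk {
        final String text;     // chunk tokens joined by a space
        final String type;     // chunk type, e.g. "ORG"
        final int tokenBegin;  // inclusive, shifted by the offset
        final int tokenEnd;    // exclusive, shifted by the offset

        Chunk(String text, String type, int tokenBegin, int tokenEnd) {
            this.text = text;
            this.type = type;
            this.tokenBegin = tokenBegin;
            this.tokenEnd = tokenEnd;
        }
    }

    /** Assumes IOB2 labels; compatible may be null, meaning no extra check. */
    static List<Chunk> findChunks(List<String> words, List<String> labels,
                                  int totalTokensOffset,
                                  BiPredicate<String, String> compatible) {
        List<Chunk> chunks = new ArrayList<>();
        int start = -1;        // first token of the open chunk, or -1 if none
        String openType = null;
        // One extra iteration with a sentinel "O" closes a chunk that runs to the end.
        for (int i = 0; i <= words.size(); i++) {
            String label = i < words.size() ? labels.get(i) : "O";
            boolean inside = label.startsWith("I-") && start >= 0
                    && label.substring(2).equals(openType)
                    && (compatible == null || compatible.test(words.get(i - 1), words.get(i)));
            if (start >= 0 && !inside) {
                chunks.add(new Chunk(String.join(" ", words.subList(start, i)), openType,
                        start + totalTokensOffset, i + totalTokensOffset));
                start = -1;
            }
            if (!label.equals("O") && !inside) {
                start = i;
                openType = label.substring(2);
            }
        }
        return chunks;
    }
}
```

Passing a predicate such as `(prev, cur) -> !cur.equals("of")` forces a chunk break before "of", which is one way to picture what an extra token compatibility check can do.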
public java.util.List<CoreMap> getAnnotatedChunks(java.util.List<CoreLabel> tokens, int totalTokensOffset, java.lang.Class textKey, java.lang.Class labelKey)
Parameters:
tokens - List of tokens in which to look for chunks
totalTokensOffset - Index of tokens to offset by
textKey - Key to use to find the token text
labelKey - Key to use to find the token label (to determine if inside chunk or not)
public java.util.List<CoreMap> getAnnotatedChunks(java.util.List<CoreLabel> tokens, int totalTokensOffset, java.lang.Class textKey, java.lang.Class labelKey, java.util.function.Predicate<Pair<CoreLabel,CoreLabel>> checkTokensCompatible)
public java.util.List<CoreMap> getAnnotatedChunks(java.util.List<CoreLabel> tokens, int totalTokensOffset, java.lang.Class textKey, java.lang.Class labelKey, java.lang.Class tokenChunkKey, java.lang.Class tokenLabelKey)
public java.util.List<CoreMap> getAnnotatedChunks(java.util.List<CoreLabel> tokens, int totalTokensOffset, java.lang.Class textKey, java.lang.Class labelKey, java.lang.Class tokenChunkKey, java.lang.Class tokenLabelKey, java.util.function.Predicate<Pair<CoreLabel,CoreLabel>> checkTokensCompatible)
Parameters:
tokens - List of tokens in which to look for chunks
totalTokensOffset - Index of tokens to offset by
textKey - Key to use to find the token text
labelKey - Key to use to find the token label (to determine if inside chunk or not)
tokenChunkKey - If not null, each token is annotated with the chunk using this key
tokenLabelKey - If not null, each token is annotated with the text associated with the chunk using this key
checkTokensCompatible - If not null, additional check to see if this token and the previous are compatible
public static boolean isEndOfChunk(LabeledChunkIdentifier.LabelTagType prev, LabeledChunkIdentifier.LabelTagType cur)
Parameters:
prev - the label/tag/type of the previous token
cur - the label/tag/type of the current token
public static boolean isStartOfChunk(LabeledChunkIdentifier.LabelTagType prev, LabeledChunkIdentifier.LabelTagType cur)
Parameters:
prev - the label/tag/type of the previous token
cur - the label/tag/type of the current token
public LabeledChunkIdentifier.LabelTagType getTagType(java.lang.String label)
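As a rough illustration of how start and end tests on a (previous, current) pair can recover chunks across the encodings in the class example, here is a self-contained sketch. The rules are a simplified, conlleval-style guess and are not taken from the isStartOfChunk or isEndOfChunk source; `ChunkBoundarySketch` and all of its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: simplified boundary rules, not the CoreNLP implementation.
final class ChunkBoundarySketch {
    /** Splits "B-ORG" into {"B", "ORG"}; a bare label such as "O" becomes {"O", "O"}. */
    static String[] split(String label) {
        int dash = label.indexOf('-');
        return dash < 0
                ? new String[] {label, label}
                : new String[] {label.substring(0, dash), label.substring(dash + 1)};
    }

    /** Tags that mark the last token of a chunk (E, L) or a single-token chunk (S, U). */
    static boolean closesChunk(String tag) {
        return tag.equals("E") || tag.equals("L") || tag.equals("S") || tag.equals("U");
    }

    /** Tags that mark the first token of a chunk (B) or a single-token chunk (S, U). */
    static boolean opensChunk(String tag) {
        return tag.equals("B") || tag.equals("S") || tag.equals("U");
    }

    /** True if a chunk ended between prev and cur; prev is null at the first token. */
    static boolean isEnd(String[] prev, String[] cur) {
        if (prev == null || prev[0].equals("O")) return false;
        return closesChunk(prev[0]) || cur[0].equals("O")
                || opensChunk(cur[0]) || !prev[1].equals(cur[1]);
    }

    /** True if a chunk started at cur; prev is null at the first token. */
    static boolean isStart(String[] prev, String[] cur) {
        if (cur[0].equals("O")) return false;
        if (prev == null || prev[0].equals("O")) return true;
        return opensChunk(cur[0]) || closesChunk(prev[0]) || !prev[1].equals(cur[1]);
    }

    /** Returns chunks as strings of the form TYPE[begin,end), with end exclusive. */
    static List<String> chunks(String... labels) {
        List<String> out = new ArrayList<>();
        String[] prev = null;
        int start = -1; // first token of the open chunk, or -1 if none
        for (int i = 0; i < labels.length; i++) {
            String[] cur = split(labels[i]);
            if (start >= 0 && isEnd(prev, cur)) {
                out.add(prev[1] + "[" + start + "," + i + ")");
                start = -1;
            }
            if (isStart(prev, cur)) start = i;
            prev = cur;
        }
        if (start >= 0) out.add(prev[1] + "[" + start + "," + labels.length + ")");
        return out;
    }
}
```

Under these rules every row of the example yields the same three chunks, except IO, which has no way to separate Xerox from Bank of America and so yields two.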
public java.lang.String getDefaultPosTag()
public void setDefaultPosTag(java.lang.String defaultPosTag)
public java.lang.String getDefaultNegTag()
public void setDefaultNegTag(java.lang.String defaultNegTag)
public java.lang.String getNegLabel()
public void setNegLabel(java.lang.String negLabel)
public boolean isIgnoreProvidedTag()
public void setIgnoreProvidedTag(boolean ignoreProvidedTag)
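The accessors above suggest that labels without a usable tag prefix fall back to default tags. The sketch below shows one plausible reading and is an assumption throughout: the default values "I" and "O", the fallback rules, and the effect of ignoreProvidedTag are not stated in this documentation. `TagDefaultsSketch` and `tagAndType` are hypothetical names, and the fields only mirror the property names.

```java
// Illustrative only: one plausible interpretation of the properties, not CoreNLP code.
final class TagDefaultsSketch {
    // Assumed default values; the documentation above does not state them.
    String defaultPosTag = "I";
    String defaultNegTag = "O";
    String negLabel = "O";
    boolean ignoreProvidedTag = false;

    /** Returns {tag, type} for a label, filling in a default tag when none is used. */
    String[] tagAndType(String label) {
        if (label == null) {
            // Assumption: a missing label is treated as the negative label.
            return new String[] {defaultNegTag, negLabel};
        }
        int dash = label.indexOf('-');
        String type = dash >= 0 ? label.substring(dash + 1) : label;
        String tag;
        if (dash >= 0 && !ignoreProvidedTag) {
            tag = label.substring(0, dash); // use the prefix that came with the label
        } else {
            // No prefix, or prefixes are ignored: choose a default based on the type.
            tag = negLabel.equals(type) ? defaultNegTag : defaultPosTag;
        }
        return new String[] {tag, type};
    }
}
```

With this reading, a tagger that emits bare labels such as PERSON and O is handled like IO encoding, and setting ignoreProvidedTag to true makes a B-ORG label behave like I-ORG.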