public class DefaultLexicalMapper extends java.lang.Object implements Mapper, java.io.Serializable
Modifier and Type | Field and Description |
---|---|
java.util.regex.Pattern | arabicDigit |
java.util.regex.Pattern | arabicPunc |
java.util.regex.Pattern | latinPunc |
java.util.regex.Pattern | segmentationMarker |
Constructor and Description |
---|
DefaultLexicalMapper() |
Modifier and Type | Method and Description |
---|---|
boolean | canChangeEncoding(java.lang.String parent, java.lang.String element) Indicates whether child can be converted to another encoding. |
static void | main(java.lang.String[] args) |
java.lang.String | map(java.lang.String parent, java.lang.String element) Maps from one string representation to another. |
void | setup(java.io.File path, java.lang.String... options) Perform initialization prior to the first call to map. |
public final java.util.regex.Pattern latinPunc
public final java.util.regex.Pattern arabicPunc
public final java.util.regex.Pattern arabicDigit
public final java.util.regex.Pattern segmentationMarker
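The four fields above are ordinary public `java.util.regex.Pattern` instances, so callers can apply them directly to tokens. The sketch below shows one way such fields might be used. The class `PatternFieldSketch`, its helper methods, and both regular expressions are assumptions made for illustration; this page does not show the expressions that DefaultLexicalMapper actually compiles.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for two of the Pattern fields listed above.
// The real expressions are not documented on this page.
class PatternFieldSketch {
    // Assumption: one or more Arabic-Indic digits (U+0660 through U+0669).
    static final Pattern arabicDigit = Pattern.compile("[\u0660-\u0669]+");

    // Assumption: a '+' at the start or end of a token marks a segmentation boundary.
    static final Pattern segmentationMarker = Pattern.compile("^\\+|\\+$");

    // True only if the whole token consists of Arabic-Indic digits.
    static boolean isArabicNumber(String token) {
        return arabicDigit.matcher(token).matches();
    }

    // Removes a leading or trailing '+' and leaves interior characters alone.
    static String stripMarkers(String token) {
        return segmentationMarker.matcher(token).replaceAll("");
    }
}
```

Because `Pattern` objects are immutable and thread-safe, exposing them as `public final` fields lets several callers share one compiled expression.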
public java.lang.String map(java.lang.String parent, java.lang.String element)
Description copied from interface: Mapper
Maps from one string representation to another.
public void setup(java.io.File path, java.lang.String... options)
Description copied from interface: Mapper
Perform initialization prior to the first call to map.

public boolean canChangeEncoding(java.lang.String parent, java.lang.String element)
Description copied from interface: Mapper
Indicates whether child can be converted to another encoding. In the ATB, for example, if a punctuation character is labeled with the "PUNC" POS tag, then that character should not be converted from Buckwalter to UTF-8.
Specified by: canChangeEncoding in interface Mapper
Parameters:
parent - element's context (e.g., the parent node in a parse tree)
element - The string to be transformed.

public static void main(java.lang.String[] args)
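
The method descriptions imply a call order: `setup` once, then `map` per token, with `canChangeEncoding` consulted before any encoding conversion. The following is a minimal sketch of that contract and not the library's implementation. The interface `MapperSketch`, the class `ToyMapper`, the option name `"StripPlus"`, and the caller-supplied `converter` are all hypothetical names introduced here; only the three method signatures and the "PUNC" rule come from the text above.

```java
import java.io.File;
import java.util.function.UnaryOperator;

// Hypothetical mirror of the three instance methods documented above.
interface MapperSketch {
    void setup(File path, String... options);
    boolean canChangeEncoding(String parent, String element);
    String map(String parent, String element);
}

// Toy implementation; the real mapping rules are not shown on this page.
class ToyMapper implements MapperSketch {
    private boolean stripPlus = false;

    @Override
    public void setup(File path, String... options) {
        // Assumption: options are plain flag names; "StripPlus" is invented for this sketch.
        for (String opt : options) {
            if (opt.equals("StripPlus")) {
                stripPlus = true;
            }
        }
    }

    @Override
    public boolean canChangeEncoding(String parent, String element) {
        // Follows the ATB example: tokens tagged "PUNC" keep their encoding.
        return !"PUNC".equals(parent);
    }

    @Override
    public String map(String parent, String element) {
        // Toy rule: optionally drop '+' characters, otherwise return the input unchanged.
        return stripPlus ? element.replace("+", "") : element;
    }
}

class MapperUsageSketch {
    // Maps a token, then applies the converter only when the mapper allows it.
    static String process(MapperSketch mapper, UnaryOperator<String> converter,
                          String tag, String token) {
        String mapped = mapper.map(tag, token);
        return mapper.canChangeEncoding(tag, mapped) ? converter.apply(mapped) : mapped;
    }
}
```

Passing the converter in as a `UnaryOperator<String>` keeps the sketch independent of any particular Buckwalter-to-UTF-8 routine, which this page does not describe.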