public class IBMArabicEscaper extends java.lang.Object implements java.util.function.Function<java.util.List<HasWord>,java.util.List<HasWord>>
Escaper for input that is to be parsed by LexicalizedParser. It performs several functions, including normalization consistent with ArabicTreeNormalizer. The class implements Function<List<HasWord>, List<HasWord>> in order to run with the parser.

Constructor and Description |
---|
IBMArabicEscaper() |
IBMArabicEscaper(boolean annoteAndClassOnly) |
Modifier and Type | Method and Description |
---|---|
java.util.List<HasWord> | apply(java.util.List<HasWord> sentence): Converts an input list of HasWord in IBM Arabic to LDC ATBv3 representation. |
java.lang.String | apply(java.lang.String w): Applies escaping to a single word. |
void | disableWarnings(): Disable warnings generated when tokens are escaped. |
static void | main(java.lang.String[] args): This main method preprocesses one-sentence-per-line input, making the same changes as the Function. |
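
The two apply overloads in the table above (one per sentence, one per word) can be illustrated with a minimal, self-contained sketch. It is not the IBMArabicEscaper source: SketchEscaper is a hypothetical class, plain String stands in for HasWord, and the bracket mapping is a placeholder rule chosen only to make the example concrete. The defensive copy and the exception on an emptied word follow the behavior described in the method documentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for illustration; not the IBMArabicEscaper source.
// String replaces HasWord so the sketch needs no external library.
final class SketchEscaper implements Function<List<String>, List<String>> {

  // Escapes a single word. The bracket mapping is a placeholder rule.
  public String apply(String w) {
    String escaped;
    if (w.equals("(")) {
      escaped = "-LRB-";
    } else if (w.equals(")")) {
      escaped = "-RRB-";
    } else {
      escaped = w.trim();
    }
    if (escaped.isEmpty()) {
      // Assumed failure mode: a word must never be reduced to nothing.
      throw new RuntimeException("Word was nullified by escaping: '" + w + "'");
    }
    return escaped;
  }

  // Escapes a whole sentence without mutating the caller's list.
  @Override
  public List<String> apply(List<String> sentence) {
    List<String> copy = new ArrayList<>(sentence); // copy before escaping
    for (int i = 0; i < copy.size(); i++) {
      copy.set(i, apply(copy.get(i)));
    }
    return copy;
  }
}
```

Implementing java.util.function.Function lets a caller that only knows the interface invoke the escaper on a sentence, while the String overload remains available for single tokens.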
public IBMArabicEscaper()
public IBMArabicEscaper(boolean annoteAndClassOnly)
public void disableWarnings()
public java.util.List<HasWord> apply(java.util.List<HasWord> sentence)
Converts an input list of HasWord in IBM Arabic to LDC ATBv3 representation. The method safely copies the input object prior to escaping.

public java.lang.String apply(java.lang.String w)
Parameters:
w - The word
Throws:
java.lang.RuntimeException - If a word is nullified (which is really bad for the parser and for MT)

public static void main(java.lang.String[] args) throws java.io.IOException
By default, output is written to files with the same names as the input files, but with .sent appended to their names. If you give the flag -f then output is instead sent to stdout. Input and output are always in UTF-8.
Parameters:
args - A list of filenames. The files must be UTF-8 encoded.
Throws:
java.io.IOException - If an I/O problem occurs
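
The command-line behavior described for main can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: SentPreprocessorSketch and its run method are hypothetical names, the lineFn parameter stands in for the escaper, and the sketch assumes that -f affects only the files listed after it.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of a one-sentence-per-line preprocessor.
final class SentPreprocessorSketch {

  // lineFn stands in for the escaping step applied to each line.
  static void run(String[] args, UnaryOperator<String> lineFn) throws IOException {
    boolean toStdout = false;
    PrintStream stdout = new PrintStream(System.out, true, "UTF-8");
    for (String arg : args) {
      if (arg.equals("-f")) {
        // Assumption: the flag applies to the files that follow it.
        toStdout = true;
        continue;
      }
      // Read the input file as UTF-8, one sentence per line.
      List<String> escaped = new ArrayList<>();
      for (String line : Files.readAllLines(Paths.get(arg), StandardCharsets.UTF_8)) {
        escaped.add(lineFn.apply(line));
      }
      if (toStdout) {
        for (String line : escaped) {
          stdout.println(line);
        }
      } else {
        // Default: write next to the input, with ".sent" appended to its name.
        Files.write(Paths.get(arg + ".sent"), escaped, StandardCharsets.UTF_8);
      }
    }
  }
}
```

Calling run with a single filename such as corpus.txt would, under these assumptions, produce corpus.txt.sent alongside it; placing -f before the filename would print the processed lines to stdout instead.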