public abstract class AbstractTreebankLanguagePack extends java.lang.Object implements TreebankLanguagePack
Modifier and Type | Field and Description |
---|---|
static java.lang.String | DEFAULT_ENCODING: Use this as the default encoding for Readers and Writers of Treebank data. |
protected static char | DEFAULT_GF_CHAR |
protected boolean | generateOriginalDependencies: For languages where a Universal Dependency converter exists, this variable determines whether the original or the Universal converter will be used. |
protected char | gfCharacter: Default character for indicating that something is a grammatical function; probably should be overridden by language-specific ones. |
Constructor and Description |
---|
AbstractTreebankLanguagePack(): Gives a handle to the TreebankLanguagePack. |
AbstractTreebankLanguagePack(char gfChar): Gives a handle to the TreebankLanguagePack. |
Modifier and Type | Method and Description |
---|---|
java.lang.String | basicCategory(java.lang.String category): Returns the basic syntactic category of a String. |
java.lang.String | categoryAndFunction(java.lang.String category): Returns the syntactic category and 'function' of a String. |
java.util.function.Predicate<java.lang.String> | evalBIgnoredPunctuationTagAcceptFilter(): Returns a filter that accepts a String that is a punctuation tag that should be ignored by EVALB-style evaluation, and rejects everything else. |
java.util.function.Predicate<java.lang.String> | evalBIgnoredPunctuationTagRejectFilter(): Returns a filter that accepts everything except a String that is a punctuation tag that should be ignored by EVALB-style evaluation. |
java.lang.String[] | evalBIgnoredPunctuationTags(): Returns a String array of punctuation tags that EVALB-style evaluation should ignore for this treebank/language. |
boolean | generateOriginalDependencies(): Used for languages where an original Stanford Dependency converter and a Universal Dependency converter exist. |
java.util.function.Function<java.lang.String,java.lang.String> | getBasicCategoryFunction(): Returns a Function object that maps Strings to Strings according to this TreebankLanguagePack's basicCategory() method. |
java.util.function.Function<java.lang.String,java.lang.String> | getCategoryAndFunctionFunction(): Returns a Function object that maps Strings to Strings according to this TreebankLanguagePack's categoryAndFunction() method. |
java.lang.String | getEncoding(): Returns the input Charset encoding for the Treebank. |
char | getGfCharacter() |
TokenizerFactory<? extends HasWord> | getTokenizerFactory(): Returns a tokenizer factory which might be suitable for tokenizing text that will be used with this Treebank/Language pair, without tokenizing carriage returns (i.e., treating them as white space). |
GrammaticalStructureFactory | grammaticalStructureFactory(): Returns a GrammaticalStructureFactory suitable for this language/treebank. |
GrammaticalStructureFactory | grammaticalStructureFactory(java.util.function.Predicate<java.lang.String> puncFilt): Returns a GrammaticalStructureFactory suitable for this language/treebank. |
GrammaticalStructureFactory | grammaticalStructureFactory(java.util.function.Predicate<java.lang.String> puncFilt, HeadFinder typedDependencyHeadFinder): Returns a GrammaticalStructureFactory suitable for this language/treebank. |
abstract HeadFinder | headFinder(): The HeadFinder to use for your treebank. |
boolean | isEvalBIgnoredPunctuationTag(java.lang.String str): Accepts a String that is a punctuation tag that should be ignored by EVALB-style evaluation, and rejects everything else. |
boolean | isLabelAnnotationIntroducingCharacter(char ch): Says whether this character is an annotation-introducing character. |
boolean | isPunctuationTag(java.lang.String str): Accepts a String that is a punctuation tag name, and rejects everything else. |
boolean | isPunctuationWord(java.lang.String str): Accepts a String that is a punctuation word, and rejects everything else. |
boolean | isSentenceFinalPunctuationTag(java.lang.String str): Accepts a String that is a sentence-final punctuation tag, and rejects everything else. |
boolean | isStartSymbol(java.lang.String str): Accepts a String that is a start symbol of the treebank. |
char[] | labelAnnotationIntroducingCharacters(): Returns an array of characters at which a String should be truncated to give the basic syntactic category of a label. |
MorphoFeatureSpecification | morphFeatureSpec(): Returns a morphological feature specification for words in this language. |
java.util.function.Predicate<java.lang.String> | punctuationTagAcceptFilter(): Returns a filter that accepts a String that is a punctuation tag name, and rejects everything else. |
java.util.function.Predicate<java.lang.String> | punctuationTagRejectFilter(): Returns a filter that rejects a String that is a punctuation tag name, and accepts everything else. |
abstract java.lang.String[] | punctuationTags(): Returns a String array of punctuation tags for this treebank/language. |
java.util.function.Predicate<java.lang.String> | punctuationWordAcceptFilter(): Returns a filter that accepts a String that is a punctuation word, and rejects everything else. |
java.util.function.Predicate<java.lang.String> | punctuationWordRejectFilter(): Returns a filter that accepts a String that is not a punctuation word, and rejects punctuation. |
abstract java.lang.String[] | punctuationWords(): Returns a String array of punctuation words for this treebank/language. |
java.util.function.Predicate<java.lang.String> | sentenceFinalPunctuationTagAcceptFilter(): Returns a filter that accepts a String that is a sentence-final punctuation tag, and rejects everything else. |
abstract java.lang.String[] | sentenceFinalPunctuationTags(): Returns a String array of sentence-final punctuation tags for this treebank/language. |
abstract java.lang.String[] | sentenceFinalPunctuationWords(): Returns a String array of sentence-final punctuation words for this treebank/language. |
void | setGenerateOriginalDependencies(boolean generateOriginalDependencies): Used for languages where an original Stanford Dependency converter and a Universal Dependency converter exist. |
void | setGfCharacter(char gfCharacter): Sets the grammatical function indicating character to gfCharacter. |
java.lang.String | startSymbol(): Returns a String which is the first (perhaps unique) start symbol of the treebank, or null if none is defined. |
java.util.function.Predicate<java.lang.String> | startSymbolAcceptFilter(): Returns a filter that accepts a String that is a start symbol of the treebank, and rejects everything else. |
abstract java.lang.String[] | startSymbols(): Returns a String array of treebank start symbols. |
java.lang.String | stripGF(java.lang.String category): Returns the category for a String with everything following the gf character (which may be language specific) stripped. |
boolean | supportsGrammaticalStructures(): Whether or not we have typed dependencies for this language. |
abstract java.lang.String | treebankFileExtension(): Returns the extension of treebank files for this treebank. |
TreeReaderFactory | treeReaderFactory(): Returns a TreeReaderFactory suitable for general purpose use with this language/treebank. |
TokenizerFactory<Tree> | treeTokenizerFactory(): Returns a TokenizerFactory for Trees of this language/treebank. |
abstract HeadFinder | typedDependencyHeadFinder(): The HeadFinder to use when making typed dependencies. |
protected char gfCharacter
protected static final char DEFAULT_GF_CHAR
public static final java.lang.String DEFAULT_ENCODING
protected boolean generateOriginalDependencies
public AbstractTreebankLanguagePack()
public AbstractTreebankLanguagePack(char gfChar)
gfChar - The character that sets off grammatical functions in node labels.
public abstract java.lang.String[] punctuationTags()
punctuationTags
in interface TreebankLanguagePack
public abstract java.lang.String[] punctuationWords()
punctuationWords
in interface TreebankLanguagePack
public abstract java.lang.String[] sentenceFinalPunctuationTags()
sentenceFinalPunctuationTags
in interface TreebankLanguagePack
public abstract java.lang.String[] sentenceFinalPunctuationWords()
sentenceFinalPunctuationWords
in interface TreebankLanguagePack
public java.lang.String[] evalBIgnoredPunctuationTags()
evalBIgnoredPunctuationTags
in interface TreebankLanguagePack
public boolean isPunctuationTag(java.lang.String str)
isPunctuationTag
in interface TreebankLanguagePack
str - The string to check
public boolean isPunctuationWord(java.lang.String str)
isPunctuationWord
in interface TreebankLanguagePack
str - The string to check
public boolean isSentenceFinalPunctuationTag(java.lang.String str)
isSentenceFinalPunctuationTag
in interface TreebankLanguagePack
str - The string to check
public boolean isEvalBIgnoredPunctuationTag(java.lang.String str)
isEvalBIgnoredPunctuationTag
in interface TreebankLanguagePack
str - The string to check
public java.util.function.Predicate<java.lang.String> punctuationTagAcceptFilter()
punctuationTagAcceptFilter
in interface TreebankLanguagePack
public java.util.function.Predicate<java.lang.String> punctuationTagRejectFilter()
punctuationTagRejectFilter
in interface TreebankLanguagePack
public java.util.function.Predicate<java.lang.String> punctuationWordAcceptFilter()
punctuationWordAcceptFilter
in interface TreebankLanguagePack
public java.util.function.Predicate<java.lang.String> punctuationWordRejectFilter()
punctuationWordRejectFilter
in interface TreebankLanguagePack
public java.util.function.Predicate<java.lang.String> sentenceFinalPunctuationTagAcceptFilter()
sentenceFinalPunctuationTagAcceptFilter
in interface TreebankLanguagePack
public java.util.function.Predicate<java.lang.String> evalBIgnoredPunctuationTagAcceptFilter()
evalBIgnoredPunctuationTagAcceptFilter
in interface TreebankLanguagePack
public java.util.function.Predicate<java.lang.String> evalBIgnoredPunctuationTagRejectFilter()
evalBIgnoredPunctuationTagRejectFilter
in interface TreebankLanguagePack
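The accept and reject filters listed above come in complementary pairs. The following is a minimal sketch of that relationship, not the library's implementation: the class name PunctuationFilterSketch and the tag values are hypothetical, and a real language pack would supply its own tags through punctuationTags().

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in used only for illustration.
final class PunctuationFilterSketch {
  // Illustrative tag values (an assumption, not taken from any real language pack).
  private static final String[] PUNCTUATION_TAGS = {".", ",", ":"};

  private static final Set<String> TAG_SET = new HashSet<>(Arrays.asList(PUNCTUATION_TAGS));

  // Accepts a String only if it is one of the punctuation tags.
  static Predicate<String> acceptFilter() {
    return TAG_SET::contains;
  }

  // Accepts everything except punctuation tags: the negation of the accept filter.
  static Predicate<String> rejectFilter() {
    return acceptFilter().negate();
  }
}
```

Building the reject filter with Predicate.negate() keeps the two filters consistent by construction, so a tag can never be accepted or rejected by both.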
public java.lang.String getEncoding()
Returns the input Charset encoding for the Treebank. See the documentation for the Charset class.
getEncoding
in interface TreebankLanguagePack
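Because getEncoding() returns the encoding as a String name, a caller would typically resolve it to a Charset before opening a Reader. A minimal sketch using only the JDK follows; EncodingSketch and firstLine are hypothetical names, and the encodingName parameter stands in for whatever a language pack's getEncoding() returns.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper used only for illustration.
final class EncodingSketch {
  // Reads the first line of a stream, decoding its bytes with the named charset.
  static String firstLine(InputStream in, String encodingName) {
    // Charset.forName throws an unchecked exception if the name is illegal or unsupported.
    Charset charset = Charset.forName(encodingName);
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
      return reader.readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```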
public char[] labelAnnotationIntroducingCharacters()
labelAnnotationIntroducingCharacters
in interface TreebankLanguagePack
public java.lang.String basicCategory(java.lang.String category)
Returns the basic syntactic category of a String. This implementation truncates the label at one of the labelAnnotationIntroducingCharacters(). However, there is also special-case handling for labelAnnotationIntroducingCharacters in category labels: (i) if the first char is in this set, it is never truncated (e.g., '-' or '=' as a token), and (ii) if the label starts with one of this set, a second instance of the same item from this set is also excluded (to deal with '-LRB-', '-RCB-', etc.).
basicCategory
in interface TreebankLanguagePack
category - The whole String name of the label
public java.lang.String stripGF(java.lang.String category)
stripGF
in interface TreebankLanguagePack
category - The String name of the label (may previously have had basic category called on it)
public java.util.function.Function<java.lang.String,java.lang.String> getBasicCategoryFunction()
Returns a Function object that maps Strings to Strings according to this TreebankLanguagePack's basicCategory() method.
getBasicCategoryFunction
in interface TreebankLanguagePack
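The truncation rule described for basicCategory() above, including its two special cases, can be sketched as follows. This is an illustrative reading of the description, not the library's code: BasicCategorySketch is a hypothetical class, and the two annotation-introducing characters are an assumption, since each language pack defines its own set.

```java
// Hypothetical stand-in used only for illustration.
final class BasicCategorySketch {
  // Assumed annotation-introducing characters; real language packs define their own.
  private static final char[] ANNOTATION_CHARS = {'-', '='};

  static boolean isAnnotationChar(char ch) {
    for (char c : ANNOTATION_CHARS) {
      if (c == ch) {
        return true;
      }
    }
    return false;
  }

  // Truncates a label at the first annotation-introducing character, except that
  // (i) such a character in first position is kept, and
  // (ii) a second instance of that same leading character is kept too.
  static String basicCategory(String category) {
    if (category == null) {
      return null;
    }
    boolean leadingOpen = false; // true while a leading annotation char is unmatched
    char leading = '\u0000';
    int end = 0;
    for (; end < category.length(); end++) {
      char ch = category.charAt(end);
      if (!isAnnotationChar(ch)) {
        continue;
      }
      if (end == 0) {
        leadingOpen = true;
        leading = ch;
      } else if (leadingOpen && ch == leading) {
        leadingOpen = false;
      } else {
        break; // ordinary annotation: cut the label here
      }
    }
    return category.substring(0, end);
  }
}
```

Under these assumptions, a label such as NP-SBJ is cut back to NP, while bracket tokens such as -LRB- and a bare '-' token pass through unchanged.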
public java.lang.String categoryAndFunction(java.lang.String category)
Returns the syntactic category and 'function' of a String, perhaps in the form category-function.
categoryAndFunction
in interface TreebankLanguagePack
category - The whole String name of the label
public java.util.function.Function<java.lang.String,java.lang.String> getCategoryAndFunctionFunction()
Returns a Function object that maps Strings to Strings according to this TreebankLanguagePack's categoryAndFunction() method.
getCategoryAndFunctionFunction
in interface TreebankLanguagePack
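Of the label-normalizing methods above, stripGF() is the simplest to illustrate: it drops the grammatical-function marking introduced by the gf character. The sketch below is hypothetical (StripGfSketch is not a library class). It assumes the cut is made at the last occurrence of the character and that a label with the character only in first position is left alone; neither detail is stated in this page.

```java
// Hypothetical stand-in used only for illustration.
final class StripGfSketch {
  private char gfCharacter;

  StripGfSketch(char gfCharacter) {
    this.gfCharacter = gfCharacter;
  }

  // Mirrors the idea of setGfCharacter(): a language can choose its own marker.
  void setGfCharacter(char gfCharacter) {
    this.gfCharacter = gfCharacter;
  }

  // Removes the gf character and everything after it.
  // Assumption: cut at the last occurrence, and only if it is not at index 0.
  String stripGF(String category) {
    if (category == null) {
      return null;
    }
    int index = category.lastIndexOf(gfCharacter);
    return index > 0 ? category.substring(0, index) : category;
  }
}
```

With '-' as the marker, NP-SBJ becomes NP; after switching the marker to another character, the same label is returned untouched.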
public boolean isLabelAnnotationIntroducingCharacter(char ch)
isLabelAnnotationIntroducingCharacter
in interface TreebankLanguagePack
ch - The character to check
public boolean isStartSymbol(java.lang.String str)
isStartSymbol
in interface TreebankLanguagePack
str - The str to test
public java.util.function.Predicate<java.lang.String> startSymbolAcceptFilter()
startSymbolAcceptFilter
in interface TreebankLanguagePack
public abstract java.lang.String[] startSymbols()
startSymbols
in interface TreebankLanguagePack
public java.lang.String startSymbol()
startSymbol
in interface TreebankLanguagePack
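The relationship between startSymbols(), startSymbol(), and isStartSymbol() can be sketched as below. StartSymbolSketch is a hypothetical class, and passing the array in as a parameter is an assumption made to keep the example self-contained; in the real class the array comes from the abstract startSymbols() method.

```java
// Hypothetical stand-in used only for illustration.
final class StartSymbolSketch {
  // Returns the first start symbol, or null if none is defined.
  static String startSymbol(String[] startSymbols) {
    if (startSymbols == null || startSymbols.length == 0) {
      return null;
    }
    return startSymbols[0];
  }

  // Accepts a String only if it equals one of the start symbols.
  static boolean isStartSymbol(String[] startSymbols, String str) {
    if (startSymbols == null || str == null) {
      return false;
    }
    for (String symbol : startSymbols) {
      if (str.equals(symbol)) {
        return true;
      }
    }
    return false;
  }
}
```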
public abstract java.lang.String treebankFileExtension()
treebankFileExtension
in interface TreebankLanguagePack
public TokenizerFactory<? extends HasWord> getTokenizerFactory()
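Returns a tokenizer factory which might be suitable for tokenizing text that will be used with this Treebank/Language pair, without tokenizing carriage returns (i.e., treating them as white space). The implementation in AbstractTreebankLanguagePack returns a factory for WhitespaceTokenizer.
getTokenizerFactory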
in interface TreebankLanguagePack
public GrammaticalStructureFactory grammaticalStructureFactory()
grammaticalStructureFactory
in interface TreebankLanguagePack
public GrammaticalStructureFactory grammaticalStructureFactory(java.util.function.Predicate<java.lang.String> puncFilt)
grammaticalStructureFactory
in interface TreebankLanguagePack
puncFilt - A filter which should reject punctuation words (as Strings)
public GrammaticalStructureFactory grammaticalStructureFactory(java.util.function.Predicate<java.lang.String> puncFilt, HeadFinder typedDependencyHeadFinder)
grammaticalStructureFactory
in interface TreebankLanguagePack
puncFilt - A filter which should reject punctuation words (as Strings)
typedDependencyHeadFinder - A HeadFinder which finds heads for typed dependencies
public boolean supportsGrammaticalStructures()
supportsGrammaticalStructures
in interface TreebankLanguagePack
public char getGfCharacter()
public void setGfCharacter(char gfCharacter)
setGfCharacter
in interface TreebankLanguagePack
gfCharacter - The character in label names that sets off grammatical function marking (from the phrase label).
public TreeReaderFactory treeReaderFactory()
treeReaderFactory
in interface TreebankLanguagePack
public TokenizerFactory<Tree> treeTokenizerFactory()
treeTokenizerFactory
in interface TreebankLanguagePack
public abstract HeadFinder headFinder()
headFinder
in interface TreebankLanguagePack
public abstract HeadFinder typedDependencyHeadFinder()
typedDependencyHeadFinder
in interface TreebankLanguagePack
public MorphoFeatureSpecification morphFeatureSpec()
morphFeatureSpec
in interface TreebankLanguagePack
public void setGenerateOriginalDependencies(boolean generateOriginalDependencies)
setGenerateOriginalDependencies
in interface TreebankLanguagePack
public boolean generateOriginalDependencies()
generateOriginalDependencies
in interface TreebankLanguagePack