public class TreeNormalizer extends java.lang.Object implements java.io.Serializable
One operation that a TreeNormalizer may wish to perform is interning the Strings passed to it. Can be reused as a Singleton. Designed to be extended.
The TreeNormalizer methods are in two groups. The contract for this class is that first normalizeTerminal or normalizeNonterminal will be called on each String that will be put into a Tree, when they are read from files or otherwise created. Then normalizeWholeTree will be called on the Tree. It normally walks the Tree, making whatever modifications it wishes to. A TreeNormalizer need not make a deep copy of a Tree. It is assumed to be able to work destructively, because afterwards we will only use the normalized Tree.
Implementation note: This is a very old legacy class used in conjunction with PennTreeReader. It now seems that it would be better to move the String normalization into the tokenizer, which would leave just a (possibly destructive) TreeTransformer.
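The two-phase contract described above can be illustrated with a minimal, self-contained sketch. Everything in it is a stand-in invented for illustration: SketchNode is not the library's Tree, SketchNormalizer is not the library's TreeNormalizer, and the normalization policies (upper-casing categories, deleting "-NONE-" subtrees) are assumptions chosen only to make each phase visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Stand-in tree node (hypothetical; NOT the library's Tree class).
final class SketchNode {
    String label;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String label) { this.label = label; }

    boolean isLeaf() { return children.isEmpty(); }

    @Override
    public String toString() {
        if (isLeaf()) return label;
        StringBuilder sb = new StringBuilder("(").append(label);
        for (SketchNode c : children) sb.append(' ').append(c);
        return sb.append(')').toString();
    }
}

// Stand-in normalizer that follows the two-phase contract.
class SketchNormalizer {
    // Phase 1: called on each String before it is put into a node.
    String normalizeTerminal(String leaf) {
        return leaf.intern();
    }

    // Assumed policy: upper-case the category, then intern it.
    String normalizeNonterminal(String category) {
        return category.toUpperCase(Locale.ROOT).intern();
    }

    // Phase 2: called once on the finished tree. Works destructively:
    // the argument itself is modified and returned, no copy is made.
    SketchNode normalizeWholeTree(SketchNode root) {
        prune(root);
        return root;
    }

    // Assumed policy: delete every subtree labelled "-NONE-".
    private void prune(SketchNode node) {
        node.children.removeIf(c -> c.label.equals("-NONE-"));
        for (SketchNode c : node.children) prune(c);
    }
}

final class ContractDemo {
    // Builds (s (np dogs) (-NONE- *) (vp bark)) the way a reader would:
    // labels pass through phase 1 as nodes are created, then the whole
    // tree passes through phase 2.
    static SketchNode build(SketchNormalizer tn) {
        SketchNode root = new SketchNode(tn.normalizeNonterminal("s"));
        SketchNode np = new SketchNode(tn.normalizeNonterminal("np"));
        np.children.add(new SketchNode(tn.normalizeTerminal("dogs")));
        SketchNode empty = new SketchNode(tn.normalizeNonterminal("-NONE-"));
        empty.children.add(new SketchNode(tn.normalizeTerminal("*")));
        SketchNode vp = new SketchNode(tn.normalizeNonterminal("vp"));
        vp.children.add(new SketchNode(tn.normalizeTerminal("bark")));
        root.children.add(np);
        root.children.add(empty);
        root.children.add(vp);
        return tn.normalizeWholeTree(root);
    }
}
```

Calling ContractDemo.build(new SketchNormalizer()) yields a tree that prints as (S (NP dogs) (VP bark)): the categories were rewritten during phase 1 and the empty element was removed in place during phase 2.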
|Constructor and Description|
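|Modifier and Type|Method and Description|
|java.lang.String|normalizeNonterminal(java.lang.String category) Normalizes a nonterminal's contents (and maybe interns it).|
|java.lang.String|normalizeTerminal(java.lang.String leaf) Normalizes a leaf's contents (and maybe interns it).|
|Tree|normalizeWholeTree(Tree tree, TreeFactory tf) Normalizes a whole tree. This method assumes that the argument it is passed is the root of a complete Tree.|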
public java.lang.String normalizeTerminal(java.lang.String leaf)
Normalizes a leaf's contents (and maybe interns it).
leaf - The String that decorates the leaf
public java.lang.String normalizeNonterminal(java.lang.String category)
Normalizes a nonterminal's contents (and maybe interns it).
category - The String that decorates this nonterminal node
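As a hedged illustration of what the two String-level methods might do when overridden, the sketch below interns leaf Strings and strips suffixes from category labels. The class name LabelNormalizationSketch and the suffix-stripping policy are assumptions made for this example; the documentation above only states that these methods normalize a String and may intern it.

```java
// Hypothetical helpers mirroring the two String-level hooks.
// This is an illustrative sketch, not the library's implementation.
final class LabelNormalizationSketch {

    // Leaves: keep the token text unchanged, but intern it so that
    // repeated tokens share a single String instance.
    static String normalizeTerminal(String leaf) {
        return leaf.intern();
    }

    // Nonterminals: assumed policy of dropping everything from the first
    // '-' or '=' onward (e.g. "NP-SBJ" becomes "NP"). Labels that begin
    // with '-' (such as "-NONE-") are left as they are.
    static String normalizeNonterminal(String category) {
        if (category.startsWith("-")) {
            return category.intern();
        }
        int cut = -1;
        for (int i = 1; i < category.length(); i++) {
            char c = category.charAt(i);
            if (c == '-' || c == '=') {
                cut = i;
                break;
            }
        }
        String base = (cut < 0) ? category : category.substring(0, cut);
        return base.intern();
    }
}
```

Because both methods return interned Strings, two calls with equal input return the same String instance, which lets later code compare labels by reference if it chooses to.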
public Tree normalizeWholeTree(Tree tree, TreeFactory tf)
Normalizes a whole tree. This method assumes that the argument it is passed is the root of a complete Tree. It is normally implemented as a Tree-walking routine.
This method may return null. This is interpreted to mean that this is a tree that should not be included in further processing. PennTreeReader recognizes this return value, and asks for another Tree from the input Reader.
tree - The tree to be normalized
tf - The TreeFactory to create new nodes (if needed)
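The null-return convention can be sketched as a reader loop that keeps asking for input until the normalizer accepts a tree. This is only an assumption about the general shape of such a loop, not PennTreeReader's actual code. To stay self-contained, the sketch represents each tree as a bracketed String, and the names NullSkippingReaderSketch, WholeTreeNormalizer, and readNext are hypothetical.

```java
import java.util.Iterator;

final class NullSkippingReaderSketch {

    // Hypothetical stand-in for the whole-tree hook. A "tree" here is just
    // a bracketed String; returning null rejects it.
    interface WholeTreeNormalizer {
        String normalizeWholeTree(String tree);
    }

    // Returns the next tree that the normalizer accepts, or null once the
    // input is exhausted.
    static String readNext(Iterator<String> input, WholeTreeNormalizer tn) {
        while (input.hasNext()) {
            String normalized = tn.normalizeWholeTree(input.next());
            if (normalized != null) {
                return normalized;
            }
            // A null result means "exclude this tree": fall through and
            // ask the input for another one.
        }
        return null;
    }
}
```

With a normalizer that rejects trees rooted in a hypothetical "X" category, for example `t -> t.startsWith("(X") ? null : t`, repeated calls to readNext silently skip the rejected trees and return only the accepted ones.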