A short utility program that dumps out trees from multiple files
into one file of tagged text. Useful for combining many parse tree
training files into one tagger training file, since the tagger
doesn't have convenient ways of reading in an entire directory.
There are a few command line arguments available:
Command line arguments
-output <filename> | File to write the data to |
-tagSeparator <separator> | Separator to use between each word and its tag |
-treeRange <range> | If the tree files have numbers, files whose numbers are not in this range are filtered out. Can be null. |
-inputEncoding <encoding> | Encoding to use when reading tree files |
-outputEncoding <encoding> | Encoding to use when writing tags |
-treeFilter <classname> | Class name of a Filter<Tree>, loaded by reflection, that eliminates trees from the data read |
-noTags | If present, only the words are output, with no tags at all |
-noSpaces | If present, words are concatenated together |
All other arguments will be treated as filenames to read.
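
To make the effect of -tagSeparator, -noTags and -noSpaces concrete, here is a minimal sketch of how one sentence's words and tags might be written as a line of output. It is not the utility's actual code: the class name TaggedLineSketch, the format method and the sample sentence are hypothetical, and the behavior when -noSpaces is used without -noTags is an assumption.

```java
import java.util.List;

// Hypothetical illustration only; not part of the original utility.
public class TaggedLineSketch {

  /**
   * Joins words and tags into one output line.
   *
   * @param tagSeparator text placed between each word and its tag
   * @param noTags       if true, only the words are written
   * @param noSpaces     if true, tokens are concatenated with no space
   *                     between them (assumed to apply to tagged tokens too)
   */
  public static String format(List<String> words, List<String> tags,
                              String tagSeparator, boolean noTags,
                              boolean noSpaces) {
    if (words.size() != tags.size()) {
      throw new IllegalArgumentException("words and tags differ in length");
    }
    StringBuilder line = new StringBuilder();
    for (int i = 0; i < words.size(); i++) {
      // Separate tokens with a single space unless noSpaces is set.
      if (i > 0 && !noSpaces) {
        line.append(' ');
      }
      line.append(words.get(i));
      // Append separator and tag unless tags are suppressed.
      if (!noTags) {
        line.append(tagSeparator).append(tags.get(i));
      }
    }
    return line.toString();
  }

  public static void main(String[] args) {
    List<String> words = List.of("The", "dog", "barks");
    List<String> tags = List.of("DT", "NN", "VBZ");

    System.out.println(format(words, tags, "_", false, false)); // The_DT dog_NN barks_VBZ
    System.out.println(format(words, tags, "/", false, false)); // The/DT dog/NN barks/VBZ
    System.out.println(format(words, tags, "_", true, false));  // The dog barks
    System.out.println(format(words, tags, "_", true, true));   // Thedogbarks
  }
}
```

The separator is passed in rather than hard-coded because the separator written here has to match the one the tagger is configured to expect when it reads the training file.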