public class GrammaticalStructureConversionUtils
extends java.lang.Object
GrammaticalStructure.main(String[]).

Modifier and Type | Class and Description |
---|---|
static class | GrammaticalStructureConversionUtils.ConverterOptions: Enum to identify the different TokenizerTypes. |

Modifier and Type | Field and Description |
---|---|
static java.lang.String | DEFAULT_PARSER_FILE |

Modifier and Type | Method and Description |
---|---|
static void | convertTrees(java.lang.String[] args, java.lang.String defaultLang): Given sentences or trees, output the typed dependencies. |
static java.lang.String | dependenciesToCoNLLXString(java.util.Collection<TypedDependency> deps, CoreMap sentence): Returns a dependency tree in CoNLL-X format. |
static java.lang.String | dependenciesToCoNLLXString(GrammaticalStructure gs, CoreMap sentence): Calls dependenciesToCoNLLXString with the basic dependencies from a grammatical structure. |
static java.lang.String | dependenciesToString(GrammaticalStructure gs, java.util.Collection<TypedDependency> deps, Tree tree, boolean conllx, boolean extraSep, boolean convertToUPOS) |
static void | printDependencies(GrammaticalStructure gs, java.util.Collection<TypedDependency> deps, Tree tree, boolean conllx, boolean extraSep, boolean convertToUPOS): Print typed dependencies in either the Stanford dependency representation or in the conllx format. |

public static final java.lang.String DEFAULT_PARSER_FILE
public static void printDependencies(GrammaticalStructure gs, java.util.Collection<TypedDependency> deps, Tree tree, boolean conllx, boolean extraSep, boolean convertToUPOS)
deps
- Typed dependencies to print
tree
- Tree corresponding to typed dependencies (only necessary if conllx == true)
conllx
- If true, use conllx format; otherwise use the Stanford representation
extraSep
- If true, in the Stanford representation, the extra dependencies (which do not preserve the tree structure) are printed after the basic dependencies
convertToUPOS
- If true, convert the POS tags to universal POS tags and output them alongside the original POS tags.
public static java.lang.String dependenciesToCoNLLXString(GrammaticalStructure gs, CoreMap sentence)
(see dependenciesToCoNLLXString(Collection, CoreMap))
public static java.lang.String dependenciesToCoNLLXString(java.util.Collection<TypedDependency> deps, CoreMap sentence)
deps
- The list of TypedDependency relations.
sentence
- The corresponding CoreMap for the sentence.
public static java.lang.String dependenciesToString(GrammaticalStructure gs, java.util.Collection<TypedDependency> deps, Tree tree, boolean conllx, boolean extraSep, boolean convertToUPOS)
public static void convertTrees(java.lang.String[] args, java.lang.String defaultLang)
By default, the method outputs the collapsed typed dependencies with processing of conjuncts. The input can be given as plain text (one sentence per line) using the option -sentFile, or as trees using the option -treeFile. For -sentFile, the input has to be strictly one sentence per line. You can specify where to find a parser with -parserFile serializedParserPath. See LexicalizedParser for more flexible processing of text files (including with Stanford Dependencies output). The above options assume a file as input. You can also feed trees (only) via stdin by using the option -filter.
If one does not specify a -parserFile, one can specify which language pack to use with -tLPP. This option specifies a class which determines which GrammaticalStructure to use, which HeadFinder to use, etc. It will default to edu.stanford.nlp.parser.lexparser.EnglishTreebankParserParams, but any TreebankLangParserParams can be specified.
If the only available method of producing trees is the LexicalizedParser and no parser is specified, a default parser (the English parser) is used. You can specify options to load with the parser using the -parserOpts flag. If the default parser is used and no options are provided, the option -retainTmpSubcategories is used.
The following options can be used to specify the types of dependencies wanted:
The -conllx option will output the dependencies in the CoNLL format, instead of in the standard Stanford format (relation(governor,dependent)), and will retain punctuation by default. When used with the "collapsed" format, words such as prepositions and conjunctions, which get collapsed into the grammatical relations and are no longer part of the sentence per se, will be annotated with "erased" as their grammatical relation and attached to the fake "ROOT" node with index 0.
Keeping punctuation is the default behavior. This can be stopped with -keepPunct false.
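To make the difference between the two output formats concrete, here is a minimal, self-contained sketch. It does not call the library: the nested `Dep` class is a hypothetical stand-in for a typed dependency, and the row layout follows the general ten-column CoNLL-X convention (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), with `_` for fields this sketch does not fill. The exact columns the converter emits may differ.

```java
import java.util.Arrays;
import java.util.List;

final class DependencyFormatSketch {

    /** Hypothetical stand-in for a typed dependency; not the CoreNLP TypedDependency class. */
    static final class Dep {
        final String relation;
        final int govIndex;     // 0 denotes the artificial ROOT node
        final String govWord;
        final int depIndex;     // 1-based token position
        final String depWord;
        final String posTag;

        Dep(String relation, int govIndex, String govWord,
            int depIndex, String depWord, String posTag) {
            this.relation = relation;
            this.govIndex = govIndex;
            this.govWord = govWord;
            this.depIndex = depIndex;
            this.depWord = depWord;
            this.posTag = posTag;
        }
    }

    /** Stanford-style line: relation(governor-index, dependent-index). */
    static String toStanford(Dep d) {
        return d.relation + "(" + d.govWord + "-" + d.govIndex
                + ", " + d.depWord + "-" + d.depIndex + ")";
    }

    /** One tab-separated CoNLL-X row; unfilled columns are written as "_". */
    static String toCoNLLX(Dep d) {
        return String.join("\t",
                String.valueOf(d.depIndex), d.depWord, "_",
                d.posTag, d.posTag, "_",
                String.valueOf(d.govIndex), d.relation, "_", "_");
    }

    public static void main(String[] args) {
        List<Dep> deps = Arrays.asList(
                new Dep("nsubj", 2, "bark", 1, "Dogs", "NNS"),
                new Dep("root", 0, "ROOT", 2, "bark", "VBP"));
        for (Dep d : deps) {
            System.out.println(toStanford(d));
        }
        for (Dep d : deps) {
            System.out.println(toCoNLLX(d));
        }
    }
}
```

In the CoNLL-X rows, each token occupies one line and refers to its governor by index, which is why a word with no remaining relation still needs a row (the "erased" case described above).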
The -extraSep option used with -nonCollapsed will print the basic dependencies first, then a separator ======, and then the extra dependencies that do not preserve the tree structure.
The -test option is used for debugging: it prints the grammatical structure, as well as the basic, collapsed and CCprocessed dependencies. It also checks the connectivity of the collapsed dependencies. If the collapsed dependencies list doesn't constitute a connected graph, it prints the possible offending nodes (one of them is the real root of the graph).
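The connectivity check can be illustrated with a small self-contained sketch. It is not the library's implementation: edges are plain `{governor, dependent}` index pairs, and "possible offending nodes" is interpreted here, as an assumption, to mean nodes that govern something but are never a dependent themselves.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

final class ConnectivitySketch {

    /** Nodes that appear as a governor but never as a dependent (assumed "offending" candidates). */
    static SortedSet<Integer> roots(int[][] edges) {
        SortedSet<Integer> governors = new TreeSet<>();
        Set<Integer> dependents = new HashSet<>();
        for (int[] e : edges) {
            governors.add(e[0]);
            dependents.add(e[1]);
        }
        governors.removeAll(dependents);
        return governors;
    }

    /** Treats the edges as undirected and checks that every node is reachable from one start node. */
    static boolean isConnected(int[][] edges) {
        Map<Integer, List<Integer>> adjacency = new HashMap<>();
        for (int[] e : edges) {
            adjacency.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adjacency.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        if (adjacency.isEmpty()) {
            return true; // no edges, nothing to disconnect
        }
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>();
        Integer start = adjacency.keySet().iterator().next();
        seen.add(start);
        stack.push(start);
        while (!stack.isEmpty()) {
            for (Integer next : adjacency.get(stack.pop())) {
                if (seen.add(next)) {
                    stack.push(next);
                }
            }
        }
        return seen.size() == adjacency.size();
    }

    public static void main(String[] args) {
        int[][] broken = {{2, 1}, {4, 3}}; // two separate fragments
        if (!isConnected(broken)) {
            System.out.println("Not connected; candidate roots: " + roots(broken));
        }
    }
}
```

With two fragments, the sketch reports one candidate root per fragment; in a real parse only one of them would be the true root of the sentence.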
Using the -conllxFile option, you can pass a file containing Stanford dependencies in the CoNLL format (e.g., the basic dependencies), and obtain another representation using one of the representation options.
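As a rough illustration of what one line of such an input file carries, the sketch below reads a single tab-separated CoNLL-X row into its token index, word form, head index and relation. The class `CoNLLXRowSketch` is hypothetical and assumes the general ten-column CoNLL-X layout; the reader behind -conllxFile may be more or less strict about the columns it accepts.

```java
final class CoNLLXRowSketch {
    final int id;        // column 1: token index (1-based)
    final String form;   // column 2: word form
    final int head;      // column 7: index of the governor, 0 for ROOT
    final String deprel; // column 8: dependency relation

    private CoNLLXRowSketch(int id, String form, int head, String deprel) {
        this.id = id;
        this.form = form;
        this.head = head;
        this.deprel = deprel;
    }

    /** Parses one row; rejects lines that do not have exactly ten tab-separated fields. */
    static CoNLLXRowSketch parse(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length != 10) {
            throw new IllegalArgumentException(
                    "expected 10 tab-separated fields, got " + fields.length);
        }
        return new CoNLLXRowSketch(
                Integer.parseInt(fields[0]), fields[1],
                Integer.parseInt(fields[6]), fields[7]);
    }

    public static void main(String[] args) {
        CoNLLXRowSketch row = parse("1\tDogs\t_\tNNS\tNNS\t_\t2\tnsubj\t_\t_");
        System.out.println(row.deprel + "(" + row.head + ", " + row.id + ":" + row.form + ")");
    }
}
```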
Usage:
java edu.stanford.nlp.trees.GrammaticalStructure [-treeFile FILE | -sentFile FILE | -conllxFile FILE | -filter]
[-collapsed -basic -CCprocessed -test -generateOriginalDependencies]
args
- Command-line arguments, as above