public class ClauseSplitterSearchProblem
extends java.lang.Object
For usage at test time, load a model with ClauseSplitter.load(String), and then take the top clauses of a given tree with topClauses(double, int), yielding a list of SentenceFragments.
ClauseSplitter splitter = ClauseSplitter.load("/model/path/");
ClauseSplitterSearchProblem searcher = splitter.apply(tree, true);  // the loaded splitter is applied to (tree, assumedTruth)
List<SentenceFragment> sentences = searcher.topClauses(threshold, maxClauses);
For training, see ClauseSplitter.train(Stream, File, File).
Modifier and Type | Class and Description |
---|---|
static interface | ClauseSplitterSearchProblem.Action: An action being taken; that is, the type of clause splitting going on. |
static interface | ClauseSplitterSearchProblem.Featurizer: Mostly just an alias, but make sure our featurizer is serializable! |
class | ClauseSplitterSearchProblem.State: A search state. |
static class | ClauseSplitterSearchProblem.TrainingOptions: The options used for training the clause searcher. |
Modifier and Type | Field and Description |
---|---|
boolean | assumedTruth: The assumed truth of the original clause. |
static ClauseSplitterSearchProblem.Featurizer | DEFAULT_FEATURIZER: The default featurizer to use during training. |
protected static java.util.Map<java.lang.String,java.util.List<java.lang.String>> | HARD_SPLITS: A specification for clause splits we _always_ want to do. |
protected static java.util.Set<java.lang.String> | INDIRECT_SPEECH_LEMMAS: A set of words which indicate that the complement clause is not factual, or at least not necessarily factual. |
int | sentenceLength: The length of the sentence, as determined from the tree. |
SemanticGraph | tree: The tree to search over. |
Modifier | Constructor and Description |
---|---|
public | ClauseSplitterSearchProblem(SemanticGraph tree, boolean assumedTruth): Create a clause searcher which searches naively through every possible subtree as a clause. |
protected | ClauseSplitterSearchProblem(SemanticGraph tree, boolean assumedTruth, java.util.Optional<Classifier<ClauseSplitter.ClauseClassifierLabel,java.lang.String>> isClauseClassifier, java.util.Optional<java.util.function.Function<Triple<ClauseSplitterSearchProblem.State,ClauseSplitterSearchProblem.Action,ClauseSplitterSearchProblem.State>,Counter<java.lang.String>>> featurizer): Create a searcher manually, supplying a dependency tree, an optional classifier for when to split clauses, and a featurizer for that classifier. |
Modifier and Type | Method and Description |
---|---|
protected void | search(IndexedWord root, java.util.function.Predicate<Triple<java.lang.Double,java.util.List<Counter<java.lang.String>>,java.util.function.Supplier<SentenceFragment>>> candidateFragments, Classifier<ClauseSplitter.ClauseClassifierLabel,java.lang.String> classifier, java.util.Map<java.lang.String,? extends java.util.List<java.lang.String>> hardCodedSplits, java.util.function.Function<Triple<ClauseSplitterSearchProblem.State,ClauseSplitterSearchProblem.Action,ClauseSplitterSearchProblem.State>,Counter<java.lang.String>> featurizer, java.util.Collection<ClauseSplitterSearchProblem.Action> actionSpace, int maxTicks): The core implementation of the search. |
void | search(java.util.function.Predicate<Triple<java.lang.Double,java.util.List<Counter<java.lang.String>>,java.util.function.Supplier<SentenceFragment>>> candidateFragments): Search, using the default weights / featurizer. |
void | search(java.util.function.Predicate<Triple<java.lang.Double,java.util.List<Counter<java.lang.String>>,java.util.function.Supplier<SentenceFragment>>> candidateFragments, Classifier<ClauseSplitter.ClauseClassifierLabel,java.lang.String> classifier, java.util.Map<java.lang.String,java.util.List<java.lang.String>> hardCodedSplits, java.util.function.Function<Triple<ClauseSplitterSearchProblem.State,ClauseSplitterSearchProblem.Action,ClauseSplitterSearchProblem.State>,Counter<java.lang.String>> featurizer, int maxTicks): Search from the root of the tree. |
java.util.List<SentenceFragment> | topClauses(double thresholdProbability, int maxClauses): Get the top few clauses from this searcher, cutting off at the given minimum probability. |
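The cutoff behavior summarized for topClauses(double, int) can be illustrated in isolation. The following is a minimal sketch, not the library's implementation: ScoredClause and TopClausesSketch are hypothetical stand-ins, and it assumes that candidates arrive best-first and that a candidate scoring exactly at the threshold is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a scored clause; not a CoreNLP type. */
final class ScoredClause {
  final String text;
  final double probability;

  ScoredClause(String text, double probability) {
    this.text = text;
    this.probability = probability;
  }
}

final class TopClausesSketch {
  /**
   * Keep candidates whose probability is at least thresholdProbability,
   * returning at most maxClauses of them.
   * Assumption: candidates are ordered from most to least probable.
   */
  static List<ScoredClause> topClauses(List<ScoredClause> candidates,
                                       double thresholdProbability,
                                       int maxClauses) {
    List<ScoredClause> results = new ArrayList<>();
    for (ScoredClause candidate : candidates) {
      if (results.size() >= maxClauses || candidate.probability < thresholdProbability) {
        break;  // Hard limit reached, or this and all later candidates fall under the threshold.
      }
      results.add(candidate);
    }
    return results;
  }

  public static void main(String[] args) {
    List<ScoredClause> candidates = Arrays.asList(
        new ScoredClause("the cat sat on the mat", 0.95),
        new ScoredClause("the cat sat", 0.80),
        new ScoredClause("the mat", 0.30));
    // Threshold 0.5 with a limit of 5 clauses.
    for (ScoredClause clause : topClauses(candidates, 0.5, 5)) {
      System.out.println(clause.probability + "\t" + clause.text);
    }
  }
}
```

Stopping at the first candidate under the threshold is only valid under the best-first assumption; with unordered candidates the loop would need to filter and sort instead.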
protected static final java.util.Map<java.lang.String,java.util.List<java.lang.String>> HARD_SPLITS
protected static final java.util.Set<java.lang.String> INDIRECT_SPEECH_LEMMAS
public final SemanticGraph tree
public final boolean assumedTruth
public final int sentenceLength
public static final ClauseSplitterSearchProblem.Featurizer DEFAULT_FEATURIZER
protected ClauseSplitterSearchProblem(SemanticGraph tree, boolean assumedTruth, java.util.Optional<Classifier<ClauseSplitter.ClauseClassifierLabel,java.lang.String>> isClauseClassifier, java.util.Optional<java.util.function.Function<Triple<ClauseSplitterSearchProblem.State,ClauseSplitterSearchProblem.Action,ClauseSplitterSearchProblem.State>,Counter<java.lang.String>>> featurizer)
In most cases, use ClauseSplitter.load(String) instead of this constructor.
tree - The dependency tree to search over.
assumedTruth - The assumed truth of the tree (relevant for natural logic inference). If in doubt, pass in true.
isClauseClassifier - The classifier for whether a given dependency arc should be a new clause. If this is not given, all arcs are treated as clause separators.
featurizer - The featurizer for the classifier. If no featurizer is given, one should be given in search(java.util.function.Predicate, Classifier, Map, java.util.function.Function, int), or else the classifier will be useless.
See also: ClauseSplitter.load(String)
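The fallback described for isClauseClassifier (no classifier means every arc is treated as a clause separator) can be sketched with java.util.Optional. This is an illustration under stated assumptions: OptionalClassifierSketch is a hypothetical name, and a Predicate over an arc label string stands in for the real Classifier type.

```java
import java.util.Optional;
import java.util.function.Predicate;

final class OptionalClassifierSketch {
  /**
   * Decide whether an arc starts a new clause.
   * arcLabel is a hypothetical stand-in for a dependency arc.
   * With no classifier present, every arc counts as a clause separator.
   */
  static boolean isClauseSplit(Optional<Predicate<String>> classifier, String arcLabel) {
    return classifier.map(c -> c.test(arcLabel)).orElse(true);
  }

  public static void main(String[] args) {
    Predicate<String> onlyCcomp = label -> label.equals("ccomp");
    Optional<Predicate<String>> none = Optional.empty();
    System.out.println(isClauseSplit(none, "nsubj"));
    System.out.println(isClauseSplit(Optional.of(onlyCcomp), "nsubj"));
  }
}
```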
public ClauseSplitterSearchProblem(SemanticGraph tree, boolean assumedTruth)
tree - The dependency tree to search over.
assumedTruth - The truth of the premise. Almost always true.
public java.util.List<SentenceFragment> topClauses(double thresholdProbability, int maxClauses)
thresholdProbability - The threshold under which to stop returning clauses. This should be between 0 and 1.
maxClauses - A hard limit on the number of clauses to return.
Returns a list of SentenceFragment objects, representing the top clauses of the sentence.
public void search(java.util.function.Predicate<Triple<java.lang.Double,java.util.List<Counter<java.lang.String>>,java.util.function.Supplier<SentenceFragment>>> candidateFragments)
topClauses(double, int) may be a more convenient method for an end user.
candidateFragments - The callback function for results. The return value defines whether to continue searching.
public void search(java.util.function.Predicate<Triple<java.lang.Double,java.util.List<Counter<java.lang.String>>,java.util.function.Supplier<SentenceFragment>>> candidateFragments, Classifier<ClauseSplitter.ClauseClassifierLabel,java.lang.String> classifier, java.util.Map<java.lang.String,java.util.List<java.lang.String>> hardCodedSplits, java.util.function.Function<Triple<ClauseSplitterSearchProblem.State,ClauseSplitterSearchProblem.Action,ClauseSplitterSearchProblem.State>,Counter<java.lang.String>> featurizer, int maxTicks)
candidateFragments - The callback function.
classifier - The classifier for whether an arc should be on the path to a clause split, a clause split itself, or neither.
featurizer - The featurizer to use during search; the features it produces are combined with the weights by dot product.
See also: search(Predicate)
protected void search(IndexedWord root, java.util.function.Predicate<Triple<java.lang.Double,java.util.List<Counter<java.lang.String>>,java.util.function.Supplier<SentenceFragment>>> candidateFragments, Classifier<ClauseSplitter.ClauseClassifierLabel,java.lang.String> classifier, java.util.Map<java.lang.String,? extends java.util.List<java.lang.String>> hardCodedSplits, java.util.function.Function<Triple<ClauseSplitterSearchProblem.State,ClauseSplitterSearchProblem.Action,ClauseSplitterSearchProblem.State>,Counter<java.lang.String>> featurizer, java.util.Collection<ClauseSplitterSearchProblem.Action> actionSpace, int maxTicks)
root - The root word to search from. Traditionally, this is the root of the sentence.
candidateFragments - The callback for the resulting sentence fragments. This is a predicate of a triple of values. The return value of the predicate determines whether we should continue searching. Per the method signature, the triple holds a Double, a List of Counter<String> values, and a Supplier of the SentenceFragment.
classifier - The classifier for whether an arc should be on the path to a clause split, a clause split itself, or neither.
featurizer - The featurizer to use. Make sure this matches the weights!
actionSpace - The set of actions we are allowed to take. Each action defines a means of splitting a clause on a dependency boundary.
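The callback protocol described for candidateFragments (the predicate's return value decides whether the search continues, and the fragment is delivered through a Supplier) can be sketched without the library. Everything here is a hypothetical stand-in: Candidate replaces the Triple, strings replace SentenceFragment and the feature counters, and the 0.5 cutoff is chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical stand-in for the (score, features, lazy fragment) triple; not a CoreNLP type. */
final class Candidate {
  final double score;
  final List<String> features;
  final Supplier<String> fragment;

  Candidate(double score, List<String> features, Supplier<String> fragment) {
    this.score = score;
    this.features = features;
    this.fragment = fragment;
  }
}

final class CallbackSearchSketch {
  /**
   * Offer each candidate to the callback, stopping as soon as the callback returns false.
   * Returns the number of candidates that were offered.
   */
  static int search(List<Candidate> candidates, Predicate<Candidate> callback) {
    int offered = 0;
    for (Candidate candidate : candidates) {
      offered++;
      if (!callback.test(candidate)) {
        break;  // The callback asked the search to stop.
      }
    }
    return offered;
  }

  public static void main(String[] args) {
    List<Candidate> candidates = Arrays.asList(
        new Candidate(0.9, Arrays.asList("f1"), () -> "first fragment"),
        new Candidate(0.4, Arrays.asList("f2"), () -> "second fragment"),
        new Candidate(0.8, Arrays.asList("f3"), () -> "third fragment"));
    List<String> accepted = new ArrayList<>();
    int offered = search(candidates, candidate -> {
      if (candidate.score < 0.5) {
        return false;  // Stop searching; the supplier is never invoked for this candidate.
      }
      accepted.add(candidate.fragment.get());  // Build the fragment only when it is wanted.
      return true;
    });
    System.out.println(offered + " offered, accepted: " + accepted);
  }
}
```

Passing the fragment as a Supplier lets a callback reject a candidate on its score alone without paying for fragment construction.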