java.lang.Object
  edu.stanford.nlp.classify.AbstractLinearClassifierFactory
      edu.stanford.nlp.classify.LinearClassifierFactory
public class LinearClassifierFactory
extends AbstractLinearClassifierFactory

Builds various types of linear classifiers, with functionality for setting the objective function,
optimization method, and other parameters. Classifiers can be defined with passed constructor
arguments or using setter methods. Defaults to quasi-Newton optimization of a
LogConditionalObjectiveFunction. (Merges the old classes CGLinearClassifierFactory,
QNLinearClassifierFactory, and MaxEntClassifierFactory.)
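Example (illustrative sketch): the snippet below builds a classifier from an already-populated
GeneralDataset using the factory defaults. It assumes that one of the trainClassifier overloads
inherited from AbstractLinearClassifierFactory accepts a GeneralDataset (the summary below lists
three inherited trainClassifier methods without their signatures); the class and variable names
are illustrative only.

    import edu.stanford.nlp.classify.Classifier;
    import edu.stanford.nlp.classify.GeneralDataset;
    import edu.stanford.nlp.classify.LinearClassifierFactory;

    public class TrainExample {
      // Train with the defaults: quasi-Newton optimization of a LogConditionalObjectiveFunction.
      public static Classifier train(GeneralDataset trainData) {
        LinearClassifierFactory factory = new LinearClassifierFactory();
        factory.setSigma(1.0);   // prior strength; 1.0 is the documented default
        factory.setTol(1e-4);    // convergence threshold; 1e-4 is the documented default
        return factory.trainClassifier(trainData);  // assumed inherited overload
      }
    }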
Field Summary
---
protected static double[]   sigmasToTry
Constructor Summary
---
LinearClassifierFactory()
LinearClassifierFactory(boolean useSum)
LinearClassifierFactory(double tol)
LinearClassifierFactory(double tol, boolean useSum, double sigma)
LinearClassifierFactory(double tol, boolean useSum, int prior, double sigma, double epsilon)
LinearClassifierFactory(double tol, boolean useSum, int prior, double sigma, double epsilon, int mem)
LinearClassifierFactory(Minimizer min)
LinearClassifierFactory(Minimizer min, boolean useSum)
LinearClassifierFactory(Minimizer min, double tol, boolean useSum)
LinearClassifierFactory(Minimizer min, double tol, boolean useSum, double sigma)
LinearClassifierFactory(Minimizer min, double tol, boolean useSum, int prior, double sigma)
LinearClassifierFactory(Minimizer min, double tol, boolean useSum, int prior, double sigma, double epsilon)
          Create a factory that builds linear classifiers from training data.
LinearClassifierFactory(Minimizer min, double tol, boolean useSum, LogPrior logPrior)
Method Summary
---
double[][]   adaptWeights(double[][] origWeights, GeneralDataset adaptDataset)
          Adapt a classifier (adjust the mean of the Gaussian prior). Under construction. -pichuan
void   crossValidateSetSigma(GeneralDataset dataset)
          Calls the method crossValidateSetSigma(GeneralDataset, int) with 5-fold cross-validation.
void   crossValidateSetSigma(GeneralDataset dataset, int kfold)
          Calls the method crossValidateSetSigma(GeneralDataset, int, Scorer, LineSearcher) with
          multi-class log-likelihood scoring (see MultiClassAccuracyStats) and golden-section line
          search (see GoldenSectionLineSearch).
void   crossValidateSetSigma(GeneralDataset dataset, int kfold, LineSearcher minimizer)
void   crossValidateSetSigma(GeneralDataset dataset, int kfold, Scorer scorer)
void   crossValidateSetSigma(GeneralDataset dataset, int kfold, Scorer scorer, LineSearcher minimizer)
          Sets the sigma parameter to a value that optimizes the cross-validation score given by scorer.
double   getSigma()
double[]   heldOutSetSigma(GeneralDataset train)
double[]   heldOutSetSigma(GeneralDataset train, GeneralDataset dev)
double[]   heldOutSetSigma(GeneralDataset train, GeneralDataset dev, LineSearcher minimizer)
double[]   heldOutSetSigma(GeneralDataset train, GeneralDataset dev, Scorer scorer)
double[]   heldOutSetSigma(GeneralDataset trainSet, GeneralDataset devSet, Scorer scorer, LineSearcher minimizer)
          Sets the sigma parameter to a value that optimizes the held-out score given by scorer.
double[]   heldOutSetSigma(GeneralDataset train, Scorer scorer)
Classifier   loadFromFilename(String file)
          Given the path to a file representing the text-based serialization of a Linear Classifier,
          reconstitutes and returns that LinearClassifier.
void   resetWeight()
          resetWeight sets the restWeight flag.
void   setEpsilon(double eps)
          Sets the epsilon value for LogConditionalObjectiveFunction.
void   setHeldOutSearcher(LineSearcher heldOutSearcher)
          Set the LineSearcher to be used in heldOutSetSigma(GeneralDataset, GeneralDataset).
void   setMem(int mem)
          Set the mem value for QNMinimizer.
void   setMinimizer(Minimizer min)
          Sets the minimizer.
void   setPrior(LogPrior logPrior)
          Set the prior.
void   setRetrainFromScratchAfterSigmaTuning(boolean retrainFromScratchAfterSigmaTuning)
          If set to true, then when training a classifier, after an optimal sigma is chosen, a model
          is relearned from scratch.
void   setSigma(double sigma)
void   setTol(double tol)
          Set the tolerance.
void   setTuneSigmaCV(int folds)
          setTuneSigmaCV sets the tuneSigmaCV flag: when turned on, the sigma is tuned by
          cross-validation.
void   setTuneSigmaHeldOut()
          setTuneSigmaHeldOut sets the tuneSigmaHeldOut flag: when turned on, the sigma is tuned by
          means of a held-out set (70%-30%).
void   setUseSum(boolean useSum)
          setUseSum sets the useSum flag: when turned on, the Summed Conditional Objective Function
          is used.
void   setVerbose(boolean verbose)
          Set the verbose flag for CGMinimizer.
Classifier   trainClassifier(GeneralDataset dataset, double[] initial)
Classifier   trainClassifierSemiSup(GeneralDataset data, GeneralDataset biasedData, double[][] confusionMatrix, double[] initial)
          IMPORTANT: dataset and biasedDataset must have the same featureIndex and labelIndex.
Classifier   trainClassifierV(GeneralDataset train, double min, double max, boolean accuracy)
          Train a classifier with a sigma tuned on a validation set.
Classifier   trainClassifierV(GeneralDataset train, GeneralDataset validation, double min, double max, boolean accuracy)
          Train a classifier with a sigma tuned on a validation set.
double[][]   trainWeights(GeneralDataset dataset)
double[][]   trainWeights(GeneralDataset dataset, double[] initial)
double[][]   trainWeights(GeneralDataset dataset, double[] initial, boolean bypassTuneSigma)
double[][]   trainWeightsSemiSup(GeneralDataset data, GeneralDataset biasedData, double[][] confusionMatrix, double[] initial)
void   useConjugateGradientAscent()
          Sets the minimizer to CGMinimizer.
void   useConjugateGradientAscent(boolean verbose)
          Sets the minimizer to CGMinimizer, with the passed verbose flag.
void   useHybridMinimizer()
void   useHybridMinimizer(double initialSMDGain, int stochasticBatchSize, StochasticCalculateMethods stochasticMethod, int cutoffIteration)
void   useQuasiNewton()
          Sets the minimizer to QuasiNewton.
void   useStochasticGradientDescent()
void   useStochasticGradientDescent(double gainSGD, int stochasticBatchSize)
void   useStochasticGradientDescentToQuasiNewton(SeqClassifierFlags p)
void   useStochasticMetaDescent()
void   useStochasticMetaDescent(double initialSMDGain, int stochasticBatchSize, StochasticCalculateMethods stochasticMethod)
void   useStochasticQN(double initialSMDGain, int stochasticBatchSize)
Methods inherited from class edu.stanford.nlp.classify.AbstractLinearClassifierFactory
---
trainClassifier, trainClassifier, trainClassifier

Methods inherited from class java.lang.Object
---
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
---

protected static double[] sigmasToTry
Constructor Detail
---

public LinearClassifierFactory()

public LinearClassifierFactory(Minimizer min)

public LinearClassifierFactory(boolean useSum)

public LinearClassifierFactory(double tol)

public LinearClassifierFactory(Minimizer min, boolean useSum)

public LinearClassifierFactory(Minimizer min, double tol, boolean useSum)

public LinearClassifierFactory(double tol, boolean useSum, double sigma)

public LinearClassifierFactory(Minimizer min, double tol, boolean useSum, double sigma)

public LinearClassifierFactory(Minimizer min, double tol, boolean useSum, int prior, double sigma)

public LinearClassifierFactory(double tol, boolean useSum, int prior, double sigma, double epsilon)

public LinearClassifierFactory(double tol, boolean useSum, int prior, double sigma, double epsilon, int mem)

public LinearClassifierFactory(Minimizer min, double tol, boolean useSum, int prior, double sigma, double epsilon)
    Create a factory that builds linear classifiers from training data.
    Parameters:
        min - The method to be used for optimization (minimization) (default: QNMinimizer)
        tol - The convergence threshold for the minimization (default: 1e-4)
        useSum - Asks the optimizer to minimize the sum of the likelihoods of individual data items
                 rather than their product (default: false)
        prior - What kind of prior to use, as an enum constant from class LogPrior
        sigma - The strength of the prior (smaller is stronger for most standard priors)
                (default: 1.0)
        epsilon - A second parameter to the prior (currently only used by the Huber prior)

public LinearClassifierFactory(Minimizer min, double tol, boolean useSum, LogPrior logPrior)
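Example (illustrative sketch): the two configurations below are equivalent, spelling out the
documented defaults (tol 1e-4, useSum false, sigma 1.0) once through the primitive-argument
constructor and once through setters; the values are the documented defaults, not recommendations.

    import edu.stanford.nlp.classify.LinearClassifierFactory;

    public class ConstructorExample {
      public static LinearClassifierFactory viaConstructor() {
        // tol = 1e-4, useSum = false, sigma = 1.0 -- the documented defaults.
        return new LinearClassifierFactory(1e-4, false, 1.0);
      }

      public static LinearClassifierFactory viaSetters() {
        LinearClassifierFactory factory = new LinearClassifierFactory();
        factory.setTol(1e-4);
        factory.setUseSum(false);
        factory.setSigma(1.0);
        return factory;
      }
    }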
Method Detail
---
public double[][] adaptWeights(double[][] origWeights, GeneralDataset adaptDataset)
    Adapt a classifier (adjust the mean of the Gaussian prior). Under construction. -pichuan
    Parameters:
        origWeights - the original weights trained from the training data
        adaptDataset - the Dataset used to adapt the trained weights
public double[][] trainWeights(GeneralDataset dataset)
    Specified by:
        trainWeights in class AbstractLinearClassifierFactory

public double[][] trainWeights(GeneralDataset dataset, double[] initial)

public double[][] trainWeights(GeneralDataset dataset, double[] initial, boolean bypassTuneSigma)

public Classifier trainClassifierSemiSup(GeneralDataset data, GeneralDataset biasedData, double[][] confusionMatrix, double[] initial)
    IMPORTANT: dataset and biasedDataset must have the same featureIndex and labelIndex.

public double[][] trainWeightsSemiSup(GeneralDataset data, GeneralDataset biasedData, double[][] confusionMatrix, double[] initial)
public Classifier trainClassifierV(GeneralDataset train, GeneralDataset validation, double min, double max, boolean accuracy)
    Train a classifier with a sigma tuned on a validation set.
    Parameters:
        train -
        validation -

public Classifier trainClassifierV(GeneralDataset train, double min, double max, boolean accuracy)
    Train a classifier with a sigma tuned on a validation set.
    Parameters:
        train - The data to train (and validate) on.
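Example (illustrative sketch): validation-set tuning via trainClassifierV. The min and max
arguments are assumed here to bound the sigma search range and the accuracy flag is assumed to
select accuracy-based scoring; this page does not document those semantics, so treat the values
as placeholders.

    import edu.stanford.nlp.classify.Classifier;
    import edu.stanford.nlp.classify.GeneralDataset;
    import edu.stanford.nlp.classify.LinearClassifierFactory;

    public class ValidationTunedExample {
      public static Classifier train(GeneralDataset train, GeneralDataset validation) {
        LinearClassifierFactory factory = new LinearClassifierFactory();
        // Assumed semantics: search sigma in [0.1, 10.0], scoring candidates on the validation set.
        return factory.trainClassifierV(train, validation, 0.1, 10.0, true);
      }
    }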
public void setTol(double tol)
    Set the tolerance.

public void setPrior(LogPrior logPrior)
    Set the prior.
    Parameters:
        logPrior - One of the priors defined in LogConditionalObjectiveFunction.
                   LogPrior.QUADRATIC is the default.

public void setVerbose(boolean verbose)
    Set the verbose flag for CGMinimizer. Only used with conjugate-gradient minimization.
    false is the default.

public void setMinimizer(Minimizer min)
    Sets the minimizer. QNMinimizer is the default.

public void setEpsilon(double eps)
    Sets the epsilon value for LogConditionalObjectiveFunction.

public void setSigma(double sigma)

public double getSigma()
public void useQuasiNewton()
    Sets the minimizer to QuasiNewton. QNMinimizer is the default.

public void useStochasticQN(double initialSMDGain, int stochasticBatchSize)

public void useStochasticMetaDescent()

public void useStochasticMetaDescent(double initialSMDGain, int stochasticBatchSize, StochasticCalculateMethods stochasticMethod)

public void useStochasticGradientDescent()

public void useStochasticGradientDescent(double gainSGD, int stochasticBatchSize)

public void useStochasticGradientDescentToQuasiNewton(SeqClassifierFlags p)

public void useHybridMinimizer()

public void useHybridMinimizer(double initialSMDGain, int stochasticBatchSize, StochasticCalculateMethods stochasticMethod, int cutoffIteration)

public void setMem(int mem)
    Set the mem value for QNMinimizer. Only used with quasi-Newton minimization. 15 is the default.
    Parameters:
        mem - Number of previous function/derivative evaluations to store to estimate the second
              derivative. Storing more previous evaluations improves training convergence speed.
              This number can be very small if memory conservation is the priority. For large
              optimization systems (of 100,000-1,000,000 dimensions), setting this to 15 produces
              quite good results, but setting it to 50 can decrease the iteration count by about
              20% over a value of 15.

public void useConjugateGradientAscent(boolean verbose)
    Sets the minimizer to CGMinimizer, with the passed verbose flag.

public void useConjugateGradientAscent()
    Sets the minimizer to CGMinimizer.
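Example (illustrative sketch): switching optimizers before training. Every call below appears in
the method detail above; the mem value is illustrative (15 is the documented default).

    import edu.stanford.nlp.classify.LinearClassifierFactory;

    public class MinimizerExample {
      public static LinearClassifierFactory configure() {
        LinearClassifierFactory factory = new LinearClassifierFactory();

        // Quasi-Newton (the default), keeping more history for the second-derivative estimate.
        factory.useQuasiNewton();
        factory.setMem(50);  // 15 is the default; 50 can cut the iteration count by about 20%

        // Alternatively, conjugate gradient with verbose output:
        // factory.useConjugateGradientAscent(true);

        // Or plain stochastic gradient descent:
        // factory.useStochasticGradientDescent();

        return factory;
      }
    }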
public void setUseSum(boolean useSum)
    setUseSum sets the useSum flag: when turned on, the Summed Conditional Objective Function is
    used. Otherwise, the LogConditionalObjectiveFunction is used. The default is false.

public void setTuneSigmaHeldOut()
    setTuneSigmaHeldOut sets the tuneSigmaHeldOut flag: when turned on, the sigma is tuned by means
    of a held-out set (70%-30%). Otherwise no tuning on sigma is done. The default is false.

public void setTuneSigmaCV(int folds)
    setTuneSigmaCV sets the tuneSigmaCV flag: when turned on, the sigma is tuned by
    cross-validation. The number of folds is the parameter. If there is less data than the number
    of folds, leave-one-out is used. The default is false.

public void resetWeight()
    resetWeight sets the restWeight flag. This flag makes sense only if sigma is tuned: when turned
    on, the weights output by the tuneSigma method will be reset to zero when training the
    classifier. The default is false.
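Example (illustrative sketch): flag-based sigma tuning. With these flags set, sigma is tuned
automatically when a classifier is later trained; the fold count is illustrative.

    import edu.stanford.nlp.classify.LinearClassifierFactory;

    public class SigmaFlagExample {
      public static LinearClassifierFactory configure() {
        LinearClassifierFactory factory = new LinearClassifierFactory();

        // Tune sigma by 5-fold cross-validation
        // (leave-one-out is used if there is less data than folds).
        factory.setTuneSigmaCV(5);

        // After tuning, reset the tuning-time weights to zero before the final training run.
        factory.resetWeight();

        // Or, instead of cross-validation, tune on a 70%-30% held-out split:
        // factory.setTuneSigmaHeldOut();

        return factory;
      }
    }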
public void crossValidateSetSigma(GeneralDataset dataset)
    Calls the method crossValidateSetSigma(GeneralDataset, int) with 5-fold cross-validation.
    Parameters:
        dataset - the data set to optimize sigma on.

public void crossValidateSetSigma(GeneralDataset dataset, int kfold)
    Calls the method crossValidateSetSigma(GeneralDataset, int, Scorer, LineSearcher) with
    multi-class log-likelihood scoring (see MultiClassAccuracyStats) and golden-section line
    search (see GoldenSectionLineSearch).
    Parameters:
        dataset - the data set to optimize sigma on.
        kfold -

public void crossValidateSetSigma(GeneralDataset dataset, int kfold, Scorer scorer)

public void crossValidateSetSigma(GeneralDataset dataset, int kfold, LineSearcher minimizer)

public void crossValidateSetSigma(GeneralDataset dataset, int kfold, Scorer scorer, LineSearcher minimizer)
    Sets the sigma parameter to a value that optimizes the cross-validation score given by scorer.
    Search for an optimal value is carried out by minimizer.
    Parameters:
        dataset - the data set to optimize sigma on.
        kfold -

public void setHeldOutSearcher(LineSearcher heldOutSearcher)
    Set the LineSearcher to be used in heldOutSetSigma(GeneralDataset, GeneralDataset).

public double[] heldOutSetSigma(GeneralDataset train)

public double[] heldOutSetSigma(GeneralDataset train, Scorer scorer)

public double[] heldOutSetSigma(GeneralDataset train, GeneralDataset dev)

public double[] heldOutSetSigma(GeneralDataset train, GeneralDataset dev, Scorer scorer)

public double[] heldOutSetSigma(GeneralDataset train, GeneralDataset dev, LineSearcher minimizer)

public double[] heldOutSetSigma(GeneralDataset trainSet, GeneralDataset devSet, Scorer scorer, LineSearcher minimizer)
    Sets the sigma parameter to a value that optimizes the held-out score given by scorer.
    Search for an optimal value is carried out by minimizer.
    Parameters:
        trainSet - the data set to optimize sigma on.
        devSet -
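Example (illustrative sketch): tuning sigma explicitly before training, assuming pre-built train
and dev datasets and assuming that an inherited trainClassifier(GeneralDataset) overload exists;
the fold count is illustrative.

    import edu.stanford.nlp.classify.Classifier;
    import edu.stanford.nlp.classify.GeneralDataset;
    import edu.stanford.nlp.classify.LinearClassifierFactory;

    public class SigmaTuningExample {
      public static Classifier crossValidated(GeneralDataset train) {
        LinearClassifierFactory factory = new LinearClassifierFactory();
        // 10-fold cross-validation with multi-class log-likelihood scoring
        // and golden-section line search (the documented defaults for this overload).
        factory.crossValidateSetSigma(train, 10);
        return factory.trainClassifier(train);
      }

      public static Classifier heldOut(GeneralDataset train, GeneralDataset dev) {
        LinearClassifierFactory factory = new LinearClassifierFactory();
        // Pick the sigma that optimizes the held-out score on dev, then train.
        factory.heldOutSetSigma(train, dev);
        return factory.trainClassifier(train);
      }
    }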
public void setRetrainFromScratchAfterSigmaTuning(boolean retrainFromScratchAfterSigmaTuning)
    If set to true, then when training a classifier, after an optimal sigma is chosen, a model is
    relearned from scratch.

public Classifier trainClassifier(GeneralDataset dataset, double[] initial)

public Classifier loadFromFilename(String file)
    Given the path to a file representing the text-based serialization of a Linear Classifier,
    reconstitutes and returns that LinearClassifier.
    Parameters:
        file -
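Example (illustrative sketch): reloading a text-serialized classifier; the path is a placeholder.

    import edu.stanford.nlp.classify.Classifier;
    import edu.stanford.nlp.classify.LinearClassifierFactory;

    public class LoadExample {
      public static Classifier load() {
        LinearClassifierFactory factory = new LinearClassifierFactory();
        // Reconstitute a LinearClassifier from its text-based serialization.
        return factory.loadFromFilename("/path/to/classifier.txt");  // placeholder path
      }
    }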