Interface | Description |
---|---|
Classifier<L,F> | A simple interface for classifying and scoring data points, implemented by most of the classifiers in this package. |
ClassifierCreator<L,F> | Creates a classifier with given weights. |
ClassifierFactory<L,F,C extends Classifier<L,F>> | A simple interface for training a Classifier from a Dataset of training examples. |
ProbabilisticClassifier<L,F> | |
ProbabilisticClassifierCreator<L,F> | Creates a probabilistic classifier with given weights. |
RVFClassifier<L,F> | A simple interface for classifying and scoring data points with real-valued features. |

Class | Description |
---|---|
AbstractLinearClassifierFactory<L,F> | Shared methods for training a LinearClassifier. |
AdaptedGaussianPriorObjectiveFunction<L,F> | Adapt the mean of the Gaussian Prior by shifting the mean to the previously trained weights |
BiasedLogConditionalObjectiveFunction | Maximizes the conditional likelihood with a given prior. |
BiasedLogisticObjectiveFunction | |
ClassifierExample | Sample code that illustrates the training and use of a linear classifier. |
ColumnDataClassifier | ColumnDataClassifier provides a command-line interface for doing context-free (independent) classification of a series of data items, where each data item is represented by a line of a file, as a list of String variables, in tab-separated columns. |
CrossValidator<L,F> | This class is meant to simplify performing cross validation of classifiers for hyper-parameters. |
CrossValidator.SavedState | |
Dataset<L,F> | An interfacing class for ClassifierFactory that incrementally builds a more memory-efficient representation of a List of Datum objects for the purposes of training a Classifier with a ClassifierFactory. |
GeneralDataset<L,F> | The purpose of this interface is to unify Dataset and RVFDataset. |
GeneralizedExpectationObjectiveFunction<L,F> | Implementation of Generalized Expectation Objective function for an I.I.D. |
KNNClassifier<K,V> | A simple k-NN classifier, with the options of using unit votes, or weighted votes (by similarity value). |
KNNClassifierFactory<K,V> | This constructs trained KNNClassifier objects, given sets of RVFDatums, or Counters (dimensions are identified by the keys). |
LinearClassifier<L,F> | Implements a multiclass linear classifier. |
LinearClassifierFactory<L,F> | Builds various types of linear classifiers, with functionality for setting objective function, optimization method, and other parameters. |
LinearClassifierFactory.LinearClassifierCreator<L,F> | |
LogConditionalEqConstraintFunction | Maximizes the conditional likelihood with a given prior. |
LogConditionalObjectiveFunction<L,F> | Maximizes the conditional likelihood with a given prior. |
LogisticClassifier<L,F> | A classifier for binary logistic regression problems. |
LogisticClassifierFactory<L,F> | Builds a classifier for binary logistic regression problems. |
LogisticObjectiveFunction | Maximizes the conditional likelihood with a given prior. |
LogisticUtils | A central place for utility functions used when training robust logistic models. |
LogPrior | A Prior for functions. |
MultinomialLogisticClassifier<L,F> | A multinomial logistic regression classifier. |
NaiveBayesClassifier<L,F> | A Naive Bayes classifier with a fixed number of features. |
NaiveBayesClassifierFactory<L,F> | Creates a NaiveBayesClassifier given an RVFDataset. |
NBLinearClassifierFactory<L,F> | Provides a medium-weight implementation of Bernoulli (or binary) Naive Bayes via a linear classifier. |
NominalDataReader | A class to read some UCI datasets into RVFDatum. |
OneVsAllClassifier<L,F> | One vs All multiclass classifier |
PRCurve | A class to create recall-precision curves given scores used to fit the best monotonic function for logistic regression and SVMs. |
RVFDataset<L,F> | An interfacing class for ClassifierFactory that incrementally builds a more memory-efficient representation of a List of RVFDatum objects for the purposes of training a Classifier with a ClassifierFactory. |
SemiSupervisedLogConditionalObjectiveFunction | Maximizes the conditional likelihood with a given prior. |
ShiftParamsLogisticClassifierFactory<L,F> | |
ShiftParamsLogisticObjectiveFunction | |
SVMLightClassifier<L,F> | This class represents a trained SVM Classifier. |
SVMLightClassifierFactory<L,F> | This class is meant for training SVMs (SVMLightClassifiers). |
WeightedDataset<L,F> | |
WeightedRVFDataset<L,F> | A weighted version of the RVF dataset. |

Enum | Description |
---|---|
LogPrior.LogPriorType | |

The Classifier contract guarantees only routines for getting a classification for an example and the scores assigned to each class for that example. Note that training depends on the individual classifier.

Classifiers operate over Datum objects. A Datum is a list of descriptive features and a class label; features and labels can be any object, but usually Strings are used. A Datum can store only categorical features (common in NLP), or it can store features with real values. The latter is referred to in this package as an RVFDatum (real-valued feature datum). Datum objects are grouped using Dataset objects.
Some classifiers use Dataset objects as a way of grouping inputs.

The following examples outline how to create, train, and use each of the different classifier types.

To build a classifier, one first creates a GeneralDataset, which is a list of Datum objects. A Datum is a list of descriptive features, along with a label; features and labels can be any object, though we usually use strings.

    GeneralDataset dataSet = new Dataset();
    while (more datums to make) {
      ... make featureList: e.g., ["PrevWord=at","CurrentTag=NNP","isUpperCase"]
      ... make label: e.g., ["PLACE"];
      Datum d = new BasicDatum(featureList, label);
      dataSet.add(d);
    }

There are some useful methods in GeneralDataset, such as:

    dataSet.applyFeatureCountThreshold(int cutoff);
    dataSet.summaryStatistics(); // dumps the number of features and datums

Next, one makes a LinearClassifierFactory and calls its trainClassifier(GeneralDataset dataSet) method:

    LinearClassifierFactory lcFactory = new LinearClassifierFactory();
    LinearClassifier c = lcFactory.trainClassifier(dataSet);

LinearClassifierFactory has options for different optimizers (default: QNMinimizer), the convergence threshold for minimization, etc. Check the class description for detailed information.
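
To make the effect of applyFeatureCountThreshold more concrete, here is a minimal, library-free sketch of count-based feature pruning. It assumes the threshold removes every feature that occurs fewer than cutoff times across the whole dataset; the class FeatureCountThresholdSketch and its method are hypothetical names used only for illustration and are not part of this package.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of count-based feature pruning; not library code. */
final class FeatureCountThresholdSketch {

  /** Returns copies of the feature lists with features seen fewer than cutoff times removed. */
  static List<List<String>> applyThreshold(List<List<String>> featureLists, int cutoff) {
    // Count how often each feature occurs across all datums.
    Map<String, Integer> counts = new HashMap<>();
    for (List<String> features : featureLists) {
      for (String f : features) {
        counts.merge(f, 1, Integer::sum);
      }
    }
    // Keep only the features whose total count reaches the cutoff.
    List<List<String>> pruned = new ArrayList<>();
    for (List<String> features : featureLists) {
      List<String> kept = new ArrayList<>();
      for (String f : features) {
        if (counts.get(f) >= cutoff) {
          kept.add(f);
        }
      }
      pruned.add(kept);
    }
    return pruned;
  }

  public static void main(String[] args) {
    List<List<String>> data = Arrays.asList(
        Arrays.asList("PrevWord=at", "isUpperCase"),
        Arrays.asList("PrevWord=in", "isUpperCase"),
        Arrays.asList("CurrentTag=NNP", "isUpperCase"));
    // Only "isUpperCase" occurs at least twice.
    System.out.println(applyThreshold(data, 2)); // prints [[isUpperCase], [isUpperCase], [isUpperCase]]
  }
}
```

Pruning rare features before training reduces the number of weights the optimizer has to fit, which is the usual reason for applying such a cutoff.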
A classifier, once built, can be used to classify new Datum instances:

    Object label = c.classOf(mysteryDatum);

If you want scores instead, you can ask:

    Counter scores = c.scoresOf(mysteryDatum);

The scores returned by the log-linear classifiers are the feature-weight dot products, not the normalized probabilities. There are some other useful methods, such as justificationOf(Datum d) and logProbabilityOf(Datum d), as well as various methods for visualizing the weights and the most highly weighted features.
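
The difference between raw scores and normalized probabilities can be illustrated with a small, self-contained sketch. It assumes a log-linear model in which the score of a label is the sum of the weights of the active binary features, and the log probability is that score minus the log of the summed exponentiated scores. The class LogLinearScoresSketch and its methods are hypothetical and only mirror the names used above; they are not the library's implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of dot-product scores versus normalized log probabilities. */
final class LogLinearScoresSketch {

  /** Unnormalized score per label: the sum of the weights of the active binary features. */
  static Map<String, Double> scoresOf(Map<String, Map<String, Double>> weights,
                                      List<String> features) {
    Map<String, Double> scores = new HashMap<>();
    for (Map.Entry<String, Map<String, Double>> e : weights.entrySet()) {
      double score = 0.0;
      for (String f : features) {
        score += e.getValue().getOrDefault(f, 0.0);
      }
      scores.put(e.getKey(), score);
    }
    return scores;
  }

  /** Log of the softmax-normalized probability of each label. */
  static Map<String, Double> logProbabilities(Map<String, Double> scores) {
    // Subtract the maximum before exponentiating for numerical stability.
    double max = Double.NEGATIVE_INFINITY;
    for (double s : scores.values()) {
      max = Math.max(max, s);
    }
    double sum = 0.0;
    for (double s : scores.values()) {
      sum += Math.exp(s - max);
    }
    double logZ = max + Math.log(sum);
    Map<String, Double> logProbs = new HashMap<>();
    for (Map.Entry<String, Double> e : scores.entrySet()) {
      logProbs.put(e.getKey(), e.getValue() - logZ);
    }
    return logProbs;
  }

  /** The label with the highest score (ties are broken arbitrarily). */
  static String classOf(Map<String, Double> scores) {
    String best = null;
    for (Map.Entry<String, Double> e : scores.entrySet()) {
      if (best == null || e.getValue() > scores.get(best)) {
        best = e.getKey();
      }
    }
    return best;
  }

  public static void main(String[] args) {
    Map<String, Double> place = new HashMap<>();
    place.put("PrevWord=at", 1.5);
    place.put("isUpperCase", 0.5);
    Map<String, Double> other = new HashMap<>();
    other.put("isUpperCase", -0.5);
    Map<String, Map<String, Double>> weights = new HashMap<>();
    weights.put("PLACE", place);
    weights.put("OTHER", other);

    Map<String, Double> scores = scoresOf(weights, Arrays.asList("PrevWord=at", "isUpperCase"));
    System.out.println("scores: " + scores);
    System.out.println("log probabilities: " + logProbabilities(scores));
    System.out.println("predicted label: " + classOf(scores));
  }
}
```

Because normalization subtracts the same constant from every score, the highest-scoring label is the same whether one ranks by scores or by probabilities.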
This concludes the log-linear classifiers with binary features.

We can also train log-linear classifiers with real-valued features. In this case, RVFDatum objects should be used. An RVFDatum is composed of a set of feature and real-value pairs. RVFDatums are grouped using an RVFDataset.

To assemble an RVFDatum, fill a Counter with the feature values and assign an Object label to it:
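
    // Counter is an interface in current releases; ClassicCounter is its standard implementation
    Counter<String> features = new ClassicCounter<>();
    features.incrementCount("FEATURE_A", 1.2);
    features.incrementCount("FEATURE_B", 2.3);
    features.incrementCount("FEATURE_C", 0.5);
    RVFDatum<String, String> rvfDatum = new RVFDatum<>(features, "DATUM_LABEL");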

RVFDataset objects are representations of RVFDatum objects that efficiently store the data with which to train the classifier. This type of dataset only accepts RVFDatum objects via its add method (other Datum objects that are not instances of RVFDatum will be ignored), and is equivalent to a Dataset if all RVFDatum objects have only features with value 1.0. Since it is a subclass of GeneralDataset, the methods shown above as applied to the GeneralDataset can also be applied to the RVFDataset.
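
The equivalence between real-valued features with value 1.0 and binary features can be seen in a short sketch of how one label's score could be computed. It assumes the score is the dot product of the label's weights with the feature values; the class RealValuedScoreSketch and the weights used are hypothetical and serve only as an illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of scoring with real-valued features. */
final class RealValuedScoreSketch {

  /** Dot product of one label's weights with the real-valued feature values. */
  static double score(Map<String, Double> labelWeights, Map<String, Double> featureValues) {
    double total = 0.0;
    for (Map.Entry<String, Double> e : featureValues.entrySet()) {
      // Features without a learned weight contribute nothing.
      total += labelWeights.getOrDefault(e.getKey(), 0.0) * e.getValue();
    }
    return total;
  }

  public static void main(String[] args) {
    Map<String, Double> weights = new HashMap<>();
    weights.put("FEATURE_A", 2.0);
    weights.put("FEATURE_B", -1.0);

    Map<String, Double> realValued = new HashMap<>();
    realValued.put("FEATURE_A", 1.2);
    realValued.put("FEATURE_B", 2.3);
    realValued.put("FEATURE_C", 0.5);
    System.out.println("real-valued score: " + score(weights, realValued));

    // With every value set to 1.0 the score reduces to the sum of the
    // weights of the present features, which is the binary-feature case.
    Map<String, Double> binary = new HashMap<>();
    binary.put("FEATURE_A", 1.0);
    binary.put("FEATURE_B", 1.0);
    System.out.println("binary-equivalent score: " + score(weights, binary));
  }
}
```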

    writeClassifier(classifier, serializationPath);

Alternatively, if your features are Strings and you wish to serialize to a human-readable text file, you can use saveToFilename in LinearClassifier and reconstitute the classifier using loadFromFilename in LinearClassifierFactory. Though the format is not as compact as a serialized object, and implicitly presumes the features are Strings, this is useful for debugging purposes.
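
To show why a text format presumes String features and is convenient for debugging, here is a minimal sketch of a round trip through a tab-separated text representation. The line layout (label, feature, weight) and the class TextWeightsSketch are assumptions made for illustration; they do not describe the actual file format written by saveToFilename.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of a human-readable weight dump; not the library's format. */
final class TextWeightsSketch {

  /** Writes one "label<TAB>feature<TAB>weight" line per weight, in sorted order. */
  static String toText(Map<String, Map<String, Double>> weights) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, Map<String, Double>> byLabel : new TreeMap<>(weights).entrySet()) {
      for (Map.Entry<String, Double> w : new TreeMap<>(byLabel.getValue()).entrySet()) {
        sb.append(byLabel.getKey()).append('\t')
          .append(w.getKey()).append('\t')
          .append(w.getValue()).append('\n');
      }
    }
    return sb.toString();
  }

  /** Parses the text produced by toText; assumes labels and features contain no tabs or newlines. */
  static Map<String, Map<String, Double>> fromText(String text) {
    Map<String, Map<String, Double>> weights = new TreeMap<>();
    for (String line : text.split("\n")) {
      if (line.isEmpty()) {
        continue;
      }
      String[] parts = line.split("\t");
      weights.computeIfAbsent(parts[0], k -> new TreeMap<>())
             .put(parts[1], Double.parseDouble(parts[2]));
    }
    return weights;
  }

  public static void main(String[] args) {
    Map<String, Map<String, Double>> weights = new TreeMap<>();
    weights.computeIfAbsent("PLACE", k -> new TreeMap<>()).put("PrevWord=at", 1.5);
    weights.computeIfAbsent("OTHER", k -> new TreeMap<>()).put("isUpperCase", -0.25);
    String text = toText(weights);
    System.out.print(text);
    System.out.println("round trip equal: " + fromText(text).equals(weights));
  }
}
```

A dump like this can be inspected or diffed with ordinary text tools, at the cost of a larger file than a binary serialized object.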