edu.stanford.nlp.classify
Class RVFDataset

java.lang.Object
  extended by edu.stanford.nlp.classify.GeneralDataset
      extended by edu.stanford.nlp.classify.RVFDataset

public class RVFDataset
extends GeneralDataset

An interfacing class for ClassifierFactory that incrementally builds a more memory-efficient representation of a List of RVFDatum objects, for the purpose of training a Classifier with a ClassifierFactory.
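
The compaction described above can be pictured with a minimal standalone sketch. It does not use the Stanford classes: SparseRvfSketch, its fields, and its methods are hypothetical names chosen for illustration, and the real class may store its data differently. The idea is that each distinct feature and label is mapped to an int once, and every datum is then kept as two parallel arrays holding feature ids and their real values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a compact real-valued-feature dataset; NOT the Stanford implementation. */
final class SparseRvfSketch {
    // Each distinct feature or label string is assigned an int id the first time it is seen.
    private final Map<String, Integer> featureIndex = new HashMap<>();
    private final Map<String, Integer> labelIndex = new HashMap<>();
    // Row i of data holds the feature ids of datum i; row i of values holds the matching values.
    private final List<int[]> data = new ArrayList<>();
    private final List<double[]> values = new ArrayList<>();
    private final List<Integer> labels = new ArrayList<>();

    private static int idFor(Map<String, Integer> index, String key) {
        Integer id = index.get(key);
        if (id == null) {
            id = index.size();
            index.put(key, id);
        }
        return id;
    }

    /** Adds one datum, given as a label plus a feature-to-value map. */
    void add(String label, Map<String, Double> features) {
        int[] ids = new int[features.size()];
        double[] vals = new double[features.size()];
        int i = 0;
        for (Map.Entry<String, Double> e : features.entrySet()) {
            ids[i] = idFor(featureIndex, e.getKey());
            vals[i] = e.getValue();
            i++;
        }
        labels.add(idFor(labelIndex, label));
        data.add(ids);
        values.add(vals);
    }

    int size() { return data.size(); }
    int numFeatures() { return featureIndex.size(); }
    int numClasses() { return labelIndex.size(); }
    int[] featureIds(int i) { return data.get(i); }
    double[] featureValues(int i) { return values.get(i); }
    int labelId(int i) { return labels.get(i); }
}
```

Storing ints and doubles in arrays rather than keeping one map object per datum is what makes this kind of layout cheaper in memory when the same features recur across many datums.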

Author:
Jenny Finkel (jrfinkel@stanford.edu), Rajat Raina (added methods to record data sources and ids), Anna Rafferty (various refactoring with GeneralDataset/Dataset)

Field Summary
 
Fields inherited from class edu.stanford.nlp.classify.GeneralDataset
data, featureIndex, labelIndex, labels, size
 
Constructor Summary
RVFDataset()
           
RVFDataset(Index labelIndex, int[] labels, Index featureIndex, int[][] data, double[][] values)
          Constructor that fully specifies a Dataset.
RVFDataset(int numDatums)
           
RVFDataset(int numDatums, Index featureIndex, Index labelIndex)
           
 
Method Summary
 void add(Datum d)
           
 void add(Datum d, String src, String id)
           
 void clear()
          Resets the Dataset so that it is empty and ready to collect data.
 void clear(int numDatums)
          Resets the Dataset so that it is empty and ready to collect data.
 RVFDatum getRVFDatum(int index)
           
 String getRVFDatumId(int index)
           
 String getRVFDatumSource(int index)
           
 double[][] getValuesArray()
           
protected  void initialize(int numDatums)
          Resets the values of the dataset so that it is empty with an initial capacity of numDatums. Should be accessed only by appropriate methods within the class, such as clear(), which take care of the other parts of emptying the data.
static void main(String[] args)
           
 void printFullFeatureMatrix(PrintWriter pw)
          Prints the full feature matrix in tab-delimited form.
 void printFullFeatureMatrixWithValues(PrintWriter pw)
          Modification of printFullFeatureMatrix to correct bugs & print values (Rajat).
 void printSparseFeatureMatrix()
          Prints the sparse feature matrix to System.out using printSparseFeatureMatrix(PrintWriter).
 void printSparseFeatureMatrix(PrintWriter pw)
          Prints a sparse feature matrix representation of the Dataset.
static RVFDataset readSVMLightFormat(String filename)
          Constructs a Dataset by reading in a file in SVM light format.
static RVFDataset readSVMLightFormat(String filename, Index featureIndex, Index labelIndex)
          Constructs a Dataset by reading in a file in SVM light format.
static RVFDataset readSVMLightFormat(String filename, List<String> lines)
          Constructs a Dataset by reading in a file in SVM light format.
 Pair<GeneralDataset,GeneralDataset> split(double percentDev)
           
 Pair<GeneralDataset,GeneralDataset> split(int start, int end)
           
 void summaryStatistics()
          Prints some summary statistics to stderr for the Dataset.
static RVFDatum svmLightLineToRVFDatum(String l)
           
 String toString()
           
 String toSummaryString()
           
 
Methods inherited from class edu.stanford.nlp.classify.GeneralDataset
addAll, applyFeatureCountThreshold, featureIndex, getDataArray, getFeatureCounts, getLabelsArray, labelIndex, labelIterator, numClasses, numFeatures, numFeatureTokens, numFeatureTypes, printSVMLightFormat, printSVMLightFormat, size, trimData, trimLabels, trimToSize, trimToSize, trimToSize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

RVFDataset

public RVFDataset()

RVFDataset

public RVFDataset(int numDatums,
                  Index featureIndex,
                  Index labelIndex)

RVFDataset

public RVFDataset(int numDatums)

RVFDataset

public RVFDataset(Index labelIndex,
                  int[] labels,
                  Index featureIndex,
                  int[][] data,
                  double[][] values)
Constructor that fully specifies a Dataset. Needed for MulticlassDataset.

Method Detail

split

public Pair<GeneralDataset,GeneralDataset> split(double percentDev)
Specified by:
split in class GeneralDataset

split

public Pair<GeneralDataset,GeneralDataset> split(int start,
                                                 int end)
Specified by:
split in class GeneralDataset
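
The page does not describe what the two split methods return beyond their signatures, so the following is only a sketch of one plausible reading. It assumes that the range from start (inclusive) to end (exclusive) becomes the dev portion and everything else becomes the training portion, that percentDev is a fraction between 0 and 1, and that the dev slice is taken from the front of the data. SplitSketch and the ordering of the returned pair are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a train/dev split; NOT the Stanford implementation. */
final class SplitSketch {
    /** Returns a two-element list {train, dev}; dev holds items in [start, end). */
    static <T> List<List<T>> split(List<T> items, int start, int end) {
        if (start < 0 || end > items.size() || start > end) {
            throw new IndexOutOfBoundsException("bad range: " + start + ".." + end);
        }
        List<T> dev = new ArrayList<>(items.subList(start, end));
        // Training data is everything before the dev range plus everything after it.
        List<T> train = new ArrayList<>(items.subList(0, start));
        train.addAll(items.subList(end, items.size()));
        List<List<T>> result = new ArrayList<>();
        result.add(train);
        result.add(dev);
        return result;
    }

    /** Assumes percentDev is a fraction in [0, 1] and takes the dev slice from the front. */
    static <T> List<List<T>> split(List<T> items, double percentDev) {
        int devSize = (int) (percentDev * items.size());
        return split(items, 0, devSize);
    }
}
```

A range-based split of this kind is convenient for cross-validation, since successive folds can be produced by sliding start and end across the data.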

add

public void add(Datum d)
Specified by:
add in class GeneralDataset

add

public void add(Datum d,
                String src,
                String id)

getRVFDatum

public RVFDatum getRVFDatum(int index)
Specified by:
getRVFDatum in class GeneralDataset
Parameters:
index - the position of the datum in the dataset
Returns:
the datum at the given index

getRVFDatumSource

public String getRVFDatumSource(int index)

getRVFDatumId

public String getRVFDatumId(int index)

clear

public void clear()
Resets the Dataset so that it is empty and ready to collect data.

Overrides:
clear in class GeneralDataset

clear

public void clear(int numDatums)
Resets the Dataset so that it is empty and ready to collect data.

Overrides:
clear in class GeneralDataset
Parameters:
numDatums - initial capacity of dataset

initialize

protected void initialize(int numDatums)
Description copied from class: GeneralDataset
Resets the values of the dataset so that it is empty with an initial capacity of numDatums. Should be accessed only by appropriate methods within the class, such as clear(), which take care of the other parts of emptying the data.

Specified by:
initialize in class GeneralDataset
Parameters:
numDatums - initial capacity of dataset

summaryStatistics

public void summaryStatistics()
Prints some summary statistics to stderr for the Dataset.

Specified by:
summaryStatistics in class GeneralDataset

printFullFeatureMatrix

public void printFullFeatureMatrix(PrintWriter pw)
Prints the full feature matrix in tab-delimited form. These can be BIG matrices, so be careful! (printFullFeatureMatrixWithValues can also be used.)


printFullFeatureMatrixWithValues

public void printFullFeatureMatrixWithValues(PrintWriter pw)
Modification of printFullFeatureMatrix to correct bugs & print values (Rajat). Prints the full feature matrix in tab-delimited form. These can be BIG matrices, so be careful!
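
To see why a full matrix can get large, here is a minimal standalone sketch that expands sparse rows into a dense tab-delimited table. The exact layout the real method emits (header row, label column, number formatting) is not specified on this page, so the layout below is an assumption, and FullMatrixSketch and its parameters are hypothetical names.

```java
import java.io.PrintWriter;

/** Hypothetical sketch of dense tab-delimited output; NOT the Stanford implementation. */
final class FullMatrixSketch {
    static void printFullMatrixWithValues(PrintWriter pw, String[] featureNames,
                                          String[] labels, int[][] data, double[][] values) {
        // Header row: one column per known feature (assumed layout).
        pw.print("label");
        for (String name : featureNames) {
            pw.print('\t');
            pw.print(name);
        }
        pw.println();
        for (int i = 0; i < data.length; i++) {
            // Every row is expanded to the full feature count, zeros included.
            double[] dense = new double[featureNames.length];
            for (int j = 0; j < data[i].length; j++) {
                dense[data[i][j]] = values[i][j];
            }
            pw.print(labels[i]);
            for (double v : dense) {
                pw.print('\t');
                pw.print(v);
            }
            pw.println();
        }
        pw.flush();
    }
}
```

The output has one cell per datum per feature, so a dataset with many rarely active features produces a table that is mostly zeros.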


readSVMLightFormat

public static RVFDataset readSVMLightFormat(String filename)
Constructs a Dataset by reading in a file in SVM light format.


readSVMLightFormat

public static RVFDataset readSVMLightFormat(String filename,
                                            List<String> lines)
Constructs a Dataset by reading in a file in SVM light format. The lines parameter is filled with the lines of the file for further processing (if lines is null, it is assumed that no line information is desired).


readSVMLightFormat

public static RVFDataset readSVMLightFormat(String filename,
                                            Index featureIndex,
                                            Index labelIndex)
Constructs a Dataset by reading in a file in SVM light format. The created dataset has the same feature and label indices as those given.


svmLightLineToRVFDatum

public static RVFDatum svmLightLineToRVFDatum(String l)
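
This method has no description here. As a rough illustration of what converting one SVM light line involves, the standalone sketch below parses a line of the form "label feature:value feature:value ... # comment" into a label and a feature-to-value map. SvmLightLineSketch is a hypothetical class that does not use RVFDatum, and how the real method handles comments or malformed tokens is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of parsing one SVM light line; NOT the Stanford implementation. */
final class SvmLightLineSketch {
    final String label;
    final Map<String, Double> features;

    private SvmLightLineSketch(String label, Map<String, Double> features) {
        this.label = label;
        this.features = features;
    }

    static SvmLightLineSketch parse(String line) {
        // Drop a trailing "# comment", if present (assumed behavior).
        int hash = line.indexOf('#');
        String body = (hash >= 0 ? line.substring(0, hash) : line).trim();
        if (body.isEmpty()) {
            throw new IllegalArgumentException("empty line");
        }
        String[] tokens = body.split("\\s+");
        // The first token is the label; the remaining tokens are feature:value pairs.
        Map<String, Double> features = new LinkedHashMap<>();
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].lastIndexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("bad feature token: " + tokens[i]);
            }
            String name = tokens[i].substring(0, colon);
            double value = Double.parseDouble(tokens[i].substring(colon + 1));
            features.put(name, value);
        }
        return new SvmLightLineSketch(tokens[0], features);
    }
}
```

For example, parsing "+1 1:0.5 7:2.0 # note" with this sketch yields the label "+1" and two features, "1" with value 0.5 and "7" with value 2.0.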

printSparseFeatureMatrix

public void printSparseFeatureMatrix()
Prints the sparse feature matrix to System.out using printSparseFeatureMatrix(PrintWriter).


printSparseFeatureMatrix

public void printSparseFeatureMatrix(PrintWriter pw)
Prints a sparse feature matrix representation of the Dataset. Prints the actual Object.toString() representations of features.
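
A sparse printout lists only the features that are active in each datum. The standalone sketch below shows the idea, printing each feature through its toString() representation as the description says. The row layout (label first, tab-separated features) is an assumption, and SparseMatrixSketch and its parameters are hypothetical names.

```java
import java.io.PrintWriter;
import java.util.List;

/** Hypothetical sketch of sparse feature output; NOT the Stanford implementation. */
final class SparseMatrixSketch {
    static void printSparse(PrintWriter pw, List<?> featureIndex,
                            String[] labels, int[][] data) {
        for (int i = 0; i < data.length; i++) {
            pw.print(labels[i]);
            for (int featureId : data[i]) {
                pw.print('\t');
                // print(Object) writes String.valueOf(obj), i.e. the feature's toString().
                pw.print(featureIndex.get(featureId));
            }
            pw.println();
        }
        pw.flush();
    }
}
```

Each row's length depends only on the number of active features in that datum, not on the total number of features in the index.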


main

public static void main(String[] args)

getValuesArray

public double[][] getValuesArray()
Specified by:
getValuesArray in class GeneralDataset

toString

public String toString()
Overrides:
toString in class Object

toSummaryString

public String toSummaryString()


Stanford NLP Group