public class CoNLLBenchmark
extends java.lang.Object
Loads the CoNLL dataset and 300-dimensional Google word embeddings, then trains a model on the data using binary and unary factors. This is a nice illustration of why ConcatVector is a key data structure: there is no need to specify the number of words in advance anywhere, and the data structures happily resize with a minimum of GC wastage.
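To make the resizing point concrete, here is a minimal sketch of the idea behind a concatenated vector: a list of independently sized dense segments that grows on demand, so the vocabulary size never has to be declared up front. This is a hypothetical illustration (class and method names are invented), not the real ConcatVector API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a vector made of concatenated segments that
// resize lazily, so new words can be added at any time without a
// global reallocation.
class GrowableConcatVector {
    // Each component is a dense segment; the list itself grows lazily.
    private final List<double[]> components = new ArrayList<>();

    // Writing to an unseen component or index grows only the affected
    // segment, keeping garbage-collector churn low.
    public void set(int component, int index, double value) {
        while (components.size() <= component) {
            components.add(new double[0]);
        }
        double[] seg = components.get(component);
        if (seg.length <= index) {
            double[] bigger = new double[Math.max(index + 1, seg.length * 2)];
            System.arraycopy(seg, 0, bigger, 0, seg.length);
            seg = bigger;
            components.set(component, seg);
        }
        seg[index] = value;
    }

    // Reads outside the allocated region are implicitly zero.
    public double get(int component, int index) {
        if (component >= components.size()) return 0.0;
        double[] seg = components.get(component);
        return index < seg.length ? seg[index] : 0.0;
    }

    // Dot product against another vector, treating missing entries as zero,
    // so vectors of different "sizes" can still be compared.
    public double dot(GrowableConcatVector other) {
        double sum = 0.0;
        int n = Math.min(components.size(), other.components.size());
        for (int c = 0; c < n; c++) {
            double[] a = components.get(c);
            double[] b = other.components.get(c);
            int m = Math.min(a.length, b.length);
            for (int i = 0; i < m; i++) {
                sum += a[i] * b[i];
            }
        }
        return sum;
    }
}
```

Because missing entries behave as zero, a model trained before a new word appeared still dots cleanly against feature vectors that mention it.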
Modifier and Type | Class and Description
---|---
static class | CoNLLBenchmark.CoNLLSentence

Constructor and Description
---
CoNLLBenchmark()
Modifier and Type | Method and Description
---|---
void | benchmarkOptimizer()
GraphicalModel | generateSentenceModel(ConcatVectorNamespace namespace, CoNLLBenchmark.CoNLLSentence sentence, java.util.List<java.lang.String> tags)
java.util.Map<java.lang.String,double[]> | getEmbeddings(java.lang.String cacheFilename, java.util.List<CoNLLBenchmark.CoNLLSentence> sentences)
java.util.List<CoNLLBenchmark.CoNLLSentence> | getSentences(java.lang.String filename)
static void | main(java.lang.String[] args)
public static void main(java.lang.String[] args) throws java.lang.Exception
Throws:
java.lang.Exception

public void benchmarkOptimizer() throws java.lang.Exception
Throws:
java.lang.Exception

public GraphicalModel generateSentenceModel(ConcatVectorNamespace namespace, CoNLLBenchmark.CoNLLSentence sentence, java.util.List<java.lang.String> tags)

public java.util.List<CoNLLBenchmark.CoNLLSentence> getSentences(java.lang.String filename) throws java.io.IOException
Throws:
java.io.IOException

public java.util.Map<java.lang.String,double[]> getEmbeddings(java.lang.String cacheFilename, java.util.List<CoNLLBenchmark.CoNLLSentence> sentences) throws java.io.IOException, java.lang.ClassNotFoundException
Throws:
java.io.IOException
java.lang.ClassNotFoundException
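The cacheFilename parameter and the checked IOException and ClassNotFoundException on getEmbeddings suggest a read-through cache built on Java serialization: deserialize the embedding map if a cache file exists, otherwise build it and serialize it for next time. A minimal sketch of that pattern follows; the EmbeddingCache helper and its loadOrBuild method are hypothetical, not the benchmark's actual code.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a read-through embedding cache using
// Java serialization, matching getEmbeddings' checked exceptions.
class EmbeddingCache {
    @SuppressWarnings("unchecked")
    static Map<String, double[]> loadOrBuild(String cacheFilename,
                                             Supplier<Map<String, double[]>> builder)
            throws IOException, ClassNotFoundException {
        File cache = new File(cacheFilename);
        if (cache.exists()) {
            // Cache hit: deserialize the map (may throw ClassNotFoundException).
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(cache))) {
                return (Map<String, double[]>) in.readObject();
            }
        }
        // Cache miss: build the embeddings (e.g. by parsing the raw
        // embedding file) and serialize them for the next run.
        Map<String, double[]> embeddings = builder.get();
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(cache))) {
            out.writeObject(embeddings);
        }
        return embeddings;
    }
}
```

On the second run the expensive parse is skipped entirely, since the serialized map is read straight from disk.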