public class ConcatVector
extends java.lang.Object
Implements a concat vector using an array of arrays, with all its attendant resizing efficiencies and double-pointer inefficiencies. Benchmarking in MinimalML (from which I adapted this design) shows that this is the most efficient of several strategies that can be used to implement it.
What is a ConcatVector? Why do I need it?
In short, you want this for online learning, where you may not know the sizes of all your sparse features at initialization. A concat vector is a vector that behaves like a concatenation of smaller component vectors when you take a dot product. However, it never physically concatenates anything; it just takes the dot product of each component and sums the results. That way, if you need to expand a component during online learning, it's no problem. As an auxiliary benefit, you can specify sparse and dense components, which greatly speeds up dot product calculation when you have lots of sparse features.
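The component-wise dot product described above can be sketched roughly as follows. This is a minimal illustration, not the class's actual implementation: `ConcatDotSketch` and `dot` are hypothetical names, and components are modeled as plain `double[]` arrays, with missing components and missing trailing entries treated as 0.

```java
// Hypothetical stand-in for illustration; this is not the library's ConcatVector.
final class ConcatDotSketch {
    // Dot product of two "concat vectors" stored as arrays of component arrays.
    // Nothing is concatenated: each component pair is multiplied and summed on its own.
    // Components or entries present in only one vector meet implicit zeros and add nothing.
    static double dot(double[][] a, double[][] b) {
        double sum = 0.0;
        int components = Math.min(a.length, b.length);
        for (int c = 0; c < components; c++) {
            if (a[c] == null || b[c] == null) {
                continue; // an unset component behaves like all zeros
            }
            int n = Math.min(a[c].length, b[c].length);
            for (int i = 0; i < n; i++) {
                sum += a[c][i] * b[c][i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] weights = { {1.0, 2.0}, {3.0} };
        double[][] features = { {1.0, 1.0}, {2.0} };
        System.out.println(dot(weights, features)); // 1*1 + 2*1 + 3*2

        // Expanding one component touches only that component's array.
        features[1] = new double[] {2.0, 5.0};
        System.out.println(dot(weights, features));
    }
}
```

Because each component lives in its own array, growing one feature's component does not shift the offsets of any other component, which is the property that matters for online learning.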
Constructor and Description |
---|
ConcatVector(int numComponents) Constructor that initializes space for this concat vector. |
Modifier and Type | Method and Description |
---|---|
void | addVectorInPlace(ConcatVector other, double multiple) This will add the vector "other" to this vector, scaling other by multiple. |
ConcatVector | deepClone() |
double | dotProduct(ConcatVector other) This function assumes both vectors are infinitely padded with 0s, so it won't complain if there's a dim mismatch. |
void | elementwiseProductInPlace(ConcatVector other) This will multiply the vector "other" to this vector. |
double[] | getDenseComponent(int i) This function will throw an assert if the component you're requesting isn't dense |
int | getNumberOfComponents() |
ConcatVectorProto.ConcatVector.Builder | getProtoBuilder() |
int | getSparseIndex(int component) Gets you the index of one hot in a component, assuming it is sparse. |
double | getValueAt(int component, int offset) This assumes infinite padding with 0s. |
boolean | isComponentSparse(int i) |
void | mapInPlace(java.util.function.DoubleUnaryOperator fn) Apply a function to every element of every component of this vector, and replace with the result. |
ConcatVector | newEmptyClone() Creates a ConcatVector whose dimensions are the same as this one for all dense components, but is otherwise completely empty. |
static ConcatVector | readFromProto(ConcatVectorProto.ConcatVector m) Recreates an in-memory concat vector object from a Proto serialization. |
static ConcatVector | readFromStream(java.io.InputStream stream) Static function to deserialize a concat vector from an input stream. |
void | setDenseComponent(int component, double[] values) Sets a single component of the concat vector value as a dense vector. |
void | setSparseComponent(int component, int index, double value) Sets a single component of the concat vector value as a sparse, one hot value. |
java.lang.String | toString() |
boolean | valueEquals(ConcatVector other, double tolerance) Compares two concat vectors by value. |
void | writeToStream(java.io.OutputStream stream) Writes the protobuf version of this vector to a stream. |
public ConcatVector(int numComponents)
Parameters:
numComponents - The number of components (usually number of features) to allocate for.
public ConcatVector newEmptyClone()
public void setDenseComponent(int component, double[] values)
Parameters:
component - the index of the component to set
values - the array of dense values to put into the component
public void setSparseComponent(int component, int index, double value)
Parameters:
component - the index of the component to set
index - the index within the component to one-hot
value - the value at that index
public double dotProduct(ConcatVector other)
Parameters:
other - the concat vector to take the dot product with
public ConcatVector deepClone()
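To illustrate why a sparse, one-hot component (as set by setSparseComponent) makes dotProduct cheap, here is a rough sketch. The class and method names are hypothetical and are not part of the documented API; the sketch assumes a one-hot component can be represented by a single (index, value) pair and that out-of-range positions read as 0.

```java
// Hypothetical illustration only; these names are not part of the documented API.
final class OneHotDotSketch {
    // Dot product of a one-hot component (one value at `index`) with a dense component.
    // An index beyond the dense array falls into the implicit zero padding.
    static double dotOneHotWithDense(int index, double value, double[] dense) {
        if (index < 0 || index >= dense.length) {
            return 0.0;
        }
        return value * dense[index]; // a single multiply, regardless of dense.length
    }

    // Dot product of two one-hot components: nonzero only when the hot indices match.
    static double dotOneHotWithOneHot(int indexA, double valueA, int indexB, double valueB) {
        return indexA == indexB ? valueA * valueB : 0.0;
    }

    public static void main(String[] args) {
        double[] dense = {1.0, 2.0, 4.0};
        System.out.println(dotOneHotWithDense(2, 0.5, dense));
        System.out.println(dotOneHotWithDense(7, 0.5, dense));
        System.out.println(dotOneHotWithOneHot(3, 2.0, 3, 1.5));
    }
}
```

The cost of the sparse case is constant in the size of the dense component, which is where the speedup for many sparse features would come from.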
public void addVectorInPlace(ConcatVector other, double multiple)
this = this + (other * multiple)
The function assumes that both vectors are padded infinitely with 0s, so it will grow this vector by adding components and changing component sizes (dense to bigger dense) and shapes (sparse to dense) in order to accommodate the result.
Parameters:
other - the vector to add to this one
multiple - the multiple to use
public void elementwiseProductInPlace(ConcatVector other)
this = this .* other
The function assumes that both vectors are padded infinitely with 0s, so it will result in lots of 0s in this vector if it is longer than 'other'.
Parameters:
other - the vector to multiply into this one
public void mapInPlace(java.util.function.DoubleUnaryOperator fn)
Parameters:
fn - the function to apply to every element of every component.
public int getNumberOfComponents()
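The semantics of addVectorInPlace, elementwiseProductInPlace, and mapInPlace on a single dense component can be sketched as below. This is an assumption-laden illustration, not the class's code: `InPlaceOpsSketch` and its methods are hypothetical, and because a Java array cannot grow, `addScaled` returns a possibly larger array instead of mutating in place.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Hypothetical single-component illustration of the three operations described above.
final class InPlaceOpsSketch {
    // result = target + other * multiple, growing the result when `other` is longer.
    static double[] addScaled(double[] target, double[] other, double multiple) {
        // Arrays.copyOf pads any new trailing entries with 0.0.
        double[] result = Arrays.copyOf(target, Math.max(target.length, other.length));
        for (int i = 0; i < other.length; i++) {
            result[i] += other[i] * multiple;
        }
        return result;
    }

    // target = target .* other; positions beyond other's length multiply by a padded 0.
    static void elementwiseProduct(double[] target, double[] other) {
        for (int i = 0; i < target.length; i++) {
            target[i] = i < other.length ? target[i] * other[i] : 0.0;
        }
    }

    // Replaces every element with fn(element).
    static void map(double[] target, DoubleUnaryOperator fn) {
        for (int i = 0; i < target.length; i++) {
            target[i] = fn.applyAsDouble(target[i]);
        }
    }

    public static void main(String[] args) {
        double[] grown = addScaled(new double[] {1.0, 2.0}, new double[] {1.0, 1.0, 1.0}, 2.0);
        System.out.println(Arrays.toString(grown));

        elementwiseProduct(grown, new double[] {5.0});
        map(grown, x -> x * x);
        System.out.println(Arrays.toString(grown));
    }
}
```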
public boolean isComponentSparse(int i)
Parameters:
i - the index of the component to check
public double[] getDenseComponent(int i)
Parameters:
i - the index of the component to look at
public double getValueAt(int component, int offset)
Parameters:
component - the index of the component to retrieve a value from
offset - the offset within that component
public int getSparseIndex(int component)
Parameters:
component - the index of the sparse component.
public void writeToStream(java.io.OutputStream stream) throws java.io.IOException
Parameters:
stream - the output stream to write to
Throws:
java.io.IOException - passed through from the stream
public static ConcatVector readFromStream(java.io.InputStream stream) throws java.io.IOException
Parameters:
stream - the stream to read from, assuming protobuf encoding
Throws:
java.io.IOException - passed through from the stream
public ConcatVectorProto.ConcatVector.Builder getProtoBuilder()
public static ConcatVector readFromProto(ConcatVectorProto.ConcatVector m)
Parameters:
m - the concat vector proto
public boolean valueEquals(ConcatVector other, double tolerance)
Parameters:
other - the vector we're comparing to
tolerance - the amount any pair of values can differ before we say the two vectors are different.
public java.lang.String toString()
Overrides:
toString in class java.lang.Object
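For valueEquals above, a tolerance-based comparison on a single component might look like the sketch below. This is only an illustration under stated assumptions: `ValueEqualsSketch` is a hypothetical name, the zero padding of the shorter array is assumed by analogy with the other methods on this page, and the choice of a strict "greater than tolerance" test is likewise an assumption.

```java
// Hypothetical illustration; the real valueEquals may differ in its padding and boundary rules.
final class ValueEqualsSketch {
    // Returns true when every pair of corresponding values differs by at most `tolerance`.
    // Assumption: the shorter array is treated as if padded with zeros.
    static boolean valueEquals(double[] a, double[] b, double tolerance) {
        int n = Math.max(a.length, b.length);
        for (int i = 0; i < n; i++) {
            double x = i < a.length ? a[i] : 0.0;
            double y = i < b.length ? b[i] : 0.0;
            if (Math.abs(x - y) > tolerance) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(valueEquals(new double[] {1.0, 2.0}, new double[] {1.0, 2.0005}, 1e-3));
        System.out.println(valueEquals(new double[] {1.0}, new double[] {1.0, 0.5}, 1e-3));
    }
}
```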