I just figured out that...
dividing all elements of a vector by a constant and then normalizing the vector
...is the same as...
normalizing the vector and then dividing the answer (scalar) by the constant
:-D
I want to change the code that calculates the variance in AssociatedPairDataSetBuilder to return a scalar instead of a Vector that has to be normalized. I also want to normalize the diffSquare Vector before dividing it by the set's size. This should be faster, because we save one division per element. It will also simplify code in other classes, because we no longer have to call .norm() on the Vectors that are returned.
Alternatively, just add another method that returns a scalar and keep the one that returns a Vector.
I'm putting this on the tracker so that I don't forget about it.
Logged In: YES
user_id=975307
Originator: NO
Where are we doing this calculation? I have some ideas to refactor these operations into a standard set of utilities.
Logged In: YES
user_id=975307
Originator: NO
Yup :)
I have been planning a generic StatisticsUtil class of sorts for some time now. Now should be a good time for us to add something like this.
Logged In: YES
user_id=1079274
Originator: YES
A generic StatisticsUtil class would be a most excellent idea. I've also been toying with the idea... It will reduce the amount of code in CILib dramatically, because means, deviations and variances are calculated all over the place. We would just have to define a standard way of representing a set of Types, i.e. will we use ArrayList<Type> or will we create a new CILib-specific class? This is so that everyone who wants to create sets of anything will use the CILib standard, because then it will work with the StatisticsUtil class. We will also need methods for scalars as well as vectors, because they are handled differently, i.e.:
double getMean(CILibSetStandardClass<Numeric> set); //and
Vector getMean(CILibSetStandardClass<Vector> set);
Currently, I have getMean() and getVariance() in AssociatedPairDataSetBuilder. Those methods return the mean and variance of the "this" dataset respectively. Both of them make use of the getSetMean() and getSetVariance() methods, also in AssociatedPairDataSetBuilder.
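To make the scalar/vector distinction concrete, here is a minimal sketch of what the two overloads could look like. It is not the CILib API: the class name StatisticsUtilSketch is hypothetical, and plain arrays stand in for the set type that still has to be agreed on.

```java
// Hypothetical sketch, not the CILib API. Plain arrays stand in for the
// yet-to-be-decided CILib set type.
final class StatisticsUtilSketch {
    private StatisticsUtilSketch() {}

    /** Mean of a set of scalars. */
    static double getMean(double[] set) {
        if (set.length == 0) {
            throw new IllegalArgumentException("empty set");
        }
        double sum = 0.0;
        for (double x : set) {
            sum += x;
        }
        return sum / set.length;
    }

    /** Element-wise mean of a set of equally sized vectors. */
    static double[] getMean(double[][] set) {
        if (set.length == 0) {
            throw new IllegalArgumentException("empty set");
        }
        int dim = set[0].length;
        double[] mean = new double[dim];
        for (double[] v : set) {
            if (v.length != dim) {
                throw new IllegalArgumentException("dimension mismatch");
            }
            // Accumulate each component separately.
            for (int i = 0; i < dim; i++) {
                mean[i] += v[i];
            }
        }
        for (int i = 0; i < dim; i++) {
            mean[i] /= set.length;
        }
        return mean;
    }
}
```

The scalar version returns a double and the vector version returns a vector of the same dimension, which is why the two cases need separate methods.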
Logged In: YES
user_id=1079274
Originator: YES
We should also keep the "new dataset stuff" in mind, since some statistics will be performed on it.
Logged In: YES
user_id=975307
Originator: NO
Small steps. The new dataset stuff can use it but I don't think building the stats stuff into it is a good idea. The stats stuff will be the simplest building blocks from which other actions can benefit. I have a class I wrote for some functions with this code in it. I'm quickly gonna find it
Logged In: YES
user_id=1079274
Originator: YES
This tracker item depends on the StatsUtil class that still needs to be implemented... I'm therefore changing the status to "pending".
Logged In: YES
user_id=1312539
Originator: NO
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).
Logged In: YES
user_id=975307
Originator: NO
The initial StatsUtils has been committed. This will be refactored as needed.
Logged In: YES
user_id=1079274
Originator: YES
I removed the methods that calculated the mean and variance of a set/cluster/collection from AssociatedPairDataSetBuilder and moved them to the StatUtils class. Also added unit tests. The AssociatedPairDataSetBuilder caches and keeps track of its own mean and variance though.
This change occurred in revision 751.
I might misunderstand your statement. But if a vector is normalized then the answer is still a vector (of unit length) and not a scalar. So dividing an unnormalized vector by a scalar is not the same as dividing the equivalent normalized vector by a scalar (the vectors will have the same direction, but different magnitudes).
Sorry, I clearly used the wrong terms. What I meant was:
dividing all elements of a vector by a scalar and then taking the norm of the
vector (which is a scalar)
...is the same as...
taking the norm of the vector (which is a scalar) and dividing it by the scalar
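For the Euclidean norm this follows from ||v / c|| = sqrt(sum_i (v_i / c)^2) = ||v|| / |c|, so it holds as stated for any positive c, such as a set size. A minimal sketch that checks it numerically is given below; it assumes plain double[] arrays instead of the CILib Vector class, and the class and method names are made up for illustration.

```java
// Hypothetical sketch: double[] stands in for the CILib Vector class.
final class NormIdentitySketch {
    private NormIdentitySketch() {}

    /** Euclidean norm: sqrt(x_1^2 + ... + x_n^2). */
    static double norm(double[] v) {
        double sumSq = 0.0;
        for (double x : v) {
            sumSq += x * x;
        }
        return Math.sqrt(sumSq);
    }

    /** Divides every element by c first, then takes the norm (n divisions). */
    static double divideThenNorm(double[] v, double c) {
        double[] scaled = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            scaled[i] = v[i] / c;
        }
        return norm(scaled);
    }

    /** Takes the norm first, then divides the scalar by c (one division). */
    static double normThenDivide(double[] v, double c) {
        return norm(v) / c;
    }
}
```

Both methods agree up to floating-point rounding when c is positive. For a negative c the second form would have to divide by Math.abs(c) instead.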
There is definitely some confusion here. A normal vector is a vector that is orthogonal to a surface (or curve) at a specific point; a unit normal additionally has unit length.
Theuns, can you provide the math for this?
Hi
I think I might have figured out what Theuns has in mind. To calculate the norm of a vector (not to normalize it) is to assign a non-negative scalar to the vector, which is its size in a vector space. Usually the norm is the Euclidean length, but it can be different if a different norm calculation is used (Manhattan etc.).
If this is the case, the method name is incorrect; it should be norm(), which returns a scalar, and then what Theuns said is true. Normalization is the process of finding a vector of unit length parallel to the original vector (u = v / norm(v)), which is still a useful method to have, since it is a common operation in vector algebra.
That's currently the implementation in the Vector class, i.e. ||x|| = sqrt(x_1^2 + ... + x_n^2).
I'm really confused about what all this is really about now... Is something missing?
I had a look at the class now. As is, the implementation is correct: that is indeed the vector's norm that is calculated. I think Theuns used the word "normalized" incorrectly in the Tracker submission. Currently CILib cannot normalize a vector. If you want to add it, here is some code (not tested):
void normalize() {
    double norm = this.norm();
    // Nothing to do for a unit vector; a zero vector cannot be normalized.
    if (norm == 1.0 || norm == 0.0) {
        return;
    }
    // Scale every component so that the resulting vector has unit length.
    for (int i = 0; i < this.size(); i++) {
        this.setReal(i, this.getReal(i) / norm);
    }
}
Sorry if I caused unnecessary confusion, it just didn't make sense to me that the result of normalization is a scalar.
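Since the snippet above is untested, here is a standalone sketch of the same idea that can be run outside CILib. It assumes plain double[] arrays rather than the Vector class, returns a new array instead of modifying in place, and uses a hypothetical class name.

```java
// Hypothetical sketch: double[] stands in for the CILib Vector class.
final class NormalizeSketch {
    private NormalizeSketch() {}

    /** Euclidean norm: a scalar. */
    static double norm(double[] v) {
        double sumSq = 0.0;
        for (double x : v) {
            sumSq += x * x;
        }
        return Math.sqrt(sumSq);
    }

    /**
     * Normalization: returns a new vector of unit length parallel to v.
     * A zero vector has no direction, so a copy is returned unchanged.
     */
    static double[] normalize(double[] v) {
        double n = norm(v);
        double[] u = v.clone();
        if (n == 0.0) {
            return u;
        }
        for (int i = 0; i < u.length; i++) {
            u[i] /= n;
        }
        return u;
    }
}
```

The distinction discussed in this thread shows up in the return types: norm() yields a scalar, normalize() yields a vector whose norm is 1 (up to rounding).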
So you want to create the unit vector of a vector?
That is normalization, if that is what is required.