Computational Intelligence Library

Tracker: Feature Requests

5 Faster variance calculations - ID: 1835762
Last Update: Comment added ( ab_vanwyk )

I just figured out that...

dividing all elements of a vector by a constant and then normalizing the
vector

...is the same as...

normalizing the vector and then dividing the answer (scalar) by the
constant

:-D

I want to change the code that calculates the variance in
AssociatedPairDataSetBuilder to return a scalar instead of a Vector that
has to be normalized. I also want to normalize the diffSquare Vector before
dividing it by the set's size. This should be faster, because we save one
division calculation per element. This will also simplify code in other
classes, because we don't have to call .norm() on the Vectors that are
returned then.

Alternatively, just add another method that returns a scalar and keep the
one that returns a Vector.

I'm putting this on the tracker so that I don't forget about it.
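
The two orderings described above can be compared in a minimal sketch. It assumes plain double[] arrays stand in for CIlib's Vector, that diffSquare holds the per-dimension squared differences, and that "normalizing" here means taking the Euclidean norm. The class and method names are hypothetical and are not the actual AssociatedPairDataSetBuilder code.

```java
final class VarianceSketch {
    // Euclidean norm: sqrt(x_1^2 + ... + x_n^2).
    static double norm(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    // Divide every element by the set size, then take the norm
    // (one division per element).
    static double varianceDivideFirst(double[] diffSquare, int setSize) {
        double[] scaled = new double[diffSquare.length];
        for (int i = 0; i < diffSquare.length; i++) {
            scaled[i] = diffSquare[i] / setSize;
        }
        return norm(scaled);
    }

    // Take the norm first, then divide once (a single division in total).
    static double varianceNormFirst(double[] diffSquare, int setSize) {
        return norm(diffSquare) / setSize;
    }

    public static void main(String[] args) {
        double[] diffSquare = {3.0, 4.0};
        // Both values agree up to floating-point rounding.
        System.out.println(varianceDivideFirst(diffSquare, 5));
        System.out.println(varianceNormFirst(diffSquare, 5));
    }
}
```

The two results can differ in the last few bits because of rounding, so any unit test comparing them should probably use a small tolerance rather than exact equality.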


Theuns Cloete ( heatzync ) - 2007-11-21 13:32

Priority: 5
Status: Open
Resolution: Fixed
Assigned To: Gary Pampara
Category: None
Group: None
Visibility: Public


Comments ( 17 )

Date: 2008-10-07 08:45
Sender: ab_vanwyk

That is normalization, if that is what is required.


Date: 2008-10-07 08:42
Sender: gpampara (Project Admin)

So you want to create the unit vector of a vector?


Date: 2008-10-07 08:41
Sender: ab_vanwyk

I had a look at the class now. As is, the implementation is correct: it
is indeed the vector's norm that is calculated. I think Theuns used the
word "normalized" incorrectly in the Tracker submission. Currently CIlib
cannot normalize a vector. If you want to add that, here is some code (not
tested):

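    // Scales this vector in place to unit length (untested sketch).
    void normalize() {
        double norm = this.norm();

        // Already unit length, or the zero vector (which has no direction).
        if (norm == 1 || norm == 0) {
            return;
        }

        for (int i = 0; i < this.size(); i++) {
            this.setReal(i, this.getReal(i) / norm);
        }
    }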

Sorry if I caused unnecessary confusion; it just didn't make sense to me
that the result of normalization is a scalar.


Date: 2008-10-07 08:22
Sender: gpampara (Project Admin)

That's currently the implementation in the Vector class, i.e.
||x|| = sqrt(x_1^2 + ... + x_n^2).

I'm really confused about what all this is really about now... Is
something missing?


Date: 2008-10-07 08:16
Sender: ab_vanwyk

Hi

I think I might have figured out what Theuns has in mind. To calculate the
norm of a vector (not to normalize it) is to assign a positive scalar to
the vector, which is its size in a vector space. Usually the norm is the
Euclidean length, but it can be different if a different norm calculation
is used (Manhattan etc.).

If this is the case, the method name is incorrect: it should be norm(),
which returns a scalar, and then what Theuns said is true. Normalization is
the process of finding a vector of unit length parallel to the original
vector (u = v / norm(v)), which is still a useful method to have, since it
is a common operation in vector algebra.
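
The distinction drawn above between a norm (a scalar) and normalization (a unit vector) can be sketched as follows. This is an illustration only: it assumes plain double[] arrays rather than CIlib's Vector type, and the class and method names are hypothetical.

```java
import java.util.Arrays;

final class NormSketch {
    // Euclidean (L2) norm: sqrt(x_1^2 + ... + x_n^2). Returns a scalar.
    static double euclideanNorm(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    // Manhattan (L1) norm: |x_1| + ... + |x_n|. Also a scalar.
    static double manhattanNorm(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += Math.abs(x);
        }
        return sum;
    }

    // Normalization: returns a new vector u = v / ||v|| of unit Euclidean
    // length. The zero vector has no direction and is returned as a copy.
    static double[] normalized(double[] v) {
        double n = euclideanNorm(v);
        double[] u = v.clone();
        if (n == 0.0) {
            return u;
        }
        for (int i = 0; i < u.length; i++) {
            u[i] /= n;
        }
        return u;
    }

    public static void main(String[] args) {
        double[] v = {3.0, -4.0};
        System.out.println(euclideanNorm(v));                // 5.0
        System.out.println(manhattanNorm(v));                // 7.0
        System.out.println(Arrays.toString(normalized(v)));  // a unit vector parallel to v
    }
}
```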


Date: 2008-10-07 08:03
Sender: gpampara (Project Admin)

There is definitely some confusion here. A normal vector is a vector of
unit length that is orthogonal to a specific point (be it on a surface or
whatever).

Theuns, can you provide the math for this?


Date: 2008-10-07 07:54
Sender: heatzync

Sorry, I clearly used the wrong terms. What I meant was:

dividing all elements of a vector by a scalar and then taking the norm of
the vector (which is a scalar)

...is the same as...

taking the norm of the vector (which is a scalar) and dividing it by the
scalar
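
The corrected claim follows directly from the definition of the Euclidean norm. A short derivation, assuming a nonzero scalar c (the symbols x, c, and n are chosen here for illustration):

```latex
\[
\left\lVert \frac{x}{c} \right\rVert
  = \sqrt{\sum_{i=1}^{n} \left( \frac{x_i}{c} \right)^{2}}
  = \sqrt{\frac{1}{c^{2}} \sum_{i=1}^{n} x_i^{2}}
  = \frac{1}{\lvert c \rvert} \sqrt{\sum_{i=1}^{n} x_i^{2}}
  = \frac{\lVert x \rVert}{\lvert c \rvert}
\]
```

For a positive scalar such as a set's size, |c| = c, so the two orderings give the same result.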


Date: 2008-10-07 07:45
Sender: ab_vanwyk

I might misunderstand your statement. But if a vector is normalized then
the answer is still a vector (of unit length) and not a scalar. So dividing
an unnormalized vector by a scalar is not the same as dividing the
equivalent normalized vector by a scalar (the vectors will have the same
direction, but different magnitudes).




Date: 2008-05-22 08:11
Sender: heatzync


I removed the methods that calculated the mean and variance of a
set/cluster/collection from AssociatedPairDataSetBuilder and moved them to
the StatUtils class. Also added unit tests. The
AssociatedPairDataSetBuilder caches and keeps track of its own mean and
variance though.

This change occurred in revision 751.


Date: 2008-02-27 09:29
Sender: gpampara (Project Admin)


The initial StatsUtils has been committed. This will be refactored as
needed.


Date: 2007-12-22 03:20
Sender: sf-robot (SourceForge.net Site Admin)


This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).


Date: 2007-12-07 11:20
Sender: heatzync


This tracker item depends on the StatsUtil class that still needs to be
implemented... I'm therefore changing the status to "pending".


Date: 2007-11-22 05:47
Sender: gpampara (Project Admin)


Small steps. The new dataset stuff can use it but I don't think building
the stats stuff into it is a good idea. The stats stuff will be the
simplest building blocks from which other actions can benefit. I have a
class I wrote for some functions with this code in it. I'm quickly gonna
find it


Date: 2007-11-22 05:37
Sender: heatzync


We should also keep the "new dataset stuff" in mind, since some statistics
will be performed on it.


Date: 2007-11-22 05:34
Sender: heatzync


A generic StatisticsUtil class would be a most excellent idea. I've also
been toying with the idea... It will reduce the amount of code in CILib
dramatically, because means, deviations and variances are calculated all
over the show. We would just have to define a standard way of representing
a set of Types, i.e. will we use ArrayList<Type> or will we create a new
CILib-specific class? This is so that everyone who wants to create sets of
anything will use the CILib standard, because then it will work with the
StatisticsUtil class. We will also need methods for scalars as well as
vectors, because they are handled differently, e.g.:

double getMean(CILibSetStandardClass<Numeric> set); //and
Vector getMean(CILibSetStandardClass<Vector> set);

Currently, I have getMean() and getVariance() in
AssociatedPairDataSetBuilder. Those methods return the mean and variance of
the "this" dataset respectively. Both of them make use of the getSetMean()
and getSetVariance() methods, also in AssociatedPairDataSetBuilder.
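
A minimal sketch of what such a utility class might look like, assuming java.util.List as the set representation, double[] in place of CIlib's Vector, and population (divide-by-n) variance. All names here are hypothetical; none of this is the class that was eventually committed.

```java
import java.util.Arrays;
import java.util.List;

final class StatisticsUtilSketch {
    // Mean of a non-empty set of scalars.
    static double mean(List<Double> set) {
        if (set.isEmpty()) {
            throw new IllegalArgumentException("set must not be empty");
        }
        double sum = 0.0;
        for (double x : set) {
            sum += x;
        }
        return sum / set.size();
    }

    // Population variance of a non-empty set of scalars.
    static double variance(List<Double> set) {
        double m = mean(set);
        double sumSq = 0.0;
        for (double x : set) {
            sumSq += (x - m) * (x - m);
        }
        return sumSq / set.size();
    }

    // Component-wise mean of a non-empty set of vectors.
    // Assumes all vectors have the same length.
    static double[] meanVector(List<double[]> set) {
        if (set.isEmpty()) {
            throw new IllegalArgumentException("set must not be empty");
        }
        double[] result = new double[set.get(0).length];
        for (double[] v : set) {
            for (int i = 0; i < result.length; i++) {
                result[i] += v[i];
            }
        }
        for (int i = 0; i < result.length; i++) {
            result[i] /= set.size();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(mean(Arrays.asList(1.0, 2.0, 3.0)));
        System.out.println(variance(Arrays.asList(1.0, 2.0, 3.0)));
        System.out.println(Arrays.toString(
                meanVector(Arrays.asList(new double[] {1.0, 2.0}, new double[] {3.0, 4.0}))));
    }
}
```

The scalar and vector versions have different names in this sketch because two Java overloads whose parameters differ only in a generic type argument have the same erasure and would not compile side by side in one class.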


Date: 2007-11-21 18:00
Sender: gpampara (Project Admin)


Yup :)

I have been planning a generic StatisticsUtil class of sorts for some time
now. Now should be a good time for us to add something like this.


Date: 2007-11-21 16:14
Sender: gpampara (Project Admin)


Where are we doing this calculation? I have some ideas to refactor these
operations into a standard set of utilities.


Attached File

No Files Currently Attached

Changes ( 18 )

Field          Old Value         Date              By
close_date     2008-10-07 06:18  2008-10-07 07:54  heatzync
status_id      Closed            2008-10-07 07:54  heatzync
close_date     -                 2008-10-07 06:18  gpampara
status_id      Open              2008-10-07 06:18  gpampara
status_id      Closed            2008-05-22 08:11  heatzync
close_date     2008-02-27 09:29  2008-05-22 08:11  heatzync
resolution_id  Later             2008-02-27 09:29  gpampara
status_id      Open              2008-02-27 09:29  gpampara
assigned_to    heatzync          2008-02-27 09:29  gpampara
close_date     -                 2008-02-27 09:29  gpampara
close_date     2007-12-22 03:20  2007-12-23 15:45  gpampara
status_id      Closed            2007-12-23 15:45  gpampara
status_id      Pending           2007-12-22 03:20  sf-robot
close_date     2007-12-07 11:20  2007-12-22 03:20  sf-robot
resolution_id  None              2007-12-07 11:20  heatzync
status_id      Open              2007-12-07 11:20  heatzync
close_date     -                 2007-12-07 11:20  heatzync
assigned_to    nobody            2007-11-21 13:33  heatzync