[luciddb-users] Matrix storage


Hi.

We are implementing a site-visitor similarity engine in which we match the
unique visitors of each site against those of every other site.

We tried to do this with MySQL and MonetDB, using something like the
following (pseudocode):

foreach(source in sites)
  sourceUV = select count(distinct uid) from UniqueSiteVisitorSample where site = $source
  foreach(target in sites)
    targetUV = select count(distinct uid) from UniqueSiteVisitorSample where site = $target
    totalUV = select count(distinct uid) from UniqueSiteVisitorSample where site in ($source, $target)
    calcAndStoreSimilarity(source,target,sourceUV,targetUV,totalUV)
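
To make the three counts concrete, here is a minimal in-memory sketch of the same loop in Python. It only illustrates what is being computed, not how we run it: the function name pair_counts and the sample data are made up, and calcAndStoreSimilarity is left out because its formula is not shown here.

```python
from itertools import permutations


def pair_counts(visitors_by_site):
    """Yield (source, target, sourceUV, targetUV, totalUV) for every
    ordered pair of distinct sites.

    visitors_by_site maps a site id to the set of uids seen on that site
    (a hypothetical in-memory stand-in for UniqueSiteVisitorSample).
    """
    for source, target in permutations(visitors_by_site, 2):
        source_uids = visitors_by_site[source]
        target_uids = visitors_by_site[target]
        # totalUV is the number of distinct uids seen on either site.
        total_uv = len(source_uids | target_uids)
        yield source, target, len(source_uids), len(target_uids), total_uv


# Made-up sample data: three sites, two of which share two visitors.
sample = {1: {10, 11, 12}, 2: {11, 12, 13}, 3: {99}}
rows = list(pair_counts(sample))
for row in rows:
    print(row)
```

With n sites this yields n x (n - 1) rows, which is where the comparison count below comes from.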

The thing is that we have 40 000 sites in our network, and each will be
compared against every other: 40 000 x 39 999, or about 1.6 billion comparisons.

We need to be able to compare at least a few hundred site pairs per second,
and ideally a few thousand per second, to get the job done in a reasonable
time (30-40 days). At the current rate we will be done in 5 years :)
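
The arithmetic behind those figures can be checked with a short Python sketch. The only inputs are the numbers above (40 000 sites, 30-40 days, 5 years); the 365-day year and the function name required_rate are assumptions made for the calculation.

```python
SITES = 40_000
SECONDS_PER_DAY = 86_400

# Every site compared against every other site (ordered pairs).
pairs = SITES * (SITES - 1)


def required_rate(days):
    """Comparisons per second needed to finish all pairs in `days` days."""
    return pairs / (days * SECONDS_PER_DAY)


# Rate implied by "done in 5 years", assuming 365-day years.
current_rate = required_rate(5 * 365)

print(f"pairs:            {pairs:,}")
print(f"needed (40 days): {required_rate(40):.0f} per second")
print(f"needed (30 days): {required_rate(30):.0f} per second")
print(f"implied current:  {current_rate:.1f} per second")
```

So the 30-40 day target works out to roughly 460 to 620 comparisons per second, against an implied current rate of about 10 per second.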

So my questions are:
* Do you think LucidDB could help by storing the underlying data
structure, thus speeding up the queries above?
* If not, do you know of any other storage engine that would handle this
kind of matrix-like storage?

Current DDL (some columns removed):
CREATE TABLE UniqueSiteVisitorSample (
   uid bigint NOT NULL,
   site int NOT NULL,
   PRIMARY KEY (uid, site)
);
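
For reference, the per-pair counts can also be written as one set-based query against this table instead of one query per pair. The sketch below is only an illustration under stated assumptions: it uses Python with SQLite (from the standard library) as a stand-in, not MySQL, MonetDB or LucidDB, the sample rows are made up, and sharedUV is a name introduced here. Because (uid, site) is the primary key, counting joined rows counts distinct shared visitors, and totalUV follows as sourceUV + targetUV - sharedUV.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE UniqueSiteVisitorSample (
        uid  bigint NOT NULL,
        site int    NOT NULL,
        PRIMARY KEY (uid, site)
    )
""")
# Made-up sample rows: sites 1 and 2 share uids 11 and 12.
conn.executemany(
    "INSERT INTO UniqueSiteVisitorSample (uid, site) VALUES (?, ?)",
    [(10, 1), (11, 1), (12, 1), (11, 2), (12, 2), (13, 2), (99, 3)],
)

# Distinct visitors per site (sourceUV / targetUV).
site_uv = dict(conn.execute(
    "SELECT site, COUNT(*) FROM UniqueSiteVisitorSample GROUP BY site"
))

# Shared visitors per unordered site pair, via a self-join on uid.
# Pairs with no visitor in common produce no row (sharedUV = 0).
overlaps = conn.execute("""
    SELECT a.site, b.site, COUNT(*) AS sharedUV
    FROM UniqueSiteVisitorSample a
    JOIN UniqueSiteVisitorSample b
      ON a.uid = b.uid AND a.site < b.site
    GROUP BY a.site, b.site
    ORDER BY a.site, b.site
""").fetchall()

# Inclusion-exclusion: totalUV = sourceUV + targetUV - sharedUV.
totals = {
    (source, target): site_uv[source] + site_uv[target] - shared
    for source, target, shared in overlaps
}
print(site_uv, overlaps, totals)
```

Whether a query of this shape is fast enough at 40 000 sites depends on the engine and the data, which is exactly the open question.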

We will be doing more and more of this kind of cluster analysis, which
involves comparing each item to every other item, and we cannot be the first
in the universe doing this, right?

Kindly

//Marcus Herou

-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
mar...@ta...
http://www.tailsweep.com/
http://blogg.tailsweep.com/