Re: [Openscrobbler-devel] Distributing the system


So, what exactly is the limiting factor for Audioscrobbler? Is it the 
processing time required, or the traffic (submissions, or viewing 
data)? A better understanding of the limitations would help me 
understand what we're trying to fix, and what we must avoid.  I think 
that the most difficult task is processing the data, but I'm not sure 
if we also have to take submission / viewing traffic into 
consideration.  What exactly do we hope to gain by spreading the work 
(whatever it may be) over multiple servers?

What you have to say about non-inter-user statistics and each cluster 
generating its own stats sounds good.  I'm just wondering what exactly 
is going to happen when the statistics are generated.  Will we "look at 
everything" and recompute the totals each time, or will statistics 
generation be incremental?
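
To make the incremental option concrete, here is a minimal Python sketch. It assumes each cluster keeps running play counts per artist; ClusterStats, submit, and merge_top_artists are hypothetical names, not existing code. Each submission bumps a counter instead of triggering a rescan, and the central server sums the full per-cluster counts before ranking:

```python
from collections import Counter


class ClusterStats:
    """Hypothetical per-cluster play counts, updated incrementally."""

    def __init__(self) -> None:
        self.artist_plays: Counter[str] = Counter()

    def submit(self, artist: str) -> None:
        # Incremental: one counter bump per submission, no rescan of history.
        self.artist_plays[artist] += 1


def merge_top_artists(clusters: list[ClusterStats], n: int) -> list[tuple[str, int]]:
    # The central server sums the full per-cluster counts before ranking.
    # Merging only each cluster's top-n list could miss an artist that is
    # moderately popular on every cluster but in the top n on none of them.
    total: Counter[str] = Counter()
    for cluster in clusters:
        total.update(cluster.artist_plays)
    return total.most_common(n)
```

Shipping full counts to the central server costs more traffic than shipping top-N lists, so this ties back to the traffic question.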

Concerning not aggregating data until a user requests it, the only 
problem I see with this is: what will the user experience be? If 
someone clicks on a song for the first time, how long will it take to 
get the information they requested?
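
Here is a rough Python sketch of the lazy approach with a one-week cache, to show where that delay comes from. The aggregate callback stands in for a hypothetical call that asks every cluster for one song's stats; LazyAggregateCache and WEEK_SECONDS are made-up names. Only the first request for a song (or the first after its entry goes stale) waits on the clusters:

```python
import time
from typing import Callable

WEEK_SECONDS = 7 * 24 * 60 * 60


class LazyAggregateCache:
    """Aggregate a key only when first requested; reuse the result for a week."""

    def __init__(self, aggregate: Callable[[str], list],
                 clock: Callable[[], float] = time.time) -> None:
        self._aggregate = aggregate  # hypothetical: queries every cluster
        self._clock = clock          # injectable so staleness can be tested
        self._entries: dict[str, tuple[float, list]] = {}

    def get(self, key: str) -> list:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < WEEK_SECONDS:
            return entry[1]  # cache hit: no cluster traffic at all
        # Miss or stale entry: this request pays the full aggregation cost.
        result = self._aggregate(key)
        self._entries[key] = (now, result)
        return result
```

If that first wait turns out to be too long, one option would be to precompute the most popular songs ahead of time and only use the lazy path for the long tail.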

I think we definitely need to decide if we are going to have some sort 
of weekly rollover type thing like AS has, because if we do, each 
server could assign ids as it wishes, then go through some sort of 
reconciliation phase.   Otherwise, we could make use of the hashing 
system mentioned in the other email.  But we really should make this 
decision, as it will probably affect other things too.
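
If the hashing option means deriving IDs from the track metadata itself (an assumption on my part, since the scheme from the other email isn't quoted here), every server could compute the same ID without asking a central server. A minimal Python sketch, with a hypothetical track_id function and SHA-1 picked arbitrarily:

```python
import hashlib


def track_id(artist: str, title: str) -> str:
    """Hypothetical: derive a track ID any server can compute independently."""

    def norm(s: str) -> str:
        # Assumed normalization: case-fold and collapse runs of whitespace,
        # so "The Beatles" and "the  beatles" map to the same ID.
        return " ".join(s.casefold().split())

    # A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    key = norm(artist) + "\x00" + norm(title)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()
```

Misspelled or differently tagged tracks would still hash to different IDs, so some reconciliation might be needed either way.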

- Deep

On Mar 2, 2005, at 11:41 PM, Jonathan Dance wrote:

> So I have some fairly concrete thoughts about how to distribute the 
> system over the Internet. It's not "peer-to-peer" yet but we'll see.
>
> First, it may not be necessary for users to be directly aware of the 
> multi-server setup. Usernames could include the server the user 
> is assigned to or the central server could know which server holds 
> each user.
>
> My observation is the majority of statistics are not inter-user. They 
> are about one user at a time. This is perfect for "clustering," where each 
> server is responsible for any number of users. What remains is the 
> aggregated stats. I believe the solution for this is for each cluster to 
> generate its own aggregate stats - this is generally the "hard work." 
> The central server then takes the results from those aggregate stats 
> and combines them into a central aggregation.
>
> This still presents a problem, though. First, this is a lot of data. 
> The initial stuff like "top artists" and "top users" is easy. What is 
> not: every artist has top songs and top users. There are thousands of 
> artists. Every song has top users. There are TONS of songs. And this 
> assumes we're "only" copying the Audioscrobbler feature set.
>
> Another idea is to not aggregate something until a user requests it, 
> and then cache it and only re-aggregate it at most once a week. 
> Assuming a very large number of songs will never be requested, this 
> could save a lot. Plus it would spread the requests to the clusters 
> out over time.
>
> Another issue is unique IDs. Assuming we store songs/albums/artists in 
> a database, how will the clusters have the same IDs as the central 
> database (or, every other database)? The first inclination is to store 
> this on the central server and have the clusters download this 
> information. When a new song is submitted to a cluster, it tells the 
> central server.
>
> Obviously this isn't very "peer-to-peer," it's really coordinated 
> internet clustering. There's still a lot for the central server to do, 
> and I believe it needs more thought.
>
> --JD
>
> _______________________________________________
> Openscrobbler-devel mailing list
> Ope...@li...
> https://lists.sourceforge.net/lists/listinfo/openscrobbler-devel
>