[Openscrobbler-devel] Distributing the system

So I have some fairly concrete thoughts about how to distribute the 
system over the Internet. It's not "peer-to-peer" yet but we'll see.

First, it may not be necessary for users to be directly aware of the 
multi-server atmosphere. Usernames could include the server the user is 
assigned to or the central server could know which server holds each 
user.
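
A rough sketch of those two options in Python. Everything here is 
made up for illustration: the "name@server" username format, the 
CentralDirectory class, and the server names are assumptions, not 
anything the project has decided.

```python
# Hypothetical sketch: two ways to find the server that holds a user.

def server_from_username(username):
    """Option 1: the username carries the server, e.g. 'jd@cluster2'."""
    name, sep, server = username.rpartition("@")
    if not sep or not name or not server:
        raise ValueError("expected 'name@server', got %r" % username)
    return server


class CentralDirectory:
    """Option 2: the central server remembers where each user lives."""

    def __init__(self):
        self._server_for_user = {}  # username -> server name

    def assign(self, username, server):
        self._server_for_user[username] = server

    def lookup(self, username):
        # Raises KeyError for a user the central server has never seen.
        return self._server_for_user[username]
```

Option 1 needs no lookup at all but exposes the server to the user. 
Option 2 hides it, at the cost of one more thing the central server 
has to track.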

My observation is the majority of statistics are not inter-user. They 
are about one user at a time. This is perfect for "clustering," where each 
server is responsible for any number of users. What remains is the 
aggregated stats. I believe the solution is for each cluster to 
generate its own aggregate stats - this is generally the "hard work." 
The central server then takes the results from those aggregate stats 
and combines them into a central aggregation.
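
To make the two-level aggregation concrete, here is a minimal sketch 
for a "top artists" stat. The function names and the (user, artist) 
input format are assumptions for the example; it only shows the shape 
of the idea, with each cluster counting and the central server adding 
the counts together.

```python
from collections import Counter


def cluster_artist_counts(plays):
    """Runs on each cluster: count plays per artist for its own users.

    `plays` is an iterable of (user, artist) pairs (assumed format).
    """
    return Counter(artist for _user, artist in plays)


def combine(cluster_counts, top_n=10):
    """Runs on the central server: add up per-cluster counts."""
    total = Counter()
    for counts in cluster_counts:
        total.update(counts)  # update() adds counts, it does not replace
    return total.most_common(top_n)
```

One thing this makes visible: the clusters have to send their full 
counts, not just their own top N. An artist that is only moderately 
popular on every cluster can still end up in the overall top N.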

This still presents a problem, though. First, this is a lot of data. 
The initial stuff like "top artists" and "top users" is easy. What is 
not: every artist has top songs and top users. There are thousands of 
artists. Every song has top users. There are TONS of songs. And this 
assumes we're "only" copying the Audioscrobbler feature set.

Another idea is to not aggregate something until a user requests it, 
and then cache it and only re-aggregate it at most once a week. 
Assuming a very large number of songs will never be requested, this 
could save a lot of work. Plus it would spread the requests to the 
clusters out over time.
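
A minimal sketch of that aggregate-on-request idea, assuming an 
in-memory cache on the central server. The LazyAggregateCache class 
and its `compute` callback (which would go ask the clusters) are 
hypothetical names; a real version would presumably keep the cache in 
the database.

```python
import time

WEEK_SECONDS = 7 * 24 * 60 * 60


class LazyAggregateCache:
    """Aggregate a stat only when asked, then reuse it for up to a week."""

    def __init__(self, compute, max_age=WEEK_SECONDS, clock=time.time):
        self._compute = compute    # does the expensive aggregation
        self._max_age = max_age    # seconds a cached result stays valid
        self._clock = clock        # injectable so it can be tested
        self._entries = {}         # key -> (computed_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]        # still fresh, no work for the clusters
        value = self._compute(key)
        self._entries[key] = (now, value)
        return value
```

A stat that nobody ever requests never reaches `compute`, which is 
where the savings would come from.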

Another issue is unique IDs. Assuming we store songs/albums/artists in 
a database, how will the clusters have the same IDs as the central 
database (or, every other database)? The first inclination is to store 
this on the central server and have the clusters download this 
information. When a new song is submitted to a cluster, the cluster 
tells the central server.
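
Sketched out, that first inclination might look like the following. 
The class names, the (kind, name) key, and the sequential integer IDs 
are all assumptions for illustration, and the direct method call 
stands in for whatever network request a cluster would really make.

```python
class CentralIdRegistry:
    """Central server: the only place that hands out IDs."""

    def __init__(self):
        self._ids = {}       # (kind, name) -> id
        self._next_id = 1

    def id_for(self, kind, name):
        key = (kind, name)   # kind is "song", "album" or "artist"
        if key not in self._ids:
            self._ids[key] = self._next_id
            self._next_id += 1
        return self._ids[key]

    def snapshot(self):
        # What a cluster would download to fill its local copy.
        return dict(self._ids)


class Cluster:
    """A cluster keeps a local copy and asks only about unknown items."""

    def __init__(self, registry):
        self._registry = registry
        self._known = registry.snapshot()

    def submit(self, kind, name):
        key = (kind, name)
        if key not in self._known:
            # New to this cluster: tell the central server, keep its answer.
            self._known[key] = self._registry.id_for(kind, name)
        return self._known[key]
```

Two clusters that see the same new song end up with the same ID, 
because only the central server ever assigns one.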

Obviously this isn't very "peer-to-peer"; it's really coordinated 
Internet clustering. There's still a lot for the central server to do, 
and I believe it needs more thought.

--JD