From: Jonathan D. <jd...@wu...> - 2005-03-03 04:41:13
So I have some fairly concrete thoughts about how to distribute the system over the Internet. It's not "peer-to-peer" yet, but we'll see.

First, it may not be necessary for users to be directly aware of the multi-server setup. Usernames could include the server the user is assigned to, or the central server could know which server holds each user.

My observation is that the majority of statistics are not inter-user; they are about one user at a time. This is perfect for "clustering," where each server is responsible for any number of users. What remains is the aggregated stats. I believe the solution is for each cluster to generate its own aggregate stats, which is generally the "hard work." The central server then takes the results from each cluster and combines them into a central aggregation.

This still presents a problem, though: it is a lot of data. The initial stuff like "top artists" and "top users" is easy. What is not: every artist has top songs and top users, and there are thousands of artists. Every song has top users, and there are TONS of songs. And this assumes we're "only" copying the Audioscrobbler feature set.

Another idea is to not aggregate something until a user requests it, then cache the result and re-aggregate it at most once a week. Assuming a very large number of songs will never be requested, this could save a lot. Plus it would spread the requests to the clusters out over time.

Another issue is unique IDs. Assuming we store songs/albums/artists in a database, how will the clusters have the same IDs as the central database (or every other database)? The first inclination is to store the IDs on the central server and have the clusters download this information. When a new song is submitted to a cluster, the cluster tells the central server.

Obviously this isn't very "peer-to-peer"; it's really coordinated Internet clustering. There's still a lot for the central server to do, and I believe it needs more thought.

--JD
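
A minimal Python sketch of the two-level aggregation and the "aggregate on request, cache for a week" idea described above. Nothing here comes from an existing codebase: the names `Cluster`, `CentralServer`, `song_counts`, and `top_songs` are hypothetical, storage is in memory, and each cluster returns its full per-artist counts rather than a truncated top list (merging truncated lists would not be exact).

```python
from collections import Counter
import time

WEEK_SECONDS = 7 * 24 * 60 * 60  # re-aggregate at most once a week


class Cluster:
    """Hypothetical stats server that holds plays for its own users only."""

    def __init__(self):
        self.users = set()   # users assigned to this cluster
        self.plays = {}      # artist -> Counter(song -> play count)

    def submit(self, user, artist, song, count=1):
        self.users.add(user)
        self.plays.setdefault(artist, Counter())[song] += count

    def song_counts(self, artist):
        # The local "hard work": partial per-song totals for one artist.
        return Counter(self.plays.get(artist, {}))


class CentralServer:
    """Combines per-cluster partial results, lazily and with a cache."""

    def __init__(self, clusters, clock=time.time):
        self.clusters = clusters
        self.clock = clock   # injectable so the cache can be tested
        self.cache = {}      # artist -> (computed_at, Counter)

    def top_songs(self, artist, n=10):
        now = self.clock()
        entry = self.cache.get(artist)
        if entry is None or now - entry[0] >= WEEK_SECONDS:
            # Nothing is aggregated until someone asks for it.
            total = Counter()
            for cluster in self.clusters:
                total.update(cluster.song_counts(artist))  # sums counts
            entry = (now, total)
            self.cache[artist] = entry
        return entry[1].most_common(n)
```

With this shape, an artist nobody ever looks up never costs the clusters any aggregation work, and a popular artist costs one round of requests to the clusters per week at most.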