From: Mr. D. <mr...@gm...> - 2005-03-04 05:34:06
So, what exactly is the limiting factor for Audioscrobbler? Is it the
processing time required, or the traffic (submissions, or viewing
data)? A better understanding of the limitations would help me
understand what we're trying to fix, and what we must avoid. I think
that the most difficult task is processing the data, but I'm not sure
if we also have to take submission / viewing traffic into
consideration. What exactly do we hope to gain by spreading the work
(whatever it may be) over multiple servers?

What you have to say about non-inter-user statistics and each cluster
generating its own stats sounds good. I'm just wondering what exactly
is going to happen when the statistics are generated. Is it going to be
"look at everything", get totals, etc., or will the statistics
generation be incremental?

Concerning not aggregating data until a user requests it, the only
problem I see with this is: what will the user experience be? If
someone clicks on a song for the first time, how long will it take to
get the information they requested?

I think we definitely need to decide if we are going to have some sort
of weekly rollover type thing like AS has, because if we do, each
server could assign IDs as it wishes, then go through some sort of
reconciliation phase. Otherwise, we could make use of the hashing
system mentioned in the other email. But we really should make this
decision, as it will probably affect other things too.

- Deep

On Mar 2, 2005, at 11:41 PM, Jonathan Dance wrote:

> So I have some fairly concrete thoughts about how to distribute the
> system over the Internet. It's not "peer-to-peer" yet but we'll see.
>
> First, it may not be necessary for users to be directly aware of the
> multi-server atmosphere. Usernames could include the server the user
> is assigned to or the central server could know which server holds
> each user.
>
> My observation is the majority of statistics are not inter-user. They
> are about one user at a time.
> This is perfect for "clustering" where each
> server is responsible for any number of users. What remains is the
> aggregated stats. I believe the solution for this is for each cluster to
> generate its own aggregate stats - this is generally the "hard work."
> The central server then takes the results from those aggregate stats
> and combines them into a central aggregation.
>
> This still presents a problem, though. First, this is a lot of data.
> The initial stuff like "top artists" and "top users" is easy. What is
> not: every artist has top songs and top users. There are thousands of
> artists. Every song has top users. There are TONS of songs. And this
> assumes we're "only" copying the Audioscrobbler feature set.
>
> Another idea is to not aggregate something until a user requests it,
> and then cache it and only re-aggregate it at most once a week.
> Assuming a very large number of songs will never be requested, this
> could save a lot. Plus it would distribute the requests to the
> clusters more slowly.
>
> Another issue is unique IDs. Assuming we store songs/albums/artists in
> a database, how will the clusters have the same IDs as the central
> database (or, every other database)? The first inclination is to store
> this on the central server and have the clusters download this
> information. When a new song is submitted to a cluster, it tells the
> central server.
>
> Obviously this isn't very "peer-to-peer," it's really coordinated
> internet clustering. There's still a lot for the central server to do,
> and I believe it needs more thought.
>
> --JD
> _______________________________________________
> Openscrobbler-devel mailing list
> Ope...@li...
> https://lists.sourceforge.net/lists/listinfo/openscrobbler-devel
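
To make the "each cluster generates its own aggregate stats, the central server combines them" idea easier to discuss, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `(user, artist, song)` record shape and the function names `cluster_artist_counts`, `merge_cluster_counts` and `top_n` are hypothetical, not anything the thread has agreed on.

```python
from collections import Counter


def cluster_artist_counts(plays):
    """Aggregate one cluster's plays into per-artist play counts.

    `plays` is an iterable of (user, artist, song) tuples; this record
    shape is an assumption made for illustration.
    """
    return Counter(artist for _user, artist, _song in plays)


def merge_cluster_counts(per_cluster_counts):
    """Central step: sum the per-cluster counters into one total."""
    total = Counter()
    for counts in per_cluster_counts:
        total.update(counts)  # Counter.update adds counts together
    return total


def top_n(counts, n):
    """Return the n highest (name, count) pairs, ties broken by name."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Two hypothetical clusters, each holding its own users' plays.
cluster_a = [("u1", "A", "s1"), ("u1", "A", "s2"), ("u2", "B", "s3")]
cluster_b = [("u3", "B", "s3"), ("u3", "B", "s4"), ("u4", "C", "s5")]

merged = merge_cluster_counts(
    [cluster_artist_counts(cluster_a), cluster_artist_counts(cluster_b)]
)
top_two = top_n(merged, 2)
```

One design point this exposes: the clusters would need to send full counts (or at least far more than their own top N), because an artist that sits just outside the top N on every cluster can still land in the global top N once the counts are summed.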
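
On the "don't aggregate until a user requests it, then re-aggregate at most once a week" idea, the following is a rough Python sketch of how such a cache could behave. The class name `LazyAggregateCache`, the `compute` callback and the injectable `clock` are all hypothetical, and the cache lives in memory only; a real cluster would presumably persist it.

```python
import time

WEEK_SECONDS = 7 * 24 * 60 * 60


class LazyAggregateCache:
    """Compute an aggregate on first request, then reuse it for a week."""

    def __init__(self, compute, max_age=WEEK_SECONDS, clock=time.time):
        self._compute = compute   # expensive aggregation, called per key
        self._max_age = max_age   # seconds before a cached value is stale
        self._clock = clock       # injectable so tests can fake the time
        self._entries = {}        # key -> (computed_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]       # still fresh: no aggregation work done
        value = self._compute(key)  # first request or stale: pay full cost
        self._entries[key] = (now, value)
        return value


# Usage with a fake clock and a stand-in for the expensive aggregation.
calls = []
fake_now = [0]


def fake_aggregate(key):
    calls.append(key)
    return len(calls)


cache = LazyAggregateCache(fake_aggregate, clock=lambda: fake_now[0])
first = cache.get("song")         # computed
second = cache.get("song")        # served from cache
fake_now[0] = WEEK_SECONDS        # a week later
third = cache.get("song")         # recomputed
```

This makes the user-experience question concrete: the first person to request a given song pays the whole aggregation cost inside `get`, and so does the first person to request it after it has gone stale. Everyone in between gets the cached value.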
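
For the unique ID question, one possible reading of the hashing approach is that every server derives the ID from the song's own metadata, so no central assignment is needed. The details of the scheme from the other email are not given in this thread, so the Python sketch below is only an assumption: the function names `normalize` and `song_id`, the choice of SHA-1, and the normalization rules are all made up for illustration.

```python
import hashlib
import unicodedata


def normalize(text):
    """Fold case, apply Unicode NFKC, and collapse runs of whitespace."""
    nfkc = unicodedata.normalize("NFKC", text)
    return " ".join(nfkc.casefold().split())


def song_id(artist, album, title):
    """Derive a deterministic ID from the normalized metadata.

    The unit separator (0x1f) keeps ("ab", "c") distinct from ("a", "bc").
    """
    key = "\x1f".join(normalize(part) for part in (artist, album, title))
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


# Two servers seeing slightly different spellings agree on the ID.
id_one = song_id("The Beatles", "Abbey Road", "Come Together")
id_two = song_id("  the  beatles ", "ABBEY ROAD", "come together")
id_other = song_id("The Beatles", "Abbey Road", "Something")
```

The trade-off is that this only reconciles differences the normalization step knows about. A misspelled artist name would still produce a different ID, so some reconciliation phase may be needed either way.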