Re: [Openscrobbler-devel] Distributing the system

I think it would be better to develop it as the central server/clusters
system that we have been discussing. As you mentioned, building an
AS-like clone may end up just making things harder on us, because we'll
have to put a significant amount of effort into regrouping it into a
distributed system.  I guess the bad part of going straight to the
central server/clusters system is that it will take longer, right?

I finally took a look at the docs, and I am still having difficulty
figuring out exactly what sort of db interaction is going to be taking
place when a song play is submitted, and when a view [misc data]
request is received.  I *think* it is better from a db design
standpoint to simply insert the fact that a song is played when it is
(and I think that's what the song_data table is for), but I think we
would be able to provide a faster overall experience to the users if we
were to include play counts with every song, artist, album, etc., and
update them with every submission.  I think it would be worth it to
have faster statistic browsing at the cost of slower submission
processing.  I think I'm pretty much suggesting that we keep a
submission queue / cruncher, and hope to have faster / simpler queries
for viewing statistics.  Are we already planning on doing something like
this (updating total playcounts) and I'm just not seeing it being
mentioned? Is it really stupid for some reason that I don't understand?
Are we doing anything to improve upon AS beyond turning it into a
distributed system? (And is this even one of the project goals? Does
it need to be?)
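
To make the idea concrete, here is a rough sketch of what I mean, using
Python and an in-memory SQLite db. The table names (plays, song_totals)
and the function names are made up for illustration and are not taken
from our docs or schema. The cruncher drains a queue of submissions,
logs each play, and bumps a running total, so viewing a play count is a
single-row lookup instead of a COUNT over every play.

```python
import sqlite3
from collections import deque


def make_db():
    """Create a throwaway database with hypothetical tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE plays (user_id INTEGER, song_id INTEGER, played_at INTEGER);
        CREATE TABLE song_totals (song_id INTEGER PRIMARY KEY,
                                  play_count INTEGER NOT NULL);
    """)
    return db


def crunch(db, queue):
    """Drain the submission queue: log each play and bump its running total."""
    with db:  # one transaction for the whole batch; rolls back on error
        while queue:
            user_id, song_id, played_at = queue.popleft()
            db.execute("INSERT INTO plays VALUES (?, ?, ?)",
                       (user_id, song_id, played_at))
            cur = db.execute(
                "UPDATE song_totals SET play_count = play_count + 1 "
                "WHERE song_id = ?", (song_id,))
            if cur.rowcount == 0:  # first play of this song
                db.execute("INSERT INTO song_totals VALUES (?, 1)", (song_id,))


def play_count(db, song_id):
    """Cheap read path: one primary-key lookup, no aggregation."""
    row = db.execute("SELECT play_count FROM song_totals WHERE song_id = ?",
                     (song_id,)).fetchone()
    return row[0] if row else 0
```

The same pattern would apply to artist and album totals. The cost is an
extra write per submission, which is the trade-off I described above.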

The "Please Wait ..." screen should be fine; it would definitely be
better than just having the page take forever.

Concerning the ids we're going to be assigning, what if the central
server just had some sort of id mapping table, so it would know that
song 13 on server a = song 337 on server b?
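
Something like the following is what I have in mind, sketched in Python
with plain dicts standing in for the table. The class and method names
are hypothetical, and how the central server decides that two local
songs are the same song is assumed to happen elsewhere.

```python
class IdMap:
    """Hypothetical central-server table: (server, local id) <-> global id."""

    def __init__(self):
        self._to_global = {}   # (server, local_id) -> global_id
        self._to_local = {}    # global_id -> {server: local_id}
        self._next_global = 1

    def register(self, server, local_id):
        """Assign a fresh global id to a song first seen on `server`."""
        key = (server, local_id)
        if key in self._to_global:
            return self._to_global[key]
        global_id = self._next_global
        self._next_global += 1
        self._to_global[key] = global_id
        self._to_local[global_id] = {server: local_id}
        return global_id

    def alias(self, server, local_id, global_id):
        """Record that `local_id` on `server` is an already known song."""
        self._to_local[global_id][server] = local_id  # KeyError if unknown
        self._to_global[(server, local_id)] = global_id

    def translate(self, server, local_id, other_server):
        """Return the id `other_server` uses for this song, or None."""
        global_id = self._to_global.get((server, local_id))
        if global_id is None:
            return None
        return self._to_local[global_id].get(other_server)


id_map = IdMap()
g = id_map.register("a", 13)
id_map.alias("b", 337, g)
print(id_map.translate("a", 13, "b"))
```

With this, each server keeps assigning ids however it likes, and only
the central server needs to know how they line up.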

- deep

On Mar 4, 2005, at 9:27 PM, Jonathan Dance wrote:

> Good reply!
>
> The distribution system has two aspects: "politics" and performance. 
> Let me explain.
>
> The idea was this: the initial development would create a standalone 
> audioscrobbler-like clone. Anyone could download the software and 
> setup a music tracking web site for themselves, their friends, 
> whoever. They can customize it if they want. Submit patches. Improve 
> it.
>
> From here, my idea was to link together the independent systems into 
> one large statistical pool. What this means is that it's not being 
> built from the top down, but instead the bottom up. If we did it this 
> way, we would be "stuck" with trying to regroup it into the 
> distributed system.
>
> Now, I'm not saying it has to go down that way. But that's the way I 
> thought it up. So let's talk about ignoring the pretenses under which 
> this all happened. The rest of my reply is inline below.
>
> On Mar 4, 2005, at 12:33 AM, Mr.Deep wrote:
>
>> So, what exactly is the limiting factor to Audioscrobbler? Is it the 
>> processing time required? Or the traffic (submissions, or viewing 
>> data)? A better understanding of the limitations would help me 
>> understand what we're trying to fix, and what we must avoid.  I think 
>> that the most difficult task is processing the data, but I'm not sure 
>> if we also have to take submission / viewing traffic into 
>> consideration.  What exactly do we hope to gain by spreading the work 
>> (whatever it may be) over multiple servers?
>
> I can really only guess what AS's problems are. I can only guess based 
> on what they've done to fix it. They've moved processing into the 
> background or into memory (using memcached), which means that 
> generating things on the fly was a problem. So that's too many queries 
> on viewing.  By reducing the number of queries, they probably have an 
> easier time doing INSERTs as well. The INSERTs are also put into 
> memory (so it's not the connections either) and are a queue for the 
> "cruncher" whose sole responsibility is to save the statistics.
>
> So basically it's all processing and not traffic at all. But the 
> processing traffic comes from both submissions and viewing. You 
> typically can't SELECT and INSERT at the same time (because you need 
> to lock the table).
>
>> What you have to say about non-inter-user statistics and each cluster 
>> generating its own stats sounds good.  I'm just wondering what 
>> exactly is going to happen when the statistics are generated.  Is it 
>> going to be "look at everything" get totals, etc etc, or will the 
>> statistics generating be incremental?
>
> I imagine it will initially be "look at everything" but we may evolve 
> towards incremental updating. We could just jump to incremental if it's 
> not too hard.
>
> I suppose it really depends what the statistic in question is. 
> Obviously if it's "in the past week" it will be "look at everything in 
> the past week." Of course the majority of data is regarding song total 
> and things.... I don't know, I think I need to think more about this.
>
>> Concerning not aggregating data until a user requests it, the only 
>> problem I see with this is: what will the user experience be? If 
>> someone clicks on the song for the first time, how long will it take 
>> to get the information they requested?
>
> I'm not really sure. It really depends on who receives the request. If 
> it's a cluster, then it could display local results immediately (as 
> well as cached results if available) and ask the central server, or 
> all the other servers, for an update if necessary. If it's the central 
> server, I suppose it wouldn't have anything, but it shouldn't take 
> very long to fetch results from the servers. I imagine we could throw 
> up a "Please wait" screen while it fetches the data.
>
>> I think we definitely need to decide if we are going to have some 
>> sort of weekly rollover type thing like AS has, because if we do, 
>> each server could assign ids as it wishes, then go through some sort 
>> of reconciliation phase.   Otherwise, we could make use of the 
>> hashing system mentioned in the other email.  But we really should 
>> make this decision as it will probably affect other things too.
>
> The hashing system is a cool idea, but it doesn't generate very good 
> keys. 32-byte (or more - I looked at some other "better" hashing 
> algorithms) keys are pretty long - that's the equivalent of 8 
> integers. That would probably be hard to index (*tries to remember how 
> btrees work* - well, I think it's only a matter of comparing the data). 
> Regardless, it might be the best idea going forward. A reconciliation 
> phase could also work, though.
>
> Alright, that's all for now.
>
> --JD