From: Mr. D. <mr...@gm...> - 2005-03-12 22:10:54
Thanks for the clarifications, and sorry for the delay in replying.

On Mar 5, 2005, at 1:08 AM, Jonathan Dance wrote:

> Continuing my inline-whoring....
>
> On Mar 4, 2005, at 11:56 PM, Mr.Deep wrote:
>
>> I think it would be better to develop it as the central
>> server/clusters system that we have been discussing. As you
>> mentioned, building an AS-like clone may end up just making things
>> harder on us because we'll have to put a significant amount of effort
>> into regrouping it into a distributed system. I guess the bad part
>> of going straight to the central server/clusters system is that it
>> will take longer, right?
>
> Yea, it could take a really long time (especially at the current pace)
> to get there. It would of course make sense to avoid as much
> re-development as possible, but I think it reasonable to assume that
> we're not going to jump from 0% to 100% - we're going to need a way to
> get there, and that probably involves a "standalone" cluster-ish
> system in the shorter term.
>
>> I finally took a look at the docs, and I am still having difficulty
>> figuring out exactly what sort of db interaction is going to be
>> taking place when a song play is submitted, and when a view [misc
>> data] request is received. I *think* it is better from a db design
>> standpoint to simply insert the fact that a song is played when it is
>> (and I think that's what the song_data table is for), but I think we
>> would be able to provide a faster overall experience to the users if
>> we were to include play counts with every song, artist, album, etc,
>> and update them with every submission. I think it would be worth it
>> to have faster statistic browsing at the cost of slower submission
>> processing. I think I'm pretty much suggesting that we keep a
>> submission queue / cruncher, and hope to have faster / simpler
>> queries for viewing statistics.
>> Are we already planning on doing
>> something like this (updating total playcounts) and I'm just not
>> seeing it being mentioned? Is it really stupid for some reason that I
>> don't understand? Are we doing anything to improve upon AS beyond
>> turning it into a distributed system? (And is this even one of the
>> project goals? Does it need to be?)
>
> The database at the moment is currently a result of the ERD and is not
> final nor optimized. I also did it before I came up with any
> solutions for the distributed system.
>
> At the moment, there is also no easy place to put the "cached" data in
> the DB. (We call it caching, even though it's not in RAM or anything.)
> For example:
> - Total song count [for all users] is easy. Just put it in songs.
> - Song count per user is not. There's no "user-songs" table. (Yet)
>
> Only saving aggregated statistics makes you lose granularity
> (basically you lose the "time" element); for instance, you can't say
> what happened in the past week, unless you capture that specifically.
> We're having the same issue at work trying to create a stats package
> for our game - balancing lots of details with performance, as well as
> storage.
>
> Yes, at the moment this is all in "song_data" which does the job just
> great, just not too quickly. My hope was to escape this problem by:
> - clustering
> - caching data to memory or disk - use memcached and/or store
>   generated profile data somewhere.
>
> As far as goals... no it doesn't need to be distributed (or, depending
> on your point of view, aggregated). This is a "would be cool" factor
> that would help bring all users together. It stemmed from the fact
> that it would be awesome if all the music tracking sites could be
> networked in a way so that there could be a "definitive" aggregation.
> Having 5, 10, or 100 little sites with all their own statistics would
> be inherently bad. (For example: LiveJournal.
> You want everyone to
> have their journal at LJ so that you don't have to hop around the web,
> etc.) There is a definite advantage to having lots of people on one
> system. By aggregating the pieces, you create one system where there
> were previously many.
>
> Maybe this is why I always think of it as "bringing together the
> clusters" because part of my idea was even a site like Audioscrobbler,
> which does not run Openscrobbler, could possibly contribute to the
> global statistics. If there was an API that could be implemented for
> any system, then even this would be possible.
>
> But I digress.
>
> The more important goal in the shorter-term is/was to get an
> open-source listener tracking system that is geared to providing a
> smaller number of users a larger number of features (compared to
> Audioscrobbler). I want to see the return of time played,
> weekly/monthly/etc stats, and stats that update more often than
> whenever-they-feel-like-it. I also want to see albums! My real
> motivation for doing this is more out of user frustration than geek
> pride.
>
>> The "Please Wait ..." screen should be fine, it would definitely
>> be better than just having the page take forever.
>
> Yea, that'd be bad. I think the other option is to display something
> like:
> "Global statistics have not yet been generated for this
> <song,album,artist>. Your request has been added to the queue and will
> be processed shortly. Please check back in a few minutes."
>
>> Concerning the ids we're going to be assigning, what if the central
>> server just had some sort of id mapping table, so it would know that
>> song 13 on server a = song 337 on server b?
>
> Not sure. Need more thought on how IDs will be used and how critical
> it is that things get "aligned."
>
> --JD
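To make the playcount idea from the thread concrete, here is a rough sketch of keeping both the per-play rows and a cached counter, so total counts are a single-row lookup while "past week" style questions can still be answered from the raw plays. This is only an illustration under assumptions: the column layout of song_data, the user_songs table (which the thread notes does not exist yet), and the function names are all hypothetical, and SQLite stands in for whatever database the project ends up using.

```python
import sqlite3


def open_db():
    """Create an in-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE song_data (          -- one row per submitted play
            user_id   INTEGER NOT NULL,
            song_id   INTEGER NOT NULL,
            played_at INTEGER NOT NULL    -- Unix timestamp of the play
        );
        CREATE TABLE user_songs (         -- hypothetical "cached" counters
            user_id    INTEGER NOT NULL,
            song_id    INTEGER NOT NULL,
            play_count INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (user_id, song_id)
        );
        """
    )
    return conn


def submit_play(conn, user_id, song_id, played_at):
    """Record the play and bump the cached counter in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO song_data (user_id, song_id, played_at) VALUES (?, ?, ?)",
            (user_id, song_id, played_at),
        )
        # Make sure a counter row exists, then increment it.
        conn.execute(
            "INSERT OR IGNORE INTO user_songs (user_id, song_id, play_count) "
            "VALUES (?, ?, 0)",
            (user_id, song_id),
        )
        conn.execute(
            "UPDATE user_songs SET play_count = play_count + 1 "
            "WHERE user_id = ? AND song_id = ?",
            (user_id, song_id),
        )


def total_plays(conn, user_id, song_id):
    """Fast path: read the cached counter instead of counting rows."""
    row = conn.execute(
        "SELECT play_count FROM user_songs WHERE user_id = ? AND song_id = ?",
        (user_id, song_id),
    ).fetchone()
    return row[0] if row else 0


def plays_since(conn, user_id, song_id, since):
    """Slow path: the time element is still available from song_data."""
    return conn.execute(
        "SELECT COUNT(*) FROM song_data "
        "WHERE user_id = ? AND song_id = ? AND played_at >= ?",
        (user_id, song_id, since),
    ).fetchone()[0]
```

The cost is the one raised in the thread: each submission does extra writes, in exchange for cheaper statistic views. A submission queue could call something like submit_play in the background so the submitting client is not kept waiting.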