From: Jonathan D. <jd...@wu...> - 2005-03-05 06:08:34
Continuing my inline-whoring....

On Mar 4, 2005, at 11:56 PM, Mr.Deep wrote:

> I think it would be better to develop it as the central
> server/clusters system that we have been discussing, as you mentioned,
> building an AS-like clone may end up just making things harder on us
> because we'll have to put a significant amount of effort into
> regrouping it into a distributed system. I guess the bad part of
> going straight to the central server/clusters system is that it will
> take longer, right?

Yea, it could take a really long time (especially at the current pace)
to get there. It would of course make sense to avoid as much
re-development as possible, but I think it is reasonable to assume that
we're not going to jump from 0% to 100% - we're going to need a way to
get there, and that probably involves a "standalone" cluster-ish system
in the shorter term.

> I finally took a look at the docs, and I am still having difficulty
> figuring out exactly what sort of db interaction is going to be taking
> place when a song play is submitted, and when a view [misc data]
> request is received. I *think* it is better from a db design
> standpoint to simply insert the fact that a song is played when it is
> (and I think that's what the song_data table is for), but I think we
> would be able to provide a faster overall experience to the users if
> we were to include play counts with every song, artist, album, etc,
> and update them with every submission. I think it would be worth it
> to have faster statistic browsing at the cost of slower submission
> processing. I think I'm pretty much suggesting that we keep a
> submission queue / cruncher, and hope to have faster / simpler queries
> for viewing statistics. Are we already planning on doing something
> like this (updating total playcounts) and I'm just not seeing it being
> mentioned? Is it really stupid for some reason that I don't
> understand?
> Are we doing anything to improve upon AS beyond turning it
> into a distributed system? (and is this even one of the project
> goals?, does it need to be?)

The database as it stands is a direct result of the ERD and is neither
final nor optimized. I also did it before I came up with any solutions
for the distributed system. At the moment, there is also no easy place
to put the "cached" data in the DB. (We call it caching, even though
it's not in RAM or anything.) For example:

- Total song count [for all users] is easy. Just put it in songs.
- Song count per user is not. There's no "user-songs" table. (Yet)

Only saving aggregated statistics makes you lose granularity (basically
you lose the "time" element); for instance, you can't say what happened
in the past week, unless you capture that specifically. We're having
the same issue at work trying to create a stats package for our game -
balancing lots of detail against performance, as well as storage.

Yes, at the moment this is all in "song_data", which does the job just
great, just not too quickly. My hope was to escape this problem by:

- clustering
- caching data to memory or disk
- using memcached and/or storing generated profile data somewhere

As far as goals... no, it doesn't need to be distributed (or, depending
on your point of view, aggregated). This is a "would be cool" factor
that would help bring all users together. It stemmed from the fact that
it would be awesome if all the music tracking sites could be networked
in a way so that there could be a "definitive" aggregation. Having 5,
10, or 100 little sites with all their own statistics would be
inherently bad. (For example: LiveJournal. You want everyone to have
their journal at LJ so that you don't have to hop around the web, etc.)
There is a definite advantage to having lots of people on one system.
By aggregating the pieces, you create one system where there were
previously many.
Maybe this is why I always think of it as "bringing together the
clusters", because part of my idea was that even a site like
Audioscrobbler, which does not run Openscrobbler, could possibly
contribute to the global statistics. If there was an API that could be
implemented for any system, then even this would be possible.

But I digress. The more important goal in the shorter term is/was to
get an open-source listener tracking system that is geared to providing
a smaller number of users a larger number of features (compared to
Audioscrobbler). I want to see the return of time played,
weekly/monthly/etc stats, and stats that update more often than
whenever-they-feel-like-it. I also want to see albums! My real
motivation for doing this is more out of user frustration than geek
pride.

> The "Please Wait ..." screen should be fine, it would definitely be
> better than just having the page take forever.

Yea, that'd be bad. I think the other option is to display something
like: "Global statistics have not yet been generated for this
<song,album,artist>. Your request has been added to the queue and will
be processed shortly. Please check back in a few minutes."

> Concerning the ids we're going to be assigning, what if the central
> server just had some sort of id mapping table, so it would know that
> song 13 on server a = song 337 on server b?

Not sure. Need more thought on how IDs will be used and how critical it
is that things get "aligned."

--JD
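
To make the cached-count trade-off from the database discussion above a
bit more concrete, here is a minimal sketch using Python's built-in
sqlite3 module. It is not the project's actual schema: only the
song_data table name comes from this thread. Its columns, the
user_songs cache table, and all function names are assumptions made up
for illustration. The idea is to keep one raw row per play (so the
"time" element survives) and bump a per-user counter at submission
time, so that total-count lookups do not have to scan song_data.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a raw table and a cache table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        -- One row per play: keeps the time element (columns are assumed).
        CREATE TABLE song_data (
            user_id   INTEGER NOT NULL,
            song_id   INTEGER NOT NULL,
            played_at INTEGER NOT NULL   -- Unix timestamp
        );
        -- Hypothetical "user-songs" cache table holding running totals.
        CREATE TABLE user_songs (
            user_id    INTEGER NOT NULL,
            song_id    INTEGER NOT NULL,
            play_count INTEGER NOT NULL,
            PRIMARY KEY (user_id, song_id)
        );
    """)
    return db


def submit_play(db, user_id, song_id, played_at):
    """Record one play: insert the raw row and bump the cached counter."""
    with db:  # both statements commit (or roll back) together
        db.execute(
            "INSERT INTO song_data (user_id, song_id, played_at) VALUES (?, ?, ?)",
            (user_id, song_id, played_at),
        )
        cur = db.execute(
            "UPDATE user_songs SET play_count = play_count + 1 "
            "WHERE user_id = ? AND song_id = ?",
            (user_id, song_id),
        )
        if cur.rowcount == 0:  # first play of this song by this user
            db.execute(
                "INSERT INTO user_songs (user_id, song_id, play_count) "
                "VALUES (?, ?, 1)",
                (user_id, song_id),
            )


def total_plays(db, user_id, song_id):
    """Fast path: read the cached total instead of counting raw rows."""
    row = db.execute(
        "SELECT play_count FROM user_songs WHERE user_id = ? AND song_id = ?",
        (user_id, song_id),
    ).fetchone()
    return row[0] if row else 0


def plays_since(db, user_id, song_id, since):
    """Slow path: time-windowed stats still need the raw song_data rows."""
    row = db.execute(
        "SELECT COUNT(*) FROM song_data "
        "WHERE user_id = ? AND song_id = ? AND played_at >= ?",
        (user_id, song_id, since),
    ).fetchone()
    return row[0]
```

In this sketch submissions do slightly more work (one extra UPDATE or
INSERT), which is the "slower submission, faster browsing" trade
mentioned in the quoted question. Weekly or monthly stats would still
go through plays_since, or through further cache tables keyed by
period, which is a design choice this sketch leaves open.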