mdn-backend Mailing List for Media Distribution Network
Status: Planning
Brought to you by:
tannewt
You can subscribe to this list here.
2008 |
Jan
|
Feb
(2) |
Mar
(10) |
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
---|
From: Andrew R. <ar...@gm...> - 2008-03-14 00:05:39
|
Here is one idea for doing search. It's a very simple, keyword-based algorithm that ranks first by number of distinct entries found, then by contiguous hits in search order, then by hits. This does plaintext searching only, no HTML fanciness. I've also attached a sample dataset. See what you guys think. Search.php is here: http://students.washington.edu/areusch/search.txt(rename it to .php due to webservers that actually run php) example-dataset.sql is here: http://students.washington.edu/areusch/example-dataset.sql The algorithm makes use of string sorting functions in PHP because PHP doesn't have the idea of a Comparable interface (so rather than make it I just made a more inefficient algorithm for the time being). Anyway, hence its inefficiency. On my (1.6GHz) processor with 1GB ram, the add function took about 5 minutes to add the dataset I'm sending, which is about 10000 entries. time php search.php search red hot chili peppers gives 217 results and the following time output: real 0m0.136s user 0m0.060s sys 0m0.008s This is awfully slow given our application, but I believe that there is huge optimization we can reap by writing a system that doesn't do string sorts. Those lines are marked in the code. A basic description of the algorithm: Add assigns the input file a new document id, tokenizes it (here we could apply a stemming lib) and, for each token, appends a reference string to the reverse index for that token like <docID>:<tokenOrdinal>SPACE where tokenOrdinal is the number of tokens from the start of the file. So for the following input with ID #1030: A quick dog barked at the other dog. The reverse index looks like: A 1030:0 quick 1030:1 dog 1030:2 1030:7 barked 1030:3 at 1030:4 the 1030:5 other 1030:6 Search retrieves the reverse index entry for each search term and appends the string :<query token ordinal> to each reference string. It then sorts them numerically (looking at the full numbers contiguously, which is the ultra-slow part), so that if you searched for words that are right next to each other, their reference strings appear right next to each other. Then the algorithm enumerates the sorted list and does some stats, which are then stored in the closest thing to a priority queue that exists in PHP (another array which I later sort by the same method). Comments? We can certainly speed the sorting using some type of bucketing system with Comparable classes. To use the sample code, get php command line/mysql, create a sql database and a user for it (or just use root as i did) then set up the top of search.php accordingly. Run php search.php setup to create the tables and you're good to go. Or, instead of doing serach.php setup you could alternatively import the dataset, which you can feed to mysql (it's just sql commands). The dataset is a set of 10000 songs by title/artist/album. Hopefully the list doesn't scrub attachments....oops it did, see above for links. Andrew |
From: Andrew R. <ar...@gm...> - 2008-03-12 22:15:51
|
Let's wait until this weekend then. If we are indeed meeting sunday I should be able to make part of it. Andrew On Wed, Mar 12, 2008 at 3:14 PM, Brendan Roof <bre...@gm...> wrote: > I'm not sure what exactly we would cover if we did. As a side note, I > think for now I will focus on developing the persistence capabilities > of the network, especially the push model. If anyone has any other > suggestions feel free to speak up. Andrew could you please send off > an email if we are meeting today? Thanks. > > -Brendan > > > On Mar 11, 2008, at 2:15 AM, Andrew Reusch wrote: > > > Do we want to meet for backend discussions this week? > > > > I have search code that is getting close to finishing--the weekend > > was much busier than I'd thought--so I will bring it/post to list > > on Wednesday and we can use as example code for Scott > > > > Andrew > > ---------------------------------------------------------------------- > > --- > > This SF.net email is sponsored by: Microsoft > > Defy all challenges. Microsoft(R) Visual Studio 2008. > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > > _______________________________________________ > > Mdn-backend mailing list > > Mdn...@li... > > https://lists.sourceforge.net/lists/listinfo/mdn-backend > > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Microsoft > Defy all challenges. Microsoft(R) Visual Studio 2008. > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > _______________________________________________ > Mdn-backend mailing list > Mdn...@li... > https://lists.sourceforge.net/lists/listinfo/mdn-backend > |
From: Brendan R. <bre...@gm...> - 2008-03-12 22:14:37
|
I'm not sure what exactly we would cover if we did. As a side note, I think for now I will focus on developing the persistence capabilities of the network, especially the push model. If anyone has any other suggestions feel free to speak up. Andrew could you please send off an email if we are meeting today? Thanks. -Brendan On Mar 11, 2008, at 2:15 AM, Andrew Reusch wrote: > Do we want to meet for backend discussions this week? > > I have search code that is getting close to finishing--the weekend > was much busier than I'd thought--so I will bring it/post to list > on Wednesday and we can use as example code for Scott > > Andrew > ---------------------------------------------------------------------- > --- > This SF.net email is sponsored by: Microsoft > Defy all challenges. Microsoft(R) Visual Studio 2008. > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > _______________________________________________ > Mdn-backend mailing list > Mdn...@li... > https://lists.sourceforge.net/lists/listinfo/mdn-backend |
From: Andrew R. <ar...@gm...> - 2008-03-11 10:01:52
|
Do we want to meet for backend discussions this week? I have search code that is getting close to finishing--the weekend was much busier than I'd thought--so I will bring it/post to list on Wednesday and we can use as example code for Scott Andrew |
From: Eric O. <eo...@cs...> - 2008-03-07 10:43:26
|
While this is certainly preferable to xml, and it looks like a nice protocol, I would still prefer a super-efficient custom binary protocol. I had a couple hours to poke at some code today, so I managed to put together a simple prototype (zip file included). This includes basic (and I do mean basic) client and server code and code for the login message with a basic response. And yes, Scott, the protocol library uses objects. The login message contains the same information as your version, but it is in a very different form: 2-bytes: command code 2-bytes: id number 4-bytes: version number (I don't know why I gave it 4 bytes, but it's easy to change) Null-terminated string: username Null-terminated string: password All numbers are little-endian integers and all strings are UTF-8. There is no marker for the end of a login message because its contents are known in advance other than the string endings. The code will currently keep searching forever until it finds a string-ending, but a limit could be easily added. An example login message: 00-00-04-00-01-00-00-00-65-72-69-63-00-69-73-20-61-77-65-73-6F-6D-65-21-21-2 1-31-00 After parsing: command = login id = 4 version = 1 username = eric password = is awesome!!!1 Other message types could be constructed similarly. ~Eric -----Original Message----- From: mdn...@li... [mailto:mdn...@li...] On Behalf Of Michael Frank Sent: Thursday, March 06, 2008 12:13 AM To: mdn...@li... Subject: [Mdn-backend] kicking off the discussion on protocol specifics hey all, I'm sorry this is such a long email, but everybody feel free to break it down and respond to individual parts. i've been poking around with JSON (javascript object notation) as a means for encapsulating our protocol messages. JSON is easy on the eyes, less verbose than XML, and there are libraries for many different languages which can read and write it. More information on JSON can be found here: http://www.json.org I found an MIT-licensed library called JsonFX for C# (http://jsonfx.net/BuildTools/) which takes native C# data types and converts them into the JSON wire format, and vice-versa. Very slick. My thought is to encapsulate each message in a JSON object, and separate each message with a newline character. The reason for terminating with a newline is so the receiving entity can know when a message has been completely received without actually parsing the message itself. I would advocate a hard-limit on message size, perhaps 16k, and I would enforce it by dropping the connection if a newline has not been reached after parsing 16k bytes into a buffer. We would probably want to experiment with varying the buffer size to find the biggest buffer we can without wasting too much space on the server side (which would have to keep multiple such buffers allocated). Each message would be either a command or a result. A command consists of two fields common to every command type, and then command-specific fields. The common fields are 'method', which is the command type, and 'id', which is a unique tag so we can batch multiple commands and process their results out-of-order. A result consists of two fields common to every result type, and then result-specific fields. The common fields are 'id', which is the unique tag taken from the related command, and 'rcode', which is the numerical response code, similar to HTTP response codes. >From what we discussed at the meeting today, I fleshed out what I think the specific protocol commands should look like. This is what the actual wire protocol would look like, with the exception of the portions enclosed in angle brackets, which are placeholders for parameters, and excessive whitespace i added for readability. The login command: { "method" : "LOGIN", "id" : <unique command identifier>, "version" : <the protocol version which the client uses>, "username" : <username>, "password" : <password> } The login result: { "rcode" : <response code>, "id" : <id>, "version" : <the protocol version which the server will use> } The update command: { "method" : "UPDATE", "id" : <id>, "urls" : [ <url>, ... ] } The update result: { "rcode" : <response code>, "id" : <id> } The search command: { "method" : "SEARCH", "id" : <id>, "query" : <search query string>, "nresults" : <number of results to return> } The search result: { "rcode" : <response code>, "id" : <id>, "results" : [ { "url" : <media url>, "artist" : <media artist>, "album" : <media album>, "title" : <media title>, ... }, ... ] } The purchase command: { "method" : "PURCHASE", "id" : <id>, "url" : <media url> } The purchase result: { "rcode" : <response code>, "id" : <id> } The download command: { "method" : "DOWNLOAD", "id" : <id>, "url" : <media url> } The download result: { "rcode" : <response code>, "id" : <id>, "peers" : [ { "address" : <peer IP address>, "port" : <peer port> }, ... ] } |
From: Scott S. <sco...@co...> - 2008-03-06 15:54:12
|
Michael, I like that idea. JsonFX sounds neat too. I like being able to treat things as objects for as long as possible and having the conversion be done by a library is cool. ~Scott Michael Frank wrote: > hey all, I'm sorry this is such a long email, but everybody feel free to > break it down and respond to individual parts. > > > > i've been poking around with JSON (javascript object notation) as a > means for encapsulating our protocol messages. JSON is easy on the > eyes, less verbose than XML, and there are libraries for many different > languages which can read and write it. More information on JSON can be > found here: http://www.json.org > > I found an MIT-licensed library called JsonFX for C# > (http://jsonfx.net/BuildTools/) which takes native C# data types and > converts them into the JSON wire format, and vice-versa. Very slick. > > My thought is to encapsulate each message in a JSON object, and separate > each message with a newline character. The reason for terminating with > a newline is so the receiving entity can know when a message has been > completely received without actually parsing the message itself. I > would advocate a hard-limit on message size, perhaps 16k, and I would > enforce it by dropping the connection if a newline has not been reached > after parsing 16k bytes into a buffer. We would probably want to > experiment with varying the buffer size to find the biggest buffer we > can without wasting too much space on the server side (which would have > to keep multiple such buffers allocated). > > Each message would be either a command or a result. > > A command consists of two fields common to every command type, and then > command-specific fields. The common fields are 'method', which is the > command type, and 'id', which is a unique tag so we can batch multiple > commands and process their results out-of-order. > > A result consists of two fields common to every result type, and then > result-specific fields. The common fields are 'id', which is the unique > tag taken from the related command, and 'rcode', which is the numerical > response code, similar to HTTP response codes. > > From what we discussed at the meeting today, I fleshed out what I think > the specific protocol commands should look like. This is what the > actual wire protocol would look like, with the exception of the portions > enclosed in angle brackets, which are placeholders for parameters, and > excessive whitespace i added for readability. > > The login command: > > { > "method" : "LOGIN", > "id" : <unique command identifier>, > "version" : <the protocol version which the client uses>, > "username" : <username>, > "password" : <password> > } > > The login result: > > { > "rcode" : <response code>, > "id" : <id>, > "version" : <the protocol version which the server will use> > } > > The update command: > > { > "method" : "UPDATE", > "id" : <id>, > "urls" : [ <url>, ... ] > } > > The update result: > > { > "rcode" : <response code>, > "id" : <id> > } > > The search command: > > { > "method" : "SEARCH", > "id" : <id>, > "query" : <search query string>, > "nresults" : <number of results to return> > } > > The search result: > > { > "rcode" : <response code>, > "id" : <id>, > "results" : > [ > { > "url" : <media url>, > "artist" : <media artist>, > "album" : <media album>, > "title" : <media title>, > ... > }, > ... > ] > } > > The purchase command: > > { > "method" : "PURCHASE", > "id" : <id>, > "url" : <media url> > } > > The purchase result: > > { > "rcode" : <response code>, > "id" : <id> > } > > The download command: > > { > "method" : "DOWNLOAD", > "id" : <id>, > "url" : <media url> > } > > The download result: > > { > "rcode" : <response code>, > "id" : <id>, > "peers" : > [ > { "address" : <peer IP address>, "port" : <peer port> }, > ... > ] > } > > ------------------------------------------------------------------------ > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Microsoft > Defy all challenges. Microsoft(R) Visual Studio 2008. > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > ------------------------------------------------------------------------ > > _______________________________________________ > Mdn-backend mailing list > Mdn...@li... > https://lists.sourceforge.net/lists/listinfo/mdn-backend > |
From: Michael F. <ms...@sy...> - 2008-03-06 08:21:24
|
hey all, I'm sorry this is such a long email, but everybody feel free to break it down and respond to individual parts. i've been poking around with JSON (javascript object notation) as a means for encapsulating our protocol messages. JSON is easy on the eyes, less verbose than XML, and there are libraries for many different languages which can read and write it. More information on JSON can be found here: http://www.json.org I found an MIT-licensed library called JsonFX for C# (http://jsonfx.net/BuildTools/) which takes native C# data types and converts them into the JSON wire format, and vice-versa. Very slick. My thought is to encapsulate each message in a JSON object, and separate each message with a newline character. The reason for terminating with a newline is so the receiving entity can know when a message has been completely received without actually parsing the message itself. I would advocate a hard-limit on message size, perhaps 16k, and I would enforce it by dropping the connection if a newline has not been reached after parsing 16k bytes into a buffer. We would probably want to experiment with varying the buffer size to find the biggest buffer we can without wasting too much space on the server side (which would have to keep multiple such buffers allocated). Each message would be either a command or a result. A command consists of two fields common to every command type, and then command-specific fields. The common fields are 'method', which is the command type, and 'id', which is a unique tag so we can batch multiple commands and process their results out-of-order. A result consists of two fields common to every result type, and then result-specific fields. The common fields are 'id', which is the unique tag taken from the related command, and 'rcode', which is the numerical response code, similar to HTTP response codes. From what we discussed at the meeting today, I fleshed out what I think the specific protocol commands should look like. This is what the actual wire protocol would look like, with the exception of the portions enclosed in angle brackets, which are placeholders for parameters, and excessive whitespace i added for readability. The login command: { "method" : "LOGIN", "id" : <unique command identifier>, "version" : <the protocol version which the client uses>, "username" : <username>, "password" : <password> } The login result: { "rcode" : <response code>, "id" : <id>, "version" : <the protocol version which the server will use> } The update command: { "method" : "UPDATE", "id" : <id>, "urls" : [ <url>, ... ] } The update result: { "rcode" : <response code>, "id" : <id> } The search command: { "method" : "SEARCH", "id" : <id>, "query" : <search query string>, "nresults" : <number of results to return> } The search result: { "rcode" : <response code>, "id" : <id>, "results" : [ { "url" : <media url>, "artist" : <media artist>, "album" : <media album>, "title" : <media title>, ... }, ... ] } The purchase command: { "method" : "PURCHASE", "id" : <id>, "url" : <media url> } The purchase result: { "rcode" : <response code>, "id" : <id> } The download command: { "method" : "DOWNLOAD", "id" : <id>, "url" : <media url> } The download result: { "rcode" : <response code>, "id" : <id>, "peers" : [ { "address" : <peer IP address>, "port" : <peer port> }, ... ] } |
From: Andrew R. <ar...@gm...> - 2008-03-04 07:28:56
|
We'll meet again Wednesday to continue discussing the protocol, hopefully finishing up a basic idea on purchasing music and starting appropriate wars over how to implement things. Same time, same place (6:30 in the Suzzallo Cafe) as that seems to be working well for people. Andrew |
From: Ian G. <igi...@cs...> - 2008-02-29 01:04:30
|
Hey guys, I feel bad for not getting this out earlier, but I can't make the meeting this week. My partner is really sick for a CSE 455 project, so I have to pick up his work and finish it before tomorrow. That said, I don't know much about protocol design (only how to use certain protocols that I've learned =P), so I wouldn't have been a great deal of help. I'd love to know what you all settle on, though. Perhaps I can find out next week or from minutes. ~ Ian |
From: Scott S. <sco...@co...> - 2008-02-26 02:37:02
|
Hi all, I've added everyone to the separate group list so you can use them now. ~Scott |