From: Simon W. <es...@ou...> - 2003-01-21 22:44:26

On Tue, 21 Jan 2003, Olivier Dragon wrote:

> I think requiring DBI might be a bit too much for "easy to use"
> scripts... I mean there are plenty of online board software out there.
> That use fancy DBMSs like MySQL, PostgreSQL or Oracle. Security might be
> a our main concern and is rarely addressed but I think we also need to
> make it as easy as edit a config file and upload a few files (which I
> think qualifies for all of our current scripts). This is key also to
> keep the support request numbers down which seems to be a bit of an
> issue at the moment.

hear hear ( or is it here here ?)

> On the other hand, using DBI opens the way to a wide variety of
> databases which could ultimately be left to the user to choose in the
> configuration file.

Beware the demon that is feature creep !

Our mission/goal/whatever is to provide software that can be installed
with the minimum of effort. As soon as you start down the DBI route you
end up with all sorts of headaches that are probably better solved by
other projects.

my £0.02

Simon.
--
"Is there any tea on this spaceship ?"

From: Simon W. <es...@ou...> - 2003-01-21 22:41:58

On Tue, 21 Jan 2003, Nicholas Clark wrote:

> Perl ships with complete source for SDBM_File, and that should compile
> anywhere. The module that might be most useful is AnyDBM_File, which uses
> whichever out of NDBM_File, DB_File, GDBM_File, SDBM_File, ODBM_File
> are working. The two problems that immediately spring to mind with *DBM
> files are how to lock them to protect against multiple writes (auxiliary
> file?), and whether it's important that we don't know what file name they
> actually create for us. (In fact, sometimes they create two files with
> different extensions)

A good reason to avoid db_files is the locking problems that come with
them.

> In an ideal world it would be possible to create an interface that makes it
> simple for the end user to swap from files on disk to DB files to DBI.

The interface is fairly simple:

    ->get( id );
    ->save( id, \%data);       # or something like that
    ->delete( id );
    ->get_thread_from( id );   # if you're feeling generous.
    ->new( \%data );

The implementation knows where to go to get things and is reasonably
clever enough that if multiple messages are stored in a single flat file
per thread then it will slurp them all and cache them just in case. It
takes care of saving them and updating any necessary index files.

Simon.
--
"2 to the power of 100,000 to 1 and falling...."

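The interface Simon lists could be sketched roughly as below. This is only an illustration, not code from the NMS project: the package name `MessageStore::Memory`, the `create` method (standing in for `->new( \%data )`, so that `new` can remain the constructor) and the `parent` field are all assumptions made for the example, and the backend is an in-memory hash rather than flat files or DBM.

```perl
package MessageStore::Memory;    # hypothetical name, not part of NMS
use strict;
use warnings;

sub new {
    my ($class) = @_;
    return bless { next_id => 1, messages => {} }, $class;
}

# Store a new message and return the id assigned to it.
# A missing 'parent' defaults to 0, meaning "starts a thread".
sub create {
    my ($self, $data) = @_;
    my $id = $self->{next_id}++;
    $self->{messages}{$id} = { parent => 0, %$data, id => $id };
    return $id;
}

# Return a copy of the message, or nothing if the id is unknown.
sub get {
    my ($self, $id) = @_;
    my $msg = $self->{messages}{$id} or return;
    return { %$msg };
}

# Merge new field values into an existing message.
sub save {
    my ($self, $id, $data) = @_;
    my $old = $self->{messages}{$id} or return;
    $self->{messages}{$id} = { %$old, %$data, id => $id };
    return 1;
}

# The message $id followed by all of its replies, depth first.
sub get_thread_from {
    my ($self, $id) = @_;
    my $root = $self->get($id) or return;
    my @thread = ($root);
    my @kids = sort { $a <=> $b }
               grep { $self->{messages}{$_}{parent} == $id }
               keys %{ $self->{messages} };
    push @thread, $self->get_thread_from($_) for @kids;
    return @thread;
}

# CORE::delete is spelled out because this package has its own 'delete'.
sub delete {
    my ($self, $id) = @_;
    return CORE::delete($self->{messages}{$id}) ? 1 : 0;
}

package main;

my $store = MessageStore::Memory->new;
my $first = $store->create({ subject => 'NMS scripts are great!' });
$store->create({ subject => 'Re: NMS scripts are great!', parent => $first });
print "$_->{id}: $_->{subject}\n" for $store->get_thread_from($first);
```

A flat-file, DBM or DBI backend would offer the same five methods, so the calling script would not need to know which one is in use.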
From: Nicholas C. <ni...@un...> - 2003-01-21 22:33:28

On Tue, Jan 21, 2003 at 05:09:53PM -0500, Olivier Dragon wrote:
> On Tue, Jan 21, 2003 at 02:17:14PM -0500, Wizard wrote:
> > > While we're at it throwing ideas, how would the speed of XML or
> > > otherwise formatted single-file parsing compare to something like
> > > DB_File? As far as I know, DB_File is part of Perl (at least on my
> > > system) and it would save a lot of parsing work out.
> >
> > It requires Berkeley DB which I'm not sure is available to all windows
> > boxen. If you meant DBM files (which I believe is a different animal),
> > that's part of Perl 5.0 and we could use that. I don't know about the speed
> > thing, but I could test it.
>
> I thought DB_File was BerkleyDB 1.x... It requires external libraries?
> It's not the impression I got from the documentation :o|

Perl ships with complete source for SDBM_File, and that should compile
anywhere. The module that might be most useful is AnyDBM_File, which uses
whichever out of NDBM_File, DB_File, GDBM_File, SDBM_File, ODBM_File
are working. The two problems that immediately spring to mind with *DBM
files are how to lock them to protect against multiple writes (auxiliary
file?), and whether it's important that we don't know what file name they
actually create for us. (In fact, sometimes they create two files with
different extensions)

It ain't perl if AnyDBM_File doesn't work (perl5-porters, or at least
Andy Dougherty will back me up on that) - in that no-one sane will have
configured all 5 out, and their perl will have failed a regression test
if they did.

What proportion of ISPs aren't sane? How do we work round them - back to
flat files?

> On the other hand, using DBI opens the way to a wide variety of
> databases which could ultimately be left to the user to choose in the
> configuration file.

In an ideal world it would be possible to create an interface that makes
it simple for the end user to swap from files on disk to DB files to DBI.
[Beware of low flying pigs :-(]

Nicholas Clark

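The "auxiliary file?" idea for locking might look something like the sketch below. It assumes `flock` works on the host platform, and the helper name `with_locked_dbm` and the `.lock` suffix are inventions for this example. Because the lock lives in a separate file, it does not matter which file names the underlying DBM library actually creates.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);    # O_RDWR, O_CREAT, LOCK_EX
use AnyDBM_File;

# Run $code with a hash tied to the DBM file at $path, holding an
# exclusive lock on the auxiliary file "$path.lock" while it runs.
sub with_locked_dbm {
    my ($path, $code) = @_;

    open my $lock, '>>', "$path.lock" or die "Cannot open $path.lock: $!";
    flock($lock, LOCK_EX)             or die "Cannot lock $path.lock: $!";

    my %db;
    tie %db, 'AnyDBM_File', $path, O_RDWR | O_CREAT, 0644
        or die "Cannot tie $path: $!";

    my $result;
    my $ok    = eval { $result = $code->(\%db); 1 };
    my $error = $@;

    untie %db;      # flush the DBM file before letting go of the lock
    close $lock;    # closing the handle releases the lock
    die $error unless $ok;
    return $result;
}
```

Every reader and writer would have to go through the same helper; the lock is purely advisory and protects nothing against a script that opens the DBM file directly.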
From: Nicholas C. <ni...@un...> - 2003-01-21 22:08:00

On Tue, Jan 21, 2003 at 12:16:44PM -0500, Paul Roub wrote:

> Everywhere that the file seek functions work. Or everywhere you can read a
> line at time, which is... everywhere. You're still reading them, but not
> parsing.

If the file seek functions aren't working, I suspect they'll be the least
of our problems. Are there any file systems that can't seek? (And anywhere
you can read a line at a time you can read a block of bytes, so you can
still fake a seek.)

Seeking is O(1). Reading a line at a time is O(N). If a file gets over a
disk block in size, then seeking direct to the right point saves ever
needing to pull that block from disk.

As others have said (oops, not in an easy position to quote messages),
different approaches have different advantages and disadvantages.

Every message in its own file is good for backups, and good for middle
deletions. However, it's slow to open every file in turn, and if a server
quotas by disk blocks used, there will be a lot of wasted space.

Every message in the same file avoids lots of opens, seeks are
(moderately) fast, and there's little wastage from small files. However,
middle deletes are slow, potentially expensive in terms of disk quota
needed, and backups will be larger.

Do we get the best of both worlds with many messages per file?
Is 1 file per thread practical?

Nicholas Clark

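To make the O(1) versus O(N) point concrete, here is a small sketch of both access styles. The function names are made up for the example, and it assumes the byte offset and length (or the line numbers) of a message are already known, for instance from an index file.

```perl
use strict;
use warnings;

# Jump straight to byte $offset and read $length bytes. Nothing before
# the offset is read by this code.
sub read_at {
    my ($path, $offset, $length) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    seek($fh, $offset, 0) or die "Cannot seek in $path: $!";
    my $got = read($fh, my $buffer, $length);
    die "Cannot read $path: $!" unless defined $got;
    close $fh;
    return $buffer;
}

# The line-at-a-time alternative: every earlier line is still read,
# just not parsed.
sub read_lines {
    my ($path, $skip, $count) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @lines;
    while (my $line = <$fh>) {
        next if $. <= $skip;       # $. is the current line number
        push @lines, $line;
        last if @lines == $count;
    }
    close $fh;
    return join '', @lines;
}
```

Byte offsets only stay valid if the file is read in binary mode and never rewritten in place, which is one reason to store them in bytes rather than characters.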
From: Olivier D. <dr...@sh...> - 2003-01-21 22:04:05

On Tue, Jan 21, 2003 at 02:17:14PM -0500, Wizard wrote:
> > While we're at it throwing ideas, how would the speed of XML or
> > otherwise formatted single-file parsing compare to something like
> > DB_File? As far as I know, DB_File is part of Perl (at least on my
> > system) and it would save a lot of parsing work out.
>
> It requires Berkeley DB which I'm not sure is available to all windows
> boxen. If you meant DBM files (which I believe is a different animal),
> that's part of Perl 5.0 and we could use that. I don't know about the speed
> thing, but I could test it.

I thought DB_File was Berkeley DB 1.x... It requires external libraries?
It's not the impression I got from the documentation :o|

> > We could also use CSV or some other database schema.
>
> If we were going to go with some specific stand-alone Database mechanism, I
> think I'd chose to use DBD::XBase for the SQL support (allowing for
> conversion to Enterprise DBs), and include the CPAN modules. It works, is
> pure perl, and supports a reasonable subset of SQL92(?). I've used it with
> NeonDB. I am assuming however that all perl installs would include DBI by
> default.

The reason for my suggestion of DB_File is that the Perl code is
extremely trivial and compact. A single `tie' or `dbopen' and you have a
hash tied to the database data. From there it's all $hash{your_field},
which is a lot simpler than SQL queries...

I think requiring DBI might be a bit too much for "easy to use"
scripts... I mean there is plenty of online board software out there
that uses fancy DBMSs like MySQL, PostgreSQL or Oracle. Security might be
our main concern and is rarely addressed, but I think we also need to
make it as easy as editing a config file and uploading a few files (which
I think qualifies for all of our current scripts). This is also key to
keeping the support request numbers down, which seems to be a bit of an
issue at the moment.

On the other hand, using DBI opens the way to a wide variety of
databases which could ultimately be left to the user to choose in the
configuration file.

Oh Joy :o)

-Olivier

--
Oli Dragon dr...@sh...
Sfwr Eng III, McMaster University
[http://pgp.mit.edu:11371/pks/lookup?search=olivier+dragon&op=index]

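The "single tie and you have a hash" point can be illustrated as follows. The sketch uses SDBM_File rather than DB_File only because, as noted elsewhere in the thread, SDBM_File ships with Perl; the helper name `open_posts` and the key layout are assumptions for the example. SDBM also limits each key/value pair to roughly 1 KB, so it would suit an index better than full message bodies.

```perl
use strict;
use warnings;
use Fcntl;        # exports O_RDWR and O_CREAT
use SDBM_File;

# Open (creating it if necessary) the DBM file and return a reference
# to a hash tied to it. Reads and writes on the hash go to disk.
sub open_posts {
    my ($dbfile) = @_;
    my %posts;
    tie %posts, 'SDBM_File', $dbfile, O_RDWR | O_CREAT, 0644
        or die "Cannot tie $dbfile: $!";
    return \%posts;
}
```

Usage is then plain hash access, for example `my $posts = open_posts('board'); $posts->{'000001.subject'} = 'Hello'; untie %$posts;`. Values have to be flat strings, so a post with several fields needs either one key per field or some serialisation.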
From: Olivier D. <dr...@sh...> - 2003-01-21 21:47:39

On Tue, Jan 21, 2003 at 02:22:29PM -0500, Wizard wrote:
> What are we calling this animal?
> I'll suggest NMSBoard. By the way, what does the 'TF' in TFMail mean?

I like NMSBoard.

'TF' = Très Fonctionnel :o)

-Olivier

From: Wizard <wi...@ne...> - 2003-01-21 19:27:20

What are we calling this animal?
I'll suggest NMSBoard. By the way, what does the 'TF' in TFMail mean?

Grant M.

From: Wizard <wi...@ne...> - 2003-01-21 19:22:06

> While we're at it throwing ideas, how would the speed of XML or
> otherwise formatted single-file parsing compare to something like
> DB_File? As far as I know, DB_File is part of Perl (at least on my
> system) and it would save a lot of parsing work out.

It requires Berkeley DB which I'm not sure is available to all windows
boxen. If you meant DBM files (which I believe is a different animal),
that's part of Perl 5.0 and we could use that. I don't know about the
speed thing, but I could test it.

> We could also use CSV or some other database schema.

If we were going to go with some specific stand-alone Database mechanism,
I think I'd choose to use DBD::XBase for the SQL support (allowing for
conversion to Enterprise DBs), and include the CPAN modules. It works, is
pure perl, and supports a reasonable subset of SQL92(?). I've used it
with NeonDB. I am assuming however that all perl installs would include
DBI by default.

Grant M.

From: Olivier D. <dr...@sh...> - 2003-01-21 18:53:25

Greetings,

While we're at it throwing ideas, how would the speed of XML or
otherwise formatted single-file parsing compare to something like
DB_File? As far as I know, DB_File is part of Perl (at least on my
system) and it would save a lot of parsing work out.

On Tue, Jan 21, 2003 at 07:08:20AM -0500, Wizard wrote:
> <MessID name="14">
>   <subject>NMS scripts are great!</subject>
>   <user>The Professor</user>
>   <email>fr...@fs...</email>
>   <date>20/01/03</date>
>   <moderate>0</moderate>
>   <MessID name="17">
>     <subject>Re: NMS scripts are great!</subject>
>     <user>The Wizard</user>
>     <email>de...@nu...</email>
>     <date>22/01/03</date>
>     <moderate>0</moderate>
>   </MessID>
> </MessID>

Let me take a stab at this one too with a filesystem based hierarchy of
DB files:

    /msgindex.db
    /000001.db
    /000001/msgindex.db           # replies of /000001.db
    /000001/000001.db
    /000001/000002.db
    /000001/000003.db
    /000001/000003/msgindex.db    # replies of /000001/000003.db
    /000001/000003/000001.db
    /000001/000003/000002.db
    /000001/000003/000003.db
    /000001/000004.db
    /000002.db
    /000003.db
    /000004.db
    /000005.db
    ...

msgindex contains a list of all the posts in a directory. We can quickly
check the directory tree for threads of messages. The xxxxxx.db files
contain the post data.

Building an index (link) page seems simple enough. Modifying a post would
probably be slow (I think it's done in place) but we usually don't modify
posts. Deleting messages (unlink and delete the msgindex reference), or
even entire threads (rm -rf the directory), is a joke.

The only issues I can see are that DB_File has only been part of Perl
since v??? and that the space requirements might be an unnecessary strain
on quota'd systems/users. On this, does anyone know how file sizes for
DB_File compare to, let's say, something equivalent in XML?

We could also use CSV or some other database schema.

What do you think?

-Olivier

From: Nick C. <ni...@cl...> - 2003-01-21 18:09:27

On Tue, Jan 21, 2003 at 12:16:44PM -0500, Paul Roub wrote:
> >
> > Is this correct? If I'm getting what you're saying, we will need to delete
> > messages from the beginning middle and end of a file if they contain more
> > than one message.
>
> But in the normal (quota management) case, you'd delete oldest-first -- i.e.
> from the beginning. If you batch this up a bit, it's not too bad.

I don't like one big file including message bodies.

The only simple way to delete a post (without risk of losing the whole
lot if the script is killed or the server reboots half way through) is to
write out a new copy of the file and then rename it over the old version.

That limits the total size you can have to *half* your quota, and makes
it impossible to delete messages via the admin script once the account is
over half quota. The users won't like that.

Also, some system backup utils will dump only files modified since the
last backup, and these will work much better with each message body in a
separate file.

--
Nick

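The copy-then-rename deletion Nick describes could be sketched like this. The record layout (one record per line, with the message id before the first tab) and the function name are assumptions made purely for the example. Note that both copies exist on disk until the rename, which is exactly the half-quota limit he points out.

```perl
use strict;
use warnings;

# Delete record $id from $path by writing every other record to a
# temporary file and then renaming that file over the original.
# If the script dies part way through, the original is left untouched.
sub delete_record {
    my ($path, $id) = @_;
    my $tmp = "$path.tmp.$$";    # $$ (process id) keeps the name unique

    open my $in,  '<', $path or die "Cannot read $path: $!";
    open my $out, '>', $tmp  or die "Cannot write $tmp: $!";

    my $removed = 0;
    while (my $line = <$in>) {
        my ($line_id) = split /\t/, $line, 2;
        if ($line_id eq $id) {
            $removed++;
            next;
        }
        print {$out} $line or die "Cannot write $tmp: $!";
    }
    close $in;
    close $out or die "Cannot close $tmp: $!";

    rename $tmp, $path or die "Cannot rename $tmp to $path: $!";
    return $removed;
}
```

On POSIX file systems the rename replaces the old file in one step; on some other platforms renaming over an existing file may fail, so this would need testing wherever the scripts are expected to run. A real script would also need to hold a lock around the whole operation.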
From: Wizard <wi...@ne...> - 2003-01-21 18:08:26

A couple more things (in reply to my own email)

> Modularization.
> Externalization of configuration.

I have a module that I wrote some time ago for doing external
configurations of Perl scripts. I know it works on numerous platforms,
it's fast, and I am familiar with it. Should/can I use it? Here's the man
page:

http://www.neonedge.com/perl_tools/Config/

Let me know. There was something else, but now I can't remember.
Definitely a Tuna-coma.

Grant M.

From: Wizard <wi...@ne...> - 2003-01-21 17:52:39

> I see what you're saying here. Of course you get into parsing
> complications when you have to try and parse the headers of the messages
> that are children of msg 2 out of this *without* using any XML::
> modules.

I'm not sure what you mean here (I may be in a Tuna-coma).

> I don't see that it would be that hard to make a data structure from an
> index file that would convey the indenting information to the template.

That's fine, just get me an example when you get a moment, and I'll
happily take a look.

NOTE: I DON'T WANT ANYONE TO THINK I'M FIXATED ON XML. I'm not, it's just
that I seem to be visualizing the data in a format that is best described
by XML. It appears to fit the schema naturally, and I don't have a decent
grasp of the other formats yet. That's the reason I ask for format
examples, so that I can visualize parsing them.

>
> But I actually don't mind too much, if you;re going tolead the
> programming effort you can choose whatever you think is best :)

Lead?!?!? who said that?!?!?! I thought I was just following! Quick, all
cowards to the rear!!!! AAAAAAAARRRRGGGGHHHHH!

> So far we have these:
>
> Separation of data from presentation

Yes. This I think is the most important.

> Posting alert email to admin

I like this, but it's currently relegated to v2.1 (let me know if I
should push this)

> Optional threading

I don't know if this is necessary for this iteration, but it should be
pretty easy.

> Thread-to-a-page

Yup.

> Multiple threads (all ?) to a page

Er, I don't know about this. Perhaps a "show found thread" checkbox in a
search function. v2.1?

> Any others ?

Modularization.
Externalization of configuration.
Moderation.
IP/email/domain allow/deny
Search on keyword/user/IP (and date?)
User Accounts for posting

That's it.

Grant M.

From: Paul R. <pa...@ro...> - 2003-01-21 17:16:43

> > 2. We will rarely, if ever, need to delete a message from the middle
> >    of the file
>
> Is this correct? If I'm getting what you're saying, we will need to delete
> messages from the beginning middle and end of a file if they contain more
> than one message.
>

But in the normal (quota management) case, you'd delete oldest-first --
i.e. from the beginning. If you batch this up a bit, it's not too bad.

> Also, there's the issue of advertising posts and just plain inappropriate or
> offensive posts.
>

True, but hopefully not *that* frequent.

> > Building the index can be quite fast if we store content-length
> > (probably as a line count) as above.
>
> I'm guessing that what you are proposing is fixed-length records. This could
> be substantially faster on most systems, but is it easily portable?
>

Not at all. Either store the length in bytes, or the length in (variable)
lines.

> This kinda brings up another concern/thought. If we use either this format
> or Simon's, don't I lose the benefit of a data-defined hierarchy? I
> understand that they both try to solve this using indexing, but I still have
> to deal with multi-level indents/nesting. In other words, with either of
> these formats, don't I have to presort all of the data, assigning index
> levels to each entry based upon the index level of the parent. With my
> format (XML or not), the hierarchy is predetermined by the nesting of each
> element, which means that they are already presorted. This means I only have
> to increment/decrement the indent level based upon the data tags
> encountered. I may be wrong, but it just sounds like a much simpler
> implementation to me (and where I'm doing it, that sounds good :-). KISS
> principle?
>

You're assuming (a) that we'll precompute the indent level, which I don't
assume is a great idea. Also assuming that the indent level will continue
to be correct when earlier (potentially parent) messages are deleted.

Pre-computing, storing, re-computing what is essentially a presentation
detail seems to be anti-KISS to me.

> > Message display can still be quite efficient if we further extend
> > the index to include the offset (by line, byte, whatever) of the
> > message in the larger file. We seek (or quickly skip lines), we grab
> > exactly content-length lines, and off we go.
>
> I do like the idea of this, but I would like to know it will work
> everywhere.
>

Everywhere that the file seek functions work. Or everywhere you can read
a line at a time, which is... everywhere. You're still reading them, but
not parsing.

-paul

From: Simon W. <es...@ou...> - 2003-01-21 17:13:20

On Tue, 2003-01-21 at 16:50, Wizard wrote:

> This kinda brings up another concern/thought. If we use either this format
> or Simon's, don't I lose the benefit of a data-defined hierarchy? I
> understand that they both try to solve this using indexing, but I still have
> to deal with multi-level indents/nesting. In other words, with either of
> these formats, don't I have to presort all of the data, assigning index
> levels to each entry based upon the index level of the parent. With my
> format (XML or not), the hierarchy is predetermined by the nesting of each
> element, which means that they are already presorted. This means I only have
> to increment/decrement the indent level based upon the data tags
> encountered. I may be wrong, but it just sounds like a much simpler
> implementation to me (and where I'm doing it, that sounds good :-). KISS
> principle?

I see what you're saying here. Of course you get into parsing
complications when you have to try and parse the headers of the messages
that are children of msg 2 out of this *without* using any XML:: modules.

    <msg id="1">
      <head>text</head>
      <msg id="2">
        <head>text</head>
        <msg id="3">
          <head>text</head>
        </msg>
        <msg id="4">
          <head>text</head>
        </msg>
        <msg id="5">
          <head>text</head>
          <msg id="6">
            <head>text</head>
          </msg>
        </msg>
      </msg>
    </msg>

I don't see that it would be that hard to make a data structure from an
index file that would convey the indenting information to the template.

But I actually don't mind too much, if you're going to lead the
programming effort you can choose whatever you think is best :)

I do wonder if perhaps we haven't allowed ourselves to get bogged down in
the detail when we should be looking at the features that should be in
any new version.

So far we have these:

    Separation of data from presentation
    Posting alert email to admin
    Optional threading
    Thread-to-a-page
    Multiple threads (all ?) to a page

Any others ?

S.

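One way to get indenting information out of a flat index, without nesting in the stored data, is to record only each message's parent id and work out the depth at display time. The sketch below assumes index entries are hashes with `id` and `parent` keys (with `parent` 0 for a thread starter, so real ids must be non-zero); these names and the function `thread_order` are illustrative only.

```perl
use strict;
use warnings;

# Given index entries in posting order, return [depth, entry] pairs in
# display order, each parent immediately followed by its replies.
sub thread_order {
    my ($index) = @_;

    my %known = map { $_->{id} => 1 } @$index;
    my %children;
    for my $entry (@$index) {
        # A reply whose parent has been deleted is shown at top level.
        my $parent = $known{ $entry->{parent} } ? $entry->{parent} : 0;
        push @{ $children{$parent} }, $entry;
    }

    my @ordered;
    my $walk;
    $walk = sub {
        my ($parent, $depth) = @_;
        for my $entry (@{ $children{$parent} || [] }) {
            push @ordered, [ $depth, $entry ];
            $walk->($entry->{id}, $depth + 1);
        }
    };
    $walk->(0, 0);
    undef $walk;    # break the closure's reference to itself

    return @ordered;
}
```

A template can then turn each depth into indentation, for example `print '  ' x $_->[0], $_->[1]{id}, "\n" for thread_order(\@index);`. Because the depth is computed on every display, deleting a parent never leaves a stale indent level behind.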
From: Wizard <wi...@ne...> - 2003-01-21 17:11:24

> I strongly dislike the idea of creating something that seems to be one
> thing (XML) but actually isn't.
>
> It strikes me as having a high chance for confusion later on, when someone
> assumes that it is the think it looks like (XML)

As I said, I don't care about the 'XML' thing, but isn't the problem you
suggested the same as someone editing the current wwwboard.html file with
an HTML editor, and munging the comment tags? I do understand your
concern though, I just don't know that we can prevent stupidity ;-O.

Besides, we don't have to 'say' that we store any format, we can just do
it. What the user assumes is their problem. We could put a line at the
beginning that is totally not XML and would crash a parser (maybe even
check for that line). Then should we decide to convert to support XML, we
just remove the line.

Grant M.

From: Nicholas C. <ni...@cc...> - 2003-01-21 16:56:05

On Tue, Jan 21, 2003 at 04:49:35PM +0000, Nick Cleaton wrote:
> On Tue, Jan 21, 2003 at 04:28:33PM +0000, Nicholas Clark wrote:
> > On Tue, Jan 21, 2003 at 04:07:45PM +0000, Nick Cleaton wrote:
> > > On Tue, Jan 21, 2003 at 04:07:55PM +0000, Nicholas Clark wrote:
> > > >
> > > > For some reason I have this gut dislike of saying that something is "XML"
>                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > (which has a well defined spec) but then implementing a parser that can
> > > > only read files formatted in our way. (for reasons of speed/size/inlining)
> > >
> > > That's simple to fix in the docs - it saves data in "an XML-like format".
> >
> > I'll buy you a beer for everyone who reads the docs before mailing the
> > support list. :-)
>
> Where were we going to "say that something is XML" if not in the
> docs ?
>
> You dislike the idea of saying XML, but we can't fix that by saying
> "an XML-like format" instead because nobody is listening anyway ?
>
> Too zen for me :)

I strongly dislike the idea of creating something that seems to be one
thing (XML) but actually isn't.

It strikes me as having a high chance for confusion later on, when
someone assumes that it is the thing it looks like (XML)

Nicholas Clark

From: Wizard <wi...@ne...> - 2003-01-21 16:55:16

> 2. We will rarely, if ever, need to delete a message from the middle
>    of the file

Is this correct? If I'm getting what you're saying, we will need to
delete messages from the beginning middle and end of a file if they
contain more than one message. Especially if these are on quota'd
systems. A friend of mine used to run WWWBoard on www.coonhounds.com, and
he was always getting over-quota charges where he'd have to go in and
delete tons of messages.

Also, there's the issue of advertising posts and just plain inappropriate
or offensive posts.

> Building the index can be quite fast if we store content-length
> (probably as a line count) as above.

I'm guessing that what you are proposing is fixed-length records. This
could be substantially faster on most systems, but is it easily portable?

> We know exactly how many lines to skip (and
> therefore not parse) at any time. Appending to the index is a one-shot if
> we store an in-reply-to ID with each message, rather than a list
> of replies with each parent (see the JWZ article below).

This kinda brings up another concern/thought. If we use either this
format or Simon's, don't I lose the benefit of a data-defined hierarchy?
I understand that they both try to solve this using indexing, but I still
have to deal with multi-level indents/nesting. In other words, with
either of these formats, don't I have to presort all of the data,
assigning index levels to each entry based upon the index level of the
parent. With my format (XML or not), the hierarchy is predetermined by
the nesting of each element, which means that they are already presorted.
This means I only have to increment/decrement the indent level based upon
the data tags encountered. I may be wrong, but it just sounds like a much
simpler implementation to me (and where I'm doing it, that sounds good
:-). KISS principle?

> Index display is easy if we extend the previously-mentioned format to
> include Poster, Date and Subject line. The index file will still be quite
> small relative to the main message text, which will never be read
> when we're displaying our message tree / list.
>
> Message display can still be quite efficient if we further extend
> the index to include the offset (by line, byte, whatever) of the
> message in the larger file. We seek (or quickly skip lines), we grab
> exactly content-length lines, and off we go.

I do like the idea of this, but I would like to know it will work
everywhere.

>
> On thread-building, by the way, I highly recommend a look at Jamie
> Zawinski's algorithm and pseudocode for that (for email, which is a
> more-complicated version of the task we face):
>
> http://www.jwz.org/doc/threading.html
>
> Just my two cents ( 0.0187410 EUR, 0.0124266 GBP, 0.03 CAD).

I'll take a look.

Thanks,
Grant M.

From: Nick C. <ni...@cl...> - 2003-01-21 16:54:23

On Tue, Jan 21, 2003 at 04:28:33PM +0000, Nicholas Clark wrote:
> On Tue, Jan 21, 2003 at 04:07:45PM +0000, Nick Cleaton wrote:
> > On Tue, Jan 21, 2003 at 04:07:55PM +0000, Nicholas Clark wrote:
> > >
> > > For some reason I have this gut dislike of saying that something is "XML"
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > (which has a well defined spec) but then implementing a parser that can
> > > only read files formatted in our way. (for reasons of speed/size/inlining)
> >
> > That's simple to fix in the docs - it saves data in "an XML-like format".
>
> I'll buy you a beer for everyone who reads the docs before mailing the
> support list. :-)

Where were we going to "say that something is XML" if not in the
docs ?

You dislike the idea of saying XML, but we can't fix that by saying
"an XML-like format" instead because nobody is listening anyway ?

Too zen for me :)

--
Nick

From: Nicholas C. <ni...@cc...> - 2003-01-21 16:28:49

On Tue, Jan 21, 2003 at 04:07:45PM +0000, Nick Cleaton wrote:
> On Tue, Jan 21, 2003 at 04:07:55PM +0000, Nicholas Clark wrote:
> >
> > For some reason I have this gut dislike of saying that something is "XML"
> > (which has a well defined spec) but then implementing a parser that can
> > only read files formatted in our way. (for reasons of speed/size/inlining)
>
> That's simple to fix in the docs - it saves data in "an XML-like format".

I'll buy you a beer for everyone who reads the docs before mailing the
support list. :-)
[No cheating by getting people on the development list to act as stooges]

And if it's XML-like, why not have XML-unlike?

Unless they've changed it since the original version,
http://www.bbc.co.uk/cgi-bin/search/results.pl
is getting its google feed in google's "Simple Results Format", rather
than XML, because the google format was very easy to parse in a CGI
without making a half baked ultra-fragile regexp "XML" parser.
(Political reasons meant that initially that script had to be a CGI.
It's now mod_perl)

The google format is nice. It's

    <tag>:<length>:<data>

where tag is alphanumeric (and certainly no ':'s), length is the length
of data (in bytes), and the data just is. It is very easy to parse in
perl. (and C for that matter, although I didn't have to, because you can
allocate a buffer using the size information before you have to read the
actual data in). And the format actually allows data with embedded
newlines, NUL bytes and any other indigestible characters.

Nicholas Clark

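A parser for a `<tag>:<length>:<data>` layout of the kind described can be very small. The sketch below is not Google's actual format definition: it assumes records follow each other back to back with no separator, and that the buffer is a byte string, so Perl's `length` counts bytes. Both function names are made up for the example.

```perl
use strict;
use warnings;

# Build one record. The data may contain colons, newlines or NUL bytes,
# because the reader relies on the length and never scans the data.
sub format_record {
    my ($tag, $data) = @_;
    return $tag . ':' . length($data) . ':' . $data;
}

# Split a buffer of concatenated records into [tag, data] pairs.
sub parse_records {
    my ($buffer) = @_;
    my @records;
    my $pos = 0;
    while ($pos < length $buffer) {
        pos($buffer) = $pos;
        $buffer =~ /\G([A-Za-z0-9]+):(\d+):/gc
            or die "Malformed record header at byte $pos\n";
        my ($tag, $len) = ($1, $2);
        my $start = pos($buffer);
        die "Truncated data for '$tag'\n"
            if $start + $len > length $buffer;
        push @records, [ $tag, substr($buffer, $start, $len) ];
        $pos = $start + $len;
    }
    return @records;
}
```

The only regular expression involved matches the short header; message text is copied out by position, so nothing in a post can confuse the parser.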
From: Wizard <wi...@ne...> - 2003-01-21 16:23:22

> Not necessarily. If you can pre-bake all the pages then it is simply a
> case of letting the webserver deliver them back, no cgi involved.
>
> Then you just change the pages that need changing each time a message is
> posted.
>

Ok, something like this then? (pseudo-code):

    /wwwview.pl?f=index
        -r wwwindex.html ? print "Location: $url_wwwindex" : &bake_new_index

    /wwwboard.pl [POST]
        &unlink_index if -r wwwindex.html;
        &post_message; # or vice-versa?

> No offence taken :)

Glad to hear it [damn, I mustn't be trying hard enough ;-)].

Grant M.

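The bake-on-demand idea in the pseudo-code might be fleshed out along these lines. This is a sketch under assumptions: `fetch_index` and `invalidate_index` are invented names, the page generator is passed in as a code reference, and the page is returned as a string rather than served through a `Location:` redirect as in the pseudo-code.

```perl
use strict;
use warnings;

# Return the pre-baked index page, generating it first if it is missing.
# $bake is a code reference that returns the page as one string.
sub fetch_index {
    my ($cache_file, $bake) = @_;

    unless (-r $cache_file) {
        # Write to a temporary name, then rename, so that a reader
        # never sees a half-written page.
        my $tmp = "$cache_file.$$";
        open my $out, '>', $tmp  or die "Cannot write $tmp: $!";
        print {$out} $bake->()   or die "Cannot write $tmp: $!";
        close $out               or die "Cannot close $tmp: $!";
        rename $tmp, $cache_file or die "Cannot rename $tmp: $!";
    }

    open my $in, '<', $cache_file or die "Cannot read $cache_file: $!";
    local $/;                     # slurp the whole file in one read
    my $page = <$in>;
    close $in;
    return $page;
}

# Called after a new post, so the next view re-bakes the page.
sub invalidate_index {
    my ($cache_file) = @_;
    return 1 unless -e $cache_file;
    return unlink $cache_file;
}
```

On the "or vice-versa?" question: unlinking after the message has been stored narrows the window in which a stale index could be re-baked without the new post, although two simultaneous posts would still need a lock to be safe.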
From: Paul R. <pa...@ro...> - 2003-01-21 16:17:26
|
If I might jump in... I like the basic ideas going on here, but let me throw out a few suggestions to be shot down. I'll skip the whole XML-or-not question for now. Not obvious that it buys us much here, though. Not clear that we really gain much by the one-message-per-file method, either. You quickly end up with huge directories, and so forth. I realize there are pros and cons either way, but consider some of the characteristics of this app vs., say, email storage: 1. Reads are much more frequent than posts 2. We will rarely, if ever, need to delete a message from the middle of the file 3. Unlike email, we can completely track and control content-length information for each message Appending to the end of a file (even a large one) is obviously no big performance problem. So what about index building, index display, and message display? Building the index can be quite fast if we store content-length (probably as a line count) as above. We know exactly how many lines to skip (and therefore not parse) at any time. Appending to the index is a one-shot if we store an in-reply-to ID with each message, rather than a list of replies with each parent (see the JWZ article below). Index display is easy if we extend the previously-mentioned format to include Poster, Date and Subject line. The index file will still be quite small relative to the main message text, which will never be read when we're displaying our message tree / list. Message display can still be quite efficient if we furthur extend the index to include the offset (by line, byte, whatever) of the message in the larger file. We seek (or quickly skip lines), we grab exactly content-length lines, and off we go. On thread-building, by the way, I highly recommend a look at Jamie Zawinski's algorithm and pseudocode for that (for email, which is a more-complicated version of the task we face): http://www.jwz.org/doc/threading.html Just my two cents ( 0.0187410 EUR, 0.0124266 GBP, 0.03 CAD). 
-paul ----- Original Message ----- From: "Wizard" <wi...@ne...> To: "Simon Wilcox" <es...@ou...> Cc: "nms-devel" <nms...@li...> Sent: Tuesday, January 21, 2003 10:34 AM Subject: RE: [Nms-cgi-devel] WWWBoard2 - PLEASE RESPOND > > > > > Fair point but there is still an overhead in doing the regexes when my > > format uses simple splits. Plus you need to factor in the recursiveness > > of the format. Not that it won't work just fine but mine seems simpler > > to implement and less resource hungry. > > If you're saying that the index file will only contain the hierarchy (by > ID?) and no message data, then I think I get what you're saying. That would > be much faster for posting, but for viewing the index page we're then > talking about open & close on each and every message file to get the > subject, user, email and date. I'd guess that much disk access is going to > be much slower than the benefit of speeding up edits. If you're talking > about including everything but the message text, then I can see the split > being a benefit over regexes. I think I would use an alternate delimiter, > though (maybe '::' or '||'). > > > > > > I agree that looking to the future is good but since it will be a major > > upgrade to use a different backend or whatever I would suggest that as > > long as we have the file IO in a module that provides a standard > > interface, the implementation can be changed at any time. > > > > Munging the datastore to fit whatever new strategy you want to use > > should be straightforward. > > I honestly could care less about XML, but I thought that being an in-use > standard might lead to greater acceptance over a proprietary format. It's > 'Give the people what they want', as far as I'm concerned. What is it they > want? > > > Did you see one big xml file ? > > Yes, or one wwwindex.xml with everything but individual message text, which > would be separate files. 
> > > If so, it will need to be locked while you write out a new one with the > > new post in it. This may have performance issues. > > Yes, if it includes the message text. If it's just the index with an XML (or > whatever) thread similar to what we have now in wwwboard.html, then it > should be faster than what we have now due to lack of XHTML > formatting/header/form/footer output. > > > > I'll try and work up some examples of what I mean as soon as possible > > but I'm at work right now and I should be doing other things :) > > Ok, when you have a moment. I don't believe this will be resolved today ;-). > Grant M. |
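Editorial sketch of the layout Paul describes in the message above: one append-only message file plus a small index that carries the in-reply-to ID, poster, date, subject, offset, and content length. This is only an illustration, written in Python rather than the project's Perl. The function names, the `||` delimiter (one of the two floated in the quoted thread), the byte-based offsets, and the field order are all assumptions, not anything the list agreed on.

```python
import os

SEP = "||"  # assumed delimiter; the thread only suggests '::' or '||'


def _clean(field):
    # Assumption: index fields may not contain '|' or line breaks.
    return field.replace("|", "/").replace("\r", " ").replace("\n", " ")


def post_message(store, index, msg_id, parent_id, poster, date, subject, body):
    """Append the body to the big file and one line to the index."""
    data = body.encode("utf-8")
    with open(store, "ab") as fh:
        fh.seek(0, os.SEEK_END)
        offset = fh.tell()  # byte position where this message starts
        fh.write(data)
    fields = [msg_id, parent_id, poster, date, subject]
    line = SEP.join([_clean(f) for f in fields] + [str(offset), str(len(data))])
    with open(index, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")


def read_index(index):
    """Load the index with simple splits; the message file is never opened."""
    rows = []
    with open(index, encoding="utf-8") as fh:
        for line in fh:
            msg_id, parent, poster, date, subject, offset, length = (
                line.rstrip("\n").split(SEP)
            )
            rows.append({
                "id": msg_id, "parent": parent, "poster": poster,
                "date": date, "subject": subject,
                "offset": int(offset), "length": int(length),
            })
    return rows


def read_message(store, row):
    """Seek straight to one message and read exactly its recorded length."""
    with open(store, "rb") as fh:
        fh.seek(row["offset"])
        return fh.read(row["length"]).decode("utf-8")


def build_tree(rows):
    """Group message IDs under their parent ID ('' marks a thread root)."""
    children = {}
    for row in rows:
        children.setdefault(row["parent"], []).append(row["id"])
    return children
```

Because each message stores only its parent's ID, posting is a pure append to both files, and the thread tree is rebuilt from the index alone when the listing is displayed. The sketch omits file locking, which a real CGI script would need around both appends.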
From: Nick C. <ni...@cl...> - 2003-01-21 16:12:35
|
On Tue, Jan 21, 2003 at 04:07:55PM +0000, Nicholas Clark wrote: > > For some reason I have this gut dislike of saying that something is "XML" > (which has a well defined spec) but then implementing a parser that can > only read files formatted in our way. (for reasons of speed/size/inlining) That's simple to fix in the docs - it saves data in "an XML-like format". -- Nick |
From: Nicholas C. <ni...@cc...> - 2003-01-21 16:08:00
|
On Tue, Jan 21, 2003 at 10:34:43AM -0500, Wizard wrote:
> > I agree that looking to the future is good but since it will be a major
> > upgrade to use a different backend or whatever I would suggest that as
> > long as we have the file IO in a module that provides a standard
> > interface, the implementation can be changed at any time.
> >
> > Munging the datastore to fit whatever new strategy you want to use
> > should be straightforward.
>
> I honestly could care less about XML, but I thought that being an in-use
> standard might lead to greater acceptance over a proprietary format. It's
> 'Give the people what they want', as far as I'm concerned. What is it they
> want?

For some reason I have this gut dislike of saying that something is "XML" (which has a well-defined spec) but then implementing a parser that can only read files formatted in our way (for reasons of speed/size/inlining).

The only concrete reason I can see for suggesting that it would be a bad thing to have an XML message file is that we make it look like the site admin can download the message file, edit it in something that reads XML happily to "correct" something, and then upload the edited XML back to the server. And their XML editor has written out XML consistent with the schema that our output XML implied. Only it's not quite the same (is order allowed to change?) and our script barfs. And they hassle the support list about this. And we say "but you're not supposed to do that" and they reply "but you say it stores it as XML. And what I gave it *is* XML".

Whereas if we have a clearly proprietary format abstracted nicely behind a flexible interface, then we can change from our format to XML to a relational database to trained monkeys whenever we feel like it.

Nicholas Clark |
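Editorial sketch of the "proprietary format behind a flexible interface" idea from the message above. It is a minimal Python illustration, not the project's Perl design: the `MessageStore` interface, its three methods, the `MemoryStore` stand-in, and `post_reply` are all hypothetical names invented here.

```python
from abc import ABC, abstractmethod


class MessageStore(ABC):
    """Hypothetical storage interface; board code would talk only to this."""

    @abstractmethod
    def add(self, parent_id, poster, subject, body):
        """Store a message and return its new ID."""

    @abstractmethod
    def get(self, msg_id):
        """Return the stored message as a dict."""

    @abstractmethod
    def replies(self, parent_id):
        """Return the IDs of the direct replies to parent_id."""


class MemoryStore(MessageStore):
    """Stand-in backend. A flat-file, XML, or database backend would
    implement the same three methods."""

    def __init__(self):
        self._messages = {}

    def add(self, parent_id, poster, subject, body):
        msg_id = len(self._messages) + 1
        self._messages[msg_id] = {
            "id": msg_id, "parent": parent_id, "poster": poster,
            "subject": subject, "body": body,
        }
        return msg_id

    def get(self, msg_id):
        return dict(self._messages[msg_id])

    def replies(self, parent_id):
        return [m["id"] for m in self._messages.values()
                if m["parent"] == parent_id]


def post_reply(store, parent_id, poster, body):
    """Caller code depends on the interface, never on the file format."""
    parent = store.get(parent_id)
    return store.add(parent_id, poster, "Re: " + parent["subject"], body)
```

With this shape, swapping the on-disk format means writing another `MessageStore` subclass; `post_reply` and any display code stay unchanged.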
From: Simon W. <es...@ou...> - 2003-01-21 16:04:09
|
On Tue, 2003-01-21 at 15:51, Wizard wrote:
> > Yes, I imagined that index would have just the hierarchy in it. I would
> > hope that most pages would be cached so that there isn't a large
> > overhead in returning a view.
> I'll buy that, but the only way that I've ever done caching is under
> mod_perl or using sessions. I don't think I really know of a way to do it
> otherwise under CGI, short of writing out the XHTML file, but doesn't that
> defeat the whole purpose?

Not necessarily. If you can pre-bake all the pages then it is simply a case of letting the web server deliver them back, no CGI involved. Then you just change the pages that need changing each time a message is posted. It breaks down a bit if you get very dynamic sites, as you're constantly rewriting HTML pages, but for less-trafficked ones it works quite well.

> > I guess it depends a lot on how many viewing options there are, both
> > statically (different layout options defined in a configuration file
> > somewhere) and dynamically (search results, partial threads etc).
>
> Well, for right now, that's going to be limited, with room for expansion.
> Anything beyond what we have now should be an improvement.
>
> Also, don't think that I don't necessarily approve of your idea, I'm just
> trying to weigh pros and cons of each. This is how I work, and I understand
> it can offend some people, but it's not intentional.

No offence taken :)

S. |
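Editorial sketch of the "pre-baked pages" approach from the message above: on each post, only the new message page and the index page are rewritten as static HTML, and the web server serves them with no CGI involved. This is a Python illustration under stated assumptions; the function names, the page markup, and the `<id>.html` / `index.html` naming are hypothetical. The write-to-temp-then-rename step is a common technique added here, not something the thread specifies.

```python
import html
import os
import tempfile


def write_atomically(path, text):
    """Write to a temp file in the same directory, then rename it over the
    target, so a reader never sees a half-written page."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(text)
    os.chmod(tmp, 0o644)  # mkstemp creates private files; make it readable
    os.replace(tmp, path)


def bake_message_page(out_dir, msg):
    """Render one message as a static page named <id>.html."""
    page = (
        "<html><head><title>{0}</title></head>"
        "<body><h1>{0}</h1><p>{1}</p></body></html>\n"
    ).format(html.escape(msg["subject"]), html.escape(msg["body"]))
    write_atomically(os.path.join(out_dir, "{}.html".format(msg["id"])), page)


def bake_index_page(out_dir, messages):
    """Render the message list as index.html."""
    items = "".join(
        '<li><a href="{}.html">{}</a></li>\n'.format(
            m["id"], html.escape(m["subject"]))
        for m in messages
    )
    write_atomically(
        os.path.join(out_dir, "index.html"),
        "<html><body><ul>\n" + items + "</ul></body></html>\n",
    )


def on_post(out_dir, messages, new_msg):
    """Rewrite only the pages a new post affects."""
    messages.append(new_msg)
    bake_message_page(out_dir, new_msg)
    bake_index_page(out_dir, messages)
```

On a busy board the index page is rewritten on every post, which is the cost Simon points out for very dynamic sites.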
From: Wizard <wi...@ne...> - 2003-01-21 15:56:10
|
> Yes, I imagined that index would have just the hierarchy in it. I would > hope that most pages would be cached so that there isn't a large > overhead in returning a view. I'll buy that, but the only way that I've ever done caching is under mod_perl or using sessions. I don't think I really know of a way to do it otherwise under CGI, short of writing out the XHTML file, but doesn't that defeat the whole purpose? > I guess it depends a lot on how many viewing options there are, both > statically (different layout options defined in a configuration file > somewhere) and dynamically (search results, partial threads etc). Well, for right now, that's going to be limited, with room for expansion. Anything beyond what we have now should be an improvement. Also, don't think that I don't necessarily approve of your idea, I'm just trying to weigh pros and cons of each. This is how I work, and I understand it can offend some people, but it's not intentional. Grant M. |