From: Wizard <wi...@ne...> - 2003-01-21 12:13:07
|
Here's what I have in mind for WWWBoard2, along the lines of TFMail:

1.> WWWBoard2 stores XML for the main page and all messages.
2.> WWWBoard2 transforms this when requested using a header, footer and data template, like this (pseudo-code):

<List-View>
a.> Request for virtual wwwboard.html (which is now wwwview.pl)
b.> The wwwheader.html file is read and printed.
c.> The XML is parsed into the wwwdata.html template and printed.
d.> The wwwfooter.html file is read and printed.

Note: Yes, I realize that a single template would be simpler for the user, but it would require much more complex parsing, which would be much slower (and more server load). Perhaps I could include a utility to pre-parse a full HTML template? I am not using XSLT for the simple reason that it would be much more complex, parsing-wise, and as we're not using CPAN modules, I'm not going to reinvent the wheel. This may happen sometime later when we've converted the masses.

<Message-View>
a.> A particular message is selected (wwwview.pl?message=419)
b.> wwwmheader.html if it exists, else wwwheader.html, is parsed w/title & printed.
c.> The XML message is parsed into the wwwmessage.html template and printed.
d.> wwwmfooter.html if it exists, else wwwfooter.html, is parsed w/thread & printed.

I suspect that the XML for the main page will look like this:

<MessID name="14">
  <subject>NMS scripts are great!</subject>
  <user>The Professor</user>
  <email>fr...@fs...</email>
  <date>20/01/03</date>
  <moderate>0</moderate>
  <MessID name="17">
    <subject>Re: NMS scripts are great!</subject>
    <user>The Wizard</user>
    <email>de...@nu...</email>
    <date>22/01/03</date>
    <moderate>0</moderate>
  </MessID>
</MessID>

Note the nesting. This would require recursive calls to the data templating mechanism, which I have only just realized and need to think about (suggestions welcome). I'm not sure if I should just go ahead and put the message body in here and make the whole mess one file, only displaying the information requested (mainview or messageview).

The 'moderate' tag allows for the moderation of messages by the admin, meaning if set to '1', the messages are not displayed to the user, except during submission. This could be set as the default, meaning that all new messages are moderated until reviewed by the admin (or maybe only certain users/ips/domains?).

This version could include a script to convert an existing WWWBoard to WWWBoard2, although it'd require some extensive testing (due to the message formats, not the wwwboard.html itself).

Let me know what you think,
Grant M.
wi...@ne... |
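The recursion concern raised above can be sketched in a few lines of Perl without any CPAN modules. This is only an illustration under assumptions: the sub names parse_messages and render are made up here, the tokenizer understands this one fixed layout (it is not an XML parser), and a moderated message is assumed to hide its replies as well.

```perl
use strict;
use warnings;

# Sample data in the nested layout proposed above (email fields omitted).
my $xml = <<'END';
<MessID name="14">
  <subject>NMS scripts are great!</subject>
  <user>The Professor</user>
  <date>20/01/03</date>
  <moderate>0</moderate>
  <MessID name="17">
    <subject>Re: NMS scripts are great!</subject>
    <user>The Wizard</user>
    <date>22/01/03</date>
    <moderate>0</moderate>
  </MessID>
</MessID>
END

# Hypothetical helper: turn the fixed layout into a tree of hashes.
sub parse_messages {
    my ($text) = @_;
    my (@roots, @stack);   # @stack holds the currently open <MessID> elements
    my $field;             # name of the simple field we are inside, if any
    while ($text =~ m{<(/?)(\w+)([^>]*)>|([^<]+)}g) {
        my ($close, $tag, $attrs, $chars) = ($1, $2, $3, $4);
        if (defined $chars) {
            # Text only counts when we are inside a field of an open message.
            $stack[-1]{$field} = $chars if defined $field && @stack;
            next;
        }
        if ($tag eq 'MessID') {
            if ($close) { pop @stack; next; }
            my ($id) = $attrs =~ /name="(\d+)"/;
            my $msg = { id => $id, replies => [] };
            if (@stack) { push @{ $stack[-1]{replies} }, $msg }
            else        { push @roots, $msg }
            push @stack, $msg;
        }
        else {
            $field = $close ? undef : $tag;
        }
    }
    return \@roots;
}

# Hypothetical helper: one recursive call per nesting level.
sub render {
    my ($msgs, $depth) = @_;
    my $out = '';
    for my $m (@$msgs) {
        next if $m->{moderate};   # held for review: skip it and its replies
        $out .= ('  ' x $depth) . "$m->{id}: $m->{subject} ($m->{user}, $m->{date})\n";
        $out .= render($m->{replies}, $depth + 1);
    }
    return $out;
}

print render(parse_messages($xml), 0);
```

A real data template would emit HTML (with escaping) instead of indented text, but the shape of the recursion would be the same: print the entry, then recurse into its replies one level deeper.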
From: Simon W. <es...@ou...> - 2003-01-21 13:45:34
|
On Tue, 2003-01-21 at 12:08, Wizard wrote:
> Here's what I have in mind for WWWBoard2, along the lines of TFMail:

XML is a bitch to parse without using modules that we don't want to have people install. It's also slooooow.

I'd suggest a format that has one message per file and an index of related messages, perhaps something like this:

id: 14
respondsto:12
subject:NMS scripts are great!
user:The Professor
email:fr...@fs...
date:20/01/03
moderate:0
body:
Some body text on multiple lines
up to the end of the file

Then the index looks like:

12+: 14 17 18

and contains a list of messages and the responses to it. The + indicates that the message is the head of a thread.

A script would be provided to rebuild the index from the data files.

In use, html pages would be generated for each thread and cached, either on request or on change.

If the data set got very large then it could be extended by adding a threadhead: field to the data format and extending the index to store *all* the responses in that thread which would cut down the amount of data that would need to be read in any one go.

Just my £0.02.

Simon. |
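As a rough illustration of how little parsing the key:value layout above needs, here is a minimal Perl sketch using plain split. The sub names parse_message and parse_index_line are hypothetical, and it assumes the header ends at the first line starting with "body:" and that everything after it is the message text.

```perl
use strict;
use warnings;

# Hypothetical helper: "key:value" header lines up to "body:", then the
# body runs to the end of the file.
sub parse_message {
    my ($text) = @_;
    # Limit of 2 so a later "body:" inside the message text is left alone.
    my ($head, $body) = split /^body:[ \t]*\n?/m, $text, 2;
    my %msg = (body => defined $body ? $body : '');
    for my $line (split /\n/, $head) {
        # Limit of 2 so colons inside the value ("Re: ...") survive.
        my ($key, $value) = split /:\s*/, $line, 2;
        next unless defined $value;
        $msg{$key} = $value;
    }
    return \%msg;
}

# Hypothetical helper: "12+: 14 17 18" becomes id 12, thread head, replies 14 17 18.
sub parse_index_line {
    my ($line) = @_;
    my ($key, $rest) = split /:\s*/, $line, 2;
    my $is_head = ($key =~ s/\+$//) ? 1 : 0;
    return {
        id      => $key,
        head    => $is_head,
        replies => [ split ' ', defined $rest ? $rest : '' ],
    };
}

my $msg = parse_message(<<'END');
id: 14
respondsto:12
subject:NMS scripts are great!
user:The Professor
date:20/01/03
moderate:0
body:
Some body text on multiple lines
up to the end of the file
END

my $idx = parse_index_line('12+: 14 17 18');
print "$msg->{id} responds to $msg->{respondsto}; ",
      "index entry $idx->{id} has replies @{ $idx->{replies} }\n";
```

Rebuilding the index from the data files would then amount to reading the id and respondsto fields of each message file and grouping the ids by parent.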
From: Wizard <wi...@ne...> - 2003-01-21 14:39:26
|
<SNIP>
> Then the index looks like:
>
> 12+: 14 17 18

I'm not sure how this is any different if we have an expected XML hierarchy. It's one thing to have a format with unexpected input (i.e., true XML), but I don't see that parsing a defined XML format is any different than parsing an HTML format, except that the hierarchy is more determinate. The plus is that should we decide to later include full XML parsing, then the format is already there. I think that this would ease the implementation of future enhancements such as database backends, XSLT, and document conversion.

>
> and contains a list of messages and the responses to it. The + indicates
> that the message is the head of a thread.
>
> A script would be provided to rebuild the index from the data files.
>
> In use, html pages would be generated for each thread and cached, either
> on request or on change.
>
> If the data set got very large then it could be extended by adding a
> threadhead: field to the data format and extending the index to store
> *all* the responses in that thread which would cut down the amount of
> data that would need to be read in any one go.

I'm not sure that I'm really grasping all of this (read 'my brain hurts'), but perhaps you could patch together some example files with comments, and I'll try to get a better understanding. I'm thinking 'singly-linked list', but it's just not working out in my head.

Thanks!
Grant M. |
From: Simon W. <es...@ou...> - 2003-01-21 14:50:25
|
On Tue, 2003-01-21 at 14:34, Wizard wrote: > <SNIP> > > Then the index looks like: > > > > 12+: 14 17 18 > > I'm not sure that if we have an expected XML hierarchy, how this is any different. It's one thing to have a format with unexpected input (i.e, true XML), but I don't see that parsing a defined XML format is any different than parsing an HTML format, except that the hierarchy is more determinate. The plus is that should we decide to later include full XML parsing, then the formats is already there. I think that this would ease the implementation of future enhancements such as database backends, XSLT, and document conversion. > Fair point but there is still an overhead in doing the regexes when my format uses simple splits. Plus you need to factor in the recursiveness of the format. Not that it won't work just fine but mine seems simpler to implement and less resource hungry. I agree that looking to the future is good but since it will be a major upgrade to use a different backend or whatever I would suggest that as long as we have the file IO in a module that provides a standard interface, the implementation can be changed at any time. Munging the datastore to fit whatever new strategy you want to use should be straightforward. > > > > and contains a list of messages and the responses to it. The + indicates > > that the message is the head of a thread. > > > > A script would be provided to rebuild the index from the data files. > > > > In use, html pages would be generated for each thread and cached, either > > on request or on change. > > > > If the data set got very large then it could be extended by adding a > > threadhead: field to the data format and extending the index to store > > *all* the responses in that thread which would cut down the amount of > > data that would need to be read in any one go. 
> > I'm not sure that I'm really grasping all of this (read 'my brain hurts'), but perhaps you could patch together some example files with comments, and I'll trying to get a better understanding. I'm thinking 'singly-linked list', but it's just not working out in my head. > Thanks! Did you see one big xml file ? If so, it will need to be locked while you write out a new one with the new post in it. This may have performance issues. I'll try and work up some examples of what I mean as soon as possible but I'm at work right now and I should be doing other things :) S. |
From: Wizard <wi...@ne...> - 2003-01-21 15:39:38
|
> > Fair point but there is still an overhead in doing the regexes when my > format uses simple splits. Plus you need to factor in the recursiveness > of the format. Not that it won't work just fine but mine seems simpler > to implement and less resource hungry. If you're saying that the index file will only contain the hierarchy (by ID?) and no message data, then I think I get what you're saying. That would be much faster for posting, but for viewing the index page we're then talking about open & close on each and every message file to get the subject, user, email and date. I'd guess that much disk access is going to be much slower than the benefit of speeding up edits. If you're talking about including everything but the message text, then I can see the split being a benefit over regexes. I think I would use an alternate delimiter, though (maybe '::' or '||'). > > I agree that looking to the future is good but since it will be a major > upgrade to use a different backend or whatever I would suggest that as > long as we have the file IO in a module that provides a standard > interface, the implementation can be changed at any time. > > Munging the datastore to fit whatever new strategy you want to use > should be straightforward. I honestly could care less about XML, but I thought that being an in-use standard might lead to greater acceptance over a proprietary format. It's 'Give the people what they want', as far as I'm concerned. What is it they want? > Did you see one big xml file ? Yes, or one wwwindex.xml with everything but individual message text, which would be separate files. > If so, it will need to be locked while you write out a new one with the > new post in it. This may have performance issues. Yes, if it includes the message text. If it's just the index with an XML (or whatever) thread similar to what we have now in wwwboard.html, then it should be faster than what we have now due to lack of XHTML formatting/header/form/footer output. 
> > I'll try and work up some examples of what I mean as soon as possible > but I'm at work right now and I should be doing other things :) Ok, when you have a moment. I don't believe this will be resolved today ;-). Grant M. |
From: Simon W. <es...@ou...> - 2003-01-21 15:46:14
|
On Tue, 2003-01-21 at 15:34, Wizard wrote: > > > > > Fair point but there is still an overhead in doing the regexes when my > > format uses simple splits. Plus you need to factor in the recursiveness > > of the format. Not that it won't work just fine but mine seems simpler > > to implement and less resource hungry. > > If you're saying that the index file will only contain the hierarchy (by > ID?) and no message data, then I think I get what you're saying. That would > be much faster for posting, but for viewing the index page we're then > talking about open & close on each and every message file to get the > subject, user, email and date. I'd guess that much disk access is going to > be much slower than the benefit of speeding up edits. If you're talking > about including everything but the message text, then I can see the split > being a benefit over regexes. I think I would use an alternate delimiter, > though (maybe '::' or '||'). Yes, I imagined that index would have just the hierarchy in it. I would hope that most pages would be cached so that there isn't a large overhead in returning a view. Only when updating a page (with a new post) would there be much processing. Then the index is used to identify which data files need to be pulled in to build a new view page. I guess it depends a lot on how many viewing options there are, both statically (different layout options defined in a configuration file somewhere) and dynamically (search results, partial threads etc). Simon. |
From: Wizard <wi...@ne...> - 2003-01-21 15:56:10
|
> Yes, I imagined that index would have just the hierarchy in it. I would > hope that most pages would be cached so that there isn't a large > overhead in returning a view. I'll buy that, but the only way that I've ever done caching is under mod_perl or using sessions. I don't think I really know of a way to do it otherwise under CGI, short of writing out the XHTML file, but doesn't that defeat the whole purpose? > I guess it depends a lot on how many viewing options there are, both > statically (different layout options defined in a configuration file > somewhere) and dynamically (search results, partial threads etc). Well, for right now, that's going to be limited, with room for expansion. Anything beyond what we have now should be an improvement. Also, don't think that I don't necessarily approve of your idea, I'm just trying to weigh pros and cons of each. This is how I work, and I understand it can offend some people, but it's not intentional. Grant M. |
From: Simon W. <es...@ou...> - 2003-01-21 16:04:09
|
On Tue, 2003-01-21 at 15:51, Wizard wrote:
> > Yes, I imagined that index would have just the hierarchy in it. I would
> > hope that most pages would be cached so that there isn't a large
> > overhead in returning a view.
> I'll buy that, but the only way that I've ever done caching is under
> mod_perl or using sessions. I don't think I really know of a way to do it
> otherwise under CGI, short of writing out the XHTML file, but doesn't that
> defeat the whole purpose?

Not necessarily. If you can pre-bake all the pages then it is simply a case of letting the webserver deliver them back, no CGI involved.

Then you just change the pages that need changing each time a message is posted.

It breaks down a bit if you get very dynamic sites, as you're constantly rewriting HTML pages, but for less trafficked ones it works quite well.

> > I guess it depends a lot on how many viewing options there are, both
> > statically (different layout options defined in a configuration file
> > somewhere) and dynamically (search results, partial threads etc).
>
> Well, for right now, that's going to be limited, with room for expansion.
> Anything beyond what we have now should be an improvement.
>
> Also, don't think that I don't necessarily approve of your idea, I'm just
> trying to weigh pros and cons of each. This is how I work, and I understand
> it can offend some people, but it's not intentional.

No offence taken :)

S. |
From: Wizard <wi...@ne...> - 2003-01-21 16:23:22
|
> Not necessarily. If you can pre-bake all the pages then it is simply a
> case of letting the webserver deliver them back, no cgi involved.
>
> Then you just change the pages that need changing each time a message is
> posted.
>
Ok, something like this then? (pseudo-code):

/wwwview.pl?f=index
    -r wwwindex.html ? print "Location: $url_wwwindex" : &bake_new_index

/wwwboard.pl [POST]
    &unlink_index if -r wwwindex.html;
    &post_message; # or vice-versa?

> No offence taken :)
Glad to hear it [damn, I mustn't be trying hard enough ;-)].
Grant M. |
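The pre-bake idea in the pseudo-code above could look roughly like this in runnable Perl. It is a sketch under assumptions: the file name, the URL, the in-memory @messages list and the sub names are all placeholders, HTML escaping is left out, and it chooses one answer to the "or vice-versa?" question (store the post first, then drop the cached page, so the next bake is sure to include it).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir        = tempdir(CLEANUP => 1);      # stand-in for the web root
my $index_file = "$dir/wwwindex.html";       # hypothetical cached page
my $index_url  = '/wwwboard/wwwindex.html';  # hypothetical public URL

my @messages = ('NMS scripts are great!');   # stand-in for the real datastore

# Write the static index page. Writing to a temporary name and renaming
# means a reader never sees a half-written page.
sub bake_index {
    my $tmp = "$index_file.tmp.$$";
    open my $fh, '>', $tmp or die "Cannot write $tmp: $!";
    print {$fh} "<ul>\n";
    print {$fh} "<li>$_</li>\n" for @messages;   # no HTML escaping in this sketch
    print {$fh} "</ul>\n";
    close $fh or die "Cannot close $tmp: $!";
    rename $tmp, $index_file or die "Cannot rename $tmp: $!";
    return;
}

# wwwview.pl?f=index: bake only if the cached page is missing, then redirect.
sub view_index {
    bake_index() unless -r $index_file;
    return "Location: $index_url\n\n";
}

# wwwboard.pl [POST]: store the message first, then invalidate the cache.
sub post_message {
    my ($subject) = @_;
    push @messages, $subject;
    unlink $index_file;   # fine if the file is already gone
    return;
}

print view_index();
post_message('Re: NMS scripts are great!');
print view_index();
```

With this arrangement the web server could also be pointed straight at the static file, with the CGI only invoked when the file is missing.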
From: Nicholas C. <ni...@cc...> - 2003-01-21 16:08:00
|
On Tue, Jan 21, 2003 at 10:34:43AM -0500, Wizard wrote:
> > I agree that looking to the future is good but since it will be a major
> > upgrade to use a different backend or whatever I would suggest that as
> > long as we have the file IO in a module that provides a standard
> > interface, the implementation can be changed at any time.
> >
> > Munging the datastore to fit whatever new strategy you want to use
> > should be straightforward.
>
> I honestly could care less about XML, but I thought that being an in-use
> standard might lead to greater acceptance over a proprietary format. It's
> 'Give the people what they want', as far as I'm concerned. What is it they
> want?

For some reason I have this gut dislike of saying that something is "XML" (which has a well defined spec) but then implementing a parser that can only read files formatted in our way. (for reasons of speed/size/inlining)

The only concrete reason I can see for suggesting that it would be a bad thing to have an XML message file is that we make it look like the site admin can download the message file, edit it in something that reads XML happily to "correct" something, and then upload the edited XML back to the server. And their XML editor has written out XML consistent with the schema that our output XML implied. Only it's not quite the same (is order allowed to change?) and our script barfs. And they hassle the support list about this. And we say "but you're not supposed to do that" and they reply "but you say it stores it as XML. And what I gave it *is* XML"

Whereas if we have a clearly proprietary format abstracted nicely behind a flexible interface, then we can change from our format to XML to a relational database to trained monkeys whenever we feel like it.

Nicholas Clark |
From: Nick C. <ni...@cl...> - 2003-01-21 16:12:35
|
On Tue, Jan 21, 2003 at 04:07:55PM +0000, Nicholas Clark wrote: > > For some reason I have this gut dislike of saying that something is "XML" > (which has a well defined spec) but then implementing a parser that can > only read files formatted in our way. (for reasons of speed/size/inlining) That's simple to fix in the docs - it saves data in "an XML-like format". -- Nick |
From: Nicholas C. <ni...@cc...> - 2003-01-21 16:28:49
|
On Tue, Jan 21, 2003 at 04:07:45PM +0000, Nick Cleaton wrote: > On Tue, Jan 21, 2003 at 04:07:55PM +0000, Nicholas Clark wrote: > > > > For some reason I have this gut dislike of saying that something is "XML" > > (which has a well defined spec) but then implementing a parser that can > > only read files formatted in our way. (for reasons of speed/size/inlining) > > That's simple to fix in the docs - it saves data in "an XML-like format". I'll buy you a beer for everyone who reads the docs before mailing the support list. :-) [No cheating by getting people on the development list to act as stooges] And if it's XML-like, why not have XML-unlike? Unless they've changed it since the original version, http://www.bbc.co.uk/cgi-bin/search/results.pl is getting its google feed in google's "Simple Results Format", rather than XML, because the google format was very easy to parse in a CGI without either making a half baked ultra-fragile regexp "XML" parser. (Political reasons meant that initially that script had to be a CGI. It's now mod_perl) The google format is nice. It's <tag>:<length>:<data> where tag is alphanumeric (and certainly no ':'s), length is the length of data (in bytes), and the data just is. It is very easy to parse in perl. (and C for that matter, although I didn't have to, because you can allocate a buffer using the size information before you have to read the actual data in). And the format actually allows data with embedded newlines, NUL bytes and any other indigestible characters. Nicholas Clark |
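To show why a <tag>:<length>:<data> layout is easy to read, here is a small Perl sketch. It is not Google's actual format handling: the sub names are invented, records are assumed to follow one another with an optional newline between them, and strings are assumed to be byte strings so that length() counts bytes.

```perl
use strict;
use warnings;

# Hypothetical writer: the length prefix means the data needs no escaping.
sub format_record {
    my ($tag, $data) = @_;
    return $tag . ':' . length($data) . ':' . $data;
}

# Hypothetical reader: find two colons, then take exactly <length> bytes.
sub parse_records {
    my ($buf) = @_;
    my @records;
    my $pos = 0;
    while ($pos < length $buf) {
        # Assumption: a single newline may separate records.
        if (substr($buf, $pos, 1) eq "\n") { $pos++; next; }
        my $c1 = index($buf, ':', $pos);
        die "No tag found at offset $pos\n" if $c1 < 0;
        my $c2 = index($buf, ':', $c1 + 1);
        die "No length found at offset $pos\n" if $c2 < 0;
        my $tag = substr($buf, $pos, $c1 - $pos);
        my $len = substr($buf, $c1 + 1, $c2 - $c1 - 1);
        die "Bad record header at offset $pos\n"
            unless $tag =~ /\A\w+\z/ && $len =~ /\A\d+\z/;
        die "Truncated data for '$tag'\n" if $c2 + 1 + $len > length $buf;
        # The data is taken by length, so colons, newlines and NUL bytes are fine.
        push @records, [ $tag, substr($buf, $c2 + 1, $len) ];
        $pos = $c2 + 1 + $len;
    }
    return \@records;
}

my $buf = join "\n",
    format_record(subject => 'Re: NMS scripts are great!'),
    format_record(body    => "line one\nline two");

for my $rec (@{ parse_records($buf) }) {
    printf "%s (%d bytes)\n", $rec->[0], length $rec->[1];
}
```

Because nothing inside the data is ever scanned for delimiters, a subject containing ':' or a body containing newlines round-trips unchanged.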
From: Nick C. <ni...@cl...> - 2003-01-21 16:54:23
|
On Tue, Jan 21, 2003 at 04:28:33PM +0000, Nicholas Clark wrote:
> On Tue, Jan 21, 2003 at 04:07:45PM +0000, Nick Cleaton wrote:
> > On Tue, Jan 21, 2003 at 04:07:55PM +0000, Nicholas Clark wrote:
> > >
> > > For some reason I have this gut dislike of saying that something is "XML"
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > (which has a well defined spec) but then implementing a parser that can
> > > only read files formatted in our way. (for reasons of speed/size/inlining)
> >
> > That's simple to fix in the docs - it saves data in "an XML-like format".
>
> I'll buy you a beer for everyone who reads the docs before mailing the
> support list. :-)

Where were we going to "say that something is XML" if not in the docs ?

You dislike the idea of saying XML, but we can't fix that by saying "an XML-like format" instead because nobody is listening anyway ?

Too zen for me :)

--
Nick |
From: Nicholas C. <ni...@cc...> - 2003-01-21 16:56:05
|
On Tue, Jan 21, 2003 at 04:49:35PM +0000, Nick Cleaton wrote:
> On Tue, Jan 21, 2003 at 04:28:33PM +0000, Nicholas Clark wrote:
> > On Tue, Jan 21, 2003 at 04:07:45PM +0000, Nick Cleaton wrote:
> > > On Tue, Jan 21, 2003 at 04:07:55PM +0000, Nicholas Clark wrote:
> > > >
> > > > For some reason I have this gut dislike of saying that something is "XML"
>                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > (which has a well defined spec) but then implementing a parser that can
> > > > only read files formatted in our way. (for reasons of speed/size/inlining)
> > >
> > > That's simple to fix in the docs - it saves data in "an XML-like format".
> >
> > I'll buy you a beer for everyone who reads the docs before mailing the
> > support list. :-)
>
> Where were we going to "say that something is XML" if not in the
> docs ?
>
> You dislike the idea of saying XML, but we can't fix that by saying
> "an XML-like format" instead because nobody is listening anyway ?
>
> Too zen for me :)

I strongly dislike the idea of creating something that seems to be one thing (XML) but actually isn't.

It strikes me as having a high chance for confusion later on, when someone assumes that it is the thing it looks like (XML)

Nicholas Clark |
From: Wizard <wi...@ne...> - 2003-01-21 17:11:24
|
> I strongly dislike the idea of creating something that seems to be one
> thing (XML) but actually isn't.
>
> It strikes me as having a high chance for confusion later on, when someone
> assumes that it is the thing it looks like (XML)

As I said, I don't care about the 'XML' thing, but isn't the problem you suggested the same as someone editing the current wwwboard.html file with an HTML editor, and munging the comment tags? I do understand your concern though, I just don't know that we can prevent stupidity ;-O.

Besides, we don't have to 'say' that we store any format, we can just do it. What the user assumes is their problem. We could put a line at the beginning that is totally not XML and would crash a parser (maybe even check for that line). Then should we decide to convert to support XML, we just remove the line.

Grant M. |
From: Paul R. <pa...@ro...> - 2003-01-21 16:17:26
|
If I might jump in...

I like the basic ideas going on here, but let me throw out a few suggestions to be shot down.

I'll skip the whole XML-or-not question for now. Not obvious that it buys us much here, though.

Not clear that we really gain much by the one-message-per-file method, either. You quickly end up with huge directories, and so forth. I realize there are pros and cons either way, but consider some of the characteristics of this app vs., say, email storage:

1. Reads are much more frequent than posts
2. We will rarely, if ever, need to delete a message from the middle of the file
3. Unlike email, we can completely track and control content-length information for each message

Appending to the end of a file (even a large one) is obviously no big performance problem. So what about index building, index display, and message display?

Building the index can be quite fast if we store content-length (probably as a line count) as above. We know exactly how many lines to skip (and therefore not parse) at any time. Appending to the index is a one-shot if we store an in-reply-to ID with each message, rather than a list of replies with each parent (see the JWZ article below).

Index display is easy if we extend the previously-mentioned format to include Poster, Date and Subject line. The index file will still be quite small relative to the main message text, which will never be read when we're displaying our message tree / list.

Message display can still be quite efficient if we further extend the index to include the offset (by line, byte, whatever) of the message in the larger file. We seek (or quickly skip lines), we grab exactly content-length lines, and off we go.

On thread-building, by the way, I highly recommend a look at Jamie Zawinski's algorithm and pseudocode for that (for email, which is a more-complicated version of the task we face):

http://www.jwz.org/doc/threading.html

Just my two cents ( 0.0187410 EUR, 0.0124266 GBP, 0.03 CAD).
-paul

----- Original Message -----
From: "Wizard" <wi...@ne...>
To: "Simon Wilcox" <es...@ou...>
Cc: "nms-devel" <nms...@li...>
Sent: Tuesday, January 21, 2003 10:34 AM
Subject: RE: [Nms-cgi-devel] WWWBoard2 - PLEASE RESPOND |
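The append-only store with an offset index described in this message might be sketched as follows. Everything concrete here is an assumption for illustration: the file names, the tab-separated index layout (id, in-reply-to, offset, length, poster, date, subject), byte offsets rather than line counts, and the sub names.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET SEEK_END);
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);
my $store = "$dir/messages.dat";   # hypothetical: all message bodies, appended
my $index = "$dir/messages.idx";   # hypothetical: one small line per message

# Append the body to the store and one line to the index. The index lock
# is held across both writes so concurrent posts cannot interleave.
sub append_message {
    my (%m) = @_;   # id, parent, poster, date, subject, body
    open my $idx, '>>', $index or die "Cannot open $index: $!";
    flock($idx, LOCK_EX) or die "Cannot lock $index: $!";
    open my $data, '>>', $store or die "Cannot open $store: $!";
    binmode $data;
    seek($data, 0, SEEK_END) or die "Cannot seek $store: $!";
    my $offset = tell $data;
    print {$data} $m{body};
    close $data or die "Cannot close $store: $!";
    # Tabs and newlines would break the index layout, so flatten them.
    my @meta = map { my $v = $_; $v =~ s/[\t\r\n]+/ /g; $v } @m{qw(poster date subject)};
    print {$idx} join("\t", $m{id}, $m{parent}, $offset, length($m{body}), @meta), "\n";
    close $idx or die "Cannot close $index: $!";
    return;
}

# Index display: reads only the small index file, never the bodies.
sub read_index {
    open my $idx, '<', $index or return [];
    my @entries;
    while (my $line = <$idx>) {
        chomp $line;
        my %e;
        @e{qw(id parent offset length poster date subject)} = split /\t/, $line, 7;
        push @entries, \%e;
    }
    close $idx;
    return \@entries;
}

# Message display: seek straight to the body and read exactly its length.
sub read_body {
    my ($e) = @_;
    open my $data, '<', $store or die "Cannot open $store: $!";
    binmode $data;
    seek($data, $e->{offset}, SEEK_SET) or die "Cannot seek $store: $!";
    my $got = read($data, my $body, $e->{length});
    close $data;
    die "Short read for message $e->{id}\n" unless defined $got && $got == $e->{length};
    return $body;
}

append_message(id => 14, parent => 0, poster => 'The Professor',
               date => '20/01/03', subject => 'NMS scripts are great!',
               body => "First post\nsecond line\n");
append_message(id => 17, parent => 14, poster => 'The Wizard',
               date => '22/01/03', subject => 'Re: NMS scripts are great!',
               body => "A reply\n");
print "$_->{id} (reply to $_->{parent}): ", read_body($_) for @{ read_index() };
```

Deleting a message is the awkward case for this layout: the sketch has no delete operation, and one plausible approach would be to drop the index line and reclaim the space in the store with a separate compaction pass.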
From: Wizard <wi...@ne...> - 2003-01-21 16:55:16
|
> 2. We will rarely, if ever, need to delete a message from > the middle > of the file Is this correct? If I'm getting what you're saying, we will need to delete messages from the beginning middle and end of a file if they contain more than one message. Especially if these are on quota'd systems. A friend of mine used to run WWWBoard on www.coonhounds.com, and he was always getting over-quota charges where he'd have to go in and delete tons of messages. Also, there's the issue of advertising posts and just plain inappropriate or offensive posts. > Building the index can be quite fast if we store content-length > (probably as > a line count) as above. I'm guessing that what you are proposing is fixed-length records. This could be substantially faster on most systems, but is it easily portable? > We know exactly how many lines to skip (and > therefore not parse) at any time. Appending to the index is a one-shot if > we store an in-reply-to ID with each message, rather than a list > of replies > with each parent (see the JWZ article below). This kinda brings up another concern/thought. If we use either this format or Simon's, don't I lose the benefit of a data-defined hierarchy? I understand that they both try to solve this using indexing, but I still have to deal with multi-level indents/nesting. In other words, with either of these formats, don't I have to presort all of the data, assigning index levels to each entry based upon the index level of the parent. With my format (XML or not), the hierarchy is predetermined by the nesting of each element, which means that they are already presorted. This means I only have to increment/decrement the indent level based upon the data tags encountered. I may be wrong, but it just sounds like a much simpler implementation to me (and where I'm doing it, that sounds good :-). KISS principle? > Index display is easy if we extend the previously-mentioned format to > include Poster, Date and Subject line. 
The index file will still be quite > small relative to the main message text, which will never be read > when we're > displaying our message tree / list. > > Message display can still be quite efficient if we further extend > the index > to include the offset (by line, byte, whatever) of the message in > the larger > file. We seek (or quickly skip lines), we grab exactly content-length > lines, and off we go. I do like the idea of this, but I would like to know it will work everywhere. > > On thread-building, by the way, I highly recommend a look at Jamie > Zawinski's algorithm and pseudocode for that (for email, which is a > more-complicated version of the task we face): > > http://www.jwz.org/doc/threading.html > > Just my two cents ( 0.0187410 EUR, 0.0124266 GBP, 0.03 CAD). I'll take a look. Thanks, Grant M. |
From: Simon W. <es...@ou...> - 2003-01-21 17:13:20
|
On Tue, 2003-01-21 at 16:50, Wizard wrote:
> This kinda brings up another concern/thought. If we use either this format
> or Simon's, don't I lose the benefit of a data-defined hierarchy? I
> understand that they both try to solve this using indexing, but I still have
> to deal with multi-level indents/nesting. In other words, with either of
> these formats, don't I have to presort all of the data, assigning index
> levels to each entry based upon the index level of the parent. With my
> format (XML or not), the hierarchy is predetermined by the nesting of each
> element, which means that they are already presorted. This means I only have
> to increment/decrement the indent level based upon the data tags
> encountered. I may be wrong, but it just sounds like a much simpler
> implementation to me (and where I'm doing it, that sounds good :-). KISS
> principle?

I see what you're saying here. Of course you get into parsing complications when you have to try and parse the headers of the messages that are children of msg 2 out of this *without* using any XML:: modules.

<msg id="1">
  <head>text</head>
  <msg id="2">
    <head>text</head>
    <msg id="3">
      <head>text</head>
    </msg>
    <msg id="4">
      <head>text</head>
    </msg>
    <msg id="5">
      <head>text</head>
      <msg id="6">
        <head>text</head>
      </msg>
    </msg>
  </msg>
</msg>

I don't see that it would be that hard to make a data structure from an index file that would convey the indenting information to the template.

But I actually don't mind too much, if you're going to lead the programming effort you can choose whatever you think is best :)

I do wonder if perhaps we haven't allowed ourselves to get bogged down in the detail when we should be looking at the features that should be in any new version.

So far we have these:

Separation of data from presentation
Posting alert email to admin
Optional threading
Thread-to-a-page
Multiple threads (all ?) to a page

Any others ?

S. |
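One way to get indent levels out of a flat index, as suggested above, is to group messages by parent and walk the result depth-first. The sketch below assumes each index entry boils down to an [id, parent] pair with parent 0 marking a thread head; the sub name thread_order is made up, and it assumes the data contains no reply loops.

```perl
use strict;
use warnings;

# Hypothetical helper: takes [id, parent_id] pairs in posting order and
# returns [id, depth] pairs in display order.
sub thread_order {
    my (@pairs) = @_;
    my %children;   # parent id => list of reply ids, in posting order
    push @{ $children{ $_->[1] } }, $_->[0] for @pairs;

    my @out;
    my $walk;
    $walk = sub {
        my ($parent, $depth) = @_;
        for my $id (@{ $children{$parent} || [] }) {
            push @out, [ $id, $depth ];
            $walk->($id, $depth + 1);   # replies sit one level deeper
        }
    };
    $walk->(0, 0);
    undef $walk;    # break the closure's reference to itself
    return @out;
}

# The same shape as the six-message example above.
my @index = ([1, 0], [2, 1], [3, 2], [4, 2], [5, 2], [6, 5]);

for my $entry (thread_order(@index)) {
    my ($id, $depth) = @$entry;
    print '  ' x $depth, "msg $id\n";
}
```

The template then only needs the depth number for each row. Note that a message whose parent has been deleted would never be reached by this walk, so a real version would have to decide whether to promote such orphans to thread heads.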
From: Wizard <wi...@ne...> - 2003-01-21 17:52:39
|
> I see what you're saying here. Of course you get into parsing
> complications when you have to try and parse the headers of the messages
> that are children of msg 2 out of this *without* using any XML::
> modules.
I'm not sure what you mean here (I may be in a Tuna-coma).

> I don't see that it would be that hard to make a data structure from an
> index file that would convey the indenting information to the template.
That's fine, just get me an example when you get a moment, and I'll happily take a look.

NOTE: I DON'T WANT ANYONE TO THINK I'M FIXATED ON XML. I'm not, it's just that I seem to be visualizing the data in a format that is best described by XML. It appears to fit the schema naturally, and I don't have a decent grasp of the other formats yet. That's the reason I ask for format examples, so that I can visualize parsing them.

>
> But I actually don't mind too much, if you're going to lead the
> programming effort you can choose whatever you think is best :)
Lead?!?!? who said that?!?!?! I thought I was just following! Quick, all cowards to the rear!!!! AAAAAAAARRRRGGGGHHHHH!

> So far we have these:
>
> Separation of data from presentation
Yes. This I think is the most important.
> Posting alert email to admin
I like this, but it's currently relegated to v2.1 (let me know if I should push this)
> Optional threading
I don't know if this is necessary for this iteration, but it should be pretty easy.
> Thread-to-a-page
Yup.
> Multiple threads (all ?) to a page
Er, I don't know about this. Perhaps a "show found thread" checkbox in a search function. v2.1?
> Any others ?
Modularization.
Externalization of configuration.
Moderation.
IP/email/domain allow/deny
Search on keyword/user/IP (and date?)
User Accounts for posting

That's it.
Grant M. |
From: Wizard <wi...@ne...> - 2003-01-21 18:08:26
|
A couple more things (in reply to my own email)

> Modularization.
> Externalization of configuration.

I have a module that I wrote some time ago for doing external configurations of Perl scripts. I know it works on numerous platforms, it's fast, and I am familiar with it. Should/can I use it? Here's the man page:
http://www.neonedge.com/perl_tools/Config/
Let me know.

There was something else, but now I can't remember. Definitely a Tuna-coma.
Grant M. |
From: Nick C. <ni...@cl...> - 2003-01-22 11:48:28
|
On Tue, Jan 21, 2003 at 01:03:36PM -0500, Wizard wrote:
> A couple more things (in reply to my own email)
> > Modularization.
> > Externalization of configuration.
> I have a module that I wrote some time ago for doing external configurations
> of Perl scripts. I know it works on numerous platforms, it's fast, and I am
> familiar with it. Should/can I use it? Here's the man page:
> http://www.neonedge.com/perl_tools/Config/
> Let me know.

That's a nice config file handler, but (for NMS) I don't like the developer directives as special comments. Some NMS users will delete those to save space, even if we tell them not to.

We should have a module that addresses config handling, but I don't think it should be tied to a particular file format. If I were doing this, I would define a class CGI::NMS::Config, with methods for fetching config values, and have a subclass for each different type of source of configuration data.

The config file handling in TFmail manages with a single method for getting config values:

  $config->get( KEY [,DEFAULT_VALUE] )

so CGI::NMS::Config could just document (but not implement) that.

Bits of Config::DynaConf would become CGI::NMS::Config::IniFile (or something) and would implement new() and get().

A CGI that needs a config would be passed a CGI::NMS::Config object as an argument to its constructor, so the CGI script might look something like:

  my $script = CGI::NMS::Script::NMSFoo->new(
    config => CGI::NMS::Config::IniFile->new('/path/to/config/file.ini'),
    ...
  );
  $script->request;

The beauty of that type of approach is that the script modules don't know or care where the config comes from. Other file formats can be added just by writing other CGI::NMS::Config::* modules. If someone wants the configuration to come from a database, all they need to do is write CGI::NMS::Config::DBI and plumb it in, no other modules need to change.

We could even have CGI::NMS::Config::Hash, to allow a particular installation of a CGI to have a hard coded config independent of any external thing:

  my $script = CGI::NMS::Script::NMSFoo->new(
    config => CGI::NMS::Config::Hash->new({
      secure  => 1,
      foomode => 'yes',
    }),
    ...
  );
  $script->request;

... and so on.

One thing that needs to be sorted out for any configuration source is multiline values, those will be needed for templates in the config file.

Another is how to get metacharacters like " and \ in the config value. I don't know off the top of my head how ini files handle that type of stuff.

--
Nick |
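[Editor's sketch] One way the proposal above might fit together, assuming a base class that only fixes the `get( KEY [,DEFAULT_VALUE] )` interface and a hash-backed subclass. The class names come from the message; every method body, the stand-in `NMSFoo` script class, and the `charset` key are hypothetical, since none of these modules existed when this was written.

```perl
use strict;
use warnings;

# Base class: fixes the interface and leaves get() to subclasses.
package CGI::NMS::Config;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# get( KEY [, DEFAULT_VALUE] ) must be supplied by each subclass.
sub get {
    my ($self) = @_;
    die ref($self) . " does not implement get()\n";
}

# Hard-coded configuration held in a plain hash.
package CGI::NMS::Config::Hash;
our @ISA = ('CGI::NMS::Config');

sub new {
    my ( $class, $values ) = @_;
    return bless { values => { %{ $values || {} } } }, $class;
}

sub get {
    my ( $self, $key, $default ) = @_;
    return exists $self->{values}{$key} ? $self->{values}{$key} : $default;
}

# Stand-in script class (hypothetical): it only ever calls get().
package CGI::NMS::Script::NMSFoo;

sub new {
    my ( $class, %args ) = @_;
    return bless { config => $args{config} }, $class;
}

sub request {
    my ($self) = @_;
    my $cfg = $self->{config};
    return sprintf 'secure=%s foomode=%s charset=%s',
        $cfg->get('secure'), $cfg->get('foomode'),
        $cfg->get( 'charset', 'iso-8859-1' );    # falls back to the default
}

package main;

my $script = CGI::NMS::Script::NMSFoo->new(
    config => CGI::NMS::Config::Hash->new( { secure => 1, foomode => 'yes' } ),
);
my $summary = $script->request;
print "$summary\n";
```

Swapping in a file-backed or database-backed source would then mean writing another subclass with the same `get()`; the script class would not change.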
From: Wizard <wi...@ne...> - 2003-01-22 13:38:53
|
> That's a nice config file handler, but (for NMS) I don't like the
> developer directives as special comments. Some NMS users will delete
> those to save space, even if we tell them not to.

Those are special for use with CFGs shared between Windows/Java and Perl. They don't need to be in there for NMS. It still sucks up any name=value pair as long as it doesn't look like a comment.

> We should have a module that addresses config handling, but I don't
> think it should be tied to a particular file format. If I were doing
> this, I would define a class CGI::NMS::Config, with methods for
> fetching config values, and have a subclass for each different type
> of source of configuration data.

This isn't really tied to any particular format. It will work just fine with any "name=value" or "name=<<HERE" or even "name = value" -type file. It's just that it was written to be compatible with .ini and .properties files.
As far as the "different type of source..." what do you mean? Where else would the config params come from, if not a file?

> Bits of Config::DynaConf would become CGI::NMS::Config::IniFile (or
> something) and would implement new() and get().

Ok, I think I get it. If I understand correctly, you want CGI::NMS::Config to be a generic wrapper for whatever mechanism is called, so DynaConf (or likely some variation) would become one of the implemented access mechanisms for CGI::NMS::Config. Like AnyDBM, which was discussed yesterday. Is that correct?

> One thing that needs to be sorted out for any configuration source
> is multiline values, those will be needed for templates in the config
> file.

I prefer to see separate template files for templates, but that's just me. DynaConf does support <<HERE documents. It originally worked by sucking up everything between the 'name=' and either the next 'name=' or the next comment, but I decided that was a bit dangerous.

> Another is how to get metacharacters like " and \ in the config value. I
> don't know off the top of my head how ini files handle that type of
> stuff.

I don't think there's a real issue there until you do something to it. For instance, with DynaConf you can actually have declarations like this (note - I had to add a LIMIT '2' to the module to split only on the first '=' for non <<HERE directives. It should have been there, but well, you know):

  Template=<link rel="stylesheet" type="text/css" href="css/nms.css" />

which you can then do:

  print $cfg->get( 'Template' );

and it will work fine. In fact, I've even done:

  _OBJ03=<<EOO
  my $old_value = shift @_;
  my $new_value = shift @_;
  $new_value ? $new_value : $old_value;
  EOO

  $dispatch{ 'func1' } = $cfg->get( '_OBJ03' );
  $value = &{$dispatch{ 'func1' }}( $old, $new );

The only issue is embedded newlines. They can only be done within <<HERE documents, because DynaConf doesn't let you include a line like this:

  Name = ' Test\n this\n string\n'

but it will keep them if done like this:

  Name = <<HERE
  Test
  this
  string
  HERE

I can do the wrapper which should be relatively straight-forward, but are my answers to your comments satisfactory? Should I do the mods I suggest and include it?
Let me know,
Grant M. |
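[Editor's sketch] To make the LIMIT-of-2 split and the here-document handling described above concrete, here is a small stand-alone parser. It is not DynaConf's actual code: the function name `parse_config`, the choice of `#` and `;` as comment markers, and the exact here-document rules are assumptions made for the example.

```perl
use strict;
use warnings;

# Parse "name=value", "name = value" and "name=<<TAG ... TAG" entries.
# Illustrative only; this is not DynaConf's implementation.
sub parse_config {
    my ($text) = @_;
    my %cfg;
    my @lines = split /\n/, $text;
    while ( defined( my $line = shift @lines ) ) {
        next if $line =~ /^\s*(?:[#;]|$)/;    # skip comments and blank lines

        # LIMIT of 2: split only on the first '=', so values may contain '='.
        my ( $name, $value ) = split /=/, $line, 2;
        next unless defined $value;
        s/^\s+//, s/\s+$// for $name, $value;    # trim both ends

        if ( $value =~ /^<<(\w+)$/ ) {
            my $tag = $1;
            $value = '';
            while ( defined( my $body = shift @lines ) ) {
                last if $body eq $tag;
                $value .= "$body\n";    # here-document lines keep their newlines
            }
        }
        $cfg{$name} = $value;
    }
    return \%cfg;
}

my $cfg = parse_config(<<'END_CFG');
# sample file
Template=<link rel="stylesheet" type="text/css" href="css/nms.css" />
Name = <<HERE
Test
this
string
HERE
END_CFG

print $cfg->{Template}, "\n";
print $cfg->{Name};
```

Without the limit, the `Template` line would be cut at the `=` inside `rel="stylesheet"`. The here-document value always ends in a newline, which an application can remove with `chomp` if it is unwanted.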
From: Nick C. <ni...@cl...> - 2003-01-22 15:31:15
|
On Wed, Jan 22, 2003 at 08:34:02AM -0500, Wizard wrote:
> > That's a nice config file handler, but (for NMS) I don't like the
> > developer directives as special comments. Some NMS users will delete
> > those to save space, even if we tell them not to.
>
> Those are special for use with CFGs shared between Windows/Java and Perl.
> They don't need to be in there for NMS. It still sucks up any name=value
> pair as long as it doesn't look like a comment.
>
> > We should have a module that addresses config handling, but I don't
> > think it should be tied to a particular file format. If I were doing
> > this, I would define a class CGI::NMS::Config, with methods for
> > fetching config values, and have a subclass for each different type
> > of source of configuration data.
>
> This isn't really tied to any particular format. It will work just fine with
> any "name=value" or "name=<<HERE" or even "name = value" -type file.
> It's just that it was written to be compatible with .ini and .properties
> files.

Ah, I hadn't got that it wasn't just an INI file reader. In that case IniFile probably isn't a good name for it. Maybe we're back to calling it CGI::NMS::Config::DynaConf.

> As far as the "different type of source..." what do you mean? Where
> else would the config params come from, if not a file?

A config file in a different format, a DBM file, a remote database, hard coded into the CGI, fetched via LDAP or SOAP, made up at random, grabbed out of Win32::Registry, etc, etc. Anything that anyone might ever want to get config from.

> > Bits of Config::DynaConf would become CGI::NMS::Config::IniFile (or
> > something) and would implement new() and get().
>
> Ok, I think I get it. If I understand correctly, you want CGI::NMS::Config
> to be a generic wrapper for whatever mechanism is called, so DynaConf (or
> likely some variation) would become one of the implemented access mechanisms
> for CGI::NMS::Config. Like AnyDBM, which was discussed yesterday. Is that
> correct?

Yes, exactly.

> > One thing that needs to be sorted out for any configuration source
> > is multiline values, those will be needed for templates in the config
> > file.
>
> I prefer to see separate template files for templates, but that's just me.
> DynaConf does support <<HERE documents. It originally worked by sucking up
> everything between the 'name=' and either the next 'name=' or the next
> comment, but I decided that was a bit dangerous.

Separate files for templates are nice, but TFmail users requested the ability to inline the templates into the main config file, so I added it. So long as DynaConf does here document like things, that's OK IMO. And it's a lot better than the way TFmail does it now, with % at the start of each line.

> > Another is how to get metacharacters like " and \ in the config value. I
> > don't know off the top of my head how ini files handle that type of
> > stuff.
>
> I don't think there's a real issue there until you do something to it. For
> instance, with DynaConf you can actually have declarations like this (note -
> I had to add a LIMIT '2' to the module to split only on the first '=' for
> non <<HERE directives. It should have been there, but well, you know):
>
> Template=<link rel="stylesheet" type="text/css" href="css/nms.css" />
>
> which you can then do:
>
> print $cfg->get( 'Template' );
>
> and it will work fine. In fact, I've even done:
>
> _OBJ03=<<EOO
> my $old_value = shift @_;
> my $new_value = shift @_;
> $new_value ? $new_value : $old_value;
> EOO
>
> $dispatch{ 'func1' } = $cfg->get( '_OBJ03' );
> $value = &{$dispatch{ 'func1' }}( $old, $new );

I'd be inclined to switch that off - it will be end users editing config files, if we let them put code in then they'll shoot themselves in the foot.

> The only issue is embedded newlines. They can only be done within <<HERE
> documents, because DynaConf doesn't let you include a line like this:
> Name = ' Test\n this\n string\n'
>
> but it will keep them if done like this:
> Name = <<HERE
> Test
> this
> string
> HERE

The only issue I can see there is that it's impossible to have a value that includes newlines but doesn't end with a newline. I don't see that as a big problem at all.

> I can do the wrapper which should be relatively straight-forward, but are my
> answers to your comments satisfactory? Should I do the mods I suggest and
> include it?

Yes, I'm happy for this to go into CVS as /v2/lib/CGI/NMS/Config.pm and /v2/lib/CGI/NMS/Config/DynaConf.pm or similar.

--
Nick |
From: Wizard <wi...@ne...> - 2003-01-22 16:47:52
|
> > and it will work fine. In fact, I've even done:
> >
> > _OBJ03=<<EOO
> > my $old_value = shift @_;
> > my $new_value = shift @_;
> > $new_value ? $new_value : $old_value;
> > EOO
> >
> > $dispatch{ 'func1' } = $cfg->get( '_OBJ03' );
> > $value = &{$dispatch{ 'func1' }}( $old, $new );
>
> I'd be inclined to switch that off - it will be end users editing config
> files, if we let them put code in then they'll shoot themselves in the
> foot.

The dispatch mechanism is part of the application, not the module. The code is just variable data as far as the module is concerned.

> The only issue I can see there is that it's impossible to have a value
> that includes newlines but doesn't end with a newline. I don't see that
> as a big problem at all.

chomp $value? ;-)

> Yes, I'm happy for this to go into CVS as /v2/lib/CGI/NMS/Config.pm and
> /v2/lib/CGI/NMS/Config/DynaConf.pm or similar.

Ok, I'll put together the wrapper and strip down the module. I was thinking CGI::NMS::Config::CfgFile for the module?
I know that you don't want to specify the mechanism implicitly, but Registry entries look incredibly like directory entries. I.e., "LMachine/Software/NMS/Config/", so not specifying could be a problem. Any suggestions?
(I'm not even going to mention the fact that Win32::TieRegistry is a tied hash ;-)
Let me know,
Grant M. |
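[Editor's sketch] The point that the code is "just variable data" to the module can be illustrated as follows, assuming `get()` hands the here-document back as plain text (an assumption; how DynaConf actually returns `_OBJ` entries is not stated in this thread). Under that assumption the application itself has to compile the text, here with a string `eval`, and that is the step where end-user edits become executable.

```perl
use strict;
use warnings;

# Stand-in for $cfg->get('_OBJ03'): the config module returns plain text.
my $code_text = <<'EOO';
my $old_value = shift @_;
my $new_value = shift @_;
$new_value ? $new_value : $old_value;
EOO

# The application, not the config module, chooses to compile the text.
# String eval runs whatever the config file contains, so an application
# that wants to stay safe can simply leave this step out.
my %dispatch;
$dispatch{func1} = eval "sub { $code_text }"
    or die "config code failed to compile: $@";

my $value = $dispatch{func1}->( 'old', 'new' );
print "$value\n";
```

An application that never calls `eval` on config values treats the same entry as an inert string, which keeps code out of the reach of end users editing the file.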
From: Nick C. <ni...@cl...> - 2003-01-22 17:41:24
|
On Wed, Jan 22, 2003 at 11:43:00AM -0500, Wizard wrote:
> > The only issue I can see there is that it's impossible to have a value
> > that includes newlines but doesn't end with a newline. I don't see that
> > as a big problem at all.
>
> chomp $value? ;-)

:)

Yes, if the app knows that there's a newline on the end that the user doesn't want. Multiline strings without newlines at the end are quite silly anyway. I'm not worried.

> > Yes, I'm happy for this to go into CVS as /v2/lib/CGI/NMS/Config.pm and
> > /v2/lib/CGI/NMS/Config/DynaConf.pm or similar.
>
> Ok, I'll put together the wrapper and strip down the module. I was thinking
> CGI::NMS::Config::CfgFile for the module?

Could do, but you've kinda got 'config' in the name twice there.

CGI::NMS::Config::DynaFile ?
CGI::NMS::Config::NMSFile ?

Either of those would make me happy, because the name says that it's a specific type of config file, with a syntax defined by us. NMS in the second one twice, I know, but it means different things (one of the NMS modules vs. the NMS config file format) so I think it's OK.

I suppose CGI::NMS::Config::CfgFile would do if you're really attached to it though. Not a huge issue.

> I know that you don't want to specify the mechanism implicitly, but Registry
> entries look incredibly like directory entries. I.e.,
> "LMachine/Software/NMS/Config/", so not specifying could be a problem. Any
> suggestions?

Sorry, you've lost me there. What problem ?

> (I'm not even going to mention the fact that Win32::TieRegistry is a tied hash ;-)

good ;)

--
Nick |