From: Jeffrey P S. <je...@cu...> - 2002-04-15 21:03:43
On 4/14/02 2:22 PM, "Mike Orr" <ir...@ms...> wrote:

> I would like to see a CMS that handles standard HTML documents or
> at least standard HTML document bodies. I could use this in a
> couple projects, although none that's urgent to start right away.
> The thing is, when ppl already know HTML, or you're dealing with
> outside authors that have already written an HTML article, it's
> easier to use that as your source rather than insisting on some
> "pure" XML or wiki format. Yes, it causes complications for
> converting the article body to another format later... *IF* that
> ever becomes an issue for the particular application. Maybe it
> won't.

A pattern I've been using for text handling has migrated through a few
refactorings to become a bag of Handlers. The main interfaces are::

    from Interface import Base, Attribute

    class IHandlerResult(Base):
        """
        IHandler objects return IHandlerResult objects, which are simple
        records that any text managing utility can use.
        """
        source = Attribute("The web-editable source code.")
        cooked = Attribute("Rendered (cooked) code, used for display")
        fullsource = Attribute(("The full source of the text, to be presented "
                                "to non-web clients"))
        headers = Attribute("A mapping object of headers/values")

    class IHandler(Base):
        """\
        IHandler objects process text and return an IHandlerResult object.
        """
        def handle(text):
            """\
            Processes the incoming text and returns an IHandlerResult
            object.
            """

I have a couple of existing singletons for common uses -- editing text
directly in a text area with little or no HTML knowledge, and uploading
HTML exported from monsters like WordPerfect and other tools. They parse
out the meta-tags and title tag (and put them in the headers result);
they parse out the contents of the <body> tags.
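As a rough illustration of the header and body extraction described above, here is a minimal sketch in present-day Python using only the standard library. It is not the code from this post: `HandlerResult`, `UploadedHTMLHandler` and the parser class are hypothetical names, and a plain dataclass stands in for the `Interface`-based declarations.

```python
import html
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class HandlerResult:
    """Plain record with the four attributes IHandlerResult describes."""
    source: str = ""       # <textarea>-friendly part (inside <body>)
    cooked: str = ""       # rendered output used for display
    fullsource: str = ""   # the untouched upload
    headers: dict = field(default_factory=dict)


class _HeadersAndBody(HTMLParser):
    """Collects <title>, <meta name=... content=...> and the <body> markup."""

    def __init__(self):
        super().__init__()
        self.headers = {}
        self.body = []
        self._in_title = False
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif self._in_body:
            # Keep body markup exactly as the author wrote it.
            self.body.append(self.get_starttag_text())
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            attrs = dict(attrs)
            if attrs.get("name") and attrs.get("content") is not None:
                self.headers[attrs["name"]] = attrs["content"]

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags (<br/>, <meta ... />) have no separate end tag.
        if self._in_body:
            self.body.append(self.get_starttag_text())
        elif tag == "meta":
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif self._in_body:
            self.body.append(f"</{tag}>")
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_body:
            # The parser unescapes character references; escape them again.
            self.body.append(html.escape(data, quote=False))
        elif self._in_title:
            self.headers["title"] = self.headers.get("title", "") + data


class UploadedHTMLHandler:
    """Hypothetical handler: headers from <head>, source from <body>."""

    def handle(self, text):
        parser = _HeadersAndBody()
        parser.feed(text)
        parser.close()
        body = "".join(parser.body).strip()
        # No extra rendering step in this sketch, so cooked == source.
        return HandlerResult(source=body, cooked=body,
                             fullsource=text, headers=parser.headers)
```

A client would call `UploadedHTMLHandler().handle(uploaded_text)` and read whichever of the four attributes it needs.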
Then the "SafeHTMLHandler" uses an SGML parser coupled with a table of
valid tags to rewrite the incoming code to something deemed safe -
basically it gets rid of a lot of the extra crap that Word, WordPerfect,
Composer, etc., put in. The original full source is kept (at the
discretion of the handler's client) for FTP/DAV editing without too much
surprise on the author's part, while a <textarea>-friendly version of
the source (the parts between the body tags) is also kept for web-based
editing. And there's the cooked content - basically this is where the
handler transforms the input into HTML to be used within the standard
look of the site. This is computed only at upload time.

I also have a simple registry where handlers can be stored and
retrieved, so that a document may change its internal structure at any
time at the user's discretion, and new handlers may be written without
the Document class having to know about them - it just queries the
registry to find the right handler and to list the available ones, and
then uses what it wants out of the results (which are always uniform).

It's proven to be a nicely usable design so far. I'm now working on the
next level - compound documents.

-- 
Jeffrey P Shell
www.cuemedia.com
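Editorial sketch of the tag-table cleanup and the handler registry described in the message above, in present-day Python. Everything named here is an assumption made for illustration: the post does not list its valid tags or show the registry API, and `html.parser` stands in for the SGML parser it mentions. The whitelist only strips markup; it does not vet URL schemes, so it should not be read as a complete security filter.

```python
import html
from html.parser import HTMLParser
from types import SimpleNamespace

# Hypothetical table of valid tags and attributes (not from the post).
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "a", "br"}
ALLOWED_ATTRS = {"a": {"href"}}


class _Whitelist(HTMLParser):
    """Re-emits whitelisted tags and attributes; other tags are dropped, text kept."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return
        keep = ALLOWED_ATTRS.get(tag, set())
        rendered = ""
        for name, value in attrs:
            if name in keep:
                rendered += ' %s="%s"' % (name, html.escape(value or "", quote=True))
        self.out.append("<%s%s>" % (tag, rendered))

    def handle_endtag(self, tag):
        # <br> is a void element, so it never gets a closing tag.
        if tag in ALLOWED_TAGS and tag != "br":
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(html.escape(data, quote=False))


class SafeHTMLHandler:
    """Hypothetical handler returning a record shaped like IHandlerResult."""

    def handle(self, text):
        parser = _Whitelist()
        parser.feed(text)
        parser.close()
        return SimpleNamespace(source=text, cooked="".join(parser.out),
                               fullsource=text, headers={})


class HandlerRegistry:
    """Name-to-handler mapping, so documents never import handlers directly."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        self._handlers[name] = handler

    def get(self, name):
        return self._handlers[name]

    def names(self):
        return sorted(self._handlers)
```

A document class would hold only a handler name, call `registry.get(name).handle(text)` at upload time, and store the parts of the result it cares about; adding a new format then means one more `register` call.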