From: Darcy O'N. <ds...@sk...> - 2004-03-24 23:28:23
OK, so things are moving along great except for a couple of things.

1. The use of the <para> tag, especially when nested in <itemizedlist> - <listitem>, is causing some problems. The reason for this is that in Ventura you specify the relationship of the tags <itemizedlist><listitem><para>, and then Ventura knows what 'Style' to apply to that section. The problem comes into play when Ventura doesn't recognize the 'tree structure', so it just applies the standard <para> tag - and in this case it should have bullets.

In short, to make this convenient we need to figure out a way for Ventura to recognize the nested tags without me building every possible relationship. This can even cause havoc in the section tags <sect1>, <sect2> etc. Luckily they only go to <sect5>, so it's limited. However, other tags can be nested infinitely.

Basically: should we use <simpara> for the simple paragraphs, or possibly use an 'element' tag to identify nested tags? Are we looking to be 100% compatible with DocBook? I'm thinking we should freeze the tags we use now, and then as people have a need for more advanced stuff we add them as we go. It makes it much easier for me if someone tells me they're putting in <tagx> and they want it formatted in such a way, as opposed to someone using a whole bunch of uncommon tags that break the document. Eventually, the whole DocBook standard could be supported, but that's a big project in itself! Maybe I should call Corel :)

Darcy O'Neil
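For illustration, a minimal sketch of the nesting in question - standard DocBook structure, with made-up content:

  <itemizedlist>
    <listitem>
      <para>First item - gets a bullet</para>
    </listitem>
    <listitem>
      <para>Second item - gets a bullet</para>
      <para>Follow-up paragraph in the same item - no bullet</para>
    </listitem>
  </itemizedlist>

The open question is which 'Style' the importer should pick for a <para> depending on where it sits in this tree.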
From: Paul V. <pa...@vi...> - 2004-03-25 15:21:53
Hi Darcy,

> 1. The use of the <para> tag, especially when nested in
> <itemizedlist> - <listitem>, is causing some problems. The reason
> for this is that in Ventura you specify the relationship of the
> tags <itemizedlist><listitem><para>, and then Ventura knows what
> 'Style' to apply to that section. The problem comes into play when
> Ventura doesn't recognize the 'tree structure', so it just applies
> the standard <para> tag - and in this case it should have bullets.

But a standard <para> shouldn't have a bullet, should it? Is the problem maybe that Ventura _does_ remember "we're inside a bulleted list now" and then gives each paragraph a bullet, just like Word does? DocBook is like HTML in this respect: a listitem gets one bullet, right at the start, no matter what happens inside.

What happens if you import an HTML file containing a structure like this:

  <ul>
    <li>
      <p>First paragraph inside listitem</p>
      <p>Second paragraph inside listitem</p>
      <p>Third paragraph inside listitem</p>
    </li>
    <li>
      ...
    </li>
  </ul>

Do all the paragraphs get bullets too?

> In short, to make this convenient we need to figure out a way for
> Ventura to recognize the nested tags without me building every
> possible relationship. This can even cause havoc in the section
> tags <sect1>, <sect2> etc. Luckily they only go to <sect5>, so
> it's limited. However, other tags can be nested infinitely.
>
> Basically: should we use <simpara> for the simple paragraphs,

It wouldn't solve the problem, because there would still be a lot of paras-in-listitems left that contain blockquotes, programlistings etc. You can't use a <simpara> there. Also, many DocBook-aware XML editors start a <para> when you hit Enter. The docwriter would have to manually change that to <simpara> all the time (and then back to <para> as soon as he includes a block element in the paragraph).

> or possibly use an 'element' tag to identify nested tags?

What do you mean by that? In any case we can't invent new tags, because then the document isn't DocBook anymore. Editors will deem it invalid and refuse to treat it as DocBook (you could still edit it as general XML, but then you lose all the DocBook validation). Processors will break on it. So that's simply not an option.

Another consideration is that there's nothing wrong with the DocBook format as such. The problem you're facing now has to do with importing it into Ventura (which does support XML, but unfortunately not DocBook). So if the problem can be solved, it has to be solved there, not by bending or breaking the DocBook standard.

> Are we looking to be 100% compatible with DocBook?

It's not really a question of being "compatible" - our docs *are* DocBook docs. And the processing tools we use are based on that. It just so happens they do a poor job when it comes to PDF.

> I'm thinking we should freeze the tags we use now, and then as
> people have a need for more advanced stuff we add them as we go.

That would first require the generation of a list with all the tags we use now, and then asking people to avoid other tags if they can. That's not a good approach. One of the reasons DocBook was chosen is its richness of structural and semantic tags. If you author a DocBook doc, you should always use the best-fitting tag for the situation, AND only choose tags on the basis of meaning. You should never try to choose tags for presentational reasons, or because proprietary application A or B can't deal with certain tags or combinations thereof. As soon as you start doing that, you are working towards one specific representation of the text (possibly harming other renderings) and you make your document less valuable.

Try to look at it this way: the DocBook XML version *is* the document. The HTML and PDF renderings are ways to look at it; they are a kind of viewport. If we have a problem with the rendering, we must try to fix the rendering, not change the document itself.

That said, you're still left with the problem. As an outsider to Ventura I can't say if it can be fixed, and how. But:

- First, realize it doesn't have to be solved in a week. If you see a chance to develop a fix but it will take more time, by all means take your time! If we can have these great-looking PDF versions in half a year, without the errors they contain now: fine.

- Second: if it really can't be fixed, we could add a transformation layer between the DocBook XML and Ventura. That is: we could develop XSL stylesheets that convert the DocBook XML files to something that's similar, but with the "problem tags" removed or replaced by other tags (which we could define ourselves). This layer wouldn't be DocBook anymore; it would only be used for the Ventura import. Mind you: this could take months too before it worked! But that's no reason not to start with it if we can't fix it otherwise.

Before we do such a thing though, we could also have a look at improving the current FO-producing stylesheets. After all, if we can produce good and attractive PDF without Ventura, that would be even better. But this too will take a lot of time, I'm afraid.

Greetings,
Paul Vinkenoog
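A rough sketch of what such a transformation layer might look like: an XSLT 1.0 identity transform that copies everything unchanged, except that a <para> directly inside a <listitem> is renamed to a self-defined <listpara> element the Ventura map could style directly. The <listpara> name is made up here, and the output is deliberately no longer DocBook:

  <?xml version="1.0"?>
  <xsl:stylesheet version="1.0"
                  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Identity template: copy every node and attribute as-is -->
    <xsl:template match="@*|node()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>

    <!-- Rename paras that sit directly inside a listitem -->
    <xsl:template match="listitem/para">
      <listpara>
        <xsl:apply-templates select="@*|node()"/>
      </listpara>
    </xsl:template>

  </xsl:stylesheet>

Any XSLT 1.0 processor could run this over the DocBook sources as a pre-import step; the same pattern extends to other problem tags, one template per case.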
From: Darcy O'N. <ds...@sk...> - 2004-03-25 19:12:50
Hello,

> Another consideration is that there's nothing wrong with the DocBook
> format as such. The problem you're facing now has to do with importing
> it into Ventura (which does support XML, but unfortunately not
> DocBook). So if the problem can be solved, it has to be solved there,
> not by bending or breaking the DocBook standard.

I didn't mean to break the DocBook standard, just to identify certain <para> tags better.

>> Are we looking to be 100% compatible with DocBook?
>
> It's not really a question of being "compatible" - our docs *are*
> DocBook docs. And the processing tools we use are based on that. It
> just so happens they do a poor job when it comes to PDF.

Actually, the real documents are HTML, PDF and possibly RTF. Nobody reads a DocBook XML file. The DocBook standard is only a 'means to an end', and a way to manage the information. The users of the Firebird database are expecting to read the documents in an easily recognizable and standard format, i.e. paper, PDF, HTML.

The comment about being 100% compatible is better put as: can we reasonably be expected to support all the tags in the DocBook standard, right from the start, if we want to develop quality manuals?

>> I'm thinking we should freeze the tags we use now, and then as
>> people have a need for more advanced stuff we add them as we go.
>
> That would first require the generation of a list with all the tags we
> use now, and then asking people to avoid other tags if they can.

Actually, we don't want people to avoid tags, just to tell us when they are going to use them. If a tag isn't on the current list, I can add it to the mapping file and it will work fine. Without that notice the doc will still publish fine, but the section using the new tag won't look like the author intended.

> - First, realize it doesn't have to be solved in a week. If you see a
>   chance to develop a fix but it will take more time, by all means
>   take your time! If we can have these great-looking PDF versions in
>   half a year, without the errors they contain now: fine.
>
> - Second: if it really can't be fixed, we could add a transformation
>   layer between the DocBook XML and Ventura. That is: we could develop
>   XSL stylesheets that convert the DocBook XML files to something
>   that's similar, but with the "problem tags" removed or replaced by
>   other tags (which we could define ourselves). This layer wouldn't be
>   DocBook anymore; it would only be used for the Ventura import. Mind
>   you: this could take months too before it worked! But that's no
>   reason not to start with it if we can't fix it otherwise.

The best solution is to do a document release, i.e. freeze the document at a certain point, cut it, and then process it with Ventura. Any errors can be hand edited. Now, to most computer people automation is what they are used to; from a desktop publisher's standpoint, you have to manually go through each page and make the document perfect. If that means a few hours of hand editing for a better document, then I'd prefer to do that. Spending hours or days making a 'transform layer' isn't worth it.

> Before we do such a thing though, we could also have a look at
> improving the current FO-producing stylesheets. After all, if we can
> produce good and attractive PDF without Ventura, that would be even
> better. But this too will take a lot of time, I'm afraid.

That's the current benefit of Ventura: it is designed to publish books and make them look good without a lot of time.

I suspect that if we get the XML import to 90 or 95%, the rest can be hand edited to make the document perfect.

Darcy O'Neil
From: Paul V. <pa...@vi...> - 2004-03-26 11:12:21
Hi Darcy,

> I didn't mean to break the DocBook standard, just to identify
> certain <para> tags better.

Do you have any suggestion how? Without breaking DocBook validity, that is. The only possibility I see is to use the "role" attribute and scan for that in your import definition. However, this would impose on every docwriter the task of manually adding this attribute whenever he or she inserts a <para> in certain situations. I don't know if this will work in practice. On a more philosophical level, it does kind of "pollute" the source document. If we should take this route: what would you like to see in there, and in which situation(s) exactly?

>> It's not really a question of being "compatible" - our docs *are*
>> DocBook docs. And the processing tools we use are based on that. It
>> just so happens they do a poor job when it comes to PDF.
>
> Actually, the real documents are HTML, PDF and possibly RTF. Nobody
> reads a DocBook XML file. The DocBook standard is only a 'means to
> an end', and a way to manage the information.

True, but the HTML, PDF, etc. are also (maybe even more so) "only" a means to an end: our goal is to produce documentation and make it accessible to the users. The output formats are not goals in themselves; they are means to let the user access the information in a practical and user-friendly way. In ten years there may be totally different output formats, but the DocBook sources will still be valid because they don't contain presentational markup - only structured informational content.

That's why we should be careful not to bend our DocBook sources in order to satisfy one particular representation. Even if we succeeded (which is doubtful), we might wind up with less clean DocBook sources and as a result get all kinds of other problems later, e.g. when we start rendering to new formats like HTML Help, or formats that don't even exist yet today.

Again, the problem you're facing right now is not in the DocBook sources but in the rendering. So if possible, that's where we should deal with it - not in the sources, because those are fine.

Now, about listing the tags and reporting the use of "new" tags: I don't think we can ask that from the docwriters. They are supposed to produce valid DocBook and apply the right tag for the situation. If they would have to check the tags they use against an external list, and maintain a second list to report any tag you haven't taken care of yet, things become pretty cumbersome.

There's a better way to do this: I could write a small program that reads any XML file, extracts the tags, and looks them up in a list of known tags. Any tags not in the list are reported so you can deal with them. This is faster (apart from writing the prog once) and more reliable than having everybody do it manually. If this would be of help to you, I could write such a program in 1 - 3 weeks (depending on how busy I am with other things).

Am I right in saying that the real problem is not in the tags (because we can deal with those one by one) but in the nesting? Is Ventura not aware of the nesting? That's the impression I get right now.

Greetings,
Paul Vinkenoog
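For illustration, the role-attribute idea would mark up problem paragraphs something like this (the value 'listpara' is made up; any value keeps the document valid DocBook):

  <listitem>
    <para role="listpara">A paragraph the Ventura import could match
    on its role attribute.</para>
  </listitem>

And the tag-reporting program Paul offers to write could itself be a small XSLT 1.0 stylesheet - one possible approach, since no language is specified here. This sketch uses Muenchian grouping to print every distinct element name once; the resulting list can then be compared against a list of known tags:

  <?xml version="1.0"?>
  <xsl:stylesheet version="1.0"
                  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <!-- Index all elements by their name -->
    <xsl:key name="by-name" match="*" use="name()"/>

    <xsl:template match="/">
      <!-- Visit only the first occurrence of each element name -->
      <xsl:for-each select="//*[generate-id() =
                                generate-id(key('by-name', name())[1])]">
        <xsl:sort select="name()"/>
        <xsl:value-of select="name()"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:template>
  </xsl:stylesheet>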
From: Darcy O'N. <ds...@sk...> - 2004-03-26 12:44:17
Hello,

> If we should take this route: what would you like to see in there,
> and in which situation(s) exactly?

Not sure yet; we have lots of time to figure this out, so I'll just keep hacking away with the XML Map and see if I can get a reasonable solution.

> True, but the HTML, PDF, etc. are also (maybe even more so) "only" a
> means to an end: our goal is to produce documentation and make it
> accessible to the users. The output formats are not goals in
> themselves; they are means to let the user access the information in
> a practical and user-friendly way. In ten years there may be totally
> different output formats, but the DocBook sources will still be valid
> because they don't contain presentational markup - only structured
> informational content.

We're pretty much on the same page here: DocBook manages the 'information', but the end users require an accessible document. Without informed end users there really isn't a project. Since I started using Firebird, the only docs available were from Interbase, and since we are moving away from Interbase it's going to get harder and harder for people to migrate from other databases.

Think of it this way: the most poorly managed document system that still produces readable documents is better than the best managed document system that produces broken and non-user-friendly documents. Having spent years working at one of the world's largest oil companies, I can tell you that most of the documents were poorly managed (i.e. Excel = database, Word = desktop publishing system, Document Control System = none, Format Standard = Microsoft) but very accessible to the end user. Because the dumbest people could access the documents, the company was very successful.

> There's a better way to do this: I could write a small program that
> reads any XML file, extracts the tags, and looks them up in a list of
> known tags. Any tags not in the list are reported so you can deal
> with them. This is faster (apart from writing the prog once) and more
> reliable than having everybody do it manually. If this would be of
> help to you, I could write such a program in 1 - 3 weeks (depending
> on how busy I am with other things).

The Ventura XML map actually creates a list of used tags, so it would be easier for me to use diff to see which new tags were added during a 'release' period. We should probably concentrate on documenting Firebird as opposed to building the perfect doc system, even though that in itself is a worthy cause.

> Am I right in saying that the real problem is not in the tags
> (because we can deal with those one by one) but in the nesting? Is
> Ventura not aware of the nesting? That's the impression I get right
> now.

I agree that for the most part the DocBook standard is good, but even the DocBook guide clearly states in the <para> description that "some processing systems may find the presence of block elements in a paragraph difficult to handle" and that "there is no easy answer to this problem." No single solution is perfect, and over time this will get better, so for now we'll see what we can do.

Darcy O'Neil
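For illustration, the kind of construct that quote is about - a block element nested inside a paragraph, which is valid DocBook but hard on some processors (content made up):

  <para>
    To create the table, run:
    <programlisting>CREATE TABLE t1 (id INTEGER NOT NULL PRIMARY KEY);</programlisting>
    and then commit the transaction.
  </para>

A <simpara> could not hold the <programlisting>, which is why it is no general substitute for <para> inside list items.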
From: Paul V. <pa...@vi...> - 2004-04-01 22:58:23
Hello Darcy,

> Think of it this way: the most poorly managed document system that
> still produces readable documents is better than the best managed
> document system that produces broken and non-user-friendly
> documents. Having spent years working at one of the world's largest
> oil companies, I can tell you that most of the documents were poorly
> managed (i.e. Excel = database, Word = desktop publishing system,
> Document Control System = none, Format Standard = Microsoft) but
> very accessible to the end user. Because the dumbest people could
> access the documents, the company was very successful.

What you say is true, but many companies have since discovered that this unmanageability of documents is becoming a bigger and bigger problem. In fact, this is the most important reason why formats such as DocBook - separating content from presentation - were developed, and companies are now investing noticeable resources in converting their documentation.

> We should probably concentrate on documenting Firebird as opposed to
> building the perfect doc system, even though that in itself is a
> worthy cause.

Sure, our main goal is to produce documentation. But to be able to do that, and especially to keep the docs manageable, we must invest time in the system too. The more we can automate the rendering, the better. Everything that's not automated when it comes to e.g. the PDF rendering will have to be done by hand *over and over again* - for every new document, and for every new revision of every existing document.

That's why I don't mind investing time thinking about and discussing the Ventura import you're working on. Because if this works, and if we succeed in minimizing the handwork, the time invested will pay itself back n-fold: we will have uncompromised DocBook sources, clear and attractive HTML pages, *and* state-of-the-art PDFs with minimal (ideally no) post-rendering correction necessary.

In other words, by investing time in improving the rendering system now, we will have more time later to concentrate on what it's all about: writing docs.

Greetings,
Paul Vinkenoog