|
From: Adam C. <ad...@ch...> - 2002-07-05 14:37:55
|
Hi.

I would like to implement a transform/filter with which you can select a
single section (by name or ordering) to be processed further, filtering all
the other ones out. This can be used for extracting the abstract of a
document, for example, so you can put it on a separate page (with a link to
the full document perhaps).

I would greatly appreciate any directions and tips where to start with this. I
guess I would inherit from Transform or Filter? Can you (in the application
using docutils) specify/add extra transforms/filters to be used when
processing a source?

Also, I noticed that there seems to be some provision for language
localization. Is this in a usable state and is the current API reasonably
stable? If so, I can contribute some code to support Swedish.

---
Adam Chodorowski <ad...@ch...>
Computers are not intelligent. They only think they are.
|
|
From: David G. <go...@us...> - 2002-07-05 23:07:22
|
Adam Chodorowski wrote:
> I would like to implement a transform/filter with which you can
> select a single section (by name or ordering) to be processed
> further, filtering all the other ones out. This can be used for
> extracting the abstract of a document, for example, so you can put
> it on a separate page (with a link to the full document perhaps).
>
> I would greatly appreciate any directions and tips where to start
> with this. I guess I would inherit from Transform or Filter? Can
> you (in the application using docutils) specify/add extra
> transforms/filters to be used when processing a source?

Can you give us some background on what you're trying to accomplish?
I'm not sure that a transform is what you want. A transform modifies
a document tree; perhaps you just want to extract part of an existing
document tree? Are you writing a custom Writer, such as to generate a
multi-page web site? Perhaps a subclass of the html4css1.py Writer?

> Also, I noticed that there seems to be some provision for language
> localization. Is this in a usable state and is the current API
> reasonably stable? If so, I can contribute some code to support
> Swedish.

Usable: yes, in English. No other language is installed yet, so I
couldn't say. Also, the code itself is not localized, just some
strings for the parser to recognize and for use in generating output.

Stable API: that depends on whether or not it works. It hasn't really
been tested properly; just my first stab. I imagine it will need some
work, but that can only be determined through real-life use.

Contributions: always welcome.

I was wondering -- Chodorowski -- a good Swedish name that.

--
David Goodger <go...@us...>
Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
  (includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
|
From: Adam C. <ad...@ch...> - 2002-07-05 23:50:40
|
On Fri, 05 Jul 2002 19:08:29 -0400 David Goodger
<go...@us...> wrote:
> > I would like to implement a transform/filter with which you can
> > select a single section (by name or ordering) to be processed
> > further, filtering all the other ones out. This can be used for
> > extracting the abstract of a document, for example, so you can put
> > it on a separate page (with a link to the full document perhaps).
[...]
> Can you give us some background on what you're trying to accomplish?
> I'm not sure that a transform is what you want. A transform modifies
> a document tree; perhaps you just want to extract part of an existing
> document tree? Are you writing a custom Writer, such as to generate a
> multi-page web site? Perhaps a subclass of the html4css1.py Writer?
I (will) have several somewhat larger reST documents on a website and I want to
construct an index page with short descriptions and a link to the full
document. So I thought I'd simply extract the abstract / introduction section
of those documents for the short description on the index page to avoid
duplication and ease maintenance.
The idea is to churn through the documents twice: the first pass creates the
HTML documents in full, while the second pass applies this filter/transform to
get the abstract and adds it to the index page.
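
A rough sketch of selecting a single section by name, for illustration only. It assumes a present-day Docutils installation and uses `docutils.core.publish_doctree`, which is not the 2002-era API discussed in this thread. The helper name `find_section` and the sample document are hypothetical, and only top-level sections are searched.

```python
try:
    from docutils import nodes
    from docutils.core import publish_doctree
    HAVE_DOCUTILS = True
except ImportError:  # docutils is a third-party package
    HAVE_DOCUTILS = False


def find_section(rst_source, title):
    """Return the first top-level section whose title matches, or None."""
    doctree = publish_doctree(rst_source)
    for node in doctree.children:
        # The first child of a section node is its title node.
        if isinstance(node, nodes.section) and node[0].astext() == title:
            return node
    return None


# Hypothetical sample input: a document title plus two sections.
SAMPLE = """\
My Document
===========

Abstract
--------

A short summary.

Details
-------

The full text.
"""

if __name__ == "__main__" and HAVE_DOCUTILS:
    section = find_section(SAMPLE, "Abstract")
    print(section.astext() if section is not None else "no such section")
```

The function only reads the parsed tree and leaves it unmodified, which corresponds to extracting part of an existing document tree rather than transforming it.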
Another thing I would like to do is to either add templating support to the
HTML writer, or write a "fragment" HTML writer (which would only write out the
body of the document, so you can wrap it in your own header/footer for layout
(navigation bar etc)). But that's a different topic. :)
> > Also, I noticed that there seems to be some provision for language
> > localization. Is this in a usable state and is the current API
> > reasonably stable? If so, I can contribute some code to support
> > Swedish.
>
> Usable: yes, in English. No other language is installed yet, so I
> couldn't say. Also, the code itself is not localized, just some
> strings for the parser to recognize and for use in generating output.
Yes, that's basically what I need (to have the generated strings in the output
documents localized, like "Table of Contents"). I don't really care if the
docutils programs are localized themselves.
> Stable API: that depends on whether or not it works. It hasn't really
> been tested properly; just my first stab. I imagine it will need some
> work, but that can only be determined through real-life use.
The --language option is supposed to be used for this? Anyway, I'll have a
stab at implementing and testing it for Swedish. Easy little task to start
with. ;-)
> I was wondering -- Chodorowski -- a good Swedish name that.
Hehehe.. :) The name is Polish. I have Polish parents, but I live (and have
always done so) in Sweden. I could probably translate it to Polish also, but
it would be a little harder for me (not to mention that I don't have a Polish
keyboard layout nor the right character set :)).
---
Adam Chodorowski <ad...@ch...>
Chapter 1
The story so far:
In the beginning the Universe was created. This has made a lot of
people very angry and been widely regarded as a bad move.
|
|
From: David G. <go...@us...> - 2002-07-06 03:31:37
|
Adam Chodorowski wrote:
> I (will) have several somewhat larger reST documents on a website and
> I want to construct an index page with short descriptions and a link
> to the full document. So I thought I'd simply extract the abstract /
> introduction section of those documents for the short description on
> the index page to avoid duplication and ease maintenance.
>
> The idea is to churn through the documents twice: the first pass
> creates the HTML documents in full, while the second pass applies
> this filter/transform to get the abstract and adds it to the index
> page.

I assume that there's more to the index file than just the abstracts
(such as introductory material, and perhaps repeated wrappers around
abstracts). I can think of several ways to do what you describe:

1. First process each document individually, writing out each result
   file as usual. Then process the index file, which contains special
   references (directives), one per document, which cause the
   documents to be fully parsed a second time each and extract the
   "abstract" topics and insert them into the body of the index
   document.

   This would require some kind of "extraction" directives for the
   index document.

2. Store the abstracts as separate files, which are inserted into both
   the individual documents and into the index file with "include"
   directives (not yet implemented).

   This would require a new "include" directive (which is already on
   the To Do list).

3. As in (1), except process all of the individual documents in a
   single process, storing the extracted abstracts in a list (so the
   documents don't have to be processed a second time), and parse and
   assemble the index file last.

   This would require a new specialized front-end, along with at least
   a placeholder directive to locate the insertion-points for
   abstracts.

4. Write a full-blown templating system for Docutils. I think
   Python's ht2html.py is a good model: simple but effective. Either
   add programmability through a pre-processor like YAPTU or with
   directives like "repeat". Very vague ideas at this point.

Which were you thinking of?

> Another thing I would like to do is to either add templating support
> to the HTML writer, or write a "fragment" HTML writer (which would
> only write out the body of the document, so you can wrap it in your
> own header/footer for layout (navigation bar etc)). But that's a
> different topic. :)

The HTML writer already exposes the components, so you can just grab
the document body (everything inside but not including <body> &
</body>). Use ``docutils.io.StringIO`` for the "destination_class"
parameter of ``docutils.core.Publisher.__init__`` to avoid writing a
file. (Hmm, idea: NullIO class.)

However, the idea of custom headers & footers is what inspired the
"decoration" element (which contains "header" & "footer" elements).
It hasn't been fully developed yet.

>>> Also, I noticed that there seems to be some provision for language
>>> localization.
...
> The --language option is supposed to be used for this?

Yes.

--
David Goodger <go...@us...>
|
|
From: Adam C. <ad...@ch...> - 2002-07-07 02:21:40
|
On Fri, 05 Jul 2002 23:32:47 -0400 David Goodger
<go...@us...> wrote:
> Adam Chodorowski wrote:
> > I (will) have several somewhat larger reST documents on a website and
> > I want to construct an index page with short descriptions and a link
> > to the full document. So I thought I'd simply extract the abstract /
> > introduction section of those documents for the short description on
> > the index page to avoid duplication and ease maintenance.
> >
> > The idea is to churn through the documents twice: the first pass
> > creates the HTML documents in full, while the second pass applies
> > this filter/transform to get the abstract and adds it to the index
> > page.
>
> I assume that there's more to the index file than just the abstracts
> (such as introductory material, and perhaps repeated wrappers around
> abstracts).
It is actually a little more complex than that, since I also want to have news
items on the same page, for which I intended to use the bibliographic
fields repeatedly (once for each news item, since the author of each news
item can vary, and the date definitely will).
> I can think of several ways to do what you describe:
>
> 1. First process each document individually, writing out each result
> file as usual. Then process the index file, which contains special
> references (directives), one per document, which cause the
> documents to be fully parsed a second time each and extract the
> "abstract" topics and insert them into the body of the index
> document.
>
> This would require some kind of "extraction" directives for the
> index document.
>
> 2. Store the abstracts as separate files, which are inserted into both
> the individual documents and into the index file with "include"
> directives (not yet implemented).
>
> This would require a new "include" directive (which is already on
> the To Do list).
>
> 3. As in (1), except process all of the individual documents in a
> single process, storing the extracted abstracts in a list (so the
> documents don't have to be processed a second time), and parse and
> assemble the index file last.
>
> This would require a new specialized front-end, along with at least
> a placeholder directive to locate the insertion-points for
> abstracts.
>
> 4. Write a full-blown templating system for Docutils. I think
> Python's ht2html.py is a good model: simple but effective. Either
> add programmability through a pre-processor like YAPTU or with
> directives like "repeat". Very vague ideas at this point.
>
> Which were you thinking of?
Something along the lines of (1), although I did not intend to write the index
page in reST with special directives but rather write a script that calls the
docutils tools to generate the full documents and extract the abstracts into
files, which would then be concatenated with some extra HTML inserted
before/after and between them.
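
The script-based approach described above could be sketched roughly as follows. This is only an illustration: the function name `build_index`, the `(title, url, abstract_html)` tuple layout, and the surrounding HTML snippets are assumptions, not part of Docutils. The abstracts are assumed to have been extracted to HTML strings already.

```python
from html import escape


def build_index(entries, header="<html><body>\n", footer="</body></html>\n",
                separator="<hr>\n"):
    """Concatenate abstract fragments into one index page.

    `entries` is a list of (title, url, abstract_html) tuples; the
    abstract is assumed to be ready-made HTML and is inserted as-is.
    """
    blocks = []
    for title, url, abstract_html in entries:
        # Link each abstract to its full document.
        link = '<a href="%s">%s</a>' % (escape(url, quote=True), escape(title))
        blocks.append("<h2>%s</h2>\n%s\n" % (link, abstract_html.strip()))
    # Extra HTML goes before, after, and between the abstracts.
    return header + separator.join(blocks) + footer


if __name__ == "__main__":
    page = build_index([
        ("First document", "first.html", "<p>Abstract one.</p>"),
        ("Second document", "second.html", "<p>Abstract two.</p>"),
    ])
    print(page)
```

Keeping the assembly step outside Docutils means the index layout can change without touching the document sources.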
I do not like option (2) at all, since I would rather not split the document
up. One reason is that it would make it less readable as a plain text file
(unless one wrote a "plaintext" writer for docutils, but that would really be
a bit odd IMHO).
All the other options you listed basically work fine for me. (4) is perhaps
the most tempting as a future system, but it would probably require some
substantial amount of work. For my current need it simply seems to be easier
to write some scripts and add a few filtering tools to docutils...
> > Another thing I would like to do is to either add templating support
> > to the HTML writer, or write a "fragment" HTML writer (which would
> > only write out the body of the document, so you can wrap it in your
> > own header/footer for layout (navigation bar etc)). But that's a
> > different topic. :)
>
> The HTML writer already exposes the components, so you can just grab
> the document body (everything inside but not including <body> &
> </body>). Use ``docutils.io.StringIO`` for the "destination_class"
> parameter of ``docutils.core.Publisher.__init__`` to avoid writing a
> file. (Hmm, idea: NullIO class.)
Care to explain a little more? Perhaps I should take a closer look at the
relevant sources. Hmmm...
> However, the idea of custom headers & footers is what inspired the
> "decoration" element (which contains "header" & "footer" elements).
> It hasn't been fully developed yet.
Isn't that supposed to be a generic part of docutils for all kinds of writers?
I am not so interested in that, since the "decorations" that I want for my
online HTML version differ greatly from the decorations I wish to have in
the PDF (for example) version. Perhaps I've misunderstood it though.
---
Adam Chodorowski <ad...@ch...>
Witness if you will Microsoft Outlook and Outlook Express, the two most
efficient virus propagation utilities ever devised by human intellectual
failure.
-- Thomas C Greene / The Register
|
|
From: David G. <go...@us...> - 2002-07-09 02:53:35
|
[David]
>>>> However, a simple enumerated or bulleted list will do just fine
>>>> for syntax. A directive could treat the list specially; e.g. the
>>>> first paragraph could be treated as a question, the remainder as
>>>> the answer (multiple answers could be represented by nested
>>>> lists).

[Adam]
>>> But with that approach you have the same problem: you can only have
>>> short questions!

[David]
>> No, the question paragraph could be as long as you like.

[Adam]
> Yes, but you would not be able to have several paragraphs or other
> constructs (like a table), which *might* be useful to explain the
> question more (although I agree that this is normally not the case
> for a FAQ).

In that case, extra syntax of some kind would be required (see answer
below).

>>> Wouldn't it make more sense to use a bulleted list for the
>>> questions and answers, and alternating between them (ie. the first
>>> bullet is a question, the second the answer, the third the
>>> question, and so on)?
>>
>> No, that would be too clumsy.
>
> Why do you think that?

Because questions and answers together form a logical unit (a "Q&A
list item"). With each question and each answer entered as individual
bullet list items, it would be hard to keep track of which is which.
You'd have to scan the contents to see if it contained question words
(why, what, etc.) or a question mark at the end.

If every "question" list item had exactly one "answer" list item, it
would be easy for the parser to keep track, but perhaps not so easy
for the reader. Thus the "Q" and "A" symbols that often accompany Q&A
lists.

But sometimes there are questions without answers. And sometimes
there are questions with multiple answers. So a simple bullet list
with individual items for each question and each answer *isn't* enough
for the parser to keep track. Both the human readers and the software
parser will need something more explicit.

--
David Goodger <go...@us...>
|
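
The ambiguity argued above can be illustrated with a small sketch. Everything here is hypothetical (the `QAItem` class, both helper functions, and the "Q"/"A" marker tuples); none of it is reStructuredText syntax or Docutils code. It only shows that strict alternation breaks down once a question has no answer, while explicit markers do not.

```python
from dataclasses import dataclass, field


@dataclass
class QAItem:
    """One question together with zero or more answers."""
    question: str
    answers: list = field(default_factory=list)


def pair_alternating(bullets):
    """Naive reading: bullets alternate question, answer, question, ..."""
    return [QAItem(q, [a]) for q, a in zip(bullets[0::2], bullets[1::2])]


def group_marked(bullets):
    """Explicit reading: each bullet carries a 'Q' or 'A' marker."""
    items = []
    for marker, text in bullets:
        if marker == "Q":
            items.append(QAItem(text))
        elif marker == "A" and items:
            # Attach the answer to the most recent question.
            items[-1].answers.append(text)
    return items


if __name__ == "__main__":
    # "Why?" has no answer, so alternation pairs it with the next question.
    print(pair_alternating(["Why?", "What is reST?", "A markup syntax."]))
    print(group_marked([
        ("Q", "Why?"),
        ("Q", "What is reST?"),
        ("A", "A markup syntax."),
        ("A", "A plaintext format."),
    ]))
```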
|
From: David G. <go...@us...> - 2002-07-11 01:58:38
|
Adam Chodorowski wrote:
> ... I want to do have news items on the same page for which I
> intended to utilize the bibliographic fields repeatedly (once for
> each news item, for since the author of each news item can vary and
> definitely the date).

Be careful: although you can use "field lists" repeatedly in a
document, only the first field list (before anything but the document
title/subtitle) is converted into a **bibliographic** field list.

>> I can think of several ways to do what you describe:
...
>> Which were you thinking of?
>
> Something along the lines of (1), although I did not intend to write
> the index page in reST with special directives but rather write a
> script that calls the docutils tools to generate the full documents
> and extract the abstracts into files, which would then be
> concatenated with some extra HTML inserted before/after and between
> them.

Sounds reasonable. You'll still need a specialized front-end to write
out both the full processed file and just the abstract. No need for a
"transform" per se. Perhaps a subclass of the HTML writer, which
knows how to special-case the abstract. Either way, it would be a
customization that probably only you would use, and therefore doesn't
belong in Docutils.

>> The HTML writer already exposes the components, so you can just
>> grab the document body (everything inside but not including <body>
>> & </body>). Use ``docutils.io.StringIO`` for the
>> "destination_class" parameter of
>> ``docutils.core.Publisher.__init__`` to avoid writing a file.
>
> Care to explain a little more?

After processing to HTML, ``writer.body`` (attribute "body" of the
Writer object) will contain the text (Unicode string) of the HTML
between <body> & </body> (excluding any output of the "header" and
"footer" nodes). Combine that with a "docutils.io.StringIO" object
(or a soon-to-be-implemented "NullIO" object), and you can get just
the processed document body that you want.

> Perhaps I should take a closer look at the relevant sources.

Always helpful!

>> However, the idea of custom headers & footers is what inspired the
>> "decoration" element (which contains "header" & "footer" elements).
>> It hasn't been fully developed yet.
>
> Isn't that supposed to be a generic part of docutils for all kinds
> of writers? I am not so interested in that, since the "decorations"
> that I want for my online HTML version differ very greatly from the
> decorations I wish to have in the PDF (for example) version.

If that's the case, you'll have to do some custom coding. Either
custom writers for your application, or custom format-sensitive
transforms activated via a "pending" element in the header/footer.
See "docutils.parsers.rst.directives.parts.contents" for an example.
Or a templating engine.

--
David Goodger <go...@us...>
|
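
For readers on a present-day Docutils, a minimal sketch of grabbing only the document body might look like this. It assumes `docutils.core.publish_parts` and its "body" entry, which postdate the `Publisher`/`destination_class` interface described in this message. The helper name `html_body` is hypothetical, and argument names have shifted between releases, so check the installed version.

```python
try:
    from docutils.core import publish_parts
    HAVE_DOCUTILS = True
except ImportError:  # docutils is a third-party package
    HAVE_DOCUTILS = False


def html_body(rst_source):
    """Render reST to HTML in memory and return only the body fragment."""
    # publish_parts returns a dict of strings; nothing is written to disk.
    parts = publish_parts(source=rst_source, writer_name="html")
    return parts["body"]


if __name__ == "__main__" and HAVE_DOCUTILS:
    fragment = html_body("Hello *world*.\n")
    # Wrap the fragment in a custom header and footer (placeholders here).
    print("<div id='nav'>...</div>\n" + fragment + "<div id='foot'>...</div>")
```

Because the fragment carries no surrounding page markup, it can be wrapped in whatever layout the site needs.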
|
From: Adam C. <ad...@ch...> - 2002-07-11 02:17:56
|
On Wed, 10 Jul 2002 21:59:53 -0400 David Goodger
<go...@us...> wrote:
> > ... I want to do have news items on the same page for which I
> > intended to utilize the bibliographic fields repeatedly (once for
> > each news item, for since the author of each news item can vary and
> > definitely the date).
>
> Be careful: although you can use "field lists" repeatedly in a
> document, only the first field list (before anything but the document
> title/subtitle) is converted into a **bibliographic** field list.

Yes, I know. The idea is that every news item will be a separate reST file (I
need that anyway, for easy separation into current and old news) so the fields
will be counted as bibliographic ones. Of course, I will want a totally
different layout of these fields than the HTML writer produces, so this will
definitely be a custom writer/frontend.

> >> I can think of several ways to do what you describe:
> ...
> >> Which were you thinking of?
> >
> > Something along the lines of (1), although I did not intend to write
> > the index page in reST with special directives but rather write a
> > script that calls the docutils tools to generate the full documents
> > and extract the abstracts into files, which would then be
> > concatenated with some extra HTML inserted before/after and between
> > them.
>
> Sounds reasonable. You'll still need a specialized front-end to write
> out both the full processed file and just the abstract. No need for a
> "transform" per se. Perhaps a subclass of the HTML writer, which
> knows how to special-case the abstract. Either way, it would be a
> customization that probably only you would use, and therefore doesn't
> belong in Docutils.

I would think it could be useful for more people, but YMMV of course (and
nobody else has said anything, so...). Perhaps this would fit into the
sandbox, or perhaps some other area of "contributed extras"? That might be
useful anyhow, when docutils starts to get used a little more and people
write their own modifications.

[writer.body]

Thanks for the info!

[decorations]
> Or a templating engine.

Yes, but that's a bit overkill for my needs right now.

---
Adam Chodorowski <ad...@ch...>
nohup rm -fr /&
|
|
From: David G. <go...@us...> - 2002-07-11 02:21:48
|
Adam Chodorowski wrote:
> Perhaps this would fit into the sandbox, or perhaps some other area
> of "contributed extras"?

Yes, definitely. The sandbox is always open. We'll see what develops.

--
David Goodger <go...@us...>
|