Re: [Docutils-users] HTML format


On Mon, Nov 11, 2002 at 08:53:01PM -0500, David Goodger wrote:
> > It looks like I cannot get only the body of the text (what is located
> > between <body> ... </body>) without some additional programming,
>
> Correct.  You'll need a specialized Writer component.  Take a look at
> the files in http://docutils.sf.net/sandbox/oliverr/ht/ .  This seems
> to be a common requirement for people, so a custom HTML-body-only
> Writer could be useful.  I don't know what to do about the DocTitle
> transform in this case though (in docutils/transforms/frontmatter.py).
I believe the best approach is to just ignore it. :)  Those who really
need it could access it through the document instance.
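
Until such a Writer exists, the body can be cut out of the full HTML
output after the fact.  A minimal sketch, assuming the Writer's output
is already available as a string; extract_body is a hypothetical helper
(not part of Docutils) and the sample page is made up:

```python
import re

def extract_body(html_text):
    """Return the markup between <body ...> and </body>, or None if absent."""
    match = re.search(r"<body[^>]*>(.*?)</body>", html_text,
                      re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None

# Made-up sample standing in for the HTML Writer's full output.
page = """<html>
<head><title>Demo</title></head>
<body class="document">
<p>Hello, world.</p>
</body>
</html>"""

print(extract_body(page))
```

This is plain post-processing; it does nothing about the DocTitle
transform, so the title still has to be taken from the document itself.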

> > nor is it possible to do without stylesheets at all.
>
> I'm not sure what you mean by this or what you want.  Please
> elaborate.
The current code produces HTML elements whose class attributes refer to
a stylesheet.  Rendering without a stylesheet looks fine to me, so I'd
like to specify None as the stylesheet name, and in that case I'd expect
HTML output without class attributes on the elements.
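
To illustrate the output I have in mind, here is a rough sketch that
drops class attributes from already generated HTML.  It is a naive
regex pass, not a change to the Writer: strip_classes is a hypothetical
helper, the sample markup is made up, and it assumes attribute values
are double-quoted.

```python
import re

# Matches a class="..." attribute, but only inside a tag (after "<",
# before the next ">"), so ordinary text content is left alone.
CLASS_ATTR = re.compile(r'(<[^<>]*?)\s+class="[^"]*"')

def strip_classes(html_text):
    """Remove double-quoted class attributes from HTML tags."""
    return CLASS_ATTR.sub(r"\1", html_text)

# Made-up sample in the style of stylesheet-oriented output.
sample = '<div class="document" id="demo">\n<p class="first">Some text.</p>\n</div>'

print(strip_classes(sample))
```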

> The html4css1.py Writer is designed to use a stylesheet, as
> recommended by the latest HTML specs.  If you want HTML that doesn't
> require a stylesheet at all, a new Writer would be needed.
Such behaviour does not seem very complicated, so maybe this
functionality could be added to the current code?

> [ URLs with spaces ]
>
> According to RFC 2396 "Uniform Resource Identifiers (URI): Generic
> Syntax", spaces are not valid URI/URL characters.  It does say this:
>
>    In some cases, extra whitespace (spaces, linebreaks, tabs, etc.)
>    may need to be added to break long URI across lines. The whitespace
>    should be ignored when extracting the URI.
>    ...
>    Using <> angle brackets around each URI is especially recommended
>    as a delimiting style for URI that contain whitespace.
>
> The syntax you propose would conflict with this, especially if the
> MS-style URL were to break across lines:
>
>     <http://www.example.com/a/very/long/
>     path/broken/across/lines>
>
> Is the whitespace after "long/" significant or not?  The RFC says it's
> not.  The reStructuredText parser also joins long multi-line URLs in
> targets.  I wouldn't mind adding the ability to join broken URLs in
> free text as well, if surrounded by brackets.
>
> So the answer to your question is, I think I'd say no thanks.
> Whitespace in URLs is a pain; I think it's better just to avoid it.
Hmm.  The current code does not seem to follow the quoted RFC 2396 then.
I did specify

    <http://www.example.com/an url with spaces>

(which seems to be correct according to this RFC) and as a result got

    &lt;<a
    href="http://www.example.com/an">http://www.example.com/an</a>
    url with spaces&gt;

which seems to be incorrect, right?
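
For comparison, here is a small sketch of what the quoted RFC text
seems to ask for: find the <...>-delimited URI and ignore any
whitespace inside it.  extract_bracketed_uris is a hypothetical helper,
not anything in the reStructuredText parser, and the scheme pattern is
my own simplification.

```python
import re

# A URI in angle brackets: a scheme, a colon, then anything up to ">".
BRACKETED_URI = re.compile(r"<([a-zA-Z][a-zA-Z0-9+.-]*:[^<>]*)>")

def extract_bracketed_uris(text):
    """Return bracketed URIs with all internal whitespace removed."""
    return [re.sub(r"\s+", "", m.group(1))
            for m in BRACKETED_URI.finditer(text)]

text = """See <http://www.example.com/a/very/long/
    path/broken/across/lines> and
<http://www.example.com/an url with spaces>."""

for uri in extract_bracketed_uris(text):
    print(uri)
```

Read this way, the spaces in the second URI are dropped rather than
kept, but the whole bracketed text is still treated as one URI.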

--
Misha