From: Dinu G. <gh...@da...> - 2012-12-10 13:04:26
Hello,

when using enumerated list items in conversions from ReST to HTML (both
with rst2html.py and docutils.core.publish_parts), I'm surprised to see
the allowed "formatting" characters as described in
http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#enumerated-lists
(e.g. period or any parentheses) always being converted into a period,
like this:

ReST:

"""
Auto-enumerated list

a) item a)
b) item b)
c) item c)
"""

HTML:

"""
<p>Auto-enumerated list</p>
<ol class="loweralpha simple">
<li>item a)</li>
<li>item b)</li>
<li>item c)</li>
</ol>
"""

which appears in a browser with periods/dots like this:

"""
Auto-enumerated list

a. item a)
b. item b)
c. item c)
"""

I had actually hoped to have the parentheses preserved in the HTML
output. So I tried finding out where they get lost. In html4css1.css,
for styles like "loweralpha"... there is actually no character specified,
and the mapping to e.g. CSS lower-alpha does not seem to guarantee any
specific character, except maybe the dot if that's the default:

ol.loweralpha {
  list-style: lower-alpha }

ol.upperalpha {
  list-style: upper-alpha }

In the W3C specs I found that this could be specified using one of these
methods:

http://www.w3.org/TR/1998/PR-CSS2-19980324/generate.html#counters
http://www.w3.org/TR/1998/PR-CSS2-19980324/generate.html#markers

So, dots or parentheses could be specified more or less like this:

li:before {content: counter(item) ". "; counter-increment: item}
li:before {content: counter(item) ") "; counter-increment: item}

Then I stumbled upon this page explaining how to extend the default CSS
styles for custom purposes:

http://docutils.sourceforge.net/docs/howto/html-stylesheets.html

Now, before I go down this line of experimenting, I'd like to ask
whether this is the recommended/canonical way to do it. And would it
perhaps make sense to preserve these "formatting" characters in the
default CSS style, without having to extend it manually? At least that's
how I would expect Docutils to behave here.

Regards,

Dinu
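
The two li:before rules in the message above only produce visible markers once a counter has been defined on the list and the browser's built-in marker is switched off. A minimal CSS2 sketch of that idea for the "loweralpha" class that Docutils emits follows. The counter name "item" is taken from the message; everything else is an assumption for illustration and not part of the stock html4css1.css.

```css
/* Sketch only: not part of Docutils' html4css1.css. */
ol.loweralpha {
  list-style: none;      /* hide the browser's own "a." marker */
  counter-reset: item;   /* define a counter named "item" for this list */
}
ol.loweralpha > li:before {
  counter-increment: item;                   /* advance once per list item */
  content: counter(item, lower-alpha) ") ";  /* renders "a) ", "b) ", ... */
}
```

Such rules would have to be loaded after the default stylesheet, because its own "ol.loweralpha" rule has the same specificity. Note also that a counter defined this way does not read the "start" attribute of the list, so a list beginning at c) would still be labelled from a) unless its counter is reset separately.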
From: Guenter M. <mi...@us...> - 2012-12-10 21:37:58
On 2012-12-10, Dinu Gherman wrote:

> Hello,

> when using enumerated list items in conversions from ReST to HTML (both
> with rst2html.py, and docutils.core.publish_parts) I'm surprised to see
> the allowed "formatting" characters as described in
> http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#enumerated-lists
> (e.g. period or any parentheses) being always converted into a period
> like this:

> ReST:

> """
> Auto-enumerated list

> a) item a)
> b) item b)
> c) item c)
> """

> HTML:

> """
> <p>Auto-enumerated list</p>
> <ol class="loweralpha simple">
> <li>item a)</li>
> <li>item b)</li>
> <li>item c)</li>
> </ol>
> """

> which appears in a browser with periods/dots like this:

> """
> Auto-enumerated list

> a. item a)
> b. item b)
> c. item c)
> """

> I had actually hoped to have the parentheses preserved in the HTML
> output. So I tried finding out where they get lost.

The information about which separator is used in the rst source is lost
during the parsing step.

> In html4css1.css
> styles like "loweralpha"... there is actually no character specified
> for this and the mapping to e.g. CSS lower-alpha does not seem to
> guarantee any specific character, maybe except the dot if that's the
> default:

> ol.loweralpha {
>   list-style: lower-alpha }

> ol.upperalpha {
>   list-style: upper-alpha }

> In the W3C specs I found this could be specified using one of these
> methods:

> http://www.w3.org/TR/1998/PR-CSS2-19980324/generate.html#counters
> http://www.w3.org/TR/1998/PR-CSS2-19980324/generate.html#markers

> So, dots or parentheses could be specified more or less like this:

> li:before {content: counter(item) ". "; counter-increment: item}
> li:before {content: counter(item) ") "; counter-increment: item}

Yes indeed. However, the "html4css1" writer only uses CSS1, which does
not know pseudo-elements like ":before".

> Then I stumbled upon this page explaining how to extend the default CSS
> styles for custom purposes:

> http://docutils.sourceforge.net/docs/howto/html-stylesheets.html

> Now, before I'm going down this line of experimenting, I'd like to ask
> if this is the recommended/canonical way to do it? And maybe if it
> wouldn't make sense to preserve these "formatting" characters inside
> the default CSS style without having to extend it manually? At least
> that's how I would expect Docutils to behave here.

The "html4strict" writer in the sandbox comes with a "html4css2.css"
stylesheet that extends the default "html4css1.css", including a variant
of the CSS rules to style the default enumeration variants + nested
lists::

  /* default separator variants */

  ol.loweralpha > li:before {
    content: counter(item, lower-alpha) ")";
  }
  ol.upperalpha > li:before {
    content: counter(item, upper-alpha) ".";
  }
  ol.lowerroman > li:before {
    content: "(" counter(item, lower-roman) ")";
  }
  ol.upperroman > li:before {
    content: counter(item, upper-roman) ")";
  }

  /* nested counters (1, 1.1, 1.1.1, etc) */
  /* nested enumerated lists "inherit" the class attribute, other lists not */
  ol.nested > li:before, ol.nested ol > li:before {
    content: counters(item, ".") " ";
  }

While still not keeping the source delimiter, this provides for more
common delimiters with the non-decimal styles. (Of course, these rules
are incomplete without the definition and increment of the counter
elsewhere in the stylesheet.)

You may try the html4strict writer or just copy parts of the definitions
into your custom style sheet.

Günter
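
As the message above notes, the nested rule only works if a counter named "item" is defined and incremented elsewhere. A minimal sketch of what that missing part could look like for the "nested" class follows, modelled on the nested-counter example in the CSS 2.1 specification. It is an assumption about the general shape, not a copy of the actual html4css2.css.

```css
/* Assumed scaffolding; the real html4css2.css may differ. */
ol.nested, ol.nested ol {
  list-style: none;      /* markers come from li:before instead */
  counter-reset: item;   /* every list level opens its own "item" counter */
}
ol.nested > li:before, ol.nested ol > li:before {
  counter-increment: item;           /* one step per list item */
  content: counters(item, ".") " ";  /* joins all enclosing levels: 1, 1.1, 1.1.1 */
}
```

Because each nested list creates a new instance of the counter instead of overwriting the outer one, counters() can concatenate the values of all enclosing levels with the given separator.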
From: Dinu G. <gh...@da...> - 2012-12-13 08:14:34
Guenter Milde wrote:

> On 2012-12-10, Dinu Gherman wrote:
>> """
>> Auto-enumerated list
>
>> a. item a)
>> b. item b)
>> c. item c)
>> """
>
>> I had actually hoped to have the parentheses preserved in the HTML
>> output. So I tried finding out where they get lost.
>
> The information about which separator is used in the rst source is
> lost during the parsing step.

I see. Is there any chance of keeping that information during parsing
and making it accessible later?

In my case I'm trying to build a site where non-techies can enter text
snippets in ReST, and I can easily foresee a large number of raised
eyebrows in support questions if all enumerations look the same, like
X., especially since the forms X) and (X) are very common.

> Yes indeed. However, the "html4css1" writer only uses CSS1 which does
> not know pseudo-elements like ":before".
> [...]
> The "html4strict" writer in the sandbox comes with a "html4css2.css"
> stylesheet that extends the default "html4css1.css" including a variant
> of the CSS rules to style the default enumeration variants + nested lists::

Is there any reason/requirement for sticking to CSS1 (defined in late
1996) in Docutils? CSS2 was defined only two years later, in 1998, and
according to Wikipedia [1] suffered enormous adoption issues, with
revised version 2.1 becoming a W3C Recommendation only on 7 June 2011.

Still, I wonder whether Docutils would really need the most problematic
features of CSS 2, the ones that would prevent it from being used as a
default?

[1] http://en.wikipedia.org/wiki/Cascading_Style_Sheets#Difficulty_with_adoption

Regards,

Dinu
From: Guenter M. <mi...@us...> - 2012-12-13 16:42:32
On 2012-12-13, Dinu Gherman wrote:
> Guenter Milde wrote:
>> On 2012-12-10, Dinu Gherman wrote:
>>> """
>>> Auto-enumerated list

>>> a. item a)
>>> b. item b)
>>> c. item c)
>>> """

>>> I had actually hoped to have the parentheses preserved in the HTML
>>> output. So I tried finding out where they get lost.

>> The information about which separator is used in the rst source is
>> lost during the parsing step.

> I see. Is there any chance for keeping that information during parsing
> and making it accessible later?

You might have a look in the "lossless rst writer" branch of the SVN
repository. Keeping the original markup is clearly a prerequisite for
such a writer.

There are also examples of other markup characters that are kept for
reference in the doctree nodes (if I remember right, e.g., section title
underline characters).

> In my case I'm trying to build a site where non-techies can enter text
> snippets in ReST and I can easily foresee a big number of raised
> eyebrows in support questions if all enumerations look the same like
> X., especially since the form X) and (X) are very common.

Actually, I see this as a design choice of Docutils: the markup only
conveys the content, the styling is done independently via style sheets.

I agree that it would be nice to have the more common ``X)`` or ``(X)``
forms for alpha and loweralpha lists. (This is why I implemented them in
the html4strict writer, i.e. the html4css2.css stylesheet file.)

>> Yes indeed. However, the "html4css1" writer only uses CSS1 which does
>> not know pseudo-elements like ":before".
>> [...]
>> The "html4strict" writer in the sandbox comes with a "html4css2.css"
>> stylesheet that extends the default "html4css1.css" including a variant
>> of the CSS rules to style the default enumeration variants + nested lists::

> Is there any reason/requirement for sticking to CSS1 (defined in late
> 1996) in Docutils? CSS2 was defined only two years later, in 1998, and
> according to Wikipedia [1] suffered enormous adoption issues, with
> revised version 2.1 becoming a W3C Recommendation only on 7 June 2011.

There was a reason: when Docutils was written, IE6 still had a
considerable (overwhelming) market share. Thus David decided to make
provisions for its deficiencies.

> Still I wonder if Docutils really would need to use the most
> problematic features of CSS 2 preventing it from using it as a default?

> [1] http://en.wikipedia.org/wiki/Cascading_Style_Sheets#Difficulty_with_adoption

Moving the standard HTML writer to strict HTML4/XHTML1 with CSS2 styling
(while keeping html4css1 as an alternative) is somewhere on the long
list of TODO items.

Hope this helps,

Günter