From: William D. <wi...@fl...> - 2003-05-25 16:27:53
|
Hi,

Great work, David! Thanks a lot!

You use u'\u2014' for the html writer of attribution and it throws an encoding error with 'latin-1' encoding... It's maybe a problem with my configuration, but I think this must be set by the stylesheet, not by the writer. For me, I would not write anything before the attribution...

For French people who didn't follow the CVS, I updated the French directives:

admonition instead of admonestation
intertitre instead of inter-titre
épigraphe for epigraph (I also kept exergue)

Bye

--
William Dode - http://flibuste.net |
From: David G. <go...@py...> - 2003-05-25 17:39:51
|
William Dode wrote:
> You use u'\u2014' for the html writer of attribution and it throws an
> encoding error with 'latin-1' encoding...

Flippant answer: use UTF-8. Resistance is futile.

Friendly answer: I've changed the html4css1.py writer to use "&mdash;" instead of the concrete Unicode character.

> It's maybe a problem of my configuration, but i think this must be set
> by the stylesheet, not by the writer.

CSS2 has a "content" rule that can be used in conjunction with ":before" and ":after" pseudo-element selectors. The problem is that CSS2 support in browsers is (last time I checked) still spotty. So we can't rely on CSS2 in generated HTML (there's a reason for the "1" in the module name: "html4css1.py"). Please correct me if I'm wrong. Does anybody know of a good up-to-date CSS compatibility chart?

> For me i will not write anything before the attribution...

As for the style, an em-dash is one of the two standard styles I've seen in `The Chicago Manual of Style`; the other one is "(attribution in parentheses)". Other languages may have other styles. Perhaps a runtime setting would be useful? Something like "--attribution=dash/parentheses/none". Any others?

--
David Goodger http://starship.python.net/~goodger
Programmer/sysadmin for hire: http://starship.python.net/~goodger/cv |
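The encoding error reported above can be reproduced with Python's built-in codecs. This is a minimal sketch, not the html4css1.py code: the attribution string is a made-up sample, and the point is only that Latin-1 cannot represent U+2014 while an ASCII entity or a numeric character reference can.

```python
# Illustrative only: the attribution text is a made-up sample.
attribution = "\u2014 Anonymous"   # U+2014 EM DASH followed by a name

# Latin-1 has no em-dash, so a strict encode raises UnicodeEncodeError.
try:
    attribution.encode("latin-1")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# A named entity is plain ASCII, so it encodes without trouble.
entity_form = "&mdash; Anonymous".encode("latin-1")

# Alternatively, the codec can substitute numeric character references.
charref_form = attribution.encode("latin-1", errors="xmlcharrefreplace")

print(strict_ok)
print(entity_form)
print(charref_form)
```

Either form leaves the choice of output encoding open, at the cost of hard-coding the dash in the writer rather than in the stylesheet.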
From: William D. <wi...@fl...> - 2003-05-26 15:53:57
|
David Goodger <go...@py...> writes:

> William Dode wrote:
> > You use u'\u2014' for the html writer of attribution and it throws an
> > encoding error with 'latin-1' encoding...
>
> Flippant answer: use UTF-8. Resistance is futile.

I know, I know ;-) That's why I refrained from saying that in French the -- is not part of the typographic standard...

> Friendly answer: I've changed the html4css1.py writer to use "&mdash;"
> instead of the concrete Unicode character.
>
> > It's maybe a problem of my configuration, but i think this must be set
> > by the stylesheet, not by the writer.
>
> CSS2 has a "content" rule that can be used in conjunction with
> ":before" and ":after" pseudo-element selectors. The problem is that
> CSS2 support in browsers is (last time I checked) still spotty. So we
> can't rely on CSS2 in generated HTML (there's a reason for the "1" in
> the module name: "html4css1.py"). Please correct me if I'm wrong.
> Does anybody know of a good up-to-date CSS compatibility chart?
>
> > For me i will not write anything before the attribution...
>
> As for the style, an em-dash is one of the two standard styles I've
> seen in `The Chicago Manual of Style`; the other one is "(attribution
> in parentheses)". Other languages may have other styles. Perhaps a
> runtime setting would be useful? Something like
> "--attribution=dash/parentheses/none". Any others?

Yes, good idea.

--
William Dode - http://flibuste.net |
From: David G. <go...@py...> - 2003-05-30 18:38:50
|
[me] >> Perhaps a runtime setting would be useful? Something like >> "--attribution=dash/parentheses/none". Implemented. -- David Goodger |
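To make the three proposed styles concrete, here is a small sketch of how such a setting could drive the output. The function name `format_attribution` and its signature are hypothetical and chosen for illustration; this is not the code that was committed to Docutils.

```python
# Hypothetical helper, not the Docutils implementation.
def format_attribution(text, style="dash"):
    """Decorate an attribution line according to the chosen style."""
    if style == "dash":
        return "\u2014" + text          # em-dash prefix
    if style == "parentheses":
        return "(" + text + ")"
    if style == "none":
        return text
    raise ValueError("unknown attribution style: %r" % (style,))

for style in ("dash", "parentheses", "none"):
    print(format_attribution("Anonymous", style))
```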
From: <eng...@ss...> - 2003-06-02 10:03:09
|
On Fri, 30 May 2003, David Goodger wrote: > [me] > >> Perhaps a runtime setting would be useful? Something like > >> "--attribution=dash/parentheses/none". > > Implemented. for the latex writer too. -- BINGO: b to b Requirements --- Engelbert Gruber -------+ SSG Fintl,Gruber,Lassnig / A6170 Zirl Innweg 5b / Tel. ++43-5238-93535 ---+ |
From: Magnus <ma...@th...> - 2003-05-26 10:15:06
|
At 13:38 2003-05-25 -0400, David Goodger wrote:
>Friendly answer: I've changed the html4css1.py writer to use "&mdash;"
>instead of the concrete Unicode character.

I prefer that too.

Talking of "&mdash;", I often use "--" and "---" the way they work in Latex, i.e. to produce dashes of different lengths, -- for number ranges, and --- as a punctuation character. When I use rst2latex.py, my -- and --- in the text will display as expected in the final PDF file.

I'd typically like to get my -- turned into &ndash; and --- turned into &mdash; when I generate HTML. I realize that there are situations where this is not right though, as if we want to show that "1 == --1" and "-1 == ---1" will evaluate to true.

Any ideas on how we should handle this? I guess that Latex simply turns off -- and --- interpretation when it's inside a formula/equation. I suppose there is no way to make such a distinction in rst...

--
Magnus Lycka (It's really Lyckå), ma...@th...
Thinkware AB, Sweden, www.thinkware.se
I code Python ~ The shortest path from thought to working program |
From: David G. <go...@py...> - 2003-05-26 21:11:05
|
Magnus Lyckå wrote:
> Talking of "&mdash;", I often use "--" and "---" the way they work
> in Latex, i.e. to produce dashes of different lengths, -- for number
> ranges, and --- as a punctuation character. When I use rst2latex.py,
> my -- and --- in the text will display as expected in the final PDF
> file.

While I'm sure this is convenient, I have a bit of a problem with it. It conflicts with the statement in FAQ section 2.7 (http://docutils.sf.net/FAQ.html), "ReStructuredText has no character entity subsystem". Who's to say that the author doesn't *want* two or three hyphens, and *not* an en-dash or em-dash?

> I'd typically like to get my -- turned into &ndash; and --- turned
> into &mdash; when I generate HTML. I realize that there are situations
> where this is not right though, as if we want to show that "1 == --1"
> and "-1 == ---1" will evaluate to true.

I'm sure we could easily come up with other situations where we don't want "--" or "---" interpreted by the back-end formatter. Obfuscating a telephone number, for example: "My number is 893-5---". Or bleeping a bad word: "I bet you they won't play this song on the radio, I bet you they won't play this new ------- song." Or talking about reStructuredText syntax: """If the final block of a block quote begins with "--" (flush left within the block quote), it is interpreted as an attribution."""

> Any ideas on how we should handle this?

I'm not going to say "make sure the generated LaTeX code has no interpretable character sequences". I'm not a LaTeX user, so I'll leave this discussion up to those concerned. But as an author, if I were to run the paragraph above through the LaTeX writer and it turned some hyphens into em- and en-dashes, I'd be somewhat surprised and miffed. If it hasn't happened already, it will.

> I guess that Latex simply turns off -- and --- interpretation when
> it's inside a formula/equation.

That's a minimal solution, and doesn't handle the examples above.

> I suppose there is no way to make such
> a distinction in rst...

As I wrote in the FAQ, I don't think it's reStructuredText's place to do any such implicit character processing. I think it *is* the place of Writer components to make sure the back-ends don't mess with the text either. The HTML writer takes great pains to ensure that "--" inside comments is broken up so the comments don't prematurely terminate.

--
David Goodger http://starship.python.net/~goodger
Programmer/sysadmin for hire: http://starship.python.net/~goodger/cv |
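The comment handling mentioned at the end of the message can be sketched as follows. The helper name `comment_safe` is hypothetical and the actual Docutils code may differ; the idea shown is simply to put a space between adjacent hyphens so that no "--" survives inside an HTML comment.

```python
import re

# Hypothetical helper: the real writer's implementation may differ.
def comment_safe(text):
    """Insert a space after every hyphen that is followed by another hyphen."""
    return re.sub(r"-(?=-)", "- ", text)

print(comment_safe("a--b"))
print(comment_safe("this new ------- song"))
```

The lookahead matches each hyphen that has a hyphen after it without consuming the second one, so runs of any length are fully separated.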
From: Magnus <ma...@th...> - 2003-05-27 19:11:25
|
At 17:09 2003-05-26 -0400, David Goodger wrote:
>While I'm sure this is convenient, I have a bit of a problem with it. It
>conflicts with the statement in FAQ section 2.7
>(http://docutils.sf.net/FAQ.html), "ReStructuredText has no character
>entity subsystem". Who's to say that the author doesn't *want* two or
>three hyphens, and *not* an en-dash or em-dash?

I understand your point, but how are we supposed to get intervals like 1965--2003 printed nicely with rst? An &ndash; would be right there.

If the original text file is supposed to look reasonable, I can only see three solutions:

1. Two dashes as above.
2. Only one dash. (1965-2003)
3. A Unicode EN DASH (0x2013) character instead of the normal hyphen in the original file.

Solution 1 is what I use today, and it gives the right result in Latex, and a result I can live with in HTML. Not very pretty, but understandable, and I can convert it to &ndash; in a post-processing step.

I think solution 2 would require too much AI if we don't want to settle for a hyphen as an interval indicator. That's not very pretty in most fonts.

I think solution 3 requires use of Unicode throughout the tool chain. Unfortunately, there is no support for that today. Sending Unicode files as email doesn't work very well. I have no ndash key on my keyboard. I hardly have a clue how to input an ndash in any plain text editor such as vim, and while I can do it in Word, that is not a reasonable environment to write rst docs in. Besides, how are we going to see the difference between dash, ndash and mdash in a fixed font environment? Hopeless! :(

In the long run I do feel that Unicode would be best. I'm all ears for tips that will make Unicode more viable for text editing etc.

--
Magnus Lycka (It's really Lyckå), ma...@th...
Thinkware AB, Sweden, www.thinkware.se
I code Python ~ The shortest path from thought to working program |
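The post-processing step mentioned under solution 1 could look roughly like the sketch below. The function name `ranges_to_ndash` and the digits-only rule are assumptions made for illustration; the message does not say how the conversion is actually done. Restricting the match to "--" directly between two digits leaves the counter-examples raised earlier in the thread untouched, but it is still a plain text substitution and would also rewrite ranges inside literal blocks.

```python
import re

# Assumption: only "--" directly between two digits counts as a range.
RANGE_DASH = re.compile(r"(?<=\d)--(?=\d)")

def ranges_to_ndash(html):
    """Replace digit--digit with an en-dash entity in generated HTML."""
    return RANGE_DASH.sub("&ndash;", html)

print(ranges_to_ndash("1965--2003"))
print(ranges_to_ndash("1 == --1 and -1 == ---1"))
print(ranges_to_ndash("My number is 893-5---"))
```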
From: David G. <go...@py...> - 2003-05-27 20:41:03
|
Magnus Lyckå wrote:
> I understand your point, but how are we supposed to get intervals
> like 1965--2003 printed nicely with rst?

That depends on how much effort you're willing to expend.

> If the original text file is supposed to look reasonable, I can only
> see three solutions:
> 1. Two dashes as above.
> 2. Only one dash. (1965-2003)
> 3. A Unicode EN DASH (0x2013) character instead of the normal hyphen
> in the original file.
>
> Solution 1 is what I use today, and it gives the right result in
> Latex, and a result I can live with in HTML. Not very pretty, but
> understandable, and I can convert it to &ndash; in a post-processing
> step.

That may work for you but it's unacceptable to me.

> I think solution 2 would require too much AI if we don't want to
> settle for a hyphen as an interval indicator. That's not very pretty
> in most fonts.

I think most readers don't know or care about the difference between hyphen, em-dash, en-dash, minus, and other similar characters. I'm happy settling for a hyphen.

> I think solution 3 requires use of Unicode through out the tool
> chain. Unfortunately, there is no support for that today.

There is support, more and more. Mac OS X comes with full Unicode and localization support. The TextEdit app supports UTF-8. Windows has good support too, I believe. I'm sure GNU/Linux & the BSDs must have support packages available.

> Sending Unicode files as email doesn't work very well.

I haven't had any trouble. (I assume you mean UTF-8 or something; no such thing as Unicode files.) I've made this message UTF-8 encoded. Here's a hyphen (-), an en-dash (–), an em-dash (—), and all three with spaces between (- – —). Some accented e's: éèëê. Can you read this?

> I have no ndash key on my keyboard.

Nor do I. But I do have a Unicode character palette. And I have keyboard shortcuts for common characters. En-dash is [option]-[hyphen], and em-dash is [shift]-[option]-[hyphen]. I know Windows has a character palette; does it have human-usable keyboard shortcuts? ([alt]-123 character codes don't count.)

> In the long run I do feel that Unicode would be best. I'm all ears
> for tips that will make Unicode more viable for text editing etc.

The closest we're going to get to a character entity subsystem is with substitutions and the new "unicode" directive: <http://docutils.sf.net/spec/rst/directives.html#unicode-character-codes>

I intend to convert the ISO 8879 and ISO 9573-13 character entity sets into reStructuredText include files, as begun by David Priest (<http://article.gmane.org/gmane.comp.python.documentation/432>). I'd be very happy if somebody beat me to it. (See <http://www.w3.org/TR/MathML2/chapter6.html#chars_entity-tables>; original data files available in the MathML2 archive, <http://www.w3.org/TR/MathML2/XHTML-MathML-20010221.zip>.)

--
David Goodger http://starship.python.net/~goodger
Programmer/sysadmin for hire: http://starship.python.net/~goodger/cv |
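As a rough illustration of what such include files might contain, the sketch below generates substitution definitions that use the "unicode" directive. It is an assumption-laden stand-in: it reads Python's `html.entities` table (the HTML 4 entity names) rather than the ISO 8879 data files mentioned above, and the helper name `substitution_line` is hypothetical.

```python
import unicodedata
from html.entities import name2codepoint

# Assumption: the HTML 4 entity table stands in for the ISO entity sets.
def substitution_line(entity_name):
    """Build one reStructuredText substitution definition for an entity."""
    codepoint = name2codepoint[entity_name]
    description = unicodedata.name(chr(codepoint))
    return ".. |%s| unicode:: U+%04X .. %s" % (entity_name, codepoint, description)

for name in ("ndash", "mdash", "eacute"):
    print(substitution_line(name))
```

A document that includes the generated definitions could then write |mdash| wherever the character is wanted, keeping the source file itself in plain ASCII.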
From: Magnus <ma...@th...> - 2003-05-28 14:54:34
|
At 16:39 2003-05-27 -0400, David Goodger wrote:
>I haven't had any trouble.

Where have you been then? In a pure MacOS and BeOS land of milk and honey? Please show me where that is! :)

>(I assume you mean UTF-8 or something; no
>such thing as Unicode files.)

You obviously knew what I meant. I could live with *any* UTF encoding that's consistently supported by the environments I use. It's not long ago that it was deep magic to get ISO 8859-1 to work on a Linux box, and I'm not sure it still works smoothly if you set up cygwin X11 on a Windows box. :(

>I've made this message UTF-8 encoded.
>Here's a hyphen (-), an en-dash (b
>with spaces between (- b
>this?

That was how your mail looked when it came through the mailing list. The one sent directly looked somewhat better, as does the web archive at
http://sourceforge.net/mailarchive/forum.php?thread_id=2416581&forum_id=8812

Direct email:

>I haven't had any trouble. (I assume you mean UTF-8 or something; no
>such thing as Unicode files.) I've made this message UTF-8 encoded.
>Here's a hyphen (-), an en-dash (–), an em-dash (—), and all three
>e
>with spaces between (- – —). Some accented e's: Ã©Ã¨Ã«Ãª. Can you read
>d
>this?

It doesn't look right to me: A with ~, copyright, A with ~, dieresis, A with ~, is << called a chevron?, A with ~, a raised little 2. That's not what you planned, is it?

Actually, the dashes look right to me, so it's partially correct. In Mozilla and Opera I can get the mail archive to look right for your message if I force the browser to UTF-8. In MS IE, ndash and mdash look like little black boxes, and UTF-8 is certainly not activated automagically by any of the browsers for that web page.

You see? The current computer environment is sadly not quite ready for Unicode. :(

For instance, mailing list software that places archives on web pages or makes digests would need to convert all messages to UTF-8 and give a correct header to the message. Each page can only have one encoding.

Not all email clients grok UTF. I use Eudora, and obviously it managed to show the dashes but failed with the little e's. (They were little e's, right?) Or were they just spoiled on the way?

Sigh. I hope UTF works with email in a few years. With text editors, it's probably a similar time span. The PythonWin editor, which I *like* (Scintilla based I think), will say that I'm in col 41 if I have a line of 20 å before the cursor. It counts bytes, not characters! It will still save the file as latin1. It will also remove one *byte* on backspace, so if you backspace over å it's not removed, but replaced with À. Also, if you copy and paste, it will paste UTF-8 into other apps without declaring that.

If your native language had been another than those three that can get along with US ASCII, you would have known that these kinds of problems are everywhere.

--
Magnus Lycka (It's really Lyckå), ma...@th...
Thinkware AB, Sweden, www.thinkware.se
I code Python ~ The shortest path from thought to working program |
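Both symptoms described in this message, the "A with ~" pairs and the column count of 41, follow from UTF-8 bytes being treated as single-byte characters. The snippet below is a small sketch of that effect using only Python's standard codecs; it does not model Eudora or PythonWin themselves.

```python
accented = "\u00e9\u00e8\u00eb\u00ea"      # the four accented e's: e-acute, e-grave, e-diaeresis, e-circumflex
utf8_bytes = accented.encode("utf-8")

# Reading UTF-8 bytes as Latin-1 turns every character into two.
misread = utf8_bytes.decode("latin-1")
print(misread)

# The damage is reversible as long as no byte was dropped on the way.
repaired = misread.encode("latin-1").decode("utf-8")
print(repaired == accented)

# An editor that counts bytes sees 40 units for twenty a-ring characters.
line = "\u00e5" * 20
print(len(line), len(line.encode("utf-8")))
```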
From: David G. <go...@py...> - 2003-05-28 20:18:56
|
[David Goodger]
>> I haven't had any trouble.

[Magnus Lyckå]
> Where have you been then? In a pure MacOS and BeOS land of
> milk and honey? Please show me where that is! :)

It is nice over here, yes. The built-in i18n and l10n support is superb. My wife is Japanese and we thought we'd have to buy a Japanese Language Kit for Mac OS X, like we did for OS 8 for the old machine. It was a pleasant surprise to find OS X has dozens of localizations built in, for the OS and many applications.

>> (I assume you mean UTF-8 or something; no
>> such thing as Unicode files.)
>
> You obviously knew what I mean.

I *assumed*, and I'm glad my assumption was correct. One must be precise in technical discussions.

> I could live with *any* UTF encoding
> that's consistently supported by the environments I use.

I don't mean to be flippant when I say this, but that's your problem. :-)

> It doesn't look right to me: A with ~, copyright, A with ~,
> dieresis, A with ~, is << called a chevron?, A with ~, a raised
> little 2. That's not what you planned, is it?

No, it isn't. I see your message as you describe it though. Your message came through as ``Content-Type: text/plain; charset="iso-8859-1"``. My earlier message came back to me through the list just fine, as ``Content-Type: text/plain; charset=UTF-8``. Clearly MailMan is handling encodings well. It seems like your email client can't handle UTF-8 though.

> Actually, the dashes look right to me, so it's partially correct. In
> Mozilla and Opera I can get the mail archive to look right for your
> message if I force the browser to UTF-8,

I was also able to get it to look right. The problem with that page was that it had mismatched encodings: ``<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">``.

> You see? The current computer environment is sadly not quite ready
> for Unicode. :(

Well, yours isn't ;-).

> Sigh. I hope UTF works with email in a few years. With text editors,
> it's probably a similar time span.

Given more pressure (& contributions) from users, it will happen.

> If your native language had been another than those three that can
> get along with US ASCII, you would have known that these kinds of
> problems are everywhere.

No need for hasty assumptions. My second language was French, and my third was Japanese. I've seen much worse garbage than this, working with Japanese encodings since 1991. All I can say is that Docutils plays nice.

--
David Goodger http://starship.python.net/~goodger
Programmer/sysadmin for hire: http://starship.python.net/~goodger/cv |