From: Firat O. <ozg...@gm...> - 2009-08-28 21:26:54
|
Hello,

Recently, I have added Sphinx support to my website, where I write Turkish documentation for the Python programming language. As you might know, Sphinx automatically creates permalinks to the headlines in the Table of Contents. For instance, if the headline is named "join method", then the permalink to this headline will show up in the URL of the page as "www.example.com/strings.html#join-method". If the characters in the headline are ASCII, there is no problem. However, when we are dealing with non-English texts, each non-ASCII character is converted into a "-" sign, producing output like "www.example.com/sozluk.html#s-zl-k-elerini-alfabe-s-ras-na-dizmek". This renders the "#s-zl-k-elerini-alfabe-s-ras-na-dizmek" part of the permalink unreadable to the Turkish reader.

The Sphinx project uses Docutils, specifically the "nodes" module, to convert strings into identifiers... So I thought it would be a good idea to convert Turkish-specific characters in the permalinks into their ASCII look-alikes. For example, in the above string ("s-zl-k-elerini-alfabe-s-ras-na-dizmek"), Docutils crops the Turkish-specific "ö" (dotted o), "ü" (dotted u), "ğ" (g with overline-cedilla) and "ı" (undotted i) to create conforming id characters. Instead of cropping these characters altogether, we can turn them into their ASCII look-alikes to produce an output such as "sozluk-ogelerini-alfabe-sirasina-dizmek", which is much more readable...

For my own project I patched the original nodes.py to have the desired effect. So my question is: do you think this is acceptable, and should I submit the patch to the Docutils project?

By the way, the platform I work on is GNU/Linux (Ubuntu Jaunty Jackalope) with python-docutils-0.5.2 installed.

Note: You can visit http://www.istihza.com/py3/icindekiler_python.html if you would like to see how the patched version of "nodes.py" behaves.

Thanks,
Firat |
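The behaviour described in the message above can be sketched roughly as follows. This is not the Docutils source and not Firat's patch: `ascii_only_id`, `turkish_friendly_id` and `TURKISH_LOOKALIKES` are hypothetical names, the regular expressions are an assumption about how an ASCII-only identifier routine might work, and the headline text is reconstructed from the id quoted in the message.

```python
import re

# Assumed stand-in for an ASCII-only identifier routine: every run of
# characters outside a-z and 0-9 collapses into a single hyphen.
_non_id_chars = re.compile(r"[^a-z0-9]+")
_non_id_at_ends = re.compile(r"^[-0-9]+|-+$")


def ascii_only_id(text):
    """Lowercase the text and replace non-ASCII-alphanumeric runs with '-'."""
    ident = _non_id_chars.sub("-", text.lower())
    return _non_id_at_ends.sub("", ident)


# Hypothetical look-alike table for the four letters named in the message
# (plus their upper-case forms where one exists outside ASCII).
TURKISH_LOOKALIKES = {
    0x00F6: "o",  # o with diaeresis
    0x00FC: "u",  # u with diaeresis
    0x011F: "g",  # g with breve
    0x0131: "i",  # dotless i
    0x00D6: "O",
    0x00DC: "U",
    0x011E: "G",
}


def turkish_friendly_id(text):
    """Map Turkish letters to ASCII look-alikes before building the id."""
    return ascii_only_id(text.translate(TURKISH_LOOKALIKES))


HEADLINE = "Sözlük Öğelerini Alfabe Sırasına Dizmek"
print(ascii_only_id(HEADLINE))        # s-zl-k-elerini-alfabe-s-ras-na-dizmek
print(turkish_friendly_id(HEADLINE))  # sozluk-ogelerini-alfabe-sirasina-dizmek
```

A fixed table like this only covers the letters it lists, which is why the replies in this thread look at more generic approaches.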
From: Laura C. <la...@op...> - 2009-08-29 09:28:08
|
Hi Firat (and everybody else).

I forwarded your article to the diversity mailing list (div...@py...), where we are talking about how to make things more accessible for speakers of non-English and non-European languages. I got this back off list. Hope it helps.

Laura

To: Laura Creighton <la...@op...>
From: James Bennett <ube...@gm...>
Subject: Re: [Diversity] This showed up in docutils-develop

<snip>

(off-list since it's more docutils-related, but I'm not currently subscribed to that one)

There's a little utility function in Django that does the right thing here ("Sözlük Öğelerini Alfabe Sırasına Dizmek" becomes "sozluk-ogelerini-alfabe-srasna-dizmek"). It's literally just a couple lines of code to do the conversion, and is just relying on the stdlib unicodedata module:

http://code.djangoproject.com/browser/django/trunk/django/template/defaultfilters.py?rev=11477#L222

If someone in docutils/Sphinx would like to drop that, or a functional equivalent, in, it should solve this pretty easily.

----------------- |
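For readers who cannot follow the link, the unicodedata-based technique James describes can be sketched like this. It is a minimal approximation written for this thread, not a copy of the Django function; `slugify_sketch` is a hypothetical name.

```python
import re
import unicodedata


def slugify_sketch(value):
    """Approximate a unicodedata-based slug: decompose, drop non-ASCII, hyphenate."""
    # NFKD splits letters such as "ö" into a base letter plus a combining mark.
    value = unicodedata.normalize("NFKD", value)
    # Dropping everything outside ASCII removes the combining marks, and also
    # any letter that has no decomposition at all.
    value = value.encode("ascii", "ignore").decode("ascii")
    # Remove remaining punctuation, then turn whitespace runs into hyphens.
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    return re.sub(r"[-\s]+", "-", value)


print(slugify_sketch("Sözlük Öğelerini Alfabe Sırasına Dizmek"))
# sozluk-ogelerini-alfabe-srasna-dizmek
```

Note that the dotless "ı" has no Unicode decomposition, so it is dropped rather than turned into "i". That is why the result reads "srasna" instead of "sirasina".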
From: Firat O. <ozg...@gm...> - 2009-08-29 10:24:16
|
Hello Laura,

Thank you very much for your interest. I subscribed to the diversity mailing list and posted an e-mail about the issue (though the post has not appeared in the list yet). This is the e-mail I posted to diversity:

Hello,

My name is Firat Ozgul. I am the one who posted the e-mail in docutils-develop mailing list relating to the Turkish characters cropped from url permalinks...

As Laura stated, the solution is just a few lines of code. In my own project, I use the patched version of nodes.py. This is the code I use, if relevant:

http://paste-it.net/public/ub37a24/

...and this is the patch:

http://www.istihza.com/denemeler/turkish.patch

Note that, besides converting Turkish-specific characters into their ASCII look-alikes, I also delete "apostrophe" from the permalink to prevent cluttered output.

Thanks a lot for your interest. |
From: Guenter M. <mi...@us...> - 2009-08-31 07:36:03
|
On 2009-08-28, Firat Ozgul wrote:

> Hello,
> Recently, I have added Sphinx support to my website where I write Turkish
> documentation for Python Programming Language.
...
> problem. However, when we are dealing with non-English texts, non-ASCII
> characters will be converted into "-" sign to produce an output like "
> www.example.com/sozluk.html#s-zl-k-elerini-alfabe-s-ras-na-dizmek".
...
> Sphinx project uses Docutils, specifically "nodes" module, to convert
> strings into identifiers...
> ... instead of cropping these characters altogether, we can turn them
> into their ASCII look-alikes to produce an output such as
> "sozluk-ogelerini-alfabe-sirasina-dizmek", which is much more readable...
> For my own project I patched the original nodes.py to have the desired
> effect. So my question is: Do you think this is acceptable and I should
> submit the patch to the Docutils project?

Unfortunately, Sphinx uses an outdated version of the nodes.py module.

The SVN version of Docutils' nodes.py (
# $Id: nodes.py 6011 2009-07-09 10:00:07Z gbrandl $
) already does this replacement in a more generic way (not only for Turkish
but also other "latin-extended" characters): there is a dictionary::

    _non_id_translate = {
        0x00f8: u'o',       # o with stroke
        0x0111: u'd',       # d with stroke
        0x0127: u'h',       # h with stroke
        0x0131: u'i',       # dotless i
        ...

which you might try out and eventually complete.

Thanks for reporting.

Günter |
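As a rough illustration of how such a dictionary can be used, the sketch below applies the four entries quoted above with `str.translate` and then strips the remaining accents. It is not the Docutils implementation: `make_id_sketch` is a hypothetical name, the table is deliberately limited to the entries shown in the message, and combining it with NFKD normalization is an assumption made for this example.

```python
import re
import unicodedata

# Only the four entries quoted in the message; the real table is longer.
_non_id_translate = {
    0x00F8: "o",  # o with stroke
    0x0111: "d",  # d with stroke
    0x0127: "h",  # h with stroke
    0x0131: "i",  # dotless i
}
_non_id_chars = re.compile(r"[^a-z0-9]+")
_non_id_at_ends = re.compile(r"^[-0-9]+|-+$")


def make_id_sketch(text):
    """Build an ASCII identifier, keeping look-alikes for listed letters."""
    ident = text.lower()
    # Letters without a Unicode decomposition need an explicit table entry.
    ident = ident.translate(_non_id_translate)
    # Assumed extra step: decompose accented letters and drop the marks.
    ident = unicodedata.normalize("NFKD", ident)
    ident = ident.encode("ascii", "ignore").decode("ascii")
    # Collapse anything that is still not a-z or 0-9 into single hyphens.
    ident = _non_id_chars.sub("-", " ".join(ident.split()))
    return _non_id_at_ends.sub("", ident)


print(make_id_sketch("Sözlük Öğelerini Alfabe Sırasına Dizmek"))
# sozluk-ogelerini-alfabe-sirasina-dizmek
```

The table handles letters such as the dotless "ı" that normalization alone would drop, so the two steps complement each other.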
From: Firat O. <ozg...@gm...> - 2009-08-31 09:26:29
|
Hello Günter,

Thanks for the information. I will check that.

Fırat |