From: <mi...@us...> - 2012-10-16 12:40:48
Revision: 7530 http://docutils.svn.sourceforge.net/docutils/?rev=7530&view=rev Author: milde Date: 2012-10-16 12:40:36 +0000 (Tue, 16 Oct 2012) Log Message: ----------- Add SmartQuotes transform for typographic quotes and dashes. Modified Paths: -------------- trunk/docutils/COPYING.txt trunk/docutils/FAQ.txt trunk/docutils/HISTORY.txt trunk/docutils/docs/dev/todo.txt trunk/docutils/docs/ref/transforms.txt trunk/docutils/docutils/parsers/rst/__init__.py trunk/docutils/docutils/transforms/universal.py Added Paths: ----------- trunk/docutils/docutils/utils/smartquotes.py trunk/docutils/test/test_transforms/test_smartquotes.py Modified: trunk/docutils/COPYING.txt =================================================================== --- trunk/docutils/COPYING.txt 2012-10-15 12:15:55 UTC (rev 7529) +++ trunk/docutils/COPYING.txt 2012-10-16 12:40:36 UTC (rev 7530) @@ -53,17 +53,17 @@ that have not yet been invented or conceived. (This dedication is derived from the text of the `Creative Commons -Public Domain Dedication -<http://creativecommons.org/licenses/publicdomain>`_. [#]_) +Public Domain Dedication`. [#]_) .. [#] Creative Commons has `retired this legal tool`__ and does not recommend that it be applied to works: This tool is based on United - States law and may not be applicable outside the US. For dedicating - new works to the public domain, Creative Commons recommend CC0_. So - does the Free Software Foundation in its license-list_. + States law and may not be applicable outside the US. For dedicating new + works to the public domain, Creative Commons recommend the replacement + Public Domain Dedication CC0_ (CC zero, "No Rights Reserved"). So does + the Free Software Foundation in its license-list_. __ http://creativecommons.org/retiredlicenses - .. _CC0: http://creativecommons.org/publicdomain/zero/1.0/legalcode + .. _CC0: http://creativecommons.org/about/cc0 Exceptions ========== @@ -76,18 +76,30 @@ <http://www.twinhelix.com>. 
Free usage permitted as long as this notice remains intact. -* docutils/math/__init__.py, - docutils/math/latex2mathml.py, +* docutils/utils/math/__init__.py, + docutils/utils/math/latex2mathml.py, docutils/writers/xetex/__init__.py, docutils/writers/latex2e/docutils-05-compat.sty, docs/user/docutils-05-compat.sty.txt, - docutils/error_reporting.py: + docutils/utils/error_reporting.py, + docutils/test/transforms/test_smartquotes.py: Copyright © Günter Milde. Released under the terms of the `2-Clause BSD license`_ (`local copy <licenses/BSD-2-Clause.txt>`__). -* docutils/math/math2html.py, +* docutils/utils/smartquotes.py + + Copyright © 2011 Günter Milde, + based on `SmartyPants`_ © 2003 John Gruber + (released under a 3-Clause BSD license included in the file) + and smartypants.py © 2004, 2007 Chad Miller. + Released under the terms of the `2-Clause BSD license`_ + (`local copy <licenses/BSD-2-Clause.txt>`__). + + .. _SmartyPants: http://daringfireball.net/projects/smartypants/ + +* docutils/utils/math/math2html.py, docutils/writers/html4css1/math.css Copyright © Alex Fernández Modified: trunk/docutils/FAQ.txt =================================================================== --- trunk/docutils/FAQ.txt 2012-10-15 12:15:55 UTC (rev 7529) +++ trunk/docutils/FAQ.txt 2012-10-16 12:40:36 UTC (rev 7530) @@ -1223,3 +1223,25 @@ sentence-end-double-space: t fill-column: 70 End: + +.. 
Here's some CSS to make a table colourful:: + + /* Table: */ + + th { + background-color: #ede; + } + + /* alternating colors in table rows */ + table.docutils tr:nth-child(even) { + background-color: #F3F3FF; + } + table.docutils tr:nth-child(odd) { + background-color: #FFFFEE; + } + + table.docutils tr { + border-style: solid none solid none; + border-width: 1px 0 1px 0; + border-color: #AAAAAA; + } Modified: trunk/docutils/HISTORY.txt =================================================================== --- trunk/docutils/HISTORY.txt 2012-10-15 12:15:55 UTC (rev 7529) +++ trunk/docutils/HISTORY.txt 2012-10-16 12:40:36 UTC (rev 7530) @@ -38,9 +38,9 @@ - Fix [ 3546533 ] Unicode error with `date` directive. -* docutils/setup.py +* docutils/transforms/universal.py - - Add ``math.css`` stylesheet to data files (thanks to Dmitry Shachnev). + - Add SmartQuotes transform for typographic quotes and dashes. * docutils/writers/html4css1/__init__.py @@ -65,8 +65,12 @@ - Fix [ 3552403 ] Prevent broken PyXML from replacing the stdlib's xml module. -* docutils/tools/test/test_buildhtml.py +* setup.py + - Tag ``math.css`` stylesheet as data file (patch by Dmitry Shachnev). + +* tools/test/test_buildhtml.py + - Fix [ 3521167 ] allow running in any directory. - Fix [ 3521168 ] allow running with Python 3. @@ -74,7 +78,7 @@ Release 0.9.1 (2012-06-17) ========================== -* docutils/setup.py +* setup.py - Fix [ 3527842 ]. Under Python 3, converted tests and tools were installed in the PYTHONPATH. Converted tests are now @@ -84,13 +88,13 @@ ``setup.py install`` under Python 3, remove the spurious ``test/`` and ``tools/`` directories in the site library root. -* docutils/test/ +* test/ - Make tests independent from the location of the ``test/`` directory. - Use converted sources (from the ``build/`` directory) for tests under Python 3. -* docutils/tools/ +* tools/ - Make tools compatible with both Python 2 and 3 without 2to3 conversion. @@ -104,7 +108,7 @@ - Fix [ 3525847 ].
Catch and report UnicodeEncodeError with ``locale == C`` and 8-bit char in path argument of `include` directive. -* docutils/test/alltests.py +* test/alltests.py - class `Tee`: catch UnicodeError when writing to "ascii" stream or file under Python 3. @@ -211,7 +215,7 @@ - Fix [ 3364658 ] (Change last file with Apache license to BSD-2-Clause) and [ 3395920 ] (correct copyright info for rst.el). -* docutils/test/ +* test/ - Apply [ 3303733 ] and [ 3365041 ] to fix tests under Py3k. Modified: trunk/docutils/docs/dev/todo.txt =================================================================== --- trunk/docutils/docs/dev/todo.txt 2012-10-15 12:15:55 UTC (rev 7529) +++ trunk/docutils/docs/dev/todo.txt 2012-10-16 12:40:36 UTC (rev 7530) @@ -1925,13 +1925,15 @@ supports Zotero databases and CSL_ styles with Docutils with an ``xcite`` role. - .. _CSL: http://www.citationstyles.org/ + * `sphinxcontrib-bibtex`_ Sphinx extension with "bibliography" + directive and "cite" role supporting BibTeX databases. - * Automatically insert a "References" heading? .. _CrossTeX: http://www.cs.cornell.edu/people/egs/crosstex/ .. _Pybtex: http://pybtex.sourceforge.net/ +.. _CSL: http://www.citationstyles.org/ +.. _sphinxcontrib-bibtex: http://sphinxcontrib-bibtex.readthedocs.org/ * _`Reference Merging` @@ -2021,25 +2023,7 @@ * _`Index Generation` -* _`Beautify` - Convert quotes and dashes to typographically correct entities. - Sphinx does this with ``smartypants.py``. - - Write a generic version that uses Unicode chars - (let the writer replace these if required). - - Some arguments for "smart quotes" are collected in a `mail to - docutils-user by Jörg W. Mittag from 2006-03-13`__. - - Also see the "... Or Not To Do?" list entry for - `Character Processing`_ - -__ http://article.gmane.org/gmane.text.docutils.user/2765 - -.. 
_Character Processing: rst/alternatives.html#character-processing - - HTML Writer =========== Modified: trunk/docutils/docs/ref/transforms.txt =================================================================== --- trunk/docutils/docs/ref/transforms.txt 2012-10-15 12:15:55 UTC (rev 7529) +++ trunk/docutils/docs/ref/transforms.txt 2012-10-16 12:40:36 UTC (rev 7530) @@ -11,16 +11,39 @@ .. contents:: +Transforms change the document tree in-place, add to the tree, or prune it. +Transforms resolve references and footnote numbers, process interpreted +text, and do other context-sensitive processing. Each transform is a +subclass of ``docutils.transforms.Transform``. -For background about transforms and the Transformer object, see `PEP -258`_. +There are `transforms added by components`_; others (e.g. +``parts.Contents``) are added by the parser if a corresponding directive is +found in the document. +To add a transform, components (objects inheriting from +docutils.Component like Readers, Parsers, Writers, Input, Output) override +the ``get_transforms()`` method of their base class. After the Reader has +finished processing, the Publisher calls +``Transformer.populate_from_components()`` with a list of components, and all +transforms returned by the components' ``get_transforms()`` methods are +stored in a `transformer object` attached to the document tree. + + +For more about transforms and the Transformer object, see also `PEP +258`_. (The ``default_transforms()`` attribute of component classes mentioned +there is deprecated. Use the ``get_transforms()`` method instead.) + .. _PEP 258: ../peps/pep-0258.html#transformer Transforms Listed in Priority Order =================================== +Transform classes each have a default_priority attribute which is used by +the Transformer to apply transforms in order (low to high). The default +priority can be overridden when adding transforms to the Transformer object.
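A minimal sketch of the ordering rule just described, assuming only that transforms run in ascending priority and that an explicitly supplied priority replaces the default. The names and priorities are copied from the priority table in this file; `application_order` and its `overrides` mapping are hypothetical and are not the Docutils `Transformer` API.

```python
from typing import Dict, List, Optional, Tuple

# (name, default_priority) pairs, copied from the priority table in transforms.txt
REGISTERED: List[Tuple[str, int]] = [
    ("universal.TestMessages", 880),
    ("universal.FilterMessages", 870),
    ("universal.SmartQuotes", 850),
]


def application_order(transforms: List[Tuple[str, int]],
                      overrides: Optional[Dict[str, int]] = None) -> List[str]:
    """Return transform names in the order they would be applied (low to high).

    `overrides` is a hypothetical name -> priority mapping standing in for a
    priority given explicitly when a transform is added.
    """
    overrides = overrides or {}
    effective = [(overrides.get(name, default), name)
                 for name, default in transforms]
    # sorted() is stable: equal priorities keep their registration order
    return [name for _, name in sorted(effective, key=lambda pair: pair[0])]


print(application_order(REGISTERED))
print(application_order(REGISTERED, overrides={"universal.SmartQuotes": 890}))
```

With the defaults, SmartQuotes (850) runs before FilterMessages (870); raising its priority to 890 moves it to the end.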
+ + ============================== ============================ ======== Transform: module.Class Added By Priority ============================== ============================ ======== @@ -83,12 +106,14 @@ universal.FilterMessages Writer (w) 870 +universal.SmartQuotes Parser 850 + universal.TestMessages DocutilsTestSupport 880 writer_aux.Compound newlatex2e (w) 910 writer_aux.Admonitions html4css1 (w), 920 - newlatex2e (w) + latex2e (w) misc.CallBack n/a 990 ============================== ============================ ======== @@ -119,3 +144,64 @@ 800 899 very late 900 999 very late (non-standard) ==== ==== ================================================ + + +Transforms added by components +=============================== + + +readers.Reader: + | universal.Decorations, + | universal.ExposeInternals, + | universal.StripComments + +readers.ReReader: + None + +readers.standalone.Reader: + | references.Substitutions, + | references.PropagateTargets, + | frontmatter.DocTitle, + | frontmatter.SectionSubTitle, + | frontmatter.DocInfo, + | references.AnonymousHyperlinks, + | references.IndirectHyperlinks, + | references.Footnotes, + | references.ExternalTargets, + | references.InternalTargets, + | references.DanglingReferences, + | misc.Transitions + +readers.pep.Reader: + | references.Substitutions, + | references.PropagateTargets, + | references.AnonymousHyperlinks, + | references.IndirectHyperlinks, + | references.Footnotes, + | references.ExternalTargets, + | references.InternalTargets, + | references.DanglingReferences, + | misc.Transitions, + | peps.Headers, + | peps.Contents, + | peps.TargetNotes + +parsers.rst.Parser + universal.SmartQuotes + +writers.Writer: + | universal.Messages, + | universal.FilterMessages, + | universal.StripClassesAndElements + +writers.UnfilteredWriter + None + +writers.latex2e.Writer + writer_aux.Admonitions + +writers.html4css1.Writer: + writer_aux.Admonitions + +writers.odf_odt.Writer: + removes references.DanglingReferences Modified: 
trunk/docutils/docutils/parsers/rst/__init__.py =================================================================== --- trunk/docutils/docutils/parsers/rst/__init__.py 2012-10-15 12:15:55 UTC (rev 7529) +++ trunk/docutils/docutils/parsers/rst/__init__.py 2012-10-16 12:40:36 UTC (rev 7530) @@ -73,7 +73,8 @@ import docutils.parsers import docutils.statemachine from docutils.parsers.rst import states -from docutils import frontend, nodes +from docutils import frontend, nodes, Component +from docutils.transforms import universal class Parser(docutils.parsers.Parser): @@ -136,7 +137,12 @@ '"long", "short", or "none (no parsing)". Default is "short".', ['--syntax-highlight'], {'choices': ['long', 'short', 'none'], - 'default': 'long', 'metavar': '<format>'}),)) + 'default': 'long', 'metavar': '<format>'}), + ('Change straight quotation marks to typographic form: ' + 'one of "yes", "no", "alt[ernative]" (default "no").', + ['--smart-quotes'], + {'default': False, 'validator': frontend.validate_ternary}), + )) config_section = 'restructuredtext parser' config_section_dependencies = ('parsers',) @@ -149,6 +155,10 @@ self.state_classes = states.state_classes self.inliner = inliner + def get_transforms(self): + return Component.get_transforms(self) + [ + universal.SmartQuotes] + def parse(self, inputstring, document): """Parse `inputstring` and populate `document`, a document tree.""" self.setup_parse(inputstring, document) @@ -321,7 +331,7 @@ and the line number added. Preferably use the `debug`, `info`, `warning`, `error`, or `severe` - wrapper methods, e.g. ``self.error(message)`` to generate an + wrapper methods, e.g. ``self.error(message)`` to generate an ERROR-level directive error. 
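The new ``--smart-quotes`` option is validated with ``frontend.validate_ternary``, so the setting may end up as ``True``, ``False``, or a free string such as ``"alt"``. ``SmartQuotes.apply()`` only returns early when the setting is exactly ``False``. The following is a rough standalone sketch of that three-way normalisation. It assumes the validator maps common boolean spellings and passes anything else through unchanged. The ``BOOLEANS`` table, ``normalize_ternary``, and ``smart_quotes_enabled`` are hypothetical names, not Docutils functions.

```python
from typing import Union

Setting = Union[str, bool, None]

# Hypothetical table of accepted boolean spellings
BOOLEANS = {"1": True, "on": True, "yes": True, "true": True,
            "0": False, "off": False, "no": False, "false": False, "": False}


def normalize_ternary(value: Setting) -> Setting:
    """Map boolean-like strings to True/False and return other values unchanged."""
    if isinstance(value, bool) or value is None:
        return value
    return BOOLEANS.get(value.strip().lower(), value)


def smart_quotes_enabled(setting: Setting) -> bool:
    """Mirror the early return in SmartQuotes.apply(): only False disables it."""
    return setting is not False


for raw in ("yes", "no", "alt"):
    value = normalize_ternary(raw)
    print(raw, "->", repr(value), "enabled:", smart_quotes_enabled(value))
```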
""" return DirectiveError(level, message) Modified: trunk/docutils/docutils/transforms/universal.py =================================================================== --- trunk/docutils/docutils/transforms/universal.py 2012-10-15 12:15:55 UTC (rev 7529) +++ trunk/docutils/docutils/transforms/universal.py 2012-10-16 12:40:36 UTC (rev 7530) @@ -19,8 +19,8 @@ import time from docutils import nodes, utils from docutils.transforms import TransformError, Transform +from docutils.utils import smartquotes - class Decorations(Transform): """ @@ -201,3 +201,25 @@ node['classes'].remove(class_value) if class_value in self.strip_elements: return 1 + +class SmartQuotes(Transform): + + """ + Replace ASCII quotation marks with typographic form. + + Also replace multiple dashes with em-dashes and en-dashes. + """ + + default_priority = 850 + + def apply(self): + if self.document.settings.smart_quotes is False: + return + for node in self.document.traverse(nodes.Text): + if isinstance(node.parent, + (nodes.FixedTextElement, nodes.literal)): + # print "literal", node + continue + newtext = smartquotes.smartyPants(node.astext(), attr='2') + node.parent.replace(node, nodes.Text(newtext)) + # print "smartquote", Added: trunk/docutils/docutils/utils/smartquotes.py =================================================================== --- trunk/docutils/docutils/utils/smartquotes.py (rev 0) +++ trunk/docutils/docutils/utils/smartquotes.py 2012-10-16 12:40:36 UTC (rev 7530) @@ -0,0 +1,830 @@ +#!/usr/bin/python +# -*- coding: utf8 -*- + +# :Id: $Id$ +# :Copyright: © 2011 Günter Milde, +# original `SmartyPants`_: © 2003 John Gruber +# smartypants.py: © 2004, 2007 Chad Miller +# :License: Released under the terms of the `2-Clause BSD license`_, in short: +# +# Copying and distribution of this file, with or without modification, +# are permitted in any medium without royalty provided the copyright +# notices and this notice are preserved. +# This file is offered as-is, without any warranty. 
+# +# .. _2-Clause BSD license: http://www.spdx.org/licenses/BSD-2-Clause + + +r""" +======================== +SmartyPants for Docutils +======================== + +Synopsis +======== + +Smart-quotes for Docutils. + +The original "SmartyPants" is a free web publishing plug-in for Movable Type, +Blosxom, and BBEdit that easily translates plain ASCII punctuation characters +into "smart" typographic punctuation characters. + +`smartypants.py` endeavours to be a functional port of +SmartyPants to Python, for use with Pyblosxom_. + +`smartquotes.py` is an adaptation of SmartyPants to Docutils_. By using Unicode +characters instead of HTML entities for typographic quotes, it works for any +output format that supports Unicode. + +Authors +======= + +`John Gruber`_ did all of the hard work of writing this software in Perl for +`Movable Type`_ and almost all of this useful documentation. `Chad Miller`_ +ported it to Python to use with Pyblosxom_. +Adapted to Docutils_ by Günter Milde. + +Additional Credits +================== + +Portions of the SmartyPants original work are based on Brad Choate's nifty +MTRegex plug-in. `Brad Choate`_ also contributed a few bits of source code to +this plug-in. Brad Choate is a fine hacker indeed. + +`Jeremy Hedley`_ and `Charles Wiltgen`_ deserve mention for exemplary beta +testing of the original SmartyPants. + +`Rael Dornfest`_ ported SmartyPants to Blosxom. + +.. _Brad Choate: http://bradchoate.com/ +.. _Jeremy Hedley: http://antipixel.com/ +.. _Charles Wiltgen: http://playbacktime.com/ +.. _Rael Dornfest: http://raelity.org/ + + +Copyright and License +===================== + +SmartyPants_ license (3-Clause BSD license): + + Copyright (c) 2003 John Gruber (http://daringfireball.net/) + All rights reserved.
+ + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + + * Neither the name "SmartyPants" nor the names of its contributors + may be used to endorse or promote products derived from this + software without specific prior written permission. + + This software is provided by the copyright holders and contributors + "as is" and any express or implied warranties, including, but not + limited to, the implied warranties of merchantability and fitness for + a particular purpose are disclaimed. In no event shall the copyright + owner or contributors be liable for any direct, indirect, incidental, + special, exemplary, or consequential damages (including, but not + limited to, procurement of substitute goods or services; loss of use, + data, or profits; or business interruption) however caused and on any + theory of liability, whether in contract, strict liability, or tort + (including negligence or otherwise) arising in any way out of the use + of this software, even if advised of the possibility of such damage. + +smartypants.py license (2-Clause BSD license): + + smartypants.py is a derivative work of SmartyPants. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + + This software is provided by the copyright holders and contributors + "as is" and any express or implied warranties, including, but not + limited to, the implied warranties of merchantability and fitness for + a particular purpose are disclaimed. In no event shall the copyright + owner or contributors be liable for any direct, indirect, incidental, + special, exemplary, or consequential damages (including, but not + limited to, procurement of substitute goods or services; loss of use, + data, or profits; or business interruption) however caused and on any + theory of liability, whether in contract, strict liability, or tort + (including negligence or otherwise) arising in any way out of the use + of this software, even if advised of the possibility of such damage. + +.. _John Gruber: http://daringfireball.net/ +.. _Chad Miller: http://web.chad.org/ + +.. _Pyblosxom: http://pyblosxom.bluesock.org/ +.. _SmartyPants: http://daringfireball.net/projects/smartypants/ +.. _Movable Type: http://www.movabletype.org/ +.. _2-Clause BSD license: http://www.spdx.org/licenses/BSD-2-Clause +.. _Docutils: http://docutils.sf.net/ + +Description +=========== + +SmartyPants can perform the following transformations: + +- Straight quotes ( " and ' ) into "curly" quote characters +- Backticks-style quotes (\`\`like this'') into "curly" quote characters +- Dashes (``--`` and ``---``) into en- and em-dash entities +- Three consecutive dots (``...`` or ``. . .``) into an ellipsis entity + +This means you can write, edit, and save your posts using plain old +ASCII straight quotes, plain dashes, and plain dots, but your published +posts (and final HTML output) will appear with smart quotes, em-dashes, +and proper ellipses. 
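For illustration, the dash, ellipsis, and backtick-quote transformations listed above can be approximated with plain ``str.replace`` calls. The sketch applies them in the same order as the main loop in ``smartyPants()`` (dashes, then ellipses, then backticks) and uses the old-school dash shorthand that the ``SmartQuotes`` transform requests with ``attr='2'``. It is a simplified sketch (``educate_basic`` is a hypothetical name) and ignores straight quotes, escapes, and token context.

```python
# Characters used by the `smart` class in smartquotes.py
EN_DASH, EM_DASH, ELLIPSIS = "\u2013", "\u2014", "\u2026"
LDQUO, RDQUO = "\u201c", "\u201d"


def educate_basic(text: str) -> str:
    """Simplified pipeline: dashes, then ellipses, then backtick-style quotes."""
    # Old-school shorthand: "---" is an em dash, "--" an en dash.
    # Replace the longer run first so "---" is not read as "--" + "-".
    text = text.replace("---", EM_DASH).replace("--", EN_DASH)
    # Both ellipsis spellings mentioned in the docstring
    text = text.replace("...", ELLIPSIS).replace(". . .", ELLIPSIS)
    # Backtick-style double quotes
    return text.replace("``", LDQUO).replace("''", RDQUO)


print(educate_basic("``Wait...'' -- pages 10--12 --- she said"))
```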
+ +SmartyPants does not modify characters within ``<pre>``, ``<code>``, ``<kbd>``, +``<math>`` or ``<script>`` tag blocks. Typically, these tags are used to +display text where smart quotes and other "smart punctuation" would not be +appropriate, such as source code or example markup. + + +Backslash Escapes +================= + +If you need to use literal straight quotes (or plain hyphens and +periods), SmartyPants accepts the following backslash escape sequences +to force non-smart punctuation. It does so by transforming the escape +sequence into a character: + +======== ===== ========= +Escape Value Character +======== ===== ========= +``\\\\`` \ \\ +\\" " " +\\' ' ' +\\. . . +\\- - \- +\\` ` \` +======== ===== ========= + +This is useful, for example, when you want to use straight quotes as +foot and inch marks: 6'2" tall; a 17" iMac. + +Options +======= + +For Pyblosxom users, the ``smartypants_attributes`` attribute is where you +specify configuration options. + +Numeric values are the easiest way to configure SmartyPants' behavior: + +"0" + Suppress all transformations. (Do nothing.) +"1" + Performs default SmartyPants transformations: quotes (including + \`\`backticks'' -style), em-dashes, and ellipses. "``--``" (dash dash) + is used to signify an em-dash; there is no support for en-dashes. + +"2" + Same as smarty_pants="1", except that it uses the old-school typewriter + shorthand for dashes: "``--``" (dash dash) for en-dashes, "``---``" + (dash dash dash) + for em-dashes. + +"3" + Same as smarty_pants="2", but inverts the shorthand for dashes: + "``--``" (dash dash) for em-dashes, and "``---``" (dash dash dash) for + en-dashes. + +"-1" + Stupefy mode. Reverses the SmartyPants transformation process, turning + the characters produced by SmartyPants into their ASCII equivalents. + E.g. "“" is turned into a simple double-quote ("), "—" is + turned into two dashes, etc. 
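The numeric options "2" and "3" differ only in which dash each shorthand produces. The hypothetical ``educate_dashes`` helper below is a sketch, not the module's code, and shows both mappings. The longer ``---`` run must be replaced first so it is not split into ``--`` plus ``-``.

```python
EN_DASH, EM_DASH = "\u2013", "\u2014"


def educate_dashes(text: str, attr: str) -> str:
    """Sketch of the "2" (old-school) and "3" (inverted) dash shorthands."""
    if attr == "2":
        triple, double = EM_DASH, EN_DASH   # "---" em dash, "--" en dash
    elif attr == "3":
        triple, double = EN_DASH, EM_DASH   # "---" en dash, "--" em dash
    else:
        raise ValueError("this sketch only covers attr '2' and '3'")
    # Replace the longer run first so "---" is not read as "--" plus "-".
    return text.replace("---", triple).replace("--", double)


print(educate_dashes("1990--2000 --- roughly", "2"))
print(educate_dashes("1990---2000 -- roughly", "3"))
```

Both calls print the same string, because option "3" simply swaps the two shorthands.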
+ + +The following single-character attribute values can be combined to toggle +individual transformations from within the smarty_pants attribute. For +example, to educate normal quotes and em-dashes, but not ellipses or +\`\`backticks'' -style quotes: + +``py['smartypants_attributes'] = "qd"`` + +"q" + Educates normal quote characters: (") and ('). + +"b" + Educates \`\`backticks'' -style double quotes. + +"B" + Educates \`\`backticks'' -style double quotes and \`single' quotes. + +"d" + Educates em-dashes. + +"D" + Educates em-dashes and en-dashes, using old-school typewriter shorthand: + (dash dash) for en-dashes, (dash dash dash) for em-dashes. + +"i" + Educates em-dashes and en-dashes, using inverted old-school typewriter + shorthand: (dash dash) for em-dashes, (dash dash dash) for en-dashes. + +"e" + Educates ellipses. + +"w" + Translates any instance of ``&quot;`` into a normal double-quote character. + This should be of no interest to most people, but of particular interest + to anyone who writes their posts using Dreamweaver, as Dreamweaver + inexplicably uses this entity to represent a literal double-quote + character. SmartyPants only educates normal quotes, not entities (because + ordinarily, entities are used for the explicit purpose of representing the + specific character they represent). The "w" option must be used in + conjunction with one (or both) of the other quote options ("q" or "b"). + Thus, if you wish to apply all SmartyPants transformations (quotes, en- + and em-dashes, and ellipses) and also translate ``&quot;`` entities into + regular quotes so SmartyPants can educate them, you should pass the + following to the smarty_pants attribute: + + +Caveats +======= + +Why You Might Not Want to Use Smart Quotes in Your Weblog +--------------------------------------------------------- + +For one thing, you might not care. + +Most normal, mentally stable individuals do not take notice of proper +typographic punctuation.
Many design and typography nerds, however, break +out in a nasty rash when they encounter, say, a restaurant sign that uses +a straight apostrophe to spell "Joe's". + +If you're the sort of person who just doesn't care, you might well want to +continue not caring. Using straight quotes -- and sticking to the 7-bit +ASCII character set in general -- is certainly a simpler way to live. + +Even if you *do* care about accurate typography, you still might want to +think twice before educating the quote characters in your weblog. One side +effect of publishing curly quote characters is that it makes your +weblog a bit harder for others to quote from using copy-and-paste. What +happens is that when someone copies text from your blog, the copied text +contains the 8-bit curly quote characters (as well as the 8-bit characters +for em-dashes and ellipses, if you use these options). These characters +are not standard across different text encoding methods, which is why they +need to be encoded as HTML entities. + +People copying text from your weblog, however, may not notice that you're +using curly quotes, and they'll go ahead and paste the unencoded 8-bit +characters copied from their browser into an email message or their own +weblog. When pasted as raw "smart quotes", these characters are likely to +get mangled beyond recognition. + +That said, my own opinion is that any decent text editor or email client +makes it easy to stupefy smart quote characters into their 7-bit +equivalents, and I don't consider it my problem if you're using an +indecent text editor or email client. + + +Algorithmic Shortcomings +------------------------ + +One situation in which quotes will get curled the wrong way is when +apostrophes are used at the start of leading contractions. For example: + +``'Twas the night before Christmas.`` + +In the case above, SmartyPants will turn the apostrophe into an opening +single-quote, when in fact it should be a closing one.
I don't think +this problem can be solved in the general case -- every word processor +I've tried gets this wrong as well. In such cases, it's best to use the +proper character for closing single-quotes (``’``) by hand. + + +Version History +=============== + +1.6: 2010-08-26 + - Adaption to Docutils: + - Use Unicode instead of HTML entities, + - Remove code special to pyblosxom. + +1.5_1.6: Fri, 27 Jul 2007 07:06:40 -0400 + - Fixed bug where blocks of precious unalterable text were instead + interpreted. Thanks to Le Roux and Dirk van Oosterbosch. + +1.5_1.5: Sat, 13 Aug 2005 15:50:24 -0400 + - Fix bogus magical quotation when there is no hint that the + user wants it, e.g., in "21st century". Thanks to Nathan Hamblen. + - Be smarter about quotes before terminating numbers in an en-dash'ed + range. + +1.5_1.4: Thu, 10 Feb 2005 20:24:36 -0500 + - Fix a date-processing bug, as reported by jacob childress. + - Begin a test-suite for ensuring correct output. + - Removed import of "string", since I didn't really need it. + (This was my first ever Python program. Sue me!) + +1.5_1.3: Wed, 15 Sep 2004 18:25:58 -0400 + - Abort processing if the flavour is in forbidden-list. Default of + [ "rss" ] (Idea of Wolfgang SCHNERRING.) + - Remove stray virgules from en-dashes. Patch by Wolfgang SCHNERRING. + +1.5_1.2: Mon, 24 May 2004 08:14:54 -0400 + - Some single quotes weren't replaced properly. Diff-tesuji played + by Benjamin GEIGER. + +1.5_1.1: Sun, 14 Mar 2004 14:38:28 -0500 + - Support upcoming pyblosxom 0.9 plugin verification feature. + +1.5_1.0: Tue, 09 Mar 2004 08:08:35 -0500 + - Initial release +""" + +default_smartypants_attr = "1" + +import re + +class smart(object): + """Smart quotes and dashes + + TODO: internationalization, see e.g.
+ http://de.wikipedia.org/wiki/Anf%C3%BChrungszeichen#Andere_Sprachen + """ + endash = u'–' # "–" EN DASH + emdash = u'—' # "—" EM DASH + lquote = u'‘' # "‘" LEFT SINGLE QUOTATION MARK + rquote = u'’' # "’" RIGHT SINGLE QUOTATION MARK + #lquote = u'‚' # "‚" SINGLE LOW-9 QUOTATION MARK (German) + ldquote = u'“' # "“" LEFT DOUBLE QUOTATION MARK + rdquote = u'”' # "”" RIGHT DOUBLE QUOTATION MARK + #ldquote = u'„' # "„" DOUBLE LOW-9 QUOTATION MARK (German) + ellipsis = u'…' # "…" HORIZONTAL ELLIPSIS + +def smartyPants(text, attr=default_smartypants_attr): + convert_quot = "0" # translate &quot; entities into normal quotes? + + # Parse attributes: + # 0 : do nothing + # 1 : set all + # 2 : set all, using old school en- and em- dash shortcuts + # 3 : set all, using inverted old school en and em- dash shortcuts + # + # q : quotes + # b : backtick quotes (``double'' only) + # B : backtick quotes (``double'' and `single') + # d : dashes + # D : old school dashes + # i : inverted old school dashes + # e : ellipses + # w : convert &quot; entities to " for Dreamweaver users + + skipped_tag_stack = [] + do_dashes = "0" + do_backticks = "0" + do_quotes = "0" + do_ellipses = "0" + do_stupefy = "0" + + if attr == "0": + # Do nothing. + return text + elif attr == "1": + do_quotes = "1" + do_backticks = "1" + do_dashes = "1" + do_ellipses = "1" + elif attr == "2": + # Do everything, turn all options on, use old school dash shorthand. + do_quotes = "1" + do_backticks = "1" + do_dashes = "2" + do_ellipses = "1" + elif attr == "3": + # Do everything, turn all options on, use inverted old school dash shorthand. + do_quotes = "1" + do_backticks = "1" + do_dashes = "3" + do_ellipses = "1" + elif attr == "-1": + # Special "stupefy" mode.
+ do_stupefy = "1" + else: + for c in attr: + if c == "q": do_quotes = "1" + elif c == "b": do_backticks = "1" + elif c == "B": do_backticks = "2" + elif c == "d": do_dashes = "1" + elif c == "D": do_dashes = "2" + elif c == "i": do_dashes = "3" + elif c == "e": do_ellipses = "1" + elif c == "w": convert_quot = "1" + else: + pass + # ignore unknown option + + tokens = _tokenize(text) + result = [] + in_pre = False + + prev_token_last_char = "" + # This is a cheat, used to get some context + # for one-character tokens that consist of + # just a quote char. What we do is remember + # the last character of the previous text + # token, to use as context to curl single- + # character quote tokens correctly. + + for cur_token in tokens: + t = cur_token[1] + last_char = t[-1:] # Remember last char of this token before processing. + if not in_pre: + oldstr = t + t = processEscapes(t) + + if convert_quot != "0": + t = re.sub('&quot;', '"', t) + + if do_dashes != "0": + if do_dashes == "1": + t = educateDashes(t) + if do_dashes == "2": + t = educateDashesOldSchool(t) + if do_dashes == "3": + t = educateDashesOldSchoolInverted(t) + + if do_ellipses != "0": + t = educateEllipses(t) + + # Note: backticks need to be processed before quotes. + if do_backticks != "0": + t = educateBackticks(t) + + if do_backticks == "2": + t = educateSingleBackticks(t) + + if do_quotes != "0": + if t == "'": + # Special case: single-character ' token + if re.match(r"\S", prev_token_last_char): + t = smart.rquote + else: + t = smart.lquote + elif t == '"': + # Special case: single-character " token + if re.match(r"\S", prev_token_last_char): + t = smart.rdquote + else: + t = smart.ldquote + + else: + # Normal case: + t = educateQuotes(t) + + if do_stupefy == "1": + t = stupefyEntities(t) + + prev_token_last_char = last_char + result.append(t) + + return "".join(result) + + +def educateQuotes(str): + """ + Parameter: String. + + Returns: The string, with "educated" curly quote characters.
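The "cheat" described in the loop comments is to curl a one-character quote token by looking at the last character of the previous token. That reduces to a small decision. The helper ``curl_lone_quote`` is a hypothetical sketch that mirrors the ``re.match(r"\S", prev_token_last_char)`` test. The token list in the usage part is made up for illustration and is not output of ``_tokenize``.

```python
import re

LSQUO, RSQUO = "\u2018", "\u2019"
LDQUO, RDQUO = "\u201c", "\u201d"


def curl_lone_quote(quote: str, prev_last_char: str) -> str:
    """Curl a token that is just ' or ".

    A non-whitespace character right before the quote means it closes;
    whitespace, or no context at all (empty string), means it opens.
    """
    closing = re.match(r"\S", prev_last_char) is not None
    if quote == "'":
        return RSQUO if closing else LSQUO
    if quote == '"':
        return RDQUO if closing else LDQUO
    raise ValueError("expected a single ' or \" character")


# Made-up token list; each token's last char is remembered *before* processing
tokens = ["He said ", '"', "hi", '"']
prev, out = "", []
for tok in tokens:
    out.append(curl_lone_quote(tok, prev) if tok in ("'", '"') else tok)
    prev = tok[-1:]
print("".join(out))
```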
+ + Example input: "Isn't this fun?" + Example output: “Isn’t this fun?” + """ + + oldstr = str + punct_class = r"""[!"#\$\%'()*+,-.\/:;<=>?\@\[\\\]\^_`{|}~]""" + + # Special case if the very first character is a quote + # followed by punctuation at a non-word-break. Close the quotes by brute force: + str = re.sub(r"""^'(?=%s\\B)""" % (punct_class,), smart.rquote, str) + str = re.sub(r"""^"(?=%s\\B)""" % (punct_class,), smart.rdquote, str) + + # Special case for double sets of quotes, e.g.: + # <p>He said, "'Quoted' words in a larger quote."</p> + str = re.sub(r""""'(?=\w)""", smart.ldquote+smart.lquote, str) + str = re.sub(r"""'"(?=\w)""", smart.lquote+smart.ldquote, str) + + # Special case for decade abbreviations (the '80s): + str = re.sub(r"""\b'(?=\d{2}s)""", smart.rquote, str) + + close_class = r"""[^\ \t\r\n\[\{\(\-]""" + dec_dashes = r"""–|—""" + + # Get most opening single quotes: + opening_single_quotes_regex = re.compile(r""" + ( + \s | # a whitespace char, or + &nbsp; | # a non-breaking space entity, or + -- | # dashes, or + &[mn]dash; | # named dash entities + %s | # or decimal entities + &\#x201[34]; # or hex + ) + ' # the quote + (?=\w) # followed by a word character + """ % (dec_dashes,), re.VERBOSE) + str = opening_single_quotes_regex.sub(r'\1'+smart.lquote, str) + + closing_single_quotes_regex = re.compile(r""" + (%s) + ' + (?!\s | s\b | \d) + """ % (close_class,), re.VERBOSE) + str = closing_single_quotes_regex.sub(r'\1'+smart.rquote, str) + + closing_single_quotes_regex = re.compile(r""" + (%s) + ' + (\s | s\b) + """ % (close_class,), re.VERBOSE) + str = closing_single_quotes_regex.sub(r'\1%s\2' % smart.rquote, str) + + # Any remaining single quotes should be opening ones: + str = re.sub(r"""'""", smart.lquote, str) + + # Get most opening double quotes: + opening_double_quotes_regex = re.compile(r""" + ( + \s | # a whitespace char, or + &nbsp; | # a non-breaking space entity, or + -- | # dashes, or + &[mn]dash; | # named dash entities + %s | # or decimal
entities + &\#x201[34]; # or hex + ) + " # the quote + (?=\w) # followed by a word character + """ % (dec_dashes,), re.VERBOSE) + str = opening_double_quotes_regex.sub(r'\1'+smart.ldquote, str) + + # Double closing quotes: + closing_double_quotes_regex = re.compile(r""" + #(%s)? # character that indicates the quote should be closing + " + (?=\s) + """ % (close_class,), re.VERBOSE) + str = closing_double_quotes_regex.sub(smart.rdquote, str) + + closing_double_quotes_regex = re.compile(r""" + (%s) # character that indicates the quote should be closing + " + """ % (close_class,), re.VERBOSE) + str = closing_double_quotes_regex.sub(r'\1'+smart.rdquote, str) + + # Any remaining quotes should be opening ones. + str = re.sub(r'"', smart.ldquote, str) + + return str + + +def educateBackticks(str): + """ + Parameter: String. + Returns: The string, with ``backticks'' -style double quotes + translated into HTML curly quote entities. + Example input: ``Isn't this fun?'' + Example output: “Isn't this fun?“; + """ + + str = re.sub(r"""``""", smart.ldquote, str) + str = re.sub(r"""''""", smart.rdquote, str) + return str + + +def educateSingleBackticks(str): + """ + Parameter: String. + Returns: The string, with `backticks' -style single quotes + translated into HTML curly quote entities. + + Example input: `Isn't this fun?' + Example output: ‘Isn’t this fun?’ + """ + + str = re.sub(r"""`""", smart.lquote, str) + str = re.sub(r"""'""", smart.rquote, str) + return str + + +def educateDashes(str): + """ + Parameter: String. + + Returns: The string, with each instance of "--" translated to + an em-dash character. + """ + + str = re.sub(r"""---""", smart.endash, str) # en (yes, backwards) + str = re.sub(r"""--""", smart.emdash, str) # em (yes, backwards) + return str + + +def educateDashesOldSchool(str): + """ + Parameter: String. + + Returns: The string, with each instance of "--" translated to + an en-dash character, and each "---" translated to + an em-dash character. 
+    """
+
+    str = re.sub(r"""---""", smart.emdash, str) # em  (yes, backwards)
+    str = re.sub(r"""--""", smart.endash, str)  # en  (yes, backwards)
+    return str
+
+
+def educateDashesOldSchoolInverted(str):
+    """
+    Parameter:  String.
+
+    Returns:    The string, with each instance of "--" translated to
+                an em-dash character, and each "---" translated to
+                an en-dash character. Two reasons why: First, unlike the
+                en- and em-dash syntax supported by
+                educateDashesOldSchool(), it's compatible with existing
+                entries written before SmartyPants 1.1, back when "--" was
+                only used for em-dashes. Second, em-dashes are more
+                common than en-dashes, and so it sort of makes sense that
+                the shortcut should be shorter to type. (Thanks to Aaron
+                Swartz for the idea.)
+    """
+    str = re.sub(r"""---""", smart.endash, str) # en
+    str = re.sub(r"""--""", smart.emdash, str)  # em
+    return str
+
+
+
+def educateEllipses(str):
+    """
+    Parameter:  String.
+    Returns:    The string, with each instance of "..." translated to
+                an ellipsis character.
+
+    Example input:  Huh...?
+    Example output: Huh…?
+    """
+
+    str = re.sub(r"""\.\.\.""", smart.ellipsis, str)
+    str = re.sub(r"""\. \. \.""", smart.ellipsis, str)
+    return str
+
+
+def stupefyEntities(str):
+    """
+    Parameter:  String.
+    Returns:    The string, with each SmartyPants character translated to
+                its ASCII counterpart.
+
+    Example input:  “Hello — world.”
+    Example output: "Hello -- world."
+    """
+
+    str = re.sub(smart.endash, "-", str)   # en-dash
+    str = re.sub(smart.emdash, "--", str)  # em-dash
+
+    str = re.sub(smart.lquote, "'", str)   # open single quote
+    str = re.sub(smart.rquote, "'", str)   # close single quote
+
+    str = re.sub(smart.ldquote, '"', str)  # open double quote
+    str = re.sub(smart.rdquote, '"', str)  # close double quote
+
+    str = re.sub(smart.ellipsis, '...', str) # ellipsis
+
+    return str
+
+
+def processEscapes(str):
+    r"""
+    Parameter:  String.
+    Returns:    The string, after processing the following backslash
+                escape sequences.
+                This is useful if you want to force a "dumb"
+                quote or other character to appear.
+
+                Escape  Value
+                ------  -----
+                \\      &#92;
+                \"      &#34;
+                \'      &#39;
+                \.      &#46;
+                \-      &#45;
+                \`      &#96;
+    """
+    str = re.sub(r"""\\\\""", r"""&#92;""", str)
+    str = re.sub(r'''\\"''', r"""&#34;""", str)
+    str = re.sub(r"""\\'""", r"""&#39;""", str)
+    str = re.sub(r"""\\\.""", r"""&#46;""", str)
+    str = re.sub(r"""\\-""", r"""&#45;""", str)
+    str = re.sub(r"""\\`""", r"""&#96;""", str)
+
+    return str
+
+
+def _tokenize(str):
+    """
+    Parameter:  String containing HTML markup.
+    Returns:    Reference to an array of the tokens comprising the input
+                string. Each token is either a tag (possibly with nested
+                tags contained therein, such as <a href="<MTFoo>">), or a
+                run of text between tags. Each element of the array is a
+                two-element array; the first is either 'tag' or 'text';
+                the second is the actual value.
+
+    Based on the _tokenize() subroutine from Brad Choate's MTRegex plugin.
+        <http://www.bradchoate.com/past/mtregex.php>
+    """
+
+    pos = 0
+    length = len(str)
+    tokens = []
+
+    depth = 6
+    nested_tags = "|".join(['(?:<(?:[^<>]',] * depth) + (')*>)' * depth)
+    #match = r"""(?: <! ( -- .*? -- \s* )+ > ) |  # comments
+    #       (?: <\? .*? \?> ) |  # directives
+    #       %s  # nested tags       """ % (nested_tags,)
+    tag_soup = re.compile(r"""([^<]*)(<[^>]*>)""")
+
+    token_match = tag_soup.search(str)
+
+    previous_end = 0
+    while token_match is not None:
+        if token_match.group(1):
+            tokens.append(['text', token_match.group(1)])
+
+        tokens.append(['tag', token_match.group(2)])
+
+        previous_end = token_match.end()
+        token_match = tag_soup.search(str, token_match.end())
+
+    if previous_end < len(str):
+        tokens.append(['text', str[previous_end:]])
+
+    return tokens
+
+
+
+if __name__ == "__main__":
+
+    import locale
+
+    try:
+        locale.setlocale(locale.LC_ALL, '')
+    except:
+        pass
+
+    from docutils.core import publish_string
+    docstring_html = publish_string(__doc__, writer_name='html')
+
+    print docstring_html
+
+
+    # Unit test output goes to stderr. No worries.
+    import unittest
+    sp = smartyPants
+
+    class TestSmartypantsAllAttributes(unittest.TestCase):
+        # the default attribute is "1", which means "all".
+
+        def test_dates(self):
+            self.assertEqual(sp("1440-80's"), u"1440-80’s")
+            self.assertEqual(sp("1440-'80s"), u"1440-‘80s")
+            self.assertEqual(sp("1440---'80s"), u"1440–‘80s")
+            self.assertEqual(sp("1960s"), "1960s")  # no effect.
+            self.assertEqual(sp("1960's"), u"1960’s")
+            self.assertEqual(sp("one two '60s"), u"one two ‘60s")
+            self.assertEqual(sp("'60s"), u"‘60s")
+
+        def test_ordinal_numbers(self):
+            self.assertEqual(sp("21st century"), "21st century")  # no effect.
+            self.assertEqual(sp("3rd"), "3rd")  # no effect.
+
+        def test_educated_quotes(self):
+            self.assertEqual(sp('''"Isn't this fun?"'''), u'“Isn’t this fun?”')
+
+    unittest.main()
+
+
+
+
+__author__ = "Chad Miller <sma...@ch...>"
+__version__ = "1.5_1.6: Fri, 27 Jul 2007 07:06:40 -0400"
+__url__ = "http://wiki.chad.org/SmartyPantsPy"
+__description__ = "Smart-quotes, smart-ellipses, and smart-dashes for weblog entries in pyblosxom"

Property changes on: trunk/docutils/docutils/utils/smartquotes.py
___________________________________________________________________
Added: svn:keywords
   + Author Date Id Revision
Added: svn:eol-style
   + native

Added: trunk/docutils/test/test_transforms/test_smartquotes.py
===================================================================
--- trunk/docutils/test/test_transforms/test_smartquotes.py	                        (rev 0)
+++ trunk/docutils/test/test_transforms/test_smartquotes.py	2012-10-16 12:40:36 UTC (rev 7530)
@@ -0,0 +1,51 @@
+#!/usr/bin/env python
+# -*- coding: utf8 -*-
+
+# $Id$
+
+# :Copyright: © 2011 Günter Milde.
+# :License: Released under the terms of the `2-Clause BSD license`_, in short:
+#
+#    Copying and distribution of this file, with or without modification,
+#    are permitted in any medium without royalty provided the copyright
+#    notice and this notice are preserved.
+#    This file is offered as-is, without any warranty.
+#
+# .. _2-Clause BSD license: http://www.spdx.org/licenses/BSD-2-Clause
+
+"""
+Test module for universal.SmartQuotes transform.
+"""
+
+
+from __init__ import DocutilsTestSupport # must be imported before docutils
+from docutils.transforms.universal import SmartQuotes
+from docutils.parsers.rst import Parser
+
+def suite():
+    parser = Parser()
+    s = DocutilsTestSupport.TransformTestSuite(
+        parser, suite_settings={'smart_quotes': True})
+    s.generateTests(totest)
+    return s
+
+
+totest = {}
+
+totest['transitions'] = ((SmartQuotes,), [
+["""\
+Test "smart quotes", 'single smart quotes'
+-- and ---also long--- dashes.
+""",
+u"""\
+<document source="test data">
+    <paragraph>
+        Test “smart quotes”, ‘single smart quotes’
+        – and —also long— dashes.
+"""],
+])
+
+
+if __name__ == '__main__':
+    import unittest
+    unittest.main(defaultTest='suite')

Property changes on: trunk/docutils/test/test_transforms/test_smartquotes.py
___________________________________________________________________
Added: svn:keywords
   + Author Date Id Revision
Added: svn:eol-style
   + native
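The three dash helpers in smartquotes.py (`educateDashes`, `educateDashesOldSchool`, `educateDashesOldSchoolInverted`) differ only in which character each shortcut becomes, and all of them must replace `---` before `--`. The following is a minimal Python 3 sketch of that logic, not the docutils code: the function name `educate_dashes` and the constants `EN_DASH`/`EM_DASH` are hypothetical, and the Unicode characters are assumed stand-ins for `smart.endash`/`smart.emdash`, which are defined outside this excerpt.

```python
# Assumed stand-ins for smart.endash / smart.emdash (not shown in the diff).
EN_DASH = "\u2013"
EM_DASH = "\u2014"


def educate_dashes(text, mode="1"):
    """Hypothetical sketch of the do_dashes modes "1", "2" and "3".

    mode "1": "---" -> en dash, "--" -> em dash  (as educateDashes)
    mode "2": "---" -> em dash, "--" -> en dash  (as educateDashesOldSchool)
    mode "3": "---" -> en dash, "--" -> em dash  (as educateDashesOldSchoolInverted)
    """
    if mode == "2":
        triple, double = EM_DASH, EN_DASH
    else:
        # Modes "1" and "3" perform the same substitutions in the diff.
        triple, double = EN_DASH, EM_DASH
    # Replace the longer shortcut first; otherwise "--" would consume the
    # first two hyphens of every "---" and leave a stray "-" behind.
    text = text.replace("---", triple)
    return text.replace("--", double)
```

As committed, `educateDashes()` and `educateDashesOldSchoolInverted()` really do perform identical substitutions, which is why modes `"1"` and `"3"` share one branch here. With mode `"2"` the sketch reproduces the expected output in `test_smartquotes.py` ("– and —also long— dashes.").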
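`smartyPants()` splits its input with `_tokenize()` and remembers the last character of the previous token (`prev_token_last_char`), so that a token consisting of nothing but a quote can still be curled correctly. Below is a minimal Python 3 sketch of that idea under stated assumptions: `tokenize`, `curl_lone_quote`, `educate_lone_quotes` and the quote constants are hypothetical names; `re.finditer` replaces the explicit `search()` loop (equivalent here, because the pattern can never match an empty string); and, unlike the committed loop, which treats every token alike, this sketch passes tags through untouched and only tracks text tokens. Tokens other than lone quotes are left alone rather than passed to an `educateQuotes` equivalent.

```python
import re

# Assumed English curly quotes (stand-ins for smart.lquote, smart.rquote, ...).
LSQUO, RSQUO, LDQUO, RDQUO = "\u2018", "\u2019", "\u201c", "\u201d"
PAIRS = {"'": (LSQUO, RSQUO), '"': (LDQUO, RDQUO)}

# Same "tag soup" pattern as _tokenize(): a run of text, then one tag.
TAG_SOUP = re.compile(r"([^<]*)(<[^>]*>)")


def tokenize(s):
    """Split s into ("text", ...) and ("tag", ...) pairs."""
    tokens = []
    previous_end = 0
    for m in TAG_SOUP.finditer(s):
        if m.group(1):
            tokens.append(("text", m.group(1)))
        tokens.append(("tag", m.group(2)))
        previous_end = m.end()
    if previous_end < len(s):
        tokens.append(("text", s[previous_end:]))
    return tokens


def curl_lone_quote(token, prev_last_char):
    """Curl a token that is a single quote character, using context."""
    if token in PAIRS:
        opening, closing = PAIRS[token]
        # A non-space character before the quote means it closes something.
        return closing if re.match(r"\S", prev_last_char) else opening
    return token


def educate_lone_quotes(s):
    result = []
    prev_last_char = ""
    for kind, value in tokenize(s):
        if kind == "text":
            last_char = value[-1:]  # remember before any replacement
            value = curl_lone_quote(value, prev_last_char)
            prev_last_char = last_char
        result.append(value)
    return "".join(result)
```

For example, in `<p>"<em>Hi</em>` the quote is its own token with no preceding text, so it opens; in `<em>Hi</em>"` the previous text token ends in `i`, so it closes. Skipping tags matters for the first case: if `>` from `<p>` counted as context, the quote would be curled as a closing one.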
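`processEscapes()` runs before all other rules and turns backslash-escaped characters into numeric character references. The later passes match the literal `"`, `'`, `-`, `.` and `` ` `` characters, so the escaped characters slip past them and stay "dumb". A table-driven Python 3 sketch of the same substitutions follows; `ESCAPES` and `process_escapes` are hypothetical names, and plain `str.replace` is used because every pattern is a literal string.

```python
# Backslash escapes and the numeric character references they become,
# in the same order as the re.sub() calls in processEscapes().
ESCAPES = (
    ("\\\\", "&#92;"),  # \\ -> backslash (must come first)
    ('\\"', "&#34;"),   # \" -> double quote
    ("\\'", "&#39;"),   # \' -> single quote
    ("\\.", "&#46;"),   # \. -> period
    ("\\-", "&#45;"),   # \- -> hyphen
    ("\\`", "&#96;"),   # \` -> backtick
)


def process_escapes(text):
    """Hypothetical sketch: protect escaped characters from later rules."""
    for escape, reference in ESCAPES:
        text = text.replace(escape, reference)
    return text
```

Handling the escaped backslash first, as the committed code does, means an input of `\\'` becomes `&#92;'` (an escaped backslash followed by a quote that can still be curled) rather than `\&#39;`.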
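`stupefyEntities()` reverses the education step, mapping every smart character back to an ASCII approximation. Because each pattern is a single literal character, `str.replace` does the same job as `re.sub` here, without relying on the characters being free of regex metacharacters. A minimal Python 3 sketch, assuming the `smart` attributes hold the English curly quotes, en/em dashes, and ellipsis (the `smart` object is defined outside this excerpt); `STUPEFY` and `stupefy` are hypothetical names.

```python
# Assumed values of smart.endash, smart.emdash, smart.lquote, ... and the
# ASCII text stupefyEntities() substitutes for each of them.
STUPEFY = (
    ("\u2013", "-"),    # en dash
    ("\u2014", "--"),   # em dash
    ("\u2018", "'"),    # opening single quote
    ("\u2019", "'"),    # closing single quote
    ("\u201c", '"'),    # opening double quote
    ("\u201d", '"'),    # closing double quote
    ("\u2026", "..."),  # ellipsis
)


def stupefy(text):
    """Hypothetical sketch: replace smart punctuation with plain ASCII."""
    for smart_char, ascii_text in STUPEFY:
        text = text.replace(smart_char, ascii_text)
    return text
```

With these assumptions the sketch reproduces the docstring example: `“Hello — world.”` becomes `"Hello -- world."`. Note that the round trip is lossy: an en dash comes back as a single hyphen, which the dash rules will not re-educate.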