test case (tables and headings with combining chars)
Fixed for section headings.
The fix for tables is nontrivial, as both stringlist.get_2D_block() in statemachine.py and
SimpleTableParser.check_columns() in tableparser.py must be made to account for
zero-width combining chars.
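To illustrate why slicing table lines by string index goes wrong, here is a minimal sketch using only the standard library. It is not the code from statemachine.py or tableparser.py; the helper names `base_char_indices` and `slice_columns` are made up for this example, and East Asian wide characters are ignored to keep it short.

```python
import unicodedata

def base_char_indices(text):
    """Map each display column to the string index of its base character.

    Combining characters occupy no column of their own, so they are skipped.
    (Hypothetical helper; wide characters are not handled in this sketch.)
    """
    return [i for i, ch in enumerate(text) if not unicodedata.combining(ch)]

def slice_columns(line, left, right):
    """Return the part of `line` shown in display columns left..right-1."""
    indices = base_char_indices(line)
    indices.append(len(line))            # sentinel: one past the last column
    left = min(left, len(indices) - 1)
    right = min(right, len(indices) - 1)
    return line[indices[left]:indices[right]]

border = "+----+----+"
row = "| a\u0301b | cd |"                # "a" + U+0301 renders as one column

# The row is one code point longer than the border it lines up with.
print(len(border), len(row))             # 11 12

# Slicing by string index cuts the second cell in the wrong place ...
print(repr(row[6:10]))                   # '| cd'

# ... while slicing by display column recovers both cells.
print(repr(slice_columns(row, 1, 5)))    # ' áb ' (with a combining accent)
print(repr(slice_columns(row, 6, 10)))   # ' cd '
```

Any code that extracts a rectangular block of text or checks column boundaries would need this kind of index translation on every line, which is why the table fix touches more than one place.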
Fixed for headings and simple tables.
Fixing grid tables is an open task. The tools are in docutils.utils,
but the grid table parser is rather complex.
In http://permalink.gmane.org/gmane.text.docutils.devel/7679
Edvard d'auvergne pointed to https://pypi.python.org/pypi/wcwidth, a Python implementation of the wcswidth C function.
We already have similar functions in docutils.utils, used for section headings and simple tables.
However, these currently do not handle zero-width characters.
TODO:
Fix grid tables (difficult, see attachment for a try).
Consider additional zero-width characters (ZWSP, WJ, ZWNBSP, ...) in table parsing and section heading recognition.
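The second TODO item can be illustrated with a small sketch. ZWSP, WJ and ZWNBSP are format characters with combining class 0, so a check based only on `unicodedata.combining()` does not catch them. The set `EXTRA_ZERO_WIDTH` and the function names below are assumptions for this example, not existing Docutils code, and the set is not meant to be complete.

```python
import unicodedata

# Assumed set for illustration; a real fix might need more code points.
EXTRA_ZERO_WIDTH = {
    "\u200b",  # ZWSP, ZERO WIDTH SPACE
    "\u2060",  # WJ, WORD JOINER
    "\ufeff",  # ZWNBSP, ZERO WIDTH NO-BREAK SPACE
}

def is_zero_width(ch):
    """True for combining characters and the format characters listed above."""
    return unicodedata.combining(ch) != 0 or ch in EXTRA_ZERO_WIDTH

def visible_length(text):
    """Count characters that occupy a column (wide characters not handled)."""
    return sum(1 for ch in text if not is_zero_width(ch))

title = "Zero\u200bWidth"
print(len(title), visible_length(title))      # 10 9

# The combining-class test alone misses all three characters.
print([unicodedata.combining(ch) for ch in sorted(EXTRA_ZERO_WIDTH)])  # [0, 0, 0]
```

With a length measure like this, an underline of nine characters would match the title above, whereas a comparison based on `len()` would report it as too short.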
According to David, this "Seems like more trouble than it's worth."
http://permalink.gmane.org/gmane.text.docutils.devel/7707
+1
That was me adding weight, as the SF voting buttons are not visible for docutils tickets for non-members.
Cannot reproduce on latest master. I believe Python 3's native Unicode handling makes this a non-issue.
This is fixed for headings and simple tables but still fails for grid tables (not tested in the original test sample). The new sample fails here.
Even when adjusting the grid, the parser goes wrong. See [bugs:#512].
Related
Bugs: #512
Last edit: Günter Milde 2025-09-21
Fixed, now also for grid tables, in [r10251].
Related
Commit: [r10251]
Fixed in Docutils 0.22.3.