
#197 MathML: support pandoc(1) as an external converter

Milestone: None
Status: closed-accepted
Owner: nobody
Labels: None
Priority: 5
Updated: 2023-05-10
Created: 2022-10-28
Creator: Ximin Luo
Private: No

In my personal experience, pandoc is more reliable than latexml. For example, the current latexml support in docutils does not automatically load amsfonts, so commands like \mathbb do not work. In pandoc it Just Works.
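For readers unfamiliar with this conversion route, here is a minimal sketch of calling pandoc as an external LaTeX-to-MathML converter from Python. It assumes a `pandoc` executable on the PATH and uses pandoc's `--from`, `--to` and `--mathml` options. The helper names `pandoc_input` and `tex_to_mathml` are hypothetical, and this is not the code from the attached patch.

```python
import shutil
import subprocess
from typing import Tuple


def pandoc_input(tex: str, display: bool = False) -> str:
    """Wrap a LaTeX math snippet in delimiters that pandoc's LaTeX reader
    recognizes: $...$ for inline math, $$...$$ for display math."""
    delim = "$$" if display else "$"
    return f"{delim}{tex}{delim}"


def tex_to_mathml(tex: str, display: bool = False,
                  executable: str = "pandoc") -> Tuple[str, str]:
    """Convert a LaTeX math snippet to an HTML fragment containing MathML.

    Returns (html, diagnostics). pandoc may report math it cannot convert
    on stderr, so callers should inspect the second item.
    """
    if shutil.which(executable) is None:
        raise RuntimeError(f"{executable!r} not found on PATH")
    result = subprocess.run(
        [executable, "--from=latex", "--to=html", "--mathml"],
        input=pandoc_input(tex, display),
        capture_output=True,
        encoding="utf-8",  # text mode with an explicit encoding
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.strip(), result.stderr.strip()


if __name__ == "__main__" and shutil.which("pandoc"):
    html, warnings = tex_to_mathml(r"\mathbb{R}")
    print(html)
```

The explicit PATH check gives a clearer error than the `FileNotFoundError` that `subprocess.run` would otherwise raise when pandoc is missing.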

1 Attachment

Discussion

  • Günter Milde

    Günter Milde - 2022-11-01

    Thank you for the patch.

    What are the advantages over the native LaTeX -> MathML converter?

    A test of the new conversion route with the maths documentation source
    mathematics.txt revealed:

    • it is slow (considerably slower than the native converter but much faster than latexml),
    • there are more than 120 conversion errors (unknown AMSmath macros),
    • the error messages take some getting used to
      (instead of reporting an unknown macro, it says it expected "%", "\label", "\nonumber" or whitespace),
    • the successful conversions are similar to the output of the native MathML support.
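A test like the one above amounts to running each converter over the same formulas and recording run time and failures. A hypothetical harness for that might look like this. `evaluate` and `toy_convert` are illustrative stand-ins: a real run would plug in a wrapper around pandoc or the native converter, fed with the formulas from mathematics.txt.

```python
import time
from typing import Callable, Iterable, List, Tuple


def evaluate(convert: Callable[[str], str],
             formulas: Iterable[str]) -> Tuple[float, List[Tuple[str, str]]]:
    """Run `convert` on each formula.

    Returns the elapsed wall-clock time in seconds and a list of
    (formula, error message) pairs for the formulas that failed.
    """
    failures = []
    start = time.perf_counter()
    for tex in formulas:
        try:
            convert(tex)
        except Exception as err:  # count any converter error as a failure
            failures.append((tex, str(err)))
    return time.perf_counter() - start, failures


def toy_convert(tex: str) -> str:
    """Stand-in converter that rejects one macro, mimicking an unknown-macro error."""
    if r"\gsime" in tex:
        raise ValueError(r"unknown macro \gsime")
    return "<math/>"


elapsed, failures = evaluate(toy_convert, [r"\alpha", r"\mathbb{R}", r"a \gsime b"])
print(f"{len(failures)} failure(s) in {elapsed:.6f} s")
```

Collecting the failing formulas together with their messages, rather than just a count, makes it easier to compare how helpful each converter's error reports are.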

    BTW: a fix for "amsfonts" commands with latexml is ready and will soon be in the repository.

     
  • Ximin Luo

    Ximin Luo - 2022-11-01

    One big advantage of pandoc is that it supports {align} and {gather}.

    Pandoc and latex2mathml.py support different subsets of LaTeX. For example, latex2mathml.py does not support \gsime, but pandoc does.

    The pandoc list of supported symbols is here [1], and it appears to be bigger (~2800 symbols) than the one in latex2mathml.py (~100 symbols), although it does not support some commands, such as \underleftrightarrow, that latex2mathml does.

    Actually, it appears that [1] was written by you, and that version is 8 years old, so if you know where a newer version is, I can file a PR with them to support newer commands like \underleftrightarrow.

    (Another advantage over the other external tools is that it is still under active development; the last commit to the texmath component was 24 days ago.)

    [1] https://github.com/jgm/texmath/blob/master/lib/totexmath/unimathsymbols.txt
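Because the two converters cover different subsets of LaTeX, coverage can be checked per document: collect the macros a document uses and subtract what a converter supports. This is a minimal sketch. The two supported-macro sets are tiny hypothetical excerpts built only from the examples in this thread, not the real tables of pandoc or latex2mathml.

```python
import re
from typing import Iterable, List, Set

MACRO = re.compile(r"\\[A-Za-z]+")  # a backslash followed by letters


def used_macros(formulas: Iterable[str]) -> Set[str]:
    """Return every control word used in the given LaTeX formulas."""
    return {m for tex in formulas for m in MACRO.findall(tex)}


def missing_macros(formulas: Iterable[str], supported: Set[str]) -> List[str]:
    """Macros used in `formulas` that are not in `supported`, sorted."""
    return sorted(used_macros(formulas) - supported)


# Hypothetical excerpts reflecting only the examples discussed in this thread.
PANDOC_EXCERPT = {r"\mathbb", r"\gsime", r"\frac"}
LATEX2MATHML_EXCERPT = {r"\mathbb", r"\underleftrightarrow", r"\frac"}

formulas = [r"a \gsime b", r"\underleftrightarrow{AB}", r"\frac{1}{2}"]
print(missing_macros(formulas, PANDOC_EXCERPT))
print(missing_macros(formulas, LATEX2MATHML_EXCERPT))
```

A plain symbol count says little on its own. What matters for a given document is whether the macros it actually uses are covered.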

     
    • Günter Milde

      Günter Milde - 2022-11-01

      > One big advantage of pandoc is that it supports {align} and {gather}.

      I see. Active development and widespread use are further bonuses (especially
      over other external tools).

      > Pandoc and latex2mathml.py support different subsets of LaTeX. For
      > example, latex2mathml.py does not support \gsime, but pandoc does.

      > The pandoc list of supported symbols is here [1] and it appears to be
      > bigger (~2800 symbols) than the one in latex2mathml.py (~100 symbols)

      latex2mathml uses the same database and extracts the roughly 600 symbols
      supported by the most common math packages. It also supports literal
      Unicode characters in the input.
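The extraction described above could be sketched as a filter over the database. The sketch assumes the caret-separated layout of unimathsymbols.txt (code point, character, LaTeX command, unicode-math command, class, category, requirements, comments). As a simplification, it treats a command as "common" when its requirements field is empty or names exactly one allowed package. The real field also records conflicts and alternatives, the sample lines are illustrative rather than copied from the file, and latex2mathml's actual selection rules may differ.

```python
from typing import Iterable, Set


def common_commands(lines: Iterable[str], allowed: Set[str]) -> Set[str]:
    """Collect LaTeX commands (third field) whose requirements field
    (seventh field) is empty or exactly one of the `allowed` packages."""
    commands = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("^")
        if len(fields) < 7:
            continue  # not a data line in the assumed format
        latex_cmd, requirements = fields[2], fields[6].strip()
        if latex_cmd.startswith("\\") and (not requirements or requirements in allowed):
            commands.add(latex_cmd)
    return commands


# Illustrative lines in the assumed format (not copied from the database).
SAMPLE = [
    "# illustrative lines in the assumed format",
    r"02194^↔^\leftrightarrow^\leftrightarrow^R^mathrel^^",
    r"0228A^⊊^\subsetneq^\subsetneq^R^mathrel^amssymb^",
    r"027E6^⟦^\llbracket^\lBrack^O^mathopen^stmaryrd^",
]

print(sorted(common_commands(SAMPLE, {"amsmath", "amssymb"})))
```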

      > Actually, it appears that [1] was written by you, and that version is 8
      > years old, so if you know where a newer version is, I can file a PR
      > with them to support newer commands like \underleftrightarrow.

      The database and related work are available at
      https://milde.users.sourceforge.net/LUCR/Math/
      The latest revision is used in latex2mathml but has not been published yet.

      The "unimathsymbols" database only contains LaTeX math macros that map
      directly to Unicode code points. (\underleftrightarrow is implemented
      using ↔ (\leftrightarrow) in a <munder> element.)
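To make the last point concrete, here is a sketch that builds the MathML structure described: the base goes in an `<mrow>` and ↔ (U+2194) sits underneath as an `<mo>`. It uses Python's standard `xml.etree.ElementTree`. The function name `under_arrow` is hypothetical and this is not latex2mathml's code. Real converter output may add attributes, for example to make the arrow stretch.

```python
import xml.etree.ElementTree as ET


def under_arrow(identifiers: str, arrow: str = "\u2194") -> ET.Element:
    """Build <munder><mrow>(one <mi> per identifier)</mrow><mo>arrow</mo></munder>."""
    munder = ET.Element("munder")
    base = ET.SubElement(munder, "mrow")  # first child: the base expression
    for name in identifiers:
        ET.SubElement(base, "mi").text = name
    ET.SubElement(munder, "mo").text = arrow  # second child: the under-script
    return munder


# Structure corresponding to \underleftrightarrow{AB}
print(ET.tostring(under_arrow("AB"), encoding="unicode"))
# <munder><mrow><mi>A</mi><mi>B</mi></mrow><mo>↔</mo></munder>
```

Since the arrow itself is the ordinary character for \leftrightarrow, the command needs no entry of its own in a character database.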

       
  • Günter Milde

    Günter Milde - 2022-11-04
    • status: open --> open-accepted
     
  • Günter Milde

    Günter Milde - 2022-11-04

    The patch is committed in [r9216].
    Thank you for your contribution.

     


  • Günter Milde

    Günter Milde - 2023-05-10
    • status: open-accepted --> closed-accepted
     
  • Günter Milde

    Günter Milde - 2023-05-10

    Fixed in Docutils 0.20
    Thanks again for your contribution.
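With Docutils 0.20 or later, the converter is selected through the HTML writers' `math_output` setting (in a configuration file: `math_output: MathML pandoc`). The sketch below falls back to the built-in converter (`MathML` without an argument) when no `pandoc` executable is found. `choose_math_output` and `render_html` are hypothetical helper names, and the fallback policy is an assumption of this sketch, not Docutils behaviour.

```python
import shutil


def choose_math_output() -> str:
    """Use pandoc as MathML converter if it is installed, else the native converter."""
    return "MathML pandoc" if shutil.which("pandoc") else "MathML"


def render_html(rst_source: str) -> str:
    """Render reStructuredText to HTML5 with the chosen math_output setting."""
    # Imported here so choose_math_output() works without Docutils installed.
    from docutils.core import publish_string

    output = publish_string(
        rst_source,
        writer_name="html5",
        settings_overrides={"math_output": choose_math_output()},
    )
    # Depending on version and settings, publish_string() may return bytes or str.
    return output.decode("utf-8") if isinstance(output, bytes) else output
```

Usage: `render_html(r"The set :math:`\mathbb{R}` of real numbers.")` returns a complete HTML5 document with the formula converted to MathML.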

     
