

#11 Unicode issues - 2 (in statemachine)

Status: closed-fixed
Owner: David Goodger
Labels: None
Priority: 6
Updated: 2003-06-29
Created: 2003-06-28
Creator: Roman Suzi
Private: No

(I am continuing to use Docutils with Cyrillic-encoded
documents.)

This is what I run:

html.py -l ru -i cp866 -o cp1251 -e koi8-r offend-2.txt offend-2.html
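The command above asks for three different encodings: the source is read as cp866 (`-i`), the HTML is written as cp1251 (`-o`), and error messages are reported as koi8-r (`-e`). A minimal sketch of the input/output half of that round trip, assuming plain `bytes.decode`/`str.encode` semantics (this is an illustration, not Docutils' own I/O code; `transcode` is a hypothetical helper):

```python
def transcode(source: bytes, input_encoding: str = "cp866",
              output_encoding: str = "cp1251") -> bytes:
    """Hypothetical helper: decode with the input encoding, re-encode for output."""
    # Decode the raw file bytes into Unicode text (what -i controls).
    text = source.decode(input_encoding)
    # ... processing would happen on the Unicode text here ...
    # Encode the result for the output file (what -o controls).
    return text.encode(output_encoding)


cyrillic = "Привет"
print(transcode(cyrillic.encode("cp866")) == cyrillic.encode("cp1251"))
```

Any code path that mixes the undecoded 8-bit bytes with Unicode text, or writes Unicode without choosing an encoding, is where a program stops being "8-bit safe".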

This is what I get:

Errors go to errors.txt

Dumped settings in file settings.txt

It seems there are several places in Docutils that aren't
"8-bit safe". In any case, I think Docutils must not
produce a traceback on a badly formatted document.

I am also sure that this kind of error is not unique
to Cyrillic. The same problem probably exists with Latin-1.

Discussion

  • Roman Suzi
    2003-06-28

    dumped settings

     
    Attachments
  • Roman Suzi
    2003-06-28

    traceback

     
    Attachments
  • Roman Suzi
    2003-06-28

    Offending doc

     
    Attachments
  • Roman Suzi
    2003-06-28

    • priority: 5 --> 6
     
  • Roman Suzi
    2003-06-28

    Logged In: YES
    user_id=287815

    This is not just a Unicode bug: offend3.txt (attached)
    shows it with a plain ASCII text file!

    This problem is common for those who edit files in mixed
    environments, which leaves \r and \n in the wrong
    places.

    In my example, there is a \r between ========== and ==========.

    I think this condition must not produce an obscure error and a
    traceback from Docutils.

     
  • Roman Suzi
    2003-06-28

    One more offending example file (this time 7-bit clean)

     
    Attachments
  • David Goodger
    2003-06-29

    Logged In: YES
    user_id=7733

    > In any case, I think that docutils must not
    > traceback on badly formatted document.

    I agree, and I've just checked in a fix to
    prevent tracebacks when there are
    parsing errors.

    Is this the bug you're reporting? If not, please
    describe it more specifically.

    Please also answer my questions in bug 760673:
    https://sf.net/tracker/?func=detail&atid=422030&aid=760673&group_id=38414
    In future, please don't close a bug report without
    a comment stating why. Also, don't bother with
    setting the bug priority; we have so few, it doesn't
    matter, and they're all high-priority to me.

    > This problem is usual for those who edit files in mixed
    > environments - thus having \r and \n sometimes
    > in wrong places.

    That has nothing to do with it. Docutils does line
    splitting on \r and \n and \r\n; mixed line endings
    shouldn't matter. The text simply doesn't parse.
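    To illustrate why mixed line endings shouldn't matter, here is a sketch that assumes
    line splitting behaves like Python's built-in `str.splitlines()`, which treats "\n",
    "\r" and "\r\n" alike (it also splits on a few rarer separators such as "\v" and "\f").
    `split_lines` is a hypothetical name, not a Docutils function:

```python
def split_lines(text: str) -> list:
    """Hypothetical helper: split on \\n, \\r and \\r\\n (plus a few rarer separators)."""
    return text.splitlines()


unix = "==========\nTitle\n==========\n"
mixed = "==========\r\nTitle\r==========\n"

# Both spellings produce the same three lines.
print(split_lines(unix) == split_lines(mixed))

# A lone \r between two rules simply yields two adjacent lines of "=".
print(split_lines("==========\r==========\n"))
```

    So a stray \r just becomes an ordinary line break; whatever structure results is then
    parsed (or rejected) like any other text.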

    > I think this condition must not give obscure error
    > and traceback from docutils.

    I agree on the traceback. The UnicodeError could be
    considered a bug. Try it now; is the error reported
    still obscure? If so, how would you improve it?
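    One general way to keep error reporting itself from raising a UnicodeError is to encode
    messages for the error stream with a non-strict error handler. A minimal sketch, assuming
    the message is a Unicode string and the error encoding is koi8-r as in the original
    command (`encode_report` is hypothetical; this is not how Docutils' fix is implemented):

```python
def encode_report(message: str, error_encoding: str = "koi8-r") -> bytes:
    """Hypothetical helper: encode an error message without ever raising."""
    # "backslashreplace" writes characters the target encoding cannot represent
    # as \xNN / \uXXXX escapes instead of raising UnicodeEncodeError.
    return message.encode(error_encoding, errors="backslashreplace")


print(encode_report("Ошибка"))          # Cyrillic fits in koi8-r unchanged
print(encode_report("café", "ascii"))   # b'caf\\xe9'
```

    The trade-off is that unencodable characters show up as escapes rather than readable
    text, but the user still gets the message instead of a traceback.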

     
  • Roman Suzi
    2003-06-29

    Logged In: YES
    user_id=287815

    Yes, the bug is gone: Docutils no longer gives a traceback and
    uses -e correctly to report this error.

    As for the other bug I reported earlier, I thought it was fixed.
    The --dump-settings output is the same as for this one (I have not
    changed any settings). Sorry for the missing comment.

    > we have so few [bugs]

    And that is really true and amazing. I hope my findings will
    help improve Docutils.

    The error is now correct. Perhaps it could guess the intended
    construct (table, heading, ...), or simply add the word "matching":

    Missing matching underline for overline.
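    A rough sketch of the check behind that wording, assuming an overlined title must be
    followed by a title line and then an underline identical to the overline. The function
    and its messages are hypothetical illustrations, not Docutils' actual parser code:

```python
from typing import List, Optional


def check_overlined_title(lines: List[str]) -> Optional[str]:
    """Hypothetical check: return an error message, or None if the title is valid.

    lines[0] is assumed to be the overline, lines[1] the title text,
    and lines[2] the underline.
    """
    overline = lines[0].rstrip()
    if len(lines) < 3:
        return "Missing matching underline for overline."
    title = lines[1].strip()
    underline = lines[2].rstrip()
    # The underline must repeat the overline exactly (same character, same length).
    if underline != overline:
        return "Missing matching underline for overline."
    # The rules must be at least as long as the title text.
    if len(overline) < len(title):
        return "Title overline too short."
    return None


print(check_overlined_title(["=====", "Title", "====="]))  # None
print(check_overlined_title(["=====", "Title"]))
```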

     
  • David Goodger
    2003-06-29

    • status: open --> closed-fixed