#214 Ignores encoding of the Python files (Debian #308372)

Milestone: v3.0
Status: closed-fixed
Owner: nobody
Priority: 5
Updated: 2007-09-09
Created: 2007-07-25
Private: No

This relates to a long-standing bug report about the previous release. Epydoc ignores the encoding specified in a Python source file and generates XHTML with either "ascii" or "iso-8859-1" encoding.

I've tested, and this is still true with 3.0. I'm not sure if it's fixable, or even whether you want to fix it. Let me know whether you want me to keep the Debian bug open.

Discussion

    • labels: --> html generation
    • milestone: --> v3.0
     
  • Logged In: NO

    The intended behavior is that epydoc should *not* ignore the encoding of the source code, but it *should* always generate ascii xhtml file output.

    In particular, when epydoc reads in documentation, it converts it to unicode strings using whatever encoding is appropriate for the given source file. When it writes the documentation, it writes it as ascii, but encodes all non-ascii characters using xhtml character references to the appropriate unicode codepoints. On any browser that supports xhtml unicode character references, this should result in correctly displayed html output. (Hopefully that's all browsers -- but I haven't done extensive testing.)

    One of the reasons that I chose this design is that the output files can sometimes mix text that originates from multiple source files, and those source files might use different encodings. So the only sensible thing to do is to convert everything to a common encoding.
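
    As a rough illustration of the approach described above (a sketch only, not epydoc's actual code; the helper name `to_ascii_xhtml` and the sample strings are made up for this example), Python's built-in `xmlcharrefreplace` error handler produces this kind of output:

```python
# Illustrative sketch only; not epydoc's actual implementation.
# The helper name and the sample strings are hypothetical.

def to_ascii_xhtml(text):
    """Return text as an ASCII-only string, with every non-ASCII
    character replaced by a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Two pieces of documentation read from source files that use
# different encodings.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")
utf8_bytes = "na\u00efve".encode("utf-8")

# Decode each one with the encoding of the file it came from...
doc_a = latin1_bytes.decode("iso-8859-1")
doc_b = utf8_bytes.decode("utf-8")

# ...then write both out in a single, common ASCII-only form.
combined = to_ascii_xhtml(doc_a + " " + doc_b)
print(combined)  # caf&#233; na&#239;ve
```

    Because both inputs are decoded to unicode before they are combined, the differing source encodings no longer matter by the time the output is written.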

    It would be possible, if desired, to add an option to epydoc that allows an 'output encoding' to be specified. Any characters that can be represented in that encoding would be written directly, and any characters that cannot would be represented using xhtml character references.
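
    A minimal sketch of what such an option might do, assuming it simply passes the chosen encoding through to Python's codec machinery (the function name `encode_for_output` and its parameter are hypothetical, not an existing epydoc option):

```python
# Illustrative sketch only; `encode_for_output` is a hypothetical helper,
# not part of epydoc.

def encode_for_output(text, output_encoding="ascii"):
    """Encode text in the requested output encoding. Characters the
    encoding cannot represent become numeric character references."""
    return text.encode(output_encoding, "xmlcharrefreplace")

# e-acute fits in iso-8859-1; the arrow and the Greek alpha do not.
sample = "caf\u00e9 \u2192 \u03b1"

print(encode_for_output(sample, "iso-8859-1"))  # b'caf\xe9 &#8594; &#945;'
print(encode_for_output(sample, "ascii"))       # b'caf&#233; &#8594; &#945;'
```

    With "ascii" as the default, this reduces to the current behavior; with "utf-8", no character references would be needed at all.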

    So I have two questions:

    a) is the current design (encoding everything w/ character references) insufficient? If so, why? (e.g., because browsers xyz don't handle charrefs correctly)

    b) if the current design is insufficient, would adding an option to specify the output encoding be sufficient?

    If you don't know the answer, would it be possible to pass these questions back to the submitter of the Debian bug?

     
  • Logged In: YES
    user_id=1168720
    Originator: YES

    That makes a lot of sense. I will ask the original submitter to be certain, but I think your implementation should be sufficient. It seems that the important part is that no information is lost, and you've taken care of that.

     
  • Logged In: YES
    user_id=1168720
    Originator: YES

    Closing this bug. No response from original Debian bug submitter.

     
    • status: open --> closed-fixed