#214 Ignores encoding of the Python files (Debian #308372)

Milestone: v3.0

Status: closed-fixed

Owner: nobody

Labels: html generation (86)

Priority: 5

Updated: 2007-09-09

Created: 2007-07-25

Creator: Kenneth J. Pronovici

Private: No

This relates to a long-standing bug report about the previous release. Epydoc ignores the encoding specified in a Python source file and generates XHTML with either "ascii" or "iso-8859-1" encoding.

I've tested, and this is still true with 3.0. I'm not sure if it's fixable, or even whether you want to fix it. Let me know whether you want me to keep the Debian bug open.

Discussion

Kenneth J. Pronovici - 2007-07-25

labels: --> html generation

milestone: --> v3.0

Nobody/Anonymous - 2007-07-25

Logged In: NO

The intended behavior is that epydoc should *not* ignore the encoding of the source code, but it *should* always generate ascii xhtml file output.

In particular, when epydoc reads in documentation, it converts it to unicode strings using whatever encoding is appropriate for the given source file. When it writes the documentation, it writes it as ascii, but encodes all non-ascii characters as xhtml character references to the appropriate unicode codepoints. On any browser that supports xhtml unicode character references, this should result in correctly displayed html output. (Hopefully that's all browsers, but I haven't done extensive testing.)
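
The read-then-write behavior described above can be sketched roughly as follows. This is only an illustration, assuming Python 3 and the standard library's `xmlcharrefreplace` error handler; it is not epydoc's actual code, and `to_ascii_xhtml` is a hypothetical name.

```python
def to_ascii_xhtml(text):
    """Return `text` as ASCII, with each non-ASCII character replaced
    by a numeric character reference (hypothetical helper)."""
    # 'xmlcharrefreplace' turns an unencodable character into '&#NNN;'.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Two docstrings that originate from source files with different encodings.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")
utf8_bytes = "na\u00efve".encode("utf-8")

# Reading: decode each with the encoding declared for its own source file.
combined = latin1_bytes.decode("iso-8859-1") + " " + utf8_bytes.decode("utf-8")

# Writing: a single ASCII output, regardless of the input encodings.
print(to_ascii_xhtml(combined))  # caf&#233; na&#239;ve
```

Note that this only handles non-ascii characters; escaping of `&`, `<` and `>` for xhtml is a separate step that the sketch leaves out.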

One of the reasons that I chose this design is that the output files can sometimes mix text that originates from multiple source files; and those source files might be encoded using different encodings. So the only sensible thing to do is to convert everything to a common encoding.

It would be possible, if desired, to add an option to epydoc that allows an 'output encoding' to be specified. Any characters that could be encoded in that encoding would be; and any characters that could not be encoded would be represented using xhtml character references.
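
Such an option might behave along these lines. Again a sketch under assumptions: Python 3, the standard `xmlcharrefreplace` error handler, and a hypothetical function `encode_for_output` whose `output_encoding` parameter stands in for the proposed option (no such epydoc option is confirmed in this thread).

```python
def encode_for_output(text, output_encoding="ascii"):
    """Encode `text` in `output_encoding`; characters that the encoding
    cannot represent become numeric character references.
    (Hypothetical helper, not part of epydoc.)"""
    return text.encode(output_encoding, "xmlcharrefreplace")


sample = "caf\u00e9 \u20ac"  # e-acute, then a euro sign

# iso-8859-1 can represent e-acute but not the euro sign.
print(encode_for_output(sample, "iso-8859-1"))  # b'caf\xe9 &#8364;'

# utf-8 can represent everything, so no character references are needed.
print(encode_for_output(sample, "utf-8"))

# The default matches the current design: everything non-ascii is a charref.
print(encode_for_output(sample))  # b'caf&#233; &#8364;'
```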

So I have two questions:

a) is the current design (encoding everything w/ character references) insufficient? If so, why? (e.g., because browsers xyz don't handle charrefs correctly)

b) if the current design is insufficient, would adding an option to specify the output encoding be sufficient?

If you don't know the answer, would it be possible to pass these questions back to the submitter of the Debian bug?

Kenneth J. Pronovici - 2007-07-25

Logged In: YES
user_id=1168720
Originator: YES

That makes a lot of sense. I will ask the original submitter to be certain, but I think your implementation should be sufficient. It seems that the important part is that no information is lost, and you've taken care of that.

Kenneth J. Pronovici - 2007-09-09

Logged In: YES
user_id=1168720
Originator: YES

Closing this bug. No response from original Debian bug submitter.

Kenneth J. Pronovici - 2007-09-09

status: open --> closed-fixed