missing content-type meta header

#151 missing content-type meta header

Milestone: v3.0

Status: closed-fixed

Owner: Edward Loper

Labels: html generation (86)

Priority: 3

Updated: 2007-01-17

Created: 2007-01-09

Creator: Paul Pogonyshev

Private: No

Generated HTML pages lack a content-type meta header. While the text/html part is not really required, this header is also used to specify the charset of a page, which is much more important, for instance if your documentation uses non-Latin letters.

Ideally, the header should look like this (in the <head> section):

<meta http-equiv="Content-Type" content="text/html; charset=TAKEN-FROM-SOURCE-CODE" />

I'm not really sure how to retrieve the source code encoding, but it should at least be possible by parsing the file. E.g., my Python files start with this line:

# -*- coding: utf-8 -*-

See the Python 2.3 documentation; AFAIK, encoding declaration support was added in that version.
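As a rough illustration of the parsing idea above, here is a minimal sketch that reads the declared encoding from a source file and builds the suggested meta tag. It relies on `tokenize.detect_encoding` from the Python 3 standard library, which did not exist in the Python 2.3 era this report refers to. The helper names `detect_source_encoding` and `content_type_meta` are made up for the example and are not part of Epydoc.

```python
import io
import tokenize


def detect_source_encoding(source_bytes):
    """Return the encoding declared in the first two lines of a source file.

    Hypothetical helper: falls back to Python 3's default ('utf-8')
    when no coding declaration is present.
    """
    # detect_encoding expects a readline callable that yields bytes.
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


def content_type_meta(charset):
    """Build the meta tag proposed in this report (hypothetical helper)."""
    return '<meta http-equiv="Content-Type" content="text/html; charset=%s" />' % charset


source = b"# -*- coding: utf-8 -*-\nprint('hello')\n"
print(content_type_meta(detect_source_encoding(source)))
```

In Python 2, a file without a declaration was treated as ASCII rather than UTF-8, so the fallback here reflects the modern interpreter only.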

Discussion

Edward Loper - 2007-01-17

Epydoc attempts to ensure that *all* output it generates is 7-bit ASCII. If your Python source files use non-ASCII characters, then they'll be converted to unicode when the module is parsed/introspected; and then those unicode characters will be rendered as HTML entities when the HTML is generated.

This seems to me to be the only sensible approach, given that it's possible to get docstrings from different modules, with potentially different encodings, on a single HTML page. (e.g., think of the module hierarchy page, which includes a summary description of each module.)
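The approach described above can be sketched as follows. This is not Epydoc's actual code, only an illustration using the standard `xmlcharrefreplace` error handler; the function name `to_ascii_html` and the sample docstrings are assumptions made for the example.

```python
import html


def to_ascii_html(text):
    """Render unicode text as 7-bit ASCII HTML (hypothetical helper).

    Markup characters are escaped first, then every non-ASCII character
    is replaced by a numeric character reference such as &#233;.
    """
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Docstrings from two modules with different source encodings.
latin1_doc = b"caf\xe9".decode("latin-1")
utf8_doc = b"na\xc3\xafve".decode("utf-8")

# Once decoded to unicode, both can share one ASCII-only page.
page = "<p>%s</p><p>%s</p>" % (to_ascii_html(latin1_doc), to_ascii_html(utf8_doc))
print(page)
```

Because the output contains only ASCII bytes, it displays the same way regardless of which charset the browser assumes.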

That said, I don't see how adding a meta-header that specifies charset as ASCII would hurt, so perhaps I should add it.

If you find that epydoc is not doing what it's supposed to -- i.e., if your non-ascii unicode characters are not getting rendered correctly -- then please send me an example file that generates the problem, and I'll look at what might be causing it.

Edward Loper - 2007-01-17

priority: 5 --> 3

Paul Pogonyshev - 2007-01-17

It seems that I didn't check after upgrading to Epydoc 3.0.0alpha3 from an old version. The output still contained no charset, so I assumed the problem with non-ASCII characters was still there. (My script adds the charset header at a post-processing stage, so I could only detect that the bug was fixed in Epydoc by looking at the HTML source.) So I am closing this bug as already fixed.

However, I'd suggest adding a command-line option that would set the charset to the option's value for the whole Epydoc run. If you can convert arbitrary characters (presumably in different encodings) to HTML entities, you should be able to convert them to, e.g., UTF-8 as well. While this is a minor feature, it would reduce the size of generated HTML pages that contain many non-ASCII characters.
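To give a sense of the size difference, here is a small sketch comparing the two encodings of the same text. The function name `encoded_sizes` and the sample string are assumptions for the example; actual savings depend on how much non-ASCII text a page contains.

```python
def encoded_sizes(text):
    """Return (entity_size, utf8_size) in bytes for the given text.

    Hypothetical helper: entity_size counts the ASCII output with
    numeric character references, utf8_size the raw UTF-8 bytes.
    """
    entity_bytes = text.encode("ascii", "xmlcharrefreplace")
    utf8_bytes = text.encode("utf-8")
    return len(entity_bytes), len(utf8_bytes)


# Six Cyrillic letters: 7 bytes each as entities, 2 bytes each in UTF-8.
print(encoded_sizes("Привет"))  # (42, 12)
```

For text that is mostly ASCII, the two sizes are nearly identical, so the benefit only appears in documentation written largely in non-Latin scripts.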

Paul Pogonyshev - 2007-01-17

status: open --> closed-fixed