#1 %FC or other non-ASCII characters in filename

Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2005-06-29
Created: 2005-06-09
Creator: jensk
Private: No

If a file or folder name contains %XX characters (non-ASCII characters such as ü, which shows up as %FC in the URL), sitemap_gen reports an error:

Walking DIRECTORY "/home/j/jaferien.de/public_html/"
Traceback (most recent call last):
  File "sitemap_gen.py", line 1201, in ?
    sitemap.Generate()
  File "sitemap_gen.py", line 832, in Generate
    input.ProduceURLs(self.ConsumeURL)
  File "sitemap_gen.py", line 585, in ProduceURLs
    os.path.walk(self._path, PerFile, None)
  File "/usr/local/lib/python2.4/posixpath.py", line 290, in walk
    func(arg, top, names)
  File "sitemap_gen.py", line 581, in PerFile
    consumer(url, False)
  File "sitemap_gen.py", line 875, in ConsumeURL
    hash = url.MakeHash()
  File "sitemap_gen.py", line 299, in MakeHash
    return md5.new(self.loc).digest()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 41: ordinal not in range(128)

I cannot change the filenames; it is a German problem ;-)

Best regards, Jens

Discussion

  • Matt Warden
    2005-06-09

    Logged In: YES
    user_id=313828

    Could you post the filename that is causing the problem?
    Without it, it's hard for me to test possible solutions. You
    might try changing the offending line from:

    return md5.new(self.loc).digest()

    to:

    return md5.new(self.loc.encode("utf-8")).digest()

    But my guess is that further changes will be needed in
    other places as well.
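For reference, here is a minimal Python 3 sketch of the suggested MakeHash change. The `URL` class below is a hypothetical stand-in that models only the `loc` attribute and `MakeHash` method named in the traceback; it is not sitemap_gen's actual class, and it uses `hashlib.md5` because the old `md5` module no longer exists in Python 3.

```python
import hashlib


class URL:
    """Hypothetical stand-in for sitemap_gen's URL object; only `loc` is modeled."""

    def __init__(self, loc):
        self.loc = loc  # location as text (str); may contain non-ASCII characters

    def MakeHash(self):
        # Encode explicitly so characters such as "ü" never reach an implicit
        # ASCII codec; hashlib only accepts bytes.
        return hashlib.md5(self.loc.encode("utf-8")).digest()


digest = URL("http://jaferien.de/uebersicht/43/München/Reisen-H.html").MakeHash()
print(len(digest))  # 16 (an MD5 digest is 16 bytes)
```

The hash is only used to recognize duplicate URLs, so any fixed encoding works, as long as every URL is encoded the same way.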

     
  • Matt Warden
    2005-06-09

    Logged In: YES
    user_id=313828

    My apologies for commenting too soon. Here's a series of
    commands that reproduces a similar problem and then shows
    that encoding as UTF-8 seems to solve it.

    First, without the problem character:
    >>> import md5
    >>> object = u'Aufgerumt'
    >>> md5.new(object).digest()
    '\xdc6\x9c\xb8\x19\x0b\x80\xd5Z\xdd]_h\xea/\x81'

    With it:
    >>> object = u'Aufger\xe4umt'
    >>> md5.new(object).digest()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 6: ordinal not in range(128)

    Encoding as UTF-8 fixes the problem:
    >>> md5.new(object.encode("utf-8")).digest()
    'T\xe8p\x18c\x1f\xe6;M\x08\xc1\xc4\x1c\x98\xa4\xf3'
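The session above uses Python 2's `md5` module. A rough Python 3 equivalent with `hashlib` behaves somewhat differently: a `str` is rejected outright with a `TypeError`, even if it is pure ASCII. This sketch reproduces that and shows that the resulting digest depends on which encoding you pick.

```python
import hashlib

word = "Aufger\xe4umt"  # "Aufgeräumt", the example word from the session above

try:
    hashlib.md5(word)  # Python 3 refuses text; it must be encoded to bytes first
except TypeError as exc:
    print("TypeError:", exc)

# Once encoded, hashing works, but UTF-8 and Latin-1 produce different bytes
# for "ä" and therefore different digests.
utf8_digest = hashlib.md5(word.encode("utf-8")).digest()
latin1_digest = hashlib.md5(word.encode("latin-1")).digest()
print(utf8_digest != latin1_digest)  # True
```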

    HTH.

     
  • jensk
    2005-06-09

    Logged In: YES
    user_id=1293819

    I have made the change to encode("utf-8"). Now sitemap_gen
    runs without errors, but I'm not sure whether the XML file
    is correct.
    One of the URLs is:
    http://jaferien.de/uebersicht/43/M%FCnchen/Reisen-H.html
    and the corresponding path is:
    /home/j/jaferien.de/public_html/43/München ...
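To see where the `%FC` comes from: it is "ü" percent-encoded from its single Latin-1 byte (0xFC). The same character percent-encoded from UTF-8 is `%C3%BC`. This Python 3 sketch uses `urllib.parse.quote` to show how both forms arise. The path string is reconstructed from the URL above. Which form is right for this site depends on how the web server maps escapes to the on-disk filenames, so treat this as an illustration rather than a verdict on the generated XML.

```python
from urllib.parse import quote, unquote

path = "/uebersicht/43/München/Reisen-H.html"

# Percent-encoding the Latin-1 bytes reproduces the %FC form from the report.
latin1_form = quote(path, encoding="latin-1")
# Percent-encoding the UTF-8 bytes (the default) gives the two-byte form instead.
utf8_form = quote(path)

print(latin1_form)  # /uebersicht/43/M%FCnchen/Reisen-H.html
print(utf8_form)    # /uebersicht/43/M%C3%BCnchen/Reisen-H.html

# Decoding only round-trips if it uses the same encoding that produced the escapes.
print(unquote(latin1_form, encoding="latin-1") == path)  # True
```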

    Thanks for your help, and sorry for my bad, bad English, but
    we learned Russian at school in East Germany ;-)

     
  • Wyvern
    2005-06-29

    • status: open --> closed