#71 Unicode path breaks output redirection for cdemu status

Status: closed
Owner: nobody
Created: 2013-11-11
Updated: 2013-11-25
Creator: Ambrevar

Mount an image whose path contains a non-ASCII character. Then run

$ cdemu status | cat

or redirect the output in any other way. Result:

Traceback (most recent call last):
Devices' status:
DEV   LOADED     FILENAME
  File "/usr/bin/cdemu", line 957, in <module>
    ret = cdemu.process_command(sys.argv[1:])
  File "/usr/bin/cdemu", line 145, in process_command
    return command[3](self, arguments[1:])
  File "/usr/bin/cdemu", line 272, in cmd_display_status
    print("%-5s %-10s %s" % (device, loaded, filenames[0]))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 80: ordinal not in range(128)

My fix: in 'cdemu', replace

print("%-5s %-10s %s" % (device, loaded, filenames[0]))

with

print("%-5s %-10s %s" % (device, loaded, filenames[0].encode('utf-8', errors='ignore')))

The errors='ignore' argument should be harmless; I added it just in case.
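To illustrate why the workaround helps, here is a minimal Python 3 sketch. cdemu at the time ran under Python 2, so this is an analogy rather than cdemu's own code path. It writes one status row to an in-memory text stream that uses the ASCII codec, standing in for an ASCII stdout: written as text, the row fails with the same UnicodeEncodeError; encoded to UTF-8 first and written to the stream's binary buffer, it goes through. The helper write_status_line and the sample path are hypothetical.

```python
import io


def write_status_line(stream, device, loaded, filename, encode_first):
    """Format one row of the status table and write it to a text stream.

    Hypothetical helper for illustration only; not part of cdemu.
    """
    line = "%-5s %-10s %s\n" % (device, loaded, filename)
    if encode_first:
        # Analogue of the .encode('utf-8') workaround: produce UTF-8 bytes
        # ourselves and bypass the text stream's own codec.
        stream.flush()  # push any pending text out before the raw bytes
        stream.buffer.write(line.encode("utf-8"))
    else:
        # The text stream encodes the line with its own codec (ASCII here),
        # which raises UnicodeEncodeError for non-ASCII characters.
        stream.write(line)
        stream.flush()


raw = io.BytesIO()
# An ASCII text stream stands in for a stdout whose encoding is ASCII.
out = io.TextIOWrapper(raw, encoding="ascii")
path = "/tmp/Game\u2122.iso"  # hypothetical image path with a non-ASCII character

try:
    write_status_line(out, "0", "True", path, encode_first=False)
except UnicodeEncodeError as exc:
    print("plain write failed:", exc.reason)

write_status_line(out, "0", "True", path, encode_first=True)
print(raw.getvalue().decode("utf-8"))
```

Writing pre-encoded bytes always produces UTF-8 output, whatever encoding the terminal or pipe actually expects; the flush() call only keeps text already buffered in the wrapper ahead of the raw bytes.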

I've tried to find out where this issue comes from. If you print the filename in cmd_load_device, before it is passed to the daemon:

def cmd_load_device (self, arguments):
    if len(arguments) < 2:
        self.print_invalid_number_of_parameters("load")
        return False

    # We need to pass absolute filenames to daemon
    filenames = map(os.path.abspath, arguments[1:])
    print(filenames[0])  # debugging print; under Python 2 this is a byte string (str), not unicode

there is no error. So I guess the encoding issue happens in D-Bus; there may be something wrong with the Python bindings. If that turns out to be true, it would be worth reporting upstream.

System (since you asked)

Linux edf23ads 3.11.6-1-ARCH #1 SMP PREEMPT Fri Oct 18 23:22:36 CEST 2013 x86_64 GNU/Linux

LSB Version:    1.4
Distributor ID: arch
Description:    Arch Linux
Release:    rolling
Codename:   n/a

Library version: 2.1.0
Daemon version: 2.1.0

vhba                   10616  2 
scsi_mod              128695  6 sg,vhba,usb_storage,libata,sd_mod,sr_mod

  PID TTY          TIME CMD
 3311 ?        00:00:03 cdemu-daemon

Discussion

  • Rok Mandeljc - 2013-11-11

    Does adding utf8_strings=True as an argument to DeviceGetStatus (i.e., [loaded, filenames] = self.dbus_iface.DeviceGetStatus(device, utf8_strings=True)) fix the issue?

    It turns out that while the D-Bus on-the-wire encoding is UTF-8, the Python D-Bus bindings by default return strings as dbus.String, which is a subtype of unicode. According to the docs, adding the above argument makes them return UTF-8-encoded strings (dbus.UTF8String, a subtype of str) instead.

    The default behavior, which results in printing a unicode string, does not seem to cause problems with a UTF-8-based LANG (which I assume you are not using?). In that case, sys.stdout.encoding is set to "UTF-8". With a non-UTF-8 locale, sys.stdout.encoding is set (in my case at least) to "ANSI_X3.4-1968", which is ASCII, and printing unicode strings causes ascii codec errors.

     
  • Ambrevar - 2013-11-12

    Thanks for the quick response!

    Your solution

    [loaded, filenames] = self.dbus_iface.DeviceGetStatus(device, utf8_strings=True)

    works perfectly! You fixed it at the root of the problem, so I guess your fix is better than mine.

    Regarding your explanation: it turns out I am using UTF-8 locales, probably the most widespread ones! So this bug must affect a lot of users!

    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES=en_US.UTF-8
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=
    

    So if you are right about the Python D-Bus implementation, there must be quite a gap between what is documented and what is actually implemented!

    By the way, sorry for posting this in the feature tracker; I didn't notice it was not the bug tracker. I got fooled by the tricky SF web design (again)...

     
  • Rok Mandeljc - 2013-11-12

    Ticket moved from /p/cdemu/feature-requests/23/

     
  • Rok Mandeljc - 2013-11-12

    Glad it worked for you.

    The only thing that puzzles me is that I can only reproduce the issue if I force LANG from sl_SI.UTF-8 to sl_SI. If you run python and import the sys module, what is the value of sys.stdout.encoding?

     
    • Rok Mandeljc - 2013-11-12

      Ah, scratch that. I did not read the bug report properly. With output redirection, the issue is reproducible regardless of the LANG setting.

      But with output redirection, sys.stdout.encoding changes from "UTF-8" to None, and Python 2 then falls back to the ASCII codec, so the underlying cause of the issue is the same.

       
  • Rok Mandeljc - 2013-11-25
    • status: open --> closed
    • Milestone: -->
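The diagnosis in this thread comes down to one rule: printing a unicode string makes Python encode it with the output stream's encoding. That encoding is ASCII when it is reported as "ANSI_X3.4-1968", and under Python 2 with redirected output, where sys.stdout.encoding is None, the fallback is also ASCII. The minimal Python 3 sketch below checks that the alias names the ASCII codec and shows the same file name encoding fine for a UTF-8 stream but failing for an ASCII one. The helper encode_for_stream and its UTF-8 fallback for a missing encoding are hypothetical, one possible defensive choice rather than what cdemu does.

```python
import codecs
import io


def encode_for_stream(stream, text):
    """Encode text with the stream's declared encoding.

    Hypothetical helper: if the stream reports no encoding (as Python 2's
    sys.stdout did when output was redirected), fall back to UTF-8 rather
    than ASCII.
    """
    encoding = getattr(stream, "encoding", None) or "utf-8"
    return text.encode(encoding)


# "ANSI_X3.4-1968" is an alias of the ASCII codec.
print(codecs.lookup("ANSI_X3.4-1968").name)

name = "Game\u2122.iso"  # hypothetical file name with a non-ASCII character

utf8_out = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
ascii_out = io.TextIOWrapper(io.BytesIO(), encoding="ANSI_X3.4-1968")

print(encode_for_stream(utf8_out, name))
try:
    encode_for_stream(ascii_out, name)
except UnicodeEncodeError as exc:
    print("ascii stream:", exc)


class RedirectedStream:
    """Stand-in for a stream that reports no encoding (None)."""
    encoding = None


print(encode_for_stream(RedirectedStream(), name))
```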
     
