#65 Hang in usb_start_wait_urb() on black measurements

version 1.11
open
nobody
5
2007-06-13
2007-06-10
Craig Ringer
No

Hi

When using lprof (svn at date of bug submission) with a GretagMacbeth Eye1 2 to set the display black level, the UI hangs when trying to measure the black region. The hang continues until it is broken by dramatically increasing the brightness of the region under the colour meter (e.g. by moving a bright window over it).

ps shows that lprof is in a system call (wchan=usb_start_wait_urb). I obtained a kernel backtrace using Alt+SysRq+T and extracted the thread information for lprof. It's attached as lprof_task_trace.txt.

I can't / don't know how to get a user-space backtrace on the process while it's in a system call. However, lprof_gdb_bt.txt shows what the stack looks like if I hit Ctrl-C in gdb while it's waiting in the kernel, then move a bright window over the colour meter.

I'm using libusb 0.1.12-5 from Debian Etch on a 2.6.18 kernel (full uname 2.6.18-4-686 #1 SMP Wed May 9 23:03:12 UTC 2007 i686 GNU/Linux). I'll be updating to 2.6.21 shortly to see if that affects the problem.

Discussion

  • Craig Ringer
    2007-06-10

    Logged In: YES
    user_id=639504
    Originator: YES

    File Added: lprof_gdb_bt.txt

     
  • Craig Ringer
    2007-06-10

    Logged In: YES
    user_id=639504
    Originator: YES

    Sorry, there's no debug info on the libusb calls. I'll build the latest libusb with debug info when I upgrade the kernel. I thought I'd file this in the meantime (a) in case anyone else hits this and (b) to record the info for comparison after the update.

     
  • Hal Engel
    2007-06-10

    Logged In: YES
    user_id=1052244
    Originator: NO

    My system has libusb 0.1.12, but I am currently running a 2.6.20 kernel and I have not seen this problem. When I started working on this code I was running a 2.6.19 kernel, and I know that there were USB issues related to my joysticks when I had a 2.6.18 kernel. The i1 displays do take a lot longer to read very dark patches than lighter patches. On my system very dark patches take about 4 to 5 seconds to read, whereas very bright patches take less than a second.

    The call chain looks like I would expect, and this is happening in a call from the meter support code. One thing you might try is to install the ArgyllCMS beta, which is where the meter code comes from, and see if it does the same thing when you run the black level adjustment in dispcal. The current LProf code is from ArgyllCMS 0.70 beta 2 and is a few months old at this point, so the code you download from the Argyll web site might be newer. If it also hangs like this, then we can work with them to get this fixed in the meter support library. If not, then perhaps the LProf code is doing something wrong when it initializes the meter or USB port.

     
  • Craig Ringer
    2007-06-11

    Logged In: YES
    user_id=639504
    Originator: YES

    I'm still seeing the issue on 2.6.21 with libusb svn.

    Numeric: Error - Configuring USB port 'usb:/bus0/dev7 (GretagMacbeth i1 Display)' to 1 failed with -110 (could not set config 1: Connection timed out)

    I've enabled verbose HID & USB logging in the kernel, and see:

    usb 2-4.2: lprof timed out on ep0in len=0/8
    usb 2-4.2: usbfs: USBDEVFS_CONTROL failed cmd lprof rqt 194 rq 22 len 8 ret -110
    usb 2-4.2: lprof timed out on ep0out len=0/0
    usb 2-4.2: lprof timed out on ep0out len=0/0
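
For reference, the numbers in the usbfs log line above can be decoded with a few lines of C++. This is only a sketch: `RequestType`, `decodeRequestType` and `describeKernelError` are hypothetical names, not part of libusb, Argyll or LProf. The bit layout of `rqt` (bmRequestType) comes from the USB specification, and `ret` is a negated errno value. On Linux, errno 110 is ETIMEDOUT, which matches the "Connection timed out" text in the error message.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper type: the three fields packed into bmRequestType.
struct RequestType {
    bool deviceToHost;   // bit 7: true = IN (device to host)
    unsigned type;       // bits 6..5: 0 standard, 1 class, 2 vendor
    unsigned recipient;  // bits 4..0: 0 device, 1 interface, 2 endpoint, 3 other
};

// Split the "rqt" value from the usbfs log into its fields.
constexpr RequestType decodeRequestType(std::uint8_t rqt) {
    return RequestType{(rqt & 0x80) != 0, (rqt >> 5) & 0x3u, rqt & 0x1Fu};
}

// Turn a kernel-style return code (zero or a negated errno) into text.
std::string describeKernelError(int ret) {
    assert(ret <= 0 && "expects a kernel-style negative errno value");
    return std::strerror(-ret);
}

// rqt 194 is 0xC2: an IN transfer, vendor-specific, addressed to an endpoint.
static_assert(decodeRequestType(194).deviceToHost, "194 has bit 7 set");
static_assert(decodeRequestType(194).type == 2, "194 is a vendor request");
static_assert(decodeRequestType(194).recipient == 2, "194 targets an endpoint");
```

So the failing transfer is a vendor-specific control read of 8 bytes that never completed. What request number 22 means to the instrument is defined by the meter's own protocol and cannot be read off the log.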

    The issue turns out to be with the built-in USB hub in the Eizo panel - the profiler works when connected directly to a root hub port on the host's motherboard. For one brief moment I forgot that USB hubs are pure evil. (That said, it works fine with Windows).

     
  • Hal Engel
    2007-06-11

    Logged In: YES
    user_id=1052244
    Originator: NO

    Some USB hubs can be an issue with some devices. I have a hub and it works fine with my i1 on Linux and Windows. It has problems with my CH Products joystick but works fine with another USB joystick. I should also add that the CHP joystick also fails to work on at least one of my motherboard ports. I have not tested whether the joystick also has a problem with the hub on Windows.

    It looks like this is timing out while it is trying to open the port at the beginning of the spot reading process.

    The error message comes from the function usb_open_port(...) in src/argyll/spectro/usbio.c. If you change the values of verb and mydebug to 1 in the function MonitorCal::initInstrument() in src/moncalqt/monitorcal.cpp, a lot more messages will be dumped to the console as this runs. That might give more information about what is actually happening.

    Also, in the function MonitorCal::initInstrument() in src/moncalqt/monitorcal.cpp, the timeout in the call to it->init_coms(it, comport, br, 15.0) on line 4190 is set to 15 seconds (the last parameter), which should be more than long enough. Is 15 seconds about how long this "hangs" for you, or do you have to cause it to fail to get out of the wait state on the USB port? That is, this should time out after about 15 seconds and return control to lprof in MonitorCal::takeStopReading(), where it should issue an error message that appears in the message area at the bottom of the dialog (do you see this message?). At that point the code tries to deal with the error. But I have never had this happen on my machine (i.e. the spot reading is always OK), so this code is totally untested.
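
One way to answer the "is it about 15 seconds?" question is to time the blocking call directly. The following is a minimal sketch, not LProf code: `secondsBlocked` and `looksLikeTimeout` are hypothetical helpers, the callable stands in for whatever call is suspected of hanging, and the 1 second slack is an arbitrary assumption.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <utility>

// Run `call` and return how long it blocked, in seconds.
template <class Call>
double secondsBlocked(Call&& call) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Call>(call)();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    assert(elapsed.count() >= 0.0);  // steady_clock never runs backwards
    return elapsed.count();
}

// True if the call returned close to the configured timeout, which suggests
// the timeout fired. A much longer time points at a hang elsewhere.
// The default slack of 1 second is an assumption for illustration.
bool looksLikeTimeout(double elapsedSeconds, double timeoutSeconds,
                      double slackSeconds = 1.0) {
    return std::fabs(elapsedSeconds - timeoutSeconds) <= slackSeconds;
}
```

Wrapping the suspect call in a lambda, passing it to `secondsBlocked`, and comparing the result against 15.0 with `looksLikeTimeout` would show whether the wait ends on the configured timeout or only when the measurement finally succeeds.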

    Also, looking at the error handling code, it will try to use lower baud rates to recover from com errors, and that is a section of code that should only be used for serial devices. In addition, this code really does not do anything at this point, since the read_sample operation has failed and we do not retry it. This needs to be fixed so that this code is only reached for a serial device and not a USB device, and so that it either actually retries the read operation if needed or does none of this. I will also look at this code to see what I can do to allow the user to regain control, but once the call has been made to it->read_sample, I think all we can do is wait for it to time out. It might be possible, though, to do this in a separate thread and allow the user to abort the operation before it has timed out.
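
The separate-thread idea could look roughly like the sketch below. It is an illustration only, using standard C++ threads rather than whatever LProf would actually use: `readWithAbort`, `ReadOutcome` and the 50 ms poll interval are hypothetical, and `blockingRead` stands in for a wrapper around a call such as it->read_sample. Note the limitation: the blocking call itself is not cancelled. The caller just stops waiting, and the worker finishes (or times out) in the background.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <thread>
#include <utility>

enum class ReadOutcome { Ok, Failed, Aborted };

// Run a blocking read on a worker thread and poll for a user abort.
// On abort the worker is left running detached, so blockingRead must not
// touch objects that the caller destroys after an abort.
ReadOutcome readWithAbort(std::function<bool()> blockingRead,
                          const std::atomic<bool>& abortRequested,
                          std::chrono::milliseconds pollInterval =
                              std::chrono::milliseconds(50)) {
    assert(blockingRead && "blockingRead must be callable");
    std::packaged_task<bool()> task(std::move(blockingRead));
    std::future<bool> result = task.get_future();
    std::thread(std::move(task)).detach();

    // Wake up every pollInterval to check whether the user gave up.
    while (result.wait_for(pollInterval) != std::future_status::ready) {
        if (abortRequested.load()) {
            return ReadOutcome::Aborted;
        }
    }
    return result.get() ? ReadOutcome::Ok : ReadOutcome::Failed;
}
```

A GUI would set the abort flag from a Cancel button handler and would run `readWithAbort` itself off the UI thread (or replace the polling loop with a timer) so that the dialog stays responsive.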

    I would like to encourage you to try reproducing this by running ArgyllCMS dispcal to see if it happens there. If it does, then we can work on this with them and perhaps end up with a better solution.

     
  • Hal Engel
    2007-06-13

    • labels: --> measurement system
    • milestone: --> version 1.11
     
  • Hal Engel
    2007-10-14

    Logged In: YES
    user_id=1052244
    Originator: NO

    Craig, should this still be open, or should it be written off as a hardware problem with the hub or a problem with libusb?

    Current CVS is using a newer version of the Argyll meter code, and perhaps it would be worthwhile to test with the hub again to see if this works better than before.