From: Alan O. <al...@si...> - 2011-01-02 17:38:09
|
On 01/02/2011 12:11 PM, Alan Ott wrote: > Sorry, in my previous email (quoted below) replace references to > /sys/bus/usb/devices/usbN with /sys/bus/usb/devices/$USB_ADDRESS. > > Alan. > > On 01/02/2011 11:54 AM, Alan Ott wrote: > >> On 01/01/2011 10:38 PM, Alan Stern wrote: >> >>>> This will only work, if the creation/deletion of $SYSFS/$DEVICE/busnum >>>> is atomic with the creation/deletion of $SYSFS/DEVICE. From inspecting >>>> the kernel code, it appears that the removal is atomic. >>>> >>> No, it isn't. device_del() calls device_remove_attrs() before calling >>> bus_remove_device(). I don't know why this isn't done in the opposite >>> order from device_add(), but it isn't. >>> >> device_del() calls device_remove_sys_dev_entry() before either of >> those two functions you mentioned. >> >> In device_del() (only the sysfs-related stuff): >> 1. dpm_sysfs_remove() (remove power-management related sysfs stuff) >> 2. device_remove_sys_dev_entry() (removes /sys/devices/pciX/usbN >> entry for this device. This is where the actual sysfs entries are; >> /sys/bus/usb/devices contains symlinks to /sys/devices/pciX/usbN). >> 3. device_remove_class_symlinks() (removes /sys/class/ symlinks to >> the /sys/devices/ files for this device). >> 4. device_remove_attrs() (remove the sysfs attribute (and group) >> files for the device). >> 5. bus_remove_device() (remove the symlink in >> /sys/bus/usb/devices/ to the /sys/devices/ files for this device). >> >> While it is true that libusb uses the /sys/bus/usb/devices symlinks, >> and that they are indeed removed last, I believe it is also true that >> device_remove_sys_dev_entry() removes the _target_ of these symlinks. >> The race condition here is that it's possible the >> /sys/bus/usb/devices/usbN/busnum file could not exist (have been >> deleted already), but the /sys/bus/usb/devices/usbN directory could be >> a broken symlink (if a device removal happens between steps 3 and 5 >> above). I believe opendir() would still set errno to ENOENT, but I'll >> need to check. >> >> >>>> The entire >>>> directory is deleted from device_remove_file(), called from >>>> device_del(). I followed it in LXR all the way down to >>>> sysfs_unlink_sibling() and it appears to remove the whole directory >>>> at once. >>>> >>>> The creation of sysfs entries does not appear to be so. device_add() >>>> calls device_create_sys_dev_entry() which then calls device_add_attrs() >>>> and device_add_groups(), >>>> >>> No, device_add() calls device_add_attrs() directly. >>> device_create_sys_dev_entry() has nothing to do with this. >>> >>> >> Yes, but bus_add_device() is what adds the /sys/bus/usb/usbN symlink >> (which is what libusb uses), and it's called _after_ >> device_add_attrs() in device_add(), indicating that during device >> creation, if there's a /sys/bus/usb/usbN symlink, that the busnum (and >> other attribute and group entries) are present. >> >> >>> A better way to solve this problem would be to avoid drawing >>> conclusions about the availability of sysfs. If the sysfs busnum file >>> doesn't exist then try usbfs, by all means. But don't assume it means >>> that sysfs is unusable. >>> >> Those two are one and the same in the current implementation (ie: it >> was like that when I got here). If sysfs fails (for either being too >> old or because the device got unplugged), the current implementation >> falls back to usbfs _permanently_ and globally. Further, if it >> happens because of a device removal after other devices have already >> been enumerated, the global state will be out of sync with the devices >> which were enumerated _before_ the fall-back. Further still, devices >> do not get enumerated correctly during future enumerations in some >> cases (that's the problem that led me down this path). >> >> If not a read failure, how do you suggest determining whether sysfs is >> usable? Libusb's libusb/os/linux_usbfs.c has a detailed comment at the >> top about which versions of the kernel added which sysfs entries. >> Having an "unusable" sysfs is not unlikely. It's apparently likely >> enough that running a new libusb with an older kernel is something the >> libusb maintainers do want to support (at least they did at some >> point, according to the comments and code). Maybe a call to uname() >> would be sufficient. Then again with distro kernels maybe not (other >> opinions on this would be appreciated). >> >> I thought at one point about setting a flag after _any_ device has >> been properly configured using sysfs, stating that sysfs is usable, >> and telling libusb to not fall back to usbfs (basically to assume that >> any failed reads were due to device removals, not due to sysfs >> inadequacy). This would work nearly all the time, except that a race >> condition still exists if the first device gets removed at the same >> time as the first enumeration (a read failure on the first device's >> busnum). This would cause libusb to fall-back to usbfs immediately, >> which might not be so bad, but still undesirable. >> >> Don't get the wrong idea, I'm always completely open to better ideas. >> It could really all come down to one question: What is a robust, >> race-proof way to determine whether sysfs has enough of the attributes >> and groups to be considered "usable" by libusb? If sysfs could be >> checked for usability one time at startup (in libusb_init()), the >> "fall back to usbfs on failure" code could be removed from the main >> execution path, and we'd be done. >> >> Alan. >> >> So I made a small test program, and it does appear that the sysfs device directory _can_ exist while files inside it do not. while (1) { FILE *fp = fopen("/sys/bus/usb/devices/1-5.3/busnum", "r"); if (!fp) { DIR *d = opendir("/sys/bus/usb/devices/1-5.3/"); if (d) { printf("busnum gone but dir remains\n"); struct dirent *ent; while ((ent = readdir(d)) != NULL) { printf("%s\n", ent->d_name); } } else { //printf("Dir is gone too.\n"); closedir(d); } } else fclose(fp); } If I run that program and then unplug device 1-5.3, I will always get the "busnum gone but dir remains" printout, and there are typically from one to three files listed as actually being present. So much for my kernel code analysis.... I should have just tried it once first. So that puts me back to square one then: How to determine the adequacy of sysfs without there being race conditions involved when devices get unplugged? Maybe the right place to fix this is in the kernel by calling bus_remove_device() before device_remove_attrs() in device_del(). I think regardless of the outcome of this discussion, that probably does need to get fixed. Alan. |