> > I have kernel debugging, but not a protocol analyzer yet. I've
> > studied libusb and the shim layer for Solaris's ugen driver some,
> It's a little worrisome that the failing packet is the
> "oh, hey, we don't know why, but Garmin says to, so, here" packet...
Would this be the gusb_reset_toggles() stuff? I tried to capture the
punchline of that in the big ole comment at the top, but if you're not
hip with the Zen of USB, it can be subtle.
Communications with USB devices happen over one or more connections
called pipes. Each pipe has two endpoints: one on the host and one on
the device. Packets flow over pipes. The host and the device each
maintain a toggle state per endpoint. The toggle bit is included in
every packet. When a packet is successfully transferred, each
endpoint inverts the state of the toggle bit that will be included in
the subsequent packet. This is all part of USB's data integrity
The toggle bits are supposed to alternate, but both are reset to zero
under a very specific set of conditions. Reset is the obvious one, but
the configuration step of device enumeration (SET_CONFIGURATION) also
resets the toggles. Here's where these devices hit the skids.
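A toy model in C may make the bookkeeping concrete. This is only a
sketch of one direction of one pipe, not driver code: the names
pipe_model, pipe_transfer and pipe_reset_both are made up for
illustration, and a mismatch is simplified to "the payload is not
delivered".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one direction of one pipe. The sender stamps
 * each packet with its toggle bit; the receiver accepts the payload
 * only if the stamp matches the bit it expects next. */
struct pipe_model {
    unsigned sender_toggle;    /* bit the sender puts in the next packet */
    unsigned receiver_toggle;  /* bit the receiver expects next */
};

/* Returns true if the payload is delivered. On success both ends
 * invert their toggle; on a mismatch nothing changes in this model. */
bool pipe_transfer(struct pipe_model *p)
{
    assert(p != NULL);
    if (p->sender_toggle != p->receiver_toggle)
        return false;          /* looks like a stale retransmission */
    p->sender_toggle ^= 1u;
    p->receiver_toggle ^= 1u;
    return true;
}

/* What a reset or a (re)configuration is supposed to do: put both
 * ends back to zero. */
void pipe_reset_both(struct pipe_model *p)
{
    assert(p != NULL);
    p->sender_toggle = 0;
    p->receiver_toggle = 0;
}
```

As long as both ends flip together and both ends reset together,
pipe_transfer() keeps returning true. The trouble starts when only one
end resets.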
The early implementation of Garmin's USB screwed this up and didn't
reset the toggles on enumeration. Since all the world is Windows and
Windows only does device discovery when the device is attached and
Windows itself (never mini drivers) does the discovery, the toggle
state was never blown.
Oh, the whole world isn't Windows? There are operating systems where
the OS shares enumeration/configuration cycles with the drivers,
perhaps letting the host OS fondle the hardware to see if it wants the
device, but still leaving it accessible to user space tools (such as
libusb) that may want to run their own enumeration/configuration
cycles? Drat. Suddenly (because the discovery that all the world
isn't Windows tends to come as an epiphany to hardware companies)
implementing that sentence in the USB spec becomes important.
If you're lucky enough to get through the OS-initiated handling with
the "right" parity of the number of packets transferred, the lowly
application has a good chance of succeeding. The application might
succeed a second time if the first time had the "right" parity of the
number of packets transferred. The odds of this happening on all
pipes for any interesting number of transfers quickly approach zero.
This is why, for a while, we had cases on Mac where GPSBabel would
run once but hang on the second time. Unplug and replug the device
and it'd run once but hang the second time.
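The parity argument can be sketched in a few lines of C. This is a
hypothetical model, not anything from GPSBabel: it assumes the host
zeroes its toggle when it reconfigures the device and the buggy device
keeps whatever it had. The function names are invented for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model one pipe on a device with the bug: after packets_before
 * successful packets, the host reconfigures and zeroes its toggle,
 * but the device does not. Returns true if the two still agree. */
bool pipe_usable_after_reconfig(unsigned packets_before)
{
    unsigned host = 0, device = 0;

    for (unsigned i = 0; i < packets_before; i++) {
        host ^= 1u;      /* each successful packet inverts both ends */
        device ^= 1u;
    }
    host = 0;            /* host honors the reset on configuration */
    /* device is left alone: this is the firmware bug being modeled */
    return host == device;
}

/* The application only works if every pipe got lucky at once. */
bool all_pipes_usable(const unsigned *packets_per_pipe, size_t npipes)
{
    assert(packets_per_pipe != NULL || npipes == 0);
    for (size_t i = 0; i < npipes; i++)
        if (!pipe_usable_after_reconfig(packets_per_pipe[i]))
            return false;
    return true;
}
```

In this model a pipe survives only when an even number of packets went
over it before the reconfiguration, which is why one run can work and
the next one hang.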
If you're on an affected device and looking at the packets you put
into the HCI from the host, every packet up through the configuration
will look perfect. The first packet after that will never actually
leave the controller. If you're watching it on a bus analyzer, the
host will be streaming that packet repeatedly to the device and the
device will be continuously NAKing that packet, secure in the
knowledge that the toggle bit is "wrong" and therefore the data was
So the horror that is gusb_reset_toggles uses inside knowledge that
the devices reset their toggle state on a Garmin protocol "session
start" command and that multiple sesssion starts are benign. The
first one resets the host host's toggle if it's stuck. If it's not
stuck, it continues to the device. But the host state is now good.
The second one is guaranteed to get to the device. The resulting
ack may or may not be able to be sent if the devices's toggle state is
blown. (Why would they be out of sync? On these device, the command
and the ack go down different pipes.) Either way, the device and the
host should now be in sync for both the bulk out and the interrupt in
toggles and the third would succeed. On Mac and Linux, at least,
this gets us all unstuck and communications proceed.
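The shape of that sequence, as a rough C sketch. This is not the
actual gusb_reset_toggles() source. The transport callbacks, the
fake_dev test double and the function name are hypothetical, and the
packet layout (12-byte header, little-endian packet ID at offset 4,
5 for session start, 6 for the ack) is an assumption to be checked
against Garmin's interface documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum {
    PKT_LEN           = 12,  /* assumed header-only packet length */
    PID_SESSION_START = 5,   /* assumed packet ID */
    PID_SESSION_ACK   = 6    /* assumed packet ID */
};

/* Hypothetical transport hooks standing in for the bulk out write and
 * the interrupt in read. Each returns bytes moved, or <0 on timeout. */
typedef int (*send_fn)(const unsigned char *buf, size_t len, void *ctx);
typedef int (*recv_fn)(unsigned char *buf, size_t len, void *ctx);

/* Returns 0 once a session ack is seen, -1 if none shows up within
 * max_reads reads. */
int reset_toggles_sketch(send_fn send, recv_fn recv, void *ctx, int max_reads)
{
    unsigned char start[PKT_LEN] = {0};
    unsigned char resp[64];

    assert(send != NULL && recv != NULL);
    start[4] = PID_SESSION_START;

    /* #1 may be swallowed by a stale bulk out toggle, #2 should reach
     * the device, #3 runs once both pipes agree. Send failures are
     * ignored on purpose. */
    for (int i = 0; i < 3; i++)
        (void)send(start, sizeof start, ctx);

    for (int i = 0; i < max_reads; i++) {
        memset(resp, 0, sizeof resp);
        int n = recv(resp, sizeof resp, ctx);
        if (n >= PKT_LEN && resp[4] == PID_SESSION_ACK && resp[5] == 0)
            return 0;
    }
    return -1;
}

/* Fake transport, only so the sketch can be exercised without hardware. */
struct fake_dev { int sends; int timeouts_left; };

int fake_send(const unsigned char *buf, size_t len, void *ctx)
{
    struct fake_dev *d = ctx;
    (void)buf;
    d->sends++;
    return (int)len;
}

int fake_recv(unsigned char *buf, size_t len, void *ctx)
{
    struct fake_dev *d = ctx;
    if (len < (size_t)PKT_LEN)
        return -1;
    if (d->timeouts_left > 0) {   /* pretend the ack got lost */
        d->timeouts_left--;
        return -1;
    }
    memset(buf, 0, PKT_LEN);
    buf[4] = PID_SESSION_ACK;
    return PKT_LEN;
}
```

The bounded read loop matters: since any of the early acks may be
lost, the caller has to tolerate a few empty reads before giving up.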
The really early Garmin devices (and I think this would far predate
the X05 models) also ignored the length field on one of the
configuration packets and if you sent anything other than nine bytes
in one of the headers - after all, Windows always sent nine - would
Around mid '06 when I was really in Garmin's face about this and they
had the realization that the Mac market mattered to them (they
announced Mac support in Jan of '06) and that this unfortunate problem
really hosed Mac, they started fixing their USB implementation on
their high profile products to honor the toggle reset. The trick is
that it's on a product-by-product basis and the description in the
firmware release notes is never enough to figure out if the fix is in
any given product. Silence may mean it's been fixed and words like
"Improved USB compatibility" may not mean it's been fixed.
They've done pretty well updating the handhelds. I think the current
firmware updates for all the handhelds address this. I'll sometimes
tie into smaller niche products like Quest or the 276 that haven't had
this fix applied yet. (In fairness, it's been a while since I looked
at either one; they may have been fixed by now.) I don't know of
any Garmin device that I expect to work that, with current firmware,
doesn't succeed with the gusb_reset_toggles() code in place on Linux
Given my recollections of the timeline of the 305, I'd be surprised if
the current firmware had the toggle bug and our "fix" were
ineffective, but this just sounds too familiar.
Is it possible that the host protocol stack and/or HCI driver
miscalculated the toggle state in light of a libusb-consumer sending
enumeration/configuration cycles? :-)
P.S. Next time you see someone saying that supporting
vendor-proprietary USB is easy, feel free to refer them to this
message. This cost me a trainload of time to figure out and USB was
my day job at the time...