Re: Problems with Texas Instruments TSB82AA2 chipset and cameras on both new and old stacks
From: Joan P. B. <joa...@ui...> - 2011-09-20 08:55:15
Some comments after the quote.

On 19/09/11 21:38, Stefan Richter wrote:
> In the first part of your log, everything looks perfectly fine.
>
> From timestamp 1418.424221 to 1431.015034, you get a storm of bus reset
> events. None of these bus resets is concluded with a so-called
> self-ID-complete event, which would be when the nodes on the bus
> identified themselves to each other after the bus reset.
>
> Normal FireWire traffic can only happen after a self-ID-complete event
> and before the next bus reset event.
>
> At timestamp 1431.015048, the first self-ID-complete event happens, i.e.
> a successful conclusion of a bus reset. After that, the kernel and your
> application are apparently able to use the cameras again.
>
> Bus resets _normally_ happen:
> 1. If you plug a FireWire device in to or out of the FireWire bus or
>    power a FireWire device up or down,
> 2. if bus management software or node management firmware or special
>    application software requests a bus reset as a means to announce
>    a change of node configuration to all peers.
>
> Bus resets _abnormally_ happen:
> 3. If rogue software requests a bus reset for no good reason,
> 4. if a physical interface chip malfunctions due to hardware errata,
> 5. if a physical interface chip of a device or a cable or a
>    combination of chips and cable works too far off the standardized
>    electrical parameters of the FireWire physical layer.
>
> The storm of bus resets from timestamp 1418.424221 onwards is of course
> of one of the abnormal types.
>
> On 3.: firewire-core since kernel 2.6.36 contains safeguards which
> prevent any software that runs on the Linux node to cause a storm of
> bus resets.
>
> On 4.: Texas Instruments TSB81BA3 (all revisions up to revision C
> inclusive) is a seriously buggy PHY which tends to fall into bus reset
> storms at the drop of a hat.
> However, this is only known to happen in
> case of pure 1394b "beta mode" buses:
> http://www.ti.com/litv/pdf/sllz015c
>
> Your cameras are 1394a devices, therefore the bus operates in "alpha
> mode" a.k.a. data-strobe mode. The above mentioned TSB81BA3 errata are
> *not* known to happen in alpha mode; instead, as far as documented by
> TI and as far as I have observed myself, having one or more 1394a node
> on the bus keeps TSB81BA3 working correctly.
>
> On 5.: Various circumstances can contribute to such malfunctions, alone
> or in combination:
> - Plainly damaged hardware of course,
> - badly constructed or miswired or too long cables,
> - badly laid out circuit boards,
> - too weak or electrically too noisy power supplies,
> - badly selected or overheating clocking crystals,
> - overheating physical interface chips,
> - electromagnetic interference from within one of the FireWire nodes
>   or from external sources.
>
> My suggestion to you, if you haven't done so already:
>
> Check whether you have high-quality cables, whether the environmental
> conditions in your lab (EMI, temperature, humidity, mechanical vibrations,
> radiation,...) are in line with what is specified for your embedded PC
> and for the Bumblebee2 cameras or what is generally assumed for office
> use, whether power supply is adequate and stable, whether your embedded PC
> is properly shielded against EMI (inbound and outbound).
>
> If you are more or less certain that your equipment and lab are OK, you
> should approach Point Grey or/and the vendors of the components of your
> embedded PC with the issue.
>
> If you have adverse conditions in your lab (or wherever this equipment
> is deployed) which you cannot improve, ask Point Grey and the other
> component vendors as well whether they have recommendations for
> counter measures.
>
> -----------------------------------------------------------------------
>
> Oh wait, I now looked up the prior postings of this thread in my mail
> archive.
> On 07 Dec 2010 you wrote that this equipment is built into a
> vehicle. So that pretty much means that the last paragraph applies,
> unless if the issue also happens if you have cameras and PC taken out of
> the vehicle and are testing it under good conditions in your office.

Well, the equipment is mounted on an underwater vehicle, but basically
this vehicle is just a waterproof methacrylate cylinder with aluminum
plates, with all the components inside. When the cylinder is open, the
problem is far less frequent: the cameras can work for hours without
errors and hang only rarely. When the cylinder is closed, the error
appears after just 20-40 minutes of operation.

To isolate the problem, and suspecting that it may be related to heat
accumulation, we performed several tests under good conditions in the
office with the cylinder open.

First, the problem is present with two different FireWire cards: RTD
CM17208 and Eurotech COM-1461, both of them with TSB82AA2 and TSB81BA3.

To check for a possible power problem, I connected a digital real-time
oscilloscope, set to infinite persistence, to the 12 V input of the
FireWire card. The signal looked normal (range 12 V to 12.5 V, never
below 12 V) and no glitches were present during the failures.

To rule out a problem with the cameras, we took them out of the cylinder
and plugged them into another PC and FireWire card using the same
cables. With heat applied externally and the temperature monitored, they
worked for hours without exhibiting any problem.

Thus, from the list under point 5, I would rule out:
- Plainly damaged hardware of course,
- badly constructed or miswired or too long cables,
- too weak or electrically too noisy power supplies,

So it leaves us with the following list of possible causes:
- badly laid out circuit boards,
- badly selected or overheating clocking crystals,
- overheating physical interface chips,
- electromagnetic interference from within one of the FireWire nodes
  or from external sources.

In any case, it seems that this is not a software problem, so sorry for
the noise on the list. We will try to apply some extra cooling to see
whether the problem persists, and let you know.

Thank you very much for your help!
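
For anyone reading along who wants to check their own logs for the pattern described in the quoted analysis (a run of bus resets with no self-ID-complete event in between), here is a minimal Python sketch. It assumes the log has already been reduced to (timestamp, event) pairs; the event labels, the function name find_reset_storms and the min_resets threshold are all hypothetical, since the actual log format is not shown in this thread. Only the three timestamps marked "from the post" are real; the other sample entries are invented for illustration.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical event labels; the real log format is not shown in this thread.
BUS_RESET = "bus_reset"
SELF_ID_COMPLETE = "self_id_complete"

Storm = Tuple[float, Optional[float], int]


def find_reset_storms(events: Iterable[Tuple[float, str]],
                      min_resets: int = 3) -> List[Storm]:
    """Return (first_reset_ts, end_ts, reset_count) for every run of at
    least min_resets bus resets with no self-ID-complete in between.

    end_ts is the timestamp of the self-ID-complete event that ends the
    run, or None if the log ends while the run is still going.
    """
    storms: List[Storm] = []
    start: Optional[float] = None
    count = 0
    for ts, kind in events:
        if kind == BUS_RESET:
            if count == 0:
                start = ts          # first reset of a new run
            count += 1
        elif kind == SELF_ID_COMPLETE:
            # A single reset followed by self-ID-complete is the normal case.
            if count >= min_resets:
                storms.append((start, ts, count))
            start, count = None, 0
        # Any other event kind is ignored.
    if count >= min_resets:
        storms.append((start, None, count))
    return storms


events = [
    (1400.000000, BUS_RESET),         # invented: a normal, concluded reset
    (1400.000100, SELF_ID_COMPLETE),  # invented
    (1418.424221, BUS_RESET),         # from the post: storm begins
    (1420.000000, BUS_RESET),         # invented
    (1425.000000, BUS_RESET),         # invented
    (1431.015034, BUS_RESET),         # from the post: last reset of the storm
    (1431.015048, SELF_ID_COMPLETE),  # from the post: first self-ID-complete
]

print(find_reset_storms(events))
```

With the sample above, the only run reported is the one from 1418.424221 to 1431.015048. The threshold of three resets is an arbitrary choice for the sketch, not a value taken from the FireWire specification or the kernel.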