linux1394-user Mailing List for IEEE 1394 for Linux (Page 12)
Brought to you by:
aeb,
bencollins
From: Clemens L. <cl...@la...> - 2012-01-11 11:49:05
Stefan Richter wrote:
> On Jan 10 Stefan Richter wrote:
>> Clemens refers to an implementation limit deep down in the firewire-ohci
>> controller driver.  There is a per-context temporary buffer used to
>> generate the "isochronous packets received" events.  This buffer holds
>> isochronous packet header data and timestamps (usually 4 + 4 bytes per
>> packet) and is one page size big (usually 4 kBytes).
>>
>> If this buffer is exhausted in a single isochronous interrupt event,
>> firewire-ohci silently discards information of the remaining packets,
>> right?
>>
>> To document it would be one way, to improve the implementation for a more
>> flexible API another more user friendly way...
>
> PS:
> Besides this fixed-size buffer, there is also a dynamically allocated
> variable-size buffer used in each isochronous interrupt event which the
> kernel keeps until the user process read() the event.
>
> Can we change the implementation (a) to never discard information about
> received/ sent iso packets,

We could use more than one page-sized buffer for this information, but
this would require more allocations in interrupt context, and the
callback would then have to read this information from multiple buffers.

It would be easier and more consistent to flush the packet information
when the buffer is about to overflow.

> (b) to generate multiple fw_cdev_event_iso_interrupt per interrupt
> packet

This is then implied by (a).

I've been planning to add a function (ioctl) to allow clients to flush
this information; this would allow sound drivers to get more precise
position information.

Regards,
Clemens
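The "flush when the buffer is about to overflow" idea can be illustrated with a toy model. This is only a sketch under the figures quoted in this thread (a 4096-byte page, 8 bytes of header plus timestamp per packet); it is not the firewire-ohci code, and the function name split_into_events is made up for illustration.

```python
def split_into_events(n_packets, page_size=4096, record_size=8):
    """Toy model: how many packet records each event would carry if the
    per-context buffer were flushed just before it would overflow.

    Returns a list with one entry per generated event.
    """
    capacity = page_size // record_size  # records that fit in one page
    events = []
    buffered = 0
    for _ in range(n_packets):
        if buffered == capacity:
            # The next record would not fit: flush what we have as an
            # extra event instead of discarding information.
            events.append(buffered)
            buffered = 0
        buffered += 1
    if buffered:
        # The regular event at the end of the interrupt interval.
        events.append(buffered)
    return events


if __name__ == "__main__":
    print(split_into_events(1200))
```

In this model an interval of 1200 packets yields three events instead of one event that only describes the first 512 packets, which is why (b) follows from (a).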
From: Stefan R. <st...@s5...> - 2012-01-10 15:07:15
On Jan 10 Stefan Richter wrote:
> On Jan 10 Clemens Ladisch wrote:
> > Alexander Neundorf wrote:
> > > I seem to have a strange problem with the irq_interval parameter of
> > > raw1394_iso_recv_init().
> > > It seems to have an upper limit of 512. Is that possible ?
> >
> > No, this parameter itself does not have an upper limit.
> >
> > > If I set it to higher values than that, I still get the callback after 512
> > > packets have arrived.
> >
> > Actually, you get the callback after the correct number of packets.
> > What you do not get is information about more than 512 of those packets.
> >
> > It would have been a good idea to document juju's 4 KB limit ...
>
> Clemens refers to an implementation limit deep down in the firewire-ohci
> controller driver.  There is a per-context temporary buffer used to
> generate the "isochronous packets received" events.  This buffer holds
> isochronous packet header data and timestamps (usually 4 + 4 bytes per
> packet) and is one page size big (usually 4 kBytes).
>
> If this buffer is exhausted in a single isochronous interrupt event,
> firewire-ohci silently discards information of the remaining packets,
> right?
>
> To document it would be one way, to improve the implementation for a more
> flexible API another more user friendly way...

PS:
Besides this fixed-size buffer, there is also a dynamically allocated
variable-size buffer used in each isochronous interrupt event which the
kernel keeps until the user process read() the event.

Can we change the implementation
(a) to never discard information about received/ sent iso packets,
(b) to generate multiple fw_cdev_event_iso_interrupt per interrupt
    packet in order to keep the atomic allocations <= PAGE_SIZE?
    (The larger the in-kernel allocations, the more likely they will
    fail due to memory fragmentation.  Availability of in-kernel
    allocations smaller than a page is not affected by memory
    fragmentation.)
(c) Might we be able to collapse the two kernel-internal header buffers
    into a single one?
-- 
Stefan Richter
-=====-===-- ---= -=-=-
http://arcgraph.de/sr/
From: Stefan R. <st...@s5...> - 2012-01-10 14:53:57
On Jan 10 Clemens Ladisch wrote:
> Alexander Neundorf wrote:
> > I seem to have a strange problem with the irq_interval parameter of
> > raw1394_iso_recv_init().
> > It seems to have an upper limit of 512. Is that possible ?
>
> No, this parameter itself does not have an upper limit.
>
> > If I set it to higher values than that, I still get the callback after 512
> > packets have arrived.
>
> Actually, you get the callback after the correct number of packets.
> What you do not get is information about more than 512 of those packets.
>
> It would have been a good idea to document juju's 4 KB limit ...

Clemens refers to an implementation limit deep down in the firewire-ohci
controller driver.  There is a per-context temporary buffer used to
generate the "isochronous packets received" events.  This buffer holds
isochronous packet header data and timestamps (usually 4 + 4 bytes per
packet) and is one page size big (usually 4 kBytes).

If this buffer is exhausted in a single isochronous interrupt event,
firewire-ohci silently discards information of the remaining packets,
right?

To document it would be one way, to improve the implementation for a more
flexible API another more user friendly way...
-- 
Stefan Richter
-=====-===-- ---= -=-=-
http://arcgraph.de/sr/
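The figure of 512 follows directly from the sizes given above. A small worked calculation, assuming the "usual" values from this message (a 4096-byte page and 4 + 4 bytes per packet); the function names are only illustrative:

```python
PAGE_SIZE = 4096       # assumed: one page, "usually 4 kBytes"
HEADER_BYTES = 4       # assumed: iso packet header data per packet
TIMESTAMP_BYTES = 4    # assumed: timestamp per packet


def max_packets_per_event(page_size=PAGE_SIZE,
                          per_packet=HEADER_BYTES + TIMESTAMP_BYTES):
    """Number of per-packet records that fit into the one-page buffer."""
    return page_size // per_packet


def packets_without_info(irq_interval):
    """Packets per interrupt interval whose header information is lost
    once the buffer is full (zero if everything fits)."""
    return max(0, irq_interval - max_packets_per_event())


if __name__ == "__main__":
    print(max_packets_per_event())     # capacity of the buffer
    print(packets_without_info(1000))  # lost records at irq_interval=1000
```

With other page sizes or larger per-packet headers the capacity changes accordingly, so 512 is a consequence of these defaults rather than a fixed constant.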
From: Clemens L. <cl...@la...> - 2012-01-10 12:22:01
Alexander Neundorf wrote:
> I seem to have a strange problem with the irq_interval parameter of
> raw1394_iso_recv_init().
> It seems to have an upper limit of 512. Is that possible ?

No, this parameter itself does not have an upper limit.

> If I set it to higher values than that, I still get the callback after 512
> packets have arrived.

Actually, you get the callback after the correct number of packets.
What you do not get is information about more than 512 of those packets.

It would have been a good idea to document juju's 4 KB limit ...

Regards,
Clemens
From: Alexander N. <neu...@kd...> - 2012-01-08 21:56:09
Hi,

I seem to have a strange problem with the irq_interval parameter of
raw1394_iso_recv_init().
It seems to have an upper limit of 512. Is that possible?
If I set it to higher values than that, I still get the callback after 512
packets have arrived.

From looking at the libraw1394 2.0.7 sources I don't see anything like this.

This is on Slackware 13.37, kernel 2.6.37.6, 32-bit x86, juju stack, OHCI
driver.
Previously I was using much smaller values, so I'm not sure whether this is
something new or not.

Thanks
Alex
From: Stefan R. <st...@s5...> - 2012-01-08 13:43:41
Hi all,

Linux 3.2 was released last week.  The IEEE 1394 kernel drivers received
the following changes, among less notable ones:

firewire-net:

  - Increase throughput by use of unified transactions for incoming
    datagrams.

firewire-ohci:

  - Add support for TI TSB41BA3D as local PHY.  This PHY is found on
    special 1394b-400 cards.

  - Fix bus topology recognition in presence of a cycle master node with
    a wrong gap count.  The issue was observed when switching off a
    bus-powered DesktopKonnekt6 audio device.

  - Properly synchronize isochronous I/O buffers on non-coherent
    architectures.  Architectures with cache-coherent DMA like x86 or
    PPC were not affected.
-- 
Stefan Richter
-=====-===-- ---= -=---
http://arcgraph.de/sr/
From: Stefan R. <st...@s5...> - 2012-01-05 18:45:53
On Jan 02 Stefan Richter wrote:
> Dec 28 08:38:30 knuth kernel: [33372.539199] sd 8:0:0:0: Device offlined - not ready after error recovery
> Dec 28 08:38:30 knuth kernel: [33372.539261] sd 8:0:0:0: [sdd] Unhandled error code
> Dec 28 08:38:30 knuth kernel: [33372.539269] sd 8:0:0:0: [sdd] Result: hostbyte=DID_BUS_BUSY driverbyte=DRIVER_OK
> Dec 28 08:38:30 knuth kernel: [33372.539281] sd 8:0:0:0: [sdd] CDB: Write(10): 2a 00 2f 5c fe f8 00 04 00 00
>
> The DID_BUS_BUSY error number was generated by firewire-sbp2 to tell
> the SCSI core error handler that right now, commands cannot be sent and
> it should try later again.  In that case, the bus was "busy" because
> firewire-sbp2 had to reconnect or re-login after those bus resets.
>
> I am surprised though that SCSI core behaves seemingly fragile here.
> Its error handler should easily get through a typical 1394 bus reset
> period.

Note to self:  Maybe that should be DID_REQUEUE instead of DID_BUS_BUSY.

> One thought that I am having for a while now but wasn't able to try
> out yet:  Perhaps firewire-sbp2's bus reset handling should be changed
> such that it holds on to a pending SCSI request and sends it again
> after reset.  That way, SCSI core does not have to requeue that request
> but only sees that the request is taking a little bit longer than usual
> and thus the (currently only one request deep) request queue is filled
> for that entire time.
>
> Of course, that would be an optimization for normal bus resets and
> might still not save you from bus resets due to malfunctioning hardware.
-- 
Stefan Richter
-=====-===-- ---= --=-=
http://arcgraph.de/sr/
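For readers who want to relate the CDB bytes in the quoted log line to the failing sector, here is a minimal sketch that decodes a WRITE(10) command descriptor block: opcode 0x2a, a 4-byte big-endian logical block address in bytes 2..5, and a 2-byte big-endian transfer length in bytes 7..8. The helper name decode_write10 is made up for illustration.

```python
def decode_write10(cdb_hex):
    """Decode a SCSI WRITE(10) CDB given as a hex string such as
    '2a 00 2f 5c fe f8 00 04 00 00'.

    Returns (logical_block_address, number_of_blocks).
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2..5
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7..8
    return lba, blocks


if __name__ == "__main__":
    print(decode_write10("2a 00 2f 5c fe f8 00 04 00 00"))
```

The decoded address 794623736 is the sector for which the original log in this thread reports "end_request: I/O error, dev sdd", so the failed request was a 1024-block write starting there.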
From: Stefan R. <st...@s5...> - 2012-01-04 18:34:08
On Jan 03 Keith Smith wrote:
> > - If you were keen on FireWire 800, then you will likely only find
> >   TI TSB82AA2 + TSB81BA3 based CardBus cards, and TSB81BA3 comes with
> >   several serious errata which make it practically useless for S800β or
> >   any other β mode (whereas α modes work alright, i.e. classic 1394a
> >   "FireWire 400" etc.).
>
> As it happens I was considering FW800 as the RAID enclosure currently
> attached does support it.  If I decide to do this, I will avoid the
> TSB81BA3.

Card vendors rarely are able and willing to tell which chipset they use.
So the only way to find out will likely be to stick the card into a Linux
laptop and run lsfirewirephy from Clemens Ladisch's jujuutils.  (lspci
only shows the link layer chip, whereas TSB81BA3 is a physical layer
chip.)

The newer versions of that chip, TSB81BA3D and E, are fine, but
unfortunately TSB82AA2 link layer based cards are usually still sold with
the outdated TSB81BA3.

Still, with some luck that PHY might work OK in combination with the
particular 1394b PHY of your RAID enclosure, or you may get it to work by
adding a 1394a device to the bus.
-- 
Stefan Richter
-=====-===-- ---= --=--
http://arcgraph.de/sr/
From: Stefan R. <st...@s5...> - 2012-01-03 21:29:32
Hi all,

it took longer than necessary, but now the source release tarballs of
libraw1394 v0.5...v2.0.7 and libiec61883 v1.0.0...v1.2.0 are finally
available at their former place at kernel.org again:

    ftp://ftp.kernel.org/pub/linux/libs/ieee1394/
    http://www.kernel.org/pub/linux/libs/ieee1394/
    https://www.kernel.org/pub/linux/libs/ieee1394/

There is a general change of policies and processes at kernel.org:
Uploaded files are no longer gpg-signed by a daemon at kernel.org but by
the person who uploaded them.  Only signatures for the original *.tar
files are installed at kernel.org.  This means that you first need to
uncompress a downloaded file before you can verify its authenticity, e.g.:

    $ wget ftp://ftp.kernel.org/pub/linux/libs/ieee1394/libiec61883-1.2.0.tar.{xz,sign}
    $ xz -d libiec61883-1.2.0.tar.xz
    $ gpg --verify libiec61883-1.2.0.tar.sign

The tarballs at the above URLs are the same as the ones that were there
before the kernel.org break-in last year, i.e. I did not generate new
ones.  I did not have my own backups of all of them though; I had to
collect some of them from stale kernel.org FTP mirrors.  I verified their
contents against git checkouts of the respective versions though.

Apropos:

  - Since wiki.kernel.org is still read-only, and with it the ToDo list
    of ieee1394.wiki.kernel.org, I entered the open issues of libraw1394
    at https://redmine.user.in-berlin.de/projects/libraw1394/issues now.
    This one is read-only for you all too, so just post any issue
    reports to the mailing list.  Or better yet, post patches. :-)

  - libraw1394.git is still only two commits ahead of v2.0.7, but I
    should get a v2.0.8 release out sooner rather than later (with the
    usual loose semantics of "soon").

  - As before, libiec61883 is maintained by Dan; I am just his proxy for
    file upload to kernel.org.
-- 
Stefan Richter
-=====-===-- ---= ---==
http://arcgraph.de/sr/
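The download, uncompress, verify sequence from this message can be written down for any *.tar.xz release file. The sketch below only builds the three command lines shown above as argument lists; it does not run them, and the helper name verification_commands is hypothetical.

```python
def verification_commands(tarball_url):
    """Build the wget / xz / gpg command lines from the message above
    for a release URL ending in '.tar.xz'.

    The signature covers the uncompressed *.tar file, so its name is the
    tarball name with '.xz' replaced by '.sign'.
    """
    if not tarball_url.endswith(".tar.xz"):
        raise ValueError("expected a URL ending in .tar.xz")
    tar_url = tarball_url[:-len(".xz")]         # .../name.tar
    xz_name = tarball_url.rsplit("/", 1)[-1]    # name.tar.xz
    tar_name = xz_name[:-len(".xz")]            # name.tar
    return [
        ["wget", tarball_url, tar_url + ".sign"],
        ["xz", "-d", xz_name],
        ["gpg", "--verify", tar_name + ".sign"],
    ]


if __name__ == "__main__":
    url = ("ftp://ftp.kernel.org/pub/linux/libs/ieee1394/"
           "libiec61883-1.2.0.tar.xz")
    for cmd in verification_commands(url):
        print(" ".join(cmd))
```

The order matters: gpg is given only the .sign file and looks for the signed data under the same name without that suffix, which is the uncompressed tar file that exists only after the xz step.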
From: Keith S. <kei...@ke...> - 2012-01-03 01:48:13
Both,

Very good tips.  Thanks for the replies.

> BTW, if this laptop can be opened without special tools, you should check
> whether you can clean it inside to reduce the chance of local hotspots due
> to dust.  If there are any internal cables, make sure they are still
> firmly seated.

Will do.

> - If you were keen on FireWire 800, then you will likely only find
>   TI TSB82AA2 + TSB81BA3 based CardBus cards, and TSB81BA3 comes with
>   several serious errata which make it practically useless for S800β or
>   any other β mode (whereas α modes work alright, i.e. classic 1394a
>   "FireWire 400" etc.).

As it happens I was considering FW800 as the RAID enclosure currently
attached does support it.  If I decide to do this, I will avoid the
TSB81BA3.

Regards,

Keith.
From: Stefan R. <st...@s5...> - 2012-01-03 00:51:16
On Jan 02 Carl Karsten wrote:
> On Mon, Jan 2, 2012 at 3:53 PM, Keith Smith
> <kei...@ke...> wrote:
>> As I noted in my original post I have tried several enclosures and since
>> then have also replaced all the cables.  The problem remains.  Given
>> your assessment, that would strongly suggest that I have an intermittent
>> component failure on the Vaio as a result of the component overheating
>> after sustained use, probably the port itself, and that the upgrade to
>> Ubuntu 11 is coincidental.
>
>> This could be.  It is an old machine and the cable would not have been
>> touched for a few years until the upgrade at which point I probably
>> disconnected and reconnected it a few times during the process.  If the
>> port was close to developing a fault this may have pushed it over the
>> edge.

The problem might have been there even before the update but the newer
kernel might be less graceful about it.

BTW, if this laptop can be opened without special tools, you should check
whether you can clean it inside to reduce the chance of local hotspots due
to dust.  If there are any internal cables, make sure they are still
firmly seated.

>> Question: If I find an old 1394 PCMCIA card will that be seen by firewire_sbp2?
>> Is there a list of known compatible cards anywhere?
>
> Seems any will work.  I have 4 or so different models of
> pcmcia/pccard plus 5 express cards (29 cards total) and a bunch of
> pci cards, and a bunch of laptops with on board ports: all of them
> work.

Indeed any one will work.  They are all OHCI-1394 compatible (except for
an ancient and extremely rare CardBus card which was TI PCILynx based
rather than OHCI based; but you won't find those cards anywhere).  And not
only are they all theoretically following the OHCI-1394 spec, by now we
also have got workarounds for the various chip quirks of the controllers
that are found on CardBus cards.

There are only some extra considerations if there are special
requirements:

  - If you were keen on FireWire 800, then you will likely only find
    TI TSB82AA2 + TSB81BA3 based CardBus cards, and TSB81BA3 comes with
    several serious errata which make it practically useless for S800β or
    any other β mode (whereas α modes work alright, i.e. classic 1394a
    "FireWire 400" etc.).

  - Several FireWire CardBus cards do not support the maximum possible
    packet size that the 1394 spec allows at a given speed.  This
    somewhat reduces throughput of asynchronous protocols like SBP-2 or
    IP-over-1394 compared to the same controller as a PCI card or as
    onboard controller.  I am not sure which chips, when used on CardBus
    cards, have that small drawback and why.  I have got CardBus cards
    with NEC, VIA, and TI chips, and of them only the TI based cards
    support the maximum packet size.

  - Folks who plan to use a FireWire audio interface on a CardBus card
    should look for an Agere or TI based card.  Actually any of the
    chipsets that are usually found on CardBus cards should do, but Agere
    and TI ones are most reliable for audio streaming and are indeed the
    only option if one wanted to use >= 2 FireWire audio devices together
    on the same bus.  I don't really know if there are Agere based
    FireWire CardBus cards, but there are a few TI based ones available.

  - VIA VT6306 has got issues with DV capture via gstreamer or dv4l
    (whereas dvgrab and kino are fine with that chip).

Another point worth noting about FireWire CardBus cards is that even if
they feature 6-pin or 9-pin ports, their ports do not provide bus power.
Some CardBus cards have a small coaxial power input socket though; if an
extra PSU is plugged into that, the 6-/ 9-pin ports do supply bus power.
Likewise with FireWire ExpressCard cards.

> Some cards need one of these loaded:
>
> # acpiphp
> # pciehp
> # yenta_socket
>
> probably the last one.  middle looks like it is for PCIE Hot Plug.

yenta_socket is the one for CardBus, the other ones are for ExpressCard or
other kinds of hotpluggable PCI Express slots.

Some ancient CardBus bridges may not be supported by yenta_socket, but web
search results suggest that Ricoh Co Ltd RL5c476 II is among the many
supported bridges.
-- 
Stefan Richter
-=====-===-- ---= ---==
http://arcgraph.de/sr/
From: Carl K. <ca...@pe...> - 2012-01-02 22:19:57
On Mon, Jan 2, 2012 at 3:53 PM, Keith Smith <kei...@ke...> wrote:
>
> Question: If I find an old 1394 PCMCIA card will that be seen firewire_sbp2?  Is there a list of known compatible cards anywhere?

Seems any will work.  I have 4 or so different models of
pcmcia/pccard plus 5 express cards (29 cards total) and a bunch of
pci cards, and a bunch of laptops with on board ports: all of them
work.

Some cards need one of these loaded:

# acpiphp
# pciehp
# yenta_socket

Probably the last one.  The middle one looks like it is for PCIE Hot Plug.

-- 
Carl K
From: Keith S. <kei...@ke...> - 2012-01-02 21:53:13
On 2 Jan 2012, at 16:48, Stefan Richter wrote:

> [an excellent analysis of the log]

> Bus resets normally only happen if devices are plugged in to or out of
> the 1394 bus or are switched on or off; or shortly thereafter when a
> bus manager software (a firmware or Linux' firewire-core etc.)
> reconfigured the bus for the changed topology, or when a device finished
> initializing its higher functions or switched some higher functions off.
>
> But I take it that there was nothing of this sort going on when you
> captured that log; i.e. the bus was reset (multiple times even) while
> only the normal traffic went on over the bus.

Correct.  There was a long running 'cp' job occurring at the time.

> This plainly means that the bus is electrically unstable.  There may
> be an unreliable connector or bad cable or badly laid out PCB or an
> overheated component or any combination of this.

As I noted in my original post I have tried several enclosures and since
then have also replaced all the cables.  The problem remains.  Given your
assessment, that would strongly suggest that I have an intermittent
component failure on the Vaio as a result of the component overheating
after sustained use, probably the port itself, and that the upgrade to
Ubuntu 11 is coincidental.

This could be.  It is an old machine and the cable would not have been
touched for a few years until the upgrade, at which point I probably
disconnected and reconnected it a few times during the process.  If the
port was close to developing a fault this may have pushed it over the
edge.

Bugger.

Question: If I find an old 1394 PCMCIA card, will that be seen by
firewire_sbp2?  Is there a list of known compatible cards anywhere?

Thanks for your help,

Keith.
From: Carl K. <ca...@pe...> - 2012-01-02 19:33:45
On Mon, Jan 2, 2012 at 9:21 AM, Stefan Richter <st...@s5...> wrote:
> On Jan 02 Carl Karsten wrote:
>> On Sun, Jan 1, 2012 at 11:04 PM, Carl Karsten <ca...@pe...> wrote:
>> > On Sun, Jan 1, 2012 at 9:50 PM, Carl Karsten <ca...@pe...> wrote:
>> >> when I plug in a firewire drive, this is all I see in dmesg:
>> >>
>> >> [10518.283056] firewire_core: phy config: card 0, new root=ffc1, gap_count=5
>> >> [10518.807684] firewire_core: created device fw1: GUID 00203702003442f7, S400
>> >>
>> >> If I boot the same box a live 10.04 all is fine.
>> >>
>> >> If I boot 11.10, plug in dv cam, kino, camera stream shows fine.
>> >>
>>
>> modprobe firewire-sbp2
>>
>> I wonder what was supposed to load that - like something in the hot plug system?
>>
>> Doesn't solve the esata, but that isn't this list's problem :)
>
> udev should automatically load firewire-sbp2 whenever an SBP-2 unit is
> detected, = when a /sys/bus/firewire/devices/fw*.*/ is created whose
> modalias matches ieee1394:ven*mo*sp0000609Ever00010483*.
>
> $ cd /sys/bus/firewire/devices/
> $ grep -E "ieee1394:ven.*mo.*sp0000609Ever00010483.*" */modalias
> fw3.0/modalias:ieee1394:ven000001D2mo00000000sp0000609Ever00010483
> fw6.0/modalias:ieee1394:ven000001D2mo00000000sp0000609Ever00010483
>
> So I have got two SBP-2 devices attached at the moment.
>
> Check whether firewire-sbp2 is blacklisted in /etc/modprobe.d/*
> or /etc/modprobe.conf.  It shouldn't be.

udev - oh right - here is my problem:

carl@dc10:~$ cat /etc/udev/rules.d/fw-beep.rules
SUBSYSTEM=="firewire", ACTION=="add", RUN="/usr/bin/sox -n -d synth .1 sin 700"

Seems I need RUN+=, otherwise my rule replaces the other things queued
up, like loading firewire-sbp2.

And I have a similar rule for beeping when cards are added/removed, thus
the esata problem.

For the record, here is the default for 11.10:

carl@dc10:~$ cat /etc/modprobe.d/blacklist-firewire.conf
# Select the legacy firewire stack over the new CONFIG_FIREWIRE one.

blacklist ohci1394
blacklist sbp2
blacklist dv1394
blacklist raw1394
blacklist video1394

#blacklist firewire-ohci
#blacklist firewire-sbp2

Not sure why they include the # lines, maybe to make it easy to swap to
the old stack.

-- 
Carl K
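The modalias check that Stefan runs with grep can also be sketched in a few lines of Python. This is only an illustration: the glob is the one quoted in the message, the sysfs path is the one from the grep example, and the function names are made up.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# Glob quoted in the message above: specifier ID 0x00609E, version 0x010483.
SBP2_MODALIAS_GLOB = "ieee1394:ven*mo*sp0000609Ever00010483*"


def is_sbp2_unit(modalias):
    """True if a unit's modalias string matches the SBP-2 glob."""
    return fnmatchcase(modalias.strip(), SBP2_MODALIAS_GLOB)


def sbp2_units(sysfs_root="/sys/bus/firewire/devices"):
    """Names of attached units (e.g. 'fw3.0') whose modalias matches.

    Returns an empty list if the directory does not exist, for example
    on a machine without the firewire stack loaded.
    """
    found = []
    for path in sorted(Path(sysfs_root).glob("*/modalias")):
        if is_sbp2_unit(path.read_text()):
            found.append(path.parent.name)
    return found


if __name__ == "__main__":
    print(sbp2_units())
```

If this lists a unit but firewire-sbp2 is not loaded, the module autoloading step is the place to look, as it turned out to be with the RUN= rule above.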
From: Stefan R. <st...@s5...> - 2012-01-02 16:48:29
On Jan 01 Keith Smith wrote:
> Hi all,
>
> This is a resend as my previous post seemed to come through without content.... trying again...

It does show up empty in sourceforge-net's list archive indeed, but it is
readable just fine the way it arrived at list subscribers and in these two
list archives:
http://marc.info/?l=linux1394-user
http://news.gmane.org/gmane.linux.kernel.firewire.user

Furthermore, I think your first message went into sourceforge-net's
moderation queue, perhaps because they found something in the HTML part of
the message.

> ---
>
>
> Hi guys,
>
> I'm hoping you can help me out here, or point me in the direction of somewhere better.  I've had a quick look at the list archive but didn't see my exact problem, but apologies in advance if I missed something.
>
> I recently upgraded an old laptop I use for file and print (Sony Vaio PCG-GR215SP(DE) Pentium IIIm 1.2GHz) from Ubuntu 8 to 11.04 and now I am having real problems with firewire connected disks.
>
> In short, after several hours of good operation "something" causes the bus to sbp2_scsi_abort, usually during a write operation, triggering a cascade failure where the drive winds up remounted read-only and unreliable (an umount in this state will sometimes hang).
> > Here is a typical log of events: > http://pastebin.ca/2097049 It is short enough; I am including it here for easier reading: ··········································································· Dec 28 08:38:26 knuth kernel: [33369.024072] firewire_sbp2: fw2.0: sbp2_scsi_abort Dec 28 08:38:30 knuth kernel: [33372.524063] firewire_sbp2: fw1.0: orb reply timed out, rcode=0x11 Dec 28 08:38:30 knuth kernel: [33372.524157] firewire_sbp2: fw1.0: failed to reconnect Dec 28 08:38:30 knuth kernel: [33372.539057] firewire_sbp2: fw2.0: orb reply timed out, rcode=0x00 Dec 28 08:38:30 knuth kernel: [33372.539123] firewire_sbp2: fw2.0: failed to reconnect Dec 28 08:38:30 knuth kernel: [33372.539199] sd 8:0:0:0: Device offlined - not ready after error recovery Dec 28 08:38:30 knuth kernel: [33372.539261] sd 8:0:0:0: [sdd] Unhandled error code Dec 28 08:38:30 knuth kernel: [33372.539269] sd 8:0:0:0: [sdd] Result: hostbyte=DID_BUS_BUSY driverbyte=DRIVER_OK Dec 28 08:38:30 knuth kernel: [33372.539281] sd 8:0:0:0: [sdd] CDB: Write(10): 2a 00 2f 5c fe f8 00 04 00 00 Dec 28 08:38:30 knuth kernel: [33372.539307] end_request: I/O error, dev sdd, sector 794623736 Dec 28 08:38:30 knuth kernel: [33372.539375] Buffer I/O error on device sdd1, logical block 99327711 Dec 28 08:38:30 knuth kernel: [33372.539443] lost page write due to I/O error on sdd1 Dec 28 08:38:30 knuth kernel: [33372.539459] Buffer I/O error on device sdd1, logical block 99327712 Dec 28 08:38:30 knuth kernel: [33372.539528] lost page write due to I/O error on sdd1 Dec 28 08:38:30 knuth kernel: [33372.539538] Buffer I/O error on device sdd1, logical block 99327713 Dec 28 08:38:30 knuth kernel: [33372.539606] lost page write due to I/O error on sdd1 Dec 28 08:38:30 knuth kernel: [33372.539617] Buffer I/O error on device sdd1, logical block 99327714 Dec 28 08:38:30 knuth kernel: [33372.539685] lost page write due to I/O error on sdd1 Dec 28 08:38:30 knuth kernel: [33372.539695] Buffer I/O error on device sdd1, 
logical block 99327715 Dec 28 08:38:30 knuth kernel: [33372.539763] lost page write due to I/O error on sdd1 Dec 28 08:38:30 knuth kernel: [33372.539773] Buffer I/O error on device sdd1, logical block 99327716 Dec 28 08:38:30 knuth kernel: [33372.539840] lost page write due to I/O error on sdd1 Dec 28 08:38:30 knuth kernel: [33372.539851] Buffer I/O error on device sdd1, logical block 99327717 Dec 28 08:38:30 knuth kernel: [33372.539918] lost page write due to I/O error on sdd1 Dec 28 08:38:30 knuth kernel: [33372.539929] Buffer I/O error on device sdd1, logical block 99327718 Dec 28 08:38:30 knuth kernel: [33372.539996] lost page write due to I/O error on sdd1 Dec 28 08:38:30 knuth kernel: [33372.540040] Buffer I/O error on device sdd1, logical block 99327719 Dec 28 08:38:30 knuth kernel: [33372.540109] lost page write due to I/O error on sdd1 Dec 28 08:38:30 knuth kernel: [33372.540121] Buffer I/O error on device sdd1, logical block 99327720 Dec 28 08:38:30 knuth kernel: [33372.540189] lost page write due to I/O error on sdd1 Dec 28 08:38:30 knuth kernel: [33372.540420] sd 8:0:0:0: rejecting I/O to offline device Dec 28 08:38:30 knuth kernel: [33372.540536] sd 8:0:0:0: [sdd] Unhandled error code Dec 28 08:38:30 knuth kernel: [33372.540545] sd 8:0:0:0: [sdd] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK Dec 28 08:38:30 knuth kernel: [33372.540556] sd 8:0:0:0: [sdd] CDB: Write(10): 2a 00 2f 5d 02 f8 00 02 68 00 Dec 28 08:38:30 knuth kernel: [33372.540581] end_request: I/O error, dev sdd, sector 794624760 Dec 28 08:38:30 knuth kernel: [33372.540786] sd 8:0:0:0: rejecting I/O to offline device Dec 28 08:38:30 knuth kernel: [33372.548060] sd 8:0:0:0: rejecting I/O to offline device Dec 28 08:38:30 knuth kernel: [33372.548354] sd 8:0:0:0: rejecting I/O to offline device Dec 28 08:38:30 knuth kernel: [33372.548632] sd 8:0:0:0: rejecting I/O to offline device Dec 28 08:38:30 knuth kernel: [33372.548906] sd 8:0:0:0: rejecting I/O to offline device Dec 28 08:38:30 
knuth kernel: [33372.549184] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: [33372.549456] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: [33372.549730] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: [33372.550004] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: [33372.550283] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: [33372.550560] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: [33372.550834] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: [33372.551111] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: [33372.551388] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: [33372.551663] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: [33372.551941] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: [33372.552021] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: last message repeated 57 times
Dec 28 08:38:30 knuth kernel: [33372.614729] JBD2: Detected IO errors while flushing file data on sdd1-8
Dec 28 08:38:30 knuth kernel: [33372.614776] Aborting journal on device sdd1-8.
Dec 28 08:38:30 knuth kernel: [33372.615398] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: [33372.616067] EXT4-fs (sdd1): delayed block allocation failed for inode 51642472 at logical offset 0 with max blocks 2048 with error -30
Dec 28 08:38:30 knuth kernel: [33372.616825] EXT4-fs (sdd1): This should not happen!! Data will be lost
Dec 28 08:38:30 knuth kernel: [33372.616830]
Dec 28 08:38:30 knuth kernel: [33372.620065] EXT4-fs error (device sdd1) in ext4_reserve_inode_write:5619: Journal has aborted
Dec 28 08:38:30 knuth kernel: [33372.621072] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: [33372.622084] EXT4-fs (sdd1): Remounting filesystem read-only
Dec 28 08:38:30 knuth kernel: [33372.623116] EXT4-fs error (device sdd1) in ext4_dirty_inode:5746: Journal has aborted
Dec 28 08:38:30 knuth kernel: [33372.624247] EXT4-fs (sdd1): previous I/O error to superblock detected
Dec 28 08:38:30 knuth kernel: [33372.625395] JBD2: I/O error detected when updating journal superblock for sdd1-8.
Dec 28 08:38:30 knuth kernel: [33372.628596] ------------[ cut here ]------------
Dec 28 08:38:30 knuth kernel: [33372.629854] kernel BUG at /build/buildd/linux-2.6.38/fs/ext4/inode.c:2188!
Dec 28 08:38:30 knuth kernel: [33372.631170] invalid opcode: 0000 [#1] SMP
Dec 28 08:38:30 knuth kernel: [33372.632026] last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A03:00/device:0f/PNP0C09:00/ACPI0003:00/power_supply/ACAD/online
Dec 28 08:38:30 knuth kernel: [33372.632026] Modules linked in: nls_utf8 hfsplus appletalk radeon snd_intel8x0 snd_ac97_codec ac97_bus ttm snd_pcm drm_kms_helper pcmcia snd_timer snd drm joydev i2c_algo_bit yenta_socket soundcore ppdev pcmcia_rsrc snd_page_alloc firewire_sbp2 pcmcia_core parport_pc sony_laptop psmouse intel_rng shpchp serio_raw lp parport firewire_ohci usb_storage e100 firewire_core crc_itu_t
Dec 28 08:38:30 knuth kernel: [33372.632026]
Dec 28 08:38:30 knuth kernel: [33372.632026] Pid: 2177, comm: flush-8:48 Not tainted 2.6.38-13-generic-pae #53-Ubuntu Sony Corporation PCG-GR215SP(DE)
Dec 28 08:38:30 knuth kernel: [33372.632026] EIP: 0060:[<c11b9676>] EFLAGS: 00010246 CPU: 0
Dec 28 08:38:30 knuth kernel: [33372.632026] EIP is at ext4_da_block_invalidatepages.clone.36+0xf6/0x100
Dec 28 08:38:30 knuth kernel: [33372.632026] EAX: 40000024 EBX: dfce5460 ECX: cd515c48 EDX: dfbe7aa0
Dec 28 08:38:30 knuth kernel: [33372.632026] ESI: 00000800 EDI: 000007ff EBP: cd515c8c ESP: cd515c34
Dec 28 08:38:30 knuth kernel: [33372.632026] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Dec 28 08:38:30 knuth kernel: [33372.632026] Process flush-8:48 (pid: 2177, ti=cd514000 task=dd411940 task.ti=cd514000)
Dec 28 08:38:30 knuth kernel: [33372.632026] Stack:
Dec 28 08:38:30 knuth kernel: [33372.632026] 0000000e dcd563b0 0000000e 0000000e 00000000 dfce5460 dfd262a0 dfe33380
Dec 28 08:38:30 knuth kernel: [33372.632026] dfb220c0 dfcae1a0 dfdf6da0 dfc0dae0 dfb605e0 dfc31ee0 dfb68b80 dfdf2fe0
Dec 28 08:38:30 knuth kernel: [33372.632026] dfc8e140 dfcc0ae0 dfbe7aa0 00000000 00000000 cd515e20 cd515cec c11bfb97
Dec 28 08:38:30 knuth kernel: [33372.632026] Call Trace:
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c11bfb97>] mpage_da_map_and_submit+0x257/0x500
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c1281c22>] ? radix_tree_gang_lookup_tag_slot+0x82/0xc0
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c11bfeb7>] mpage_add_bh_to_extent+0x77/0x100
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c10ebe67>] ? find_get_pages_tag+0x37/0xf0
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c11c0030>] __mpage_da_writepage+0xf0/0x190
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c11c0214>] write_cache_pages_da+0x144/0x220
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c11c05a2>] ext4_da_writepages+0x2b2/0x570
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c10f4a4c>] do_writepages+0x1c/0x40
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c11540ef>] writeback_single_inode+0x7f/0x1e0
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c1154442>] writeback_sb_inodes+0x92/0x120
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c1154636>] writeback_inodes_wb+0xc6/0x150
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c1154954>] wb_writeback+0x294/0x350
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c153225f>] ? _raw_spin_lock_irqsave+0x2f/0x50
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c1149cfd>] ? get_nr_dirty_inodes+0xd/0x20
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c1154bc3>] wb_do_writeback+0x1b3/0x1c0
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c1154c41>] bdi_writeback_thread+0x71/0x200
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c1154bd0>] ? bdi_writeback_thread+0x0/0x200
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c1076864>] kthread+0x74/0x80
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c10767f0>] ? kthread+0x0/0x80
Dec 28 08:38:30 knuth kernel: [33372.632026] [<c100b0fe>] kernel_thread_helper+0x6/0x10
Dec 28 08:38:30 knuth kernel: [33372.632026] Code: 55 b0 8b 44 95 b8 8b 58 14 8b 45 b4 83 c3 01 85 c0 74 08 8d 45 b4 e8 9a c9 f3 ff 39 df 0f 83 72 ff ff ff 83 c4 4c 5b 5e 5f 5d c3 <0f> 0b 0f 0b 8d b6 00 00 00 00 55 89 e5 57 56 53 83 ec 0c 3e 8d
Dec 28 08:38:30 knuth kernel: [33372.632026] EIP: [<c11b9676>] ext4_da_block_invalidatepages.clone.36+0xf6/0x100 SS:ESP 0068:cd515c34
Dec 28 08:38:30 knuth kernel: [33372.774790] ---[ end trace fdef3dcca53b5b6a ]---
Dec 28 08:38:30 knuth kernel: [33372.780024] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: [33372.827160] firewire_sbp2: fw1.0: error status: 0:4
Dec 28 08:38:30 knuth kernel: [33373.061153] firewire_sbp2: fw1.0: logged in to LUN 0000 (0 retries)
Dec 28 08:38:30 knuth kernel: [33373.084892] firewire_sbp2: fw2.0: logged in to LUN 0000 (0 retries)
···········································································
>
> > cat /proc/version:
> Linux version 2.6.38-13-generic-pae (buildd@roseapple) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4) ) #53-Ubuntu SMP Mon Nov 28 19:41:58 UTC 2011
>
> > lsmod | grep -e 1394 -e firewire:
> firewire_ohci 31504 0
> firewire_core 56138 1 firewire_ohci
> crc_itu_t 12627 1 firewire_core
>
> >lspci
> 00:00.0 Host bridge: Intel Corporation 82830 830 Chipset Host Bridge (rev 02)
> 00:01.0 PCI bridge: Intel Corporation 82830 830 Chipset AGP Bridge (rev 02)
> 00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB Controller #1 (rev 01)
> 00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB Controller #2 (rev 01)
> 00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB Controller #3 (rev 01)
> 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 41)
> 00:1f.0 ISA bridge: Intel Corporation 82801CAM ISA Bridge (LPC) (rev 01)
> 00:1f.1 IDE interface: Intel Corporation 82801CAM IDE U100 Controller (rev 01)
> 00:1f.3 SMBus: Intel Corporation 82801CA/CAM SMBus Controller (rev 01)
> 00:1f.5 Multimedia audio controller: Intel Corporation 82801CA/CAM AC'97 Audio Controller (rev 01)
> 00:1f.6 Modem: Intel Corporation 82801CA/CAM AC'97 Modem Controller (rev 01)
> 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility M6 LY
> 02:02.0 FireWire (IEEE 1394): Texas Instruments TSB43AA22 IEEE-1394 Controller (PHY/Link Integrated) (rev 02)
> 02:05.0 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev 80)
> 02:05.1 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev 80)
> 02:08.0 Ethernet controller: Intel Corporation 82801CAM (ICH3) PRO/100 VE (LOM) Ethernet Controller (rev 41)
>
>
> I'm not sure what the make/model of the adapter card is, but it will be whatever the onboard adapter is for the Vaio.
>
> I have tried this with several drives and two enclosures (all of which work fine on other machines) so I don't think the problem is there.
>
> Happy to provide additional info.
>
> As you can expect this is causing me real grief. I'm really hoping you can help me out here.
>
>
> Regards,
> Keith.
>
>

Here is the log once more, slightly reordered and with explanations:

Dec 28 08:38:26 knuth kernel: [33369.024072] firewire_sbp2: fw2.0: sbp2_scsi_abort

A SCSI request timed out for whatever reason. I.e. the SBP-2 bridge in the target device did not complete the SCSI transaction within the time-out which Linux' SCSI core set for that transaction.
Those time-outs are usually huge, so this generally means that the target became unresponsive at least for that one request.

A SCSI request timeout itself is not a problem per se because the Linux SCSI core is supposed to retry failed requests. Of course it does not retry indefinitely, but usually two times or so.

Dec 28 08:38:30 knuth kernel: [33372.524063] firewire_sbp2: fw1.0: orb reply timed out, rcode=0x11
Dec 28 08:38:30 knuth kernel: [33372.524157] firewire_sbp2: fw1.0: failed to reconnect

A 1394 bus reset happened, firewire-sbp2 tried to reconnect after the bus reset, but it failed because a second bus reset happened. (rcode 0x11 = "RCODE_CANCELLED" = the SBP-2 reconnect transaction overlapped with a bus reset if I'm not mistaken.)

Bus resets normally only happen if devices are plugged in to or out of the 1394 bus or are switched on or off; or shortly thereafter when bus manager software (a firmware or Linux' firewire-core etc.) reconfigured the bus for the changed topology, or when a device finished initializing its higher functions or switched some higher functions off.

But I take it that there was nothing of this sort going on when you captured that log; i.e. the bus was reset (multiple times even) while only the normal traffic went on over the bus. This plainly means that the bus is electrically unstable. There may be an unreliable connector or bad cable or badly laid out PCB or an overheated component or any combination of these.

Dec 28 08:38:30 knuth kernel: [33372.539057] firewire_sbp2: fw2.0: orb reply timed out, rcode=0x00
Dec 28 08:38:30 knuth kernel: [33372.539123] firewire_sbp2: fw2.0: failed to reconnect

firewire-sbp2 also tried to reconnect to the second device. It apparently was able to initiate the SBP-2 reconnect transaction but did not get reconnection status before time-out.

Dec 28 08:38:30 knuth kernel: [33372.827160] firewire_sbp2: fw1.0: error status: 0:4

This error status belongs to a reconnect retry. "0:4" means "request complete, access denied". This can happen if the target already forgot the former login status when it handles this reconnect request.

Dec 28 08:38:30 knuth kernel: [33373.061153] firewire_sbp2: fw1.0: logged in to LUN 0000 (0 retries)

When SBP-2 reconnect was rejected, the driver tries an SBP-2 login instead. This one immediately succeeded.

Dec 28 08:38:30 knuth kernel: [33373.084892] firewire_sbp2: fw2.0: logged in to LUN 0000 (0 retries)

It also succeeded in logging in to the second SBP-2 device again.

But alas, the Linux SCSI core has been busily dealing with one of the two devices in the meantime and decided that there were too many SCSI transaction failures in a row and that it therefore should give up on that device (take it "offline"):

Dec 28 08:38:30 knuth kernel: [33372.539199] sd 8:0:0:0: Device offlined - not ready after error recovery
Dec 28 08:38:30 knuth kernel: [33372.539261] sd 8:0:0:0: [sdd] Unhandled error code
Dec 28 08:38:30 knuth kernel: [33372.539269] sd 8:0:0:0: [sdd] Result: hostbyte=DID_BUS_BUSY driverbyte=DRIVER_OK
Dec 28 08:38:30 knuth kernel: [33372.539281] sd 8:0:0:0: [sdd] CDB: Write(10): 2a 00 2f 5c fe f8 00 04 00 00

The DID_BUS_BUSY error number was generated by firewire-sbp2 to tell the SCSI core error handler that right now, commands cannot be sent and it should try again later. In this case, the bus was "busy" because firewire-sbp2 had to reconnect or re-login after those bus resets.

I am surprised though that SCSI core appears so fragile here. Its error handler should easily get through a typical 1394 bus reset period. Admittedly, I am not actively testing such conditions with every new kernel; maybe newer ones have become less resilient (or shall I say, *even less* resilient) compared to older ones.

One thought that I have had for a while now but haven't been able to try out yet: Perhaps firewire-sbp2's bus reset handling should be changed such that it holds on to a pending SCSI request and sends it again after reset. That way, SCSI core does not have to requeue that request but only sees that the request is taking a little bit longer than usual and thus the (currently only one request deep) request queue is filled for that entire time. Of course, that would be an optimization for normal bus resets and might still not save you from bus resets due to malfunctioning hardware.

Dec 28 08:38:30 knuth kernel: [33372.539307] end_request: I/O error, dev sdd, sector 794623736
Dec 28 08:38:30 knuth kernel: [33372.539375] Buffer I/O error on device sdd1, logical block 99327711
Dec 28 08:38:30 knuth kernel: [33372.539443] lost page write due to I/O error on sdd1
[...]
Dec 28 08:38:30 knuth kernel: [33372.540121] Buffer I/O error on device sdd1, logical block 99327720
Dec 28 08:38:30 knuth kernel: [33372.540189] lost page write due to I/O error on sdd1

Now that SCSI core gave up on the device, all previously queued up block I/Os fail.

Dec 28 08:38:30 knuth kernel: [33372.540420] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: [33372.540536] sd 8:0:0:0: [sdd] Unhandled error code
Dec 28 08:38:30 knuth kernel: [33372.540545] sd 8:0:0:0: [sdd] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Dec 28 08:38:30 knuth kernel: [33372.540556] sd 8:0:0:0: [sdd] CDB: Write(10): 2a 00 2f 5d 02 f8 00 02 68 00
Dec 28 08:38:30 knuth kernel: [33372.540581] end_request: I/O error, dev sdd, sector 794624760

Hmm, DID_NO_CONNECT? I wonder where this comes from. Maybe SCSI core is setting this for offline devices; not sure.

Dec 28 08:38:30 knuth kernel: [33372.540786] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: [33372.548060] sd 8:0:0:0: rejecting I/O to offline device
[...]
Dec 28 08:38:30 knuth kernel: [33372.552021] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: last message repeated 57 times

Higher levels are a bit slow to notice that they don't get a chance anymore.

Dec 28 08:38:30 knuth kernel: [33372.614729] JBD2: Detected IO errors while flushing file data on sdd1-8
Dec 28 08:38:30 knuth kernel: [33372.614776] Aborting journal on device sdd1-8.
Dec 28 08:38:30 knuth kernel: [33372.615398] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: [33372.616067] EXT4-fs (sdd1): delayed block allocation failed for inode 51642472 at logical offset 0 with max blocks 2048 with error -30
Dec 28 08:38:30 knuth kernel: [33372.616825] EXT4-fs (sdd1): This should not happen!! Data will be lost
Dec 28 08:38:30 knuth kernel: [33372.616830]
Dec 28 08:38:30 knuth kernel: [33372.620065] EXT4-fs error (device sdd1) in ext4_reserve_inode_write:5619: Journal has aborted
Dec 28 08:38:30 knuth kernel: [33372.621072] sd 8:0:0:0: rejecting I/O to offline device
Dec 28 08:38:30 knuth kernel: [33372.622084] EXT4-fs (sdd1): Remounting filesystem read-only
Dec 28 08:38:30 knuth kernel: [33372.623116] EXT4-fs error (device sdd1) in ext4_dirty_inode:5746: Journal has aborted
Dec 28 08:38:30 knuth kernel: [33372.624247] EXT4-fs (sdd1): previous I/O error to superblock detected
Dec 28 08:38:30 knuth kernel: [33372.625395] JBD2: I/O error detected when updating journal superblock for sdd1-8.

The filesystem layer doesn't take it too well. In fact, the failure condition triggers a bug in the ext4 filesystem code:

Dec 28 08:38:30 knuth kernel: [33372.628596] ------------[ cut here ]------------
Dec 28 08:38:30 knuth kernel: [33372.629854] kernel BUG at /build/buildd/linux-2.6.38/fs/ext4/inode.c:2188!
Dec 28 08:38:30 knuth kernel: [33372.631170] invalid opcode: 0000 [#1] SMP
[...]
Dec 28 08:38:30 knuth kernel: [33372.632026] EIP: [<c11b9676>] ext4_da_block_invalidatepages.clone.36+0xf6/0x100 SS:ESP 0068:cd515c34
Dec 28 08:38:30 knuth kernel: [33372.774790] ---[ end trace fdef3dcca53b5b6a ]---
Dec 28 08:38:30 knuth kernel: [33372.780024] sd 8:0:0:0: rejecting I/O to offline device

Such bugs shouldn't exist, but they do, as is quite commonly seen when hotpluggable devices are suddenly removed without prior umount.

Anyway, all of this is just the aftermath of the "Device offlined - not ready after error recovery", which in turn was obviously caused by a sequence of bus resets which
  - should be handled more gracefully by the stack of SBP-2/SCSI/block drivers,
  - probably happened due to unreliable hardware.
The latter means that even a more resilient driver implementation might not save you.
--
Stefan Richter
-=====-===-- ---= ---=-
http://arcgraph.de/sr/ |
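As a side note on the log above, the two Write(10) CDBs that sd printed can be decoded by hand to see which sectors were affected. The following is only a minimal sketch, assuming the standard 10-byte WRITE(10) layout (opcode 0x2a, big-endian LBA in bytes 2..5, big-endian transfer length in bytes 7..8); the function name `decode_write10` is made up for illustration and is not part of any driver.

```python
def decode_write10(cdb_hex: str) -> dict:
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    return {
        # Bytes 2..5: logical block address, big-endian.
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7..8: number of blocks to transfer, big-endian.
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


# The two CDBs quoted in the log above.
first = decode_write10("2a 00 2f 5c fe f8 00 04 00 00")
second = decode_write10("2a 00 2f 5d 02 f8 00 02 68 00")
print(first)
print(second)
```

The decoded LBAs agree with the "end_request: I/O error, dev sdd, sector ..." lines that follow each CDB in the log, and the second write starts exactly where the first one would have ended, i.e. both belong to one contiguous stream of writes.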
From: Stefan R. <st...@s5...> - 2012-01-02 15:21:40
|
On Jan 02 Carl Karsten wrote:
> On Sun, Jan 1, 2012 at 11:04 PM, Carl Karsten <ca...@pe...> wrote:
> > On Sun, Jan 1, 2012 at 9:50 PM, Carl Karsten <ca...@pe...> wrote:
> >> when I plug in a firewire drive, this is all I see in dmesg:
> >>
> >> [10518.283056] firewire_core: phy config: card 0, new root=ffc1, gap_count=5
> >> [10518.807684] firewire_core: created device fw1: GUID 00203702003442f7, S400
> >>
> >> If I boot the same box a live 10.04 all is fine.
> >>
> >> If I boot 11.10, plug in dv cam, kino, camera stream shows fine.
> >>
>
> modprobe firewire-sbp2
>
> I wonder what was supposed to load that - like something in the hot plug system?
>
> Doesn't solve the esata, but that isn't this list's problem :)

udev should automatically load firewire-sbp2 whenever an SBP-2 unit is detected, i.e. when a /sys/bus/firewire/devices/fw*.*/ is created whose modalias matches ieee1394:ven*mo*sp0000609Ever00010483*.

$ cd /sys/bus/firewire/devices/
$ grep -E "ieee1394:ven.*mo.*sp0000609Ever00010483.*" */modalias
fw3.0/modalias:ieee1394:ven000001D2mo00000000sp0000609Ever00010483
fw6.0/modalias:ieee1394:ven000001D2mo00000000sp0000609Ever00010483

So I have got two SBP-2 devices attached at the moment.

Check whether firewire-sbp2 is blacklisted in /etc/modprobe.d/* or /etc/modprobe.conf. It shouldn't be.
--
Stefan Richter
-=====-===-- ---= ---=-
http://arcgraph.de/sr/ |
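The modalias check from the message above can also be scripted, for instance to verify on a given box whether any attached unit ought to have triggered loading of firewire-sbp2. This is only a rough sketch: the glob pattern and sysfs path are the ones quoted in the message, while the helper names `is_sbp2_modalias` and `find_sbp2_units` are invented for this example. It does not look at modprobe blacklists.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# Modalias glob for SBP-2 units, as quoted in the message above.
SBP2_MODALIAS = "ieee1394:ven*mo*sp0000609Ever00010483*"


def is_sbp2_modalias(modalias: str) -> bool:
    """True if a unit's modalias string matches the SBP-2 pattern."""
    return fnmatchcase(modalias.strip(), SBP2_MODALIAS)


def find_sbp2_units(sysfs_dir: str = "/sys/bus/firewire/devices") -> list:
    """Return names such as 'fw3.0' of all units whose modalias is SBP-2."""
    units = []
    for path in sorted(Path(sysfs_dir).glob("fw*.*/modalias")):
        if is_sbp2_modalias(path.read_text()):
            units.append(path.parent.name)
    return units


if __name__ == "__main__":
    # Prints an empty list if no SBP-2 unit (or no FireWire bus) is present.
    print(find_sbp2_units())
```

If this lists a unit but `lsmod` shows no firewire_sbp2, the automatic module loading is what failed, which points at udev or at a blacklist entry rather than at the bus or the device.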
From: Carl K. <ca...@pe...> - 2012-01-02 07:08:50
|
On Sun, Jan 1, 2012 at 11:04 PM, Carl Karsten <ca...@pe...> wrote:
> On Sun, Jan 1, 2012 at 9:50 PM, Carl Karsten <ca...@pe...> wrote:
>> when I plug in a firewire drive, this is all I see in dmesg:
>>
>> [10518.283056] firewire_core: phy config: card 0, new root=ffc1, gap_count=5
>> [10518.807684] firewire_core: created device fw1: GUID 00203702003442f7, S400
>>
>> If I boot the same box a live 10.04 all is fine.
>>
>> If I boot 11.10, plug in dv cam, kino, camera stream shows fine.
>>

modprobe firewire-sbp2

I wonder what was supposed to load that - like something in the hot plug system?

Doesn't solve the esata, but that isn't this list's problem :)

--
Carl K |
From: Carl K. <ca...@pe...> - 2012-01-02 05:04:48
|
On Sun, Jan 1, 2012 at 9:50 PM, Carl Karsten <ca...@pe...> wrote:
> when I plug in a firewire drive, this is all I see in dmesg:
>
> [10518.283056] firewire_core: phy config: card 0, new root=ffc1, gap_count=5
> [10518.807684] firewire_core: created device fw1: GUID 00203702003442f7, S400
>
> If I boot the same box a live 10.04 all is fine.
>
> If I boot 11.10, plug in dv cam, kino, camera stream shows fine.
>

Similar problem when I plugged in an ECcard esata card and then plugged in a drive: there is dmesg activity when the card is added, but nothing when the drive is plugged in.

--
Carl K |
From: Carl K. <ca...@pe...> - 2012-01-02 03:50:48
|
when I plug in a firewire drive, this is all I see in dmesg:

[10518.283056] firewire_core: phy config: card 0, new root=ffc1, gap_count=5
[10518.807684] firewire_core: created device fw1: GUID 00203702003442f7, S400

If I boot the same box a live 10.04 all is fine.

If I boot 11.10, plug in dv cam, kino, camera stream shows fine.

-
Carl K |
From: Keith S. <kei...@ke...> - 2012-01-01 20:48:24
|
Hi all,

This is a resend as my previous post seemed to come through without content.... trying again...

---

Hi guys,

I'm hoping you can help me out here, or point me in the direction of somewhere better. I've had a quick look at the list archive but didn't see my exact problem, but apologies in advance if I missed something.

I recently upgraded an old laptop I use for file and print (Sony Vaio PCG-GR215SP(DE) Pentium IIIm 1.2GHz) from Ubuntu 8 to 11.04 and now I am having real problems with firewire connected disks. In short, after several hours of good operation "something" causes the bus to sbp2_scsi_abort, usually during a write operation, triggering a cascade failure where the drive winds up remounted read-only and unreliable (an umount in this state will sometimes hang).

Here is a typical log of events: http://pastebin.ca/2097049

> cat /proc/version:
Linux version 2.6.38-13-generic-pae (buildd@roseapple) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4) ) #53-Ubuntu SMP Mon Nov 28 19:41:58 UTC 2011

> lsmod | grep -e 1394 -e firewire:
firewire_ohci 31504 0
firewire_core 56138 1 firewire_ohci
crc_itu_t 12627 1 firewire_core

>lspci
00:00.0 Host bridge: Intel Corporation 82830 830 Chipset Host Bridge (rev 02)
00:01.0 PCI bridge: Intel Corporation 82830 830 Chipset AGP Bridge (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB Controller #3 (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 41)
00:1f.0 ISA bridge: Intel Corporation 82801CAM ISA Bridge (LPC) (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801CAM IDE U100 Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801CA/CAM SMBus Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corporation 82801CA/CAM AC'97 Audio Controller (rev 01)
00:1f.6 Modem: Intel Corporation 82801CA/CAM AC'97 Modem Controller (rev 01)
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility M6 LY
02:02.0 FireWire (IEEE 1394): Texas Instruments TSB43AA22 IEEE-1394 Controller (PHY/Link Integrated) (rev 02)
02:05.0 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev 80)
02:05.1 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev 80)
02:08.0 Ethernet controller: Intel Corporation 82801CAM (ICH3) PRO/100 VE (LOM) Ethernet Controller (rev 41)

I'm not sure what the make/model of the adapter card is, but it will be whatever the onboard adapter is for the Vaio.

I have tried this with several drives and two enclosures (all of which work fine on other machines) so I don't think the problem is there.

Happy to provide additional info.

As you can expect this is causing me real grief. I'm really hoping you can help me out here.

Regards,
Keith. |
From: Keith S. <kei...@ke...> - 2011-12-28 19:31:41
|
Hi guys,

I'm hoping you can help me out here, or point me in the direction of somewhere better. I've had a quick look at the list archive but didn't see my exact problem, but apologies in advance if I missed something.

I recently upgraded an old laptop I use for file and print (Sony Vaio PCG-GR215SP(DE) Pentium IIIm 1.2GHz) from Ubuntu 8 to 11.04 and now I am having real problems with firewire connected disks. In short, after several hours of good operation "something" causes the bus to sbp2_scsi_abort, usually during a write operation, triggering a cascade failure where the drive winds up remounted read-only and unreliable (an umount in this state will sometimes hang).

Here is a typical log of events: http://pastebin.ca/2097049

> cat /proc/version:
Linux version 2.6.38-13-generic-pae (buildd@roseapple) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4) ) #53-Ubuntu SMP Mon Nov 28 19:41:58 UTC 2011

> lsmod | grep -e 1394 -e firewire:
firewire_ohci 31504 0
firewire_core 56138 1 firewire_ohci
crc_itu_t 12627 1 firewire_core

>lspci
00:00.0 Host bridge: Intel Corporation 82830 830 Chipset Host Bridge (rev 02)
00:01.0 PCI bridge: Intel Corporation 82830 830 Chipset AGP Bridge (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB Controller #3 (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 41)
00:1f.0 ISA bridge: Intel Corporation 82801CAM ISA Bridge (LPC) (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801CAM IDE U100 Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801CA/CAM SMBus Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corporation 82801CA/CAM AC'97 Audio Controller (rev 01)
00:1f.6 Modem: Intel Corporation 82801CA/CAM AC'97 Modem Controller (rev 01)
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility M6 LY
02:02.0 FireWire (IEEE 1394): Texas Instruments TSB43AA22 IEEE-1394 Controller (PHY/Link Integrated) (rev 02)
02:05.0 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev 80)
02:05.1 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev 80)
02:08.0 Ethernet controller: Intel Corporation 82801CAM (ICH3) PRO/100 VE (LOM) Ethernet Controller (rev 41)

I'm not sure what the make/model of the adapter card is, but it will be whatever the onboard adapter is for the Vaio.

I have tried this with several drives and two enclosures (all of which work fine on other machines) so I don't think the problem is there.

Happy to provide additional info.

As you can expect this is causing me real grief. I'm really hoping you can help me out here.

Regards,
Keith. |
From: Stefan R. <st...@s5...> - 2011-12-25 21:51:28
|
On Nov 26 Michal Suchanek wrote:
> On 26 November 2011 10:36, Stefan Richter <st...@s5...> wrote:
> > On Nov 22 Jack Chaney wrote:
> > [block layer bugs WRT device removal]
> >> I have been relying on Ubuntu 11.10 and I must reboot to minimize
> >> expose to this problem. Also, I redistribute a shareware application
> >> in a bootable iso file, and I need to remaster when this is fixed.
> >>
> >> Please send a heads-up when you know anything positive about this
> >> problem. Thanks.
> >
> > This is not what you are waiting for to hear, but alas the current
> > 3.2.0-rc still contains issues of this kind:
> > http://marc.info/?l=linux-scsi&m=132214509516626
>
> For me the latest 3.2 rc fixed an issue with kernel bug when removing
> an USB floppy so there are some improvements.
>
> Or maybe I was just lucky and did not see the error because the block
> device happened to always go away at the right time.
>
> 3.2 is worth some testing I guess, though.

3.2-rc7 panics if a program has a /dev/sr* open and issues an ioctl after the device was unplugged. Happens with 1394 CD-ROMs and USB CD-ROMs and is 100% reproducible.
http://marc.info/?l=linux-scsi&m=132484684720593

When you plan to unplug a CD-ROM or card reader or any other drive which announced itself as a removable media device (or when you don't plan to unplug it but happen to trip over the cable), and you don't think you have any program running anymore which still has a /dev/sr* open, then remember: If udisks-daemon or hald is running, it may open the device at any moment and thus trigger the same kernel panic right after device removal.
--
Stefan Richter
-=====-==-== ==-- ==--=
http://arcgraph.de/sr/ |
From: Stefan R. <st...@s5...> - 2011-12-20 20:18:47
|
On Dec 20 Carl Karsten wrote:
> un/re-plugging the cable. waiting a few seconds between each (but not
> watching syslog, so maybe didn't wait long enough...) - 3 or 4 worked
> fine, fw2,3 created/destroyed as expected, then I unplug, and fw2
> doesn't destroy:
>
> juser@pc8:~$ ls /dev/fw?
> /dev/fw0 /dev/fw1 /dev/fw2
>
> and syslog has a ton of
> Dec 20 09:08:22 pc8 kernel: [ 2325.059013] firewire_ohci: AR
> evt_bus_reset, generation 0
> .... 0-255
> loop that 25 times.

This can happen if a process keeps /dev/fw2 open, but in this case I think the problem is that your bus reset series was never finished by a proper self-ID-complete event. Only once the controller has received selfIDs after a bus reset does firewire-core get to work and check which nodes are there and which are gone.

[...]
> Dec 20 09:08:23 pc8 kernel: [ 2325.704727] firewire_ohci: 2 selfIDs, generation 30, local node ID ffc1
> Dec 20 09:08:23 pc8 kernel: [ 2325.704744] firewire_ohci: selfID 0: 80458882, phy 0 [p..] S400 gc=5 +0W Lci
> Dec 20 09:08:23 pc8 kernel: [ 2325.704758] firewire_ohci: selfID 0: 8145ccd4, phy 1 [c--] beta gc=5 -3W Lc
> Dec 20 09:08:23 pc8 kernel: [ 2325.704797] firewire_ohci: 2 selfIDs, generation 8, local node ID ffc0
> Dec 20 09:08:23 pc8 kernel: [ 2325.704811] firewire_ohci: selfID 0: 80458882, phy 0 [p..] S400 gc=5 +0W Lci
> Dec 20 09:08:23 pc8 kernel: [ 2325.704824] firewire_ohci: selfID 0: 8145ccd4, phy 1 [c--] beta gc=5 -3W Lc

Here, both controllers got a self-ID-complete event with 2 nodes present in each event.

[...]
> Dec 20 09:08:32 pc8 kernel: [ 2334.764102] firewire_ohci: AR evt_bus_reset, generation 9
> Dec 20 09:08:32 pc8 kernel: [ 2334.764270] firewire_ohci: 1 selfIDs, generation 9, local node ID ffc0
> Dec 20 09:08:32 pc8 kernel: [ 2334.764283] firewire_ohci: selfID 0: 807f8842, phy 0 [-..] S400 gc=63 +0W Lci

Here, only one controller reports a self-ID-complete event, now with the local node being the only one on the bus. But the other controller is notably silent.

> Dec 20 09:08:32 pc8 kernel: [ 2334.764416] firewire_ohci: AT spd 0 tl 00, ffc0 -> ffc1, evt_missing_ack, QW req, fffff0000234 = c000001f

Alas firewire-ohci does not log the PCI device name of the card...

Anyhow. As a rule of thumb, anything is possible with faulty hardware. Anything bad, that is.
--
Stefan Richter
-=====-==-== ==-- =-=--
http://arcgraph.de/sr/ |
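For readers who want to check the "selfID 0:" lines quoted above against the raw quadlets, here is a small decoder. It is a sketch under assumptions: the bit positions follow the IEEE 1394a self-ID packet zero layout as I understand it, and the abbreviation tables imitate what the firewire-ohci debug output appears to print; the function name `decode_self_id` is made up for this example.

```python
# Abbreviations modelled on the log lines above (assumed, not copied from
# the driver source).
SPEED = ["S100", "S200", "S400", "beta"]
POWER = ["+0W", "+15W", "+30W", "+45W", "-3W", " ?W", "-3..-6W", "-3..-10W"]
PORT = ".-pc"  # not present, not connected, parent, child


def decode_self_id(quadlet: int) -> dict:
    """Decode self-ID packet zero (first quadlet only)."""
    if quadlet >> 30 != 0b10:
        raise ValueError("not a self-ID packet")
    if quadlet & (1 << 23):
        raise ValueError("extended self-ID packet, not packet zero")
    return {
        "phy": (quadlet >> 24) & 0x3F,
        "link_active": bool((quadlet >> 22) & 1),      # 'L' in the log
        "gap_count": (quadlet >> 16) & 0x3F,           # 'gc=' in the log
        "speed": SPEED[(quadlet >> 14) & 0x3],
        "contender": bool((quadlet >> 11) & 1),        # 'c' in the log
        "power": POWER[(quadlet >> 8) & 0x7],
        # Ports p0..p2, two bits each.
        "ports": "".join(PORT[(quadlet >> s) & 0x3] for s in (6, 4, 2)),
        "initiated_reset": bool((quadlet >> 1) & 1),   # 'i' in the log
    }


for raw in (0x80458882, 0x8145CCD4, 0x807F8842):
    print(f"{raw:08x}", decode_self_id(raw))
```

Decoding the third quadlet shows the point made above quite directly: gap count back at 63 and port 0 merely "not connected", i.e. the local node is alone on the bus after that reset.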
From: Stefan R. <st...@s5...> - 2011-12-20 19:53:49
|
On Dec 20 Carl Karsten wrote:
> On Tue, Dec 20, 2011 at 8:58 AM, Carl Karsten <ca...@pe...> wrote:
> I am not sure how close to trunk this is, but here is what kernel
> source I am using:
>
> juser@pc8:~/temp$ apt-get source linux-image-$(uname -r)
> Reading package lists... Done
> Building dependency tree
> Reading state information... Done
> Picking 'linux' as source package instead of 'linux-image-3.2.0-6-generic'
> NOTICE: 'linux' packaging is maintained in the 'Git' version control system at:
> http://kernel.ubuntu.com/git-repos/ubuntu/ubuntu-precise.git
[...]
> dpkg-source: info: unpacking linux_3.2.0-6.12.tar.gz

Most of the time, the firewire kernel drivers in distributor kernels are identical with those in the kernel.org 2.6.x.y or 3.x.y kernel on which they are based.

> > > you could comment out the line
> > >
> > > device->max_speed = device->node->max_speed;
> > >
> > > in drivers/firewire/core-device.c and rebuild + reinstall
> > > firewire-core. Then firewire-core will keep all requests to any
> > > device down at S100.
> > >
> > > If it works then, we know that your bus is not capable to support 400
> > > Mbit/s properly but can limp along with 100 Mbit/s.
> >
> > juser@pc8:~$ ls /dev/fw?
> > /dev/fw0 /dev/fw1 /dev/fw2 /dev/fw3
[...]
> > [ 500.521851] firewire_ohci: AR spd 0 tl 3a, ffc0 -> ffc1, ack_complete, QR resp = 72650000
> > [ 500.521861] firewire_ohci: AT spd 0 tl 3a, ffc1 -> ffc0, pending/cancelled, QR req, fffff0000444
> > [ 500.522080] firewire_core: created device fw2: GUID 0108000000006351, S100
[...]
> > [ 500.522924] firewire_core: created device fw3: GUID 00241b00964cac00, S100
[...]
> > Any sense to making this a module parameter?

Or we extend the retry mechanism to take repeated evt_missing_ack failures as a hint to try at a lower speed, and if that succeeds, log a warning and continue with degraded performance.

In any case, we should make firewire-core's failure message a bit more informative than "giving up on config rom".

We need to take care though: I have an SBP-2 bridge which keeps working as repeater while switched off (but being kept on bus power). It wrongly advertises "link on" in its selfID then, and all requests to it fail with evt_missing_ack. So there is nothing wrong with the cable or with the device, it is just a normal powered down state of the device (apart from one pin of its PHY not being controlled by the link as intended by the spec). Likewise, I have an old 6-port repeater (just a PHY without link) where the board designer miswired the respective pin of the PHY.

So if we consistently get evt_missing_ack at fffff0000400, it can also be an awkward way of the device telling us that it does not have an active link. In that case, a log message from the kernel should not instill Fear, Uncertainty, nor Doubt into the user.

> > I am now more motivated to invest in a cable tester. know of any?
> > google showed me some, but I can't tell if they are more than just a
> > continuity tester.

Not being an electrical engineer, I don't know what you would need for checking a 1394 cable for conformance.
--
Stefan Richter
-=====-==-== ==-- =-=--
http://arcgraph.de/sr/ |
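The retry idea floated in the message above (treat repeated evt_missing_ack as a hint to try a lower speed) could look roughly like the following. This is purely an illustrative simulation, not firewire-core code: `MissingAck`, `read_with_fallback`, the retry count and the two fake devices are all hypothetical names and values chosen for the example.

```python
SPEEDS = ["S400", "S200", "S100"]  # fastest first
ATTEMPTS_PER_SPEED = 3             # arbitrary value, for illustration only


class MissingAck(Exception):
    """Stand-in for a request that ended with evt_missing_ack."""


def read_with_fallback(read_quadlet, max_speed="S400"):
    """Try a read at max_speed, then at each lower speed.

    read_quadlet(speed) is a caller-supplied function that returns the
    value read or raises MissingAck. Returns (value, speed, degraded),
    or None if the node never acknowledged at any speed.
    """
    for speed in SPEEDS[SPEEDS.index(max_speed):]:
        for _ in range(ATTEMPTS_PER_SPEED):
            try:
                value = read_quadlet(speed)
            except MissingAck:
                continue
            # degraded=True is where a warning would be logged.
            return value, speed, speed != max_speed
    return None


def marginal_cable(speed):
    """Fake node behind a bad cable: only answers at S100."""
    if speed != "S100":
        raise MissingAck(speed)
    return 0x0404ABCD  # made-up quadlet value


def link_off_node(speed):
    """Fake node whose link is off: never acknowledges at any speed."""
    raise MissingAck(speed)


print(read_with_fallback(marginal_cable))
print(read_with_fallback(link_off_node))
```

The second case is the one the message warns about: a node that never acknowledges at any speed may simply have its link switched off, so a `None` result should be reported neutrally rather than as a cable or hardware fault.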
From: Carl K. <ca...@pe...> - 2011-12-20 15:57:06
|
On Tue, Dec 20, 2011 at 8:58 AM, Carl Karsten <ca...@pe...> wrote:
>> if you have a compiled kernel, you could comment out the line
>>
>> device->max_speed = device->node->max_speed;

I am not sure how close to trunk this is, but here is what kernel source I am using:

juser@pc8:~/temp$ apt-get source linux-image-$(uname -r)
Reading package lists... Done
Building dependency tree
Reading state information... Done
Picking 'linux' as source package instead of 'linux-image-3.2.0-6-generic'
NOTICE: 'linux' packaging is maintained in the 'Git' version control system at:
http://kernel.ubuntu.com/git-repos/ubuntu/ubuntu-precise.git
Need to get 103 MB of source archives.
Get:1 http://us.archive.ubuntu.com/ubuntu/ precise/main linux 3.2.0-6.12 (dsc) [4,736 B]
Get:2 http://us.archive.ubuntu.com/ubuntu/ precise/main linux 3.2.0-6.12 (tar) [103 MB]
Fetched 103 MB in 11s (8,895 kB/s)
gpgv: Signature made Sun 18 Dec 2011 09:32:00 PM CST using RSA key ID FA1447CA
gpgv: Can't check signature: public key not found
dpkg-source: warning: failed to verify signature on ./linux_3.2.0-6.12.dsc
dpkg-source: info: extracting linux in linux-3.2.0
dpkg-source: info: unpacking linux_3.2.0-6.12.tar.gz

--
Carl K |