From: Federico B. <fed...@da...> - 2010-11-12 14:32:03
|
Hello all, I am developing a kernel driver for ltc244x AD series. It is an spi driver. During developing, I think I've found a bug in omap spi driver. My driver is quite simple. In probe function I start an hrtimer. Every 6ms (more or less) "timer.function" is called (I restart hrtimer using hrtimer_forward_now() in my timer.function). This function sends/receives 4 bytes via spi_async call. My "complete" function reads rx_buffer and elaborates. I have an irq function but as of now I disabled it. I simply want to have a constant time between spi reads (sort of, I know this is not real time etc.). When I insert module, everything works fine. Communication is going on, with 6ms (varying) between messages. My messages last 51us each one. I have an SPI sniffer connected to my Chestnut 40 pin header and I can see spi messages. I run some "dd if=/dev/urandom of=/dev/null" via ssh (2 different ssh) to stess-test the hardware during SPI communication. The problem manifests itself after some minutes: SPI starts sending messages without delay. This generates too much traffic on SPI and definitely hangs down my Overo. I double checked my code, I added in my "timer.function" call (which is the one called every 6ms) a simple gpio (147, for the records) switching. Oscilloscope shows 25Hz steady on that pin, so my function is called correctly at that frequency. On spi sniffer messages arrives with no delay. Every 52us a repeated message is sent. I am pretty sure that they are repeated messages because usually the first byte my driver sends in every message is cycling between 0xA0, 0xA1, 0xA2, 0xA3. When I have the error condition, the same message (for example 0xA0) is repeated many times, in fact it is repeated between timer.function (which is still called at 25Hz). After that, it switches to 0xA1 and goes on repeating itself. I'm no kernel expert, but seems that the problem is not hrtimers related. I am running kernel 2.6.33 from Sakoman git, using OE. I tried omap2_mcspi.c from head (2.6.37-rc1?) but the problem is still here. Maybe it is related with https://github.com/scottellis/overo-adc-mcp3002 cleanup null ref. Is there some of you who has already seen this problem or knows a solution? After a minute with spi rushing, this message appears on console: BUG: soft lockup - CPU#0 stuck for 61s! [omap2_mcspi:11] Modules linked in: ltc244x Pid: 11, comm: omap2_mcspi CPU: 0 Not tainted (2.6.33 #2) PC is at mcspi_wait_for_reg_bit+0x40/0x58 LR is at msecs_to_jiffies+0x20/0x28 pc : [<c0265d78>] lr : [<c0064368>] psr: 20000013 sp : cf93bf00 ip : 00000000 fp : fa09804c r10: fa098044 r9 : fa098050 r8 : ced44c43 r7 : fa098044 r6 : 00000001 r5 : c056eb20 r4 : 0000a2c0 r3 : 00000002 r2 : 00000064 r1 : 00000028 r0 : 0000a324 Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel Control: 10c5387d Table: 8ed38019 DAC: 00000017 Missed ticks 8 -- Federico Belvisi |
From: Ned F. <nfo...@wh...> - 2010-11-12 14:55:28
|
You might want to post your question to: spi...@li... That is where the SPI developers discuss drivers. As long as you are sure that your protocol driver is only sending one (or a few) spi_async requests each 6ms, then I agree that it sounds like this might be a controller driver bug. I don't use OMAP, so I can't help with that. On 11/12/2010 09:31 AM, Federico Belvisi wrote: > Hello all, > > > > I am developing a kernel driver for ltc244x AD series. It is an spi driver. > During developing, I think I've found a bug in omap spi driver. > > My driver is quite simple. In probe function I start an hrtimer. Every 6ms > (more or less) "timer.function" is called (I restart hrtimer using > hrtimer_forward_now() in my timer.function). This function sends/receives 4 > bytes via spi_async call. My "complete" function reads rx_buffer and > elaborates. I have an irq function but as of now I disabled it. I simply > want to have a constant time between spi reads (sort of, I know this is not > real time etc.). > > > > When I insert module, everything works fine. Communication is going on, with > 6ms (varying) between messages. My messages last 51us each one. I have an > SPI sniffer connected to my Chestnut 40 pin header and I can see spi > messages. > > > > I run some "dd if=/dev/urandom of=/dev/null" via ssh (2 different ssh) to > stess-test the hardware during SPI communication. The problem manifests > itself after some minutes: SPI starts sending messages without delay. This > generates too much traffic on SPI and definitely hangs down my Overo. I > double checked my code, I added in my "timer.function" call (which is the > one called every 6ms) a simple gpio (147, for the records) switching. > Oscilloscope shows 25Hz steady on that pin, so my function is called > correctly at that frequency. On spi sniffer messages arrives with no delay. > Every 52us a repeated message is sent. > > > > I am pretty sure that they are repeated messages because usually the first > byte my driver sends in every message is cycling between 0xA0, 0xA1, 0xA2, > 0xA3. When I have the error condition, the same message (for example 0xA0) > is repeated many times, in fact it is repeated between timer.function (which > is still called at 25Hz). After that, it switches to 0xA1 and goes on > repeating itself. > > > > I'm no kernel expert, but seems that the problem is not hrtimers related. > > > > I am running kernel 2.6.33 from Sakoman git, using OE. I tried omap2_mcspi.c > from head (2.6.37-rc1?) but the problem is still here. Maybe it is related > with > > https://github.com/scottellis/overo-adc-mcp3002 cleanup null ref. > > > > Is there some of you who has already seen this problem or knows a solution? > > > > After a minute with spi rushing, this message appears on console: > > > > BUG: soft lockup - CPU#0 stuck for 61s! [omap2_mcspi:11] > > Modules linked in: ltc244x > > > > Pid: 11, comm: omap2_mcspi > > CPU: 0 Not tainted (2.6.33 #2) > > PC is at mcspi_wait_for_reg_bit+0x40/0x58 > > LR is at msecs_to_jiffies+0x20/0x28 > > pc : [<c0265d78>] lr : [<c0064368>] psr: 20000013 > > sp : cf93bf00 ip : 00000000 fp : fa09804c > > r10: fa098044 r9 : fa098050 r8 : ced44c43 > > r7 : fa098044 r6 : 00000001 r5 : c056eb20 r4 : 0000a2c0 > > r3 : 00000002 r2 : 00000064 r1 : 00000028 r0 : 0000a324 > > Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel > > Control: 10c5387d Table: 8ed38019 DAC: 00000017 > > Missed ticks 8 > > > > > > -- > > Federico Belvisi > > > > > > > > ------------------------------------------------------------------------------ > Centralized Desktop Delivery: Dell and VMware Reference Architecture > Simplifying enterprise desktop deployment and management using > Dell EqualLogic storage and VMware View: A highly scalable, end-to-end > client virtualization framework. Read more! > http://p.sf.net/sfu/dell-eql-dev2dev > > > > _______________________________________________ > gumstix-users mailing list > gum...@li... > https://lists.sourceforge.net/lists/listinfo/gumstix-users -- Ned Forrester nfo...@wh... Oceanographic Systems Lab 508-289-2226 Office / 774-392-5352 Cell Applied Ocean Physics and Engineering Dept. Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA http://www.whoi.edu/ http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212 http://www.whoi.edu/hpb/Site.do?id=1532 |
From: Federico B. <fed...@da...> - 2010-11-12 15:39:41
|
Thank you for your reply, I'll contact them. Just a note: it is not 25Hz, but 75Hz. Indeed 6ms are 150Mhz, so a square wave on GPIO is half a period and is 75Hz. I cut off everything not needed in my code, and I still am experiencing the same problem, even without stress loading with dd. I just have to wait a little longer. If someone is interested into it, I can post it here (less than 200 code lines). -- Federico Belvisi > -----Messaggio originale----- > Da: Ned Forrester [mailto:nfo...@wh...] > Inviato: venerdì 12 novembre 2010 15:55 > A: General mailing list for gumstix users. > Cc: Federico Belvisi > Oggetto: Re: [Gumstix-users] SPI repeated messages on Overo > > You might want to post your question to: > > spi...@li... > > That is where the SPI developers discuss drivers. > > As long as you are sure that your protocol driver is only sending one > (or a few) spi_async requests each 6ms, then I agree that it sounds > like > this might be a controller driver bug. I don't use OMAP, so I can't > help with that. > > On 11/12/2010 09:31 AM, Federico Belvisi wrote: > > Hello all, > > > > > > > > I am developing a kernel driver for ltc244x AD series. It is an spi > driver. > > During developing, I think I've found a bug in omap spi driver. > > > > My driver is quite simple. In probe function I start an hrtimer. > Every 6ms > > (more or less) "timer.function" is called (I restart hrtimer using > > hrtimer_forward_now() in my timer.function). This function > sends/receives 4 > > bytes via spi_async call. My "complete" function reads rx_buffer and > > elaborates. I have an irq function but as of now I disabled it. I > simply > > want to have a constant time between spi reads (sort of, I know this > is not > > real time etc.). > > > > > > > > When I insert module, everything works fine. Communication is going > on, with > > 6ms (varying) between messages. My messages last 51us each one. I > have an > > SPI sniffer connected to my Chestnut 40 pin header and I can see spi > > messages. > > > > > > > > I run some "dd if=/dev/urandom of=/dev/null" via ssh (2 different > ssh) to > > stess-test the hardware during SPI communication. The problem > manifests > > itself after some minutes: SPI starts sending messages without delay. > This > > generates too much traffic on SPI and definitely hangs down my Overo. > I > > double checked my code, I added in my "timer.function" call (which is > the > > one called every 6ms) a simple gpio (147, for the records) switching. > > Oscilloscope shows 25Hz steady on that pin, so my function is called > > correctly at that frequency. On spi sniffer messages arrives with no > delay. > > Every 52us a repeated message is sent. > > > > > > > > I am pretty sure that they are repeated messages because usually the > first > > byte my driver sends in every message is cycling between 0xA0, 0xA1, > 0xA2, > > 0xA3. When I have the error condition, the same message (for example > 0xA0) > > is repeated many times, in fact it is repeated between timer.function > (which > > is still called at 25Hz). After that, it switches to 0xA1 and goes on > > repeating itself. > > > > > > > > I'm no kernel expert, but seems that the problem is not hrtimers > related. > > > > > > > > I am running kernel 2.6.33 from Sakoman git, using OE. I tried > omap2_mcspi.c > > from head (2.6.37-rc1?) but the problem is still here. Maybe it is > related > > with > > > > https://github.com/scottellis/overo-adc-mcp3002 cleanup null ref. > > > > > > > > Is there some of you who has already seen this problem or knows a > solution? > > > > > > > > After a minute with spi rushing, this message appears on console: > > > > > > > > BUG: soft lockup - CPU#0 stuck for 61s! [omap2_mcspi:11] > > > > Modules linked in: ltc244x > > > > > > > > Pid: 11, comm: omap2_mcspi > > > > CPU: 0 Not tainted (2.6.33 #2) > > > > PC is at mcspi_wait_for_reg_bit+0x40/0x58 > > > > LR is at msecs_to_jiffies+0x20/0x28 > > > > pc : [<c0265d78>] lr : [<c0064368>] psr: 20000013 > > > > sp : cf93bf00 ip : 00000000 fp : fa09804c > > > > r10: fa098044 r9 : fa098050 r8 : ced44c43 > > > > r7 : fa098044 r6 : 00000001 r5 : c056eb20 r4 : 0000a2c0 > > > > r3 : 00000002 r2 : 00000064 r1 : 00000028 r0 : 0000a324 > > > > Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel > > > > Control: 10c5387d Table: 8ed38019 DAC: 00000017 > > > > Missed ticks 8 > > > > > > > > > > > > -- > > > > Federico Belvisi > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > --------- > > Centralized Desktop Delivery: Dell and VMware Reference Architecture > > Simplifying enterprise desktop deployment and management using > > Dell EqualLogic storage and VMware View: A highly scalable, end-to- > end > > client virtualization framework. Read more! > > http://p.sf.net/sfu/dell-eql-dev2dev > > > > > > > > _______________________________________________ > > gumstix-users mailing list > > gum...@li... > > https://lists.sourceforge.net/lists/listinfo/gumstix-users > > > -- > Ned Forrester nfo...@wh... > Oceanographic Systems Lab 508-289-2226 Office / 774-392-5352 Cell > Applied Ocean Physics and Engineering Dept. > Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA > http://www.whoi.edu/ > http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212 > http://www.whoi.edu/hpb/Site.do?id=1532 |
From: ScottEllis <sco...@gm...> - 2010-11-12 15:30:52
|
Are you doing anything in your completion callback that might sleep? That's not allowed and you might not see the problem without a load. If you do need to do something that sleeps, you'll have to defer it, probably with a work_queue. I have an example if you want it. Do you have errors in the spi_message.status field in the completion callback? If neither of those ideas gets you anywhere, you can put anything you want in your spi_message.complete field. You might want to put some bookkeeping info there and then check it in the completion callback to see it makes sense with your spi_async() calls. What you put in the spi_message.context field will be the arg to the completion callback. -- View this message in context: http://old.nabble.com/SPI-repeated-messages-on-Overo-tp30199878p30200376.html Sent from the Gumstix mailing list archive at Nabble.com. |
From: Federico B. <fed...@da...> - 2010-11-15 09:23:41
|
My completion callback is an empty function. I tried reading spi_message.status in callback and it shows always 0. This is a little code from my probe function: ... hrtimer_init(<c->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); ltc->timer.function = ltc244x_send_message; ltc->msg.complete = ltc244x_complete; ltc->msg.context = ltc244x_private_data; hrtimer_start(<c->timer, ktime_set(0, LTC_POLL_DELAY), HRTIMER_MODE_REL); ... My ltc244x_send_message is as follows: ... spi_message_init(<c->msg); ltc->msg_buf->tx_buffer[0] = 1; // write some fixed data ltc->msg_buf->tx_buffer[1] = 10; ltc->msg_buf->tx_buffer[2] = 0; ltc->msg_buf->tx_buffer[3] = 0; ltc->xfer->tx_buf = ltc->msg_buf->tx_buffer; ltc->xfer->rx_buf = ltc->msg_buf->rx_buffer; ltc->xfer->len = 4; ltc->xfer->cs_change = 0; ltc244x_private_data->ltc244x = ltc; ltc->msg.complete = ltc244x_complete; ltc->msg.context = ltc244x_private_data; spi_message_add_tail(ltc->xfer, <c->msg); spi_async(ltc->spi, <c->msg); missed = hrtimer_forward_now(<c->timer, ktime_set(0, LTC_POLL_DELAY)); return HRTIMER_RESTART; I double checked every pointer in order to find some null pointer. The structures are all allocated via kzalloc, just to be DMA-safe. LTC_POLL_DELAY expands to (6666666). As already said, my ltc244x_complete is empty. I configured this module on spi cs1, 1Mhz clock in board-overo.c, and disabled every colliding module (touchscreen modules). The problem arises even with no load on cpu (no dd, just load this module). If I put a printk in complete callback, the issue is a little more difficult to arise, maybe given from serial console slowness. -- Federico Belvisi > -----Messaggio originale----- > Da: ScottEllis [mailto:sco...@gm...] > Inviato: venerdì 12 novembre 2010 16:31 > A: gum...@li... > Oggetto: Re: [Gumstix-users] SPI repeated messages on Overo > > > Are you doing anything in your completion callback that might sleep? > That's not allowed and you might not see the problem without a load. > If you do need to do something that sleeps, you'll have to defer it, > probably with a work_queue. I have an example if you want it. > > Do you have errors in the spi_message.status field in the completion > callback? > > If neither of those ideas gets you anywhere, you can put anything you > want in your spi_message.complete field. You might want to put some > bookkeeping info there and then check it in the completion callback to > see it makes sense with your spi_async() calls. What you put in the > spi_message.context field will be the arg to the completion callback. > > -- > View this message in context: http://old.nabble.com/SPI-repeated- > messages-on-Overo-tp30199878p30200376.html > Sent from the Gumstix mailing list archive at Nabble.com. > > > ----------------------------------------------------------------------- > ------- > Centralized Desktop Delivery: Dell and VMware Reference Architecture > Simplifying enterprise desktop deployment and management using > Dell EqualLogic storage and VMware View: A highly scalable, end-to-end > client virtualization framework. Read more! > http://p.sf.net/sfu/dell-eql-dev2dev > _______________________________________________ > gumstix-users mailing list > gum...@li... > https://lists.sourceforge.net/lists/listinfo/gumstix-users |
From: ScottEllis <sco...@gm...> - 2010-11-15 13:31:55
|
It's kind of strange that you saw this in your oops message. PC is at mcspi_wait_for_reg_bit+0x40/0x58 That function only gets called in omap2_mcspi_txrx_pio() but that only gets called if len < DMA_MIN_BYTES ... omap2_mcspi_work() ... if (m->is_dma_mapped || t->len >= DMA_MIN_BYTES) count = omap2_mcspi_txrx_dma(spi, t); else count = omap2_mcspi_txrx_pio(spi, t); ... from omap2_mcspi.c #define DMA_MIN_BYTES 8 I was going to suggest forcing pio mode transfers instead of dma, but it looks like that is already happening. Do you see why? You could set the message->rx_buff to NULL since that will change the behavior of omap2_mcspi_txrx_pio() and then see if your problem changes. That will cut about half the code from omap2_mcspi_txrx_pio() and maybe start narrowing down the problem. Otherwise, I don't see anything obvious. I know you usually call the kernel's scheduler complete() in a completion callback, but since there aren't any threads waiting on your completion, that probably doesn't matter. Since it's not working though, maybe that is worth trying too. Can't hurt. -- View this message in context: http://old.nabble.com/SPI-repeated-messages-on-Overo-tp30199878p30218881.html Sent from the Gumstix mailing list archive at Nabble.com. |
From: ScottEllis <sco...@gm...> - 2010-11-15 16:10:28
|
Federico Belvisi-3 wrote: > > I think I'm not going DMA because my transfer is only 4 bytes long, so PIO > will be used. I tried setting xfer->rx_buf to NULL buffer but the error is > still here, too many transactions. The oops message is the same as before. > Sorry about that. For some reason I was reading your msg->len as 10. No coffee. I have some time this morning. I'll try to duplicate what you are seeing here. -- View this message in context: http://old.nabble.com/SPI-repeated-messages-on-Overo-tp30199878p30220445.html Sent from the Gumstix mailing list archive at Nabble.com. |
From: Scott E. <sco...@gm...> - 2010-11-15 21:07:34
|
I am not having any luck reproducing the problem. I put together a test driver that uses an hrtimer to force a spi write of 4 bytes at 200 Hz. The SPI bus speed is 1 MHz. I think this is similar to what you are doing, maybe a little faster frequency for the timer. It's been running for awhile now. I have only been watching the CS line. The frequency stays around 200 Hz, occasionally jumping to 205 or so. Not sure why, could just be my scope. No load on the cpu though and I am getting one spi completion callback for every timer callback that where I am calling spi_async(). I am also not getting any oops. So it doesn't appear that my system is experiencing the problem you are describing.. Here is the code - https://github.com/scottellis/spi-scrap Maybe you could try it and see how it behaves with your board or let me know if I still don't understand the problem. I tested with a Tobi board and kernel 2.6.34. |
From: Scott E. <sco...@gm...> - 2010-11-15 23:19:04
|
Hi Federico, I do have one more data point for you. That initial Tobi board I was testing with starting getting very erratic after about 30 minutes watching the CS line with a scope. Still no oops errors or problems with the spi/timer callback counts, but the signal started getting very noisy. So much that the avg frequency was jumping from 50 to 300 Hz. I switched over to another board, a Summit. It's been running for about 1.5 hours now with the signal still steady as can be at 200 Hz. Hardly any noise. I used the same COM board, same image as I was using with the Tobi. I think the problem with my Tobi might be a bad solder job on the expansion header. Removing power from the board for a few minutes and restarting still didn't help it. Neither did trying a new COM on that board. Maybe that helps. Probably a completely different issue. On 11/15/10, Scott Ellis <sco...@gm...> wrote: > I am not having any luck reproducing the problem. > > I put together a test driver that uses an hrtimer to force a spi write > of 4 bytes > at 200 Hz. The SPI bus speed is 1 MHz. I think this is similar to what you > are > doing, maybe a little faster frequency for the timer. > > It's been running for awhile now. I have only been watching the CS line. > The frequency stays around 200 Hz, occasionally jumping to 205 or so. > Not sure why, could just be my scope. No load on the cpu though and I > am getting one spi completion callback for every timer callback that > where I am calling spi_async(). > > I am also not getting any oops. So it doesn't appear that my system is > experiencing the problem you are describing.. > > Here is the code - https://github.com/scottellis/spi-scrap > > Maybe you could try it and see how it behaves with your board or let me > know if I still don't understand the problem. > > I tested with a Tobi board and kernel 2.6.34. > |
From: Federico B. <fed...@da...> - 2010-11-15 14:15:17
|
I think I'm not going DMA because my transfer is only 4 bytes long, so PIO will be used. I tried setting xfer->rx_buf to NULL buffer but the error is still here, too many transactions. The oops message is the same as before. Sorry for the misunderstanding, I did not call complete() in my completion callback. My completion callback is empty. Please consider in my previous email "complete == spi completion callback". For the record, here it is: static void ltc244x_complete(void *ltc244x_private) { /* struct ltc244x_private *ltc_private = ltc244x_private; struct ltc244x *ltc = ltc_private->ltc244x; */ } Here is a little dump from my spi sniffer. Columns are: Index, Absolute time m:s.ms.us, Duration, length, Error, Record, Data (odd bytes: MOSI, even bytes: MISO). 5311,0:15.949.133,47.400 us,4 B,,Transaction,010F 0AFF 00FF 00FF 5314,0:15.955.803,47.400 us,4 B,,Transaction,010F 0AFF 00FF 00FF 5317,0:15.962.474,47.600 us,4 B,,Transaction,010F 0AFF 00FF 00FF 5320,0:15.969.135,47.400 us,4 B,,Transaction,010F 0AFF 00FF 00FF 5323,0:15.975.805,47.400 us,4 B,,Transaction,010F 0AFF 00FF 00FF 5326,0:15.982.476,47.400 us,4 B,,Transaction,010F 0AFF 00FF 00FF 5329,0:15.989.137,47.400 us,4 B,,Transaction,010F 0AFF 00FF 00FF 5332,0:15.995.808,47.300 us,4 B,,Transaction,010F 0AFF 00FF 00FF 5335,0:16.002.477,47.400 us,4 B,,Transaction,010F 0AFF 00FF 00FF 5338,0:16.009.138,47.600 us,4 B,,Transaction,010F 0AFF 00FF 00FF 5341,0:16.015.802,47.400 us,4 B,,Transaction,010F 0AFF 00FF 00FF 5344,0:16.022.479,47.700 us,4 B,,Transaction,010F 0AFF 00FF 00FF 5347,0:16.029.142,47.500 us,4 B,,Transaction,010F 0AFF 00FF 00FF 5350,0:16.035.811,47.400 us,4 B,,Transaction,010F 0AFF 00FF 00FF 5353,0:16.042.487,47.500 us,4 B,,Transaction,010F 0AFF 00FF 00FF 5356,0:16.049.149,48.100 us,4 B,,Transaction,010F 0AFF 00FF 00FF 5359,0:16.055.806,47.400 us,4 B,,Transaction,010F 0AFF 00FF 00FF 5362,0:16.062.473,47.600 us,4 B,,Transaction,010F 0AFF 00FF 00FF 5365,0:16.078.964,48.600 us,4 B,,Transaction,010F 0AFF 00FF 00FF 5368,0:16.079.014,47.200 us,4 B,,Transaction,01FF 0AFF 00FF 00FF 5371,0:16.079.062,47.100 us,4 B,,Transaction,01FF 0AFF 00FF 00FF 5374,0:16.079.110,47.100 us,4 B,,Transaction,01FF 0AFF 00FF 00FF 5377,0:16.079.158,47.100 us,4 B,,Transaction,01FF 0AFF 00FF 00FF 5380,0:16.079.206,47.100 us,4 B,,Transaction,01FF 0AFF 00FF 00FF 5383,0:16.079.254,47.100 us,4 B,,Transaction,01FF 0AFF 00FF 00FF Index is contiguous because every record has 2 subrecords (tx/rx) with incrementing index. As you can see, from index 5365 the absolute time increments by about 50us, whereas until 5362 the absolute time increment is in 6-7ms order. The received message is 0xFFFFFFFF (instead of 0x0FFFFFFF), because AD is still not ready to send data and puts MISO to high state. The real presence of the AD is not needed for this issue to happen since I discard every byte rxed. The strange thing is the BIG delay between 5362 and 5365... more than 15ms. >From that point on, messages are sent with no delay, and system is hanged. After 60 seconds console shows first oops message. This time only 16 seconds after insmod and the issue showed up. -- Federico Belvisi > -----Messaggio originale----- > Da: ScottEllis [mailto:sco...@gm...] > Inviato: lunedì 15 novembre 2010 14:32 > A: gum...@li... > Oggetto: Re: [Gumstix-users] R: SPI repeated messages on Overo > > > It's kind of strange that you saw this in your oops message. > > PC is at mcspi_wait_for_reg_bit+0x40/0x58 > > That function only gets called in omap2_mcspi_txrx_pio() but that only > gets called if len < DMA_MIN_BYTES > > ... omap2_mcspi_work() ... > > if (m->is_dma_mapped || t->len >= DMA_MIN_BYTES) > count = omap2_mcspi_txrx_dma(spi, t); > else > count = omap2_mcspi_txrx_pio(spi, t); > ... > > from omap2_mcspi.c > #define DMA_MIN_BYTES 8 > > I was going to suggest forcing pio mode transfers instead of dma, > but it looks like that is already happening. Do you see why? > > You could set the message->rx_buff to NULL since that will change > the behavior of omap2_mcspi_txrx_pio() and then see if your problem > changes. That will cut about half the code from omap2_mcspi_txrx_pio() > and maybe start narrowing down the problem. > > Otherwise, I don't see anything obvious. I know you usually call the > kernel's scheduler complete() in a completion callback, but since there > aren't any threads waiting on your completion, that probably doesn't > matter. Since it's not working though, maybe that is worth trying too. > Can't hurt. > > > -- > View this message in context: http://old.nabble.com/SPI-repeated- > messages-on-Overo-tp30199878p30218881.html > Sent from the Gumstix mailing list archive at Nabble.com. > > > ----------------------------------------------------------------------- > ------- > Centralized Desktop Delivery: Dell and VMware Reference Architecture > Simplifying enterprise desktop deployment and management using > Dell EqualLogic storage and VMware View: A highly scalable, end-to-end > client virtualization framework. Read more! > http://p.sf.net/sfu/dell-eql-dev2dev > _______________________________________________ > gumstix-users mailing list > gum...@li... > https://lists.sourceforge.net/lists/listinfo/gumstix-users |
From: Federico B. <fed...@da...> - 2010-11-16 16:50:05
|
Ok, thanks to your help it seems I do have found something wrong in my code. The real difference (apart from many other things) between my code and yours is at line 135: if (scrap_msg.busy) { printk(KERN_ALERT "scrap_msg still busy in timer callback\n"); } Without that, even your code breaks spi. I think that doing something on spi_message and spi_transfer before spi_complete gets called breaks spi somewhere. scrap_msg.busy is some kind of thread sync between spi_completion and send and should be unavoidable. The spi kernel doc isn't very clear about that sync needed between send and complete, and I thought Overo could handle my spi transfer rate at no cost. With this sync between send and completion, I had more than expected "scrap_msg still busy in timer callback" messages on console. So I recompiled my kernel (it was non-preemptive) and setted on low latency desktop. I then tried renice the omap2_mcspi kernel thread, and surely it was an improvement. The best solution I could find is changing omap2_mcspi to be a create_rt_workqueue instead of create_singlethread_workqueue. SPI is working really solid as of now. With the current setup, system under 2.05 load, messages are sent at variable rate between 6530us and 6840us (hrtimer goes at 6666us). So less than 200us more or less than expected. This seems good to me. At the end, it was a crossthread error in my module. Thank you, you saved my day. -- Federico Belvisi > -----Messaggio originale----- > Da: Scott Ellis [mailto:sco...@gm...] > Inviato: martedì 16 novembre 2010 00:19 > A: gum...@li... > Oggetto: Re: [Gumstix-users] R: R: SPI repeated messages on Overo > > Hi Federico, > > I do have one more data point for you. That initial Tobi board I was > testing with > starting getting very erratic after about 30 minutes watching the CS > line with a > scope. Still no oops errors or problems with the spi/timer callback > counts, but > the signal started getting very noisy. So much that the avg frequency > was jumping > from 50 to 300 Hz. > > I switched over to another board, a Summit. It's been running for > about 1.5 hours > now with the signal still steady as can be at 200 Hz. Hardly any > noise. I used the > same COM board, same image as I was using with the Tobi. > > I think the problem with my Tobi might be a bad solder job on the > expansion header. > Removing power from the board for a few minutes and restarting still > didn't help it. > Neither did trying a new COM on that board. > > Maybe that helps. Probably a completely different issue. > > > On 11/15/10, Scott Ellis <sco...@gm...> wrote: > > I am not having any luck reproducing the problem. > > > > I put together a test driver that uses an hrtimer to force a spi > write > > of 4 bytes > > at 200 Hz. The SPI bus speed is 1 MHz. I think this is similar to > what you > > are > > doing, maybe a little faster frequency for the timer. > > > > It's been running for awhile now. I have only been watching the CS > line. > > The frequency stays around 200 Hz, occasionally jumping to 205 or so. > > Not sure why, could just be my scope. No load on the cpu though and I > > am getting one spi completion callback for every timer callback that > > where I am calling spi_async(). > > > > I am also not getting any oops. So it doesn't appear that my system > is > > experiencing the problem you are describing.. > > > > Here is the code - https://github.com/scottellis/spi-scrap > > > > Maybe you could try it and see how it behaves with your board or let > me > > know if I still don't understand the problem. > > > > I tested with a Tobi board and kernel 2.6.34. > > > > ----------------------------------------------------------------------- > ------- > Beautiful is writing same markup. Internet Explorer 9 supports > standards for HTML5, CSS3, SVG 1.1, ECMAScript5, and DOM L2 & L3. > Spend less time writing and rewriting code and more time creating > great > experiences on the web. Be a part of the beta today > http://p.sf.net/sfu/msIE9-sfdev2dev > _______________________________________________ > gumstix-users mailing list > gum...@li... > https://lists.sourceforge.net/lists/listinfo/gumstix-users |
From: ScottEllis <sco...@gm...> - 2010-11-16 17:33:05
|
Yeah, I think the rule is hands off the spi_message once you've called spi_async until you get the completion callback for that message. If you run into performance roadblocks with the Linux SPI framework, there is always the McBSP controller. It will do SPI too, though the Linux driver is a little harder to work with. That Tobi board issue I was having did turn out to be bad soldering. I was able to fix it. Thanks for pointing out the hrtimer framework. Never used it before. -- View this message in context: http://old.nabble.com/SPI-repeated-messages-on-Overo-tp30199878p30231086.html Sent from the Gumstix mailing list archive at Nabble.com. |