Hello all,

I am developing a kernel driver for the LTC244x ADC series. It is an SPI driver. During development, I think I’ve found a bug in the OMAP SPI driver.

My driver is quite simple. In the probe function I start an hrtimer. Every 6 ms (more or less) “timer.function” is called (I restart the hrtimer using hrtimer_forward_now() in my timer.function). This function sends/receives 4 bytes via an spi_async call. My “complete” function reads rx_buffer and processes the data. I have an irq function, but as of now I have disabled it. I simply want a roughly constant time between SPI reads (I know this is not real time, etc.).

When I insert the module, everything works fine. Communication runs with about 6 ms (varying) between messages. Each message lasts 51 us. I have an SPI sniffer connected to the 40-pin header of my Chestnut and I can see the SPI messages.

I run some “dd if=/dev/urandom of=/dev/null” commands via ssh (2 different ssh sessions) to stress-test the hardware during SPI communication. The problem shows up after some minutes: SPI starts sending messages without any delay between them. This generates too much traffic on the SPI bus and eventually hangs my Overo. I double-checked my code and added a simple GPIO toggle (GPIO 147, for the record) to my “timer.function” (the one called every 6 ms). The oscilloscope shows a steady 25Hz on that pin, so my function is still called correctly at that frequency. On the SPI sniffer, however, messages arrive with no delay: a repeated message is sent every 52 us.
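To give an idea of the scale of the flood, here is a rough back-of-the-envelope sketch in plain user-space C (not driver code). It assumes the nominal 6 ms timer period and the 52 us message spacing quoted above; the function name messages_per_period is made up for this illustration.

```c
/* Hypothetical helper, for illustration only: how many back-to-back
 * messages fit into one timer period, given the spacing between the
 * start of consecutive messages. Both values are in microseconds. */
unsigned long messages_per_period(unsigned long period_us,
                                  unsigned long interval_us)
{
    if (interval_us == 0)
        return 0; /* avoid dividing by zero on bad input */
    return period_us / interval_us;
}
```

With the numbers from this post, messages_per_period(6000, 52) gives 115, so in the error condition the bus would carry on the order of a hundred messages where a single one is expected.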

I am pretty sure that they are repeated messages because the first byte my driver sends in each message normally cycles through 0xA0, 0xA1, 0xA2, 0xA3. In the error condition, the same message (for example 0xA0) is repeated many times; it keeps being repeated between two consecutive timer.function calls (which still happen at 25Hz). After that, it switches to 0xA1 and goes on repeating that one.
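The check described above can be sketched as a small user-space C helper for post-processing a sniffer capture. It assumes the first byte of every captured message has already been extracted into an array, in capture order; the name longest_repeat_run is hypothetical and not part of any driver or sniffer tool.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: return the length of the
 * longest run of consecutive messages that share the same first byte.
 * In the healthy case (0xA0, 0xA1, 0xA2, 0xA3, 0xA0, ...) every run has
 * length 1; a longer run means a message was sent more than once. */
size_t longest_repeat_run(const uint8_t *first_bytes, size_t n)
{
    size_t best = 0; /* longest run seen so far */
    size_t run = 0;  /* length of the run ending at the current message */

    for (size_t i = 0; i < n; i++) {
        if (i > 0 && first_bytes[i] == first_bytes[i - 1])
            run++;
        else
            run = 1;
        if (run > best)
            best = run;
    }
    return best;
}
```

Any result greater than 1 on a capture means the same first byte went out on the wire more than once in a row, which is what the sniffer shows in the error condition.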

I’m no kernel expert, but it seems that the problem is not related to hrtimers.

I am running kernel 2.6.33 from the Sakoman git tree, built with OE. I tried omap2_mcspi.c from head (2.6.37-rc1?) but the problem is still there. Maybe it is related to the “cleanup null ref” in

 https://github.com/scottellis/overo-adc-mcp3002

Has anyone already seen this problem, or does anyone know a solution?

After about a minute of SPI messages being sent back to back, this message appears on the console:

BUG: soft lockup - CPU#0 stuck for 61s! [omap2_mcspi:11]
Modules linked in: ltc244x
Pid: 11, comm:          omap2_mcspi
CPU: 0    Not tainted  (2.6.33 #2)
PC is at mcspi_wait_for_reg_bit+0x40/0x58
LR is at msecs_to_jiffies+0x20/0x28
pc : [<c0265d78>]    lr : [<c0064368>]    psr: 20000013
sp : cf93bf00  ip : 00000000  fp : fa09804c
r10: fa098044  r9 : fa098050  r8 : ced44c43
r7 : fa098044  r6 : 00000001  r5 : c056eb20  r4 : 0000a2c0
r3 : 00000002  r2 : 00000064  r1 : 00000028  r0 : 0000a324
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c5387d  Table: 8ed38019  DAC: 00000017
Missed ticks 8

--

Federico Belvisi