We are experiencing a recurrent Linux kernel BUG/panic with the e1000e driver on a Kontron mSP1 COM Express "mini" CPU module and other similar COM Express modules.
The BUG occurs frequently (right after booting and bringing the interface up) but not on every boot. I suspect that we are hitting some kind of race condition.
It can, however, easily be reproduced by bringing the interface up and down in a loop while ping-flooding it from another host.
On the Kontron module (with e1000e):
while true; do ifdown eth0; ifup eth0; done
On another Linux host:
ping -s1024 -w0 -f 192.168.4.1
The BUG/panic then happens every time, after anywhere from 1 to 240 seconds.
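For anyone trying to reproduce this, the up/down loop can be wrapped so that each cycle is logged with a timestamp, which makes it easier to read the time-to-panic from a serial console log. This is only a sketch: the function name flap_iface and the IFDOWN/IFUP override variables are hypothetical and not part of the original setup; they default to the same ifdown/ifup commands used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original report): flap an interface
# a fixed number of times and print a timestamped line per cycle.
# IFDOWN/IFUP default to the commands used in the report and can be
# overridden, e.g. for a dry run.
IFDOWN=${IFDOWN:-ifdown}
IFUP=${IFUP:-ifup}

flap_iface() {
    iface=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        # Seconds since the epoch, cycle number, interface name
        echo "$(date +%s) cycle $i: $iface down/up"
        # Failures are reported but do not stop the loop, like the original
        $IFDOWN "$iface" || echo "ifdown $iface failed" >&2
        $IFUP "$iface" || echo "ifup $iface failed" >&2
        i=$((i + 1))
    done
}

# Example invocation on the Kontron module:
#   flap_iface eth0 1000
```

The last line printed before the panic then gives the cycle count and a rough timestamp for the failure.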
With the attached patch (skb_put_nopanic.dump.patch) applied, we can see that the panic is caused by skb->tail exceeding skb->end in skb_put(), which leads to skb_over_panic() and BUG().
From the attached messages:
skbuff: Warning: skb_put l:9250 t:0xdd455c62 e:0xdd453f40 len:9250
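The size of the overrun can be worked out from that line. The sketch below assumes that t is the tail pointer after skb_put(), e is the end pointer, and l/len is the length being added; that reading of the fields is an assumption about the debug patch, not something the message itself states.

```shell
#!/bin/sh
# Values copied from the warning line. The meaning of the fields
# (t = tail after skb_put(), e = end, len = bytes added) is an assumption.
tail=0xdd455c62
end=0xdd453f40
len=9250

overrun=$(($tail - $end))        # bytes by which tail exceeds end
old_tail=$(($tail - $len))       # tail before skb_put(), under the assumption
tailroom=$(($end - $old_tail))   # room that was available before the call

echo "overrun=$overrun tailroom=$tailroom"
# prints: overrun=7458 tailroom=1792
```

Under these assumptions, a 9250-byte put was attempted on a buffer with only 1792 bytes of tailroom. Note that 9250 bytes is also far larger than the 1024-byte ping payload used in the reproduction.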
I can add that, with the patch applied, everything seems to work perfectly before and after the situation occurs, and network traffic continues to flow. However, I know that this is only a temporary workaround and not a real solution.
I have been experimenting with various vanilla Linux kernels from kernel.org, from 3.4 through 3.10.20, and with various tweaked configurations, but we are currently staying on 3.10.10 for other reasons. I have also experimented with various versions of the e1000e driver from kernel.org and from the e1000(e) Sourceforge project. However, the problem seems identical for all kernel and driver versions.
I have attached the kernel configuration and various information from the machine that might help identify the issue.
The bom.txt file contains information about the (busybox-based) userspace; all other output files should be easily identified by their names.
I will be glad to help with further investigation and to provide more information. Any help or ideas for debugging and solving the problem will be greatly appreciated.