It appears that unaligned memory access was one of the causes of the
exceptions. In rtcfg_client_event.c, in the function
rtcfg_client_recv_stage_1() at line ~640, there are two lines that
dereference 32-bit values that are not aligned to a 4-byte boundary:
daddr = *(u32*)stage_1_cfg->client_addr;
saddr = *(u32*)stage_1_cfg->server_addr;
which should be changed to something like:
memcpy(&daddr, stage_1_cfg->client_addr, sizeof(daddr));
memcpy(&saddr, stage_1_cfg->server_addr, sizeof(saddr));
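On many ARM processors, data accessed through a pointer must be aligned to
the size of the type being dereferenced; otherwise the access can raise an
alignment exception (or, on older cores with alignment checking disabled,
silently return the wrong data).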
Xenomai caught this exception and gave the ominous message: "suspending
kernel thread ... after exception #0". I'll look for more of these
unaligned accesses and send a patch later today.
I'm a bit surprised that no one else has had this issue on any of the other
ARM architectures...
> (take care of CCs)
Oops, too many late nights. :-)
Thanks for your help.
- James
On Thu, Dec 10, 2009 at 1:47 AM, Jan Kiszka <jan...@we...> wrote:
> (take care of CCs)
>
> James Kilts wrote:
> >> rtcfg_main_state_off should not be called, even if the frame is totally
> >> corrupted (yeah, we are lacking basic range checks here):
> >> RTCFG_FRM_STAGE_1_CFG is added to the header ID, so you
> >> normally end up in the frame event handlers.
> >
> > While it's true that RTCFG_FRM_STAGE_1_CFG is added to the header ID
> > for the event_id, this has no effect on device[1].state in
> > rtcfg_do_main_event(). The value of device[1].state defaults to 0, which
> > means that the function is invoked as "(*state[0])(...)", which in the
> > state function array is "rtcfg_main_state_off()". I verified with
> > rtdm_printk() that rtcfg_main_state_off() is getting called. In the end,
> > rtcfg_main_state_off() puts it into the correct state
> > (RTCFG_MAIN_CLIENT_0), but not before potentially corrupting memory.
>
> Now I see the path you mean.
>
> >
> > Anyway, since rtcfg_main_state_off() only calls rtcfg_next_main_state()
> > and rtdm_mutex_unlock() in the RTCFG_CMD_CLIENT case, this does not seem
> > to be the source of my problems.
> >
>
> Right. An unconfigured RTcfg node receiving RTcfg frames will spit out
> warnings if debugging is on or will simply ignore those frames. But it
> won't dereference the event_data under an incorrect type.
>
> >
> >> What is the header ID when things go wrong?
> >
> > frm_head->id == 0. It seems that the exception happens the first time
> > calling rtcfg_do_main_event() in rtcfg_rx_task().
> >
>
> Where precisely is important.
>
> >
> >> start with the frame as the stack finds it in memory after
> >> the driver handed it over. If that one is correct, we have a bug in the
> >> higher layers. But I strongly suspect the frame is corrupted.
> >
> > I think the frame is correct... here are some of the values from the
> > rtskb in rtcfg_main_state_client_0() (where event_id ==
> > RTCFG_FRM_STAGE_1_CFG):
> >
> > rtskb->nh.iph->saddr == 0x44542442
> > rtskb->nh.iph->daddr == 0x4643414d
> > rtskb->protocol == 8848
> > rtskb->pkt_type == 1
> > rtskb->priority == 0
> > rtskb->len == 83
> > rtskb->data[0] == 0x00
> > rtskb->data[1] == 0x01
> > rtskb->data[2] == 0x0a
> > rtskb->data[3] == 0x65
> > rtskb->data[4] == 0x01
> > rtskb->data[5] == 0x7e
> > rtskb->data[6] == 0x0a
> > rtskb->data[7] == 0x65
> >
> >
> > My debugging environment is limited at the moment. When the exception
> > occurs, the stack is lost due to the exception handler, so the hardware
> > debugger is of little or no value.
>
> You know the call path quite well already, and this path is taken the
> first time here, so setting proper breakpoints should be no big deal,
> given that you actually have a suitable hardware debugger. Do you?
>
> > And when I use too many rtdm_printk()
> > calls, the entire system hangs before the exception. Because of the last
> > issue, I can't use the built-in debugging features of RTnet. Any
> > suggestions for debugging this?
>
> Xenomai reports the failing instruction address in the suspension
> message. Collect the module address (/proc/modules), compile the module
> with debug symbols (=>CONFIG_DEBUG_INFO), and process the binary with
> objdump or gdb. That should give you the instruction and its environment
> (i.e., the source code line).
>
> Jan