[RTnet-developers] High performance Giga Ethernet possible with RT-net?
From: Fillod S. <ste...@th...> - 2007-07-16 15:57:05
Dear RT-net hackers,

I would like to open a discussion about using RT-net with Gigabit Ethernet in a high-performance fashion. The scene takes place on an embedded platform with scarce resources, and the task is to process raw Ethernet frames (RX+TX), possibly at line rate (~950 Mbps, 100k frames/s on average).

Looking at the present API, a Xenomai user task needs to do one syscall per packet on each of the RX and TX sides. On some platforms, a syscall can cost up to several microseconds. Up to 4*950 Mbps of RAM bandwidth (2x on RX, 2x on TX) is needed just to cross the kernel-user boundary. On the driver side, RT-net takes one interrupt per frame, which is normal for achieving low latency, but pointless when the only user task is periodic. What would be nice is the ability for the user task to put the driver in polling mode. Of course, the trade-off is not being able to use TDMA and the like in that mode, but that is fine for this kind of application.

So far, do you think it is part of RT-net's mission to offer such a quality of service? It might also be that this is already present, but I haven't found it in RT-net 0.9.9 yet, as I'm still digging through the code. What I have in mind is the following, in order of importance:

* a zero-copy interface between kernel and user space
* scatter-gather support on the TX side
* a batched mode (more than one frame per syscall, interrupts -> polling)

For zero-copy on the RX side, I was thinking about a pre-allocated rx buffer zone that would be mmap'ed read-only into user tasks. Some kind of reworked rt_recvmsg() would query the list of frames received since the last call. The buffer addresses would not be provided by the user task but by the rtnet stack, which would fill in the msg_iov.iov_base field with a pointer into the mmap'ed area. Once the user task is done with a frame, a new call (or an ioctl? direct access through the mmap'ed zone a la PACKET_MMAP[1]?) would release it back to the driver for future use as an input frame. A rough sketch of what the user side could look like is appended at the end of this mail.

Zero-copy on the TX side is a bit trickier, because it depends on whether the packet to be sent is mapped in kernel space or not. Does RTDM provide developers with an interface for easy kmapping? Virtual addresses will have to be converted to valid DMA addresses. Once the transmit is done, the packet can be kunmapped, and the RT task should be notified that its buffer can be reused. In that regard, I like the async API of libusb-win32 outlined here[2] (rest easy, there is nothing win32-specific in that kind of API :-). Unfortunately, such an API would only work with a MAC offering hardware pattern matching, i.e. doing the demultiplexing before the DMA.

[1] linux/Documentation/networking/packet_mmap.txt
[2] http://article.gmane.org/gmane.comp.lib.libusb.devel.windows/198

It looks like scatter-gather TX support is already present in the API by means of sendmsg, but the driver never sees that information: the gathering is done by a call to rt_memcpy_fromkerneliovec. IMO, it should be up to the driver to decide whether to do soft gathering with rt_memcpy_fromkerneliovec or to take advantage of hardware s/g DMA. Maybe a new hard_start_xmit_sg field would be welcome in struct rtnet_device (second sketch below)? In the end, the sendmsg call would still need to offer an asynchronous interface.

I'll stop my random thoughts here for now, because I'd like to know whether RT-net is the right place for this. Besides, some people may already have taken that route, or at least aimed at that goal, and why not escape this wheel of reincarnation?
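To make the batched, zero-copy RX idea a bit more concrete, here is a minimal user-side sketch. None of these names (rx_zone_map, rx_poll_batch, rx_release_batch, struct rx_frame_desc) exist in RT-net today; they are only placeholders for whatever the reworked rt_recvmsg()/ioctl interface would end up being:

/* Purely hypothetical sketch -- none of these calls or structs exist in
 * RT-net 0.9.9. It only illustrates the idea: the stack owns a
 * pre-allocated, mmap'ed RX zone and hands out frame descriptors in
 * batches; the periodic task gives the slots back when it is done. */

#include <stddef.h>
#include <stdint.h>

struct rx_frame_desc {            /* filled in by the stack, read-only for the task */
    uint32_t offset;              /* frame offset inside the mmap'ed RX zone */
    uint32_t len;                 /* frame length in bytes */
};

/* hypothetical calls, in the spirit of the RTDM ioctl services */
int rx_zone_map(int fd, void **zone, size_t *zone_len);                  /* map the RX zone */
int rx_poll_batch(int fd, struct rx_frame_desc *descs, int max);         /* frames since last call */
int rx_release_batch(int fd, const struct rx_frame_desc *descs, int n);  /* give the slots back */

void process_raw_frame(const uint8_t *frame, size_t len);                /* application code */

static void periodic_rx_job(int fd, const void *zone)
{
    struct rx_frame_desc batch[64];
    int i, n;

    /* one call per period fetches every frame received since the last one */
    n = rx_poll_batch(fd, batch, 64);

    for (i = 0; i < n; i++)
        process_raw_frame((const uint8_t *)zone + batch[i].offset, batch[i].len);

    /* return the slots so the driver can refill its RX ring */
    rx_release_batch(fd, batch, n);
}

The point being that frames are touched in place and the per-period cost drops to two calls instead of one per frame.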
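And a second, equally hypothetical sketch for the optional s/g transmit hook. struct rtnet_device_sketch and xmit_linearized() are made-up names standing in for the real struct rtnet_device and today's linearizing path; only hard_start_xmit_sg is the proposed addition:

#include <sys/uio.h>              /* struct iovec */

struct rtskb;                     /* RT-net's socket buffer, forward declaration only */

/* illustrative excerpt, not the real struct rtnet_device: just the two
 * transmit hooks relevant here. hard_start_xmit exists today,
 * hard_start_xmit_sg would be the new, optional member. */
struct rtnet_device_sketch {
    int (*hard_start_xmit)(struct rtskb *skb,
                           struct rtnet_device_sketch *rtdev);
    int (*hard_start_xmit_sg)(const struct iovec *iov, int iovcnt,
                              struct rtnet_device_sketch *rtdev);  /* NULL if unsupported */
};

/* stand-in for today's path: build one rtskb and linearize the iovec
 * into it via rt_memcpy_fromkerneliovec() */
int xmit_linearized(struct rtnet_device_sketch *rtdev,
                    const struct iovec *iov, int iovcnt);

/* stack-side dispatch: use hardware s/g DMA when the driver offers it,
 * fall back to soft gathering otherwise */
int xmit_iov(struct rtnet_device_sketch *rtdev,
             const struct iovec *iov, int iovcnt)
{
    if (rtdev->hard_start_xmit_sg)
        return rtdev->hard_start_xmit_sg(iov, iovcnt, rtdev);

    return xmit_linearized(rtdev, iov, iovcnt);
}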
Thanks for reading and feedback,
--
Stephane

PS: has anybody ported the PowerQUICC 3 gianfar driver to RT-net?