Socket based CAN driver API

Pavel Pisa
  • Pavel Pisa

    Because Sergei Sharonov mentioned this topic, I am sending an updated
    copy of my August 2005 reply to some proponents of that API. They did
    not even bother to reply to my e-mail, yet continued to criticize
    LinCAN without disclosing any description of their intentions.

    I have no strong opinion on the socket versus character device question.

    I would be happy if LinCAN could be extended to support the latest
    RTDM/Xenomai model too, even though I think that all these
    non-Linux-native real-time API extensions are useful only for very fast
    sampling frequencies above 1 or 10 kHz. CAN message processing does not
    fall into this category (a response time of about 1 ms is enough), and
    a fully preemptive kernel is the right solution for this area.
    LinCAN is tested even with the fully preemptive kernel and works
    smoothly in that environment. I thought about RTAI support a long time
    ago, but previous incarnations of the RTAI API missed even basic
    support for registering named devices, and I received no response from
    the RTAI community about that. RT-Linux supported most of these
    features, and we utilized it and cooperated on it with the UPV people,
    so the OCERA framework was more mature and feature-complete than RTAI
    when we started.

    I would be happy if somebody could contribute his/her time to help
    implement these features and extend LinCAN. I know about many LinCAN
    users only from Google searches. Apart from our university partners,
    no company has cared to return any value to the project.

    It is possible that there are some strong advantages of the socket
    approach that are hidden from me.

    I will try to describe the flow of my ideas. I would like to hear from
    others about their view of things.

    Here are my ideas:
    1) CAN uses very short messages (0-8 data bytes), which means that the
        traffic consists of a huge number of short messages. If a read/write
        or send/receive syscall is required for each message, the syscall
        overhead becomes a considerable problem. Some way to read or write
        several messages in one syscall is required.

    2) If the transferred data had a stream-like character, then some
        stateful connection with stream read/write, similar to TCP, could
        hide the small-packet character from applications. Special code
        could be added into the kernel to handle block and segmented
        CANopen SDO transfers. But that would help only for device
        configuration; most of the data are process data, with only
        occasional expedited SDO transfers during the operational state.
        All of these are represented by simple messages of up to eight data
        bytes. The only solution would be to communicate with the kernel
        through some higher-level layer grouping requests, but things
        would get very complicated in that case.

    3) That means that we need to be able to read and write multiple
        messages in one go. If we allow that through read/receive with a
        buffer sized for several messages, and the messages are returned
        packed (variable size), then message processing gets complicated.
        The data of the second and subsequent messages would not be
        guaranteed to be aligned. This is very bad, because solving it
        would require code for unaligned access to the ID field etc. The
        result would be up to ten times slower code on some CPUs. Even
        skipping some messages would require a size field instead of simple
        indexing (not such a big problem). You could not even hand messages
        to other functions by pointer (a regular pointer to a structure or
        data is expected to be aligned) or by copying fixed-size memory
        blocks. A fixed-size, naturally aligned message record avoids all
        of this (see the sketch after this list).

    4) It may be that some way to get the administrative data (ID, flags,
        length) out of the send/receive data stream could help here. But if
        the lengths are not in the data stream, then you do not know the
        length of the individual messages either.

    5) I agree that the Linux networking infrastructure is mature and well
        optimized. It would be interesting to see what could be gained by
        using it for CAN. I have seen one rough CAN driver utilizing it,
        but I have not spent much time with it. On the other hand, I fear
        that Linux networking is not optimized for such small packets, so
        the time and memory overhead could be considerable.

    6) CAN applications need to receive and send a broad range of messages
        with different IDs in one application. I am almost sure that
        reusing the Linux concept of kernel packet routing according to
        addresses, with the requirement to have an individual open for each
        ID, would kill applications. You would have dozens of threads or
        big selects or epolls. Managing multiple addresses per one open
        would be required. This means that raw packets have to be used and
        IDs cannot be assigned/inserted/stripped on a per-open basis.

    7) Regular networks mostly work with exclusive addresses, but that is
        not the case in CAN networks. You need to synchronize even more
        applications through sync frames. You need masked filter support.
        I think that a relatively small number of driver opens per
        application is reasonable; it is good to have one driver open per
        thread. It is a good idea to be able to select TX queue priorities
        for each open. A small number is enough, but at least EMERGENCY,
        PDO and MANAGEMENT priorities should be distinguished.

    8) The infrastructure should be defined in such a way that integrating
        user-level applications with some RT Linux add-on and RT
        applications does not incur a big overhead.

    9) Redundancy should be considered in the future as well, meaning that
        communication is redirected to another controller or bus in case
        of a fault.

    10) Support for automatically connecting FIFO queues to different
        chips' COBs/message buffers would be nice to have for some chips.
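
    To illustrate points 1) and 3), here is a minimal sketch of what I mean
    by a fixed-size, aligned message record and a multi-message read. The
    structure layout and the device name are illustrative only, not the
    actual LinCAN definitions:

        #include <stdio.h>
        #include <fcntl.h>
        #include <unistd.h>
        #include <stdint.h>

        /* Hypothetical fixed-size record; every message starts aligned,
           so the ID field never needs unaligned access and an array of
           messages can be indexed directly. */
        struct canmsg {
                uint32_t id;        /* CAN identifier */
                uint32_t flags;     /* RTR, EFF, ... */
                uint64_t timestamp; /* receive time */
                uint8_t  length;    /* 0..8 */
                uint8_t  data[8];
                uint8_t  pad[7];    /* explicit tail padding up to 32 bytes */
        };

        int main(void)
        {
                struct canmsg buf[32];
                int fd = open("/dev/can0", O_RDWR);  /* name is an example */
                if (fd < 0)
                        return 1;
                /* One syscall can return up to 32 messages; the number of
                   complete records follows from the returned byte count. */
                ssize_t n = read(fd, buf, sizeof(buf));
                for (ssize_t i = 0; i < n / (ssize_t)sizeof(struct canmsg); i++)
                        printf("ID 0x%03x len %u\n",
                               (unsigned)buf[i].id, (unsigned)buf[i].length);
                close(fd);
                return 0;
        }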

    We tried to keep most of the above aspects in mind when we enhanced
    LinCAN. For example, we have implemented internals that will allow 7)
    in the future, but the user API for this feature is not exported yet.
    We think that our base could enable 9), that 10) could be added, and
    that we could even enhance the already proven 8) on RT-Linux in such a
    way that, instead of a direct connection of each Linux-context
    application to the RT-Linux based driver, messages to the Linux context
    are routed through a kernel thread/hub, so that an increasing number of
    Linux applications and driver clients has zero influence on the RT
    driver and applications.

    So this is what I and my colleagues had in mind when we worked on our
    project. I do not claim that the result is ideal and that there is no
    better alternative. I agree that constructive technical critique is
    required and that some unification of CAN support on Linux is
    necessary.
    I would like to hear your ideas and achievements.
    I am not always very polite in argumentation, but I think that I am
    open to technical arguments and that I am able to evaluate them even
    from points of view different from my actual needs.
    I will try to provide objective reactions even if we are not able to
    switch to the agreed-upon solution, for reasons of our actual
    industrial partners' and applications' needs.
    I do not like development being blocked by non-technical and political
    spamming without any reaction to actual technical problems.

    I think that the above list of considerations should be discussed and
    edited in some public place. I do not know where the right place is.
    If you have some idea, send a reaction with your ideas, critique and
    experience to some public list with CC to me. I can only offer the
    OCERA lists for this discussion now.
    We could even set up a new list on one of our department or ISP servers
    that we manage.
    But I do not like to add yet another list and site without knowing that
    it would be used and useful for most of the Linux CAN developers. The
    LDDK was the right and well organized place years ago, but it is only a
    progress-blocking carcass now, and even the respective university
    administrators replied to me that they are not able to physically
    locate the LDDK computer. It even seems that there are some players who
    are happy with this Linux CAN Babylon.

    • Hi Pavel,

      maybe you could take a look at (sorry, currently only German documentation).

      The development has been continued on

      where the kernel 2.6 version is the latest (working) code.

      Maybe this brings you to the next level of handling CAN via standard IT techniques, as a real network protocol implementation with sockets ;-)

      Best regards,


      • Pavel Pisa

        Hello Oliver,

        I have looked at your implementation.
        I want to congratulate you on a very well defined API.
        To my surprise, the reuse of the Linux kernel SKB layer reduces the
        complexity of a CAN implementation significantly more than I
        originally expected. Not that the complexity is not there; it is
        hidden in the SKB handling, but it reuses infrastructure that is at
        the apex of interest of the kernel architects, so its implementation
        is well optimized. The overhead for exceptionally short CAN messages
        should still be checked.
        I like the balance of your delivery queue categorisation in
        "struct rcv_dev_list", which is probably one of the few functional
        and reasonably optimal solutions for masked-ID based filtering.
        Yes, EFF complicates things.
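
        For reference, the masked-ID test I have in mind is just the classic
        check below (the names are illustrative, not taken from your code):

        #include <stdint.h>

        /* A filter entry accepts a frame when the masked bits of the
           received identifier match the masked bits of the filter
           identifier. */
        static inline int canid_match(uint32_t rx_id, uint32_t filter_id,
                                      uint32_t mask)
        {
                return (rx_id & mask) == (filter_id & mask);
        }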

        There is an extreme need to unify the CAN interfaces for the Linux
        kernel and get them into the mainline kernel, and I confirm that
        your solution is the most promising one.

        I have tried to start a discussion about the Linux CAN unification
        process for many years. But I have not had much luck with responses
        (socket CAN advocates included), even though I have invested a
        non-marginal amount of my time (often not backed even by our
        university) into Linux CAN support for many users for free. The
        lack of will to communicate is probably due to significant
        political and economic dark OSS players who damage state-of-the-art
        progress in favour of temporary advantages and gains; that behavior
        has already led to the vanishing of their previously so glorified
        proprietary systems in recent years.
        The TIXP4xx networking code is one example of how to make many
        people lose time and money on crappy code for undocumented
        hardware. There are many other similar BSPs holding people on
        2.4.18 kernel forks and other obfuscated, badly licensed code.

        I hope that your intention is to play fair in this respect, because
        otherwise it could give you a short-term gain but it would
        contribute to blocking Linux from being the sandbox place for
        enthusiasts who have made Linux what it is now.
        I would like to believe that my very frustrating experience, which
        damaged almost a year of progress (in a non-CAN area), with one of
        the members seen on your project page was only an exceptional and
        temporary blackout of his ability to evaluate technical arguments,
        in favour of some plotting plans and self-conceit. Nobody is
        perfect and I know that well about myself.

        Now back to your code. I think that it is a shame that you have not
        tried to answer my previously stated questions. Your code addresses
        many of them, and some of them can now be evaluated as superfluous.
        Even if some limitations cannot be addressed at all, this does not
        mean that a socket based solution cannot be considered the most
        appropriate one.

        There are some areas in your code to which I think attention should
        be paid.

        First, I agree that your raw CAN message format is well aligned,
        but I am not sure whether there should not be some field for a
        communication object index on full CAN controllers.
        Obtaining timestamps seems to have a significant overhead.
        I am not sure about the latest kernels, but IOCTL was known to
        require the BKL even on relatively recent 2.6 kernels.

        The next piece of code seems to me to have unnecessarily high
        algorithmic complexity as well.

        af_can.c:530 int can_rcv()
               /* find receive list for this device */
                for (q = rx_dev_list; q; q = q->next)
                        if (q->dev == dev)

        There should be some direct mapping between a net_device and its
        rcv_dev_list. I understand that there could be problems with
        locking and with space in the generic networking structures, but
        that is no excuse for not thinking about it a little more; see the
        sketch below.
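
        A rough sketch of what I mean, with sizes and helper names of my
        own invention (not from your code): the receive lists could be
        reached through a small hash keyed by the interface index, so that
        can_rcv() does O(1) work per frame instead of the linear scan
        above.

        #include <linux/netdevice.h>

        #define RCV_HASH_SIZE 16

        static struct rcv_dev_list *rcv_hash[RCV_HASH_SIZE];

        static inline unsigned int rcv_hash_idx(const struct net_device *dev)
        {
                return dev->ifindex & (RCV_HASH_SIZE - 1);
        }

        /* Replacement for the linear search in can_rcv(); walks only the
           (usually very short) per-bucket chain. */
        static struct rcv_dev_list *find_dev_rcv_lists(struct net_device *dev)
        {
                struct rcv_dev_list *q;

                for (q = rcv_hash[rcv_hash_idx(dev)]; q; q = q->next)
                        if (q->dev == dev)
                                return q;
                return NULL;
        }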

        I am not sure whether choosing singly linked lists for the "next"
        field in "struct rcv_list" is the right idea.


        struct rcv_list {
            struct rcv_list *next;

        If this only resulted in O(number of opens) blocking of the process
        during new device recognition, that would not be so bad. But I can
        see the loops in can_rx_unregister(), which are run under
        rcv_lists_lock, which means they run on each client close. You
        block all bottom halves during that time, and that is not polite to
        real-time applications. The read/write locks do not translate
        simply even onto the fully preemptive kernel. I see similar
        problems in can_rx_register(), which even calls find_rcv_list()
        under the lock. Even a totally unbounded kmalloc is called under
        the write lock. The global rcv_lists_lock is not a big win either
        if more CAN controllers are used on an SMP or HT system, and you
        use this global lock even in can_rcv(). This would result in cache
        thrashing.

        In our queue approach we have managed to keep almost all locking at
        O(1), with no lock held during any data copy operation; such hogs
        as kmalloc under a lock would be taken by me as homicide. But we
        use this locking from IRQ context, so our requirements are
        stricter. You can look into my queues: there is only a very
        occasional loop in the case that a dead (being removed) edge has to
        be skipped; otherwise only straight fall-through code is protected.
        Reference counting is used to ensure that "objects" do not
        disappear during an operation when no locks are held, and
        destruction is delayed. The reference counting uses CMPXCHG so that
        no additional spinlocking is required during reference release in
        the normal case (a small sketch of this idea follows below). On the
        other hand, our solution uses a very large number of very short
        lock sections, which could be an overhead in some cases. We
        preferred this due to the need to synchronize with the hard-RT
        RT-Linux kernel in some build scenarios. The often fine-grained,
        object-specific irq-save locking does not have such a big overhead
        on standard kernels, but I admit that it can have a considerable
        overhead on the fully preemptive kernel.
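
        A small sketch of the CMPXCHG-based release I mean (the names are
        mine, this is not the actual LinCAN code): the common path drops
        the count without taking any spinlock, and only the caller that
        releases the last reference proceeds to the delayed-destroy path.

        #include <stdbool.h>

        struct edge {
                int refcnt;
                /* ... queue edge data ... */
        };

        /* Returns true when the caller dropped the last reference and is
           therefore responsible for scheduling the delayed destruction. */
        static bool edge_deref(struct edge *e)
        {
                int old, new;

                do {
                        old = e->refcnt;
                        new = old - 1;
                        /* atomic compare-and-swap; no lock on this fast path */
                } while (!__sync_bool_compare_and_swap(&e->refcnt, old, new));

                return new == 0;
        }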

        But it may be that these problems are not so critical in your case,
        because most of the SKB delivery processing is run by the kernel in
        the background, in separate delayed processing, so it has no
        influence on the netif_rx() caller. But it still has an influence
        on RT application threads when they create queues, and it has an
        impact on message delivery latencies.

        What I see as more problematic is that delivery to the processes is
        handled in the common process_backlog() function. It is true that
        it is called from a per-CPU thread and everything is probably well
        tuned for throughput, but even the delivery of messages for
        real-time threads could be blocked by the complex delivery of huge
        packets from a different protocol family (video broadcast etc.).

        CAN is basically not about throughput; 1 Mbit/s is negligible
        today. It is about quick and deterministic delivery, for which it
        is valued in robotics, motion control and automotive applications.

        So the most important question for socket_can is: does latency
        suffer from mixing SKBs with other protocols or not? I am not able
        to answer this after a short analysis of your code. Linux does not
        have good responsiveness by default, but with the fully preemptive
        kernel our approach allows high priority processes to be released
        directly when a CAN message arrives, and the latency should depend
        only on the priority of the application thread and of the thread
        which the fully preemptive kernel provides as the kernel thread for
        the CAN IRQ.

        The problem could be solved if input_pkt_queue were moved to a
        per-device work queue, which would allow tuning of the right thread
        priority and would not compete with the bandwidth of other
        protocols. There seem to be some attempts in this direction for 2.6
        kernels, but I do not know whether there is a chance that they will
        catch on. I do not believe there is any chance of such a change in
        2.4 kernels.

        There are other, relatively minor things about your code which
        could be checked.

        I have not noticed even a minimal attempt to prepare the
        possibility of some control over the Tx order of messages (a
        reaction to a critical state should have some precedence over
        CANopen SDOs, for example).
        But it may be that I have not noticed some hidden possibility in
        the SKB layer.

        The MSCAN driver utilizes delayed, poll based reception in the
        kernel-provided NET_RX_SOFTIRQ soft IRQ. Again, this is a problem
        of thread priorities and mutual protocol load.

        I think that the generic can_calc_bit_time() could be made more
        generic to allow the computation for any device. The hardcoded
        constants (MAX_BIT_TIME ...) should be eliminated in favor of a
        device-provided pointer to a parameters structure. We do not have
        this in LinCAN yet either, but if you want to see such code I can
        provide it to you from another of our CANopen university
        experiments, based on the following prototypes:

        typedef struct can_calc_const_t {
                unsigned sync, sjw_max;
                unsigned tseg1_min, tseg1_max, tseg2_min, tseg2_max;
                unsigned brp_min, brp_max;
        } can_calc_const_t;

        int can_auto_baud(long baud_rate, int *pbrp, int *ptseg1, int *ptseg2,
                          int sampl_pt, can_calc_const_t *c);
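
        A driver would then only export its own limits. The values below
        are illustrative only (roughly what an SJA1000-class controller
        allows; please check against the datasheet), not something taken
        from LinCAN or from your code:

        static can_calc_const_t sja1000_like_consts = {
                .sync = 1,       .sjw_max = 4,
                .tseg1_min = 1,  .tseg1_max = 16,
                .tseg2_min = 1,  .tseg2_max = 8,
                .brp_min = 1,    .brp_max = 64,
        };

        /* e.g. can_auto_baud(500000, &brp, &tseg1, &tseg2, 87,
                              &sja1000_like_consts);
           for 500 kbit/s with a sample point given as a percentage */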

        But generally I agree that your project is interesting and that it
        would be worth trying to manage CAN communication that way. I hope
        that I will find time to port some of our test utilities to your
        API and to compare latencies. If the socket_can latencies are not
        significantly worse than the LinCAN ones, then I would vote for its
        inclusion into vanilla. If I could find resources and people
        willing to contribute (university funding could be a problem
        there), then I think that it would be a good idea to try to port
        more of the LinCAN-supported cards' code into this framework.

        I do not expect that we will drop LinCAN at the moment, because it
        is used by more companies and your solution would not allow
        interoperability between the hard-RT domain and the Linux kernel
        context. But I believe that after the fully preemptive patches are
        included into vanilla, and after your code matures, it has the
        potential to be the unified Linux CAN solution.

        This e-mail analysing your code is rather long, but I think that
        you should consider the noted aspects and try to go through my
        previous checklist. I think that it would not be bad if you sent
        these lists to your project mailing list to evoke a discussion
        which could, in the end, help even your project.

        I am invited to RTLWS8, so if you go there we can meet. I wanted to
        evoke a discussion about Linux CAN unification there and I had been
        preparing some materials for a presentation before your e-mail.
        Your project contributes a significant change in this area, so I am
        not sure about my previous plans. In any case, I wish for there to
        be a mature, agreed upon, open, and non-obfuscated CAN solution for
        Linux.

        Best wishes