Re: [Dle-develop] [RFC][PATCH] add tracepoint to __sk_mem_schedule

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi Neil,

Thank you for looking at it.
I'm afraid that my explanation was not enough.

On 06/15/2011 07:07 AM, Neil Horman wrote:
> On Tue, Jun 14, 2011 at 03:24:14PM -0400, Satoru Moriya wrote:
>
> There are several ways to do this already.  Every drop that occurs in the stack
> should have a corresponding statistical counter exposed for it, and we also have
> a tracepoint in kfree_skb that the dropwatch system monitors to inform us of
> dropped packets in a certralized fashion.  Not saying this tracepoint isn't
> worthwhile, only that it covers already covered ground.

Right. We can catch packet drop events via dropwatch/kfree_skb tracepoint.
OTOH, what I'd like to know is why kernel dropped packets. Basically,
we're able to know the reason because kfree_skb tracepoint records its
return address. But in this case it is difficult to know a detailed reason
because there are some possibilities. I'll try to explain UDP case.

> Again, this is why dropwatch exists.  UDP gets into this path from:
> __udp_queue_rcv_skb
>  ip_queue_rcv_skb
>   sock_queue_rcv_skb
>    sk_rmem_schedule
>     __sk_mem_schedule
> 
> If ip_queue_rcv_skb fails we increment the UDP_MIB_RCVBUFERRORS counter as well
> as the UDP_MIB_INERRORS counter, and on the kfree_skb call after those
> increments, dropwatch will report the frame loss and the fact that it occured in
> __udp_queue_rcv_skb

As you said, we're able to catch the packet drop in __udp_queue_rcv_skb
and it means ip_queue_rcv_skb/sock_queue_rcv_skb returned negative value.
In sock_queue_rcv_skb there are 3 possibilities where it returns nagative
val but we can't separate them from the kfree_skb tracepoint. Moreover
sock_queue_rcv_skb calls __sk_mem_schedule and there are several
if-statement to decide whether kernel should drop a packet in it. I'd like
to know the condition when it returned error because some of them are
related to sysctl knob e.g. /proc/sys/net/ipv4/udp_mem, udp_rmem_min,
udp_wmem_min for UDP case and we can tune kernel behavior easily.
Also they are needed to show root cause of packet drop to our customers.

Does it make sense?

> I still think its an interesting tracepoint, just because it might be nice to
> know which sockets are expanding their snd/rcv buffer space, but why not modify
> the tracepoint so that it accepts the return code of __sk_mem_schedule and call
> it from both sk_rmem_schedule and sk_wmem_schedule.   That way you can use the
> tracepoint to record both successfull expansion and failed expansions.

It may be good but not enough for me as I mentioned above.

Regards,
Satoru