From: Satoru M. <sat...@hd...> - 2011-12-20 23:40:50
On 12/17/2011 07:49 PM, Hagen Paul Pfeifer wrote:
>> Sometimes network packets are dropped for some reason. In enterprise
>> systems which require strict RAS functionality, we must know the
>> reason why it happened and explain it to our customers even if using
>> TCP. When we investigate the incidents, at first we try to find out
>> whether the problem is in the server(kernel, application) or else
>> (router, hub etc). And next we try to find out which layer
>> (application/middleware/kernel(IP/TCP/UDP/..)etc.) the problem
>> occurs.
>
> For the first question tcpdump may the right tool.

We'd like to keep records in memory and save them to a file only when
we detect a problem, so that tracing overhead stays low. We can also
keep the amount of trace data lower than with tcpdump, because we only
record data when a retransmission occurs. Capturing all the packets and
saving them to a file cannot satisfy our requirements. I should have
written this in the cover letter. I'll fix it.

Also, we can easily analyze incidents by combining the data from this
tracepoint with data from other tracepoints.

> For the later systemtap can be used. I mean we now have the
> possibility to instrument the kernel at runtime, without bloating the
> source.

Yes, we can use systemtap to get the data we need. But systemtap is not
included in the kernel, and we must maintain systemtap scripts to follow
kernel modifications. By adding a tracepoint here, we can get useful
data via ftrace/perf without any tools that are not included in the
kernel. Of course, systemtap can also insert a probe at this tracepoint.

> Anyway: is 63e03724b51 not suitable to gather the required information
> easily?

We use trace_kfree_skb(), which 63e03724b51 uses, to detect packet drop
events. In addition to that, we would like to detect errors in the TCP
layer for better troubleshooting.

Regards,
Satoru