
#305 Support for zero copy on tx & skip rx copy (MSG_TRUNC) on rx

1.0
accepted
None
2024-10-02
2024-03-19
No

Per Brian Tierney

I did some testing with the neper tool from Google, which supports SO_ZEROCOPY/MSG_ZEROCOPY via the -Z flag, and found it made a big difference, so long as the receiver is not CPU limited. See: https://github.com/google/neper

The trick is to use the --skip-rx-copy flag on the server to make sure you are not limited by the receiver.

Here are my test commands and results: (I'm using numactl to use the same core every time, as different cores often give different throughput)

server: numactl -C 5 ./tcp_stream --skip-rx-copy

client: numactl -C 5 ./tcp_stream -c -H 10.10.2.62
result: 37Gbps

client with MSG_ZEROCOPY: numactl -C 5 ./tcp_stream -c -H 10.10.2.62 -Z
result: 47Gbps

Based on this, I think it would be great if both iperf2 and iperf3 supported MSG_ZEROCOPY (and --skip-rx-copy, which sets the MSG_TRUNC flag in recv()).
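For the receive side, the idea can be sketched in a few lines of C. This is only an illustration, not the neper or iperf code: the function names rx_flags and rx_once are hypothetical. It relies on the Linux behavior documented in tcp(7), where passing MSG_TRUNC to recv() on a TCP socket discards the received bytes in the kernel instead of copying them into the caller's buffer.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: flags for recv(), MSG_TRUNC when the copy is skipped. */
int rx_flags(int skip_rx_copy)
{
    return skip_rx_copy ? MSG_TRUNC : 0;
}

/*
 * Hypothetical helper: read up to len bytes from a connected TCP socket.
 * With skip_rx_copy set, Linux drops the bytes in the kernel and leaves
 * buf untouched; the return value is still the number of bytes consumed.
 * Returns -1 on error with errno set, exactly like recv().
 */
ssize_t rx_once(int fd, void *buf, size_t len, int skip_rx_copy)
{
    return recv(fd, buf, len, rx_flags(skip_rx_copy));
}
```

A receive loop would call rx_once() repeatedly and add the return values to its byte counter, so the throughput report stays the same while the payload copy is avoided.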

Looking at the code for this in neper, it should be a pretty easy enhancement for both iperf2/iperf3.
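For the transmit side, a rough C sketch of the MSG_ZEROCOPY pattern described in the Linux kernel documentation might look like the following. It is not taken from neper or iperf, and the names tx_enable_zerocopy, tx_once and tx_reap_completion are hypothetical. The fallback #define values are an assumption that holds for the generic Linux socket headers; a few architectures use a different SO_ZEROCOPY number.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Fallbacks for older libc headers (assumed generic Linux values). */
#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000
#endif

/* Opt the socket in to zero-copy sends. Returns 0 on success, -1 on error. */
int tx_enable_zerocopy(int fd)
{
    int one = 1;
    return setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));
}

/* Send one buffer, with MSG_ZEROCOPY when requested, else a plain send(). */
ssize_t tx_once(int fd, const void *buf, size_t len, int zerocopy)
{
    return send(fd, buf, len, zerocopy ? MSG_ZEROCOPY : 0);
}

/*
 * Read one completion notification from the socket error queue.
 * Returns 0 if a notification was read, -1 otherwise (errno is EAGAIN
 * when nothing is pending). The control data is not parsed here.
 */
int tx_reap_completion(int fd)
{
    char control[128];
    struct msghdr msg = {0};

    msg.msg_control = control;
    msg.msg_controllen = sizeof(control);
    return recvmsg(fd, &msg, MSG_ERRQUEUE) < 0 ? -1 : 0;
}
```

The main design cost is buffer ownership: with MSG_ZEROCOPY the kernel may still be reading the buffer after send() returns, so the sender should not overwrite it until the matching completion has been reaped.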

PS: I'm also testing BIG TCP now. I'll let y'all know what I find. But BIG TCP needs to be enabled at the system level, so it's not something you can set in iperf.

Discussion

  • Robert McMahon

    Robert McMahon - 2024-04-16

    The --skip-rx-copy commit is here. It seems to work: iperf CPU utilization drops from 53% to 29% with a single TCP stream over 10G.

    On:

    top - 11:51:34 up 2 days, 17:07,  2 users,  load average: 2.98, 3.08, 2.96
    Tasks: 744 total,   2 running, 740 sleeping,   0 stopped,   2 zombie
    %Cpu(s):  6.2 us,  2.8 sy,  0.0 ni, 88.8 id,  0.0 wa,  0.2 hi,  1.9 si,  0.0 st 
    MiB Mem :  64235.9 total,   4020.0 free,  17106.8 used,  44945.4 buff/cache     
    MiB Swap:   8192.0 total,   8192.0 free,      0.0 used.  47129.1 avail Mem 
    
     227309 rjmcmah+  20   0  227764   5712   3328 S  28.7   0.0   0:22.62 iperf        
    

    Off:

    top - 11:53:06 up 2 days, 17:08,  2 users,  load average: 3.15, 3.11, 2.99
    Tasks: 728 total,   3 running, 723 sleeping,   0 stopped,   2 zombie
    %Cpu(s):  7.5 us,  3.1 sy,  0.0 ni, 87.2 id,  0.0 wa,  0.2 hi,  1.9 si,  0.0 st 
    MiB Mem :  64235.9 total,   5483.0 free,  15643.7 used,  44936.3 buff/cache     
    MiB Swap:   8192.0 total,   8192.0 free,      0.0 used.  48592.2 avail Mem 
    
     227661 rjmcmah+  20   0  230632   5888   3328 S  53.0   0.0   0:08.95 iperf    
    
     
  • Robert McMahon

    Robert McMahon - 2024-04-16
    • summary: Support for zero copy --> Support for zero copy on tx & skip rx copy (MSG_TRUNC) on rx
     
  • Winnie

    Winnie - 2024-10-02

    Hello, @rjmcmahon

    Thank you for adding this feature.
    As I understand it, this currently skips only the rx copy.

    Also, the iperf2 client sends TCP data with write(), which does not support zero-copy.
    Do you have any plan to support zero-copy on TX (SO_ZEROCOPY/MSG_ZEROCOPY)?

    Thanks

     

