Lost packets for no reason

RT
2010-01-22
2013-02-03
  • RT
    2010-01-22

    Running WANem on a Dell laptop with 2 NICs, each at 100 Mbps full duplex.
    The host application generates approximately 100 packets per second, for a total IP bandwidth of ~340 kbps.
    When I have no WANem rules configured, or a 0 ms delay configured, the traffic flows OK.
    As soon as I apply delay to either interface, the affected data stream experiences a small burst of lost packets every 35 to 40 seconds, like clockwork.

    Also, when enabling any delays or changing rules, the console displays the following message:

    HTB: Quantum of class 100001 is big.  Consider r2q change.

    Could these losses be related to the message?
    I need to emulate a clean link with delay. Any ideas?
    Thanks.

     
  • RT
    2010-01-25

    Upgraded from WANem 2.0 to WANem 2.2.
    The periodic packet loss when delay is applied remains, although the
    "HTB: Quantum of class 100001 is big. Consider r2q change" message is no longer seen.

    Tried a second laptop (an older Dell), and the same problem persists on whichever interface has delay enabled.

     
  • M K Nambiar
    2010-01-26

    Hi,

    A similar problem, at much higher bandwidths, is being discussed in this topic: https://sourceforge.net/projects/wanem/forums/forum/725629/topic/3445060

    I haven't been able to find a fix for it yet.

    However, I am unable to reproduce your problem. How are you detecting packet loss?

    Regards,
    M.K.Nambiar


     
  • RT
    2010-01-27

    How do I determine losses?
    Several methods. The most direct is of course pings, but slow PC-generated pings take a great many to reveal a loss.
    I have other equipment that can generate many pings per second. At 30 pings per second, running in parallel with my UDP test traffic, I detect 12 ping losses out of 3000. Further, I was able to determine that the losses were symmetrical: 6 of the losses were pings on the way to the destination, and 6 were replies on their way back to the source. Delay was applied symmetrically during that test.

    My other method is testing with equipment that generates a constant UDP traffic stream of 100 packets per second. I use an SNMP polling engine to continually compare the number of packets transmitted from one side to the number of packets received at the other side. The resulting graph reveals a more or less linear rate of loss near 0.25% in any direction for which delay has been enabled.
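
    For reference, the counter comparison described above reduces to simple arithmetic. The following is a minimal Python sketch with hypothetical names (`loss_percent` is not part of any tool mentioned here); it assumes both counters were sampled over the same interval, and the real values would come from whatever the SNMP poller reads.

```python
def loss_percent(sent: int, received: int) -> float:
    """Percentage of packets that were sent but never received."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return 100.0 * (sent - received) / sent


# Ping test from the post: 12 of 3000 pings got no reply.
round_trip = loss_percent(3000, 3000 - 12)

# 6 of those losses were requests lost on the way to the destination.
forward_only = loss_percent(3000, 3000 - 6)

print(f"round trip: {round_trip:.2f}%  forward only: {forward_only:.2f}%")
```

    The forward-only figure works out to 0.20%, which is in the same range as the roughly 0.25% per direction read from the SNMP graph.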

     
  • RT
    2010-01-27

    One critical difference between this case and the indicated thread: in this case, WANem status does NOT report discarded packets.

     
  • M K Nambiar
    2010-02-05

    I am assuming that you are generating UDP traffic at 100 packets per second in both tests.

    Now it depends on the packet size you are using.

    Let's assume your packet size is 1400 bytes.

    At 100 packets per second you are generating 100 * 1400 * 8 = 1,120,000 bps = 1120 kbps of traffic.

    At a bandwidth setting of 340 kbps I would expect packet losses.

    If you are limiting your generated traffic to less than 340 kbps and still seeing the problem, it would help if you could tell me an easy way to reproduce it. I can then try it in my lab.
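
    The arithmetic above can be repeated for any packet size. The following is a small Python sketch with hypothetical helper names; it assumes fixed-size packets and 1 kbps = 1000 bit/s. The second helper runs the calculation backwards, to show what packet size the ~340 kbps at 100 packets per second quoted in the first post would imply.

```python
def offered_load_kbps(packets_per_second: float, packet_bytes: float) -> float:
    """Offered load in kbps (1 kbps = 1000 bit/s) for fixed-size packets."""
    return packets_per_second * packet_bytes * 8 / 1000


def packet_bytes_for(load_kbps: float, packets_per_second: float) -> float:
    """Packet size in bytes implied by a given load and packet rate."""
    return load_kbps * 1000 / 8 / packets_per_second


# The 1400-byte assumption used above.
print(offered_load_kbps(100, 1400), "kbps")

# Packet size implied by ~340 kbps at 100 packets per second.
print(packet_bytes_for(340, 100), "bytes")
```

    With 1400-byte packets the load is 1120 kbps, as stated. Working backwards from 340 kbps gives 425 bytes per packet.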

    Regards,
    M.K.Nambiar

     
  • RT
    2010-02-05

    I am generating 100 packets per second that result in ~340 kbps of load (packet size of approximately 425 bytes). For WANem, I have chosen bandwidths anywhere from a 2048 kbps E1 to 45 Mbps to 100 Mbps Ethernet, all with the same result.

    Maybe a UDP test application can generate 100 packets per second with a packet size of 425 bytes. In case it matters, the UDP packets are generated with UDP source and destination port 65434.
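
    One way such a test stream might be approximated is sketched below in Python. This is not the application used in this thread. The function names are hypothetical, and it assumes that "packet size" means the whole IPv4 packet, so the UDP payload is 425 minus 20 bytes of IPv4 header and 8 bytes of UDP header. A sequence number is placed at the start of each payload so a receiver could spot gaps.

```python
import socket
import time

IP_HEADER_BYTES = 20   # IPv4 header without options (assumption)
UDP_HEADER_BYTES = 8


def udp_payload_bytes(ip_packet_bytes: int) -> int:
    """Payload length that yields the given total IPv4 packet size."""
    payload = ip_packet_bytes - IP_HEADER_BYTES - UDP_HEADER_BYTES
    if payload < 0:
        raise ValueError("packet too small to hold IPv4 and UDP headers")
    return payload


def send_stream(dest_ip: str, port: int = 65434, pps: int = 100,
                ip_packet_bytes: int = 425, duration_s: float = 10.0) -> int:
    """Send a paced UDP stream and return the number of datagrams sent."""
    payload_len = udp_payload_bytes(ip_packet_bytes)
    interval = 1.0 / pps
    total = int(pps * duration_s)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind(("", port))  # source port equals destination port
        start = time.monotonic()
        for seq in range(total):
            # 4-byte sequence number first, zero padding up to payload_len.
            header = (seq % 2**32).to_bytes(4, "big")
            data = header.ljust(payload_len, b"\x00")[:payload_len]
            sock.sendto(data, (dest_ip, port))
            # Sleep until the next slot of a fixed schedule to avoid drift.
            wait = start + (seq + 1) * interval - time.monotonic()
            if wait > 0:
                time.sleep(wait)
    finally:
        sock.close()
    return total
```

    Calling `send_stream` with the destination host's address and, say, `duration_s=120` would send 12000 datagrams at roughly 100 per second. Python's timer resolution limits how evenly the packets are spaced, so dedicated test equipment will pace more precisely.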

     
  • RT
    2010-03-02

    I have more information.
    I am able to reproduce this problem using only ICMP pings.

    I have narrowed the issue down to a packets-per-second issue.

    I have the ability to generate a ping session sending 10 pings per second.
    I further have the ability to generate multiple such sessions.

    At a total of 30 pings per second, I see no losses.
    At 40 pings per second or higher I experience losses.
    I see losses in both directions even if WANem delay is added on only one interface.
    I can monitor and confirm that some pings are lost in the forward direction AND some ping replies are lost in the reverse direction.

    The result does not seem to be related to packet size/bandwidth as the result is the same with ping sizes of 8 bytes, 64 bytes, or 1400 bytes.

    Ultimately I need to use this with my UDP application that generates 100-300 pps in production.

    There are no losses until delay is added to one of the WANem interfaces.
    Is there an explanation for this packets-per-second limitation on these Dell laptops when using WANem? Can anything be done about it?

     
  • M K Nambiar
    2010-03-17

    Hi,

    I finally ran a similar test in my lab.

    I am using the UDP mode of the iperf tool to generate traffic for my tests. iperf is available at https://sourceforge.net/projects/iperf/

    Here is my test configuration:

    |server 1 |-----------------| eth0 WANem eth1 |-----------------| server 2|

    LAN speed - 1 Gbps
    All subnet masks - 255.255.255.0
    server 1 - 192.168.140.61
    server 2 - 192.168.140.60
    WANem eth0 - 192.168.140.15
    WANem eth1 - 192.168.140.115

    Commands executed before the test:

    server 1

    route add -host 192.168.140.60 netmask 0.0.0.0 gw 192.168.140.15
    iperf -s -u

    server 2

    route add -host 192.168.140.61 netmask 0.0.0.0 gw 192.168.140.115
    iperf -s -u

    WANem console

    WANemControl@PERC>assign 192.168.140.61 eth0
    WANemControl@PERC>assign 192.168.140.60 eth1

    WANem GUI

    eth0: BW-44.736 Mbps (T3/DS3), Delay - 50 ms
    eth1: BW-44.736 Mbps (T3/DS3), Delay - 50 ms

    The tests: the following commands were executed, one on each server, at (almost) the same time

    server 1: iperf -u -c 192.168.140.60 -t 120 -b 40M # UDP stream from server 1 to server 2
    server 2: iperf -u -c 192.168.140.61 -t 120 -b 40M # UDP stream from server 2 to server 1

    I am limiting the network traffic to 40 Mbps so that there is no packet loss due to bandwidth limitation.

    Command output results

    server 1

    iperf -u -c 192.168.140.60 -t 120 -b 40M

    Client connecting to 192.168.140.60, UDP port 5001
    Sending 1470 byte datagrams
    UDP buffer size:   256 KByte (default)


    local 192.168.140.61 port 25261 connected with 192.168.140.60 port 5001
    Interval       Transfer     Bandwidth
      0.0-120.0 sec    563 MBytes  39.4 Mbits/sec
    Sent 401927 datagrams
    Server Report:
    Interval       Transfer     Bandwidth       Jitter   Lost/Total Datagrams
      0.0-120.0 sec    563 MBytes  39.4 Mbits/sec  0.589 ms    0/401927 (0%)

    server 2

    iperf -u -c 192.168.140.61 -t 120 -b 40M

    Client connecting to 192.168.140.61, UDP port 5001
    Sending 1470 byte datagrams
    UDP buffer size:   256 KByte (default)


    local 192.168.140.60 port 11501 connected with 192.168.140.61 port 5001
    Interval       Transfer     Bandwidth
      0.0-120.0 sec    572 MBytes  40.0 Mbits/sec
    Sent 408165 datagrams
    Server Report:
    Interval       Transfer     Bandwidth       Jitter   Lost/Total Datagrams
      0.0-120.0 sec    572 MBytes  40.0 Mbits/sec  0.462 ms    0/408165 (0%)

    I ran this test several times with similar results. I also tried combinations of lower bandwidths, with and without delays, and saw no packet loss. The rate of generated traffic was always less than the bandwidths configured in WANem.
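
    As a rough check on the packet rate, the datagram counts in the two reports above can be converted to packets per second and back to throughput. The following is a minimal Python sketch with hypothetical helper names; it assumes every datagram is 1470 bytes, as iperf reports, and that Mbits/sec means 10^6 bit/s.

```python
def packets_per_second(datagrams: int, seconds: float) -> float:
    """Average packet rate over the test interval."""
    return datagrams / seconds


def throughput_mbps(datagrams: int, datagram_bytes: int, seconds: float) -> float:
    """Average payload throughput in Mbit/s (10^6 bit/s)."""
    return datagrams * datagram_bytes * 8 / seconds / 1e6


# Datagram counts taken from the two iperf client reports above.
for name, datagrams in (("server 1", 401927), ("server 2", 408165)):
    pps = packets_per_second(datagrams, 120.0)
    mbps = throughput_mbps(datagrams, 1470, 120.0)
    print(f"{name}: {pps:.0f} pps, {mbps:.1f} Mbit/s")
```

    This gives about 3349 and 3401 packets per second per direction, and reproduces the 39.4 and 40.0 Mbits/sec that iperf printed.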

    Conclusion: WANem is passing more than 3000 PPS bidirectionally (1470 bytes each) without unwanted packet loss. Since you need only up to 300 PPS, WANem should work well for you, provided ... (read on)

    If you are still seeing packet loss, I can think of the following possibilities:
    1> You could be losing packets on your LAN (outside of WANem)
    2> Your WANem might be running on poor hardware, because of which WANem is losing packets
    3> Packets might be getting discarded at the sending host or the receiving host

    Let me describe each point in detail
    1> Packet loss on the LAN, not WANem
    The best way to check whether WANem is losing packets is to compare tcpdump logs on eth0 and eth1. What comes in on eth0 should go out on eth1 and vice versa. You might want to use tcpstat or tcptrace to ease analysis. Alternatively, you can copy the logs and analyze them with Wireshark.
    Relevant URLs
    tcpstat: http://www.frenchfries.net/paul/tcpstat/
    tcptrace: http://www.tcptrace.org/
    tcpdump: http://www.tcpdump.org (already on the WANem CD)
    Wireshark: http://www.wireshark.org
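
    The eth0/eth1 comparison could be scripted along the following lines for the ping tests in this thread. This is only a Python sketch: the function names are hypothetical, the sample lines are made up, and it assumes tcpdump's usual text output for ICMP echo packets ("ICMP echo request, id N, seq N"), with each capture saved as text.

```python
import re

# Matches the ICMP part of a tcpdump text line, for example:
# "... ICMP echo request, id 4242, seq 17, length 64"
ICMP_ECHO = re.compile(r"ICMP echo (request|reply), id (\d+), seq (\d+)")


def icmp_keys(tcpdump_lines):
    """Collect (kind, id, seq) for every ICMP echo line in tcpdump text output."""
    keys = set()
    for line in tcpdump_lines:
        match = ICMP_ECHO.search(line)
        if match:
            kind, ident, seq = match.groups()
            keys.add((kind, int(ident), int(seq)))
    return keys


def seen_only_on(first_lines, second_lines, kind):
    """Echo packets of one kind captured on the first interface but not the second."""
    first = {key for key in icmp_keys(first_lines) if key[0] == kind}
    return sorted(first - icmp_keys(second_lines))


# Made-up sample captures; addresses and timestamps are placeholders.
eth0_log = [
    "10:00:00.000001 IP 192.168.69.10 > 192.168.70.20: ICMP echo request, id 4242, seq 1, length 64",
    "10:00:00.025001 IP 192.168.69.10 > 192.168.70.20: ICMP echo request, id 4242, seq 2, length 64",
    "10:00:00.100950 IP 192.168.70.20 > 192.168.69.10: ICMP echo reply, id 4242, seq 1, length 64",
]
eth1_log = [
    "10:00:00.050900 IP 192.168.69.10 > 192.168.70.20: ICMP echo request, id 4242, seq 1, length 64",
    "10:00:00.051000 IP 192.168.70.20 > 192.168.69.10: ICMP echo reply, id 4242, seq 1, length 64",
]

print("requests lost eth0 -> eth1:", seen_only_on(eth0_log, eth1_log, "request"))
print("replies lost eth1 -> eth0:", seen_only_on(eth1_log, eth0_log, "reply"))
```

    Two caveats apply to a check like this. Packets still held in the delay queue when the captures stop will look lost, so the last fraction of a second should be ignored. ICMP sequence numbers are 16 bits and wrap, so long captures need to be split or keyed on something more.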

    2> WANem may be running on poor hardware
    It is difficult for me to suggest hardware that would be a "just fit" for your test. However, here is what I used for the tests above.
    WANem:
    4 core,  Intel(R) Xeon(TM) CPU 3.20GHz, 2 MB
    RAM : 2 GB
    eth0 & eth1:
    driver: tg3
    version: 3.86
    firmware-version: 5704-v3.27b
    Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)

    3> End host issue: hardware, OS or application
    This is just a general stab at what might be contributing to the packet losses you see. In an unrelated test I had 2 processes, a sender and a receiver, running on the same host and communicating over UDP. I was horrified to see packet losses even when the network was not involved. What it means is this: if the receiver program cannot keep up with the rate of incoming packets, packets will be lost regardless. Am I saying that the ping server in your test is slow? Maybe, or it could be just the hardware and operating system on the end hosts.
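
    The same-host effect can be explored with a small loopback experiment. The Python sketch below is not the unrelated test mentioned above, and its names and default sizes are hypothetical. It sends a burst of datagrams to a socket on 127.0.0.1 that is deliberately not read until the burst is over, then counts how many were still waiting in the receive buffer.

```python
import socket


def loopback_burst(count, payload_bytes=397, rcvbuf_bytes=0):
    """Send `count` datagrams over loopback before reading any of them.

    Returns (sent, received). A non-zero rcvbuf_bytes requests that
    receive buffer size; the OS may adjust or cap the value.
    """
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        if rcvbuf_bytes:
            rx.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
        rx.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        rx.settimeout(0.2)
        addr = rx.getsockname()
        for seq in range(count):
            data = (seq % 2**32).to_bytes(4, "big").ljust(payload_bytes, b"\x00")
            tx.sendto(data, addr)
        received = 0
        while True:
            try:
                rx.recvfrom(65535)
            except socket.timeout:
                break               # nothing left in the receive buffer
            received += 1
        return count, received
    finally:
        rx.close()
        tx.close()
```

    Comparing a small burst, such as `loopback_burst(20)`, with a much larger one shows whether the receive buffer on a given host overflows when the reader falls behind. The outcome depends on the operating system and its buffer settings, so no particular numbers are implied here.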

    To give you an idea, this is what was used for my test:

    server 1:
    8 core, Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz, 6 MB cache
    RAM: 8 GB
    OS: 2.6.24.7-65.el5rt.centos #1 SMP PREEMPT RT Mon Jul 21 06:03:50 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
    NIC details:
    driver: e1000
    version: 7.3.20-k2-NAPI
    firmware-version: 2.1-12
    Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)

    server 2:
    16 core, Intel(R) Xeon(R) CPU           X5560  @ 2.80GHz, 8 MB cache
    RAM: 8 GB
    OS:  2.6.18-164.6.1.el5 #1 SMP Tue Nov 3 16:12:36 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
    NIC details:
    driver: igb
    version: 1.3.16-k2
    firmware-version: 1.2-3
    Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)

    Regards,
    M.K.Nambiar

     
  • RT
    2010-03-18

    "If you are still seeing packet loss I can think of following possibilities
    1> Could be that you are losing packets on your LAN (outside of WANem)
    2> Your WANem might be running on poor Hardware cos of which WANem is losing packets
    3> Packets must be getting discarded at the sending host or the receiving host
    "

    1.  There is no packet loss on the LAN, because I run loss free through the WANem machine, using all the same connectivity, as long as WANem is not "running", that is, not adding any delays, limits or jitter.

    3.  There is no discarding by the source or destination machine, again because I run loss free through the WANem machine, using all the same connectivity, as long as WANem is not "running". The nature of the traffic generation is not delay dependent, so the addition of delay will not by itself cause the source or destination to discard packets.

    2.  Processing-limited hardware: sure, this is a possibility, but I don't have another machine to test this theory. Both of my Dell Latitude laptops show the same behavior. The stronger machine has an Intel Core 2 Duo CPU T7500 @ 2.20 GHz, a built-in Broadcom NetXtreme 57xx gigabit Ethernet controller, and one PCMCIA Dynex 10/100BT Ethernet adapter card.

    I have rerun my test today and the results remain the same. Once the packet rate approaches 40 pps, I start to experience a low packet loss rate. Packet size and WANem parameters have little to no effect. It seems to be only a PPS issue, and ONLY when WANem is actually applying delay. The amount of delay, the packet limit, and the various bandwidth limits do not seem to make any difference. The delay is verified as functional, as the ping times change when the delay is changed.

    NOTE: in your test bed you placed all servers and WANem in the same IP subnet. My IP source and destination devices do not allow routing via a gateway to an IP address on the local network. As such, I have to configure it as 2 networks.

    All masks are 255.255.255.0
    Server 1:   192.168.69.10
    WANem eth0: 192.168.69.99
    Server 2:   192.168.70.20
    WANem eth1: 192.168.70.99
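
    The addressing can be sanity-checked with Python's standard `ipaddress` module. The sketch below uses a hypothetical helper name and takes Server 1's address as it appears in the assign statements in this post. It confirms that each server shares a /24 with the WANem interface facing it, and that the two servers sit on different networks.

```python
import ipaddress

MASK = "255.255.255.0"


def network_of(address: str, netmask: str = MASK) -> ipaddress.IPv4Network:
    """Network that an address belongs to under the given netmask."""
    return ipaddress.ip_network(f"{address}/{netmask}", strict=False)


hosts = {
    "server 1": "192.168.69.10",
    "WANem eth0": "192.168.69.99",
    "server 2": "192.168.70.20",
    "WANem eth1": "192.168.70.99",
}
nets = {name: network_of(addr) for name, addr in hosts.items()}

for name, net in nets.items():
    print(f"{name:<11} {hosts[name]:<15} {net}")

print("server 1 and eth0 share a subnet:", nets["server 1"] == nets["WANem eth0"])
print("server 2 and eth1 share a subnet:", nets["server 2"] == nets["WANem eth1"])
print("servers are on different subnets:", nets["server 1"] != nets["server 2"])
```

    With these addresses all three checks come out true. Each server would then presumably use its local WANem address as the gateway towards the other subnet.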

    Could this "2-network" arrangement make any difference for WANem performance?

    Adding the following "assign" statements does not make any difference to the result.

    assign 192.168.69.10 eth0
    assign 192.168.70.20 eth1

     
  • M K Nambiar
    2010-04-22

    Hi,

    The 2-network arrangement should not be a problem.

    In the meantime I have tried something different.

    I used the tbf queue to emulate bandwidth (instead of the regular htb queue).

    I ran these commands on the WANem console after an exit2shell, after the 2 assign commands.

    tc qdisc add dev eth0 root handle 1:0  tbf rate 1000mbit buffer 999999999 limit 999999999
    tc qdisc add dev eth0 parent 1:1 handle 10: netem delay 50ms limit 999999999
    tc qdisc add dev eth1 root handle 1:0  tbf rate 1000mbit buffer 999999999 limit 999999999
    tc qdisc add dev eth1 parent 1:1 handle 10: netem delay 50ms limit 999999999

    These commands get WANem to emulate 1 Gbps with a delay of 50 ms on both NICs.

    I tested with iperf again -

    on server: iperf -s -u
    on client: iperf -u -c 192.168.140.60 -t 120 -b 1000M

    Pasted below is the iperf output

    /root>iperf -u -c 192.168.140.60 -t 120 -b 1000M

    Client connecting to 192.168.140.60, UDP port 5001
    Sending 1470 byte datagrams
    UDP buffer size:   256 KByte (default)


    local 192.168.140.61 port 52924 connected with 192.168.140.60 port 5001
    Interval       Transfer     Bandwidth
      0.0-120.0 sec  13.4 GBytes    957 Mbits/sec
    Sent 9766812 datagrams
    Server Report:
    Interval       Transfer     Bandwidth       Jitter   Lost/Total Datagrams
      0.0-120.0 sec  13.4 GBytes    957 Mbits/sec  0.060 ms    0/9766811 (0%)
      0.0-120.0 sec  1 datagrams received out-of-order

    So no packet loss.
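
    As background to the large limit values in the tc commands above: a delay queue has to hold every packet that arrives during the delay, so the queue depth needed is roughly packet rate times delay. The Python sketch below uses a hypothetical helper name and assumes that the netem limit is counted in packets, which is worth checking against the tc-netem man page for the version in use.

```python
import math


def packets_in_delay_queue(packets_per_second: float, delay_ms: float) -> int:
    """Approximate number of packets held while each one waits out the delay."""
    return math.ceil(packets_per_second * delay_ms / 1000.0)


# 1 Gbps test above: 9766812 datagrams in 120 seconds, 50 ms delay.
print(packets_in_delay_queue(9766812 / 120, 50))

# Production load mentioned earlier in the thread: up to 300 pps, 50 ms delay.
print(packets_in_delay_queue(300, 50))
```

    The first figure is about 4070 packets and the second is 15.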

    You might want to try this and see if it works for you. Before trying this, you should reset the settings from the GUI.

    Regards,
    M.K.Nambiar