Read Me
Copyright (c) 2013-2016, Nokia Solutions and Networks
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
===============================================================================
Table of Contents
===============================================================================
1. Open Event Machine
2. Examples / Test Cases
A) Basic standalone examples
B) Performance test examples
C) Packet-I/O examples
D) Add-on API examples (event-timer using the EM add-on APIs)
3. Changes
4. Open Issues
5. Notes
===============================================================================
1. Open Event Machine (OpenEM, EM or em-dpdk)
===============================================================================
License:
- OpenEM - See the event_machine/LICENSE file.
- DPDK - See the DPDK package.
Note: This release has support for the following environment:
em-dpdk - optimized for Intel x86_64 + Linux + DPDK + Intel NICs
See event_machine/intel/README.dpdk
Note: Read the Open Event Machine API, especially the event_machine.h file and
the files it includes, for a more thorough description of the
functionality. Documentation slides are also available.
The OpenEM version numbering scheme reflects the supported OpenEM API version:
a.b.c(-d)
a.b = API version supported
.c = implementation number supporting API version a.b
-d = fix number, added if needed
For example, version 1.2.2 stands for:
a.b = 1.2: supports OpenEM API v1.2
.c = .2: third implementation supporting API 1.2 (.0 is the first)
no -d = N/A: no fixes yet
Open Event Machine is a lightweight, dynamically load balancing,
run-to-completion, event processing runtime targeting multicore SoC data plane
environments.
This release of the Open Event Machine contains the OpenEM API as well as an
example all-SW implementation.
The Event Machine is a multicore optimized, high performance, data plane
processing concept based on asynchronous queues and event scheduling.
Applications are built from Execution Objects (EO), events and event queues.
The EO is a run-to-completion object that gets called at each event receive.
Communication is built around events and event queues. The EM scheduler selects
an event from a queue based on priority, and calls the EO's receive function
for event processing. The EO might send events to other EOs or HW-accelerators
for further processing or output. The next event for the core is selected only
after the EO has returned. The selected queue type, queue priority and queue
group affect the scheduling. Dynamic load balancing of work between the cores
is the default behaviour. Note that the application (or EO) is not tied to a
particular core; instead, event processing for the EO is performed on whichever
core is available at that time.
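In code, the model above maps onto a handful of API calls. The sketch below is
schematic only (argument lists abbreviated, my_start/my_stop are assumed EO
start/stop callbacks); the exact signatures are in event_machine.h:

```c
/* Schematic sketch of the EM programming model -- verify the exact
 * signatures against event_machine.h of this release. */

/* The dispatcher calls this on some available core each time an event
 * is scheduled from one of the EO's queues. Run-to-completion: the
 * next event is selected only after this function returns. */
static void my_receive(void *eo_ctx, em_event_t event,
                       em_event_type_t type, em_queue_t queue,
                       void *q_ctx)
{
        /* ...process the event, then forward it to another queue
         * (em_send()) or free it... */
        em_free(event);
}

static void my_setup(void)
{
        /* EO with start/stop callbacks and the receive function */
        em_eo_t eo = em_eo_create("my-eo", my_start, NULL,
                                  my_stop, NULL, my_receive, NULL);
        /* Atomic queue: its events are never processed on two
         * cores simultaneously */
        em_queue_t q = em_queue_create("my-q", EM_QUEUE_TYPE_ATOMIC,
                                       EM_QUEUE_PRIO_NORMAL,
                                       EM_QUEUE_GROUP_DEFAULT);

        em_eo_add_queue(eo, q);
        em_eo_start(eo, NULL, 0, NULL);
}
```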
===============================================================================
2. Examples / Test Cases
===============================================================================
The package contains a set of examples / test cases.
For compilation see:
em-dpdk/README.dpdk
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A) Basic standalone Examples - do not require any external input or I/O,
just compile and run to get output/results.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Sources in directory: em-dpdk/programs/example/
(and em-dpdk/programs/common/ for main() and common setup)
> cd em-dpdk/programs/example/
> make real_clean && make em_clean && make clean && make
-----------------------------------------------------------------------------
A1) Hello World: hello
em-dpdk/programs/example/hello/hello.c
-----------------------------------------------------------------------------
Simple "Hello World" example the Event Machine way.
Creates two Execution Objects (EOs), each with a dedicated queue for incoming
events, and ping-pongs an event between the EOs while printing "Hello world"
each time the event is received. Note the changing core number in the
printout below (the event is handled by any core that is ready to take on new
work):
Run hello on 8 cores
> sudo ./build/hello -c 0xff -n 4 -- -p
or
> sudo ./build/hello -c 0xff -n 4 -- -t
...
Hello world started EO A. I'm EO 0. My queue is 320.
Hello world started EO B. I'm EO 1. My queue is 321.
Entering event dispatch loop() on EM-core 0
Hello world from EO A! My queue is 320. I'm on core 02. Event seq is 0.
Hello world from EO B! My queue is 321. I'm on core 04. Event seq is 1.
Hello world from EO A! My queue is 320. I'm on core 02. Event seq is 2.
Hello world from EO B! My queue is 321. I'm on core 05. Event seq is 3.
...
-----------------------------------------------------------------------------
A2) Fractal calculation and drawing: fractal
em-dpdk/programs/example/fractal/fractal.c
-----------------------------------------------------------------------------
Event Machine fractal (Mandelbrot set) drawing application.
Generates Mandelbrot fractal images, zooming deeper into the fractal with
each image. Prints the frames-per-second and the frame range. Modify the
defined values for a different resolution, zoom point, starting frame,
ending frame and precision of the fractal calculation.
Creates three Execution Objects to form a pipeline:
--> ](Pixel_handler) --> ](Worker) --> ](Imager) -==> IMAGE
\_____________________________________________/
Note: Creates a ramdisk at /tmp/ramdisk to store fractal.ppm.
Runs the 'feh' application as a background process to display and
refresh the image; install 'feh' if missing or use another viewer.
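The per-pixel work is the standard Mandelbrot escape-time iteration. A
self-contained sketch of the inner loop (illustrative only; the names and
limits are not taken from fractal.c):

```c
/* Escape-time iteration for one pixel: iterate z = z^2 + c and return
 * the iteration count at which |z| exceeds 2, or max_iter if the point
 * stays bounded (i.e. lies in the Mandelbrot set). max_iter corresponds
 * to the "precision of the fractal calculation" mentioned above. */
static int mandelbrot_iters(double cr, double ci, int max_iter)
{
	double zr = 0.0, zi = 0.0;
	int i;

	for (i = 0; i < max_iter && zr * zr + zi * zi <= 4.0; i++) {
		double tmp = zr * zr - zi * zi + cr; /* Re(z^2 + c) */

		zi = 2.0 * zr * zi + ci;             /* Im(z^2 + c) */
		zr = tmp;
	}
	return i;
}
```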
Run fractal on 4 cores
> sudo ./build/fractal -c 0xf -n 4 -- -t
...
Started Pixel handler. I'm EO 6. My queue is 352.
Started Worker. I'm EO 7. My queue is 353.
Started Imager. I'm EO 8. My queue is 355.
Entering the event dispatch loop() on EM-core 0
Entering the event dispatch loop() on EM-core 1
Entering the event dispatch loop() on EM-core 2
Entering the event dispatch loop() on EM-core 3
Frames per second: 01 | frames 0 - 0
Frames per second: 13 | frames 1 - 13
Frames per second: 08 | frames 14 - 21
Frames per second: 08 | frames 22 - 29
Frames per second: 06 | frames 30 - 35
...
-----------------------------------------------------------------------------
A3) Error Handling: error
em-dpdk/programs/example/error/error.c
-----------------------------------------------------------------------------
Demonstrates and tests the Event Machine error handling functionality.
Three application EOs are created, each with a dedicated queue. An
application-specific global error handler is registered (thus replacing the
EM default). Additionally, EO A registers an EO-specific error handler.
When the EOs receive events (error_receive) they generate errors by
explicit calls to em_error() and by calling EM API functions with invalid
arguments. The registered error handlers simply print the error information
on screen.
Note that execution continues after the fatal error since these are only
test errors.
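Registering handlers follows the pattern sketched below. This is schematic
only; check event_machine.h of this release for the exact handler signature
and registration calls:

```c
/* Schematic sketch -- verify against the EM API headers of this
 * release before use. */
static em_status_t appl_error_handler(em_eo_t eo, em_status_t error,
                                      em_escope_t escope, va_list args)
{
        /* Print/log the error info, then return the error code to let
         * processing continue (as this test does after "fatal" errors). */
        printf("Error: EO:%d error:0x%X escope:0x%X\n",
               (int)eo, (unsigned)error, (unsigned)escope);
        return error;
}

/* Replace the EM default global error handler: */
em_register_error_handler(appl_error_handler);
/* ...or register an EO-specific handler, as EO A does: */
em_eo_register_error_handler(eo_a, appl_error_handler);
```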
> sudo ./build/error -c 0xf -n 4 -- -p
or
> sudo ./build/error -c 0xf -n 4 -- -t
...
Error log from EO C [0] on core 1!
THIS IS A FATAL ERROR!!
Appl Global error handler : EO 2 error 0x8000DEAD escope 0x0
Return from fatal.
Error log from EO A [0] on core 2!
Appl EO specific error handler: EO 0 error 0x00001111 escope 0x1
Appl EO specific error handler: EO 0 error 0x00002222 escope 0x2 ARGS: Second error
Appl EO specific error handler: EO 0 error 0x00003333 escope 0x3 ARGS: Third error 320
Appl EO specific error handler: EO 0 error 0x00004444 escope 0x4 ARGS: Fourth error 320 0
Appl EO specific error handler: EO 0 error 0x0000000A escope 0xFF000402 - EM info:
EM ERROR:0x0000000A ESCOPE:0xFF000402 EO:1-"EO B" core:03 ecount:81(14) em_free(L:2149) em.c event ptr NULL!
...
-----------------------------------------------------------------------------
A4.1) Event Group: event_group
em-dpdk/programs/example/event_group/event_group.c
-----------------------------------------------------------------------------
Tests and measures the event group feature for fork-join type of operations
using events. See the event_machine_group.h file for event group API calls.
An EO allocates and sends a number of data events to itself (using an event
group) to trigger a notification event to be sent when the configured event
count has been received. The cycles consumed until the notification is
received is measured and printed.
Note: To keep things simple this test case uses only a single queue into
which to receive all events, including the notification events. The event
group fork-join mechanism does not care about the queues used, however; it is
basically a counter of events sent using a certain event group id. In a more
complex example each data event could be sent from different EOs to
different queues, and the final notification event sent to yet another queue.
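Since the mechanism is essentially a per-group countdown, it can be modelled
in a few lines of plain C. This is a hypothetical stand-alone model for
illustration only, not EM code (the real API is in event_machine_group.h):

```c
/* Hypothetical stand-alone model of the event group fork-join
 * countdown -- NOT EM code, for illustration only. */
typedef struct {
	int remaining; /* data events still outstanding */
	int notified;  /* set once the notification would be sent */
} event_group_model;

/* Models applying the group: arm it with the event count. */
static void group_apply(event_group_model *eg, int count)
{
	eg->remaining = count;
	eg->notified = 0;
}

/* Each received data event belonging to the group decrements the
 * count; reaching zero triggers the notification event(s). */
static void group_event_received(event_group_model *eg)
{
	if (eg->remaining > 0 && --eg->remaining == 0)
		eg->notified = 1;
}
```

With a count of 256, as in the run below, the notification fires exactly when
the 256th data event is received.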
> sudo ./build/event_group -c 0xf -n 4 -- -p
or
> sudo ./build/event_group -c 0xf -n 4 -- -t
...
--- Start event group ---
Event group notification event received after 256 data events.
Cycles curr:139024, ave:144201
--- Start event group ---
Event group notification event received after 256 data events.
Cycles curr:144075, ave:144188
--- Start event group ---
Event group notification event received after 256 data events.
Cycles curr:142648, ave:144048
...
-----------------------------------------------------------------------------
A4.2) Event Group: event_group_abort
em-dpdk/programs/example/event_group/event_group_abort.c
-----------------------------------------------------------------------------
Event Machine event group example/test using em_event_group_abort().
Aborting an ongoing event group means that events sent with that group no
longer belong to a valid event group. The same applies to excess events:
if more group events are sent than the applied count, the excess events no
longer belong to a valid event group once received.
This example creates one EO with a parallel queue and allocates a defined
number of event groups at startup that are then reused. At the start of each
round the EO allocates a defined number of data events per event group and
sends these to the parallel queue. Events loop until each group has been used
as many times as the round defines. The event count and the round stop count
are set randomly.
Counters track received valid and invalid event-group events as well as
statistics on event group API calls. Events that do not belong to a valid
group are removed from the loop and freed. Once all event groups are
aborted, or their notification events have been received, a start event is
sent that begins the next round.
During each round the event groups are deliberately misused in an attempt to
break them, i.e. to drive the test or the groups into an undefined state:
groups are incremented, assigned and ended at random.
E.g. start on 12 cores:
> sudo ./build/event_group_abort -n 4 -c 0xfcfc -- -t
...
--- Round 1
Created 30 event group(s) with count of 1384
Abort group when received 82887 events
Group events received: Valid: 43893, Expired: 3812
Event group increments: Valid: 22090, Failed: 1
Event group assigns: Valid: 19730, Failed: 1783
Aborted 0 event groups
Failed to abort 0 times
Received 30 notification events
Freed 0 notification events
-----------------------------------------
--- Round 2
Created 30 event group(s) with count of 19487
Abort group when received 43952 events
Group events received: Valid: 591137, Expired: 3832
Event group increments: Valid: 295407, Failed: 1
Event group assigns: Valid: 288890, Failed: 4510
Aborted 0 event groups
Failed to abort 0 times
Received 30 notification events
Freed 0 notification events
-----------------------------------------
...
-----------------------------------------------------------------------------
A4.3) Event Group: event_group_assign_end
em-dpdk/programs/example/event_group/event_group_assign_end.c
-----------------------------------------------------------------------------
Event Machine event group example using em_event_group_assign() and
em_event_group_processing_end().
Test and measure the event group feature for fork-join type of operations
using events. See the event_machine_group.h file for event group API calls.
Allocates and sends a number of data events to itself (using two event
groups) to trigger notification events to be sent when the configured event
count has been received. The cycles consumed until the notification is
received is measured and printed.
Three event groups are used: two track completion of a certain number of
data events, and a third, chained event group tracks completion of the
other two. The test is restarted once the final, third notification is
received. One of the two event groups used to track completion of data
events is a "normal" event group, while the other is assigned when a data
event is received, instead of the event being sent with an event group in
the normal way.
E.g. start on 2 cores:
> sudo ./build/event_group_assign_end -c 0x3 -n 4 -- -p
...
--- Start event group ---
--- Start assigned event group ---
"Normal" event group notification event received after 2048 data events.
Cycles curr:1811245, ave:1827395
Assigned event group notification event received after 2048 data events.
Cycles curr:1801551, ave:1821180
--- Chained event group done ---
--- Start assigned event group ---
--- Start event group ---
Assigned event group notification event received after 2048 data events.
Cycles curr:1814983, ave:1820294
"Normal" event group notification event received after 2048 data events.
Cycles curr:1799081, ave:1823350
--- Chained event group done ---
...
-----------------------------------------------------------------------------
A4.4) Event Group: event_group_chaining
em-dpdk/programs/example/event_group/event_group_chaining.c
-----------------------------------------------------------------------------
Event Machine event group chaining example.
Test and measure the event group chaining feature for fork-join type of
operations using events. See the event_machine_group.h file for the
event group API calls.
Allocates and sends a number of data events to itself using an event group.
When the configured event count has been reached, notification events are
triggered. The cycles consumed until each notification is received is
measured and printed. The notification events are sent using a second event
group which will trigger a final done event after all the notifications have
been processed.
Note: To keep things simple this test case uses only a single queue to
receive all events, including the notification events. The event group
fork-join mechanism does not care about the queues used, however; it is
basically a counter of events sent using a certain event group id. In a more
complex example each data event could be sent from different EOs to
different queues, and the final notification event sent to yet another queue.
E.g. start on 7 cores:
> sudo ./build/event_group_chaining -c 0xfe -n 4 -- -p
...
--- Start event group ---
Event group notification event received after 256 data events.
Cycles curr:159870, ave:186599
Event group notification event received after 256 data events.
Cycles curr:160560, ave:184739
Event group notification event received after 256 data events.
Cycles curr:161112, ave:183164
Event group notification event received after 256 data events.
Cycles curr:232228, ave:186230
--- Chained event group done ---
--- Start event group ---
Event group notification event received after 256 data events.
Cycles curr:175525, ave:185601
Event group notification event received after 256 data events.
Cycles curr:176777, ave:185110
Event group notification event received after 256 data events.
Cycles curr:178012, ave:184737
Event group notification event received after 256 data events.
Cycles curr:178420, ave:184421
--- Chained event group done ---
...
-----------------------------------------------------------------------------
A5) Queue Group: queue_group
em-dpdk/programs/example/queue_group.c
-----------------------------------------------------------------------------
Event Machine queue group feature test.
Creates an EO with two queues: a notification queue and a data event queue.
The notif queue belongs to the default queue group and can be processed on
any core while the data queue belongs to a newly created queue group called
"test_group". The EO-receive function receives a number of data events and
then modifies the test queue group (i.e. changes the cores allowed to process
events from the data event queue). The test is restarted when the queue group
has been modified enough times to include each core at least once.
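The modification step revolves around an em_core_mask_t. A schematic sketch
(arguments abbreviated, notification events omitted; see event_machine.h of
this release for the exact em_queue_group_modify() signature):

```c
/* Schematic sketch: allow only cores 2, 4 and 5 to process events
 * from queues in the test queue group, i.e. coremask 0x34 as in the
 * first modification of the sample output below. Arguments
 * abbreviated -- verify against event_machine.h. */
em_core_mask_t mask;

em_core_mask_zero(&mask);
em_core_mask_set(2, &mask);
em_core_mask_set(4, &mask);
em_core_mask_set(5, &mask);
em_queue_group_modify(test_group, &mask, /* + notif args */ ...);
```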
E.g. start on 8 cores:
> sudo ./build/queue_group -c 0xf0f0 -n 4 -- -p
APPL: notif_start_done(line:518) - EM-core00:
Created test queue:266 type:PARALLEL(2) queue group:1 (name:"QGrp004")
APPL: notif_event_group_data_done(line:676) - EM-core02:
****************************************
Received 62464 events on Q:266:
QueueGroup:1, Curr Coremask:0x33
Now Modifying:
QueueGroup:1, New Coremask:0x34
****************************************
APPL: notif_event_group_data_done(line:676) - EM-core02:
****************************************
Received 62464 events on Q:266:
QueueGroup:1, Curr Coremask:0x66
Now Modifying:
QueueGroup:1, New Coremask:0x67
****************************************
...
APPL: notif_event_group_data_done(line:676) - EM-core07:
****************************************
Received 62464 events on Q:266:
QueueGroup:1, Curr Coremask:0xff
Now Modifying:
QueueGroup:1, New Coremask:0x0
****************************************
APPL: notif_queue_group_modify_done(line:552) - EM-core05:
*************************************
All cores removed from QueueGroup!
*************************************
APPL: notif_queue_group_modify_done(line:573) - EM-core05:
Deleting test queue:266, Qgrp ID:1 (name:"QGrp004")
APPL: receive_event_notif(line:397) - EM-core04:
***********************************************
!!! Restarting test !!!
***********************************************
...
-----------------------------------------------------------------------------
A6.1) Queue: queue_types_ag
em-dpdk/programs/example/queue/queue_types_ag.c
-----------------------------------------------------------------------------
Event Machine Queue Types test example with included atomic groups(=_ag).
The test creates several EO-pairs and sends events between the queues in each
pair. Each EO has an input queue (of type atomic(A), parallel(P) or
parallel-ordered(PO)) or, in the case of atomic groups(AG), three(3) atomic
input queues that belong to the same atomic group but have different
priorities. The events sent between the queues of an EO-pair are counted and
statistics for each pair type are printed. If the queues in an EO-pair
preserve ordering, this is also verified.
E.g. start on 6 cores:
> sudo ./build/queue_types_ag -c 0x0e0e -n 4 -- -p
The test prints event count statistics from each core:
Stat Core-05: Count/PairType \
A-A:1470184 P-P:1472316 PO-PO:1480740 P-A:1444621 PO-A:1333477 PO-P:1212570 \
AG-AG:3296741 AG-A:1591504 AG-P:1515778 AG-PO:1959302 (cycles/event:457.25)
, where A=atomic queue, P=parallel queue, PO=parallel-ordered queue
and AG=atomic group (consisting of 3 atomic queues each)
NOTE: The number of events in the AG-xx cases may be higher due to the higher
number of queues and events in an atomic group vs a single queue.
-----------------------------------------------------------------------------
A6.2) Queue: ordered
em-dpdk/programs/example/queue/ordered.c
-----------------------------------------------------------------------------
Event Machine Parallel-Ordered queue test
E.g. start on 8 cores:
> sudo ./build/ordered -n 4 -c 0x3c3c -- -t
...
EO 249 starting.
Entering the event dispatch loop() on EM-core 0
...
Entering the event dispatch loop() on EM-core 5
Entering the event dispatch loop() on EM-core 1
cycles per event 742.56 @2693.58 MHz (core-01 1)
cycles per event 743.75 @2693.58 MHz (core-00 1)
cycles per event 743.92 @2693.58 MHz (core-05 1)
cycles per event 746.63 @2693.58 MHz (core-03 1)
cycles per event 747.42 @2693.58 MHz (core-04 1)
cycles per event 748.22 @2693.58 MHz (core-02 1)
cycles per event 748.31 @2693.58 MHz (core-07 1)
cycles per event 749.51 @2693.58 MHz (core-06 1)
cycles per event 743.92 @2693.58 MHz (core-01 2)
cycles per event 744.81 @2693.58 MHz (core-00 2)
cycles per event 746.15 @2693.58 MHz (core-05 2)
cycles per event 745.51 @2693.58 MHz (core-03 2)
cycles per event 745.37 @2693.58 MHz (core-04 2)
cycles per event 745.14 @2693.58 MHz (core-07 2)
cycles per event 749.82 @2693.58 MHz (core-02 2)
cycles per event 749.28 @2693.58 MHz (core-06 2)
...
-----------------------------------------------------------------------------
A8) dispatcher_callback -
em-dpdk/programs/example/dispatcher/dispatcher_callback.c
-----------------------------------------------------------------------------
Event Machine dispatcher user-callback-function test example.
Based on the hello world example. Adds dispatcher enter and exit callback
functions which are called right before and after the EO receive function.
E.g. start on 2 cores:
> sudo ./build/dispatcher_callback -c 0x3 -n 4 -- -t
...
Test start EO A: EO 3, queue:258.
Test start EO B: EO 5, queue:259.
Entering the event dispatch loop() on EM-core 0
++ Dispatcher enter callback 1 for EO: 3 (EO A) Queue: 258 on core 00. Event seq: 0.
++ Dispatcher enter callback 2 for EO: 3 (EO A) Queue: 258 on core 00. Event seq: 0.
Ping from EO A! Queue: 258 on core 00. Event seq: 0.
Entering the event dispatch loop() on EM-core 1
++ Dispatcher enter callback 1 for EO: 5 (EO B) Queue: 259 on core 01. Event seq: 1.
++ Dispatcher enter callback 2 for EO: 5 (EO B) Queue: 259 on core 01. Event seq: 1.
Ping from EO B! Queue: 259 on core 01. Event seq: 1.
-- Dispatcher exit callback 1 for EO: 3
-- Dispatcher exit callback 2 for EO: 3
-- Dispatcher exit callback 1 for EO: 5
-- Dispatcher exit callback 2 for EO: 5
++ Dispatcher enter callback 1 for EO: 3 (EO A) Queue: 258 on core 00. Event seq: 2.
++ Dispatcher enter callback 2 for EO: 3 (EO A) Queue: 258 on core 00. Event seq: 2.
Ping from EO A! Queue: 258 on core 00. Event seq: 2.
-- Dispatcher exit callback 1 for EO: 3
-- Dispatcher exit callback 2 for EO: 3
++ Dispatcher enter callback 1 for EO: 5 (EO B) Queue: 259 on core 01. Event seq: 3.
++ Dispatcher enter callback 2 for EO: 5 (EO B) Queue: 259 on core 01. Event seq: 3.
...
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
B) Performance test examples - do not require any external
input or I/O, just compile and run to get output/results.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> cd em-dpdk/programs/performance/
> make real_clean && make em_clean && make clean && make
-----------------------------------------------------------------------------
B1) Performance: pairs -
em-dpdk/programs/performance/pairs.c
-----------------------------------------------------------------------------
Measures the average cycles consumed during an event send-sched-receive loop
for a certain number of EO pairs in the system. The test runs a number of EO
pairs that send ping-pong events to each other. Depending on the test
dynamics (e.g. a single burst in an atomic queue) only one EO of a pair
might be active at a time.
Note that the cycles per event increase with a larger core count (try e.g. 4
vs 8 cores). Also note that the way EM-cores are mapped onto HW threads
matters: using HW threads on the same CPU core (HW multithreading) differs
from using HW threads on separate cores.
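The EM-core count is simply the number of set bits in the -c coremask. A
small stand-alone helper (illustrative only, not part of EM or DPDK) to
check a mask before launching:

```c
/* Count the EM-cores (set bits) selected by a -c coremask,
 * using the bit-clearing trick: x & (x - 1) clears the lowest
 * set bit of x. */
static int emcore_count(unsigned long long coremask)
{
	int n = 0;

	while (coremask) {
		coremask &= coremask - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}
```

E.g. 0xf and 0x0303 both select 4 EM-cores; 0xfefe selects 14.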
Sample output from runs on a CPU with 8 cores / 16 HW threads, max coremask: 0xffff
Run on 4 EM-cores:
> sudo ./build/pairs -c 0xf -n 4 -- -p (4 EM-cores (=processes) mapped to
4 HW threads on separate cores)
> sudo ./build/pairs -c 0x0303 -n 4 -- -p (4 EM-cores (=processes) mapped to
4 HW threads on 2 cores,
2 HW threads per core)
or
> sudo ./build/pairs -c 0xf -n 4 -- -t (4 EM-cores (=pthreads) mapped to
4 HW threads on separate cores)
> sudo ./build/pairs -c 0x0303 -n 4 -- -t (4 EM-cores (=pthreads) mapped to
4 HW threads on 2 cores,
2 HW threads per core)
> sudo ./build/pairs -c 0xf -n 4 -- -p
...
cycles per event 285.27 @2693.59 MHz (core-03 2)
cycles per event 285.82 @2693.59 MHz (core-01 2)
cycles per event 285.80 @2693.59 MHz (core-00 2)
cycles per event 285.84 @2693.59 MHz (core-02 2)
cycles per event 286.22 @2693.59 MHz (core-03 3)
cycles per event 286.73 @2693.59 MHz (core-01 3)
cycles per event 286.65 @2693.59 MHz (core-00 3)
cycles per event 286.61 @2693.59 MHz (core-02 3)
...
Run on 8 EM-cores:
> sudo ./build/pairs -c 0xff -n 4 -- -p (8 EM-cores (=processes) mapped to
8 HW threads on separate cores)
> sudo ./build/pairs -c 0x0f0f -n 4 -- -p (8 EM-cores (=processes) mapped to
8 HW threads on 4 cores,
2 HW threads per core)
or
> sudo ./build/pairs -c 0xff -n 4 -- -t (8 EM-cores (=pthreads) mapped to
8 HW threads on separate cores)
> sudo ./build/pairs -c 0x0f0f -n 4 -- -t (8 EM-cores (=pthreads) mapped to
8 HW threads on 4 cores,
2 HW threads per core)
or e.g.
> sudo ./build/pairs -c 0xff0 -n 4 -- -p
...
cycles per event 317.40 @2693.59 MHz (core-03 2)
cycles per event 317.55 @2693.59 MHz (core-04 2)
cycles per event 318.31 @2693.59 MHz (core-02 2)
cycles per event 318.61 @2693.59 MHz (core-05 2)
cycles per event 319.26 @2693.59 MHz (core-06 2)
cycles per event 319.32 @2693.59 MHz (core-01 2)
cycles per event 319.96 @2693.59 MHz (core-07 2)
cycles per event 321.02 @2693.59 MHz (core-00 2)
cycles per event 317.55 @2693.59 MHz (core-03 3)
cycles per event 317.62 @2693.59 MHz (core-04 3)
cycles per event 318.28 @2693.59 MHz (core-02 3)
cycles per event 318.52 @2693.59 MHz (core-05 3)
cycles per event 319.17 @2693.59 MHz (core-06 3)
cycles per event 319.14 @2693.59 MHz (core-01 3)
cycles per event 320.11 @2693.59 MHz (core-07 3)
...
-----------------------------------------------------------------------------
B2) Performance: queue_groups -
em-dpdk/programs/performance/queue_groups.c
-----------------------------------------------------------------------------
Event Machine queue group performance test.
Measures the average cycles consumed during an event send-sched-receive loop
for a certain number of queue groups in the system. The test increases the
number of groups for each measurement round and prints the results.
Plot the cycles/event to get an idea of how the system scales with an
increasing number of queue groups.
E.g. start on 14 cores:
> sudo ./build/queue_groups -c 0xfefe -n 4 -- -p
...
Creating 1 new queue group(s)
New group name: grp_0
Entering the event dispatch loop() on EM-core 0
...
New queue group(s) ready
Queues: 256, Queue groups: 1
Cycles/Event: 376.73 @2693.59 MHz(0)
Cycles/Event: 376.47 @2693.59 MHz(1)
Cycles/Event: 376.66 @2693.59 MHz(2)
Cycles/Event: 376.40 @2693.59 MHz(3)
Cycles/Event: 376.55 @2693.59 MHz(4)
Cycles/Event: 376.60 @2693.59 MHz(5)
Cycles/Event: 376.38 @2693.59 MHz(6)
Cycles/Event: 376.77 @2693.59 MHz(7)
Creating 1 new queue group(s)
New group name: grp_1
New queue group(s) ready
Queues: 256, Queue groups: 2
Cycles/Event: 381.33 @2693.59 MHz(8)
Cycles/Event: 381.28 @2693.59 MHz(9)
Cycles/Event: 381.38 @2693.59 MHz(10)
Cycles/Event: 381.18 @2693.59 MHz(11)
Cycles/Event: 381.35 @2693.59 MHz(12)
Cycles/Event: 381.29 @2693.59 MHz(13)
Cycles/Event: 381.34 @2693.59 MHz(14)
Cycles/Event: 381.33 @2693.59 MHz(15)
...
-----------------------------------------------------------------------------
B3) Performance: queues -
em-dpdk/programs/performance/queues.c
-----------------------------------------------------------------------------
Event Machine performance test.
Measures the average cycles consumed during an event send-sched-receive loop
for a certain number of queues and events in the system. The test increases
the number of queues[+events] for each measurement round and prints the
results.
Plot the cycles/event to get an idea of how the system scales with an
increasing number of queues.
E.g. start on 14 cores:
> sudo ./build/queues -c 0xfefe -n 4 -- -t
Create new queues: 8
New Qs created:8 First:288 Last:295
Number of queues: 8
Number of events: 4096
Cycles/Event: 1231 Latency Hi: 97392 Lo: 265086 @2694MHz(0)
Cycles/Event: 1233 Latency Hi: 98481 Lo: 264635 @2694MHz(1)
Cycles/Event: 1232 Latency Hi: 98118 Lo: 264811 @2694MHz(2)
Cycles/Event: 1232 Latency Hi: 96143 Lo: 267460 @2694MHz(3)
Cycles/Event: 1232 Latency Hi: 95945 Lo: 266943 @2694MHz(4)
Cycles/Event: 1234 Latency Hi: 97214 Lo: 266064 @2694MHz(5)
Cycles/Event: 1234 Latency Hi: 97296 Lo: 266196 @2694MHz(6)
Cycles/Event: 1235 Latency Hi: 98790 Lo: 265076 @2694MHz(7)
Create new queues: 8
New Qs created:8 First:7065 Last:7072
Number of queues: 16
Number of events: 4096
Cycles/Event: 684 Latency Hi: 47877 Lo: 153043 @2694MHz(8)
Cycles/Event: 684 Latency Hi: 47813 Lo: 153240 @2694MHz(9)
Cycles/Event: 684 Latency Hi: 46726 Lo: 154437 @2694MHz(10)
Cycles/Event: 685 Latency Hi: 47444 Lo: 153830 @2694MHz(11)
Cycles/Event: 685 Latency Hi: 47041 Lo: 154228 @2694MHz(12)
Cycles/Event: 684 Latency Hi: 47447 Lo: 153749 @2694MHz(13)
Cycles/Event: 684 Latency Hi: 47807 Lo: 153285 @2694MHz(14)
Cycles/Event: 684 Latency Hi: 47777 Lo: 153325 @2694MHz(15)
Create new queues: 16
New Qs created:16 First:5937 Last:5952
Number of queues: 32
Number of events: 4096
Cycles/Event: 594 Latency Hi: 22203 Lo: 152451 @2694MHz(16)
Cycles/Event: 594 Latency Hi: 21952 Lo: 152730 @2694MHz(17)
Cycles/Event: 594 Latency Hi: 22098 Lo: 152488 @2694MHz(18)
Cycles/Event: 594 Latency Hi: 21978 Lo: 152711 @2694MHz(19)
Cycles/Event: 594 Latency Hi: 22006 Lo: 152633 @2694MHz(20)
Cycles/Event: 594 Latency Hi: 21890 Lo: 152832 @2694MHz(21)
Cycles/Event: 594 Latency Hi: 21550 Lo: 153074 @2694MHz(22)
Cycles/Event: 594 Latency Hi: 21934 Lo: 152824 @2694MHz(23)
Create new queues: 32
New Qs created:32 First:5953 Last:5984
Number of queues: 64
Number of events: 4096
Cycles/Event: 545 Latency Hi: 26066 Lo: 134082 @2694MHz(24)
Cycles/Event: 545 Latency Hi: 25869 Lo: 134325 @2694MHz(25)
Cycles/Event: 545 Latency Hi: 25541 Lo: 134630 @2694MHz(26)
Cycles/Event: 545 Latency Hi: 25873 Lo: 134357 @2694MHz(27)
Cycles/Event: 545 Latency Hi: 25744 Lo: 134471 @2694MHz(28)
Cycles/Event: 545 Latency Hi: 26052 Lo: 134205 @2694MHz(29)
Cycles/Event: 545 Latency Hi: 26033 Lo: 134228 @2694MHz(30)
Cycles/Event: 545 Latency Hi: 25995 Lo: 134209 @2694MHz(31)
Create new queues: 64
New Qs created:64 First:5985 Last:6048
Number of queues: 128
Number of events: 4096
Cycles/Event: 516 Latency Hi: 32354 Lo: 119338 @2694MHz(32)
Cycles/Event: 516 Latency Hi: 32347 Lo: 119619 @2694MHz(33)
Cycles/Event: 516 Latency Hi: 32420 Lo: 119448 @2694MHz(34)
Cycles/Event: 516 Latency Hi: 32363 Lo: 119505 @2694MHz(35)
Cycles/Event: 516 Latency Hi: 32343 Lo: 119556 @2694MHz(36)
Cycles/Event: 516 Latency Hi: 32307 Lo: 119669 @2694MHz(37)
Cycles/Event: 516 Latency Hi: 32453 Lo: 119641 @2694MHz(38)
Cycles/Event: 516 Latency Hi: 32261 Lo: 119761 @2694MHz(39)
Create new queues: 128
New Qs created:128 First:1989 Last:2116
Number of queues: 256
Number of events: 4096
Cycles/Event: 508 Latency Hi: 42147 Lo: 107600 @2694MHz(40)
Cycles/Event: 508 Latency Hi: 42045 Lo: 107483 @2694MHz(41)
Cycles/Event: 508 Latency Hi: 42216 Lo: 107172 @2694MHz(42)
Cycles/Event: 508 Latency Hi: 41878 Lo: 107425 @2694MHz(43)
Cycles/Event: 508 Latency Hi: 42342 Lo: 107420 @2694MHz(44)
Cycles/Event: 508 Latency Hi: 42318 Lo: 107284 @2694MHz(45)
Cycles/Event: 508 Latency Hi: 42119 Lo: 107532 @2694MHz(46)
Cycles/Event: 508 Latency Hi: 42251 Lo: 107412 @2694MHz(47)
Create new queues: 256
New Qs created:256 First:6049 Last:6304
Number of queues: 512
Number of events: 4096
Cycles/Event: 499 Latency Hi: 44833 Lo: 102195 @2694MHz(48)
Cycles/Event: 499 Latency Hi: 44630 Lo: 102181 @2694MHz(49)
Cycles/Event: 499 Latency Hi: 44766 Lo: 102138 @2694MHz(50)
Cycles/Event: 499 Latency Hi: 44586 Lo: 102485 @2694MHz(51)
Cycles/Event: 499 Latency Hi: 44638 Lo: 102305 @2694MHz(52)
Cycles/Event: 499 Latency Hi: 44733 Lo: 102219 @2694MHz(53)
Cycles/Event: 499 Latency Hi: 44403 Lo: 102437 @2694MHz(54)
Cycles/Event: 499 Latency Hi: 44414 Lo: 102218 @2694MHz(55)
Create new queues: 512
New Qs created:512 First:1425 Last:1936
Number of queues: 1024
Number of events: 4096
Cycles/Event: 494 Latency Hi: 44398 Lo: 101154 @2694MHz(56)
Cycles/Event: 494 Latency Hi: 44257 Lo: 101284 @2694MHz(57)
Cycles/Event: 494 Latency Hi: 44328 Lo: 101202 @2694MHz(58)
Cycles/Event: 494 Latency Hi: 44455 Lo: 101236 @2694MHz(59)
Cycles/Event: 494 Latency Hi: 44371 Lo: 101018 @2694MHz(60)
Cycles/Event: 494 Latency Hi: 44583 Lo: 100964 @2694MHz(61)
Cycles/Event: 494 Latency Hi: 44230 Lo: 101406 @2694MHz(62)
Cycles/Event: 494 Latency Hi: 44814 Lo: 100970 @2694MHz(63)
Cycles/Event: 494 Latency Hi: 44541 Lo: 101066 @2694MHz(64)
Cycles/Event: 494 Latency Hi: 44331 Lo: 101114 @2694MHz(65)
Cycles/Event: 494 Latency Hi: 44396 Lo: 101000 @2694MHz(66)
Cycles/Event: 494 Latency Hi: 44399 Lo: 100963 @2694MHz(67)
Cycles/Event: 494 Latency Hi: 44578 Lo: 101152 @2694MHz(68)
Cycles/Event: 494 Latency Hi: 44446 Lo: 100920 @2694MHz(69)
Cycles/Event: 494 Latency Hi: 44362 Lo: 100899 @2694MHz(70)
Cycles/Event: 494 Latency Hi: 44608 Lo: 101170 @2694MHz(71)
...
-----------------------------------------------------------------------------
B4) Performance: atomic_processing_end -
em-dpdk/programs/performance/atomic_processing_end.c
-----------------------------------------------------------------------------
Event Machine em_atomic_processing_end() example
Measures the average cycles consumed during an event send-sched-receive loop
for an EO pair using atomic queues and alternating between calling
em_atomic_processing_end() and not calling it.
Each EO's receive function will do some dummy work for each received event.
em_atomic_processing_end() is called before the dummy work is processed to
allow another core to continue handling events from the same atomic queue.
With a low number of queues and long per-event processing times, not calling
em_atomic_processing_end() will limit throughput and worsen the cycles/event
result.
For comparison, results are shown both with and without calls to
em_atomic_processing_end().
Note: Calling em_atomic_processing_end() will normally give worse performance
except in cases when atomic event processing becomes a bottleneck by blocking
other cores from doing their work (as this test tries to demonstrate).
E.g. start on 8 cores:
> sudo ./build/atomic_processing_end -c 0x0ff0 -n 4 -- -t
...
normal atomic processing: 7292 cycles/event @2804.11 MHz (core-00 1)
normal atomic processing: 13371 cycles/event @2800.25 MHz (core-05 1)
normal atomic processing: 13946 cycles/event @2800.25 MHz (core-06 1)
normal atomic processing: 15632 cycles/event @2800.25 MHz (core-02 1)
normal atomic processing: 14415 cycles/event @2800.25 MHz (core-04 1)
normal atomic processing: 14921 cycles/event @2800.25 MHz (core-07 1)
normal atomic processing: 15545 cycles/event @2800.16 MHz (core-03 1)
normal atomic processing: 19156 cycles/event @2800.16 MHz (core-01 1)
em_atomic_processing_end(): 5553 cycles/event @2800.25 MHz (core-04 2)
em_atomic_processing_end(): 5582 cycles/event @2800.16 MHz (core-03 2)
em_atomic_processing_end(): 5589 cycles/event @2800.16 MHz (core-01 2)
em_atomic_processing_end(): 5606 cycles/event @2800.25 MHz (core-02 2)
em_atomic_processing_end(): 5621 cycles/event @2800.25 MHz (core-05 2)
em_atomic_processing_end(): 5626 cycles/event @2800.25 MHz (core-06 2)
em_atomic_processing_end(): 5643 cycles/event @2804.11 MHz (core-00 2)
em_atomic_processing_end(): 5723 cycles/event @2800.25 MHz (core-07 2)
normal atomic processing: 6800 cycles/event @2804.11 MHz (core-00 3)
normal atomic processing: 12037 cycles/event @2800.25 MHz (core-05 3)
normal atomic processing: 11696 cycles/event @2800.25 MHz (core-06 3)
normal atomic processing: 13019 cycles/event @2800.25 MHz (core-04 3)
normal atomic processing: 13155 cycles/event @2800.25 MHz (core-07 3)
normal atomic processing: 13315 cycles/event @2800.25 MHz (core-02 3)
normal atomic processing: 16736 cycles/event @2800.16 MHz (core-03 3)
normal atomic processing: 18215 cycles/event @2800.16 MHz (core-01 3)
em_atomic_processing_end(): 5554 cycles/event @2800.25 MHz (core-04 4)
em_atomic_processing_end(): 5580 cycles/event @2800.16 MHz (core-03 4)
em_atomic_processing_end(): 5584 cycles/event @2800.16 MHz (core-01 4)
em_atomic_processing_end(): 5603 cycles/event @2800.25 MHz (core-02 4)
em_atomic_processing_end(): 5616 cycles/event @2800.25 MHz (core-05 4)
em_atomic_processing_end(): 5625 cycles/event @2800.25 MHz (core-06 4)
em_atomic_processing_end(): 5640 cycles/event @2804.11 MHz (core-00 4)
em_atomic_processing_end(): 5722 cycles/event @2800.25 MHz (core-07 4)
...
-----------------------------------------------------------------------------
B5) Performance: queues_unscheduled -
em-dpdk/programs/performance/queues_unscheduled.c
-----------------------------------------------------------------------------
Event Machine performance test
Based on the queues.c test, extended to also use unscheduled queues.
Measures the average cycles consumed during an event send-sched-receive loop
for a certain number of queues and events in the system. The test increases
the number of queues[+events] for each measurement round and prints the
results. Each normal scheduled queue is accompanied by an unscheduled queue
that is dequeued from at each event receive. Both the received event and the
dequeued event are sent to the next queue at the end of the receive function.
The measured cycles contain the scheduled event send-sched-receive cycles as
well as the unscheduled event dequeue cycles.
Plot the cycles/event to get an idea of how the system scales with an
increasing number of queues.
E.g. run on 14 cores:
> sudo ./build/queues_unscheduled -c 0xfefe -n 4 -- -t
...
Create new queues - scheduled:8 + unscheduled:8
Number of queues: 8 + 8
Number of events: 2048 + 2048
Cycles/Event: 834 Latency Hi: 35170 Lo: 88167 @2694MHz(0)
Cycles/Event: 834 Latency Hi: 34631 Lo: 88448 @2694MHz(1)
Cycles/Event: 834 Latency Hi: 33202 Lo: 90042 @2694MHz(2)
Cycles/Event: 835 Latency Hi: 31936 Lo: 91393 @2694MHz(3)
Cycles/Event: 835 Latency Hi: 32275 Lo: 90915 @2694MHz(4)
Cycles/Event: 835 Latency Hi: 31905 Lo: 91232 @2694MHz(5)
Cycles/Event: 835 Latency Hi: 32350 Lo: 90583 @2694MHz(6)
Cycles/Event: 835 Latency Hi: 36971 Lo: 86235 @2694MHz(7)
Create new queues - scheduled:8 + unscheduled:8
Number of queues: 16 + 16
Number of events: 2048 + 2048
Cycles/Event: 469 Latency Hi: 23507 Lo: 45394 @2694MHz(8)
Cycles/Event: 469 Latency Hi: 22632 Lo: 46299 @2694MHz(9)
Cycles/Event: 469 Latency Hi: 22975 Lo: 45997 @2694MHz(10)
Cycles/Event: 469 Latency Hi: 22952 Lo: 46006 @2694MHz(11)
Cycles/Event: 469 Latency Hi: 23571 Lo: 45372 @2694MHz(12)
Cycles/Event: 469 Latency Hi: 23092 Lo: 45846 @2694MHz(13)
Cycles/Event: 469 Latency Hi: 23005 Lo: 45923 @2694MHz(14)
Cycles/Event: 469 Latency Hi: 23251 Lo: 45716 @2694MHz(15)
Create new queues - scheduled:16 + unscheduled:16
Number of queues: 32 + 32
Number of events: 2048 + 2048
Cycles/Event: 398 Latency Hi: 15282 Lo: 43075 @2694MHz(16)
Cycles/Event: 398 Latency Hi: 15538 Lo: 42841 @2694MHz(17)
Cycles/Event: 398 Latency Hi: 15513 Lo: 42815 @2694MHz(18)
Cycles/Event: 398 Latency Hi: 15506 Lo: 42844 @2694MHz(19)
Cycles/Event: 398 Latency Hi: 15500 Lo: 42806 @2694MHz(20)
Cycles/Event: 398 Latency Hi: 15186 Lo: 43176 @2694MHz(21)
Cycles/Event: 398 Latency Hi: 15419 Lo: 42912 @2694MHz(22)
Cycles/Event: 398 Latency Hi: 15498 Lo: 42874 @2694MHz(23)
Create new queues - scheduled:32 + unscheduled:32
Number of queues: 64 + 64
Number of events: 2048 + 2048
Cycles/Event: 370 Latency Hi: 5584 Lo: 48616 @2694MHz(24)
Cycles/Event: 370 Latency Hi: 5606 Lo: 48651 @2694MHz(25)
Cycles/Event: 369 Latency Hi: 5605 Lo: 48596 @2694MHz(26)
Cycles/Event: 370 Latency Hi: 5630 Lo: 48615 @2694MHz(27)
Cycles/Event: 370 Latency Hi: 5627 Lo: 48590 @2694MHz(28)
Cycles/Event: 369 Latency Hi: 5578 Lo: 48557 @2694MHz(29)
Cycles/Event: 370 Latency Hi: 5595 Lo: 48633 @2694MHz(30)
Cycles/Event: 370 Latency Hi: 5600 Lo: 48623 @2694MHz(31)
Create new queues - scheduled:64 + unscheduled:64
Number of queues: 128 + 128
Number of events: 2048 + 2048
Cycles/Event: 352 Latency Hi: 10616 Lo: 40956 @2694MHz(32)
Cycles/Event: 352 Latency Hi: 10633 Lo: 40930 @2694MHz(33)
Cycles/Event: 352 Latency Hi: 10644 Lo: 40931 @2694MHz(34)
Cycles/Event: 352 Latency Hi: 10648 Lo: 40983 @2694MHz(35)
Cycles/Event: 352 Latency Hi: 10646 Lo: 40954 @2694MHz(36)
Cycles/Event: 352 Latency Hi: 10687 Lo: 40942 @2694MHz(37)
Cycles/Event: 352 Latency Hi: 10634 Lo: 40992 @2694MHz(38)
Cycles/Event: 352 Latency Hi: 10671 Lo: 40989 @2694MHz(39)
Create new queues - scheduled:128 + unscheduled:128
Number of queues: 256 + 256
Number of events: 2048 + 2048
Cycles/Event: 344 Latency Hi: 12443 Lo: 38042 @2694MHz(40)
Cycles/Event: 344 Latency Hi: 12497 Lo: 38034 @2694MHz(41)
Cycles/Event: 344 Latency Hi: 12541 Lo: 37943 @2694MHz(42)
Cycles/Event: 344 Latency Hi: 12532 Lo: 37943 @2694MHz(43)
Cycles/Event: 344 Latency Hi: 12515 Lo: 37984 @2694MHz(44)
Cycles/Event: 344 Latency Hi: 12529 Lo: 37974 @2694MHz(45)
Cycles/Event: 344 Latency Hi: 12460 Lo: 38014 @2694MHz(46)
Cycles/Event: 344 Latency Hi: 12529 Lo: 37952 @2694MHz(47)
Create new queues - scheduled:256 + unscheduled:256
Number of queues: 512 + 512
Number of events: 2048 + 2048
Cycles/Event: 336 Latency Hi: 12735 Lo: 36438 @2694MHz(48)
Cycles/Event: 335 Latency Hi: 12765 Lo: 36436 @2694MHz(49)
Cycles/Event: 335 Latency Hi: 12743 Lo: 36427 @2694MHz(50)
Cycles/Event: 335 Latency Hi: 12775 Lo: 36437 @2694MHz(51)
Cycles/Event: 336 Latency Hi: 12762 Lo: 36419 @2694MHz(52)
Cycles/Event: 335 Latency Hi: 12722 Lo: 36490 @2694MHz(53)
Cycles/Event: 335 Latency Hi: 12737 Lo: 36426 @2694MHz(54)
Cycles/Event: 335 Latency Hi: 12789 Lo: 36416 @2694MHz(55)
Cycles/Event: 335 Latency Hi: 12725 Lo: 36444 @2694MHz(56)
Cycles/Event: 335 Latency Hi: 12812 Lo: 36422 @2694MHz(57)
Cycles/Event: 336 Latency Hi: 12763 Lo: 36484 @2694MHz(58)
Cycles/Event: 336 Latency Hi: 12765 Lo: 36535 @2694MHz(59)
Cycles/Event: 335 Latency Hi: 12827 Lo: 36467 @2694MHz(60)
Cycles/Event: 336 Latency Hi: 12789 Lo: 36475 @2694MHz(61)
Cycles/Event: 335 Latency Hi: 12798 Lo: 36424 @2694MHz(62)
Cycles/Event: 336 Latency Hi: 12824 Lo: 36429 @2694MHz(63)
Cycles/Event: 335 Latency Hi: 12773 Lo: 36481 @2694MHz(64)
...
-----------------------------------------------------------------------------
B6) Performance: send_multi -
em-dpdk/programs/performance/send_multi.c
-----------------------------------------------------------------------------
Event Machine performance test for burst sending of events.
(based on the queues_unscheduled.c test, extended to use burst sending
of events into the next queue; see em_send_multi() &
em_queue_dequeue_multi())
Measures the average cycles consumed during an event send-sched-receive loop
for a certain number of queues and events in the system. The test increases
the number of queues[+events] for each measurement round and prints the
results. The test will stop if the maximum number of supported queues by the
system is reached.
Each normal scheduled queue is accompanied by an unscheduled queue. Received
events are stored until a suitable length event burst is available, then the
whole burst is forwarded to the next queue in the chain using
em_send_multi(). Each stored burst is accompanied by another burst taken
from the associated unscheduled queue.
Both the received scheduled events and the unscheduled dequeued events are
sent as bursts to the next queue at the end of the receive function.
The measured cycles contain the scheduled event send_multi-sched-receive
cycles as well as the unscheduled event multi_dequeue.
Plot the cycles/event to get an idea of how the system scales with an
increasing number of queues.
E.g. run on 14 cores:
> sudo ./build/send_multi -c 0xfefe -n 4 -- -t
...
Create new queues - scheduled:8 + unscheduled:8
Number of queues: 8 + 8
Number of events: 2048 + 2048
Cycles/Event: 558 Latency Hi: 37929 Lo: 44146 @2694MHz(0)
Cycles/Event: 559 Latency Hi: 37239 Lo: 44931 @2694MHz(1)
Cycles/Event: 559 Latency Hi: 37720 Lo: 44746 @2694MHz(2)
Cycles/Event: 559 Latency Hi: 38153 Lo: 44182 @2694MHz(3)
Cycles/Event: 560 Latency Hi: 41254 Lo: 41385 @2694MHz(4)
Cycles/Event: 560 Latency Hi: 40871 Lo: 41616 @2694MHz(5)
Cycles/Event: 559 Latency Hi: 41417 Lo: 41241 @2694MHz(6)
Cycles/Event: 559 Latency Hi: 41480 Lo: 40869 @2694MHz(7)
Create new queues - scheduled:8 + unscheduled:8
Number of queues: 16 + 16
Number of events: 2048 + 2048
Cycles/Event: 333 Latency Hi: 6396 Lo: 42390 @2694MHz(8)
Cycles/Event: 333 Latency Hi: 6379 Lo: 42495 @2694MHz(9)
Cycles/Event: 333 Latency Hi: 6356 Lo: 42537 @2694MHz(10)
Cycles/Event: 333 Latency Hi: 6575 Lo: 42293 @2694MHz(11)
Cycles/Event: 333 Latency Hi: 6365 Lo: 42496 @2694MHz(12)
Cycles/Event: 333 Latency Hi: 6425 Lo: 42451 @2694MHz(13)
Cycles/Event: 333 Latency Hi: 6405 Lo: 42506 @2694MHz(14)
Cycles/Event: 333 Latency Hi: 6347 Lo: 42585 @2694MHz(15)
Create new queues - scheduled:16 + unscheduled:16
Number of queues: 32 + 32
Number of events: 2048 + 2048
Cycles/Event: 303 Latency Hi: 5074 Lo: 39313 @2694MHz(16)
Cycles/Event: 303 Latency Hi: 5056 Lo: 39370 @2694MHz(17)
Cycles/Event: 303 Latency Hi: 5238 Lo: 39199 @2694MHz(18)
Cycles/Event: 303 Latency Hi: 5165 Lo: 39279 @2694MHz(19)
Cycles/Event: 303 Latency Hi: 5107 Lo: 39288 @2694MHz(20)
Cycles/Event: 302 Latency Hi: 5088 Lo: 39278 @2694MHz(21)
Cycles/Event: 302 Latency Hi: 5289 Lo: 39084 @2694MHz(22)
Cycles/Event: 302 Latency Hi: 5219 Lo: 39134 @2694MHz(23)
Create new queues - scheduled:32 + unscheduled:32
Number of queues: 64 + 64
Number of events: 2048 + 2048
Cycles/Event: 284 Latency Hi: 8290 Lo: 33378 @2694MHz(24)
Cycles/Event: 284 Latency Hi: 8226 Lo: 33477 @2694MHz(25)
Cycles/Event: 284 Latency Hi: 8258 Lo: 33430 @2694MHz(26)
Cycles/Event: 284 Latency Hi: 8267 Lo: 33436 @2694MHz(27)
Cycles/Event: 284 Latency Hi: 8239 Lo: 33458 @2694MHz(28)
Cycles/Event: 284 Latency Hi: 8269 Lo: 33431 @2694MHz(29)
Cycles/Event: 284 Latency Hi: 8157 Lo: 33557 @2694MHz(30)
Cycles/Event: 284 Latency Hi: 8229 Lo: 33494 @2694MHz(31)
Create new queues - scheduled:64 + unscheduled:64
Number of queues: 128 + 128
Number of events: 2048 + 2048
Cycles/Event: 274 Latency Hi: 9716 Lo: 30444 @2694MHz(32)
Cycles/Event: 274 Latency Hi: 9720 Lo: 30451 @2694MHz(33)
Cycles/Event: 274 Latency Hi: 9726 Lo: 30460 @2694MHz(34)
Cycles/Event: 274 Latency Hi: 9721 Lo: 30504 @2694MHz(35)
Cycles/Event: 274 Latency Hi: 9733 Lo: 30471 @2694MHz(36)
Cycles/Event: 274 Latency Hi: 9751 Lo: 30470 @2694MHz(37)
Cycles/Event: 274 Latency Hi: 9743 Lo: 30477 @2694MHz(38)
Cycles/Event: 274 Latency Hi: 9718 Lo: 30468 @2694MHz(39)
Create new queues - scheduled:128 + unscheduled:128
Number of queues: 256 + 256
Number of events: 2048 + 2048
Cycles/Event: 270 Latency Hi: 9830 Lo: 29766 @2694MHz(40)
Cycles/Event: 270 Latency Hi: 9834 Lo: 29765 @2694MHz(41)
Cycles/Event: 270 Latency Hi: 9848 Lo: 29792 @2694MHz(42)
Cycles/Event: 270 Latency Hi: 9822 Lo: 29775 @2694MHz(43)
Cycles/Event: 270 Latency Hi: 9798 Lo: 29801 @2694MHz(44)
Cycles/Event: 270 Latency Hi: 9838 Lo: 29759 @2694MHz(45)
Cycles/Event: 270 Latency Hi: 9797 Lo: 29782 @2694MHz(46)
Cycles/Event: 270 Latency Hi: 9834 Lo: 29782 @2694MHz(47)
Create new queues - scheduled:256 + unscheduled:256
EAL: memzone_reserve_aligned_thread_unsafe(): No more room in config
RING: Cannot reserve memory
Unable to create more queues
Test finished
Max nbr of supported queues: 1261
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
C) Packet-I/O examples - used together with an external traffic generator.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
NOTE: some differences depending on the target environment:
em-dpdk: These tests assume that the test system is equipped with at
least one NIC supported by DPDK.
Sources in directory: em-dpdk/programs/packet_io/
Packet-io enabled by setting: em_conf.pkt_io = 1 before calling
em_init(&em_conf).
NOTE: Might require DPDK config changes, see README.intel
Assign NIC ports to dpdk as described in the dpdk docs.
> cd em-dpdk/programs/packet_io/
> make real_clean && make em_clean && make clean && make
-----------------------------------------------------------------------------
C1) Packet-IO: loopback -
em-dpdk/programs/packet_io/loopback.c
-----------------------------------------------------------------------------
Simple Dynamically Load Balanced Packet-I/O loopback test application.
Receives UDP/IP packets, swaps the addresses/ports and sends the packets back
to where they came from.
The test expects the traffic generator to send data using 256 UDP-flows:
- 4 IP dst addresses each with 64 different UDP dst ports (=flows).
Alternatively, setting "#define QUEUE_PER_FLOW 0" makes the application
accept any packets, but it then uses only a single default EM queue, thus
limiting performance.
Use the traffic generator to find the max sustainable throughput for loopback
traffic. The throughput should increase near-linearly with increasing core
counts, as set by '-c 0xcoremask'.
> sudo ./build/loopback -c 0xffff -n 4 -- -p
or
> sudo ./build/loopback -c 0xffff -n 4 -- -t
Note: The number of used flows has been decreased in order to work
out-of-the-box with the dpdk default .config
-----------------------------------------------------------------------------
C2) Packet-IO: loopback_ag -
em-dpdk/programs/packet_io/loopback_ag.c
-----------------------------------------------------------------------------
Test derived from the loopback test above but further groups the input
queues into atomic groups (hence _ag) to provide multiple priority levels for
each atomic processing context.
An application (EO) that receives UDP datagrams and swaps the
src/dst addresses before sending the datagram back out.
Each set of four input EM queues with prios Highest, High,
Normal and Low are mapped into an EM atomic group to provide
"atomic context with priority".
Similar to normal atomic queues, atomic groups provide EOs with an atomic
processing context, but expand the context over multiple queues,
i.e. over all the queues in the same atomic group.
All queues in an atomic group are by default of type "atomic".
Traffic setup and startup similar to the loopback case.
-----------------------------------------------------------------------------
C3) Packet-IO: multi_stage -
em-dpdk/programs/packet_io/multi_stage.c
-----------------------------------------------------------------------------
A packet-I/O example similar to loopback, except that each UDP-flow is
handled in three (3) stages before being sent back out. The three stages
(3 EOs) cause each packet to be enqueued, scheduled and received multiple
times on the multicore CPU, thus stressing the EM scheduler.
Additionally this test uses EM queues of different priority and type.
The test expects the traffic generator to send data using 128 UDP-flows:
- 4 IP dst addresses each with 32 different UDP dst ports (=flows).
Use the traffic generator to find the max sustainable throughput for
multi-stage traffic. The throughput should increase near-linearly with
increasing core counts, as set by '-c 0xcoremask'.
> sudo ./build/multi_stage -c 0xffff -n 4 -- -p
or
> sudo ./build/multi_stage -c 0xffff -n 4 -- -t
Note: The number of used flows has been decreased in order to work
out-of-the-box with the dpdk default .config
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
D) Add-on examples - Event Timer
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Sources in directory: em-dpdk/programs/example/add-ons/
EM event timer enabled by setting: em_conf.event_timer = 1 before calling
em_init(&em_conf).
NOTE: Currently the event timer tests must be run in thread-per-core (-t) mode
due to a bug in dpdk (tested dpdk v17.08).
> cd em-dpdk/programs/example/add-ons/
> make real_clean && make em_clean && make clean && make
-----------------------------------------------------------------------------
D1) Event Timer: timer_hello -
em-dpdk/programs/example/add-ons/timer_hello.c
-----------------------------------------------------------------------------
Event Machine timer add-on hello world example.
Timer hello world example to show basic event timer usage. Creates a
single EO that starts a periodic and a random one-shot timeout.
Exception/error management is simplified to focus on basic timer usage.
E.g. start on 14 cores:
> sudo ./build/timer_hello -c 0xfefe -n 4 -- -t
...
EO start
System has 1 timer(s)
Timer "ExampleTimer" info:
-resolution: 50000 ns
-max_tmo: 2592000000000 ms
-num_tmo: 2047
-tick Hz: 2693514283 hz
1. tick
tock
2. tick
tock
3. tick
tock
4. tick
tock
5. tick
tock
Meditation time: what can you do in 8538 ms?
6. tick
tock
7. tick
tock
8. tick
tock
9. tick
tock
8538 ms gone!
Meditation time: what can you do in 15603 ms?
10. tick
tock
11. tick
tock
12. tick
tock
13. tick
tock
14. tick
tock
15. tick
tock
16. tick
tock
17. tick
tock
15603 ms gone!
...
-----------------------------------------------------------------------------
D2) Event Timer: timer_test -
em-dpdk/programs/example/add-ons/timer_test.c
-----------------------------------------------------------------------------
Event Machine timer add-on basic test.
A simple timer test (does not test everything). Creates and deletes random
timeouts and checks how accurate the timeout indications are against both
the timer itself and Linux time (clock_gettime). A single EO is used, but
its receive queue is of parallel type so multiple threads can process
timeouts concurrently.
Exception/error management is simplified and aborts on any error.
E.g. start on 14 cores:
> sudo ./build/timer_test -c 0xfefe -n 4 -- -t
...
EO start
System has 1 timer(s)
Timer "TestTimer" info:
-resolution: 50000 ns
-max_tmo: 2592000000000 ms
-num_tmo: 2047
-tick Hz: 2693513330 hz
Linux reports clock running at 1000000000 hz
app_eo_start done, test repetition interval 96s
Aug-29 16:14:26 ROUND 1 ************
Timer: Creating 1500 timeouts took 42188 ns (28 ns each)
Linux: Creating 1500 timeouts took 41949 ns (27 ns each)
Started single shots
Started periodic
Running
...............................................
Heartbeat count 48
ONESHOT:
Received: 1500, expected 1500
Cancelled OK: 0
Cancel failed (too late): 0
SUMMARY/TICKS: min 2695, max 148754, avg 69087
/NS: min 1000, max 55226, avg 25649
SUMMARY/LINUX NS: min 4546, max 155133, avg 78032
PERIODIC:
Received: 15687
Cancelled: 52
Cancel failed (too late): 0
Errors: 0
TOTAL RUNTIME/US: min 1, max 55
TOTAL RUNTIME LINUX/US: min 4, max 155
TOTAL ERRORS: 0
TOTAL ACK FAILS: 0
Cleaning up
Timer: Deleting 1500 timeouts took 63056 ns (42 ns each)
Linux: Deleting 1500 timeouts took 62836 ns (41 ns each)
Aug-29 16:16:02 ROUND 2 ************
Timer: Creating 1500 timeouts took 41596 ns (27 ns each)
Linux: Creating 1500 timeouts took 41415 ns (27 ns each)
Started single shots
Started periodic
Running
...............................................
Heartbeat count 96
ONESHOT:
Received: 1500, expected 1500
Cancelled OK: 0
Cancel failed (too late): 0
SUMMARY/TICKS: min 1763, max 156497, avg 70586
/NS: min 654, max 58101, avg 26205
SUMMARY/LINUX NS: min 5195, max 156309, avg 78396
PERIODIC:
Received: 11582
Cancelled: 42
Cancel failed (too late): 0
Errors: 0
TOTAL RUNTIME/US: min 0, max 58
TOTAL RUNTIME LINUX/US: min 4, max 156
TOTAL ERRORS: 0
TOTAL ACK FAILS: 0
Cleaning up
Timer: Deleting 1500 timeouts took 64467 ns (42 ns each)
Linux: Deleting 1500 timeouts took 64346 ns (42 ns each)
...
===============================================================================
3. Changes
===============================================================================
See CHANGE_NOTES
===============================================================================
4. Open Issues
===============================================================================
===============================================================================
5. Notes
===============================================================================