Read Me
Copyright (c) 2013-2016, Nokia Solutions and Networks
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
===============================================================================
Table of Contents
===============================================================================
1. Open Event Machine
2. Examples / Test Cases
A) Basic standalone examples
B) Performance test examples
C) Packet-I/O examples
D) Add-on API examples (event-timer using the EM add-on APIs)
3. Changes
4. Open Issues
5. Notes
===============================================================================
1. Open Event Machine (OpenEM, EM or em-dpdk)
===============================================================================
License:
- OpenEM - See the event_machine/LICENSE file.
- DPDK - See the DPDK package.
Note: This release has support for the following environment:
em-dpdk - optimized for Intel x86_64 + Linux + DPDK + Intel NICs
See event_machine/intel/README.dpdk
Note: Read the Open Event Machine API, especially the event_machine.h file and
the files it includes, for a more thorough description of the
functionality. Documentation slides are also available.
The OpenEM version numbering scheme reflects the supported OpenEM API version:
a.b.c(-d)
a.b = API version supported
.c = implementation number supporting API version a.b
-d = fix number, added if needed
For example, version 1.2.2 stands for:
a.b = 1.2: supports OpenEM API v1.2
.c = .2: third implementation supporting API 1.2 (.0 is the first)
no -d = N/A: no fixes yet
Open Event Machine is a lightweight, dynamically load balancing,
run-to-completion, event processing runtime targeting multicore SoC data plane
environments.
This release of the Open Event Machine contains the OpenEM API as well as an
example all-SW implementation.
The Event Machine is a multicore optimized, high performance, data plane
processing concept based on asynchronous queues and event scheduling.
Applications are built from Execution Objects (EO), events and event queues.
The EO is a run-to-completion object that gets called at each event receive.
Communication is built around events and event queues. The EM scheduler selects
an event from a queue based on priority, and calls the EO's receive function
for event processing. The EO might send events to other EOs or HW-accelerators
for further processing or output. The next event for the core is selected only
after the EO has returned. The selected queue type, queue priority and queue
group affect the scheduling. Dynamic load balancing of work between the cores
is the default behaviour. Note that the application (or EO) is not tied to a
particular core; instead, event processing for the EO is performed on whichever
core is available at that time.
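In code, the model above maps onto a handful of API calls. The sketch below is
schematic only (argument lists abbreviated, my_start/my_stop are assumed EO
start/stop callbacks); the exact signatures are in event_machine.h:

```c
/* Schematic sketch of the EM programming model -- verify the exact
 * signatures against event_machine.h of this release. */

/* The dispatcher calls this on some available core each time an event
 * is scheduled from one of the EO's queues. Run-to-completion: the
 * next event is selected only after this function returns. */
static void my_receive(void *eo_ctx, em_event_t event,
                       em_event_type_t type, em_queue_t queue,
                       void *q_ctx)
{
        /* ...process the event, then forward it to another queue
         * (em_send()) or free it... */
        em_free(event);
}

static void my_setup(void)
{
        /* EO with start/stop callbacks and the receive function */
        em_eo_t eo = em_eo_create("my-eo", my_start, NULL,
                                  my_stop, NULL, my_receive, NULL);
        /* Atomic queue: its events are never processed on two
         * cores simultaneously */
        em_queue_t q = em_queue_create("my-q", EM_QUEUE_TYPE_ATOMIC,
                                       EM_QUEUE_PRIO_NORMAL,
                                       EM_QUEUE_GROUP_DEFAULT);

        em_eo_add_queue(eo, q);
        em_eo_start(eo, NULL, 0, NULL);
}
```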
===============================================================================
2. Examples / Test Cases
===============================================================================
The package contains a set of examples / test cases.
For compilation see:
em-dpdk/README.dpdk
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A) Basic standalone Examples - do not require any external input or I/O,
just compile and run to get output/results.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Sources in directory: em-dpdk/programs/example/
(and em-dpdk/programs/common/ for main() and common setup)
> cd em-dpdk/programs/example/
> make real_clean && make em_clean && make clean && make
-----------------------------------------------------------------------------
A1) Hello World: hello
em-dpdk/programs/example/hello/hello.c
-----------------------------------------------------------------------------
Simple "Hello World" example the Event Machine way.
Creates two Execution Objects (EOs), each with a dedicated queue for incoming
events, and ping-pongs an event between the EOs while printing "Hello world"
each time the event is received. Note the changing core number in the
printout below (the event is handled by any core that is ready to take on new
work):
Run hello on 8 cores
> sudo ./build/hello -c 0xff -n 4 -- -p
or
> sudo ./build/hello -c 0xff -n 4 -- -t
...
Hello world started EO A. I'm EO 0. My queue is 320.
Hello world started EO B. I'm EO 1. My queue is 321.
Entering event dispatch loop() on EM-core 0
Hello world from EO A! My queue is 320. I'm on core 02. Event seq is 0.
Hello world from EO B! My queue is 321. I'm on core 04. Event seq is 1.
Hello world from EO A! My queue is 320. I'm on core 02. Event seq is 2.
Hello world from EO B! My queue is 321. I'm on core 05. Event seq is 3.
...
-----------------------------------------------------------------------------
A2) Fractal calculation and drawing: fractal
em-dpdk/programs/example/fractal/fractal.c
-----------------------------------------------------------------------------
Event Machine fractal (Mandelbrot set) drawing application.
Generates Mandelbrot fractal images, zooming deeper into the fractal with
each image. Prints the frames-per-second and the frame range. Modify the
defined values for a different resolution, zoom point, starting frame,
ending frame and precision of the fractal calculation.
Creates three Execution Objects to form a pipeline:
--> ](Pixel_handler) --> ](Worker) --> ](Imager) -==> IMAGE
\_____________________________________________/
Note: Creates a ramdisk at /tmp/ramdisk to store fractal.ppm.
Runs the 'feh' application as a background process to display and
refresh the image; install 'feh' if missing or use another viewer.
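The per-pixel work is the standard Mandelbrot escape-time iteration. A
self-contained sketch of the inner loop (illustrative only; the names and
limits are not taken from fractal.c):

```c
/* Escape-time iteration for one pixel: iterate z = z^2 + c and return
 * the iteration count at which |z| exceeds 2, or max_iter if the point
 * stays bounded (i.e. lies in the Mandelbrot set). max_iter corresponds
 * to the "precision of the fractal calculation" mentioned above. */
static int mandelbrot_iters(double cr, double ci, int max_iter)
{
	double zr = 0.0, zi = 0.0;
	int i;

	for (i = 0; i < max_iter && zr * zr + zi * zi <= 4.0; i++) {
		double tmp = zr * zr - zi * zi + cr; /* Re(z^2 + c) */

		zi = 2.0 * zr * zi + ci;             /* Im(z^2 + c) */
		zr = tmp;
	}
	return i;
}
```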
Run fractal on 4 cores
> sudo ./build/fractal -c 0xf -n 4 -- -t
...
Started Pixel handler. I'm EO 6. My queue is 352.
Started Worker. I'm EO 7. My queue is 353.
Started Imager. I'm EO 8. My queue is 355.
Entering the event dispatch loop() on EM-core 0
Entering the event dispatch loop() on EM-core 1
Entering the event dispatch loop() on EM-core 2
Entering the event dispatch loop() on EM-core 3
Frames per second: 01 | frames 0 - 0
Frames per second: 13 | frames 1 - 13
Frames per second: 08 | frames 14 - 21
Frames per second: 08 | frames 22 - 29
Frames per second: 06 | frames 30 - 35
...
-----------------------------------------------------------------------------
A3) Error Handling: error
em-dpdk/programs/example/error/error.c
-----------------------------------------------------------------------------
Demonstrates and tests the Event Machine error handling functionality.
Three application EOs are created, each with a dedicated queue. An
application-specific global error handler is registered (thus replacing the
EM default). Additionally, EO A registers an EO-specific error handler.
When the EOs receive events (error_receive) they generate errors by
explicit calls to em_error() and by calling EM API functions with invalid
arguments. The registered error handlers simply print the error information
on screen.
Note that execution continues after the fatal error since these are only
test errors.
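Registering handlers follows the pattern sketched below. This is schematic
only; check event_machine.h of this release for the exact handler signature
and registration calls:

```c
/* Schematic sketch -- verify against the EM API headers of this
 * release before use. */
static em_status_t appl_error_handler(em_eo_t eo, em_status_t error,
                                      em_escope_t escope, va_list args)
{
        /* Print/log the error info, then return the error code to let
         * processing continue (as this test does after "fatal" errors). */
        printf("Error: EO:%d error:0x%X escope:0x%X\n",
               (int)eo, (unsigned)error, (unsigned)escope);
        return error;
}

/* Replace the EM default global error handler: */
em_register_error_handler(appl_error_handler);
/* ...or register an EO-specific handler, as EO A does: */
em_eo_register_error_handler(eo_a, appl_error_handler);
```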
> sudo ./build/error -c 0xf -n 4 -- -p
or
> sudo ./build/error -c 0xf -n 4 -- -t
...
Error log from EO C [0] on core 1!
THIS IS A FATAL ERROR!!
Appl Global error handler : EO 2 error 0x8000DEAD escope 0x0
Return from fatal.
Error log from EO A [0] on core 2!
Appl EO specific error handler: EO 0 error 0x00001111 escope 0x1
Appl EO specific error handler: EO 0 error 0x00002222 escope 0x2 ARGS: Second error
Appl EO specific error handler: EO 0 error 0x00003333 escope 0x3 ARGS: Third error 320
Appl EO specific error handler: EO 0 error 0x00004444 escope 0x4 ARGS: Fourth error 320 0
Appl EO specific error handler: EO 0 error 0x0000000A escope 0xFF000402 - EM info:
EM ERROR:0x0000000A ESCOPE:0xFF000402 EO:1-"EO B" core:03 ecount:81(14) em_free(L:2149) em.c event ptr NULL!
...
-----------------------------------------------------------------------------
A4.1) Event Group: event_group
em-dpdk/programs/example/event_group/event_group.c
-----------------------------------------------------------------------------
Tests and measures the event group feature for fork-join type of operations
using events. See the event_machine_group.h file for event group API calls.
An EO allocates and sends a number of data events to itself (using an event
group) to trigger a notification event to be sent when the configured event
count has been received. The cycles consumed until the notification is
received is measured and printed.
Note: To keep things simple this test case uses only a single queue into
which to receive all events, including the notification events. The event
group fork-join mechanism does not care about the queues used, however; it is
basically a counter of events sent using a certain event group id. In a more
complex example each data event could be sent from different EOs to
different queues, and the final notification event sent to yet another queue.
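Since the mechanism is essentially a per-group countdown, it can be modelled
in a few lines of plain C. This is a hypothetical stand-alone model for
illustration only, not EM code (the real API is in event_machine_group.h):

```c
/* Hypothetical stand-alone model of the event group fork-join
 * countdown -- NOT EM code, for illustration only. */
typedef struct {
	int remaining; /* data events still outstanding */
	int notified;  /* set once the notification would be sent */
} event_group_model;

/* Models applying the group: arm it with the event count. */
static void group_apply(event_group_model *eg, int count)
{
	eg->remaining = count;
	eg->notified = 0;
}

/* Each received data event belonging to the group decrements the
 * count; reaching zero triggers the notification event(s). */
static void group_event_received(event_group_model *eg)
{
	if (eg->remaining > 0 && --eg->remaining == 0)
		eg->notified = 1;
}
```

With a count of 256, as in the run below, the notification fires exactly when
the 256th data event is received.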
> sudo ./build/event_group -c 0xf -n 4 -- -p
or
> sudo ./build/event_group -c 0xf -n 4 -- -t
...
--- Start event group ---
Event group notification event received after 256 data events.
Cycles curr:139024, ave:144201
--- Start event group ---
Event group notification event received after 256 data events.
Cycles curr:144075, ave:144188
--- Start event group ---
Event group notification event received after 256 data events.
Cycles curr:142648, ave:144048
...
-----------------------------------------------------------------------------
A4.2) Event Group: event_group_abort
em-dpdk/programs/example/event_group/event_group_abort.c
-----------------------------------------------------------------------------
Event Machine event group example/test using em_event_group_abort().
Aborting an ongoing event group means that events sent with that group no
longer belong to a valid event group. The same applies to excess events:
if more group events are sent than the applied count, the excess events no
longer belong to a valid event group once received.
This example creates one EO with a parallel queue and allocates a defined
number of event groups at startup that are then reused. At the start of each
round the EO allocates a defined number of data events per event group and
sends these to the parallel queue. Events loop until each group has been used
as many times as the round defines. The event count and the round stop count
are set randomly.
Counters track received valid and invalid event-group events as well as
statistics on event group API calls. Events that do not belong to a valid
group are removed from the loop and freed. Once all event groups are
aborted, or their notification events have been received, a start event is
sent that begins the next round.
During each round the event groups are deliberately misused in an attempt to
break them, i.e. to drive the test or the groups into an undefined state:
groups are incremented, assigned and ended at random.
E.g. start on 12 cores:
> sudo ./build/event_group_abort -n 4 -c 0xfcfc -- -t
...
--- Round 1
Created 30 event group(s) with count of 1384
Abort group when received 82887 events
Group events received: Valid: 43893, Expired: 3812
Event group increments: Valid: 22090, Failed: 1
Event group assigns: Valid: 19730, Failed: 1783
Aborted 0 event groups
Failed to abort 0 times
Received 30 notification events
Freed 0 notification events
-----------------------------------------
--- Round 2
Created 30 event group(s) with count of 19487
Abort group when received 43952 events
Group events received: Valid: 591137, Expired: 3832
Event group increments: Valid: 295407, Failed: 1
Event group assigns: Valid: 288890, Failed: 4510
Aborted 0 event groups
Failed to abort 0 times
Received 30 notification events
Freed 0 notification events
-----------------------------------------
...
-----------------------------------------------------------------------------
A4.3) Event Group: event_group_assign_end
em-dpdk/programs/example/event_group/event_group_assign_end.c
-----------------------------------------------------------------------------
Event Machine event group example using em_event_group_assign() and
em_event_group_processing_end().
Test and measure the event group feature for fork-join type of operations
using events. See the event_machine_group.h file for event group API calls.
Allocates and sends a number of data events to itself (using two event
groups) to trigger notification events to be sent when the configured event
count has been received. The cycles consumed until the notification is
received is measured and printed.
Three event groups are used: two track completion of a certain number of
data events, and a third, chained event group tracks completion of the
other two. The test is restarted once the final, third notification is
received. One of the two event groups used to track completion of data
events is a "normal" event group, while the other is assigned when a data
event is received, instead of the event being sent with an event group in
the normal way.
E.g. start on 2 cores:
> sudo ./build/event_group_assign_end -c 0x3 -n 4 -- -p
...
--- Start event group ---
--- Start assigned event group ---
"Normal" event group notification event received after 2048 data events.
Cycles curr:1811245, ave:1827395
Assigned event group notification event received after 2048 data events.
Cycles curr:1801551, ave:1821180
--- Chained event group done ---
--- Start assigned event group ---
--- Start event group ---
Assigned event group notification event received after 2048 data events.
Cycles curr:1814983, ave:1820294
"Normal" event group notification event received after 2048 data events.
Cycles curr:1799081, ave:1823350
--- Chained event group done ---
...
-----------------------------------------------------------------------------
A4.4) Event Group: event_group_chaining
em-dpdk/programs/example/event_group/event_group_chaining.c
-----------------------------------------------------------------------------
Event Machine event group chaining example.
Test and measure the event group chaining feature for fork-join type of
operations using events. See the event_machine_group.h file for the
event group API calls.
Allocates and sends a number of data events to itself using an event group.
When the configured event count has been reached, notification events are
triggered. The cycles consumed until each notification is received is
measured and printed. The notification events are sent using a second event
group which will trigger a final done event after all the notifications have
been processed.
Note: To keep things simple this test case uses only a single queue to
receive all events, including the notification events. The event group
fork-join mechanism does not care about the queues used, however; it is
basically a counter of events sent using a certain event group id. In a more
complex example each data event could be sent from different EOs to
different queues, and the final notification event sent to yet another queue.
E.g. start on 7 cores:
> sudo ./build/event_group_chaining -c 0xfe -n 4 -- -p
...
--- Start event group ---
Event group notification event received after 256 data events.
Cycles curr:159870, ave:186599
Event group notification event received after 256 data events.
Cycles curr:160560, ave:184739
Event group notification event received after 256 data events.
Cycles curr:161112, ave:183164
Event group notification event received after 256 data events.
Cycles curr:232228, ave:186230
--- Chained event group done ---
--- Start event group ---
Event group notification event received after 256 data events.
Cycles curr:175525, ave:185601
Event group notification event received after 256 data events.
Cycles curr:176777, ave:185110
Event group notification event received after 256 data events.
Cycles curr:178012, ave:184737
Event group notification event received after 256 data events.
Cycles curr:178420, ave:184421
--- Chained event group done ---
...
-----------------------------------------------------------------------------
A5) Queue Group: queue_group
em-dpdk/programs/example/queue_group.c
-----------------------------------------------------------------------------
Event Machine queue group feature test.
Creates an EO with two queues: a notification queue and a data event queue.
The notif queue belongs to the default queue group and can be processed on
any core while the data queue belongs to a newly created queue group called
"test_group". The EO-receive function receives a number of data events and
then modifies the test queue group (i.e. changes the cores allowed to process
events from the data event queue). The test is restarted when the queue group
has been modified enough times to include each core at least once.
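The modification step revolves around an em_core_mask_t. A schematic sketch
(arguments abbreviated, notification events omitted; see event_machine.h of
this release for the exact em_queue_group_modify() signature):

```c
/* Schematic sketch: allow only cores 2, 4 and 5 to process events
 * from queues in the test queue group, i.e. coremask 0x34 as in the
 * first modification of the sample output below. Arguments
 * abbreviated -- verify against event_machine.h. */
em_core_mask_t mask;

em_core_mask_zero(&mask);
em_core_mask_set(2, &mask);
em_core_mask_set(4, &mask);
em_core_mask_set(5, &mask);
em_queue_group_modify(test_group, &mask, /* + notif args */ ...);
```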
E.g. start on 8 cores:
> sudo ./build/queue_group -c 0xf0f0 -n 4 -- -p
APPL: notif_start_done(line:518) - EM-core00:
Created test queue:266 type:PARALLEL(2) queue group:1 (name:"QGrp004")
APPL: notif_event_group_data_done(line:676) - EM-core02:
****************************************
Received 62464 events on Q:266:
QueueGroup:1, Curr Coremask:0x33
Now Modifying:
QueueGroup:1, New Coremask:0x34
****************************************
APPL: notif_event_group_data_done(line:676) - EM-core02:
****************************************
Received 62464 events on Q:266:
QueueGroup:1, Curr Coremask:0x66
Now Modifying:
QueueGroup:1, New Coremask:0x67
****************************************
...
APPL: notif_event_group_data_done(line:676) - EM-core07:
****************************************
Received 62464 events on Q:266:
QueueGroup:1, Curr Coremask:0xff
Now Modifying:
QueueGroup:1, New Coremask:0x0
****************************************
APPL: notif_queue_group_modify_done(line:552) - EM-core05:
*************************************
All cores removed from QueueGroup!
*************************************
APPL: notif_queue_group_modify_done(line:573) - EM-core05:
Deleting test queue:266, Qgrp ID:1 (name:"QGrp004")
APPL: receive_event_notif(line:397) - EM-core04:
***********************************************
!!! Restarting test !!!
***********************************************
...
-----------------------------------------------------------------------------
A6.1) Queue: queue_types_ag
em-dpdk/programs/example/queue/queue_types_ag.c
-----------------------------------------------------------------------------
Event Machine Queue Types test example with included atomic groups(=_ag).
The test creates several EO-pairs and sends events between the queues in each
pair. Each EO has an input queue (of type atomic(A), parallel(P) or
parallel-ordered(PO)) or, in the case of atomic groups(AG), three(3) atomic
input queues that belong to the same atomic group but have different
priorities. The events sent between the queues of an EO-pair are counted and
statistics for each pair type are printed. If the queues in an EO-pair
preserve ordering, this is also verified.
E.g. start on 6 cores:
> sudo ./build/queue_types_ag -c 0x0e0e -n 4 -- -p
The test prints event count statistics from each core:
Stat Core-05: Count/PairType \
A-A:1470184 P-P:1472316 PO-PO:1480740 P-A:1444621 PO-A:1333477 PO-P:1212570 \
AG-AG:3296741 AG-A:1591504 AG-P:1515778 AG-PO:1959302 (cycles/event:457.25)
, where A=atomic queue, P=parallel queue, PO=parallel-ordered queue
and AG=atomic group (consisting of 3 atomic queues each)
NOTE: The number of events in the AG-xx cases may be higher due to the higher
number of queues and events in an atomic group vs a single queue.
-----------------------------------------------------------------------------
A6.2) Queue: ordered
em-dpdk/programs/example/queue/ordered.c
-----------------------------------------------------------------------------
Event Machine Parallel-Ordered queue test
E.g. start on 8 cores:
> sudo ./build/ordered -n 4 -c 0x3c3c -- -t
...
EO 249 starting.
Entering the event dispatch loop() on EM-core 0
...
Entering the event dispatch loop() on EM-core 5
Entering the event dispatch loop() on EM-core 1
cycles per event 742.56 @2693.58 MHz (core-01 1)
cycles per event 743.75 @2693.58 MHz (core-00 1)
cycles per event 743.92 @2693.58 MHz (core-05 1)
cycles per event 746.63 @2693.58 MHz (core-03 1)
cycles per event 747.42 @2693.58 MHz (core-04 1)
cycles per event 748.22 @2693.58 MHz (core-02 1)
cycles per event 748.31 @2693.58 MHz (core-07 1)
cycles per event 749.51 @2693.58 MHz (core-06 1)
cycles per event 743.92 @2693.58 MHz (core-01 2)
cycles per event 744.81 @2693.58 MHz (core-00 2)
cycles per event 746.15 @2693.58 MHz (core-05 2)
cycles per event 745.51 @2693.58 MHz (core-03 2)
cycles per event 745.37 @2693.58 MHz (core-04 2)
cycles per event 745.14 @2693.58 MHz (core-07 2)
cycles per event 749.82 @2693.58 MHz (core-02 2)
cycles per event 749.28 @2693.58 MHz (core-06 2)
...
-----------------------------------------------------------------------------
A8) dispatcher_callback -
em-dpdk/programs/example/dispatcher/dispatcher_callback.c
-----------------------------------------------------------------------------
Event Machine dispatcher user-callback-function test example.
Based on the hello world example. Adds dispatcher enter and exit callback
functions which are called right before and after the EO receive function.
E.g. start on 2 cores:
> sudo ./build/dispatcher_callback -c 0x3 -n 4 -- -t
...
Test start EO A: EO 3, queue:258.
Test start EO B: EO 5, queue:259.
Entering the event dispatch loop() on EM-core 0
++ Dispatcher enter callback 1 for EO: 3 (EO A) Queue: 258 on core 00. Event seq: 0.
++ Dispatcher enter callback 2 for EO: 3 (EO A) Queue: 258 on core 00. Event seq: 0.
Ping from EO A! Queue: 258 on core 00. Event seq: 0.
Entering the event dispatch loop() on EM-core 1
++ Dispatcher enter callback 1 for EO: 5 (EO B) Queue: 259 on core 01. Event seq: 1.
++ Dispatcher enter callback 2 for EO: 5 (EO B) Queue: 259 on core 01. Event seq: 1.
Ping from EO B! Queue: 259 on core 01. Event seq: 1.
-- Dispatcher exit callback 1 for EO: 3
-- Dispatcher exit callback 2 for EO: 3
-- Dispatcher exit callback 1 for EO: 5
-- Dispatcher exit callback 2 for EO: 5
++ Dispatcher enter callback 1 for EO: 3 (EO A) Queue: 258 on core 00. Event seq: 2.
++ Dispatcher enter callback 2 for EO: 3 (EO A) Queue: 258 on core 00. Event seq: 2.
Ping from EO A! Queue: 258 on core 00. Event seq: 2.
-- Dispatcher exit callback 1 for EO: 3
-- Dispatcher exit callback 2 for EO: 3
++ Dispatcher enter callback 1 for EO: 5 (EO B) Queue: 259 on core 01. Event seq: 3.
++ Dispatcher enter callback 2 for EO: 5 (EO B) Queue: 259 on core 01. Event seq: 3.
...
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
B) Performance test examples - do not require any external
input or I/O, just compile and run to get output/results.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> cd em-dpdk/programs/performance/
> make real_clean && make em_clean && make clean && make
-----------------------------------------------------------------------------
B1) Performance: pairs -
em-dpdk/programs/performance/pairs.c
-----------------------------------------------------------------------------
Measures the average cycles consumed during an event send-sched-receive loop
for a certain number of EO pairs in the system. The test runs a number of EO
pairs that send ping-pong events to each other. Depending on the test
dynamics (e.g. a single burst in an atomic queue) only one EO of a pair
might be active at a time.
Note that the cycles per event increase with a larger core count (try e.g. 4
vs 8 cores). Also note that the way EM-cores are mapped onto HW threads
matters: using HW threads on the same CPU core (HW multithreading) differs
from using HW threads on separate cores.
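The EM-core count is simply the number of set bits in the -c coremask. A
small stand-alone helper (illustrative only, not part of EM or DPDK) to
check a mask before launching:

```c
/* Count the EM-cores (set bits) selected by a -c coremask,
 * using the bit-clearing trick: x & (x - 1) clears the lowest
 * set bit of x. */
static int emcore_count(unsigned long long coremask)
{
	int n = 0;

	while (coremask) {
		coremask &= coremask - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}
```

E.g. 0xf and 0x0303 both select 4 EM-cores; 0xfefe selects 14.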
Sample output from runs on a CPU with 8 cores / 16 HW threads, max coremask: 0xffff
Run on 4 EM-cores:
> sudo ./build/pairs -c 0xf -n 4 -- -p (4 EM-cores (=processes) mapped to
4 HW threads on separate cores)
> sudo ./build/pairs -c 0x0303 -n 4 -- -p (4 EM-cores (=processes) mapped to
4 HW threads on 2 cores,
2 HW threads per core)
or
> sudo ./build/pairs -c 0xf -n 4 -- -t (4 EM-cores (=pthreads) mapped to
4 HW threads on separate cores)
> sudo ./build/pairs -c 0x0303 -n 4 -- -t (4 EM-cores (=pthreads) mapped to
4 HW threads on 2 cores,
2 HW threads per core)
> sudo ./build/pairs -c 0xf -n 4 -- -p
...
cycles per event 285.27 @2693.59 MHz (core-03 2)
cycles per event 285.82 @2693.59 MHz (core-01 2)
cycles per event 285.80 @2693.59 MHz (core-00 2)
cycles per event 285.84 @2693.59 MHz (core-02 2)
cycles per event 286.22 @2693.59 MHz (core-03 3)
cycles per event 286.73 @2693.59 MHz (core-01 3)
cycles per event 286.65 @2693.59 MHz (core-00 3)
cycles per event 286.61 @2693.59 MHz (core-02 3)
...
Run on 8 EM-cores:
> sudo ./build/pairs -c 0xff -n 4 -- -p (8 EM-cores (=processes) mapped to
8 HW threads on separate cores)
> sudo ./build/pairs -c 0x0f0f -n 4 -- -p (8 EM-cores (=processes) mapped to
8 HW threads on 4 cores,
2 HW threads per core)
or
> sudo ./build/pairs -c 0xff -n 4 -- -t (8 EM-cores (=pthreads) mapped to
8 HW threads on separate cores)
> sudo ./build/pairs -c 0x0f0f -n 4 -- -t (8 EM-cores (=pthreads) mapped to
8 HW threads on 4 cores,
2 HW threads per core)
or e.g.
> sudo ./build/pairs -c 0xff0 -n 4 -- -p
...
cycles per event 317.40 @2693.59 MHz (core-03 2)
cycles per event 317.55 @2693.59 MHz (core-04 2)
cycles per event 318.31 @2693.59 MHz (core-02 2)
cycles per event 318.61 @2693.59 MHz (core-05 2)
cycles per event 319.26 @2693.59 MHz (core-06 2)
cycles per event 319.32 @2693.59 MHz (core-01 2)
cycles per event 319.96 @2693.59 MHz (core-07 2)
cycles per event 321.02 @2693.59 MHz (core-00 2)
cycles per event 317.55 @2693.59 MHz (core-03 3)
cycles per event 317.62 @2693.59 MHz (core-04 3)
cycles per event 318.28 @2693.59 MHz (core-02 3)
cycles per event 318.52 @2693.59 MHz (core-05 3)
cycles per event 319.17 @2693.59 MHz (core-06 3)
cycles per event 319.14 @2693.59 MHz (core-01 3)
cycles per event 320.11 @2693.59 MHz (core-07 3)
...
-----------------------------------------------------------------------------
B2) Performance: queue_groups -
em-dpdk/programs/performance/queue_groups.c
-----------------------------------------------------------------------------
Event Machine queue group performance test.
Measures the average cycles consumed during an event send-sched-receive loop
for a certain number of queue groups in the system. The test increases the
number of groups for each measurement round and prints the results.
Plot the cycles/event to get an idea of how the system scales with an
increasing number of queue groups.
E.g. start on 14 cores:
> sudo ./build/queue_groups -c 0xfefe -n 4 -- -p
...
Creating 1 new queue group(s)
New group name: grp_0
Entering the event dispatch loop() on EM-core 0
...
New queue group(s) ready
Queues: 256, Queue groups: 1
Cycles/Event: 376.73 @2693.59 MHz(0)
Cycles/Event: 376.47 @2693.59 MHz(1)
Cycles/Event: 376.66 @2693.59 MHz(2)
Cycles/Event: 376.40 @2693.59 MHz(3)
Cycles/Event: 376.55 @2693.59 MHz(4)
Cycles/Event: 376.60 @2693.59 MHz(5)
Cycles/Event: 376.38 @2693.59 MHz(6)
Cycles/Event: 376.77 @2693.59 MHz(7)
Creating 1 new queue group(s)
New group name: grp_1
New queue group(s) ready
Queues: 256, Queue groups: 2
Cycles/Event: 381.33 @2693.59 MHz(8)
Cycles/Event: 381.28 @2693.59 MHz(9)
Cycles/Event: 381.38 @2693.59 MHz(10)
Cycles/Event: 381.18 @2693.59 MHz(11)
Cycles/Event: 381.35 @2693.59 MHz(12)
Cycles/Event: 381.29 @2693.59 MHz(13)
Cycles/Event: 381.34 @2693.59 MHz(14)
Cycles/Event: 381.33 @2693.59 MHz(15)
...
-----------------------------------------------------------------------------
B3) Performance: queues -
em-dpdk/programs/performance/queues.c
-----------------------------------------------------------------------------
Event Machine performance test.
Measures the average cycles consumed during an event send-sched-receive loop
for a certain number of queues and events in the system. The test increases
the number of queues[+events] for each measurement round and prints the
results.
Plot the cycles/event to get an idea of how the system scales with an
increasing number of queues.
E.g. start on 14 cores:
> sudo ./build/queues -c 0xfefe -n 4 -- -t
Create new queues: 8
New Qs created:8 First:288 Last:295
Number of queues: 8
Number of events: 4096
Cycles/Event: 1231 Latency Hi: 97392 Lo: 265086 @2694MHz(0)
Cycles/Event: 1233 Latency Hi: 98481 Lo: 264635 @2694MHz(1)
Cycles/Event: 1232 Latency Hi: 98118 Lo: 264811 @2694MHz(2)
Cycles/Event: 1232 Latency Hi: 96143 Lo: 267460 @2694MHz(3)
Cycles/Event: 1232 Latency Hi: 95945 Lo: 266943 @2694MHz(4)
Cycles/Event: 1234 Latency Hi: 97214 Lo: 266064 @2694MHz(5)
Cycles/Event: 1234 Latency Hi: 97296 Lo: 266196 @2694MHz(6)
Cycles/Event: 1235 Latency Hi: 98790 Lo: 265076 @2694MHz(7)
Create new queues: 8
New Qs created:8 First:7065 Last:7072
Number of queues: 16
Number of events: 4096
Cycles/Event: 684 Latency Hi: 47877 Lo: 153043 @2694MHz(8)
Cycles/Event: 684 Latency Hi: 47813 Lo: 153240 @2694MHz(9)
Cycles/Event: 684 Latency Hi: 46726 Lo: 154437 @2694MHz(10)
Cycles/Event: 685 Latency Hi: 47444 Lo: 153830 @2694MHz(11)
Cycles/Event: 685 Latency Hi: 47041 Lo: 154228 @2694MHz(12)
Cycles/Event: 684 Latency Hi: 47447 Lo: 153749 @2694MHz(13)
Cycles/Event: 684 Latency Hi: 47807 Lo: 153285 @2694MHz(14)
Cycles/Event: 684 Latency Hi: 47777 Lo: 153325 @2694MHz(15)
Create new queues: 16
New Qs created:16 First:5937 Last:5952
Number of queues: 32
Number of events: 4096
Cycles/Event: 594 Latency Hi: 22203 Lo: 152451 @2694MHz(16)
Cycles/Event: 594 Latency Hi: 21952 Lo: 152730 @2694MHz(17)
Cycles/Event: 594 Latency Hi: 22098 Lo: 152488 @2694MHz(18)
Cycles/Event: 594 Latency Hi: 21978 Lo: 152711 @2694MHz(19)
Cycles/Event: 594 Latency Hi: 22006 Lo: 152633 @2694MHz(20)
Cycles/Event: 594 Latency Hi: 21890 Lo: 152832 @2694MHz(21)
Cycles/Event: 594 Latency Hi: 21550 Lo: 153074 @2694MHz(22)
Cycles/Event: 594 Latency Hi: 21934 Lo: 152824 @2694MHz(23)
Create new queues: 32
New Qs created:32 First:5953 Last:5984
Number of queues: 64
Number of events: 4096
Cycles/Event: 545 Latency Hi: 26066 Lo: 134082 @2694MHz(24)
Cycles/Event: 545 Latency Hi: 25869 Lo: 134325 @2694MHz(25)
Cycles/Event: 545 Latency Hi: 25541 Lo: 134630 @2694MHz(26)
Cycles/Event: 545 Latency Hi: 25873 Lo: 134357 @2694MHz(27)
Cycles/Event: 545 Latency Hi: 25744 Lo: 134471 @2694MHz(28)
Cycles/Event: 545 Latency Hi: 26052 Lo: 134205 @2694MHz(29)
Cycles/Event: 545 Latency Hi: 26033 Lo: 134228 @2694MHz(30)
Cycles/Event: 545 Latency Hi: 25995 Lo: 134209 @2694MHz(31)
Create new queues: 64
New Qs created:64 First:5985 Last:6048
Number of queues: 128
Number of events: 4096
Cycles/Event: 516 Latency Hi: 32354 Lo: 119338 @2694MHz(32)
Cycles/Event: 516 Latency Hi: 32347 Lo: 119619 @2694MHz(33)
Cycles/Event: 516 Latency Hi: 32420 Lo: 119448 @2694MHz(34)
Cycles/Event: 516 Latency Hi: 32363 Lo: 119505 @2694MHz(35)
Cycles/Event: 516 Latency Hi: 32343 Lo: 119556 @2694MHz(36)
Cycles/Event: 516 Latency Hi: 32307 Lo: 119669 @2694MHz(37)
Cycles/Event: 516 Latency Hi: 32453 Lo: 119641 @2694MHz(38)
Cycles/Event: 516 Latency Hi: 32261 Lo: 119761 @2694MHz(39)
Create new queues: 128
New Qs created:128 First:1989 Last:2116
Number of queues: 256
Number of events: 4096
Cycles/Event: 508 Latency Hi: 42147 Lo: 107600 @2694MHz(40)
Cycles/Event: 508 Latency Hi: 42045 Lo: 107483 @2694MHz(41)
Cycles/Event: 508 Latency Hi: 42216 Lo: 107172 @2694MHz(42)
Cycles/Event: 508 Latency Hi: 41878 Lo: 107425 @2694MHz(43)
Cycles/Event: 508 Latency Hi: 42342 Lo: 107420 @2694MHz(44)
Cycles/Event: 508 Latency Hi: 42318 Lo: 107284 @2694MHz(45)
Cycles/Event: 508 Latency Hi: 42119 Lo: 107532 @2694MHz(46)
Cycles/Event: 508 Latency Hi: 42251 Lo: 107412 @2694MHz(47)
Create new queues: 256
New Qs created:256 First:6049 Last:6304
Number of queues: 512
Number of events: 4096
Cycles/Event: 499 Latency Hi: 44833 Lo: 102195 @2694MHz(48)
Cycles/Event: 499 Latency Hi: 44630 Lo: 102181 @2694MHz(49)
Cycles/Event: 499 Latency Hi: 44766 Lo: 102138 @2694MHz(50)
Cycles/Event: 499 Latency Hi: 44586 Lo: 102485 @2694MHz(51)
Cycles/Event: 499 Latency Hi: 44638 Lo: 102305 @2694MHz(52)
Cycles/Event: 499 Latency Hi: 44733 Lo: 102219 @2694MHz(53)
Cycles/Event: 499 Latency Hi: 44403 Lo: 102437 @2694MHz(54)
Cycles/Event: 499 Latency Hi: 44414 Lo: 102218 @2694MHz(55)
Create new queues: 512
New Qs created:512 First:1425 Last:1936
Number of queues: 1024
Number of events: 4096
Cycles/Event: 494 Latency Hi: 44398 Lo: 101154 @2694MHz(56)
Cycles/Event: 494 Latency Hi: 44257 Lo: 101284 @2694MHz(57)
Cycles/Event: 494 Latency Hi: 44328 Lo: 101202 @2694MHz(58)
Cycles/Event: 494 Latency Hi: 44455 Lo: 101236 @2694MHz(59)
Cycles/Event: 494 Latency Hi: 44371 Lo: 101018 @2694MHz(60)
Cycles/Event: 494 Latency Hi: 44583 Lo: 100964 @2694MHz(61)
Cycles/Event: 494 Latency Hi: 44230 Lo: 101406 @2694MHz(62)
Cycles/Event: 494 Latency Hi: 44814 Lo: 100970 @2694MHz(63)
Cycles/Event: 494 Latency Hi: 44541 Lo: 101066 @2694MHz(64)
Cycles/Event: 494 Latency Hi: 44331 Lo: 101114 @2694MHz(65)
Cycles/Event: 494 Latency Hi: 44396 Lo: 101000 @2694MHz(66)
Cycles/Event: 494 Latency Hi: 44399 Lo: 100963 @2694MHz(67)
Cycles/Event: 494 Latency Hi: 44578 Lo: 101152 @2694MHz(68)
Cycles/Event: 494 Latency Hi: 44446 Lo: 100920 @2694MHz(69)
Cycles/Event: 494 Latency Hi: 44362 Lo: 100899 @2694MHz(70)
Cycles/Event: 494 Latency Hi: 44608 Lo: 101170 @2694MHz(71)
...
-----------------------------------------------------------------------------
B4) Performance: atomic_processing_end -
em-dpdk/programs/performance/atomic_processing_end.c
-----------------------------------------------------------------------------
Event Machine em_atomic_processing_end() example
Measures the average cycles consumed during an event send-sched-receive loop
for an EO pair using atomic queues and alternating between calling
em_atomic_processing_end() and not calling it.
Each EO's receive function will do some dummy work for each received event.
em_atomic_processing_end() is called before the dummy work is processed to
allow another core to continue handling events from the same atomic queue.
With a low number of queues and long per-event processing times, not calling
em_atomic_processing_end() will limit throughput and worsen the cycles/event
result.
For comparison, results are shown both with and without calls to
em_atomic_processing_end().
Note: Calling em_atomic_processing_end() will normally give worse performance
except in cases when atomic event processing becomes a bottleneck by blocking
other cores from doing their work (as this test tries to demonstrate).
E.g. start on 8 cores:
> sudo ./build/atomic_processing_end -c 0x0ff0 -n 4 -- -t
...
normal atomic processing: 7292 cycles/event @2804.11 MHz (core-00 1)
normal atomic processing: 13371 cycles/event @2800.25 MHz (core-05 1)
normal atomic processing: 13946 cycles/event @2800.25 MHz (core-06 1)
normal atomic processing: 15632 cycles/event @2800.25 MHz (core-02 1)
normal atomic processing: 14415 cycles/event @2800.25 MHz (core-04 1)
normal atomic processing: 14921 cycles/event @2800.25 MHz (core-07 1)
normal atomic processing: 15545 cycles/event @2800.16 MHz (core-03 1)
normal atomic processing: 19156 cycles/event @2800.16 MHz (core-01 1)
em_atomic_processing_end(): 5553 cycles/event @2800.25 MHz (core-04 2)
em_atomic_processing_end(): 5582 cycles/event @2800.16 MHz (core-03 2)
em_atomic_processing_end(): 5589 cycles/event @2800.16 MHz (core-01 2)
em_atomic_processing_end(): 5606 cycles/event @2800.25 MHz (core-02 2)
em_atomic_processing_end(): 5621 cycles/event @2800.25 MHz (core-05 2)
em_atomic_processing_end(): 5626 cycles/event @2800.25 MHz (core-06 2)
em_atomic_processing_end(): 5643 cycles/event @2804.11 MHz (core-00 2)
em_atomic_processing_end(): 5723 cycles/event @2800.25 MHz (core-07 2)
normal atomic processing: 6800 cycles/event @2804.11 MHz (core-00 3)
normal atomic processing: 12037 cycles/event @2800.25 MHz (core-05 3)
normal atomic processing: 11696 cycles/event @2800.25 MHz (core-06 3)
normal atomic processing: 13019 cycles/event @2800.25 MHz (core-04 3)
normal atomic processing: 13155 cycles/event @2800.25 MHz (core-07 3)
normal atomic processing: 13315 cycles/event @2800.25 MHz (core-02 3)
normal atomic processing: 16736 cycles/event @2800.16 MHz (core-03 3)
normal atomic processing: 18215 cycles/event @2800.16 MHz (core-01 3)
em_atomic_processing_end(): 5554 cycles/event @2800.25 MHz (core-04 4)
em_atomic_processing_end(): 5580 cycles/event @2800.16 MHz (core-03 4)
em_atomic_processing_end(): 5584 cycles/event @2800.16 MHz (core-01 4)
em_atomic_processing_end(): 5603 cycles/event @2800.25 MHz (core-02 4)
em_atomic_processing_end(): 5616 cycles/event @2800.25 MHz (core-05 4)
em_atomic_processing_end(): 5625 cycles/event @2800.25 MHz (core-06 4)
em_atomic_processing_end(): 5640 cycles/event @2804.11 MHz (core-00 4)
em_atomic_processing_end(): 5722 cycles/event @2800.25 MHz (core-07 4)
...
-----------------------------------------------------------------------------
B5) Performance: queues_unscheduled -
em-dpdk/programs/performance/queues_unscheduled.c
-----------------------------------------------------------------------------
Event Machine performance test
Based on the queues.c test, extended to also use unscheduled queues.
Measures the average cycles consumed during an event send-sched-receive loop
for a certain number of queues and events in the system. The test increases
the number of queues[+events] for each measurement round and prints the
results. Each normal scheduled queue is accompanied by an unscheduled queue
that is dequeued from at each event receive. Both the received event and the
dequeued event are sent to the next queue at the end of the receive function.
The measured cycles contain the scheduled event send-sched-receive cycles as
well as the unscheduled event dequeue cycles.
Plot the cycles/event to get an idea of how the system scales with an
increasing number of queues.
E.g. run on 14 cores:
> sudo ./build/queues_unscheduled -c 0xfefe -n 4 -- -t
...
Create new queues - scheduled:8 + unscheduled:8
Number of queues: 8 + 8
Number of events: 2048 + 2048
Cycles/Event: 834 Latency Hi: 35170 Lo: 88167 @2694MHz(0)
Cycles/Event: 834 Latency Hi: 34631 Lo: 88448 @2694MHz(1)
Cycles/Event: 834 Latency Hi: 33202 Lo: 90042 @2694MHz(2)
Cycles/Event: 835 Latency Hi: 31936 Lo: 91393 @2694MHz(3)
Cycles/Event: 835 Latency Hi: 32275 Lo: 90915 @2694MHz(4)
Cycles/Event: 835 Latency Hi: 31905 Lo: 91232 @2694MHz(5)
Cycles/Event: 835 Latency Hi: 32350 Lo: 90583 @2694MHz(6)
Cycles/Event: 835 Latency Hi: 36971 Lo: 86235 @2694MHz(7)
Create new queues - scheduled:8 + unscheduled:8
Number of queues: 16 + 16
Number of events: 2048 + 2048
Cycles/Event: 469 Latency Hi: 23507 Lo: 45394 @2694MHz(8)
Cycles/Event: 469 Latency Hi: 22632 Lo: 46299 @2694MHz(9)
Cycles/Event: 469 Latency Hi: 22975 Lo: 45997 @2694MHz(10)
Cycles/Event: 469 Latency Hi: 22952 Lo: 46006 @2694MHz(11)
Cycles/Event: 469 Latency Hi: 23571 Lo: 45372 @2694MHz(12)
Cycles/Event: 469 Latency Hi: 23092 Lo: 45846 @2694MHz(13)
Cycles/Event: 469 Latency Hi: 23005 Lo: 45923 @2694MHz(14)
Cycles/Event: 469 Latency Hi: 23251 Lo: 45716 @2694MHz(15)
Create new queues - scheduled:16 + unscheduled:16
Number of queues: 32 + 32
Number of events: 2048 + 2048
Cycles/Event: 398 Latency Hi: 15282 Lo: 43075 @2694MHz(16)
Cycles/Event: 398 Latency Hi: 15538 Lo: 42841 @2694MHz(17)
Cycles/Event: 398 Latency Hi: 15513 Lo: 42815 @2694MHz(18)
Cycles/Event: 398 Latency Hi: 15506 Lo: 42844 @2694MHz(19)
Cycles/Event: 398 Latency Hi: 15500 Lo: 42806 @2694MHz(20)
Cycles/Event: 398 Latency Hi: 15186 Lo: 43176 @2694MHz(21)
Cycles/Event: 398 Latency Hi: 15419 Lo: 42912 @2694MHz(22)
Cycles/Event: 398 Latency Hi: 15498 Lo: 42874 @2694MHz(23)
Create new queues - scheduled:32 + unscheduled:32
Number of queues: 64 + 64
Number of events: 2048 + 2048
Cycles/Event: 370 Latency Hi: 5584 Lo: 48616 @2694MHz(24)
Cycles/Event: 370 Latency Hi: 5606 Lo: 48651 @2694MHz(25)
Cycles/Event: 369 Latency Hi: 5605 Lo: 48596 @2694MHz(26)
Cycles/Event: 370 Latency Hi: 5630 Lo: 48615 @2694MHz(27)
Cycles/Event: 370 Latency Hi: 5627 Lo: 48590 @2694MHz(28)
Cycles/Event: 369 Latency Hi: 5578 Lo: 48557 @2694MHz(29)
Cycles/Event: 370 Latency Hi: 5595 Lo: 48633 @2694MHz(30)
Cycles/Event: 370 Latency Hi: 5600 Lo: 48623 @2694MHz(31)
Create new queues - scheduled:64 + unscheduled:64
Number of queues: 128 + 128
Number of events: 2048 + 2048
Cycles/Event: 352 Latency Hi: 10616 Lo: 40956 @2694MHz(32)
Cycles/Event: 352 Latency Hi: 10633 Lo: 40930 @2694MHz(33)
Cycles/Event: 352 Latency Hi: 10644 Lo: 40931 @2694MHz(34)
Cycles/Event: 352 Latency Hi: 10648 Lo: 40983 @2694MHz(35)
Cycles/Event: 352 Latency Hi: 10646 Lo: 40954 @2694MHz(36)
Cycles/Event: 352 Latency Hi: 10687 Lo: 40942 @2694MHz(37)
Cycles/Event: 352 Latency Hi: 10634 Lo: 40992 @2694MHz(38)
Cycles/Event: 352 Latency Hi: 10671 Lo: 40989 @2694MHz(39)
Create new queues - scheduled:128 + unscheduled:128
Number of queues: 256 + 256
Number of events: 2048 + 2048
Cycles/Event: 344 Latency Hi: 12443 Lo: 38042 @2694MHz(40)
Cycles/Event: 344 Latency Hi: 12497 Lo: 38034 @2694MHz(41)
Cycles/Event: 344 Latency Hi: 12541 Lo: 37943 @2694MHz(42)
Cycles/Event: 344 Latency Hi: 12532 Lo: 37943 @2694MHz(43)
Cycles/Event: 344 Latency Hi: 12515 Lo: 37984 @2694MHz(44)
Cycles/Event: 344 Latency Hi: 12529 Lo: 37974 @2694MHz(45)
Cycles/Event: 344 Latency Hi: 12460 Lo: 38014 @2694MHz(46)
Cycles/Event: 344 Latency Hi: 12529 Lo: 37952 @2694MHz(47)
Create new queues - scheduled:256 + unscheduled:256
Number of queues: 512 + 512
Number of events: 2048 + 2048
Cycles/Event: 336 Latency Hi: 12735 Lo: 36438 @2694MHz(48)
Cycles/Event: 335 Latency Hi: 12765 Lo: 36436 @2694MHz(49)
Cycles/Event: 335 Latency Hi: 12743 Lo: 36427 @2694MHz(50)
Cycles/Event: 335 Latency Hi: 12775 Lo: 36437 @2694MHz(51)
Cycles/Event: 336 Latency Hi: 12762 Lo: 36419 @2694MHz(52)
Cycles/Event: 335 Latency Hi: 12722 Lo: 36490 @2694MHz(53)
Cycles/Event: 335 Latency Hi: 12737 Lo: 36426 @2694MHz(54)
Cycles/Event: 335 Latency Hi: 12789 Lo: 36416 @2694MHz(55)
Cycles/Event: 335 Latency Hi: 12725 Lo: 36444 @2694MHz(56)
Cycles/Event: 335 Latency Hi: 12812 Lo: 36422 @2694MHz(57)
Cycles/Event: 336 Latency Hi: 12763 Lo: 36484 @2694MHz(58)
Cycles/Event: 336 Latency Hi: 12765 Lo: 36535 @2694MHz(59)
Cycles/Event: 335 Latency Hi: 12827 Lo: 36467 @2694MHz(60)
Cycles/Event: 336 Latency Hi: 12789 Lo: 36475 @2694MHz(61)
Cycles/Event: 335 Latency Hi: 12798 Lo: 36424 @2694MHz(62)
Cycles/Event: 336 Latency Hi: 12824 Lo: 36429 @2694MHz(63)
Cycles/Event: 335 Latency Hi: 12773 Lo: 36481 @2694MHz(64)
...
-----------------------------------------------------------------------------
B6) Performance: send_multi -
em-dpdk/programs/performance/send_multi.c
-----------------------------------------------------------------------------
Event Machine performance test for burst sending of events.
(based on the queues_unscheduled.c test, extended to use burst sending
of events into the next queue; see em_send_multi() &
em_queue_dequeue_multi())
Measures the average cycles consumed during an event send-sched-receive loop
for a certain number of queues and events in the system. The test increases
the number of queues[+events] for each measurement round and prints the
results. The test will stop if the maximum number of supported queues by the
system is reached.
Each normal scheduled queue is accompanied by an unscheduled queue. Received
events are stored until a suitable length event burst is available, then the
whole burst is forwarded to the next queue in the chain using
em_send_multi(). Each stored burst is accompanied by another burst taken
from the associated unscheduled queue.
Both the received scheduled events and the unscheduled dequeued events are
sent as bursts to the next queue at the end of the receive function.
The measured cycles contain the scheduled event send_multi-sched-receive
cycles as well as the unscheduled event multi_dequeue.
Plot the cycles/event to get an idea of how the system scales with an
increasing number of queues.
E.g. run on 14 cores:
> sudo ./build/send_multi -c 0xfefe -n 4 -- -t
...
Create new queues - scheduled:8 + unscheduled:8
Number of queues: 8 + 8
Number of events: 2048 + 2048
Cycles/Event: 558 Latency Hi: 37929 Lo: 44146 @2694MHz(0)
Cycles/Event: 559 Latency Hi: 37239 Lo: 44931 @2694MHz(1)
Cycles/Event: 559 Latency Hi: 37720 Lo: 44746 @2694MHz(2)
Cycles/Event: 559 Latency Hi: 38153 Lo: 44182 @2694MHz(3)
Cycles/Event: 560 Latency Hi: 41254 Lo: 41385 @2694MHz(4)
Cycles/Event: 560 Latency Hi: 40871 Lo: 41616 @2694MHz(5)
Cycles/Event: 559 Latency Hi: 41417 Lo: 41241 @2694MHz(6)
Cycles/Event: 559 Latency Hi: 41480 Lo: 40869 @2694MHz(7)
Create new queues - scheduled:8 + unscheduled:8
Number of queues: 16 + 16
Number of events: 2048 + 2048
Cycles/Event: 333 Latency Hi: 6396 Lo: 42390 @2694MHz(8)
Cycles/Event: 333 Latency Hi: 6379 Lo: 42495 @2694MHz(9)
Cycles/Event: 333 Latency Hi: 6356 Lo: 42537 @2694MHz(10)
Cycles/Event: 333 Latency Hi: 6575 Lo: 42293 @2694MHz(11)
Cycles/Event: 333 Latency Hi: 6365 Lo: 42496 @2694MHz(12)
Cycles/Event: 333 Latency Hi: 6425 Lo: 42451 @2694MHz(13)
Cycles/Event: 333 Latency Hi: 6405 Lo: 42506 @2694MHz(14)
Cycles/Event: 333 Latency Hi: 6347 Lo: 42585 @2694MHz(15)
Create new queues - scheduled:16 + unscheduled:16
Number of queues: 32 + 32
Number of events: 2048 + 2048
Cycles/Event: 303 Latency Hi: 5074 Lo: 39313 @2694MHz(16)
Cycles/Event: 303 Latency Hi: 5056 Lo: 39370 @2694MHz(17)
Cycles/Event: 303 Latency Hi: 5238 Lo: 39199 @2694MHz(18)
Cycles/Event: 303 Latency Hi: 5165 Lo: 39279 @2694MHz(19)
Cycles/Event: 303 Latency Hi: 5107 Lo: 39288 @2694MHz(20)
Cycles/Event: 302 Latency Hi: 5088 Lo: 39278 @2694MHz(21)
Cycles/Event: 302 Latency Hi: 5289 Lo: 39084 @2694MHz(22)
Cycles/Event: 302 Latency Hi: 5219 Lo: 39134 @2694MHz(23)
Create new queues - scheduled:32 + unscheduled:32
Number of queues: 64 + 64
Number of events: 2048 + 2048
Cycles/Event: 284 Latency Hi: 8290 Lo: 33378 @2694MHz(24)
Cycles/Event: 284 Latency Hi: 8226 Lo: 33477 @2694MHz(25)
Cycles/Event: 284 Latency Hi: 8258 Lo: 33430 @2694MHz(26)
Cycles/Event: 284 Latency Hi: 8267 Lo: 33436 @2694MHz(27)
Cycles/Event: 284 Latency Hi: 8239 Lo: 33458 @2694MHz(28)
Cycles/Event: 284 Latency Hi: 8269 Lo: 33431 @2694MHz(29)
Cycles/Event: 284 Latency Hi: 8157 Lo: 33557 @2694MHz(30)
Cycles/Event: 284 Latency Hi: 8229 Lo: 33494 @2694MHz(31)
Create new queues - scheduled:64 + unscheduled:64
Number of queues: 128 + 128
Number of events: 2048 + 2048
Cycles/Event: 274 Latency Hi: 9716 Lo: 30444 @2694MHz(32)
Cycles/Event: 274 Latency Hi: 9720 Lo: 30451 @2694MHz(33)
Cycles/Event: 274 Latency Hi: 9726 Lo: 30460 @2694MHz(34)
Cycles/Event: 274 Latency Hi: 9721 Lo: 30504 @2694MHz(35)
Cycles/Event: 274 Latency Hi: 9733 Lo: 30471 @2694MHz(36)
Cycles/Event: 274 Latency Hi: 9751 Lo: 30470 @2694MHz(37)
Cycles/Event: 274 Latency Hi: 9743 Lo: 30477 @2694MHz(38)
Cycles/Event: 274 Latency Hi: 9718 Lo: 30468 @2694MHz(39)
Create new queues - scheduled:128 + unscheduled:128
Number of queues: 256 + 256
Number of events: 2048 + 2048
Cycles/Event: 270 Latency Hi: 9830 Lo: 29766 @2694MHz(40)
Cycles/Event: 270 Latency Hi: 9834 Lo: 29765 @2694MHz(41)
Cycles/Event: 270 Latency Hi: 9848 Lo: 29792 @2694MHz(42)
Cycles/Event: 270 Latency Hi: 9822 Lo: 29775 @2694MHz(43)
Cycles/Event: 270 Latency Hi: 9798 Lo: 29801 @2694MHz(44)
Cycles/Event: 270 Latency Hi: 9838 Lo: 29759 @2694MHz(45)
Cycles/Event: 270 Latency Hi: 9797 Lo: 29782 @2694MHz(46)
Cycles/Event: 270 Latency Hi: 9834 Lo: 29782 @2694MHz(47)
Create new queues - scheduled:256 + unscheduled:256
EAL: memzone_reserve_aligned_thread_unsafe(): No more room in config
RING: Cannot reserve memory
Unable to create more queues
Test finished
Max nbr of supported queues: 1261
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
C) Packet-I/O examples - used together with an external traffic generator.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
NOTE: some differences depending on the target environment:
em-dpdk: These tests assume that the test system is equipped with at
least one NIC supported by DPDK.
Sources in directory: em-dpdk/programs/packet_io/
Packet-io enabled by setting: em_conf.pkt_io = 1 before calling
em_init(&em_conf).
NOTE: Might require DPDK config changes, see README.intel
Assign NIC ports to dpdk as described in the dpdk docs.
> cd em-dpdk/programs/packet_io/
> make real_clean && make em_clean && make clean && make
-----------------------------------------------------------------------------
C1) Packet-IO: loopback -
em-dpdk/programs/packet_io/loopback.c
-----------------------------------------------------------------------------
Simple Dynamically Load Balanced Packet-I/O loopback test application.
Receives UDP/IP packets, swaps the addresses/ports and sends the packets back
to where they came from.
The test expects the traffic generator to send data using 256 UDP-flows:
- 4 IP dst addresses each with 64 different UDP dst ports (=flows).
Alternatively, setting "#define QUEUE_PER_FLOW 0" makes the application
accept any packets, but it then uses only a single default EM queue, thus
limiting performance.
Use the traffic generator to find the max sustainable throughput for loopback
traffic. The throughput should increase near-linearly with increasing core
counts, as set by '-c 0xcoremask'.
> sudo ./build/loopback -c 0xffff -n 4 -- -p
or
> sudo ./build/loopback -c 0xffff -n 4 -- -t
Note: The number of used flows has been decreased in order to work
out-of-the-box with the dpdk default .config
-----------------------------------------------------------------------------
C2) Packet-IO: loopback_ag -
em-dpdk/programs/packet_io/loopback_ag.c
-----------------------------------------------------------------------------
Test derived from the loopback test above but further groups the input
queues into atomic groups (hence _ag) to provide multiple priority levels for
each atomic processing context.
An application (EO) that receives UDP datagrams and swaps the
src/dst addresses before sending the datagram back out.
Each set of four input EM queues with prios Highest, High,
Normal and Low are mapped into an EM atomic group to provide
"atomic context with priority".
Similar to normal atomic queues, atomic groups provide EOs with an atomic
processing context, but expand the context over multiple queues,
i.e. over all the queues in the same atomic group.
All queues in an atomic group are by default of type "atomic".
Traffic setup and startup similar to the loopback case.
-----------------------------------------------------------------------------
C3) Packet-IO: multi_stage -
em-dpdk/programs/packet_io/multi_stage.c
-----------------------------------------------------------------------------
A packet-I/O example similar to loopback, except that each UDP-flow is
handled in three (3) stages before being sent back out. The three stages
(3 EOs) cause each packet to be enqueued, scheduled and received multiple
times on the multicore CPU, thus stressing the EM scheduler.
Additionally this test uses EM queues of different priority and type.
The test expects the traffic generator to send data using 128 UDP-flows:
- 4 IP dst addresses each with 32 different UDP dst ports (=flows).
Use the traffic generator to find the max sustainable throughput for
multi-stage traffic. The throughput should increase near-linearly with
increasing core counts, as set by '-c 0xcoremask'.
> sudo ./build/multi_stage -c 0xffff -n 4 -- -p
or
> sudo ./build/multi_stage -c 0xffff -n 4 -- -t
Note: The number of used flows has been decreased in order to work
out-of-the-box with the dpdk default .config
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
D) Add-on examples - Event Timer
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Sources in directory: em-dpdk/programs/example/add-ons/
EM event timer enabled by setting: em_conf.event_timer = 1 before calling
em_init(&em_conf).
NOTE: Currently the event timer tests must be run in thread-per-core (-t) mode
due to a bug in dpdk (tested dpdk v17.08).
> cd em-dpdk/programs/example/add-ons/
> make real_clean && make em_clean && make clean && make
-----------------------------------------------------------------------------
D1) Event Timer: timer_hello -
em-dpdk/programs/example/add-ons/timer_hello.c
-----------------------------------------------------------------------------
Event Machine timer add-on hello world example.
Timer hello world example to show basic event timer usage. Creates a
single EO that starts a periodic and a random one-shot timeout.
Exception/error management is simplified to focus on basic timer usage.
E.g. start on 14 cores:
> sudo ./build/timer_hello -c 0xfefe -n 4 -- -t
...
EO start
System has 1 timer(s)
Timer "ExampleTimer" info:
-resolution: 50000 ns
-max_tmo: 2592000000000 ms
-num_tmo: 2047
-tick Hz: 2693514283 hz
1. tick
tock
2. tick
tock
3. tick
tock
4. tick
tock
5. tick
tock
Meditation time: what can you do in 8538 ms?
6. tick
tock
7. tick
tock
8. tick
tock
9. tick
tock
8538 ms gone!
Meditation time: what can you do in 15603 ms?
10. tick
tock
11. tick
tock
12. tick
tock
13. tick
tock
14. tick
tock
15. tick
tock
16. tick
tock
17. tick
tock
15603 ms gone!
...
-----------------------------------------------------------------------------
D2) Event Timer: timer_test -
em-dpdk/programs/example/add-ons/timer_test.c
-----------------------------------------------------------------------------
Event Machine timer add-on basic test.
A simple timer test (does not test everything). Creates and deletes random
timeouts and checks how accurate the timeout indications are against both
the timer itself and Linux time (clock_gettime). A single EO is used, but
its receive queue is of parallel type so multiple threads can process
timeouts concurrently.
Exception/error management is simplified and aborts on any error.
E.g. start on 14 cores:
> sudo ./build/timer_test -c 0xfefe -n 4 -- -t
...
EO start
System has 1 timer(s)
Timer "TestTimer" info:
-resolution: 50000 ns
-max_tmo: 2592000000000 ms
-num_tmo: 2047
-tick Hz: 2693513330 hz
Linux reports clock running at 1000000000 hz
app_eo_start done, test repetition interval 96s
Aug-29 16:14:26 ROUND 1 ************
Timer: Creating 1500 timeouts took 42188 ns (28 ns each)
Linux: Creating 1500 timeouts took 41949 ns (27 ns each)
Started single shots
Started periodic
Running
...............................................
Heartbeat count 48
ONESHOT:
Received: 1500, expected 1500
Cancelled OK: 0
Cancel failed (too late): 0
SUMMARY/TICKS: min 2695, max 148754, avg 69087
/NS: min 1000, max 55226, avg 25649
SUMMARY/LINUX NS: min 4546, max 155133, avg 78032
PERIODIC:
Received: 15687
Cancelled: 52
Cancel failed (too late): 0
Errors: 0
TOTAL RUNTIME/US: min 1, max 55
TOTAL RUNTIME LINUX/US: min 4, max 155
TOTAL ERRORS: 0
TOTAL ACK FAILS: 0
Cleaning up
Timer: Deleting 1500 timeouts took 63056 ns (42 ns each)
Linux: Deleting 1500 timeouts took 62836 ns (41 ns each)
Aug-29 16:16:02 ROUND 2 ************
Timer: Creating 1500 timeouts took 41596 ns (27 ns each)
Linux: Creating 1500 timeouts took 41415 ns (27 ns each)
Started single shots
Started periodic
Running
...............................................
Heartbeat count 96
ONESHOT:
Received: 1500, expected 1500
Cancelled OK: 0
Cancel failed (too late): 0
SUMMARY/TICKS: min 1763, max 156497, avg 70586
/NS: min 654, max 58101, avg 26205
SUMMARY/LINUX NS: min 5195, max 156309, avg 78396
PERIODIC:
Received: 11582
Cancelled: 42
Cancel failed (too late): 0
Errors: 0
TOTAL RUNTIME/US: min 0, max 58
TOTAL RUNTIME LINUX/US: min 4, max 156
TOTAL ERRORS: 0
TOTAL ACK FAILS: 0
Cleaning up
Timer: Deleting 1500 timeouts took 64467 ns (42 ns each)
Linux: Deleting 1500 timeouts took 64346 ns (42 ns each)
...
===============================================================================
3. Changes
===============================================================================
See CHANGE_NOTES
===============================================================================
4. Open Issues
===============================================================================
===============================================================================
5. Notes
===============================================================================