This software aims to demonstrate that a very small yet powerful runtime
can easily be provided for programs that are written in any
programming model but that can
be executed in a DATAFLOW style.
The first benefit of this software is that it allows rapid development
of such programs in the context of the TERAFLUX project (http://teraflux.eu).
The runtime API has been designed so that, on one side, a good compiler
targeting this interface can be developed in the future and, on the other
side, the API can receive good architectural support:
ideally, each function could map to a Thread-Level-Parallelism
Instruction Set Extension (TLP ISE).
For more information about the T-STAR interface, see this paper:
http://doi.acm.org/10.1145/2212908.2212959
and/or study the examples.
See the TERAFLUX website for more general information.
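
To make the DATAFLOW execution style mentioned above more concrete, here is a minimal sketch in plain C. It only illustrates the general idea that a piece of code runs once all of its inputs have arrived. Every name in it (`frame_t`, `frame_init`, `frame_write`, `add_body`, `MAX_INPUTS`) is hypothetical and chosen for illustration; this is not the DRT or T-STAR API.

```c
#include <stddef.h>

/* Hypothetical names throughout: this is NOT the DRT/T-STAR API. */
#define MAX_INPUTS 4

typedef struct frame {
    void (*body)(struct frame *self); /* code to run once all inputs arrived */
    long inputs[MAX_INPUTS];          /* input slots written by producers */
    int  pending;                     /* number of inputs still missing */
    long result;                      /* value produced by body */
    int  fired;                       /* set to 1 once body has run */
} frame_t;

/* Prepare a frame that waits for n_inputs values before running body. */
void frame_init(frame_t *f, void (*body)(frame_t *), int n_inputs)
{
    f->body = body;
    f->pending = n_inputs;
    f->result = 0;
    f->fired = 0;
}

/* Deliver one value (slot must be below MAX_INPUTS).
 * The consumer fires when its last missing input arrives. */
void frame_write(frame_t *f, int slot, long value)
{
    f->inputs[slot] = value;
    if (--f->pending == 0) {
        f->body(f);
        f->fired = 1;
    }
}

/* Example body: adds its two inputs. */
void add_body(frame_t *self)
{
    self->result = self->inputs[0] + self->inputs[1];
}
```

In this sketch, a frame initialized with `frame_init(&f, add_body, 2)` does nothing after the first `frame_write` and runs `add_body` on the second one. A real runtime would queue the ready frame for a worker instead of calling the body directly.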
| Benchmark | File | Description |
|---|---|---|
| Recursive Fibonacci (RFIB) | fib-tsu4.c | Recursively computes a Fibonacci number. The recursion stops when the threshold is reached. The benchmark aims to generate a large number of threads easily, in order to stress thread management. |
| Blocked Matrix-Multiplication (BMM) | mmul2d.c | Blocked matrix multiplication, included because it is a very commonly used kernel in many applications (especially in Artificial Intelligence, Deep Neural Networks, etc.) and it moves a lot of data around. |
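
The threshold idea behind the RFIB benchmark can be sketched in plain C as follows. This is not the code of fib-tsu4.c and does not use the runtime API. It assumes that "the threshold is reached" means "the argument is at or below the threshold", in which case the value is computed sequentially. The names `fib_seq`, `fib_rec`, and `spawned` are hypothetical.

```c
/* Sequential fallback used once the threshold is reached. */
unsigned long fib_seq(unsigned n)
{
    unsigned long a = 0, b = 1;
    for (unsigned i = 0; i < n; i++) {
        unsigned long t = a + b;
        a = b;
        b = t;
    }
    return a;
}

/* Recursive split. *spawned counts the recursive calls that a dataflow
 * runtime could execute as separate threads (assumption: two per split). */
unsigned long fib_rec(unsigned n, unsigned threshold, unsigned long *spawned)
{
    if (n < 2 || n <= threshold)
        return fib_seq(n);          /* stop recursing, compute directly */
    *spawned += 2;
    return fib_rec(n - 1, threshold, spawned)
         + fib_rec(n - 2, threshold, spawned);
}
```

Under these assumptions, a lower threshold produces more recursive calls, and therefore more threads for the runtime to manage, while the computed result stays the same.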
There are some examples to start with:
Recursive-Fibonacci (fib-tsu4.c), Matrix-Multiply (mmul2d.c), Simple-List (sl4a.c).
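
For readers unfamiliar with the Matrix-Multiply kernel, blocked matrix multiplication can be sketched in plain C as below. This is a generic sequential illustration, not the code of mmul2d.c. The function name `mmul_blocked`, the row-major layout, and the block size parameter `bs` are assumptions made for the example.

```c
#include <stddef.h>

/* Computes C += A * B for n x n row-major matrices, working on
 * bs x bs tiles so that each tile is reused while it is still cached.
 * bs must be greater than zero. */
void mmul_blocked(size_t n, size_t bs,
                  const double *a, const double *b, double *c)
{
    for (size_t ii = 0; ii < n; ii += bs) {
        size_t i_end = (ii + bs < n) ? ii + bs : n;
        for (size_t kk = 0; kk < n; kk += bs) {
            size_t k_end = (kk + bs < n) ? kk + bs : n;
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t j_end = (jj + bs < n) ? jj + bs : n;
                /* Multiply one tile of A by one tile of B. */
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++)
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
            }
        }
    }
}
```

The caller is expected to zero `c` before the call. In a dataflow version, each tile product could plausibly become a separate thread, which is one reason this kernel moves so much data around.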
To check that everything works, run the ./tregression.sh script: it should print OK for each correctly compiled program and for each correctly tested program. The first time you launch it, the reference outputs are generated; the second time, the outputs are compared against those references, so you can modify the C examples and check them again.
That's it!
The compilation procedure is straightforward. Use the following command:
make run_{benchmark_name}
For example, to compile and run the RFIB example:
make run_fib-tsu4
Enter the Fibonacci number to compute
Enter the threshold of the recursion.
export DRT_DEBUG=1
to see more detailed information (values of 2, 3, 4, ... increase the verbosity level).
export DRT_FSIZE=10000
to set the size of internal frame queues (similar to a "stack size")
DRT_DEBUG=2 DRT_FSIZE=1000 ./fib-tsu4 5 3
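
As an illustration of how a C program can pick up settings such as DRT_DEBUG and DRT_FSIZE from the environment, here is a minimal sketch. It is not taken from the runtime sources: the helper names `parse_long_or` and `env_long` are hypothetical, and how DRT itself parses these variables or which defaults it applies is not stated here.

```c
#include <stdlib.h>

/* Parse a decimal integer; return fallback if s is NULL, empty,
 * or contains anything other than a number. */
long parse_long_or(const char *s, long fallback)
{
    if (s == NULL || *s == '\0')
        return fallback;
    char *end;
    long v = strtol(s, &end, 10);
    return (*end == '\0') ? v : fallback;
}

/* Read an integer setting from the environment, with a fallback
 * value when the variable is unset or malformed. */
long env_long(const char *name, long fallback)
{
    return parse_long_or(getenv(name), fallback);
}
```

A program could then call, for instance, `env_long("DRT_DEBUG", 0)` at startup; the fallback value `0` is only an example.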
To compare the performance of DRT against OCR and DARTS using the
Recursive Fibonacci benchmark, run the script:
$ ./run_perf_comparison.sh
or
$ make run_perf