Hi guys,
I've pushed my recent updates to sourceforge. For those of you who have
used the iterative solvers, I've got great news: By dropping the
uBLAS-type interface internally in favor of a low-level implementation,
the ILU0 and ILUT preconditioners gain about an order of magnitude in
performance. Also, a GPU version of the ILU0
substitutions using level scheduling is implemented, which is expected
to yield good performance gains for linear systems of equations that
stem from three-dimensional meshes. A lot of functionality is already
available on standard CPU, OpenCL *and* CUDA.
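
For those unfamiliar with level scheduling, here is a minimal CPU-only
sketch of the idea. It is not the code in the repository: the CsrLower
struct and both function names are made up for illustration, and a unit
lower triangular factor in CSR storage is assumed. Rows of the factor
are grouped into levels such that all rows within one level depend only
on rows from earlier levels and can therefore be processed in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: strictly lower part of a unit lower
// triangular factor L, stored in CSR format.
struct CsrLower {
    std::size_t n;                       // number of rows
    std::vector<std::size_t> row_ptr;    // size n + 1
    std::vector<std::size_t> col_idx;    // column index of each entry
    std::vector<double> val;             // value of each entry
};

// level[i] = 0 if row i has no off-diagonal entries,
// otherwise 1 + the largest level among the rows it depends on.
std::vector<std::size_t> compute_levels(const CsrLower& L) {
    std::vector<std::size_t> level(L.n, 0);
    for (std::size_t i = 0; i < L.n; ++i) {
        for (std::size_t k = L.row_ptr[i]; k < L.row_ptr[i + 1]; ++k) {
            const std::size_t j = L.col_idx[k];
            if (j < i) level[i] = std::max(level[i], level[j] + 1);
        }
    }
    return level;
}

// Solves L x = b in place (x holds b on entry), one level at a time.
// Rows inside one level are independent of each other; on a GPU each
// such row could be handled by its own thread.
void forward_substitute_by_level(const CsrLower& L,
                                 const std::vector<std::size_t>& level,
                                 std::vector<double>& x) {
    assert(level.size() == L.n && x.size() == L.n);
    if (L.n == 0) return;

    // Bucket the row indices by level.
    const std::size_t num_levels =
        *std::max_element(level.begin(), level.end()) + 1;
    std::vector<std::vector<std::size_t>> rows(num_levels);
    for (std::size_t i = 0; i < L.n; ++i) rows[level[i]].push_back(i);

    for (const auto& bucket : rows) {        // sequential over levels
        for (const std::size_t i : bucket) { // parallelizable loop
            double s = x[i];
            for (std::size_t k = L.row_ptr[i]; k < L.row_ptr[i + 1]; ++k) {
                const std::size_t j = L.col_idx[k];
                if (j < i) s -= L.val[k] * x[j];
            }
            x[i] = s;                        // unit diagonal: no division
        }
    }
}
```

The fewer levels there are relative to the number of rows, the more
parallelism is available per level, which is why the sparsity pattern
of the system matters for the GPU version.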
Just some notes for using the developer version:
When pulling from the sourceforge repository, the OpenCL kernels are
generated in build/ after typing make in that folder. Whenever a pull
adds new OpenCL kernels in auxiliary/, it is a good idea to just
delete build/auxiliary; otherwise old kernels might interfere with the
new ones.
My initial plan of releasing within the next few days unfortunately
won't hold, but most of the work is done. I expect a release by the end
of the month. Any testing, feedback, comments, etc. are of course
welcome :-)
Best regards,
Karli