From: Michal B. <Fra...@ru...> - 2018-02-08 20:41:20
Hi,

Timo Betcke <tim...@gm...> wrote:
> which I already suspected. I tried POCL_VECTORIZER_REMARKS=1 to
> activate vectorizer remarks. But it does not create any kind of

Yes, it doesn't work, even though pocl registers the LLVM options to print debug info successfully. I haven't yet figured out why it's not working.

> The question is what prevents the auto vectorizer from working at
> all. The code seems quite straight forward with very simple for-loops

It's possible to use POCL_DEBUG_LLVM_PASSES=1 and grep for lines starting with "LV:"; this shows that the vectorizer is in fact running.

I tried to compile "evaluate_regular" from your .cl file. It seems to find two loops; the first one (smaller) is only vectorized when you build with the "-cl-fast-relaxed-math" option. The second, longer loop prints this:

LV: Checking a loop in "_pocl_launcher_evaluate_regular" from /tmp/POCL_CACHE/tempfile-4b-64-39-6d-57.cl
LV: Loop hints: force=? width=0 unroll=0
LV: Found a loop: for.body.i
LV: Found an induction variable.
LV: We can vectorize this loop!
LV: The Smallest and Widest types: 32 / 32 bits.
LV: The Widest register is: 256 bits.
LV: Scalar loop costs: 20.
LV: Vector loop of width 2 costs: 38.
LV: Vector loop of width 4 costs: 37.
LV: Vector loop of width 8 costs: 36.
LV: Selecting VF: 1.
LV: Vectorization is possible but not beneficial.

... changing N_QUAD_POINTS doesn't seem to make any difference.

Cheers,
-- mb
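
P.S. If grepping the "LV:" lines by hand gets tedious, something like the following might help. It is only a rough sketch: `summarize_lv` and `SAMPLE` are hypothetical names, the regular expressions are written against the exact lines quoted above (other LLVM versions may word them differently), and it only looks at the first loop it finds. It assumes the printed vector costs are directly comparable to the scalar cost, which is consistent with the log choosing VF 1 here.

```python
import re

# Patterns for the LoopVectorize ("LV:") debug lines quoted in this message.
# Assumption: the wording matches the log above; other LLVM versions may differ.
SCALAR_RE = re.compile(r"LV: Scalar loop costs: (\d+)\.")
VECTOR_RE = re.compile(r"LV: Vector loop of width (\d+) costs: (\d+)\.")
SELECTED_RE = re.compile(r"LV: Selecting VF: (\d+)\.")


def summarize_lv(log_text):
    """Collect the cost-model numbers from LV: debug text (hypothetical helper).

    Only the first scalar-cost and selected-VF lines are used, so pass in
    the output for a single loop.
    """
    scalar = SCALAR_RE.search(log_text)
    selected = SELECTED_RE.search(log_text)
    return {
        "scalar_cost": int(scalar.group(1)) if scalar else None,
        "vector_costs": {int(w): int(c) for w, c in VECTOR_RE.findall(log_text)},
        "selected_vf": int(selected.group(1)) if selected else None,
    }


# The cost lines for the second loop, copied from the log above.
SAMPLE = """\
LV: Scalar loop costs: 20.
LV: Vector loop of width 2 costs: 38.
LV: Vector loop of width 4 costs: 37.
LV: Vector loop of width 8 costs: 36.
LV: Selecting VF: 1.
LV: Vectorization is possible but not beneficial.
"""

summary = summarize_lv(SAMPLE)
print("scalar cost:", summary["scalar_cost"])
for width, cost in sorted(summary["vector_costs"].items()):
    # Flag widths whose reported cost is not below the scalar cost.
    verdict = "cheaper" if cost < summary["scalar_cost"] else "not cheaper"
    print(f"width {width}: cost {cost} ({verdict} than scalar)")
print("selected VF:", summary["selected_vf"])
```

For a real run, the text would come from the output captured with POCL_DEBUG_LLVM_PASSES=1 instead of `SAMPLE`; I'm assuming the debug lines end up on stderr, so redirect that as well when capturing.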