From: Philippe T. <phi...@gm...> - 2014-06-27 11:02:29
Ok, thanks! This sounds reasonable indeed.

Philippe

2014-06-26 23:51 GMT+02:00 Karl Rupp <ru...@iu...>:

> Hi,
>
> the cases 5, 6, and 7 are handled by running a kernel for four vectors,
> then subtracting 4 and running a dedicated kernel on the remaining 1, 2,
> or 3 vectors. This could also be handled by a generated kernel, yes, but I
> haven't implemented this for two reasons:
>  1. fewer kernels to compile
>  2. less implementation effort
>
> One single kernel is not possible for an arbitrary number of vectors. Eight
> vectors turned out to be a reasonable upper bound because the overhead is
> less than 12.5% over the ideal case already, but at the same time the
> kernel still works for older GPUs with limited amounts of shared memory.
>
> Best regards,
> Karli
>
>
> On 06/26/2014 11:09 PM, Philippe Tillet wrote:
>
>> I'll add something. I assume that multiple kernels are launched thanks
>> to current_index. Wouldn't it be better to launch one single kernel? I
>> think that a lot of users would prefer to have better performance for
>> perhaps a slightly longer JIT overhead (since we'll provide a caching
>> mechanism).
>>
>> Philippe
>>
>>
>> 2014-06-26 23:07 GMT+02:00 Philippe Tillet <phi...@gm...
>> <mailto:phi...@gm...>>:
>>
>>
>>     Hello!
>>
>>     I note this in the implementation of multi_inner_prod:
>>
>>          switch (vec_tuple.const_size() - current_index)
>>          {
>>            case 7:
>>            case 6:
>>            case 5:
>>            case 4:
>>              //do stuff
>>
>>     However, there is a test for 5, 6, and 7, so I assume that these
>>     have to be implemented somehow. Could I have more details on why
>>     there is no specific kernel for these three cases?
>>
>>     NB: This is the very last thing that has to be done before I can
>>     push the new device-specific OpenCL backend. All the tests pass
>>     except multi_inner_prod for tuple_size >= 5. :)
>>
>>     Philippe
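
For readers following the thread, the dispatch Karli describes can be sketched roughly as below. This is only an illustration of the batching logic, not ViennaCL's actual code: `plan_batches` is a hypothetical helper name, no kernels are launched, and the `default` branch assumes that eight or more remaining vectors go to an eight-vector kernel (the thread only says eight is the upper bound).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of ViennaCL: returns the batch sizes that
// would be used to cover `num_vectors` inner products, mirroring the
// structure of the quoted switch on (size - current_index).
std::vector<std::size_t> plan_batches(std::size_t num_vectors)
{
  std::vector<std::size_t> batches;
  std::size_t current_index = 0;

  while (current_index < num_vectors)
  {
    std::size_t batch = 0;
    switch (num_vectors - current_index)
    {
      case 7:
      case 6:
      case 5:
      case 4:
        batch = 4;   // four-vector kernel; the remaining 0 to 3 vectors
        break;       // are picked up by the next loop iteration
      case 3:
        batch = 3;   // dedicated three-vector kernel
        break;
      case 2:
        batch = 2;   // dedicated two-vector kernel
        break;
      case 1:
        batch = 1;   // plain single inner product
        break;
      default:
        batch = 8;   // assumption: 8 or more remaining -> eight-vector kernel
        break;
    }
    batches.push_back(batch);
    current_index += batch;
  }
  return batches;
}
```

Under these assumptions, five vectors are covered by a batch of 4 followed by a batch of 1, seven vectors by 4 then 3, and thirteen vectors by 8, 4, and 1, so only the kernels for 1, 2, 3, 4, and 8 vectors ever need to be compiled.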