I have the same program running on both a Tesla V100 and a GTX 1080. However, the Tesla V100 is significantly slower than the GTX 1080 at solving sparse linear systems (BiCGStab with viennacl::linalg::chow_patel_ilu_precond). I suspect the library is not tuned to the Tesla V100's number of SMs (streaming multiprocessors). Can anyone confirm this?