The OpenMP version of ATLAS (as of 3.9.68) currently ignores the number of threads set by OpenMP (which the user usually sets via the OMP_NUM_THREADS environment variable). The maximum number of threads is instead compiled in (as the ATL_NTHREADS constant), modulo some dynamic tuning. It would be really nice if ATLAS would call omp_get_max_threads to get the number of threads to use (or at least a maximum number of threads to use) dynamically. (omp_get_num_threads is not the right query here, since it returns 1 when called outside a parallel region.)
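To make the request concrete, here is a minimal sketch of a run-time query that is capped by a compiled-in limit. SKETCH_NTHREADS and sketch_num_threads are made-up names for illustration and are not part of ATLAS; the value 8 is arbitrary. The sketch uses omp_get_max_threads, which reflects OMP_NUM_THREADS and omp_set_num_threads even when called outside a parallel region.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback so the sketch also builds without OpenMP support. */
static int omp_get_max_threads(void) { return 1; }
#endif

/* Hypothetical stand-in for a compiled-in limit such as ATL_NTHREADS.
 * The value is arbitrary and chosen only for this example. */
#define SKETCH_NTHREADS 8

/* Thread count to use for one call: the current OpenMP setting,
 * capped at the compiled-in limit. */
static int sketch_num_threads(void)
{
    int p = omp_get_max_threads();  /* follows OMP_NUM_THREADS / omp_set_num_threads */
    assert(p >= 1);
    return p < SKETCH_NTHREADS ? p : SKETCH_NTHREADS;
}
```

With this shape, the compiled-in constant only acts as an upper bound, and the user can lower the thread count without recompiling.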
Not only does the current behavior mean that the user cannot change the number of threads without recompiling ATLAS (a common request, as noted elsewhere on this list), but it also defeats a lot of the benefit of using OpenMP. By honoring the OpenMP thread count, ATLAS would share a common thread pool with the calling program, so the user could, for example, call ATLAS inside a parallel loop and have each thread automatically call the serial ATLAS. It is currently quite painful to mix parallel user code with a parallel ATLAS, because the user code may end up fighting with ATLAS over the processors.
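As an illustration of the parallel-loop case, here is one way a library routine could decide to run serially when it is already inside the caller's parallel region. This is only a sketch under my own assumptions: it checks omp_in_parallel explicitly rather than relying on the OpenMP runtime's nesting settings, and sketch_threads_for_call and sketch_user_loop are hypothetical names, not ATLAS functions.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Fallbacks so the sketch also builds without OpenMP support. */
static int omp_get_max_threads(void) { return 1; }
static int omp_in_parallel(void) { return 0; }
#endif

/* Hypothetical library-side decision: go serial when the caller is
 * already running an active parallel region, otherwise use the
 * thread count OpenMP is configured for. */
static int sketch_threads_for_call(void)
{
    if (omp_in_parallel())
        return 1;
    return omp_get_max_threads();
}

/* Hypothetical user code: a parallel loop in which every iteration
 * calls into the library. Returns the sum of the thread counts the
 * library would have chosen across the n calls. */
static int sketch_user_loop(int n)
{
    int sum = 0;
    assert(n >= 0);
#pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += sketch_threads_for_call();
    return sum;
}
```

Called from serial code, sketch_threads_for_call returns the full OpenMP thread count; called from inside the user's loop, it returns 1 whenever that loop is actually running with more than one thread, so the two levels do not compete for processors.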
I'd rather have the option of letting ATLAS rely more on OpenMP for the number of threads (and possibly even for things like affinity, which should really be handled within the OpenMP implementation as needed). Even if the performance is somewhat worse, the benefit would be better co-existence with user parallelization and support for user control over the number of threads.
Unfortunately, looking through the code, there are quite a few places where ATL_NTHREADS (and ATL_NTHRPOW2) are used, so this is not trivial to hack in (and I'm not sure what other side effects replacing these constants with a run-time OpenMP query would have). (Also, ATL_goparallel would need to be modified to call omp_set_num_threads only when P is less than the current OpenMP thread count, as returned by omp_get_max_threads, and to restore the previous value when it is done.)
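Roughly, the save/lower/restore logic I have in mind for ATL_goparallel would look like the sketch below. I have not checked this against the real ATL_goparallel signature or internals, so sketch_goparallel, its arguments, and sketch_mark are all hypothetical. The work is distributed with a parallel for over ranks so that every rank still runs even if the runtime delivers fewer threads than requested.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Fallbacks so the sketch also builds without OpenMP support. */
static int sketch_stub_threads = 1;
static int omp_get_max_threads(void) { return sketch_stub_threads; }
static void omp_set_num_threads(int n) { sketch_stub_threads = n; }
#endif

/* Hypothetical launcher: run work(rank, nranks, arg) for up to P ranks.
 * Lowers the OpenMP thread count only when P is smaller than the
 * current setting, and restores the caller's setting afterwards.
 * Returns the number of ranks actually used. */
static int sketch_goparallel(int P,
                             void (*work)(int rank, int nranks, void *arg),
                             void *arg)
{
    assert(P >= 1);
    int saved = omp_get_max_threads();   /* caller's current setting */
    int used = P < saved ? P : saved;    /* never exceed what OpenMP allows */

    if (P < saved)
        omp_set_num_threads(P);

#pragma omp parallel for schedule(static)
    for (int r = 0; r < used; r++)
        work(r, used, arg);

    omp_set_num_threads(saved);          /* restore the previous value */
    return used;
}

/* Example work function: each rank marks its own slot in an int array. */
static void sketch_mark(int rank, int nranks, void *arg)
{
    (void)nranks;
    ((int *)arg)[rank] = 1;
}
```

The point of restoring the saved value is that the caller's own parallel regions keep the thread count the user asked for after the library call returns.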