math-atlas-devel Mailing List for Automatically Tuned Linear Algebra Software
Brought to you by: rwhaley, tonyc040457
Archive (messages per month):

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2001 | | | | | | | | | 8 | 17 | 29 | 30 |
| 2002 | 19 | 19 | 29 | 3 | 38 | 14 | 6 | 7 | 12 | 6 | 9 | |
| 2003 | 6 | 5 | 8 | 10 | 4 | 11 | 5 | 3 | 12 | 1 | 9 | 45 |
| 2004 | 7 | 6 | 4 | 7 | 7 | 30 | 7 | 6 | 1 | 4 | 18 | 25 |
| 2005 | 11 | 10 | 3 | 7 | | | 1 | 29 | 6 | 8 | 2 | 5 |
| 2006 | | 16 | 2 | 9 | 15 | 24 | 10 | 39 | 20 | 8 | 30 | 28 |
| 2007 | 1 | 19 | 11 | 3 | 12 | 7 | 20 | 9 | 7 | 7 | 8 | 6 |
| 2008 | 3 | 8 | | | 7 | 16 | 38 | 11 | 6 | 2 | | 4 |
| 2009 | 6 | 25 | 13 | 5 | | | 1 | 8 | 16 | 17 | 2 | 1 |
| 2010 | 3 | 3 | 2 | 5 | | 2 | | | | 16 | 53 | 7 |
| 2011 | 10 | 37 | 30 | 12 | 5 | 14 | 7 | 8 | 37 | 3 | 5 | 60 |
| 2012 | 25 | 5 | 4 | 7 | 12 | 28 | 28 | 2 | 5 | 6 | | 17 |
| 2013 | 18 | 10 | 30 | 21 | | 10 | 8 | | 39 | 54 | 8 | 6 |
| 2014 | 17 | 14 | 16 | 67 | 2 | 8 | 7 | 9 | 6 | 9 | 12 | |
| 2015 | 5 | 9 | 1 | 2 | | 1 | 2 | 6 | 1 | 1 | | 3 |
| 2016 | | | | | | 3 | 22 | | 1 | | 21 | |
| 2017 | 20 | | 2 | | | 8 | | 1 | | | | |
| 2018 | | | | | | | | | | 3 | | |
From: Fulton, B. <bef...@iu...> - 2018-10-10 13:30:53
|
I've built this for IU's Carbonate cluster. I'll test it more later, but the "make install" appeared to want a recursive copy when installing the include files, so I added that flag. I also tried to run "make time" on a couple of nodes with slightly different configurations, but it appeared to return the exact same values - is there a "make timeclean" or some equivalent I could run?

--
Ben Fulton
Research Technologies
Scientific Applications and Performance Tuning
Indiana University
E-Mail: bef...@iu...

-----Original Message-----
From: R. Clint Whaley <rcw...@iu...>
Sent: Friday, October 5, 2018 3:30 AM
To: List for developer discussion, NOT SUPPORT. <mat...@li...>
Subject: [atlas-devel] 3.11.41

I have released 3.11.41. It is a bugfix release, fixing rotmg, assembly errors on POWER, and a performance regression in small triangle TRMM.

Cheers,
Clint

ATLAS 3.11.41 released 10/05/18, highlights of changes from 3.11.40:
* Fixed bug in drotmg: https://sourceforge.net/p/math-atlas/bugs/256/
* Fixed assembly errors for POWER9 (failure to save correct regs)
* Fixed performance regression for small triangle TRMM

--
R. Clint Whaley, PhD, Assoc Prof, IU
http://homes.soic.indiana.edu/rcwhaley/
|
From: R. C. W. <rcw...@iu...> - 2018-10-05 07:30:36
|
I have released 3.11.41. It is a bugfix release, fixing rotmg, assembly errors on POWER, and a performance regression in small triangle TRMM.

Cheers,
Clint

ATLAS 3.11.41 released 10/05/18, highlights of changes from 3.11.40:
* Fixed bug in drotmg: https://sourceforge.net/p/math-atlas/bugs/256/
* Fixed assembly errors for POWER9 (failure to save correct regs)
* Fixed performance regression for small triangle TRMM

--
R. Clint Whaley, PhD, Assoc Prof, IU
http://homes.soic.indiana.edu/rcwhaley/
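The drotmg fix referenced above can be spot-checked from C through the CBLAS interface. Below is a minimal, hypothetical sketch (not part of the release notes), assuming the build installed cblas.h and libcblas/libatlas as usual:

```c
/* Hypothetical sanity check for the drotmg fix (not from the release):
 * construct a modified-Givens rotation and print the returned parameters.
 * Compile e.g.:  cc drotmg_check.c -lcblas -latlas  (paths/libs may vary). */
#include <stdio.h>
#include <cblas.h>

int main(void)
{
    double d1 = 2.0, d2 = 3.0, x1 = 1.0;
    const double y1 = 4.0;
    double param[5];   /* param[0] is the flag; the rest encodes the 2x2 H */

    cblas_drotmg(&d1, &d2, &x1, y1, param);

    printf("flag = %g\n", param[0]);
    printf("d1 = %g  d2 = %g  x1 = %g\n", d1, d2, x1);
    /* Comparing this output (and the effect of applying H) against a
     * reference BLAS is the simple regression check. */
    return 0;
}
```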
|
From: R. C. W. <rcw...@iu...> - 2018-10-03 00:53:40
|
Guys,
Sorry to spam both lists and any dups that causes, but since it has
looked like I've retired, I'm sending this to atlas-devel & announce.
3.11.40 has finally been released. I have actually been working on it
for most of this time, but, with the move to Indiana factored in, it has
taken me this long to get the framework working again!
The reason is that we have essentially rewritten the entire way
microkernels are tuned and accessed in the library. Therefore, the
majority of tuning code has been touched or rewritten, and since this
includes all the generation, etc, it took a long while to get things at
all reliable.
The end goal is that increased microkernel specialization should greatly
increase our weird-shape and parallel scaling performance.
Right now, you will hopefully see much better serial non-GEMM BLAS
performance (e.g., small-triangle TRSM or TRMM). Very large problems
aren't likely to see a huge difference if prior releases already
supported your architecture well (though, e.g., we've added AVX-512 to
the code generators, which obviously will hugely improve SkylakeX
asymptotic performance).
The installs have gone from long to endless, unfortunately. I will fix
this before stable, but right now searches are all brute-force and
ignorance while we concentrate on getting the last of the microkernel
handling solidified. I will attempt to speed up search later, and allow
for a "no-timing" install from archdefs, so that people on
already-supported platforms can skip most or all of the tuning (a
feature many maintainers have long wanted).
For now, terrible install times will just be a feature until we finish
debugging and publish the new BLAS approach.
The major weakness in the install when run on arbitrary machines right
now (other than time) is in some new cache detection code that creates a
file called atlas_cache.h. This code dies on several machines, and I
haven't had time to track down details. However, if it fails for you,
open up a tracker item and I can tell you how to proceed beyond it even
before fixing the code in question.
Hopefully, this release should be purely faster than any other that came
before, but if you spot performance regressions, please let us know. We
are not yet always using the correct microkernel (even when the library
has built it), because our selection algorithm work is awaiting the
completion of the new tuning strategy.
Eventually, ATLAS will be able not only to tune microkernels to build the
BLAS/LAPACK, but also to build specialized operations for people wanting
to avoid BLAS overheads (at the cost of calling messy microkernels; think
of things like tensor algebra with very small shapes that need to scale,
perhaps machine learning, etc.). This gives you the detailed cache control
necessary to scale when the problem size isn't large enough to dominate
the low-order terms (which is what makes the plain BLAS API acceptable for
large problems).
ChangeLog (which has almost no detail on massive changes) is below.
Cheers,
Clint
ATLAS 3.11.40 released 10/02/18, highlights of changes from 3.11.39:
* Basically a rewrite of all L3BLAS and LAPACK tuning framework:
  + Complete rewrite of all searches to allow different "views" of kernels
    for maximum performance for all-BLAS usage; present implementation very
    slow even with archdefs, will need to be sped up before stable
  + Complete rewrite of gemm kernel choice mechanism
  + Complete rewrite of all BLAS handling for much improved small/medium
    perf via greater use of microkernels
* Addition of core count to archdefs, because this usually increases block
  factors when maximizing performance
* Addition of -ansi flag to avoid C changes borking include files
* Archdef support for a host of modern Intel/AMD + POWER9:
- Corei264AVXp16, Corei3EP64AVXMACp36, Corei4X64AVXZp18,
- AMD64K10h64SSE3p32, AMDRyzen64AVXMACp[8,16,64]
- ARM64xgene164p8, ARM64thund64p48
- POWER964LEVSXp8
* Addition of cpuid-based cache detection for Intel & AMD x86 machines
  - Presently gets the wrong answer on some machines, where shared caches
    are either multiplied or divided by P inappropriately
* Beginning of rewrite of generic cache detection
* Fixed bug where names like "c99-gcc" were preferred over "gcc"
* Added -Si indthr 1 option to autoprobe for aliased thread IDs
+ Presently, only supported on ARM64 & x86 with at least SSE2
* Complete rewrite of gemm kernel indexing to compact data structures
and minimize cache pollution
|
|
From: R. C. W. <rcw...@iu...> - 2017-08-17 13:15:37
|
Guys,

I am now at Indiana University, having just completed my move, and am presently preparing to teach next week. This is a reason for the delay in responding to the several 3.10 patches/questions just recently. I am keeping your e-mails, and will respond as soon as I get on top of the new place and its processes.

The recent delay in developer releases is because I rewrote my microkernel handling for greater efficiency, and it has taken a *long* time to get it working again. We are presently working on greatly improving our non-GEMM small-case performance, which I think is going to be worth the wait when I get it out.

Anyway, I'm still working on both stable & developer, and will respond as soon as I can.

Cheers,
Clint

--
R. Clint Whaley, PhD, Assoc Prof, IU
http://homes.soic.indiana.edu/rcwhaley/
|
From: R. C. W. <wh...@my...> - 2017-06-30 00:14:15
|
> The implementation of HT has improved over the years, so please don't
> assume results obtained on older processors are applicable to the
> current ones. I used to be a HT skeptic but almost everything runs
> faster with them on Haswell and later, particularly the client parts
> (i.e. Core series as opposed to Xeon).

Unless they have changed the definition of what HT does, I do not see a theoretical way to avoid the cache problem.

> > You might try running an actual application, where you get a mix
> > of kernels. This tends to stress the cache more, and can
> > sometimes expose the downside of HT.
>
> On the other hand, idle HTs help with OS interrupts and other stuff
> that happens quite a bit in an HPC environment once one starts using
> MPI etc. This is one of the reasons I encourage everyone to enable HT
> in the BIOS even if their applications don't use them.

If the OS interrupts, it's interrupting all threads, so I don't think I'm following this line of thought. Maybe you mean that if you have a huge stack of threads to be run, using HT you have 2 or 4 slots to round robin into once interrupted?

> > I remember finding slight speedup in some case leading me to think
> > HT was helpful, but then I had performance collapses other places,
> > which led me to recommend turning it off (or using affinity to
> > avoid it, like MKL is doing, if you can't turn it off) to maximize
> > performance.
>
> If nothing else, HT doubles the number of threads, which hurts any
> part of a code that scales poorly, and it makes it harder to manage
> affinity. I had to spend quite a bit of time helping users with SMT
> (2-4 HW threads per core) on Blue Gene/Q in my old job.

> > So, for instance, take LAPACK or ATLAS LU or QR (or your own
> > version) and hook them up to the two BLAS. Does the non-MKL
> > HT-liking kernel get anywhere close to MKL performance despite
> > its gemm looking as good with HT, or does it collapse its
> > performance while MKL maintains?
>
> I don't have test driver for those already so I'm afraid I'm not going
> to punt on those experiments. However, if somebody else posts the
> code, I'll certainly run it and post results for generally available
> hardware.

ATLAS comes with timers for any or all of these. They are built to time others' libs too. For instance, set BLASlib to MKL, set FLAPACKlib to your f77 LAPACK, and "make xdtlatime_fl_sb" will time using MKL + LAPACK. Switch BLASlib to BLIS now, remake, voila.

> > My guess is the MKL group got the same "HT not-reliable, non-HT
> > is" results, and that's why it's behaving in this way.
>
> Maybe. In any case, it simplifies the design space to not have to
> think about >1 threads sharing an L1.

L1 is not the problem on modern machines. As you scale up, as with the Xeon-E series, you need to use every scrap of cache, including shared. If you use the full scale of something like 12 cores per shared cache, I believe you will see substantial slowdowns from HT.

Cheers,
Clint
|
From: Jeff H. <jef...@gm...> - 2017-06-29 23:33:40
|
On Thu, Jun 29, 2017 at 4:10 PM, R. Clint Whaley <rcw...@ls...> wrote:

> Yeah, if it can't get that perf w/o hyperthreading, its not fully tuned.

Agreed. BLIS is just a framework and I'm using the default blocking parameters. I know from discussions with Greg Henry that scaling all the way out on the high-core-count Xeon processors requires some algorithm changes. I expect that if I play around with the knobs of BLIS, it will perform optimally with 1 HT per core.

> Back in day when I investigated HT, the problem really is in cache
> stomping, as two threads compete for the same cache. This makes the
> effects unpredictable (if the cache wasn't being fully utilized, maybe no
> effect, if you get lucky on the replacement, maybe tiny effect, and if you
> get unlucky, a truly bad dropoff).

The implementation of HT has improved over the years, so please don't assume results obtained on older processors are applicable to the current ones. I used to be a HT skeptic but almost everything runs faster with them on Haswell and later, particularly the client parts (i.e. Core series as opposed to Xeon).

> You might try running an actual application, where you get a mix of
> kernels. This tends to stress the cache more, and can sometimes expose
> the downside of HT.

On the other hand, idle HTs help with OS interrupts and other stuff that happens quite a bit in an HPC environment once one starts using MPI etc. This is one of the reasons I encourage everyone to enable HT in the BIOS even if their applications don't use them.

> I remember finding slight speedup in some case leading me to think HT was
> helpful, but then I had performance collapses other places, which led me
> to recommend turning it off (or using affinity to avoid it, like MKL is
> doing, if you can't turn it off) to maximize performance.

If nothing else, HT doubles the number of threads, which hurts any part of a code that scales poorly, and it makes it harder to manage affinity. I had to spend quite a bit of time helping users with SMT (2-4 HW threads per core) on Blue Gene/Q in my old job.

> So, for instance, take LAPACK or ATLAS LU or QR (or your own version) and
> hook them up to the two BLAS. Does the non-MKL HT-liking kernel get
> anywhere close to MKL performance despite its gemm looking as good with
> HT, or does it collapse its performance while MKL maintains?

I don't have test drivers for those already so I'm afraid I'm not going to punt on those experiments. However, if somebody else posts the code, I'll certainly run it and post results for generally available hardware.

> My guess is the MKL group got the same "HT not-reliable, non-HT is"
> results, and that's why its behaving in this way.

Maybe. In any case, it simplifies the design space to not have to think about >1 threads sharing an L1.

Jeff
|
From: R. C. W. <rcw...@ls...> - 2017-06-29 23:14:53
|
just realized my reply only went to Jeff.

-------- Forwarded Message --------
Subject: Re: [atlas-devel] Compiling Atlas with hyperthreading
Date: Thu, 29 Jun 2017 17:22:05 -0500
From: R. Clint Whaley <rcw...@ls...>
To: Jeff Hammond <jef...@gm...>

Jeff,

Have you run a thread monitor to see if MKL is simply not using the hyperthreading regardless of whether it is on or off in BIOS?

You also may want to try something like LU.

Cheers,
Clint
|
From: R. C. W. <rcw...@ls...> - 2017-06-29 23:10:56
|
Yeah, if it can't get that perf w/o hyperthreading, it's not fully tuned.

Back in the day when I investigated HT, the problem really is in cache stomping, as two threads compete for the same cache. This makes the effects unpredictable (if the cache wasn't being fully utilized, maybe no effect; if you get lucky on the replacement, maybe a tiny effect; and if you get unlucky, a truly bad dropoff).

You might try running an actual application, where you get a mix of kernels. This tends to stress the cache more, and can sometimes expose the downside of HT.

I remember finding slight speedups in some cases, leading me to think HT was helpful, but then I had performance collapses other places, which led me to recommend turning it off (or using affinity to avoid it, like MKL is doing, if you can't turn it off) to maximize performance.

So, for instance, take LAPACK or ATLAS LU or QR (or your own version) and hook them up to the two BLAS. Does the non-MKL HT-liking kernel get anywhere close to MKL performance despite its gemm looking as good with HT, or does it collapse its performance while MKL maintains?

My guess is the MKL group got the same "HT not-reliable, non-HT is" results, and that's why it's behaving in this way.

Thanks for results!
Clint

On 06/29/2017 05:56 PM, Hammond, Jeff R wrote:
> Good catch. strace shows only 35 calls to clone in both cases with MKL. I didn't know that MKL was doing these tricks.
>
> However, I tested another DGEMM implementation that supports AVX2 and it uses all of the HTs and it performs on par with MKL, but only when HT is used.
>
> Jeff
>
> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=72 KMP_AFFINITY=compact,granularity=fine strace ../test_libblis.x 2>&1 | head -n5000 | grep -c clone
> 71
> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=36 KMP_AFFINITY=scatter,granularity=fine strace ../test_libblis.x 2>&1 | head -n5000 | grep -c clone
> 35
>
> blis_dgemm_nn_rrr results (GFLOPS, all PASS, residuals identical across runs):
>
> | N (=M=K) | 72T compact | 36T scatter | 72T scatter | 36T compact |
> |---|---|---|---|---|
> | 384 | 204.027 | 161.331 | 159.789 | 189.615 |
> | 768 | 650.820 | 437.967 | 443.810 | 423.504 |
> | 1152 | 816.355 | 545.498 | 536.077 | 445.424 |
> | 1536 | 835.650 | 616.338 | 596.069 | 444.830 |
> | 1920 | 832.179 | 606.650 | 595.763 | 442.893 |
> | 2304 | 863.123 | 611.153 | 616.531 | 445.979 |
> | 2688 | 844.502 | 603.314 | 591.823 | 445.694 |
> | 3072 | 860.262 | 631.292 | 615.153 | 451.026 |
> | 3456 | 851.694 | 625.833 | 621.714 | 454.909 |
> | 3840 | 856.526 | | | |

--
R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley
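The LU experiment suggested above does not require ATLAS's own timers; a rough, hypothetical driver (assuming the netlib LAPACKE interface, not anything shipped with ATLAS) that can be linked alternately against each BLAS might look like this:

```c
/* Hypothetical LU timing driver for the experiment Clint suggests: time
 * LAPACK dgetrf linked against each BLAS in turn and compare the rates.
 * Assumes netlib LAPACKE; libraries and link order will vary per BLAS. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <lapacke.h>

int main(int argc, char **argv)
{
    const lapack_int n = (argc > 1) ? atoi(argv[1]) : 8000;
    double *A = malloc((size_t)n * n * sizeof *A);
    lapack_int *ipiv = malloc((size_t)n * sizeof *ipiv);
    srand(7);
    for (size_t i = 0; i < (size_t)n * n; i++)
        A[i] = (double)rand() / RAND_MAX - 0.5;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    lapack_int info = LAPACKE_dgetrf(LAPACK_COL_MAJOR, n, n, A, n, ipiv);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double t = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
    /* dgetrf costs roughly (2/3)n^3 flops; the HT question is whether this
     * blocked factorization holds its rate, not just isolated large GEMMs. */
    printf("n=%d  info=%d  %.3f s  %.1f Gflop/s\n",
           (int)n, (int)info, t, (2.0 / 3.0) * n * n * n / t / 1e9);
    free(A);
    free(ipiv);
    return 0;
}
```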
|
From: Jeff H. <jef...@gm...> - 2017-06-29 22:16:03
|
I don't see any negative impact from using HT relative to not using HT, at
least with MKL DGEMM on E5-2699v3 (Haswell). The 0.1-0.5% gain here is
irrelevant and may be due to thermal effects (this box is in my cubicle,
not an air-conditioned machine room).
$ OMP_NUM_THREADS=36 KMP_AFFINITY=scatter,granularity=fine
./dgemm_perf_PMKL.x $((384*40)) $((384*40)) $((384*4))
BLAS_NAME dim1 dim2 dim3 seconds Gflop/s
Intel MKL (parallel) 15360 15360 1536 0.8582699 844.4612765
Intel MKL (parallel) 15360 15360 1536 0.8627163 840.1089930
HT on
$ OMP_NUM_THREADS=72 KMP_AFFINITY=scatter,granularity=fine
./dgemm_perf_PMKL.x $((384*40)) $((384*40)) $((384*4))
BLAS_NAME dim1 dim2 dim3 seconds Gflop/s
Intel MKL (parallel) 15360 15360 1536 0.8636520 839.1988073
Intel MKL (parallel) 15360 15360 1536 0.8644268 838.4465853
I would be interested to see folks post data to support the argument
against HT.
Jeff
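
The Gflop/s column can be sanity-checked from the standard 2*m*n*k flop count for GEMM. The benchmark harness itself is not shown here, so the C snippet below is only illustrative and simply reproduces the arithmetic for the first MKL run above.

    #include <stdio.h>

    int main(void)
    {
        /* dimensions and time taken from the first MKL run above */
        double m = 15360.0, n = 15360.0, k = 1536.0;
        double seconds = 0.8582699;
        double gflops = 2.0 * m * n * k / seconds / 1.0e9;
        printf("%.2f Gflop/s\n", gflops);   /* prints ~844.46 */
        return 0;
    }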
On Thu, Jun 29, 2017 at 7:57 AM, lixin chu via Math-atlas-devel <
mat...@li...> wrote:
>
> Thank you very much for quick response. Just to check if my understanding
is correct :
>
> 1. By turning off cpuid in bios, I only need to use -t N to build Atlas
right?
>
> 2. The N in -t N is the total number of threads on the machine, not per
Cpu right ?
>
> 3. One more question I have is, how to set the correct -t N for mpi based
application.
> Let's say on the 2-cpu machine with 4 cores per CPU, should I use -t
4 or -t 8 if I rum my application with 2 mpi processes :
> mpirun -n 2 myprogram
>
> Many thanks !
>
> Sent from Yahoo Mail on Android
>
> On Thu, Jun 29, 2017 at 22:20, R. Clint Whaley
> <wh...@my...> wrote:
> Hyperthreading is an optimization aimed at addressing poorly optimized
> code. The idea is that most codes cannot drive the backend hardware
> (ALU/FPU, etc) at the maximal rate, so if you duplicate registers you
> can, amongst several threads, find enough work to keep the backend busy.
>
> ATLAS (or any optimized linear algebra library) already runs the FPU at
> its maximal rate supported by the cache architecture after cache blocking.
>
> If you can already drive the backend at >90% of peak, then
> hyperthreading can actually *lose* you performance, as the threads bring
> conflicting data in the cache.
>
> It's usually not a night and day difference, but I haven't measured it
> in the huge blocking era used by recent developer releases (it may be
> worse there).
>
> My general recommendation is turn off hyperthreading for highly
> optimized codes, and turn it on for relatively unoptimized codes.
>
> As to which core IDs correspond to the physical cores, that varies by
> machine. On x86, you can use CPUID to determine that if you are
> super-knowledgeable. I usually just turn it off in the BIOS, because I
> don't like something that may thrash my cache running, even if it might
> occasionally help :)
>
> Cheers,
> Clint
>
> On 06/28/2017 10:32 PM, lixin chu via Math-atlas-devel wrote:
> > Hello,Would like go check if my understanding is correct for compiling
Atlas on a machine that has multiple CPUs and hyperthreading.
> > I have two types of machine:
> > - 2 CPU, each with 4 Core, hyperthreaded, 2 threads per core- 2 CPU,
each with 8 Cores, hyperthreaded, 2 threads per core
> > So when I compile Atlas, is it correct that I should use:
> > -tl 8 0,1,2,3,4,5,6,7 and -tl 16 0,1,....15 (assuming the affinity ID
is from 0-7 and 0-15).
> > That means the number 8 or 16 is the total cores on the machine, not
number of cores per CPU. Am I correct ?
> > I also read somewhere saying that Atlas supports Hyperthreading. What
does this mean ?
> > Does this mean:1. I do not need to disable hyperthreading in BIOS (no
performance difference whether it is enabled or disabled, as long as the
number of threads and affinity IDs are set correctly when compiling
Atlas)2. Or I can make use of the hyperthread, that is, -tl 16 and -tl 32 ?
> > Thank you very much,
> > lixin
> >
> >
> >
> >
------------------------------------------------------------------------------
> > Check out the vibrant tech community on one of the world's most
> > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> >
> >
> >
> > _______________________________________________
> > Math-atlas-devel mailing list
> > Mat...@li...
> > https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
>
> >
>
>
>
------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
>
>
>
------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
>
--
Jeff Hammond
jef...@gm...
http://jeffhammond.github.io/
|
|
From: lixin c. <lix...@ya...> - 2017-06-29 15:12:02
|
Thank you very much for quick response. Just to check if my understanding is correct : 1. By turning off cpuid in bios, I only need to use -t N to build Atlas right? 2. The N in -t N is the total number of threads on the machine, not per Cpu right ? 3. One more question I have is, how to set the correct -t N for mpi based application. Let's say on the 2-cpu machine with 4 cores per CPU, should I use -t 4 or -t 8 if I rum my application with 2 mpi processes : mpirun -n 2 myprogram Many thanks ! Sent from Yahoo Mail on Android On Thu, Jun 29, 2017 at 22:20, R. Clint Whaley<wh...@my...> wrote: Hyperthreading is an optimization aimed at addressing poorly optimized code. The idea is that most codes cannot drive the backend hardware (ALU/FPU, etc) at the maximal rate, so if you duplicate registers you can, amongst several threads, find enough work to keep the backend busy. ATLAS (or any optimized linear algebra library) already runs the FPU at its maximal rate supported by the cache architecture after cache blocking. If you can already drive the backend at >90% of peak, then hyperthreading can actually *lose* you performance, as the threads bring conflicting data in the cache. It's usually not a night and day difference, but I haven't measured it in the huge blocking era used by recent developer releases (it may be worse there). My general recommendation is turn off hyperthreading for highly optimized codes, and turn it on for relatively unoptimized codes. As to which core IDs correspond to the physical cores, that varies by machine. On x86, you can use CPUID to determine that if you are super-knowledgeable. I usually just turn it off in the BIOS, because I don't like something that may thrash my cache running, even if it might occasionally help :) Cheers, Clint On 06/28/2017 10:32 PM, lixin chu via Math-atlas-devel wrote: > Hello,Would like go check if my understanding is correct for compiling Atlas on a machine that has multiple CPUs and hyperthreading. > I have two types of machine: > - 2 CPU, each with 4 Core, hyperthreaded, 2 threads per core- 2 CPU, each with 8 Cores, hyperthreaded, 2 threads per core > So when I compile Atlas, is it correct that I should use: > -tl 8 0,1,2,3,4,5,6,7 and -tl 16 0,1,....15 (assuming the affinity ID is from 0-7 and 0-15). > That means the number 8 or 16 is the total cores on the machine, not number of cores per CPU. Am I correct ? > I also read somewhere saying that Atlas supports Hyperthreading. What does this mean ? > Does this mean:1. I do not need to disable hyperthreading in BIOS (no performance difference whether it is enabled or disabled, as long as the number of threads and affinity IDs are set correctly when compiling Atlas)2. Or I can make use of the hyperthread, that is, -tl 16 and -tl 32 ? > Thank you very much, > lixin > > > > ------------------------------------------------------------------------------ > Check out the vibrant tech community on one of the world's most > engaging tech sites, Slashdot.org! http://sdm.link/slashdot > > > > _______________________________________________ > Math-atlas-devel mailing list > Mat...@li... > https://lists.sourceforge.net/lists/listinfo/math-atlas-devel > ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot _______________________________________________ Math-atlas-devel mailing list Mat...@li... https://lists.sourceforge.net/lists/listinfo/math-atlas-devel |
|
From: R. C. W. <wh...@my...> - 2017-06-29 14:20:34
|
Hyperthreading is an optimization aimed at addressing poorly optimized code. The idea is that most codes cannot drive the backend hardware (ALU/FPU, etc) at the maximal rate, so if you duplicate registers you can, amongst several threads, find enough work to keep the backend busy. ATLAS (or any optimized linear algebra library) already runs the FPU at its maximal rate supported by the cache architecture after cache blocking. If you can already drive the backend at >90% of peak, then hyperthreading can actually *lose* you performance, as the threads bring conflicting data in the cache. It's usually not a night and day difference, but I haven't measured it in the huge blocking era used by recent developer releases (it may be worse there). My general recommendation is turn off hyperthreading for highly optimized codes, and turn it on for relatively unoptimized codes. As to which core IDs correspond to the physical cores, that varies by machine. On x86, you can use CPUID to determine that if you are super-knowledgeable. I usually just turn it off in the BIOS, because I don't like something that may thrash my cache running, even if it might occasionally help :) Cheers, Clint On 06/28/2017 10:32 PM, lixin chu via Math-atlas-devel wrote: > Hello,Would like go check if my understanding is correct for compiling Atlas on a machine that has multiple CPUs and hyperthreading. > I have two types of machine: > - 2 CPU, each with 4 Core, hyperthreaded, 2 threads per core- 2 CPU, each with 8 Cores, hyperthreaded, 2 threads per core > So when I compile Atlas, is it correct that I should use: > -tl 8 0,1,2,3,4,5,6,7 and -tl 16 0,1,....15 (assuming the affinity ID is from 0-7 and 0-15). > That means the number 8 or 16 is the total cores on the machine, not number of cores per CPU. Am I correct ? > I also read somewhere saying that Atlas supports Hyperthreading. What does this mean ? > Does this mean:1. I do not need to disable hyperthreading in BIOS (no performance difference whether it is enabled or disabled, as long as the number of threads and affinity IDs are set correctly when compiling Atlas)2. Or I can make use of the hyperthread, that is, -tl 16 and -tl 32 ? > Thank you very much, > lixin > > > > ------------------------------------------------------------------------------ > Check out the vibrant tech community on one of the world's most > engaging tech sites, Slashdot.org! http://sdm.link/slashdot > > > > _______________________________________________ > Math-atlas-devel mailing list > Mat...@li... > https://lists.sourceforge.net/lists/listinfo/math-atlas-devel > |
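
For readers who do not want to go the CPUID or BIOS route, one way to see which logical CPU IDs share a physical core on Linux is to read the sysfs topology files. The sketch below is Linux-specific and not part of ATLAS; it is just a quick check before choosing affinity IDs for -tl.

    #include <stdio.h>

    /* Map each logical CPU to its package and core so hyperthread siblings
     * (logical CPUs reporting the same package/core pair) can be spotted.
     * Linux-specific: relies on /sys/devices/system/cpu/cpuN/topology/. */
    int main(void)
    {
        for (int i = 0; ; i++) {
            char path[128];
            int core, pkg;
            FILE *fp;
            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu%d/topology/core_id", i);
            if (!(fp = fopen(path, "r"))) break;   /* no more CPUs */
            if (fscanf(fp, "%d", &core) != 1) { fclose(fp); break; }
            fclose(fp);
            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu%d/topology/physical_package_id", i);
            if (!(fp = fopen(path, "r"))) break;
            if (fscanf(fp, "%d", &pkg) != 1) { fclose(fp); break; }
            fclose(fp);
            printf("logical cpu %2d -> package %d, core %d\n", i, pkg, core);
        }
        return 0;
    }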
|
From: lixin c. <lix...@ya...> - 2017-06-29 03:32:37
|
Hello, I would like to check if my understanding is correct for compiling ATLAS on a machine that has multiple CPUs and hyperthreading. I have two types of machine: one with 2 CPUs, each with 4 cores, hyperthreaded with 2 threads per core, and one with 2 CPUs, each with 8 cores, hyperthreaded with 2 threads per core. When I compile ATLAS, is it correct that I should use -tl 8 0,1,2,3,4,5,6,7 and -tl 16 0,1,...,15 respectively (assuming the affinity IDs run from 0-7 and 0-15)? That is, the number 8 or 16 is the total number of cores on the machine, not the number of cores per CPU. Am I correct? I also read somewhere that ATLAS supports hyperthreading. What does this mean? Does it mean: 1. I do not need to disable hyperthreading in the BIOS (there is no performance difference whether it is enabled or disabled, as long as the number of threads and affinity IDs are set correctly when compiling ATLAS), or 2. I can make use of the hyperthreads, that is, -tl 16 and -tl 32? Thank you very much, lixin |
|
From: R. C. W. <rcw...@ls...> - 2017-03-20 13:47:19
|
So far, it still must be compile-time chosen. We need it for affinity, which is necessary when OS does a poor job of managing the threads. Eventually I may be able to support run-time choice for the OpenMP implementation, which has its own scheduler (though in the cases where ATLAS used affinity in past it got horrible performance). Right now, I have not yet gotten time to look at that part of the threading package, as I'm in the middle of big kernel redesign still. Regards, Clint On 03/19/2017 12:20 PM, José Luis García Pallero wrote: > Hello: > > I've not used ATLAS for a while and I would like to ask if the library > has yet the ability to select the number of execution thread at > execution time instead of at compilation time. I remember that this > feature was discussed in the past, but I'm not sure if finally it was > considered for the future > > Thanks > -- ********************************************************************** ** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley ** ********************************************************************** |
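
As background to the OpenMP remark above, run-time selection in an OpenMP build would ultimately defer to the OpenMP runtime's own controls. The fragment below is plain OpenMP, not an ATLAS interface, and only shows what that mechanism looks like.

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        /* thread count chosen at run time, e.g. from a user setting or
         * OMP_NUM_THREADS, instead of being frozen at compile time */
        omp_set_num_threads(4);
        #pragma omp parallel
        {
            #pragma omp single
            printf("running with %d threads\n", omp_get_num_threads());
        }
        return 0;
    }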
|
From: José L. G. P. <jgp...@gm...> - 2017-03-19 17:21:06
|
Hello: I've not used ATLAS for a while and I would like to ask if the library has yet the ability to select the number of execution thread at execution time instead of at compilation time. I remember that this feature was discussed in the past, but I'm not sure if finally it was considered for the future Thanks -- ***************************************** José Luis García Pallero jgp...@gm... (o< / / \ V_/_ Use Debian GNU/Linux and enjoy! ***************************************** |
|
From: Jeff H. <jef...@gm...> - 2017-01-18 19:30:22
|
I have no idea why this email is full of formatting puke but if it is my fault, I sincerely apologize. Gmail has been going downhill for a while. Jeff |
|
From: Jeff H. <jef...@gm...> - 2017-01-18 19:27:22
|
<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jan 18, 2017 at 4:31 AM, john skaller <span dir="ltr"><<a href="mailto:sk...@us..." target="_blank">sk...@us...</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">><br> > Who would demand this? No one in the Windows world cares about C99. The only folks I know who want MSVC to support C99 are HPC developers who still think Windows support matters.<br> ><br> <br> </span><a href="https://stackoverflow.com/questions/9610747/which-c99-features-are-available-in-the-ms-visual-studio-compiler" data-saferedirecturl="https://www.google.com/url?hl=en&q=http://stackoverflow.com/questions/9610747/which-c99-features-are-available-in-the-ms-visual-studio-compiler&source=gmail&ust=1484852481953000&usg=AFQjCNE3-KZfmfFadCsdnsu8tEzyE-juBA" rel="noreferrer" target="_blank">http://stackoverflow.com/<wbr>questions/9610747/which-c99-<wbr>features-are-available-in-the-<wbr>ms-visual-studio-compiler</a><br> <span class=""><br></span></blockquote><div><br></div><div>That has useful information on it, but doesn't answer the question of who would demand C99 support from MSVC.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""> > I have no idea why anyone would want long long anyhow.<br> > Use intptr_t instead.<br> ><br> ><br> > "long long" must be at least 64-bits, regardless of how wide pointers are. On a 32-bit OS, you would see sizeof(long long)=2*sizeof(intptr_t), no?<br> <br> </span>Sure. Is there a HPC computing platform that isn’t 64 bit?<br> <span class="im HOEnZb"><br></span></blockquote><div><br></div><div>From what I've seen on this list, ATLAS is popular with folks that want to run BLAS on 32-bit platforms, perhaps in an embedded context. These are not supercomputers but performance matters.</div><div><br></div><div>Jeff</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="im HOEnZb"> —<br> john skaller<br> <a href="mailto:sk...@us...">sk...@us...</a><br> <a href="http://felix-lang.org" data-saferedirecturl="https://www.google.com/url?hl=en&q=http://felix-lang.org&source=gmail&ust=1484852481953000&usg=AFQjCNG9D6Tlg8YMMKRW7KWKwmKZSJKOrg" rel="noreferrer" target="_blank">http://felix-lang.org</a><br> <br> <br> ------------------------------<wbr>------------------------------<wbr>------------------<br> </span><span class="im HOEnZb">Check out the vibrant tech community on one of the world's most<br> engaging tech sites, SlashDot.org! 
<a href="http://sdm.link/slashdot" data-saferedirecturl="https://www.google.com/url?hl=en&q=http://sdm.link/slashdot&source=gmail&ust=1484852481953000&usg=AFQjCNEdgxepWifaye3YEOSX8vHxzwBZKg" rel="noreferrer" target="_blank">http://sdm.link/slashdot</a><br> </span><div class="HOEnZb"><div class="h5">______________________________<wbr>_________________<br> Math-atlas-devel mailing list<br> <a href="mailto:Mat...@li...">Math-atlas-devel@lists.<wbr>sourceforge.net</a><br> <a href="https://lists.sourceforge.net/lists/listinfo/math-atlas-devel" data-saferedirecturl="https://www.google.com/url?hl=en&q=https://lists.sourceforge.net/lists/listinfo/math-atlas-devel&source=gmail&ust=1484852481953000&usg=AFQjCNHcMGmclmITvH_FWV-G52_ODGYDZw" rel="noreferrer" target="_blank">https://lists.sourceforge.net/<wbr>lists/listinfo/math-atlas-<wbr>devel</a><br> </div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature">Jeff Hammond<br><a href="mailto:jef...@gm..." target="_blank">jef...@gm...</a><br><a href="https://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div> </div></div> |
|
From: R. C. W. <rcw...@ls...> - 2017-01-18 15:43:22
|
Thanks to everyone on the C99 stuff. After the *very* helpful comments, it seems only // is safe, and I've not yet found the courage to start using even that. The aesthete in me really wants //, but the engineer says "you are planning to break standards compliance for something that doesn't appear in the compiled code, and making aesthetic arguments in code where you use shifts rather than division & multiplication?" :) On 01/18/2017 06:31 AM, john skaller wrote: > Sure. Is there a HPC computing platform that isn’t 64 bit? For me, at least, ATLAS is not aimed just at HPC computing platforms, which are usually adequately served by vendor-supplied BLAS. ATLAS was created because I couldn't get BLAS for some platforms I wanted to work on. While I don't concentrate on 32-bit, I definitely want everything to work there, and design for it. For x86, 32-bit has code size implications that may be important if Intel keeps pouring most of their engineering into power rather than performance. Historically, I have tried to support any machine with a pipelined FPU, and I think ATLAS has been used (mainly for blocking) on even a few w/o an pipelined FPU :) Cheers, Clint -- ********************************************************************** ** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley ** ********************************************************************** |
|
From: john s. <sk...@us...> - 2017-01-18 12:31:33
|
> > Who would demand this? No one in the Windows world cares about C99. The only folks I know who want MSVC to support C99 are HPC developers who still think Windows support matters. > http://stackoverflow.com/questions/9610747/which-c99-features-are-available-in-the-ms-visual-studio-compiler > I have no idea why anyone would want long long anyhow. > Use intptr_t instead. > > > "long long" must be at least 64-bits, regardless of how wide pointers are. On a 32-bit OS, you would see sizeof(long long)=2*sizeof(intptr_t), no? Sure. Is there a HPC computing platform that isn’t 64 bit? — john skaller sk...@us... http://felix-lang.org |
|
From: Jeff H. <jef...@gm...> - 2017-01-17 22:21:09
|
On Sat, Jan 14, 2017 at 1:11 PM, Andrew Reilly <ar...@bi...> wrote: > > Hi Clint, > > The two compilers with least support for c99 features that I'm aware of are MSVC and TI CodeComposer. Both have most of the support for C99 library features, but both (being primarily C++ compilers) don't have good support for the C99 language features that aren't in C++. > > So: you'll find // comments everywhere. > You'll need a macro to define inline _inline on some systems. +1 to ATLAS_INLINE macro. > > You'll need a macro to define ATLAS_RESTRICT _restrict on at least MSVC. Alas you can't actually use or redefine the keyword "restrict", because that is already a magic keyword used in the Windows header files, and some other Windows magic compilation directives. +1 to ATLAS_RESTRICT macro. > I'm fairly sure that modern versions of MSVC support long long int and %llu, although you might have to spell the former as __int64 on some versions. Fixed-width integer types are part of C++11 ( http://en.cppreference.com/w/cpp/types/integer) so I would expect that MSVC supports them, but I have made no attempts to verify this. > > You will need a macro to define snprintf to _snprintf on MSVC, and you'll need to define _CRT_SECURE_NO_WARNINGS before including any of the standard headers to turn off the deprecation warnings. > > I haven't tried to use _Complex or _Thread_local myself. I have a memory of _Atomic being supported in many places though. I expect that the others are too. > *_Complex* - This is a C99 feature and I don't know of a compiler that doesn't support it. However, just to be safe, you should typedef atlas_complex_{float,double} and follow e.g. https://stackoverflow.com/questions/1063406/c99-complex-support-with-visual-studio if C99 support isn't available. I can't remember what ISO C and Fortran say about the interoperability of their respective complex types but I doubt it is an issue in practice. Clint probably knows what works (and doesn't) already anyways. *_Atomic* - This is a C11 feature and it is a bad one. It is also completely optional (see __STDC_NO_ATOMICS__). You should use the explicit types like atomic_int rather than "_Atomic int" and the explicit API (e.g. atomic_load) rather than relying on operator overloading (ding ding ding - this is why _Atomic is evil and totally un-C-like). The Intel compiler supports the explicit C11 atomics API but not _Atomic and it correctly reports the lack of complete support for C11 atomics via __STDC_NO_ATOMICS__, so you have to explicitly test for the explicit API or query the compiler version macro. https://github.com/jeffhammond/HPCInfo/blob/master/atomics/ping-pong/c11-ping-pong.c demonstrates the latter (it also notes a show-stopper GCC bug if you use mix with OpenMP). GCC and Clang support both C11 atomics APIs. I have not tested Cray C11 support exhaustively, but they have at least the explicit API. *_Thread_local* - This is a C11 feature and it is also strictly option (see __STDC_NO_THREADS__). I recommend you have a macro ATLAS_THREAD_LOCAL for the C11 _Thread_local, GCC __thread, MSVC __declspec(thread), and any other implementation-defined equivalents. One must be careful when mixing TLS (thread-local storage) specifiers with different threading models. I don't think one can guarentee that the TLS attribute associated with C11, GCC and OpenMP are *guaranteed* to work across C11, POSIX and OpenMP threads. Best, Jeff > > Cheers, > > Andrew Reilly > M: 0409-824-272 > ar...@bi... > > > > > On 15 Jan 2017, at 04:35 , R. 
Clint Whaley <rcw...@ls...> wrote: > > > > Guys, > > > > In the developer release, I am considering relaxing ATLAS's present > > strict adherence to ANSI/ISO 9899-1990 standard, so that I can assume > > stuff from C99. Frankly, the lack // is slowly killing me. > > > > Right now, any C99 features are enabled only by macros that can be shut > > off. > > > > There is little benefit aside from aesthetics to this (though safe > > string ops would be *so* nice), so I don't want to do it if anybody > > reports using a compiler that doesn't support these features, but I'm > > thinking that while their might still be some compilers w/o full C99 > > support, they'll all have the features I most want to add. > > > > Here's the list of things I'd definitely like to assume support for that > > I think all compilers support (even likely obscure ones on embedded > > systems): > > // style comments > > inline > > restrict > > long long int, %llu > > Safe string operations, like snprintf (this lack is painful) > > > > In addition there are more advanced features that might be useful, but > > I'm not sure if I can count on them being universally available: > > _Complex support > > _Atomic > > _Thread_local > > > > Does anyone have comments on this idea? > > > > Thanks, > > Clint > > > > -- > > ********************************************************************** > > ** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley ** > > ********************************************************************** > > > > ------------------------------------------------------------------------------ > > Developer Access Program for Intel Xeon Phi Processors > > Access to Intel Xeon Phi processor-based developer platforms. > > With one year of Intel Parallel Studio XE. > > Training and support from Colfax. > > Order your platform today. http://sdm.link/xeonphi > > _______________________________________________ > > Math-atlas-devel mailing list > > Mat...@li... > > https://lists.sourceforge.net/lists/listinfo/math-atlas-devel > > > ------------------------------------------------------------------------------ > Developer Access Program for Intel Xeon Phi Processors > Access to Intel Xeon Phi processor-based developer platforms. > With one year of Intel Parallel Studio XE. > Training and support from Colfax. > Order your platform today. http://sdm.link/xeonphi > _______________________________________________ > Math-atlas-devel mailing list > Mat...@li... > https://lists.sourceforge.net/lists/listinfo/math-atlas-devel -- Jeff Hammond jef...@gm... http://jeffhammond.github.io/ |
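
To make the C11 atomics advice above concrete, the fragment below sticks to the explicit types and explicit API and tests __STDC_NO_ATOMICS__ first, as suggested; it is a minimal illustration rather than anything taken from ATLAS.

    #include <stdio.h>

    #if defined(__STDC_NO_ATOMICS__)
    #error "no C11 atomics here; a platform-specific fallback would be needed"
    #else
    #include <stdatomic.h>

    static atomic_int counter = ATOMIC_VAR_INIT(0);

    int main(void)
    {
        /* explicit API (atomic_fetch_add/atomic_load) rather than relying on
         * operator overloading of an _Atomic-qualified object */
        atomic_fetch_add(&counter, 1);
        printf("counter = %d\n", atomic_load(&counter));
        return 0;
    }
    #endif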
|
From: Jeff H. <jef...@gm...> - 2017-01-17 22:00:36
|
On Sat, Jan 14, 2017 at 6:33 PM, john skaller <sk...@us... > wrote: > > > On 15 Jan. 2017, at 08:11, Andrew Reilly <ar...@bi...> wrote: > > > > Hi Clint, > > > > The two compilers with least support for c99 features that I'm aware of > are MSVC and TI CodeComposer. Both have most of the support for C99 > library features, but both (being primarily C++ compilers) don’t have good > support for the C99 language features that aren't in C++. > > Doesn’t modern MSVC provide full C99 support? > >From what I've heard, there has been no progress on this except the cases where C99 features were added to C++11. > I though MS caved in to demands? > > Who would demand this? No one in the Windows world cares about C99. The only folks I know who want MSVC to support C99 are HPC developers who still think Windows support matters. > > > I’m fairly sure that modern versions of MSVC support long long int and > %llu, although you might have to spell the former as __int64 on some > versions. > > %llu works for "long long unsigned". For int64_t, you need the PRId64 macro. Since __int64 isn't standard, one does whatever the compiler docs specify. > I have no idea why anyone would want long long anyhow. > Use intptr_t instead. > > "long long" must be at least 64-bits, regardless of how wide pointers are. On a 32-bit OS, you would see sizeof(long long)=2*sizeof(intptr_t), no? Jeff, speaking in a strictly personal capacity > > — > john skaller > sk...@us... > http://felix-lang.org > > > ------------------------------------------------------------ > ------------------ > Developer Access Program for Intel Xeon Phi Processors > Access to Intel Xeon Phi processor-based developer platforms. > With one year of Intel Parallel Studio XE. > Training and support from Colfax. > Order your platform today. http://sdm.link/xeonphi > _______________________________________________ > Math-atlas-devel mailing list > Mat...@li... > https://lists.sourceforge.net/lists/listinfo/math-atlas-devel > -- Jeff Hammond jef...@gm... http://jeffhammond.github.io/ |
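
A small self-contained example of the printing conventions mentioned here, and of the point that long long is at least 64 bits no matter how wide pointers are (on a 32-bit build the last line would typically show 8 vs 4):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int64_t i64 = INT64_C(9000000000);        /* needs more than 32 bits */
        unsigned long long ull = 9000000000ULL;

        printf("int64_t via PRId64: %" PRId64 "\n", i64);
        printf("unsigned long long via %%llu: %llu\n", ull);
        printf("sizeof(long long)=%zu, sizeof(intptr_t)=%zu\n",
               sizeof(long long), sizeof(intptr_t));
        return 0;
    }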
|
From: James C. <cl...@jh...> - 2017-01-17 21:26:05
|
>>>>> "RCW" == R Clint Whaley <rcw...@ls...> writes: RCW> Unfortunately, I can't just macro my way around this lack: supporting RCW> both snprintf and sprintf doubles all my string handling code, which RCW> I'm unwilling to do from a code maintenance perspective, so I'll just RCW> continue with my present C89 behavior there :( You can always include an snprintf(3) implementation. The one from musl is small and is licensed MIT. -JimC -- James Cloos <cl...@jh...> OpenPGP: 0x997A9F17ED7DAEA6 |
|
From: J. R. J. <J.R...@ba...> - 2017-01-16 12:32:32
|
This is fine for me. You could make use of the feature test macros for C99 to produce a helpful error if the support you need isn't there. Jess On Sat, 14 Jan 2017, R. Clint Whaley wrote: > Guys, > > In the developer release, I am considering relaxing ATLAS's present > strict adherence to ANSI/ISO 9899-1990 standard, so that I can assume > stuff from C99. Frankly, the lack // is slowly killing me. > > Right now, any C99 features are enabled only by macros that can be shut > off. > > There is little benefit aside from aesthetics to this (though safe > string ops would be *so* nice), so I don't want to do it if anybody > reports using a compiler that doesn't support these features, but I'm > thinking that while their might still be some compilers w/o full C99 > support, they'll all have the features I most want to add. > > Here's the list of things I'd definitely like to assume support for that > I think all compilers support (even likely obscure ones on embedded > systems): > // style comments > inline > restrict > long long int, %llu > Safe string operations, like snprintf (this lack is painful) > > In addition there are more advanced features that might be useful, but > I'm not sure if I can count on them being universally available: > _Complex support > _Atomic > _Thread_local > > Does anyone have comments on this idea? > > Thanks, > Clint > > |
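
One way to act on the feature-test-macro suggestion is a compile-time guard on __STDC_VERSION__ (199901L is the C99 value). The error text is just a placeholder; this is a sketch, not existing ATLAS configure logic.

    #if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L
    #error "this source assumes a C99 compiler"
    #endif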
|
From: R. C. W. <rcw...@ls...> - 2017-01-15 16:58:54
|
Andrew, On 01/14/2017 03:11 PM, Andrew Reilly wrote: > Hi Clint, > > The two compilers with least support for c99 features that I'm aware of are MSVC and TI CodeComposer. Both have most of the support for C99 library features, but both (being primarily C++ compilers) don't have good support for the C99 language features that aren't in C++. > > So: you'll find // comments everywhere. > You'll need a macro to define inline _inline on some systems. > You'll need a macro to define ATLAS_RESTRICT _restrict on at least MSVC. Alas you can't actually use or redefine the keyword "restrict", because that is already a magic keyword used in the Windows header files, and some other Windows magic compilation directives. > I'm fairly sure that modern versions of MSVC support long long int and %llu, although you might have to spell the former as __int64 on some versions. > You will need a macro to define snprintf to _snprintf on MSVC, and you'll need to define _CRT_SECURE_NO_WARNINGS before including any of the standard headers to turn off the deprecation warnings. > > I haven't tried to use _Complex or _Thread_local myself. I have a memory of _Atomic being supported in many places though. I expect that the others are too. > Thank you very much for this! It looks like I can't really change much about ATLAS's C use, other than allowing myself to use // then. Of my proposed list, only this and the safe string functions were things that would immediately make my life a lot better, so lack of snprintf support is the only real disappointment. Unfortunately, I can't just macro my way around this lack: supporting both snprintf and sprintf doubles all my string handling code, which I'm unwilling to do from a code maintenance perspective, so I'll just continue with my present C89 behavior there :( Since I have a soft dependence on gcc for the install, I could still switch to snprintf, but the fact that MSVC doesn't support it still means I'm less confident that no embedded system has only 1 compiler that doesn't support snprintf, and for which modern gcc is not ported, so I'll not switch to snprint. For anyone concerned about security here: The string handling doesn't wind up in the ATLAS library, so it's not really a matter for user security. I do a lot of string handling during the tuning and generation stages, which would either be vastly simplified or made less likely to segfault using snprints, which is why I would have liked to make the change. On the long long type name, I'm already using a macro that can be changed to another name, but I had no way to print out such values in C89, even though many compilers supported the type, so the fact that llu will work is good. Many thanks, Clint > Cheers, > > Andrew Reilly > M: 0409-824-272 > ar...@bi... > > > >> On 15 Jan 2017, at 04:35 , R. Clint Whaley <rcw...@ls...> wrote: >> >> Guys, >> >> In the developer release, I am considering relaxing ATLAS's present >> strict adherence to ANSI/ISO 9899-1990 standard, so that I can assume >> stuff from C99. Frankly, the lack // is slowly killing me. >> >> Right now, any C99 features are enabled only by macros that can be shut >> off. >> >> There is little benefit aside from aesthetics to this (though safe >> string ops would be *so* nice), so I don't want to do it if anybody >> reports using a compiler that doesn't support these features, but I'm >> thinking that while their might still be some compilers w/o full C99 >> support, they'll all have the features I most want to add. 
>> >> Here's the list of things I'd definitely like to assume support for that >> I think all compilers support (even likely obscure ones on embedded >> systems): >> // style comments >> inline >> restrict >> long long int, %llu >> Safe string operations, like snprintf (this lack is painful) >> >> In addition there are more advanced features that might be useful, but >> I'm not sure if I can count on them being universally available: >> _Complex support >> _Atomic >> _Thread_local >> >> Does anyone have comments on this idea? >> >> Thanks, >> Clint >> >> -- >> ********************************************************************** >> ** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley ** >> ********************************************************************** >> >> ------------------------------------------------------------------------------ >> Developer Access Program for Intel Xeon Phi Processors >> Access to Intel Xeon Phi processor-based developer platforms. >> With one year of Intel Parallel Studio XE. >> Training and support from Colfax. >> Order your platform today. http://sdm.link/xeonphi >> _______________________________________________ >> Math-atlas-devel mailing list >> Mat...@li... >> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel > > > ------------------------------------------------------------------------------ > Developer Access Program for Intel Xeon Phi Processors > Access to Intel Xeon Phi processor-based developer platforms. > With one year of Intel Parallel Studio XE. > Training and support from Colfax. > Order your platform today. http://sdm.link/xeonphi > _______________________________________________ > Math-atlas-devel mailing list > Mat...@li... > https://lists.sourceforge.net/lists/listinfo/math-atlas-devel > -- ********************************************************************** ** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley ** ********************************************************************** |
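
On printing the configurable 64-bit type Clint mentions: with %llu now assumed, one portable idiom is to cast through unsigned long long at the call site, so the underlying type macro can change without touching format strings. ATL_LONG64 below is a made-up stand-in, not ATLAS's actual macro.

    #include <stdio.h>

    #define ATL_LONG64 long long  /* made-up stand-in for the configurable type macro */

    int main(void)
    {
        unsigned ATL_LONG64 nflops = 724775731200ULL;   /* wider than 32 bits */
        /* the cast keeps %llu correct even if ATL_LONG64 is later redefined */
        printf("flop count = %llu\n", (unsigned long long) nflops);
        return 0;
    }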
|
From: john s. <sk...@us...> - 2017-01-15 02:49:43
|
> On 15 Jan. 2017, at 08:11, Andrew Reilly <ar...@bi...> wrote: > > Hi Clint, > > The two compilers with least support for c99 features that I'm aware of are MSVC and TI CodeComposer. Both have most of the support for C99 library features, but both (being primarily C++ compilers) don’t have good support for the C99 language features that aren't in C++. Doesn’t modern MSVC provide full C99 support? I thought MS caved in to demands? > I’m fairly sure that modern versions of MSVC support long long int and %llu, although you might have to spell the former as __int64 on some versions. I have no idea why anyone would want long long anyhow. Use intptr_t instead. — john skaller sk...@us... http://felix-lang.org |
|
From: Andrew R. <ar...@bi...> - 2017-01-14 21:41:00
|
Hi Clint, The two compilers with least support for c99 features that I'm aware of are MSVC and TI CodeComposer. Both have most of the support for C99 library features, but both (being primarily C++ compilers) don't have good support for the C99 language features that aren't in C++. So: you'll find // comments everywhere. You'll need a macro to define inline _inline on some systems. You'll need a macro to define ATLAS_RESTRICT _restrict on at least MSVC. Alas you can't actually use or redefine the keyword "restrict", because that is already a magic keyword used in the Windows header files, and some other Windows magic compilation directives. I'm fairly sure that modern versions of MSVC support long long int and %llu, although you might have to spell the former as __int64 on some versions. You will need a macro to define snprintf to _snprintf on MSVC, and you'll need to define _CRT_SECURE_NO_WARNINGS before including any of the standard headers to turn off the deprecation warnings. I haven't tried to use _Complex or _Thread_local myself. I have a memory of _Atomic being supported in many places though. I expect that the others are too. Cheers, Andrew Reilly M: 0409-824-272 ar...@bi... > On 15 Jan 2017, at 04:35 , R. Clint Whaley <rcw...@ls...> wrote: > > Guys, > > In the developer release, I am considering relaxing ATLAS's present > strict adherence to ANSI/ISO 9899-1990 standard, so that I can assume > stuff from C99. Frankly, the lack // is slowly killing me. > > Right now, any C99 features are enabled only by macros that can be shut > off. > > There is little benefit aside from aesthetics to this (though safe > string ops would be *so* nice), so I don't want to do it if anybody > reports using a compiler that doesn't support these features, but I'm > thinking that while their might still be some compilers w/o full C99 > support, they'll all have the features I most want to add. > > Here's the list of things I'd definitely like to assume support for that > I think all compilers support (even likely obscure ones on embedded > systems): > // style comments > inline > restrict > long long int, %llu > Safe string operations, like snprintf (this lack is painful) > > In addition there are more advanced features that might be useful, but > I'm not sure if I can count on them being universally available: > _Complex support > _Atomic > _Thread_local > > Does anyone have comments on this idea? > > Thanks, > Clint > > -- > ********************************************************************** > ** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley ** > ********************************************************************** > > ------------------------------------------------------------------------------ > Developer Access Program for Intel Xeon Phi Processors > Access to Intel Xeon Phi processor-based developer platforms. > With one year of Intel Parallel Studio XE. > Training and support from Colfax. > Order your platform today. http://sdm.link/xeonphi > _______________________________________________ > Math-atlas-devel mailing list > Mat...@li... > https://lists.sourceforge.net/lists/listinfo/math-atlas-devel |
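
A sketch of the kind of compatibility header Andrew is describing: the macro names (ATLAS_INLINE, ATLAS_RESTRICT) follow the suggestions in this thread rather than existing ATLAS identifiers, the exact MSVC spellings vary by compiler version, and per the advice above such a header would need to be included before any standard headers.

    #if defined(_MSC_VER)
       #define _CRT_SECURE_NO_WARNINGS      /* silence MSVC deprecation warnings */
       #define ATLAS_INLINE   __inline
       #define ATLAS_RESTRICT __restrict
       #define snprintf       _snprintf     /* older MSVC spelling of snprintf */
    #else
       #define ATLAS_INLINE   inline
       #define ATLAS_RESTRICT restrict
    #endif

    /* example use of the macros */
    static ATLAS_INLINE void axpy1(int n, double alpha,
                                   const double *ATLAS_RESTRICT x,
                                   double *ATLAS_RESTRICT y)
    {
        int i;
        for (i = 0; i < n; i++)
            y[i] += alpha * x[i];
    }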