
#50 Illegal instruction

Milestone: 1.9
Status: closed
Labels: None
Updated: 2015-08-22
Created: 2015-08-14
Private: No

I got this error when running WSClean 1.8 with a mask image on an AMD processor. The software was compiled on an Intel machine. I did not add any compiler flags; I used the default configuration.

The following command runs OK:
/home/jsm/LOFAR/local/release/bin/wsclean -j 12 -absmem 768 -reorder -name imfield0_clusters1nm -size 4320 4320 -scale 1.5arcsec -weight briggs -0.5 -niter 10000 -cleanborder 0 -threshold 0.000525 -minuv-l 5.0 -mgain 0.6 -fitbeam -datacolumn DATA -no-update-model-required field.ms

But when I add a mask (-casamask templatemask_s1.masktmp), it fails:
/home/jsm/LOFAR/local/release/bin/wsclean -j 12 -absmem 768 -reorder -name imfield0_clusters1nm -size 4320 4320 -casamask templatemask_s1.masktmp -scale 1.5arcsec -weight briggs -0.5 -niter 10000 -cleanborder 0 -threshold 0.000525 -minuv-l 5.0 -mgain 0.6 -fitbeam -datacolumn DATA -no-update-model-required field.ms

The error is the following:

WSClean version 1.8 (2015-05-21)
This software package is released under the GPL version 3.
Author: André Offringa (offringa@gmail.com).

Reordering 1754970 selected rows into 1 x 1 parts.
Reordering: 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
Initializing model visibilities: 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
Detected 189.3 GB of system memory, usage limited to 189.3 GB (frac=100%, limit=768GB)
Opening reordered part 0 for field.ms
Precalculating weights for Briggs'(-0.5) weighting... DONE
== Constructing PSF ==
Selected channels: 0-20
Determining min and max w & theoretical beam size... DONE (w=[6.3788e-05:17012.3] lambdas, maxuvw=34233.2 lambda, beam=6.03'')
Setting small inversion image size of 2152 x 2152
Suggested number of w-layers: 27
Will process 27/27 w-layers per pass.
Visibility count per layer: 23819973 4723377 2273607 1162247 750533 536924 386863 326958 262643 237005 179482 153022 77375 41764 39833 38398 38533 34492 10433 1607 807 762 767 771 757 422 45
Gridding pass 0...
Rows that were required: 1754970/1754970
Fourier transforms...
Total rows read: 1754970 (overhead: 0%)
Freed 51 image buffer(s).
FFT 2152 x 2152 real -> complex...
FFT 4320 x 4320 complex -> real...
Storing imfield0_clusters1nm-psf-I-tmp.fits
Fitting beam... major=9.27'', minor=6.63'', PA=105.86 deg, theoretical=6.03''.
Writing psf image... DONE
== Constructing image ==
Selected channels: 0-20
Determining min and max w & theoretical beam size... DONE (w=[6.3788e-05:17012.3] lambdas, maxuvw=34233.2 lambda, beam=6.03'')
Setting small inversion image size of 2152 x 2152
Freed 6 image buffer(s).
Will process 27/27 w-layers per pass.
Gridding pass 0...
Illegal instruction
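The last line of that log is not printed by WSClean itself. On Linux, the kernel sends SIGILL to a process when the CPU hits an instruction it does not support, and the shell reports that as "Illegal instruction". If WSClean is launched from a Python job script instead, the same failure shows up as a negative return code. A minimal sketch, assuming a POSIX system; the function name describe_exit is hypothetical:

```python
import signal

def describe_exit(returncode):
    """Summarize a subprocess.run() return code on a POSIX system."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # subprocess reports "terminated by signal N" as return code -N.
        try:
            sig = signal.Signals(-returncode)
        except ValueError:
            return f"killed by unknown signal {-returncode}"
        if sig == signal.SIGILL:
            # The CPU met an opcode it does not implement, e.g. a binary
            # compiled with -march=native on a newer or different CPU.
            return "SIGILL: unsupported CPU instruction (binary built for another CPU?)"
        return f"killed by {sig.name}"
    return f"exited with status {returncode}"
```

Typical use is describe_exit(subprocess.run([...]).returncode). Note that when the command runs through a shell that does not exec it directly, a signal death may instead be reported as 128 + N (132 for SIGILL).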

This is the /proc/cpuinfo entry for the AMD processor (48 cores):

processor : 47
vendor_id : AuthenticAMD
cpu family : 16
model : 9
model name : AMD Opteron(tm) Processor 6168
stepping : 1
microcode : 0x10000d9
cpu MHz : 800.000
cache size : 512 KB
physical id : 3
siblings : 12
core id : 5
cpu cores : 12
apicid : 75
initial apicid : 59
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate npt lbrv svm_lock nrip_save pausefilter
bogomips : 3800.11
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

And this is the /proc/cpuinfo entry for the Intel processor (8 cores):

processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
stepping : 6
microcode : 0x60f
cpu MHz : 2666.545
cache size : 6144 KB
physical id : 1
siblings : 4
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm dtherm tpr_shadow vnmi flexpriority
bogomips : 5333.95
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
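Comparing the two flags lines shows why the binary runs on the Intel head node but not on the Opteron: the Xeon E5430 advertises ssse3 and sse4_1, while the Opteron 6168 has neither (it only has AMD's separate sse4a). Code compiled with -march=native -msse4.1 on the Intel machine may use those instructions. The minimal Python sketch below parses both lines and lists the SIMD extensions present on the build host but missing on the target; the helper names and the set of extensions checked are illustrative assumptions, not an exhaustive list of what the compiler may emit.

```python
# "flags" lines copied from the two /proc/cpuinfo dumps above.
INTEL_E5430 = (
    "flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc "
    "arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est "
    "tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm dtherm tpr_shadow vnmi flexpriority"
)
AMD_OPTERON_6168 = (
    "flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm "
    "3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm pni "
    "monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a "
    "misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate npt lbrv "
    "svm_lock nrip_save pausefilter"
)

# Illustrative subset of x86 SIMD extensions ("pni" is how cpuinfo names SSE3).
SIMD_FLAGS = {"pni", "ssse3", "sse4_1", "sse4_2", "avx", "avx2", "fma"}

def parse_flags(cpuinfo_text):
    """Return the CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

def missing_simd(build_host_text, target_text):
    """SIMD extensions the build host has but the target node lacks (sorted)."""
    build = parse_flags(build_host_text)
    target = parse_flags(target_text)
    return sorted((build - target) & SIMD_FLAGS)

if __name__ == "__main__":
    print(missing_simd(INTEL_E5430, AMD_OPTERON_6168))  # ['sse4_1', 'ssse3']
```

On a live node, open("/proc/cpuinfo").read() can be passed as either argument. Only the first flags line is read, which is enough when all cores in a node are identical.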

Related

Wiki & Manual: Changelog-1.9
Wiki & Manual: Installation

Discussion

  • nudomarinero - 2015-08-14

    OK. I found that CMake uses the following flags for gcc:
    CXX_FLAGS = -O3 -Wall -DNDEBUG -march=native -msse4.1 -ggdb -I/home/jsm/LOFAR/local/release/include/casacore -I/home/jsm/LOFAR/local/release/include

    Apparently, "-march=native -msse4.1" makes the binary non-portable, because it lets GCC emit instructions specific to the build machine.

     
  • André Offringa - 2015-08-14
    • status: open --> wont-fix
    • assigned_to: André Offringa
     
  • André Offringa - 2015-08-14

    Hi nudomarinero,

    Thanks for reporting this problem. You are right that -march=native and -msse4.1 make the binary non-portable. Does this mean that you compiled the binary on a different machine than the one where you run it? That generally does not work for WSClean. Those compiler switches give a large performance increase on some machines, which is why they are on by default. You can leave them out if you want a portable binary, as long as you are aware that this can affect performance.

    It would be good to add this to the documentation; thanks for pointing this out!

     
  • nudomarinero - 2015-08-14

    Hello,
    I run WSClean on a heterogeneous cluster composed of AMD and Intel nodes, some of them of different generations. The head node is Intel, and that is where the code was compiled. Software cannot be installed directly on the nodes; it has to go in your own home area.

    Would it be possible to add a flag to the CMake configuration to create a portable binary? The default build would stay optimized, but running "cmake -DPORTABLE=True .." would generate a portable binary. This is also useful for building deb packages for distribution.

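    For example:
        # Default: optimized for the build machine; -DPORTABLE=True drops the CPU-specific flags.
        option(PORTABLE "Build a portable binary without -march=native and -msse4.1" OFF)
        if(PORTABLE)
          set(CMAKE_CXX_FLAGS "-O3 -Wall -DNDEBUG -ggdb")
        else()
          set(CMAKE_CXX_FLAGS "-O3 -Wall -DNDEBUG -march=native -msse4.1 -ggdb")
        endif(PORTABLE)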

    I have attached a diff with the proposed change.

    Best regards.

     
  • André Offringa - 2015-08-14

    Thanks for the patch; I have applied it to Git. It should make it into WSClean 1.9, which, if I find the time, should be released in a week or two. I have also updated the installation instructions.

     

    Related

    Wiki & Manual: Installation

  • André Offringa - 2015-08-14
    • status: wont-fix --> pending
     
  • André Offringa - 2015-08-14

    BTW, if your cluster is supposed to be a high-performance cluster, you should really tell the admins that a heterogeneous cluster makes it hard to run properly optimized binaries... ;) It is a bit of a shame, because, e.g., using AVX or not can make a performance difference of a factor of a few.

     
  • André Offringa - 2015-08-22
    • status: pending --> closed
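For a heterogeneous cluster like the one described in this thread, one option is to build WSClean twice in the home area on the head node, once with the default optimized flags and once with -DPORTABLE=True, and let each node pick the most optimized build it can execute. The Python sketch below assumes that layout. The install paths and the per-build required-flag sets are hypothetical: the sse4_1/ssse3 requirement comes from comparing the two cpuinfo dumps above, and a real -march=native build may depend on further extensions.

```python
import os

# Hypothetical install locations, ordered from most to least optimized.
# Each entry lists the /proc/cpuinfo flags that build is assumed to require.
BUILDS = [
    (os.path.expanduser("~/wsclean-native/bin/wsclean"), {"ssse3", "sse4_1"}),
    (os.path.expanduser("~/wsclean-portable/bin/wsclean"), set()),
]

def cpu_flags(path="/proc/cpuinfo"):
    """Read this node's CPU flag set; returns an empty set if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                key, sep, value = line.partition(":")
                if sep and key.strip() == "flags":
                    return set(value.split())
    except OSError:
        pass
    return set()

def choose_build(node_flags, builds=BUILDS):
    """Return the first binary whose required flags are all present on this node."""
    for binary, required in builds:
        if required <= node_flags:
            return binary
    raise RuntimeError("no compatible WSClean build for this CPU")

if __name__ == "__main__":
    print(choose_build(cpu_flags()))
```

A job script could then run the chosen path with the usual WSClean arguments. Listing builds from most to least optimized keeps the optimized binary on the Intel nodes while the AMD nodes fall back to the portable one.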