#29 Crashing during data collection

None
wont-fix
nobody
None
1
2014-06-08
2014-06-04
Eugene Kirianov
No

I am just learning how to use your system. Any help is appreciated - SYSTEM CRASHES

Configuration: Ubuntu 12.04; DSP software - 2031
The system crashes (the program quits) every second or third time while taking STS data.

We just installed GXSM 1.41.1 with the SPM controller, model MK2-A810.

Discussion

  • If you start from the console with an increased DEBUG level,
    what is printed then?
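To make the console output requested above easy to attach to a ticket, the program can be started through a small wrapper that saves everything it prints. This is only a minimal sketch: `run_and_log` is a hypothetical helper, not part of Gxsm, and the command you pass to it (binary name, debug option) has to be taken from your own installation.

```python
import subprocess


def run_and_log(cmd, log_path):
    """Run cmd, write its combined stdout/stderr to log_path, return the exit code.

    On POSIX systems a negative return code -N means the process was
    terminated by signal N (for example -11 for a segmentation fault).
    """
    with open(log_path, "w") as log:
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode
```

A call could look like `run_and_log(["gxsm2", "--debug-level=2"], "gxsm-debug.log")`. Both the binary name and the option spelling here are assumptions; check the `--help` output of your Gxsm build for the actual debug option.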

     
  • Percy Zahl
    2014-06-05

    Eugene,

    unfortunately, what you see sounds like a known issue that is, in a way, not directly related to Gxsm. At least, we were never able to pinpoint the problem, as it is not reproducible in any predictable way.

    What is known:
    a) It happens only if you enable "auto plot", so that the graph is updated while data is coming in. A simple workaround to get started would be to not use auto plot, but to wait until the measurement is done and then update.

    b) It seems to happen only on certain computer systems. I have mostly seen this on systems that were "up to date" about 2 to 3 years ago, neither older nor more recent ones (see below). I have seen it more on Intel-based, early hyper-threaded dual/quad-core CPU systems, but it also happened on an AMD machine of that time. Never on older systems, though.

    It seemed to be a thread-related issue that I could only track down into the depths of the X11 library system, where I got lost... it may be a race condition somewhere.

    My thought was that the fast multi-core systems that were "newer" at that time show this symptom. But I still had systems where it honestly never happened, some where it happened maybe once a month with day-to-day STS usage, and some where it happened on pretty much every second click, for example on an old quad-core Xeon or similar.

    Now I have a fairly recent development system, to be precise an
    ASUS SABERTOOTH X79,
    and I thought that, as this is the fastest system I have used so far, I would see the trouble even more. But nada: since then I have not seen this happen a single time! Sorry, puzzled here.

    c) I am running the latest 3.5.x kernel and the latest Debian system on this X79. But I have a feeling it is NOT a kernel- or distribution-related issue. And I have been using 64-bit Linux ever since it became available, for sure.

    Sorry, I hope you find a solution.
    I would suggest trying a significantly different computer (avoid Duo Core, i5, early i7 generations and Xeon; also avoid AMD Phenom X4 940 models) if you have one available. But as I said, I am not sure: it may be a CPU/motherboard combination, or maybe even
    certain kernel version/CPU combinations.
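The suspicion voiced in this post is a race between incoming data and the auto-plot redraw. As a generic illustration of how such races are usually avoided (this is not Gxsm code, and all names below are hypothetical), the sketch lets the acquisition thread only hand data over through a thread-safe queue, while a single "GUI" thread is the only one that touches the plot data.

```python
import queue
import threading


def acquire_spectrum(n_points, out_queue):
    """Hypothetical acquisition worker: emits (index, value) pairs, then None."""
    for i in range(n_points):
        out_queue.put((i, i * 0.1))  # stand-in for a measured value
    out_queue.put(None)  # sentinel: acquisition finished


def plot_loop(in_queue):
    """Runs in the single 'GUI' thread; only this thread modifies the curve."""
    curve = []
    while True:
        item = in_queue.get()  # blocks until the worker delivers something
        if item is None:
            break
        curve.append(item)  # a real GUI would trigger its redraw here
    return curve


def run_demo(n_points=100):
    """Start the worker thread and consume its data in the calling thread."""
    q = queue.Queue()
    worker = threading.Thread(target=acquire_spectrum, args=(n_points, q))
    worker.start()
    curve = plot_loop(q)
    worker.join()
    return curve
```

The workaround from point a), updating the plot only after the measurement is done, is the limiting case of this pattern: the drawing code never runs while the acquisition thread is active.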

     
    Last edit: Percy Zahl 2014-06-05
  • Percy Zahl
    2014-06-05

    FYI:

    I just checked my most used STM system. I have seen this problem on this system in the past, very rarely, but not once for over a year now with newer kernel versions, so this may also be a hint.
    This system is very stable: I routinely do spectroscopic mappings on it, often running for days with at least 5000 spectra per session, multiple channels, with auto plot on, and there is no problem.

    It is a quad-core i7 860:

    Linux lhestm 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64

    percy@lhestm:~$ cat /proc/cpuinfo
    processor : 0
    vendor_id : GenuineIntel
    cpu family : 6
    model : 30
    model name : Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz
    stepping : 5
    microcode : 0x3
    cpu MHz : 1199.000
    cache size : 8192 KB
    physical id : 0
    siblings : 8
    core id : 0
    cpu cores : 4
    apicid : 0
    initial apicid : 0
    fpu : yes
    fpu_exception : yes
    cpuid level : 11
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid
    bogomips : 5599.82
    clflush size : 64
    cache_alignment : 64
    address sizes : 36 bits physical, 48 bits virtual
    power management:
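When comparing machines as in this thread, it can help to reduce a `/proc/cpuinfo` dump like the one above to the few fields that matter here: CPU model, core count, and whether hyper-threading appears to be active. The following is a minimal sketch; `summarize_cpuinfo` is a hypothetical helper, and it assumes the x86 field names shown in the dump (`model name`, `siblings`, `cpu cores`).

```python
def summarize_cpuinfo(text):
    """Summarize /proc/cpuinfo-style text using the first value of each field."""
    first = {}    # first value seen for each key, i.e. the one from processor 0
    logical = 0   # number of "processor" entries, i.e. logical CPUs
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        first.setdefault(key, value)
    cores = int(first.get("cpu cores", 0))
    siblings = int(first.get("siblings", 0))
    return {
        "model": first.get("model name", "unknown"),
        "logical_cpus": logical,
        "cores_per_package": cores,
        "threads_per_package": siblings,
        # More hardware threads than cores suggests hyper-threading is enabled.
        "hyperthreading": siblings > cores > 0,
    }
```

On a Linux machine it can be fed with the contents of `/proc/cpuinfo`, for example `summarize_cpuinfo(open("/proc/cpuinfo").read())`. For the i7 860 above (4 cores, 8 siblings) the `hyperthreading` entry would be true.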

     
  • Percy Zahl
    2014-06-05

    This is what I use on the X79, also without ever a glitch on STS updates:

    processor : 0
    vendor_id : GenuineIntel
    cpu family : 6
    model : 45
    model name : Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz
    stepping : 7
    microcode : 0x70c
    cpu MHz : 1200.000
    cache size : 10240 KB

     
  • Thank you very much, everybody. After disabling one core in our dual-core PC, the STS data collection is stable. It resolved the glitch. Thank you again for taking the time to respond.
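The post does not say how the core was disabled. A less invasive experiment in the same spirit, assuming the crash really is a multi-core race, is to leave all cores enabled and restrict only the one program to a single CPU. The sketch below does this on Linux with Python's `os.sched_setaffinity`; `run_on_single_cpu` is a hypothetical helper, and whether pinning has the same effect as disabling a core was not tested in this thread.

```python
import os
import subprocess


def run_on_single_cpu(cmd, cpu=None, **run_kwargs):
    """Run cmd with its CPU affinity restricted to one CPU (Linux only)."""
    if cpu is None:
        cpu = min(os.sched_getaffinity(0))  # first CPU this process may use

    def pin():
        # Executed in the child just before exec; pid 0 means "this process".
        # Threads created later by the program inherit this affinity.
        os.sched_setaffinity(0, {cpu})

    return subprocess.run(cmd, preexec_fn=pin, **run_kwargs)
```

For example, `run_on_single_cpu(["gxsm2"])` would start the program confined to one CPU; the binary name is an assumption and depends on the installation. From a shell, the util-linux `taskset` tool does the same job, for instance `taskset -c 0` followed by the command.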

     
  • Percy Zahl
    2014-06-08

    Interesting. Thanks for posting this. I never even thought about this, and was not aware that it is even possible to disable a core!

    Glad this works for you, even if it is not ideal...

    What is very strange to me is that on newer, even faster systems with more cores it does not happen any more. My only explanation was some kind of race condition somewhere, to the best of my knowledge not even in Gxsm code. And I configured all thread options in the libraries I know about.

     
  • Percy Zahl
    2014-06-08

    • status: open --> wont-fix
    • Group: -->