Hi,
I experience the following bug in RNG_test. When running
RNG_test sfc64 -tlmax 128T -multithreaded
after 512 gigabytes have been processed, the RNG_test process stalls. It looks like a deadlock has been reached. ps -eLf shows that no threads are running, and strace -p <pid> shows a series of nanosleep calls:
nanosleep({0, 1000}, NULL) = 0
nanosleep({0, 1000}, NULL) = 0
It happens for every tested RNG and both in versions 0.90 and 0.91.
To reproduce it, please run
RNG_test sfc64 -tlmax 128T -multithreaded
After 512 gigabytes have been processed, you should see the system load drop to zero while RNG_test does no work.
Thanks
Jirka
Trying to reproduce now.
Looking at the code, I'd say it's probably clean of race conditions. If a test crashed, that might cause it to wait indefinitely for the test to complete - but so far as I know PractRand tests don't make a habit of crashing. If pthread_create or CreateThread failed, that would likewise leave it waiting indefinitely - I never got around to checking the return values on those. That's actually sounding like the most plausible explanation to me at the moment... if I'd heard about this before releasing 0.92 I would have tried to make it pass any pthread_create or CreateThread errors along to the user. The man pages / MSDN suggest a couple of possibilities for why those functions might start returning errors after a while of execution - I think I'm not cleaning up after myself properly in some ways, though it wouldn't matter if I were using a proper pool of worker threads instead of a quick-and-dirty solution.
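For reference, a minimal sketch of what checking that return value could look like on the pthreads side (the names here are invented for illustration; this is not the PractRand code):

```cpp
#include <cstdio>
#include <cstring>
#include <pthread.h>

// Hypothetical sketch: pthread_create reports failure through its return
// value rather than through errno, so the result has to be checked
// explicitly if errors are ever going to reach the user.
static void *worker(void *) { return nullptr; }

bool create_thread_checked(pthread_t *out) {
    int err = pthread_create(out, nullptr, worker, nullptr);
    if (err != 0) {
        // EAGAIN here typically means thread resources were exhausted -
        // exactly the failure mode that leaked threads could eventually cause.
        std::fprintf(stderr, "pthread_create failed: %s\n", std::strerror(err));
        return false;
    }
    return true;
}
```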
I can't reproduce it. On win32 anyway. Are you using pthreads or win32 threading? Anyway, I have some ideas that might fix it for 0.93.
Hi,
I run RNG_test on Linux. I will try it with version 0.92 and let you know the outcome.
Jirka
Okay, my best guess (haven't finished a linux test run yet):
What's happening is that I'm leaking resources in Threading::create_thread, which eventually causes it to refuse to launch more threads for me. Immediately after the lines:
pthread_t thread;
pthread_create(&thread, NULL, func, param);
There should be a third line:
pthread_detach(&thread);
To let it know that it should clean up after the thread automatically when it terminates. The win32 code already does this correctly. It'll take me several hours to confirm that this is what's going on and that it fixes the problem, but there's a good chance that's it. I'll have it fixed in 0.93; you can patch your copy of 0.92 if you want - that's in tools/multithreading.h.
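As an aside, an equivalent way to avoid the leak is to create the thread in the detached state to begin with. A sketch only, not the actual patch:

```cpp
#include <pthread.h>

// Sketch: creating the thread detached from the start has the same effect
// as calling pthread_detach() afterwards - the system reclaims the
// thread's resources automatically when it terminates, so nothing leaks.
static void *worker(void *) { return nullptr; }

int create_detached_thread() {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_t thread;
    int err = pthread_create(&thread, &attr, worker, nullptr);
    pthread_attr_destroy(&attr);
    return err; // 0 on success
}
```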
I will try to apply the patch.
The vanilla 0.92 version exhibits the same bug - it happens again after 512 gigabytes of data have been processed.
rng=sfc64, seed=0x6c08d0b3
length= 512 gigabytes (2^39 bytes), time= 2485 seconds
no anomalies in 276 test result(s)
strace shows that RNG_test does nothing but calling
nanosleep({0, 1000}, NULL) = 0
nanosleep({0, 1000}, NULL) = 0
Short update: the proposed patch should be
pthread_detach(thread);
and not
pthread_detach(&thread);
I have started the test and will report the results back.
Thanks
Jirka
I have tested the proposed patch but unfortunately it did not help. I still see the same behaviour: after 512 gigabytes of data have been processed, the program just calls nanosleep in a loop.
I can't reproduce your behavior. With the original, on win32 it seemed to just work for me. On my (single-core) linux VM it did the first 512 GB in a little over 2 hours, and failed to reach 1 TB for the next 5 hours before I aborted it - but CPU usage remained at 100% until it was aborted, unlike your observed behavior. With the patch applied, it worked correctly for me on both win32 and linux.
I'm not going to be up for any more testing today. If you want my next best idea: my linux VM doesn't seem to mind, but I just noticed that according to the docs, pthread_mutex_destroy isn't supposed to be called on a locked mutex. So, at about line 17 of tools/MultithreadedTestManager.h, where it says "delete this;", there should be a "lock.leave();" immediately before that.
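To illustrate the constraint, a generic sketch (not the PractRand code - POSIX leaves pthread_mutex_destroy on a locked mutex undefined):

```cpp
#include <pthread.h>

// Generic illustration: the mutex must be unlocked before the destructor
// runs, because destroying a locked mutex is undefined behaviour.
struct Guarded {
    pthread_mutex_t mutex;
    Guarded()  { pthread_mutex_init(&mutex, nullptr); }
    ~Guarded() { pthread_mutex_destroy(&mutex); }
};

int locked_teardown(Guarded *g) {
    pthread_mutex_lock(&g->mutex);
    // ... critical section ...
    pthread_mutex_unlock(&g->mutex); // analogous to the suggested lock.leave()
    delete g;                        // destructor may now safely destroy the mutex
    return 0;
}
```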
Hi,
I will try that.
Do you have any debug messages which I could enable (for example, during compilation)? The fact that the bug only manifests after processing 0.5 TB of data makes it hard to debug. If you can propose some way to debug it, please let me know.
Thanks
Jirka
Unfortunately there's no debugging framework for that. I could write one, I suppose... I may have to if I can't figure out what's going on on your box.
I assume that neither fix did anything for you? Here's an idea that might be easier to test, though far less definitive than the previous two: in MultithreadedTestManager.h, in the constructor's parameter list, the default value for max_buffer_amount_ is "1 << (27-10)"; try changing it to "1 << (10-10)". That's around line 95.
That won't fix anything, but it might make your problem happen a lot faster - like within the first minute or two. If so, that would be informative in its own right, and it would make other changes easier to test. The change should force it to create threads far faster (one set for every kilobyte instead of every 128 megabytes), which should make any issue tied to the total number of threads (or total number of mutexes) occur much sooner.
Hi,
unfortunately, the "lock.leave()" patch has not helped either.
With "1 << (10-10)" the problem occurs already after 128 MiB, as you predicted. I will try to debug it; let's see if I can find anything.
I see one strange behaviour: with all patches applied and RNG_test (just RNG_test) compiled with the -O2 flag (instead of -O3), the bug seems to be gone. When compiled with -O3 I still see the bug. Does that help you any further?
I have now started new tests with max_buffer_amount_ set to "1 << (27-10)" and with the pthread_detach and lock.leave patches applied. RNG_test was compiled with -O2. I should have results by tomorrow.
Well, isn't that fun. So it's either a compiler bug, or something evil I'm doing that it tolerates with optimization off. If it's dependent upon the version of gcc used, that could explain why I can't reproduce the issue.
Hm... something evil... the first thing that comes to mind is how my OS-independence layer hides the implementation - the array of Uint8s cast to a pthread_mutex_t is pretty evil. I wouldn't expect optimization to have trouble with it, but if I'm violating some obscure optimization-related constraint of the language, that would be my first guess.
I don't know how to tell my VMed CentOS 7 (a rebranded Red Hat) to change gcc versions, but just in case I figure it out, what version of gcc are you using? I should probably test it on MinGW too, but after my recent hard drive failure I no longer have a working MinGW install - another thing to fix sometime.
On Thu, Nov 20, 2014 at 5:11 PM, Jirka jhladky@users.sf.net wrote:
I'm running awfully low on things that could cause it to sleep
indefinitely. I do have another bug though: If I'm not mistaken, I have
the two buffers mixed up in the memcpy call in multithread_prep_blocks -
alt_buffer[0] should be buffer[0], and buffer[prefix_blocks + main_blocks -
num_prefix_blocks] should be alt_buffer[prefix_blocks + main_blocks -
num_prefix_blocks].
There's a (very) remote chance that could cause a test to crash, which
would in turn cause the main thread to wait forever. Probably it just
causes a negligible error in the test results though.
That's in MultithreadedTestManager.h, around line 84 or so.
On Thu, Nov 20, 2014 at 4:06 PM, Jirka jhladky@users.sf.net wrote:
I have tested it on two different boxes and with the following two gcc versions:
gcc (GCC) 4.7.2 20121109
gcc (GCC) 4.8.2 20140120
The behaviour was the same - with -O3 it exhibits the bug (falling into an infinite loop of nanosleep calls), with -O2 (and the other fixes applied) it runs fine.
I will test it with gcc 4.9 tomorrow.
Could you please send me the fixed MultithreadedTestManager.h? The latest memcpy patch is not as easy as the previous ones... You can attach the file directly below this form.
Thanks
Jirka
I'm running 4.8.2 20140120 here. So apparently it's not tied to the compiler. I've checked that I'm using -O3. It occurred to me that it could be an issue with the lack of multicore in my linux VM, as windows will switch mutex implementations depending upon the hardware, but switching it to multicore did not help either. And I tried reverting to an older codebase and applying only the first patch, in case something else I did caused it.
No dice. I can't produce anything like what you're seeing once the first patch is applied. So... it's not tied to gcc version. Maybe pthreads version? I'm using NPTL 2.17 (identified via: getconf GNU_LIBPTHREAD_VERSION). Or glibc version? Hm... version number seems to be identical, 2.17 (identified via: ldd --version). To be clear, I'm building 64 bit binaries on linux, and a mix of 32 and 64 on windows.
I don't see how this could have gone wrong, but just to make very sure, this is the corrected function from the first patch:
void create_thread( THREADFUNC_RETURN_TYPE (THREADFUNC_CALLING_CONVENTION *func)(void *), void *param ) {
    pthread_t thread;
    pthread_create(&thread, NULL, func, param);
    pthread_detach(thread);
}
I'm kind of grasping at straws here. My next guess would be that my portability layer's dirty tricks are causing problems, and it needs to be largely removed and/or extra "volatile" keywords added and/or the storage space declared at a larger alignment (I think my code should already force it to 64-bit alignment, but it may require up to 512-bit alignment; I can't tell). I'll add some ifdefed compiler-dependent code to improve alignment, for whatever that is worth - I know malloc/new won't do alignments above 64 bits on MSVC, regardless of type properties.
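A rough sketch of the alignment idea, for what it's worth (the struct name and the 64-byte figure are assumptions for illustration, not the actual PractRand layer):

```cpp
#include <new>
#include <pthread.h>

// Sketch: instead of casting a raw byte array to pthread_mutex_t, reserve
// storage with an explicit, generous alignment and placement-new the real
// type into it, which sidesteps both alignment and aliasing concerns.
struct HiddenMutex {
    alignas(64) unsigned char storage[sizeof(pthread_mutex_t)];
    pthread_mutex_t *mutex;
    HiddenMutex() {
        mutex = new (storage) pthread_mutex_t; // object legally lives in storage
        pthread_mutex_init(mutex, nullptr);
    }
    ~HiddenMutex() { pthread_mutex_destroy(mutex); }
};
```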
I'll try to make the next version report all unexpected pthreads error codes. They may not actually help, considering that this is dependent upon optimization levels and not reproducible here, but they might.
Hi,
I have installed Fedora 21 Beta with gcc 4.9.2 20141101, and the problem is gone after applying the pthread_detach(thread) patch. I can no longer reproduce the problem with the older compiler versions either, so it seems I made some mistake during testing. It could also be that the problem no longer manifests with a newer pthread library version.
In any case, I think that reporting unexpected pthreads error codes is a great idea and will definitely help in the future.
During the testing on Fedora 21 I had one crash, but unfortunately core dumps were disabled and I was not able to reproduce it.
Have you fixed MultithreadedTestManager.h regarding the memcpy? If so, could you please send me the fixed version?
For the original problem reported here, it seems it's fixed by the pthread_detach(thread) patch.
Thank you very much for your help on this!
Jirka
Whew, finally. The first patch was really the only one that had a high chance of fixing things; everything else combined was long shots.
Yeah, I fixed which buffer it was copying where. The corrected code looks like:
std::memcpy(
&buffer[0],
&alt_buffer[prefix_blocks + main_blocks - num_prefix_blocks],
PractRand::Tests::TestBlock::SIZE * num_prefix_blocks
);
(at approximately line 62 of MultithreadedTestManager.h)
To check whether you have the correct code enabled, do test runs with the following two command lines:
RNG_test sfc64 -tlmax 1MB -a -seed 0
and
RNG_test sfc64 -tlmax 1MB -a -seed 0 -multithreaded
They should produce identical output. In particular, look at the p-values for FPF-14+6/16:all and FPF-14+6/16:all2; those are probably the easiest place to spot divergence. I was aware they weren't always producing identical output, but the results had gotten close enough that it wasn't a priority to hunt down the last source or two of minor divergence.
On Mon, Nov 24, 2014 at 3:57 AM, Jirka jhladky@users.sf.net wrote: