We encountered a multithreading issue in DSPSR when passing multiple predictor/ephemeris files as input for folding filterbank files. The folding stalls after a certain point, indicating a threading deadlock. Examples of commands that failed:
dspsr sample.fil -k meerkat -E J1748-2446A.par -E J1748-2446F.par -E J1748-2446S.par -E J1748-2446T.par -E J1749-2629.par -t 5 -L 10
dspsr sample.fil -k meerkat -t 5 -L 10 -w cands.txt
dspsr sample.fil -t 5 -L 10 -P pred1.txt -P pred2.txt -P pred3.txt
However, we are unable to reproduce the same error condition when using PSRFITS files as input.
We also observe that a single thread (-t 1) always works, but with nthreads > 1 the folding fails regardless of whether the -A option is used.
Sample predictor file for the -w option:
SOURCE RA DEC PERIOD DM ACC
cfbf00000_test 5:40:41.08 -75:57:3.80 0.001538 0.000000 0.000000
cfbf00000_2 5:40:41.08 -75:57:3.80 0.001032 122.879997 0.000000
cfbf00000_3 5:40:41.08 -75:57:3.80 0.001636 88.199997 -20.804050
cfbf00000_4 5:40:41.08 -75:57:3.80 0.001278 298.440002 -9.548882
cfbf00000_5 5:40:41.08 -75:57:3.80 0.001433 165.899994 56.958942
**Sample verbose output before the stalling point:**
dsp::Seekable::load_data
load_size=13104
load_sample=380016
current_sample=380016
dsp::Seekable::recycle_data
start_sample=353808
last_sample=366912
dsp::Seekable::load_data
recycled=0
read_size=13104
read_sample=380016
dsp::Seekable::load_data total ndat=3858432 read_sample=380016
dsp::Seekable::load_data 3478416 samples remaining
dsp::Seekable::load_data call load_bytes(53673984)
dsp::File::load_bytes nbytes=53673984
dsp::Subint::transformation
dsp::File::load_bytes bytes_read=53673984 old_pos=1556545889 new_pos=1610219873 end_pos=15804137825
dsp::Input::operation load_data done load_sample=380016 name='SigProc'
dsp::Input::operation calling seek(13104)
dsp::Input::seek [INTERNAL] resolution=1 resolution_offset=0 load_sample=393120
dsp::Input::set_load_size block_size=13104 resolution_offset=0
dsp::Input::set_load_size load_size=13104
dsp::Input::operation exit with load_sample=393120
dsp::Input::load exit
dsp::Subint::unload_partial to callback
dsp::Subint::unload_partial this=0x7f8ab40012e0 unloader=0x1024e80
division=1 finished: 0 0 1 1 1 1 1 1 1 1 1 1
division=2 finished: 0 0 1 1 1 1 1 1 1 1 0 0
division=3 finished: 0 0 0 1 1 1 0 1 1 1 0 0
division=4 finished: 0 0 0 0 1 0 0 1 1 0 0 0
division=5 finished: 0 0 0 0 0 0 0 0 0 0 0 0
dsp::UnloaderShare::Storage::wait_all
division=5 finished: 0 0 0 0 0 0 0 0 1 0 0 0
dsp::Subint::transformation
dsp::Subint::unload_partial to callback
dsp::Subint::unload_partial this=0x7f8ac00012e0 unloader=0x1024020
dsp::UnloaderShare::Storage::integrate adding to division=5
dsp::UnloaderShare::Storage::integrate into profile=0x7f8ab4001b20 list=0x7f8ab4002090
dsp::UnloaderShare::Storage::integrate from profile=0x7f8ac0001b20 list=0x7f8ac0002090
dsp::SignalPath::combine this=0x7f8ab4002050 that=0x7f8ac0002050
dsp::SignalPath::combine IOManager:SigProc
dsp::SignalPath::combine Subint<fold>
division=1 finished: 0 0 1 1 1 1 1 1 1 1 1 1
division=2 finished: 0 0 1 1 1 1 1 1 1 1 0 0
division=3 finished: 0 0 0 1 1 1 0 1 1 1 0 0
division=4 finished: 0 0 0 1 1 0 0 1 1 0 0 0
division=5 finished: 0 0 0 1 0 0 0 0 1 0 0 0
dsp::Subint::transformation
dsp::Subint::unload_partial to callback
dsp::Subint::unload_partial this=0x7f8ac0002960 unloader=0x1026650
division=0 finished: 1 0 1 1 1 0 0 1 1 1 1 0
division=5 finished: 0 0 0 0 0 0 0 0 0 0 0 0
dsp::UnloaderShare::Storage::wait_all
division=5 finished: 0 0 0 1 0 0 0 0 0 0 0 0
DSPSR version used:
```
commit 044a83ecdf3685c5b4ab3ecab49e10daba2e217f
Merge: 7a6a642 33b311a
Author: Willem van Straten <vanstraten.willem@gmail.com>
Date:   Sat Aug 15 20:51:48 2020 +1000

    Merge branch 'NZAPP-208-merge'
```
We also tried an earlier commit (ba764751c32f8fb935b23c31b8da6e46253b7b39, committed on 25 Nov 2019), which showed the same result.
Hi Prajwal,
I think that this is a fundamental problem related to how dspsr folds multiple pulsars in multiple sub-integrations on multiple threads.
When a thread reaches the end of a sub-integration, there are two possibilities:
a. the sub-integration is complete and can be written to disk; or
b. the sub-integration is incomplete because one or more threads still have data to fold into it (and have yet to do so).
In case b, there are a couple of options:
1. Clone the data and put it in a place where the other threads will find it when they get to the end of the same sub-integration on the same pulsar; each thread can add to it and, when it is complete, write it to disk. The clone is necessary so that the original thread can resume where it left off in the time series and start folding the next sub-integration into a different array.
2. Put the data in a place where the other threads will find it when they get to the end of the same sub-integration on the same pulsar, then go to sleep. The other threads add to the data and wake up the original thread after they have done so; when the sub-integration is complete, the original thread writes it to disk.
Option 1 is not very friendly on RAM, especially if sub-integration data are somewhat large (e.g. many channels) and there are many pulsars to fold. Therefore, dspsr implements option 2 by default. However, when there are multiple folds happening in parallel, different threads can go to sleep waiting for sub-integrations to be completed on different pulsars, and in the case of two threads it is possible for thread A to be waiting on pulsar X and thread B to be waiting on pulsar Y (deadlock).
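To make the failure mode concrete, here is a minimal, self-contained C++ sketch of the option 2 waiting logic. The SubIntegration type and finish_subint function are invented for illustration (they are not the actual dsp::UnloaderShare code), and the fixed contributor count is an assumption. With two threads folding two pulsars in opposite order, each goes to sleep on a different pulsar's sub-integration and neither ever supplies the contribution the other is waiting for, so the program hangs just like the runs reported above.

```cpp
// Hypothetical sketch of why option 2 can deadlock; not the real dspsr code.
#include <condition_variable>
#include <mutex>
#include <thread>

struct SubIntegration
{
  std::mutex mutex;
  std::condition_variable complete;  // signalled when all contributors have added their data
  unsigned contributions = 0;        // threads that have already folded into this sub-integration
  unsigned contributors = 2;         // threads expected to contribute (nthread)
};

// Called when a thread reaches the end of a sub-integration.  If other
// threads still have data to fold into it, the caller goes to sleep until
// the sub-integration is complete (this is option 2).
void finish_subint (SubIntegration& subint)
{
  std::unique_lock<std::mutex> lock (subint.mutex);

  if (++subint.contributions == subint.contributors)
  {
    // complete: this thread would unload it to disk, then wake the sleepers
    subint.complete.notify_all ();
    return;
  }

  // incomplete: sleep until the remaining contributions arrive
  subint.complete.wait (lock, [&] { return subint.contributions == subint.contributors; });
}

int main ()
{
  SubIntegration pulsar_X, pulsar_Y;

  // Thread A reaches the end of its sub-integration on pulsar X first and
  // sleeps there, never getting to pulsar Y; thread B does the opposite.
  std::thread A ([&] { finish_subint (pulsar_X); finish_subint (pulsar_Y); });
  std::thread B ([&] { finish_subint (pulsar_Y); finish_subint (pulsar_X); });

  A.join ();   // never returns: both threads are asleep waiting on each other
  B.join ();
}
```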
I'm surprised that this bug has not been reported before. :-)
Regarding the best way to address this problem: It is simplest to switch to option 1, but this could lead to large numbers of cloned sub-integrations waiting around to be completed. I guess that there would be at most nthread times npulsar cloned sub-integrations, and perhaps this is not terrible in most cases.
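For comparison, here is a rough sketch of the clone-and-merge bookkeeping that option 1 implies. The Profile and PendingStore names are hypothetical, not the real dsp::UnloaderShare interface: each (pulsar, division) pair holds at most one pending clone while contributors are still outstanding, which is where the nthread times npulsar bound comes from.

```cpp
// Rough sketch of clone-and-merge bookkeeping (option 1); invented names.
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

using Profile = std::vector<double>;   // stand-in for one folded sub-integration

class PendingStore
{
public:
  // Called by a thread when it has folded its share of one sub-integration.
  // The partial result is cloned into the store and the thread carries on;
  // whichever contributor arrives last merges everything and unloads to disk.
  void contribute (unsigned pulsar, unsigned division, const Profile& partial,
                   unsigned contributors,
                   const std::function<void(const Profile&)>& unload)
  {
    std::lock_guard<std::mutex> lock (mutex);

    auto key = std::make_pair (pulsar, division);
    auto it = pending.find (key);

    if (it == pending.end())
      // first contributor: store a clone (this is the extra RAM cost)
      it = pending.emplace (key, Entry{ partial, 1 }).first;
    else
    {
      // later contributor: add into the stored clone
      Entry& entry = it->second;
      for (std::size_t i = 0; i < entry.sum.size(); i++)
        entry.sum[i] += partial[i];
      entry.count++;
    }

    if (it->second.count == contributors)
    {
      unload (it->second.sum);   // sub-integration complete: write it out
      pending.erase (it);        // keeps pending clones to ~nthread * npulsar
    }
  }

private:
  struct Entry { Profile sum; unsigned count; };

  std::mutex mutex;
  std::map<std::pair<unsigned,unsigned>, Entry> pending;
};
```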
Another approach is to stick to option 2 but somehow link the different Fold transformations to each other, such that a thread will go and finish other Fold operations (and wake up any other sleeping threads) before going to sleep on its current Fold.
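Roughly, that would look something like the following, where a thread drains any outstanding fold work before it allows itself to sleep; every name here is invented and this is only a sketch of the idea, not a proposed patch.

```cpp
// Sketch of the "linked Folds" idea: help with other pulsars' outstanding
// fold work before sleeping, so no thread blocks while useful work remains.
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>

class FoldGroup
{
public:
  // Queue work that would complete part of some other pulsar's sub-integration.
  void add_outstanding (std::function<void()> work)
  {
    std::lock_guard<std::mutex> lock (mutex);
    outstanding.push_back (std::move (work));
    wake.notify_all ();
  }

  // Called instead of a bare condition-variable wait: keep helping until
  // this thread's own sub-integration is complete.
  void finish_or_help (const std::function<bool()>& my_subint_complete)
  {
    std::unique_lock<std::mutex> lock (mutex);
    while (!my_subint_complete())
    {
      if (!outstanding.empty())
      {
        auto work = std::move (outstanding.front());
        outstanding.pop_front();
        lock.unlock();
        work();              // may complete another thread's sub-integration
        lock.lock();
        wake.notify_all();   // let sleepers re-check their predicates
      }
      else
        wake.wait (lock);    // nothing left to help with: safe to sleep
    }
  }

private:
  std::mutex mutex;
  std::condition_variable wake;
  std::deque<std::function<void()>> outstanding;
};
```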
Although I like option 2 best, I've implemented option 1 for now. Please compile and install the latest version of dspsr and let me know if the bug is fixed on your end.
Cheers,
Willem