On Linux (Ubuntu 22.04 with g++ 11.2.0, but also on the GBO RHEL7 machines), using a very recent PSRCHIVE build (my git hash is fe624d663b164305a467b16a04290aff5afb674c), we can create fluxcal files, but the resulting FITS FLUX_CAL table is corrupted and causes segfaults.
For example:
```
~/tmp via 🐍 v2.7.18 took 2s
❯ fluxcal -f vegas_59755_20594_3C444_0090_cal_0001.zap.cf vegas_59755_20797_3C444_0091_cal_0001.zap.cf
fluxcal: loading vegas_59755_20594_3C444_0090_cal_0001.zap.cf
fluxcal: starting new FluxCalibrator
fluxcal: loading vegas_59755_20797_3C444_0091_cal_0001.zap.cf
fluxcal: observation added to FluxCalibrator
fluxcal: creating PSRFITS Archive
fluxcal: unloading vegas_59755_20594_3C444_0090_cal_0001.zap.fluxcal

~/tmp via 🐍 v2.7.18 took 7s
❯ pacv vegas_59755_20594_3C444_0090_cal_0001.zap.fluxcal
Graphics device/type (? to see list, default /NULL): /xwin
pacv: vegas_59755_20594_3C444_0090_cal_0001.zap.fluxcal is a processed Calibrator
pacv: constructing FluxCalibrator from Extension
pacv: Plotting FluxCalibrator
fish: Job 1, 'pacv vegas_59755_20594_3C444_00…' terminated by signal SIGSEGV (Address boundary error)
```
Running fitsverify on the resulting file gives:
```
=================== HDU 3: BINARY Table ====================

 *** Warning: Keyword #30, CAL_MTHD has a null value.
 *** Warning: Keyword #31, SCALFILE has a null value.
 *** Error:   Keyword #49, TDIM10: illegal TDIMn keyword value
 *** Error:   Keyword #44, TDIM7: illegal TDIMn keyword value
 *** Error:   Keyword #46, TDIM8: illegal TDIMn keyword value
 *** Error:   Keyword #48, TDIM9: illegal TDIMn keyword value
 *** Warning: Keyword #34, EPOCH is deprecated. Use EQUINOX instead.
 *** Error:   Keyword #34, EPOCH: value = 59755.2394 is not a floating point
     number. The value is entered as a string.

 51 header keywords

 FLUX_CAL(1)  (10 columns x 1 rows)

 Col# Name (Units)     Format
   1  DAT_FREQ (MHz)   2304D
   2  DAT_WTS          2304E
   3  S_SYS (Jy)       4608E
   4  S_SYSERR (Jy)    4608E
   5  S_CAL (Jy)       4608E
   6  S_CALERR (Jy)    4608E
   7  SCALE (Jy)       E
   8  SCALEERR (Jy)    E
   9  RATIO            E
  10  RATIOERR         E

 ++++++++++++++++++++++ Error Summary ++++++++++++++++++++++

 HDU#  Name (version)   Type            Warnings  Errors
  1                     Primary Array   0         0
  2    HISTORY (1)      Binary Table    0         0
  3    FLUX_CAL (1)     Binary Table    3         5

 **** Verification found 3 warning(s) and 5 error(s). ****
```
Am happy to help debug.
Cheers,
Scott
Hi Scott,
Unfortunately, I can't reproduce the error on my end ... could you please make example data files available for download somewhere? I'm thinking I may not have enough channels in my test data files to trigger the issue, or something like that.
Cheers,
Willem
Hi Willem:
You can grab two on/off cal files that show this behavior from here:
https://www.cv.nrao.edu/~sransom/vegas_59705_19470_B1442+101_0001_cal_0001.zap.cf
https://www.cv.nrao.edu/~sransom/vegas_59705_19608_B1442+101_0002_cal_0001.zap.cf
Note that those only have 1024 channels, but they still show the problems with the FLUX_CAL columns.
Scott
I can reproduce this and spent some time debugging this morning. Not fully solved, but a couple of issues so far:

1. `-g` is not being used; the new columns in the flux cal FITS table should be removed, which will clean up the FITS errors Scott mentioned. This did not fix the segfault, though.
2. In `FluxCalibrator::create`, the `constant_scale` attribute is being used before it's initialized. Fixing this fixes the segfault, but I think there may still be other issues, since `pacv` claims "no points to plot" when reading the fluxcal file.

I have patches for #1 and #2 but they are not checked in yet. I'll keep poking at it, but Willem, let me know if you've already solved it!
Hm, maybe the problem now is just with pacv, though it likely originates in the fluxcal-related changes over the past few months. The "no points to plot" also shows up for fcal files created with an older psrchive version, so I'd guess the FITS files are OK. I think I will check in my changes so far, but we should keep trying to understand what is happening with pacv.
Hi Scott, thanks for reporting this bug; and hi Paul, thank you for looking into it!
I think that I've tracked the problem down to some rotten old code that was last edited in Sept 2006.
In Base/Extensions/CalibratorExtension_build.C, the Pulsar::CalibratorExtension::build method uses the first sub-Integration in the Archive to define the "weight" associated with each channel.
In some flux calibrator observations, every channel in the first sub-integration might be zapped, but the data in subsequent sub-integrations are ok, which makes the above code a rotten test for validity of calibrator solutions.
I'm guessing that this was not a problem in the past because programs like fluxcal would tscrunch before computing levels; however, I've relatively recently disabled this tscrunch in order to better cope with impulsive interference.
I'm going to code a better solution for determining the weight associated with each channel ... I'll report back when this is done.
cheers,
Willem
I've checked in a fix:
Could you please try it out on your end?
Thanks for the efforts, Paul and Willem! With the new changes, the fluxcal file gets written, but I'm seeing that it still has FITS errors, and pacv doesn't like it for plotting. Here is what pacv says:
and here is what fitsverify says:
Not sure if the pacv and fitsverify issues are the same or not.
Hi Willem, Scott asked me about this again today so I looked into it some more. The problem does still exist in current psrchive. The issue is related to these two commits:
As noted here, a "lazy" evaluation was implemented for FluxCalibrator when loading data from a FluxCalibratorExtension. However, several methods within FluxCalibrator attempt to use the data array without first checking that it has actually been filled in.
I just pushed a simple change to FluxCalibrator::get_nchan() that addresses one of these problems, and makes the pacv plot work again.
But there are other, more serious problems related to this. For example, get_CalibratorStokes() needs the data but doesn't have it, and this is currently breaking calibration in "pac -x" mode. I tried fixing it but ran into a long chain of const-related fallout. Is there a simpler approach? Maybe revert the "lazy" evaluation, or is it needed for other reasons?
Thanks for any advice/help!
Hi Paul, I can't remember exactly why I introduced the lazy evaluation policy (or what broke to motivate it) but it most likely has something to do with the fact that, when the solution is a spline, it has no intrinsic number of channels, and therefore it was necessary to get the number of channels from the data being calibrated before constructing the discretely-sampled array of values (derived from the spline) to be used for calibration.
When const-related fallout snowballs into an avalanche, I sometimes turn to making class attributes 'mutable' so that they can be altered inside methods declared as const. This seems philosophically ok to me when the attribute in question is an implementation detail (like an array of pre-computed values) and the underlying information used to compute the attribute remains constant.
If mutable doesn't save the day, I'd be happy to help with the const fallout and/or reconsider the lazy evaluation strategy/implementation. e.g. if you can send me some data that I can use to reproduce the problem, I'll have a go.
Right, it's been a while since I had to use these C++ tricks, but const_cast also works, right? I added this line, which is already used elsewhere in FluxCalibrator, and it seems to have addressed the CalibratorStokes issue:
I'll check this in soon unless you think there is any reason not to use this approach.
Also, you might want to check the spline stuff; I don't know how to test that (though it does sound useful), in case my change to get_nchan() messed with any of it.
const_cast is aok with me too ... I'll test out your changes using some spline solutions when you're done. thank you for chasing down this bug (and sorry about the trouble!)
OK, thanks Willem. See commit 64551887cf. This has fixed the immediate problem, but there are still a handful of functions that may run into trouble here. For example, meanTsys() will return zero if setup has not been run. Not sure we need to chase all of these down right now, but we might want to keep it in mind for the future.