Menu

#465 erroneous FLUX_CAL table creation

next release
open
nobody
None
5
2023-08-22
2022-06-26
No

On Linux (very recent Ubuntu 22.04 with g++ v11.2.0, but also on the GBO RHEL7 machines) using very recent PSRCHIVE compiles (my git hash is fe624d663b164305a467b16a04290aff5afb674c), we can create fluxcal files, but the resulting FITS FLUX_CAL table is corrupted, and causes SEGFAULTS.

For example:

~/tmp via 🐍 v2.7.18 took 2s 
 fluxcal -f vegas_59755_20594_3C444_0090_cal_0001.zap.cf vegas_59755_20797_3C444_0091_cal_0001.zap.cf
fluxcal: loading vegas_59755_20594_3C444_0090_cal_0001.zap.cf
fluxcal: starting new FluxCalibrator
fluxcal: loading vegas_59755_20797_3C444_0091_cal_0001.zap.cf
fluxcal: observation added to FluxCalibrator
fluxcal: creating PSRFITS Archive
fluxcal: unloading vegas_59755_20594_3C444_0090_cal_0001.zap.fluxcal

~/tmp via 🐍 v2.7.18 took 7s 
 pacv vegas_59755_20594_3C444_0090_cal_0001.zap.fluxcal                          10:28:24
 Graphics device/type (? to see list, default /NULL): /xwin
pacv: vegas_59755_20594_3C444_0090_cal_0001.zap.fluxcal is a processed Calibrator
pacv: constructing FluxCalibrator from Extension
pacv: Plotting FluxCalibrator
fish: Job 1, 'pacv vegas_59755_20594_3C444_00…' terminated by signal SIGSEGV (Address boundary error)

Running fitsverify on the resulting file gives;

=================== HDU 3: BINARY Table ====================

*** Warning: Keyword #30, CAL_MTHD has a null value.
*** Warning: Keyword #31, SCALFILE has a null value.
*** Error:   Keyword #49, TDIM10: illegal TDIMn keyword value
*** Error:   Keyword #44, TDIM7: illegal TDIMn keyword value
*** Error:   Keyword #46, TDIM8: illegal TDIMn keyword value
*** Error:   Keyword #48, TDIM9: illegal TDIMn keyword value
*** Warning: Keyword #34, EPOCH is deprecated. Use EQUINOX instead.
*** Error:   Keyword #34, EPOCH: value = 59755.2394 is not a floating point
             number. The value is entered as a string. 

 51 header keywords

 FLUX_CAL(1)  (10 columns x 1 rows)

 Col# Name (Units)       Format
   1 DAT_FREQ (MHz)       2304D     
   2 DAT_WTS              2304E     
   3 S_SYS (Jy)           4608E     
   4 S_SYSERR (Jy)        4608E     
   5 S_CAL (Jy)           4608E     
   6 S_CALERR (Jy)        4608E     
   7 SCALE (Jy)           E         
   8 SCALEERR (Jy)        E         
   9 RATIO                E         
  10 RATIOERR             E         

++++++++++++++++++++++ Error Summary  ++++++++++++++++++++++

 HDU#  Name (version)       Type             Warnings  Errors
 1                          Primary Array    0         0     
 2     HISTORY (1)          Binary Table     0         0     
 3     FLUX_CAL (1)         Binary Table     3         5     

**** Verification found 3 warning(s) and 5 error(s). ****

Am happy to help debug.

Cheers,

Scott

Discussion

  • Willem van Straten

    Hi Scott,

    Unfortunately, I can't reproduce the error on my end ... could you please make example data files available for download somewhere? I'm thinking I may not have enough channels in my test data files to trigger the issue, or something like that.

    Cheers,
    Willem

     
  • Paul Demorest

    Paul Demorest - 2022-06-27

    I can reproduce this and spent some time debugging this morning. Not fully solved but a couple issues so far:

    1. If fluxcal -g is not being used, the new columns in the flux cal FITS table should be removed, that will clean up the FITS errors Scott mentioned. This did not fix the segfault though.
    2. in FluxCalibrator::create the constant_scale attribute is being used before it's initialized. Fixing this fixes the segfault but I think there still may be other issues, since pacv claims "no points to plot" when reading the fluxcal file.

    I have patches for #1 and #2 but not checked in yet. I'll keep poking at it, but Willem let me know if you've already solved it!

     
  • Paul Demorest

    Paul Demorest - 2022-06-27

    Hm maybe the problem now is just with pacv, though likely originates in the fluxcal-related changes over the past few months. The "no points to plot" also shows up for fcal files created with an older psrchive version, so I'd guess the fits files are OK.. I think I will check in my changes so far but we should keep trying to understand what is happening with pacv.

     
  • Willem van Straten

    Hi Scott, thanks for reporting this bug; and hi Paul, thank you for looking into it!

    I think that I've tracked the problem down to some rotten old code that was last edited in Sept 2006.

    In Base/Extensions/CalibratorExtension_build.C, the Pulsar::CalibratorExtension::build method uses the first sub-Integration in the Archive to define the "weight" associated with each channel

        const Integration* subint = calibrator->get_Archive()->get_Integration(0);
    
        unsigned nchan = get_nchan();
        for (unsigned ichan=0; ichan < nchan; ichan++) {
          weight[ichan] = subint->get_weight(ichan);
          centre_frequency[ichan] = subint->get_centre_frequency(ichan);
        }
    

    In some flux calibrator observations, every channel in the first sub-integration might be zapped, but the data in subsequent sub-integrations are ok, which makes the above code a rotten test for validity of calibrator solutions.

    I'm guessing that this was not a problem in the past because programs like fluxcal would tscrunch before computing levels; however, I've relatively recently disabled this tscrunch in order to better cope with impulsive interference.

    I'm going to code a better solution for determining the weight associated with each channel ... I'll report back when this is done.

    cheers,
    Willem

     
  • Willem van Straten

    I've checked in a fix:

    To ssh://git.code.sf.net/p/psrchive/code
       aede9e109..d318723e4  master -> master
    

    Could you please try it out on your end?

     
  • Scott Ransom

    Scott Ransom - 2022-07-15

    Thanks for the efforts, Paul and Willem! So with the new changes, the fluxcal gets written. But I'm seeing that it still has FITS errors and pacv doesn't like it for plotting. Here is what pacv says:

     Graphics device/type (? to see list, default /NULL): /xwin
    pacv: vegas_59705_19470_B1442+101_0001_cal_0001.zap.fluxcal is a processed Calibrator
    pacv: constructing FluxCalibrator from Extension
    pacv: Plotting FluxCalibrator
    Pulsar::CalibratorPlotter::plot no points to plot
     Type <RETURN> for next page: 
    

    and here is what fitsverify says:

     fitsverify vegas_59705_19470_B1442+101_0001_cal_0001.zap.fluxcal                11:55:26
    
                  fitsverify 4.20 (CFITSIO V4.000)              
                  --------------------------------              
    
    
    File: vegas_59705_19470_B1442+101_0001_cal_0001.zap.fluxcal
    
    3 Header-Data Units in this file.
    
    =================== HDU 1: Primary Array ===================
    
     65 header keywords
    
     Null data array; NAXIS = 0 
    
    =================== HDU 2: BINARY Table ====================
    
     75 header keywords
    
     HISTORY(1)  (29 columns x 3 rows)
    
     Col# Name (Units)       Format
       1 DATE_PRO             24A       
       2 PROC_CMD             256A      
       3 SCALE                8A        
       4 POL_TYPE             8A        
       5 NSUB                 1J        
       6 NPOL                 1I        
       7 NBIN                 1I        
       8 NBIN_PRD             1I        
       9 TBIN (s)             1D        
      10 CTR_FREQ (MHz)       1D        
      11 NCHAN                1J        
      12 CHAN_BW (MHz)        1D        
      13 REF_FREQ (MHz)       1D        
      14 DM (CM-3)            1D        
      15 RM (RAD)             1D        
      16 PR_CORR              1I        
      17 FD_CORR              1I        
      18 BE_CORR              1I        
      19 RM_CORR              1I        
      20 DEDISP               1I        
      21 DDS_MTHD             32A       
      22 SC_MTHD              32A       
      23 CAL_MTHD             32A       
      24 CAL_FILE             256A      
      25 RFI_MTHD             32A       
      26 RM_MODEL             32A       
      27 AUX_RM_C             1I        
      28 DM_MODEL             32A       
      29 AUX_DM_C             1I        
    
    =================== HDU 3: BINARY Table ====================
    
    *** Warning: Keyword #22, CAL_MTHD has a null value.
    *** Warning: Keyword #23, SCALFILE has a null value.
    *** Warning: Keyword #26, EPOCH is deprecated. Use EQUINOX instead.
    *** Error:   Keyword #26, EPOCH: value = 59705.2260 is not a floating point
                 number. The value is entered as a string. 
    
     37 header keywords
    
     FLUX_CAL(1)  (6 columns x 1 rows)
    
     Col# Name (Units)       Format
       1 DAT_FREQ (MHz)       1024D     
       2 DAT_WTS              1024E     
       3 S_SYS (Jy)           2048E     
       4 S_SYSERR (Jy)        2048E     
       5 S_CAL (Jy)           2048E     
       6 S_CALERR (Jy)        2048E     
    
    ++++++++++++++++++++++ Error Summary  ++++++++++++++++++++++
    
     HDU#  Name (version)       Type             Warnings  Errors
     1                          Primary Array    0         0     
     2     HISTORY (1)          Binary Table     0         0     
     3     FLUX_CAL (1)         Binary Table     3         1     
    
    **** Verification found 3 warning(s) and 1 error(s). ****
    

    Not sure if the pacv and fitsverify issues are the same or not.

     
  • Paul Demorest

    Paul Demorest - 2023-08-15

    Hi Willem, Scott asked me about this again today so I looked into it some more. The problem does still exist in current psrchive. The issue is related to these two commits:

    commit 58c6f1ada756402ca6e1bd3e580afca8164a2b8f
    Author: Willem van Straten <vanstraten.willem@gmail.com>
    Date:   Wed Apr 13 15:00:08 2022 +1000
    
        lazily evaluate the FluxCalibrator::data array when not calibrating data
    
    commit 922334b29881ff82f9e04163422173a2d2bef333
    Author: Willem van Straten <vanstraten.willem@gmail.com>
    Date:   Sat Mar 26 00:10:32 2022 +1300
    
        PSRFITS 6.6 write, read and use interpolated fluxcal solutions
    

    As noted here, there was a "lazy" evaluation imlemented for FluxCalibrator when loading data from a FluxCalibratorExtension. But there are several methods within FluxCalibrator that attempt to use the data array without first checking that it has actually been filled in.

    I just pushed a simple change to FluxCalibrator::get_nchan() that addresses one of these problems, and makes the pacv plot work again.

    But there are other more serious problems related to this.. for example get_CalibratorStokes() needs the data but doesn't have it, and this is currently breaking calibration using "pac -x" mode. I tried fixing but ran into a long chain of const-related fallout. Is there a simpler approach? Like maybe revert the "lazy" evaluation.. or is it needed for other reasons?

    Thanks for any advice/help!

     
  • Willem van Straten

    Hi Paul, I can't remember exactly why I introduced the lazy evaluation policy (or what broke to motivate it) but it most likely has something to do with the fact that, when the solution is a spline, it has no intrinsic number of channels, and therefore it was necessary to get the number of channels from the data being calibrated before constructing the discretely-sampled array of values (derived from the spline) to be used for calibration.

    When const-related fallout snowballs into an avalanche, I sometimes turn to making class attributes 'mutable' so that they can be altered inside methods declared as const. This seems philosophically ok to me when the attribute in question is an implementation detail (like an array of pre-computed values) and the underlying information used to compute the attribute remains constant.

    If mutable doesn't save the day, I'd be happy to help with the const fallout and/or reconsider the lazy evaluation strategy/implementation. e.g. if you can send me some data that I can use to reproduce the problem, I'll have a go.

     
  • Paul Demorest

    Paul Demorest - 2023-08-22

    Right, been a while since I had to use these C++ tricks.. but const_cast also works right? I added this line, which is already used elsewhere in FluxCalibrator and this seems to have addressed the CalibratorStokes issue:

    const_cast<FluxCalibrator*>(this)->setup();
    

    I'll check this in soon unless you think there is any reason not to use this approach.

    Also you might want to check the spline stuff, I don't know how to test that (though it does sound useful). In case my change to get_nchan() messed with any of that.

     
  • Willem van Straten

    const_cast is aok with me too ... I'll test out your changes using some spline solutions when you're done. thank you for chasing down this bug (and sorry about the trouble!)

     
  • Paul Demorest

    Paul Demorest - 2023-08-22

    OK thanks Willem. See commit 64551887cf. This has fixed the immediate problem but there are still a handful of functions that may run into trouble with this. For example meanTsys() will return zero if setup has not been run. Not sure if we need to chase all these down right now, but might want to keep it in mind for future.

     

Log in to post a comment.

MongoDB Logo MongoDB