Menu

Cmul function mutates arg when assigning result to the same variable

2021-01-13
2021-01-14
  • Richard Lincoln

    Richard Lincoln - 2021-01-13

    In the Bessel_I1 function Cmul is called with term as the first
    argument and the result is also assigned to term:

    Term := Cmul(term, cmul(incterm, newterm));
    

    The first line of Cmul sets the real part of the result which, in this
    case, is also argument A:

    Result.RE:=A.RE*B.RE-A.IM*B.IM;
    

    The second line sets the imaginary part of the result using the, now
    modified, real part of argument A:

    Result.IM:=A.RE*B.IM+A.IM*B.RE
    

    Is this as intended? It might be clearer if the function were to create a
    new variable and assign it to the result.

    Yours sincerely,
    Richard Lincoln

     
    • Davis Montenegro

      Hello,

      "Result" is a temporary variable itself that is used to return something. If you think you need a new variable in that part of the code, you can always download it and add it for your debugging/coding experiments after getting your clone. However, adding a new variable doesn't improves the code performance at all, so, it doesn't make sense to add it.

      Best regards

      Davis

       
  • Roger Dugan

    Roger Dugan - 2021-01-13

    Hi Richard!

    Long time, no speak. Hope you're doing well.

    Pretty sure the CMUL function is working OK. It has been that way since about 1984. If it wasn't working, a lot of things in OpenDSS would have been wrong and I think we would have noticed. Here is the declaration:

    Function CMUL(const a,b:complex):complex;  inline;
      BEGIN
        Result.RE:=A.RE*B.RE-A.IM*B.IM;
        Result.IM:=A.RE*B.IM+A.IM*B.RE
      END;
    

    Note that a and b are both declared as const. Therefore, the compiler will not allow them to be changed inside this routine. When CMUL is called in Bessel_I1 the values of the term variable are pushed onto the stack and neither of the complex components is changed during the execution of CMUL. At the end of CMUL, The value of the term variable in the calling function is replaced by the value of Result.

    Note that CMUL is declared as inline. This means that the code generated by the Delphi compiler will be done in the CPU registers as much as possible for fastest speed. I've inspected the asm code generated by the compiler and it looks to me like it does a good job of accomplishing this. I can't remember when we first did this, but I want to say if was at Delphi 10, whenever that was.

    So this is a bit of a programming trick that won't work in all languages. There are ways to do this in C and even VB. I don't know how to do it in Python. What language are you using?

    I noticed something else in Bessel_I1 that looks odd. I will have to check it out to see if it really works. Did you notice any strange results?

    I didn't write the original code. I just translated a routine that started out in Fortran, I think, and had been translated by another engineer to VBA, and then I added some OpenDSS Delphi features for efficiency. There is one thing that I would never do if writing from scratch, but the compiler might compensate. I need to check it out. I will let you know if I find a problem.

     
  • Richard Lincoln

    Richard Lincoln - 2021-01-13

    Hello Davis,

    Thank you for taking the time to respond. The temporary variable you describe is how I expected it to work. However, using OpenDSS v9 at r2948 with FPC v3.0.4 on Ubuntu 18.04:

    If I debug the 4Bus-DY-Bal.DSS test case with earthModel=Deri and add a breakpoint in Bessel_I1 at line 469. In the first iteration of the loop:

    TERM={RE = 0.31798631449324982, IM = 0.31798631449324982}
    INCTERM={RE = 0.31798631449324982, IM = 0.31798631449324982}
    NEWTERM={RE = 0.15899315724662491, IM = 0.15899315724662491}
    

    After the two calls to Cmul:

    TERM={RE = -0.032153280379121234, IM = -0.0032511884694972578}
    

    Whereas, I think it should be: -0.0321532803791+0.0321532803791j. After all, the real and imaginary parts of the arguments are the same. This Python script illustrates the problem:

    incterm = complex(0.31798631449324982, 0.31798631449324982)
    newterm = complex(0.15899315724662491, 0.15899315724662491)
    term = complex(0.31798631449324982, 0.31798631449324982)
    
    print(term * (incterm * newterm))
    
    def cmul(A,B):
        Areal = A.real*B.real-A.imag*B.imag
        return complex(Areal, Areal*B.imag+A.imag*B.real)
    
    print(cmul(term, incterm * newterm))
    

    Output:

    (-0.0321532803791+0.0321532803791j)
    (-0.0321532803791-0.0032511884695j)
    

    Can you confirm that this happens using your compiler too?

    Yours sincerely,
    Richard

     
    • Davis Montenegro

      Hi Richard,

      Just did the check and the output looks correct, I'm using Embarcadero Delphi in Win10 OS. Check the attachment. Thanks Paulo for clarifying the issues you've found in FPC under other OS, which I think rises a flag for other developers when moving to other platforms using this compiler.
      I wonder if that answer your question Richard. Anyway, if there is anything else we can do to help you validate, please let us know.

      Best regards

      Davis

       
  • Paulo Meira

    Paulo Meira - 2021-01-13

    Hi, all,
    For Free Pascal, the issue is due to aliasing through the const parameters (effectively passed by reference). Detailed discussion from April 2019, initial report by Dheepak:
    https://github.com/dss-extensions/dss_capi/issues/54

    Key point from the FPC wiki:

    Declaring a parameter as const allows the compiler the possibility to do optimizations it couldn't do otherwise, such as passing by reference while retaining the semantics of passing by value.

    This doesn't affect Windows, only Linux and macOS.

    When building with Free Pascal, I imagine this and many other bugs fixed in DSS C-API are still present in the main OpenDSS codebase. I would recommend anyone that compiles the official codebase with Free Pascal to carefully validate the outputs with the official OpenDSS binary (Windows only). Each platform/CPU target needs to be tested and validated separately. For DSS C-API/Extensions, we test Windows, Linux, macOS (x64). Recently, I checked that ARM builds (32 and 64 bit) run fine too so they will be available for download in the future.

    There are other issues we encountered through the years but I don't report this kind of bug anymore since Free Pascal / Linux is not officially supported by EPRI. Reporting those would only detract from the discussion of other potential bugs that also affect the official, Delphi-compiled OpenDSS. Note that not everything we fixed has public discussions or issue tickets.

    Regards,
    Paulo Meira

     
  • Roger Dugan

    Roger Dugan - 2021-01-13

    Paulo,

    Many thanks for your help! I am open to suggestions so we can make OpenDSS work across more platforms. We have received a lot of requests for various Linux platforms both internal EPRI and external users. We probably should get Tom McDermott's 2-cent's worth on this because he may have already solved some of these issues. I'll ping him.

    Here's something I noticed after Richard called my attention to the Bessel routine. CDivReal is called with a (complex, integer) argument instead of (complex, double). I would never have written it that way. I suspect this came from the VBA code I converted this from. VBA always tries to interpret what you mean and correct your code -- especially on mixing integer and floating point.

    term := CdivReal(x , 2);   <----!!!!
    Result := Term;
    incTerm := Term;
    i:=4;
    REPEAT
       newterm := CdivReal(x, i);   <----!!!!
       Term := Cmul(term, cmul(incterm, newterm));
       Caccum(Result, Term);
       incterm := newterm;
       inc(i,2);     <----!!!!
       SizeSqr := SQR(term.re) + SQR(term.im)
    UNTIL (i > MaxTerm) OR (SizeSqr < EpsilonSqr)
    

    So I stopped the Delphi debugger on the first one above and displayed the CPU:

    mathutil.pas.468: newterm := CdivReal(x, i);
    006C88F9 89542460         mov [esp+$60],edx
    006C88FD DB442460         fild dword ptr [esp+$60]
    006C8901 DD5C2464         fstp qword ptr [esp+$64]
    006C8905 9B               wait 
    006C8906 DD442464         fld qword ptr [esp+$64]
    006C890A DDD1             fst st(1)
    006C890C DC3B             fdivr qword ptr [ebx]
    006C890E DD1F             fstp qword ptr [edi]
    006C8910 9B               wait 
    006C8911 D9C0             fld st(0)
    006C8913 DC7B08           fdivr qword ptr [ebx+$08]
    006C8916 DD5F08           fstp qword ptr [edi+$08]
    006C8919 9B               wait 
    

    This introduced me to an instruction (fild) I did not know. This loads the 80-bit floating-point register from an integer converting the integer on the fly. So this code works as the programmer expected.

    Does this work for FPC? How about non-Windows platforms?

    My preference is to change the source code to correct the call to CdivReal (Complex, Double). Will save a couple instructions per call ... probably only a couple of ns per call but to me the more important issue in the spirit of Pascal is to have the proper type agreement.

    I've attached the subroutine from the ancient IBM SSP that I think is the original source of the Bessel's function code. Note that "I" is "BI". In Fortran IV, that makes the variable floating point. You'll notice that there is no type declaration. That's the way it used to be! Variables beginning with I, J, K, L, M, and N were default integer.

     
    • Paulo Meira

      Paulo Meira - 2021-01-13

      Hi, Roger,

      CDivReal is called with a (complex, integer) argument instead of (complex, double).

      I think that's fine, especially for integers on that scale (<=1000). If the situation was reversed (double converted to integer), the compiler should probably give a warning or an error (maybe it needs a compiler flag for that).

      Does this work for FPC? How about non-Windows platforms?

      On Free Pascal / Linux, cvtsi2sdl (SSE version) is used to load the integer, here's the assembly dump:

      # [500] newterm := CdivReal(x, i);
              cvtsi2sdl       %edx,%xmm0
              movsd   (%rsp),%xmm2
              divsd   %xmm0,%xmm2
              movsd   %xmm2,64(%rsp)
              movsd   8(%rsp),%xmm2
              divsd   %xmm0,%xmm2
              movsd   %xmm2,72(%rsp)
      

      80-bit floating-point

      On x64, since SSE is used in general (x87 FPU is deprecated), only 64-bit FP operations are used. I haven't tried forcing 80-bit in Free Pascal since it would only affect the Pascal code (as you know, while some intermediary values are in the the old x87 FPU registers, they would be kept as 80-bit without truncation). Most ARMs only support 64-bit FP.

      (Sidenote: With GPUs and machine learning, we now have 16-bit floats but no support for higher precision floats, which is a bit sad.)

      My preference is to change the source code to correct the call to CdivReal (Complex, Double). Will save a couple instructions per call ... probably only a couple of ns per call but to me the more important issue in the spirit of Pascal is to have the proper type agreement.

      Yeah, I wouldn't worry about the overhead since this shouldn't be in the hot path for most users.

      I've attached the subroutine from the ancient IBM SSP that I think is the original source of the Bessel's function code. Note that "I" is "BI". In Fortran IV, that makes the variable floating point. You'll notice that there is no type declaration. That's the way it used to be! Variables beginning with I, J, K, L, M, and N were default integer.

      I use the *bes* family of functions from SLATEC in some other code (through a DLL compiled with GFortran). I have a folder with some related papers -- it seems the first ones are from 1974 and 1981 (the SLATEC code is from '81). Boost (C++) includes Bessel implementations too, since 2007. There are papers some alternative formulations from 2011 and later on ACM, but I decided to keep the SLATEC version at the time.

      Years ago, I did a simple validation for the Bessel functions in OpenDSS and they seemed fine for the typical input range. I probably have some notes somewhere on the line model as a whole; I'll try to find them and check if there's anything of note to mention.

       
  • Tom McDermott

    Tom McDermott - 2021-01-13

    Hello, everyone. Roger asked me to comment on this. As a general principle, known bugs in the trunk should be fixed, including non-compliance with the language. If something happens to work in the current and previous versions of Delphi, it may not work in future versions or on other platforms.

    Without digging into the details, this particular issue doesn't sound familiar from our experience compiling with FPC on 64-bit Windows, Linux or Mac OS X, from the Version 7 trunk. There are a few special files and a few localized ifdefs for FPC, and it's been working, although we are not using the C API or OpenDSSDirect.

    This reminds me of some issues that were fixed in GridLAB-D over the past many months. We had some problems associated with optimization flags, gcc versions, and the use of clang on various platforms. These issues could all be fixed and in the end they were, or at least we're very close now. Doing so helps to establish, maintain or improve the audience perception of quality.

     
  • Paulo Meira

    Paulo Meira - 2021-01-14

    Hi, Tom,

    Without digging into the details, this particular issue doesn't sound familiar from our experience compiling with FPC on 64-bit Windows, Linux or Mac OS X, from the Version 7 trunk. There are a few special files and a few localized ifdefs for FPC, and it's been working, although we are not using the C API or OpenDSSDirect.

    Richard mentioned he used FPC v3.0.4, it might be that the specific issue happened only in that version. FPC v3.2 (released on June 2020), which seems to be used by DSS C-API and OpenDSSCmd nowadays, might have fixed the issue.
    For FPC v3.0, removing the const modifier is enough to fix the issue. I didn't try to check that for OpenDSSCmd since I couldn't find older (FPC v3.0.x) binaries.

    For DSS C-API, since DSS Python provides (nearly) the same API as the official COM DLL, we use that to compare most of the API calls to COM. If there are notable differences, the test warns or halts (as I mentioned in the capradius thread, we're adding more tests in the near future too). For running that on Linux and macOS, the API results are serialized on Windows and loaded on the respective platforms. The results are usually pretty close regardless of platform.

    Unfortunately I don't have an easy way to compare the results for OpenDSSCmd, but I found some deviations from the Windows version of OpenDSS. Investigating it a bit, I think the main issue might be compiler flags. From a quick look at Version7/Source/CMD/linuxopts.cfg vs the equivalent in DSS C-API, I don't see the -CF64 flag in linuxopts.cfg. On FPC v3.2, without it, some floats use 32-bit precision (since they do fit at first). Later, when those constants or literals are used with integers, the resulting float is still 32-bit and the results get truncated with bad precision. This is propagated in a bunch of places. That flag was added in our "Upgrade to FPC 3.2.0..." commit.

    Here are some results of the validation through COMparison (COM being the official OpenDSS COM DLL on Windows). You can see that the results don't deviate horribly, but for QSTS simulations, the results can deviate enough to quickly reach completely different control states. (Sidenote for other DSS users: for very long simulations, with small time steps, control states might deviate a little across platforms due to these tiny differences in implementation, and that's expected).

    (here, the max abs difference for some circuit properties; edited for brevity)

    On Windows:

    > EPRITestCircuits/epri_dpv/M1/Master_NoPV.dss
    Running using COM
    Running using CAPI
    Validating
    AllBusDistances 0.0
    AllBusVmag 4.163780431554187e-11
    AllBusVmagPu 3.467226505904364e-13
    AllBusVolts 3.598010778205207e-11
    AllElementLosses 4.187284462147375e-08 Line.0x008dbe00_0x008d4da8 (1.3001256986894972e-08-8.401090931061337e-08j) (-1.0361611011528268e-08-4.926164290710013e-08j)
    AllNodeDistances 0.0
    LineLosses 1.952943329051222e-07 
    SubstationLosses 1.1176837233506376e-11
    SystemY 4.3655745685100555e-11
    TotalPower 1.3876615412300453e-09
    YCurrents 8.986944521893747e-11
    YNodeVarray 3.598010778205207e-11
    Done
    

    On Linux, with -CF64 (differences in the same order of magnitude):

    > EPRITestCircuits/epri_dpv/M1/Master_NoPV.dss
    COM output loaded
    Running using CAPI
    Validating
    AllBusDistances 0.0
    AllBusVmag 4.175149115326349e-11
    AllBusVmagPu 3.476108290101365e-13
    AllBusVolts 3.285549610154703e-11
    AllElementLosses 2.8713800651740578e-08 Line.0x008dbe00_0x008d4da8 (1.3001256986894972e-08-8.401090931061337e-08j) (-8.341302919449808e-09-6.48021226619991e-08j)
    AllNodeDistances 0.0
    LineLosses 1.5428828438463524e-07
    SubstationLosses 3.8184566619747784e-11
    SystemY 4.3655745685100555e-11
    TotalPower 5.657057045027614e-10
    YCurrents 2.0372681319713593e-10
    YNodeVarray 3.285549610154703e-11
    

    On Linux without -CF64 (note how things start to deviate 3 to 5 orders of magnitude):

    > EPRITestCircuits/epri_dpv/M1/Master_NoPV.dss
    COM output loaded
    Running using CAPI
    Validating
    AllBusDistances 0.0
    AllBusVmag 4.147855179326143e-06
    AllBusVmagPu 5.761263688341955e-10
    AllBusVolts 3.463322627794696e-06
    AllElementLosses 6.334671525110025e-07 Capacitor.cap2 (2.9103830456733704e-14-1204.2185378463314j) (2.1827872842550278e-14-1204.2185372128642j)
    AllNodeDistances 0.0
    LineLosses 2.389844980378864e-06
    SubstationLosses 1.9921799321309663e-08
    SystemY 0.0004349835708126193
    TotalPower 5.87984686717391e-07
    YCurrents 1.0491629609532538e-07
    YNodeVarray 3.463322627794696e-06
    

    We can see that SystemY looks like a good candidate to check in OpenDSSCmd. So,
    to check OpenDSSCmd on Linux, I ran that circuit epri_dpv/M1/Master_NoPV.dss and export y triplets y_xxx.csv with three OpenDSS implementations:

    • y_com.csv: running with COM on windows, "Version 9.1.3.4 (64-bit build)" -- downloaded from the SVN some days ago. I consider this the reference.
    • y_capi.csv: DSS C-API 0.10.7+dev (eaf9e7cb340d03d95b54a7486a59339aca3f929c), Linux, FPC 3.2.0 (with -CF64)
    • y_cmd.csv: Console OpenDSS (Electric Power Distribution System Simulator) Version: 1.2.12 -- downloaded today (to save time, I didn't try recompiling it)

    Then, loaded the results through Pandas for a quick comparison (due to formatting, some precision is lost, that's why COM vs C-API match completely here and not in the previous test):

    >>> import pandas as pd
    >>> com = pd.read_csv('y_com.csv')
    >>> capi = pd.read_csv('y_capi.csv')
    >>> cmd = pd.read_csv('y_cmd.csv')
    >>> com
           Row   Col           G           B
    0        1     1    1.933559   -5.824611
    1        2     1    0.002958    0.010579
    2        3     1    0.002958    0.010579
    3        4     1   -0.007105    0.142091
    4        5     1    0.007105   -0.142091
    ...    ...   ...         ...         ...
    8416  3149  3149  326.796343 -103.540059
    8417  3150  3150  326.796343 -103.540059
    8418  3151  3151  182.139363  -60.333408
    8419  3152  3152  409.861443 -122.418847
    8420  3153  3153  409.861443 -122.418847
    
    [8421 rows x 4 columns]
    >>> (com - capi).abs().max()
    Row    0.0
    Col    0.0
    G      0.0
    B      0.0
    dtype: float64
    >>> (com - cmd).abs().max()
    Row    0.00000
    Col    0.00000
    G      0.00043
    B      0.00043
    dtype: float64
    >>> (com - cmd).describe()
              Row     Col             G             B
    count  8421.0  8421.0  8.421000e+03  8.421000e+03
    mean      0.0     0.0  8.134227e-07 -8.119339e-07
    std       0.0     0.0  1.551364e-05  1.555897e-05
    min       0.0     0.0 -4.100000e-04 -4.300000e-04
    25%       0.0     0.0  0.000000e+00 -1.150000e-07
    50%       0.0     0.0  0.000000e+00  0.000000e+00
    75%       0.0     0.0  1.770000e-07  0.000000e+00
    max       0.0     0.0  4.300000e-04  4.100000e-04
    >>>            
    

    Off the top of my head, that -CF64 flag was one notable issue, but there were a lot of small changes in DSS C-API that I can't recall now and couldn't precisely say how they affected the output. For the code in general, currently I check the Version8/ code only (I see that OpenDSSCmd uses only the Version7/ code). By the way, I hope to integrate a lot of changes by next month, including some that should enable most of the v8+ PM features in all platforms, in a safe way.

     

    Last edit: Paulo Meira 2021-01-14
  • Paulo Meira

    Paulo Meira - 2021-01-14

    Tom,
    I was reading the build instructions for OpenDSSCmd and looks like the info from the section "4.5 Other OpenDSS Development Branches" from Doc/OpenDSS_FPC_Build.docx is outdated (by a couple of years):

    A mirrored-repository Makefile:
    https://github.com/Muxelmann/OpenDSSDirect.make

    I think the author hasn't updated that in a while. Might still work but the other projects don't use it anymore. I see from https://github.com/Muxelmann/dss-macOS that the author moved to DSS C-API too.

    NREL’s Python interface to the direct-call, shared library version:
    https://github.com/NREL/OpenDSSDirect.py/
    EPRI’s Julia interface to the direct-call, shared library version:
    https://github.com/tshort/OpenDSSDirect.jl
    These last two NREL and EPRI interfaces to the shared library are using mirrored repositories. If these interfaces are of interest to PNNL developers, please use one of the following options, both of which reference the main repository:

    Besides the names, they don't use OpenDSSDirect anymore, starting in 2018. Both use DSS C-API and are grouped under the DSS Extensions. Please see https://dss-extensions.org/ and https://github.com/dss-extensions/ -- there are projects from other users and orgs not under DSS Extensions that rely on this OpenDSS implementation. For example, some projects use DSS Python and/or OpenDSSDirect.py.

    To expand a bit, since you might not be aware, the code in DSS C-API is not a simple mirror of the code from the SVN. My initial proposal in 2018 was only a plain C interface (distinct from OpenDSSDirect) but the whole thing has changed a lot. There are changes, many small optimizations, in a lot of places, in the Pascal code and even in KLUSolve (we use a fork with some extensions). From the end-user perspective, compatibility has been good.

    By the way, for "2.5 OpenDSS Repository", you could use git svn to simplify the process, there's an example in electricdss-tst. That's how we keep the branch at opendss-official-svn -- the whole git history for DSS C-API, which includes most of OpenDSS code history, is around 19MB.

     
  • Roger Dugan

    Roger Dugan - 2021-01-14

    I modified the Bessel_I1 code by getting rid of the integer i and making it a double:

    FUNCTION  Bessel_I1 (CONST x:  Complex):  Complex;
    CONST
        MaxTerm    : Double = 1000.0;
        EpsilonSqr : Double = 1.0E-20;
    
    VAR
        i      :  Double;
        term, incterm, newterm : Complex;
        SizeSqr:  Double;
    
    Begin
        term    := CdivReal(x , 2.0);
        Result  := Term;
        incTerm := Term;
        i       := 4.0;
        REPEAT
           newterm := CdivReal(x, i);
           Term := Cmul(term, cmul(incterm, newterm));
           Caccum(Result, Term);
           incterm := newterm;
           i := i + 2.0;
           SizeSqr := SQR(term.re) + SQR(term.im)
        UNTIL (i > MaxTerm) OR (SizeSqr < EpsilonSqr)
    
    End;
    

    This gives exactly the same answer and I think the ASM code created looks a little better and is a couple of instructions shorter. It doesn't have to do the type conversion on the fly anymore, so maybe it will run a little faster. A good test would have been that EMC studies we were doing 10 years ago doing harmonics studies up to 2 MHz where the line impedance has to be recomputed at each frequency. Even then, it took less than 10 s to run. So that doesn't justify putting much effort into it.

    More important to me and Tom, CdivReal is now called with arguments of the proper type . It'll be in the next build we put out.

    I'll also post the change to Version 7.

     
    👍
    2

Log in to post a comment.