
#92 using size_t instead of int for index/size variables

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2015-06-16
Created: 2014-06-12
Creator: kmx
Private: No

Given that a ta-lib function looks like this:

TA_RetCode TA_SMA( int    startIdx,
                   int    endIdx,
                   const double inReal[],
                   int           optInTimePeriod, /* From 2 to 100000 */
                   int          *outBegIdx,
                   int          *outNBElement,
                   double        outReal[] );

you might run into trouble if you are on a 64-bit platform and want to analyze >2GB of data.

In that case it might be better to declare the variables startIdx, endIdx, outBegIdx and outNBElement not as int but as size_t (which is usually 32 bits wide on 32-bit platforms and 64 bits wide on 64-bit platforms).
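To make the limit concrete, here is a minimal sketch of how far an int index reaches compared with size_t. The helper names (fits_int_index, to_int_index, max_int_indexed_bytes) are made up for illustration and are not part of ta-lib.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every element of an array with `count` elements can be
   reached through an int index (the last valid index is count - 1). */
int fits_int_index(size_t count)
{
    return count == 0 || count - 1 <= (size_t)INT_MAX;
}

/* Narrows a size_t index to int for an int-based API; the assert
   documents the precondition instead of silently truncating. */
int to_int_index(size_t idx)
{
    assert(idx <= (size_t)INT_MAX);
    return (int)idx;
}

/* Bytes occupied by the largest double array an int index can cover. */
uint64_t max_int_indexed_bytes(void)
{
    return ((uint64_t)INT_MAX + 1u) * sizeof(double);
}
```

Note that the limit counts elements, not bytes: with a 32-bit int and an 8-byte double (typical, but not guaranteed by the C standard), an int index covers 2^31 elements, which is 16 GiB of double data.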

Looking at other scientific/math libraries, using size_t for index/size variables seems to be quite common practice; see for example GSL: https://www.gnu.org/software/gsl/manual/html_node/Mean-and-standard-deviation-and-variance.html

Related

Bugs: #92

Discussion

  • Alexander Trufanov

    I have taken a look into this problem for fun. The TA functions are partially generated by the gen_code tool and partially consist of function-specific code. It's not a big deal to modify gen_code to produce C++ code where the index arguments are size_t. The code is here.

    The real problem is that the function-specific code will still use ints. You'll get hundreds of warnings about comparisons between signed and unsigned (size_t) types at compile time. Worst of all, there are functions that really do assign negative values to index variables to mark them as uninitialized or whatever. Whoever wrote them expected a signed type. I don't think there are a lot of them. Still, this means you can't just regenerate the library's code: you'll need to check all TA functions after that.

    Another result of switching to size_t is that the maintainer will have to distribute both 32-bit and 64-bit binaries.

    It looks like I'll need to go through all the funcs anyway. Perhaps I'll adjust them to size_t at the same time. We'll see. If so, I'll try to contribute the code back to the project.
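The two problems named in the comment above (mixed signed/unsigned comparisons and negative "not set" markers) can be sketched in a few lines of C. The function names and the IDX_UNSET macro are hypothetical and only serve the illustration; they are not taken from the ta-lib sources.

```c
#include <assert.h>
#include <stddef.h>

/* Pitfall 1: comparing an int with a size_t. The int operand is converted
   to size_t first (written out explicitly here; an implicit "idx < n"
   behaves the same on common platforms and usually triggers a
   sign-compare warning). A negative idx becomes a huge value. */
int mixed_compare_less(int idx, size_t n)
{
    return (size_t)idx < n;
}

/* Pitfall 2: a negative "not set" marker cannot be stored in a size_t.
   With an unsigned index the marker has to be a reserved value that is
   tested for equality instead of with "< 0". */
#define IDX_UNSET ((size_t)-1)   /* hypothetical sentinel, wraps to SIZE_MAX */

int is_unset(size_t idx)
{
    return idx == IDX_UNSET;
}

/* A checked conversion for code that still keeps some int indexes. */
size_t to_size(int idx)
{
    assert(idx >= 0);
    return (size_t)idx;
}
```

So a check such as "is -1 below 10?" is true for two ints but false once one side is size_t, which is why every function-specific code path would need a manual review after regenerating the signatures.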

     
    • Mario Fortier

      Mario Fortier - 2015-06-15

      In retrospect, yes, using size_t for indexes into arrays (startIdx/endIdx/outBegIdx/outNbElement) would have been better.

      But...
      I doubt a lot of users have to deal with arrays larger than 2^31 elements in the context of this library... and typically an application designed to scale will divide a very large array into smaller chunks that can be processed in parallel. So I doubt there is a widespread need for widening the int parameters to unsigned 64 bits.
      More importantly: changing the signedness of the API will frustrate existing users. They will also have to deal with the compiler warnings.

      In short, I advise against making that change.
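The chunking approach mentioned above can be sketched as follows. This is only an illustration under stated assumptions: sma_int is a hypothetical stand-in with a TA_SMA-like int signature (it is not the ta-lib implementation), and sma_chunked is an invented driver that keeps the global position in a size_t while handing the int-based function a rebased pointer and small local indexes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in with a TA_SMA-like int interface (not ta-lib code). */
static void sma_int(int startIdx, int endIdx, const double in[], int period,
                    int *outBegIdx, int *outNBElement, double out[])
{
    int first = startIdx < period - 1 ? period - 1 : startIdx;
    int n = 0;
    for (int i = first; i <= endIdx; i++) {
        double sum = 0.0;
        for (int k = i - period + 1; k <= i; k++)
            sum += in[k];
        out[n++] = sum / period;
    }
    *outBegIdx = first;
    *outNBElement = n;
}

/* Computes the SMA of in[0..len-1] in pieces of at most `chunk` outputs.
   `out` must have room for len - period + 1 values.
   Returns the number of outputs written. */
size_t sma_chunked(const double *in, size_t len, int period, int chunk,
                   double *out)
{
    assert(period >= 1 && chunk >= 1 && chunk <= INT_MAX - period);
    if (len < (size_t)period)
        return 0;
    size_t total = len - (size_t)period + 1;  /* outputs overall */
    size_t written = 0;
    while (written < total) {
        size_t remaining = total - written;
        int want = remaining < (size_t)chunk ? (int)remaining : chunk;
        int begIdx, nb;
        /* Output j averages in[j .. j+period-1], so the chunk starting at
           output `written` needs inputs from in + written onward. Each
           chunk re-reads period - 1 inputs shared with its neighbour. */
        sma_int(period - 1, want + period - 2, in + written, period,
                &begIdx, &nb, out + written);
        written += (size_t)nb;
    }
    return written;
}
```

Because every chunk only depends on its own slice of the input, the loop iterations are independent and could be handed to separate threads; the price is that neighbouring chunks overlap by period - 1 input values.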

       On Monday, June 15, 2015 3:33 PM, Alexander Trufanov <trufanov@users.sf.net> wrote:

      I have taken a look into this problem for fun. [...]

       


      •  kmx

        kmx - 2015-06-16

        Just for the record: my use case for using ta-lib with 64-bit indexing is this Perl module: https://metacpan.org/pod/PDL::Finance::TA

        I am not saying that there is a huge demand for analyzing big (>2GB) data at the moment, but PDL http://pdl.perl.org/ in general is used for handling large vectors/matrices and supports 64-bit indexing.

         
    •  kmx

      kmx - 2015-06-16

      The signed equivalent of size_t is IMO ptrdiff_t
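A small sketch of what a ptrdiff_t-based index could look like: it is as wide as a pointer difference on the platform, yet still signed, so a negative "not found / not set" marker keeps working. The ta_index typedef, the TA_INDEX_UNSET macro and first_above are hypothetical names, not part of ta-lib.

```c
#include <assert.h>
#include <stddef.h>

typedef ptrdiff_t ta_index;              /* hypothetical index type */
#define TA_INDEX_UNSET ((ta_index)-1)    /* negative marker stays valid */

/* Returns the index of the first value above `threshold`,
   or TA_INDEX_UNSET if there is none. */
ta_index first_above(const double *v, ta_index n, double threshold)
{
    assert(n >= 0);
    for (ta_index i = 0; i < n; i++)
        if (v[i] > threshold)
            return i;
    return TA_INDEX_UNSET;
}
```

Comparisons between ta_index values stay signed-with-signed, so existing tests such as "idx < 0" would not need to be rewritten, which is the main attraction compared with size_t.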

       
  • Alexander Trufanov

    Yeah, perhaps you're right and this shouldn't be done.
    I noticed this bug report when I was about to solve much the same problem with a slightly different approach. But that solution could be applicable to this problem too.

    My intention is to adjust the library so that it can calculate hundreds of indicators for real-time signals. That's a bit different from the current library paradigm. The current library lets you calculate TA signals for a batch of historical data. If you get a new piece of data, you need to append it to the historical data and run the function again. For things like MA, that's a lot of work: the whole subset of data (its size determined by the lookback function) is recalculated every time I get a new value. And it requires more code to handle.

    I was going to generate one or a few new functions for every TA function. The new function would accept only one input value (instead of arrays of values) and return a pointer to an internal state record along with a single result. The returned state could be null if the function has no memory. The function result could indicate that the number of observed records is still less than the lookback count and the result should be ignored. For functions with memory, the state record could hold function-specific data and even a CIRCLE_BUF.

    So instead of

    input[]
    output[]
    TA_Func(input, output, settings)
    

    the user will be able to use

    Ta_Func_State *dat = NULL;
    
    for (in_val in input)
    {
        res = TA_Func(in_val, &out_val, settings, &dat);
        if (res == SUCCESS)
            print out_val;
    }
    
    if (dat) free(dat);
    

    The second design is applicable to the case where you're getting new data at run time, and it doesn't depend on the amount of data.
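As a rough sketch of the per-value design described above, here is what a streaming simple moving average with an explicit state record might look like. Everything here (SmaState, sma_step, sma_free, the SMA_* result codes) is hypothetical and invented for illustration; none of it is existing ta-lib API, and a plain ring buffer stands in for whatever circular-buffer helper the library would use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical state record: keeps the last `period` inputs. */
typedef struct {
    int     period;  /* window length, fixed when the state is created */
    int     count;   /* values seen so far, capped at period */
    int     pos;     /* next ring slot to overwrite */
    double  sum;     /* running sum of the values in the ring */
    double *ring;    /* ring buffer with `period` slots */
} SmaState;

enum { SMA_OK = 0, SMA_NEED_MORE_DATA = 1, SMA_BAD_PARAM = 2, SMA_ALLOC_ERR = 3 };

/* Feeds one value. The state is allocated on the first call (when *state
   is NULL); `period` is only read at that point. Writes *out_val and
   returns SMA_OK once `period` values have been seen. */
int sma_step(double in_val, double *out_val, int period, SmaState **state)
{
    assert(out_val != NULL && state != NULL);
    SmaState *s = *state;
    if (s == NULL) {
        if (period < 1)
            return SMA_BAD_PARAM;
        s = malloc(sizeof *s);
        if (s == NULL)
            return SMA_ALLOC_ERR;
        s->ring = calloc((size_t)period, sizeof *s->ring);
        if (s->ring == NULL) {
            free(s);
            return SMA_ALLOC_ERR;
        }
        s->period = period;
        s->count = 0;
        s->pos = 0;
        s->sum = 0.0;
        *state = s;
    }
    if (s->count == s->period)
        s->sum -= s->ring[s->pos];   /* drop the oldest value */
    else
        s->count++;
    s->ring[s->pos] = in_val;
    s->sum += in_val;
    s->pos = (s->pos + 1) % s->period;
    if (s->count < s->period)
        return SMA_NEED_MORE_DATA;   /* still inside the lookback window */
    *out_val = s->sum / s->period;
    return SMA_OK;
}

/* Releases the state and resets the caller's pointer. */
void sma_free(SmaState **state)
{
    if (*state != NULL) {
        free((*state)->ring);
        free(*state);
        *state = NULL;
    }
}
```

Each new value costs a constant amount of work regardless of how much history has been seen, and no index type is involved at all, so the 2^31 element limit does not arise. One trade-off of the running sum is that floating-point rounding error can accumulate over very long streams, which a real implementation might need to address.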

    Currently I'm thinking about returning void* instead of Ta_Func_Specific_State* to keep users out of its contents, and about storing the settings that were used to initialize the function inside the state (this will require at least 2 new funcs to be generated).

    What do you think? Would it be an improvement for the lib?

    P.S. Unfortunately I'm focused on C++ code and not sure whether I'll be able to take care of code generation for the other languages. At the very least, I won't be able to check that code.

     

    Last edit: Alexander Trufanov 2015-06-16
