IT++ Thread safety

Frank
2012-10-29
2012-11-15
  • Frank
    2012-10-29

    Some time ago I found this module for using IT++ code in Gnuradio blocks:
    https://www.cgran.org/wiki/itpp
    I was puzzled by the note "IT++ is not thread safe. When operating on IT++ objects, insert mutexes or other thread protection mechanisms." Why is it not thread-safe? Is it the IT++ code itself, the basic math objects, or the external implementations (ATLAS, ACML, etc.)? I wonder whether it would be easy to make it thread-safe, or whether its basic concepts are generally incompatible with multithreaded environments. Mutexes are more of a workaround than a real solution, because multithreading is very useful for live signal processing in Gnuradio. What about the alternatives, µBLAS and Armadillo: aren't they thread-safe either?

     
  • Frank
    2012-11-14

    So, we learned that the random generators are not thread-safe because of their static variables. Writing to matrix and vector objects concurrently can cause race conditions, too. That's OK as long as you know it. Are there any other critical points? This should be documented.
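    A minimal C++11 sketch of the usual remedy for RNG state held in static variables; this is not IT++'s actual RNG code. Each thread owns its own engine through `thread_local`, so concurrent calls never share state. The names `tls_engine`, `seed_this_thread`, `thread_uniform` and `generate_parallel`, and the seeding scheme (base seed plus thread index), are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Hypothetical per-thread engine. A plain `static std::mt19937 engine;`
// would be shared by all threads, and concurrent calls would race on its state.
thread_local std::mt19937 tls_engine;

// Re-seed the calling thread's engine (each thread should get its own seed).
void seed_this_thread(std::uint32_t seed) { tls_engine.seed(seed); }

// Draw one uniform sample in [0, 1) from the calling thread's engine.
double thread_uniform() {
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(tls_engine);
}

// Fill one vector per thread; thread t uses seed base_seed + t.
std::vector<std::vector<double>> generate_parallel(unsigned n_threads, std::size_t n_samples,
                                                   std::uint32_t base_seed) {
    std::vector<std::vector<double>> out(n_threads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&out, t, n_samples, base_seed] {
            seed_this_thread(base_seed + t);
            out[t].reserve(n_samples);  // each thread writes only its own slot
            for (std::size_t i = 0; i < n_samples; ++i) out[t].push_back(thread_uniform());
        });
    }
    for (auto& w : workers) w.join();
    return out;
}
```

    Giving each thread a distinct, explicit seed keeps the streams reproducible; with identical seeds every thread would produce the same sequence.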

    If IT++ is thread-safe otherwise, that would be fine: you could write signal processing blocks for Gnuradio that run in parallel with other blocks. It is very important that there are no side effects between independent functions and object instances. For example, if a function like the cross product is called from two different threads with different data, the two calls should be completely independent. For ACML, at least, there is a "multithread" version, too. I don't know whether this means that operations are spread over multiple cores or whether it is also required for multithreaded programs. For ATLAS, they say it should be thread-safe:
    http://math-atlas.sourceforge.net/faq.html#tsafe
    FFTW is only thread-safe for "fftw_execute", but not for creating the FFTW plan:
    http://www.fftw.org/doc/Thread-safety.html
    It can also be configured to run multithreaded itself:
    http://www.fftw.org/doc/Usage-of-Multi_002dthreaded-FFTW.html#Usage-of-Multi_002dthreaded-FFTW

    So maybe it would be wise to create a special thread-safe variant of IT++, selected by #define flags, to take care of those issues. The issue with the FFT plan is difficult. I think we would have to split the IT++ function into two parts: 1. creating the plan, 2. executing the plan on IT++ vectors.
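    The two-part split can be sketched in plain C++ without any FFT library. The names `DftPlan`, `make_plan` and `execute` are hypothetical, and a naive O(n²) DFT stands in for FFTW: "planning" here only precomputes the twiddle factors for one length, and execution only reads the plan.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Hypothetical plan: everything that depends only on the transform length.
struct DftPlan {
    std::size_t n = 0;
    cvec twiddle;  // twiddle[k] = exp(-2*pi*i*k/n)
};

// Part 1: create the plan (the length-dependent, reusable part).
DftPlan make_plan(std::size_t n) {
    const double pi = std::acos(-1.0);
    DftPlan p;
    p.n = n;
    p.twiddle.resize(n);
    for (std::size_t k = 0; k < n; ++k)
        p.twiddle[k] = std::polar(1.0, -2.0 * pi * static_cast<double>(k) / static_cast<double>(n));
    return p;
}

// Part 2: execute the plan on a vector; the plan is read, never modified.
cvec execute(const DftPlan& p, const cvec& x) {
    if (x.size() != p.n) throw std::invalid_argument("input length does not match plan");
    cvec y(p.n);
    for (std::size_t k = 0; k < p.n; ++k) {
        std::complex<double> acc = 0.0;
        for (std::size_t m = 0; m < p.n; ++m) acc += x[m] * p.twiddle[(k * m) % p.n];
        y[k] = acc;
    }
    return y;
}
```

    Planning is done once per length, and the same `DftPlan` can then be passed to `execute` for every vector of that length.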

     
  • andy_panov
    2012-11-14

    As for random numbers, I've already submitted a patch that allows parallel execution with OpenMP (see feature request #90). Just recompile the library with OpenMP support and you'll get per-thread generation of random numbers.

    PS: I've just gone through the FFTW code, and it seems it can be rewritten the same way (a single plan per OpenMP thread).

     
    Last edit: andy_panov 2012-11-14
  • Frank
    2012-11-14

    Are you sure there are no significant performance penalties? I know the pragmas are ignored in non-OpenMP builds, but the mechanisms are a bit different. I'm also sceptical about modifying the FFTW code. In MATLAB everything runs in one single binary context, so you have to use MATLAB's own variants of FFTW and ACML. I tried mixing different library versions in MEX functions, but that gave me lots of crashes. Debugging MEX files within MATLAB is a nightmare.

     
  • Frank
    2012-11-14

    Btw: for FFTW it would be better to build the FFT plan only once and use it for every thread. Per-thread FFT plan calculations are a huge overhead if the FFT sizes are the same.
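    Building the plan once and sharing it works as long as execution never writes to the plan. A minimal sketch, again with a hypothetical naive-DFT `DftPlan` as a stand-in: the plan is created once, then several `std::thread` workers transform different vectors through a `const` reference. With real FFTW, the analogue would be one plan plus the new-array execute functions (for example `fftw_execute_dft`) with separate per-thread arrays, since plain `fftw_execute` always works on the arrays the plan was created with.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <thread>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Hypothetical read-only plan: twiddle factors for one length.
struct DftPlan {
    std::size_t n;
    cvec twiddle;
};

DftPlan make_plan(std::size_t n) {
    const double pi = std::acos(-1.0);
    DftPlan p{n, cvec(n)};
    for (std::size_t k = 0; k < n; ++k)
        p.twiddle[k] = std::polar(1.0, -2.0 * pi * static_cast<double>(k) / static_cast<double>(n));
    return p;
}

// Naive DFT; only reads the plan, so concurrent calls on one plan are safe.
// Precondition: x.size() == p.n.
cvec execute(const DftPlan& p, const cvec& x) {
    cvec y(p.n);
    for (std::size_t k = 0; k < p.n; ++k)
        for (std::size_t m = 0; m < p.n; ++m) y[k] += x[m] * p.twiddle[(k * m) % p.n];
    return y;
}

// One worker thread per input vector, all sharing the same const plan.
std::vector<cvec> transform_all_parallel(const DftPlan& plan, const std::vector<cvec>& inputs) {
    std::vector<cvec> outputs(inputs.size());  // one output slot per thread
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < inputs.size(); ++i)
        workers.emplace_back([&plan, &inputs, &outputs, i] { outputs[i] = execute(plan, inputs[i]); });
    for (auto& w : workers) w.join();
    return outputs;
}
```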

     
  • andy_panov
    2012-11-14

    RNG penalties: there should not be much. Each RNG call costs a few extra fetches from the context address. The added cost is constant and does not depend on the number of samples to generate.
    FFT plans are created dynamically (MKL-FFTW case). The ACML-based version uses a dynamically created scratch-pad vector. Pointers to them are stored in static variables inside the transform functions. We can just move these pointers into thread-local storage and be on the safe side.
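    Moving the cached pointer from a `static` variable into thread-local storage could look like the sketch below. It is not the actual IT++ transform code: `DftPlan`, `make_plan` and `fft_cached` are hypothetical, a naive DFT stands in for the MKL/FFTW/ACML backends, and the counter `plans_created` exists only to make the caching visible. Each thread keeps its own cached plan and re-plans only when the transform length changes, so no thread can swap the plan out from under another.

```cpp
#include <atomic>
#include <cmath>
#include <complex>
#include <cstddef>
#include <memory>
#include <vector>

using cvec = std::vector<std::complex<double>>;

std::atomic<int> plans_created{0};  // instrumentation only: counts planner calls

// Hypothetical plan: twiddle factors for one transform length.
struct DftPlan {
    std::size_t n;
    cvec twiddle;
};

DftPlan make_plan(std::size_t n) {
    ++plans_created;
    const double pi = std::acos(-1.0);
    DftPlan p{n, cvec(n)};
    for (std::size_t k = 0; k < n; ++k)
        p.twiddle[k] = std::polar(1.0, -2.0 * pi * static_cast<double>(k) / static_cast<double>(n));
    return p;
}

// Transform with an implicit plan cached per thread.
cvec fft_cached(const cvec& x) {
    // Conceptually this was `static DftPlan* cached;`, shared by all threads.
    // With thread_local, every thread owns its own cache slot.
    static thread_local std::unique_ptr<DftPlan> cached;
    if (!cached || cached->n != x.size())
        cached = std::make_unique<DftPlan>(make_plan(x.size()));  // re-plan on length change only
    const DftPlan& p = *cached;
    cvec y(p.n);
    for (std::size_t k = 0; k < p.n; ++k)
        for (std::size_t m = 0; m < p.n; ++m) y[k] += x[m] * p.twiddle[(k * m) % p.n];
    return y;
}
```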

     
  • andy_panov
    2012-11-14

    > Per-thread FFT plan calculations are a huge overhead if the FFT sizes are the same.

    I know, but any call to fftw_execute would have to be protected by a mutex if it uses the global plan (imagine another thread changing the global plan while we are doing a transform).
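    A minimal sketch of the global-plan variant, again with a hypothetical naive-DFT plan rather than real FFTW calls (`g_plan`, `g_plan_mutex` and `fft_global` are illustrative names). Because any caller may replace the one shared plan, the lock has to cover both the length check and the whole transform, so concurrent calls effectively run one at a time.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <mutex>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Hypothetical plan: twiddle factors for one transform length.
struct DftPlan {
    std::size_t n = 0;
    cvec twiddle;
};

DftPlan g_plan;           // one cached plan shared by every thread
std::mutex g_plan_mutex;  // guards g_plan

// Naive DFT using the shared plan; re-plans when the length changes.
cvec fft_global(const cvec& x) {
    // Held for the whole call: otherwise another thread could replace
    // g_plan while this one is still reading its twiddle factors.
    std::lock_guard<std::mutex> lock(g_plan_mutex);
    if (g_plan.n != x.size()) {
        const double pi = std::acos(-1.0);
        g_plan.n = x.size();
        g_plan.twiddle.resize(g_plan.n);
        for (std::size_t k = 0; k < g_plan.n; ++k)
            g_plan.twiddle[k] =
                std::polar(1.0, -2.0 * pi * static_cast<double>(k) / static_cast<double>(g_plan.n));
    }
    cvec y(g_plan.n);
    for (std::size_t k = 0; k < g_plan.n; ++k)
        for (std::size_t m = 0; m < g_plan.n; ++m) y[k] += x[m] * g_plan.twiddle[(k * m) % g_plan.n];
    return y;
}
```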

     
    • Frank
      2012-11-14

      But if you protect it with a mutex, you lose the advantage of parallelism. I think it's best to leave the handling of the FFT plan up to the user, simply by splitting the FFT into two separate functions, plan and execute, plus a simple variant with implicit plan calculation. It would be a simple wrapper around the FFTW functions. The optimal handling strongly depends on the algorithm that uses the FFT. I once wrote something with many FFT calls on equal problem sizes; there I would create the plan only once and then just execute it many times.

       
  • andy_panov
    2012-11-14

    I would suggest adding some functions to initialize the per-thread plan explicitly from a global plan, or to create a new per-thread plan with the desired parameters. Calling the transform function would then add no planning cost.

     
    • Frank
      2012-11-14

      But the "global" plan depends on the data and the algorithm. Some algorithms use mixed FFTs on multiple variables with different problem sizes. How can you automate this?

      In my opinion, the user would have to create plans for the different vector sizes manually and select the proper plan for each fftw_execute call individually:
      http://www.fftw.org/doc/Using-Plans.html

      Btw: it is the planner that should be semaphore/mutex-protected, not the execute function:
      http://www.fftw.org/doc/Thread-safety.html#Thread-safety
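      Combining those two points, a user-managed set of plans might look like this minimal sketch: plans are kept per length, only lookup and plan creation (the part FFTW requires to be serialized) run under a mutex, and the returned plans are read-only, so transforms run without locking. `PlanRegistry` and the naive-DFT plan are hypothetical stand-ins, not an existing IT++ or FFTW API.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Hypothetical plan: twiddle factors for one transform length.
struct DftPlan {
    std::size_t n;
    cvec twiddle;
};

class PlanRegistry {
public:
    // Returns the shared, read-only plan for length n, creating it on first use.
    std::shared_ptr<const DftPlan> get(std::size_t n) {
        std::lock_guard<std::mutex> lock(mutex_);  // serializes lookup and planning
        auto it = plans_.find(n);
        if (it != plans_.end()) return it->second;
        auto plan = std::make_shared<DftPlan>(DftPlan{n, cvec(n)});
        const double pi = std::acos(-1.0);
        for (std::size_t k = 0; k < n; ++k)
            plan->twiddle[k] = std::polar(1.0, -2.0 * pi * static_cast<double>(k) / static_cast<double>(n));
        plans_.emplace(n, plan);
        return plan;
    }

private:
    std::mutex mutex_;
    std::map<std::size_t, std::shared_ptr<const DftPlan>> plans_;
};

// Execution needs no lock: it only reads the plan the caller selected.
// Precondition: x.size() == p.n.
cvec execute(const DftPlan& p, const cvec& x) {
    cvec y(p.n);
    for (std::size_t k = 0; k < p.n; ++k)
        for (std::size_t m = 0; m < p.n; ++m) y[k] += x[m] * p.twiddle[(k * m) % p.n];
    return y;
}
```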

       
  • andy_panov
    2012-11-14

    Sure, each algorithm will use its own plan, the same way as in the existing implementation. The existing implementation changes its cached plan only if the algorithm parameters (the transform length) change; otherwise it reuses the previously generated plan. So a plan is generated once for multiple transforms with the same length (this seems to be the most likely scenario in applications). I just want to add an option to set the plan manually for each algorithm. I'm going to use free-standing functions because the library supports several FFT implementations (MKL, ACML and FFTW). We should abstract the interface from the specific FFT implementation to make user code library-independent.

     
  • Frank
    2012-11-15

    OK. With manual plan setting, it would look like vecb = fft(veca, plana_pointer), similar to FFTW's execute function itself.

    The transform parameters can change inside the algorithms if multiple vectors with different lengths are processed.
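    The proposed call vecb = fft(veca, plana_pointer) could be sketched as a free-standing function taking an optional plan pointer. This is a hypothetical signature, not the existing IT++ `fft`, and a naive DFT again stands in for the real backends: a null pointer falls back to an implicit, temporary plan, and a plan whose length does not match the input is rejected instead of being silently replaced, since lengths may change inside an algorithm.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Hypothetical plan: twiddle factors for one transform length.
struct DftPlan {
    std::size_t n = 0;
    cvec twiddle;
};

DftPlan make_plan(std::size_t n) {
    const double pi = std::acos(-1.0);
    DftPlan p;
    p.n = n;
    p.twiddle.resize(n);
    for (std::size_t k = 0; k < n; ++k)
        p.twiddle[k] = std::polar(1.0, -2.0 * pi * static_cast<double>(k) / static_cast<double>(n));
    return p;
}

// vecb = fft(veca, plan): explicit plan if given, implicit temporary plan otherwise.
cvec fft(const cvec& x, const DftPlan* plan = nullptr) {
    DftPlan implicit_plan;
    if (plan == nullptr) {
        implicit_plan = make_plan(x.size());  // planned for this call only
        plan = &implicit_plan;
    } else if (plan->n != x.size()) {
        // Lengths may change inside an algorithm: refuse a mismatched plan.
        throw std::invalid_argument("fft: plan length does not match input length");
    }
    cvec y(plan->n);
    for (std::size_t k = 0; k < plan->n; ++k)
        for (std::size_t m = 0; m < plan->n; ++m) y[k] += x[m] * plan->twiddle[(k * m) % plan->n];
    return y;
}
```

    An algorithm that mixes lengths would keep one plan per length and pass the matching one on each call, which keeps plan selection in user code as suggested above.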