ITPP

motffh
2012-11-06
2012-11-10
  • motffh
    motffh
    2012-11-06

    I want to use parallel processing to accelerate my project, but I have not been able to. So I want to ask whether ITPP uses any parallel technology.

     
  • Frank
    Frank
    2012-11-06

    In general it is the task of the programmer to write parallel algorithms. But for big FFTs I used the parallelized version of FFTW and it worked with multiple cores. I even managed to do this as a MATLAB MEX function, but I had to replace the MATLAB ACML with the AMD-provided ACML. Furthermore, I successfully experimented with OpenMP pragmas. But you have to be careful. I read that ITPP is not thread-safe. Does anybody know the details? What exactly is forbidden? OpenMP automates parallelism, so it cannot always be optimal. The most important thing is to write algorithms that allow parallelism. If you are interested in multithreaded signal processing, you should have a look at GNU Radio, too. The GNU Radio blocks can run multithreaded as stream processing units.

     
    Last edit: Frank 2012-11-06
  • motffh
    motffh
    2012-11-06

    Thank you very much. I simulated a QPSK system both serially and in parallel (using OpenMP). Both take nearly the same time, and sometimes the parallel version takes much longer. I really have no idea why. In theory, parallel processing should be much quicker than serial processing. Does the QPSK system already use parallel algorithms? I doubt it; I cannot find any parallel algorithms in my QPSK system. I appreciate your reply. Thanks for your help!

     
    Attachments
    • andy_panov
      andy_panov
      2012-11-06

      berc is used simultaneously from several threads, so there can be a race condition here.
      The same is true for the AWGN channel: it uses the same RNGs from several threads.
      I would suggest moving the creation of these objects into the SNR-wise loop.
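A minimal sketch of this suggestion, written in plain C++ so that it does not depend on the attached code. `ErrorCounter` and `error_rates` are hypothetical names standing in for an error counter such as berc and for the SNR-wise loop; they are not IT++ API. The sketch assumes that `sent[i]` and `received[i]` have the same length.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an error counter object (not the IT++ class).
struct ErrorCounter {
    long errors = 0;
    long total = 0;

    // Compare two equally long bit sequences and accumulate the error count.
    void count(const std::vector<int>& a, const std::vector<int>& b) {
        for (std::size_t k = 0; k < a.size(); ++k) {
            ++total;
            if (a[k] != b[k]) ++errors;
        }
    }

    double rate() const {
        return total ? static_cast<double>(errors) / total : 0.0;
    }
};

// One entry of sent/received per SNR point; returns one error rate per point.
std::vector<double> error_rates(const std::vector<std::vector<int>>& sent,
                                const std::vector<std::vector<int>>& received) {
    const int n = static_cast<int>(sent.size());
    std::vector<double> rates(sent.size(), 0.0);  // sized before the loop

    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        ErrorCounter counter;                 // created inside the loop, so no thread shares it
        counter.count(sent[i], received[i]);
        rates[i] = counter.rate();            // each iteration writes only its own element
    }
    return rates;
}
```

Because the counter lives inside the loop body, every iteration gets a fresh object and no state is shared between threads. Without `-fopenmp` the pragma is ignored and the same code runs serially.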

       
      Last edit: andy_panov 2012-11-06
      • Darlan Moreira
        Darlan Moreira
        2012-11-06

        What about the channel related functions? I think anything related to
        random numbers shares some "global information". Even if there is no
        race condition and the program runs "correctly" there is still the
        concern about the "quality" of the random sequences generated in
        multiple threads/processes.

         
        • andy_panov
          andy_panov
          2012-11-07

          I think anything related to random numbers share some "global information".

          Not quite so. Each RNG class uses its own set of state variables. Some static constants are shared, but they are read-only. Look into random_dsfmt.h for further details.

          Sorry for the misleading comment above. In fact, all instances of the RNG class share the same set of state variables, so RNG calls have to be serialized somehow. It would be nice to have per-thread RNGs.

           
          Last edit: andy_panov 2012-11-07
          • andy_panov
            andy_panov
            2012-11-07

            As a bottom line, multithreaded simulation with random numbers is nearly useless with the current implementation. This could be fixed if the user were allowed to set a per-thread RNG context, making random number generation independent for each thread.

             
            • Frank
              Frank
              2012-11-07

              To be correct, it's a PRNG, not an RNG. Any idea how to access the Intel hardware RNG easily? For a PRNG there has to be a global seed state with mutex-protected access. If every class has its isolated context, all threads would generate the same PRNG sequence. The class constructor has to call the global PRNG for the first seed. Afterwards it can run without global variables.

              As a workaround for classes without a global state variable, after initialization you could reseed the PRNG within the loop. Simply adding the thread number to the seed variable will result in a different PRNG sequence. If it's a good generator, that's sufficient.
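The reseeding workaround can be sketched with the C++ standard library instead of the IT++ generators. This is an illustration under assumptions, not the IT++ API: `first_draws` is a hypothetical helper, `std::mt19937` replaces the DSFMT generator, and the loop index is used as the seed offset instead of the thread number so that the result does not depend on how many threads run.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: draws one Gaussian sample per stream, where each
// stream has its own generator seeded with base_seed + stream index.
std::vector<double> first_draws(std::uint32_t base_seed, int n_streams) {
    std::vector<double> out(n_streams);

    #pragma omp parallel for
    for (int i = 0; i < n_streams; ++i) {
        // Local generator: no state is shared between iterations or threads.
        std::mt19937 gen(base_seed + static_cast<std::uint32_t>(i));
        std::normal_distribution<double> noise(0.0, 1.0);
        out[i] = noise(gen);
    }
    return out;
}
```

Calling `first_draws` twice with the same base seed gives the same values, so a simulation stays reproducible. Whether neighbouring seeds give sufficiently independent sequences depends on the generator, as noted above.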

               
              • andy_panov
                andy_panov
                2012-11-07

                For a PRNG there has to be a global seed state with Mutex protected access. If every class has its isolated context, all threads would generate the same PRNG sequence. The class constructor has to call the global PRNG for the first seed. Afterwards it can run without global variables.

                Sounds like a good idea, Frank! Essentially it coincides with the approach proposed here:
                http://blogs.msdn.com/b/pfxteam/archive/2009/02/19/9434171.aspx

                It will definitely help if RNGs (or PRNGs if you wish) are created in thread context. I can provide a patch if maintainers have no objections.

                 
                • Bogdan Cristea
                  Bogdan Cristea
                  2012-11-08

                  Contributions are welcome. Please don't forget to provide some unit tests and use the new unit test framework, based on gtest.

                   
                  • andy_panov
                    andy_panov
                    2012-11-08

                    OK. Thank you, Bogdan. I need several days to write the patch and get familiar with gtest. I'll open a feature request once I am done with it.

                     
                    • Frank
                      Frank
                      2012-11-09

                      I think it would be a good idea to use a #define switch to decide whether a multithreaded version is built. With this approach you introduce locking mechanisms and a new dependency on libpthread. Usually, libraries come in two versions, single-threaded and multithreaded. I suppose it degrades performance for heavy use of the PRNG in single-threaded applications.

                       
                      • andy_panov
                        andy_panov
                        2012-11-09

                        For now I am trying to rely on omp pragmas only to do multithreading. I do not even want to include the omp headers or use the omp runtime. So it should not be necessary to define anything. The single-threaded version will work smoothly without additional defines, because the compiler just ignores omp pragmas in a single-threaded environment. Some code will definitely be reorganized, but it should not significantly affect single-threaded performance.

                         
                        • Frank
                          Frank
                          2012-11-09

                          OK, if the mutex access can be realized with pragmas only. I had thought that you have to call special functions for that, but I'm not an experienced multithreaded programmer.

                          I wonder why there are so many "static" variables and functions within the IT++ random generators. Why not rely on a single static state and initialize all new random classes from this random source? If the call to this "mother" random source is mutex-protected, you already have thread safety for the random generators. Alternatively, you can always provide manual seed initialization for full user control over the source.
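A rough sketch of the "mother" source idea, using only an OpenMP pragma for the locking. `SeedSource` and `LocalGenerator` are hypothetical names and `std::mt19937` is a stand-in generator; none of this is IT++ code or the patch discussed in this thread.

```cpp
#include <cstdint>
#include <random>

// Hypothetical "mother" source: the only shared state, used just for seeding.
class SeedSource {
public:
    explicit SeedSource(std::uint32_t seed) : gen_(seed) {}

    std::uint32_t next_seed() {
        std::uint32_t s;
        // Only one thread at a time may advance the shared generator.
        #pragma omp critical (mother_seed)
        {
            s = static_cast<std::uint32_t>(gen_());
        }
        return s;
    }

private:
    std::mt19937 gen_;
};

// Hypothetical per-thread generator: asks the mother source for a seed once,
// then runs on its own state without any locking.
class LocalGenerator {
public:
    explicit LocalGenerator(SeedSource& source) : gen_(source.next_seed()) {}

    double uniform() { return dist_(gen_); }

private:
    std::mt19937 gen_;
    std::uniform_real_distribution<double> dist_{0.0, 1.0};
};
```

Each thread would construct its own `LocalGenerator` from one shared `SeedSource`. The critical section is entered only at construction, so the cost of locking does not grow with the number of random draws. One caveat: the order in which threads obtain their seeds is not fixed, so results may differ from run to run unless seeds are assigned manually.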

                           
                          • andy_panov
                            andy_panov
                            2012-11-10

                            Static variables are used to store the global context of the DSFMT algorithm. It was just implemented this way. My implementation does not rely on them and uses a per-thread context for the generation of pseudo-random numbers.

                             
      • Frank
        Frank
        2012-11-07

        True. Either put these inside the loop or mark them with the OpenMP private/firstprivate clauses. private does not initialize the copies from the original object, which is possibly a problem with classes.
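To illustrate the difference for class objects, here is a small sketch with a hypothetical `Scaler` class (not from IT++). As far as I understand the OpenMP rules, `firstprivate` gives each thread a copy constructed from the original object, while `private` would default-construct the copies and therefore needs a default constructor.

```cpp
#include <vector>

// Hypothetical helper object that carries configuration (the gain).
struct Scaler {
    double gain;
    explicit Scaler(double g) : gain(g) {}
    double apply(double x) const { return gain * x; }
};

std::vector<double> scale_all(const std::vector<double>& in, double gain) {
    const int n = static_cast<int>(in.size());
    std::vector<double> out(in.size());
    Scaler scaler(gain);  // configured once, before the parallel loop

    // firstprivate: every thread works on its own copy of scaler, and the
    // copy keeps the configured gain. private(scaler) would not compile
    // here, because Scaler has no default constructor.
    #pragma omp parallel for firstprivate(scaler)
    for (int i = 0; i < n; ++i) {
        out[i] = scaler.apply(in[i]);
    }
    return out;
}
```

For objects that must start from a clean state in every iteration, declaring them inside the loop body is the simpler option.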

         
  • Frank
    Frank
    2012-11-06

    Did you enable OpenMP with the GCC flag "-fopenmp"? Your code should be OK. The simulations for each EbN0 are independent, so you don't need further parallel algorithms in IT++ at all. OpenMP takes care of scheduling the for-loop across multiple threads. I'm not sure whether it is allowed for the vector bit_error_rate(i) to be filled by different threads. At least there is no race condition. If the vector grew within the loop, that might be a thread-safety issue.

     
    • motffh
      motffh
      2012-11-08

      I enabled OpenMP with the GCC flag. I am using OpenMP in my simulations, but without obtaining a significant gain. Thank you very much.

       
  • Frank
    Frank
    2012-11-08

    Maybe your vector is too short, only 1e6 bits. For short vectors the threading overhead is more significant than the speedup of the algorithm. The private clause means each thread gets its own copy, so you multiply the number of variables in memory.
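One way to limit the overhead for short inputs is the `if` clause of OpenMP, which runs the loop serially when its condition is false. The sketch below is only an illustration: `sum_of_squares` is a hypothetical function and the threshold value is an assumption that would have to be tuned by measurement.

```cpp
#include <vector>

double sum_of_squares(const std::vector<double>& x) {
    const long n = static_cast<long>(x.size());
    const long kMinParallelSize = 100000;  // hypothetical threshold, tune by timing
    double sum = 0.0;

    // The loop is split across threads only when the input is long enough;
    // the reduction clause gives each thread a private partial sum.
    #pragma omp parallel for reduction(+:sum) if(n > kMinParallelSize)
    for (long i = 0; i < n; ++i) {
        sum += x[i] * x[i];
    }
    return sum;
}
```

Timing the serial and the parallel build on the real workload is the only reliable way to choose such a threshold.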