
Does FRUIT work with OpenMP, and is it thread safe?

fan xin
2015-11-16
2015-11-24
  • fan xin

    fan xin - 2015-11-16

    I'm new to using FRUIT for my testing work. I'd like to confirm with you: is FRUIT thread safe, and does it work with OpenMP? Thanks.

     
  • istomoya

    istomoya - 2015-11-16

    I've never checked whether FRUIT works with OpenMP.
    However, FRUIT is probably not thread safe, because
    * the SAVE attribute is used in fruit.f90, so its module-level state (counters, messages) is a single copy shared by all threads
    * a fixed I/O unit number is used
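To illustrate the SAVE point, here is a minimal sketch (hypothetical module and names, not FRUIT code). A module variable such as a success counter exists once per program, so every OpenMP thread updates the same copy, and an unprotected `count = count + 1` is a read-modify-write race that can lose updates. The sketch assumes a compiler with OpenMP enabled (e.g. `gfortran -fopenmp`); without it the directives are ignored and the loop runs serially.

```fortran
! Sketch only: hypothetical names, not FRUIT code.
module counter_mod
  implicit none
  private
  public :: count_successes

  ! A module variable keeps its value between calls (SAVE semantics), so
  ! there is exactly one copy, shared by every OpenMP thread.
  integer, save :: success_count = 0

contains

  ! Records one "success" per loop iteration and returns the final count.
  integer function count_successes(n) result(total)
    integer, intent(in) :: n
    integer :: i

    success_count = 0
    !$omp parallel do
    do i = 1, n
      ! A plain "success_count = success_count + 1" here would be a race:
      ! two threads can read the same old value, and one update is lost.
      !$omp atomic
      success_count = success_count + 1
    end do
    !$omp end parallel do
    total = success_count
  end function count_successes

end module counter_mod

program demo_shared_counter
  use counter_mod, only: count_successes
  implicit none
  ! With the atomic update, the count equals n whether or not OpenMP is on.
  print '(i0)', count_successes(100000)
end program demo_shared_counter
```

An atomic update is enough for a single counter; state that spans several statements (such as a stack index plus the message stored at it) needs a critical section or a lock instead.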

     
    • fan xin

      fan xin - 2015-11-17

      Hi istomoya,
      Thanks for your kind help, although I'm disappointed to hear that answer.

       
  • istomoya

    istomoya - 2015-11-17

    I tried a tester that uses OpenMP.
    The tester code is:

    module subs_test
    !$ use omp_lib
      use fruit
      implicit none
    contains
      subroutine    test_just_sum
        use subs, only : just_sum
    
        integer :: result
        character(50) :: str
    
    ! result and str must be private: each thread needs its own copy
    !$omp parallel private(result, str)
        write(str, '("thread", i3, " / ", i3)') omp_get_thread_num() + 1, omp_get_num_threads()
        result = just_sum(2, 3)
        call assert_equals(6, result, " it should fail, " // trim(str))
    
        result = just_sum(4, 5)
        call assert_equals(9, result, " it should success, " // trim(str))
    !$omp end parallel
      end subroutine test_just_sum
    end module subs_test
    

    The code under test is:

    module subs
      implicit none
    contains
      integer function just_sum(a, b)
        integer, intent(in) :: a, b
    
        just_sum = a + b
      end function just_sum
    end module subs
    

    Each thread is supposed to report one failure and one success.

    With two threads, the result was OK (log below):

     Test module initialized
    
        . : successful assert,   F : failed assert 
    
       ..running test: test_just_sum
    F.F.
       Un-satisfied spec:
       -- just sum
    
         Start of FRUIT summary: 
    
     Some tests failed!
    
       -- Failed assertion messages:
       [test_just_sum]:Expected [6], Got [5]; User message: [ it should fail, thread  1 /   2]
       [test_just_sum]:Expected [6], Got [5]; User message: [ it should fail, thread  2 /   2]
       -- end of failed assertion messages.
    
     Total asserts :              4
     Successful    :              2
     Failed        :              2
    Successful rate:    50.00%
    
     Successful asserts / total asserts : [            2 /           4  ]
     Successful cases   / total cases   : [            0 /           1  ]
       -- end of FRUIT summary
    

    With 16 threads, the messages were corrupted (see the entries for threads 1 and 12):

     Test module initialized
    
        . : successful assert,   F : failed assert 
    
       ..running test: test_just_sum
    F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.
       Un-satisfied spec:
       -- just sum
    
         Start of FRUIT summary: 
    
     Some tests failed!
    
       -- Failed assertion messages:
       [test_just_sum]:Expected [6], Got [5]; User message: [ it should fail, thread  2 /  16]
       [test_just_sum]:Expected [6], Got [5]; User message: [ it should fail, thread  3 /  16]
       [test_just_sum]:Expected [6], Got [5]; User message: [ it should fail, thread  4 /  16]
       [test_just_sum]:Expected [6], Got [5]; User message: [ it should fail, thread  5 /  16]
       [test_just_sum]:Expected; User message: [ it should fail, thread  1 /  16]
       [test_just_sum]:Expected [6], Got [5]; User message: [ it should fail, thread  6 /  16]
       [test_just_sum]:Expected [6], Got [5]; User message: [ it should fail, thread  7 /  16]
       [test_just_sum]:Expected [6], Got [5]; User message: [ it should fail, thread 13 /  16]
       [test_just_sum]:Expected [6], Got [5]; User message: [ it should fail, thread 11 /  16]
       [test_just_sum]:Expected [6], Got [5]; User message: [ it should fail, thread  8 /  16]
       [test_just_sum]:Expected [6], Got [5]; User message: [ it should fail, thread 10 /  16]
       [test_just_sum]:Expected [6], Got [5]; User message: [ it should fail, thread 15 /  16]
       [test_just_sum]:Expected [6], Got [5] [5]; User message: [ it should fail, thread 12 /  16]
       [test_just_sum]:Expected [6], Got [5]; User message: [ it should fail, thread  9 /  16]
       [test_just_sum]:Expected [6], Got [5]; User message: [ it should fail, thread 14 /  16]
       [test_just_sum]:Expected [6], Got [5]; User message: [ it should fail, thread 16 /  16]
       -- end of failed assertion messages.
    
     Total asserts :             32
     Successful    :             16
     Failed        :             16
    Successful rate:    50.00%
    
     Successful asserts / total asserts : [           16 /          32  ]
     Successful cases   / total cases   : [            0 /           1  ]
       -- end of FRUIT summary
    

    So fruit.f90 needs further modification.

     
    • fan xin

      fan xin - 2015-11-18

      Hi istomoya,
      Thanks for providing test code with OpenMP. I do think the corrupted messages are caused by a race condition. I have also parallelized my testing code with OpenMP directives, and it works fine. I only parallelize the FRUIT asserts, not the whole framework, so I think multithreading doesn't affect FRUIT's asserts. What do you think?

      module base
      integer :: baseint
      end module

      module a
      use base, ONLY: aint => baseint
      end module

      module b
      use base, ONLY: bint => baseint
      end module

      module c
      use a
      use b
      private
      public :: aint, bint
      end module
      module launch_test
      use fruit
      implicit none
      contains
      subroutine launchHello
      use c, only : aint, bint
      aint = 3
      bint = 5
      write (*, *) " ..running test: launchHello"
      call assert_equals(aint, bint)
      call assert_equals(aint+1, bint)
      end subroutine
      end module

      module fruit_gen
      use omp_lib
      use launch_test
      contains
      subroutine hxlaunch_mimic
      !$OMP PARALLEL DO
      do i=1, 10
      write (*,*) "thread id=",OMP_GET_THREAD_NUM()
      call launchHello
      end do
      !$OMP END PARALLEL DO
      end subroutine

      subroutine test_all_tests
          call set_unit_name('launch_test')
          call run_test_case(hxlaunch_mimic, "launchHello")
      end subroutine
      

      end module

      =======================================
      Test module initialized

      . : successful assert,   F : failed assert
      

      thread id= 4
      ..running test: launchHello
      thread id= 1
      ..running test: launchHello
      . thread id= 2
      ..running test: launchHello
      . thread id= 0
      ..running test: launchHello
      .. thread id= 7
      ..running test: launchHello
      . thread id= 6
      ..running test: launchHello
      . thread id= 5
      ..running test: launchHello
      . thread id= 3
      ..running test: launchHello
      .FFF thread id= 0
      ..running test: launchHello
      .FF thread id= 1
      ..running test: launchHello
      .FFFFF

       Start of FRUIT summary:
      

      Some tests failed!

      -- Failed assertion messages:
      [launchHello]:Expected [6], Got [5]
      [launchHello]:Expected [6], Got [5]
      [launchHello]:Expected [6], Got [5]
      [launchHello]:Expected [6], Got [5]
      [launchHello]:Expected [6], Got [5]
      [launchHello]:Expected [6], Got [5]
      [launchHello]:Expected [6], Got [5]
      [launchHello]:Expected [6], Got [5]
      [launchHello]:Expected [6], Got [5]
      [launchHello]:Expected [6], Got [5]
      -- end of failed assertion messages.

      Total asserts : 20
      Successful : 10
      Failed : 10
      Successful rate: 50.00%

      Successful asserts / total asserts : [ 10 / 20 ]
      Successful cases / total cases : [ 0 / 1 ]
      -- end of FRUIT summary

       
  • fan xin

    fan xin - 2015-11-18

    Hi istomoya,
    I think you are right. When I increase the number of threads to 100 or more, the app sometimes crashes:
    ..running test: launchHello
    .F thread id= 20
    ..running test: launchHello
    .F. thread id= 43
    ..running test: launchHello
    .FF. ..running test: launchHello
    . thread id= 33

    Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

    Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

    Backtrace for this error:

    Backtrace for this error:

    I think an easy fix would be to add a mutex lock around the subroutines failed_assert_action and add_success. But I'm not a Fortran expert. If possible, could you please help me with this? Thanks a lot.
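The mutex idea can be sketched with the OpenMP lock routines from `omp_lib` (`omp_init_lock`, `omp_set_lock`, `omp_unset_lock`, `omp_destroy_lock`). This is only an illustration under assumptions: `record_success` and `record_failure` are hypothetical stand-ins for add_success and failed_assert_action, not the actual FRUIT routines. The `!$ ` lines are compiled only when OpenMP is enabled, so the module still builds serially.

```fortran
! Sketch only: record_success / record_failure are hypothetical stand-ins
! for FRUIT's add_success / failed_assert_action, not the real FRUIT code.
module locked_counters
  !$ use omp_lib
  implicit none
  private
  public :: init_counters, free_counters, record_success, record_failure
  public :: success_count, failure_count

  integer, save :: n_success = 0, n_failure = 0
  ! The lock exists only when compiled with OpenMP (the "!$ " sentinel).
  !$ integer(kind=omp_lock_kind), save :: counter_lock

contains

  subroutine init_counters()
    n_success = 0
    n_failure = 0
    !$ call omp_init_lock(counter_lock)
  end subroutine init_counters

  subroutine free_counters()
    !$ call omp_destroy_lock(counter_lock)
  end subroutine free_counters

  subroutine record_success()
    !$ call omp_set_lock(counter_lock)    ! wait until this thread owns the lock
    n_success = n_success + 1
    !$ call omp_unset_lock(counter_lock)  ! let the next thread in
  end subroutine record_success

  subroutine record_failure()
    !$ call omp_set_lock(counter_lock)
    n_failure = n_failure + 1
    !$ call omp_unset_lock(counter_lock)
  end subroutine record_failure

  integer function success_count()
    success_count = n_success
  end function success_count

  integer function failure_count()
    failure_count = n_failure
  end function failure_count

end module locked_counters

program demo_lock
  use locked_counters
  implicit none
  integer :: i

  call init_counters()
  !$omp parallel do
  do i = 1, 3000
    call record_success()   ! e.g. a passing assert_equals(aint, bint)
    call record_failure()   ! e.g. a failing assert_equals(aint+1, bint)
  end do
  !$omp end parallel do
  print '(a, i0, a, i0)', "successes: ", success_count(), "  failures: ", failure_count()
  call free_counters()
end program demo_lock
```

One lock around both counters keeps the sketch simple; in real code every routine that touches the shared state, including the one that prints the summary, has to take the same lock.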

     
  • fan xin

    fan xin - 2015-11-18

    Hi istomoya,
    I'm confused by the test results. When I create 300 threads that each call a subroutine containing FRUIT asserts, it fails in roughly 1 out of 20 runs. Most of the time it passes.

     
  • istomoya

    istomoya - 2015-11-18

    I tried adding several pairs of "critical" directives to fruit.f90.
    Could you try the fruit.f90 I uploaded to CVS?
    http://fortranxunit.cvs.sourceforge.net/viewvc/fortranxunit/fruit/src/fruit.f90?view=log
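For reference, a named critical section behaves much like a mutex: all critical regions with the same name exclude each other, while differently named ones do not. The sketch below (a hypothetical message stack with an assumed capacity, not the actual fruit.f90 code) keeps "bump the index" and "store the message" inside one critical region. If those two steps were protected separately, two threads could end up writing the same slot, or one could read a half-written entry, which is one plausible way to get corrupted summary lines like those in the 16-thread log.

```fortran
! Sketch only: a hypothetical message stack; names and capacity are assumptions.
module msg_stack_mod
  implicit none
  private
  public :: reset_messages, push_message, message_count, message_at

  integer, parameter :: max_msgs = 2000          ! assumed capacity
  character(len=80), save :: msgs(max_msgs) = ""
  integer, save :: n_msgs = 0

contains

  subroutine reset_messages()
    !$omp critical (msg_stack)
    n_msgs = 0
    !$omp end critical (msg_stack)
  end subroutine reset_messages

  ! Appends text; returns .false. if the stack is already full.
  logical function push_message(text) result(ok)
    character(len=*), intent(in) :: text
    ! Index update and store happen as one unit, so each thread gets its
    ! own slot and no reader sees a slot that is claimed but not yet filled.
    !$omp critical (msg_stack)
    ok = n_msgs < max_msgs
    if (ok) then
      n_msgs = n_msgs + 1
      msgs(n_msgs) = text
    end if
    !$omp end critical (msg_stack)
  end function push_message

  integer function message_count()
    !$omp critical (msg_stack)
    message_count = n_msgs
    !$omp end critical (msg_stack)
  end function message_count

  function message_at(i) result(text)
    integer, intent(in) :: i
    character(len=80) :: text
    !$omp critical (msg_stack)
    text = msgs(i)
    !$omp end critical (msg_stack)
  end function message_at

end module msg_stack_mod

program demo_msg_stack
  use msg_stack_mod
  implicit none
  integer :: i
  character(len=80) :: line
  logical :: pushed

  call reset_messages()
  !$omp parallel do private(line, pushed)
  do i = 1, 100
    write (line, '("Expected [6], Got [5]; item ", i0)') i
    pushed = push_message(line)
  end do
  !$omp end parallel do
  print '(a, i0)', "messages stored: ", message_count()
end program demo_msg_stack
```

The readers (message_count, message_at) use the same critical name, so they never see the stack mid-update. Entering a critical region while already inside one with the same name deadlocks, which is why push_message does not call message_count internally.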

     
    • fan xin

      fan xin - 2015-11-19

      Hi istomoya,
      Thanks for your effort on updating the framework. I tried the latest version and ran the same app 100 times with 500 threads, with a 5-second sleep in each run. Unfortunately, it failed once.
      With 100 threads, all tests passed.

      33227 Program received signal SIGABRT: Process abort signal.
      33228
      33229 Backtrace for this error:
      33230 ...........
      33231 .........F.
      33232 Program received signal SIGABRT: Process abort signal.
      33233
      33234 Backtrace for this error:
      33235 FF.F.F.F.FFF.....F.FFFFFF.FF..FFFF.#0 0x7F886B344417
      33236 ..#1 0x7F886B344A2E
      33237 F#2 0x7F886A41F49F
      33238 #3 0x7F886A41F425
      33239 F.#4 0x7F886A422B8A
      33240 F#5 0x7F886A45D39D
      33241 F#6 F0x7F886A467B95
      33242 ....#7 0x40CE7D in fruit_MOD_increase_message_stack_ at fruit.f90:0
      33243 .#8 0x40BECC in fruit_MOD_failed_assert_action
      33244 .F#9 0x408FE4 in fruit_MOD_assert_eq_int_
      33245 F.#10 0x40FFDE in launch_test_MOD_launchhello
      33246 F.F#11 0x410146 in __fruit_gen_MOD_hxlaunch_mimic._omp_fn.0 at access_spec.f90:0
      33247 #12 0x7F886AE26865
      33248 FFF#13 0x7F886A7B0E99
      33249 F#14 0x7F886A4DD3FC
      33250 F..F.F..
      33251 F..
      33252 Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
      33253
      33254 Backtrace for this error:
      33255 ...FFFFFF.F....
      33256 Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

       
    • fan xin

      fan xin - 2015-11-19

      Hi istomoya,
      I have another way to fix the thread synchronization issue.
      I replaced !$omp critical (xxxx) with !$omp single.
      Then I created 3000 threads, each running the asserts 20 times, and the test app didn't crash this time. Thanks for your great effort on this issue.

       
  • istomoya

    istomoya - 2015-11-19

    Thank you for using FRUIT.
    I'm afraid that with !$omp single, the number of assertions (both failures and successes) in the report is reduced.
    Only assertions run by one thread are counted; assertions called from other threads are not counted.
    In my case, only one error was reported while 100 errors should have been reported.

    Well, I'll keep trying to improve the CVS version.
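The difference between the two directives can be shown with a small count (hypothetical module, assuming OpenMP is enabled). Every thread in the team passes through a critical region, one at a time, but a single region is executed by exactly one thread while the others skip it, so a counter updated inside it is incremented once per region rather than once per thread.

```fortran
! Sketch only: hypothetical module comparing "critical" and "single".
module tally_mod
  implicit none
  private
  public :: tally
contains

  ! Returns [team size, count via critical, count via single] for one
  ! parallel region that requests nthreads threads.
  function tally(nthreads) result(counts)
    !$ use omp_lib
    integer, intent(in) :: nthreads
    integer :: counts(3)

    counts = [1, 0, 0]
    !$omp parallel num_threads(nthreads)
    ! Every thread enters the critical region, one at a time.
    !$omp critical (tally_lock)
    counts(2) = counts(2) + 1
    !$omp end critical (tally_lock)

    ! Exactly one thread executes the single block; the others skip it
    ! and wait at the implied barrier at "end single".
    !$omp single
    counts(3) = counts(3) + 1
    !$ counts(1) = omp_get_num_threads()
    !$omp end single
    !$omp end parallel
  end function tally

end module tally_mod

program demo_single_vs_critical
  use tally_mod, only: tally
  implicit none
  integer :: counts(3)

  counts = tally(8)
  print '(a, i0)', "threads in team:       ", counts(1)
  print '(a, i0)', "counted with critical: ", counts(2)
  print '(a, i0)', "counted with single:   ", counts(3)
end program demo_single_vs_critical
```

With OpenMP enabled, the critical count should match the team size while the single count stays at 1, which is the same kind of under-counting described above.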

     
    • fan xin

      fan xin - 2015-11-20

      Hi istomoya,
      You are right; I can reproduce this issue on my side. Do you mean there are still bugs in the framework? Is there any chance of fixing them?

       
  • istomoya

    istomoya - 2015-11-20

    I found a bug in fruit.f90; the fixed version is available at
    http://fortranxunit.cvs.sourceforge.net/viewvc/fortranxunit/fruit/src/fruit.f90?view=log
    as Revision 1.60.

     
    • fan xin

      fan xin - 2015-11-23

      Hi istomoya,
      I tried the latest version (3.4.0), and it seems to work fine with 200 threads. I ran the test app several times and it didn't crash. But when I increase the number of threads to 300, the output gets corrupted.
      There may be a minor bug in the message output, as shown below:
      ...FF thread id= 2955
      ..running test: anotherHello2
      .F thread id= 2248
      ..running test: anotherHello1
      ...F
      .F thread id= 2016
      ..running test: anotherHello
      ...F..F thread id= 2268
      ..running test: anotherHello
      ..F...FFF thread id= 1533
      ..running test: anotherHello4
      .FF thread id= 1815
      ..running test: anotherHello2
      .F thread id= 1692
      ..running test: anotherHello
      .FFF thread id= 805
      ..running test: anotherHello4
      F0n�pc�te

       

      Last edit: fan xin 2015-11-23
    • fan xin

      fan xin - 2015-11-23

      Hi istomoya,
      There may still be bugs in the final summary. Please consider the test below:
      module launch_test
      use fruit
      implicit none
      contains
      subroutine launchHello
      use c, only : aint, bint
      aint = 3
      bint = 5
      write (*, *) " ..running test: launchHello"
      call assert_equals(aint, bint)
      call assert_equals(aint+1, bint)
      end subroutine
      end module

      module fruit_gen
      use omp_lib
      use launch_test
      contains
      subroutine hxlaunch_mimic
      integer :: tid
      integer :: N=3000
      call OMP_SET_NUM_THREADS(N)
      !$OMP PARALLEL private(tid)
      tid = OMP_GET_THREAD_NUM()
      if (tid .eq. 0) then
      print *, "number of threads=", OMP_GET_NUM_THREADS()
      end if
      write (*,*) "thread id=",tid
      call launchHello

      !$OMP END PARALLEL
      end subroutine
      end module

      I launched 3000 threads, each asserting twice, so I think the total number of asserts should be 6000. But it reports:
      Total asserts : 5000
      Successful : 3000
      Failed : 2000
      Successful rate: 60.00%

      Successful asserts / total asserts : [ 3000 / 5000 ]
      Successful cases / total cases : [ 0 / 0 ]
      -- end of FRUIT summary
      ..STOP 1

       
  • istomoya

    istomoya - 2015-11-23

    The number of failures is limited to 2000 (given as MAX_MSG_STACK_SIZE) in fruit.f90.
    The whole test was simply stopped (without regard to OpenMP) when this limit was reached, so the report shows 3000 successes + 2000 failures = 5000 asserts instead of 6000.

     
    • fan xin

      fan xin - 2015-11-24

      Hi istomoya,
      That makes sense to me. So I think the framework works fine with my tests. Thanks for your great effort!

       
