Menu

#931 loop is repeated without waiting for clock posedge

devel
closed-fixed
None
7
2013-09-16
2013-05-25
Iztok Jeras
No

This is probably a scheduler issue, maybe related to the fork/join statement.

To repeat, please do:
$ git clone https://github.com/jeras/butterflylogic.git
$ git checkout 5390c5b00ea4e93bb0821231adb4f10f98f3336f
$ cd butterflylogic/sim
$ make -f Makefile.iverilog WAVE=1 rle

For a correct simulation I used Modelsim Altera edition:
$ make -f Makefile.modelsim WAVE=1 rle

To see the difference, observe where the tb_rle.error counter is incremented. The issue is the loop at line 81 in tbn/tb_rle.sv should count only after clock posedge, instead 'sto_cnt' goes from 3 to 4 without a clock posedge.

I do not have the time to create a more compact example now. I will use ModelSim for some time then get back to this bug.

Regards,
Iztok Jeras

Discussion

  • Martin Whitaker

    Martin Whitaker - 2013-05-26

    I've taken a quick look at this, and it does appear to be a problem with fork/join. Attached is a simple test case that exposes the bug. The fault occurs when the first thread of the fork terminates whilst the second thread is inside a task. The bug is only in development; V0.9 executes this test case correctly.

     
  • Martin Whitaker

    Martin Whitaker - 2013-05-26
    • assigned_to: Martin Whitaker
     
  • Martin Whitaker

    Martin Whitaker - 2013-05-27

    I've pushed a fix for this to github.

     
  • Martin Whitaker

    Martin Whitaker - 2013-05-27
    • status: open --> closed-fixed
     
  • Martin Whitaker

    Martin Whitaker - 2013-05-27
    • status: closed-fixed --> open
     
  • Martin Whitaker

    Martin Whitaker - 2013-05-27

    Belay that - looks like I've only fixed part of the problem!

     
  • Martin Whitaker

    Martin Whitaker - 2013-05-27
    • status: open --> closed-fixed
     
  • Martin Whitaker

    Martin Whitaker - 2013-05-27

    Should be fully fixed now.

     
  • Iztok Jeras

    Iztok Jeras - 2013-09-15

    I still have issues with similar code, maybe the issue was fixed but there is a regression. This time I created shorter example files. Basically there is a stream source and a stream drain with valid/ready handshaking signals. Inside a fork 4 source and 4 drain cycles are requested. Waveforms should be observed to see the issue.

    There are 2 issues visible:
    1. at the end of the last transfer the 'drn' module should put 'tready' into an inactive state
    2. received data should properly show inside the 'dat' signal

    To run the example in Icarus do:

    iverilog -g2012 iverilog_test.sv str.sv
    vvp a.out
    gtkwave waves.dump
    

    To run the example using ModelSim do:

    vlib work
    vlog iverilog_test.sv str.sv
    vsim -c -do 'run -all; quit' iverilog_test
    gtkwave waves.dump
    

    Regards,
    Iztok Jeras

     
    • Cary R.

      Cary R. - 2013-09-16

      This latest code has a race condition related to tvalid. After adding a simple watchdog and using the -pfileline=1 flag to observe how the procedural code is executed. The following is how this is executing with Icarus:

      tvalid = 1'b0 in str_src.trn
      while (~tvalid) @ (posedge clk) in str_drn.trn
      tvalid = 1'b1 in str_src.trn

      Adding some delay (e.g. @ (negedge clk) ) before the tvalid = 1'b0 assignment avoids the race by causing the statement in str_drn.trn to run before tvalid is set to zero, but this change may not work as you intended.

       
  • Iztok Jeras

    Iztok Jeras - 2013-09-16

    Thanks Cary,

    I think the proper approach would be to use the blocking operator at appropriate places. I tried a few approaches but did not find one working correctly (will try again tomorrow). The problem is I often do not know which operator should be used in sequential bench code (especially handshaking code). And this is still true after quiet a few years of coding in Verilog. Do you know any good article explaining it, so I could avoid reporting bogus bugs? Or maybe I will just try to read the details in the standard.

    Regards,
    Iztok Jeras

     
    • Cary R.

      Cary R. - 2013-09-16

      Yes a blocking delay instead of the @(negedge clk) would work just fine. If Icarus supported "step" that should even be enough. The standard describes the event loop in detail and if you understand how it works then spotting races is easier. The Icarus -pfileline=1 flag helps in pinpointing the exact loop, but you need to narrow the problem down since it is very verbose. Adding a $display at the appropriate place to see which code is running correctly and which is not helps to narrow the problem down (e.g. I very quickly determined that the source code was working correctly and that the drain code never ran correctly/was locking up). A zero time glitch does not always indicate a race, but if you are going to use them you had better understand exactly how the simulator is supposed to work.

      Note that the Icarus event loop currently works as described in 1364-2005 not 1800-2012.

      My knowledge of how to deal with/detect race conditions comes from years of Verilog experience and now years of simulator coding. I've never read a paper on race conditions so I cannot make a recommendation.

      I will also add that in code that has any kind of handshake I always add a watchdog to the test suite that detects if the simulation takes too long. It reports that the simulation triggered the watchdog and then terminates the simulation (e.g. initial begin #100_000 $display("Watchdog triggered"); $finish; end). You could also use a fixed number of clock cycles instead of a fixed delay assuming you are not using a gated clock. This all depends on what you are trying to do.

       

Log in to post a comment.