Thread: [Iverilog-devel] VVP Performance boots

From: Stephen W. <st...@ic...> - 2008-04-21 14:39:25

I've pushed changes into git that rework how the vvp_vector4_t
class stores the bits of bit4 values. This change is based on ideas
from Cary. The result is a *significant* boost in performance for
simulations heavy with arithmetic: the multiple_large test is
something like 20% faster. So we may want to be on the lookout for
similar opportunities. We are still in need of run-time performance
improvements.
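The message doesn't spell out the new storage layout. Later in this thread Cary mentions separate atmp/btmp masking in vvp_vector4_t::subarray() and notes that the VPI and internal bit patterns are now the same, which suggests each bit is kept as an (a,b) pair in the style of VPI aval/bval, with all a-bits in one word array and all b-bits in another. The C++ sketch below assumes exactly that; Vec4 and Bit4 are hypothetical names, not the vvp_vector4_t API. It illustrates one plausible reason arithmetic gets faster: deciding whether a vector holds only 0/1 bits costs one OR per 32 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 4-state bit: low bit is the a-bit, high bit the b-bit.
// Assumed (a,b) encoding, VPI aval/bval style: 0=(0,0) 1=(1,0) Z=(0,1) X=(1,1).
enum Bit4 : uint8_t { B0 = 0, B1 = 1, BZ = 2, BX = 3 };

// Toy vector keeping all a-bits in one word array and all b-bits in
// another, instead of storing each bit's pair side by side.
struct Vec4 {
    explicit Vec4(unsigned w)
        : width(w), a((w + 31) / 32, 0), b((w + 31) / 32, 0) {}

    void set(unsigned idx, Bit4 v) {
        const uint32_t m = uint32_t(1) << (idx % 32);
        const unsigned w = idx / 32;
        a[w] = (v & 1) ? (a[w] | m) : (a[w] & ~m);
        b[w] = (v & 2) ? (b[w] | m) : (b[w] & ~m);
    }

    Bit4 get(unsigned idx) const {
        const unsigned w = idx / 32, s = idx % 32;
        return Bit4(((a[w] >> s) & 1u) | (((b[w] >> s) & 1u) << 1));
    }

    // Only 0/1 bits present iff every b-word is zero: one OR per word.
    bool is_binary() const {
        uint32_t any = 0;
        for (uint32_t w : b) any |= w;
        return any == 0;
    }

    unsigned width;
    std::vector<uint32_t> a, b;
};
```

When is_binary() holds, the a-words are plain binary integers and can be handed straight to word-wide arithmetic.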
--
Steve Williams                "The woods are lovely, dark and deep.
steve at icarus.com           But I have promises to keep,
http://www.icarus.com         and lines to code before I sleep,
http://www.picturel.com       And lines to code before I sleep."

Re: [Iverilog-devel] VVP Performance boots

From: Cary R. <cy...@ya...> - 2008-04-21 17:16:35

--- On Mon, 4/21/08, Stephen Williams <st...@ic...> wrote:

> I've pushed changes into git that rework how the vvp_vector4_t
> class stores the bits of bit4 values. This change is based on ideas
> from Cary. The result is a *significant* boost in performance for
> simulations heavy with arithmetic: the multiple_large test is
> something like 20% faster. So we may want to be on the lookout for
> similar opportunities. We are still in need of run-time performance
> improvements.

I just ran the test suite on my RHEL 4 32 bit machine, and this patch causes an infinite loop in vvp for xnor_test and gives incorrect results for writememh2, writememb1, writemem2 and scanmem3 (from the VPI tests).

It appears that the writemem* work files are being created correctly. The scanmem3 file looks to be z vs x differences.

Once the problems are fixed I will also run valgrind to look for any other problems.

Once this is stable, I think there are some more optimizations that can be done for the various logic gates/functions. I can forward you the Verilog file I used to develop the gate optimization if you would like.

Cary


Re: [Iverilog-devel] VVP Performance boots

From: Stephen W. <st...@ic...> - 2008-04-21 17:36:08

Cary R. wrote:
| --- On Mon, 4/21/08, Stephen Williams <st...@ic...> wrote:
|
|> I've pushed changes into git that rework how the vvp_vector4_t
|> class stores the bits of bit4 values. This change is based on ideas
|> from Cary. The result is a *significant* boost in performance for
|> simulations heavy with arithmetic: the multiple_large test is
|> something like 20% faster. So we may want to be on the lookout for
|> similar opportunities. We are still in need of run-time performance
|> improvements.
|
| I just ran the test suite on my RHEL 4 32 bit machine, and this patch
| causes an infinite loop in vvp for xnor_test and gives incorrect results
| for writememh2, writememb1, writemem2 and scanmem3 (from the VPI tests).
|
| It appears that the writemem* work files are being created correctly.
| The scanmem3 file looks to be z vs x differences.


Huh, they work for me on both PowerPC (Mac OS X) and AMD64 (Linux).


Re: [Iverilog-devel] VVP Performance boots

From: Stephen W. <st...@ic...> - 2008-04-21 18:57:47

Cary R. wrote:
| I just ran the test suite on my RHEL 4 32 bit machine, and this patch
| causes an infinite loop in vvp for xnor_test and gives incorrect results
| for writememh2, writememb1, writemem2 and scanmem3 (from the VPI tests).

WRT xnor_test: Can you instrument the of_XNOR function in vthread.cc
and try it? Something like this:

	    vvp_bit4_t lb = thr_get_bit(thr, idx1);
	    vvp_bit4_t rb = thr_get_bit(thr, idx2);
	    cerr << "XXXX XNOR: lb=" << lb << " rb=" << rb
		 << " lb^rb=" << (lb ^ rb) << " ~(lb^rb)=" << ~(lb ^ rb)
		 << endl;

... inside the "for" loop of the of_XNOR function. None of the
bits in the result should turn into X or Z. If the %xnor instruction
generates X results, then the loop in xnor_test.v may go infinite.
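For reference, the instrumentation is checking the 4-state XNOR rule: when both inputs are 0 or 1 the result must be 0 or 1, and only an X or Z input may produce X. A small sketch of that rule follows, assuming an (a,b) encoding where the low bit is the a-bit and the high bit the b-bit (0=00, 1=01, Z=10, X=11 as b,a). Bit4 and xnor() are hypothetical stand-ins; Steve's snippet only confirms that vvp_bit4_t supports ^ and ~.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 4-state bit; a set high (b) bit means X or Z.
enum Bit4 : uint8_t { B0 = 0, B1 = 1, BZ = 2, BX = 3 };

// Verilog XOR: any X or Z input gives X, otherwise ordinary XOR.
// Operands are copied to unsigned so this does not recurse into itself.
inline Bit4 operator^(Bit4 l, Bit4 r) {
    const unsigned lv = l, rv = r;
    if ((lv | rv) & 2) return BX;
    return Bit4((lv ^ rv) & 1);
}

// Verilog NOT: 0 -> 1, 1 -> 0, X and Z -> X.
inline Bit4 operator~(Bit4 v) {
    const unsigned x = v;
    if (x & 2) return BX;
    return Bit4(x ^ 1);
}

// XNOR as written in the debug print: ~(lb ^ rb).
inline Bit4 xnor(Bit4 l, Bit4 r) { return ~(l ^ r); }
```

If the debug print ever shows X for two 0/1 inputs, the operators (or the encoding they assume) are at fault rather than the loop.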

Re: [Iverilog-devel] VVP Performance boots

From: Cary R. <cy...@ya...> - 2008-04-21 20:38:48

--- On Mon, 4/21/08, Stephen Williams <st...@ic...> wrote:

> WRT xnor_test: Can you instrument the of_XNOR function in vthread.cc
> and try it? Something like this:
>
> 	    vvp_bit4_t lb = thr_get_bit(thr, idx1);
> 	    vvp_bit4_t rb = thr_get_bit(thr, idx2);
> 	    cerr << "XXXX XNOR: lb=" << lb << " rb=" << rb
> 		 << " lb^rb=" << (lb ^ rb) << " ~(lb^rb)=" << ~(lb ^ rb)
> 		 << endl;
>
> ... inside the "for" loop of the of_XNOR function. None of the
> bits in the result should turn into X or Z. If the %xnor instruction
> generates X results, then the loop in xnor_test.v may go infinite.

The part of the test that is going into an infinite loop is the "different sized operands (equality)" section.

The small1 value is initially set to zero and then increments to one, but never goes past that. It appears that the sixteen bit xnor value is calculated correctly. Both the vthread code and a straight print of large2 show all bits are set to one.

Cary



Re: [Iverilog-devel] VVP Performance boots

From: Stephen W. <st...@ic...> - 2008-04-21 21:47:49

Cary R. wrote:
| --- On Mon, 4/21/08, Stephen Williams <st...@ic...> wrote:
|
|> WRT xnor_test: Can you instrument the of_XNOR function in vthread.cc
|> and try it? Something like this:
|>
|> 	    vvp_bit4_t lb = thr_get_bit(thr, idx1);
|> 	    vvp_bit4_t rb = thr_get_bit(thr, idx2);
|> 	    cerr << "XXXX XNOR: lb=" << lb << " rb=" << rb
|> 		 << " lb^rb=" << (lb ^ rb) << " ~(lb^rb)=" << ~(lb ^ rb)
|> 		 << endl;
|>
|> ... inside the "for" loop of the of_XNOR function. None of the
|> bits in the result should turn into X or Z. If the %xnor instruction
|> generates X results, then the loop in xnor_test.v may go infinite.
|
| The part of the test that is going into an infinite loop is the
| "different sized operands (equality)" section.
|
| The small1 value is initially set to zero and then increments to one,
| but never goes past that. It appears that the sixteen bit xnor value is
| calculated correctly. Both the vthread code and a straight print of
| large2 show all bits are set to one.

I think we are looking for some sort of uninitialized memory problem.
It appears that the %xnor opcode is fine in your case, but there is
some problem managing the index variable. Try putting a test print
in the of_ADDI function to look at the first word of the lva array
that is returned by vector_to_array(), before and after the "for"
loop that does the addition. At the very least, this will confirm
that the index variable value is stuck.

Perhaps the %set/v or %load/v (if not the %addi) are not working
properly? Seeing what's going into/out of the %addi may help us pin
it down.


Re: [Iverilog-devel] VVP Performance boots

From: Cary R. <cy...@ya...> - 2008-04-22 03:22:40

--- On Mon, 4/21/08, Stephen Williams <st...@ic...> wrote:

> I think we are looking for some sort of uninitialized memory problem.

Our internet and phones are down at work, so I will try to look at your suggestions tomorrow. I believe I ran valgrind on the failing tests looking for this type of problem and found nothing!

I have an email at work waiting to be sent about the writemem tests. Here is a brief synopsis. The problems with the three writemem tests appear to be related to a different, but recent, patch; I haven't tracked down which one yet. It looks like the compiler is incorrectly evaluating the 1 << (6-code) in the comparison. The 1 is being loaded as a two bit value, not the 32 bit integer it should be. This means that when code is less than five, the value is set to zero. FYI, explicitly setting the width of the 1 (32'b1) fixes the problem.
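The arithmetic behind this symptom is easy to check. The sketch below evaluates 1 << (6-code) and truncates the result to a chosen width; it models the bug as described and is not the compiler's code. shifted_one() is a hypothetical helper, and it assumes code stays in the range 0 to 6. With a two bit 1, any shift of two or more falls off the top, so every code below five yields zero, while a 32 bit 1 keeps the value.

```cpp
#include <cassert>
#include <cstdint>

// Evaluate (1 << (6 - code)) as if the constant 1 were `width` bits wide,
// truncating the result to that width. Assumes 0 <= code <= 6.
uint32_t shifted_one(unsigned width, unsigned code) {
    const unsigned amount = 6 - code;
    const uint64_t full = uint64_t(1) << amount;  // exact value, fits easily
    const uint64_t mask =
        (width >= 64) ? ~uint64_t(0) : ((uint64_t(1) << width) - 1);
    return uint32_t(full & mask);
}
```

For example, shifted_one(2, 4) gives 0 (4 does not fit in two bits) where shifted_one(32, 4) gives 4.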

Cary 


Re: [Iverilog-devel] VVP Performance boots

From: Cary R. <cy...@ya...> - 2008-04-22 16:57:39

--- On Mon, 4/21/08, Stephen Williams <st...@ic...> wrote:

> I think we are looking for some sort of uninitialized memory problem.
> It appears that the %xnor opcode is fine in your case, but there is
> some problem managing the index variable. Try putting a test print
> in the of_ADDI function to look at the first word of the lva array
> that is returned by vector_to_array(), before and after the "for"
> loop that does the addition. At the very least, this will confirm
> that the index variable value is stuck.
>
> Perhaps the %set/v or %load/v (if not the %addi) are not working
> properly? Seeing what's going into/out of the %addi may help us pin
> it down.

This appears to be some kind of race condition! Oh fun!! I can simplify the loop down to the large1 assign and the two xnor assignments. Deleting either one of the xnor assignments will make the infinite looping go away. The %addi is always getting 0 from the thread vector, so it appears to be working correctly. It also looks like the %load/v has the correct signal value (it starts at zero and then goes to 1). The only things between the %load/v and the %addi are a couple of %mov instructions to pad the value.

If I add 100 to the thread addresses for the code that increments the loop counter, everything works correctly, so it appears that when we have two xnor statements the second one is somehow changing values after it should be and is zeroing the loop counter. I'm not certain how this is happening, since the xnor should be producing all ones as output. I can send the reduced test case and a.out file if that helps. FYI, the second xnor and the loop counter increment use the same or overlapping thread space addresses.

Cary


Re: [Iverilog-devel] VVP Performance boots

From: Stephen W. <st...@ic...> - 2008-04-22 18:47:50

Cary R. wrote:
| --- On Mon, 4/21/08, Stephen Williams <st...@ic...> wrote:
|
|> I think we are looking for some sort of uninitialized memory problem.
|> It appears that the %xnor opcode is fine in your case, but there is
|> some problem managing the index variable. Try putting a test print
|> in the of_ADDI function to look at the first word of the lva array
|> that is returned by vector_to_array(), before and after the "for"
|> loop that does the addition. At the very least, this will confirm
|> that the index variable value is stuck.
|>
|> Perhaps the %set/v or %load/v (if not the %addi) are not working
|> properly? Seeing what's going into/out of the %addi may help us pin
|> it down.
|
| This appears to be some kind of race condition! Oh fun!! I can
| simplify the loop down to the large1 assign and the two xnor
| assignments. Deleting either one of the xnor assignments will make the
| infinite looping go away. The %addi is always getting 0 from the thread
| vector, so it appears to be working correctly. It also looks like the
| %load/v has the correct signal value (it starts at zero and then goes to
| 1). The only things between the %load/v and the %addi are a couple of
| %mov instructions to pad the value.

When you say that %addi is getting a 0, do you mean lva[0]==0,
or lva==0? The latter means that there were XZ bits in the vector
so it will cancel the add. That's probably not what's happening,
but I'm just double-checking.
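The lva[0]==0 versus lva==0 distinction can be sketched as two steps: convert the thread bits to an array of words, giving up if any bit is X or Z, and then add the immediate word by word with carry. to_words() and add_immediate() are hypothetical stand-ins for vector_to_array() and the loop in of_ADDI, whose real signatures are not shown in the thread; this sketch also signals the X/Z case with a bool rather than a null pointer, and it assumes the separate a-word/b-word layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for vector_to_array(): copies the a-words into
// `out`, or returns false if any b-bit is set (X or Z present), in which
// case the caller cancels the add.
bool to_words(const std::vector<uint32_t>& a, const std::vector<uint32_t>& b,
              std::vector<uint32_t>& out) {
    for (uint32_t w : b)
        if (w != 0) return false;
    out = a;
    return true;
}

// Add an immediate to a multi-word value in place, propagating carries
// into higher words until no carry remains.
void add_immediate(std::vector<uint32_t>& words, uint32_t imm) {
    uint64_t carry = imm;
    for (uint32_t& w : words) {
        const uint64_t sum = uint64_t(w) + carry;
        w = uint32_t(sum);
        carry = sum >> 32;
        if (carry == 0) break;
    }
}
```

Printing lva[0] on either side of add_immediate() corresponds to the check being asked for: a stuck counter with lva[0]==0 going in points at the conversion, not the add.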

Can you send me the a.out from your compile of the reduced test
program? (And the reduced test program as well.)  Need to make
sure we are generating the same code from a given input. Given
that, I think we'll need to dump the thread state between every
instruction to make sure all the operations are doing exactly
what we expect. This can be done by dumping the thr->bits4 value
in vthread_run after (and maybe also before) each call to the
opcode function pointer. Getting desperate here:-(

| If I add 100 to the thread addresses for the code that increments the
| loop counter, everything works correctly, so it appears that when we have
| two xnor statements the second one is somehow changing values after it
| should be and is zeroing the loop counter. I'm not certain how this is
| happening, since the xnor should be producing all ones as output. I can
| send the reduced test case and a.out file if that helps. FYI, the second
| xnor and the loop counter increment use the same or overlapping thread
| space addresses.

Or, more likely, the second one has a different input value that is
triggering X values. Or vectors running off their ends. Or
something like that. I think it is unlikely that there is a race,
because there are no nets to race against.



Re: [Iverilog-devel] VVP Performance boots

From: Cary R. <cy...@ya...> - 2008-04-22 20:12:36

Attachments: a.out xnor_loop_fail.v

--- On Tue, 4/22/08, Stephen Williams <st...@ic...> wrote:

> When you say that %addi is getting a 0, do you mean lva[0]==0,
> or lva==0? The latter means that there were XZ bits in the vector
> so it will cancel the add. That's probably not what's happening,
> but I'm just double-checking.

I was checking inside the loop so I'm fairly sure it is lva[idx].

> Can you send me the a.out from your compile of the reduced test
> program? (And the reduced test program as well.)  Need to make
> sure we are generating the same code from a given input.

Attached. The a.out is the modified version (with 100 added to the thread addresses) that works. It should be obvious what needs to be removed to bring the infinite loop back.

Cary



Re: [Iverilog-devel] VVP Performance boots

From: Stephen W. <st...@ic...> - 2008-04-22 21:08:28

I think you are saying that changing this:
    %load/v 12, v0x855918_0, 4;
    %mov 132, 12, 4;
    %mov 136, 0, 28;
    %addi 132, 1, 32;
    %set/v v0x855918_0, 132, 4;

to this:
    %load/v 12, v0x855918_0, 4;
    %mov 32, 12, 4;
    %mov 36, 0, 28;
    %addi 32, 1, 32;
    %set/v v0x855918_0, 32, 4;

will bring the infinite loop back for you?

I don't see anything wrong. I think we need to see the value of
the thread bit4 after every opcode:-(


Cary R. wrote:
| --- On Tue, 4/22/08, Stephen Williams <st...@ic...> wrote:
|
|> When you say that %addi is getting a 0, do you mean lva[0]==0,
|> or lva==0? The latter means that there were XZ bits in the vector
|> so it will cancel the add. That's probably not what's happening,
|> but I'm just double-checking.
|
| I was checking inside the loop so I'm fairly sure it is lva[idx].
|
|> Can you send me the a.out from your compile of the reduced test
|> program? (And the reduced test program as well.)  Need to make
|> sure we are generating the same code from a given input.
|
| Attached. The a.out is the modified version (with 100 added to the
| thread addresses) that works. It should be obvious what needs to be
| removed to bring the infinite loop back.

Re: [Iverilog-devel] VVP Performance boots

From: Cary R. <cy...@ya...> - 2008-04-22 21:29:19

--- On Tue, 4/22/08, Stephen Williams <st...@ic...> wrote:

> I think you are saying that changing this:
>     %load/v 12, v0x855918_0, 4;
>     %mov 132, 12, 4;
>     %mov 136, 0, 28;
>     %addi 132, 1, 32;
>     %set/v v0x855918_0, 132, 4;
>
> to this:
>     %load/v 12, v0x855918_0, 4;
>     %mov 32, 12, 4;
>     %mov 36, 0, 28;
>     %addi 32, 1, 32;
>     %set/v v0x855918_0, 32, 4;
>
> will bring the infinite loop back for you?

That is correct.
 
> I don't see anything wrong. I think we need to see the value of
> the thread bit4 after every opcode:-(

I was expecting this. A tutorial or a patch with #ifdef code would be most appreciated.

Cary




Re: [Iverilog-devel] VVP Performance boots

From: Stephen W. <st...@ic...> - 2008-04-22 21:35:13

Cary R. wrote:
| --- On Tue, 4/22/08, Stephen Williams <st...@ic...> wrote:
|
|> I think you are saying that changing this:
|>     %load/v 12, v0x855918_0, 4;
|>     %mov 132, 12, 4;
|>     %mov 136, 0, 28;
|>     %addi 132, 1, 32;
|>     %set/v v0x855918_0, 132, 4;
|>
|> to this:
|>     %load/v 12, v0x855918_0, 4;
|>     %mov 32, 12, 4;
|>     %mov 36, 0, 28;
|>     %addi 32, 1, 32;
|>     %set/v v0x855918_0, 32, 4;
|>
|> will bring the infinite loop back for you?
|
| That is correct.
|
|> I don't see anything wrong. I think we need to see the value of
|> the thread bit4 after every opcode:-(
|
| I was expecting this. A tutorial or a patch with #ifdef code would be
| most appreciated.

Try modifying the vthread_run() function in vthread.cc to include
a simple print after the call to the opcode, like this:

    bool rc = (cp->opcode)(thr, cp);
    cerr << "thr pc=" << thr->pc << " bit4=" << thr->bits4 << endl;

That is, put the cerr around line 335. It won't be pretty, but
I should be able to decode what it is saying. It will print the
thread bit4 bits after every instruction. Then send me the output
(truncated to a few hundred lines) and the a.out that you used.
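Since Cary asked for #ifdef code, here is a self-contained sketch of the idea: a toy interpreter loop that advances pc, calls the opcode function pointer, and, when TRACE_THREAD is defined, prints pc and the thread bits after every instruction. Thread, Instr, run_thread(), op_set1(), op_end() and TRACE_THREAD are all hypothetical names; the real vthread_run() and thread structure in vthread.cc differ (for instance, pc there appears to be a pointer into the code rather than an index).

```cpp
#define TRACE_THREAD  // comment out to silence the per-instruction trace

#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

struct Thread;

// One instruction: an opcode function plus a single operand.
struct Instr {
    bool (*opcode)(Thread*, const Instr*);
    unsigned arg;
};

struct Thread {
    std::size_t pc = 0;  // index of the next instruction to run
    std::string bits4;   // thread bits, one char per bit ("01xz")
};

// Example opcodes: set one thread bit to 1, and stop the thread.
bool op_set1(Thread* thr, const Instr* cp) {
    thr->bits4.at(cp->arg) = '1';
    return true;
}
bool op_end(Thread*, const Instr*) { return false; }

// Run until an opcode returns false, tracing after each instruction.
void run_thread(Thread* thr, const std::vector<Instr>& code, std::ostream& log) {
    while (thr->pc < code.size()) {
        const Instr* cp = &code[thr->pc];
        thr->pc += 1;
        bool rc = (cp->opcode)(thr, cp);
#ifdef TRACE_THREAD
        log << "thr pc=" << thr->pc << " bit4=" << thr->bits4 << '\n';
#else
        (void)log;
#endif
        if (!rc) break;
    }
}
```

In real use one would pass std::cerr as the log stream, and could drop the #define in favor of compiling with -DTRACE_THREAD.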



Re: [Iverilog-devel] VVP Performance boots

From: Stephen W. <st...@ic...> - 2008-04-22 23:20:57

Stephen Williams wrote:
| Try modifying the vthread_run() function in vthread.cc to include
| a simple print after the call to the opcode, like this:
|
|     bool rc = (cp->opcode)(thr, cp);
|     cerr << "thr pc=" << thr->pc << " bit4=" << thr->bits4 << endl;

OK, I'm seeing something interesting. The second %addi is not
working. Judging by the symptoms, it appears to be reading a
32'd0 from the thread where it should be reading a 32'd1.

    %load/v 12, v0xa0296d8_0, 4; "small1"
pc=0x84469e0
bit4=64'b000000000000000000000000000000010001111111111111_0001_0001X010ZX10

    %mov 32, 12, 4;
pc=0x84469f0
bit4=64'b0000000000000000000000000000_0001_0001111111111111_0001_0001X010ZX10

    %mov 36, 0, 28;
pc=0x8446a00
bit4=64'b_0000000000000000000000000000_0001000111111111111100010001X010ZX10

    %addi 32, 1, 32;
pc=0x8446a10
bit4=64'b_00000000000000000000000000000001_000111111111111100010001X010ZX10

The 32'd1 stays 32'd1 in the thread, and hence the loop.

We need to look more carefully at the of_ADDI instruction. The
most likely problem is the vector_to_array() call returning a
wrong value. Then we need to look closely at the lvb array that
is made up from the immediate data. This is probably OK if
the first %addi did the addition correctly. Then we also need to
look at the final result that is lva. That is probably OK (given the
initial lva and lvb) because the thread got modified with the
correct result after the first %addi.

Can you put a test print in of_ADDI that prints lva[0] and lvb[0]
right before the "for" loop in the of_ADDI instruction? I'm going
to bet that lva!=0 and lva[1]==0 every time through the of_ADDI
instruction.

Re: [Iverilog-devel] VVP Performance boots

From: Cary R. <cy...@ya...> - 2008-04-23 00:18:03

--- On Tue, 4/22/08, Stephen Williams <st...@ic...> wrote:

> We need to look more carefully at the of_ADDI instruction. The
> most likely problem is the vector_to_array() call returning a
> wrong value.

A very good guess! It's actually in vvp_vector4_t::subarray() in vvp_net.cc. In this case we are getting 32 bits at address 32, so on a 32 bit machine we are word aligned. The problem is that the atmp and btmp masking is failing: before the masking we have the correct value, after it we have zero. This is caused by the fact that you cannot shift a 32 bit value by 32 bits. The same thing could also happen with 64 bit values. The 100 was a red herring, since it broke the word alignment.
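In C and C++ a shift whose count is greater than or equal to the width of the (promoted) left operand is undefined behavior, so a mask written as (T(1) << n) - 1 cannot be trusted when n is the full word size, which is exactly the word-aligned "32 bits at address 32" case. One way to guard such a mask is to special-case the full-word count, as sketched below. low_mask() and extract_bits() are hypothetical helpers, not the subarray() code or the fix that was actually committed.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Mask with the low `n` bits set. For n equal to the width of T the naive
// (T(1) << n) - 1 is undefined, so that case is handled separately.
template <typename T>
T low_mask(unsigned n) {
    const unsigned bits = sizeof(T) * CHAR_BIT;
    if (n >= bits)
        return static_cast<T>(~T(0));
    return static_cast<T>((T(1) << n) - 1);
}

// Extract `count` bits starting at bit `lsb` of a single word,
// guarding the right shift the same way.
template <typename T>
T extract_bits(T word, unsigned lsb, unsigned count) {
    const unsigned bits = sizeof(T) * CHAR_BIT;
    const T shifted = (lsb >= bits) ? T(0) : static_cast<T>(word >> lsb);
    return static_cast<T>(shifted & low_mask<T>(count));
}
```

The extra branch only triggers on the full-word case, so the common partial-word path is unchanged.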

I will let you decide how you want to fix this. As I remember I fixed something like this in the exact same routine not too long ago. Of course that was before you rewrote it to work with the new layout ;-)

Cary


Re: [Iverilog-devel] VVP Performance boots

From: Stephen W. <st...@ic...> - 2008-04-23 00:31:39

Attachments: foo.c

Cary R. wrote:
| --- On Tue, 4/22/08, Stephen Williams <st...@ic...> wrote:
|
|> We need to look more carefully at the of_ADDI instruction. The
|> most likely problem is the vector_to_array() call returning a
|> wrong value.
|
| A very good guess! It's actually in vvp_vector4_t::subarray() in
| vvp_net.cc. In this case we are getting 32 bits at address 32, so on a
| 32 bit machine we are word aligned. The problem is that the atmp and
| btmp masking is failing: before the masking we have the correct value,
| after it we have zero. This is caused by the fact that you cannot shift
| a 32 bit value by 32 bits. The same thing could also happen with 64 bit
| values. The 100 was a red herring, since it broke the word alignment.
|
| I will let you decide how you want to fix this. As I remember I fixed
| something like this in the exact same routine not too long ago. Of
| course that was before you rewrote it to work with the new layout ;-)


Tell me what the attached program does for you on your machine.
When I compile it for 64-bit (cc foo.c) and 32-bit (cc -m32 foo.c),
I get the correct answer. And of course, on PPC32 I get the correct
results. I expect (1<<32) to be 0 on a 32-bit machine.

Still, I'm fixing it here. Try pulling from git.


Re: [Iverilog-devel] VVP Performance boots

From: Cary R. <cy...@ya...> - 2008-04-23 00:52:22

--- On Tue, 4/22/08, Stephen Williams <st...@ic...> wrote:

> Tell me what the attached program does for you on your machine.
> When I compile it for 64-bit (cc foo.c) and 32-bit (cc -m32 foo.c),
> I get the correct answer. And of course, on PPC32 I get the correct
> results. I expect (1<<32) to be 0 on a 32-bit machine.
>
> Still, I'm fixing it here. Try pulling from git.

This is printing:

foo before shift = 0x1
foo after shift = 0x1
foo after subtract = 0x0

The problem, as I remember it, is that a shift by an amount greater than or equal to the width of the machine variable is undefined, so in this case it is doing nothing. Though I agree, I would also appreciate it if the left shift gave zero for this condition.
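This output is consistent with how 32-bit x86 shift instructions behave: SHL uses only the low five bits of the count, so a count of 32 acts like a count of 0. The sketch below models that explicitly, using a well-defined & 31, to show why the shift leaves 0x1 and the subtraction then gives 0x0, i.e. a mask of zero where all ones was wanted. It is a model of the hardware rule, not a claim about what any particular compiler emitted for foo.c, whose source is not shown here; x86_shl32() and naive_mask_x86() are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Model of a 32-bit x86 SHL: only the low five bits of the count are used,
// so a count of 32 behaves like a count of 0.
uint32_t x86_shl32(uint32_t value, unsigned count) {
    return value << (count & 31);
}

// The naive mask (1 << n) - 1 under that model.
uint32_t naive_mask_x86(unsigned n) {
    return x86_shl32(1, n) - 1;
}
```

Under this model naive_mask_x86(32) is 0, which would zero out a correctly extracted word, matching the "correct before masking, zero after" symptom.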

Cary



Re: [Iverilog-devel] VVP Performance boots

From: Cary R. <cy...@ya...> - 2008-04-23 01:34:08

--- On Tue, 4/22/08, Stephen Williams <st...@ic...> wrote:

> Still, I'm fixing it here. Try pulling from git.

That fixed the vvp tests. One VPI test is still failing and I know why. When you changed the vvp_bit4_t type, there were many places in vpi_vthr_vector.cc and a few other files that referenced specific bits by constant value, not by enum value! tables.cc also looks to be in error, and while we are at it, it would be nice if tables.cc also contained a single-digit version that could replace the ones in vpi_vthr_vector.cc, vpi_signal.cc, etc.

I have started down the path to fixing this, but could you confirm that my assumptions are correct? I did find a few places where the different VPI and internal bit patterns caused incorrect code for X and Z. Now that they are the same, life should be easier.

Cary



Re: [Iverilog-devel] VVP Performance boots

From: Stephen W. <st...@ic...> - 2008-04-23 02:21:08

Cary R. wrote:
| --- On Tue, 4/22/08, Stephen Williams <st...@ic...> wrote:
|
|> Still, I'm fixing it here. Try pulling from git.
|
| That fixed the vvp tests. One VPI test is still failing and I know
| why. When you changed the vvp_bit4_t type, there were many places in
| vpi_vthr_vector.cc and a few other files that referenced specific bits
| by constant value, not by enum value! tables.cc also looks to be in
| error, and while we are at it, it would be nice if tables.cc also
| contained a single-digit version that could replace the ones in
| vpi_vthr_vector.cc, vpi_signal.cc, etc.
|
| I have started down the path to fixing this, but could you confirm that
| my assumptions are correct? I did find a few places where the different
| VPI and internal bit patterns caused incorrect code for X and Z. Now
| that they are the same, life should be easier.

All the uses of the tables in tables.cc should be handled so that
they are independent of the vvp_bit4_t encoding. I manually map
vvp_bit4_t objects to the bit pairs expected by the tables.
Or at least I *think* I got it everywhere the hex and oct tables
in tables.cc are used. Those are all the tables that are left
in tables.cc.

Note that tables.cc is generated by draw_tt.c.

As for all the bit pattern references in vpi_vthr_vector.cc, I
thought I got them all, but I can see that I missed a few yet.
I would really like to change those last bits so that they do
not in any way depend on vvp_bit4_t encoding. I would rather
keep all code that depends on vvp_bit4_t encoding in the vvp_net.h
and vvp_net.cc files, if possible. If that means putting new map
arrays in vvp_net.h, that's OK I think.

For example, I see a few remaining cases of expressions of the
form ("01xz"[foo]) where foo is a vvp_bit4_t. Better to bury
those as macros in vvp_net.h.
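One way to bury the ("01xz"[foo]) pattern is to switch on the enumerators so the result cannot depend on their numeric values. It is sketched here as an inline function rather than the macro suggested above; that substitution is this example's own choice. The enum is a hypothetical stand-in for vvp_bit4_t with the assumed encoding 0, 1, Z = 2, X = 3. Under that encoding, the string-index form maps X to 'z'.

```cpp
#include <cassert>

// Hypothetical stand-in for vvp_bit4_t (assumed encoding: 0, 1, Z=2, X=3).
enum bit4_t { BIT4_0 = 0, BIT4_1 = 1, BIT4_Z = 2, BIT4_X = 3 };

// Encoding-independent replacement for ("01xz"[bit]): each enumerator is
// named explicitly, so reordering the enum cannot change the result.
inline char bit4_to_char(bit4_t bit)
{
      switch (bit) {
          case BIT4_0: return '0';
          case BIT4_1: return '1';
          case BIT4_X: return 'x';
          case BIT4_Z: return 'z';
      }
      return '?';  // not reached for valid enumerators
}

// The string-index form is only correct while the string order matches
// the enum values; with X == 3 and Z == 2 it maps BIT4_X to 'z'.
inline char bit4_to_char_fragile(bit4_t bit) { return "01xz"[bit]; }
```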

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4-svn0 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iD8DBQFIDp0arPt1Sc2b3ikRAoJMAKCrRtImvNh/Pctpr/YGs4MVAs9Y2wCfRDGc
BTQq1GxgDsGit1Tolfw6Wik=
=sSnC
-----END PGP SIGNATURE-----

Re: [Iverilog-devel] VVP Performance boots

From: Cary R. <cy...@ya...> - 2008-04-23 03:05:30

--- On Tue, 4/22/08, Stephen Williams <st...@ic...> wrote:

> All the uses of the tables in tables.cc should be handled so that
> they are independent of the vvp_bit4_t encoding. I manually map
> vvp_bit4_t objects to the bit pairs that the tables expect. Or at
> least I *think* I got it everywhere the hex and oct tables in
> tables.cc are used. Those are the only tables left in tables.cc.
>
> Note that tables.cc is generated by draw_tt.c.
>
> As for all the bit pattern references in vpi_vthr_vector.cc, I
> thought I got them all, but I can see that I missed a few.
> I would really like to change those last bits so that they do
> not in any way depend on the vvp_bit4_t encoding. I would rather
> keep all code that depends on the vvp_bit4_t encoding in the
> vvp_net.h and vvp_net.cc files, if possible. If that means putting
> new map arrays in vvp_net.h, that's OK, I think.
>
> For example, I see a few remaining cases of expressions of the
> form ("01xz"[foo]) where foo is a vvp_bit4_t. Better to bury
> those as macros in vvp_net.h.

OK, this one is yours to fix. With the above string fixed, scanmem3 passes correctly. I also saw some strange aval/bval handling in vpi_memory.cc, vpi_signal.cc and vpi_vthr_vector.cc, all of it related to X versus Z values.

Adding a bin_digits table to tables.cc would also help encapsulate this. There are also a few places where the oct_digits array is incorrectly given a size of 256 instead of 64.
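The size of 64 follows from counting states. Each four-state bit needs two bits (aval, bval), so a table indexed by the packed pairs of one octal digit (3 bits) has 4^3 = 64 entries. A hex digit (4 bits) needs 4^4 = 256. The sketch below builds both tables plus the proposed single-bit bin_digits table. Two details are assumptions for illustration and not necessarily what draw_tt.c generates: the packing order (bit i's pair at index bits 2i and 2i+1, aval low), and x taking precedence over z in mixed digits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumed packing: each four-state bit is the pair (bval << 1) | aval,
// giving 0=0, 1=1, z=2, x=3; bit i of a digit sits at index bits 2i..2i+1.
template <std::size_t NBITS>
std::array<char, (std::size_t{1} << (2 * NBITS))> make_digit_table()
{
      std::array<char, (std::size_t{1} << (2 * NBITS))> table{};
      for (std::size_t idx = 0; idx < table.size(); ++idx) {
            unsigned value = 0, nx = 0, nz = 0;
            for (std::size_t bit = 0; bit < NBITS; ++bit) {
                  unsigned pair = static_cast<unsigned>((idx >> (2 * bit)) & 3u);
                  if (pair == 1) value |= 1u << bit;
                  else if (pair == 2) ++nz;
                  else if (pair == 3) ++nx;
            }
            // Display rules: all x -> 'x', some x -> 'X', all z -> 'z',
            // some z -> 'Z'. Letting x win over z is an assumption here.
            if (nx == NBITS) table[idx] = 'x';
            else if (nx > 0) table[idx] = 'X';
            else if (nz == NBITS) table[idx] = 'z';
            else if (nz > 0) table[idx] = 'Z';
            else table[idx] = "0123456789abcdef"[value];
      }
      return table;
}

// Hypothetical tables echoing the names in the thread: 4, 64 and 256 entries.
const auto bin_digits = make_digit_table<1>();
const auto oct_digits = make_digit_table<3>();
const auto hex_digits = make_digit_table<4>();
```

For example, under this packing the octal digit with all three bits at 1 sits at index 1 + 4 + 16 = 21 and maps to '7'.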

Cary



Re: [Iverilog-devel] VVP Performance boots

From: Stephen W. <st...@ic...> - 2008-04-23 18:19:49

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Cary R. wrote:
| --- On Tue, 4/22/08, Stephen Williams <st...@ic...> wrote:
|> As for all the bit pattern references in vpi_vthr_vector.cc, I
|> thought I got them all, but I can see that I missed a few.
|> I would really like to change those last bits so that they do
|> not in any way depend on the vvp_bit4_t encoding. I would rather
|> keep all code that depends on the vvp_bit4_t encoding in the
|> vvp_net.h and vvp_net.cc files, if possible. If that means putting
|> new map arrays in vvp_net.h, that's OK, I think.
|
| OK, this one is yours to fix. With the above string fixed, scanmem3
| passes correctly. I also saw some strange aval/bval handling in
| vpi_memory.cc, vpi_signal.cc and vpi_vthr_vector.cc, all of it
| related to X versus Z values.
|
| Adding a bin_digits table to tables.cc would also help encapsulate
| this. There are also a few places where the oct_digits array is
| incorrectly given a size of 256 instead of 64.
|
| Cary

Pushed into git. The scanmem3 vpi test now passes, so I think we
are in good shape.

*Whew*

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4-svn0 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iD8DBQFID33ErPt1Sc2b3ikRAhfsAJ9zTMqOV1sntzRAxBO4wQEPdfhaeQCgkxcD
xlcc0BjNICQ9nWiVI879mJU=
=HQcw
-----END PGP SIGNATURE-----

Re: [Iverilog-devel] VVP Performance boots

From: Cary R. <cy...@ya...> - 2008-04-23 18:51:02

--- On Wed, 4/23/08, Stephen Williams <st...@ic...> wrote:

> Pushed into git. The scanmem3 vpi test now passes, so I think we
> are in good shape.

We always assumed this was going to be a bit painful, but overall I think things went very well! Now that this has settled down, we can start optimizing the logical operators and anything else that will benefit from the new vector format. I have worked out many of the equations, if you are interested.
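The following sketch illustrates why the new layout helps the logical operators. It evaluates four-state AND, OR, XOR and NOT 64 bits at a time on separate aval/bval words. The equations were derived for this example from the standard truth tables (z inputs behave like x). They assume 0 = (a0,b0), 1 = (a1,b0), z = (a0,b1), x = (a1,b1). They are not necessarily the equations Cary worked out, and Word4 is a hypothetical stand-in, not vvp_vector4_t.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit slice of a four-state vector: one aval word and one
// bval word. Per bit: 0=(a0,b0), 1=(a1,b0), z=(a0,b1), x=(a1,b1).
struct Word4 {
      uint64_t a;  // aval
      uint64_t b;  // bval
};

// Set for each bit that is 1, x or z (anything other than a known 0).
static inline uint64_t not_zero(const Word4& w) { return w.a | w.b; }
// Set for each bit that is a known 1.
static inline uint64_t is_one(const Word4& w) { return w.a & ~w.b; }

Word4 and4(const Word4& l, const Word4& r)
{
      uint64_t a = not_zero(l) & not_zero(r);  // any 0 input forces 0
      uint64_t b = a & (l.b | r.b);            // otherwise x if any input is x/z
      return {a, b};
}

Word4 or4(const Word4& l, const Word4& r)
{
      uint64_t a = not_zero(l) | not_zero(r);     // 0 only if both inputs are 0
      uint64_t b = a & ~(is_one(l) | is_one(r));  // x unless some input is 1
      return {a, b};
}

Word4 xor4(const Word4& l, const Word4& r)
{
      uint64_t b = l.b | r.b;        // any x/z input makes the result x
      uint64_t a = (l.a ^ r.a) | b;  // x also needs aval set
      return {a, b};
}

Word4 not4(const Word4& w)
{
      return {~w.a | w.b, w.b};      // swaps 0 and 1; x and z both give x
}
```

Every bit position is handled by a few word-wide operations, with no per-bit loop. That is the kind of gain the packed aval/bval layout is meant to allow.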

Cary


Re: [Iverilog-devel] VVP Performance boots

From: Stephen W. <st...@ic...> - 2008-04-23 21:28:12

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Cary R. wrote:
| --- On Wed, 4/23/08, Stephen Williams <st...@ic...> wrote:
|
|> Pushed into git. The scanmem3 vpi test now passes, so I think we
|> are in good shape.
|
| We always assumed this was going to be a bit painful, but overall I
| think things went very well! Now that this has settled down, we can
| start optimizing the logical operators and anything else that will
| benefit from the new vector format. I have worked out many of the
| equations, if you are interested.


Yeah, at least post them. I'm curious.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4-svn0 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iD8DBQFID6nxrPt1Sc2b3ikRAvtfAJ9yxo8w+y0APTDewu7ykCSZlNPylQCfUdeM
yd/W1M5yXLQ0nxdIPLk/ju8=
=JNJN
-----END PGP SIGNATURE-----

Re: [Iverilog-devel] VVP Performance boots

From: Cary R. <cy...@ya...> - 2008-04-23 22:35:46

Attachments: two_input.gate.v one_input_gate.v

--- On Wed, 4/23/08, Stephen Williams <st...@ic...> wrote:
 
> Yeah, at least post them. I'm curious.

It has been some time since I looked at this. I believe array index 0 is the aval and array index 1 is the bval. The basic two-input gates include a comparison to check that they are working correctly. The more complex gates and the single-input gates print the output along with the s value (for the complex gates, this is the H/L flag), so if the s output is 1 you need to translate the normal output. The X versions of the gates just output an x for the H/L case. I left the cmos gate as an exercise for the reader ;-). You can verify the results by reading the columns of the specification's truth tables from left to right.
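The check against the specification can be automated. The sketch below stores each value as a two-element array with index 0 = aval and index 1 = bval, following the convention described above. It evaluates an equation for the single-input NOT gate on the inputs 0, 1, x, z in the specification's column order. The resulting string should read "10xx", matching the standard NOT truth table. The equation and helper names are assumptions for illustration; the attached .v files are not reproduced here.

```cpp
#include <array>
#include <cassert>
#include <string>

// One four-state bit as {aval, bval}: index 0 is aval, index 1 is bval.
using Bit4 = std::array<unsigned, 2>;

// Specification column order for gate truth tables: 0, 1, x, z.
const std::array<Bit4, 4> kSpecInputs = {{ {0, 0}, {1, 0}, {1, 1}, {0, 1} }};

// NOT: inverts known values; x and z inputs both produce x.
Bit4 not_gate(const Bit4& in)
{
      unsigned aval = (~in[0] | in[1]) & 1u;
      unsigned bval = in[1];
      return {aval, bval};
}

char to_char(const Bit4& v)
{
      if (v[1] == 0) return v[0] ? '1' : '0';
      return v[0] ? 'x' : 'z';
}

// Evaluate the gate for each input in column order, left to right.
std::string not_row()
{
      std::string row;
      for (const Bit4& in : kSpecInputs) row += to_char(not_gate(in));
      return row;
}
```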

Cary



Re: [Iverilog-devel] VVP Performance boots

From: Cary R. <cy...@ya...> - 2008-04-22 15:50:36

It appears that the writemem* problems are caused by the expression 1<<(6-count). It looks like the 1 is being treated as a two-bit-wide value instead of the usual 32-bit integer. This means that for values of count less than five the result is zero. The following simple program produces incorrect results:

module main;
  reg [2:0] val;

  initial begin
    for (val=0; val<=6; val=val+1) begin
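      // The unsized 1 is at least 32 bits wide, so this should print
      // 1<<6 (40 hex) down to 1<<0 (1 hex) as val goes from 0 to 6.
      $displayh(val,, 1<<(6-val));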
    end
  end
endmodule

This problem appears to come from a different patch, since it is in the compiler, not the runtime. I'm not certain when I last ran the tests with a synced source tree, but I know it was not too long ago.
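The claim that counts below five give zero can be checked with a small C++ model. The model truncates the shifted constant to a given width. width = 32 matches Verilog's rule that an unsized constant is at least 32 bits wide, and width = 2 mimics the suspected compiler behavior. The function name is hypothetical, and this models only the arithmetic, not the compiler.

```cpp
#include <cassert>
#include <cstdint>

// Model of evaluating (1 << (6 - count)) when the result is held in a
// `width`-bit value. width = 2 mimics the suspected compiler bug.
uint64_t shifted_one(unsigned count, unsigned width)
{
      if (count > 6)
            return 0;  // outside the 0..6 range used by the test program
      unsigned amount = 6 - count;
      uint64_t mask = (width >= 64) ? ~uint64_t{0} : ((uint64_t{1} << width) - 1);
      return (uint64_t{1} << amount) & mask;
}
```

With width = 2, only shift amounts 0 and 1 survive the mask, so only count = 6 (result 1) and count = 5 (result 2) are nonzero.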

Cary


