|
From: Stephen W. <st...@ic...> - 2014-12-05 00:34:53
|
I have received some feedback on the vec4-stack branch, and I've
responded to that feedback with some performance improvements.
Now, at least for the tests I've run so far, the new vec4-stack
branch runtime is significantly faster than that of the master branch.

With this milestone, the merge of the vec4-stack branch into master
is becoming imminent. I'm thinking maybe early next week. So please
give it a go at your convenience, so we can find any issues as
soon as possible.

Thanks,

--
Steve Williams                "The woods are lovely, dark and deep.
steve at icarus.com           But I have promises to keep,
http://www.icarus.com         and lines to code before I sleep,
http://www.picturel.com       And lines to code before I sleep."
|
|
From: Martin W. <mai...@ma...> - 2014-12-05 20:06:13
|
Time taken for my long sequence of short tests:

  master                 22:48
  vec4-stack             18:28

Time taken for my short sequence of longer tests:

  master (lossless)      12:12
  vec4-stack (lossless)   9:53

  master (strict)        11:49
  vec4-stack (strict)     9:35

So performance looks good. Memory use is still a bit of a concern - for the
particular test I looked at, vvp reported

  master                 261,016 KB
  vec4-stack             554,984 KB
|
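As a quick check on the timings Martin reports, here is a minimal sketch that turns each master/vec4-stack pair into a percentage reduction. The helper names (`to_units`, `reduction_percent`) and the `RUNS` table are made up for illustration. The message does not say whether the figures are minutes:seconds or hours:minutes; the sketch simply treats `A:B` as `A*60 + B`, which gives the same ratio either way.

```python
def to_units(stamp: str) -> int:
    """Convert 'A:B' to A*60 + B (the unit is an assumption; the ratio is unaffected)."""
    major, minor = stamp.split(":")
    return int(major) * 60 + int(minor)


def reduction_percent(master: str, branch: str) -> float:
    """Percentage by which the branch time is lower than the master time."""
    m = to_units(master)
    b = to_units(branch)
    return 100.0 * (m - b) / m


# (master, vec4-stack) times copied from the message above.
RUNS = {
    "short tests": ("22:48", "18:28"),
    "longer tests (lossless)": ("12:12", "9:53"),
    "longer tests (strict)": ("11:49", "9:35"),
}

for name, (master, branch) in RUNS.items():
    print(f"{name}: {reduction_percent(master, branch):.1f}% less time")
```

On these figures the vec4-stack branch takes roughly 19% less time in all three runs.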
|
From: Stephen W. <st...@ic...> - 2014-12-05 20:17:01
|
Those are pretty encouraging results.

What are you looking at for the memory use?
|
|
From: Martin W. <mai...@ma...> - 2014-12-05 20:38:13
|
Memory use is reported by 'vvp -v'. With the master branch, I get
Compiling VVP ...
... VVP file version 0.10.0 (devel) (s20121218-480-g020e280-dirty)
Compile cleanup...
... Linking
... Removing symbol tables
... Compiletf functions
... 9343 functors (net_fun pool=786432 bytes)
2807 logic
0 bufif
0 resolv
2181 signals
... 8707 filters (net_fil pool=1310720 bytes)
... 44817 opcodes (1081344 bytes)
... 5474 nets
... 9343 vvp_nets (1048544 bytes)
... 12 arrays (129 words)
... 125 memories
125 logic (277608 words)
0 real (0 words)
... 407 scopes
... 0.167 seconds, 59244.0/17172.0/2052.0 KBytes size/rss/shared
Running ...
...execute EndOfCompile callbacks
...propagate initialization events
...execute StartOfSim callbacks
...run scheduler
Generating expected results
Running test
...execute Postsim callbacks
... 59.719 seconds, 261020.0/219492.0/2344.0 KBytes size/rss/shared
Event counts:
130139 time steps (pool=128)
2411954 thread schedule events
23563521 assign events
...assign(vec4) pool=18724
...assign(vec8) pool=204
...assign(real) pool=256
...assign(word) pool=128
...assign(word/r) pool=204
9721848 other events (pool=4096)
With the vec4-stack branch, the only differences are:
... 45991 opcodes (1105920 bytes)
... 0.17 seconds, 60368.0/18228.0/2044.0 KBytes size/rss/shared
... 55.939 seconds, 555040.0/513392.0/2356.0 KBytes size/rss/shared
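
For comparing two such runs, a small sketch like the following could pull the numbers out of the summary lines. It assumes only the line format visible in the output above (`<seconds> seconds, <size>/<rss>/<shared> KBytes size/rss/shared`); the function name `parse_usage` and the regular expression are illustrative and not part of vvp.

```python
import re

# Matches summary lines such as
#   "... 59.719 seconds, 261020.0/219492.0/2344.0 KBytes size/rss/shared"
USAGE_RE = re.compile(
    r"([\d.]+) seconds, ([\d.]+)/([\d.]+)/([\d.]+) KBytes size/rss/shared"
)


def parse_usage(line):
    """Return seconds/size/rss/shared as floats, or None if the line does not match."""
    match = USAGE_RE.search(line)
    if match is None:
        return None
    seconds, size, rss, shared = (float(group) for group in match.groups())
    return {"seconds": seconds, "size": size, "rss": rss, "shared": shared}


# End-of-run lines copied from the two reports above.
master = parse_usage(
    "... 59.719 seconds, 261020.0/219492.0/2344.0 KBytes size/rss/shared"
)
branch = parse_usage(
    "... 55.939 seconds, 555040.0/513392.0/2356.0 KBytes size/rss/shared"
)

print(f"size ratio: {branch['size'] / master['size']:.2f}")
print(f"rss ratio:  {branch['rss'] / master['rss']:.2f}")
print(f"time ratio: {branch['seconds'] / master['seconds']:.2f}")
```

On the figures quoted here, the vec4-stack run ends with a little over twice the total size and resident set of the master run, while finishing slightly sooner.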
|
|
From: Stephen W. <st...@ic...> - 2014-12-05 20:58:54
|
OK, that's interesting. I theorize that what is going on here
is that with the vec4-stack branch there are more vvp_vector4_t
instances running around, and since they allocate some memory to
hold the bit values there is a lot of allocation overhead.
The way to address that may be to use a custom allocator within
the vvp_vector4_t to allocate the bit space.

Given that it is running faster than "master" even with the
increased memory, I'm willing to accept the cost for now, and
put it off until after the Big Merge.
|
|
From: Martin W. <mai...@ma...> - 2014-12-05 21:12:55
|
I've tracked down the construct that is causing the problem - it's a user
function call in a continuous assignment statement. Looks like there's a
memory leak there. The leak is present in the master branch as well. I've got
a simple test case that exposes the problem, so I'll work on a fix. I'll wait
until after the Big Merge before I push anything!
|
|
From: Stephen W. <st...@ic...> - 2014-12-05 22:05:48
|
OK, that's very useful feedback. Given that, and assuming no
other complaints, I will merge the vec4-stack branch into git
master tomorrow morning, Saturday, 6 Dec.

In the meantime, I've made a 20141205 snapshot available on
the usual FTP download site. That is the last snapshot before
the big merge.
|