From: Zoran V. <zv...@ar...> - 2006-12-15 18:50:10
Hi!

I've tried libumem as Stephen suggested, but it is slower than the regular system malloc. libumem is really geared toward integration with mdb (the Solaris modular debugger) for memory debugging and analysis.

But I've found this, which looks more promising:

http://www.nedprod.com/programs/portable/nedmalloc/index.html

I have run its supplied test and, at least speed-wise, the code seems faster than the native OS malloc. I will now try to make it work on all platforms that we use (admittedly, it will not run correctly unless you set -DNDEBUG to silence some assertions; this is of course not right and I have to see why).

Anyways... perhaps a thing to try out... If you get any breath-taking news with the above, share it here. On my PPC PowerBook (1.5GHz PPC, 512 MB memory) I get an improvement over the built-in allocator of a factor of 3 (3 times better) with far less system overhead. I can't say anything about fragmentation yet; this has still to be tested.

Cheers
Zoran
From: Vlad S. <vl...@cr...> - 2006-12-15 19:03:20
I also tried Hoard, Google tcmalloc, umem and some other rare mallocs I could find. Still zippy beats everybody; I ran my speed test, not threadtest. Will try this one.
From: Zoran V. <zv...@ar...> - 2006-12-15 19:09:51
On 15.12.2006, at 19:59, Vlad Seryakov wrote:

> I also tried Hoard, Google tcmalloc, umem and some other rare
> mallocs I could find. Still zippy beats everybody; I ran my speed
> test, not threadtest. Will try this one.

Important: it is not only raw speed that matters, but also memory fragmentation (i.e., the lack of it). In our app we must reboot the server frequently (every couple of days), otherwise it just bloats. And we made sure there are no leaks (we have Purified all the libs that we use)...

I now have some experience with zippy's fragmentation, so I will try to make a testbed with this allocator and run it for several days to gain some experience.

Cheers
Zoran
From: Zoran V. <zv...@ar...> - 2006-12-16 14:00:55
On 15.12.2006, at 19:59, Vlad Seryakov wrote:

>> http://www.nedprod.com/programs/portable/nedmalloc/index.html

Hm... not bad at all. This was under Solaris 2.8 on a Sun Blade 2500 (Sparc), 1GB memory:

Testing standard allocator with 8 threads ...
This allocator achieves 2098770.683107ops/sec under 8 threads

Testing nedmalloc with 8 threads ...
This allocator achieves 1974570.587561ops/sec under 8 threads

Testing Tcl alloc with 8 threads ...
This allocator achieves 1449969.176647ops/sec under 8 threads

Now on SuSE Linux, a 1.8GHz Intel:

Testing standard allocator with 8 threads ...
This allocator achieves 1752893.072620ops/sec under 8 threads

Testing nedmalloc with 8 threads ...
This allocator achieves 2114564.246869ops/sec under 8 threads

Testing Tcl alloc with 8 threads ...
This allocator achieves 1460851.824732ops/sec under 8 threads

The Tcl library was compiled for threads and uses the zippy allocator. This is how I compiled the test program from the nedmalloc package:

gcc -O -g -o test test.c -lpthread -DNDEBUG -DTCL_THREADS -I/usr/local/include -L/usr/local/lib -ltcl8.4g

I had to make some tweaks, as they have a problem in the private pthread_islocked() call. Also, I expanded the testsuite to include Tcl_Alloc/Tcl_Free in addition.

If I run this same thing on other platforms I get more or less the same results, with one notable exception:

o. nedmalloc is always faster than standard or zippy, except on Sun Sparc, where the built-in malloc is the fastest

o. the zippy (Tcl) allocator is always the slowest of the three

Now, I imagine the nedmalloc test program may not be telling the whole truth (i.e. it may be biased towards nedmalloc)... It would be interesting to see some other metrics...

Cheers
Zoran
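P.S. For anyone wanting to reproduce the Tcl numbers: hooking Tcl into the harness needs nothing more than a pair of thin wrappers along these lines. This is only a sketch of the kind of addition I made; the wrapper names and the harness glue here are assumptions, not the actual test code:

#include <tcl.h>
#include <stddef.h>

/* Hypothetical wrapper names -- the harness times these through the
 * same malloc/free-style entry points as the other allocators. */
static void *
TclAllocWrapper(size_t size)
{
    return (void *) Tcl_Alloc((unsigned int) size);
}

static void
TclFreeWrapper(void *ptr)
{
    Tcl_Free((char *) ptr);
}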
From: Zoran V. <zv...@ar...> - 2006-12-16 16:14:20
On 15.12.2006, at 19:59, Vlad Seryakov wrote:

> Will try this one.

To aid you (and others):

http://www.archiware.com/downloads/nedmalloc_tcl.tar.gz

Download and peek at the README file. This compiles on all machines I tested and works pretty well in terms of speed. I haven't tested the memory size, nor do I have any idea about fragmentation, but the speed is pretty good. Just look at what this does on the Mac Pro (http://www.apple.com/macpro), which is currently the fastest Mac available:

Testing standard allocator with 5 threads ...
This allocator achieves 531241.923013ops/sec under 5 threads

Testing Tcl allocator with 5 threads ...
This allocator achieves 439181.119284ops/sec under 5 threads

Testing nedmalloc with 5 threads ...
This allocator achieves 4137423.021490ops/sec under 5 threads

nedmalloc allocator is 7.788209 times faster than standard
Tcl allocator is 0.826706 times faster than standard
nedmalloc is 9.420767 times faster than Tcl allocator

Hm... if I were not able to get the same or similar results on other Macs, I'd say this is a cheat. But it isn't.

Zoran
From: Zoran V. <zv...@ar...> - 2006-12-16 14:26:20
On 16.12.2006, at 15:00, Zoran Vasiljevic wrote:

> Hm... not bad at all.

This was on an iMac with an Intel Dual Core 1.83 GHz and 512 MB memory:

Testing standard allocator with 8 threads ...
This allocator achieves 319503.459835ops/sec under 8 threads

Testing nedmalloc with 8 threads ...
This allocator achieves 1687884.294403ops/sec under 8 threads

Testing Tcl alloc with 8 threads ...
This allocator achieves 294571.750823ops/sec under 8 threads

Hey! I think our customers will love it! I will now try to ditch the zippy and replace it with nedmalloc... Too bad that Tcl as-is does not allow easy snap-in of alternate memory allocators. I think this should be lobbied for.
From: Stephen D. <sd...@gm...> - 2006-12-16 15:28:19
On 12/16/06, Zoran Vasiljevic <zv...@ar...> wrote:
>
> Hey! I think our customers will love it! I will now try to ditch
> the zippy and replace it with nedmalloc... Too bad that Tcl as-is
> does not allow easy snap-in of alternate memory allocators. I think
> this should be lobbied for.

It would be nice to at least have a configure switch for the zippy allocator, rather than having to hack up the Makefile.
From: Stephen D. <sd...@gm...> - 2006-12-16 15:25:17
On 12/16/06, Zoran Vasiljevic <zv...@ar...> wrote:
>
> If I run this same thing on other platforms I get more or less the
> same results, with one notable exception:
>
> o. nedmalloc is always faster than standard or zippy, except on
> Sun Sparc, where the built-in malloc is the fastest
>
> o. the zippy (Tcl) allocator is always the slowest of the three
>
> Now, I imagine the nedmalloc test program may not be telling the
> whole truth (i.e. it may be biased towards nedmalloc)... It would
> be interesting to see some other metrics...

Some other metrics:

http://archive.netbsd.se/?ml=OpenLDAP-devel&a=2006-07&t=2172728

They seem, in the end, to have gone for Google tcmalloc. It wasn't the absolute fastest for their particular set of tests, but it had dramatically lower memory usage.

Something to think about: does the nedmalloc test include allocating memory in one thread and freeing it in another? Apparently this is tough for some allocators, such as Linux ptmalloc. NaviServer does this.
From: Zoran V. <zv...@ar...> - 2006-12-16 15:43:48
On 16.12.2006, at 16:25, Stephen Deasey wrote:

> They seem, in the end, to have gone for Google tcmalloc. It wasn't
> the absolute fastest for their particular set of tests, but it had
> dramatically lower memory usage.

The downside of tcmalloc: there is only a Linux port. nedmalloc does them all (Windows, Solaris, Linux, Mac OSX), as it is written in ANSI C and designed to be portable. I tested all our Unix boxes and was able to get it running on all of them. And the integration is rather simple, just add:

#include <nedmalloc.c>
#define malloc nedmalloc
#define realloc nedrealloc
#define free nedfree

I believe this needs to be done in just one Tcl source file. The trickier part: you need to call neddisablethreadcache(0) at every thread exit (see the sketch at the end of this mail).

The lower memory usage is important, of course. Here I have no experience yet.

> Something to think about: does the nedmalloc test include allocating
> memory in one thread and freeing it in another? Apparently this is
> tough for some allocators, such as Linux ptmalloc. NaviServer does
> this.

Are you sure? AFAIK, we just go down to Tcl_Alloc in the Tcl library. The allocator there will not allow you that. There were some discussions on comp.lang.tcl about it (Jeff Hobbs knows better). As they (Tcl) just "inherited" what AOLserver had at that time (I believe V4.0), the same that applies to AOLserver applies to Tcl, and indirectly to us.
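One way to arrange that thread-exit call with plain pthreads is a key destructor. This is glue I would have to write around nedmalloc myself, not something the package provides, so take it as a sketch only:

#include <pthread.h>
#include <nedmalloc.c>

static pthread_key_t cacheKey;

static void
CacheCleanup(void *arg)
{
    /* Runs automatically when a thread exits: flushes this
     * thread's cache back to the system pool. */
    neddisablethreadcache(0);
}

static void
CacheInit(void)                 /* call once at process startup */
{
    pthread_key_create(&cacheKey, CacheCleanup);
}

static void
CacheArm(void)                  /* call once in every new thread */
{
    /* The value only needs to be non-NULL for the destructor
     * to fire at thread exit. */
    pthread_setspecific(cacheKey, (void *) 1);
}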
From: Stephen D. <sd...@gm...> - 2006-12-16 16:15:18
On 12/16/06, Zoran Vasiljevic <zv...@ar...> wrote:
>
> Are you sure? AFAIK, we just go down to Tcl_Alloc in the Tcl
> library. The allocator there will not allow you that. There were
> some discussions on comp.lang.tcl about it (Jeff Hobbs knows
> better). As they (Tcl) just "inherited" what AOLserver had at that
> time (I believe V4.0), the same that applies to AOLserver applies
> to Tcl, and indirectly to us.

Yeah, pretty sure. You can only use Tcl objects within a single interp, which is restricted to a single thread, but general ns_malloc'd memory chunks can be passed around between threads. It would suck pretty hard if that wasn't the case.

We have a bunch of reference-counted stuff, cache values for example, which we share among threads and delete when the reference count drops to zero (see the sketch below). You can ns_register_proc from any thread, which needs to ns_free the old value...

Here's the (a?) problem:

http://www.bozemanpass.com/info/linux/malloc/Linux_Heap_Contention.html
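To make the reference-counting point concrete, the pattern is roughly this (illustrative only -- names invented, not the actual NaviServer code):

#include <stdlib.h>
#include <pthread.h>

typedef struct Value {
    pthread_mutex_t lock;       /* initialized at creation (not shown) */
    int             refcnt;
    void           *data;       /* malloc'd by whichever thread created it */
} Value;

static void
ValueDecr(Value *valuePtr)
{
    int last;

    pthread_mutex_lock(&valuePtr->lock);
    last = (--valuePtr->refcnt == 0);
    pthread_mutex_unlock(&valuePtr->lock);

    if (last) {
        /* The last reference can be dropped by any thread, so this
         * free routinely runs in a different thread than the one
         * which did the malloc. */
        free(valuePtr->data);
        free(valuePtr);
    }
}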
From: Zoran V. <zv...@ar...> - 2006-12-16 16:25:33
On 16.12.2006, at 17:15, Stephen Deasey wrote:

> Yeah, pretty sure. You can only use Tcl objects within a single
> interp, which is restricted to a single thread, but general
> ns_malloc'd memory chunks can be passed around between threads. It
> would suck pretty hard if that wasn't the case.

Interesting... I could swear I read that you can't just alloc in one thread and free in another using the Tcl allocator.

Well, regarding nedmalloc, I do not know, but I can find out...
From: Vlad S. <vl...@cr...> - 2006-12-16 16:31:05
You can; it moves Tcl_Obj structs between thread and shared pools, and the same goes for other memory blocks. On thread exit, all memory goes to the shared pool.

Zoran Vasiljevic wrote:
> Interesting... I could swear I read that you can't just alloc in
> one thread and free in another using the Tcl allocator.
From: Vlad S. <vl...@cr...> - 2006-12-16 16:29:50
Instead of using threadspeed or other simple malloc/free tests, I used NaviServer and Tcl pages as the test for allocators. Using ab from Apache and stress-testing it with thousands of requests, I tested several allocators, and with everything the same except LD_PRELOAD, the difference seems pretty clear: Hoard/tcmalloc/ptmalloc2 are all slower than zippy, no doubt. Under threadtest, though, tcmalloc was faster than zippy, but in real life it behaves differently.

So I would suggest that you try hitting NaviServer with nedmalloc. If it is still always faster than zippy, then you've got what you want. Another thing to watch: after each test, check the size of the nsd process.

I will try nedmalloc as well later today.
From: Zoran V. <zv...@ar...> - 2006-12-16 16:37:42
On 16.12.2006, at 17:29, Vlad Seryakov wrote:

> So I would suggest that you try hitting NaviServer with nedmalloc.
> If it is still always faster than zippy, then you've got what you
> want. Another thing to watch: after each test, check the size of
> the nsd process.
>
> I will try nedmalloc as well later today.

Indeed, the best way is to check out the real application. No test program can give you a better picture! As far as this is concerned, I do plan to make this test, but it takes some time. I spent the whole day getting nedmalloc to compile OK on all platforms that we use (Solaris sparc/x86, Mac ppc/x86, Linux/x86, Windows). The next step is to snap it into the Tcl library and try the real application...
From: Zoran V. <zv...@ar...> - 2006-12-16 18:21:50
On 16.12.2006, at 16:25, Stephen Deasey wrote:

> Something to think about: does the nedmalloc test include allocating
> memory in one thread and freeing it in another? Apparently this is
> tough for some allocators, such as Linux ptmalloc. NaviServer does
> this.

I'm still not 100% through reading the code, but:

The Tcl allocator just puts the freed memory in the cache of the current thread that calls free(). On thread exit, or if the size of the cache exceeds some limit, the contents of the cache are appended to the shared cache. The memory is never returned to the system, unless it was allocated as a chunk larger than 16K.

nedmalloc does the same, but does not move freed memory between the per-thread cache and the shared repository. Instead, the thread cache is emptied (freed) when a thread exits. This must be explicitly called by the user.

As I see it: all is green. But I will pay more attention to that by reading the code more carefully... Perhaps there is some gotcha there which I would not like to discover at a customer site ;-)

In nedmalloc you can disable the per-thread cache by defining -DTHREADCACHEMAX=0 during compilation. This makes some difference:

Testing nedmalloc with 5 threads ...
This allocator achieves 16194016.581962ops/sec under 5 threads

without the cache, versus

Testing nedmalloc with 5 threads ...
This allocator achieves 18895753.973492ops/sec under 5 threads

with the cache. THREADCACHEMAX defines the maximum size of an allocation that goes into the cache, similarly to zippy. The default is 8K (vs. 16K with zippy). The above figures were done with a max 8K size. If you increase it to 16K, the malloc cores :-( Too bad.

Still, I believe that for long-running processes the approach of never releasing memory to the OS, as zippy is doing, is suboptimal. Speed here or there, I'd rather save myself process reboots if possible... The bad thing is that the Tcl allocator (aka zippy) will not allow me any choice but bloat. And this is becoming more and more important. At some customer sites I have observed process sizes of 1.5GB whereas we started with about 80MB. Eh!
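To illustrate what I mean, the zippy free path I described above is roughly this in C. This is just my reading of the code; the names and the flush threshold are invented, and I use gcc's __thread for brevity:

#include <pthread.h>
#include <stddef.h>

typedef struct Block { struct Block *nextPtr; } Block;

#define FLUSH_LIMIT 256                 /* invented threshold */

static __thread Block *threadCache;     /* per-thread free list */
static __thread int    numCached;

static Block *sharedCache;              /* global list, never released */
static pthread_mutex_t sharedLock = PTHREAD_MUTEX_INITIALIZER;

static void
CachedFree(Block *blockPtr)
{
    /* A free always goes to the cache of the *calling* thread,
     * no matter which thread allocated the block. */
    blockPtr->nextPtr = threadCache;
    threadCache = blockPtr;

    if (++numCached > FLUSH_LIMIT) {
        /* Splice the whole thread cache onto the shared list;
         * nothing is ever returned to the system. */
        Block *tailPtr = threadCache;

        while (tailPtr->nextPtr != NULL) {
            tailPtr = tailPtr->nextPtr;
        }
        pthread_mutex_lock(&sharedLock);
        tailPtr->nextPtr = sharedCache;
        sharedCache = threadCache;
        pthread_mutex_unlock(&sharedLock);
        threadCache = NULL;
        numCached = 0;
    }
}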
From: Vlad S. <vl...@cr...> - 2006-12-16 18:35:55
But if speed is not important to you, you can build Tcl without zippy; then there is no bloat and memory is returned to the system, with reasonable speed. At least on Linux, ptmalloc is not that bad.
From: Zoran V. <zv...@ar...> - 2006-12-16 18:50:33
On 16.12.2006, at 19:31, Vlad Seryakov wrote:

> But if speed is not important to you, you can build Tcl without
> zippy; then there is no bloat and memory is returned to the system,
> with reasonable speed. At least on Linux, ptmalloc is not that bad.

Eh... Vlad... On the Mac, nedmalloc outperforms the standard allocator by about 25-30 times! The same goes against zippy. All tested with the supplied test program. I have yet to test the real app...

On other platforms (Linux, Solaris), yes, I can stay with the standard allocator. As a matter of fact, they are close to nedmalloc, +/- about 10-30% (in favour of nedmalloc, except on Sun/sparc). One shoe does not fit all, unfortunately...

What I absolutely do not understand is: WHY? I mean, why do I get a 30-times difference!? It just makes no sense, but it is really true. I am absolutely confused :-((
From: Zoran V. <zv...@ar...> - 2006-12-16 21:51:13
On 16.12.2006, at 19:31, Vlad Seryakov wrote:

> At least on Linux, ptmalloc is not that bad.

Interestingly, ptmalloc3 (http://www.malloc.de/) and nedmalloc both derive from the dlmalloc library (http://gee.cs.oswego.edu/malloc.h) by Doug Lea. Consequently, their performance is similar (nedmalloc being slightly faster). I have been able to verify this on the Linux box.
From: Zoran V. <zv...@ar...> - 2006-12-18 18:33:42
On 16.12.2006, at 19:31, Vlad Seryakov wrote:

> But if speed is not important to you, you can build Tcl without
> zippy; then there is no bloat and memory is returned to the system,
> with reasonable speed. At least on Linux, ptmalloc is not that bad.

OK. I think I've reached peace of mind with all these alternate malloc implementations... This is what I found:

On all platforms (except Mac OSX), it really does not pay to use anything else besides the system's native malloc. I mean, you can gain some percent of speed with hoard/tcmalloc/nedmalloc/zippy and friends, but you pay for this with memory bloat. If you can afford it, then go ahead. I believe, at least from what I've seen in my tests, that zippy is quite fast and you gain very little, if anything (speed-wise), by replacing it. You can gain somewhat less memory fragmentation by using something else, but this is not a thing that would make me say: Wow!

The exception to that is really Mac OSX. The native Mac OSX malloc sucks tremendously. The speed increases from zippy and nedmalloc are so high that you can really see (without any fancy measurements) how your application flies! nedmalloc also bloats less than zippy (naturally, as it clears the per-thread cache on thread exit). So for the Mac (at least for us) I will stick to nedmalloc. It is lightning fast and reasonably conservative with memory fragmentation.

Conclusion:

Linux/Solaris = use system malloc
Mac OSX = use nedmalloc

Ah, yes... Windows... this I haven't tested, but the nedmalloc author shows some very interesting numbers on his site. I somehow tend to believe them, as they match some I have seen myself when experimenting on the Unix platforms. So, most probably the outcome will be:

Windows = use nedmalloc

What does this mean to all of us? I would say: very little. We know that zippy is bloating, and now we know that it is reasonably fast and on par with most of the other solutions out there. For people concerned with speed, I believe this is the right solution. For people concerned with speed AND memory fragmentation (in that order), the best is to use some alternative malloc routines. For people concerned with fragmentation, the best is to stay with the system malloc; exception: Mac OSX. There you just need to use something else, and nedmalloc is the only thing that compiles (and works) there, to my knowledge.

I hope I could help somebody with this report.

Cheers
Zoran
From: Stephen D. <sd...@gm...> - 2006-12-18 18:57:34
On 12/18/06, Zoran Vasiljevic <zv...@ar...> wrote:
>
> On all platforms (except Mac OSX), it really does not pay to use
> anything else besides the system's native malloc. I mean, you can
> gain some percent of speed with hoard/tcmalloc/nedmalloc/zippy and
> friends, but you pay for this with memory bloat.

Are you saying you tested your app on Linux with native malloc and experienced no fragmentation/bloating?

I think some people are experiencing fragmentation problems with ptmalloc -- the Squid and OpenLDAP guys, for example. There's also the malloc-in-one-thread, free-in-another problem, which, if your threads don't exit, is basically a leak. If it's not a problem for your app then great! Just wondering...

> nedmalloc also bloats less than zippy (naturally, as it clears the
> per-thread cache on thread exit).

Doesn't zippy also clear its per-thread cache on exit?

Actually, did you experiment with exiting the conn threads after X requests? That seems to be one of the things AOL is recommending. One thing I wonder about: how do requests average out across all threads? If you set the conn threads to exit after 10,000 requests, will they all quit at roughly the same time, causing an extreme load on the server?

Also, this is only an option for conn threads. With scheduled proc threads, job threads, etc. you get nothing.
From: Vlad S. <vl...@cr...> - 2006-12-18 19:06:03
I tried to run this program; it crashes with all allocators on free when the memory was allocated in another thread. Zippy does it as well, so I am not sure how NaviServer works, then.

#include <tcl.h>

#define MemAlloc ckalloc
#define MemFree ckfree

int nbuffer = 16384;
int nloops = 50000;
int nthreads = 4;

int gAllocs = 0;
void *gPtr = NULL;
Tcl_Mutex gLock;

void MemThread(void *arg)
{
    int i,n;
    void *ptr = NULL;

    for (i = 0; i < nloops; ++i) {
        n = 1 + (int) (nbuffer * (rand() / (RAND_MAX + 1.0)));
        if (ptr != NULL) {
            MemFree(ptr);
        }
        ptr = MemAlloc(n);
        // Testing inter-thread alloc/free
        if (n % 5 == 0) {
            Tcl_MutexLock(&gLock);
            if (gPtr != NULL) {
                MemFree(gPtr);
            }
            gPtr = MemAlloc(n);
            gAllocs++;
            Tcl_MutexUnlock(&gLock);
        }
    }
    if (ptr != NULL) {
        MemFree(ptr);
    }
    if (gPtr != NULL) {
        MemFree(gPtr);
    }
}

void MemTime()
{
    int i;
    Tcl_ThreadId *tids;
    tids = (Tcl_ThreadId *)malloc(sizeof(Tcl_ThreadId) * nthreads);

    for (i = 0; i < nthreads; ++i) {
        Tcl_CreateThread( &tids[i], MemThread, NULL,
            TCL_THREAD_STACK_DEFAULT, TCL_THREAD_JOINABLE);
    }
    for (i = 0; i < nthreads; ++i) {
        Tcl_JoinThread(tids[i], NULL);
    }
}

int main (int argc, char **argv)
{
    MemTime();
}

> Doesn't zippy also clear its per-thread cache on exit?

It puts the blocks into a shared queue which other threads can re-use. But the shared cache never gets returned, so conn thread exit will not help with memory bloat.
From: Stephen D. <sd...@gm...> - 2006-12-18 20:30:28
On 12/18/06, Vlad Seryakov <vl...@cr...> wrote:
> I tried to run this program; it crashes with all allocators on free
> when the memory was allocated in another thread. Zippy does it as
> well, so I am not sure how NaviServer works, then.

I don't think allocate-in-one-thread, free-in-another is an unusual strategy. Googling around, I see a lot of people doing it. There must be some bugs in your program. Here's one:

At the end of MemThread(), gPtr is checked and freed, but the gLock mutex is not held. This thread may have finished its tight loop, but the other 3 threads could still be running. Also, gPtr is not set to NULL after the free, leading to a double free when the next thread checks it.
From: Vlad S. <vl...@cr...> - 2006-12-18 20:38:56
Still, even without the last free, and with the mutex around it, it core dumps in free(gPtr) during the loop.

Stephen Deasey wrote:
> There must be some bugs in your program.
From: Stephen D. <sd...@gm...> - 2006-12-18 21:08:21
On 12/18/06, Vlad Seryakov <vl...@cr...> wrote:
> Still, even without the last free, and with the mutex around it, it
> core dumps in free(gPtr) during the loop.

OK. Still doesn't mean your program is bug-free :-)

There's a lot of extra stuff going on in your example program that makes it hard to see what's going on. I simplified it to this:

#include <tcl.h>
#include <stdlib.h>
#include <assert.h>

#define MemAlloc ckalloc
#define MemFree ckfree

void *gPtr = NULL;  /* Global pointer to memory. */

void Thread(void *arg)
{
    assert(gPtr != NULL);
    MemFree(gPtr);
    gPtr = NULL;
}

int main (int argc, char **argv)
{
    Tcl_ThreadId tid;
    int i;

    for (i = 0; i < 100000; ++i) {
        gPtr = MemAlloc(1024);
        assert(gPtr != NULL);
        Tcl_CreateThread(&tid, Thread, NULL,
            TCL_THREAD_STACK_DEFAULT, TCL_THREAD_JOINABLE);
        Tcl_JoinThread(tid, NULL);
        assert(gPtr == NULL);
    }
}

Works for me.

I say you can allocate memory in one thread and free it in another. Let me know what the bug turns out to be..!
From: Zoran V. <zv...@ar...> - 2006-12-18 21:29:32
On 18.12.2006, at 22:08, Stephen Deasey wrote:

> Works for me.
>
> I say you can allocate memory in one thread and free it in another.

Nice. Well, I can say that nedmalloc works, that is, that small program runs to the end without coring when compiled with nedmalloc. Does this prove anything?