From: Dan C. P. <dp...@ml...> - 2004-06-28 16:08:42
|
Folks, I've released the nested set tree module discussed recently on this list to CPAN. http://cpan.uwinnipeg.ca/~djcp/DBIx-Tree-NestedSet It may not have propagated to all CPAN mirrors yet. I'm also soliciting feedback on the module at www.perlmonks.org and I plan on getting review from a few other sources as well. The full RFC is at: http://www.perlmonks.org/index.pl?node_id=370205 Thanks! -DJCP -- *-._.-*^*-._.-*^*-._.-*^*-._.-*^*-._.-*^*-._.-*^*-._.-*^*-._.-* Daniel Collis Puro CTO and Lead Developer, MassLegalServices.org Massachusetts Law Reform Institute 99 Chauncy St., Suite 500 Boston, MA 02111 617-357-0019 ext. 342 dp...@ml... http://www.masslegalservices.org |
From: Martin K. <ma...@pr...> - 2004-07-01 16:59:33
|
I did some benchmarking. Nothing spiffy just checking how fast WebGUI::Page and DBIx::Tree::NestedSet can fetch entire trees from the database. Both were measured without caching. On small trees (I took the WebGUI 6.0.3 default tree) Nested sets are faster: WG::Tree: 0.016 sec per tree NestedSet: 0.006 sec per tree On larger (about 400 pages) trees there's no difference: WG::Tree: 0.555 sec per tree NestedSet: 0.556 sec per tree At first sight this doesn't look to good, but remember that we're gonna cache only the nav trees, and those consist of probably less than 400 nodes. I've got some other suggestions: -While some properties are very easily extracted from NestedSet nodes (like hasDaughter, isDescendant, etc.) others (like isTopLevel, depth, isChild, etc.) require some kind of traversal. For a big part this can be avoided by including ajacency list properties like parentId, and depth in the table. I therefore think it would be a good idea to keep those properties, if we were to go for Nested Sets. -There's no good traversal method included for now. I'm not sure if we need one for webgui though, so that might not prove to be to difficult. To conclude this mail, I've found two issues with the code: -You can't set the name of the id-column, it's alwys called 'id'. -In the DESTROY method the database connection is closed ($dbh->disconnect). This is strange behaviour since you have to pass a dbh to the new method. I think disconnecting a dbh should be done by the person who did iniate in the first place. Martin Dan Collis Puro wrote: >Folks, > >I've released the nested set tree module discussed recently on this list >to CPAN. > >http://cpan.uwinnipeg.ca/~djcp/DBIx-Tree-NestedSet > >It may not have propagated to all CPAN mirrors yet. I'm also soliciting >feedback on the module at www.perlmonks.org and I plan on getting review >from a few other sources as well. > >The full RFC is at: > >http://www.perlmonks.org/index.pl?node_id=370205 > >Thanks! > >-DJCP > > > |
From: Christian H. <ch...@ng...> - 2004-07-01 17:58:29
|
On 2004-07-01, at 18.59, Martin Kamerbeek wrote: > I did some benchmarking. Nothing spiffy just checking how fast > WebGUI::Page and DBIx::Tree::NestedSet can fetch entire trees from the > database. Both were measured without caching. > > On small trees (I took the WebGUI 6.0.3 default tree) Nested sets are > faster: > > WG::Tree: 0.016 sec per tree > NestedSet: 0.006 sec per tree > > On larger (about 400 pages) trees there's no difference: > > WG::Tree: 0.555 sec per tree > NestedSet: 0.556 sec per tree > > At first sight this doesn't look to good, but remember that we're > gonna cache only the nav trees, and those consist of probably less > than 400 nodes. This sounds wrong. Can you tell us more about how you did the benchmark? Some code would be nice too. > > I've got some other suggestions: > -While some properties are very easily extracted from NestedSet nodes > (like hasDaughter, isDescendant, etc.) others (like isTopLevel, depth, > isChild, etc.) require some kind of traversal. For a big part this can > be avoided by including ajacency list properties like parentId, and > depth in the table. I therefore think it would be a good idea to keep > those properties, if we were to go for Nested Sets. If isTopLevel is the same as root, it comes for free. Root node in Nested Set always has lft == 1 Depth also comes for free, just count nodes lft between ancestors lft and rgt. IsChild would require a statement if you use the "orginal" Nested Set. Keeping parentId would only buy you easier comparison when comparing isSibling, isChild and isParent. > -There's no good traversal method included for now. I'm not sure if > we need one for webgui though, so that might not prove to be to > difficult. > > To conclude this mail, I've found two issues with the code: > -You can't set the name of the id-column, it's alwys called 'id'. > -In the DESTROY method the database connection is closed > ($dbh->disconnect). This is strange behaviour since you have to pass a > dbh to the new method. I think disconnecting a dbh should be done by > the person who did iniate in the first place. > > Martin > > Dan Collis Puro wrote: -- Christian Hansen nGmedia +46 40 660 17 50 |
From: JT S. <jt...@pl...> - 2004-07-01 19:38:41
|
I know I haven't been very involved in this particular conversation, but I wanted to quickly give a little feedback. 1) There isn't much left to do in the 6.1 release except for fixing the page tree issue. Everything else will be completed after this weekend. So the faster we can decide on an implementation and implement it the better. 2) As a result of this conversation I realized that there are a lot of people on this list that are way smarter than me. =) I knew that there were some that are a little smarter than me, but holy crap guys. You've blown me away. Thanks for that. 3) At the bare minimum we need to change the current page tree mechanism to not cache the entire page tree all at once. But I think I'd prefer to just replace the page tree system. 4) After reading what's been posted to this list, and some external resources on Nested Sets, I think that they are the best way to go. I'd like to see the page tree implemented using nested sets. And as we go forward, I'd like to see the whole Tree::DAG_Node and WebGUI::Persistent system go away. 5) If we need to implement a page tree caching system under the new nested set method, it cannot cache the entire tree, but rather segments of the tree due to scalability problems with huge page trees. I have recommended caching the parts of the tree associated with each navigation that is defined, but if someone else has a better plan then I'm all ears. Now finally, to my question: Is anybody currently working on, or willing to work on doing this conversion? I can do it, but I'm not as familiar with Nested Sets and I don't want to duplicate efforts. If someone else is doing it, or wants to do it then that's fine by me. If not, I'll start on it next week. We need to get this done so we can move on to the next release. On Thu, 1 Jul 2004 19:57:25 +0200 Christian Hansen <ch...@ng...> wrote: > >On 2004-07-01, at 18.59, Martin Kamerbeek wrote: > >> I did some benchmarking. Nothing spiffy just checking how fast >> WebGUI::Page and DBIx::Tree::NestedSet can fetch entire trees from the >> database. Both were measured without caching. >> >> On small trees (I took the WebGUI 6.0.3 default tree) Nested sets are >> faster: >> >> WG::Tree: 0.016 sec per tree >> NestedSet: 0.006 sec per tree >> >> On larger (about 400 pages) trees there's no difference: >> >> WG::Tree: 0.555 sec per tree >> NestedSet: 0.556 sec per tree >> >> At first sight this doesn't look to good, but remember that we're >> gonna cache only the nav trees, and those consist of probably less >> than 400 nodes. > >This sounds wrong. Can you tell us more about how you did the benchmark? Some code >would be nice too. > >> >> I've got some other suggestions: >> -While some properties are very easily extracted from NestedSet nodes >> (like hasDaughter, isDescendant, etc.) others (like isTopLevel, depth, >> isChild, etc.) require some kind of traversal. For a big part this can >> be avoided by including ajacency list properties like parentId, and >> depth in the table. I therefore think it would be a good idea to keep >> those properties, if we were to go for Nested Sets. > >If isTopLevel is the same as root, it comes for free. Root node in Nested Set always >has lft == 1 > >Depth also comes for free, just count nodes lft between ancestors lft and rgt. > >IsChild would require a statement if you use the "orginal" Nested Set. > >Keeping parentId would only buy you easier comparison when comparing isSibling, isChild >and isParent. > >> -There's no good traversal method included for now. I'm not sure if >> we need one for webgui though, so that might not prove to be to >> difficult. >> >> To conclude this mail, I've found two issues with the code: >> -You can't set the name of the id-column, it's alwys called 'id'. >> -In the DESTROY method the database connection is closed >> ($dbh->disconnect). This is strange behaviour since you have to pass a >> dbh to the new method. I think disconnecting a dbh should be done by >> the person who did iniate in the first place. >> >> Martin >> >> Dan Collis Puro wrote: > > >-- > >Christian Hansen >nGmedia >+46 40 660 17 50 > > > >------------------------------------------------------- >This SF.Net email sponsored by Black Hat Briefings & Training. >Attend Black Hat Briefings & Training, Las Vegas July 24-29 - digital self defense, top >technical experts, no vendor pitches, unmatched networking opportunities. Visit >www.blackhat.com >_______________________________________________ >Pbwebgui-development mailing list >Pbw...@li... >https://lists.sourceforge.net/lists/listinfo/pbwebgui-development JT ~ Plain Black Create like a god, command like a king, work like a slave. |
From: Martin K. <maj...@gm...> - 2004-07-01 20:12:47
|
JT Smith wrote: > I know I haven't been very involved in this particular conversation, > but I wanted to quickly give a little feedback. > > 1) There isn't much left to do in the 6.1 release except for fixing > the page tree issue. Everything else will be completed after this > weekend. So the faster we can decide on an implementation and > implement it the better. True, I'm gonna work on it this weekend. > 3) At the bare minimum we need to change the current page tree > mechanism to not cache the entire page tree all at once. But I think > I'd prefer to just replace the page tree system. That's what I was planning to do. > 4) After reading what's been posted to this list, and some external > resources on Nested Sets, I think that they are the best way to go. > I'd like to see the page tree implemented using nested sets. And as we > go forward, I'd like to see the whole Tree::DAG_Node and > WebGUI::Persistent system go away. The way the nested set module is built this would be the best way. > 5) If we need to implement a page tree caching system under the new > nested set method, it cannot cache the entire tree, but rather > segments of the tree due to scalability problems with huge page trees. > I have recommended caching the parts of the tree associated with each > navigation that is defined, but if someone else has a better plan then > I'm all ears. I think this is the way to go. Another possibility is to let the page tree system handle the caching of requested tree parts. > Now finally, to my question: > > Is anybody currently working on, or willing to work on doing this > conversion? I'm gonna do it this weekend. Martin > > > I can do it, but I'm not as familiar with Nested Sets and I don't want > to duplicate efforts. If someone else is doing it, or wants to do it > then that's fine by me. > > If not, I'll start on it next week. > > We need to get this done so we can move on to the next release. > > > On Thu, 1 Jul 2004 19:57:25 +0200 > Christian Hansen <ch...@ng...> wrote: > >> >> On 2004-07-01, at 18.59, Martin Kamerbeek wrote: >> >>> I did some benchmarking. Nothing spiffy just checking how fast >>> WebGUI::Page and DBIx::Tree::NestedSet can fetch entire trees from >>> the database. Both were measured without caching. >>> >>> On small trees (I took the WebGUI 6.0.3 default tree) Nested sets >>> are faster: >>> >>> WG::Tree: 0.016 sec per tree >>> NestedSet: 0.006 sec per tree >>> >>> On larger (about 400 pages) trees there's no difference: >>> >>> WG::Tree: 0.555 sec per tree >>> NestedSet: 0.556 sec per tree >>> >>> At first sight this doesn't look to good, but remember that we're >>> gonna cache only the nav trees, and those consist of probably less >>> than 400 nodes. >> >> >> This sounds wrong. Can you tell us more about how you did the >> benchmark? Some code would be nice too. >> >>> >>> I've got some other suggestions: >>> -While some properties are very easily extracted from NestedSet >>> nodes (like hasDaughter, isDescendant, etc.) others (like >>> isTopLevel, depth, isChild, etc.) require some kind of traversal. >>> For a big part this can be avoided by including ajacency list >>> properties like parentId, and depth in the table. I therefore think >>> it would be a good idea to keep those properties, if we were to go >>> for Nested Sets. >> >> >> If isTopLevel is the same as root, it comes for free. Root node in >> Nested Set always has lft == 1 >> >> Depth also comes for free, just count nodes lft between ancestors lft >> and rgt. >> >> IsChild would require a statement if you use the "orginal" Nested Set. >> >> Keeping parentId would only buy you easier comparison when comparing >> isSibling, isChild and isParent. >> >>> -There's no good traversal method included for now. I'm not sure if >>> we need one for webgui though, so that might not prove to be to >>> difficult. >>> >>> To conclude this mail, I've found two issues with the code: >>> -You can't set the name of the id-column, it's alwys called 'id'. >>> -In the DESTROY method the database connection is closed >>> ($dbh->disconnect). This is strange behaviour since you have to pass >>> a dbh to the new method. I think disconnecting a dbh should be done >>> by the person who did iniate in the first place. >>> >>> Martin >>> >>> Dan Collis Puro wrote: >> >> >> >> -- >> >> Christian Hansen >> nGmedia >> +46 40 660 17 50 >> >> >> >> ------------------------------------------------------- >> This SF.Net email sponsored by Black Hat Briefings & Training. >> Attend Black Hat Briefings & Training, Las Vegas July 24-29 - digital >> self defense, top technical experts, no vendor pitches, unmatched >> networking opportunities. Visit www.blackhat.com >> _______________________________________________ >> Pbwebgui-development mailing list >> Pbw...@li... >> https://lists.sourceforge.net/lists/listinfo/pbwebgui-development > > > > JT ~ Plain Black > > Create like a god, command like a king, work like a slave. > > ------------------------------------------------------- > This SF.Net email sponsored by Black Hat Briefings & Training. > Attend Black Hat Briefings & Training, Las Vegas July 24-29 - digital > self defense, top technical experts, no vendor pitches, unmatched > networking opportunities. Visit www.blackhat.com > _______________________________________________ > Pbwebgui-development mailing list > Pbw...@li... > https://lists.sourceforge.net/lists/listinfo/pbwebgui-development |
From: Dan C. P. <dp...@ml...> - 2004-07-01 23:35:49
|
<big_motha_of_a_snip> I will release a new version of DBIx::Tree::NestedSet this weekend, one that: 1) Abstracts some of the SQL to make it more DBD independent, ala CGI::Session (though the only currently implemented driver is MySQL, though to implement a Postgresql driver would be 5-10 lines of code. Volunteers? PLEASE contact me off-list.), 2) Gives you control over what the name of the id is. Dammit. What's wrong with "id"? 3) Doesn't disconnect the $dbh. Oops. Dumb. 4) Anything else? PLEASE don't let DBIx::Tree::NestedSet hold up WebGUI. Here's my thoughts on the whole matter: No matter what engine you use to create trees, the SQL is going to be the slowest part of your app. It almost always is in any mod_perl app. That means, no matter what, you're going to need to implement caching of your page tree, and the only way to do it Right is to have a granular cache. So let's play with D::T::NestedSet, by all means, but not get too hung up on it. If the Tree stuff can be abstracted enough to make D::T::NestedSet easy to drop in, that'd be great. Gives us the benefit of a quick release with a little more breathing room to play with D::T::NestedSet in the WebGUI environment. Thanks. I'm glad you're all interested and I'm glad I can contribute something to the goodness that is WebGUI. An a completely different note, I just launched a new WebGUI site at: http://www.newvotersproject.org I'm not completely happy with the design (that's not my bag anyway, I host it and did all the template stuff) and it'll probably be changing, but the client is very happy so far. -DJCP -- *-._.-*^*-._.-*^*-._.-*^*-._.-*^*-._.-*^*-._.-*^*-._.-*^*-._.-* Daniel Collis Puro CTO and Lead Developer, MassLegalServices.org Massachusetts Law Reform Institute 99 Chauncy St., Suite 500 Boston, MA 02111 617-357-0019 ext. 342 dp...@ml... http://www.masslegalservices.org |
From: JT S. <jt...@pl...> - 2004-07-02 03:16:42
|
>http://www.newvotersproject.org Great site. I just linked it on the main webgui page under featured sites. You should do a small write-up on it and post it in our news section. It helps promote WebGUI, and at the same time you can show your client that you're going the extra mile giving them good publicity. JT ~ Plain Black Create like a god, command like a king, work like a slave. |
From: Martin K. <ma...@pr...> - 2004-07-02 09:21:44
|
Dan Collis Puro wrote: >2) Gives you control over what the name of the id is. Dammit. What's >wrong with "id"? > > > Because for WebGUI, for instance, the id i need is called 'pageId' and not 'id'. |
From: Christian H. <ch...@ng...> - 2004-07-02 01:30:23
|
On 2004-07-01, at 20.52, JT Smith wrote: [...] > 3) At the bare minimum we need to change the current page tree > mechanism to not cache the entire page tree all at once. But I think > I'd prefer to just replace the page tree system. I agree, replacing the tree model seems to be the smartest choice. > 4) After reading what's been posted to this list, and some external > resources on Nested Sets, I think that they are the best way to go. > I'd like to see the page tree implemented using nested sets. And as we > go forward, I'd like to see the whole Tree::DAG_Node and > WebGUI::Persistent system go away. Sounds good. > 5) If we need to implement a page tree caching system under the new > nested set method, it cannot cache the entire tree, but rather > segments of the tree due to scalability problems with huge page trees. > I have recommended caching the parts of the tree associated with each > navigation that is defined, but if someone else has a better plan then > I'm all ears. I don't think that caching the tree will be necessary. What if all the required pages to build the navigation would cost six sql statements? Still worth the caching? Here is a crude example, to give an idea: my $root = Class->root; my $current = Class->node_by_id(1234); # main menu for my $child ( $root->children ) { print $child->title; unless ( $root->is_right_child($child) ) { print ' | '; } } # sub menu my ($ancestor) = $current->ancestors( depth => 1 ); for my $decendant ( $ancestor->descendants( max_depth => 3 ) ) { printf( "%s %s", " " x ( $decendant->depth - 2 ), $decendant->title ); } # crumbtrail print join( " > ", map { $_->title } reverse $current->ancestors ), "\n"; This will cost: - one statement for $root - one statement for $current - one statement for $root->children - one statement for $ancestor - one statement for $ancestor->descendants - one statement for $current->ancestors Six statements in total, thats cheap IMO. I have some other ideas for caching, but I'll have to dig in to WebGUI source first to see if it's possible. I don't want to make a fool of myself ;) > Now finally, to my question: > > Is anybody currently working on, or willing to work on doing this > conversion? I'll be happy to help out as far as I can. I'm in a weird legal situation right now, so I'll get back within a couple of days on what i can contribute. [...] -- Christian Hansen nGmedia +46 40 660 17 50 |
From: JT S. <jt...@pl...> - 2004-07-02 03:22:58
|
>I don't think that caching the tree will be necessary. > >This will cost: > - one statement for $root > - one statement for $current > - one statement for $root->children > - one statement for $ancestor > - one statement for $ancestor->descendants > - one statement for $current->ancestors > >Six statements in total, thats cheap IMO. Agreed. It is cheap, especially when compared with what we're doing now. However, on a busy system you can keep adding web nodes that can each have their own cache. It's a lot harder to set up a reliable database cluster. For performance we don't really need the cache, but I think for scalability we do. JT ~ Plain Black Create like a god, command like a king, work like a slave. |
From: Martin K. <maj...@gm...> - 2004-07-01 20:04:25
|
Christian Hansen wrote: > This sounds wrong. Can you tell us more about how you did the > benchmark? Some code would be nice too. Sure, I just hook both things up to the database, time how long it takes to get all pages out of the database, repeat that 99 times and take the average. The code is used: /--- benchmark script for WebGUI::Page ---/ #!/usr/bin/perl our ($webguiRoot, $configFile); BEGIN { $configFile = "WebGUI-zes.conf"; $webguiRoot = "/data/domains/zes"; unshift (@INC, $webguiRoot."/lib"); } #-----------------DO NOT MODIFY BELOW THIS LINE-------------------- #use CGI::Carp qw(fatalsToBrowser); use strict; use WebGUI; use WebGUI::Session; use WebGUI::Page; use Tree::DAG_Node; use WebGUI::SQL; use Time::HiRes qw(gettimeofday); use DBIx::Tree::NestedSet; WebGUI::Session::open($webguiRoot, $configFile); my $total; for (1..100) { my $start = gettimeofday; WebGUI::Page->getPage; my $stop = gettimeofday; $total += ($stop - $start); print "Excuted $_ times: runlenght =".($stop-$start)." ; avg = ".($total/$_)."\n"; } print "\n\n". ($total / 100) . "\n\n"; /--- benchmarscript for DBIx::Tree::NestedSet --- #!/usr/bin/perl our ($webguiRoot, $configFile); BEGIN { $configFile = "WebGUI-zes.conf"; $webguiRoot = "/data/domains/zes"; unshift (@INC, $webguiRoot."/lib"); } #-----------------DO NOT MODIFY BELOW THIS LINE-------------------- #use CGI::Carp qw(fatalsToBrowser); use strict; use WebGUI; use WebGUI::Session; use WebGUI::Page; use Tree::DAG_Node; use WebGUI::SQL; use Time::HiRes qw(gettimeofday); use DBIx::Tree::NestedSet; WebGUI::Session::open($webguiRoot, $configFile); my $total; for (1..100) { my $start = gettimeofday; my $ns = DBIx::Tree::NestedSet->new( dbh => $session{dbh}, left_column_name => 'lft', right_column_name => 'rgt', table_name => 'page' ); my $ds = $ns->get_self_and_children_flat(id=>$ns->get_root); my $stop = gettimeofday; $total += ($stop - $start); print "Excuted $_ times: runlenght =".($stop-$start)." ; avg = ".($total/$_)."\n"; } print "\n\n". ($total / 100) . "\n\n"; > >> I've got some other suggestions: >> -While some properties are very easily extracted from NestedSet >> nodes (like hasDaughter, isDescendant, etc.) others (like isTopLevel, >> depth, isChild, etc.) require some kind of traversal. For a big part >> this can be avoided by including ajacency list properties like >> parentId, and depth in the table. I therefore think it would be a >> good idea to keep those properties, if we were to go for Nested Sets. > > > If isTopLevel is the same as root, it comes for free. Root node in > Nested Set always has lft == 1 Well it's not. You could see the page tree as forrest hold together by a dummy root. So if you also count the dummy root a toplevelpage is a child of a (webgui)root page which in turn is a child of the dummy root (page id: 0) > > Depth also comes for free, just count nodes lft between ancestors lft > and rgt. Which would take a traversal of some kind. > > IsChild would require a statement if you use the "orginal" Nested Set. > > Keeping parentId would only buy you easier comparison when comparing > isSibling, isChild and isParent. It was the whole idea to make those things easier and faster, so this would be a good thing ;) |
From: Christian H. <ch...@ng...> - 2004-07-02 00:57:11
|
On 2004-07-01, at 22.14, Martin Kamerbeek wrote: > Christian Hansen wrote: > >> This sounds wrong. Can you tell us more about how you did the >> benchmark? Some code would be nice too. > > Sure, I just hook both things up to the database, time how long it > takes to get all pages out of the database, repeat that 99 times and > take the average. I'm sorry. I did misread your previous mail. Try doing the same benchmark and pull a subtree in the middle of the tree, that would probably be a more fair benchmark. Please also consider using cmpthese from Benchmark.pm module that ships with perl, it will report some valuable cpu stats. If you don't have the time, I could whip up something by the weekend. [...] >> >>> I've got some other suggestions: >>> -While some properties are very easily extracted from NestedSet >>> nodes (like hasDaughter, isDescendant, etc.) others (like >>> isTopLevel, depth, isChild, etc.) require some kind of traversal. >>> For a big part this can be avoided by including ajacency list >>> properties like parentId, and depth in the table. I therefore think >>> it would be a good idea to keep those properties, if we were to go >>> for Nested Sets. >> >> >> If isTopLevel is the same as root, it comes for free. Root node in >> Nested Set always has lft == 1 > > Well it's not. You could see the page tree as forrest hold together by > a dummy root. So if you also count the dummy root a toplevelpage is a > child of a (webgui)root page which in turn is a child of the dummy > root (page id: 0) Is there a reason to keep the the "forrest" model if we are going for the Nested Set model? >> >> Depth also comes for free, just count nodes lft between ancestors lft >> and rgt. > > Which would take a traversal of some kind. Yes. >> >> IsChild would require a statement if you use the "orginal" Nested Set. >> >> Keeping parentId would only buy you easier comparison when comparing >> isSibling, isChild and isParent. > > It was the whole idea to make those things easier and faster, so this > would be a good thing ;) I agree :) -- Christian Hansen nGmedia +46 40 660 17 50 |
From: Martin K. <ma...@pr...> - 2004-07-02 10:40:55
|
Christian Hansen wrote: > I'm sorry. I did misread your previous mail. Try doing the same > benchmark and pull a subtree in the middle of the tree, that would > probably be a more fair benchmark. > > Please also consider using cmpthese from Benchmark.pm module that > ships with perl, it will report some valuable cpu stats. > I actuallly never used this module before Results for a small tree: Benchmark: running Ajacency, Nested Set for at least 3 CPU seconds... Ajacency: 3 wallclock secs ( 2.99 usr + 0.04 sys = 3.03 CPU) @ 67.33/s (n=204) Nested Set: 8 wallclock secs ( 2.91 usr + 0.11 sys = 3.02 CPU) @ 419.21/s (n=1266) Rate Ajacency Nested Set Ajacency 67.3/s -- -84% Nested Set 419/s 523% -- Results for a big tree: Benchmark: running Ajacency, Nested Set for at least 3 CPU seconds... Ajacency: 4 wallclock secs ( 3.27 usr + 0.03 sys = 3.30 CPU) @ 1.82/s (n=6) Nested Set: 36 wallclock secs ( 3.03 usr + 0.05 sys = 3.08 CPU) @ 20.45/s (n=63) Rate Ajacency Nested Set Ajacency 1.82/s -- -91% Nested Set 20.5/s 1025% -- The numbers for the ajency list are consistent with the timing benchmark I did: 67.3/s => 0.015 sec 1.82/s => 0.550 sec The results I get for the nested set module are: 419/s => 0.0024 sec, this is off by factor 3 20.5/s => 0.049 sec, this is off by about factor 10! The mesurements for the Nested Set module are not consistent with the timings i posted earlier on. Please not that these values of the nested set module are taken without using the get_root method. Measuring with using get_root brings the timeing for the small tree on about 3 miliseconds. Thats still a factor 2 difference, but it's only off by 3 ms. The big tree with using get_root, however, has a runtime of 0.050 which still differs an order of magnitude. Someone knows why this is this way? /--- The code I used: ---/ #!/usr/bin/perl our ($webguiRoot, $configFile); BEGIN { $configFile = "WebGUI-zes.conf"; $webguiRoot = "/data/domains/zes"; unshift (@INC, $webguiRoot."/lib"); } #-----------------DO NOT MODIFY BELOW THIS LINE-------------------- #use CGI::Carp qw(fatalsToBrowser); use strict; use WebGUI; use WebGUI::Session; use WebGUI::Page; use Tree::DAG_Node; use WebGUI::SQL; use Time::HiRes qw(gettimeofday); use DBIx::Tree::NestedSet; use Benchmark qw(:all); WebGUI::Session::open($webguiRoot, $configFile); my ($result, $count); $result = timethese( $count, { 'Ajacency' => sub {WebGUI::Page->getPage;}, 'Nested Set' => sub { my $ns = DBIx::Tree::NestedSet->new( dbh => $session{dbh}, left_column_name => 'lft', right_column_name => 'rgt', table_name => 'page' ); my $ds = $ns->get_self_and_children_flat(id=>$ns->get_root); } } ); cmpthese($result); /--- end of code ---/ > If you don't have the time, I could whip up something by the weekend. If something is flawed with my method, please do so. > > [...] > >>> >>>> I've got some other suggestions: >>>> -While some properties are very easily extracted from NestedSet >>>> nodes (like hasDaughter, isDescendant, etc.) others (like >>>> isTopLevel, depth, isChild, etc.) require some kind of traversal. >>>> For a big part this can be avoided by including ajacency list >>>> properties like parentId, and depth in the table. I therefore think >>>> it would be a good idea to keep those properties, if we were to go >>>> for Nested Sets. >>> >>> >>> >>> If isTopLevel is the same as root, it comes for free. Root node in >>> Nested Set always has lft == 1 >> >> >> Well it's not. You could see the page tree as forrest hold together >> by a dummy root. So if you also count the dummy root a toplevelpage >> is a child of a (webgui)root page which in turn is a child of the >> dummy root (page id: 0) > > > Is there a reason to keep the the "forrest" model if we are going for > the Nested Set model? The way pages work in WebGUI force you to use a forrest model. Every page-root is actually a seperate tree. Those are bound together with a (non-exeistent) node with pageId=0. If you want to drop the forrest model, you'll end up having one (existent) 'master' page (ie. only one page root) under which alle the other pages are hung. >>> >>> Depth also comes for free, just count nodes lft between ancestors >>> lft and rgt. >> >> >> Which would take a traversal of some kind. > > > Yes. > So it doesn't come for free, which it does when you calc the depth a creation/modification time and put it in the db as a property. Martin |
From: Christian H. <ch...@ng...> - 2004-07-02 17:14:12
Attachments:
benchmark.pl
|
On 2004-07-02, at 12.40, Martin Kamerbeek wrote: [...] > The mesurements for the Nested Set module are not consistent with the=20= > timings i posted earlier on. Please not that these values of the=20 > nested set module are taken without using the get_root method.=20 > Measuring with using get_root brings the timeing for the small tree on=20= > about 3 miliseconds. Thats still a factor 2 difference, but it's only=20= > off by 3 ms. > > The big tree with using get_root, however, has a runtime of 0.050=20 > which still differs an order of magnitude. Someone knows why this is=20= > this way? Try running timethese( 100, ...) or any other number. I'm not sure if=20 that will fix it but it=B4s worth a try. Please also try the attached benchmark and see if you get the same=20 inconsistency. [...] > >> If you don't have the time, I could whip up something by the weekend. > > If something is flawed with my method, please do so. It looks that you are pulling the hole tree, but i could be wrong. [...] >> Is there a reason to keep the the "forrest" model if we are going for=20= >> the Nested Set model? > > The way pages work in WebGUI force you to use a forrest model. Every=20= > page-root is actually a seperate tree. Those are bound together with a=20= > (non-exeistent) node with pageId=3D0. If you want to drop the forrest=20= > model, you'll end up having one (existent) 'master' page (ie. only one=20= > page root) under which alle the other pages are hung. Yes, but same result could be achieved with one root. It will also make=20= things easier when moving nodes. =46rom a performance view, the forrest=20= model is better as it will lower the number of affected rows when=20 updating. >>>> >>>> Depth also comes for free, just count nodes lft between ancestors=20= >>>> lft and rgt. >>> >>> >>> Which would take a traversal of some kind. >> >> >> Yes. >> > So it doesn't come for free, which it does when you calc the depth a=20= > creation/modification time and put it in the db as a property. It does, try this: SELECT node.*, ( COUNT(node.lft) - 1 ) AS depth FROM tree AS node, tree AS ancestor WHERE node.lft BETWEEN ancestor.lft AND ancestor.rgt GROUP BY node.lft ORDER BY node.lft --=20 Christian Hansen nGmedia +46 40 660 17 50 |
From: JT S. <jt...@pl...> - 2004-07-03 15:53:28
|
>>> Is there a reason to keep the the "forrest" model if we are going for >>> the Nested Set model? >> >> The way pages work in WebGUI force you to use a forrest model. Every >> page-root is actually a seperate tree. Those are bound together with a >> (non-exeistent) node with pageId=0. If you want to drop the forrest >> model, you'll end up having one (existent) 'master' page (ie. only one >> page root) under which alle the other pages are hung. > >Yes, but same result could be achieved with one root. It will also make things easier >when moving nodes. From a performance view, the forrest model is better as it will >lower the number of affected rows when updating. If it helps anything, there's no reason we couldn't put a master root page in the database with an ID of 0 and a parent ID of -1. JT ~ Plain Black Create like a god, command like a king, work like a slave. |