Thread: [Lxr-dev] Status of this project?
From: Paul S. <ps...@ne...> - 2007-03-16 15:38:44
|
Hi all;

Is anyone actively working on this project? I see some commits went in in January, but there are a lot of bug reports that aren't being managed (many duplicate bugs and no one seems to be closing them) and a number of the bugs being reported are already fixed in CVS, but there's no release available, or comments about when a release might become available. There are also a few patches that look like they (or the ideas they implement) could be helpful.

I have done some work in my local version which has a big performance increase for my source tree, using swish-e (it won't help everyone's environment: it really helps with handling non-text files). I've also found a few other issues, like not all my header files are getting hyperlinked properly.

Before I go too much farther with enhancements/fixes I'd like to have an idea on the status of LXR; is it still viable? Are there thoughts about a new release (even just a 0.9.5, not 1.0)? Is there a need/desire for new developers on the team? Or, is everyone just using their own customized version and they're happy with that?

Should I start looking more seriously at alternatives such as OpenGrok, etc.?

Cheers!

--
-----------------------------------------------------------------------------
 Paul D. Smith <ps...@ne...>                            http://netezza.com
 "Please remain calm--I may be mad, but I am a professional."--Mad Scientist
-----------------------------------------------------------------------------
 These are my opinions--Netezza takes no responsibility for them.
From: Jan-Benedict G. <jb...@lu...> - 2007-03-18 16:04:51
|
On Fri, 2007-03-16 11:38:34 -0400, Paul Smith <ps...@ne...> wrote:
> Is anyone actively working on this project? I see some commits went in
> in January, but there are a lot of bug reports that aren't being managed
> (many duplicate bugs and no one seems to be closing them) and a number
> of the bugs being reported are already fixed in CVS, but there's no
> release available, or comments about when a release might become
> available. There are also a few patches that look like they (or the
> ideas they implement) could be helpful.

AFAICT, there's no "active" work being done at this time. The last patch I saw popping up here on the list was some perl5 compatibility stuff for calling exit(), IIRC (which is still unreviewed.)

> I have done some work in my local version which has a big performance
> increase for my source tree, using swish-e (it won't help everyone's
> environment: it really helps with handling non-text files).

Great. Care to send some patches for review?

> I've also found a few other issues, like not all my header files are
> getting hyperlinked properly.

Only found the issue or also a fix?

> Before I go too much farther with enhancements/fixes I'd like to have an
> idea on the status of LXR; is it still viable? Are there thoughts about
> a new release (even just a 0.9.5, not 1.0)? Is there a need/desire for
> new developers on the team? Or, is everyone just using their own
> customized version and they're happy with that?

For years (at least for five years or so), there's been only one "release" I'd tell people to download: CVS HEAD. So you shouldn't take a released tarball to start. I also think that we should put out a new release. At least there was the GIT backend added, which (given that the initial scope of LXR is to visualize Linux kernel source code) I think is a somewhat important addition.

LXR is, to my knowledge, also the only project in this area that works. So that's a definitive answer (at least from my point of view) that you're right to start hacking right here if something doesn't work for you, or to file bug reports.

In the case of bug reports, I'd suggest sending them to the list and not to SourceForge's bug tracker. I don't think anybody pays too much attention to it....

> Should I start looking more seriously at alternatives such as OpenGrok,
> etc.?

You may choose whatever tool fits your needs, but I'm not really thrilled by OpenGrok. Just look at their web page (http://www.opensolaris.org/os/project/opengrok/). The comparison to other projects is totally bogus. As we'd say in German, they're comparing apples to pears. cscope/ctags look pretty poor compared to LXR and OpenGrok, but they're focusing on a totally different problem. It's just not their business presenting web-pages with source code!

Finally only look at LXR vs. OpenGrok:

Definition search: "the feature may be partly present" in LXR? What the heck?! I think they've been on crack when they wrote that...

History search: LXR is /not/ a history browser. It's meant to give access to well pre-defined releases of source code. If you want history display, use ViewCVS or one of the other history displaying tools. Not our business!

Shows matching lines: What's this feature, exactly? LXR can show diffs between releases, but honestly, I don't know what they mean here.

Hierarchical Search, query syntax like AND, OR, field: LXR doesn't do searching itself. Not LXR's business. LXR is more flexible by using external search engines. Buy a Google appliance (yes, really, you can have your own Google!) and we'll manage to stuff the sources in there! But well-working searching is not something that should be too deeply built into any tool, because there are better tools available...

Incremental update: We do that.

Interface for SCM: Erm, without interfaces to SCMs, we'd face a hard time displaying anything, right? So we don't have an SCM interface? Heck, I've been working on the wrong project :)

open source: Huuuhuhh?! Anybody at home? LXR is GPL? I always thought GPL _is_ an Open Source license. No?

Individual file download: They're right. We don't have that. Probably because nobody ever thought it would be useful? Need it? No problem. That'd be easy to add.

Changes at directory level: I don't understand this. What's it exactly?

Multi language support: Just a matter of placing customized template files. But right now, we only have the English version.

So that comparison really looks bogus to me. But they're not mentioning some parts that are somewhat important to me:

OpenGrok uses Java (thus has a requirement to a language that probably needs non-free interpreters) while LXR just uses simple plain Apache+PHP.

OpenGrok is licensed under CDDL, while LXR is GPL. So OpenGrok ships with a GPL-incompatible license...

Finally, OpenGrok needs a Tomcat (or equivalent) running, while LXR just needs a simple Apache instance.

So have a look at both projects and decide which of those serves your needs best.

Regards, JBG

--
Jan-Benedict Glaw    jb...@lu...    +49-172-7608481
Signature of:        http://perl.plover.com/Questions.html
the second  :
From: Maximilian W. <ma...@rf...> - 2007-03-18 16:44:49
|
On Sunday, 18 March, Jan-Benedict Glaw typed the following:

Hi!

[...]
> OpenGrok uses Java (thus has a requirement to a language that probably
> needs non-free interpreters) while LXR just uses simple plain
> Apache+PHP.

s/PHP/Perl/

[...]
> Finally, OpenGrok needs a Tomcat (or equivalent) running, while LXR
> just needs a simple Apache instance.

What about a DB-Backend?

Ciao
Max
--
Follow the white penguin.
From: Jan-Benedict G. <jb...@lu...> - 2007-03-18 16:51:05
|
On Sun, 2007-03-18 17:44:40 +0100, Maximilian Wilhelm <ma...@rf...> wrote:
> On Sunday, 18 March, Jan-Benedict Glaw typed the following:
> [...]
> > OpenGrok uses Java (thus has a requirement to a language that probably
> > needs non-free interpreters) while LXR just uses simple plain
> > Apache+PHP.
>
> s/PHP/Perl/

Argh. You're right of course :D

> [...]
> > Finally, OpenGrok needs a Tomcat (or equivalent) running, while LXR
> > just needs a simple Apache instance.
>
> What about a DB-Backend?

Ah, right, we need one of those. MySQL, PostgreSQL, Oracle. Forgot something? But I guess OpenGrok either needs one, too, or they're having a number of ctags files, one for each release.

Regards, JBG

--
Jan-Benedict Glaw    jb...@lu...    +49-172-7608481
Signature of:        Don't dream your life: live your dream!
the second  :
From: Paul S. <ps...@ne...> - 2007-03-19 02:04:50
Attachments:
lxr.diff
|
On Sun, 2007-03-18 at 17:04 +0100, Jan-Benedict Glaw wrote:
> Great. Care to send some patches for review?

Sure; here's a patch. It avoids reading the entire contents of the file until after we know we're going to need it. In my source tree I have a number of very large binary files (cross-compilers and similar things), and the old method of reading the entire contents of the file into memory first was extremely wasteful of system resources, not to mention very slow.

Personally, I'd like to see the entire LXR::Files class reimplemented. IMO, the right way to do that is to have a class that represents a single file, where you'd run something like:

    $file = LXR::File->new($path, $release);

and get back an object reference for that file. Then you'd run things like:

    $fh = $file->gethandle();
    $size = $file->getsize();
    $path = $file->getpath();

etc. The really nice thing about this is that when $file goes out of scope, there's a destructor invoked automatically by Perl that you can use to clean up any open filehandles, temporary copies of files you had to create from CVS or GIT, etc. etc. Currently, that all has to be done by hand which is a pain, not to mention error-prone.

I've done a LOT of Perl hacking over the years. If people are amenable to these kinds of changes I'd be happy to make them.
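A minimal sketch of the per-file object idea described above, using only core Perl. The package name `Sketch::File`, the fake "checkout" into a temporary file, and the hash fields are assumptions made for illustration; this is not LXR's code. It shows the point about destructors: `DESTROY` runs when the last reference to the object goes away and removes the temporary copy without any manual cleanup.

```perl
use strict;
use warnings;

package Sketch::File;
use File::Temp qw(tempfile);

# Hypothetical constructor mirroring the proposed new($path, $release).
sub new {
    my ($class, $path, $release) = @_;
    return bless { path => $path, release => $release }, $class;
}

sub getpath { return $_[0]{path} }

# Pretend to fetch the file from CVS/GIT: write a temporary copy on
# first use and keep a read handle to it inside the object.
sub gethandle {
    my ($self) = @_;
    unless ($self->{fh}) {
        my ($tmpfh, $tmpname) = tempfile();
        print {$tmpfh} "contents of $self->{path} at $self->{release}\n";
        close $tmpfh or die "close $tmpname: $!";
        open my $fh, '<', $tmpname or die "open $tmpname: $!";
        @$self{qw(fh tmpname)} = ($fh, $tmpname);
    }
    return $self->{fh};
}

sub getsize {
    my ($self) = @_;
    $self->gethandle;                  # make sure the copy exists
    return -s $self->{tmpname};
}

# Called automatically by Perl when the object is no longer referenced.
sub DESTROY {
    my ($self) = @_;
    close $self->{fh} if $self->{fh};
    unlink $self->{tmpname} if defined $self->{tmpname};
}

package main;

my $tmpname;
{
    my $file  = Sketch::File->new('include/foo.h', 'v1.0');
    my $fh    = $file->gethandle();
    my $first = <$fh>;
    print "read: $first";
    print "size: ", $file->getsize(), " bytes\n";
    $tmpname = $file->{tmpname};       # peek inside, for the demo only
}   # $file and $fh go out of scope here, so DESTROY runs

my $status = (-e $tmpname) ? "still present" : "cleaned up";
print "temporary copy $status\n";
```

The same pattern would let a file that already lives in the filesystem skip the copy entirely, as long as `DESTROY` only unlinks paths the object created itself.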
From: Paul S. <ps...@ne...> - 2007-03-19 02:14:47
|
On Sun, 2007-03-18 at 17:04 +0100, Jan-Benedict Glaw wrote:
> > I've also found a few other issues, like not all my header files are
> > getting hyperlinked properly.
>
> Only found the issue or also a fix?

Only the issue. I spent about 30 minutes staring at the code and scratching my head trying to figure out where the #include files are linked for C/C++ and couldn't find it. I found the generic.conf file, etc. but I guess I just don't quite understand how it's all used and there's not much in the way of documentation :-/.

What I see is that #include'd files are linked if they are in the same directory as the source, but not otherwise. Apparently the system doesn't know how to find non-local headers.

To me it seems like a good way to resolve this would be to use the file list generated by swish to locate headers. If we wanted to be fancy we could say that if more than one header of a given name was found, the #include could link to a search for that header name so the user could decide (since we don't have the compiler line to figure it out for real).

I don't know much about Glimpse, because it's not free software. If it doesn't have anything like swish's filenames file we could have genxref create one for us.
From: Paul S. <ps...@ne...> - 2007-03-21 12:54:36
Attachments:
swish-speed.diff
incl-lookup.diff
|
On Wed, 2007-03-21 at 07:21 +0100, Gregor Hartmann wrote:
> >Personally, I'd like to see the entire LXR::Files class reimplemented.
> >IMO, the right way to do that is to have a class that represents a
> >single file
> Sounds like a great idea to me. Some time ago I noticed that files are
> copied even when they are in the filesystem already (not in cvs). This
> could be cleanly omitted then also by accessing the file directly.
> Special care has to be taken then that the file is not modified and
> most important not deleted at the end.

After looking further I think there's also a lot of opportunity to fix the Index class and subclasses: right now LXR::Index is essentially empty, but the reality is that, because of Perl's DBI interface which is common across databases, with only a few minor exceptions all the database access for all different databases will be identical. I think all the content of the LXR::Index::* subclasses should be pushed up into the LXR::Index class, and the subclasses left there only for implementing any DB-specific extensions that are required. This would make it much simpler to support new databases and would ensure that any enhancements made to one DB were available to all without any extra work.

-----

Actually the patch I sent before relied on another change I made, that wasn't included. I'm including a new one that doesn't require any changes to the LXR::Files class to support it.

I have a number of other cleanups and enhancements I can send. I'm including here a fix to the "not all .h files are indexed" issue I mentioned the other day; after a night's sleep I was able to pretty easily figure out where that cross-referencing was done. My fix looks up filenames in the database if they are not found via the current path lookup methods. If only one file is found that matches, a link to it is added. If more than one file is found that matches, a link to a search for that filename is used as the href. I only include the MySQL version here but the Postgres one is the same. I'm not sure about Oracle... we might have to let someone who has a copy fix it up.

Both of these patches will have some fuzz because of other enhancements I've made, but they should apply.
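The lookup rule described above (one match links to the file, several matches link to a search) can be sketched in a few lines of core Perl. This is not the attached patch: the in-memory file list stands in for the database query, and the function name `include_href` and both URL shapes are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the list of indexed files in the database.
my @indexed = qw(
    src/main.c
    include/util.h
    lib/net/config.h
    lib/fs/config.h
);

# Resolve the name from an #include line against the indexed files.
# A file matches if its path equals the name or ends in "/name".
sub include_href {
    my ($name, @files) = @_;
    my @hits = grep { $_ eq $name || m{/\Q$name\E\z} } @files;
    return                              if @hits == 0;  # no link at all
    return "/source/$hits[0]"           if @hits == 1;  # unambiguous: link to it
    return "/search?filestring=$name";                  # ambiguous: let the user pick
}

for my $name (qw(util.h config.h missing.h)) {
    my $href = include_href($name, @indexed);
    printf "%-10s => %s\n", $name, defined $href ? $href : '(no link)';
}
```

A real version would also URL-escape the name and would keep the existing path lookup as the first step, falling back to this only when that lookup finds nothing.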
From: Malcolm B. <mal...@gm...> - 2007-03-22 00:08:32
|
Hi,

On 3/18/07, Jan-Benedict Glaw <jb...@lu...> wrote:
> On Fri, 2007-03-16 11:38:34 -0400, Paul Smith <ps...@ne...> wrote:
> > Is anyone actively working on this project? I see some commits went in
> > in January, but there are a lot of bug reports that aren't being managed
> > (many duplicate bugs and no one seems to be closing them) and a number
> > of the bugs being reported are already fixed in CVS, but there's no
> > release available, or comments about when a release might become
> > available. There are also a few patches that look like they (or the
> > ideas they implement) could be helpful.
>
> AFAICT, there's no "active" work being done at this time. The last
> patch I saw popping up here on the list was some perl5 compatibility
> stuff for calling exit(), IIRC (which is still unreviewed.)

Don't be so hard on yourself - you merged a GIT backend a couple of months ago!

It's true we're woefully overdue for a release - CVS head is reasonably stable and could arguably be made v1.0. I've got a bunch of things I want to experiment with in the browsing experience, but they're post-1.0 features. And require time...

>
> > I have done some work in my local version which has a big performance
> > increase for my source tree, using swish-e (it won't help everyone's
> > environment: it really helps with handling non-text files).
>
> Great. Care to send some patches for review?

Indeed!

> > Before I go too much farther with enhancements/fixes I'd like to have an
> > idea on the status of LXR; is it still viable? Are there thoughts about
> > a new release (even just a 0.9.5, not 1.0)? Is there a need/desire for
> > new developers on the team? Or, is everyone just using their own
> > customized version and they're happy with that?

There's both a need and a desire. While lots of people have their own customised versions, there's still a steady stream of downloads and I believe that there's plenty left to do that would be valuable to the majority.

> LXR is, to my knowledge, also the only project in this area, that
> works. So that's a definitive answer (at least from my point of view)
> that you're right in starting hacking right here if something doesn't
> work for you, or that you place bug reports.

Totally agreed!

>
> In the case of bug reports, I'd suggest sending them to the list and
> not to SourceForge's bug tracker. I don't think anybody pays too much
> attention to it....

I read the tracker - and am more likely to remember the bug if I ever get time to fix things than in the mailing list. You'll get more response in the mailing list though - so both is a good option :-)

> > Should I start looking more seriously at alternatives such as OpenGrok,
> > etc.?

Of course not - heresy :-)

Cheers,

Malcolm
From: Maximilian W. <ma...@rf...> - 2007-03-22 13:40:44
|
On Thursday, 22 March, Malcolm Box typed the following:

Hi!

> It's true we're woefully overdue for a release - CVS head is
> reasonably stable and could arguably be made v1.0. I've got a bunch
> of things I want to experiment with in the browsing experience, but
> they're post-1.0 features. And require time...

I have an as yet unfinished patch to put the version and arch lists in drop-down boxes, but this breaks the diff function IIRC.

After my exams I'll hopefully find some time to look at this again.

Ciao
Max
--
Follow the white penguin.
From: Jan-Benedict G. <jb...@lu...> - 2007-03-22 15:24:30
|
On Thu, 2007-03-22 14:40:33 +0100, Maximilian Wilhelm <ma...@rf...> wrote:
> On Thursday, 22 March, Malcolm Box typed the following:
> > It's true we're woefully overdue for a release - CVS head is
> > reasonably stable and could arguably be made v1.0. I've got a bunch
> > of things I want to experiment with in the browsing experience, but
> > they're post-1.0 features. And require time...

That said, I think we should have a look at the documentation again. After that, let's release v1.0 and focus on the new goodies afterwards.

> I've a yet unfinished patch to push the version and arch lists
> in drop-down boxes but this breaks the diff function IIRC.

I think this should be v1.1 material.

The speed-up patches posted could probably make it into v1.0, though. Malcolm, what do you think?

There's also the exit() compatibility patch for perl5. I think this should also go in before v1.0 is pushed out.

Regards, JBG

--
Jan-Benedict Glaw    jb...@lu...    +49-172-7608481
Signature of:        Everything will be fine! ...and today it's already getting a little better.
the second  :
From: Paul S. <ps...@ne...> - 2007-03-22 15:49:29
|
On Thu, 2007-03-22 at 16:24 +0100, Jan-Benedict Glaw wrote:
> That said, I think we should have a view at the documentation again.
> After that, lets release v1.0 and focus on the new goodies afterwards.

That sounds fine, but maybe a 0.9.5 release for people to test before 1.0?

> > I've a yet unfinished patch to push the version and arch lists
> > in drop-down boxes but this breaks the diff function IIRC.
>
> I think this should be v1.1 material.

Agree.

> The speed-up patches posted could probably make it into v1.0, though.

The second version of the patch, that doesn't require changes to LXR::Files, I think is a good candidate. It's a simple algorithmic change that only impacts genxref and clearly makes sense, and helps a lot when dealing with swish-e and trees with binary files.

> There's also the exit() compatibility patch for perl5. I think this
> should also go in before v1.0 is pushed out.

Critical to get in, IMO, are the enhancements to allow the code to work with MySQL 5 (the quoting of `release` so it's not considered a keyword). A number of the bugs and patches on the Savannah site all deal with this single issue.

There needs to be some cleanup in this as well; note that the lxr.conf file allows you to rename the database and table prefix, which is great... but there's no way to CREATE those databases/tables with different names other than editing the initdb-* file by hand, which is not so great.

Maybe the initdb-* files need to become Perl scripts, which can read lxr.conf and make the appropriate substitutions. Then the invocation model could change to something like:

    ./initdb-mysql | mysql
    ./initdb-postgres | psql

(not sure for Oracle). I can look into that if people think it's a good idea.

--
-----------------------------------------------------------------------------
 Paul D. Smith <ps...@ne...>                            http://netezza.com
 "Please remain calm--I may be mad, but I am a professional."--Mad Scientist
-----------------------------------------------------------------------------
 These are my opinions--Netezza takes no responsibility for them.
From: Jan-Benedict G. <jb...@lu...> - 2007-03-22 16:06:30
|
On Thu, 2007-03-22 11:49:11 -0400, Paul Smith <ps...@ne...> wrote:
> On Thu, 2007-03-22 at 16:24 +0100, Jan-Benedict Glaw wrote:
> > That said, I think we should have a view at the documentation again.
> > After that, lets release v1.0 and focus on the new goodies afterwards.
>
> That sounds fine, but maybe a 0.9.5 release for people to test before
> 1.0?

Sure.

> > There's also the exit() compatibility patch for perl5. I think this
> > should also go in before v1.0 is pushed out.
>
> Critical to get in, IMO, are the enhancements to allow the code to work
> with MySQL 5 (the quoting of `release` so it's not considered a
> keyword). A number of the bugs and patches on the Savannah site all
> deal with this single issue.

That may need a fix. If the DB backend gets rewritten to make deeper use of the DBI interface (eg. don't do too much database specific stuff), that'd hopefully be dropped altogether.

Renaming the field to avoid the issue could help, too.

And a Perl guru could have a look at the documentation to tell us if DBI has a generic interface for prepared statements.

> There needs to be some cleanup in this as well; note that the lxr.conf
> file allows you to rename the database and table prefix, which is
> great... but there's no way to CREATE those databases/tables with
> different names other than editing the initdb-* file by hand, which is
> not so great.

Right.

> Maybe the initdb-* files need to become Perl scripts, which can read
> lxr.conf and make the appropriate substitutions. Then the invocation
> model could change to something like:
>
> ./initdb-mysql | mysql
> ./initdb-postgres | psql
>
> (not sure for Oracle). I can look into that if people think it's a good
> idea.

The interesting part here is most probably to correctly deal with dropping the old tables...

Regards, JBG

--
Jan-Benedict Glaw    jb...@lu...    +49-172-7608481
Signature of:        http://catb.org/~esr/faqs/smart-questions.html
the second  :
From: Paul S. <ps...@ne...> - 2007-03-22 16:37:23
|
On Thu, 2007-03-22 at 17:06 +0100, Jan-Benedict Glaw wrote:
> > Critical to get in, IMO, are the enhancements to allow the code to work
> > with MySQL 5 (the quoting of `release` so it's not considered a
> > keyword). A number of the bugs and patches on the Savannah site all
> > deal with this single issue.
>
> That may need a fix. If the DB backend gets rewritten to make deeper
> use of the DBI interface (eg. don't do too much database specific
> stuff), that'd hopefully be dropped altogether.
>
> Renaming the field to avoid the issue could help, too.
>
> And a Perl guru could have a look at the documentation to tell us if
> DBI has a generic interface for prepared statements.

The only thing I understood completely here was the second paragraph :).

Renaming the column is fine with me but would mean we need to advise people who already have databases on how to use ALTER TABLE to fix their schema. Putting on my SCM hat, I'd suggest a good alternative name for "release" is "stream". Probably it's a good practice to have an "upgrade" script that people would invoke to move from an older to a newer version.

As far as I'm aware there's no way to "make deeper use of the DBI interface" that would allow us to do something like ignore column names... ? And I'm not sure what you mean by a generic interface for prepared statements.

Hm... maybe you mean using placeholders ("?") for column names as well as for values, as in:

    select ? from lxr_releases where ? = ?

If so, that won't work. DBI supports placeholders ONLY for values, not for table/column/database names or other keywords.

> The interesting part here is most probably to correctly deal with
> dropping the old tables...

Which old tables would we want to drop?

--
-----------------------------------------------------------------------------
 Paul D. Smith <ps...@ne...>                            http://netezza.com
 "Please remain calm--I may be mad, but I am a professional."--Mad Scientist
-----------------------------------------------------------------------------
 These are my opinions--Netezza takes no responsibility for them.
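A small sketch of the placeholder rule above, in plain Perl with no database connection. The helpers `quote_ident` and `build_select` and the `fileid` column are illustrative assumptions, not LXR's code or schema. The idea is that identifiers have to be written (and quoted) into the SQL text itself, while only the values are left as "?" for DBI to bind at execute() time.

```perl
use strict;
use warnings;

# Quote an identifier, doubling any embedded quote character.
sub quote_ident {
    my ($name, $q) = @_;
    (my $escaped = $name) =~ s/\Q$q\E/$q$q/g;
    return $q . $escaped . $q;
}

# Hypothetical helper: table and column names go into the SQL string,
# values stay as "?" placeholders and are returned separately.
sub build_select {
    my ($q, $table, $column, %where) = @_;
    my @keys  = sort keys %where;
    my $conds = join ' and ', map { quote_ident($_, $q) . ' = ?' } @keys;
    my $sql   = sprintf 'select %s from %s where %s',
        quote_ident($column, $q), quote_ident($table, $q), $conds;
    return ($sql, @where{@keys});
}

# Standard SQL quoting with double quotes; MySQL's default would be '`'.
my ($sql, @bind) = build_select('"', 'lxr_releases', 'fileid', release => 'v1.0');
print "$sql\n";
print "bind values: @bind\n";

# With a live handle the two halves would be used as:
#   my $sth = $dbh->prepare($sql);
#   $sth->execute(@bind);
```

Because the column name is baked into the statement text, renaming `release` (or quoting it) is a change to the SQL strings, not something a placeholder can hide.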
From: Jan-Benedict G. <jb...@lu...> - 2007-03-22 16:57:20
|
On Thu, 2007-03-22 12:37:04 -0400, Paul Smith <ps...@ne...> wrote:
> On Thu, 2007-03-22 at 17:06 +0100, Jan-Benedict Glaw wrote:
> > > Critical to get in, IMO, are the enhancements to allow the code to work
> > > with MySQL 5 (the quoting of `release` so it's not considered a
> > > keyword). A number of the bugs and patches on the Savannah site all
> > > deal with this single issue.
> >
> > That may need a fix. If the DB backend gets rewritten to make deeper
> > use of the DBI interface (eg. don't do too much database specific
> > stuff), that'd hopefully be dropped altogether.
> >
> > Renaming the field to avoid the issue could help, too.
> >
> > And a Perl guru could have a look at the documentation to tell us if
> > DBI has a generic interface for prepared statements.
>
> The only thing I understood completely here was the second paragraph :).

About the first paragraph: I haven't had a look at the separate backends for PostgreSQL, MySQL or Oracle. But I guess there actually was a reason to generate a whole class for implementing different backends instead of just using plain DBI.

The third paragraph stresses an assumption I have. There could be "prepared statements" inside the individual backends. When you submit something as simple as "select foo, bar from baz where xay=123", the DB server parses the request and tries to optimize it. A "prepared statement" allows us to declare that we'll use a specific query several times, but with different values. Even while the actual values differ, the way the tables are searched doesn't. This may save a lot of time.

As I guess the backends implement prepared statements *as they are available* in the backends, we'd need to find a generic way to come up with prepared statements, or we'll have to keep the different backends.

> As far as I'm aware there's no way to "make deeper use of the DBI
> interface" that would allow us to do something like ignore column
> names... ? And I'm not sure what you mean by a generic interface for
> prepared statements.

A column name must be given. The cruelty of MySQL is that it seems to require '`' instead of '"', which all other RDBMS servers seem to accept...

> Hm... maybe you mean using placeholders ("?") for column names as well
> as for values, as in:
>
> select ? from lxr_releases where ? = ?
>
> If so, that won't work. DBI supports placeholders ONLY for values, not
> for table/column/database names or other keywords.

You can, in theory, use 'select * from ....', but if you're forced to manually fiddle with the table at some given time, you cannot be sure that the order of columns will be stable. The where clause will of course also give you grey hair with `release` ...

> > The interesting part here is most probably to correctly deal with
> > dropping the old tables...
>
> Which old tables would we want to drop?

The (current) initdb scripts try to first drop table foo, then CREATE it. Databases implemented different dialects to avoid errors (which in turn could break a transaction), like the "drop table foo if exists" stuff...

Regards, JBG

--
Jan-Benedict Glaw    jb...@lu...    +49-172-7608481
Signature of:        ...and when you think you can't go on,
the second  :        a little light appears from somewhere.
From: Maximilian W. <ma...@rf...> - 2007-03-22 16:52:35
|
On Thursday, 22 March, Paul Smith typed the following:
> On Thu, 2007-03-22 at 16:24 +0100, Jan-Benedict Glaw wrote:
> > That said, I think we should have a view at the documentation again.
> > After that, lets release v1.0 and focus on the new goodies afterwards.
> That sounds fine, but maybe a 0.9.5 release for people to test before
> 1.0?

I think that's a good idea.

> > > I've a yet unfinished patch to push the version and arch lists
> > > in drop-down boxes but this breaks the diff function IIRC.
> > I think this should be v1.1 material.
> Agree.

More time to hack on it :)

> > The speed-up patches posted could probably make it into v1.0, though.
> The second version of the patch, that doesn't require changes to
> LXR::Files, I think is a good candidate. It's a simple algorithmic
> change that only impacts genxref and clearly makes sense, and helps a
> lot when dealing with swish-e and trees with binary files.

Sounds good.

> > There's also the exit() compatibility patch for perl5. I think this
> > should also go in before v1.0 is pushed out.
> Critical to get in, IMO, are the enhancements to allow the code to work
> with MySQL 5 (the quoting of `release` so it's not considered a
> keyword). A number of the bugs and patches on the Savannah site all
> deal with this single issue.

Would it be a problem to quote it every time?

> There needs to be some cleanup in this as well; note that the lxr.conf
> file allows you to rename the database and table prefix, which is
> great... but there's no way to CREATE those databases/tables with
> different names other than editing the initdb-* file by hand, which is
> not so great.
> Maybe the initdb-* files need to become Perl scripts, which can read
> lxr.conf and make the appropriate substitutions. Then the invocation
> model could change to something like:
> ./initdb-mysql | mysql
> ./initdb-postgres | psql

I guess these scripts would need the 'baseurl' as a parameter to get the correct data from lxr.conf?

The use case would be to have several project databases indexed by lxr side by side in the same database but with different table prefixes/names? I'm not that sure if this is really needed, maybe others have some experience with that?

> (not sure for Oracle). I can look into that if people think it's a good
> idea.

Sounds like an interesting approach.

Ciao
Max
--
Follow the white penguin.
From: Paul S. <ps...@ne...> - 2007-03-22 18:29:46
|
On Thu, 2007-03-22 at 17:52 +0100, Maximilian Wilhelm wrote: > > Critical to get in, IMO, are the enhancements to allow the code to > work > > with MySQL 5 (the quoting of `release` so it's not considered a > > keyword). >=20 > Would it be problem to quote it every time? It's not a problem to quote it, EXCEPT THAT we then cannot combine the backends because (as Jan-Benedict points out below) MySQL doesn't by default accept standard SQL quoting chars. This can be addressed, but it might be simpler to just rename the column instead. > I guess these scripts would need the 'baseurl' as parameter to get the > correct data from lxr.conf? Ooh, good point. > The use case would be to have several project databases indexed by lxr > side by side in the same database but with different table > prefixes/names? > I=B4m not that sure if this is really needed, maybe others have some > experiences about that? It's helpful for testing, at the very least. Also I'm sure there are sites that have multiple completely different codebases they'd like to use with a single instance of LXR. The package does give people the option of using an alternate database names and/or table prefixes; it just seems odd that this configurability is there in the code but not available at the table creation time. Jan-Beneict Glaw <jb...@lu...> wrote: > About the first paragraph: I haven't had a look at the separate > backends for PostgreSQL, MySQL or Oracle. But I guess there actually > was a reason to generate a whole class for implementing different > backends instead of just using plain DBI. I'm not so sure there was a reason, other than maybe an unfamiliarity with Perl class inheritance and/or DBI itself. Looking at the Index backends I can't see any _material_ difference between Postgres and MySQL (I haven't looked at Oracle). 
They are pretty different in some ways but that difference feels to me more like someone just updated one backend but didn't bother with the other one (e.g., the MySQL backend keeps the prepared statements inside the object, while Postgres uses class variables). If I had to guess based on the code I'd guess that one of the backends was written, then when someone wanted a different database the split was created in a more or less ad hoc way. > The third paragraph stresses an assumption I have. There could be > "prepared statements" inside the individual backends. When you submit > something as simple as "select foo, bar from baz where xay=3D123", the > DB server parses the request and tries to optimize it. > A "prepared statement" allows us to declare that we'll use a specific > query several times, but with different values. Even while the actual > values differ, the way the tables are searched doesn't. This may save > a lot of time. Yes, unquestionably we should keep the prepared statement implementation as we have it today. > As I guess the backends implement prepared statements *as they are > available* in the backends, we'd find a generic way to come up with > prepared statements, or we'll have to keep the different backends. The DBI interface takes care of all this for us, though: it supports the prepare() method and does the best it can with whatever the underlying database supports. At the very least it saves the template in the statement object and binds the values when you run execute(). In other words, almost all those prepare statements are identical across all the backends, because they're all pure SQL using the same DBI interface; they can all be pushed up into a superclass, along with most of the methods. The only real difference I see is in handling the auto-increment fields: I don't know how Oracle does this but MySQL and Postgress handle this very differently. So, that bit would need to be kept local. 
> The cruelty of MySQL is that it seems to require '`' instead of '"',
> what all other RDBMS servers seem to accept...

True :-(. This could be the death of the commonality of the prepare() statements (although we could still share at least some of the methods since they just invoke execute()).

Actually there is an "ANSI_QUOTES" mode that we could set, that lets you (among other things) use standard "-quoting in MySQL. That might be a valid thing to do.

> The (current) initdb scripts try to first drop table foo, then CREATE
> it. Databases implemented different dialects to avoid errors (which
> in turn could break a transaction), like the "drop table foo if
> exists" stuff...

True. Note I wasn't suggesting that we would try to unify the initdb-* scripts into one script: there's too much variation in the schema creation operations to make that worthwhile IMO.

I just wanted a way to create databases and tables based on the information in lxr.conf, without having to edit the appropriate initdb-* script by hand.

--
-----------------------------------------------------------------------------
Paul D. Smith <ps...@ne...> http://netezza.com
"Please remain calm--I may be mad, but I am a professional."--Mad Scientist
-----------------------------------------------------------------------------
These are my opinions--Netezza takes no responsibility for them.
|
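The goal stated at the end of this message, creating tables from configuration values instead of hand-editing a script, might look something like the sketch below. It is only an illustration in Python with sqlite3: the schema is invented, the `config` dictionary is a stand-in for whatever entry would be selected from lxr.conf, and the key names `baseurl` and `dbprefix` are assumptions made for the example.

```python
import sqlite3

# Hypothetical schema template; the real initdb-* scripts differ per database.
SCHEMA_TEMPLATE = """
CREATE TABLE {prefix}files (
    fileid   INTEGER PRIMARY KEY,
    filename TEXT NOT NULL
);
CREATE TABLE {prefix}symbols (
    symid   INTEGER PRIMARY KEY,
    symname TEXT NOT NULL
);
"""


def render_schema(config):
    """Fill the table prefix from one configuration entry into the template."""
    return SCHEMA_TEMPLATE.format(prefix=config.get("dbprefix", ""))


# Stand-in for the entry that would be looked up in lxr.conf by its baseurl.
config = {"baseurl": "http://example.com/lxr", "dbprefix": "test_"}

conn = sqlite3.connect(":memory:")
conn.executescript(render_schema(config))
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

Keeping one template per database dialect, as the message suggests, would leave the per-database differences in the templates while the prefix handling stays in one place.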
From: Jan-Benedict G. <jb...@lu...> - 2007-03-22 19:31:01
|
On Thu, 2007-03-22 14:29:24 -0400, Paul Smith <ps...@ne...> wrote:
> The only real difference I see is in handling the auto-increment fields:
> I don't know how Oracle does this but MySQL and Postgres handle this
> very differently. So, that bit would need to be kept local.

Erm, I'm not 100% sure, but as far as I remember, the only difference for "auto_increment" or "serial" or whatever fields is the CREATE TABLE statement, at least for MySQL vs. PostgreSQL. It should be enough to simply omit the auto_increment column's data to get a value generated.

What may be different is the way to get the newly auto-generated number back again. But since the INSERT paths aren't time-critical, we could just SELECT the value instead of playing tricks to get it.

> > The cruelty of MySQL is that it seems to require '`' instead of '"',
> > what all other RDBMS servers seem to accept...
>
> True :-(. This could be the death of the commonality of the prepare()
> statements (although we could still share at least some of the methods
> since they just invoke execute()).

Heck, this f*ing column name caused so much grief, let's just rename it! Yes, that's somewhat painful and we need a Big Fat Warning in the v1.0 docs that the column needs to be renamed, but I'm all for doing that.

> Actually there is an "ANSI_QUOTES" mode that we could set, that lets you
> (among other things) use standard "-quoting in MySQL. That might be a
> valid thing to do.

Is this in the DBI backend or configury on the MySQL server side?

> > The (current) initdb scripts try to first drop table foo, then CREATE
> > it. Databases implemented different dialects to avoid errors (which
> > in turn could break a transaction), like the "drop table foo if
> > exists" stuff...
>
> True. Note I wasn't suggesting that we would try to unify the initdb-*
> scripts into one script: there's too much variation in the schema
> creation operations to make that worthwhile IMO.
But some kind of wrapper should be added to make it easier to create tables with prefixes.

> I just wanted a way to create databases and tables based on the
> information in lxr.conf, without having to edit the appropriate initdb-*
> script by hand.

ACK.

Regards, JBG

--
Jan-Benedict Glaw jb...@lu... +49-172-7608481
Signature of: I've had a bit too much reality check lately.
the second : I'd slowly like to be able to go on dreaming again.
|
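The approach suggested in this message, omitting the auto-increment column on INSERT and then using a plain SELECT to read the generated value back, can be illustrated with a small sketch. It is written in Python with sqlite3 rather than Perl DBI, and the `symbols` table, its columns, and the `add_symbol` helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE symbols ("
    "symid INTEGER PRIMARY KEY AUTOINCREMENT, symname TEXT UNIQUE NOT NULL)"
)


def add_symbol(conn, name):
    """Insert without naming the id column, then SELECT the generated value."""
    conn.execute("INSERT INTO symbols (symname) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT symid FROM symbols WHERE symname = ?", (name,)
    ).fetchone()
    return row[0]


main_id = add_symbol(conn, "main")
printf_id = add_symbol(conn, "printf")
```

The trade-off is one extra query per inserted row, which is the cost the following replies in the thread weigh against the simpler code.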
From: Paul S. <ps...@ne...> - 2007-03-22 20:13:09
|
On Thu, 2007-03-22 at 20:30 +0100, Jan-Benedict Glaw wrote:
> What may be different is the way to get the newly auto-generated
> number back again.

Yes, this is what I meant.

> But since the INSERT paths aren't time-critical, we'd just allow to
> SELECT for the value instead of playing tricks to get it.

Hm. Interesting idea. I'm not so sure they aren't time-critical. They don't impact the web user of course, but genxref does take a while to index stuff already :).

I had two other performance-related ideas:

Genxref performance improvement:
Especially when adding a new release, genxref can be very slow. I think it's because of all the indexing that goes on, and the DB re-indexes after every insert. A common method of adding a lot of data to an SQL DB is to put the commands into a file, then load the file all at once with indexing disabled, then re-index everything at the end. I know both MySQL and Postgres support this model (although most likely it's accomplished in different ways), although I've not investigated it thoroughly.

So, my idea was changing genxref to do this: instead of adding things to the DB one at a time, it would write out the statements to a file and at the end, import that file with indexing disabled.

Web performance improvement:
I'm sure someone else has already thought of this, but right now we generate our HTML dynamically every time. It seems to me that this is a prime candidate for caching! Especially for backends that support annotate/blame etc. The annotations on a file won't change unless the contents of the file change, for the most part (the other possibility is that the symbols in the database change--as far as I can tell this shouldn't happen normally, but if people are worried we can have genxref flush the cache).

We have a unique file id already, so we can cache the content using the file id with a typical fan-out to avoid any single directory being too large. We can compare creation time of the cached file vs.
the source file to tell when it's out of date. We can use the access time to clean out old, unused cache entries if we want.

Also, we can just cache the actual file content, and leave off the header information; the header info can be added dynamically when the user browses it. That way the same cached copy of a file can be used for different releases, if they all share that fileid.

> Heck, this f*ing column name caused so much grief, let's just rename
> it! Yes, that's somewhat painful and we need a Big Fat Warning in the
> v1.0 docs that the column needs to be renamed, but I'm all for doing
> that.

I'd be happy with that. It's not actually such a big deal to fix this; you just need a series of ALTER TABLE operations. It's easy enough to add to the readme, or even write an update script.

> > Actually there is an "ANSI_QUOTES" mode that we could set, that lets you
> > (among other things) use standard "-quoting in MySQL. That might be a
> > valid thing to do.
>
> Is this in the DBI backend or configury on the MySQL server side?

It can be set globally or per-session. We'd use per-session obviously. It's set from the client side.

I also discovered that there's a DBI method that quotes identifiers like this for you:

    my $release = $dbh->quote_identifier('release');

That would be the safest way to go, although it's annoyingly verbose. And, there's a DBI get_info() method that lets you ask about all kinds of features of the server, and one of those is the quoting character, so we could get that and use it instead of quotes.

But, changing the name sounds good to me! :)

--
-----------------------------------------------------------------------------
Paul D. Smith <ps...@ne...> http://netezza.com
"Please remain calm--I may be mad, but I am a professional."--Mad Scientist
-----------------------------------------------------------------------------
These are my opinions--Netezza takes no responsibility for them.
|
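The caching scheme described in this message (cache files keyed by file id, fanned out over subdirectories, and considered out of date when older than the source) could be sketched as below. This is an illustration in Python using only the standard library; the function names, the two-level directory layout, and the `.html` suffix are assumptions for the example, and it compares modification times rather than the creation times mentioned in the message.

```python
import os
import tempfile


def cache_path(cache_root, file_id):
    """Fan out cached files over two directory levels derived from the file id."""
    name = "%08d" % file_id
    return os.path.join(cache_root, name[-2:], name[-4:-2], name + ".html")


def is_stale(cached, source):
    """A cache entry is stale if it is missing or older than the source file."""
    if not os.path.exists(cached):
        return True
    return os.path.getmtime(cached) < os.path.getmtime(source)


with tempfile.TemporaryDirectory() as root:
    source = os.path.join(root, "main.c")
    with open(source, "w") as handle:
        handle.write("int main(void) { return 0; }\n")

    cached = cache_path(os.path.join(root, "cache"), 1234567)
    stale_before = is_stale(cached, source)  # nothing cached yet

    os.makedirs(os.path.dirname(cached))
    with open(cached, "w") as handle:
        handle.write("<pre>...</pre>\n")
    # Force a newer timestamp so the check does not depend on clock resolution.
    newer = os.path.getmtime(source) + 10
    os.utime(cached, (newer, newer))
    stale_after = is_stale(cached, source)
```

Using the low-order digits of the id for the directory names spreads consecutive ids across different directories, which is one common way to keep any single directory small.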
From: Jan-Benedict G. <jb...@lu...> - 2007-03-22 20:41:49
|
On Thu, 2007-03-22 16:12:23 -0400, Paul Smith <ps...@ne...> wrote:
> I had two other performance-related ideas:
>
> Genxref performance improvement:
> Especially when adding a new release, genxref can be very slow. I think
> it's because of all the indexing that goes on, and the DB re-indexes
> after every insert. A common method of adding a lot of data to an SQL
> DB is to put the commands into a file, then load the file all at once
> with indexing disabled, then re-index everything at the end. I know
> both MySQL and Postgres support this model (although most likely it's
> accomplished in different ways) although I've not investigated it
> thoroughly.

The basic concept could be simplified to dropping PRIMARY KEY constraints and indexes and adding them back afterwards. (Though that may fail if the PRIMARY KEY constraints were violated in parts of the data...) Not a bright idea...

> So, my idea was changing genxref to do this: instead of adding things to
> the DB one at a time, it would write out the statements to a file and at
> the end, import that file with indexing disabled.

COPY could be an alternative, but that requires a file being read directly by the PostgreSQL server. (Don't know how, or whether at all, this is implemented by the other DB backends.) Also, the \copy directive of psql is worth mentioning.

> Web performance improvement:
> I'm sure someone else has already thought of this, but right now we
> generate our HTML dynamically every time. It seems to me that this is a
> prime candidate for caching! Especially for backends that support
> annotate/blame etc. The annotations on a file won't change unless the
> contents of the file changes, for the most part (the other possibility
> is that the symbols in the database changes--as far as I can tell this
> shouldn't happen normally but if people are worried we can have genxref
> flush the cache).

Output shouldn't change at all :) Well, unless the templates get modified.
> We have a unique file id already, so we can cache the content using the
> file id with a typical span out to avoid any single directory being too
> large. We can compare creation time of the cached file vs. the source
> file to tell when it's out of date. We can use the access time to clean
> out old, unused cache entries if we want.
>
> Also, we can just cache the actual file content, and leave off the
> header information; the header info can be added dynamically when the
> user browses it. That way the same cached copy of a file can be used
> for different releases, if they all share that fileid.

Hopefully, everybody has a nice robots.txt to forbid Google et al. from indexing the whole thing, once...

> > Heck, this f*ing column name caused so much grief, let's just rename
> > it! Yes, that's somewhat painful and we need a Big Fat Warning in the
> > v1.0 docs that the column needs to be renamed, but I'm all for doing
> > that.
>
> I'd be happy with that. It's not actually such a big deal to fix this;
> you just need a series of ALTER TABLE operations. It's easy enough to
> add to the readme, or even write an update script.

I'm not sure how happy MySQL will be with its foreign keys...

> > > Actually there is an "ANSI_QUOTES" mode that we could set, that lets you
> > > (among other things) use standard "-quoting in MySQL. That might be a
> > > valid thing to do.
> >
> > Is this in the DBI backend or configury on the MySQL server side?
>
> It can be set globally or per-session. We'd use per-session obviously.
> It's set from the client side.
>
> I also discovered that there's a DBI method that quotes identifiers like
> this for you:
>
>     my $release = $dbh->quote_identifier('release');
>
> That would be the safest way to go, although it's annoyingly verbose.
> And, there's a DBI get_info() method that lets you ask about all kinds
> of features of the server, and one of those is the quoting character, so
> we could get that and use it instead of quotes.
>
> But, changing the name sounds good to me! :)

Let's just change the name. I don't think such a kludge is worth doing while an easy solution is available.

Regards, JBG

--
Jan-Benedict Glaw jb...@lu... +49-172-7608481
Signature of: Everything should be made as simple as possible.
the second : But not simpler. (Einstein)
|
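The bulk-loading idea discussed in this message (load the data first, build indexes afterwards) can be shown for a single table with a short sketch. It uses Python with sqlite3 as a stand-in for the MySQL and PostgreSQL backends; the `usages` table, its columns, and the generated rows are invented for the example, and no timing claims are made here.

```python
import sqlite3

# Made-up rows standing in for what an indexing run would produce.
rows = [("sym%d" % n, n % 50) for n in range(1000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usages (symname TEXT, fileid INTEGER)")

# Load everything inside one transaction while the table has no secondary index.
with conn:
    conn.executemany("INSERT INTO usages (symname, fileid) VALUES (?, ?)", rows)

# Build the index once, after the data is in place.
conn.execute("CREATE INDEX usages_fileid ON usages (fileid)")

count = conn.execute("SELECT COUNT(*) FROM usages").fetchone()[0]
in_file_7 = conn.execute(
    "SELECT COUNT(*) FROM usages WHERE fileid = 7"
).fetchone()[0]
```

Creating a plain index after loading cannot fail on the data the way re-adding a PRIMARY KEY or UNIQUE constraint can, which is the risk raised in the reply above.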
From: Paul S. <ps...@ne...> - 2007-03-23 15:12:07
|
On Thu, 2007-03-22 at 21:41 +0100, Jan-Benedict Glaw wrote:
> On Thu, 2007-03-22 16:12:23 -0400, Paul Smith <ps...@ne...> wrote:
> >
> > Genxref performance improvement:
> > Especially when adding a new release, genxref can be very slow. I think
> > it's because of all the indexing that goes on, and the DB re-indexes
> > after every insert. A common method of adding a lot of data to an SQL
> > DB is to put the commands into a file, then load the file all at once
> > with indexing disabled, then re-index everything at the end.
>
> The basic concept could be simplified to drop PRIMARY KEY constraints
> and indexes and add them back afterwards. (Though you may fail in case
> the PRIMARY KEY constraints were ignored in parts of the data...) Not
> a bright idea...

Yes, and there's an even more important problem I realized after I sent this: many of LXR's tables are indexed through auto-increment values. Of course we cannot know what values these will have until the table is loaded, so we can't really write out a file to load all that data at once.

The best we could do would be to update the tables one at a time, being sure to create the basic tables first then reading the key values out of them to create the next table, etc.

Seems overly complex, so this project is probably not worth it as things stand.

> > Web performance improvement:
> > I'm sure someone else has already thought of this, but right now we
> > generate our HTML dynamically every time. It seems to me that this is a
> > prime candidate for caching! Especially for backends that support
> > annotate/blame etc. The annotations on a file won't change unless the
> > contents of the file changes, for the most part (the other possibility
> > is that the symbols in the database changes--as far as I can tell this
> > shouldn't happen normally but if people are worried we can have genxref
> > flush the cache).
>
> Output shouldn't change at all :) Well, unless the templates get
> modified.
Hopefully most/all of those types of changes can be handled through CSS; that would be my preferred way to do it anyway (I think many already are).

I was thinking about symbols: if we re-index and some symbol that we previously didn't know about suddenly becomes available in the database, then any cached files that reference that symbol will not be updated to have a link. For static source trees this can't happen, of course, but LXR is also used to index trees that are still changing (a daily update of the HEAD of some stream/branch for example).

Admittedly it's pretty hard to think of how this could happen without the relevant source files changing as well, which would obviously flush the cache. The only way I can see it offhand is if a symbol which used to be contained outside the tree (and so not indexed) were suddenly moved inside. This is such a corner case I'm not sure it's worth catering to at the expense of much performance.

I was imagining how this might be implemented and I think that the first step is to create an LXR::File class (one object per file). That would make the caching interface and code much simpler.

This is definitely a post-1.0 thing.

> > I'd be happy with that. It's not actually such a big deal to fix this;
> > you just need a series of ALTER TABLE operations. It's easy enough to
> > add to the readme, or even write an update script.
>
> I'm not sure how happy MySQL will be with its foreign keys...

Hm. It definitely needs to be tested.

--
-----------------------------------------------------------------------------
Paul D. Smith <ps...@ne...> http://netezza.com
"Please remain calm--I may be mad, but I am a professional."--Mad Scientist
-----------------------------------------------------------------------------
These are my opinions--Netezza takes no responsibility for them.
|
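The table-at-a-time loading described earlier in this message (create the basic table first, read the generated keys back, then build the next table from them) might be sketched like this. It is an illustration in Python with sqlite3, and the `files` and `usages` tables, their columns, and the `found` data are hypothetical rather than LXR's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE files (fileid INTEGER PRIMARY KEY AUTOINCREMENT, filename TEXT UNIQUE);
    CREATE TABLE usages (fileid INTEGER REFERENCES files (fileid), line INTEGER);
    """
)

# Parsed data, keyed by filename because the ids do not exist yet.
found = {"a.c": [3, 17], "b.c": [42]}

with conn:
    # Phase 1: load the base table and let the database assign the ids.
    conn.executemany(
        "INSERT INTO files (filename) VALUES (?)", [(name,) for name in found]
    )
    # Phase 2: read the generated keys back and use them for the dependent table.
    ids = dict(conn.execute("SELECT filename, fileid FROM files"))
    conn.executemany(
        "INSERT INTO usages (fileid, line) VALUES (?, ?)",
        [(ids[name], line) for name, lines in found.items() for line in lines],
    )

usage_count = conn.execute("SELECT COUNT(*) FROM usages").fetchone()[0]
```

Even in this two-table toy the loader has to hold all parsed data until the keys are known, which gives some sense of why the message calls the approach overly complex for the full schema.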
From: Arne G. G. <ar...@gl...> - 2007-03-22 20:03:01
|
Jan-Benedict Glaw wrote:
> What may be different is the way to get the newly auto-generated
> number back again. But since the INSERT paths aren't time-critical,
> we'd just allow to SELECT for the value instead of playing tricks to
> get it.

While indexing is a batch job and as such not time-critical interactive-wise, there's a fair amount of inserts going on when you add a new release of a large project. I don't think the added simplicity is really worth de-optimizing this path. While it probably makes sense to add general versions to the DBI base class, I'd keep the specialized versions in the db-specific classes, at least for the larger tables.

--
Arne.
|
From: Arne G. G. <ar...@gl...> - 2007-03-22 00:45:38
|
Malcolm Box wrote:
> It's true we're woefully overdue for a release - CVS head is
> reasonably stable and could arguably be made v1.0. I've got a bunch
> of things I want to experiment with in the browsing experience, but
> they're post-1.0 features. And require time...

I, too, have a handful of experimental changes to LXR that are actually starting to mature into something that might be useful for others. Unfortunately (or not, as the case may be), since I find my own code from 10 years ago both a bit baroque and embarrassing, much of the framework in my tree has been refactored and reworked to such a state that it's not readily compatible with mainline anymore.

If there's interest, I'll put the code up somewhere for others to poke at. There might be stuff there that could be rolled back into mainline, and it might even be interesting as a development base for more experimental stuff.

--
Arne.
|