I have been using 0.9.6 with good results on several Fedora versions. I could run a full genxref on several projects and versions and it would take only a few minutes. This held until January 2012 on Fedora 16. Then at some point an update to F16 caused genxref to take many hours to complete. Just adding one version of a single project took over an hour.
I am sure nothing has changed in my version of genxref. Also, mysql has not changed; only the kernel, I think. However, I don't know how to tell where the slowdown is coming from. Looking with top, it appears that mysql and genxref are running at 1 or 2% CPU with little memory usage. However, the system appears to be waiting for I/O about 33% of the time. The disk LED is on almost continuously, but little disk access noise is heard.
Running strace -p <genxref-pid>, I see lots of write, read and poll system calls. It's almost as if what used to be handled with larger reads/writes is now occurring in many smaller segments.
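The I/O wait share that top reports can also be sampled directly. This is a minimal sketch, assuming the Linux /proc/stat layout in which the fifth counter on the aggregate cpu line is iowait. The function names are illustrative only and are not part of LXR.

```python
import time


def parse_cpu_line(line):
    """Return the numeric counters of the aggregate 'cpu' line from /proc/stat."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in parts[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    a = parse_cpu_line(before)
    b = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(a, b)]
    # Sum only user..steal (first 8 counters); guest time is already
    # accounted for inside the user counter.
    total = sum(deltas[:8])
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def measure(interval=5.0, path="/proc/stat"):
    """Sample /proc/stat twice, 'interval' seconds apart (Linux only)."""
    with open(path) as f:
        first = f.readline()
    time.sleep(interval)
    with open(path) as f:
        second = f.readline()
    return iowait_percent(first, second)
```

Calling measure() while genxref is running gives a figure comparable to the "wa" value shown by top.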
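To check whether transfers really have become smaller, the strace output can be saved to a file (strace accepts -o <file>) and summarized. The sketch below assumes the usual strace line format "name(args) = return" and only counts bytes for read and write. The function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches e.g.  write(4, "..."..., 78) = 78
# and, when strace -f is used,  [pid 1234] read(4, ...) = 11
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(.*\)\s+=\s+(-?\d+)")


def summarize(lines):
    """Count calls per syscall and total bytes moved by read/write."""
    stats = defaultdict(lambda: {"calls": 0, "bytes": 0})
    for line in lines:
        m = SYSCALL_RE.match(line)
        if not m:
            continue  # signals, unfinished calls, etc.
        name, ret = m.group(1), int(m.group(2))
        stats[name]["calls"] += 1
        # A positive return value of read/write is the byte count transferred.
        if name in ("read", "write") and ret > 0:
            stats[name]["bytes"] += ret
    return dict(stats)
```

Dividing "bytes" by "calls" for read and write gives an average transfer size that can be compared between a fast machine and a slow one.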
In any case, genxref eventually finishes and lxr works fine when browsing code.
I am running F16 too with very frequent updates. My LXR version is 0.11-beta (not publicly available yet) but the figures below would be the same for 0.10.2.
On my high-end computer (3.3 GHz i5, 4 cores, 4 GB memory, fast I/O), indexing the 3.1 Linux kernel with full Glimpse free-text search ability took 3 h 43 min. I have no yardstick to compare it to. That was only a trial to check the incprefix/maps include feature with a tree needing more than one variable (namely $v and $a). Is this your order of magnitude?
Indexing an LXR tree with versions 0.9.8, 0.9.9, 0.10, current and LXRng (I know, it is not the same, not a member of the family, but it is convenient to see the differences) takes 4.5 s. When I process the same LXR tree on my old laptop (650 MHz PIII, 512 MB memory, standard PATA I/O), it takes 39 s (which is roughly consistent with the clock ratio plus some swapping), but I didn't upgrade it to F16; it is still under F14.
Could you give some information on your project? Global size either as number of lines or MB, total number of files and directories (even approximate figures will do), language(s), free-text search engine, …
I had a quick check on 0.9.6 in CVS (I am not familiar with this version because I took over afterwards). It was issued after the major incompatibility issue in MySQL (RELEASE becoming a keyword while it was used as a table field identifier in the schema), so it should be safe.
I looked into my yum log. Though it does not go back long (because of frequent updates, several times a week), the latest MySQL update was on Feb. 9th from 5.5.19 to 5.5.20. I seem to remember there were other updates since I installed F16.
Thanks for the reply. Here's what I originally wrote regarding this on Feb 5:
This actually got slow before the last mysql update; however, I mistakenly implied in the above post that I thought it had been updated during the previous week. On the weekend before (about Jan 29) genxref worked at normal speed on F16.
I tried to roll back with yum but was unable to, due to a library update glitch Fedora introduced during that previous week. I also booted with the oldest kernel still on the grub list but it had no effect on the speed.
I don't know how many lines my projects (versions in LXR parlance) are. Some are small, others are fairly large, but none alone is the size of the Linux kernel. But I also see that on F14 on another machine they still, as a group, index in a few minutes, whereas on F16 it (now) takes hours per project and the whole set of projects is an overnight or longer activity.
Here's the combined size of my "lxr projects". However, several of these I am no longer indexing and some may contain binary files:
12247 files, 2044 sub-folders
When I was indexing all of this, a full genxref took maybe 1/2 hour max…?
I have always been using glimpse for text search.
What puzzles me is that your project is of the same order of size as a kernel version (for 3.1: 414 MB, 37086 files, 2285 subdirectories). As I mentioned, it took 3 hours 43 minutes, which is the order of magnitude of your time. I have no older figure than this one, taken when I installed a kernel to check the include feature involving file name twiddling with the $a variable.
Since you installed 0.9.6 and I work on 0.11-beta, recent changes cannot be blamed.
Genxref works in 3 passes:
1- Full tree handed over to glimpse (no LXR part involved here, only internal glimpse processing with its own database)
2- recursive descent through source tree to collect symbol definitions
LXR is a wrapper around ctags which processes the files one after the other. ctags output is then stuffed into MySQL.
3- recursive descent through source tree to collect symbol uses
Source files are scanned by a parser written in Perl, which is not a speed monster. I did a deep overhaul on it in 0.9.8 because there was a flaw in it leading to incorrect parsing in rather common cases. I think it brought neither improvement (though it should have, but some fail-safe tests are a bit expensive) nor slowdown. The symbol occurrences are also stuffed into MySQL.
The parser is the same as the one used when scanning a file before display. It is quite difficult to notice performance impact on a single file, thus I doubt something can be discovered from profiling only file display ("source" script).
I am interested in all your findings. For the time being, do not upgrade to 0.10.2. Wait till I issue 0.11 (or till I can send you a frozen 0.11-beta) because installation is partially automated in this new release (you can have a flavour of it through the beta manual at http://lxr.sf.net/LXRUserManual-beta.pdf - caution ~6 MB! - or through the "hidden" installation instructions at http://lxr.sf.net/0-11-InstallSteps/0-11-install.shtml)
I am not sure whether you are saying you see a slowdown with F16 too. In any case, step 1 that you mention is still fast; step 2 is not too bad but slower now than I remember. But step 3 is the one that really crawls.
I am tempted to install F15 on a spare drive and try it there if I had the time :). Previously I could clear the database with the initdb-mysql command and do a genxref http://myurl -allversions and it would take maybe 1/2 hour max (it still does on the other machine with F14).
Haven't noticed any problems with 0.9.6 that you mention.
I was not clear, that's right. I didn't notice any slowdown because I usually genxref small projects (presently mainly LXR itself). My only big source tree was the 3.1 Linux kernel, which I indexed only once to have a test case for the 'incprefix'/'maps' features, thus I can't compare. I asked another user, who answered that genxref'ing a kernel takes a bit more than 3 hours (without precise timing) on a computer similar to mine with the 0.10.2 release. That computer runs CentOS in vmware8 under win7 ultimate. Virtualisation should slow down execution. Since the order of magnitude is the same, I am tempted to consider that the regular time for a kernel version is 3-4 hours on that type of configuration.
Concerning the 3 passes:
1- glimpse global: should be quite fast because there is no individual file detailed analysis
2- ctags dictionary build: quite fast too since only declarations are looked for. It does not use the Perl-interpreted parser; consequently there should not be any difference between LXR releases. To my knowledge, ctags has not issued new releases for years now. This step should not incur any slowdown on LXR's side.
3- LXR references: the slowest of the three since every code line must be scanned for symbol use
Go ahead with the F15 test. It will use older versions of the kernel, Perl and MySQL at least.
I never got around to trying an F15 test but have just stayed with F16. However, I needed to add another project to lxr and found what was causing the extreme slowdown in indexing. It seems it is caused by the database engine that mysql defaults to, which currently is InnoDB. I don't understand all the details but InnoDB places all data in one *huge* file at /var/lib/mysql/ibdata1. It seems that when genxref is running, this file is continually being read and written, which keeps disk activity very high while genxref and mysql are almost idle waiting for I/O.
Previously, mysql used an engine called MyISAM that creates a separate file for each table and does the indexing much faster, since one large file is not being accessed but, instead, several smaller ones dedicated to lxr. So if I add a "default-storage-engine=myISAM" line to the appropriate section of /etc/my.cnf, stop and start mysqld and run genxref, it indexes fast again. (Disk activity is low, genxref/mysql use more CPU cycles, and what took hours with InnoDB takes a few minutes with the MyISAM engine.)
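The option-file change described above can be sketched as a small text transformation. This is only an illustration: the post does not name the section, so placing the option under [mysqld] (where MySQL reads server options) is an assumption here, and the function name is hypothetical. It works on the file contents as a string and leaves reading and writing the actual file to the caller.

```python
def set_default_engine(config_text, engine="MyISAM"):
    """Return config_text with default-storage-engine set under [mysqld]."""
    setting = "default-storage-engine=%s" % engine
    out = []
    in_mysqld = False
    done = False
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Section header: remember whether we are inside [mysqld].
            in_mysqld = stripped.lower() == "[mysqld]"
            out.append(line)
            if in_mysqld and not done:
                out.append(setting)
                done = True
            continue
        # Drop an older value of the same option (dashes and underscores
        # are interchangeable in MySQL option names).
        name = stripped.replace("_", "-").lower()
        if in_mysqld and name.startswith("default-storage-engine"):
            continue
        out.append(line)
    if not done:
        # Assumption: create the section when the file has none.
        out.extend(["[mysqld]", setting])
    return "\n".join(out) + "\n"
```

As in the post, mysqld has to be restarted for the new default to apply, and it only affects tables created afterwards.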
Thanks a lot for having discovered the difference between MySQL versions.
But the single file is not the full answer to the issue. In the MySQL manual, I found a system variable "innodb_file_per_table" which, if non-zero, creates one file per table instead of stuffing everything in the global HUGE file at /var/lib/mysql/ibdata1. I ran a test and it DID NOT change indexing time.
So I read more thoroughly the differences between MyISAM and InnoDB (which is highly recommended). There are new features in the latter (mainly related to concurrent write access with row locks instead of table or file lock). In my opinion, these features are unnecessary in LXR where indexing is done once (or at least rarely) and reads are very very frequent.
I reran a test with MyISAM and indexing time is cut to half the InnoDB value.
I found a way to have tables stored in MyISAM files without the need to change the default /etc/my.cnf: I specify engine=myisam in each table description. The advantage is that the change is not lost when the mysql package is upgraded.
The mysql template has been modified. I am sending you a beta version for testing purposes.
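The idea of naming the engine in each table description can be illustrated by a small script that appends an ENGINE clause to every CREATE TABLE statement of an SQL template. This is a hedged sketch, not the actual change made to the LXR template: the function name is hypothetical, and it assumes no semicolon occurs inside a statement.

```python
import re

# One CREATE TABLE statement, up to and including its terminating semicolon.
STATEMENT_RE = re.compile(r"create\s+table\b[^;]*;", re.IGNORECASE)
ENGINE_RE = re.compile(r"\bengine\s*=", re.IGNORECASE)


def force_engine(sql_text, engine="MyISAM"):
    """Append ENGINE=<engine> to CREATE TABLE statements that lack one."""
    def fix(match):
        stmt = match.group(0)
        if ENGINE_RE.search(stmt):
            return stmt  # an engine is already specified; leave it alone
        # Strip the trailing semicolon, add the table option, restore it.
        return stmt[:-1].rstrip() + " ENGINE=%s;" % engine
    return STATEMENT_RE.sub(fix, sql_text)
```

Because the engine is stated in the schema itself, the result does not depend on the server default or on the option file surviving a package upgrade.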
I also tried the "innodb_file_per_table" option and, even though separate files are created for each lxr table, it still wrote to that big file and still seemed sluggish.
Yes, I think putting the myisam option on each table definition is a good idea since it doesn't require a change in the default my.cnf. Also, since lxr is basically a "read-only" database, the advantages of InnoDB are probably not significant.
With MyISAM, I see a huge difference in indexing speed: maybe 20 minutes to index all my projects vs. overnight or longer with InnoDB.
Thanks for sending the beta. I will try to see how it does but I can't promise exactly when :)