Thread: [Lxr-dev] [ lxr-Bugs-2714744 ] Tagger & Plain.pm inefficient
Brought to you by:
ajlittoz
From: SourceForge.net <no...@so...> - 2009-03-26 15:43:23
Bugs item #2714744, was opened at 2009-03-26 15:43
Message generated for change (Tracker Item Submitted) made by mbox
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=390117&aid=2714744&group_id=27350

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: genxref
Group: current cvs
Status: Open
Resolution: None
Priority: 3
Private: No
Submitted By: Malcolm Box (mbox)
Assigned to: Malcolm Box (mbox)
Summary: Tagger & Plain.pm inefficient

Initial Comment:
Tagger.pm uses Files::tmpfile to get a file for indexing. In the plain file case, this is created by copying the existing file into a temporary file. After indexing, the temporary file is then deleted.

Should fix the interface from Tagger.pm to Files.pm to simply get a read-only reference to the file, and close it when no longer needed. Leave it up to Files.pm to worry about where/how it's on the disk.

Alternatively, on Unix systems Plain.pm could just create a hardlink and give that back to Tagger.pm, but this isn't portable.

----------------------------------------------------------------------
From: SourceForge.net <no...@so...> - 2012-03-29 19:06:14
Bugs item #2714744, was opened at 2009-03-26 08:43
Message generated for change (Settings changed) made by ajlittoz

Category: genxref
Group: current cvs
>Status: Closed
>Resolution: Fixed
Priority: 3
Private: No
Submitted By: Malcolm Box (mbox)
>Assigned to: Andre-Littoz (ajlittoz)
Summary: Tagger & Plain.pm inefficient

>Comment By: Andre-Littoz (ajlittoz)
Date: 2012-03-29 12:06

Message:
Files.pm interface changed as follows:
- method tmpfile is renamed to realfilename; it returns the name of a file containing the text of the requested file and version
- new method releaserealfilename tells Files.pm when the real filename is no longer needed

Plain.pm:
- realfilename returns the OS absolute path to the requested file version
- releaserealfilename does nothing

CVS.pm, BK.pm (this one on a guess, result not tested):
- realfilename makes a temp copy of the requested file version and returns the OS absolute path of the temp file
- releaserealfilename deletes the temp file

Git.pm: no fix because Git support is broken in CPAN library.

----------------------------------------------------------------------
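The acquire/release contract described in the 2012-03-29 comment can be sketched as follows. This is an illustration in Python, not LXR's Perl code: the class names `PlainFiles` and `TempCopyFiles`, the `fetch` callback, the `index_file` helper, and the `root/release/path` directory layout are all hypothetical. Only the method names `realfilename` and `releaserealfilename` and their described behaviour come from the thread.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class Files(ABC):
    """Hypothetical stand-in for the Files.pm interface described above."""

    @abstractmethod
    def realfilename(self, path: str, release: str) -> str:
        """Return the name of a file on disk holding the requested text."""

    @abstractmethod
    def releaserealfilename(self, filename: str) -> None:
        """Signal that the name returned by realfilename is no longer needed."""


class PlainFiles(Files):
    """Plain-directory backend: the file already exists, so nothing is copied."""

    def __init__(self, root: str) -> None:
        self.root = os.path.abspath(root)

    def realfilename(self, path: str, release: str) -> str:
        # Assumed layout: <root>/<release>/<path>
        return os.path.join(self.root, release, path.lstrip("/"))

    def releaserealfilename(self, filename: str) -> None:
        pass  # nothing to clean up


class TempCopyFiles(Files):
    """Backend in the style described for CVS.pm/BK.pm: temp copy, then delete."""

    def __init__(self, fetch) -> None:
        # fetch is a hypothetical callback: (path, release) -> bytes
        self.fetch = fetch

    def realfilename(self, path: str, release: str) -> str:
        fd, tmp = tempfile.mkstemp(prefix="lxr-")
        with os.fdopen(fd, "wb") as out:
            out.write(self.fetch(path, release))
        return tmp

    def releaserealfilename(self, filename: str) -> None:
        os.remove(filename)


def index_file(files: Files, path: str, release: str) -> int:
    """Caller side: acquire the name, read the file, always release the name."""
    name = files.realfilename(path, release)
    try:
        with open(name, "rb") as src:
            return sum(1 for _ in src)  # line count stands in for real indexing
    finally:
        files.releaserealfilename(name)
```

With this split, the caller never needs to know whether the name points at the original file or at a temporary copy, which is the separation the initial comment asks for.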
From: SourceForge.net <no...@so...> - 2012-04-06 17:29:29
Bugs item #2714744, was opened at 2009-03-26 08:43
Message generated for change (Comment added) made by ajlittoz

>Comment By: Andre-Littoz (ajlittoz)
Date: 2012-04-06 10:29

Message:
Made extensive tests and I'm a bit disappointed. The overall time improvement seems to be 10-15% only. It means either the Linux caching mechanism is very, very efficient or other factors dominate indexation time.

One of these could be --reindexall processing. On a moderately sized test (8 LXR releases), genxref takes 70 seconds (high-end computer, 4 GB memory, 3.4 GHz, fast SATA) for a first indexation (empty database) and 88 seconds with --reindexall. In both cases, fluctuation on run time is about ±1 second.

For Linux kernel 3.1 (more than 31,000 files), genxref takes 3 hours 53 minutes 58 seconds (single run, no deviation computed) with a fresh database while --reindexall gives 5 hours 30 minutes!!!

Unless you want to keep cross-references on other versions, it is much better to erase the database first than to use --reindexall.

Another examination shows that the ctags step is quite fast (ctags is compiled, not interpreted) while the references step is slow. Most of the time is spent in SimpleParse.pm's nextfrag subroutine. A real improvement would result from using a compiled parser, and probably all the more if it is a real finite state automaton parser instead of the surrogate regexp-based Perl-interpreted parser.

Also, what happens during the glimpseindex step is not clear: a line "This is glimpseindex version ..." is printed, then nothing more for a very long time (kernel case), as if glimpseindex was frozen (but it is not). This needs some explanation or caveat from the glimpse team.

What also puzzles me: I made tests on my old low-end laptop (512 MB memory, 650 MHz, standard PATA) with the LXR sample. The results do not reflect the clock ratio between the machines. I expected an extra penalty on the slow PC (less memory to hold the caches), yet it does relatively better: 3 minutes 58 seconds average against 1 minute 12 seconds (same sample). The run-time ratio is 3.3 for a clock ratio of 5.23.

All times used were 'real' results from time. I have not done the comparisons on 'user' and 'sys'.

----------------------------------------------------------------------
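The ratios in the 2012-04-06 comment can be checked with a little arithmetic. The sketch below uses only the figures quoted in that comment; the helper `seconds` and all variable names are illustrative, and Python is used purely as a calculator here.

```python
def seconds(hours: int = 0, minutes: int = 0, secs: int = 0) -> int:
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + secs


# Moderately sized test (8 LXR releases), high-end computer
small_fresh = seconds(secs=70)
small_reindex = seconds(secs=88)

# Linux kernel 3.1, as first reported in the comment
kernel_fresh = seconds(3, 53, 58)
kernel_reindex = seconds(5, 30)

# LXR sample on the old laptop versus the high-end machine
laptop = seconds(minutes=3, secs=58)
desktop = seconds(minutes=1, secs=12)

reindex_penalty_small = small_reindex / small_fresh
reindex_penalty_kernel = kernel_reindex / kernel_fresh
runtime_ratio = laptop / desktop
clock_ratio = 3400 / 650  # 3.4 GHz versus 650 MHz, both in MHz

print(f"--reindexall penalty, small test: {reindex_penalty_small:.2f}")   # 1.26
print(f"--reindexall penalty, kernel 3.1: {reindex_penalty_kernel:.2f}")  # 1.41
print(f"laptop/desktop run-time ratio:    {runtime_ratio:.2f}")           # 3.31
print(f"clock ratio:                      {clock_ratio:.2f}")             # 5.23
```

On these figures, --reindexall costs roughly 26% extra on the small test and roughly 41% extra on the kernel, and the laptop is about 3.3 times slower despite a clock about 5.2 times slower.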
From: SourceForge.net <no...@so...> - 2012-04-07 11:05:00
Bugs item #2714744, was opened at 2009-03-26 08:43
Message generated for change (Comment added) made by ajlittoz

>Comment By: Andre-Littoz (ajlittoz)
Date: 2012-04-07 04:05

Message:
Accurate time for 3.1 kernel is 5 hours 46 minutes 56 seconds.

----------------------------------------------------------------------