
#7 Incremental update

closed
None
5
2004-02-18
2003-09-30
Kirc Doog
No

I run cscope on Red Hat 7.2 and Windows XP. The
Windows XP version of cscope was v15.4, built using a
fresh install of Cygwin. I use it through the provided
Emacs Lisp file with GNU Emacs. It works well and is
fast too, with one exception: if I have a cscope
database at the top of a large source tree and I
change a single file, cscope rebuilds its database
files from scratch. You can see this by running cscope
on a large source tree and monitoring the directory in
which the cscope database files live while searching for a
new symbol.

I can understand cscope having to do this when it cannot
know ahead of time which files have changed, since it
would then have to check each source file for being out
of date. That is not the case, however, when cscope is
driven from Emacs (or from any other editor, for that
matter). This one scenario, where the cscope user is
editing one or more source files, is exactly where cscope
needs to be fastest, and the way to get there is to avoid
rebuilding the entire database from scratch. cscope can
only do this if it is told about each and every file
update. The user could arrange that simply by adding code
to the .el file that catches every save and notifies
cscope, on the command line, of the change in state of
that file (it was deleted, it was modified, or a new file
was added). cscope could then update only the records in
its database that correspond to the changed file.

I propose that a new switch, call it --changed, be added to
the command line; it would take the file name and a keyword
indicating what type of change was made. The .el file could
then be modified to add a save-file hook function that
checks whether the file being saved is cscope-managed and,
if so, invokes cscope with the new --changed switch.
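To make the proposal concrete, here is a minimal Python sketch of what the receiving side of such a switch might do. It assumes a hypothetical in-memory index that keeps symbol records per file; `--changed`, the change keywords, `parse_changed` and `apply_change` are all hypothetical names taken from or invented for this proposal, not options or functions of any cscope release.

```python
import argparse

# Hypothetical change keywords from the proposal above.
CHANGE_KINDS = ("added", "modified", "deleted")


def parse_changed(argv):
    """Parse the proposed (hypothetical) `--changed FILE KIND` switch."""
    parser = argparse.ArgumentParser(prog="cscope-sketch")
    parser.add_argument("--changed", nargs=2, metavar=("FILE", "KIND"))
    args = parser.parse_args(argv)
    if args.changed is None:
        return None  # switch not given: nothing to update
    path, kind = args.changed
    if kind not in CHANGE_KINDS:
        parser.error(f"unknown change kind: {kind}")  # exits with status 2
    return path, kind


def apply_change(index, path, kind, parse_file):
    """Update only the records belonging to `path`.

    `index` maps file name -> list of symbol records (an assumed layout);
    `parse_file` is a hypothetical callback that re-scans one source file.
    """
    if kind == "deleted":
        index.pop(path, None)
    else:  # "added" or "modified": re-scan just this one file
        index[path] = parse_file(path)
    return index
```

The sketch only works because the assumed index can replace one file's records independently of the others.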

Discussion

  • Hans-Bernhard Broeker

    Logged In: YES
    user_id=27517

    cscope database updates are already incremental.
    cscope.out sections for files that haven't been modified are
    just copied over into the new cscope.out. But even copying
    takes some time.
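A simplified Python model of the incremental update described here, assuming the database can be treated as a sequence of per-file sections. The real cscope.out format and cscope's build code are more involved; `rebuild` and `parse_file` are hypothetical names.

```python
def rebuild(old_sections, file_list, changed, parse_file):
    """Build a new database from an old one, reusing unchanged sections.

    old_sections: dict mapping file name -> that file's section text
    file_list:    the files that belong in the new database, in order
    changed:      set of files that must be re-scanned
    parse_file:   hypothetical callback that scans one file into a section
    Returns the new database text and how many characters were only copied.
    """
    out = []
    copied = 0
    for name in file_list:
        if name in old_sections and name not in changed:
            section = old_sections[name]  # unchanged: copy verbatim
            copied += len(section)
        else:
            section = parse_file(name)  # new or modified: re-scan
        out.append(section)
    return "".join(out), copied
```

Even when only one file changed, every unchanged section is still written into the new output, so the cost grows with the size of the whole database rather than with the size of the edit.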

    Generally, even if it's being run from inside an editor,
    that doesn't mean cscope can make assumptions about which
    files are modified and which are not. It's not even clear
    that it can request a list of buffers from the editor.

    Generally speaking, this is what the -d switch is for: it
    avoids updates altogether until one is triggered manually.
    The VIM interface uses that switch, IIRC.
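For example, an editor front end might run queries against an existing database with -d and rebuild only on request. The sketch below builds the argument lists for both cases using standard cscope options: -d (do not update the cross-reference), -f (name of the cross-reference file), -L with -1 (a single line-oriented "find this global definition" search), -b (build only), -q (also build the inverted index) and -R (recurse into subdirectories). The helper names are hypothetical, and `run` assumes cscope is on the PATH.

```python
import subprocess


def query_cmd(symbol, reffile="cscope.out"):
    # -d: use the existing cross-reference without checking for updates
    # -L -1 SYMBOL: one line-oriented "find this global definition" search
    return ["cscope", "-d", "-f", reffile, "-L", "-1", symbol]


def rebuild_cmd(reffile="cscope.out"):
    # -b: build the cross-reference only; -q: also build the inverted
    # index; -R: recurse into subdirectories of the current directory
    return ["cscope", "-b", "-q", "-R", "-f", reffile]


def run(cmd):
    # Requires cscope on PATH; returns cscope's standard output as text.
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Queries stay cheap because they never touch the database build; the cost of a rebuild is paid only when the user asks for one.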

     
  • Kirc Doog

    Kirc Doog - 2004-02-07

    Logged In: YES
    user_id=877665

    Ok, I'd like to respond to that, but the SourceForge web page
    has no "Submit Followup". I feel silly about asking, but how
    do I make a followup to your followup?

     
  • Kirc Doog

    Kirc Doog - 2004-02-07

    Logged In: YES
    user_id=877665

    Ok, I get it. Attaching a comment is the same as submitting
    a followup. Well, that's cryptic. Oh well, now I know.

    I'll submit the followup, um, I mean comment, shortly.

     
  • Kirc Doog

    Kirc Doog - 2004-02-07

    Logged In: YES
    user_id=877665

    OK, you said that even copying takes time. I think that is
    the underlying problem here. There is something "wrong" if
    you have to copy the entire .out file just to update it for
    changes made to a single file. Consider that cscope will be
    run on millions of lines of source code; changing one file
    should not require copying the entire .out file only to update
    the sections belonging to that one file. With millions of
    lines of code, that is a huge .out file to copy for a small
    update, and the copying is what consumes most of the time.

    You are correct that cscope cannot make assumptions about
    which files are modified and which ones are not. The editor
    or an external program would have to notify cscope (via a
    command-line option) each time it saves a file. cscope could
    wait passively and then update the database only for
    that one file, ideally without copying the .out file.

     
  • Hans-Bernhard Broeker

    • assigned_to: nobody --> broeker
     
  • Hans-Bernhard Broeker

    Logged In: YES
    user_id=27517

    Copying of the database contents is a necessity dictated by
    its format: it's essentially a flat text file with no
    internal structure that would allow overwriting only parts
    of it. Users seem to have adapted their way of
    working to this fact quite nicely: they just don't rebuild
    the database all the time, but rather use -d mode,
    particularly when running cscope as a slave of some editor.
    They also generally trigger rebuilds only manually, when
    search results become too imprecise to work with, so
    the copy/rebuild time is worth spending.
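A toy illustration of why a flat file forces a copy, assuming variable-length per-file sections stored back to back: if one section grows or shrinks, every byte after it moves, so the only safe update is to write a complete new file (here to a temporary file that is then renamed over the old one). The layout and function names are illustrative, not cscope's actual file format.

```python
import os
import tempfile


def section_offsets(sections):
    """Start offset of each section in a file that stores them back to back."""
    offsets, pos = [], 0
    for s in sections:
        offsets.append(pos)
        pos += len(s)
    return offsets


def replace_section(path, sections, index, new_text):
    """Rewrite the whole file with one section replaced.

    Sections after `index` shift by the change in length, so they cannot
    stay where they were; everything is written out to a new file.
    """
    sections = list(sections)
    sections[index] = new_text
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write("".join(sections))
    os.replace(tmp, path)  # swap the rebuilt file in for the old one
    return sections
```

Growing the first section from 3 to 5 characters moves the start of every later section by 2, which is why an in-place overwrite is not possible in this layout.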

    What you're asking for is essentially to kill the core
    design element of the whole program, the data file format,
    and replace it with a full-blown database with
    in-place-replaceable records and an index. I've actually
    gone that way for a while, trying to replace the rather
    unmaintainable invlib.c module, but there are some rather
    serious drawbacks. For one thing, DB file sizes would
    increase significantly to accommodate the slack space the DB
    engine needs to manoeuvre. For another, cscope, which
    currently relies on no external library except curses, would
    become dependent on some DB subsystem the user may well not
    have installed.
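To make the size trade-off concrete, here is a toy model of a hypothetical record store that gives each file's section a slot rounded up to a whole number of fixed-size blocks, so a section can be rewritten in place as long as it still fits its slot. The 512-byte block size is an arbitrary illustration, not taken from any real DB engine, and the index such a store would also need is not modelled.

```python
def slot_size(length, block=512):
    """Round a section length up to whole blocks; the extra is slack space."""
    return max(block, -(-length // block) * block)


def store_sizes(section_lengths, block=512):
    """Compare a packed flat file with a slotted, in-place-updatable store."""
    flat = sum(section_lengths)
    slotted = sum(slot_size(n, block) for n in section_lengths)
    return flat, slotted


def fits_in_place(old_length, new_length, block=512):
    """An update can overwrite its slot only if it needs no bigger slot."""
    return slot_size(new_length, block) <= slot_size(old_length, block)
```

For sections of 100, 600 and 512 bytes the flat layout needs 1212 bytes while the slotted one needs 2048, and a section that outgrows its slot still has to be moved elsewhere.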

    As to letting the editor inform us which files were updated:
    sorry, but that's not a workable solution, even if we
    managed to pull it off. For one thing, there are just too many
    different editors out there, so we certainly can't do this
    for each of them on any realistic time budget. That would
    make it a half-baked solution at best. Second, and
    worse, the whole assumption that the editor actually knows
    which files have been modified since the last DB rebuild is
    flawed. Files may have been edited, or be in the process of
    being edited, by other users, using other editors, on other
    machines in the network. Files may have been changed by
    programs that aren't editors at all. The only instance
    that has a chance of telling us which files have been
    modified and which haven't is the filesystem. So cscope
    *has* to check the timestamps. They're the only somewhat
    reliable status indicator available, and checking them is
    exactly what it does already.
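A minimal sketch of a timestamp check of this kind, assuming the rule "a source file needs re-scanning if it is not in the old database or was modified after the database was written", plus detection of files that have dropped out of the file list. cscope's actual logic may differ in details; the function names are hypothetical.

```python
import os


def collect_mtimes(paths):
    """Ask the filesystem for each file's last-modification time."""
    return {p: os.stat(p).st_mtime for p in paths}


def files_needing_rescan(db_mtime, mtimes, old_files):
    """Decide which files must be re-scanned from timestamps alone.

    db_mtime:  modification time of the existing database, or None if absent
    mtimes:    {source file: modification time} for the current file list
    old_files: files recorded in the existing database
    Returns (stale, removed), both sorted.
    """
    removed = sorted(set(old_files) - set(mtimes))
    if db_mtime is None:
        return sorted(mtimes), removed  # no database yet: scan everything
    stale = sorted(
        f for f, t in mtimes.items()
        if f not in old_files or t > db_mtime
    )
    return stale, removed
```

In use, `db_mtime` would come from `os.stat("cscope.out").st_mtime` when that file exists. The check only needs the filesystem, so it catches edits made by any user, editor or program.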

     
  • Kirc Doog

    Kirc Doog - 2004-02-18

    Logged In: YES
    user_id=877665

    I hear what you are saying. There are some assumptions I
    was making that are not the same for other cscope users.
    I've concluded that my request for "incremental update" is
    not appropriate for cscope.

     
  • Hans-Bernhard Broeker

    • status: open --> closed
     