#13 Match lines pointing to wrong files in large database

closed
nobody
None
5
2007-10-01
2007-05-11
Jack Goldstein
No

Hi,

We've been successfully using cscope for a long time without major problems, but recently the size of our code base must have passed some threshold, and the results of searches are no longer correct. Specifically, the number of matches, function names, line numbers and code snippets are all right, but the file names are wrong, seeming to be almost random choices from within the code tree. The problem occurs in versions 15.5 and 15.6 on both Linux and AIX.

In case it helps, here are the sizes of some of the files, in bytes:

596061077 all
32665600 all.in
429231828 all.po
4246958 filelist

The filelist contains 68,576 files.

I'm hoping that this problem is not too hard to fix once the cause is found but finding the cause when I don't know the code is a bit of a challenge. Does anyone have any hints on which data structures or functions are likely culprits?

Thanks,
Jack

Discussion

  • Neil Horman
    2007-05-11

    Logged In: YES
    user_id=827328
    Originator: NO

    Those file sizes shouldn't be relevant. Can you check to make sure that all the files in your input list are C files, and that they are all readable? Non-readable files have resulted in misconstructed indices previously.

     
  • Logged In: YES
    user_id=27517
    Originator: NO

    The sheer size of the database shouldn't matter. Not so far below the first 'natural' boundary of 2^31 (LONG_MAX), anyway.

    So we'll need more detailed input: which was the exact straw that broke this camel's back? Important, likely boundary values might be 2^16 files, or 2^29 bytes in file 'all' --- even though the invlib data structures actually allow for 2^24 and 2^31, respectively.

    But be sure to follow Neil's remark, too: make 101% positively sure that all files you name in your namelist are actually readable, C source files. In a nutshell: if

    xargs file < filelist

    prints anything else but "C source text" or variations thereof, consider yourself in trouble.
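
    As a rough complement to the `file` check above, the sketch below flags name-list entries that are missing, are not regular files, or are not readable. The helper name `check_filelist` is hypothetical, and the sketch assumes one plain path per line with no quoting in the list.

    ```shell
    # Hypothetical helper: report suspicious entries in a cscope name list.
    # Assumes one path per line, with no quoting or escaping.
    check_filelist() {
        list=$1
        while IFS= read -r path; do
            if [ ! -e "$path" ]; then
                printf 'missing: %s\n' "$path"
            elif [ ! -f "$path" ]; then
                # Directories, sockets, and so on end up here.
                printf 'not a regular file: %s\n' "$path"
            elif [ ! -r "$path" ]; then
                printf 'unreadable: %s\n' "$path"
            fi
        done < "$list"
    }
    ```

    Running `check_filelist filelist` should print nothing for a clean list; any output points at an entry worth inspecting before rebuilding the database.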

     
  • Jack Goldstein
    2007-05-15

    Logged In: YES
    user_id=1790622
    Originator: YES

    Neil and Hans-Bernhard, thanks for the quick responses.

    We've been successfully using cscope to index Java files as well as C++ for a long time, so the program is not quite as sensitive as you've implied, but your assessment of the problem was right on. Someone created directories with names matching "*.java", and our find command was searching by name only. Adding "-type f" to it seems to have fixed the problem.
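
    For anyone hitting the same thing, here is a minimal sketch of a name-list command that skips such directories by using find's `-type f` test (regular files only). The helper name `build_filelist` and the set of extensions are assumptions for illustration, not our actual script.

    ```shell
    # Hypothetical helper: list source files under a tree for a cscope name list.
    # -type f keeps only regular files, so a directory named like "foo.java"
    # is skipped while the files inside it are still picked up.
    build_filelist() {
        find "$1" -type f \
            \( -name '*.c' -o -name '*.h' -o -name '*.cpp' -o -name '*.java' \) \
            -print
    }

    # Possible usage (paths are placeholders):
    #   build_filelist /path/to/src > filelist
    #   cscope -b -q -i filelist
    ```

    The parentheses around the `-name` tests matter: without them, `-type f` would bind only to the first pattern and directories could still match the others.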

     
    • status: open --> closed