#214 cscope ignores symlinks to files

Milestone: None
Status: closed-rejected
Owner: nobody
Labels: None
Priority: 5
Updated: 2023-04-20
Created: 2007-04-05
Private: No

I'm trying to use cscope to index a source tree that contains symlinks to files located outside of the tree. cscope won't process these files, since dir.c uses lstat() and checks for S_ISREG, which will never be true of the symlink itself. I can make cscope do what I want by changing the lstat() to stat() in accessible_file().

Is the intent to avoid scanning the same file twice? In my case, that never happens, since the symlinks point outside the tree being scanned. Perhaps cscope could check if a symlink points to another file that it has or will index, rather than ignore all symlinks. Or if that's too much work, perhaps a command line argument could tell cscope to allow symlinks (use stat instead of lstat).
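The difference between the two calls described above can be sketched in a few lines of C. This is only an illustration under assumptions: `is_regular_file()` and its `follow_symlinks` flag are hypothetical names, not cscope's actual `accessible_file()`, and the flag stands in for whatever command line option might select the behavior.

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not cscope's real accessible_file().
 * With follow_symlinks == false it uses lstat(), which describes the
 * link itself, so S_ISREG() is false for every symlink.
 * With follow_symlinks == true it uses stat(), which describes the
 * file the link points to.
 */
bool is_regular_file(const char *path, bool follow_symlinks)
{
    struct stat st;
    int rc = follow_symlinks ? stat(path, &st) : lstat(path, &st);

    /* Both calls return 0 on success and -1 on failure. */
    return rc == 0 && S_ISREG(st.st_mode);
}
```

For a symlink that points at a regular file, `is_regular_file(path, false)` is false and `is_regular_file(path, true)` is true; for a plain regular file both calls agree.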

Discussion

  • Neil Horman

    Neil Horman - 2007-04-05

    Logged In: YES
    user_id=827328
    Originator: NO

    The intent is actually to avoid reading files that aren't in the source tree at all. Nominally a source tree is just that, a tree of sources, with no files existing outside of the root directory. Not having files inside the source tree has adverse effects on how cscope can handle source changes. If we were to use the stat function to detect this file, and that external source were to change or disappear, we would have no way to detect that. We could of course follow the link when we found one, and make the determination then with stat, but that only raises the question of how many links to support. In the end it's really easier and more predictable to require that all sources be below a single root, since 99% of projects do that anyway.

    If you have a build environment that really requires you to use symlinks (which is questionable to begin with), you do have a few options:

    1) Reorganize your source hierarchy so that, despite the symlink, you have a common root to run cscope from.
    i.e.:
    commonroot/path/to/in/tree/src_symlink (link to commonroot/externalsource/myfile.c)
    commonroot/externalsource/myfile.c
    With this method, cscope won't find src_symlink, but it will pick up myfile.c, and so you can find all its symbols, etc.

    2) Manually build your cscope.files file, and add by hand the paths from your root to your external sources. cscope should find them just fine then.

     
  • Scott M. Ferris

    Scott M. Ferris - 2007-04-05

    Logged In: YES
    user_id=40524
    Originator: YES

    I'm having trouble understanding why adding a level of indirection with a symlink affects the ability to determine if a file has changed or been removed. You can still get all of the stat information with stat(), and can read the file. You can even resolve the symlink with readlink(2), if you're really worried about it changing to point to a different file, though in my environment that will never happen. Even if it did change to point to a different file, I don't see how that would differ from radically changing the contents of a file in the tree, which I assume cscope can already handle.

    The build environment in question is a customized Linux kernel source tree. Rather than check in the entire kernel tree and repeatedly merge newer versions, we check in just the few files we modify, and have a Makefile target that unpacks a kernel tree from a tarball, and then replaces certain files with symlinks to the files in the version controlled directory. The kernel Makefile's cscope target builds cscope.files as it normally does, and contains pathnames that are in some cases symlinks outside of the kernel tree.

    I suppose I can work around the limitation by patching the kernel's Makefile to resolve the symlink when generating cscope.files, but since the limitation in cscope doesn't make any sense to me, I thought I'd see if cscope could be changed. From my perspective, it's an unnecessary limitation, and removing it would have no downside.
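The workaround of resolving symlinks while generating cscope.files could look roughly like the sketch below. It is an assumption-laden illustration, not part of the kernel Makefile or cscope: `resolve_file_list()` is a hypothetical helper, it assumes one plain path per line (no quoting, no option lines), and because realpath(3) returns canonical paths, every resolved entry becomes absolute.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical filter: reads one path per line from `in` and writes the
 * path with all symlinks resolved to `out`. Paths that cannot be
 * resolved (missing file, dangling link) are passed through unchanged.
 * Returns the number of paths that could not be resolved.
 */
int resolve_file_list(FILE *in, FILE *out)
{
    char line[4096];
    int unresolved = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* strip the trailing newline */
        if (line[0] == '\0')
            continue;                        /* skip empty lines */

        /* POSIX.1-2008: a NULL buffer makes realpath() allocate the result. */
        char *resolved = realpath(line, NULL);
        if (resolved != NULL) {
            fprintf(out, "%s\n", resolved);
            free(resolved);
        } else {
            fprintf(out, "%s\n", line);
            unresolved++;
        }
    }
    return unresolved;
}
```

Run as a filter between the step that lists the sources and the step that writes cscope.files, this would replace each symlinked entry with the path of the regular file it points to.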

     
  • Hans-Bernhard Broeker

    Logged In: YES
    user_id=27517
    Originator: NO

    > Is the intent to avoid scanning the same file twice?

    Not according to any document I could find.

    It's to avoid the impossible task of checking for infinitely recursive symlinks. At least that's what the only relevant ChangeLog entry says about it (by Petr, dated 2001-07-09).

    Another problem would be broken symlinks --- stat() doesn't work for those at all.

     
  • Nobody/Anonymous

    Logged In: NO

    Why not just add a command line option to give the user the choice of whether to follow symlinks or not?

    Evan

     
  • Neil Horman

    Neil Horman - 2008-04-08

    Logged In: YES
    user_id=827328
    Originator: NO

    I suppose we could, but honestly, I really don't see a lot of value there. The build environment you describe is really very unusual as far as I can see. The environment you are describing is very often created by distribution packagers and managed by packaging tools. Commonly, what's done by distributions using rpm, deb, etc., is that only patches to the base source version are committed to the SCM tool, along with a metadata file (the spec file in the case of RPM). The build tools then, when building the package, fetch the pristine source from upstream, expand it in a build root, and apply the patches in the order specified in the metadata file. That's a much more common (and sane) way of handling sparse changes.

    I suppose there isn't a lot of harm in providing an option to follow symlinks, but you're shooting yourself in the foot. If you wanted to write and provide a patch to offer such an option, I'd be willing to consider merging it.

     
  • Ken Weinert

    Ken Weinert - 2013-08-09

    I have another situation where following links would be useful.

    Our source tree is under Synergy on AIX and, unless you check it out, files are links.

    I'll try the lstat() -> stat() change above and keep in mind that we could be finding dragons.

     
  • Bsi Ice

    Bsi Ice - 2014-06-02

    I also need an option to follow links!
    It may not be of much value, but it does have value.

     
  • Neil Horman

    Neil Horman - 2014-06-02

    For unique situations in which you have a tree made up of nothing but symlinks, building your cscope database should be as simple as:
    cd /top/of/tree
    find . -type l > ./cscope.files
    cscope -b

    Put it in a script and it will just work every time.

     
  • Keith in Ottawa

    Keith in Ottawa - 2014-11-12

    I think that some of the replies are based on a faulty assumption. The original poster Scott wrote "since dir.c uses lstat() and checks for S_ISREG, which will never be true of the symlink itself", but this misunderstands the nature of lstat(). lstat() follows the link so the stat struct returned is that of the referenced file.

    That said, the "cannot find file" error messages can be misleading because inviewpath() (which calls accessible_file()) can't tell the difference between a missing file and a file which need not be indexed because it is a symbolic link, presumably to another file which will be indexed.

     
  • Hans-Bernhard Broeker

    • status: open --> closed-rejected
    • Group: -->
     
  • Yichao Yu

    Yichao Yu - 2020-08-28

    > but this misunderstands the nature of lstat(). lstat() follows the link so the stat struct returned is that of the referenced file.

    While there might be other reasons this was rejected, this statement is wrong. From the man page for fstatat on Linux:

    The lstat() function shall be equivalent to stat(), except when path refers to a
    symbolic link. In that case lstat() shall return information about the link, while
    stat() shall return information about the file the link references.

    So the original poster Scott is correct in his understanding of lstat(). Also, whether the symlinked files are indexed or not, the "file not found" error should at least be fixed...
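One way to tell a missing file apart from a symlink, so that the two cases could be reported differently, is to call lstat() first and stat() only for links. The sketch below is an assumption, not cscope code: `classify_path()` and the `path_kind` enum are hypothetical names, and it treats any lstat() failure as "missing" even though a permission error would also land there.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/* Hypothetical classification, not part of cscope. */
enum path_kind {
    PATH_MISSING,          /* lstat() failed: nothing usable at this name  */
    PATH_REGULAR,          /* a regular file, not a link                   */
    PATH_LINK_TO_REGULAR,  /* a symlink that resolves to a regular file    */
    PATH_BROKEN_LINK,      /* a symlink whose target cannot be stat()ed    */
    PATH_OTHER             /* directory, device, link to a directory, ...  */
};

enum path_kind classify_path(const char *path)
{
    struct stat st;

    /* lstat() reports on the name itself and never follows a final link. */
    if (lstat(path, &st) != 0)
        return PATH_MISSING;
    if (S_ISREG(st.st_mode))
        return PATH_REGULAR;
    if (!S_ISLNK(st.st_mode))
        return PATH_OTHER;

    /* It is a symlink: stat() follows it and fails for dangling links
     * (ENOENT) and for link loops (ELOOP). */
    if (stat(path, &st) != 0)
        return PATH_BROKEN_LINK;
    return S_ISREG(st.st_mode) ? PATH_LINK_TO_REGULAR : PATH_OTHER;
}
```

With a result like this, a caller could print "cannot find file" only for `PATH_MISSING` and a separate "skipping symbolic link" notice for the two link cases.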

     

    Last edit: Yichao Yu 2020-08-28
  • Mike Dubrovsky

    Mike Dubrovsky - 2023-04-20

    Can we add this as a new command line argument?
    Or, if the concern is "too many parameters" for not very common functionality, could we do it as an environment variable?
    I can post the diff if the gatekeepers agree to accept the change.

     

    Last edit: Mike Dubrovsky 2023-04-20
