#284 15.8a: Internal error: cannot get source line from database

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2016-10-21
Created: 2013-06-26
Creator: Jason8
Private: No

Hello,

I'm seeing the "Internal error" which is also mentioned in bug #274. Here is how I'm getting the error:

  • build cscope 15.8a using gcc 2.95.3 and GNU flex on Solaris 10 (I've also tried the Solaris Studio compiler, with similar results)
  • create files.txt, a list of source files (.cc and .h), with about 2000 files in the list
  • create a DB using: cscope -b -f cscope.db -q -i files.txt
  • do a lookup in the DB using: cscope -d -f cscope.db -q -L -8 header.h
  • get the error: Internal error: cannot get source line from database

Note that the error only happens for header.h -- the lookup works successfully for all of the other headers in my project.

I've found a couple ways to work around this:

  1. Don't use the -q flag when creating the DB
  2. Split files.txt into two lists (with about 1000 source files in each), and then create two DBs and do two lookups

With either of these methods, I don't see the error for header.h (or any other header file). The first method seems like the easier one, and the slowdown without the inverse DB isn't too painful.

I'm wondering if perhaps the inverse DB is deprecated or not often used?

I've tried running both sides (DB creation and DB lookup) through Purify and didn't find much. When creating the DB, Purify reports an "uninitialized memory read" in invmake (when calling fflush() after "write out junk to fill log blk"), so for fun I tried commenting out that code. Purify then no longer complains, but I still get the Internal error when doing a lookup for header.h. So this UMR is probably just a red herring.

Any other thoughts would be welcome. I notice in bug #274 there's a reference to a "patch mentioned in bug report #3528987", but I'm not sure where to find that patch.

Thanks!

-Jason

Discussion

  • Hans-Bernhard Broeker

    create files.txt, a list of source files (.cc and .h), with about 2000 files in the list

    That part alone already makes your report practically impossible to work on: I don't have those 2000 files, and since it appears more than half of them are needed to trigger the problem, there's just no sensible way I can investigate it.

    Note that the error only happens for header.h -- the lookup works successfully for all of the other headers in my project.

    Does a search for all headers at once (i.e. -L8 '.*') work, too?

    I'm wondering if perhaps the inverse DB is deprecated or not often used?

    No to both.

    I notice in bug #274 there's a reference to a "patch mentioned in bug report #3528987", but I'm not sure where to find that patch.

    Patch numbers were lost in SourceForge's site revamp. The whole bug tracker is new.

     

    Last edit: Hans-Bernhard Broeker 2013-06-26
    • Paul Bolle

      Paul Bolle - 2015-07-13

      0) An xref I recently built triggered the same error. I think I managed to pinpoint the issue (at least for that xref).

      1) The xref was built with an inverted index. So -L queries traverse cscope.in.out -> cscope.po.out -> cscope.out

      2) Essential to this bug is that you hit a posting in cscope.po.out that references a source line in the xref db (cscope.out) that starts exactly at an 8192-byte block boundary. I.e., the layout is like:
      "\n\n" [block boundary] "1234 foo bar baz"

      3) Somehow (I didn't check in detail) you'll enter putsource() with blockp pointing at "1".

      4) putsource() will now scan back to before the two newlines. Which means it will read in the previous block of the xref db. And then it'll do its paranoia check of whether we're really looking at a "\n\n[[:digit:]]+" fragment. (cscope's db must be pretty messed up if it isn't.) For some reason the end of the block we just read in now looks like:
      "\n\n\0\0"

      I.e., blockmark is still the NUL byte. The last byte of block is NUL by definition. See seekdb() and read_block().

      5) The paranoia check starts with
      *blockp != '\n'

      which is false, because *blockp == '\n'. Good.

      6) Then we check
      getrefchar() != '\n'

      which actually means
      (*(++blockp + 1) != '\0' ? *blockp : (read_block() != NULL ? *blockp : '\0'))

      *(++blockp + 1) == '\0' here. (Why does cscope skip the second '\n' here?) So we do a read_block(). blockp now points to the start of that new block, so *blockp now is '1'. So getrefchar() != '\n', and we will see
      postfatal("Internal error: cannot get source line from database");

      7) -ENOPATCH. The code is rather hairy here. I might try a patch, but that will take me at least a few days of work.

      (I'm too dumb to properly format the above. That should add to the fun when determining whether my analysis is correct.)
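      A minimal, standalone C model of steps 4) through 6) may make this easier to follow. To be clear, this is not cscope's code: BUFS, block, blockp, blockmark and the helpers below only mimic the behaviour described in this comment (a buffer terminated with blockmark followed by '\0', and a getrefchar()-style peek one byte ahead), so the real definitions may differ in detail.

      #include <stdio.h>
      #include <string.h>

      #define BUFS 8                          /* tiny block size, just for the demo */

      static char block[BUFS + 2];            /* data + blockmark + '\0' */
      static char *blockp;
      static char blockmark;                  /* '\0' unless something set it to '\n' */

      /* model of read_block(): pretend the next block starts with the line number */
      static char *read_next_block(void)
      {
          memcpy(block, "1234 foo", BUFS);
          block[BUFS] = blockmark;
          block[BUFS + 1] = '\0';
          blockp = block;
          return blockp;
      }

      /* model of getrefchar(): peek one byte ahead; if it is '\0', assume the
         block is exhausted and fetch the next one */
      static char getrefchar(void)
      {
          return *(++blockp + 1) != '\0' ? *blockp
                                         : (read_next_block() != NULL ? *blockp : '\0');
      }

      /* model of the paranoia check in putsource(): the previous block's data
         ends in the two newlines preceding the line number on the boundary */
      static int check(char mark)
      {
          blockmark = mark;
          memcpy(block, "int x;\n\n", BUFS);
          block[BUFS] = blockmark;
          block[BUFS + 1] = '\0';
          blockp = block + BUFS - 2;          /* first of the two newlines */

          if (*blockp != '\n' || getrefchar() != '\n')
              return 1;                       /* the "cannot get source line" path */
          return 0;
      }

      int main(void)
      {
          printf("blockmark == '\\0': %s\n", check('\0') ? "internal error" : "ok");
          printf("blockmark == '\\n': %s\n", check('\n') ? "internal error" : "ok");
          return 0;
      }

      With blockmark still NUL the peek-ahead sees '\0' one byte early, so the model calls read_next_block() and returns '1', which is exactly the failure path above; with blockmark set to '\n' it sees the block terminator instead and the check passes.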

       
      • Hans-Bernhard Broeker

        Unfortunately, it appears this recipe is still crucially incomplete for reproducing the problem. I tweaked a source file to fulfill conditions 0) through 2), but still couldn't get it to trigger the error message.

         
        • Paul Bolle

          Paul Bolle - 2015-07-24

          0) I noticed this on an (inverted) cscope DB for a recent Linux kernel (i.e., the three files involved weigh over one GB). So I really hope I can describe this in enough detail for you to reproduce.

          1) What seems to be involved in my case is the following:
          - the line number referenced by the first[*] entry for a term in the postings file starts at an 8192-byte block boundary, as I reported before;
          - the function referenced in that posting entry is zero, so it's a global term (otherwise the previous block, which would contain that function, would still be available in memory?);
          - the "mark" for that posting is not a '$' or a '~', so the posting doesn't reference a function definition or an include. (The problematic marks were either ' ', '#', 'e', 'g', 'm', 's', or 't'.)

          If the above is true the error triggers with
          cscope -d -L -0$term

          2) Does this make sense? Can you reproduce now?

          [*] I'm not sure this is actually relevant. I only tested the first posting for any term, just to limit the amount of data I had to handle.
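          For concreteness, the three conditions above could be written down roughly like this. This is only a sketch: the struct mirrors the field names in Jason's debugger dump further down in this thread rather than cscope's actual posting type, and 8192 is the block size assumed above.

          #include <stdbool.h>
          #include <stdio.h>

          struct posting_like {               /* stand-in for the real posting entry */
              long lineoffset;                /* offset of the referenced source line */
              long fcnoffset;                 /* 0 means the term is global */
              char type;                      /* the posting's mark character */
          };

          /* true if the posting matches the failing pattern described above */
          static bool triggers_error(const struct posting_like *p)
          {
              return p->lineoffset % 8192 == 0    /* line starts on a block boundary */
                  && p->fcnoffset == 0            /* global term */
                  && p->type != '$'               /* not a function definition... */
                  && p->type != '~';              /* ...and not an include */
          }

          int main(void)
          {
              struct posting_like p = { 8192 * 555, 0, 'g' };  /* made-up example values */
              printf("%s\n", triggers_error(&p) ? "would trigger" : "would not trigger");
              return 0;
          }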
           
          • Paul Bolle

            Paul Bolle - 2015-07-24

            0) I've been trying to pinpoint things further. Things seem to boil down to this:

            1) The non-failing terms (that also have their first line start at a block boundary in my tests!) pass these function calls:
            find_symbol_or_assignment()
            putpostingref()
            fetch_string_from_dbase() /* because either p->fcnoffset == 0 && p->type == FCNDEF
                                         or p->fcnoffset != lastfcnoffset */
            putref()
            putsource()

            The call to fetch_string_from_dbase() contains setmark('\n'). This sets blockmark to '\n'. So when putsource() is entered blockmark will be '\n'. And that will, by what looks like pure chance, make the test for two newlines preceding the line number work. (This test should check the last two characters of the preceding block. But it actually also looks at the blockmark. Why is that?)

            The failing terms (remember that they have their first line start at a block boundary) lack the calls to fetch_string_from_dbase() because they're not a FCNDEF (the mark character isn't '$') and they're global (fcnoffset == 0). So when putsource() is entered, blockmark is still NUL.

            2) Tested this under gdb with a watch on blockmark and a breakpoint on putsource(). Setting blockmark to '\n' when one hits the putsource() breakpoint makes the error disappear. QED.

            3) Agree?

             
  • Jason8

    Jason8 - 2013-06-26

    That part alone already makes your report practically impossible to work on

    Yes, I understand. If I could replicate with a smaller set of files I would, but it seems to be something that's triggered by a large number of input files.

    Does a search for all headers at once (i.e. -L8 '.*') work, too?

    Just tried this, it works -- it finds 5 instances of header.h, and no errors.

    I wouldn't think it's a problem parsing my source files, since that process is successful (using the -q flag) when the filelist is broken up into two smaller lists. And since the all-header query above was successful, this seems to imply that the DB (and .in and .po) are correct, is that right? So perhaps it's a problem with the query, but why would only header.h cause the problem?

    I'm debugging the query right now, but haven't got very far. The stack is:

    =>[1] putsource()
    [2] putref()
    [3] putpostingref()
    [4] findinclude()
    [5] search()
    [6] main()

    In putsource(), it goes into the "read the previous block" code when running for header.h. At this point, I have blocknumber = 4441. Going up to putpostingref(), I see:

    *p = {
    lineoffset = 4547584
    fcnoffset = 0
    fileindex = 912
    type = 126
    }

    So it looks like block 4441 is the last block (4441 * 1024 = 4547584). Could this be related?

    Thanks for the quick response,

    -Jason

     
  • Bill Lash

    Bill Lash - 2016-10-21

    I think I have tracked down the problem based on Paul's analysis above. The basic issue is that, in the case of doing searches in the inverted database, the "blockmark" global variable was not being set using the setmark macro before doing the search. This leads to read_block() reading in a block and putting two '\0' bytes at the end instead of only one (normally it ends in the value of blockmark followed by a '\0'). This in turn leads to the getrefchar macro skipping the last character of the block, since it sees *(++blockp + 1) == '\0' at the next-to-last character in the buffer.

    The good news is that the fix is easy, just add a setmark('\n'); in find_symbol_or_assignment() before doing the search in the inverted database.
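    To illustrate the effect, here is a standalone sketch (not cscope's source; the setmark definition below only captures the behaviour described in this comment, namely that read_block terminates the buffer with the value of blockmark followed by a '\0'):

    #include <stdio.h>
    #include <string.h>

    #define BUFS 8                       /* tiny stand-in for the real block size */

    static char block[BUFS + 2];
    static char blockmark;               /* '\0' until setmark() is used */

    #define setmark(c) (blockmark = (c))

    /* model of read_block(): terminate with blockmark followed by '\0' */
    static void model_read_block(const char *data)
    {
        memcpy(block, data, BUFS);
        block[BUFS] = blockmark;
        block[BUFS + 1] = '\0';
    }

    int main(void)
    {
        model_read_block("int x;\n\n");
        printf("without setmark: %02x %02x\n",
               (unsigned char)block[BUFS], (unsigned char)block[BUFS + 1]);   /* 00 00 */

        setmark('\n');                   /* the analogue of the proposed one-line fix */
        model_read_block("int x;\n\n");
        printf("with setmark:    %02x %02x\n",
               (unsigned char)block[BUFS], (unsigned char)block[BUFS + 1]);   /* 0a 00 */
        return 0;
    }

    With the two trailing NULs, the getrefchar() peek-ahead sees '\0' one byte early and drops the last character of the block; with a real blockmark followed by a single '\0' it does not, which is why the query no longer dies.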

    I created a C file that can be used to show the problem, but there is one issue with it. Since the cscope.out file stores the path to the directory where it was built, you have to edit the source file until you get the bug line to start at character 8192. I did this by trial and error. I'll attach the file and describe the process to re-create the error:

    cscope -bcqk cstest.c

    In cscope.out, find where:

    118 int
    gbug_line
    ;

    starts. I did this by removing the "118 int" line and everything after it, and running wc on the result. If it was off by a few bytes, I edited the source until wc read 8192. After that, I created the database again, and did a

    cscope -d -L0bug_line

    and got the failure.

     
  • Bill Lash

    Bill Lash - 2016-10-21

    Oops, forgot the cstest.c file

     
