I once more spent some time with my private, "structure-improved" version
of invlib.[ch] which I had originally started to get the 64bit problems
solved, and I guess it'd be a good idea to document the results
publically, in case anybody else ever has to try and understand what these
routines actually do. So I decided to just mail them around... please
excuse the bulk of it.
Attached, you'll find:
1) in_dump.c, po_dump.c
--> programs to dump .in.out and .po.out files into human-readable
2) My latest, modified invlib.c. Don't expect to be able to compile it
as-is with a normal cscope tree ---- this is for looking at, not for
Here's the core result: a copy of a documentation comment I put into
invlib.c that describes what all those strange binary data in the file are
supposed to do, and how they're meant to be used:
/* Description of the workings of this module and the files it writes:
* ---- the .in file (cscope.in.out or cscope.out.in):
* + The file header should be self-explanatory...
* + The file is subdivided into so-called "logical blocks".
* + SUPERFINGER is a table of strings (one long char with \0 between strings)
* + SUPERINT is an array of indices into SUPERFINGER, with entry #0 holding
* the number of entries.
* + SUPERINT and SUPERFINGER are sorted alphabetically.
* + In memory, SUPERINT and SUPERFINGER are separate, dynamically growing arrays.
* + In the file, they're concatenated into a single blob (and the indices in SUPERINT
* shifted to make them relative to the blob).
* + Each entry in SUPERINT/SUPERFINGER corresponds to one of the "logical block",
* + and its string is the first contained in that block.
* + A logical block consists of a header, followed by a large chunk of memory
* read as either an array of ENTRY structures, of chars, or of long integers.
* + The list of ENTRYs is again sorted alphabetically, and its length given in
* the header of the block.
* + Each ENTRY structure describes one string, or "term". The actual string is
* of size entry.size, and found in the char view of the block, at position
* + The ENTRY struct also contains information about the number of POSTINGs
* held for this string in the ".po" file.
* + At the next long-aligned address following the string belonging to an entry,
* the file position of the first of these POSTINGs in the ".po" file is stored.
* ---- the .po file (cscope.po.out or cscope.out.po)
* + This is just a long array of POSTING structs, which in turn hold file position
* indicators in the cscope.out file, and an index into the array of filenames found
* in cscope.out (and buffered into RAM).
* To find all uses of a query string via the inverted index, the steps are:
* find the largest i < SUPERINT such that:
* SUPERFINGER[SUPERINT[1+i]] <= querystring
* read logical block number 'i' from the file
* find the largest j < header.numitems such that:
* defining ENTRY e := entblk[j]:
* string(chrblk[e.offset], length e.size) <= querystring
* define k := (long integer directly behind chrblk[e.offset + e.size])
* seek to position 'k' in the .po file to 'k'
* for l := 0 .. (e.post-1):
* read POSTING 'p' from file
* check p.type to see if you wanted this type of occurence (#include, call, ...)
* if so:
* seek to position p.lineoffset in cscope.out
* read until double newline and print to screen
* seek to position p.fcnoffset in cscope.out for the name of the enclosing function
* p.fileindex tells the place of the filename in the list of source files
The majority of the pointer voodoo in the original version (some of it
still present in this modified version --- it's far from perfect) is due
to those memory areas being used simultaneously as arrays of different
In database-ish terms, what this all burns down to is that .po.out is a
table of records with 3 many-to-one relations to cscope.out and one 'type'
data field. And .in.out is a table of arbitrary-length strings with a
one-to-many relation to the .po.out table. .in.out has an index for the
"string" field, implemented as a two-stage binary search on sorted arrays.
(SUPERFINGER, and the ENTRY array).
I hope this sheds some light on this rather scary piece of code in case
anyone of you ever has to actually touch it...
Hans-Bernhard Broeker (broeker@...)
Even if all the snow were burnt, ashes would remain.