Hello, everybody,
I once more spent some time with my private, "structure-improved" version
of invlib.[ch] which I had originally started to get the 64bit problems
solved, and I guess it'd be a good idea to document the results
publically, in case anybody else ever has to try and understand what these
routines actually do. So I decided to just mail them around... please
excuse the bulk of it.
Attached, you'll find:
1) in_dump.c, po_dump.c
--> programs to dump .in.out and .po.out files into human-readable
format
2) My latest, modified invlib.c. Don't expect to be able to compile it
as-is with a normal cscope tree ---- this is for looking at, not for
using ;->
Here's the core result: a copy of a documentation comment I put into
invlib.c that describes what all those strange binary data in the file are
supposed to do, and how they're meant to be used:
/* Description of the workings of this module and the files it writes:
*
* ---- the .in file (cscope.in.out or cscope.out.in):
* + The file header should be self-explanatory...
* + The file is subdivided into so-called "logical blocks".
* + SUPERFINGER is a table of strings (one long char[] with \0 between strings)
* + SUPERINT is an array of indices into SUPERFINGER, with entry #0 holding
* the number of entries.
* + SUPERINT and SUPERFINGER are sorted alphabetically.
* + In memory, SUPERINT and SUPERFINGER are separate, dynamically growing arrays.
* + In the file, they're concatenated into a single blob (and the indices in SUPERINT
* shifted to make them relative to the blob).
* + Each entry in SUPERINT/SUPERFINGER corresponds to one of the "logical block",
* + and its string is the first contained in that block.
*
* + A logical block consists of a header, followed by a large chunk of memory
* read as either an array of ENTRY structures, of chars, or of long integers.
* + The list of ENTRYs is again sorted alphabetically, and its length given in
* the header of the block.
* + Each ENTRY structure describes one string, or "term". The actual string is
* of size entry.size, and found in the char[] view of the block, at position
* entry.offset
* + The ENTRY struct also contains information about the number of POSTINGs
* held for this string in the ".po" file.
* + At the next long-aligned address following the string belonging to an entry,
* the file position of the first of these POSTINGs in the ".po" file is stored.
*
* ---- the .po file (cscope.po.out or cscope.out.po)
* + This is just a long array of POSTING structs, which in turn hold file position
* indicators in the cscope.out file, and an index into the array of filenames found
* in cscope.out (and buffered into RAM).
*
* To find all uses of a query string via the inverted index, the steps are:
*
* find the largest i < SUPERINT[0] such that:
* SUPERFINGER[SUPERINT[1+i]] <= querystring
* read logical block number 'i' from the file
* find the largest j < header.numitems such that:
* defining ENTRY e := entblk[j]:
* string(chrblk[e.offset], length e.size) <= querystring
* define k := (long integer directly behind chrblk[e.offset + e.size])
* seek to position 'k' in the .po file to 'k'
* for l := 0 .. (e.post-1):
* read POSTING 'p' from file
* check p.type to see if you wanted this type of occurence (#include, call, ...)
* if so:
* seek to position p.lineoffset in cscope.out
* read until double newline and print to screen
* seek to position p.fcnoffset in cscope.out for the name of the enclosing function
* p.fileindex tells the place of the filename in the list of source files
*/
The majority of the pointer voodoo in the original version (some of it
still present in this modified version --- it's far from perfect) is due
to those memory areas being used simultaneously as arrays of different
types.
In database-ish terms, what this all burns down to is that .po.out is a
table of records with 3 many-to-one relations to cscope.out and one 'type'
data field. And .in.out is a table of arbitrary-length strings with a
one-to-many relation to the .po.out table. .in.out has an index for the
"string" field, implemented as a two-stage binary search on sorted arrays.
(SUPERFINGER, and the ENTRY array).
I hope this sheds some light on this rather scary piece of code in case
anyone of you ever has to actually touch it...
--
Hans-Bernhard Broeker (broeker@...)
Even if all the snow were burnt, ashes would remain.
|