[htdig-dev] binary document-database format questions

Hello developers,

I've analyzed the binary document-database format and can now extract the information without using the textual database.
But there's one thing I couldn't figure out:

The ID byte for the length fields (excerpt length, document size, URL length) varies. Say we have a document size of less than 100h bytes.
Then the ID byte for that field has the value 44h, and the size needs only one byte.
If the size is 100h bytes or more (so it needs two or more bytes), the ID byte has the value 84h. What's the logic behind this? Is it only there to indicate the byte count for the size?
At the moment I've handled it using a switch/case statement.
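
A minimal sketch of such a switch/case in C, for illustration only. The helper name read_size_field and the macro names are made up for this mail. It also assumes, without confirmation, that 84h means exactly two size bytes stored low byte first; only the two ID values 44h and 84h come from what I have actually seen in the database.

```c
#include <stddef.h>
#include <stdint.h>

/* ID bytes observed for the document-size field. */
#define ID_SIZE_1BYTE 0x44  /* size below 100h, stored in one byte */
#define ID_SIZE_WIDE  0x84  /* size of 100h or more, stored in more than one byte */

/*
 * Hypothetical helper: decode one size field, where buf[0] is the ID byte.
 * ASSUMPTIONS (unconfirmed): 0x84 means exactly two size bytes,
 * stored low byte first (little-endian).
 * Returns the number of bytes consumed, or 0 for an unknown ID byte
 * or a buffer that is too short.
 */
size_t read_size_field(const uint8_t *buf, size_t len, uint32_t *out)
{
    if (len < 1)
        return 0;

    switch (buf[0]) {
    case ID_SIZE_1BYTE:
        if (len < 2)
            return 0;
        *out = buf[1];
        return 2;
    case ID_SIZE_WIDE:
        if (len < 3)
            return 0;
        *out = (uint32_t)buf[1] | ((uint32_t)buf[2] << 8);
        return 3;
    default:
        return 0;  /* ID byte not seen so far */
    }
}
```

Note that 44h and 84h differ only in their top two bits (40h versus 80h); the low bits are 04h in both cases. That would fit the idea that the high bits flag the width of the value, but this is a guess on my part.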

And why is the document size stored twice in the database?

Thanks in advance,

Walantis

--
l8r,
Walantis

http://www.xraw.de