From: Geoff H. <ghu...@ws...> - 2002-09-04 16:24:38
On Wed, 4 Sep 2002, Walantis Giosis wrote:

> The ID bytes for length information (excerpt length, document size, URL
> length) vary. Say we have a document size of less than 100h bytes.
> Then the ID byte has the value 44h for that information. The size
> needs only one byte. If the size exceeds 100h bytes (it needs two or
> more bytes) then the ID byte has the value 84h. What's the logic
> behind this? Only to determine the byte count for the size? At the
> moment I've handled it using a switch/case statement.

Hans-Peter Nilsson rewrote the Serialize/Deserialize routines very
carefully, so I can't speak authoritatively. I think he was trying to
save as much space as possible. AFAICT, the ID byte carries a marker
saying how many bytes (sizeof() whatever) the value that follows takes.
Take a look at htcommon/DocumentRef.cc::Serialize() to see the code.

> And why is the document size information stored twice in the database?

They should be different. See htcommon/DocumentRef.[cc,h], which deals
with the document DB records. In particular, there's the text size of
the document and, optionally, the size of the document including all
images.

-- 
-Geoff Hutchison
Williams Students Online http://wso.williams.edu/
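One possible reading of the 44h/84h pattern in the thread, under assumptions that the thread does not confirm: the low six bits of the ID byte name the field (here a hypothetical document-size ID of 4), bit 40h means the value follows in one byte, bit 80h means it follows in two bytes, and neither bit means a full four-byte int. The following C++ sketch encodes and decodes fields that way. The names `addNum`, `readNum`, `Field` and the constants are hypothetical illustrations, not the actual identifiers in htcommon/DocumentRef.cc.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical constants, assumed to match the 40h/80h markers in the ID byte.
constexpr std::uint8_t kCharSizeMarker  = 0x40;  // value follows as 1 byte
constexpr std::uint8_t kShortSizeMarker = 0x80;  // value follows as 2 bytes
constexpr std::uint8_t kFieldMask       = 0x3F;  // low bits: field ID (assumption)

// Append field `id` with value `val`, using the smallest width that fits.
void addNum(std::string &out, std::uint8_t id, std::uint32_t val) {
    if (val < 0x100) {
        out.push_back(static_cast<char>(id | kCharSizeMarker));
        out.push_back(static_cast<char>(val & 0xFF));
    } else if (val < 0x10000) {
        out.push_back(static_cast<char>(id | kShortSizeMarker));
        out.push_back(static_cast<char>(val & 0xFF));         // low byte first
        out.push_back(static_cast<char>((val >> 8) & 0xFF));
    } else {
        out.push_back(static_cast<char>(id));                 // no marker: 4 bytes
        for (int i = 0; i < 4; ++i)
            out.push_back(static_cast<char>((val >> (8 * i)) & 0xFF));
    }
}

struct Field {
    std::uint8_t id;
    std::uint32_t value;
};

// Decode one field starting at `pos` and advance `pos` past it.
// The width comes from the marker bits, so no per-ID switch is needed.
Field readNum(const std::string &in, std::size_t &pos) {
    if (pos >= in.size()) throw std::out_of_range("truncated record");
    std::uint8_t tag = static_cast<std::uint8_t>(in[pos++]);
    std::size_t width = (tag & kCharSizeMarker)  ? 1
                      : (tag & kShortSizeMarker) ? 2
                                                 : 4;
    if (pos + width > in.size()) throw std::out_of_range("truncated value");
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < width; ++i)
        v |= static_cast<std::uint32_t>(static_cast<std::uint8_t>(in[pos + i]))
             << (8 * i);
    pos += width;
    return {static_cast<std::uint8_t>(tag & kFieldMask), v};
}
```

With a field ID of 4, `addNum` produces a 44h ID byte for sizes under 100h and 84h for sizes from 100h up to FFFFh, which matches the bytes described in the question. Masking the marker bits lets a reader pick the width without switching on every full ID value. The sketch writes multi-byte values low byte first; the real code may copy native-order bytes instead, so check Serialize() before relying on a byte order.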