On Fri, 8 Nov 2002, Lachlan Andrew wrote:
> Regarding the flags, I can see why it makes sense to store
> the information, but it doesn't need to be as a bit-field.
I do think it makes sense to have a bit field. Remember that we're not
just planning a database for HTML documents. Yes, some of the current bits
are exclusive, but I can imagine that some XML documents might want
combined bits, e.g.:
text to be indexed
Yes, some of the current flags could be in a lookup, but some
(e.g. FLAG_CAPITAL) are clearly a bitfield. I could also see some
situations where FLAG_AUTHOR and FLAG_KEYWORDS are combined, and
conceivably the parser should be smart enough to decide if FLAG_LINK_TEXT
and FLAG_URL should be combined, e.g.
Yes, you might argue these are somewhat contrived. But when we were first
planning the database format for 3.2, we considered that arbitrary
documents and XML might be included in a "3.2" release with user-defined
bits and field-restricted searching.
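
To make the combined-bits idea concrete, here is a minimal sketch of how independent flags can be OR-ed together and then tested in a field-restricted search. The flag names come from the discussion above; the bit values and the `matches_field` helper are assumptions for illustration only, not the actual ht://Dig definitions.

```python
# Hypothetical bit positions; the real constants may use different values.
FLAG_CAPITAL = 1 << 0
FLAG_AUTHOR = 1 << 1
FLAG_KEYWORDS = 1 << 2
FLAG_LINK_TEXT = 1 << 3
FLAG_URL = 1 << 4


def matches_field(word_flags, field_mask):
    """Return True if the word occurrence carries every bit in field_mask."""
    return (word_flags & field_mask) == field_mask


# A capitalised word that is both link text and part of a URL.
occurrence = FLAG_CAPITAL | FLAG_LINK_TEXT | FLAG_URL

# A search restricted to URLs sees it; a search restricted to authors does not.
in_url_search = matches_field(occurrence, FLAG_URL)
in_author_search = matches_field(occurrence, FLAG_AUTHOR)
```

Because each flag owns its own bit, any combination can be stored, and a restricted search is a single mask comparison.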
> can thank Mr Gates for that one... However, it could also
> be treated as "level 3 heading", unless it is already given
> extra weight somehow.
It is not given extra weight currently. Again, the catch would be with
field-restricted searches. If we treat things as a level-3 heading or
whatever, then we have to block searches restricted to that level, because
otherwise you'll get more than you asked for.
From: Lachlan Andrew <lha@ee...> - 2002-11-10 09:15:49
On Sat, 9 Nov 2002 02:55, Geoff Hutchison wrote:
> I do think it makes sense to have a bit field.
> I can imagine that some XML documents might want
> combined bits
> Yes, some of the current flags could be in a lookup, but
> some (e.g. FLAG_CAPITAL) are clearly a bitfield.
I wasn't meaning to suggest a lookup table in the sense of
"coding n flags in log(n) bits". I meant a table of
*combinations* of flags which are compatible. This can
still result in a moderate saving in bits, without losing
much flexibility at all. Even FLAG_CAPITAL may not make
sense in some contexts, such as for an email address or a
Of course, the "lookup table" approach can (in principle)
degenerate to a collection of pure bit fields if *all*
combinations are considered meaningful. That way it
provides the most generality for a given number of bits
(albeit at the expense of a huge table if the number of
bits is large).
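
A rough sketch of the table-of-combinations idea, for illustration: each stored code is an index into a list of flag combinations considered meaningful. The flag values, the contents of `COMPATIBLE`, and the helper names are all hypothetical; only the flag names come from the thread.

```python
# Hypothetical bit positions, used only to name the combinations.
FLAG_CAPITAL = 1 << 0
FLAG_AUTHOR = 1 << 1
FLAG_KEYWORDS = 1 << 2
FLAG_LINK_TEXT = 1 << 3
FLAG_URL = 1 << 4

# Assumed set of combinations that make sense together.
COMPATIBLE = [
    0,
    FLAG_CAPITAL,
    FLAG_AUTHOR,
    FLAG_AUTHOR | FLAG_CAPITAL,
    FLAG_KEYWORDS,
    FLAG_AUTHOR | FLAG_KEYWORDS,
    FLAG_LINK_TEXT | FLAG_URL,
]
CODE_OF = {flags: code for code, flags in enumerate(COMPATIBLE)}


def encode(flags):
    """Map a flag combination to its table index (KeyError if not listed)."""
    return CODE_OF[flags]


def decode(code):
    """Recover the full flag combination from a stored table index."""
    return COMPATIBLE[code]


def bits_needed(table):
    """Number of bits required to store any index into the table."""
    return max(1, (len(table) - 1).bit_length())


stored_bits = bits_needed(COMPATIBLE)            # 7 entries fit in 3 bits
degenerate_bits = bits_needed(list(range(32)))   # all 32 combinations: 5 bits
```

With seven listed combinations the code fits in 3 bits instead of 5; listing all 32 combinations brings it back to 5 bits, which is the degenerate pure bit-field case.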
Lachlan Andrew  Phone: +613 8344-3816  Fax: +613 8344-6678
Dept of Electrical and Electronic Engg
University of Melbourne, Victoria, 3010 AUSTRALIA
CRICOS Provider Code 00116K