From: Geoff H. <ghu...@ws...> - 2002-11-08 05:10:06
On Thursday, November 7, 2002, at 10:12 PM, Gilles Detillieux wrote:

> I think there are some cases where that's true, but not necessarily in all
> cases, so I don't know how much you can optimize this. E.g., for certain
> keyword tags we allow the form <meta foo="bar">, but the configurable
> keyword names must be of the form <meta name="foo" content="bar">.
> I don't know that we'd want to fully generalize this, but I'm open to
> suggestions/recommendations from others.

Keep in mind that the form <meta name="foo" content="bar"> is the definitive W3C standard, whereas the other form is an older, deprecated case. I don't see much HTML using the older form anymore. Whether we want to ignore it completely is hard to say.

> I'm sure there'd be a fair bit of discussion about this in the htdig-dev
> archives of 2-3 years ago. I don't think it ever got formally documented
> elsewhere (yet).

The reason was to allow "scoring on the fly". It also allows restricting word searches based on the "field" or tags that contain the words.

> The decision to put all headings into one factor was to reduce the number
> of bits the flag would take by 5, so the flags can fit in a single byte.
> We're going to have to increase this anyway, to accommodate custom fields,
> so it might make sense to reintroduce the distinction between heading

No, the flags were never supposed to be a single byte. There happen to be 8 bits currently defined, but more than this should actually be stored, to leave room for custom fields (and ideally to keep the database format identical).

OTOH, there were 6 slots for headings under 3.1, and that seems like a huge waste of bits considering most won't be used, even with 3-bit encoding. Some other document formats also don't make much distinction between heading levels. Do people really think that markup beyond h1, h2 and h3 occurs? A lot of HTML I see these days uses <strong> or <b> or <i> tagging (or worse, <font>).
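To make the trade-off concrete, here is a minimal sketch of per-word location flags, written in Python for brevity. The flag names, bit positions and weights are assumptions made up for illustration, not the actual ht://Dig definitions. It shows scoring on the fly, restricting a match to one field, and how many flag bits each heading scheme would need.

```python
from enum import IntFlag


class WordFlag(IntFlag):
    """Hypothetical per-word location flags (names and bit order are assumed)."""
    CAPITAL = 1 << 0
    TITLE = 1 << 1
    HEADING = 1 << 2      # all heading levels collapsed into a single bit
    KEYWORDS = 1 << 3
    DESCRIPTION = 1 << 4
    AUTHOR = 1 << 5
    LINK_TEXT = 1 << 6
    URL = 1 << 7


# Illustrative weights only; a real setup would read these from configuration.
WEIGHTS = {
    WordFlag.TITLE: 10,
    WordFlag.KEYWORDS: 8,
    WordFlag.HEADING: 5,
    WordFlag.DESCRIPTION: 3,
}


def score(flags, weights=WEIGHTS, base=1):
    """Score one word occurrence at search time from its stored flags."""
    return base + sum(w for flag, w in weights.items() if flags & flag)


def matches_field(flags, restrict):
    """True if the occurrence lies in at least one of the requested fields."""
    return bool(flags & restrict)


def flag_bits(heading_scheme, other_flags=7):
    """Total flag bits for a given way of storing heading information."""
    heading_bits = {
        "collapsed": 1,          # one bit for any heading
        "one_bit_per_level": 6,  # separate bit for each of h1..h6
        "three_bit_level": 3,    # level 0..6 packed into 3 bits
    }[heading_scheme]
    return other_flags + heading_bits
```

With seven non-heading flags assumed, the collapsed scheme needs 8 bits, the 3-bit level field needs 10, and one bit per heading level needs 13, which is the 5-bit saving mentioned in the quoted text.
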
Keep in mind that every bit we add to the flags adds more space to every word. Right now, I've specified 8 bits, including author and URL text, which aren't currently used.

-Geoff