From: Gabriele B. <g.b...@co...> - 2002-12-09 08:12:38
Ciao guys,

again, sorry in advance for the mistakes I will certainly make. I would
love to learn more about this area, which is pretty new for me too. So
please be patient. :-)

On Mon, 2002-12-09 at 02:14, Lachlan Andrew wrote:

> - The format you describe sounds like a "half-inverted"
> file -- listing locations *within* a document by word, but
> listing *document* locations by document. Is that
> correct?

I think that was a flat representation of the index file, just an
example. Am I right, Neal?

In a simple scenario, we'll have (please consider this a very rough
draft!):

- a word index (word id, stemmed/unstemmed flag, maybe language?)
- a document index (document id, info regarding the document, pretty
  much as now: title, modification date, etc.)
- an inverted index (word id, document id, locations)

Words
-----
ID  Word       S/U  Lang
--  ----       ---  ----
1   traveling  0    en
3   casa       0    it
12  travel     0    en
23  travels    0    en
45  pasta      0    it
60  travel     1    en
...

Documents
---------
ID  URL                    Other info
--  ---                    ----------
1   http://www.pippo.it/   .....
2   http://www.htdig.org/  ...

Index
-----
Word ID  Doc ID  Locations and related info (position and markup)
-------  ------  ------------------------------------------------
1        2       1 Value_location  3 Value_location

Value_location is the value given to the location of the word.

Am I right? Of course it's just an example ... :-)

Any comments about the language?

> - With stemming in general, what is done about negating
> affixes? If I searched for 'mercy', I wouldn't want
> results about 'merciless' (although I would want results
> about 'merciful').

Good point. Are there any plans to handle negative words too?

Ciao ciao
-Gabriele
-- 
Gabriele Bartolini - Web Programmer
Comune di Prato - Prato - Tuscany - Italy
g.b...@co... | http://www.comune.prato.it
> find bin/laden -name osama -exec rm {} ;
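
A minimal sketch of the three draft structures above (word index,
document index, inverted index), using the sample rows from the tables.
This is only an illustration, not ht://Dig code: the class and method
names (Word, Document, DraftIndex, add_occurrence, lookup) are
hypothetical, the reading of the S/U flag as "1 means stemmed" is an
assumption, and "Value_location" is kept as an opaque placeholder
because the draft does not define its contents.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    word_id: int
    text: str
    stemmed: bool  # assumption: S/U flag 1 = stemmed, 0 = unstemmed
    lang: str


@dataclass(frozen=True)
class Document:
    doc_id: int
    url: str


class DraftIndex:
    """Hypothetical container for the three tables in the draft."""

    def __init__(self):
        self.words = {}      # word_id -> Word
        self.documents = {}  # doc_id -> Document
        # (word_id, doc_id) -> list of (position, location_value)
        self.postings = defaultdict(list)

    def add_word(self, word):
        self.words[word.word_id] = word

    def add_document(self, document):
        self.documents[document.doc_id] = document

    def add_occurrence(self, word_id, doc_id, position, value):
        # Reject rows that point at unknown words or documents.
        if word_id not in self.words:
            raise KeyError(f"unknown word id {word_id}")
        if doc_id not in self.documents:
            raise KeyError(f"unknown document id {doc_id}")
        self.postings[(word_id, doc_id)].append((position, value))

    def lookup(self, text, lang):
        """Return {url: [(position, value), ...]} for a word in a language."""
        ids = {w.word_id for w in self.words.values()
               if w.text == text and w.lang == lang}
        result = {}
        for (word_id, doc_id), locations in self.postings.items():
            if word_id in ids:
                url = self.documents[doc_id].url
                result.setdefault(url, []).extend(locations)
        return result


index = DraftIndex()
for row in [(1, "traveling", 0, "en"), (3, "casa", 0, "it"),
            (12, "travel", 0, "en"), (23, "travels", 0, "en"),
            (45, "pasta", 0, "it"), (60, "travel", 1, "en")]:
    index.add_word(Word(row[0], row[1], bool(row[2]), row[3]))

index.add_document(Document(1, "http://www.pippo.it/"))
index.add_document(Document(2, "http://www.htdig.org/"))

# The single sample row of the Index table: word 1 in document 2,
# at positions 1 and 3.
index.add_occurrence(1, 2, 1, "Value_location")
index.add_occurrence(1, 2, 3, "Value_location")

hits = index.lookup("traveling", "en")
```

Keying the postings on (word id, document id) mirrors the Index table
row by row; a real implementation would more likely key on word id
alone so that a search touches one entry per word.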