From: Pietro Leone <leone@di...> - 2005-11-25 14:23:26
Hello, I'm using htdig to index an intranet site: several gigabytes of
documents (almost all .doc and .pdf), several tens of thousands of files.
So far the sizes of the db files are:
161809408 Nov 23 11:10 db.docdb
13569024 Nov 23 11:10 db.docs.index
295576571 Nov 23 11:05 db.wordlist
238107648 Nov 23 11:05 db.words.db
and I'm only at 20% of the work. The PC is a PII@... with 128 MB of RAM. Can
this hardware do the job?
I see that I can use MySQL with htdig instead of Berkeley DB; can that make
searching faster? Where can I find information on using MySQL with htdig?
Another question: the directory tree is:
Inside every day_x directory I have the same subdirectories, so when I search
for the word "giustizia" I get one entry for every directory and for every
file in the directories called "giustizia" (right now, after more than a
minute of work, it returns 12,000 results). How can I manage this situation?
Why does htdig show me only 10 pages with 10 results per page? How can I see
all the results if I have more than 100? (Not an unusual situation when I scan
50,000 files.) Apart from telling htdig to show more than 10 documents per page.
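For reference, I understand these limits come from htsearch configuration attributes rather than from the index itself. A minimal sketch of what I would add to my htdig.conf, assuming matches_per_page and maximum_pages are the relevant attributes (values here are only examples):

```
# htdig.conf fragment (sketch; attribute names taken from the ht://Dig docs)
matches_per_page:  50     # results shown on each page (default is 10)
maximum_pages:     100    # number of result pages htsearch will offer (default is 10)
```

Is this the right way to raise the limits, or is there a hard cap somewhere else?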
I will build myself a copper tower
With four ways out and no way in
But mine the glory, mine the power
(So I chose Amiga and GNU/Linux)