From: Soon-Son K. <ks...@kl...> - 2002-03-25 14:19:54
Hello folks:

I recently changed htdig.conf to store information only for some specific URLs, as follows:

limit_urls_to: HOWTO Translations KoreanDoc

But after modifying limit_urls_to, the db size grew much bigger than before.

Here is the current db size:

-rw-r--r-- 1 root root 199M Mar 25 08:30 db.docdb
-rw-r--r-- 1 root root 1.9M Mar 17 06:28 db.docs.index
-rw-r--r-- 1 root root 1.6G Mar 25 23:15 db.wordlist

Here is the old db size:

-rw-r--r-- 1 root root 15M Jun 10 2001 db.docdb
-rw-r--r-- 1 root root 442k Jun 10 2001 db.docs.index
-rw-r--r-- 1 root root 26M Jun 10 2001 db.wordlist
-rw-r--r-- 1 root root 28M Jun 10 2001 db.words.db

It seems that db.wordlist keeps getting bigger because I run rundig every week.

Has anyone faced the same situation yet?
I am using a somewhat old version (3.1.2) because I have a patch which enables htdig to deal with 2-byte character data.

I would also like to know whether the current htdig supports Asian (especially Korean) characters or not. AFAIK, not yet, but things may have changed. :-)

--
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
(o_ **WTFM** (o_ (o_ //\ (/)_ (/)_ V_/_ http://kldp.org
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
From: Geoff H. <ghu...@ws...> - 2002-03-25 19:07:56
On Mon, 25 Mar 2002, Soon-Son Kwon wrote:

> limit_urls_to: HOWTO Translations KoreanDoc
>
> But after modifying the limit_urls_to, the db size
> grew much bigger than before.

I would worry that, with the limit_urls_to you have set, you could end up heading off-site. If I had to take a guess from your domain name, I'd think perhaps you headed to the main LDP site and started indexing there.

> Has anyone faced the same situation yet?
> I am using somewhat old version (3.1.2) because I have a patch which
> enables htdig to deal with 2-byte character data.
>
> I am also want to know if current htdig supports asian (especially Korean)
> characters or not. AFAIK, not yet but things may have changed. :-)

No, but we'd certainly be very interested in that patch. We'd certainly like to support multi-byte character sets, but as none of the currently active developers:
* has much multi-byte data to index
* is familiar with programming Unicode/UTF-8 encodings
* can easily test multi-byte indexing
we haven't made much progress.

Of course, if you have a patch that can point the way towards this, we'd certainly give it a look and/or work with people to get multi-byte indexing working.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
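The off-site concern raised in this reply can be illustrated with a minimal sketch. It assumes that each limit_urls_to pattern is treated as a plain substring of the candidate URL. That is a simplification for illustration, not htdig's actual code. The function name `url_allowed`, the host-qualified patterns, and both sample URLs are hypothetical.

```python
# Sketch only: models limit_urls_to as "a URL is accepted if any pattern
# occurs anywhere inside it". This is an assumed simplification.

def url_allowed(url, patterns):
    """Return True if at least one pattern is a substring of the URL."""
    return any(pattern in url for pattern in patterns)

# The bare words from the original configuration.
bare_words = ["HOWTO", "Translations", "KoreanDoc"]

# Hypothetical host-qualified alternatives, for comparison only.
with_host = [
    "http://kldp.org/HOWTO",
    "http://kldp.org/Translations",
    "http://kldp.org/KoreanDoc",
]

# Hypothetical sample URLs: one on the local site, one on another site.
on_site = "http://kldp.org/HOWTO/index.html"
off_site = "http://www.example.org/HOWTO/index.html"

for url in (on_site, off_site):
    print(url, url_allowed(url, bare_words), url_allowed(url, with_host))
```

Under this assumption a bare word such as HOWTO accepts any URL containing that string, whatever the host, while a pattern that includes the scheme and host accepts only URLs on that host.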
From: Soon-Son K. <ks...@kl...> - 2002-03-30 15:39:09
On Mon, Mar 25, 2002 at 02:05:34PM -0500, Geoff Hutchison wrote:
> On Mon, 25 Mar 2002, Soon-Son Kwon wrote:
>
> > limit_urls_to: HOWTO Translations KoreanDoc
> >
> > But after modifying the limit_urls_to, the db size
> > grew much bigger than before.
>
> I would worry that with the limit_urls_to that you have set, that you
> could end up heading off-site. If I had to take a guess from your domain
> name, I'd think perhaps you headed to the main LDP site and started
> indexing there.

In fact, I am running the Korean LDP, and my server hosts some other websites, but I want htdig to index only the URLs on my own domain (kldp.org) that contain the strings above.

So I set limit_urls_to to "kldp.org/HOWTO kldp.org/Translations kldp.org/KoreanDoc" to let htdig store information only for the URLs which contain those strings, but the result was the same. db.wordlist grew to 1.4GB, until it ate up all the remaining disk space.

Can anyone please tell me how to make htdig index only some specific directories?

> > Has anyone faced the same situation yet?
> > I am using somewhat old version (3.1.2) because I have a patch which
> > enables htdig to deal with 2-byte character data.
> >
> > I am also want to know if current htdig supports asian (especially Korean)
> > characters or not. AFAIK, not yet but things may have changed. :-)
>
> No, but we'd certainly be very interested in that patch. We'd certainly
> like to support multi-byte character sets, but as none of the currently
> active developers:
> * has much multi-byte data to index
> * is familiar with programming Unicode/UTF-8 encodings
> * can easily test multi-byte indexing
> we haven't made much progress.
>
> Of course if you have a patch that can point the way towards this, we'd
> certainly give it a look and/or work with people to get multi-byte
> indexing working.

In fact, this patch is over 2 years old, and at that time it was rejected because it is only for Korean, not Unicode/UTF-8.
It works only with 3.1.2, and its developer has stopped updating it.

--
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
(o_ **WTFM** (o_ (o_ //\ (/)_ (/)_ V_/_ http://kldp.org
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*