From: <erc...@sp...> - 2006-11-18 16:13:42
Hi,

I have been using htdig to search my one domain, and it has been working great.

Now, I have 2 domains and would like to index both domains from my htdig on domain #1. I attempted to do this in my htdig.conf in an obvious (for me) way, but it didn't work (only indexed the 1st (local) domain).

I would like to know if searching 2 domains is supported and I should keep trying, or is it simply not supported?

If it is supported, I would love to see a simple example of how it might be done in the htdig.conf file.

Thanks,
EC
From: <erc...@sp...> - 2006-11-21 17:29:16
Hi,

Thanks for your replies. Here are the htdig.conf settings I've used to try to index 2 domains:

database_dir: /home/ercarlso/opt/htdig-3.1.5/db
database_base: ${database_dir}/db2

# URLS below contains an html list of all the HTML files under files2index directory
start_url: http://www.mydomain1.com/files2index/full_list.html
           http://www.mydomain2.net/files2index/full_list.html

limit_urls_to: mydomain1.com/files2index
               mydomain2.net/files2index

exclude_urls: /cgi-bin/ .cgi
bad_extensions: .wav .gz .z .ico .sit .au .zip .tar .hqx .exe .com .gif \
                .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi
maintainer: pu...@my...
max_head_length: 10000
max_doc_size: 200000
no_excerpt_show_top: true
search_algorithm: exact:1 synonyms:0.5 endings:0.1
# left out remaining settings for graphics...

With these settings, I get all of "mydomain1.com" indexed, but nothing from "mydomain2.net" indexed after running "rundig -v".

I was able to index one domain OR the other domain using the method Jennifer described (two different htdig.conf files), but did not take the next step to merge the 2 databases. I'd like to try the simple method first.

Thanks!
EC
From: Mike C. <mi...@mi...> - 2006-11-21 17:57:28
On Tue, 21 Nov 2006 12:27:08 -0500 erc...@sp... wrote:

> start_url: http://www.mydomain1.com/files2index/full_list.html
>            http://www.mydomain2.net/files2index/full_list.html
>
> limit_urls_to: mydomain1.com/files2index
>                mydomain2.net/files2index

If this is an exact copy of your config file, you need a backslash at the end of two lines, like this:

start_url: http://www.mydomain1.com/files2index/full_list.html\
           http://www.mydomain2.net/files2index/full_list.html

limit_urls_to: mydomain1.com/files2index\
               mydomain2.net/files2index

Mike
--
Mike Causer    Email - mailto:mi...@mi...
GPG KeyID 1C2DDA07    WWW - http://www.mikecauser.com
Flood the fen again! - Wicken Fen enlargement - http://www.wicken.org.uk
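One way to spot the kind of line Mike describes is to look for lines that are neither "name: value" nor a continuation of the line before. What follows is only a sketch: find_orphan_lines is a hypothetical helper, not part of ht://Dig, and its notion of what an attribute line looks like is an assumption rather than the real parser's rule.

```shell
#!/bin/sh
# find_orphan_lines FILE
# Hypothetical helper (not part of ht://Dig): print config lines that are
# neither "name: value" lines nor continuations of a previous line.
find_orphan_lines() {
    awk '
        # Blank lines and comments are skipped and end any continuation.
        /^[ \t]*(#|$)/ { cont = 0; next }
        {
            # Assumed attribute shape: a word, then a colon. A colon that
            # starts "://" is treated as part of a URL, so "http://..."
            # is not mistaken for an attribute called "http".
            is_attr = 0
            if (match($0, /^[ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*:/))
                is_attr = (substr($0, RLENGTH + 1, 2) != "//")
            if (!cont && !is_attr)
                printf "%d: %s\n", NR, $0
            # A trailing backslash means the next line continues this one.
            cont = ($0 ~ /\\$/)
        }
    ' "$1"
}
```

Called as "find_orphan_lines htdig.conf" on the config quoted above, it should report the second start_url line and the second limit_urls_to line, while leaving the bad_extensions continuation alone.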
From: <mic...@bt...> - 2006-11-22 10:05:19
Stuff like this is covered at http://www.htdig.org/cf_general.html if you have any more problems.

Note that email appears to have screwed up Mike C's first example - the line continuation should be a space followed by a backslash - this appears to have become part of the URL in the example here.

Regards,
Mike B

> -----Original Message-----
> From: htd...@li... [mailto:htd...@li...] On Behalf Of Mike Causer
> Sent: Tuesday, November 21, 2006 5:57 PM
>
> If this is an exact copy of your config file, you need a backslash at the
> end of two lines, like this:
>
> start_url: http://www.mydomain1.com/files2index/full_list.html\
>            http://www.mydomain2.net/files2index/full_list.html
>
> limit_urls_to: mydomain1.com/files2index\
>                mydomain2.net/files2index
>
> Mike
From: Malcolm A. <mal...@co...> - 2006-11-22 10:15:37
On Wed, 22 Nov 2006 10:04:48 -0000, <mic...@bt...> wrote:

> Note that email appears to have screwed up Mike C's first example - the
> line continuation should be a space followed by a backslash - this
> appears to have become part of the URL in the example here.

My initial thought was that the continuation marker was space-backslash, but I checked the documentation and that clearly says it is just backslash, with no requirement for a space in front of it. Note though that the backslash /must/ be the last character on the line; if there is a space after the backslash it will not be treated as a continuation marker.

The sucking of the backslash into the URL is an email client issue.

Malcolm.
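The trailing-space case Malcolm mentions is hard to see in an editor, so a grep can help. This is a minimal sketch, assuming a POSIX grep; find_broken_continuations is a hypothetical name, and the check only looks for a backslash followed by whitespace at the end of a line.

```shell
#!/bin/sh
# find_broken_continuations FILE
# Hypothetical helper: list lines (with line numbers) where a backslash is
# followed only by whitespace before the end of the line, so the backslash
# is not the last character.
find_broken_continuations() {
    # \\ matches one literal backslash; [[:space:]] covers spaces, tabs
    # and a stray carriage return from DOS-style line endings.
    grep -n '\\[[:space:]][[:space:]]*$' "$1"
}
```

It is called as "find_broken_continuations htdig.conf". grep exits with a non-zero status when nothing matches, so the function can also be used in an if test.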
From: Mike C. <mi...@mi...> - 2006-11-22 10:36:19
On Wed, 22 Nov 2006 10:04:48 -0000 <mic...@bt...> wrote:

> Note that email appears to have screwed up Mike C's first example - the
> line continuation should be a space followed by a backslash - this
> appears to have become part of the URL in the example here.

You're on a MSDOS-based platform, /it/ does that to your email. Doesn't afflict ht://Dig though.

Mike
From: Jennifer G. <jen...@gm...> - 2006-11-20 21:12:57
EC,

Try this: create a separate conf file for the second domain (say, second_domain.conf) and give that as a parameter to rundig:

rundig -c ../conf/second_domain.conf

That will index the second domain (make sure to choose a different location for the database so as not to overwrite your first domain). Then use htmerge to merge the two databases into one.

I'm not too familiar with htmerge, but this should work from what I've read. If someone knows of a better method, please advise.

Hope this helps,
Jen
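The two-config approach described here could be scripted roughly as follows. Treat it as a sketch under stated assumptions: the config paths are hypothetical, each config is assumed to point at its own database_dir, and the -m option is assumed to merge the databases of the second config into those named by -c, so check the htmerge documentation for your version before relying on it. By default the script only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# Hypothetical config paths; override them from the environment.
CONF1=${CONF1:-conf/htdig.conf}
CONF2=${CONF2:-conf/second_domain.conf}

# Print each command; execute it only when RUN=1 is set.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

index_and_merge() {
    # Index each domain with its own config (and its own database location).
    run rundig -c "$CONF1" || return 1
    run rundig -c "$CONF2" || return 1
    # Assumption: -m merges the databases described by CONF2 into the
    # databases described by the -c config.
    run htmerge -c "$CONF1" -m "$CONF2"
}
```

Calling index_and_merge with RUN unset shows the three commands without touching any database, which makes it easy to review the paths first.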
From: <mic...@bt...> - 2006-11-21 10:10:22
There should be no need for anything this complicated - it is easy to specify multiple starting points, and multiple 'restrict' items, in the same config, and carry out a single index of both sites.

Regards,
Mike

________________________________
From: htd...@li... [mailto:htd...@li...] On Behalf Of Jennifer Gallardo
Sent: Monday, November 20, 2006 9:13 PM
To: htd...@li...
Subject: Re: [htdig] Indexing 2 domains

    EC,

    Try this: create a separate conf file for the second domain (say, second_domain.conf) and give that as a parameter to rundig:

    rundig -c ../conf/second_domain.conf

    That will index the second domain (make sure to choose a different location for the database so as not to overwrite your first domain). Then use htmerge to merge the two databases into one.

    I'm not too familiar with htmerge, but this should work from what I've read. If someone knows of a better method, please advise.

    Hope this helps,
    Jen

_______________________________________________
ht://Dig general mailing list: <htd...@li...>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general