From: <mic...@sy...> - 2005-01-12 14:15:34
I've been thinking a little more about this problem, and since it seems to consist of two parts, I wonder if it can be solved by splitting the dig into two parts and then merging the databases.

If you use:

limit_urls_to: DO_TOPIC \
               DO_ROOT \
               DO_COMMUNITY

in one config, then my understanding of your problem is that the only 'GOOD' URL you will exclude is http://example.org/index.html.

If you then have:

limit_urls_to: ${start_url}
max_docs: 1 (or something similar)

in a second config, then you should be able to get the missing document into a second database and merge it into the first.

The only problem I can see is that on many systems you may not get a good index this way, since the obvious starting point is not accessible in the main dig. This could be overcome by feeding a URL list generated by the 'short dig' (config 2) into the 'full dig' (config 1).

Mike

> On Mon, 10 Jan 2005, Dan Langille wrote:
>
> > How can I use that on limit_urls_to? I've been trying this:
> >
> > limit_urls_to: ${start_url}*DO_TOPIC|DO_ROOT|DO_COMMUNITY*
> >
> > There are additional restrictions, but once I get a starting point, I
> > think it'll all fall into place.
> >
> > A few examples of what we want to do:
> >
> > http://example.org/index.html                  OK
> > http://example.org/index.html?ID=4             BAD
> > http://example.org/index.html?ID=4&DO_TOPIC    OK
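To make the two-dig split concrete, here is a small Python sketch (not ht://Dig code) that replays Dan's three example URLs through both configs and merges the results. It rests on stated assumptions: limit_urls_to is modelled as plain substring matching, the candidate URLs are treated as already known (as they would be with the URL-list workaround Mike mentions), and the "max_docs: 1 (or something similar)" limit is modelled as a `max_docs` argument whose real attribute name may differ.

```python
# Sketch of the two-dig split, under stated assumptions (not ht://Dig code):
#  * limit_urls_to is modelled as plain substring matching;
#  * the candidate URLs are already known, as if a URL list were fed in;
#  * max_docs (name assumed, per "or something similar") stops the dig
#    after that many accepted documents.

START_URL = "http://example.org/index.html"

# Dan's example URLs, in the order a crawl would meet them, with his verdicts.
CANDIDATES = [
    (START_URL, True),
    (START_URL + "?ID=4", False),
    (START_URL + "?ID=4&DO_TOPIC", True),
]


def allowed(url, patterns):
    """True if url contains at least one pattern as a substring."""
    return any(p in url for p in patterns)


def dig(urls, patterns, max_docs=None):
    """Keep URLs that pass the pattern filter, stopping after max_docs hits."""
    kept = []
    for url in urls:
        if max_docs is not None and len(kept) >= max_docs:
            break
        if allowed(url, patterns):
            kept.append(url)
    return kept


urls = [url for url, _ in CANDIDATES]

# Config 1, the 'full dig': only the three topic markers are allowed.
full = dig(urls, ["DO_TOPIC", "DO_ROOT", "DO_COMMUNITY"])

# Config 2, the 'short dig': the start URL pattern also matches every
# ?ID=... variant, so the one-document limit keeps it to a single page.
short = dig(urls, [START_URL], max_docs=1)

# Merging the two databases is modelled as the union of what each kept.
merged = set(full) | set(short)
wanted = {url for url, ok in CANDIDATES if ok}

print("full dig: ", full)
print("short dig:", short)
print("merged matches wanted:", merged == wanted)
```

Note that the short dig's pattern on its own would also admit the ?ID=4 page, since that URL contains the start URL as a substring; under these assumptions only the document limit keeps it out, which is why the second setting in config 2 matters.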