|
From: Gabriele B. <bar...@in...> - 2003-10-15 23:55:42
|
Cheers Neal,

>It seems like it's to our advantage to always do a HEAD call, unless it's
>an initial dig, where it is wasteful... and that the state of
>persistent_connections is irrelevant to this decision.

Let me try to understand. What you suggest is:

1) killing head_before_get
2) performing HEAD calls only in the incremental dig (either with or
without persistent connections)
3) unlinking the HEAD-before-GET mechanism from the persistent connections
one

If so, that could work for me (for number 1 I will do what you guys
decide). I had not understood it from the earlier messages, sorry.

Even so, personally I would not kill the attribute because:

1) It could be useful in cases where we don't know whether a document is
parsable or not according to the *usual* means of exclusion (that is to
say the URL). I know that so far we take into consideration only the
content-type, but maybe in a future release we could use other HTTP headers
(i.e. cookies, language, etc.) and a pre-emptive HEAD could save time in an
initial dig as well.
2) I share the library with ht://Check, which uses this option heavily as
it has to retrieve every document - images too - and a HEAD call could save
a lot of time in the initial dig. I'd love to keep the logic of the net
library as similar as possible.
3) Killing the attribute would not spare us from changing the code in order
to store information about the retrieval status in the Retriever and
Document classes (unless we intend to use some class variables - which I
hate).

>I don't have a problem keeping head_before_get, as long as we make the
>default TRUE.

That's the default.

Please let me know if the Retriever and Document class changes make sense
to you guys and I will modify the code.

Ciao ciao
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Melbourne, Victoria, Australia
bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno
|
|
From: Neal R. <ne...@ri...> - 2003-10-16 04:00:29
|
> but maybe in future release we could use other HTTP headers (i.e. cookies,
> language, etc.) and a pre-emptive head could save time in a initial dig as
> well.

Yep.. even on an initial dig HEAD is a good idea.. unless the website is
almost all HTML pages with few images... which seems pretty pie-in-the-sky
at this point.

> 2) I share the library with ht://Check which massively uses this option as
> it has to retrieve any document - images too - and a HEAD call could save a
> lot of time in the initial dig. I'd love to maintain the logic of the net
> library the more similar possible.

> Please let me know if the Retriever and Document classes changes make sense
> to you guys and I will modify the code.

I think what we've had here is informative debate. You as much as
anyone else wrote the networking code, so for me it's your decision. I
think the new TRUE default is fine.

If you've perfected this logic in ht://Check, then we should probably
consider syncing with your net code after 3.2 is done.

Thanks.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Gabriele B. <bar...@in...> - 2003-10-19 13:37:42
|
> I think what we've had here is informative debate. You as much as
>anyone else wrote the networking code, so for me it's your decision. I
>think the new TRUE default is fine.
OK. Any other opinions?
> If you've perfected this logic in ht://Check, then we should probably
>consider syncing with your net code after 3.2 is done.
So ... is it ok for you guys if I go on with the Retriever, Document and
HtHTTP patch as suggested in the previous e-mails?
Basically, in order to perform always a HEAD call during an incremental
indexing, I need to store the information in both the Retriever and
Document class. Is that right for you? In particular, I suggest this enum:
enum RetrieverType {
Retriever_Initial,
Retriever_Incremental
};
and then change the constructor this way:
Retriever(RetrieverLog flags = Retriever_noLog, RetrieverType t =
Retriever_Initial);
In 'htdig.cc', we check whether the dig is an initial dig or not and:
if(!initial) // Switch the retriever type to Incremental
retriever_type = Retriever_Incremental;
therefore, when we instantiate the main retriever object, we just simply
add this:
Retriever retriever(Retriever_logUrl, retriever_type);
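The pieces above can be put together in a minimal, self-contained sketch. Only the two enum types, the values named in this message, and the constructor signature come from the proposal; the class body, the member names, the accessors, and `SelectRetrieverType()` are hypothetical stand-ins and not the actual ht://Dig code.

```cpp
// Sketch only: a stripped-down stand-in for the Retriever class.

// The real RetrieverLog may define more values; only these two are
// named in the message.
enum RetrieverLog {
    Retriever_noLog,
    Retriever_logUrl
};

enum RetrieverType {
    Retriever_Initial,
    Retriever_Incremental
};

class Retriever {
public:
    // Constructor signature as proposed: both arguments are defaulted,
    // so existing call sites keep compiling and get an initial dig.
    Retriever(RetrieverLog flags = Retriever_noLog,
              RetrieverType t = Retriever_Initial)
        : log_flags(flags), type(t) {}

    // Hypothetical accessors, added here only for illustration.
    RetrieverLog LogFlags() const { return log_flags; }
    bool IsIncremental() const { return type == Retriever_Incremental; }

private:
    RetrieverLog log_flags;
    RetrieverType type;
};

// Hypothetical helper mirroring the check described for htdig.cc:
// anything that is not an initial dig is treated as incremental.
RetrieverType SelectRetrieverType(bool initial) {
    return initial ? Retriever_Initial : Retriever_Incremental;
}
```

Because the new parameter has a default, code that constructs a `Retriever` without it would keep the initial-dig behaviour, which seems to be the intent of the proposal.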
Please let me know.
Ciao and thanks,
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Melbourne, Victoria, Australia
bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno
|
|
From: Gilles D. <gr...@sc...> - 2003-10-20 22:03:50
|
According to Gabriele Bartolini:
> > I think what we've had here is informative debate. You as much as
> >anyone else wrote the networking code, so for me it's your decision. I
> >think the new TRUE default is fine.
>
> OK. Any other opinions?
I think it was just a matter of not understanding what the attribute did or
didn't do, and in which circumstances it would be useful to change it.
Because of the potential for serious performance degradation when you get it
wrong, I think it would be helpful if the code automatically did the right
thing in most circumstances, and if the documentation for this attribute
made it clear in which circumstances it would make sense to turn it off.
> > If you've perfected this logic in ht://Check, then we should probably
> >consider syncing with your net code after 3.2 is done.
>
> So ... is it ok for you guys if I go on with the Retriever, Document and
> HtHTTP patch as suggested in the previous e-mails?
I think that's what Neal was getting at when he said it's your decision.
You wrote the networking code, so you know better than anyone else what's
needed to make this particular change. It sounds reasonable to me that
you'd need to make changes to these classes, as that's where the needed
decisions must be made about the appropriate default action.
> Basically, in order to perform always a HEAD call during an incremental
> indexing, I need to store the information in both the Retriever and
> Document class. Is that right for you? In particular, I suggest this enum:
>
> enum RetrieverType {
> Retriever_Initial,
> Retriever_Incremental
> };
>
> and then change the constructor this way:
>
> Retriever(RetrieverLog flags = Retriever_noLog, RetrieverType t =
> Retriever_Initial);
>
> In 'htdig.cc', we check whether the dig is an initial dig or not and:
>
> if(!initial) // Switch the retriever type to Incremental
> retriever_type = Retriever_Incremental;
>
> therefore, when we instantiate the main retriever object, we just simply
> add this:
>
> Retriever retriever(Retriever_logUrl, retriever_type);
>
> Please let me know.
Well, it seems to me that there are actually two different cases where
htdig does an initial dig. The obvious one is when the user specifies
-i, which sets the initial flag. The less obvious one is when htdig is
run without -i, but with no existing database, or with an empty one.
What matters is whether there are URLs in the database or not. If there
are none, then you'll never reject a document as "not changed".
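A minimal sketch of that point, assuming the caller can find out how many URLs the database already holds. The function name and the `stored_url_count` parameter are hypothetical; only the two enum values come from the quoted proposal.

```cpp
#include <cstddef>

enum RetrieverType {
    Retriever_Initial,
    Retriever_Incremental
};

// `initial_flag` stands for the -i option; `stored_url_count` is a
// hypothetical count of URLs already present in the database.
RetrieverType EffectiveRetrieverType(bool initial_flag,
                                     std::size_t stored_url_count) {
    // With no stored URLs nothing can be rejected as "not changed",
    // so the run behaves like an initial dig even without -i.
    if (initial_flag || stored_url_count == 0) {
        return Retriever_Initial;
    }
    return Retriever_Incremental;
}
```

The design choice here is to decide once, before the retriever is constructed, rather than testing the database on every document.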
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
|
From: Gabriele B. <bar...@in...> - 2003-10-21 07:49:58
|
Hi guys,

At 17.02 20/10/2003 -0500, Gilles Detillieux wrote:
>wrong, I think it would be helpful if the code automatically did the right
>thing in most circumstances, and if the documentation for this attribute
>made it clear in which circumstances it would make sense to turn it off.

Yep. I think so too. Anyway, I modified defaults.cc by giving the
attribute a 'true' default and by explaining that:

- during an incremental dig, the value is overridden;
- in general, it is recommended to leave this value on.

I did not specify cases in which the attribute should be turned off, as I
thought it would create more confusion for the user.

However, I would pick these general cases, where the user should disable
the attribute (please revise them):

Case A - Persistent connections on
1) the majority of documents are HTML (this means we "always" want to GET
them)
2) the server does not support HEAD (I have seen cases like this
unfortunately)
3) cases where the persistent communication between htdig and the server
does not work 100%: there can be some problems with persistent
connections and HEAD calls (I sometimes experience this kind of problem
with ht://Check and some NT servers)

Case B - Persistent connections off
1) same as case A
2) same as case A
3) I have never experienced any problem as in case A.3 with persistent
connections disabled

>Well, it seems to me that there are actually two different cases where
>htdig does an initial dig. The obvious one is when the user specifies
>-i, which sets the initial flag. The less obvious one is when htdig is
>run without -i, but with no existing database, or with an empty one.
>What matters is whether there are URLs in the database or not. If there
>are none, then you'll never reject a document as "not changed".

OK. Good point. I think I changed the Retriever class in order to perform
this check as well.

Also, during an incremental dig, if debug > 1 I show a notice message
saying that any head_before_get configuration is overridden and that HEAD
is always enabled. Sounds good?

Ciao and thanks,
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Melbourne, Victoria, Australia
bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno
|
|
From: Gabriele B. <bar...@in...> - 2003-10-22 00:30:59
|
At 16.01 21/10/2003 -0500, Gilles Detillieux wrote:
> > 2) the server does not support HEAD (I have seen cases like this
> > unfortunately)
>OK, that sounds pretty important. I hadn't heard that one before.

I meant that some server administrators may turn off the HEAD method (in
Apache you can use the Limit directive).

>but don't support the HEAD request. Wouldn't this be an argument against
>overriding head_before_get during an incremental dig?

I guess it is a matter of choosing the least painful solution. In the
normal case (persistent connections on and head_before_get on) no
overriding is done; however, in the incremental dig, one more request
(HEAD) is made without success and hopefully - after that - the document
gets retrieved. There is a bit of overhead for sure, but the question is:
is it better to have a bit of overhead in some cases (a minority) or to
prevent users from getting the benefit of always using a working HEAD call
when updating the database?

The other way is to remove the override and leave everything in the hands
of the user (I would not mind this - provided, of course, that we write
better documentation).

With the changes done yesterday we have moved towards a clearer situation
anyway, because:

- head before get is now true by default
- head before get has been detached from persistent connections and has
become independent

> > 3) cases where the persistent communication between htdig and the server
> > does not work at 100%: there can be some problems with persistent
> > connections and HEAD calls (I experience this kind of problems sometimes
> > with ht://Check and some NT servers)
>
>Again, is this going to be a problem if we don't allow turning off
>head_before_get during an update dig?

I guess this could be fixable, because the problem comes up with
persistent connections - which can still be disabled.

>with these questionably compliant servers, then wouldn't they need a way
>of turning off head_before_get unconditionally, whether it's an update
>dig or an initial one?

Yes, that'd be great. Again, I guess we have to balance what we can do to
make things easier for the user while, at the same time, leaving users
enough freedom to configure their systems the way they want. Also, with
3.2, the server and URL blocks have added more dimensions to the space of
configurability available to users, and the more "clear" attributes are
available, the closer to perfect the toy gets.

>This is what I was getting at before about this option never being
>explained adequately.

You're right.

> On the surface, it seemed to be rather useless,
>but with these new revelations that have come out of your testing, it
>seems there may indeed be a need for turning this off in some cases.
>That's the sort of thing that should be documented so others (developers
>and end-users) know what you'd use this for.

So ... we have 2 possibilities now:

1) leave the code as is
2) remove the overriding of the head before get in the incremental dig

In both cases we need to write better documentation for this attribute
(especially with option 2, where we should talk about the benefits of a
HEAD call in the incremental dig).

I must confess: I would prefer option 2, as I think users must have full
control of the tool and IMHO by adding a default behaviour of HEAD before
GET to the system we've done our part.

So tell me what you think, especially you Gilles and Neal, who have
followed this thread. I am more than happy to change the code again today
if needed.

Ciao ciao
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Melbourne, Victoria, Australia
bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno
|
|
From: Lachlan A. <lh...@us...> - 2003-10-22 13:35:08
|
Greetings all,

I've only been following this thread loosely, but my opinions are:

1. In version 3.2.1 (or 3.3, or 4.0) there should be three possible
settings: true, false, auto. That way the user has complete
control, but doesn't need to exert it.

2. We are in feature freeze, and scheduled to release in one week's
time, at the end of October. We should minimise changes to the code.
Has a bug report been filed for this issue yet? Wasn't the plan to
have no CVS commits without reference to a bug number?

Cheers,
Lachlan

On Wed, 22 Oct 2003 08:30, Gabriele Bartolini wrote:
> So ... we have 2 possibilities now:
>
> 1) leave the code as is
> 2) remove the overriding of the head before get in the incremental
> dig
>
> I must confess. I would prefer option 2, as I think users' must
> have full control of the tool and IMHO by adding a default
> behaviour of HEAD before GET to the system we've done our part.

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
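The three-valued setting suggested in point 1 could look roughly like the sketch below. Everything in it is an assumption made for illustration: the enum and function names are hypothetical, unrecognised values fall back to "true" only because that is the default agreed in this thread, and "auto" is read here as "send HEAD only during an incremental dig", which the message itself does not define.

```cpp
#include <string>

// Hypothetical tri-state for a future head_before_get setting.
enum HeadBeforeGetMode {
    HeadBeforeGet_False,
    HeadBeforeGet_True,
    HeadBeforeGet_Auto
};

// Hypothetical parser: unrecognised values fall back to "true".
HeadBeforeGetMode ParseHeadBeforeGet(const std::string& value) {
    if (value == "false") return HeadBeforeGet_False;
    if (value == "auto") return HeadBeforeGet_Auto;
    return HeadBeforeGet_True;
}

// Assumed meaning of "auto": HEAD only when updating an existing database.
bool ShouldSendHead(HeadBeforeGetMode mode, bool incremental_dig) {
    switch (mode) {
        case HeadBeforeGet_False: return false;
        case HeadBeforeGet_True:  return true;
        case HeadBeforeGet_Auto:  return incremental_dig;
    }
    return true;  // not reached for valid enum values
}
```

With this shape, "true" and "false" stay unconditional, so a user facing a server that rejects HEAD could still switch it off for every kind of dig.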
|
From: Neal R. <ne...@ri...> - 2003-10-22 20:43:32
|
Lachlan wrote:
> 2. We are in feature freeze, and scheduled to release in one week's
> time, at the end of October. We should minimise changes to the code.
> Has a bug report been filed for this issue yet? Wasn't the plan to
> have no CVS commits without reference to a bug number?

Gabriele: Please create a sourceforge bug for this when you change
it... and clue us all in on what the 'net change' is after the commits
;-).

As far as the release goes, we need to get some kind of testing report
made and updated... I'll try to post something by tomorrow.

Thanks.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Gabriele B. <bar...@in...> - 2003-10-26 00:17:13
|
At 13.48 22/10/2003 -0600, Neal Richter wrote:
> Gabriele: Please create a sourceforge bug for this when you change
>it... and clue us all in on what the 'net change' is after the commits
>;-).

Sorry ... I forgot to open the bug before. Done everything.

Ciao
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Melbourne, Victoria, Australia
bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno
|
|
From: Neal R. <ne...@ri...> - 2003-10-23 17:46:19
|
Hey all,

I used Lachlan's breakdown (nearly unchanged) to create 19 testing tasks
at Sourceforge.

Please visit http://sourceforge.net/projects/htdig/
and navigate to Tasks -> Testing 3.2

Some of them are fairly long, some are short. I'm not assigning any to
anyone; it's up to each of us to grab a task and complete it. I'm also
leaving it up to each person to decide how deep to test. We need to get
reasonable coverage.

I would also encourage each of you to use valgrind to check for memory
leaks while you are testing. Again, the depth you go looking for them is
up to you.
http://developer.kde.org/~sewardj/

If you need a Sourceforge account, please register yourself and send me an
email with your account and I'll add you to the htDig project.

If you find an error during testing please:
1) Create a bug
2) Contact the appropriate developer or fix it yourself
3) Test the fix
4) Commit the fix (if you are fixing it)
5) Update the status of the bug
6) Have a second person either test the fix or verify that the committed
code looks OK.. their choice.

So the standards for release of 3.2RC1 are:
1) No important bugs in the 'Include_in_3.2' queue
2) All testing tasks completed.

Sourceforge is very cool!

Thanks all!

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Lachlan A. <lh...@us...> - 2003-10-25 09:32:39
|
Greetings all,

Thanks, Neal, for your work setting up the tasks!

When I run valgrind, I get
==4058== Conditional jump or move depends on uninitialised value(s)
==4058== at 0x40300421: CDB___lock_put_nolock (lock.c:650)
from various contexts.

I don't fancy changing the BDB code (my last foray was rather
forgettable :) Does anyone think that this is an issue, or should we
ignore it?

Cheers,
Lachlan

On Fri, 24 Oct 2003 01:09, Neal Richter wrote:
> I would also encourage each of you to use valgrind to check for
> memory leaks while you are testing. Again, the depth you go looking
> for them is up to you.

--
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
|
From: Neal R. <ne...@ri...> - 2003-10-26 23:01:35
|
Yea, ignore the BDB errors.

And most errors about uninitialized memory reads, and "Conditional jump or
move depends on uninitialised value(s)", are spurious. They are produced by
the compiler's code generation and there isn't too much that can be done to
eliminate them at the C/C++ level.

You are kicking butt on the testing tasks..

On Fri, 24 Oct 2003, Lachlan Andrew wrote:
> Greetings all,
>
> Thanks, Neal, for your work setting up the tasks!
>
> When I run valgrind, I get
> ==4058== Conditional jump or move depends on uninitialised value(s)
> ==4058== at 0x40300421: CDB___lock_put_nolock (lock.c:650)
> from various contexts.
>
> I don't fancy changing the BDB code (my last foray was rather
> forgettable :) Does anyone think that this is an issue, or should we
> ignore it?
>
> Cheers,
> Lachlan
>
> On Fri, 24 Oct 2003 01:09, Neal Richter wrote:
>
> > I would also encourage each of you to use valgrind to check for
> > memory leaks while you are testing. Again, the depth you go looking
> > for them is up to you.
>
> --
> lh...@us...
> ht://Dig developer DownUnder (http://www.htdig.org)

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Neal R. <ne...@ri...> - 2003-10-22 19:51:00
|
Gabriele wrote:
> 1) leave the code as is
> 2) remove the overriding of the head before get in the incremental dig
>
> In both cases we need to write down a better documentation for this
> attribute (especially in the option 2 where we should talk about the
> benefits of a HEAD call in the incremental dig).
>
> I must confess. I would prefer option 2, as I think users' must have full
> control of the tool and IMHO by adding a default behaviour of HEAD before
> GET to the system we've done our part.

OK, you've convinced me, it IS useful to have this switch be user
controlled.. I wasn't aware of the non-compliant servers causing an
issue. Clearly 'automatic' behavior in that case is a bad thing.

Go with option 2.

Thanks!

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Neal R. <ne...@ri...> - 2003-10-22 21:20:47
|
Hey all,

Please go to sourceforge and look at the open bugs if you can, there
are 18 'Status:Open' now.

There are 6 bugs in the 'Status:Open & Group:Include_in_3.2' state.

Gabriele: Did you fix this one already?
[ 594790 ] rundig doesn't index Apache w/mod_zip

Thanks.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Gabriele B. <bar...@in...> - 2003-10-23 17:33:29
|
At 13.45 22/10/2003 -0600, Neal Richter wrote:
> OK, you've convinced me, it IS useful to have this switch be user
>controlled.. I wasn't aware of the non-compliant servers causing an
>issue. Clearly 'automatic' behavior in that case is a bad thing.
>Go with option 2.

Roger that. :-)

-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Melbourne, Victoria, Australia
bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno
|
|
From: Gilles D. <gr...@sc...> - 2003-10-25 23:23:25
|
According to Gabriele Bartolini:
> At 13.45 22/10/2003 -0600, Neal Richter wrote:
> > OK, you've convinced me, it IS useful to have this switch be user
> >controlled.. I wasn't aware of the non-compliant servers causing an
> >issue. Clearly 'automatic' behavior in that case is a bad thing.
> >Go with option 2.
>
> Roger that. :-)

I guess the only safe way to automate the selection of this would be
for htdig to keep track, on a server by server basis, of whether a server
responds favourably to HEAD requests. If it doesn't, then it would turn
off this action for this server, but otherwise it seems it would almost
always be an advantage to keep it on. But now we're getting into the
area of feature requests, not bug fixes, so this should wait till after
the upcoming release.

If I'm not mistaken, as the code now stands, htdig will assume a document
is inaccessible if the HEAD request fails, and so it won't try the GET on
that document at all (unless head_before_get is explicitly set to false).
So, properly automating this selection would require some code changes
to the HtHTTP class to implement this -- not something we want to start
monkeying with at the eleventh hour before release.

I think the current compromise is best, but it should be given a good
pounding to make sure it's solid.

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
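The per-server bookkeeping described in this message might be sketched as follows. This is a hedged illustration and not existing ht://Dig code: the class and method names are hypothetical, and the policy of giving up on HEAD for a server after a single failure is the simplest reading of the idea, chosen only to keep the example short.

```cpp
#include <set>
#include <string>

// Hypothetical tracker: remembers servers on which a HEAD request failed.
class HeadSupportTracker {
public:
    // Record the outcome of a HEAD request sent to `server`.
    void RecordHeadResult(const std::string& server, bool succeeded) {
        if (!succeeded) {
            no_head_servers.insert(server);
        }
    }

    // HEAD is attempted unless an earlier HEAD to this server failed.
    bool ShouldTryHead(const std::string& server) const {
        return no_head_servers.count(server) == 0;
    }

private:
    std::set<std::string> no_head_servers;
};
```

A single failed document switching HEAD off for a whole server is a coarse rule; a finer-grained key (for example per directory) would be needed where HEAD is restricted only for part of a site.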
|
From: Gabriele B. <bar...@in...> - 2003-10-25 09:24:27
|
At 10.19 23/10/2003 -0500, Gilles Detillieux wrote:
>I guess the only safe way to automate the selection of this would be
>for htdig to keep track, on a server by server basis, to see if a server
>responds favourably to HEAD requests. If it doesn't, then it would turn
>off this action for this server, but otherwise it seems it would almost
>always be an advantage to keep it on. But now we're getting into the

That's what actually happens with persistent connections. However, for
instance, the 'Limit' directive in Apache can be set per directory or
location, and I would not risk disabling the attribute for every document
on the server just because one failed. Again, I guess that the 'webmaster'
is the one who knows his scenario better than anyone.

>If I'm not mistaken, as the code now stands, htdig will assume a document
>is inaccessible if the HEAD request fails, and so it won't try the GET on
>that document at all (unless head_before_get is explicitly set to false).

Hmmm ... looking at the code in HtHTTP::Request(), we should add an
'if statement' for the case when the server returns a 405 status code. We
should also add a proper Document Status for this in the Transport class
(Document_method_not_allowed?). Basically, when we issue a HEAD method and
we get a 'method not allowed' response, we should GET the resource. What
do you think?

Anyway, I am going to open a feature request for this so we keep it in
mind.

>So, properly automating this selection would require some code changes
>to the HtHTTP classs to implement this -- not something we want to start
>monkeying with at the eleventh hour before release.

I agree.

Ciao,
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Melbourne, Victoria, Australia
bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno
|
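The 405 handling proposed in this message can be sketched in isolation. `Document_method_not_allowed` is the name floated above; the other status names, the `HttpMethod` enum, and both functions are hypothetical placeholders and do not reflect the actual HtHTTP or Transport code.

```cpp
enum HttpMethod {
    Method_HEAD,
    Method_GET
};

// Hypothetical subset of document statuses, for illustration only.
enum DocStatus {
    Document_ok,
    Document_method_not_allowed,
    Document_other_error
};

// Map an HTTP status code to the simplified status above.
DocStatus ClassifyResponse(int status_code) {
    if (status_code >= 200 && status_code < 300) return Document_ok;
    if (status_code == 405) return Document_method_not_allowed;
    return Document_other_error;
}

// A HEAD answered with "405 Method Not Allowed" says nothing about
// whether the document exists, so the resource is requested with GET.
bool ShouldRetryWithGet(HttpMethod method, DocStatus status) {
    return method == Method_HEAD && status == Document_method_not_allowed;
}
```

Keeping the retry decision separate from the status classification means other failures of a HEAD request would still be treated as before.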
|
From: Gabriele B. <bar...@in...> - 2003-10-24 00:54:21
|
>Gabriele: Did you fix this one already?
>[ 594790 ] rundig doesn't index Apache w/mod_zip

Yep ... and [828628] too.

Sorry, I had understood that someone other than me should close it after
testing it. For me they are both fixed and closed.

Thanks,
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Melbourne, Victoria, Australia
bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno
|
|
From: Gilles D. <gr...@sc...> - 2003-10-22 09:47:55
|
According to Gabriele Bartolini:
> However, I would pick these general cases, where the user should disable
> the attribute (please revise it):
>
> Case A - Persistent connections on
> 1) the majority of documents are HTML (this means we "always" want to GET them)
> 2) the server does not support HEAD (I have seen cases like this unfortunately)

OK, that sounds pretty important. I hadn't heard that one before.
Persistent connections are only on for HTTP/1.1 servers, so what you're
saying is that there are servers out there that claim to be 1.1 compliant
but don't support the HEAD request. Wouldn't this be an argument against
overriding head_before_get during an incremental dig?

> 3) cases where the persistent communication between htdig and the server
> does not work at 100%: there can be some problems with persistent
> connections and HEAD calls (I experience this kind of problems sometimes
> with ht://Check and some NT servers)

Again, is this going to be a problem if we don't allow turning off
head_before_get during an update dig?

> Case B - Persistent connection off
> 1) same as case A
> 2) same as case A

In this case, the server could be HTTP/1.1 or 1.0. Either way, the same
question applies. If the user needs a way to tell htdig to deal nicely
with these questionably compliant servers, then wouldn't they need a way
of turning off head_before_get unconditionally, whether it's an update
dig or an initial one?

This is what I was getting at before about this option never being
explained adequately. On the surface, it seemed to be rather useless,
but with these new revelations that have come out of your testing, it
seems there may indeed be a need for turning this off in some cases.
That's the sort of thing that should be documented so others (developers
and end-users) know what you'd use this for.

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
|
From: Neal R. <ne...@ri...> - 2003-10-20 22:46:43
|
The overall question I have is this (it was pointed out by someone in an
earlier mail):
Given that calling HEAD enables us to short-circuit files with invalid
mime-types.. isn't it nearly always beneficial to call HEAD, even when
doing an 'initial-dig'?
The answer to this question may influence your choice of what to commit,
but the description below looks good to me if we want to never call HEAD
during an initial dig.
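The short-circuit in question can be illustrated with a small sketch: a HEAD response already carries the Content-Type header, so the body transfer can be skipped when the type cannot be parsed. The rule used here (any `text/*` media type counts as parsable) is an assumption made for the example and not the actual ht://Dig acceptance logic, and the function name is hypothetical.

```cpp
#include <cctype>
#include <string>

// Hypothetical check on the Content-Type value returned by a HEAD call.
// Assumption: any text/* media type is worth fetching with GET.
bool IsParsableContentType(const std::string& content_type) {
    std::string media_type;
    for (std::string::size_type i = 0; i < content_type.size(); ++i) {
        const char c = content_type[i];
        if (c == ';') break;                   // drop parameters such as charset
        if (c == ' ' || c == '\t') continue;   // ignore stray whitespace
        media_type += static_cast<char>(
            std::tolower(static_cast<unsigned char>(c)));
    }
    return media_type.compare(0, 5, "text/") == 0;
}
```

On a site that is mostly images or archives, each rejected HEAD saves one full body transfer; on a site that is almost all HTML, the HEAD is an extra round trip per document, which is the trade-off debated in this thread.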
Thanks.
On Sun, 19 Oct 2003, Gabriele Bartolini wrote:
>
> > I think what we've had here is informative debate. You as much as
> >anyone else wrote the networking code, so for me it's your decision. I
> >think the new TRUE default is fine.
>
> OK. Any other opinions?
>
> > If you've perfected this logic in ht://Check, then we should probably
> >consider syncing with your net code after 3.2 is done.
>
> So ... is it ok for you guys if I go on with the Retriever, Document and
> HtHTTP patch as suggested in the previous e-mails?
>
> Basically, in order to perform always a HEAD call during an incremental
> indexing, I need to store the information in both the Retriever and
> Document class. Is that right for you? In particular, I suggest this enum:
>
> enum RetrieverType {
> Retriever_Initial,
> Retriever_Incremental
> };
>
> and then change the constructor this way:
>
> Retriever(RetrieverLog flags = Retriever_noLog, RetrieverType t =
> Retriever_Initial);
>
> In 'htdig.cc', we check whether the dig is an initial dig or not and:
>
> if(!initial) // Switch the retriever type to Incremental
> retriever_type = Retriever_Incremental;
>
> therefore, when we instantiate the main retriever object, we just simply
> add this:
>
> Retriever retriever(Retriever_logUrl, retriever_type);
>
> Please let me know.
>
> Ciao and thanks,
> -Gabriele
> --
> Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
> maintainer
> Current Location: Melbourne, Victoria, Australia
> bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> > "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
> Inferno
>
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Gabriele B. <bar...@in...> - 2003-10-21 08:53:08
|
I read my e-mail again and I think that I should have written this
sentence in another way:

>2) performing HEAD calls only in the incremental dig (either with or
>without persistent connections)

I meant: "in the incremental dig perform just HEAD calls". I guess you
guys understood: "HEAD is performed only in incremental digs". If so ... I
am sorry about that and my English.

Ciao
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Melbourne, Victoria, Australia
bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno
|