|
From: Neal R. <ne...@ri...> - 2003-09-16 22:09:03
|
I'm using a fairly recent CVS snapshot and see the following:

1) Index a website.
2) Use search terms to return a given page.
3) Take the links to that page out of the website.
4) Reindex the site (without htdig -i).
5) Repeat step 2.

I would think this page should be 'obsoleted' and not returned after the reindex, but it still is. htpurge does not touch these documents, since they haven't been marked as obsolete.

Thanks.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
From: Lachlan A. <lh...@us...> - 2003-09-20 13:59:57
|
Yes, that does sound like a bug! Well spotted, Neal!

However, if reindexing obsoletes *removed* pages correctly, then I'd vote not to delay the release to fix the behaviour for pages which still exist but "shouldn't" be found. Perhaps it would be best to file a bug report and fix it in 3.2.1...

Opinions?

Lachlan

-- 
lh...@us...
ht://Dig developer DownUnder (http://www.htdig.org)
|
From: Neal R. <ne...@ri...> - 2003-09-20 21:57:12
|
I've got a fix for it: a couple of lines of code in the section that builds the linked list of search results.

Neal Richter
|
From: Lachlan A. <lh...@us...> - 2003-09-23 13:25:07
|
On Sun, 21 Sep 2003 07:55, Neal Richter wrote:
> I've got a fix for it.. a couple lines of code in the section that
> builds the linked list of search results...

That sounds great. If it checks the search results, I take it that it doesn't purge the pages from the database itself. What is the patch?

Cheers,
Lachlan
|
From: Neal R. <ne...@ri...> - 2003-09-25 18:41:47
|
On Tue, 23 Sep 2003, Lachlan Andrew wrote:
> On Sun, 21 Sep 2003 07:55, Neal Richter wrote:
> > I've got a fix for it.. a couple lines of code in the section that
> > builds the linked list of search results...
>
> That sounds great. If it checks the search results, I take it that it
> doesn't purge the pages from the database itself. What is the patch?
Oops, I misspoke... I don't have a fix for that; it would take a walk
of the docdb to accomplish a fix! We would need to tag every document as
'obsolete' and let the spider set the values back to 'normal' as it
sees the pages. After it's finished and htpurge is run, the 'lost' pages
are killed.
I do have a short fix for another related issue. In Parser::parse(...)
there is a bug in this for loop:
    for (int i = 0; i < elements->Count(); i++)
    {
        dm = (DocMatch *) (*elements)[i];
        dm->collection = collection; // back reference
        if (dm->orMatches > 1)
            dm->score *= multimatch_factor;
        resultMatches.add(dm);
    }
If the query returned any documents with a DocState other than
Reference_normal, they are still included in the linked list of results.
They are only filtered out at display time, which is a bit inefficient. It
also breaks libhtdig results, since libhtdig doesn't use the display code.

Here's the fix; it excludes any document that is not Reference_normal from
the results list:
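    for (int i = 0; i < elements->Count(); i++)
    {
        dm = (DocMatch *) (*elements)[i];
        // Assumes getDocumentRef() returns a heap-allocated DocumentRef,
        // or NULL if the id is not in the docdb: check it, then free it.
        DocumentRef *ref = collection->getDocumentRef(dm->GetId());
        if (ref && ref->DocState() == Reference_normal)
        {
            dm->collection = collection; // back reference
            if (dm->orMatches > 1)
                dm->score *= multimatch_factor;
            resultMatches.add(dm);
        }
        delete ref;
    }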
Thanks
Neal Richter
|
From: Lachlan A. <lh...@us...> - 2003-09-26 11:39:58
|
That sounds like a good strategy. However, I'd vote for keeping that until after the release of 3.2.0 (or at least 3.2.0b5!).

Should we perhaps start a new branch in CVS so that development can continue? I have a couple of patches that I have been sitting on for ages, because of the pending release.

Any news on when the release is likely to be, or what I can do to expedite it? I plan to test and commit the patch which Jesses says works on HP-UX this weekend, unless we're already in code freeze.

Cheers,
Lachlan

On Fri, 26 Sep 2003 04:40, Neal Richter wrote:
> it would take a
> walk of the docdb to accomplish a fix! We would need to tag every
> document as 'obsolete' and let the spiderer set the values back to
> 'normal' as they see the pages. AFter its finished and htpurge is
> run, they 'lost' pages are killed.