Feature requested by Dave Skinner. Here is the note from
the list asking for the feature. We chatted in-house and
decided it makes sense:
Dave Skinner wrote:
> This was not in my list of three but it is easier to send....
>
> ExtractorUniversal.java contains the following check
>
> protected void innerProcess(CrawlURI curi) {
>     if(curi.hasBeenLinkExtracted()){
>         //Some other extractor already handled this one. We'll pass on it.
>         return;
>     }
>
> I think all the extractors should have the same or similar code. Right now
> it is not easy to prevent a curi from having its links followed. I cant
> find anywhere in the standard code where this is checked other than the one
> place in ExtractorUniversal.
We had a chat here and it makes sense that extractors
should, by default, not run if links have already been extracted.
Michael Stack
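
One way the requested default could look is sketched below. This is only an illustration under stated assumptions, not the change that was actually made: the CrawlURI class here is a tiny stand-in that models nothing but the flag, and GuardedExtractor, MarkingExtractor, extract(), the runs counter and the linkExtractorFinished() setter are names made up for the example. The idea is to put the hasBeenLinkExtracted() check in a shared base class so every extractor inherits it.

```java
// Minimal stand-in for the crawler's CrawlURI (assumption: only the
// "links already extracted" flag from the quoted check is modelled).
class CrawlURI {
    private final String uri;
    private boolean linkExtracted = false;

    CrawlURI(String uri) { this.uri = uri; }

    String getUri() { return uri; }

    boolean hasBeenLinkExtracted() { return linkExtracted; }

    // Hypothetical setter an extractor calls once it has handled the URI.
    void linkExtractorFinished() { linkExtracted = true; }
}

// Hypothetical base class: the guard lives here once, so subclasses
// skip already-extracted URIs by default.
abstract class GuardedExtractor {
    int runs = 0; // counts how often extract() was actually invoked

    protected final void innerProcess(CrawlURI curi) {
        if (curi.hasBeenLinkExtracted()) {
            // Some other extractor already handled this one; pass on it.
            return;
        }
        runs++;
        extract(curi);
    }

    // Subclasses put their format-specific link extraction here.
    protected abstract void extract(CrawlURI curi);
}

// Example subclass that does no real parsing and just marks the URI as done.
class MarkingExtractor extends GuardedExtractor {
    @Override
    protected void extract(CrawlURI curi) {
        curi.linkExtractorFinished();
    }
}
```

With this shape, running two MarkingExtractor instances over the same CrawlURI invokes extract() only in the first. Code that wants to prevent a curi from having its links followed could set the flag before any extractor runs.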
API
None
Public
Date: 2007-03-14 01:38
Date: 2005-01-28 19:13