Hello,
I just noticed that the latest modification of ExtractorHTML in CVS HEAD causes what are probably spurious alerts, for example:
> Problem java.lang.NullPointerException occured when trying to process 'http://www.nthposition.com/author.php?authid=281' at step ABOUT_TO_BEGIN_PROCESSOR in ExtractorHTML:
>
> Associated Throwable: java.lang.NullPointerException
>
> Stacktrace:
> java.lang.NullPointerException
> at org.archive.crawler.extractor.ExtractorHTML.innerProcess(ExtractorHTML.java:352)
> at org.archive.crawler.framework.Processor.process(Processor.java:102)
> at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:273)
> at org.archive.crawler.framework.ToeThread.run(ToeThread.java:143)
This is caused by the fact that curi.getHttpRecorder() may return null (its Javadoc says so), so an assignment such as
cs = curi.getHttpRecorder().getReplayCharSequence()
(ExtractorHTML, line 352) can throw a NullPointerException. Because this exception is no longer caught directly in ExtractorHTML, it propagates up through Processor.process() to ToeThread, triggering the alert shown above.
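The attached patch is not reproduced here. As an illustration only, the following is a minimal Java sketch of the kind of null guard the report calls for. StubHttpRecorder, StubCrawlUri and NullSafeExtractor are hypothetical stand-ins, not Heritrix classes: they model just the one fact the report relies on, namely that getHttpRecorder() may return null. The real innerProcess returns void; here it returns an int (assumed for illustration) so the skip path is observable.

```java
// Hypothetical stand-in for Heritrix's recorder: it only exposes the replay
// character sequence used at ExtractorHTML line 352.
final class StubHttpRecorder {
    private final CharSequence replay;

    StubHttpRecorder(CharSequence replay) {
        this.replay = replay;
    }

    CharSequence getReplayCharSequence() {
        return replay;
    }
}

// Hypothetical stand-in for CrawlURI: getHttpRecorder() may return null,
// as the Javadoc cited in the report warns.
final class StubCrawlUri {
    private final StubHttpRecorder recorder;

    StubCrawlUri(StubHttpRecorder recorder) {
        this.recorder = recorder;
    }

    StubHttpRecorder getHttpRecorder() {
        return recorder;
    }
}

final class NullSafeExtractor {
    // Unguarded chained call, mirroring the failing line: throws
    // NullPointerException whenever the URI has no recorder.
    static int unguardedLength(StubCrawlUri curi) {
        CharSequence cs = curi.getHttpRecorder().getReplayCharSequence();
        return cs.length();
    }

    // Guarded version: returns -1 (skip) when there is nothing to extract
    // from, instead of letting the NPE escape to the calling thread.
    static int innerProcess(StubCrawlUri curi) {
        StubHttpRecorder recorder = curi.getHttpRecorder();
        if (recorder == null) {
            return -1; // no recorder: skip extraction for this URI
        }
        CharSequence cs = recorder.getReplayCharSequence();
        if (cs == null) {
            return -1; // no replayable content either
        }
        return cs.length(); // placeholder for the actual link extraction
    }
}
```

Checking the recorder before dereferencing it keeps the "nothing to extract" case local to the extractor, so no alert is raised for it. Whether a missing recorder should be silently skipped or logged at a low level is a policy choice the sketch does not settle.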
A patch fixing this behaviour is attached.
Christian Kohlschütter
Nobody/Anonymous
Extraction
None
Public
Date: 2007-03-14 00:21

Date: 2005-02-21 13:42

Date: 2005-02-16 16:21
| Filename | Description |
|---|---|
| extractor-html-npe.patch | Bugfix |
| Field | Old Value | Date | By |
|---|---|---|---|
| status_id | Closed | 2005-02-21 13:43 | ck-heritrix |
| status_id | Open | 2005-02-16 16:21 | stack-sf |
| resolution_id | None | 2005-02-16 16:21 | stack-sf |
| close_date | - | 2005-02-16 16:21 | stack-sf |
| File Added | 120255: extractor-html-npe.patch | 2005-02-16 11:14 | ck-heritrix |