Heritrix: Internet Archive Web Crawler

Tracker: Bugs

NPE in FastBufferedOutputStream.close - ID: 1179530
Last Update: Comment added ( karl-ia )

Seen in Dan's Tsunami crawl:

Title: Problem occured processing 'http://www.bi.go.id/biweb/utama/peraturan/PBI-6-8-04.pdf'
Time: Apr. 8, 2005 13:23:56 GMT
Level: SEVERE
Message:

Problem java.lang.NullPointerException occured when trying to process
'http://www.bi.go.id/biweb/utama/peraturan/PBI-6-8-04.pdf' at step
ABOUT_TO_BEGIN_PROCESSOR in HTTP


Associated Throwable: java.lang.NullPointerException

Stacktrace:
java.lang.NullPointerException
    at it.unimi.dsi.mg4j.io.FastBufferedOutputStream.close(FastBufferedOutputStream.java:98)
    at org.archive.io.RecordingOutputStream.closeRecorder(RecordingOutputStream.java:286)
    at org.archive.io.RecordingOutputStream.close(RecordingOutputStream.java:281)
    at org.archive.io.RecordingInputStream.close(RecordingInputStream.java:137)
    at org.archive.util.HttpRecorder.close(HttpRecorder.java:180)
    at org.archive.crawler.fetcher.FetchHTTP.cleanup(FetchHTTP.java:527)
    at org.archive.crawler.fetcher.FetchHTTP.innerProcess(FetchHTTP.java:391)
    at org.archive.crawler.framework.Processor.process(Processor.java:103)
    at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:272)
    at org.archive.crawler.framework.ToeThread.run(ToeThread.java:142)


Seems odd. Code looks like this:

    public void close() throws IOException {
        if ( os == null ) return;
        if ( pos != 0 ) os.write( buffer, 0, pos );
        if ( os != System.out ) os.close(); // <== LINE 98.
        os = null;
        buffer = null;
    }

How could line 98 throw an NPE unless another thread
was also going through close at the same time?
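
To make the suspected interleaving concrete, here is a minimal sketch, not the mg4j or Heritrix code. UnsafeBufferedStream, PausingStream and raceThrowsNpe are hypothetical names invented for illustration; only the body of close() is copied from the snippet above. The underlying stream parks the first caller inside os.write() so that a second close() can run to completion and null out os before the first caller reaches the os.close() line.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

final class CloseRaceDemo {

    /** Hypothetical stand-in for the buffered stream; close() mirrors the quoted code. */
    static final class UnsafeBufferedStream {
        private OutputStream os;
        private byte[] buffer = new byte[16];
        private int pos = 1; // pretend one byte is still buffered

        UnsafeBufferedStream(OutputStream os) { this.os = os; }

        void close() throws IOException {
            if (os == null) return;
            if (pos != 0) os.write(buffer, 0, pos);
            if (os != System.out) os.close(); // NPE if os was nulled by another thread
            os = null;
            buffer = null;
        }
    }

    /** Hypothetical underlying stream that parks the first writer until released. */
    static final class PausingStream extends OutputStream {
        final CountDownLatch firstWriterInside = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);
        private final AtomicBoolean first = new AtomicBoolean(true);

        @Override public void write(int b) { }

        @Override public void write(byte[] b, int off, int len) {
            if (first.compareAndSet(true, false)) {
                firstWriterInside.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

    /** Forces the interleaving and reports whether the first closer hit an NPE. */
    static boolean raceThrowsNpe() {
        PausingStream under = new PausingStream();
        UnsafeBufferedStream stream = new UnsafeBufferedStream(under);
        AtomicReference<Throwable> failure = new AtomicReference<>();

        Thread a = new Thread(() -> {
            try {
                stream.close();
            } catch (Throwable t) {
                failure.set(t);
            }
        });
        try {
            a.start();
            under.firstWriterInside.await(); // A is past the null check, parked in write()
            stream.close();                  // second closer finishes and sets os = null
            under.release.countDown();       // let A continue to the os.close() line
            a.join();
        } catch (InterruptedException | IOException e) {
            throw new IllegalStateException(e);
        }
        return failure.get() instanceof NullPointerException;
    }

    public static void main(String[] args) {
        System.out.println("first closer hit NPE: " + raceThrowsNpe());
    }
}
```

The latches only make the timing deterministic; in a real crawl the same ordering would have to happen by chance, which would fit the exceptions showing up rarely and hours apart.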


Submitted: Michael Stack ( stack-sf ) - 2005-04-08 23:16
Priority: 7
Status: Closed
Resolution: Fixed
Assigned: Nobody/Anonymous
Category: None
Group: 1.4.2
Visibility: Public


Comments ( 5 )

Date: 2007-03-14 00:22
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-394 -- please add further
comments at that location.


Date: 2005-04-19 00:08
Sender: stack-sf (Project Admin)

Closing. Haven't seen it since adding synchronization on
closeRecorder (will reopen if I see it again).


Date: 2005-04-12 18:31
Sender: stack-sf (Project Admin)

Added a synchronized close on the disk stream to see if it fixes
this issue. Below is the commit. If that doesn't work, look in the
logs for why we're failing the fetch. Might give a clue as to
what's happening.

[debord 826] heritrix > more /tmp/diff.txt
Attempt at a 'fix' for '[ 1179530 ] NPE in FastBufferedOutputStream.close'.
* src/java/org/archive/io/RecordingOutputStream.java
    Formatting.
    (open): Added close of stream if nonnull (Should never be null here).
    (closeDiskStream): Added method that closes the diskstream inside a
    synchronized block. Shouldn't be necessary but multiple threads
    closing is only way I can explain NPE in FBOS. Code looks like this:
        public void close() throws IOException {
            if ( os == null ) return;
            if ( pos != 0 ) os.write( buffer, 0, pos );
            if ( os != System.out ) os.close(); // <== LINE 98.
            os = null;
            buffer = null;
        }
* src/conf/heritrix.properties
    Added commented out servercache logging level of FINE.
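
For readers without the diff at hand, a rough sketch of what a synchronized close of this kind might look like. This is not the committed code: the Recorder and CountingStream classes, the diskStream field, and the choice to null the field after closing are all assumptions made for illustration. Only the idea of closing the disk stream inside a synchronized block comes from the commit message.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicInteger;

final class SynchronizedCloseSketch {

    /** Hypothetical recorder that owns a disk stream. */
    static final class Recorder {
        private OutputStream diskStream;

        Recorder(OutputStream diskStream) { this.diskStream = diskStream; }

        /** Only one thread at a time can check and close the stream. */
        void closeDiskStream() throws IOException {
            synchronized (this) {
                if (diskStream != null) {
                    diskStream.close();
                    diskStream = null; // assumption: later callers should see "already closed"
                }
            }
        }
    }

    /** Hypothetical underlying stream that counts how often close() is called. */
    static final class CountingStream extends OutputStream {
        final AtomicInteger closes = new AtomicInteger();

        @Override public void write(int b) { }

        @Override public void close() { closes.incrementAndGet(); }
    }

    /** Calls closeDiskStream() from several threads; returns the underlying close() count. */
    static int closesAfterConcurrentCalls(int threadCount) {
        CountingStream under = new CountingStream();
        Recorder recorder = new Recorder(under);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                try {
                    recorder.closeDiskStream();
                } catch (IOException e) {
                    throw new IllegalStateException(e);
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return under.closes.get();
    }

    public static void main(String[] args) {
        System.out.println("underlying close() calls: " + closesAfterConcurrentCalls(8));
    }
}
```

Because the null check and the close happen under one lock, the underlying stream is entered by at most one closer, so the wrapped close() never sees its fields change underneath it.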



Date: 2005-04-12 17:04
Sender: stack-sf (Project Admin)

Tom Emerson has seen the same exception. From the list:

Hi all,

I received a couple of NPEs in the new UbiCrawler-integrated code. The
crawl continues, apparently just fine. But I thought I'd pass one along:

Title: Problem occured processing 'http://www.stats.gov.cn/tjsj/qtsj/gjsj/1998/t20020523_402191369.htm'
Time: Apr. 12, 2005 13:47:29 GMT
Level: SEVERE
Message:

Problem java.lang.NullPointerException occured when trying to process
'http://www.stats.gov.cn/tjsj/qtsj/gjsj/1998/t20020523_402191369.htm' at step
ABOUT_TO_BEGIN_PROCESSOR in HTTP


Associated Throwable: java.lang.NullPointerException

Stacktrace:
java.lang.NullPointerException
    at it.unimi.dsi.mg4j.io.FastBufferedOutputStream.close(FastBufferedOutputStream.java:98)
    at org.archive.io.RecordingOutputStream.closeRecorder(RecordingOutputStream.java:286)
    at org.archive.io.RecordingOutputStream.close(RecordingOutputStream.java:281)
    at org.archive.io.RecordingInputStream.close(RecordingInputStream.java:137)
    at org.archive.util.HttpRecorder.close(HttpRecorder.java:180)
    at org.archive.crawler.fetcher.FetchHTTP.cleanup(FetchHTTP.java:534)
    at org.archive.crawler.fetcher.FetchHTTP.innerProcess(FetchHTTP.java:398)
    at org.archive.crawler.framework.Processor.process(Processor.java:103)
    at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:272)
    at org.archive.crawler.framework.ToeThread.run(ToeThread.java:142)


Date: 2005-04-08 23:19
Sender: stack-sf (Project Admin)

There are a couple of these in Dan's Tsunami crawl on
crawling001, hours apart.


Changes ( 5 )

Field              Old Value  Date              By
artifact_group_id  None       2005-09-23 18:24  gojomo
status_id          Open       2005-04-19 00:08  stack-sf
resolution_id      None       2005-04-19 00:08  stack-sf
close_date         -          2005-04-19 00:08  stack-sf
priority           5          2005-04-12 18:31  stack-sf