Heritrix: Internet Archive Web Crawler

Tracker: Bugs

7 ARCReader#read(byte [], off, len) broke for non-null offset - ID: 1475798
Last Update: Comment added ( karl-ia )

See below from Brad:

I'm seeing a problem that I think is in ARCRecord
because of a weird use with a BufferedInputStream,
details in stack trace:


INFO: closed..(org.archive.io.arc.ARCReaderFactory$CompressedARCReader@a205d2)
java.lang.IndexOutOfBoundsException
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:122)
    at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:87)
    at org.archive.io.arc.ARCRecord.read(ARCRecord.java:410)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
    at org.archive.wayback.core.Resource.read(Resource.java:195)
    at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
    at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
    at java.io.InputStreamReader.read(InputStreamReader.java:167)
    at org.archive.wayback.archivalurl.JSReplayRenderer.renderResource(JSReplayRenderer.java:165)
    at org.archive.wayback.timeline.TimelineReplayRenderer.renderResource(TimelineReplayRenderer.java:78)
    at org.archive.wayback.replay.ReplayServlet.doGet(ReplayServlet.java:169)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.archive.wayback.core.RequestFilter.doFilter(RequestFilter.java:87)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:868)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:663)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
    at java.lang.Thread.run(Thread.java:595)


I've wrapped the ARCRecord (which is an InputStream) within a
BufferedInputStream, which is wrapped within an
InputStreamReader, which I'm then calling read() on.

I've got this setup because I want to mark() the
stream, read the first 4K bytes from the ARCRecord,
then reset() the stream. Then I hand off the 4K bytes
to a character set detection module, and then wrap the
BufferedInputStream in an InputStreamReader to get out
characters, which I then append to a StringBuffer...

Whew. Part of the problem *may* be that the default
buffer size for a BufferedInputStream is 8K,
whereas I'm only using a readLimit of 4K with the mark()
call.
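
For reference, the mark()/reset() setup described above can be sketched
roughly as follows. This is only an illustration under assumptions:
CharsetSniffSketch, readAll and detectCharset are made-up names, the
detection step is a placeholder that always answers UTF-8, and a
ByteArrayInputStream stands in for the ARCRecord.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; none of these names come from Heritrix or Wayback.
final class CharsetSniffSketch {
    static final int SNIFF_LIMIT = 4096; // the 4K readLimit from the report

    // Hypothetical placeholder for a real character set detection module.
    static Charset detectCharset(byte[] sample, int len) {
        return StandardCharsets.UTF_8;
    }

    static String readAll(InputStream record) throws IOException {
        // Default BufferedInputStream buffer size is 8192 bytes.
        BufferedInputStream in = new BufferedInputStream(record);

        // Remember the current position, then peek at up to 4K bytes.
        in.mark(SNIFF_LIMIT);
        byte[] sample = new byte[SNIFF_LIMIT];
        int n = 0;
        while (n < SNIFF_LIMIT) {
            int r = in.read(sample, n, SNIFF_LIMIT - n);
            if (r < 0) {
                break; // end of stream before 4K
            }
            n += r;
        }
        // Rewind so the decoder sees the stream from the start.
        in.reset();

        Charset charset = detectCharset(sample, n);
        Reader reader = new InputStreamReader(in, charset);
        StringBuilder out = new StringBuilder();
        char[] buf = new char[1024];
        for (int r; (r = reader.read(buf)) != -1; ) {
            out.append(buf, 0, r);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello, archive".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(data)));
    }
}
```

With a well-behaved underlying stream this pattern is fine with a 4K
readLimit and the default 8K buffer, since reset() only needs the marked
bytes to still be in the buffer.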

The arguments to ARCRecord that seem to go awry are:

public int read(byte [] b, int offset, int length)
throws IOException {...

with an 8K byte[], offset = 6935, and length = 1257,
which all seem like reasonable values.

But this call is then invoking, at line 410:

read = this.in.read(b, offset, read);

with a read value of -5678, (offset is still 6935)
which is causing the IndexOutOfBoundsException deeper down.
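
To illustrate why a negative count fails "deeper down", here is a small
sketch using a plain ByteArrayInputStream in place of the GZIP stream.
NegativeLengthDemo and throwsOnNegativeLength are made-up names; the
values are the ones quoted above. The InputStream#read(byte[], int, int)
contract rejects a negative length with IndexOutOfBoundsException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch only; a ByteArrayInputStream stands in for the real stream.
final class NegativeLengthDemo {
    // Returns true if read(b, off, len) is rejected with IndexOutOfBoundsException.
    static boolean throwsOnNegativeLength(InputStream in, byte[] b, int off, int len)
            throws IOException {
        try {
            in.read(b, off, len);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[16384]);
        byte[] b = new byte[8192];
        // Same shape as the failing call: offset 6935, count -5678.
        System.out.println(throwsOnNegativeLength(in, b, 6935, -5678)); // prints true
        // The values originally requested by the caller are accepted.
        System.out.println(throwsOnNegativeLength(in, b, 6935, 1257)); // prints false
    }
}
```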

I think the problem is at line 406:

read = Math.min(length - offset, available());

perhaps:

read = Math.min(length, available());

may be the right thing. Looked better to GJM and me, but
we wanted to pass this up to you. If this is the problem,
line 395 may have the same issue:

read = Math.min(length - offset,
this.httpHeaderStream.available());
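
A minimal sketch of the arithmetic and of the suggested change, assuming
a simplified record stream. BoundedRecordSketch is a hypothetical
stand-in and not the actual ARCRecord code; it only shows that the count
should be clamped against length, since offset is a position in the
destination array and not part of the number of bytes wanted.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for a length-bounded record stream; not the real ARCRecord.
final class BoundedRecordSketch extends InputStream {
    private final InputStream in;
    private long remaining; // bytes left in this record

    BoundedRecordSketch(InputStream in, long recordLength) {
        this.in = in;
        this.remaining = recordLength;
    }

    @Override
    public int available() {
        return (int) Math.min(remaining, Integer.MAX_VALUE);
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1;
        }
        int c = in.read();
        if (c >= 0) {
            remaining--;
        }
        return c;
    }

    @Override
    public int read(byte[] b, int offset, int length) throws IOException {
        if (length == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        // Clamp the byte count only; offset just says where in b to write.
        int count = Math.min(length, available());
        int read = in.read(b, offset, count);
        if (read > 0) {
            remaining -= read;
        }
        return read;
    }

    // The expression from the report, kept for comparison.
    static int buggyCount(int offset, int length, int available) {
        return Math.min(length - offset, available);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(buggyCount(6935, 1257, 10000)); // prints -5678
        InputStream record =
                new BoundedRecordSketch(new ByteArrayInputStream(new byte[20000]), 10000);
        byte[] b = new byte[8192];
        System.out.println(record.read(b, 6935, 1257)); // prints 1257
    }
}
```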

See ya!

Brad


Submitted: Michael Stack ( stack-sf ) - 2006-04-24 21:19
Priority: 7
Status: Closed
Resolution: Fixed
Assigned: Michael Stack
Category: Disk I/O
Group: 1.10.0
Visibility: Public


Comments ( 3 )

Date: 2007-03-14 01:05
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-554 -- please add further
comments at that location.


Date: 2006-05-15 22:00
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Applied. Closing. Commit below.

Fix by Brad for '[ 1475798 ] ARCReader#read(byte [], off, len) broke for
non-null offset'.
* src/java/org/archive/io/arc/ARCRecord.java
  Fix incorrect calculating a negative count to read from buffer.
* src/java/org/archive/io/arc/ARCWriterTest.java
  Added test.



Date: 2006-04-24 21:24
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Added patch and unit test to fix the above (the patch makes the change
suggested by Brad).

Asking Brad to review.


Attached File ( 1 )

Filename         Description
arcrecord.patch  Patch I forgot to add earlier

Changes ( 5 )

Field              Old Value                Date              By
status_id          Open                     2006-05-15 22:10  stack-sf
close_date         -                        2006-05-15 22:10  stack-sf
artifact_group_id  None                     2006-05-15 22:00  stack-sf
resolution_id      None                     2006-05-15 22:00  stack-sf
File Added         175680: arcrecord.patch  2006-04-24 23:12  stack-sf