Heritrix: Internet Archive Web Crawler

Tracker: Bugs

5 IllegalCharsetNameException: Windows-1256 - ID: 938591
Last Update: Comment added ( karl-ia )

The following was reported by Igor:

Title: RuntimeException occurred processing 'http://www.tunezine.com/'
Time: Apr. 9, 2004 19:07:59 GMT
Level: SEVERE
Message:

The following RuntimeException occurred while trying to process 'http://www.tunezine.com/':


Associated Throwable:
java.nio.charset.IllegalCharsetNameException: Windows-1256 \r\n

Message:
Windows-1256 \r\n

Stacktrace:
java.nio.charset.IllegalCharsetNameException: Windows-1256 \r\n
    at java.nio.charset.Charset.checkName(Charset.java:294)
    at java.nio.charset.Charset.lookup(Charset.java(Compiled Code))
    at java.nio.charset.Charset.forName(Charset.java(Inlined Compiled Code))
    at java.lang.StringCoding$EncoderCache.makeEncoder(StringCoding.java(Compiled Code))
    at java.lang.StringCoding$2.run(StringCoding.java(Compiled Code))
    at java.security.AccessController.doPrivileged(Native Method)
    at java.lang.StringCoding$EncoderCache.getEncoder(StringCoding.java(Compiled Code))
    at java.lang.StringCoding.getEncoder(StringCoding.java(Inlined Compiled Code))
    at java.lang.StringCoding.encode(StringCoding.java(Compiled Code))
    at java.lang.String.getBytes(String.java(Inlined Compiled Code))
    at org.archive.io.ReplayCharSequenceFactory.isMultibyteEncoding(ReplayCharSequenceFactory.java(Compiled Code))
    at org.archive.io.ReplayCharSequenceFactory.getReplayCharSequence(ReplayCharSequenceFactory.java(Compiled Code))
    at org.archive.io.RecordingOutputStream.getReplayCharSequence(RecordingOutputStream.java(Compiled Code))
    at org.archive.io.RecordingInputStream.getReplayCharSequence(RecordingInputStream.java(Inlined Compiled Code))
    at org.archive.util.HttpRecorder.getReplayCharSequence(HttpRecorder.java(Inlined Compiled Code))
    at org.archive.crawler.extractor.ExtractorHTML.innerProcess(ExtractorHTML.java(Compiled Code))
    at org.archive.crawler.framework.Processor.process(Processor.java(Compiled Code))
    at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java(Compiled Code))
    at org.archive.crawler.framework.ToeThread.run(ToeThread.java(Compiled Code))


Submitted by: Michael Stack ( stack-sf ) - 2004-04-20 14:14
Priority: 5
Status: Closed
Resolution: Fixed
Assigned to: Michael Stack
Category: Extraction
Group: None
Visibility: Public


Comments (2)

Date: 2007-03-14 00:10
Sender: karl-ia


This issue has moved to the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-121; please add further
comments there.


Date: 2004-04-20 14:17
Sender: stack-sf (Project Admin)

Windows-1256 is a legitimate encoding (Arabic). It's the trailing
'\r\n' that was doing us in. Added a trim.

Closing.
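The failure and the trim fix can be reproduced in isolation. `Charset.forName` only accepts letters, digits and a few punctuation characters in a charset name, so the space and CR/LF in "Windows-1256 \r\n" trigger `IllegalCharsetNameException`, while the trimmed name resolves normally. The sketch below is a minimal illustration under stated assumptions, not the actual Heritrix change: the class `CharsetNameTrim`, the helper `lookupCharset`, and the fallback-charset behaviour are all hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

public class CharsetNameTrim {

    // Hypothetical helper (not the Heritrix patch): normalize a charset name
    // taken from crawled content before looking it up.
    static Charset lookupCharset(String rawName, Charset fallback) {
        if (rawName == null) {
            return fallback;
        }
        // String.trim() strips leading/trailing chars <= U+0020, which covers
        // the stray " \r\n" that Charset.checkName() rejected.
        String name = rawName.trim();
        try {
            return Charset.forName(name);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Still malformed after trimming, or not installed in this JRE.
            return fallback;
        }
    }

    public static void main(String[] args) {
        String raw = "Windows-1256 \r\n";

        // Untrimmed name: reproduces the exception from the report.
        try {
            Charset.forName(raw);
            System.out.println("unexpected: lookup succeeded");
        } catch (IllegalCharsetNameException e) {
            System.out.println("untrimmed name rejected: " + e.getClass().getSimpleName());
        }

        // Trimmed name: resolves to windows-1256 if the JRE ships that charset,
        // otherwise the assumed fallback is returned.
        Charset cs = lookupCharset(raw, StandardCharsets.ISO_8859_1);
        System.out.println("resolved to: " + cs.name());
    }
}
```

Returning a fallback instead of throwing is one design choice among several; it keeps a single bad encoding label from aborting extraction for the whole URI, at the cost of possibly decoding the page with the wrong charset.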


Changes (3)

Field           Old Value   Date               By
status_id       Open        2004-04-20 14:17   stack-sf
resolution_id   None        2004-04-20 14:17   stack-sf
close_date      -           2004-04-20 14:17   stack-sf