Whilst looking at the legacy files which have been converted into 'modern'
formats, I came across several plaintext documents encoded in GB18030.
Xena spat the dummy when trying to normalise them. Console output below...
Not sure if this is a bug or a feature request, but it would be nice to be
able to handle documents with a non-standard character set.
------------------------------------------------
FINEST: XIS file:/T:/Legacy%20-%20format%20converted/C379P1/PHASE_3/MEDIA/053/379P153A.03D guessed as type PlainText
22/10/2008 09:48:28 au.gov.naa.digipres.xena.kernel.guesser.GuesserManager getBestGuess
FINER: Exception thrown in guesser NonStandardPlainTextGuesser
au.gov.naa.digipres.xena.kernel.XenaException: java.io.UnsupportedEncodingException: GB18030
    at au.gov.naa.digipres.xena.plugin.plaintext.NonStandardPlainTextGuesser.guess(NonStandardPlainTextGuesser.java:98)
    at au.gov.naa.digipres.xena.kernel.guesser.GuesserManager.getBestGuess(GuesserManager.java:358)
    at au.gov.naa.digipres.xena.kernel.guesser.GuesserManager.mostLikelyType(GuesserManager.java:262)
    at au.gov.naa.digipres.xena.core.Xena.getMostLikelyType(Xena.java:258)
    at au.gov.naa.digipres.xena.core.Xena.getMostLikelyType(Xena.java:243)
    at au.gov.naa.digipres.xena.litegui.NormalisationThread.setTypes(NormalisationThread.java:419)
    at au.gov.naa.digipres.xena.litegui.NormalisationThread.normaliseStandard(NormalisationThread.java:196)
    at au.gov.naa.digipres.xena.litegui.NormalisationThread.run(NormalisationThread.java:144)
Caused by: java.io.UnsupportedEncodingException: GB18030
    at sun.nio.cs.StreamDecoder.forInputStreamReader(Unknown Source)
    at java.io.InputStreamReader.<init>(Unknown Source)
    at au.gov.naa.digipres.xena.plugin.plaintext.NonStandardPlainTextGuesser.guess(NonStandardPlainTextGuesser.java:91)
    ... 7 more
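
The trace shows the `InputStreamReader` constructor rejecting the charset name that was passed to it. One possible way to avoid the exception is to check whether the running JVM actually provides the detected charset before opening the reader, and to fall back to a single-byte charset otherwise. The sketch below is only an illustration of that idea, not Xena's code: the class `CharsetFallback`, its methods `resolve` and `open`, and the choice of ISO-8859-1 as the fallback are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Xena.
final class CharsetFallback {

    /** Returns the named charset if this JVM provides it, otherwise the fallback. */
    static Charset resolve(String name, Charset fallback) {
        try {
            if (name != null && Charset.isSupported(name)) {
                return Charset.forName(name);
            }
        } catch (IllegalCharsetNameException e) {
            // The name is not even syntactically legal; use the fallback.
        }
        return fallback;
    }

    /** Opens a reader using the detected charset name, or ISO-8859-1 (assumed fallback). */
    static Reader open(InputStream in, String detectedName) {
        return new InputStreamReader(in, resolve(detectedName, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "plain text".getBytes(StandardCharsets.US_ASCII);
        // "GB18030" stands in for whatever name the encoding detector reported.
        try (Reader r = open(new ByteArrayInputStream(sample), "GB18030")) {
            System.out.println("first char code: " + r.read());
        }
    }
}
```

Falling back silently can hide mis-decoded text, so a real guesser would probably also want to log which charset was requested and which one was used.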
Michael Carden
Date: 2009-11-06 03:05 I don't think this file can be considered as plaintext. I'm currently
Date: 2008-10-23 02:32 From the console output, I assume that the file was guessed as Non-Standard
Date: 2008-10-23 01:25 On my machine Xena identifies the encoding as GB18030, reads a set of
Date: 2008-10-23 01:15 I looked at the list of supported encodings in Java and GB18030 should be
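
Since the comments report different behaviour on different machines, it may help to ask each JVM directly which charsets it provides. The Java platform only guarantees a small set of standard charsets, so the rest can vary between installations. A minimal diagnostic sketch follows; the class name `CharsetProbe` and the method `isListed` are made up for this example.

```java
import java.nio.charset.Charset;
import java.util.SortedMap;

// Hypothetical diagnostic, not part of Xena.
final class CharsetProbe {

    /** True if this JVM lists a charset under the given canonical name. */
    static boolean isListed(String canonicalName) {
        SortedMap<String, Charset> all = Charset.availableCharsets();
        return all.containsKey(canonicalName);
    }

    public static void main(String[] args) {
        // Defaults to the charset from the report; pass another name as an argument.
        String name = args.length > 0 ? args[0] : "GB18030";
        System.out.println(name + " listed: " + isListed(name));
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

Running this with the same Java installation that launches Xena on the affected machine would show whether GB18030 is missing from that JVM or whether the problem lies elsewhere.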
| Filename | Description |
|---|---|
| 379P153A.040 | File encoded in GB18030 |

| Field | Old Value | Date | By |
|---|---|---|---|
| assigned_to | jwaddell | 2009-11-06 03:05 | jwaddell |
| resolution_id | Accepted | 2009-11-06 03:05 | jwaddell |
| priority | 5 | 2008-10-23 02:32 | vombatus |
| resolution_id | None | 2008-10-23 01:15 | jwaddell |
| assigned_to | nobody | 2008-10-23 01:15 | jwaddell |
| File Added | 298491: 379P153A.040 | 2008-10-23 00:41 | vombatus |