From: <yos...@us...> - 2012-04-20 17:01:38
According to the call stack you quoted, the problem is in CharsetDetector, not in the charset converter. The bug was introduced in the ICU4J 49.1 release and is tracked as ticket #9267 [http://bugs.icu-project.org/trac/ticket/9267]. It has been fixed in trunk, and we'll include the fix in a future ICU4J 49 maintenance release. If you need an immediate fix, please apply this patch to your ICU4J source:

http://bugs.icu-project.org/trac/changeset?format=diff&new=31740&old=31534&new_path=%2Ficu4j%2Ftrunk%2Fmain%2Fclasses%2Fcore%2Fsrc%2Fcom%2Fibm%2Ficu%2Ftext%2FCharsetRecog_sbcs.java&old_path=%2Ficu4j%2Ftrunk%2Fmain%2Fclasses%2Fcore%2Fsrc%2Fcom%2Fibm%2Ficu%2Ftext%2FCharsetRecog_sbcs.java

Thanks,
Yoshito

FuRaNu <fu...@gm...> wrote on 04/20/2012 05:27:06 AM:

> From: FuRaNu <fu...@gm...>
> To: icu...@li...,
> Date: 04/20/2012 05:31 AM
> Subject: [icu-support] Can't transcode large UTF-8 text files
> (ArrayIndexOutOfBoundsException) using ICU4J
>
> Hello,
>
> I'm trying to detect the charset of a text file and convert it to
> Unicode using one of these methods:
> - getReader(InputStream in, String declaredEncoding)
> - getString(byte[] in, String declaredEncoding)
>
> When I convert a small text file (UTF-8 or ANSI), everything works,
> but when I try it with a larger file I get an
> ArrayIndexOutOfBoundsException (only when the source is encoded in UTF-8).
>
> I tried to write a workaround that doesn't re-encode UTF-8 input,
> but the detect() method doesn't work with large UTF-8 files either.
>
> I have included all the specific information about my problem at
> the bottom of this message.
>
> Does anyone know whether this is a bug, or whether there is a way
> to avoid this error?
>
> Thank you very much.
>
> -------------------------------
>
> Here are two text files where you can see the problem (note that if
> you delete text to make these files smaller, everything works as it
> should):
> http://www.gutenberg.org/ebooks/2000.txt
> http://www.gutenberg.org/ebooks/17073.txt
> -------------------------------
>
> Here is my code:
> String text = "";
> BufferedInputStream in = new BufferedInputStream(
>         new FileInputStream(path));
>
> CharsetDetector charsetDetector = new CharsetDetector();
>
> Reader reader = charsetDetector.getReader(in, null); // <-- fails here
>
> int i;
>
> i = reader.read();
>
> while (i != -1) {
>     text += (char) i;
>     i = reader.read();
> }
>
> in.close();
> -------------------------------
>
> Here is the complete stack trace of the exception:
> java.lang.ArrayIndexOutOfBoundsException
> at java.lang.System.arraycopy(Native Method)
> at com.ibm.icu.text.CharsetRecog_sbcs$CharsetRecog_IBM420_ar.matchInit(CharsetRecog_sbcs.java:1185)
> at com.ibm.icu.text.CharsetRecog_sbcs$CharsetRecog_IBM420_ar_rtl.match(CharsetRecog_sbcs.java:1254)
> at com.ibm.icu.text.CharsetDetector.detectAll(CharsetDetector.java:196)
> at com.ibm.icu.text.CharsetDetector.detect(CharsetDetector.java:159)
> at educrypt.commons.io.Input.readTextFile(Input.java:78)
> at educrypt.commons.io.Input.readTextFromFile(Input.java:51)
> at educrypt.gui.ctr.AnalysisCtr.processOpenFile(AnalysisCtr.java:220)
> at educrypt.gui.ctr.AnalysisCtr.actionPerformed(AnalysisCtr.java:191)
> at javax.swing.JFileChooser.fireActionPerformed(Unknown Source)
> at javax.swing.JFileChooser.approveSelection(Unknown Source)
> at educrypt.gui.components.EducryptFileChooser.approveSelection(EducryptFileChooser.java:63)
> at javax.swing.plaf.basic.BasicFileChooserUI$Handler.mouseClicked(Unknown Source)
> at sun.swing.FilePane$Handler.mouseClicked(Unknown Source)
> at java.awt.AWTEventMulticaster.mouseClicked(Unknown Source)
> at java.awt.Component.processMouseEvent(Unknown Source)
> at javax.swing.JComponent.processMouseEvent(Unknown Source)
> at java.awt.Component.processEvent(Unknown Source)
> at java.awt.Container.processEvent(Unknown Source)
> at java.awt.Component.dispatchEventImpl(Unknown Source)
> at java.awt.Container.dispatchEventImpl(Unknown Source)
> at java.awt.Component.dispatchEvent(Unknown Source)
> at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
> at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
> at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
> at java.awt.Container.dispatchEventImpl(Unknown Source)
> at java.awt.Window.dispatchEventImpl(Unknown Source)
> at java.awt.Component.dispatchEvent(Unknown Source)
> at java.awt.EventQueue.dispatchEventImpl(Unknown Source)
> at java.awt.EventQueue.access$000(Unknown Source)
> at java.awt.EventQueue$1.run(Unknown Source)
> at java.awt.EventQueue$1.run(Unknown Source)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.security.AccessControlContext$1.doIntersectionPrivilege(Unknown Source)
> at java.security.AccessControlContext$1.doIntersectionPrivilege(Unknown Source)
> at java.awt.EventQueue$2.run(Unknown Source)
> at java.awt.EventQueue$2.run(Unknown Source)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.security.AccessControlContext$1.doIntersectionPrivilege(Unknown Source)
> at java.awt.EventQueue.dispatchEvent(Unknown Source)
> at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
> at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
> at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
> at java.awt.Dialog$1.run(Unknown Source)
> at java.awt.Dialog$3.run(Unknown Source)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.awt.Dialog.show(Unknown Source)
> at java.awt.Component.show(Unknown Source)
> at java.awt.Component.setVisible(Unknown Source)
> at java.awt.Window.setVisible(Unknown Source)
> at java.awt.Dialog.setVisible(Unknown Source)
> at educrypt.gui.components.EducryptDialog.open(EducryptDialog.java:48)
> at educrypt.gui.components.LoadFileDialog.open(LoadFileDialog.java:58)
> at educrypt.gui.components.TextPanel.showFileChooser(TextPanel.java:78)
> at educrypt.gui.components.TextPanel$1.mouseClicked(TextPanel.java:67)
> at java.awt.AWTEventMulticaster.mouseClicked(Unknown Source)
> at java.awt.Component.processMouseEvent(Unknown Source)
> at javax.swing.JComponent.processMouseEvent(Unknown Source)
> at java.awt.Component.processEvent(Unknown Source)
> at java.awt.Container.processEvent(Unknown Source)
> at java.awt.Component.dispatchEventImpl(Unknown Source)
> at java.awt.Container.dispatchEventImpl(Unknown Source)
> at java.awt.Component.dispatchEvent(Unknown Source)
> at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
> at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
> at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
> at java.awt.Container.dispatchEventImpl(Unknown Source)
> at java.awt.Window.dispatchEventImpl(Unknown Source)
> at java.awt.Component.dispatchEvent(Unknown Source)
> at java.awt.EventQueue.dispatchEventImpl(Unknown Source)
> at java.awt.EventQueue.access$000(Unknown Source)
> at java.awt.EventQueue$1.run(Unknown Source)
> at java.awt.EventQueue$1.run(Unknown Source)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.security.AccessControlContext$1.doIntersectionPrivilege(Unknown Source)
> at java.security.AccessControlContext$1.doIntersectionPrivilege(Unknown Source)
> at java.awt.EventQueue$2.run(Unknown Source)
> at java.awt.EventQueue$2.run(Unknown Source)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.security.AccessControlContext$1.doIntersectionPrivilege(Unknown Source)
> at java.awt.EventQueue.dispatchEvent(Unknown Source)
> at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
> at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
> at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
> at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
> at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
> at java.awt.EventDispatchThread.run(Unknown Source)
> -------------------------------
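Applying the patch, or upgrading once the maintenance release is out, is the actual fix. Until then, one possible stopgap along the lines FuRaNu describes is to check with the plain JDK whether the bytes are already well-formed UTF-8, and call CharsetDetector only when they are not, so the failing recognizer is never reached for valid UTF-8 input. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the whole file is assumed to fit in memory, and the ICU4J fallback is left as a comment so the example has no external dependency. A strict UTF-8 check is a heuristic: a file in another encoding whose bytes happen to form valid UTF-8 would be accepted.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper: decode a file as UTF-8 when it is well-formed UTF-8,
// and only fall back to charset detection when it is not.
class Utf8FirstReader {

    /**
     * Decodes the given bytes as UTF-8 with strict error handling.
     * Returns the text (minus a leading byte-order mark, if any) when the
     * input is well-formed UTF-8, or an empty Optional otherwise.
     */
    static Optional<String> decodeStrictUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            String text = decoder.decode(ByteBuffer.wrap(bytes)).toString();
            // Java's UTF-8 decoder keeps a leading BOM as U+FEFF; drop it.
            if (text.startsWith("\uFEFF")) {
                text = text.substring(1);
            }
            return Optional.of(text);
        } catch (CharacterCodingException e) {
            // Malformed UTF-8: the caller should detect the charset instead.
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Utf8FirstReader <file>");
            return;
        }
        // Reads the whole file into memory; fine for book-sized texts.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        Optional<String> text = decodeStrictUtf8(bytes);
        if (text.isPresent()) {
            System.out.println("UTF-8, " + text.get().length() + " chars");
        } else {
            // Not UTF-8: this is where CharsetDetector (setText/detect or
            // getString) would be called; omitted to avoid an ICU4J dependency.
            System.out.println("not UTF-8; fall back to charset detection");
        }
    }
}
```

Decoding the whole byte array in one call also replaces the character-by-character "text += (char) i" loop from the quoted code, which copies the growing string on every iteration. If you keep the Reader returned by getReader(), appending to a StringBuilder avoids that cost as well.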