From: <ikr...@us...> - 2011-12-20 23:06:38
Revision: 3587
          http://archive-access.svn.sourceforge.net/archive-access/?rev=3587&view=rev
Author:   ikreymer
Date:     2011-12-20 23:06:32 +0000 (Tue, 20 Dec 2011)
Log Message:
-----------
BUGFIX: if the first char read from the InputStreamReader is a 0xFEFF BOM marker, remove it -- this implies the InputStreamReader is not interpreting it, and thus it should be removed from the content to avoid problems.

Modified Paths:
--------------
    trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/TextDocument.java

Modified: trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/TextDocument.java
===================================================================
--- trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/TextDocument.java	2011-12-18 04:17:45 UTC (rev 3586)
+++ trunk/archive-access/projects/wayback/wayback-core/src/main/java/org/archive/wayback/replay/TextDocument.java	2011-12-20 23:06:32 UTC (rev 3587)
@@ -205,6 +205,13 @@
         // slurp the whole thing into RAM:
         sb = new StringBuilder(recordLength);
+
+        //Skip the UTF-8 BOM 0xFEFF
+        int firstChar = isr.read();
+        if ((firstChar != '\uFEFF') && (firstChar != -1)) {
+            sb.append(firstChar);
+        }
+
         for (int r = -1; (r = isr.read(cbuffer, 0, C_BUFFER_SIZE)) != -1;) {
             sb.append(cbuffer, 0, r);
         }
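
A note on the hunk above, offered as a reader's sketch and not as the committed code: firstChar is declared as an int, and StringBuilder.append(int) appends the decimal digits of the number, not the character. For a document starting with '<' this would presumably put "60" at the front of the buffer. Also, U+FEFF is the code point the reader yields; in UTF-8 the BOM itself is the byte sequence EF BB BF. The standalone example below shows the same skip-the-BOM idea with an explicit (char) cast. The class name BomSkipSketch, the method readSkippingBom, and the buffer size are all hypothetical and do not come from TextDocument.java.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class BomSkipSketch {
    // Hypothetical buffer size; the real C_BUFFER_SIZE is defined in TextDocument.
    private static final int C_BUFFER_SIZE = 4096;

    /** Reads the whole reader into a String, dropping a leading U+FEFF if present. */
    public static String readSkippingBom(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();

        // Look at the first char only: -1 means end of stream, U+FEFF is a BOM
        // that the decoder passed through instead of consuming.
        int firstChar = reader.read();
        if (firstChar != '\uFEFF' && firstChar != -1) {
            // Cast to char; append(int) would add the number's decimal digits.
            sb.append((char) firstChar);
        }

        // Slurp the remainder in chunks, as the original loop does.
        char[] cbuffer = new char[C_BUFFER_SIZE];
        for (int r; (r = reader.read(cbuffer, 0, C_BUFFER_SIZE)) != -1;) {
            sb.append(cbuffer, 0, r);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readSkippingBom(new StringReader("\uFEFF<html>")));
        System.out.println(readSkippingBom(new StringReader("<html>")));
    }
}
```

Both calls in main print <html>: the first input loses its BOM, the second is returned unchanged.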