From: SourceForge.net <no...@so...> - 2010-10-26 16:16:34
Bugs item #2990554, was opened at 2010-04-22 03:09
Message generated for change (Comment added) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2990554&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Tidy functionality
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Adrian Sandor (aditsu)
Summary: Tidy can wrap line in the middle of utf8 byte sequence.

Initial Comment:
I have used JTidy to clean up XML files. In some cases it produces files with incorrect characters (0x0 characters in UTF-8 XML files). I spent some time debugging: Tidy uses line wrapping by default, and the line wrapping has a bug. Tidy can wrap a line in the middle of a UTF-8 byte sequence.

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-10-27 00:16

Message:
Closing due to lack of feedback; it's most likely fixed.

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2010-08-18 09:06

Message:
Hi, can you please check if this bug is now fixed in SVN? (due to fixing bug 3038314)

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2010-04-24 15:21

Message:
Hi, first of all, what JTidy version are you using? Your patch doesn't compile with the current code.
By the way, you can attach files to bug reports.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2010-04-22 20:05

Message:
I have fixed it. Sorry, my first assumption was wrong. The problem is in PPrint.java, lines 706 and 721.
The code is:

    wraphere = linelen + 2; // 2, because AddChar is not till later

but it should be:

    wraphere = linelen + 1; // 1, because arrays use 0 as the first index, so the last character is at len - 1, and because AddChar is not till later

I have uploaded a file with my changes:
http://tuzikbottle.com/images/exchange/PPrint.java
If you have any questions, mail me at vyacheslav.gudkov at gmail.com

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2010-04-22 16:09

Message:
Here is a test file:

    <root>
    <a>1,1′-azobis</a>
    </root>

My test settings:

    import org.w3c.tidy.Tidy;

    Tidy tidy = new Tidy();
    tidy.setXmlTags(true);
    tidy.setXmlOut(true);
    tidy.setFixBackslash(true);      // replace \ with / in urls
    tidy.setFixComments(true);       // detect mal-formed comments
    tidy.setHideComments(true);      // hide all comments
    tidy.setHideEndTags(true);       // prefer self-closing tag where possible
    tidy.setLowerLiterals(true);     // output lower-case attrib names
    tidy.setMakeBare(true);          // clean Microsoft cruft
    tidy.setNumEntities(true);       // prefer number entities to named ones
    tidy.setTidyMark(false);         // don't add meta tag giving tidy credit
    tidy.setOutputEncoding("UTF-8"); // force tidy to report encoding as utf-8 instead of other
    tidy.setQuoteNbsp(true);         // if char 160 is found, output it as an entity
    tidy.setMakeClean(true);         // remove presentational clutter
    tidy.setDocType("omit");
    tidy.setWraplen(9);

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2010-04-22 15:04

Message:
I will try to fix it today and upload a patch.

----------------------------------------------------------------------
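The change proposed in the 2010-04-22 20:05 comment (wraphere = linelen + 1 instead of linelen + 2) can be illustrated with a simplified model of a pretty-printer line buffer. This is a hypothetical sketch, not JTidy's PPrint code: the names linebuf, linelen and wraphere mirror the variables quoted in the comment, but the surrounding logic (a fixed-size int buffer whose unused slots stay 0, and a wrap point recorded just before the last character is added, as "AddChar is not till later" suggests) and the helper firstLine are assumptions. With an offset of 2 the recorded wrap point lies one slot past the last written character, so the emitted line ends with an unwritten 0, which would be consistent with the 0x0 characters described in the initial report; with an offset of 1 the line ends right after the last character.

```java
import java.util.Arrays;

public class WrapOffByOne {

    /**
     * Hypothetical, simplified line-buffer model (not JTidy's PPrint code).
     * Copies text into an int buffer, records a wrap point just before the
     * last character is added, and returns the buffer contents up to that point.
     */
    static int[] firstLine(String text, int offset) {
        int[] linebuf = new int[text.length() + 4]; // slots never written stay 0
        int linelen = 0;   // number of characters written so far
        int wraphere = 0;  // index at which the line will be broken
        for (int i = 0; i < text.length(); i++) {
            if (i == text.length() - 1) {
                // The wrap point is recorded before the character is added.
                // The character will be stored at index linelen, so the break
                // belongs right after it, at linelen + 1.
                wraphere = linelen + offset;
            }
            linebuf[linelen++] = text.charAt(i); // the "AddChar" step
        }
        return Arrays.copyOf(linebuf, wraphere);
    }

    public static void main(String[] args) {
        // Offset 2 (original code): includes one unwritten slot, i.e. a 0 character.
        System.out.println(Arrays.toString(firstLine("1,1-", 2)));
        // Offset 1 (proposed fix): the line ends right after the last character.
        System.out.println(Arrays.toString(firstLine("1,1-", 1)));
    }
}
```

Running main prints [49, 44, 49, 45, 0] for the original offset and [49, 44, 49, 45] for the proposed one; the numbers are the character codes of "1,1-".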