From: SourceForge.net <no...@so...> - 2010-03-15 18:50:01

Bugs item #2922337, was opened at 2009-12-28 16:32
Message generated for change (Comment added) made by schierlm
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2922337&group_id=13153

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Michael Schierl (schierlm)
Assigned to: Adrian Sandor (aditsu)
Summary: StringIndexOutOfBoundsException while lexing script content

Initial Comment:
JTidy version: jtidy-r938.jar

Consider this example file:

    public class JTidyBug {
        public static void main(String[] args) throws Exception {
            final String SOURCE = "\n" +
                "<script>\n" +
                "var o={x=9};\n" +
                "var q=x<o.x;\n" +
                "</script>";
            char[] padding = new char[8165];
            java.util.Arrays.fill(padding, 'x');
            String source = new String(padding)+SOURCE;
            org.w3c.tidy.Tidy tidy = new org.w3c.tidy.Tidy();
            tidy.setShowWarnings(false);
            tidy.parse(new java.io.ByteArrayInputStream(source.getBytes("ISO-8859-1")), System.out);
        }
    }

Expected result:
A tidied HTML is output that is similar to that one produced when replacing
the 8165 in the code above by 8160

Actual result:

    Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 8193
        at java.lang.String.checkBounds(String.java:402)
        at java.lang.String.<init>(String.java:443)
        at org.w3c.tidy.TidyUtils.getString(Unknown Source)
        at org.w3c.tidy.Lexer.getCDATA(Unknown Source)
        at org.w3c.tidy.ParserImpl$ParseScript.parse(Unknown Source)
        at org.w3c.tidy.ParserImpl.parseTag(Unknown Source)
        at org.w3c.tidy.ParserImpl$ParseBody.parse(Unknown Source)
        at org.w3c.tidy.ParserImpl.parseTag(Unknown Source)
        at org.w3c.tidy.ParserImpl$ParseHTML.parse(Unknown Source)
        at org.w3c.tidy.ParserImpl.parseDocument(Unknown Source)
        at org.w3c.tidy.Tidy.parse(Unknown Source)
        at org.w3c.tidy.Tidy.parse(Unknown Source)
        at JTidyBug.main(JTidyBug.java:13)

----------------------------------------------------------------------

Comment By: Michael Schierl (schierlm)
Date: 2010-03-15 19:50

Message:
The bug was not fixed by aditsu in r1094, it was just "taped over" (or
"silenced") by making sure that getString will not throw SIOOBE any longer:
http://jtidy.svn.sourceforge.net/viewvc/jtidy/trunk/jtidy/src/main/java/org/w3c/tidy/TidyUtils.java?r1=1094&r2=1093&pathrev=1094

When I saw this "fix" I knew that I'll have to migrate to some other HTML
sanitizer than JTidy, at least for cases where the sanitizing happens
automatically without anyone (except customers) looking at the output
afterwards. (yes I know the NO WARRANTY clause of most open source licenses,
so I will not complain about it).

cigaly: If you managed to track down the root cause (I tried it but gave up
after searching for a few hours), feel free to attach a patch to this bug. I
will also test it and if it works fine with the "real" files that show that
bug (with r1094 reversed) I will at least apply it to my private version, if
aditsu does not want to apply it).

I don't think forking is the right way to handle those issues, but if there
is no alternative to it (r1094 is not, for me), I'll do my private fork.

----------------------------------------------------------------------

Comment By: cigaly ()
Date: 2010-03-15 19:29

Message:
sorry, but this is source that I am seeing in Lexer.java from
jtidy-r938-sources.zip downloaded from sourceforge and also in one that I can
see in svn repository (e.g.
http://jtidy.svn.sourceforge.net/viewvc/jtidy/trunk/jtidy/src/main/java/org/w3c/tidy/Lexer.java?view=log)

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2010-03-15 14:17

Message:
The bug was reported against revision 938 and you're complaining that it's
not fixed in revision 927?!? That doesn't make any sense.
Anyway, the bug was fixed in revision 1094.

----------------------------------------------------------------------

Comment By: cigaly ()
Date: 2010-03-15 12:26

Message:
It has not been fixed, at least not in revision 927 :

    java.lang.StringIndexOutOfBoundsException caught: java.lang.StringIndexOutOfBoundsException: String index out of range: 16385
        at java.lang.String.checkBounds(String.java:401)
        at java.lang.String.<init>(String.java:442)
        at org.w3c.tidy.TidyUtils.getString(null:-1)
        at org.w3c.tidy.Lexer.getCDATA(null:-1)
        at org.w3c.tidy.ParserImpl$ParseScript.parse(null:-1)
        at org.w3c.tidy.ParserImpl.parseTag(null:-1)
        at org.w3c.tidy.ParserImpl$ParseBody.parse(null:-1)
        at org.w3c.tidy.ParserImpl.parseTag(null:-1)
        at org.w3c.tidy.ParserImpl$ParseHTML.parse(null:-1)
        at org.w3c.tidy.ParserImpl.parseDocument(null:-1)
        at org.w3c.tidy.Tidy.parse(null:-1)
        at org.w3c.tidy.Tidy.parse(null:-1)
        at org.w3c.tidy.Tidy.parseDOM(null:-1)

Problem is caused by

    matches = container.element.equalsIgnoreCase(TidyUtils.getString(lexbuf, start, container.element.length()));

in situations when start + container.element.length() > lexlength

To fix that problem above lines should be replaced by

    matches = container.element.equalsIgnoreCase(TidyUtils.getString(lexbuf, start, lexsize - start - 1));

both in block handling state CDATA_STARTTAG and CDATA_ENDTAG

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2010-01-04 06:03

Message:
Fixed in svn, thanks.

----------------------------------------------------------------------
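cigaly's diagnosis in this thread is that the CDATA_STARTTAG and CDATA_ENDTAG checks in Lexer.getCDATA read container.element.length() bytes starting at start without first checking that those bytes lie inside the lexsize valid bytes of lexbuf. The following is a minimal, standalone sketch of a bounds-checked version of that comparison. It is not JTidy's code: TagMatch and matchesAt are hypothetical names, the buffer is assumed to hold single-byte (ASCII/ISO-8859-1) characters, and treating "too few valid bytes left" as a non-match is an assumption, not something the thread settles.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not JTidy code): bounds-checked, case-insensitive
 * check for a tag name inside a lexer-style byte buffer. Assumes the buffer
 * holds single-byte (ASCII/ISO-8859-1) characters.
 */
public final class TagMatch {

    private TagMatch() {
    }

    /**
     * Returns true if element occurs at index start of lexbuf, ignoring case.
     * Only indices below lexsize (and below lexbuf.length) are ever read, so a
     * tag name that would run past the valid data is reported as a non-match
     * instead of causing an out-of-bounds exception.
     */
    public static boolean matchesAt(byte[] lexbuf, int lexsize, int start, String element) {
        int validEnd = Math.min(lexsize, lexbuf.length);
        if (start < 0 || element.length() > validEnd - start) {
            return false; // too few valid bytes left to hold the whole name
        }
        for (int i = 0; i < element.length(); i++) {
            char c = (char) (lexbuf[start + i] & 0xFF); // ISO-8859-1 byte to char
            if (Character.toLowerCase(c) != Character.toLowerCase(element.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] buf = "xx</SCRIPT>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(matchesAt(buf, buf.length, 4, "script")); // true
        System.out.println(matchesAt(buf, 7, 4, "script"));          // false: only "SCR" is valid
    }
}
```

Note that the replacement proposed in the thread passes lexsize - start - 1 as the length on every call, so it compares the whole remainder of the buffer rather than exactly element.length() characters. The sketch instead keeps the comparison length fixed and simply refuses to read past lexsize.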
From: SourceForge.net <no...@so...> - 2010-03-15 18:29:33

Bugs item #2922337, was opened at 2009-12-28 15:32
Message generated for change (Comment added) made by

Category: None
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Michael Schierl (schierlm)
Assigned to: Adrian Sandor (aditsu)
Summary: StringIndexOutOfBoundsException while lexing script content

----------------------------------------------------------------------

Comment By: cigaly ()
Date: 2010-03-15 18:29

Message:
sorry, but this is source that I am seeing in Lexer.java from
jtidy-r938-sources.zip downloaded from sourceforge and also in one that I can
see in svn repository (e.g.
http://jtidy.svn.sourceforge.net/viewvc/jtidy/trunk/jtidy/src/main/java/org/w3c/tidy/Lexer.java?view=log)
From: SourceForge.net <no...@so...> - 2010-03-15 13:17:14

Bugs item #2922337, was opened at 2009-12-28 23:32
Message generated for change (Comment added) made by aditsu

Category: None
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Michael Schierl (schierlm)
Assigned to: Adrian Sandor (aditsu)
Summary: StringIndexOutOfBoundsException while lexing script content

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-03-15 21:17

Message:
The bug was reported against revision 938 and you're complaining that it's
not fixed in revision 927?!? That doesn't make any sense.
Anyway, the bug was fixed in revision 1094.
From: SourceForge.net <no...@so...> - 2010-03-15 11:26:37

Bugs item #2922337, was opened at 2009-12-28 15:32
Message generated for change (Comment added) made by

Category: None
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Michael Schierl (schierlm)
Assigned to: Adrian Sandor (aditsu)
Summary: StringIndexOutOfBoundsException while lexing script content

----------------------------------------------------------------------

Comment By: https://www.google.com/accounts ()
Date: 2010-03-15 11:26

Message:
It has not been fixed, at least not in revision 927 :

    java.lang.StringIndexOutOfBoundsException caught: java.lang.StringIndexOutOfBoundsException: String index out of range: 16385
        at java.lang.String.checkBounds(String.java:401)
        at java.lang.String.<init>(String.java:442)
        at org.w3c.tidy.TidyUtils.getString(null:-1)
        at org.w3c.tidy.Lexer.getCDATA(null:-1)
        at org.w3c.tidy.ParserImpl$ParseScript.parse(null:-1)
        at org.w3c.tidy.ParserImpl.parseTag(null:-1)
        at org.w3c.tidy.ParserImpl$ParseBody.parse(null:-1)
        at org.w3c.tidy.ParserImpl.parseTag(null:-1)
        at org.w3c.tidy.ParserImpl$ParseHTML.parse(null:-1)
        at org.w3c.tidy.ParserImpl.parseDocument(null:-1)
        at org.w3c.tidy.Tidy.parse(null:-1)
        at org.w3c.tidy.Tidy.parse(null:-1)
        at org.w3c.tidy.Tidy.parseDOM(null:-1)

Problem is caused by

    matches = container.element.equalsIgnoreCase(TidyUtils.getString(lexbuf, start, container.element.length()));

in situations when start + container.element.length() > lexlength

To fix that problem above lines should be replaced by

    matches = container.element.equalsIgnoreCase(TidyUtils.getString(lexbuf, start, lexsize - start - 1));

both in block handling state CDATA_STARTTAG and CDATA_ENDTAG
From: SourceForge.net <no...@so...> - 2010-03-07 15:03:37

Bugs item #2936583, was opened at 2010-01-22 04:44
Message generated for change (Settings changed) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2936583&group_id=13153

Category: Tidy functionality
Group: None
>Status: Closed
>Resolution: Works For Me
Priority: 7
Private: No
Submitted By: CWall ()
Assigned to: Adrian Sandor (aditsu)
Summary: NullPointerException Node.trimInitialSpace

Initial Comment:
The following HTML source causes a NullPointerException in
Node.trimInitialSpace where element.parent is null. Note that it's the
space(s) after the span tag that triggers the trimInitialSpace call. Yes, span
inside table is not valid, but I wouldn't expect JTidy to NPE. Known issue?
Workaround? Thanks.

    <table>
    <tr><td></td></tr>
    <span id="mySpan">
    <tr><td></td></tr></span>
    </table>

    if (TidyUtils.toBoolean(element.tag.model & Dict.CM_INLINE)
        && !TidyUtils.toBoolean(element.tag.model & Dict.CM_FIELD)
        && element.parent.content != element) ...

The easy fix is to add element.parent != null check.

Thanks.
-C

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2010-02-06 23:04

Message:
Ok I committed a change for that, and now I get this output:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
    <html>
    <head>
    <meta name="generator" content=
    "HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net">
    <title></title>
    </head>
    <body>
    <span id="mySpan"></span>
    <table>
    <tr>
    <td></td>
    </tr>
    <tr>
    <td></td>
    </tr>
    </table>
    </body>
    </html>

And some warnings, but no exception and no error. Can you provide your exact
code and html that causes NPE? (a small test case) Also, what jtidy version
are you using?

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2010-02-06 22:30

Message:
With that html I get "Error: discarding unexpected </span>" and "This
document has errors that must be fixed before using HTML Tidy to generate a
tidied up version." No exception thrown. However, the behavior is different
from Tidy, so I'll try fixing that first.

----------------------------------------------------------------------
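The reporter's suggested fix is to test element.parent for null before dereferencing it. The self-contained sketch below shows that guard in isolation. Node, Tag, CM_INLINE and CM_FIELD here are minimal stand-ins invented for the example (the bit values are hypothetical, not JTidy's Dict constants), toBoolean mirrors what TidyUtils.toBoolean appears to do in the quoted snippet (non-zero means true), and answering false for a parentless node is an assumption of this sketch.

```java
/** Minimal, hypothetical stand-ins for JTidy's Node/Tag/Dict; illustration only. */
public final class TrimGuard {

    static final int CM_INLINE = 1 << 4;  // hypothetical bit value
    static final int CM_FIELD = 1 << 10;  // hypothetical bit value

    static final class Tag {
        final int model;

        Tag(int model) {
            this.model = model;
        }
    }

    static final class Node {
        Tag tag;
        Node parent;
        Node content; // first child
    }

    static boolean toBoolean(int value) {
        return value != 0;
    }

    /**
     * The condition from the bug report with the suggested null check added:
     * an inline, non-field element that is not its parent's first child.
     * A node without a parent (as with the reported <span> inside <table>)
     * yields false instead of throwing NullPointerException.
     */
    static boolean isInlineNonFirstChild(Node element) {
        return toBoolean(element.tag.model & CM_INLINE)
            && !toBoolean(element.tag.model & CM_FIELD)
            && element.parent != null
            && element.parent.content != element;
    }

    public static void main(String[] args) {
        Node orphan = new Node();
        orphan.tag = new Tag(CM_INLINE);
        System.out.println(isInlineNonFirstChild(orphan)); // prints false, no NPE
    }
}
```

Because && short-circuits left to right, placing the null check immediately before element.parent.content is enough; the earlier model-bit tests are unaffected.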
From: SourceForge.net <no...@so...> - 2010-03-07 10:25:46

Bugs item #2961207, was opened at 2010-03-01 21:48
Message generated for change (Comment added) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2961207&group_id=13153

Category: Tidy functionality
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Vidar S. Ramdal ()
>Assigned to: Adrian Sandor (aditsu)
Summary: Input with single <object> element ends up in <head>

Initial Comment:
Using jTidy 4aug2000r7

Steps to reproduce:

Given this input HTML:

    <object type="application/x-shockwave-flash" data="http://localhost:8080/demo/content/filelist_ce519e86-3155-4f47-807e-d9d2ff07838f/1267443344517/arrows.swf" width="400" height="300"><param name="movie" value="http://localhost:8080/demo/content/filelist_ce519e86-3155-4f47-807e-d9d2ff07838f/1267443344517/arrows.swf" />

Running this code, using the above HTML as inputHtml:

    Tidy tidy = new Tidy();
    tidy.setQuiet(true);
    tidy.setXHTML(true);
    tidy.setCharEncoding("UTF-8");
    tidy.parse(new ByteArrayInputStream(inputHtml.getBytes(charset)), out);

Expected result (in "out"):

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta name="generator" content="HTML Tidy, see www.w3.org" />
    <title></title>
    </head>
    <body>
    <object type="application/x-shockwave-flash" data="http://localhost:8080/demo/content/filelist_ce519e86-3155-4f47-807e-d9d2ff07838f/1267443344517/arrows.swf" width="400" height="300"><param name="movie" value="http://localhost:8080/demo/content/filelist_ce519e86-3155-4f47-807e-d9d2ff07838f/1267443344517/arrows.swf" />
    </object>
    </body>
    </html>

Actual result:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta name="generator" content="HTML Tidy, see www.w3.org" />
    <object type="application/x-shockwave-flash" data="http://localhost:8080/demo/content/filelist_ce519e86-3155-4f47-807e-d9d2ff07838f/1267443344517/arrows.swf" width="400" height="300"><param name="movie" value="http://localhost:8080/demo/content/filelist_ce519e86-3155-4f47-807e-d9d2ff07838f/1267443344517/arrows.swf" />
    </object>
    <title></title>
    </head>
    </html>

Note how the <object> element is within <head>, and that the <body> element
is missing completely.

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2010-03-07 18:25

Message:
Since JTidy is a Java port of the official C version of Tidy, we strive to
maintain complete compatibility with the official version. We have checked
the code that you indicated and found that it matches the C version of Tidy.
Any bugs that you encounter in the C version of Tidy should be reported to the
HTML Tidy Bug Tracker at SourceForge, where they can be addressed. When the
bugs are fixed in the C version of Tidy, that will be reflected in the Java
version as well. Source code for JTidy is available from the CVS section of
this site. Please feel free to download the source and make any local changes
in your copy of Tidy to correct your problems. Thank you.

----------------------------------------------------------------------
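The input in this report is a bare fragment with no <html>, <head> or <body>, so the parser has to infer where <object> belongs. One workaround a caller could try, an untested assumption rather than anything confirmed in this thread, is to supply the document skeleton explicitly before handing the string to tidy.parse. The sketch below performs only that string preparation; FragmentWrapper and wrapInBody are hypothetical names, and the "<body" check is deliberately crude.

```java
import java.util.Locale;

/** Hypothetical helper: wrap an HTML fragment in an explicit document skeleton. */
public final class FragmentWrapper {

    private FragmentWrapper() {
    }

    /**
     * Returns fragment unchanged if it already appears to contain a body tag
     * (a crude, case-insensitive substring check); otherwise wraps it in
     * html/head/title/body so the fragment starts out inside <body>.
     */
    public static String wrapInBody(String fragment) {
        if (fragment.toLowerCase(Locale.ROOT).contains("<body")) {
            return fragment;
        }
        return "<html><head><title></title></head><body>\n"
            + fragment
            + "\n</body></html>";
    }

    public static void main(String[] args) {
        System.out.println(wrapInBody("<object data=\"arrows.swf\"></object>"));
    }
}
```

The wrapped string would then be converted to bytes and parsed exactly as in the report's code; whether the output then keeps <object> inside <body> should be checked against the JTidy version in use.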
From: SourceForge.net <no...@so...> - 2010-03-01 13:48:52

Bugs item #2961207, was opened at 2010-03-01 13:48
Message generated for change (Tracker Item Submitted) made by

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: https://www.google.com/accounts ()
Assigned to: Nobody/Anonymous (nobody)
Summary: Input with single <object> element ends up in <head>
From: SourceForge.net <no...@so...> - 2010-02-17 15:23:51

Bugs item #2949261, was opened at 2010-02-11 03:07
Message generated for change (Comment added) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2949261&group_id=13153

Category: Tidy functionality
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Adrian Sandor (aditsu)
Summary: Missing data

Initial Comment:
INPUT

    <table><tbody>
    <tr>
    <td class="texto-cuerpo" align="center">Some text
    </td><td align="center" class="texto-cuerpo" />
    Some text2
    </td>
    </tr>
    </tbody></table>

OUTPUT

    <table><tbody>
    <tr>
    <td class="texto-cuerpo" align="center">Some text
    </td><td align="center" class="texto-cuerpo" />
    </td>
    </tr>
    </tbody></table>

DESIRED OUTPUT
Without missing the text "Some Text 2".

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-02-17 23:23

Message:
Actually, the same thing is happening. The data from the second column is
not missing, but appears above the table: "SecretarioVocalVocal"
Since this matches Tidy behavior, I'm going to close this bug. If you're not
satisfied with the output, then please file a bug in the Tidy project.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2010-02-17 20:35

Message:
sorry, i thought that was the problem. the true input was
http://www.senado.gov.ar/web/senadores/comint.php?id_sena=374&iOrden=0&iSen=ASC
And the data missing is the data in the second column of the table.
sorry for the drawback

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2010-02-17 06:06

Message:
I tried your input, and I got this output:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
    <html>
    <head>
    <meta name="generator" content=
    "HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net">
    <title></title>
    </head>
    <body>
    Some text2
    <table>
    <tbody>
    <tr>
    <td class="texto-cuerpo" align="center">Some text</td>
    <td align="center" class="texto-cuerpo">
    </tr>
    </tbody>
    </table>
    </body>
    </html>

As you can see, "Some text2" is not missing but is before the table. This
matches Tidy behavior. If you have a problem with that, please file a bug for
the Tidy project. But if "Some text2" is really missing for you, please
provide more details.

----------------------------------------------------------------------
From: SourceForge.net <no...@so...> - 2010-02-17 12:35:02
|
Bugs item #2949261, was opened at 2010-02-10 19:07 Message generated for change (Comment added) made by nobody You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2949261&group_id=13153 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Tidy functionality Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Nobody/Anonymous (nobody) Assigned to: Adrian Sandor (aditsu) Summary: Missing data Initial Comment: INPUT <table><tbody> <tr> <td class="texto-cuerpo" align="center">Some text </td><td align="center" class="texto-cuerpo" /> Some text2 </td> </tr> </tbody></table> OUTPUT <table><tbody> <tr> <td class="texto-cuerpo" align="center">Some text </td><td align="center" class="texto-cuerpo" /> </td> </tr> </tbody></table> DESIRED OUTPUT Without missing the text "Some Text 2". ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2010-02-17 12:35 Message: sorry, i thought that was the problem. the true input was http://www.senado.gov.ar/web/senadores/comint.php?id_sena=374&iOrden=0&iSen=ASC And the data missing is the data in the second column of the table. sorry for the drawback ---------------------------------------------------------------------- Comment By: Adrian Sandor (aditsu) Date: 2010-02-16 22:06 Message: I tried your input, and I got this output: <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"> <html> <head> <meta name="generator" content= "HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net"> <title></title> </head> <body> Some text2 <table> <tbody> <tr> <td class="texto-cuerpo" align="center">Some text</td> <td align="center" class="texto-cuerpo"> </tr> </tbody> </table> </body> </html> As you can see, "Some text2" is not missing but is before the table. This matches Tidy behavior. 
If you have a problem with that, please file a bug for the Tidy project. But if "Some text2" is really missing for you, please provide more details.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2949261&group_id=13153 |
From: SourceForge.net <no...@so...> - 2010-02-17 00:05:09
|
Bugs item #2905310, was opened at 2009-11-28 20:43
Message generated for change (Comment added) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2905310&group_id=13153

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Stephen H (stephenjglu)
Assigned to: Adrian Sandor (aditsu)
Summary: Fix for Unicode surrogate support

Initial Comment:
JTidy revision 929 doesn't support Unicode surrogate pairs (e.g. \uD801\uDC01). For UTF-8 and UTF-16 streams, java.io.Reader.read() returns these as two separate characters. Lexer.addCharToLexer() then strips them out, because neither half on its own falls in a valid Unicode range.

The attached patch changes StreamInJavaImpl to process the surrogate pair there when a stream with UTF-8 or UTF-16 encoding is used. I also modified readCharFromStream() to remove the unnecessary setting of the endOfStream flag, since readChar() already handles it, though you may want to do this differently.

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-02-17 08:05

Message:
Thanks for the details. I still plan to maintain a Java 1.4-compatible version, but it probably won't get many more bug fixes. I'm working on the CodeUpdateAndJava5 branch, which is not released yet but is already far ahead of trunk. As the name suggests, it requires Java 5, and the code is updated to bring it closer to current Tidy. So a fix that requires Java 5 is also welcome. However, I'd like to review how Tidy handles surrogates before applying any patches.

The current tests from trunk are quite hopeless indeed; I'm not even trying to fix them anymore. On the other branch I added a new set of tests, taken from Tidy. See the TidyTests class.
Currently, out of 231 tests, 151 pass (meaning the output is byte-for-byte identical to Tidy's output), 0 have errors and 80 fail. Some of the failures are actually Tidy bugs, which is a bit annoying.

----------------------------------------------------------------------

Comment By: Stephen H (stephenjglu)
Date: 2009-11-30 17:53

Message:
An example HTML file is attached, cribbed from http://www.i18nguy.com/unicode-plane1-utf8.html. I had a go at plugging this into the tests, but almost all of them failed when I ran them, so I wasn't sure of their current state. (The tests also don't like it if the path contains spaces.)

This is only a fix for input. We work with the JTidy DOM, which with this change holds surrogates correctly. The text output from JTidy is a bit messed up; I'll see if I can figure out what is going on there at some point.

I'm assuming you want compatibility with Java versions before 1.5, so I haven't used the surrogate support that came with that release.

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2009-11-28 23:24

Message:
Do you have a test case for this?

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2905310&group_id=13153 |
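The attached patch is not reproduced in this archive. The sketch below shows the general idea only and is not the patch: a reader pairs a high surrogate with the low surrogate that follows it and returns a single code point. The class name SurrogateReader is hypothetical. The arithmetic is written out by hand so it also runs on Java 1.4, where Character.toCodePoint() (added in Java 5) is not available.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not JTidy's StreamInJavaImpl: joins UTF-16 surrogate pairs into code points.
final class SurrogateReader {
    private final Reader in;
    private int pending = -1; // char read ahead after an unpaired high surrogate (-1 = none)

    SurrogateReader(Reader in) {
        this.in = in;
    }

    /** Returns the next Unicode code point, or -1 at end of stream. */
    int readCodePoint() throws IOException {
        int c = (pending >= 0) ? pending : in.read();
        pending = -1;
        if (c >= 0xD800 && c <= 0xDBFF) {               // high (leading) surrogate
            int next = in.read();
            if (next >= 0xDC00 && next <= 0xDFFF) {     // low (trailing) surrogate
                // Same formula as Character.toCodePoint(high, low) in Java 5+
                return ((c - 0xD800) << 10) + (next - 0xDC00) + 0x10000;
            }
            pending = next; // unpaired: return the high surrogate as-is, keep the char after it
        }
        return c;
    }

    public static void main(String[] args) throws IOException {
        SurrogateReader r = new SurrogateReader(new StringReader("a\uD801\uDC01b"));
        for (int cp = r.readCodePoint(); cp != -1; cp = r.readCodePoint()) {
            System.out.println(Integer.toHexString(cp)); // prints 61, then 10401, then 62
        }
    }
}
```

An unpaired high surrogate is passed through unchanged rather than dropped. Whether to keep it, replace it, or report it is a policy choice that the real fix would have to match against Tidy's behavior.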
From: SourceForge.net <no...@so...> - 2010-02-16 23:16:05
|
Bugs item #2891882, was opened at 2009-11-04 20:39
Message generated for change (Settings changed) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2891882&group_id=13153

Category: Tidy functionality
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Alex Kainov (alexkainov)
Assigned to: Adrian Sandor (aditsu)
Summary: Incorrect parsing of <td> attributes

Initial Comment:
INPUT:
<td width="14%" bgcolor="#008000">

TIDY OUTPUT:
<td width="14%" bgcolor="#008000">

DESIRED OUTPUT:
<td style="width:14%;background-color:#008000;">

Error:
The tag: "td" doesn't have an attribute: "width" in currently active versions.
The tag: "td" doesn't have an attribute: "bgcolor" in currently active versions.

Found in version: r918

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-02-17 07:16

Message:
I fixed the bgcolor handling in trunk (when using the clean option). The behavior now matches Tidy.

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2009-12-20 02:19

Message:
An update about your report:
First, <td width="14%" bgcolor="#008000"> validates perfectly in HTML 4.01 Transitional, see http://is.gd/5tYlj
Second, tidy -c removes the bgcolor attribute, adds a class to the td, and defines the background-color CSS property for that class. JTidy from trunk doesn't currently do that (with -c, or setMakeClean from the code). Neither Tidy nor JTidy changes the width attribute.
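aditsu's comment describes what tidy -c does: it removes bgcolor, adds a class to the td, and defines a background-color rule for that class. The sketch below illustrates that behavior over a plain attribute map. It is not Tidy's or JTidy's clean-up code. The class name BgcolorToClass and the "c1", "c2", ... naming scheme are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of moving bgcolor into a CSS class; not Tidy's actual implementation.
final class BgcolorToClass {
    // CSS declaration -> generated class name, so cells with the same colour share one class
    private final Map<String, String> rules = new LinkedHashMap<String, String>();

    /** Removes bgcolor from attrs and adds a class carrying it; returns that class, or null if there was no bgcolor. */
    String convert(Map<String, String> attrs) {
        String color = attrs.remove("bgcolor");
        if (color == null) {
            return null;
        }
        String declaration = "background-color: " + color;
        String cls = rules.get(declaration);
        if (cls == null) {
            cls = "c" + (rules.size() + 1); // assumed naming scheme
            rules.put(declaration, cls);
        }
        String existing = attrs.get("class");
        attrs.put("class", existing == null ? cls : existing + " " + cls);
        return cls;
    }

    /** Style rules for all classes generated so far, one per line. */
    String styleSheet() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : rules.entrySet()) {
            sb.append('.').append(e.getValue()).append(" {").append(e.getKey()).append("}\n");
        }
        return sb.toString();
    }
}
```

Applied to the attributes of <td width="14%" bgcolor="#008000">, this leaves width in place, as Tidy does, and gives the cell class="c1" with the rule .c1 {background-color: #008000}.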
----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2009-11-25 20:38

Message:
Well, the main purpose of JTidy is to replicate the functionality of Tidy (the C program and library) in Java. Are you able to get what you want using Tidy? If yes, please tell me how (what parameters you used). If not, you can request an enhancement to Tidy first, and after they implement it, I can port it to Java for JTidy. Alternatively, if this is a feature already documented for JTidy and it's not working, please give me the details (specifically, where it says it is supposed to do what you wanted).

----------------------------------------------------------------------

Comment By: Alex Kainov (alexkainov)
Date: 2009-11-25 19:47

Message:
Just in case: the previous post was from me (alexkainov) :)

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-11-25 19:34

Message:
Sure, the W3C validator is not JTidy :) I got those 2 errors from the W3C validator. It means JTidy should process those attributes so that the output is valid XHTML. I didn't connect the desired translation (attribute to style) to any JTidy option. Should I?

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2009-11-08 09:39

Message:
The W3C validator is not JTidy. Did you get those 2 errors from JTidy or not? Also, which JTidy option (that you used) is supposed to replace the width and bgcolor attributes with inline style?

----------------------------------------------------------------------

Comment By: Alex Kainov (alexkainov)
Date: 2009-11-06 19:00

Message:
If you try the W3C validator at http://validator.w3.org/#validate_by_upload+with_options with the file after.html (see attachment), you'll get errors related to the "width" and "bgcolor" attributes of the <td> tag.
Short extract of error report you can find in file "ErrsByValidator.doc" (see attachment). ---------------------------------------------------------------------- Comment By: Adrian Sandor (aditsu) Date: 2009-11-06 18:14 Message: "The code is quite simple" -> I totally disagree. Also note that you should attach a file instead of pasting the input in a comment, because it's too big, and because the formatting and encoding were most likely affected. Anyway, I used your code and your input, and I still did not get the errors you mentioned. It did preserve the width and bgcolor attributes in the td, but I'm not sure why you expect it to change them. ---------------------------------------------------------------------- Comment By: Alex Kainov (alexkainov) Date: 2009-11-06 17:38 Message: OK :) Here is the input: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html><head></head><body text="#000000" bgcolor="#FFFFFF"><table width="100%" border="1"><tr valign="top"><td width="14%"><p><font face="Verdana">cell 1</font></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td></tr><tr valign="top"><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><font face="Verdana">row 2 Column 2</font></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><img width="1" height="1" 
src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td></tr><tr valign="top"><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><font face="Verdana">??? </font><i><font color="#00E100" face="Verdana">??????? ??????</font></i></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td></tr><tr valign="top"><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><font face="Verdana">???????? (</font><font color="#FF0000" face="Verdana">???????</font><font face="Verdana">)</font></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><b><i><u><font color="#A1009F" face="Verdana">?????????? ?????? ?????? ????????????</font></u></i></b></td><td width="14%" bgcolor="#00FFFF"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%" bgcolor="#008000"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%" bgcolor="#0000FF"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td></tr><tr valign="top"><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><font color="#0062E1" face="Verdana">????? ???? 
</font><b><font color="#0062E1" face="Verdana">??????</font></b></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td></tr><tr valign="top"><td width="14%" bgcolor="#FF0000"><font face="Verdana">ura</font></td><td width="14%" bgcolor="#FF8100"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%" bgcolor="#FFE118"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%" bgcolor="#00E100"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%" bgcolor="#00FFFF"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%" bgcolor="#2181FF"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td><td width="14%" bgcolor="#C200FF"><img width="1" height="1" src="/icons/ecblank.gif" border="0" alt=""></td></tr></table></body></html> ---------------------------------------------------------------------- Comment By: Adrian Sandor (aditsu) Date: 2009-11-06 17:33 Message: Hi, I asked for the input, but you didn't provide it, you only explained how you obtained it. You're also using TOOOOOOOOOO many steps to convert the input before passing it to JTidy, but that shouldn't affect the processing of the td tag you showed. You don't need any .NET stuff. You can find Tidy at http://tidy.sourceforge.net/ and http://sourceforge.net/projects/tidy ---------------------------------------------------------------------- Comment By: Alex Kainov (alexkainov) Date: 2009-11-06 16:58 Message: Hi ! Thanks for the answer ! 
Well, I use URL u.openStream() as input for the parser:

URL u = new URL(url + doc.getUniversalID());
BufferedReader in = new BufferedReader(
        new InputStreamReader(u.openStream(), "UTF-8"));
String s;
StringBuffer htmlStr = new StringBuffer();
while ((s = in.readLine()) != null) {
    htmlStr.append(s);
}
String htmlString = htmlStr.toString();

The code is quite simple:

Tidy tidy = new Tidy(); // obtain a new Tidy instance
tidy.setDocType("strict");
tidy.setDropFontTags(true);
tidy.setFixBackslash(true);
tidy.setFixUri(true);
tidy.setJoinClasses(true);
tidy.setJoinStyles(true);
tidy.setLogicalEmphasis(true);
tidy.setQuiet(true);
tidy.setQuoteMarks(true);
tidy.setShowWarnings(false);
tidy.setTidyMark(false);
tidy.setXHTML(true);
tidy.setInputEncoding("UTF8");
tidy.setOutputEncoding("UTF8");
byte[] currentXMLBytes = htmlString.getBytes("UTF-8");
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(currentXMLBytes);
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
tidy.parse(byteArrayInputStream, byteArrayOutputStream);
String sBuffer = byteArrayOutputStream.toString("UTF-8");

Concerning your proposal of using tidy (the C program): I've only found links to the program for .NET, and .NET is not installed on my computer. Any ideas?

Regards,
Alex.

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2009-11-06 10:32

Message:
I don't get those errors; maybe you haven't included the whole input (especially the doctype). You also haven't provided the code. Anyway, check whether tidy (the C program) behaves differently.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2891882&group_id=13153 |
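The reading loop in Alex's code has a concrete problem. BufferedReader.readLine() strips line terminators, so appending each line without a newline joins adjacent lines. That can merge words and changes the content of <pre> and <script> blocks. Tidy decodes the input itself (setInputEncoding), so a simpler route is to copy the raw bytes unchanged. The helpers below are a minimal sketch with hypothetical names, not part of JTidy. joinLines() reproduces the original loop for comparison.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical helpers contrasting the readLine() loop with a plain byte copy.
final class StreamBytes {
    /** Copies the whole stream, unchanged, into a byte array. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** What the loop above does: line terminators are silently dropped. */
    static String joinLines(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
        StringBuffer sb = new StringBuffer();
        String s;
        while ((s = reader.readLine()) != null) {
            sb.append(s);
        }
        return sb.toString();
    }
}
```

Tidy.parse() accepts an InputStream, so u.openStream() could also be passed to it directly. A byte copy such as readAll() is only needed if the raw input has to be kept around.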
From: SourceForge.net <no...@so...> - 2010-02-16 22:06:47
|
Bugs item #2949261, was opened at 2010-02-11 03:07
Message generated for change (Comment added) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2949261&group_id=13153

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Adrian Sandor (aditsu)
Summary: Missing data

Initial Comment:
INPUT
<table><tbody>
<tr>
<td class="texto-cuerpo" align="center">Some text
</td><td align="center" class="texto-cuerpo" />
Some text2
</td>
</tr>
</tbody></table>

OUTPUT
<table><tbody>
<tr>
<td class="texto-cuerpo" align="center">Some text
</td><td align="center" class="texto-cuerpo" />
</td>
</tr>
</tbody></table>

DESIRED OUTPUT
The same, but without losing the text "Some text2".

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-02-17 06:06

Message:
I tried your input, and I got this output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<meta name="generator" content=
"HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net">
<title></title>
</head>
<body>
Some text2
<table>
<tbody>
<tr>
<td class="texto-cuerpo" align="center">Some text</td>
<td align="center" class="texto-cuerpo">
</tr>
</tbody>
</table>
</body>
</html>

As you can see, "Some text2" is not missing; it has been moved before the table. This matches Tidy behavior. If you have a problem with that, please file a bug for the Tidy project. But if "Some text2" is really missing for you, please provide more details.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2949261&group_id=13153 |
From: SourceForge.net <no...@so...> - 2010-02-10 19:07:47
|
Bugs item #2949261, was opened at 2010-02-10 19:07
Message generated for change (Tracker Item Submitted) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2949261&group_id=13153

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Missing data

Initial Comment:
INPUT
<table><tbody>
<tr>
<td class="texto-cuerpo" align="center">Some text
</td><td align="center" class="texto-cuerpo" />
Some text2
</td>
</tr>
</tbody></table>

OUTPUT
<table><tbody>
<tr>
<td class="texto-cuerpo" align="center">Some text
</td><td align="center" class="texto-cuerpo" />
</td>
</tr>
</tbody></table>

DESIRED OUTPUT
The same, but without losing the text "Some text2".

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2949261&group_id=13153 |
From: SourceForge.net <no...@so...> - 2010-02-06 15:04:50
|
Bugs item #2936583, was opened at 2010-01-22 04:44
Message generated for change (Comment added) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2936583&group_id=13153

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: CWall ()
Assigned to: Adrian Sandor (aditsu)
Summary: NullPointerException Node.trimInitialSpace

Initial Comment:
The following HTML source causes a NullPointerException in Node.trimInitialSpace, where element.parent is null. Note that it's the whitespace after the span tag that triggers the trimInitialSpace call. Yes, a span inside a table is not valid, but I wouldn't expect JTidy to throw an NPE. Known issue? Workaround? Thanks.

<table>
<tr><td></td></tr>
<span id="mySpan">
<tr><td></td></tr></span>
</table>

if (TidyUtils.toBoolean(element.tag.model & Dict.CM_INLINE)
    && !TidyUtils.toBoolean(element.tag.model & Dict.CM_FIELD)
    && element.parent.content != element)
...

The easy fix is to add an element.parent != null check.
Thanks.
-C

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-02-06 23:04

Message:
OK, I committed a change for that, and now I get this output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<meta name="generator" content=
"HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net">
<title></title>
</head>
<body>
<span id="mySpan"></span>
<table>
<tr>
<td></td>
</tr>
<tr>
<td></td>
</tr>
</table>
</body>
</html>

I also get some warnings, but no exception and no error. Can you provide your exact code and the HTML that causes the NPE (a small test case)? Also, what JTidy version are you using?
----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2010-02-06 22:30

Message:
With that HTML I get "Error: discarding unexpected </span>" and "This document has errors that must be fixed before using HTML Tidy to generate a tidied up version." No exception is thrown. However, the behavior differs from Tidy, so I'll try fixing that first.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2936583&group_id=13153 |
From: SourceForge.net <no...@so...> - 2010-02-06 14:30:33
|
Bugs item #2936583, was opened at 2010-01-22 04:44
Message generated for change (Comment added) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2936583&group_id=13153

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: CWall ()
>Assigned to: Adrian Sandor (aditsu)
Summary: NullPointerException Node.trimInitialSpace

Initial Comment:
The following HTML source causes a NullPointerException in Node.trimInitialSpace, where element.parent is null. Note that it's the whitespace after the span tag that triggers the trimInitialSpace call. Yes, a span inside a table is not valid, but I wouldn't expect JTidy to throw an NPE. Known issue? Workaround? Thanks.

<table>
<tr><td></td></tr>
<span id="mySpan">
<tr><td></td></tr></span>
</table>

if (TidyUtils.toBoolean(element.tag.model & Dict.CM_INLINE)
    && !TidyUtils.toBoolean(element.tag.model & Dict.CM_FIELD)
    && element.parent.content != element)
...

The easy fix is to add an element.parent != null check.
Thanks.
-C

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-02-06 22:30

Message:
With that HTML I get "Error: discarding unexpected </span>" and "This document has errors that must be fixed before using HTML Tidy to generate a tidied up version." No exception is thrown. However, the behavior differs from Tidy, so I'll try fixing that first.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2936583&group_id=13153 |
From: SourceForge.net <no...@so...> - 2010-01-29 18:30:59
|
Bugs item #2940996, was opened at 2010-01-27 21:56
Message generated for change (Settings changed) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2940996&group_id=13153

Category: Tidy functionality
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Adrian Sandor (aditsu)
Summary: ArithmeticException / by zero with Tabsize zero

Initial Comment:
If Tidy is set up with a tab size of zero via Tidy#setTabsize(0), this results in a runtime exception (ArithmeticException: / by zero) in org.w3c.tidy.StreamInJavaImpl.readChar(), line 248.
Workaround: call setTabsize() with a value greater than 0.

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-01-30 02:30

Message:
Fixed in svn (r1095).

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2940996&group_id=13153 |
From: SourceForge.net <no...@so...> - 2010-01-27 13:56:53
|
Bugs item #2940996, was opened at 2010-01-27 13:56
Message generated for change (Tracker Item Submitted) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2940996&group_id=13153

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: ArithmeticException / by zero with Tabsize zero

Initial Comment:
If Tidy is set up with a tab size of zero via Tidy#setTabsize(0), this results in a runtime exception (ArithmeticException: / by zero) in org.w3c.tidy.StreamInJavaImpl.readChar(), line 248.
Workaround: call setTabsize() with a value greater than 0.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2940996&group_id=13153 |
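The report does not show the code at StreamInJavaImpl.readChar() line 248. Tab expansion, however, typically takes the column modulo the tab size, and that is where a zero divisor would end up. The sketch below is a hypothetical helper, not JTidy's code or the r1095 fix. Treating a tab size below 1 as 1 is an assumed policy, chosen only to show where the guard belongs.

```java
// Hypothetical tab-expansion helper; not JTidy's StreamInJavaImpl code.
final class TabExpansion {
    /**
     * Number of spaces a tab expands to when the cursor is at 0-based column col.
     * Without the guard, tabsize == 0 would make col % tabsize throw ArithmeticException.
     */
    static int spacesForTab(int col, int tabsize) {
        int size = tabsize < 1 ? 1 : tabsize; // assumed policy: treat 0 or negative as 1
        return size - (col % size);
    }

    public static void main(String[] args) {
        System.out.println(spacesForTab(3, 8)); // 5: the next tab stop is column 8
        System.out.println(spacesForTab(3, 0)); // 1 instead of ArithmeticException
    }
}
```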
From: SourceForge.net <no...@so...> - 2010-01-27 10:49:05
|
Bugs item #2940893, was opened at 2010-01-27 11:49
Message generated for change (Tracker Item Submitted) made by matzon
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2940893&group_id=13153

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Brian Matzon (matzon)
Assigned to: Nobody/Anonymous (nobody)
Summary: xhtml output + print body forces indent/pretty print

Initial Comment:
JTidy incorrectly pretty-prints/indents the output when XHTML output is enabled and only the body is printed.

Input:
<span class="text-small"><i>Hello <b>How</b> Are You?</i></span>

becomes:
<span class="text-small">
<i>Hello <b>How</b> Are You?</i>
</span>

This does not happen when using the tidy binary.

Options (JTidy):
tidy.setPrintBodyOnly(true);
tidy.setXHTML(true);

vs. (tidy):
show-body-only: true
output-xhtml: true

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2940893&group_id=13153 |
From: SourceForge.net <no...@so...> - 2010-01-21 20:44:55
|
Bugs item #2936583, was opened at 2010-01-21 20:44
Message generated for change (Settings changed) made by
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2936583&group_id=13153

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
>Priority: 7
Private: No
Submitted By: CWall ()
Assigned to: Nobody/Anonymous (nobody)
Summary: NullPointerException Node.trimInitialSpace

Initial Comment:
The following HTML source causes a NullPointerException in Node.trimInitialSpace, where element.parent is null. Note that it's the whitespace after the span tag that triggers the trimInitialSpace call. Yes, a span inside a table is not valid, but I wouldn't expect JTidy to throw an NPE. Known issue? Workaround? Thanks.

<table>
<tr><td></td></tr>
<span id="mySpan">
<tr><td></td></tr></span>
</table>

if (TidyUtils.toBoolean(element.tag.model & Dict.CM_INLINE)
    && !TidyUtils.toBoolean(element.tag.model & Dict.CM_FIELD)
    && element.parent.content != element)
...

The easy fix is to add an element.parent != null check.
Thanks.
-C

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2936583&group_id=13153 |
From: SourceForge.net <no...@so...> - 2010-01-21 20:44:31
|
Bugs item #2936583, was opened at 2010-01-21 20:44
Message generated for change (Tracker Item Submitted) made by
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2936583&group_id=13153

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: CWall ()
Assigned to: Nobody/Anonymous (nobody)
Summary: NullPointerException Node.trimInitialSpace

Initial Comment:
The following HTML source causes a NullPointerException in Node.trimInitialSpace, where element.parent is null. Note that it's the whitespace after the span tag that triggers the trimInitialSpace call. Yes, a span inside a table is not valid, but I wouldn't expect JTidy to throw an NPE. Known issue? Workaround? Thanks.

<table>
<tr><td></td></tr>
<span id="mySpan">
<tr><td></td></tr></span>
</table>

if (TidyUtils.toBoolean(element.tag.model & Dict.CM_INLINE)
    && !TidyUtils.toBoolean(element.tag.model & Dict.CM_FIELD)
    && element.parent.content != element)
...

The easy fix is to add an element.parent != null check.
Thanks.
-C

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2936583&group_id=13153 |
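The guard proposed in the report depends on && short-circuiting: the null test must come before element.parent.content is dereferenced. A minimal, self-contained sketch follows. SketchNode and its boolean flags are hypothetical stand-ins for JTidy's Node and for the Dict.CM_INLINE / Dict.CM_FIELD content-model checks, not JTidy's classes.

```java
// Hypothetical stand-in for org.w3c.tidy.Node; only the fields the check needs.
final class SketchNode {
    SketchNode parent;
    SketchNode content; // first child, mirroring element.parent.content in the report
    boolean inline;     // stands in for (tag.model & Dict.CM_INLINE) != 0
    boolean field;      // stands in for (tag.model & Dict.CM_FIELD) != 0

    /** The condition quoted from trimInitialSpace, with the proposed null guard added. */
    static boolean isInlineNonFirstChild(SketchNode element) {
        return element.inline
            && !element.field
            && element.parent != null              // guard: nodes without a parent stop here
            && element.parent.content != element;  // only evaluated when a parent exists
    }
}
```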
From: SourceForge.net <no...@so...> - 2010-01-17 11:02:36
|
Feature Requests item #2933753, was opened at 2010-01-17 11:02
Message generated for change (Tracker Item Submitted) made by matuschd
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=363153&aid=2933753&group_id=13153

Category: None
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: Daniel Matuschek (matuschd)
Assigned to: Nobody/Anonymous (nobody)
Summary: optgroup parser should stop optgroup parsing on unknown tags

Initial Comment:
Today's implementation processes only <option> tags inside an optgroup. If the optgroup is not closed correctly in the HTML code, all tags after it are ignored. Browsers seem to close the optgroup when they encounter tags other than <option>. It seems a better idea to close the optgroup when an unexpected tag is found, instead of ignoring these tags. This should give better results on buggy HTML code.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=363153&aid=2933753&group_id=13153 |
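The rule proposed in this request can be sketched on a simplified token list. This is not JTidy's optgroup parser. Tags are hypothetical plain strings, with end tags written as "/name". The method returns the index of the first token that is neither an option start tag nor an option end tag, i.e. the point where the optgroup would be closed. An explicit </optgroup> also stops the scan.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the proposed rule; not JTidy's optgroup parsing code.
final class OptgroupScan {
    /**
     * Scans the tags that follow an open <optgroup>, starting at index start, and returns the
     * index where the optgroup should end: its own </optgroup> or, as proposed, the first tag
     * that is not <option> or </option>. Returns tags.size() if no such tag exists.
     */
    static int closeIndex(List<String> tags, int start) {
        for (int i = start; i < tags.size(); i++) {
            String t = tags.get(i);
            if (!t.equals("option") && !t.equals("/option")) {
                return i; // "/optgroup" or any other tag ends the group
            }
        }
        return tags.size();
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("option", "/option", "p", "option");
        System.out.println(closeIndex(tags, 0)); // 2: the <p> closes the unterminated optgroup
    }
}
```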
From: Erik M. <eri...@gm...> - 2010-01-10 17:27:32
|
Hi,

I think JTidy needs a SAX interface; here is a simple one that uses the JTidy lexer. It doesn't validate, but it does close tags that aren't closed in the HTML. It uses the tag table to see whether a tag is expected to have content. When it meets a close tag, it searches a stack of parent nodes for the matching open tag and closes any unclosed tags in between. There is an example main method at the bottom.

The reason you might want a SAX interface is that it streams the result. You can process the document while it loads, and you don't have to keep the entire document in memory. That is useful on mobile devices, for example, or when you need to parse really large HTML documents.

http://dl.dropbox.com/u/16123/Android/JTidy/JTidySAXParserFactory.java

Interesting??

--
/erik martino |
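The tag-balancing strategy Erik describes can be sketched independently of the JTidy lexer. It keeps a stack of open elements and closes elements that take no content immediately. On an end tag, it closes everything above the matching open tag. This sketch is not the code from the linked file. Tag names arrive as plain strings, the set of empty elements is a small hard-coded stand-in for JTidy's tag table, and events are recorded as strings. A real implementation would instead call org.xml.sax.ContentHandler's startElement() and endElement().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of stack-based tag balancing for a streaming (SAX-style) parser.
final class TagBalancer {
    // Stand-in for the tag table's "element has no content" information
    private static final Set<String> EMPTY =
        new HashSet<String>(Arrays.asList("br", "hr", "img", "input", "link", "meta"));

    private final LinkedList<String> open = new LinkedList<String>(); // innermost element first
    final List<String> events = new ArrayList<String>();

    void startTag(String name) {
        events.add("start " + name);
        if (EMPTY.contains(name)) {
            events.add("end " + name); // element without content: close immediately
        } else {
            open.addFirst(name);
        }
    }

    void endTag(String name) {
        if (!open.contains(name)) {
            return; // no matching open tag: drop the stray end tag
        }
        String top;
        do {
            top = open.removeFirst();
            events.add("end " + top); // implicitly close anything still open inside it
        } while (!top.equals(name));
    }

    void endDocument() {
        while (!open.isEmpty()) {
            events.add("end " + open.removeFirst());
        }
    }

    public static void main(String[] args) {
        TagBalancer b = new TagBalancer();
        b.startTag("p");
        b.startTag("b");
        b.endTag("p"); // closes the unclosed <b> first
        System.out.println(b.events); // [start p, start b, end b, end p]
    }
}
```

Each event is emitted as soon as its tag is seen, so the memory held by the balancer grows with nesting depth rather than document size.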
From: SourceForge.net <no...@so...> - 2010-01-04 05:04:04
|
Bugs item #2922337, was opened at 2009-12-28 23:32
Message generated for change (Settings changed) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2922337&group_id=13153

Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Michael Schierl (schierlm)
>Assigned to: Adrian Sandor (aditsu)
Summary: StringIndexOutOfBoundsException while lexing script content

Initial Comment:
JTidy version: jtidy-r938.jar

Consider this example file:

public class JTidyBug {
    public static void main(String[] args) throws Exception {
        final String SOURCE = "\n" +
            "<script>\n" +
            "var o={x=9};\n" +
            "var q=x<o.x;\n" +
            "</script>";
        char[] padding = new char[8165];
        java.util.Arrays.fill(padding, 'x');
        String source = new String(padding) + SOURCE;
        org.w3c.tidy.Tidy tidy = new org.w3c.tidy.Tidy();
        tidy.setShowWarnings(false);
        tidy.parse(new java.io.ByteArrayInputStream(source.getBytes("ISO-8859-1")), System.out);
    }
}

Expected result:
A tidied HTML document is output, similar to the one produced when 8165 in the code above is replaced by 8160.

Actual result:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 8193
	at java.lang.String.checkBounds(String.java:402)
	at java.lang.String.<init>(String.java:443)
	at org.w3c.tidy.TidyUtils.getString(Unknown Source)
	at org.w3c.tidy.Lexer.getCDATA(Unknown Source)
	at org.w3c.tidy.ParserImpl$ParseScript.parse(Unknown Source)
	at org.w3c.tidy.ParserImpl.parseTag(Unknown Source)
	at org.w3c.tidy.ParserImpl$ParseBody.parse(Unknown Source)
	at org.w3c.tidy.ParserImpl.parseTag(Unknown Source)
	at org.w3c.tidy.ParserImpl$ParseHTML.parse(Unknown Source)
	at org.w3c.tidy.ParserImpl.parseDocument(Unknown Source)
	at org.w3c.tidy.Tidy.parse(Unknown Source)
	at org.w3c.tidy.Tidy.parse(Unknown Source)
	at JTidyBug.main(JTidyBug.java:13)

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-01-04 13:03

Message:
Fixed in svn, thanks.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2922337&group_id=13153 |
From: SourceForge.net <no...@so...> - 2009-12-28 15:32:42
Bugs item #2922337, was opened at 2009-12-28 16:32
Message generated for change (Tracker Item Submitted) made by schierlm
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2922337&group_id=13153

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Michael Schierl (schierlm)
Assigned to: Nobody/Anonymous (nobody)
Summary: StringIndexOutOfBoundsException while lexing script content

Initial Comment:
JTidy version: jtidy-r938.jar

Consider this example file:

public class JTidyBug {
    public static void main(String[] args) throws Exception {
        final String SOURCE = "\n" +
            "<script>\n" +
            "var o={x=9};\n" +
            "var q=x<o.x;\n" +
            "</script>";
        char[] padding = new char[8165];
        java.util.Arrays.fill(padding, 'x');
        String source = new String(padding)+SOURCE;
        org.w3c.tidy.Tidy tidy = new org.w3c.tidy.Tidy();
        tidy.setShowWarnings(false);
        tidy.parse(new java.io.ByteArrayInputStream(source.getBytes("ISO-8859-1")), System.out);
    }
}

Expected result:
A tidied HTML document is output, similar to the one produced when 8165 in the code above is replaced by 8160.

Actual result:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 8193
    at java.lang.String.checkBounds(String.java:402)
    at java.lang.String.<init>(String.java:443)
    at org.w3c.tidy.TidyUtils.getString(Unknown Source)
    at org.w3c.tidy.Lexer.getCDATA(Unknown Source)
    at org.w3c.tidy.ParserImpl$ParseScript.parse(Unknown Source)
    at org.w3c.tidy.ParserImpl.parseTag(Unknown Source)
    at org.w3c.tidy.ParserImpl$ParseBody.parse(Unknown Source)
    at org.w3c.tidy.ParserImpl.parseTag(Unknown Source)
    at org.w3c.tidy.ParserImpl$ParseHTML.parse(Unknown Source)
    at org.w3c.tidy.ParserImpl.parseDocument(Unknown Source)
    at org.w3c.tidy.Tidy.parse(Unknown Source)
    at org.w3c.tidy.Tidy.parse(Unknown Source)
    at JTidyBug.main(JTidyBug.java:13)
----------------------------------------------------------------------
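The reporter observed that 8160 padding characters parse cleanly while 8165 fail. One way to pin down the exact boundary would be to sweep the padding length and record which inputs throw. The following is hypothetical test scaffolding, not part of JTidy: `Parser`, `buildInput` and `failingPaddings` are illustrative names, and the demo plugs in a stand-in parser so the sketch runs without the JTidy jar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PaddingSweep {

    /** Hypothetical hook that feeds one complete document to the parser under test. */
    @FunctionalInterface
    interface Parser {
        void parse(String document) throws Exception;
    }

    // Script fragment copied from the bug report (the invalid JavaScript is intentional).
    static final String SOURCE = "\n"
            + "<script>\n"
            + "var o={x=9};\n"
            + "var q=x<o.x;\n"
            + "</script>";

    /** Builds the reproducer input: `padding` 'x' characters followed by SOURCE. */
    static String buildInput(int padding) {
        char[] pad = new char[padding];
        Arrays.fill(pad, 'x');
        return new String(pad) + SOURCE;
    }

    /** Returns every padding length in [from, to] for which the parser throws. */
    static List<Integer> failingPaddings(Parser parser, int from, int to) {
        List<Integer> failures = new ArrayList<>();
        for (int n = from; n <= to; n++) {
            try {
                parser.parse(buildInput(n));
            } catch (Exception e) {
                failures.add(n);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Stand-in parser for demonstration only: throws once the document exceeds 8200 chars.
        Parser fake = doc -> {
            if (doc.length() > 8200) {
                throw new StringIndexOutOfBoundsException(doc.length());
            }
        };
        System.out.println(failingPaddings(fake, 8150, 8170));
    }
}
```

To sweep against JTidy itself, the stand-in could be swapped for a lambda that reuses only the calls from the reproducer: `doc -> { org.w3c.tidy.Tidy tidy = new org.w3c.tidy.Tidy(); tidy.setShowWarnings(false); tidy.parse(new java.io.ByteArrayInputStream(doc.getBytes("ISO-8859-1")), new java.io.ByteArrayOutputStream()); }`.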
From: SourceForge.net <no...@so...> - 2009-12-19 18:26:03
Bugs item #2891765, was opened at 2009-11-04 16:55
Message generated for change (Comment added) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2891765&group_id=13153

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Guido Leenders (guido_leenders)
>Assigned to: Adrian Sandor (aditsu)
Summary: Can not use same target as source (NullPointerException)

Initial Comment:
When using the following task in ant:

<tidy destdir="XYZ">
    <fileset dir="XYZ">
        <include name="*.html" />
    </fileset>
...

I am getting:

java.lang.NullPointerException
    at org.w3c.tidy.ParserImpl.parseDocument(Unknown Source)
    at org.w3c.tidy.Tidy.parse(Unknown Source)
    at org.w3c.tidy.Tidy.parse(Unknown Source)
    at org.w3c.tidy.ant.JTidyTask.processFile(Unknown Source)
    at org.w3c.tidy.ant.JTidyTask.executeSet(Unknown Source)
    at org.w3c.tidy.ant.JTidyTask.execute(Unknown Source)
    at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
    at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
    at org.apache.tools.ant.Task.perform(Task.java:348)
    at org.apache.tools.ant.Target.execute(Target.java:357)
    at org.apache.tools.ant.Target.performTasks(Target.java:385)
    at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1337)
    at org.apache.tools.ant.Project.executeTarget(Project.java:1306)
    at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
    at org.apache.tools.ant.Project.executeTargets(Project.java:1189)
    at org.apache.tools.ant.Main.runBuild(Main.java:758)
    at org.apache.tools.ant.Main.startAnt(Main.java:217)
    at org.apache.tools.ant.launch.Launcher.run(Launcher.java:257)
    at org.apache.tools.ant.launch.Launcher.main(Launcher.java:104)

When using

<tidy destdir="DIFFERENT">
    <fileset dir="XYZ">
        <include name="*.html" />
    </fileset>
...

jtidy works fine.

For my scenario, it would be great if JTidy could handle the same target as source dir (creating temporary files itself when necessary), or at least raise a clearer error.

----------------------------------------------------------------------
>Comment By: Adrian Sandor (aditsu)
Date: 2009-12-20 02:26

Message:
I'm not familiar with the ant task; however, tidy has a "writeback" option. Are you able to use it, and does it fix the problem?
----------------------------------------------------------------------
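The requester suggested that JTidy could create temporary files itself when source and target are the same. A minimal sketch of that idea, assuming nothing about how the ant task is actually implemented: write the output to a temporary file in the same directory, and replace the original only once the transform has finished successfully. `StreamTransform` and `transformInPlace` are hypothetical names; the maintainer's suggested route remains tidy's "writeback" option.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

public class InPlaceTidy {

    /** Hypothetical hook: reads a document from `in`, writes the cleaned version to `out`. */
    @FunctionalInterface
    interface StreamTransform {
        void apply(InputStream in, OutputStream out) throws Exception;
    }

    /** Rewrites `file` in place without ever reading and writing the same file at once. */
    static void transformInPlace(Path file, StreamTransform transform) throws Exception {
        Path dir = file.toAbsolutePath().getParent();
        // Same directory as the original, so the final move stays on one file system.
        Path tmp = Files.createTempFile(dir, "tidy", ".tmp");
        try {
            try (InputStream in = Files.newInputStream(file);
                 OutputStream out = Files.newOutputStream(tmp)) {
                transform.apply(in, out);
            }
            // Both streams are closed here; replace the original only after success.
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // Removes the temporary file if the transform failed; no-op after the move.
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("demo", ".html");
        Files.write(f, "<p>hello".getBytes(StandardCharsets.UTF_8));
        // Stand-in transform for demonstration: upper-cases the document.
        transformInPlace(f, (in, out) -> {
            String text = new String(in.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
            out.write(text.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
        });
        System.out.println(new String(Files.readAllBytes(f), StandardCharsets.UTF_8));
        Files.delete(f);
    }
}
```

With JTidy, the transform could be `(in, out) -> new org.w3c.tidy.Tidy().parse(in, out)`, the same `parse(InputStream, OutputStream)` call used in the bug #2922337 reproducer. If the transform throws, the original file is left untouched.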