From: SourceForge.net <no...@so...> - 2010-04-12 13:09:25
|
Bugs item #2985849, was opened at 2010-04-12 19:23
Message generated for change (Comment added) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2985849&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Rajorshi Biswas (rajorshi)
>Assigned to: Adrian Sandor (aditsu)
Summary: Spaces are lost between elements

Initial Comment:
I think this is fairly serious. Please run the attached html through jtidy.jar. You will see that the input HTML:

private String parseDescription

becomes:

privateString parseDescription

The space between the span tags is lost. HTML Tidy works fine for this.

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-04-12 21:09

Message:
Confirmed

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2985849&group_id=13153
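Archive note: the markup of the reported input did not survive in this archive, so the following is only a minimal sketch of how such a whitespace regression could be checked. It assumes the two words sat in adjacent span elements; the class name VisibleTextCheck and both sample strings are hypothetical, and the check uses plain regular expressions rather than any JTidy API.

```java
import java.util.regex.Pattern;

final class VisibleTextCheck {
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Drops markup and collapses whitespace runs so that two documents
    // can be compared by the text a reader would actually see.
    static String visibleText(String html) {
        String withoutTags = TAG.matcher(html).replaceAll("");
        return WHITESPACE.matcher(withoutTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // Hypothetical input: the report only says the words were in span tags.
        String input = "<span>private</span> <span>String</span> parseDescription";
        // Hypothetical output with the space between the spans dropped.
        String output = "<span>private</span><span>String</span> parseDescription";

        System.out.println(visibleText(input));
        System.out.println(visibleText(output));
        System.out.println("text preserved: " + visibleText(input).equals(visibleText(output)));
    }
}
```

Comparing the visible text before and after tidying flags the lost space without depending on how the tidied markup is indented. The tag pattern is deliberately naive and would misread a ">" inside an attribute value.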
From: SourceForge.net <no...@so...> - 2010-04-12 11:23:47
|
Bugs item #2985849, was opened at 2010-04-12 16:53
Message generated for change (Tracker Item Submitted) made by rajorshi
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2985849&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Rajorshi Biswas (rajorshi)
Assigned to: Nobody/Anonymous (nobody)
Summary: Spaces are lost between elements

Initial Comment:
I think this is fairly serious. Please run the attached html through jtidy.jar. You will see that the input HTML:

private String parseDescription

becomes:

privateString parseDescription

The space between the span tags is lost. HTML Tidy works fine for this.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2985849&group_id=13153
From: SourceForge.net <no...@so...> - 2010-04-12 10:36:26
|
Feature Requests item #1780883, was opened at 2007-08-24 11:44
Message generated for change (Comment added) made by verhagent
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=363153&aid=1780883&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: JCopistAdmin (jcopistadmin)
Assigned to: Nobody/Anonymous (nobody)
Summary: Release new version in Maven Central Repo

Initial Comment:
Hi there!

We currently use JTidy in our project running with maven2. As a result, we would need a new version to be uploaded to the Maven central repository. The last version found there seems really old: http://repo1.maven.org/maven2/jtidy/jtidy/4aug2000r7-dev/.

We have noticed that you provide a snapshot repository (http://jtidy.sourceforge.net/snapshots). However, we would rather depend on a fixed version than on a snapshot. The last version there is from August 2006.

Could you publish this one as a new version and provide it in the Maven central repo?

----------------------------------------------------------------------

Comment By: Tjeerd Verhagen (verhagent)
Date: 2010-04-12 12:36

Message:
It would indeed be nice to see the artifact appear soon in the public Maven repository.

From what I understand from http://maven.apache.org/guides/mini/guide-central-repository-upload.html, the group id should change to the domain name that JTidy owns, which means it should be updated to:

<groupId>net.sourceforge.jtidy</groupId>

I'm the project lead of http://docbook-utils.sourceforge.net/maven-tidy-plugin_1.0/docbook/article-project-overview.html. This Maven plug-in depends on JTidy release r938, which cannot be resolved through a central Maven repository.

Maybe the JTidy release manager should have a look into setting up a Sonatype Forge account, so that your fixed release gets uploaded through that repo to the central Maven repository. A SNAPSHOT repository will also be available there.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=363153&aid=1780883&group_id=13153
From: SourceForge.net <no...@so...> - 2010-04-11 13:14:52
|
Bugs item #2984038, was opened at 2010-04-09 01:56
Message generated for change (Settings changed) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2984038&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Adrian Sandor (aditsu)
Summary: Posible issue with attributes manipulation

Initial Comment:
Using JTidy with some HTMLs that have attributes with value without quotes (") I gave an error because the separator space for attributes was deleted.

Example:

Original:
<A NAME='JD_CdigoTributarioArt.6RESOLUCIONN55'>

After JTidy:
<a id="JD_CdigoTributarioArt.6RESOLUCIONN55"name='JD_CdigoTributarioArt.6RESOLUCIONN55'></a>

In this case, JTidy added a 'name' attribute with the same value of the 'id' attribute but without separator spaces for attributes.

The options used for this case are:

Tidy tidy = new Tidy();
tidy.setXmlOut(true);
tidy.setXHTML(true);
tidy.setPrintBodyOnly(true);
tidy.setShowWarnings(false);
tidy.setQuiet(true);
tidy.setNumEntities(true);
tidy.setDropProprietaryAttributes(true);
tidy.setLiteralAttribs(true);

I solve this using the option:

tidy.setIndentAttributes(true);

I attach the original htm of the example.

Sorry for my english and for guest user post

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-04-11 21:14

Message:
I tried your input and your code, and the output I get is:

<a id="JD_CdigoTributarioArt.6RESOLUCIONN55"
name='JD_CdigoTributarioArt.6RESOLUCIONN55'></a>

As you can see, the name attribute is on a separate line.
What version of JTidy are you using?

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2984038&group_id=13153
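Archive note: the difference between the two outputs in this thread is whether any whitespace (a space or a line break) separates the id and name attributes. As a rough sketch only, a check like the one below could tell the two cases apart in a regression test. The class and method names are hypothetical and nothing here is part of JTidy; it relies on regular expressions and so will misjudge tags whose attribute values contain ">".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AttributeSeparatorCheck {
    // Naive tag matcher: everything from "<" to the next ">".
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // A quoted attribute value immediately followed by the first
    // character of another attribute name, with no whitespace between.
    private static final Pattern GLUED =
            Pattern.compile("=\\s*(?:\"[^\"]*\"|'[^']*')(?=[A-Za-z_:])");

    // Returns true if any tag contains two attributes with no separator.
    static boolean hasGluedAttributes(String markup) {
        Matcher tags = TAG.matcher(markup);
        while (tags.find()) {
            if (GLUED.matcher(tags.group()).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String reported = "<a id=\"JD_CdigoTributarioArt.6RESOLUCIONN55\""
                + "name='JD_CdigoTributarioArt.6RESOLUCIONN55'></a>";
        String observed = "<a id=\"JD_CdigoTributarioArt.6RESOLUCIONN55\"\n"
                + "name='JD_CdigoTributarioArt.6RESOLUCIONN55'></a>";
        System.out.println(hasGluedAttributes(reported));
        System.out.println(hasGluedAttributes(observed));
    }
}
```

A line break counts as a valid separator here, which matches the maintainer's point that the name attribute appearing on its own line is well-formed output.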
From: SourceForge.net <no...@so...> - 2010-04-08 17:56:00
|
Bugs item #2984038, was opened at 2010-04-08 17:56
Message generated for change (Tracker Item Submitted) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2984038&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Posible issue with attributes manipulation

Initial Comment:
Using JTidy with some HTMLs that have attributes with value without quotes (") I gave an error because the separator space for attributes was deleted.

Example:

Original:
<A NAME='JD_CdigoTributarioArt.6RESOLUCIONN55'>

After JTidy:
<a id="JD_CdigoTributarioArt.6RESOLUCIONN55"name='JD_CdigoTributarioArt.6RESOLUCIONN55'></a>

In this case, JTidy added a 'name' attribute with the same value of the 'id' attribute but without separator spaces for attributes.

The options used for this case are:

Tidy tidy = new Tidy();
tidy.setXmlOut(true);
tidy.setXHTML(true);
tidy.setPrintBodyOnly(true);
tidy.setShowWarnings(false);
tidy.setQuiet(true);
tidy.setNumEntities(true);
tidy.setDropProprietaryAttributes(true);
tidy.setLiteralAttribs(true);

I solve this using the option:

tidy.setIndentAttributes(true);

I attach the original htm of the example.

Sorry for my english and for guest user post

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2984038&group_id=13153
From: SourceForge.net <no...@so...> - 2010-04-04 15:31:22
|
Bugs item #1953507, was opened at 2008-04-29 00:16
Message generated for change (Settings changed) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=1953507&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Tidy functionality
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Adrian Sandor (aditsu)
Summary: incorrect handling of french html doc

Initial Comment:
French HTML document is corrupted.

Refer to
http://decision.tcc-cci.gc.ca/fr/2006/2008cci182/2008cci182.html

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2009-11-21 21:57

Message:
Can you reproduce this problem with a recent version of JTidy?

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=1953507&group_id=13153
From: SourceForge.net <no...@so...> - 2010-04-04 10:56:24
|
Bugs item #2973054, was opened at 2010-03-19 17:31
Message generated for change (Comment added) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2973054&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Tidy functionality
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Adrian Sandor (aditsu)
Summary: script within script problem

Initial Comment:
Jtidy isn't able to handle situations like :

<html>
<head>
<title>test title</title>
</head>
<body>
<script type="text/javascript">
<!--
document.write('<script></scr'+'ipt>');
//-->
</script>
</body>
</html>

The tidied result is :

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net" name="generator"/>
<title>test title</title>
</head>
<body>
<script type="text/javascript">
<!--
document.write('<script></scr'+'ipt>');
//-->
</script>
</body>
</html>
</script>
</body>
</html>

Have anyone an issue ?
Thanks
Christophe

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-04-04 18:56

Message:
Fixed in svn (r1104) to escape end tags like Tidy. Now the output matches Tidy. If you think it's still wrong, please file a bug in the Tidy project.

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2010-04-02 21:57

Message:
After correcting some error levels, it seems to work better. The only significant difference from Tidy output now is that Tidy escapes several end tags with a backslash before the slash. However, in both Tidy and JTidy, the end tags for script, body and html are repeated (except in Tidy they're escaped the first time).
Also, I never got the output you pasted (with escaped angle brackets).
If you think the slashes need to be escaped like Tidy does, I'll do that. If you think the repeated end tags are a problem, please file a Tidy bug. For anything else, please provide more information.

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2010-04-02 21:21

Message:
I'm getting a very different result - no output and 6 errors (with no error messages).
Also, the CodeUpdateAndJava5 branch seems to handle it much better, I'll try to backport the differences.

----------------------------------------------------------------------

Comment By: Christophe Scholly ()
Date: 2010-03-19 18:13

Message:
The problem seems to be in the getCDATA method of the Lexer class.
The problem don't occur when you put an old version of getCDATA in the Lexer. The bug only occur in the last version of JTidy (938) because of the support of "scripts into scripts".

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2973054&group_id=13153
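Archive note: the thread describes Tidy as escaping end tags inside script content "with a backslash before the slash". The following is a simplified sketch of that idea, not the change made in r1104: the class and method names are hypothetical, and it escapes every "</" sequence, whereas the thread only says that several end tags are escaped.

```java
final class ScriptEndTagEscaper {
    // Rewrites each "</" as "<\/" so that an HTML parser scanning the
    // script body does not mistake it for the start of an end tag.
    // Inside a JavaScript string literal "\/" still evaluates to "/".
    static String escapeEndTags(String scriptBody) {
        return scriptBody.replace("</", "<\\/");
    }

    public static void main(String[] args) {
        // The script line from the report.
        String body = "document.write('<script></scr'+'ipt>');";
        System.out.println(escapeEndTags(body));
    }
}
```

The rewrite is only safe where the "</" sits inside a JavaScript string literal, as it does in the reported document.write call; applying it blindly to arbitrary script text is an assumption of this sketch.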
From: SourceForge.net <no...@so...> - 2010-04-03 14:47:06
|
Bugs item #2940893, was opened at 2010-01-27 18:49
Message generated for change (Comment added) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2940893&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Tidy functionality
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Brian Matzon (matzon)
>Assigned to: Adrian Sandor (aditsu)
Summary: xhtml output + print body forces indent/pretty print

Initial Comment:
jtidy will incorrectly prettyprint/indent output when xhtml output is enabled and only printing body.

Input:
<span class="text-small"><i>Hello <b>How</b> Are You?</i></span>

becomes:
<span class="text-small">
<i>Hello <b>How</b> Are You?</i>
</span>

This does not happen when using the tidy binary.

Options:
tidy.setPrintBodyOnly(true);
tidy.setXHTML(true);

vs

show-body-only: true
output-xhtml: true

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-04-03 22:47

Message:
Fixed in svn (r1103)

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2940893&group_id=13153
From: SourceForge.net <no...@so...> - 2010-04-03 12:40:17
|
Bugs item #2977242, was opened at 2010-03-27 03:47
Message generated for change (Comment added) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2977242&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Not correctly

Initial Comment:
Use this INPUT for jtidy : http://www.senado.gov.ar/web/comisiones/listado.php

it works in every browser but jtidy is destroying it.

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-04-03 20:40

Message:
Could you explain what the problem is? How is it destroying it?

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2977242&group_id=13153
From: SourceForge.net <no...@so...> - 2010-04-03 00:54:41
|
Bugs item #2976610, was opened at 2010-03-25 13:45
Message generated for change (Comment added) made by chengas123
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2976610&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Benjamin McCann (chengas123)
Assigned to: Adrian Sandor (aditsu)
Summary: jtidy cannot handle xml declarations

Initial Comment:
The following prints an empty string:

package com.benmccann.playground;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

public class JtidyTester {

  public static void main(String[] args) {
    Tidy tidy = new Tidy();
    tidy.setMakeClean(false);
    tidy.setXmlTags(false);
    tidy.setXmlOut(false);
    tidy.setQuiet(true);
    tidy.setShowWarnings(false);
    tidy.setTidyMark(false);

    byte[] bytes = "<?xml version=\"1.0\"?><html><a href=\"javascript:alert('\\'hi\\'');\">click</a></html>".getBytes();
    Document document = tidy.parseDOM(new ByteArrayInputStream(bytes), null);
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    tidy.pprint(document, output);
    String outputString = new String(output.toByteArray());
    System.out.println(outputString);
  }

}

----------------------------------------------------------------------

>Comment By: Benjamin McCann (chengas123)
Date: 2010-04-02 20:54

Message:
I just updated to the latest version from SVN and this works now.

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2010-04-02 04:36

Message:
I ran your exact code and the output wasn't empty.
What JTidy version are you using?

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2976610&group_id=13153
From: SourceForge.net <no...@so...> - 2010-04-02 15:08:11
|
Bugs item #2836476, was opened at 2009-08-13 04:33
Message generated for change (Comment added) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2836476&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Mr. Hatchet (hatchetman82)
Assigned to: Nobody/Anonymous (nobody)
Summary: Missing node type description causes Node.toString to fail

Initial Comment:
In Node.java, there's a String array called NODETYPE_STRING which contains descriptions for node types. An entry for JSTE is missing, which causes toString to fail when it encounters an XML_DECL tag (because of an ArrayIndexOutOfBounds).

The fix should be very easy - add a description string.

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-04-02 23:08

Message:
Closing due to lack of feedback

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2009-08-23 01:17

Message:
Can you provide a test case?

----------------------------------------------------------------------

Comment By: Mr. Hatchet (hatchetman82)
Date: 2009-08-13 04:40

Message:
Also missing from the same array is the description for the CDATA tag.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2836476&group_id=13153
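Archive note: the failure described in this report is the general pattern of a description table that has fewer entries than there are type codes. The sketch below illustrates that pattern and a bounds-checked lookup under stated assumptions: the type codes, array contents and class name are all hypothetical and are not copied from JTidy's Node.java.

```java
final class NodeTypeNames {
    // Hypothetical type codes, for illustration only.
    static final int ROOT = 0;
    static final int TEXT = 1;
    static final int CDATA = 2;
    static final int XML_DECL = 3;

    // A table that stops short of the last codes, mirroring the reported gap.
    static final String[] INCOMPLETE = {"RootNode", "TextNode"};
    // The same table with one description per type code.
    static final String[] COMPLETE = {"RootNode", "TextNode", "CDATATag", "XmlDecl"};

    // Bounds-checked lookup: an unknown code yields a placeholder
    // instead of an ArrayIndexOutOfBoundsException.
    static String describe(String[] names, int type) {
        if (type < 0 || type >= names.length) {
            return "UnknownType(" + type + ")";
        }
        return names[type];
    }

    public static void main(String[] args) {
        System.out.println(describe(COMPLETE, XML_DECL));
        System.out.println(describe(INCOMPLETE, XML_DECL));
    }
}
```

Adding the missing strings, as the reporter suggests, addresses the immediate failure; the guarded lookup is an extra precaution so that a future type code without a description cannot break toString.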
From: SourceForge.net <no...@so...> - 2010-04-02 13:57:22
|
Bugs item #2973054, was opened at 2010-03-19 17:31
Message generated for change (Comment added) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2973054&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Adrian Sandor (aditsu)
Summary: script within script problem

Initial Comment:
Jtidy isn't able to handle situations like :

<html>
<head>
<title>test title</title>
</head>
<body>
<script type="text/javascript">
<!--
document.write('<script></scr'+'ipt>');
//-->
</script>
</body>
</html>

The tidied result is :

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net" name="generator"/>
<title>test title</title>
</head>
<body>
<script type="text/javascript">
<!--
document.write('<script></scr'+'ipt>');
//-->
</script>
</body>
</html>
</script>
</body>
</html>

Have anyone an issue ?
Thanks
Christophe

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-04-02 21:57

Message:
After correcting some error levels, it seems to work better. The only significant difference from Tidy output now is that Tidy escapes several end tags with a backslash before the slash. However, in both Tidy and JTidy, the end tags for script, body and html are repeated (except in Tidy they're escaped the first time).
Also, I never got the output you pasted (with escaped angle brackets).
If you think the slashes need to be escaped like Tidy does, I'll do that. If you think the repeated end tags are a problem, please file a Tidy bug. For anything else, please provide more information.

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2010-04-02 21:21

Message:
I'm getting a very different result - no output and 6 errors (with no error messages).
Also, the CodeUpdateAndJava5 branch seems to handle it much better, I'll try to backport the differences.

----------------------------------------------------------------------

Comment By: Christophe Scholly ()
Date: 2010-03-19 18:13

Message:
The problem seems to be in the getCDATA method of the Lexer class.
The problem don't occur when you put an old version of getCDATA in the Lexer. The bug only occur in the last version of JTidy (938) because of the support of "scripts into scripts".

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2973054&group_id=13153
From: SourceForge.net <no...@so...> - 2010-04-02 13:21:01
|
Bugs item #2973054, was opened at 2010-03-19 17:31
Message generated for change (Comment added) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2973054&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Adrian Sandor (aditsu)
Summary: script within script problem

Initial Comment:
Jtidy isn't able to handle situations like :

<html>
<head>
<title>test title</title>
</head>
<body>
<script type="text/javascript">
<!--
document.write('<script></scr'+'ipt>');
//-->
</script>
</body>
</html>

The tidied result is :

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net" name="generator"/>
<title>test title</title>
</head>
<body>
<script type="text/javascript">
<!--
document.write('<script></scr'+'ipt>');
//-->
</script>
</body>
</html>
</script>
</body>
</html>

Have anyone an issue ?
Thanks
Christophe

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-04-02 21:21

Message:
I'm getting a very different result - no output and 6 errors (with no error messages).
Also, the CodeUpdateAndJava5 branch seems to handle it much better, I'll try to backport the differences.

----------------------------------------------------------------------

Comment By: Christophe Scholly ()
Date: 2010-03-19 18:13

Message:
The problem seems to be in the getCDATA method of the Lexer class.
The problem don't occur when you put an old version of getCDATA in the Lexer. The bug only occur in the last version of JTidy (938) because of the support of "scripts into scripts".

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2973054&group_id=13153
From: SourceForge.net <no...@so...> - 2010-04-02 13:19:22
|
Bugs item #2973054, was opened at 2010-03-19 17:31
Message generated for change (Settings changed) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2973054&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Adrian Sandor (aditsu)
Summary: script within script problem

Initial Comment:
Jtidy isn't able to handle situations like :

<html>
<head>
<title>test title</title>
</head>
<body>
<script type="text/javascript">
<!--
document.write('<script></scr'+'ipt>');
//-->
</script>
</body>
</html>

The tidied result is :

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net" name="generator"/>
<title>test title</title>
</head>
<body>
<script type="text/javascript">
<!--
document.write('<script></scr'+'ipt>');
//-->
</script>
</body>
</html>
</script>
</body>
</html>

Have anyone an issue ?
Thanks
Christophe

----------------------------------------------------------------------

Comment By: Christophe Scholly ()
Date: 2010-03-19 18:13

Message:
The problem seems to be in the getCDATA method of the Lexer class.
The problem don't occur when you put an old version of getCDATA in the Lexer. The bug only occur in the last version of JTidy (938) because of the support of "scripts into scripts".

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2973054&group_id=13153
From: SourceForge.net <no...@so...> - 2010-04-02 09:35:17
|
Bugs item #2976597, was opened at 2010-03-26 01:37
Message generated for change (Comment added) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2976597&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Adrian Sandor (aditsu)
Summary: jtidy turns backslahes into forward slashes

Initial Comment:
Jtidy reverses slashes:

package com.benmccann.playground;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

public class JtidyTester {

  public static void main(String[] args) {
    Tidy tidy = new Tidy();
    tidy.setMakeClean(false);
    tidy.setXmlTags(false);
    tidy.setXmlOut(false);
    tidy.setQuiet(true);
    tidy.setShowWarnings(false);
    tidy.setTidyMark(false);

    byte[] bytes = "<html><a href=\"javascript:alert('\\'hi\\'');\">click</a></html>".getBytes();
    Document document = tidy.parseDOM(new ByteArrayInputStream(bytes), null);
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    tidy.pprint(document, output);
    System.out.println(new String(output.toByteArray()));
  }

}

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-04-02 17:35

Message:
Ok updated now, thanks for the report

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2010-04-02 17:34

Message:
Fixed in svn (r1099), having some trouble updating the status

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2976597&group_id=13153
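Archive note: the thread does not say what changed in r1099. Purely as an illustration of the kind of guard that avoids this symptom, the sketch below converts backslashes to forward slashes in a URL-like attribute value but leaves javascript: URLs alone, since a backslash there is a string escape rather than a path separator. The class and method names are hypothetical and this is an assumption, not JTidy's code.

```java
final class BackslashFix {
    private static final String JS_SCHEME = "javascript:";

    // Replaces "\" with "/" in ordinary URLs, but returns javascript:
    // URLs untouched so that escapes such as \' keep their meaning.
    static String fixBackslashes(String url) {
        boolean isJavascript =
                url.regionMatches(true, 0, JS_SCHEME, 0, JS_SCHEME.length());
        return isJavascript ? url : url.replace('\\', '/');
    }

    public static void main(String[] args) {
        System.out.println(fixBackslashes("images\\logo.png"));
        // The href value from the report, as the Java program sees it.
        System.out.println(fixBackslashes("javascript:alert('\\'hi\\'');"));
    }
}
```

With the reporter's href value, the guarded version prints the string unchanged, which is the behaviour the report asks for.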
From: SourceForge.net <no...@so...> - 2010-04-02 09:34:53
|
Bugs item #2976597, was opened at 2010-03-26 01:37
Message generated for change (Settings changed) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2976597&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Adrian Sandor (aditsu)
Summary: jtidy turns backslahes into forward slashes

Initial Comment:
Jtidy reverses slashes:

package com.benmccann.playground;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

public class JtidyTester {

  public static void main(String[] args) {
    Tidy tidy = new Tidy();
    tidy.setMakeClean(false);
    tidy.setXmlTags(false);
    tidy.setXmlOut(false);
    tidy.setQuiet(true);
    tidy.setShowWarnings(false);
    tidy.setTidyMark(false);

    byte[] bytes = "<html><a href=\"javascript:alert('\\'hi\\'');\">click</a></html>".getBytes();
    Document document = tidy.parseDOM(new ByteArrayInputStream(bytes), null);
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    tidy.pprint(document, output);
    System.out.println(new String(output.toByteArray()));
  }

}

----------------------------------------------------------------------

Comment By: Adrian Sandor (aditsu)
Date: 2010-04-02 17:34

Message:
Fixed in svn (r1099), having some trouble updating the status

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2976597&group_id=13153
From: SourceForge.net <no...@so...> - 2010-04-02 09:34:36
|
Bugs item #2976597, was opened at 2010-03-26 01:37
Message generated for change (Comment added) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2976597&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: jtidy turns backslahes into forward slashes

Initial Comment:
Jtidy reverses slashes:

package com.benmccann.playground;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

public class JtidyTester {

  public static void main(String[] args) {
    Tidy tidy = new Tidy();
    tidy.setMakeClean(false);
    tidy.setXmlTags(false);
    tidy.setXmlOut(false);
    tidy.setQuiet(true);
    tidy.setShowWarnings(false);
    tidy.setTidyMark(false);

    byte[] bytes = "<html><a href=\"javascript:alert('\\'hi\\'');\">click</a></html>".getBytes();
    Document document = tidy.parseDOM(new ByteArrayInputStream(bytes), null);
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    tidy.pprint(document, output);
    System.out.println(new String(output.toByteArray()));
  }

}

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-04-02 17:34

Message:
Fixed in svn (r1099), having some trouble updating the status

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2976597&group_id=13153
From: SourceForge.net <no...@so...> - 2010-04-02 08:36:36
|
Bugs item #2976610, was opened at 2010-03-26 01:45
Message generated for change (Comment added) made by aditsu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2976610&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Benjamin McCann (chengas123)
>Assigned to: Adrian Sandor (aditsu)
Summary: jtidy cannot handle xml declarations

Initial Comment:
The following prints an empty string:

package com.benmccann.playground;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

public class JtidyTester {

  public static void main(String[] args) {
    Tidy tidy = new Tidy();
    tidy.setMakeClean(false);
    tidy.setXmlTags(false);
    tidy.setXmlOut(false);
    tidy.setQuiet(true);
    tidy.setShowWarnings(false);
    tidy.setTidyMark(false);
    byte[] bytes = "<?xml version=\"1.0\"?><html><a href=\"javascript:alert('\\'hi\\'');\">click</a></html>".getBytes();
    Document document = tidy.parseDOM(new ByteArrayInputStream(bytes), null);
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    tidy.pprint(document, output);
    String outputString = new String(output.toByteArray());
    System.out.println(outputString);
  }
}

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2010-04-02 16:36

Message:
I ran your exact code and the output wasn't empty. What JTidy version are you using?

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2976610&group_id=13153
|
From: SourceForge.net <no...@so...> - 2010-03-26 19:47:17
|
Bugs item #2977242, was opened at 2010-03-26 19:47
Message generated for change (Tracker Item Submitted) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2977242&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Not correctly

Initial Comment:
Use this input for jtidy: http://www.senado.gov.ar/web/comisiones/listado.php

It works in every browser, but jtidy is destroying it.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2977242&group_id=13153
|
From: SourceForge.net <no...@so...> - 2010-03-25 17:45:48
|
Bugs item #2976610, was opened at 2010-03-25 13:45
Message generated for change (Tracker Item Submitted) made by chengas123
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2976610&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Benjamin McCann (chengas123)
Assigned to: Nobody/Anonymous (nobody)
Summary: jtidy cannot handle xml declarations

Initial Comment:
The following prints an empty string:

package com.benmccann.playground;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

public class JtidyTester {

  public static void main(String[] args) {
    Tidy tidy = new Tidy();
    tidy.setMakeClean(false);
    tidy.setXmlTags(false);
    tidy.setXmlOut(false);
    tidy.setQuiet(true);
    tidy.setShowWarnings(false);
    tidy.setTidyMark(false);
    byte[] bytes = "<?xml version=\"1.0\"?><html><a href=\"javascript:alert('\\'hi\\'');\">click</a></html>".getBytes();
    Document document = tidy.parseDOM(new ByteArrayInputStream(bytes), null);
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    tidy.pprint(document, output);
    String outputString = new String(output.toByteArray());
    System.out.println(outputString);
  }
}

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2976610&group_id=13153
|
From: SourceForge.net <no...@so...> - 2010-03-25 17:38:00
|
Bugs item #2976597, was opened at 2010-03-25 17:37
Message generated for change (Tracker Item Submitted) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2976597&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: jtidy turns backslashes into forward slashes

Initial Comment:
Jtidy reverses slashes:

package com.benmccann.playground;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

public class JtidyTester {

  public static void main(String[] args) {
    Tidy tidy = new Tidy();
    tidy.setMakeClean(false);
    tidy.setXmlTags(false);
    tidy.setXmlOut(false);
    tidy.setQuiet(true);
    tidy.setShowWarnings(false);
    tidy.setTidyMark(false);
    byte[] bytes = "<html><a href=\"javascript:alert('\\'hi\\'');\">click</a></html>".getBytes();
    Document document = tidy.parseDOM(new ByteArrayInputStream(bytes), null);
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    tidy.pprint(document, output);
    System.out.println(new String(output.toByteArray()));
  }
}

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2976597&group_id=13153
|
From: SourceForge.net <no...@so...> - 2010-03-19 10:13:40
|
Bugs item #2973054, was opened at 2010-03-19 10:31
Message generated for change (Comment added) made by
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2973054&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: script within script problem

Initial Comment:
Jtidy isn't able to handle situations like:

<html>
<head>
<title>test title</title>
</head>
<body>
<script type="text/javascript">
<!--
document.write('<script></scr'+'ipt>');
//-->
</script>
</body>
</html>

The tidied result is:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net" name="generator"/>
<title>test title</title>
</head>
<body>
<script type="text/javascript">
<!--
document.write('<script></scr'+'ipt>');
//-->
</script>
</body>
</html>
</script>
</body>
</html>

Does anyone have a solution?

Thanks
Christophe

----------------------------------------------------------------------

Comment By: Christophe Scholly ()
Date: 2010-03-19 11:13

Message:
The problem seems to be in the getCDATA method of the Lexer class. The problem doesn't occur when you put an old version of getCDATA in the Lexer. The bug only occurs in the latest version of JTidy (938) because of the support for "scripts within scripts".

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2973054&group_id=13153
|
From: SourceForge.net <no...@so...> - 2010-03-19 09:31:21
|
Bugs item #2973054, was opened at 2010-03-19 09:31
Message generated for change (Tracker Item Submitted) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2973054&group_id=13153

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: script within script problem

Initial Comment:
Jtidy isn't able to handle situations like:

<html>
<head>
<title>test title</title>
</head>
<body>
<script type="text/javascript">
<!--
document.write('<script></scr'+'ipt>');
//-->
</script>
</body>
</html>

The tidied result is:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net" name="generator"/>
<title>test title</title>
</head>
<body>
<script type="text/javascript">
<!--
document.write('<script></scr'+'ipt>');
//-->
</script>
</body>
</html>
</script>
</body>
</html>

Does anyone have a solution?

Thanks
Christophe

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2973054&group_id=13153
|
From: SourceForge.net <no...@so...> - 2010-03-16 18:02:19
|
Bugs item #2922337, was opened at 2009-12-28 16:32 Message generated for change (Comment added) made by schierlm You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2922337&group_id=13153 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Closed Resolution: Fixed Priority: 5 Private: No Submitted By: Michael Schierl (schierlm) Assigned to: Adrian Sandor (aditsu) Summary: StringIndexOutOfBoundsException while lexing script content Initial Comment: JTidy version: jtidy-r938.jar Consider this example file: public class JTidyBug { public static void main(String[] args) throws Exception { final String SOURCE = "\n" + "<script>\n" + "var o={x=9};\n" + "var q=x<o.x;\n" + "</script>"; char[] padding = new char[8165]; java.util.Arrays.fill(padding, 'x'); String source = new String(padding)+SOURCE; org.w3c.tidy.Tidy tidy = new org.w3c.tidy.Tidy(); tidy.setShowWarnings(false); tidy.parse(new java.io.ByteArrayInputStream(source.getBytes("ISO-8859-1")), System.out); } } Expected result: A tidied HTML is output that is similar to that one produced when replacing the 8165 in the code above by 8160 Actual result: Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 8193 at java.lang.String.checkBounds(String.java:402) at java.lang.String.<init>(String.java:443) at org.w3c.tidy.TidyUtils.getString(Unknown Source) at org.w3c.tidy.Lexer.getCDATA(Unknown Source) at org.w3c.tidy.ParserImpl$ParseScript.parse(Unknown Source) at org.w3c.tidy.ParserImpl.parseTag(Unknown Source) at org.w3c.tidy.ParserImpl$ParseBody.parse(Unknown Source) at org.w3c.tidy.ParserImpl.parseTag(Unknown Source) at org.w3c.tidy.ParserImpl$ParseHTML.parse(Unknown Source) at org.w3c.tidy.ParserImpl.parseDocument(Unknown Source) at org.w3c.tidy.Tidy.parse(Unknown Source) at 
org.w3c.tidy.Tidy.parse(Unknown Source) at JTidyBug.main(JTidyBug.java:13) ---------------------------------------------------------------------- Comment By: Michael Schierl (schierlm) Date: 2010-03-16 19:02 Message: Hello Adrian, Thank you very much for your fast, polite and informative comment. We need more developers like you :-) I can understand your point (assuming that most calls of getString are in similar situations where some compare function that stops at a null byte was replaced). You might want to adjust the method comment and maybe rename the length parameter to max length, and maybe even check for embedded nulls and cut the string at the first one, as tmbstrncasecmp("foo\0bar", "foo\0foo", 7) will return 0 as the strings are the same up to the first 0 byte. I hope you have double-checked the other calls to getString, as this cutting might easily hide an obscure bug even deeper :) Some minor issue: the length() function and the tmbstrlen (which counts UTF-8 bytes) will return different values if non-ASCII-characters are involved; therefore a comparison that worked in C might fail in Java, as getString will convert too few bytes. But I understand this will really take time to fix properly, so just using the C version for international documents might be the better short-term solution :-) [I wrote a long comment about that including some real-world examples, but trying to send the comment ran into a server error, maybe because of some weird characters, so I am writing it for the second time now, without any examples. Sorry for that...] ---------------------------------------------------------------------- Comment By: Adrian Sandor (aditsu) Date: 2010-03-16 07:47 Message: cigaly: you're talking about the revision of a single file, Lexer.java, but the fix is not there. I agree with schierlm about one thing: the change in r1094 was in TidyUtils, not in Lexer. However, I fully disagree about the other comments. 
First of all, I would like to remind everybody that this is a port of Tidy (aka HTML Tidy), which was written in C. JTidy strives to be completely compatible with it. cigaly says the problem is in these lines of Lexer.java: matches = container.element.equalsIgnoreCase(TidyUtils.getString(lexbuf, start, container.element.length())); The corresponding lines of lexer code in Tidy are: matches = TY_(tmbstrncasecmp)(container->element, lexer->lexbuf + start, TY_(tmbstrlen)(container->element)) == 0; As you can see, it's a direct translation, except strncasecmp which is replaced by equalsIgnoreCase and TidyUtils.getString. And the error happened because TidyUtils.getString (in conjunction with equalsIgnoreCase) didn't work exactly like tmbstrncasecmp. Now, by changing the code to: matches = container.element.equalsIgnoreCase(TidyUtils.getString(lexbuf, start, lexsize - start - 1)); the meaning of the code will change and will be different from Tidy. That's already a big problem, however the greatest problem is that TidyUtils.getString remains unfixed, so it can still cause similar problems when it is called from (many) other places. Therefore, I see "lexsize - start - 1" as the "taping over" (I'm not even sure it works correctly), and changing TidyUtils.getString to stop at the end of the buffer (which is null-terminated in C) as the correct fix, and that is why I chose it. It wasn't a "let's silence this error" decision. There may be better ways to port tmbstrncasecmp, but at least this fix can make the current way work correctly. 
---------------------------------------------------------------------- Comment By: Michael Schierl (schierlm) Date: 2010-03-15 19:50 Message: The bug was not fixed by aditsu in r1094, it was just "taped over" (or "silenced") by making sure that getString will not throw SIOOBE any longer: http://jtidy.svn.sourceforge.net/viewvc/jtidy/trunk/jtidy/src/main/java/org/w3c/tidy/TidyUtils.java?r1=1094&r2=1093&pathrev=1094 When I saw this "fix" I knew that I'll have to migrate to some other HTML sanitizer than JTidy, at least for cases where the sanitizing happens automatically without anyone (except customers) looking at the output afterwards. (yes I know the NO WARRANTY clause of most open source licenses, so I will not complain about it). cigaly: If you managed to track down the root cause (I tried it but gave up after searching for a few hours), feel free to attach a patch to this bug. I will also test it and if it works fine with the "real" files that show that bug (with r1094 reversed) I will at least apply it to my private version, if aditsu does not want to apply it). I don't think forking is the right way to handle those issues, but if there is no alternative to it (r1094 is not, for me), I'll do my private fork. ---------------------------------------------------------------------- Comment By: cigaly () Date: 2010-03-15 19:29 Message: sorry, but this is source that I am seeing in Lexer.java from jtidy-r938-sources.zip downloaded from sourceforge and also in one that I can see in svn repository (e.g. http://jtidy.svn.sourceforge.net/viewvc/jtidy/trunk/jtidy/src/main/java/org/w3c/tidy/Lexer.java?view=log) ---------------------------------------------------------------------- Comment By: Adrian Sandor (aditsu) Date: 2010-03-15 14:17 Message: The bug was reported against revision 938 and you're complaining that it's not fixed in revision 927?!? That doesn't make any sense. Anyway, the bug was fixed in revision 1094. 
---------------------------------------------------------------------- Comment By: cigaly () Date: 2010-03-15 12:26 Message: It has not been fixed, at least not in revision 927 : java.lang.StringIndexOutOfBoundsException caught: java.lang.StringIndexOutOfBoundsException: String index out of range: 16385 at java.lang.String.checkBounds(String.java:401) at java.lang.String.<init>(String.java:442) at org.w3c.tidy.TidyUtils.getString(null:-1) at org.w3c.tidy.Lexer.getCDATA(null:-1) at org.w3c.tidy.ParserImpl$ParseScript.parse(null:-1) at org.w3c.tidy.ParserImpl.parseTag(null:-1) at org.w3c.tidy.ParserImpl$ParseBody.parse(null:-1) at org.w3c.tidy.ParserImpl.parseTag(null:-1) at org.w3c.tidy.ParserImpl$ParseHTML.parse(null:-1) at org.w3c.tidy.ParserImpl.parseDocument(null:-1) at org.w3c.tidy.Tidy.parse(null:-1) at org.w3c.tidy.Tidy.parse(null:-1) at org.w3c.tidy.Tidy.parseDOM(null:-1) Problem is caused by matches = container.element.equalsIgnoreCase(TidyUtils.getString(lexbuf, start, container.element.length())); in situations when start + container.element.length() > lexlength To fix that problem above lines should be replaced by matches = container.element.equalsIgnoreCase(TidyUtils.getString(lexbuf, start, lexsize - start - 1)); both in block handling state CDATA_STARTTAG and CDATA_ENDTAG ---------------------------------------------------------------------- Comment By: Adrian Sandor (aditsu) Date: 2010-01-04 06:03 Message: Fixed in svn, thanks. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2922337&group_id=13153 |
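The two points in the latest comment on #2922337 (a C-style bounded comparison stops at the first null byte, and a character count can differ from a UTF-8 byte count) can be illustrated with a small standalone sketch. The class `StrncasecmpSketch` and its `strncasecmp` method are hypothetical names introduced here for illustration; they are not part of JTidy or Tidy, and the ASCII-only case folding is an assumption rather than a statement about how `tmbstrncasecmp` is implemented.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; not JTidy code.
class StrncasecmpSketch {

    // Compares at most n bytes, ignoring ASCII case, and stops at the first
    // null byte. Reading past the end of either array is treated as reading
    // a null terminator, so no index exception can occur.
    public static int strncasecmp(byte[] a, int aOff, byte[] b, int bOff, int n) {
        for (int i = 0; i < n; i++) {
            int ca = byteAt(a, aOff + i);
            int cb = byteAt(b, bOff + i);
            int diff = lower(ca) - lower(cb);
            if (diff != 0) {
                return diff;
            }
            if (ca == 0) {
                return 0; // both strings ended at the same position
            }
        }
        return 0;
    }

    private static int byteAt(byte[] buf, int i) {
        return (i >= 0 && i < buf.length) ? (buf[i] & 0xFF) : 0;
    }

    private static int lower(int c) {
        return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
    }

    public static void main(String[] args) {
        byte[] x = "foo\0bar".getBytes(StandardCharsets.ISO_8859_1);
        byte[] y = "foo\0foo".getBytes(StandardCharsets.ISO_8859_1);
        // The comparison stops at the embedded null, so the result is 0.
        System.out.println(strncasecmp(x, 0, y, 0, 7));

        // One character, but two bytes once encoded as UTF-8.
        String s = "\u00e9";
        System.out.println(s.length());
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Comparing bytes directly, as sketched here, avoids decoding a fixed number of bytes into a String first, which is where a character length and a byte length can disagree.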
From: SourceForge.net <no...@so...> - 2010-03-16 06:47:21
|
Bugs item #2922337, was opened at 2009-12-28 23:32 Message generated for change (Comment added) made by aditsu You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2922337&group_id=13153 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Closed Resolution: Fixed Priority: 5 Private: No Submitted By: Michael Schierl (schierlm) Assigned to: Adrian Sandor (aditsu) Summary: StringIndexOutOfBoundsException while lexing script content Initial Comment: JTidy version: jtidy-r938.jar Consider this example file: public class JTidyBug { public static void main(String[] args) throws Exception { final String SOURCE = "\n" + "<script>\n" + "var o={x=9};\n" + "var q=x<o.x;\n" + "</script>"; char[] padding = new char[8165]; java.util.Arrays.fill(padding, 'x'); String source = new String(padding)+SOURCE; org.w3c.tidy.Tidy tidy = new org.w3c.tidy.Tidy(); tidy.setShowWarnings(false); tidy.parse(new java.io.ByteArrayInputStream(source.getBytes("ISO-8859-1")), System.out); } } Expected result: A tidied HTML is output that is similar to that one produced when replacing the 8165 in the code above by 8160 Actual result: Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 8193 at java.lang.String.checkBounds(String.java:402) at java.lang.String.<init>(String.java:443) at org.w3c.tidy.TidyUtils.getString(Unknown Source) at org.w3c.tidy.Lexer.getCDATA(Unknown Source) at org.w3c.tidy.ParserImpl$ParseScript.parse(Unknown Source) at org.w3c.tidy.ParserImpl.parseTag(Unknown Source) at org.w3c.tidy.ParserImpl$ParseBody.parse(Unknown Source) at org.w3c.tidy.ParserImpl.parseTag(Unknown Source) at org.w3c.tidy.ParserImpl$ParseHTML.parse(Unknown Source) at org.w3c.tidy.ParserImpl.parseDocument(Unknown Source) at org.w3c.tidy.Tidy.parse(Unknown Source) at 
org.w3c.tidy.Tidy.parse(Unknown Source) at JTidyBug.main(JTidyBug.java:13) ---------------------------------------------------------------------- >Comment By: Adrian Sandor (aditsu) Date: 2010-03-16 14:47 Message: cigaly: you're talking about the revision of a single file, Lexer.java, but the fix is not there. I agree with schierlm about one thing: the change in r1094 was in TidyUtils, not in Lexer. However, I fully disagree about the other comments. First of all, I would like to remind everybody that this is a port of Tidy (aka HTML Tidy), which was written in C. JTidy strives to be completely compatible with it. cigaly says the problem is in these lines of Lexer.java: matches = container.element.equalsIgnoreCase(TidyUtils.getString(lexbuf, start, container.element.length())); The corresponding lines of lexer code in Tidy are: matches = TY_(tmbstrncasecmp)(container->element, lexer->lexbuf + start, TY_(tmbstrlen)(container->element)) == 0; As you can see, it's a direct translation, except strncasecmp which is replaced by equalsIgnoreCase and TidyUtils.getString. And the error happened because TidyUtils.getString (in conjunction with equalsIgnoreCase) didn't work exactly like tmbstrncasecmp. Now, by changing the code to: matches = container.element.equalsIgnoreCase(TidyUtils.getString(lexbuf, start, lexsize - start - 1)); the meaning of the code will change and will be different from Tidy. That's already a big problem, however the greatest problem is that TidyUtils.getString remains unfixed, so it can still cause similar problems when it is called from (many) other places. Therefore, I see "lexsize - start - 1" as the "taping over" (I'm not even sure it works correctly), and changing TidyUtils.getString to stop at the end of the buffer (which is null-terminated in C) as the correct fix, and that is why I chose it. It wasn't a "let's silence this error" decision. 
There may be better ways to port tmbstrncasecmp, but at least this fix can make the current way work correctly. ---------------------------------------------------------------------- Comment By: Michael Schierl (schierlm) Date: 2010-03-16 02:50 Message: The bug was not fixed by aditsu in r1094, it was just "taped over" (or "silenced") by making sure that getString will not throw SIOOBE any longer: http://jtidy.svn.sourceforge.net/viewvc/jtidy/trunk/jtidy/src/main/java/org/w3c/tidy/TidyUtils.java?r1=1094&r2=1093&pathrev=1094 When I saw this "fix" I knew that I'll have to migrate to some other HTML sanitizer than JTidy, at least for cases where the sanitizing happens automatically without anyone (except customers) looking at the output afterwards. (yes I know the NO WARRANTY clause of most open source licenses, so I will not complain about it). cigaly: If you managed to track down the root cause (I tried it but gave up after searching for a few hours), feel free to attach a patch to this bug. I will also test it and if it works fine with the "real" files that show that bug (with r1094 reversed) I will at least apply it to my private version, if aditsu does not want to apply it). I don't think forking is the right way to handle those issues, but if there is no alternative to it (r1094 is not, for me), I'll do my private fork. ---------------------------------------------------------------------- Comment By: cigaly () Date: 2010-03-16 02:29 Message: sorry, but this is source that I am seeing in Lexer.java from jtidy-r938-sources.zip downloaded from sourceforge and also in one that I can see in svn repository (e.g. http://jtidy.svn.sourceforge.net/viewvc/jtidy/trunk/jtidy/src/main/java/org/w3c/tidy/Lexer.java?view=log) ---------------------------------------------------------------------- Comment By: Adrian Sandor (aditsu) Date: 2010-03-15 21:17 Message: The bug was reported against revision 938 and you're complaining that it's not fixed in revision 927?!? 
That doesn't make any sense. Anyway, the bug was fixed in revision 1094. ---------------------------------------------------------------------- Comment By: cigaly () Date: 2010-03-15 19:26 Message: It has not been fixed, at least not in revision 927 : java.lang.StringIndexOutOfBoundsException caught: java.lang.StringIndexOutOfBoundsException: String index out of range: 16385 at java.lang.String.checkBounds(String.java:401) at java.lang.String.<init>(String.java:442) at org.w3c.tidy.TidyUtils.getString(null:-1) at org.w3c.tidy.Lexer.getCDATA(null:-1) at org.w3c.tidy.ParserImpl$ParseScript.parse(null:-1) at org.w3c.tidy.ParserImpl.parseTag(null:-1) at org.w3c.tidy.ParserImpl$ParseBody.parse(null:-1) at org.w3c.tidy.ParserImpl.parseTag(null:-1) at org.w3c.tidy.ParserImpl$ParseHTML.parse(null:-1) at org.w3c.tidy.ParserImpl.parseDocument(null:-1) at org.w3c.tidy.Tidy.parse(null:-1) at org.w3c.tidy.Tidy.parse(null:-1) at org.w3c.tidy.Tidy.parseDOM(null:-1) Problem is caused by matches = container.element.equalsIgnoreCase(TidyUtils.getString(lexbuf, start, container.element.length())); in situations when start + container.element.length() > lexlength To fix that problem above lines should be replaced by matches = container.element.equalsIgnoreCase(TidyUtils.getString(lexbuf, start, lexsize - start - 1)); both in block handling state CDATA_STARTTAG and CDATA_ENDTAG ---------------------------------------------------------------------- Comment By: Adrian Sandor (aditsu) Date: 2010-01-04 13:03 Message: Fixed in svn, thanks. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=113153&aid=2922337&group_id=13153 |
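The kind of fix described in this thread, making the string extraction stop at the end of the buffer instead of throwing, can be sketched as follows. This is not the actual r1094 change: `BoundedGetStringSketch` and its `getString` are hypothetical stand-ins written for illustration, and decoding the buffer as UTF-8 is an assumption.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; not the code committed to JTidy.
class BoundedGetStringSketch {

    // Decodes at most `length` bytes starting at `offset`, clamped to the end
    // of the buffer, so a request that runs past the end yields a shorter
    // string instead of a StringIndexOutOfBoundsException.
    public static String getString(byte[] buf, int offset, int length) {
        if (offset < 0 || offset >= buf.length || length <= 0) {
            return "";
        }
        int n = Math.min(length, buf.length - offset);
        return new String(buf, offset, n, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The buffer ends in the middle of a tag name.
        byte[] lexbuf = "x</scr".getBytes(StandardCharsets.UTF_8);
        String element = "script";

        // An unclamped new String(lexbuf, 3, 6, ...) would throw here.
        String candidate = getString(lexbuf, 3, element.length());
        System.out.println(candidate);
        System.out.println(element.equalsIgnoreCase(candidate));
    }
}
```

With the clamp in place, the call site keeps the shape of the comparison quoted above (requesting `container.element.length()` bytes), and a truncated buffer simply fails to match rather than raising an exception.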