jtidy-devel Mailing List for JTidy

Brought to you by: aditsu, atripp, fgiust, garypeskin, and 4 others

jtidy-devel — JTidy mailing list for developers


[Jtidy-devel] [ jtidy-Bugs-3532726 ] Non-breaking space in HEAD rejected

From: SourceForge.net <no...@so...> - 2012-06-08 09:47:00

Bugs item #3532726, was opened at 2012-06-07 03:44
Message generated for change (Comment added) made by kriegaex
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3532726&group_id=13153

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Alexander Kriegisch (kriegaex)
Assigned to: Nobody/Anonymous (nobody)
Summary: Non-breaking space in HEAD rejected

Initial Comment:
In some web pages I see buggy HTML code like this:

      &nbsp; &nbsp; &nbsp;                  <script type="text/javascript" src="common/js/prototype.js"></script>
      &nbsp; &nbsp; &nbsp;                  <script src="common/js/scriptaculous.js?load=effects,builder"></script>
      &nbsp; &nbsp; &nbsp;                  <script type="text/javascript" src="common/js/lightbox.js"></script><link rel="prev" href="apps_06_004.html">

The result is a warning (plain text not allowed in HEAD elements), and the SCRIPT tags are ignored, i.e. removed from the output. This makes the filtered page unusable because I need those scripts.

How to fix: Make JTidy more tolerant by just ignoring non-breaking spaces in HEAD sections or treating them like regular whitespace, parsing the rest of the line correctly.

----------------------------------------------------------------------

>Comment By: Alexander Kriegisch (kriegaex)
Date: 2012-06-08 02:47

Message:
The uploaded test case is basically the same as for bug #3532720, but while
there I stripped the bogus text nodes from HEAD, here I left them in
the file so we have a real-world test case.

The uploaded files are part of a download from Galileo press (Galileo
Openbook about iPhone development). It is freely available for download, so
no worries there. I ran JTidy to clean up the HTML code and bumped into the
problem that HEAD parsing stopped after the first text node was found, thus
all following SCRIPT tags are non-existent in JTidy's output. My patch
fixes this problem.

----------------------------------------------------------------------

Comment By: Alexander Kriegisch (kriegaex)
Date: 2012-06-08 02:37

Message:
The attached patch fixes the problem for me. Instead of stopping HEAD
parsing whenever a text node is found, text tokens are now just
ignored and parsing continues. This has the additional advantage that
not only is legal HTML after the problematic text node parsed correctly,
but for multiple occurrences of illegal text nodes within HEAD a
warning is printed for each location, which helps debug bogus HEAD
sections.

Maybe there is a better and cleaner way to do this, but as I said, it works
for me.
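
The idea described in this comment can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the attached patch and not JTidy's actual ParserImpl code: the class name HeadTextSkipSketch, the method scanHead, and the rule "a token that does not start with '<' is text" are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a HEAD token loop; a sketch, not JTidy's actual ParserImpl code.
final class HeadTextSkipSketch {

    // Assumption for this sketch: a token starting with '<' is a tag,
    // anything else is a text run (for example decoded &nbsp; characters).
    static boolean isText(String token) {
        return !token.startsWith("<");
    }

    // Returns the tags kept for output and appends one warning per text run seen.
    // stopAtText=true mimics the reported behaviour, false mimics the patched one.
    static List<String> scanHead(List<String> tokens, boolean stopAtText, List<String> warnings) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            if (isText(token)) {
                warnings.add("plain text not allowed in HEAD (token " + i + ")");
                if (stopAtText) {
                    break;    // reported behaviour: all later tags are lost
                }
                continue;     // patched behaviour: drop the text, keep parsing
            }
            kept.add(token);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> head = Arrays.asList(
            "\u00a0 \u00a0 \u00a0",
            "<script src=\"common/js/prototype.js\">",
            "\u00a0 \u00a0 \u00a0",
            "<script src=\"common/js/lightbox.js\">");
        List<String> warnings = new ArrayList<>();
        System.out.println(scanHead(head, true, new ArrayList<>()));
        System.out.println(scanHead(head, false, warnings));
        System.out.println(warnings.size() + " warnings");
    }
}
```

With stopAtText set to false, both SCRIPT tags survive and each stray text run produces its own warning, which matches the behaviour the comment describes.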

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3532726&group_id=13153

[Jtidy-devel] [ jtidy-Bugs-3532720 ] BR within PRE rendered with additional linefeeds

From: SourceForge.net <no...@so...> - 2012-06-07 21:37:16

Bugs item #3532720, was opened at 2012-06-07 03:35
Message generated for change (Comment added) made by kriegaex
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3532720&group_id=13153

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Alexander Kriegisch (kriegaex)
Assigned to: Nobody/Anonymous (nobody)
Summary: BR within PRE rendered with additional linefeeds

Initial Comment:
Sometimes in the wild there is HTML code with PRE sections containing BR tags instead of linefeeds, like so:

<pre>first line<br>second line<br>third line</pre>

JTidy's pretty-printer renders it like this ("break-before-br" is false):

<pre>
first line<br>
second line<br>
third line
</pre>

The result is that a browser renders two linefeeds where just one should exist, causing ugly empty lines in the output. The problem gets worse if for some reason I use multiple passes of JTidy, adding more and more linefeeds.

How to fix: never ever add newlines after BR tags inside PRE sections.
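
The proposed rule can be sketched with a toy printer. This is a minimal illustration under assumptions, not JTidy's pretty-printer: the class PreBreakSketch, the method print, and the pre-split token array are hypothetical, and only the newline after BR is modelled (not the newlines JTidy adds after the opening and before the closing PRE tag).

```java
// Toy printer that only models the "newline after <br>" decision.
final class PreBreakSketch {

    // Concatenates tokens; a newline is added after each <br>, except inside
    // <pre> when keepPreIntact is true (the rule proposed in the report).
    static String print(String[] tokens, boolean keepPreIntact) {
        StringBuilder out = new StringBuilder();
        int preDepth = 0;
        for (String token : tokens) {
            if (token.equals("<pre>")) {
                preDepth++;
            } else if (token.equals("</pre>")) {
                preDepth = Math.max(0, preDepth - 1);
            }
            out.append(token);
            boolean insidePre = preDepth > 0;
            if (token.equals("<br>") && !(keepPreIntact && insidePre)) {
                out.append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"<pre>", "first line", "<br>", "second line", "<br>", "third line", "</pre>"};
        System.out.println(print(tokens, false)); // extra newlines after each <br>
        System.out.println(print(tokens, true));  // PRE content left on one line
    }
}
```

Because the guarded variant adds nothing inside PRE, running it repeatedly on its own output cannot accumulate linefeeds there.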

----------------------------------------------------------------------

>Comment By: Alexander Kriegisch (kriegaex)
Date: 2012-06-07 14:37

Message:
The uploaded test case is part of a download from Galileo press (Galileo
Openbook about iPhone development). It is freely available for download, so
no worries there. I ran JTidy to clean up the HTML code and bumped into the
problem with unwanted blank lines in the PRE sections. Just search the HTML
file for <pre class="prettyprint"> and compare the corresponding sections in
the original HTML and in the result generated by JTidy-r938. It looks wrong
without my patch and correct with my patch.

----------------------------------------------------------------------

Comment By: Alexander Kriegisch (kriegaex)
Date: 2012-06-07 14:23

Message:
I just uploaded my humble attempt to fix the problem. Maybe it is a bit hacky
and not the optimal solution; sorry, this was my first look into your code
and I have not been programming for a while. But at least locally it solves
my problem and might be beneficial for other users, too.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3532720&group_id=13153

[Jtidy-devel] [ jtidy-Bugs-3532831 ] Non-breaking space in HEAD rejected

From: SourceForge.net <no...@so...> - 2012-06-07 18:22:03

Bugs item #3532831, was opened at 2012-06-07 08:16
Message generated for change (Comment added) made by kriegaex
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3532831&group_id=13153

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Deleted
Resolution: None
Priority: 5
Private: No
Submitted By: Alexander Kriegisch (kriegaex)
Assigned to: Nobody/Anonymous (nobody)
Summary: Non-breaking space in HEAD rejected

Initial Comment:
In some web pages I see buggy HTML code like this:

      &nbsp; &nbsp; &nbsp;                  <script type="text/javascript" src="common/js/prototype.js"></script>
      &nbsp; &nbsp; &nbsp;                  <script src="common/js/scriptaculous.js?load=effects,builder"></script>
      &nbsp; &nbsp; &nbsp;                  <script type="text/javascript" src="common/js/lightbox.js"></script><link rel="prev" href="apps_06_004.html">

The result is a warning (plain text not allowed in HEAD elements), and the SCRIPT tags are ignored, i.e. removed from the output. This makes the filtered page unusable because I need those scripts.

How to fix: Make JTidy more tolerant by just ignoring non-breaking spaces in HEAD sections or treating them like regular whitespace, parsing the rest of the line correctly.

----------------------------------------------------------------------

>Comment By: Alexander Kriegisch (kriegaex)
Date: 2012-06-07 11:22

Message:
Sorry, somehow a page reload created this bug twice. The older version is
the right one.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3532831&group_id=13153

[Jtidy-devel] [ jtidy-Bugs-3406215 ] replacing unexpected <h2> by </h2>. But it is valid h2 tag.

From: SourceForge.net <no...@so...> - 2012-05-04 20:40:07

Bugs item #3406215, was opened at 2011-09-08 05:48
Message generated for change (Comment added) made by martinkurz
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3406215&group_id=13153

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Rajesh Kumar (rajeshkumarp)
Assigned to: Nobody/Anonymous (nobody)
Summary: replacing unexpected <h2> by </h2>. But it is valid h2 tag.

Initial Comment:
JTidy replaces <h2> with </h2> when I use nested <h2> tags.
Here is my HTML code:
<h2 >Test Content
    <h2 >Test Content</h2>
</h2>

I get this warning:
Warning: replacing unexpected <h2> by </h2>


Please help me to fix this issue.

----------------------------------------------------------------------

Comment By: Martin (martinkurz)
Date: 2012-05-04 13:40

Message:
According to http://www.w3.org/TR/html4/struct/global.html#h-7.5.5, only
inline elements are allowed inside headings, so h2 isn't allowed inside
another h2 (or any other heading). So JTidy finds an error and tries to fix
it.
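
The rule cited here can be checked before handing markup to JTidy. The following is a rough sketch, assuming a regex scan over heading tags is good enough for a pre-check; it is not an HTML parser and not how JTidy detects the problem. The class HeadingNestingSketch and the method hasNestedHeading are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough pre-check for headings nested inside headings (regex scan, not a parser).
final class HeadingNestingSketch {

    // Matches <h1>..<h6> start tags and </h1>..</h6> end tags, with optional attributes.
    private static final Pattern HEADING_TAG =
        Pattern.compile("<(/?)(h[1-6])\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns true if a heading start tag appears while another heading is still open.
    static boolean hasNestedHeading(String html) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = HEADING_TAG.matcher(html);
        while (m.find()) {
            boolean isEndTag = !m.group(1).isEmpty();
            if (isEndTag) {
                if (!open.isEmpty()) {
                    open.pop();
                }
            } else {
                if (!open.isEmpty()) {
                    return true;   // heading inside heading: not allowed in HTML 4
                }
                open.push(m.group(2).toLowerCase());
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String reported = "<h2 >Test Content\n    <h2 >Test Content</h2>\n</h2>";
        System.out.println(hasNestedHeading(reported));
        System.out.println(hasNestedHeading("<h2>One</h2><h2>Two</h2>"));
    }
}
```

Applied to the markup from the report, the check flags the inner h2, which is the construct JTidy warns about.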

----------------------------------------------------------------------

Comment By: Adam A. Koch (aakoch)
Date: 2012-05-04 12:41

Message:
I'm not sure nesting h2 elements is allowed. That's probably why it is
failing.

----------------------------------------------------------------------

Comment By: Rajesh Kumar (rajeshkumarp)
Date: 2011-09-08 05:53

Message:
Note : I am using jtidy-r938 

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3406215&group_id=13153

[Jtidy-devel] [ jtidy-Bugs-3432258 ] Unwrapped inline content means invalid XHTML is generated

From: SourceForge.net <no...@so...> - 2011-11-02 16:51:25

Bugs item #3432258, was opened at 2011-11-02 12:56
Message generated for change (Comment added) made by helsom
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3432258&group_id=13153

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Hel (helsom)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unwrapped inline content means invalid XHTML is generated

Initial Comment:
Using jtidy.parseDOM with setXHTML(true) and setEncloseBlockText(true) does not cause inline content to be properly wrapped, and hence W3C validation fails.

Example HTML 1 (generates valid XHTML)
"Text <em>Inline content</em>" -> "<p>Text <em>Inline content</em></p>"

Example HTML 2 (generates invalid XHTML)
"<em>Inline content</em>" -> "<em>Inline content</em>"

There is code within src/main/java/org/w3c/tidy/ParserImpl.java that performs this wrapping, but it has been commented out due to bug report 1403105: java.lang.StackOverflowError in Tidy.parseDOM(). Uncommenting this block of code seems to produce correctly wrapped XHTML in most situations, but unfortunately the stack overflow error still happens if the HTML mentioned in report 1403105 is supplied. Is there any way this can be reinstated without causing the stack overflow?

----------------------------------------------------------------------

>Comment By: Hel (helsom)
Date: 2011-11-02 16:51

Message:
Adding code very similar to the TEXT_NODE encloseBodyText processing (about
line 799 within ParserImpl.java) for an inline element (at about line 934)
seems to result in inline content within the body being properly wrapped,
though it hasn't had extensive testing and there may be a better way.

That is,

                if (node.type == Node.START_TAG || node.type == Node.START_END_TAG)
                {
                    // inline element found directly inside BODY
                    if ((node.tag.model & Dict.CM_INLINE) != 0)
                    {
                        if (lexer.configuration.encloseBodyText)
                        {
                            Node para;

                            // push the token back and parse it again
                            // as a child of an inferred <p>
                            lexer.ungetToken();
                            para = lexer.inferredTag("p");
                            body.insertNodeAtEnd(para);
                            parseTag(lexer, para, mode);
                            mode = Lexer.MIXED_CONTENT;
                            continue;
                        }
                    }

                   ...

----------------------------------------------------------------------

Comment By: Hel (helsom)
Date: 2011-11-02 15:33

Message:
Even with the mentioned code being re-instated, this only resolves wrapping
of inline content within a blockquote, for example, and not at the top
level within the body element.

For Example:

"<em>ssss</em> <blockquote><em>Inline content</em></blockquote>"
generates xhtml:
"<em>ssss</em> <blockquote> <p><em>Inline content</em></p> </blockquote>"

Note the initial <em> does not get wrapped with a p element. If I place
some text in front of it, however, it does get wrapped.

For Example:

"xxxx <em>ssss</em> <blockquote><em>Inline content</em></blockquote>"
generates xhtml:
"<p>xxxx <em>ssss</em></p> <blockquote> <p><em>Inline content</em></p>
</blockquote>"
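
The expected result for the first example (the leading em wrapped as well) can be sketched independently of JTidy. This is a toy model under assumptions: the class EncloseInlineSketch, the method wrapTopLevel, the small INLINE set, and the pre-split list of top-level BODY children are all hypothetical and only illustrate the grouping rule.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model: wrap runs of top-level text and inline elements in <p>.
final class EncloseInlineSketch {

    // Deliberately small, illustrative subset of inline element names.
    private static final Set<String> INLINE =
        new HashSet<>(Arrays.asList("a", "b", "em", "i", "span", "strong"));
    private static final Pattern START_TAG = Pattern.compile("^<([A-Za-z][A-Za-z0-9]*)");

    // Text nodes and inline elements may share a paragraph; everything else is block-level.
    static boolean isInlineOrText(String node) {
        Matcher m = START_TAG.matcher(node);
        if (!m.find()) {
            return true; // no leading tag: treat as text
        }
        return INLINE.contains(m.group(1).toLowerCase());
    }

    // Groups consecutive inline/text children into one <p>, leaving block children as-is.
    static String wrapTopLevel(List<String> bodyChildren) {
        StringBuilder out = new StringBuilder();
        StringBuilder run = new StringBuilder();
        for (String node : bodyChildren) {
            if (isInlineOrText(node)) {
                run.append(node);
            } else {
                flush(run, out);
                out.append(node);
            }
        }
        flush(run, out);
        return out.toString();
    }

    // Emits the pending inline run as a paragraph, skipping whitespace-only runs.
    private static void flush(StringBuilder run, StringBuilder out) {
        String text = run.toString().trim();
        if (!text.isEmpty()) {
            out.append("<p>").append(text).append("</p>");
        }
        run.setLength(0);
    }

    public static void main(String[] args) {
        System.out.println(wrapTopLevel(Arrays.asList(
            "<em>ssss</em>", " ", "<blockquote><p><em>Inline content</em></p></blockquote>")));
    }
}
```

In this model a run that starts with an inline element is handled exactly like one that starts with text, which is the behaviour the two examples above show JTidy lacking at the top level of BODY.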


----------------------------------------------------------------------

Comment By: Hel (helsom)
Date: 2011-11-02 14:08

Message:
Update: This is not an XHTML-specific problem. Incorrectly wrapped content
also fails HTML 4.01 Strict validation. It seems a shame to lose this
important functionality because of what seems to be quite an obscure bug
(1403105).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3432258&group_id=13153

[Jtidy-devel] [ jtidy-Bugs-3432258 ] Unwrapped inline content means invalid XHTML is generated

From: SourceForge.net <no...@so...> - 2011-11-02 15:33:14

Bugs item #3432258, was opened at 2011-11-02 12:56
Message generated for change (Comment added) made by helsom
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3432258&group_id=13153

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Hel (helsom)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unwrapped inline content means invalid XHTML is generated

Initial Comment:
Using jtidy.parseDOM with setXHTML(true) and setEncloseBlockText(true) does not cause inline content to be properly wrapped and hence W3c validation fails.

Example HTML 1 (generates valid XHTML)
"Text <em>Inline content</em>" -> "<p>Text <em>Inline content</em></p>"

Example HTML 2 (generates invalid XHTML)
"<em>Inline content</em>" -> "<em>Inline content</em>"

There is code within src/main/java/org/w3c/tidy/ParserImpl.java that performs this wrapping but it has been commented out due to bug report 1403105 : java.lang.StackOverflowError in Tidy.parseDOM(). Uncommenting this block of code seems to produce correctly wrapped XHTML in most situations, but unfortunately the stack over flow error still happens if the HTML mentioned in report 1403105 is supplied. Anyway that this can be reinstated without causing the stack over flow?

----------------------------------------------------------------------

>Comment By: Hel (helsom)
Date: 2011-11-02 15:33

Message:
Even with the mentioned code being re-instated, this only resolves wrapping
of inline content within a blockquote, for example, and not at the top
level within the body element.

For Example:

"<em>ssss</em> <blockquote><em>Inline content</em></blockquote>"
generates xhtml:
"<em>ssss</em> <blockquote> <p><em>Inline content</em></p> </blockquote>"

Note the initial <em> does not get wrapped with a p element. If I place
some text in front of it, however, it does get wrapped.

For Example:

"xxxx <em>ssss</em> <blockquote><em>Inline content</em></blockquote>"
generates xhtml:
"<p>xxxx <em>ssss</em></p> <blockquote> <p><em>Inline content</em></p>
</blockquote>"


----------------------------------------------------------------------

Comment By: Hel (helsom)
Date: 2011-11-02 14:08

Message:
Update: This is not an xhtml-specific problem. Incorrectly wrapped content
also fails HTML 4.01 Strict validation. It seems a shame to lose this
important functionality because of what seems to be quite an obsure bug
(1403105).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3432258&group_id=13153

[Jtidy-devel] [ jtidy-Bugs-3432258 ] Unwrapped inline content means invalid XHTML is generated

From: SourceForge.net <no...@so...> - 2011-11-02 14:08:34

Bugs item #3432258, was opened at 2011-11-02 12:56
Message generated for change (Comment added) made by helsom
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3432258&group_id=13153

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Hel (helsom)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unwrapped inline content means invalid XHTML is generated

Initial Comment:
Using jtidy.parseDOM with setXHTML(true) and setEncloseBlockText(true) does not cause inline content to be properly wrapped and hence W3c validation fails.

Example HTML 1 (generates valid XHTML)
"Text <em>Inline content</em>" -> "<p>Text <em>Inline content</em></p>"

Example HTML 2 (generates invalid XHTML)
"<em>Inline content</em>" -> "<em>Inline content</em>"

There is code within src/main/java/org/w3c/tidy/ParserImpl.java that performs this wrapping, but it has been commented out due to bug report 1403105: java.lang.StackOverflowError in Tidy.parseDOM(). Uncommenting this block of code seems to produce correctly wrapped XHTML in most situations, but unfortunately the stack overflow error still happens if the HTML mentioned in report 1403105 is supplied. Is there any way that this can be reinstated without causing the stack overflow?

----------------------------------------------------------------------

>Comment By: Hel (helsom)
Date: 2011-11-02 14:08

Message:
Update: This is not an XHTML-specific problem. Incorrectly wrapped content
also fails HTML 4.01 Strict validation. It seems a shame to lose this
important functionality because of what seems to be quite an obscure bug
(1403105).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3432258&group_id=13153

[Jtidy-devel] [ jtidy-Bugs-3432258 ] Unwrapped inline content means invalid XHTML is generated

From: SourceForge.net <no...@so...> - 2011-11-02 12:56:52

Bugs item #3432258, was opened at 2011-11-02 12:56
Message generated for change (Tracker Item Submitted) made by helsom
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3432258&group_id=13153

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Hel (helsom)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unwrapped inline content means invalid XHTML is generated

Initial Comment:
Using jtidy.parseDOM with setXHTML(true) and setEncloseBlockText(true) does not cause inline content to be properly wrapped, and hence W3C validation fails.

Example HTML 1 (generates valid XHTML)
"Text <em>Inline content</em>" -> "<p>Text <em>Inline content</em></p>"

Example HTML 2 (generates invalid XHTML)
"<em>Inline content</em>" -> "<em>Inline content</em>"

There is code within src/main/java/org/w3c/tidy/ParserImpl.java that performs this wrapping, but it has been commented out due to bug report 1403105: java.lang.StackOverflowError in Tidy.parseDOM(). Uncommenting this block of code seems to produce correctly wrapped XHTML in most situations, but unfortunately the stack overflow error still happens if the HTML mentioned in report 1403105 is supplied. Is there any way that this can be reinstated without causing the stack overflow?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3432258&group_id=13153

[Jtidy-devel] [ jtidy-Bugs-3419740 ] StringIndexOutOfBoundsException when lexing a web page

From: SourceForge.net <no...@so...> - 2011-10-06 17:19:21

Bugs item #3419740, was opened at 2011-10-06 11:19
Message generated for change (Tracker Item Submitted) made by kd7ike
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3419740&group_id=13153

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Kim Ebert (kd7ike)
Assigned to: Nobody/Anonymous (nobody)
Summary: StringIndexOutOfBoundsException when lexing a web page

Initial Comment:
Possible fix for lexer bug.


=== modified file 'jtidy/src/main/java/org/w3c/tidy/Lexer.java'
--- jtidy/src/main/java/org/w3c/tidy/Lexer.java	2010-05-06 23:18:10 +0000
+++ jtidy/src/main/java/org/w3c/tidy/Lexer.java	2010-11-02 02:18:59 +0000
@@ -1821,7 +1821,12 @@
             	if (TidyUtils.isLetter((char) c)) {
                      continue;
             	}
-            	matches = container.element.equalsIgnoreCase(TidyUtils.getString(lexbuf, start,
+            	/* Fix for bug #991  */
+            	if ((start + container.element.length()) > lexsize)
+            		matches = false;
+            	/* End Fix */
+            	else
+            		matches = container.element.equalsIgnoreCase(TidyUtils.getString(lexbuf, start,
             			container.element.length()));
             	if (matches) {
             		nested++;
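
The guard in the patch above can be looked at in isolation. What follows is a minimal, self-contained sketch and not JTidy code: the class LexBufferMatch and the method matchesAt are hypothetical names, and the buffer is decoded as ISO-8859-1 purely for illustration, since the patch does not show how TidyUtils.getString decodes bytes. It only demonstrates the idea of comparing start + length against the number of valid bytes before building the string to compare.

```java
import java.nio.charset.StandardCharsets;

public class LexBufferMatch {
    /**
     * Returns true if the bytes at [start, start + name.length()) spell
     * name, ignoring case. Returns false instead of throwing when that
     * range runs past the valid bytes in the buffer.
     * Assumes name is plain ASCII, so one char corresponds to one byte.
     */
    public static boolean matchesAt(byte[] lexbuf, int lexsize, int start, String name) {
        int len = name.length();
        // Bounds guard: never read beyond the bytes that were actually lexed.
        if (start < 0 || start + len > lexsize) {
            return false;
        }
        String candidate = new String(lexbuf, start, len, StandardCharsets.ISO_8859_1);
        return name.equalsIgnoreCase(candidate);
    }

    public static void main(String[] args) {
        byte[] buf = "</SCRIPT".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(matchesAt(buf, buf.length, 2, "script")); // in range
        System.out.println(matchesAt(buf, buf.length, 5, "script")); // range too long
    }
}
```

With the guard in place, a name that would extend past the end of the lexed data is simply reported as a non-match.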


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3419740&group_id=13153

[Jtidy-devel] [ jtidy-Bugs-3406215 ] replacing unexpected <h2> by </h2>. But it is valid h2 tag.

From: SourceForge.net <no...@so...> - 2011-09-09 05:11:17

Bugs item #3406215, was opened at 2011-09-08 18:18
Message generated for change (Settings changed) made by rajeshkumarp
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3406215&group_id=13153

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Rajesh Kumar (rajeshkumarp)
Assigned to: Nobody/Anonymous (nobody)
>Summary: replacing unexpected <h2> by </h2>. But it is valid h2 tag.

Initial Comment:
JTidy replaces <h2> with </h2> when <h2> tags are nested.
Here is my HTML code:
<h2 >Test Content
    <h2 >Test Content</h2>
</h2>

Here I get this warning:
Warning: replacing unexpected <h2> by </h2>


Please help me to fix this issue.

----------------------------------------------------------------------

Comment By: Rajesh Kumar (rajeshkumarp)
Date: 2011-09-08 18:23

Message:
Note: I am using jtidy-r938

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3406215&group_id=13153

[Jtidy-devel] [ jtidy-Bugs-3349161 ] problem parsing CDATA

From: SourceForge.net <no...@so...> - 2011-09-08 18:12:39

Bugs item #3349161, was opened at 2011-07-01 15:56
Message generated for change (Comment added) made by furman82
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3349161&group_id=13153

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Aaron Herstein (aarongh2012)
Assigned to: Nobody/Anonymous (nobody)
Summary: problem parsing CDATA

Initial Comment:
When parsing this page: http://www.nytimes.com/2011/04/14/world/asia/14quake.html?_r=2, a StringIndexOutOfBoundsException is being thrown with this stack trace:

java.lang.StringIndexOutOfBoundsException: String index out of range: 16385
	at java.lang.String.checkBounds(Unknown Source)
	at java.lang.String.<init>(Unknown Source)
	at org.w3c.tidy.TidyUtils.getString(TidyUtils.java:658)
	at org.w3c.tidy.Lexer.getCDATA(Lexer.java:1835)
	at org.w3c.tidy.ParserImpl$ParseScript.parse(ParserImpl.java:667)
	at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
	at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2464)
	at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
	at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2464)
	at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
	at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2464)
	at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
	at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2464)
	at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
	at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2464)
	at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
	at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2464)
	at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
	at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2464)
	at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
	at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2464)
	at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
	at org.w3c.tidy.ParserImpl$ParseBody.parse(ParserImpl.java:971)
	at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:203)
	at org.w3c.tidy.ParserImpl$ParseHTML.parse(ParserImpl.java:483)
	at org.w3c.tidy.ParserImpl.parseDocument(ParserImpl.java:3401)
	at org.w3c.tidy.Tidy.parse(Tidy.java:435)
	at org.w3c.tidy.Tidy.parse(Tidy.java:658)

----------------------------------------------------------------------

Comment By: Matt Furman (furman82)
Date: 2011-09-08 14:12

Message:
I also ran into this issue and "fixed" it locally...

It appears to be a flaw in addByte within Lexer.java. The function
assumes that the buffer is only examined one byte at a time; however, in
the CDATA function, the call to TidyUtils.getString passes in a length that
is greater than 1. I overloaded the appropriate functions so that the caller
can pass in the size the buffer needs to grow by.

    /**
     * Adds a byte to lexer buffer.
     * @param c byte to add
     */
    public void addByte(int c)
    {
        addByte(c, 1);
    }

    /**
     * Adds a byte to lexer buffer, growing the buffer first if needed.
     * @param c byte to add
     * @param size number of bytes the buffer must have room for
     */
    public void addByte(int c, int size)
    {
        if (this.lexsize + size >= this.lexlength)
        {
            // double the capacity (starting at 8192) until the request fits
            while (this.lexsize + size >= this.lexlength)
            {
                if (this.lexlength == 0)
                {
                    this.lexlength = 8192;
                }
                else
                {
                    this.lexlength = this.lexlength * 2;
                }
            }

            byte[] temp = this.lexbuf;
            this.lexbuf = new byte[this.lexlength];
            if (temp != null)
            {
                System.arraycopy(temp, 0, this.lexbuf, 0, temp.length);
                updateNodeTextArrays(temp, this.lexbuf);
            }
        }

        this.lexbuf[this.lexsize++] = (byte) c;
        this.lexbuf[this.lexsize] = (byte) '\0'; // debug
    }

Once I changed the necessary associated functions, it seemed to do the
trick.
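
For readers who want to try the growth logic outside of JTidy, here is a small stand-alone sketch of the same idea. It is not the Lexer class: GrowableByteBuffer and its methods are hypothetical names, Arrays.copyOf replaces the manual copy, and the updateNodeTextArrays step from the code above is left out because it depends on JTidy internals. The 8192 starting size and the doubling are taken from the posted code.

```java
import java.util.Arrays;

public class GrowableByteBuffer {
    private byte[] buf = new byte[0];
    private int size = 0;

    /**
     * Grows the backing array (8192 first, then doubling) until
     * `extra` more bytes plus one trailing NUL byte fit.
     */
    public void ensureCapacity(int extra) {
        int capacity = buf.length;
        while (size + extra >= capacity) {
            capacity = (capacity == 0) ? 8192 : capacity * 2;
        }
        if (capacity != buf.length) {
            buf = Arrays.copyOf(buf, capacity); // keeps the existing bytes
        }
    }

    /** Appends one byte and keeps a NUL terminator after the data. */
    public void addByte(int c) {
        ensureCapacity(1);
        buf[size++] = (byte) c;
        buf[size] = 0;
    }

    public int size() { return size; }
    public int capacity() { return buf.length; }
    public byte get(int i) { return buf[i]; }
}
```

A caller that is about to read several bytes ahead could call ensureCapacity with that count first, which is roughly what the size parameter in the overload above is meant to allow.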

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3349161&group_id=13153

[Jtidy-devel] [ jtidy-Bugs-3406215 ] replacing unexpected <h2> by </h2>

From: SourceForge.net <no...@so...> - 2011-09-08 12:53:35

Bugs item #3406215, was opened at 2011-09-08 18:18
Message generated for change (Comment added) made by rajeshkumarp
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3406215&group_id=13153

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Rajesh Kumar (rajeshkumarp)
Assigned to: Nobody/Anonymous (nobody)
Summary: replacing unexpected <h2> by </h2>

Initial Comment:
JTidy replaces <h2> with </h2> when <h2> tags are nested.
Here is my HTML code:
<h2 >Test Content
    <h2 >Test Content</h2>
</h2>

Here I get this warning:
Warning: replacing unexpected <h2> by </h2>


Please help me to fix this issue.

----------------------------------------------------------------------

>Comment By: Rajesh Kumar (rajeshkumarp)
Date: 2011-09-08 18:23

Message:
Note: I am using jtidy-r938

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3406215&group_id=13153

[Jtidy-devel] [ jtidy-Bugs-3406215 ] replacing unexpected <h2> by </h2>

From: SourceForge.net <no...@so...> - 2011-09-08 12:48:02

Bugs item #3406215, was opened at 2011-09-08 18:18
Message generated for change (Tracker Item Submitted) made by rajeshkumarp
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3406215&group_id=13153

Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Rajesh Kumar (rajeshkumarp)
Assigned to: Nobody/Anonymous (nobody)
Summary: replacing unexpected <h2> by </h2>

Initial Comment:
JTidy replaces <h2> with </h2> when <h2> tags are nested.
Here is my HTML code:
<h2 >Test Content
    <h2 >Test Content</h2>
</h2>

Here I get this warning:
Warning: replacing unexpected <h2> by </h2>


Please help me to fix this issue.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3406215&group_id=13153

[Jtidy-devel] [ jtidy-Bugs-3390317 ] JTidy goes into infinite loop on specific input document

From: SourceForge.net <no...@so...> - 2011-08-12 07:07:00

Bugs item #3390317, was opened at 2011-08-12 00:43
Message generated for change (Comment added) made by 
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3390317&group_id=13153

Category: None
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: Francis Crimmins ()
Assigned to: Nobody/Anonymous (nobody)
Summary: JTidy goes into infinite loop on specific input document

Initial Comment:
JTidy goes into infinite loop on specific input document:

http://www.takeovers.govt.nz/enforcement/decisions/2004/meeting-wrightson.php

When we call tidy.parse() the stack trace ends in many calls to Node.checkNodeIntegrity()
and the CPU is pegged at 100%.

We're using the latest version of JTidy (r938). I've attached a copy of the input document
which triggers the behaviour.

Hopefully it's not too difficult to fix :)

Many thanks,

- Francis.

----------------------------------------------------------------------

Comment By: https://www.google.com/accounts ()
Date: 2011-08-12 07:07

Message:
Sorry - there was a typo in the version I gave to Francis there... Make
that...

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
	<head></head>
	<body>
		<em>
			<dl>
				<p>
					<dd>
					</dd>
				</p>
			</dl>
		</em>
	</body>
</html>

(i.e. the extra opening html tag is not required, and the close html can
be present)


From a quick investigation the problem seems to be that the parser is
producing a cycle of br tags (with A followed by B and B followed by A)
below the dd tag.

e.g.

[Node type=RootNode,element=null,content=
	[Node type=StartTag,element=html,content=
		[Node type=StartTag,element=head,content=
			[Node type=StartTag,element=title,content=null]],
			[Node type=StartTag,element=body,content=
				[Node type=TextNode,element=null,text="",content=null],
				[Node type=StartTag,element=dl,content=
					[Node type=StartTag,element=dd,content=
						[Node type=StartTag,element=br,content=null],
						[Node type=StartTag,element=br,content=null],
						[Node type=StartTag,element=br,content=null],
						[Node type=StartTag,element=br,content=null],
						[Node type=StartTag,element=br,content=null],
						[Node type=StartTag,element=br,content=null],
						[Node type=StartTag,element=br,content=null],
						[Node type=StartTag,element=br,content=null],
                                                ...


Though not a proper fix, this patch will detect the cycle and throw a
RuntimeException (and will also limit the loop in toString to help see
what's happening as above).


Index: src/main/java/org/w3c/tidy/Node.java
===================================================================
--- src/main/java/org/w3c/tidy/Node.java	(revision 1261)
+++ src/main/java/org/w3c/tidy/Node.java	(working copy)
@@ -1311,7 +1311,11 @@

        for (child = this.content; child != null; child = child.next)
        {
-            if (child.parent != this || !child.checkNodeIntegrity())
+        	if (this.next != null && this.next.next == this) {
+        		throw new RuntimeException("Cycle detected - aborting");
+        	}
+
+        	if (child.parent != this || !child.checkNodeIntegrity())
            {
                return false;
            }
@@ -1347,8 +1351,15 @@
        String s = "";
        Node n = this;

+        int loopLimit = 1024;
        while (n != null)
        {
+        	if (loopLimit < 0) {
+        		s += "...TRUNCATED...";
+        		n = null;
+        		break;
+        	}
+        	loopLimit--;
            s += "[Node type=";
            s += NODETYPE_STRING[n.type];
            s += ",element=";
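
As posted, the check in the patch compares this.next.next with this, so it appears to catch only a two-node cycle. A more general way to spot a loop in a chain of sibling links is Floyd's tortoise-and-hare algorithm. The sketch below is an illustration of that technique and not a patch against JTidy: SiblingCycleCheck, SimpleNode and hasSiblingCycle are hypothetical names, and SimpleNode stands in for the real Node class with only a next link.

```java
public class SiblingCycleCheck {
    /** Stand-in for a parser node; only the sibling link matters here. */
    public static final class SimpleNode {
        public final String name;
        public SimpleNode next;

        public SimpleNode(String name) {
            this.name = name;
        }
    }

    /**
     * Floyd's cycle detection: returns true if following `next` from
     * `first` ever comes back to a node that was already visited.
     */
    public static boolean hasSiblingCycle(SimpleNode first) {
        SimpleNode slow = first;
        SimpleNode fast = first;
        while (fast != null && fast.next != null) {
            slow = slow.next;       // advances one link per step
            fast = fast.next.next;  // advances two links per step
            if (slow == fast) {
                return true;        // the two pointers can only meet inside a loop
            }
        }
        return false;               // reached the end of the chain
    }

    public static void main(String[] args) {
        SimpleNode a = new SimpleNode("br");
        SimpleNode b = new SimpleNode("br");
        a.next = b;
        System.out.println(hasSiblingCycle(a)); // plain chain
        b.next = a; // A followed by B and B followed by A
        System.out.println(hasSiblingCycle(a)); // chain with a loop
    }
}
```

Like the patch, this only detects the symptom. It uses constant extra memory and works for loops of any length, but it does not explain why the parser builds the cycle in the first place.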

----------------------------------------------------------------------

Comment By: Francis Crimmins ()
Date: 2011-08-12 06:07

Message:
And here's a more minimal document which exhibits the problem:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<html>
<head></head>
<body>
	<em>
		<dl>
			<p>
				<dd>
				</dd>
			</p>
		</dl>
	</em>
</body>

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3390317&group_id=13153

[Jtidy-devel] [ jtidy-Bugs-3390317 ] JTidy goes into infinite loop on specific input document

From: SourceForge.net <no...@so...> - 2011-08-12 06:07:57

Bugs item #3390317, was opened at 2011-08-12 00:43
Message generated for change (Comment added) made by 
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3390317&group_id=13153

Category: None
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: Francis Crimmins ()
Assigned to: Nobody/Anonymous (nobody)
Summary: JTidy goes into infinite loop on specific input document

Initial Comment:
JTidy goes into infinite loop on specific input document:

http://www.takeovers.govt.nz/enforcement/decisions/2004/meeting-wrightson.php

When we call tidy.parse() the stack trace ends in many calls to Node.checkNodeIntegrity()
and the CPU is pegged at 100%.

We're using the latest version of JTidy (r938). I've attached a copy of the input document
which triggers the behaviour.

Hopefully it's not too difficult to fix :)

Many thanks,

- Francis.

----------------------------------------------------------------------

>Comment By: Francis Crimmins ()
Date: 2011-08-12 06:07

Message:
And here's a more minimal document which exhibits the problem:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<html>
<head></head>
<body>
	<em>
		<dl>
			<p>
				<dd>
				</dd>
			</p>
		</dl>
	</em>
</body>

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3390317&group_id=13153

[Jtidy-devel] [ jtidy-Bugs-3390317 ] JTidy goes into infinite loop on specific input document

From: SourceForge.net <no...@so...> - 2011-08-12 00:44:04

Bugs item #3390317, was opened at 2011-08-12 00:43
Message generated for change (Settings changed) made by 
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3390317&group_id=13153

Category: None
Group: None
Status: Open
Resolution: None
>Priority: 7
Private: No
Submitted By: Francis Crimmins ()
Assigned to: Nobody/Anonymous (nobody)
Summary: JTidy goes into infinite loop on specific input document

Initial Comment:
JTidy goes into infinite loop on specific input document:

http://www.takeovers.govt.nz/enforcement/decisions/2004/meeting-wrightson.php

When we call tidy.parse() the stack trace ends in many calls to Node.checkNodeIntegrity()
and the CPU is pegged at 100%.

We're using the latest version of JTidy (r938). I've attached a copy of the input document
which triggers the behaviour.

Hopefully it's not too difficult to fix :)

Many thanks,

- Francis.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3390317&group_id=13153

[Jtidy-devel] [ jtidy-Bugs-3390317 ] JTidy goes into infinite loop on specific input document

From: SourceForge.net <no...@so...> - 2011-08-12 00:43:01

Bugs item #3390317, was opened at 2011-08-12 00:43
Message generated for change (Tracker Item Submitted) made by 
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3390317&group_id=13153

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Francis Crimmins ()
Assigned to: Nobody/Anonymous (nobody)
Summary: JTidy goes into infinite loop on specific input document

Initial Comment:
JTidy goes into infinite loop on specific input document:

http://www.takeovers.govt.nz/enforcement/decisions/2004/meeting-wrightson.php

When we call tidy.parse() the stack trace ends in many calls to Node.checkNodeIntegrity()
and the CPU is pegged at 100%.

We're using the latest version of JTidy (r938). I've attached a copy of the input document
which triggers the behaviour.

Hopefully it's not too difficult to fix :)

Many thanks,

- Francis.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3390317&group_id=13153

[Jtidy-devel] [ jtidy-Bugs-3349163 ] problem parsing tbody

From: SourceForge.net <no...@so...> - 2011-07-11 01:18:19

Bugs item #3349163, was opened at 2011-07-02 04:02
Message generated for change (Comment added) made by aditsu
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3349163&group_id=13153

Category: DOM Support
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Aaron Herstein (aarongh2012)
Assigned to: Nobody/Anonymous (nobody)
Summary: problem parsing tbody

Initial Comment:
Tidy does not parse the tbody element in html tables as in this example:

<table border="1">
  <tbody>
    <tr>
      <td>January</td>
      <td>$100</td>
    </tr>
    <tr>
      <td>February</td>
      <td>$80</td>
    </tr>
  </tbody>
</table>

nothing is done with the tbody element

----------------------------------------------------------------------

>Comment By: Adrian Sandor (aditsu)
Date: 2011-07-11 09:18

Message:
Ugh, there was SO much spam. I removed those comments and blocked anonymous
posts. Sorry if that negatively affects any non-spammer.

----------------------------------------------------------------------

Comment By: Martin (martinkurz)
Date: 2011-07-10 21:18

Message:
Just tested with JTidy's java-5 branch:


        String source = "<table border=\"1\"><tbody><tr><td>January</td><td>$100</td></tr>"
            + "<tr><td>February</td><td>$80</td></tr></tbody></table>";
        Tidy tidy = new Tidy();
        Writer stringWriter = new StringWriter();
        tidy.parse(new ByteArrayInputStream(source.getBytes("UTF-8")), stringWriter);
        System.out.println(stringWriter.toString());

and got the following result:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<meta content=
"HTML Tidy for Java (vers. 2009-08-01), see jtidy.sourceforge.net"
name="generator">
<title></title>
</head>
<body>
<table border="1">
<tbody>
<tr>
<td>January</td>
<td>$100</td>
</tr>
<tr>
<td>February</td>
<td>$80</td>
</tr>
</tbody>
</table>
</body>
</html>

That's what I would expect to get, could you provide a test case?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3349163&group_id=13153

31 messages have been excluded from this view by a project administrator.
