Menu

#19 [PATCH] Html5 tags support

2.11
closed-accepted
nobody
html5 (2)
5
2015-05-12
2015-04-15
phil
No

Hello,
I have implement a class file for each version of HTML,one file for Html4 tags and one for Html5 tags. I attach all the files affected by these files.I run all the test suite for the html cleaner and the following test are changed (because of html5):

1) Add in function testBalancing():

    assertHtml(
            "<u>aa<i>a<b>at<sup></u>fi <b>rst</b>text",
            "<html><head /><body><u>aa<i>a<b>at<sup /></b></i></u><i><b><sup>fi" +
                    "</sup></b></i><i><b><sup>rst</sup></b><sup>text</sup></i></body></html>"
    );

2) Change file (add the menu tag in code): src/test/resources/severalTagsClosedByChildBreak-cleaned.html

<menu><li>Some incomplete li
<p>
</p></li><li>Another li</li><li>
<ul>
<li>This is some text</li>
</ul>
</li></menu>

3) Change function testCollapseInsignificantBrWithEmptyElements:

collapsed = cleaner.clean("<p>Some text<br><span></span><BR/><u></u><BR/></p>");
assertEquals("<p>Some text</p>", serializer.getAsString(collapsed));
collapsed = cleaner.clean("<p>Some text<br><span></span><BR/><u></u><BR/><u></u></p>");
assertEquals("<p>Some text</p>", serializer.getAsString(collapsed));


4) Change function testInsureMeaninglessBrsStillCollapseEmptyElements:


TagNode collapsed = cleaner.clean("<p><u><br/></u>Some text<br><span><BR/><u><BR/></u></p></span>");
assertEquals("<p>Some text</p>", serializer.getAsString(collapsed));

5) Function testTagCopyingAndLimiting is not supported by html5

1 Attachments

Discussion

1 2 > >> (Page 1 of 2)
  • Scott Wilson

    Scott Wilson - 2015-04-16

    Hi Phil,

    Thanks for all your hard work here!

    However, in order to include it in the project I need a patch file as I can then review each of the code changes before I apply them.

    To make a patch in Eclipse, just right-click the project and select Team > Create Patch. Or alternatively from the command line you can run svn diff > mypatch.diff from the project root folder.

     
  • phil

    phil - 2015-04-16

    Hello Mr. Wilson,

    I have done the patch as you wrote,thank you for the instructions!I have created some extra files for testing the html5,if you want to upload them too just tell me.

    Thank you!

    Edit:minor change

     

    Last edit: phil 2015-04-16
  • Scott Wilson

    Scott Wilson - 2015-04-16

    Thanks Phil.

    I was able to merge in the changes; the only issue is I have 6 of the existing unit tests failing - I've attached the report.

    You need to check each failure to see what is happening. Some of these may be just a case that the model has changed, but the output is still correct (in which case we need to change the tests to reflect the new behaviour). In others there will be problems in the code that need fixing before I can commit the changes.

    PS - make sure you do an "svn up" before making a new patch in case there are recent code changes to include.

     
  • phil

    phil - 2015-04-17

    Hello Mr.Wilson,
    I saw the report and the problem is that the tag is not supported by html5.As I wrote in the first post with the changes to be made in the tests,I attach the patches for the tests that fail.After merging these new implementation,the following remain:

    1)org.htmlcleaner.TagCopyingAndLimitingTest.testTagCopyingAndLimiting(TagCopyingAndLimitingTest.java:90) - Reason: The tag "font" is not supported.

    2)org.htmlcleaner.ClosedTagReopenTest.testSimple(ClosedTagReopenTest.java:71)- Reason is the same as above (font tag)

    3)org.htmlcleaner.UtilsTest.testEscapeXml_recognizeUnicodeChars(UtilsTest.java:30)

    4)org.htmlcleaner.UtilsTest.testEscapeXml_transSpecialEntitiesToNCR_withHex(UtilsTest.java:41)

    The number 3 and 4 of the list I think were existed before I made changes (if you can confirm this),so there is only 2 issues that we should change the tests by removing the "font" tag.

    Thank you!

     
  • Scott Wilson

    Scott Wilson - 2015-04-20

    For the tests that fail due to the font tag, setting the version to Html 4 should mean the error doesn't occur - but it still does, at least for 1 and 2.

     
  • Scott Wilson

    Scott Wilson - 2015-04-20

    Not sure about changing CollapseHtmlTest - it looks to me like the original test is fine, its the new code thats behaving differently. We can't just change the tests to match up with whatever the code outputs :)

     
  • phil

    phil - 2015-04-22

    Hello Mr.Wilson,

    I totally agree that tests are there to see if the program really works and we shouldn't change it.The changes I did in CollapseHtmlTest is to just remove the "big" (<"big">) tag because is not supported by Html 5.Moreover,I had a mistake in Html4TagProvider and that's the reason for the errors if you set the html version to 4 (I attach the working class and some minor changes in html5),now it will work. The changes in tests (that I did for html5) are because of not supported tags,not about the functionality,so we have to create new tests for html5.

    Thank you!

     
  • Scott Wilson

    Scott Wilson - 2015-04-22

    Hmm, I still get those 6 tests failing when I do:

    cleaner.getProperties().setHtmlVersion(true);

    Incidentally, this isn't exactly very clear code :) I'd suggest instead using some defined constants, such as:

    cleaner.getProperties().setHtmlVersion(HtmlCleaner.HTML_4);
    cleaner.getProperties().setHtmlVersion(HtmlCleaner.HTML_5);

     
  • phil

    phil - 2015-04-24

    I think I understood why is not working for you.If you change in HtmlCleaner.java the variable defaulthtml to number 4 (HtmlCleaner.HTML_4) it will run through all the tests and succeed.However,I know that the function setHtmlVersion(true) must change the tagProvider to Html4TagProvider.java. I have problem with that, I tried to make this by changing TagProvider or calling HtmlCleaner constructor with the specified TagProvider but unfortunately I didn't manage to make it work in a JUnit test.Why HtmlCleaner is not changing behavior after changing tagProvider and keeping the properties as before? We can't change the version of html after the initialization of the program? I debug and is going through the Html4 TagProvider but it fails the test.

    The sample test I made.It is the testCollapseInsignificantBrWithEmptyElements test and it should work after I change to HTML_4 but is not:

    Code:

    public void test3() throws IOException {
            HtmlCleaner.defaulthtml=HtmlCleaner.HTML_4;
            cleaner.getProperties().setTagInfoProvider(Html4TagProvider.INSTANCE);
            TagNode collapsed2 = cleaner.clean("<p>Some text<br><span></span><BR/><u><big></big></u><BR/></p>");
            assertEquals("<p>Some text</p>", serializer.getAsString(collapsed2));
    
     

    Last edit: phil 2015-04-24
  • Scott Wilson

    Scott Wilson - 2015-04-29

    The problem is that the various convenience constructors call HtmlCleaner(null, properties) which then overrides the TagInfoProvider specified in properties with DefaultTagProvider. I'll see if I can fix that now...

     
  • Scott Wilson

    Scott Wilson - 2015-04-29

    OK, I've committed the Html5TagProvider and also made changes to HtmlCleaner(tagInfoProvider, properties) to ensure changes "stick" when constructing new instances. Update svn and merge in those fixes and see where that gets you.

     
  • phil

    phil - 2015-04-30

    Thank you for your help,the problem now is solved as you did it and I changed the setHtmlVersion to work right. Now,in PropertiesTest you can change the following:
    properties.setTagInfoProvider(null) change it to properties = properties.setHtmlVersion(true)

    The function setHtmlVersion as you will see,it will return a new cleanerProperties object.

    Also,you need to upload MathMLTagProvider.java in svn because it is connected with Html5TagProvider.

     
  • Scott Wilson

    Scott Wilson - 2015-05-01

    Phil,

    Can you provide a "clean" patch please?

    As you'll note I changed HTML5TagProvider to Html5TagProvider to keep naming consistent, so this patch won't apply as you're still using the previous class name.

    Make sure you're working from the latest HEAD revision in SVN, that the project builds with your changes, and that all of the unit tests pass. Then make a new patch!

     
  • phil

    phil - 2015-05-02

    Hello Scott,
    I attach the final changes for Html5 support,it includes changes in JUnit tests in order to work with Html 5 and html 4.You will notice that I use "Assume JUnit library" in order to run only the tests for the appropriate version of html.The CommandLine.java supports choosing html version and stdin pipe.For any problem just tell me,the tests run only for the chosen version of html.

     
  • Scott Wilson

    Scott Wilson - 2015-05-02

    Not sure what you mean about the JUnit tests - I didn't see any modifications to tests in the patch. Did you forget to include them?

    Results :

    Failed tests: testSimple(org.htmlcleaner.ClosedTagReopenTest): started with=text1

    text2

    text3 expected:<text1<[/font>

    text2

    <font]
    >text3
    > but was:<text1<[p>text2</p]>text3>
    testInsureMeaninglessBrsStillCollapseEmptyElements(org.htmlcleaner.CollapseHtmlTest): expected:<

    Some text<[]/p>> but was:<

    Some text<[br /><]/p>>
    testCollapseInsignificantBrWithEmptyElements(org.htmlcleaner.CollapseHtmlTest): expected:<

    Some text<[]/p>> but was:<

    Some text<[br />
    <]
    /p>>
    testTagProviders(org.htmlcleaner.TagBalancingTest): expected:<...ul>(..)
    testShouldSupportBreakingSeveralOpenTags(org.htmlcleaner.TagBalancingTest): expected:<...(..)
    testBalancing(org.htmlcleaner.TagBalancingTest): expected:<...>
    <[i>fi ]rstte...> but was:<...>
    <[big>fi]rstte...>
    (..)stTagCopyingAndLimiting(org.htmlcleaner.TagCopyingAndLimitingTest): expected:<... [

    Tests run: 175, Failures: 7, Errors: 0, Skipped: 2

    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD FAILURE
    [INFO] ------------------------------------------------------------------------

     
  • phil

    phil - 2015-05-03

    Sorry,it's my fault,here I attach the tests changes.I forgot to include them above. :)
    Now it should pass all the appropriate tests.

     
  • Scott Wilson

    Scott Wilson - 2015-05-04

    Ah, I see what you mean about Assume.

    However doesn't this mean these tests are now never actually run in a standard build as Assume fails and the tests are then ignored? In which case, we now have reduced test coverage!

    I suggest instead modifying these tests to set the appropriate provider before they run.

     
  • Scott Wilson

    Scott Wilson - 2015-05-04

    Actually I just tried doing that and noticed that the Assume statements are wrong anyway - you've used Assume before instantiating the cleaner in many cases.

    So: get rid of Assume, and ensure each test passes by configuring the properties of the HtmlCleaner instance it uses appropriately.

    Some other things:

    • setHtmlVersion(boolean) is still there even though its not a good method signature.
    • exposing a defaultHtml attribute directly on HtmlCleaner isn't consistent with the rest of the API.

    Remove these and on CleanerProperties provide:

    private int htmlVersion;
    public void setHtmlVersion(int)
    public int getHtmlVersion()

    There's also no need to have a separate variable in HtmlCleaner if it already exists in CleanerProperties - which is the correct place for cleaner properties.

     
  • phil

    phil - 2015-05-05

    I have done it as you wrote above. Now all tests are passed and I removed Assume.I attach one patch for the tests files and one for the changes in HtmlCleaner.java and CleanerProperties.java.

     
  • phil

    phil - 2015-05-07

    I forgot to update the CommandLine.java, sorry for any inconvenience. I did a small change in order to work with the new constructor.

     
  • Scott Wilson

    Scott Wilson - 2015-05-07

    Thanks Phil!

    Thats a big improvement - I revoked all the previous patches and applied the new ones.

    But ... we still have three tests failing :(

    Failed tests: testSimpleHTML4(org.htmlcleaner.ClosedTagReopenTest): started with=

    text1

    text2

    text2a

    text3

    • text4
    text5
    • text6

    expected:<...m="attribute">text3<[/p>
    • text4

    text5

    • text6
    • </ul]>> but was:<...m="attribute">text3<[ul>
    • text4
    text5
    • text6
    </p]
    >>
    testShouldSupportBreakingSeveralOpenTagsHTML4(org.htmlcleaner.TagBalancingTest): expected:<... incomplete li(..)
    (..)stTagCopyingAndLimitingHTML4(org.htmlcleaner.TagCopyingAndLimitingTest): expected:<... [

    Tests run: 181, Failures: 3, Errors: 0, Skipped: 2

    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD FAILURE
    [INFO] ------------------------------------------------------------------------

     
  • phil

    phil - 2015-05-08

    That is strange because for me all the tests are passed,I don't get any error like the above.Please make sure that the files TagBalancingTest.java and ClosedTagReopenTest.java are as the ones I attach.It should pass all the tests because I didn't make any changes in Html4.Check it and tell me.

     
  • Scott Wilson

    Scott Wilson - 2015-05-08

    They aren't exactly the same - the last one you published above has an Assume import, but is otherwise identical.

    Now you should see why its important to have clean patches against an up to date trunk otherwise you have no idea what the actual state of the code is :)

     
  • phil

    phil - 2015-05-08

    You are right,I made several projects in Eclipse with some different state of code for some files.I see the Assume import but is not used,so it should work as it is the same with mine.Now the problem exists with the latest code?

     
  • Scott Wilson

    Scott Wilson - 2015-05-08

    Right. So what you need to do now is:

    • Check out a completely fresh clean copy of HtmlCleaner from svn
    • Apply your patches to it (In Eclipse you right-click the project and do Team > Apply Patch)

    You'll then have the state you're trying to get me to test against. At that point you'll probably notice you're missing a few things and your build fails :)

     
1 2 > >> (Page 1 of 2)

Log in to post a comment.