
#86 &nbsp; in link results in space (0x20) rather than no break space (0xC2 0xA0)

Milestone: General
Status: closed-rejected
Owner: nobody
Labels: None
Priority: 5
Updated: 2016-11-28
Created: 2016-11-28
Creator: Code Buddy
Private: No

Some real-world HTML I've come across has anchor links with no-break spaces in the href, e.g.:

<a href='no&nbsp;break&nbsp;space.html'>no break space link</a>

When parsed, these come out as regular spaces (U+0020) rather than no-break spaces (U+00A0). I've created a test suite for this. All is well with testSpace() and testNbSpace(), but testHtmlNbSpace() fails (it passes if I expect SPACE rather than NB_SPACE):

package com.github.liamsharp;

import java.util.List;

import junit.framework.TestCase;
import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.Source;

public class SpaceTests extends TestCase
{

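    // HTML character entity reference for a no-break space
    private static final String HTML_NB_SPACE = "&nbsp;";
    // Regular space, 0x20 in UTF-8
    private static final String SPACE = "\u0020";
    // No-break space, 0xC2 0xA0 in UTF-8
    private static final String NB_SPACE = "\u00A0";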

    public void testHtmlNbSpace()
    {
        runSpaceTest(HTML_NB_SPACE, NB_SPACE);
    }

    public void testSpace()
    {
        runSpaceTest(SPACE, SPACE);
    }

    public void testNbSpace()
    {
        runSpaceTest(NB_SPACE, NB_SPACE);
    }

    private void runSpaceTest(
        final String inputSpace,
        final String expectedOutputSpace)
    {
        final String content =
                  "<html>"
                + " <body>"
                + " <a href='before" + inputSpace + "after'>foo</a>"
                + " </body>"
                + "</html>";

        final Source source = new Source(content);
        source.fullSequentialParse();
        final List<Element> anchors = source.getAllElements("a");

        assertFalse(anchors.isEmpty());
        final Element anchor = anchors.get(0);

        final String href = anchor.getAttributeValue("href");
        assertEquals("before" + expectedOutputSpace + "after", href);
    }
}
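
For reference, the byte values in the title can be checked without Jericho at all. This is only a minimal sketch using the standard library; the class name NbspBytes and the helper utf8Hex are made up for illustration and are not part of Jericho or of the test project. It shows how the two characters the tests compare are encoded in UTF-8:

```java
import java.nio.charset.StandardCharsets;

class NbspBytes {
    // Hypothetical helper (not a Jericho API): renders the UTF-8 bytes of a
    // string as space-separated hex values such as "0xC2 0xA0".
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values format as two hex digits.
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u0020")); // regular space: 0x20
        System.out.println(utf8Hex("\u00A0")); // no-break space: 0xC2 0xA0
    }
}
```

So a decoded &nbsp; that keeps its no-break meaning would show up as two bytes in UTF-8 output, while the value the parser returns here shows up as the single byte 0x20.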

Discussion

  • Code Buddy - 2016-11-28

    The tests, in a Maven project, can be grabbed from here if needed:
    https://github.com/liamsharp/jerichohtml-html-comments-in-css

    In:
    src/test/java/com/github/liamsharp/SpaceTests.java

     
  • Martin Jericho - 2016-11-28
    • status: unread --> closed-rejected
     
  • Code Buddy - 2016-11-28

    Awesome, thanks Martin, much appreciated!

     
