Character references in attributes

Anonymous
2010-03-06
2013-01-03
  • Anonymous

    Anonymous - 2010-03-06

    Hi,

    I'm trying to upgrade from version 2.5 to version 3.1 and I have run into a problem with one of our test cases.  The test case can be boiled down to this:

    public class MyTest {
        public static void main(String[] args) {
            // The query string contains "&ne", an unterminated character reference (no trailing ';').
            String html = "<a href=\"http://host.com/resource?a=1&ne=2\">should have two parameters</a>";
            // Jericho 2.5 (old package name)
            au.id.jericho.lib.html.Source s1 = new au.id.jericho.lib.html.Source(html);
            au.id.jericho.lib.html.StartTag t1 = (au.id.jericho.lib.html.StartTag) s1.findAllStartTags("a").get(0);
            System.out.println("href1: " + t1.getAttributeValue("href"));
            // Jericho 3.1 (new package name)
            net.htmlparser.jericho.Source s2 = new net.htmlparser.jericho.Source(html);
            net.htmlparser.jericho.StartTag t2 = s2.getAllStartTags("a").get(0);
            System.out.println("href2: " + t2.getAttributeValue("href"));
        }
    }
    

    (Put the above in a file called MyTest.java and compile with:

    CLASSPATH=.:jericho-html-3.1.jar:jericho-html-2.5.jar javac MyTest.java
    

    and then run with:

    CLASSPATH=.:jericho-html-3.1.jar:jericho-html-2.5.jar java MyTest
    

    )

    When I run this, I get:

    href1: http://host.com/resource?a=1&ne=2
    href2: http://host.com/resource?a=1=2
    

    href1 shows the old behavior and href2 shows the new behavior.  I would like to keep the old behavior.

    I have tried to add:

    net.htmlparser.jericho.Config.CurrentCompatibilityMode = new net.htmlparser.jericho.Config.CompatibilityMode("mytest");
    

    to the top of the program to get a compatibility mode that will not expand unterminated character references in attributes, but it doesn't have any effect.

    Is there any way I can use version 3.1 and get the old behavior?

    - Erik -
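
The difference between the two outputs above appears to come down to whether `&ne` (the name of the "not equal" sign, written without a terminating `;`) is treated as a character reference inside an attribute value. When it is, `&ne=2` becomes the not-equal sign followed by `=2`; that sign seems to have been lost from the href2 line as displayed. The following is a minimal, standalone sketch of the two behaviors, not Jericho's actual implementation: the class name, the `decode` method, and the tiny entity table are all hypothetical and exist only for illustration.

```java
import java.util.Map;

public class UnterminatedReferenceDemo {
    // Tiny illustrative entity table (assumption: a real parser knows the full HTML set).
    private static final Map<String, String> ENTITIES = Map.of(
        "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "ne", "\u2260");

    /**
     * Decodes named character references in raw attribute text.
     * If decodeUnterminated is false, only references ending in ';' are decoded.
     */
    static String decode(String raw, boolean decodeUnterminated) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c == '&') {
                // Scan the candidate reference name after the ampersand.
                int end = i + 1;
                while (end < raw.length() && Character.isLetterOrDigit(raw.charAt(end))) {
                    end++;
                }
                String name = raw.substring(i + 1, end);
                boolean terminated = end < raw.length() && raw.charAt(end) == ';';
                String replacement = ENTITIES.get(name);
                if (replacement != null && (terminated || decodeUnterminated)) {
                    out.append(replacement);
                    // Skip the ';' only when it is actually present.
                    i = terminated ? end + 1 : end;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "http://host.com/resource?a=1&ne=2";
        System.out.println("strict:  " + decode(raw, false));
        System.out.println("lenient: " + decode(raw, true));
    }
}
```

In this sketch the strict call leaves the URL untouched, which corresponds to the href1 result, while the lenient call replaces `&ne` with U+2260, which corresponds to the href2 result.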

     
  • Martin Jericho

    Martin Jericho - 2010-03-06

    Hi Erik,

    This is a bug.  I will take a look and try to get a patched version to you today.

    Cheers 
    Martin

     
  • Anonymous

    Anonymous - 2010-03-06

    I think I have found a workaround. Instead of:

    t1.getAttributeValue("href")
    

    I can do something like:

    t1.getAttributes().get("href").getValueSegment().toString()
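
One caveat with this workaround, as far as I understand it: the value segment is the raw source text, so properly terminated references such as `&amp;` would not be decoded either. If the input may also contain well-formed HTML, the raw value probably needs a decoding pass that only touches references ending in `;`. Below is a rough sketch of such a pass using plain `java.util.regex`; the class and method names and the four-entry entity table are hypothetical, not part of Jericho.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RawAttributeValueDemo {
    // Illustrative subset of named references (assumption: extend as needed).
    private static final Map<String, String> ENTITIES =
        Map.of("amp", "&", "lt", "<", "gt", ">", "quot", "\"");

    // Matches only references that are terminated by a semicolon.
    private static final Pattern TERMINATED = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    /** Decodes semicolon-terminated references and leaves everything else as-is. */
    static String decodeTerminatedOnly(String raw) {
        Matcher m = TERMINATED.matcher(raw);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Unknown names are kept verbatim.
            String replacement = ENTITIES.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Well-formed HTML escapes the separator; sloppy HTML does not.
        System.out.println(decodeTerminatedOnly("http://host.com/resource?a=1&amp;ne=2"));
        System.out.println(decodeTerminatedOnly("http://host.com/resource?a=1&ne=2"));
    }
}
```

With this sketch both inputs come out as the same two-parameter URL, which is the behavior the test case expects from the old version.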
    
     
  • Anonymous

    Anonymous - 2010-03-06

    Oh, that was a quick reply.  Thanks!

    But don't worry about getting a fix out today. I'm not in any big rush on this.

     
  • Martin Jericho

    Martin Jericho - 2010-03-06

    This has been fixed for version 3.2.

    Until version 3.2 is officially released, the development version is available here: 
    http://jericho.htmlparser.net/temp/jericho-html-3.2-dev.zip

     
