
rewrite relative url inside html code

Help
le_tmp
2010-01-09
2013-04-27
  • le_tmp

    le_tmp - 2010-01-09

    Hey! Is it possible to rewrite HTML code with JTidy so that every relative URL ends up as an absolute URL? Thanks a lot!

     
  • Derrick Oswald

    Derrick Oswald - 2010-01-09

    I'm not sure about JTidy, but it is possible with HTMLParser.
    See the SiteCapturer application.

     
  • le_tmp

    le_tmp - 2010-01-09

    thank you very much!

    i tried it like this:
    SiteCapturer capturer = new SiteCapturer();
    capturer.setTarget("localSite");
    capturer.setSource("http://www.google.de");
    capturer.capture();

    That downloads the complete page. Can you give me a short overview of how to transform just the source HTML into a new HTML string with the links replaced (relative -> absolute)?

    Should I use a NodeFilter?
    Thanks in advance!

     
  • Derrick Oswald

    Derrick Oswald - 2010-01-09

    You need an in-memory image of your web page; HTMLParser provides a NodeList for that.
    Get the URL, file, or text string, pass it to the parser, and use a null filter to get everything:

    Parser parser = new Parser ();
    parser.setResource (…);
    // a null filter keeps every node
    NodeList list = parser.parse (null);

    Then you need to find your links:

    NodeFilter filter = new NodeClassFilter (LinkTag.class);
    NodeList links = list.extractAllNodesThatMatch (filter, true /* recursive */);

    Now cycle through your list and fix the link:

    for (int i = 0; i < links.size (); i++)
    {
        // the filter only matched LinkTag nodes, so the cast is safe
        LinkTag tag = (LinkTag) links.elementAt (i);
        … tag.getLink();
        … tag.setLink(<new link>);
    }

    Then output the whole page:
    System.out.println (list.toHtml ());
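The `<new link>` placeholder in the loop above is left open. One way to compute it is sketched below using only `java.net.URI` from the JDK, not any HTMLParser call. The class name `LinkResolver`, the method `toAbsolute`, and the example URLs are made up for illustration, and the sketch assumes the page URL is known and that the link is a syntactically valid URI.

```java
import java.net.URI;

public class LinkResolver {

    /**
     * Resolves a possibly relative link against the URL of the page
     * it was found on. Links that are already absolute come back unchanged.
     * URI.create throws IllegalArgumentException for malformed input
     * (for example a link containing a raw space).
     */
    public static String toAbsolute(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.example.com/docs/index.html";
        // http://www.example.com/docs/images/logo.png
        System.out.println(toAbsolute(page, "images/logo.png"));
        // http://www.example.com/about.html
        System.out.println(toAbsolute(page, "../about.html"));
        // http://www.example.com/css/site.css
        System.out.println(toAbsolute(page, "/css/site.css"));
        // http://other.example.org/x
        System.out.println(toAbsolute(page, "http://other.example.org/x"));
    }
}
```

Inside the loop, the result of a helper like this could be handed to `tag.setLink(...)`. The page URL should include a path (at least a trailing `/`), since `URI.resolve` is known to misbehave when the base has an empty path.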

     
  • le_tmp

    le_tmp - 2010-01-09

    thank you again!

    This helps if I want to change all <a></a> tags, but I also want to change <img src="">, <script src=""> and <link> tags.
    What they have in common is that the link sits in a "src" or "href" attribute.
    Is there a better way to rewrite only these attributes instead of writing a filter for every kind of tag?

    thanks again! :)

     
  • le_tmp

    le_tmp - 2010-01-09

    I tried it with CompositeTag, but that doesn't work because the <link ../> tag has no end tag.
    How can I access the <link> tag?

    Best regards

     
  • le_tmp

    le_tmp - 2010-01-09

    I used the Tag.class instead :) Thank you

     
  • Derrick Oswald

    Derrick Oswald - 2010-01-09

    You could try the visitor pattern.
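To illustrate the idea, here is a minimal, library-free sketch of a visitor that rewrites `href` and `src` on any tag, regardless of the tag name. HTMLParser has visitor support of its own. The types below (`SimpleTag`, `TagVisitor`, `AbsoluteUrlVisitor`) are hypothetical stand-ins invented for this example and are not HTMLParser classes, so the sketch shows the shape of the approach rather than the library's API.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class VisitorSketch {

    /** Hypothetical visitor interface: called once for every tag. */
    public interface TagVisitor {
        void visitTag(SimpleTag tag);
    }

    /** Hypothetical stand-in for a parsed tag: a name plus attributes. */
    public static class SimpleTag {
        public final String name;
        public final Map<String, String> attributes = new LinkedHashMap<>();

        public SimpleTag(String name) {
            this.name = name;
        }

        /** Double dispatch: the tag hands itself to the visitor. */
        public void accept(TagVisitor visitor) {
            visitor.visitTag(this);
        }
    }

    /** Makes href and src absolute, whatever kind of tag carries them. */
    public static class AbsoluteUrlVisitor implements TagVisitor {
        private final URI base;

        public AbsoluteUrlVisitor(String pageUrl) {
            this.base = URI.create(pageUrl);
        }

        @Override
        public void visitTag(SimpleTag tag) {
            for (String attr : new String[] {"href", "src"}) {
                String value = tag.attributes.get(attr);
                if (value != null) {
                    tag.attributes.put(attr, base.resolve(value).toString());
                }
            }
        }
    }

    public static void main(String[] args) {
        SimpleTag img = new SimpleTag("img");
        img.attributes.put("src", "pics/a.png");
        SimpleTag link = new SimpleTag("link");
        link.attributes.put("href", "/style/main.css");

        TagVisitor visitor = new AbsoluteUrlVisitor("http://www.example.com/dir/page.html");
        img.accept(visitor);
        link.accept(visitor);

        System.out.println(img.attributes);
        System.out.println(link.attributes);
    }
}
```

The point of the design is that the rewriting logic lives in one place and keys on attribute names, so no per-tag filter is needed for `<img>`, `<script>`, `<link>` or `<a>`.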

     
  • le_tmp

    le_tmp - 2010-01-11

    hello again!

    I have some problems when I try to set the parser's resource to JavaScript or PHP files:

    Internal Server Error (500) - unknown protocol: javascript
    Internal Server Error (500) - no protocol: index.php

    I can't find the problem.
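The two messages have the same wording as the `java.net.MalformedURLException` messages the JDK produces when a string without a scheme (`index.php`) or with an unsupported scheme (`javascript:…`) is turned into a `java.net.URL`. That suggests, though the thread does not confirm it, that raw link values were passed on as if they were full URLs. A minimal sketch of sorting link values by scheme before using them follows; the class `LinkKind` and its `classify` method are hypothetical names, and only JDK classes are used.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class LinkKind {

    public enum Kind { ABSOLUTE_HTTP, RELATIVE, OTHER_SCHEME, INVALID }

    /**
     * Classifies a link value by its URI scheme:
     * no scheme means relative (must be resolved against the page URL first),
     * http/https can be fetched directly, and anything else
     * (javascript:, mailto:, ...) is not a fetchable resource.
     */
    public static Kind classify(String link) {
        try {
            String scheme = new URI(link.trim()).getScheme();
            if (scheme == null) {
                return Kind.RELATIVE;
            }
            if (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https")) {
                return Kind.ABSOLUTE_HTTP;
            }
            return Kind.OTHER_SCHEME;
        } catch (URISyntaxException e) {
            // Not parseable as a URI at all, for example because of a raw space.
            return Kind.INVALID;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("index.php"));
        System.out.println(classify("javascript:void(0)"));
        System.out.println(classify("http://www.google.de"));
    }
}
```

With a check like this, `RELATIVE` values would be resolved against the page URL before being used as a resource, and `OTHER_SCHEME` values would be skipped.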

     
  • le_tmp

    le_tmp - 2010-01-11

    I have another question:

    How can I access the URLs in the following code?

    <style media="all" type="text/css">
    @import "./templates/subSilver/themes/resolution/standard.css";
    @import "./templates/subSilver/themes/default/css/all.css";
    </style>

    I tried to create a CssSelectorNodeFilter("@import"), but this doesn't work.

     
  • le_tmp

    le_tmp - 2010-01-11

    Ah… my mistake. The CssSelectorNodeFilter selects HTML elements using CSS selectors :)
    Sorry.

    Do you have an idea how I could access these @import URLs?
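The thread has no answer to this. One possible approach, assuming the text inside the `<style>` element has already been obtained as a plain string, is a regular expression over that text. The sketch below uses only `java.util.regex`; the class `CssImports` and the method `extractImportUrls` are hypothetical names, and the pattern is a simplification that handles the common quoted and `url(...)` forms but not URLs containing spaces or escaped characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CssImports {

    // Matches: @import "x";   @import 'x';   @import url(x);   @import url("x");
    private static final Pattern IMPORT = Pattern.compile(
            "@import\\s+(?:url\\(\\s*)?[\"']?([^\"')\\s;]+)[\"']?",
            Pattern.CASE_INSENSITIVE);

    /** Returns the URL of every @import rule, in document order. */
    public static List<String> extractImportUrls(String css) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMPORT.matcher(css);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String css =
                "@import \"./templates/subSilver/themes/resolution/standard.css\";\n"
              + "@import \"./templates/subSilver/themes/default/css/all.css\";\n";
        for (String url : extractImportUrls(css)) {
            System.out.println(url);
        }
    }
}
```

Each extracted value is a relative URL like any other, so it can be made absolute against the page URL in the same way as an `href` or `src` value.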

     
