Page visits

Help
tomdemets
2005-07-06
2013-04-27
  • tomdemets

    tomdemets - 2005-07-06

    I can connect to my homepage with the parser, but I find it odd that it does not generate a page hit in my website statistics. It's as if the parser had never been there, although I can read everything just fine.
    Does anyone have an idea why this is and how it can be changed?

     
    • Derrick Oswald

      Derrick Oswald - 2005-07-07

      It's possible that the page hit mechanism works on a secondary fetch, i.e. some <img> tag that is normally automatically fetched by a browser, but is not fetched by the parser without extra code in the program (it is not usually interesting for a program to fetch images). This kind of secondary fetch is done in the SiteCapturer example, where links are followed and resources are fetched.
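
To illustrate the secondary fetch described above, here is a minimal sketch that lists the image URLs a browser would request automatically after loading a page. It uses only the JDK rather than the htmlparser API, and the class name `SecondaryFetch`, the method `imageUrls`, and the example.com URLs are made up for illustration. A regular expression is only adequate for simple, well-formed markup.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SecondaryFetch {
    // Captures the quoted src attribute of an <img> tag (case-insensitive).
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns the absolute URL of every <img> resource referenced in the HTML. */
    public static List<String> imageUrls(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            // Resolve relative references against the page URL, as a browser would.
            urls.add(base.resolve(m.group(1)).toString());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical page containing a 1x1 hit-counter image.
        String html = "<html><body>"
                + "<img src=\"/counter.gif?page=home\" width=\"1\" height=\"1\">"
                + "</body></html>";
        for (String url : imageUrls(html, "http://www.example.com/index.html")) {
            System.out.println(url);
        }
    }
}
```

A program that wants the hit to register would then request each returned URL itself, for example by opening a connection to it. The sketch stops short of that so it runs without network access.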

       
      • tomdemets

        tomdemets - 2005-07-08

        I think I found what's causing it: the page loads, and a JavaScript function has to be executed for it to count as a page visit. Can htmlparser make this function execute?

         
        • Derrick Oswald

          Derrick Oswald - 2005-07-08

          No, not easily. It would mean parsing the JavaScript.
          There are two outstanding requests for enhancement that pertain to this...

          https://sourceforge.net/tracker/index.php?func=detail&aid=886862&group_id=24399&atid=381402

          http://sourceforge.net/tracker/index.php?func=detail&aid=1196079&group_id=24399&atid=381402

          ...but no one has attempted it yet. Some useful links if you want to try it:
          ECMAScript: http://www.ecma-international.org/publications/files/ecma-st/Ecma-262.pdf
          ANTLR: http://antlr.org/
          JavaCC: http://javacc.dev.java.net/
          FESI, an ECMAScript grammar for JavaCC: http://www.lugrin.ch/fesi/index.html

           
          • tomdemets

            tomdemets - 2005-07-09

            I have taken a look at the links, and the problems showing up there are indeed hard nuts to crack. But suppose there is a way: how could the script actually be executed? Let's say the code is a simple alert. Would you let Java show a Swing alert box? I also looked at AposTestCase.java, but it extends ParserTestCase, which I don't have :-). If you could mail it to me, I'd be happy to give it a try.

             
            • tomdemets

              tomdemets - 2005-07-09

              Derrick, if you send me an e-mail, please send it to tomNO_SPAM at codenation.be, since I don't look at my sourceforge mail.

               
            • Derrick Oswald

              Derrick Oswald - 2005-07-10

              Depending on what the program is trying to accomplish, the semantic meaning of the 'execution' can vary. For the most common use case of crawling and indexing, script that alters the page contents or adds hyperlinks would be very interesting. Your case may be generalizable as one of these. What is the URL of your home page (if it's OK to post it)?

              The ParserTestCase.java file should be found in the src.zip file included with each distribution:

              $ cd ~/htmlparser1_5/htmlparser/
              $ jar -tf src.zip | grep ParserTestCase
              src/org/htmlparser/tests/ParserTestCase.java

               
    • tomdemets

      tomdemets - 2005-07-13

      I've taken a second look at the JavaScript parsing and I must say I'm amazed at how much work this would require.
      I have to confess that JavaScript support is not a big issue for parsing my homepage; I just thought it would be nice to be able to parse it. If you would like to visit it, the link is: http://www.codenation.be. My PageRank in Google is currently very low, so if anyone could give me a hand and put a link to my page, I would gladly put one back to yours. I've experimented a bit with the test case class in the past few days. I've also thought about functions, and about whether it would be difficult to parse external JavaScript. It looks like you would have to translate the JavaScript into internal htmlparser commands, e.g.: document.form1.textbox1.text = "tom";
      would mean: look up form1, look up textbox1, and modify textbox1 so it would look like: <input type="text" id="textbox1" value="tom" />. Am I correct?
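
The translation step asked about above could be sketched roughly as follows. This is only an illustration under strong assumptions: it recognizes a single statement shape (`document.<form>.<field>.<property> = "<literal>";`), it models the page as a plain map rather than htmlparser nodes, and the class `FormAssignment` and its methods are made up for this example. In browsers the property of a text input is `value`; the sketch accepts both `text` (as in the post) and `value`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormAssignment {
    // Only matches: document.<form>.<field>.(text|value) = "<literal>";
    private static final Pattern ASSIGNMENT = Pattern.compile(
            "document\\.(\\w+)\\.(\\w+)\\.(text|value)\\s*=\\s*\"([^\"]*)\"\\s*;?");

    // Hypothetical stand-in for the parsed page: form name -> field id -> value.
    private final Map<String, Map<String, String>> forms = new LinkedHashMap<>();

    /** Registers an empty text field, as if it had been found while parsing. */
    public void addTextField(String form, String field) {
        forms.computeIfAbsent(form, k -> new LinkedHashMap<>()).put(field, "");
    }

    /** Applies one assignment; returns false if it is not recognized or the target is unknown. */
    public boolean apply(String statement) {
        Matcher m = ASSIGNMENT.matcher(statement.trim());
        if (!m.matches()) {
            return false;
        }
        Map<String, String> fields = forms.get(m.group(1)); // look up the form
        if (fields == null || !fields.containsKey(m.group(2))) { // look up the field
            return false;
        }
        fields.put(m.group(2), m.group(4)); // modify the field's value
        return true;
    }

    /** Renders a registered field as an <input> tag (assumes the field exists). */
    public String render(String form, String field) {
        return "<input type=\"text\" id=\"" + field + "\" value=\""
                + forms.get(form).get(field) + "\" />";
    }

    public static void main(String[] args) {
        FormAssignment page = new FormAssignment();
        page.addTextField("form1", "textbox1");
        page.apply("document.form1.textbox1.text = \"tom\";");
        System.out.println(page.render("form1", "textbox1"));
    }
}
```

Anything beyond this one statement shape (variables, function calls, external script files) would need a real JavaScript parser, which is where the bulk of the work lies.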

       
