Fernando

What's happening?

  • Followup: RE: Web Harves keeps hanging - what to do?

    Hi Pat, I think the problem is the amount of memory allocated to Web-Harvest for the job. You can try to work around it with a different approach: run the garbage collector inside the loop with <script> System.gc(); </script>, and write the results you want directly to a file rather than collecting them in a var-def tag: <file action="write"...

    2009-07-24 12:35:29 UTC in WebHarvest - web data extraction tool

  • Followup: RE: Colon in url

    I don't believe this is a problem for you at all. If the Web-Harvest parser is giving you an error message, define the URL inside a CDATA block as follows: <var-def name="myurl"> <![CDATA[ http://www.allmusic.com/cg/amg.dll?p=amg&sql=11:4q5tk6kx9krh ]]> </var-def>.

    2009-07-22 15:28:12 UTC in WebHarvest - web data extraction tool

  • Followup: RE: session

    OK, first of all, to get the next page all you have to do is: <var-def name="page_courante"> <template> <http url='${sys.fullUrl(site.toString(),next.toString())}'/> </template> </var-def> In your code, you have to be careful with the loops, because if you use <while> and then <var-def>, every time the loop is executed, the...

    2009-07-02 13:53:28 UTC in WebHarvest - web data extraction tool

  • Followup: RE: Remove a div element in generated XML

    Hi, I was wondering how to do that, and I got so curious that I came up with something like this (we cannot use the prunetags attribute of the html-to-xml processor because it would get rid of all tags): <config> <var-def name="page"> <![CDATA[ <html> <title> </title> <body> <div id="div1">.

    2009-06-29 13:26:12 UTC in WebHarvest - web data extraction tool

  • Followup: RE: Problem with memory..

    Sorry, I have a correction to make. When I suggested <file path="results.txt" action="write"> <!-- your loop here --> </file>, I was wrong. It won't help, because the information is saved to the file only after the loop finishes. To correct that, you should use the file processor inside the loop: <loop> <list>...

    2009-06-29 12:51:19 UTC in WebHarvest - web data extraction tool

  • Followup: RE: Problem with memory..

    If you don't want to change the amount of memory that Web-Harvest uses, you can request a garbage collection at any point in your XML script file. Just add a BeanShell script inside the loop body, or in any part that causes it to freeze or crash: <script> <![CDATA[ System.gc(); ]]> </script> I had the same problem with one big application...

    2009-06-26 20:42:47 UTC in WebHarvest - web data extraction tool

  • Followup: RE: F**** session

    Please, let's watch our language. It hurt my eyes to read that.

    2009-06-25 15:08:27 UTC in WebHarvest - web data extraction tool

  • Followup: RE: FileNotFoundException

    Hi, I use a forward slash ("/") instead of the double backslash ("\\"): "C:/Users/ADMIN/Desktop/AllInOne.xml". It looks cleaner and avoids any misunderstandings. But in your case the file name might be wrong, or the file might be in a different directory. Cheers, Fernando Abreu.

    2009-06-17 17:57:21 UTC in WebHarvest - web data extraction tool

  • Followup: RE: Combine Webharvest with Java-code?

    Hi Hans, I have a bunch of applications that use Java with Web-Harvest, sending and receiving information from each other, and also some Java code inside the XML config file (through the BeanShell language). Actually, you would miss a lot of benefits if you did not use it inside a Java application. For example, how would you save the results in a database? Regards, Fernando Abreu.

    2009-06-17 17:46:10 UTC in WebHarvest - web data extraction tool

  • Followup: RE: Reading XML in browser with Xpath

    Hi, actually the problem is in: "declare variable $doc as node() external; let $expediente := $doc//cbc:ContractFileID/text() return " Because "//cbc:ContractFileID/text() " is an XPath expression and the ":" character has a different meaning there, you cannot use it. Also, the html-to-xml processor changed this tag in the xml file from...

    2009-06-17 17:42:12 UTC in WebHarvest - web data extraction tool
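
The memory tips in the posts above (write each result to a file inside the loop, and request a garbage collection per iteration) can be sketched roughly as follows. This is only a minimal illustration, not the configuration from the original threads: the start URL, the XPath expressions, the variable names baseUrl and link, and the results.txt path are all placeholders, and System.gc() is merely a hint that the JVM is free to ignore.

```xml
<config>
    <!-- Hypothetical start page; replace with the site you are harvesting -->
    <var-def name="baseUrl">http://example.com/</var-def>

    <loop item="link">
        <!-- Collect the links to visit (placeholder XPath) -->
        <list>
            <xpath expression="//a/@href">
                <html-to-xml>
                    <http url="${baseUrl}"/>
                </html-to-xml>
            </xpath>
        </list>
        <body>
            <!-- Append this iteration's result to the file right away,
                 instead of accumulating everything in a var-def -->
            <file action="append" path="results.txt">
                <xpath expression="//title/text()">
                    <html-to-xml>
                        <http url="${sys.fullUrl(baseUrl.toString(), link.toString())}"/>
                    </html-to-xml>
                </xpath>
            </file>

            <!-- Ask the JVM for a garbage collection after each page -->
            <script><![CDATA[ System.gc(); ]]></script>
        </body>
    </loop>
</config>
```

Because the file processor sits inside the loop body with action="append", each page's result is written as soon as it is extracted, which is the correction described in the "Problem with memory.." follow-up.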

About Me

  • 2009-03-19 (8 months ago)
  • 2445863
  • fxabreu (My Site)
  • Fernando
