IO Exception

Help
good
2006-11-16
2012-09-04
  • good

    good - 2006-11-16

    I am getting an IO exception, shown below along with my config file. I have not put any & in the config file, but one is added to the end of the URL automatically. How can I avoid this?
    Config File.
    <?xml version="1.0" encoding="UTF-8"?>

    <config charset="UTF-8">

    <include path="functions.xml"/>

    <!-- defines search keyword and start URL -->
    <var-def name="search">travel chennai</var-def>
    <var-def name="url">
        <template>http://http://www.guruji.com/local?q=${search}</template>
    </var-def>

    <!-- collects all travel URLs -->
    <var-def name="imgLinks">
        <call name="download-multipage-list">
            <call-param name="pageUrl"><var name="url"/></call-param>
            <call-param name="nextXPath">//a[.='Next']/</call-param>
            <call-param name="maxloops">5</call-param>
        </call>
    </var-def>

    <!-- download images and saves them to the files -->
    <loop item="link" index="i" filter="unique">
        <list>
            <var name="imgLinks"/>
        </list>
        <body>
            <file action="write" type="binary" path="guruji/${search}.txt">
                <http url="${sys.fullUrl(url, link)}"/>
            </file>
        </body>
    </loop>
    

    </config>

    Exception
    Exception in thread "main" org.webharvest.exception.HttpException: IO error during HTTP execution for URL: http://http://www.guruji.com/local?q=travel+chennai&
    at org.webharvest.runtime.web.HttpClientManager.execute(Unknown Source)
    at org.webharvest.runtime.processors.HttpProcessor.execute(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.run(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.executeBody(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.getBodyTextContent(Unknown Source)
    at org.webharvest.runtime.processors.HtmlToXmlProcessor.execute(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.run(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.executeBody(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.getBodyListContent(Unknown Source)
    at org.webharvest.runtime.processors.VarDefProcessor.execute(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.run(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.executeBody(Unknown Source)
    at org.webharvest.runtime.processors.EmptyProcessor.execute(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.run(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.executeBody(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.getBodyListContent(Unknown Source)
    at org.webharvest.runtime.processors.WhileProcessor.execute(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.run(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.executeBody(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.getBodyListContent(Unknown Source)
    at org.webharvest.runtime.processors.ReturnProcessor.execute(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.run(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.executeBody(Unknown Source)
    at org.webharvest.runtime.processors.CallProcessor.execute(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.run(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.executeBody(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.getBodyListContent(Unknown Source)
    at org.webharvest.runtime.processors.VarDefProcessor.execute(Unknown Source)
    at org.webharvest.runtime.processors.BaseProcessor.run(Unknown Source)
    at org.webharvest.runtime.Scraper.execute(Unknown Source)
    at org.webharvest.runtime.Scraper.execute(Unknown Source)
    at CommandLine.main(Unknown Source)
    Caused by: java.net.UnknownHostException: http
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.SocksSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at java.net.Socket.<init>(Unknown Source)
    at java.net.Socket.<init>(Unknown Source)
    at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:79)
    at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:121)
    at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:704)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:382)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:168)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)

     
    • Cal

      Cal - 2006-11-20

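      The line
      <template>http://http://www.guruji.com/local?q=${search}</template>
      is the likely problem. The URL contains 'http://' twice, so the HTTP client treats "http" as the host name, which matches the "UnknownHostException: http" in your stack trace.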
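
      A minimal sketch of the corrected definition, assuming the intended host is www.guruji.com and the rest of the config stays as posted:

      ```xml
      <!-- start URL with a single scheme prefix; ${search} is the variable defined above -->
      <var-def name="url">
          <template>http://www.guruji.com/local?q=${search}</template>
      </var-def>
      ```

      With only one 'http://' in the template, the host should resolve as www.guruji.com instead of "http".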

      cheers.

       
    • good

      good - 2006-11-20

      Thanks for the reply.
      I found the problem and corrected it.

       
