crawl website mostly ends with error

2008-03-25
2013-05-13
  • Dmitry Buzolin

    Dmitry Buzolin - 2008-03-25

    Hi everybody!

    I'm trying the examples from the source code and keep getting the same error:

    $ ./webcrawler.bat -o out -depth 9 http://www.dice.com
    16 [main] INFO org.ontoware.rdf2go.RDF2Go - Using ModelFactory 'class org.openrdf.rdf2go.RepositoryModelFactory' which was loaded via org.ontoware.rdf2go.impl.StaticBinding.
    Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in path at index 87: http://fls.doubleclick.net/activityi;src=982522;type=dicec050;cat=dicem975;ord=1;num='+ a + '?
            at org.ontoware.rdf2go.model.node.impl.URIImpl.<init>(URIImpl.java:51)
            at org.ontoware.rdf2go.model.node.impl.URIImpl.<init>(URIImpl.java:36)
            at org.ontoware.rdf2go.model.impl.AbstractModel.createURI(AbstractModel.java:139)
            at org.semanticdesktop.aperture.crawler.web.WebCrawler.processLinks(WebCrawler.java:544)
            at org.semanticdesktop.aperture.crawler.web.WebCrawler.processQueue(WebCrawler.java:320)
            at org.semanticdesktop.aperture.crawler.web.WebCrawler.crawlObjects(WebCrawler.java:135)
            at org.semanticdesktop.aperture.crawler.base.CrawlerBase.crawl(CrawlerBase.java:197)
            at org.semanticdesktop.aperture.examples.ExampleWebCrawler.crawl(ExampleWebCrawler.java:130)
            at org.semanticdesktop.aperture.examples.ExampleWebCrawler.main(ExampleWebCrawler.java:141)
    Caused by: java.net.URISyntaxException: Illegal character in path at index 87: http://fls.doubleclick.net/activityi;src=982522;type=dicec050;cat=dicem975;ord=1;num='+ a + '?
            at java.net.URI$Parser.fail(URI.java:2809)
            at java.net.URI$Parser.checkChars(URI.java:2982)
            at java.net.URI$Parser.parseHierarchical(URI.java:3066)
            at java.net.URI$Parser.parse(URI.java:3014)
            at java.net.URI.<init>(URI.java:578)
            at org.ontoware.rdf2go.model.node.impl.URIImpl.<init>(URIImpl.java:49)
            ... 8 more

    What is wrong here?

     
    • Antoni Mylka

      Antoni Mylka - 2008-03-25

      That's weird. It seems that some links on the website point to URLs that are malformed by the standards of the parser built into the java.net.URI class. I'll have a look and get back to you as soon as I know more.
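
To illustrate the diagnosis above, here is a minimal sketch of how a link could be checked with java.net.URI before it is turned into an RDF2Go URI. This is not Aperture's actual code or fix; the class LinkFilter and its methods are hypothetical names used only for this example. The unescaped space at index 87 (right after `'+`) is what the parser rejects, and it most likely comes from a JavaScript string concatenation that was picked up as if it were a plain link.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Aperture or RDF2Go.
class LinkFilter {

    // Returns -1 if java.net.URI accepts the string, otherwise the
    // index reported by URISyntaxException (may itself be -1 if unknown).
    static int firstIllegalIndex(String link) {
        try {
            new URI(link);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    // True if the link can be parsed by java.net.URI.
    static boolean isParsable(String link) {
        try {
            new URI(link);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Keeps only the links that java.net.URI can parse, so that one
    // malformed link does not abort the whole crawl.
    static List<String> keepParsable(List<String> links) {
        List<String> result = new ArrayList<>();
        for (String link : links) {
            if (isParsable(link)) {
                result.add(link);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The link from the stack trace in the first post.
        String bad = "http://fls.doubleclick.net/activityi;src=982522;"
                + "type=dicec050;cat=dicem975;ord=1;num='+ a + '?";
        String good = "http://www.dice.com";

        System.out.println("bad parsable?  " + isParsable(bad));
        System.out.println("illegal index: " + firstIllegalIndex(bad));
        System.out.println("good parsable? " + isParsable(good));
    }
}
```

Skipping such links is only one possible design choice; a crawler could also log them or try to percent-encode the illegal characters before retrying.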

       
