Page requested: http://www.testwebsite/test-page/( (404)

Help
Mark
2013-10-15
2013-10-16
  • Mark

    Mark - 2013-10-15

    Page requested: http://www.testwebsite/test-page/( (404)

    Note that the URL above isn't an actual working link, just an example. I only needed to provide one instance here; the same thing happens with multiple URLs.

    The crawler is appending an opening parenthesis "(" to the end of some URLs, and those requests return 404.

    Any idea what's causing this?

    Thanks,
    Mark

     

    Last edit: Mark 2013-10-15
  • Anonymous

    Anonymous - 2013-10-16

    Hi Mark,

    The crawler sometimes finds "links" in pages that are not links at all, for example in JavaScript code (something like image.src = "("+xyz....). This is a known issue; see this bug: http://sourceforge.net/p/phpcrawl/bugs/25/

    You could try setting enableAggressiveLinkSearch() to FALSE; maybe it helps in your case.

    Otherwise you can just ignore these links, as they do not exist.

     
  • Mark

    Mark - 2013-10-16

    Thanks.

    I tried setting enableAggressiveLinkSearch() to FALSE, but I'm still getting the same issue.

    I was taking a look at the class files to locate the function that parses the URLs. My idea was to add a regular expression that strips any parenthesis or backslash from the end of URLs and returns false.

    Would you be able to give me a little bit of a starting point on which part of class to take a look at?

    Thanks,
    Mark

     
  • Mark

    Mark - 2013-10-16

    Ah, never mind, I found $DocInfo->url. I can go ahead and parse that.
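
The check Mark describes could be sketched roughly as follows. This is only an illustration, not code from the thread or from PHPCrawl: the two helper functions are hypothetical names, and the usage shown in the trailing comment assumes the check runs inside an overridden handleDocumentInfo($DocInfo) method, where $DocInfo->url holds the requested URL.

```php
<?php
// Hypothetical helper (not part of PHPCrawl): returns true when a URL ends
// in a parenthesis or a backslash, which suggests a falsely detected link.
function hasSpuriousTrailingChar(string $url): bool
{
    // Character class: "(", ")" or a literal backslash, anchored at the very end.
    return preg_match('/[()\\\\]\z/', $url) === 1;
}

// Hypothetical helper: removes any run of those characters from the end of a URL.
function stripSpuriousTrailingChars(string $url): string
{
    return rtrim($url, '()\\');
}

// Assumed usage inside an overridden handleDocumentInfo($DocInfo):
//     if (hasSpuriousTrailingChar($DocInfo->url)) {
//         return; // skip reporting this bogus URL
//     }
```

Keep in mind that some legitimate URLs do end in ")", so a filter like this may also hide real pages; whether that trade-off is acceptable depends on the site being crawled.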

     
