#38 2.3.0 dramatically slower for scanning site

v1.0 (example)


I'm using the latest version, 2.3.0, on a CentOS box. With a previous version (2.2.1?) the scan completed in around 2 hours, but with 2.3.0 the scan is up to 16 hours and still going. This is just the link scanning stage, not the attack stage.

Please let me know if I can send you any specific logs to help with the issue.




  • AdamGreer

    AdamGreer - 2014-01-09


    Just an update: the scan is now on hour 36 for the large site (around 500 pages). It's also on hour 16 for a much smaller site (80ish pages). The scan runs fine if I only select the --scope page option, so nothing appears to be wrong with the installation. Unless I missed a component that would cause this?

  • devloop

    devloop - 2014-01-14

    Hi !

    I was AFK for some time...

    Could you send the output of the scan using the "-v 2" option so I can take a look at the URLs? Replace the hostname if it is sensitive.

    Also keep in mind that the browsing engine has been improved in this version and every module has more payloads, so in every case it will take more time than the former version.
    But without the URLs I can't say now if it's a bug or not.


  • AdamGreer

    AdamGreer - 2014-01-14


    No problem, glad you got back to me.

    I've attached a scan of a site in development. Feel free to test against it, as it's not live and the server is my own. The scans are from 2.2.1 and 2.3.0. The 2.2.1 scan finished the link portion in 10 seconds or so, whereas the 2.3.0 scan was left running for around 20 minutes and showed no signs of finishing. Interestingly, in the output log each link took between 1 and 3 seconds, whereas 2.2.1 handled many, many links per second.

  • devloop

    devloop - 2014-01-16

    Hello !

    The problem is related to the JS parsing, which is too unreliable for the moment.
    You should edit lswww.py and modify the handle_data method of the LinkParser class so it simply returns.

    Replace the following lines :
        if self.inscript:
            candidates = re.findall(r'"([A-Za-z0-9_=#&%.+\?/-]*)"', data)
            candidates += re.findall(r"'([A-Za-z0-9_=#&%.+\?/-]*)'", data)
            for jstr in candidates:
                if ('/' in jstr or '.' in jstr or '?' in jstr) and jstr not in self.common_js_strings:

    with :

    You will also have to add the sublime protocol to the list of bad protocols.
    In lswww, for the correctlink method, add the line :
    llien.startswith("subl:") or
    after the line :
    llien.startswith("gopher:") or
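As a rough illustration of this kind of prefix filter, here is a minimal Python sketch. It is not the actual correctlink code: the function name `has_bad_protocol`, the lower-casing step, and every entry in the tuple other than "gopher:" and "subl:" are assumptions made for the example.

```python
# Hypothetical prefix filter; not the real lswww correctlink implementation.
BAD_PROTOCOLS = (
    "mailto:",      # assumed entry, for illustration only
    "javascript:",  # assumed entry, for illustration only
    "gopher:",      # mentioned in the post
    "subl:",        # the Sublime Text scheme to add
)


def has_bad_protocol(link):
    """Return True when the link starts with a scheme the crawler should skip."""
    # Compare in lower case so "SUBL:" is caught too (an assumption of this sketch).
    return link.strip().lower().startswith(BAD_PROTOCOLS)


print(has_bad_protocol("subl://open?url=file.py"))
print(has_bad_protocol("http://example.test/index.html"))
```

Passing a tuple to `str.startswith` checks all prefixes in one call, which avoids a long chain of `or` conditions.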

    Unfortunately the scan still lasts forever because there's some kind of "append forever" loop that generates links like /%5C%22/cdn-cgi/ then /%5C%22/cdn-cgi/%5C%22/cdn-cgi/ then /%5C%22/cdn-cgi/%5C%22/cdn-cgi/%5C%22/cdn-cgi/ and so on...
    So you will have to press Ctrl+C to skip the scan and launch the attacks.

    Most of the bad links are found on an error page like this one :

    It will take some time to fix so I can't give you a patch now.
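To make the "append forever" behaviour concrete, here is a small Python sketch of how such a loop can arise. It assumes that every generated URL returns an error page containing the same bogus relative link; the host `example.test` and the function name `runaway_urls` are made up for the example, and this is not Wapiti's own code.

```python
from urllib.parse import urljoin

# Bogus relative link, taken from the URLs quoted above.
BOGUS_LINK = "%5C%22/cdn-cgi/"


def runaway_urls(start_url, rounds):
    """Resolve the same relative link against each newly found URL."""
    urls = []
    current = start_url
    for _ in range(rounds):
        # Each error page sits one level deeper, so the link resolves
        # to a URL that has never been seen before.
        current = urljoin(current, BOGUS_LINK)
        urls.append(current)
    return urls


for url in runaway_urls("http://example.test/", 3):
    print(url)
```

Because every resolved URL is new, a "have I already visited this?" check never stops the crawl on its own.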


  • devloop

    devloop - 2014-01-26

    Hi !

    Revision 332 (see https://sourceforge.net/p/wapiti/code/332/ ) adds a new protection against endless scans to fix this issue.
    Each scanned URL now has a counter of how many links were browsed before reaching the current URL (you can see it as how many referers there are since the URL given as an argument). If a URL exceeds the maximum scan depth (default is 40), then the URL won't be fetched.

    A new option (-d or --depth) has been added to set this maximum value.
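The idea of a per-URL depth counter can be sketched in Python as below. This is only an illustration under stated assumptions, not the code from revision 332: `crawl`, `get_links`, and `endless_site` are hypothetical names, and the fake site simply produces one new, longer URL per page.

```python
from collections import deque


def crawl(start_url, get_links, max_depth=40):
    """Breadth-first crawl that refuses to fetch URLs deeper than max_depth.

    get_links is a hypothetical callable returning the links found on a page.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, links followed to reach it)
    fetched = []
    while queue:
        url, depth = queue.popleft()
        if depth > max_depth:
            continue  # too far from the start URL: do not fetch
        fetched.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return fetched


def endless_site(url):
    """Fake site where every page links to a new, longer URL."""
    return [url + "%5C%22/cdn-cgi/"]


pages = crawl("http://example.test/", endless_site, max_depth=40)
print(len(pages))
```

With the default of 40, this sketch fetches the start URL plus 40 further levels and then stops, even though the fake site never runs out of new links.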

    You can svn checkout this revision or wait for a future stable release.

  • devloop

    devloop - 2014-01-26
    • status: open --> closed-fixed
