Hello, I recently had a problem under the topic title "Link that redirect to other site not captured". I now have another problem :)
I have a test website for crawling that has 31 URLs. In the handle_page_data function, I add each URL found to a list, and it successfully finds all 31 links. However, I edited the code on one of the pages of the test site so that it redirects to itself, but with the https scheme. So it is as if 1 of the 31 links now uses https.
However, when I now crawl the website, it adds all the links to my array twice, once with http and once with https, even though only one of the links is using https. What I want is to capture the 30 links with http and the 1 link with https.
For example, the links it captures are:
15:52:30:0230 ,begin_crawl, http://127.0.0.1/testsitewithvulns/
15:52:30:0230 ,begin_crawl, http://127.0.0.1/testsitewithvulns/index.php
15:52:30:0230 ,begin_crawl, http://127.0.0.1/testsitewithvulns/search.php
15:52:31:0231 ,begin_crawl, http://127.0.0.1/testsitewithvulns/login.php
15:52:31:0231 ,begin_crawl, http://127.0.0.1/testsitewithvulns/about_us.php
15:52:31:0231 ,begin_crawl, http://127.0.0.1/testsitewithvulns/privacy.php
15:52:31:0231 ,begin_crawl, http://127.0.0.1/testsitewithvulns/contact_us.php
15:52:31:0231 ,begin_crawl, http://127.0.0.1/testsitewithvulns/products.php
15:52:31:0231 ,begin_crawl, http://127.0.0.1/testsitewithvulns/register.php
15:52:32:0232 ,begin_crawl, http://127.0.0.1/testsitewithvulns/vulnerabilities.php
15:52:32:0232 ,begin_crawl, http://127.0.0.1/testsitewithvulns/loginResult.php
15:52:32:0232 ,begin_crawl, https://127.0.0.1/testsitewithvulns/privacy.php
15:52:32:0232 ,begin_crawl, http://127.0.0.1/testsitewithvulns/processContactUsForm.php
15:52:32:0232 ,begin_crawl, http://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1111
15:52:32:0232 ,begin_crawl, http://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1112
15:52:33:0233 ,begin_crawl, http://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1113
15:52:33:0233 ,begin_crawl, http://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1114
15:52:33:0233 ,begin_crawl, http://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1115
15:52:33:0233 ,begin_crawl, http://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1116
15:52:33:0233 ,begin_crawl, http://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1117
15:52:34:0234 ,begin_crawl, http://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1118
15:52:34:0234 ,begin_crawl, http://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1119
15:52:34:0234 ,begin_crawl, http://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1120
15:52:34:0234 ,begin_crawl, http://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1121
15:52:34:0234 ,begin_crawl, http://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1122
15:52:35:0235 ,begin_crawl, http://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1123
15:52:35:0235 ,begin_crawl, http://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1124
15:52:35:0235 ,begin_crawl, http://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1125
15:52:35:0235 ,begin_crawl, http://127.0.0.1/testsitewithvulns/registerResult.php
15:52:35:0235 ,begin_crawl, http://127.0.0.1/testsitewithvulns/insecure_direct_object_reference1.php?fileToDisplay=welcome.txt
15:52:35:0235 ,begin_crawl, http://127.0.0.1/testsitewithvulns/insecure_direct_object_reference2.php?message=Other%20Products&fileToDisplay=data.csv&otherFileToDisplay=sale.txt
15:52:35:0235 ,begin_crawl, http://127.0.0.1/testsitewithvulns/unvalidated_redirect1.php?redirect=index.php
15:52:36:0236 ,begin_crawl, https://127.0.0.1/testsitewithvulns/search.php
15:52:36:0236 ,begin_crawl, https://127.0.0.1/testsitewithvulns/login.php
15:52:36:0236 ,begin_crawl, https://127.0.0.1/testsitewithvulns/index.php
15:52:36:0236 ,begin_crawl, https://127.0.0.1/testsitewithvulns/about_us.php
15:52:36:0236 ,begin_crawl, https://127.0.0.1/testsitewithvulns/contact_us.php
15:52:36:0236 ,begin_crawl, https://127.0.0.1/testsitewithvulns/products.php
15:52:36:0236 ,begin_crawl, https://127.0.0.1/testsitewithvulns/register.php
15:52:37:0237 ,begin_crawl, https://127.0.0.1/testsitewithvulns/vulnerabilities.php
15:52:37:0237 ,begin_crawl, https://127.0.0.1/testsitewithvulns/loginResult.php
15:52:37:0237 ,begin_crawl, https://127.0.0.1/testsitewithvulns/processContactUsForm.php
15:52:37:0237 ,begin_crawl, https://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1111
15:52:37:0237 ,begin_crawl, https://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1112
15:52:37:0237 ,begin_crawl, https://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1113
15:52:37:0237 ,begin_crawl, https://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1114
15:52:38:0238 ,begin_crawl, https://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1115
15:52:38:0238 ,begin_crawl, https://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1116
15:52:38:0238 ,begin_crawl, https://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1117
15:52:38:0238 ,begin_crawl, https://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1118
15:52:38:0238 ,begin_crawl, https://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1119
15:52:38:0238 ,begin_crawl, https://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1120
15:52:38:0238 ,begin_crawl, https://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1121
15:52:39:0239 ,begin_crawl, https://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1122
15:52:39:0239 ,begin_crawl, https://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1123
15:52:39:0239 ,begin_crawl, https://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1124
15:52:39:0239 ,begin_crawl, https://127.0.0.1/testsitewithvulns/dataDrillDown.php?ID=1125
15:52:39:0239 ,begin_crawl, https://127.0.0.1/testsitewithvulns/registerResult.php
15:52:39:0239 ,begin_crawl, https://127.0.0.1/testsitewithvulns/insecure_direct_object_reference1.php?fileToDisplay=welcome.txt
15:52:39:0239 ,begin_crawl, https://127.0.0.1/testsitewithvulns/insecure_direct_object_reference2.php?message=Other%20Products&fileToDisplay=data.csv&otherFileToDisplay=sale.txt
15:52:39:0239 ,begin_crawl, https://127.0.0.1/testsitewithvulns/unvalidated_redirect1.php?redirect=index.php
As you can see, all links are captured twice: once with http and once with https.
Any replies or suggestions would be greatly appreciated!
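One way to check a captured list like the one above is to group the URLs by everything except the scheme and see which ones appear under both http and https. The following is only a minimal sketch in Python (not the crawler's own language or API); the function name group_by_scheme and the short captured list are hypothetical, with the sample URLs copied from the log above.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_scheme(urls):
    """Map each scheme-less URL (host + path + query) to the set of schemes seen for it."""
    seen = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        key = parts.netloc + parts.path
        if parts.query:
            key += "?" + parts.query
        seen[key].add(parts.scheme)
    return seen


# Hypothetical sample: a few entries taken from the crawl log above.
captured = [
    "http://127.0.0.1/testsitewithvulns/",
    "http://127.0.0.1/testsitewithvulns/privacy.php",
    "https://127.0.0.1/testsitewithvulns/privacy.php",
    "http://127.0.0.1/testsitewithvulns/search.php",
    "https://127.0.0.1/testsitewithvulns/search.php",
]

groups = group_by_scheme(captured)

# URLs that were captured under both schemes.
both = sorted(key for key, schemes in groups.items() if schemes == {"http", "https"})
for key in both:
    print("captured twice:", key)
```

Running this over the full list would show whether every page was picked up under both schemes or only some of them.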
Hi again,
could you maybe upload your test site somewhere, so that it's possible to reproduce your report?
And are you really sure that the other 30 https links don't appear anywhere on your site?
Hi Again,
Thanks for the reply! I uploaded the site to a free host. It is located at http://webvulscanner.net84.net/testsitewithvulns/
If you click on the privacy link, you will see that it redirects to itself but with https. However, it seems to be timing out on the free hosting site, although it works fine on localhost.
I just did a crawl on the uploaded site and it actually finds only 32 links, so it seems to be working there: 31 links with http and 1 with https. However, when I crawl the exact same site on my localhost using XAMPP, it picks up 62 links: 31 links with http and the same 31 links with https. Do you have any idea why this might be happening? There are definitely no links to them on my localhost site, as it is the exact same source code as the uploaded site.
Thanks in advance!
Sorry, problem resolved! Once one of the links was redirected to https, the user (or crawler in this case) would stay on https when browsing the site, because all the links on the site are relative paths and are therefore resolved against the https page, keeping https as the scheme of the URL. Thanks for the help anyway!
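For anyone hitting the same thing, the effect can be reproduced outside the crawler. The sketch below uses Python's standard urllib.parse.urljoin purely for illustration (it is not the crawler's code); the relative hrefs are an assumption about how the test site writes its links, and resolve_all is a hypothetical helper name.

```python
from urllib.parse import urljoin

# Page URLs taken from the crawl log; privacy.php is the page that redirects to https.
http_page = "http://127.0.0.1/testsitewithvulns/index.php"
https_page = "https://127.0.0.1/testsitewithvulns/privacy.php"

# Assumed form of the site's links: relative paths with no scheme or host.
relative_links = ["search.php", "login.php", "dataDrillDown.php?ID=1111"]


def resolve_all(base, hrefs):
    """Resolve each relative href against the URL of the page it was found on."""
    return [urljoin(base, href) for href in hrefs]


from_http = resolve_all(http_page, relative_links)
from_https = resolve_all(https_page, relative_links)

# The same hrefs produce different absolute URLs depending on the page's scheme.
for a, b in zip(from_http, from_https):
    print(a, "|", b)
```

Because a relative link inherits the scheme of the page it sits on, every link found on the https version of privacy.php becomes a new https URL, and the crawler treats it as distinct from its http twin. If only the http versions are wanted, the site would need absolute http links or the crawler would need a rule to skip or rewrite https URLs.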