#6595 Prevent spiders from requesting tarballs

forge-sep-06
closed
stability (31)
General
2
2013-08-28
2013-08-22
Dave Brondsema
No

The following log entries show spiders requesting tarball creation, which is unnecessary and wastes resources. We should prevent it. The links already carry rel=nofollow, but crawlers are apparently ignoring it. I think the best solution is to require a POST request for that URL.

"GET /p/z-i/code-0/208/tarball HTTP/1.0" 200 16400 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
"GET /p/jhotdraw/svn/729/tarball HTTP/1.0" 200 17834 "-" "msnbot/0.01 (+http://search.msn.com/msnbot.htm)"
"GET /p/fourpane/git4pane/ci/ec65df3a5ff2ec7be011c0722286e766c2b76d94/tarball HTTP/1.0" 200 18137 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.3; http://www.majestic12.co.uk/bot.php?+)"
"GET /u/lluct/me722-cm/ci/0aa649648a00979ad6ca9e9d61df4e44eb694259/tarball?path=/external/clang HTTP/1.0" 200 17918 "-" "YisouSpider"
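For anyone who wants to measure how often this happens, here is a rough Python sketch that flags log lines like the ones above. The regular expression is fitted only to the excerpt format shown in this ticket, and the `BOT_MARKERS` list and the function name are hypothetical and certainly incomplete.

```python
import re

# Matches the quoted request line, status, size, referrer, and user agent
# as they appear in the excerpts above (not a full access-log parser).
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) (?P<size>\d+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, incomplete substrings that suggest a crawler user agent.
BOT_MARKERS = ("bot", "spider", "crawl")


def is_spider_tarball_request(line):
    """Return True if the line is a GET on a .../tarball URL from a likely crawler."""
    match = LOG_RE.search(line)
    if match is None:
        return False
    # Ignore any query string such as ?path=/external/clang
    path = match.group("path").split("?", 1)[0]
    agent = match.group("agent").lower()
    return (
        match.group("method") == "GET"
        and path.endswith("/tarball")
        and any(marker in agent for marker in BOT_MARKERS)
    )
```

Matching on user-agent substrings only helps with counting; it would not be a reliable way to block the requests, which is why the ticket proposes changing the HTTP method instead.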

Related

Chat: 5217b20b0594ca15016eb380
Chat: 5217b2110594ca15016eb383
Chat: 522659080594ca66f3cdedda
Chat: 522659090594ca66f3cdeddd
Tickets: #6602
Tickets: #6618

Discussion

  • Dave Brondsema
    2013-08-22

    I wonder if we'd want to be a bit more nuanced. If I click on "Download Snapshot" and then am waiting for the zip to be generated, I might hit refresh, and then I get a page that says "405 Method Not Allowed". Would it be practical to have a GET request check for status only? If there were no snapshot ready or in-progress, we'd probably need a message & link to POST a new request.

    That would also allow people to share URLs (e.g. in an email or webpage) directly to the code snapshot page still.
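A minimal sketch of the split suggested above, where GET only reports status and POST is the only method that starts a build. This is not Allura's controller code: `handle_tarball`, the in-memory `snapshots` dict, and the status strings other than 'na' are assumptions made for illustration.

```python
# Hypothetical in-memory store: (revision, path) -> "busy" or "ready".
# A real forge would persist this and build the archive in a background task.
snapshots = {}


def handle_tarball(method, revision, path=""):
    """Return (http_status, body) for a request to a .../tarball URL."""
    key = (revision, path)
    status = snapshots.get(key)

    if method == "POST":
        if status is None:
            # Only POST may create work, so crawlers following links cannot.
            snapshots[key] = "busy"
        current = snapshots[key]
        return (202 if current == "busy" else 200), {"status": current}

    if method == "GET":
        if status is None:
            # Nothing requested yet: report it and point at the POST action.
            return 200, {"status": "na", "action": "POST to request a snapshot"}
        return 200, {"status": status}

    return 405, {"error": "Method Not Allowed"}
```

With this shape, refreshing the page or opening a shared link issues a harmless GET, and a spider crawling the same URL never triggers tarball creation.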

     
  • Made tarball controller handle GET and POST. Changes force-pushed to:

    forge:tv/6595
    forgehg:tv/6595

     
  • Dave Brondsema
    2013-08-26

    If I do a GET on a rev that has never had a tarball requested, it says "Checking snapshot status..." and does ajax checks which return 'na' over and over. We need some way to let the user request the snapshot (put a POST form button right on that page?).

    A smaller initial delay is great, but the comment "// Check tarball status every 5 seconds" should be removed since it's inaccurate now. The upper limit of 600,000ms seems pretty high too; it might be good to drop that down while you're in there.
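To make the timing discussion concrete, here is a small Python sketch of one possible polling schedule: a short first delay that grows between checks, with limits on both the longest single wait and the total time spent polling. The ticket does not say whether the 600,000ms limit applies per delay or in total, so the sketch exposes both as separate parameters. The function name and all default values are hypothetical, and the actual client-side code in the branch may work differently.

```python
def poll_delays(initial_ms=500, factor=2, max_delay_ms=10_000, total_limit_ms=120_000):
    """Yield wait times in milliseconds between status checks.

    Starts at initial_ms, multiplies by factor after each check, never
    waits longer than max_delay_ms at once, and stops before the summed
    waits would exceed total_limit_ms.
    """
    elapsed = 0
    delay = initial_ms
    while elapsed + delay <= total_limit_ms:
        yield delay
        elapsed += delay
        delay = min(delay * factor, max_delay_ms)


# Example: inspect a short schedule with small, made-up limits.
schedule = list(poll_delays(initial_ms=500, factor=2, max_delay_ms=4000, total_limit_ms=10_000))
```

A caller would sleep for each yielded delay, check the status, and stop early once the snapshot is ready or the status comes back as 'na'.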

     

