SourceForge Infrastructure and Service Restoration update for 7/24

On 7/16, Slashdot Media sites (including Slashdot and SourceForge) experienced a storage fault.  Work has continued 24×7 on service restoration.  Updates have been provided as each key service component was restored. We’ve provided two prior updates (7/18 and 7/22) summarizing our infrastructure and service restoration status.  This is our third large update.

High-level status of all Slashdot Media sites and services as of 7/24:
  • Slashdotmedia.com – online
  • Slashdot.org – online
  • Slashdot Engineering infrastructure – online
  • Slashdot Media’s WordPress sites – online
  • SourceForge Engineering infrastructure – online
  • Slashdot Media operations infrastructure – online
  • SourceForge databases – online
  • SourceForge download service – online
  • SourceForge Directory services (project summary page, download pages, search, front page, directory) – online
  • SourceForge Developer Services – partially restored (see detailed status below)

In-depth status of SourceForge Developer Services as of 7/24:
  • SourceForge site’s Developer pages backed by Apache Allura (tickets, wikis, forums) – online
  • SourceForge Mailing List services (email, web archives, archiving) – online as of 7/22, archiving restored 7/23
  • SourceForge Project Database (MySQL) service – online
  • SourceForge Project Web service – online as of 7/22, except k* projects (restore in-progress); session store corrected 7/23
  • SourceForge User Web service – online as of 7/22
  • SourceForge Project Web file management – online as of 7/23
  • SourceForge Allura Git service – online as of 7/22
  • SourceForge Allura Mercurial (Hg) service – online as of 7/23
  • SourceForge File Upload service – offline; filesystem checks are complete, and cryptographic summing is projected to complete 7/24. Preparation of data for service resumption is in progress.  An ETA will follow once I/O performance has been calculated during mount reconstruction.
  • SourceForge Allura Subversion (SVN) service – offline; filesystem checks are complete, and data restoration has completed 22 letters (4 remain). This is our current restore priority. We project the data restore to complete by 7/25, to be followed by data validation and restoration of service.  An ETA will follow once I/O performance has been calculated during data validation.
  • SourceForge CVS service – offline; filesystem checks and data restoration will commence after the Allura-backed SVN service is restored.  An ETA will follow once the SVN restore is completed.  The CVS data set is 20% of the size of the SVN data set, but requires a higher degree of manual validation; this data point will be used to estimate the restoration timetable.
  • SourceForge non-Allura SCM platforms – offline; filesystem checks and data restoration will occur once CVS restoration is under way.  An ETA will follow once the SVN restore is completed.  This service will be restored last.  The non-Allura SCM data set is substantially smaller than the SVN data set; this data point will be used to estimate the restoration timetable.
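
For readers unfamiliar with the "cryptographic summing" step mentioned for the File Upload service, the general idea can be sketched as follows. This is only an illustration, not a description of the tooling actually in use: the choice of SHA-256, the function names, and the idea of comparing a reference tree against a restored tree are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of one file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(
    reference: dict[str, str], restored: dict[str, str]
) -> dict[str, list[str]]:
    """Report files that are missing, unexpected, or whose digests differ."""
    return {
        "missing": sorted(set(reference) - set(restored)),
        "unexpected": sorted(set(restored) - set(reference)),
        "mismatched": sorted(
            name
            for name in set(reference) & set(restored)
            if reference[name] != restored[name]
        ),
    }
```

In a sketch like this, a restored tree would be considered consistent with its reference copy when all three lists in the report are empty. Reading in chunks keeps memory use flat for large files, so run time is dominated by storage I/O.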

Engagement with our storage platform vendor will continue, including review of captured data. Post-mortem activity is anticipated after data restoration is completed. The team continues to split its effort between data restoration and service restoration to expedite the return to full service.

Knowledge capture has been continuous throughout this outage and will drive continuous improvement.  A few key points resulting from this process to date:

  • Transition of two SourceForge databases from centralized storage platform SSD to local storage SSD (Intel P3600s) was completed 7/24.  Function and performance have been validated.
  • Review of I/O workloads is ongoing to further expedite service restoration.
  • Users on “Classic” non-Allura-backed SCM services should anticipate an upcoming pre-announced migration to Allura-backed service (which was restored first).
  • Additional storage is being onboarded at this time. In some cases we currently have three copies of production data to maintain during restoration.

We intend to continue our existing communications approach: incremental updates will be provided on individual service restoration, and large updates (like this one) will be provided with additional metrics and technical details as work progresses.

Work continues 24×7 on restoration of SourceForge file upload and yet-unrestored SCM services (per above list).

Thank you for your continued support and patience.

23 Responses to “SourceForge Infrastructure and Service Restoration update for 7/24”

  1. AB Jul 24, 2015 at 2:49 pm #

    “Knowledge capture has been continuous throughout this outage and will drive continuous improvement.” Disgustingly worded. You could have instead written “we’re learning from this disaster, so pretty please don’t abandon SF because it will never ever happen again, we promise”.

  2. AC Jul 24, 2015 at 4:45 pm #

    Blame the PR department. They’re the ones cooking up that kind of rubbish because somehow they think phrasing it that way sounds less bad than “we goofed. we’re fixing it as fast as we can. we’re really sorry and we’ll never do it again.”

  3. erikd Jul 24, 2015 at 5:12 pm #

    I’m still a bit baffled why the source repos were at the bottom of the priority-list, but I do appreciate the updates and all the efforts in restoring everything. This was clearly a huge disaster for everyone involved, and I hope lessons were learned and things will get back to normal soon. We could all just go to github, but I still think it’s important that Sourceforge will stay relevant.

    • Daniel Seagraves Jul 24, 2015 at 7:25 pm #

      Because source repos don’t generate revenue, downloads with adware do.

    • someone Jul 25, 2015 at 7:19 am #

      What use would the repos be if you cannot add new downloads to your project and users cannot get old downloads and users cannot look at your project wiki or even your project homepage? Of course it makes sense to leave the repos last… unless your optimal solution is to fix YOUR project in full first and then worry about anything else.

      • erikd Jul 27, 2015 at 2:26 pm #

        Well, wouldn’t it be everyone’s ‘optimal solution’ if their own project was fixed first? 😉 I never suggested such a thing though. Anyway in the meantime I learned that the real reason wasn’t really about prioritization per se, but more about the time it takes to restore the repos, so it’s understandable to me now. One just can’t please everyone all of the time and all that…

    • Dennis Jul 25, 2015 at 10:15 am #

      Git was one of the first things that came back up, followed by Hg.

      • someon Jul 26, 2015 at 11:58 am #

        No, no it wasn’t. This only came back up relatively recently. Allura project listings, forums (!) and mailing lists (!) and project web were all online before that; you can see this from the blog posts, such as at: https://sourceforge.net/blog/page/2/ The order of restoration does seem odd – a technical explanation for this would be most appreciated. Being able to push/pull with DVCS, file downloads and project web to me seem most important. A technical explanation for the length of this downtime would be nice too… Was this an unforeseeable event, or technical debt? Given the time required for backup recovery (as compared to /.), probably the latter.

  4. PlaneShift team Jul 24, 2015 at 7:53 pm #

    The fact that SVN is at the bottom of the list is a problem for us (and I think many others); we had to delay releases of our software, which is a disservice to our users. Why this decision?

    • rgaloppini Jul 26, 2015 at 12:14 pm #

      Please read more at http://sourceforge.net/blog/sourceforge-subversion-svn-service-online/#comment-6749

  5. Tim Freeman Jul 24, 2015 at 10:48 pm #

    The list above says “SourceForge download service – online”, but I’m not seeing that. When I visit http://sourceforge.net/projects/neuroph/files/latest/download, it makes me wait 4 seconds, but the download never starts, and no error message is presented either. I land back at http://sourceforge.net/projects/neuroph/files/. I’ve tried this with Chrome on Windows and Linux, and Firefox on Ubuntu. I used to be able to do this sort of thing without any problem, so it doesn’t feel like pilot error to me. Feel free to delete this comment if you think you’ve fixed it.

    • Matthieu Jul 27, 2015 at 6:23 am #

      Indeed, all “new” downloads are not available on the download page, not possible to delete folders and reupload the new files. Meanwhile, the files are actually available in the CDN behind, and no timeframe as to when it will be fixed…

  6. Adam Jul 25, 2015 at 12:02 am #

    No mirrors are currently being listed! for any program! “Problems with the download? Please use this direct link, or try another mirror.” Just get redirected to http://sourceforge.net/projects/NAME/files/

  7. Dave Cottlehuber Jul 25, 2015 at 3:19 am #

    Thanks SF Ops team, everybody knows that this sort of event is a total nightmare and given how many of us use your services without fee, the least we can do is to say thank-you from the bottom of our open-source hearts. Hang in there!

  8. Shaul Jul 25, 2015 at 6:32 pm #

    I never understood the SF revenue model and I suspect that neither do they understand it fully. However I’m surely thankful for 10 years of fantastic product. Seems like they really are s***d now and I’m sure they work as seriously as possible to solve the problem. I’m surely going to mirror my code and my site elsewhere.

  9. levi Saraiva Moura Jul 25, 2015 at 6:58 pm #

    Friends: good afternoon. Does anyone know how I can let the people responsible for Sourceforge know that I am unable to upload my spreadsheets? When I click on ADDfile, it simply does not upload. Thanks in advance to anyone who can help me or pass this on to Sourceforge.

  10. Mahmoud Fayed Jul 26, 2015 at 9:20 am #

    Thanks for the Service Restoration update! Sourceforge provides GREAT service to open source developers. During the last 10 years I was satisfied with the service quality. I have all of what I need (Downloads, Bandwidth, Web Space, Database, Source Control, etc). Keep up the good work!

  11. Gonzalo Garramuño Jul 26, 2015 at 6:51 pm #

    Is there a new ETA on upload services (sftp in particular)? It’s been two days since the ETA on restoration and I cannot yet upload new binary files. Thanks for all the good work, guys.

  12. John Cary Jul 26, 2015 at 6:54 pm #

    I have to echo the comments of Dave Cottlehuber, who noted what a nightmare this must be for all involved. My sympathies and thanks to all those working day and night. I too hope that SF continues, as SF gets SVN right while also providing git access. (For us, certain svn features are indispensable for some things, git for others.) In addition, a monopoly (or even duopoly) will not provide the competition that leads to product improvement. All that said, if all your branches/tags/versions are important, this incident reinforces the need to regularly back up your repos, wherever they are.

    • erikd Jul 28, 2015 at 4:16 pm #

      I can’t agree more!

  13. ericsten Jul 28, 2015 at 2:02 pm #

    Is there a new ETA for the “SourceForge File Upload service”? It’s been three days since the original post, and no follow-up ETA has been posted. I have binary updates waiting to post to my project, which is the equivalent of releasing new bits. Thx! –E.

    • szobel Jul 29, 2015 at 5:39 am #

      I second that. Can we please have an ETA for the File Upload service?

  14. smshaw Jul 29, 2015 at 6:21 am #

    Update for SourceForge File Upload service says ETA to follow but it has been a number of days now with no ETA. Can we please have this ETA for when the file upload service is due to be back online? thanks