SourceForge Outage Recap and Future Steps

Most SourceForge Developer Services were unavailable from July 12th 18:52 UTC until July 13th 17:10 UTC. Project Web services were initially taken down to clear up some disk errors. During this maintenance we experienced issues with the entire NFS infrastructure, which forced us to take the site offline and migrate to our Failover Environment. After extensive troubleshooting, we determined that the core issue was excessive latency in the internal DNS load balancer, which was causing RPC timeouts and preventing NFS clients from mounting. After re-configuring the DNS resolvers, we were able to restore proper operation of the NFS infrastructure and resume normal operations.
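For readers curious how slow name resolution can be spotted before it surfaces as RPC timeouts, here is a minimal sketch in Python. It is not the tooling used during this incident. The function names, the hostname in the usage note, and the 0.5 second threshold are all assumptions made for illustration; the sketch simply times repeated lookups through the system resolver.

```python
import socket
import time


def measure_lookup_latency(hostname, attempts=5, resolver=socket.getaddrinfo):
    """Time repeated DNS lookups; return seconds per attempt (None on failure).

    `resolver` is injectable so the function can be exercised without a network.
    """
    timings = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            resolver(hostname, None)
            timings.append(time.monotonic() - start)
        except OSError:
            # socket.gaierror is a subclass of OSError: record a failed lookup.
            timings.append(None)
    return timings


def is_resolver_slow(timings, threshold_seconds=0.5):
    """Flag the resolver if any lookup failed or exceeded the threshold.

    The 0.5 s default is an arbitrary illustrative value, not a measured one.
    """
    return any(t is None or t > threshold_seconds for t in timings)
```

A check like `is_resolver_slow(measure_lookup_latency("nfs.example.internal"))` (hypothetical hostname) could run from a monitoring job on each NFS client, so that resolver latency is noticed as its own symptom rather than only as a failed mount.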

We apologize for the interruption to the SourceForge site and services during this period. We were able to report some information during the downtime via our SFNet_Ops Twitter feed, which is included on our 500 error page. However, due to the complex nature of the issue, we were unable to provide additional details or an ETA for the fix.

Moving forward, we are investigating the latency issues in the internal DNS load balancer and re-configuring the relevant portions of our infrastructure to prevent similar downtime in the future. We will also update the SFNet_Ops Twitter feed more frequently so that users are not left in the dark. Once again, we apologize for the downtime. Should you find any lingering issues with the site or have any other questions, please don’t hesitate to reach out to us.

One Response to “SourceForge Outage Recap and Future Steps”

  1. Hendrik Jul 18, 2016 at 4:33 am #

    Thank you for this posting. Outages do happen, unfortunately, but keeping us informed is what keeps trust in the site.