
Developer counts – the stats

Last week’s interview about the Tiki Wiki project got me looking closer at the data, and I wanted to share a few of my findings with you.

First of all, here’s that graph again:

[Figure: number of projects vs. developer count per project, Sourceforge, 2012-03-08]

That shows the distribution of projects by how many developers each project has. 268,554 projects have only one developer. Whether that reflects projects without very wide appeal or project admins holding the reins a little too tightly, it adds up to a lot of code that will be orphaned if just one person loses interest, has a change in their available free time, or, as is inevitable some day, dies.

At the other end of the spectrum, there are 21 projects that have more than 100 committers. Some of these are names you may have heard before, while others may come as a surprise.

| Project | Developer Count |
|---|---|
| Firebird | 104 |
| Gene Ontology | 106 |
| CARE2X php Integ Hospital Info System | 107 |
| IT Process Models Repository | 109 |
| Boost C++ Libraries | 127 |
| Inkscape | 133 |
| tagua | 134 |
| eStudy | 136 |
| Generic Model Organism Database Project | 136 |
| Mediaportal Plugins | 140 |
| ADempiere ERP Business Suite | 141 |
| VeniVidiWiki | 144 |
| jEdit | 177 |
| XOOPS Web Application Platform | 183 |
| GamesCrafters | 204 |
| Apertium: machine translation toolbox | 230 |
| The Plone Collective | 248 |
| Moodle | 248 |
| TinyOS | 324 |
| work-in-progress pkgsrc packages | 365 |
| Tiki Wiki CMS Groupware | 500 |

But what these numbers mean is far from obvious. As was mentioned in the Tiki Wiki interview, although there are 500 committers, only 260 of them have actually committed anything.
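To put that gap in proportion, here is a small sketch of the arithmetic using only the two Tiki Wiki figures quoted above. The variable and function names are mine, chosen for illustration; nothing here comes from Sourceforge's own tooling.

```python
# Figures quoted from the Tiki Wiki interview: 500 registered committers,
# of whom 260 have actually committed something.
TIKI_REGISTERED = 500
TIKI_ACTIVE = 260


def active_share(active: int, registered: int) -> float:
    """Return the fraction of registered committers who have committed."""
    if registered <= 0:
        raise ValueError("registered must be positive")
    return active / registered


tiki_share = active_share(TIKI_ACTIVE, TIKI_REGISTERED)
tiki_never_committed = TIKI_REGISTERED - TIKI_ACTIVE

print(f"Tiki Wiki: {tiki_share:.0%} of registered committers have committed")
print(f"Tiki Wiki: {tiki_never_committed} registered committers have never committed")
```

So roughly half of the headline number is commit access that has never been exercised, which is one reason the raw developer count should be read with care.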

And that VeniVidiWiki project looks decidedly fishy, with the project registered on 2004-05-24, the latest activity on 2004-06-25, but with 144 committers. What does that mean? It turns out that it was part of a University course, where all of the students in the course had to take part in an Open Source project, so they started one from scratch. When the course was over, the project was over, too.

I’m always a little skeptical about the use of numbers to prove much of anything about an Open Source project – who’s to say that 1000 commits are in any way better than 100 commits? The proof is in the code, the longevity of the project, the quality of the end product.

But the hypothesis that a large committer base leads to a sustainable project seems to work out in this particular statistical sample, with all but three of the projects showing some activity in the last couple of months.
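As a rough sketch of what "all but three" amounts to, the share of recently active projects in this sample can be worked out from the two counts given above (21 projects, three without recent activity). The names below are illustrative only, and a sample this small says nothing about Sourceforge as a whole.

```python
# Counts taken from the text: 21 projects with more than 100 committers,
# three of which show no activity in the last couple of months.
TOTAL_PROJECTS = 21
INACTIVE_PROJECTS = 3

active_projects = TOTAL_PROJECTS - INACTIVE_PROJECTS
recently_active_share = active_projects / TOTAL_PROJECTS

print(
    f"{active_projects} of {TOTAL_PROJECTS} projects "
    f"({recently_active_share:.0%}) show recent activity"
)
```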

This is all, of course, very unscientific, and I hope, over the coming months, to have conversations with several of these projects, and investigate further what seems to work when it comes to deciding who gets commit rights and who doesn’t. But I thought you’d like to see some of the data I’m working with.

The Top Myths About Sourceforge

Since starting at Sourceforge about a month ago, I’ve been paying close attention to media and Twitter mentions of Sourceforge. I’ve been astonished at the sheer volume of misinformation that’s just accepted as fact. I suppose when things are said often enough, you just can’t help believing them. Here are some of the most common ones.

You have to use CVS

Sourceforge has offered Subversion for many years – pretty much since Subversion was available.

But we’ve also offered Git for many years. We had Git long before Git was cool. In fact, Git is the default when you create a new project. And the Sourceforge codebase itself (codename: Allura) is developed in Git. On Sourceforge. The Sourceforge code is released under the Apache License, Version 2.0, and is just as free as everything else on Sourceforge.

Much like another popular code hosting service you might have heard of, our Git implementation provides one-button forking, and one-button pull requests.

Oh, we offer Mercurial (hg) hosting too, if you prefer.

[Image: SCM options]

We do, in fact, still offer CVS, but only to support older projects that haven’t gotten around to migrating yet – and there are a few. We’re available to help you migrate between SCM solutions, if you need that help.

New projects have to be approved

Long, long ago, we required that new projects be approved. This was a spam prevention measure. I remember those days, vaguely. That was at least four jobs ago, and a lot has changed since then. These days, creating a new project takes less than a minute, and does not involve any approval step.

You can’t customize your website

One default Sourceforge project site looks like another. But you have the option of creating a virtual host where you can put up a site that looks like whatever you want. Virtual hosts have the tools you’d expect from a typical webhost, including PHP and MySQL, but you can also install a variety of other things in order to make your project website whatever you need it to be.

We’ll answer requests for any hostname you have registered, as well as for PROJECTNAME.sf.net, and you can have up to ten virtual hosts per project. You then have access via your login shell to update those sites.

Sourceforge Is Dead

Ah, yes, the standard tech meme of announcing the death of whatever it is that you don’t like. As usual, it’s somewhat exaggerated.

We have almost 3.5 million registered users. The number of projects on Sourceforge is right at 325,000, and continues to grow every day. The existing projects continue to develop software, committing over 5,000 changes a day, closing tickets, and pushing out new releases. And visitors from 40,000,000 unique addresses came to the website last month, downloading releases more than 4,000,000 times a day.

And Google seems to think we measure up pretty well to those other hosting sites. (via @robilad)

[Image: Google site stats]

Meanwhile, yes, there are a lot of dormant and abandoned projects. This is also the case at Google Code, GitHub, and any other code hosting service you care to think of. It’s the normal lifecycle of Open Source software that some projects fall by the wayside. Some, because they are done, and there’s nothing more to do. Some, because the developers lose interest and move on. And some, because something else has been created that makes them obsolete.

It is natural, and expected, that an older code hosting service will have a larger number of abandoned projects than the newcomers. We’re working on some ideas of community health metrics so that you can more quickly identify whether a particular project is active or not, while still keeping around the older projects that someone might still find useful. And we already incorporate project activity into search result ranking, so that these less active projects won’t be the ones that you find, most of the time, when you’re looking for software.

So, we think we’re pretty much alive, but we’re not resting on our laurels. The engineering team is working constantly on the platform, making it work better, look better, and scale better. And, for the criticisms of Sourceforge that are true, we’re working hard to correct them.

We think it’s worth your time to look into Sourceforge for yourself, and not just accept the myths.