As a fairly long-time user of the ND data set and a member of the FLOSSMole SF project (though I haven't yet contributed in any significant way), I concur with Megan's assessment, but I would add a couple of things.
1) The interface for the FLOSSMole data set is much more usable. I have written a desktop-based extension to the ND interface which made it much more usable for me. I hope to present that work at WoPDaSD'07 in June, but I would be happy to give you the application with source code if you decide you need some data from the ND site.
2) The database schema of the ND data set is not well understood, even by the ND researchers. This can make extracting information tricky. The tool I mentioned above included functionality to automatically generate ER diagrams from the database which aided me in my research.
3) The ND data set contains all the tracker logs (bugs, feature requests, patches, etc.), which I don't believe the FLOSSMole data set does.
Hope that helps,
On 4/24/07, Megan Conklin <email@example.com> wrote:
> What is the difference between the Flossmole data for SF and the Notre Dame
> (official SourceForge) data?
Hi Scott, welcome to our little community of "data diggers". Great
My understanding is that the Notre Dame data comes directly from
Sourceforge in the form of database dumps each month.
Our FLOSSmole Sourceforge data is scraped from the Sourceforge web
site using our automated collection agents.
As far as I know, both projects have the same number of SF projects in
them (i.e. all of them), although the amount of data in the ND data
sets will probably be bigger since they have "everything". (Whether
you need "everything" is up to you!)
There are a few other differences between the Notre Dame effort and
the FLOSSmole effort:
1. FLOSSmole carries more than just Sourceforge data (we also carry
Freshmeat, Rubyforge, Objectweb, Free Software Foundation, and
SourceKibitzer at present, and we can carry any data you suggest).
2. FLOSSmole seeks donations of data and analyses from other research
groups. For example, we take monthly donations from SourceKibitzer,
which is an automated analysis of empirical measurements of open
source projects. The idea here is that researchers can use our data or
their own data, contribute data, remix data/analyze data/whatever, and
then (hopefully) contribute these analyses back for someone else to
3. The FLOSSmole data is free and open for your perusal at all times
of day or night, and for all interested parties (research, industry,
academia, we don't care). There are no forms or permission slips.
I guess the two sites have two different purposes, and we very
peacefully co-exist. I am quite sure that both data sets can be very
useful to you. Many users of FLOSSmole also are users of the ND data,
and vice versa.
Great question! Let us know what your ideas are for FLOSSmole and
we'll get some nice conversation going. Welcome to the community.
Ossmole-discuss mailing list