Reproducible research is, or should be, a tenet of any scientific endeavor. In today’s world, it means publishing the results of computational experiments together with the software and data needed to reproduce them. Madagascar, an open source software package for scientific analysis of large digital datasets such as those occurring in geophysics, is built to promote reproducible research. It is used primarily by exploration geophysicists, but it can be employed for other scientific applications as well.
Madagascar comes from the Stanford Exploration Project. With the help of his students, including Madagascar project leader Sergey Fomel, geophysicist Jon Claerbout created an environment for reproducible computational experiments at his lab there. The environment worked at Stanford but was too complicated and clumsy to be shared with other groups, Fomel says. In addition, published results quickly lost reproducibility because nobody maintained them. “During my Ph.D. studies at Stanford in the late 1990s, I was inspired by the free and open source software movement and decided that the proper way to promote reproducible research was by turning it into an open source project,” Fomel says. “I started working on Madagascar (previously named Regularly Sampled Format) in 2003. The project got publicly released under the GPL in 2006. Since then, around 25 people have joined the community and contributed to development.
“We use C in the number-crunching part of Madagascar for optimal efficiency and for staying close to the hardware. We provide APIs for users who want to develop scientific codes in other languages, including C++, Fortran, Matlab, Python, and Java. We use Python for gluing the number-crunching code together into data processing workflows and for integrating them with publications. SCons, a Python-based open source replacement for make, provides a particularly useful environment. Python is a clean and easy-to-learn scripting language, which seems perfect for the task.”
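To make the "gluing" idea concrete, here is a minimal sketch in plain Python of a make-like workflow, in which each processing step declares its target, its sources, and the function that computes it. This is an illustration only, not Madagascar's or SCons's actual API: the `Workflow` class, its `source`, `flow`, and `build` methods, and the sample data are all hypothetical names invented for this example, and simple Python functions stand in for the compiled number-crunching programs.

```python
# Hypothetical sketch of a make-like workflow. None of these names come
# from Madagascar or SCons; plain functions stand in for compiled programs.

class Workflow:
    def __init__(self):
        self.rules = {}  # target name -> (list of source names, action)
        self.data = {}   # name -> supplied or already computed value
        self.log = []    # order in which targets were actually built

    def source(self, name, value):
        """Register input data that needs no rule to produce it."""
        self.data[name] = value

    def flow(self, target, sources, action):
        """Declare how to compute `target` from `sources`."""
        self.rules[target] = (list(sources), action)

    def build(self, target):
        """Build `target`, building its dependencies first.

        Results are cached, so each target is computed at most once.
        This sketch does not detect circular dependencies.
        """
        if target in self.data:
            return self.data[target]
        if target not in self.rules:
            raise KeyError(f"no rule to build {target!r}")
        sources, action = self.rules[target]
        inputs = [self.build(name) for name in sources]
        self.data[target] = action(*inputs)
        self.log.append(target)
        return self.data[target]


def demo():
    wf = Workflow()
    wf.source("trace", [1.0, 2.0, 3.0, 4.0])
    # Rules can be declared in any order; dependencies decide run order.
    wf.flow("scaled", ["demeaned"], lambda xs: [2.0 * x for x in xs])
    wf.flow("demeaned", ["trace"],
            lambda xs: [x - sum(xs) / len(xs) for x in xs])
    return wf.build("scaled"), wf.log


if __name__ == "__main__":
    result, order = demo()
    print(result)  # [-3.0, -1.0, 1.0, 3.0]
    print(order)   # ['demeaned', 'scaled']
```

The point of the design is that the script describes what depends on what rather than the order of execution, which is the property that lets a build tool re-create a published result from its inputs.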
Fomel says he released the software as open source because the open source philosophy matches the thinking behind reproducible research. “Reproducible research is a way of communicating computational results in a scientifically meaningful way so that other people could reproduce, verify, and extend them. Open source software works on the same principle. It is a natural match.
“SourceForge.net was attractive for us because it has the reputation for hosting famous open source projects and because it provides all the necessary tools for organizing an open source community: the Subversion server, the file server, mailing lists, etc. However, we miss the Compile Farm feature, which was useful for testing installation on different platforms and was one of the original attractions.”
Madagascar reached a big milestone last week with its release of version 1.0. “We had a particular goal for 1.0,” Fomel says, “which was a system for automatic testing. Once a computational result is archived in a reproducible form, it serves as a regression test for further development. We wanted a system for running such tests automatically. When a system like that was developed (in a community effort, with invaluable contributions from Joe Dellinger, Jim Jennings, and Nick Vlad), we could release 1.0.”
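The idea that an archived result doubles as a regression test can be sketched in a few lines of Python. The example below is a simplified illustration under stated assumptions, not the testing system the Madagascar community built: the `fingerprint` and `run_regression` functions are hypothetical, results are assumed to be short lists of numbers, and the archive is an in-memory dictionary rather than files on disk.

```python
import hashlib
import json


def fingerprint(values, digits=6):
    """Hash a list of numbers after rounding them to `digits` decimals.

    Rounding absorbs most tiny floating-point differences; values that sit
    right on a rounding boundary can still change the hash.
    """
    rounded = [round(float(v), digits) for v in values]
    payload = json.dumps(rounded).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_regression(experiments, archive):
    """Re-run every experiment and compare it with its archived fingerprint.

    `experiments` maps a name to a zero-argument function returning numbers.
    `archive` maps a name to a stored fingerprint and is updated in place
    when an experiment is seen for the first time.
    Returns a dict mapping each name to "new", "pass", or "fail".
    """
    report = {}
    for name, compute in experiments.items():
        current = fingerprint(compute())
        if name not in archive:
            archive[name] = current  # first run: archive the result
            report[name] = "new"
        elif archive[name] == current:
            report[name] = "pass"
        else:
            report[name] = "fail"
    return report


if __name__ == "__main__":
    archive = {}
    experiments = {"smooth": lambda: [0.1 + 0.2, 1.0]}
    print(run_regression(experiments, archive))  # {'smooth': 'new'}
    print(run_regression(experiments, archive))  # {'smooth': 'pass'}
    experiments["smooth"] = lambda: [0.31, 1.0]  # simulate a code change
    print(run_regression(experiments, archive))  # {'smooth': 'fail'}
```

Comparing a rounded fingerprint rather than raw bytes is one possible design choice; a real system would also need to decide how much numerical drift between platforms counts as a failure.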
But this version is just one milestone for the project, Fomel says. “The collection of reproducible research papers will expand. We hope to diversify from geophysics to other scientific fields that work with large multidimensional data. There will be better tools for large-scale parallel computations and better documentation for existing tools. We have been doing two releases per year, which seems to catch major improvements, but that rate might accelerate.
“We could definitely use some help. New scientific applications, graphical user interfaces, better visualization, and a cleaner Python framework are some of the areas where someone could contribute. The best way to get in touch is by writing to the mailing lists for users or developers.”