Archive | July, 2010

Lee says so long

All good things come to an end, and so it is with my tenure at SourceForge and parent company Geeknet. I’ve enjoyed learning about the projects I’ve been writing about every day on the community blog, and I’m sorry I won’t be around to share the excitement as the developers debut new features as part of a total revamp of the forge. From now on you can expect to see fewer blog entries – probably mainly writeups of the projects of the month and the occasional news development.

As for me, I’m not certain yet what my next venture will be. (If you know of an organization that’s hiring an editor or community manager, or someone who needs freelance writing or editing, let me know!) I’ve had a great seven years at this company, working with many exceptionally talented individuals. So long and good luck to all my colleagues.

And to everyone: If you’d like to stay in touch, follow me on Twitter.

– 30 –

For reproducible research, go to Madagascar

Reproducible research is, or should be, a tenet of any scientific endeavor. In today’s world, it means integrating results of published computational experiments with software and the data necessary for reproducing the experiments. Madagascar, an open source software package for scientific analysis of large digital datasets such as those occurring in geophysics, focuses on promoting reproducible research. Madagascar is used primarily by exploration geophysicists, but it can be employed for other scientific applications as well.

Madagascar comes from the Stanford Exploration Project. With the help of his students, including Madagascar project leader Sergey Fomel, geophysicist Jon Claerbout created an environment for reproducible computational experiments at his lab there. The environment worked at Stanford but was too complicated and clumsy to be shared with other groups, Fomel says. In addition, published results quickly lost reproducibility because nobody maintained them. “During my Ph.D. studies at Stanford in the late 1990s, I was inspired by the free and open source software movement and decided that the proper way to promote reproducible research was by turning it into an open source project,” Fomel says. “I started working on Madagascar (previously named Regularly Sampled Format) in 2003. The project got publicly released under the GPL in 2006. Since then, around 25 people have joined the community and contributed to development.

“We use C in the number-crunching part of Madagascar for optimal efficiency and for staying close to the hardware. We provide APIs for users who want to develop scientific codes in other languages, including C++, Fortran, Matlab, Python, and Java. We use Python for gluing the number-crunching code together into data processing workflows and for integrating them with publications. SCons, a Python-based open source replacement for make, provides a particularly useful environment. Python is a clean and easy-to-learn scripting language, which seems perfect for the task.”

Fomel says he released the software as open source because the open source philosophy matches the thinking behind reproducible research. “Reproducible research is a way of communicating computational results in a scientifically meaningful way so that other people could reproduce, verify, and extend them. Open source software works on the same principle. It is a natural match.

“SourceForge was attractive for us because it has the reputation for hosting famous open source projects and because it provides all the necessary tools for organizing an open source community: the Subversion server, the file server, mailing lists, etc. However, we miss the Compile Farm feature, which was useful for testing installation on different platforms and was one of the original attractions.”

Madagascar reached a big milestone last week with its release of version 1.0. “We had a particular goal for 1.0,” Fomel says, “which was a system for automatic testing. Once a computational result is archived in a reproducible form, it serves as a regression test for further development. We wanted a system for running such tests automatically. When a system like that was developed (in a community effort, with invaluable contributions from Joe Dellinger, Jim Jennings, and Nick Vlad), we could release 1.0.”
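To make the idea of an archived result doubling as a regression test concrete, here is a minimal Python sketch. It is not Madagascar's testing system: the `smooth` computation, the `fingerprint` helper, and the rounding tolerance are all assumptions chosen for illustration. The point is only that a stored fingerprint of a published result can be compared against a fresh run.

```python
import hashlib
import json


def fingerprint(values, digits=6):
    """Hash a list of floats after rounding, so tiny float noise is ignored."""
    rounded = [round(v, digits) for v in values]
    return hashlib.sha256(json.dumps(rounded).encode("utf-8")).hexdigest()


def smooth(signal):
    """Hypothetical computation under test: a three-point moving average."""
    # Repeat the edge samples so the output has the same length as the input.
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [
        (padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
        for i in range(1, len(padded) - 1)
    ]


def check_against_archive(result, archived_fingerprint):
    """Return True when the fresh result matches the archived one."""
    return fingerprint(result) == archived_fingerprint


# Archive step: run once and store the fingerprint alongside the paper.
archived = fingerprint(smooth([1.0, 2.0, 4.0, 8.0]))

# Regression step: rerun after later code changes and compare.
print(check_against_archive(smooth([1.0, 2.0, 4.0, 8.0]), archived))
```

If a later change to `smooth` alters the output beyond the rounding tolerance, the check returns False and flags the regression.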

But this version is just one milestone for the project, Fomel says. “The collection of reproducible research papers will expand. We hope to diversify from geophysics to other scientific fields that work with large multidimensional data. There will be better tools for large-scale parallel computations and better documentation for existing tools. We have been doing two releases per year, which seems to catch major improvements, but that rate might accelerate.

“We could definitely use some help. New scientific applications, graphical user interfaces, better visualization, a cleaner Python framework are some of the areas where someone could contribute. The best way to get in touch is by writing to the mailing lists for users or developers.”

Spotting points of interest with phpoi

If you’ve ever looked at Google Maps to find points of interest in a given area, and wanted to do something similar with locales of your own, have we got a project for you. Phpoi is a set of PHP scripts for web servers that are designed to handle data about points of interest stored in a MySQL database. The scripts can create web pages with lists of POIs with links to different map services, and can also be used to create POI files usable by different GPS navigators.
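As a rough illustration of turning stored POI records into a navigator file, the Python sketch below writes rows in a simple "longitude, latitude, name" comma-separated layout. Phpoi itself is written in PHP and reads from MySQL; the layout, the `pois_to_csv` function, and the sample records here are assumptions, and the formats phpoi actually generates may differ.

```python
import csv
import io


def pois_to_csv(pois):
    """Render POIs as 'longitude,latitude,name' rows, one per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for poi in pois:
        # Fixed precision keeps the file tidy; names with commas get quoted.
        writer.writerow([f"{poi['lon']:.5f}", f"{poi['lat']:.5f}", poi["name"]])
    return buffer.getvalue()


# Hypothetical rows, standing in for the result of a database query.
clubs = [
    {"name": "Example Judo Club", "lat": 59.3293, "lon": 18.0686},
    {"name": "Sample Dojo, North", "lat": 57.7089, "lon": 11.9746},
]
print(pois_to_csv(clubs), end="")
```

The `csv` module handles quoting, so a name that contains a comma still lands in a single field.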

Swedish developer Henrik Carlqvist started writing phpoi in 2008 when he was unable to find any software like it to use on a judo club’s website. “Most of all I was then looking for the functionality to provide POI files for navigators. As an added benefit I also got a web page with a complete list of all Swedish judo clubs.”

In future versions Carlqvist hopes to provide support for more GPS file formats. He makes new releases about once a year. “Right now I need someone with a Navman navigator and the software for it to load POIs. I need someone who can test the generated POI files created for Navman. I would also appreciate contributions of PHP files supporting more GPS file formats. The best way for anyone who would like to help to get in touch is through one of the project’s trackers.”

IDJC revives the radio star

Five years ago, English developer Stephen Fairchild found himself with access to a Shoutcast server but no way of making a show using free software. “Purely by coincidence I was playing around with Python and working through the PyGTK tutorial at the exact same time, so I decided to attempt to create an Internet radio client of my own.” Thus was born Internet DJ Console, a source client for Shoutcast or Icecast servers aimed at people who want to produce their own radio shows and stream them live on the Internet.

Fairchild focused on the Linux platform because similar, more mature applications already exist on Windows. “Creating another less well-known and less mature application would have been pointless. It had to be for Linux, therefore it had to be open, just so I could consider it to be my contribution.”

While he just released the latest version earlier this month, Fairchild is already focusing on the future. “I’m thinking of adding cue sheet support to the players. The record feature already creates those, and certain other players can use them. I want to add the ability to search playlists. Debian seem keen on MP2 so I may add that. AAC streaming is a distinct possibility.”

A PEBL in the neuroscience sea

If you’re a psychologist or neuroscientist, part of your job likely involves conducting experiments for research or clinical purposes. Unfortunately, the most common software tools used to create experiments typically require restrictive and expensive licenses. Not PEBL, however. This seven-year-old special-purpose programming language lets psychologists and neuroscientists create, modify, run, and share computer-based experiments.

The Psychology Experiment Building Language is also useful in the psychology classroom, because it lets instructors and professors distribute tests to students or set them up in a computer lab so that students can experience firsthand the research paradigms they read about in a textbook. It includes special-purpose functions that make it easy to create visual stimuli, collect responses, randomize and counterbalance experimental designs, and record data.

PEBL is available on Windows, Linux, and Mac OS X. It bundles a set of experiments and tests in the form of the PEBL Test Battery that provide free implementations of many classic studies from cognitive and clinical neuropsychology. PEBL and the PEBL Test Battery have been used by researchers around the world, ranging from clinicians in their own offices to laboratories at Ivy League universities to government labs, including NASA and NIH.

Shane Mueller, a cognitive scientist and research psychologist who works in Dayton, Ohio, created and maintains PEBL. It’s a full-fledged programming language, heavily influenced by LISP and R, and written in C++, using the Standard Template Library and, increasingly, PEBL itself. It uses a parser/lexer created with Bison and flex, which gives PEBL a lot of flexibility in experimental design.

PEBL compiles text files to a parsed tree of executable nodes, and then executes that tree to run the experiment. It heavily leverages the SDL gaming libraries (including SDL_ttf, SDL_image, SDL_gfx, and SDL_net) to help provide simple creation and manipulation of stimuli. It is designed to be forgiving for beginner users (who are often graduate students in psychology) and to avoid many subtleties that create problems in other programming languages.
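The "tree of executable nodes" design can be sketched in a few lines of Python. This is not PEBL's implementation (which is in C++ and uses a Bison/flex parser); the node classes and the hand-built tree below are hypothetical, and the parsing step is skipped entirely. It only shows how each node knows how to execute itself and its children.

```python
class Number:
    """Leaf node holding a literal value."""
    def __init__(self, value):
        self.value = value

    def execute(self, env):
        return self.value


class Variable:
    """Leaf node that looks a name up in the environment."""
    def __init__(self, name):
        self.name = name

    def execute(self, env):
        return env[self.name]


class Add:
    """Interior node that executes both children and adds the results."""
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def execute(self, env):
        return self.left.execute(env) + self.right.execute(env)


class Assign:
    """Interior node that stores the value of an expression under a name."""
    def __init__(self, name, expr):
        self.name = name
        self.expr = expr

    def execute(self, env):
        env[self.name] = self.expr.execute(env)
        return env[self.name]


class Block:
    """Sequence node that executes statements in order."""
    def __init__(self, statements):
        self.statements = statements

    def execute(self, env):
        result = None
        for statement in self.statements:
            result = statement.execute(env)
        return result


# Tree a parser might produce for: trials = 10; total = trials + 5
program = Block([
    Assign("trials", Number(10)),
    Assign("total", Add(Variable("trials"), Number(5))),
])
env = {}
print(program.execute(env), env)
```

Running the tree is just a call to `execute` on the root node; the environment dictionary carries variable values between statements.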

One script in the package allows you to collect survey data without working with PEBL code at all. You specify the survey questions in a .csv file, and the software runs the questions and saves the results in data files for you. “This turns out to be much easier than paper-and-pencil surveys that researchers still use frequently,” Mueller says, “because you don’t have to hand-code your data after you are done.”
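For a sense of how a CSV-driven survey can work, here is a small Python sketch. It is not PEBL's survey script: the `id` and `question` column names, the sample questions, and the output layout are assumptions, and answers come from a callable so the example runs without keyboard input.

```python
import csv
import io

# Hypothetical question file; real column names in PEBL may differ.
QUESTIONS_CSV = """id,question
q1,How alert do you feel right now (1-5)?
q2,How many hours did you sleep last night?
"""


def run_survey(questions_file, ask):
    """Ask each question in the file and collect one response per row."""
    rows = []
    for row in csv.DictReader(questions_file):
        rows.append({"id": row["id"], "response": ask(row["question"])})
    return rows


def save_results(rows, out_file):
    """Write responses as CSV so no hand-coding of data is needed later."""
    writer = csv.DictWriter(
        out_file, fieldnames=["id", "response"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Canned answers stand in for a participant; swap in input() for real use.
canned = iter(["4", "7.5"])
results = run_survey(io.StringIO(QUESTIONS_CSV), lambda question: next(canned))

out = io.StringIO()
save_results(results, out)
print(out.getvalue(), end="")
```

Because the responses are written straight to a data file, they are ready for analysis as soon as the participant finishes.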

Why make the software open source when similar applications are making money as proprietary tools? “I felt the community was turning over the keys to the scientific kingdom to vendors whose best interest was in keeping the doors locked,” Mueller says. “This, to me, is anti-scientific, because it means that you can’t share your experiments easily, unless the person you are sharing with buys the license. And you can’t check others’ experiments for errors, which is especially true for boutique companies that sell special-purpose test batteries. Plus, if your license lapses for whatever reason, you don’t have access to your own past experiments. Data are not much good if you can’t reproduce the conditions under which they are collected.”

Mueller chose to host on SourceForge because “it provides a level of permanence that hosting on your own site cannot, and a level of independence that hosting at a university cannot. Plus, SourceForge offers a number of useful tools (mailing lists, wiki, CVS, web hosting, etc.) to help grow a community around a piece of software.”

In the next version of PEBL, Mueller plans to work on support for various devices and trigger mechanisms that researchers use to link their experiment software with hardware such as eyetrackers, response buttons, and EEG systems. He makes new releases of the core software about once a year, with releases to the test battery coming about twice a year. He welcomes help with translations. “Many of the experiments I distribute can be localized into different languages, and researchers do this, but I don’t get many translations contributed back to the project.”

He’d also like help with validation studies. “One of the biggest obstacles researchers face when considering PEBL Battery tests is that there are currently only a few published studies showing performance distributions of typical research participants in these specific tasks. This is improving, with ongoing studies collecting norms for different tests, but it is really an ideal setting for open source collaboration, where researchers from multiple sites, world-wide, can contribute small studies to a large pool so that better norms can be developed.”

If you’d like to help with the project, e-mail the pebl-list or pebl-norms list.