Project of the Month, March 2006

CMU Sphinx

Description of project

CMUSphinx is a collection of several incarnations of Sphinx, a versatile continuous speech recognition toolkit from the Sphinx group at Carnegie Mellon University in Pittsburgh. It consists of two major kinds of components: trainers and decoders. The trainers (SphinxTrain and SimpleLM) are used to build acoustic and language models. These models are one input used by the various Sphinx decoders to transcribe digital audio. The decoders (Sphinx2, Sphinx3, and Sphinx4) perform the actual speech recognition. CMUSphinx is versatile, in that it can be applied to small, medium, and large vocabulary speech recognition applications.

Trove info

  • Operating System: All 32-bit Microsoft Windows (95/98/NT/2000/XP), all POSIX (Linux/BSD/UNIX-like OSes), Mac OS X
  • Hardware Requirement: Intel, AMD, Sun SPARC
  • Programming Language: Sphinx2, Sphinx3, and SphinxTrain: C, with scripts in Perl and some in sh, running on Linux, Windows, and Mac OS X. Sphinx4: Java, for any platform where a Java JRE is available.
  • License: BSD-like
  • Language: English

Why and how did you get started?

CMUSphinx got started in 2000 when the Sphinx group at Carnegie Mellon, which had been working on speech recognition since 1987, decided that the open source community could benefit from having full access to a speech recognizer, and the recognizer could benefit from having more people looking at the code.

Eric: After CMUSphinx was underway, my officemate Mosur Ravishankar (a.k.a. Ravi) and I decided that the Sphinx research system needed some changes in order to support state-of-the-art acoustic modeling and search techniques. Ravi went off to build the Sphinx3 speech recognizer, and I went off to start what was to become SphinxTrain.

Peter: Sun Microsystems initially funded Sphinx4 as a demonstration of the performance of Java. Speech recognition was viewed as a problem that requires very complex algorithms, large amounts of memory, and large amounts of processing time. Writing a large vocabulary speech recognizer in Java that could be directly compared to a similar engine written in C was viewed as a tour-de-force demonstration that Java was neither "big" nor "slow". In fact, even though Sphinx4 is far more flexible than Sphinx3, it has superior performance on several tests.

What is the software's intended audience?

Speech recognition researchers and application developers.

How many people do you believe are using your software?

We've had 155,000 downloads. That could translate to anything from a few hundred to 20,000 users.

What are a couple of notable examples of how people are using your software?

CMU uses Sphinx extensively, including one project researching Mandarin speech recognition. MERL is using Sphinx4 to research information retrieval using spoken queries. Other users include TRAINS and TRIPS (Sphinx3), the Robosapien Dancing Machine (Sphinx3), and the AiSee system (Sphinx4).

What gave you an indication that your project was becoming successful?

In the past, answering questions was mainly the administrators' business. Now we see experienced users are willing to help as well, which is a good thing, since the volume of questions from all over the world is increasing -- as is the number of open source projects using one of the Sphinxes. We were also happy to see that our performance in DARPA-sponsored evaluations compared favorably with other state-of-the-art recognizers.

What has been your biggest surprise?

Arthur: It is very difficult to take care of both users' and funded projects' needs.

Evandro: Finding time to contribute, finding ways of making Sphinx more usable by making the information accessible, or making data available so people can use it.

Ravi: Getting enough manpower together to put out high-quality documentation.

Eric: Acoustic modeling is a complicated multi-step process where all of the steps need to be configured and executed correctly or you can easily get garbage or suboptimal results. My biggest challenge has been demonstrating that algorithms/setups are correct and optimal when there is no gold standard to compare against other than the performance other people have been able to achieve.

Why do you think your project has been so well received?

Sphinx is a state-of-the-art system, and it can be used in commercial systems. Its flexible architecture makes Sphinx4 the ideal platform for trying innovative algorithms and building new types of applications. We also do a good job of support and maintenance, which are probably more important to users than the initial effort of development. Key developers are willing to help with users' questions.

Where do you see your project going?

We provide different recognizers for different situations, and different tools to support building other resources for the recognizers, such as acoustic models and language models. When SourceForge provides full support for Subversion, CMU Sphinx will use it for several major code-restructuring jobs across different modules in the codebase. For example, Sphinx2, Sphinx3, and SphinxTrain have some duplication problems. We are determined to eliminate them in the future. Sphinx4 needs a trainer. Sphinx4 is a great platform for building distributed and server-based speech recognition. Hopefully, it will enable new types of applications for the Internet, handheld, and cell phone markets.

What's on your project wish list?

Arthur:

  1. Build a dictation machine.
  2. Enhance the current decoding and training algorithms.
  3. Use more sophisticated techniques in acoustic modeling.
  4. Make Sphinx support multiple languages.

Evandro: More systematic testing of Sphinx2 and SphinxTrain.

Peter: "Productize" the code so that Sphinx4 can be used in real products and services.

Ravi: Incorporate new acoustic and language modeling techniques that are currently missing from the open source version.

Eric: Design a common API for system objects shared by SphinxTrain, Sphinx2, and Sphinx3. Use more OO design and C++.

What are you most proud of?

The fact that people actually use things we helped to build.

If you could change something about the project, what would it be?

I wish that Sphinx3 could have a better user interface and developer support. We would also like to enlarge the number of contributors.

How do you coordinate the project?

All of the developers are experts in their own areas, so it is pretty hard to say who the true leader is. Everyone receives email about CVS commits, and we give feedback on the changes as soon as we can. We have strong regression testing on the Sphinx3 and Sphinx4 projects, which helps avoid unexpected behavior changes in the code. The developers on the CMU side meet regularly with Sphinx's developers.

Do you work on the project full-time, or do you have another job?

We all work part-time on this project.

If you work on the project part-time, how much time would you say you spend, per week, on it?

From zero to 10 to 30 hours a week. It can be bursty.

What is your development environment like?

Machine: Intel 2.8GHz P4, Apple iBook, clusters in Speech group

OS: Windows XP, Red Hat Linux 9, Mac OS X.

Tools: gcc, Intel CC, Microsoft Visual C++ 6 and Visual C++ .NET, J2SE, Ant, xemacs, Eclipse, gdb, JetBrains IDEA 1.5, KDevelop

Milestones:

  • 2000/01/27 CMUSphinx transition to open source

Sphinx 4

  • 2004/09/28 Sphinx4 1.0 beta released
  • 2004/06/04 Sphinx4 0.1 alpha released

Sphinx 3

  • 2005/01/13 Sphinx3.5 released
  • 2004/07/13 Sphinx3.4 released

Sphinx 2

  • 2005/10/13 Sphinx2.6 released
  • 2004/07/31 Sphinx2.5 released
  • 2001/12/17 Sphinx2.4 released

Sphinx Train

  • 2001/06/07 SphinxTrain Acoustic Model Trainer released

PocketSphinx

  • 2005/11/08 PocketSphinx released

How can others contribute?

We do not yet have clear rules as to how to add someone new as a developer, so things go on a case by case basis. If you are interested, send an email to cmusphinx-contacts at lists dot sourceforge dot net.



Project leaders:

Name: Willie Walker
Age: 41
Occupation: Software engineer
Education: BS in CS, summa cum laude, Virginia Tech, 1988
Experience: 16 years in Unix accessibility for people with disabilities
Location: Hollis, NH
Employer: Sun Microsystems

Key developers:

Name: Arthur Chan
Age: 29
Occupation: Senior Research Programmer
Education: Forgotten
Location: Pittsburgh
Employer: Carnegie Mellon University

Name: Evandro Gouvea
Age: 38
Occupation: Researcher
Education: Ph.D., Carnegie Mellon University
Location: Pittsburgh, Pennsylvania
Employer: Carnegie Mellon University

Name: Paul Lamere
Age: 46
Occupation: Software developer / researcher
Education: MSCS, Boston University
Location: Nashua, New Hampshire
Employer: Sun Microsystems Inc.

Name: Mosur Ravishankar (a.k.a. Ravi Mosur)
Age: 50
Occupation: Research Faculty, Carnegie Mellon University
Education: Ph.D., Carnegie Mellon University
Location: Pittsburgh, PA

Name: Eric Thayer
Age: 41
Occupation: Director, Speech Engineering (20 years building software, 12 years at CMU), Soliloquy Learning, Inc.
Location: Waltham, MA

Name: Peter Wolf
Age: 46
Occupation: Software engineer
Education: B.S. in CS, Yale University '82
Location: Boston

Quote about SourceForge.net?

SourceForge.net makes it easier to concentrate on development rather than on system maintenance by providing easy means of distributing information about our project. SourceForge provides us with a reliable, convenient, and painless way to share speech recognition research with the world.

Why did you place the project on SourceForge.net?

Low cost, simplicity, popularity, ease of access, high availability, large number of features: individual project Web pages, mailing lists, project forums, bug lists, source code control, etc.

How has SourceForge.net helped you?

It makes it easier to have the CVS repository in a publicly accessible site, and to release code. When I'm looking for a program or a library, the first place I look is SF, and I invariably find what I'm looking for.

The number one benefit of using SourceForge.net is:

It has all the infrastructure we need. Without a central repository that was accessible to developers from many organizations it would have been impossible to complete this project. SF takes care of all the hard work and allows us to concentrate on the project.