Title: README for SonasoundP
Author: Richard Lemieux
This software is provided as is and without any warranty of any kind and
under the GNU copyright agreement.
This software is a follow-up to Niklas Werner's SonaSound, which is distributed
under the GNU copyright from the sourceforge.net site, and it includes material
copyrighted by Niklas Werner.
Differences between Niklas's original SonaSound and SonasoundP are listed
in the initial entry of the ChangeLog file.
SonasoundP uses a simple pipeline architecture that should provide a smooth
user experience on many types of computers but this simple design results in
glitches and stuttering of the sonogram display in some cases. The current
design is more resilient than the earlier design was, and the hope is that the
next design will improve on the current design still further.
Sonasound allows a wide range of choices for the display speed, FFT size and
sample rate, and some combinations of parameters may give poor results depending
on the computer and the operating system. The idea is that one can hopefully
find a usable combination of parameters appropriate to the computer/operating
system combination in use.
This project uses software and information from sources including:
- PortAudio <www.portaudio.com/>
This makes audio input very simple.
- FLTK <www.fltk.org/>
The portable lightweight windowing system.
- FFTW <www.fftw.org/>
A fantastic library of extremely efficient FFT functions.
- MINGW <www.mingw.org/>
The GNU development system on Windows.
- GLFW <www.glfw.org/>
The library of timing functions.
- Microsoft <http://msdn.microsoft.com/>
Excellent documentation on Windows APIs.
- Windows community forums such as
<stackoverflow.com> and many others.
- Linux documentation teams and community forums.
and of course so many other people and projects.
SonasoundP connects to (mono) sound input from the sound card or a USB
microphone and shows a real-time sonogram on the screen.
The intent is to allow students of foreign languages to see features of
the sound signal as they speak.
I have been using SonasoundP while learning Chinese pronunciation. I have used
it in four different ways:
1. Use the logarithmic FFT display to practice the tones,
2. Use the logarithmic FFT display to practice unvoiced consonants such as 'd',
3. Make sure to separate the syllables with no liaison in between, and
4. Use the linear LPC display to visualize the formants and compare with
native speakers. The formants appear as lines on the LPC display and
represent resonances in the vocal tract. The resonances depend on the
shape of the vocal tract as a result of the position of the tongue and
other parts of the vocal tract.
The formants do not depend on tones. In fact you can see some of the
formants even when you don't voice speech if the microphone is close
enough to the mouth. While vocal cords produce a sound of one frequency
and harmonics, unvoiced sound is mostly noise and it includes a wide
range of frequencies; the flow noises are not white noise and don't come
from the vocal cords location though. The vocal tract removes
frequencies from the initial mix. So, move all parts of the mouth in all
kinds of combinations to see which formants result. You can also see the
formants on the FFT display, but then make your task easier by generating
more harmonics by lowering the tone or try unvoiced speech.
1. A video card that provides hardware accelerated OpenGL. Some OpenGL
implementations will allow synchronization between the program and the
monitor and others won't.
2. A fast enough processor. SonasoundP uses nearly 20% of one CPU on my
2.4 GHz machine. So I guess the machine should be at least 500 MHz.
INSTALLATION OF SonasoundP
See the INSTALL file for compilation instructions. The basic problem is to
install all needed libraries and dependencies and edit the appropriate
Otherwise this distribution includes a binary compiled for Windows and ready
1. Start the program. Two windows will appear on the computer display:
a Controls window and a Panels window.
2. In the Controls window, you need to
1. Click on button 'Audio device' to select the input audio device to
connect to. If the microphone you want to connect to is not listed,
it may be because some other program such as Skype is already using
it; close other programs that may be using the microphone and
The audio device will likely be either the microphone jack of the
sound card or the USB microphone attached to the webcam.
Then check 'Sampling rate' just on the right. Click on the arrow
head and select 16,000 or a number close to 16,000.
2. Click button 'Open/Close Device' to connect to the device you just
3. Click button 'Start/Stop All' to start the sound processing and
the drawing on the panel window.
3. Start talking and watch the spectrogram on the panels window.
1. Make sure the sound is loud enough to fill half of the top signal
window. Adjust with the computer mixer program. In Windows click on
'Sound/Recording' and then on 'Microphone properties/Levels'. In Linux it
depends on the distribution. I use 'alsamixer' in an xterm terminal to
adjust the microphone level.
2. The panels in the Panels window can be resized by dragging the panel
boundaries with the mouse.
3. Click the left mouse button when the cursor is over a panel to
freeze / unfreeze the display.
1. It is still better to stop the display (blue button) before resizing the
sonogram panel, otherwise the program may crash when slowly decreasing
the size of the lower panel.
2. The sonogram display appears to move too slowly or the time display shown
when moving the mouse in the sonogram panel is too small. This is an
indication that the video card is imposing VBlank synchronization and is
not allowing applications to set/unset it under program control.
Sonasound needs the capability to control by itself the synchronization
with the monitor refresh rate.
Under Linux with a NVIDIA video card I need to unset the 'Sync to
VBlank' option using program 'nvidia-settings' under 'X Screen / OpenGL
On Windows with an Intel video controller I have not met this issue
yet. If using an NVIDIA card, open the NVIDIA control panel and select
'Use the 3D application setting' under 'Manage 3D settings / Vertical
3. The animation stuttering can be removed by selecting VBlank using the
bottom right button on the panelsWindow. The result may depend on the
computer and the operating system.
More controls on the Controls window.
Sampling Rate: I use 16,000 samples per second for speech. This is
good enough for language training. If you use a large number such as 44100
you will need to increase the 'FFT size' and 'LPC size' in proportion.
The sampling rate is what determines the 'scale' between samples and
seconds. The signal processing algorithms act on samples. The relation
between those results and the world of sound is set by the sampling rate.
A word of caution. In Windows the selection provided in the 'Sampling rate'
menu reflects 'resampled' signals and may have no relation with the actual
bandwidth of the microphone or the actual sample rate provided by the USB
microphone. For example I use a USB camera microphone returning samples
at a fixed 16,000 samples/s rate, but Windows offers 48,000 in the selection.
FFT Size: 512 or 1024 is OK. You may change the size to your liking. This
is the size of the sample window used to compute the FFT or the LPC
parameters. Don't forget to increase the 'Sampling rate' if you want more
LPC size: Used when 'Spectrum Type' is LPC. A good starting value is 25 if
'Sampling rate' is 16,000 samples/second and the 'FFT size' is 1024.
You may want to experiment a bit here.
FFT window type: This has nothing to do with the computer monitor. Think
of it as a curtain overlaid over each successive segment of the audio
signal and attenuating the signal so it is zero on the edges and beyond.
All choices except 'Rectangular' are good here. 'Rectangular' means
that the audio signal is left unchanged in the window and is zero
everywhere else; this choice results in many artifacts in the sonogram.
The FFT is computed on the resulting windowed signal.
The width of the window is the 'FFT size'.
Spectrum Type: Select either FFT or LPC. FFT is OK. LPC with an
appropriate value of 'LPC size' might give you a clearer display of the
formants. When the pitch is high it may be hard to see some formants
on the FFT display. However, there are times when the FFT display is
more informative.
Display Grid: 'logarithm' is the default, 'log_colored' gives a
blue/red colored display.
Ear response curve: Selects an additional attenuation (dB) to be
added to the power spectrum to model the ear response.
No correction: The attenuation is 0 dB at all frequencies.
Human 1: The attenuation reflects the human ear response curve according
to measurements published in: Masakazu Konishi, "How the owl
tracks its prey.", 1973, American Scientist.
Buffer Size: This is a read-only number. This tells how many new
samples move in the signal window at successive sonogram
computations. The same number of samples move out of the window
at the other end.
Draw staff lines: Whether or not you need the staff lines.
Batch PS: This control has no effect on how the sound is processed. It
controls the way the power spectra computations are scheduled. When
the tick is marked, all the power spectra displayed in one paint cycle
are computed in a single batch without exchanging signals with other
threads. When the tick is unmarked, a signal is sent when a new power
spectrum will be needed. Given the fact that Windows and Linux are not
real-time operating systems it is better to limit the message traffic
as much as possible and select the batch mode here.
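As an illustration of the 'FFT window type' and 'Spectrum Type' controls
described above, here is a minimal Python sketch. It is not taken from the
SonasoundP sources. It assumes the textbook symmetric Hann window (the program's
own window variants may differ) and uses a slow, naive DFT instead of FFTW. The
function names are made up for this example. It compares how much power leaks
far away from a tone with the rectangular and the Hann window.

```python
import cmath
import math


def hann_window(size):
    """Textbook symmetric Hann window: zero at both edges, one in the middle."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (size - 1))
            for n in range(size)]


def rectangular_window(size):
    """Leaves the signal unchanged inside the window."""
    return [1.0] * size


def power_spectrum(segment, window):
    """Power of a naive DFT of the windowed segment, for bins 0 .. size/2."""
    size = len(segment)
    windowed = [s * w for s, w in zip(segment, window)]
    powers = []
    for k in range(size // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / size)
                  for n, x in enumerate(windowed))
        powers.append(abs(acc) ** 2)
    return powers


def to_db(power, reference):
    """Power ratio in decibels, guarded against log of zero."""
    return 10.0 * math.log10(max(power, 1e-300) / reference)


if __name__ == "__main__":
    size = 64
    # A tone that falls between two DFT bins (10.5 cycles per window),
    # which is the worst case for leakage with the rectangular window.
    tone = [math.sin(2.0 * math.pi * 10.5 * n / size) for n in range(size)]
    for name, window in (("rectangular", rectangular_window(size)),
                         ("hann", hann_window(size))):
        spectrum = power_spectrum(tone, window)
        print(name, "leakage at bin 30: %.1f dB"
              % to_db(spectrum[30], max(spectrum)))
```

The Hann line should report a much lower leakage level than the rectangular
line, which is the kind of artifact reduction the 'FFT window type' entry
refers to.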
What's on the Panels window.
Top panel: Shows a segment of the signal but not necessarily of the same
size as the FFT Size.
Middle panel: This shows the power spectrum computed from the FFT of the
latest signal window.
Right slider: This adjusts the gain of the sonogram panel.
Eventually, it will also adjust the gain of the FFT panel.
Lower panel: This is the sonogram panel. This is a sequence of power
spectra with the power values translated into color or gray level
pixels. The horizontal axis shows the time and the vertical axis the
frequency. Move the cursor in this window and see the values of time
and frequency printed in the bar on the bottom of the window.
Bottom left selector: This provides a choice of three ways to select the
paint refresh rate.
VBlank: Painting is done synchronized with the monitor VBlank signal if
Windows: The windowing system generally knows the refresh rate selected
for the monitor. This is currently the value used at program startup
but the default will eventually be VBlank. If the windowing system can't tell
the monitor frequency, this choice will revert to manual mode and
a 60 Hz refresh rate.
Manual: Use the slider on the right to select a refresh rate of your
choice. The painting is then synchronized by the computer clock
instead of the monitor.
Bottom slider: Sets the refresh rate. This number should be the monitor
refresh rate but you can use other values. When Sonasound starts it
tries to set this number to the value of the monitor refresh rate as
known to Windows or X11. If Sonasound can't get the refresh rate that
way, it uses 60 frames per second.
Setting that slider to '0' activates 'VBlank sync' or tries to. Then
Sonasound uses the measured value of the frame rate. Depending on the
hardware combination 'VBlank sync' may not work with Sonasound. This
is still a tricky issue.
Bottom right spinner: Sets the number of FFTs or LPCs to be painted at
once in the video buffer (typically every monitor refresh cycle). The
spinner allows a selection between 1 and 16. A typical LCD monitor
will refresh every 1/60th second. The number here tells how many
pixels to slide the sonogram panel towards the left. If the number is
4 then the panel is slid 4 pixels to the left every 1/60th second,
and four pixel columns are drawn one for each successive power
spectrum. In this example power spectra need to be computed at the
rate of 240 FFT/LPC per second (240 = 60 times 4).
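The arithmetic behind these settings can be sketched in a few lines of Python.
This is only an illustration, not code from SonasoundP, and the function names
are invented here. The last function assumes that the number of new samples per
spectrum (the read-only 'Buffer Size') is simply the sampling rate divided by
the number of spectra per second; the program may round or compute it
differently.

```python
def spectra_per_second(refresh_rate_hz, columns_per_refresh):
    """Power spectra needed per second, e.g. 60 refreshes x 4 columns."""
    return refresh_rate_hz * columns_per_refresh


def window_duration_ms(sample_rate_hz, fft_size):
    """Length in milliseconds of the signal window used for one FFT/LPC."""
    return 1000.0 * fft_size / sample_rate_hz


def bin_width_hz(sample_rate_hz, fft_size):
    """Frequency spacing between adjacent FFT bins."""
    return sample_rate_hz / fft_size


def samples_per_spectrum(sample_rate_hz, refresh_rate_hz, columns_per_refresh):
    """Assumed hop: new samples entering the window between two spectra."""
    return sample_rate_hz / spectra_per_second(refresh_rate_hz,
                                               columns_per_refresh)


if __name__ == "__main__":
    print(spectra_per_second(60, 4))              # spectra per second
    print(window_duration_ms(16000, 1024))        # window length in ms
    print(bin_width_hz(16000, 1024))              # Hz per FFT bin
    print(samples_per_spectrum(16000, 60, 4))     # assumed hop in samples
```

With a 16,000 samples/s input and an FFT size of 1024, each window covers 64 ms
of sound, so successive windows overlap heavily when 240 spectra are drawn per
second.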
Other practical points
* Some of the parameters may be set on the command line when starting
the program. Try 'sonasound -h' to get the list of available parameters.
Here is how I start sonasoundp,
sonasoundp -D 5 -f 1024 -w 4 -g l -d g -l 240
In my case Device 5 is the webcam microphone,
-f 1024 gives an FFT size of 2048,
-w 4 for the Hanning asymmetric window,
-g l for LPC processing
-d g for the logarithm display
-l 240 for an LPC size of 240.
* Currently SonasoundP handles only Mono (as opposed to Stereo or
I publish updates when I learn new ways to improve the performance and
The ultimate aim is to provide a practical tool for foreign language
students, running on popular platforms.
Support is through the public forum on the SourceForge site.
The focus at this point is to provide a stable program on main platforms.
FFT Abbreviation for Fast Fourier Transform. This also means the Fourier
transform of a window of the signal. In a broader sense it often
denotes the power spectrum of the FFT to contrast with the power spectrum
of an LPC filtered signal.
LPC Linear Predictive Coding. This denotes a simple signal filter model of
the vocal tract. The parameters of that model are estimated for each
window of the input signal. The LPC display shows the power spectrum
of the output of that estimated filter when fed with white noise.
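
To make the LPC entry more concrete, here is a minimal Python sketch of one
common way to do it: the autocorrelation method with the Levinson-Durbin
recursion. This is an assumption for illustration only; the SonasoundP sources
may estimate the filter differently, and the function names are invented here.
The last function evaluates the power spectrum of the estimated all-pole filter
driven by white noise.

```python
import cmath
import math


def autocorrelation(signal, max_lag):
    """Autocorrelation of one signal window for lags 0 .. max_lag."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a, error) with A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order."""
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error                  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= 1.0 - k * k              # remaining prediction error
    return a, error


def lpc_power_spectrum(a, gain, num_bins):
    """gain / |A(e^jw)|^2 at num_bins frequencies from 0 up to Nyquist."""
    spectrum = []
    for b in range(num_bins):
        w = math.pi * b / num_bins
        denom = sum(coef * cmath.exp(-1j * w * k) for k, coef in enumerate(a))
        spectrum.append(gain / abs(denom) ** 2)
    return spectrum


if __name__ == "__main__":
    # Test signal: a single resonance at 0.1 cycles per sample.
    window = [math.sin(2.0 * math.pi * 0.1 * n) for n in range(400)]
    order = 2                             # plays the role of 'LPC size'
    r = autocorrelation(window, order)
    a, error = levinson_durbin(r, order)
    spectrum = lpc_power_spectrum(a, error, 100)
    peak_bin = max(range(len(spectrum)), key=lambda b: spectrum[b])
    print("peak at", peak_bin / 200.0, "cycles per sample")
```

For speech, each formant shows up as one such peak, which is why a larger
'LPC size' is needed to follow several formants at once.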