Title: README for SonasoundP
Author: Richard Lemieux
Date: 2013-04-09

PROLOGUE

This software is provided as is, without warranty of any kind, under the GNU copyright agreement. It is a follow-up to Niklas Werner's SonaSound, which is distributed under the GNU copyright from the sourceforge.net site, and it includes material copyrighted by Niklas Werner. The differences between Niklas's original SonaSound and SonasoundP are listed in the initial entry of the ChangeLog file.

SonasoundP uses a simple pipeline architecture that should provide a smooth user experience on many types of computers, but this simple design results in glitches and stuttering of the sonogram display in some cases. The current design is more resilient than the earlier one, and the hope is that the next design will improve on it still further.

Sonasound allows a wide range of choices for the display speed, FFT size and sample rate, and some combinations of parameters may give poor results depending on the computer and the operating system. The idea is that one can find a usable combination of parameters for the computer and operating system in use.

ACKNOWLEDGEMENTS

This project uses software and information from sources including:

- PortAudio <www.portaudio.com/> This makes audio input very simple.
- FLTK <www.fltk.org/> The portable lightweight windowing system.
- FFTW <www.fftw.org/> A fantastic library of extremely efficient FFT functions.
- MINGW <www.mingw.org/> The GNU development system on Windows.
- GLFW <www.glfw.org/> The library of timing functions.
- Microsoft <http://msdn.microsoft.com/> Excellent documentation on Windows APIs.
- Windows community forums such as <stackoverflow.com> and many others.
- Linux documentation teams and community forums.

and of course so many other people and projects.

PURPOSE

SonasoundP connects to (mono) sound input from the sound card or a USB microphone and displays a real-time sonogram on the display. The intent is to allow students of foreign languages to see features of the sound signal as they speak.

I have been using SonasoundP while learning Chinese pronunciation. I have used it in four different ways:

1. Use the logarithmic FFT display to practice the tones,
2. Use the logarithmic FFT display to practice unvoiced consonants such as 'd',
3. Make sure to separate the syllables with no liaison in between, and
4. Use the linear LPC display to visualize the formants and compare with native speakers.

The formants appear as lines on the LPC display and represent resonances in the vocal tract. The resonances depend on the shape of the vocal tract, which is set by the position of the tongue and other parts of the vocal tract. The formants do not depend on tones. In fact you can see some of the formants even when you don't voice speech, if the microphone is close enough to the mouth. While the vocal cords produce a sound of one frequency and its harmonics, unvoiced sound is mostly noise and includes a wide range of frequencies; the flow noises are not white noise, though, and don't come from the location of the vocal cords. The vocal tract removes frequencies from the initial mix. So, move all parts of the mouth in all kinds of combinations to see which formants result. You can also see the formants on the FFT display, but then make your task easier by generating more harmonics by lowering the tone, or try unvoiced speech.

HARDWARE REQUIREMENTS

1. A video card that provides hardware accelerated OpenGL. Some OpenGL implementations will allow synchronization between the program and the monitor and others won't.
2. A fast enough processor. SonasoundP uses nearly 20% of one CPU on my 2.4 GHz machine, so I guess the machine should run at 500 MHz or more.

INSTALLATION OF SonasoundP

See the INSTALL file for compilation instructions.
The basic problem is to install all needed libraries and dependencies and edit the appropriate Makefile accordingly. Otherwise this distribution includes a binary compiled for Windows and ready to run.

OPERATING INSTRUCTIONS

1. Start the program. Two windows will appear on the computer display: a Controls window and a Panels window.
2. In the Controls window, you need to
   1. Click on button 'Audio device' to select the input audio device to connect to. If the microphone you want to connect to is not listed, it may be because some other program such as Skype is already using it; close other programs that may be using the microphone and restart SonasoundP. The audio device will likely be either the microphone jack of the sound card or the USB microphone attached to the webcam. Then check 'Sampling rate' just on the right. Click on the arrow head and select 16,000 or a number close to 16,000.
   2. Click button 'Open/Close Device' to connect to the device you just selected.
   3. Click button 'Start/Stop All' to start the sound processing and the drawing on the Panels window.
3. Start talking and watch the spectrogram on the Panels window.
   1. Make sure the sound is loud enough to fill half of the top signal window. Adjust with the computer mixer program. In Windows click on 'Sound/Recording' and then on 'Microphone properties/Levels'. In Linux it depends on the distribution. I use 'alsamixer' in an xterm terminal to adjust the microphone level.
   2. The panels in the Panels window can be resized by dragging the panel boundaries with the mouse.
   3. Click the left mouse button when the cursor is over a panel to freeze / unfreeze the display.

Possible problems.

1. It is still better to stop the display (blue button) before resizing the sonogram panel, otherwise the program may crash when slowly decreasing the size of the lower panel.
2. The sonogram display appears to move too slowly or the time display shown when moving the mouse in the sonogram panel is too small.
   This is an indication that the video card is imposing VBlank synchronization and is not allowing applications to set/unset it under program control. Sonasound needs to be able to control the synchronization with the monitor refresh rate by itself. Under Linux with an NVIDIA video card I need to unset the 'Sync to VBlank' option using program 'nvidia-settings' under 'X Screen / OpenGL Settings'. On Windows with an Intel video controller I have not met this issue yet. If using an NVIDIA card, open the NVIDIA control panel and select 'Use the 3D application setting' under 'Manage 3D settings / Vertical sync'.
3. The animation stuttering can be removed by selecting VBlank using the bottom right button on the Panels window. The result may depend on the computer and the operating system.

More controls on the Controls window.

Sampling Rate: I use 16,000 samples per second for speech. This is good enough for language training. If you use a large number such as 44100 you will need to increase the 'FFT size' and 'LPC size' in proportion. The sampling rate is what determines the 'scale' between samples and seconds. The signal processing algorithms act on samples. The relation between those results and the world of sound is set by the sampling rate.

A word of caution. In Windows the selection provided in the 'Sampling rate' menu reflects 'resampled' signals and may have no relation with the actual bandwidth of the microphone or the actual sample rate provided by the USB microphone. For example I use a USB camera microphone returning samples at a fixed 16,000 samples/s rate, but Windows offers 48,000 in the selection.

FFT Size: 512 or 1024 is OK. You may change the size to your liking. This is the size of the sample window used to compute the FFT or the LPC parameters. Don't forget to increase the 'Sampling rate' if you want more resolution here.

LPC size: Used when 'Spectrum Type' is LPC.
A good starting value is 25 if 'Sampling rate' is 16,000 samples/second and the 'FFT size' is 1024. You may want to experiment a bit here.

FFT window type: This has nothing to do with the computer monitor. Think of it as a curtain overlaid on each successive segment of the audio signal, attenuating the signal so that it is zero on the edges and beyond. All choices except 'Rectangular' are good here. 'Rectangular' means that the audio signal is left unchanged in the window and is zero everywhere else; this choice results in many artifacts in the sonogram. The FFT is computed on the resulting windowed signal. The width of the window is the 'FFT size'.

Spectrum Type: Select either FFT or LPC. FFT is OK. LPC with an appropriate value of 'LPC size' might give you a clearer display of the formants. When the pitch is high it may be hard to see some formants on the FFT display. However, there are times when the FFT display is more informative.

Display Grid: 'logarithm' is the default, 'log_colored' gives a blue/red colored display.

Ear response curve: Selects an additional attenuation (dB) to be added to the power spectrum to model the ear response.
   No correction: The attenuation is 0 dB at all frequencies.
   Human 1: The attenuation reflects the human ear response curve according to measurements published in: Masakazu Konishi, "How the owl tracks its prey.", 1973, American Scientist.

Buffer Size: This is a read-only number. It tells how many new samples move into the signal window at successive sonogram computations. The same number of samples move out of the window at the other end.

Draw staff lines: Whether or not you need the staff lines.

High pass:

Batch PS: This control has no effect on how the sound is processed. It controls the way the power spectra computations are scheduled. When the tick is marked, all the power spectra displayed in one paint cycle are computed in a single batch without exchanging signals with other threads.
When the tick is unmarked, a signal is sent each time a new power spectrum is needed. Since Windows and Linux are not real-time operating systems, it is better to limit the message traffic as much as possible and select the batch mode here.

What's on the Panels window.

Top panel: Shows a segment of the signal, but not necessarily of the same size as the FFT Size.

Middle panel: This shows the power spectrum computed from the FFT of the latest signal window.

Right slider: This adjusts the gain of the sonogram panel. Eventually, it will also adjust the gain of the FFT panel.

Lower panel: This is the sonogram panel. This is a sequence of power spectra with the power values translated into color or gray level pixels. The horizontal axis shows the time and the vertical axis the frequency. Move the cursor in this window and see the values of time and frequency printed in the bar at the bottom of the window.

Bottom left selector: This provides a choice of three ways to select the paint refresh rate.
   VBlank: Painting is synchronized with the monitor VBlank signal, if this works.
   Windows: The windowing system generally knows the refresh rate selected for the monitor. This is currently the value used at program startup, but the default will eventually be VBlank. If the windowing system can't tell the monitor frequency, this choice will revert to manual mode and a 60 Hz refresh rate.
   Manual: Use the slider on the right to select a refresh rate of your choice. The painting is then synchronized by the computer clock instead of the monitor.

Bottom slider: Sets the refresh rate. This number should be the monitor refresh rate but you can use other values. When Sonasound starts it tries to set this number to the value of the monitor refresh rate as known to Windows or X11. If Sonasound can't get the refresh rate that way, it uses 60 frames per second. Setting that slider to '0' activates 'VBlank sync' or tries to. Then Sonasound uses the measured value of the frame rate.
Depending on the hardware combination 'VBlank sync' may not work with Sonasound. This is still a tricky issue.

Bottom right spinner: Sets the number of FFTs or LPCs to be painted at once in the video buffer (typically every monitor refresh cycle). The spinner allows a selection between 1 and 16. A typical LCD monitor will refresh every 1/60th second. The number here tells how many pixels to slide the sonogram panel towards the left. If the number is 4 then the panel is slid 4 pixels to the left every 1/60th second, and four pixel columns are drawn, one for each successive power spectrum. In this example power spectra need to be computed at the rate of 240 FFT/LPC per second (240 = 60 times 4).

Other practical points

* Some of the parameters may be set on the command line when starting the program. Try 'sonasound -h' to get the list of available parameters. Here is how I start sonasoundp,

     sonasoundp -D 5 -f 1024 -w 4 -g l -d g -l 240

  In my case Device 5 is the webcam microphone, -f 1024 gives an FFT size of 2048, -w 4 for the Hanning asymmetric window, -g l for LPC processing, -d g for the logarithm display, -l 14 for a LPC size of 240.
* Currently SonasoundP handles only Mono (as opposed to Stereo or multi-channel) sound.

FUTURE DEVELOPMENT

I publish updates when I learn new ways to improve the performance and functionality. The ultimate aim is to provide a practical tool for foreign language students, running on popular platforms.

SUPPORT

Support is through the public forum on the SourceForge site. The focus at this point is to provide a stable program on the main platforms.

NOTES

FFT
Abbreviation for Fast Fourier Transform. This also means the Fourier transform of a window of the signal. In a broader sense it often denotes the power spectrum obtained from the FFT, in contrast with the power spectrum of an LPC filtered signal.

LPC
Linear Predictive Coding. This denotes a simple signal filter model of the vocal tract.
The parameters of that model are estimated for each window of the input signal. The LPC display shows the power spectrum of the output of that estimated filter when fed with white noise.
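
The LPC note above can be illustrated with a short sketch. This is not SonasoundP's code: it is a minimal Python example assuming the common autocorrelation method with the Levinson-Durbin recursion, a symmetric Hann window, and a made-up test signal (white noise passed through a single resonance at 1000 Hz). All function names, the model order of 4 and the test signal are illustrative assumptions. It estimates the filter from one 1024-sample window at 16,000 samples/s and reports where the power spectrum of the estimated filter peaks, which is what would show up as a formant line.

```python
import math
import random


def hann(n):
    """Symmetric Hann window of length n (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def autocorr(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    return [sum(x[i] * x[i + k] for i in range(len(x) - k))
            for k in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion on autocorrelation values r.

    Returns (a, err): a = [1, a1, ..., a_order] are the coefficients of the
    prediction-error filter A(z), err is the residual (excitation) power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc_spectrum_db(a, gain, n_bins):
    """Power spectrum in dB of the all-pole filter, gain / |A(e^jw)|^2.

    This is the spectrum of the filter output when fed white noise of
    power `gain`. Bin b corresponds to b / n_bins times the Nyquist rate.
    """
    spec = []
    for b in range(n_bins):
        w = math.pi * b / n_bins
        re = sum(c * math.cos(w * k) for k, c in enumerate(a))
        im = -sum(c * math.sin(w * k) for k, c in enumerate(a))
        spec.append(10.0 * math.log10(gain / (re * re + im * im)))
    return spec


def demo_peak_hz(sample_rate=16000, window=1024, order=4,
                 resonance_hz=1000.0, seed=1):
    """Estimate an LPC filter from synthetic noise and return its peak (Hz)."""
    rng = random.Random(seed)
    theta = 2.0 * math.pi * resonance_hz / sample_rate
    r = 0.97  # pole radius of the hypothetical test resonance
    y1 = y2 = 0.0
    signal = []
    for _ in range(window + 200):
        y = rng.gauss(0.0, 1.0) + 2.0 * r * math.cos(theta) * y1 - r * r * y2
        y2, y1 = y1, y
        signal.append(y)
    # Drop the start-up transient, then apply the window to one frame.
    frame = [s * c for s, c in zip(signal[200:], hann(window))]
    a, err = levinson(autocorr(frame, order), order)
    n_bins = 256
    spec = lpc_spectrum_db(a, err, n_bins)
    peak = max(range(n_bins), key=lambda b: spec[b])
    return peak * (sample_rate / 2.0) / n_bins


if __name__ == "__main__":
    print("LPC spectrum peak near %.0f Hz" % demo_peak_hz())
```

The order used here is deliberately small because the test signal has a single resonance; speech has several formants, which is why larger values of 'LPC size' are suggested in the controls description.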
Source: README, updated 2013-04-09