Title: README for SonasoundP
Author: Richard Lemieux
Date: 2013-04-09
Last time reviewed: 2015-08-10
Copyright: 2013, 2014 Richard Lemieux

SonasoundP Copyright (C) 2009-2014 Richard Lemieux under GNU GPL Version 3 and higher versions.
Sonasound Copyright (C) 2002-2003 Niklas Werner

READY TO RUN PROGRAMS FOR WINDOWS

sonasoundp.exe: The program runs well on my non-throttled Windows 10 workstation. This build also works fine on an older LE1700 tablet PC featuring a dual-core Core 2 CPU and an Intel video chipset. OpenGL does not support VBlank synchronization on that older platform, but the program works well in Manual sync mode with the bottom slider of the Panels window set to 60.

SonasoundP may segfault the first time it runs. That may be related to the fact that the program creates and uses a small file to hold FFTW3 parameters; the file is named "sonasound.fftw_wisdom3_threads". Otherwise, SonasoundP looks quite stable.

sonasound.exe (deprecated): This older edition ran on Windows 8 and earlier versions of Windows.

The Windows programs are compiled using a MinGW setup. I haven't yet figured out how to make a ready-to-run program for the Linux version.

PROLOGUE

This software is provided as is, without any warranty of any kind, under the GNU GPL. This software is a follow-up to Niklas Werner's SonaSound, which is distributed under the GNU GPL from the sourceforge.net site, and it includes material copyrighted by Niklas Werner. Differences between Niklas's original SonaSound and SonasoundP are listed in the initial entry of the ChangeLog file.

SonasoundP uses a simple pipeline architecture that should provide a smooth user experience on many types of computers, but this simple design results in glitches and stuttering of the sonogram display in some cases. The current design is more resilient than the earlier one, and the hope is that the next design will improve on it still further.
Sonasound allows a wide range of choices for the display speed, FFT size and sample rate, and some combinations of parameters may give poor results depending on the computer and the operating system. The idea is that you can hopefully find a usable combination of parameters for the computer/operating system combination in use.

ACKNOWLEDGEMENTS

This project uses software and information from sources including:
 - PortAudio <www.portaudio.com/> This makes audio input very simple.
 - FLTK <www.fltk.org/> The portable lightweight windowing system.
 - FFTW <www.fftw.org/> A fantastic library of extremely efficient FFT functions.
 - MINGW <www.mingw.org/> The GNU development system on Windows.
 - GLFW <www.glfw.org/> The library of timing functions.
 - Microsoft <http://msdn.microsoft.com/> Excellent documentation on the Windows APIs.
 - Windows community forums such as <stackoverflow.com> and many others.
 - Linux documentation teams and community forums.
and of course so many other people and projects.

PURPOSE

SonasoundP connects to (mono) sound input from the sound card or a USB microphone and displays a real-time sonogram on the screen. The intent is to allow students of foreign languages to see features of the sound signal as they speak.

I have been using SonasoundP while learning Chinese pronunciation. I have used it in four different ways:
 1. Use the logarithmic FFT display to practice the tones.
 2. Use the logarithmic FFT display to practice unvoiced consonants such as 'd'.
 3. Make sure to separate the syllables with no liaison in between.
 4. Use the linear LPC display to visualize the formants and compare with native speakers.

The formants appear as lines on the LPC display and represent resonances in the vocal tract. The resonances depend on the shape of the vocal tract, which results from the position of the tongue and other parts of the vocal tract. The formants do not depend on tones.
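To make the idea of "formants as lines on the LPC display" concrete, here is a minimal sketch in Python. It is not SonasoundP's code: it assumes the textbook autocorrelation method (Hamming window, then the Levinson-Durbin recursion) and evaluates the power spectrum of the estimated all-pole filter driven by white noise, which is what the NOTES section below says the LPC display shows. The peaks of that envelope are the formant candidates. All function names, and the synthetic test signal with a single resonance at 700 Hz, are hypothetical and chosen only for illustration.

```python
import cmath
import math
import random


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag (autocorrelation method)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a, err) where A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order
    minimises the prediction-error energy err for autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_envelope_db(a, err, sample_rate, num_points=512):
    """Power (dB) of the all-pole filter sqrt(err) / A(z) from 0 Hz towards Nyquist."""
    freqs, levels = [], []
    for i in range(num_points):
        f = 0.5 * sample_rate * i / num_points
        w = 2 * math.pi * f / sample_rate
        resp = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        freqs.append(f)
        levels.append(10 * math.log10(err / abs(resp) ** 2))
    return freqs, levels


def formant_candidates(freqs, levels):
    """Frequencies of the local maxima of the envelope (the 'lines' on an LPC display)."""
    return [freqs[i] for i in range(1, len(levels) - 1)
            if levels[i - 1] < levels[i] > levels[i + 1]]


def synthetic_resonance(sample_rate, n, freq_hz, radius=0.97, seed=1):
    """Hypothetical test signal: white noise through one two-pole resonator."""
    rng = random.Random(seed)
    c1 = 2 * radius * math.cos(2 * math.pi * freq_hz / sample_rate)
    c2 = -radius * radius
    y1 = y2 = 0.0
    out = []
    for _ in range(n + 256):  # extra samples let the start-up transient die out
        y0 = rng.gauss(0.0, 1.0) + c1 * y1 + c2 * y2
        out.append(y0)
        y1, y2 = y0, y1
    return out[256:]


if __name__ == "__main__":
    fs = 16000
    sig = synthetic_resonance(fs, 1024, 700.0)
    windowed = [s * w for s, w in zip(sig, hamming(len(sig)))]
    a, err = levinson_durbin(autocorrelation(windowed, 2), 2)
    print(formant_candidates(*lpc_envelope_db(a, err, fs)))  # expect one peak near 700 Hz
```

Order 2 is used here only because the synthetic signal has a single resonance. With real speech at 16,000 samples/s, the order would presumably play the role of the 'LPC size' setting, for which this README suggests 25; a higher order lets the envelope show several formants at once.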
In fact, you can see some of the formants even when you don't voice speech, if the microphone is close enough to the mouth. While the vocal cords produce a sound with one fundamental frequency and its harmonics, unvoiced sound is mostly noise and includes a wide range of frequencies; these flow noises are not white noise, though, and they don't originate at the vocal cords. The vocal tract removes frequencies from the initial mix. So move all parts of the mouth in all kinds of combinations to see which formants result. You can also see the formants on the FFT display, but then make your task easier by lowering the tone, so that more harmonics fall within each formant, or try unvoiced speech.

Both unvoiced and voiced speech also include spectrally colored sound created by a process called venturi collapse. This process occurs when air is accelerated at a stricture between the soft tongue and another surface such as the palate or the teeth: the pressure suddenly drops, deforming the softer material and leading to a repetitive process. (See "https://en.wikipedia.org/wiki/Formant")

The sonogram display lags a bit behind the sound, and the delay depends mostly on the size of the FFT window and the size of the audio packet delivered by PortAudio. The resulting lag is about 1/10 s for a 16,000 sample/s signal and an FFT size of 1024: about 1/15 s from the FFT window (1024 / 16,000 = 0.064 s) and 1/30 s from the PortAudio packet. At 5x acceleration (5 pixel columns per refresh of a 60 Hz monitor), this lag translates into 30 pixel columns on the display. So it is not that easy to relate the video to the sound when monitoring a fast speaker.

HARDWARE REQUIREMENTS

 1. A video card that provides hardware-accelerated OpenGL. Some OpenGL implementations allow synchronization between the program and the monitor and others don't.
 2. A fast enough processor. SonasoundP uses nearly 20% of one CPU on my 2.4 GHz machine (about 480 MHz worth of processing), so I guess the processor should run at 500 MHz or more.

INSTALLATION OF SonasoundP

See the INSTALL file for compilation instructions.
The basic problem is to install all the needed libraries and dependencies and edit the appropriate Makefile accordingly. Otherwise, this distribution includes a binary compiled for Windows and ready to run.

OPERATING INSTRUCTIONS

 1. Start the program. Two windows will appear on the computer display: a Controls window and a Panels window.
 2. In the Controls window:
    1. Click the 'Audio device' button to select the input audio device to connect to. If the microphone you want is not listed, some other program such as Skype may already be using it; close the other programs that may be using the microphone and restart SonasoundP. The audio device will likely be either the microphone jack of the sound card or the USB microphone attached to the webcam. Then check 'Sampling rate' just to the right: click the arrowhead and select 16,000 or a number close to 16,000.
    2. Click the 'Open/Close Device' button to connect to the device you just selected.
    3. Click the 'Start/Stop All' button to start the sound processing and the drawing in the Panels window.
 3. Start talking and watch the spectrogram in the Panels window.
    1. Make sure the sound is loud enough to fill half of the top signal panel. Adjust the level with the computer's mixer program. In Windows, click 'Sound/Recording' and then 'Microphone properties/Levels'. In Linux it depends on the distribution; I use 'alsamixer' in an xterm terminal to adjust the microphone level.
    2. The panels in the Panels window can be resized by dragging the panel boundaries with the mouse.
    3. Click the left mouse button while the cursor is over a panel to freeze/unfreeze the display.
 4. Possible problems.
    1. The program segfaults when starting. This may occur once in a while, but the program is stable once started. This is probably caused by a variable that is not properly initialized at startup, but the culprit has escaped all code reviews up to now.
    2. The animation stutters or tears. This can be removed by selecting VBlank using the bottom right button on the Panels window. The result may depend on the computer and the operating system.
    3. The sonogram display appears to move too slowly, or the time displayed when moving the mouse over the sonogram panel is too small. This indicates that the video card is imposing VBlank synchronization and is not allowing applications to set/unset it under program control. SonasoundP needs to control the synchronization with the monitor refresh rate when using the VBlank sync mode, because it uses three GL contexts and just one of them needs to wait for the VBlank. If all three panels are made to wait, the panels together run at one third of the expected rate.
       Under Linux with an NVIDIA video card, I need to unset the 'Sync to VBlank' option in the 'nvidia-settings' program under 'X Screen / OpenGL Settings'. If using an NVIDIA card under Windows, open the NVIDIA Control Panel and select 'Use the 3D application setting' under 'Manage 3D settings / Vertical sync'. I did not encounter that issue with the Intel video chipset on my slate PC, since the older OpenGL version on that chipset does not provide any GL extension for VBlank control.

MORE CONTROLS ON THE CONTROLS WINDOW.

Sampling Rate: I use 16,000 samples per second for speech. This is good enough for language training. If you use a larger number such as 44,100, you will need to increase the 'FFT size' and 'LPC size' in proportion. The sampling rate sets the scale between samples and seconds: the signal processing algorithms act on samples, and the sampling rate relates their results to the world of sound. A word of caution: in Windows, the choices offered in the 'Sampling rate' menu reflect resampled signals and may have no relation to the actual bandwidth of the microphone or the actual sample rate provided by the USB microphone.
For example, I use a USB camera microphone that returns samples at a fixed rate of 16,000 samples/s, but Windows offers 48,000 in the selection.

FFT Size: 512 or 1024 is OK. You may change the size to your liking. This is the size of the sample window used to compute the FFT or the LPC parameters. The FFT size works together with the 'Sampling rate': the window lasts (FFT size / sampling rate) seconds and the FFT bins are (sampling rate / FFT size) Hz apart. So if you raise the 'Sampling rate', raise the FFT size in proportion to keep the same frequency resolution.

LPC size: Used when 'Spectrum Type' is LPC. A good starting value is 25 if 'Sampling rate' is 16,000 samples/second and 'FFT size' is 1024. You may want to experiment a bit here.

FFT window type: This has nothing to do with the computer monitor. Think of it as a curtain laid over each successive segment of the audio signal, attenuating the signal so that it is zero at the edges and beyond. All choices except 'Rectangular' are good here. 'Rectangular' means that the audio signal is left unchanged inside the window and is zero everywhere else; this choice results in many artifacts in the sonogram. The FFT is computed on the resulting windowed signal. The width of the window is the 'FFT size'.

Spectrum Type: Select either FFT or LPC. FFT is OK. LPC with an appropriate 'LPC size' might give you a clearer display of the formants; when the pitch is high, some formants may be hard to see on the FFT display. However, there are times when the FFT display is more informative.

Display Grid: 'logarithm' is the default; 'log_colored' gives a blue/red colored display.

Ear response curve: Selects an additional attenuation (dB) added to the power spectrum to model the ear response.
 No correction: The attenuation is 0 dB at all frequencies.
 Human 1: The attenuation reflects the human ear response curve according to measurements published in: Masakazu Konishi, "How the owl tracks its prey", American Scientist, 1973.

Buffer Size: This is a read-only number. It tells how many new samples move into the signal window between successive sonogram computations.
The same number of samples move out of the window at the other end.

Draw staff lines: Whether or not to draw the staff lines.

High pass:

Batch PS: This control has no effect on how the sound is processed. It controls the way the power spectrum computations are scheduled. When the box is ticked, all the power spectra displayed in one paint cycle are computed in a single batch without exchanging signals with other threads. When the box is unticked, a signal is sent each time a new power spectrum is needed. Since Windows and Linux are not real-time operating systems, it is better to limit the message traffic as much as possible and select the batch mode here.

WHAT'S ON THE PANELS WINDOW.

Top panel: Shows a segment of the signal, not necessarily of the same size as the FFT Size.

Middle panel: Shows the power spectrum computed from the FFT of the latest signal window.

Right slider: Adjusts the gain of the sonogram panel. Eventually, it will also adjust the gain of the FFT panel.

Lower panel: This is the sonogram panel: a sequence of power spectra with the power values translated into color or gray-level pixels. The horizontal axis shows time and the vertical axis frequency. Move the cursor in this panel to see the values of time and frequency printed in the bar at the bottom of the window.

Bottom left selector: Provides a choice of three ways to select the paint refresh rate.
 VBlank: Painting is synchronized with the monitor VBlank signal, if this works.
 Windows: The windowing system generally knows the refresh rate selected for the monitor. This is currently the mode used at program startup, but the default will eventually be VBlank. If Windows can't tell the monitor frequency, this choice reverts to Manual mode and a 60 Hz refresh rate.
 Manual: Use the slider on the right to select a refresh rate of your choice. The painting is then synchronized by the computer clock instead of the monitor.
 VB2: Experimental mode targeted at the LE1700, which has Intel video. This setup shows unexpected behavior that still needs investigation.

Bottom slider: Sets the refresh rate. This number should be the monitor refresh rate, but you can use other values. When Sonasound starts, it tries to set this number to the monitor refresh rate as known to Windows or X11. If Sonasound can't get the refresh rate that way, it uses 60 frames per second. Setting the slider to '0' activates 'VBlank sync', or tries to; Sonasound then uses the measured frame rate. Depending on the hardware combination, 'VBlank sync' may not work with Sonasound. This remains a tricky issue.

Bottom right spinner: Sets the number of FFTs or LPCs to be painted at once in the video buffer (typically every monitor refresh cycle). The spinner allows a selection between 1 and 16. A typical LCD monitor refreshes every 1/60th of a second. The number here tells how many pixels the sonogram panel slides to the left per refresh. If the number is 4, the panel slides 4 pixels to the left every 1/60th of a second, and four pixel columns are drawn, one for each successive power spectrum. In this example, power spectra must be computed at the rate of 240 FFTs/LPCs per second (240 = 60 times 4).

OTHER PRACTICAL POINTS

 * Some of the parameters can be set on the command line when starting the program. Try 'sonasound -h' to get the list of available parameters. Here is how I start sonasoundp:

     sonasound -D 2 -f 1024 -w 1 -g l -d 1 -l 25

   In my case Device 2 is the webcam microphone, -f 1024 gives an FFT size of 1024, -w 1 selects the Hanning asymmetric window, -g l selects LPC processing, -d 1 selects the linear display, and -l 25 gives an LPC size of 25.
   On Windows, if you use the precompiled program sonasound.exe, 'sonasound -h' won't print anything, since the console window is disabled in this build (linker option -mwindows). However, you can still use a command line as above to start sonasound.
   For your convenience, the output of 'sonasound -h' is included at the end of this README.
 * Currently SonasoundP handles only mono (as opposed to stereo or multichannel) sound.

FUTURE DEVELOPMENT

I publish updates when I learn new ways to improve the performance and functionality. The ultimate aim is to provide a practical tool for foreign language students, running on popular platforms.

SUPPORT

Support is through the public forum on the SourceForge site. The focus at this point is to provide a stable program on the main platforms.

NOTES

FFT  Abbreviation for Fast Fourier Transform. This also means the Fourier transform of a window of the signal. In a broader sense, it often denotes the power spectrum of the FFT, in contrast with the power spectrum of an LPC-filtered signal.

LPC  Linear Predictive Coding. This denotes a simple signal-filter model of the vocal tract. The parameters of that model are estimated for each window of the input signal. The LPC display shows the power spectrum of the output of that estimated filter when fed with white noise.

HOW TO START SONASOUNDP FROM A COMMAND LINE

    # sonasound.exe -h
    Usage: ./sonasoundp [-a Size] [-b Freq] [-d {0,1,2}] [-D {0,1,...}] [-e {0,1}] [-D {0,1,...}] [-f Size] [-g {f,l}] [-H Coeff] [-l Size] [-n {0,1}] [-s Number from list] [-v] [-w {0,...,5}]
    SonasoundP Copyright (C) 2009-2014 Richard Lemieux
    Sonasound Copyright (C) 2002-2003 Niklas Werner
    Please note that one needs to first run 'sonasound' and check the available options from the controlsWindow to find meaningful options for parameters D and s below.
    -a: FFT size. Any number up to 65535. Use "-f" instead unless you have specific needs.
    -b: Reference A frequency in [Hz] for the Staff-lines
    -d: Display grid: 0: logarithmic gray levels, 1: linear gray levels, 2: logarithmic colored
    -D: Input Audio device: First: 0, Second: 1, ... Disable: -1
    -e: Ear response curve: 0: No correction, 1: Human 1
    -f: FFT size: up to 65535 rounded to NextPowerOfTwo (512,1024,...).
        See also "a".
    -g: Spectrum type: f: FFT, l: LPC
    -H: HighPass-filter coefficient: a number in the range (0.001, 1.0) default: 0.5
    -l: LPC size: up to 65535 though only up to 256 makes sense.
    -n: Draw staff-lines: 0: No, 1: Yes.
    -s: SamplingRate: a number available for the selected audio device (44100 or 48000 ?)
    -v: Verbosity. Prints lots of data used by the programmer when he tests the program.
    -w: FFT Window type: 0: Hamming, 1: Hanning, 2: Blackman, 3: Bartlett, 4: Kaiser, 5: rectangular
    -h: Print this help

SOFTWARE REVISIONS

SVN225. Single thread mode.
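The timing figures quoted earlier in this README (the display lag in the PURPOSE section, the 240 spectra per second in the Bottom right spinner example, and the bin spacing implied by the FFT size) follow from simple arithmetic. The small Python sketch below redoes that arithmetic so you can plug in your own sampling rate, FFT size, refresh rate and spinner value. The helper names are hypothetical and are not part of SonasoundP; the 1/30 s PortAudio packet delay is taken from the README, not measured.

```python
# Hypothetical helpers, not part of SonasoundP: they only redo the README's arithmetic.


def window_seconds(fft_size, sample_rate):
    """Duration of one FFT window in seconds."""
    return fft_size / sample_rate


def bin_spacing_hz(fft_size, sample_rate):
    """Frequency spacing between adjacent FFT bins in Hz."""
    return sample_rate / fft_size


def spectra_per_second(refresh_hz, columns_per_refresh):
    """Power spectra needed per second: one per pixel column drawn."""
    return refresh_hz * columns_per_refresh


def lag_in_columns(lag_seconds, refresh_hz, columns_per_refresh):
    """Number of pixel columns that a given display lag spans on the sonogram."""
    return lag_seconds * spectra_per_second(refresh_hz, columns_per_refresh)


if __name__ == "__main__":
    fs, n = 16000, 1024
    print(window_seconds(n, fs))       # 0.064 s, roughly the 1/15 s quoted
    print(bin_spacing_hz(n, fs))       # 15.625 Hz between bins
    print(spectra_per_second(60, 4))   # 240 spectra per second
    lag = 1 / 15 + 1 / 30              # FFT window + PortAudio packet, as quoted
    print(lag_in_columns(lag, 60, 5))  # about 30 columns at 5 columns per refresh
```

For example, at 44,100 samples/s with the same FFT size of 1024, window_seconds drops to about 0.023 s while bin_spacing_hz grows to about 43 Hz, which is why the README recommends scaling the FFT size with the sampling rate.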
Source: README, updated 2016-10-13