SonasoundP - Real-time phonetics - Browse Files at SourceForge.net

Name	Modified	Size
README	2016-10-13	20.0 kB
sonasoundp.exe	2015-12-19	5.5 MB
sonasound.exe	2014-10-27	5.3 MB
Totals: 3 Items		10.8 MB
Title:      README for SonasoundP
Author:     Richard Lemieux
Date:       2013-04-09
Last time reviewed: 2015-08-10
Copyright:  2013, 2014 Richard Lemieux

SonasoundP Copyright (C) 2009-2014 Richard Lemieux under GNU GPL Version 3 and
  higher versions.

Sonasound Copyright (C) 2002-2003 Niklas Werner

READY TO RUN PROGRAMS FOR WINDOWS

  sonasoundp.exe: The program runs well on my non-throttled Windows10
    workstation.

    This build also works fine on an older LE1700 tabletPC featuring a Core2
    dual CPU and an Intel video chipset.  VBlank synchronization is not
    supported by OpenGL on that older platform but it works well in Manual sync
    mode by setting the bottom slider to 60 on the Panels window.

    SonasoundP may segfault the first time it runs.  That may be related to
    the fact that the program creates and uses a small file to hold FFTW3
    parameters.  Otherwise, SonasoundP looks quite stable.  The parameter file
    is named "sonasound.fftw_wisdom3_threads".

  sonasound.exe (deprecated): This older edition ran on Windows8 and earlier
    versions of Windows.

  The Windows programs are compiled using a MinGW setup.

  I have not yet figured out how to make a ready-to-run program for the Linux
  version.

PROLOGUE

This software is provided as is and without any warranty of any kind and
under the GNU copyright agreement.

This software is a follow-up to Niklas Werner's SonaSound, which is distributed
under the GNU copyright from the sourceforge.net site, and includes material
copyrighted by Niklas Werner.

Differences between Niklas's original SonaSound and SonasoundP are listed
in the initial entry of the ChangeLog file.

SonasoundP uses a simple pipeline architecture that should provide a smooth
user experience on many types of computers but this simple design results in
glitches and stuttering of the sonogram display in some cases.  The current
design is more resilient than the earlier design was, and the hope is that the
next design will improve on the current design still further.

Sonasound allows a wide range of choices of the display speed, FFT size and
sample rate and some combinations of parameters may give poor results depending
on the computer and the operating system.  The idea is that one can hopefully
find a usable combination of parameters appropriate to the computer/operating
system combination in use.


ACKNOWLEDGEMENTS

This project uses software and information from sources including:

  - PortAudio  <www.portaudio.com/>
      This makes audio input very simple.
  - FLTK       <www.fltk.org/>
      The portable lightweight windowing system.
  - FFTW       <www.fftw.org/>
      A fantastic library of extremely efficient FFT functions.
  - MINGW      <www.mingw.org/>
      The GNU development system on Windows.
  - GLFW       <www.glfw.org/>
      The library of timing functions.
  - Microsoft  <http://msdn.microsoft.com/>
      Excellent documentation on Windows API's.
  - Windows community forums such as
               <stackoverflow.com> and many others.
  - Linux documentation teams and community forums.

and of course so many other people and projects.


PURPOSE

SonasoundP connects to (mono) sound input from the sound card or a USB
microphone and shows a real-time sonogram on the screen.

The intent is to allow students of foreign languages to see features of
the sound signal as they speak.

I have been using SonasoundP while learning Chinese pronunciation.  I have used
it in four different ways:

  1. Use the logarithmic FFT display to practice the tones,

  2. Use the logarithmic FFT display to practice unvoiced consonants such as 'd',

  3. Make sure to separate the syllables with no liaison in between, and

  4. Use the linear LPC display to visualize the formants and compare with
     native speakers.  The formants appear as lines on the LPC display and
     represent resonances in the vocal tract.  The resonances depend on the
     shape of the vocal tract as a result of the position of the tongue and
     other parts of the vocal tract.
       The formants do not depend on tones.  In fact you can see some of the
     formants even when you don't voice speech if the microphone is close
     enough to the mouth.  While the vocal cords produce a sound of one
     frequency and its harmonics, unvoiced sound is mostly noise and includes
     a wide range of frequencies; the flow noises are not white noise and
     don't come from the location of the vocal cords, though.  The vocal tract
     removes frequencies from the initial mix.  So, move all parts of the
     mouth in all kinds of combinations to see which formants result.  You can
     also see the formants on the FFT display, but then make your task easier
     by generating more harmonics by lowering the tone, or try unvoiced
     speech.
       Unvoiced as well as voiced speech also includes spectrally colored sound
     created by a process called venturi collapse.  This process occurs when
     air is accelerated at a stricture between the soft tongue and another
     surface such as the palate or the teeth and the pressure suddenly drops,
     causing a deformation of the softer material and leading to a repetitive
     process.  (See "https://en.wikipedia.org/wiki/Formant")

   The sonogram display lags a bit behind the sound and the delay depends
   mostly on the size of the FFT window and the audio packet delivered by
   PortAudio.  The resulting lag is 1/10 s for a 16,000 sample/s signal and a
   1024 FFT size (1/15 s from the FFT window and 1/30 s from the PortAudio
   packet); this lag translates into 30 pixel columns on the display at 5x
   acceleration.
     So it is not that easy to relate the video to the sound when monitoring a
   fast speaker.
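
   The lag arithmetic above can be reproduced with a few lines of Python.
   This is only a sketch: the 60 Hz monitor refresh rate and the reading of
   "5x acceleration" as 5 pixel columns painted per refresh are assumptions,
   and the function name is made up for the illustration.

```python
def display_lag(sample_rate, fft_size, packet_seconds, refresh_hz, columns_per_refresh):
    """Return (lag in seconds, lag in pixel columns) for the sonogram display."""
    # Time needed to fill one FFT window with samples.
    window_seconds = fft_size / sample_rate
    # The audio packet delivered by PortAudio adds its own duration.
    lag_seconds = window_seconds + packet_seconds
    # Assumed: the sonogram advances columns_per_refresh pixels per refresh.
    lag_columns = lag_seconds * refresh_hz * columns_per_refresh
    return lag_seconds, lag_columns


lag_s, lag_cols = display_lag(
    sample_rate=16000,
    fft_size=1024,
    packet_seconds=1 / 30,   # packet duration quoted in the text above
    refresh_hz=60,           # assumed monitor refresh rate
    columns_per_refresh=5,   # assumed meaning of "5x acceleration"
)
print(f"lag: {lag_s:.3f} s, about {lag_cols:.0f} pixel columns")
```

   The exact window time is 1024/16000 = 0.064 s, so the result comes out
   slightly under the rounded 1/10 s and 30 columns quoted above.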

HARDWARE REQUIREMENTS

1. A video card that provides hardware accelerated OpenGL.  Some OpenGL
   implementations will allow synchronization between the program and the
   monitor and others won't.

2. A fast enough processor.  SonasoundP uses nearly 20% of one CPU on my
   2.4 GHz machine.  So I guess the machine should be at least 500 MHz.


INSTALLATION OF SonasoundP

  See the INSTALL file for compilation instructions.  The basic problem is to
  install all needed libraries and dependencies and edit the appropriate
  Makefile accordingly.

  Otherwise this distribution includes a binary compiled for Windows and ready
  to run.


OPERATING INSTRUCTIONS

  1.  Start the program.  Two windows will appear on the computer display:
      a Controls window and a Panels window.

  2.  In the Controls window, you need to

      1. Click on button 'Audio device' to select the input audio device to
         connect to.  If the microphone you want to connect to is not listed,
         it may be because some other program such as Skype is already using
         it;  close other programs that may be using the microphone and
         restart SonasoundP.
           The audio device will likely be either the microphone jack of the
         sound card or the USB microphone attached to the webcam.
           Then check 'Sampling rate' just to the right.  Click on the arrow
         head and select 16,000 or a number close to 16,000.

      2. Click button 'Open/Close Device' to connect to the device you just
         selected.

      3. Click button 'Start/Stop All' to start the sound processing and
         the drawing on the Panels window.

  3.  Start talking and watch the spectrogram on the Panels window.

      1. Make sure the sound is loud enough to fill half of the top signal
      window.  Adjust with the computer mixer program.  In Windows click on
      'Sound/Recording' and then on 'Microphone properties/Levels'. In Linux it
      depends on the distribution.  I use 'alsamixer' in an xterm terminal to
      adjust the microphone level.

      2. The panels in the Panels window can be resized by dragging the panel
      boundaries with the mouse.

      3. Click on the left mouse button when the cursor is over a panel to
      freeze / unfreeze the display.

  4.  Possible problems.

      1. The program segfaults when starting.  This may occur once in a while,
      but the program is stable once started.  This is probably caused by a
      variable that is not properly initialized at startup, but which variable
      it is has escaped all code reviews up to now.

      2. The animation stuttering/tearing can be removed by selecting VBlank
      using the bottom right button on the Panels window.  The result may
      depend on the computer and the operating system.

      3. The sonogram display appears to move too slowly or the time display
      shown when moving the mouse in the sonogram panel is too small.  This is
      an indication that the video card is imposing VBlank synchronization and
      is not allowing applications to set/unset it under program control.
        Sonasound needs the capability to control the synchronization with the
      monitor refresh rate when using VBlank synchro mode because SonasoundP
      makes use of three GL contexts and just one of them needs to wait for
      the VBlank. Otherwise if the three panels are made to wait then the
      panels together run at one third of the expected rate.
        Under Linux with a NVIDIA video card I need to unset the 'Sync to
      VBlank' option using program 'nvidia-settings' under 'X Screen / OpenGL
      Settings'.
        If using an NVIDIA card under Windows, open the NVIDIA control panel
      and select 'Use the 3D application setting' under 'Manage 3D settings /
      Vertical sync'.
        I did not encounter that issue with the Intel video chipset on my
      slate PC since the older OpenGL version used on that Intel chipset does
      not provide any GL extension for VBlank control.


MORE CONTROLS ON THE CONTROLS WINDOW.

    Sampling Rate:  I use 16,000 samples per second for speech. This is
    good enough for language training.  If you use a large number such as 44100
    you will need to increase the 'FFT size' and 'LPC size' in proportion.
      The sampling rate is what determines the 'scale' between samples and
    seconds.  The signal processing algorithms act on samples.  The relation
    between those results and the world of sound is set by the sampling rate.
      A word of caution.  In Windows the selection provided in the 'Sampling rate'
    menu reflects 'resampled' signals and may have no relation with the actual
    bandwidth of the microphone or the actual sample rate provided by the USB
    microphone.  For example I use a USB camera microphone returning samples
    at a fixed 16,000 samples/s rate, but Windows offers 48,000 in the selection.
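
      As a rough illustration of scaling "in proportion", the sketch below
    scales the 16,000 samples/s settings (FFT size 1024, LPC size 25) up to
    44,100 samples/s.  The helper names are invented for this example, and
    rounding the FFT size up to a power of two is an assumption based on the
    behaviour described for the -f option.

```python
def bin_width_hz(sample_rate, fft_size):
    """Spacing between adjacent FFT bins, in Hz."""
    return sample_rate / fft_size


def window_seconds(sample_rate, fft_size):
    """Duration of the signal covered by one FFT window."""
    return fft_size / sample_rate


def scale_in_proportion(value, old_rate, new_rate):
    """Scale a size given in samples so it covers the same duration."""
    return value * new_rate / old_rate


def next_power_of_two(x):
    """Smallest power of two that is >= x."""
    size = 1
    while size < x:
        size *= 2
    return size


old_rate, new_rate = 16000, 44100
fft_size = next_power_of_two(scale_in_proportion(1024, old_rate, new_rate))
lpc_size = round(scale_in_proportion(25, old_rate, new_rate))

print("FFT size:", fft_size, "LPC size:", lpc_size)
print("bin width at 16000/1024:", bin_width_hz(old_rate, 1024), "Hz")
print("window at 16000/1024:", window_seconds(old_rate, 1024), "s")
```

      With these inputs the window at 16,000 samples/s covers 0.064 s with
    bins 15.625 Hz apart; keeping at least that duration at 44,100 samples/s
    calls for an FFT size of 4096 and an LPC size of about 69.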

    FFT Size: 512 or 1024 is OK.  You may change the size to your liking.  This
    is the size of the sample window used to compute the FFT or the LPC
    parameters. Don't forget to increase the 'Sampling rate' if you want more
    resolution here.

    LPC size: Used when 'Spectrum Type' is LPC.  A good starting value is 25 if
        'Sampling rate' is 16,000 samples/second and the 'FFT size' is 1024.
        You may want to experiment a bit here.

    FFT window type: This has nothing to do with the computer monitor.  Think
        of it as a curtain overlaid over each successive segment of the audio
        signal and attenuating the signal so it is zero on the edges and beyond.
          All choices except 'Rectangular' are good here.  'Rectangular' means
        that the audio signal is left unchanged in the window and is zero
        everywhere else; this choice results in many artifacts in the sonogram.
          The FFT is computed on the resulting windowed signal.
          The width of the window is the 'FFT size'.
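
          The effect of the window can be seen on a small synthetic example.
        The sketch below is not taken from the SonasoundP sources: it uses a
        textbook symmetric Hann window and a naive DFT on 64 samples, and
        compares how much power leaks into a bin far away from a tone that
        falls between two FFT bins.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window: zero at both edges, close to one in the middle."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def rectangular(n):
    """Rectangular window: the signal is left unchanged inside the window."""
    return [1.0] * n


def power_spectrum(frame):
    """Naive O(n^2) DFT power spectrum; good enough for a small demo."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


N = 64
# A tone with 10.5 cycles per window, i.e. halfway between bins 10 and 11.
tone = [math.sin(2 * math.pi * 10.5 * i / N) for i in range(N)]


def leakage_db(window):
    """Power in bin 25 (far from the tone) relative to the strongest bin."""
    ps = power_spectrum([w * x for w, x in zip(window, tone)])
    return 10 * math.log10(ps[25] / max(ps))


print(f"rectangular: {leakage_db(rectangular(N)):.1f} dB")
print(f"hann:        {leakage_db(hann(N)):.1f} dB")
```

          The rectangular window leaves far more power in the distant bin,
        which is the kind of artifact that shows up in the sonogram.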

    Spectrum Type: Select either FFT or LPC.  FFT is OK.  LPC with an
        appropriate value of 'LPC size' might give you a clearer display of the
        formants.  When the pitch is high it may be hard to see some formants
        on the FFT display.  However, there are times when the FFT display is
        more informative.

    Display Grid: 'logarithm' is the default, 'log_colored' gives a
        blue/red colored display. 

    Ear response curve:  Selects an additional attenuation (dB) to be
        added to the power spectrum to model the ear response.
     No correction:  The attenuation is 0 dB at all frequencies.
     Human 1:  The attenuation reflects the human ear response curve according
               to measurements published in: Masakazu Konishi, "How the owl
               tracks its prey.", 1973, American Scientist.

    Buffer Size:  This is a read-only number.  This tells how many new
        samples move in the signal window at successive sonogram
        computations.  The same number of samples move out of the window
        at the other end.

    Draw staff lines:  Whether or not you need the staff lines.

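    High pass:  High-pass filter coefficient, a number in the range
        (0.001, 1.0) with a default of 0.5 (same as command-line option -H).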

    Batch PS: This control has no effect on how the sound is processed.  It
        controls the way the power spectra computations are scheduled.  When
        the tick is marked, all the power spectra displayed in one paint cycle
        are computed in a single batch without exchanging signals with other
        threads.  When the tick is unmarked, a signal is sent when a new power
        spectrum will be needed. Given the fact that Windows and Linux are not
        real-time operating systems it is better to limit the message traffic
        as much as possible and select the batch mode here.


WHAT'S ON THE PANELS WINDOW.

    Top panel:  Shows a segment of the signal but not necessarily of the same
        size as the FFT Size.

    Middle panel: This shows the power spectrum computed from the FFT of the
        latest signal window.

    Right slider:  This adjusts the gain of the sonogram panel.
        Eventually, it will also adjust the gain of the FFT panel.

    Lower panel: This is the sonogram panel.  This is a sequence of power
        spectra with the power values translated into color or gray level
        pixels. The horizontal axis shows the time and the vertical axis the
        frequency.  Move the cursor in this window and see the values of time
        and frequency printed in the bar on the bottom of the window.
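
        A minimal sketch of how power values could be turned into gray
        levels for one sonogram column is given below.  It is an
        illustration only, not the program's actual mapping: the 60 dB
        dynamic range, the reference power, the 8-bit gray scale and the
        function names are all assumptions.

```python
import math


def gray_level(power, reference=1.0, dynamic_range_db=60.0, logarithmic=True):
    """Map one power value to an 8-bit gray level (0 = black, 255 = white)."""
    if logarithmic:
        if power <= 0.0:
            return 0
        # 0 dB at the reference power, negative below it.
        db = 10.0 * math.log10(power / reference)
        # Assumed range: -dynamic_range_db maps to 0.0 and 0 dB maps to 1.0.
        fraction = 1.0 + db / dynamic_range_db
    else:
        # Linear display: gray level proportional to the power itself.
        fraction = power / reference
    fraction = min(1.0, max(0.0, fraction))
    return round(255 * fraction)


def sonogram_column(power_spectrum, **kwargs):
    """One pixel column of the sonogram: one gray level per frequency bin."""
    return [gray_level(p, **kwargs) for p in power_spectrum]


print(sonogram_column([1.0, 1e-3, 1e-6, 0.0]))
print(sonogram_column([1.0, 1e-3, 1e-6, 0.0], logarithmic=False))
```

        On the logarithmic scale a weak component (one thousandth of the
        reference power) still lands mid-gray, while on the linear scale it
        is practically black.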

    Bottom left selector:  This provides a choice of four ways to select the
        paint refresh rate.

        VBlank: Painting is done synchronized with the monitor VBlank signal if
          this works.

        Windows: The windowing system generally knows the refresh rate selected
          for the monitor.  This is currently the value used at program startup
          but the default will eventually be VBlank.  If the windowing system
          can't tell the monitor frequency, this choice will revert to manual
          mode and a 60 Hz refresh rate.

        Manual: Use the slider on the right to select a refresh rate of your
          choice.  The painting is then synchronized by the computer clock
          instead of the monitor.

        VB2: Experimental mode targeted for the LE1700 which has Intel video.
           This setup shows unexpected behavior which still needs investigation.

    Bottom slider: Sets the refresh rate.  This number should be the monitor
        refresh rate but you can use other values.  When Sonasound starts it
        tries to set this number to the value of the monitor refresh rate as
        known to Windows or X11. If Sonasound can't get the refresh rate that
        way, it uses 60 frames per second.
          Setting that slider to '0' activates 'VBlank sync' or tries to.  Then
        Sonasound uses the measured value of the frame rate.  Depending on the
        hardware combination 'VBlank sync' may not work with Sonasound.  This
        is still a tricky issue.

    Bottom right spinner: Sets the number of FFT's or LPC's to be painted at
        once in the video buffer (typically every monitor refresh cycle).  The
        spinner allows a selection between 1 and 16.  A typical LCD monitor
        will refresh every 1/60th second.  The number here tells how many
        pixels to slide the sonogram panel towards the left.  If the number is
        4 then the panel is slid 4 pixels to the left every 1/60th second,
        and four pixel columns are drawn one for each successive power
        spectrum.  In this example power spectra need to be computed at the
        rate of 240 FFT/LPC per second (240 = 60 times 4).
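
        The same arithmetic, as a short sketch.  The link made here between
        the spectrum rate and the number of new samples per spectrum (what
        the 'Buffer Size' field reports) is an inference from the
        descriptions in this README, not something checked against the
        source code, and the function names are invented.

```python
def spectra_per_second(refresh_hz, columns_per_refresh):
    """Number of FFT/LPC power spectra needed per second."""
    return refresh_hz * columns_per_refresh


def hop_samples(sample_rate, refresh_hz, columns_per_refresh):
    """New samples entering the signal window per spectrum (may be fractional)."""
    return sample_rate / spectra_per_second(refresh_hz, columns_per_refresh)


rate = spectra_per_second(refresh_hz=60, columns_per_refresh=4)
hop = hop_samples(sample_rate=16000, refresh_hz=60, columns_per_refresh=4)
# Fraction of a 1024-sample window shared by two successive spectra.
overlap = 1 - hop / 1024

print(f"{rate} spectra/s, {hop:.1f} new samples each, {overlap:.0%} overlap")
```

        At 16,000 samples/s this leaves only about 67 new samples per
        spectrum, so successive 1024-sample windows overlap heavily.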


OTHER PRACTICAL POINTS

  * Some of the parameters can be set on the command line when starting
    the program.  Try 'sonasound -h' to get the list of available parameters.

    Here is how I start sonasoundp:

    sonasound  -D 2 -f 1024 -w 1 -g l -d 1 -l 25

        In my case Device 2 is the webcam microphone,
                   -f 1024 gives an FFT size of 1024,
                   -w 1    for the Hanning asymmetric window,
                   -g l    for LPC processing
                   -d 1    for the linear display
                   -l 25   for an LPC size of 25.

    On Windows, if you use the already compiled program sonasound.exe,
    'sonasound -h' won't print anything since the console window is disabled in
    this build (load option -mwindows).  However you can still use a command
    line as above to start sonasound.  For your convenience, the output of
    'sonasound -h' is included at the end of this README.

  * Currently SonasoundP handles only Mono (as opposed to Stereo or
    multi-channel) sound.


FUTURE DEVELOPMENT

  I publish updates when I learn new ways to improve the performance and
  functionality.

  The ultimate aim is to provide a practical tool for foreign language
  students, running on popular platforms.


SUPPORT

  Support is through the public forum on the SourceForge site.
  The focus at this point is to provide a stable program on main platforms.


NOTES

  FFT Abbreviation for Fast Fourier Transform.  This also means the Fourier
      transform of a window of the signal.  In the extensive sense it often
      denotes the power spectrum of the FFT to contrast with the power spectrum
      of a LPC filtered signal.

  LPC Linear Predictive Coding.  This denotes a simple signal filter model of
      the vocal tract.  The parameters of that model are estimated for each
      window of the input signal.  The LPC display shows the power spectrum
      of the output of that estimated filter when fed with white noise.
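
  To make the LPC note more concrete, here is a minimal sketch of one common
  way to do it: the autocorrelation method with the Levinson-Durbin
  recursion.  Whether SonasoundP uses exactly this method is an assumption;
  the function names, the synthetic test signal (a single 2000 Hz resonance
  excited by a 160 Hz pulse train) and the small LPC size of 4 are chosen
  only to keep the example short.

```python
import cmath
import math


def autocorrelation(frame, max_lag):
    """Autocorrelation of the frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return predictor polynomial a[0..order] (a[0] = 1) and residual energy."""
    a = [1.0] + [0.0] * order
    error = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / error                      # reflection coefficient
        new_a = a[:]
        for j in range(1, m):
            new_a[j] = a[j] + k * a[m - j]
        new_a[m] = k
        a = new_a
        error *= 1.0 - k * k
    return a, error


def lpc_envelope(a, gain, n_bins):
    """Power spectrum of the all-pole filter 1/A(z) fed with white noise."""
    envelope = []
    for b in range(n_bins):
        w = math.pi * b / n_bins              # 0 .. Nyquist
        denom = sum(c * cmath.exp(-1j * w * j) for j, c in enumerate(a))
        envelope.append(gain / abs(denom) ** 2)
    return envelope


# Synthetic "voiced" frame: one resonance at 2000 Hz, pulses every 100 samples.
fs = 16000
radius, theta = 0.95, 2 * math.pi * 2000 / fs
frame, y1, y2 = [], 0.0, 0.0
for n in range(1024):
    excitation = 1.0 if n % 100 == 0 else 0.0
    y = excitation + 2 * radius * math.cos(theta) * y1 - radius * radius * y2
    frame.append(y)
    y2, y1 = y1, y

lpc_size, n_bins = 4, 256
a, residual = levinson_durbin(autocorrelation(frame, lpc_size), lpc_size)
envelope = lpc_envelope(a, residual, n_bins)
peak_bin = max(range(n_bins), key=envelope.__getitem__)
peak_hz = peak_bin * (fs / 2) / n_bins
print(f"strongest resonance near {peak_hz:.0f} Hz")
```

  The peak of the envelope plays the role of a formant line on the LPC
  display.  Real speech has several formants, which is why a larger LPC size
  such as 25 is suggested above.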


HOW TO START SONASOUNDP FROM A COMMAND LINE

# sonasound.exe -h

Usage: ./sonasoundp [-a Size] [-b Freq] [-d {0,1,2}] [-D {0,1,...}] [-e {0,1}]
          [-D {0,1,...}] [-f Size] [-g {f,l}] [-H Coeff] [-l Size] [-n {0,1}]
          [-s Number from list] [-v] [-w {0,...,5}] 

SonasoundP Copyright (C) 2009-2014 Richard Lemieux
Sonasound Copyright (C) 2002-2003 Niklas Werner

	Please note that one needs to first run 'sonasound' and check the available options
         from the controlsWindow to find meaningful options for parameters D and s below.
-a:	FFT size. Any number up to 65535. Use "-f" instead unless you have specific needs.
-b:	Reference A frequency in [Hz] for the Staff-lines
-d:	Display grid: 0: logarithmic gray levels, 1: linear gray levels, 2: logarithmic colored
-D:	Input Audio device: First: 0, Second: 1, ... Disable: -1
-e:	Ear response curve: 0: No correction, 1: Human 1
-f:	FFT size: up to 65535 rounded to NextPowerOfTwo (512,1024,...). See also "a".
-g:	Spectrum type:  f: FFT, l: LPC
-H:	HighPass-filter coefficient: a number in the range (0.001, 1.0) default: 0.5
-l:	LPC size: up to 65535 though only up to 256 makes sense.
-n:	Draw staff-lines: 0: No, 1: Yes.
-s:	SamplingRate: a number available for the selected audio device (44100 or 48000 ?)
-v:	Verbosity.  Prints lots of data used by the programmer when he tests the program.
-w:	FFT Window type:
	 0: Hamming, 1: Hanning,
 	 2: Blackman, 3: Bartlett, 4: Kaiser, 5: rectangular
-h:	 Print this help


SOFTWARE REVISIONS

  SVN225.  Single thread mode.
Source: README, updated 2016-10-13