File Release Notes and Changelog
Release Name: v4.8
Notes:
User support page :
http://sourceforge.net/donate/index.php?group_id=40316
Copyright 2001 - 2005 Matt Flax <flatmax at ieee d0t org>
This application stretches and compresses audio without altering the frequency
character of the audio. For reasonable factors, this application will scale
audio without altering signal levels or introducing artifacts (in the ideal
implementation).
This application can be used on ambisonic recordings because it keeps the
channels in sync.
History for the curious ...
The v4.x stream of this project is now about to migrate to stable.
It targets embedded operation, which is lighter and more efficient on desktop
resources (as well as in embedded solutions). The memory footprint is about
2.5 kB to 7.5 kB for a mono stream of audio and about 5 kB to 15 kB for
stereo streams. At this stage the DFT based approach adopted in version 3.x
is stable and has excellent sound quality even for very fast and very slow
time scalings.
In this first version (v4.0) the code has been completely re-worked and this
has sped up operation to a large degree.
Did I forget to mention the newly added type II filter, which is written in the
multimedia time code package ? Check my publicly available projects page.
As of Version 3.0, this implementation of WSOLA is approximately six times
faster than real time (800MHz CPU with coprocessor).
It is completely stable.
Under Microsoft Windows it takes approximately the same amount of time (using
the Cygwin GNU *NIX translation).
Requirements :
* This program can read a lot of file types because of the wrapper to libsndfile :
http://sourceforge.net/projects/mffmlibsndfilew/
* This program requires an installed version of MFFM multimedia time code
handling classes. Try :
http://mffmtimecode.sourceforge.net/
For fast operation (> v 3.* only), you will also require MFFM FFTw C++ wrapper.
Try:
http://mffmfftwrapper.sourceforge.net/
Audio files are read and written using LibSndFile v1 :
http://www.zip.com.au/~erikd/libsndfile/
Finally you require a C++ compiler, try :
http://gcc.gnu.org/install/binaries.html
http://www.cygwin.com (Microsoft users)
MS Windows BINARY users will require the file 'cygwin1.dll'. If it is not
shipped with this zip package then please try to find it at Cygwin:
http://www.cygwin.com
My other projects :
http://sourceforge.net/search/?type_of_search=soft&words=mffm
This project's Home Page :
http://mffmtimescale.sourceforge.net
MFFM Time Scale Modification for Audio is 2 things :
a] A compilable program WSOLATest.C which allows you to time stretch and compress
mono audio files. Audio files are restricted to mono with a 16 bit frame size.
b] A set of 2 header files which are the implementation of [1].
For simple use ....
Type 'make' and compile the program WSOLATest
Run WSOLA like so :
WSOLA inputFile outputFile factor
factor = 0.5 for halving the duration of an audio file
factor = 2.0 for doubling the duration of an audio file
factor = 1.0 for an identical file.
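As a rough illustration of what the factor means, the expected output length
can be estimated from the input length. This is only a sketch: the helper
expectedOutputFrames is hypothetical and is not part of WSOLATest or WSOLA.H,
and the real program may differ by a few frames at the end of the file.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not part of this package): estimate how many output
// frames a time scale factor produces for a given number of input frames.
// factor = 0.5 halves the duration, factor = 2.0 doubles it.
std::size_t expectedOutputFrames(std::size_t inputFrames, double factor) {
    if (factor <= 0.0) {
        return 0;  // a non-positive factor has no meaning here
    }
    return static_cast<std::size_t>(
        std::llround(static_cast<double>(inputFrames) * factor));
}
```

For one second of audio at 44.1 kHz (44100 frames), a factor of 0.5 gives
about 22050 frames and a factor of 2.0 gives about 88200 frames.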
[1]"An overlap-add technique based on waveform similarity (WSOLA)
for high quality time-scale modification of speech",
Verhelst, W.; Roelands, M.
1993 IEEE International Conference on Acoustics, Speech, and Signal
Processing (ICASSP-93), 27-30 April 1993, Minneapolis, MN, USA
Volume: 2, page(s): 554 - 557
ISBN: 0-7803-0946-4
Number of Pages: 5 vol. (652+735+606+559+681)
References Cited: 4
INSPEC Accession Number: 4771035
Abstract:
A concept of waveform similarity for tackling the problem of
time-scale modification of speech is proposed. It is worked
out in the context of short-time Fourier transform representations.
The resulting WSOLA (waveform-similarity-based synchronized
overlap-add) algorithm produces high-quality speech output,
is algorithmically and computationally efficient and robust, and
allows for online processing with arbitrary time-scaling factors
that may be specified in a time-varying fashion and can be chosen
over a wide continuous range of values.
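The waveform similarity idea in the abstract can be sketched as a search for
the input offset that best matches a reference segment. The sketch below is
an assumption made for illustration only: it uses a plain (unnormalised)
cross-correlation as the similarity measure, and the name bestMatchOffset and
its parameters are hypothetical. It is not the code in WSOLA.H, which may use
a different measure (later versions use FFT based similarity checks).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: return the offset in [0, maxOffset] at which the
// input, read from searchStart + offset, has the highest cross-correlation
// with the reference segment. Offsets that would run past the end of the
// input are not considered.
std::size_t bestMatchOffset(const std::vector<double>& input,
                            std::size_t searchStart,
                            std::size_t maxOffset,
                            const std::vector<double>& reference) {
    std::size_t best = 0;
    double bestScore = 0.0;
    bool haveScore = false;
    for (std::size_t off = 0; off <= maxOffset; ++off) {
        const std::size_t start = searchStart + off;
        if (start + reference.size() > input.size()) {
            break;  // the candidate segment no longer fits in the input
        }
        double score = 0.0;
        for (std::size_t n = 0; n < reference.size(); ++n) {
            score += input[start + n] * reference[n];
        }
        if (!haveScore || score > bestScore) {
            bestScore = score;
            best = off;
            haveScore = true;
        }
    }
    return best;
}
```

In a full WSOLA loop, the segment found at this offset would be windowed and
overlap-added to the output, and the search would then move on by an amount
that depends on the time scale factor.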
Changes:
Version 4.8 28/04/2005
* Altered WSOLA to allow people to shift its time code reference.
This allows one to bypass WSOLA when the speed is unity (1.0) and
still use WSOLA as the multimedia master sync.
* Altered the example WSOLA4Audiere to play the original waveform
when the rate is set to unity.
Version 4.7 22/04/2005
* Altered the values HANNING_DURATION and DELTA_DIVISOR in WSOLA.H to get
better sound quality and operation in the example of WSOLA4Audiere.
Version 4.6 20/04/2005
* Various changes to examples/WSOLA4Audiere.H including compatibility with
Audiere version 1.4.4 (CVS version). Documentation now exists for this
header ... check the html directory.
Version 4.4 11/04/2005
* More bug fixes. Sound quality is greatly improved. Incorporated an
average estimation mechanism.
* doc fixes to reflect new algorithm
* WSOLA4Audiere may need tweaking to get it to work in this version...
stay tuned for a working version
Version 4.4 01/04/2005
* Began WSOLA4Audiere. Fits WSOLA into Audiere (audiere.sf.net).
Appears to work to some degree. Expect changes in the next few weeks.
* Added a reset function to WSOLA, as well as various other methods :
getFrameSize, setPosition, checkPositions.
* Altered method copyBestMatch to return void (nothing).
Version 4.4 March 05
* Resetting everything to operate in embedded-only mode now !
* Replaced the old WSOLATest files with new ones
* API change ... check WSOLATest.v4.C
* Major changes in WSOLA.H
Version 4.3 27/02/05
* Altered algorithm to work from either memory or files. You can now simply
apply WSOLA to memory streams. This approaches a complete embedded solution.
* Constructed an example embedded file.
* Altered the initProcess(...) function, which now requires the initial tau as
an extra argument.
* Removed unnecessary file read in the standard WSOLA method (non-embedded).
Version 4.2 17/01/05
* Slowing audio truncation fix.
Fixed stop criterion, this should now work for both time compression and
time expansion.
Version 4.1 30/11/04
* Removed a channel count read error. Multichannel now works well.
* Shifted compilation version up to 4.
* Still a known error that when slowing audio (tau>1.0), output file is
truncated.
Version 4.0 16/11/04
* Dynamic tau : dynamic speed change with embedded WSOLA. The aim of
this development arm is to implement embedded WSOLA. Many commercial
editors would use such an engine to speed up and slow down music.
They would do so by running Embedded WSOLA on each multichannel
track. If WSOLA is run again on the master multichannel bump out,
then you may alter the tempo over a large BPM range. Don't use this version to
implement such a player just yet. You HAVE to use multichannel
embedded WSOLA - once it is debugged.
This function is also useful for the latest spate of mp3 players.
Did you know that blind people like to listen to audio faster than
seeing people ? Catch all the news at : http://www.daisy.org
* Memory footprint : The footprint is of the order of 2.5 kB per
channel @ 44.1 kHz sample rate. This footprint will linearly scale
with sample frequency. This is a cute footprint already !
* Waiting for dust to settle in my new file access methods.
* Oh yeah - the multimedia time code library now packs a type II filter which
can handle large polynomials. So it can probably handle some type of EQ
design mech. on the front end.
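The footprint note for version 4.0 (about 2.5 kB per channel at 44.1 kHz,
scaling linearly with sample frequency) can be turned into a rough estimate.
This is only a sketch under stated assumptions: the function
estimatedFootprintBytes is hypothetical, 1 kB is taken as 1000 bytes, and the
result is an extrapolation of the quoted figure, not a measurement.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical estimate based on the figure quoted in the changelog:
// roughly 2500 bytes per channel at 44100 Hz, scaled linearly with the
// sample rate and multiplied by the channel count.
double estimatedFootprintBytes(double sampleRateHz, unsigned channels) {
    const double bytesPerChannelAtReference = 2500.0;  // assumed 2.5 kB = 2500 bytes
    const double referenceRateHz = 44100.0;
    return bytesPerChannelAtReference * (sampleRateHz / referenceRateHz) *
           static_cast<double>(channels);
}
```

Under these assumptions a stereo stream at 88.2 kHz would need roughly
10000 bytes, four times the mono 44.1 kHz figure.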
Version 3.8 15/11/04
* First release of new theory - for testing - high quality FFT based
implementation. This implements the file 'hybridDomainProcessing.pdf'
also released with this project
Version 3.7 12/11/04
* Worked out the theory for implementing FT based WSOLA correctly. This should
replace the current method in V2 and give the same quality as V1.
Read the TODO and hybridDomainProcessing.pdf for more information.
Version 3.6 11/11/04
* Tested with other MFFM projects on sourceforge ... compiles
correctly.
* Changed README file
* Removed libsndfile.H in favour of MFFM_libsndfilew package
http://sourceforge.net/search/?type_of_search=soft&words=mffm
Version 3.5 05/04/04
* Fixed WSOLA.v2 to work with mffmfftwrapper (fftw3) v1.4
Version 3.4 08/08/03
* Fixed libSndFileWrapper.H
Version 3.3 28/02/03
* Switched to using libsndfile version 1.x.x from 0.x.x
* Upon noting that WSOLA v1 gave better compression quality than WSOLA v2,
v1 and v2 now have separate executables.
Version 3.2 08/02/03
* Fixed maximum similarity scan to check for only relevant channel matches.
This was an unknown bug.
* Removed the v2 similarity check mechanisms. These remain resident in
(WSOLA.v2.H) for those who are interested.
Version 3.1 24/01/03
* Documentation included listing implementation change from v2.x to v3.x
Version 3.0
* First version to use FFTing for similarity checks (must define USE_FFT to use)
* WSOLA now runs at 4*realtime (4 times faster than realtime)
Version 2.8
* See Version 3.4
Version 2.7
* Included cygwin1.dll in the windows zip file.
Version 2.6
* Fixed sample rate problems by making it a variable
* Recompiled for win32 using cygwin ... greatly improves performance on win32
Version 2.5
* Fixed libSndFileWrapper.H to work with multi channels
* Added multi channel functionality.
* Changelog Started