
MFFM Time Scale Modification for Audio

File Release Notes and Changelog

Release Name: v4.8

Notes:

User support page : http://sourceforge.net/donate/index.php?group_id=40316

Copyright 2001 - 2005 Matt Flax <flatmax at ieee d0t org>

This application stretches and compresses audio in time without altering the frequency character of the audio. For reasonable factors, it scales audio without altering signal levels or introducing artifacts (in the ideal implementation). It can be used on ambisonic recordings because it keeps the channels in sync.

History for the curious ...

The v4.x stream of this project is about to migrate to stable. It targets embedded operation, which is lighter and more efficient on desktop resources (as well as in embedded solutions). The memory footprint is approximately 2.5 kB to 7.5 kB for a mono stream of audio and about 5 kB to 15 kB for stereo streams.

At this stage the DFT based approach adopted in version 3.x is stable and has excellent sound quality, even for very fast and very slow time scalings. In this first version (v4.0) the code has been completely re-worked, and this has sped up operation to a large degree. Did I forget to mention the newly added type II filter, which is written in the multimedia time code package? Check my publicly available projects page.

As of Version 3.0, this implementation of WSOLA is approximately six times faster than real time (800 MHz CPU with coprocessor). It is completely stable. Under Microsoft Windows it takes approximately the same amount of time (using the Cygwin GNU*NIX translation).

Requirements :

* This program can read a lot of file types because of the wrapper to libsndfile : http://sourceforge.net/projects/mffmlibsndfilew/
* This program requires an installed version of the MFFM multimedia time code handling classes. Try : http://mffmtimecode.sourceforge.net/
* For fast operation (> v 3.* only), you will also require the MFFM FFTw C++ wrapper. Try : http://mffmfftwrapper.sourceforge.net/
* Audio files are read and written using LibSndFile v1 : http://www.zip.com.au/~erikd/libsndfile/
* Finally you require a C++ compiler. Try : http://gcc.gnu.org/install/binaries.html or http://www.cygwin.com (Microsoft users)

MS Windows BINARY users will require the file 'cygwin1.dll'. If it is not shipped with this zip package then please try to find it at Cygwin : http://www.cygwin.com

My other projects : http://sourceforge.net/search/?type_of_search=soft&words=mffm

This project's Home Page : http://mffmtimescale.sourceforge.net

MFFM Time Scale Modification for Audio is 2 things :

a] A compilable program WSOLATest.C which allows you to time stretch and compress mono audio files. Audio files are restricted to be mono 16 bit frame sized.
b] A set of 2 header files which are the implementation of [1].

For simple use ....

Type 'make' and compile the program WSOLATest.
Run WSOLA like so :

WSOLA inputFile outputFile factor

factor = 0.5 for halving the duration of an audio file
factor = 2.0 for doubling the duration of an audio file
factor = 1.0 for an identical file.

[1] Verhelst, W.; Roelands, M., "An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech", 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-93), 27-30 April 1993, Minneapolis, MN, USA, vol. 2, pages 554 - 557. ISBN: 0-7803-0946-4. INSPEC Accession Number: 4771035.

Abstract: A concept of waveform similarity for tackling the problem of time-scale modification of speech is proposed. It is worked out in the context of short-time Fourier transform representations. The resulting WSOLA (waveform-similarity-based synchronized overlap-add) algorithm produces high-quality speech output, is algorithmically and computationally efficient and robust, and allows for online processing with arbitrary time-scaling factors that may be specified in a time-varying fashion and can be chosen over a wide continuous range of values.
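
As a rough illustration of the idea in [1], the following is a minimal time-domain sketch for a mono signal held in memory. It is not the code in WSOLA.H: the function name wsolaStretch and the parameters frame and search are hypothetical, and the similarity measure is a plain (unnormalised) cross-correlation rather than the FFT based check used since v3.x. The factor follows the same convention as the command line above (2.0 doubles the duration, 0.5 halves it).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the MFFM headers.
// Time-scales a mono signal: factor > 1 lengthens it, factor < 1 shortens it.
std::vector<float> wsolaStretch(const std::vector<float>& in, double factor,
                                int frame = 1024, int search = 256) {
    assert(factor > 0.0 && frame >= 2 && frame % 2 == 0 && search >= 0);
    const long n = static_cast<long>(in.size());
    const long hop = frame / 2;              // output hop, 50% overlap
    const long maxStart = n - frame - hop;   // last start whose continuation fits
    if (maxStart < 0) return in;             // too short to process: pass through

    // Periodic Hann window: win[i] + win[i + hop] == 1, so overlaps sum to unity.
    std::vector<float> win(static_cast<std::size_t>(frame));
    const double pi = std::acos(-1.0);
    for (int i = 0; i < frame; ++i)
        win[static_cast<std::size_t>(i)] =
            static_cast<float>(0.5 - 0.5 * std::cos(2.0 * pi * i / frame));

    const long frames = static_cast<long>(maxStart * factor / hop) + 1;
    std::vector<float> out(static_cast<std::size_t>((frames - 1) * hop + frame), 0.0f);
    const float* x = in.data();
    const float* w = win.data();
    float* y = out.data();

    long prev = 0;                           // start of the last copied segment
    for (long k = 0; k < frames; ++k) {
        long best = 0;
        if (k > 0) {
            const long ideal = static_cast<long>(k * hop / factor);
            const long target = prev + hop;  // natural continuation of that segment
            const long lo = std::max(0L, ideal - search);
            const long hi = std::min(maxStart, ideal + search);
            double bestScore = -1e300;
            best = lo;
            // Pick the candidate near 'ideal' that looks most like 'target'.
            for (long c = lo; c <= hi; ++c) {
                double score = 0.0;
                for (int i = 0; i < frame; ++i) score += x[c + i] * x[target + i];
                if (score > bestScore) { bestScore = score; best = c; }
            }
        }
        // Window the chosen segment and overlap-add it into the output.
        for (int i = 0; i < frame; ++i) y[k * hop + i] += w[i] * x[best + i];
        prev = best;
    }
    return out;
}
```

Calling wsolaStretch(samples, 2.0) on one second of audio returns roughly two seconds of samples at the same pitch. The exhaustive correlation search is the slow part of this sketch, which is the step the project replaces with FFT based similarity checks.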


Changes:

Version 4.8 28/04/2005
* Altered WSOLA to allow people to shift its time code reference. This allows one to bypass WSOLA when the speed is unity (1.0) and still use WSOLA as the multimedia master sync.
* Altered the example WSOLA4Audiere to play the original waveform when the rate is set to unity.

Version 4.7 22/04/2005
* Altered the values HANNING_DURATION and DELTA_DIVISOR in WSOLA.H to get better sound quality and operation in the example of WSOLA4Audiere.

Version 4.6 20/04/2005
* Various changes to examples/WSOLA4Audiere.H, including compatibility with Audiere version 1.4.4 (CVS version). Documentation now exists for this header ... check the html directory.

Version 4.4 11/04/2005
* More bug fixes. Sound quality is greatly improved. Incorporated an average estimation mechanism.
* Doc fixes to reflect the new algorithm.
* WSOLA4Audiere may need tweaking to get it to work in this version ... stay tuned for a working version.

Version 4.4 01/04/2005
* Began WSOLA4Audiere. Fits WSOLA into Audiere (audiere.sf.net). Appears to work to some degree. Expect changes in the next few weeks.
* Added a reset function to WSOLA, as well as various other methods : getFrameSize, setPosition, checkPositions.
* Altered method copyBestMatch to return void (nothing).

Version 4.4 March 05
* Resetting everything to operate in embedded-only mode now !
* Removed old WSOLATest files and put new ones in their place.
* API change ... check WSOLATest.v4.C
* Major changes in WSOLA.H

Version 4.3 27/02/05
* Altered algorithm to work from either memory or files. You can now simply apply WSOLA to memory streams. This approaches a complete embedded solution.
* Constructed an example embedded file.
* Altered the initProcess(...) function, which now requires the initial tau as an extra argument.
* Removed unnecessary file read in the standard WSOLA method (non-embedded).

Version 4.2 17/01/05
* Slowing audio truncation fix. Fixed the stop criterion; this should now work for both time compression and time expansion.

Version 4.1 30/11/04
* Removed a channel count read error. Multichannel now works well.
* Shifted compilation version up to 4.
* Still a known error : when slowing audio (tau>1.0), the output file is truncated.

Version 4.0 16/11/04
* Dynamic tau : dynamic speed change with embedded WSOLA. The aim of this development arm is to implement embedded WSOLA. Many commercial editors would use such an engine to speed up and slow down music. They would do so by running embedded WSOLA on each multichannel track. If WSOLA is run again on the master multichannel bump out, then you may alter the tempo over a large BPM range. Don't use this version to implement such a player just yet. You HAVE to use multichannel embedded WSOLA - once it is debugged. This function is also useful for the latest spate of mp3 players. Did you know that blind people like to listen to audio faster than sighted people ? Catch all the news at : http://www.daisy.org
* Memory footprint : The footprint is of the order of 2.5 kB per channel @ 44.1 kHz sample rate. This footprint will scale linearly with sample frequency. This is a cute footprint already !
* Waiting for the dust to settle in my new file access methods.
* Oh yeah - the multimedia time code library now packs a type II filter which can handle large polynomials. So it can probably handle some type of EQ design mechanism on the front end.

Version 3.8 15/11/04
* First release of new theory - for testing - high quality FFT based implementation. This implements the file 'hybridDomainProcessing.pdf', also released with this project.

Version 3.7 12/11/04
* Worked out the theory for implementing FT based WSOLA correctly. This should replace the current method in V2 and make the quality the same as V1. Read the TODO and hybridDomainProcessing.pdf for more information.

Version 3.6 11/11/04
* Tested with other MFFM projects on sourceforge ... compiles correctly.
* Changed README file.
* Removed libsndfile.H in favour of the MFFM_libsndfilew package : http://sourceforge.net/search/?type_of_search=soft&words=mffm

Version 3.5 05/04/04
* Fixed WSOLA.v2 to work with mffmfftwrapper (fftw3) v1.4

Version 3.4 08/08/03
* Fixed libSndFileWrapper.H

Version 3.3 28/02/03
* Switched to using libsndfile version 1.x.x from 0.x.x
* Upon noting that WSOLA v1 gave better compression quality than WSOLA v2, both v1 and v2 now have separate executables.

Version 3.2 08/02/03
* Fixed maximum similarity scan to check for only relevant channel matches. This was an unknown bug.
* Removed the v2 similarity check mechanisms. These remain resident in (WSOLA.v2.H) for those who are interested.

Version 3.1 24/01/03
* Documentation included, listing the implementation change from v2.x to v3.x

Version 3.0
* First version to use FFTing for similarity checks (must define USE_FFT to use)
* WSOLA now runs at 4*realtime (4 times faster than realtime)

Version 2.8
* See Version 3.4

Version 2.7
* Included cygwin1.dll in the windows zip file.

Version 2.6
* Fixed sample rate problems by making it a variable.
* Recompiled for win32 using cygwin ... greatly improves performance on win32.

Version 2.5
* Fixed libSndFileWrapper.H to work with multi channels.
* Added multi channel functionality.
* Changelog started.
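
The Version 4.0 entry above quotes a footprint of the order of 2.5 kB per channel at 44.1 kHz, scaling linearly with sample frequency. A small sketch of that arithmetic follows. The function name footprintKB is hypothetical, and the figures are only the rough estimate quoted in this changelog, not a measurement of the current code.

```cpp
#include <cassert>

// Hypothetical helper: estimated WSOLA memory footprint in kB.
// Assumes ~2.5 kB per channel at 44.1 kHz and linear scaling with sample rate,
// as stated in the Version 4.0 changelog entry.
double footprintKB(double sampleRateHz, int channels) {
    assert(sampleRateHz > 0.0 && channels > 0);
    const double baseKB = 2.5;        // per channel at the base rate
    const double baseRateHz = 44100.0;
    return baseKB * (sampleRateHz / baseRateHz) * channels;
}
```

Under these assumptions a stereo stream at 44.1 kHz needs about 5 kB, and a mono stream at 88.2 kHz needs about the same.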