[OMPT-svn-commit] OpenMPT r23191 - in trunk/OpenMPT: include/SignalsmithStretch include/Signalsmith

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Author: sagamusix
Date: Sun May 25 14:04:10 2025
New Revision: 23191
URL: https://source.openmpt.org/browse/openmpt/?op=revision&rev=23191

Log:
[Var] Update Signalsmith Stretch to commit c78aacf6389fb862978d4ead01cfc71a3d5f8ca8 (2025-05-01)

Added:
   trunk/OpenMPT/include/SignalsmithStretch/signalsmith-linear/
   trunk/OpenMPT/include/SignalsmithStretch/signalsmith-linear/LICENSE.txt   (contents, props changed)
   trunk/OpenMPT/include/SignalsmithStretch/signalsmith-linear/README.md   (contents, props changed)
   trunk/OpenMPT/include/SignalsmithStretch/signalsmith-linear/fft.h   (contents, props changed)
   trunk/OpenMPT/include/SignalsmithStretch/signalsmith-linear/stft.h   (contents, props changed)
Deleted:
   trunk/OpenMPT/include/SignalsmithStretch/SignalsmithStretch/dsp/
Modified:
   trunk/OpenMPT/include/SignalsmithStretch/OpenMPT.txt
   trunk/OpenMPT/include/SignalsmithStretch/SignalsmithStretch/README.md
   trunk/OpenMPT/include/SignalsmithStretch/SignalsmithStretch/signalsmith-stretch.h
   trunk/OpenMPT/tracklib/TimeStretchPitchShift.cpp

Modified: trunk/OpenMPT/include/SignalsmithStretch/OpenMPT.txt
==============================================================================

--- trunk/OpenMPT/include/SignalsmithStretch/OpenMPT.txt	Sun May 25 13:24:35 2025	(r23190)
+++ trunk/OpenMPT/include/SignalsmithStretch/OpenMPT.txt	Sun May 25 14:04:10 2025	(r23191)
@@ -1,3 +1,7 @@
-Signalsmith Stretch: C++ pitch/time library, commit ffa45981be0b75079b43d1a7a142234f06869254 (2025-02-12)
+Signalsmith Stretch: C++ pitch shift / time stretch library, commit c78aacf6389fb862978d4ead01cfc71a3d5f8ca8 (2025-05-01)
 https://github.com/Signalsmith-Audio/signalsmith-stretch
-Unneeded files (cmd and web folder) have been removed.
\ No newline at end of file
+Unneeded files have been removed.
+
+Signalsmith Linear: C++ audio utilities, commit 94d677be71a50789eb7a33b8927947046c0c2fa2 (2025-04-09)
+https://github.com/Signalsmith-Audio/linear
+Unneeded files have been removed.
\ No newline at end of file

Modified: trunk/OpenMPT/include/SignalsmithStretch/SignalsmithStretch/README.md
==============================================================================
--- trunk/OpenMPT/include/SignalsmithStretch/SignalsmithStretch/README.md	Sun May 25 13:24:35 2025	(r23190)
+++ trunk/OpenMPT/include/SignalsmithStretch/SignalsmithStretch/README.md	Sun May 25 14:04:10 2025	(r23191)
@@ -90,6 +90,12 @@
 
 You should be supplying input samples slightly ahead of the processing time (which is where changes to pitch-shift or stretch rate will be centred), and you'll receive output samples slightly behind that processing time.
 
+### Split computation
+
+All of the `.preset???()` and `.configure()` methods have an optional `splitComputation` flag.  When enabled, this introduces one extra interval of output latency, and uses this to spread the computation out more evenly.
+
+Without this (as is common for spectral processing) the library will occasionally do a bunch of computation all at once, to compute the next spectral block of audio.  This is often fine, when audio's being processed across multiple threads with a decent amount of buffering (like mixing in a DAW), but if you're in a stricter situation then this flag might help. 
+
 #### Automation
 
 To follow pitch/time automation accurately, you should give it automation values from the current processing time (`.outputLatency()` samples ahead of the output), and feed it input from `.inputLatency()` samples ahead of the current processing time.
@@ -114,21 +120,36 @@
 stretch.flush(outputBuffers, outputSamples);
 ``` 
 
-Using `.seek()`/`.flush()` like this, you can perform an exact time-stretch on a fixed-length sound, and your result will have `.outputLatency()` of pre-roll. 
+Using `.seek()`/`.flush()` like this, you can perform an exact time-stretch on a fixed-length sound, and your result will have `.outputLatency()` of pre-roll.
 
-## Compiling
+### Formant compensation
 
-⚠️ This has mostly been tested with Clang.  If you're using another compiler and have any problems, please get in touch.
+```cpp
+stretch.setFormantFactor(1.2); // up ~3 semitones
 
-🚨 It's generally be OK to enable `-ffast-math`, however there's a bug in Apple Clang 16.0.0 which can generate incorrect SIMD code.  If you _have_ to use this version, we advise you don't use `-ffast-math`.
+stretch.setTransposeSemitones(3);
+```
 
-It's much slower (about 10x) if optimisation is disabled altogether, so you might want to enable optimisation where it's used, even in Debug builds.
+Both of those methods both have an optional `compensatePitch` flag.  Enabling this adjust for the pitch-shift (or non-linear map) when correcting/shifting formants.
 
-### DSP Library
+The formant correction is not a sharp as monophonic algorithms (such such as PSOLA).  It also needs you to give it a rough estimate of fundamental frequency (relative to Nyquist):
 
-This uses the Signalsmith DSP library for FFTs and other bits and bobs.
+```cpp
+// if 200Hz is the middle-register of the instrument
+stretch.setFormantBase(200/sampleRate);
+```
 
-For convenience, a copy of the library is included (with its own `LICENSE.txt`) in `dsp/`, but if you're already using this elsewhere then you should remove this copy to avoid versioning issues.
+## Compiling
+
+⚠️ This has mostly been tested with AppleClang (Mac), and MSVC (Windows).  We aim to support other compilers though, so please get in touch if you have any problems.
+
+Enabling `-ffast-math` (or equivalent) is fine, unless you're using Apple Clang 16.0.0 (in which case it will deliberately break, because that version generates incorrect SIMD code).
+
+The algorithm has a lot of number-crunching, so Debug builds are much slower (up to 10x).  If possible, you might want to enable optimisation where Stretch is used, even in Debug builds.
+
+### Dependencies and `#define`s
+
+This uses the [Signalsmith Linear](https://github.com/Signalsmith-Audio/linear) library for FFTs and other speedups.  There are [flags]([Linear repo](https://github.com/Signalsmith-Audio/linear?tab=readme-ov-file#building)) to enable Accelerate (`SIGNALSMITH_USE_ACCELERATE`) or IPP (`SIGNALSMITH_USE_IPP`).
 
 ## License
 
@@ -136,14 +157,18 @@
 
 ## Other environments / languages
 
-There's a Web Audio wrapper in `web/` (using WASM/AudioWorklet).  This will remain in-sync with the C++ library.
+There's a Web Audio release in [`web/`](web/) (WASM/AudioWorklet), also available on [NPM](npmjs.com/package/signalsmith-stretch).  This has its own (higher-level) API, but the stretching algorithm will remain in-sync with the C++ library.
+
+There's a [Python binding](https://pypi.org/project/python-stretch/) written/published by [Gregorio Andrea Giudici](https://github.com/gregogiudici/python-stretch).  This is used as the default pitch/time method in [Audiomentations](https://iver56.github.io/audiomentations/).
 
-There's a [Python binding](https://pypi.org/project/python-stretch/) written/published by [Gregorio Andrea Giudici](https://github.com/gregogiudici/python-stretch), and a [Rust wrapper](https://crates.io/crates/signalsmith-stretch) by [Colin Marc](https://github.com/colinmarc/signalsmith-stretch-rs).
+There's a [Rust wrapper](https://crates.io/crates/signalsmith-stretch) by [Colin Marc](https://github.com/colinmarc/signalsmith-stretch-rs).
 
 ## Thanks
 
 We'd like to particularly thank the following people who sponsored specific features or improvements:
 
+* **Future Audio Workshop**: chunked-computation mode
+* **Jochem de Jong**: formant shifting/compensation
 * **Metronaut**: web audio (JS/WASM) release
 * **Daniel L Bowling** and the Stanford School of Medicine: web audio improvements
 

Modified: trunk/OpenMPT/include/SignalsmithStretch/SignalsmithStretch/signalsmith-stretch.h
==============================================================================
--- trunk/OpenMPT/include/SignalsmithStretch/SignalsmithStretch/signalsmith-stretch.h	Sun May 25 13:24:35 2025	(r23190)
+++ trunk/OpenMPT/include/SignalsmithStretch/SignalsmithStretch/signalsmith-stretch.h	Sun May 25 14:04:10 2025	(r23191)
@@ -1,63 +1,89 @@
 #ifndef SIGNALSMITH_STRETCH_H
 #define SIGNALSMITH_STRETCH_H
 
-#include "dsp/spectral.h"
-#include "dsp/delay.h"
-#include "dsp/perf.h"
-SIGNALSMITH_DSP_VERSION_CHECK(1, 6, 0); // Check version is compatible
+#include "signalsmith-linear/stft.h" // https://github.com/Signalsmith-Audio/linear
+
 #include <vector>
+#include <array>
 #include <algorithm>
 #include <functional>
 #include <random>
+#include <limits>
+#include <type_traits>
 
 namespace signalsmith { namespace stretch {
 
-template<typename Sample=float, class RandomEngine=std::default_random_engine>
+namespace _impl {
+	template<bool conjugateSecond=false, typename V>
+	static std::complex<V> mul(const std::complex<V> &a, const std::complex<V> &b) {
+		return conjugateSecond ? std::complex<V>{
+			b.real()*a.real() + b.imag()*a.imag(),
+				b.real()*a.imag() - b.imag()*a.real()
+		} : std::complex<V>{
+			a.real()*b.real() - a.imag()*b.imag(),
+			a.real()*b.imag() + a.imag()*b.real()
+		};
+	}
+	template<typename V>
+	static V norm(const std::complex<V> &a) {
+		V r = a.real(), i = a.imag();
+		return r*r + i*i;
+	}
+}
+
+template<typename Sample=float, class RandomEngine=void>
 struct SignalsmithStretch {
-	static constexpr size_t version[3] = {1, 1, 1};
+	static constexpr size_t version[3] = {1, 3, 1};
 
 	SignalsmithStretch() : randomEngine(std::random_device{}()) {}
 	SignalsmithStretch(long seed) : randomEngine(seed) {}
-
+	
 	int blockSamples() const {
-		return stft.windowSize();
+		return int(stft.blockSamples());
 	}
 	int intervalSamples() const {
-		return stft.interval();
+		return int(stft.defaultInterval());
 	}
 	int inputLatency() const {
-		return stft.windowSize()/2;
+		return int(stft.analysisLatency());
 	}
 	int outputLatency() const {
-		return stft.windowSize() - inputLatency();
+		return int(stft.synthesisLatency() + _splitComputation*stft.defaultInterval());
 	}
 	
 	void reset() {
-		stft.reset();
-		inputBuffer.reset();
+		stft.reset(0.1);
+		stashedInput = stft.input;
+		stashedOutput = stft.output;
+		
 		prevInputOffset = -1;
 		channelBands.assign(channelBands.size(), Band());
 		silenceCounter = 0;
 		didSeek = false;
-		flushed = true;
+		blockProcess = {};
+		freqEstimateWeighted = freqEstimateWeight = 0;
 	}
 
 	// Configures using a default preset
-	void presetDefault(int nChannels, Sample sampleRate) {
-		configure(nChannels, sampleRate*0.12, sampleRate*0.03);
+	void presetDefault(int nChannels, Sample sampleRate, bool splitComputation=false) {
+		configure(nChannels, sampleRate*0.12, sampleRate*0.03, splitComputation);
 	}
-	void presetCheaper(int nChannels, Sample sampleRate) {
-		configure(nChannels, sampleRate*0.1, sampleRate*0.04);
+	void presetCheaper(int nChannels, Sample sampleRate, bool splitComputation=true) {
+		configure(nChannels, sampleRate*0.1, sampleRate*0.04, splitComputation);
 	}
 
 	// Manual setup
-	void configure(int nChannels, int blockSamples, int intervalSamples) {
+	void configure(int nChannels, int blockSamples, int intervalSamples, bool splitComputation=false) {
+		_splitComputation = splitComputation;
 		channels = nChannels;
-		stft.setWindow(stft.kaiser, true);
-		stft.resize(channels, blockSamples, intervalSamples);
-		bands = stft.bands();
-		inputBuffer.resize(channels, blockSamples + intervalSamples + 1);
-		timeBuffer.assign(stft.fftSize(), 0);
+		stft.configure(channels, channels, blockSamples, intervalSamples + 1);
+		stft.setInterval(intervalSamples, stft.kaiser);
+		stft.reset(0.1);
+		stashedInput = stft.input;
+		stashedOutput = stft.output;
+		tmpBuffer.resize(blockSamples + intervalSamples);
+
+		bands = int(stft.bands());
 		channelBands.assign(bands*channels, Band());
 		
 		peaks.reserve(bands/2);
@@ -65,6 +91,9 @@
 		smoothedEnergy.resize(bands);
 		outputMap.resize(bands);
 		channelPredictions.resize(channels*bands);
+
+		blockProcess = {};
+		formantMetric.resize(bands + 2);
 	}
 
 	/// Frequency multiplier, and optional tonality limit (as multiple of sample-rate)
@@ -79,39 +108,76 @@
 	}
 	void setTransposeSemitones(Sample semitones, Sample tonalityLimit=0) {
 		setTransposeFactor(std::pow(2, semitones/12), tonalityLimit);
-		customFreqMap = nullptr;
 	}
 	// Sets a custom frequency map - should be monotonically increasing
 	void setFreqMap(std::function<Sample(Sample)> inputToOutput) {
 		customFreqMap = inputToOutput;
 	}
 
+	void setFormantFactor(Sample multiplier, bool compensatePitch=false) {
+		formantMultiplier = multiplier;
+		invFormantMultiplier = 1/multiplier;
+		formantCompensation = compensatePitch;
+	}
+	void setFormantSemitones(Sample semitones, bool compensatePitch=false) {
+		setFormantFactor(std::pow(2, semitones/12), compensatePitch);
+	}
+	// Rough guesstimate of the fundamental frequency, used for formant analysis. 0 means attempting to detect the pitch
+	void setFormantBase(Sample baseFreq=0) {
+		formantBaseFreq = baseFreq;
+	}
+
 	// Provide previous input ("pre-roll"), without affecting the speed calculation.  You should ideally feed it one block-length + one interval
 	template<class Inputs>
 	void seek(Inputs &&inputs, int inputSamples, double playbackRate) {
-		inputBuffer.reset();
+		tmpBuffer.resize(0);
+		tmpBuffer.resize(stft.blockSamples() + stft.defaultInterval());
+
+		int startIndex = std::max<int>(0, inputSamples - int(tmpBuffer.size())); // start position in input
+		int padStart = int(tmpBuffer.size() + startIndex) - inputSamples; // start position in tmpBuffer
+
 		Sample totalEnergy = 0;
 		for (int c = 0; c < channels; ++c) {
 			auto &&inputChannel = inputs[c];
-			auto &&bufferChannel = inputBuffer[c];
-			int startIndex = std::max<int>(0, inputSamples - stft.windowSize() - stft.interval());
 			for (int i = startIndex; i < inputSamples; ++i) {
 				Sample s = inputChannel[i];
 				totalEnergy += s*s;
-				bufferChannel[i] = s;
+				tmpBuffer[i - startIndex + padStart] = s;
 			}
+			
+			stft.writeInput(c, tmpBuffer.size(), tmpBuffer.data());
 		}
+		stft.moveInput(tmpBuffer.size());
 		if (totalEnergy >= noiseFloor) {
 			silenceCounter = 0;
 			silenceFirst = true;
 		}
-		inputBuffer += inputSamples;
 		didSeek = true;
-		seekTimeFactor = (playbackRate*stft.interval() > 1) ? 1/playbackRate : stft.interval();
+		seekTimeFactor = (playbackRate*stft.defaultInterval() > 1) ? 1/playbackRate : stft.defaultInterval();
 	}
-
+	
 	template<class Inputs, class Outputs>
 	void process(Inputs &&inputs, int inputSamples, Outputs &&outputs, int outputSamples) {
+#ifdef SIGNALSMITH_STRETCH_PROFILE_PROCESS_START
+		SIGNALSMITH_STRETCH_PROFILE_PROCESS_START(inputSamples, outputSamples);
+#endif
+		int prevCopiedInput = 0;
+		auto copyInput = [&](int toIndex){
+
+			int length = std::min<int>(int(stft.blockSamples() + stft.defaultInterval()), toIndex - prevCopiedInput);
+			tmpBuffer.resize(length);
+			int offset = toIndex - length;
+			for (int c = 0; c < channels; ++c) {
+				auto &&inputBuffer = inputs[c];
+				for (int i = 0; i < length; ++i) {
+					tmpBuffer[i] = inputBuffer[i + offset];
+				}
+				stft.writeInput(c, length, tmpBuffer.data());
+			}
+			stft.moveInput(length);
+			prevCopiedInput = toIndex;
+		};
+
 		Sample totalEnergy = 0;
 		for (int c = 0; c < channels; ++c) {
 			auto &&inputChannel = inputs[c];
@@ -120,10 +186,13 @@
 				totalEnergy += s*s;
 			}
 		}
+
 		if (totalEnergy < noiseFloor) {
-			if (silenceCounter >= 2*stft.windowSize()) {
-				if (silenceFirst) {
+			if (silenceCounter >= 2*stft.blockSamples()) {
+				if (silenceFirst) { // first block of silence processing
 					silenceFirst = false;
+					//stft.reset();
+					blockProcess = {};
 					for (auto &b : channelBands) {
 						b.input = b.prevInput = b.output = 0;
 						b.inputEnergy = 0;
@@ -148,15 +217,7 @@
 				}
 
 				// Store input in history buffer
-				for (int c = 0; c < channels; ++c) {
-					auto &&inputChannel = inputs[c];
-					auto &&bufferChannel = inputBuffer[c];
-					int startIndex = std::max<int>(0, inputSamples - stft.windowSize() - stft.interval());
-					for (int i = startIndex; i < inputSamples; ++i) {
-						bufferChannel[i] = inputChannel[i];
-					}
-				}
-				inputBuffer += inputSamples;
+				copyInput(inputSamples);
 				return;
 			} else {
 				silenceCounter += inputSamples;
@@ -165,119 +226,174 @@
 			silenceCounter = 0;
 			silenceFirst = true;
 		}
-
+		
 		for (int outputIndex = 0; outputIndex < outputSamples; ++outputIndex) {
-			stft.ensureValid(outputIndex, [&](int outputOffset) {
+			bool newBlock = blockProcess.samplesSinceLast >= stft.defaultInterval();
+			if (newBlock) {
+				blockProcess.step = 0;
+				blockProcess.steps = 0; // how many processing steps this block will have
+				blockProcess.samplesSinceLast = 0;
+				
 				// Time to process a spectrum!  Where should it come from in the input?
-				int inputOffset = std::round(outputOffset*Sample(inputSamples)/outputSamples) - stft.windowSize();
+				int inputOffset = std::round(outputIndex*Sample(inputSamples)/outputSamples);
 				int inputInterval = inputOffset - prevInputOffset;
 				prevInputOffset = inputOffset;
+				
+				copyInput(inputOffset);
+				stashedInput = stft.input; // save the input state, since that's what we'll analyse later
+				if (_splitComputation) {
+					stashedOutput = stft.output; // save the current output, and read from it
+					stft.moveOutput(stft.defaultInterval()); // the actual input jumps forward in time by one interval, ready for the synthesis
+				}
+
+				blockProcess.newSpectrum = didSeek || (inputInterval > 0);
+				blockProcess.mappedFrequencies = customFreqMap || freqMultiplier != 1;
+				if (blockProcess.newSpectrum) {
+					// make sure the previous input is the correct distance in the past (give or take 1 sample)
+					blockProcess.reanalysePrev = didSeek || std::abs(inputInterval - int(stft.defaultInterval())) > 1;
+					if (blockProcess.reanalysePrev) blockProcess.steps += stft.analyseSteps() + 1;
 
-				bool newSpectrum = didSeek || (inputInterval > 0);
-				if (newSpectrum) {
-					for (int c = 0; c < channels; ++c) {
-						// Copy from the history buffer, if needed
-						auto &&bufferChannel = inputBuffer[c];
-						for (int i = 0; i < -inputOffset; ++i) {
-							timeBuffer[i] = bufferChannel[i + inputOffset];
+					// analyse a new input
+					blockProcess.steps += stft.analyseSteps() + 1;
+				}
+				
+				blockProcess.processFormants = formantMultiplier != 1 || (formantCompensation && blockProcess.mappedFrequencies);
+
+				blockProcess.timeFactor = didSeek ? seekTimeFactor : stft.defaultInterval()/std::max<Sample>(1, inputInterval);
+				didSeek = false;
+
+				updateProcessSpectrumSteps();
+				blockProcess.steps += processSpectrumSteps;
+
+				blockProcess.steps += stft.synthesiseSteps() + 1;
+			}
+			
+			size_t processToStep = newBlock ? blockProcess.steps : 0;
+			if (_splitComputation) {
+				Sample processRatio = Sample(blockProcess.samplesSinceLast + 1)/stft.defaultInterval();
+				processToStep = std::min<size_t>(blockProcess.steps, (blockProcess.steps + 0.999f)*processRatio);
+			}
+			
+			while (blockProcess.step < processToStep) {
+				size_t step = blockProcess.step++;
+#ifdef SIGNALSMITH_STRETCH_PROFILE_PROCESS_STEP
+				SIGNALSMITH_STRETCH_PROFILE_PROCESS_STEP(step, blockProcess.steps);
+#endif
+				if (blockProcess.newSpectrum) {
+					if (blockProcess.reanalysePrev) {
+						// analyse past input
+						if (step < stft.analyseSteps()) {
+							stashedInput.swap(stft.input);
+							stft.analyseStep(step, stft.defaultInterval());
+							stashedInput.swap(stft.input);
+							continue;
 						}
-						// Copy the rest from the input
-						auto &&inputChannel = inputs[c];
-						for (int i = std::max<int>(0, -inputOffset); i < stft.windowSize(); ++i) {
-							timeBuffer[i] = inputChannel[i + inputOffset];
+						step -= stft.analyseSteps();
+						if (step < 1) {
+							// Copy previous analysis to our band objects
+							for (int c = 0; c < channels; ++c) {
+								auto channelBands = bandsForChannel(c);
+								auto *spectrumBands = stft.spectrum(c);
+								for (int b = 0; b < bands; ++b) {
+									channelBands[b].prevInput = spectrumBands[b];
+								}
+							}
+							continue;
 						}
-						stft.analyse(c, timeBuffer);
+						step -= 1;
 					}
-					flushed = false; // TODO: first block after a flush should be gain-compensated
 
-					for (int c = 0; c < channels; ++c) {
-						auto channelBands = bandsForChannel(c);
-						auto &&spectrumBands = stft.spectrum[c];
-						for (int b = 0; b < bands; ++b) {
-							channelBands[b].input = spectrumBands[b];
-						}
+					// Analyse latest (stashed) input
+					if (step < stft.analyseSteps()) {
+						stashedInput.swap(stft.input);
+						stft.analyseStep(step);
+						stashedInput.swap(stft.input);
+						continue;
 					}
-
-					if (didSeek || inputInterval != stft.interval()) { // make sure the previous input is the correct distance in the past
-						int prevIntervalOffset = inputOffset - stft.interval();
-						for (int c = 0; c < channels; ++c) {
-							// Copy from the history buffer, if needed
-							auto &&bufferChannel = inputBuffer[c];
-							for (int i = 0; i < std::min(-prevIntervalOffset, stft.windowSize()); ++i) {
-								timeBuffer[i] = bufferChannel[i + prevIntervalOffset];
-							}
-							// Copy the rest from the input
-							auto &&inputChannel = inputs[c];
-							for (int i = std::max<int>(0, -prevIntervalOffset); i < stft.windowSize(); ++i) {
-								timeBuffer[i] = inputChannel[i + prevIntervalOffset];
-							}
-							stft.analyse(c, timeBuffer);
-						}
+					step -= stft.analyseSteps();
+					if (step < 1) {
+						// Copy analysed spectrum into our band objects
 						for (int c = 0; c < channels; ++c) {
 							auto channelBands = bandsForChannel(c);
-							auto &&spectrumBands = stft.spectrum[c];
+							auto *spectrumBands = stft.spectrum(c);
 							for (int b = 0; b < bands; ++b) {
-								channelBands[b].prevInput = spectrumBands[b];
+								channelBands[b].input = spectrumBands[b];
 							}
 						}
+						continue;
 					}
+					step -= 1;
 				}
-				
-				Sample timeFactor = didSeek ? seekTimeFactor : stft.interval()/std::max<Sample>(1, inputInterval);
-				processSpectrum(newSpectrum, timeFactor);
-				didSeek = false;
 
-				for (int c = 0; c < channels; ++c) {
-					auto channelBands = bandsForChannel(c);
-					auto &&spectrumBands = stft.spectrum[c];
-					for (int b = 0; b < bands; ++b) {
-						spectrumBands[b] = channelBands[b].output;
+				if (step < processSpectrumSteps) {
+					processSpectrum(step);
+					continue;
+				}
+				step -= processSpectrumSteps;
+
+				if (step < 1) {
+					// Copy band objects into spectrum
+					for (int c = 0; c < channels; ++c) {
+						auto channelBands = bandsForChannel(c);
+						auto *spectrumBands = stft.spectrum(c);
+						for (int b = 0; b < bands; ++b) {
+							spectrumBands[b] = channelBands[b].output;
+						}
 					}
+					continue;
 				}
-			});
+				step -= 1;
+				
+				if (step < stft.synthesiseSteps()) {
+					stft.synthesiseStep(step);
+					continue;
+				}
+			}
+#ifdef SIGNALSMITH_STRETCH_PROFILE_PROCESS_ENDSTEP
+			SIGNALSMITH_STRETCH_PROFILE_PROCESS_ENDSTEP();
+#endif
 
+			++blockProcess.samplesSinceLast;
+			if (_splitComputation) stashedOutput.swap(stft.output);
 			for (int c = 0; c < channels; ++c) {
 				auto &&outputChannel = outputs[c];
-				auto &&stftChannel = stft[c];
-				outputChannel[outputIndex] = stftChannel[outputIndex];
+				Sample v = 0;
+				stft.readOutput(c, 1, &v);
+				outputChannel[outputIndex] = v;
 			}
+			stft.moveOutput(1);
+			if (_splitComputation) stashedOutput.swap(stft.output);
 		}
-
-		// Store input in history buffer
-		for (int c = 0; c < channels; ++c) {
-			auto &&inputChannel = inputs[c];
-			auto &&bufferChannel = inputBuffer[c];
-			int startIndex = std::max<int>(0, inputSamples - stft.windowSize());
-			for (int i = startIndex; i < inputSamples; ++i) {
-				bufferChannel[i] = inputChannel[i];
-			}
-		}
-		inputBuffer += inputSamples;
-		stft += outputSamples;
+		
+		copyInput(inputSamples);
 		prevInputOffset -= inputSamples;
+#ifdef SIGNALSMITH_STRETCH_PROFILE_PROCESS_END
+		SIGNALSMITH_STRETCH_PROFILE_PROCESS_END();
+#endif
 	}
 
 	// Read the remaining output, providing no further input.  `outputSamples` should ideally be at least `.outputLatency()`
 	template<class Outputs>
 	void flush(Outputs &&outputs, int outputSamples) {
-		int plainOutput = std::min<int>(outputSamples, stft.windowSize());
-		int foldedBackOutput = std::min<int>(outputSamples, stft.windowSize() - plainOutput);
+		int plainOutput = std::min<int>(outputSamples, int(stft.blockSamples()));
+		int foldedBackOutput = std::min<int>(outputSamples, int(stft.blockSamples()) - plainOutput);
+		stft.finishOutput(1);
 		for (int c = 0; c < channels; ++c) {
+			tmpBuffer.resize(plainOutput);
+			stft.readOutput(c, plainOutput, tmpBuffer.data());
 			auto &&outputChannel = outputs[c];
-			auto &&stftChannel = stft[c];
 			for (int i = 0; i < plainOutput; ++i) {
 				// TODO: plain output should be gain-
-				outputChannel[i] = stftChannel[i];
+				outputChannel[i] = tmpBuffer[i];
 			}
+			tmpBuffer.resize(foldedBackOutput);
+			stft.readOutput(c, plainOutput, foldedBackOutput, tmpBuffer.data());
 			for (int i = 0; i < foldedBackOutput; ++i) {
-				outputChannel[outputSamples - 1 - i] -= stftChannel[plainOutput + i];
-			}
-			for (int i = 0; i < plainOutput + foldedBackOutput; ++i) {
-				stftChannel[i] = 0;
+				outputChannel[outputSamples - 1 - i] -= tmpBuffer[i];
 			}
 		}
-		// Skip the output we just used/cleared
-		stft += plainOutput + foldedBackOutput;
+		stft.reset(0.1);
+
 		// Reset the phase-vocoder stuff, so the next block gets a fresh start
 		for (int c = 0; c < channels; ++c) {
 			auto channelBands = bandsForChannel(c);
@@ -285,31 +401,50 @@
 				channelBands[b].prevInput = channelBands[b].output = 0;
 			}
 		}
-		flushed = true;
 	}
 private:
+	bool _splitComputation = false;
+	struct {
+		size_t samplesSinceLast = std::numeric_limits<size_t>::max();
+		size_t steps = 0;
+		size_t step = 0;
+		
+		bool newSpectrum = false;
+		bool reanalysePrev = false;
+		bool mappedFrequencies = false;
+		bool processFormants = false;
+		Sample timeFactor;
+	} blockProcess;
+
 	using Complex = std::complex<Sample>;
 	static constexpr Sample noiseFloor{1e-15};
 	static constexpr Sample maxCleanStretch{2}; // time-stretch ratio before we start randomising phases
-	int silenceCounter = 0;
+	size_t silenceCounter = 0;
 	bool silenceFirst = true;
 
 	Sample freqMultiplier = 1, freqTonalityLimit = 0.5;
 	std::function<Sample(Sample)> customFreqMap = nullptr;
+	
+	bool formantCompensation = false; // compensate for pitch/freq change
+	Sample formantMultiplier = 1, invFormantMultiplier = 1;
+
+	using STFT = signalsmith::linear::DynamicSTFT<Sample, false, true>;
+	STFT stft;
+	typename STFT::Input stashedInput;
+	typename STFT::Output stashedOutput;
+	
+	std::vector<Sample> tmpBuffer;
 
-	signalsmith::spectral::STFT<Sample> stft{0, 1, 1};
-	signalsmith::delay::MultiBuffer<Sample> inputBuffer;
 	int channels = 0, bands = 0;
 	int prevInputOffset = -1;
-	std::vector<Sample> timeBuffer;
-	bool didSeek = false, flushed = true;
+	bool didSeek = false;
 	Sample seekTimeFactor = 1;
 
 	Sample bandToFreq(Sample b) const {
-		return (b + Sample(0.5))/stft.fftSize();
+		return stft.binToFreq(b);
 	}
 	Sample freqToBand(Sample f) const {
-		return f*stft.fftSize() - Sample(0.5);
+		return stft.freqToBin(f);
 	}
 	
 	struct Band {
@@ -371,10 +506,10 @@
 		Complex input;
 
 		Complex makeOutput(Complex phase) {
-			Sample phaseNorm = std::norm(phase);
+			Sample phaseNorm = _impl::norm(phase);
 			if (phaseNorm <= noiseFloor) {
 				phase = input; // prediction is too weak, fall back to the input
-				phaseNorm = std::norm(input) + noiseFloor;
+				phaseNorm = _impl::norm(input) + noiseFloor;
 			}
 			return phase*std::sqrt(energy/phaseNorm);
 		}
@@ -384,49 +519,95 @@
 		return channelPredictions.data() + c*bands;
 	}
 
-	RandomEngine randomEngine;
-
-	void processSpectrum(bool newSpectrum, Sample timeFactor) {
+	// If RandomEngine=void, use std::default_random_engine;
+	using RandomEngineImpl = typename std::conditional<
+		std::is_void<RandomEngine>::value,
+		std::default_random_engine,
+		RandomEngine
+	>::type;
+	RandomEngineImpl randomEngine;
+
+	size_t processSpectrumSteps = 0;
+	static constexpr size_t splitMainPrediction = 8; // it's just heavy, since we're blending up to 4 different phase predictions
+	void updateProcessSpectrumSteps() {
+		processSpectrumSteps = 0;
+		if (blockProcess.newSpectrum) processSpectrumSteps += channels;
+		if (blockProcess.mappedFrequencies) {
+			processSpectrumSteps += smoothEnergySteps;
+			processSpectrumSteps += 1; // findPeaks
+		}
+		processSpectrumSteps += 1; // updating the output map
+		processSpectrumSteps += channels; // preliminary phase-vocoder prediction
+		processSpectrumSteps += splitMainPrediction;
+		if (blockProcess.newSpectrum) processSpectrumSteps += 1; // .input -> .prevInput
+		if (blockProcess.processFormants) processSpectrumSteps += 3;
+	}
+	void processSpectrum(size_t step) {
+		Sample timeFactor = blockProcess.timeFactor;
+		
+		Sample smoothingBins = Sample(stft.fftSamples())/stft.defaultInterval();
+		int longVerticalStep = std::round(smoothingBins);
 		timeFactor = std::max<Sample>(timeFactor, 1/maxCleanStretch);
 		bool randomTimeFactor = (timeFactor > maxCleanStretch);
 		std::uniform_real_distribution<Sample> timeFactorDist(maxCleanStretch*2*randomTimeFactor - timeFactor, timeFactor);
-		
-		if (newSpectrum) {
-			for (int c = 0; c < channels; ++c) {
-				auto bins = bandsForChannel(c);
 
-				Complex rot = std::polar(Sample(1), bandToFreq(0)*stft.interval()*Sample(2*M_PI));
+		if (blockProcess.newSpectrum) {
+			if (step < size_t(channels)) {
+				int channel = int(step);
+				auto bins = bandsForChannel(channel);
+
+				Complex rot = std::polar(Sample(1), bandToFreq(0)*stft.defaultInterval()*Sample(2*M_PI));
 				Sample freqStep = bandToFreq(1) - bandToFreq(0);
-				Complex rotStep = std::polar(Sample(1), freqStep*stft.interval()*Sample(2*M_PI));
-				
+				Complex rotStep = std::polar(Sample(1), freqStep*stft.defaultInterval()*Sample(2*M_PI));
+				 
 				for (int b = 0; b < bands; ++b) {
 					auto &bin = bins[b];
-					bin.output = signalsmith::perf::mul(bin.output, rot);
-					bin.prevInput = signalsmith::perf::mul(bin.prevInput, rot);
-					rot = signalsmith::perf::mul(rot, rotStep);
+					bin.output = _impl::mul(bin.output, rot);
+					bin.prevInput = _impl::mul(bin.prevInput, rot);
+					rot = _impl::mul(rot, rotStep);
 				}
+				return;
 			}
+			step -= channels;
 		}
+		if (blockProcess.mappedFrequencies) {
+			if (step < smoothEnergySteps) {
+				smoothEnergy(step, smoothingBins);
+				return;
+			}
+			step -= smoothEnergySteps;
+			if (step-- == 0) {
+				findPeaks();
+				return;
+			}
+		}
+		if (step-- == 0) {
+			if (blockProcess.mappedFrequencies) {
+				updateOutputMap();
+			} else { // we're not pitch-shifting, so no need to find peaks etc.
+				for (int c = 0; c < channels; ++c) {
+					Band *bins = bandsForChannel(c);
+					for (int b = 0; b < bands; ++b) {
+						bins[b].inputEnergy = _impl::norm(bins[b].input);
+					}
+				}
 
-		Sample smoothingBins = Sample(stft.fftSize())/stft.interval();
-		int longVerticalStep = std::round(smoothingBins);
-		if (customFreqMap || freqMultiplier != 1) {
-			findPeaks(smoothingBins);
-			updateOutputMap();
-		} else { // we're not pitch-shifting, so no need to find peaks etc.
-			for (int c = 0; c < channels; ++c) {
-				Band *bins = bandsForChannel(c);
 				for (int b = 0; b < bands; ++b) {
-					bins[b].inputEnergy = std::norm(bins[b].input);
+					outputMap[b] = {Sample(b), 1};
 				}
 			}
-			for (int b = 0; b < bands; ++b) {
-				outputMap[b] = {Sample(b), 1};
+			return;
+		}
+		if (blockProcess.processFormants) {
+			if (step < 3) {
+				updateFormants(step);
+				return;
 			}
+			step -= 3;
 		}
-
 		// Preliminary output prediction from phase-vocoder
-		for (int c = 0; c < channels; ++c) {
+		if (step < size_t(channels)) {
+			int c = int(step);
 			Band *bins = bandsForChannel(c);
 			auto *predictions = predictionsForChannel(c);
 			for (int b = 0; b < bands; ++b) {
@@ -442,137 +623,152 @@
 
 				auto &outputBin = bins[b];
 				Complex prevInput = getFractional<&Band::prevInput>(c, lowIndex, fracIndex);
-				Complex freqTwist = signalsmith::perf::mul<true>(prediction.input, prevInput);
-				Complex phase = signalsmith::perf::mul(outputBin.output, freqTwist);
+				Complex freqTwist = _impl::mul<true>(prediction.input, prevInput);
+				Complex phase = _impl::mul(outputBin.output, freqTwist);
 				outputBin.output = phase/(std::max(prevEnergy, prediction.energy) + noiseFloor);
 			}
+			return;
 		}
+		step -= channels;
 
-		// Re-predict using phase differences between frequencies
-		for (int b = 0; b < bands; ++b) {
-			// Find maximum-energy channel and calculate that
-			int maxChannel = 0;
-			Sample maxEnergy = predictionsForChannel(0)[b].energy;
-			for (int c = 1; c < channels; ++c) {
-				Sample e = predictionsForChannel(c)[b].energy;
-				if (e > maxEnergy) {
-					maxChannel = c;
-					maxEnergy = e;
+		if (step < splitMainPrediction) {
+			// Re-predict using phase differences between frequencies
+			size_t chunk = step;
+			int startB = int(bands*chunk/splitMainPrediction);
+			int endB = int(bands*(chunk + 1)/splitMainPrediction);
+			for (int b = startB; b < endB; ++b) {
+				// Find maximum-energy channel and calculate that
+				int maxChannel = 0;
+				Sample maxEnergy = predictionsForChannel(0)[b].energy;
+				for (int c = 1; c < channels; ++c) {
+					Sample e = predictionsForChannel(c)[b].energy;
+					if (e > maxEnergy) {
+						maxChannel = c;
+						maxEnergy = e;
+					}
 				}
-			}
 
-			auto *predictions = predictionsForChannel(maxChannel);
-			auto &prediction = predictions[b];
-			auto *bins = bandsForChannel(maxChannel);
-			auto &outputBin = bins[b];
-
-			Complex phase = 0;
-			auto mapPoint = outputMap[b];
-
-			// Upwards vertical steps
-			if (b > 0) {
-				Sample binTimeFactor = randomTimeFactor ? timeFactorDist(randomEngine) : timeFactor;
-				Complex downInput = getFractional<&Band::input>(maxChannel, mapPoint.inputBin - binTimeFactor);
-				Complex shortVerticalTwist = signalsmith::perf::mul<true>(prediction.input, downInput);
+				auto *predictions = predictionsForChannel(maxChannel);
+				auto &prediction = predictions[b];
+				auto *bins = bandsForChannel(maxChannel);
+				auto &outputBin = bins[b];
 
-				auto &downBin = bins[b - 1];
-				phase += signalsmith::perf::mul(downBin.output, shortVerticalTwist);
-				
-				if (b >= longVerticalStep) {
-					Complex longDownInput = getFractional<&Band::input>(maxChannel, mapPoint.inputBin - longVerticalStep*binTimeFactor);
-					Complex longVerticalTwist = signalsmith::perf::mul<true>(prediction.input, longDownInput);
+				Complex phase = 0;
+				auto mapPoint = outputMap[b];
 
-					auto &longDownBin = bins[b - longVerticalStep];
-					phase += signalsmith::perf::mul(longDownBin.output, longVerticalTwist);
-				}
-			}
-			// Downwards vertical steps
-			if (b < bands - 1) {
-				auto &upPrediction = predictions[b + 1];
-				auto &upMapPoint = outputMap[b + 1];
+				// Upwards vertical steps
+				if (b > 0) {
+					Sample binTimeFactor = randomTimeFactor ? timeFactorDist(randomEngine) : timeFactor;
+					Complex downInput = getFractional<&Band::input>(maxChannel, mapPoint.inputBin - binTimeFactor);
+					Complex shortVerticalTwist = _impl::mul<true>(prediction.input, downInput);
 
-				Sample binTimeFactor = randomTimeFactor ? timeFactorDist(randomEngine) : timeFactor;
-				Complex downInput = getFractional<&Band::input>(maxChannel, upMapPoint.inputBin - binTimeFactor);
-				Complex shortVerticalTwist = signalsmith::perf::mul<true>(upPrediction.input, downInput);
+					auto &downBin = bins[b - 1];
+					phase += _impl::mul(downBin.output, shortVerticalTwist);
+					
+					if (b >= longVerticalStep) {
+						Complex longDownInput = getFractional<&Band::input>(maxChannel, mapPoint.inputBin - longVerticalStep*binTimeFactor);
+						Complex longVerticalTwist = _impl::mul<true>(prediction.input, longDownInput);
 
-				auto &upBin = bins[b + 1];
-				phase += signalsmith::perf::mul<true>(upBin.output, shortVerticalTwist);
-				
-				if (b < bands - longVerticalStep) {
-					auto &longUpPrediction = predictions[b + longVerticalStep];
-					auto &longUpMapPoint = outputMap[b + longVerticalStep];
+						auto &longDownBin = bins[b - longVerticalStep];
+						phase += _impl::mul(longDownBin.output, longVerticalTwist);
+					}
+				}
+				// Downwards vertical steps
+				if (b < bands - 1) {
+					auto &upPrediction = predictions[b + 1];
+					auto &upMapPoint = outputMap[b + 1];
+
+					Sample binTimeFactor = randomTimeFactor ? timeFactorDist(randomEngine) : timeFactor;
+					Complex downInput = getFractional<&Band::input>(maxChannel, upMapPoint.inputBin - binTimeFactor);
+					Complex shortVerticalTwist = _impl::mul<true>(upPrediction.input, downInput);
+
+					auto &upBin = bins[b + 1];
+					phase += _impl::mul<true>(upBin.output, shortVerticalTwist);
+					
+					if (b < bands - longVerticalStep) {
+						auto &longUpPrediction = predictions[b + longVerticalStep];
+						auto &longUpMapPoint = outputMap[b + longVerticalStep];
 
-					Complex longDownInput = getFractional<&Band::input>(maxChannel, longUpMapPoint.inputBin - longVerticalStep*binTimeFactor);
-					Complex longVerticalTwist = signalsmith::perf::mul<true>(longUpPrediction.input, longDownInput);
+						Complex longDownInput = getFractional<&Band::input>(maxChannel, longUpMapPoint.inputBin - longVerticalStep*binTimeFactor);
+						Complex longVerticalTwist = _impl::mul<true>(longUpPrediction.input, longDownInput);
 
-					auto &longUpBin = bins[b + longVerticalStep];
-					phase += signalsmith::perf::mul<true>(longUpBin.output, longVerticalTwist);
+						auto &longUpBin = bins[b + longVerticalStep];
+						phase += _impl::mul<true>(longUpBin.output, longVerticalTwist);
+					}
 				}
-			}
 
-			outputBin.output = prediction.makeOutput(phase);
-			
-			// All other bins are locked in phase
-			for (int c = 0; c < channels; ++c) {
-				if (c != maxChannel) {
-					auto &channelBin = bandsForChannel(c)[b];
-					auto &channelPrediction = predictionsForChannel(c)[b];
-					
-					Complex channelTwist = signalsmith::perf::mul<true>(channelPrediction.input, prediction.input);
-					Complex channelPhase = signalsmith::perf::mul(outputBin.output, channelTwist);
-					channelBin.output = channelPrediction.makeOutput(channelPhase);
+				outputBin.output = prediction.makeOutput(phase);
+				
+				// All other bins are locked in phase
+				for (int c = 0; c < channels; ++c) {
+					if (c != maxChannel) {
+						auto &channelBin = bandsForChannel(c)[b];
+						auto &channelPrediction = predictionsForChannel(c)[b];
+						
+						Complex channelTwist = _impl::mul<true>(channelPrediction.input, prediction.input);
+						Complex channelPhase = _impl::mul(outputBin.output, channelTwist);
+						channelBin.output = channelPrediction.makeOutput(channelPhase);
+					}
 				}
 			}
+			return;
 		}
+		step -= splitMainPrediction;
 
-		if (newSpectrum) {
-			for (auto &bin : channelBands) {
-				bin.prevInput = bin.input;
+		if (blockProcess.newSpectrum) {
+			if (step-- == 0) {
+				for (auto &bin : channelBands) {
+					bin.prevInput = bin.input;
+				}
 			}
 		}
 	}
 	
 	// Produces smoothed energy across all channels
-	void smoothEnergy(Sample smoothingBins) {
+	static constexpr size_t smoothEnergySteps = 3;
+	Sample smoothEnergyState = 0;
+	void smoothEnergy(size_t step, Sample smoothingBins) {
 		Sample smoothingSlew = 1/(1 + smoothingBins*Sample(0.5));
-		for (auto &e : energy) e = 0;
-		for (int c = 0; c < channels; ++c) {
-			Band *bins = bandsForChannel(c);
+		if (step-- == 0) {
+			for (auto &e : energy) e = 0;
+			for (int c = 0; c < channels; ++c) {
+				Band *bins = bandsForChannel(c);
+				for (int b = 0; b < bands; ++b) {
+					Sample e = _impl::norm(bins[b].input);
+					bins[b].inputEnergy = e; // Used for interpolating prediction energy
+					energy[b] += e;
+				}
+			}
 			for (int b = 0; b < bands; ++b) {
-				Sample e = std::norm(bins[b].input);
-				bins[b].inputEnergy = e; // Used for interpolating prediction energy
-				energy[b] += e;
+				smoothedEnergy[b] = energy[b];
 			}
+			smoothEnergyState = 0;
+			return;
 		}
-		for (int b = 0; b < bands; ++b) {
-			smoothedEnergy[b] = energy[b];
+
+		// The two other steps are repeated smoothing passes, down and up
+		Sample e = smoothEnergyState;
+		for (int b = bands - 1; b >= 0; --b) {
+			e += (smoothedEnergy[b] - e)*smoothingSlew;
+			smoothedEnergy[b] = e;
 		}
-		Sample e = 0;
-		for (int repeat = 0; repeat < 2; ++repeat) {
-			for (int b = bands - 1; b >= 0; --b) {
-				e += (smoothedEnergy[b] - e)*smoothingSlew;
-				smoothedEnergy[b] = e;
-			}
-			for (int b = 0; b < bands; ++b) {
-				e += (smoothedEnergy[b] - e)*smoothingSlew;
-				smoothedEnergy[b] = e;
-			}
+		for (int b = 0; b < bands; ++b) {
+			e += (smoothedEnergy[b] - e)*smoothingSlew;
+			smoothedEnergy[b] = e;
 		}
+		smoothEnergyState = e;
 	}
 	
 	Sample mapFreq(Sample freq) const {
 		if (customFreqMap) return customFreqMap(freq);
 		if (freq > freqTonalityLimit) {
-			Sample diff = freq - freqTonalityLimit;
-			return freqTonalityLimit*freqMultiplier + diff;
+			return freq + (freqMultiplier - 1)*freqTonalityLimit;
 		}
 		return freq*freqMultiplier;
 	}
 	
 	// Identifies spectral peaks using energy across all channels
-	void findPeaks(Sample smoothingBins) {
-		smoothEnergy(smoothingBins);
-
+	void findPeaks() {
 		peaks.resize(0);
 		
 		int start = 0;
@@ -631,6 +827,118 @@
 			outputMap[b] = {b + topOffset, 1};
 		}
 	}
+
+	// If we mapped formants the same way as mapFreq(), this would be the inverse
+	Sample invMapFormant(Sample freq) const {
+		if (freq*invFormantMultiplier > freqTonalityLimit) {
+			return freq + (1 - formantMultiplier)*freqTonalityLimit;
+		}
+		return freq*invFormantMultiplier;
+	}
+
+	Sample freqEstimateWeighted = 0;
+	Sample freqEstimateWeight = 0;
+	Sample estimateFrequency() {
+		// 3 highest peaks in the input
+		std::array<int, 3> peakIndices{0, 0, 0};
+		for (int b = 1; b < bands - 1; ++b) {
+			Sample e = formantMetric[b];
+			// local maxima only
+			if (e < formantMetric[b - 1] || e <= formantMetric[b + 1]) continue;
+			
+			if (e > formantMetric[peakIndices[0]]) {
+				if (e > formantMetric[peakIndices[1]]) {
+					if (e > formantMetric[peakIndices[2]]) {
+						peakIndices = {peakIndices[1], peakIndices[2], b};
+					} else {
+						peakIndices = {peakIndices[1], b, peakIndices[2]};
+					}
+				} else {
+					peakIndices[0] = b;
+				}
+			}
+		}
+		
+		// VERY rough pitch estimation
+		int peakEstimate = peakIndices[2];
+		if (formantMetric[peakIndices[1]] > formantMetric[peakIndices[2]]*0.1) {
+			int diff = std::abs(peakEstimate - peakIndices[1]);
+			if (diff > peakEstimate/8 && diff < peakEstimate*7/8) peakEstimate = peakEstimate%diff;
+			if (formantMetric[peakIndices[0]] > formantMetric[peakIndices[2]]*0.01) {
+				int diff = std::abs(peakEstimate - peakIndices[0]);
+				if (diff > peakEstimate/8 && diff < peakEstimate*7/8) peakEstimate = peakEstimate%diff;
+			}
+		}
+		Sample weight = formantMetric[peakIndices[2]];
+		// Smooth it out a bit
+		freqEstimateWeighted += (peakEstimate*weight - freqEstimateWeighted)*0.25;
+		freqEstimateWeight += (weight - freqEstimateWeight)*0.25;
+		
+		return freqEstimateWeighted/(freqEstimateWeight + Sample(1e-30));
+	}
+	
+	Sample freqEstimate;
+	
+	std::vector<Sample> formantMetric;
+	Sample formantBaseFreq = 0;
+	void updateFormants(size_t step) {
+		if (step-- == 0) {
+			for (auto &e : formantMetric) e = 0;
+			for (int c = 0; c < channels; ++c) {
+				Band *bins = bandsForChannel(c);
+				for (int b = 0; b < bands; ++b) {
+					formantMetric[b] += bins[b].inputEnergy;
+				}
+			}
+
+			freqEstimate = freqToBand(formantBaseFreq);
+			if (formantBaseFreq <= 0) freqEstimate = estimateFrequency();
+
+			for (int b = 0; b < bands; ++b) {
+				formantMetric[b] = std::sqrt(formantMetric[b]);
+			}
+		} else if (step-- == 0) {
+			Sample slew = 1/(freqEstimate*0.5 + 1);
+			Sample e = 0;
+			for (size_t repeat = 0; repeat < 2; ++repeat) {
+				for (int b = bands - 1; b >= 0; --b) {
+					e += (formantMetric[b] - e)*slew;
+					formantMetric[b] = e;
+				}
+				for (int b = 0; b < bands; ++b) {
+					e += (formantMetric[b] - e)*slew;
+					formantMetric[b] = e;
+				}
+			}
+		} else {
+			auto getFormant = [&](Sample band) -> Sample {
+				if (band < 0) return 0;
+				band = std::min<Sample>(band, bands);
+				int floorBand = std::floor(band);
+				Sample fracBand = band - floorBand;
+				Sample low = formantMetric[floorBand], high = formantMetric[floorBand + 1];
+				return low + (high - low)*fracBand;
+			};
+
+			for (int b = 0; b < bands; ++b) {
+				Sample inputF = bandToFreq(b);
+				Sample outputF = formantCompensation ? mapFreq(inputF) : inputF;
+				outputF = invMapFormant(outputF);
+
+				Sample inputE = formantMetric[b];
+				Sample targetE = getFormant(freqToBand(outputF));
+
+				Sample formantRatio = targetE/(inputE + Sample(1e-30));
+				Sample energyRatio = formantRatio*formantRatio;
+
+				for (int c = 0; c < channels; ++c) {
+					Band *bins = bandsForChannel(c);
+					// This is what's used to decide the output energy, so this affects the output
+					bins[b].inputEnergy *= energyRatio;
+				}
+			}
+		}
+	}
 };
 
 }} // namespace

Added: trunk/OpenMPT/include/SignalsmithStretch/signalsmith-linear/LICENSE.txt
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ trunk/OpenMPT/include/SignalsmithStretch/signalsmith-linear/LICENSE.txt	Sun May 25 14:04:10 2025	(r23191)
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 Signalsmith Audio
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

Added: trunk/OpenMPT/include/SignalsmithStretch/signalsmith-linear/README.md
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ trunk/OpenMPT/include/SignalsmithStretch/signalsmith-linear/README.md	Sun May 25 14:04:10 2025	(r23191)
@@ -0,0 +1,91 @@
+# Signalsmith Linear
+
+Header-only C++11 wrappers for common things needed by our audio libraries.  It's designed for internal use, so [caveat developor](https://en.wikipedia.org/wiki/Caveat_emptor).  The goal is to wrap around Accelerate/IPP when available, but still work without it.
+
+Everything is in `signalsmith::linear::` namespace.
+
+## FFTs
+
+```cpp
+#include "signalsmith-linear/fft.h"
+```
+
+This provides real and complex FFTs.  They all have `.resize(size_t)`, and `.fft()`/`.ifft()` methods which can take either `std::complex<>` pointers, or a real/imaginary pair ("split-complex" form) for each complex argument.
+
+The `Pow2FFT<>` and `Pow2RealFFT<>` templates wrap around fast implementations where available.
+
+The main `FFT<>`, `RealFFT<>` and `ModifiedRealFFT<>` templates wrap around the `Pow2<>` implementations, to add support for multiples of 3 and 5.  They provide a static `.fastSizeAbove()` to find the next biggest size. 
+
+### Chunked computation
+
+The main FFT classes also provide `.steps()` method, and an optional `size_t` first argument to the `.fft()`/`.ifft()` methods, so that computation can be divided up into chunks.
+
+The computation time for the chunks is not exactly equal, but when you're doing large FFTs periodically (instead of smaller ones regularly) it can help distribute the computation out, without using threads.
+
+## STFTs
+
+```cpp
+#include "signalsmith-linear/stft.h"
+```
+
+This provides `DynamicSTFT<>` template, which is configured for any block length (zero padding to a fast size), and a (default!) interval between blocks.  It can have a different number of input/output channels, using `.spectrum(c)` to access both.
+
+### Input and output
+
+The `.synthesise()` method moves the output time forward, and adds in output block(s) generated from the spectrum.  The result can then be queried using `.readOutput()`.
+
+Input is passed in using `.writeInput()`, but you must move the input time forward using `.moveInput()` before calling `.analyse()`.  By default, this will analyse the most recent block of input, but if you passed in a non-zero `extraInputHistory` when configuring, you can analyse input from some time in the past.
+
+### Window shape and block interval
+
+It has separate analysis/synthesis windows, which can be changed on-the-fly.  Both windows have an "offset" marking the centre of the window, to support asymmetrical setups (for reduced latency).
+
+The "synthesis interval" (optionally passed to `.synthesise()`) can also change arbitrarily.  The output is normalised according to these gaps (and the analysis/synthesis windows) such that regardless of spacing or window shape, it would perfectly reconstruct the input.
+
+The analysis should remain constant between analysis and synthesis, since it's used for normalising the output properly.  If you're synthesising signals from scratch (which have no inherent input windowing), then you should reflect that by setting the analysis window to all 1s.
+
+### Chunked computation
+
+Similar to the FFTs' `.steps()` methods, `DynamicSTFT` has `.analyseSteps()`/`.synthesiseSteps()` methods, and `.analyseStep()`/`.synthesiseStep()` to help you spread the computation out over time.
+
+The window shapes/offsets, input/output and synthesis interval must stay the same until the analysis/synthesis is finished.
+
+## Expressions
+
+```cpp
+#include "signalsmith-linear/linear.h"
+```
+
+The main `Linear` class provides expression templates, which wrap around three types of pointer: real, complex, and split-complex.  You can wrap these pointers (or `std::vector`s) into `Expression`s using the `()` operator:
+
+```cpp
+Linear linear;
+
+float *a, *b;
+std::complex<float> *c;
+size_t size;
+
+// Once the above variables are set up:
+linear(a, size) = linear(b) + linear(c).abs();
+
+// Pass in two real pointers to make a split-complex expression (which is often a bit faster).
+linear(c, size) = linear(a, b).conj();
+```
+
+Implementations may use temporary internal storage.  This means it's not thread-safe, it should be a member of your processing class, and you should also call `.reserve???()` during configuration, with the longest vector length you expect to use.
+
+## Building
+
+### CMake
+
+If you're using CMake, include this directory.  It will add a `signalsmith-linear` target which doesn't build anything, but linking to this "library" will add the include path.
+
+### Other
+
+To use Accelerate on Mac, link the framework and define `SIGNALSMITH_USE_ACCELERATE`:
+
+```
+g++ -framework Accelerate -DSIGNALSMITH_USE_ACCELERATE
+```
+
+For IPP, link to `IPP::ippcore` and `IPP::ipps`, and define `SIGNALSMITH_USE_IPP`.

Added: trunk/OpenMPT/include/SignalsmithStretch/signalsmith-linear/fft.h
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ trunk/OpenMPT/include/SignalsmithStretch/signalsmith-linear/fft.h	Sun May 25 14:04:10 2025	(r23191)
@@ -0,0 +1,1366 @@
+#ifndef SIGNALSMITH_AUDIO_LINEAR_FFT_H
+#define SIGNALSMITH_AUDIO_LINEAR_FFT_H
+
+#include <complex>
+#include <vector>
+#include <cmath>
+
+#if defined(__FAST_MATH__) && (__apple_build_version__ >= 16000000) && (__apple_build_version__ <= 16000099) && !defined(SIGNALSMITH_IGNORE_BROKEN_APPLECLANG)
+#	error Apple Clang 16.0.0 generates incorrect SIMD for ARM. If you HAVE to use this version of Clang, turn off -ffast-math.
+#endif
+
+#ifndef M_PI
+#	define M_PI 3.14159265358979323846
+#endif
+
+namespace signalsmith { namespace linear {
+
+namespace _impl {
+	template<class V>
+	void complexMul(std::complex<V> *a, const std::complex<V> *b, const std::complex<V> *c, size_t size) {
+		for (size_t i = 0; i < size; ++i) {
+			auto bi = b[i], ci = c[i];
+			a[i] = {bi.real()*ci.real() - bi.imag()*ci.imag(), bi.imag()*ci.real() + bi.real()*ci.imag()};
+		}
+	}
+	template<class V>
+	void complexMulConj(std::complex<V> *a, const std::complex<V> *b, const std::complex<V> *c, size_t size) {
+		for (size_t i = 0; i < size; ++i) {
+			auto bi = b[i], ci = c[i];
+			a[i] = {bi.real()*ci.real() + bi.imag()*ci.imag(), bi.imag()*ci.real() - bi.real()*ci.imag()};
+		}
+	}
+	template<class V>
+	void complexMul(V *ar, V *ai, const V *br, const V *bi, const V *cr, const V *ci, size_t size) {
+		for (size_t i = 0; i < size; ++i) {
+			V rr = br[i]*cr[i] - bi[i]*ci[i];
+			V ri = br[i]*ci[i] + bi[i]*cr[i];
+			ar[i] = rr;
+			ai[i] = ri;
+		}
+	}
+	template<class V>
+	void complexMulConj(V *ar, V *ai, const V *br, const V *bi, const V *cr, const V *ci, size_t size) {
+		for (size_t i = 0; i < size; ++i) {
+			V rr = cr[i]*br[i] + ci[i]*bi[i];
+			V ri = cr[i]*bi[i] - ci[i]*br[i];
+			ar[i] = rr;
+			ai[i] = ri;
+		}
+	}
+
+	// Input: aStride elements next to each other -> output with bStride
+	template<size_t aStride, class V>
+	void interleaveCopy(const V *a, V *b, size_t bStride) {
+		for (size_t bi = 0; bi < bStride; ++bi) {
+			const V *offsetA = a + bi*aStride;
+			V *offsetB = b + bi;
+			for (size_t ai = 0; ai < aStride; ++ai) {
+				offsetB[ai*bStride] = offsetA[ai];
+			}
+		}
+	}
+	template<class V>
+	void interleaveCopy(const V *a, V *b, size_t aStride, size_t bStride) {
+		for (size_t bi = 0; bi < bStride; ++bi) {
+			const V *offsetA = a + bi*aStride;
+			V *offsetB = b + bi;
+			for (size_t ai = 0; ai < aStride; ++ai) {
+				offsetB[ai*bStride] = offsetA[ai];
+			}
+		}
+	}
+	template<size_t aStride, class V>
+	void interleaveCopy(const V *aReal, const V *aImag, V *bReal, V *bImag, size_t bStride) {
+		for (size_t bi = 0; bi < bStride; ++bi) {
+			const V *offsetAr = aReal + bi*aStride;
+			const V *offsetAi = aImag + bi*aStride;
+			V *offsetBr = bReal + bi;
+			V *offsetBi = bImag + bi;
+			for (size_t ai = 0; ai < aStride; ++ai) {
+				offsetBr[ai*bStride] = offsetAr[ai];
+				offsetBi[ai*bStride] = offsetAi[ai];
+			}
+		}
+	}
+	template<class V>
+	void interleaveCopy(const V *aReal, const V *aImag, V *bReal, V *bImag, size_t aStride, size_t bStride) {
+		for (size_t bi = 0; bi < bStride; ++bi) {
+			const V *offsetAr = aReal + bi*aStride;
+			const V *offsetAi = aImag + bi*aStride;
+			V *offsetBr = bReal + bi;
+			V *offsetBi = bImag + bi;
+			for (size_t ai = 0; ai < aStride; ++ai) {
+				offsetBr[ai*bStride] = offsetAr[ai];
+				offsetBi[ai*bStride] = offsetAi[ai];
+			}
+		}
+	}
+}
+
+/// Fairly simple and very portable power-of-2 FFT
+template<typename Sample>
+struct SimpleFFT {
+	using Complex = std::complex<Sample>;
+	
+	SimpleFFT(size_t size=0) {
+		resize(size);
+	}
+	
+	void resize(size_t size) {
+		twiddles.resize(size*3/4);
+		for (size_t i = 0; i < size*3/4; ++i) {
+			Sample twiddlePhase = -2*M_PI*i/size;
+			twiddles[i] = std::polar(Sample(1), twiddlePhase);
+		}
+		working.resize(size);
+	}
+	
+	void fft(const Complex *time, Complex *freq) {
+		size_t size = working.size();
+		if (size <= 1) {
+			*freq = *time;
+			return;
+		}
+		fftPass<false>(size, 1, time, freq, working.data());
+	}
+
+	void ifft(const Complex *freq, Complex *time) {
+		size_t size = working.size();
+		if (size <= 1) {
+			*time = *freq;
+			return;
+		}
+		fftPass<true>(size, 1, freq, time, working.data());
+	}
+
+	void fft(const Sample *inR, const Sample *inI, Sample *outR, Sample *outI) {
+		size_t size = working.size();
+		if (size <= 1) {
+			*outR = *inR;
+			*outI = *inI;
+			return;
+		}
+		Sample *workingR = (Sample *)working.data(), *workingI = workingR + size;
+		fftPass<false>(size, 1, inR, inI, outR, outI, workingR, workingI);
+	}
+	void ifft(const Sample *inR, const Sample *inI, Sample *outR, Sample *outI) {
+		size_t size = working.size();
+		if (size <= 1) {
+			*outR = *inR;
+			*outI = *inI;
+			return;
+		}
+		Sample *workingR = (Sample *)working.data(), *workingI = workingR + size;
+		fftPass<true>(size, 1, inR, inI, outR, outI, workingR, workingI);
+	}
+private:
+	std::vector<Complex> twiddles;
+	std::vector<Complex> working;
+	
+	template<bool conjB>
+	static Complex mul(const Complex &a, const Complex &b) {
+		return conjB ? Complex{
+			a.real()*b.real() + a.imag()*b.imag(),
+			a.imag()*b.real() - a.real()*b.imag()
+		} : Complex{
+			a.real()*b.real() - a.imag()*b.imag(),
+			a.imag()*b.real() + a.real()*b.imag()
+		};
+	}
+
+	// Calculate a [size]-point FFT, where each element is a block of [stride] values
+	template<bool inverse>
+	void fftPass(size_t size, size_t stride, const Complex *input, Complex *output, Complex *working) {
+		if (size/4 > 1) {
+			// Calculate four quarter-size FFTs
+			fftPass<inverse>(size/4, stride*4, input, working, output);
+			combine4<inverse>(size, stride, working, output);
+		} else if (size == 4) {
+			combine4<inverse>(4, stride, input, output);
+		} else {
+			// 2-point FFT
+			for (size_t s = 0; s < stride; ++s) {
+				Complex a = input[s];
+				Complex b = input[s + stride];
+				output[s] = a + b;
+				output[s + stride] = a - b;
+			}
+		}
+	}
+
+	// Combine interleaved results into a single spectrum
+	template<bool inverse>
+	void combine4(size_t size, size_t stride, const Complex *input, Complex *output) const {
+		auto twiddleStep = working.size()/size;
+		for (size_t i = 0; i < size/4; ++i) {
+			Complex twiddleB = twiddles[i*twiddleStep];
+			Complex twiddleC = twiddles[i*2*twiddleStep];
+			Complex twiddleD = twiddles[i*3*twiddleStep];
+			
+			const Complex *inputA = input + 4*i*stride;
+			const Complex *inputB = input + (4*i + 1)*stride;
+			const Complex *inputC = input + (4*i + 2)*stride;
+			const Complex *inputD = input + (4*i + 3)*stride;
+			Complex *outputA = output + i*stride;
+			Complex *outputB = output + (i + size/4)*stride;
+			Complex *outputC = output + (i + size/4*2)*stride;
+			Complex *outputD = output + (i + size/4*3)*stride;
+			for (size_t s = 0; s < stride; ++s) {
+				Complex a = inputA[s];
+				Complex b = mul<inverse>(inputB[s], twiddleB);
+				Complex c = mul<inverse>(inputC[s], twiddleC);
+				Complex d = mul<inverse>(inputD[s], twiddleD);
+				Complex ac0 = a + c, ac1 = a - c;
+				Complex bd0 = b + d, bd1 = inverse ? (b - d) : (d - b);
+				Complex bd1i = {-bd1.imag(), bd1.real()};
+				outputA[s] = ac0 + bd0;
+				outputB[s] = ac1 + bd1i;
+				outputC[s] = ac0 - bd0;
+				outputD[s] = ac1 - bd1i;
+			}
+		}
+	}
+
+	// The same thing, but translated for split-complex input/output
+	template<bool inverse>
+	void fftPass(size_t size, size_t stride, const Sample *inputR, const Sample *inputI, Sample *outputR, Sample *outputI, Sample *workingR, Sample *workingI) const {
+		if (size/4 > 1) {
+			// Calculate four quarter-size FFTs
+			fftPass<inverse>(size/4, stride*4, inputR, inputI, workingR, workingI, outputR, outputI);
+			combine4<inverse>(size, stride, workingR, workingI, outputR, outputI);
+		} else if (size == 4) {
+			combine4<inverse>(4, stride, inputR, inputI, outputR, outputI);
+		} else {
+			// 2-point FFT
+			for (size_t s = 0; s < stride; ++s) {
+				Sample ar = inputR[s], ai = inputI[s];
+				Sample br = inputR[s + stride], bi = inputI[s + stride];
+				outputR[s] = ar + br;
+				outputI...
 
[truncated message content]

[OMPT-svn-commit] OpenMPT r23191 - in trunk/OpenMPT: include/SignalsmithStretch include/Signalsmith

A music tracker software for Windows.

[OMPT-svn-commit] OpenMPT r23191 - in trunk/OpenMPT: include/SignalsmithStretch include/SignalsmithStretch/SignalsmithStretch include/SignalsmithStretch/SignalsmithStretch/dsp include/SignalsmithStretch/signalsmith-linear tracklib