Author: sagamusix Date: Sun May 25 14:04:43 2025 New Revision: 23192 URL: https://source.openmpt.org/browse/openmpt/?op=revision&rev=23192 Log: Merged revision(s) 23191 from trunk/OpenMPT: [Var] Update Signalsmith Stretch to commit c78aacf6389fb862978d4ead01cfc71a3d5f8ca8 (2025-05-01) ........ Added: branches/OpenMPT-1.32/include/SignalsmithStretch/signalsmith-linear/ - copied from r23191, trunk/OpenMPT/include/SignalsmithStretch/signalsmith-linear/ Deleted: branches/OpenMPT-1.32/include/SignalsmithStretch/SignalsmithStretch/dsp/ Modified: branches/OpenMPT-1.32/ (props changed) branches/OpenMPT-1.32/include/SignalsmithStretch/OpenMPT.txt branches/OpenMPT-1.32/include/SignalsmithStretch/SignalsmithStretch/README.md branches/OpenMPT-1.32/include/SignalsmithStretch/SignalsmithStretch/signalsmith-stretch.h branches/OpenMPT-1.32/tracklib/TimeStretchPitchShift.cpp Modified: branches/OpenMPT-1.32/include/SignalsmithStretch/OpenMPT.txt ============================================================================== --- branches/OpenMPT-1.32/include/SignalsmithStretch/OpenMPT.txt Sun May 25 14:04:10 2025 (r23191) +++ branches/OpenMPT-1.32/include/SignalsmithStretch/OpenMPT.txt Sun May 25 14:04:43 2025 (r23192) @@ -1,3 +1,7 @@ -Signalsmith Stretch: C++ pitch/time library, commit ffa45981be0b75079b43d1a7a142234f06869254 (2025-02-12) +Signalsmith Stretch: C++ pitch shift / time stretch library, commit c78aacf6389fb862978d4ead01cfc71a3d5f8ca8 (2025-05-01) https://github.com/Signalsmith-Audio/signalsmith-stretch -Unneeded files (cmd and web folder) have been removed. \ No newline at end of file +Unneeded files have been removed. + +Signalsmith Linear: C++ audio utilities, commit 94d677be71a50789eb7a33b8927947046c0c2fa2 (2025-04-09) +https://github.com/Signalsmith-Audio/linear +Unneeded files have been removed. \ No newline at end of file Modified: branches/OpenMPT-1.32/include/SignalsmithStretch/SignalsmithStretch/README.md ============================================================================== --- branches/OpenMPT-1.32/include/SignalsmithStretch/SignalsmithStretch/README.md Sun May 25 14:04:10 2025 (r23191) +++ branches/OpenMPT-1.32/include/SignalsmithStretch/SignalsmithStretch/README.md Sun May 25 14:04:43 2025 (r23192) @@ -90,6 +90,12 @@ You should be supplying input samples slightly ahead of the processing time (which is where changes to pitch-shift or stretch rate will be centred), and you'll receive output samples slightly behind that processing time. +### Split computation + +All of the `.preset???()` and `.configure()` methods have an optional `splitComputation` flag. When enabled, this introduces one extra interval of output latency, and uses this to spread the computation out more evenly. + +Without this (as is common for spectral processing) the library will occasionally do a bunch of computation all at once, to compute the next spectral block of audio. This is often fine, when audio's being processed across multiple threads with a decent amount of buffering (like mixing in a DAW), but if you're in a stricter situation then this flag might help. + #### Automation To follow pitch/time automation accurately, you should give it automation values from the current processing time (`.outputLatency()` samples ahead of the output), and feed it input from `.inputLatency()` samples ahead of the current processing time. @@ -114,21 +120,36 @@ stretch.flush(outputBuffers, outputSamples); ``` -Using `.seek()`/`.flush()` like this, you can perform an exact time-stretch on a fixed-length sound, and your result will have `.outputLatency()` of pre-roll. +Using `.seek()`/`.flush()` like this, you can perform an exact time-stretch on a fixed-length sound, and your result will have `.outputLatency()` of pre-roll. -## Compiling +### Formant compensation -⚠️ This has mostly been tested with Clang. If you're using another compiler and have any problems, please get in touch. +```cpp +stretch.setFormantFactor(1.2); // up ~3 semitones -🚨 It's generally be OK to enable `-ffast-math`, however there's a bug in Apple Clang 16.0.0 which can generate incorrect SIMD code. If you _have_ to use this version, we advise you don't use `-ffast-math`. +stretch.setTransposeSemitones(3); +``` -It's much slower (about 10x) if optimisation is disabled altogether, so you might want to enable optimisation where it's used, even in Debug builds. +Both of those methods both have an optional `compensatePitch` flag. Enabling this adjust for the pitch-shift (or non-linear map) when correcting/shifting formants. -### DSP Library +The formant correction is not a sharp as monophonic algorithms (such such as PSOLA). It also needs you to give it a rough estimate of fundamental frequency (relative to Nyquist): -This uses the Signalsmith DSP library for FFTs and other bits and bobs. +```cpp +// if 200Hz is the middle-register of the instrument +stretch.setFormantBase(200/sampleRate); +``` -For convenience, a copy of the library is included (with its own `LICENSE.txt`) in `dsp/`, but if you're already using this elsewhere then you should remove this copy to avoid versioning issues. +## Compiling + +⚠️ This has mostly been tested with AppleClang (Mac), and MSVC (Windows). We aim to support other compilers though, so please get in touch if you have any problems. + +Enabling `-ffast-math` (or equivalent) is fine, unless you're using Apple Clang 16.0.0 (in which case it will deliberately break, because that version generates incorrect SIMD code). + +The algorithm has a lot of number-crunching, so Debug builds are much slower (up to 10x). If possible, you might want to enable optimisation where Stretch is used, even in Debug builds. + +### Dependencies and `#define`s + +This uses the [Signalsmith Linear](https://github.com/Signalsmith-Audio/linear) library for FFTs and other speedups. There are [flags]([Linear repo](https://github.com/Signalsmith-Audio/linear?tab=readme-ov-file#building)) to enable Accelerate (`SIGNALSMITH_USE_ACCELERATE`) or IPP (`SIGNALSMITH_USE_IPP`). ## License @@ -136,14 +157,18 @@ ## Other environments / languages -There's a Web Audio wrapper in `web/` (using WASM/AudioWorklet). This will remain in-sync with the C++ library. +There's a Web Audio release in [`web/`](web/) (WASM/AudioWorklet), also available on [NPM](npmjs.com/package/signalsmith-stretch). This has its own (higher-level) API, but the stretching algorithm will remain in-sync with the C++ library. + +There's a [Python binding](https://pypi.org/project/python-stretch/) written/published by [Gregorio Andrea Giudici](https://github.com/gregogiudici/python-stretch). This is used as the default pitch/time method in [Audiomentations](https://iver56.github.io/audiomentations/). -There's a [Python binding](https://pypi.org/project/python-stretch/) written/published by [Gregorio Andrea Giudici](https://github.com/gregogiudici/python-stretch), and a [Rust wrapper](https://crates.io/crates/signalsmith-stretch) by [Colin Marc](https://github.com/colinmarc/signalsmith-stretch-rs). +There's a [Rust wrapper](https://crates.io/crates/signalsmith-stretch) by [Colin Marc](https://github.com/colinmarc/signalsmith-stretch-rs). ## Thanks We'd like to particularly thank the following people who sponsored specific features or improvements: +* **Future Audio Workshop**: chunked-computation mode +* **Jochem de Jong**: formant shifting/compensation * **Metronaut**: web audio (JS/WASM) release * **Daniel L Bowling** and the Stanford School of Medicine: web audio improvements Modified: branches/OpenMPT-1.32/include/SignalsmithStretch/SignalsmithStretch/signalsmith-stretch.h ============================================================================== --- branches/OpenMPT-1.32/include/SignalsmithStretch/SignalsmithStretch/signalsmith-stretch.h Sun May 25 14:04:10 2025 (r23191) +++ branches/OpenMPT-1.32/include/SignalsmithStretch/SignalsmithStretch/signalsmith-stretch.h Sun May 25 14:04:43 2025 (r23192) @@ -1,63 +1,89 @@ #ifndef SIGNALSMITH_STRETCH_H #define SIGNALSMITH_STRETCH_H -#include "dsp/spectral.h" -#include "dsp/delay.h" -#include "dsp/perf.h" -SIGNALSMITH_DSP_VERSION_CHECK(1, 6, 0); // Check version is compatible +#include "signalsmith-linear/stft.h" // https://github.com/Signalsmith-Audio/linear + #include <vector> +#include <array> #include <algorithm> #include <functional> #include <random> +#include <limits> +#include <type_traits> namespace signalsmith { namespace stretch { -template<typename Sample=float, class RandomEngine=std::default_random_engine> +namespace _impl { + template<bool conjugateSecond=false, typename V> + static std::complex<V> mul(const std::complex<V> &a, const std::complex<V> &b) { + return conjugateSecond ? std::complex<V>{ + b.real()*a.real() + b.imag()*a.imag(), + b.real()*a.imag() - b.imag()*a.real() + } : std::complex<V>{ + a.real()*b.real() - a.imag()*b.imag(), + a.real()*b.imag() + a.imag()*b.real() + }; + } + template<typename V> + static V norm(const std::complex<V> &a) { + V r = a.real(), i = a.imag(); + return r*r + i*i; + } +} + +template<typename Sample=float, class RandomEngine=void> struct SignalsmithStretch { - static constexpr size_t version[3] = {1, 1, 1}; + static constexpr size_t version[3] = {1, 3, 1}; SignalsmithStretch() : randomEngine(std::random_device{}()) {} SignalsmithStretch(long seed) : randomEngine(seed) {} - + int blockSamples() const { - return stft.windowSize(); + return int(stft.blockSamples()); } int intervalSamples() const { - return stft.interval(); + return int(stft.defaultInterval()); } int inputLatency() const { - return stft.windowSize()/2; + return int(stft.analysisLatency()); } int outputLatency() const { - return stft.windowSize() - inputLatency(); + return int(stft.synthesisLatency() + _splitComputation*stft.defaultInterval()); } void reset() { - stft.reset(); - inputBuffer.reset(); + stft.reset(0.1); + stashedInput = stft.input; + stashedOutput = stft.output; + prevInputOffset = -1; channelBands.assign(channelBands.size(), Band()); silenceCounter = 0; didSeek = false; - flushed = true; + blockProcess = {}; + freqEstimateWeighted = freqEstimateWeight = 0; } // Configures using a default preset - void presetDefault(int nChannels, Sample sampleRate) { - configure(nChannels, sampleRate*0.12, sampleRate*0.03); + void presetDefault(int nChannels, Sample sampleRate, bool splitComputation=false) { + configure(nChannels, sampleRate*0.12, sampleRate*0.03, splitComputation); } - void presetCheaper(int nChannels, Sample sampleRate) { - configure(nChannels, sampleRate*0.1, sampleRate*0.04); + void presetCheaper(int nChannels, Sample sampleRate, bool splitComputation=true) { + configure(nChannels, sampleRate*0.1, sampleRate*0.04, splitComputation); } // Manual setup - void configure(int nChannels, int blockSamples, int intervalSamples) { + void configure(int nChannels, int blockSamples, int intervalSamples, bool splitComputation=false) { + _splitComputation = splitComputation; channels = nChannels; - stft.setWindow(stft.kaiser, true); - stft.resize(channels, blockSamples, intervalSamples); - bands = stft.bands(); - inputBuffer.resize(channels, blockSamples + intervalSamples + 1); - timeBuffer.assign(stft.fftSize(), 0); + stft.configure(channels, channels, blockSamples, intervalSamples + 1); + stft.setInterval(intervalSamples, stft.kaiser); + stft.reset(0.1); + stashedInput = stft.input; + stashedOutput = stft.output; + tmpBuffer.resize(blockSamples + intervalSamples); + + bands = int(stft.bands()); channelBands.assign(bands*channels, Band()); peaks.reserve(bands/2); @@ -65,6 +91,9 @@ smoothedEnergy.resize(bands); outputMap.resize(bands); channelPredictions.resize(channels*bands); + + blockProcess = {}; + formantMetric.resize(bands + 2); } /// Frequency multiplier, and optional tonality limit (as multiple of sample-rate) @@ -79,39 +108,76 @@ } void setTransposeSemitones(Sample semitones, Sample tonalityLimit=0) { setTransposeFactor(std::pow(2, semitones/12), tonalityLimit); - customFreqMap = nullptr; } // Sets a custom frequency map - should be monotonically increasing void setFreqMap(std::function<Sample(Sample)> inputToOutput) { customFreqMap = inputToOutput; } + void setFormantFactor(Sample multiplier, bool compensatePitch=false) { + formantMultiplier = multiplier; + invFormantMultiplier = 1/multiplier; + formantCompensation = compensatePitch; + } + void setFormantSemitones(Sample semitones, bool compensatePitch=false) { + setFormantFactor(std::pow(2, semitones/12), compensatePitch); + } + // Rough guesstimate of the fundamental frequency, used for formant analysis. 0 means attempting to detect the pitch + void setFormantBase(Sample baseFreq=0) { + formantBaseFreq = baseFreq; + } + // Provide previous input ("pre-roll"), without affecting the speed calculation. You should ideally feed it one block-length + one interval template<class Inputs> void seek(Inputs &&inputs, int inputSamples, double playbackRate) { - inputBuffer.reset(); + tmpBuffer.resize(0); + tmpBuffer.resize(stft.blockSamples() + stft.defaultInterval()); + + int startIndex = std::max<int>(0, inputSamples - int(tmpBuffer.size())); // start position in input + int padStart = int(tmpBuffer.size() + startIndex) - inputSamples; // start position in tmpBuffer + Sample totalEnergy = 0; for (int c = 0; c < channels; ++c) { auto &&inputChannel = inputs[c]; - auto &&bufferChannel = inputBuffer[c]; - int startIndex = std::max<int>(0, inputSamples - stft.windowSize() - stft.interval()); for (int i = startIndex; i < inputSamples; ++i) { Sample s = inputChannel[i]; totalEnergy += s*s; - bufferChannel[i] = s; + tmpBuffer[i - startIndex + padStart] = s; } + + stft.writeInput(c, tmpBuffer.size(), tmpBuffer.data()); } + stft.moveInput(tmpBuffer.size()); if (totalEnergy >= noiseFloor) { silenceCounter = 0; silenceFirst = true; } - inputBuffer += inputSamples; didSeek = true; - seekTimeFactor = (playbackRate*stft.interval() > 1) ? 1/playbackRate : stft.interval(); + seekTimeFactor = (playbackRate*stft.defaultInterval() > 1) ? 1/playbackRate : stft.defaultInterval(); } - + template<class Inputs, class Outputs> void process(Inputs &&inputs, int inputSamples, Outputs &&outputs, int outputSamples) { +#ifdef SIGNALSMITH_STRETCH_PROFILE_PROCESS_START + SIGNALSMITH_STRETCH_PROFILE_PROCESS_START(inputSamples, outputSamples); +#endif + int prevCopiedInput = 0; + auto copyInput = [&](int toIndex){ + + int length = std::min<int>(int(stft.blockSamples() + stft.defaultInterval()), toIndex - prevCopiedInput); + tmpBuffer.resize(length); + int offset = toIndex - length; + for (int c = 0; c < channels; ++c) { + auto &&inputBuffer = inputs[c]; + for (int i = 0; i < length; ++i) { + tmpBuffer[i] = inputBuffer[i + offset]; + } + stft.writeInput(c, length, tmpBuffer.data()); + } + stft.moveInput(length); + prevCopiedInput = toIndex; + }; + Sample totalEnergy = 0; for (int c = 0; c < channels; ++c) { auto &&inputChannel = inputs[c]; @@ -120,10 +186,13 @@ totalEnergy += s*s; } } + if (totalEnergy < noiseFloor) { - if (silenceCounter >= 2*stft.windowSize()) { - if (silenceFirst) { + if (silenceCounter >= 2*stft.blockSamples()) { + if (silenceFirst) { // first block of silence processing silenceFirst = false; + //stft.reset(); + blockProcess = {}; for (auto &b : channelBands) { b.input = b.prevInput = b.output = 0; b.inputEnergy = 0; @@ -148,15 +217,7 @@ } // Store input in history buffer - for (int c = 0; c < channels; ++c) { - auto &&inputChannel = inputs[c]; - auto &&bufferChannel = inputBuffer[c]; - int startIndex = std::max<int>(0, inputSamples - stft.windowSize() - stft.interval()); - for (int i = startIndex; i < inputSamples; ++i) { - bufferChannel[i] = inputChannel[i]; - } - } - inputBuffer += inputSamples; + copyInput(inputSamples); return; } else { silenceCounter += inputSamples; @@ -165,119 +226,174 @@ silenceCounter = 0; silenceFirst = true; } - + for (int outputIndex = 0; outputIndex < outputSamples; ++outputIndex) { - stft.ensureValid(outputIndex, [&](int outputOffset) { + bool newBlock = blockProcess.samplesSinceLast >= stft.defaultInterval(); + if (newBlock) { + blockProcess.step = 0; + blockProcess.steps = 0; // how many processing steps this block will have + blockProcess.samplesSinceLast = 0; + // Time to process a spectrum! Where should it come from in the input? - int inputOffset = std::round(outputOffset*Sample(inputSamples)/outputSamples) - stft.windowSize(); + int inputOffset = std::round(outputIndex*Sample(inputSamples)/outputSamples); int inputInterval = inputOffset - prevInputOffset; prevInputOffset = inputOffset; + + copyInput(inputOffset); + stashedInput = stft.input; // save the input state, since that's what we'll analyse later + if (_splitComputation) { + stashedOutput = stft.output; // save the current output, and read from it + stft.moveOutput(stft.defaultInterval()); // the actual input jumps forward in time by one interval, ready for the synthesis + } + + blockProcess.newSpectrum = didSeek || (inputInterval > 0); + blockProcess.mappedFrequencies = customFreqMap || freqMultiplier != 1; + if (blockProcess.newSpectrum) { + // make sure the previous input is the correct distance in the past (give or take 1 sample) + blockProcess.reanalysePrev = didSeek || std::abs(inputInterval - int(stft.defaultInterval())) > 1; + if (blockProcess.reanalysePrev) blockProcess.steps += stft.analyseSteps() + 1; - bool newSpectrum = didSeek || (inputInterval > 0); - if (newSpectrum) { - for (int c = 0; c < channels; ++c) { - // Copy from the history buffer, if needed - auto &&bufferChannel = inputBuffer[c]; - for (int i = 0; i < -inputOffset; ++i) { - timeBuffer[i] = bufferChannel[i + inputOffset]; + // analyse a new input + blockProcess.steps += stft.analyseSteps() + 1; + } + + blockProcess.processFormants = formantMultiplier != 1 || (formantCompensation && blockProcess.mappedFrequencies); + + blockProcess.timeFactor = didSeek ? seekTimeFactor : stft.defaultInterval()/std::max<Sample>(1, inputInterval); + didSeek = false; + + updateProcessSpectrumSteps(); + blockProcess.steps += processSpectrumSteps; + + blockProcess.steps += stft.synthesiseSteps() + 1; + } + + size_t processToStep = newBlock ? blockProcess.steps : 0; + if (_splitComputation) { + Sample processRatio = Sample(blockProcess.samplesSinceLast + 1)/stft.defaultInterval(); + processToStep = std::min<size_t>(blockProcess.steps, (blockProcess.steps + 0.999f)*processRatio); + } + + while (blockProcess.step < processToStep) { + size_t step = blockProcess.step++; +#ifdef SIGNALSMITH_STRETCH_PROFILE_PROCESS_STEP + SIGNALSMITH_STRETCH_PROFILE_PROCESS_STEP(step, blockProcess.steps); +#endif + if (blockProcess.newSpectrum) { + if (blockProcess.reanalysePrev) { + // analyse past input + if (step < stft.analyseSteps()) { + stashedInput.swap(stft.input); + stft.analyseStep(step, stft.defaultInterval()); + stashedInput.swap(stft.input); + continue; } - // Copy the rest from the input - auto &&inputChannel = inputs[c]; - for (int i = std::max<int>(0, -inputOffset); i < stft.windowSize(); ++i) { - timeBuffer[i] = inputChannel[i + inputOffset]; + step -= stft.analyseSteps(); + if (step < 1) { + // Copy previous analysis to our band objects + for (int c = 0; c < channels; ++c) { + auto channelBands = bandsForChannel(c); + auto *spectrumBands = stft.spectrum(c); + for (int b = 0; b < bands; ++b) { + channelBands[b].prevInput = spectrumBands[b]; + } + } + continue; } - stft.analyse(c, timeBuffer); + step -= 1; } - flushed = false; // TODO: first block after a flush should be gain-compensated - for (int c = 0; c < channels; ++c) { - auto channelBands = bandsForChannel(c); - auto &&spectrumBands = stft.spectrum[c]; - for (int b = 0; b < bands; ++b) { - channelBands[b].input = spectrumBands[b]; - } + // Analyse latest (stashed) input + if (step < stft.analyseSteps()) { + stashedInput.swap(stft.input); + stft.analyseStep(step); + stashedInput.swap(stft.input); + continue; } - - if (didSeek || inputInterval != stft.interval()) { // make sure the previous input is the correct distance in the past - int prevIntervalOffset = inputOffset - stft.interval(); - for (int c = 0; c < channels; ++c) { - // Copy from the history buffer, if needed - auto &&bufferChannel = inputBuffer[c]; - for (int i = 0; i < std::min(-prevIntervalOffset, stft.windowSize()); ++i) { - timeBuffer[i] = bufferChannel[i + prevIntervalOffset]; - } - // Copy the rest from the input - auto &&inputChannel = inputs[c]; - for (int i = std::max<int>(0, -prevIntervalOffset); i < stft.windowSize(); ++i) { - timeBuffer[i] = inputChannel[i + prevIntervalOffset]; - } - stft.analyse(c, timeBuffer); - } + step -= stft.analyseSteps(); + if (step < 1) { + // Copy analysed spectrum into our band objects for (int c = 0; c < channels; ++c) { auto channelBands = bandsForChannel(c); - auto &&spectrumBands = stft.spectrum[c]; + auto *spectrumBands = stft.spectrum(c); for (int b = 0; b < bands; ++b) { - channelBands[b].prevInput = spectrumBands[b]; + channelBands[b].input = spectrumBands[b]; } } + continue; } + step -= 1; } - - Sample timeFactor = didSeek ? seekTimeFactor : stft.interval()/std::max<Sample>(1, inputInterval); - processSpectrum(newSpectrum, timeFactor); - didSeek = false; - for (int c = 0; c < channels; ++c) { - auto channelBands = bandsForChannel(c); - auto &&spectrumBands = stft.spectrum[c]; - for (int b = 0; b < bands; ++b) { - spectrumBands[b] = channelBands[b].output; + if (step < processSpectrumSteps) { + processSpectrum(step); + continue; + } + step -= processSpectrumSteps; + + if (step < 1) { + // Copy band objects into spectrum + for (int c = 0; c < channels; ++c) { + auto channelBands = bandsForChannel(c); + auto *spectrumBands = stft.spectrum(c); + for (int b = 0; b < bands; ++b) { + spectrumBands[b] = channelBands[b].output; + } } + continue; } - }); + step -= 1; + + if (step < stft.synthesiseSteps()) { + stft.synthesiseStep(step); + continue; + } + } +#ifdef SIGNALSMITH_STRETCH_PROFILE_PROCESS_ENDSTEP + SIGNALSMITH_STRETCH_PROFILE_PROCESS_ENDSTEP(); +#endif + ++blockProcess.samplesSinceLast; + if (_splitComputation) stashedOutput.swap(stft.output); for (int c = 0; c < channels; ++c) { auto &&outputChannel = outputs[c]; - auto &&stftChannel = stft[c]; - outputChannel[outputIndex] = stftChannel[outputIndex]; + Sample v = 0; + stft.readOutput(c, 1, &v); + outputChannel[outputIndex] = v; } + stft.moveOutput(1); + if (_splitComputation) stashedOutput.swap(stft.output); } - - // Store input in history buffer - for (int c = 0; c < channels; ++c) { - auto &&inputChannel = inputs[c]; - auto &&bufferChannel = inputBuffer[c]; - int startIndex = std::max<int>(0, inputSamples - stft.windowSize()); - for (int i = startIndex; i < inputSamples; ++i) { - bufferChannel[i] = inputChannel[i]; - } - } - inputBuffer += inputSamples; - stft += outputSamples; + + copyInput(inputSamples); prevInputOffset -= inputSamples; +#ifdef SIGNALSMITH_STRETCH_PROFILE_PROCESS_END + SIGNALSMITH_STRETCH_PROFILE_PROCESS_END(); +#endif } // Read the remaining output, providing no further input. `outputSamples` should ideally be at least `.outputLatency()` template<class Outputs> void flush(Outputs &&outputs, int outputSamples) { - int plainOutput = std::min<int>(outputSamples, stft.windowSize()); - int foldedBackOutput = std::min<int>(outputSamples, stft.windowSize() - plainOutput); + int plainOutput = std::min<int>(outputSamples, int(stft.blockSamples())); + int foldedBackOutput = std::min<int>(outputSamples, int(stft.blockSamples()) - plainOutput); + stft.finishOutput(1); for (int c = 0; c < channels; ++c) { + tmpBuffer.resize(plainOutput); + stft.readOutput(c, plainOutput, tmpBuffer.data()); auto &&outputChannel = outputs[c]; - auto &&stftChannel = stft[c]; for (int i = 0; i < plainOutput; ++i) { // TODO: plain output should be gain- - outputChannel[i] = stftChannel[i]; + outputChannel[i] = tmpBuffer[i]; } + tmpBuffer.resize(foldedBackOutput); + stft.readOutput(c, plainOutput, foldedBackOutput, tmpBuffer.data()); for (int i = 0; i < foldedBackOutput; ++i) { - outputChannel[outputSamples - 1 - i] -= stftChannel[plainOutput + i]; - } - for (int i = 0; i < plainOutput + foldedBackOutput; ++i) { - stftChannel[i] = 0; + outputChannel[outputSamples - 1 - i] -= tmpBuffer[i]; } } - // Skip the output we just used/cleared - stft += plainOutput + foldedBackOutput; + stft.reset(0.1); + // Reset the phase-vocoder stuff, so the next block gets a fresh start for (int c = 0; c < channels; ++c) { auto channelBands = bandsForChannel(c); @@ -285,31 +401,50 @@ channelBands[b].prevInput = channelBands[b].output = 0; } } - flushed = true; } private: + bool _splitComputation = false; + struct { + size_t samplesSinceLast = std::numeric_limits<size_t>::max(); + size_t steps = 0; + size_t step = 0; + + bool newSpectrum = false; + bool reanalysePrev = false; + bool mappedFrequencies = false; + bool processFormants = false; + Sample timeFactor; + } blockProcess; + using Complex = std::complex<Sample>; static constexpr Sample noiseFloor{1e-15}; static constexpr Sample maxCleanStretch{2}; // time-stretch ratio before we start randomising phases - int silenceCounter = 0; + size_t silenceCounter = 0; bool silenceFirst = true; Sample freqMultiplier = 1, freqTonalityLimit = 0.5; std::function<Sample(Sample)> customFreqMap = nullptr; + + bool formantCompensation = false; // compensate for pitch/freq change + Sample formantMultiplier = 1, invFormantMultiplier = 1; + + using STFT = signalsmith::linear::DynamicSTFT<Sample, false, true>; + STFT stft; + typename STFT::Input stashedInput; + typename STFT::Output stashedOutput; + + std::vector<Sample> tmpBuffer; - signalsmith::spectral::STFT<Sample> stft{0, 1, 1}; - signalsmith::delay::MultiBuffer<Sample> inputBuffer; int channels = 0, bands = 0; int prevInputOffset = -1; - std::vector<Sample> timeBuffer; - bool didSeek = false, flushed = true; + bool didSeek = false; Sample seekTimeFactor = 1; Sample bandToFreq(Sample b) const { - return (b + Sample(0.5))/stft.fftSize(); + return stft.binToFreq(b); } Sample freqToBand(Sample f) const { - return f*stft.fftSize() - Sample(0.5); + return stft.freqToBin(f); } struct Band { @@ -371,10 +506,10 @@ Complex input; Complex makeOutput(Complex phase) { - Sample phaseNorm = std::norm(phase); + Sample phaseNorm = _impl::norm(phase); if (phaseNorm <= noiseFloor) { phase = input; // prediction is too weak, fall back to the input - phaseNorm = std::norm(input) + noiseFloor; + phaseNorm = _impl::norm(input) + noiseFloor; } return phase*std::sqrt(energy/phaseNorm); } @@ -384,49 +519,95 @@ return channelPredictions.data() + c*bands; } - RandomEngine randomEngine; - - void processSpectrum(bool newSpectrum, Sample timeFactor) { + // If RandomEngine=void, use std::default_random_engine; + using RandomEngineImpl = typename std::conditional< + std::is_void<RandomEngine>::value, + std::default_random_engine, + RandomEngine + >::type; + RandomEngineImpl randomEngine; + + size_t processSpectrumSteps = 0; + static constexpr size_t splitMainPrediction = 8; // it's just heavy, since we're blending up to 4 different phase predictions + void updateProcessSpectrumSteps() { + processSpectrumSteps = 0; + if (blockProcess.newSpectrum) processSpectrumSteps += channels; + if (blockProcess.mappedFrequencies) { + processSpectrumSteps += smoothEnergySteps; + processSpectrumSteps += 1; // findPeaks + } + processSpectrumSteps += 1; // updating the output map + processSpectrumSteps += channels; // preliminary phase-vocoder prediction + processSpectrumSteps += splitMainPrediction; + if (blockProcess.newSpectrum) processSpectrumSteps += 1; // .input -> .prevInput + if (blockProcess.processFormants) processSpectrumSteps += 3; + } + void processSpectrum(size_t step) { + Sample timeFactor = blockProcess.timeFactor; + + Sample smoothingBins = Sample(stft.fftSamples())/stft.defaultInterval(); + int longVerticalStep = std::round(smoothingBins); timeFactor = std::max<Sample>(timeFactor, 1/maxCleanStretch); bool randomTimeFactor = (timeFactor > maxCleanStretch); std::uniform_real_distribution<Sample> timeFactorDist(maxCleanStretch*2*randomTimeFactor - timeFactor, timeFactor); - - if (newSpectrum) { - for (int c = 0; c < channels; ++c) { - auto bins = bandsForChannel(c); - Complex rot = std::polar(Sample(1), bandToFreq(0)*stft.interval()*Sample(2*M_PI)); + if (blockProcess.newSpectrum) { + if (step < size_t(channels)) { + int channel = int(step); + auto bins = bandsForChannel(channel); + + Complex rot = std::polar(Sample(1), bandToFreq(0)*stft.defaultInterval()*Sample(2*M_PI)); Sample freqStep = bandToFreq(1) - bandToFreq(0); - Complex rotStep = std::polar(Sample(1), freqStep*stft.interval()*Sample(2*M_PI)); - + Complex rotStep = std::polar(Sample(1), freqStep*stft.defaultInterval()*Sample(2*M_PI)); + for (int b = 0; b < bands; ++b) { auto &bin = bins[b]; - bin.output = signalsmith::perf::mul(bin.output, rot); - bin.prevInput = signalsmith::perf::mul(bin.prevInput, rot); - rot = signalsmith::perf::mul(rot, rotStep); + bin.output = _impl::mul(bin.output, rot); + bin.prevInput = _impl::mul(bin.prevInput, rot); + rot = _impl::mul(rot, rotStep); } + return; } + step -= channels; } + if (blockProcess.mappedFrequencies) { + if (step < smoothEnergySteps) { + smoothEnergy(step, smoothingBins); + return; + } + step -= smoothEnergySteps; + if (step-- == 0) { + findPeaks(); + return; + } + } + if (step-- == 0) { + if (blockProcess.mappedFrequencies) { + updateOutputMap(); + } else { // we're not pitch-shifting, so no need to find peaks etc. + for (int c = 0; c < channels; ++c) { + Band *bins = bandsForChannel(c); + for (int b = 0; b < bands; ++b) { + bins[b].inputEnergy = _impl::norm(bins[b].input); + } + } - Sample smoothingBins = Sample(stft.fftSize())/stft.interval(); - int longVerticalStep = std::round(smoothingBins); - if (customFreqMap || freqMultiplier != 1) { - findPeaks(smoothingBins); - updateOutputMap(); - } else { // we're not pitch-shifting, so no need to find peaks etc. - for (int c = 0; c < channels; ++c) { - Band *bins = bandsForChannel(c); for (int b = 0; b < bands; ++b) { - bins[b].inputEnergy = std::norm(bins[b].input); + outputMap[b] = {Sample(b), 1}; } } - for (int b = 0; b < bands; ++b) { - outputMap[b] = {Sample(b), 1}; + return; + } + if (blockProcess.processFormants) { + if (step < 3) { + updateFormants(step); + return; } + step -= 3; } - // Preliminary output prediction from phase-vocoder - for (int c = 0; c < channels; ++c) { + if (step < size_t(channels)) { + int c = int(step); Band *bins = bandsForChannel(c); auto *predictions = predictionsForChannel(c); for (int b = 0; b < bands; ++b) { @@ -442,137 +623,152 @@ auto &outputBin = bins[b]; Complex prevInput = getFractional<&Band::prevInput>(c, lowIndex, fracIndex); - Complex freqTwist = signalsmith::perf::mul<true>(prediction.input, prevInput); - Complex phase = signalsmith::perf::mul(outputBin.output, freqTwist); + Complex freqTwist = _impl::mul<true>(prediction.input, prevInput); + Complex phase = _impl::mul(outputBin.output, freqTwist); outputBin.output = phase/(std::max(prevEnergy, prediction.energy) + noiseFloor); } + return; } + step -= channels; - // Re-predict using phase differences between frequencies - for (int b = 0; b < bands; ++b) { - // Find maximum-energy channel and calculate that - int maxChannel = 0; - Sample maxEnergy = predictionsForChannel(0)[b].energy; - for (int c = 1; c < channels; ++c) { - Sample e = predictionsForChannel(c)[b].energy; - if (e > maxEnergy) { - maxChannel = c; - maxEnergy = e; + if (step < splitMainPrediction) { + // Re-predict using phase differences between frequencies + size_t chunk = step; + int startB = int(bands*chunk/splitMainPrediction); + int endB = int(bands*(chunk + 1)/splitMainPrediction); + for (int b = startB; b < endB; ++b) { + // Find maximum-energy channel and calculate that + int maxChannel = 0; + Sample maxEnergy = predictionsForChannel(0)[b].energy; + for (int c = 1; c < channels; ++c) { + Sample e = predictionsForChannel(c)[b].energy; + if (e > maxEnergy) { + maxChannel = c; + maxEnergy = e; + } } - } - auto *predictions = predictionsForChannel(maxChannel); - auto &prediction = predictions[b]; - auto *bins = bandsForChannel(maxChannel); - auto &outputBin = bins[b]; - - Complex phase = 0; - auto mapPoint = outputMap[b]; - - // Upwards vertical steps - if (b > 0) { - Sample binTimeFactor = randomTimeFactor ? timeFactorDist(randomEngine) : timeFactor; - Complex downInput = getFractional<&Band::input>(maxChannel, mapPoint.inputBin - binTimeFactor); - Complex shortVerticalTwist = signalsmith::perf::mul<true>(prediction.input, downInput); + auto *predictions = predictionsForChannel(maxChannel); + auto &prediction = predictions[b]; + auto *bins = bandsForChannel(maxChannel); + auto &outputBin = bins[b]; - auto &downBin = bins[b - 1]; - phase += signalsmith::perf::mul(downBin.output, shortVerticalTwist); - - if (b >= longVerticalStep) { - Complex longDownInput = getFractional<&Band::input>(maxChannel, mapPoint.inputBin - longVerticalStep*binTimeFactor); - Complex longVerticalTwist = signalsmith::perf::mul<true>(prediction.input, longDownInput); + Complex phase = 0; + auto mapPoint = outputMap[b]; - auto &longDownBin = bins[b - longVerticalStep]; - phase += signalsmith::perf::mul(longDownBin.output, longVerticalTwist); - } - } - // Downwards vertical steps - if (b < bands - 1) { - auto &upPrediction = predictions[b + 1]; - auto &upMapPoint = outputMap[b + 1]; + // Upwards vertical steps + if (b > 0) { + Sample binTimeFactor = randomTimeFactor ? timeFactorDist(randomEngine) : timeFactor; + Complex downInput = getFractional<&Band::input>(maxChannel, mapPoint.inputBin - binTimeFactor); + Complex shortVerticalTwist = _impl::mul<true>(prediction.input, downInput); - Sample binTimeFactor = randomTimeFactor ? timeFactorDist(randomEngine) : timeFactor; - Complex downInput = getFractional<&Band::input>(maxChannel, upMapPoint.inputBin - binTimeFactor); - Complex shortVerticalTwist = signalsmith::perf::mul<true>(upPrediction.input, downInput); + auto &downBin = bins[b - 1]; + phase += _impl::mul(downBin.output, shortVerticalTwist); + + if (b >= longVerticalStep) { + Complex longDownInput = getFractional<&Band::input>(maxChannel, mapPoint.inputBin - longVerticalStep*binTimeFactor); + Complex longVerticalTwist = _impl::mul<true>(prediction.input, longDownInput); - auto &upBin = bins[b + 1]; - phase += signalsmith::perf::mul<true>(upBin.output, shortVerticalTwist); - - if (b < bands - longVerticalStep) { - auto &longUpPrediction = predictions[b + longVerticalStep]; - auto &longUpMapPoint = outputMap[b + longVerticalStep]; + auto &longDownBin = bins[b - longVerticalStep]; + phase += _impl::mul(longDownBin.output, longVerticalTwist); + } + } + // Downwards vertical steps + if (b < bands - 1) { + auto &upPrediction = predictions[b + 1]; + auto &upMapPoint = outputMap[b + 1]; + + Sample binTimeFactor = randomTimeFactor ? timeFactorDist(randomEngine) : timeFactor; + Complex downInput = getFractional<&Band::input>(maxChannel, upMapPoint.inputBin - binTimeFactor); + Complex shortVerticalTwist = _impl::mul<true>(upPrediction.input, downInput); + + auto &upBin = bins[b + 1]; + phase += _impl::mul<true>(upBin.output, shortVerticalTwist); + + if (b < bands - longVerticalStep) { + auto &longUpPrediction = predictions[b + longVerticalStep]; + auto &longUpMapPoint = outputMap[b + longVerticalStep]; - Complex longDownInput = getFractional<&Band::input>(maxChannel, longUpMapPoint.inputBin - longVerticalStep*binTimeFactor); - Complex longVerticalTwist = signalsmith::perf::mul<true>(longUpPrediction.input, longDownInput); + Complex longDownInput = getFractional<&Band::input>(maxChannel, longUpMapPoint.inputBin - longVerticalStep*binTimeFactor); + Complex longVerticalTwist = _impl::mul<true>(longUpPrediction.input, longDownInput); - auto &longUpBin = bins[b + longVerticalStep]; - phase += signalsmith::perf::mul<true>(longUpBin.output, longVerticalTwist); + auto &longUpBin = bins[b + longVerticalStep]; + phase += _impl::mul<true>(longUpBin.output, longVerticalTwist); + } } - } - outputBin.output = prediction.makeOutput(phase); - - // All other bins are locked in phase - for (int c = 0; c < channels; ++c) { - if (c != maxChannel) { - auto &channelBin = bandsForChannel(c)[b]; - auto &channelPrediction = predictionsForChannel(c)[b]; - - Complex channelTwist = signalsmith::perf::mul<true>(channelPrediction.input, prediction.input); - Complex channelPhase = signalsmith::perf::mul(outputBin.output, channelTwist); - channelBin.output = channelPrediction.makeOutput(channelPhase); + outputBin.output = prediction.makeOutput(phase); + + // All other bins are locked in phase + for (int c = 0; c < channels; ++c) { + if (c != maxChannel) { + auto &channelBin = bandsForChannel(c)[b]; + auto &channelPrediction = predictionsForChannel(c)[b]; + + Complex channelTwist = _impl::mul<true>(channelPrediction.input, prediction.input); + Complex channelPhase = _impl::mul(outputBin.output, channelTwist); + channelBin.output = channelPrediction.makeOutput(channelPhase); + } } } + return; } + step -= splitMainPrediction; - if (newSpectrum) { - for (auto &bin : channelBands) { - bin.prevInput = bin.input; + if (blockProcess.newSpectrum) { + if (step-- == 0) { + for (auto &bin : channelBands) { + bin.prevInput = bin.input; + } } } } // Produces smoothed energy across all channels - void smoothEnergy(Sample smoothingBins) { + static constexpr size_t smoothEnergySteps = 3; + Sample smoothEnergyState = 0; + void smoothEnergy(size_t step, Sample smoothingBins) { Sample smoothingSlew = 1/(1 + smoothingBins*Sample(0.5)); - for (auto &e : energy) e = 0; - for (int c = 0; c < channels; ++c) { - Band *bins = bandsForChannel(c); + if (step-- == 0) { + for (auto &e : energy) e = 0; + for (int c = 0; c < channels; ++c) { + Band *bins = bandsForChannel(c); + for (int b = 0; b < bands; ++b) { + Sample e = _impl::norm(bins[b].input); + bins[b].inputEnergy = e; // Used for interpolating prediction energy + energy[b] += e; + } + } for (int b = 0; b < bands; ++b) { - Sample e = std::norm(bins[b].input); - bins[b].inputEnergy = e; // Used for interpolating prediction energy - energy[b] += e; + smoothedEnergy[b] = energy[b]; } + smoothEnergyState = 0; + return; } - for (int b = 0; b < bands; ++b) { - smoothedEnergy[b] = energy[b]; + + // The two other steps are repeated smoothing passes, down and up + Sample e = smoothEnergyState; + for (int b = bands - 1; b >= 0; --b) { + e += (smoothedEnergy[b] - e)*smoothingSlew; + smoothedEnergy[b] = e; } - Sample e = 0; - for (int repeat = 0; repeat < 2; ++repeat) { - for (int b = bands - 1; b >= 0; --b) { - e += (smoothedEnergy[b] - e)*smoothingSlew; - smoothedEnergy[b] = e; - } - for (int b = 0; b < bands; ++b) { - e += (smoothedEnergy[b] - e)*smoothingSlew; - smoothedEnergy[b] = e; - } + for (int b = 0; b < bands; ++b) { + e += (smoothedEnergy[b] - e)*smoothingSlew; + smoothedEnergy[b] = e; } + smoothEnergyState = e; } Sample mapFreq(Sample freq) const { if (customFreqMap) return customFreqMap(freq); if (freq > freqTonalityLimit) { - Sample diff = freq - freqTonalityLimit; - return freqTonalityLimit*freqMultiplier + diff; + return freq + (freqMultiplier - 1)*freqTonalityLimit; } return freq*freqMultiplier; } // Identifies spectral peaks using energy across all channels - void findPeaks(Sample smoothingBins) { - smoothEnergy(smoothingBins); - + void findPeaks() { peaks.resize(0); int start = 0; @@ -631,6 +827,118 @@ outputMap[b] = {b + topOffset, 1}; } } + + // If we mapped formants the same way as mapFreq(), this would be the inverse + Sample invMapFormant(Sample freq) const { + if (freq*invFormantMultiplier > freqTonalityLimit) { + return freq + (1 - formantMultiplier)*freqTonalityLimit; + } + return freq*invFormantMultiplier; + } + + Sample freqEstimateWeighted = 0; + Sample freqEstimateWeight = 0; + Sample estimateFrequency() { + // 3 highest peaks in the input + std::array<int, 3> peakIndices{0, 0, 0}; + for (int b = 1; b < bands - 1; ++b) { + Sample e = formantMetric[b]; + // local maxima only + if (e < formantMetric[b - 1] || e <= formantMetric[b + 1]) continue; + + if (e > formantMetric[peakIndices[0]]) { + if (e > formantMetric[peakIndices[1]]) { + if (e > formantMetric[peakIndices[2]]) { + peakIndices = {peakIndices[1], peakIndices[2], b}; + } else { + peakIndices = {peakIndices[1], b, peakIndices[2]}; + } + } else { + peakIndices[0] = b; + } + } + } + + // VERY rough pitch estimation + int peakEstimate = peakIndices[2]; + if (formantMetric[peakIndices[1]] > formantMetric[peakIndices[2]]*0.1) { + int diff = std::abs(peakEstimate - peakIndices[1]); + if (diff > peakEstimate/8 && diff < peakEstimate*7/8) peakEstimate = peakEstimate%diff; + if (formantMetric[peakIndices[0]] > formantMetric[peakIndices[2]]*0.01) { + int diff = std::abs(peakEstimate - peakIndices[0]); + if (diff > peakEstimate/8 && diff < peakEstimate*7/8) peakEstimate = peakEstimate%diff; + } + } + Sample weight = formantMetric[peakIndices[2]]; + // Smooth it out a bit + freqEstimateWeighted += (peakEstimate*weight - freqEstimateWeighted)*0.25; + freqEstimateWeight += (weight - freqEstimateWeight)*0.25; + + return freqEstimateWeighted/(freqEstimateWeight + Sample(1e-30)); + } + + Sample freqEstimate; + + std::vector<Sample> formantMetric; + Sample formantBaseFreq = 0; + void updateFormants(size_t step) { + if (step-- == 0) { + for (auto &e : formantMetric) e = 0; + for (int c = 0; c < channels; ++c) { + Band *bins = bandsForChannel(c); + for (int b = 0; b < bands; ++b) { + formantMetric[b] += bins[b].inputEnergy; + } + } + + freqEstimate = freqToBand(formantBaseFreq); + if (formantBaseFreq <= 0) freqEstimate = estimateFrequency(); + + for (int b = 0; b < bands; ++b) { + formantMetric[b] = std::sqrt(formantMetric[b]); + } + } else if (step-- == 0) { + Sample slew = 1/(freqEstimate*0.5 + 1); + Sample e = 0; + for (size_t repeat = 0; repeat < 2; ++repeat) { + for (int b = bands - 1; b >= 0; --b) { + e += (formantMetric[b] - e)*slew; + formantMetric[b] = e; + } + for (int b = 0; b < bands; ++b) { + e += (formantMetric[b] - e)*slew; + formantMetric[b] = e; + } + } + } else { + auto getFormant = [&](Sample band) -> Sample { + if (band < 0) return 0; + band = std::min<Sample>(band, bands); + int floorBand = std::floor(band); + Sample fracBand = band - floorBand; + Sample low = formantMetric[floorBand], high = formantMetric[floorBand + 1]; + return low + (high - low)*fracBand; + }; + + for (int b = 0; b < bands; ++b) { + Sample inputF = bandToFreq(b); + Sample outputF = formantCompensation ? mapFreq(inputF) : inputF; + outputF = invMapFormant(outputF); + + Sample inputE = formantMetric[b]; + Sample targetE = getFormant(freqToBand(outputF)); + + Sample formantRatio = targetE/(inputE + Sample(1e-30)); + Sample energyRatio = formantRatio*formantRatio; + + for (int c = 0; c < channels; ++c) { + Band *bins = bandsForChannel(c); + // This is what's used to decide the output energy, so this affects the output + bins[b].inputEnergy *= energyRatio; + } + } + } + } }; }} // namespace Modified: branches/OpenMPT-1.32/tracklib/TimeStretchPitchShift.cpp ============================================================================== --- branches/OpenMPT-1.32/tracklib/TimeStretchPitchShift.cpp Sun May 25 14:04:10 2025 (r23191) +++ branches/OpenMPT-1.32/tracklib/TimeStretchPitchShift.cpp Sun May 25 14:04:43 2025 (r23192) @@ -20,9 +20,9 @@ #if MPT_COMPILER_MSVC #pragma warning(push) #pragma warning(disable:4305) -#pragma warning(disable:4267) #pragma warning(disable:4244) #pragma warning(disable:4458) +#pragma warning(disable:4456) #endif // MPT_COMPILER_MSVC #include <SignalsmithStretch/signalsmith-stretch.h> #if MPT_COMPILER_MSVC |