Variable Step Size and Frequency-Domain Adaptive Algorithms
The Convergence Trade-off of Fixed Step Size
Standard LMS and NLMS algorithms use a fixed step size parameter $\mu$. The choice of step size directly impacts algorithm performance, but a fundamental contradiction exists:
- Large step size: Fast convergence and quick adaptation to environmental changes, but large steady-state error and low filtering precision
- Small step size: Small steady-state error and high filtering precision, but slow convergence and sluggish response to abrupt changes
This contradiction is particularly pronounced in applications such as echo cancellation and active noise control — the system needs fast convergence during startup, yet wishes to maintain low error in steady state. A fixed step size cannot satisfy both phases simultaneously.
Figure 1 - Fixed Step Size Dilemma:
flowchart TD
START["Algorithm Start"] --> BIG{"Step Size μ Large?"}
BIG -->|"Yes"| FAST["Fast Convergence ✓<br/>But Large Steady Error ✗"]
BIG -->|"No"| SLOW["High Precision ✓<br/>But Slow Convergence ✗"]
FAST --> DILEMMA["You Can't Have<br/>Both"]
SLOW --> DILEMMA
DILEMMA --> VSS["Variable Step Solution<br/>μ(n) adapts to error dynamically"]
classDef problem fill:#f44336,color:#fff
classDef solution fill:#4CAF50,color:#fff
class FAST,SLOW,DILEMMA problem
class VSS solutionPrinciples of Variable Step Size Algorithms
The core idea of variable step size algorithms: the step size is no longer fixed, but dynamically adjusted according to some error metric. The step size is large when the error is large to accelerate convergence, and small when the error is small to reduce steady-state fluctuation.
The general form of a variable step size algorithm can be expressed as:
$$ \mu(n) = f(e(n)) $$
where $f(\cdot)$ is a monotonic non-decreasing function and $e(n)$ is the error signal at the current instant. The design of $f$ determines the algorithm’s specific behavior.
Figure 3 - Variable Step Size Behavior:
flowchart TD
subgraph Startup Phase
S1["Large error → μ increases<br/>→ Fast convergence"]
end
subgraph Steady State
S2["Small error → μ decreases<br/>→ Low steady-state error"]
end
subgraph Abrupt Change
S3["Error spikes → μ rises again<br/>→ Rapid re-tracking"]
end
S1 --> S2 --> S3 --> S2
classDef phase fill:#9C27B0,color:#fff
class S1,S2,S3 phaseThe dual-plot animation below shows how error e(n) and step size μ(n) change in sync — μ automatically increases when error is large, decreases when error is small:
Error-Energy Based Step Size
One intuitive approach couples the step size with the error energy:
$$ \mu(n) = \alpha \cdot \left(1 - \exp(-\beta \cdot e^2(n))\right) $$
Breaking down this formula term by term:
- $e^2(n)$: The squared error, i.e. the instantaneous error energy. The larger the error, the larger this value
- $\exp(-\beta \cdot e^2)$: Exponential decay term. When $e^2=0$ it equals 1 (step size = 0), as $e^2 \to \infty$ it approaches 0
- $1 - \exp(-\beta \cdot e^2)$: The inverted curve, rising smoothly from 0 to 1
- $\alpha$: Step size upper bound. The final $\mu(n)$ never exceeds $\alpha$
- $\beta$: Sensitivity. Large $\beta$ → small errors trigger significant step size changes; small $\beta$ → larger errors needed for response
Numerical example: $\alpha=0.01,;\beta=100,;e=0.1$ gives
$$ \mu = 0.01 \times (1 - \exp(-100 \times 0.01)) = 0.01 \times (1 - e^{-1}) \approx 0.0063 $$
where $\alpha$ controls the upper bound of the step size (maximum step size), and $\beta$ adjusts the slope of the response curve from error to step size. A larger $\beta$ produces faster decay of the step size at small errors.
Characteristics of this scheme:
- The step size drops to zero when the error is zero — weight updates nearly cease in steady state
- The step size rises smoothly as the error increases, with no abrupt transitions
- Two parameters $\alpha$ and $\beta$ control amplitude and sensitivity independently, making tuning intuitive
Sigmoid-Based Variable Step Size
Another common approach introduces the Sigmoid function to construct the step size curve over the absolute error:
$$ \mu(n) = \alpha \cdot \left(\frac{2}{1 + \exp(-\beta |e(n)|)} - 1\right) $$
The Sigmoid variable step size approaches zero at very small errors, then rapidly approaches $\alpha$ as the error grows. Compared to the exponential type, the Sigmoid curve has a steeper transition band in the low-to-medium error range, making it suitable for applications that demand more aggressive convergence speed.
Parameter tuning for variable step size algorithms depends on specific signal characteristics. General guidelines: $\alpha$ should not exceed the maximum step size allowed by stability conditions, and $\beta$ should be determined through experiments or offline data analysis.
Frequency-Domain Adaptive Filtering
When the adaptive filter order $N$ is large (typically in acoustic echo cancellation scenarios where $N$ can reach 1024 or even 4096), the computational cost of time-domain algorithms becomes a bottleneck.
The Cost of Time-Domain Convolution
Each iteration of a time-domain FIR filter involves a full convolution operation. For an $N$-tap filter:
- Filter output per iteration: $N$ multiply-accumulate operations
- Weight update per iteration: $N$ multiply-accumulate operations
- Total complexity per iteration: $O(N)$
Since one iteration is required per output sample, the per-sample computational cost is $O(N)$. For long filters, the accumulated overhead cannot be ignored.
FFT Turns Convolution into Multiplication
Frequency-domain algorithms bypass time-domain convolution by transforming signals into the frequency domain via FFT, performing filtering and weight updates there:
- Signal blocking: The input signal $x(n)$ is divided into data blocks of $L$ samples each — frequency-domain algorithms process data block by block
- FFT transform: Each block is transformed using an $M$-point FFT (typically $M = 2L$, using the overlap-save method to avoid circular convolution)
- Frequency-domain filtering: Filtering in the frequency domain becomes pointwise multiplication: $Y(k) = X(k) \cdot W(k)$, where $X(k)$ and $W(k)$ are the frequency-domain representations of the input and weights respectively
- IFFT back to time domain: The frequency-domain output is transformed back to the time domain via IFFT, and the valid portion is extracted as the block output
- Frequency-domain weight update: The error signal is also transformed to the frequency domain via FFT, where gradient computation and weight updates are performed
Figure 2 - Frequency-Domain Block Processing Pipeline:
flowchart TD
A["Input x(n)<br/>Time domain"] --> B["Blocking<br/>L samples per block"]
B --> C["FFT<br/>Time → Frequency"]
C --> D["Freq. Multiply<br/>Y(k) = X(k)·W(k)"]
D --> E["IFFT<br/>Frequency → Time"]
E --> F["Overlap-Save<br/>Extract valid output"]
F --> G["Output y(n)"]
classDef time fill:#2196F3,color:#fff
classDef freq fill:#f44336,color:#fff
classDef out fill:#4CAF50,color:#fff
class A,B,F,G time
class C,D,E freqEach block operation involves two forward FFTs, one inverse FFT, and frequency-domain multiplications, totaling $O(M \log M)$ complexity. Since one block operation handles $L$ samples, the amortized per-sample complexity is $O(\log M)$. Compared to $O(N)$ in the time domain, the frequency-domain algorithm reduces computation by one to two orders of magnitude when $N = 1024$.
To appreciate the difference with concrete numbers:
- Time domain: Each sample requires $O(N)$ multiply-accumulate operations. At $N=1024$, that’s 1024 operations per sample
- Frequency domain: One $O(M \log M)$ FFT per $L$ samples, amortized to $O(\log M)$ per sample
- $N=1024,;M=2048$: 1024 ops/sample (time) vs $\log_2 2048 \approx 11$ ops/sample (frequency), ~90x faster
- Cost: Block delay of $L$ sample periods. $L=1024$ @ 16kHz sample rate = 64ms latency
Trade-offs in Frequency-Domain Algorithms
The advantages of frequency-domain algorithms come at the cost of additional latency and memory:
- Block processing latency: $L$ samples must be collected before processing can begin, introducing an inherent delay of $L$ sample periods. Latency-sensitive applications need to constrain the block size
- FFT computation overhead: Even with reduced complexity, FFT implementation requires efficient math libraries or dedicated hardware (such as ARM CMSIS-DSP). On low-power embedded MCUs, large-point FFTs may still be impractical
- Doubled memory footprint: Frequency-domain algorithms need to store complex frequency-domain weights and FFT intermediate buffers, typically consuming 2-4 times the memory of time-domain implementations
- Block boundary effects: The overlap-save method requires additional pre- and post-processing, and the transitions between blocks must be carefully managed
Choosing the Right Approach
Time-domain variable step size algorithms and frequency-domain block algorithms suit different scenarios:
- Low filter order (N < 128): Time-domain algorithms are simpler, more direct, have lower latency, and lower implementation costs — the better choice
- High filter order (N > 512): The computational advantage of frequency-domain algorithms becomes significant; for N ≥ 2048, they are often the only viable real-time option
- Extreme latency sensitivity: Even with high filter orders, if the application demands ultra-low latency (such as hearing aids), a compromise using partitioned frequency-domain or time-domain algorithms may be necessary
- Embedded resource constraints: Memory-limited MCUs may need to prioritize time-domain variable step size algorithms, achieving better convergence performance within acceptable computational bounds