Variable Step Size and Frequency-Domain Adaptive Algorithms

May 14, 2026 Physical World Development Adaptive Filtering, Variable Step Size, Frequency-Domain Algorithm, FDAF, DSP Signals & Image Processing Series 1398 words 7 min read

🔊

The Convergence Trade-off of Fixed Step Size

Standard LMS and NLMS algorithms use a fixed step size parameter $\mu$. The choice of step size directly impacts algorithm performance, but a fundamental contradiction exists:

Large step size: Fast convergence and quick adaptation to environmental changes, but large steady-state error and low filtering precision
Small step size: Small steady-state error and high filtering precision, but slow convergence and sluggish response to abrupt changes

This contradiction is particularly pronounced in applications such as echo cancellation and active noise control — the system needs fast convergence during startup, yet wishes to maintain low error in steady state. A fixed step size cannot satisfy both phases simultaneously.

Figure 1 - Fixed Step Size Dilemma:

mermaid
flowchart TD
    START["Algorithm Start"] --> BIG{"Step Size μ Large?"}
    BIG -->|"Yes"| FAST["Fast Convergence ✓<br/>But Large Steady Error ✗"]
    BIG -->|"No"| SLOW["High Precision ✓<br/>But Slow Convergence ✗"]
    FAST --> DILEMMA["You Can't Have<br/>Both"]
    SLOW --> DILEMMA
    DILEMMA --> VSS["Variable Step Solution<br/>μ(n) adapts to error dynamically"]

    classDef problem fill:#f44336,color:#fff
    classDef solution fill:#4CAF50,color:#fff
    class FAST,SLOW,DILEMMA problem
    class VSS solution

Principles of Variable Step Size Algorithms

The core idea of variable step size algorithms: the step size is no longer fixed, but dynamically adjusted according to some error metric. The step size is large when the error is large to accelerate convergence, and small when the error is small to reduce steady-state fluctuation.

The general form of a variable step size algorithm can be expressed as:

$$ \mu(n) = f(e(n)) $$

where $f(\cdot)$ is a monotonic non-decreasing function and $e(n)$ is the error signal at the current instant. The design of $f$ determines the algorithm’s specific behavior.

Figure 3 - Variable Step Size Behavior:

mermaid
flowchart TD
    S1["Startup<br/>large error → μ rises<br/>fast convergence"]
    S2["Steady<br/>small error → μ drops<br/>low steady-state error"]
    S3["Abrupt change<br/>error spikes → μ rises<br/>rapid re-tracking"]
    S1 --> S2
    S2 -.alternating.-> S3

    classDef phase fill:#f3e5f5,stroke:#9C27B0,color:#4A148C
    class S1,S2,S3 phase

Error-driven adaptive step size: μ rises at startup to speed convergence, drops in steady state to minimize error, and rises again on abrupt changes to recover tracking — with S2 and S3 alternating.

The dual-plot animation below shows how error e(n) and step size μ(n) change in sync — μ automatically increases when error is large, decreases when error is small:

自适应步长 · μ(n) 随误差 e(n) 调节

误差 e(n)

步长 μ(n)

迭代次数 n

μ(n) = μ_max · (1 − e^{−β·e(n)²}) μ_max = 0.05, β = 10

Error-Energy Based Step Size

One intuitive approach couples the step size with the error energy:

$$ \mu(n) = \alpha \cdot \left(1 - \exp(-\beta \cdot e^2(n))\right) $$

Breaking down this formula term by term:

$e^2(n)$: The squared error, i.e. the instantaneous error energy. The larger the error, the larger this value
$\exp(-\beta \cdot e^2)$: Exponential decay term. When $e^2=0$ it equals 1 (step size = 0), as $e^2 \to \infty$ it approaches 0
$1 - \exp(-\beta \cdot e^2)$: The inverted curve, rising smoothly from 0 to 1
$\alpha$: Step size upper bound. The final $\mu(n)$ never exceeds $\alpha$
$\beta$: Sensitivity. Large $\beta$ → small errors trigger significant step size changes; small $\beta$ → larger errors needed for response

Numerical example: $\alpha=0.01,\;\beta=100,\;e=0.1$ gives

$$ \mu = 0.01 \times (1 - \exp(-100 \times 0.01)) = 0.01 \times (1 - e^{-1}) \approx 0.0063 $$

where $\alpha$ controls the upper bound of the step size (maximum step size), and $\beta$ adjusts the slope of the response curve from error to step size. A larger $\beta$ produces faster decay of the step size at small errors.

Characteristics of this scheme:

The step size drops to zero when the error is zero — weight updates nearly cease in steady state
The step size rises smoothly as the error increases, with no abrupt transitions
Two parameters $\alpha$ and $\beta$ control amplitude and sensitivity independently, making tuning intuitive

Sigmoid-Based Variable Step Size

Another common approach introduces the Sigmoid function to construct the step size curve over the absolute error:

$$ \mu(n) = \alpha \cdot \left(\frac{2}{1 + \exp(-\beta |e(n)|)} - 1\right) $$

The Sigmoid variable step size approaches zero at very small errors, then rapidly approaches $\alpha$ as the error grows. Compared to the exponential type, the Sigmoid curve has a steeper transition band in the low-to-medium error range, making it suitable for applications that demand more aggressive convergence speed.

Parameter tuning for variable step size algorithms depends on specific signal characteristics. General guidelines: $\alpha$ should not exceed the maximum step size allowed by stability conditions, and $\beta$ should be determined through experiments or offline data analysis.

Frequency-Domain Adaptive Filtering

When the adaptive filter order $N$ is large (typically in acoustic echo cancellation scenarios where $N$ can reach 1024 or even 4096), the computational cost of time-domain algorithms becomes a bottleneck.

The Cost of Time-Domain Convolution

Each iteration of a time-domain FIR filter involves a full convolution operation. For an $N$-tap filter:

Filter output per iteration: $N$ multiply-accumulate operations
Weight update per iteration: $N$ multiply-accumulate operations
Total complexity per iteration: $O(N)$

Since one iteration is required per output sample, the per-sample computational cost is $O(N)$. For long filters, the accumulated overhead cannot be ignored.

FFT Turns Convolution into Multiplication

Frequency-domain algorithms bypass time-domain convolution by transforming signals into the frequency domain via FFT, performing filtering and weight updates there:

Signal blocking: The input signal $x(n)$ is divided into data blocks of $L$ samples each — frequency-domain algorithms process data block by block
FFT transform: Each block is transformed using an $M$-point FFT (typically $M = 2L$, using the overlap-save method to avoid circular convolution)
Frequency-domain filtering: Filtering in the frequency domain becomes pointwise multiplication: $Y(k) = X(k) \cdot W(k)$, where $X(k)$ and $W(k)$ are the frequency-domain representations of the input and weights respectively
IFFT back to time domain: The frequency-domain output is transformed back to the time domain via IFFT, and the valid portion is extracted as the block output
Frequency-domain weight update: The error signal is also transformed to the frequency domain via FFT, where gradient computation and weight updates are performed

Figure 2 - Frequency-Domain Block Processing Pipeline:

mermaid
flowchart TD
    A["Input x(n)<br/>Time domain"] --> B["Block + FFT<br/>Time → Frequency"]
    B --> C["Freq. Multiply<br/>Y(k)=X(k)·W(k)"]
    C --> D["IFFT<br/>Frequency → Time"]
    D --> E["Overlap-Save<br/>Extract valid output"]
    E --> F["Output y(n)"]

    classDef time fill:#bbdefb,stroke:#2196F3,color:#1B5E20
    classDef freq fill:#ffcdd2,stroke:#f44336,color:#B71C1C
    classDef out fill:#c8e6c9,stroke:#4CAF50,color:#1B5E20
    class A,B,E,F time
    class C,D freq

Each block operation involves two forward FFTs, one inverse FFT, and frequency-domain multiplications, totaling $O(M \log M)$ complexity. Since one block operation handles $L$ samples, the amortized per-sample complexity is $O(\log M)$. Compared to $O(N)$ in the time domain, the frequency-domain algorithm reduces computation by one to two orders of magnitude when $N = 1024$.

To appreciate the difference with concrete numbers:

Time domain: Each sample requires $O(N)$ multiply-accumulate operations. At $N=1024$, that’s 1024 operations per sample
Frequency domain: One $O(M \log M)$ FFT per $L$ samples, amortized to $O(\log M)$ per sample
$N=1024,\;M=2048$: 1024 ops/sample (time) vs $\log_2 2048 \approx 11$ ops/sample (frequency), ~90x faster
Cost: Block delay of $L$ sample periods. $L=1024$ @ 16kHz sample rate = 64ms latency

Trade-offs in Frequency-Domain Algorithms

The advantages of frequency-domain algorithms come at the cost of additional latency and memory:

Block processing latency: $L$ samples must be collected before processing can begin, introducing an inherent delay of $L$ sample periods. Latency-sensitive applications need to constrain the block size
FFT computation overhead: Even with reduced complexity, FFT implementation requires efficient math libraries or dedicated hardware (such as ARM CMSIS-DSP). On low-power embedded MCUs, large-point FFTs may still be impractical
Doubled memory footprint: Frequency-domain algorithms need to store complex frequency-domain weights and FFT intermediate buffers, typically consuming 2-4 times the memory of time-domain implementations
Block boundary effects: The overlap-save method requires additional pre- and post-processing, and the transitions between blocks must be carefully managed

Choosing the Right Approach

Time-domain variable step size algorithms and frequency-domain block algorithms suit different scenarios:

Low filter order (N < 128): Time-domain algorithms are simpler, more direct, have lower latency, and lower implementation costs — the better choice
High filter order (N > 512): The computational advantage of frequency-domain algorithms becomes significant; for N ≥ 2048, they are often the only viable real-time option
Extreme latency sensitivity: Even with high filter orders, if the application demands ultra-low latency (such as hearing aids), a compromise using partitioned frequency-domain or time-domain algorithms may be necessary
Embedded resource constraints: Memory-limited MCUs may need to prioritize time-domain variable step size algorithms, achieving better convergence performance within acceptable computational bounds

Part of series: Signals & Image Processing Series

← Previous FXLMS and Active Noise Control System Architecture Next → Embedded ANC: STM32 in Practice