Variable Step Size and Frequency-Domain Adaptive Algorithms

The Convergence Trade-off of Fixed Step Size

Standard LMS and NLMS algorithms use a fixed step size parameter $\mu$. The choice of step size directly impacts algorithm performance, but a fundamental contradiction exists:

  • Large step size: Fast convergence and quick adaptation to environmental changes, but large steady-state error and low filtering precision
  • Small step size: Small steady-state error and high filtering precision, but slow convergence and sluggish response to abrupt changes

This contradiction is particularly pronounced in applications such as echo cancellation and active noise control — the system needs fast convergence during startup, yet wishes to maintain low error in steady state. A fixed step size cannot satisfy both phases simultaneously.

Figure 1 - Fixed Step Size Dilemma:

mermaid
flowchart TD
    START["Algorithm Start"] --> BIG{"Step Size μ Large?"}
    BIG -->|"Yes"| FAST["Fast Convergence ✓<br/>But Large Steady Error ✗"]
    BIG -->|"No"| SLOW["High Precision ✓<br/>But Slow Convergence ✗"]
    FAST --> DILEMMA["You Can't Have<br/>Both"]
    SLOW --> DILEMMA
    DILEMMA --> VSS["Variable Step Solution<br/>μ(n) adapts to error dynamically"]

    classDef problem fill:#f44336,color:#fff
    classDef solution fill:#4CAF50,color:#fff
    class FAST,SLOW,DILEMMA problem
    class VSS solution

Principles of Variable Step Size Algorithms

The core idea of variable step size algorithms: the step size is no longer fixed, but dynamically adjusted according to some error metric. The step size is large when the error is large to accelerate convergence, and small when the error is small to reduce steady-state fluctuation.

The general form of a variable step size algorithm can be expressed as:

$$ \mu(n) = f(e(n)) $$

where $f(\cdot)$ is a monotonic non-decreasing function and $e(n)$ is the error signal at the current instant. The design of $f$ determines the algorithm’s specific behavior.

Figure 3 - Variable Step Size Behavior:

mermaid
flowchart TD
    subgraph Startup Phase
        S1["Large error → μ increases<br/>→ Fast convergence"]
    end
    subgraph Steady State
        S2["Small error → μ decreases<br/>→ Low steady-state error"]
    end
    subgraph Abrupt Change
        S3["Error spikes → μ rises again<br/>→ Rapid re-tracking"]
    end
    S1 --> S2 --> S3 --> S2

    classDef phase fill:#9C27B0,color:#fff
    class S1,S2,S3 phase

The dual-plot animation below shows how error e(n) and step size μ(n) change in sync — μ automatically increases when error is large, decreases when error is small:

自适应步长 · μ(n) 随误差 e(n) 调节
误差 e(n)
步长 μ(n)
迭代次数 n
μ(n) = μmax · (1 − e−β·e(n)²) μmax = 0.05, β = 10

Error-Energy Based Step Size

One intuitive approach couples the step size with the error energy:

$$ \mu(n) = \alpha \cdot \left(1 - \exp(-\beta \cdot e^2(n))\right) $$

Breaking down this formula term by term:

Numerical example: $\alpha=0.01,;\beta=100,;e=0.1$ gives

$$ \mu = 0.01 \times (1 - \exp(-100 \times 0.01)) = 0.01 \times (1 - e^{-1}) \approx 0.0063 $$

where $\alpha$ controls the upper bound of the step size (maximum step size), and $\beta$ adjusts the slope of the response curve from error to step size. A larger $\beta$ produces faster decay of the step size at small errors.

Characteristics of this scheme:

Sigmoid-Based Variable Step Size

Another common approach introduces the Sigmoid function to construct the step size curve over the absolute error:

$$ \mu(n) = \alpha \cdot \left(\frac{2}{1 + \exp(-\beta |e(n)|)} - 1\right) $$

The Sigmoid variable step size approaches zero at very small errors, then rapidly approaches $\alpha$ as the error grows. Compared to the exponential type, the Sigmoid curve has a steeper transition band in the low-to-medium error range, making it suitable for applications that demand more aggressive convergence speed.

Parameter tuning for variable step size algorithms depends on specific signal characteristics. General guidelines: $\alpha$ should not exceed the maximum step size allowed by stability conditions, and $\beta$ should be determined through experiments or offline data analysis.

Frequency-Domain Adaptive Filtering

When the adaptive filter order $N$ is large (typically in acoustic echo cancellation scenarios where $N$ can reach 1024 or even 4096), the computational cost of time-domain algorithms becomes a bottleneck.

The Cost of Time-Domain Convolution

Each iteration of a time-domain FIR filter involves a full convolution operation. For an $N$-tap filter:

Since one iteration is required per output sample, the per-sample computational cost is $O(N)$. For long filters, the accumulated overhead cannot be ignored.

FFT Turns Convolution into Multiplication

Frequency-domain algorithms bypass time-domain convolution by transforming signals into the frequency domain via FFT, performing filtering and weight updates there:

  1. Signal blocking: The input signal $x(n)$ is divided into data blocks of $L$ samples each — frequency-domain algorithms process data block by block
  2. FFT transform: Each block is transformed using an $M$-point FFT (typically $M = 2L$, using the overlap-save method to avoid circular convolution)
  3. Frequency-domain filtering: Filtering in the frequency domain becomes pointwise multiplication: $Y(k) = X(k) \cdot W(k)$, where $X(k)$ and $W(k)$ are the frequency-domain representations of the input and weights respectively
  4. IFFT back to time domain: The frequency-domain output is transformed back to the time domain via IFFT, and the valid portion is extracted as the block output
  5. Frequency-domain weight update: The error signal is also transformed to the frequency domain via FFT, where gradient computation and weight updates are performed

Figure 2 - Frequency-Domain Block Processing Pipeline:

mermaid
flowchart TD
    A["Input x(n)<br/>Time domain"] --> B["Blocking<br/>L samples per block"]
    B --> C["FFT<br/>Time → Frequency"]
    C --> D["Freq. Multiply<br/>Y(k) = X(k)·W(k)"]
    D --> E["IFFT<br/>Frequency → Time"]
    E --> F["Overlap-Save<br/>Extract valid output"]
    F --> G["Output y(n)"]

    classDef time fill:#2196F3,color:#fff
    classDef freq fill:#f44336,color:#fff
    classDef out fill:#4CAF50,color:#fff
    class A,B,F,G time
    class C,D,E freq

Each block operation involves two forward FFTs, one inverse FFT, and frequency-domain multiplications, totaling $O(M \log M)$ complexity. Since one block operation handles $L$ samples, the amortized per-sample complexity is $O(\log M)$. Compared to $O(N)$ in the time domain, the frequency-domain algorithm reduces computation by one to two orders of magnitude when $N = 1024$.

To appreciate the difference with concrete numbers:

Trade-offs in Frequency-Domain Algorithms

The advantages of frequency-domain algorithms come at the cost of additional latency and memory:

Choosing the Right Approach

Time-domain variable step size algorithms and frequency-domain block algorithms suit different scenarios: