Adaptive Filtering: From LMS to NLMS

The LMS Algorithm

The core problem in adaptive filtering is: given a reference signal x(n) and a desired signal d(n), find a filter coefficient vector w such that the output y(n) approximates d(n).

The Least Mean Square (LMS) algorithm is the most classic solution to this problem, proposed by Widrow and Hoff in 1960. The core idea is to take one step down the steepest gradient of the error surface at each iteration — stochastic gradient descent.

Mathematical Model

For a filter of length N:

  • Weight vector: w(n) = [w_0(n), w_1(n), ..., w_{N-1}(n)]^T
  • Reference signal vector: x(n) = [x(n), x(n-1), ..., x(n-N+1)]^T

The filter output is the inner product of the two:

$$ y(n) = w^T(n) x(n) = \sum_{k=0}^{N-1} w_k(n) x(n-k) $$

The error signal:

$$ e(n) = d(n) - y(n) $$

The cost function is the mean square error:

$$ J(n) = E[e^2(n)] $$

Taking the gradient with respect to w yields the weight update rule:

$$ \nabla J(n) = -2 e(n) x(n) $$

$$ w(n+1) = w(n) - \mu \nabla J(n) = w(n) + \mu e(n) x(n) $$

where \mu is the step size that controls convergence speed.

Adaptive Filter Structure

The diagram below shows the closed-loop workflow of an adaptive filter:

mermaid
flowchart TD
    REF["Reference Signal x(n)"] --> W["Adaptive Filter W(z)<br/>Weights w(n)"]
    W -->|"Output y(n)"| SUM(("⊕"))
    DES["Desired Signal d(n)<br/>(Error Microphone)"] --> SUM
    SUM -->|"Error e(n) = d(n) - y(n)"| UPDATE["Weight Update<br/>w(n+1) = w(n) + μ·e(n)·x(n)"]
    REF --> UPDATE
    UPDATE --> W

    classDef signal fill:#2196F3,color:#fff
    classDef proc fill:#9C27B0,color:#fff
    classDef err fill:#f44336,color:#fff
    class REF,DES signal
    class W,UPDATE,SUM proc

The reference signal x(n) passes through the adaptive filter to produce output y(n), which is compared with the desired signal d(n) to compute the error e(n). The error is then fed back to guide weight updates. This closed loop runs continuously—the error decreases over time until the filter converges to the optimal solution.

Formula Breakdown

The LMS weight update rule looks like a single line, but each term has a clear physical meaning:

$$ w(n+1) = w(n) + \mu \cdot e(n) \cdot x(n) $$

SymbolMeaningIntuition
w(n)Filter coefficients (N taps) at time nThe filter’s “memory”—how it combines past input samples
μStep sizeHow large a step to take each iteration—too large → overshoot and diverge; too small → painfully slow convergence
e(n)Error signal d(n) - y(n)A gauge telling the algorithm “how wrong it is”—larger errors drive bigger corrections
x(n)Reference signal vectorTells the algorithm “which direction to adjust”—weights are corrected along x

An Intuitive Analogy: Blindfolded Hill Descent

Imagine you’re blindfolded on a hillside, trying to reach the lowest point in the valley:

  • e(n) is your perception of the slope—steeper slope means larger error, farther from the bottom
  • μ is your stride length—too large and you might leap past the valley to the opposite slope (divergence); too small and you’ll take forever (slow convergence)
  • x(n) is the terrain beneath your feet—telling you which direction to step
  • Each update: feel the slope → take a step in the right direction → feel again → step again

Numerical Example: With μ = 0.001, e = 0.5, x = 0.8, the update amount = 0.001 × 0.5 × 0.8 = 0.0004. If e increases to 5.0, the update = 0.001 × 5.0 × 0.8 = 0.004—larger errors drive larger corrections.

Why Is It Called “Stochastic” Gradient Descent?

Strictly speaking, the LMS cost function is the mean square error J(n) = E[e²(n)], and the true gradient is ∇J = -2 · E[e(n) · x(n)]. But LMS skips the expectation—it directly uses the instantaneous value -2 · e(n) · x(n) as a gradient estimate.

This means using one sample’s error in place of the expectation over all samples—hence “stochastic.” While a single estimate is noisy, averaging over many iterations steers the weights in the same direction as true gradient descent. This “instantaneous-instead-of-expected” tradeoff is what makes LMS computationally cheap yet effective.

Convergence Condition

The necessary and sufficient condition for LMS convergence is:

$$ 0 < \mu < \frac{2}{N \cdot P_x} $$

where P_x = E[x^2(n)] is the input signal power. Larger step sizes converge faster, but exceeding the upper bound causes divergence.

Convergence Condition Breakdown

The inequality 0 < μ < 2 / (N · P_x) determines whether LMS works correctly:

  • N: Filter length. More taps means each tap shares less “energy,” lowering the μ upper bound. A 64-tap filter has half the μ ceiling of a 32-tap filter.
  • P_x: Input signal power. Stronger signals require smaller μ—otherwise gradient noise spirals out of control.
  • Intuition: μ too large → weights overshoot the optimum → oscillation or divergence; μ too small → glacial convergence.

Numerical Example: With N = 64, P_x = 1.0 (unit power signal):

1
μ < 2 / (64 × 1.0) = 0.03125

In practice, take 1/10 to 1/2 of the theoretical upper bound—roughly 0.003 to 0.015—for a safe margin.

Convergence Process

LMS convergence follows a “coarse-to-fine” pattern:

  • Initial phase: Weights start at zero (or random values), error is large, and each update step is aggressive
  • Learning phase: Weights approach the optimal values, error drops rapidly
  • Steady-state phase: Weights jitter slightly around the optimum (“misadjustment noise”), error no longer decreases noticeably

The diagram below illustrates this iterative loop:

mermaid
flowchart TD
    A["Initialize<br/>w(0) = 0"] --> B["Process Sample<br/>y = w^T · x"]
    B --> C["Compute Error<br/>e = d - y"]
    C --> D["Update Weights<br/>w += μ·e·x"]
    D --> E{"Converged?"}
    E -->|"No<br/>Error still large"| B
    E -->|"Yes<br/>Error small enough"| F["Steady State<br/>Fine-tuning"]

    classDef init fill:#FF9800,color:#fff
    classDef loop fill:#2196F3,color:#fff
    classDef done fill:#4CAF50,color:#fff
    class A init
    class B,C,D loop
    class F done

If you plot the error over time, you’ll see a curve that drops steeply at first, then gradually flattens out—like a playground slide, steep at the top and gentle at the bottom.

The animation below draws this “slide” curve, clearly showing the transition through three phases:

收敛曲线
1.00.750.500.250.00255075100迭代次数误差初始阶段收敛阶段稳态阶段

C Implementation

A circular buffer manages the reference signal delay line, avoiding data movement at each iteration:

c
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
#include <stdint.h>
#include <string.h>
#include <math.h>

#define FILTER_ORDER 64
#define MU 0.001f

typedef struct {
    float w[FILTER_ORDER];
    float x[FILTER_ORDER];
    uint32_t ptr;
} LMS_Filter;

void lms_init(LMS_Filter *f) {
    memset(f->w, 0, sizeof(f->w));
    memset(f->x, 0, sizeof(f->x));
    f->ptr = 0;
}

static float dot_product(const float *a, const float *b, uint32_t n) {
    float sum = 0.0f;
    for (uint32_t i = 0; i < n; i++) sum += a[i] * b[i];
    return sum;
}

float lms_process(LMS_Filter *f, float ref, float err) {
    f->x[f->ptr] = ref;
    float y = 0.0f;
    for (uint32_t i = 0; i < FILTER_ORDER; i++) {
        uint32_t idx = (f->ptr + FILTER_ORDER - i) % FILTER_ORDER;
        y += f->w[i] * f->x[idx];
    }
    for (uint32_t i = 0; i < FILTER_ORDER; i++) {
        uint32_t idx = (f->ptr + FILTER_ORDER - i) % FILTER_ORDER;
        f->w[i] += MU * err * f->x[idx];
    }
    f->ptr = (f->ptr + 1) % FILTER_ORDER;
    return y;
}

lms_process first writes the reference sample into the circular buffer, computes the filter output, then updates each weight with w += mu * e * x. The ptr cycles forward, effectively creating a sliding window over the input.

The NLMS Algorithm

LMS has a practical problem: the step size \mu is highly sensitive to input signal amplitude. Large signals amplify gradient noise and can cause divergence; small signals make convergence painfully slow. Every new deployment scenario requires retuning \mu.

Normalized LMS (NLMS) solves this — it divides the step size by the instantaneous power of the input signal, making the update self-tuning.

Weight Update

The NLMS weight update formula:

$$ w(n+1) = w(n) + \frac{\mu}{x^T(n) x(n) + \varepsilon} e(n) x(n) $$

The denominator x^T(n) x(n) is the energy of the reference signal vector (sum of squared samples), and \varepsilon is a small positive constant preventing division by zero.

NLMS Formula Breakdown

The NLMS update can be understood as LMS with an “adaptive step size” mechanism:

$$ w(n+1) = w(n) + \frac{\mu}{x^T(n) x(n) + \varepsilon} \cdot e(n) \cdot x(n) $$

ComponentMeaning
Denominator x^T(n) x(n)Total energy of the current input signal (sum of squared samples). Strong signal → larger value → step shrinks; weak signal → smaller value → step grows
ε (epsilon)Protection against division by zero. Input may go silent (e.g., pause in audio), default 1e-6
μ / (x^T x + ε)Effective step size, self-tuning per signal energy

Numerical Example: With μ = 0.5, input x = [0.1, -0.05, 0.2, ...], energy x^T x = 0.1² + (-0.05)² + 0.2² + ... = 0.0525, then effective step = 0.5 / (0.0525 + 1e-6) ≈ 9.52. If the signal strengthens to x = [0.5, -0.3, 0.8, ...], energy ≈ 0.98, effective step ≈ 0.5 / 0.98 ≈ 0.51—strong signal automatically reduces the step size.

Comparing LMS and NLMS updates:

AlgorithmUpdate
LMS\mu \cdot e \cdot x
NLMS\frac{\mu}{x^T x} \cdot e \cdot x

NLMS effectively adjusts the equivalent step size at each iteration based on signal energy. Strong signals automatically reduce the step size; weak signals amplify it, keeping gradient noise stable.

Step Size Range

NLMS step size \mu guarantees convergence in the range (0, 2). Compared to LMS which requires tedious tuning per signal power, NLMS is far more practical in real systems.

LMS vs NLMS: Which One to Use?

ScenarioRecommendedReason
Stable signal amplitude (e.g., sine wave tracking, power line noise cancellation)LMSLower computational cost—no division needed
Frequently changing amplitude (e.g., speech, audio ANC)NLMSSelf-tuning step size, no need for retuning
Low-power embedded platform (no FPU)LMS or fixed-point NLMSLMS avoids division and energy accumulation; NLMS fixed-point can precompute energy
Quick prototyping, robustness preferredNLMSμ = 0.5 usually works out of the box

Rule of thumb: Try NLMS first. Only switch to LMS if you need to save the last few MIPS. NLMS has fewer tuning knobs and better robustness—most scenarios work on the first try.

Fixed-Point NLMS Implementation

When the embedded system lacks an FPU, implement NLMS with Q31 fixed-point arithmetic. Weights and signals are int32_t values, and the q31_mul macro handles saturated shifts:

c
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
#include <stdint.h>
#include <string.h>

#define FILTER_ORDER 32
#define MU_Q31  0x147AE148
#define EPS_Q31 0x0001869F

static inline int32_t q31_mul(int32_t a, int32_t b) {
    return (int32_t)(((int64_t)a * (int64_t)b) >> 31);
}

typedef struct {
    int32_t h[FILTER_ORDER];
    int32_t x[FILTER_ORDER];
    uint16_t ptr;
} NLMS_Filter_Fixed;

The fixed-point version uses q31_mul for energy accumulation to avoid 32-bit overflow. MU_Q31 and EPS_Q31 are in Q31 format, equivalent to approximately 0.02 and 0.00001 in floating point.