Acoustic Waves & Digital Signal Processing Basics

Destructive Interference

Active noise cancellation (ANC) builds on a simple physical principle: destructive interference. Two sound waves with the same frequency, equal amplitude, and opposite phase cancel each other out.

The original noise signal:

$$ p_n(t) = A \cos(2\pi ft + \phi) $$

The anti-noise generated by the ANC system:

$$ p_c(t) = -A \cos(2\pi ft + \phi) = A \cos(2\pi ft + \phi + \pi) $$

Superposition:

$$ p_{\text{total}} = p_n(t) + p_c(t) = 0 $$

That is the ideal case. In a real ANC headset, the anti-noise travels from the speaker to the eardrum through an acoustic path, and the electronics introduce latency. If the phase is off by even a few degrees, or the amplitudes don’t match, cancellation degrades fast. With perfectly matched amplitudes, a 5° phase error alone caps the reduction at about 21 dB.
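To put numbers on "degrades fast", here is a small sketch that superposes the noise with an imperfect anti-noise: its amplitude is scaled by a gain `g` and its phase is off by $\Delta\phi$. Adding the two phasors gives a residual amplitude of $A\sqrt{1 + g^2 - 2g\cos\Delta\phi}$. The function names are illustrative, and the time-domain check simply sums the two cosines sample by sample.

```python
import cmath
import math

def residual_ratio(gain: float, phase_error_deg: float) -> float:
    """Residual amplitude relative to the noise amplitude A.

    Noise phasor: A*e^{j*phi}; anti-noise phasor: -gain*A*e^{j*(phi + dphi)}.
    Their sum has magnitude A*|1 - gain*e^{j*dphi}|.
    """
    dphi = math.radians(phase_error_deg)
    return abs(1 - gain * cmath.exp(1j * dphi))

def attenuation_db(gain: float, phase_error_deg: float) -> float:
    """Noise reduction in dB (positive = quieter); infinite for perfect cancellation."""
    r = residual_ratio(gain, phase_error_deg)
    return math.inf if r == 0 else -20 * math.log10(r)

def simulated_residual(gain, phase_error_deg, f=100.0, fs=48_000, A=1.0):
    """Cross-check: peak of p_n(t) + p_c(t) over one second of samples, relative to A."""
    dphi = math.radians(phase_error_deg)
    peak = 0.0
    for n in range(fs):
        t = n / fs
        p_n = A * math.cos(2 * math.pi * f * t)
        p_c = -gain * A * math.cos(2 * math.pi * f * t + dphi)
        peak = max(peak, abs(p_n + p_c))
    return peak / A

for g, d in [(1.0, 0.0), (1.0, 5.0), (1.0, 10.0), (0.9, 0.0), (0.9, 5.0)]:
    print(f"gain={g:.1f}, phase error={d:4.1f} deg -> {attenuation_db(g, d):6.1f} dB")
```

Phase and amplitude errors compound. A 10% amplitude error on its own already limits cancellation to 20 dB, and adding a 5° phase error pushes the result below that.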

The flowchart below traces the signal chain: the reference mic captures the noise, the adaptive filter generates the anti-noise, the speaker plays it, and the two waves combine to near zero at the ear:

```mermaid
flowchart TD
    subgraph Noise_Source
        A["p_n(t) = A·cos(2πft+φ)"]
    end
    subgraph ANC_System
        B["Reference Mic<br/>Captures noise"]
        C["Adaptive Filter<br/>Generates anti-noise"]
        D["Speaker<br/>Outputs p_c(t) = -A·cos(2πft+φ)"]
    end
    subgraph Ear
        E["p_n(t) + p_c(t) ≈ 0<br/>Summing → Quiet zone"]
    end
    A --> B --> C --> D --> E
    A -->|"Acoustic path"| E
    E -.->|"Residual noise"| F["Error Mic<br/>Feedback correction"]
    F --> C

    classDef noise fill:#f44336,color:#fff
    classDef anc fill:#2196F3,color:#fff
    classDef ear fill:#2196F3,color:#fff
    class A noise
    class B,C,D,F anc
    class E ear
```

Breaking down the parameters of the waveform above:

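  • $A$: amplitude, which determines loudness. Sound pressure level is $SPL = 20 \log_{10}(p/p_0)$, where $p$ is the RMS sound pressure and $p_0 = 20\,\mu\text{Pa}$ is the reference pressure in air. Doubling the amplitude adds about 6 dB, since $20 \log_{10} 2 \approx 6.02$
  • $f$: frequency in Hz. Human hearing spans 20Hz–20kHz. ANC primarily targets low frequencies (20Hz–1kHz): at higher frequencies the wavelength and period are so short that small timing or position errors turn into large phase errors
  • $\phi$: initial phase. ANC’s core job is making $\phi_c = \phi_n + \pi$ (perfect phase reversal) while matching the amplitude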
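The SPL formula is easy to check numerically. The sketch below assumes the standard 20 µPa reference for air. It shows that 1 Pa RMS is roughly 94 dB SPL, that doubling the pressure adds about 6 dB, and that a 20 dB reduction means the pressure drops by a factor of 10.

```python
import math

P0 = 20e-6  # reference RMS pressure in air: 20 micropascals

def spl_db(p_rms: float) -> float:
    """Sound pressure level in dB SPL for an RMS pressure given in pascals."""
    return 20 * math.log10(p_rms / P0)

print(round(spl_db(1.0), 1))                  # 94.0  (1 Pa RMS)
print(round(spl_db(2.0) - spl_db(1.0), 2))    # 6.02  (doubling the amplitude)
print(round(spl_db(0.1) - spl_db(1.0), 1))    # -20.0 (pressure cut by 10x)
```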

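A concrete example. 100Hz noise has period T = 1/f = 10 ms, so its half-period is 5 ms. Any uncompensated delay in the chain from reference mic capture → DSP compute → speaker output shows up as phase error, $\Delta\phi = 360^\circ \cdot f \cdot \tau$. A 5 ms delay shifts a 100Hz wave by a full 180°, so the anti-noise arrives in phase with the noise and doubles it instead of canceling it. Even 0.1 ms costs 3.6° at 100Hz and 36° at 1kHz. This is why ANC is so sensitive to latency.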
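Turning this into a latency budget: the sketch below converts a delay into phase error, then solves for the largest delay that still allows a target attenuation. It assumes perfectly matched amplitudes, so the residual is $2\sin(\Delta\phi/2)$, and a fixed delay that the adaptive filter does not compensate. The function names are illustrative.

```python
import math

def phase_error_deg(f_hz: float, delay_s: float) -> float:
    """Phase error from an uncompensated delay: 360 * f * tau degrees."""
    return 360.0 * f_hz * delay_s

def max_delay_s(f_hz: float, target_db: float) -> float:
    """Largest uncompensated delay that still permits `target_db` of attenuation,
    assuming matched amplitudes (residual amplitude ratio = 2*sin(dphi/2))."""
    residual = 10 ** (-target_db / 20)
    dphi = 2 * math.asin(residual / 2)      # allowed phase error in radians
    return dphi / (2 * math.pi * f_hz)

print(round(phase_error_deg(100, 5e-3), 1))   # 180.0: the waves add instead of cancel
for f in (100, 500, 1000):
    print(f"{f:5d} Hz: 20 dB needs delay < {max_delay_s(f, 20) * 1e6:.0f} us")
```

For 20 dB at 100Hz the budget is roughly 159 µs, and it shrinks in inverse proportion to frequency, to about 16 µs at 1kHz. The half-period is the point where cancellation turns into reinforcement, not a usable budget.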

Consumer ANC delivers 20–30dB of cancellation, concentrated in the low-mid range. That’s not an algorithm limitation — it’s physics.

Higher frequency means shorter wavelength. 1kHz is about 34cm, 10kHz drops to 3.4cm. ANC needs the anti-noise to arrive at the error mic exactly out of phase with the noise. The shorter the wavelength, the tighter the position tolerance. An ear position error of a few millimeters can throw the phase off by tens of degrees for a 3.4cm wavelength.
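The wavelength argument can be sketched numerically. This sketch assumes a speed of sound of 343 m/s (air at about 20 °C) and a simple plane-wave picture, where moving the listening point by a distance $d$ along the propagation direction shifts the phase by $360^\circ \cdot d/\lambda$.

```python
SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees C (assumption)

def wavelength_m(f_hz: float) -> float:
    """Wavelength lambda = c / f in meters."""
    return SPEED_OF_SOUND / f_hz

def phase_shift_deg(f_hz: float, displacement_m: float) -> float:
    """Plane-wave phase shift from moving the listening point by displacement_m."""
    return 360.0 * displacement_m / wavelength_m(f_hz)

for f in (100, 1_000, 10_000):
    print(f"{f:6d} Hz: wavelength {wavelength_m(f) * 100:6.2f} cm, "
          f"3 mm offset -> {phase_shift_deg(f, 0.003):5.1f} deg")
```

At 10kHz, a 3 mm offset already costs about 31°, which rules out meaningful active cancellation there. At 100Hz, the same offset is a fraction of a degree.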

Big waves cancel easily, ripples don’t. ANC handles low frequencies well; high frequencies rely on passive isolation (the physical seal). Real-world ANC numbers: consumer in-ear buds ~20–30dB, premium over-ear ~30–40dB, concentrated below 1kHz. Bose’s 1978 prototype managed only 10–15dB. Algorithm and hardware improvements have roughly tripled that figure in decibels. Because the scale is logarithmic, the real gain is larger than it sounds: every additional 20 dB cuts the residual sound pressure by another factor of 10.

Sampling and Quantization

ANC is real-time digital signal processing, plain and simple. A microphone captures sound pressure, the ADC turns it into a stream of numbers, the DSP chip crunches them, and a DAC converts the result back into sound.

Two parameters define the quality of that number stream:

  • Sample rate: how many samples per second. Typical ANC systems run at 16 kHz or 48 kHz
  • Bit depth: how many bits per sample. It determines the dynamic range, roughly 6 dB per bit (about 96 dB for 16-bit, 144 dB for 24-bit)
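The "6 dB per bit" rule comes from the textbook result $\text{SQNR} \approx 6.02N + 1.76$ dB for a full-scale sine, which assumes the quantization error behaves like uniform white noise. The sketch below checks that rule empirically with a hypothetical uniform quantizer over $[-1, 1)$. It is not how any particular ADC works internally.

```python
import math

def quantize(x: float, bits: int) -> float:
    """Uniform quantizer over [-1, 1): round to the nearest multiple of step = 2 / 2**bits."""
    step = 2.0 / (1 << bits)
    q = round(x / step) * step
    return max(-1.0, min(1.0 - step, q))   # clip to the representable codes

def measured_sqnr_db(bits: int, n_samples: int = 50_000) -> float:
    """Quantize a near-full-scale sine and return the signal-to-quantization-noise ratio."""
    step = 2.0 / (1 << bits)
    amp = 1.0 - step                        # stay inside the representable range
    freq = math.sqrt(2) / 20                # irrational cycles/sample: samples never repeat
    sig = noise = 0.0
    for n in range(n_samples):
        x = amp * math.sin(2 * math.pi * freq * n)
        err = quantize(x, bits) - x
        sig += x * x
        noise += err * err
    return 10 * math.log10(sig / noise)

for bits in (8, 12, 16):
    rule = 6.02 * bits + 1.76
    print(f"{bits:2d}-bit: measured {measured_sqnr_db(bits):6.2f} dB, rule {rule:6.2f} dB")
```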

At a 48 kHz sample rate, the Nyquist frequency sits at 24 kHz. Human hearing tops out around 20 kHz, so 48 kHz covers the audible band just fine. The catch: anything above 24 kHz (some noise sources do produce ultrasonic content) folds back below 24 kHz after sampling, much of it into the audible range. An anti-aliasing filter must knock those high frequencies down before the ADC stage.

```mermaid
flowchart TD
    A["Analog Sound Pressure<br/>Continuous Signal"] -->|"Anti-aliasing Filter"| B["Low-pass Filter<br/>Cutoff just below fs/2"]
    B -->|"Sampling"| C["Discrete Samples<br/>fs samples/sec"]
    C -->|"Quantization"| D["Digital Sequence<br/>16bit/24bit"]
    D --> E["DSP Processing"]

    classDef analog fill:#f44336,color:#fff
    classDef digital fill:#2196F3,color:#fff
    class A,B analog
    class C,D,E digital
```

An intuitive way to grasp the sampling theorem $f_s > 2f_{\max}$ is the wagon-wheel effect. A wheel spins counter-clockwise, and you photograph it with a strobe. If the strobe rate drops below twice the rate at which spokes pass a fixed point, the spokes can appear to stand still or spin backwards. That’s aliasing: the sampling rate is too low, and high-frequency information masquerades as a low-frequency signal.

Same principle in acoustics. Without a low-pass filter (anti-aliasing filter) before the ADC, any content above $f_s/2$ folds back below $f_s/2$. Once there, it is indistinguishable from genuine signal and can no longer be filtered out. ANC anti-aliasing filters typically set their cutoff slightly below $f_s/2$ to leave room for the transition band.
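Folding can be computed directly. For a real tone at $f$ sampled at $f_s$, the apparent frequency is $f \bmod f_s$, reflected into $[0, f_s/2]$. The sketch below (the helper name is illustrative) also shows that a 30 kHz tone and an 18 kHz tone produce the same samples at 48 kHz. Once sampled, nothing can tell them apart.

```python
import math

def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency in [0, fs/2] of a real tone at f_hz after sampling at fs_hz."""
    f_mod = f_hz % fs_hz
    return min(f_mod, fs_hz - f_mod)

fs = 48_000
for f in (10_000, 30_000, 40_000, 47_000):
    print(f"{f} Hz sampled at {fs} Hz looks like {alias_frequency(f, fs):.0f} Hz")

# Sample sequences of a 30 kHz tone and its 18 kHz alias are identical at 48 kHz
a = [math.cos(2 * math.pi * 30_000 * n / fs) for n in range(16)]
b = [math.cos(2 * math.pi * 18_000 * n / fs) for n in range(16)]
print(max(abs(x - y) for x, y in zip(a, b)))   # only floating-point rounding remains
```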

Discrete-Time Convolution

Digital filters process input signals through convolution. For an FIR filter of length N, each output sample is a weighted sum:

$$ y[n] = \sum_{k=0}^{N-1} w_k \cdot x[n-k] $$

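One output sample costs N multiply-accumulate (MAC) operations. Filter lengths in ANC range from tens to hundreds of taps, and the MAC count directly drives DSP chip selection. For example, a 256-tap filter at 48 kHz needs 48,000 × 256 ≈ 12.3 million MACs per second for a single channel.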

Breaking down what each symbol means physically.

$y[n]$ is the output at the current time step. $w_k$ is the k-th filter coefficient (tap weight). $x[n-k]$ is the input sample from k steps ago. The entire expression computes a weighted sum of the current sample and the $N-1$ past samples.

A concrete example. Take an FIR filter with N=4. Computing $y[3]$ expands to four terms:

$$ y[3] = w_0 \cdot x[3] + w_1 \cdot x[2] + w_2 \cdot x[1] + w_3 \cdot x[0] $$

$x[3]$ is the current sample; $x[2], x[1], x[0]$ are the three previous ones. The coefficients $w_k$ determine how much each past sample contributes.
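The expansion maps one-to-one onto a loop. The sketch below evaluates $y[n] = \sum_k w_k \cdot x[n-k]$ directly and treats samples before $x[0]$ as zero. The coefficients and inputs are arbitrary values chosen so the arithmetic is easy to follow.

```python
def fir_output(w: list[float], x: list[float], n: int) -> float:
    """y[n] = sum_{k=0}^{N-1} w[k] * x[n-k]; samples before x[0] are treated as zero."""
    acc = 0.0
    for k, wk in enumerate(w):      # one multiply-accumulate per tap
        if n - k >= 0:
            acc += wk * x[n - k]
    return acc

w = [0.5, 0.25, 0.125, 0.0625]      # N = 4, arbitrary example coefficients
x = [1.0, 2.0, 3.0, 4.0]
print(fir_output(w, x, 3))          # 3.0625 = 0.5*4 + 0.25*3 + 0.125*2 + 0.0625*1
```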

It is a sliding window, and the filter length N defines the window width. Each new sample shifts the window forward by one step, and the weighted sum is recomputed. A larger N gives finer frequency resolution but costs more MACs per sample, and a linear-phase FIR of length N also delays the signal by $(N-1)/2$ samples.
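In a real-time loop, the sliding window is a delay line that holds the last N inputs. The sketch below uses Python's `collections.deque` with `maxlen`, so pushing a new sample automatically drops the oldest one. On a DSP chip the same idea is usually implemented as a circular buffer with a moving index rather than by shifting data. The `StreamingFIR` class name is illustrative.

```python
from collections import deque

class StreamingFIR:
    """Sample-by-sample FIR filter with an explicit delay line (sliding window)."""

    def __init__(self, taps):
        self.taps = list(taps)
        # window[k] holds x[n-k]; starts as all zeros (silence before the first sample)
        self.window = deque([0.0] * len(self.taps), maxlen=len(self.taps))

    def process(self, sample: float) -> float:
        # Newest sample enters on the left; maxlen drops the oldest from the right
        self.window.appendleft(sample)
        return sum(w * xk for w, xk in zip(self.taps, self.window))

fir = StreamingFIR([0.5, 0.25, 0.125, 0.0625])
outputs = [fir.process(s) for s in [1.0, 2.0, 3.0, 4.0]]
print(outputs)   # [0.5, 1.25, 2.125, 3.0625]
```

The last output matches the $y[3]$ expansion. Each call does exactly N multiplies, regardless of how long the stream runs.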

FFT and Frequency-Domain Processing

Time-domain convolution gets expensive as N grows. The frequency-domain alternative: buffer a block of samples, take the FFT, multiply by the filter’s spectrum, then IFFT back to time. Multiplying DFTs implements circular convolution, so overlap-add or overlap-save is needed to produce the linear convolution the filter actually requires across block boundaries. For long filters, the frequency-domain approach is far cheaper than direct convolution: roughly O(log N) operations per output sample instead of O(N).

Frequency-domain variants of FXLMS use the overlap-save method for block processing, trading off latency against compute efficiency.
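The overlap-save idea is sketched below. A small recursive radix-2 FFT is written out so the example needs nothing beyond the standard library; a real system would use an optimized FFT library instead. Each frame carries the previous `m-1` inputs as history. After the circular convolution, the first `m-1` outputs, which are corrupted by wrap-around, are discarded. The result is checked against direct convolution. The `nfft=16` default and the test signal are arbitrary choices.

```python
import cmath

def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(a):
    """Inverse FFT via conjugation: ifft(a) = conj(fft(conj(a))) / n."""
    n = len(a)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in a])]

def overlap_save(x, h, nfft=16):
    """FIR-filter x with h block by block in the frequency domain (nfft: power of two)."""
    m = len(h)
    step = nfft - (m - 1)                     # new input samples consumed per block
    if step <= 0:
        raise ValueError("nfft must be at least len(h)")
    H = fft(list(h) + [0.0] * (nfft - m))     # filter spectrum, computed once
    history = [0.0] * (m - 1)                 # last m-1 inputs carried between blocks
    y = []
    for start in range(0, len(x), step):
        block = list(x[start:start + step])
        block += [0.0] * (step - len(block))  # zero-pad a final partial block
        frame = history + block               # length nfft
        yb = ifft([a * b for a, b in zip(fft(frame), H)])
        y.extend(v.real for v in yb[m - 1:])  # drop the m-1 wrap-around outputs
        history = frame[len(frame) - (m - 1):]
    return y[:len(x)]

def direct_convolution(x, h):
    """Reference: y[n] = sum_k h[k] * x[n-k], samples before x[0] treated as zero."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

x = [float((7 * n) % 11 - 5) for n in range(50)]   # arbitrary deterministic input
h = [0.4, -0.3, 0.2, 0.1, -0.05]
fast, slow = overlap_save(x, h), direct_convolution(x, h)
print(max(abs(a - b) for a, b in zip(fast, slow)))  # floating-point rounding only
```

The latency trade-off shows up in `step`: no output for a block exists until all `step` new samples have arrived. Larger FFTs are cheaper per sample but add block delay.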

Three Acoustic Paths

Every ANC system has three acoustic paths to account for:

  • Primary path P(z): transfer function from the noise source to the error microphone. This is the path the original noise travels before the system cancels it
  • Secondary path S(z): transfer function from the anti-noise speaker to the error microphone. The anti-noise passes through the speaker’s frequency response, cavity reflections, and ear cup attenuation before reaching the ear
  • Reference path: transfer function from the noise source to the reference microphone. The reference mic picks up the input signal for the adaptive filter

```mermaid
flowchart TD
    SRC["Noise Source"] -->|"Primary Path P(z)"| EAR["Ear / Error Mic"]
    SRC -->|"Reference Path"| REF["Reference Mic"]
    REF --> DSP["Adaptive Filter W(z)"]
    DSP -->|"Secondary Path S(z)"| EAR

    classDef path fill:#FF9800,color:#fff
    classDef device fill:#9C27B0,color:#fff
    class SRC,EAR path
    class REF,DSP device
```

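The secondary path is the tricky part. The adaptive filter output passes through S(z) before it meets the noise at the superposition point, so the error signal carries this path’s imprint. Ignore it, and the LMS gradient estimate points in the wrong direction. Once the phase mismatch exceeds 90° at some frequency, the adaptation diverges.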

The physical reality of the secondary path: the anti-noise signal starts at the speaker diaphragm, passes through the speaker’s frequency response (low frequencies may roll off, high frequencies may show resonance peaks), travels through the air in the front cavity, penetrates the ear cup or eartip (partial absorption), reflects a few times inside the ear canal, and finally arrives at the eardrum. Both amplitude and phase are reshaped along the way.

It is not as simple as “the speaker fires an inverted copy of the noise.” The speaker itself behaves like a band-pass filter — it cannot move the diaphragm effectively at very low frequencies, and high-frequency output may be too directional. Secondary path identification measures the transfer function of this entire chain so the algorithm can compensate for these distortions.

Secondary Path Identification

Standard practice is offline identification of the secondary path. Play white noise or a swept sine through the speaker, capture the response at the error microphone, and estimate $\hat{S}(z)$.
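Offline identification can be simulated end to end. The sketch below assumes an FIR model of S(z) and uses normalized LMS (NLMS) as the estimator, driven by white Gaussian noise; the swept-sine alternative is not shown. `true_s` is a made-up response standing in for the real speaker-to-error-mic path. In hardware, the "measured" signal would come from the error microphone, with measurement noise that calls for a smaller step size and more samples.

```python
import random

def identify_secondary_path(true_s, n_taps=8, n_samples=20_000, mu=0.5, seed=0):
    """Simulated offline NLMS identification of S(z) from white-noise excitation."""
    rng = random.Random(seed)
    x_hist = [0.0] * max(n_taps, len(true_s))   # x_hist[k] = x[n-k]
    s_hat = [0.0] * n_taps
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)                  # white noise played through the speaker
        x_hist = [x] + x_hist[:-1]
        measured = sum(s * xk for s, xk in zip(true_s, x_hist))   # error-mic pickup
        predicted = sum(s * xk for s, xk in zip(s_hat, x_hist))   # current model output
        e = measured - predicted
        norm = sum(xk * xk for xk in x_hist[:n_taps]) + 1e-9      # avoid divide-by-zero
        for k in range(n_taps):                  # NLMS coefficient update
            s_hat[k] += mu * e * x_hist[k] / norm
    return s_hat

true_s = [0.0, 0.0, 0.6, 0.3, -0.2, 0.1]   # hypothetical: 2-sample delay + short response
s_hat = identify_secondary_path(true_s)
print([round(v, 3) for v in s_hat])
```

With no measurement noise, the estimate converges to `true_s`, padded with zeros out to the model length. The model must have enough taps to cover the real path's delay plus its response tail.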

The estimated $\hat{S}(z)$ feeds into the FXLMS (Filtered-X LMS) algorithm — the reference signal goes through $\hat{S}(z)$ first, then into the LMS coefficient update. This corrects the gradient calculation for the phase and magnitude distortion introduced by the secondary path.
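A minimal single-channel FXLMS loop makes the data flow concrete. This is a simulation sketch under stated assumptions, not a production implementation. The noise is a pure tone, P(z) and S(z) are short FIR filters with made-up coefficients, and there is no measurement noise. The sign convention is $e[n] = d[n] + (s * y)[n]$, because the anti-noise adds to the noise at the error mic. Minimizing $e^2$ then gives the update $w \leftarrow w - \mu\, e[n]\, x'[n]$, where $x'$ is the reference filtered through $\hat{S}(z)$.

```python
import math

def fir_dot(coeffs, hist):
    """sum_k coeffs[k] * hist[k], where hist[k] holds the value from k steps ago."""
    return sum(c * h for c, h in zip(coeffs, hist))

def fxlms(P, S, S_hat, n_taps=16, mu=0.01, n_samples=20_000, f=200.0, fs=8_000.0):
    """Simulated feedforward FXLMS on a pure tone. Returns residual-to-noise power
    (dB) at the error mic over the last 1000 samples; negative means attenuation."""
    w = [0.0] * n_taps
    x_hist = [0.0] * max(n_taps, len(P), len(S_hat))  # reference x[n-k]
    y_hist = [0.0] * len(S)                           # filter output y[n-k]
    xf_hist = [0.0] * n_taps                          # filtered reference x'[n-k]
    e_pow = d_pow = 0.0
    for n in range(n_samples):
        x_hist = [math.sin(2 * math.pi * f * n / fs)] + x_hist[:-1]
        d = fir_dot(P, x_hist)                   # noise reaching the error mic via P(z)
        y_hist = [fir_dot(w, x_hist)] + y_hist[:-1]   # adaptive filter output W(z)
        e = d + fir_dot(S, y_hist)               # superposition at the error mic via S(z)
        xf_hist = [fir_dot(S_hat, x_hist)] + xf_hist[:-1]  # reference through S_hat(z)
        for k in range(n_taps):                  # gradient descent on e^2
            w[k] -= mu * e * xf_hist[k]
        if n >= n_samples - 1000:
            e_pow += e * e
            d_pow += d * d
    return 10 * math.log10(max(e_pow, 1e-300) / d_pow)

P = [0.0, 0.0, 0.0, 0.8, 0.4, 0.2]   # hypothetical primary path
S = [0.0, 0.5, 0.25]                 # hypothetical secondary path (1-sample delay)
perfect = fxlms(P, S, S_hat=S)                         # exact estimate of S(z)
scaled = fxlms(P, S, S_hat=[0.9 * s for s in S])       # 10% gain error, no phase error
print(round(perfect, 1), round(scaled, 1))
```

A pure gain error in $\hat{S}(z)$ only rescales the effective step size, so both runs converge. It is phase error in the estimate that threatens stability.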

Offline identification works if the acoustic environment stays stable. In-ear buds see little change. On over-ear headphones, the secondary path varies with fit tightness. Head-tracking ANC systems may need online $\hat{S}(z)$ updates.