Embedded ANC: ESP32 Practice
The ESP32-S3’s dual-core Xtensa LX7 processor with vector instruction set extensions is well-suited for the real-time DSP workloads required by embedded ANC. Combined with the ESP-DSP library, efficient adaptive filtering becomes practical on this platform.
I2S Microphone Capture
ANC requires at least two synchronous input channels: a reference microphone (capturing ambient noise) and an error microphone (capturing the residual noise after cancellation). A single ESP32-S3 I2S peripheral can receive both at once, for example as the two slots of a stereo frame, which keeps the channels sample-synchronous. Configuring it for 16-bit, 16 kHz sampling suffices for consumer-grade ANC.
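As a minimal sketch of how the two channels might be separated after an I2S read, assume the driver delivers interleaved 16-bit stereo frames with the reference microphone in the first slot and the error microphone in the second. The slot order and the helper name `split_stereo_frame` are assumptions for illustration, not taken from the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split interleaved stereo PCM (slot 0, slot 1, slot 0, ...) into two
 * float arrays scaled to [-1.0, 1.0).
 * Assumption: slot 0 = reference mic, slot 1 = error mic. */
static void split_stereo_frame(const int16_t *interleaved, size_t frames,
                               float *ref, float *err)
{
    assert(interleaved != NULL && ref != NULL && err != NULL);
    for (size_t i = 0; i < frames; ++i) {
        ref[i] = (float)interleaved[2 * i]     / 32768.0f;
        err[i] = (float)interleaved[2 * i + 1] / 32768.0f;
    }
}
```

Converting to float once per frame keeps the later filter arithmetic independent of the PCM word size.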
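In the legacy I2S driver configuration (i2s_config_t), setting use_apll = true selects the Audio PLL (APLL) as the I2S clock source. The APLL is a dedicated PLL circuit with significantly less jitter than the default I2S clock. Clock jitter translates directly into sampling timing errors, which in a phase-sensitive system like ANC degrade cancellation depth. APLL availability depends on the chip: the original ESP32 and ESP32-S2 provide one, whereas the ESP32-S3 clock tree does not include it, so on the S3 this flag is expected to have no effect. The dma_buf_count = 4 and dma_buf_len = 1024 settings mean 4 rotating buffers of 1024 samples each, for a total of 4096 samples, or 256 ms at 16 kHz. A larger buffer tolerates occasional CPU scheduling delays but increases I2S capture latency, because samples reach the task only once a DMA buffer has filled: a single 1024-sample buffer already spans 64 ms. This is a trade-off between real-time responsiveness and robustness, and the 8 ms capture figure in the latency budget assumes a DMA buffer length matching the 128-sample frame.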
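The buffer arithmetic above can be checked with a small helper. This is only a sketch; the function name `dma_buffer_ms` is hypothetical and not part of ESP-IDF.

```c
#include <assert.h>
#include <stdint.h>

/* Duration in milliseconds covered by `buf_count` DMA buffers of
 * `buf_len` samples each, at `sample_rate_hz`. */
static float dma_buffer_ms(uint32_t buf_count, uint32_t buf_len,
                           uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return 1000.0f * (float)(buf_count * buf_len) / (float)sample_rate_hz;
}
```

With the values from the text, `dma_buffer_ms(4, 1024, 16000)` gives 256 ms for the whole ring, `dma_buffer_ms(1, 1024, 16000)` gives 64 ms for a single buffer, and `dma_buffer_ms(1, 128, 16000)` gives the 8 ms frame period.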
The diagram below shows the complete ANC data flow from microphones through DMA into the NLMS processor, and finally to the speaker output:
flowchart TD
MIC1["Reference Mic<br/>I2S CH0"] --> DMA["DMA Double Buffer<br/>128 samples/frame"]
MIC2["Error Mic<br/>I2S CH1"] --> DMA
DMA --> DSP["NLMS Processing<br/>Normalized Step"]
DSP --> DAC["I2S Output<br/>16bit PCM"]
DAC --> SPK["Speaker"]
classDef input fill:#2196F3,color:#fff
classDef proc fill:#9C27B0,color:#fff
classDef output fill:#4CAF50,color:#fff
class MIC1,MIC2,DMA input
class DSP proc
class DAC,SPK output
Adaptive Filter Core
The NLMS filter is the heart of the entire ANC system. On the ESP32-S3, a filter order of 64 covers the dominant low-frequency noise below 1 kHz. A circular buffer holds the reference history, so inserting a new sample costs O(1) instead of the O(N) shift a linear delay line would need; the convolution itself still costs O(N) multiply-accumulates per sample.
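A minimal sketch of such a circular buffer follows, assuming a power-of-two length so the wrap can be done with a bit mask. The names `ring_t`, `ring_push`, and `ring_at` are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ANC_TAPS 64  /* filter order from the text; must be a power of two */

typedef struct {
    float  data[ANC_TAPS];
    size_t head;          /* index of the most recent sample */
} ring_t;

/* O(1) insert: overwrite the oldest slot instead of shifting the array. */
static void ring_push(ring_t *rb, float x)
{
    assert(rb != NULL);
    rb->head = (rb->head + 1) & (ANC_TAPS - 1);
    rb->data[rb->head] = x;
}

/* Sample delayed by k steps (k = 0 is the newest), 0 <= k < ANC_TAPS. */
static float ring_at(const ring_t *rb, size_t k)
{
    assert(rb != NULL && k < ANC_TAPS);
    return rb->data[(rb->head + ANC_TAPS - k) & (ANC_TAPS - 1)];
}
```

The mask replaces a modulo operation, which is why a power-of-two length is a convenient choice on a small core.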
The secondary path s_hat has 32 taps. It is typically obtained through offline identification—playing white noise before deployment and recording it with the error microphone, then estimating the transfer function from speaker to error mic via LMS.
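The offline identification step can be sketched as a plain LMS system-identification loop. This is an illustrative version under stated assumptions: `played` is the white-noise excitation sent to the speaker, `recorded` is the error-microphone capture aligned to it, and the function name `identify_secondary_path` and the step size passed in are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SEC_TAPS 32  /* secondary-path length from the text */

/* Estimate s_hat so that (played convolved with s_hat) approximates
 * `recorded`, using plain LMS. */
static void identify_secondary_path(const float *played, const float *recorded,
                                    size_t n, float mu, float s_hat[SEC_TAPS])
{
    assert(played != NULL && recorded != NULL && s_hat != NULL);
    for (size_t k = 0; k < SEC_TAPS; ++k) s_hat[k] = 0.0f;

    for (size_t i = 0; i < n; ++i) {
        /* Model output: past excitation filtered by the current estimate. */
        float y = 0.0f;
        for (size_t k = 0; k < SEC_TAPS && k <= i; ++k)
            y += s_hat[k] * played[i - k];

        const float e = recorded[i] - y;          /* modelling error */

        for (size_t k = 0; k < SEC_TAPS && k <= i; ++k)
            s_hat[k] += mu * e * played[i - k];   /* LMS coefficient update */
    }
}
```

The step size must stay well below 2 / (SEC_TAPS × excitation power) for the loop to remain stable, so a quieter or louder excitation calls for a different `mu`.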
NLMS Frame Processing
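Each 128-sample audio frame passes through the same real-time steps: read both channels over I2S, run the NLMS/FxLMS filter computation, and write the anti-noise output.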
fast_convolve computes the convolution with ring buffer indexing: the read pointer steps backward from the newest sample and wraps to the end of the buffer after passing index 0, so no data is reordered per sample. The regularization term 1e-6f in the NLMS coefficient update prevents division by zero when the reference signal power is near zero. The step size mu = 0.0005 balances convergence speed against steady-state misadjustment: fast enough to track steady noise such as fans and engines, without excessive coefficient fluctuation after convergence.
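The following is a minimal sketch of what a ring-buffer convolution and NLMS update of this kind could look like. Only the name `fast_convolve`, the 1e-6f regularization, and the 64-tap length come from the text; the function bodies, the `nlms_update` name, and the signatures are assumptions. The error is defined here as desired minus filter output, as in system identification; in a full ANC loop the sign convention depends on how the error microphone signal is defined, and the reference would first be filtered through s_hat (FxLMS).

```c
#include <assert.h>
#include <stddef.h>

#define TAPS 64
#define MASK (TAPS - 1)

/* y = sum_k w[k] * x[n-k], reading the ring backward from `head`
 * (the newest sample) and wrapping with a bit mask. */
static float fast_convolve(const float *w, const float *ring, size_t head)
{
    float acc = 0.0f;
    for (size_t k = 0; k < TAPS; ++k)
        acc += w[k] * ring[(head + TAPS - k) & MASK];
    return acc;
}

/* One NLMS step: w[k] += mu * e * x[n-k] / (||x||^2 + eps). */
static void nlms_update(float *w, const float *ring, size_t head,
                        float e, float mu)
{
    assert(w != NULL && ring != NULL && head < TAPS);
    float power = 1e-6f;               /* regularization term */
    for (size_t k = 0; k < TAPS; ++k)
        power += ring[k] * ring[k];    /* order does not matter for power */
    const float g = mu * e / power;
    for (size_t k = 0; k < TAPS; ++k)
        w[k] += g * ring[(head + TAPS - k) & MASK];
}
```

Because the update is normalized by the reference power, the same `mu` behaves consistently whether the noise is loud or quiet, which is the main reason to prefer NLMS over plain LMS here.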
Dual-Core Task Assignment
The ANC loop is pinned to CPU1 (APP_CPU), while CPU0 (PRO_CPU) handles the Wi-Fi/Bluetooth stacks and other system tasks. With a 16 kHz sample rate and a frame size of 128 samples, each frame has a processing budget of approximately 8 ms.
The diagram below illustrates the specific division of labor between the two cores—CPU0 handles system management while CPU1 is dedicated to the real-time ANC pipeline:
flowchart TD
subgraph CPU0["CPU0 - System Management"]
A1["WiFi/BT Stack"]
A2["Config Management"]
A3["OTA Updates"]
end
subgraph CPU1["CPU1 - Real-time ANC"]
B1["I2S Read<br/>Dual-channel Audio"]
B2["NLMS/FXLMS<br/>Filter Computation"]
B3["I2S Write<br/>Anti-noise Output"]
B1 --> B2 --> B3 --> B1
end
classDef sys fill:#2196F3,color:#fff
classDef anc fill:#f44336,color:#fff
class A1,A2,A3 sys
class B1,B2,B3 anc
Using heap_caps_malloc(..., MALLOC_CAP_INTERNAL) ensures all buffers reside in internal SRAM, avoiding PSRAM access latency (typically 3-5× slower than SRAM) that could cause processing overruns. A task stack size of 8192 bytes is sufficient for the DSP call chain’s local variables.
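The I2S output (I2S_NUM_1) drives an external DAC or Class-D amplifier, emitting anti-phase sound waves that cancel the original noise acoustically. Output values are clamped via fmaxf/fminf before conversion to 16-bit PCM, so that an overshoot saturates instead of wrapping around on integer overflow, which would be far more audible than clipping.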
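A minimal sketch of the clamping step, assuming the filter output is a float in the nominal range [-1, 1] and the DAC expects signed 16-bit PCM. The helper name `float_to_pcm16` is hypothetical, and plain comparisons are used here in place of fmaxf/fminf with the same effect.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a float sample to 16-bit PCM, saturating at the int16 limits
 * rather than letting the cast overflow. */
static int16_t float_to_pcm16(float y)
{
    assert(y == y);                               /* reject NaN */
    float scaled = y * 32767.0f;
    if (scaled > 32767.0f)  scaled = 32767.0f;    /* same effect as fminf */
    if (scaled < -32768.0f) scaled = -32768.0f;   /* same effect as fmaxf */
    return (int16_t)scaled;                       /* truncates toward zero */
}
```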
CPU1 is chosen over CPU0 because, by default, ESP-IDF runs the Wi-Fi and Bluetooth stacks on CPU0. Wi-Fi hardware interrupts are high-priority and frequent: every beacon frame reception or TCP ACK triggers one. These interrupts preempt user tasks on CPU0, causing unpredictable jitter in ANC frame processing. CPU1 (APP_CPU) is unaffected by Wi-Fi interrupts, making it the better choice for real-time DSP tasks.
ESP32-S3 vs ESP32: The S3’s Xtensa LX7 processor adds vector instruction extensions (the EE.* family, which Espressif documents as PIE) that can complete multiple multiply-accumulate (MAC) operations in a single cycle, making NLMS convolution updates approximately 3-5× faster than on the ESP32 (LX6). Both cores on the ESP32 are LX6, which lacks vector extensions and has significantly lower DSP efficiency. If using an ESP32 (non-S3), consider reducing the filter order to 32 or shortening the frame to 64 samples to maintain real-time performance.
Build Configuration
The following options must be set in sdkconfig:
- CONFIG_ESP_DSP_ENABLED=y: enable the ESP-DSP library
- CONFIG_DSP_MAX_FFT_SIZE=4096: maximum FFT size (required by ESP-DSP initialization)
- CONFIG_FREERTOS_UNICORE=n: enable dual-core scheduling
- CONFIG_I2S_SUPPRESS_DEPRECATE_WARN=y: silence the deprecation warning raised when building against the legacy I2S API (ESP-IDF v5.x deprecates the legacy driver in favor of the new i2s_chan_handle_t driver model)
For ESP-IDF v5.0 and above, migrating to the new I2S driver model is recommended, though the processing logic remains the same.
Latency Budget
The total pipeline latency accumulates as follows:
| Stage | Latency |
|---|---|
| I2S DMA capture | 1 frame (128 samples) ≈ 8 ms |
| ADC conversion + transfer | ~0.5 ms |
| Frame processing (NLMS) | ~2-3 ms |
| DAC conversion + transfer | ~0.5 ms |
| Acoustic path | ~0.1-0.3 ms |
| Total | ~11-12 ms |
The following diagram breaks down the latency budget visually:
flowchart TD
A["I2S DMA Capture<br/>~8ms"] --> B["ADC Conversion + Transfer<br/>~0.5ms"]
B --> C["NLMS Compute<br/>~2-3ms"]
C --> D["DAC Conversion + Transfer<br/>~0.5ms"]
D --> E["Acoustic Path<br/>~0.1-0.3ms"]
E --> TOTAL["Total Latency<br/>~11-12ms<br/>Effective Freq: ~40-45Hz"]
classDef stage fill:#2196F3,color:#fff
classDef result fill:#f44336,color:#fff
class A,B,C,D,E stage
class TOTAL result
The effective frequency ceiling at this latency is approximately 1 / (2 × total latency) ≈ 40-45 Hz, so this approach primarily targets low-frequency rumble below 50 Hz, such as air conditioner compressors, fans, and road noise. To increase the effective frequency range, reduce the frame size, lower the filter order, or run on a more powerful MCU.
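The rule of thumb above is easy to evaluate for other latency figures. The helper below is a sketch with a hypothetical name, `anc_freq_ceiling_hz`, and simply encodes 1 / (2 × total latency).

```c
#include <assert.h>

/* Approximate upper frequency (Hz) at which cancellation is still
 * effective, given the total pipeline latency in milliseconds. */
static float anc_freq_ceiling_hz(float total_latency_ms)
{
    assert(total_latency_ms > 0.0f);
    return 1000.0f / (2.0f * total_latency_ms);
}
```

For the 11-12 ms total in the table, this gives roughly 45 Hz and 42 Hz respectively; halving the latency would double the ceiling.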