ESP32-CAM Monitor: DIY Auto Flash for Dark Scenes

Why

I built a surveillance camera with ESP32-S3 before, and it worked well. Later, while rummaging through a drawer, I found an AI-Thinker ESP32-CAM development board — that classic board costing about ten bucks with a built-in OV2640 camera and TF card slot. No reason to let it go to waste, so I built another one: ai-thinker-esp32-cam.

This time I wrote the firmware from scratch using ESP-IDF again, with similar capabilities to the previous project but with lots of adaptations for the AI-Thinker board. Here’s what it ended up doing:

  • Real-time MJPEG stream viewable in browser by opening the device IP
  • Motion detection with auto photo capture
  • Auto flash for photo capture in dark environments
  • Save photos to TF card, with WebDAV upload to NAS
  • Web management interface, configurable from phone
  • Prometheus /metrics endpoint for monitoring integration
  • WiFi AP/STA dual mode, phone-based network config on first boot

The features aren’t complex, but the flash logic took me several days. The reason is simple: I was too lazy to add a photosensor. Saved a few wires, but wrote a few hundred extra lines of code.

Flash Logic: The Price of Skipping a Sensor

The requirement was one sentence; the implementation took days

Automatically turn on the flash when it gets dark — that’s the whole requirement. Adding a photoresistor to an ADC pin and reading a voltage value would take about half an hour. But the AI-Thinker ESP32-CAM has tight GPIO availability — the flash uses GPIO4, TF card uses GPIO2/14/15, the camera has a bunch of pins, and there aren’t many left. Adding another sensor would require a breadboard, Dupont wires, a voltage divider resistor… just thinking about it felt like too much work.

So why not use the camera itself to sense brightness? The camera is a light sensor, after all — bright scene means bright image, dark scene means dark image. The catch is that the camera is always outputting in JPEG mode (for MJPEG streaming and motion detection), so I needed to determine ambient brightness without disrupting normal operation.

Reading OV2640 exposure registers — didn’t work

My first thought was to read the OV2640’s AEC (Auto Exposure Control) registers. In theory, the exposure value reflects ambient brightness: the darker the scene, the higher the exposure, and reading it adds no latency or overhead. Perfect.

In practice, this register was completely unreliable in continuous JPEG output mode — the aec_value stayed locked at the max value of 671 regardless of actual lighting conditions. I checked the OV2640 datasheet but found no clear explanation. It might be a bug in the AEC feedback loop under continuous output mode, or just a “special feature” of this sensor. Either way, this approach was a dead end.

Grayscale pixel sampling: using the camera as a photosensor

Since reading registers didn’t work, let’s read pixels directly. The approach is straightforward:

  1. Do brightness probing every 30 seconds
  2. Temporarily switch the camera from JPEG mode to grayscale mode (GRAYSCALE + QQVGA 160×120)
  3. Grab one frame, iterate through all pixels, calculate average brightness
  4. Switch back to JPEG mode and continue
c
// Switch to grayscale mode
esp_err_t ret = camera_init_grayscale();
if (ret != ESP_OK) {
    ESP_LOGE(TAG, "Grayscale probe: init failed");
    return;
}

// Discard first two frames, white balance hasn't stabilized after mode switch
for (int i = 0; i < 2; i++) {
    camera_fb_t *fb = esp_camera_fb_get();
    if (fb) esp_camera_fb_return(fb);
}

// Grab one frame and calculate average brightness
camera_fb_t *fb = NULL;
if (camera_capture(&fb) != ESP_OK || fb == NULL) {
    camera_restore_jpeg();
    return;
}

// Grayscale mode: one byte per pixel, just sum and average
uint32_t sum = 0;
for (size_t i = 0; i < fb->len; i++) {
    sum += fb->buf[i];
}
uint8_t avg = (fb->len > 0) ? (uint8_t)(sum / fb->len) : 0;  // Guard against an empty frame
uint8_t pct = (uint8_t)((uint32_t)avg * 100 / 255);  // Convert to percentage

bool is_dark = (pct < cfg->flash_threshold);

// Return the frame buffer to the driver (assuming camera_capture() wraps esp_camera_fb_get())
esp_camera_fb_return(fb);

// Switch back to JPEG mode
camera_restore_jpeg();

QQVGA 160×120 totals 19,200 pixels. Iterating over them takes a few hundred microseconds, a negligible cost. In grayscale mode each pixel is one byte (0 to 255), so summing and averaging is all it takes: simple and crude.
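
The averaging step can be pulled out into a helper that runs on any 8-bit grayscale buffer, which makes it easy to check on a PC before flashing. This is a minimal sketch; the name `gray_brightness_pct` is my own and not taken from the project.

```c
#include <stddef.h>
#include <stdint.h>

/* Average brightness of an 8-bit grayscale buffer, as a percentage (0-100).
 * Hypothetical helper name; returns 0 for an empty buffer instead of
 * dividing by zero. */
static uint8_t gray_brightness_pct(const uint8_t *buf, size_t len)
{
    if (buf == NULL || len == 0) {
        return 0;
    }
    /* 19,200 pixels * 255 = 4,896,000, well within uint32_t range. */
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    uint32_t avg = sum / (uint32_t)len;
    return (uint8_t)(avg * 100u / 255u);
}
```

On the device, the call would be `gray_brightness_pct(fb->buf, fb->len)` on the grayscale frame.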

However, there’s a prerequisite: if someone is viewing the MJPEG stream in a browser, you can’t do grayscale probing — switching modes would break the MJPEG stream. In that case, we fall back to the alternative below.

JPEG size as brightness proxy: a zero-cost hack

When someone is watching the stream, you can’t switch modes, but you still need to determine brightness. What to do? The answer is hidden in every JPEG frame: JPEG file size itself is a brightness indicator.

The principle is simple. In dark scenes, most of the frame is black, and large areas of uniform color compress very efficiently in JPEG, resulting in small files. Brighter scenes show more detail, so the files get larger. I tested at SVGA resolution with quality=10 and got these numbers:

  • Dark (lights off): 12 to 14 KB
  • Normal indoor: 14 to 17 KB
  • Bright (facing window): 17 to 25 KB

Based on this I created a linear mapping:

c
if (s_brightness.method != 2) {  // 2 = a grayscale probe result is in use
    // No grayscale data, estimate from JPEG size
    uint32_t jpeg_kb = (uint32_t)frame_len / 1024;  // Whole KB, so pct moves in steps of 10
    uint8_t pct;

    // Linear map: 12 KB or less -> 0%, 22 KB or more -> 100%
    if (jpeg_kb >= 22) {
        pct = 100;
    } else if (jpeg_kb <= 12) {
        pct = 0;
    } else {
        pct = (uint8_t)((jpeg_kb - 12) * 100 / 10);
    }

    s_brightness.method = 1;  // 1 = estimated from JPEG size
    s_brightness.brightness_pct = pct;
    s_brightness.is_dark = (pct < cfg->flash_threshold);
}

Less accurate than grayscale sampling, but with zero additional overhead — motion detection already grabs frames for frame-differencing, and JPEG size is obtained as a side effect without doing anything extra.
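
Because the mapping above truncates the size to whole kilobytes, the percentage can only land on multiples of 10. If finer steps matter, one possible variation (not what the project does) is to interpolate on the byte count directly. The 12 KB and 22 KB endpoints come from the code above; the function and macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Endpoints taken from the article's mapping; names are illustrative. */
#define JPEG_DARK_BYTES   (12u * 1024u)  /* at or below: treated as 0%   */
#define JPEG_BRIGHT_BYTES (22u * 1024u)  /* at or above: treated as 100% */

/* Linear interpolation on bytes instead of whole kilobytes. */
static uint8_t jpeg_len_to_brightness_pct(size_t frame_len)
{
    if (frame_len <= JPEG_DARK_BYTES) {
        return 0;
    }
    if (frame_len >= JPEG_BRIGHT_BYTES) {
        return 100;
    }
    return (uint8_t)((frame_len - JPEG_DARK_BYTES) * 100u /
                     (JPEG_BRIGHT_BYTES - JPEG_DARK_BYTES));
}
```

A 12.5 KB frame maps to 5% here, whereas the whole-kilobyte version reports 0%. Whether that extra resolution means anything depends on how noisy the frame sizes are in a given scene.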

The relationship between the two approaches:

  • Grayscale probing (method=2): Accurate, but requires mode switching, not usable when someone is viewing the stream
  • JPEG size estimation (method=1): Moderate accuracy, but always available with zero extra cost
mermaid
flowchart TD
    A["Per-frame motion detection"] --> B@{shape: diam, label: "Grayscale probing available?"}
    B -->|"Probed within 30s & no MJPEG clients"| C["Use grayscale result"]
    B -->|"MJPEG client online or timeout"| D["Estimate via JPEG size"]
    C --> E@{shape: diam, label: "Brightness < threshold?"}
    D --> E
    E -->|"Yes"| F["Mark as dark scene"]
    E -->|"No"| G["Mark as bright scene"]
    F --> H["Flash ON when motion detected"]
    G --> I["Normal photo capture"]

    classDef decision fill:#fff3e0,stroke:#ff9800,stroke-width:2px
    classDef process fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    classDef result fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
    class B,E decision
    class A,C,D,H,I process
    class F,G result
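
The first decision in the flowchart can be written as a small pure function. This is a sketch under my own assumptions: the enum values mirror method=1 and method=2 from the text, while the function name, the parameters, and the millisecond timestamps are hypothetical and not taken from the project.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    BRIGHTNESS_FROM_JPEG_SIZE = 1,  /* method=1 in the article */
    BRIGHTNESS_FROM_GRAYSCALE = 2,  /* method=2 in the article */
} brightness_method_t;

#define PROBE_MAX_AGE_MS 30000u     /* 30 s probe interval from the article */

/* Trust the grayscale result only if a probe exists, it is recent enough,
 * and nobody is watching the MJPEG stream. Otherwise use the JPEG size. */
static brightness_method_t pick_brightness_method(uint32_t now_ms,
                                                  uint32_t last_probe_ms,
                                                  bool has_probe,
                                                  int mjpeg_clients)
{
    if (mjpeg_clients > 0 || !has_probe) {
        return BRIGHTNESS_FROM_JPEG_SIZE;
    }
    /* Unsigned subtraction stays correct across timer wraparound. */
    if ((uint32_t)(now_ms - last_probe_ms) > PROBE_MAX_AGE_MS) {
        return BRIGHTNESS_FROM_JPEG_SIZE;
    }
    return BRIGHTNESS_FROM_GRAYSCALE;
}
```

Keeping the decision free of camera calls means it can be exercised on a host machine with made-up timestamps.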

Flash control: 80% duty cycle to be safe

The AI-Thinker ESP32-CAM flash is connected to GPIO4. There’s a hardware trap here: the board has no current-limiting resistor for the flash. The LED is directly connected to the GPIO. Running at full power could exceed current limits and potentially damage the board over time.

So I used LEDC PWM control, capping the duty cycle at 80% (205/255), keeping a safety margin:

c
#define FLASH_GPIO            4
#define FLASH_LEDC_TIMER      LEDC_TIMER_1   // Timer 0 is used by camera XCLK
#define FLASH_LEDC_CHANNEL    LEDC_CHANNEL_1 // Channel 0 is used by camera XCLK
#define FLASH_PWM_FREQ        2000           // 2 kHz
#define FLASH_PWM_RES         LEDC_TIMER_8_BIT
#define FLASH_PWM_DUTY        205            // ~80%
// FLASH_LEDC_SPEED (the LEDC speed mode) is not shown in this excerpt

// Flash ON
ledc_set_duty(FLASH_LEDC_SPEED, FLASH_LEDC_CHANNEL, FLASH_PWM_DUTY);
ledc_update_duty(FLASH_LEDC_SPEED, FLASH_LEDC_CHANNEL);

// Flash OFF
ledc_set_duty(FLASH_LEDC_SPEED, FLASH_LEDC_CHANNEL, 0);
ledc_update_duty(FLASH_LEDC_SPEED, FLASH_LEDC_CHANNEL);

Note that the timer and channel can’t be chosen arbitrarily. The camera XCLK already uses Timer 0 / Channel 0, so the flash needs a different pair (I used Timer 1 / Channel 1); otherwise the two peripherals conflict. I didn’t find this trap called out in the ESP-IDF documentation, so you only discover it when things stop working.
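
For anyone who wants a different cap than 80%, the duty value follows from the PWM resolution. The helper below is a hypothetical sketch (the name and the round-to-nearest choice are mine): it converts a percentage into a duty count for a given bit width.

```c
#include <stdint.h>

/* Convert a brightness cap in percent to a PWM duty count.
 * resolution_bits is expected to be between 1 and 20.
 * Rounds to the nearest integer; percent is clamped to 100. */
static uint32_t pwm_duty_for_percent(uint32_t percent, uint32_t resolution_bits)
{
    if (percent > 100u) {
        percent = 100u;
    }
    uint32_t max_duty = (1u << resolution_bits) - 1u;  /* 255 for 8 bits */
    return (max_duty * percent + 50u) / 100u;
}
```

With 8-bit resolution, 80% comes out as 204. The 205 used above is about 80.4%, so the two are equivalent in practice.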

Flash only turns on during photo capture, never during detection

This design decision is crucial. The flash only turns on when frame-differencing detects motion and a photo needs to be saved. The flash is never on during frame-differencing comparison.

The reason is obvious: turning the flash on and off causes dramatic brightness changes between frames. The frame-differencing algorithm would see the entire scene “moving” and report 80%+ differences, all false alarms.
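
To make the false-alarm effect concrete, here is a generic frame-differencing sketch on raw grayscale buffers. The project compares JPEG frames and its exact algorithm is not shown here, so treat this purely as an illustration; the function name and the per-pixel delta parameter are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Percentage of pixels whose absolute change exceeds pixel_delta.
 * Illustrative only, not the project's detector. */
static uint8_t changed_pixels_pct(const uint8_t *prev, const uint8_t *curr,
                                  size_t len, uint8_t pixel_delta)
{
    if (prev == NULL || curr == NULL || len == 0) {
        return 0;
    }
    size_t changed = 0;
    for (size_t i = 0; i < len; i++) {
        int diff = abs((int)curr[i] - (int)prev[i]);
        if (diff > (int)pixel_delta) {
            changed++;
        }
    }
    return (uint8_t)(changed * 100u / len);
}
```

If the flash lifts every pixel from around 20 to around 120 between two frames, this reports 100% change even though nothing in the scene moved.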

c
static void handle_motion_event(bool dark_scene)
{
    camera_fb_t *fb = NULL;
    if (dark_scene) {
        ESP_LOGI(TAG, "Dark scene, flash ON for photo");
        // Flash ON → wait 200ms → capture → flash OFF
        ledc_set_duty(FLASH_LEDC_SPEED, FLASH_LEDC_CHANNEL, FLASH_PWM_DUTY);
        ledc_update_duty(FLASH_LEDC_SPEED, FLASH_LEDC_CHANNEL);
        vTaskDelay(pdMS_TO_TICKS(200));  // Wait for white balance to catch up
        if (camera_capture(&fb) != ESP_OK) {
            fb = NULL;  // Flash capture failed, retry below without flash
        }
        ledc_set_duty(FLASH_LEDC_SPEED, FLASH_LEDC_CHANNEL, 0);
        ledc_update_duty(FLASH_LEDC_SPEED, FLASH_LEDC_CHANNEL);
    }
    // Bright scene (or the flash capture failed): capture without flash
    if (fb == NULL && (camera_capture(&fb) != ESP_OK || fb == NULL)) {
        ESP_LOGE(TAG, "Capture failed, skipping photo");
        return;
    }
    // ... save to TF card / upload to NAS ...
}

After turning on the flash, I wait 200ms before capturing. Right after the flash turns on, the white balance hasn’t caught up yet — colors would be off. Waiting lets the OV2640’s auto white balance stabilize for a usable photo.

Auto-lowering motion detection threshold in dark scenes

One more detail: in dark scenes, JPEG frame-differencing values are naturally lower (darker scenes have fewer details, less inter-frame difference after compression). Using normal thresholds might miss motion events. So in dark scenes, the threshold is automatically reduced to one quarter:

c
uint8_t effective_thresh = dark_scene
    ? (cfg->motion_threshold > 20 ? cfg->motion_threshold / 4 : 5)
    : cfg->motion_threshold;

The conditional also sets a floor: a configured threshold of 20 or less becomes 5 rather than being divided further.
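
One edge case in the expression above: if the configured threshold is already below 5, the dark-scene value of 5 is higher than the normal one. A possible variation (my own, not the project's code) keeps the same quarter-with-floor rule but never raises the threshold. The function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DARK_THRESHOLD_FLOOR 5u  /* floor used in the article's expression */

/* Quarter the threshold in dark scenes, keep a floor of 5,
 * but never return more than the configured value. */
static uint8_t effective_motion_threshold(bool dark_scene, uint8_t configured)
{
    if (!dark_scene) {
        return configured;
    }
    uint8_t quarter = (uint8_t)(configured / 4u);
    uint8_t lowered = (quarter > DARK_THRESHOLD_FLOOR) ? quarter
                                                       : (uint8_t)DARK_THRESHOLD_FLOOR;
    return (lowered < configured) ? lowered : configured;
}
```

For thresholds of 5 and above this returns the same values as the original expression (40 gives 10, 20 gives 5); only thresholds below 5 behave differently.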

Other Features

The flash consumed the most development time. The rest is straightforward, so a quick list will do:

  • MJPEG stream: Double buffering + PSRAM, viewable directly in browser
  • WiFi management: AP/STA dual mode, AP mode for first-time network config, then auto STA
  • TF card: GPIO14 is shared with camera — must mount TF card after camera init, wrong order causes crashes
  • NAS upload: WebDAV/HTTP POST, auto-upload after capture
  • Web interface: Configure parameters, check status, download photos — all from phone
  • Prometheus metrics: /metrics endpoint, feed into Grafana dashboard

Closing Thoughts

Looking back, soldering a photoresistor would have taken half an hour. But using the camera itself for brightness detection forced me to thoroughly understand OV2640 mode switching, JPEG compression characteristics, LEDC PWM resource allocation, and other details. Was it worth it? Hard to say, but the tinkering process was certainly interesting.

All code is on GitHub: Mi-Bee-Studio/ai-thinker-esp32-cam, with build and flash instructions in the README.