Spatial Domain Restoration & Edge-Preserving Filters
The previous post introduced frequency domain image processing methods — transforming images to the frequency domain via Fourier transform, then performing filtering, restoration, and other operations. However, frequency domain methods are sometimes less intuitive, especially when we’re more accustomed to directly manipulating pixels.
Spatial domain methods process images directly in pixel space, without requiring transformations. This chapter introduces spatial domain image restoration, edge-preserving filtering, and sharpening techniques.
Lucy-Richardson Iterative Deconvolution
The goal of deconvolution is to recover the original image from a degraded image. The degradation process is typically modeled as convolution: degraded image = original image convolved with blur kernel + noise.
Direct deconvolution (inverse filtering) is extremely unstable when noise exists — noise is amplified by the division in high-frequency regions. The Lucy-Richardson algorithm provides a more robust iterative method.
Maximum Likelihood Principle
The Lucy-Richardson algorithm is based on maximum likelihood estimation, assuming image noise follows a Poisson distribution. Poisson noise is characterized by variance equal to mean — common in low-light imaging, X-ray imaging, and other scenarios.
Iterative Formula
The Lucy-Richardson iterative update formula is:
$$ f_{k+1}(x, y) = f_k(x, y) \cdot \left( \frac{g(x, y)}{h * f_k(x, y)} * h(-x, -y) \right) $$
Where:
- $f_k$ is the $k$-th iteration’s estimated image
- $g$ is the observed degraded image
- $h$ is the blur kernel (point spread function)
- $*$ denotes convolution
- $h(-x, -y)$ is the flipped blur kernel
Iterative Behavior
Intuitively, this formula does two things:
- Calculate the ratio between current estimate and observation $g / (h * f_k)$ — if current estimate is too dark, ratio > 1, needs enhancement
- Use the flipped blur kernel to back-propagate the ratio, correcting pixel values
Choice of iteration count is crucial:
- Too few iterations: image remains blurry, insufficient restoration
- Too many iterations: details fully restored but noise amplified, overfitting
Early stopping is a common strategy — stop iteration before noise is over-amplified.
Figure 1 - Lucy-Richardson iteration process:
flowchart TD
A["Initial estimate f_0<br/>uniform or degraded image"] --> B["Compute convolution h * f_k"]
B --> C["Compute ratio g / (h * f_k)"]
C --> D["Back-propagate: ratio * h(-x, -y)"]
D --> E["Update: f_{k+1} = f_k × back-propagation"]
E --> F{Iterations<br/>enough?}
F -->|No| B
F -->|Yes| G["Output restored image f_k"]
classDef start fill:#FF9800,color:#fff
classDef proc fill:#2196F3,color:#fff
classDef check fill:#9C27B0,color:#fff
classDef end fill:#4CAF50,color:#fff
class A start
class B,C,D,E proc
class F check
class G endIn practice, the blur kernel $h$ can be measured experimentally (e.g., photographing a point source) or estimated via methods like blur edge analysis.
Total Variation Regularization
Deconvolution is essentially an ill-posed problem — even knowing the blur kernel, recovering the original image from the degraded image has no unique solution. Regularization introduces prior constraints to make the solution more stable.
Why Choose L1 Norm
The most intuitive regularization is energy minimization: minimize the total variation (TV) of the restored image. TV measures the sum of gradients in an image:
$$ TV(f) = \sum_{x,y} \sqrt{(\nabla_x f)^2 + (\nabla_y f)^2} $$
Using L1 norm (absolute gradient) instead of L2 norm (squared gradient) is because:
- L2 norm over-penalizes strong gradients — edge gradients get smoothed out
- L1 norm is more tolerant of large gradients — allows edges to remain sharp
Intuitively: L2 norm wants gradients to be small everywhere (smooth), while L1 norm allows gradients to be large in some places (edges), as long as other areas have sufficiently small gradients (flat regions).
Optimization Objective
The image restoration objective with TV regularization:
$$ \min_f | g - h * f |_2^2 + \lambda \cdot TV(f) $$
The first term is the data fidelity term — the restored image must be consistent with the observed image. The second term is the regularization term — the restored image’s total variation should be small. $\lambda$ is the balance parameter: large $\lambda$ favors smoother images, small $\lambda$ favors closer adherence to observation.
This optimization problem is typically solved via iterative algorithms like gradient descent or split Bregman. In practice, you can call scikit-image functions like restoration.denoise_tv_chambolle or restoration.unsupervised_wiener.
Limitations of TV Regularization
While TV regularization preserves edges, it has two common issues:
- Staircase effect — recovered flat regions show staircase artifacts because L1 norm tends to produce piecewise-constant regions
- Texture loss — fine textures get over-smoothed
More advanced regularization methods (e.g., non-local means, sparse representation) can mitigate these issues but have higher computational complexity.
Bilateral Filter
Edge-preserving filtering’s core goal is: smooth noise while not blurring edges. Traditional Gaussian filtering cannot achieve this — it treats all pixels equally, blurring both edges and flat regions.
Bilateral filtering achieves edge preservation by introducing color similarity as an additional weight.
Two-Dimensional Weights
Each pixel in bilateral filtering has two weights:
- Spatial weight $G_s$: determined by geometric distance between pixels, closer pixels get larger weight
- Color weight $G_r$: determined by difference in pixel values, more similar values get larger weight
Complete formula:
$$ f(x, y) = \frac{1}{W} \sum_{(i,j) \in \Omega} I(i, j) \cdot G_s(|(x,y)-(i,j)|) \cdot G_r(|I(x,y) - I(i,j)|) $$
Where $W$ is the normalization factor:
$$ W = \sum_{(i,j) \in \Omega} G_s(|(x,y)-(i,j)|) \cdot G_r(|I(x,y) - I(i,j)|) $$
$G_s$ and $G_r$ are typically Gaussian functions:
$$ G_s(d) = \exp\left(-\frac{d^2}{2\sigma_s^2}\right), \quad G_r(v) = \exp\left(-\frac{v^2}{2\sigma_r^2}\right) $$
$\sigma_s$ controls spatial neighborhood size, $\sigma_r$ controls color similarity threshold.
Why It Preserves Edges
The magic of bilateral filtering lies in the color weight $G_r$:
In flat regions, neighboring pixel values are similar, $G_r \approx 1$, bilateral filtering reduces to Gaussian filtering — noise is smoothed.
At edges, neighboring pixel values differ significantly, $G_r \approx 0$, cross-edge weights are suppressed — edges are not blurred.
Figure 2 - Bilateral filtering weight mechanism:
flowchart TD
CENTER["Center pixel P(x,y)<br/>I = 128"]
P1["P1: I=125, dist1.0<br/>diff3 → weight0.93"]
P2["P2: I=120, dist1.4<br/>diff8 → weight0.77"]
P3["P3: I=30, dist1.0<br/>diff98 → weight≈0"]
CENTER --> P1
CENTER --> P2
CENTER --> P3
classDef center fill:#FF9800,color:#fff
classDef neighbor fill:#2196F3,color:#fff
classDef edge fill:#f44336,color:#fff
class CENTER center
class P1,P2 neighbor
class P3 edgeThe figure above shows that P3, despite being spatially close ($G_s \approx 0.95$), has a near-zero final weight because the color difference is too large (crossing an edge) — the edge is preserved.
Practical Applications
Bilateral filtering is commonly used for:
- High Dynamic Range (HDR) imaging: fuse multi-exposure images then adjust local contrast with bilateral filtering
- Cartoonization: strong bilateral filtering followed by color quantization produces cartoon-style effects
- Image matting: use bilateral filtering to optimize alpha masks
OpenCV provides cv2.bilateralFilter():
| |
Bilateral filtering’s drawback is high computational cost — each pixel needs to iterate over all pixels in the neighborhood, time complexity $O(N \cdot r^2)$, where $N$ is pixel count, $r$ is neighborhood radius. In practice, $d$ should not exceed 15, otherwise processing becomes slow.
Bilateral Filtering vs Gaussian Filtering
Compare the behavior of bilateral and Gaussian filtering at edges:
- Gaussian filtering: all neighborhood pixels participate regardless of edge presence → edges get blurred
- Bilateral filtering: cross-edge pixel weights are suppressed → edges preserved
You can verify this experimentally: process the same image with cv2.GaussianBlur() and cv2.bilateralFilter(), compare edge regions — Gaussian blur widens edges, bilateral filtering keeps edges sharp.
Guided Filter
Guided filtering is a linear-time $O(N)$ edge-preserving filtering method proposed by Kaiming He et al. in 2010. Compared to bilateral filtering, it has two advantages:
- Speed: linear time complexity, can quickly process large images
- No gradient reversal: avoids gradient reversal artifacts found in bilateral filtering (gradient reversal artifact means some regions’ gradient direction after filtering is opposite to the original image)
Local Linear Model
Guided filtering’s core assumption is: output image is a local linear transform of the guide image. For each local window $\omega_k$, we have:
$$ q_i = a_k I_i + b_k, \quad \forall i \in \omega_k $$
Where:
- $q_i$ is the $i$-th pixel of the output image
- $I_i$ is the $i$-th pixel of the guide image (guide image can be the input image itself or another image)
- $a_k$ and $b_k$ are linear coefficients in window $\omega_k$ (all pixels in the same window share $a_k, b_k$)
If $a_k$ is small, the output image varies smoothly within the window (flat region); if $a_k$ is large, the output follows the guide image (edge region).
Coefficient Solution
$a_k$ and $b_k$ are obtained via least squares fitting. The objective is to minimize:
$$ \min_{a_k, b_k} \sum_{i \in \omega_k} \left[ (a_k I_i + b_k - p_i)^2 + \epsilon a_k^2 \right] $$
Where $p_i$ is the $i$-th pixel of the input image. $\epsilon a_k^2$ is the regularization term, preventing $a_k$ from being too large (overfitting noise).
The solution is:
$$ a_k = \frac{\frac{1}{|\omega|} \sum_{i \in \omega_k} I_i p_i - \mu_k \bar{p}_k}{\sigma_k^2 + \epsilon} $$
$$ b_k = \bar{p}_k - a_k \mu_k $$
Where:
- $\mu_k$ is the mean of the guide image in window $\omega_k$
- $\sigma_k^2$ is the variance of the guide image in window $\omega_k$
- $\bar{p}_k$ is the mean of the input image in window $\omega_k$
- $|\omega|$ is the number of pixels in the window
Intuitively:
- Large variance $\sigma_k^2$ (edge regions), $a_k$ tends to be large → output follows guide image
- Small variance $\sigma_k^2$ (flat regions), $a_k$ tends toward 0 → output gets smoothed
Practical Applications
OpenCV provides cv2.ximgproc.guidedFilter() (requires opencv-contrib-python):
| |
Guided filtering is commonly used for:
- Matting refinement: use original image as guide, refine coarse alpha masks
- Dehazing: use dark channel as guide, restore clear image
- HDR compression: use original image as guide, adjust local contrast
Bilateral Filtering vs Guided Filtering
| Feature | Bilateral Filtering | Guided Filtering |
|---|---|---|
| Time complexity | $O(N \cdot r^2)$ | $O(N)$ |
| Edge preservation | Yes | Yes |
| Gradient reversal | Possible | None |
| Guidability | Only itself | Can use other images |
For large images or real-time applications, guided filtering is more suitable. If fine-grained control over edge preservation is needed, bilateral filtering’s parameter tuning is more intuitive.
Laplacian Sharpening
Sharpening is the opposite of filtering — filtering is for smoothing, sharpening is for enhancing edge contrast. Laplacian sharpening is the most basic sharpening method.
Second-Order Derivative Operator
The Laplacian operator is a second-order differential operator, defined as:
$$ \nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} $$
Second derivatives respond strongly to edges (where gradient changes abruptly) and approach 0 in flat regions. Thus Laplacian response can be seen as a measure of edge strength.
Discretization and Convolution Kernels
In discrete images, second derivatives are approximated with finite differences:
$$ \frac{\partial^2 f}{\partial x^2} \approx f(x+1, y) - 2f(x, y) + f(x-1, y) $$
$$ \frac{\partial^2 f}{\partial y^2} \approx f(x, y+1) - 2f(x, y) + f(x, y-1) $$
Adding them yields the 4-neighborhood Laplacian kernel:
| |
If diagonal directions are considered (8-neighborhood), the kernel is:
| |
Sharpening Formula
Adding the Laplacian response back to the original image enhances edge contrast:
$$ g(x, y) = f(x, y) + c \cdot \nabla^2 f(x, y) $$
$c$ is the sharpening strength:
- $c > 0$: enhance edges (when kernel center coefficient is positive, like 4 or 8)
- $c < 0$: reverse operation (when kernel center coefficient is negative, like -4 or -8)
In practice, to maintain image brightness, typically use a kernel with positive center coefficient (like changing 4-neighborhood kernel center to 4):
| |
This kernel is equivalent to $g = f + \nabla^2 f$ (since $5 = 4 + 1$).
Practical Applications
OpenCV applies custom convolution kernels with cv2.filter2D():
| |
Limitations of Sharpening
Laplacian sharpening has several issues:
- Noise amplification: sharpening also amplifies noise because noise is in high-frequency regions
- Overshoot and undershoot: excessive sharpening creates overshoot and undershoot on both sides of edges — bright rings and dark rings appear around edges
- Unnatural appearance: over-sharpened images look unnatural, like over-saturation
In practice, it’s best to denoise before sharpening (e.g., Gaussian filtering), and don’t use too much sharpening strength.
Summary
Spatial domain methods directly manipulate pixels, are more intuitive, but also have limitations like high computational cost and difficulty analyzing global properties.
| Method | Use Case | Core Idea | Limitations |
|---|---|---|---|
| Lucy-Richardson | Image deblurring | Iterative maximum likelihood | Iteration count hard to choose, noise amplification |
| TV regularization | Image denoising/deblurring | Minimize total variation (L1 gradient) | Staircase effect, texture loss |
| Bilateral filtering | Edge-preserving denoising | Spatial weight + color weight | High computational cost |
| Guided filtering | Edge-preserving filtering/matting | Local linear model | Parameter sensitive |
| Laplacian sharpening | Edge enhancement | Second derivative detects edges | Noise amplification, overshoot/undershoot |
Frequency domain and spatial domain methods each have strengths and weaknesses. In actual projects, they’re often combined: first perform global filtering in the frequency domain, then use spatial domain methods for local adjustments.
The next post will introduce more advanced topics — deep learning-based image processing.