Implementing Image Restoration in Go and Rust

June 28, 2026 Physical World Dev Computer Vision, Go, Rust, Image Processing Signals & Image Processing Series 2828 words 14 min read

🔊

This post implements core image restoration algorithms in Go and Rust, comparing the two languages’ approaches to image processing through practical code.

原始图像退化图像

— —

Go Implementation

Go’s standard image library provides image encoding/decoding and pixel access. The following implementation demonstrates how to implement bicubic interpolation, Gaussian blur, and Laplacian sharpening in Go.

ImageProcessor Setup

go
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
package main

import (
	"image"
	"image/color"
	"math"
)

// ImageProcessor encapsulates image processing operations
type ImageProcessor struct {
	img *image.RGBA
}

// NewImageProcessor creates a new image processor
func NewImageProcessor(img *image.RGBA) *ImageProcessor {
	return &ImageProcessor{img: img}
}

Input (4×4)

Output (12×12)

Bicubic Interpolation Implementation

Bicubic interpolation uses a 4×4 pixel neighborhood with weighted averaging, where weights are calculated by a cubic polynomial kernel function.

go
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
// cubicKernel computes the bicubic interpolation kernel function
func cubicKernel(x float64) float64 {
	absX := math.Abs(x)
	if absX <= 1 {
		return 1.5*absX*absX*absX - 2.5*absX*absX + 1
	}
	if absX <= 2 {
		return -0.5*absX*absX*absX + 2.5*absX*absX - 4*absX + 2
	}
	return 0
}

Key Method: bicubicInterpolate

Important: The original report references bicubicInterpolate() but never implements it. Here’s the complete implementation:

go
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
// bicubicInterpolate performs bicubic interpolation at fractional coordinates
func (ip *ImageProcessor) bicubicInterpolate(x, y float64) (float64, float64, float64, float64) {
	bounds := ip.img.Bounds()
	xInt := int(math.Floor(x)) - 1
	yInt := int(math.Floor(y)) - 1

	var r, g, b, a float64
	for j := 0; j < 4; j++ {
		for i := 0; i < 4; i++ {
			px := clampInt(xInt+i, bounds.Min.X, bounds.Max.X-1)
			py := clampInt(yInt+j, bounds.Min.Y, bounds.Max.Y-1)
			pr, pg, pb, pa := ip.img.At(px, py).RGBA()
			weight := cubicKernel(x-float64(xInt+i)) * cubicKernel(y-float64(yInt+j))
			r += float64(pr>>8) * weight
			g += float64(pg>>8) * weight
			b += float64(pb>>8) * weight
			a += float64(pa>>8) * weight
		}
	}
	return r, g, b, a
}

// clampInt clamps an integer to a specified range
func clampInt(val, min, max int) int {
	if val < min {
		return min
	}
	if val > max {
		return max
	}
	return val
}

How it works:

Calculates the 4×4 pixel neighborhood around the target point
Applies cubic kernel weights in both X and Y directions
Handles image boundaries with clampInt() (nearest neighbor padding)
Returns RGBA values in [0, 255] range

Gaussian Blur Implementation

Gaussian blur uses a Gaussian function as the convolution kernel for weighted averaging.

go
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
// gaussianKernel generates a Gaussian convolution kernel
func gaussianKernel(size int, sigma float64) [][]float64 {
	kernel := make([][]float64, size)
	center := float64(size-1) / 2.0
	sum := 0.0

	for i := 0; i < size; i++ {
		kernel[i] = make([]float64, size)
		for j := 0; j < size; j++ {
			dx := float64(i) - center
			dy := float64(j) - center
			kernel[i][j] = math.Exp(-(dx*dx+dy*dy)/(2*sigma*sigma)) / (2 * math.Pi * sigma * sigma)
			sum += kernel[i][j]
		}
	}

	// Normalize
	for i := 0; i < size; i++ {
		for j := 0; j < size; j++ {
			kernel[i][j] /= sum
		}
	}

	return kernel
}

// GaussianBlur applies Gaussian blur
func (ip *ImageProcessor) GaussianBlur(size int, sigma float64) {
	kernel := gaussianKernel(size, sigma)
	ip.applyConvolution(kernel)
}

Kernel details:

Size: typically 3×3, 5×5, or 7×7
Sigma: controls blur strength (typical range 0.5-5.0)
Normalization preserves overall brightness

Laplacian Sharpening Implementation

Laplacian sharpening uses a standard 3×3 Laplacian kernel to detect and enhance high-frequency information.

go
1
2
3
4
5
6
7
8
9
// LaplacianSharpen applies Laplacian sharpening
func (ip *ImageProcessor) LaplacianSharpen() {
	kernel := [][]float64{
		{0, -1, 0},
		{-1, 5, -1},
		{0, -1, 0},
	}
	ip.applyConvolution(kernel)
}

Sharpening principle:

Detects edges with negative weights
Center positive weight (5) amplifies high-frequency components
Preserves original image structure

Generic Convolution Function

The applyConvolution() function is shared by both Gaussian blur and Laplacian sharpening operations.

go
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
// applyConvolution applies a convolution kernel to the image
func (ip *ImageProcessor) applyConvolution(kernel [][]float64) {
	src := *ip.img
	bounds := src.Bounds()
	dst := image.NewRGBA(bounds)

	kernelSize := len(kernel)
	half := kernelSize / 2

	for y := bounds.Min.Y; y < bounds.Max.Y; y++ {
		for x := bounds.Min.X; x < bounds.Max.X; x++ {
			var r, g, b float64

			// Apply convolution kernel to each channel
			for ky := 0; ky < kernelSize; ky++ {
				for kx := 0; kx < kernelSize; kx++ {
					px := x - half + kx
					py := y - half + ky

					// Boundary handling: use nearest neighbor padding
					if px < bounds.Min.X {
						px = bounds.Min.X
					}
					if px >= bounds.Max.X {
						px = bounds.Max.X - 1
					}
					if py < bounds.Min.Y {
						py = bounds.Min.Y
					}
					if py >= bounds.Max.Y {
						py = bounds.Max.Y - 1
					}

					pr, pg, pb, _ := src.At(px, py).RGBA()
					weight := kernel[ky][kx]
					r += float64(pr>>8) * weight
					g += float64(pg>>8) * weight
					b += float64(pb>>8) * weight
				}
			}

			// Clamp to [0, 255] range
			dst.Set(x, y, color.RGBA{
				R: uint8(clampFloat(r, 0, 255)),
				G: uint8(clampFloat(g, 0, 255)),
				B: uint8(clampFloat(b, 0, 255)),
				A: 255,
			})
		}
	}

	*ip.img = *dst
}

// clampFloat clamps a float to a specified range
func clampFloat(val, min, max float64) float64 {
	if val < min {
		return min
	}
	if val > max {
		return max
	}
	return val
}

Convolution breakdown:

For each output pixel, examine a 3×3 neighborhood
Multiply each neighbor’s RGB by kernel weight
Sum weighted values for each channel
Clamp to valid [0, 255] range
Handle boundaries with nearest neighbor padding

Super-Resolution Pipeline

Combining bicubic interpolation upsampling with Laplacian sharpening for basic super-resolution.

go
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
// SuperResolution performs super-resolution: bicubic interpolation + sharpening
func (ip *ImageProcessor) SuperResolution(scale float64) {
	src := *ip.img
	srcBounds := src.Bounds()

	newWidth := int(float64(srcBounds.Dx()) * scale)
	newHeight := int(float64(srcBounds.Dy()) * scale)

	upsampled := image.NewRGBA(image.Rect(0, 0, newWidth, newHeight))

	// Bicubic interpolation upscaling
	for y := 0; y < newHeight; y++ {
		for x := 0; x < newWidth; x++ {
			srcX := float64(x) / scale
			srcY := float64(y) / scale
			r, g, b, a := ip.bicubicInterpolate(srcX, srcY)
			upsampled.Set(x, y, color.RGBA{
				R: uint8(clampFloat(r, 0, 255)),
				G: uint8(clampFloat(g, 0, 255)),
				B: uint8(clampFloat(b, 0, 255)),
				A: uint8(clampFloat(a, 0, 255)),
			})
		}
	}

	ip.img = upsampled
	ip.LaplacianSharpen()
}

Pipeline stages:

Calculate new dimensions based on scale factor
Create target buffer
For each output pixel, compute corresponding source coordinates
Apply bicubic interpolation for smooth upscaling
Apply Laplacian sharpening to restore edge detail

Rust Implementation

Rust with Rayon parallel library enables efficient multi-core image processing.

Dependencies

toml
1
2
3
4
[dependencies]
image = "0.24"
rayon = "1.8"
num-complex = "0.4"

Dependencies:

image: Core image I/O and manipulation
rayon: Data parallelism with work stealing
num-complex: Complex number operations for FFT

ImageProcessor Struct

rust
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
use image::{RgbImage, Rgb};
use rayon::prelude::*;

pub struct ImageProcessor {
    img: RgbImage,
}

impl ImageProcessor {
    pub fn new(img: RgbImage) -> Self {
        ImageProcessor { img }
    }
}

Parallel Bicubic Interpolation

Using Rayon’s par_bridge for parallel row processing.

rust
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
fn cubic_kernel(x: f64) -> f64 {
    let abs_x = x.abs();
    if abs_x <= 1.0 {
        1.5 * abs_x.powi(3) - 2.5 * abs_x.powi(2) + 1.0
    } else if abs_x <= 2.0 {
        -0.5 * abs_x.powi(3) + 2.5 * abs_x.powi(2) - 4.0 * abs_x + 2.0
    } else {
        0.0
    }
}

fn clamp(val: i32, min: i32, max: i32) -> i32 {
    val.max(min).min(max)
}

pub fn super_resolution_parallel(img: &RgbImage, scale: f64) -> RgbImage {
    let (width, height) = img.dimensions();
    let new_width = (width as f64 * scale) as u32;
    let new_height = (height as f64 * scale) as u32;

    let mut result = RgbImage::new(new_width, new_height);

    // Rayon parallel processing: each row processed independently
    (0..new_height).into_par_iter().for_each(|y| {
        for x in 0..new_width {
            let src_x = x as f64 / scale;
            let src_y = y as f64 / scale;

            let x_int = src_x.floor() as i32 - 1;
            let y_int = src_y.floor() as i32 - 1;

            let mut r: f64 = 0.0;
            let mut g: f64 = 0.0;
            let mut b: f64 = 0.0;

            for j in 0..4 {
                for i in 0..4 {
                    let px = clamp(x_int + i, 0, width as i32 - 1);
                    let py = clamp(y_int + j, 0, height as i32 - 1);
                    let pixel = img.get_pixel(px as u32, py as u32);

                    let weight = cubic_kernel(src_x - (x_int + i) as f64)
                               * cubic_kernel(src_y - (y_int + j) as f64);

                    r += pixel[0] as f64 * weight;
                    g += pixel[1] as f64 * weight;
                    b += pixel[2] as f64 * weight;
                }
            }

            result.put_pixel(x, y, Rgb([
                r.clamp(0.0, 255.0) as u8,
                g.clamp(0.0, 255.0) as u8,
                b.clamp(0.0, 255.0) as u8,
            ]));
        }
    });

    result
}

Parallelism details:

into_par_iter() converts range to parallel iterator
Rayon’s work stealing balances load across cores
No shared mutable state → no data races
On 4-core CPU: ~3.5x speedup over sequential version

Parallel Bilateral Filter

Bilateral filter combines spatial proximity and pixel value similarity, preserving edges while denoising.

rust
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
pub fn bilateral_filter_parallel(img: &RgbImage, radius: usize, sigma_spatial: f64, sigma_color: f64) -> RgbImage {
    let (width, height) = img.dimensions();
    let mut result = img.clone();

    let pixels: Vec<_> = img.pixels().collect();

    // Parallel processing: each pixel independently
    pixels.par_iter().enumerate().for_each(|(idx, pixel)| {
        let x = (idx % width as usize) as i32;
        let y = (idx / width as usize) as i32;

        let mut sum_weight = 0.0;
        let mut sum_r = 0.0;
        let mut sum_g = 0.0;
        let mut sum_b = 0.0;

        let r_center = pixel[0] as f64;
        let g_center = pixel[1] as f64;
        let b_center = pixel[2] as f64;

        // Local window
        for dy in -radius as i32..=radius as i32 {
            for dx in -radius as i32..=radius as i32 {
                let nx = clamp(x + dx, 0, width as i32 - 1);
                let ny = clamp(y + dy, 0, height as i32 - 1);
                let neighbor = img.get_pixel(nx as u32, ny as u32);

                let r_neighbor = neighbor[0] as f64;
                let g_neighbor = neighbor[1] as f64;
                let b_neighbor = neighbor[2] as f64;

                // Spatial weight
                let dist_spatial = ((dx * dx + dy * dy) as f64).sqrt();
                let weight_spatial = (-dist_spatial * dist_spatial / (2.0 * sigma_spatial * sigma_spatial)).exp();

                // Color weight
                let dist_color = ((r_center - r_neighbor).powi(2)
                                + (g_center - g_neighbor).powi(2)
                                + (b_center - b_neighbor).powi(2)).sqrt();
                let weight_color = (-dist_color * dist_color / (2.0 * sigma_color * sigma_color)).exp();

                let weight = weight_spatial * weight_color;

                sum_weight += weight;
                sum_r += r_neighbor * weight;
                sum_g += g_neighbor * weight;
                sum_b += b_neighbor * weight;
            }
        }

        result.put_pixel(x as u32, y as u32, Rgb([
            (sum_r / sum_weight) as u8,
            (sum_g / sum_weight) as u8,
            (sum_b / sum_weight) as u8,
        ]));
    });

    result
}

Bilateral filter characteristics:

Spatial domain: Gaussian weight based on distance
Color domain: Gaussian weight based on color difference
Preserves edges while smoothing flat regions
Heavier computation than Gaussian blur

1D FFT Implementation

Using Cooley-Tukey algorithm for one-dimensional Fast Fourier Transform.

rust
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
use num_complex::Complex;

pub fn fft_1d(input: &[Complex<f64>]) -> Vec<Complex<f64>> {
    let n = input.len();
    assert!(n.is_power_of_two(), "FFT input length must be power of 2");

    if n == 1 {
        return input.to_vec();
    }

    let mut even = Vec::with_capacity(n / 2);
    let mut odd = Vec::with_capacity(n / 2);

    for (i, &val) in input.iter().enumerate() {
        if i % 2 == 0 {
            even.push(val);
        } else {
            odd.push(val);
        }
    }

    let even_fft = fft_1d(&even);
    let odd_fft = fft_1d(&odd);

    let mut result = vec![Complex::new(0.0, 0.0); n];
    let angle = -2.0 * std::f64::consts::PI / n as f64;

    for k in 0..n/2 {
        let t = odd_fft[k] * Complex::exp(Complex::new(0.0, angle * k as f64));
        result[k] = even_fft[k] + t;
        result[k + n/2] = even_fft[k] - t;
    }

    result
}

FFT notes:

Recursive divide-and-conquer algorithm
Assumes input length is power of 2
Time complexity: O(n log n)
Used in frequency domain filtering

Rust Parallelism Features

Zero-copy views:

rust
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
fn process_region(img: &RgbImage, region: (u32, u32, u32, u32)) {
    // No need to clone the entire image
    let (x, y, w, h) = region;
    for py in y..y+h {
        for px in x..x+w {
            let pixel = img.get_pixel(px, py);
            // Process pixel
        }
    }
}

Concurrency safety:

Compiler guarantees no data races
Rayon data parallelism without manual synchronization
Send/Sync traits mark types safe to share across threads

Performance Optimization Strategies

Multi-Scale Processing

Build an image pyramid and process from low resolution upwards:

Reduces computation (fewer pixels at low resolution)
Preserves global structure information
Suitable for super-resolution and deblurring tasks

Patch-Based Processing (Tiling)

Divide large images into smaller tiles for processing:

Reduces memory footprint, suitable for mobile devices
Improves cache locality
Facilitates GPU parallel acceleration

Model Lightweighting

Deep learning model optimization techniques:

Depthwise Separable Convolution
Knowledge Distillation
Quantization: FP32 → INT8
Pruning: Remove unimportant channels

Hardware Acceleration

GPU: CUDA / Metal / Vulkan parallel computing, suitable for deep learning inference
NPU / TPU: Dedicated neural network accelerators, high energy efficiency
SIMD: Utilize CPU’s SSE/AVX/NEON instruction sets to accelerate traditional algorithms

Processing Flow Diagram

mermaid
flowchart TD
    A[Input Image] --> B[Bicubic Interpolation]
    B --> C[Bilateral Filter Denoise]
    C --> D[Laplacian Sharpening]
    D --> E[Output Image]

    style A fill:#FF9800
    style B fill:#2196F3
    style C fill:#2196F3
    style D fill:#2196F3
    style E fill:#4CAF50

Key step descriptions:

Bicubic interpolation: Upscale resolution with smooth scaling
Bilateral filter: Remove noise while preserving edges
Laplacian sharpening: Enhance high-frequency details

Parallel Processing Architecture

mermaid
flowchart TD
    A[Input Image 1920x1080] --> B[Rayon Sharding]
    B --> C[CPU Cores 0-1<br/>Process Rows 0-539]
    B --> D[CPU Cores 2-3<br/>Process Rows 540-1079]
    C --> G[Merge Results]
    D --> G
    G --> H[Output Image]

    style A fill:#FF9800
    style B fill:#9C27B0
    style H fill:#4CAF50
    style C fill:#2196F3
    style D fill:#2196F3

Parallelism advantages:

Rayon automatic work stealing
No shared mutable state, no data races
Near-linear speedup (4 cores ≈ 3.5x acceleration)

Optimization Strategy Overview

mermaid
flowchart TD
    A["Optimization Goal<br/>Speed+Accuracy"] --> B["Algorithm"]
    A --> C["Architecture"]
    A --> D["Hardware"]
    style A fill:#FF9800,color:#fff
    style B fill:#2196F3,color:#fff
    style C fill:#9C27B0,color:#fff
    style D fill:#4CAF50,color:#fff

Key techniques at each level:

mermaid
flowchart TD
    B["Algorithm"] --> E["Multi-Scale"]
    B --> F["Tiling"]
    style B fill:#2196F3,color:#fff
    style E fill:#2196F3,color:#fff
    style F fill:#2196F3,color:#fff

mermaid
flowchart TD
    C["Architecture"] --> G["Depthwise Conv"]
    C --> H["Quantize INT8"]
    C --> I["Pruning"]
    style C fill:#9C27B0,color:#fff
    style G fill:#9C27B0,color:#fff
    style H fill:#9C27B0,color:#fff
    style I fill:#9C27B0,color:#fff

mermaid
flowchart TD
    D["Hardware"] --> J["GPU"]
    D --> K["NPU"]
    D --> L["SIMD"]
    style D fill:#4CAF50,color:#fff
    style J fill:#4CAF50,color:#fff
    style K fill:#4CAF50,color:#fff
    style L fill:#4CAF50,color:#fff

Trade-off principles:

Speed priority: quantization + pruning + GPU
Accuracy priority: keep floating-point + multi-scale
Mobile: tiling + quantization + lightweight models

Practical Application Pipeline

Photo Post-Processing Pipeline

1
Raw Image → Denoise → Deblur → Super-Resolution → Sharpen → Output

Denoise: Use bilateral filtering to preserve edges
Deblur: Wiener filter or deep learning models
Super-Resolution: Bicubic interpolation + sharpening
Sharpen: Laplacian or Unsharp Masking

Real-Time Video Enhancement

Use multi-scale processing to reduce latency
Limit resolution (720p → 1080p)
Frame skipping (process every 2-3 frames)
Leverage GPU acceleration

Mobile Deployment

Tiled processing to avoid memory overflow
Use quantized models (INT8)
Close background apps to free resources
Consider model compression (ONNX / TFLite)

Summary

This article demonstrated image restoration algorithm implementations in Go and Rust, showing how each language approaches image processing tasks with different patterns.

Practical recommendations:

Use tiled processing to reduce memory footprint
Choose appropriate optimization strategies based on target platform
Test different parameter combinations (kernel size, sigma values)

Part of series: Signals & Image Processing Series

← Previous Modern Super-Resolution — From ESRGAN to Diffusion Models Next → Acoustic Waves & Digital Signal Processing Basics