Review: RED-Net — Residual Encoder-Decoder Network (Denoising / Super Resolution)

Image Restoration including Image Denoising, Super Resolution, JPEG Deblocking, Image Deblurring and Image Inpainting.

Sik-Ho Tsang

Published in

DataDrivenInvestor

5 min read · Dec 21, 2018

In this story, RED-Net (Residual Encoder-Decoder Network), for image restoration, is reviewed. Suppose we have a corrupted image y:

y = H(x) + n

where x is the clean version of y, H is the degradation function, and n is the additive noise. Using the same network architecture trained on different datasets, i.e. with different sets of x and y, RED-Net can handle Image Denoising, Super Resolution, JPEG Deblocking, Image Deblurring and Image Inpainting.
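To make the degradation model concrete, here is a minimal sketch of how training pairs (x, y) could be simulated, with the model written as y = H(x) + n. It is not the authors' data pipeline: the helper names `degrade` and `box_blur`, the flat list of pixel values, and the Gaussian noise model are all assumptions made for illustration.

```python
import random


def box_blur(pixels):
    # Toy stand-in for H: 3-tap moving average with edge replication.
    padded = [pixels[0]] + list(pixels) + [pixels[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
            for i in range(len(pixels))]


def degrade(clean, H=lambda x: x, sigma=0.0, seed=0):
    """Return y = H(x) + n for a flat list of pixel values.

    H is a hypothetical degradation function (identity by default).
    n is zero-mean Gaussian noise with standard deviation sigma.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [v + rng.gauss(0.0, sigma) for v in H(clean)]


x = [10.0, 20.0, 30.0, 40.0]
y_denoise = degrade(x, sigma=5.0)   # denoising pair: H is the identity
y_deblur = degrade(x, H=box_blur)   # deblurring pair: H blurs, no noise
```

Changing only H and sigma produces a different (x, y) dataset, which is the sense in which one architecture can be retrained for each task.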

It was published at NIPS 2016 and has over 200 citations. A more detailed technical report is also available on arXiv (2016). (Sik-Ho Tsang @ Medium)

What Is Covered

Network Architecture
Ablation Study
Results on Image Denoising, Super Resolution, JPEG Deblocking, Image Deblurring and Image Inpainting

1. Network Architecture

The network contains layers of symmetric convolution (encoder) and deconvolution (decoder).

Convolution

The convolutional layers act as the feature extractor, capturing the abstraction of the image contents while eliminating noise and corruptions.

Deconvolution

The deconvolutional layers are then combined to recover the details of image contents. Deconvolutional layers associate a single input activation with multiple outputs. Deconvolution is usually used as learnable up-sampling layers.
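The idea that one input activation contributes to several outputs can be seen in a small 1D transposed convolution. This is a plain-Python sketch for illustration only; the function name `transposed_conv1d` and the kernels are assumptions, and nothing here reflects the actual kernel sizes or strides used in RED-Net.

```python
def transposed_conv1d(inputs, kernel, stride=1):
    """1D transposed convolution without padding.

    Each input activation is multiplied by the whole kernel and
    accumulated into a window of the output.
    """
    out_len = (len(inputs) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, activation in enumerate(inputs):
        for k, weight in enumerate(kernel):
            out[i * stride + k] += activation * weight
    return out


# A single non-zero activation spreads over three output positions.
single = transposed_conv1d([0.0, 1.0, 0.0], [1.0, 2.0, 1.0])

# With stride 2 the same operation acts as a learnable up-sampler.
upsampled = transposed_conv1d([1.0, 2.0], [1.0, 1.0], stride=2)
```

With stride 1 the output keeps roughly the input resolution, while a larger stride enlarges it, which is why the layer is often used for up-sampling.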

Skip/Shortcut Connections

Skip/Shortcut connections are added every few (in this case, two) layers, from convolutional feature maps to their mirrored deconvolutional feature maps. Thus, the response from a convolutional layer is propagated directly to the corresponding mirrored deconvolutional layer, in both the forward and backward passes. The passed convolutional feature maps are summed element-wise with the deconvolutional feature maps, and the result is passed to the next layer after rectification.
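The merge step described above (element-wise sum, then rectification) can be sketched as follows. The helper names `merge_skip` and `skip_pairs` are hypothetical, feature maps are flattened to lists for brevity, and the exact layer indexing in `skip_pairs` is an assumption about how "mirrored" layers pair up, not a detail confirmed by the text.

```python
def relu(values):
    # Rectification: negative responses are clipped to zero.
    return [max(0.0, v) for v in values]


def merge_skip(conv_features, deconv_features):
    """Element-wise sum of mirrored feature maps, followed by ReLU."""
    if len(conv_features) != len(deconv_features):
        raise ValueError("mirrored feature maps must have the same shape")
    return relu([c + d for c, d in zip(conv_features, deconv_features)])


def skip_pairs(num_conv, step=2):
    """Hypothetical wiring as (conv_index, deconv_index), 1-indexed.

    Assumes conv layer i is mirrored by deconv layer num_conv - i + 1
    and that a skip connection starts at every `step`-th conv layer.
    """
    return [(i, num_conv - i + 1) for i in range(step, num_conv + 1, step)]


merged = merge_skip([0.5, -2.0, 1.0], [0.25, 1.0, -3.0])
pairs = skip_pairs(num_conv=5)  # e.g. a 5 conv + 5 deconv network
```

Because the sum requires matching shapes, this kind of connection only works when each deconvolutional feature map has the same size as its mirrored convolutional one.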

2. Ablation Study

2.1. Different Combinations of Convolution and Deconvolution

**PSNR on Image Denoising (σ=70) Validation Set During Training**

Using only 5 or 10 deconv layers (conv upsampling), the PSNR obtained is poor.
Using only 5 or 10 conv layers, the PSNR obtained is better.
Using 5 conv and 5 deconv layers, the PSNR obtained is much better.
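Since all comparisons in this review are reported in PSNR, a short sketch of the standard definition may help: PSNR = 10 · log10(MAX² / MSE), in dB. The function below assumes 8-bit pixel values (MAX = 255) stored in flat lists; both choices are assumptions for illustration, not details taken from the paper.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


clean = [100.0, 100.0]
rough = psnr(clean, [110.0, 90.0])   # error of 10 per pixel
close = psnr(clean, [101.0, 99.0])   # error of 1 per pixel
```

A higher PSNR means the restored image is closer to the clean one, so `close` is larger than `rough` here.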

2.2. Effectiveness of Skip/Shortcut Connections

With skip connections, the PSNR is much better.
The reason may be that deeper networks can destroy image details, which is undesirable for pixel-wise dense regression. Skip connections carry important image details, which help to reconstruct the clean image.
Very deep networks can easily suffer from training issues such as vanishing gradients. Skip connections help to address this problem.

Without skip connections, a network with more layers actually ends up with a higher training loss than one with fewer layers.
With skip connections, the 30-layer network is better than the 20-layer network, with a smaller training loss.

RED-Net, which uses long and short symmetric skip connections, performs better than a network built from ResNet building blocks.

3. Results on Image Denoising, Super Resolution, JPEG Deblocking, and Image Inpainting

3.1. Image Denoising

Reduce the noise of noisy images.
Datasets: 14 common benchmark images, and the BSD dataset.

3.1.1. One Model for One Noise Level

**Average PSNR and SSIM results of σ 10, 30, 50, 70**

RED2n: n conv and n deconv with symmetric skip connections
RED10 already outperforms the other state-of-the-art approaches.
RED30 obtains even better results.

3.1.2. One Model for All Noise Levels

**Average PSNR and SSIM results for image denoising using a single 30-layer network**

PSNR is lower than with separate models, but it still beats the existing methods.

**Visual results of image denoising. Images from left to right column are: clean image; the recovered image of RED30, BM3D, EPLL, NCSR, PCLR, PGPD, WNNM**

3.2. Super Resolution

Enlarge the image to a higher resolution.
Datasets: Set5, Set14, and BSD100

3.2.1. One Model for One Scaling Factor

**Average PSNR and SSIM results of scaling 2, 3 and 4**

RED30 again obtains the highest PSNR, better than SRCNN.

**Visual results of image super-resolution. Images from left to right column are: High resolution image; the recovered image of RED30, ARFL+, CSC, CSCN, NBSRF, SRCNN, TSE**

VDSR and DRCN, concurrent works on super resolution, were developed at about the same time as RED-Net.
RED30 is at or near the top for all datasets and scaling factors.

3.2.2. One Model for All Scaling Factors

**Average PSNR and SSIM results of scaling 2, 3 and 4 using a single 30-layer network**

RED30 still performs quite well.

3.3. JPEG Deblocking

Lossy compression, such as JPEG, introduces complex compression artifacts, particularly blocking artifacts, ringing effects and blurring.
Reduce the JPEG compression artifacts.
Datasets: LIVE1