U-Net — Ryan Han

The U-Net is an architecture for convolutional neural networks consisting of an encoder and decoder with skip connections

U-Net

Overview

Earlier convolutional encoder-decoder architectures tended to lose high-resolution detail in their outputs
In a standard encoder-decoder network, data is compressed into a bottleneck where spatial information is lost in favor of high-level semantic meaning
The key idea of the U-Net paper is its use of skip connections, which bypass the bottleneck by concatenating feature maps from the encoding path directly onto the feature maps of matching resolution in the decoding path
- Preserves high-resolution information of the input image from the encoding path, so the model no longer has to guess about high-resolution details in the decoding path
- Enables fusion of features so the model can leverage both high-level and low-level information
- Improves gradient flow by giving gradients a shorter path from the output layer back to the early encoder layers, without passing through the bottleneck
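
The concatenation step can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper names `upsample2x` and `skip_concat` are hypothetical, the channel counts and spatial sizes are arbitrary, and nearest-neighbour upsampling stands in for the learned upsampling a real U-Net would use.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (channels, height, width) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_concat(encoder_map, decoder_map):
    """Upsample the decoder map, then stack the encoder map onto it channel-wise."""
    upsampled = upsample2x(decoder_map)
    if upsampled.shape[1:] != encoder_map.shape[1:]:
        raise ValueError("spatial sizes must match before concatenation")
    return np.concatenate([encoder_map, upsampled], axis=0)

rng = np.random.default_rng(0)
encoder_map = rng.normal(size=(64, 8, 8))  # high-resolution features from the encoding path
decoder_map = rng.normal(size=(64, 4, 4))  # coarser features coming up from the bottleneck

fused = skip_concat(encoder_map, decoder_map)
print(fused.shape)  # (128, 8, 8)
```

The first 64 channels of `fused` are the untouched encoder features, so the convolutions that follow in the decoding path see the fine detail and the upsampled semantic features side by side.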

The encoding path consists of repeated application of:
- Two $3 \times 3$ convolutions, each followed by ReLU
- $2 \times 2$ max pooling operation with stride 2 for downsampling
- Doubling of the number of feature maps at each downsampling step
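
To see what this repeated block does to tensor shapes, the sketch below traces channel count and spatial size through several stages. It is a rough illustration, not a reference implementation: the function names are hypothetical, and the defaults (unpadded convolutions, a 572-pixel input, 64 starting channels) are assumptions chosen for the example rather than values stated in these notes.

```python
def conv_out(size, kernel=3, padding=0, stride=1):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Output size of max pooling along one spatial dimension."""
    return (size - window) // stride + 1

def encoder_shapes(size, channels, stages, padding=0):
    """Return (channels, size) after the two convolutions of each stage."""
    shapes = []
    for _ in range(stages):
        # Two 3x3 convolutions (the ReLU does not change the shape).
        size = conv_out(conv_out(size, padding=padding), padding=padding)
        shapes.append((channels, size))
        # 2x2 max pooling with stride 2, then double the feature maps.
        size = pool_out(size)
        channels *= 2
    return shapes

for channels, size in encoder_shapes(572, 64, stages=4):
    print(f"{channels:>4} channels, {size}x{size}")
```

With `padding=0` each convolution trims one pixel from every border, so the size shrinks slightly before each halving; passing `padding=1` keeps the size fixed across the convolutions and leaves pooling as the only source of downsampling.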

Encoding path:
- Spatial dimensions decrease $\to$ loses precise locations & gains global context
- Channel dimensions increase $\to$ gains capacity to detect more complex concepts

Decoding path:
- Spatial dimensions increase $\to$ recovers spatial resolution
- Channel dimensions decrease $\to$ compresses abstract concepts back into pixels
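
The opposing trends of the two paths can be laid out as a small shape trace. This is a simplified sketch under assumptions that are not in the notes above: `unet_trace` is a hypothetical helper, convolutions are taken to preserve spatial size, each upsampling step is taken to halve the channel count, and the input size, base channel count, and depth are arbitrary.

```python
def unet_trace(size=256, base_channels=64, depth=3):
    """List (path, channels, size) per level, assuming size-preserving convolutions."""
    trace = []
    channels = base_channels
    for _ in range(depth):
        trace.append(("encode", channels, size))
        size //= 2      # downsampling halves the spatial size
        channels *= 2   # feature maps double
    trace.append(("bottleneck", channels, size))
    for _ in range(depth):
        size *= 2       # upsampling doubles the spatial size
        channels //= 2  # feature maps halve
        trace.append(("decode", channels, size))
    return trace

for path, channels, size in unet_trace():
    print(f"{path:<10} channels={channels:<4} size={size}x{size}")
```

Each `decode` row has the same channel count and size as the `encode` row at the same depth, which is what lets the skip connections concatenate the two feature maps.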