ResNet-50
1. Vanishing Gradient Problem
- Sigmoid's maximum gradient is only 0.25 (see the derivation below).
- As the input moves away from 0 in either direction, the gradient converges to 0.
What happens if the network gets deeper?
During back-propagation, values close to 0 are multiplied together in a chain, and the gradient vanishes.
- This phenomenon, which breaks weight updates (learning), is called the Vanishing Gradient problem.
- It is most pronounced with Sigmoid, but other activation functions can suffer from the same problem.
- How can this be solved?
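As a quick check of the 0.25 figure above (a standard derivation, not part of the original notes), the sigmoid derivative is largest at x = 0:

\[ \sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\left(1 - \sigma(x)\right) \le \sigma(0)\left(1 - \sigma(0)\right) = 0.25 \]

Chaining L such factors during back-propagation contributes at most 0.25^L from the activations alone; for L = 10 that is already about 10^-6, which is why deep sigmoid stacks stop learning.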
2. Skip-Connection
- To address this problem, the Skip-Connection was proposed.
- The feature map of an earlier layer is added before the activation function, lifting the gradient so it no longer collapses toward 0 (see the sketch below).
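Written out explicitly (a minimal sketch of the residual form y = F(x) + x used in the ResNet paper):

\[ y = F(x) + x, \qquad \frac{\partial y}{\partial x} = \frac{\partial F(x)}{\partial x} + 1 \]

The +1 contributed by the identity path means the local gradient cannot be driven to 0 by the residual branch alone, so earlier layers keep receiving a usable learning signal.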
3. Skip-Connection on CNN
\[ O_{h} = \frac{I_h - K_h + 2P}{S} + 1 \quad O_{w} = \frac{I_w - K_w + 2P}{S} + 1 \]
- Output shape Calculation
- Input Shape = (312,312)
- Level 1. Conv2D \[ \frac{312 - 2}{2} + 1 = 156 \]
- Level 2. Conv2D \[ \frac{156 - 2}{2} + 1 = 78 \]
- F(x) is (78,78), but x is (312,312). How can they be added?
- Input Shape = (312,312)
- Projection Shortcut (1x1 Convolution)
- To match the shapes, a 1X1 Conv layer is used to adjust the shortcut branch.
- 1X1 Conv, Stride = 4 (the two stride-2 convolutions compound to 2 x 2 = 4) \[ \left\lfloor \frac{312 - 1}{4} \right\rfloor + 1 = 78 \]
- After the shapes are matched, the two tensors are added element-wise.
- Code Version
from tensorflow.keras import layers, Model

def residual_block(input_tensor, filters=3, kernel_size=2, strides=2):
    # Main path: two stride-2 convolutions, 312 -> 156 -> 78
    x = layers.Conv2D(filters, kernel_size=kernel_size, strides=strides, padding='valid', activation='relu')(input_tensor)
    x = layers.Conv2D(filters, kernel_size=kernel_size, strides=strides, padding='valid', activation='relu')(x)
    # Skip connection: 1x1 projection with stride 4 (valid here because strides == 2, and 2 x 2 = 4)
    shortcut = layers.Conv2D(filters, kernel_size=1, strides=strides * 2, padding='valid')(input_tensor)
    output = layers.add([x, shortcut])
    # Output
    output = layers.Activation('relu')(output)
    return output

input_tensor = layers.Input(shape=(312, 312, 3))
output_tensor = residual_block(input_tensor)
model = Model(inputs=input_tensor, outputs=output_tensor)
model.summary()
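If the block is wired as above, model.summary() should show the final activation output as (None, 78, 78, 3): the two stride-2 convolutions and the stride-4 projection shortcut both arrive at 78 x 78, so the element-wise addition is valid.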
4. BottleNeck
- The bottleneck architecture is at the core of ResNet.
- The term "bottleneck" comes from a bottle's neck, i.e. a shape that gradually narrows toward the opening.
- It reduces the number of parameters by using a 1X1, 3X3, 1X1 residual block (a rough parameter comparison follows the calculation below).
- Calculation
- Input size : (320, 320, 256),
- Level 1. Conv 1X1 filters : 64, stride : 1
- A 1X1 Conv layer reduces the channel count, cutting the computation \[ \frac{320 - 1}{1} + 1 = 320 \quad \text{output shape} : (320,320,64) \]
- Level 2. Conv 3X3 filters : 64, stride : 2
\[
\left\lfloor \frac{320 - 1}{2} \right\rfloor + 1 = 160 \quad \text{output shape} : (160,160,64)
\]
- A 3X3 Conv layer performs the spatial feature extraction.
- Level 3. Conv 1X1 filters : 256, stride : 1
\[
\frac{160 - 1}{1} + 1 = 160 \quad \text{output shape} : (160,160,256)
\]
- A 1X1 Conv layer restores the channel count of the extracted features.
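To make the "fewer parameters" claim concrete, here is a rough weight count for this block (biases and batch norm ignored); the comparison against two plain 3X3 convolutions keeping 256 channels is an illustrative assumption, not part of the original calculation:

\[ 1 \cdot 1 \cdot 256 \cdot 64 + 3 \cdot 3 \cdot 64 \cdot 64 + 1 \cdot 1 \cdot 64 \cdot 256 = 69{,}632 \quad \text{vs.} \quad 2 \cdot (3 \cdot 3 \cdot 256 \cdot 256) = 1{,}179{,}648 \]

Roughly a 17x reduction for the same input/output channel count.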
- Code Version
from tensorflow.keras import layers, Model

def bottleneck(input_tensor, strides=1):
    # Reduce channels with 1x1 Conv
    x = layers.Conv2D(64, kernel_size=(1, 1), strides=1, padding='same', activation='relu')(input_tensor)
    # Extract spatial features with 3x3 Conv
    x = layers.Conv2D(64, kernel_size=(3, 3), strides=strides, padding='same', activation='relu')(x)
    # Restore channels with 1x1 Conv
    x = layers.Conv2D(256, kernel_size=(1, 1), strides=1, padding='same', activation=None)(x)
    # Shortcut connection
    shortcut = layers.Conv2D(256, kernel_size=(1, 1), strides=strides, padding='same')(input_tensor)
    # Add shortcut to the main path
    output = layers.add([x, shortcut])
    output = layers.Activation('relu')(output)
    return output

# Input tensor
input_tensor = layers.Input(shape=(320, 320, 256))
# Bottleneck layer
output_tensor = bottleneck(input_tensor, strides=2)
# Model
model = Model(inputs=input_tensor, outputs=output_tensor)
model.summary()
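Here the summary should end at (None, 160, 160, 256): the 3x3 convolution with strides=2 halves the spatial size, and the 1x1 shortcut with the same stride brings the input to a matching shape before the addition.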
5. ResNet
- In the actual layers, batch normalization is added after each convolution.
- With this in mind, ResNet-50 is implemented below in both TensorFlow(Keras) and PyTorch.
- ResNet repeats the same bottleneck block a fixed number of times per stage.
- When moving from one stage to the next, the dimensions change.
- Ex. Conv2 -> Conv3
- In that case, a projection shortcut is used in the skip connection to unify the dimensions.
- Conv2 -> Conv2 : identity_block
- Conv2 -> Conv3 : convolutional_block
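For reference, the stage layout of ResNet-50 (Table 1 of the original paper, 224x224 input):
- conv2_x : 3 bottleneck blocks, output 56 x 56 x 256
- conv3_x : 4 bottleneck blocks, output 28 x 28 x 512
- conv4_x : 6 bottleneck blocks, output 14 x 14 x 1024
- conv5_x : 3 bottleneck blocks, output 7 x 7 x 2048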
6. ResNet-50 Implementation
6.1 Tensorflow-Keras
from tensorflow.keras import layers, Model

def identity_block(x, filters):
    # Bottleneck block whose input and output shapes match; the shortcut is the identity
    f1, f2, f3 = filters
    shortcut = x
    x = layers.Conv2D(f1, (1, 1), strides=1, padding='valid')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(f2, (3, 3), strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(f3, (1, 1), strides=1, padding='valid')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Add()([x, shortcut])
    x = layers.ReLU()(x)
    return x

def convolutional_block(x, filters, strides=2):
    # Bottleneck block that changes the dimensions; the shortcut is a 1x1 projection + BN
    f1, f2, f3 = filters
    shortcut = layers.Conv2D(f3, (1, 1), strides=strides, padding='valid')(x)
    shortcut = layers.BatchNormalization()(shortcut)
    x = layers.Conv2D(f1, (1, 1), strides=strides, padding='valid')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(f2, (3, 3), strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(f3, (1, 1), strides=1, padding='valid')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Add()([x, shortcut])
    x = layers.ReLU()(x)
    return x

def Conv2(x):
    # conv2_x: 3 bottleneck blocks
    x = convolutional_block(x, [64, 64, 256], strides=1)
    x = identity_block(x, [64, 64, 256])
    x = identity_block(x, [64, 64, 256])
    return x

def Conv3(x):
    # conv3_x: 4 bottleneck blocks
    x = convolutional_block(x, [128, 128, 512], strides=2)
    x = identity_block(x, [128, 128, 512])
    x = identity_block(x, [128, 128, 512])
    x = identity_block(x, [128, 128, 512])
    return x

def Conv4(x):
    # conv4_x: 6 bottleneck blocks
    x = convolutional_block(x, [256, 256, 1024], strides=2)
    x = identity_block(x, [256, 256, 1024])
    x = identity_block(x, [256, 256, 1024])
    x = identity_block(x, [256, 256, 1024])
    x = identity_block(x, [256, 256, 1024])
    x = identity_block(x, [256, 256, 1024])
    return x

def Conv5(x):
    # conv5_x: 3 bottleneck blocks
    x = convolutional_block(x, [512, 512, 2048], strides=2)
    x = identity_block(x, [512, 512, 2048])
    x = identity_block(x, [512, 512, 2048])
    return x

def Resnet_50(input_tensor, num_classes=1000):
    # Stem: 7x7 conv (stride 2) + 3x3 max pool (stride 2)
    x = layers.Conv2D(64, kernel_size=(7, 7), strides=2, padding='same')(input_tensor)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same')(x)
    x = Conv2(x)
    x = Conv3(x)
    x = Conv4(x)
    x = Conv5(x)
    # Classification head
    x = layers.GlobalAveragePooling2D()(x)
    output_tensor = layers.Dense(num_classes, activation='softmax')(x)
    model = Model(inputs=input_tensor, outputs=output_tensor)
    return model

input_tensor = layers.Input(shape=(224, 224, 3))
model = Resnet_50(input_tensor)
model.summary()
6.2 PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchsummary import summary

class IdentityBlock(nn.Module):
    # Bottleneck block whose input and output shapes match; the shortcut is the identity
    def __init__(self, in_channels, filters):
        super(IdentityBlock, self).__init__()
        f1, f2, f3 = filters
        self.conv1 = nn.Conv2d(in_channels, f1, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn1 = nn.BatchNorm2d(f1)
        self.conv2 = nn.Conv2d(f1, f2, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(f2)
        self.conv3 = nn.Conv2d(f2, f3, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn3 = nn.BatchNorm2d(f3)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        shortcut = x
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.conv3(x)
        x = self.bn3(x)
        x += shortcut
        x = self.relu(x)
        return x

class ConvolutionalBlock(nn.Module):
    # Bottleneck block that changes the dimensions; the shortcut is a 1x1 projection + BN
    def __init__(self, in_channels, filters, stride=2):
        super(ConvolutionalBlock, self).__init__()
        f1, f2, f3 = filters
        self.conv1 = nn.Conv2d(in_channels, f1, kernel_size=1, stride=stride, padding=0, bias=False)
        self.bn1 = nn.BatchNorm2d(f1)
        self.conv2 = nn.Conv2d(f1, f2, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(f2)
        self.conv3 = nn.Conv2d(f2, f3, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn3 = nn.BatchNorm2d(f3)
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_channels, f3, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(f3)
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        shortcut = self.shortcut(x)
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.conv3(x)
        x = self.bn3(x)
        x += shortcut
        x = self.relu(x)
        return x

class ResNet50(nn.Module):
    def __init__(self, num_classes=1000):
        super(ResNet50, self).__init__()
        # Stem: 7x7 conv (stride 2) + 3x3 max pool (stride 2)
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # conv2_x .. conv5_x: 3, 4, 6, 3 bottleneck blocks
        self.layer1 = self._make_layer(64, [64, 64, 256], 3, stride=1)
        self.layer2 = self._make_layer(256, [128, 128, 512], 4, stride=2)
        self.layer3 = self._make_layer(512, [256, 256, 1024], 6, stride=2)
        self.layer4 = self._make_layer(1024, [512, 512, 2048], 3, stride=2)
        # Classification head
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(2048, num_classes)

    def _make_layer(self, in_channels, filters, blocks, stride):
        layers = []
        layers.append(ConvolutionalBlock(in_channels, filters, stride))
        for _ in range(1, blocks):
            layers.append(IdentityBlock(filters[2], filters))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ResNet50(num_classes=1000).to(device=device)
input_shape = (3, 224, 224)
summary(model, input_shape, device=str(device))  # pass the device so the dummy input matches the model
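As an optional sanity check (this assumes torchvision is installed; it is not part of the original notes), the parameter count can be compared against the reference implementation. Both should come out to roughly 25.6M; torchvision places the stride-2 on the 3x3 convolution of each downsampling block rather than on the first 1x1 as done here, which changes the feature maps slightly but not the parameter count.

from torchvision.models import resnet50

reference = resnet50()  # same stage layout: 3, 4, 6, 3 bottleneck blocks
print(sum(p.numel() for p in reference.parameters()))  # ~25.6M
print(sum(p.numel() for p in model.parameters()))      # should match closely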