
1. Vanishing Gradient Problem

Example Image

  • The maximum gradient of the sigmoid is only 0.25.
  • As the input moves further from 0 in either direction, the gradient converges to 0.

    What if the network gets deeper?

    During back-propagation, values close to 0 are multiplied together along the chain, and the gradient vanishes.

  • The resulting failure of weight updates (i.e., of learning) is called the Vanishing Gradient problem.
  • It shows up most clearly with the sigmoid, but other activation functions can run into the same problem.
  • How can we solve it? (A quick numerical check of the shrinkage follows below.)
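How quickly sigmoid gradients shrink can be checked numerically. Below is a minimal NumPy sketch (the helper names are just for illustration) built on the identity \[ \sigma'(x) = \sigma(x)\,(1 - \sigma(x)) \]

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid; peaks at 0.25 when x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))                    # 0.25
print(sigmoid_grad(np.array([-5.0, 5.0])))  # ~0.0066 on both sides
print(0.25 ** 10)                           # best case after 10 sigmoid layers: ~9.5e-07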

2. Skip-Connection

Example Image

  • The skip connection was proposed to solve this problem.
  • Before the activation function, the feature map from an earlier layer is added in; the identity path contributes a constant gradient of 1, so the gradient no longer collapses toward 0 (see the sketch below).
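A minimal Keras sketch of the idea (the input shape here is hypothetical, and only the identity shortcut is shown): since the block computes relu(F(x) + x), back-propagation through the addition includes a direct path whose derivative is 1, no matter how small the gradient through F(x) becomes.

from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(32, 32, 64))  # hypothetical input shape
fx = layers.Conv2D(64, 3, padding='same', activation='relu')(inputs)
fx = layers.Conv2D(64, 3, padding='same')(fx)                  # F(x), pre-activation
outputs = layers.Activation('relu')(layers.add([fx, inputs]))  # relu(F(x) + x)

Model(inputs, outputs).summary()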

3. Skip-Connection on CNN

Example Image

\[ O_{h} = \left\lfloor \frac{I_h - K_h + 2P}{S} \right\rfloor + 1 \quad O_{w} = \left\lfloor \frac{I_w - K_w + 2P}{S} \right\rfloor + 1 \]

  • Output shape calculation
    • Input shape = (312, 312)
      • Level 1. Conv2D \[ \frac{312 - 2}{2} + 1 = 156 \]
      • Level 2. Conv2D \[ \frac{156 - 2}{2} + 1 = 78 \]

      • F(x) is (78, 78) while x is (312, 312); how can the two be added?
  • Projection Shortcut (1x1 Convolution)
    • A 1x1 conv layer adjusts the shortcut to the matching shape
    • 1x1 conv, stride = 4 (the product of the two stride-2 convs, 2 x 2, not their sum) \[ \left\lfloor \frac{312 - 1}{4} \right\rfloor + 1 = 78 \]
    • Once the shapes match, apply element-wise addition
  • Code Version
from tensorflow.keras import layers, Model

def residual_block(input_tensor, filters=3, kernel_size=2, strides=2):
    # Main path: two stride-2 convs, (312, 312) -> (156, 156) -> (78, 78)
    x = layers.Conv2D(filters, kernel_size=kernel_size, strides=strides, padding='valid', activation='relu')(input_tensor)
    x = layers.Conv2D(filters, kernel_size=kernel_size, strides=strides, padding='valid', activation='relu')(x)

    # Skip connection: 1x1 projection with stride 4 (= 2 * 2) to match the (78, 78) main path
    shortcut = layers.Conv2D(filters, kernel_size=1, strides=strides * 2, padding='valid')(input_tensor)
    output = layers.add([x, shortcut])
    
    # Output
    output = layers.Activation('relu')(output)

    return output


input_tensor = layers.Input(shape=(312, 312, 3))  

output_tensor = residual_block(input_tensor)

model = Model(inputs=input_tensor, outputs=output_tensor)
model.summary()
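Note that the shortcut stride of strides * 2 = 4 lines up here only because the two main-path convolutions each use stride 2 and 2 x 2 = 4; in general the projection stride must equal the product of the main-path strides, not their sum, for the spatial shapes to match.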

4. BottleNeck

Example Image

  • The bottleneck architecture is at the core of ResNet.
  • The name comes from the neck of a bottle, a shape that gradually narrows toward the opening.
  • It reduces the parameter count by using a 1x1, 3x3, 1x1 residual block (a rough count follows the code below).

Example Image

  • Calculation
    • Input size : (320, 320, 256)
    • Level 1. Conv 1x1, filters : 64, stride : 1
      • A 1x1 conv layer reduces the channels, cutting the computation \[ \frac{320 - 1}{1} + 1 = 320 \quad \text{output shape} : (320,320,64) \]
    • Level 2. Conv 3x3, filters : 64, stride : 2, padding : same (P = 1) \[ \left\lfloor \frac{320 - 3 + 2}{2} \right\rfloor + 1 = 160 \quad \text{output shape} : (160,160,64) \]
      • A 3x3 conv layer performs the spatial feature extraction
    • Level 3. Conv 1x1, filters : 256, stride : 1 \[ \frac{160 - 1}{1} + 1 = 160 \quad \text{output shape} : (160,160,256) \]
      • A 1x1 conv layer restores the channel count of the extracted features
  • Code Version
from tensorflow.keras import layers, Model

def bottleneck(input_tensor, strides=1):
    # Reduce channels with 1x1 Conv
    x = layers.Conv2D(64, kernel_size=(1, 1), strides=1, padding='same', activation='relu')(input_tensor)
    # Extract spatial features with 3x3 Conv
    x = layers.Conv2D(64, kernel_size=(3, 3), strides=strides, padding='same', activation='relu')(x)
    # Restore channels with 1x1 Conv
    x = layers.Conv2D(256, kernel_size=(1, 1), strides=1, padding='same', activation=None)(x)
    
    # Shortcut connection
    shortcut = layers.Conv2D(256, kernel_size=(1, 1), strides=strides, padding='same')(input_tensor)
    
    # Add shortcut to the main path
    output = layers.add([x, shortcut])
    output = layers.Activation('relu')(output)
    
    return output

# Input tensor
input_tensor = layers.Input(shape=(320, 320, 256))  

# Bottleneck layer
output_tensor = bottleneck(input_tensor, strides=2)

# Model
model = Model(inputs=input_tensor, outputs=output_tensor)
model.summary()
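As a rough sanity check of the parameter savings (a back-of-the-envelope sketch counting weights only, ignoring biases and BatchNorm; the single 3x3 conv is a hypothetical baseline, not part of the code above):

# Bottleneck main path: 1x1 (256->64), 3x3 (64->64), 1x1 (64->256)
bottleneck = 1 * 1 * 256 * 64 + 3 * 3 * 64 * 64 + 1 * 1 * 64 * 256
# Hypothetical baseline: one 3x3 conv at the full 256 channels
plain_3x3 = 3 * 3 * 256 * 256
print(bottleneck, plain_3x3)  # 69632 vs. 589824, roughly 8.5x fewer weights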


5. ResNet

Example Image

  • In the actual layers, batch normalization is added after each convolution.
  • Using this as a reference, ResNet-50 is implemented below in both torch and tensorflow.
  • ResNet repeats the same bottleneck block throughout:
    • When moving to a new stage, the dimensions change
      • Ex. Conv2 -> Conv3
    • In that case, a projection shortcut in the skip connection unifies the dimensions
      • Conv2 -> Conv2 : identity_block
      • Conv2 -> Conv3 : convolutional_block

6. ResNet-50 Implementation

6.1 Tensorflow-Keras

from tensorflow.keras import layers, Model

def identity_block(x, filters):
    f1, f2, f3 = filters
    shortcut = x

    
    # 1x1 conv: reduce channels
    x = layers.Conv2D(f1, (1, 1), strides=1, padding='valid')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

   
    # 3x3 conv: spatial feature extraction
    x = layers.Conv2D(f2, (3, 3), strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

   
    # 1x1 conv: restore channels (no ReLU before the addition)
    x = layers.Conv2D(f3, (1, 1), strides=1, padding='valid')(x)
    x = layers.BatchNormalization()(x)

   
    # Add the identity shortcut
    x = layers.Add()([x, shortcut])
    x = layers.ReLU()(x)
    return x


def convolutional_block(x, filters, strides=2):
    f1, f2, f3 = filters
    # Projection shortcut: 1x1 conv matches both spatial size and channel count
    shortcut = layers.Conv2D(f3, (1, 1), strides=strides, padding='valid')(x)
    shortcut = layers.BatchNormalization()(shortcut)

    
    # 1x1 conv: reduce channels (and downsample when strides > 1)
    x = layers.Conv2D(f1, (1, 1), strides=strides, padding='valid')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    
    # 3x3 conv: spatial feature extraction
    x = layers.Conv2D(f2, (3, 3), strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

   
    # 1x1 conv: restore channels
    x = layers.Conv2D(f3, (1, 1), strides=1, padding='valid')(x)
    x = layers.BatchNormalization()(x)

    
    # Add the projection shortcut
    x = layers.Add()([x, shortcut])
    x = layers.ReLU()(x)
    return x


def Conv2(x):
    x = convolutional_block(x, [64, 64, 256], strides=1)
    x = identity_block(x, [64, 64, 256])
    x = identity_block(x, [64, 64, 256])
    return x


def Conv3(x):
    x = convolutional_block(x, [128, 128, 512], strides=2)
    x = identity_block(x, [128, 128, 512])
    x = identity_block(x, [128, 128, 512])
    x = identity_block(x, [128, 128, 512])
    return x


def Conv4(x):
    x = convolutional_block(x, [256, 256, 1024], strides=2)
    x = identity_block(x, [256, 256, 1024])
    x = identity_block(x, [256, 256, 1024])
    x = identity_block(x, [256, 256, 1024])
    x = identity_block(x, [256, 256, 1024])
    x = identity_block(x, [256, 256, 1024])
    return x


def Conv5(x):
    x = convolutional_block(x, [512, 512, 2048], strides=2)
    x = identity_block(x, [512, 512, 2048])
    x = identity_block(x, [512, 512, 2048])
    return x


def Resnet_50(input_tensor, num_classes=1000):
    # Stem: 7x7/2 conv followed by 3x3/2 max pooling
    x = layers.Conv2D(64, kernel_size=(7, 7), strides=2, padding='same')(input_tensor)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same')(x)

    x = Conv2(x)
    x = Conv3(x)
    x = Conv4(x)
    x = Conv5(x)

    x = layers.GlobalAveragePooling2D()(x)
    output_tensor = layers.Dense(num_classes, activation='softmax')(x)

    model = Model(inputs=input_tensor, outputs=output_tensor)
    return model


input_tensor = layers.Input(shape=(224, 224, 3)) 
model = Resnet_50(input_tensor)
model.summary()
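With the (224, 224, 3) input, the summary should show the standard ResNet-50 stage outputs: (56, 56, 256) after Conv2, (28, 28, 512) after Conv3, (14, 14, 1024) after Conv4, and (7, 7, 2048) after Conv5, before global average pooling and the 1000-way softmax.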

6.2 Pytorch

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchsummary import summary

class IdentityBlock(nn.Module):
    def __init__(self, in_channels, filters):
        super(IdentityBlock, self).__init__()
        f1, f2, f3 = filters

        self.conv1 = nn.Conv2d(in_channels, f1, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn1 = nn.BatchNorm2d(f1)

        self.conv2 = nn.Conv2d(f1, f2, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(f2)

        self.conv3 = nn.Conv2d(f2, f3, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn3 = nn.BatchNorm2d(f3)

        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        shortcut = x
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)

        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)

        x = self.conv3(x)
        x = self.bn3(x)

        x += shortcut
        x = self.relu(x)
        return x


class ConvolutionalBlock(nn.Module):
    def __init__(self, in_channels, filters, stride=2):
        super(ConvolutionalBlock, self).__init__()
        f1, f2, f3 = filters

        self.conv1 = nn.Conv2d(in_channels, f1, kernel_size=1, stride=stride, padding=0, bias=False)
        self.bn1 = nn.BatchNorm2d(f1)

        self.conv2 = nn.Conv2d(f1, f2, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(f2)

        self.conv3 = nn.Conv2d(f2, f3, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn3 = nn.BatchNorm2d(f3)

        self.shortcut = nn.Sequential(
            nn.Conv2d(in_channels, f3, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(f3)
        )

        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        shortcut = self.shortcut(x)

        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)

        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)

        x = self.conv3(x)
        x = self.bn3(x)

        x += shortcut
        x = self.relu(x)
        return x


class ResNet50(nn.Module):
    def __init__(self, num_classes=1000):
        super(ResNet50, self).__init__()

        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = self._make_layer(64, [64, 64, 256], 3, stride=1)
        self.layer2 = self._make_layer(256, [128, 128, 512], 4, stride=2)
        self.layer3 = self._make_layer(512, [256, 256, 1024], 6, stride=2)
        self.layer4 = self._make_layer(1024, [512, 512, 2048], 3, stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(2048, num_classes)

    def _make_layer(self, in_channels, filters, blocks, stride):
        layers = []
        layers.append(ConvolutionalBlock(in_channels, filters, stride))
        for _ in range(1, blocks):
            layers.append(IdentityBlock(filters[2], filters))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x
    
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ResNet50(num_classes=1000).to(device=device)
input_shape = (3, 224, 224)
summary(model, input_shape, device="cuda" if torch.cuda.is_available() else "cpu")
