ResNet-50
1. Vanishing Gradient Problem
- Sigmoid's maximum gradient is only 0.25 (see the derivation below).
- As the input moves away from 0 in either direction, the gradient converges to 0.
What happens if the network gets deeper?
During back-propagation, values close to 0 are multiplied together in a chain, and the gradient vanishes.
- This phenomenon, which breaks weight updates (learning), is called the Vanishing Gradient problem.
- It is most pronounced with Sigmoid, but other activation functions can suffer from the same problem.
- How can this be solved?
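As a quick check of the 0.25 figure above (a standard derivation, not part of the original notes), the sigmoid derivative is largest at x = 0:

\[ \sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\left(1 - \sigma(x)\right) \le \sigma(0)\left(1 - \sigma(0)\right) = 0.25 \]

Chaining L such factors during back-propagation contributes at most 0.25^L from the activations alone; for L = 10 that is already about 10^-6, which is why deep sigmoid stacks stop learning.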
2. Skip-Connection
- To address this problem, the Skip-Connection was proposed.
- The feature map of an earlier layer is added before the activation function, lifting the gradient so it no longer collapses toward 0 (see the sketch below).
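Written out explicitly (a minimal sketch of the residual form y = F(x) + x used in the ResNet paper):

\[ y = F(x) + x, \qquad \frac{\partial y}{\partial x} = \frac{\partial F(x)}{\partial x} + 1 \]

The +1 contributed by the identity path means the local gradient cannot be driven to 0 by the residual branch alone, so earlier layers keep receiving a usable learning signal.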
3. Skip-Connection on CNN
\[ O_{h} = \frac{I_h - K_h + 2P}{S} + 1 \quad O_{w} = \frac{I_w - K_w + 2P}{S} + 1 \]
- Output shape Calculation
- Input Shape = (312,312)
- Level 1. Conv2D \[ \frac{312 - 2}{2} + 1 = 156 \]
- Level 2. Conv2D \[ \frac{156 - 2}{2} + 1 = 78 \]
- F(x) is (78,78), but x is (312,312). How can they be added?
- Input Shape = (312,312)
- Projection Shortcut (1x1 Convolution)
- To match the shapes, a 1X1 Conv layer is used to adjust the shortcut branch.
- 1X1 Conv, Stride = 4 (the two stride-2 convolutions compound to 2 x 2 = 4) \[ \left\lfloor \frac{312 - 1}{4} \right\rfloor + 1 = 78 \]
- After the shapes are matched, the two tensors are added element-wise.
- Code Version
from tensorflow.keras import layers, Model

def residual_block(input_tensor, filters=3, kernel_size=2, strides=2):
    # Main path: two stride-2 convolutions, 312 -> 156 -> 78
    x = layers.Conv2D(filters, kernel_size=kernel_size, strides=strides, padding='valid', activation='relu')(input_tensor)
    x = layers.Conv2D(filters, kernel_size=kernel_size, strides=strides, padding='valid', activation='relu')(x)
    # Skip connection: 1x1 projection with stride 4 (valid here because strides == 2, and 2 x 2 = 4)
    shortcut = layers.Conv2D(filters, kernel_size=1, strides=strides * 2, padding='valid')(input_tensor)
    output = layers.add([x, shortcut])
    # Output
    output = layers.Activation('relu')(output)
    return output

input_tensor = layers.Input(shape=(312, 312, 3))
output_tensor = residual_block(input_tensor)
model = Model(inputs=input_tensor, outputs=output_tensor)
model.summary()
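If the block is wired as above, model.summary() should show the final activation output as (None, 78, 78, 3): the two stride-2 convolutions and the stride-4 projection shortcut both arrive at 78 x 78, so the element-wise addition is valid.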
4. BottleNeck
- The bottleneck architecture is at the core of ResNet.
- The term "bottleneck" comes from a bottle's neck, i.e. a shape that gradually narrows toward the opening.
- It reduces the number of parameters by using a 1X1, 3X3, 1X1 residual block (a rough parameter comparison follows the calculation below).
- Calculation
- Input size : (320, 320, 256),
- Level 1. Conv 1X1 filters : 64, stride : 1
- A 1X1 Conv layer reduces the channel count, cutting the computation \[ \frac{320 - 1}{1} + 1 = 320 \quad \text{output shape} : (320,320,64) \]
- Level 2. Conv 3X3 filters : 64, stride : 2
\[
\left\lfloor \frac{320 - 1}{2} \right\rfloor + 1 = 160 \quad \text{output shape} : (160,160,64)
\]
- A 3X3 Conv layer performs the spatial feature extraction.
- Level 3. Conv 1X1 filters : 256, stride : 1
\[
\frac{160 - 1}{1} + 1 = 160 \quad \text{output shape} : (160,160,256)
\]
- A 1X1 Conv layer restores the channel count of the extracted features.
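To make the "fewer parameters" claim concrete, here is a rough weight count for this block (biases and batch norm ignored); the comparison against two plain 3X3 convolutions keeping 256 channels is an illustrative assumption, not part of the original calculation:

\[ 1 \cdot 1 \cdot 256 \cdot 64 + 3 \cdot 3 \cdot 64 \cdot 64 + 1 \cdot 1 \cdot 64 \cdot 256 = 69{,}632 \quad \text{vs.} \quad 2 \cdot (3 \cdot 3 \cdot 256 \cdot 256) = 1{,}179{,}648 \]

Roughly a 17x reduction for the same input/output channel count.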
- Code Version
from tensorflow.keras import layers, Model

def bottleneck(input_tensor, strides=1):
    # Reduce channels with 1x1 Conv
    x = layers.Conv2D(64, kernel_size=(1, 1), strides=1, padding='same', activation='relu')(input_tensor)
    # Extract spatial features with 3x3 Conv
    x = layers.Conv2D(64, kernel_size=(3, 3), strides=strides, padding='same', activation='relu')(x)
    # Restore channels with 1x1 Conv
    x = layers.Conv2D(256, kernel_size=(1, 1), strides=1, padding='same', activation=None)(x)
    # Shortcut connection
    shortcut = layers.Conv2D(256, kernel_size=(1, 1), strides=strides, padding='same')(input_tensor)
    # Add shortcut to the main path
    output = layers.add([x, shortcut])
    output = layers.Activation('relu')(output)
    return output

# Input tensor
input_tensor = layers.Input(shape=(320, 320, 256))
# Bottleneck layer
output_tensor = bottleneck(input_tensor, strides=2)
# Model
model = Model(inputs=input_tensor, outputs=output_tensor)
model.summary()
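Here the summary should end at (None, 160, 160, 256): the 3x3 convolution with strides=2 halves the spatial size, and the 1x1 shortcut with the same stride brings the input to a matching shape before the addition.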
5. ResNet
- In the actual layers, batch normalization is added after each convolution.
- With this in mind, ResNet-50 is implemented below in both TensorFlow(Keras) and PyTorch.
- ResNet repeats the same bottleneck block a fixed number of times per stage.
- When moving from one stage to the next, the dimensions change.
- Ex. Conv2 -> Conv3
- In that case, a projection shortcut is used in the skip connection to unify the dimensions.
- Conv2 -> Conv2 : identity_block
- Conv2 -> Conv3 : convolutional_block
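For reference, the stage layout of ResNet-50 (Table 1 of the original paper, 224x224 input):
- conv2_x : 3 bottleneck blocks, output 56 x 56 x 256
- conv3_x : 4 bottleneck blocks, output 28 x 28 x 512
- conv4_x : 6 bottleneck blocks, output 14 x 14 x 1024
- conv5_x : 3 bottleneck blocks, output 7 x 7 x 2048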
6. ResNet-50 Implementation
6.1 Tensorflow-Keras
from tensorflow.keras import layers, Model

def identity_block(x, filters):
    # Bottleneck block whose input and output shapes match; the shortcut is the identity
    f1, f2, f3 = filters
    shortcut = x
    x = layers.Conv2D(f1, (1, 1), strides=1, padding='valid')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(f2, (3, 3), strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(f3, (1, 1), strides=1, padding='valid')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Add()([x, shortcut])
    x = layers.ReLU()(x)
    return x

def convolutional_block(x, filters, strides=2):
    # Bottleneck block that changes the dimensions; the shortcut is a 1x1 projection + BN
    f1, f2, f3 = filters
    shortcut = layers.Conv2D(f3, (1, 1), strides=strides, padding='valid')(x)
    shortcut = layers.BatchNormalization()(shortcut)
    x = layers.Conv2D(f1, (1, 1), strides=strides, padding='valid')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(f2, (3, 3), strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(f3, (1, 1), strides=1, padding='valid')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Add()([x, shortcut])
    x = layers.ReLU()(x)
    return x

def Conv2(x):
    # conv2_x: 3 bottleneck blocks
    x = convolutional_block(x, [64, 64, 256], strides=1)
    x = identity_block(x, [64, 64, 256])
    x = identity_block(x, [64, 64, 256])
    return x

def Conv3(x):
    # conv3_x: 4 bottleneck blocks
    x = convolutional_block(x, [128, 128, 512], strides=2)
    x = identity_block(x, [128, 128, 512])
    x = identity_block(x, [128, 128, 512])
    x = identity_block(x, [128, 128, 512])
    return x

def Conv4(x):
    # conv4_x: 6 bottleneck blocks
    x = convolutional_block(x, [256, 256, 1024], strides=2)
    x = identity_block(x, [256, 256, 1024])
    x = identity_block(x, [256, 256, 1024])
    x = identity_block(x, [256, 256, 1024])
    x = identity_block(x, [256, 256, 1024])
    x = identity_block(x, [256, 256, 1024])
    return x

def Conv5(x):
    # conv5_x: 3 bottleneck blocks
    x = convolutional_block(x, [512, 512, 2048], strides=2)
    x = identity_block(x, [512, 512, 2048])
    x = identity_block(x, [512, 512, 2048])
    return x

def Resnet_50(input_tensor, num_classes=1000):
    # Stem: 7x7 conv (stride 2) + 3x3 max pool (stride 2)
    x = layers.Conv2D(64, kernel_size=(7, 7), strides=2, padding='same')(input_tensor)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same')(x)
    x = Conv2(x)
    x = Conv3(x)
    x = Conv4(x)
    x = Conv5(x)
    # Classification head
    x = layers.GlobalAveragePooling2D()(x)
    output_tensor = layers.Dense(num_classes, activation='softmax')(x)
    model = Model(inputs=input_tensor, outputs=output_tensor)
    return model

input_tensor = layers.Input(shape=(224, 224, 3))
model = Resnet_50(input_tensor)
model.summary()
6.2 PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchsummary import summary

class IdentityBlock(nn.Module):
    # Bottleneck block whose input and output shapes match; the shortcut is the identity
    def __init__(self, in_channels, filters):
        super(IdentityBlock, self).__init__()
        f1, f2, f3 = filters
        self.conv1 = nn.Conv2d(in_channels, f1, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn1 = nn.BatchNorm2d(f1)
        self.conv2 = nn.Conv2d(f1, f2, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(f2)
        self.conv3 = nn.Conv2d(f2, f3, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn3 = nn.BatchNorm2d(f3)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        shortcut = x
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.conv3(x)
        x = self.bn3(x)
        x += shortcut
        x = self.relu(x)
        return x

class ConvolutionalBlock(nn.Module):
    # Bottleneck block that changes the dimensions; the shortcut is a 1x1 projection + BN
    def __init__(self, in_channels, filters, stride=2):
        super(ConvolutionalBlock, self).__init__()
        f1, f2, f3 = filters
        self.conv1 = nn.Conv2d(in_channels, f1, kernel_size=1, stride=stride, padding=0, bias=False)
        self.bn1 = nn.BatchNorm2d(f1)
        self.conv2 = nn.Conv2d(f1, f2, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(f2)
        self.conv3 = nn.Conv2d(f2, f3, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn3 = nn.BatchNorm2d(f3)
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_channels, f3, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(f3)
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        shortcut = self.shortcut(x)
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.conv3(x)
        x = self.bn3(x)
        x += shortcut
        x = self.relu(x)
        return x

class ResNet50(nn.Module):
    def __init__(self, num_classes=1000):
        super(ResNet50, self).__init__()
        # Stem: 7x7 conv (stride 2) + 3x3 max pool (stride 2)
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # conv2_x .. conv5_x: 3, 4, 6, 3 bottleneck blocks
        self.layer1 = self._make_layer(64, [64, 64, 256], 3, stride=1)
        self.layer2 = self._make_layer(256, [128, 128, 512], 4, stride=2)
        self.layer3 = self._make_layer(512, [256, 256, 1024], 6, stride=2)
        self.layer4 = self._make_layer(1024, [512, 512, 2048], 3, stride=2)
        # Classification head
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(2048, num_classes)

    def _make_layer(self, in_channels, filters, blocks, stride):
        layers = []
        layers.append(ConvolutionalBlock(in_channels, filters, stride))
        for _ in range(1, blocks):
            layers.append(IdentityBlock(filters[2], filters))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ResNet50(num_classes=1000).to(device=device)
input_shape = (3, 224, 224)
summary(model, input_shape, device=str(device))  # pass the device so the dummy input matches the model
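As an optional sanity check (this assumes torchvision is installed; it is not part of the original notes), the parameter count can be compared against the reference implementation. Both should come out to roughly 25.6M; torchvision places the stride-2 on the 3x3 convolution of each downsampling block rather than on the first 1x1 as done here, which changes the feature maps slightly but not the parameter count.

from torchvision.models import resnet50

reference = resnet50()  # same stage layout: 3, 4, 6, 3 bottleneck blocks
print(sum(p.numel() for p in reference.parameters()))  # ~25.6M
print(sum(p.numel() for p in model.parameters()))      # should match closely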