CNN Keras vs Torch

1. How to understand parameters in CNN layers

Here is a quick example to test how well you really understand this.
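
As a minimal sketch (my own, using the layer sizes that appear in the examples later in this post): the spatial output size of a convolution is floor((input - kernel + 2*padding) / stride) + 1.

# hypothetical helper, not from the original post
def conv_output_size(in_size, kernel, stride=1, padding=0):
    # out = floor((in - kernel + 2*padding) / stride) + 1
    return (in_size - kernel + 2 * padding) // stride + 1

print(conv_output_size(28, 5))             # 24 -> a 5x5 kernel on a 28x28 input gives 24x24
print(conv_output_size(24, 3, padding=1))  # 24 -> padding of 1 keeps a 24x24 input at 24x24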

2. How to calculate the number of parameters in CNN layers
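
A hedged sketch of the standard formula (the helper below is my own, not from the original post): each filter has kernel_h * kernel_w * in_channels weights plus one bias, so a conv layer has (kernel_h * kernel_w * in_channels + 1) * filters parameters.

def conv_param_count(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    # each filter: kernel_h * kernel_w * in_channels weights, plus one bias if use_bias
    per_filter = kernel_h * kernel_w * in_channels + (1 if use_bias else 0)
    return per_filter * filters

print(conv_param_count(5, 5, 1, 32))     # 832   -> Conv2D(32, (5, 5)) on a 1-channel input
print(conv_param_count(3, 3, 32, 256))   # 73984 -> Conv2D(256, (3, 3)) on a 32-channel input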

3. Implementation differences between Keras and PyTorch

In Keras, we start with model = Sequential() and add all the layers to the model.
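
A minimal sketch of this workflow (assuming standalone Keras imports), mirroring the PyTorch class skeleton below:

from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential()   # start with an empty Sequential container
# layers are then added one at a time with model.add(...), as in the examples below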

In PyTorch, we start by defining a class, initialize it with all the layers, and then add a forward function to define the flow of data (a complete module with a forward function is sketched at the end of this post).

import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self):
        super(NeuralNet, self).__init__()
        # layers are defined here

    def forward(self, x):
        # the flow of data is defined here
        return x
  1. Add a convolution layer:

    1. Keras:

      Conv2D(filters, kernel_size, strides, padding, activation=None, use_bias=True)
      # "padding='valid'" means there is no padding, "padding='same'" means output dim is the same as input dim.

      With no padding or strides specified (Keras defaults: strides=(1, 1), padding='valid'):

      model = Sequential()
      model.add(Conv2D(32, (5, 5), input_shape=(28, 28, 1), activation='relu'))
      # input dim: (28,28,1); filter num: 32; kernel dim: (5,5); activation: relu.
      # output dim: (24,24,32)

      With padding and strides specified:

      model.add(Conv2D(256, kernel_size=3, strides=1, padding='same', activation='relu'))
      # input dim: (24,24,32); filter num: 256; kernel dim: (3,3); strides: (1,1); padding='same' (equivalent to padding of 1 for a 3x3 kernel); output dim: (24,24,256)
    2. Torch:

      nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, padding_mode)

      With no padding or stride specified (PyTorch defaults: stride=1, padding=0):

      class NeuralNet(nn.Module):
          def __init__(self):
              super(NeuralNet, self).__init__()
              self.conv1 = nn.Conv2d(1, 32, kernel_size=(5, 5))
              self.relu1 = nn.ReLU()
              # input dim: (1,28,28); output dim: (32,24,24)

      With padding and stride specified:

      self.conv2 = nn.Conv2d(32, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      self.relu2 = nn.ReLU()
      # input dim: (32,24,24); output dim: (256,24,24)

      Here, nn.ReLU() has an argument inplace, which means the operation will be performed in place. In other words, inplace=True means it modifies the input directly, without allocating any additional output; this can slightly reduce memory usage, but it may not always be a valid operation because the original input is destroyed. In-place computation saves (GPU) memory and avoids the overhead of repeatedly allocating and freeing memory, but it overwrites the original variable, so use it only when that does not cause errors.
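
      A small sketch of what this looks like in practice (my own example, not from the original post):

      import torch
      import torch.nn as nn

      x = torch.tensor([-1.0, 2.0, -3.0])
      out = nn.ReLU(inplace=True)(x)   # writes the result back into x instead of a new tensor
      print(x)                         # tensor([0., 2., 0.]) -- the original values are gone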
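
Putting the layers above together, here is a sketch of what the complete PyTorch module could look like, with forward defining the flow of data (the dummy-input shape check at the end is my own addition):

import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self):
        super(NeuralNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=(5, 5))    # (1,28,28)  -> (32,24,24)
        self.relu1 = nn.ReLU()
        self.conv2 = nn.Conv2d(32, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))  # (32,24,24) -> (256,24,24)
        self.relu2 = nn.ReLU()

    def forward(self, x):
        # flow of data: conv1 -> relu1 -> conv2 -> relu2
        x = self.relu1(self.conv1(x))
        x = self.relu2(self.conv2(x))
        return x

# quick shape check with a dummy batch of one 28x28 grayscale image
out = NeuralNet()(torch.zeros(1, 1, 28, 28))
print(out.shape)   # torch.Size([1, 256, 24, 24])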