had the same size, but output 1 did not have a ReLu layer. The upper branch consisted of a max-pooling layer (Max-Pooling), a convolution layer (Conv 1 × 1), and a batch norm layer (BN). The Max-Pooling had a kernel size of 3 × 3 and a stride of 2 to downsample the input 1 for retaining the features and ensuring the same size as the output layer before the element-wise add operation was conducted in the bottom branch. The Conv 1 × 1 consisted of multiple 1 × 1 convolution kernels with the same number as that in the second convolution layer in the bottom branch to adjust the number of channels. The BN used a regulation function to ensure the input in each layer of the mod