Press "Enter" to skip to content

CNN Models: ResNet, MobileNet, DenseNet, ShuffleNet, EfficientNet


Source:

 

https://medium.com/@CinnamonAITaiwan/cnn%E6%A8%A1%E5%9E%8B-resnet-mobilenet-densenet-shufflenet-efficientnet-5eba5c8df7e4

 

CNN Evolution

 

The chart below compares the size and accuracy of CNN models in common use before 2018. There is no shortage of articles online covering the evolution of CNNs [LeNet/AlexNet/VGG/Inception/ResNet], and many of them are well written. Today we introduce several of the newest CNN models, how to build them, and where their strengths lie.

Figure: comparison of CNN models

Classic CNN Architectures

 

To appreciate what the newest models bring, a few architectural fundamentals are worth knowing first. Let's look at three of them: Inception, residual networks, and depthwise separable convolution.

 

Inception

 

The Inception architecture was first proposed by Google in 2014. Its goal is to combine kernels with different receptive fields. How do we do that? Take a look at the figure below:

Figure: the Inception architecture

The figure shows the classic Inception block. Four branches follow the input feature maps. Three of them start with a 1*1 kernel; this compression mainly keeps the depth of the output channels under control and also adds nonlinearity to the model. The fourth branch starts with max pooling instead. To keep every branch's output feature map at the same height and width, we rely on padding: a 1*1 kernel already preserves the input size, while the 3*3 and 5*5 kernels need padding of 1 and 2 respectively. In TensorFlow and Keras the quickest way is simply to set padding='same', which keeps the output size unchanged whenever the stride is 1. A concrete implementation follows:

 

import tensorflow as tf
def Inception(input_data, input_depth = 192):
    with tf.name_scope('Branch_1'):
        X_1 = tf.layers.conv2d(input_data, 64, (1, 1))
        X_1 = tf.layers.batch_normalization(X_1)
        X_1 = tf.nn.leaky_relu(X_1)
    with tf.name_scope('Branch_2'):
        X_2 = tf.layers.conv2d(input_data, 96, (1, 1))
        X_2 = tf.layers.batch_normalization(X_2)
        X_2 = tf.nn.leaky_relu(X_2)
        X_2 = tf.layers.conv2d(X_2, 128, (3, 3), padding = 'same')
        X_2 = tf.layers.batch_normalization(X_2)
        X_2 = tf.nn.leaky_relu(X_2)
    with tf.name_scope('Branch_3'):
        X_3 = tf.layers.conv2d(input_data, 16, (1, 1))
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)
        X_3 = tf.layers.conv2d(X_3, 48, (3, 3), padding = 'same')
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)
        X_3 = tf.layers.conv2d(X_3, 32, (5, 5), padding = 'same')
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)
    with tf.name_scope('Branch_4'):
        X_4 = tf.layers.max_pooling2d(input_data, 2, 1, padding = 'same')
        X_4 = tf.layers.batch_normalization(X_4)
        X_4 = tf.nn.leaky_relu(X_4)
        X_4 = tf.layers.conv2d(X_4, 32, (1, 1), padding = 'same')
        X_4 = tf.layers.batch_normalization(X_4)
        X_4 = tf.nn.leaky_relu(X_4)
    out = tf.concat((X_1, X_2, X_3, X_4), axis = 3)
    return out
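
A quick sanity check of the block's output shape (the 28*28*192 input is only an illustrative size, roughly an early GoogLeNet stage):

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 28, 28, 192])
out = Inception(inputs)
print(out.shape) ## the four branches concatenate to 64 + 128 + 32 + 32 = 256 channels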

 

Residual Networks

Figure: the residual structure

The figure above shows the classic residual structure: the input is carried across 2–3 layers of F(x) by a shortcut and added back, so the output becomes y = F(x) + x. The benefit is that during backpropagation at least a constant 1 is preserved, which lowers the chance of vanishing gradients.

 

What does that mean? As an example, when we take the partial derivative of y (the output) with respect to x in the function above, one of the terms is x differentiated with respect to itself, which gives 1. Since every factor in the chain rule keeps a 1 around, the gradient is much less likely to vanish, and we can therefore build deeper networks.
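
Spelled out, the argument is just the chain rule through one block:

y = F(x) + x  =>  ∂y/∂x = ∂F(x)/∂x + 1

so the gradient flowing back is ∂Loss/∂x = ∂Loss/∂y · (∂F(x)/∂x + 1); even when ∂F(x)/∂x is tiny, the "+ 1" keeps the gradient from collapsing toward zero.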

A TensorFlow implementation of the residual block:

 

def Residual_Block(input_data, in_channel, out_channel, s = 1):
    X_shortcut = input_data ## remember the input for the shortcut
    X = tf.layers.conv2d(input_data, out_channel, (1, 1), strides = (s, s))
    X = tf.layers.batch_normalization(X)
    X = tf.nn.relu(X)
    X = tf.layers.conv2d(X, out_channel, (3, 3), padding = 'same')
    X = tf.layers.batch_normalization(X)
    X = tf.nn.relu(X)
    X = tf.layers.conv2d(X, out_channel, (1, 1))
    X = tf.layers.batch_normalization(X)
    if (in_channel != out_channel) or (s != 1):
        ## project the shortcut so its depth and spatial size match the main path
        X_shortcut = tf.layers.conv2d(X_shortcut, out_channel, (1, 1), strides = (s, s))
        X_shortcut = tf.layers.batch_normalization(X_shortcut)
    X = X + X_shortcut
    X = tf.nn.relu(X)
    return X
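
A quick usage sketch (the sizes are only illustrative):

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 56, 56, 64])
out = Residual_Block(inputs, 64, 256, s = 1)
print(out.shape) ## 256 output channels, spatial size unchanged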

 

Depthwise Separable Convolution

Figure: depthwise + pointwise convolution

The figure above shows depthwise separable convolution. Unlike an ordinary convolution, it works in two steps:

 

First, the input feature maps are convolved with k*k kernels whose number matches the input depth (the depthwise step), and each feature map is convolved with its own kernel independently.

 

Second, a 1*1 convolution whose depth matches the desired output depth is applied (the pointwise step).

 

The benefit is a large saving in parameters. Below we compute the difference:

 

import tensorflow as tf
# count the total number of trainable parameters
def get_num_params():
  total_parameters = 0
  for variable in tf.trainable_variables():
    shape = variable.get_shape()
    # print(shape)
    # print(len(shape))
    variable_parameters = 1
    for dim in shape:
      # print(dim)
      variable_parameters *= dim.value
    # print(variable_parameters)
    total_parameters += variable_parameters
  return total_parameters
tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 3])
X = tf.layers.conv2d(inputs, 64, (3, 3), strides = (1, 1), activation = tf.nn.leaky_relu)
print(get_num_params()) ## (3*3*3+1)*64=1792
tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 3])
X = tf.layers.separable_conv2d(inputs, 64, (3, 3), padding = 'SAME')
print(get_num_params()) ## 3*3*3+(1*1*3+1)*64=283

 

As the code above shows, for the same 300*300*64 output the separable convolution needs only about 1/6 of the parameters of an ordinary convolution, which is exactly the point of a lightweight model.
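
As a general rule of thumb (a quick calculation, not from the original article): ignoring biases, a standard k*k convolution from C_in to C_out channels needs k*k*C_in*C_out parameters, while a depthwise separable convolution needs k*k*C_in + C_in*C_out, a ratio of 1/C_out + 1/k^2. With a 3*3 kernel the saving therefore approaches roughly 9x as the output depth grows; the example above lands nearer 1/6 mainly because the bias terms are included in the count.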

 

Reference code for depthwise separable convolution:

 

import tensorflow as tf
import tensorflow.contrib as tc
slim = tc.slim
tf.reset_default_graph()
## define a standalone depthwise_conv layer
## reference: https://github.com/TropComplique/shufflenet-v2-tensorflow/blob/master/architecture.py
def depthwise_conv(
        x, kernel=3, stride=1, padding='SAME',
        activation_fn=None, normalizer_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        data_format='NHWC', scope='depthwise_conv'):
    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable(
            'depthwise_weights',
            [kernel, kernel, in_channels, 1], dtype=tf.float32,
            initializer=weights_initializer
        )
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format='NHWC')
        x = normalizer_fn(x) if normalizer_fn is not None else x  # batch normalization
        x = activation_fn(x) if activation_fn is not None else x  # nonlinearity
        return x
      
inputs = tf.placeholder(tf.float32, [None, 300, 300, 3])
out=depthwise_conv(
        inputs, kernel=3, stride=1, padding='SAME',
        activation_fn=None, normalizer_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        data_format='NHWC', scope='depthwise_conv')
print(get_num_params()) ## 3*3*3=27  

## even simpler with slim
def depthwise_conv_bn(x, kernel_size, stride=1, dilation=1):
    with tf.variable_scope(None, 'depthwise_conv_bn'):
        x = slim.separable_conv2d(x, None, kernel_size, depth_multiplier=1, stride=stride,
                                  rate=dilation,)
        #x = slim.batch_norm(x, activation_fn=None, fused=False)
    return x
tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 3])
out = depthwise_conv_bn(inputs, (3,3), stride=1, dilation=1)
print(get_num_params()) ## 3*3*3=27

 

CNN Models

 

ResNetV2

Figure: (a) ResNetV1, (e) ResNetV2 and other variants

ResNetV2 also comes from Kaiming He's team. It inherits ResNetV1's residual idea but makes a few changes to the identity branch (the left path) and the residual branch (the right path).

 

Removing the ReLU after the residual block

 

The authors argue that a ReLU attached after every residual block forces the forward signal into a monotonic increase, which reduces the network's expressive power.

 

Removing the BN on the identity branch

 

And with the layout shown in figure (b), the BN layer changes the distribution of the information on the identity branch and slows convergence. The paper also uses a small trick: compress the depth with a 1*1 kernel first, then restore it with another 1*1 kernel at the end, which reduces the computation.

 

def ResnetV2_block(input_data, input_depth, compress_depth, output_depth, strides = (1, 1)):
    X_shortcut = input_data
    X = tf.layers.conv2d(input_data, compress_depth, (1, 1)) ## compress first
    X = tf.layers.batch_normalization(X)
    X = tf.nn.leaky_relu(X)
    X = tf.layers.conv2d(X, compress_depth, (3, 3), padding = 'same', strides = strides)
    X = tf.layers.batch_normalization(X)
    X = tf.nn.leaky_relu(X)
    X = tf.layers.conv2d(X, output_depth, (1, 1)) ## then expand back
    if(input_depth != output_depth):
        X_shortcut = tf.layers.conv2d(X_shortcut, output_depth, (1, 1), strides = strides, padding = 'same') ## depth mismatch: project the shortcut
    if (input_depth == output_depth) and (strides != (1, 1)):
        X_shortcut = tf.image.resize_images(X_shortcut, (X.shape[1], X.shape[2]), method = 0) ## size mismatch: resize the shortcut
    out = X_shortcut + X
    return out

 

With this residual block in hand, you can rebuild the ResNetV2 model following the hyperparameters given in the paper; the paper also describes several variants of the residual block that interested readers can explore further. A sketch of how the blocks might be stacked follows.
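
As a minimal sketch (the stem and stage sizes below are illustrative, not the exact configuration from the paper):

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 224, 224, 3])
X = tf.layers.conv2d(inputs, 64, (7, 7), strides = (2, 2), padding = 'same') ## stem convolution
X = ResnetV2_block(X, 64, 64, 256)                     ## projects 64 -> 256 channels
X = ResnetV2_block(X, 256, 64, 256)                    ## identity block keeps the depth
X = ResnetV2_block(X, 256, 128, 512, strides = (2, 2)) ## downsampling block
print(get_num_params())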

 

Inception-ResNet

Figure: the Inception-ResNet-A block

Inception-ResNet is also a family you will often use in practice, for example Inception-ResNetV2 and InceptionV4. With the Inception and residual block concepts covered above, Inception-ResNet is easy to understand: the core idea is to replace the residual branch of a residual block with an Inception structure. The paper proposes three different combinations; here we implement the Inception-ResNet-A block.

 

def InceptionResnetA_block(input_data, input_depth = 3, output_depth = 384):
    X_shortcut = input_data
    with tf.name_scope('Branch_1'):
        X_1 = tf.layers.conv2d(input_data, 32, (1, 1))
        X_1 = tf.layers.batch_normalization(X_1)
        X_1 = tf.nn.leaky_relu(X_1)
    with tf.name_scope('Branch_2'):
        X_2 = tf.layers.conv2d(input_data, 32, (1, 1))
        X_2 = tf.layers.batch_normalization(X_2)
        X_2 = tf.nn.leaky_relu(X_2)
        X_2 = tf.layers.conv2d(X_2, 32, (3, 3), padding = 'same')
        X_2 = tf.layers.batch_normalization(X_2)
        X_2 = tf.nn.leaky_relu(X_2)
    with tf.name_scope('Branch_3'):
        X_3 = tf.layers.conv2d(input_data, 32, (1, 1))
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)
        X_3 = tf.layers.conv2d(X_3, 48, (3, 3), padding = 'same')
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)
        X_3 = tf.layers.conv2d(X_3, 64, (3, 3), padding = 'same')
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)
    out = tf.concat((X_1, X_2, X_3), axis = 3)
    out = tf.layers.conv2d(out, output_depth, (1, 1))
    if(input_depth != output_depth):
        X_shortcut = tf.layers.conv2d(X_shortcut, output_depth, (1, 1))
    out = X_shortcut + out
    return out
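
A quick usage sketch (the 35*35 input roughly mirrors the stage where the A blocks sit in the paper; treat the sizes as illustrative):

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 35, 35, 384])
out = InceptionResnetA_block(inputs, input_depth = 384, output_depth = 384)
print(get_num_params())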

 

DenseNet

Figure: the DenseNet architecture

DenseNet is one of the representative lightweight models. The code below implements a dense stage block (we also bring in depthwise separable convolution to save further parameters and speed the model up; the original paper uses ordinary convolution layers):

 

def Dense_Stage(inputs_, depth=64, repeat=8):
    for _ in range(repeat):
        X_input = inputs_
        X = tf.layers.conv2d(inputs_,depth, (1,1), strides=(1,1), activation=tf.nn.leaky_relu)
        X = tf.layers.batch_normalization(X)
        X = tf.layers.separable_conv2d(X, depth, (3,3), padding='SAME')
        X = tf.nn.leaky_relu(X)
        X = tf.layers.batch_normalization(X)
        X = tf.concat([X_input,X],3)
        inputs_ = X
    return X
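
A quick usage sketch (input size and growth settings chosen only for illustration); note how the channel depth grows with every repeat, which is the hallmark of dense connectivity:

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 32])
out = Dense_Stage(inputs, depth = 64, repeat = 8)
print(out.shape) ## the depth grows by 64 per repeat: 32 + 8*64 = 544 channels
print(get_num_params())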

 

ShuffleNetV2

 

When it comes to lightweight models, ShuffleNet is arguably the standout among the models in common use. Lightweight models grew out of two main lines of work: SqueezeNet, from UC Berkeley and Stanford University, and MobileNet, from Google. Depthwise separable convolution originated with MobileNet, and SqueezeNet's principle is very similar to Inception, so we won't go into it further here.

 

ShuffleNet builds on SqueezeNet with a few changes, and its principle has a lot in common with depthwise separable convolution. Depthwise separable convolution consists of a depthwise plus a pointwise convolution, and the pointwise convolution exists because the depthwise step keeps channels separated: each kernel convolves only a single feature map and never sees information across channels. Group convolution in ShuffleNet suffers from the same blocked channels (see the figure below; it is very similar to depthwise convolution). But instead of MobileNet's pointwise convolution, ShuffleNet fixes it with a 'shuffle': the feature maps of the different groups are simply reshuffled and passed to the next layer. That saves the parameters a pointwise convolution would have cost and pushes the model into the 'ultra-lightweight' class.

Figure: group convolution

With these basics in place, let's look at the important changes ShuffleNetV2 makes compared with V1:

 

1*1 convolutions

 

First, V1 uses a large number of 1*1 convolutions, which drives up the MAC (memory access cost); in depthwise separable convolution the pointwise convolution already accounts for most of the computation and parameters. V2 therefore starts each block by splitting the incoming feature maps.

 

Concatenation at the output

 

The authors also found that element-wise operations such as addition and ReLU are a major contributor to MAC, so V2 replaces V1's Add with Concat.

Figure: ShuffleNetV1 and ShuffleNetV2, (a) basic V1 block, (b) V1 block with downsampling, (c) basic V2 block, (d) V2 block with downsampling

The code below shows how to build a ShuffleNetV2 block. The main thing to watch is that the channel depth of the feature maps being shuffled must be divisible by shuffle_group.

 

## reference: https://github.com/timctho/shufflenet-v2-tensorflow/blob/master/module.py
## reference: https://github.com/TropComplique/shufflenet-v2-tensorflow/blob/master/architecture.py
def shuffle_unit(x, groups):  ## channel shuffle for the feature maps produced by depthwise_conv
    with tf.variable_scope('shuffle_unit'):
        n, h, w, c = x.get_shape().as_list()
        x = tf.reshape(x, shape=([tf.shape(x)[0], h, w, groups, c // groups]))
        x = tf.transpose(x, tf.convert_to_tensor([0, 1, 2, 4, 3]))
        x = tf.reshape(x, shape=[tf.shape(x)[0], h, w, c])
    return x
def depthwise_conv(
        x, kernel=3, stride=1, padding='SAME',
        activation_fn=None, normalizer_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        data_format='NHWC', scope='depthwise_conv'):      ## plain depthwise convolution
    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable(
            'depthwise_weights',
            [kernel, kernel, in_channels, 1], dtype=tf.float32,
            initializer=weights_initializer
        )
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format='NHWC')
        x = tf.layers.batch_normalization(x) if normalizer_fn is not None else x  # batch normalization
        x = tf.nn.leaky_relu(x) if activation_fn is not None else x  # nonlinearity
        return x
    
def conv_bn_relu(x, out_channel, kernel_size, stride=1):  ## plain convolution + BN + ReLU
    with tf.variable_scope(None, 'conv_bn_relu'):
        x = tf.layers.conv2d(x, out_channel, kernel_size, stride,)
        x = tf.nn.leaky_relu(tf.layers.batch_normalization(x))
    return x
def shufflenet_v2_block(x, out_channel, kernel_size, stride=1, shuffle_group=2): ##shufflenet_v2_block
    with tf.variable_scope(None, 'shuffle_v2_block'):
        if stride == 1:
            top, bottom = tf.split(x, num_or_size_splits=2, axis=3)
            half_channel = out_channel // 2
            top = conv_bn_relu(top, half_channel, 1)
            top = depthwise_conv_bn(top, kernel_size, stride)
            top = conv_bn_relu(top, half_channel, 1)
            out = tf.concat([top, bottom], axis=3)
            out = shuffle_unit(out, shuffle_group)
        else:   ## downsampling block
            half_channel = out_channel // 2
            b0 = conv_bn_relu(x, half_channel, 1)
            b0 = depthwise_conv_bn(b0, kernel_size, stride)
            b0 = conv_bn_relu(b0, half_channel, 1)
            b1 = depthwise_conv_bn(x, kernel_size, stride)
            b1 = conv_bn_relu(b1, half_channel, 1)
            out = tf.concat([b0, b1], axis=3)
            out = shuffle_unit(out, shuffle_group)
        return out
tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 4])
out = shufflenet_v2_block(inputs, 4, (3,3), stride=1, shuffle_group=2) ## out_channel equal to the input depth keeps the concatenated channels divisible by shuffle_group
print(get_num_params())
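
And a sketch of the downsampling path (stride = 2; the channel numbers are only illustrative):

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 4])
out = shufflenet_v2_block(inputs, 8, (3,3), stride=2, shuffle_group=2) ## spatial size halves, depth grows to 8
print(get_num_params())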

 

EfficientNet

 

EfficientNet was proposed by Google in 2019. Using Google's AutoML technology it defines eight efficient models, B0 through B7. If we take the details apart, the bottleneck is really the inverted residual block proposed by MobileNetV2 combined with a Squeeze-and-Excitation network, so once we can build the MBConv block we can reproduce the EfficientNet architecture. Below we first look at the important changes MobileNetV2 makes compared with MobileNetV1 and ResNet.

Figure: the EfficientNet-B0 architecture

Expand first, then compress

 

The authors observe that when a feature map with few channels passes through a ReLU activation, every value becomes greater than or equal to zero and a lot of information is lost. So unlike ResNet, which compresses first, and MobileNetV1, which applies depthwise separable convolution directly, MobileNetV2 first expands the depth of the feature maps with a pointwise convolution.

 

Shortcut connections

 

Unlike V1, V2 adopts the ResNet idea and adds shortcut connections across the feature maps.

 

A linear activation at the output

 

As mentioned above, the authors consider ReLU a poor fit for feature maps with few channels, so the block's output layer switches to a linear activation; if you do want to use ReLU there, make sure the output channel depth is large enough.

Figure: MobileNetV1, MobileNetV2 and ResNet compared

Comparing with the ShuffleNet architecture from the other camp: ShuffleNetV2 had not yet been released when MobileNetV2 came out, so the figure compares against ShuffleNetV1.

Figure: ShuffleNet and MobileNetV2 compared

The code below shows how to build MobileNetV2's residual block.

 

def depthwise_conv(x, kernel = 3, stride = 1, padding = 'SAME',
        activation_fn = None, normalizer_fn = None,
        weights_initializer = tf.contrib.layers.xavier_initializer(),
        data_format = 'NHWC', scope = 'depthwise_conv'):
        ## plain depthwise convolution
    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable('depthwise_weights',
            [kernel, kernel, in_channels, 1], dtype = tf.float32,
            initializer = weights_initializer)
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format = 'NHWC')
        x = tf.layers.batch_normalization(x) if normalizer_fn is not None else x  # batch normalization
        x = tf.nn.leaky_relu(x) if activation_fn is not None else x  # nonlinearity
        return x

def res_block(input, expansion_ratio, output_dim, stride, name, bias = False, shortcut = True):
    with tf.name_scope(name), tf.variable_scope(name):
        # pw
        bottleneck_dim = round(expansion_ratio * input.get_shape().as_list()[-1])
        net = tf.layers.conv2d(input, bottleneck_dim, (1, 1), name = 'pw',
                        kernel_regularizer = tf.contrib.layers.l2_regularizer(0.003), use_bias = bias) ## expand first
        net = tf.layers.batch_normalization(net, name = 'pw_bn')
        net = tf.nn.relu6(net)
        # dw
        net = depthwise_conv(net, stride = stride)
        net = tf.layers.batch_normalization(net, name = 'dw_bn')
        net = tf.nn.relu6(net)
        # pw & linear
        net = tf.layers.conv2d(net, output_dim, (1, 1), name = 'pw_linear',
                        kernel_regularizer = tf.contrib.layers.l2_regularizer(0.003), use_bias = bias) ## project back to the output depth
        net = tf.layers.batch_normalization(net, name = 'pw_linear_bn')
        # element wise add, only for stride==1
        if shortcut and stride == 1:
            in_dim = int(input.get_shape().as_list()[-1])
            if in_dim != output_dim:
                ins = tf.layers.conv2d(input, output_dim, (1, 1), name = 'ex_dim',
                        kernel_regularizer = tf.contrib.layers.l2_regularizer(0.003), use_bias = bias)
                net = ins + net
            else:
                net = input + net
        return net
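
A quick usage sketch (the expansion ratio of 6 is the value the MobileNetV2 paper uses for most bottlenecks; the remaining numbers are only illustrative):

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 32])
out = res_block(inputs, 6, 16, 1, 'res_block_1') ## expand 32 -> 192, then project down to 16
print(get_num_params())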

 

SENet (Squeeze-and-Excitation Networks)

Figure: the SENet block

With the inverted residual block in place, we still need the Squeeze-and-Excitation network. The core idea of SENet is to let the network learn a weight for each feature map, so that useful feature maps get large weights and useless or weak ones get small weights, training the model toward a better result. To me it feels quite similar to attention.

 

Combined with the residual structure, the block looks like the figure above: global average pooling gathers the global information (squeeze), FC layers extract the semantic information, first compressing and then expanding (excitation), and finally the coefficient obtained for each feature map is multiplied back onto the original input (using ReLU in the compression layer feels like an odd fit?). The code below combines this with the inverted residual block to build MBConv.

 

import tensorflow as tf
def depthwise_conv(
        x, kernel=3, stride=1, padding='SAME',
        activation_fn=None, normalizer_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        data_format='NHWC', scope='depthwise_conv'):      ## plain depthwise convolution
    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable(
            'depthwise_weights',
            [kernel, kernel, in_channels, 1], dtype=tf.float32,
            initializer=weights_initializer
        )
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format='NHWC')
        x = tf.layers.batch_normalization(x) if normalizer_fn is not None else x  # batch normalization
        x = tf.nn.leaky_relu(x) if activation_fn is not None else x  # nonlinearity
        return x

def MBConvBlock(input, expansion_ratio, output_dim, stride, name, squeeze ,bias=False, shortcut=True, 
                use_Squeeze_Excitation=True):
    with tf.name_scope(name), tf.variable_scope(name):
        # pw
        bottleneck_dim = round(expansion_ratio*input.get_shape().as_list()[-1]) 
        net = tf.layers.conv2d(input, bottleneck_dim,(1,1), name='pw', 
                               kernel_regularizer=tf.contrib.layers.l2_regularizer(0.003), use_bias=bias) ## expand first
        net = tf.layers.batch_normalization(net, name='pw_bn')
        net = tf.nn.relu6(net)
        # dw
        net = depthwise_conv(net, stride=stride)
        net = tf.layers.batch_normalization(net, name='dw_bn')
        net = tf.nn.relu6(net)
        # pw & linear
        net = tf.layers.conv2d(net, output_dim,(1,1), name='pw_linear', 
                               kernel_regularizer=tf.contrib.layers.l2_regularizer(0.003), use_bias=bias) ## project back to the output depth
        net = tf.layers.batch_normalization(net, name='pw_linear_bn')
        
        # SENET-Squeeze-Excitation
        if use_Squeeze_Excitation:
            in_dim=int(net.get_shape().as_list()[-1])
            Squeeze=tf.layers.average_pooling2d(net, net.get_shape()[1:-1], 1)
            Squeeze=tf.nn.relu(tf.layers.dense(Squeeze, use_bias=False, units=in_dim//squeeze))
            Excitation=tf.nn.relu(tf.layers.dense(Squeeze, use_bias=False, units=output_dim))
            Excitation=tf.nn.sigmoid(Excitation)
            net = tf.reshape(Excitation, [-1,1,1,output_dim])*net
        
        in_dim=int(input.get_shape().as_list()[-1])
        if shortcut and stride == 1:
            if in_dim != output_dim:
                ins = tf.layers.conv2d(input, output_dim,(1,1), name='ex_dim', 
                               kernel_regularizer=tf.contrib.layers.l2_regularizer(0.003), use_bias=bias) 
                net = ins+net
            else:
                net = input+net
        return net
    
    
tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 43])
out = MBConvBlock(inputs, 4, 64, 1, 'first', 4,bias=False, shortcut=True, use_Squeeze_Excitation=True)

 

MobileNetV3

 

One can only say that the big names publish papers faster than we can read them. MobileNetV3 inherits MobileNetV1's depthwise separable convolution and MobileNetV2's shortcut connections and expand-then-compress idea, and adds Squeeze-and-Excitation networks on top, so the overall block is very similar to EfficientNet's MBConvBlock. Beyond that, MobileNetV3 also changes the activation functions:

 

In some blocks ReLU is replaced with h-swish, and sigmoid with h-sigmoid. H-swish is modeled on the swish function, mainly because swish itself is slow to compute; the authors verify experimentally that using h-swish improves accuracy.

Figure: the h-swish activation function

Figure: the differences between the activation functions
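
For reference, the definitions used in the code below are h-swish(x) = x * ReLU6(x + 3) / 6 and h-sigmoid(x) = ReLU6(x + 3) / 6, piecewise-linear approximations of swish and sigmoid that avoid computing an exponential.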

The figure below compares MobileNetV3 with MobileNetV2; at the same latency, the MobileNetV3 models consistently come out ahead on Top-1 accuracy.

Figure: MobileNetV3 vs. MobileNetV2

The code below implements MobileNetV3's bottleneck.

 

import tensorflow as tf
def Hswish(input_):
    return input_* tf.nn.relu6(input_ + 3.) / 6.
def Hsigmoid(input_):
    return tf.nn.relu6(input_ + 3.) / 6.
def depthwise_conv(
        x, kernel=3, stride=1, padding='SAME',
        activation_fn=None, normalizer_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        data_format='NHWC', scope='depthwise_conv'):      ## plain depthwise convolution
    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable(
            'depthwise_weights',
            [kernel, kernel, in_channels, 1], dtype=tf.float32,
            initializer=weights_initializer
        )
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format='NHWC',)
        x = tf.layers.batch_normalization(x) if normalizer_fn is not None else x  # batch normalization
        x = tf.nn.leaky_relu(x) if activation_fn is not None else x  # nonlinearity
        return x
    
def SEBlock(input_, squeeze=4):
    in_dim=int(input_.get_shape().as_list()[-1])
    Squeeze = tf.layers.average_pooling2d(input_, input_.get_shape()[1:-1], 1)
    Squeeze = tf.nn.relu(tf.layers.dense(Squeeze, use_bias=False, units=in_dim//squeeze)) 
    Excitation = tf.nn.relu(tf.layers.dense(Squeeze, use_bias=False, units=in_dim))
    Excitation = Hsigmoid(Excitation) ##Hsigmoid replace Sigmoid
    Excitation = tf.reshape(Excitation, [-1,1,1,in_dim])
    return input_*Excitation
    
    
def MobileV3Bottleneck(input_,expand_size, squeeze,out_size, kernel_size,stride=1, relu=True, se=True):
    Shortcut = input_
    in_dim = int(input_.get_shape().as_list()[-1])
    out = tf.layers.batch_normalization(tf.layers.conv2d(input_,expand_size, (1,1), (1,1), use_bias=False))
    if relu:
        out = tf.nn.relu(out) #or relu6
    else:
        out = Hswish(out)
    out = depthwise_conv(out, kernel=kernel_size, stride=stride, padding='SAME')
    out = tf.layers.batch_normalization(out)
    if relu:
        out = tf.nn.relu(out) #or relu6
    else:
        out = Hswish(out)
        
    out = tf.layers.batch_normalization(tf.layers.conv2d(out, out_size, (1,1), (1,1), use_bias=False))
    
    if (in_dim != out_size) and (stride == 1):
        Shortcut = tf.layers.conv2d(Shortcut,out_size, (1,1), strides = (stride, stride), use_bias=False)
        Shortcut = tf.layers.batch_normalization(Shortcut)
    if se:
        assert squeeze <= out_size
        out = SEBlock(out,squeeze=squeeze)
    out = out + Shortcut if stride == 1 else out
    return out
tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 80])
out = MobileV3Bottleneck(inputs, 480, 4, 112, 3, stride=1, relu=False, se=True)

 

Conclusion

 

Today we introduced the core techniques behind several of the newest CNN architectures. We hope everyone takes something away from it, so that the next time you build a model you are not limited to pretrained models but can craft the model that best fits your own needs and ideas.
