## 0603 - Common Neural Network Layers

Full PyTorch tutorial index: https://www.cnblogs.com/nickchen121/p/14662511.html

## 1. Image-Related Layers

```python
import torch as t
from torch import nn
from torch.autograd import Variable as V
from PIL import Image
from torchvision.transforms import ToTensor, ToPILImage

to_tensor = ToTensor()  # converts a PIL image to a Tensor
to_pil = ToPILImage()
nick = Image.open('img/0802程序输出nick图.jpeg')  # sacrificing my handsome face for the demo
nick
```

```python
# Convolution with a sharpening kernel -- see the feature-extraction discussion
# in the earlier post 06-01 DeepLearning-图像识别
# The input is a batch with batch_size=1
inp = to_tensor(nick).unsqueeze(0)
# Construct the kernel
kernel = t.ones(3, 3, 3) / -9  # sharpening kernel: emphasizes the subject's edges, suppresses the background
kernel[:, 1, 1] = 1
conv = nn.Conv2d(
    in_channels=3,   # number of input channels
    out_channels=1,  # number of output channels, i.e. the number of kernels
    kernel_size=3,   # kernel size
    stride=1,
    bias=False)
conv.weight.data = kernel.view(1, 3, 3, 3)
"""
conv.weight.size() prints torch.Size([1, 3, 3, 3])
"""
out = conv(V(inp))
to_pil(out.data.squeeze(0))
```

```python
# An ordinary convolution with randomly initialized weights
# The input is a batch with batch_size=1
inp = to_tensor(nick).unsqueeze(0)
# Construct the convolution layer
conv = nn.Conv2d(
    in_channels=3,   # number of input channels
    out_channels=1,  # number of output channels, i.e. the number of kernels
    kernel_size=3,   # kernel size
    stride=1,
    bias=False)
out = conv(V(inp))
to_pil(out.data.squeeze(0))
```
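For a `Conv2d` without padding, the output spatial size is `floor((H - K) / S) + 1` along each dimension. A minimal sketch (using plain `torch` imports and a random input with the same spatial size as the example image, 318×320):

```python
import torch
from torch import nn

# Without padding, output size = floor((H - K) / S) + 1 along each spatial dim.
conv = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=3, stride=1, bias=False)
x = torch.randn(1, 3, 318, 320)  # same spatial size as the example image
y = conv(x)
print(y.shape)  # torch.Size([1, 1, 316, 318])
```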

```python
pool = nn.AvgPool2d(2, 2)
list(pool.parameters())
```

`[]`

```python
out = pool(V(inp))
print(inp.size(), out.size())
to_pil(out.data.squeeze(0))
```

`torch.Size([1, 3, 318, 320]) torch.Size([1, 3, 159, 160])`
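As the empty parameter list above shows, pooling layers have nothing to learn. A tiny sketch with made-up values, using max pooling instead of average pooling:

```python
import torch
from torch import nn

pool = nn.MaxPool2d(2, 2)
x = torch.tensor([[[[1., 2.],
                    [3., 4.]]]])  # shape (1, 1, 2, 2)
print(pool(x))                    # tensor([[[[4.]]]]) -- the max of the 2x2 window
print(list(pool.parameters()))    # [] -- no learnable parameters
```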

Other commonly used layers:

- Linear: fully connected layer.
- BatchNorm: batch normalization layer, available in 1D, 2D and 3D variants. Besides the standard BatchNorm, there is also the InstanceNorm layer, often used in style transfer.
- Dropout: dropout layer for preventing overfitting, also available in 1D, 2D and 3D variants.

```python
# Input: batch_size=2, 3 features per sample
inp = V(t.randn(2, 3))
linear = nn.Linear(3, 4)
h = linear(inp)  # (2,3)*(3,4)=(2,4)
h
```

```
tensor([[-0.7140, -0.0469,  1.1187,  2.0739],
        ...])
```
```python
# 4 channels; initialize the scale (std) to 4 and the shift (mean) to 0
bn = nn.BatchNorm1d(4)
bn.weight.data = t.ones(4) * 4
bn.bias.data = t.zeros(4)
bn_out = bn(h)
# Note the mean and variance of the output
# Variance is the square of the standard deviation; the unbiased variance divides by n-1
# With unbiased=False the denominator is n instead
bn_out.mean(0), bn_out.var(0, unbiased=False)
```

```
(tensor([0., 0., 0., 0.], grad_fn=<MeanBackward1>),
 ...)
```
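The normalization the layer performs can be checked by hand. A sketch with its own random input (not the `h` from above) that standardizes each feature using the biased variance and then applies the scale of 4:

```python
import torch
from torch import nn

torch.manual_seed(0)
h = torch.randn(2, 4)
bn = nn.BatchNorm1d(4)
bn.weight.data = torch.ones(4) * 4
bn.bias.data = torch.zeros(4)
out = bn(h)
# Reproduce the computation manually: per-feature standardization (biased
# variance in the denominator), then multiply by the scale weight.
manual = (h - h.mean(0)) / torch.sqrt(h.var(0, unbiased=False) + bn.eps) * 4
print(torch.allclose(out, manual, atol=1e-5))  # True
```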
```python
# Each element is dropped with probability 0.5
dropout = nn.Dropout(0.5)
o = dropout(bn_out)
bn_out, o  # about half of the values become 0
```

```
(tensor([[ 3.9993,  4.0000,  4.0000,  4.0000],
         [-3.9993, -4.0000, -4.0000, -4.0000]], ...),
 tensor([[ 0.0000,  8.0000,  8.0000,  0.0000],
         ...))
```
## 2.1 The ReLU Function

torch implements the common activation functions; for details see the official documentation or the article shared above. Note while reading that activation functions can also be used as standalone layers. Here we cover the most commonly used one, ReLU, whose mathematical definition is \(\mathrm{ReLU}(x) = \max(0, x)\): it simply sets every negative value to 0.

```python
relu = nn.ReLU(inplace=True)
inp = V(t.randn(2, 3))
inp
```

```
tensor([[-1.5703,  0.0868,  1.0811],
        [-0.9903,  0.5288,  0.5926]])
```

```python
output = relu(inp)  # negative values are set to 0, equivalent to inp.clamp(min=0)
output
```

```
tensor([[0.0000, 0.0868, 1.0811],
        [0.0000, 0.5288, 0.5926]])
```
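The `inplace=True` flag above makes ReLU overwrite its input tensor instead of allocating a new one, which saves memory but destroys the original values. A quick sketch of the difference on a toy tensor:

```python
import torch
from torch import nn

x = torch.tensor([-1.0, 0.5])
out = nn.ReLU(inplace=False)(x)
print(x)    # input unchanged: tensor([-1.0000,  0.5000])
nn.ReLU(inplace=True)(x)
print(x)    # input overwritten: tensor([0.0000, 0.5000])
```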

## 2.2 Building a Feedforward Network with Sequential

```python
# Three ways of using Sequential
net1 = nn.Sequential()
net1.add_module('conv', nn.Conv2d(3, 3, 3))
net1.add_module('batchnorm', nn.BatchNorm2d(3))
net1.add_module('activation_layer', nn.ReLU())

net2 = nn.Sequential(nn.Conv2d(3, 3, 3), nn.BatchNorm2d(3), nn.ReLU())

from collections import OrderedDict
net3 = nn.Sequential(
    OrderedDict([('conv1', nn.Conv2d(3, 3, 3)), ('bn1', nn.BatchNorm2d(3)),
                 ('relu1', nn.ReLU())]))
```

`net1`

```
Sequential(
  (conv): Conv2d(3, 3, kernel_size=(3, 3), stride=(1, 1))
  (batchnorm): BatchNorm2d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (activation_layer): ReLU()
)
```

`net2`

```
Sequential(
  (0): Conv2d(3, 3, kernel_size=(3, 3), stride=(1, 1))
  (1): BatchNorm2d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU()
)
```

`net3`

```
Sequential(
  (conv1): Conv2d(3, 3, kernel_size=(3, 3), stride=(1, 1))
  (bn1): BatchNorm2d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu1): ReLU()
)
```

```python
# Submodules can be retrieved by name or by index
net1.conv, net2[0], net3.conv1
```

```
(Conv2d(3, 3, kernel_size=(3, 3), stride=(1, 1)),
 Conv2d(3, 3, kernel_size=(3, 3), stride=(1, 1)),
 Conv2d(3, 3, kernel_size=(3, 3), stride=(1, 1)))
```

```python
# Feeding data through a network built with Sequential
inp = V(t.rand(1, 3, 4, 4))
output = net1(inp)
output = net2(inp)
output = net3(inp)
# Equivalent to calling the submodules one by one
output = net3.relu1(net1.batchnorm(net1.conv(inp)))
```

## 2.3 Building a Feedforward Network with ModuleList

```python
modellist = nn.ModuleList([nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 2)])
inp = V(t.rand(1, 3))
for model in modellist:
    inp = model(inp)
# output = modellist(inp)  # raises an error: ModuleList implements no forward method;
# subclass nn.Module and define forward yourself instead
```

```python
class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        # A plain Python list: its submodules are NOT registered with the model
        self.list = [nn.Linear(3, 4), nn.ReLU()]
        # A ModuleList: its submodules ARE registered
        self.module_list = nn.ModuleList([nn.Conv2d(3, 3, 3), nn.ReLU()])

    def forward(self):
        pass

model = MyModule()
model
```

```
MyModule(
  (module_list): ModuleList(
    (0): Conv2d(3, 3, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU()
  )
)
```

```python
for name, param in model.named_parameters():
    print(name, param.size())
```

```
module_list.0.weight torch.Size([3, 3, 3, 3])
module_list.0.bias torch.Size([3])
```
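The printout above lists only `module_list` parameters: the plain Python list `self.list` is invisible to `named_parameters()`, so an optimizer built from `model.parameters()` would never update its layers. A condensed sketch of the same pitfall (class and attribute names are my own):

```python
import torch
from torch import nn

class Pitfall(nn.Module):
    def __init__(self):
        super().__init__()
        self.plain = [nn.Linear(3, 4)]                       # NOT registered
        self.registered = nn.ModuleList([nn.Linear(3, 4)])   # registered

m = Pitfall()
names = [name for name, _ in m.named_parameters()]
print(names)  # ['registered.0.weight', 'registered.0.bias'] -- nothing from self.plain
```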

## 3. Recurrent Neural Network Layers

The difference between RNN and RNNCell is that the former processes an entire sequence at once, while the latter processes a single time step of the sequence at a time. RNN is easier to use; RNNCell is more flexible. In effect, an RNN is a loop of RNNCell calls composed over the sequence.

```python
t.manual_seed(1000)
inp = V(t.randn(2, 3, 4))  # input: sequence length 2, batch_size=3, 4 features per element
lstm = nn.LSTM(4, 3, 1)  # LSTM: 4-dimensional input, 3 hidden units, 1 layer
# Initial state: 1 layer, batch_size=3, 3 hidden units
h0 = V(t.randn(1, 3, 3))
c0 = V(t.randn(1, 3, 3))
out, hn = lstm(inp, (h0, c0))
out
```

```
tensor([[[-0.3610, -0.1643,  0.1631],
         [-0.0613, -0.4937, -0.1642],
         [ 0.5080, -0.4175,  0.2502]],

        [[-0.0703, -0.0393, -0.0429],
         [ 0.2085, -0.3005, -0.2686],
         ...]])
```

```python
t.manual_seed(1000)
inp = V(t.randn(2, 3, 4))  # input: sequence length 2, batch_size=3, 4 features per element
# An LSTMCell always corresponds to a single layer
lstm = nn.LSTMCell(4, 3)  # 4-dimensional input, 3 hidden units
hx = V(t.randn(3, 3))
cx = V(t.randn(3, 3))
out = []
for i_ in inp:
    hx, cx = lstm(i_, (hx, cx))
    out.append(hx)
t.stack(out)
```

```
tensor([[[-0.3610, -0.1643,  0.1631],
         [-0.0613, -0.4937, -0.1642],
         [ 0.5080, -0.4175,  0.2502]],

        [[-0.0703, -0.0393, -0.0429],
         [ 0.2085, -0.3005, -0.2686],
         ...]])
```
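The two outputs match because the same seed makes both modules draw identical initial weights and states. The equivalence can also be checked explicitly by copying an `nn.LSTM` layer's parameters into an `nn.LSTMCell` and comparing the step-by-step outputs (a sketch of my own, not from the original):

```python
import torch
from torch import nn

torch.manual_seed(0)
inp = torch.randn(2, 3, 4)  # (seq_len, batch, input_size)
lstm = nn.LSTM(4, 3, 1)
cell = nn.LSTMCell(4, 3)
# Copy the layer's parameters into the cell so both compute the same function
cell.weight_ih.data = lstm.weight_ih_l0.data
cell.weight_hh.data = lstm.weight_hh_l0.data
cell.bias_ih.data = lstm.bias_ih_l0.data
cell.bias_hh.data = lstm.bias_hh_l0.data

h0 = torch.zeros(1, 3, 3)
c0 = torch.zeros(1, 3, 3)
out, _ = lstm(inp, (h0, c0))

hx, cx = h0[0], c0[0]
steps = []
for x_t in inp:  # unroll the sequence manually, one time step per call
    hx, cx = cell(x_t, (hx, cx))
    steps.append(hx)
print(torch.allclose(out, torch.stack(steps), atol=1e-6))  # True
```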

```python
embedding = nn.Embedding(4, 5)  # a vocabulary of 4 words, each represented by a 5-dimensional vector
# The embedding can be initialized with pretrained word vectors
embedding.weight.data = t.arange(0, 20).view(4, 5)
```

```python
with t.no_grad():  # gradients cannot flow through this example, so disable them
    inp = V(t.arange(3, 0, -1)).long()
    output = embedding(inp)
    print(output)
```

```
tensor([[15, 16, 17, 18, 19],
        [10, 11, 12, 13, 14],
        [ 5,  6,  7,  8,  9]])
```
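A more convenient way to load pretrained vectors is `nn.Embedding.from_pretrained`, which also freezes the weights by default (a sketch with the same toy matrix):

```python
import torch
from torch import nn

weights = torch.arange(0, 20).view(4, 5).float()
emb = nn.Embedding.from_pretrained(weights)  # freeze=True by default
idx = torch.tensor([3, 2, 1])
print(emb(idx))                  # rows 3, 2, 1 of the weight matrix
print(emb.weight.requires_grad)  # False: the table will not be trained
```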

## 4. Loss Functions

```python
# batch_size=3; compute a score for each class (only two classes here)
score = V(t.randn(3, 2))
# The three samples belong to classes 1, 0 and 1; the label must be a LongTensor
label = V(t.Tensor([1, 0, 1])).long()
# A loss is no different from an ordinary layer
criterion = nn.CrossEntropyLoss()
loss = criterion(score, label)
loss
```

```
tensor([[ 1.0592,  1.4730],
        [-0.1558, -0.8712],
        [ 0.2548,  0.0817]])
tensor([1, 0, 1])

tensor(0.5630)
```
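`nn.CrossEntropyLoss` combines log-softmax and the negative log-likelihood loss in a single layer, so the value above can be reproduced by hand (a sketch with its own random scores):

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
score = torch.randn(3, 2)
label = torch.tensor([1, 0, 1])
loss1 = nn.CrossEntropyLoss()(score, label)
# Equivalent: log-softmax over the class dimension followed by NLL loss
loss2 = F.nll_loss(F.log_softmax(score, dim=1), label)
print(torch.allclose(loss1, loss2))  # True
```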