## A 60-Minute Introduction to the Deep Learning Tool PyTorch

https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html

Download from GitHub:

https://github.com/fengdu78/machine_learning_beginner/tree/master/PyTorch_beginner

## 1. What Is PyTorch?

### Getting Started

#### Tensors

```python
from __future__ import print_function
import torch
```

```python
x = torch.Tensor(5, 3)
print(x)
```

```
tensor([[ 0.0000e+00,  0.0000e+00,  1.3004e-42],
        [ 0.0000e+00,  7.0065e-45,  0.0000e+00],
        [-3.8593e+35,  7.8753e-43,  0.0000e+00],
        [ 0.0000e+00,  1.8368e-40,  0.0000e+00],
        [-3.8197e+35,  7.8753e-43,  0.0000e+00]])
```

```python
x = torch.zeros(5, 3, dtype=torch.long)
print(x)
```

```
tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
```

```python
x = torch.tensor([5.5, 3])
print(x)
```

`tensor([5.5000, 3.0000])`

```python
x = x.new_ones(5, 3, dtype=torch.double)      # new_* methods take in sizes
print(x)
x = torch.randn_like(x, dtype=torch.float)    # override the dtype!
print(x)                                      # the result has the same size
```

```
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[ 1.1701, -0.8342, -0.6769],
        [-1.3060,  0.3636,  0.6758],
        [ 1.9133,  0.3494,  1.1412],
        [ 0.9735, -0.9492, -0.3082],
        [ 0.9469, -0.6815, -1.3808]])
```

`print(x.size())`

`torch.Size([5, 3])`

**Note**

`torch.Size` is in fact a tuple, so it supports all tuple operations.
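Since `torch.Size` is a tuple subclass, tuple operations such as unpacking, indexing, and `len()` work on it directly (a small sketch):

```python
import torch

x = torch.zeros(5, 3)
size = x.size()            # torch.Size([5, 3]), a subclass of tuple
rows, cols = size          # tuple unpacking works
print(rows, cols)          # 5 3
print(size[0], len(size))  # indexing and len() also work
```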

### Operations

#### Syntax 1

```python
y = torch.rand(5, 3)
print(x + y)
```

```
tensor([[ 1.7199, -0.1819, -0.1543],
        [-0.5413,  1.1591,  1.4098],
        [ 2.0421,  0.5578,  2.0645],
        [ 1.7301, -0.3236,  0.4616],
        [ 1.2805, -0.4026, -0.6916]])
```

#### Syntax 2

`print(torch.add(x, y))`

```
tensor([[ 1.7199, -0.1819, -0.1543],
        [-0.5413,  1.1591,  1.4098],
        [ 2.0421,  0.5578,  2.0645],
        [ 1.7301, -0.3236,  0.4616],
        [ 1.2805, -0.4026, -0.6916]])
```

#### Syntax 3: providing an output tensor

```python
result = torch.empty(5, 3)
torch.add(x, y, out=result)
print(result)
```

```
tensor([[ 1.7199, -0.1819, -0.1543],
        [-0.5413,  1.1591,  1.4098],
        [ 2.0421,  0.5578,  2.0645],
        [ 1.7301, -0.3236,  0.4616],
        [ 1.2805, -0.4026, -0.6916]])
```

#### Syntax 4: in-place

```python
# adds x to y
y.add_(x)
print(y)
```

```
tensor([[ 1.7199, -0.1819, -0.1543],
        [-0.5413,  1.1591,  1.4098],
        [ 2.0421,  0.5578,  2.0645],
        [ 1.7301, -0.3236,  0.4616],
        [ 1.2805, -0.4026, -0.6916]])
```

#### Note

Any operation that mutates a tensor in place is post-fixed with an `_`. You can also use standard NumPy-like indexing:

`print(x[:, 1])`

`tensor([-0.8342,  0.3636,  0.3494, -0.9492, -0.6815])`

```python
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from the other dimensions
print(x.size(), y.size(), z.size())
```

`torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])`

```python
x = torch.randn(1)
print(x)
print(x.item())
```

```
tensor([0.3441])
0.34412217140197754
```

### NumPy Bridge

The Torch tensor and the NumPy array share their underlying memory, so changing one will change the other.

#### Converting a Torch tensor to a NumPy array

```python
a = torch.ones(5)
print(a)
```

`tensor([1., 1., 1., 1., 1.])`

```python
b = a.numpy()
print(b)
print(type(b))
```

```
[ 1.  1.  1.  1.  1.]
<class 'numpy.ndarray'>
```

```python
a.add_(1)
print(a)
print(b)
```

```
tensor([2., 2., 2., 2., 2.])
[ 2.  2.  2.  2.  2.]
```

#### Converting a NumPy array to a Torch tensor

```python
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)  # modifies a in place; b changes too
print(a)
print(b)
```

```
[ 2.  2.  2.  2.  2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
```

### CUDA Tensors

```python
# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!
```

```
tensor([1.3441], device='cuda:0')
tensor([1.3441], dtype=torch.float64)
```

## 2. Autograd: Automatic Differentiation

Central to all neural networks in PyTorch is the `autograd` package. Let's first briefly introduce it, and then train our first neural network.

The `autograd` package provides automatic differentiation for all operations on tensors. It is a define-by-run framework, which means that backpropagation is defined by how your code is run, and that every single iteration can be different.

### Tensor

`torch.Tensor` is the central class of the package. If you set its attribute `.requires_grad` to `True`, it starts to track all operations on it. When you finish your computation, you can call `.backward()` and have all the gradients computed automatically. The gradient for this tensor is accumulated into its `.grad` attribute.
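The accumulation into `.grad` can be seen directly: calling `backward()` twice without zeroing the buffer adds the gradients together (a minimal sketch, not part of the original tutorial):

```python
import torch

x = torch.ones(3, requires_grad=True)
(x * 2).sum().backward()
first = x.grad.clone()      # gradient of sum(2*x) w.r.t. x is 2 everywhere
(x * 2).sum().backward()    # gradients accumulate into .grad
print(first, x.grad)        # tensor([2., 2., 2.]) tensor([4., 4., 4.])
# x.grad.zero_() would clear the buffer before the next backward pass
```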

Tensors and Functions are interconnected and build up an acyclic graph that encodes a complete history of the computation. Each tensor has a `.grad_fn` attribute that references the `Function` that created the tensor (except for tensors created by the user – their `grad_fn` is `None`).

`import torch`

```python
x = torch.ones(2, 2, requires_grad=True)
print(x)
```

```
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
```

```python
y = x + 2
print(y)
```

```
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
```

```python
print(y.grad_fn)
print(x.grad_fn)  # None: x was created directly by the user
```

```
<AddBackward0 object at 0x000001E020B794A8>
None
```

```python
z = y * y * 3
out = z.mean()
print(z, out)
```

```
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)
```

`.requires_grad_(...)` changes an existing tensor's `requires_grad` flag in place. The input flag defaults to `False` if not given.

```python
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)
```

```
False
True
<SumBackward0 object at 0x000001E020B79FD0>
```

```python
out.backward()
print(x.grad)
```

```
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
```

```python
x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
print(y)
```

`tensor([  384.5854,   -13.6405, -1049.2870], grad_fn=<MulBackward0>)`

```python
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)
print(x.grad)
```

`tensor([5.1200e+01, 5.1200e+02, 5.1200e-02])`

```python
print(x.requires_grad)
print((x ** 2).requires_grad)
with torch.no_grad():
    print((x ** 2).requires_grad)
```

```
True
True
False
```


## 3. Neural Networks

`weight = weight - learning_rate * gradient`

### Define the Network

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)
```

```
Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
```

`net.parameters()` returns the model's learnable parameters.

```python
params = list(net.parameters())
print(len(params))
print(params[0].size())
```

```
10
torch.Size([6, 1, 5, 5])
```

The input and output of `forward` are tensors. Note: the expected input size of this net (LeNet) is 32×32. To use this net on the MNIST dataset, resize the images to 32×32 first.
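For example, a batch of 28×28 MNIST-style images can be resized to 32×32 with `F.interpolate` (a sketch with random stand-in data; in a `torchvision` pipeline, `transforms.Resize((32, 32))` achieves the same):

```python
import torch
import torch.nn.functional as F

# hypothetical batch of four 1-channel 28x28 images
mnist_batch = torch.randn(4, 1, 28, 28)
resized = F.interpolate(mnist_batch, size=(32, 32),
                        mode='bilinear', align_corners=False)
print(resized.shape)  # torch.Size([4, 1, 32, 32])
```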

```python
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
```

```
tensor([[-0.1217,  0.0449, -0.0392, -0.1103, -0.0534, -0.1108, -0.0565,  0.0116,
         ...]])
```

```python
net.zero_grad()
out.backward(torch.randn(1, 10))
```

`torch.nn` only supports mini-batches. The entire `torch.nn` package only supports inputs that are a mini-batch of samples, not a single sample.
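For example, `nn.Conv2d` takes a 4-D tensor of `nSamples x nChannels x Height x Width`. If you have a single sample, `input.unsqueeze(0)` adds a fake batch dimension (a minimal sketch):

```python
import torch

sample = torch.randn(1, 32, 32)   # a single 1-channel 32x32 image, no batch dim
batched = sample.unsqueeze(0)     # now shaped (1, 1, 32, 32)
print(sample.shape, batched.shape)
```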

#### Recap

`torch.Tensor` - A multi-dimensional array with support for autograd operations such as `backward()`; it also holds the gradient w.r.t. the tensor.

`nn.Module` - Neural network module. A convenient way of encapsulating parameters, with helpers for moving them to the GPU, exporting, loading, etc.

`nn.Parameter` - A kind of tensor that is automatically registered as a parameter when assigned as an attribute to a `Module`.

`autograd.Function` - Implements the forward and backward definitions of an autograd operation. Every tensor operation creates at least one `Function` node that connects to the functions that created the tensor and encodes its history.
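As an illustration of `autograd.Function`, here is a minimal custom operation (a sketch, not from the tutorial): `forward` computes `exp(x)`, and `backward` reuses the saved result, since d/dx exp(x) = exp(x).

```python
import torch

class Exp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        result = x.exp()
        ctx.save_for_backward(result)  # stash the output for the backward pass
        return result

    @staticmethod
    def backward(ctx, grad_output):
        result, = ctx.saved_tensors
        return grad_output * result    # chain rule: grad_out * exp(x)

x = torch.tensor([0.0, 1.0], requires_grad=True)
y = Exp.apply(x)
y.sum().backward()
print(x.grad)  # equals exp(x)
```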

#### At this point, we covered:

- Defining a neural network
- Processing inputs and calling `backward`

### Loss Function

```python
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
```

`tensor(0.5663, grad_fn=<MseLossBackward>)`

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d -> view -> linear -> relu -> linear -> relu -> linear -> MSELoss -> loss

```python
print(loss.grad_fn)  # MSELoss
```

```
<MseLossBackward object at 0x0000029E54C509B0>
```

### Backpropagation

```python
net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
```

```
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([ 0.0006, -0.0164,  0.0122, -0.0060, -0.0056, -0.0052])
```

### Update the Weights

The simplest update rule used in practice is stochastic gradient descent (SGD):

`weight = weight - learning_rate * gradient`

```python
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
```

```python
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
```

#### Note

Observe how the gradient buffers had to be manually set to zero using `optimizer.zero_grad()`. This is because gradients are accumulated, as explained in the section on backpropagation.

## 4. Training a Classifier

### About the Data

The `torchvision` package provides data loaders for common datasets such as ImageNet, CIFAR10, and MNIST, as well as image transformers, namely `torchvision.datasets` and `torch.utils.data.DataLoader`.

### Training an Image Classifier

Load and normalize the CIFAR10 training and test sets using `torchvision`.

#### 1. Load and Normalize CIFAR10

```import torch
import torchvision
import torchvision.transforms as transforms```

The output of torchvision datasets are PILImage images in the range [0, 1]. We transform them into tensors with a normalized range of [-1, 1].
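The `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` step computes `(x - mean) / std` per channel, so [0, 1] maps onto [-1, 1]; the arithmetic can be checked directly (a small sketch without torchvision):

```python
import torch

pixels = torch.tensor([0.0, 0.5, 1.0])   # values after ToTensor()
normalized = (pixels - 0.5) / 0.5        # what Normalize(0.5, 0.5) applies
print(normalized)                        # tensor([-1.,  0.,  1.])
```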

```python
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
# This step can be slow: it downloads about 340 MB of image data.
```

```
Files already downloaded and verified
```

```python
import matplotlib.pyplot as plt
import numpy as np

# functions to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
```

`plane  deer   dog plane`

#### 2. Define a Convolutional Neural Network

```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
```

#### 3. Define a Loss Function and Optimizer

```python
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
```

#### 4. Train the Network

```python
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
```

```
[1,  2000] loss: 2.286
[1,  4000] loss: 1.921
[1,  6000] loss: 1.709
[1,  8000] loss: 1.618
[1, 10000] loss: 1.548
[1, 12000] loss: 1.496
[2,  2000] loss: 1.435
[2,  4000] loss: 1.409
[2,  6000] loss: 1.373
[2,  8000] loss: 1.348
[2, 10000] loss: 1.326
[2, 12000] loss: 1.313
Finished Training
```

#### 5. Test the Network on the Test Data

```python
dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
```

`GroundTruth:    cat  ship  ship plane`

`outputs = net(images)`

```python
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))
```

`Predicted:    cat  ship  ship plane`

```python
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
```

`Accuracy of the network on the 10000 test images: 54 %`

```python
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
```

```
Accuracy of plane : 52 %
Accuracy of   car : 63 %
Accuracy of  bird : 43 %
Accuracy of   cat : 33 %
Accuracy of  deer : 36 %
Accuracy of   dog : 46 %
Accuracy of  frog : 68 %
Accuracy of horse : 62 %
Accuracy of  ship : 80 %
Accuracy of truck : 63 %
```

### Training on GPU

```python
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Assuming we are on a CUDA machine, this should print a CUDA device:
print(device)
```

`cuda:0`

`net.to(device)`

```
Net(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
```

Remember that the inputs and labels must be sent to the GPU at every step as well:

`inputs, labels = inputs.to(device), labels.to(device)`

#### Goals achieved:

- Understand PyTorch's tensor library and neural networks at a high level
- Train a small neural network to classify images

## 5. Data Parallelism (Optional)

It is very easy to use GPUs in PyTorch. You can put a model on a GPU like this:

`device = torch.device("cuda:0")`

`model.to(device)`

Then you can copy your tensors to the GPU:

`mytensor = my_tensor.to(device)`

However, PyTorch only uses one GPU by default. To run operations on multiple GPUs, make the model run in parallel with `DataParallel`:

`model = nn.DataParallel(model)`

### Imports and Parameters

```python
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# Parameters and DataLoaders
input_size = 5
output_size = 2

batch_size = 30
data_size = 100
```

`device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")`

### Dummy Dataset

```python
class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len

rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)
```

### A Simple Model

```python
class Model(nn.Module):
    # Our model

    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("\tIn Model: input size", input.size(),
              "output size", output.size())
        return output
```

### Create a Model and DataParallel

```python
model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)

model.to(device)
```

```
Model(
  (fc): Linear(in_features=5, out_features=2, bias=True)
)
```

### Run the Model

```python
for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("Outside: input size", input.size(),
          "output_size", output.size())
```

```
In Model: input size torch.Size([30, 5]) output size torch.Size([30, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([30, 5]) output size torch.Size([30, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([30, 5]) output size torch.Size([30, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
```

### Summary

`DataParallel` splits your data automatically and dispatches jobs to multiple models on multiple GPUs. After each model finishes its job, `DataParallel` collects and merges the results before returning them to you.
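The split along dim 0 can be illustrated with `torch.chunk`, which performs the same kind of partitioning `DataParallel` does before scattering to, say, 3 GPUs (a sketch):

```python
import torch

batch = torch.randn(30, 5)
shards = torch.chunk(batch, 3, dim=0)    # three shards along the batch dimension
print([tuple(s.shape) for s in shards])  # [(10, 5), (10, 5), (10, 5)]
```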

http://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html


## Closing Remarks

https://github.com/fengdu78/machine_learning_beginner/tree/master/PyTorch_beginner