PyTorch Study Notes: Autograd

Autograd / Automatic Differentiation

 

For discrete data, taking a derivative corresponds to computing a differential.

 

Central to all neural networks in PyTorch is the autograd package. Let’s first briefly visit this, and we will then go on to train our first neural network.

 

The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.
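
As a rough illustration of "define-by-run", the sketch below uses ordinary Python control flow to build a different graph on each iteration. The helper name forward and the step counts are made up for this example and are not part of the tutorial.

```python
import torch

def forward(x, n_steps):
    # The loop length is plain Python control flow, so the recorded graph
    # contains n_steps multiplications on this particular call.
    y = x
    for _ in range(n_steps):
        y = y * 2
    return y.sum()

x = torch.ones(3, requires_grad=True)

grads = []
for n_steps in (1, 3):
    x.grad = None                      # clear the gradient from the previous iteration
    forward(x, n_steps).backward()
    grads.append(x.grad.clone())

print(grads[0])  # gradient of sum(2 * x): every entry is 2
print(grads[1])  # gradient of sum(8 * x): every entry is 8
```

The same tensor x gets a different gradient each time because the graph is rebuilt from whatever code actually ran.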

 

Let us look at this in simpler terms with some examples.

 

TENSOR

 

torch.Tensor is the central class of the package. If you set its attribute .requires_grad to True, it starts to track all operations on it. When you finish your computation you can call .backward() and have all the gradients computed automatically. The gradient for this tensor will be accumulated into its .grad attribute.
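
"Accumulated" is worth taking literally: a second backward pass adds to .grad rather than overwriting it. A minimal sketch, with the tensor w and its values chosen only for illustration:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()       # gradient of sum(3 * w) is 3 for each entry

(w * 3).sum().backward()     # a second pass adds to the stored gradient
second = w.grad.clone()      # now 6 for each entry

w.grad.zero_()               # reset in place before the next pass
print(first, second, w.grad)
```

This is why training loops usually zero the gradients before each backward pass.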

 

To stop a tensor from tracking history, you can call .detach() to detach it from the computation history and to prevent future computation from being tracked.
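
A small sketch of what .detach() does in practice; the variable names are illustrative only:

```python
import torch

x = torch.ones(2, requires_grad=True)
y = x * 2                    # tracked: y has a grad_fn
y_detached = y.detach()      # same values, but cut off from the graph

z = y_detached * 3           # not tracked, since no input requires grad

print(y.requires_grad, y_detached.requires_grad, z.requires_grad)
print(y_detached.grad_fn)    # None
```

Note that the detached tensor shares storage with the original, so in-place changes to one are visible in the other.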

 

To prevent tracking history (and using memory), you can also wrap the code block in "with torch.no_grad():". This can be particularly helpful when evaluating a model, because the model may have trainable parameters with requires_grad=True for which we don’t need the gradients.
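
For the evaluation case, a minimal sketch might look like the following. The torch.nn.Linear layer is only a stand-in for a real model, and the shapes are arbitrary.

```python
import torch

model = torch.nn.Linear(4, 2)        # stand-in model; its weights require grad
inputs = torch.randn(8, 4)

with torch.no_grad():
    preds = model(inputs)            # forward pass only, no graph is recorded

print(model.weight.requires_grad)    # True: the parameter itself is unchanged
print(preds.requires_grad)           # False
print(preds.grad_fn)                 # None
```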

 

There’s one more class which is very important for the autograd implementation: Function.

 

Tensor and Function are interconnected and build up an acyclic graph that encodes a complete history of computation. Each tensor has a .grad_fn attribute that references the Function that created the Tensor (except for Tensors created by the user, whose grad_fn is None).
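
The graph can be inspected directly. The sketch below uses the is_leaf and next_functions attributes, which the text above does not mention; the class names printed for grad_fn vary between PyTorch versions, so no exact output is claimed for them.

```python
import torch

a = torch.ones(2, requires_grad=True)    # created by the user: a leaf tensor
b = a + 1                                # created by an operation
c = b.mean()

print(a.grad_fn)                 # None
print(a.is_leaf, b.is_leaf)      # True False
print(type(c.grad_fn).__name__)  # the Function that produced c
print(c.grad_fn.next_functions)  # links one step back towards b's Function
```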

 

If you want to compute the derivatives, you can call .backward() on a Tensor. If the Tensor is a scalar (i.e. it holds a single element), you don’t need to specify any arguments to backward(). If it has more elements, you need to specify a gradient argument that is a tensor of matching shape.
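
To see the scalar versus non-scalar distinction, here is a short sketch with a three-element tensor; the values are arbitrary.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Calling backward() with no argument on a non-scalar raises a RuntimeError.
raised = False
try:
    (x ** 2).backward()
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# Passing a tensor with the same shape as y works.
y = x ** 2
y.backward(torch.ones_like(y))
print(x.grad)   # 2 * x, i.e. tensor([2., 4., 6.])
```

Passing a tensor of ones weights every element of y equally, which gives the same result as calling y.sum().backward().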

 

import torch

 

Create a tensor and set requires_grad=True to track computation with it:

 

x = torch.ones(2, 2, requires_grad = True)
print(x)

 

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

 

Perform an operation on the tensor:

 

y = x + 2
print(y)

 

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward>)

 

#y was created as a result of an operation, so it has a grad_fn.
print(y.grad_fn)

 

<AddBackward object at 0x00000192745E2240>

 

Do more operations on y:

 

z = y * y * 3
out = z.mean()
print(z, out)

 

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward>) tensor(27., grad_fn=<MeanBackward1>)

 

requires_grad_( … ) changes an existing Tensor’s requires_grad flag in-place. A tensor's flag defaults to False if it was not given at creation, and calling requires_grad_() without an argument sets it to True.

 

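# Create a tensor
a = torch.randn(2, 2)
# Apply some arithmetic to it
a = (a * 3) / (a - 1)
# Check the value of requires_grad at this point
print(a.requires_grad)
# Switch the flag to True in place
a.requires_grad_(True)
print(a.requires_grad)
# Compute again; the result is now tracked
b = (a * a).sum()
print(b.grad_fn)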

 

False
True
<SumBackward0 object at 0x0000019276E47208>

 

Let’s backprop now. Because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).

 

out.backward()

 

Print the gradients d(out)/dx:

 

print(x.grad)

 

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])

 

The derivative above is worked out as follows:

 

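You should have got a matrix of 4.5. Let’s call the out Tensor “o”. We have that $o = \frac{1}{4}\sum_i z_i$, $z_i = 3(x_i + 2)^2$ and $z_i\big|_{x_i=1} = 27$. Therefore, $\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i + 2)$, hence $\frac{\partial o}{\partial x_i}\big|_{x_i=1} = \frac{9}{2} = 4.5$.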
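
The closed form can be checked against autograd. This sketch recomputes out in one expression and compares x.grad with 3/2 * (x_i + 2); the name analytic is just a label for this example.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()
out.backward()

analytic = 1.5 * (x.detach() + 2)          # the closed form 3/2 * (x_i + 2)
print(x.grad)                              # every entry is 4.5
print(torch.allclose(x.grad, analytic))    # True
```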

 

You can do many crazy things with autograd!

 

# Create a tensor and track computation on it
x = torch.randn(3, requires_grad=True)
# Keep doubling y until its norm reaches 1000
y = x + 2
while y.data.norm() < 1000:
    y = y * 2
print(y)

 

tensor([1091.3589, 1317.4091,  271.5838], grad_fn=<MulBackward>)

 

What y.data.norm() computes (the L2 norm of the values in y), shown in an IPython session:

 

In [15]: x = torch.randn(3, requires_grad=True)
In [16]: y = x * 2
In [17]: y.data
Out[17]: tensor([-1.2510, -0.6302,  1.2898])
In [18]: y.data.norm()
Out[18]: tensor(1.9041)
# computing the same norm using elementary operations
In [19]: torch.sqrt(torch.sum(torch.pow(y.data, 2)))
Out[19]: tensor(1.9041)

 

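gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
# y is not a scalar, so backward() needs a tensor of matching shape.
# The result stored in x.grad is the vector-Jacobian product of
# gradients with dy/dx.
y.backward(gradients)
print(x.grad)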

 

tensor([ 51.2000, 512.0000,   0.0512])
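
One way to make sense of these numbers: y equals (x + 2) * 2**k elementwise, where k is the number of doublings, so dy/dx is 2**k times the identity and x.grad should equal gradients * 2**k. In the run above the middle entry, 1.0 * 2**k = 512, implies k = 9. The sketch below repeats the experiment while counting k; the counter and the use of y.detach() in place of y.data are additions for this example.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x + 2
k = 0
while y.detach().norm() < 1000:
    y = y * 2
    k += 1                              # count the doublings

gradients = torch.tensor([0.1, 1.0, 0.0001])
y.backward(gradients)

# The Jacobian is 2**k times the identity, so the vector-Jacobian
# product reduces to an elementwise scaling of gradients.
expected = gradients * 2.0 ** k
print(k, x.grad)
print(torch.allclose(x.grad, expected))
```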

 

print(x.requires_grad)
print((x ** 2).requires_grad)
with torch.no_grad():
    print((x ** 2).requires_grad)

 

True
True
False
