© PaperWeekly Original · Author: 海晨威

```
import torch

a = torch.Tensor([[1, 2, 3], [4, 5, 6]])
b = torch.Tensor([[7, 8, 9], [10, 11, 12]])
c = torch.Tensor([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(a.shape)
# torch.Size([2, 3])
```
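For reference, the shapes of the other two tensors follow directly from the definitions above (a small check added here):

```
print(b.shape)
# torch.Size([2, 3])
print(c.shape)
# torch.Size([2, 2, 3])
```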

```
>>> torch.cat((a, b), dim=0)
tensor([[ 1.,  2.,  3.],
        [ 4.,  5.,  6.],
        [ 7.,  8.,  9.],
        [10., 11., 12.]])
>>> torch.cat((a, b), dim=1)
tensor([[ 1.,  2.,  3.,  7.,  8.,  9.],
        [ 4.,  5.,  6., 10., 11., 12.]])
```
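The same rule extends to higher-rank tensors (a small sketch added here, easily verified): dim picks the axis along which the inputs are stitched together, so concatenating c with itself doubles exactly that axis.

```
>>> torch.cat((c, c), dim=0).shape
torch.Size([4, 2, 3])
>>> torch.cat((c, c), dim=1).shape
torch.Size([2, 4, 3])
>>> torch.cat((c, c), dim=2).shape
torch.Size([2, 2, 6])
```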

```
>>> torch.softmax(a, dim=0)
tensor([[0.0474, 0.0474, 0.0474],
        [0.9526, 0.9526, 0.9526]])
>>> torch.softmax(a, dim=1)
tensor([[0.0900, 0.2447, 0.6652],
        [0.0900, 0.2447, 0.6652]])
```
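One quick way to read these results (a sanity check added here, not from the original): softmax normalizes along the chosen dim, so summing along that same dim gives ones, up to floating-point rounding.

```
>>> torch.softmax(a, dim=0).sum(dim=0)
tensor([1., 1., 1.])
>>> torch.softmax(a, dim=1).sum(dim=1)
tensor([1., 1.])
```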

```
# Computing it with explicit for loops
c = torch.Tensor([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])    # shape (2, 2, 3)
m, n, p = c.shape
res = torch.zeros((m, n, p))
for i in range(m):
    for j in range(p):
        # the list comprehension plays the role of the innermost for loop
        res[i, :, j] = torch.softmax(torch.tensor([c[i, k, j] for k in range(n)]), dim=0)

# Computing it by passing the axis to the library function
res1 = torch.softmax(c, dim=1)
print(res.equal(res1))        # True
```
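The same equivalence holds on every axis, not just dim=1. Below is a hedged sketch (the helper manual_softmax is a name introduced here for illustration, not from the article): moving the chosen axis to the last position, applying softmax there, and moving it back reproduces torch.softmax(c, dim=d) for each d.

```
import torch

def manual_softmax(t, dim):
    # Move the target axis to the last position, softmax along it, move it back.
    moved = torch.moveaxis(t, dim, -1)
    return torch.moveaxis(torch.softmax(moved, dim=-1), -1, dim)

c = torch.Tensor([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
for d in range(c.dim()):
    # should reproduce the built-in result on every axis
    assert torch.allclose(manual_softmax(c, d), torch.softmax(c, dim=d))
```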

A quick summary of axis/dim usage: dim names the axis along which an operation acts. For a reduction-style function such as softmax, fix the indices on all the other axes, and the operation runs over the elements traversed along dim, exactly like the innermost for loop in the example above.

BatchNorm and LayerNorm normalize over different axes of the data. Suppose the input has shape (N, C, H, W), corresponding to batch size, number of channels, height, and width: BatchNorm then corresponds to dim=0 and LayerNorm to dim=1. Setting aside details such as running averages, the two take the same form and differ only in a single dim argument.

A simplified PyTorch implementation is shown below:

```
import torch
import torch.nn as nn

class Norm(nn.Module):
    def __init__(self, num_features, variance_epsilon=1e-12):
        super(Norm, self).__init__()
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))
        # a small constant that guards against division by zero
        self.variance_epsilon = variance_epsilon

    def forward(self, x, dim):
        u = x.mean(dim, keepdim=True)                 # mean along the chosen axis
        s = (x - u).pow(2).mean(dim, keepdim=True)    # (biased) variance along the same axis
        x_norm = (x - u) / torch.sqrt(s + self.variance_epsilon)
        return self.gamma * x_norm + self.beta
```
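As a minimal usage sketch (added here, not part of the original article), assume a 2-D input of shape (N, D) so that gamma and beta broadcast cleanly over the last axis. Under that assumption, norm(x, dim=0) behaves like nn.BatchNorm1d in training mode and norm(x, dim=1) like nn.LayerNorm, which the check below confirms with a matching eps:

```
import torch
import torch.nn as nn

x = torch.randn(4, 6)        # (batch, features)
norm = Norm(num_features=6)

bn_like = norm(x, dim=0)     # normalize along the batch axis (BatchNorm-style)
ln_like = norm(x, dim=1)     # normalize along the feature axis (LayerNorm-style)

ref_bn = nn.BatchNorm1d(6, eps=1e-12)    # training mode: normalizes with batch statistics
ref_ln = nn.LayerNorm(6, eps=1e-12)
print(torch.allclose(bn_like, ref_bn(x), atol=1e-6))    # expected: True
print(torch.allclose(ln_like, ref_ln(x), atol=1e-6))    # expected: True
```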