## Data Processing & Augmentation

### 2. Augmentation by Information Dropping (AID)

The road to first place in the 2020 COCO Keypoint Challenge

## Architecture

### 1. Mish

CoinCheung/pytorch-loss: label-smooth, amsoftmax, focal-loss, triplet-loss, lovasz-softmax. Maybe useful

The author of Mish later released a Hard variant aimed at mobile devices, and added an inplace option as well:

```python
from torch import nn


def hard_mish(x, inplace: bool = False):
    """Implements the HardMish activation function.

    Args:
        x: input tensor
    Returns:
        output tensor
    """
    if inplace:
        return x.mul_(0.5 * (x + 2).clamp(min=0, max=2))
    else:
        return 0.5 * x * (x + 2).clamp(min=0, max=2)


class HardMish(nn.Module):
    """Implements the Hard Mish activation module from `"H-Mish"
    <https://github.com/digantamisra98/H-Mish>`_.

    This activation is computed as follows:

    .. math::
        f(x) = \\frac{x}{2} \\cdot \\min(2, \\max(0, x + 2))
    """

    def __init__(self, inplace: bool = False) -> None:
        super().__init__()
        self.inplace = inplace

    def forward(self, x):
        return hard_mish(x, inplace=self.inplace)
```

Benchmark comparison (deploy MACs and five MNN inference timings per activation):

```
hardmish
deploy macs: 127.173792
mnn_inference: 9.927172660827637
mnn_inference: 9.442970752716064
mnn_inference: 9.6360182762146
mnn_inference: 9.197959899902344
mnn_inference: 11.279244422912598
hardswish
deploy macs: 126.910368
mnn_inference: 11.386265754699707
mnn_inference: 11.019973754882812
mnn_inference: 8.461620807647705
mnn_inference: 9.613375663757324
mnn_inference: 11.782505512237549
relu
deploy macs: 126.646944
mnn_inference: 9.963874816894531
mnn_inference: 11.411187648773193
mnn_inference: 9.003558158874512
mnn_inference: 8.894507884979248
mnn_inference: 10.858681201934814
```

```python
from torch import nn


def hard_sigmoid(x, inplace: bool = False):
    return nn.ReLU6(inplace=inplace)(x + 3) / 6


def hard_swish(x, inplace: bool = False):
    return x * hard_sigmoid(x, inplace)


class HardSwish(nn.Module):
    def __init__(self, inplace: bool = False):
        super(HardSwish, self).__init__()
        self.inplace = inplace

    def forward(self, x):
        return hard_swish(x, inplace=self.inplace)
```

### 2. RepVGG

https://zhuanlan.zhihu.com/p/344324470
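The linked article explains RepVGG's structural reparameterization. A hedged sketch of the core identity (ignoring the BatchNorm fusion the real RepVGG also performs): the 3×3, 1×1, and identity branches are all linear in the input, so their sum collapses into a single 3×3 convolution whose kernel is the sum of the suitably padded branch kernels.

```python
import torch
import torch.nn.functional as F

C = 4
w3 = torch.randn(C, C, 3, 3)   # 3x3 branch weight
w1 = torch.randn(C, C, 1, 1)   # 1x1 branch weight

# Pad the 1x1 kernel to 3x3 by placing it at the center.
w1_as_3 = F.pad(w1, [1, 1, 1, 1])

# The identity branch equals a 3x3 kernel with a 1 at the center
# of each channel's i -> i slice.
w_id = torch.zeros(C, C, 3, 3)
for i in range(C):
    w_id[i, i, 1, 1] = 1.0

w_fused = w3 + w1_as_3 + w_id

x = torch.randn(1, C, 8, 8)
y_branches = (F.conv2d(x, w3, padding=1)
              + F.conv2d(x, w1)
              + x)
y_fused = F.conv2d(x, w_fused, padding=1)
print(torch.allclose(y_branches, y_fused, atol=1e-4))  # True
```

At deploy time only the fused 3×3 conv remains, which is why the multi-branch training cost disappears from inference.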

### 3. OKDHP和NetAug

OKDHP is short for Online Knowledge Distillation for Efficient Pose Estimation. The overall idea is to attach several auxiliary branches to the lightweight model being trained, with every branch learning the same target as the original model; a branch may share the original architecture or use a different one. This amounts to training several small models at once and ensembling their outputs. Since ensemble learning tells us that an ensemble of small models usually beats any single one, the ensembled output can play the role of the teacher in distillation, and each small model is trained to approach the ensemble result.
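The ensemble-as-teacher idea can be sketched as follows. This is my own simplification, not the official OKDHP code: the paper aggregates branch outputs with a learned weighting, whereas here I just average them.

```python
import torch
import torch.nn.functional as F


def okdhp_style_loss(branch_heatmaps, gt_heatmaps, distill_weight=1.0):
    """branch_heatmaps: list of (N, K, H, W) tensors, one per branch.

    Each branch is supervised by the ground truth and additionally
    pulled toward the (detached) ensemble of all branches.
    """
    ensemble = torch.stack(branch_heatmaps).mean(dim=0).detach()
    task_loss = sum(F.mse_loss(h, gt_heatmaps) for h in branch_heatmaps)
    distill_loss = sum(F.mse_loss(h, ensemble) for h in branch_heatmaps)
    return task_loss + distill_weight * distill_loss
```

At inference only the primary branch is kept, so the deployed model pays no extra cost for the auxiliary branches.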

NetAug is also a very nice piece of work, though in my view its most important contribution is the analysis of what network augmentation and regularization mean for large models versus small ones. After reading it I found the argument convincing but had no urge to actually run the experiments (it felt too flashy), so instead I summoned the energy to finally run OKDHP, which I had been putting off. My experimental results are posted here:

https://zhuanlan.zhihu.com/p/399742423

## Loss Functions

### 1. DSNT

anibali/dsntnn: PyTorch implementation of DSNT (github.com)

DSNT, as a regression-based method, is very friendly to mobile deployment: to raise FPS on mobile devices the feature map is often squeezed down to 14×14 or even 7×7, and at that scale heatmaps are simply unusable.
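A minimal sketch of the soft-argmax at the heart of DSNT (my own simplification; the exact normalization in anibali/dsntnn differs slightly): each heatmap is softmax-normalized into a probability map, and the coordinates are its expectation over a fixed grid. The whole mapping is differentiable, so coordinates can be regressed directly even from a tiny 7×7 feature map.

```python
import torch


def soft_argmax_2d(heatmaps):
    """heatmaps: (N, K, H, W) -> coords (N, K, 2) in [-1, 1], (x, y) order."""
    n, k, h, w = heatmaps.shape
    probs = heatmaps.flatten(2).softmax(dim=-1).view(n, k, h, w)
    xs = torch.linspace(-1, 1, w)
    ys = torch.linspace(-1, 1, h)
    x = (probs.sum(dim=2) * xs).sum(dim=-1)  # expectation over columns
    y = (probs.sum(dim=3) * ys).sum(dim=-1)  # expectation over rows
    return torch.stack([x, y], dim=-1)
```

The training loss is then applied to the coordinates themselves; the dsntnn library additionally regularizes the heatmap toward a Gaussian shape.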

### 2. Bone Loss

PS: this experiment used DSNT together with cosine-annealing learning-rate decay.

https://github.com/674106399/JointBoneLoss
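The bone-loss idea in the linked repo can be sketched like this (my paraphrase, not the repo's exact code): on top of the per-joint coordinate loss, penalize the discrepancy between predicted and ground-truth bone lengths, i.e. the distances between connected joint pairs.

```python
import torch


def bone_loss(pred, gt, bones):
    """pred, gt: (N, K, 2) joint coordinates; bones: list of (i, j) pairs.

    L1 penalty on the difference between predicted and ground-truth
    bone lengths. Illustrative helper, not the repo's implementation.
    """
    loss = 0.0
    for i, j in bones:
        pred_len = (pred[:, i] - pred[:, j]).norm(dim=-1)
        gt_len = (gt[:, i] - gt[:, j]).norm(dim=-1)
        loss = loss + (pred_len - gt_len).abs().mean()
    return loss / len(bones)
```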

### 5. RLE

https://zhuanlan.zhihu.com/p/395521994

## Post-processing

### 2. Exponential Moving Average Filtering

https://zhuanlan.zhihu.com/p/433571477
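The linked article covers EMA filtering for jittery per-frame keypoint trajectories. A minimal sketch of the standard EMA update (the article's exact variant may differ):

```python
class EMAFilter:
    """Exponential moving average filter for smoothing per-frame keypoint
    coordinates: smoothed_t = alpha * raw_t + (1 - alpha) * smoothed_{t-1}.
    Smaller alpha means stronger smoothing but more lag.
    """

    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha
        self.state = None

    def __call__(self, coords):
        if self.state is None:
            # First frame: nothing to smooth against yet.
            self.state = list(coords)
        else:
            self.state = [self.alpha * c + (1 - self.alpha) * s
                          for c, s in zip(coords, self.state)]
        return self.state


f = EMAFilter(alpha=0.5)
f([0.0, 0.0])          # first frame passes through unchanged
out = f([2.0, 4.0])    # [1.0, 2.0]
```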