## Purpose

$\frac{\partial{z}}{\partial{w_i}}=x_i \tag{1}$

$\frac{\partial{z}}{\partial{b_i}}=1 \tag{2}$

$\frac{\partial{loss}}{\partial{w_i}}=\frac{\partial{loss}}{\partial{\sigma(z)}}\frac{\partial{\sigma(z)}}{\partial{z}}\frac{\partial{z}}{\partial{w_i}}=\frac{\partial{loss}}{\partial{\sigma(z)}}\frac{\partial{\sigma(z)}}{\partial{z}}x_i \tag{3}$

$\frac{\partial{loss}}{\partial{b_i}}=\frac{\partial{loss}}{\partial{\sigma(z)}}\frac{\partial{\sigma(z)}}{\partial{z}}\frac{\partial{z}}{\partial{b_i}}=\frac{\partial{loss}}{\partial{\sigma(z)}}\frac{\partial{\sigma(z)}}{\partial{z}} \tag{4}$
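The chain rule in equations (3) and (4) can be checked numerically. The sketch below is illustrative only: it assumes a single sigmoid neuron with one shared bias `b`, a squared-error loss, and made-up input values, and it compares the analytic gradients with central finite differences. All function and variable names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x):
    """Return z = sum_i w_i * x_i + b and the activation sigmoid(z)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return z, sigmoid(z)

def squared_error(a, y):
    # Assumed loss for this illustration: 0.5 * (a - y)^2
    return 0.5 * (a - y) ** 2

def analytic_grads(w, b, x, y):
    """Gradients from equations (3) and (4)."""
    _, a = forward(w, b, x)
    dloss_da = a - y            # d loss / d sigma(z)
    da_dz = a * (1.0 - a)       # d sigma(z) / d z
    delta = dloss_da * da_dz
    grad_w = [delta * xi for xi in x]  # eq. (3): dz/dw_i = x_i
    grad_b = delta                     # eq. (4): dz/db = 1
    return grad_w, grad_b

def numeric_grad_w(w, b, x, y, i, eps=1e-6):
    """Central finite difference of the loss with respect to w[i]."""
    w_plus, w_minus = list(w), list(w)
    w_plus[i] += eps
    w_minus[i] -= eps
    loss_plus = squared_error(forward(w_plus, b, x)[1], y)
    loss_minus = squared_error(forward(w_minus, b, x)[1], y)
    return (loss_plus - loss_minus) / (2.0 * eps)

# Made-up values, chosen only for the check.
w, b, x, y = [0.5, -0.3], 0.1, [1.0, 2.0], 1.0
grad_w, grad_b = analytic_grads(w, b, x, y)
numeric = [numeric_grad_w(w, b, x, y, i) for i in range(len(w))]
print("analytic:", grad_w, "numeric:", numeric, "bias grad:", grad_b)
```

The weight gradient is the bias gradient scaled by the corresponding input, which is the only difference between equations (3) and (4).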

## Common loss functions

MSE (mean squared error)

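$loss = \frac{1}{2}\sum_{i}(y[i] - a[i]) ^ 2$ ,

where $$i$$ indexes the summed terms, $$y[i]$$ is the target value, and $$a[i]$$ is the corresponding network output.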

$\frac{\partial{loss}}{\partial{z}} = \sum_{i}(a[i] - y[i])*\frac{\partial{a[i]}}{\partial{z}}$

where $$\frac{\partial{a[i]}}{\partial{z}}$$ is the derivative of the activation function.

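For example, take a single sigmoid neuron with input $$x = 1$$, weight $$w = ln(9)$$ (so that $$a = 0.9$$), target $$y = 0.5$$, and learning rate $$\eta = 0.2$$:

$loss = \frac{1}{2}(a - y)^2 = \frac{1}{2}(0.9 - 0.5)^2 = 0.08$

$grad = (a - y) \times \frac{\partial{a}}{\partial{z}} \frac{\partial{z}}{\partial{w}} = (a - y) \times a \times (1 - a) \times x = (0.9 - 0.5) \times 0.9 \times (1-0.9) \times 1= 0.036$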

$w = w - \eta \times grad = ln(9) - 0.2 \times 0.036 = 2.190$

$a = \frac{1}{1 + e^{-wx}} = 0.8993$

$loss = \frac{1}{2}(a - y)^2 = 0.07974$
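The arithmetic of this single update step can be reproduced in a few lines. This is a minimal sketch that assumes the values read off the formulas above (input 1, initial weight ln(9), target 0.5, learning rate 0.2); the variable names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Values taken from the worked example.
x, y, eta = 1.0, 0.5, 0.2
w = math.log(9.0)            # chosen so that sigmoid(w * x) = 0.9

# Forward pass and loss before the update.
a = sigmoid(w * x)
loss_before = 0.5 * (a - y) ** 2

# Gradient of the loss with respect to w: (a - y) * a * (1 - a) * x.
grad = (a - y) * a * (1.0 - a) * x

# One gradient-descent step, then a new forward pass.
w_new = w - eta * grad
a_new = sigmoid(w_new * x)
loss_after = 0.5 * (a_new - y) ** 2

print(f"a={a:.4f} loss={loss_before:.5f} grad={grad:.4f}")
print(f"w_new={w_new:.4f} a_new={a_new:.4f} loss_after={loss_after:.5f}")
```

The loss shrinks only slightly because the sigmoid is close to saturation at this point, so the factor `a * (1 - a)` is small.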

1. Cross-entropy

$D_{KL}(P || Q) = - \sum_{i}P(i)log(\frac{Q(i)}{P(i)})$

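$L = \sum_{i}log(P(y_i;x_i, \theta))$ ,

where $$P(x_i)$$ is a fixed value for every $$(i, x_i, y_i)$$: once $$x_i$$ is given, the probability that the output is $$y_i$$ is determined. The maximum-likelihood estimate is then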

$\theta^{*} = {argmax}_{\theta}\sum_{i}P_{data}(y_i;x_i)log(P(y_i;x_i, \theta))$

$D_{KL}(P || Q) = - \sum_{i}P(i)log(\frac{Q(i)}{P(i)}) = \sum_{i}P(i)log(P(i)) - \sum_{i}P(i)log(Q(i))$

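Here $$P(i)$$ stands for $$P_{data}(y_i;x_i)$$ and $$Q(i)$$ stands for $$P(y_i;x_i,\theta)$$. The term $$\sum_{i}P(i)log(P(i))$$ does not depend on $$\theta$$, so minimizing the KL divergence amounts to minimizing the cross-entropy $$-\sum_{i}P(i)log(Q(i))$$, which is the same as maximizing the likelihood function.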
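The decomposition of the KL divergence into a negative-entropy term and a cross-entropy term can be verified on a small example. The sketch below uses two made-up discrete distributions `p` and `q`; the helper names are hypothetical and natural logarithms are assumed.

```python
import math

def entropy(p):
    """H(P) = -sum_i P(i) log P(i)."""
    return -sum(pi * math.log(pi) for pi in p)

def cross_entropy(p, q):
    """H(P, Q) = -sum_i P(i) log Q(i)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

def kl_divergence(p, q):
    """D_KL(P || Q) = -sum_i P(i) log(Q(i) / P(i))."""
    return -sum(pi * math.log(qi / pi) for pi, qi in zip(p, q))

# Made-up distributions: p plays the role of the data, q of the model.
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]

kl = kl_divergence(p, q)
ce = cross_entropy(p, q)
h = entropy(p)

# D_KL(P || Q) = H(P, Q) - H(P); H(P) is fixed when only q changes.
print(f"KL={kl:.6f} cross_entropy - entropy={ce - h:.6f}")
```

Because `entropy(p)` is constant with respect to the model, lowering `cross_entropy(p, q)` lowers the KL divergence by the same amount.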

$loss = -\sum_{i}\left[y(x_i)log(a(x_i)) + (1 - y(x_i))log(1 - a(x_i))\right]$

where $$a(x_i)$$ is the network output for input $$x_i$$ and $$y(x_i)$$ is its label.

$\frac{\partial{loss}}{\partial{z}} = (-\frac{y(z)}{a(z)} + \frac{1 - y(z)}{1 - a(z)})*\frac{\partial{a(z)}}{\partial{z}} = \frac{a(z) - y(z)}{a(z)(1-a(z))}*\frac{\partial{a(z)}}{\partial{z}}$

For a sigmoid activation, $$\frac{\partial{a(z)}}{\partial{z}} = a(z)(1 - a(z))$$, so the denominator cancels:

$\frac{\partial{loss}}{\partial{z}} = \frac{a(z) - y(z)}{a(z)(1-a(z))}*\frac{\partial{a(z)}}{\partial{z}} = a(z) - y(z)$
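The result that the gradient reduces to the output minus the label can be checked against a finite-difference estimate. This is a sketch assuming a sigmoid activation and a single-sample cross-entropy loss; the test points are made up and the function names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(z, y):
    """Single-sample loss -(y log a + (1 - y) log(1 - a)) with a = sigmoid(z)."""
    a = sigmoid(z)
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))

def analytic_grad(z, y):
    # Closed form from the derivation: d loss / d z = a - y.
    return sigmoid(z) - y

def numeric_grad(z, y, eps=1e-6):
    # Central finite difference of the loss with respect to z.
    return (cross_entropy(z + eps, y) - cross_entropy(z - eps, y)) / (2.0 * eps)

# Made-up (z, y) pairs, including a soft label.
cases = [(-2.0, 0.0), (0.5, 1.0), (3.0, 1.0), (1.0, 0.3)]
max_gap = max(abs(analytic_grad(z, y) - numeric_grad(z, y)) for z, y in cases)
print("largest analytic/numeric difference:", max_gap)
```

Unlike the squared-error case, the factor `a * (1 - a)` does not appear in this gradient, so the update does not shrink when the sigmoid saturates.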

https://www.cnblogs.com/alexanderkun/p/8098781.html