$$y = g\left(\beta + \sum_i \alpha_i K(x, x_i)\right)\label{eq:svm}$$

SVM theory is not the focus of this article; it is enough to know that an SVM takes the form $\eqref{eq:svm}$. In this section we derive an analytical expression for the result of gradient descent, and we will find that it has a form very similar to $\eqref{eq:svm}$ — which is why one can say that any model trained by gradient descent can be approximately viewed as an SVM.

Suppose the model to be trained is $f_{\theta}(x)$ with parameters $\theta$, and the training objective is the total loss over the samples $(x_i, y_i)$:

$$L(\theta) = \sum_i l(y_i, f_{\theta}(x_i))$$

” series of articles, we have maintained the view that solving for the parameters $\theta$ by gradient descent is equivalent to solving the dynamical system

$$\frac{d\theta}{dt} = -\frac{\partial L(\theta)}{\partial \theta}=-\sum_i \frac{\partial l(y_i, f_{\theta}(x_i))}{\partial \theta}=-\sum_i \frac{\partial l(y_i, f_{\theta}(x_i))}{\partial f_{\theta}(x_i)}\frac{\partial f_{\theta}(x_i)}{\partial \theta}$$
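In practice we run gradient descent with a finite learning rate $\eta$, which is precisely the Euler discretization of this flow:

$$\theta_{t+1} = \theta_t - \eta\left.\frac{\partial L(\theta)}{\partial \theta}\right|_{\theta=\theta_t}$$

so conclusions drawn from the continuous dynamics carry over to the discrete algorithm up to an $O(\eta)$ discretization error.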

For any input $x$, the chain rule then gives the induced dynamics of the model output:

$$\begin{aligned}
\frac{df_{\theta}(x)}{dt} &= \sum_j \frac{\partial f_{\theta}(x)}{\partial \theta_j}\frac{d\theta_j}{dt}\\
&=-\sum_j \frac{\partial f_{\theta}(x)}{\partial \theta_j}\sum_i \frac{\partial l(y_i, f_{\theta}(x_i))}{\partial f_{\theta}(x_i)}\frac{\partial f_{\theta}(x_i)}{\partial \theta_j}\\
&=-\sum_i \frac{\partial l(y_i, f_{\theta}(x_i))}{\partial f_{\theta}(x_i)} \sum_j \frac{\partial f_{\theta}(x)}{\partial \theta_j} \frac{\partial f_{\theta}(x_i)}{\partial \theta_j}
\end{aligned}$$

The inner sum over $j$ is an inner product of gradients with respect to the parameters; we denote it by

$$K_{\theta}(x, x_i) = \langle\nabla_{\theta} f_{\theta}(x), \nabla_{\theta} f_{\theta}(x_i)\rangle = \sum_j \frac{\partial f_{\theta}(x)}{\partial \theta_j} \frac{\partial f_{\theta}(x_i)}{\partial \theta_j}$$
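This kernel is easy to evaluate numerically. The sketch below (the two-layer model $f_\theta(x) = v \cdot \tanh(wx)$ and all variable names are illustrative, not from the text) compares the analytic gradient inner product with a finite-difference estimate of $\langle\nabla_\theta f_\theta(x), \nabla_\theta f_\theta(x_i)\rangle$:

```python
import numpy as np

# Toy nonlinear model: f_theta(x) = v . tanh(w * x), theta = (w, v) in R^6.
def f(theta, x):
    w, v = theta[:3], theta[3:]
    return v @ np.tanh(w * x)

def grad_analytic(theta, x):
    w, v = theta[:3], theta[3:]
    h = np.tanh(w * x)
    return np.concatenate([v * x * (1 - h**2),   # d f / d w_j
                           h])                   # d f / d v_j

def grad_numeric(theta, x, eps=1e-6):
    # Central finite differences, coordinate by coordinate.
    g = np.empty_like(theta)
    for j in range(len(theta)):
        e = np.zeros_like(theta); e[j] = eps
        g[j] = (f(theta + e, x) - f(theta - e, x)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
theta = rng.normal(size=6)
x, xi = 0.7, -1.2

# K_theta(x, x_i) = <grad_theta f(x), grad_theta f(x_i)>, two ways:
K_analytic = grad_analytic(theta, x) @ grad_analytic(theta, xi)
K_numeric = grad_numeric(theta, x) @ grad_numeric(theta, xi)
print(K_analytic, K_numeric)   # the two estimates agree
```

Note that $K_\theta$ is automatically symmetric and positive semidefinite (it is a Gram matrix of gradients), so it is a legitimate kernel for each fixed $\theta$.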

Writing $\alpha_{\theta,i} = -\dfrac{\partial l(y_i, f_{\theta}(x_i))}{\partial f_{\theta}(x_i)}$, the dynamics of the model output become

$$\frac{df_{\theta}(x)}{dt} = \sum_i \alpha_{\theta,i} K_{\theta}(x, x_i)$$
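A single small gradient step lets us verify this identity numerically. In the sketch below (a hypothetical setup: quadratic features, squared loss) the model is linear in $\theta$, so $\nabla_\theta f$ is constant and the one-step change of the output matches $\eta \sum_i \alpha_{\theta,i} K_\theta(x, x_i)$ exactly:

```python
import numpy as np

# Toy model linear in its parameters: f_theta(x) = theta . phi(x) with
# phi(x) = [1, x, x^2], so grad_theta f(x) = phi(x) and
# K_theta(x, x') = <phi(x), phi(x')> does not depend on theta.
# Loss: l(y, f) = (f - y)^2 / 2, hence alpha_{theta,i} = -(f(x_i) - y_i).
phi = lambda x: np.array([1.0, x, x**2])
f = lambda th, x: th @ phi(x)
K = lambda x, xp: phi(x) @ phi(xp)       # the kernel from the text

rng = np.random.default_rng(0)
theta = rng.normal(size=3)
xs = np.array([-1.0, 0.5, 2.0])          # training inputs
ys = np.array([0.3, -0.2, 1.0])          # training targets
x_test, eta = 1.3, 1e-3                  # probe point, Euler step size

# One gradient-descent step on L(theta) = sum_i l(y_i, f(x_i)).
grad_L = sum((f(theta, xi) - yi) * phi(xi) for xi, yi in zip(xs, ys))
theta_new = theta - eta * grad_L

# Change predicted by the kernel dynamics: eta * sum_i alpha_i K(x, x_i).
alpha = [-(f(theta, xi) - yi) for xi, yi in zip(xs, ys)]
delta_pred = eta * sum(a * K(x_test, xi) for a, xi in zip(alpha, xs))
delta_true = f(theta_new, x_test) - f(theta, x_test)
print(delta_true, delta_pred)            # agree (exactly, for a linear model)
```

For a model that is nonlinear in $\theta$ the match would hold only to first order in $\eta$, which is exactly the continuous-time approximation being made.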

Integrating from $t=0$ to $t=T$ yields

$$f_{\theta_T}(x) = f_{\theta_0}(x) + \sum_i \int_0^T \alpha_{\theta(t),i} K_{\theta(t)}(x, x_i) dt\label{eq:sgdf}$$

If we now define

$$\alpha_i (x) = \frac{\int_0^T \alpha_{\theta(t),i} K_{\theta(t)}(x, x_i) dt}{\int_0^T K_{\theta(t)}(x, x_i) dt}, \quad K(x, x_i) = \int_0^T K_{\theta(t)}(x, x_i) dt$$

then, with $\beta(x) = f_{\theta_0}(x)$, equation $\eqref{eq:sgdf}$ becomes

$$f_{\theta_T}(x) = \beta(x) + \sum_i \alpha_i (x) K(x, x_i)$$

which has the same form as the SVM $\eqref{eq:svm}$, except that the coefficients $\alpha_i(x)$ and the bias $\beta(x)$ now depend on the input $x$.
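The integral representation can be checked with its discrete counterpart: run $T$ gradient steps and accumulate each sample's contribution $\eta\,\alpha_{\theta(t),i} K_{\theta(t)}(x, x_i)$. The sketch below reuses the illustrative linear-in-$\theta$ toy setup (quadratic features, squared loss; none of it is from the text), for which the decomposition is exact:

```python
import numpy as np

# f_theta(x) = theta . [1, x, x^2]; squared loss, so
# K(x, x') = phi(x) . phi(x') and alpha_i = -(f(x_i) - y_i).
phi = lambda x: np.array([1.0, x, x**2])
f = lambda th, x: th @ phi(x)
K = lambda x, xp: phi(x) @ phi(xp)

rng = np.random.default_rng(1)
theta = rng.normal(size=3)
xs = np.array([-1.0, 0.5, 2.0])
ys = np.array([0.3, -0.2, 1.0])
x_test, eta, T = 1.3, 1e-2, 200

f0 = f(theta, x_test)
contrib = np.zeros(len(xs))      # discretized integral of alpha_i K(x, x_i) dt
for _ in range(T):
    alpha = np.array([-(f(theta, xi) - yi) for xi, yi in zip(xs, ys)])
    contrib += eta * alpha * np.array([K(x_test, xi) for xi in xs])
    theta = theta + eta * sum(a * phi(xi) for a, xi in zip(alpha, xs))

fT = f(theta, x_test)
print(fT, f0 + contrib.sum())    # the trained output equals the kernel sum
```

Dividing `contrib[i]` by the accumulated kernel $\sum_t \eta K(x, x_i)$ would recover the averaged coefficient $\alpha_i(x)$ defined above.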
