Factorization Machine

The main goal of the Factorization Machine (FM) is to model feature interactions (cross features) when the data is sparse.

Derivation

Start from a plain linear model:

\begin{align}

\hat y = \omega_0 + \sum_{i=1}^{n}\omega_{i}x_{i}

\end{align}

Adding a weight $$\omega_{ij}$$ for every pair of features gives a degree-2 polynomial model:

\begin{align}

\hat y = \omega_0 + \sum_{i=1}^{n}\omega_{i}x_{i} + \sum_{i=1}^{n}\sum_{j=i+1}^{n}\omega_{ij}x_{i}x_{j}

\end{align}

Under sparsity most pairs $$x_{i}x_{j}$$ are rarely observed together, so the $$O(n^2)$$ independent weights $$\omega_{ij}$$ cannot be estimated reliably. FM therefore factorizes the interaction matrix with a low-rank factor matrix $$V \in \mathbb{R}^{n \times k}$$:

\begin{align}

W = VV^{T}

\end{align}

It is well known that for any positive definite matrix $$W$$ there exists a matrix $$V$$ such that $$W = V \cdot V^{T}$$, provided that $$k$$ is sufficiently large. This shows that an FM can express any interaction matrix $$W$$ if $$k$$ is chosen large enough. Nevertheless, in sparse settings a small $$k$$ should typically be chosen, because there is not enough data to estimate complex interactions $$W$$. Restricting $$k$$, and thus the expressiveness of the FM, leads to better generalization and thus improved interaction matrices under sparsity.
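The factorization argument above can be sketched numerically; the dimensions and random values below are purely illustrative:

```python
import numpy as np

# Sketch: factorize an interaction matrix as W = V V^T.
# n features, k latent factors; all values here are made up.
n, k = 5, 3
rng = np.random.default_rng(0)
V = rng.normal(size=(n, k))
W = V @ V.T

# W is symmetric, and every pairwise weight w_ij is the inner
# product <v_i, v_j>, so n*k parameters replace n*(n-1)/2.
print(np.allclose(W, W.T))               # True
print(np.isclose(W[1, 2], V[1] @ V[2]))  # True
```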

Replacing $$\omega_{ij}$$ with the inner product $$\langle v_{i}, v_{j} \rangle$$ yields the FM model:

\begin{align}

\hat y = \omega_0 + \sum_{i=1}^{n}\omega_{i}x_{i} + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_{i}, v_{j} \rangle x_{i}x_{j}

\end{align}

Computed directly, the interaction term costs $$O(kn^2)$$; it can be rearranged to cost only $$O(kn)$$:

\begin{align}

\sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_{i}, v_{j} \rangle x_{i}x_{j} & = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \langle v_{i}, v_{j} \rangle x_{i}x_{j} - \frac{1}{2}\sum_{i=1}^{n} \langle v_{i}, v_{i} \rangle x_{i}x_{i} \\

& = \frac{1}{2} (\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{f=1}^{k}v_{i,f}v_{j,f}x_{i}x_{j} - \sum_{i=1}^{n}\sum_{f=1}^{k}v_{i,f}v_{i,f}x_{i}x_{i}) \\

& = \frac{1}{2}\sum_{f=1}^{k}((\sum_{i=1}^{n}v_{i,f}x_{i})(\sum_{j=1}^{n}v_{j,f}x_{j}) - \sum_{i=1}^{n} v_{i,f}^2 x_i^2) \\

& = \frac{1}{2}\sum_{f=1}^{k}((\sum_{i=1}^{n}v_{i,f}x_{i})^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2)

\end{align}

Substituting back, the full model can be evaluated in linear time:

\begin{align}

\hat y = \omega_0 + \sum_{i=1}^{n}\omega_{i}x_{i} + \frac{1}{2}\sum_{f=1}^{k}((\sum_{i=1}^{n}v_{i,f}x_{i})^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2)

\end{align}
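A minimal sketch of this linear-time prediction, checked against the naive pairwise sum; all names and values are illustrative:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """O(kn) FM prediction using the rearranged interaction term.

    x: (n,) feature vector, w0: bias, w: (n,) linear weights,
    V: (n, k) factor matrix. Parameter names are illustrative.
    """
    s = V.T @ x                    # s_f = sum_i v_{i,f} x_i
    s_sq = (V ** 2).T @ (x ** 2)   # sum_i v_{i,f}^2 x_i^2
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - s_sq)

# Sanity check against the naive O(kn^2) pairwise sum.
rng = np.random.default_rng(1)
n, k = 4, 2
x, w, V = rng.normal(size=n), rng.normal(size=n), rng.normal(size=(n, k))
naive = 0.3 + w @ x + sum(
    (V[i] @ V[j]) * x[i] * x[j] for i in range(n) for j in range(i + 1, n)
)
print(np.isclose(fm_predict(x, 0.3, w, V), naive))  # True
```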

Training

Regression

For regression, squared error over $$m$$ samples is used:

\begin{align}

loss^{R}(\hat y, y) = \frac{1}{2m}\sum_{i=1}^{m}(\hat y^{(i)} - y^{(i)})^2

\end{align}
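A direct sketch of this loss (the helper name is illustrative):

```python
import numpy as np

def mse_loss(y_hat, y):
    # Squared error with the 1/(2m) factor from the formula above.
    return 0.5 * np.mean((np.asarray(y_hat) - np.asarray(y)) ** 2)

print(mse_loss([2.0], [0.0]))        # 2.0
print(mse_loss([1.0, 2.0], [1.0, 2.0]))  # 0.0
```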

Classification

CTR prediction is essentially a binary classification problem whose output is the probability of a click. For classification, logloss is used as the loss function, where $$\sigma$$ is the sigmoid function and the labels are $$y^{(i)} \in \{-1, +1\}$$:

\begin{align}

loss^{C}(\hat y, y) = \sum_{i=1}^{m} -\ln\sigma(\hat y^{(i)}y^{(i)})

\end{align}
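A sketch of this logloss, assuming labels in $$\{-1, +1\}$$ as the formula requires; `np.logaddexp` is used because $$-\ln\sigma(z) = \ln(1 + e^{-z})$$ is numerically stable in that form:

```python
import numpy as np

def logloss(y_hat, y):
    # Labels y in {-1, +1}; -ln(sigmoid(y_hat * y)) computed as
    # ln(1 + exp(-z)) via logaddexp for numerical stability.
    z = np.asarray(y_hat) * np.asarray(y)
    return np.sum(np.logaddexp(0.0, -z))

# A score of 0 is a 50/50 prediction, so such a sample costs ln(2).
print(np.isclose(logloss([0.0], [1.0]), np.log(2)))  # True
```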

Gradient Descent

Every parameter $$\theta \in \{\omega_0, \omega_i, v_{i,f}\}$$ is updated with learning rate $$\alpha$$:

\begin{align}

\theta = \theta - \alpha\frac{\partial{loss}}{\partial\theta}

\end{align}
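A sketch of one SGD step for the regression case, using the FM partial derivatives $$\partial\hat y/\partial\omega_0 = 1$$, $$\partial\hat y/\partial\omega_i = x_i$$, and $$\partial\hat y/\partial v_{i,f} = x_i\sum_j v_{j,f}x_j - v_{i,f}x_i^2$$; the function name, initial values, and learning rate are illustrative:

```python
import numpy as np

def fm_sgd_step(x, y, w0, w, V, lr=0.01):
    """One SGD step for FM regression (squared error, single sample).

    Gradients: d y_hat/d w0 = 1, d y_hat/d w_i = x_i,
    d y_hat/d v_{i,f} = x_i * sum_j v_{j,f} x_j - v_{i,f} x_i^2.
    """
    s = V.T @ x
    y_hat = w0 + w @ x + 0.5 * np.sum(s ** 2 - (V ** 2).T @ (x ** 2))
    g = y_hat - y  # d loss / d y_hat for squared error
    w0 = w0 - lr * g
    w = w - lr * g * x
    V = V - lr * g * (np.outer(x, s) - V * (x ** 2)[:, None])
    return w0, w, V

# On a single toy sample the prediction should approach the target.
x, y = np.array([1.0, 2.0]), 1.0
w0, w, V = 0.0, np.zeros(2), np.full((2, 2), 0.1)
for _ in range(100):
    w0, w, V = fm_sgd_step(x, y, w0, w, V, lr=0.05)
s = V.T @ x
final = w0 + w @ x + 0.5 * np.sum(s ** 2 - (V ** 2).T @ (x ** 2))
```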