This article introduces how to implement LSTM and other recurrent neural networks using NumPy.

The RNN code borrows from code A. Karpathy wrote previously. In addition, the author also wrote a gradient check to confirm the correctness of the implementation; doesn't it feel like the term "gradient check" has gradually faded from view since deep learning frameworks became popular?

```python
loss = 0
# forward pass
for t in range(len(inputs)):
    # encode input in 1-of-k (one-hot) representation
    xs[t] = np.zeros((M, B))
    for b in range(0, B):
        xs[t][:, b][inputs[t][b]] = 1
    # gates, linear part
    gs[t] = np.dot(Wxh, xs[t]) + np.dot(Whh, hs[t - 1]) + bh
    # gates, nonlinear part
    # i, o, f gates
    gs[t][0:3 * HN, :] = sigmoid(gs[t][0:3 * HN, :])
    # c gate (candidate)
    gs[t][3 * HN:4 * HN, :] = np.tanh(gs[t][3 * HN:4 * HN, :])
    # mem(t) = c gate * i gate + f gate * mem(t-1)
    cs[t] = gs[t][3 * HN:4 * HN, :] * gs[t][0:HN, :] + gs[t][2 * HN:3 * HN, :] * cs[t - 1]
    # mem cell - nonlinearity
    cs[t] = np.tanh(cs[t])
    # new hidden state
    hs[t] = gs[t][HN:2 * HN, :] * cs[t]
    # unnormalized log probabilities for next chars
    ys[t] = np.dot(Why, hs[t]) + by
    # subtract the per-column max for numerical stability
    mx = np.max(ys[t], axis=0)
    ys[t] -= mx
    # probabilities for next chars (softmax)
    ps[t] = np.exp(ys[t]) / np.sum(np.exp(ys[t]), axis=0)
    for b in range(0, B):
        # softmax (cross-entropy loss)
        if ps[t][targets[t, b], b] > 0:
            loss += -np.log(ps[t][targets[t, b], b])
```
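The max-subtraction before exponentiation in the loop above is the standard numerically stable softmax: shifting the logits by their maximum leaves the probabilities unchanged but keeps `np.exp` from overflowing. A minimal standalone sketch (function and variable names here are illustrative, not from the project):

```python
import numpy as np

def stable_softmax(y):
    """Column-wise softmax with the max subtracted for numerical stability."""
    y = y - np.max(y, axis=0)      # shifting by the max leaves the result unchanged
    e = np.exp(y)                  # after the shift, exp() cannot overflow
    return e / np.sum(e, axis=0)   # normalize each column to sum to 1

# a logit of 1000 would overflow np.exp() without the shift
logits = np.array([[1000.0], [999.0], [998.0]])
probs = stable_softmax(logits)
```

Each column of `probs` is a valid distribution even though the raw logits are far outside the range `np.exp` can handle.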

#### Using the project

To run the debug version:

`python dnc-debug.py`

```
python rnn-numpy.py
python lstm-numpy.py
python dnc-numpy.py
```

Sample training log (elapsed time, iteration, loss in bits per character, throughput):

`0: 4163.009 s, iter 104800, 1.2808 BPC, 1488.38 char/s`
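The BPC figure in the log is bits per character. Assuming the usual definition, it is just the mean cross-entropy loss converted from nats (natural log, as in the `np.log` loss above) to base 2:

```python
import numpy as np

def bpc(mean_nll_nats):
    """Convert a mean negative log-likelihood in nats to bits per character."""
    return mean_nll_nats / np.log(2.0)

# a model that assigns probability 1/2 to each correct character scores exactly 1 BPC
one_bit = bpc(np.log(2.0))
```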

Sample gradient check output:

```
GRAD CHECK
Wxh: n = [-1.828500e-02, 5.292866e-03] min 3.005175e-09, max 3.505012e-07  a = [-1.828500e-02, 5.292865e-03] mean 5.158434e-08 # 10/4
Whh: n = [-3.614049e-01, 6.580141e-01] min 1.549311e-10, max 4.349188e-08  a = [-3.614049e-01, 6.580141e-01] mean 9.340821e-09 # 10/10
Why: n = [-9.868277e-02, 7.518284e-02] min 2.378911e-09, max 1.901067e-05  a = [-9.868276e-02, 7.518284e-02] mean 1.978080e-06 # 10/10
Whr: n = [-3.652128e-02, 1.372321e-01] min 5.520914e-09, max 6.750276e-07  a = [-3.652128e-02, 1.372321e-01] mean 1.299713e-07 # 10/10
Whv: n = [-1.065475e+00, 4.634808e-01] min 6.701966e-11, max 1.462031e-08  a = [-1.065475e+00, 4.634808e-01] mean 4.161271e-09 # 10/10
Whw: n = [-1.677826e-01, 1.803906e-01] min 5.559963e-10, max 1.096433e-07  a = [-1.677826e-01, 1.803906e-01] mean 2.434751e-08 # 10/10
Whe: n = [-2.791997e-02, 1.487244e-02] min 3.806438e-08, max 8.633199e-06  a = [-2.791997e-02, 1.487244e-02] mean 1.085696e-06 # 10/10
Wrh: n = [-7.319636e-02, 9.466716e-02] min 4.183225e-09, max 1.369062e-07  a = [-7.319636e-02, 9.466716e-02] mean 3.677372e-08 # 10/10
Wry: n = [-1.191088e-01, 5.271329e-01] min 1.168224e-09, max 1.568242e-04  a = [-1.191088e-01, 5.271329e-01] mean 2.827306e-05 # 10/10
bh:  n = [-1.363950e+00, 9.144058e-01] min 2.473756e-10, max 5.217119e-08  a = [-1.363950e+00, 9.144058e-01] mean 7.066159e-09 # 10/10
by:  n = [-5.594528e-02, 5.814085e-01] min 1.604237e-09, max 1.017124e-05  a = [-5.594528e-02, 5.814085e-01] mean 1.026833e-06 # 10/10
```
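A gradient check like this compares the analytical gradients (`a`) from backpropagation against numerical estimates (`n`) obtained by perturbing each parameter. A minimal central-difference sketch of the idea, using a toy quadratic loss rather than the project's LSTM (all names here are illustrative):

```python
import numpy as np

def numerical_grad(f, W, eps=1e-5):
    """Central-difference estimate of df/dW, one parameter at a time."""
    grad = np.zeros_like(W)
    it = np.nditer(W, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = W[idx]
        W[idx] = old + eps
        fp = f(W)                        # f(W + eps) at this coordinate
        W[idx] = old - eps
        fm = f(W)                        # f(W - eps) at this coordinate
        W[idx] = old                     # restore the original value
        grad[idx] = (fp - fm) / (2 * eps)
        it.iternext()
    return grad

# toy loss: L(W) = sum(W**2); its analytical gradient is 2*W
W = np.arange(1.0, 13.0).reshape(3, 4)
num = numerical_grad(lambda w: np.sum(w ** 2), W)
ana = 2 * W
# relative error between the two estimates should be tiny
rel_err = np.max(np.abs(num - ana) / (np.abs(num) + np.abs(ana)))
```

Checking min, max, and mean relative error per weight matrix, as the log above does, quickly localizes a buggy backward pass to a single parameter.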