#### Table of Contents

1. Introduction

3. Modeling one neuron

3.2 Single neuron as a linear classifier

3.2.1 Binary Softmax classifier

3.2.2 Binary SVM classifier

3.2.3 Regularization interpretation

3.3 Commonly used activation functions

3.3.6 TLDR: key takeaways

4. Neural Network architectures

4.1 Layer-wise organization

4.2 Example feed-forward computation

4.3 Representational power

4.4 Setting number of layers and their sizes

## 3.1 Basic neuron model

```python
import math
import numpy as np

class Neuron(object):
    # ...
    def forward(self, inputs):
        """ assume inputs and weights are 1-D numpy arrays and bias is a number """
        cell_body_sum = np.sum(inputs * self.weights) + self.bias
        firing_rate = 1.0 / (1.0 + math.exp(-cell_body_sum))  # sigmoid activation function
        return firing_rate
```
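To make the class above directly usable, here is a minimal self-contained sketch with a hypothetical constructor (the `__init__`, weights, and bias shown are assumptions for illustration; the course notes elide the constructor):

```python
import math
import numpy as np

class Neuron(object):
    def __init__(self, weights, bias):
        # hypothetical constructor: weights is a 1-D numpy array, bias a number
        self.weights = weights
        self.bias = bias

    def forward(self, inputs):
        """ assume inputs and weights are 1-D numpy arrays and bias is a number """
        cell_body_sum = np.sum(inputs * self.weights) + self.bias
        return 1.0 / (1.0 + math.exp(-cell_body_sum))  # sigmoid activation

n = Neuron(weights=np.array([0.5, -0.5]), bias=0.0)
rate = n.forward(np.array([1.0, 1.0]))  # weighted sum is 0, so sigmoid gives 0.5
```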

## 3.2 Single neuron as a linear classifier

### 3.2.3 Regularization interpretation

Summary: a single neuron can be used to implement a binary classifier (e.g. a binary Softmax or binary SVM classifier).
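As a concrete sketch of the binary Softmax (logistic) case: the neuron's sigmoid output is interpreted as P(class = 1 | x) and thresholded at 0.5. The weights `w` and bias `b` below are made-up values for illustration, not trained parameters:

```python
import numpy as np

def predict(x, w, b):
    # sigmoid of the raw score, interpreted as P(class = 1 | x)
    p = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))
    return 1 if p > 0.5 else 0

w = np.array([2.0, -1.0])  # hypothetical weights
b = 0.5                    # hypothetical bias
label = predict(np.array([1.0, 0.0]), w, b)  # score 2.5 > 0, so p > 0.5
```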

## 3.3 Commonly used activation functions

### 3.3.2 tanh

tanh is in fact a scaled and shifted sigmoid: tanh(x) = 2σ(2x) − 1.

Like the sigmoid, tanh saturates in both tail regions, where the gradient is very small, but its output range is zero-centered. For this reason, tanh is generally preferred over the sigmoid in practice.
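The identity tanh(x) = 2σ(2x) − 1 can be checked numerically:

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
x = np.linspace(-3.0, 3.0, 7)
lhs = np.tanh(x)
rhs = 2.0 * sigmoid(2.0 * x) - 1.0  # scaled and shifted sigmoid
match = np.allclose(lhs, rhs)       # True: the two curves coincide
```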

### 3.3.3 ReLU

ReLU pros: (1) it greatly accelerates convergence (see the right panel of the figure above); (2) it is extremely cheap to compute, just a max operation, whereas sigmoid and tanh involve exponentials, which are far more expensive.

ReLU cons: unfortunately, ReLU units can be fragile during training and can "die". For example, a large gradient flowing through a ReLU neuron could cause the weights to update in such a way that the neuron will never activate on any datapoint again. If this happens, then the gradient flowing through the unit will forever be zero from that point on; that is, ReLU units can irreversibly die during training, since they can get knocked off the data manifold. For example, with the learning rate set too high, you may find that as much as 40% of the network's units are "dead". Setting the learning rate appropriately mitigates this problem.
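A small sketch of why a "dead" unit stops learning: if the pre-activation is negative for every input, both the output and the ReLU gradient are identically zero, so no signal flows back through the unit. The weights and the large negative bias below are contrived to force this state:

```python
import numpy as np

relu = lambda z: np.maximum(0.0, z)

# Contrived "dead" unit: a large negative bias keeps the pre-activation
# below zero for every non-negative input.
w, b = np.array([0.1, 0.2]), -100.0
X = np.abs(np.random.randn(1000, 2))   # 1000 non-negative input points
z = X.dot(w) + b                       # pre-activations, all negative here
out = relu(z)                          # all zeros
grad_mask = (z > 0).astype(float)      # ReLU gradient w.r.t. z: all zeros
```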

### 3.3.5 Maxout

ReLU and Leaky ReLU are both special cases of Maxout: it keeps the benefits of ReLU while avoiding its tendency to "die". These advantages come at a cost, however: the number of parameters per neuron is doubled.
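A Maxout unit computes the max over two linear pieces, max(w1·x + b1, w2·x + b2); fixing one piece to zero (w1 = 0, b1 = 0) recovers ReLU. The vectors below are made-up values for illustration:

```python
import numpy as np

def maxout(x, w1, b1, w2, b2):
    # max over two linear pieces; twice the parameters of a single neuron
    return max(np.dot(w1, x) + b1, np.dot(w2, x) + b2)

x = np.array([1.0, -2.0])
w = np.array([0.5, 0.5])
# With one piece pinned to zero, maxout reduces to ReLU of the other piece:
relu_like = maxout(x, np.zeros(2), 0.0, w, 0.0)  # equals max(0, w.x) = 0.0 here
```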

### 3.3.6 TLDR: key takeaways

"What neuron type should I use?" Use the ReLU non-linearity, be careful with your learning rates, and possibly monitor the fraction of "dead" units in the network. If this concerns you, give Leaky ReLU or Maxout a try. Never use sigmoid. Try tanh, but expect it to work worse than ReLU/Maxout.

## 4.1 Layer-wise organization

Naming conventions.

#### Output layer

Sizing neural networks.

## 4.2 Example feed-forward computation

```python
import numpy as np

# forward-pass of a 3-layer neural network
# (parameter shapes assumed from the activation sizes noted below)
f = lambda x: 1.0/(1.0 + np.exp(-x))  # activation function (use sigmoid)
x = np.random.randn(3, 1)             # random input vector of three numbers (3x1)
W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)  # first hidden layer parameters
W2, b2 = np.random.randn(4, 4), np.random.randn(4, 1)  # second hidden layer parameters
W3, b3 = np.random.randn(1, 4), np.random.randn(1, 1)  # output layer parameters
h1 = f(np.dot(W1, x) + b1)   # calculate first hidden layer activations (4x1)
h2 = f(np.dot(W2, h1) + b2)  # calculate second hidden layer activations (4x1)
out = np.dot(W3, h2) + b3    # output neuron (1x1)
```

The forward pass of a fully-connected layer corresponds to one matrix multiplication followed by a bias offset and an activation function.