1.1  Methods based on traditional statistics

1.2  Model-based methods

1.3  Machine-learning-based methods

#### Project setup

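GAIN (Generative Adversarial Imputation Nets) adapts the GAN framework to missing-data imputation. The generator estimates the missing values, and the discriminator tries to tell real values from imputed ones; its error signal is used to update the parameters of both networks. As with any GAN, the goal is to reach a Nash equilibrium, the point where the generator and discriminator losses balance and neither network can improve on its own, which gives the best result. The project consists of dataset preparation, data processing, network construction, and model training, each described in the sections below.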
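To make the two competing objectives concrete, here is a minimal NumPy sketch of how the losses are commonly written for GAIN. It assumes the usual formulation in which the discriminator outputs, for every entry, the probability that the entry was observed. The function name `gain_losses`, the weight `alpha`, and the constant `EPS` are illustrative assumptions, not taken from this article.

```python
import numpy as np

EPS = 1e-8  # keeps log() finite; an assumed constant for this sketch


def gain_losses(x, m, g_sample, d_prob, alpha=100.0):
    """Return (d_loss, g_loss) for one batch.

    x        : data, with missing entries set to any finite value
    m        : mask, 1 where observed, 0 where missing
    g_sample : generator output in [0, 1]
    d_prob   : discriminator's probability that each entry is observed
    alpha    : assumed weight of the reconstruction term
    """
    # Discriminator: binary cross-entropy against the true mask.
    d_loss = -np.mean(m * np.log(d_prob + EPS)
                      + (1 - m) * np.log(1 - d_prob + EPS))
    # Generator, adversarial part: make imputed entries look observed.
    g_adv = -np.mean((1 - m) * np.log(d_prob + EPS))
    # Generator, reconstruction part: match the entries that were observed.
    mse = np.mean((m * x - m * g_sample) ** 2) / np.mean(m)
    return d_loss, g_adv + alpha * mse


# Toy batch: one missing entry, and a discriminator that is maximally unsure.
x = np.array([[0.2, 0.0], [0.5, 0.9]])
m = np.array([[1.0, 0.0], [1.0, 1.0]])
g = np.array([[0.2, 0.4], [0.5, 0.9]])
d = np.full_like(x, 0.5)
d_loss, g_loss = gain_losses(x, m, g, d)
```

When the discriminator outputs 0.5 everywhere it cannot separate observed from imputed entries, which is the situation the equilibrium aims for. The reconstruction term is zero in this toy batch because the generator reproduces every observed entry exactly.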

2.1  Training dataset

2.2  Data processing

```python
import numpy as np


def normalization(data, parameters=None):
    """Min-max normalize each column to [0, 1], ignoring NaNs."""
    _, dim = data.shape
    norm_data = data.copy()

    if parameters is None:
        # Compute the per-column parameters from the data itself.
        min_val = np.zeros(dim)
        max_val = np.zeros(dim)
        for i in range(dim):
            min_val[i] = np.nanmin(norm_data[:, i])
            norm_data[:, i] = norm_data[:, i] - min_val[i]
            # Taken after the shift, so max_val holds the column range.
            max_val[i] = np.nanmax(norm_data[:, i])
            norm_data[:, i] = norm_data[:, i] / (max_val[i] + 1e-6)
        norm_parameters = {'min_val': min_val,
                           'max_val': max_val}
    else:
        # Reuse previously computed parameters.
        min_val = parameters['min_val']
        max_val = parameters['max_val']
        for i in range(dim):
            norm_data[:, i] = norm_data[:, i] - min_val[i]
            norm_data[:, i] = norm_data[:, i] / (max_val[i] + 1e-6)
        norm_parameters = parameters

    return norm_data, norm_parameters
```
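After imputation the values usually need to be mapped back to their original scale. The following is a minimal sketch of that inverse step, together with a vectorized variant of the normalization so the block runs on its own. The names `minmax_normalize` and `minmax_denormalize` are hypothetical and do not come from this article.

```python
import numpy as np


def minmax_normalize(data):
    """Column-wise min-max scaling that ignores NaNs (vectorized sketch)."""
    min_val = np.nanmin(data, axis=0)
    shifted = data - min_val
    max_val = np.nanmax(shifted, axis=0)  # column range after the shift
    params = {'min_val': min_val, 'max_val': max_val}
    return shifted / (max_val + 1e-6), params


def minmax_denormalize(norm_data, parameters):
    """Undo minmax_normalize using the stored parameters."""
    scale = parameters['max_val'] + 1e-6
    return norm_data * scale + parameters['min_val']


# Round trip on a small array that contains missing values.
data = np.array([[1.0, 10.0],
                 [np.nan, 30.0],
                 [3.0, np.nan]])
norm, params = minmax_normalize(data)
restored = minmax_denormalize(norm, params)
```

Missing entries stay `NaN` through both steps, so the mask of missing positions can still be derived from the normalized data.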

2.3  Model construction

```python
# Uses the TensorFlow 1.x graph API (tf.placeholder). Under TensorFlow 2.x
# it can be imported through the compat module as shown here.
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# NOTE: `xavier_init`, `dim` (number of features) and `h_dim` (hidden width)
# are not defined in this excerpt and must exist before this code runs.


def generator(x, m):
    # Input: data concatenated with the mask.
    inputs = tf.concat(values=[x, m], axis=1)
    G_h1 = tf.nn.relu(tf.matmul(inputs, G_W1) + G_b1)
    G_h2 = tf.nn.relu(tf.matmul(G_h1, G_W2) + G_b2)
    # Sigmoid output, because the data is min-max normalized.
    G_prob = tf.nn.sigmoid(tf.matmul(G_h2, G_W3) + G_b3)
    return G_prob


def discriminator(x, h):
    # Input: data concatenated with the hint.
    inputs = tf.concat(values=[x, h], axis=1)
    D_h1 = tf.nn.relu(tf.matmul(inputs, D_W1) + D_b1)
    D_h2 = tf.nn.relu(tf.matmul(D_h1, D_W2) + D_b2)
    D_logit = tf.matmul(D_h2, D_W3) + D_b3
    D_prob = tf.nn.sigmoid(D_logit)
    return D_prob


# Placeholders: data, mask and hint.
X = tf.placeholder(tf.float32, shape=[None, dim])
M = tf.placeholder(tf.float32, shape=[None, dim])
H = tf.placeholder(tf.float32, shape=[None, dim])

# Discriminator variables.
D_W1 = tf.Variable(xavier_init([dim * 2, h_dim]))
D_b1 = tf.Variable(tf.zeros(shape=[h_dim]))
D_W2 = tf.Variable(xavier_init([h_dim, h_dim]))
D_b2 = tf.Variable(tf.zeros(shape=[h_dim]))
D_W3 = tf.Variable(xavier_init([h_dim, dim]))
D_b3 = tf.Variable(tf.zeros(shape=[dim]))
theta_D = [D_W1, D_W2, D_W3, D_b1, D_b2, D_b3]

# Generator variables.
G_W1 = tf.Variable(xavier_init([dim * 2, h_dim]))
G_b1 = tf.Variable(tf.zeros(shape=[h_dim]))
G_W2 = tf.Variable(xavier_init([h_dim, h_dim]))
G_b2 = tf.Variable(tf.zeros(shape=[h_dim]))
G_W3 = tf.Variable(xavier_init([h_dim, dim]))
G_b3 = tf.Variable(tf.zeros(shape=[dim]))
theta_G = [G_W1, G_W2, G_W3, G_b1, G_b2, G_b3]
```
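The three placeholders `X`, `M` and `H` need matching arrays at training time. The sketch below shows, in plain NumPy, one common way to build them and to merge the generator output back into the data. The helper names `prepare_batch` and `combine`, the noise range, and the default `hint_rate` are assumptions made for illustration and do not come from this article.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable


def prepare_batch(x_raw, hint_rate=0.9):
    """Build the data, mask and hint arrays from data containing NaNs."""
    m = 1.0 - np.isnan(x_raw)                 # mask: 1 observed, 0 missing
    x = np.nan_to_num(x_raw, nan=0.0)         # placeholder for missing entries
    z = rng.uniform(0.0, 0.01, size=x.shape)  # small noise for missing entries
    x_in = m * x + (1.0 - m) * z              # array fed to X
    b = (rng.uniform(size=x.shape) < hint_rate).astype(float)
    h = b * m                                 # hint: reveals part of the mask
    return x_in, m, h


def combine(x_in, m, g_sample):
    """Keep observed values; use generator output only where data was missing."""
    return m * x_in + (1.0 - m) * g_sample


x_raw = np.array([[0.1, np.nan],
                  [0.7, 0.3]])
x_in, m, h = prepare_batch(x_raw)
# Stand-in for the generator output, just to show the merge step.
imputed = combine(x_in, m, g_sample=np.full(x_raw.shape, 0.5))
```

Only the missing position takes the generator's value, so observed data is never overwritten by the model.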
