### 3. Building a GBDT with TensorFlow in Practice

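Installing TF-DF is simple: `pip install -U tensorflow_decision_forests`. One drawback is that, at the time of writing, it only supports Linux. If it does not run on your local machine, try copying the code into Google Colab.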

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score, roc_curve

tf.random.set_seed(123)

# Load the breast cancer dataset bundled with scikit-learn
dataset_cancer = datasets.load_breast_cancer()
# print(dataset_cancer['DESCR'])
df = pd.DataFrame(dataset_cancer.data, columns=dataset_cancer.feature_names)
df['label'] = dataset_cancer.target
print(df.shape)
```

```python
# Holdout validation: split into test and training sets at a 3:7 ratio
x_train, x_test = train_test_split(df, test_size=0.3)
# EDA: summary statistics of the training data
x_train.describe(include='all')
```

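```python
import tensorflow_decision_forests as tfdf
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(x_train, label="label")  # pandas -> tf.data
test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(x_test, label="label")
# Model parameters: hold out 10% of the training data for validation
model_tf = tfdf.keras.GradientBoostedTreesModel(validation_ratio=0.1)
# Model training
model_tf.compile()
model_tf.fit(x=train_ds)
```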

```python
# Model evaluation
evaluation = model_tf.evaluate(test_ds, return_dict=True)
# Predicted probability of the positive class, flattened from (n, 1) to (n,)
probs = model_tf.predict(test_ds).ravel()
fpr, tpr, _ = roc_curve(x_test.label, probs)
plt.plot(fpr, tpr)
plt.title('ROC curve')
plt.xlabel('false positive rate')
plt.ylabel('true positive rate')
plt.xlim(0,)
plt.ylim(0,)
plt.show()
print(evaluation)
```

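The MEAN_MIN_DEPTH metric. For each feature, this is the shallowest depth at which the feature is used to split, averaged over all trees. The smaller the mean minimum depth, the more important the variable: a low value means the feature splits close to the root, so a large number of samples are classified based on it.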
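To make the definition concrete, here is a minimal pure-Python sketch that computes a mean minimum depth on a hand-written toy forest. The tree layout, the feature names, and the `node` helper are all made up for illustration and are not TF-DF's internal representation. The handling of unused features (assigning them the tree's depth) is an assumption, and TF-DF may use a different convention.

```python
from collections import deque

# Toy trees: an internal node is a dict, a leaf is None.
# Structure and feature names are hypothetical, for illustration only.
def node(feature, left=None, right=None):
    return {"feature": feature, "left": left, "right": right}

forest = [
    node("radius", node("texture"), node("area", node("texture"))),
    node("texture", node("radius"), node("radius")),
    node("radius", node("area")),
]

def tree_depth(tree):
    """Number of split levels in the tree (0 for a bare leaf)."""
    if tree is None:
        return 0
    return 1 + max(tree_depth(tree["left"]), tree_depth(tree["right"]))

def min_depth(tree, feature):
    """Shallowest depth at which `feature` splits; the root is depth 0.

    Assumption: a feature that never appears is assigned the tree's depth.
    """
    queue = deque([(tree, 0)])  # breadth-first, so the first hit is the shallowest
    while queue:
        current, depth = queue.popleft()
        if current is None:
            continue
        if current["feature"] == feature:
            return depth
        queue.append((current["left"], depth + 1))
        queue.append((current["right"], depth + 1))
    return tree_depth(tree)

def mean_min_depth(trees, feature):
    """Average of the per-tree minimum depths."""
    return sum(min_depth(t, feature) for t in trees) / len(trees)

scores = {f: mean_min_depth(forest, f) for f in ("radius", "texture", "area")}
# Smaller mean minimum depth first, i.e. most important first
ranking = sorted(scores, key=scores.get)
print(scores)
print(ranking)
```

In this toy forest `radius` sits at or next to the root in every tree, so it gets the smallest score and ranks first.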

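The NUM_NODES metric. It shows the number of times a given feature is used as a split, similar to a split-count importance. There are other metrics as well, which are not listed one by one here.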
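As a rough illustration of a split-count metric, the sketch below walks a hand-written toy forest and counts how many nodes split on each feature. The tree layout, the feature names, and the `node` helper are hypothetical; in practice TF-DF computes this value from the trained model.

```python
from collections import Counter

# Toy trees: an internal node is a dict, a leaf is None.
# Structure and feature names are hypothetical, for illustration only.
def node(feature, left=None, right=None):
    return {"feature": feature, "left": left, "right": right}

forest = [
    node("radius", node("texture"), node("area", node("texture"))),
    node("texture", node("radius"), node("radius")),
    node("radius", node("area")),
]

def count_split_nodes(tree, counter):
    """Add one to the counter for every node that splits on a feature."""
    if tree is None:
        return
    counter[tree["feature"]] += 1
    count_split_nodes(tree["left"], counter)
    count_split_nodes(tree["right"], counter)

num_nodes = Counter()
for tree in forest:
    count_split_nodes(tree, num_nodes)

# Features sorted by how often they are used as a split
print(num_nodes.most_common())
```

Unlike the mean minimum depth, a larger count here indicates a more important feature, so the two metrics are read in opposite directions.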

#### Summary

- https://www.tensorflow.org/decision_forests/
- https://keras.io/examples/structured_data/classification_with_tfdf/