Photo by James Pond on Unsplash

The Stanford ML Group recently published a new algorithm in their paper, with an implementation called NGBoost. The algorithm uses natural gradients to bring uncertainty estimation into gradient boosting. This article tries to understand the new algorithm and compares it with other popular boosting algorithms to see how it performs in practice.
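To build some intuition for what a natural gradient is (a toy sketch of my own, not code from the paper): for a Normal model, the natural gradient preconditions the ordinary gradient of the negative log-likelihood by the inverse Fisher information matrix, so step sizes are measured in distribution space rather than raw parameter space. All numbers below are made up for illustration.

```python
import numpy as np

# One observation y under a Normal(mu, sigma^2) model (illustrative values).
y, mu, sigma = 2.0, 0.0, 1.0

# Ordinary gradient of the negative log-likelihood w.r.t. (mu, sigma).
grad = np.array([
    -(y - mu) / sigma**2,                  # d(-log L)/d mu
    1.0 / sigma - (y - mu)**2 / sigma**3,  # d(-log L)/d sigma
])

# Fisher information of the Normal in the (mu, sigma) parametrisation.
fisher = np.diag([1.0 / sigma**2, 2.0 / sigma**2])

# Natural gradient: inverse Fisher times the ordinary gradient.
nat_grad = np.linalg.solve(fisher, grad)

print(grad)      # [-2. -3.]
print(nat_grad)  # [-2.  -1.5]
```

Note how the sigma component shrinks relative to the mu component: the Fisher matrix rescales each direction by how strongly it actually moves the distribution.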

1. What is Natural Gradient Boosting?

#### Scoring Rules

2. Empirical validation: comparison with LightGBM and XGBoost
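As a quick illustration of the "scoring rules" idea from item 1 (my own sketch, not from the original article): NGBoost's MLE score corresponds to the logarithmic scoring rule, i.e. the negative log-likelihood of the observed value under the predicted distribution, and a well-calibrated forecast receives a lower (better) score.

```python
import numpy as np
from scipy.stats import norm

def log_score(y_obs, mu, sigma):
    """Logarithmic scoring rule: negative log-likelihood of y_obs
    under a Normal(mu, sigma^2) predictive distribution."""
    return -norm(loc=mu, scale=sigma).logpdf(y_obs)

# A forecast centred on the outcome scores better (lower) than one that is not.
good = log_score(2.0, mu=2.0, sigma=1.0)   # ~0.92
bad = log_score(2.0, mu=0.0, sigma=1.0)    # ~2.92
print(good, bad)
```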

Photo by billy lee on Unsplash

```
# import packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from ngboost.ngboost import NGBoost
from ngboost.learners import default_tree_learner
from ngboost.distns import Normal
from ngboost.scores import MLE

import lightgbm as lgb
import xgboost as xgb

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from math import sqrt
```

```
# read the dataset (the path below is illustrative)
df = pd.read_csv('train.csv')

# feature engineering (custom preprocessing returning train/test splits)
tr, te = Nanashi_solution(df)
```

```
# NGBoost
ngb = NGBoost(Base=default_tree_learner, Dist=Normal, Score=MLE(),
              natural_gradient=True, verbose=False)

ngb.fit(np.asarray(tr.drop(['SalePrice'], axis=1)),
        np.asarray(tr.SalePrice))

y_pred_ngb = pd.DataFrame(ngb.predict(te.drop(['SalePrice'], axis=1)))
```

```
# LightGBM
ltr = lgb.Dataset(tr.drop(['SalePrice'], axis=1), label=tr['SalePrice'])

param = {
    'bagging_freq': 5,
    'bagging_fraction': 0.6,
    'bagging_seed': 123,
    'boost_from_average': 'false',
    'boost': 'gbdt',
    'feature_fraction': 0.3,
    'learning_rate': .01,
    'max_depth': 3,
    'metric': 'rmse',
    'min_data_in_leaf': 128,
    'min_sum_hessian_in_leaf': 8,
    'tree_learner': 'serial',
    'objective': 'regression',
    'verbosity': -1,
    'random_state': 123,
    'max_bin': 8,
    'early_stopping_round': 100
}

lgbm = lgb.train(param, ltr, num_boost_round=10000,
                 valid_sets=[ltr], verbose_eval=1000)

y_pred_lgb = lgbm.predict(te.drop(['SalePrice'], axis=1))

# XGBoost
params = {
    'max_depth': 4, 'eta': 0.01,
    'objective': 'reg:squarederror',
    'eval_metric': ['rmse'],
    'booster': 'gbtree',
    'verbosity': 0,
    'max_delta_step': 4,
    'subsample': .5,
    'min_child_weight': 100,
    'early_stopping_round': 50
}

dtr = xgb.DMatrix(tr.drop(['SalePrice'], axis=1), label=tr.SalePrice)
dte = xgb.DMatrix(te.drop(['SalePrice'], axis=1), label=te.SalePrice)

num_round = 5000
xgbst = xgb.train(params, dtr, num_round, verbose_eval=500)

y_pred_xgb = xgbst.predict(dte)
```

```
# Check the results
print('RMSE: NGBoost',
      round(sqrt(mean_squared_error(te.SalePrice, y_pred_ngb)), 4))
print('RMSE: LGBM',
      round(sqrt(mean_squared_error(te.SalePrice, y_pred_lgb)), 4))
print('RMSE: XGBoost',
      round(sqrt(mean_squared_error(te.SalePrice, y_pred_xgb)), 4))
```

One of the biggest differences between NGBoost and the other boosting algorithms is that it returns a probability distribution for each prediction. This can be visualised with the pred_dist method, which exposes the full probabilistic forecast rather than just a point estimate.

```
# see the probability distributions by visualising
Y_dists = ngb.pred_dist(te.drop(['SalePrice'], axis=1))
y_range = np.linspace(min(te.SalePrice), max(te.SalePrice), 200)
dist_values = Y_dists.pdf(y_range).transpose()

# plot the predicted distribution for one observation (index 114)
idx = 114
plt.plot(y_range, dist_values[idx])
plt.title(f"idx: {idx}")
plt.tight_layout()
plt.show()
```

NGBoost is a boosting algorithm that returns probability distributions.

NGBoost's predictions are highly competitive with those of the other popular boosting algorithms.
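Because each prediction is a full distribution, downstream consumers can derive more than a point estimate. As a hedged sketch (the parameter values below are illustrative, not taken from the model run above), a Normal predictive distribution turns directly into a prediction interval:

```python
from scipy.stats import norm

# Illustrative Normal predictive distribution for one house price
# (mu and sigma are made-up values, not from the NGBoost run above).
mu, sigma = 180000.0, 15000.0
dist = norm(loc=mu, scale=sigma)

point = dist.mean()                         # point prediction (the mean)
lo, hi = dist.ppf(0.025), dist.ppf(0.975)   # central 95% prediction interval

print(point)
print(lo, hi)
```

Ordinary boosting regressors give only `point`; the interval is the extra information the probabilistic forecast buys you.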

#### References:

[1] T. Duan et al., NGBoost: Natural Gradient Boosting for Probabilistic Prediction (2019), arXiv:1910.03225