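XGBoost can also be used for time series forecasting, although the time series dataset must first be transformed into a supervised learning problem. The model also needs a specialized evaluation technique called walk-forward validation. Evaluating it with k-fold cross-validation would give optimistically biased results, because k-fold trains on observations from the future and then predicts the past.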
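The difference from k-fold can be made concrete by listing the splits. The following is a minimal sketch. It assumes one-step-ahead forecasting with an expanding training window, and the helper name `walk_forward_splits` is hypothetical. Each test index is predicted only from indices that come before it.

```python
def walk_forward_splits(n_samples, n_test):
    """Yield (train_indices, test_index) pairs for one-step walk-forward validation.

    Hypothetical helper: the training window always ends immediately before the
    test index and grows by one observation per step, so (unlike shuffled
    k-fold cross-validation) no future observation is used to predict the past.
    """
    n_train = n_samples - n_test
    for i in range(n_test):
        train_idx = list(range(0, n_train + i))
        test_idx = n_train + i
        yield train_idx, test_idx


if __name__ == "__main__":
    for train_idx, test_idx in walk_forward_splits(n_samples=6, n_test=2):
        print(f"train={train_idx} -> test={test_idx}")
    # train=[0, 1, 2, 3] -> test=4
    # train=[0, 1, 2, 3, 4] -> test=5
```

The price of this scheme is that the model is refit once per test step. The walk-forward loop in the full XGBoost example follows the same pattern.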

XGBoost is an implementation of the gradient boosting ensemble method for classification and regression problems.

#### 1. XGBoost Ensemble

Tree boosting has been shown to give state-of-the-art results on many standard classification benchmarks.

— XGBoost: A Scalable Tree Boosting System, 2016.

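XGBoost is an efficient implementation of the stochastic gradient boosting algorithm. It exposes a range of model hyperparameters that control the model throughout the training process.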

The most important factor behind the success of XGBoost is its scalability in all scenarios. The system runs more than ten times faster than existing popular solutions on a single machine and scales to billions of examples in distributed or memory-limited settings.

— XGBoost: A Scalable Tree Boosting System, 2016.
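The core idea that XGBoost builds on is that each new tree is fit to the errors left by the trees before it. For squared error, those errors are simply the residuals. The following pure-Python sketch is plain gradient boosting with one-split "stump" trees. It is not XGBoost's implementation. XGBoost adds, among other things, a regularized objective and second-order gradient information. This sketch also omits row subsampling, which is the "stochastic" part. The parameters `n_estimators` and `learning_rate` are meant to mirror the hyperparameters of the same name on `XGBRegressor`. All function names are illustrative.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree on a single feature by least squares."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must put at least one point on each side
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_estimators=50, learning_rate=0.1):
    """Squared-error gradient boosting: each stump is fit to the current residuals."""
    base = sum(ys) / len(ys)  # initial prediction: the mean of the targets
    stumps = []

    def predict(x):
        # shrink every stump's contribution by the learning rate
        return base + sum(learning_rate * s(x) for s in stumps)

    for _ in range(n_estimators):
        # residuals are the negative gradient of 0.5 * (y - f(x)) ** 2
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


if __name__ == "__main__":
    model = gradient_boost([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5], n_estimators=100)
    print(f"{model(1):.3f} {model(6):.3f}")  # 1.000 5.000
```

A smaller `learning_rate` makes each tree correct less of the remaining error. More trees are then needed, which is why the two hyperparameters are usually tuned together.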

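XGBoost is designed for classification and regression on tabular datasets. It can also be used for time series forecasting once the series has been reframed as a tabular, supervised learning dataset.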

#### 2. Time Series Data Preparation

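Before XGBoost can be fit, the time series has to be reframed as a supervised learning dataset. Previous time steps (lag observations) become the input columns, and the next time step becomes the output column. In the full listing in the next section, the `series_to_supervised()` function performs this sliding-window transform. The `train_test_split()` function then divides the resulting dataset into training and test sets, keeping the last `n_test` rows for testing so that the time order is preserved.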
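The same transform can be written without pandas, which makes the row layout easy to inspect. This is a minimal sketch that assumes a univariate series held in a Python list. `series_to_supervised_list` is a hypothetical name. `train_test_split` here re-expresses the tutorial's own helper for lists; it is not scikit-learn's function of the same name.

```python
def series_to_supervised_list(values, n_in=1, n_out=1):
    """Sliding window: each row is [t-n_in, ..., t-1, t, ..., t+n_out-1].

    Windows that would run past either end of the series are skipped,
    which matches dropping the NaN rows produced by pandas' shift().
    """
    rows = []
    for end in range(n_in, len(values) - n_out + 1):
        rows.append(list(values[end - n_in:end + n_out]))
    return rows


def train_test_split(data, n_test):
    """Keep the order: the last n_test rows are the test set."""
    if n_test <= 0:
        raise ValueError("n_test must be positive")
    return data[:-n_test], data[-n_test:]


if __name__ == "__main__":
    rows = series_to_supervised_list([10, 20, 30, 40, 50, 60], n_in=3)
    print(rows)  # [[10, 20, 30, 40], [20, 30, 40, 50], [30, 40, 50, 60]]
    train, test = train_test_split(rows, n_test=1)
    print(test)  # [[30, 40, 50, 60]]
```

With `n_in=3`, a series of length 6 yields 6 - 3 = 3 rows. The first three columns are the inputs, and the last column is the target.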

#### 3. XGBoost for Time Series Forecasting

The complete example below transforms the series using three lag observations as inputs. It then evaluates XGBoost with walk-forward validation over the last 12 rows, reports the mean absolute error (MAE), and plots expected against predicted values.

```python
# forecast monthly births with xgboost
from numpy import asarray
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from sklearn.metrics import mean_absolute_error
from xgboost import XGBRegressor
from matplotlib import pyplot


# transform a time series dataset into a supervised learning dataset
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg.values


# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
    return data[:-n_test, :], data[-n_test:, :]


# fit an xgboost model and make a one-step prediction
def xgboost_forecast(train, testX):
    # transform list into array
    train = asarray(train)
    # split into input and output columns
    trainX, trainy = train[:, :-1], train[:, -1]
    # fit model
    model = XGBRegressor(objective='reg:squarederror', n_estimators=1000)
    model.fit(trainX, trainy)
    # make a one-step prediction
    yhat = model.predict(asarray([testX]))
    return yhat[0]


# walk-forward validation for univariate data
def walk_forward_validation(data, n_test):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time step in the test set
    for i in range(len(test)):
        # split test row into input and output columns
        testX, testy = test[i, :-1], test[i, -1]
        # fit model on history and make a prediction
        yhat = xgboost_forecast(history, testX)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
        # summarize progress
        print('>expected=%.1f, predicted=%.1f' % (testy, yhat))
    # estimate prediction error; the target is the last column
    error = mean_absolute_error(test[:, -1], predictions)
    return error, test[:, -1], predictions


# load the dataset (hypothetical file name: a CSV with a date index and one value column)
series = read_csv('births.csv', header=0, index_col=0)
values = series.values
# transform the time series data into supervised learning
data = series_to_supervised(values, n_in=3)
# evaluate
mae, y, yhat = walk_forward_validation(data, 12)
print('MAE: %.3f' % mae)
# plot expected vs predicted
pyplot.plot(y, label='Expected')
pyplot.plot(yhat, label='Predicted')
pyplot.legend()
pyplot.show()
```
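The score printed by the walk-forward script is the mean absolute error. It is computed with scikit-learn's `mean_absolute_error()` over the $n = 12$ one-step forecasts $\hat{y}_i$ and the observed values $y_i$:

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert
$$

As a worked example with illustrative numbers only (not results from the dataset), take $y = (42, 45, 40)$ and $\hat{y} = (44, 44, 41)$. The MAE is $(2 + 1 + 1)/3 \approx 1.333$. Because the MAE is expressed in the same units as the series, it can be read directly as "births per time step".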

Once the model has been evaluated, a final model can be fit on all available data and used to make a one-step forecast beyond the end of the series:

```python
# finalize model and make a prediction for monthly births with xgboost
from numpy import asarray
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from xgboost import XGBRegressor


# transform a time series dataset into a supervised learning dataset
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg.values


# load the dataset (hypothetical file name: a CSV with a date index and one value column)
series = read_csv('births.csv', header=0, index_col=0)
values = series.values
# transform the time series data into supervised learning
train = series_to_supervised(values, n_in=3)
# split into input and output columns
trainX, trainy = train[:, :-1], train[:, -1]
# fit model on all available data
model = XGBRegressor(objective='reg:squarederror', n_estimators=1000)
model.fit(trainX, trainy)
# construct an input for a new prediction from the last 3 observations
row = values[-3:].flatten()
# make a one-step prediction
yhat = model.predict(asarray([row]))
print('Input: %s, Predicted: %.3f' % (row, yhat[0]))
```
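The line `row = values[-3:].flatten()` relies on one detail: the new input must list the lags in the same column order the model was trained on (t-3, t-2, t-1), oldest first. Below is a small pure-Python sketch of that step. It assumes the observations are in a Python sequence, and the name `make_input_row` is hypothetical.

```python
def make_input_row(history, n_in):
    """Return the last n_in observations, oldest first.

    This matches the input columns (t-n_in, ..., t-1) produced by the
    sliding-window transform; reversing the order would put each lag in
    the wrong column.
    """
    if len(history) < n_in:
        raise ValueError(f"need at least {n_in} observations, got {len(history)}")
    return list(history[-n_in:])


if __name__ == "__main__":
    print(make_input_row([35, 52, 43, 59, 57, 37], n_in=3))  # [59, 57, 37]
```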


#### Summary

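XGBoost is an implementation of the gradient boosting ensemble algorithm for classification and regression. It can be applied to time series forecasting by transforming the series into a supervised learning dataset and evaluating the model with walk-forward validation.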

