## I. Time Series Forecasting

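Store Sales – Time Series Forecasting (Kaggle): use machine learning to predict grocery sales. https://www.kaggle.com/competitions/store-sales-time-series-forecasting/data

The basic object of forecasting is the time series: a set of observations recorded over time. In forecasting applications, the observations are typically recorded at a fixed frequency, such as daily or monthly.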

| Date | Hardcover |
|---|---|
| 2000-04-01 | 139 |
| 2000-04-02 | 128 |
| 2000-04-03 | 172 |
| 2000-04-04 | 139 |
| 2000-04-05 | 191 |
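
As a minimal sketch, a series like this one can be held in a pandas `Series` with a daily date index. The variable names `dates` and `hardcover` are chosen here for illustration; the five values are the Hardcover sales listed in this section.

```python
import pandas as pd

# Daily observations: one value per calendar day (the five values listed in this section).
dates = pd.date_range("2000-04-01", periods=5, freq="D")
hardcover = pd.Series([139, 128, 172, 139, 191], index=dates, name="Hardcover")

# The index carries the fixed recording frequency.
print(hardcover.index.freqstr)

# Look up a single observation by its date.
print(hardcover.loc[pd.Timestamp("2000-04-03")])
```

Keeping the dates in the index, rather than in an ordinary column, lets pandas select and shift observations by time.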

## II. Linear Regression with Time Series

`target = weight_1 * feature_1 + weight_2 * feature_2 + bias`

### 1. Time-Step Features

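```python
import numpy as np

# df holds the daily Hardcover sales, indexed by Date.
# The time dummy counts time steps from the start of the series: 0, 1, 2, ...
df['Time'] = np.arange(len(df.index))
```

| Date | Hardcover | Time |
|---|---|---|
| 2000-04-01 | 139 | 0 |
| 2000-04-02 | 128 | 1 |
| 2000-04-03 | 172 | 2 |
| 2000-04-04 | 139 | 3 |
| 2000-04-05 | 191 | 4 |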

`target = weight * time + bias`
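
To make the formula concrete, here is a minimal sketch that fits `weight` and `bias` by ordinary least squares on the five Hardcover values listed earlier. It uses `np.polyfit` as a stand-in for a regression library, which is a choice made for this illustration only.

```python
import numpy as np

# Hardcover sales from the five listed rows and their time-step feature 0..4.
hardcover = np.array([139, 128, 172, 139, 191], dtype=float)
time = np.arange(len(hardcover))

# Least-squares straight line: polyfit returns [weight, bias] for deg=1.
weight, bias = np.polyfit(time, hardcover, deg=1)

# Fitted values follow the formula target = weight * time + bias.
fitted = weight * time + bias
print(round(weight, 2), round(bias, 2))  # 11.5 130.8
```

On this tiny sample the fitted line rises by about 11.5 sales per time step. Five points are far too few to trust that number; the point is only to show what the two parameters mean.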

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Matplotlib 3.6+ renamed this style to "seaborn-v0_8-whitegrid".
plt.style.use("seaborn-whitegrid")
plt.rc(
    "figure",
    autolayout=True,
    figsize=(11, 4),
    titlesize=18,
    titleweight='bold',
)
plt.rc(
    "axes",
    labelweight="bold",
    labelsize="large",
    titleweight="bold",
    titlesize=16,
)
# Jupyter/IPython magic: render figures at retina resolution.
%config InlineBackend.figure_format = 'retina'

# Plot the raw series in grey, then overlay the fitted regression line.
fig, ax = plt.subplots()
ax.plot('Time', 'Hardcover', data=df, color='0.75')
ax = sns.regplot(x='Time', y='Hardcover', data=df, ci=None, scatter_kws=dict(color='0.25'))
ax.set_title('Time Plot of Hardcover Sales');
```

### 2. Lag Features

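```python
# Shift the series down one step so each row sees the previous day's value.
df['Lag_1'] = df['Hardcover'].shift(1)
# Keep only the target and its lag.
df = df.reindex(columns=['Hardcover', 'Lag_1'])
```

| Date | Hardcover | Lag_1 |
|---|---|---|
| 2000-04-01 | 139 | NaN |
| 2000-04-02 | 128 | 139.0 |
| 2000-04-03 | 172 | 128.0 |
| 2000-04-04 | 139 | 172.0 |
| 2000-04-05 | 191 | 139.0 |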

`target = weight * lag + bias`
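
The sketch below fits this formula on the five Hardcover values and then uses it for a one-step-ahead forecast. The names `sales`, `pairs`, and `next_day` are illustrative, and `np.polyfit` again stands in for a regression library.

```python
import numpy as np
import pandas as pd

sales = pd.Series([139, 128, 172, 139, 191], name="Hardcover", dtype=float)
lag_1 = sales.shift(1)  # previous step's value; the first entry is NaN

# Drop the first row, which has no previous observation to learn from.
pairs = pd.DataFrame({"Hardcover": sales, "Lag_1": lag_1}).dropna()

# Least-squares fit of target = weight * lag + bias.
weight, bias = np.polyfit(pairs["Lag_1"], pairs["Hardcover"], deg=1)

# One-step-ahead forecast: plug the latest observation in as the lag.
next_day = weight * sales.iloc[-1] + bias
```

With only four usable pairs the fitted weight carries no real meaning. The sketch shows the mechanics: a lag model predicts the next value from the most recent one.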

```python
# Scatter each observation against the previous one, with the fitted line.
fig, ax = plt.subplots()
ax = sns.regplot(x='Lag_1', y='Hardcover', data=df, ci=None, scatter_kws=dict(color='0.25'))
ax.set_aspect('equal')
ax.set_title('Lag Plot of Hardcover Sales');
```

## III. Example: Tunnel Traffic

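Tunnel Traffic is a time series recording the number of vehicles that passed through the Baregg Tunnel in Switzerland each day from November 2003 to November 2005. In this example we practice applying linear regression to time-step features and lag features.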

```python
from pathlib import Path
from warnings import simplefilter

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

simplefilter("ignore")  # ignore warnings to clean up output cells

# Set Matplotlib defaults
# (Matplotlib 3.6+ renamed this style to "seaborn-v0_8-whitegrid".)
plt.style.use("seaborn-whitegrid")
plt.rc("figure", autolayout=True, figsize=(11, 4))
plt.rc(
    "axes",
    labelweight="bold",
    labelsize="large",
    titleweight="bold",
    titlesize=14,
)
plot_params = dict(
    color="0.75",
    style=".-",  # assumed value: point markers joined by a line, so the marker colors below apply
    markeredgecolor="0.25",
    markerfacecolor="0.25",
    legend=False,
)
# Jupyter/IPython magic: render figures at retina resolution.
%config InlineBackend.figure_format = 'retina'

data_dir = Path("../input/ts-course-data")
tunnel = pd.read_csv(data_dir / "tunnel.csv", parse_dates=["Day"])

# Create a time series in Pandas by setting the index to a date
# column. We parsed "Day" as a date type by using `parse_dates` when
# loading the data.
tunnel = tunnel.set_index("Day")

# By default, Pandas creates a `DatetimeIndex` of `Timestamp` values
# (stored as `np.datetime64`), representing a time series as a
# sequence of measurements taken at single moments. A `PeriodIndex`,
# on the other hand, represents a time series as a sequence of
# quantities accumulated over periods of time. Periods are often
# easier to work with, so that's what we'll use in this course.
tunnel = tunnel.to_period()
```

| Day | NumVehicles |
|---|---|
| 2003-11-01 | 103536 |
| 2003-11-02 | 92051 |
| 2003-11-03 | 100795 |
| 2003-11-04 | 102352 |
| 2003-11-05 | 106569 |
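
The difference between a `DatetimeIndex` and a `PeriodIndex` can be seen on a small stand-in for the data. The frame `raw` below is built by hand from the five rows shown for this dataset, and the frequency `"D"` is passed explicitly as an assumption that the data is daily.

```python
import pandas as pd

# A tiny stand-in for tunnel.csv, using the five rows shown for this dataset.
raw = pd.DataFrame({
    "Day": pd.to_datetime(
        ["2003-11-01", "2003-11-02", "2003-11-03", "2003-11-04", "2003-11-05"]
    ),
    "NumVehicles": [103536, 92051, 100795, 102352, 106569],
})

by_timestamp = raw.set_index("Day")      # DatetimeIndex: single moments in time
by_period = by_timestamp.to_period("D")  # PeriodIndex: whole days

print(type(by_timestamp.index).__name__)  # DatetimeIndex
print(type(by_period.index).__name__)     # PeriodIndex

# A period spans an interval, so it has both a start and an end.
first = by_period.index[0]
print(first.start_time, first.end_time)
```

A daily vehicle count is a quantity accumulated over the whole day, which is why a period describes it more naturally than a single timestamp.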

### 1. Time-Step Features

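```python
df = tunnel.copy()

# Time dummy: 0, 1, 2, ... for each day in the series.
df['Time'] = np.arange(len(tunnel.index))
```

| Day | NumVehicles | Time |
|---|---|---|
| 2003-11-01 | 103536 | 0 |
| 2003-11-02 | 92051 | 1 |
| 2003-11-03 | 100795 | 2 |
| 2003-11-04 | 102352 | 3 |
| 2003-11-05 | 106569 | 4 |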

```python
from sklearn.linear_model import LinearRegression

# Training data
X = df.loc[:, ['Time']]  # features
y = df.loc[:, 'NumVehicles']  # target

# Train the model
model = LinearRegression()
model.fit(X, y)

# Store the fitted values as a time series with the same time index as
# the training data
y_pred = pd.Series(model.predict(X), index=X.index)
```

```python
# Plot the observed series, then overlay the fitted trend line.
ax = y.plot(**plot_params)
ax = y_pred.plot(ax=ax, linewidth=3)
ax.set_title('Time Plot of Tunnel Traffic');
```
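
Once a time-step model is fitted, its parameters can be read off and the time dummy can be continued past the end of the data to forecast. The sketch below does this on the five tunnel rows shown earlier rather than the full dataset, so the numbers are illustrative only; `X_future` and `y_future` are names introduced here.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Five rows of tunnel traffic (a toy sample, not the full dataset).
y = pd.Series([103536, 92051, 100795, 102352, 106569], name="NumVehicles")
X = pd.DataFrame({"Time": np.arange(len(y))})

model = LinearRegression().fit(X, y)

# The learned parameters of target = weight * time + bias.
weight = model.coef_[0]
bias = model.intercept_

# Forecast the next three time steps by continuing the time dummy.
X_future = pd.DataFrame({"Time": np.arange(len(y), len(y) + 3)})
y_future = pd.Series(model.predict(X_future), index=X_future["Time"])
```

Because the only feature is the time step, every forecast lies on the same straight line as the fitted values.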

### 2. Lag Features

Pandas provides a simple way to lag a series: the `shift` method.

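```python
# Each row's Lag_1 is the previous day's vehicle count.
df['Lag_1'] = df['NumVehicles'].shift(1)
```

| Day | NumVehicles | Time | Lag_1 |
|---|---|---|---|
| 2003-11-01 | 103536 | 0 | NaN |
| 2003-11-02 | 92051 | 1 | 103536.0 |
| 2003-11-03 | 100795 | 2 | 92051.0 |
| 2003-11-04 | 102352 | 3 | 100795.0 |
| 2003-11-05 | 106569 | 4 | 102352.0 |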
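
The same method produces longer lags by changing its argument. A minimal sketch on the five rows shown for this dataset, with the column name `Lag_2` introduced here for illustration:

```python
import pandas as pd

s = pd.Series([103536, 92051, 100795, 102352, 106569], name="NumVehicles")

lags = pd.DataFrame({
    "NumVehicles": s,
    "Lag_1": s.shift(1),  # value from one step earlier
    "Lag_2": s.shift(2),  # value from two steps earlier
})

# A lag of k steps leaves k missing values at the start of the column.
print(lags.isna().sum())
```

Those leading missing values are why rows must be dropped before a model is fitted on lag features.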

```python
from sklearn.linear_model import LinearRegression

# Build the feature set and drop the row with no previous observation.
# Assigning the result avoids modifying a slice of df in place.
X = df.loc[:, ['Lag_1']].dropna()
y = df.loc[:, 'NumVehicles']  # create the target
y, X = y.align(X, join='inner')  # drop corresponding values in target

model = LinearRegression()
model.fit(X, y)

# Fitted values, indexed like the training data
y_pred = pd.Series(model.predict(X), index=X.index)
```
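
The `align` call is what keeps the target and the features on the same rows after the missing lag is dropped. A small sketch with made-up numbers (the values 10 to 40 are purely illustrative) shows the effect:

```python
import pandas as pd

# Made-up target values with a simple integer index.
y = pd.Series([10, 20, 30, 40], index=[0, 1, 2, 3], name="y")

# The lag feature loses its first row once the NaN is dropped.
X = pd.DataFrame({"Lag_1": y.shift(1)}).dropna()  # index 1, 2, 3

# join="inner" keeps only the index labels present in both objects.
y_aligned, X_aligned = y.align(X, join="inner")

print(y_aligned.index.tolist())  # [1, 2, 3]
```

Without this step the target would have one more row than the feature set, and fitting would fail on the length mismatch.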

```python
# Lag plot: each day's count against the previous day's, with the fitted line.
fig, ax = plt.subplots()
ax.plot(X['Lag_1'], y, '.', color='0.25')
ax.plot(X['Lag_1'], y_pred)
ax.set_aspect('equal')
ax.set_ylabel('NumVehicles')
ax.set_xlabel('Lag_1')
ax.set_title('Lag Plot of Tunnel Traffic');
```

```python
# Plot the lag model's fitted values over time against the observed series.
ax = y.plot(**plot_params)
ax = y_pred.plot(ax=ax)
```