## 1.1 Import the required libraries

```
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
from sklearn.neighbors import KNeighborsClassifier
```

## 1.2 Load the data

```
data = pd.read_excel('movies.xlsx', sheet_name=1)
data
```

## 1.3 Split out x and y

```
# Features: '武打镜头' (fight scenes) and '接吻镜头' (kissing scenes)
x = data[['武打镜头', '接吻镜头']]
# Target: '分类情况' (genre label)
y = data['分类情况']
```

## 1.4 Instantiate the algorithm

`knn = KNeighborsClassifier(n_neighbors=5)`

## 1.5 Train the model

`knn.fit(x, y)`

```
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')
```

## 1.6 Build the test data (samples to predict)

1. 碟中谍6 (Mission: Impossible – Fallout) -> action film (动作片)
2. 李茶的姑妈 (Hello Mrs. Money) -> romance film (爱情片)

`x_test = np.array([[100, 2], [2, 15]])`

## 1.7 Predict with the trained model

`knn.predict(x_test)`

`array(['动作片', '爱情片'], dtype=object)`
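The movie spreadsheet itself is not included here, but the whole workflow can be reproduced end to end with a small hypothetical training set (the counts below are made up for illustration; only the two test points match the ones above):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: [fight scenes, kissing scenes] per movie
x = np.array([[90, 5], [80, 2], [101, 10], [99, 5], [67, 2],
              [1, 10], [5, 2], [2, 15], [6, 20], [3, 30]])
# First five movies are action films, last five are romance films
y = np.array(['动作片'] * 5 + ['爱情片'] * 5)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x, y)

# Same test points as above: many fight scenes vs. many kissing scenes
x_test = np.array([[100, 2], [2, 15]])
preds = knn.predict(x_test)
print(preds)  # ['动作片' '爱情片']
```

With clusters this well separated, all 5 nearest neighbors of each test point carry the same label, so the majority vote is unanimous.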

## 2 Predicting whether a patient has cancer

The first column is the sample ID; the second is the cancer diagnosis, encoded "M" for malignant and "B" for benign.

## 2.1 Getting the data

x (features): per-sample (patient) measurements of the cell nuclei, such as texture and smoothness
y (target): the diagnosis for each sample, one of two classes: M (malignant) or B (benign)

### 2.1.1 Load the data

```
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
from sklearn.neighbors import KNeighborsClassifier

# Load the data
# Binary classification problem: benign vs. malignant
data
```
(DataFrame output, truncated: 569 rows — sample ID, Diagnosis ('M'/'B'), and 30 numeric cell-nucleus features.)
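The original data file is not bundled with this write-up, but the same Wisconsin breast cancer dataset ships with scikit-learn, so the loading step can be sketched as follows (note that scikit-learn encodes the target as 0/1 rather than 'M'/'B', so we map it back; the column name `Diagnosis` matches the one used below):

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Load the Wisconsin breast cancer dataset bundled with scikit-learn
cancer = load_breast_cancer()
data = pd.DataFrame(cancer.data, columns=cancer.feature_names)
# In sklearn's encoding, target 0 = malignant, 1 = benign
data['Diagnosis'] = pd.Series(cancer.target).map({0: 'M', 1: 'B'})
print(data.shape)  # (569, 31)
```

This frame has the 30 numeric features plus the `Diagnosis` column (no separate ID column, unlike the original file).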

### 2.1.2 Split out the features x and the target y

```
# The diagnosis column is the target value
y = data['Diagnosis']
# Training features: the cell-nucleus measurements (smoothness, texture, etc.)
x = data.iloc[:, 2:]  # every column from index 2 onward
```

### 2.1.3 Split into training and test sets

```
from sklearn.model_selection import train_test_split

# Hold out 20% of the samples as the test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
```
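`train_test_split` shuffles the data randomly, so every run produces a different split (and a slightly different accuracy later on). Passing `random_state` makes the split reproducible, and `stratify` keeps the class ratio the same in both halves; a sketch with tiny made-up data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

x = np.arange(20).reshape(10, 2)      # 10 toy samples, 2 features each
y = np.array(['B'] * 6 + ['M'] * 4)   # imbalanced labels

# random_state fixes the shuffle; stratify preserves the B/M ratio per split
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0, stratify=y)
print(len(x_train), len(x_test))  # 8 2
```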

## 2.3 Instantiate the algorithm and fit it

```
knn = KNeighborsClassifier()
# Fit the algorithm on the 455 training samples (80% of 569)
knn.fit(x_train, y_train)
```

```
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')
```

## 2.4 Predict on the test set

```
# Predict with the trained model
y_pre = knn.predict(x_test)
y_pre
```

```
array(['B', 'B', 'B', 'B', 'B', 'M', 'M', 'B', 'B', 'M', 'B', 'B', 'M',
       'B', 'B', 'B', 'M', 'B', 'B', 'M', 'M', 'B', 'M', 'M', 'B', 'B',
       'B', 'B', 'B', 'B', 'B', 'B', 'B', 'M', 'B', 'B', 'M', 'M', 'M',
       'B', 'M', 'M', 'B', 'B', 'B', 'M', 'B', 'B', 'M', 'M', 'M', 'B',
       'B', 'B', 'B', 'B', 'M', 'B', 'M', 'B', 'B', 'B', 'M', 'B', 'M',
       'B', 'M', 'B', 'B', 'B', 'B', 'B', 'M', 'B', 'B', 'B', 'B', 'B',
       'M', 'B', 'M', 'B', 'M', 'B', 'M', 'B', 'B', 'B', 'B', 'B', 'M',
       'M', 'M', 'B', 'M', 'M', 'B', 'B', 'B', 'B', 'B', 'M', 'B', 'M',
       'B', 'B', 'B', 'M', 'B', 'B', 'M', 'B', 'B', 'M'], dtype=object)
```

## 2.5 Predicted probabilities

`knn.predict_proba(x_test)  # predicted class probabilities`

```
array([[1. , 0. ],
       [0.8, 0.2],
       [1. , 0. ],
       [1. , 0. ],
       [1. , 0. ],
       ......
       [1. , 0. ],
       [0. , 1. ],
       [1. , 0. ],
       [0.6, 0.4],
       [0. , 1. ]])
```
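With the default `n_neighbors=5`, each probability is just the fraction of the 5 nearest neighbors that carry that label, so every value is a multiple of 0.2 (e.g. 0.8 above means 4 of the 5 neighbors were class 'B'). A sketch with hypothetical 1-D data makes this visible:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 1-D data: 'B' samples at 1..5, 'M' samples at 6..10
x = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array(['B'] * 5 + ['M'] * 5)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x, y)

# The 5 nearest neighbors of 5.4 are 3, 4, 5 ('B') and 6, 7 ('M')
proba = knn.predict_proba([[5.4]])
print(knn.classes_, proba)  # ['B' 'M'] [[0.6 0.4]]
```

The columns of `predict_proba` follow `knn.classes_`, which is sorted alphabetically, so the first column is 'B' and the second is 'M'.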

## 2.6 Comparing predictions with the ground truth

```
# Compare against the ground truth to see how well the algorithm does
display(y_pre)
# The true labels held out earlier
display(y_test.values)
```

```
array(['B', 'B', 'B', 'B', 'B', 'M', 'M', 'B', 'B', 'M', 'B', 'B', 'M',
       'B', 'B', 'B', 'M', 'B', 'B', 'M', 'M', 'B', 'M', 'M', 'B', 'B',
       'B', 'B', 'B', 'B', 'B', 'B', 'B', 'M', 'B', 'B', 'M', 'M', 'M',
       'B', 'M', 'M', 'B', 'B', 'B', 'M', 'B', 'B', 'M', 'M', 'M', 'B',
       'B', 'B', 'B', 'B', 'M', 'B', 'M', 'B', 'B', 'B', 'M', 'B', 'M',
       'B', 'M', 'B', 'B', 'B', 'B', 'B', 'M', 'B', 'B', 'B', 'B', 'B',
       'M', 'B', 'M', 'B', 'M', 'B', 'M', 'B', 'B', 'B', 'B', 'B', 'M',
       'M', 'M', 'B', 'M', 'M', 'B', 'B', 'B', 'B', 'B', 'M', 'B', 'M',
       'B', 'B', 'B', 'M', 'B', 'B', 'M', 'B', 'B', 'M'], dtype=object)

array(['B', 'B', 'B', 'B', 'B', 'M', 'M', 'M', 'B', 'M', 'B', 'B', 'B',
       'B', 'B', 'B', 'M', 'B', 'B', 'M', 'M', 'B', 'M', 'M', 'B', 'B',
       'B', 'B', 'B', 'B', 'B', 'B', 'B', 'M', 'B', 'B', 'M', 'M', 'M',
       'B', 'M', 'M', 'M', 'B', 'B', 'B', 'B', 'M', 'M', 'M', 'M', 'B',
       'B', 'B', 'B', 'B', 'M', 'B', 'M', 'B', 'B', 'B', 'M', 'B', 'M',
       'B', 'M', 'B', 'B', 'B', 'B', 'B', 'M', 'B', 'B', 'B', 'B', 'B',
       'M', 'B', 'M', 'B', 'M', 'B', 'M', 'B', 'B', 'B', 'B', 'B', 'M',
       'B', 'M', 'B', 'M', 'M', 'B', 'B', 'B', 'B', 'B', 'M', 'B', 'M',
       'M', 'B', 'B', 'M', 'B', 'B', 'M', 'B', 'M', 'M'], dtype=object)
```

## 2.7 Computing the prediction accuracy

### Method 1: mean of the element-wise comparison

`(y_pre == y_test.values).mean()`

`0.9298245614035088`

### Method 2: knn.score()

`knn.score(x_test, y_test)  # x_test: held-out features, y_test: true labels`

`0.9298245614035088`
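Both methods compute the same quantity, which is also what `sklearn.metrics.accuracy_score` returns: the fraction of test samples whose predicted label matches the true one. A sketch with small hypothetical label arrays:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical predicted vs. true labels (4 of 5 correct)
y_pre = np.array(['B', 'B', 'M', 'B', 'M'])
y_true = np.array(['B', 'M', 'M', 'B', 'M'])

# Method 1: the boolean comparison array averages to the accuracy
acc_mean = (y_pre == y_true).mean()
# accuracy_score is what a classifier's .score() computes internally
acc_sk = accuracy_score(y_true, y_pre)
print(acc_mean, acc_sk)  # 0.8 0.8
```

Since the default accuracy is 0.93 here, a natural next step would be scaling the features (KNN is distance-based, so features with large ranges dominate) and tuning `n_neighbors`.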