
### Using PCA

```python
from sklearn.decomposition import PCA
model = PCA(n_components=None, copy=True, whiten=False, svd_solver='auto')
```

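- `n_components`: the number of principal components (features) to keep. `None` keeps all of them, i.e. `min(n_samples, n_features)`.
- `copy`: whether dimensionality reduction is applied to a copy of the input data (`True`, the default) or to the original data itself (`False`, in which case the array passed to `fit` may be overwritten).
- `whiten`: if `True`, the transformed output is rescaled so that its components are uncorrelated and each has unit variance.
- `svd_solver`: the algorithm used to compute the singular value decomposition (SVD). The classic options are `'auto'` (picks a solver based on the shape of the data and `n_components`), `'full'` (exact SVD via LAPACK), `'arpack'` (truncated SVD via SciPy's ARPACK) and `'randomized'` (approximate randomized SVD); newer scikit-learn versions may accept additional values.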
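To make these parameters concrete, here is a minimal sketch on randomly generated data (the data, shapes and variable names are illustrative assumptions, not part of the original example). It shows the effect of `n_components`, `whiten` and `copy` on the result:

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative random data: 100 samples, 5 correlated features (assumed for this sketch)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))

# n_components=None keeps all min(n_samples, n_features) components
full = PCA(n_components=None).fit(X)
print(full.n_components_)                # 5

# An integer keeps only that many components
reduced = PCA(n_components=2).fit_transform(X)
print(reduced.shape)                     # (100, 2)

# whiten=True rescales each output component to unit variance
white = PCA(n_components=2, whiten=True).fit_transform(X)
print(white.var(axis=0, ddof=1))         # approximately [1. 1.]

# copy=True (the default) leaves the input array untouched
X_before = X.copy()
PCA(n_components=2, copy=True).fit_transform(X)
print(np.array_equal(X, X_before))       # True
```

Whitening is useful when a downstream model assumes features on a comparable scale, but it also amplifies the low-variance components, so it is off by default.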

```python
import numpy as np
from sklearn.decomposition import PCA
data = np.array([
    [1, 2],
    [3, 4],
    [5, 6],
    [7, 8]
])  # Create a 4x2 array: 4 samples, 2 features
new_data = PCA(n_components=1).fit_transform(data)  # Reduce to 1 dimension and return the transformed data
print(data)      # Print the raw data
print(new_data)  # Print the data after dimensionality reduction
```

```
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
[[ 4.24264069]
 [ 1.41421356]
 [-1.41421356]
 [-4.24264069]]
```
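To see where these numbers come from, the sketch below reproduces them by hand with NumPy: center the data, take the first principal direction from the SVD, and project onto it. The sign of a principal component is arbitrary, so the manual result may match the PCA output only up to a factor of -1 (and the sign printed above may differ between scikit-learn versions); the comparison allows for that.

```python
import numpy as np
from sklearn.decomposition import PCA

data = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=float)

# Step 1: center each feature (column) on its mean, here [4, 5]
centered = data - data.mean(axis=0)

# Step 2: the rows of Vt are the principal directions, ordered by singular value
_, s, Vt = np.linalg.svd(centered, full_matrices=False)
first_direction = Vt[0]   # +/-[0.7071, 0.7071], i.e. along the line y = x

# Step 3: project the centered data onto the first direction
manual = centered @ first_direction

sk = PCA(n_components=1).fit_transform(data).ravel()
print(np.abs(manual))     # [4.24264069 1.41421356 1.41421356 4.24264069]
print(np.allclose(manual, sk) or np.allclose(manual, -sk))  # True
```

Because the four points lie on a straight line, the second singular value is (numerically) zero, so keeping a single component loses none of the variance.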

### Clustering Handwritten Digits

```python
from sklearn import datasets  # Import the bundled datasets module
import matplotlib.pyplot as plt
# Jupyter magic: display plots inline (omit this line in a plain .py script)
%matplotlib inline
# Load the handwritten digits dataset
digits_data = datasets.load_digits()
# Draw the first five handwritten digits in the dataset as grayscale images
for index, image in enumerate(digits_data.images[:5]):
    plt.subplot(2, 5, index + 1)
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()
```
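`load_digits` provides each sample both as an 8x8 image (`images`, used for plotting above) and as a flattened 64-element feature vector (`data`, which PCA and K-Means work on below). A quick check of the shapes and labels:

```python
import numpy as np
from sklearn import datasets

digits_data = datasets.load_digits()

print(digits_data.images.shape)       # (1797, 8, 8): one 8x8 grayscale image per sample
print(digits_data.data.shape)         # (1797, 64): the same images flattened row by row
print(np.unique(digits_data.target))  # [0 1 2 3 4 5 6 7 8 9]

# Each row of `data` is exactly the corresponding image, flattened
assert np.array_equal(digits_data.images[0].ravel(), digits_data.data[0])
```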

```python
import numpy as np
from sklearn import datasets, decomposition
from sklearn.cluster import KMeans
# Load the dataset: X holds the 64 pixel features, y the true digit labels
digits_data = datasets.load_digits()
X = digits_data.data
y = digits_data.target
# Use PCA to reduce the data to 2 dimensions
model = decomposition.PCA(n_components=2)
reduce_data = model.fit_transform(X)
```
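Going from 64 features down to 2 necessarily discards information. A minimal sketch for checking how much of the total variance the two components retain, using PCA's `explained_variance_ratio_` attribute (no figures are quoted because they depend on the data). It also shows that a float `n_components` between 0 and 1 asks PCA for the smallest number of components reaching that fraction; the 0.9 threshold is an arbitrary choice for illustration.

```python
from sklearn import datasets, decomposition

X = datasets.load_digits().data

model = decomposition.PCA(n_components=2)
reduce_data = model.fit_transform(X)

# Fraction of the total variance captured by each of the 2 components, and their sum
ratios = model.explained_variance_ratio_
print(ratios, ratios.sum())

# A float in (0, 1) selects the smallest number of components whose
# cumulative explained variance reaches that fraction
model_90 = decomposition.PCA(n_components=0.9, svd_solver='full').fit(X)
print(model_90.n_components_)
```

If two components keep only a small share of the variance, clusters that overlap in the 2-D plot may still be separable in the original 64-dimensional space.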

```python
# Create the K-Means model and fit it to the reduced data
# (n_init is set explicitly because its default changed across scikit-learn versions)
model = KMeans(n_clusters=10, n_init=10)
model.fit(reduce_data)
# Predict a cluster for every point of a dense grid covering the data to find the decision boundaries
x_min, x_max = reduce_data[:, 0].min() - 1, reduce_data[:, 0].max() + 1
y_min, y_max = reduce_data[:, 1].min() - 1, reduce_data[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, .05), np.arange(y_min, y_max, .05))
result = model.predict(np.c_[xx.ravel(), yy.ravel()])
# Draw the decision boundaries, then the data points colored by their true digit label
result = result.reshape(xx.shape)
plt.figure(figsize=(10, 5))
plt.contourf(xx, yy, result, cmap=plt.cm.Greys)
plt.scatter(reduce_data[:, 0], reduce_data[:, 1], c=y, s=15)
# Draw the cluster centers
center = model.cluster_centers_
plt.scatter(center[:, 0], center[:, 1], marker='p', linewidths=2,
            color='b', edgecolors='w', zorder=20)
# Plot settings
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.show()
```
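The grid trick works because `KMeans.predict` assigns each point to the cluster whose center is nearest in Euclidean distance, so coloring a dense grid by `predict` draws the borders between these nearest-center regions. The sketch below checks that equivalence on the same 2-D PCA data; `n_init=10` and `random_state=0` are values chosen here for reproducibility, not taken from the original.

```python
import numpy as np
from sklearn import datasets, decomposition
from sklearn.cluster import KMeans

X = datasets.load_digits().data
reduce_data = decomposition.PCA(n_components=2).fit_transform(X)

model = KMeans(n_clusters=10, n_init=10, random_state=0).fit(reduce_data)
centers = model.cluster_centers_   # shape (10, 2)

# Euclidean distance from every point to every center: shape (n_points, 10)
dists = np.linalg.norm(reduce_data[:, None, :] - centers[None, :, :], axis=2)
nearest = dists.argmin(axis=1)     # index of the closest center for each point

# Compare the nearest-center rule with the labels returned by predict()
print(np.array_equal(nearest, model.predict(reduce_data)))
```

This is also why the regions in the plot are convex polygons (a Voronoi partition of the plane around the cluster centers).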