## Introduction

[Figure: an intuitive illustration of PCA]

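PCA follows a similar idea: it maps a high-dimensional dataset to a lower-dimensional space while preserving as much of the variance (information) in the original variables as possible.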
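To make "preserving as much variance as possible" concrete, the sketch below uses only base R. It keeps the first two principal components of the standardized iris measurements, maps the data back to four dimensions, and measures how much variance is lost. The names `X_hat` and `rel_error` are illustrative. Under these assumptions, the lost fraction equals one minus the cumulative share of variance of the first two components.

```r
# Minimal sketch: compress the 4 standardized iris measurements to 2 PCs
# and check how much of the total variance survives.
X <- scale(subset(iris, select = -Species))   # center and scale each column
pca <- prcomp(X)                               # X is already standardized
explained <- pca$sdev^2 / sum(pca$sdev^2)      # share of total variance per PC
print(round(cumsum(explained), 3))             # cumulative share kept by the first k PCs

k <- 2
# Project to k dimensions (scores), then map back using the first k loading vectors
X_hat <- pca$x[, 1:k] %*% t(pca$rotation[, 1:k])
rel_error <- sum((X - X_hat)^2) / sum(X^2)     # fraction of total variance lost
print(round(rel_error, 3))
```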

## Making the plot

### 1. A PCA plot is essentially a scatter plot

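```r
library(ggplot2)
# Prepare data: the four numeric measurements and the species labels
data = subset(iris, select = -Species)
class = iris[["Species"]]
# PCA on centered and scaled variables
pca = prcomp(data, center = TRUE, scale. = TRUE)
pca.data = data.frame(pca$x)
# Proportion of total variance explained by each principal component
pca.variance = pca$sdev^2 / sum(pca$sdev^2)
ggplot(pca.data, aes(x = PC1, y = PC2)) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  # 95% ellipse around all points (stat_ellipse default type = "t");
  # `linewidth` replaces `size` for lines since ggplot2 3.4.0
  stat_ellipse(aes(x = PC1, y = PC2), linetype = 2, linewidth = 0.5, level = 0.95) +
  theme_bw()
```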
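The plot is just a scatter plot because the coordinates handed to `geom_point()` are the scores `pca$x`. These scores are the centered and scaled data projected onto the loading vectors in `pca$rotation`. The minimal check below assumes the same `prcomp()` settings as above; `manual_scores` is an illustrative name.

```r
# Sketch: the points in the PCA scatter plot are the scores pca$x, i.e. the
# centered and scaled data multiplied by the loading matrix pca$rotation.
data <- subset(iris, select = -Species)
pca <- prcomp(data, center = TRUE, scale. = TRUE)
manual_scores <- scale(data) %*% pca$rotation
print(all.equal(unname(manual_scores), unname(pca$x)))
print(head(pca$x[, c("PC1", "PC2")]))  # the x/y coordinates used by geom_point()
```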

### 2. Coloring by class

```r
library(ggplot2)
# Prepare data
data = subset(iris, select = -Species)
class = iris[["Species"]]
# PCA; keep the class labels next to the scores
pca = prcomp(data, center = TRUE, scale. = TRUE)
pca.data = data.frame(pca$x, class = class)
pca.variance = pca$sdev^2 / sum(pca$sdev^2)
# color = class is set for the whole plot, so every layer inherits it
ggplot(pca.data, aes(x = PC1, y = PC2, color = class)) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  stat_ellipse(aes(x = PC1, y = PC2), linetype = 2, linewidth = 0.5, level = 0.95) +
  theme_bw()
```
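`color = class` is set inside `ggplot()`, so every layer inherits it, including `stat_ellipse()`. The discrete color mapping splits the data into groups, and one ellipse is drawn per species. The sketch below stores the labels as a column of the plotting data, which avoids relying on a free-standing vector. It then uses `layer_data()` to inspect the groups in the ellipse layer. It assumes ggplot2 >= 3.4.0 for the `linewidth` argument; `ellipse_groups` is an illustrative name.

```r
library(ggplot2)

pca <- prcomp(subset(iris, select = -Species), center = TRUE, scale. = TRUE)
# Keep the class labels in the same data frame as the scores
pca.data <- data.frame(pca$x, Species = iris$Species)

p <- ggplot(pca.data, aes(x = PC1, y = PC2, color = Species)) +
  geom_point(size = 3) +
  stat_ellipse(linetype = 2, linewidth = 0.5, level = 0.95)

# The ellipse layer (layer 2) is split into one group per species
ellipse_groups <- unique(layer_data(p, 2)$group)
print(ellipse_groups)
```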

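`inherit.aes` defaults to `TRUE`. If `FALSE`, the layer overrides the default aesthetics instead of combining with them. This is most useful for helper functions that define both data and aesthetics and should not inherit behavior from the default plot specification. In the plot above, `stat_ellipse()` inherits `color = class` and therefore draws one ellipse per species. Setting `inherit.aes = FALSE` makes it ignore that mapping and draw a single ellipse around all points: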

```r
library(ggplot2)
# Prepare data
data = subset(iris, select = -Species)
class = iris[["Species"]]
# PCA; keep the class labels next to the scores
pca = prcomp(data, center = TRUE, scale. = TRUE)
pca.data = data.frame(pca$x, class = class)
pca.variance = pca$sdev^2 / sum(pca$sdev^2)
ggplot(pca.data, aes(x = PC1, y = PC2, color = class)) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  # inherit.aes = FALSE: ignore color = class, so one ellipse covers all points
  stat_ellipse(aes(x = PC1, y = PC2), linetype = 2, linewidth = 0.5, level = 0.95, inherit.aes = FALSE) +
  theme_bw()
```
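The sketch below compares two ways to get a single ellipse around all points. The first turns off inheritance in the ellipse layer. The second maps `color` only in `geom_point()`, so the ellipse layer never sees that mapping. It assumes ggplot2 >= 3.4.0; `p_all`, `p_alt`, `n_all` and `n_alt` are illustrative names. `layer_data()` is used only to confirm that each ellipse layer holds a single group.

```r
library(ggplot2)

pca <- prcomp(subset(iris, select = -Species), center = TRUE, scale. = TRUE)
pca.data <- data.frame(pca$x, Species = iris$Species)

# Option 1: the ellipse layer ignores the plot-level color mapping
p_all <- ggplot(pca.data, aes(x = PC1, y = PC2, color = Species)) +
  geom_point(size = 3) +
  stat_ellipse(aes(x = PC1, y = PC2), inherit.aes = FALSE,
               linetype = 2, linewidth = 0.5, level = 0.95)

# Option 2: map color only in the point layer
p_alt <- ggplot(pca.data, aes(x = PC1, y = PC2)) +
  geom_point(aes(color = Species), size = 3) +
  stat_ellipse(linetype = 2, linewidth = 0.5, level = 0.95)

# Number of groups (i.e. ellipses) computed in each ellipse layer
n_all <- length(unique(layer_data(p_all, 2)$group))
n_alt <- length(unique(layer_data(p_alt, 2)$group))
print(c(n_all, n_alt))
```

The second form is often simpler when only one layer should be colored. `inherit.aes = FALSE` is handy when the plot-level mapping must stay in place for the other layers.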

### 3. Fine-tuning the style

```r
library(ggplot2)
# Prepare data
data = subset(iris, select = -Species)
class = iris[["Species"]]
# PCA; keep the class labels next to the scores
pca = prcomp(data, center = TRUE, scale. = TRUE)
pca.data = data.frame(pca$x, class = class)
pca.variance = pca$sdev^2 / sum(pca$sdev^2)
# Custom colors, one per species
palette = c("mediumseagreen", "darkorange", "royalblue")
ggplot(pca.data, aes(x = PC1, y = PC2, color = class)) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  stat_ellipse(aes(x = PC1, y = PC2), linetype = 2, linewidth = 0.5, level = 0.95, inherit.aes = FALSE) +
  theme_bw() +
  scale_color_manual(values = palette) +
  # Remove all major and minor grid lines and center the title
  theme(panel.grid = element_blank(),
        plot.title = element_text(hjust = 0.5)) +
  # Axis labels show the share of variance explained by each component
  labs(x = paste0("PC1: ", signif(pca.variance[1] * 100, 3), "%"),
       y = paste0("PC2: ", signif(pca.variance[2] * 100, 3), "%"),
       title = "PCA of iris")
```
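The axis labels rely on `pca.variance`. As a cross-check, `summary()` on a `prcomp` object reports the same proportions in its `importance` matrix, rounded there to 5 decimal places. A minimal sketch; `axis_labels` is an illustrative name:

```r
# Sketch: cross-check the variance shares used in the axis labels
pca <- prcomp(subset(iris, select = -Species), center = TRUE, scale. = TRUE)
pca.variance <- pca$sdev^2 / sum(pca$sdev^2)
importance <- summary(pca)$importance   # rows: sd, proportion, cumulative proportion
print(importance)

# Build "PCk: xx%" labels for every component, as in labs() above
axis_labels <- paste0(colnames(pca$x), ": ", signif(pca.variance * 100, 3), "%")
print(axis_labels[1:2])
```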
