### R语言应用统计1 主成分分析

```install.packages("HSAUR2")
data("USairpollution", package = "HSAUR2")```

`cor(USairpollution[,-1])`

```temp        manu       popul        wind      precip     predays
temp     1.00000000 -0.19004216 -0.06267813 -0.34973963  0.38625342 -0.43024212
manu    -0.19004216  1.00000000  0.95526935  0.23794683 -0.03241688  0.13182930
popul   -0.06267813  0.95526935  1.00000000  0.21264375 -0.02611873  0.04208319
wind    -0.34973963  0.23794683  0.21264375  1.00000000 -0.01299438  0.16410559
precip   0.38625342 -0.03241688 -0.02611873 -0.01299438  1.00000000  0.49609671
predays -0.43024212  0.13182930  0.04208319  0.16410559  0.49609671  1.00000000```

```panel.hist <- function(x, ...) {
usr <- par("usr"); on.exit(par(usr))
par(usr = c(usr[1:2], 0, 1.5) )
h <- hist(x, plot = FALSE)
breaks <- h\$breaks; nB <- length(breaks)
y <- h\$counts; y <- y/max(y)
rect(breaks[-nB], 0, breaks[-1], y, col="grey", ...)
}
USairpollution\$negtemp <- USairpollution\$temp * (-1)
USairpollution\$temp <- NULL
pdf("Lot0.pdf") #将图片以pdf形式导出到当前工作路径
pairs(USairpollution[,-1], diag.panel = panel.hist,
pch = ".", cex = 1.5)
dev.off() #结束导出```

```usair_pca <- princomp(USairpollution[,-1], cor = TRUE)

`cor=TRUE` 代表在计算中需要用到相关性矩阵，因为在定性分析的时候我们发现六个变量之间确实存在一定的相关性，所以需要用到，如果大家在尝试其他数据集，发现相关性系数矩阵非常接近单位矩阵，就可以用 `cor=FALSE` ，如果变量中存在常值变量，就只能用 `cor=FALSE``loadings=TRUE` 代表在展示结果时同时展示各个主成分的载荷。代码输出如下

```Importance of components:
Comp.1    Comp.2    Comp.3    Comp.4     Comp.5      Comp.6
Standard deviation     1.4819456 1.2247218 1.1809526 0.8719099 0.33848287 0.185599752
Proportion of Variance 0.3660271 0.2499906 0.2324415 0.1267045 0.01909511 0.005741211
Cumulative Proportion  0.3660271 0.6160177 0.8484592 0.9751637 0.99425879 1.000000000
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
temp     0.330  0.128  0.672  0.306  0.558  0.136
manu    -0.612  0.168  0.273 -0.137 -0.102  0.703
popul   -0.578  0.222  0.350               -0.695
wind    -0.354 -0.131 -0.297  0.869  0.113
precip         -0.623  0.505  0.171 -0.568
predays -0.238 -0.708        -0.311  0.580```

C o m p 1 = 0.330 t e m p − 0.612 m a n u − 0.578 p o p u l − 0.354 w i n d − 0.238 p r e d a y s Comp1 = 0.330temp-0.612manu-0.578popul \\-0.354wind-0.238 predays C o m p 1 = 0 . 3 3 0 t e m p − 0 . 6 1 2 m a n u − 0 . 5 7 8 p o p u l 0 . 3 5 4 w i n d − 0 . 2 3 8 p r e d a y s

```install.packages('MVA')
library(MVA)```

```pdf("Lot1.pdf")
pairs(usair_pca\$scores[,1:4], ylim = c(-6, 4), xlim = c(-6, 4),
panel = function(x,y, ...) {
text(x, y, abbreviate(row.names(USairpollution)),
cex = 0.6)
}) #[,1:4]代表分析前四个主成分
dev.off()```

`bvbox` 的作用是画二维的箱线图，在外圈虚线以外的点被认为是异常值，可以发现大部分数据点的信息都是可以被这四个主成分所代表的。

```usair_reg <- lm(SO2 ~ usair_pca\$scores, data = USairpollution)
summary(usair_reg)```

```Call:
lm(formula = SO2 ~ usair_pca\$scores, data = USairpollution)
Residuals:
Min      1Q  Median      3Q     Max
-23.004  -8.542  -0.991   5.758  48.758
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)              30.049      2.286  13.146 6.91e-15 ***
usair_pca\$scoresComp.1   -9.942      1.542  -6.446 2.28e-07 ***
usair_pca\$scoresComp.2   -2.240      1.866  -1.200  0.23845
usair_pca\$scoresComp.3    0.375      1.935   0.194  0.84752
usair_pca\$scoresComp.4   -8.549      2.622  -3.261  0.00253 **
usair_pca\$scoresComp.5  -15.176      6.753  -2.247  0.03122 *
usair_pca\$scoresComp.6   39.271     12.316   3.189  0.00306 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 14.64 on 34 degrees of freedom