## Naive Bayes Classifier

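Naive Bayes is a probabilistic classifier: given a document $d$, it returns the class $\hat{c}$ that has the maximum posterior probability among all classes $c$: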

$$\hat{c} = \operatorname*{argmax}_{c} P(c \mid d)$$

$$P(x \mid y) = \frac{P(y \mid x)\,P(x)}{P(y)}$$

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

$$P(A \mid B)\,P(B) = P(A \cap B) = P(B \mid A)\,P(A)$$

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
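The derivation of Bayes' rule above can be checked numerically. The following is a minimal sketch on a made-up joint distribution over two binary events; the numbers and variable names are illustrative assumptions, not taken from the notes.

```python
from fractions import Fraction

# Hypothetical joint distribution over two binary events A and B.
# Keys are (A happened, B happened); the values are made up and sum to 1.
joint = {
    (True, True): Fraction(1, 10),    # P(A and B)
    (True, False): Fraction(3, 10),   # P(A and not B)
    (False, True): Fraction(2, 10),   # P(not A and B)
    (False, False): Fraction(4, 10),  # P(not A and not B)
}

# Marginals, obtained by summing the joint over the other event.
p_a = sum(p for (a, _), p in joint.items() if a)
p_b = sum(p for (_, b), p in joint.items() if b)
p_ab = joint[(True, True)]

# Definition of conditional probability.
p_a_given_b = p_ab / p_b
p_b_given_a = p_ab / p_a

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B).
bayes = p_b_given_a * p_a / p_b
print(p_a_given_b, bayes)
```

Exact fractions are used here so that the two sides can be compared without floating-point rounding.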

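$$\hat{c} = \operatorname*{argmax}_{c} P(c \mid d) = \operatorname*{argmax}_{c} \frac{P(d \mid c)\,P(c)}{P(d)}$$

$P(d)$ is the same for every class, so dropping it does not change which class attains the maximum:

$$\hat{c} = \operatorname*{argmax}_{c} P(c \mid d) = \operatorname*{argmax}_{c} P(d \mid c)\,P(c)$$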
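As a quick sanity check of the step that removes $P(d)$, the sketch below compares the argmax of the normalized posterior with the argmax of the unnormalized product $P(d \mid c)\,P(c)$. The class names, priors, and likelihoods are made-up values chosen only for illustration.

```python
# Hypothetical priors P(c) and likelihoods P(d|c) for one fixed document d.
priors = {"pos": 0.4, "neg": 0.6}
likelihoods = {"pos": 0.02, "neg": 0.005}

# Unnormalized scores P(d|c) P(c).
unnormalized = {c: likelihoods[c] * priors[c] for c in priors}

# P(d) is the sum of the unnormalized scores; it does not depend on c.
p_d = sum(unnormalized.values())

# Normalized posteriors P(c|d).
posterior = {c: unnormalized[c] / p_d for c in priors}

best_posterior = max(posterior, key=posterior.get)
best_unnormalized = max(unnormalized, key=unnormalized.get)
print(best_posterior, best_unnormalized)
```

Dividing every score by the same positive constant rescales them but leaves their ordering, and therefore the argmax, unchanged.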

$$\hat{c} = \operatorname*{argmax}_{c}\ \overbrace{P(f_1, f_2, \ldots, f_n \mid c)}^{\text{likelihood}}\ \overbrace{P(c)}^{\text{prior}}$$

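• Position does not matter: a word has the same effect wherever it occurs in the document.
• The probabilities $P(f_i \mid c)$ are conditionally independent given the class $c$; this is also called the naive Bayes assumption.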
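The position-independence assumption can be illustrated with a bag-of-words representation. This is a minimal sketch: the helper name `bag_of_words` and the whitespace tokenization are assumptions made for the example, not something prescribed by the notes.

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split on whitespace, and count each token.

    Token positions are discarded; only the counts are kept.
    """
    return Counter(text.lower().split())

# Two hypothetical documents with the same words in a different order.
a = bag_of_words("great plot great acting")
b = bag_of_words("acting great plot great")

print(a == b)      # the two documents are indistinguishable to the model
print(a["great"])  # how often "great" occurs
```

Under this representation, any reordering of a document yields the same features, which is exactly what the first assumption states.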

$$P(f_1, f_2, \ldots, f_n \mid c) = P(f_1 \mid c) \cdot P(f_2 \mid c) \cdots P(f_n \mid c)$$

$$c_{NB} = \operatorname*{argmax}_{c} P(c) \prod_{f \in F} P(f \mid c)$$

$$c_{NB} = \operatorname*{argmax}_{c} P(c) \prod_{i \in \text{positions}} P(w_i \mid c)$$

$$c_{NB} = \operatorname*{argmax}_{c} \left[ \log P(c) + \sum_{i \in \text{positions}} \log P(w_i \mid c) \right]$$
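One practical reason to prefer the log form is numerical: multiplying many small probabilities underflows to zero in floating point, while summing their logs does not. The sketch below compares both scoring rules on a long document; the priors, word probabilities, and function names are made-up assumptions for illustration.

```python
import math

# Hypothetical model parameters (not estimated from real data).
priors = {"pos": 0.5, "neg": 0.5}
word_probs = {
    "pos": {"good": 0.02, "bad": 0.001},
    "neg": {"good": 0.002, "bad": 0.02},
}

def product_score(c, tokens):
    """P(c) times the product of P(w_i|c) over all token positions."""
    score = priors[c]
    for w in tokens:
        score *= word_probs[c][w]
    return score

def log_score(c, tokens):
    """log P(c) plus the sum of log P(w_i|c) over all token positions."""
    return math.log(priors[c]) + sum(math.log(word_probs[c][w]) for w in tokens)

# A long document: 400 token positions.
doc = ["good"] * 300 + ["bad"] * 100

raw = {c: product_score(c, doc) for c in priors}
logs = {c: log_score(c, doc) for c in priors}
prediction = max(logs, key=logs.get)

print(raw)         # both products underflow to 0.0, so they cannot be compared
print(prediction)  # the log scores still separate the classes
```

Because the logarithm is monotonically increasing, the class with the highest log score is also the class with the highest product, so the prediction is unchanged whenever the product is representable.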

## Training the Naive Bayes Classifier

$$\hat{P}(c) = \frac{N_c}{N_{doc}}$$

$$\hat{P}(w_i \mid c) = \frac{\mathrm{count}(w_i, c)}{\sum_{w \in V} \mathrm{count}(w, c)}$$
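The two maximum-likelihood estimates above amount to counting. The following is a minimal sketch on a tiny made-up training set; the data, the variable names, and the helper `mle` are assumptions introduced for the example.

```python
from collections import Counter, defaultdict

# Tiny hypothetical training set: (tokens, class) pairs.
train = [
    (["fun", "great", "fun"], "pos"),
    (["great", "plot"], "pos"),
    (["boring", "plot"], "neg"),
]

# Prior: fraction of training documents that belong to each class.
n_doc = len(train)
doc_counts = Counter(c for _, c in train)
prior = {c: n / n_doc for c, n in doc_counts.items()}

# count(w, c): how often word w occurs in all documents of class c.
word_counts = defaultdict(Counter)
for tokens, c in train:
    word_counts[c].update(tokens)

def mle(word, c):
    """Unsmoothed estimate: count(word, c) / total token count in class c."""
    total = sum(word_counts[c].values())
    return word_counts[c][word] / total

print(prior["pos"])          # 2 of the 3 documents are "pos"
print(mle("fun", "pos"))     # 2 of the 5 "pos" tokens are "fun"
print(mle("boring", "pos"))  # "boring" never occurs with "pos"
```

The last estimate is exactly zero, and a single zero factor makes the whole product $P(c)\prod_i P(w_i \mid c)$ zero for that class regardless of the other words, which is the usual motivation for smoothing the counts.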

$$\hat{P}(w_i \mid c) = \frac{\mathrm{count}(w_i, c) + 1}{\sum_{w \in V} \left(\mathrm{count}(w, c) + 1\right)} = \frac{\mathrm{count}(w_i, c) + 1}{\left(\sum_{w \in V} \mathrm{count}(w, c)\right) + |V|}$$
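Putting the pieces together, the sketch below trains a classifier with the add-one estimate above and predicts in log space. It is one possible arrangement rather than a reference implementation: the function names `train_nb` and `predict`, the toy data, and the choice to skip words outside the training vocabulary are all assumptions of this example.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Estimate log P(c) and add-one smoothed log P(w|c) from (tokens, class) pairs."""
    n_doc = len(docs)
    class_docs = Counter(c for _, c in docs)
    word_counts = defaultdict(Counter)
    for tokens, c in docs:
        word_counts[c].update(tokens)
    # V: every word type seen anywhere in the training data.
    vocab = {w for tokens, _ in docs for w in tokens}

    log_prior = {c: math.log(n / n_doc) for c, n in class_docs.items()}
    log_likelihood = {}
    for c in class_docs:
        # Denominator: total token count in class c, plus |V|.
        denom = sum(word_counts[c].values()) + len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + 1) / denom) for w in vocab
        }
    return log_prior, log_likelihood, vocab

def predict(tokens, log_prior, log_likelihood, vocab):
    """Return the class with the highest log score for the given tokens."""
    # Assumption of this sketch: words never seen in training are skipped.
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][w] for w in tokens if w in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Tiny hypothetical training set.
train = [
    (["fun", "great", "fun"], "pos"),
    (["great", "plot"], "pos"),
    (["boring", "plot"], "neg"),
]
log_prior, log_likelihood, vocab = train_nb(train)

print(predict(["great", "fun", "movie"], log_prior, log_likelihood, vocab))
print(predict(["boring"], log_prior, log_likelihood, vocab))
```

With smoothing, every word in the vocabulary gets a nonzero probability in every class, so the logarithm is always defined and the smoothed probabilities for each class still sum to one over $V$.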

TODO