`pip install pillow scikit-learn numpy`

## Basic Image Concepts

```python
from PIL import Image

# Open the image and convert it to 8-bit grayscale ("L" mode)
im = Image.open("csdn.png").convert("L")
im.show()  # display the image
```

## Splitting the CAPTCHA Characters

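```python
import os
import uuid

import numpy as np
from PIL import Image


def split_and_save(path):
    pix = np.array(Image.open(path).convert("L"))
    # Threshold the image: pixels brighter than 100 become 255, the rest 0
    pix = (pix > 100) * 255
    # Column ranges [start, end) of the four characters, each 8 pixels wide
    col_ranges = [
        [5, 5 + 8],
        [14, 14 + 8],
        [23, 23 + 8],
        [32, 32 + 8],
    ]
    # Make sure the output directory exists before saving
    os.makedirs("./letters", exist_ok=True)
    # Cut out each character and save it under a random file name
    for col_range in col_ranges:
        letter = pix[:, col_range[0]: col_range[1]]
        im = Image.fromarray(np.uint8(letter))
        save_path = "./letters/" + str(uuid.uuid4()) + ".png"
        im.save(save_path)
```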

## Machine Learning: The KNN Algorithm

### Algorithm Description

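The k-nearest neighbors (KNN) algorithm has a very simple definition. Baidu Baike explains it this way: if most of the k samples most similar to a given sample (that is, its nearest neighbors in feature space) belong to a particular class, then that sample belongs to the same class.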
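To make the definition concrete, here is a minimal from-scratch sketch of the voting rule. It is only an illustration, not how scikit-learn's `KNeighborsClassifier` is implemented: the function name `knn_predict`, the choice of Euclidean distance, and the toy data are all assumptions made for this example.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x, k=3):
    """Classify one sample by majority vote among its k nearest training samples."""
    X_train = np.asarray(X_train, dtype=float)
    x = np.asarray(x, dtype=float)
    # Euclidean distance from x to every training sample (an assumed metric)
    distances = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training samples
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two clusters in a 2-D feature space
X_toy = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_toy = ["a", "a", "a", "b", "b", "b"]
```

Calling `knn_predict(X_toy, y_toy, [0.5, 0.5])` picks the three points around the origin, so the vote goes to class `"a"`. For the CAPTCHA case, each sample would be the flattened 160-pixel vector of one character image instead of a 2-D point.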

## Recognizing the CAPTCHA

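```python
import numpy as np
from PIL import Image


def load_dataset():
    X = []
    y = []
    # 60 labeled samples: labels 0-9, six images each, named "<label><index>.png"
    for i in range(60):
        label = i // 6  # integer division; "/" would produce a float in Python 3
        path = "./dataset/%d%d.png" % (label, i % 6 + 1)
        pix = np.array(Image.open(path).convert("L"))
        X.append(pix.reshape(8 * 20))  # flatten the 20x8 image into a 160-element vector
        y.append(label)
    return np.array(X), np.array(y)
```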

```python
from sklearn.neighbors import KNeighborsClassifier

X, y = load_dataset()
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y.astype('uint8'))
```

```python
import sys


def split_letters(path):
    pix = np.array(Image.open(path).convert("L"))
    # Threshold the image the same way as when the training letters were saved
    pix = (pix > 100) * 255
    col_ranges = [
        [5, 5 + 8],
        [14, 14 + 8],
        [23, 23 + 8],
        [32, 32 + 8],
    ]
    letters = []
    for col_range in col_ranges:
        letter = pix[:, col_range[0]: col_range[1]]
        # Flatten each character into a 160-element feature vector
        letters.append(letter.reshape(8 * 20))
    return letters


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python recognizer.py <image_filename>")
        sys.exit(1)  # stop here instead of falling through to sys.argv[1]
    letters = split_letters(sys.argv[1])
    print(knn.predict(letters))
```