
Expert Sharing Series | Computer Vision from 0 to 1: Getting Started with Image Classification

Author: Jin Fan, Tianchi data scientist, junior at Zhejiang University of Science and Technology

I am Jin Fan, a junior at Zhejiang University of Science and Technology majoring in Data Science and Big Data Technology. I am very interested in computer vision, take part in related competitions in my spare time, and am now a data scientist on Tianchi.

My motto: learning makes me happy, and being happy makes me want to learn.

 

1. Background

 

Since AlexNet won the ILSVRC 2012 competition in 2012 and demonstrated the striking performance of deep convolutional networks on image tasks, it has triggered a wave of CNN research and is a major reason deep learning and AI have developed so rapidly. Over the years, more and more convolutional neural networks have been proposed and have achieved ever better results on the ImageNet dataset; the classics include AlexNet, VGG, InceptionNet, ResNet, DenseNet, and EfficientNet. Mainstream programming languages and AI frameworks are also becoming more numerous and more mature, so a beginner needs to be able to use such a framework fluently to complete an image classification task.

 

2. Learning Goals

 

Basics

Understand the image classification task
Use PyTorch to implement image classification

Advanced

Use the pretrainedmodels library
Use the albumentations library

Practice

Try a competition

 

 

3. Overview of the Image Classification Task

 

Definition: image classification is the process of automatically assigning an image to one of a set of predefined categories according to certain classification rules.

 

Method: this article focuses on how to perform image classification with deep learning.

 

Evaluation metrics: common classification metrics such as accuracy, recall, and F1 score.
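
As a quick illustration, these metrics can be computed with scikit-learn; the labels below are made up purely for demonstration:

from sklearn.metrics import accuracy_score, recall_score, f1_score

# made-up labels, purely for demonstration
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
print("accuracy:", accuracy_score(y_true, y_pred))   # fraction of correct predictions
print("recall:", recall_score(y_true, y_pred))       # recall of the positive class (label 1)
print("f1_score:", f1_score(y_true, y_pred))         # harmonic mean of precision and recall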

 

Reference: https://mp.weixin.qq.com/s?__biz=MzA3NDIyMjM1NA==&mid=2649030111&idx=1&sn=77e67f92dbf172bcf5bac96576864782&chksm=871343a2b064cab4e05f5380345b51dc8f14e8f09d0c789e9218df828445685bc2cacfc378da&scene=21#wechat_redirect

 

4. Implementing Image Classification with PyTorch

 

Task: classify roses vs. sunflowers.

 

Dataset: the training set has 20 rose images and 20 sunflower images; the validation set has 10 rose images and 10 sunflower images.

 

Project repository: https://github.com/jinfanhahaha/tianchi_CV_by_jinfan/tree/master

 

Experiment walkthrough

4.1 Import the required libraries

 

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import torch.nn.functional as F
import torchvision.transforms as transforms
import torch.utils.data as data
import os
import random
from PIL import Image
from torchvision import models
# Set random seeds
seed = 2020
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)            # seed for CPU
torch.cuda.manual_seed(seed)       # seed for the current GPU
torch.cuda.manual_seed_all(seed)   # seed for all GPUs
os.environ['PYTHONHASHSEED'] = str(seed)  # disable hash randomization so the experiment is reproducible

 

4.2 Collect file paths and class labels

 

# change these to your actual paths
TRAIN_ROSES = '../input/demofortianchi/demo/data/train/roses/'
TRAIN_SUNFLOWERS = '../input/demofortianchi/demo/data/train/sunflowers/'
VAL_ROSES = '../input/demofortianchi/demo/data/val/roses/'
VAL_SUNFLOWERS = '../input/demofortianchi/demo/data/val/sunflowers/'
# label roses as class 0 and sunflowers as class 1
train_roses = [TRAIN_ROSES + p for p in os.listdir(TRAIN_ROSES)]
train_sunflowers = [TRAIN_SUNFLOWERS + p for p in os.listdir(TRAIN_SUNFLOWERS)]
val_roses = [VAL_ROSES + p for p in os.listdir(VAL_ROSES)]
val_sunflowers = [VAL_SUNFLOWERS + p for p in os.listdir(VAL_SUNFLOWERS)]
train_paths = train_roses + train_sunflowers
val_paths = val_roses + val_sunflowers
train_labels = [0 for _ in range(len(train_roses))] + [1 for _ in range(len(train_sunflowers))]
val_labels = [0 for _ in range(len(val_roses))] + [1 for _ in range(len(val_sunflowers))]
# shuffle the training set; training tends to work better this way
train_paths = np.array(train_paths)
train_labels = np.array(train_labels)
p = np.random.permutation(len(train_labels))
train_paths = train_paths[p]
train_labels = train_labels[p]
print(train_paths)
print(val_paths)
print(train_labels)
print(val_labels)

 

4.3 Build the Dataset class and DataLoaders

 

# data loader
class ImageDataset(data.Dataset):
    def __init__(self, paths, labels, transform=None):
        self.paths = paths
        self.labels = labels
        self.transform = transform

    def __getitem__(self, index):
        img_path = self.paths[index]
        label = self.labels[index]
        image = Image.open(img_path).convert('RGB')
        if self.transform is not None:
            image = self.transform(image)
        return image, int(label)

    def __len__(self):
        return len(self.paths)

 

# define the data pipelines (augmentation + normalization)
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(30),
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
valid_transform = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
trainset = ImageDataset(train_paths, train_labels, train_transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=4)
validset = ImageDataset(val_paths, val_labels, valid_transform)
validloader = torch.utils.data.DataLoader(validset, batch_size=4, shuffle=False, num_workers=4)
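
To sanity-check the data pipeline, it helps to pull a single batch and inspect its shape; a minimal sketch using the trainloader defined above:

# Sketch: fetch one batch and confirm it has shape (4, 3, 512, 512) with 0/1 labels.
images, labels = next(iter(trainloader))
print(images.shape, labels)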

 

4.4 Build the model

 

# automatically use the GPU when one is available
device = torch.device('cuda' if torch.cuda.is_available() else "cpu")
# use AlexNet for image classification
alexnet = models.alexnet(pretrained=True)
# replace the last fully connected layer with a 2-neuron output, since this is binary classification
alexnet.classifier[6] = nn.Linear(alexnet.classifier[6].in_features, 2)
alexnet.to(device)
# print the AlexNet architecture
print(alexnet)

 

4.5 Configure training parameters

 

# train with cross-entropy loss
criterion = nn.CrossEntropyLoss()
# optimize with SGD, learning rate 0.001, momentum 0.9
optimizer = optim.SGD(alexnet.parameters(), lr=0.001, momentum=0.9)
# train for 10 epochs
EPOCH = 10
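
A learning-rate schedule can optionally be layered on top of this setup; the step decay below is only a sketch and is not part of the original configuration:

# Optional sketch: decay the learning rate by a factor of 10 every 5 epochs.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
# If used, call scheduler.step() once at the end of each epoch.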

 

4.6 Training

 

for epoch in range(EPOCH):
    running_loss = 0
    train_correct = 0
    train_total = 0
    for i, data in enumerate(trainloader):
        images, labels = data[0].to(device), data[1].to(device, dtype=torch.int64)
        optimizer.zero_grad()
        outputs = alexnet(images)
        _, predicted = torch.max(outputs.data, 1)
        train_total += labels.size(0)
        train_correct += (predicted == labels).sum().item()
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if i % 2 == 1:
            print('[%d, %5d] loss: %.3f' % (epoch+1, i+1, running_loss/2))
            running_loss = 0.0
    train_accuracy = 100 * train_correct / train_total
    print('train dataset accuracy %.4f' % train_accuracy)

 

4.7 Validation

 

test_correct = 0
test_total = 0
res = []
with torch.no_grad():
    for data in validloader:
        images, labels = data[0].to(device), data[1].to(device, dtype=torch.int64)
        outputs = alexnet(images)
        _, predicted = torch.max(outputs.data, 1)
        for p in predicted:
            res.append(int(p))
print("Val Accuracy: ", np.sum(np.array(res)==np.array(val_labels)) / len(val_labels))

4.8 Save the model

PATH = 'model.pth'
torch.save(alexnet.state_dict(), PATH)
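
To reuse the saved weights later, the state dict can be loaded back into a model with the same architecture; a minimal sketch (the variable names here are my own):

# Sketch: rebuild the same architecture and load the saved weights for inference.
model = models.alexnet(pretrained=False)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 2)
model.load_state_dict(torch.load(PATH, map_location=device))
model = model.to(device)
model.eval()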

 

5. Using the pretrainedmodels Library

 

Reference: https://github.com/Cadene/pretrained-models.pytorch

 

Why use it: it offers a richer set of models than torchvision.

 

Installation: pip install pretrainedmodels
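
After installation, the available architectures can be listed; the sketch below relies on the model_names attribute described in the project README, so double-check there if the API has changed:

import pretrainedmodels

# the library keeps a list of supported architecture names;
# each model is then constructed through pretrainedmodels.__dict__[name](...)
print(len(pretrainedmodels.model_names))
print(pretrainedmodels.model_names[:10])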

 

Adjusting a model's structure with the pretrainedmodels library: taking AlexNet as an example, we keep only its backbone.

 

class Net(nn.Module):
    def __init__(self, alexnet):
        super(Net, self).__init__()
        self.backbone = alexnet._features
        self.fc = nn.Linear(1024, 2)

    # the forward pass of the model goes through this function
    def forward(self, x):
        batch_size, C, H, W = x.shape
        x = self.backbone(x)
        x = F.adaptive_avg_pool2d(x, 2).reshape(batch_size, -1)
        x = self.fc(x)
        return x

 

# automatically use the GPU when one is available
device = torch.device('cuda' if torch.cuda.is_available() else "cpu")
# use AlexNet for image classification
alexnet = pretrainedmodels.__dict__['alexnet'](num_classes=1000, pretrained='imagenet')
# wrap the original AlexNet in the custom class
alexnet = Net(alexnet)
alexnet.to(device)
# print the network architecture
print(alexnet)

 

Modifying a model is not hard: its structure is essentially wrapped in dictionaries and Python lists, so you can access and change the parts you need the same way you would index a dict or a list; after a few attempts it becomes familiar. Errors are completely normal, because after you change the structure the number of neurons on either side of a connection will most likely not match, and adjusting the neuron counts according to the error message resolves most of these errors. (Modifying structures from torchvision works the same way.)
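
One practical trick for finding the right neuron count is to push a dummy input through the backbone and read off the flattened feature size; a sketch reusing the alexnet = Net(...) instance and device defined above:

# Sketch: infer the correct in_features for the fc layer from a dummy forward pass.
with torch.no_grad():
    dummy = torch.zeros(1, 3, 512, 512).to(device)
    feat = alexnet.backbone(dummy)              # feature map of shape (1, C, H, W)
    pooled = F.adaptive_avg_pool2d(feat, 2)     # same pooling as in forward()
    print(pooled.reshape(1, -1).shape)          # second dimension = in_features for nn.Linear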

 

6. Using the albumentations Library

 

Documentation: https://albumentations.ai/docs/

 

Why use it: it offers a much richer set of data augmentation methods.

 

Installation: pip install albumentations
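
As a quick taste of the API, a single transform can be applied to one image in isolation; the file path below is hypothetical:

import cv2
import albumentations as A

# read an image (hypothetical path) and apply a single flip transform
image = cv2.cvtColor(cv2.imread('some_image.jpg'), cv2.COLOR_BGR2RGB)
augmented = A.HorizontalFlip(p=1.0)(image=image)['image']
print(image.shape, augmented.shape)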

 

Using albumentations for data augmentation: below is an augmentation combination that tends to work well in practice.

 

# data loader (these imports are needed here in addition to those in 4.1)
import cv2
import albumentations as A
from albumentations import FancyPCA
from albumentations.pytorch import ToTensorV2

class ImageDataset(data.Dataset):
    def __init__(self, paths, labels, transform=None):
        self.paths = paths
        self.labels = labels
        self.transform = transform

    def __getitem__(self, index):
        img_path = self.paths[index]
        label = self.labels[index]
        image = cv2.imread(img_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # note: this part differs from the torchvision version above
        if self.transform is not None:
            res = self.transform(image=image)
            image = res['image']
        return image, int(label)

    def __len__(self):
        return len(self.paths)

 

train_transform = A.Compose([
    A.Resize(height=512, width=512),
    A.OneOf([
        A.RandomBrightnessContrast(brightness_limit=0.1, contrast_limit=0.1, p=0.5),
        A.RandomBrightness(limit=0.1, p=0.5),
    ], p=1),
    A.GaussNoise(),
    A.HorizontalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
    A.ShiftScaleRotate(rotate_limit=1, p=0.5),
    FancyPCA(alpha=0.1, p=0.5),
    # blur
    A.OneOf([
        A.MotionBlur(blur_limit=3), A.MedianBlur(blur_limit=3), A.GaussianBlur(blur_limit=3),
    ], p=0.5),
    # pixel-level
    A.OneOf([
        A.IAAEmboss(p=0.5),
        A.IAASharpen(p=0.5),
    ], p=1),
    # affine
    A.OneOf([
        A.ElasticTransform(p=0.5),
        A.IAAPiecewiseAffine(p=0.5),
    ], p=1),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0, p=1.0),
    ToTensorV2(p=1.0),
])
valid_transform = A.Compose([
    A.Resize(height=512, width=512),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0, p=1.0),
    ToTensorV2(p=1.0),
])
trainset = ImageDataset(train_paths, train_labels, train_transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=4)
validset = ImageDataset(val_paths, val_labels, valid_transform)
validloader = torch.utils.data.DataLoader(validset, batch_size=4, shuffle=False, num_workers=4)

 

7. Hands-on: the Kaggle Plant Pathology Competition

 

Competition page: https://www.kaggle.com/c/plant-pathology-2020-fgvc7/overview

 

Environment: Kaggle kernel

 

Now we put the pieces from the previous sections into practice.

7.1 Install the required libraries

 

! pip install albumentations
! pip install pretrainedmodels

 

7.2 Import libraries

 

from PIL import Image
from torchvision import models
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold, StratifiedKFold
from collections import Counter
from albumentations.pytorch import ToTensorV2
from albumentations import FancyPCA
import albumentations as A
import pretrainedmodels
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import torch.nn.functional as F
import torchvision.transforms as transforms
import torch.utils.data as data
import cv2
import os
import random
import pandas as pd
import json
# Set random seeds
seed = 2020
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)            # seed for CPU
torch.cuda.manual_seed(seed)       # seed for the current GPU
torch.cuda.manual_seed_all(seed)   # seed for all GPUs
os.environ['PYTHONHASHSEED'] = str(seed)  # disable hash randomization so the experiment is reproducible

 

7.3 Collect paths and labels, then split into training and validation sets

 

DIR_INPUT = '/kaggle/input/plant-pathology-2020-fgvc7/images/'
data_df = pd.read_csv('../input/plant-pathology-2020-fgvc7/train.csv')
test_df = pd.read_csv('../input/plant-pathology-2020-fgvc7/test.csv')
data_paths = [DIR_INPUT + id for id in data_df['image_id']]
test_paths = [DIR_INPUT + id for id in test_df['image_id']]
data_labels = []
for i in range(len(data_df)):
   label = data_df.loc[i, ['healthy', 'multiple_diseases', 'rust', 'scab']].values
   data_labels.append(int(np.argwhere(label==1)))
# shuffle
data_paths = np.array(data_paths)
data_labels = np.array(data_labels)
p = np.random.permutation(len(data_labels))
data_paths = data_paths[p]
data_labels = data_labels[p]
# split into training and validation sets
split_k = len(data_paths) // 8
train_paths = data_paths[split_k:]
train_labels = data_labels[split_k:]
val_paths = data_paths[:split_k]
val_labels = data_labels[:split_k]
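
The split above is a simple 1/8 holdout. Since StratifiedKFold is already imported in 7.2, a stratified holdout is a natural alternative; a sketch reusing data_paths and data_labels from above:

# Sketch: take the first fold of a stratified 8-fold split as the validation set.
skf = StratifiedKFold(n_splits=8, shuffle=True, random_state=seed)
train_idx, val_idx = next(iter(skf.split(data_paths, data_labels)))
train_paths, train_labels = data_paths[train_idx], data_labels[train_idx]
val_paths, val_labels = data_paths[val_idx], data_labels[val_idx]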

 

7.4 Build the Dataset classes and DataLoaders

 

# data loader
class ImageDataset(data.Dataset):
    def __init__(self, paths, labels, transform=None):
        self.paths = paths
        self.labels = labels
        self.transform = transform

    def __getitem__(self, index):
        img_path = self.paths[index]
        label = self.labels[index]
        image = cv2.imread(img_path + ".jpg")
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        if self.transform is not None:
            res = self.transform(image=image)
            image = res['image']
        return image, int(label)

    def __len__(self):
        return len(self.paths)

# test loader
class TestDataset(data.Dataset):
    def __init__(self, paths, transform=None):
        self.paths = paths
        self.transform = transform

    def __getitem__(self, index):
        img_path = self.paths[index]
        image = cv2.imread(img_path + ".jpg")
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        if self.transform is not None:
            res = self.transform(image=image)
            image = res['image']
        return image

    def __len__(self):
        return len(self.paths)

 

train_transform = A.Compose([
    A.Resize(height=512, width=512),
    A.OneOf([
        A.RandomBrightnessContrast(brightness_limit=0.1, contrast_limit=0.1, p=0.5),
        A.RandomBrightness(limit=0.1, p=0.5),
    ], p=1),
    A.GaussNoise(),
    A.HorizontalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
    A.ShiftScaleRotate(rotate_limit=1, p=0.5),
    FancyPCA(alpha=0.1, p=0.5),
    # blur
    A.OneOf([
        A.MotionBlur(blur_limit=3), A.MedianBlur(blur_limit=3), A.GaussianBlur(blur_limit=3),
    ], p=0.5),
    # pixel-level
    A.OneOf([
        A.IAAEmboss(p=0.5),
        A.IAASharpen(p=0.5),
    ], p=1),
    # affine
    A.OneOf([
        A.ElasticTransform(p=0.5),
        A.IAAPiecewiseAffine(p=0.5),
    ], p=1),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0, p=1.0),
    ToTensorV2(p=1.0),
])
valid_transform = A.Compose([
    A.Resize(height=512, width=512),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0, p=1.0),
    ToTensorV2(p=1.0),
])
test_transform = A.Compose([
    A.Resize(height=512, width=512),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0, p=1.0),
    ToTensorV2(p=1.0),
])
trainset = ImageDataset(train_paths, train_labels, train_transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=16, shuffle=True, num_workers=4)
validset = ImageDataset(val_paths, val_labels, valid_transform)
validloader = torch.utils.data.DataLoader(validset, batch_size=16, shuffle=False, num_workers=4)
testset = TestDataset(test_paths, test_transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=16, shuffle=False, num_workers=4)

 

7.5 Build the model

 

class Net(nn.Module):
    def __init__(self, alexnet):
        super(Net, self).__init__()
        self.backbone = alexnet._features
        self.fc = nn.Linear(1024, 4)

    def forward(self, x):
        batch_size, C, H, W = x.shape
        x = self.backbone(x)
        x = F.adaptive_avg_pool2d(x, 2).reshape(batch_size, -1)
        x = self.fc(x)
        return x

 

# automatically use the GPU when one is available
device = torch.device('cuda' if torch.cuda.is_available() else "cpu")
# use AlexNet for image classification
alexnet = pretrainedmodels.__dict__['alexnet'](num_classes=1000, pretrained='imagenet')
# wrap the original AlexNet in the custom class
alexnet = Net(alexnet)
alexnet.to(device)
# print the network architecture
print(alexnet)

 

7.6 Configure training parameters

 

# train with cross-entropy loss
criterion = nn.CrossEntropyLoss()
# optimize with SGD, learning rate 0.001, momentum 0.9
optimizer = optim.SGD(alexnet.parameters(), lr=0.001, momentum=0.9)
# train for 2 epochs
EPOCH = 2

 

7.7 Training

 

for epoch in range(EPOCH):
    running_loss = 0
    train_correct = 0
    train_total = 0
    for i, data in enumerate(trainloader):
        images, labels = data[0].to(device), data[1].to(device, dtype=torch.int64)
        optimizer.zero_grad()
        outputs = alexnet(images)
        _, predicted = torch.max(outputs.data, 1)
        train_total += labels.size(0)
        train_correct += (predicted == labels).sum().item()
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if i % 20 == 19:
            print('[%d, %5d] loss: %.3f' % (epoch+1, i+1, running_loss/20))
            running_loss = 0.0
    train_accuracy = train_correct / train_total
    print('train dataset accuracy %.4f' % train_accuracy)
    test_correct = 0
    test_total = 0
    res = []
    with torch.no_grad():
        for data in validloader:
            images, labels = data[0].to(device), data[1].to(device, dtype=torch.int64)
            outputs = alexnet(images)
            _, predicted = torch.max(outputs.data, 1)
            for p in predicted:
                res.append(int(p))
    print("Val Accuracy: ", np.sum(np.array(res)==np.array(val_labels)) / len(val_labels))

 

7.8 Generate the submission file

 

test_preds = None
submission_df = pd.read_csv('../input/plant-pathology-2020-fgvc7/sample_submission.csv')
with torch.no_grad():
    for data in testloader:
        images = data.to(device)
        outputs = alexnet(images)
        if test_preds is None:
            test_preds = outputs.data.cpu()
        else:
            test_preds = torch.cat((test_preds, outputs.data.cpu()), dim=0)
submission_df[['healthy', 'multiple_diseases', 'rust', 'scab']] = torch.softmax(test_preds, dim=1)
submission_df.to_csv('submission.csv', index=False)

 

8. References

 

awesome-image-classification :https://github.com/weiaicunzai/awesome-image-classification

 

Alibaba Tianchi sunspot classification, Track 1 solution (0.908, 10th place): https://github.com/DLLXW/TianChi-Sunsport

 

A PyTorch image classification competition framework: https://github.com/spytensor/pytorch_img_classification_for_competition

 

Plant Pathology 2020 – Pytorch:https://www.kaggle.com/pestipeti/plant-pathology-2020-pytorch

 
