1. Semantic Segmentation

2. U-Net Architecture

3. Tutorial

3.1. Data Preprocessing

3.2. Semantic Segmentation with U-Net

3.3. Semantic Segmentation with U-Net Using Transfer Learning

4. Conclusion

5. References

### Semantic Segmentation

https://youtu.be/rB1BmBOkKTw

### U-Net Architecture

U-Net is a convolutional neural network architecture developed in 2015 at the Department of Computer Science and the Centre for Biological Signalling Studies of the University of Freiburg, Germany, for biomedical images (computed tomography, microscopy images, MRI scans, and so on).

The paper "U-Net: Convolutional Networks for Biomedical Image Segmentation" can be accessed via the link here.

```python
# -*- coding: utf-8 -*-
"""
@author: Ibrahim Kovan
https://ibrahimkovan.medium.com/
"""
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, concatenate, Conv2DTranspose, BatchNormalization, Dropout, Lambda
from tensorflow.keras import backend as K

def multiclass_unet_architecture(n_classes=2, height=256, width=256, channels=3):
    inputs = Input((height, width, channels))

    # Contraction path
    conv_1 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(inputs)
    conv_1 = Dropout(0.1)(conv_1)
    conv_1 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(conv_1)
    pool_1 = MaxPooling2D((2, 2))(conv_1)

    conv_2 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(pool_1)
    conv_2 = Dropout(0.1)(conv_2)
    conv_2 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(conv_2)
    pool_2 = MaxPooling2D((2, 2))(conv_2)

    conv_3 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(pool_2)
    conv_3 = Dropout(0.1)(conv_3)
    conv_3 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(conv_3)
    pool_3 = MaxPooling2D((2, 2))(conv_3)

    conv_4 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(pool_3)
    conv_4 = Dropout(0.1)(conv_4)
    conv_4 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(conv_4)
    pool_4 = MaxPooling2D(pool_size=(2, 2))(conv_4)

    conv_5 = Conv2D(256, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(pool_4)
    conv_5 = Dropout(0.2)(conv_5)
    conv_5 = Conv2D(256, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(conv_5)

    # Expansive path
    u6 = Conv2DTranspose(128, (2, 2), strides=(2, 2), padding='same')(conv_5)
    u6 = concatenate([u6, conv_4])
    conv_6 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(u6)
    conv_6 = Dropout(0.2)(conv_6)
    conv_6 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(conv_6)

    u7 = Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same')(conv_6)
    u7 = concatenate([u7, conv_3])
    conv_7 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(u7)
    conv_7 = Dropout(0.1)(conv_7)
    conv_7 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(conv_7)

    u8 = Conv2DTranspose(32, (2, 2), strides=(2, 2), padding='same')(conv_7)
    u8 = concatenate([u8, conv_2])
    conv_8 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(u8)
    conv_8 = Dropout(0.2)(conv_8)
    conv_8 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(conv_8)

    u9 = Conv2DTranspose(16, (2, 2), strides=(2, 2), padding='same')(conv_8)
    u9 = concatenate([u9, conv_1], axis=3)
    conv_9 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(u9)
    conv_9 = Dropout(0.1)(conv_9)
    conv_9 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(conv_9)

    outputs = Conv2D(n_classes, (1, 1), activation='softmax')(conv_9)

    model = Model(inputs=[inputs], outputs=[outputs])
    model.summary()
    return model

def jacard(y_true, y_pred):
    y_true_c = K.flatten(y_true)
    y_pred_c = K.flatten(y_pred)
    intersection = K.sum(y_true_c * y_pred_c)
    return (intersection + 1.0) / (K.sum(y_true_c) + K.sum(y_pred_c) - intersection + 1.0)

def jacard_loss(y_true, y_pred):
    return -jacard(y_true, y_pred)
```
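The smoothed Jaccard coefficient above can be sanity-checked outside Keras. A minimal NumPy sketch of the same formula on a toy pair of binary masks (the `+1.0` smoothing terms mirror the Keras version; `jacard_np` is a name introduced here for illustration):

```python
import numpy as np

def jacard_np(y_true, y_pred, smooth=1.0):
    """Smoothed Jaccard (IoU) on flattened arrays, mirroring the Keras version."""
    y_true_c = y_true.ravel().astype(float)
    y_pred_c = y_pred.ravel().astype(float)
    intersection = np.sum(y_true_c * y_pred_c)
    return (intersection + smooth) / (np.sum(y_true_c) + np.sum(y_pred_c) - intersection + smooth)

# toy binary masks: 3 pixels overlap, 5 pixels in the union
y_true = np.array([[1, 1, 1, 0], [0, 1, 0, 0]])
y_pred = np.array([[1, 1, 0, 1], [0, 1, 0, 0]])
print(jacard_np(y_true, y_pred))  # (3 + 1) / (4 + 4 - 3 + 1) = 0.666...
```

Without smoothing the result would be 3/5; the smoothing terms keep the metric defined (and differentiable as a loss) when both masks are empty.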

1. The input is defined as 256×256×3.

2. conv_1 with 16 filters yields a 256×256×16 output. Max-pooling in pool_1 reduces it to 128×128×16.

3. conv_2 with 32 filters yields a 128×128×32 output. With pool_2 it becomes 64×64×32.

4. conv_3 with 64 filters yields a 64×64×64 output. With pool_3 it becomes 32×32×64.

5. conv_4 with 128 filters yields a 32×32×128 output. With pool_4 it becomes 16×16×128.

6. conv_5 with 256 filters yields a 16×16×256 output, and upsampling starts from this point. In u6, a Conv2DTranspose with 128 filters and a (2×2) kernel upsamples conv_5 to 32×32×128, and the concatenation joins u6 with conv_4, so the u6 output is 32×32×256. With conv_6 (128 filters) it becomes 32×32×128.

7. u7, with 64 filters and (2×2), is applied to conv_6 and becomes 64×64×64; concatenating u7 with conv_3 makes u7 64×64×128, which conv_7 reduces to 64×64×64.

8. u8, with 32 filters and (2×2), is applied to conv_7 and becomes 128×128×32; concatenating u8 with conv_2 makes u8 128×128×64, which conv_8 reduces to 128×128×32.

9. u9, with 16 filters and (2×2), is applied to conv_8 and becomes 256×256×16; concatenating u9 with conv_1 makes u9 256×256×32, which conv_9 reduces to 256×256×16.

10. The output layer performs the classification with a softmax activation, so the final output has the shape 256×256×n_classes; taking the argmax over the class axis yields a 256×256×1 segmentation map.
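The shape bookkeeping in steps 1-10 can be traced with plain Python. A small sketch (the filter counts and the halving/doubling factors of the 2×2 poolings and transposed convolutions are taken from the code above; `unet_shapes` is a name made up for this example):

```python
def unet_shapes(size=256, filters=(16, 32, 64, 128, 256)):
    """Trace (height, width, channels) through the contracting and expansive paths."""
    shapes = []
    s = size
    # contraction: each conv level keeps the spatial size ('same' padding),
    # each 2x2 max-pooling halves it
    for f in filters[:-1]:
        shapes.append(("conv", s, s, f))
        s //= 2
        shapes.append(("pool", s, s, f))
    shapes.append(("bottleneck", s, s, filters[-1]))
    # expansion: each 2x2 stride-2 transposed conv doubles the spatial size,
    # concatenation with the skip connection doubles the channels
    for f in reversed(filters[:-1]):
        s *= 2
        shapes.append(("up+concat", s, s, 2 * f))
        shapes.append(("conv", s, s, f))
    return shapes

for name, h, w, c in unet_shapes():
    print(f"{name}: {h}x{w}x{c}")
```

Running this reproduces the sequence in the list above: 256×256×16 down to a 16×16×256 bottleneck and back up to 256×256×16 before the 1×1 softmax layer.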

### Tutorial

The RGB images and their labels are shown in the figure below. The goal of this study is to train the network on this dataset so that new, unseen images can be segmented in the same way as the training data.

#### Data Preprocessing

```python
# -*- coding: utf-8 -*-
"""
@author: Ibrahim Kovan
https://ibrahimkovan.medium.com/
dataset: http://www.dronedataset.icg.tugraz.at/
"""
#%% Libraries
"""1"""
from architecture import multiclass_unet_architecture, jacard, jacard_loss
from tensorflow.keras.utils import normalize
import os
import glob
import cv2
import numpy as np
from matplotlib import pyplot as plt
import random
from skimage.io import imshow
from PIL import Image
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split
import segmentation_models as sm
from tensorflow.keras.metrics import MeanIoU

#%% Import train and mask dataset
"""2"""
train_path = r"C:\Users\ibrah\Desktop\U-Net\dataset\training_set\images/*.jpg"

def importing_data(path):
    sample = []
    for filename in glob.glob(path):
        img = Image.open(filename, 'r')
        img = img.resize((256, 256))
        img = np.array(img)
        sample.append(img)
    return sample

data_train = importing_data(train_path)
data_train = np.asarray(data_train)

#%% Random visualization
x = random.randint(0, len(data_train) - 1)
plt.figure(figsize=(24, 18))
plt.subplot(1, 2, 1)
imshow(data_train[x])
plt.subplot(1, 2, 2)  # the corresponding mask was displayed here in the original figure
plt.show()

#%% Normalization
"""3"""
scaler = MinMaxScaler()
nsamples, nx, ny, nz = data_train.shape
d2_data_train = data_train.reshape((nsamples, nx * ny * nz))
train_images = scaler.fit_transform(d2_data_train)
train_images = train_images.reshape(nsamples, 256, 256, 3)

"""4"""
# NOTE: the RGB mask images and the class-color table are loaded before this
# point; that code was missing from the source. Presumably something like:
#   masks  = np.asarray(importing_data(<mask path>))
#   labels = pd.read_csv(<class-color CSV>)
labels = labels.drop(['name'], axis=1)
labels = np.array(labels)

def image_labels(label):
    image_labels = np.zeros(label.shape, dtype=np.uint8)
    for i in range(24):
        image_labels[np.all(label == labels[i, :], axis=-1)] = i
    image_labels = image_labels[:, :, 0]
    return image_labels

label_final = []
for i in range(masks.shape[0]):  # convert every RGB mask to a class-index map
    label = image_labels(masks[i])
    label_final.append(label)
label_final = np.array(label_final)

#%% train_test
"""5"""
n_classes = len(np.unique(label_final))
labels_cat = to_categorical(label_final, num_classes=n_classes)
x_train, x_test, y_train, y_test = train_test_split(train_images, labels_cat, test_size=0.20, random_state=42)
```

1. Import the libraries. multiclass_unet_architecture, jacard, and jacard_loss are imported from the architecture module defined in the section above.

2. The 6000×4000-pixel raw RGB images and the corresponding labels are resized to 256×256 pixels.

3. MinMaxScaler is used to scale the RGB images.

4. The ground-truth labels are imported. There are 23 labels in the dataset, and each pixel is assigned a label according to its color value.

5. The label dataset is one-hot encoded for classification, and the data is split into training and test sets.
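Steps 4-5 (mapping color-coded masks to class indices, then one-hot encoding) can be illustrated on a toy mask with NumPy alone. The three-row color table below is made up for the example (the real table has one RGB row per dataset class), and the one-hot step uses an identity-matrix lookup instead of Keras's to_categorical:

```python
import numpy as np

# hypothetical color table: one RGB row per class
color_table = np.array([[0, 0, 0],        # class 0: e.g. background
                        [128, 64, 128],   # class 1: e.g. paved area
                        [0, 102, 0]])     # class 2: e.g. vegetation

def mask_to_class_indices(mask_rgb, table):
    """Assign each pixel the index of the table row matching its RGB value."""
    out = np.zeros(mask_rgb.shape[:2], dtype=np.uint8)
    for i, color in enumerate(table):
        out[np.all(mask_rgb == color, axis=-1)] = i
    return out

# a 2x2 toy RGB mask
mask = np.array([[[0, 0, 0], [128, 64, 128]],
                 [[0, 102, 0], [128, 64, 128]]])
idx = mask_to_class_indices(mask, color_table)
print(idx)  # [[0 1]
            #  [2 1]]

# one-hot without Keras: identity-matrix lookup, same result as to_categorical
one_hot = np.eye(len(color_table))[idx]
print(one_hot.shape)  # (2, 2, 3)
```

Each pixel thus ends up as a probability-style vector with a 1 in its class channel, which is the target format the softmax output of the network is trained against.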

#### Semantic Segmentation with U-Net (from scratch)

```python
#%% U-Net
"""6"""
img_height = x_train.shape[1]
img_width = x_train.shape[2]
img_channels = x_train.shape[3]
metrics = ['accuracy', jacard]

def get_model():
    return multiclass_unet_architecture(n_classes=n_classes, height=img_height,
                                        width=img_width, channels=img_channels)

model = get_model()
# the compile call was missing in the source; optimizer and loss are assumed here
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=metrics)
model.summary()

history = model.fit(x_train, y_train,
                    batch_size=16,
                    verbose=1,
                    epochs=100,
                    validation_data=(x_test, y_test),
                    shuffle=False)
#%%
"""7"""
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'y', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

acc = history.history['jacard']
val_acc = history.history['val_jacard']
plt.plot(epochs, acc, 'y', label='Training Jaccard')
plt.plot(epochs, val_acc, 'r', label='Validation Jaccard')
plt.title('Training and validation Jaccard')
plt.xlabel('Epochs')
plt.ylabel('Jaccard')
plt.legend()
plt.show()
#%%
"""8"""
y_pred = model.predict(x_test)
y_pred_argmax = np.argmax(y_pred, axis=3)
y_test_argmax = np.argmax(y_test, axis=3)
test_jacard = jacard(y_test, y_pred)
print(test_jacard)
#%%
"""9"""
fig, ax = plt.subplots(5, 3, figsize=(12, 18))
for i in range(0, 5):
    test_img_number = random.randint(0, len(x_test) - 1)
    test_img = x_test[test_img_number]
    ground_truth = y_test_argmax[test_img_number]
    test_img_input = np.expand_dims(test_img, 0)
    prediction = model.predict(test_img_input)
    predicted_img = np.argmax(prediction, axis=3)[0, :, :]

    ax[i, 0].imshow(test_img)
    ax[i, 0].set_title("RGB Image", fontsize=16)
    ax[i, 1].imshow(ground_truth)
    ax[i, 1].set_title("Ground Truth", fontsize=16)
    ax[i, 2].imshow(predicted_img)
    ax[i, 2].set_title("Prediction", fontsize=16)

plt.show()
```

7. The loss and Jaccard curves of the training process are visualized for the training and validation sets. The figure below shows val_jaccard.

8. The Jaccard coefficient on the test dataset is computed as 0.5532.

9. Five random images are selected from the test dataset and predicted with the trained model; the results are shown in the figure below.

#### Semantic Segmentation with U-Net Using Transfer Learning

```python
#%% pre-trained model
"""10"""
BACKBONE = 'resnet34'
preprocess_input = sm.get_preprocessing(BACKBONE)

# preprocess input
x_train_new = preprocess_input(x_train)
x_test_new = preprocess_input(x_test)

# define model
model_resnet_backbone = sm.Unet(BACKBONE, encoder_weights='imagenet', classes=n_classes, activation='softmax')
metrics = ['accuracy', jacard]
# compile the Keras model with the defined optimizer, loss, and metrics
# (the compile call was missing in the source; optimizer and loss are assumed here)
model_resnet_backbone.compile(optimizer='adam', loss='categorical_crossentropy', metrics=metrics)
print(model_resnet_backbone.summary())

history_tf = model_resnet_backbone.fit(x_train_new,
                                       y_train,
                                       batch_size=16,
                                       epochs=100,
                                       verbose=1,
                                       validation_data=(x_test_new, y_test))
#%%
"""11"""
history = history_tf
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'y', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

acc = history.history['jacard']
val_acc = history.history['val_jacard']
plt.plot(epochs, acc, 'y', label='Training Jaccard')
plt.plot(epochs, val_acc, 'r', label='Validation Jaccard')
plt.title('Training and validation Jaccard')
plt.xlabel('Epochs')
plt.ylabel('Jaccard')
plt.legend()
plt.show()
#%%
"""12"""
# predict on the preprocessed test set (the source passed the unpreprocessed x_test)
y_pred_tf = model_resnet_backbone.predict(x_test_new)
y_pred_argmax_tf = np.argmax(y_pred_tf, axis=3)
y_test_argmax_tf = np.argmax(y_test, axis=3)
test_jacard = jacard(y_test, y_pred_tf)
print(test_jacard)
#%%
"""13"""
fig, ax = plt.subplots(5, 3, figsize=(12, 18))
for i in range(0, 5):
    test_img_number = random.randint(0, len(x_test) - 1)
    test_img_tf = x_test_new[test_img_number]
    ground_truth_tf = y_test_argmax_tf[test_img_number]
    test_img_input_tf = np.expand_dims(test_img_tf, 0)
    prediction_tf = model_resnet_backbone.predict(test_img_input_tf)
    predicted_img_transfer_learning = np.argmax(prediction_tf, axis=3)[0, :, :]

    ax[i, 0].imshow(test_img_tf)
    ax[i, 0].set_title("RGB Image", fontsize=16)
    ax[i, 1].imshow(ground_truth_tf)
    ax[i, 1].set_title("Ground Truth", fontsize=16)
    ax[i, 2].imshow(predicted_img_transfer_learning)
    ax[i, 2].set_title("Prediction (Transfer Learning)", fontsize=16)

plt.show()
```

11. The loss and Jaccard curves of the training process are visualized for the training and validation sets. The figure below shows val_jaccard.

12. The Jaccard index on the test dataset is computed as 0.6545.

13. Five random images are selected from the test dataset and predicted with the trained model; the results are shown in the figure below.
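A single Jaccard score averages over all pixels and can hide which classes the transfer-learning model actually improved. Per-class IoU from a confusion matrix makes this visible; a minimal NumPy sketch on class-index maps (synthetic two-class data for illustration; `per_class_iou` is a name introduced here, not from the source):

```python
import numpy as np

def per_class_iou(y_true_idx, y_pred_idx, n_classes):
    """IoU per class from a confusion matrix of true vs. predicted class indices."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true_idx.ravel(), y_pred_idx.ravel()):
        cm[t, p] += 1
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp  # TP + FP + FN per class
    return tp / np.maximum(union, 1)

y_true_idx = np.array([[0, 0, 1, 1]])
y_pred_idx = np.array([[0, 1, 1, 1]])
print(per_class_iou(y_true_idx, y_pred_idx, 2))  # class 0: 0.5, class 1: 2/3
```

The argmax arrays already computed above (y_test_argmax, y_pred_argmax, and their _tf counterparts) are exactly the index maps such a function expects; Keras's MeanIoU metric, imported in the preprocessing section, is built on the same confusion-matrix idea.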

### Conclusion

The code below places the two models side by side on the same test images. The transfer-learning model (test Jaccard 0.6545) segments the scenes noticeably better than the U-Net trained from scratch (test Jaccard 0.5532).

```python
"""14"""
fig, ax = plt.subplots(5, 4, figsize=(16, 20))
for i in range(0, 5):
    test_img_number = random.randint(0, len(x_test) - 1)

    test_img = x_test[test_img_number]
    ground_truth = y_test_argmax[test_img_number]
    test_img_input = np.expand_dims(test_img, 0)
    prediction = model.predict(test_img_input)
    predicted_img = np.argmax(prediction, axis=3)[0, :, :]

    test_img_tf = x_test_new[test_img_number]
    ground_truth_tf = y_test_argmax_tf[test_img_number]
    test_img_input_tf = np.expand_dims(test_img_tf, 0)
    prediction_tf = model_resnet_backbone.predict(test_img_input_tf)
    predicted_img_transfer_learning = np.argmax(prediction_tf, axis=3)[0, :, :]

    ax[i, 0].imshow(test_img_tf)
    ax[i, 0].set_title("RGB Image", fontsize=16)
    ax[i, 1].imshow(ground_truth_tf)
    ax[i, 1].set_title("Ground Truth", fontsize=16)
    ax[i, 2].imshow(predicted_img)
    ax[i, 2].set_title("Prediction", fontsize=16)
    ax[i, 3].imshow(predicted_img_transfer_learning)
    ax[i, 3].set_title("Prediction (Transfer Learning)", fontsize=16)

plt.show()
```

https://ibrahimkovan.medium.com/machine-learning-guideline-959da5c6f73d

#### References

O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," MICCAI 2015, LNCS 9351, doi: 10.1007/978-3-319-24574-4_28.

A. Arnab et al., "Conditional Random Fields Meet Deep Neural Networks for Semantic Segmentation," IEEE Signal Process. Mag., vol. XX, 2018.

J. Y. C. Chen and G. Fragomeni, Eds., Virtual, Augmented and Mixed Reality, 2020.

J. Maurya, R. Hebbalaguppe, and P. Gupta, "Real-Time Hand Segmentation on Frugal Head-mounted Device for Gestural Interface," Proc. Int. Conf. Image Process. (ICIP), pp. 4023-4027, 2018, doi: 10.1109/ICIP.2018.8451213.