CV Computer Vision Core 07: Object Detection
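Designing the output layer of a detection algorithm. What we already know:
1. What is the output of a detection problem, and how can it be expressed in numbers?
The input is a matrix and the output is (x, y, w, h), where x and y are the coordinates of the target's top-left corner and w and h are its width and height, so one target is described by four values. With N targets the output is N such tuples, i.e. an N×4 matrix. Note that (x, y, w, h) can only describe an axis-aligned rectangle; it cannot describe any other quadrilateral.
To describe arbitrary quadrilaterals, use the four vertices instead: (p1, p2, p3, p4) => (x1, y1, x2, y2, x3, y3, x4, y4). One quadrilateral then takes eight values.
Besides these two forms, a rectangle can also be written as (Cx, Cy, w, h), where Cx and Cy are the centre of the rectangle and w and h are its width and height.
Adding an angle parameter lets the rectangle be rotated.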
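The box representations above can be converted into one another. The following is a minimal sketch in plain Python; the function names are hypothetical and the rotation is assumed to be counter-clockwise in a standard x/y coordinate system (in image coordinates, where y points down, the same formula appears clockwise).

```python
import math


def xywh_to_cxcywh(x, y, w, h):
    """Top-left corner plus width/height -> centre plus width/height."""
    return (x + w / 2.0, y + h / 2.0, w, h)


def xywh_to_corners(x, y, w, h):
    """Top-left corner plus width/height -> the four vertices p1..p4."""
    return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]


def rotated_corners(cx, cy, w, h, angle_deg):
    """Vertices of a w x h rectangle centred at (cx, cy), rotated by angle_deg."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    corners = []
    # Offsets of the four vertices from the centre before rotation.
    for dx, dy in [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]:
        corners.append((cx + dx * cos_a - dy * sin_a,
                        cy + dx * sin_a + dy * cos_a))
    return corners


if __name__ == "__main__":
    print(xywh_to_cxcywh(10, 20, 30, 40))
    print(rotated_corners(1.0, 0.5, 2.0, 1.0, 90.0))
```

With angle 0 the rotated form reduces to the plain four-vertex form, which is why (Cx, Cy, w, h, angle) is a strict generalisation of (Cx, Cy, w, h).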
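2. We already know how to design a classification layer. Is that useful here?
Yes. Classify the contents of a window to decide whether it contains the kind of target we want to detect.
Object detection: the sliding-window classification method.
Sliding-window classification is an exhaustive search over the image. The window size cannot be fixed either: windows of many scales are needed.
Detection performance depends on the classifier: the better the classifier, the better the detection.
Detection in this form is essentially a classifier. So how is it trained, and how are the training samples organised?
The classifier is essentially a binary classifier (target or not target); the window images are simply fed to the model for training.
One problem that comes up here is sample imbalance, because almost all windows are background. It has to be handled in every detection method that follows: how to control the ratio of positive to negative samples during training.
Also note that windows of every size must be resized to one common size before they go into the model; training has to happen at a unified scale.
For a single detection pass we have to consider both accuracy and speed. Sliding the window with a larger stride makes detection faster but less accurate, so a balance has to be found.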
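To make the stride trade-off concrete, here is a minimal sketch that only enumerates the windows a sliding-window detector would have to classify. The function name and the window sizes are assumptions chosen for illustration; no classifier is involved.

```python
def sliding_windows(img_w, img_h, window_sizes, stride):
    """Enumerate (x, y, w, h) windows for every window size at the given stride."""
    boxes = []
    for win_w, win_h in window_sizes:
        # Only positions where the window fits entirely inside the image.
        for y in range(0, img_h - win_h + 1, stride):
            for x in range(0, img_w - win_w + 1, stride):
                boxes.append((x, y, win_w, win_h))
    return boxes


if __name__ == "__main__":
    sizes = [(64, 64), (128, 128)]
    for stride in (8, 32):
        n = len(sliding_windows(448, 448, sizes, stride))
        print("stride", stride, "->", n, "windows to classify")
```

Every window is one forward pass of the classifier, so the number of windows is a direct proxy for detection time; a larger stride shrinks that number but can step over the best-aligned window.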
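Classifier: input matrix => output vector
In the figure below, the output c is the confidence.
3. We actually have a hidden skill: designing a regression (fitting) layer.
Replace the one-hot vector of the classification layer with a general real-valued vector.
Detection: directly predict the target position (confidence c, x, y, w, h).
This design handles single-class detection. What about multiple classes?
To solve the classification part: append a one-hot vector.
To detect 10 classes, the output is fc[5+10], where the 10 extra values are the one-hot class vector.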
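As an illustration of the fc[5+10] layout, the sketch below builds the target vector for one object. The helper name `make_target` and the example values are hypothetical.

```python
def make_target(conf, box, class_id, num_classes):
    """Build [c, x, y, w, h, one-hot...] for a single object."""
    one_hot = [0.0] * num_classes
    one_hot[class_id] = 1.0
    return [conf, *box] + one_hot


if __name__ == "__main__":
    # One object of class 3 out of 10, box given as normalised (x, y, w, h).
    target = make_target(1.0, (0.2, 0.3, 0.4, 0.5), class_id=3, num_classes=10)
    print(len(target), target)
```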
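To handle multiple targets and classes: use multiple groups of outputs, one group per target.
The groups of outputs are strongly tied to position: a feature in the feature volume keeps its position relative to the whole image, i.e. the feature volume preserves spatial position.
The groups are therefore mapped by spatial location: N = W' * H'
At the same position a large target may appear at one time and a small target at another. What then?
If a large target appears, its centre point determines which position it belongs to.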
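The mapping from a box centre to its output group can be sketched as follows. This assumes centres normalised to [0, 1] and a W' × H' grid; the function name is hypothetical, and clamping a centre that lies exactly on the right or bottom edge into the last cell is a choice made for this sketch.

```python
def cell_of_center(cx, cy, grid_w, grid_h):
    """Return (row, col) of the grid cell that contains the normalised centre."""
    col = min(int(cx * grid_w), grid_w - 1)
    row = min(int(cy * grid_h), grid_h - 1)
    return row, col


if __name__ == "__main__":
    grid_w = grid_h = 7
    print("number of output groups N =", grid_w * grid_h)
    # A large box and a small box with the same centre land in the same cell.
    print(cell_of_center(0.5, 0.5, grid_w, grid_h))
```

Because only the centre decides the cell, the size of the object does not change which output group is responsible for it.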
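Alternatively, design two grids of different sizes, one responsible for large targets and one for small targets, and back-propagate the two losses separately.
This method is YOLO:
Forward computation:
The trained network directly predicts 98 boxes (7×7 cells × 2 boxes per cell). How do we get the final 3 targets?
Method 1: clustering. Cluster the boxes into 3 groups, then take the box with the highest confidence score in each group.
What if two of the targets are very close to each other?
What if two targets are far from the remaining one?
What if we do not know how many targets there are?
Method 1 fails.
Method 2: filter by p(object), i.e. the confidence c, keeping the boxes with the highest probability, in other words the highest score.
What if we do not know how many targets there are?
Methods 1 and 2 both fail.
Method 3: take, say, the three highest-scoring boxes. We cannot tell whether they are three targets, two, or one; a person can only judge after drawing them. Suppose an algorithm does that judging instead of our eyes: it checks whether a box represents the same target as another box. If it does, the box is not output and only the highest-scoring box of that target is kept. In other words, boxes that represent one target are grouped together and the one with the highest score is output.
Assumption: two boxes that overlap heavily represent the same object. If the assumption holds, we can "cluster" by overlap: compare boxes pairwise and treat boxes with large overlap as one object.
How is overlap measured? IoU (intersection over union): intersection area / union area, in the range [0, 1].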
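IoU is straightforward to compute for axis-aligned boxes. A minimal sketch, assuming boxes in (x1, y1, x2, y2) corner format (the same format the dataset loader later in this post produces); the function name is hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    # Width and height of the overlap region, clamped at 0 when boxes are disjoint.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # intersection 1, union 7
```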
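Computing the overlap for every pair of boxes is expensive. How can the cost be reduced? Non-maximum suppression (NMS).
This process of removing redundant boxes is non-maximum suppression (NMS). It is used when we do not know how many targets, and therefore how many final boxes, there are.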
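A common greedy form of NMS is sketched below; it is not necessarily the exact variant the figures in this post showed. Boxes are assumed to be in (x1, y1, x2, y2) format, and the function names and the 0.5 threshold are assumptions. Each candidate is compared only against boxes that have already been kept, which avoids comparing every pair.

```python
def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap heavily with a better box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

The number of kept boxes falls out of the procedure, so the number of targets does not have to be known in advance.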
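YOLO:
Summary of the detection network's output layer: designing the detection layer.
Regressed coordinates + one-hot classification:
fc => fc => (C, x, y, w, h, one-hot)
The fc layers extract global features.
Batch Normalization: the BN layer (helps with the vanishing-gradient problem)
At test time, BN uses exponentially weighted (running) averages of the mean and variance.
Apply BN first and the activation second: BN normalises the pre-activation values, which are less easily disturbed by training.
Activation functions such as sigmoid saturate towards both ends, where the gradient vanishes, so the inputs are squeezed to zero mean and unit variance.
Note also that the output of BN is not strictly zero-mean and unit-variance. Around 0 the sigmoid is nearly linear, so its non-linear capacity there is weak; some shift and scale are needed to keep plenty of non-linear expressiveness. BN therefore has a learnable scale and shift, so the mean and variance of its output are adjusted as the loss requires.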
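The two BN points above (a learnable scale and shift, and running averages for test time) can be sketched in plain Python on a single feature. The names `gamma`, `beta` and `momentum` follow common usage but are assumptions here, as is the update rule, which mirrors the convention used by PyTorch's BatchNorm layers.

```python
import math


def batch_norm_train(xs, gamma, beta, eps=1e-5):
    """Normalise a batch of scalars, then apply the learnable scale and shift."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    out = [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]
    return out, mean, var


def update_running(running, batch_value, momentum=0.1):
    """Exponentially weighted average of a batch statistic."""
    return (1.0 - momentum) * running + momentum * batch_value


def batch_norm_eval(x, running_mean, running_var, gamma, beta, eps=1e-5):
    """At test time the stored running statistics replace the batch statistics."""
    return gamma * (x - running_mean) / math.sqrt(running_var + eps) + beta


if __name__ == "__main__":
    out, mean, var = batch_norm_train([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=3.0)
    print(out, mean, var)
    print(update_running(0.0, mean))
```

With gamma = 1 and beta = 0 the output has mean 0 and variance close to 1; any other values move the activations away from the near-linear region around 0.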
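ReLU fixes the vanishing-gradient problem of sigmoid: the gradient of ReLU is 1 for positive inputs, so it does not shrink as it passes back through the layers.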
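The difference is easy to see numerically. A small sketch with hypothetical helper names; the derivative of the sigmoid is s(x)(1 - s(x)), which peaks at 0.25 and decays quickly away from 0.

```python
import math


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


if __name__ == "__main__":
    for x in (0.0, 2.0, 5.0, 10.0):
        print(x, sigmoid_grad(x), relu_grad(x))
```

Multiplying several sigmoid gradients together across layers drives the product towards 0, while a chain of active ReLU units multiplies by 1 each time.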
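How detection is evaluated: mAP (Mean Average Precision)
Evaluating a detection model involves three quantities: precision = correct boxes / all output boxes; recall = correct boxes / boxes that should have been output; and the confidence threshold.
Pushing recall up generally pushes precision down.
Judging a model with three quantities at once is awkward, so they need to be combined into one. AP summarises the precision-recall curve obtained by sweeping the threshold, and mAP is the mean of AP over the classes.
Recall = TP / (TP + FN), where TP is a true positive (P: predicted positive; T: the prediction is true) and FN is a false negative.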
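The relationship between threshold, precision and recall can be sketched as below. Each detection is assumed to be already matched against the ground truth and given as a `(score, is_true_positive)` pair; that matching step (usually an IoU test) is omitted. The AP here is the plain non-interpolated version; benchmark suites use their own interpolation rules, so this is an illustration rather than a reference implementation.

```python
def precision_recall(detections, num_gt, threshold):
    """Precision and recall when only detections with score >= threshold are output."""
    kept = [is_tp for score, is_tp in detections if score >= threshold]
    tp = sum(kept)
    # Convention assumed here: precision is 1.0 when nothing is output.
    precision = tp / len(kept) if kept else 1.0
    recall = tp / num_gt
    return precision, recall


def average_precision(detections, num_gt):
    """Non-interpolated AP: mean over ground truths of precision at each true positive."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = 0
    ap = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            tp += 1
            ap += tp / rank  # precision at the moment recall increases
    return ap / num_gt


if __name__ == "__main__":
    dets = [(0.9, True), (0.8, False), (0.7, True), (0.6, False)]
    for thr in (0.85, 0.5):
        print(thr, precision_recall(dets, num_gt=3, threshold=thr))
    print("AP =", average_precision(dets, num_gt=3))
```

Lowering the threshold from 0.85 to 0.5 raises recall and lowers precision in this example, and AP folds the whole sweep into one number. mAP would average this value over all classes.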
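YOLO's loss function:
The network downsamples 5 times, and 2^5 = 32, so the input size should be a multiple of 32. It is also best if the downsampled output has an odd side length: for example, with the 416×416 input below, 5 downsamplings give 13×13 (416/32 = 13). An odd-sized grid has a single centre cell.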
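The arithmetic behind the input-size rule, as a small sketch; the function name is hypothetical and 5 stride-2 downsamplings are assumed, as stated above.

```python
def grid_size(input_size, num_downsamples=5):
    """Side length of the output grid after repeated stride-2 downsampling."""
    stride = 2 ** num_downsamples
    if input_size % stride != 0:
        raise ValueError("input size should be a multiple of %d" % stride)
    return input_size // stride


if __name__ == "__main__":
    for size in (416, 448, 608):
        g = grid_size(size)
        print(size, "->", g, "x", g, "(odd)" if g % 2 else "(even)")
```

416 gives a 13×13 grid with a single centre cell, whereas 448 gives 14×14, where the image centre falls on the boundary between four cells.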
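The predictions here are relative positions:
Targets can be detected at the small scale as well as at the large scale:
The figure below shows 3 scales of different sizes, each with 3 candidate boxes of different sizes:
The 13×13 map is upsampled to 26×26 and then concatenated with the intermediate 26×26 feature map:
Every class whose score exceeds a threshold can be taken, which makes multi-label prediction possible: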
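One way to read the multi-label idea is sketched below: each class gets an independent sigmoid score instead of a shared softmax, and every class above the threshold is reported. The function name, the 0.5 threshold and the example logits are assumptions for illustration.

```python
import math


def multilabel_predict(logits, threshold=0.5):
    """Return the indices of all classes whose sigmoid score exceeds the threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [i for i, p in enumerate(probs) if p > threshold]


if __name__ == "__main__":
    # Two classes pass the threshold, so the box gets both labels.
    print(multilabel_predict([2.0, -1.0, 0.5]))
```

Unlike an argmax over a softmax, this can return zero, one or several classes for the same box.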
The initial version, YOLO v0:
Download address of the dataset used here: https://download.csdn.net/download/m0_37755995/86224061
PennFudanDataset_main.py, which loads the custom dataset:
import os
import numpy as np
import torch
from PIL import Image


class PennFudanDataset(object):
    def __init__(self, root, transforms):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

    # dataset[0]
    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)
        # convert the PIL Image into a numpy array
        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]

        # split the color-encoded mask into a set
        # of binary masks
        masks = mask == obj_ids[:, None, None]

        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        # convert everything into a torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)

        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    # len(dataset)
    def __len__(self):
        return len(self.imgs)


# "transforms" is a local helper module (transforms.py) that must sit next to
# this file; it is assumed to be the one from the torchvision detection
# reference scripts, whose transforms take and return (image, target) pairs.
import transforms as T


def get_transform(train):
    transforms = []
    transforms.append(T.ToTensor())
    if train:
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)
v0yolo_model.py
# coding: utf-8
import torch
import torch.nn as nn


class VGG(nn.Module):
    def __init__(self):
        super(VGG, self).__init__()
        # VGG-16 layout: numbers are conv output channels, 'M' is a 2x2 max-pool
        cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M']
        layers = []
        batch_norm = False
        in_channels = 3
        for v in cfg:
            if v == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
                if batch_norm:
                    # fixed: the class is nn.BatchNorm2d, not nn.Batchnorm2d
                    layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
                else:
                    layers += [conv2d, nn.ReLU(inplace=True)]
                in_channels = v
        # use the vgg layers to get the feature
        self.features = nn.Sequential(*layers)
        # average pooling to a fixed 7x7 map
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        # decision layers: classification head
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 1000),
        )

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 1)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.features(x)
        x_fea = x
        x = self.avgpool(x)
        x_avg = x
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x, x_fea, x_avg

    def extractor(self, x):
        x = self.features(x)
        return x


class YOLOV0(nn.Module):
    def __init__(self):
        super(YOLOV0, self).__init__()
        vgg = VGG()
        # Only the bound method is stored, so the VGG layers are not registered
        # as submodules of YOLOV0 and do not appear in self.parameters().
        self.extractor = vgg.extractor
        # this avgpool plays a role similar to ROI pooling
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        # decision layers: detection head
        self.detector = nn.Sequential(
            # the input size must match the flattened feature map: 25088 => 4096
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            # nn.Linear(4096, 1470),
            # 5 stands for [c, x, y, w, h]
            nn.Linear(4096, 5),
        )
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 1)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        # extract features with VGG; for a 512x512 input: torch.Size([1, 512, 16, 16])
        x = self.extractor(x)
        # average pooling shrinks height and width: torch.Size([1, 512, 7, 7])
        x = self.avgpool(x)
        # flatten: torch.Size([1, 25088])
        x = x.view(x.size(0), -1)
        # detection head: torch.Size([1, 5])
        x = self.detector(x)
        b, _ = x.shape
        # x = x.view(b, 7, 7, 30)
        # Instead of b*7*7*30, only one target is detected here: a single [1, 5]
        # vector that is fitted directly to one box.
        x = x.view(b, 1, 1, 5)
        return x


if __name__ == '__main__':
    vgg = VGG()
    # x is a random input
    x = torch.randn(1, 3, 512, 512)
    feature, x_fea, x_avg = vgg(x)
    # torch.Size([1, 1000])
    # torch.Size([1, 512, 16, 16])
    # torch.Size([1, 512, 7, 7])
    print(feature.shape)
    print(x_fea.shape)
    print(x_avg.shape)

    yolov0 = YOLOV0()
    # initial YOLO version: output is 1*1*1*5, where 5 stands for [c, x, y, w, h]
    feature = yolov0(x)
    # torch.Size([1, 1, 1, 5])
    print(feature.shape)
    print(feature)
v0yolotrain.py
# coding: utf-8
from PennFudanDataset_main import *
import torch
import torch.optim as optim
from torch.optim import lr_scheduler
from v0yolo_model import *
import cv2
import time

## data processing
# Placeholder path: point it at the PennFudanPed directory, for example
# datapath = '/Users/zhaomingming/data_sets/PennFudanPed'
datapath = 'zhaomingming'
dataset = PennFudanDataset(datapath, get_transform(train=False))
dataset_test = PennFudanDataset(datapath, get_transform(train=False))
indices = torch.randperm(len(dataset)).tolist()
# full train/test split, disabled for this experiment:
# dataset = torch.utils.data.Subset(dataset, indices[:-50])
# dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:])
# train on the single image with index 0
dataset = torch.utils.data.Subset(dataset, [0])
dataset_test = torch.utils.data.Subset(dataset_test, indices[0:2])


def collate_fn(batch):
    return tuple(zip(*batch))


# define training and validation data loaders
train_loader = torch.utils.data.DataLoader(
    dataset, batch_size=1, shuffle=False, num_workers=1, collate_fn=collate_fn)
val_loader = torch.utils.data.DataLoader(
    dataset_test, batch_size=2, shuffle=False, num_workers=4, collate_fn=collate_fn)


def input_process(batch):
    # batch[0] holds the images; its length is the number of images
    batch_size = len(batch[0])
    # network input: 3-channel 448x448 images
    input_batch = torch.zeros(batch_size, 3, 448, 448)
    for i in range(batch_size):
        inputs_tmp = batch[0][i]
        # CHW tensor -> HWC array for cv2.resize, then back to CHW
        inputs_tmp1 = cv2.resize(inputs_tmp.permute([1, 2, 0]).numpy(), (448, 448))
        inputs_tmp2 = torch.tensor(inputs_tmp1).permute([2, 0, 1])
        input_batch[i:i+1, :, :, :] = torch.unsqueeze(inputs_tmp2, 0)
    return input_batch


def target_process(batch):
    # batch[0] holds the images, batch[1] the labels
    batch_size = len(batch[0])
    target_batch = torch.zeros(batch_size, 1, 1, 5)
    for i in range(batch_size):
        # only the first box of each image is used
        bbox = batch[1][i]['boxes'][0]
        # image shape is (channels, height, width)
        _, hi, wi = batch[0][i].numpy().shape
        # normalise the box so every value lies between 0 and 1
        bbox = bbox / torch.tensor([wi, hi, wi, hi])
        # The ground-truth confidence is 1. Note that the dataset stores boxes as
        # [xmin, ymin, xmax, ymax], so the target is [c, xmin, ymin, xmax, ymax].
        cbbox = torch.cat([torch.ones(1), bbox])
        # write it into the 4-D target tensor
        target_batch[i:i+1, :, :, :] = torch.unsqueeze(cbbox, 0)
    return target_batch


num_classes = 2
n_class = 2
batch_size = 6
epochs = 500
lr = 1e-3
momentum = 0
w_decay = 1e-5
step_size = 50
gamma = 0.5

# define the model
yolov0_model = YOLOV0()
# optimizer: SGD (stochastic gradient descent), applied to the detection head only
optimizer = optim.SGD(yolov0_model.detector.parameters(), lr=lr, momentum=momentum, weight_decay=w_decay)
# learning-rate schedule: multiply the LR by gamma (0.5) every step_size (50) epochs
scheduler = lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=gamma)


# matrix form of the loss: short to write, but harder to read
def lossfunc(outputs, labels):
    tmp = (outputs - labels) ** 2
    return torch.sum(tmp, 0).view(1, 5).mm(torch.tensor([10, 0.0001, 0.0001, 0.0001, 0.0001]).view(5, 1))


# explicit form of the direct-fitting loss: easier to read
def lossfunc_details(outputs, labels):
    # check the shapes
    assert (outputs.shape == labels.shape), "outputs shape[%s] not equal labels shape[%s]" % (outputs.shape, labels.shape)
    b, w, h, c = outputs.shape  # [b, 1, 1, 5]
    loss = 0
    # accumulate the loss of every cell of every image in the batch
    for bi in range(b):
        for wi in range(w):
            for hi in range(h):
                # detect_vector = [confidence, x, y, w, h]
                detect_vector = outputs[bi, wi, hi]
                gt_dv = labels[bi, wi, hi]
                conf_pred = detect_vector[0]
                conf_gt = gt_dv[0]
                x_pred = detect_vector[1]
                x_gt = gt_dv[1]
                y_pred = detect_vector[2]
                y_gt = gt_dv[2]
                w_pred = detect_vector[3]
                w_gt = gt_dv[3]
                h_pred = detect_vector[4]
                h_gt = gt_dv[4]
                loss_confidence = (conf_pred - conf_gt) ** 2
                # loss_geo = (x_pred-x_gt)**2 + (y_pred-y_gt)**2 + (w_pred**0.5-w_gt**0.5)**2 + (h_pred**0.5-h_gt**0.5)**2
                loss_geo = (x_pred - x_gt) ** 2 + (y_pred - y_gt) ** 2 + (w_pred - w_gt) ** 2 + (h_pred - h_gt) ** 2
                loss_tmp = loss_confidence + 0.3 * loss_geo
                loss += loss_tmp
    return loss


# train
def train():
    for epoch in range(epochs):
        ts = time.time()
        for iter, batch in enumerate(train_loader):
            # reset the gradients
            optimizer.zero_grad()
            # images
            inputs = input_process(batch)
            print('inputs.shape:', inputs.shape)
            # annotations, in the same layout as the model output
            labels = target_process(batch)
            print('labels:', labels)
            # forward pass
            outputs = yolov0_model(inputs)
            print('outputs:', outputs)
            # loss between prediction and ground truth
            loss = lossfunc_details(outputs, labels)
            loss.backward()
            optimizer.step()
            if iter % 10 == 0:
                print("epoch{}, iter{}, loss: {}, lr: {}".format(
                    epoch, iter, loss.item(), optimizer.state_dict()['param_groups'][0]['lr']))
        # print("Finish epoch {}, time elapsed {}".format(epoch, time.time() - ts))
        # val(epoch)
        scheduler.step()


# inference
# The original val() was left over from a segmentation script and referred to an
# undefined yolov1_model; this replacement only reports the validation loss.
def val(epoch):
    yolov0_model.eval()
    with torch.no_grad():
        for iter, batch in enumerate(val_loader):
            inputs = input_process(batch)
            labels = target_process(batch)
            outputs = yolov0_model(inputs)
            loss = lossfunc_details(outputs, labels)
            print("val epoch{}, iter{}, loss: {}".format(epoch, iter, loss.item()))
    yolov0_model.train()


if __name__ == "__main__":
    train()
YOLO v1:
PennFudanDataset_main.py, which loads the custom dataset, is the same file as in the v0 section above.
v1yolomodel.py
# coding: utf-8
import torch
import torch.nn as nn


class VGG(nn.Module):
    def __init__(self):
        super(VGG, self).__init__()
        # VGG-16 layout: numbers are conv output channels, 'M' is a 2x2 max-pool
        cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M']
        layers = []
        batch_norm = False
        in_channels = 3
        for v in cfg:
            if v == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
                if batch_norm:
                    # fixed: the class is nn.BatchNorm2d, not nn.Batchnorm2d
                    layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
                else:
                    layers += [conv2d, nn.ReLU(inplace=True)]
                in_channels = v
        # use the vgg layers to get the feature
        self.features = nn.Sequential(*layers)
        # average pooling to a fixed 7x7 map
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        # decision layers: classification head
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 1000),
        )

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 1)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.features(x)
        x_fea = x
        x = self.avgpool(x)
        x_avg = x
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x, x_fea, x_avg

    def extractor(self, x):
        x = self.features(x)
        return x


class YOLOV1(nn.Module):
    def __init__(self):
        super(YOLOV1, self).__init__()
        vgg = VGG()
        # Only the bound method is stored, so the VGG layers are not registered
        # as submodules of YOLOV1 and do not appear in self.parameters().
        self.extractor = vgg.extractor
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        # decision layers: detection head
        self.detector = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            # nn.Linear(4096, 1470),
            # 245 = 7 * 7 * 5: one [c, x, y, w, h] vector per grid cell
            nn.Linear(4096, 245),
            # nn.Linear(4096, 5),
        )
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 1)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.extractor(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        # detector output: torch.Size([1, 245]), where 245 = 7*7*5
        x = self.detector(x)
        b, _ = x.shape
        # x = x.view(b, 7, 7, 30)
        # only boxes are predicted here, no class scores
        x = x.view(b, 7, 7, 5)
        return x


if __name__ == '__main__':
    vgg = VGG()
    x = torch.randn(1, 3, 512, 512)
    feature, x_fea, x_avg = vgg(x)
    print(feature.shape)
    print(x_fea.shape)
    print(x_avg.shape)

    yolov1 = YOLOV1()
    feature = yolov1(x)
    # feature.shape: torch.Size([1, 7, 7, 5])
    print('feature.shape:', feature.shape)
    print(feature)
v1yolotrain.py
# coding: utf-8
from PennFudanDataset_main import *
import torch
import torch.optim as optim
from torch.optim import lr_scheduler
from v1yolomodel import *
import cv2
import time

## data processing
# path of the PennFudanPed directory on the author's machine; adjust as needed
datapath = '/Users/zhaomingming/data_sets/PennFudanPed'
dataset = PennFudanDataset(datapath, get_transform(train=False))
dataset_test = PennFudanDataset(datapath, get_transform(train=False))
indices = torch.randperm(len(dataset)).tolist()
# full train/test split, disabled for this experiment:
# dataset = torch.utils.data.Subset(dataset, indices[:-50])
# dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:])
# train on the single image with index 0
dataset = torch.utils.data.Subset(dataset, [0])
dataset_test = torch.utils.data.Subset(dataset_test, indices[0:2])


def collate_fn(batch):
    return tuple(zip(*batch))


# define training and validation data loaders
train_loader = torch.utils.data.DataLoader(
    dataset, batch_size=1, shuffle=False, num_workers=1, collate_fn=collate_fn)
val_loader = torch.utils.data.DataLoader(
    dataset_test, batch_size=2, shuffle=False, num_workers=4, collate_fn=collate_fn)


def input_process(batch):
    batch_size = len(batch[0])
    input_batch = torch.zeros(batch_size, 3, 448, 448)
    for i in range(batch_size):
        inputs_tmp = batch[0][i]
        # CHW tensor -> HWC array for cv2.resize, then back to CHW
        inputs_tmp1 = cv2.resize(inputs_tmp.permute([1, 2, 0]).numpy(), (448, 448))
        inputs_tmp2 = torch.tensor(inputs_tmp1).permute([2, 0, 1])
        input_batch[i:i+1, :, :, :] = torch.unsqueeze(inputs_tmp2, 0)
    return input_batch


def target_process(batch, grid_number=7):
    # The cell that contains a box centre gets confidence 1; all other cells stay 0.
    # batch[0] holds the images, batch[1] the labels
    batch_size = len(batch[0])
    target_batch = torch.zeros(batch_size, grid_number, grid_number, 5)
    for i in range(batch_size):
        labels = batch[1]
        batch_labels = labels[i]
        number_box = len(batch_labels['boxes'])
        for wi in range(grid_number):
            for hi in range(grid_number):
                # loop over every annotated box
                for bi in range(number_box):
                    bbox = batch_labels['boxes'][bi]
                    _, himg, wimg = batch[0][i].numpy().shape
                    # normalise the box to [0, 1]
                    bbox = bbox / torch.tensor([wimg, himg, wimg, himg])
                    # centre of the box
                    center_x = (bbox[0] + bbox[2]) * 0.5
                    center_y = (bbox[1] + bbox[3]) * 0.5
                    # Does the centre fall in the current cell? The output here is 7*7*5;
                    # with 7*7*10 we would also have to decide between a large and a small box.
                    if center_x <= (wi + 1) / grid_number and center_x >= wi / grid_number and center_y <= (hi + 1) / grid_number and center_y >= hi / grid_number:
                        # the centre lies inside this grid cell
                        cbbox = torch.cat([torch.ones(1), bbox])
                        target_batch[i:i+1, wi:wi+1, hi:hi+1, :] = torch.unsqueeze(cbbox, 0)
    return target_batch


num_classes = 2
n_class = 2
batch_size = 6
epochs = 500
lr = 1e-3
momentum = 0
w_decay = 1e-5
step_size = 50
gamma = 0.5

# define the model
yolov1_model = YOLOV1()
# (the original stopped in pdb.set_trace() here; removed so the script runs unattended)
# optimizer: SGD (stochastic gradient descent), applied to the detection head only
optimizer = optim.SGD(yolov1_model.detector.parameters(), lr=lr, momentum=momentum, weight_decay=w_decay)
# learning-rate schedule: multiply the LR by gamma (0.5) every step_size (50) epochs
scheduler = lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=gamma)


# matrix form of the loss: short to write, but harder to read
def lossfunc(outputs, labels):
    tmp = (outputs - labels) ** 2
    # sum over batch and grid dimensions so that the 5 channels remain
    return torch.sum(tmp, dim=(0, 1, 2)).view(1, 5).mm(torch.tensor([10, 0.0001, 0.0001, 0.0001, 0.0001]).view(5, 1))


# explicit form of the direct-fitting loss: easier to read
def lossfunc_details(outputs, labels):
    # check the shapes
    assert (outputs.shape == labels.shape), "outputs shape[%s] not equal labels shape[%s]" % (outputs.shape, labels.shape)
    b, w, h, c = outputs.shape
    loss = 0
    conf_loss_matrix = torch.zeros(b, w, h)
    geo_loss_matrix = torch.zeros(b, w, h)
    loss_matrix = torch.zeros(b, w, h)
    for bi in range(b):
        for wi in range(w):
            for hi in range(h):
                # detect_vector = [confidence, x, y, w, h]
                detect_vector = outputs[bi, wi, hi]
                gt_dv = labels[bi, wi, hi]
                conf_pred = detect_vector[0]
                conf_gt = gt_dv[0]
                x_pred = detect_vector[1]
                x_gt = gt_dv[1]
                y_pred = detect_vector[2]
                y_gt = gt_dv[2]
                w_pred = detect_vector[3]
                w_gt = gt_dv[3]
                h_pred = detect_vector[4]
                h_gt = gt_dv[4]
                loss_confidence = (conf_pred - conf_gt) ** 2
                # loss_geo = (x_pred-x_gt)**2 + (y_pred-y_gt)**2 + (w_pred**0.5-w_gt**0.5)**2 + (h_pred**0.5-h_gt**0.5)**2
                loss_geo = (x_pred - x_gt) ** 2 + (y_pred - y_gt) ** 2 + (w_pred - w_gt) ** 2 + (h_pred - h_gt) ** 2
                # the geometric loss only counts in cells that contain a target
                loss_geo = conf_gt * loss_geo
                loss_tmp = loss_confidence + 0.3 * loss_geo
                loss += loss_tmp
                conf_loss_matrix[bi, wi, hi] = loss_confidence
                geo_loss_matrix[bi, wi, hi] = loss_geo
                loss_matrix[bi, wi, hi] = loss_tmp
    # print the per-cell position loss for the batch, and which cells of the
    # first image predict a confidence above 0.5
    print(geo_loss_matrix)
    print(outputs[0, :, :, 0] > 0.5)
    return loss, loss_matrix, geo_loss_matrix, conf_loss_matrix


# train
def train():
    for epoch in range(epochs):
        ts = time.time()
        for iter, batch in enumerate(train_loader):
            optimizer.zero_grad()
            # images
            inputs = input_process(batch)
            # annotations
            labels = target_process(batch)
            # forward pass
            outputs = yolov1_model(inputs)
            loss, lm, glm, clm = lossfunc_details(outputs, labels)
            loss.backward()
            optimizer.step()
            if iter % 10 == 0:
                print("epoch{}, iter{}, loss: {}, lr: {}".format(
                    epoch, iter, loss.item(), optimizer.state_dict()['param_groups'][0]['lr']))
        # print("Finish epoch {}, time elapsed {}".format(epoch, time.time() - ts))
        # val(epoch)
        scheduler.step()


# inference
# The original val() was left over from a segmentation script and could not run
# on a [b, 7, 7, 5] output; this replacement only reports the validation loss.
def val(epoch):
    yolov1_model.eval()
    with torch.no_grad():
        for iter, batch in enumerate(val_loader):
            inputs = input_process(batch)
            labels = target_process(batch)
            outputs = yolov1_model(inputs)
            loss, _, _, _ = lossfunc_details(outputs, labels)
            print("val epoch{}, iter{}, loss: {}".format(epoch, iter, loss.item()))
    yolov1_model.train()


if __name__ == "__main__":
    train()