YOLOv2 object detector doesn't work

1 view (last 30 days)
Mario
Mario on 1 Sep 2024
Commented: Vivek Akkala on 1 Oct 2024
Hi all.
I'm trying to build and train a YOLOv2 network, following the tutorial on this website. My network has to detect bicycles.
I have a problem: bboxes and labels after training are always empty. Moreover, when I preprocess my data and show the result, the rectangles move and don't point to the correct object, as in this image:
I don't know what to do. I'm training my network with 300 images.
This is my code:
function B = augmentData(A)
% Apply random horizontal flipping and random X/Y scaling. Boxes that are
% scaled outside the bounds are clipped if the overlap is above 0.25. Also
% jitter the image color.
B = cell(size(A));
I = A{1};
sz = size(I);
if numel(sz) == 3 && sz(3) == 3
    I = jitterColorHSV(I, ...
        "Contrast",0.2, ...
        "Hue",0, ...
        "Saturation",0.1, ...
        "Brightness",0.2);
end
% Randomly flip and scale the image.
tform = randomAffine2d("XReflection",true,"Scale",[1 1.1]);
rout = affineOutputView(sz,tform,"BoundsStyle","CenterOutput");
B{1} = imwarp(I,tform,"OutputView",rout);
% Sanitize boxes, if needed. This helper function is attached as a
% supporting file. Open the example in MATLAB to access this function.
A{2} = helperSanitizeBoxes(A{2});
% Apply the same transform to the boxes.
[B{2},indices] = bboxwarp(A{2},tform,rout,"OverlapThreshold",0.25);
B{3} = A{3}(indices);
% Return the original data only when all boxes are removed by warping.
if isempty(indices)
    B = A;
end
end
function data = preprocessData(data, inputSize)
data{1} = imresize(data{1}, inputSize(1:2));
% Convert grayscale images to RGB by replicating the single channel.
if size(data{1}, 3) == 1
    data{1} = cat(3, data{1}, data{1}, data{1});
end
% Resize the bounding boxes.
scale = inputSize(1:2) ./ size(data{1}, 1:2);
data{2} = bboxresize(data{2}, scale);
end
function boxes = helperSanitizeBoxes(boxes, ~)
persistent hasInvalidBoxes
valid = all(boxes > 0, 2);
if any(valid)
    if ~all(valid) && isempty(hasInvalidBoxes)
        % Issue a one-time warning about removing invalid boxes.
        hasInvalidBoxes = true;
        warning('Removing ground truth bounding box data with values <= 0.')
    end
    boxes = boxes(valid,:);
end
end
arch = imageDatastore(".\immaginiBici");
%imageLabeler(arch)
rng(0);
shuffledIndices = randperm(height(gTruth300));
idx = floor(0.6 * length(shuffledIndices) );
trainingIdx = 1:idx;
trainingDataTbl = gTruth300(shuffledIndices(trainingIdx),:);
validationIdx = idx+1 : idx + 1 + floor(0.1 * length(shuffledIndices) );
validationDataTbl = gTruth300(shuffledIndices(validationIdx),:);
testIdx = validationIdx(end)+1 : length(shuffledIndices);
testDataTbl = gTruth300(shuffledIndices(testIdx),:);
imdsTrain = imageDatastore(trainingDataTbl{:,"imageFilename"});
bldsTrain = boxLabelDatastore(trainingDataTbl(:,"Bicicletta"));
imdsValidation = imageDatastore(validationDataTbl{:,"imageFilename"});
bldsValidation = boxLabelDatastore(validationDataTbl(:,"Bicicletta"));
imdsTest = imageDatastore(testDataTbl{:,"imageFilename"});
bldsTest = boxLabelDatastore(testDataTbl(:,"Bicicletta"));
trainingData = combine(imdsTrain,bldsTrain);
validationData = combine(imdsValidation,bldsValidation);
testData = combine(imdsTest,bldsTest);
data = read(trainingData);
I = data{1};
bbox = data{2};
annotatedImage = insertShape(I,"rectangle",bbox);
annotatedImage = imresize(annotatedImage,2);
figure
imshow(annotatedImage)
augmentedTrainingData = transform(trainingData,@augmentData);
augmentedData = cell(4,1);
for k = 1:4
data = read(augmentedTrainingData);
augmentedData{k} = insertShape(data{1},"rectangle",data{2});
reset(augmentedTrainingData);
end
figure
montage(augmentedData,"BorderSize",10)
imageInputSize = [300 300 3];
preprocessedTrainingData = transform(augmentedTrainingData,@(data)preprocessData(data,imageInputSize));
preprocessedValidationData = transform(validationData,@(data)preprocessData(data,imageInputSize));
data = read(preprocessedTrainingData);
data = read(preprocessedTrainingData);
I = data{1};
bbox = data{2};
annotatedImage = insertShape(I,"rectangle",bbox);
annotatedImage = imresize(annotatedImage,imageInputSize(1:2));
figure
imshow(annotatedImage)
net = mobilenetv2();
lgraph = layerGraph(net);
imgLayer = imageInputLayer(imageInputSize,"Name","input_1")
lgraph = replaceLayer(lgraph,"input_1",imgLayer);
featureExtractionLayer = "block_12_add";
index = find(strcmp({lgraph.Layers(1:end).Name},featureExtractionLayer));
lgraph = removeLayers(lgraph,{lgraph.Layers(index+1:end).Name});
filterSize = [3 3];
numFilters = 96;
detectionLayers = [
convolution2dLayer(filterSize,numFilters,"Name","yolov2Conv1","Padding", "same", "WeightsInitializer",@(sz)randn(sz)*0.01)
batchNormalizationLayer("Name","yolov2Batch1")
reluLayer("Name","yolov2Relu1")
convolution2dLayer(filterSize,numFilters,"Name","yolov2Conv2","Padding", "same", "WeightsInitializer",@(sz)randn(sz)*0.01)
batchNormalizationLayer("Name","yolov2Batch2")
reluLayer("Name","yolov2Relu2")
]
numClasses = 1;
numAnchors = 5
[anchorBoxes, meanIoU] = estimateAnchorBoxes(preprocessedTrainingData, numAnchors)
numPredictionsPerAnchor = 5;
numFiltersInLastConvLayer = numAnchors*(numClasses+numPredictionsPerAnchor);
detectionLayers = [
detectionLayers
convolution2dLayer(1,numFiltersInLastConvLayer,"Name","yolov2ClassConv",...
"WeightsInitializer", @(sz)randn(sz)*0.01)
yolov2TransformLayer(numAnchors,"Name","yolov2Transform")
yolov2OutputLayer(anchorBoxes,"Name","yolov2OutputLayer")
]
lgraph = addLayers(lgraph,detectionLayers);
lgraph = connectLayers(lgraph,featureExtractionLayer,"yolov2Conv1");
options = trainingOptions("sgdm", ...
    "MiniBatchSize",16, ...
    "InitialLearnRate",1e-3, ...
    "MaxEpochs",20, ...
    "CheckpointPath",tempdir, ...
    "ValidationData",preprocessedValidationData);
[detector,info] = trainYOLOv2ObjectDetector(preprocessedTrainingData,lgraph,options);
I = imread("fotoProva.jpg");
I = imresize(I,imageInputSize(1:2));
[bboxes,labels] = detect(detector,I, "Threshold", 0.1);
I = insertObjectAnnotation(I,"rectangle",bboxes,labels);
figure
imshow(I)
preprocessedTestData = transform(testData,@(data)preprocessData(data,imageInputSize));
detectionResults = detect(detector, preprocessedTestData, "Threshold", 0);
metrics = evaluateObjectDetection(detectionResults,preprocessedTestData);
classID = 1;
precision = metrics.ClassMetrics.Precision{classID};
recall = metrics.ClassMetrics.Recall{classID};
figure
plot(recall,precision)
xlabel("Recall")
ylabel("Precision")
grid on
title(sprintf("Average Precision = %.2f",metrics.ClassMetrics.mAP(classID)))

Answers (1)

Shivansh
Shivansh on 2 Sep 2024
Hello Mario,
The problem appears to be in the preprocessing step. The "preprocessData" function should scale both the image and the bounding boxes consistently: the scale factor passed to "bboxresize" must be computed from the original image size, before the image is resized. Also verify that the bounding boxes are still aligned with the image after the augmentation step in the "augmentData" function.
If the issue persists, share the code for "preprocessData" along with a few sample images.
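As a minimal sketch of what a corrected "preprocessData" could look like (assuming the datastore yields {image, boxes, labels} cells as in typical YOLOv2 training examples), note that the original size must be captured before calling "imresize":

```matlab
function data = preprocessData(data, inputSize)
% Capture the original size BEFORE resizing; otherwise the scale is [1 1]
% and the boxes are never moved, so they drift off the objects.
origSize = size(data{1}, 1:2);
data{1} = imresize(data{1}, inputSize(1:2));
% Convert grayscale images to RGB by replicating the single channel.
if size(data{1}, 3) == 1
    data{1} = cat(3, data{1}, data{1}, data{1});
end
% Scale the boxes by the ratio of the target size to the original size.
scale = inputSize(1:2) ./ origSize;
data{2} = bboxresize(data{2}, scale);
end
```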
You can refer to the following documentation for more information:
  1. yolov2ObjectDetector: https://www.mathworks.com/help/vision/ref/yolov2objectdetector.html.
  2. Augment images for deep learning workflows: https://www.mathworks.com/help/deeplearning/ug/image-augmentation-using-image-processing-toolbox.html.
  3. imageDataAugmenter: https://www.mathworks.com/help/deeplearning/ref/imagedataaugmenter.html.
I hope the above information helps in resolving the issue.
  2 Comments
Mario
Mario on 2 Sep 2024
Edited: Mario on 2 Sep 2024
Hi, I modified the "preprocessData" function and now it works. The network finds bicycles in images, but the average precision is about 0.6 and the bicycle isn't fully inside the rectangle. What can I do?
This is an example:
Vivek Akkala
Vivek Akkala on 1 Oct 2024
Hi Mario,
You might consider adding more training images and retraining the network, or updating the detector from YOLO v2 to YOLO v3 or YOLO v4.
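For instance, a rough sketch of switching to a pretrained YOLO v4 backbone (assuming the preprocessedTrainingData, imageInputSize, and options from the question; the anchor split is one reasonable choice, not the only one):

```matlab
% YOLO v4 with the csp-darknet53-coco backbone has three detection heads,
% so estimate 9 anchors and distribute them across the heads by area.
anchors = estimateAnchorBoxes(preprocessedTrainingData, 9);
[~,idx] = sort(anchors(:,1).*anchors(:,2), "descend");
anchors = anchors(idx,:);
anchorBoxes = {anchors(1:3,:); anchors(4:6,:); anchors(7:9,:)};
detector = yolov4ObjectDetector("csp-darknet53-coco", "Bicicletta", anchorBoxes, ...
    InputSize=imageInputSize);
[detector,info] = trainYOLOv4ObjectDetector(preprocessedTrainingData,detector,options);
```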


Release

R2024a
