What am I doing wrong?
- My training data is a table with 6 columns: column 1 is the image filename and columns 2-6 are the bounding boxes of objects to detect.
- Only one class is displayed in each image.
- I have used resnet50 and vgg16.
- I have used adam and sgdm.
- I have tried mini-batch sizes from 8 to 64.
Do I need to add a seventh column to the data containing the bounding box(es) of background items?
The objects I am trying to detect and classify are red 4x5 checkerboard patterns with blue digits 0-5 in the center. If you filter out the green and blue channels, you get the checkerboard. If you filter out the green and red channels, you get the digit. I have also tried training the SSD with just the digit patterns, but that did not seem to make any difference.
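For illustration only, the channel separation described above can be sketched in plain Python. This is not the actual training pipeline: the `keep_channel` helper and the toy 2x2 image are hypothetical, and a real image would come from an image library rather than nested lists.

```python
def keep_channel(image, channel):
    """Return a copy of an RGB image with all but one channel zeroed.

    image:   list of rows, each row a list of (r, g, b) tuples
    channel: 0 = red, 1 = green, 2 = blue
    """
    return [
        [tuple(v if i == channel else 0 for i, v in enumerate(px)) for px in row]
        for row in image
    ]


# Toy 2x2 image (hypothetical): red checkerboard squares plus one blue digit pixel.
image = [
    [(255, 0, 0), (0, 0, 0)],
    [(0, 0, 255), (255, 0, 0)],
]

# Dropping green and blue leaves only the red checkerboard.
checkerboard_only = keep_channel(image, 0)

# Dropping green and red leaves only the blue digit.
digit_only = keep_channel(image, 2)

print(checkerboard_only)
print(digit_only)
```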
At best, the detector finds any checkerboard pattern only 15% of the time, and it identifies the correct checkerboard/digit combination in only 25% of those detections, giving me an overall success rate of only 3.75%.
For comparison, an SSD trained to detect the 6 checkerboard patterns as a single class achieves at least an 80% success rate. Using a second detector to then identify the digit gives a 75% total success rate, but at the cost of two-thirds of my frame rate.
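The arithmetic behind these figures can be checked with a short Python sketch. The variable names are illustrative, the stages are assumed to be independent, and the single-class rate is taken as exactly 80% even though it is stated as a lower bound, so the implied digit-stage rate is only an upper estimate.

```python
def chained_rate(*stage_rates):
    """Multiply per-stage success rates, assuming the stages are independent."""
    total = 1.0
    for rate in stage_rates:
        total *= rate
    return total


# Multiclass SSD: 15% detection rate, 25% of those detections labeled correctly.
multiclass_total = chained_rate(0.15, 0.25)

# Two-stage approach: single-class detector (assumed exactly 80%),
# then a second detector for the digit, 75% overall.
single_class_rate = 0.80
two_stage_total = 0.75
implied_digit_stage = two_stage_total / single_class_rate

# Losing two-thirds of the frame rate leaves one-third of it.
relative_fps = 1 - 2 / 3

print(f"multiclass overall:     {multiclass_total:.2%}")
print(f"implied digit stage:    {implied_digit_stage:.2%}")
print(f"two-stage relative fps: {relative_fps:.2f}x")
print(f"accuracy gain:          {two_stage_total / multiclass_total:.0f}x")
```

Under these assumptions, the two-stage approach is about 20 times more accurate than the multiclass detector while running at roughly a third of the speed.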
Hence the need for a multiclass SSD detector.