How to train a bag of words for pattern recognition problems in a neural network
Hello, I am using the bag-of-words concept to encode each image as a feature vector over 500 visual words.
I partitioned the data set 70/30 for training and testing. I was creating two bags of words, one for training and one for testing, and encoding each partition with its own bag. I think this is a fundamental mistake, because the vocabulary should be built from the whole image data set of 400 images, not separately from 320 and 80 or 280 and 120 images (70/30).
So I created a single bag of words from all 400 images, with the default vocabulary of 500 visual words. From it I then encoded 280 images for the training set and 20 query images from the testing partition for the testing phase.
Is this method correct, or do I need to create two separate bags of words?
Greg Heath on 7 Jul 2016
Edited: Greg Heath on 14 Aug 2016
The general assumptions are:
1. Training and nontraining data (i.e., validation, test and unseen data) have the same summary statistics.
2. Statistics of the training set are used to design the net.
3. Investigations of nonstationary statistics are implemented to explain significant differences between performances on training and nontraining data.
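One way to read assumption 2 for the bag-of-words case is that the vocabulary is itself a training-set statistic: it would be learned from the training images only, and that single vocabulary would then be used to encode both partitions. The sketch below is an illustration of that reading, not a definitive implementation. It is in Python rather than MATLAB, uses synthetic random descriptors in place of real image features, uses 8 words instead of 500, and assumes plain k-means for the clustering; all function names (`build_vocabulary`, `encode`, `nearest`) are hypothetical.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two descriptors
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centers, d):
    # Index of the visual word (cluster center) closest to descriptor d
    return min(range(len(centers)), key=lambda k: sq_dist(centers[k], d))

def build_vocabulary(descriptors, n_words, n_iter=10, seed=0):
    # Plain k-means (an assumption) run on TRAINING descriptors only
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(descriptors, n_words)]
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for d in descriptors:
            groups[nearest(centers, d)].append(d)
        for k, g in enumerate(groups):
            if g:  # keep the old center if a cluster ends up empty
                centers[k] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def encode(image_descriptors, centers):
    # Normalized histogram of visual-word counts for one image
    hist = [0.0] * len(centers)
    for d in image_descriptors:
        hist[nearest(centers, d)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Synthetic stand-in data: 40 "images", each with 15 four-dimensional descriptors
rng = random.Random(1)
images = [[[rng.gauss(0, 1) for _ in range(4)] for _ in range(15)]
          for _ in range(40)]

# 70/30 split, done BEFORE the vocabulary is built
n_train = len(images) * 7 // 10
train_images, test_images = images[:n_train], images[n_train:]

# One vocabulary, learned from the training partition only
train_descriptors = [d for img in train_images for d in img]
vocabulary = build_vocabulary(train_descriptors, n_words=8)

# Both partitions are encoded with that same vocabulary
X_train = [encode(img, vocabulary) for img in train_images]
X_test = [encode(img, vocabulary) for img in test_images]
```

Under this reading, two separate bags of words would give training and test feature vectors whose bins mean different things, while a bag built from all 400 images would let test-set statistics influence the design.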
Hope this helps.
Thank you for formally accepting my answer.