How to represent gray scale images as affine subspaces?
4 Comments
Walter Roberson
on 27 Oct 2023
This is not a topic I know anything about.
Answers (4)
Image Analyst
on 23 Oct 2023
I don't know what you mean. What's the context? What do you mean by "model"? What do you mean by "affine subspaces"? Do you just want to warp or spatially transform the image?
If you have any more questions, then attach your data, and the code to read it in, with the paperclip icon.
1 Comment
One way, I suppose, would be to train an affine neural network with the Deep Learning Toolbox, e.g.,
layers=[imageInputLayer([120,160,1])
        convolution2dLayer([120,160],N)
        regressionLayer];
XTrain=images;
YTrain=zeros(1,1,N,size(XTrain,4));
net=trainNetwork(XTrain, YTrain, layers, .... )
but you would need a truly huge number of images and good regularization for the training to be well-posed. You should probably look at augmentedImageDatastore.
34 Comments
M
on 23 Oct 2023
this is a normal CNN
I don't know why you'd call it "normal". There are no ReLUs or pooling layers, and the filter kernel is the size of the whole image (you could equivalently have used a fullyConnectedLayer). The weights of the CNN will provide equations for the best-fitting N-dimensional affine subspace to your image data set.
If you do not want an equation matrix, what form should the result take?
The paper you've posted is about image classification, so I assume that's your task as well. As I understand @Matt J's approach, the idea is that you would separate your images into classes and train a separate neural network for every image class. The weights of each network give you the affine subspace that each of your classes belong to within the Grassmannian manifold, and the output of the network will be near zero for any image within that subspace. Therefore, if you want to classify a new image, you pass it through each of the networks and see which network gives the smallest output.
I have 3 classes, so do I have to separate each class into its own image set and save each network's weights after training?
Or you can just save the net object that trainNetwork gives you.
How did you reach this conclusion? What is the mathematics behind it?
For some matrix A and vector b, an affine subspace is the set of vectors x satisfying A*x+b=0. The Grassmannian manifold, as I understand it, is the set of all {A,b} pairs defining subspaces of a fixed dimension k. The neural network above implements A*x(:)+b where x is an input image and A and b are the training weights. We want to find the weights such that A*x(:)+b=0 over each class of images.
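To make that definition concrete, here is a small sanity check of the algebra — in NumPy rather than MATLAB, with made-up 3-pixel "images": points generated inside a 2D affine subspace of R^3 all satisfy A*x+b=0 for a suitable {A,b} pair.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generate points x = B*t + x0 lying in a 2D affine subspace of R^3.
B = rng.standard_normal((3, 2))                       # basis directions
x0 = rng.standard_normal(3)                           # offset point
X = B @ rng.standard_normal((2, 1000)) + x0[:, None]  # 3 x 1000 toy "images"

# A must annihilate the directions in B; b cancels the offset.
A = np.linalg.svd(B.T)[2][-1]   # right null vector of B.T
b = -A @ x0

residual = np.linalg.norm(A @ X + b)  # ~0: A*x + b = 0 for every point
```

Here N=1, i.e. A is a single row; for higher-codimension subspaces A would have more rows, one equation per row.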
Do you mean that all the labels will be zeros?
No, the network responses will be zeros. It's a regression network.
What is the advantage of doing this?
I don't know that there is an advantage. You brought the Sharma article to the party. Does it say there should be an advantage relative to conventional CNNs?
M
on 24 Oct 2023
This example ran for me, but I wouldn't expect you to run it exactly this way. You probably want to set up an augmentedImageDatastore, as I mentioned before. You will need to read examples of that in the documentation.
N=10;
layers=[imageInputLayer([120,160,1])
fullyConnectedLayer(N)
regressionLayer];
XTrain=rand(120,160,1,5);
YTrain=zeros(1,1,N,size(XTrain,4));
options = trainingOptions('adam', ...
'MiniBatchSize',200, ...
'MaxEpochs',10, ...
'InitialLearnRate',1e-3, ...
'LearnRateSchedule','piecewise', ...
'LearnRateDropFactor',0.1, ...
'LearnRateDropPeriod',20, ...
'ValidationFrequency',200, ...
'Shuffle','every-epoch', ...
'Plots','training-progress');
net=trainNetwork(XTrain, YTrain, layers,options);
Could you please tell me, on what basis do we choose the number N?
No, I've never seen this kind of classification done before. Doesn't Sharma make some sort of choice of what subspace dimensions the images are supposed to live in?
How can I see the outputs?
Also, how can we extract the pair (A,b) from the neural net so I can use them in further analysis?
Once the network is trained, A will be the Weights property of the fullyConnectedLayer and b will be the Bias property.
M
on 24 Oct 2023
Well, in general, we can write the estimation of A,b as the norm minimization problem,
minimize || [X, 1]*[A, b]' || over A,b, subject to the rows of [A, b] being orthonormal,
where X is a matrix whose rows are your training images (from one class). In principle, there are a number of solution approaches to this when X and A are of small dimension. However, you seemed to imply that X will be a very big matrix, probably too big to store in RAM. Moreover, you will need a large number of images, depending on how many unknown variables A(i,j), b(i) you expect to use to represent the subspaces. Ideally, you would want size(X,1) to be considerably greater than N=size(A,1).
The advantage of using trainNetwork is that it is uniquely suited to large data: it allows your X data to be kept in image datastores instead of in RAM, and it breaks the optimization down into steps which use only small sub-chunks of X. It also has regularization methods built in, which is helpful when you don't have a lot of data.
M
on 25 Oct 2023
I would suggest first testing on simulated low-dimensional image data that you know obeys an affine model perfectly,
images=randn(3,2)*randn(2,1000) + randn(3,1); %3x1 images
These toy images live in a 2-dimensional affine subspace (a plane) in R^3, so we can verify whether the NN finds the correct result using simple plane-fitting, e.g., by downloading planarFit() from here,
p=planarFit(images);
A=p.normal;
b=-p.distance;
norm(A*images+b) %very close to zero
plot(p); legend Images
I don't understand what you did and what this plot represents. Is this generated from your real images? Your real images are not 3x1. They are 120x160.
In any case, if your images are in the same class but do not lie in or near a common affine subspace, then your model fails, and the method fails.
M
on 25 Oct 2023
Matt J
on 25 Oct 2023
Well, the neural network training determines that. Assuming the training is successful (and that is an important assumption), the training responses will be close to zero.
M
on 25 Oct 2023
Matt J
on 25 Oct 2023
The output of the network. Remember as well that "close to zero" is a relative term. I don't think we expect your image classes to fit the model without errors, and we don't know how big the errors will be. We just have to hope that it is small enough to allow the classes to be distinguished.
Here's a working example of the plane fitting, but with a neural network. I had to turn off L2-regularization to get it to work. Not sure what that will mean for your real use case.
images=randn(3,2)*randn(2,1000) + randn(3,1); %3x1 images
images=reshape(images,3,1,1,[]);
N=1;
layers=[imageInputLayer([3,1,1],'Normalization','none')
fullyConnectedLayer(N,'WeightL2Factor',0,'BiasL2Factor',0 )
regressionLayer];
XTrain=images;
YTrain=zeros(1,1,N,size(XTrain,4));
options = trainingOptions('adam', ...
'MiniBatchSize',100, ...
'MaxEpochs',50, ...
'InitialLearnRate',1, ...
'LearnRateSchedule','piecewise', ...
'LearnRateDropFactor',0.1, ...
'LearnRateDropPeriod',20, ...
'ValidationFrequency',200, ...
'Shuffle','every-epoch');
close all force
net=trainNetwork(XTrain, YTrain, layers,options);
A=net.Layers(2).Weights; s=norm(A);
A=A/s;
b=net.Layers(2).Bias/s;
norm(A*images(:,:)+b)
p=planarFit.groundtruth(images(:,:), A,-b);
plot(p)
M
on 26 Oct 2023
Matt J
on 26 Oct 2023
What makes you think they're high?
The training curves look pretty good to me. But you can't use rescale-to-one normalization. That ruins the affineness of the network. Also, the normalization of the weights and biases will be important.
A=net.Layers(2).Weights;
s=vecnorm(A,2,2);
A=A./s;
b=net.Layers(2).Bias./s;
Matt J
on 26 Oct 2023
You can always try...
M
on 26 Oct 2023
Matt J
on 26 Oct 2023
You have more experience than I do. As I said before, I've never seen this classification strategy before.
M
on 23 Nov 2023
Matt J
on 23 Nov 2023
As opposed to what? What else might we have used?
Well, in general, we can write the estimation of A,b as the norm minimization problem,
minimize || [X, 1]*[A, b]' || over A,b, subject to the rows of [A, b] being orthonormal.
If X can fit in RAM, you can just use svd() to solve it:
N=14;
X=images(:,:)';
vn=max([vecnorm(X,inf,1),1],1);
[~,~,V]=svd([X, ones(height(X),1)]./vn , 0);
Abt=V(:,end+1-N:end)./vn';
A=Abt(1:end-1,:)';
b=Abt(end,:)';
s=vecnorm(A,2,2);
[A,b]=deal(A./s, b./s);
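For anyone who wants to sanity-check this SVD recipe outside MATLAB, here is a NumPy rendering on synthetic data where the answer is known (all dimensions here are invented): samples drawn from a 4-dimensional affine subspace of R^6 should be annihilated by the N = 6-4 = 2 recovered rows of A.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pix, dim, n_img = 6, 4, 1000
N = n_pix - dim                     # codimension = number of null equations

# Rows of X are "images" lying in a dim-dimensional affine subspace.
X = (rng.standard_normal((n_pix, dim)) @ rng.standard_normal((dim, n_img))
     + rng.standard_normal((n_pix, 1))).T

# Solve [X, 1] @ [A, b].T ~= 0: take the smallest right singular vectors.
M = np.hstack([X, np.ones((n_img, 1))])
Vt = np.linalg.svd(M, full_matrices=False)[2]
Abt = Vt[-N:, :].T                  # (n_pix+1) x N null-space block
A, b = Abt[:-1, :].T, Abt[-1, :]

residual = np.linalg.norm(A @ X.T + b[:, None])  # ~0 on the training class
```

(I've skipped the vn pre-normalization here since the toy data is already well scaled; for real pixel data it matters, as in the MATLAB version above.)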
43 Comments
M
on 30 Oct 2023
You'll probably have to downsample the images so that you have more samples than pixels. I've used sepblockfun from this FEX download to do so.
load('XT.mat')
XT=sepblockfun(XT,[3,3],'mean');
size(XT)
N=14;
X=reshape(XT,[], size(XT,4)).';
size(X)
vn=max([vecnorm(X,inf,1),1],1);
[~,~,V]=svd([X, ones(height(X),1)]./vn , 0);
Abt=V(:,end+1-N:end)./vn';
A=Abt(1:end-1,:)';
b=Abt(end,:)';
s=vecnorm(A,2,2);
[A,b]=deal(A./s, b./s);
PercentError = norm(A*X.'+b)/norm(X.')*100
M
on 1 Nov 2023
vn=max([vecnorm(X,inf,1),1],1); %For normalizing X
[~,~,V]=svd([X, ones(height(X),1)]./vn , 0); %Solve equations [X,1]*[A';b']=0
Abt=V(:,end+1-N:end)./vn'; %Pick best N solutions and undo normalization
A=Abt(1:end-1,:)'; %Separate [A';b'] into A and b parts.
b=Abt(end,:)';
M
on 2 Nov 2023
Matt J
on 2 Nov 2023
You need to pick N solutions because we want A to have N rows.
You need to undo the normalization so that the A,b solution will work on the original, unnormalized image data.
Apparently, svd() does support tall arrays.
M
on 28 Nov 2023
Matt J
on 28 Nov 2023
I would convert X to tall, not XTrain.
Maybe as follows? I don't know how much RAM you have.
N=4;
XTrain = Data_S;
X=reshape(XTrain,[], size(XTrain,4)).';
size(X)
vn=max([vecnorm(X,inf,1),1],1);
%Process as tall
T=tall([X, ones(height(X),1)]./vn);
clear X
[~,~,V]=svd( T , 0);
clear T
Abt=gather( V(:,end+1-N:end) ) ./vn';
A=Abt(1:end-1,:)';
b=Abt(end,:)';
s=vecnorm(A,2,2);
[A,b]=deal(A./s, b./s);
M
on 28 Nov 2023
M
on 28 Nov 2023
Matt J
on 28 Nov 2023
Use tall() only when T is created.
M
on 28 Nov 2023
M
on 28 Nov 2023
M
on 28 Nov 2023
M
on 29 Nov 2023
Matt J
on 29 Nov 2023
Not sure, but if it is hanging, it is for hardware-specific reasons. Maybe try it on a different computer?
Matt J
on 5 Dec 2023
Is (A*XTrain'+b) a subspace and (A*XTest+b) a subspace?
The subspace is defined by the pair of matrices {A,b}, but I don't know how the Grassmann distance is defined, nor how the {A,b} representation would be used to compute it.
Why did you consider V in the computations, not U: [~,~,V]=svd( T , 0);
V contains the (approximate) null vectors of T.
but when I tried to compute A'*A, it didn't give me the identity, which it is supposed to
I don't know why you think it should.
The purpose is to judge whether a matrix belongs to the subspace or not (classification)
If so, then maybe an alternative distance metric could be,
distance(x)=norm(A*x+b)
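A sketch of how that metric could drive classification (NumPy, toy dimensions, everything here illustrative): fit one {A,b} per class via the SVD recipe, then assign a new image to the class with the smallest residual norm(A*x+b).

```python
import numpy as np

rng = np.random.default_rng(2)
n_pix, dim = 8, 3
N = n_pix - dim                      # rows of A per class model

def fit_affine(X):
    """Rows of X are images; returns A, b with A @ x + b ~= 0 on the class."""
    M = np.hstack([X, np.ones((len(X), 1))])
    Vt = np.linalg.svd(M, full_matrices=False)[2]
    Abt = Vt[-N:, :].T
    return Abt[:-1, :].T, Abt[-1, :]

def sample(B, x0, n):
    return (B @ rng.standard_normal((dim, n)) + x0).T

# Two classes, each a random affine subspace; fit a model to each.
classes = [(rng.standard_normal((n_pix, dim)), rng.standard_normal((n_pix, 1)))
           for _ in range(2)]
models = [fit_affine(sample(B, x0, 500)) for B, x0 in classes]

# Classify a fresh image from class 1 by smallest residual.
x = sample(*classes[1], 1).ravel()
label = int(np.argmin([np.linalg.norm(A @ x + b) for A, b in models]))
```

This works on the toy data because each class really does lie in an affine subspace; as noted elsewhere in the thread, real image classes may not.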
M
on 5 Dec 2023
M
on 5 Dec 2023
It failed; it gave me a smaller norm for images that don't belong to the class than for images that do.
Then either you have insufficient training data or you need to increase N.
[A, b0] are orthogonal affine coordinates if [A, b0] ∈ R^(n×(k+1)), A'*A = I, A'*b0 = 0
That's not what A and b are in this context.
I also tried different values of N, but I didn't see any improvement in performance.
Then perhaps the images simply do not follow an affine model. Or, the dimension of the subspace is very low and requires a very large N (but if so that will be computationally problematic).
So what are A and b in this context? Aren't they the direction and origin?
No, they are simply matrices of equation coefficients. The equation for the subspace is A*x+b=0.
You could also try a somewhat modified pre-normalization from what we had before,
N=4;
XTrain = tall(Data_S);
X=reshape(XTrain,[], size(XTrain,4)).';
mu=mean(X,1); %<-----
X=X-mu; %<-----
vn=max([vecnorm(X,inf,1),1],1);
[~,~,V]=svd([X, ones(height(X),1)]./vn , 0);
Abt=V(:,end+1-N:end)./vn';
A=Abt(1:end-1,:)';
b=Abt(end,:)';
b=b+A*mu(:); %<-----
s=vecnorm(A,2,2);
[A,b]=deal(A./s, b./s);
M
on 5 Dec 2023
Torsten
on 5 Dec 2023
If you would briefly explain how images are usually given, how this representation can be transformed into a representation by an affine subspace, and how you plan to cluster images for which the distance between the subspace representations is "near", maybe more participants in this forum could help.
Torsten
on 5 Dec 2023
Ok, with this explanation I think Matt will remain the only participant in the discussion :-)
Matt J
on 5 Dec 2023
I am thinking to have a subspace for each class
It's important to keep in mind that this could be a bad modeling assumption. If you are getting bad results because the subspaces are not truly affine, we cannot help you overcome that.
So you say you have 3 classes for your images that are known right from the beginning.
Say you determine, for each of the 49000 images, the affine subspace that best represents it, and you compute the mutual distance by SVD between these 49000 affine subspaces (which would give a 49000x49000 matrix). Now say you cluster your images into 3 (similar) clusters derived from that distance matrix.
The question you should ask yourself is: would these three clusters resemble the 3 classes that you think the images belong to right at the beginning ?
If the answer is no and if you consider the 3 classes from the beginning as fixed, then the distance measure via SVD is not adequate for your application.
So how can I get the directions and origins so I can compute the Grassmann distance?
Here is another variant that gives a Basis/Origin description of the subspace.
N=100; %estimated upper bound on subspace dimension
X=reshape(XTrain,[], size(XTrain,4)); %<----no transpose
mu=mean(X,2);
X=X-mu;
[Q,~,~]=qr(X , 0);
Basis=Q(:,1:N); %Basis direction vectors
Origin=mu-Basis*(Basis.'*mu); %Orthogonalize
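A NumPy check of this Basis/Origin recipe on synthetic data (dimensions invented): centering, keeping the leading N columns of Q, and orthogonalizing the mean reproduces the data exactly when it truly lies in an N-dimensional affine subspace. (NumPy's qr lacks column pivoting, so this relies on generic data; MATLAB's pivoted qr(X,0) is more robust.)

```python
import numpy as np

rng = np.random.default_rng(3)
n_pix, N, n_img = 10, 4, 500

# Columns of X are images in an N-dimensional affine subspace of R^n_pix.
X = (rng.standard_normal((n_pix, N)) @ rng.standard_normal((N, n_img))
     + rng.standard_normal((n_pix, 1)))

mu = X.mean(axis=1, keepdims=True)
Q, _ = np.linalg.qr(X - mu)                 # reduced QR of the centered data
Basis = Q[:, :N]                            # basis direction vectors
Origin = mu - Basis @ (Basis.T @ mu)        # part of the mean orthogonal to Basis

orth_err = np.linalg.norm(Basis.T @ Origin)                           # ~0
recon_err = np.linalg.norm(Basis @ (Basis.T @ (X - mu)) - (X - mu))  # ~0
```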
18 Comments
Just one question: is it really true that, in the end, you want to characterize the difference between two images by one single number, namely their distance in the Grassmann metric? Naively, can this ever work for such a complex thing as an image? Or are the images themselves quite similar, with only slight differences that are to be worked out?
Why did you decide to take into consideration only the first N columns of Q?
The final columns of a QRP decomposition contribute the least to the decomposition, and might be attributable to noise. For example, here is a wide matrix X of rank N=2 as well as a noise-corrupted version, Xnoisy:
X=repmat( rand(4,2) ,1,4)
Xnoisy=X + 1e-4*randn(size(X))
When we take the QRP decomposition of X, we see that the last two rows of R are essentially 0, which means that only linear combinations of the first 2 columns of Q are used in the decomposition to regenerate the columns of X. The final two columns of Q can be discarded. This makes sense, because we know that X is rank-2.
[Q,R,~]=qr(X,"econ"); R
When we do this with Xnoisy, the final two rows are still quite small, but the effect of the noise is that they are not as close to zero as they should be. Therefore, the final two columns of Q need to be discarded based on some other criterion besides R(i,:)=0.
[Q,R]=qr(Xnoisy, "econ"); R
You don't necessarily have to choose N manually, though. You could have used some threshold on the diagonal values of R, e.g.,
N=find(abs(diag(R)) > 0.01*max(abs(R(:))), 1, 'last')
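The same rank-threshold idea, checked in NumPy with the rank-2 construction from above (the 0.01 tolerance is as arbitrary here as in the MATLAB version): the small diagonal entries of R expose the numerical rank of the noisy matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.tile(rng.standard_normal((4, 2)), (1, 4))   # 4x8, rank 2 by construction
Xn = X + 1e-4 * rng.standard_normal(X.shape)       # noisy copy has full rank 4

_, R = np.linalg.qr(Xn)                            # reduced QR; R is 4x8
d = np.abs(np.diag(R))
rank_est = int(np.sum(d > 0.01 * np.abs(R).max()))  # numerical rank estimate
```

The repeated columns make the 3rd and 4th diagonal entries of R drop to roughly the noise level (~1e-4), so the threshold recovers rank 2.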
M
on 6 Dec 2023
Torsten
on 6 Dec 2023
This is not a new approach for image classification! Many papers in the literature have used the affine Grassmannian, Grassmann distance, Grassmann kernel, etc., for image classification!
It was just a question. From my naive view, without a background in image processing, it's hard to believe that one number could suffice to classify images into similar groups.
But I am stuck on how to test whether a single image belongs to a certain affine subspace.
But if both images can be represented by an affine subspace, we derived the formula to compute the distance between these two subspaces. If the distance is 0, they belong to the same affine subspace. Or is my train of thought too simple?
Torsten
on 6 Dec 2023
Didn't we have the formula for the distance of two affine spaces defined by (A1,b1) and (A2,b2) ?
Here, (A1,b1) is the best affine subspace you could get from your training with the 40000 images and (A2,b2) is the affine subspace for the test image.
M
on 6 Dec 2023
Also, what about the Basis and Origin from the QR: can we apply the Grassmann distance and kernel to them?
@M You're the one who told us that you could. That's why you asked us how to calculate them. You wrote earlier, "So how can I get the directions and origins so I can compute the Grassmann distance?"
M
on 6 Dec 2023
Note that there is a reason that I separated my A,b computation proposal and my Basis/Origin proposal into separate answers. They are very different.
The A,b description of a subspace involves finding the null vectors of a matrix. The equation A*X+b=0 or equivalently [X.',1]*[A,b].'=0 is the same as saying that [A,b].' is in the null space of the matrix [X.',1]. In order to find [A,b], we need to compute the null vectors of [X.',1], which are traditionally given by the final V columns of its SVD.
Conversely, in the Basis/Origin description, we are searching for a set of basis vectors for the column space of X. As I explained above, these vectors lie in the initial columns of Q in the QRP decomposition of X.
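This duality can be seen directly in a few lines of NumPy (toy sizes, linear case for brevity; the affine versions above just append the ones column or center the data): the last right singular vectors give the null-space ({A,b}) view, the leading Q columns give the column-space (Basis) view, and the two are mutually orthogonal.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 7, 3
X = rng.standard_normal((n, k)) @ rng.standard_normal((k, 200))  # rank-k columns

# Column-space description: leading k columns of Q in a QR of X.
Q, _ = np.linalg.qr(X)
Basis = Q[:, :k]

# Null-space description: last n-k right singular vectors of X.T.
Vt = np.linalg.svd(X.T, full_matrices=False)[2]
A = Vt[-(n - k):, :]                # A @ X ~= 0

# Complementary descriptions of the same subspace: A annihilates the Basis.
cross = np.linalg.norm(A @ Basis)   # ~0
```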
M
on 7 Dec 2023
Is there a criterion for selecting the N final columns of V in the SVD, like the one you suggested for the N initial columns of Q?
The dimension of the subspace that you fit to X will be NumPixels-N, where NumPixels is the total number of pixels in one image.
Also, regarding the Basis/Origin description, can you give me an idea of how this information is usually used for classification?
No, pursuing it was your idea. You said it would help you compute the Grassmann distance, whatever that is.