Interesting. I tried to see what happens with matrix of different sizes:
n = round(logspace(1, 6, 30));
t = [];
figure
for k = 1 : numel(n)
a = sprand(n(k),n(k),5/n(k));
v = rand(n(k),1);
tic; for i = 1:1e3, b = a*v; end; t(k,1) = toc;
tic; for i = 1:1e3, c = a'*v; end; t(k,2) = toc;
loglog(n(1:k), t(:,1), 'o-', n(1:k), t(:,2), 's-') ; grid on ;
xlabel('n') ; ylabel('T_{CPU}') ;
legend('a*v', 'a''*v');
drawnow;
end
Clearly something happens close to n = 10000 (more precisely, on my PC between n = 10002 and n = 10003, guess why this value!?):
My very personal hypothesis is that until n = 10002 the two multiplications are executed in the same way, by using a trivial algorithm. After, the transposition and multiplication are combined in a single operation.
The interest of this approach is that, as in MATLAB both full and sparse matrix are stored column-wisely (= like in Fortran), multiplicating the transpose of A by a vector v boils up into executing a kind of column-column multiplication, which is more efficient because if reduces cache misses.