Why does backslash behave differently when transposition is used in-line?

Question

0 votes

Hello, in the process of creating a problem for my students in a numerical methods class I found out that the backslash operation produces different results if you solve A^{T}x=b as "x=(A')\b" when compared to first defining "AT=A'" and then solving as "x=AT\b". An example is given below. Order of operations suggests that in the first case, A should be transposed and then the backslash operation performed, which is forced in the second case. One would expect the results to then be the same. Why is there a difference at all?

n = 15;
m = 2*n+1;
t = fliplr(cos(pi/(n)*(0:n))).';
A = [t.^(0:m);
    t.^([0,0:m-1])*diag(max(0,0:m))];
b = (1./((0:m)'+1).*((1).^((0:m)'+1)-(-1).^(((0:m)'+1))));
x = (A')\b;
AT = A';
x2 = AT\b;
difference = norm(x-x2)
difference = 1.3350e-05

2 Comments

Steven Lord on 23 Apr 2026

This is independent of the question about the difference between those two solutions, but you might find the cospi function useful. [Or perhaps not; it can give a different answer than cos(pi*...). Would you expect those two calls to return exactly down-to-the-last-bit (DTTLB) identical answers?]

n = 15;
V = (1/n)*(0:n);
t1 = cos(pi*V);
t2 = cospi(V);
norm(t1-t2)
ans = 3.2870e-16

John D'Errico on 19 May 2026 at 14:57

n = 15;
m = 2*n+1;
t = fliplr(cos(pi/(n)*(0:n))).';
A = [t.^(0:m);
    t.^([0,0:m-1])*diag(max(0,0:m))];
cond(A)
ans = 1.0901e+12
b = (1./((0:m)'+1).*((1).^((0:m)'+1)-(-1).^(((0:m)'+1))));
norm(b)
ans = 2.2073

Note that the error seen is exactly as would be expected, when a subtly different order of operations is performed. That is, we would expect to see floating point trash on the order of:

cond(A)*eps(norm(b))
ans = 4.8410e-04

which is quite roughly what you got.

Never presume that two such computations are performed in exactly the same sequence, and if they are not, then you can and often will see floating point trash creep in like this:

0.3 - 0.1 - 0.2
ans = -2.7756e-17
-0.1 - 0.2 + 0.3
ans = -5.5511e-17

Answer 1

Christine Tobler on 17 Apr 2026

4 votes

This is an optimization that's present in both matrix multiplication and backslash: When transposition is applied as part of these operations (that is, on the same line), they are fused into just one operation for performance.

Here's an example for matrix multiplication:

% Make a random n-by-n matrix and n-by-1 vector (n = 20000)
n = 2e4;
A = randn(n);
x = randn(n, 1);
% Cost of matrix-vector multiplication:
tic;
y1 = A * x;
toc
Elapsed time is 0.084646 seconds.
% Cost of matrix-vector multiplication with transposed A:
tic;
y2 = A' * x;
toc
Elapsed time is 0.062393 seconds.
% Cost of transposing A
tic;
AT = A';
toc
Elapsed time is 0.372679 seconds.

The reason that transposing is so slow is that it requires a new n-by-n matrix to be allocated, and quite some memory movement. On the other hand, transposed matrix-vector multiplication can be achieved by simply changing the order of indexing:

% Compute y1 = A*x
y1 = zeros(n, 1);
for i=1:n
    for j=1:n
        y1(i) = y1(i) + A(i, j) * x(j);
    end
end
% Compute y2 = A'*x
y2 = zeros(n, 1);
for i=1:n
    for j=1:n
        y2(i) = y2(i) + A(j, i) * x(j);
    end
end
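
Two routines that both compute A'*x need not add the products in the same order, and that alone can change the last bits of the result. The following is a minimal sketch of that effect, assuming a simple forward versus reverse accumulation. It is not a description of what MATLAB's underlying library actually does, and the names yFwd, yRev and bound are illustrative only.

```matlab
% Sketch: the same transposed product accumulated in two different orders.
% The accumulation orders are an assumption chosen for illustration only.
rng(0);                 % reproducible random data
n = 200;
A = randn(n);
x = randn(n, 1);

yFwd = zeros(n, 1);     % accumulate j = 1, 2, ..., n
yRev = zeros(n, 1);     % accumulate j = n, n-1, ..., 1
for i = 1:n
    for j = 1:n
        yFwd(i) = yFwd(i) + A(j, i) * x(j);
    end
    for j = n:-1:1
        yRev(i) = yRev(i) + A(j, i) * x(j);
    end
end

% Both approximate A'*x; they typically differ only at rounding level.
maxDiff = max(abs(yFwd - yRev));
% Generous rounding bound for an n-term sum, applied to both orderings
bound = 2 * n * eps * (abs(A') * abs(x));
fprintf('max difference: %g\n', maxDiff);
fprintf('within rounding bound: %d\n', all(abs(yFwd - yRev) <= bound));
```

Neither ordering is more "correct" than the other; both satisfy the same rounding bound, which is why a library may pick whichever is faster.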

The same principle is also applied for backslash. In this case, usually the cost of factorizing A is much larger than the cost of transposing it, but there's still a noticeable advantage for the case of a triangular matrix:

% Make a random n-by-n upper triangular matrix and n-by-1 vector (n = 5000)
n = 5e3;
A = triu(randn(n));
x = randn(n, 1);
% Cost of backslash
tic;
y1 = A \ x;
toc
Elapsed time is 0.019095 seconds.
% Cost of backslash with transposed A:
tic;
y2 = A' \ x;
toc
Elapsed time is 0.019226 seconds.
% Cost of transposing A
tic;
AT = A';
toc
Elapsed time is 0.196998 seconds.
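
For readers who want to state explicitly which form of the solve they are asking for, here is a small sketch comparing three spellings of the same request. It assumes a small, well-conditioned example matrix (not the one from the question) and uses linsolve with its documented TRANSA option. Which internal routine each spelling dispatches to is not confirmed here.

```matlab
% Sketch: three ways to request the solution of A'*x = b.
A = [4 1 0; 2 3 1; 0 5 6];        % small, well-conditioned example matrix
b = [1; 2; 3];

xFused    = A' \ b;               % transpose written inside the backslash expression
AT        = A';                   % transpose stored in its own variable first
xExplicit = AT \ b;

opts.TRANSA = true;               % linsolve option: solve A'*x = b
xLinsolve = linsolve(A, b, opts);

% For a well-conditioned matrix all three agree to rounding level.
disp(norm(xFused - xExplicit));
disp(norm(xFused - xLinsolve));
disp(norm(A' * xLinsolve - b));   % residual of the transposed system
```

On a well-conditioned system the choice is a matter of speed and memory only. It starts to matter numerically when cond(A) is large, as in the question.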

19 Comments

Jonah A Reeger on 17 Apr 2026

Thank you for the response. I understand the point that you are making about performance improvement (relative to computation time) of combining the operations; however, the particular case I am considering suffers in terms of accuracy of the solution. That is, the example I provided comes from a naive implementation of an approach to computing Gaussian quadrature weights by integrating a Hermite interpolating polynomial. We use roots of Legendre polynomials to construct t instead of the values in t in my example code. When using the roots of the Legendre polynomials, the computed quadrature weights (the first n+1 entries of either x or x2 in the code) lead to orders of magnitude more accurate approximations when transposition is done in a separate line (i.e., when the quadrature weights come from x2).

All of this said, I would expect the language to parse my expression with order of operations in mind. In particular, the parentheses around (A') are being ignored in your description. Further, if transposition is to be avoided for the sake of computational efficiency, then the combined operation should not produce a result that is orders of magnitude different (even with poor conditioning).

Jonah A Reeger on 22 Apr 2026

I am not suggesting that mathematically the results should not be different when using different algorithms to solve a system of linear equations given poor conditioning. Instead, I am saying that when I request the solution of A^{T}x=b, the result should be more predictably similar whether or not I define AT directly first. That is, MATLAB's parser is detecting the transposition and then doing something here that I, and every one of my colleagues, would have never expected to solve this system of equations. This is not discussed anywhere in the documentation of "mldivide, \" (mldivide - Solve systems of linear equations Ax = B for x - MATLAB) as far as I am aware, which is incredibly unfortunate. Perhaps the flowcharts on the documentation page should be updated to reflect that this is done.

Further, in my response I say

"When using the roots of the Legendre polynomials, the computed quadrature weights (the first n+1 entries of either x or x2 in the code) lead to orders of magnitude more accurate approximations when transposition is done in a separate line (i.e., when the quadrature weights come from x2)."

This is entirely different than considering the absolute or relative difference in the entries of x or x2. I am considering the error after applying these weights to approximate the action of the linear operation they are designed for. Either way, a 0.0022 percent difference is too large for my purposes.

Matt J on 22 Apr 2026

Edited: Matt J on 23 Apr 2026

I am not suggesting that mathematically the results should not be different when using different algorithms to solve a system of linear equations given poor conditioning. Instead, I am saying that when I request the solution of A^{T}x=b, the result should be more predictably similar whether or not I define AT directly first.

It is not clear to me what the distinction here is between "not different" and "predictably similar". An ill-conditioned operation will amplify numerical noise by a factor of the condition number, which in this case is ~1e12. There is no predictable pattern that the amplified noise can be expected to have.

Do you mean the magnitude of the difference you expect is smaller? The precision limit of double floats is ~1e-16. With a condition number of 1e12, those deviations are predicted to get amplified to as much as 1e-4 in x, which is even greater than what is seen for your specific A and b.

Either way, a 0.0022 percent difference is too large for my purposes.

The only principled way to solve that is to improve the conditioning of your problem. Even if A.'\b was implemented in separate steps the way you expect, there are numerous other differences in the way the same operation in Matlab may be executed in different environments that can trigger similarly different results. When the dimensions of A and b get larger, the way the operations are multithreaded across your CPUs will be different, for example.
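
The amplification estimate discussed above can be checked directly on the matrix from the question. The sketch below reuses the question's construction of A and b and compares the relative difference between the two solutions with cond(A')*eps. The names relDiff, estimate, res1 and res2 are introduced here for illustration, and the printed values may vary between machines and releases.

```matlab
% Sketch: compare the observed discrepancy with the conditioning estimate.
n = 15;
m = 2*n + 1;
t = fliplr(cos(pi/n*(0:n))).';
A = [t.^(0:m);
     t.^([0,0:m-1])*diag(max(0,0:m))];
b = (1./((0:m)'+1).*((1).^((0:m)'+1)-(-1).^(((0:m)'+1))));

x  = (A')\b;          % transpose inside the expression
AT = A';
x2 = AT\b;            % transpose stored first

relDiff  = norm(x - x2) / norm(x2);   % relative difference between the two solutions
estimate = cond(AT) * eps;            % rough amplification of double-precision rounding
res1 = norm(AT*x  - b) / norm(b);     % relative residual of each solution
res2 = norm(AT*x2 - b) / norm(b);
fprintf('relative difference: %.3e\n', relDiff);
fprintf('cond(A'')*eps:        %.3e\n', estimate);
fprintf('relative residuals:  %.3e  %.3e\n', res1, res2);
```

If both residuals are small while the solutions still differ, that points to the conditioning of the problem rather than to a defect in either solve.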

Jan on 28 Apr 2026

Edited: Jan on 28 Apr 2026

The prescribed (and documented) order of operations indicated by (A')*y and AT = A'; AT*y should yield the same results

MathWorks does not document the JIT acceleration. Older releases contained a hint that the order of calculations can be changed, which is why the acceleration was disabled when the profiler was called. This limitation has been removed, but it is reasonable that a re-ordering is still applied if it saves processing time.

Obviously, Matlab does not perform the transposition in (A')*y explicitly:

n = 4e3;
A = randn(n);
x = randn(n, 1);
format longe
timeit(@() F1(A, x), 1)
ans = 
     1.877561900000000e-02
timeit(@() F2(A, x), 1)
ans = 
     2.865461900000000e-02
function y = F1(A, x)
B = A + 1;
y = (B') * x;
end
function y = F2(A, x)
B = A' + 1;
y = B * x;
end

I understand the point "should yield the same results", but Matlab is a high-level language that tries to accelerate the code in a smart way. Omitting the transpose operation is smart, and if the programmer is aware of the underlying BLAS and LAPACK functions, this is even expected.

If a computation suffers from such optimizations due to numerical instabilities, it is flawed. The results cannot be reproduced on a different CPU or with a different number of available threads.

This is a standard rule in numerical maths and not a problem of Matlab.

Documentation of the JIT acceleration would be useful in my opinion. The MathWorks team has explained repeatedly that they will not publish this, to avoid users optimizing their code for the JIT. The goal of MathWorks is to do it the other way around: optimize the JIT for the code written by users.

Paul on 5 May 2026 at 4:17

Edited: Paul on 5 May 2026 at 11:42

@Jan

The primary issue that I'm focusing on, again, has nothing to do with how Matlab calls the underlying BLAS routine.

The issue is that Matlab is ignoring the parentheses, which I think is worthy of discussion insofar as there is documentation that states that parentheses have the highest precedence and there is no documentation to indicate that there is ever an exception to that rule.

More generally, it's always better to have documentation that explains what a software tool does, rather than having to figure it out by experimentation.

In this regard, secondary to my primary concern, the documentation should explain that operations like '* will not explicitly form the transpose and then multiply. I agree that not forming the explicit transpose is the right thing to do, but it should be documented. What would be the harm for the documentation to explain what the software actually does?

If a user really wants the transpose formed explicitly before the multiplication, then the user should be able to use parentheses as in the Question and count on those parentheses to be respected, which is exactly why parentheses should be used, also as documented.

In the example with c, sum, t, and y (assuming those are floating point types) it's my understanding that a standard-compliant C compiler could implement that as

c = ((sum + y) - sum) - y;

but can't reduce that expression to c = 0, absent specific direction from the user via compiler flag (because floating point addition/subtraction is not associative). Is that not true?
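
The non-associativity referred to above can be seen in a minimal MATLAB sketch that mirrors the quoted expression. The values are hypothetical, chosen so the rounding is easy to follow, and the running sum is named s to avoid shadowing MATLAB's sum function. This only illustrates the floating-point behaviour and says nothing about what a C compiler is permitted to do.

```matlab
% Sketch: one compensation step with illustrative (hypothetical) values.
s = 1;            % running sum (named s to avoid shadowing MATLAB's sum)
y = 1e-16;        % small term, below half the spacing of doubles near 1
t = s + y;        % rounds back to exactly 1
c = (t - s) - y;  % algebraically zero, but evaluates to -1e-16
disp(t == s)      % displays 1 (true)
disp(c)           % displays -1.0000e-16
```

Simplifying c to 0 would discard exactly the rounding error that the compensation term is meant to capture.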

Paul on 12 May 2026 at 19:51

Because Matlab (incorrectly, IMO) ignores the parentheses, one way to force the transpose without explicitly creating a new variable is to use the transpose function literally. Some may find it a bit disconcerting that y1 and y4 are not equivalent.

rng(100);
n = 2e4;
A = randn(n);
x = randn(n, 1);
y1 = A'*x;
y2 = (A') * x;
isequal(y1,y2)
ans = logical
   1
AT = A';
y3 = AT * x;
isequal(y2,y3)
ans = logical
   0
y4 = transpose(A)*x;
isequal(y4,y3)
ans = logical
   1
isequal(y1,y4)
ans = logical
   0

Answer 2