# Possible bug in regress when y and x are complex

Robert Weigel on 8 Feb 2024
Commented: Robert Weigel on 26 Feb 2024
I am looking for feedback on whether regress produces incorrect confidence intervals when x and y are complex.
The following code
rng(2);
% Model equation: y = x*b + error, where b = (1 + 1j).
% The 95% confidence interval for the imaginary part of b has zero width.
% N (number of samples) and sigma (noise amplitude) must be set beforehand.
x = randn(N,1) + 1j*randn(N,1);
y = x*(1+1j) + sigma*(randn(N,1) + 1j*randn(N,1));
[b,bint] = regress(y,x)
produces
b = 1.0120 + 0.9942i
bint = 0.9520 + 0.9942i 1.0720 + 0.9942i
My interpretation is
real(b) = 1.012 with a 95% confidence interval of [0.95, 1.07]
imag(b) = 0.9942 with a 95% confidence interval of [.9942, .9942]
But imag(b) should have a 95% confidence interval that is similar in width to that for real(b).
To get the expected answer, I do the regression without complex numbers:
% Emulate how the regression is done for complex x and y. We get the
% same result for b and the correct 95% confidence intervals on
% both the real and imaginary parts of b. (I have verified via
% simulation that the confidence intervals for both components
% are consistent with 95% confidence intervals.)
y = [real(y);imag(y)];
x = [real(x),-imag(x);imag(x),real(x)];
[b,bint] = regress(y,x);
b = b(1) + 1j*b(2)
bint = [bint(1,1)+1j*bint(2,1),bint(1,2)+1j*bint(2,2)]
In this case I get the same answer for b and confidence intervals that are consistent with what is expected, but different from the first example:
b = 1.0120 + 0.9942i
bint = 0.9726 + 0.9548i 1.0514 + 1.0336i
My interpretation of this result is
real(b) = 1.012 with a 95% confidence interval of [0.9726, 1.0514]
imag(b) = 0.9942 with a 95% confidence interval of [.9548, 1.0336]
Note that the confidence interval width for real(b) here is smaller than that found in the first example. It seems possible that the confidence interval for real(b) in the first example is based on the confidence interval of abs(b).

The MATLAB function `regress` is not designed to handle complex numbers directly. When you pass complex numbers to `regress`, it only operates on the real part of the data, which is why you observe the zero width for the confidence interval of the imaginary part of `b`. This is consistent with the documentation, which does not specify support for complex numbers.
Your interpretation of the results from the first example is correct; however, the confidence interval for the imaginary part is incorrect because `regress` is not handling the complex data properly.
The second approach you've taken is the correct way to perform linear regression with complex numbers. By separating the real and imaginary parts and stacking them, you can use `regress` to estimate the coefficients for both the real and imaginary parts. This method treats the problem as a multivariate regression with two independent variables for each observation: one for the real part and one for the imaginary part of `x`.
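The equivalence between the complex model and the stacked real model can be written out explicitly. This is a sketch using notation that is not in the original post: subscripts r and i denote real and imaginary parts, and e is the error term.

```latex
\begin{aligned}
y &= x\,b + e, \qquad x = x_r + i\,x_i,\quad b = b_r + i\,b_i,\quad y = y_r + i\,y_i,\quad e = e_r + i\,e_i \\
y_r + i\,y_i &= (x_r b_r - x_i b_i) + i\,(x_i b_r + x_r b_i) + e_r + i\,e_i \\
\begin{bmatrix} y_r \\ y_i \end{bmatrix}
&=
\begin{bmatrix} x_r & -x_i \\ x_i & x_r \end{bmatrix}
\begin{bmatrix} b_r \\ b_i \end{bmatrix}
+
\begin{bmatrix} e_r \\ e_i \end{bmatrix}
\end{aligned}
```

The block matrix in the last line is the same design matrix as `[real(x),-imag(x);imag(x),real(x)]` in the question, so the two fitted coefficients are the real and imaginary parts of `b`.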
The confidence intervals you get from the second approach are correct for both real and imaginary parts of `b`. The smaller confidence interval width for the real part in the second example compared to the first is likely due to the proper handling of the variance in both real and imaginary parts, which affects the estimated standard errors used to calculate the confidence intervals.
In summary, to perform regression with complex numbers and obtain correct confidence intervals, you should use the second approach, which correctly accounts for the complex nature of the data. The first approach using `regress` directly with complex numbers will not provide accurate confidence intervals for the imaginary part.
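One way to check the intervals from the second approach is a Monte Carlo coverage test, similar in spirit to the simulation mentioned in the question. The sketch below is only an illustration: `N`, `sigma`, and `nTrials` are assumed values, not the ones used in the original post, and it requires the Statistics and Machine Learning Toolbox for `regress`.

```matlab
% Monte Carlo coverage check for the stacked real-valued regression.
% N, sigma, and nTrials are illustrative assumptions, not values from the thread.
rng(2);
N = 100;
sigma = 0.2;
nTrials = 2000;
bTrue = 1 + 1j;

covered = false(nTrials, 2);   % column 1: real(b), column 2: imag(b)
for k = 1:nTrials
    x = randn(N,1) + 1j*randn(N,1);
    y = x*bTrue + sigma*(randn(N,1) + 1j*randn(N,1));

    % Stack real and imaginary parts as in the second approach.
    ys = [real(y); imag(y)];
    Xs = [real(x), -imag(x); imag(x), real(x)];
    [~, bint] = regress(ys, Xs);

    % Record whether each 95% interval contains the true value.
    covered(k,1) = bint(1,1) <= real(bTrue) && real(bTrue) <= bint(1,2);
    covered(k,2) = bint(2,1) <= imag(bTrue) && imag(bTrue) <= bint(2,2);
end

% Fraction of trials in which each interval covered the true value.
coverage = mean(covered)
```

If the intervals are calibrated, both entries of `coverage` should be close to 0.95.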
Robert Weigel on 26 Feb 2024
I would argue that MATLAB's regress is designed to handle complex numbers. It gets b and the residuals correct. Most libraries that do matrix inversion handle complex inputs (e.g., b = X\y works). So it is surprising that not all of the outputs of regress are correct for complex inputs.
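The point about the backslash operator can be checked without `regress`. The sketch below compares the complex least-squares solution from `mldivide` with the solution of the stacked real system; `N` and `sigma` are assumed values chosen only for illustration.

```matlab
% Compare complex least squares (mldivide) with the stacked real system.
% N and sigma are illustrative assumptions, not values from the thread.
rng(2);
N = 100;
sigma = 0.2;
x = randn(N,1) + 1j*randn(N,1);
y = x*(1+1j) + sigma*(randn(N,1) + 1j*randn(N,1));

% Least-squares solution computed directly on the complex data.
bComplex = x\y;

% Least-squares solution of the equivalent real-valued system.
bs = [real(x), -imag(x); imag(x), real(x)] \ [real(y); imag(y)];
bStacked = bs(1) + 1j*bs(2);

% The difference should be at the level of floating-point rounding.
difference = abs(bComplex - bStacked)
```

Both formulations minimize the same sum of squared residual magnitudes, which is why the point estimates agree while the interval calculation is a separate question.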
In the same way the documentation has
X should include a column of ones so that the model contains a constant
term. The F statistic and p value are computed under the assumption
that the model contains a constant term, and they are not correct for
models without a constant.
there should be a caveat that if y and/or X are complex, the statistical estimates will not be correct.
(I find it odd that regress does not compute the correct F statistic and p value when there is no constant term, similar to how it does not fully handle complex inputs. There are straightforward equations to get the correct answer.)
I am not comfortable with the argument that if a function's documentation does not mention that it handles complex inputs, the user should recognize that it does not. The documentation for mean and mldivide does not say that they handle complex inputs, but they do. Most functions throw an error if the input is not of the correct type instead of silently trying to do a calculation and possibly returning an invalid answer.
I think the fix is to throw a warning if one or more of the inputs is complex and the outputs that are incorrect for complex inputs are requested. Or to implement the correct calculation.
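Until something like that exists, a caller can add the guard by hand. The following is a minimal sketch of such a check placed before a call to `regress`; the warning identifier `regressGuard:complexInput` and the variable `needsCaution` are hypothetical names, and `N` and `sigma` are assumed values.

```matlab
% User-side guard before calling regress with possibly complex data.
% The identifier 'regressGuard:complexInput' is hypothetical, and
% N and sigma are illustrative assumptions, not values from the thread.
rng(2);
N = 100;
sigma = 0.2;
x = randn(N,1) + 1j*randn(N,1);
y = x*(1+1j) + sigma*(randn(N,1) + 1j*randn(N,1));

% Flag the case discussed in this thread: complex y and/or X.
needsCaution = ~isreal(y) || ~isreal(x);
if needsCaution
    warning('regressGuard:complexInput', ...
        'y or X is complex; confidence intervals from regress may be incorrect.');
end

% b is still returned; bint should be treated with caution when flagged.
[b, bint] = regress(y, x);
```

When `needsCaution` is true, the stacked real formulation from the question can be used to obtain the intervals instead.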
I think robustfit has a similar problem. I'll post about it when I encounter it again.

R2023b

### Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!