“Regression design matrix is rank deficient” - what to do next?

I have the following model:
Reduction_in_clinical_score ~ Baseline_clinical_score + Site_of_data_collection + Treatment_Type + Age + Sex + ERP
Site of data collection is made up of four levels (categorical variable), treatment type has two levels (categorical variable), and sex has two levels (categorical). All other variables are continuous (double).
There are 88 observations in total.
In MATLAB (using fitlm), I am getting the following warning: "Warning: Regression design matrix is rank deficient to within machine precision."
From what I have gathered online, it seems as though this may be caused by having an inadequate number of observations relative to the number of predictors in my model.
My question is then what would be the next step in this case?
Would it be to remove a predictor (ideally based on theory/literature)?
I ran the same linear regression in SPSS, which provided no warning (the output all looks reasonable). So I am not sure if this is something wrong perhaps with how I prepared my table for Matlab?
Thank you.

Accepted Answer

John D'Errico on 5 Nov 2020
You already know what to do. Get more data, and perhaps as importantly, get better data that better fills out the design space.
From the other end of things, it may be as important to simplify your model. Having too many terms in your model will make it difficult to estimate. Of course, we have no clue as to the specifics of your model, or what your data looks like. So there could be many reasons you have made too complicated a model.
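For example, suppose you have a model with linear AND quadratic terms in those two-level categorical variables. A two-level variable takes only two distinct values, so its square is an exact linear function of the variable itself and the constant term. The result would be a singular matrix. No unique solution will exist, nor would any solution be meaningful.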
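As a minimal sketch of that example, the snippet below builds a design matrix with a constant, a linear term, and a quadratic term in one two-level predictor. The data and the variable name treatment are made up for illustration and are not taken from the question.

```matlab
% Hypothetical data: a two-level predictor coded 0/1 (not from the original post).
treatment = [0; 1; 0; 1; 1; 0; 1; 0];

% Design matrix with an intercept, a linear term, and a quadratic term.
X = [ones(size(treatment)), treatment, treatment.^2];

% With 0/1 coding, treatment.^2 equals treatment, so columns 2 and 3 are identical.
r = rank(X);
fprintf('columns = %d, rank = %d\n', size(X, 2), r);
```

The rank comes out smaller than the number of columns, which is the condition the fitlm warning reports.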
Did SPSS arrive at a solution with no error reported? Apparently so, if the data and the model are truly the same between solutions. But that tells us little. It may only tell us that SPSS uses a different algorithm, probably based on a pseudo-inverse. While it will give a solution, the solution is then not unique, and no warning message was then reported to tell you there may well be a fundamental problem in your data and your model. Personally, I think the warning message of singularity is a highly important piece of information. It tells you to fix the problem, instead of ignoring the problem while you believe a valid "solution" has been found.
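To illustrate why a pseudo-inverse solution is not unique, here is a small sketch on made-up data. It says nothing about what SPSS actually does internally; it only shows that when the design matrix is rank deficient, two very different coefficient vectors can give identical fitted values.

```matlab
% Hypothetical rank-deficient design: column 3 duplicates column 2.
x = [0; 1; 0; 1; 1; 0];
X = [ones(6, 1), x, x];
y = [1.0; 2.1; 0.9; 1.9; 2.0; 1.1];

% Minimum-norm least-squares solution via the pseudo-inverse.
b1 = pinv(X) * y;

% Any vector in the null space of X can be added without changing the fit.
n = null(X);             % one basis vector here, since rank(X) is 2
b2 = b1 + 5 * n(:, 1);   % a different, equally "valid" coefficient vector

% Both coefficient vectors produce the same fitted values (up to round-off).
maxDiff = max(abs(X * b1 - X * b2));
fprintf('max difference in fitted values: %g\n', maxDiff);
```

Since b1 and b2 fit the data equally well, the individual coefficients of the duplicated columns cannot be interpreted on their own.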
So I would
  1. Verify that you really have the same model as was used in SPSS, as well as the same data.
  2. Look carefully at the model you have posed, in the context of the data you have. Is there a good reason for the singularity?
  3. Decide if you can get more/better data, or if you need to eliminate terms in the model.
Some of these actions might be best done with the aid of a statistician to act as a consultant. That depends on your knowledge of modeling and statistics.
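One possible way to approach step 2 is sketched below. It assumes, purely for illustration, that treatment type is completely determined by site (two sites delivered only one treatment and the other two sites only the other); the data, the dummy coding, and the column names are hypothetical and not the poster's. The null space of the design matrix then points at the columns involved in the dependency.

```matlab
% Hypothetical data (not the poster's): treatment is fully determined by site.
site      = [1; 1; 2; 2; 3; 3; 4; 4];
treatment = [0; 0; 0; 0; 1; 1; 1; 1];   % sites 1-2 got one treatment, sites 3-4 the other

% Reference (dummy) coding: site 1 and treatment 0 are the baselines.
siteDummies = double(site == [2, 3, 4]);   % implicit expansion, needs R2016b or later
X = [ones(numel(site), 1), siteDummies, treatment];
names = {'Intercept', 'Site2', 'Site3', 'Site4', 'Treatment'};

fprintf('columns = %d, rank = %d\n', size(X, 2), rank(X));

% Each null-space basis vector describes one exact linear dependency.
N = null(X);
for k = 1:size(N, 2)
    involved = names(abs(N(:, k)) > 1e-10);
    fprintf('dependency %d involves: %s\n', k, strjoin(involved, ', '));
end

% Observation counts per site/treatment cell; empty cells are a warning sign.
counts = accumarray([site, treatment + 1], 1);
disp(counts);
```

In this made-up case the Treatment column equals Site3 + Site4, so one of those terms has to go, or data from the empty site/treatment cells would be needed.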
  1 Comment
Prabhjot Dhami on 6 Nov 2020
Thank you, John. There was indeed a design issue with my data, which led to a singularity. I've gone ahead and eliminated one of the terms, and the warning no longer appears. I verified this in R as well.
