Find nonlinear function to optimize parameters

Hello,
I'm trying to optimize parameters using a large dataset that contains 36 predictors and 1 response variable. To optimize, I am using fminsearchbnd, which I found on the MathWorks File Exchange. However, I don't know the best formula/function to use for the optimization (e.g., coefficients, highest order, etc.). I tried using fitlm with linear, squared, and interaction terms between all 36 predictors, but the fit isn't great, and the predicted response goes below 0 (which it shouldn't, because the response is an RMSE). I believe it should be a nonlinear function, but I don't know of what form.
Is there a function / toolbox I could use to find the formula/function to optimize the predictor variables so that the response variable (RMSE) is minimized?
Thank you in advance!

Accepted Answer

Walter Roberson on 2 Nov 2021
Is there a function / toolbox I could use to find the formula/function to optimize the predictor variables so that the response variable (RMSE) is minimized?
No.
It can be proven mathematically (and I have personally posted proofs in the past) that any finite set of points of finite precision can be fitted exactly (to within round-off error) by an uncountable infinity of different formulas. If a program were to pick one of those formulas, the probability that it picked the "right" formula would be $1/\infty$, which is 0.
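A small sketch of why an exact fit is never unique (Python, for illustration only; the data points and the helper names lagrange and family_member are hypothetical): take the polynomial that passes through the points, then add any multiple c of a term that vanishes at every data x. Every value of c gives a different formula with exactly zero error on the data, yet the formulas disagree away from the data. This only covers polynomials; the argument above is more general.

```python
from fractions import Fraction

def lagrange(xs, ys):
    """Return p(x), the polynomial of degree len(xs)-1 through all (xs[i], ys[i])."""
    def p(x):
        total = Fraction(0)
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = Fraction(yi)
            for j, xj in enumerate(xs):
                if j != i:
                    term *= Fraction(x - xj, xi - xj)
            total += term
        return total
    return p

def family_member(xs, ys, c):
    """Return q_c(x) = p(x) + c * prod(x - xi); it matches the data for every c."""
    p = lagrange(xs, ys)
    def q(x):
        vanish = Fraction(1)
        for xi in xs:
            vanish *= x - xi          # zero at every data point
        return p(x) + Fraction(c) * vanish
    return q

xs = [0, 1, 2, 3]                     # hypothetical data
ys = [1, 3, 2, 5]
for c in [0, 1, -7, Fraction(1, 3)]:
    q = family_member(xs, ys, c)
    assert [q(x) for x in xs] == ys   # exact fit, zero error on the data
    print(c, q(4))                    # but each formula predicts differently at x = 4
# prints:
# 0 19
# 1 43
# -7 -149
# 1/3 27
```

Exact rational arithmetic (Fraction) is used so that "fits exactly" really means zero error, not zero up to round-off.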
If you do not have a restricted set of possible forms, then there is no possible program that can find the "right" form of the equation.
Even if you have a restricted set of possible forms, due to round-off error and noise in measurements, it is notoriously true that a form known in advance to be the "wrong" equation can end up with a lower RMSE than the "right" equation.
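A toy illustration of the point about noise (Python; the data, the noise values, and the helper names are all hypothetical): the data come from the line y = 2x + 1 plus small measurement noise. The correct form, even with its exact true coefficients, has an RMSE equal to the size of the noise, while a degree-4 polynomial through all five points, a form we know is wrong, reaches an RMSE of zero.

```python
import math

def rmse(model, xs, ys):
    """Root-mean-square error of model over the data points."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def interpolant(xs, ys):
    """Degree len(xs)-1 polynomial through every point (Lagrange form)."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

def true_model(x):
    """The "right" form, with its exact true coefficients."""
    return 2 * x + 1

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
noise = [0.05, -0.08, 0.02, 0.07, -0.04]      # hypothetical measurement noise
ys = [true_model(x) + e for x, e in zip(xs, noise)]

wrong_model = interpolant(xs, ys)             # a "wrong" degree-4 form

print(rmse(true_model, xs, ys))    # about 0.056, the RMS size of the noise
print(rmse(wrong_model, xs, ys))   # 0.0: the wrong form "wins" on RMSE
```

The lower RMSE says nothing about which form is correct; the interpolant is simply fitting the noise.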

More Answers (1)

John D'Errico on 2 Nov 2021
Edited: John D'Errico on 2 Nov 2021
NO. Do NOT use fminsearchbnd to try to optimize a problem with 36 parameters. You will be wasting your time and mine, when you next send me a plaintive e-mail asking why it does not work.
fminsearchbnd is an overlay on fminsearch: fminsearch does the actual work, and fminsearchbnd applies the bound constraints. fminsearch is able to optimize problems with perhaps 6-8 parameters. Maybe 10 in a pinch. But 36 unknowns? Give me a break. It won't work. PERIOD.
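As a rough illustration of what the "overlay" means: my understanding from the File Exchange description is that fminsearchbnd handles a variable with both a lower and an upper bound by letting fminsearch search over an unconstrained variable z and mapping it through a sine transform, x = LB + (UB - LB)(sin(z) + 1)/2. The Python sketch below mirrors that idea under that assumption; it is not the File Exchange code, and the helper names (to_bounded, to_unbounded, wrap) and the objective are hypothetical.

```python
import math

def to_bounded(z, lb, ub):
    """Map any real z into [lb, ub] using a sine transform (dual bounds)."""
    return lb + (ub - lb) * (math.sin(z) + 1.0) / 2.0

def to_unbounded(x, lb, ub):
    """Inverse map: turn a feasible starting value x into an unconstrained z."""
    s = 2.0 * (x - lb) / (ub - lb) - 1.0
    s = max(-1.0, min(1.0, s))        # clip round-off so asin stays defined
    return math.asin(s)

def wrap(objective, lbs, ubs):
    """Build an unconstrained function of z for an unconstrained minimizer."""
    def g(zs):
        xs = [to_bounded(z, lb, ub) for z, lb, ub in zip(zs, lbs, ubs)]
        return objective(xs)
    return g

# Hypothetical objective whose unconstrained minimum (all 3s) lies outside [0, 1].
def objective(xs):
    return sum((x - 3.0) ** 2 for x in xs)

lbs, ubs = [0.0, 0.0], [1.0, 1.0]
g = wrap(objective, lbs, ubs)
for z in (-10.0, 0.0, math.pi / 2, 123.4):
    x = to_bounded(z, 0.0, 1.0)
    assert 0.0 <= x <= 1.0            # every z lands inside the bounds
print(g([math.pi / 2, math.pi / 2]))  # both parameters at the upper bound 1.0
```

The transform only changes coordinates. The search underneath is still fminsearch's Nelder-Mead simplex, so the practical limit on the number of parameters is unchanged.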
What we are not told is how many data points you have. Far too often people think they don't need many data points. With too few data points, expect garbage for results no matter what. You say the dataset is large, but is it? Do you have sufficient information to reasonably estimate that many parameters?
Next, we are given no clue if the model is even reasonable for your data. Too often, people try to cram their own favorite model into their data. You can't fit a square peg into a round hole. Well, you can, but either the peg or the hole will suffer.
And oh, it looks like you have no idea what model to use here, so you are trying to use a multinomial model (a polynomial in multiple dimensions). Expect essentially random garbage for results with that model.
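To put numbers on the model-size problem, here is a quick count (Python, for illustration) of the coefficients a full polynomial needs in 36 predictors. A polynomial of total degree at most d in n variables has C(n + d, d) terms, so the linear + squared + interaction model from the question (fitlm's 'quadratic' specification) already has 703 coefficients, and a full cubic has 9139, close to the 10000 rows the asker reports in the comment below. The function name n_poly_terms is hypothetical.

```python
from math import comb

def n_poly_terms(n_predictors, degree):
    """Count monomials of total degree <= degree in n_predictors variables: C(n + d, d)."""
    return comb(n_predictors + degree, degree)

# Breakdown of the full quadratic for 36 predictors.
n = 36
intercept, linear, squared, interactions = 1, n, n, comb(n, 2)
assert intercept + linear + squared + interactions == n_poly_terms(n, 2)

for d in (1, 2, 3, 4):
    print(d, n_poly_terms(n, d))
# 1 37
# 2 703
# 3 9139
# 4 91390
```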
Finally, you need good starting values for a nonlinear model. A 36-dimensional search space is IMMENSE. Provide poor starting values, and expect crapola for a result. But if your model is LINEAR, as it would be if you used fitlm, then there is no reason to even bother with an iterative method like fminsearchbnd. fitlm will give you the optimal answer. It may not be a model that you like, but that is the fault of your data and your choice of model.
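For the linear case, a minimal sketch of why no iterative search is needed: the least-squares coefficients come from solving a single linear system (the normal equations X'X b = X'y), with no starting values at all. The data and the function name least_squares are hypothetical, and Python is used only for illustration; in practice a QR-based solver (which is what MATLAB's backslash uses for rectangular systems) is more numerically robust than forming X'X explicitly.

```python
def least_squares(X, y):
    """Solve min ||X b - y||^2 by forming the normal equations X'X b = X'y once."""
    p = len(X[0])
    # Build X'X and X'y.
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the augmented matrix [X'X | X'y].
    m = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (m[i][p] - sum(m[i][j] * b[j] for j in range(i + 1, p))) / m[i][i]
    return b

# Hypothetical data generated exactly from y = 1 + 2*x1 - 0.5*x2.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
X = [[1.0, x1, x2] for x1, x2 in points]      # leading 1.0 is the intercept column
y = [1 + 2 * x1 - 0.5 * x2 for x1, x2 in points]
print(least_squares(X, y))   # approximately [1.0, 2.0, -0.5]
```

One direct solve gives the global optimum of a linear least-squares problem; an iterative search like fminsearch can only approach that same answer, more slowly and less reliably.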
  1 Comment
Matthew Blomquist on 2 Nov 2021
I have a table of 10000 rows by 37 columns. I ran simulations in which I randomly varied the first 36 parameters, then compared each simulation's output to experimental data; the 37th column is that comparison (the RMSE of the simulation data relative to the experimental data). I don't know the functional form that links these parameters to the RMSE, and I'm not sure how they all interact, so I was checking whether there was some way to fit a response surface to all of the data I have, then check whether some set of parameter values would decrease the RMSE further.
I don't have a favorite model. I believe it is nonlinear, but I'm just trying to use different functions to see what best fits the data.
For the initial starting value, I used the set of parameters that gave the lowest RMSE in the table.
Does that clear up any of your questions? If it is still impossible / too complex, then just let me know.
Thanks




Release

R2021a
