Seeking Alternative Optimization Methods for a 9-Parameter Optimization Task

Hi there,
I am working on an optimization task where I need to determine 9 optimal parameters that satisfy my objective function (Objective.m). I have experimented with various optimization methods, including fmincon, fminsearch, particleswarm, and lsqnonlin, and documented a summary of their performance in computeError_1.m.
So far, fmincon and fminsearch have shown the most promising results, but they are not achieving the level of optimization I need (computeError.m). I am looking for alternative approaches that might improve performance, efficiency, and robustness.
If you have any recommendations or insights on more effective optimization techniques—whether it's hybrid methods, global optimization strategies, or derivative-free approaches—I would greatly appreciate your input.
Thank you in advance for your help!

5 Comments

Did you perform the usual test to see whether your code works as expected: prescribing 9 parameters, producing artificial (slightly disturbed) data with these 9 parameters, and then using the respective solver to reproduce the 9 parameters? Could you include this test case so that we can experiment with it?
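A minimal version of this test might look like the following sketch. Here simulateProjections is a placeholder for whatever forward model produces your 2D data (it is not a file from this thread), and Objective and theta are assumed to match your own setup:

```matlab
% Sketch of the suggested sanity check. simulateProjections is a hypothetical
% stand-in for your forward model; Objective and theta come from your code.
Ltrue = [1.5 -7.4 2.4 2.0 5.7 2.8 -0.9 -0.95 0.95];   % prescribed parameters
[Xp,Yp] = simulateProjections(Ltrue, theta);          % noise-free artificial data
L0 = Ltrue + 0.01*randn(size(Ltrue));                 % slightly disturbed start
Lfit = fminsearch(@(L) Objective(L,Xp,Yp,theta), L0);
disp(max(abs(Lfit - Ltrue)))   % should be small if the pipeline is consistent
```

If the solver cannot recover Ltrue even in this idealized case, the objective function or the projection model is suspect, not the optimizer.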
Dear Torsten,
Thank you for your suggestion! Yes, I have considered this test approach. I generated data and added slight perturbations to simulate real-world disturbances. Then, I ran my optimization solver to see if it could recover the original parameters (I attached my code as a zip file).
So far, the results have been mixed. While fmincon and fminsearch can approximate the parameters reasonably well, they do not always converge to the exact values (I also considered a couple of different samples). This suggests that my optimization approach might need improvement in terms of robustness or initialization strategy.
I’d appreciate any insights you might have on refining the solver performance.
How far from the original parameters did you choose the initial guesses for the parameters ?
Did the solvers immediately reproduce the parameters when you didn't add noise and you started with the optimal values ?
I like the recommendations of @Torsten very much. You suggest, in your reply to him, that you have done what he recommends. However, it seems to me that Test_8.m does not really do what @Torsten recommends. He recommends that the initial guess for the unknown vector be the correct value, or a slightly perturbed value. Test_8.m calls computeError.m, which estimates the 9 unknown parameters with fminsearch (i.e. with simplex algorithm, unconstrained) and with fmincon (constrained). fminsearch() starts from initial guess LvecInit, which is not the actual value of [mx,my,mz,sx,sy,sz,rxy,ryz,rzx] for this data set (Sample_2.xlsx). fmincon() starts from
LvecStart = LvecInit + 0.1 * randn(size(LvecInit));
which is not a slightly perturbed version of the actual vector of parameters for this data set.
Test_8.m does not display the true value of [mx,my,mz,sx,sy,sz,rxy,ryz,rzx] for the data set and does not display the value of [mx,my,mz,sx,sy,sz,rxy,ryz,rzx] found by minimization. I suggest you add that to the code.
Test_8.m does display the covariance matrix of the actual x,y,z data and the covariance matrix of x,y,z data which is computed using the best-fit values for [mx,my,mz,sx,sy,sz,rxy,ryz,rzx]. The comparison of covariance matrices shows that estimated Var(x) is too large by a factor of about 40. The covariance matrices also show the wrong signs for estimated cov(x,y) and cov(z,x). This indicates that the estimated values for rho(x,y) and rho(z,x) have the wrong signs. These are significant errors in estimation of the unknown parameters.
If you modify Test_8.m so that it does what @Torsten suggests, then it will be interesting to see how close it gets to the correct values for the unknown vector [mx,my,mz,sx,sy,sz,rxy,ryz,rzx].
Thank you both for your comments. I believed I had sent the LvecArray.m file; however, here is the code. If I'm not mistaken, this is what you suggested.
For 'Sample 1.xlsx', I went with these two scenarios:
1. LvecInit= [1.50 -7.35 2.38 2.01 5.66 2.82 -0.94 0.95 -0.96]; % Best guess for Lvec, which produces the following covariance matrix for both real and estimation datasets.
Covariance Matrix A_matrix From Real Data:
4.0600 -10.7324 5.4269
-10.7324 32.0362 -15.5045
5.4269 -15.5045 7.9759
Covariance Matrix A_matrix From Estimation:
8.3597 -3.6364 -6.9907
-3.6364 30.0348 4.7806
-6.9907 4.7806 13.8301
RMSE for X, Y, Z (mm): [3.17, 0.56, 5.23] mm
Elapsed time is 18.800424 seconds. Please also see the attached Sample_1_a.fig.
2. LvecInit= [1.40 -7.15 2.18 1.91 5.36 2.72 -0.84 0.75 -0.76]; % A little far from the best value, which produces the following covariance matrix for both real data and estimation.
Covariance Matrix A_matrix From Real Data:
4.0600 -10.7324 5.4269
-10.7324 32.0362 -15.5045
5.4269 -15.5045 7.9759
Covariance Matrix A_matrix From Estimation:
5.4911 -4.7757 -3.7426
-4.7757 29.9628 3.7121
-3.7426 3.7121 13.8008
RMSE for X, Y, Z (mm): [2.54, 0.54, 5.23] mm
Elapsed time is 17.252999 seconds. Please also see the attached Sample_1_b.fig.
Note: for both scenarios, the "fminsearch" used LvecInit while the "fmincon" used LvecStart = LvecInit + 0.1 * randn(size(LvecInit));


 Accepted Answer

[Edit: Some of the equations in my answer do not look right. I pasted them in, and they looked fine in the Matlab Answers editing window, but they do not look right once my answer is posted. Therefore I am editing them. I have not changed the content.]
In your 17 March 2025 posting on Matlab Answers, you included data for a target that moves in a deterministic way in 3 dimensions, as the scanner rotates around the target. The five test data files you attached above also show deterministic motion in 3D. The model with 9 adjustable parameters, which you are fitting in this case, is from Poulsen et al., 2009. That model is for a target that moves randomly. The model is that the target is described by a 3-D Gaussian distribution. The 3D Gaussian distribution can be thought of as an ellipsoid in 3D. The 9 parameters are the centroid of the ellipsoid, the SDs along x,y,z, and the correlations rxy, ryz, rzx. (From these parameters, one can infer the lengths along the principal axes of the ellipsoid, and the orientation of the 3D ellipsoid.) The difference between random and deterministic motion is important. If you are using simulated or real data in which the target moves deterministically as the scanner rotates about the target, then the different 2D views of the target will not be random samples of the distribution, and one is likely to obtain incorrect estimates for the parameters.
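For reference, data that truly follows the 9-parameter model can be simulated with mvnrnd (Statistics and Machine Learning Toolbox). This is only a sketch; the numeric values below are illustrative examples, and the variable names mirror the vector [mx,my,mz,sx,sy,sz,rxy,ryz,rzx] used in this thread:

```matlab
% Sketch: simulate a target that follows the 3D Gaussian model exactly.
mx=1.5; my=-7.4; mz=2.4; sx=2.0; sy=5.7; sz=2.8;   % example parameter values
rxy=-0.9; ryz=-0.95; rzx=0.9;
A = [sx^2       rxy*sx*sy  rzx*sz*sx;              % covariance matrix built
     rxy*sx*sy  sy^2       ryz*sy*sz;              % from the SDs and the
     rzx*sz*sx  ryz*sy*sz  sz^2];                  % correlations
XYZ = mvnrnd([mx my mz], A, 1000);                 % 1000 random 3D positions
```

Fitting the 9 parameters to projections of such simulated data is a cleaner test of the estimator than real deterministic motion, because it satisfies the model's random-sampling assumption.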
Perhaps the script fails to find the correct solution because it gets stuck in a local minimum. You could try different starting points. You could use the mean location, found by a separate method, such as singular value decomposition, as the initial guess for the first three elements of the 9-dimensional minimization. This is the method used by Poulsen et al., 2008, pp. 1588-1589.
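One concrete way to try many starting points is MultiStart from the Global Optimization Toolbox, which wraps fmincon and runs it from multiple random points inside the bounds. This is only a sketch; Objective, Xp, Yp, theta, lb, ub, and LvecInit are assumed to already exist in your workspace:

```matlab
% Sketch: MultiStart around fmincon (Global Optimization Toolbox).
% Objective, Xp, Yp, theta, lb, ub, LvecInit are placeholders for your setup.
opts = optimoptions('fmincon','Algorithm','sqp','Display','off');
problem = createOptimProblem('fmincon', ...
    'objective', @(L) Objective(L,Xp,Yp,theta), ...
    'x0', LvecInit, 'lb', lb, 'ub', ub, 'options', opts);
ms = MultiStart('Display','final');
[Lbest, fbest] = run(ms, problem, 50);   % try 50 random start points
```

GlobalSearch, or the derivative-free patternsearch, can be substituted into the same problem structure if fmincon's gradient-based steps prove unreliable.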
Other ideas for improvement:
  • More restrictive upper and lower bounds for the mean y-coordinate, my: ub(my) = f*max(Yp); lb(my) = f*min(Yp), using all projection angles. f = approximate inverse magnification = SAD/SID.
  • More restrictive upper bound and better initial guess for sy. ub(sy)=(ub(my)- lb(my))/2. sy initial guess = f*std(Yp), using all Yp observations. f=approximate inverse magnification=SAD/SID.
  • More restrictive upper and lower bounds for mx and mz. ub(mx) = ub(mz) = f*max(abs(Xp)); lb(mx) = lb(mz) = -ub(mx).
  • Better initial guess for sx and sz. Let the initial guesses be sx0 and sz0. Three options for the initial guesses are:
  • sx0=sz0=s.d.(Xp), using all projections. This may be a considerable overestimate of sx and sz, for a target whose centroid is not at the origin, because, even if the target does not move in 3D at all (i.e. true sx=sz=0), Xp will vary as the source and detector rotate.
  • For each projection angle ai, compute the deviation of the projected point from the projection of the centroid, corrected for approximate magnification: dXp(ai) = f*Xp(ai) - xc(ai), where f = inverse magnification = SAD/SDD and xc(ai) = projection of the initial 3D estimate of the centroid. Then use these deviations in the estimates below.
  • Use Xp values from projections with angles within ±30° of 0° or 180° to estimate sx0, taking magnification into account: sx0 = std(dXp) over those projections, i.e. using only projections for which |ai| ≤ 30° or |ai - 180°| ≤ 30°, and where the Xp values are negated when |ai - 180°| ≤ 30°. Use Xp values from projections with angles within ±30° of 90° or 270° to estimate sz0 in the same way: sz0 = std(dXp) over projections for which |ai - 90°| ≤ 30° or |ai - 270°| ≤ 30°, where the Xp values are negated when |ai - 90°| ≤ 30°. This method is unlikely to yield useful results if there are few projections in each set.
  • Instead of using initial guesses for sx, sy, sz computed with a specific strategy, as described immediately above, use values chosen at random within the 3D cube defined by the lower and upper bounds for sx, sy,sz. Choose the best fit values that minimize the objective function among all the different initial guesses. This strategy will reduce the chance of getting stuck in a local minimum that is not the global minimum. To sample the different parts of this cube, we probably should use 2^3=8 to 3^3=27 different initial guesses.
  • Instead of using initial guess = 0 for rx, ry, rz, use values chosen at random in the 3D cube with bounds ±1. Choose the best fit values that minimize the objective function among all the different initial guesses. Use 8 to 27 different random initial guesses to explore the cube.
  • If we combine random initial guesses for sx, sy,sz and for rx, ry, rz, we get a 6D hypercube. To sample the different parts of this hypercube, we probably want 2^6=64 to 3^6=729 different initial guesses. This could take a long time.
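The random-restart strategy described in the last few bullets can be sketched as follows. This is not a definitive implementation: lb, ub, Objective, and the data arguments Xp, Yp, theta are assumed to already be defined as in the scripts discussed above:

```matlab
% Sketch: random restarts inside the [lb,ub] box; keep the best fit found.
nStart = 27;                 % e.g. 3^3 restarts, per the discussion above
bestF = inf;  Lbest = [];
for k = 1:nStart
    L0 = lb + rand(size(lb)).*(ub - lb);           % random point in the box
    [Lk,fk] = fmincon(@(L) Objective(L,Xp,Yp,theta), L0, ...
                      [],[],[],[], lb, ub);        % bound-constrained fit
    if fk < bestF, bestF = fk; Lbest = Lk; end     % keep the best minimum
end
```

Sampling more restarts reduces the chance of reporting a local minimum, at a proportional cost in run time.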

22 Comments

@payam samadi, Here is a little script that plots x,y,z for each of the five test data files you provided. It also computes the values of the nine parameters (3 means, 3 standard deviations, 3 correlations) for each file. The plots show that x,y,z are not varying randomly. The results from running the script are shown below.
>> checkTestData
File N mx my mz sx sy sz rxy ryz rzx
1 1434 +1.51 -7.36 +2.39 2.01 5.66 2.82 -0.94 -0.97 +0.95
2 1448 +0.87 -3.79 +1.67 0.65 5.03 2.33 +0.94 -0.99 -0.92
3 1465 -0.91 -5.92 +0.44 1.67 8.55 1.87 +0.96 -0.94 -0.89
4 1429 -0.18 -4.68 +1.65 0.61 8.20 4.15 -0.79 -0.99 +0.83
5 595 -0.42 -1.42 -1.83 0.71 2.02 1.87 -0.37 -0.61 +0.83
Many thanks for your time and comments.
The five test data files you attached above also show deterministic motion in 3D. The model with 9 adjustable parameters, which you are fitting in this case, is from Poulsen et al., 2009. That model is for a target that moves randomly.
Ans: Correct. To do this, I project this 3D motion into the 2D space of the imager to obtain the 2D motion observed by the imager.
From these parameters, one can infer the lengths along the principal axes of the ellipsoid, and the orientation of the 3D ellipsoid.) The difference between random and deterministic motion is important. If you are using simulated or real data in which the target moves deterministically as the scanner rotates about the target, then the different 2D views of the target will not be random samples of the distribution, and one is likely to obtain incorrect estimates for the parameters.
Ans: I am working with real 3D motion data, and I also suspect that projection inconsistencies could be affecting the results. Since I am following Poulsen et al., 2009, I attempted to construct a projection approach suited for the 3D data.
In doing so, I came across an article from the group of Poulsen et al., 2009 (Appendix) [1], which appears to use the same dataset as other publications (1, 2). Given this, I believe the projection method (Xp, Yp) described in that article could be applied here.
Another solution would be to consider my own image geometry for p (the projection point) and f (the inverse magnification factor), and then modify equations (A-4 and A-5) in Objective.m. What do you think?
Additionally, Poulsen et al., 2008 (pp. 1588-1589) appears to simplify the problem by assuming prior knowledge of the mean values in the X, Y, and Z directions. However, this assumption does not fully reflect the true simulation conditions in my case.
Also thanks for the other ideas for improvement. I will go through them.
Ref:
1. Three-dimensional prostate position estimation with a single x-ray imager utilizing the spatial probability density
2. Real-time prostate trajectory estimation with a single imager in arc radiotherapy: a simulation study
I do not think you need to try different projection equations to convert the 3D position to a 2D projection. I think that because, in our previous discussions, I derived one set of equations for the projection, and you used a different set of equations. I compared the results carefully for a set of sample 3D positions, and the different approaches gave exactly the same answers. Therefore I am confident that the equations you are using for projections are correct, if you have not changed them since our earlier discussion.
Yes, you're right. I made a few small changes; however, I will go back to our previous equations.
To me, it looks like the cone beam magnification factor should be,
SDD/(SAD + z_|| )
not,
SDD/(SAD - z_|| )
That's assuming the z-axis points from the source toward the detector, instead of away from it.
Thank you for your comments.
William and I had extensive discussions about Step 3, and we believe that even with changes to this step, the output results will not be affected.
Here is the edited version for step 3.
%% Step 3: Projection Model: Compute 2D projections
% Add simulated error standard deviation (sig) to the projections
sig = 0.05;   % simulated error standard deviation
[Xp,Yp] = deal(nan(num_projections,num_detectors));   % allocate arrays
for i = 1:num_projections   % numT in the original; assumed equal to num_projections
    for j = 1:num_detectors
        % inverse magnification at this source angle
        f_theta = (SAD - (true_T(i,1).*sin(theta_rad(i,j)) + true_T(i,3).*cos(theta_rad(i,j)))) ./ SID;
        Xp(i,j) = (true_T(i,1).*cos(theta_rad(i,j)) - true_T(i,3).*sin(theta_rad(i,j))) ./ f_theta;
        Yp(i,j) = true_T(i,2) ./ f_theta;
        % add measurement noise to the projections
        Xp(i,j) = Xp(i,j) + sig*randn;
        Yp(i,j) = Yp(i,j) + sig*randn;
    end
end
The problem is that even when choosing the closest initial value, LvecInit for sample 1 (for example), which is a bit far from the best value, the optimization is not working in the way we want.
Best value: [+1.51 -7.36 +2.39 2.01 5.66 2.82 -0.94 -0.97 +0.95];
LvecInit= [+1.41 -7.46 +2.19 2.21 5.46 2.92 -0.74 -0.77 +0.79];
The following covariance matrices are produced for the real and estimated datasets.
Covariance Matrix A_matrix From Real Data:
4.0600 -10.7324 5.4269
-10.7324 32.0362 -15.5045
5.4269 -15.5045 7.9759
Covariance Matrix A_matrix From Estimation:
6.4816 -3.9255 4.2139
-3.9255 29.9421 -4.8484
4.2139 -4.8484 10.3860
RMSE for X, Y, Z (mm): [3.00, 0.53, 4.04] mm
Elapsed time is 18.035074 seconds.
I believe my objective function (attached) is defined incorrectly. When I provide the model with an LvecInit value that is only slightly distant from the optimal value, the optimization should stop near the optimum, but it does not. Therefore, my objective function does not align with the findings of Poulsen et al. (2009), Appendix.
What do you guys think?
I have renamed your current file Objective.m to Objective20250402.m for my purposes, to avoid confusion with earlier files of the same name. I see that your current Objective.m is very similar to Objective0.m, which I posted on Matlab Answers on March 11 - except you have removed my name from the comments at the top.
Your latest Objective.m does not use the covariance matrix A, which is computed from the SDs and rho values passed to the function in vector x. My Objective0.m computes B=inv(A). Your script computes B by generating random numbers and using them to make a positive definite matrix.(* See paragraph below.) Your Objective.m does not use the sigmas or the rhos which are included in vector x, which is passed to Objective.m. Therefore the objective function is not utilizing the parameters which the outer minimization routine is trying to optimize. Fix your Objective.m, or use Objective0.m, which I uploaded to this site on March 11.
* You say in the code comments that you generate B the way you do "because the Construct A matrix (Upper Matrix) results in a negative B matrix". Do you mean that B is not necessarily positive definite? I would be very surprised if B is not positive definite, as long as the values passed in vector x are reasonable (i.e. as long as the standard deviations are positive and the rho values are between -1 and +1). If A is a valid covariance matrix, then A will be positive definite, and if A is positive definite, then B=inv(A) is also positive definite (see here).
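A quick way to test positive definiteness is chol, whose second output is 0 exactly when the input matrix is positive definite. The parameter values in this sketch are illustrative examples, not values from the thread's data:

```matlab
% Sketch: chol returns p == 0 iff the matrix is positive definite.
sx=2.0; sy=5.7; sz=2.8; rxy=-0.9; ryz=-0.95; rzx=0.9;  % example values
A = [sx^2       rxy*sx*sy  rzx*sz*sx;
     rxy*sx*sy  sy^2       ryz*sy*sz;
     rzx*sz*sx  ryz*sy*sz  sz^2];
[~,p] = chol(A);
fprintf('p = %d (0 means A, and hence B = inv(A), is positive definite)\n', p);
```

Running this check inside the objective function would reveal exactly which parameter vectors, if any, produce a non-positive-definite A.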
Other notes related to your latest Objective.m:
In your current Objective.m, you compute Mahalanobis distance instead of K. In your comments you call this "A.10", but this is only part of eqn A.10 in Poulsen et al 2009. The calculation of K, which I do in Objective0.m, follows Poulsen's A.10. You then incorporate the det(B) term into FMLE in a different way, and your comment says "Compute log-determinant more robustly". My Objective0.m follows the published equations of Poulsen et al. 2009 more closely than your current Objective.m. I have not done the math or other checks to confirm that your "method 2" gives the same answer as my way, but I recommend that you do those checks, if you have doubts.
If your "method 2" (with Mahalanobis distance) does in fact replicate the formulas of Poulsen et al 2009, then I suspect the problem is due to the fact that your data are distributed quite differently from the 3D Gaussian statistics which are the basis of Poulsen's approach. I said this in my March 31 comment above, but I was not as clear as I could have been. You could test my hypothesis by generating simulated 3D data that DOES have a 3D Gaussian distribution. Then you could see if the minimization algorithm which uses your Objective.m (after you fix the problem I identified above) will stop, if the intial guess for vector x matches the parameters of the simulated data.
The figures in your files Sample_1.xlsx to Sample_5.xlsx show units of mm in the axis labels for x,y,z. You say in the code comments that SAD and SID are 100 and 150 cm. SAD and SID must be expressed in the same units as x,y,z in order to get correct values for the projected Xp,Yp. The minimization should still work if the units are mismatched, because the incorrect Xp,Yp values will be mapped back to x,y,z using the same incorrect assumption, thus cancelling out the error. But it would be a good idea to have consistent units, so that the Xp,Yp values are correct.
I think the marker position units in the sample files really are mm, because motions on the order of 10-15 cm would be unrealistically large.
I think the values of 100 and 150 for SAD and SID really are cm, as the code says, because 100 and 150 mm would be too close: the source and detector would be inside the body.
Hi William,
I am sorry, but I have many files on my computer. Could you please send me the Objective0.m file that you uploaded on March 11?
You are correct about the SAD and SID (100 cm and 150 cm, respectively), and they should be measured in centimeters. However, as you mentioned, this is not a big deal.
It should not take long to locate the file 'Objective0.m' in the recently posted questions. Furthermore, logical deduction suggests that there is a high chance that the original filename was edited and its contents slightly modified. Otherwise, the author’s name would not have been removed from the introductory comments at the top of the function file.
% W. Rose % 20250311
Dear William,
I found the "Objective 0" however, It doesn't work with the code since the Matrix B is not positive definite. Therefore, I made a little change in it and changed the name"Objective_1" by following two things improvement.
1. Ensure proper bounds on the correlations (ρxy, ρyz, ρzx), i.e. -1 ≤ ρ ≤ 1
2. % Ensure positive definiteness using regularization
epsilon = 1e-10; % Small positive value
A = A + epsilon * eye(3); % Adds a small value to the diagonal to ensure positive definiteness
However, this "Objective_1" doesn't work.
Therefore, I write another objective function "Objective_2" with the following update.
1. 23 different methods are applied to ensure Matrix B is positive ( The following are the results for sample 1.xls with the LvecInit=[0.1,0.1,0.1,1.5,1.2,1,-0.9,-0.9,0.9]).
2. mahalanobisDist (this method was Matt J's idea)
---Overall, methods 3, 4, 6, 7, 8, 11, 17, and 19 work, sometimes with the following warning:
Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND = 7.289900e-20.
However, in the end, the results are not even close.
---Other methods (1, 5, 13, 14, 16) give me the following error:
The objective function is undefined at the initial point. Fmincon cannot continue.
---Methods 9, 10, 15, 21, 22, and 23 are based on randomness; among these, methods 10 and 15 provide the best results so far.
---Methods 12 and 16 started with this warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND = 7.289900e-20.
However, in the end, they stopped.
---Method 18 gives me:
Matrix must be positive definite.
---Method 2 stopped with:
Matrix is close to singular or badly scaled.
---Method 20 sometimes gives values near the mean for mx, my, and mz (sometimes it gets stuck in local minima).
Note: when I applied Method 20 to Objective_1, the final results were also far off.
Let Y be an Nx3 array of measured 3D positions. The eigenvectors of the covariance matrix of Y are the principal directions of Y. In other words, the eigenvectors (which are guaranteed to be orthogonal) correspond to the axes of the best-fit (in a least-squares sense) ellipsoid that fits the data. The square roots of the eigenvalues are the standard deviations of the data with respect to the principal axes. You can use the array of eigenvectors to rotate the data to an orientation where the principal axes are aligned with the x,y,z axes. (An alternative interpretation is that you can use the array of eigenvectors to express the original data in terms of the rotated coordinate frame defined by the principal axes.) When I applied these ideas to your data, I got a new understanding of your data: I learned that the largest eigenvalue is a lot bigger than the other two, and the smaller eigenvalues are similar. That means the marker moves mainly along one dimension, without much deviation in the orthogonal directions.
The script below applies these ideas to Sample 2.xlsx, which is the data set used by your Test8.m.
Y=readmatrix('Sample 2.xlsx');
Y=Y(:,2:4); % discard column 1
covY=cov(Y); % covariance matrix
[V,D]=eig(covY,"vector"); % eigenvectors and eigenvalues
s=D.^0.5; % square roots of eigenvalues
fprintf('Cov(Y)=\n'); disp(covY)
Cov(Y)=
    0.4218    3.0784   -1.3875
    3.0784   25.3191  -11.5885
   -1.3875  -11.5885    5.4339
fprintf('Standard deviations along principal axes=\n'); disp(s')
Standard deviations along principal axes= 0.2000 0.3413 5.5694
Display the data:
figure;
subplot(2,2,1); plot(Y(:,1),Y(:,2),'r.')
grid on; xlabel('X'); ylabel('Y'); axis equal
xl=xlim; % get x-axis limits
subplot(2,2,2); plot(Y(:,3),Y(:,2),'r.')
grid on; xlabel('Z'); ylabel('Y'); axis equal
subplot(2,2,3); plot(Y(:,1),Y(:,3),'r.')
grid on; xlabel('X'); ylabel('Z'); axis equal
xlim(xl); % set x-axis limits to match plot above it
The fact that this data varies primarily along one direction may have implications for fitting. For example, you might decide that you only need to find the mean location (3 parameters), the principal direction of motion (2 parameters), and the standard deviation along the principal direction (1 parameter). In this case, you are estimating 6 parameters instead of the 9 parameters in Poulsen's model. When you eliminate poorly constrained parameters from a model, you often find that you can get more reliable estimates for the remaining parameters.
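The reduced parameterization can be computed directly from the 3D data. A minimal sketch, assuming Y is the Nx3 data array loaded as in the script above:

```matlab
% Sketch: 6-parameter description = mean location (3) + principal direction
% (2 angles, encoded here as a unit vector) + SD along that direction (1).
m = mean(Y);                          % mean location
[V,D] = eig(cov(Y),'vector');         % eigenvectors and eigenvalues
[~,imax] = max(D);
u = V(:,imax);                        % principal direction (unit vector)
s1 = sqrt(D(imax));                   % SD along the principal axis
fracVar = D(imax)/sum(D);             % fraction of total variance explained
```

When fracVar is close to 1, as it is for this data, the three remaining variance parameters in the full 9-parameter model are poorly constrained.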
Thank you for your comments. However, there is something that restricts your recommendation. According to Poulsen et al. 2008 (p. 1589, left column, second paragraph),
"For prostate, the LR motion was assumed not to correlate with AP and CC motion, that is, rLR-CC and rLR-AP were approximated by zero instead of being optimized by MLE. This was done because it generally gave better results for prostate, most likely because LR prostate motion usually is small and not well correlated with AP and CC motion (8). For thoracic and abdominal tumors, the generally larger motion ranges and more prominent correlation between LR motion and motion in other directions allowed for inclusion of rLR-CC and rLR-AP in the MLE optimization."
The samples 1-5 are part of liver tumor motion.
[edit: fix spelling error]
Thank you for noting that Poulsen 2008 set rLR-CC and rLR-AP to zero. That is a good example of reducing the number of adjustable parameters in the model in order to get a better fit for the remaining parameters. I understand that that decision was reasonable for the prostate, but might not be reasonable for a thoracic or abdominal tumor. Thank you also for explaining that Sample 1.xlsx through Sample 5.xlsx are from a liver tumor.
I am impressed that you tried 23 different methods to assure that B is positive definite. But I don't understand how B was not positive definite. How did you determine that B was not positive definite? If A is a covariance matrix, then A and B=inv(A) will be positive definite. Can you give an example of a case when B is not positive definite?
I think I am at risk of forgetting the big picture. Please tell me if the following paragraph is a reasonable summary of the big picture.
You want to estimate 3D tumor marker location and variation from 2D projections. Fortunately, you have some actual 3D data (Sample [1..5].xlsx), so you can simulate the 2D projections, run your estimation algorithm, and compare the fitted estimates for mean position, standard deviations, and correlations to the actual values derived from the 3D data directly. You are not happy with the results you are getting, using the algorithm of Poulsen et al. 2009.
I implemented the suggestions I made on 17 March. The script successfully estimates the nine marker position and movement parameters for all five sample files you provided. A screenshot of results is below. The screenshot shows results for each of the five sample data files. For each file, line 1 is the true values of the means, SDs, and correlations, derived from the x,y,z data in the sample data files. Line 2 is the estimated values, based on the 2D projections. The agreement is very good, for all 9 parameters, for all five files. The screenshot also shows the distance between the true mean position in 3D and the estimated mean position. The distance is < 0.2 mm in all cases, which is probably smaller than the marker itself.
The script calls computeError_e(), which is similar to computeError_d. The objective function is Objective0.m (posted here 11 March). Function computeError_e() uses an initial guess and upper and lower limits that are determined from the Xp,Yp data. Specifically, computeError_e() calls computeError_d() and Objective2() to get an initial estimate for the mean position. I emailed Objective2.m to you on 17 March; I see you posted it here, almost identical (but without my name) on 21 March. Function computeError_e() uses the standard deviation of the Xp values, corrected for image magnification, as the initial guess for sd(X) and sd(Z), and it uses the standard deviation of the Yp values, corrected for image magnification, as the initial guess for sd(Y) (as I suggested in my 17 March email and on this site on 31 March). The upper and lower limits for the mean are initial mean estimate +- 2*initial SD estimate. The lower and upper limits for the SDs are 0 and 3*initial SD estimate.
You want to estimate 3D tumor marker location and variation from 2D projections. Fortunately, you have some actual 3D data (Sample [1..5].xlsx), so you can simulate the 2D projections, run your estimation algorithm, and compare the fitted estimates for mean position, standard deviations, and correlations to the actual values derived from the 3D data directly. You are not happy with the results you are getting, using the algorithm of Poulsen et al. 2009.
Yes, Indeed.
You are absolutely right — the content I shared on 21 March was nearly identical to what we had discussed previously, and I failed to include your name or acknowledge your input and earlier contributions. My intention was not to take credit or misrepresent our prior work. I now realize that by omitting your name and the context of our collaboration, I created a misleading impression, and for that, I am truly sorry.
At the time, I was hoping to get help from other contributors and thought that sharing all details, including your name and our previous updates, might discourage engagement. In hindsight, this was a poor judgment on my part, and I fully accept the responsibility for this mistake.
If you're open to it, I would greatly appreciate the chance to discuss this further via email. I value your expertise and guidance, and I hope to make amends and continue learning from you.
@payam samadi, No problem. Here is the script and functions to get the results above. You have the data files.
I also made 5 simulated data files (attached) that have the same covariance matrices as your 5 data files, but the simulated data has a 3D Gaussian distribution, unlike the real data files which you provided. By uncommenting one line, and commenting out the other line, in the main script, the script will use the simulated files instead of the real data files. If you do this, you will see that Lestim (the 1x9 vector estimated from 2D projections) is even closer to Ltrue (the vector computed from the X,Y,Z data) than when we analyze real data. I suspect the estimation is better with the simulated data because Poulsen's method is a maximum likelihood estimator for Gaussian data. The simulated data IS Gaussian, and the real data is not. But, even with real data, Lestim is very close to Ltrue.
Thank you for your kind comments. I too have learned a lot on this site from the others who know more about math and Matlab than me.
Dear William,
Thank you very much for sharing such a valuable script and for the time and support you've provided me during this period. You've been not only a fantastic mentor but also a true friend, and I am sincerely grateful for both.
I also checked the five simulated data files, and you were right; the real data is not Gaussian. If I come up with another idea to improve the optimization process, I will let you know via email.
@Payam,
The 1x9 vector L, which we estimate from 2D projection data using the scripts I provided above, contains information about the location and motion of the target. Here is a script that uses the L vector (Lestim1, estimated from 2D projections of Sample 1.xlsx, see previously posted code) to estimate the principal axis of movement and the percent of marker location variation that is accounted for by movement along the principal axis. I assume that +X=left, +Y=cranial, +Z=anterior.
visualize3dGaussian0
Principal direction=0.306 left, 0.846 caudal, 0.437 anterior. Percent of motion variance due to movement along principal axis=98.8%.
Thank you for the current script. I have summarized all of these scripts, as well as a few I developed earlier, into one comprehensive document, which I have sent to you via email. Additionally, I checked a few other items and explained them in the email as well.
Here is a revised version of the script to assist in visualizing the 3D Gaussian distribution. The input is the 1x9 L vector = [mean(x),mean(y),mean(z),sd(x),sd(y),sd(z),corr(x,y),corr(y,z),corr(z,x)].
L may be estimated from 2D projections or may be computed directly, if the x,y,z coordinates are known.
The script displays the orientation of the principal axis of the 3D Gaussian distribution, and displays the fraction of total variance accounted for by movement along the principal axis.
The script generates 2 figures showing an ellipsoid that bounds 90% of the probability for the marker location. Change the value 0.9 in the code, to generate an ellipse that bounds a different probability.
The first figure is designed to be rotated as desired by the user. The second figure shows three orthogonal two-D views of the ellipsoid, for a supine patient. The ellipsoid color has no specific meaning, but it helps the viewer know how the different views are related.
visualize3dGaussian1
Principal direction=0.306 left, 0.846 caudal, 0.437 anterior. Percent of motion variance due to movement along principal axis=98.8%.
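For anyone adapting the visualization script: the scaling of the bounding ellipsoid comes from the chi-square distribution with 3 degrees of freedom, since the squared Mahalanobis distance of a 3D Gaussian is chi-square distributed. A minimal sketch, assuming A is the 3x3 covariance matrix built from the L vector as elsewhere in this thread:

```matlab
% Sketch: semi-axes of the ellipsoid bounding a given probability mass.
P  = 0.90;                      % coverage; change as desired
r2 = chi2inv(P, 3);             % squared Mahalanobis radius (about 6.25 for 0.90)
[V,D] = eig(A,'vector');        % A = 3x3 covariance matrix, as built above
semiAxes = sqrt(r2 .* D);       % semi-axis lengths along the eigenvectors V
```

Changing P (e.g. to 0.50 or 0.95) rescales all three semi-axes by the same factor, so the ellipsoid's shape and orientation are unchanged.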


More Answers (0)