Optimization by adding a penalization term
Hello, I wrote the following function, which minimizes an objective function by adding a penalization term:
function [theta_opt, fval] = optimization_P0_RMDF_Penalise_without_laguerreValue(X, D, a, eta, sigma)
[n, p] = size(X);
    function f = objectif_function(theta)
        fun = 0;
        for i = 1:n
            for j = i+1:n
                S = 0;
                for k = 1:p
                    S = S + (X(i,k) + theta(i,k) - X(j,k) - theta(j,k))^2;
                end
                lambda(i,j) = S;
                fun = fun + D(i,j)^2 + 2*sigma^2*a^2*p + a^2*lambda(i,j) - 2*sqrt(pi)*a*sigma*D(i,j)*laguerreL(1/2, p/2, -(lambda(i,j)/(4*sigma^2)));
            end
        end
        %%%%%%%% Penalization term %%%%%%%%
        nb_disp = 0;
        for i = 1:n
            for k = 1:p
                if theta(i,k) ~= 0
                    nb_disp = nb_disp + 1;
                end
            end
        end
        f = fun + eta*nb_disp;
    end
theta0 = zeros(n, p);
options = optimset('Display', 'iter', 'Algorithm', 'active-set');
[theta_opt, fval] = fminunc(@objectif_function, theta0, options);
end
My problem is that the optimization is performed without taking the penalization term into account. The result for any value of eta is the same as that obtained with eta = 0, which means the "penalization term" part is not being considered during the optimization.
Can someone help me fix this problem?
Thanks in advance,
Heborvi
Answers (3)
Matt J
on 3 Oct 2016
Your penalty term is a discrete function of theta, and hence not differentiable. This violates the assumptions of fminunc. Moreover, because the penalty function is piecewise flat, it has zero gradient almost everywhere, which would likely explain why it is hard to get it to move for small values of eta.
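A minimal sketch illustrating this point (the example values of theta are mine, not from the question): the counting penalty does not change under small perturbations of theta, so any gradient estimate of the penalty is zero.

```matlab
% The l0-style penalty counts nonzero entries, so it is piecewise
% constant: slightly perturbing the nonzero entries of theta leaves
% the count unchanged, and a finite-difference gradient of the
% penalty is exactly zero there.
theta = [1.0, 0, -2.0];
nnz(theta)                          % penalty contribution: 2 nonzeros
nnz(theta + 1e-8*(theta ~= 0))      % still 2 after a tiny perturbation
```

This is why, for small eta, the optimizer behaves as if the penalty were absent.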
John D'Errico
on 3 Oct 2016 (edited 3 Oct 2016)
A major part of your problem is that you have created a discontinuous objective by adding discrete integer amounts to the objective as a penalty. Then you want to use fminunc to optimize the problem.
You clearly do not appreciate that fminunc REQUIRES a differentiable objective. That you call it a penalty term, as opposed to the objective is irrelevant. fminunc sees a discontinuous objective function. That will cause it to do unpredictable things.
In fact, this is a bad thing to do to virtually ANY optimizer, so I am not sure who suggested the idea to you, or where you got it from.
My guess is that your penalty, if it is your goal that this term be close to zero, should be a simple function of the distance away from the goal. Quadratic penalties might make sense to you, maybe even exponential in some form. I can't say. Note that a LINEAR penalty function would again be wrong, since then your objective is again non-differentiable.
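As a sketch of this suggestion (the quadratic surrogate and the helper name `data_term` are illustrative assumptions, not part of the thread), replacing the nonzero count with a smooth penalty gives fminunc a differentiable objective:

```matlab
% Sketch: a smooth quadratic penalty in place of the nonzero count.
% 'data_term' stands for the smooth part of the objective from the
% question (the Laguerre-based sum); eta, n, p are as defined there.
penalty = @(theta) eta * sum(theta(:).^2);        % differentiable everywhere
f       = @(theta) data_term(theta) + penalty(theta);
[theta_opt, fval] = fminunc(f, zeros(n, p));
```

Whether a quadratic, exponential, or other smooth form is appropriate depends on what the penalty is meant to express, as the answer says.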
John D'Errico
on 3 Oct 2016
Fminunc is absolutely out of the question. And since a smoothly increasing penalty is not an option, you cannot really use any optimizer that assumes differentiability.
I think you need to be looking for a mixed-integer programming tool that can handle nonlinear objectives.
Matt J
on 4 Oct 2016 (edited 4 Oct 2016)
"In fact my penalty term is related to the parsimonious choice of vectors theta, therefore I use the l0-norm to do this."
Often people compromise by using the l1-norm instead. This can be formulated differentiably by minimizing
min.  fun(theta) + eta*sum(r)
s.t.  r(i) >=  theta(i),
      r(i) >= -theta(i)
where we have introduced additional unknown variables, r(i). This will require fmincon, as opposed to fminunc.
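A minimal sketch of this reformulation (assuming `fun(theta)` is the smooth data term from the question and n, p, eta are in scope; the variable stacking is my own bookkeeping choice):

```matlab
% Stack x = [theta(:); r(:)]. The constraints r >= |theta| become the
% linear inequalities  theta - r <= 0  and  -theta - r <= 0, which
% fmincon accepts as A*x <= b.
np = n*p;
A  = [ speye(np), -speye(np);      %  theta - r <= 0
      -speye(np), -speye(np)];     % -theta - r <= 0
b  = zeros(2*np, 1);
obj = @(x) fun(reshape(x(1:np), n, p)) + eta*sum(x(np+1:end));
x0  = zeros(2*np, 1);
x   = fmincon(obj, x0, A, b);
theta_opt = reshape(x(1:np), n, p);
```

At the minimum each r(i) equals |theta(i)|, so the added term is exactly eta times the l1-norm of theta.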