# Understanding the update equation in logistic regression/classifier

Z Liang on 16 Oct 2019
Following a tutorial, I tried implementing the steps of building a logistic classifier below:
```matlab
%% Logistic regression tutorial:
%% https://machinelearningmastery.com/logistic-regression-tutorial-for-machine-learning/
% Training data: columns are X1, X2, and the class label Y
temp = [2.7810836   2.550537003   0
        1.465489372 2.362125076   0
        3.396561688 4.400293529   0
        1.38807019  1.850220317   0
        3.06407232  3.005305973   0
        7.627531214 2.759262235   1
        5.332441248 2.088626775   1
        6.922596716 1.77106367    1
        8.675418651 -0.2420686549 1
        7.673756466 3.508563011   1];
X1 = temp(:,1);
X2 = temp(:,2);
Y  = temp(:,3);
% Define the logistic (sigmoid) function as an anonymous function
log_trans = @(x) 1 ./ (1 + exp(-x));
% Initialize coefficients and learning rate
B0 = 0;
B1 = 0;
B2 = 0;
alpha = 0.3;
epoc = 1;
dataSize = size(Y,1);
Y_pred = zeros(dataSize,1);           % preallocate
Acc = zeros(10,1);
for i2 = 1:dataSize*10                % 10 epochs of stochastic updates
    i1 = mod(i2-1, dataSize) + 1;     % cycle through rows 1..dataSize
    x = B0*1 + B1*X1(i1) + B2*X2(i1);
    prediction = log_trans(x);
    % Update each coefficient using the current prediction error
    B0 = B0 + alpha*(Y(i1) - prediction)*prediction*(1 - prediction)*1;
    B1 = B1 + alpha*(Y(i1) - prediction)*prediction*(1 - prediction)*X1(i1);
    B2 = B2 + alpha*(Y(i1) - prediction)*prediction*(1 - prediction)*X2(i1);
    if prediction > 0.5
        Y_pred(i1,1) = 1;
    else
        Y_pred(i1,1) = 0;
    end
    if i1 == dataSize                 % end of an epoch: record accuracy
        Acc(epoc,1) = (dataSize - sum(abs(Y - Y_pred))) / dataSize;
        epoc = epoc + 1;
    end
end
```
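For comparison, the same stochastic-gradient loop can be sketched in Python. This is a line-by-line translation of the MATLAB code above (my own sketch, not the tutorial's code):

```python
from math import exp

# Data rows: (X1, X2, Y), same values as the MATLAB `temp` matrix
data = [
    (2.7810836, 2.550537003, 0),
    (1.465489372, 2.362125076, 0),
    (3.396561688, 4.400293529, 0),
    (1.38807019, 1.850220317, 0),
    (3.06407232, 3.005305973, 0),
    (7.627531214, 2.759262235, 1),
    (5.332441248, 2.088626775, 1),
    (6.922596716, 1.77106367, 1),
    (8.675418651, -0.2420686549, 1),
    (7.673756466, 3.508563011, 1),
]

def log_trans(x):
    # logistic (sigmoid) function, as in the MATLAB anonymous function
    return 1.0 / (1.0 + exp(-x))

b0 = b1 = b2 = 0.0
alpha = 0.3
for epoch in range(10):                 # 10 passes over the data
    for x1, x2, y in data:
        prediction = log_trans(b0 + b1 * x1 + b2 * x2)
        step = alpha * (y - prediction) * prediction * (1 - prediction)
        b0 += step
        b1 += step * x1
        b2 += step * x2

# After training, threshold the predictions at 0.5
preds = [1 if log_trans(b0 + b1 * x1 + b2 * x2) > 0.5 else 0
         for x1, x2, _ in data]
accuracy = sum(p == y for p, (_, _, y) in zip(preds, data)) / len(data)
```

On this small dataset the trained model classifies every row correctly, matching the tutorial's reported 100% accuracy.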
This works: it reproduces the final coefficient values the tutorial reports.
My question concerns these lines:
```matlab
B0 = B0 + alpha*(Y(i1) - prediction)*prediction*(1 - prediction)*1;
B1 = B1 + alpha*(Y(i1) - prediction)*prediction*(1 - prediction)*X1(i1);
B2 = B2 + alpha*(Y(i1) - prediction)*prediction*(1 - prediction)*X2(i1);
```
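To make the scale of these updates concrete, here is the very first iteration worked out numerically: all coefficients start at 0, so x = 0 and prediction = 0.5 (a small Python check of the same arithmetic):

```python
from math import exp

def log_trans(x):
    # logistic function, same as the MATLAB inline function
    return 1.0 / (1.0 + exp(-x))

alpha = 0.3
B0 = B1 = B2 = 0.0
x1, x2, y = 2.7810836, 2.550537003, 0   # first row of temp

x = B0 * 1 + B1 * x1 + B2 * x2          # = 0
prediction = log_trans(x)               # = 0.5
# Common factor shared by all three updates:
step = alpha * (y - prediction) * prediction * (1 - prediction)  # = -0.0375
B0 = B0 + step * 1                      # -> -0.0375
B1 = B1 + step * x1                     # -> -0.104290635
B2 = B2 + step * x2                     # -> -0.0956451376125
```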
I understand that each iteration updates the previous coefficient values (B0, B1, B2), and that the update is scaled by alpha (set to 0.3 per the tutorial). For the remaining three factors, (Y(i1) - prediction), prediction, and (1 - prediction), I cannot arrive at a satisfying intuitive understanding.
prediction comes from the logistic curve (excuse my lack of formal language) and ranges from 0 to 1, while Y is a column vector of 0/1 labels. So I can at least intuit that the closer prediction is to Y(i1), the better the coefficients are performing, and so the smaller the incremental adjustment. What I cannot intuit is the inclusion of the prediction and (1 - prediction) factors, and I would appreciate some help here.
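One way to probe those two factors empirically is to compare the product prediction*(1 - prediction) against a finite-difference slope of log_trans itself; numerically the two coincide at every point, which suggests the mystery factors describe the local slope of the logistic curve (a small Python check, assuming nothing beyond the definitions above):

```python
from math import exp

def log_trans(x):
    return 1.0 / (1.0 + exp(-x))

h = 1e-5
for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    p = log_trans(x)
    product = p * (1 - p)               # the two unexplained factors
    # central-difference estimate of the derivative of log_trans at x
    slope = (log_trans(x + h) - log_trans(x - h)) / (2 * h)
    assert abs(product - slope) < 1e-6
```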