Right measure for logistic regression of imbalance data and dealing with Complete Separation

Question

Matthias on 5 Jul 2016


https://au.mathworks.com/matlabcentral/answers/293700-right-measure-for-logistic-regression-of-imbalance-data-and-dealing-with-complete-separation

Answered: Ive J on 4 Jun 2022

data.mat

I have a highly imbalanced data set (ratio 1:150) with four predictors, two of which are correlated. I attached the data as data.mat; see also the two figures below.

I would like to use logistic regression, and then validate it, in order to

compare it with a different model,
check which predictors can be omitted, and
check whether the performance can be improved by combining features (feat1, feat1*feat2, etc.).

I also want to undersample the data to reduce the computational effort (I want to use the classifier in a live application).

My questions:

Which measure should I use to check performance? There are many candidates (F-measure, Cohen's kappa, Powers' informedness, AUC of the ROC curve). I first thought of the AUC, because it does not require selecting a threshold as the other measures do. Or is the best method to use the sum of squared errors: (predicted label - continuous classifier output)^2?
How would you reduce the computational effort? I thought about focused undersampling instead of random undersampling, keeping the points where the classes overlap. But I'm guessing this might lead to bias.
To deal with the separation, there is Firth penalized logistic regression as in Heinze2002 and Bayesian logistic regression as in Gelman2008. Both are implemented in R (logistf and bayesglm), with which I'm not familiar. How can I deal with complete separation in MATLAB? I tried to implement
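For the first question, a minimal sketch of two threshold-free measures, in Python with only the standard library. The function names are hypothetical. The AUC is computed from its rank interpretation (the probability that a randomly chosen positive scores higher than a randomly chosen negative, ties counted as one half). The squared-error idea is computed as the mean squared difference between the true 0/1 label and the predicted probability, usually called the Brier score; reading "label" as the true label is an assumption here.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half a win."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared difference between true 0/1 labels and predicted probabilities."""
    return sum((p - label) ** 2 for label, p in zip(labels, probs)) / len(labels)


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)        # 3 of the 4 positive/negative pairs are ordered correctly
brier = brier_score(labels, scores)
```

The pairwise loop costs the number of positives times the number of negatives, which is acceptable for a sketch but would be replaced by a sort-based version on large data. Note that with a 1:150 ratio the Brier score is dominated by the majority class, whereas the AUC only depends on how the two classes are ranked against each other.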

Figure 1. Two features plotted against each other for the full data set:

Figure 2. Random undersampled data, leading to complete separation:


Answer 1

Ive J on 4 Jun 2022


https://au.mathworks.com/matlabcentral/answers/293700-right-measure-for-logistic-regression-of-imbalance-data-and-dealing-with-complete-separation#answer_978765

It's probably a bit late for your original problem, but since this is an important question and MATLAB still lacks such features, note that this GitHub repo implements various penalized logistic regression methods (and much more):

https://github.com/treder/MVPA-Light
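To illustrate what a penalized fit does under complete separation, here is a minimal sketch of Firth's bias-reduced logistic regression in Python with only the standard library. It is not taken from MVPA-Light or from the R packages mentioned in the question; the function names are hypothetical, and it omits the step-halving safeguards a production implementation would have. Each Newton step uses the modified score, sum over i of (y_i - p_i + h_i (1/2 - p_i)) x_i, where h_i is the i-th diagonal of the weighted hat matrix.

```python
import math


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """X: list of rows, each starting with 1.0 for the intercept; y: 0/1 labels."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        prob = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
                for row in X]
        w = [q * (1.0 - q) for q in prob]
        # Fisher information X' W X and its inverse
        info = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n))
                 for k in range(p)] for j in range(p)]
        inv = invert(info)
        # Hat-matrix diagonals h_i = w_i * x_i' (X' W X)^-1 x_i
        h = []
        for i in range(n):
            tmp = [sum(inv[j][k] * X[i][k] for k in range(p)) for j in range(p)]
            h.append(w[i] * sum(X[i][j] * tmp[j] for j in range(p)))
        # Firth-modified score and Newton step
        score = [sum((y[i] - prob[i] + h[i] * (0.5 - prob[i])) * X[i][j]
                     for i in range(n)) for j in range(p)]
        step = [sum(inv[j][k] * score[k] for k in range(p)) for j in range(p)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Completely separated toy data: every x < 0 has y = 0, every x > 0 has y = 1.
X_toy = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y_toy = [0, 0, 1, 1]
beta_toy = firth_logistic(X_toy, y_toy)
```

On this toy data ordinary maximum likelihood has no finite solution (the slope grows without bound), while the penalized fit returns a finite slope. The same iteration could be ported to MATLAB with matrix operations in place of the list comprehensions.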
