How well can I predict task performance from predictor variables?

Question

Toby Feld on 7 May 2021

https://au.mathworks.com/matlabcentral/answers/824450-how-well-can-i-predict-task-performance-from-predictor-variables

Edited: dpb on 10 May 2021

Hello,

I have the following research question: I would like to predict the performance in a response time experiment (participants have to respond as fast as possible to a target stimulus) from three neural measures: Amplitude of an EEG signal, speed of a saccade (eye movement), and activity in a specific brain area as measured with fMRI.

What I have is a matrix with 5 columns: participant ID, EEG, saccade, fMRI, response time. The first column is just to identify the participants, columns 2-4 are predictor variables and the fifth column is the to-be-predicted variable.

Here are the actual questions: What would be a good way of testing how well I can predict the task performance? A regression I assume? Which function in MATLAB would you recommend? Does it make sense to segment the participants before running the regression?

Thanks,

Tim

1 Comment

dpb on 7 May 2021

We have absolutely no way to answer your question having no knowledge of the experiment.

Rightfully, the analysis methods would have been picked first and then the experiment designed and executed so as to be able to estimate the parameters of the model.

See a G. E. Box white paper that outlines some of the possible problems here <Regression Analysis Applied to Happenstance Data>

Given you already have the data and likely can't repeat the experiment, one must do what one can to at least be aware of potential issues unless the data were taken under well-controlled circumstances.

As for the last question specifically: "maybe". Your independent variables are markedly lacking in such information as age, sex, health status, etc., all of which may be the easily-thought-of confounding variables of which Professor Box speaks, not to mention less obvious but maybe even more important influences such as the amount and quality of sleep the night before.

Answer 1

Scott MacKenzie on 7 May 2021

https://au.mathworks.com/matlabcentral/answers/824450-how-well-can-i-predict-task-performance-from-predictor-variables#answer_694750

Edited: Scott MacKenzie on 8 May 2021

If you want a prediction equation expressing RT as a linear function of "amplitude of EEG signal", "speed of saccade", and "fMRI brain activity" and you've already collected the data, this is doable. Of course, who's to say the relationship with each of these variables is linear? But that's another story (see dpb's comment).

The following code with fake data for 10 participants demonstrates the mechanics of building such a model. And it should get you thinking about your goals.

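% Fake data for 10 participants (uniform random numbers, so no real relationship)
eeg = rand(10,1);
saccade = rand(10,1);
fmri = rand(10,1);
rt = rand(10,1);
% regress needs an explicit column of ones to estimate the intercept
data = [ones(size(eeg)) eeg saccade fmri];
% b = coefficients; stats = [R^2, F statistic, p-value, error variance]
[b, ~, ~, ~, stats] = regress(rt, data)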

Output:

b =
    0.86189
   -0.16142
   -0.58097
    0.034387

stats =
    0.3729    1.1893    0.39018    0.12236

The prediction equation is

rt = 0.862 - 0.161 x eeg - 0.581 x saccade + 0.034 x fmri

with R^2 = 0.3729.

I suggest you read the documentation on the regress function and study the examples. Good luck.
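For what it's worth, here is a minimal sketch of how the outputs of regress can be unpacked and labelled. It assumes the same kind of fake data as above; the seed and the names n, X, coefTable, R2, Fstat, pValue and s2 are my own choices for illustration, not part of the answer. The four entries of stats are, in order, R^2, the F statistic, its p-value, and the estimated error variance, and the second output of regress holds 95% confidence intervals for the coefficients.

```matlab
% Fake data as in the answer above; seeded so the run is repeatable
rng(1);
n = 10;
eeg = rand(n,1);
saccade = rand(n,1);
fmri = rand(n,1);
rt = rand(n,1);
X = [ones(n,1) eeg saccade fmri];   % leading column of ones = intercept

% bint: 95% confidence intervals, one row per coefficient
[b, bint, ~, ~, stats] = regress(rt, X);

% Unpack the four summary statistics returned by regress
R2     = stats(1);   % coefficient of determination
Fstat  = stats(2);   % F statistic for the overall model
pValue = stats(3);   % p-value of that F test
s2     = stats(4);   % estimate of the error variance

% Put estimates and intervals side by side for easier reading
coefTable = table(b, bint(:,1), bint(:,2), ...
    'VariableNames', {'Estimate','Lower95','Upper95'}, ...
    'RowNames', {'Intercept','eeg','saccade','fmri'})
fprintf('R^2 = %.3f, F = %.3f, p = %.3f, error variance = %.4f\n', ...
    R2, Fstat, pValue, s2);
```

With random data like this, expect wide intervals that straddle zero and a large p-value for the F test, whatever R^2 happens to come out as.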

12 Comments

Toby Feld on 8 May 2021

You both made some excellent points. I do indeed not want to fish around for good model fits, I want to do a cross validation. For example something like this:

% Run regression on simulated data
iterations = 100; % number of random train/test splits
subjectTotal = 1000; % number of simulated subjects
nTrain = 500; % subjects used for training; the rest are used for testing
% Create data
eeg = rand(subjectTotal,1);
saccade = rand(subjectTotal,1);
fmri = rand(subjectTotal,1);
rt = rand(subjectTotal,1);
% Combine data into matrix and add a column of ones for the intercept
data = [ones(size(eeg)) eeg saccade fmri];
mseTest = zeros(iterations,1); % preallocate; the name avoids shadowing the toolbox function fit
for i = 1:iterations
    shuffledSubjects = randperm(subjectTotal); % shuffle subjects
    subjectsTrain = shuffledSubjects(1:nTrain); % select sample for training
    subjectsTest = shuffledSubjects(nTrain+1:end); % select remaining subjects as sample for testing
    X1 = data(subjectsTrain,:); % predictor variables, training
    X2 = data(subjectsTest,:); % predictor variables, testing
    Y1 = rt(subjectsTrain,:); % predicted variable, training
    Y2 = rt(subjectsTest,:); % predicted variable, testing
    b = regress(Y1,X1); % fit the regression model on the training sample
    estimates = X2*b; % using weights from regression model, estimate values for test data
    ResTest = Y2-estimates; % calculate the residuals
    mseTest(i) = mean(ResTest.^2); % mean squared prediction error on the test sample
end
mean(mseTest) % average test error across iterations (the smaller, the better the fit)

It's quite interesting how the model fit changes with different samples for training and testing. If I use 900 out of 1000 for training I get better fits than for 100 or so. Of course that only works for real data as random data shouldn't allow any fit. Does this make sense? Thanks!
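As a rough sketch of the same idea using k-fold cross-validation instead of repeated random splits: the block below uses cvpartition and fitlm from the Statistics and Machine Learning Toolbox. The simulated effect sizes, the noise level, k = 10, and the names mseModel, mseBaseline and cvR2 are assumptions made up for illustration, not values from this thread. Comparing against a baseline that always predicts the training mean gives the test error a reference point, which a bare MSE lacks.

```matlab
rng(42);                       % repeatable example
n = 200;                       % hypothetical number of participants
eeg = randn(n,1);
saccade = randn(n,1);
fmri = randn(n,1);
% Assumed ground truth for illustration: rt depends on eeg and fmri only
rt = 450 + 18*eeg + 38*fmri + 20*randn(n,1);
X = [eeg saccade fmri];        % fitlm adds the intercept itself

k = 10;
cvp = cvpartition(n, 'KFold', k);
mseModel = zeros(k,1);         % test error of the regression model
mseBaseline = zeros(k,1);      % test error when predicting the training mean

for i = 1:k
    trainIdx = training(cvp, i);   % logical index of training rows
    testIdx  = test(cvp, i);       % logical index of held-out rows

    mdl = fitlm(X(trainIdx,:), rt(trainIdx));
    pred = predict(mdl, X(testIdx,:));

    mseModel(i) = mean((rt(testIdx) - pred).^2);
    mseBaseline(i) = mean((rt(testIdx) - mean(rt(trainIdx))).^2);
end

% Cross-validated R^2: share of baseline error removed by the model
cvR2 = 1 - mean(mseModel) / mean(mseBaseline);
fprintf('CV MSE = %.1f, baseline MSE = %.1f, CV R^2 = %.3f\n', ...
    mean(mseModel), mean(mseBaseline), cvR2);
```

If rt is replaced by pure noise, as in the simulation above, cvR2 should hover around zero or go slightly negative, which is the "random data shouldn't allow any fit" case.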

Scott MacKenzie on 10 May 2021

I'm not sure. Perhaps dpb will have some ideas to offer.

dpb on 10 May 2021

Edited: dpb on 10 May 2021

I don't have time to do much right now; I am interested and will try to get back later -- just one observation to emphasize what was said before -- "R-sq isn't the tell-all, end-all" to evaluate a model.

"I also tried nonlinear approaches: lm = fitlm([EEG saccade fMRI],rt,'quadratic') and get an even better R^2 ..."

A quadratic surface is still a linear model; just higher order;
Of course you get a higher R-sq, you've added six (6) additional terms and reduced the residual numbers of DOF by that many as well.
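A small sketch of that point, under the assumption of pure-noise data with the same 45 observations and 3 predictors (the seed and the names mdlLin and mdlQuad are mine): the quadratic model has 10 coefficients against 4, so ordinary R-sq can only go up, while adjusted R-sq charges for the lost degrees of freedom.

```matlab
rng(7);                 % repeatable example
n = 45;                 % same number of observations as in the thread
X = randn(n,3);         % three predictors, unrelated to y by construction
y = randn(n,1);         % pure noise response

mdlLin  = fitlm(X, y);                 % intercept + 3 linear terms
mdlQuad = fitlm(X, y, 'quadratic');    % adds 3 interactions + 3 squares

fprintf('linear:    %2d coefficients, DFE = %d, R2 = %.3f, adjusted R2 = %.3f\n', ...
    mdlLin.NumCoefficients, mdlLin.DFE, ...
    mdlLin.Rsquared.Ordinary, mdlLin.Rsquared.Adjusted);
fprintf('quadratic: %2d coefficients, DFE = %d, R2 = %.3f, adjusted R2 = %.3f\n', ...
    mdlQuad.NumCoefficients, mdlQuad.DFE, ...
    mdlQuad.Rsquared.Ordinary, mdlQuad.Rsquared.Adjusted);
```

Because the linear model is nested in the quadratic one, the ordinary R-sq of the larger model can never be lower, even when there is nothing to find.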

You seemingly still haven't looked at the model nor the data itself, though...the "exploratory" part --

>> mdl=fitlm([EEG saccade, fMRI],rt)
mdl = 
Linear regression model:
    y ~ 1 + x1 + x2 + x3
Estimated Coefficients:
                   Estimate      SE       tStat        pValue  
                   ________    ______    ________    __________
    (Intercept)     464.76     8.2759      56.158    2.0735e-40
    x1              18.336     2.3883      7.6777    1.8516e-09
    x2             0.11507     3.0848    0.037303       0.97042
    x3              37.777     3.1301      12.069    4.4618e-15
Number of observations: 45, Error degrees of freedom: 41
Root Mean Squared Error: 18.3
R-squared: 0.939,  Adjusted R-Squared: 0.935
F-statistic vs. constant model: 211, p-value = 6.17e-25
>> 

NB: the coefficient x2 ~ saccade has an SE (standard error of the estimate) that is ~27X the magnitude of the coefficient itself -- IOW, it is meaningless, as that says the coefficient is ~0.1 +/- 3 (one standard error) -- or anywhere between roughly [-3.0, 3.2]. The p-value of 0.97 says the same thing.
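Rather than eyeballing estimate +/- SE, coefCI returns proper confidence intervals for every coefficient. The sketch below runs on simulated data, since the real data are not posted here; the effect sizes, noise level, seed and the name ciTable are assumptions for illustration only, with saccade given no true effect.

```matlab
rng(3);                 % repeatable example
n = 45;
EEG = randn(n,1);
saccade = randn(n,1);
fMRI = randn(n,1);
% Assumed ground truth: saccade has no effect on rt
rt = 465 + 18*EEG + 38*fMRI + 18*randn(n,1);

mdl = fitlm([EEG saccade fMRI], rt, ...
    'VarNames', {'EEG','saccade','fMRI','rt'});

ci = coefCI(mdl);       % 95% intervals by default, one row per coefficient
ciTable = table(mdl.Coefficients.Estimate, ci(:,1), ci(:,2), ...
    'VariableNames', {'Estimate','Lower95','Upper95'}, ...
    'RowNames', mdl.CoefficientNames)

% Refit without saccade and compare the error of the two models
mdlReduced = fitlm([EEG fMRI], rt, 'VarNames', {'EEG','fMRI','rt'});
fprintf('RMSE full = %.2f, RMSE reduced = %.2f\n', mdl.RMSE, mdlReduced.RMSE);
```

An interval that comfortably spans zero is the cue that the term is a candidate for dropping.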

So, to interpret this model more accurately, it's really the same thing as

>> fitlm([EEG, fMRI],rt)
ans = 
Linear regression model:
    y ~ 1 + x1 + x2
Estimated Coefficients:
                   Estimate      SE      tStat       pValue  
                   ________    ______    ______    __________
    (Intercept)     464.85     7.7969     59.62     3.197e-42
    x1              18.367     2.2196    8.2748    2.3207e-10
    x2              37.852     2.3729    15.951    1.9796e-19
Number of observations: 45, Error degrees of freedom: 42
Root Mean Squared Error: 18.1
R-squared: 0.939,  Adjusted R-Squared: 0.936
F-statistic vs. constant model: 324, p-value = 3.02e-26
>> 

which actually is just slightly better with fewer terms -- RMSE 18.1 vs 18.3

"Everything should be as simple as possible, but not simpler." -- Einstein

Goes for model-building as well as physics.

This doesn't even start on residuals analyses, etc., etc., etc., ...
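As one possible starting point for the residuals part, a LinearModel carries its residuals and fitted values, and plotResiduals draws the usual diagnostic plots. This is only a sketch on simulated data with assumed effect sizes (the names res and fittedVals are mine); on real data you would be looking for curvature or a funnel shape against the fitted values, and for departures from the straight line in the probability plot.

```matlab
rng(5);                 % repeatable example
n = 45;
X = randn(n,2);         % stand-ins for EEG and fMRI
y = 465 + 18*X(:,1) + 38*X(:,2) + 18*randn(n,1);   % assumed ground truth

mdl = fitlm(X, y);

res = mdl.Residuals.Raw;    % observed minus fitted
fittedVals = mdl.Fitted;    % model predictions for the training data

figure;
subplot(1,2,1);
plotResiduals(mdl, 'fitted');        % residuals vs fitted values
subplot(1,2,2);
plotResiduals(mdl, 'probability');   % normal probability plot of residuals

fprintf('mean residual = %.2e, largest |residual| = %.1f\n', ...
    mean(res), max(abs(res)));
```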
