Kruskalwallis test for ECG signal
2 views (last 30 days)
Show older comments
Hello,
I windowed measured ECG signal for windows of lentght: 5s, 10s, 20s. Then I determined for each frame for given windowed signal the features. For each feature and for each frame of windowe signal, separately I calculated the average values and standard deviations after all windows and check whether the differences between the values for different windows are statistically significant.
For nonparametric tests for statistical analysis I used p-values and Kruskalwallis test. However For the given windowed signal for each frame of them I am obtaining the same values. How would the fix look like?
The code is as follows:
% Define files and parameters
files = {'features_arrythmia_window_5_non_overlapping_3D_var.mat', ...
'features_arrythmia_window_10_non_overlapping_3D_var.mat', ...
'features_arrythmia_window_20_non_overlapping_3D_var.mat'};
nFiles = numel(files);
mean_values_all = cell(nFiles, 1);
std_values_all = cell(nFiles, 1);
p_values_windows_all = cell(nFiles, 1);
% Step 1: Analyze statistical differences across windows
for fileIndex = 1:nFiles
% Load the features file
load(files{fileIndex}, 'feature');
[nWindows, nMeasures, nArrhythmias] = size(feature);
% Initialize matrices
mean_values = zeros(nMeasures, nArrhythmias);
std_values = zeros(nMeasures, nArrhythmias);
p_values_windows = zeros(nMeasures, nArrhythmias);
for iMeasure = 1:nMeasures
for iArrhythmia = 1:nArrhythmias
% Calculate mean and standard deviation
mean_values(iMeasure, iArrhythmia) = mean(feature(:, iMeasure, iArrhythmia), 'omitnan');
std_values(iMeasure, iArrhythmia) = std(feature(:, iMeasure, iArrhythmia), 'omitnan');
% Statistical test across windows for the same measure and arrhythmia
data = squeeze(feature(:, iMeasure, iArrhythmia));
group = (1:nWindows)'; % Grouping vector for Kruskal-Wallis test
p_values_windows(iMeasure, iArrhythmia) = kruskalwallis(data, group, 'off');
end
end
% Store results
mean_values_all{fileIndex} = mean_values;
std_values_all{fileIndex} = std_values;
p_values_windows_all{fileIndex} = p_values_windows;
end
Regards
Elzbieta
1 Comment
William Rose
on 5 Jan 2025
Please provide data files that illustrate the problem (or simulated data, if privacy is an issue), so others can run your code.
Answers (1)
William Rose
on 6 Jan 2025
First, I needed to understand what the problm is. Since I needed files to run your script, I made 3 files of random data, as shown below. Each file contains a 3D array, 'feature', with dimensions (12 or 6 or 3) by 4 by 5. The dimensions, per your code, are nWindows, nMeasures, nArrhythmias. I made nWindows=12,6,3 for window length=5,10,20, respectively, with the idea that there'd be more windows if the window length is shorter.
feature=rand(12,4,5);
save('features_arrythmia_window_5_non_overlapping_3D_var.mat','feature')
feature=rand(6,4,5);
save('features_arrythmia_window_10_non_overlapping_3D_var.mat','feature')
feature=rand(3,4,5);
save('features_arrythmia_window_20_non_overlapping_3D_var.mat','feature')
Then I ran your script.
ElzECGanalysis
The script runs without error and produces no console or graphical output. The script produce 3 cell arrays: mean_values_all, std_values_all, and p_values_windows_all. Each cell of the three cell arrays is 4x5, which indicates that the mean, std, and p-values are computed along the nWindows dimension. For example:
disp(mean_values_all{1})
disp(mean_values_all{2})
disp(std_values_all{3})
The problem comes with the p-vaues:
disp(p_values_windows_all{1})
disp(p_values_windows_all{2})
disp(p_values_windows_all{3})
The problem is that p-values are the same, for every element of each cell. The p-values are calculated with the lines
data = squeeze(feature(:, iMeasure, iArrhythmia));
group = (1:nWindows)'; % Grouping vector for Kruskal-Wallis test
p_values_windows(iMeasure, iArrhythmia) = kruskalwallis(data, group, 'off');
I'm not sure what you are trying to do, but this is not it. The commands above take the vector
feature(:,iMeasure, iArrhythmia) (which has length 12 or 6 or 3, depending on the file being processed)
and pairs that vector with a group vector 1:12 or 1:6 or 1:3 and does a K-W test. This has the effect of creating 12 or 6 or 3 groups, with only 1 member in each group. Since there is only 1 member in each group, the K-W test can't give meaningful results. In fact, the p-value in these cases depends only on the length of the vector, and is unaffected by the actual random values in the vector:
disp(kruskalwallis(rand(1,12), 1:12, 'off'))
disp(kruskalwallis(rand(1,6), 1:6, 'off'))
disp(kruskalwallis(rand(1,3), 1:3, 'off'))
Note that the p-values above match the p-values for the data in the files, even though the random vectors are new. Do it again with new random values, and the p-values are the same:
disp(kruskalwallis(rand(1,12), 1:12, 'off'))
disp(kruskalwallis(rand(1,6), 1:6, 'off'))
disp(kruskalwallis(rand(1,3), 1:3, 'off'))
Please specify clearly and specifically the hypothesis you want to test. I suspect you want to test the hypothesis that the distribution of feature(:,j,k) (where j and k are nMeasures and nArrhythmias) is the same for window lengths 5, 10, and 20. If that is correct, then you can't do it the way you're doing it now.
We can do the K-W test across the three window lengths, for each value of j and k, but if we do, there'll be a significant multiple comparisons problem. That is, the probability of making a type I error for the whole group of comparisons is higher than the p-value for each individual comparison. A Bonferroni correction is one way to address this issue, but that has its own issues.
Please explain the meaning and role of dimensions nMeasures and nArrythmias in your data set. Please provide sample data files. I am asking for this info in order to provide more useful advice, including on the potential usefulness of a Bonferroni correction.
1 Comment
William Rose
on 6 Jan 2025
Here is an example of how you can test hypothesis H0: the distribution of feature(:,j,k) is the same, for window lengths 5, 10, and 20 (where j=1:nMeasures and k=1:nArrhythmias). For this example, I assume nWindows=5 for all three files, nMeasures=4 (for example, the features could be RR interval, QRS duration, and QT interval), and nArrhythmias=2. (If nWindows is different for the different files, we will need to adjust the code. If nMeasures or nArrythmias is different for different files, then please explain the data sets and your goals as clearly as possible.) In this example, random numbers are used. The random numbers are all ~U(0,1), except for feature (:,1,1) when window length=5. In that case only, the random numbers are U(2,3). Therefore I expect a significant p-value (p<0.05) when j,k=1,1, and I expect no significant difference for all other j,k.
% Make simulated data files
feature=rand(5,4,2); feature(:,1,1)=feature(:,1,1)+2;
save('features_5s.mat','feature');
feature=rand(5,4,2);
save('features_10s.mat','feature');
feature=rand(5,4,2);
save('features_20s.mat','feature');
% Load the data from the files
files={'features_5s.mat','features_10s.mat','features_20s.mat'};
nFile=numel(files);
for fileIndex=1:nFile
load(files{fileIndex},'feature');
feature4d(:,:,:,fileIndex)=feature;
end
[nWindows,nMeasures,nArrhythmias,nF] = size(feature4d);
fprintf('Size(feature4d)=%d,%d,%d,%d.\n',size(feature4d))
p_values=zeros(nMeasures,nArrhythmias); % allocate array
for j=1:nMeasures
for k=1:nArrhythmias
p_values(j,k)=kruskalwallis(squeeze(feature4d(:,j,k,:)),[],'off');
end
end
disp(p_values)
Note that p_values(1,1) is <0.05, and all the other p_values() are >0.05. This is what we expected, based on the simulated data - see above. p_value(j,k) is probability that the null hypothesis is true. The null hypothesis is H0: features(:,j,k) has the same distribution for files 1, 2, and 3.
See Also
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!