How to statistically determine how much two curves are different?
Hi,
I have two curves with different X and Y values, but their shapes are similar. I would like to express how alike they are as a percentage or some other number.
I tried the Kolmogorov-Smirnov test in MATLAB, but the problem is that my curves also have different dimensions. One is 1x1001 and the other is 1x1000000.
Attached are the two data sets, in case that is helpful.
Thanks for any help.
Regards,
Taynara
10 Comments
Ive J
on 4 Feb 2021
I don't understand the question, what do you mean exactly by "to calculate how diff they are in numbers" ?
Why do you wanna use the KS test? What are your assumptions and research questions here?
Regardless, for "One is 1x1001 and the other is 1x1000000" you still can use kstest2.
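To illustrate the point about unequal lengths, here is a minimal sketch. The vectors `a` and `b` are synthetic stand-ins (not the attached data), and `kstest2` requires the Statistics and Machine Learning Toolbox.

```matlab
% Sketch: kstest2 accepts two samples of different lengths.
% a and b are synthetic placeholders, not the data from the question.
rng(0);                          % reproducible random numbers
a = randn(1, 1001);              % stand-in for the 1x1001 vector
b = randn(1, 100000) + 0.2;      % stand-in for the longer vector, shifted by 0.2
[h, p, ksstat] = kstest2(a, b);  % H0: both samples come from the same distribution
fprintf('h = %d, p = %.3g, KS statistic = %.3f\n', h, p, ksstat);
```

Note that `kstest2` compares the distributions of the Y values only; it ignores which X each value belongs to, so it does not compare curve shapes as such.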
Tay
on 4 Feb 2021
Tay
on 4 Feb 2021
Every test (roughly) has some assumptions, and KS is no exception. In your case you're interested in testing whether those two samples come from the same distribution, which of course depends on your research questions; you cannot choose a test simply because it gives you some p-value without knowing the underlying assumptions. Say you've measured refractive index in two different experimental settings (CT and Fraunhofer) and you are asking if both come from the same distribution (H0 = P_CT == P_Fraunhofer). But what happens if you reject H0 here? You should be able to discuss and interpret this.
That being said, if you just simply wanna see how much they differ with respect to your dependent variable (RI), the answer below would work for you.
Tay
on 6 Feb 2021
In this case I guess you would be able to use a Wilcoxon Signed Rank Test (your data doesn't look very normal), but you must keep paired samples:
% height_CIT and CIT match
% height_Franhoufer and Franhoufer match
[f_cit, f_franhoufer] = ismember(height_CIT, height_Franhoufer); % under the assumption that height_CIT and height_Franhoufer have overlaps, otherwise you may use a cutoff for height to find their paired samples
f_franhoufer(f_franhoufer < 1) = [];
CIT = CIT(f_cit); Franhoufer = Franhoufer(f_franhoufer); % only keep paired samples
[p, h, stat] = signrank(CIT, Franhoufer); % does their underlying median significantly differ?
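If the heights do not overlap exactly, the "cutoff" idea mentioned in the comment above could be sketched with `ismembertol`. The data values and the tolerance `tol` below are made up for illustration; only the variable names follow the snippet above.

```matlab
% Hypothetical example data; names mirror the snippet above.
height_CIT        = [0 5 10 15 20];
CIT               = [1.50 1.51 1.45 1.44 1.40];
height_Franhoufer = [0.001 4.999 10.002 12 20.0005];
Franhoufer        = [1.52 1.55 1.43 1.42 1.41];

tol = 0.01;  % assumed absolute tolerance on height; choose it for your data
% 'DataScale', 1 makes tol an absolute (not relative) tolerance
[f_cit, idx] = ismembertol(height_CIT, height_Franhoufer, tol, 'DataScale', 1);

CIT_paired        = CIT(f_cit);              % CIT values that found a partner
Franhoufer_paired = Franhoufer(idx(f_cit));  % their partners, in the same order
```

Here the height 15 has no partner within the tolerance, so it is dropped, and the two paired vectors can then be passed to `signrank` as before.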
Nevertheless, as I also pointed out before, your research questions should be clear, and of course so should the reason(s) why you intend to perform a hypothesis test. I suspect your data comes from some sort of simulation/modeling. In this scenario, you don't have any limitations in sample size, and as a result you can easily increase the power to reject the null hypothesis (for a more complete discussion look here), rendering any p-value practically useless. But if the data come from some experiments (say you measured RI using different devices or experimental settings), then hypothesis tests may make sense.
You may also use correlation coefficients (your data looks monotonic, so you can use Spearman for instance) to see how well those two graphs correlate, but that does not measure the magnitude of their difference.
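A rough sketch of the Spearman idea, using synthetic monotonic curves as stand-ins for the real data. Both curves are first brought onto a common grid with `interp1`; `corr` with `'Type','Spearman'` requires the Statistics and Machine Learning Toolbox.

```matlab
% Synthetic stand-ins sampled on different grids (not the attached data)
x1 = linspace(0, 4, 1001);   y1 = exp(-x1);
x2 = linspace(0, 4, 200);    y2 = exp(-0.9*x2);

y2i = interp1(x2, y2, x1);   % resample the second curve onto x1
rho = corr(y1(:), y2i(:), 'Type', 'Spearman');  % rank correlation
maxDiff = max(abs(y1 - y2i));                   % the curves still differ
fprintf('Spearman rho = %.4f, max |difference| = %.4f\n', rho, maxDiff);
```

Both stand-in curves are strictly decreasing, so their ranks agree perfectly even though the values differ, which is the limitation noted above: correlation describes how well the shapes track each other, not how far apart the curves are.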
Tay
on 7 Feb 2021
As an example, assume we measured the blood pressure of some patients before and after some treatment. We wonder if this treatment has significantly influenced the blood pressure, so we compare the blood pressure of the same patients before and after the treatment. We do that by comparing the medians of the two samples (only for the sake of using the Wilcoxon Signed Rank Test; if we have a large enough sample size or normally distributed variables, we do a paired t-test instead):
load hospital
[p, h] = signrank(hospital.BloodPressure(:, 1), hospital.BloodPressure(:, 2))
p =
3.7965e-18
h =
logical
1
The small p-value here suggests we are safe to reject the null hypothesis (our alpha is usually 0.05, which of course should be set before running any hypothesis test). Therefore the medians of the two samples are not equal, so the treatment indeed has some effect on blood pressure values.
For normality, you should plot the histogram of your variable (see histogram) to see if your variables are normal or not (if not, you cannot use more powerful tests like the t-test).
height_CIT and height_Franhoufer would be the height values (on the X-axis) for each value of your response (on the Y-axis). For paired tests you need to compare the same samples together (look at my example above: each patient was measured twice, before and after the treatment). So your two vectors should be something like:
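A small sketch of such a normality check, on made-up paired measurements (`before` and `after` are hypothetical). For a paired t-test it is the paired differences that should look roughly normal, so the sketch inspects those. `lillietest` is an optional formal check from the Statistics and Machine Learning Toolbox.

```matlab
rng(1);                                   % reproducible random numbers
before = 120 + 10*randn(100, 1);          % hypothetical first measurement
after  = before - 3 + 4*randn(100, 1);    % hypothetical second measurement
d = after - before;                       % paired differences

histogram(d)                              % visual check of the shape
xlabel('after - before'); ylabel('count');

[hL, pL] = lillietest(d);                 % Lilliefors test, H0: d is normal
fprintf('lillietest: h = %d, p = %.3f\n', hL, pL);
```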
sample   height   CIT    Franhoufer
1        5        1.51   1.55
2        10       1.45   1.43
.        .        .      .
Amanda Botelho Amaral
on 17 Dec 2021
I have a similar problem. How did you solve it?
Answers (1)
clc,clear
x1 = 0:0.2:4; % fine mesh
x2 = 0:0.5:4; % coarse mesh
y1 = sin(x1); % fine data
y2 = sin(0.9*x2); % coarse data
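y22 = interp1(x2,y2,x1); % interpolate the coarse data onto the fine mesh
plot(x1,y1,'.-r') % fine curve
hold on
plot(x2,y2,'.-b') % coarse curve
plot(x1,y22) % coarse curve resampled on x1
plot([x1; x1],[y1; y22],'-b') % vertical segments show the pointwise difference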
hold off
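Once both curves are on the same mesh, the difference can be reduced to a single number. The sketch below is one possible choice of metrics (RMSE, maximum absolute difference, and a relative L2 difference in percent); these are suggestions, not the only way to define "how different" two curves are.

```matlab
x1 = linspace(0, 4, 21);   % fine mesh (same grid as 0:0.2:4)
x2 = linspace(0, 4, 9);    % coarse mesh (same grid as 0:0.5:4)
y1 = sin(x1);              % fine data
y2 = sin(0.9*x2);          % coarse data
y22 = interp1(x2, y2, x1); % coarse data resampled on the fine mesh

d      = y1 - y22;                    % pointwise difference
rmse   = sqrt(mean(d.^2));            % root-mean-square difference
maxAbs = max(abs(d));                 % worst-case difference
relPct = 100 * norm(d) / norm(y1);    % relative L2 difference, in percent
fprintf('RMSE = %.4f, max = %.4f, relative = %.2f %%\n', rmse, maxAbs, relPct);
```

The relative value treats `y1` as the reference curve; swapping the roles of the two curves changes the denominator and therefore the percentage.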
