Statistics reported by ranksum are wrong

Jeff Howbert on 16 Apr 2013
This is less a question than a bug report.
The rank sum (U) statistic reported by the ranksum function is much too large. Here is a simple example:
a1 = 1 : 100;
a2 = a1 + 0.01;
[ p, h, stats ] = ranksum( a2, a1 )
p =
0.9037
h =
0
stats =
zval: 0.1209
ranksum: 10100
Working from the formal definition of the Wilcoxon rank sum, the correct value is 5050; I have verified this with an online calculator for the U statistic.
After some experimentation, I believe the value reported for U is actually U + (n1*n2)/2, where n1 and n2 are the sizes of the two samples.
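A minimal sketch of that check in base MATLAB (no Statistics Toolbox needed), computing both quantities for the example straight from their definitions: W is the sum of the pooled ranks of the first input (a2, as in the call above), and U is counted directly as the number of pairs with x(i) > y(j). The ranking code assumes there are no tied values, which holds for these data; the variable names are illustrative.

```matlab
% Sketch: Wilcoxon rank sum W and Mann-Whitney U for the example above,
% computed by hand with base MATLAB. Assumes no tied values (true here).
a1 = 1:100;
a2 = a1 + 0.01;
x = a2;            % first sample, as in ranksum(a2, a1)
y = a1;            % second sample
n1 = numel(x);
n2 = numel(y);

% Rank the pooled data: the smallest value gets rank 1.
pooled = [x, y];
[~, order] = sort(pooled);
ranks = zeros(size(pooled));
ranks(order) = 1:numel(pooled);

% Unadjusted rank sum of the first sample.
W = sum(ranks(1:n1));

% U for the first sample: subtract the smallest possible rank sum.
U = W - n1*(n1 + 1)/2;

% Independent check of U: count pairs (x(i), y(j)) with x(i) > y(j).
U_pairs = sum(sum(bsxfun(@gt, x(:), y(:)')));

fprintf('W = %d, U = %d (pair count %d)\n', W, U, U_pairs);
fprintf('W - U = %d, n1*(n1+1)/2 = %d, n1*n2/2 = %d\n', ...
        W - U, n1*(n1 + 1)/2, n1*n2/2);
```

For these data the script should print W = 10100 and U = 5050, so the gap W - U is n1*(n1+1)/2 = 5050 rather than n1*n2/2 = 5000.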
The reported p and h values agree reasonably well with what I get from other calculators.
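One way to see why p and h still agree: the sketch below reproduces zval and p from the printed rank sum using the large-sample normal approximation. The 0.5 continuity correction is an assumption here (it is what makes the result match the zval printed above), and the variance formula omits the tie correction because these data have no ties. Because the two conventions differ by a constant, the null mean shifts by the same constant and z is unchanged.

```matlab
% Sketch: normal approximation to the rank-sum test for the example.
% Assumes a 0.5 continuity correction and no ties (true for these data).
n1 = 100;          % size of the first sample (a2)
n2 = 100;          % size of the second sample (a1)
W  = 10100;        % rank sum of the first sample, as printed above

N    = n1 + n2;
muW  = n1*(N + 1)/2;             % mean of W under the null hypothesis
sigW = sqrt(n1*n2*(N + 1)/12);   % standard deviation of W (no ties)

% Continuity-corrected z statistic based on W.
z = (W - muW - 0.5*sign(W - muW)) / sigW;

% Two-sided p-value from the standard normal distribution.
p = erfc(abs(z)/sqrt(2));

% Same statistic from U = W - n1*(n1+1)/2, whose null mean is n1*n2/2.
% The variance is the same because U is just W shifted by a constant.
U   = W - n1*(n1 + 1)/2;
muU = n1*n2/2;
zU  = (U - muU - 0.5*sign(U - muU)) / sigW;

fprintf('zval = %.4f, p = %.4f, z from U = %.4f\n', z, p, zU);
```

Running it should print zval = 0.1209 and p = 0.9037, matching the output above, with the z computed from U identical to the one computed from W.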

Answers (1)

the cyclist on 16 Apr 2013
Jeff,
Here is an excerpt from the notes to the equivalent function in R (wilcox.test):
"The literature is not unanimous about the definitions of the Wilcoxon rank sum and Mann-Whitney tests. The two most common definitions correspond to the sum of the ranks of the first sample with the minimum value subtracted or not: R subtracts and S-PLUS does not, giving a value which is larger by m(m+1)/2 for a first sample of size m. (It seems Wilcoxon's original paper used the unadjusted sum of the ranks but subsequent tables subtracted the minimum.)"
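It seems you are seeing this lack of a common convention. MATLAB's ranksum reports the unadjusted sum of the ranks of the first sample (the S-PLUS convention). In your example the first sample has m = 100, so 10100 - 100*101/2 = 10100 - 5050 = 5050, exactly the value you computed.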
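To separate the two conventions in the quoted note more cleanly, here is a sketch on a small made-up data set with unequal sample sizes (x and y are illustrative values, not from the thread). It computes the unadjusted rank sum, which is the S-PLUS convention and the one MATLAB's stats.ranksum follows in the example above, and the R-style value with m(m+1)/2 subtracted. Base MATLAB only; no ties assumed.

```matlab
% Sketch: the two rank-sum conventions on illustrative data with
% unequal sample sizes. Base MATLAB only; assumes no tied values.
x = [3 8 1 12];        % first sample, size m (made-up values)
y = [5 2 9 7 10 4];    % second sample, size n (made-up values)
m = numel(x);
n = numel(y);

% Rank the pooled data (rank 1 = smallest value).
pooled = [x, y];
[~, order] = sort(pooled);
ranks = zeros(size(pooled));
ranks(order) = 1:numel(pooled);

% S-PLUS convention (what stats.ranksum reported in the thread):
% the plain sum of the first sample's ranks.
W_unadjusted = sum(ranks(1:m));

% R convention: subtract the smallest possible rank sum, m*(m+1)/2.
W_adjusted = W_unadjusted - m*(m + 1)/2;

% The adjusted value equals the Mann-Whitney pair count:
% the number of pairs (x(i), y(j)) with x(i) > y(j).
U_pairs = sum(sum(bsxfun(@gt, x(:), y(:)')));

fprintf('unadjusted = %d, adjusted = %d, pair count = %d\n', ...
        W_unadjusted, W_adjusted, U_pairs);
fprintf('m*(m+1)/2 = %d, m*n/2 = %d\n', m*(m + 1)/2, m*n/2);
```

This should print 21 for the unadjusted sum and 11 for both the adjusted value and the pair count; the offset is m*(m+1)/2 = 10, set by the size of the first sample, not m*n/2 = 12.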
