Non-parametric statistical tests

This page compares the Mann-Whitney-Wilcoxon test with two other non-parametric statistical tests: the Kolmogorov-Smirnov test and Pearson's Chi-square test.

We introduce each test separately in Sections 1 to 3, elaborating in more detail on the chosen test (Mann-Whitney-Wilcoxon, Section 1).

We give a qualitative comparison in Section 4 and demonstrate that the test of our choice is sensitive only to a difference in the two distributions' medians. Unlike the other two tests, it has no sensitivity to differences in shape, which is undesired for our application.

1. Mann-Whitney-Wilcoxon test (MWW)

The MWW test was first presented by Wilcoxon in 1945 [1] and put on a more solid mathematical basis by Mann and Whitney two years later [2]. The test assesses whether one of two random variables is stochastically larger than the other, i.e. whether their medians differ.

Let X1 and X2 be two sets of samples drawn from unknown distribution functions. The test of whether the two underlying random variables are identical proceeds in three steps:

  1. The elements of the two sets X1 and X2 are concatenated. If X1 and X2 have cardinalities n1 and n2, respectively, the joint set has cardinality n1 + n2.
  2. The elements in the joint set are sorted in increasing order. The smallest (first) element has rank 1, the largest (last) element has rank n1 + n2.
  3. The ranksum is the sum of the ranks from all those elements that came from the set X1. Wilcoxon denoted this statistic with T.
For example, let X1 = {17.5, -2} and X2 = {23, -11.7, 3.1, 0.9, 42}. Then the three steps are as follows:
  1. X1 ∪ X2: {17.5, -2, 23, -11.7, 3.1, 0.9, 42}
  2. sort: {-11.7, -2, 0.9, 3.1, 17.5, 23, 42}
  3. ranksum: T = 2 + 5 = 7
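
To make the three steps concrete, here is a minimal Python sketch of the rank-sum computation on the example sets above (plain Python, no library; ties are ignored for brevity):

```python
# A minimal sketch of the three steps in plain Python, using the
# example sets from above (ties are ignored for brevity).

def ranksum(x1, x2):
    """Wilcoxon's T: sum of the ranks of x1's elements in the
    sorted concatenation of x1 and x2."""
    joint = sorted(x1 + x2)                     # steps 1 and 2
    return sum(joint.index(v) + 1 for v in x1)  # step 3: 1-based ranks

X1 = [17.5, -2]
X2 = [23, -11.7, 3.1, 0.9, 42]
print(ranksum(X1, X2))  # -> 7, as in the worked example
```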

The expected mean and variance of the statistic T are [2]:

μT = n1*(n1 + n2 + 1) / 2

σT² = n1*n2*(n1 + n2 + 1) / 12
The expected mean and variance can be used to normalize the statistic, yielding the standard z value:
z = (T - μT) / σT
The z value is positive (negative) if the median of the first distribution is larger (smaller) than that of the second distribution. If the medians are equal, the z value is zero. This can be seen in the graphs of the third column in Section 4.
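
Putting the pieces together, a short sketch of the z value computation (again without tie correction); scipy.stats.ranksums computes the same statistic and can serve as a cross-check:

```python
import math

# A sketch of the normalized statistic, assuming the ranksum T from
# above (no tie correction). scipy.stats.ranksums computes the same z.

def mww_z(x1, x2):
    n1, n2 = len(x1), len(x2)
    joint = sorted(x1 + x2)
    T = sum(joint.index(v) + 1 for v in x1)          # rank sum of x1
    mu = n1 * (n1 + n2 + 1) / 2                      # expected mean of T
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # std deviation of T
    return (T - mu) / sigma

X1 = [17.5, -2]
X2 = [23, -11.7, 3.1, 0.9, 42]
print(mww_z(X1, X2))  # -> about -0.39 (tiny samples give noisy values)
```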

2. Kolmogorov-Smirnov test (KS)

The two-sample KS test assesses whether two probability distributions differ [3,4]. It is sensitive to both the location and the shape of the distributions.

Given two samples X1 and X2 with empirical cumulative distribution functions F1(x) and F2(x), respectively, the test statistic is computed as:

Dn1, n2 = supx|F1(x) - F2(x)|

which is the maximum difference between the two cumulative distribution functions over all x. n1 and n2 are the cardinalities of X1 and X2, respectively.

The statistic Dn1, n2 can be normalized using precomputed tables [4].
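
As a sketch, the statistic can be computed directly from the two empirical CDFs; since both are step functions, the supremum is attained at one of the sample points (ties again handled naively):

```python
# A minimal sketch of the two-sample KS statistic as defined above,
# using plain Python ECDFs (ties handled naively).

def ks_statistic(x1, x2):
    """Largest vertical gap between the two empirical CDFs."""
    n1, n2 = len(x1), len(x2)
    ecdf1 = lambda x: sum(v <= x for v in x1) / n1
    ecdf2 = lambda x: sum(v <= x for v in x2) / n2
    # Both ECDFs are step functions, so the supremum is attained
    # at one of the sample points.
    return max(abs(ecdf1(x) - ecdf2(x)) for x in x1 + x2)

X1 = [17.5, -2]
X2 = [23, -11.7, 3.1, 0.9, 42]
print(ks_statistic(X1, X2))  # -> 0.4 (scipy.stats.ks_2samp agrees)
```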

3. Pearson's Chi-square test

Pearson's Chi-square test assesses whether an observed distribution follows an expected distribution [5].

Let Oi and Ei be the observed and expected frequencies in bin i, respectively. Then the test statistic is:

X² = Σi=1..n (Oi - Ei)² / Ei

where X² asymptotically follows a χ²-distribution.
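
A minimal sketch for already-binned counts; the bin values below are made-up illustration numbers, not taken from the comparison in Section 4:

```python
# A minimal sketch of Pearson's statistic for already-binned counts;
# the numbers below are made-up illustration values.

def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

O = [18, 22, 30, 30]  # observed counts per bin (hypothetical)
E = [25, 25, 25, 25]  # expected counts under a rectangular distribution
print(chi_square(O, E))  # -> 4.32 (scipy.stats.chisquare(O, E) agrees)
```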

4. Qualitative Comparison

The following table qualitatively shows, for different input distributions (columns 1 and 2), the behaviour of the three presented tests (columns 3-5). The first distribution is always rectangular. The second distribution differs from it in median (first row) or only in shape (second and third rows).

If the test statistic is zero, the respective graph is marked with a dashed frame. One sees that the MWW test is unequal to zero only in the first case, where the medians differ. The KS test measures the difference in shape in the second row. However, it barely measures the difference in shape in the third row, since the cumulative distribution functions are very similar and the test statistic is close to zero. The Chi-square test also measures the difference in the third row, since it sums up the squared differences over every single bin.

The semantic gray-level enhancement and color transfer are based on tone-mapping curves that adaptively decrease or increase pixel values in different channels, independent of their distribution. As we do not consider the shape of the distribution an extra feature, we use the MWW test, which we have found to be more robust and better suited to our application.
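
The described behaviour can be reproduced numerically. The following sketch draws two samples with equal medians but different shapes (sample sizes and seed are arbitrary choices): the MWW z value stays near zero while the KS statistic does not.

```python
# A sketch reproducing the qualitative behaviour with scipy: two samples
# with equal medians but different shapes (sizes and seed are arbitrary).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
flat = rng.uniform(-1, 1, 1000)    # rectangular, median 0
peaked = rng.normal(0, 0.2, 1000)  # same median, different shape

print(stats.ranksums(flat, peaked)[0])  # MWW z: close to zero
print(stats.ks_2samp(flat, peaked)[0])  # KS D: clearly non-zero
```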

[Table: for each input pair, the columns show the 1st input distribution, the 2nd input distribution, and the resulting Mann-Whitney-Wilcoxon, Kolmogorov-Smirnov, and Pearson's Chi-square test graphs; each panel also plots the expected rectangular distribution for reference.]

References:

[1] Frank Wilcoxon, Individual Comparisons by Ranking Methods, Biometrics Bulletin, vol. 1, nr. 6, pp. 80-83, 1945

[2] H. B. Mann and D. R. Whitney, On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other, The Annals of Mathematical Statistics, vol. 18, nr. 1, pp. 50-60, 1947

[3] A. Kolmogorov, Sulla determinazione empirica di una legge di distribuzione, Giornale dell'Istituto Italiano degli Attuari, vol. 4, pp. 83-91, 1933

[4] N. Smirnov, Table for Estimating the Goodness of Fit of Empirical Distributions, The Annals of Mathematical Statistics, vol. 19, nr. 2, pp. 279-281, 1948

[5] R. L. Plackett, Karl Pearson and the Chi-Squared Test, International Statistical Review, vol. 51, nr. 1, pp. 59-72, 1983