permutation_test: Permutation test for hypothesis testing
An implementation of a permutation test for hypothesis testing, i.e., for testing the null hypothesis that two different groups come from the same distribution.
from mlxtend.evaluate import permutation_test
Overview
Permutation tests (also called exact tests, randomization tests, or re-randomization tests) are nonparametric test procedures to test the null hypothesis that two different groups come from the same distribution. A permutation test can be used for significance or hypothesis testing (including A/B testing) without making any assumptions about the sampling distribution (e.g., it does not require the samples to be normally distributed).
In this document, we will refer to the exact method as "permutation test" and the approximated method as "randomization test."
Permutation Test Mechanics
Under the null hypothesis (treatment = control), all permutations are equally likely. (Note that there are (n+m)! permutations, where n is the number of records in the treatment sample and m is the number of records in the control sample.) For a two-sided test, we define the alternative hypothesis that the two samples are different (e.g., treatment != control).
1. Compute the difference (here: of the means) of sample x and sample y
2. Combine all measurements into a single dataset
3. Draw a permuted dataset from all possible permutations of the dataset in 2.
4. Divide the permuted dataset into two datasets x' and y' of size n and m, respectively
5. Compute the difference (here: of the means) of sample x' and sample y' and record this difference
6. Repeat steps 3-5 until all permutations are evaluated
7. Return the p-value: count how many of the recorded differences were at least as extreme as the original difference from 1., and divide this count by the total number of permutations
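The seven steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not mlxtend's implementation: the function name `exact_permutation_test` is hypothetical, the statistic is the two-sided absolute mean difference, and all (n+m)! orderings are enumerated with `itertools.permutations`, so it is only practical for very small samples.

```python
import math
from itertools import permutations
from statistics import mean


def exact_permutation_test(x, y):
    """Hypothetical helper: exact two-sided permutation test on the mean difference."""
    n = len(x)

    def statistic(a, b):
        # Two-sided statistic: absolute difference of the sample means
        return abs(mean(a) - mean(b))

    # Step 1: observed difference between the original samples
    t0 = statistic(x, y)
    # Step 2: combine all measurements into a single dataset
    combined = list(x) + list(y)

    at_least_as_extreme = 0
    total = 0
    # Steps 3 and 6: visit every permutation of the combined dataset
    for perm in permutations(combined):
        # Step 4: split into x' (first n values) and y' (remaining m values)
        x_perm, y_perm = perm[:n], perm[n:]
        # Step 5: compute the permuted difference and compare it with t0
        # right away instead of storing it (isclose guards against rounding)
        t = statistic(x_perm, y_perm)
        if t >= t0 or math.isclose(t, t0):
            at_least_as_extreme += 1
        total += 1

    # Step 7: fraction of permutations at least as extreme as the observed one
    return at_least_as_extreme / total


print(exact_permutation_test([1, 2, 3], [7, 8, 9]))  # 72 / 720 = 0.1
```

Only the splits {1, 2, 3} | {7, 8, 9} and {7, 8, 9} | {1, 2, 3} reach the observed difference of 6, and each appears 3! * 3! = 36 times among the 720 orderings.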
Here, the p-value is defined as the probability, given that the null hypothesis (no difference between the samples) is true, of obtaining results at least as extreme as the results we observed (i.e., the sample difference from 1.).
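More formally, we can express the computation of the p-value as follows (adapted from [2]):

$$p(t \geq t_0) = \frac{1}{(n+m)!} \sum_{j=1}^{(n+m)!} I(t_j \geq t_0),$$

where $t_0$ is the observed value of the test statistic (1. in the list above), $t_j$ is the value of the statistic computed from the j-th resample (5. in the list above), for example $t_j = |\bar{x}'_j - \bar{y}'_j|$ for the two-sided mean difference, and $I$ is the indicator function.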
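Because the mean-difference statistic only depends on which values end up in x' and which in y', each distinct split of the combined data appears n! * m! times among the (n+m)! permutations. The p-value sum over all (n+m)! permutations can therefore be computed equivalently over the C(n+m, n) distinct splits, which is much cheaper. The sketch below checks this numerically on a small example; both helper names are hypothetical and the statistic is assumed to be the two-sided absolute mean difference.

```python
import math
from itertools import combinations, permutations
from statistics import mean


def abs_mean_diff(a, b):
    # Two-sided statistic t = |mean(x') - mean(y')|
    return abs(mean(a) - mean(b))


def p_over_permutations(x, y):
    """Hypothetical helper: average of I(t_j >= t_0) over all (n+m)! orderings."""
    n, combined = len(x), list(x) + list(y)
    t0 = abs_mean_diff(x, y)
    hits = total = 0
    for perm in permutations(combined):
        t = abs_mean_diff(perm[:n], perm[n:])
        if t >= t0 or math.isclose(t, t0):
            hits += 1
        total += 1
    return hits / total


def p_over_splits(x, y):
    """Hypothetical helper: the same average over the C(n+m, n) index splits."""
    n, combined = len(x), list(x) + list(y)
    t0 = abs_mean_diff(x, y)
    hits = total = 0
    for idx in combinations(range(len(combined)), n):
        chosen = set(idx)
        x_split = [combined[i] for i in idx]
        y_split = [combined[i] for i in range(len(combined)) if i not in chosen]
        t = abs_mean_diff(x_split, y_split)
        if t >= t0 or math.isclose(t, t0):
            hits += 1
        total += 1
    return hits / total


x, y = [3, 5, 4], [8, 7, 9, 5]
print(p_over_permutations(x, y), p_over_splits(x, y))  # both equal 3/35
```

Here 5040 orderings collapse into 35 splits, and 3 of those splits are at least as extreme as the observed one.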
Given a significance level alpha that we specify prior to carrying out the permutation test (e.g., alpha=0.05), we fail to reject the null hypothesis if the p-value is greater than alpha.
Note that if the number of permutations is large, evaluating all of them may not be computationally feasible. Thus, a common approximation is to perform k rounds of random permutations (where k is typically a value between 1000 and 2000).
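The randomization (approximate) variant replaces the full enumeration with k randomly drawn permutations. The following is a minimal sketch, assuming the two-sided absolute mean difference as the statistic; `randomization_test` is a hypothetical name and this is not mlxtend's code.

```python
import math
import random
from statistics import mean


def randomization_test(x, y, num_rounds=1000, seed=None):
    """Hypothetical helper: approximate two-sided test on the mean difference."""
    rng = random.Random(seed)
    n = len(x)
    combined = list(x) + list(y)
    t0 = abs(mean(x) - mean(y))

    at_least_as_extreme = 0
    for _ in range(num_rounds):
        # Draw one random permutation instead of enumerating all of them
        rng.shuffle(combined)
        t = abs(mean(combined[:n]) - mean(combined[n:]))
        if t >= t0 or math.isclose(t, t0):
            at_least_as_extreme += 1
    # Monte Carlo estimate of the exact p-value
    return at_least_as_extreme / num_rounds


print(randomization_test([1, 2, 3], [7, 8, 9], num_rounds=20000, seed=0))
```

For this data the exact p-value is 0.1, so the printed estimate should be close to it. Some implementations instead report (count + 1) / (k + 1), which counts the observed arrangement as one of the resamples so the estimate is never exactly zero; check a library's source to see which convention it follows.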
Paired Samples
The permutation (/randomization) tests can also be performed for paired samples by setting paired=True. The paired tests follow the regular permutation test procedure described above, except that the permuted samples are created by randomly swapping the treatment and control data points within each pair.
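The paired variant described above can be sketched as follows: in each round, the two members of every pair are swapped with probability 1/2, and values never move between pairs. This is a minimal illustration assuming the two-sided absolute mean difference as the statistic; `paired_randomization_test` is a hypothetical name, not mlxtend's API.

```python
import math
import random
from statistics import mean


def paired_randomization_test(x, y, num_rounds=1000, seed=None):
    """Hypothetical helper: approximate two-sided paired test on the mean difference."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    rng = random.Random(seed)
    t0 = abs(mean(x) - mean(y))

    at_least_as_extreme = 0
    for _ in range(num_rounds):
        x_perm, y_perm = [], []
        for a, b in zip(x, y):
            # Swap the two members of this pair with probability 1/2
            if rng.random() < 0.5:
                a, b = b, a
            x_perm.append(a)
            y_perm.append(b)
        t = abs(mean(x_perm) - mean(y_perm))
        if t >= t0 or math.isclose(t, t0):
            at_least_as_extreme += 1
    return at_least_as_extreme / num_rounds


# Illustrative call on made-up paired measurements
print(paired_randomization_test([3.1, 2.4, 4.0, 2.8], [2.9, 2.0, 3.1, 2.2],
                                num_rounds=5000, seed=0))
```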
References
 [1] Efron, Bradley and Tibshirani, R. J., An introduction to the bootstrap, Chapman & Hall/CRC Monographs on Statistics & Applied Probability, 1994.
 [2] Unpingco, José. Python for probability, statistics, and machine learning. Springer, 2016.
 [3] Pitman, E. J. G., Significance tests which may be applied to samples from any population, Royal Statistical Society Supplement, 1937, 4: 119-130 and 225-232.
Example 1 - Two-sided randomization test
Perform a two-sided randomization test to test the null hypothesis that two groups, "treatment" and "control", come from the same distribution. We specify alpha=0.01 as our significance level.
treatment = [ 28.44, 29.32, 31.22, 29.58, 30.34, 28.76, 29.21, 30.4 ,
31.12, 31.78, 27.58, 31.57, 30.73, 30.43, 30.31, 30.32,
29.18, 29.52, 29.22, 30.56]
control = [ 33.51, 30.63, 32.38, 32.52, 29.41, 30.93, 49.78, 28.96,
35.77, 31.42, 30.76, 30.6 , 23.64, 30.54, 47.78, 31.98,
34.52, 32.42, 31.32, 40.72]
Since evaluating all possible permutations may take a while, we will use the approximate method (see the introduction for details), i.e., a randomization test:
from mlxtend.evaluate import permutation_test
p_value = permutation_test(treatment, control,
method='approximate',
num_rounds=10000,
seed=0)
print(p_value)
0.0066993300669933005
Since the p-value (approximately 0.0067) is smaller than alpha=0.01, we can reject the null hypothesis that the two samples come from the same distribution.
Example 2 - Permutation test for calculating the p-value for correlation analysis (Pearson's R)
Note: this is a one-sided hypothesis test, since the permutation test counts how many times we obtain a correlation coefficient that is at least as large as the observed value.
import numpy as np
from mlxtend.evaluate import permutation_test
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2, 4, 1, 5, 6, 7])
print('Observed pearson R: %.2f' % np.corrcoef(x, y)[1][0])
p_value = permutation_test(x, y,
method='exact',
func=lambda x, y: np.corrcoef(x, y)[1][0],
seed=0)
print('P value: %.2f' % p_value)
Observed pearson R: 0.81
P value: 0.10
Example 3 - Paired two-sample randomization test
Suppose we have a dataset consisting of the depths (in meters) of seven Wisconsin lakes, measured in 1980 and in 1990 (see lakes_1980 and lakes_1990 in the code below).
We are interested in testing the null hypothesis that the lake depths in 1980 and 1990 do not differ significantly. For this paired two-sample test, we conduct a randomization test for paired samples at a significance level of 0.05:
from mlxtend.evaluate import permutation_test
lakes_1980 = [3.67, 1.72, 3.46, 2.60, 2.03, 2.10, 3.01]
lakes_1990 = [2.11, 1.79, 2.71, 1.89, 1.69, 1.71, 2.01]
p_value = permutation_test(
lakes_1980, lakes_1990, paired=True, method="approximate", seed=0, num_rounds=100000
)
print('P value: %.3f' % p_value)
P value: 0.031
Since the p-value is smaller than the significance threshold of 0.05, we reject the null hypothesis and conclude that there is a significant difference between the lake depths in 1980 and 1990.
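With only seven pairs, the paired test in this example can also be carried out exactly: each of the 2^7 = 128 ways of swapping or not swapping the members of each pair is enumerated once. The sketch below uses plain Python rather than mlxtend (the helper name is hypothetical) and assumes the two-sided absolute mean difference as the statistic. Its exact p-value, 4/128 = 0.03125, agrees with the approximate value of 0.031 reported above.

```python
import math
from itertools import product
from statistics import mean


def exact_paired_permutation_test(x, y):
    """Hypothetical helper: exact two-sided paired test on the mean difference."""
    t0 = abs(mean(x) - mean(y))
    at_least_as_extreme = 0
    total = 0
    # Each entry of `swaps` says whether the i-th pair is exchanged
    for swaps in product([False, True], repeat=len(x)):
        x_perm = [b if s else a for a, b, s in zip(x, y, swaps)]
        y_perm = [a if s else b for a, b, s in zip(x, y, swaps)]
        t = abs(mean(x_perm) - mean(y_perm))
        if t >= t0 or math.isclose(t, t0):
            at_least_as_extreme += 1
        total += 1
    return at_least_as_extreme / total


lakes_1980 = [3.67, 1.72, 3.46, 2.60, 2.03, 2.10, 3.01]
lakes_1990 = [2.11, 1.79, 2.71, 1.89, 1.69, 1.71, 2.01]
print(exact_paired_permutation_test(lakes_1980, lakes_1990))  # 4 / 128 = 0.03125
```

The four extreme assignments are: no swaps, swapping only the one pair with a negative difference, and the two mirror images of those.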
API
permutation_test(x, y, func='x_mean != y_mean', method='exact', num_rounds=1000, seed=None, paired=False)
Nonparametric permutation test
Parameters

x : list or numpy array with shape (n_datapoints,)
    A list or 1D numpy array of the first sample (e.g., the treatment group).

y : list or numpy array with shape (n_datapoints,)
    A list or 1D numpy array of the second sample (e.g., the control group).

func : custom function or str (default: 'x_mean != y_mean')
    Function to compute the statistic for the permutation test.
    If 'x_mean != y_mean', uses func=lambda x, y: np.abs(np.mean(x) - np.mean(y)) for a two-sided test.
    If 'x_mean > y_mean', uses func=lambda x, y: np.mean(x) - np.mean(y) for a one-sided test.
    If 'x_mean < y_mean', uses func=lambda x, y: np.mean(y) - np.mean(x) for a one-sided test.

method : 'approximate' or 'exact' (default: 'exact')
    If 'exact' (default), all possible permutations are considered. If 'approximate', the number of drawn samples is given by num_rounds. Note that 'exact' is typically not feasible unless the dataset size is relatively small.

paired : bool (default: False)
    If True, a paired test is performed by only exchanging each data point with its associated partner.

num_rounds : int (default: 1000)
    The number of permutation samples if method='approximate'.

seed : int or None (default: None)
    The random seed for generating permutation samples if method='approximate'.
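The string shortcuts documented for func map to ordinary two-argument callables. The sketch below mirrors that documented mapping with a hypothetical helper, `resolve_statistic`, which is not part of mlxtend; it uses `statistics.mean` in place of `np.mean` so it runs without NumPy.

```python
from statistics import mean


def resolve_statistic(func):
    """Hypothetical helper: map the documented string shortcuts to callables."""
    if callable(func):
        # A custom function is used as-is
        return func
    shortcuts = {
        # Two-sided: absolute difference of the means
        "x_mean != y_mean": lambda x, y: abs(mean(x) - mean(y)),
        # One-sided: large values support mean(x) > mean(y)
        "x_mean > y_mean": lambda x, y: mean(x) - mean(y),
        # One-sided: large values support mean(x) < mean(y)
        "x_mean < y_mean": lambda x, y: mean(y) - mean(x),
    }
    try:
        return shortcuts[func]
    except KeyError:
        raise ValueError(f"unsupported func string: {func!r}") from None


stat = resolve_statistic("x_mean < y_mean")
print(stat([1, 2, 3], [4, 5, 6]))  # mean(y) - mean(x) = 5 - 2 = 3
```

Because each one-sided statistic is oriented so that larger values favor the alternative, the same "count values at least as large as the observed one" rule works for all three options.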
Returns

p-value under the null hypothesis

Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/permutation_test/