stats_tools.perm_test

stats_tools.perm_test(X, Y, paired=None, useR=False, nperms=10000, tail='two', correction='maxT', get_dist=False, mth='t', verbose=True, fname=None, vars=None, g1str=None, g2str=None)

Perform permutation tests for paired or unpaired, univariate or multivariate two-sample problems.

Parameters:


X : NumPy 2darray

An #samples-by-#variables array holding the data of the first group.

Y : NumPy 2darray

An #samples-by-#variables array holding the data of the second group.

paired : bool

Switch to indicate whether the two data-sets X and Y represent paired (paired = True) or unpaired data.

useR : bool

Switch that determines whether the R library flip is used for testing. Note: unpaired data can only be tested in R!

nperms : int

Number of permutations used for shuffling the input data.

tail : str

The alternative hypothesis the data is tested against. If tail = ‘less’, the null is tested against the alternative that the mean of the first group is less than the mean of the second group (‘lower tailed’). If tail = ‘greater’, the alternative is that the mean of the first group is greater than the mean of the second group (‘upper tailed’). For tail = ‘two’, the alternative hypothesis is that the means of the two groups differ (‘two tailed’).

correction : str

Multiplicity correction method. If the R package flip is not used for testing (paired = True and useR = False), this option is ignored, since MNE’s permutation t-test only supports p-value correction using the maximal test statistic Tmax [R14]. Otherwise (if paired = False or useR = True) the R library flip is used, which supports the options “holm”, “hochberg”, “hommel”, “bonferroni”, “BH”, “BY”, “fdr”, “none”, “Fisher”, “Liptak”, “Tippett”, “MahalanobisT”, “MahalanobisP”, “minP”, “maxT”, “maxTstd”, “sumT”, “Direct”, “sumTstd”, “sumT2” (see [R13] for a detailed explanation). By default “maxT” is used.

get_dist : bool

Switch that determines whether the sampling distribution used for testing is returned (by default it is not returned).

mth : str

Only relevant if testing is done in R (useR = True or paired = False). If mth is not specified, a permutation t-test is performed. Available (but completely untested!) options are: “t”, “F”, “ANOVA”, “Kruskal-Wallis”, “kruskal”, “Mann-Whitney”, “sum”, “Wilcoxon”, “rank”, “Sign” (see [R13] for details). Note that by design this wrapper only supports two-sample problems (X and Y). To analyze k-sample data using, e.g., an ANOVA, please refer to the flip package directly.

verbose : bool

If verbose = True then intermediate results, progression messages and a table holding the final statistical evaluation are printed to the prompt.

fname : str

If provided, testing results are saved to the csv file fname. The file-name can be provided with or without the extension ‘.csv’ (WARNING: existing files will be overwritten!). By default, the output is not saved.

vars : list or NumPy 1darray

Names of the variables that are being tested. Only relevant if verbose = True and/or fname is not None. If vars is None and output is shown and/or saved, a generic list [‘Variable 1’, ‘Variable 2’, ...] will be used in the table summarizing the final results.

g1str : str

Name of the first sample. Only relevant if verbose = True and/or fname is not None. If g1str = None and output is shown/saved a generic group name (‘Group 1’) will be used in the table showing the final results.

g2str : str

Name of the second sample. Only relevant if verbose = True and/or fname is not None. If g2str = None and output is shown/saved a generic group name (‘Group 2’) will be used in the table showing the final results.

Returns:

stats_dict : dictionary

Test-results are saved in a Python dictionary. By default stats_dict has the keys ‘pvals’ (the adjusted p-values) and ‘statvals’ (values of the test statistic observed for all variables). If get_dist = True then an additional entry ‘dist’ is created for the employed sampling distribution.
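
As a hedged illustration of how the returned dictionary might be consumed, the sketch below lists the variables whose adjusted p-value falls below a threshold. It relies only on the documented keys ‘pvals’ and ‘statvals’. The helper name significant_variables, the threshold alpha and the assumption that both entries hold one value per variable are choices made here, and the numbers are made up for illustration.

```python
import numpy as np

def significant_variables(stats_dict, names, alpha=0.05):
    """Return (name, p-value, statistic) for every variable with p < alpha.

    Hypothetical helper: assumes 'pvals' and 'statvals' each hold one
    value per variable, in the same order as `names`.
    """
    pvals = np.asarray(stats_dict['pvals'], dtype=float).ravel()
    stats = np.asarray(stats_dict['statvals'], dtype=float).ravel()
    return [(n, float(p), float(s))
            for n, p, s in zip(names, pvals, stats) if p < alpha]

# Made-up numbers for illustration only, not the output of an actual test
example = {'pvals': np.array([0.001, 0.2, 0.03]),
           'statvals': np.array([-5.1, -0.9, -2.4])}
hits = significant_variables(example, ['HR', 'BP', 'BT'])
print(hits)
```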

See also

printstats: routine to pretty-print results computed by a hypothesis test
flip: an R library for univariate and multivariate permutation (and rotation) tests
mne: a software package for processing magnetoencephalography (MEG) and electroencephalography (EEG) data, available at the Python Package Index

Notes

This routine is merely a wrapper and does not do any heavy computational lifting. For paired data with useR = False, the function permutation_t_test of the MNE package [R14] is called. If the samples are independent (paired = False) or useR = True, the R library flip [R13] is loaded. The routine therefore has several dependencies: paired data requires at least the Python package mne, whereas unpaired samples can only be tested if pandas and rpy2 (for R/Python conversion) as well as R and the R library flip are installed (and in the search path). To show or save results, the routine printstats (part of this module) is called.
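
To make the idea behind the paired case concrete, the following is a minimal NumPy sketch of a two-tailed sign-flipping permutation t-test with max-T correction. It is an illustration under stated assumptions, not the code of MNE's permutation_t_test or of flip. The function name paired_maxt_sketch, its seed argument, the random (rather than exhaustive) sign flips and the convention of counting the observed statistic as one permutation are choices made here. The demo data are synthetic.

```python
import numpy as np

def paired_maxt_sketch(X, Y, nperms=10000, seed=0):
    """Illustrative two-tailed paired permutation t-test with max-T correction."""
    rng = np.random.default_rng(seed)
    D = np.asarray(X, dtype=float) - np.asarray(Y, dtype=float)  # paired differences
    n = D.shape[0]

    def tstat(d):
        # One-sample t-statistic for each variable (column)
        return d.mean(axis=0) / (d.std(axis=0, ddof=1) / np.sqrt(n))

    t_obs = tstat(D)
    max_abs = np.empty(nperms)
    for i in range(nperms):
        # Under the null the sign of each subject's difference is arbitrary.
        # Whole rows are flipped so correlations between variables survive.
        signs = rng.choice([-1.0, 1.0], size=(n, 1))
        max_abs[i] = np.abs(tstat(signs * D)).max()

    # Adjusted p-value: share of permutations whose largest |t| reaches the
    # observed |t| of a variable (observed statistic counted once)
    exceed = (max_abs[:, None] >= np.abs(t_obs)[None, :]).sum(axis=0)
    pvals = (1 + exceed) / (nperms + 1)
    return {'pvals': pvals, 'statvals': t_obs, 'dist': max_abs}

# Synthetic demo: only the first of two variables is shifted
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
Y = X + rng.normal(scale=0.5, size=(50, 2))
Y[:, 0] += 2.0
res = paired_maxt_sketch(X, Y, nperms=999)
print(res['pvals'], res['statvals'])
```

Because every variable is compared against the distribution of the largest statistic across all variables, the resulting p-values are already adjusted for multiplicity, which is the reason a separate correction step is not needed in this setting.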

References

[R13]

F. Pesarin. Multivariate Permutation Tests with Applications in Biostatistics. Wiley, New York, 2001.

[R14]

A. Gramfort, M. Luessi, E. Larson, D. Engemann, D. Strohmeier, C. Brodbeck, L. Parkkonen, M. Haemaelaeinen. MNE software for processing MEG and EEG data. NeuroImage 86, 446-460, 2014.

Examples

Assume we want to analyze medical data of 200 healthy adult subjects collected before and after physical exercise. For each subject, we have measurements of heart-rate (HR), blood pressure (BP) and body temperature (BT) before and after exercise. Thus our data sets contain 200 observations of 3 variables. We want to test the data for a statistically significant difference in any of the three observed quantities (HR, BP, BT) after physical exercise compared to the measurements acquired before exercise.

Assume all samples are given as Python lists: HR_before, BP_before, BT_before, HR_after, BP_after, BT_after. To be able to use perm_test, we collect the data in NumPy arrays:

>>> import numpy as np
>>> X = np.zeros((200,3))
>>> X[:,0] = HR_before
>>> X[:,1] = BP_before
>>> X[:,2] = BT_before
>>> Y = np.zeros((200,3))
>>> Y[:,0] = HR_after
>>> Y[:,1] = BP_after
>>> Y[:,2] = BT_after
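
The same kind of array can also be assembled in a single step with np.column_stack, which avoids hard-coding the number of subjects. A minimal sketch, in which the three short lists are hypothetical stand-ins for the real measurements:

```python
import numpy as np

# Hypothetical stand-in measurements for three subjects
HR_before = [62, 70, 58]
BP_before = [118, 125, 110]
BT_before = [36.6, 36.8, 36.5]

# Each list becomes one column: rows are subjects, columns are variables
X = np.column_stack([HR_before, BP_before, BT_before]).astype(float)
print(X.shape)
```

The array for the second group can be built the same way from the corresponding lists.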

Our null-hypothesis is that physical exercise did not induce a significant change in any of the observed variables. As an alternative hypothesis, we assume that exercise induced an increase in heart rate, blood pressure and body temperature. To test our hypotheses we use the following command

>>> perm_test(X,Y,paired=True,nperms=20000,tail='less',fname='stats.csv',
...           vars=['Heart Rate','Blood Pressure','Body Temperature'],
...           g1str='Before Exercise',g2str='After Exercise')

which performs a lower-tailed paired permutation t-test with 20000 permutations, prints the results to the prompt and also saves them in the file stats.csv.