Hypothesis Tests

Overview

Hypothesis testing is a formal statistical procedure used to determine whether an observed effect in data is real or likely due to random chance. It provides a structured framework for making evidence-based decisions about populations based on sample data. All hypothesis tests in this category are implemented using SciPy’s comprehensive statistical functions.

Fundamental Concepts

Every hypothesis test involves setting up competing claims about a population parameter:

Null Hypothesis (H_0): The default assumption that there is no effect or no difference (e.g., “There is no difference between the groups” or “The variable has no association”).
Alternative Hypothesis (H_1 or H_a): The claim you are testing for (e.g., “Group A differs from Group B” or “The variables are associated”).
P-value: The probability of observing results as extreme as or more extreme than those obtained, assuming the null hypothesis is true. When p < \alpha (typically 0.05), we reject H_0 in favor of H_1.
Significance Level (\alpha): The threshold (usually 0.05) for deciding whether the p-value provides sufficient evidence against the null hypothesis.
Type I and Type II Errors: Type I (false positive) occurs when rejecting a true H_0; Type II (false negative) occurs when failing to reject a false H_0.

Figure 1: Decision-making framework for hypothesis tests. Left: The t-distribution showing critical regions for a one-sided test. Right: The relationship between effect size and statistical power, illustrating that larger effects are easier to detect.

Test Selection by Data Structure

Hypothesis tests are organized by the structure of your data and research question:

One Sample Tests: Used when testing a single sample against a population parameter or theoretical distribution. These include tests for the mean (e.g., TTEST_1SAMP), goodness-of-fit tests (e.g., SHAPIRO, KSTEST), and tests for specific distributional properties (e.g., NORMALTEST for normality, JARQUE_BERA for skewness and kurtosis).
Independent Sample Tests: Used when comparing two or more independent groups or samples. These range from parametric tests assuming normality and equal variances (e.g., TTEST_IND, F_ONEWAY) to non-parametric alternatives (e.g., MANNWHITNEYU, KRUSKAL) and specialized tests for variance equality (e.g., LEVENE, FLIGNER). Multiple comparison corrections are supported through specialized tests like DUNNETT.
Association and Correlation Tests: Used when examining relationships between two or more variables. These include correlation-based tests (e.g., PEARSONR, SPEARMANR, KENDALLTAU) for measuring linear or monotonic associations, tests of independence for categorical variables (e.g., CHI2_CONTINGENCY, FISHER_EXACT), and robust regression alternatives (e.g., THEILSLOPES, SIEGELSLOPES) for estimating relationships while reducing the influence of outliers.

Choosing the Right Test

Your choice of test depends on several factors: the number of samples or groups, whether your data are independent or paired, the scale of measurement (continuous, ordinal, categorical), whether you want to assume normality and equal variances, and your research hypothesis (one-sided or two-sided). Parametric tests are generally more powerful when their assumptions are met, while non-parametric tests are more robust to violations of these assumptions but may have less power. Exact tests (e.g., FISHER_EXACT, BARNARD_EXACT) are particularly useful for small sample sizes or sparse contingency tables.

Association Correlation

Tool	Description
BARNARD_EXACT	Perform Barnard’s exact test on a 2x2 contingency table.
BOSCHLOO_EXACT	Perform Boschloo’s exact test on a 2x2 contingency table.
CHI2_CONTINGENCY	Perform the chi-square test of independence for variables in a contingency table.
FISHER_EXACT	Perform Fisher’s exact test on a 2x2 contingency table.
KENDALLTAU	Calculate Kendall’s tau, a correlation measure for ordinal data.
PAGE_TREND_TEST	Perform Page’s L trend test for monotonic trends across treatments.
PEARSONR	Calculate the Pearson correlation coefficient and p-value for two datasets.
POINTBISERIALR	Calculate a point biserial correlation coefficient and its p-value.
SIEGELSLOPES	Compute the Siegel repeated medians estimator for robust linear regression using scipy.stats.siegelslopes.
SOMERSD	Calculate Somers’ D, an asymmetric measure of ordinal association between two variables.
SPEARMANR	Calculate a Spearman rank-order correlation coefficient with associated p-value.
THEILSLOPES	Compute the Theil-Sen estimator for a set of points (robust linear regression).
WEIGHTEDTAU	Compute a weighted version of Kendall’s tau correlation coefficient.

Independent Sample

Tool	Description
ALEXANDERGOVERN	Performs the Alexander-Govern test for equality of means across multiple independent samples with possible heterogeneity of variance.
ANDERSON_KSAMP	Performs the k-sample Anderson-Darling test to determine if samples are drawn from the same population.
ANSARI	Performs the Ansari-Bradley test for equal scale parameters (non-parametric) using scipy.stats.ansari.
BRUNNERMUNZEL	Computes the Brunner-Munzel nonparametric test for two independent samples.
BWS_TEST	Performs the Baumgartner-Weiss-Schindler test on two independent samples.
CVM_2SAMP	Performs the two-sample Cramér-von Mises test using scipy.stats.cramervonmises_2samp.
DUNNETT	Performs Dunnett’s test for multiple comparisons of means against a control group.
EPPS_SINGLE_2SAMP	Compute the Epps-Singleton test statistic and p-value for two samples.
F_ONEWAY	Performs a one-way ANOVA test for two or more independent samples.
FLIGNER	Performs the Fligner-Killeen test for equality of variances across multiple samples.
FRIEDMANCHISQUARE	Computes the Friedman test for repeated samples.
KRUSKAL	Computes the Kruskal-Wallis H-test for independent samples.
KS_2SAMP	Performs the two-sample Kolmogorov-Smirnov test for goodness of fit.
LEVENE	Performs the Levene test for equality of variances across multiple samples.
MANNWHITNEYU	Performs the Mann-Whitney U rank test on two independent samples using scipy.stats.mannwhitneyu.
MEDIAN_TEST	Performs Mood’s median test to determine if two or more independent samples come from populations with the same median.
MOOD	Perform Mood’s two-sample test for scale parameters.
POISSON_MEANS_TEST	Performs the Poisson means test (E-test) to compare the means of two Poisson distributions.
RANKSUMS	Computes the Wilcoxon rank-sum statistic and p-value for two independent samples.
TTEST_IND	Performs the independent two-sample t-test for the means of two groups.
TTEST_IND_STATS	Perform a t-test for means of two independent samples using summary statistics.

One Sample

Tool	Description
BINOMTEST	Perform a binomial test for the probability of success in a Bernoulli experiment.
JARQUE_BERA	Perform the Jarque-Bera goodness of fit test for normality.
KSTEST	Performs the one-sample Kolmogorov-Smirnov test for goodness of fit.
KURTOSISTEST	Test whether the kurtosis of a sample is different from that of a normal distribution.
NORMALTEST	Test whether a sample differs from a normal distribution (omnibus test).
QUANTILE_TEST	Perform a quantile test to determine if a population quantile equals a hypothesized value.
SHAPIRO	Perform the Shapiro-Wilk test for normality.
SKEWTEST	Test whether the skewness of a sample is different from that of a normal distribution.
TTEST_1SAMP	Perform a one-sample t-test for the mean of a group of scores.