Probability Distributions

Overview

Probability distributions are mathematical functions that describe how likely each possible outcome of a random variable is. They form the theoretical foundation of statistics and data science, enabling practitioners to model uncertainty, quantify risk, predict future events, and make informed decisions under incomplete information.

A continuous distribution is usually described by its probability density function (PDF), which gives the relative likelihood of values near a point, while a discrete distribution is described by its probability mass function (PMF), which gives the probability of each outcome. Associated with each distribution are its cumulative distribution function (CDF), which gives the probability of falling at or below a threshold; its quantile function (the inverse CDF), which returns the threshold for a given probability; and summary statistics such as the mean, variance, and skewness that characterize its behavior.
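
As an illustration of these functions, the following minimal sketch calls scipy.stats directly rather than the tools in this library. The variable names are illustrative only.

```python
from scipy import stats

# Continuous example: standard normal (mean 0, standard deviation 1).
normal = stats.norm(loc=0.0, scale=1.0)

density_at_zero = normal.pdf(0.0)   # PDF: relative likelihood near x = 0
prob_below = normal.cdf(1.96)       # CDF: P(X <= 1.96)
upper_bound = normal.ppf(0.975)     # quantile function: x with P(X <= x) = 0.975
mean, variance, skewness = normal.stats(moments="mvs")

# Discrete example: number of heads in 10 fair coin flips.
flips = stats.binom(n=10, p=0.5)
prob_five_heads = flips.pmf(5)      # PMF: P(X = 5) = 252 / 1024

print(density_at_zero, prob_below, upper_bound)
print(float(mean), float(variance), float(skewness))
print(prob_five_heads)
```

Note that the CDF and the quantile function are inverses of each other: applying ppf to the value returned by cdf recovers the original threshold.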

Background and Implementation

This library provides a suite of probability distributions built on SciPy, a well-established Python library for scientific computing. SciPy’s statistical module (scipy.stats) contains implementations of dozens of continuous and discrete distributions, along with tools for computing their properties. These tools handle the numerical details of distribution computations, so you can calculate probabilities, generate random samples, and derive statistics without implementing the underlying algorithms yourself.
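
To show what scipy.stats does behind the scenes, here is a small sketch that draws random samples from a "frozen" distribution and compares the sample mean with the theoretical mean. It uses SciPy directly; the seed value and variable names are arbitrary choices for the example.

```python
import numpy as np
from scipy import stats

# A seeded generator makes the draw reproducible.
rng = np.random.default_rng(12345)

# Exponential distribution with mean 2.0 (SciPy uses scale = 1 / rate).
wait_time = stats.expon(scale=2.0)

samples = wait_time.rvs(size=10_000, random_state=rng)
sample_mean = samples.mean()
theoretical_mean = wait_time.mean()

print(sample_mean, theoretical_mean)
```

With ten thousand draws, the sample mean should land close to the theoretical value, though it will not match it exactly.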

Continuous Distributions

Continuous probability distributions apply to variables that can take any value within a range. The NORM tool provides the normal (Gaussian) distribution, the most widely used distribution in statistics because, by the central limit theorem, sums and averages of many independent effects tend toward it. Other essential continuous distributions include the EXPON distribution for modeling wait times and lifetimes, the CHISQ distribution for hypothesis testing and goodness-of-fit tests, and the T_DIST distribution for inference on small samples. For skewed or heavy-tailed data, the BETA distribution models proportions on the interval from 0 to 1, LOGNORM models positive right-skewed quantities, WEIBULL_MIN models lifetimes and reliability, and PARETO models heavy-tailed phenomena. The UNIFORM distribution represents maximum uncertainty within a bounded range, while the LAPLACE distribution is useful for data with a sharp peak around a central value and heavier tails than the normal distribution.
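
The typical uses mentioned above can be sketched with the underlying scipy.stats objects. This is an illustrative example, not the library's own interface, and the parameter values are made up. One detail worth knowing is that SciPy parameterizes most continuous distributions by loc and scale, so a uniform distribution on [2, 5] is written with loc=2 and scale=3.

```python
from scipy import stats

# Wait times: probability of waiting longer than 3 minutes when the mean wait is 2.
prob_long_wait = stats.expon(scale=2.0).sf(3.0)     # survival function, 1 - CDF

# Hypothesis testing: 5% critical value for a chi-squared test with 1 degree of freedom.
chi2_critical = stats.chi2.ppf(0.95, df=1)

# Small samples: two-sided 95% critical value for Student's t with 10 degrees of freedom.
t_critical = stats.t.ppf(0.975, df=10)

# Proportions: mean of a Beta(2, 5) distribution is a / (a + b).
beta_mean = stats.beta(a=2.0, b=5.0).mean()

# Bounded range: uniform on [2, 5] is loc=2, scale=3 (scale is the width, not the upper end).
uniform_cdf_mid = stats.uniform(loc=2.0, scale=3.0).cdf(3.5)

print(prob_long_wait, chi2_critical, t_critical, beta_mean, uniform_cdf_mid)
```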

Discrete Distributions

Discrete probability distributions describe outcomes that take only integer or countable values. The BINOM distribution models the number of successes in a fixed number of independent trials, while the POISSON_DIST distribution models the count of rare events occurring in a fixed interval. The GEOM distribution describes the number of trials up to and including the first success, and the NBINOM distribution generalizes this to a fixed number of successes (scipy.stats.nbinom counts the failures that occur before that number of successes is reached). The BERNOULLI distribution represents a single binary outcome. Specialized distributions like HYPERGEOM model sampling without replacement, ZIPF and ZIPFIAN describe power-law phenomena over unbounded and bounded ranks respectively, and SKELLAM models the difference between two independent Poisson-distributed variables. These discrete tools are the usual starting point for analyzing count data and binary outcomes.
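
The relationships between these distributions can be checked with a short sketch using scipy.stats directly. The parameter values are arbitrary, and the example is meant only to illustrate the conventions SciPy uses, in particular that geom counts trials starting at 1 while nbinom counts failures starting at 0.

```python
from scipy import stats

# Binomial: probability of exactly 3 successes in 10 trials with p = 0.5.
prob_three = stats.binom.pmf(3, n=10, p=0.5)        # 120 / 1024

# Poisson: probability of zero events when the expected count is 2.
prob_none = stats.poisson.pmf(0, mu=2.0)            # exp(-2)

# Geometric: the first success occurs on trial 3 (two failures, then a success).
geom_third_trial = stats.geom.pmf(3, p=0.25)

# Negative binomial with n = 1 success: two failures before the first success.
# This is the same event, so the two probabilities agree.
nbinom_two_failures = stats.nbinom.pmf(2, n=1, p=0.25)

# Skellam: difference of two independent Poisson counts with means 3 and 1.
skellam = stats.skellam(3.0, 1.0)
diff_mean, diff_var = skellam.mean(), skellam.var()  # mu1 - mu2 and mu1 + mu2

print(prob_three, prob_none, geom_third_trial, nbinom_two_failures)
print(diff_mean, diff_var)
```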

Multivariate Distributions

When working with multiple correlated random variables, multivariate distributions become essential. The MULTIVARIATE_NORMAL distribution generalizes the normal distribution to multiple dimensions and is fundamental for modeling correlated continuous variables. The MULTINOMIAL distribution extends the binomial distribution to multiple categories, while DIRICHLET provides a distribution over probability vectors, that is, sets of non-negative values that sum to one. The WISHART distribution is a distribution over positive-definite matrices and is commonly used to model covariance matrices, which makes it important in Bayesian statistics and random matrix theory. Additional tools include MULTIVARIATE_T for robust multivariate inference and specialized distributions like VONMISES_FISHER for directional data on hyperspheres.
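
A few of these multivariate distributions are sketched below using scipy.stats directly. The means, covariance, and concentration parameters are invented for the example and carry no special meaning.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two correlated normal variables with correlation 0.5.
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
mvn = stats.multivariate_normal(mean=[0.0, 0.0], cov=cov)
density_at_origin = mvn.pdf([0.0, 0.0])             # 1 / (2 * pi * sqrt(det(cov)))
draws = mvn.rvs(size=1000, random_state=rng)        # array of shape (1000, 2)

# Multinomial: one observation in each of three categories out of 3 trials.
prob_one_each = stats.multinomial.pmf([1, 1, 1], n=3, p=[0.2, 0.3, 0.5])

# Dirichlet: the mean is alpha normalized to sum to one.
dirichlet_mean = stats.dirichlet.mean(alpha=[1.0, 2.0, 3.0])

# Wishart: a single draw is a random symmetric positive-definite matrix.
random_cov = stats.wishart.rvs(df=3, scale=np.eye(2), random_state=rng)

print(density_at_origin, prob_one_each, dirichlet_mean)
print(draws.shape, random_cov.shape)
```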

Using the Distribution Tools

Each distribution tool provides a standardized interface to common operations: computing probability densities or masses, evaluating cumulative probabilities, finding quantiles, generating random samples, and calculating summary statistics. Choose a distribution based on the characteristics of your data. Are the values continuous or discrete? Is the data bounded or unbounded? Are there multiple correlated variables? The distributions in this library rely on SciPy's tested implementations, which handle numerical edge cases across a wide range of parameter values.
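
One way such a standardized interface could be layered on top of scipy.stats is sketched below. This is not the library's actual code: the evaluate function, its signature, and the two lookup tables are hypothetical, and only the method names (PMF, CDF, SF, ICDF, ISF, mean, variance, std, median) are taken from the tool descriptions on this page.

```python
from scipy import stats

# Hypothetical mapping from the method names used on this page to SciPy method names.
POINT_METHODS = {"PDF": "pdf", "PMF": "pmf", "CDF": "cdf",
                 "SF": "sf", "ICDF": "ppf", "ISF": "isf"}
SUMMARY_METHODS = {"MEAN": "mean", "VARIANCE": "var", "STD": "std", "MEDIAN": "median"}


def evaluate(dist, method, value=None, **params):
    """Hypothetical dispatcher: evaluate one method of a scipy.stats distribution."""
    frozen = dist(**params)             # fix the parameters once ("frozen" distribution)
    key = method.upper()
    if key in SUMMARY_METHODS:          # summary statistics take no input value
        return float(getattr(frozen, SUMMARY_METHODS[key])())
    if key not in POINT_METHODS:
        raise ValueError(f"unknown method: {method!r}")
    if value is None:
        raise ValueError(f"method {method!r} requires a value")
    return float(getattr(frozen, POINT_METHODS[key])(value))


print(evaluate(stats.binom, "pmf", 5, n=10, p=0.5))
print(evaluate(stats.binom, "mean", n=10, p=0.5))
print(evaluate(stats.norm, "icdf", 0.5))
```

Freezing the distribution first keeps parameter handling in one place, so every method sees the same parameters. A real wrapper would also need to validate inputs and report errors in a user-friendly way, which this sketch leaves out.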

Figure 1: Probability distribution fundamentals: (A) The normal distribution and its cumulative function demonstrate how PDFs and CDFs relate. (B) Comparison of discrete binomial and continuous normal distributions illustrates the difference between modeling count data and continuous measurements.
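
The two relationships shown in the figure can also be verified numerically. The sketch below uses scipy.stats and scipy.integrate directly; the choice of 100 trials with success probability 0.5 is an assumption made for the example and is not taken from the figure.

```python
from scipy import integrate, stats

# (A) The CDF is the integral of the PDF: the area under the density between
# -1 and 1 equals the difference of the CDF at those two points.
std_normal = stats.norm()
area, _abs_error = integrate.quad(std_normal.pdf, -1.0, 1.0)
cdf_difference = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

# (B) A binomial with many trials is close to a normal with the same mean
# and standard deviation.
counts = stats.binom(100, 0.5)
approximation = stats.norm(loc=counts.mean(), scale=counts.std())
pmf_at_50 = counts.pmf(50)
pdf_at_50 = approximation.pdf(50)

print(area, cdf_difference)
print(pmf_at_50, pdf_at_50)
```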

Continuous Distributions

Tool         Description
BETA         Wrapper for scipy.stats.beta distribution providing multiple statistical methods.
CAUCHY       Wrapper for scipy.stats.cauchy distribution providing multiple statistical methods.
CHISQ        Compute various statistics and functions for the chi-squared distribution from scipy.stats.chi2.
EXPON        Exponential distribution function wrapping scipy.stats.expon.
F_DIST       Unified interface to the main methods of the F-distribution, including PDF, CDF, inverse CDF, survival function, and distribution statistics.
LAPLACE      Laplace distribution function supporting multiple methods.
LOGNORM      Compute lognormal distribution statistics and evaluations.
NORM         Normal (Gaussian) distribution function supporting multiple methods.
PARETO       Generalized Pareto distribution function supporting multiple methods.
T_DIST       Student’s t distribution function supporting multiple methods from scipy.stats.t.
UNIFORM      Uniform distribution function supporting multiple methods.
WEIBULL_MIN  Compute various functions of the Weibull minimum distribution using scipy.stats.weibull_min.

Discrete Distributions

Tool          Description
BERNOULLI     Calculates properties of a Bernoulli discrete random variable.
BETABINOM     Compute Beta-binomial distribution values from scipy.stats.betabinom.
BETANBINOM    Compute Beta-negative-binomial distribution values: PMF, CDF, SF, ICDF, ISF, mean, variance, std, or median.
BINOM         Compute Binomial distribution values: PMF, CDF, SF, ICDF, ISF, mean, variance, std, or median.
BOLTZMANN     Compute Boltzmann distribution values: PMF, CDF, SF, ICDF, ISF, mean, variance, std, or median.
DLAPLACE      Compute Discrete Laplace distribution values: PMF, CDF, SF, ICDF, ISF, mean, variance, std, or median.
GEOM          Compute Geometric distribution values using scipy.stats.geom.
HYPERGEOM     Compute Hypergeometric distribution values: PMF, CDF, SF, ICDF, ISF, mean, variance, std, or median.
LOGSER        Compute Log-Series distribution values: PMF, CDF, SF, ICDF, ISF, mean, variance, std, or median.
NBINOM        Compute Negative Binomial distribution values using scipy.stats.nbinom.
NHYPERGEOM    Compute Negative Hypergeometric distribution values using scipy.stats.nhypergeom.
PLANCK        Compute Planck distribution values using scipy.stats.planck.
POISSON_DIST  Compute Poisson distribution values using scipy.stats.poisson.
RANDINT       Compute Uniform discrete distribution values: PMF, CDF, SF, ICDF, ISF, mean, variance, std, or median.
SKELLAM       Compute Skellam distribution values using scipy.stats.skellam.
YULESIMON     Compute Yule-Simon distribution values using scipy.stats.yulesimon.
ZIPF          Compute Zipf distribution values: PMF, CDF, SF, ICDF, ISF, mean, variance, std, or median.
ZIPFIAN       Compute Zipfian distribution values: PMF, CDF, SF, ICDF, ISF, mean, variance, std, or median.

Multivariate Distributions

Tool                 Description
DIRICHLET            Computes the PDF, log-PDF, mean, variance, covariance, entropy, or draws random samples from a Dirichlet distribution.
DIRICHLET_MULTINOM   Computes the probability mass function, log probability mass function, mean, variance, or covariance of the Dirichlet multinomial distribution.
MATRIX_NORMAL        Computes the PDF, log-PDF, or draws random samples from a matrix normal distribution.
MULTINOMIAL          Compute the probability mass function, log-PMF, entropy, covariance, or draw random samples from a multinomial distribution.
MULTIVARIATE_NORMAL  Computes the PDF, CDF, log-PDF, log-CDF, entropy, or draws random samples from a multivariate normal distribution.
MULTIVARIATE_T       Computes the PDF, CDF, or draws random samples from a multivariate t-distribution.
MV_HYPERGEOM         Computes probability mass function, log-PMF, mean, variance, covariance, or draws random samples from a multivariate hypergeometric distribution.
ORTHO_GROUP          Draws random samples of orthogonal matrices from the O(N) Haar distribution using scipy.stats.ortho_group.
RANDOM_CORRELATION   Generates a random correlation matrix with specified eigenvalues.
SPECIAL_ORTHO_GROUP  Draws random samples from the special orthogonal group SO(N), returning orthogonal matrices with determinant +1.
UNIFORM_DIRECTION    Draws random unit vectors uniformly distributed on the surface of a hypersphere in the specified dimension.
UNITARY_GROUP        Generate a random unitary matrix of dimension N from the Haar distribution.
VONMISES_FISHER      Computes the PDF, log-PDF, entropy, or draws random samples from a von Mises-Fisher distribution on the unit hypersphere.
WISHART              Computes the PDF, log-PDF, or draws random samples from the Wishart distribution using scipy.stats.wishart.