GLM_GAMMA

Overview

The GLM_GAMMA function fits a Generalized Linear Model (GLM) with the Gamma distribution family, designed for modeling positive continuous response variables. This model is particularly well-suited for data where the variance increases with the mean, such as insurance claims, survival times, income, and other right-skewed positive quantities.

Generalized Linear Models extend ordinary linear regression by allowing the response variable to follow a distribution from the exponential family and relating the mean response to the linear predictor through a link function. The Gamma GLM assumes responses follow a Gamma distribution, where the variance is proportional to the square of the mean:

\text{Var}(Y) = \phi \mu^2

where \mu is the mean and \phi is the dispersion (scale) parameter.

This implementation uses the statsmodels library’s GLM class with the Gamma family. The function supports three link functions:

  • Inverse (canonical link): g(\mu) = 1/\mu, the default and theoretically optimal for Gamma
  • Log: g(\mu) = \log(\mu), ensures positive fitted values and is commonly used
  • Identity: g(\mu) = \mu, provides direct interpretation but may produce negative predictions

The model is fitted using Iteratively Reweighted Least Squares (IRLS), an efficient algorithm for maximum likelihood estimation in GLMs. The output includes coefficient estimates, standard errors, z-statistics, p-values, and confidence intervals for each predictor, along with model fit statistics including deviance, Pearson chi-squared, AIC, BIC, and log-likelihood.

For additional background, see McCullagh & Nelder’s foundational text Generalized Linear Models (1989) and the statsmodels GLM documentation. The source code is available on the statsmodels GitHub repository.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=GLM_GAMMA(y, x, glm_gamma_link, fit_intercept, alpha)
  • y (list[list], required): Dependent variable as a column vector. Must contain positive values.
  • x (list[list], required): Independent variables (predictors). Each column represents a predictor.
  • glm_gamma_link (str, optional, default: “inverse”): Link function to use for the Gamma GLM.
  • fit_intercept (bool, optional, default: true): If True, includes an intercept term in the model.
  • alpha (float, optional, default: 0.05): Significance level for confidence intervals (between 0 and 1).

Returns (list[list]): 2D list with GLM results and statistics, or error string.

Examples

Example 1: Gamma inverse link with single predictor

Inputs:

y x
2.5 1
3.2 2
4.1 3
5.3 4
6.7 5

Excel formula:

=GLM_GAMMA({2.5;3.2;4.1;5.3;6.7}, {1;2;3;4;5})

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper
intercept 0.42897 0.02122 20.2117 0 0.38737 0.47057
x1 -0.05733 0.00515 -11.14206 0 -0.06742 -0.04725
deviance 0.01357
pearson_chi2 0.01347
aic 3.30684
bic 2.52572
log_likelihood 0.34658
scale 0.00449

Example 2: Gamma log link with two predictors

Inputs:

y x glm_gamma_link
1.2 1 2 log
2.3 1.5 2.5
3.4 2 3
4.5 2.5 3.5
5.6 3 4
6.7 3.5 4.5

Excel formula:

=GLM_GAMMA({1.2;2.3;3.4;4.5;5.6;6.7}, {1,2;1.5,2.5;2,3;2.5,3.5;3,4;3.5,4.5}, "log")

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper
intercept -0.37516 0.15277 -2.45572 0.01406 -0.67458 -0.07574
x1 0.51493 0.11499 4.47801 0.00001 0.28955 0.74031
x2 0.13978 0.04016 3.48078 0.0005 0.06107 0.21848
deviance 0.11672
pearson_chi2 0.11083
aic 12.48693
bic 12.07045
log_likelihood -4.24347
scale 0.02771

Example 3: Gamma identity link without intercept

Inputs:

y x glm_gamma_link fit_intercept
1.5 1 identity false
3 2
4.5 3
6 4

Excel formula:

=GLM_GAMMA({1.5;3;4.5;6}, {1;2;3;4}, "identity", FALSE)

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper
x1 1.5 0 10706035608844454 0 1.5 1.5
deviance 0
pearson_chi2 0
aic 2
bic 1.38629
log_likelihood 0
scale 0

Example 4: Gamma with custom alpha

Inputs:

y x alpha
2.1 1 0.5 0.1
3.2 1.5 1
4.3 2 1.5
5.4 2.5 2
6.5 3 2.5
7.6 3.5 3
8.7 4 3.5

Excel formula:

=GLM_GAMMA({2.1;3.2;4.3;5.4;6.5;7.6;8.7}, {1,0.5;1.5,1;2,1.5;2.5,2;3,2.5;3.5,3;4,3.5}, 0.1)

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper
intercept 0.351 0.03949 8.8892 0 0.28605 0.41595
x1 0.0483 0.00363 13.30259 0 0.04232 0.05427
x2 -0.1272 0.01677 -7.58423 0 -0.15479 -0.09962
deviance 0.1952
pearson_chi2 0.17716
aic 21.26157
bic 21.15339
log_likelihood -8.63078
scale 0.03543

Python Code

import math
import warnings
from statsmodels.genmod.generalized_linear_model import GLM as statsmodels_GLM, SET_USE_BIC_LLF
from statsmodels.genmod.families import Gamma as statsmodels_Gamma
from statsmodels.genmod.families.links import InversePower, Log, Identity

def glm_gamma(y, x, glm_gamma_link='inverse', fit_intercept=True, alpha=0.05):
    """
    Fit a Generalized Linear Model with Gamma family for positive continuous data.

    See: https://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_linear_model.GLM.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Dependent variable as a column vector. Must contain positive values.
        x (list[list]): Independent variables (predictors). Each column represents a predictor.
        glm_gamma_link (str, optional): Link function to use for the Gamma GLM. Valid options: Inverse, Log, Identity. Default is 'inverse'.
        fit_intercept (bool, optional): If True, includes an intercept term in the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (between 0 and 1). Default is 0.05.

    Returns:
        list[list]: 2D list with GLM results and statistics, or error string.
    """
    def to2d(arr):
        return [[arr]] if not isinstance(arr, list) else arr

    def validate_numeric(val, name):
        try:
            num = float(val)
        except Exception:
            return f"Error: Invalid input: {name} must be numeric."
        if math.isnan(num) or math.isinf(num):
            return f"Error: Invalid input: {name} must be finite."
        return num

    try:
        # Normalize inputs
        y_2d = to2d(y)
        x_2d = to2d(x)

        # Validate y is a column vector
        if not isinstance(y_2d, list) or len(y_2d) == 0:
            return "Error: Invalid input: y must be a non-empty 2D list."

        # Extract y values
        y_values = []
        for row in y_2d:
            if not isinstance(row, list) or len(row) != 1:
                return "Error: Invalid input: y must be a column vector (single column)."
            val = validate_numeric(row[0], "y")
            if isinstance(val, str):
                return val
            if val <= 0:
                return "Error: Invalid input: y values must be positive for Gamma family."
            y_values.append(val)

        # Validate x is a matrix
        if not isinstance(x_2d, list) or len(x_2d) == 0:
            return "Error: Invalid input: x must be a non-empty 2D list."

        n_rows = len(x_2d)
        if n_rows != len(y_values):
            return "Error: Invalid input: y and x must have the same number of rows."

        # Extract x values
        n_cols = None
        x_values = []
        for i, row in enumerate(x_2d):
            if not isinstance(row, list):
                return "Error: Invalid input: x must be a 2D list."
            if n_cols is None:
                n_cols = len(row)
            elif len(row) != n_cols:
                return "Error: Invalid input: all rows in x must have the same number of columns."

            row_vals = []
            for val in row:
                num = validate_numeric(val, "x")
                if isinstance(num, str):
                    return num
                row_vals.append(num)
            x_values.append(row_vals)

        # Validate link function
        valid_links = ['inverse', 'log', 'identity']
        if not isinstance(glm_gamma_link, str) or glm_gamma_link not in valid_links:
            return f"Error: Invalid input: glm_gamma_link must be one of {valid_links}."

        # Validate fit_intercept
        if not isinstance(fit_intercept, bool):
            return "Error: Invalid input: fit_intercept must be a boolean."

        # Validate alpha
        alpha_num = validate_numeric(alpha, "alpha")
        if isinstance(alpha_num, str):
            return alpha_num
        if alpha_num <= 0 or alpha_num >= 1:
            return "Error: Invalid input: alpha must be between 0 and 1."

        # Add intercept if needed
        if fit_intercept:
            x_with_intercept = [[1.0] + row for row in x_values]
        else:
            x_with_intercept = x_values

        # Create link function
        try:
            if glm_gamma_link == 'inverse':
                link = InversePower()
            elif glm_gamma_link == 'log':
                link = Log()
            else:  # identity
                link = Identity()

            family = statsmodels_Gamma(link=link)
        except Exception as exc:
            return f"Error: Error creating Gamma family: {exc}"

        # Fit the model
        try:
            SET_USE_BIC_LLF(True)
            with warnings.catch_warnings():
                warnings.filterwarnings('ignore', category=FutureWarning)
                warnings.filterwarnings('ignore', message='.*Perfect separation.*')
                warnings.filterwarnings('ignore', message='.*does not respect the domain.*')
                model = statsmodels_GLM(y_values, x_with_intercept, family=family)
                results = model.fit()
        except Exception as exc:
            return f"Error: Error fitting GLM: {exc}"

        # Extract results
        try:
            params = results.params
            std_errors = results.bse
            z_stats = results.tvalues
            p_values = results.pvalues
            conf_int = results.conf_int(alpha=alpha_num)

            # Build output table
            output = [['parameter', 'coefficient', 'std_error', 'z_statistic', 'p_value', 'ci_lower', 'ci_upper']]

            # Add parameter rows
            for i in range(len(params)):
                if fit_intercept and i == 0:
                    param_name = 'intercept'
                else:
                    predictor_idx = i if not fit_intercept else i - 1
                    param_name = f'x{predictor_idx + 1}'

                output.append([
                    param_name,
                    float(params[i]),
                    float(std_errors[i]),
                    float(z_stats[i]),
                    float(p_values[i]),
                    float(conf_int[i, 0]),
                    float(conf_int[i, 1])
                ])

            # Add model statistics
            output.append(['deviance', float(results.deviance), '', '', '', '', ''])
            output.append(['pearson_chi2', float(results.pearson_chi2), '', '', '', '', ''])
            output.append(['aic', float(results.aic), '', '', '', '', ''])
            output.append(['bic', float(results.bic_llf), '', '', '', '', ''])
            output.append(['log_likelihood', float(results.llf), '', '', '', '', ''])
            output.append(['scale', float(results.scale), '', '', '', '', ''])

            return output

        except Exception as exc:
            return f"Error: Error extracting results: {exc}"
    except Exception as exc:
        return f"Error: {str(exc)}"

Online Calculator