GLS_REGRESSION

Overview

The GLS_REGRESSION function fits a Generalized Least Squares (GLS) regression model, an extension of ordinary least squares (OLS) designed to handle situations where the error terms exhibit non-constant variance (heteroscedasticity) or are correlated with each other (autocorrelation). GLS was first described by Alexander Aitken in 1935 and remains a fundamental technique in econometrics and time series analysis.

In standard OLS regression, the Gauss-Markov theorem guarantees that the estimator is the best linear unbiased estimator (BLUE) when the errors are uncorrelated and have equal variance. When these assumptions are violated, OLS estimates remain unbiased but are no longer efficient, and the usual standard errors become unreliable. GLS addresses this by incorporating a known error covariance matrix \Omega into the estimation process.

The GLS estimator minimizes the squared Mahalanobis distance of the residual vector:

\hat{\beta}_{GLS} = (X^T \Omega^{-1} X)^{-1} X^T \Omega^{-1} y
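As a minimal NumPy sketch, the closed-form estimator can be computed directly. The data and the tridiagonal \Omega below are made-up illustration values; because this toy response lies exactly on the line y = 1 + 2x, GLS recovers the true coefficients regardless of \Omega:

```python
import numpy as np

# Toy data: y = 1 + 2*x exactly, so any valid Omega yields beta = [1, 2].
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])  # first column is the intercept
y = np.array([3.0, 5.0, 7.0, 9.0])

# Hypothetical tridiagonal error covariance (positive definite)
Omega = np.array([[1.0, 0.5, 0.0, 0.0],
                  [0.5, 1.0, 0.5, 0.0],
                  [0.0, 0.5, 1.0, 0.5],
                  [0.0, 0.0, 0.5, 1.0]])

# beta_GLS = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
Omega_inv = np.linalg.inv(Omega)
beta = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
print(beta)  # -> approximately [1. 2.]
```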

This is equivalent to applying OLS to a transformed version of the data. Using Cholesky decomposition \Omega = CC^T, the model is whitened by multiplying both sides by C^{-1}, which standardizes the scale and decorrelates the errors. The resulting estimator is unbiased, consistent, efficient, and asymptotically normal with covariance:
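This equivalence can be checked numerically: whitening with the Cholesky factor and running OLS on the transformed data reproduces the direct GLS solution. The sketch below uses simulated data; the AR(1)-style \Omega is an assumption chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# AR(1)-style correlation: Omega[i, j] = 0.6 ** |i - j|
idx = np.arange(n)
Omega = 0.6 ** np.abs(idx[:, None] - idx[None, :])
y = X @ np.array([1.0, 2.0]) + rng.multivariate_normal(np.zeros(n), Omega)

# Direct GLS estimate
Omega_inv = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# Whitened OLS: with Omega = C C^T, premultiply both sides by C^{-1}
C = np.linalg.cholesky(Omega)
beta_ols = np.linalg.lstsq(np.linalg.solve(C, X),
                           np.linalg.solve(C, y), rcond=None)[0]

print(np.allclose(beta_gls, beta_ols))  # -> True
```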

\text{Cov}[\hat{\beta} | X] = (X^T \Omega^{-1} X)^{-1}

When \Omega is diagonal, GLS reduces to weighted least squares (WLS), which handles heteroscedasticity without correlation between observations. This implementation uses the statsmodels library’s GLS class. The function returns coefficient estimates, standard errors, t-statistics, p-values, confidence intervals, and model fit statistics including log-likelihood, AIC, and BIC.
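The diagonal case can be sketched as follows (per-observation variances are hypothetical illustration values): scaling each row by one over the square root of its error variance and running ordinary least squares gives the same coefficients as GLS with the diagonal \Omega:

```python
import numpy as np

X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 5.0])])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
variances = np.array([0.5, 1.0, 1.5, 2.0, 2.5])  # hypothetical error variances
Omega = np.diag(variances)

# GLS with a diagonal Omega
Omega_inv = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# Equivalent WLS: scale each observation by 1/sqrt(variance), then OLS
w = 1.0 / np.sqrt(variances)
beta_wls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]

print(np.allclose(beta_gls, beta_wls))  # -> True
```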

For more details on the theoretical background, see the Wikipedia article on Generalized Least Squares and the statsmodels GLS documentation.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=GLS_REGRESSION(y, x, sigma, fit_intercept, alpha)
  • y (list[list], required): Column vector of dependent variable values (n x 1 array)
  • x (list[list], required): Matrix of independent variables (n x p array where p is the number of predictors)
  • sigma (list[list], optional, default: null): Error covariance matrix (n x n array). Defaults to identity matrix if not provided.
  • fit_intercept (bool, optional, default: true): If true, adds an intercept term to the model
  • alpha (float, optional, default: 0.05): Significance level for confidence intervals (between 0 and 1)

Returns (list[list]): 2D list with GLS results, or error message string.

Examples

Example 1: Demo case 1

Inputs:

y x
1 1
2 2
3 3
4 4

Excel formula:

=GLS_REGRESSION({1;2;3;4}, {1;2;3;4})

Expected output:

parameter coefficient std_error t_statistic p_value ci_lower ci_upper
intercept 0 0 0.20101 0.85928 0 0
x1 1 0 2479150001185748 0 1 1
log_likelihood 134.27843
aic -264.55687
bic -265.78428

Example 2: Demo case 2

Inputs:

y x fit_intercept
2 1 false
4 2
6 3
8 4

Excel formula:

=GLS_REGRESSION({2;4;6;8}, {1;2;3;4}, FALSE)

Expected output:

parameter coefficient std_error t_statistic p_value ci_lower ci_upper
x1 2 0 7023929877767954 0 2 2
log_likelihood 131.27702
aic -260.55405
bic -261.16775

Example 3: Demo case 3

Inputs:

y x sigma
1.5 1 1 0.5 0 0
2.5 2 0.5 1 0.5 0
3.5 3 0 0.5 1 0.5
4.5 4 0 0 0.5 1

Excel formula:

=GLS_REGRESSION({1.5;2.5;3.5;4.5}, {1;2;3;4}, {1,0.5,0,0;0.5,1,0.5,0;0,0.5,1,0.5;0,0,0.5,1})

Expected output:

parameter coefficient std_error t_statistic p_value ci_lower ci_upper
intercept 0.5 0 176194507563049.12 0 0.5 0.5
x1 1 0 1017259463712596.5 0 1 1
log_likelihood 131.29679
aic -258.59359
bic -259.821

Example 4: Demo case 4

Inputs:

y x fit_intercept alpha
3 1 0.5 true 0.1
5 2 1
7 3 1.5
9 4 2

Excel formula:

=GLS_REGRESSION({3;5;7;9}, {1,0.5;2,1;3,1.5;4,2}, TRUE, 0.1)

Expected output:

parameter coefficient std_error t_statistic p_value ci_lower ci_upper
intercept 1 0 241418280619737.1 0 1 1
x1 1.6 0 1322302380895419.2 0 1.6 1.6
x2 0.8 0 1322302380895419.2 0 0.8 0.8
log_likelihood 128.99168
aic -253.98336
bic -255.21077

Python Code

import math
import numpy as np
from statsmodels.regression.linear_model import GLS as statsmodels_gls

def gls_regression(y, x, sigma=None, fit_intercept=True, alpha=0.05):
    """
    Fits a Generalized Least Squares (GLS) regression model.

    See: https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.GLS.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Column vector of dependent variable values (n x 1 array)
        x (list[list]): Matrix of independent variables (n x p array where p is the number of predictors)
        sigma (list[list], optional): Error covariance matrix (n x n array); defaults to the identity matrix if not provided. Default is None.
        fit_intercept (bool, optional): If true, adds an intercept term to the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (between 0 and 1). Default is 0.05.

    Returns:
        list[list]: 2D list with GLS results, or error message string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    def validate_2d_array(arr, name):
        if not isinstance(arr, list):
            return f"Invalid input: {name} must be a 2D list."
        if len(arr) == 0:
            return f"Invalid input: {name} cannot be empty."
        for row in arr:
            if not isinstance(row, list):
                return f"Invalid input: {name} must be a 2D list."
        return None

    def to_numpy(arr_2d, name):
        try:
            arr = np.array(arr_2d, dtype=float)
        except Exception:
            return f"Error: {name} must contain numeric values."
        if np.any(np.isnan(arr)) or np.any(np.isinf(arr)):
            return f"Error: {name} must contain finite values."
        return arr

    try:
        # Normalize inputs
        y = to2d(y)
        x = to2d(x)
        if sigma is not None:
            sigma = to2d(sigma)

        # Validate inputs
        err = validate_2d_array(y, "y")
        if err:
            return f"Error: {err}"
        err = validate_2d_array(x, "x")
        if err:
            return f"Error: {err}"

        # Convert to numpy arrays
        y_arr = to_numpy(y, "y")
        if isinstance(y_arr, str):
            return y_arr
        x_arr = to_numpy(x, "x")
        if isinstance(x_arr, str):
            return x_arr

        # Validate dimensions
        if y_arr.ndim != 2 or y_arr.shape[1] != 1:
            return "Error: y must be a column vector (n x 1)."
        n = y_arr.shape[0]
        if x_arr.ndim != 2:
            return "Error: x must be a 2D matrix."
        if x_arr.shape[0] != n:
            return "Error: y and x must have the same number of rows."

        # Handle sigma
        if sigma is not None:
            err = validate_2d_array(sigma, "sigma")
            if err:
                return f"Error: {err}"
            sigma_arr = to_numpy(sigma, "sigma")
            if isinstance(sigma_arr, str):
                return sigma_arr
            if sigma_arr.ndim != 2:
                return "Error: sigma must be a 2D matrix."
            if sigma_arr.shape[0] != n or sigma_arr.shape[1] != n:
                return f"Error: sigma must be {n} x {n} to match the number of observations."
        else:
            sigma_arr = None

        # Validate alpha
        if not isinstance(alpha, (int, float)):
            return "Error: alpha must be a number."
        if math.isnan(alpha) or math.isinf(alpha):
            return "Error: alpha must be finite."
        if alpha <= 0 or alpha >= 1:
            return "Error: alpha must be between 0 and 1."

        # Validate fit_intercept
        if not isinstance(fit_intercept, bool):
            return "Error: fit_intercept must be a boolean."

        # Add intercept if requested
        if fit_intercept:
            x_arr = np.hstack([np.ones((n, 1)), x_arr])

        # Flatten y for statsmodels
        y_vec = y_arr.flatten()

        # Fit GLS model
        model = statsmodels_gls(y_vec, x_arr, sigma=sigma_arr)
        results = model.fit()

        # Extract results
        params = results.params
        std_err = results.bse
        t_stats = results.tvalues
        p_values = results.pvalues
        conf_int = results.conf_int(alpha=alpha)

        # Build output table
        output = [['parameter', 'coefficient', 'std_error', 't_statistic', 'p_value', 'ci_lower', 'ci_upper']]

        # Parameter names
        if fit_intercept:
            param_names = ['intercept'] + [f'x{i+1}' for i in range(x_arr.shape[1] - 1)]
        else:
            param_names = [f'x{i+1}' for i in range(x_arr.shape[1])]

        # Add parameter rows
        for i, name in enumerate(param_names):
            output.append([
                name,
                float(params[i]),
                float(std_err[i]),
                float(t_stats[i]),
                float(p_values[i]),
                float(conf_int[i, 0]),
                float(conf_int[i, 1])
            ])

        # Add model statistics
        output.append(['log_likelihood', float(results.llf), '', '', '', '', ''])
        output.append(['aic', float(results.aic), '', '', '', '', ''])
        output.append(['bic', float(results.bic), '', '', '', '', ''])

        return output
    except Exception as exc:
        return f"Error: {exc}"
