WLS_REGRESSION

Overview

The WLS_REGRESSION function fits a Weighted Least Squares (WLS) regression model, which is a generalization of ordinary least squares (OLS) designed to handle heteroscedasticity—situations where the variance of errors differs across observations. WLS is commonly used when data points have unequal reliability or precision, such as in survey data with varying sample sizes or measurements with different levels of uncertainty.

This implementation uses the statsmodels library’s WLS class. For source code and additional details, see the statsmodels GitHub repository.

In standard OLS, the objective is to minimize the sum of squared residuals. WLS extends this by assigning weights to each observation, minimizing a weighted sum of squared residuals instead:

S(\beta) = \sum_{i=1}^{n} w_i (y_i - X_i \beta)^2

where w_i are the weights for each observation. Ideally, the weights are proportional to the inverse of each observation's error variance, w_i \propto 1/\sigma_i^2 (for example, w_i = 1/\sigma_i^2 when the \sigma_i are known). Observations with lower variance (higher precision) therefore receive higher weights and contribute more to the parameter estimates.
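As a minimal sketch of the objective above, assuming NumPy and using made-up per-observation standard deviations `sigma` (illustrative values, not from any example on this page), the inverse-variance weights and S(\beta) can be computed like this. The helper name `wls_objective` is hypothetical.

```python
import numpy as np

def wls_objective(beta, X, y, w):
    """Weighted sum of squared residuals S(beta) = sum_i w_i * (y_i - X_i beta)^2."""
    residuals = y - X @ beta
    return float(np.sum(w * residuals**2))

# Hypothetical per-observation error standard deviations (illustrative only).
sigma = np.array([0.5, 1.0, 2.0])
w = 1.0 / sigma**2  # inverse-variance weights: 4.0, 1.0, 0.25

X = np.column_stack([np.ones(3), [1.0, 2.0, 3.0]])  # intercept column + one predictor
y = np.array([1.0, 2.5, 2.0])
beta = np.array([0.0, 1.0])                         # candidate coefficients

print(wls_objective(beta, X, y, w))  # 0.5
```

Here the third residual (-1.0) is twice the size of the second (0.5), yet both contribute 0.25 to S(\beta), because the third observation is assumed to have four times the error variance.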

The WLS estimator is given by the solution to the weighted normal equations:

\hat{\beta} = (X^T W X)^{-1} X^T W y
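The estimator can be sketched directly in NumPy. This is an illustration, not statsmodels' internal code path: to our understanding statsmodels rescales the data and uses a pseudoinverse by default, whereas this sketch solves the normal equations with a linear solve instead of forming the inverse. The helper name `wls_normal_equations` is hypothetical; the data are the inputs of Example 3.

```python
import numpy as np

def wls_normal_equations(X, y, w):
    """Solve (X^T W X) beta = X^T W y without building the n x n matrix W."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    XtW = X.T * w  # broadcasting scales column i of X^T by w_i, i.e. X^T @ diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)

# Example 3 inputs, with an intercept column prepended.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.4, 3.1, 4.5, 5.2])
w = np.array([2.0, 1.5, 1.0, 1.2, 1.8])
X = np.column_stack([np.ones_like(x), x])

beta = wls_normal_equations(X, y, w)
print(beta)  # approximately [0.2564, 1.0063], as in Example 3
```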

where W is a diagonal matrix containing the weights. When all weights are equal, WLS reduces to OLS. By the Gauss-Markov theorem applied to the rescaled model (Aitken's generalization), when the errors are uncorrelated and the weights are correctly specified as proportional to the inverse error variances, WLS produces the Best Linear Unbiased Estimator (BLUE). For more theoretical background, see Weighted least squares on Wikipedia.
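The reduction to OLS can be checked numerically. The sketch below (NumPy assumed, hypothetical helper `wls_beta`) uses the Example 1 data and confirms that any constant weight, not just 1, gives the same coefficients as plain OLS, because a common scale factor cancels in (X^T W X)^{-1} X^T W y.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # Example 1 data
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
X = np.column_stack([np.ones_like(x), x])

def wls_beta(X, y, w):
    """Closed-form WLS estimate (X^T W X)^{-1} X^T W y, computed with a linear solve."""
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)

ols_beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # plain OLS for comparison

for c in (1.0, 3.0, 0.2):  # any constant weight gives the OLS solution
    assert np.allclose(wls_beta(X, y, np.full_like(x, c)), ols_beta)

print(ols_beta)  # approximately [0.05, 0.99], matching Example 1
```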

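For each parameter, the function returns the coefficient estimate, standard error, t-statistic, p-value, and confidence interval bounds. It also returns model fit statistics: R², adjusted R², the F-statistic and its p-value, AIC, and BIC. These statistics are computed from the weighted residuals, so they describe the weighted fit rather than an unweighted one.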
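To make these statistics concrete, here is a sketch that recomputes them by hand for Example 3 with NumPy and SciPy. It assumes a model with an intercept and the default (non-robust) covariance; the log-likelihood includes a 0.5·Σ log w_i term, which is the convention we believe statsmodels' WLS uses and which reproduces the AIC and BIC shown in Example 3. Without an intercept, R² and the F-test are defined differently.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # Example 3 data
y = np.array([1.2, 2.4, 3.1, 4.5, 5.2])
w = np.array([2.0, 1.5, 1.0, 1.2, 1.8])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

XtW = X.T * w
XtWX_inv = np.linalg.inv(XtW @ X)
beta = XtWX_inv @ (XtW @ y)

resid = y - X @ beta
ssr = np.sum(w * resid**2)                    # weighted residual sum of squares
sigma2 = ssr / (n - p)                        # residual variance estimate
se = np.sqrt(np.diag(sigma2 * XtWX_inv))      # coefficient standard errors
t_values = beta / se
p_values = 2 * stats.t.sf(np.abs(t_values), df=n - p)

y_bar_w = np.average(y, weights=w)            # weighted mean of y
tss = np.sum(w * (y - y_bar_w) ** 2)          # weighted, centered total sum of squares
r2 = 1 - ssr / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)
f_stat = ((tss - ssr) / (p - 1)) / (ssr / (n - p))
f_pvalue = stats.f.sf(f_stat, p - 1, n - p)

# Gaussian log-likelihood, including the 0.5 * sum(log w) weight term (assumed convention).
llf = -n / 2 * np.log(2 * np.pi * ssr / n) - n / 2 + 0.5 * np.sum(np.log(w))
aic = -2 * llf + 2 * p
bic = -2 * llf + np.log(n) * p

print(se, r2, adj_r2, f_stat, aic, bic)
```

With a single predictor, the F-statistic equals the squared slope t-statistic, which is why f_pvalue matches the slope's p-value in Examples 1 to 3.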

This example function is provided as-is without any representation of accuracy.

Excel Usage

=WLS_REGRESSION(y, x, weights, fit_intercept, alpha)

y (list[list], required): Column vector of dependent variable (response) values.
x (list[list], required): Matrix of independent variables (predictors). Each column is a predictor.
weights (list[list], required): Column vector of positive weights for each observation.
fit_intercept (bool, optional, default: true): Whether to add an intercept term to the model.
alpha (float, optional, default: 0.05): Significance level for confidence intervals (e.g., 0.05 for 95% CI).

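Returns (list[list]): 2D list with a header row, one row per parameter (intercept first when fit_intercept is true, then x1, x2, ...), and one row per model statistic with the value in the second column and the remaining cells empty. If the inputs are invalid, an error message string is returned instead.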
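When the result is consumed from Python rather than Excel, it can be convenient to split the table into parameters and model statistics. The helper `results_to_dict` below is hypothetical (not part of the function or of statsmodels) and relies only on the layout described above; the sample table is an abbreviated copy of the Example 1 output.

```python
def results_to_dict(table):
    """Split the returned 2D list into per-parameter rows and model statistics."""
    if isinstance(table, str):  # the function returns a string on error
        raise ValueError(table)
    header, *rows = table
    params, model_stats = {}, {}
    for row in rows:
        name, values = row[0], row[1:]
        if all(v == '' for v in values[1:]):  # statistic rows pad the extra cells with ''
            model_stats[name] = values[0]
        else:
            params[name] = dict(zip(header[1:], values))
    return params, model_stats

example_1 = [
    ['parameter', 'coefficient', 'std_error', 't_statistic', 'p_value', 'ci_lower', 'ci_upper'],
    ['intercept', 0.05, 0.1981, 0.2524, 0.817, -0.5804, 0.6804],
    ['x1', 0.99, 0.05972, 16.58, 0.0004779, 0.7999, 1.18],
    ['r_squared', 0.9892, '', '', '', '', ''],
    ['f_statistic', 274.8, '', '', '', '', ''],
]
params, model_stats = results_to_dict(example_1)
print(params['x1']['coefficient'], model_stats['r_squared'])  # 0.99 0.9892
```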

Examples

Example 1: Basic WLS regression with uniform weights

Inputs:

y	x	weights
1.1	1	1
1.9	2	1
3.2	3	1
3.8	4	1
5.1	5	1

Excel formula:

=WLS_REGRESSION({1.1;1.9;3.2;3.8;5.1}, {1;2;3;4;5}, {1;1;1;1;1})

Expected output:

parameter	coefficient	std_error	t_statistic	p_value	ci_lower	ci_upper
intercept	0.05	0.1981	0.2524	0.817	-0.5804	0.6804
x1	0.99	0.05972	16.58	0.0004779	0.7999	1.18
r_squared	0.9892
adj_r_squared	0.9856
f_statistic	274.8
f_pvalue	0.0004779
aic	-1.032
bic	-1.814

Example 2: WLS regression without intercept

Inputs:

y	x	weights	fit_intercept
2.1	1	1	false
4.2	2	1
5.8	3	1
8.1	4	1

Excel formula:

=WLS_REGRESSION({2.1;4.2;5.8;8.1}, {1;2;3;4}, {1;1;1;1}, FALSE)

Expected output:

parameter	coefficient	std_error	t_statistic	p_value	ci_lower	ci_upper
x1	2.01	0.03283	61.23	0.0000096	1.906	2.114
r_squared	0.9992
adj_r_squared	0.9989
f_statistic	3749
f_pvalue	0.0000096
aic	-1.526
bic	-2.14
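Without an intercept, the reported R² appears to follow the uncentered definition that statsmodels uses for models without a constant, 1 - SSR / Σ w_i y_i², rather than the usual centered one. The sketch below (NumPy assumed) reproduces the Example 2 values and shows that the centered version would be noticeably lower, so R² values from models with and without an intercept are not directly comparable.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # Example 2 data (fit_intercept = FALSE)
y = np.array([2.1, 4.2, 5.8, 8.1])
w = np.ones_like(x)

slope = np.sum(w * x * y) / np.sum(w * x * x)  # one-parameter WLS through the origin
ssr = np.sum(w * (y - slope * x) ** 2)

r2_uncentered = 1 - ssr / np.sum(w * y**2)                              # about 0.9992
r2_centered = 1 - ssr / np.sum(w * (y - np.average(y, weights=w)) ** 2)  # about 0.9950

n, p = len(y), 1
adj_r2 = 1 - (1 - r2_uncentered) * n / (n - p)  # no constant, so n replaces n - 1; about 0.9989
```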

Example 3: WLS with custom weights and alpha

Inputs:

y	x	weights	fit_intercept	alpha
1.2	1	2	true	0.1
2.4	2	1.5
3.1	3	1
4.5	4	1.2
5.2	5	1.8

Excel formula:

=WLS_REGRESSION({1.2;2.4;3.1;4.5;5.2}, {1;2;3;4;5}, {2;1.5;1;1.2;1.8}, TRUE, 0.1)

Expected output:

parameter	coefficient	std_error	t_statistic	p_value	ci_lower	ci_upper
intercept	0.2564	0.1656	1.548	0.2193	-0.1333	0.646
x1	1.006	0.05032	20	0.0002733	0.8879	1.125
r_squared	0.9926
adj_r_squared	0.9901
f_statistic	399.9
f_pvalue	0.0002733
aic	-1.721
bic	-2.502
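Example 3 uses alpha = 0.1, so its intervals are 90% intervals. A minimal sketch of how such an interval follows from a coefficient and its standard error, assuming (as statsmodels does by default for WLS) a Student t critical value with n - p residual degrees of freedom; `confidence_interval` is a hypothetical helper name.

```python
from scipy import stats

def confidence_interval(coef, std_err, df_resid, alpha=0.05):
    """Two-sided (1 - alpha) t-based confidence interval for one coefficient."""
    t_crit = stats.t.ppf(1 - alpha / 2, df_resid)
    return coef - t_crit * std_err, coef + t_crit * std_err

# Slope from Example 3: 5 observations, 2 parameters -> 3 residual degrees of freedom.
lower, upper = confidence_interval(1.00629, 0.05032, df_resid=3, alpha=0.1)
print(round(lower, 4), round(upper, 4))  # about 0.888 and 1.125, matching Example 3
```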

Example 4: WLS with multiple predictors and varying weights

Inputs:

y	x1	x2	weights	fit_intercept	alpha
5.5	1	3	1	true	0.05
8.2	2	2	1.5
11.1	3	4	1
12.5	4	1	2
16.3	5	5	1.2

Excel formula:

=WLS_REGRESSION({5.5;8.2;11.1;12.5;16.3}, {1,3;2,2;3,4;4,1;5,5}, {1;1.5;1;2;1.2}, TRUE, 0.05)

Expected output:

parameter	coefficient	std_error	t_statistic	p_value	ci_lower	ci_upper
intercept	2.37	0.3476	6.817	0.02085	0.8741	3.865
x1	2.475	0.09095	27.22	0.001347	2.084	2.867
x2	0.3109	0.08295	3.749	0.06437	-0.04596	0.6679
r_squared	0.9976
adj_r_squared	0.9952
f_statistic	412.6
f_pvalue	0.002418
aic	2.661
bic	1.489
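An equivalent way to view WLS, which extends naturally to several predictors, is to multiply each row of X and y by sqrt(w_i) and run ordinary least squares on the rescaled data (sometimes called whitening; as far as we know, statsmodels' WLS works with rescaled data in the same way). A sketch with NumPy on the Example 4 inputs:

```python
import numpy as np

# Example 4 data: intercept column plus two predictors.
y = np.array([5.5, 8.2, 11.1, 12.5, 16.3])
X = np.array([[1, 1, 3], [1, 2, 2], [1, 3, 4], [1, 4, 1], [1, 5, 5]], dtype=float)
w = np.array([1.0, 1.5, 1.0, 2.0, 1.2])

# Scaling row i by sqrt(w_i) turns sum_i w_i * r_i^2 into an ordinary sum of squares,
# so plain least squares on the rescaled data gives the WLS coefficients.
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
print(beta)  # approximately [2.370, 2.476, 0.311]
```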

Python Code

import math
from statsmodels.regression.linear_model import WLS as statsmodels_WLS

def wls_regression(y, x, weights, fit_intercept=True, alpha=0.05):
    """
    Fits a Weighted Least Squares (WLS) regression model.

    See: https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.WLS.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Column vector of dependent variable (response) values.
        x (list[list]): Matrix of independent variables (predictors). Each column is a predictor.
        weights (list[list]): Column vector of positive weights for each observation.
        fit_intercept (bool, optional): Whether to add an intercept term to the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (e.g., 0.05 for 95% CI). Default is 0.05.

    Returns:
        list[list]: 2D list with WLS results, or error message string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val
    try:
        # Normalize inputs to 2D lists
        y = to2d(y)
        x = to2d(x)
        weights = to2d(weights)

        # Validate y is a column vector
        if not isinstance(y, list) or not all(isinstance(row, list) for row in y):
            return "Error: y must be a 2D list."
        if len(y) == 0:
            return "Error: y must not be empty."
        if not all(len(row) == 1 for row in y):
            return "Error: y must be a column vector (each row must have exactly 1 element)."

        # Extract y values
        y_vals = [float(row[0]) for row in y]

        if any(math.isnan(val) or math.isinf(val) for val in y_vals):
            return "Error: y must contain finite values."

        n_obs = len(y_vals)

        # Validate x is a matrix
        if not isinstance(x, list) or not all(isinstance(row, list) for row in x):
            return "Error: x must be a 2D list."
        if len(x) != n_obs:
            return "Error: x must have the same number of rows as y."
        if len(x) == 0:
            return "Error: x must not be empty."

        n_predictors = len(x[0])
        if n_predictors == 0:
            return "Error: x must have at least one column."
        if not all(len(row) == n_predictors for row in x):
            return "Error: x must have consistent column count across all rows."

        # Extract x values
        x_vals = [[float(val) for val in row] for row in x]

        if any(math.isnan(val) or math.isinf(val) for row in x_vals for val in row):
            return "Error: x must contain finite values."

        # Validate weights is a column vector
        if not isinstance(weights, list) or not all(isinstance(row, list) for row in weights):
            return "Error: weights must be a 2D list."
        if len(weights) != n_obs:
            return "Error: weights must have the same number of rows as y."
        if not all(len(row) == 1 for row in weights):
            return "Error: weights must be a column vector (each row must have exactly 1 element)."

        # Extract weight values
        weight_vals = [float(row[0]) for row in weights]

        if any(math.isnan(val) or math.isinf(val) for val in weight_vals):
            return "Error: weights must contain finite values."
        if any(val <= 0 for val in weight_vals):
            return "Error: weights must be positive."

        # Validate fit_intercept
        if not isinstance(fit_intercept, bool):
            return "Error: fit_intercept must be a boolean."

        # Validate alpha
        alpha_val = float(alpha)
        if math.isnan(alpha_val) or math.isinf(alpha_val):
            return "Error: alpha must be finite."
        if alpha_val <= 0 or alpha_val >= 1:
            return "Error: alpha must be between 0 and 1."

        # Add intercept column if needed
        if fit_intercept:
            x_vals = [[1.0] + row for row in x_vals]

        # Fit WLS model
        model = statsmodels_WLS(y_vals, x_vals, weights=weight_vals)
        results = model.fit()

        # Extract confidence intervals
        conf_int = results.conf_int(alpha=alpha_val)

        # Build output table
        output = [['parameter', 'coefficient', 'std_error', 't_statistic', 'p_value', 'ci_lower', 'ci_upper']]

        # Add parameter results
        param_names = []
        if fit_intercept:
            param_names.append('intercept')
        for i in range(n_predictors):
            param_names.append(f'x{i+1}')

        for i, param_name in enumerate(param_names):
            coef = float(results.params[i])
            std_err = float(results.bse[i])
            t_stat = float(results.tvalues[i])
            p_val = float(results.pvalues[i])
            ci_lower = float(conf_int[i, 0])
            ci_upper = float(conf_int[i, 1])

            if any(math.isnan(val) or math.isinf(val) for val in [coef, std_err, t_stat, p_val, ci_lower, ci_upper]):
                return f"Error: non-finite value in results for parameter {param_name}."

            output.append([param_name, coef, std_err, t_stat, p_val, ci_lower, ci_upper])

        # Add model statistics
        r_squared = float(results.rsquared)
        adj_r_squared = float(results.rsquared_adj)
        f_stat = float(results.fvalue)
        f_pval = float(results.f_pvalue)
        aic = float(results.aic)
        bic = float(results.bic)

        if any(math.isnan(val) or math.isinf(val) for val in [r_squared, adj_r_squared, f_stat, f_pval, aic, bic]):
            return "Error: non-finite value in model statistics."

        output.append(['r_squared', r_squared, '', '', '', '', ''])
        output.append(['adj_r_squared', adj_r_squared, '', '', '', '', ''])
        output.append(['f_statistic', f_stat, '', '', '', '', ''])
        output.append(['f_pvalue', f_pval, '', '', '', '', ''])
        output.append(['aic', aic, '', '', '', '', ''])
        output.append(['bic', bic, '', '', '', '', ''])

        return output
    except Exception as exc:
        return f"Error: {exc}"
