WLS_REGRESSION
Overview
The WLS_REGRESSION function fits a Weighted Least Squares (WLS) regression model, which is a generalization of ordinary least squares (OLS) designed to handle heteroscedasticity—situations where the variance of errors differs across observations. WLS is commonly used when data points have unequal reliability or precision, such as in survey data with varying sample sizes or measurements with different levels of uncertainty.
This implementation uses the statsmodels library’s WLS class. For source code and additional details, see the statsmodels GitHub repository.
In standard OLS, the objective is to minimize the sum of squared residuals. WLS extends this by assigning weights to each observation, minimizing a weighted sum of squared residuals instead:
S(\beta) = \sum_{i=1}^{n} w_i (y_i - X_i \beta)^2
where w_i are the weights for each observation. The weights are presumed to be proportional to the inverse of the variance of the observations, i.e., w_i = 1/\sigma_i^2. This means observations with lower variance (higher precision) receive higher weights and contribute more to the parameter estimates.
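As a small illustration of the weighting rule above, the sketch below derives inverse-variance weights from per-observation standard deviations and evaluates the weighted objective S(\beta) for one candidate pair of coefficients. The helper names `inverse_variance_weights` and `weighted_ssr`, and the sample numbers, are hypothetical and chosen only for illustration; they are not part of WLS_REGRESSION.

```python
def inverse_variance_weights(sigmas):
    """Return w_i = 1 / sigma_i**2 for each standard deviation (illustrative helper)."""
    return [1.0 / (s * s) for s in sigmas]


def weighted_ssr(y, x, weights, intercept, slope):
    """Weighted sum of squared residuals for a one-predictor linear model."""
    return sum(
        w * (yi - (intercept + slope * xi)) ** 2
        for yi, xi, w in zip(y, x, weights)
    )


# Made-up measurements: the third point is the noisiest, so it gets the smallest weight.
sigmas = [0.5, 1.0, 2.0]
weights = inverse_variance_weights(sigmas)  # [4.0, 1.0, 0.25]

y = [1.0, 2.5, 2.0]
x = [1.0, 2.0, 3.0]

# Residuals for intercept=0, slope=1 are 0, 0.5 and -1, so
# S = 4*0 + 1*0.25 + 0.25*1 = 0.5
s_value = weighted_ssr(y, x, weights, intercept=0.0, slope=1.0)
```

The residual of the noisy third point is the largest, yet it contributes no more to the objective than the second point because its weight is small.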
The WLS estimator is given by the solution to the weighted normal equations:
\hat{\beta} = (X^T W X)^{-1} X^T W y
where W is a diagonal matrix containing the weights. When all weights are equal, WLS reduces to OLS. According to the Gauss-Markov theorem, when weights are correctly specified as the inverse of the error variances, WLS produces the Best Linear Unbiased Estimator (BLUE). For more theoretical background, see Weighted least squares on Wikipedia.
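For a single predictor with an intercept, the weighted normal equations reduce to a 2x2 system that can be solved by hand. The following pure-Python sketch does this to make the estimator concrete; `wls_simple` is a hypothetical helper written for this illustration, not the code path used by WLS_REGRESSION, which delegates to statsmodels. The data are taken from Examples 1 and 3 on this page.

```python
def wls_simple(y, x, weights):
    """Solve the 2x2 weighted normal equations for y = b0 + b1 * x."""
    sw = sum(weights)
    swx = sum(w * xi for w, xi in zip(weights, x))
    swy = sum(w * yi for w, yi in zip(weights, y))
    swxx = sum(w * xi * xi for w, xi in zip(weights, x))
    swxy = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))

    det = sw * swxx - swx * swx  # determinant of X^T W X
    if det == 0:
        raise ValueError("x values are constant; the slope is not identifiable")

    slope = (sw * swxy - swx * swy) / det
    intercept = (swy - slope * swx) / sw
    return intercept, slope


x = [1, 2, 3, 4, 5]

# Example 1 data: equal weights, so the fit coincides with OLS.
y1 = [1.1, 1.9, 3.2, 3.8, 5.1]
b0, b1 = wls_simple(y1, x, [1, 1, 1, 1, 1])

# Rescaling all weights by a constant leaves the estimates unchanged.
b0_scaled, b1_scaled = wls_simple(y1, x, [3, 3, 3, 3, 3])

# Example 3 data: unequal weights.
y3 = [1.2, 2.4, 3.1, 4.5, 5.2]
b0_w, b1_w = wls_simple(y3, x, [2, 1.5, 1, 1.2, 1.8])
```

With the Example 1 data this gives an intercept of about 0.05 and a slope of about 0.99, and with the Example 3 data about 0.2564 and 1.006, in agreement with the expected outputs listed in the Examples section.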
The function returns comprehensive regression results including coefficient estimates, standard errors, t-statistics, p-values, confidence intervals, and model fit statistics such as R², adjusted R², F-statistic, AIC, and BIC.
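To show how two of these fit statistics relate to the weighted residuals, the sketch below computes R² and adjusted R² for a one-predictor model with an intercept. It assumes the centered definition, in which the total sum of squares is measured around the weighted mean of y; that convention is an assumption of this sketch and should be checked against the statsmodels documentation. The helper names are hypothetical.

```python
def weighted_r_squared(y, x, weights, intercept, slope):
    """R^2 = 1 - weighted SSR / weighted centered TSS (assumed convention)."""
    sw = sum(weights)
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / sw  # weighted mean of y
    ssr = sum(
        w * (yi - (intercept + slope * xi)) ** 2
        for yi, xi, w in zip(y, x, weights)
    )
    tss = sum(w * (yi - y_bar) ** 2 for yi, w in zip(y, weights))
    return 1.0 - ssr / tss


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjust R^2 for the number of predictors (model with an intercept)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Example 1 data and its fitted coefficients (intercept 0.05, slope 0.99).
y = [1.1, 1.9, 3.2, 3.8, 5.1]
x = [1, 2, 3, 4, 5]
w = [1, 1, 1, 1, 1]

r2 = weighted_r_squared(y, x, w, intercept=0.05, slope=0.99)
adj_r2 = adjusted_r_squared(r2, n_obs=len(y), n_predictors=1)
```

For the Example 1 data these come out at roughly 0.9892 and 0.9856, matching the `r_squared` and `adj_r_squared` rows of that example's expected output.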
This example function is provided as-is without any representation of accuracy.
Excel Usage
=WLS_REGRESSION(y, x, weights, fit_intercept, alpha)
y (list[list], required): Column vector of dependent variable (response) values.
x (list[list], required): Matrix of independent variables (predictors). Each column is a predictor.
weights (list[list], required): Column vector of positive weights for each observation.
fit_intercept (bool, optional, default: true): Whether to add an intercept term to the model.
alpha (float, optional, default: 0.05): Significance level for confidence intervals (e.g., 0.05 for 95% CI).
Returns (list[list]): 2D list with WLS results, or error message string.
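Because the return value is either a 2D list or an error string, callers on the Python side need to distinguish the two cases. One possible way to do so is sketched below; `parse_wls_output` is a hypothetical helper, and the sample table is an abbreviated copy of the Example 1 output, with empty cells represented as empty strings.

```python
def parse_wls_output(result):
    """Split a WLS result table into per-parameter rows and model statistics."""
    if isinstance(result, str):
        # The function signals failure by returning a message instead of a table.
        raise ValueError(result)

    header, *rows = result
    coefficients = {}
    statistics = {}
    for row in rows:
        name = row[0]
        if row[2] == "":
            # Only the first value column is filled: a model-level statistic.
            statistics[name] = row[1]
        else:
            coefficients[name] = dict(zip(header[1:], row[1:]))
    return coefficients, statistics


sample = [
    ["parameter", "coefficient", "std_error", "t_statistic", "p_value", "ci_lower", "ci_upper"],
    ["intercept", 0.05, 0.1981, 0.2524, 0.817, -0.5804, 0.6804],
    ["x1", 0.99, 0.05972, 16.58, 0.0004779, 0.7999, 1.18],
    ["r_squared", 0.9892, "", "", "", "", ""],
]

coefficients, statistics = parse_wls_output(sample)
slope = coefficients["x1"]["coefficient"]
r_squared = statistics["r_squared"]
```

Raising an exception on the error string is a design choice of this sketch; returning the message unchanged would work equally well in a spreadsheet context.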
Examples
Example 1: Basic WLS regression with uniform weights
Inputs:
| y | x | weights |
|---|---|---|
| 1.1 | 1 | 1 |
| 1.9 | 2 | 1 |
| 3.2 | 3 | 1 |
| 3.8 | 4 | 1 |
| 5.1 | 5 | 1 |
Excel formula:
=WLS_REGRESSION({1.1;1.9;3.2;3.8;5.1}, {1;2;3;4;5}, {1;1;1;1;1})
Expected output:
| parameter | coefficient | std_error | t_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| intercept | 0.05 | 0.1981 | 0.2524 | 0.817 | -0.5804 | 0.6804 |
| x1 | 0.99 | 0.05972 | 16.58 | 0.0004779 | 0.7999 | 1.18 |
| r_squared | 0.9892 | | | | | |
| adj_r_squared | 0.9856 | | | | | |
| f_statistic | 274.8 | | | | | |
| f_pvalue | 0.0004779 | | | | | |
| aic | -1.032 | | | | | |
| bic | -1.814 | | | | | |
Example 2: WLS regression without intercept
Inputs:
| y | x | weights | fit_intercept |
|---|---|---|---|
| 2.1 | 1 | 1 | false |
| 4.2 | 2 | 1 | |
| 5.8 | 3 | 1 | |
| 8.1 | 4 | 1 | |
Excel formula:
=WLS_REGRESSION({2.1;4.2;5.8;8.1}, {1;2;3;4}, {1;1;1;1}, FALSE)
Expected output:
| parameter | coefficient | std_error | t_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| x1 | 2.01 | 0.03283 | 61.23 | 0.0000096 | 1.906 | 2.114 |
| r_squared | 0.9992 | | | | | |
| adj_r_squared | 0.9989 | | | | | |
| f_statistic | 3749 | | | | | |
| f_pvalue | 0.0000096 | | | | | |
| aic | -1.526 | | | | | |
| bic | -2.14 | | | | | |
Example 3: WLS with custom weights and alpha
Inputs:
| y | x | weights | alpha |
|---|---|---|---|
| 1.2 | 1 | 2 | 0.1 |
| 2.4 | 2 | 1.5 | |
| 3.1 | 3 | 1 | |
| 4.5 | 4 | 1.2 | |
| 5.2 | 5 | 1.8 | |
Excel formula:
=WLS_REGRESSION({1.2;2.4;3.1;4.5;5.2}, {1;2;3;4;5}, {2;1.5;1;1.2;1.8}, TRUE, 0.1)
Expected output:
| parameter | coefficient | std_error | t_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| intercept | 0.2564 | 0.1656 | 1.548 | 0.2193 | -0.1333 | 0.646 |
| x1 | 1.006 | 0.05032 | 20 | 0.0002733 | 0.8879 | 1.125 |
| r_squared | 0.9926 | | | | | |
| adj_r_squared | 0.9901 | | | | | |
| f_statistic | 399.9 | | | | | |
| f_pvalue | 0.0002733 | | | | | |
| aic | -1.721 | | | | | |
| bic | -2.502 | | | | | |
Example 4: WLS with multiple predictors and varying weights
Inputs:
| y | x (column 1) | x (column 2) | weights | fit_intercept | alpha |
|---|---|---|---|---|---|
| 5.5 | 1 | 3 | 1 | true | 0.05 |
| 8.2 | 2 | 2 | 1.5 | | |
| 11.1 | 3 | 4 | 1 | | |
| 12.5 | 4 | 1 | 2 | | |
| 16.3 | 5 | 5 | 1.2 | | |
Excel formula:
=WLS_REGRESSION({5.5;8.2;11.1;12.5;16.3}, {1,3;2,2;3,4;4,1;5,5}, {1;1.5;1;2;1.2}, TRUE, 0.05)
Expected output:
| parameter | coefficient | std_error | t_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| intercept | 2.37 | 0.3476 | 6.817 | 0.02085 | 0.8741 | 3.865 |
| x1 | 2.475 | 0.09095 | 27.22 | 0.001347 | 2.084 | 2.867 |
| x2 | 0.3109 | 0.08295 | 3.749 | 0.06437 | -0.04596 | 0.6679 |
| r_squared | 0.9976 | | | | | |
| adj_r_squared | 0.9952 | | | | | |
| f_statistic | 412.6 | | | | | |
| f_pvalue | 0.002418 | | | | | |
| aic | 2.661 | | | | | |
| bic | 1.489 | | | | | |
Python Code
import math

from statsmodels.regression.linear_model import WLS as statsmodels_WLS


def wls_regression(y, x, weights, fit_intercept=True, alpha=0.05):
    """
    Fits a Weighted Least Squares (WLS) regression model.

    See: https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.WLS.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Column vector of dependent variable (response) values.
        x (list[list]): Matrix of independent variables (predictors). Each column is a predictor.
        weights (list[list]): Column vector of positive weights for each observation.
        fit_intercept (bool, optional): Whether to add an intercept term to the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (e.g., 0.05 for 95% CI). Default is 0.05.

    Returns:
        list[list]: 2D list with WLS results, or error message string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    try:
        # Normalize inputs to 2D lists
        y = to2d(y)
        x = to2d(x)
        weights = to2d(weights)

        # Validate y is a column vector
        if not isinstance(y, list) or not all(isinstance(row, list) for row in y):
            return "Error: y must be a 2D list."
        if len(y) == 0:
            return "Error: y must not be empty."
        if not all(len(row) == 1 for row in y):
            return "Error: y must be a column vector (each row must have exactly 1 element)."

        # Extract y values
        y_vals = [float(row[0]) for row in y]
        if any(math.isnan(val) or math.isinf(val) for val in y_vals):
            return "Error: y must contain finite values."
        n_obs = len(y_vals)

        # Validate x is a matrix
        if not isinstance(x, list) or not all(isinstance(row, list) for row in x):
            return "Error: x must be a 2D list."
        if len(x) != n_obs:
            return "Error: x must have the same number of rows as y."
        if len(x) == 0:
            return "Error: x must not be empty."
        n_predictors = len(x[0])
        if n_predictors == 0:
            return "Error: x must have at least one column."
        if not all(len(row) == n_predictors for row in x):
            return "Error: x must have consistent column count across all rows."

        # Extract x values
        x_vals = [[float(val) for val in row] for row in x]
        if any(math.isnan(val) or math.isinf(val) for row in x_vals for val in row):
            return "Error: x must contain finite values."

        # Validate weights is a column vector
        if not isinstance(weights, list) or not all(isinstance(row, list) for row in weights):
            return "Error: weights must be a 2D list."
        if len(weights) != n_obs:
            return "Error: weights must have the same number of rows as y."
        if not all(len(row) == 1 for row in weights):
            return "Error: weights must be a column vector (each row must have exactly 1 element)."

        # Extract weight values
        weight_vals = [float(row[0]) for row in weights]
        if any(math.isnan(val) or math.isinf(val) for val in weight_vals):
            return "Error: weights must contain finite values."
        if any(val <= 0 for val in weight_vals):
            return "Error: weights must be positive."

        # Validate fit_intercept
        if not isinstance(fit_intercept, bool):
            return "Error: fit_intercept must be a boolean."

        # Validate alpha
        alpha_val = float(alpha)
        if math.isnan(alpha_val) or math.isinf(alpha_val):
            return "Error: alpha must be finite."
        if alpha_val <= 0 or alpha_val >= 1:
            return "Error: alpha must be between 0 and 1."

        # Add intercept column if needed
        if fit_intercept:
            x_vals = [[1.0] + row for row in x_vals]

        # Fit WLS model
        model = statsmodels_WLS(y_vals, x_vals, weights=weight_vals)
        results = model.fit()

        # Extract confidence intervals
        conf_int = results.conf_int(alpha=alpha_val)

        # Build output table
        output = [['parameter', 'coefficient', 'std_error', 't_statistic', 'p_value', 'ci_lower', 'ci_upper']]

        # Add parameter results
        param_names = []
        if fit_intercept:
            param_names.append('intercept')
        for i in range(n_predictors):
            param_names.append(f'x{i+1}')

        for i, param_name in enumerate(param_names):
            coef = float(results.params[i])
            std_err = float(results.bse[i])
            t_stat = float(results.tvalues[i])
            p_val = float(results.pvalues[i])
            ci_lower = float(conf_int[i, 0])
            ci_upper = float(conf_int[i, 1])
            if any(math.isnan(val) or math.isinf(val) for val in [coef, std_err, t_stat, p_val, ci_lower, ci_upper]):
                return f"Error: non-finite value in results for parameter {param_name}."
            output.append([param_name, coef, std_err, t_stat, p_val, ci_lower, ci_upper])

        # Add model statistics
        r_squared = float(results.rsquared)
        adj_r_squared = float(results.rsquared_adj)
        f_stat = float(results.fvalue)
        f_pval = float(results.f_pvalue)
        aic = float(results.aic)
        bic = float(results.bic)
        if any(math.isnan(val) or math.isinf(val) for val in [r_squared, adj_r_squared, f_stat, f_pval, aic, bic]):
            return "Error: non-finite value in model statistics."

        output.append(['r_squared', r_squared, '', '', '', '', ''])
        output.append(['adj_r_squared', adj_r_squared, '', '', '', '', ''])
        output.append(['f_statistic', f_stat, '', '', '', '', ''])
        output.append(['f_pvalue', f_pval, '', '', '', '', ''])
        output.append(['aic', aic, '', '', '', '', ''])
        output.append(['bic', bic, '', '', '', '', ''])
        return output
    except Exception as exc:
        return f"Error: {exc}"