QUANTILE_REGRESSION
Overview
The QUANTILE_REGRESSION function fits a quantile regression model to estimate conditional quantiles of a response variable given predictor variables. Unlike ordinary least squares (OLS) regression, which estimates the conditional mean, quantile regression estimates specific percentiles (such as the median or any other quantile) of the response distribution. This makes it particularly valuable for understanding relationships across the entire distribution of outcomes, not just the average.
Quantile regression was introduced by Roger Koenker in 1978 and has become a standard tool in econometrics, ecology, and medical research. The method is especially useful when the relationship between variables differs at different points of the outcome distribution, or when dealing with heteroscedastic data where variance changes across observations. For a comprehensive treatment, see Koenker’s foundational text Quantile Regression (Cambridge University Press, 2005).
This implementation uses the statsmodels library’s QuantReg class, which solves the quantile regression problem using iterative reweighted least squares. For details, see the statsmodels QuantReg documentation.
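As a minimal sketch of the underlying statsmodels call (the data here is illustrative, and the exact fitted values depend on the solver; this is not the wrapper function documented below):

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

# Illustrative data: y roughly linear in x
y = np.array([2.3, 4.1, 5.8, 7.2, 9.5, 11.1])
X = np.column_stack([np.ones(len(y)), np.arange(1, 7)])  # intercept column + predictor

model = QuantReg(y, X)
result = model.fit(q=0.5)  # median regression
print(result.params)       # [intercept, slope]
print(result.prsquared)    # pseudo R-squared
```

Note that `QuantReg` does not add an intercept automatically, which is why the wrapper below prepends a column of ones when `fit_intercept` is true.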
The quantile regression estimator minimizes an asymmetric loss function called the check function (or pinball loss):
\hat{\beta}_\tau = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau(y_i - x_i'\beta)
where the loss function \rho_\tau(u) is defined as:
\rho_\tau(u) = u \cdot (\tau - \mathbf{1}_{u < 0}) = \begin{cases} \tau \cdot u & \text{if } u \geq 0 \\ (\tau - 1) \cdot u & \text{if } u < 0 \end{cases}
For \tau = 0.5, the check function reduces to the absolute value (up to a constant), making median regression equivalent to Least Absolute Deviations (LAD) regression. Key advantages of quantile regression include robustness to outliers in the response variable and the ability to characterize heterogeneous effects across the outcome distribution.
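The check function itself is a one-liner. A quick pure-Python sketch (values chosen purely for illustration):

```python
def pinball(u, tau):
    """Check (pinball) loss: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return tau * u if u >= 0 else (tau - 1) * u

# Asymmetric weighting: at tau = 0.25, a negative residual costs three times
# as much as a positive residual of the same magnitude.
print(pinball(3.0, 0.25))   # 0.75
print(pinball(-3.0, 0.25))  # 2.25

# At tau = 0.5 the loss is half the absolute value, so minimizing it
# is equivalent to least absolute deviations (median regression).
print(pinball(-2.0, 0.5))   # 1.0
```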
The function returns coefficients, standard errors, t-statistics, p-values, confidence intervals, and a pseudo R-squared measure of fit. The pseudo R-squared compares the minimized loss under the full model to an intercept-only model. For more theoretical background, see Quantile regression on Wikipedia.
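The pseudo R-squared can be sketched directly from that definition. The fitted values below are hypothetical; the intercept-only model's prediction at quantile τ is the τ-th sample quantile of y (here, the median):

```python
def check_loss(residuals, tau):
    """Sum of pinball losses over a list of residuals."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)

tau = 0.5
y = [1.0, 2.0, 4.0]
fitted_full = [1.1, 2.0, 3.8]  # hypothetical full-model fitted values
median_y = 2.0                 # intercept-only fit at tau = 0.5

loss_full = check_loss([yi - fi for yi, fi in zip(y, fitted_full)], tau)
loss_restricted = check_loss([yi - median_y for yi in y], tau)
pseudo_r2 = 1.0 - loss_full / loss_restricted
print(pseudo_r2)  # ≈ 0.9
```

A value near 1 means the full model's quantile loss is far below the intercept-only baseline; unlike OLS R-squared, it compares losses at the chosen quantile rather than explained variance.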
This example function is provided as-is without any representation of accuracy.
Excel Usage
=QUANTILE_REGRESSION(y, x, quantile, fit_intercept)
- `y` (list[list], required): Dependent variable (response) as a column vector of numeric values.
- `x` (list[list], required): Independent variables (predictors) as a matrix. Each column is a predictor; rows must match `y`.
- `quantile` (float, optional, default: 0.5): Quantile level to estimate, between 0 and 1 (exclusive).
- `fit_intercept` (bool, optional, default: true): If true, adds an intercept term to the model.
Returns (list[list]): 2D list with quantile regression results, or error string.
Examples
Example 1: Demo case 1
Inputs:
| y | x |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
Excel formula:
=QUANTILE_REGRESSION({1;2;3;4;5}, {1;2;3;4;5})
Expected output:
| parameter | coefficient | std_error | t_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| intercept | 0 | | | | | |
| x1 | 1 | | | | | |
| pseudo_r_squared | 1 | | | | | |
Example 2: Demo case 2
Inputs:
| y | x | quantile |
|---|---|---|
| 2.3 | 1 | 0.25 |
| 4.1 | 2 | |
| 5.8 | 3 | |
| 7.2 | 4 | |
| 9.5 | 5 | |
| 11.1 | 6 |
Excel formula:
=QUANTILE_REGRESSION({2.3;4.1;5.8;7.2;9.5;11.1}, {1;2;3;4;5;6}, 0.25)
Expected output:
| parameter | coefficient | std_error | t_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| intercept | 0.55 | | | | | |
| x1 | 1.75 | | | | | |
| pseudo_r_squared | 0.94243 | | | | | |
Example 3: Demo case 3
Inputs:
| y | x | fit_intercept |
|---|---|---|
| 1.5 | 1 | false |
| 3.2 | 2 | |
| 4.8 | 3 | |
| 6.1 | 4 | |
| 7.9 | 5 |
Excel formula:
=QUANTILE_REGRESSION({1.5;3.2;4.8;6.1;7.9}, {1;2;3;4;5}, FALSE)
Expected output:
| parameter | coefficient | std_error | t_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| x1 | 1.58 | | | | | |
| pseudo_r_squared | 0.95699 | | | | | |
Example 4: Demo case 4
Inputs:
| y | x | quantile | fit_intercept |
|---|---|---|---|
| 3.1 | 1 | 0.75 | true |
| 5.8 | 2 | ||
| 8.2 | 3 | ||
| 10.5 | 4 | ||
| 13.1 | 5 | ||
| 15.8 | 6 |
Excel formula:
=QUANTILE_REGRESSION({3.1;5.8;8.2;10.5;13.1;15.8}, {1;2;3;4;5;6}, 0.75, TRUE)
Expected output:
| parameter | coefficient | std_error | t_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| intercept | 0.8 | | | | | |
| x1 | 2.5 | | | | | |
| pseudo_r_squared | 0.9766 | | | | | |
Python Code
import math
from statsmodels.regression.quantile_regression import QuantReg as sm_quantreg
def quantile_regression(y, x, quantile=0.5, fit_intercept=True):
"""
Fits a quantile regression model to estimate conditional quantiles of the response distribution.
See: https://www.statsmodels.org/stable/generated/statsmodels.regression.quantile_regression.QuantReg.html
This example function is provided as-is without any representation of accuracy.
Args:
y (list[list]): Dependent variable (response) as a column vector of numeric values.
x (list[list]): Independent variables (predictors) as a matrix. Each column is a predictor; rows must match y.
quantile (float, optional): Quantile level to estimate, between 0 and 1 (exclusive). Default is 0.5.
fit_intercept (bool, optional): If true, adds an intercept term to the model. Default is True.
Returns:
list[list]: 2D list with quantile regression results, or error string.
"""
def to2d(val):
return [[val]] if not isinstance(val, list) else val
def validate_2d_numeric(arr, name):
if not isinstance(arr, list):
return f"Error: {name} must be a 2D list."
if not arr:
return f"Error: {name} cannot be empty."
for i, row in enumerate(arr):
if not isinstance(row, list):
return f"Error: {name} must be a 2D list."
if not row:
return f"Error: {name} cannot contain empty rows."
for j, val in enumerate(row):
if not isinstance(val, (int, float)):
return f"Error: {name}[{i}][{j}] must be numeric."
if math.isnan(val) or math.isinf(val):
return f"Error: {name}[{i}][{j}] must be finite."
return None
try:
# Normalize inputs to 2D lists
y = to2d(y)
x = to2d(x)
# Validate inputs
err = validate_2d_numeric(y, "y")
if err:
return err
err = validate_2d_numeric(x, "x")
if err:
return err
# Validate quantile
if not isinstance(quantile, (int, float)):
return "Error: quantile must be numeric."
if math.isnan(quantile) or math.isinf(quantile):
return "Error: quantile must be finite."
if quantile <= 0 or quantile >= 1:
return "Error: quantile must be between 0 and 1 (exclusive)."
# Validate fit_intercept
if not isinstance(fit_intercept, bool):
return "Error: fit_intercept must be a boolean."
        # Extract y as a 1D list; y must be a single column
        if any(len(row) != 1 for row in y):
            return "Error: y must be a single column."
        y_flat = [row[0] for row in y]
        n_obs = len(y_flat)
# Check x dimensions
n_rows_x = len(x)
if n_rows_x != n_obs:
return f"Error: x must have same number of rows as y ({n_obs})."
# Extract x as 2D array
n_cols_x = len(x[0])
for row in x:
if len(row) != n_cols_x:
return "Error: x must have consistent column count."
# Add intercept column if requested
if fit_intercept:
x_data = [[1.0] + list(row) for row in x]
else:
x_data = [list(row) for row in x]
# Fit quantile regression model
model = sm_quantreg(y_flat, x_data)
result = model.fit(q=quantile)
# Extract results
params = result.params
bse = result.bse
tvalues = result.tvalues
pvalues = result.pvalues
conf_int = result.conf_int()
prsquared = result.prsquared
# Build output table
headers = ['parameter', 'coefficient', 'std_error', 't_statistic', 'p_value', 'ci_lower', 'ci_upper']
output = [headers]
# Add parameter rows
for i in range(len(params)):
if fit_intercept and i == 0:
param_name = 'intercept'
else:
predictor_idx = i if not fit_intercept else i - 1
param_name = f'x{predictor_idx + 1}'
# Convert values to float or "" if NaN
def safe_float(val):
fval = float(val)
if math.isnan(fval):
return ""
if math.isinf(fval):
return f"Error: infinite value in result for {param_name}."
return fval
coef = safe_float(params[i])
if isinstance(coef, str) and coef.startswith("Error:"):
return coef
se = safe_float(bse[i])
if isinstance(se, str) and se.startswith("Error:"):
return se
tval = safe_float(tvalues[i])
if isinstance(tval, str) and tval.startswith("Error:"):
return tval
pval = safe_float(pvalues[i])
if isinstance(pval, str) and pval.startswith("Error:"):
return pval
ci_low = safe_float(conf_int[i, 0])
if isinstance(ci_low, str) and ci_low.startswith("Error:"):
return ci_low
ci_high = safe_float(conf_int[i, 1])
if isinstance(ci_high, str) and ci_high.startswith("Error:"):
return ci_high
row = [param_name, coef, se, tval, pval, ci_low, ci_high]
output.append(row)
# Add pseudo R-squared row
prsq_val = float(prsquared)
if math.isinf(prsq_val):
return "Error: infinite pseudo R-squared."
if math.isnan(prsq_val):
prsq_val = ""
output.append(['pseudo_r_squared', prsq_val, '', '', '', '', ''])
return output
except Exception as exc:
return f"Error: {exc}"
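As a quick sanity check of Example 3 (median regression with no intercept), the same fit can be run against statsmodels directly; assuming the solver converges to its default tolerance, the slope should come out near the 1.58 reported above:

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

y = np.array([1.5, 3.2, 4.8, 6.1, 7.9])
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # single predictor, no intercept column

result = QuantReg(y, X).fit(q=0.5)
print(result.params[0])  # slope, approximately 1.58
```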