QUANTILE_REGRESSION

Overview

The QUANTILE_REGRESSION function fits a quantile regression model to estimate conditional quantiles of a response variable given predictor variables. Unlike ordinary least squares (OLS) regression, which estimates the conditional mean, quantile regression targets a chosen quantile (such as the median, or the 10th or 90th percentile) of the response distribution. This makes it particularly valuable for understanding relationships across the entire distribution of outcomes, not just the average.

Quantile regression was introduced by Roger Koenker and Gilbert Bassett in 1978 and has become a standard tool in econometrics, ecology, and medical research. The method is especially useful when the relationship between variables differs at different points of the outcome distribution, or when dealing with heteroscedastic data where variance changes across observations. For a comprehensive treatment, see Koenker’s foundational text Quantile Regression (Cambridge University Press, 2005).

This implementation uses the statsmodels library’s QuantReg class, which solves the quantile regression problem using iterative reweighted least squares. For details, see the statsmodels QuantReg documentation.

The quantile regression estimator minimizes an asymmetric loss function called the check function (or pinball loss):

\hat{\beta}_\tau = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau(y_i - x_i'\beta)

where the loss function \rho_\tau(u) is defined as:

\rho_\tau(u) = u \cdot (\tau - \mathbf{1}_{u < 0}) = \begin{cases} \tau \cdot u & \text{if } u \geq 0 \\ (\tau - 1) \cdot u & \text{if } u < 0 \end{cases}

For \tau = 0.5, the check function reduces to the absolute value (up to a constant), making median regression equivalent to Least Absolute Deviations (LAD) regression. Key advantages of quantile regression include robustness to outliers in the response variable and the ability to characterize heterogeneous effects across the outcome distribution.
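The check function translates directly into code; a minimal sketch (the helper name is illustrative):

```python
def pinball_loss(u, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - 1.0) if u < 0 else u * tau

# For tau = 0.5 the loss is half the absolute residual, so minimizing it
# is equivalent to Least Absolute Deviations (median) regression.
print(pinball_loss(2.0, 0.5))   # 1.0
print(pinball_loss(-2.0, 0.5))  # 1.0

# Asymmetric case: tau = 0.9 penalizes under-prediction (u > 0) more heavily.
print(pinball_loss(2.0, 0.9))   # 1.8
print(pinball_loss(-2.0, 0.9))  # ~0.2
```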

The function returns coefficients, standard errors, t-statistics, p-values, confidence intervals, and a pseudo R-squared measure of fit. The pseudo R-squared compares the minimized loss under the full model to an intercept-only model. For more theoretical background, see Quantile regression on Wikipedia.
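The pseudo R-squared comparison can be sketched in a few lines. This illustrative helper (names are not part of the function above) exploits the fact that the constant minimizing the check loss, the sample tau-quantile, is always attained at one of the observed y values:

```python
def check_loss(residuals, tau):
    """Total check (pinball) loss over a list of residuals."""
    return sum(u * (tau - 1.0) if u < 0 else u * tau for u in residuals)

def pseudo_r_squared(y, fitted, tau):
    """1 - L_full / L_restricted, where the restricted model is intercept-only.

    Assumes y is not constant (otherwise the restricted loss is zero).
    """
    # Scan the data points for the best constant predictor.
    restricted = min(check_loss([yi - c for yi in y], tau) for c in y)
    full = check_loss([yi - f for yi, f in zip(y, fitted)], tau)
    return 1.0 - full / restricted

# A perfect fit drives the full-model loss to zero, giving a value of 1.
print(pseudo_r_squared([1, 2, 3, 4, 5], [1, 2, 3, 4, 5], 0.5))  # 1.0
```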

This example function is provided as-is without any representation of accuracy.

Excel Usage

=QUANTILE_REGRESSION(y, x, quantile, fit_intercept)
  • y (list[list], required): Dependent variable (response) as a column vector of numeric values.
  • x (list[list], required): Independent variables (predictors) as a matrix. Each column is a predictor; rows must match y.
  • quantile (float, optional, default: 0.5): Quantile level to estimate, between 0 and 1 (exclusive).
  • fit_intercept (bool, optional, default: true): If true, adds an intercept term to the model.

Returns (list[list]): 2D list with quantile regression results, or error string.

Examples

Example 1: Median regression with default settings

Inputs:

y x
1 1
2 2
3 3
4 4
5 5

Excel formula:

=QUANTILE_REGRESSION({1;2;3;4;5}, {1;2;3;4;5})

Expected output:

parameter coefficient std_error t_statistic p_value ci_lower ci_upper
intercept 0
x1 1
pseudo_r_squared 1

Example 2: Estimating the 0.25 quantile

Inputs:

y x quantile
2.3 1 0.25
4.1 2
5.8 3
7.2 4
9.5 5
11.1 6

Excel formula:

=QUANTILE_REGRESSION({2.3;4.1;5.8;7.2;9.5;11.1}, {1;2;3;4;5;6}, 0.25)

Expected output:

parameter coefficient std_error t_statistic p_value ci_lower ci_upper
intercept 0.55
x1 1.75
pseudo_r_squared 0.94243

Example 3: Median regression without an intercept

Inputs:

y x fit_intercept
1.5 1 false
3.2 2
4.8 3
6.1 4
7.9 5

Excel formula:

=QUANTILE_REGRESSION({1.5;3.2;4.8;6.1;7.9}, {1;2;3;4;5}, FALSE)

Expected output:

parameter coefficient std_error t_statistic p_value ci_lower ci_upper
x1 1.58
pseudo_r_squared 0.95699

Example 4: Estimating the 0.75 quantile with an intercept

Inputs:

y x quantile fit_intercept
3.1 1 0.75 true
5.8 2
8.2 3
10.5 4
13.1 5
15.8 6

Excel formula:

=QUANTILE_REGRESSION({3.1;5.8;8.2;10.5;13.1;15.8}, {1;2;3;4;5;6}, 0.75, TRUE)

Expected output:

parameter coefficient std_error t_statistic p_value ci_lower ci_upper
intercept 0.8
x1 2.5
pseudo_r_squared 0.9766

Python Code

import math
from statsmodels.regression.quantile_regression import QuantReg as sm_quantreg

def quantile_regression(y, x, quantile=0.5, fit_intercept=True):
    """
    Fits a quantile regression model to estimate conditional quantiles of the response distribution.

    See: https://www.statsmodels.org/stable/generated/statsmodels.regression.quantile_regression.QuantReg.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Dependent variable (response) as a column vector of numeric values.
        x (list[list]): Independent variables (predictors) as a matrix. Each column is a predictor; rows must match y.
        quantile (float, optional): Quantile level to estimate, between 0 and 1 (exclusive). Default is 0.5.
        fit_intercept (bool, optional): If true, adds an intercept term to the model. Default is True.

    Returns:
        list[list]: 2D list with quantile regression results, or error string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    def validate_2d_numeric(arr, name):
        if not isinstance(arr, list):
            return f"Error: {name} must be a 2D list."
        if not arr:
            return f"Error: {name} cannot be empty."
        for i, row in enumerate(arr):
            if not isinstance(row, list):
                return f"Error: {name} must be a 2D list."
            if not row:
                return f"Error: {name} cannot contain empty rows."
            for j, val in enumerate(row):
                if not isinstance(val, (int, float)):
                    return f"Error: {name}[{i}][{j}] must be numeric."
                if math.isnan(val) or math.isinf(val):
                    return f"Error: {name}[{i}][{j}] must be finite."
        return None

    try:
        # Normalize inputs to 2D lists
        y = to2d(y)
        x = to2d(x)

        # Validate inputs
        err = validate_2d_numeric(y, "y")
        if err:
            return err
        err = validate_2d_numeric(x, "x")
        if err:
            return err

        # Validate quantile
        if not isinstance(quantile, (int, float)):
            return "Error: quantile must be numeric."
        if math.isnan(quantile) or math.isinf(quantile):
            return "Error: quantile must be finite."
        if quantile <= 0 or quantile >= 1:
            return "Error: quantile must be between 0 and 1 (exclusive)."

        # Validate fit_intercept
        if not isinstance(fit_intercept, bool):
            return "Error: fit_intercept must be a boolean."

        # Extract y as 1D array
        y_flat = [row[0] for row in y]
        n_obs = len(y_flat)

        # Check x dimensions
        n_rows_x = len(x)
        if n_rows_x != n_obs:
            return f"Error: x must have same number of rows as y ({n_obs})."

        # Extract x as 2D array
        n_cols_x = len(x[0])
        for row in x:
            if len(row) != n_cols_x:
                return "Error: x must have consistent column count."

        # Add intercept column if requested
        if fit_intercept:
            x_data = [[1.0] + list(row) for row in x]
        else:
            x_data = [list(row) for row in x]

        # Fit quantile regression model
        model = sm_quantreg(y_flat, x_data)
        result = model.fit(q=quantile)

        # Extract results
        params = result.params
        bse = result.bse
        tvalues = result.tvalues
        pvalues = result.pvalues
        conf_int = result.conf_int()
        prsquared = result.prsquared

        # Build output table
        headers = ['parameter', 'coefficient', 'std_error', 't_statistic', 'p_value', 'ci_lower', 'ci_upper']
        output = [headers]

        # Add parameter rows
        for i in range(len(params)):
            if fit_intercept and i == 0:
                param_name = 'intercept'
            else:
                predictor_idx = i if not fit_intercept else i - 1
                param_name = f'x{predictor_idx + 1}'

            # Convert values to float; NaN maps to "" for display
            def safe_float(val):
                fval = float(val)
                if math.isnan(fval):
                    return ""
                if math.isinf(fval):
                    return f"Error: infinite value in result for {param_name}."
                return fval

            row = [param_name]
            for raw in (params[i], bse[i], tvalues[i], pvalues[i],
                        conf_int[i, 0], conf_int[i, 1]):
                val = safe_float(raw)
                if isinstance(val, str) and val.startswith("Error:"):
                    return val
                row.append(val)
            output.append(row)

        # Add pseudo R-squared row
        prsq_val = float(prsquared)
        if math.isinf(prsq_val):
            return "Error: infinite pseudo R-squared."
        if math.isnan(prsq_val):
            prsq_val = ""

        output.append(['pseudo_r_squared', prsq_val, '', '', '', '', ''])

        return output
    except Exception as exc:
        return f"Error: {exc}"

Online Calculator