MANOVA_TEST

Overview

The MANOVA_TEST function performs Multivariate Analysis of Variance (MANOVA), a statistical procedure for comparing multivariate sample means across two or more groups. MANOVA extends univariate analysis of variance (ANOVA) to situations where there are multiple dependent variables, using the covariance between outcome variables when testing the statistical significance of mean differences.

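This implementation uses the statsmodels library’s MANOVA class, which is based on multivariate regression. For more details, see the statsmodels MANOVA documentation. The function tests the null hypothesis that the group mean vectors are equal across the specified dependent variables. Note that the integer group codes enter the model formula as a single numeric term (Group). With exactly two groups this is the usual comparison of mean vectors; with three or more groups the test is for a linear trend across the coded values rather than for arbitrary differences among groups, which is why the hypothesis degrees of freedom stay at 1 in the three-group example below.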
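The regression formulation can be made concrete with a small sketch of how a model formula might be assembled for any number of dependent variables. The column names DV1, DV2, ... and Group mirror those used in the Python code on this page; the helper name build_manova_formula is hypothetical and exists only for illustration.

```python
def build_manova_formula(n_vars, group_col="Group"):
    """Return a formula string with n_vars dependent variables on the left-hand side."""
    if n_vars < 1:
        raise ValueError("need at least one dependent variable")
    # Dependent variables are named DV1, DV2, ... as in the code on this page
    dv_names = [f"DV{j + 1}" for j in range(n_vars)]
    return " + ".join(dv_names) + f" ~ {group_col}"


print(build_manova_formula(2))  # DV1 + DV2 ~ Group
```

Writing the right-hand side as C(Group) would ask the formula parser to treat the codes as categorical levels instead of a numeric term; that variant is not what the code on this page does.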

The function returns four commonly used test statistics, each derived from the eigenvalues \lambda_p of the matrix A = S_{\text{model}} S_{\text{res}}^{-1}:

  • Wilks’ lambda: \Lambda_{\text{Wilks}} = \prod (1 + \lambda_p)^{-1} — measures the proportion of variance not explained by group differences
  • Pillai’s trace: \Lambda_{\text{Pillai}} = \sum \frac{\lambda_p}{1 + \lambda_p} — considered the most robust to violations of assumptions
  • Hotelling-Lawley trace: \Lambda_{\text{LH}} = \sum \lambda_p — powerful when group differences are concentrated in one dimension
  • Roy’s greatest root: \Lambda_{\text{Roy}} = \max(\lambda_p) — most powerful when the alternative hypothesis is true for a single linear combination
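As a rough illustration of the four definitions above, the sketch below evaluates them for a given list of eigenvalues. The function name manova_statistics is hypothetical, and the eigenvalue 3.375 is taken from Example 1 on this page, where only one nonzero eigenvalue is present.

```python
import math


def manova_statistics(eigenvalues):
    """Evaluate the four MANOVA statistics from the eigenvalues of A."""
    lam = [float(v) for v in eigenvalues]
    if not lam:
        raise ValueError("at least one eigenvalue is required")
    return {
        "Wilks": math.prod(1.0 / (1.0 + v) for v in lam),
        "Pillai": sum(v / (1.0 + v) for v in lam),
        "Hotelling-Lawley": sum(lam),
        "Roy": max(lam),
    }


stats = manova_statistics([3.375])
for name, value in stats.items():
    print(name, round(value, 5))
```

With a single eigenvalue the Hotelling-Lawley trace and Roy's greatest root coincide, which matches the repeated values in the example tables.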

Each test statistic is converted to an approximate F-statistic with associated degrees of freedom and p-value. The function compares the minimum p-value across all test statistics against the specified significance level (alpha) to determine whether to reject the null hypothesis.
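The decision rule described above can be sketched in a few lines. Both helper names are hypothetical. The F conversion shown here assumes the special case of a single nonzero eigenvalue, which is the situation in every example on this page; the general approximations used by statsmodels are more involved and are not reproduced here.

```python
def f_from_single_root(root, df_num, df_denom):
    """F statistic when A has exactly one nonzero eigenvalue (assumption of this sketch)."""
    return root * df_denom / df_num


def decide(p_values, alpha=0.05):
    """Compare the smallest p-value with alpha, as the function on this page does."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1 (exclusive)")
    return "reject_null" if min(p_values) < alpha else "fail_to_reject_null"


# Values from Example 1: eigenvalue 3.375 with 1 and 4 degrees of freedom
print(f_from_single_root(3.375, 1, 4))
print(decide([0.02131] * 4, alpha=0.05))
print(decide([0.02131] * 4, alpha=0.01))
```

The last two calls correspond to Examples 1 and 3, where the same p-values lead to different conclusions under different significance levels.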

MANOVA is particularly useful in experimental designs where multiple related outcomes are measured simultaneously, as it controls the family-wise error rate better than running separate ANOVAs. For background on multivariate analysis of variance, see the Wikipedia article on MANOVA.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=MANOVA_TEST(data, groups, alpha)
  • data (list[list], required): A matrix of dependent variables where rows are observations and columns are dependent variables.
  • groups (list[list], required): A column vector of group membership indicators (integer coded). Must have the same number of rows as data.
  • alpha (float, optional, default: 0.05): Significance level for hypothesis testing. Must be between 0 and 1 (exclusive).
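When calling the Python function directly rather than from Excel, the arguments need the same 2D shape that Excel ranges provide. The sketch below shows one way to prepare them; the helper names to_column and shapes_match are hypothetical and are not part of the function on this page.

```python
def to_column(values):
    """Wrap a flat sequence as a column vector (a list of one-element lists)."""
    return [[v] for v in values]


def shapes_match(data, groups):
    """Return True when data is rectangular and groups is a column vector of equal height."""
    n_vars = len(data[0]) if data else 0
    rectangular = n_vars > 0 and all(len(row) == n_vars for row in data)
    column = len(groups) == len(data) and all(len(row) == 1 for row in groups)
    return rectangular and column


data = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7]]
groups = to_column([1, 1, 1, 2, 2, 2])
print(groups)
print(shapes_match(data, groups))
```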

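Returns (list[list]): A 2D list containing a header row, one row per test statistic (Wilks, Pillai, Hotelling-Lawley, Roy), and a final row whose first cell is reject_null or fail_to_reject_null. On invalid input or a fitting failure, an error message string is returned instead.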
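Judging from the examples and the Python code on this page, the returned list consists of a header row, one row per statistic, and a final conclusion row. A caller might unpack it along the following lines; parse_manova_output is a hypothetical helper, and the sample result is copied from Example 1.

```python
def parse_manova_output(result):
    """Split the returned 2D list into per-statistic records and the conclusion."""
    if isinstance(result, str):
        # Error messages come back as a plain string
        return {"error": result}
    header, *stat_rows, last = result
    stats = {row[0]: dict(zip(header[1:], row[1:])) for row in stat_rows}
    return {"stats": stats, "conclusion": last[0]}


sample = [
    ["test_statistic", "statistic_name", "statistic_value", "f_value", "df_num", "df_denom", "p_value"],
    ["Wilks", "Wilks' lambda", 0.22857, 13.5, 1.0, 4.0, 0.02131],
    ["Pillai", "Pillai's trace", 0.77143, 13.5, 1.0, 4.0, 0.02131],
    ["Hotelling-Lawley", "Hotelling-Lawley trace", 3.375, 13.5, 1.0, 4.0, 0.02131],
    ["Roy", "Roy's greatest root", 3.375, 13.5, 1.0, 4.0, 0.02131],
    ["reject_null", "", "", "", "", "", ""],
]
parsed = parse_manova_output(sample)
print(parsed["conclusion"])
print(parsed["stats"]["Wilks"]["p_value"])
```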

Examples

Example 1: Two groups with two dependent variables

Inputs:

data (DV1)  data (DV2)  groups
1           2           1
2           3           1
3           4           1
4           5           2
5           6           2
6           7           2

Excel formula:

=MANOVA_TEST({1,2;2,3;3,4;4,5;5,6;6,7}, {1;1;1;2;2;2})

Expected output:

test_statistic    statistic_name          statistic_value  f_value  df_num  df_denom  p_value
Wilks             Wilks’ lambda           0.22857          13.5     1       4         0.02131
Pillai            Pillai’s trace          0.77143          13.5     1       4         0.02131
Hotelling-Lawley  Hotelling-Lawley trace  3.375            13.5     1       4         0.02131
Roy               Roy’s greatest root     3.375            13.5     1       4         0.02131
reject_null
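The numbers in this table can be checked by hand. In this data set the second variable is always the first plus one, so the sketch below assumes the analysis effectively reduces to a one-way comparison on the first variable alone; under that assumption the between-group and within-group sums of squares reproduce the tabulated values. The helper name one_way_sums_of_squares is hypothetical.

```python
def one_way_sums_of_squares(values, labels):
    """Return (between-group, within-group) sums of squares for one variable."""
    grand_mean = sum(values) / len(values)
    ss_between = 0.0
    ss_within = 0.0
    for g in sorted(set(labels)):
        members = [v for v, lab in zip(values, labels) if lab == g]
        group_mean = sum(members) / len(members)
        ss_between += len(members) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in members)
    return ss_between, ss_within


dv1 = [1, 2, 3, 4, 5, 6]
labels = [1, 1, 1, 2, 2, 2]
ssb, ssw = one_way_sums_of_squares(dv1, labels)

root = ssb / ssw              # single eigenvalue
wilks = 1.0 / (1.0 + root)    # Wilks' lambda
f_value = root * 4 / 1        # df_denom = 6 observations - 2 groups, df_num = 1
print(ssb, ssw, root, round(wilks, 5), f_value)
```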

Example 2: Three groups with two dependent variables

Inputs:

data (DV1)  data (DV2)  groups
1           2           1
2           3           1
3           4           1
5           6           2
6           7           2
7           8           2
9           10          3
10          11          3
11          12          3

Excel formula:

=MANOVA_TEST({1,2;2,3;3,4;5,6;6,7;7,8;9,10;10,11;11,12}, {1;1;1;2;2;2;3;3;3})

Expected output:

test_statistic    statistic_name          statistic_value  f_value  df_num  df_denom  p_value
Wilks             Wilks’ lambda           0.05882          112      1       7         0.00001
Pillai            Pillai’s trace          0.94118          112      1       7         0.00001
Hotelling-Lawley  Hotelling-Lawley trace  16               112      1       7         0.00001
Roy               Roy’s greatest root     16               112      1       7         0.00001
reject_null
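The numerator degrees of freedom of 1 with three groups is consistent with the group code entering the model as a single numeric predictor. The sketch below assumes exactly that: it regresses the first variable on the numeric codes (the second variable is the first plus one, so it adds no further information) and recovers the tabulated values. All variable names are illustrative.

```python
dv1 = [1, 2, 3, 5, 6, 7, 9, 10, 11]
codes = [1, 1, 1, 2, 2, 2, 3, 3, 3]
n = len(dv1)

mean_x = sum(codes) / n
mean_y = sum(dv1) / n
sxx = sum((x - mean_x) ** 2 for x in codes)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(codes, dv1))
ss_total = sum((y - mean_y) ** 2 for y in dv1)

ss_model = sxy ** 2 / sxx        # variation explained by the linear term
ss_res = ss_total - ss_model     # residual variation
root = ss_model / ss_res         # single eigenvalue
wilks = ss_res / ss_total        # equals 1 / (1 + root)
f_value = root * (n - 2) / 1     # df_denom = n - 2, df_num = 1
print(root, round(wilks, 5), f_value)
```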

Example 3: Custom alpha value with stricter significance level

Inputs:

data (DV1)  data (DV2)  groups  alpha
1           2           1       0.01
2           3           1
3           4           1
4           5           2
5           6           2
6           7           2

Excel formula:

=MANOVA_TEST({1,2;2,3;3,4;4,5;5,6;6,7}, {1;1;1;2;2;2}, 0.01)

Expected output:

test_statistic    statistic_name          statistic_value  f_value  df_num  df_denom  p_value
Wilks             Wilks’ lambda           0.22857          13.5     1       4         0.02131
Pillai            Pillai’s trace          0.77143          13.5     1       4         0.02131
Hotelling-Lawley  Hotelling-Lawley trace  3.375            13.5     1       4         0.02131
Roy               Roy’s greatest root     3.375            13.5     1       4         0.02131
fail_to_reject_null

Example 4: Two groups with three dependent variables

Inputs:

data (DV1)  data (DV2)  data (DV3)  groups
1           2           3           1
2           3           4           1
3           4           5           1
5           6           7           2
6           7           8           2
7           8           9           2

Excel formula:

=MANOVA_TEST({1,2,3;2,3,4;3,4,5;5,6,7;6,7,8;7,8,9}, {1;1;1;2;2;2})

Expected output:

test_statistic    statistic_name          statistic_value  f_value  df_num  df_denom  p_value
Wilks             Wilks’ lambda           0.14286          24       1       4         0.00805
Pillai            Pillai’s trace          0.85714          24       1       4         0.00805
Hotelling-Lawley  Hotelling-Lawley trace  6                24       1       4         0.00805
Roy               Roy’s greatest root     6                24       1       4         0.00805
reject_null

Python Code

import pandas as pd
from statsmodels.multivariate.manova import MANOVA as statsmodels_manova

def manova_test(data, groups, alpha=0.05):
    """
    Performs Multivariate Analysis of Variance (MANOVA) for multiple dependent variables.

    See: https://www.statsmodels.org/stable/generated/statsmodels.multivariate.manova.MANOVA.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): A matrix of dependent variables where rows are observations and columns are dependent variables.
        groups (list[list]): A column vector of group membership indicators (integer coded). Must have the same number of rows as data.
        alpha (float, optional): Significance level for hypothesis testing. Must be between 0 and 1 (exclusive). Default is 0.05.

    Returns:
        list[list]: 2D list with MANOVA results, or error message string.
    """
    def to2d(x):
        return [[x]] if not isinstance(x, list) else x

    def validate_float(val, name):
        # bool is a subclass of int, so reject it explicitly
        if isinstance(val, bool) or not isinstance(val, (int, float)):
            return f"Error: Invalid input: {name} must be a number."
        val = float(val)
        if val != val or val == float('inf') or val == float('-inf'):
            return f"Error: Invalid input: {name} must be finite."
        return val

    try:
        # Normalize inputs
        data = to2d(data)
        groups = to2d(groups)

        # Validate alpha
        alpha_val = validate_float(alpha, "alpha")
        if isinstance(alpha_val, str):
            return alpha_val
        if alpha_val <= 0 or alpha_val >= 1:
            return "Error: Invalid input: alpha must be between 0 and 1."

        # Validate data is a 2D list
        if not isinstance(data, list) or len(data) == 0:
            return "Error: Invalid input: data must be a non-empty 2D list."

        for i, row in enumerate(data):
            if not isinstance(row, list):
                return f"Error: Invalid input: data row {i} must be a list."
            if len(row) == 0:
                return f"Error: Invalid input: data row {i} must be non-empty."

        # Get dimensions
        n_obs = len(data)
        n_vars = len(data[0])

        # Validate all rows have same length
        for i, row in enumerate(data):
            if len(row) != n_vars:
                return "Error: Invalid input: all rows in data must have the same length."

        # Validate all elements in data are numeric
        data_flat = []
        for i, row in enumerate(data):
            row_vals = []
            for j, val in enumerate(row):
                validated = validate_float(val, f"data[{i}][{j}]")
                if isinstance(validated, str):
                    return validated
                row_vals.append(validated)
            data_flat.append(row_vals)

        # Validate groups is a column vector
        if len(groups) != n_obs:
            return f"Error: Invalid input: groups must have {n_obs} rows to match data."

        for i, row in enumerate(groups):
            if not isinstance(row, list):
                return f"Error: Invalid input: groups row {i} must be a list."
            if len(row) != 1:
                return "Error: Invalid input: groups must be a column vector (each row has 1 element)."

        # Extract and validate group values
        group_vals = []
        for i, row in enumerate(groups):
            validated = validate_float(row[0], f"groups[{i}][0]")
            if isinstance(validated, str):
                return validated
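            # Reject non-integer codes instead of silently truncating them
            if validated != int(validated):
                return f"Error: Invalid input: groups[{i}][0] must be an integer code."
            group_vals.append(int(validated))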

        # Check we have at least 2 groups
        unique_groups = list(set(group_vals))
        if len(unique_groups) < 2:
            return "Error: Invalid input: groups must contain at least 2 distinct values."

        # Check we have at least 1 dependent variable
        if n_vars < 1:
            return "Error: Invalid input: data must have at least 1 dependent variable."

        # Create DataFrame
        df_data = {}
        for j in range(n_vars):
            df_data[f"DV{j+1}"] = [data_flat[i][j] for i in range(n_obs)]
        df_data["Group"] = group_vals
        df = pd.DataFrame(df_data)

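        # Create formula. Group is an integer column, so the formula parser
        # treats it as a single numeric term, not as a categorical factor.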
        dv_names = [f"DV{j+1}" for j in range(n_vars)]
        formula = " + ".join(dv_names) + " ~ Group"

        # Fit MANOVA
        try:
            manova = statsmodels_manova.from_formula(formula, data=df)
            results = manova.mv_test()
        except Exception as exc:
            return f"Error: {str(exc)}"

        # Extract test results
        try:
            test_results = results.results["Group"]["stat"]
        except Exception as exc:
            return f"Error: Unable to extract MANOVA results: {str(exc)}"

        # Build output
        output = []

        # Header row
        output.append(["test_statistic", "statistic_name", "statistic_value", "f_value", "df_num", "df_denom", "p_value"])

        # Test statistics to extract
        test_stats = [
            ("Wilks' lambda", "Wilks"),
            ("Pillai's trace", "Pillai"),
            ("Hotelling-Lawley trace", "Hotelling-Lawley"),
            ("Roy's greatest root", "Roy")
        ]

        min_p_value = 1.0

        for stat_name, stat_key in test_stats:
            try:
                if stat_name in test_results.index:
                    row_data = test_results.loc[stat_name]
                    stat_value = float(row_data["Value"])
                    f_value = float(row_data["F Value"])
                    df_num = float(row_data["Num DF"])
                    df_denom = float(row_data["Den DF"])
                    p_value = float(row_data["Pr > F"])

                    # Track minimum p-value for conclusion
                    if p_value < min_p_value:
                        min_p_value = p_value

                    output.append([stat_key, stat_name, stat_value, f_value, df_num, df_denom, p_value])
            except Exception as exc:
                return f"Error: Unable to extract MANOVA statistic {stat_name}: {str(exc)}"

        # Add conclusion row
        if min_p_value < alpha_val:
            conclusion = "reject_null"
        else:
            conclusion = "fail_to_reject_null"

        output.append([conclusion, "", "", "", "", "", ""])

        return output
    except Exception as exc:
        return f"Error: {str(exc)}"
