Mastering Statistical Tests (Part II): Your Guide to Choosing the Right Test for Your Data

In Part I, I explored a variety of statistical tests categorized by criteria such as the number of independent variables (I focused exclusively on scenarios with a single independent variable) and dependent variables (where, again, I considered only one dependent variable), as well as the number of levels within the independent variable and the independence of these levels. Building on this foundation, Part II extends the analysis to scenarios involving more than two groups (levels) within the independent variable, and then to scenarios with more than one independent variable. Here, I delve deeper into specialized tests tailored for these more complex configurations, offering insights into their effective application.
11- One-Way Analysis of Variance (ANOVA)
The test is used to determine whether there are significant differences between the means of three or more independent groups (levels) of the independent variable. It helps to ascertain whether at least one group mean is different from the others, making it a crucial tool for comparing multiple datasets and identifying potential patterns in experimental research. Before providing a Python example, I would like to draw attention to four conditions that should be verified before performing the test:
- Normality: The dependent variable should be approximately normally distributed within each group. This can be checked using statistical tests like the Shapiro-Wilk test or by visually inspecting Q-Q plots.
- Homoscedasticity (i.e., equality of variance): The variances within each group should be approximately equal. This assumption can be tested using Levene's test.
- Random Sampling: The data should be collected through a random sampling process to ensure that the sample is representative of the population.
- Dependent Variable Measurement: The dependent variable should be measured at an interval or ratio scale (i.e., it should be a continuous variable).
Moderate violations of the first two assumptions (normality and homoscedasticity) are generally not critical: the F test is quite robust, particularly with larger sample sizes, and even more so when the sample sizes across all factor levels are equal.
import scipy.stats as stats
import pandas as pd
# Sample data: weight loss (in kg) for three different diets
data = {
    'Diet_A': [2.1, 2.5, 3.0, 3.2, 2.7],
    'Diet_B': [2.0, 2.2, 2.4, 2.8, 2.5],
    'Diet_C': [1.8, 2.0, 2.1, 2.2, 1.9]
}
# Convert the data to a DataFrame
df = pd.DataFrame(data)
# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(df['Diet_A'], df['Diet_B'], df['Diet_C'])
print(f'F-statistic: {f_statistic:.2f}')
print(f'p-value: {p_value:.4f}')
# Interpretation
if p_value < 0.05:
    print("There is a statistically significant difference between the groups.")
else:
    print("There is no statistically significant difference between the groups.")
Now, if the p-value from the one-way ANOVA test suggests rejecting the null hypothesis, it indicates that there are statistically significant differences between the means of at least two of the groups. However, it does not specify which groups are different from each other. To identify the specific groups that differ, you need to conduct post-hoc tests such as Tukey's honest significant difference (HSD) test.
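As an illustration, below is a minimal sketch of Tukey's HSD applied to the diet data from the example above (the long-format reshaping and the column names diet and weight_loss are assumptions added for illustration):
from statsmodels.stats.multicomp import pairwise_tukeyhsd
# Reshape the diet DataFrame from wide to long format: one row per observation
long_df = df.melt(var_name='diet', value_name='weight_loss')
# Tukey's HSD compares every pair of diets while controlling the family-wise error rate
tukey = pairwise_tukeyhsd(endog=long_df['weight_loss'], groups=long_df['diet'], alpha=0.05)
print(tukey.summary())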
12- The Kruskal-Wallis test
The test is a non-parametric method used to determine if there are statistically significant differences between the medians of three or more independent groups of the independent variable. It's an alternative to the one-way ANOVA when the assumptions of normality and homoscedasticity are not met.
import numpy as np
import scipy.stats as stats
import pandas as pd
# Generate synthetic data
np.random.seed(42)
# Normally distributed data for Low_Revenue
low_revenue = np.random.normal(loc=1.5, scale=0.5, size=50)
# Skewed data for Medium_Revenue
medium_revenue = np.random.exponential(scale=1.5, size=50)
# Normally distributed data for High_Revenue but with higher variance
high_revenue = np.random.normal(loc=1.5, scale=1.5, size=50)
# Create DataFrame
data = {
    'Low_Revenue': low_revenue,
    'Medium_Revenue': medium_revenue,
    'High_Revenue': high_revenue
}
df = pd.DataFrame(data)
# Checking Normality with Shapiro-Wilk Test
for column in df.columns:
    stat, p = stats.shapiro(df[column])
    print(f'{column} - Shapiro-Wilk Test Statistics={stat:.3f}, p={p:.3f}')
    if p > 0.05:
        print(f'{column} follows a normal distribution\n')
    else:
        print(f'{column} does not follow a normal distribution\n')
# Checking Homogeneity of Variances with Levene's Test
stat, p = stats.levene(df['Low_Revenue'], df['Medium_Revenue'], df['High_Revenue'])
print(f"Levene's Test - Statistics={stat:.3f}, p={p:.3f}")
if p > 0.05:
    print('The variances are equal\n')
else:
    print('The variances are not equal\n')
# Perform Kruskal-Wallis H-test
stat, p_value = stats.kruskal(df['Low_Revenue'], df['Medium_Revenue'], df['High_Revenue'])
print(f'Kruskal-Wallis statistic: {stat:.2f}')
print(f'p-value: {p_value:.4f}')
# Interpretation
if p_value < 0.05:
    print("There is a statistically significant difference between the groups.")
else:
    print("There is no statistically significant difference between the groups.")
13- Chi-square test of independence (3 or more groups)
The Chi-square test of independence serves as a counterpart to one-way ANOVA when analyzing categorical data. While ANOVA examines differences in the means of a continuous variable across groups, the Chi-square test assesses whether the frequencies of one categorical variable are associated with the categories of another.
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency
# Example data: DataFrame with grades ('Pass' or 'Fail') in three subjects
data = {
    'Subject 1': np.random.choice(['Pass', 'Fail'], size=10),
    'Subject 2': np.random.choice(['Pass', 'Fail'], size=10),
    'Subject 3': np.random.choice(['Pass', 'Fail'], size=10)
}
# Create Pandas DataFrame
df = pd.DataFrame(data)
# Print the DataFrame
print("DataFrame with Grades:")
print(df)
# Reshape to long format so each row is one (subject, grade) observation
long_df = df.melt(var_name='Subject', value_name='Grade')
# Build the contingency table of subject type vs. final grade
observed = pd.crosstab(long_df['Subject'], long_df['Grade'])
# Perform chi-square test of independence
chi2, p, dof, expected = chi2_contingency(observed)
# Output results
print("nChi-square Test Results:")
print(f"Chi-square statistic: {chi2}")
print(f"P-value: {p}")
print(f"Degrees of freedom: {dof}")
print("nExpected frequencies (based on independence assumption):")
print(expected)
# Decision based on p-value
alpha = 0.05
if p < alpha:
    print("\nReject the null hypothesis: There is a significant relationship between the variables.")
else:
    print("\nDo not reject the null hypothesis: There is no significant relationship between the variables.")
Remember that our variables are the subject type and student's final grade (Pass/Fail). Both variables are categorical.
By now, you might be wondering what happens if we have three or more levels in the independent variable, and they are dependent. Well, this is exactly what we shall discuss in the next three tests.
14- One-way repeated measures ANOVA
If you can think of the one-way repeated measures ANOVA as an extension of the paired t-test, congratulations! You're on the right track. For example, this test can be applied in a study assessing the reaction times (the dependent variable) of participants under four different levels of caffeine consumption (no caffeine, low caffeine, moderate caffeine, high caffeine). Before performing this test, ensure that the assumptions of normality and sphericity (equal variances of the differences between all pairs of levels, commonly checked with Mauchly's test) are satisfied.
import pandas as pd
import numpy as np
from statsmodels.stats.anova import AnovaRM
# Simulate data
np.random.seed(42)
# Number of participants
n = 30
# Caffeine levels
caffeine_levels = ['no_caffeine', 'low_caffeine', 'moderate_caffeine', 'high_caffeine']
# Simulate reaction times for each caffeine level
reaction_times = {
    'no_caffeine': np.random.normal(loc=250, scale=15, size=n),
    'low_caffeine': np.random.normal(loc=240, scale=15, size=n),
    'moderate_caffeine': np.random.normal(loc=230, scale=15, size=n),
    'high_caffeine': np.random.normal(loc=220, scale=15, size=n)
}
# Create a list to store the data
data = []
# Populate the data list
for participant in range(1, n+1):
    for level in caffeine_levels:
        data.append({
            'participant': participant,
            'caffeine_level': level,
            'reaction_time': reaction_times[level][participant-1]
        })
# Create DataFrame
df = pd.DataFrame(data)
# Perform one-way repeated measures ANOVA
aovrm = AnovaRM(df, 'reaction_time', 'participant', within=['caffeine_level'])
res = aovrm.fit()
# Print the results
print(res)
# Extract the p-value using iloc
p_value = res.anova_table['Pr > F'].iloc[0]
# Set significance level
alpha = 0.05
# Print the decision
if p_value < alpha:
    print(f"Reject the null hypothesis (p-value = {p_value:.4f})")
else:
    print(f"Fail to reject the null hypothesis (p-value = {p_value:.4f})")
15- Friedman test
The Friedman test is similar to the repeated measures one-way ANOVA, but it is used when the dependent variable is not interval and not normally distributed, but rather ordinal. This makes the Friedman test suitable for analyzing ranked data or ordinal scales where traditional parametric tests are not appropriate. For instance, it can be used to compare the rankings of different treatments or conditions across multiple subjects in a study, providing a non-parametric alternative to repeated measures ANOVA.
import pandas as pd
import numpy as np
from scipy.stats import friedmanchisquare
# Simulate ordinal data
np.random.seed(42)
# Number of participants
n = 30
# Create ordinal reaction times for each caffeine level
reaction_times = {
    'no_caffeine': np.random.randint(1, 5, size=n),
    'low_caffeine': np.random.randint(1, 5, size=n),
    'moderate_caffeine': np.random.randint(1, 5, size=n),
    'high_caffeine': np.random.randint(1, 5, size=n)
}
# Create a DataFrame
df = pd.DataFrame(reaction_times)
df['participant'] = np.arange(1, n+1)
# Perform the Friedman test
stat, p_value = friedmanchisquare(df['no_caffeine'], df['low_caffeine'], df['moderate_caffeine'], df['high_caffeine'])
# Print the results
print(f"Friedman test statistic: {stat:.4f}")
print(f"P-value: {p_value:.4f}")
# Set significance level
alpha = 0.05
# Print the decision
if p_value < alpha:
    print(f"Reject the null hypothesis (p-value = {p_value:.4f})")
else:
    print(f"Fail to reject the null hypothesis (p-value = {p_value:.4f})")
16- Cochran's Q test
The test is used to determine if there are differences in a dichotomous dependent variable across three or more related groups. It is similar to the one-way repeated measures ANOVA but is designed for dichotomous data rather than continuous data, making it an extension of McNemar's test. This test is particularly useful for analyzing longitudinal study designs and is often employed when participants have undergone multiple trials or treatments. By comparing the proportions of a binary outcome (e.g., success/failure, yes/no) across different conditions, Cochran's Q test helps to identify whether there are statistically significant differences in the outcomes among the related groups.
import pandas as pd
import numpy as np
from statsmodels.stats.contingency_tables import cochrans_q
# Simulate binary outcome data
np.random.seed(42)
n = 30
# Create binary outcomes for each condition
data = {
    'no_caffeine': np.random.randint(0, 2, size=n),
    'low_caffeine': np.random.randint(0, 2, size=n),
    'moderate_caffeine': np.random.randint(0, 2, size=n),
    'high_caffeine': np.random.randint(0, 2, size=n)
}
# Create DataFrame
df = pd.DataFrame(data)
# Perform Cochran's Q test
result = cochrans_q(df)
# Extract the test statistic and p-value
stat = result.statistic
p_value = result.pvalue
# Print the results
print("Cochran's Q Test:")
print(f"Cochran's Q statistic: {stat:.4f}")
print(f"P-value: {p_value:.4f}")
# Set significance level
alpha = 0.05
# Print the decision
if p_value < alpha:
    print(f"Reject the null hypothesis (p-value = {p_value:.4f})")
else:
    print(f"Fail to reject the null hypothesis (p-value = {p_value:.4f})")
However, if you end up rejecting the null hypothesis, you will probably want to follow up your Cochran's Q test with a post hoc analysis, such as pairwise McNemar tests with a correction for multiple comparisons.
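As a rough sketch of such a follow-up, the snippet below runs pairwise McNemar tests on the df from the example above with a Bonferroni correction (the use of exact tests and of Bonferroni here is an assumption made purely for illustration):
from itertools import combinations
from statsmodels.stats.contingency_tables import mcnemar
# All pairs of caffeine conditions
pairs = list(combinations(df.columns, 2))
for a, b in pairs:
    # 2x2 table of paired binary outcomes for the two conditions
    table = pd.crosstab(df[a], df[b])
    p = mcnemar(table, exact=True).pvalue
    p_adj = min(p * len(pairs), 1.0)  # Bonferroni adjustment
    print(f"{a} vs {b}: adjusted p-value = {p_adj:.4f}")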
Now, what happens if, instead of one independent variable, we have two or more, each with independent levels?

17- Factorial ANOVA (N-way ANOVA) test
The test is applied when there are several categorical independent variables (factors), with or without their interactions, and a single normally distributed continuous dependent variable. This statistical technique enables the examination of both the independent variables' main effects and their interactions on the dependent variable's variance. It assumes that residuals follow a normal distribution, variances are homogeneous across groups, and observations are independent.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
# Hypothetical data
data = {
    'Gender': ['Male', 'Male', 'Male', 'Male', 'Male', 'Male',
               'Female', 'Female', 'Female', 'Female', 'Female', 'Female'],
    'Education_Level': ["High School", "High School", "Bachelor's Degree", "Bachelor's Degree", "Master's Degree", "Master's Degree",
                        "High School", "High School", "Bachelor's Degree", "Bachelor's Degree", "Master's Degree", "Master's Degree"],
    'Salary': [40000, 38000, 60000, 62000, 80000, 78000,
               35000, 36000, 58000, 59000, 75000, 73000]
}
# Create DataFrame
df = pd.DataFrame(data)
# Convert categorical variables to categorical type
df['Gender'] = pd.Categorical(df['Gender'])
df['Education_Level'] = pd.Categorical(df['Education_Level'])
# Fit the ANOVA model
formula = 'Salary ~ C(Gender) + C(Education_Level) + C(Gender):C(Education_Level)'
model = ols(formula, data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
# Print ANOVA table
print("ANOVA Table:")
print(anova_table)
print()
# Extract p-values
p_gender = anova_table.loc['C(Gender)', 'PR(>F)']
p_education = anova_table.loc['C(Education_Level)', 'PR(>F)']
p_interaction = anova_table.loc['C(Gender):C(Education_Level)', 'PR(>F)']
# Set significance level
alpha = 0.05
# Decision based on p-values
if p_gender < alpha:
    print(f"Reject the null hypothesis for Gender (p-value = {p_gender:.4f})")
else:
    print(f"Fail to reject the null hypothesis for Gender (p-value = {p_gender:.4f})")
if p_education < alpha:
    print(f"Reject the null hypothesis for Education Level (p-value = {p_education:.4f})")
else:
    print(f"Fail to reject the null hypothesis for Education Level (p-value = {p_education:.4f})")
if p_interaction < alpha:
    print(f"Reject the null hypothesis for Interaction (p-value = {p_interaction:.4f})")
else:
    print(f"Fail to reject the null hypothesis for Interaction (p-value = {p_interaction:.4f})")
18- "Non-parametric" N-way ANOVA test
Alright! If you are thinking that, when the normality and homoscedasticity assumptions are violated and the dependent variable is ordinal, we simply need an extension of the Kruskal-Wallis test, I'm sorry to tell you that such an extension doesn't exist in statistics. Instead, we have two approximate approaches (I recommend reading this nice blog by Charles Holbert, although the code there is given in R):
- Transforming the data into ranks and then performing a parametric two-way ANOVA on these ranks. This rank transformation technique helps determine whether there are significant differences in ranks across groups. Additionally, post-hoc comparisons on the data ranks can identify which specific groups differ from each other (a minimal sketch of this approach is given right after this list).
- Using ordinal logistic regression, which offers general contrasts on the log odds ratio scale (Harrell 2015). This method is particularly useful for analyzing ordinal data and provides a robust framework for understanding the relationships between variables without relying on the stringent assumptions of parametric ANOVA.
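For the first approach, here is a minimal sketch on hypothetical data (the factors gender and plan and the satisfaction scores are invented purely for illustration):
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.formula.api import ols
# Hypothetical ordinal satisfaction scores (1 to 5) under two categorical factors
np.random.seed(42)
df_rt = pd.DataFrame({
    'gender': np.repeat(['Male', 'Female'], 20),
    'plan': np.tile(np.repeat(['Basic', 'Premium'], 10), 2),
    'satisfaction': np.random.randint(1, 6, size=40)
})
# Rank-transform the dependent variable over the whole sample
df_rt['rank_score'] = df_rt['satisfaction'].rank()
# Ordinary two-way ANOVA on the ranks (main effects and interaction)
model = ols('rank_score ~ C(gender) * C(plan)', data=df_rt).fit()
print(sm.stats.anova_lm(model, typ=2))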
For the second method (ordinal logistic regression), here is a Python example:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.miscmodels.ordinal_model import OrderedModel
# Create a small dataset
data = {
    'Study_Hours': [2, 3, 5, 7, 8, 10, 12, 15, 16, 18],
    'Attendance': [80, 85, 70, 75, 90, 95, 60, 88, 92, 96],
    'Grade': [1, 2, 2, 3, 3, 4, 1, 4, 4, 5]  # Ordinal grade (1 to 5)
}
df = pd.DataFrame(data)
# Check the first few rows of the DataFrame
print(df.head())
# Define X (independent variables) and y (dependent variable)
X = df[['Study_Hours', 'Attendance']]
y = df['Grade']
# Fit ordered logistic regression model (do not add constant)
model = OrderedModel(y, X, distr='logit')
result = model.fit(method='bfgs')
# Print summary of the regression
print(result.summary())
# Hypothesis testing: Check p-values to decide whether to accept or reject null hypothesis
alpha = 0.05 # Significance level
# Check p-values of the predictor coefficients only (exclude thresholds)
predictor_pvalues = result.pvalues[:X.shape[1]]
reject_null = False
for i, p_value in enumerate(predictor_pvalues):
    if p_value < alpha:
        print(f"Reject the null hypothesis for predictor '{X.columns[i]}': p-value = {p_value:.4f}")
        reject_null = True
    else:
        print(f"Fail to reject the null hypothesis for predictor '{X.columns[i]}': p-value = {p_value:.4f}")
# Overall conclusion based on the individual predictor tests
if reject_null:
    print("At least one predictor is significant, hence we reject the null hypothesis of no relationship.")
else:
    print("Fail to reject the null hypothesis: there is no significant relationship between the predictors and grades.")
19- Factorial logistic regression method
Factorial logistic regression is an extension of logistic regression that allows for the analysis of interactions between two or more categorical independent variables (factors) on a binary dependent variable. This method helps in understanding not only the main effects of each factor but also how the combination of factors influences the probability of the outcome, providing a more comprehensive view of the relationships in the data.
import statsmodels.api as sm
import pandas as pd
import numpy as np
# Generating synthetic data: two binary factors and a binary outcome
np.random.seed(0)
n = 100  # Number of samples
x1 = np.random.binomial(1, 0.5, size=n)  # Factor 1 (e.g., treatment: 0 = no, 1 = yes)
x2 = np.random.binomial(1, 0.5, size=n)  # Factor 2 (e.g., group: 0 or 1)
y = np.random.binomial(1, 0.5, size=n)   # Binary outcome variable
# Creating a DataFrame with the main effects and their interaction
df = pd.DataFrame({'X1': x1, 'X2': x2, 'X1_X2': x1 * x2, 'Outcome': y})
# Define predictors (main effects plus interaction) and target variable
predictors = ['X1', 'X2', 'X1_X2']
target = 'Outcome'
# Add an intercept and fit the logistic regression model
X_model = sm.add_constant(df[predictors])
model = sm.Logit(df[target], X_model)
result = model.fit()
# Print model summary
print(result.summary())
# Decision on null hypothesis based on p-values
alpha = 0.05 # significance level
print("nNull Hypothesis Decision:")
for idx, p_value in enumerate(result.pvalues):
if p_value < alpha:
print(f"Reject null hypothesis for predictor '{predictors[idx]}'.")
else:
print(f"Fail to reject null hypothesis for predictor '{predictors[idx]}'.")
where the null hypotheses for the intercept and the model terms (X1, X2, and their interaction X1_X2) are:
- The intercept is zero.
- The corresponding term has no effect on the log odds (or equivalently, the probability) of the outcome.
Summary
In this article, I explore three primary groups of common statistical tests. The first group includes the one-way ANOVA, Kruskal-Wallis, and Chi-square tests, which are suitable for scenarios involving a single independent variable with multiple levels, depending on whether the dependent variable is interval, ordinal, or categorical, respectively. The second group of tests (one-way repeated measures ANOVA, Friedman, and Cochran's Q) relaxes the independence assumption among these levels. Finally, I delve into advanced techniques (N-way ANOVA, ordinal logistic regression, and factorial logistic regression) for cases involving multiple independent variables, each with distinct levels.