power analysis

In this article we demonstrate how to calculate the sample size needed to achieve a desired power for experiments with an ordered categorical outcome. We assume the data will be analyzed with a proportional odds logistic regression model. We’ll use the R statistical computing environment and functions from the {Hmisc} package to implement the calculations.

In this article we demonstrate how to use simulation in R to estimate power and sample size for proposed logistic regression models that feature two binary predictors and their interaction.

Recall that logistic regression attempts to model the probability of an event conditional on the values of predictor variables. If we have a binary response, y, and two predictors, x and z, that interact, we specify the logistic regression model as follows:

The Analysis of Covariance, or ANCOVA, is a regression model that includes both categorical and numeric predictors, often just one of each. It is commonly used to analyze a follow-up numeric response after exposure to various treatments, controlling for a baseline measure of that same response. For example, given two subjects with the same baseline value of the study outcome, one in a treated group and the other in a control group, will the subjects have different follow-up outcomes on average?

When designing an experiment it’s good practice to estimate the number of subjects or observations we’ll need. If we recruit or collect too few, our analysis may be too uncertain or misleading. If we collect too many, we potentially waste time and expense on diminishing returns. The optimal sample size provides enough information to allow us to analyze our research questions with confidence. The traditional approach to sample size estimation is based on hypothesis tests.

The power of a test is the probability of correctly rejecting a null hypothesis. For example, let’s say we suspect a coin is not fair and lands heads 65% of the time.

It is well documented that post hoc power calculations are not useful (Althouse, 2020; Goodman & Berlin, 1994; Hoenig & Heisey, 2001). Also known as observed power or retrospective power, post hoc power purports to estimate the power of a test given an observed effect size. The idea is to show that a “non-significant” hypothesis test failed to achieve significance because it wasn’t powerful enough. This allows researchers to entertain the notion that their hypothesized effect may actually exist; they just needed to use a bigger sample size.

Logistic regression is a method for modeling binary data as a function of other variables. For example we might want to model the occurrence or non-occurrence of a disease given predictors such as age, race, weight, etc. The result is a model that returns a predicted probability of occurrence (or non-occurrence, depending on how we set up our data) given certain values of our predictors. We might also be able to interpret the coefficients in our model to summarize how a change in one predictor affects the odds of occurrence.

The paper Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors by Andrew Gelman and John Carlin introduces the idea of performing design calculations to help prevent researchers from being misled by statistically significant results in studies with small samples and/or noisy measurements.