Power and Sample Size Estimation for Logistic Regression

In this article we demonstrate how to use simulation in R to estimate power and sample size for proposed logistic regression models that feature two binary predictors and their interaction.

Recall that logistic regression attempts to model the probability of an event conditional on the values of predictor variables. If we have a binary response, y, and two predictors, x and z, that interact, we specify the logistic regression model as follows:

\[P(y = 1|x,z) = \text{logit}^{-1}(\beta_0 + \beta_1 x +\beta_2 z + \beta_3 xz)\]

The expression \(\text{logit}^{-1}\) refers to the inverse logit function, which transforms values on the real number line to lie in the interval (0,1). This ensures the model always returns a probability. The inverse logit function is

\[\text{logit}^{-1}(x) = \frac{e^x}{1 + e^x}\]

The R function plogis() implements the inverse logit. For example, below we transform the integers -4 through 4 into probabilities between 0 and 1.


x <- -4:4
plogis(x)


[1] 0.01798621 0.04742587 0.11920292 0.26894142 0.50000000 0.73105858 0.88079708
[8] 0.95257413 0.98201379

The expression \(\beta_0 + \beta_1 x + \beta_2 z + \beta_3 xz\) is the linear predictor. The coefficient \(\beta_3\) represents the interaction. This model says the effect of x depends on z, and vice versa. When planning a study with an interaction, we want to make sure we sample enough data to reliably estimate, or at least detect, the interaction. This leads to estimating power and sample size. If we desire high power, say 0.90, to correctly detect a real interaction, then we estimate the minimum sample size. If our sample size is already determined or limited in some way, then we estimate the power of our model. In both cases we need to hypothesize a realistic interaction effect that we think exists and would be important to detect and quantify.

To begin it may be helpful to think in terms of hypothesized probabilities. For example, what is the expected probability that y = 1 when x = 0 and z = 0? Perhaps we think it’s about 0.40. What about when x = 1 and z = 0? Maybe we think it increases to 0.85. Let’s list out some hypothesized probabilities for all four combinations of x and z:

  • x = 1 and z = 0: 0.85
  • x = 1 and z = 1: 0.50
  • x = 0 and z = 0: 0.40
  • x = 0 and z = 1: 0.30
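For reference, we can collect these assumed probabilities in a small 2 x 2 table. (This step is optional, and the object name p_hyp is our own choice.)


# store the assumed probabilities; rows are x = 0/1, columns are z = 0/1
p_hyp <- matrix(c(0.40, 0.85, 0.30, 0.50), nrow = 2,
                dimnames = list(x = 0:1, z = 0:1))
p_hyp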

Notice that x and z interact in determining probability. For example, when x = 1, the probability decreases from 0.85 to 0.50 as z goes from 0 to 1. But when x = 0, the probability only decreases from 0.40 to 0.30 as z goes from 0 to 1. We can make this interaction more precise by expressing the change in probabilities as odds ratios.

To convert probability to odds, we divide probability by 1 minus the probability. For example, 0.50 expressed as odds is 0.5/(1 - 0.5) = 1, and 0.85 expressed as odds is 0.85/(1 - 0.85) = 5.667. Therefore the odds ratio of 0.5 versus 0.85 is 1/5.667 = 0.176. This says the odds that y = 1 given x = 1 and z = 1 are about 82% lower than the odds that y = 1 given x = 1 and z = 0. (We get 82% by subtracting 0.176 from 1.)

We can make calculating odds ratios easy by creating a simple function to do it for us.


# function to calculate odds ratio
or <- function(p1, p2) (p1/(1 - p1))/(p2/(1 - p2))
or(0.5,0.85)


[1] 0.1764706

The odds ratio of 0.3 versus 0.4 (i.e., when x = 0) is about 0.643.


or(0.3, 0.4)


[1] 0.6428571

The ratio of these odds ratios quantifies the interaction.


or(0.5,0.85)/or(0.3, 0.4)


[1] 0.2745098

If there were no interaction, this ratio would be 1.

Returning to our probabilities, it’s important to emphasize these are assumed probabilities. Hopefully these would be based on a pilot study or subject matter expertise. The intent is to use these probabilities to simulate many data sets, analyze the data using our planned statistical analysis, and then determine how well our model is performing given the size of the data or how often it detects an interaction.

Now recall the linear predictor portion of our model from above:

\[\beta_0 + \beta_1 x +\beta_2 z + \beta_3 xz\]

If x = 0 and z = 0, we are left with \(\beta_0\). We expect the model to predict about 0.4, so it seems like \(\beta_0\) should be set to 0.4. But remember, the linear predictor gets transformed to the probability scale using the inverse logit. Therefore we need to transform 0.4 to the log-odds scale. The term “logit” refers to a log-odds transformation of probability:

\[\text{logit}(p) = \log\left( \frac{p}{1 - p} \right)\]

The qlogis() function in R implements the logit transformation.


b0 <- qlogis(0.4)
b0


[1] -0.4054651

So -0.4054651 should be our hypothesized \(\beta_0\) coefficient. Even though the model is used to estimate probabilities, the coefficients are specified on the logit, or log-odds, scale. Again, to undo the log-odds and get probability, we use the inverse logit:


plogis(b0)


[1] 0.4

When x = 1 and z = 0, our predicted probability will be determined by the sum of \(\beta_0\) and \(\beta_1\). Therefore \(\beta_1\) represents the change in log-odds when going from x = 0 to x = 1. Since we hypothesized the probability to be 0.85 when x = 1 and z = 0, we can solve for \(\beta_1\) as follows:


b1 <- qlogis(0.85) - b0
b1


[1] 2.140066

So our \(\beta_1\) coefficient is 2.140066. Again, this is on the log-odds scale. If we add the two coefficients on the log-odds scale and take the inverse logit, we get our hypothesized probability of 0.85:


plogis(b0 + b1)


[1] 0.85

Also notice that exponentiating the \(\beta_1\) coefficient returns the odds ratio of 0.85 versus 0.40.


exp(b1)


[1] 8.5


or(0.85, 0.4)


[1] 8.5

We can perform a similar calculation to find our \(\beta_2\) coefficient, which is the change in log-odds when z increases from 0 to 1. Since we hypothesized the probability to be 0.30 when x = 0 and z = 1, we can solve for \(\beta_2\) as follows:


b2 <- qlogis(0.3) - b0
b2


[1] -0.4418328

If we add the \(\beta_0\) and \(\beta_2\) coefficients on the log-odds scale and take the inverse logit, we’ll get our hypothesized probability of 0.30.


plogis(b0 + b2)


[1] 0.3

Exponentiating the \(\beta_2\) coefficient provides the odds ratio of 0.3 versus 0.4.


exp(b2)


[1] 0.6428571


or(0.3, 0.4)


[1] 0.6428571

And finally we come to the interaction coefficient, \(\beta_3\). This is what we add to the x coefficient, \(\beta_1\), when z = 1. When z = 0, the effect of x is \(\beta_1\). But when z = 1, the effect of x is \(\beta_1\) + \(\beta_3\). That’s the interaction of x and z. The effect of x depends on z. Obviously this means the effect of z depends on x as well. So we could also explain the interaction as what we add to the z coefficient, \(\beta_2\), when x = 1. The effect of z when x = 0 is \(\beta_2\), but the effect of z when x = 1 is \(\beta_2\) + \(\beta_3\).
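We can check this numerically using the hypothesized probabilities from above: the change in log-odds due to x differs depending on z, and the difference between those two changes is \(\beta_3\). (A quick sanity check, using only values already established above.)


# effect of x on the log-odds scale within each level of z
qlogis(0.85) - qlogis(0.40)  # effect of x when z = 0: this is b1
qlogis(0.50) - qlogis(0.30)  # effect of x when z = 1: this is b1 + b3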

Mathematically, this means that \(\beta_3\) is what we add to the sum of the other three coefficients when both x = 1 and z = 1. Recall our hypothesized probability above was 0.5 when x = 1 and z = 1. Therefore we take the log-odds of 0.5 and subtract the other three log-odds coefficients to get our \(\beta_3\) coefficient:


b3 <- qlogis(0.5) - b0 - b1 - b2
b3


[1] -1.292768

Summing all the log-odds coefficients and taking the inverse logit verifies the probability is 0.5.


plogis(b0 + b1 + b2 + b3)


[1] 0.5

Exponentiating the \(\beta_3\) coefficient returns the ratio of odds ratios that we calculated above.


exp(b3)


[1] 0.2745098


or(0.5,0.85)/or(0.3, 0.4)


[1] 0.2745098

Now that we have our coefficients, we can simulate our data. First we’ll simulate x and z and assume they’re independent of one another. In other words, knowing the value of x for a subject tells us nothing about the value of z for that same subject. This would be plausible if we’re planning an experiment where x is a treatment variable (control or treatment) and z is a demographic variable, such as gender (male or female), that is not related to whether someone is in the control or treatment group. Let’s simulate data for 200 subjects using the sample() function.


n <- 200
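# by default, sample() selects 0 and 1 with equal probability,
# so we're assuming roughly balanced group sizes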
x <- sample(0:1, size = n, replace = TRUE)
z <- sample(0:1, size = n, replace = TRUE)

Next we generate probabilities using the values of x and z and our log-odds coefficients. The formula in the plogis() function is the linear predictor we expressed above mathematically.


p <- plogis(b0 + 
              b1*(x == 1) +
              b2*(z == 1) +
              b3*(x == 1)*(z == 1))

We use the table() function to demonstrate that this simply generated our four hypothesized probabilities based on the values of x and z.


table(p)


p
 0.3  0.4  0.5 0.85 
  51   47   54   48 

Now we can use these probabilities to simulate our response, y. For this we’ll use the rbinom() function. This function draws random values from a binomial distribution with given size and prob parameters. With size = 1, this function simulates either 0 or 1, where the probability of getting a 1 is the probability supplied to the prob argument. The prob argument is vectorized, which means we can pass it our vector of probabilities and have the function draw a 0 or 1 based on each element of the vector.


y <- rbinom(n = n, size = 1, prob = p)

All of our data is now simulated. It’s not necessary, but we can put all our data into a data frame and look at the first few rows:


d <- data.frame(y, x, z)
head(d)


  y x z
1 0 0 0
2 1 1 0
3 0 1 1
4 0 0 1
5 1 0 1
6 0 0 1

Let’s fit a logistic regression model to this data and see how well it works. Below we model y as a function of x and z and their interaction. This is the precise model we used to simulate the data, so it should work well and detect the interaction. If it doesn’t, that suggests our sample size of 200 may be too small. To fit the model we use the glm() function with family = binomial. We specify the model as y ~ x + z + x:z, where x:z is the interaction. Calling summary() on the model object returns the estimated coefficients. Hopefully they’re somewhat close to the coefficients we used above to simulate the data.


m <- glm(y ~ x + z + x:z, data = d, family = binomial)
summary(m)



Call:
glm(formula = y ~ x + z + x:z, family = binomial, data = d)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.04256    0.29180  -0.146  0.88404    
x            2.75061    0.66385   4.143 3.42e-05 ***
z           -0.47874    0.41113  -1.164  0.24424    
x:z         -2.15520    0.77379  -2.785  0.00535 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 272.74  on 199  degrees of freedom
Residual deviance: 229.71  on 196  degrees of freedom
AIC: 237.71

Number of Fisher Scoring iterations: 5

Of interest is whether or not we detected an interaction. The usual way to determine this is to check whether the p-value for the interaction coefficient, x:z, is “significant” at a particular level, say 0.05 or 0.01. A small p-value provides evidence against the null hypothesis that the interaction coefficient is 0. This doesn’t tell us anything about how important the interaction is or the nature of the interaction. For that we would need to follow up the analysis with effect plots or pairwise comparisons. But if we’re able to consistently detect the interaction in our simulation, this can help us at least establish a minimum sample size for our study.
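For example, one simple follow-up after fitting the model is to compute the predicted probabilities for all four combinations of x and z. (A minimal sketch using base R’s predict(); dedicated effect-plot packages could also be used.)


# predicted probabilities for the four x/z combinations
nd <- expand.grid(x = 0:1, z = 0:1)
cbind(nd, prob = predict(m, newdata = nd, type = "response"))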

So far we have only simulated one data set and run one analysis. To see how often we detect the interaction, let’s simulate 1000 data sets, run 1000 analyses, and calculate the proportion of times the model detects an interaction at the significance level of 0.05. This can serve as an estimate of power. To do this, we’ll use the replicate() function. We wrap our previous code in curly braces and give it to the expr argument, which is short for “expression”. We also add one last line of code to check if the p-value for the interaction is less than 0.05. We do this by extracting the coefficients object from the summary() result and checking if the value in the 4th row and 4th column (i.e., the p-value for the interaction) is less than 0.05. This will produce a vector of 1000 logical values (TRUE/FALSE), which we save as “p_out”.


n <- 200
p_out <- replicate(n = 1000, expr = {
  x <- sample(0:1, size = n, replace = TRUE)
  z <- sample(0:1, size = n, replace = TRUE)
  p <- plogis(b0 + 
              b1*(x == 1) +
              b2*(z == 1) +
              b3*(x == 1)*(z == 1))
  y <- rbinom(n = n, size = 1, prob = p)
  d <- data.frame(y, x, z)
  m <- glm(y ~ x + z + x:z, data = d, family = binomial)
  summary(m)$coefficients[4,4] < 0.05
})

Since TRUE/FALSE values correspond to 1/0, we can take the mean of the p_out vector to get the proportion of TRUEs. This is our estimate of power.


mean(p_out)


[1] 0.523

Assuming the hypothesized probabilities we used to simulate this data are true, a sample size of 200 only has about 0.52 power to detect the interaction.
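Keep in mind that this power estimate is itself based on simulation, so it carries some Monte Carlo error. A rough standard error is given by the usual formula for a proportion, which here works out to about 0.016. Running more than 1000 simulations would tighten the estimate if needed.


# approximate Monte Carlo standard error of the power estimate
sqrt(mean(p_out) * (1 - mean(p_out)) / 1000)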

How many subjects should we sample to have 0.90 power? One way to figure this out is to simply try different sample sizes. We can make this a little easier by creating a function. Below we create a function called power_fn() with two arguments, n and sig.level. This is basically all our code above encapsulated in a function.


power_fn <- function(n, sig.level = 0.05){
  p_out <- replicate(n = 1000, expr = {
    x <- sample(0:1, size = n, replace = TRUE)
    z <- sample(0:1, size = n, replace = TRUE)
    p <- plogis(b0 + 
                  b1*(x == 1) +
                  b2*(z == 1) +
                  b3*(x == 1)*(z == 1))
    y <- rbinom(n = n, size = 1, prob = p)
    d <- data.frame(y, x, z)
    m <- glm(y ~ x + z + x:z, data = d, family = binomial)
    summary(m)$coefficients[4,4] < sig.level
  })
  mean(p_out)
}

Now we can “apply” our function to a few candidate sample sizes using sapply().


sapply(c(350, 400, 450, 500), power_fn)


[1] 0.760 0.800 0.869 0.887

It looks like we need at least 500 subjects to have a power of about 0.90 to detect the interaction in our model at a significance level of 0.05.
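To get a finer-grained picture, we could estimate power over a grid of sample sizes and plot the resulting curve. (A sketch; with 1000 simulations per sample size this takes a while to run.)


# estimate and plot a power curve over a grid of sample sizes
ns <- seq(200, 600, by = 50)
pwr <- sapply(ns, power_fn)
plot(ns, pwr, type = "b", xlab = "sample size", ylab = "estimated power")
abline(h = 0.9, lty = 2)  # target power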

Lowering the significance level requires larger sample sizes. Notice how the power decreases substantially when we specify a significance level of 0.01 instead of 0.05. Now 500 subjects doesn’t even give us power of 0.8.


sapply(c(350, 400, 450, 500), power_fn, sig.level = 0.01)


[1] 0.545 0.596 0.683 0.715

Of course all this rides on the assumption that our hypothesized probabilities are true. But this is no different from traditional closed-form formulas for estimating sample size and power, which also assume the hypothesized effect size is true.

Hopefully you now have a better understanding of how to use simulation to estimate power and sample size for logistic regression models that employ two binary predictors and their interaction.


References

  • Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.
  • R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

Clay Ford
Statistical Research Consultant
University of Virginia Library
January 19, 2024


For questions or clarifications regarding this article, contact statlab@virginia.edu.
