Getting Started with Multilevel Regression and Poststratification

Multilevel Regression and Poststratification (MRP) is a method of adjusting model estimates for non-response. By “non-response” we mean under-sampled groups in a population. For example, imagine conducting a phone survey to estimate the percentage of a population that approves of an elected official. It’s likely that certain age groups in the population will be under-sampled because they’re less likely to answer a call from an unfamiliar number. MRP allows us to analyze the data and adjust the estimate by taking the under-sampled groups into account.

In this article we present a basic example of MRP using simulated data. The inspiration for this article comes from section 17.2 of the book Regression and Other Stories (Gelman, Hill, and Vehtari 2020, pages 320 - 322), which also demonstrates MRP via simulation. What makes this article different is that we use a simpler data set and scale down the code to hopefully make the topic more accessible to beginners.

Simple Poststratification

To get started, let’s imagine a new mandatory “special building fee” has been added to the tuition for all students at a large public university. This new fee is intended to help cover the cost of improving and expanding student facilities. The fee is quite large and we’re interested in learning whether or not students approve of the fee. To investigate we plan to sample 1000 students and ask them “Do you approve of the special building fee (Yes/No)?”

To begin our simulation, we need to generate student enrollment numbers. Our imaginary college has 4000 each of freshmen, sophomores, juniors, and seniors, and 7000 graduate students, for a total of 23,000 students.


N <- c(rep(4000, 4), 7000)
N


[1] 4000 4000 4000 4000 7000

In real life, we would have to obtain such numbers from the university. In other scenarios, we might have to turn to census data, such as the American Community Survey. Regardless, to conduct MRP, you will always need to determine population values like these.

For our simulation, we also need to declare the true probabilities of supporting this fee for each class of student. In real life, we don’t know these values. That’s why we’re doing the survey: to estimate them. But to simulate data we need to know the true probabilities in advance. We decide that freshmen and sophomores will mostly support the new fee (since they will likely benefit from it), while juniors, seniors and graduate students will not. Below 1 corresponds to freshmen, 2 to sophomores, and so on.


p_support <- c("1" = 0.65, "2" = 0.65, "3" = 0.4, "4" = 0.4, 
               "5" = 0.2)

Since we know the true probabilities and the group population sizes, we can calculate the true proportion of students that support the special building fee. To do this we need to take a weighted mean. This means we weight each probability above with the class population, sum the weighted probabilities, and then divide by the total population.


sum(p_support*N)/sum(N)


[1] 0.426087

We can also use the base R function weighted.mean() to make this calculation.


weighted.mean(x = p_support, w = N)


[1] 0.426087

The true proportion of approval is about 0.43. Going forward we can compare our simulated results to this value.

Now let’s sample 1000 students. Ideally, we would take a representative sample where each group is sampled in proportion to their population. Below we do that by supplying our vector of class populations to the prob argument of the sample() function, which we use to sample our imaginary students. (The sample() function automatically rescales the population values to probabilities.) When we count how many people we sampled using the table() function, we see that we have a representative sample. Classes 1 - 4 are fairly even and class 5 (the graduate students) is much bigger.


RNGversion('4.4.2') # use the random number generator defaults from R version 4.4.2
n <- 1000
set.seed(10)
people <- sample(1:5, size = n, replace = TRUE, prob = N)
table(people)


people
  1   2   3   4   5 
188 172 185 156 299 
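
As a quick sanity check (our own addition, not a required step of MRP), we can compare the sampling proportions implied by the class populations to the proportions observed in the sample. The two sets of proportions should be close.


# population proportions implied by the class sizes
round(N/sum(N), 2)
# proportions observed in the sample
round(table(people)/n, 2)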

Now we use the true probabilities to simulate survey responses. We do this by simulating a series of zeroes and ones from a binomial distribution using the rbinom() function. In this case a one will correspond to a “yes” answer. The syntax p_support[people], supplied to the prob argument, takes the 5-element vector of p_support and expands it to length 1000 by using the sampled numbers in the “people” vector as index values. This creates a vector of 1000 separate probabilities to be used for each of the simulated binomial responses. Once we have our survey responses, we can estimate the proportion of students who approve of the fee by simply taking the mean of the “ideal_response” vector. (Recall that the mean of a sequence of zeroes and ones is simply the proportion of ones.) The estimated approval rating of the special building fee is very close to the true value.


ideal_response <- rbinom(n = n, size = 1, prob = p_support[people])
mean(ideal_response)


[1] 0.432
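
If the p_support[people] indexing seems abstract, a quick way to see it in action (purely for illustration) is to print the first few sampled classes alongside the probabilities rbinom() used for those students.


# first few sampled classes
head(people)
# the probabilities rbinom() used for those same students
head(p_support[people])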

Unfortunately, in practice it is often difficult to sample groups in proportion to their populations. For example, we may have a hard time finding graduate students to take our survey. This is called non-response. We can simulate this, too. Below we define probabilities of responding to the survey. Notice graduate students (5) have only a 0.1 probability of response.


p_response <- c("1" = 0.4, "2" = 0.4, "3" = 0.2, "4" = 0.1, "5" = 0.1)

Now let’s take a new sample from our population using our response probabilities. Notice graduate students are now the smallest sample despite being the largest group in the population.


set.seed(20)
people <- sample(1:5, size = n, replace = TRUE, prob = p_response)
table(people)


people
  1   2   3   4   5 
341 326 148  95  90 

When we generate survey responses using this sample, which under-samples graduate students, we get a biased result that shows a majority of students (0.55) supporting the new fee.


set.seed(30)
survey_response <- rbinom(n = n, size = 1, prob = p_support[people])
mean(survey_response)


[1] 0.552

This is where poststratification enters the picture. Poststratification can be summarized in two steps:

  1. determine the proportion who answered “yes” in each class
  2. compute a weighted mean using the class populations

The proportion who answered “yes” in each class can be determined by cross-tabulating the survey responses with the “people” vector which contains class membership. Then we can pipe the result into the proportions() function to calculate proportions along the rows (margin = 1). The column labeled “1” contains the proportion of students in each class who support the new building fee.


tab <- table(people, survey_response) |> 
  proportions(margin = 1)
tab


      survey_response
people         0         1
     1 0.3460411 0.6539589
     2 0.3282209 0.6717791
     3 0.6418919 0.3581081
     4 0.6315789 0.3684211
     5 0.7555556 0.2444444

Then we take the class proportions, along with the class populations (N), and calculate the poststratified estimate using the weighted.mean() function:


weighted.mean(x = tab[,"1"], w = N)


[1] 0.4313122

This estimate is very close to the true level of support despite the high levels of non-response among the older students! Also notice that poststratification simply means calculating a weighted mean, where the weights are the true population sizes for each group.
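
To make the "weighted mean" point concrete, here is the same poststratified estimate computed by hand, mirroring the calculation we did earlier with the true probabilities. It returns the same value, 0.4313122.


# poststratification by hand: weight each class proportion by its
# population, sum, and divide by the total population
sum(tab[,"1"] * N)/sum(N)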

Regression and Poststratification

We could have also computed this estimate using binomial logistic regression. Below we create a data frame that contains the survey response and the class of the respondent. Then we model the response as a function of class.


d <- data.frame(response = survey_response, class = factor(people))
m <- glm(response ~ class, data = d, family = binomial)

Next we use our model to estimate the proportion of “yes” responses for each class. To do this we use the predict() function and provide a data frame of class values for which we want predicted proportions. Notice we need to explicitly set the values 1-5 to be factors.


est_support <- predict(m, newdata = data.frame(class = factor(1:5)),
                       type = "response")
est_support


        1         2         3         4         5 
0.6539589 0.6717791 0.3581081 0.3684211 0.2444444 

If you look closely you’ll see these are the exact same proportions we got above using the table() and proportions() functions.
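
We can confirm the match programmatically with a quick check (our own addition); the comparison below should return TRUE.


# verify the model-based proportions match the table-based proportions
all.equal(as.numeric(est_support), as.numeric(tab[,"1"]))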

Finally we poststratify to get our estimated proportion of students who support the building fee.


weighted.mean(x = est_support, w = N)


[1] 0.4313122

Since we used logistic regression, we can also request standard errors for our predictions, which we can then use to estimate the uncertainty in our final estimate. Below we run predict() again, this time with se.fit = TRUE to request standard errors for each prediction. Then we add and subtract two standard errors from the predictions to create vectors of lower and upper bounds. Finally we poststratify the lower and upper values to create a confidence interval.


est_support <- predict(m, newdata = data.frame(class = factor(1:5)),
                       type = "response",
                       se.fit = TRUE)
# add/subtract two standard errors 
lb <- est_support$fit - 2*est_support$se.fit
ub <- est_support$fit + 2*est_support$se.fit
# poststratify
mrp_est_lb <- weighted.mean(lb, N)
mrp_est_ub <- weighted.mean(ub, N)
c(mrp_est_lb, mrp_est_ub)


[1] 0.3548098 0.5078145

It’s a fairly wide interval but it does capture the true value of 0.43.

What we have demonstrated so far is Regression and Poststratification. But the title of this article says Multilevel Regression and Poststratification. How does multilevel come into play?

Multilevel Regression and Poststratification

Multilevel modeling (also known as mixed-effects modeling) allows coefficients to be estimated for specific groups in your data. For example, a random-intercept model estimates different intercepts for specified groups. These groups are often random samples from a larger population, such as people, classrooms, or litters of mice. But the groups can also represent fixed levels such as ethnicity, US states, age groups, or, in our case, college classification.

Let’s fit a multilevel model to our simulated data. Below we use the glmer() function from the {lme4} package to fit a random-intercept model. The model syntax (1|class) says “the intercept is conditional on class”. In other words, fit a separate intercept to each level of class. Calling coef() on the fitted model object shows each class gets its own intercept.


library(lme4)
m2 <- glmer(response ~ 1 + (1|class), data = d, family = binomial)
coef(m2)


$class
  (Intercept)
1   0.6167566
2   0.6930053
3  -0.5605574
4  -0.5082169
5  -1.0284200

attr(,"class")
[1] "coef.mer"

As before, we can make predictions for each class and then poststratify the results. Once again our estimated proportion is very close to the true value. This is truly Multilevel Regression and Poststratification.


est_support <- predict(m2, newdata = data.frame(class = factor(1:5)),
                       type = "response")
weighted.mean(x = est_support, w = N)


[1] 0.437579

Why use a multilevel model when the poststratified estimate appears to be about the same as the previous regression model? The advantage of a multilevel model is its ability to share information between the levels of a group. Gelman and Hill (2007) call this a partial-pooling compromise (page 257), where “pooling” refers to combining data to make an estimate. In our example, pooling all the classes would result in a single estimated proportion. This would mean just taking the overall proportion of responses as our estimate without accounting for class. Not pooling the classes would result in five separate estimated proportions, each based only on the data for that level. That’s what we did with the binomial logistic regression model. The multilevel model is a compromise between the two and pulls the five proportions closer to the overall proportion. We can see this by comparing the predicted proportions from each of the three models.

First we fit a model where we pool all the classes. The syntax response ~ 1 means we only estimate an intercept, which is a fancy way of saying "just take the overall proportion." This produces the same estimated proportion, 0.552, for every class.


m0 <- glm(response ~ 1, data = d, family = binomial)
predict(m0, newdata = data.frame(class = factor(1:5)),
        type = "response")


    1     2     3     4     5 
0.552 0.552 0.552 0.552 0.552 

Next we look at the estimated proportions of the logistic regression model, where there is no pooling of data.


predict(m, newdata = data.frame(class = factor(1:5)),
        type = "response")


        1         2         3         4         5 
0.6539589 0.6717791 0.3581081 0.3684211 0.2444444 

And finally we look at the predicted proportions of the multilevel logistic regression model, where there is partial pooling of data. Notice the proportions have all been pulled slightly toward the overall mean of 0.552.


predict(m2, newdata = data.frame(class = factor(1:5)),
        type = "response")


        1         2         3         4         5 
0.6494805 0.6666351 0.3634185 0.3756116 0.2633905 

The power of partial pooling is that it prevents levels with fewer observations from having extreme estimates. This is not a problem for our simulated data, but it is likely to occur in real-world data, where we might have fewer observations for, say, certain ethnic categories or income levels.
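
To see this protection in action, here is a quick sketch (our own illustration, using hypothetical object names) that keeps only 10 of the sampled graduate students and then compares the no-pooling and partial-pooling predictions. The no-pooling estimate for class 5 is now based on just 10 responses, while the partial-pooling estimate is shrunk toward the overall proportion.


# keep all of classes 1-4 but only 10 graduate students
set.seed(40)
keep <- c(which(d$class != "5"), sample(which(d$class == "5"), 10))
d_small <- d[keep, ]
# no pooling: each class estimated separately
m_np <- glm(response ~ class, data = d_small, family = binomial)
# partial pooling: class estimates shrunk toward the overall proportion
m_pp <- glmer(response ~ 1 + (1|class), data = d_small, family = binomial)
newd <- data.frame(class = factor(1:5))
predict(m_np, newdata = newd, type = "response")
predict(m_pp, newdata = newd, type = "response")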

We can also estimate a confidence interval for our multilevel estimated proportion, but we need to use the bootstrap to accomplish this. The reason is that the predict() method for {lme4} models does not return standard errors. Fortunately, the {lme4} package provides the bootMer() function to implement model-based bootstrapping. The basic idea is that we use our model to simulate responses, refit the model to the new set of responses, calculate the poststratified estimate, and repeat many times. When finished we can calculate a confidence interval based on the bootstrapped MRP estimates.

We first define a function, mrp_f(), that calculates the poststratified estimate. This is simply our code from before defined as a function. We then use bootMer() to run the model-based bootstrap 999 times, each time calculating the poststratified estimate. Notice the argument use.u = TRUE. This ensures that the model simulates data for all the class levels using the original estimated intercepts. (The default, use.u = FALSE, simulates new random intercepts for each bootstrap sample.) The .progress = "txt" argument displays a progress bar as the bootstrapping runs.


mrp_f <- function(x){
  est_support <- predict(x, newdata = data.frame(class = factor(1:5)),
                         type = "response")
  weighted.mean(est_support, N)
}
bout <- bootMer(m2, FUN = mrp_f, nsim = 999, .progress = "txt", 
                use.u = TRUE)

When this finishes running, the bout object is a list with several components. The t component contains the 999 bootstrapped MRP estimates. We can calculate a confidence interval by taking the standard deviation of those estimates, which serves as an estimate of the standard error, and then adding and subtracting two standard errors from the original MRP estimate.


weighted.mean(x = est_support, w = N) + c(-1,1)*2*sd(bout$t)


[1] 0.3983542 0.4768037

The interval easily captures the true value of 0.43. Also notice how much tighter this interval is than the interval estimated by the “no pooling” logistic regression model.
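
As an alternative to the normal approximation, we could also take percentiles of the bootstrapped estimates themselves, a common way to form a bootstrap interval; it should give a similar answer here.


# 95% percentile interval from the bootstrapped MRP estimates
quantile(bout$t, probs = c(0.025, 0.975))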

Bayesian Multilevel Regression and Poststratification

Many practitioners prefer to use Bayesian multilevel models when doing MRP. For example, the online book Multilevel Regression and Poststratification Case Studies (Lopez-Martin, et al. 2022) exclusively uses Bayesian multilevel models. The {lme4} analysis we carried out above used what is sometimes called a frequentist model. A frequentist multilevel model assumes parameters such as intercepts are fixed but unknown; the estimated intercepts are an attempt to identify those true fixed values. A Bayesian multilevel model doesn't care whether the intercepts have a "true" value or not. It instead estimates the probability distribution of values that the intercepts might take. So whereas the frequentist model estimates a point value, the Bayesian model estimates an entire distribution, called the posterior distribution. The specifics of Bayesian multilevel modeling go far beyond the scope of this article, but we'll demonstrate how to obtain an MRP estimate using a Bayesian approach.

First we load the {rstanarm} package, which provides {lme4}-like modeling syntax for Bayesian multilevel modeling. Then we fit the model using the stan_glmer() function. This function provides default prior distributions that are frequently good enough out-of-the-box. (For the Bayesian model to estimate posterior distributions from the data, it needs prior distributions to get started.) Notice the model syntax is identical to the model syntax used with glmer(). This can make it easier for a researcher accustomed to traditional frequentist modeling to transition to a Bayesian framework. Before executing the code below, be warned: it could take a couple of minutes to run. That’s one of the drawbacks of Bayesian modeling.


library(rstanarm)
m3 <- stan_glmer(response ~ (1|class), data = d, family = binomial)
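
If you are curious which default priors stan_glmer() selected, the prior_summary() function will display them. (This step is optional; we include it only as a pointer.)


# inspect the default priors used by stan_glmer()
prior_summary(m3)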

Once the model is fit, we use the posterior_epred() function to make predictions for our five class levels. Whereas frequentist methods make only one set of predictions, this function makes 4000! This is because a Bayesian model is fit by drawing samples from the posterior distributions of the model parameters, and {rstanarm} draws 4000 posterior samples by default (4 chains of 1000 post-warmup draws each). Therefore when we make predictions, we make 4000 predictions because we have 4000 sets of model coefficients. Below we show the first six sets of predictions.


pred_sim <- posterior_epred(m3, newdata = data.frame(class = factor(1:5)))
dim(pred_sim)


[1] 4000    5


head(pred_sim)


          
iterations         1         2         3         4         5
      [1,] 0.6710584 0.6802820 0.3684973 0.3398175 0.2873937
      [2,] 0.6268714 0.6817769 0.3287983 0.3243950 0.2322064
      [3,] 0.6443309 0.6899975 0.3380987 0.3467824 0.2272521
      [4,] 0.6608415 0.6516225 0.3649863 0.3864817 0.2092763
      [5,] 0.6477668 0.6860014 0.4094303 0.3987220 0.2494082
      [6,] 0.6786143 0.6614597 0.3535026 0.3768066 0.3013672

To get a single estimate for each class, we take the means of the columns using the colMeans() function.


pred_est <- colMeans(pred_sim)
pred_est


        1         2         3         4         5 
0.6494100 0.6682584 0.3630604 0.3741806 0.2601629 

And then we find the MRP estimate, which is very close to the true value of 0.43.


weighted.mean(pred_est, w = N)


[1] 0.4365555

To calculate uncertainty in the MRP estimate, we can use our matrix of 4000 predictions to make 4000 MRP estimates and then take the standard deviation of those MRP estimates to estimate a standard error. The expression pred_sim %*% N/sum(N) is matrix multiplication that calculates the 4000 MRP estimates. To get the standard error we simply calculate the standard deviation of those 4000 estimates using the sd() function. Then we add/subtract two standard errors to the original MRP estimate to obtain a confidence interval. The Bayesian model returns an interval about the same size as the frequentist interval above.


poststrat_sim <- pred_sim %*% N/sum(N)
weighted.mean(pred_est, w = N) + c(-1,1)*2*sd(poststrat_sim)


[1] 0.3985078 0.4746033
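
Because poststrat_sim holds the full distribution of 4000 poststratified estimates, a more Bayesian summary is to take posterior quantiles directly instead of using the two-standard-error approximation; it should yield a similar interval.


# 95% credible interval from the posterior distribution of MRP estimates
quantile(poststrat_sim, probs = c(0.025, 0.975))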

Does this mean there are no substantive differences between frequentist and Bayesian approaches when it comes to MRP? Not always. One of the draws of Bayesian models is their ability to incorporate informative prior distributions, which can help estimate models when some groups have sparse data. Traditional frequentist multilevel models may not produce reliable results in such scenarios.

Hopefully you now have a better understanding of MRP. Please keep in mind that our simulated example was simple and used only one demographic variable. Most practical applications of MRP will involve several variables. For example, in addition to class, we could have collected sex (Male/Female), residence (in-state/out-of-state), and ethnicity (White/Black/Hispanic/Other). This would result in 5 x 2 x 2 x 4 = 80 student types. This would mean obtaining population counts for 80 student types and fitting a more complex multilevel model in order to compute an MRP estimate.
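
To give a flavor of what that entails, below is a sketch (with hypothetical variable names and category labels) of how the 80-cell poststratification table could be set up using expand.grid(). The population count for each cell, which would have to come from university records, is left as a placeholder.


# all 80 combinations of the four demographic variables
ps_frame <- expand.grid(class = factor(1:5),
                        sex = c("Male", "Female"),
                        residence = c("in-state", "out-of-state"),
                        ethnicity = c("White", "Black", "Hispanic", "Other"))
nrow(ps_frame)  # 80 poststratification cells
# ps_frame$N <- ...  # population count for each cell (from records)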

This article is meant to help you take your first steps to performing MRP. We recommend you explore the references below to learn more.


References

  • Bates, D., Maechler, M., Bolker, B., and Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
  • Gelman, A., and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press. (Chapter 14)
  • Gelman, A., Hill, J., and Vehtari, A. (2020). Regression and Other Stories. Cambridge University Press. (Chapter 17)
  • Ghitza, Y., and Gelman, A. (2013). Deep Interactions with MRP: Election Turnout and Voting Patterns Among Small Electoral Subgroups. American Journal of Political Science, 57(3), 762-776.
  • Goodrich, B., Gabry, J., Ali, I., and Brilleman, S. (2024). rstanarm: Bayesian applied regression modeling via Stan. R package version 2.32.1. https://mc-stan.org/rstanarm.
  • Kennedy, L., and Gabry, J. (2020). MRP with rstanarm. rstanarm package vignette.
  • Lopez-Martin, J., Phillips, J. H., and Gelman, A. (2022). Multilevel Regression and Poststratification Case Studies. bookdown.org.
  • R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/. Version 4.4.2.

Clay Ford
Statistical Research Consultant
University of Virginia Library
February 7, 2025


For questions or clarifications regarding this article, contact statlab@virginia.edu.
