If you have ever performed binary logistic regression in R using the `glm()`

function, you may have noticed a summary of “Deviance Residuals” at the top of the summary output. In this article, we talk about how these residuals are calculated and what we can use them for. We also talk about other types of residuals available for binary logistic regression.

# logistic regression

## What is Logistic Regression?

Logistic regression is a predictive analysis that estimates/models the probability of event occurring based on a given dataset. This dataset contains both independent variables, or predictors, and their corresponding dependent variables, or responses.

Let’s say we fit a logistic regression model for the purpose of predicting the probability of low infant birth weight, which is an infant weighing less than 2.5 kg. Below we fit such a model using the `birthwt`

data set that comes with the MASS package in R. (This is an example model and not to be used as medical advice.)

We first subset the data to select four variables:

*This article assumes basic familiarity with the use and interpretation of logistic regression, odds and probabilities, and true/false positives/negatives. The examples are coded in R. ROC curves and AUC have important limitations, and I encourage reading through the section at the end of the article to get a sense of when and why the tools can be of limited use.*

What are *average marginal effects*? If we unpack the phrase, it looks like we have *effects* that are *marginal* to something, all of which we *average*. So let’s look at each piece of this phrase and see if we can help you get a better handle on this topic.

Binomial generalized linear mixed models, or binomial GLMMs, are useful for modeling binary outcomes for repeated or clustered measures. For example, let’s say we design a study that tracks what college students eat over the course of 2 weeks, and we’re interested in whether or not they eat vegetables each day. For each student, we’ll have 14 binary events: eat vegetables or not.

Logistic regression is a method for modeling binary data as a function of other variables. For example we might want to model the occurrence or non-occurrence of a disease given predictors such as age, race, weight, etc. The result is a model that returns a predicted probability of occurrence (or non-occurrence, depending on how we set up our data) given certain values of our predictors. We might also be able to interpret the coefficients in our model to summarize how a change in one predictor affects the odds of occurrence.

Logistic regression is a popular and effective way of modeling a binary response. For example, we might wonder what influences a person to volunteer, or not volunteer, for psychological research. Some do, some don’t. Are there independent variables that would help explain or distinguish between those who volunteer and those who don’t? Logistic regression gives us a mathematical model that we can we use to estimate the probability of someone volunteering given certain independent variables.