StatLab Articles

A Beginner’s Guide to Marginal Effects

What are average marginal effects? If we unpack the phrase, it looks like we have effects that are marginal to something, all of which we average. So let’s look at each piece of this phrase and see if we can help you get a better handle on this topic.

R, logistic regression, statistical methods, marginal effects, marginal means, emmeans, Clay Ford

The Intuition Behind Confidence Intervals

Say it with me: An X% confidence interval captures the population parameter in X% of repeated samples.

In the course of our statistical educations, many of us had that line (or some variant of it) crammed, wedged, stuffed, and shoved into our skulls until definitional precision was leaking out of noses and pooling on our upper lips like prop blood.

Or, at least, I felt that way.
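One way to make that definition concrete is to simulate it. The sketch below is only an illustration, not code from the article: the function name `ci_coverage`, the normal population, the sample size of 30, and the assumption that the population standard deviation is known are all choices made here for simplicity. It draws many samples, builds a z-interval for the mean from each one, and reports the share of intervals that contain the true mean.

```python
import random
from statistics import NormalDist, fmean


def ci_coverage(true_mean=10.0, sd=2.0, n=30, level=0.95, reps=2000, seed=1):
    """Share of simulated samples whose z-interval contains true_mean.

    Assumption for illustration: the population sd is treated as known,
    so the interval is sample_mean +/- z * sd / sqrt(n).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # e.g. about 1.96 for 95%
    half_width = z * sd / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        # The interval captures the parameter when the sample mean lands
        # within half_width of the true mean.
        if abs(fmean(sample) - true_mean) <= half_width:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print(ci_coverage(level=0.95))  # should land close to 0.95
    print(ci_coverage(level=0.80))  # should land close to 0.80
```

Any single interval either contains the parameter or it does not; the X% describes how often the procedure succeeds across repeated samples.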

R, simulation, statistical methods, confidence intervals, Jacob Goldstein-Greenwood

Power and Sample Size Analysis Using Simulation

The power of a test is the probability of rejecting a null hypothesis when it is in fact false. For example, let’s say we suspect a coin is not fair and lands heads 65% of the time.
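The coin example lends itself to a small simulation. The following is a minimal sketch under stated assumptions rather than the article's own code: the helper names `binom_p_values` and `simulated_power`, the exact two-sided binomial test against a fair coin, the 0.05 significance level, and the sample sizes are all picked here for illustration. It flips a 65%-heads coin `n` times, tests the null hypothesis of a fair coin, and repeats; the proportion of rejections estimates the power.

```python
import random
from math import comb


def binom_p_values(n, p0=0.5):
    """Exact two-sided binomial p-value for every possible head count 0..n.

    The p-value sums the probabilities of all outcomes that are no more
    likely under the null than the observed one.
    """
    pmf = [comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(n + 1)]
    tol = 1 + 1e-7  # guards against floating-point ties
    return [min(1.0, sum(q for q in pmf if q <= pmf[k] * tol))
            for k in range(n + 1)]


def simulated_power(n, p_true=0.65, p0=0.5, alpha=0.05, reps=2000, seed=1):
    """Estimate power as the share of simulated experiments that reject p0."""
    rng = random.Random(seed)
    p_values = binom_p_values(n, p0)
    rejections = 0
    for _ in range(reps):
        heads = sum(rng.random() < p_true for _ in range(n))
        if p_values[heads] < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    for n in (50, 100, 200):
        print(n, simulated_power(n))
```

Running the loop over several values of `n` shows how power rises with sample size, which is the basis for choosing a sample size by simulation.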

R, power analysis, simulation, statistical methods, Clay Ford

Post Hoc Power Calculations Are Not Useful

It is well documented that post hoc power calculations are not useful (Althouse, 2020; Goodman & Berlin, 1994; Hoenig & Heisey, 2001). Also known as observed power or retrospective power, post hoc power purports to estimate the power of a test given an observed effect size. The idea is to show that a “non-significant” hypothesis test failed to achieve significance because it wasn’t powerful enough. This allows researchers to entertain the notion that their hypothesized effect may actually exist; they just needed to use a bigger sample size.
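A short calculation hints at why the exercise is circular. The sketch below assumes a two-sided z-test and takes the observed effect to be the true effect; the function name `observed_power` is hypothetical and the setup is an illustration, not the article's analysis. Under those assumptions, observed power depends only on the p-value and the significance level, so it carries no information beyond the p-value itself.

```python
from statistics import NormalDist


def observed_power(p_value, alpha=0.05):
    """Post hoc power of a two-sided z-test, computed from the p-value alone.

    Assumption for illustration: the true standardized effect equals the
    observed z statistic. p_value must lie strictly between 0 and 1.
    """
    nd = NormalDist()
    z_obs = nd.inv_cdf(1 - p_value / 2)   # |z| implied by the p-value
    z_crit = nd.inv_cdf(1 - alpha / 2)    # rejection threshold
    # Probability that a new z statistic centered at z_obs falls in
    # either rejection region.
    return (1 - nd.cdf(z_crit - z_obs)) + nd.cdf(-z_crit - z_obs)


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.20, 0.50):
        print(p, round(observed_power(p), 3))
```

In this setup, a result sitting exactly at p = 0.05 has observed power of about 0.5, and any larger p-value gives lower observed power, so a "non-significant" test will always appear underpowered.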

R, power analysis, simulation, statistical methods, Clay Ford

Understanding Ordered Factors in a Linear Model

Consider the following data from the text Design and Analysis of Experiments, 7th ed. (Montgomery, 2009, Table 3.1). It has two variables: power and rate. power is a discrete setting on a tool used to etch circuits into a silicon wafer. There are four levels to choose from. rate is the distance etched measured in Angstroms per minute. (An Angstrom is one ten-billionth of a meter.) Of interest is how (or if) the power setting affects the etch rate.

R, linear regression, statistical methods, ordered factors, Clay Ford

Ask Better Code Questions (and Get Better Answers) With Reprex

In the forums and Q&A sections of websites like Stack Overflow, GitHub, and forum.posit.co, there is a volunteer force of data-science detectives, code consultants, and error-fighting emissaries ready to offer assistance to programmers who find themselves staring down unhappy code that’s resisting placation.

R, data wrangling, regular expressions, reprex, Jacob Goldstein-Greenwood

Getting Started with Generalized Estimating Equations

Generalized estimating equations, or GEE, is a method for modeling longitudinal or clustered data. It is usually used with non-normal data such as binary or count data. The name refers to a set of equations that are solved to obtain parameter estimates (i.e., model coefficients). If interested, see Agresti (2002) for the computational details. In this article we simply aim to get you started with implementing and interpreting GEE using the R statistical computing environment.

R, effect plots, mixed effect models, statistical methods, GEE, Clay Ford

Getting Started with Binomial Generalized Linear Mixed Models

Binomial generalized linear mixed models, or binomial GLMMs, are useful for modeling binary outcomes for repeated or clustered measures. For example, let’s say we design a study that tracks what college students eat over the course of 2 weeks, and we’re interested in whether or not they eat vegetables each day. For each student, we’ll have 14 binary events: eat vegetables or not.

R, logistic regression, mixed effect models, simulation, statistical methods, binomial GLMM, Clay Ford

Getting Started with Web Scraping in Python

"Web scraping," or "data scraping," is simply the process of extracting data from a website. This can, of course, be done manually: You could go to a website, find the relevant data or information, and enter that information into some data file that you have stored locally. But imagine that you want to pull a very large dataset or data from hundreds or thousands of individual URLs. In this case, extracting the data manually sounds overwhelming and time-consuming.

Python, data wrangling, web scraping, Hannah Lewis

A Brief on Brier Scores

Not all predictions are created equal, even if, in categorical terms, the predictions suggest the same outcome: “X will (or won’t) happen.” Say that I estimate that there’s a 60% chance that 100 million COVID-19 vaccines will be administered in the US during the first 100 days of Biden’s presidency, but my friend estimates that there’s a 90% chance of that outcome.
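To see how the two forecasts would be scored, here is a minimal sketch of the Brier score for binary events: the mean squared difference between the forecast probability and the outcome (1 if the event happened, 0 if not). The function name `brier_score` is hypothetical, and both possible outcomes are shown because the example is only illustrative.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and outcomes.

    forecasts: probabilities between 0 and 1
    outcomes:  1 if the event happened, 0 if it did not
    Lower scores are better; 0 is a perfect forecast.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal in length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


if __name__ == "__main__":
    # Scores for the 60% and 90% forecasts if the event happens...
    print(brier_score([0.6], [1]), brier_score([0.9], [1]))
    # ...and if it does not.
    print(brier_score([0.6], [0]), brier_score([0.9], [0]))
```

If the event happens, the 90% forecast scores about 0.01 against 0.16 for the 60% forecast; if it does not, the confident forecast is penalized more heavily, at about 0.81 against 0.36.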

R, statistical methods, Brier scores, Jacob Goldstein-Greenwood

