Nonparametric and Parametric Power: Comparing the Wilcoxon Test and the t-test
From 2004 to 2008, a series of four brief, disagreeing papers in the journal Medical Education took up the question of whether and when it’s appropriate to analyze data from Likert scales (i.e., integers reflecting degrees of agreement with statements) with parametric or nonparametric statistical methods.
Getting Started with Gamma Regression
In this article, we plan to get you up and running with gamma regression. But before we dive into that, let’s review the familiar normal distribution. This will provide some scaffolding to help us transition to the gamma distribution.
Understanding Deviance Residuals
If you have ever performed binary logistic regression in R using the glm()
function, you may have noticed a summary of “Deviance Residuals” at the top of the summary output. In this article, we talk about how these residuals are calculated and what we can use them for. We also talk about other types of residuals available for binary logistic regression.
Logistic Regression Four Ways with Python
What is Logistic Regression?
Logistic regression is a predictive analysis that estimates/models the probability of event occurring based on a given dataset. This dataset contains both independent variables, or predictors, and their corresponding dependent variables, or responses.
Getting Started with Bootstrap Model Validation
Let’s say we fit a logistic regression model for the purpose of predicting the probability of low infant birth weight, which is an infant weighing less than 2.5 kg. Below we fit such a model using the birthwt
data set that comes with the MASS package in R. (This is an example model and not to be used as medical advice.)
We first subset the data to select four variables:
Mathematical Annotation in R
In this article, we demonstrate how to include mathematical symbols and formulas in plots created with R. This can mean adding a formula in the title of the plot, adding symbols to axis labels, annotating a plot with some math, and so on.
Detecting Influential Points in Regression with DFBETA(S)
In regression modeling, influential points are observations that, individually, exert large effects on a model’s results—the parameter estimates (\(\hat{\beta_0}, \hat{\beta_1}, ..., \hat{\beta_j}\)) and, consequently, the model’s predictions (\(\hat{y_1}, \hat{y_2}, ..., \hat{y_i}\)).
Comparing Mixed-Effect Models in R and SPSS
Occasionally we are asked to help students or faculty implement a mixed-effect model in SPSS. Our training and expertise is primarily in R, so it can be challenging to transfer and apply our knowledge to SPSS. In this article we document for posterity how to fit some basic mixed-effect models in R using the lme4 and nlme packages, and how to replicate the results in SPSS.
In this article we work with R 4.2.0, lme4 version 1.1-29, nlme version 3.1-157, and SPSS version 28.0.1.1.
ROC Curves and AUC for Models Used for Binary Classification
This article assumes basic familiarity with the use and interpretation of logistic regression, odds and probabilities, and true/false positives/negatives. The examples are coded in R. ROC curves and AUC have important limitations, and I encourage reading through the section at the end of the article to get a sense of when and why the tools can be of limited use.
Comparing the Accuracy of Two Binary Diagnostic Tests in a Paired Study Design
There are many medical tests for detecting the presence of a disease or condition. Some examples include tests for lesions, cancer, pregnancy, or COVID-19. While these tests are usually accurate, they’re not perfect. In addition, some tests are designed to detect the same condition, but use a different method. A recent example are PCR and antigen tests for COVID-19. In these cases we might want to compare the two tests on the same subjects. This is known as a paired study design.