StatLab Articles

Understanding Dunnett’s Test

Multiple comparison procedures are fundamental in experimental research. Dunnett’s test, which compares multiple treatments to a single control, is particularly common in laboratory studies. When multiple comparisons are made, proper statistical methods are essential to control false positives. This article demonstrates how the number of comparisons affects p-values in Dunnett’s test, implements the procedure in R, and discusses strategies to improve statistical power.
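Base R has no built-in Dunnett procedure (the article's implementation presumably relies on a package such as multcomp or DescTools), but the core phenomenon — the same raw p-value growing as the number of comparisons grows — can be sketched with the simpler, more conservative Bonferroni adjustment in base R:

```r
# Illustration only: Bonferroni is more conservative than Dunnett's
# adjustment, but shows the same effect of the number of comparisons.
p_raw <- 0.02

# Same raw p-value, adjusted for 2 vs. 5 treatment-to-control comparisons
p.adjust(p_raw, method = "bonferroni", n = 2)  # 0.04
p.adjust(p_raw, method = "bonferroni", n = 5)  # 0.10
```

With five comparisons the adjusted p-value crosses the conventional 0.05 threshold even though the raw p-value did not change — exactly the kind of power loss the article's strategies aim to mitigate.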

R, statistical methods, Dunnett's Test, multiple comparisons, FWER, Hyeseon Seo

Mixed Effect versus Fixed Effect Models

When faced with analyzing clustered or repeated measures data, some researchers and analysts turn to mixed effect modeling. Others, faced with the same situation, turn to fixed effect modeling. Which one you choose is usually dictated by your field of study and statistical education. Those coming from fields like Psychology, Ecology, and Education often choose mixed effect modeling, while those coming from fields like Economics and Political Science typically choose fixed effect modeling.
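The fixed effect approach can be sketched in base R by including a dummy variable for each cluster; the mixed effect alternative, noted in a comment below, would instead come from a package such as lme4 (an assumption — the article's exact code may differ):

```r
# Simulated clustered data: 20 clusters of 5 observations,
# each cluster shifted by its own intercept
set.seed(1)
id <- rep(1:20, each = 5)
x  <- rnorm(100)
y  <- 1 + 0.5 * x + rep(rnorm(20), each = 5) + rnorm(100)

# Fixed effect approach: cluster indicators absorb cluster-level heterogeneity
fe <- lm(y ~ x + factor(id))
coef(fe)["x"]   # slope estimate, purged of between-cluster differences

# A mixed effect approach would model the cluster intercepts as random
# draws instead, e.g. lme4::lmer(y ~ x + (1 | id)) -- not run here.
```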

R, statistical methods, mixed effect models, fixed effect models, GLS, Clay Ford

Making Maps with Raster Data in R

To work with raster data, we will be using a few different packages. If you do not have one or more of the packages, you can install them using install.packages(). After installing packages, you can load them using library().
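The install-then-load pattern described above looks like this; the package names here are placeholders, since the teaser does not name the packages the article uses:

```r
# Hypothetical package list -- substitute the packages the article uses
pkgs <- c("terra", "sf")

# Install any package that is missing, then load each one
for (p in pkgs) {
  if (!requireNamespace(p, quietly = TRUE)) install.packages(p)
  library(p, character.only = TRUE)
}
```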

R, visualization, spatial data, GIS, Lauren Brideau

Distribution-Free Confidence Intervals for Percentiles

Percentiles are order statistics. This means they’re determined by ordering observations from smallest to largest and then finding the value below which some percentage of the data lie. The most common percentile is the median. It’s simply the middle value (or the average of the two middle values if there are an even number of observations). Fifty percent of the data lie below the median. Other percentiles frequently of interest are the 25th and 75th percentiles. These are the data values below which lie 25 and 75 percent of the data, respectively.
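The definitions above map directly onto base R's `median()` and `quantile()` functions. A minimal sketch:

```r
# Nine observations; ordered they are 3, 5, 7, 8, 9, 11, 12, 15, 20
x <- c(12, 5, 8, 20, 3, 15, 9, 11, 7)

median(x)                           # 50th percentile: the middle value, 9
quantile(x, probs = c(0.25, 0.75))  # 25th and 75th percentiles: 7 and 12

# With an even number of observations, the median averages the middle two
median(c(1, 2, 3, 4))               # (2 + 3) / 2 = 2.5
```

Note that `quantile()` offers several interpolation rules for values that fall between observations; the results above use R's default (type 7).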

R, statistical methods, confidence intervals, bootstrap, Clay Ford

Getting Started with Multiple Imputation for Longitudinal Data

Multiple Imputation (MI) is a method for dealing with missing data in a statistical analysis. The general idea of MI is to simulate values for missing data points using the data we have on hand, generating multiple new sets of complete data. We then run our proposed analysis on all the complete data sets and combine the results to obtain overall estimates. The end product is an analysis with proper standard errors and unbiased estimates.
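The combining step — Rubin's rules — can be sketched in base R. The numbers below are toy values standing in for a coefficient estimate and its variance from each of m imputed-data fits:

```r
# Toy values: the same model fit on m = 5 imputed data sets,
# saving one coefficient estimate and its squared standard error per fit
est <- c(1.9, 2.1, 2.0, 2.2, 1.8)       # estimate per imputation
u   <- c(0.04, 0.05, 0.04, 0.06, 0.05)  # within-imputation variance per fit
m   <- length(est)

q_bar <- mean(est)            # pooled estimate
w     <- mean(u)              # average within-imputation variance
b     <- var(est)             # between-imputation variance
t_var <- w + (1 + 1/m) * b    # total variance of the pooled estimate

c(estimate = q_bar, se = sqrt(t_var))
```

The between-imputation term is what inflates the standard error to reflect the uncertainty introduced by the missing data; in practice a package such as mice automates both the imputation and this pooling.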

multiple imputation, simulation, mixed effect models, R, statistical methods, Clay Ford

Addressing Multicollinearity

When a linear model has two or more highly correlated predictor variables, it is often said to suffer from multicollinearity. The danger of multicollinearity is that estimated regression coefficients can be highly uncertain and possibly nonsensical (e.g., getting a negative coefficient that common sense dictates should be positive). Multicollinearity is usually detected using variance inflation factors (VIF).
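A VIF is simply 1/(1 - R²) from regressing one predictor on all the others, so it can be computed from scratch in base R (in practice `car::vif()` does this for you):

```r
# Simulate two highly correlated predictors plus one independent predictor
set.seed(42)
x1 <- rnorm(200)
x2 <- x1 + rnorm(200, sd = 0.1)   # nearly a copy of x1
x3 <- rnorm(200)

# VIF for a predictor: 1 / (1 - R^2) from regressing it on the others
vif_x1 <- 1 / (1 - summary(lm(x1 ~ x2 + x3))$r.squared)
vif_x3 <- 1 / (1 - summary(lm(x3 ~ x1 + x2))$r.squared)

vif_x1  # very large: x1 is nearly determined by x2
vif_x3  # close to 1: x3 is unrelated to the other predictors
```

A common rule of thumb flags VIFs above 5 or 10 as signs of problematic multicollinearity.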

R, statistical methods, multicollinearity, ridge regression, PCA, Clay Ford

Correlation: Pearson, Spearman, and Kendall's tau

Correlation is a widely used method that helps us explore how two variables change together, providing insight into whether a relationship exists between them. For example, imagine we want to understand if there is an association between time spent studying and exam scores. Or, maybe we think that people who eat more cookies are happier. Or, we want to see if people who live near a park hear more birds singing in the morning. Correlation is a valuable tool for understanding the extent to which variables are associated.
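All three coefficients named in the title are available through base R's `cor()`. A quick sketch with a perfectly monotonic but nonlinear relationship shows how they differ:

```r
# y increases with x, but not linearly
x <- 1:10
y <- x^3

cor(x, y, method = "pearson")   # less than 1: the relationship is not linear
cor(x, y, method = "spearman")  # 1: the ranks agree perfectly
cor(x, y, method = "kendall")   # 1: every pair of points is concordant
```

Pearson measures linear association, while Spearman and Kendall measure monotonic association through ranks — which is why the latter two return exactly 1 here.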

R, correlation, statistical methods, spearman correlation, kendall tau, Lauren Brideau

Testing for Significance with Permutation-based Methods

When we perform statistical tests, we often want to obtain a p-value, which describes the probability of obtaining test results at least as extreme as the observed result, assuming that the null hypothesis is true. In other words, if the null is true, how likely would it be to observe an effect as large as (or larger than) the observed effect by chance and chance alone? Common statistical approaches such as t-tests, ANOVAs, and linear regression make assumptions about the data or the errors.
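A permutation test sidesteps those distributional assumptions by building the null distribution directly from the data. A minimal base-R sketch for a difference in group means:

```r
# Two small groups with a true mean difference of 1
set.seed(2)
g1 <- rnorm(15, mean = 0)
g2 <- rnorm(15, mean = 1)
obs <- mean(g2) - mean(g1)

# Under the null, group labels are exchangeable: shuffle them many times
# and recompute the mean difference each time
pooled <- c(g1, g2)
perm <- replicate(5000, {
  idx <- sample(30, 15)
  mean(pooled[-idx]) - mean(pooled[idx])
})

# Two-sided p-value: share of shuffled differences at least as extreme
# as the observed difference
p_val <- mean(abs(perm) >= abs(obs))
p_val
```

No normality assumption is needed; the p-value comes entirely from the empirical distribution of label shuffles.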

R, permutation, statistical methods, Ethan Kadiyala

Getting Started with Tweedie Models

Tweedie models are a special Generalized Linear Model (GLM) that can be useful when we want to model an outcome that sometimes equals 0 but is otherwise positive and continuous. Some examples include daily precipitation data and annual income. Data like this can have zeroes, often lots of zeroes, in addition to positive values. When modeling data of this nature, we may want to ensure our model does not predict negative values. We may also want to log-transform this data without dropping the zeroes. Tweedie models allow us to do both.
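For power parameters between 1 and 2, a Tweedie random variable is a compound Poisson-gamma: a Poisson number of gamma-distributed "events" summed together. Simulating that mechanism in base R shows how exact zeros and positive continuous values arise from one distribution:

```r
# Zero events give an exact zero; otherwise the outcome is the sum of
# gamma draws, so it is positive and continuous
set.seed(3)
n <- 1000
events <- rpois(n, lambda = 1.5)   # e.g., number of rain events in a day
y <- sapply(events, function(k) {
  if (k == 0) 0 else sum(rgamma(k, shape = 2, rate = 1))
})

mean(y == 0)   # proportion of exact zeros: about exp(-1.5), roughly 22%
range(y)       # zeros plus strictly positive continuous values
```

Fitting an actual Tweedie GLM requires a package (for example, statmod supplies a Tweedie family for `glm()`); the simulation above only illustrates the data-generating story.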

R, statistical methods, tweedie, simulation, zero-inflated models, Clay Ford

Getting Started with Multilevel Regression and Poststratification

Multilevel Regression and Poststratification (MRP) is a method of adjusting model estimates for non-response. By “non-response” we mean under-sampled groups in a population. For example, imagine conducting a phone survey to estimate the percentage of a population that approves of an elected official. It’s likely that certain age groups in the population will be under-sampled because they’re less likely to answer a call from an unfamiliar number. MRP allows us to analyze the data and adjust the estimate by taking the under-sampled groups into account.
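The poststratification step can be sketched in base R with hypothetical numbers. In full MRP the group-level estimates would come from a multilevel model (e.g., fit with lme4 or a Bayesian package), but the reweighting itself is just a population-share-weighted average:

```r
# Hypothetical survey: approval estimated within three age groups, where
# the youngest group was under-sampled relative to its population share
approval_by_group <- c(young = 0.60, middle = 0.50, old = 0.40)
sample_share      <- c(young = 0.10, middle = 0.45, old = 0.45)
pop_share         <- c(young = 0.30, middle = 0.40, old = 0.30)

# Raw estimate, weighted by who actually responded
raw_est  <- sum(approval_by_group * sample_share)  # 0.465
# Poststratified estimate, reweighted to known population shares
post_est <- sum(approval_by_group * pop_share)     # 0.50

c(raw = raw_est, poststratified = post_est)
```

Because the under-sampled young group approves at a higher rate, reweighting to the true population shares pulls the estimate upward.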

R, statistical methods, mixed effect models, Bayesian methods, simulation, Clay Ford