StatLab Articles

Creating an SQLite database for Use with R

When you import or load data into R, the data are stored in random-access memory (RAM). This is the memory that is deleted when you close R or shut off your computer. It’s very fast but temporary. If you save your data, it is saved to your hard drive. But when you open R again and load the data, once again it is loaded into RAM. While many newer computers come with lots of RAM (such as 16 GB), it’s not an infinite amount. When you open RStudio, you’re using RAM even if no data are loaded. Open a web browser or any other program and it too is loaded into RAM.

R, data wrangling, SQL, SQLite, Clay Ford

Simulating Data for Count Models

A count model is a linear model where the dependent variable is a count, such as the number of times a car breaks down, the number of rats in a litter, or the number of times a young student gets out of his seat. Counts are either 0 or a positive whole number, which means we need to use special distributions to generate the data.
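To make the idea of generating count data concrete, here is a minimal sketch. The post itself works in R; this version uses Python's standard library only. It assumes a Poisson distribution with a log link, and the coefficients `b0` and `b1`, the single uniform predictor, and the helper names `rpois` and `simulate_counts` are all hypothetical choices for illustration, not taken from the article.

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(n, b0, b1, seed=1):
    """Simulate n counts whose mean depends on a predictor x (assumed model)."""
    rng = random.Random(seed)
    # Hypothetical predictor: uniform on [0, 1)
    x = [rng.random() for _ in range(n)]
    # Log link: the linear predictor is exponentiated so the mean stays positive
    y = [rpois(math.exp(b0 + b1 * xi), rng) for xi in x]
    return x, y


x, y = simulate_counts(n=1000, b0=0.5, b1=1.2)
print(min(y), max(y), sum(y) / len(y))
```

Every simulated value is 0 or a positive whole number, which is the property that rules out drawing the outcome from a normal distribution.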

R, simulation, statistical methods, poisson regression, negative binomial regression, zero-inflated models, Clay Ford

Simulating a Logistic Regression Model

Logistic regression is a method for modeling binary data as a function of other variables. For example, we might want to model the occurrence or non-occurrence of a disease given predictors such as age, race, weight, etc. The result is a model that returns a predicted probability of occurrence (or non-occurrence, depending on how we set up our data) given certain values of our predictors. We might also be able to interpret the coefficients in our model to summarize how a change in one predictor affects the odds of occurrence.
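As a rough illustration of the two ideas above (a predicted probability, and a coefficient read as a change in odds), here is a small sketch. The post itself works in R; this version uses Python's standard library only. The coefficient values, the single standard-normal predictor, and the helper names `inv_logit`, `odds`, and `simulate_binary` are assumptions made for the example, not the article's own setup.

```python
import math
import random


def inv_logit(z):
    """Map a linear predictor to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def simulate_binary(n, b0, b1, seed=1):
    """Simulate n binary outcomes from an assumed logistic model."""
    rng = random.Random(seed)
    # Hypothetical predictor: standard normal
    x = [rng.gauss(0, 1) for _ in range(n)]
    # Each outcome is 1 with probability inv_logit(b0 + b1 * x)
    y = [1 if rng.random() < inv_logit(b0 + b1 * xi) else 0 for xi in x]
    return x, y


b0, b1 = -1.0, 0.8  # hypothetical coefficients
x, y = simulate_binary(n=500, b0=b0, b1=b1)

# Predicted probabilities at x = 0 and x = 1
p0 = inv_logit(b0 + b1 * 0)
p1 = inv_logit(b0 + b1 * 1)

# A one-unit increase in x multiplies the odds by exp(b1)
odds_ratio = odds(p1) / odds(p0)
print(p0, p1, odds_ratio, math.exp(b1))
```

The last line prints the odds ratio computed from the two probabilities next to `exp(b1)`; the two agree, which is why logistic regression coefficients are usually exponentiated before being interpreted.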

R, logistic regression, power analysis, simulation, statistical methods, Clay Ford

An Introduction to Analyzing Twitter Data with R

NOTE: As of March 2023, the free version of the Twitter API no longer allows read requests. This means the instructions below to create a developer account, access Twitter, and download tweets no longer work as written. If you have a paid "Basic" tier or higher, then these instructions may work for you, but we have not verified this.

R, text analysis, text mining, Leah Malkovich

Getting Started with Multiple Imputation in R

Whenever we are dealing with a dataset, we almost always run into a problem that may decrease our confidence in the results that we are getting - missing data! Examples of missing data can be found in surveys - where respondents intentionally refrained from answering a question, didn’t answer a question because it is not applicable to them, or simply forgot to give an answer. Or our dataset on trade in agricultural products for country-pairs over years could suffer from missing data as some countries fail to report their accounts for certain years.

R, linear regression, statistical methods, multiple imputation, Aycan Katitas

Digital Governance Lab Proposal

Related Scholarship

A Guide to Python in QGIS

This post is something I’ve been thinking about writing for a while. I was inspired to write it by my own trials and tribulations, which are still ongoing, while working with the QGIS API, trying to programmatically do stuff in QGIS instead of relying on available widgets and plugins. I have spent, and will probably continue to spend, many hours scouring the internet and especially Stack Overflow looking for answers of how to use various classes, methods, attributes, etc.

Python, data wrangling, QGIS, Erich Purpur

How to Create and Export Print Layouts in Python for QGIS 3

I've been struggling off and on for literally months trying to create and export a print layout using Python for QGIS 3, or PyQGIS 3 for short. I have finally figured out many of the ins and outs of the process, and hopefully this will serve as a guide to save someone else a lot of effort and time.

Python, visualization, QGIS, Erich Purpur

Analysis of Ours to Shape Comments, Part 5

Introduction

In the penultimate post of this series, we’ll use some unsupervised learning approaches to uncover comment clusters and latent themes among the comments to President Ryan’s Ours to Shape website.

The full code to recreate the analysis in the blog posts is available on GitHub.

Ours to Shape, quanteda, R, text analysis, text mining, Michele Claibourn

Analysis of Ours to Shape Comments, Part 4

Introduction

We're still analyzing the comments submitted to President Ryan’s Ours to Shape website.

In the fourth installment of this series (we’re almost done, I promise), we’ll look at the sentiment – aka positive-negative tone, polarity, affect – of the comments to President Ryan’s Ours to Shape website.

Ours to Shape, quanteda, R, text analysis, text mining, Michele Claibourn