StatLab Articles

Stata Basics: Combine Data (Append and Merge)

When we first start working with data, usually in a statistics class, we mostly use clean, complete datasets as examples. Later on, when doing research or data analysis for other purposes, we realize that data is not always clean or complete. In practice, we often need to put two or more datasets together before we can begin the statistical analysis we want to perform. In this post, we demonstrate how to combine datasets using append, which combines datasets row-wise (adding observations), and merge, which combines them column-wise (adding variables).

Stata, data management, data wrangling, Yun Tai

Stata Basics: Subset Data

Sometimes only parts of a dataset mean something to you. In this post, we show you how to subset a dataset in Stata by variables or by observations. We use the census.dta dataset installed with Stata as the sample data.

Stata, data management, data wrangling, Yun Tai

Stata Basics: Create, Recode and Label Variables

In this article we demonstrate how to create new variables, recode existing variables, and label variables and values of variables. We work with the census.dta data that is included with Stata to provide examples.

generate: create variables

Here we use the generate command to create a new variable representing the population younger than 18 years old. We do so by summing two existing variables: poplt5 (population under 5 years old) and pop5_17 (population aged 5 to 17).
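As a minimal sketch of that step, assuming the census example data shipped with Stata is loaded with sysuse, and using a new variable name, pop0_17, chosen here purely for illustration:

```stata
* Load the example census data installed with Stata
sysuse census, clear

* Sum the two age-group counts into a new variable
* (pop0_17 is an illustrative name, not taken from the article)
generate pop0_17 = poplt5 + pop5_17

* Attach a descriptive label and inspect the result
label variable pop0_17 "Population, < 18 years old"
summarize poplt5 pop5_17 pop0_17
```

Note that generate stores new variables as float unless told otherwise; for very large counts, naming a storage type (for example, generate long) may be the safer choice.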

Stata, data management, data wrangling, Yun Tai

Stata Basics: Data Import, Use and Export

In Stata, the first step in analyzing a dataset is opening it so that Stata knows which file you are working with. Yes, you can simply double-click a Stata data file ending in .dta to open it, but we prefer to write syntax so we can easily reproduce the same work or reuse the scripts for similar tasks. In this post, we introduce methods for reading in, using, and saving data files in Stata and other formats.

Stata, data management, data wrangling, Yun Tai

Using Data.gov APIs in R

Data.gov catalogs US government data and makes it available on the web; you can find data on a variety of topics such as agriculture, business, climate, education, energy, finance, public safety, and many more. It is a good starting point for finding data if you don’t already know which data source to begin with; however, actually downloading the raw data you need can still be time-consuming. Fortunately, Data.gov also includes APIs from across the government, which can help with obtaining raw datasets.

R, data wrangling, Yun Tai

Look People Are Going to Think... (Debate Rhetoric Redux)

I'm still looking at the rhetoric from the presidential debates, this time focusing on the first general election debate between Hillary Clinton and Donald Trump.

quanteda, R, text analysis, text mining, visualization, Michele Claibourn

Using a Census Data API with R

Data sets provided by the US Census Bureau, such as the Decennial Census and American Community Survey (ACS), are widely used by researchers, among others. You can certainly find and download census data from the Census Bureau website, from the licensed data source Social Explorer, or from other free sources such as IPUMS-USA and then load the data into a statistical package or other software to analyze or present the data.

R, data wrangling, visualization, Yun Tai

Debate Prep!

I'm teaching a Text as Data short course (using R) right now, and as a card-carrying political scientist, I couldn't resist using the ongoing campaign as an example (this was, in part, a way of handling my own anxiety about last Monday's debate; this is what I was doing while watching). So here goes...

quanteda, R, text analysis, text mining, Michele Claibourn

Getting Started with Exploratory Factor Analysis

Take a look at the following correlation matrix for Olympic decathlon data calculated from 280 scores from 1960 through 2004 (Johnson & Wichern, 2007, p. 499):

R, statistical methods, factor analysis, Clay Ford

An Introduction to Loglinear Models

Loglinear models model cell counts in contingency tables. They're a little different from other modeling methods in that they don't distinguish between response and explanatory variables. All variables in a loglinear model are essentially "responses."

To learn more about loglinear models, we'll explore the following data from Agresti (1996, Table 6.3). It summarizes responses from a survey that asked high school seniors in a particular city whether they had ever used alcohol, cigarettes, or marijuana.

R, statistical methods, loglinear models, Clay Ford
