StatLab Articles

Stata Basics: foreach and forvalues

There are times when we need to perform repetitive tasks during data preparation, analysis, or presentation. For instance, we may need to compute a set of variables in the same manner, rename or create a series of variables, or recode the values of a number of variables. In this post, we show a few simple example loops using the Stata commands foreach, local, and forvalues to handle some common repetitive tasks.
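A minimal sketch of both loop types, assuming the census.dta example dataset that ships with Stata (opened with sysuse). The new variable names (p_poplt5 and so on, region1 to region4) are hypothetical choices for illustration, and the forvalues loop assumes region is coded 1 through 4, as it is in the shipped file.

```stata
* Load the example census data that ships with Stata
sysuse census, clear

* local + foreach: store a list of age-group counts in a local macro,
* then compute each group's share of the total population
local agegroups poplt5 pop5_17 pop65p
foreach v of local agegroups {
    generate p_`v' = `v' / pop
}

* forvalues: loop over the numbers 1 to 4 and create one
* 0/1 indicator variable per census region (region1 ... region4)
forvalues i = 1/4 {
    generate region`i' = (region == `i')
}
```

Inside each loop, `v' and `i' are local macros: Stata substitutes their current value into the command before running it on each pass.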

Stata, data management, data wrangling, Yun Tai

Stata Basics: Reshape Data

In this post, we demonstrate how to convert datasets between wide form and long form, also known as "reshaping data". Reshaping is often needed when you work with datasets containing variables that follow some kind of sequence, such as time-series data. Transforming data between wide and long forms is fairly easy in Stata with the reshape command; however, you'll want to be careful to avoid mistakes in the process. First, let's see how the wide and long forms look.
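A minimal round-trip sketch on a small, made-up dataset: the variable names id, inc2001 to inc2003, and year are hypothetical and only illustrate the stub-plus-suffix naming that reshape expects in wide form.

```stata
* Hypothetical wide data: one row per id, income recorded for three years
clear
input id inc2001 inc2002 inc2003
1 100 110 120
2 200 210 220
end

* Wide to long: one row per id-year pair; the numeric suffix of
* inc2001-inc2003 becomes the new variable year
reshape long inc, i(id) j(year)
list, sepby(id)

* Long to wide: back to one row per id
reshape wide inc, i(id) j(year)
```

If the i() variable does not uniquely identify the observations, reshape stops with an error, and running reshape error afterwards lists the problem observations, which is one way to catch mistakes before they propagate.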

Stata, data management, data wrangling, data reshaping, Yun Tai

Stata Basics: Combine Data (Append and Merge)

When we first start working with data, usually in a statistics class, we mostly use clean and complete datasets as examples. Later on, when doing research or analyzing data for other purposes, we realize that data is not always clean or complete. In reality, we often need to put two or more datasets together before we can begin the statistical analysis we want to perform. In this post, we demonstrate how to combine datasets using append and merge, which combine data row-wise and column-wise, respectively.
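A minimal sketch of both operations, assuming three small, made-up datasets held in tempfiles (the names part1, part2, ages and the variables id, score, age are hypothetical): append stacks rows with the same variables, while merge adds columns by matching on a key.

```stata
* Hypothetical data: two sets of rows with the same variables
clear
input id score
1 85
2 90
end
tempfile part1
save `part1'

clear
input id score
3 78
4 88
end
tempfile part2
save `part2'

* Hypothetical data: an extra variable (age) for the same ids
clear
input id age
1 20
2 21
3 22
4 23
end
tempfile ages
save `ages'

* append: stack part2 below part1 (row-wise combining)
use `part1', clear
append using `part2'

* merge: add the age column, matching one-to-one on id (column-wise)
merge 1:1 id using `ages'
tabulate _merge
```

merge creates a _merge variable; a value of 3 means the observation was found in both datasets, so checking it is a quick way to spot unmatched keys.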

Stata, data management, data wrangling, Yun Tai

Stata Basics: Subset Data

Sometimes only part of a dataset is relevant to you. In this post, we show you how to subset a dataset in Stata by variables or by observations. We use the census.dta dataset installed with Stata as the sample data.
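A minimal sketch of both kinds of subsetting on census.dta; the particular variables kept and the population cutoff of 5,000,000 are arbitrary choices for illustration.

```stata
* Load the example census data that ships with Stata
sysuse census, clear

* Subset by variables: keep only the listed columns
* (drop varlist is the complement and removes the listed columns instead)
keep state region pop medage

* Subset by observations: keep only rows meeting a condition
keep if pop >= 5000000
```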

Stata, data management, data wrangling, Yun Tai

Stata Basics: Create, Recode and Label Variables

In this article, we demonstrate how to create new variables, recode existing variables, and label both variables and their values. We use the census.dta dataset included with Stata for the examples.

generate: create variables

Here we use the generate command to create a new variable representing the population under 18 years old. We do so by adding two existing variables: poplt5 (population under 5 years old) and pop5_17 (population aged 5 to 17).
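A minimal sketch of this step; the new variable name poplt18 and its label text are hypothetical choices, and the long storage type is our own addition.

```stata
* Load the example census data that ships with Stata
sysuse census, clear

* Population under 18 = under-5 count + 5-to-17 count;
* long storage keeps the integer counts exact
generate long poplt18 = poplt5 + pop5_17
label variable poplt18 "Population < 18"
```

The default float type stores integers exactly only up to 16,777,216, so an integer type such as long is the safer choice for population counts.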

Stata, data management, data wrangling, Yun Tai

Stata Basics: Data Import, Use and Export

In Stata, the first step of analyzing a dataset is opening the data so that Stata knows which file you are working with. Yes, you can simply double-click a Stata data file ending in .dta to open it, but we prefer to write syntax so we can easily reproduce the same work or reuse the scripts for similar tasks. In this post, we introduce methods of reading in, using, and saving data files in Stata's own format and in other formats.
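A minimal sketch of a scripted round trip, assuming the auto.dta example dataset that ships with Stata; the file names auto.csv and auto_copy.dta are hypothetical and are written to the current working directory.

```stata
* Open an example dataset that ships with Stata
sysuse auto, clear

* Export to a comma-separated file, then read it back in
export delimited using "auto.csv", replace
import delimited using "auto.csv", clear

* Save in Stata's own .dta format
save "auto_copy.dta", replace

* Reopen the saved file later with use
use "auto_copy.dta", clear
```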

Stata, data management, data wrangling, Yun Tai

Using Data.gov APIs in R

Data.gov catalogs US government data and makes them available on the web; you can find data on a variety of topics such as agriculture, business, climate, education, energy, finance, public safety, and many more. It is a good starting point for finding data if you don’t already know which data source to begin your search with; however, actually downloading the raw data you need can still be time-consuming. Fortunately, Data.gov also includes APIs from across the government, which can help with obtaining raw datasets.

R, data wrangling, Yun Tai

Look People Are Going to Think... (Debate Rhetoric Redux)

I'm still looking at the rhetoric from the presidential debates, this time focusing on the first general election debate between Hillary Clinton and Donald Trump.

quanteda, R, text analysis, text mining, visualization, Michele Claibourn

Using a Census Data API with R

Datasets provided by the US Census Bureau, such as the Decennial Census and the American Community Survey (ACS), are widely used by researchers, among others. You can certainly find and download census data from the Census Bureau website, from the licensed data source Social Explorer, or from other free sources such as IPUMS-USA, and then load the data into a statistical package or other software to analyze or present it.

R, data wrangling, visualization, Yun Tai

Debate Prep!

I'm teaching a Text as Data short course (using R) right now, and as a card-carrying political scientist, I couldn't resist using the ongoing campaign as an example (this was, in part, a way of handling my own anxiety about last Monday's debate; this is what I was doing while watching). So here goes...

quanteda, R, text analysis, text mining, Michele Claibourn