data management

Data preparation, analysis, and presentation often involve repetitive tasks. For instance, we may need to compute a set of variables in the same manner, rename or create a series of variables, or recode the values of several variables in the same way. In this post, we show a few simple example "loops" using the Stata commands foreach, forvalues, and local to handle some common repetitive tasks.
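As a minimal sketch of the kind of loop this post describes, the example below uses the census data shipped with Stata. The new variable names ending in _rate and the choice of variables are assumptions made for illustration, not taken from the original post.

```stata
* Load the example dataset shipped with Stata
sysuse census, clear

* local: store a list of variable names once and reuse it
local counts death marriage divorce

* foreach: create a per-capita version of each count variable
* (the _rate suffix is a hypothetical naming choice)
foreach v of local counts {
    generate `v'_rate = `v' / pop
}

* forvalues: loop over a range of numbers
forvalues i = 1/3 {
    display "iteration `i'"
}

* Reuse the same local to summarize the original variables
summarize `counts'
```

Run this as a do-file (or as one selection) rather than line by line, because a local macro only exists within the run that defines it.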

In this post, we demonstrate how to convert datasets between wide form and long form, also known as "reshaping data". Reshaping is often needed when you work with datasets whose variables follow some kind of sequence, such as time-series data. It is fairly easy to transform data between wide and long forms in Stata using the reshape command. However, you'll want to be careful when you do so to avoid mistakes in the transformation. First, let's see how the wide and long forms look.
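To make the two forms concrete, here is a small sketch using made-up data. The variables id, inc2019, inc2020, and inc2021 are hypothetical and exist only for this illustration.

```stata
* Hypothetical wide-form data: one row per id, one income column per year
clear
input id inc2019 inc2020 inc2021
1 100 110 120
2 200 210 220
end

* Wide to long: one row per id-year, with the year taken from the variable suffix
reshape long inc, i(id) j(year)
list, sepby(id)

* Long to wide: back to one row per id
reshape wide inc, i(id) j(year)
list
```

In long form the data has six rows (two ids times three years) with the variables id, year, and inc. Reshaping back to wide restores the original two rows.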

When we first start working with data, usually in a statistics class, we mostly use clean, complete datasets as examples. Later on, we realize that data is not always clean or complete when doing research or data analysis for other purposes. In reality, we often need to put two or more datasets together before we can begin whatever statistical analysis we would like to perform. In this post, we demonstrate how to combine datasets using append, which combines row-wise by adding observations, and merge, which combines column-wise by adding variables.
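The difference between the two commands can be sketched with a few rows of made-up data. The variables id, score, and age are hypothetical, and temporary files stand in for real datasets on disk.

```stata
* Hypothetical data: scores for ids 1 and 2
clear
input id score
1 80
2 90
end
tempfile first
save `first'

* Scores for id 3, stored in a separate dataset
clear
input id score
3 70
end

* append: stack the observations (row-wise), giving 3 rows
append using `first'
sort id
tempfile scores
save `scores'

* A dataset with a different variable for the same ids
clear
input id age
1 30
2 40
3 50
end

* merge: add the score variable by matching on id (column-wise)
merge 1:1 id using `scores'
list
```

After merge, Stata adds a variable named _merge that records whether each observation was found in the master data, the using data, or both. Run the block as a do-file so the tempfile names remain defined throughout.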

Sometimes only part of a dataset is relevant to your analysis. In this post, we show you how to subset a dataset in Stata by variables or by observations. We use the census.dta dataset installed with Stata as the sample data.
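A minimal sketch of both kinds of subsetting with census.dta follows. The particular variables kept and the median-age cutoff are arbitrary choices for illustration.

```stata
* Load the example dataset shipped with Stata
sysuse census, clear

* Subset by variables: keep only the columns of interest
keep state region pop medage

* Subset by observations: keep states with a median age above 30
keep if medage > 30

* Inspect what remains
describe
list state medage in 1/5
```

The drop command works the same way in reverse: drop varlist removes variables, and drop if removes the observations that satisfy the condition.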

In this article, we demonstrate how to create new variables, recode existing variables, and label variables and their values. We use the census.dta data that is included with Stata for the examples.

generate: create variables

Here we use the generate command to create a new variable representing the population younger than 18 years old. We do so by summing two existing variables: poplt5 (population under 5 years old) and pop5_17 (population aged 5 to 17).
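The step described above might look like the following sketch. The new variable name pop_lt18 and its label are assumptions, as is the long storage type, which is added here only to keep large integer counts exact.

```stata
* Load the example dataset shipped with Stata
sysuse census, clear

* Sum the two age groups into a new variable
* (pop_lt18 is a hypothetical name; long keeps integer counts exact)
generate long pop_lt18 = poplt5 + pop5_17

* Attach a descriptive label and check the result
label variable pop_lt18 "Population younger than 18 years old"
summarize poplt5 pop5_17 pop_lt18
```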

In Stata, the first step in analyzing a dataset is opening the data so that Stata knows which file you are working with. Yes, you can simply double-click a Stata data file (one that ends in .dta) to open it, but we prefer to write syntax so that we can easily reproduce the same work or reuse the scripts for similar tasks. In this post, we introduce methods of reading in, using, and saving Stata data files and data files in other formats.
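As a rough sketch of the syntax-based approach, the commands below load, save, and re-read data. The file names mycensus.dta and mycensus.csv are hypothetical and are written to the current working directory.

```stata
* Load an example dataset shipped with Stata
sysuse census, clear

* Save it as a Stata data file (hypothetical file name)
save mycensus, replace

* Read the Stata file back in
use mycensus, clear

* Write the data out as a comma-separated text file
export delimited using mycensus.csv, replace

* Read the text file back in as a non-Stata format
import delimited using mycensus.csv, clear
describe
```

The clear option discards whatever data is currently in memory, and replace overwrites an existing file of the same name, so both should be used deliberately.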