Causal inference: an introduction

Welcome to causal inference. By the end of this course, you will know what causal inference is, what assumptions we need to make to draw causal conclusions from data, and how to find causal answers using quasi-experimental approaches. We suggest that you take this course before taking econometrics, since much of econometrics applies causal inference to find causal relationships among economic variables. This course is much less technical and lays the foundation for more advanced topics. Math is used minimally and only when it’s really, really needed.

Let’s get started 🔥🔥🔥

What is causal inference?

Statistical inference helps us decide whether there are relationships between variables and, if so, how strong those relationships are. It also helps us judge whether those relationships are due to chance or actually exist. Causal inference goes beyond simply finding relationships between variables: we are also interested in the direction of those relationships, that is, which variable causes changes in which.

Many questions, especially in public policy and the social sciences, are causal. To answer them, we need to compare different worlds (different outcomes) under different regimes (different causes). How, and by how much, do carbon emissions affect global warming? How does college affect earnings? What is the best way to foster economic growth across the developing world? Does drug A work better than drug B? These are all, more or less, causal questions.

Causal inference gives us the tools we need to find causal relationships among variables.

Non-causal questions, by contrast, ask for a statistical comparison of variables, the correlation between two variables, or a prediction of a variable. Here are some examples:

How much less do women earn compared to men?
What is the correlation between the number of rainy days and suicide?
What will be the stock valuation of Google one year from now?

Let’s talk specifically about correlations between variables. Associative relationships describe only how variables are linked (associated), without any assumption about which one causes the other(s). For instance, we may find a correlation between how much a dog wags its tail and how happy it is. This correlation doesn’t tell us which one causes the other.

The graph below summarizes what we mean by association. Tail wagging and happiness are associated, with no claim about which factor causes the other. That’s why the two variables are connected by a link with no arrow pointing from one to the other. In a future module, we’ll see that graphs like this (aka causal graphs) help us visualize relationships between variables.
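To see why a correlation alone can’t tell us the direction, here is a minimal Python sketch on made-up, simulated data (the variable names and numbers are purely illustrative, not real measurements). We deliberately generate tail wagging from happiness, yet the Pearson correlation comes out identical whichever variable we list first, so nothing in the number reveals which way the arrow points.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)
# Hypothetical simulated data: happiness scores, and tail wagging that
# rises with happiness plus independent noise (happiness -> wagging).
happiness = [rng.gauss(0, 1) for _ in range(1000)]
wagging = [h + rng.gauss(0, 1) for h in happiness]

r_hw = pearson_r(happiness, wagging)  # correlation "from" happiness
r_wh = pearson_r(wagging, happiness)  # correlation "from" wagging
print(round(r_hw, 3), round(r_wh, 3))
```

Both printed values are the same (roughly 0.7 for this setup). Correlation is symmetric by construction, which is exactly why it can only describe an association.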

Statistical concepts such as correlation, regression, conditional dependence, likelihood, propensity score, and odds ratio (concepts you might have heard of in your STAT101 course) all help us find these associative relationships.

An association is a relationship that can be defined in terms of a joint distribution of some observed variables. However, a causal concept is a relationship that can’t be defined from the distribution alone.
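Here is a minimal sketch of that idea with two made-up linear Gaussian models (both are hypothetical, chosen only for illustration). In Model A, X causes Y; in Model B, Y causes X. The parameters are picked so that both models produce the same joint distribution of (X, Y): variance 1 for X, variance 2 for Y, covariance 1. Observational data therefore cannot tell them apart. Setting X to 1 by hand (an intervention, often written do(X = 1) in the causal-inference literature) gives different answers, because only in Model A does Y respond to X.

```python
import math
import random

def sample_model_a(rng, do_x=None):
    """Hypothetical Model A (X causes Y): X ~ N(0, 1), Y = X + N(0, 1)."""
    x = rng.gauss(0, 1) if do_x is None else do_x
    y = x + rng.gauss(0, 1)
    return x, y

def sample_model_b(rng, do_x=None):
    """Hypothetical Model B (Y causes X): Y ~ N(0, 2), X = Y / 2 + N(0, 1/2)."""
    y = rng.gauss(0, math.sqrt(2))  # gauss() takes a standard deviation, not a variance
    x = (y / 2 + rng.gauss(0, math.sqrt(0.5))) if do_x is None else do_x
    return x, y

def moments(pairs):
    """Sample variance of X, covariance of X and Y, and variance of Y."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    var_x = sum((x - mx) ** 2 for x, _ in pairs) / n
    cov_xy = sum((x - mx) * (y - my) for x, y in pairs) / n
    var_y = sum((y - my) ** 2 for _, y in pairs) / n
    return var_x, cov_xy, var_y

N = 50_000
rng = random.Random(1)

# Observational data: both models should give (approximately) the same moments.
obs_a = [sample_model_a(rng) for _ in range(N)]
obs_b = [sample_model_b(rng) for _ in range(N)]
print("observational A:", [round(m, 2) for m in moments(obs_a)])
print("observational B:", [round(m, 2) for m in moments(obs_b)])

# Intervention: force X = 1 and average the resulting Y.
mean_y_do_a = sum(sample_model_a(rng, do_x=1.0)[1] for _ in range(N)) / N
mean_y_do_b = sum(sample_model_b(rng, do_x=1.0)[1] for _ in range(N)) / N
print("mean of Y under do(X=1), model A:", round(mean_y_do_a, 2))
print("mean of Y under do(X=1), model B:", round(mean_y_do_b, 2))
```

Both observational lines print values close to 1, 1 and 2, while the interventional means differ (close to 1 under Model A and close to 0 under Model B). The difference lives in the causal structure, not in the distribution of the observed data.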

So what, beyond regression models, do we need to know in order to find causal relations?

Regressions and causal inference

In many empirical studies, ranging from genetics to political science and economics, researchers have used regressions to study the relationships between variables.

Udny Yule’s 1899 paper was one of the earliest applications of regression models in understanding social phenomena. In his paper in the Journal of the Royal Statistical Society, Yule used regression theory to understand the “social physics” of poverty.

But poverty is not like gravity or Newtonian mechanics. We can always conjecture, but no known theory fully explains poverty, and the causal relationships among the relevant variables are not fully known. Many factors are likely involved. For example, areas with more efficient administrations may be more successful at reducing poverty by enforcing policies such as building homes for the poor.

Regression can answer causal questions (especially with observational data, which we will discuss later) only if the researcher makes certain statistical and causal assumptions. Leaving the statistical assumptions aside, if the causal assumptions are not met, a regression model tells us nothing beyond the associations among variables.

When Legendre and Gauss first developed regression models, they used them to understand physical phenomena, where measurements could be made with greater precision and the relationships and their directions were better known.

One causal assumption that we will discuss in this course is that who is affected by the cause (that is, how the treatment is assigned) is independent of (has nothing to do with) the outcomes each unit would have with and without the treatment. In the absence of this assumption, regression is simply a tool for descriptive inference and prediction.
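A small simulation shows what goes wrong without this assumption. The sketch below is entirely hypothetical: a made-up confounder (think "motivation") raises the outcome, and the treatment adds exactly 2 to every unit’s outcome. We compare the difference in mean outcomes between treated and untreated units, which is the same number you get as the slope from regressing the outcome on a 0/1 treatment indicator. When treatment is assigned by coin flip, assignment has nothing to do with the outcomes and the estimate lands near 2; when motivated units select into treatment, the same calculation is badly biased.

```python
import random

TRUE_EFFECT = 2.0  # hypothetical: the treatment raises every unit's outcome by 2

def difference_in_means(n, randomized, seed=0):
    """Simulate n units; return mean(Y | treated) - mean(Y | untreated).

    This equals the OLS slope from regressing Y on a 0/1 treatment indicator.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)             # hypothetical confounder, e.g. motivation
        y0 = 3.0 * u + rng.gauss(0, 1)  # outcome the unit would have without treatment
        y1 = y0 + TRUE_EFFECT           # outcome the unit would have with treatment
        if randomized:
            t = rng.random() < 0.5      # coin flip: assignment ignores y0 and y1
        else:
            t = u > 0                   # self-selection: assignment depends on u, hence on y0 and y1
        if t:
            treated.append(y1)          # only one of the two outcomes is ever observed
        else:
            control.append(y0)
    return sum(treated) / len(treated) - sum(control) / len(control)

randomized_estimate = difference_in_means(20_000, randomized=True)
self_selected_estimate = difference_in_means(20_000, randomized=False)
print("randomized assignment:", round(randomized_estimate, 2))
print("self-selected assignment:", round(self_selected_estimate, 2))
```

The randomized estimate should come out close to 2, while the self-selected one is far larger (about 6.8 in expectation for these invented parameters), even though the true effect is identical in both runs. The regression arithmetic is the same; only the way the treatment was assigned differs.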

We use the word treatment throughout this course, even though most of our examples are not from the medical field. Some authors use the word exposure instead, but treatment is more widely used.

In the absence of this assumption, a regression model can only answer questions of this nature 👇

What is the expected value of income for a person with a high school degree?

And NOT questions of this nature 👇

What is the causal effect of education on income?

Answering the first question doesn’t tell us whether having a high school degree causes higher income. The second question, by contrast, asks specifically about the causal relationship between education and income.

We can use the same kind of graph that we saw before to visualize the second question. Note that in this graph, an arrow goes from education to income, representing the causal effect of education on income.

Going back to Yule’s research on poverty: he only established associations between variables. His findings mainly refer to correlations or associations, without any assumption about which variable(s) cause the others. In fact, Yule was smart enough to acknowledge this in a tiny footnote to the paper:

Strictly speaking, for “due to” read “associated with.”

A formal definition of causal inference

Causal inference is, therefore, the study of relationships among variables: deciding which relationships are causal and under what assumptions. Causal inference also offers methods for drawing causal conclusions when some of those assumptions are not met.

A causal relationship describes how an outcome variable changes when the cause is changed. Causal inference, then, is inference from data about the causal relationships between variables.

Questions of causality are not only the subject of statistics. Many philosophers and social scientists have thought about causality, and there are many different ways to formally define causation. For the purpose of this course, we’ll stick to David Lewis’s definition.

David Lewis was one of the most important philosophers of the 20th century. When he wasn’t thinking about metaphysics and the philosophy of language, he studied the philosophy of probability and logic.

Lewis thought of a cause as something that makes a difference, where the difference it makes is a difference from what would have happened without it.
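Lewis’s idea can be written down as a tiny table. The sketch below uses three hypothetical units with invented outcomes: for each one we record the outcome with the cause and the outcome without it, and the unit’s effect is the difference between the two. The names Unit, y_with and y_without are illustrative choices, not standard terminology. The catch is that in real data we only ever observe one of those two outcomes per unit.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    name: str
    y_without: float  # hypothetical outcome if the cause is absent
    y_with: float     # hypothetical outcome if the cause is present
    treated: bool     # which of the two worlds actually happened

    def effect(self) -> float:
        # Lewis-style difference: the outcome with the cause versus
        # what would have happened without it
        return self.y_with - self.y_without

    def observed(self) -> float:
        # the only outcome a real dataset would contain for this unit
        return self.y_with if self.treated else self.y_without

units = [
    Unit("A", y_without=10.0, y_with=14.0, treated=True),
    Unit("B", y_without=12.0, y_with=12.0, treated=False),
    Unit("C", y_without=8.0, y_with=11.0, treated=True),
]

for u in units:
    print(u.name, "effect:", u.effect(), "observed:", u.observed())

average_effect = sum(u.effect() for u in units) / len(units)
print("average effect:", round(average_effect, 2))
```

Here the unit-level effects are 4, 0 and 3, but a dataset would contain only the observed column (14, 12 and 11). The missing "what would have happened" outcome is the idea that the potential outcomes framework formalizes.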

We will focus on the limitations of observational data and, despite those limitations, learn how to make the best use of available causal methods to infer from the data. We will also discuss empirical tools for estimating the size of causal effects. Along the way, we’ll get familiar with concepts such as confounders, randomization, quasi-experiments, intervention, and instrumental variables.

We will learn causal inference through the magical world of potential outcomes models, first introduced by Neyman and later developed and popularized by Rubin (the framework is often called the Neyman-Rubin model or the Rubin causal model).

So you’ll learn a lot about causal inference in this course. You will be as confident as this guy on a skateboard.
