
Reproducing Section 8.10 of Imbens and Rubin (2015) with Stan

Published on 2023/11/23

Overview

Chapters 1-11 of "Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction" deal with causal effect estimation under random assignment. I think this book is very educational in that it often uses concrete data as a starting point. The book is also distinctive in that DAGs do not appear at all, causal effects are estimated by treating the unobserved potential outcomes as a missing-value imputation problem, and the assignment mechanism is expressed as a random variable. In this blog post, I implemented the example in Section 8.10 with Stan. Some of the notation has been modified.

Target Readers

Readers who have read the book above and are somewhat familiar with Stan syntax.

Data

Let me explain the data. Here I estimate how much a job training program (the National Supported Work program, NSW program) increased earnings. Assignment to the program was completely randomized. The data are included in the Matching package in R and can also be obtained from the web site (direct link).

  • Outcome: income in 1978 after completing the training program (re78)
  • Covariates used: age (age), years of education (education), whether they are or have ever been married (married), whether they are high school dropouts (nodegree), ethnicity (black), income in 1974 (re74), whether income in 1974 was greater than zero (re74pos), income in 1975 (re75), whether income in 1975 was greater than zero (re75pos)

Although it is not plotted here, the histogram of the outcome has a large number of zero-valued observations and a heavy tail.

Causal Effect

In this book, each unit i has potential outcomes Y_{i}(0) and Y_{i}(1). Y_{i}(0) is observed when i is assigned to the control group (0), and Y_{i}(1) is observed when i is assigned to the treatment group (1). Because one of the two is always missing at the unit level, unit-level causal effects cannot be obtained (unless we make very strong assumptions). Therefore, we consider causal effects at the population level. The causal effect at the population level is obtained by comparing "the average outcome if all members of the population did not receive the treatment" with "the average outcome if all members of the population received the treatment". This can be obtained under relatively mild assumptions. Please refer to the book for more details on these assumptions.
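
To fix notation, with N units in the data, the two estimands that appear later in the code notes (the super-population and finite-sample average causal effects) can be written as follows; the symbols \tau^{\mathrm{sp}} and \tau^{\mathrm{fs}} are my shorthand, not necessarily the book's:

\tau^{\mathrm{sp}} = \mathbb{E}\left[ Y_{i}(1) - Y_{i}(0) \right], \qquad \tau^{\mathrm{fs}} = \frac{1}{N} \sum_{i=1}^{N} \left( Y_{i}(1) - Y_{i}(0) \right)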

Model

Four different models are used in Section 8.10:

  • Model 1: no covariates, correlations between potential outcomes, common variance
  • Model 2: no covariates, potential outcomes are independent, different variances
  • Model 3: with covariates, potential outcomes are independent, different variances
  • Model 4: with covariates, potential outcomes are independent, different variances, using a zero-inflated lognormal distribution

We will examine each one in detail below.

Model 1

Assume that the unit-level potential outcomes follow the bivariate normal distribution below:

\begin{pmatrix} Y_{i}(0) \\ Y_{i}(1) \end{pmatrix} \sim \mathcal{N} \left( \begin{pmatrix} \mu_{c} \\ \mu_{t} \end{pmatrix} , \begin{pmatrix} \sigma^2 & \sigma^2 \\ \sigma^2 & \sigma^2 \end{pmatrix} \right)

Here, the unknown parameters are \mu_{c}, \mu_{t}, and \sigma^2. I think the subscript c stands for control and t for treatment. The priors for \mu_{c} and \mu_{t} are noninformative normal priors with mean zero and variance 100^2. The prior for \sigma^2 is an inverse gamma distribution with parameters 1 and 0.01.

The correlation coefficient in the variance-covariance matrix is fixed at 1 for a reason: Chapter 6 shows that the variance of the estimated causal effect is largest when the correlation is 1, so this choice is conservative.

The causal effect at the population level is estimated by imputing the missing potential outcomes according to this model. The code is as follows:
https://github.com/MatsuuraKentaro/reproducing_references/blob/main/Imbens_and_Rubin_2015/chap08/model/chap08-model1.stan
Line 5: The correlation coefficient in the variance-covariance matrix is given as data.
Lines 44-45 and 47-48: The unobserved (missing) potential outcomes are imputed. Since one component of the bivariate normal distribution is observed, the other is generated from the corresponding conditional distribution (a sketch is given after these notes). If this is unfamiliar, search for "conditional distribution of a bivariate normal distribution" or similar.
Line 36: This corresponds to the average causal effect in the super-population.
Line 37: This corresponds to the average causal effect in the finite sample corresponding to the data at hand.
Lines 38-40: Instead of the quantiles of the unit-level causal effect, the differences between the quantiles of the control group and the quantiles of the treatment group are computed (see the book for details).
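
As a reminder, for a bivariate normal distribution with means \mu_{c}, \mu_{t}, standard deviations \sigma_{c}, \sigma_{t}, and correlation \rho, the conditional distribution used for this imputation is

Y_{i}(0) \mid Y_{i}(1) = y \sim \mathcal{N}\left( \mu_{c} + \rho \frac{\sigma_{c}}{\sigma_{t}} (y - \mu_{t}),\ \sigma_{c}^2 (1 - \rho^2) \right)

and symmetrically when Y_{i}(0) is observed. In Model 1, \sigma_{c} = \sigma_{t} = \sigma. A minimal Stan sketch of the whole model under these assumptions might look like the following; the variable names (N, Y, Z, rho, mu_c, mu_t, sigma_sq) are my own and do not necessarily match the linked chap08-model1.stan, which also computes the quantile differences omitted here:

```stan
data {
  int<lower=0> N;
  vector[N] Y;                        // observed outcome (re78)
  array[N] int<lower=0, upper=1> Z;   // treatment assignment
  real<lower=-1, upper=1> rho;        // correlation, passed in as data (close to 1)
}

parameters {
  real mu_c;
  real mu_t;
  real<lower=0> sigma_sq;             // common variance of the potential outcomes
}

transformed parameters {
  real<lower=0> sigma = sqrt(sigma_sq);
}

model {
  mu_c ~ normal(0, 100);
  mu_t ~ normal(0, 100);
  sigma_sq ~ inv_gamma(1, 0.01);
  for (n in 1:N) {
    Y[n] ~ normal(Z[n] == 1 ? mu_t : mu_c, sigma);
  }
}

generated quantities {
  real tau_sp = mu_t - mu_c;          // average causal effect in the super-population
  real tau_fs;                        // average causal effect in the finite sample
  {
    vector[N] y0;
    vector[N] y1;
    for (n in 1:N) {
      if (Z[n] == 1) {
        // Y(1) observed: impute Y(0) from the conditional normal distribution
        y1[n] = Y[n];
        y0[n] = normal_rng(mu_c + rho * (Y[n] - mu_t), sigma * sqrt(1 - square(rho)));
      } else {
        // Y(0) observed: impute Y(1)
        y0[n] = Y[n];
        y1[n] = normal_rng(mu_t + rho * (Y[n] - mu_c), sigma * sqrt(1 - square(rho)));
      }
    }
    tau_fs = mean(y1 - y0);
  }
}
```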

The R code to be executed is as follows:
https://github.com/MatsuuraKentaro/reproducing_references/blob/main/Imbens_and_Rubin_2015/chap08/run-model1.R
Line 5 sets the correlation coefficient. A correlation of exactly 1 (which leads to a conditional normal distribution with standard deviation 0) works fine in R, but causes an error in Stan, so a small value is subtracted from 1.

The results match those in the book.

Model 2

Assume that the unit-level potential outcomes follow the bivariate normal distribution below:

\begin{pmatrix} Y_{i}(0) \\ Y_{i}(1) \end{pmatrix} \sim \mathcal{N} \left( \begin{pmatrix} \mu_{c} \\ \mu_{t} \end{pmatrix} , \begin{pmatrix} \sigma_{c}^2 & 0 \\ 0 & \sigma_{t}^2 \end{pmatrix} \right)

The prior distributions for \sigma_{c}^2 and \sigma_{t}^2 are set to inverse gamma distributions with parameters 1 and 0.01, as in Model 1.

The code is almost the same as in Model 1, so it is omitted. The results are almost consistent with the book.

Model 3

Assume that the unit-level potential outcomes follow the bivariate normal distribution below:

\begin{pmatrix} Y_{i}(0) \\ Y_{i}(1) \end{pmatrix} \bigg| \vec{X}_i \sim \mathcal{N} \left( \begin{pmatrix} \vec{X_i}\cdot\vec{\beta_{c}} \\ \vec{X_i}\cdot\vec{\beta_{t}} \end{pmatrix} , \begin{pmatrix} \sigma_{c}^2 & 0 \\ 0 & \sigma_{t}^2 \end{pmatrix} \right)

The prior distributions for \vec{\beta}_{c} and \vec{\beta}_{t} are noninformative normal priors with mean zero and variance 100^2. The prior distributions for \sigma_{c}^2 and \sigma_{t}^2 are the same as in Model 2.

The code is as follows:
https://github.com/MatsuuraKentaro/reproducing_references/blob/main/Imbens_and_Rubin_2015/chap08/model/chap08-model3.stan
Line 6: Covariates are added.
Lines 22-23: The coefficients of the covariates are defined as vectors.
Lines 29 and 31: The mean of the normal distribution is a linear combination of the covariates and the coefficients (see the sketch below).
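
Under the same caveat as before (my own variable names, not necessarily those of the linked chap08-model3.stan), the data, parameters, and model blocks described above might look roughly like this. The missing potential outcomes are then imputed in generated quantities from normal distributions with means X[n] * beta_c and X[n] * beta_t; since the potential outcomes are assumed independent, no conditioning on the observed outcome is needed.

```stan
data {
  int<lower=0> N;
  int<lower=1> K;                     // number of covariates (including an intercept column)
  matrix[N, K] X;                     // covariate matrix
  vector[N] Y;                        // observed outcome
  array[N] int<lower=0, upper=1> Z;   // treatment assignment
}

parameters {
  vector[K] beta_c;                   // coefficients for the control potential outcome
  vector[K] beta_t;                   // coefficients for the treatment potential outcome
  real<lower=0> sigma_c_sq;
  real<lower=0> sigma_t_sq;
}

model {
  beta_c ~ normal(0, 100);
  beta_t ~ normal(0, 100);
  sigma_c_sq ~ inv_gamma(1, 0.01);
  sigma_t_sq ~ inv_gamma(1, 0.01);
  for (n in 1:N) {
    if (Z[n] == 1) {
      Y[n] ~ normal(X[n] * beta_t, sqrt(sigma_t_sq));
    } else {
      Y[n] ~ normal(X[n] * beta_c, sqrt(sigma_c_sq));
    }
  }
}
```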

The results are almost consistent with the book.

Model 4

In Models 1-3 above, a normal distribution was fitted without any special handling of the large number of zeros in the observed outcomes. Here, a zero-inflated lognormal distribution is used instead (the book does not use the term "zero-inflated lognormal distribution", but I think it is clearer to state that the data are assumed to follow that distribution).

\operatorname{Pr}\left(Y_i(0)>0 \mid \vec{X_i}\right)=\frac{1}{1+\exp \left(-\vec{X_i}\cdot\vec{\gamma_c}\right)}
Y_i(0) \mid Y_i(0)>0, \vec{X_i} \sim \operatorname{logNormal}\left(\vec{X_i}\cdot\vec{\beta_c}, \sigma_c^2\right)

Another zero-inflated lognormal distribution is fitted to the treatment group data in the same way.
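
Writing \pi_i = \operatorname{Pr}\left(Y_i(0)>0 \mid \vec{X_i}\right) for the logistic probability above, the density of a control-group outcome can be written as

p\left(y \mid \vec{X_i}\right) = \begin{cases} 1 - \pi_i & (y = 0) \\ \pi_i \cdot \operatorname{logNormal}\left(y \mid \vec{X_i}\cdot\vec{\beta_c}, \sigma_c^2\right) & (y > 0) \end{cases}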

The code is as follows:
https://github.com/MatsuuraKentaro/reproducing_references/blob/main/Imbens_and_Rubin_2015/chap08/model/chap08-model4.stan
Lines 2-8: The log likelihood of the zero-inflated lognormal (ZILN) distribution is defined.
Lines 10-17: Random number generation from the ZILN distribution is defined. A rough sketch of both functions is given below.
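
For reference, here is a minimal sketch of what such user-defined Stan functions could look like; the function names and signatures (ziln_lpdf, ziln_rng) are my assumptions and are not necessarily those in the linked chap08-model4.stan:

```stan
functions {
  // Log density of a zero-inflated lognormal: with probability theta the outcome
  // is positive and lognormal (mu and sigma on the log scale); otherwise it is exactly zero.
  real ziln_lpdf(real y, real theta, real mu, real sigma) {
    if (y == 0) {
      return bernoulli_lpmf(0 | theta);
    } else {
      return bernoulli_lpmf(1 | theta) + lognormal_lpdf(y | mu, sigma);
    }
  }

  // Random number generation from the same distribution
  // (usable in the generated quantities block for imputation).
  real ziln_rng(real theta, real mu, real sigma) {
    if (bernoulli_rng(theta) == 1) {
      return lognormal_rng(mu, sigma);
    }
    return 0.0;
  }
}
```

In the model block, a function defined this way can be used either as target += ziln_lpdf(Y[n] | theta, mu, sigma) or with the sampling-statement syntax Y[n] ~ ziln(theta, mu, sigma).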

The R code to be executed is as follows:
https://github.com/MatsuuraKentaro/reproducing_references/blob/main/Imbens_and_Rubin_2015/chap08/run-model4.R
The results are... somewhat inconsistent with the book! This inconsistency was discussed on the Stan Forums in the past. However, the regression coefficients for the covariates are almost consistent. I do not know the details because the code of the official implementation has not been published. If you write a book or paper, please be sure to publish your code for reproducibility.

Discussion