
Reproducing Section 10.7 of Imbens and Rubin (2015) with Stan

Published on 2023/11/29

Overview

The educational television program "The Electric Company" was shown to children with the goal of improving their reading comprehension. In each school, a pair of classes was selected; one class of the pair was randomly assigned to a standard class (control group) and the other to a class in which "The Electric Company" was shown (treatment group). We would like to estimate the causal effect of showing "The Electric Company". The notation is partially modified from the book's.

Data

I could not find the raw data. I therefore gave up on reproducing the analysis from the raw data and instead manually entered and saved the values in Table 10.1, which are the data after preprocessing.

G: school ID
W: Assignment. 0 for control group, 1 for treatment group.
X: Covariates. Pre-treatment scores.
Y: Outcomes. Post-treatment scores.
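As a rough illustration of this layout, the sketch below stores one record per class and checks the paired design, namely that every school contributes exactly one control class and one treated class. The `ClassRecord` type, the `check_paired_design` helper, and the scores are all made up for illustration; they are not the values in Table 10.1 and not the author's code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ClassRecord:
    G: int    # school (pair) ID
    W: int    # assignment: 0 = control, 1 = treatment
    X: float  # pre-treatment score (covariate)
    Y: float  # post-treatment score (observed outcome)


def check_paired_design(records):
    """Return True if every school has exactly one control and one treated class."""
    arms = defaultdict(list)
    for r in records:
        arms[r.G].append(r.W)
    return all(sorted(ws) == [0, 1] for ws in arms.values())


# Placeholder scores, made up for illustration; NOT the values in Table 10.1.
records = [
    ClassRecord(G=1, W=0, X=10.0, Y=50.0),
    ClassRecord(G=1, W=1, X=12.0, Y=60.0),
    ClassRecord(G=2, W=0, X=20.0, Y=70.0),
    ClassRecord(G=2, W=1, X=18.0, Y=75.0),
]
print(check_paired_design(records))
```

A check like this is cheap to run after entering a table by hand, since a mistyped school ID or assignment breaks the one-control, one-treated pattern.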

Model

Because there are 8 schools (pairs), assume that there are 8 strata (blocks). The missing potential outcomes are then imputed using a hierarchical model that allows the potential outcomes to differ by school. An advantage of Rubin-style Bayesian model-based analysis is that a hierarchical model can be incorporated easily in this way.

Assume that the potential outcomes of class $i$ in school $j$ follow the normal distribution below. The standard deviations $\sigma_c$ and $\sigma_t$ are assumed to be the same for all schools:

$$
\begin{pmatrix} Y_{i}(0) \\ Y_{i}(1) \end{pmatrix} \sim \mathcal{N} \left( \begin{pmatrix} \mu(j) + \vec{X}_i\cdot\vec{\beta} \\ \mu(j) + \gamma + \vec{X}_i\cdot\vec{\beta} \end{pmatrix} , \begin{pmatrix} \sigma_{c}^2 & 0 \\ 0 & \sigma_{t}^2 \end{pmatrix} \right)
$$

Assume that the school-level means are drawn independently from a common normal distribution:

$$
\mu(j) \sim \mathcal{N} \left(\mu_{\text{all}},\sigma_{\mu}^2\right)
$$
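To make the imputation step concrete, here is a minimal Python sketch (not the author's Stan code) of what the model implies for a single draw of the parameters. Because the covariance matrix is diagonal, the missing potential outcome of a class is drawn from its own normal distribution, independently of the observed one. The function name `impute_and_effect`, the 0-based school indices, and all numbers in the usage lines are assumptions made for illustration.

```python
import random


def impute_and_effect(G, W, X, Y_obs, mu, gamma, beta, sigma_c, sigma_t, rng):
    """For ONE parameter draw, impute each class's missing potential outcome
    and return the sample average of Y(1) - Y(0)."""
    diffs = []
    for g, w, x, y in zip(G, W, X, Y_obs):
        base = mu[g] + x * beta  # mean of Y(0) for this class
        if w == 1:
            y1 = y                                  # Y(1) is observed
            y0 = rng.gauss(base, sigma_c)           # Y(0) is imputed
        else:
            y0 = y                                  # Y(0) is observed
            y1 = rng.gauss(base + gamma, sigma_t)   # Y(1) is imputed
        diffs.append(y1 - y0)
    return sum(diffs) / len(diffs)


# Hypothetical inputs: two schools (indexed 0 and 1) with made-up scores
# and made-up parameter values standing in for one posterior draw.
rng = random.Random(0)
tau = impute_and_effect(
    G=[0, 0, 1, 1], W=[0, 1, 0, 1],
    X=[10.0, 12.0, 20.0, 18.0], Y_obs=[50.0, 60.0, 70.0, 75.0],
    mu=[40.0, 50.0], gamma=5.0, beta=1.0, sigma_c=3.0, sigma_t=3.0, rng=rng,
)
print(tau)
```

Repeating this for every posterior draw would give a posterior sample of the average treatment effect for the classes in the study.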

Set prior distributions for $\mu_{\text{all}}$, $\sigma_{\mu}^2$, $\sigma_c$, $\sigma_t$, $\gamma$, and $\vec{\beta}$. The book uses a two-parameter inverse chi-square distribution for the variance parameters, but I could not work out how to specify it. I therefore gave up on reproducing the book exactly and set noninformative prior distributions on the standard deviations.
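Putting the pieces together, the posterior targeted by the sampler can be written out as follows. This is a sketch under the assumptions of this post: the assignment is randomized within pairs, so only the observed outcomes enter the likelihood, and the two potential outcomes are independent given the parameters. The symbols $\theta$ (all parameters, including $\mu(1), \dots, \mu(8)$), $p_0$ (the prior on the hyperparameters), and $j[i]$ (the school of class $i$) are introduced here and do not appear in the book's notation.

$$
p(\theta \mid Y^{\text{obs}}, W, X) \propto p_0\left(\mu_{\text{all}}, \sigma_{\mu}, \sigma_c, \sigma_t, \gamma, \vec{\beta}\right) \prod_{j=1}^{8} \mathcal{N}\left(\mu(j) \mid \mu_{\text{all}}, \sigma_{\mu}^2\right) \prod_{i: W_i = 0} \mathcal{N}\left(Y_i^{\text{obs}} \mid \mu(j[i]) + \vec{X}_i\cdot\vec{\beta}, \sigma_c^2\right) \prod_{i: W_i = 1} \mathcal{N}\left(Y_i^{\text{obs}} \mid \mu(j[i]) + \gamma + \vec{X}_i\cdot\vec{\beta}, \sigma_t^2\right)
$$

With a flat prior, $p_0$ is constant over the allowed parameter range and drops out of the proportionality.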

The code is as follows. It relies on the fact that, when no prior is stated for a parameter, Stan implicitly uses a uniform prior over that parameter's declared range, which is improper (infinitely wide) when the range is unbounded:

With a hierarchical model, lines 26, 28, 41, and 43 can be written concisely by supplying a variable that maps the class index i to the school index j, such as i2j in line 7.
Line 22 specifies that the mean for each school follows a normal distribution.
The rest of the code is roughly the same as Model 1 in the previous post.
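The index-mapping idea in the first note can be sketched in Python as follows. This is an illustration, not the author's Stan code: `make_i2j` and `expand_to_classes` are hypothetical helper names, and the school IDs and means are made up. School indices are kept 1-based to mirror Stan's indexing, where the analogous operation would be a multi-index expression along the lines of `mu[i2j]`.

```python
def make_i2j(G):
    """Map each class's school ID to a 1-based school index,
    numbering schools in order of first appearance."""
    index_of = {}
    i2j = []
    for g in G:
        if g not in index_of:
            index_of[g] = len(index_of) + 1
        i2j.append(index_of[g])
    return i2j


def expand_to_classes(mu_school, i2j):
    """Look up the school-level mean for every class (i2j is 1-based)."""
    return [mu_school[j - 1] for j in i2j]


# Hypothetical school IDs for four classes and made-up school means.
i2j = make_i2j([101, 101, 205, 205])
mu_per_class = expand_to_classes([40.0, 50.0], i2j)
print(i2j, mu_per_class)
```

Keeping the mapping in one variable means the likelihood and the imputation step share the same lookup instead of each looping over schools separately.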

Here is the R code to execute:

Line 4: Because there is only one covariate, we build an $N \times 1$ matrix. Note that $\mu(j)$ already plays the role of an intercept, so no separate intercept term is needed.
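The shape requirement can be illustrated with a small Python sketch (the author's code for this step is in R and is not reproduced here). The helpers `as_column_matrix` and `linear_term` and the scores are hypothetical. The design matrix has one column per covariate and deliberately no column of ones, since a constant column would be redundant with the school-level means $\mu(j)$.

```python
def as_column_matrix(x):
    """Turn a length-N list of covariate values into an N x 1 matrix."""
    return [[float(v)] for v in x]


def linear_term(X, beta):
    """Compute X_i . beta for every row i of the design matrix."""
    return [sum(xk * bk for xk, bk in zip(row, beta)) for row in X]


# Made-up pre-treatment scores for three classes and a made-up coefficient.
X = as_column_matrix([10, 12, 20])
print(X)
print(linear_term(X, beta=[0.5]))
```

Writing the linear term against a matrix rather than a vector keeps the same code usable if more covariates are added later.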

The results are largely consistent with those in the book.
