Reproducing Section 10.7 of Imbens and Rubin (2015) with Stan
Overview
The educational television program "The Electric Company" is shown to children with the goal of improving their reading comprehension. In each school, a pair of classes is selected; one class is randomly assigned to remain a standard class (control group) and the other is randomly assigned to be shown "The Electric Company" (treatment group). We would like to estimate the causal effect of showing "The Electric Company". The notation is partially modified from the book's.
Data
I could not find the raw data, so rather than reproducing the analysis from the raw data, I manually entered and saved the data in Table 10.1, which is the data after preprocessing.

$G$: School ID.
$W$: Assignment. 0 for the control group, 1 for the treatment group.
$X$: Covariate. Pretreatment score.
$Y$: Outcome. Posttreatment score.
Model
Because there are 8 schools (pairs), assume that there are 8 strata (blocks). The missing potential outcomes are then imputed using a hierarchical model that allows for school-level differences in the potential outcomes. An advantage of Rubin-style Bayesian model-based analysis is that a hierarchical model can be incorporated easily in this way.
Assume that the potential outcome for a school
Assume that the mean for each school independently follows the following normal distribution:
Set prior distributions for
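One way to write a model of this general shape, given as a rough sketch rather than the book's exact specification, is the following, where $j[i]$ denotes the school of class $i$. The symbols $\gamma$ (treatment effect), $\beta$ (covariate coefficient), $\sigma_c$, $\sigma_t$, $\mu_0$, and $\sigma_\mu$ are assumptions introduced for illustration; the parameterization used in the book and in the original code may differ.

```latex
\begin{align}
Y_i(0) \mid \mu, \beta, \sigma_c &\sim \mathcal{N}\bigl(\mu(j[i]) + X_i \beta,\ \sigma_c^2\bigr), \\
Y_i(1) \mid \mu, \gamma, \beta, \sigma_t &\sim \mathcal{N}\bigl(\mu(j[i]) + \gamma + X_i \beta,\ \sigma_t^2\bigr), \\
\mu(j) \mid \mu_0, \sigma_\mu &\sim \mathcal{N}\bigl(\mu_0,\ \sigma_\mu^2\bigr), \qquad j = 1, \dots, 8.
\end{align}
```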
The code is as follows, relying on the fact that Stan assigns an improper flat (uniform) prior over a parameter's declared support when no prior is explicitly stated:
 When using the hierarchical model, lines 26, 28, 41, and 43 can be written neatly by providing a variable that maps the class index $i$ to the school index $j$, such as i2j in line 7.
 In line 22, the mean for each school follows a normal distribution.
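The role of such an index map can be illustrated with a small sketch in R, whose bracket indexing works the same way as Stan's multi-indexing. The values below are made up for illustration (three hypothetical schools with two classes each) and are not the data from Table 10.1; only the name i2j comes from the text.

```r
# Hypothetical example: 3 schools, 2 classes per school (not the Table 10.1 data).
J <- 3
i2j <- rep(seq_len(J), each = 2)   # class index i -> school index j

# Made-up school-level means, one per school.
mu <- c(10, 20, 30)

# Indexing by the map expands school-level values to class level in one step,
# so no explicit loop over classes is needed.
mu_by_class <- mu[i2j]
print(mu_by_class)
```

Each class picks up the mean of its own school, which is what lets the likelihood and imputation statements refer to a school-level parameter by class index.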
 The rest of the code is roughly the same as Model 1 in the previous post.
Here is the R code to execute:
 Line 4: Because the number of covariates is 1, we make the covariates an $N \times 1$ matrix. Note that there is $\mu(j)$, so the intercept term is not needed.
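Passing a single covariate as an $N \times 1$ matrix can be sketched in base R as follows, assuming the pretreatment scores start out as a plain numeric vector. The names x, X, and K and the four values are hypothetical placeholders, not the Table 10.1 data.

```r
# Hypothetical pretreatment scores for N = 4 classes (placeholder values).
x <- c(12.5, 13.0, 9.8, 11.2)
N <- length(x)

# A bare vector has no dim attribute; a Stan declaration such as
# matrix[N, K] expects a two-dimensional object, so reshape explicitly.
X <- matrix(x, nrow = N, ncol = 1)
print(dim(X))   # 4 1

# No column of ones is appended: the school-level means act as intercepts.
K <- ncol(X)
```

Keeping the covariates as a matrix, even with one column, means the same model code would also accept additional covariates later without changes to the data declaration.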
The results are largely consistent with those in the book.
Discussion