iTranslated by AI
Implementing Score Tests Using SymPy Symbolic Computation: A Case Study on Gamma Distribution Shape Parameter Testing
Introduction
This article was inspired by Sibley A, Li Z, Jiang Y, Li YJ, Chan C, Allen A, Owzar K. Facilitating the Calculation of the Efficient Score Using Symbolic Computing. Am Stat. 2018;72(2):199-205.
The probability density function of the Gamma distribution is given by the following equation:
Here,
The properties of the Gamma distribution vary significantly depending on the shape parameter. When the shape parameter is greater than 1, the distribution is concentrated around the mode; when it is 1 or less, it becomes a heavy-tailed distribution that diverges at 0.

Probability density function of the Gamma distribution
If we consider this as a distribution of wealth, one might say that a society with a shape parameter of 1 or less has a large wealth gap.
On the other hand, a scale parameter increasing by a factor of
Therefore, in Testing the Shape Parameter of the Gamma Distribution (or an example of a score test with nuisance parameters), I considered testing the shape parameter while treating the scale parameter as a nuisance parameter.
Specifically, let
- Parameter of interest: Shape parameter
\alpha - Nuisance parameter: Scale parameter
\theta
While such hand calculations are manageable, mistakes tend to increase as models become more complex, even if they can be solved analytically. Thus, this time, with an eye toward applications to more complex models, I will perform the same task using the symbolic computing capabilities of SymPy. Since I am accustomed to R, I will use SymPy via R using reticulate.
The Main Topic
First, install SymPy.
library(reticulate)
virtualenv_install("r-reticulate", "sympy")
If it is already installed, you can import it as follows:
library(reticulate)
reticulate::py_require("sympy")
sympy <- import("sympy")
stats <- import("sympy.stats")
Declare the symbols. In particular, we declare X as follows because we want to calculate the expected value later.
alpha <- sympy$Symbol("alpha", positive = TRUE)
theta <- sympy$Symbol("theta", positive = TRUE)
X <- stats$Gamma("X", alpha, theta)
Declare the likelihood per sample (i.e., the logarithm of the density function) and calculate the score and Fisher information.
logpdf <- sympy$simplify(sympy$log(stats$density(X)(X)))
# score
S_alpha <- sympy$simplify(sympy$diff(logpdf, alpha))
S_theta <- sympy$simplify(sympy$diff(logpdf, theta))
# Fisher information
I_11 <- sympy$simplify(-stats$Expectation(sympy$diff(S_alpha, alpha)))
I_12 <- sympy$simplify(-stats$Expectation(sympy$diff(S_alpha, theta)))
I_21 <- sympy$simplify(-stats$Expectation(sympy$diff(S_theta, alpha)))
I_22 <- sympy$simplify(-stats$Expectation(sympy$diff(S_theta, theta)))
Calculate the efficient information.
I_eff <- sympy$simplify((I_11 - I_12 * I_21 / I_22))
If you print this, you should find that it matches the result derived in Testing the Shape Parameter of the Gamma Distribution (or an example of a score test with nuisance parameters). Note that here
> print(I_eff)
polygamma(1, alpha) - 1/alpha
We provide the data using random numbers. Let the null hypothesis be
# Data
set.seed(1234)
x_data <- rgamma(5, shape = 2)
# alpha under the null hypothesis
alpha0 <- 1
Sum the scores and the efficient information over the number of samples.
# Sum of scores
S_alpha0 <- sympy$Integer(0)
S_theta0 <- sympy$Integer(0)
I_eff0 <- sympy$Integer(0)
for(val in x_data) {
S_alpha0 <- S_alpha0 + S_alpha$subs(dict(alpha = alpha0, X = val))
S_theta0 <- S_theta0 + S_theta$subs(dict(alpha = alpha0, X = val))
I_eff0 <- I_eff0 + I_eff$subs(dict(alpha=alpha0))
}
Solve for
sol_theta <- sympy$solve(sympy$Eq(S_theta0, 0), theta)
Substitute
U_eff0 <- sympy$simplify(
S_alpha0$subs(dict(theta=sol_theta[[1]]))^2/I_eff0$subs(dict(theta=sol_theta[[1]]))
)
Convert it to an R numeric value.
builtins <- import_builtins()
U_eff0_val <- py_to_r(builtins$float(U_eff0$evalf()))
Let's compare it with the following function derived in Testing the Shape Parameter of the Gamma Distribution (or an example of a score test with nuisance parameters).
gamma_shape_score_test <- function(x, alpha0) {
n <- length(x)
xbar <- mean(x)
logxbar <- mean(log(x))
U <- n * (logxbar - log(xbar) + log(alpha0) - digamma(alpha0))
Ieff <- n * (trigamma(alpha0) - 1 / alpha0)
stat <- (U^2) / Ieff
pval <- pchisq(stat, df = 1, lower.tail = FALSE)
list(
statistic = stat,
p.value = pval,
null.value = alpha0
)
}
res_r <- gamma_shape_score_test(x = x_data, 1)
You should see that almost identical values were obtained.
> print(U_eff0_val)
[1] 1.145261
> print(res_r$statistic)
[1] 1.145261
> print(all.equal(res_r$statistic, U_eff0_val))
[1] TRUE
The full R code can be found here:
That is all.
Discussion