Causal Mediation Analysis for Stochastic Interventions

Background

We are interested in assessing the population intervention direct effect and the population intervention indirect effect, based on the effect decomposition of the population intervention effect introduced in Dı́az and Hejazi (2020).

To proceed, we’ll use as our running example a simple data set from an observational study of the relationship between BMI and kids behavior, distributed as part of the mma R package on CRAN. First, let’s load the packages we’ll be using and set a seed; then, load this data set and take a quick look at it

# preliminaries
library(data.table)
library(dplyr)
library(tidyr)
library(sl3)
library(medshift)
library(mma)
set.seed(429153)

# load and examine data
data(weight_behavior)
dim(weight_behavior)
## [1] 691  15
head(weight_behavior)
##        bmi  age sex  race numpeople car gotosch snack tvhours cmpthours
## 1 18.20665 12.2   F OTHER         5   3       2     1       4         0
## 2 22.78401 12.8   M OTHER         4   3       2     1       4         2
## 3 19.60725 12.6   F OTHER         4   2       4     2      NA        NA
## 4 25.56754 12.1   M OTHER         2   3       2     1       0         2
## 5 15.07408 12.3   M OTHER         4   1       2     1       2         1
## 6 22.98338 11.8   M OTHER         4   1       1     1       4         3
##   cellhours sports exercises sweat overweigh
## 1         0      2         2     1         0
## 2         0      1         8     2         0
## 3        NA   <NA>         4     2         0
## 4         0      2         9     1         1
## 5         3      1        12     1         0
## 6         2      1         1     1         0

The documentation for the data set describes it as a “database obtained from the Louisiana State University Health Sciences Center, New Orleans, by Dr. Richard Scribner. He explored the relationship between BMI and kids behavior through a survey at children, teachers and parents in Grenada in 2014. This data set includes 691 observations and 15 variables.”

Unfortunately, the data set contains a few observations with missing values. As these are unrelated to the object of our analysis, we’ll simply remove these for the time being. Note that in a real data analysis, we might consider strategies to fully make of the observed data, perhaps by imputing missing values. For now, we simply remove the incomplete observations, resulting in a data set with fewer observations but much the same structure as the original:

## [1] 567  15
##        bmi  age sex  race numpeople car gotosch snack tvhours cmpthours
## 1 18.20665 12.2   F OTHER         5   3       2     1       4         0
## 2 22.78401 12.8   M OTHER         4   3       2     1       4         2
## 3 25.56754 12.1   M OTHER         2   3       2     1       0         2
## 4 15.07408 12.3   M OTHER         4   1       2     1       2         1
## 5 22.98338 11.8   M OTHER         4   1       1     1       4         3
## 6 19.15658 12.1   F OTHER         3   3       2     1       0         0
##   cellhours sports exercises sweat overweigh
## 1         0      1         2     1         0
## 2         0      0         8     2         0
## 3         0      1         9     1         1
## 4         3      0        12     1         0
## 5         2      0         1     1         0
## 6         1      0         1     3         0

For the analysis of this observational data set, we focus on the effect of participating in a sports team (sports) on the BMI of children (bmi), taking several related covariates as mediators (snack, exercises, overweigh) and all other collected covariates as potential confounders. Considering an NPSEM, we separate the observed variables from the data set into their corresponding nodes as follows

Y <- weight_behavior_complete$bmi
A <- weight_behavior_complete$sports
Z <- weight_behavior_complete %>%
  select(snack, exercises, overweigh)
W <- weight_behavior_complete %>%
  select(age, sex, race, numpeople, car, gotosch, tvhours, cmpthours,
         cellhours, sweat)

Finally, in our analysis, we consider an incremental propensity score intervention (IPSI), as first proposed by Kennedy (2018), wherein the odds of participating in a sports team is modulated by some fixed amount (0 ≤ δ ≤ ∞) for each individual. Such an intervention may be interpreted as the effect of a school program that motivates children to participate in sports teams. To exemplify our approach, we postulate a motivational intervention that triples the odds of participating in a sports team for each individual:

delta_shift_ipsi <- 3

To easily incorporate ensemble machine learning into the estimation procedure, we rely on the facilities provided in the sl3 R package (coyle2020sl3?). For a complete guide on using the sl3 R package, consider consulting https://tlverse.org/sl3, or https://tlverse.org (and https://github.com/tlverse) for the tlverse ecosystem, of which sl3 is a major part. We construct an ensemble learner using a handful of popular machine learning algorithms below

# SL learners used for continuous data (the nuisance parameter M)
xgb_contin_lrnr <- Lrnr_xgboost$new(nrounds = 50, objective = "reg:linear")
enet_contin_lrnr <- Lrnr_glmnet$new(alpha = 0.5, family = "gaussian",
                                    nfolds = 3)
lasso_contin_lrnr <- Lrnr_glmnet$new(alpha = 1, family = "gaussian",
                                     nfolds = 3)
fglm_contin_lrnr <- Lrnr_glm_fast$new(family = gaussian())
contin_lrnr_lib <- Stack$new(enet_contin_lrnr, lasso_contin_lrnr,
                             fglm_contin_lrnr, xgb_contin_lrnr)
sl_contin_lrnr <- Lrnr_sl$new(learners = contin_lrnr_lib,
                              metalearner = Lrnr_nnls$new())

# SL learners used for binary data (nuisance parameters G and E in this case)
xgb_binary_lrnr <- Lrnr_xgboost$new(nrounds = 50, objective = "reg:logistic")
enet_binary_lrnr <- Lrnr_glmnet$new(alpha = 0.5, family = "binomial",
                                    nfolds = 3)
lasso_binary_lrnr <- Lrnr_glmnet$new(alpha = 1, family = "binomial",
                                     nfolds = 3)
fglm_binary_lrnr <- Lrnr_glm_fast$new(family = binomial())
binary_lrnr_lib <- Stack$new(enet_binary_lrnr, lasso_binary_lrnr,
                             fglm_binary_lrnr, xgb_binary_lrnr)
logistic_metalearner <- make_learner(Lrnr_solnp,
                                     metalearner_logistic_binomial,
                                     loss_loglik_binomial)
sl_binary_lrnr <- Lrnr_sl$new(learners = binary_lrnr_lib,
                              metalearner = logistic_metalearner)

Decomposing the population intervention effect

We may decompose the population intervention effect (PIE) in terms of a population intervention direct effect (PIDE) and a population intervention indirect effect (PIIE):

This decomposition of the PIE as the sum of the population intervention direct and indirect effects has an interpretation analogous to the corresponding standard decomposition of the average treatment effect. In the sequel, we will compute each of the components of the direct and indirect effects above using appropriate estimators as follows

  • For 𝔼{Y(A, Z)}, the sample mean $\frac{1}{n}\sum_{i=1}^n Y_i$ is sufficient;
  • for 𝔼{Y(Aδ, Z)}, an efficient one-step estimator for the effect of a joint intervention altering the exposure mechanism but not the mediation mechanism, as proposed in Dı́az and Hejazi (2020); and,
  • for 𝔼{Y(Aδ, ZAδ)}, an efficient one-step estimator for the effect of a joint intervention altering both the exposure and mediation mechanisms, as proposed in Kennedy (2018) and implemented in the npcausal R package.

Estimating the effect decomposition term

As given in Dı́az and Hejazi (2020), the statistical functional identifying the decomposition term that appears in both the PIDE and PIIE 𝔼{Y(Aδ, Z)}, which corresponds to altering the exposure mechanism while keeping the mediation mechanism fixed, is for which a one-step estimator is available. The corresponding efficient influence function (EIF) with respect to the nonparametric model is Dη, δ(o) = Dη, δY(o) + Dη, δA(o) + Dη, δZ, W(o) − θ(δ). The one-step estimator may be computed using the EIF estimating equation, making use of cross-fitting (Zheng and van der Laan 2011; Chernozhukov et al. 2018) to circumvent any need for entropy conditions (i.e., Donsker class restrictions). The resultant estimator is which is implemented in the medshift R package. We make use of that implementation to estimate 𝔼{Y(Aδ, Z)} via its one-step estimator θ̂(δ) below

# let's compute the parameter where A (but not Z) are shifted
theta_eff <- medshift(W = W, A = A, Z = Z, Y = Y,
                      delta = delta_shift_ipsi,
                      g_learners = sl_binary_lrnr,
                      e_learners = sl_binary_lrnr,
                      m_learners = sl_contin_lrnr,
                      phi_learners = Lrnr_hal9001$new(),
                      estimator = "onestep",
                      estimator_args = list(cv_folds = 3))
## [02:46:10] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:10] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:11] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:11] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:11] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:11] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:11] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:11] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:11] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:11] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:11] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:47] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:47] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:48] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:48] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:48] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:48] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:48] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:48] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:48] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:48] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:46:48] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:47:07] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:47:07] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:47:07] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:47:07] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:47:07] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:47:07] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:47:07] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:47:08] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:47:08] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:47:08] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
## [02:47:08] WARNING: src/objective/regression_obj.cu:213: reg:linear is now deprecated in favor of reg:squarederror.
summary(theta_eff)
##       lwr_ci    param_est       upr_ci    param_var     eif_mean    estimator 
##     18.77974    19.133084    19.486428     0.032501 9.767717e-16      onestep

Estimating the direct effect

Recall that, based on the decomposition outlined previously, the population intervention direct effect may be denoted βPIDE(δ) = θ0(δ) − 𝔼Y. Thus, an estimator of the PIDE, β̂PIDE(δ) may be expressed as a composition of estimators of its constituent parameters:

Based on the above, we may construct an estimator of the PIDE using quantities already computed. The convenience function below applies the simple delta method required in the case of a linear contrast between the two constituent parameters:

# convenience function to compute inference via delta method: EY1 - EY0
linear_contrast <- function(params, eifs, ci_level = 0.95) {
  # bounds for confidence interval
  ci_norm_bounds <- c(-1, 1) * abs(stats::qnorm(p = (1 - ci_level) / 2))
  param_est <- params[[1]] - params[[2]]
  eif <- eifs[[1]] - eifs[[2]]
  se_eif <- sqrt(var(eif) / length(eif))
  param_ci <- param_est + ci_norm_bounds * se_eif
  # parameter and inference
  out <- c(param_ci[1], param_est, param_ci[2])
  names(out) <- c("lwr_ci", "param_est", "upr_ci")
  return(out)
}

With the above convenience function in hand, we’ll construct or extract the necessary components from existing objects and simply apply the function:

# parameter estimates and EIFs for components of direct effect
EY <- mean(Y)
eif_EY <- Y - EY
params_de <- list(theta_eff$theta, EY)
eifs_de <- list(theta_eff$eif, eif_EY)

# direct effect = EY - estimated quantity
de_est <- linear_contrast(params_de, eifs_de)
de_est
##       lwr_ci    param_est       upr_ci 
## -0.481588292  0.005976251  0.493540793

As given above, we have for our estimate of the direct effect β̂PIDE(δ)= 0.006.

References

Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2018. “Double/Debiased Machine Learning for Treatment and Structural Parameters.” The Econometrics Journal 21 (1). https://doi.org/10.1111/ectj.12097.
Dı́az, Iván, and Nima S Hejazi. 2020. “Causal Mediation Analysis for Stochastic Interventions.” Journal of the Royal Statistical Society: Series B (Statistical Methodology). https://doi.org/10.1111/rssb.12362.
Kennedy, Edward H. 2018. “Nonparametric Causal Effects Based on Incremental Propensity Score Interventions.” Journal of the American Statistical Association.
Zheng, Wenjing, and Mark J van der Laan. 2011. “Cross-Validated Targeted Minimum-Loss-Based Estimation.” In Targeted Learning, 459–74. Springer.