Variable Importance Analysis with Stochastic Interventions

Introduction

Stochastic treatment regimes present a relatively simple manner in which to assess the effects of continuous treatments by way of parameters that examine the effects induced by the counterfactual shifting of the observed values of a treatment of interest. Here, we present an implementation of a new algorithm for computing targeted minimum loss-based estimates of treatment shift parameters defined based on a shifting function d(A, W). For a technical presentation of the algorithm, the interested reader is invited to consult Díaz and van der Laan (2018). For additional background on Targeted Learning and previous work on stochastic treatment regimes, please consider consulting van der Laan and Rose (2011), van der Laan and Rose (2018), and Díaz and van der Laan (2012).

To start, let’s load the packages we’ll use and set a seed for simulation:

library(data.table)
library(sl3)
library(tmle3)
library(tmle3shift)
set.seed(429153)

Data and Notation

Consider n observed units O1, …, On, where each random variable O = (W, A, Y) corresponds to a single observational unit. Let W denote baseline covariates (e.g., age, sex, education level), A an intervention variable of interest (e.g., nutritional supplements), and Y an outcome of interest (e.g., disease status). Though it need not be the case, let A be continuous-valued, i.e., A ∈ ℝ. Let Oi ∼ 𝒫 ∈ ℳ, where ℳ is the nonparametric statistical model defined as the set of continuous densities on O with respect to some dominating measure. To formalize the definition of stochastic interventions and their corresponding causal effects, we introduce a nonparametric structural equation model (NPSEM), based on Pearl (2000), to define how the system changes under posited interventions: $$W = f_W(U_W), \quad A = f_A(W, U_A), \quad Y = f_Y(A, W, U_Y).$$ We denote the observed data structure O = (W, A, Y).

Letting A denote a continuous-valued treatment, we assume that the distribution of A conditional on W = w has support in the interval (l(w), u(w)) – for convenience, let this support be a.e. That is, the minimum natural value of treatment A for an individual with covariates W = w is l(w); similarly, the maximum is u(w). A simple stochastic intervention may then be defined through a shift δ, where 0 ≤ δ ≤ u(w) is an arbitrary pre-specified value that determines the degree to which the observed value A is to be shifted, where possible.

Simulate Data

# simulate simple data for tmle-shift sketch
n_obs <- 1000 # number of observations
n_w <- 1 # number of baseline covariates
tx_mult <- 2 # multiplier for the effect of W = 1 on the treatment

# baseline covariates -- simple, binary
W <- as.numeric(replicate(n_w, rbinom(n_obs, 1, 0.5)))

# create treatment based on baseline W
A <- as.numeric(rnorm(n_obs, mean = tx_mult * W, sd = 1))

# create outcome as a linear function of A, W + white noise
Y <- A + W + rnorm(n_obs, mean = 0, sd = 0.5)

The above composes our observed data structure O = (W, A, Y). To formally express this fact using the tlverse grammar introduced by the tmle3 package, we create a single data object and specify the functional relationships between the nodes in the directed acyclic graph (DAG) via nonparametric structural equation models (NPSEMs), reflected in the node list that we set up:

# organize data and nodes for tmle3
data <- data.table(W, A, Y)
node_list <- list(W = "W", A = "A", Y = "Y")
head(data)
##        W          A          Y
##    <num>      <num>      <num>
## 1:     1  2.4031607  3.7157578
## 2:     1  4.4973744  5.9651611
## 3:     1  2.0330871  2.2531970
## 4:     0 -0.8089023 -0.8849531
## 5:     1  1.8432067  2.7193091
## 6:     1  1.3555863  2.5705832

We now have an observed data structure (data) and a specification of the role that each variable in the data set plays as the nodes in a DAG.
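Because the simulation above makes Y linear in A and W, the mean counterfactual outcome under an unbounded additive shift d(A, W) = A + δ has a closed form, which gives a benchmark for the estimates reported later. The sketch below is an illustration in base R only. The helper true_psi is hypothetical, and the calculation deliberately ignores any bound on the density ratio, so it describes the unbounded shift rather than the bounded one used by the software.

```r
# Values copied from the simulation above
tx_mult <- 2   # effect of W = 1 on the mean of A
p_w <- 0.5     # P(W = 1)

# Hypothetical helper: E[Y_{A + delta}] = E[A] + delta + E[W],
# since Y = A + W + mean-zero noise and E[A] = tx_mult * P(W = 1)
true_psi <- function(delta) tx_mult * p_w + delta + p_w

delta_grid <- seq(-1, 1, 1)
psi_truth <- vapply(delta_grid, true_psi, numeric(1))

# Monte Carlo check of the same quantity on a large simulated sample
set.seed(1)
n_mc <- 1e5
W_mc <- rbinom(n_mc, 1, p_w)
A_mc <- rnorm(n_mc, mean = tx_mult * W_mc, sd = 1)
psi_mc <- vapply(delta_grid, function(d) mean((A_mc + d) + W_mc), numeric(1))

rbind(analytic = psi_truth, monte_carlo = psi_mc)
```

Under these assumptions the benchmark is linear in δ with unit slope, which is the pattern a linear working model over the grid of shifts would be expected to recover.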

Methodology

Defining a grid of counterfactual interventions

In order to specify a grid of shifts δ to be used in defining a set of stochastic intervention policies in an a priori manner, let us consider an arbitrary scalar δ that defines a counterfactual outcome ψn = Qn(d(A, W), W), where, for simplicity, let d(A, W) = A + δ. A simplified expression of the auxiliary covariate for the TML estimator of ψ is $H_n = \frac{g^{\star}(a \mid w)}{g(a \mid w)}$, where g⋆(a ∣ w) defines the treatment mechanism with the stochastic intervention implemented. To ascertain whether a given choice of the shift δ is admissible – that is, whether such an intervention may be implemented while avoiding violations of the positivity assumption – define a bound $C(\delta) := \frac{g^{\star}(a \mid w)}{g(a \mid w)} \leq M$, where g⋆(a ∣ w) is a function of δ in part, and M is a potentially user-specified upper bound of C(δ). Then, C(δ) may be interpreted as a measure of the influence of a given observation, which provides a way to limit the maximum influence of any observation through the choice of the shift δ and the setting of the bound M.
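To make the ratio C(δ) concrete, the following base R sketch evaluates it for the additive shift d(a, w) = a + δ, taking the intervened density at a to be g(a − δ ∣ w). It assumes the treatment density is known and normal, A ∣ W = w ∼ N(2w, 1), as in the simulated data; in practice g would be estimated. The names g_true and shift_ratio are hypothetical helpers, not part of any package.

```r
set.seed(1)
n <- 1000
w <- rbinom(n, 1, 0.5)
a <- rnorm(n, mean = 2 * w, sd = 1)

# Assumed (known) treatment density: A | W = w ~ N(2w, 1)
g_true <- function(a, w) dnorm(a, mean = 2 * w, sd = 1)

# C(delta) = g(a - delta | w) / g(a | w) for the shift d(a, w) = a + delta
shift_ratio <- function(a, w, delta) g_true(a - delta, w) / g_true(a, w)

M <- 2                                # user-specified bound on the ratio
ratio <- shift_ratio(a, w, delta = 1)
prop_admissible <- mean(ratio <= M)   # share of units whose ratio respects M
prop_admissible
```

For this normal density with unit variance the ratio simplifies to exp(δ(a − 2w) − δ²/2), so units with treatment values far above their conditional mean are the ones most likely to violate the bound under a positive shift.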

We formalize and extend the procedure to determine an acceptable set of values for the shift δ in the sequel. Specifically, let there be a shift d(a, w) = a + δ(a, w), where the shift δ(a, w) equals δ whenever δmin(a, w) ≤ δ ≤ δmax(a, w) and is otherwise truncated to the nearer of these two limits, with $$\delta_{\text{max}}(a, w) = \text{argmax}_{\left\{\delta \geq 0, \frac{g(a - \delta \mid w)}{g(a \mid w)} \leq M \right\}} \frac{g(a - \delta \mid w)}{g(a \mid w)}$$ and $$\delta_{\text{min}}(a, w) = \text{argmin}_{\left\{\delta \leq 0, \frac{g(a - \delta \mid w)}{g(a \mid w)} \leq M \right\}} \frac{g(a - \delta \mid w)}{g(a \mid w)}.$$

The above provides a strategy for implementing a shift at the level of a given observation (a, w), thereby allowing for all observations to be shifted to an appropriate value – whether δmin, δ, or δmax. For the purpose of using such a shift in practice, the present software provides the functions shift_additive_bounded and shift_additive_bounded_inv, which define a variation of this shift. This variation corresponds to an intervention in which the natural value of treatment of a given observational unit is shifted by a value δ in the case that the ratio of the intervened density g⋆(a ∣ w) to the natural density g(a ∣ w) (that is, C(δ)) does not exceed a bound M. In the case that the ratio C(δ) exceeds the bound M, the stochastic intervention policy does not apply to the given unit, which remains at its natural value of treatment a.
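The bounded rule described above can be sketched in a few lines of base R. This is a standalone illustration, not the tmle3shift implementation: bounded_shift and g_normal are hypothetical names, and the treatment density is assumed known and normal, A ∣ W = w ∼ N(2w, 1), rather than estimated.

```r
# Hypothetical helper (not the package's shift_additive_bounded): shift a by
# delta only where the density ratio C(delta) stays at or below the bound M;
# otherwise leave the unit at its natural treatment value.
bounded_shift <- function(a, w, delta, M, g_dens) {
  ratio <- g_dens(a - delta, w) / g_dens(a, w)
  ifelse(ratio <= M, a + delta, a)
}

# Assumed treatment density: A | W = w ~ N(2w, 1)
g_normal <- function(a, w) dnorm(a, mean = 2 * w, sd = 1)

a <- c(-1, 0, 1, 2, 3.5)
w <- c(0, 0, 0, 1, 1)
shifted <- bounded_shift(a, w, delta = 1, M = 2, g_dens = g_normal)
data.frame(a, w, shifted)
```

In this toy input only the last unit, whose treatment lies 1.5 standard deviations above its conditional mean, has a ratio above M = 2 and so keeps its natural value; the others are shifted by δ = 1.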

Interlude: Constructing Optimal Stacked Regressions with sl3

To easily incorporate ensemble machine learning into the estimation procedure, we rely on the facilities provided in the sl3 R package. For a complete guide on using the sl3 R package, consider consulting https://tlverse.org/sl3, or https://tlverse.org for the tlverse ecosystem, of which sl3 is a core engine.

Using the framework provided by the sl3 package, the nuisance parameters of the TML estimator may be fit with ensemble learning, using the cross-validation framework of the Super Learner algorithm of van der Laan, Polley, and Hubbard (2007). To estimate the treatment mechanism (often denoted “g” in the targeted learning literature), we must make use of learning algorithms specifically suited to conditional density estimation; a list of such learners may be extracted from sl3 by using sl3_list_learners():

sl3_list_learners("density")
## [1] "Lrnr_density_discretize"     "Lrnr_density_hse"           
## [3] "Lrnr_density_semiparametric" "Lrnr_haldensify"            
## [5] "Lrnr_solnp_density"

To proceed, we’ll select two of the above learners, Lrnr_haldensify for using the highly adaptive lasso for conditional density estimation, based on an algorithm given by Díaz and van der Laan (2011), and Lrnr_density_semiparametric, an approach for semiparametric conditional density estimation:

# learners used for conditional density regression (i.e., propensity score)
haldensify_lrnr <- Lrnr_haldensify$new(
  n_bins = 3, grid_type = "equal_mass",
  lambda_seq = exp(seq(-1, -9, length = 100))
)
hse_lrnr <- Lrnr_density_semiparametric$new(mean_learner = Lrnr_glm$new())
mvd_lrnr <- Lrnr_density_semiparametric$new(mean_learner = Lrnr_glm$new(),
                                            var_learner = Lrnr_mean$new())
sl_lrn_dens <- Lrnr_sl$new(
  learners = list(haldensify_lrnr, hse_lrnr, mvd_lrnr),
  metalearner = Lrnr_solnp_density$new()
)

We also require an approach for estimating the outcome regression (often denoted “Q” in the targeted learning literature). For this, we build a Super Learner composed of an intercept model, a GLM, and the xgboost algorithm for gradient boosting:

# learners used for conditional expectation regression (e.g., outcome)
mean_lrnr <- Lrnr_mean$new()
glm_lrnr <- Lrnr_glm$new()
xgb_lrnr <- Lrnr_xgboost$new()
sl_lrn <- Lrnr_sl$new(
  learners = list(mean_lrnr, glm_lrnr, xgb_lrnr),
  metalearner = Lrnr_nnls$new()
)

We can make the above explicit with respect to standard notation by bundling the ensemble learners into a list object below.

# specify outcome and treatment regressions and create learner list
Q_learner <- sl_lrn
g_learner <- sl_lrn_dens
learner_list <- list(Y = Q_learner, A = g_learner)

The learner_list object above specifies the role that each of the ensemble learners we’ve generated is to play in computing initial estimators to be used in building a TMLE for the parameter of interest here. In particular, it makes explicit the fact that our Q_learner is used in fitting the outcome regression while our g_learner is used in fitting our treatment mechanism regression.

Initializing vimshift through its tmle3_Spec

To start, we will initialize a specification for the TMLE of our parameter of interest (called a tmle3_Spec in the tlverse nomenclature) simply by calling tmle_vimshift_delta. We specify the argument shift_grid = seq(-1, 1, by = 1) when initializing the tmle3_Spec object to communicate that we’re interested in assessing the mean counterfactual outcome over a grid of shifts -1, 0, 1 on the scale of the treatment A (note that these shift values are chosen arbitrarily for this example).

# what's the grid of shifts we wish to consider?
delta_grid <- seq(-1, 1, 1)

# initialize a tmle specification
tmle_spec <- tmle_vimshift_delta(shift_fxn = shift_additive_bounded,
                                 shift_fxn_inv = shift_additive_bounded_inv,
                                 shift_grid = delta_grid,
                                 max_shifted_ratio = 2)

As seen above, the tmle_vimshift specification object (like all tmle3_Spec objects) does not store the data for our specific analysis of interest. Later, we’ll see that passing a data object directly to the tmle3 wrapper function, alongside the instantiated tmle_spec, will serve to construct a tmle3_Task object internally (see the tmle3 documentation for details).

Targeted Estimation of Stochastic Interventions Effects

One may walk through the step-by-step procedure for fitting the TML estimator of the mean counterfactual outcome under each shift in the grid, using the machinery exposed by the tmle3 R package (see below); however, the step-by-step procedure is often not of interest.

# NOT RUN -- SEE NEXT CODE CHUNK

# define data (from tmle3_Spec base class)
tmle_task <- tmle_spec$make_tmle_task(data, node_list)

# define likelihood (from tmle3_Spec base class)
likelihood_init <- tmle_spec$make_initial_likelihood(tmle_task, learner_list)

# define update method (fluctuation submodel and loss function)
updater <- tmle_spec$make_updater()
likelihood_targeted <- Targeted_Likelihood$new(likelihood_init, updater)

# invoke params specified in spec
tmle_params <- tmle_spec$make_params(tmle_task, likelihood_targeted)
updater$tmle_params <- tmle_params

# fit TML estimator update
tmle_fit <- fit_tmle3(tmle_task, likelihood_targeted, tmle_params, updater)

# extract results from tmle3_Fit object
tmle_fit

Instead, one may invoke the tmle3 wrapper function (a user-facing convenience utility) to fit the series of TML estimators (one for each parameter defined by the grid delta) in a single function call:

# fit the TML estimator
tmle_fit <- tmle3(tmle_spec, data, node_list, learner_list)
## 1% of observations outside training support...predictions trimmed.
## 1% of observations outside training support...predictions trimmed.
## 
## Iter: 1 fn: 1383.2954     Pars:  0.38063 0.34552 0.27385
## Iter: 2 fn: 1383.2954     Pars:  0.38063 0.34552 0.27385
## solnp--> Completed in 2 iterations
## ... (many further "N% of observations outside training support...predictions trimmed." messages omitted)
## Warning in ED * private$.targeted_components: longer object length is not a
## multiple of shorter object length
tmle_fit
## A tmle3_Fit that took 1 step(s)
##          type          param  init_est  tmle_est         se     lower     upper
##        <char>         <char>     <num>     <num>      <num>     <num>     <num>
## 1:        TSM  E[Y_{A=NULL}] 0.5647509 0.5737838 0.05873221 0.4586708 0.6888968
## 2:        TSM  E[Y_{A=NULL}] 1.5436331 1.5436275 0.05987214 1.4262803 1.6609748
## 3:        TSM  E[Y_{A=NULL}] 2.5582052 2.5596756 0.05973218 2.4426027 2.6767486
## 4: MSM_linear MSM(intercept) 1.5555297 1.5590290 0.05917357 1.4430509 1.6750071
## 5: MSM_linear     MSM(slope) 0.9967272 0.9929459 0.00648008 0.9802452 1.0056466
##    psi_transformed lower_transformed upper_transformed
##              <num>             <num>             <num>
## 1:       0.5737838         0.4586708         0.6888968
## 2:       1.5436275         1.4262803         1.6609748
## 3:       2.5596756         2.4426027         2.6767486
## 4:       1.5590290         1.4430509         1.6750071
## 5:       0.9929459         0.9802452         1.0056466

Remark: The print method of the resultant tmle_fit object conveniently displays the results from computing our TML estimator.
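The lower and upper columns in the printed table are consistent with ordinary Wald-type 95% confidence intervals, estimate ± z × se. The base R sketch below reproduces them from the printed tmle_est and se values for the three TSM rows. The mapping of rows 1 to 3 onto the shifts -1, 0, 1 is an assumption based on the order of the shift grid, since the printed parameter labels do not state it.

```r
# Point estimates and standard errors copied from rows 1-3 of the printed fit
tmle_est <- c(0.5737838, 1.5436275, 2.5596756)
se <- c(0.05873221, 0.05987214, 0.05973218)

# Wald-type 95% interval: estimate +/- z_{0.975} * standard error
z <- qnorm(0.975)
ci <- data.frame(
  delta = c(-1, 0, 1),   # assumed row order, following the shift grid
  lower = tmle_est - z * se,
  upper = tmle_est + z * se
)
ci
```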

Inference with Marginal Structural Models

In the directly preceding section, we considered estimating the mean counterfactual outcome ψn under several values of the intervention δ, taken from the aforementioned δ-grid. We now turn our attention to an approach for obtaining inference on a single summary measure of these estimated quantities. In particular, we propose summarizing the estimates ψn through a marginal structural model (MSM), obtaining inference by way of a hypothesis test on a parameter of this working MSM. For a data structure O = (W, A, Y), let ψδ(P0) be the mean outcome under a shift δ of the treatment, so that we have ψ⃗δ = (ψδ : δ) with corresponding estimators ψ⃗n, δ = (ψn, δ : δ). Further, let β(ψ⃗δ) = ϕ((ψδ : δ)).

For a given MSM mβ(δ), we have that $\beta_0 = \text{argmin}_{\beta} \sum_{\delta} (\psi_{\delta}(P_0) - m_{\beta}(\delta))^2 h(\delta)$, which is the solution to $$u(\beta, (\psi_{\delta}: \delta)) = \sum_{\delta}h(\delta) \left(\psi_{\delta}(P_0) - m_{\beta}(\delta) \right) \frac{d}{d\beta} m_{\beta}(\delta) = 0.$$ This then leads to the following expansion $$\beta(\vec{\psi}_n) - \beta(\vec{\psi}_0) \approx -\left[\frac{d}{d\beta} u(\beta_0, \vec{\psi}_0)\right]^{-1} \frac{d}{d\psi} u(\beta_0, \vec{\psi}_0)(\vec{\psi}_n - \vec{\psi}_0),$$ where we have $$\frac{d}{d\beta} u(\beta, \psi) = -\sum_{\delta} h(\delta) \frac{d}{d\beta} m_{\beta}(\delta) \frac{d}{d\beta} m_{\beta}(\delta)^t + \sum_{\delta} h(\delta) \left(\psi_{\delta} - m_{\beta}(\delta)\right) \frac{d^2}{d\beta^2} m_{\beta}(\delta),$$ which, in the case of an MSM that is a linear model (since $\frac{d^2}{d\beta^2} m_{\beta}(\delta) = 0$), reduces simply to $$\frac{d}{d\beta} u(\beta, \psi) = -\sum_{\delta} h(\delta) \frac{d}{d\beta} m_{\beta}(\delta) \frac{d}{d\beta} m_{\beta}(\delta)^t,$$ and $$\frac{d}{d\psi}u(\beta, \psi)(\psi_n - \psi_0) = \sum_{\delta} h(\delta) \frac{d}{d\beta} m_{\beta}(\delta) (\psi_n - \psi_0)(\delta),$$ which we may write in terms of the efficient influence function (EIF) of ψ by using the first order approximation $(\psi_n - \psi_0)(\delta) \approx \frac{1}{n}\sum_{i = 1}^n \text{EIF}_{\psi_{\delta}}(O_i)$, where EIFψδ is the EIF of ψδ.
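For a linear working MSM mβ(δ) = β0 + β1δ, the estimating equation u(β, ψ) = 0 is a weighted least squares normal equation, so β can be computed directly from the point estimates. The base R sketch below projects the three TSM estimates printed earlier onto a line. The uniform weights h(δ) = 1 are an assumption made for illustration, as is the correspondence of the three estimates to the shifts -1, 0, 1.

```r
delta_grid <- c(-1, 0, 1)
psi_n <- c(0.5737838, 1.5436275, 2.5596756)  # tmle_est values copied from the printed fit
h <- rep(1, length(delta_grid))              # assumed uniform weights h(delta)

# d/dbeta m_beta(delta) = (1, delta) for the linear MSM
X <- cbind(intercept = 1, slope = delta_grid)

# Solve sum_delta h(delta) (psi_delta - X beta) X = 0,
# i.e. (X' H X) beta = X' H psi
beta_n <- solve(t(X) %*% (h * X), t(X) %*% (h * psi_n))

# The same projection through weighted least squares
beta_lm <- coef(lm(psi_n ~ delta_grid, weights = h))

cbind(normal_equations = as.numeric(beta_n), lm = unname(beta_lm))
```

With uniform weights this projection agrees, up to rounding of the printed values, with the MSM(intercept) and MSM(slope) rows of the output shown earlier.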

Now, suppose ψ⃗ = (ψ(δ) : δ) is d-dimensional; then we may write the efficient influence function of the MSM parameter β (assuming a linear MSM) as follows $$\text{EIF}_{\beta}(O) = \left(\sum_{\delta} h(\delta) \frac{d}{d\beta} m_{\beta}(\delta) \frac{d}{d\beta} m_{\beta}(\delta)^t \right)^{-1} \cdot \sum_{\delta} h(\delta) \frac{d}{d\beta} m_{\beta}(\delta) \text{EIF}_{\psi_{\delta}}(O),$$ where the first term is of dimension d × d and the second term is of dimension d × 1.

In an effort to generalize still further, consider the case where ψδ(P0) ∈ (0, 1) – that is, ψδ(P0) corresponds to the probability of some event of interest. In such a case, it would be more natural to consider a logistic MSM $$m_{\beta}(\delta) = \frac{1}{1 + \exp(-f_{\beta}(\delta))},$$ where fβ is taken to be linear in β (e.g., fβ = β0 + β1δ + …). In such a case, we have the parameter of interest $\beta_0 = \text{argmax}_{\beta} \sum_{\delta} \left(\psi_{\delta}(P_0) \log m_{\beta}(\delta) + (1 - \psi_{\delta}(P_0)) \log(1 - m_{\beta}(\delta))\right) h(\delta)$, where β0 solves the following $$ \sum_{\delta} h(\delta) \frac{d}{d\beta} f_{\beta}(\delta) (\psi_{\delta}(P_0) - m_{\beta}(\delta)) = 0.$$
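The estimating equation for the logistic working MSM coincides with the score equation of a logistic regression of ψδ on δ with weights h(δ), so it can be solved with glm. The base R sketch below uses made-up probabilities purely for illustration (they are not results from this analysis) and assumes uniform weights. The quasibinomial family is used only so that fractional responses are accepted without a warning.

```r
delta_grid <- c(-1, 0, 1)
psi <- c(0.20, 0.35, 0.60)        # hypothetical values of psi_delta in (0, 1)
h <- rep(1, length(delta_grid))   # assumed uniform weights h(delta)

# Logistic working MSM with f_beta(delta) = beta_0 + beta_1 * delta
fit <- glm(psi ~ delta_grid, family = quasibinomial(), weights = h,
           control = glm.control(epsilon = 1e-12, maxit = 100))
beta_hat <- coef(fit)
m_beta <- fitted(fit)

# Check the estimating equation:
# sum_delta h(delta) * d/dbeta f_beta(delta) * (psi_delta - m_beta(delta)) = 0
X <- cbind(1, delta_grid)
score <- as.numeric(t(X) %*% (h * (psi - m_beta)))
score
```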

Inference from a working MSM is rather straightforward. To wit, the limiting distribution of the estimator βn of the MSM parameter may be expressed $$\sqrt{n}(\beta_n - \beta_0) \to N(0, \Sigma),$$ where Σ is the covariance matrix of EIFβ(O), which may be estimated by the empirical covariance matrix of the estimated EIF values.
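The steps from the EIF of each ψδ to standard errors for β can be sketched as follows for a linear MSM. The matrix eif_psi is a randomly generated stand-in for the n × 3 matrix of estimated EIF values (one column per shift); it is hypothetical and not taken from a fitted object, and the weights h(δ) are assumed uniform.

```r
set.seed(11)
n <- 500
delta_grid <- c(-1, 0, 1)

# Hypothetical stand-in for the n x 3 matrix of estimated EIF values
eif_psi <- matrix(rnorm(n * length(delta_grid)), nrow = n)

X <- cbind(1, delta_grid)         # d/dbeta m_beta(delta) for a linear MSM
h <- rep(1, length(delta_grid))   # assumed uniform weights h(delta)
C_mat <- t(X) %*% (h * X)         # sum_delta h(delta) m'(delta) m'(delta)^t

# Row i holds EIF_beta(O_i) = C^{-1} sum_delta h(delta) m'(delta) EIF_psi_delta(O_i)
eif_beta <- eif_psi %*% (h * X) %*% solve(C_mat)

# Empirical covariance of EIF_beta and the implied standard errors of beta_n
Sigma <- cov(eif_beta)
se_beta <- sqrt(diag(Sigma) / n)
se_beta
```

With the symmetric grid and uniform weights used here, the intercept component of EIFβ is the average of the three EIF columns and the slope component is half the difference between the last and first columns.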

Directly Targeting the MSM Parameter β

Note that in the above, a working MSM is fit to the individual TML estimates of the mean counterfactual outcome under a given value of the shift δ in the supplied grid. The parameter of interest β of the MSM is asymptotically linear (and, in fact, a TML estimator) as a consequence of its construction from individual TML estimators. In smaller samples, it may be prudent to perform a TML estimation procedure that targets the parameter β directly, as opposed to constructing it from several independently targeted TML estimates. An approach for constructing such an estimator is proposed in the sequel.

Let $C = \left(\sum_{\delta} h(\delta) \frac{d}{d\beta} m_{\beta}(\delta) \frac{d}{d\beta} m_{\beta}(\delta)^t \right)$; then $$\text{EIF}_{\beta}(O) = C^{-1} \cdot \sum_{\delta} h(\delta) \frac{d}{d\beta} m_{\beta}(\delta)(Y - \overline{Q}(A,W)) + C^{-1} \sum_{\delta} h(\delta) \frac{d}{d\beta} m_{\beta}(\delta) \left(\int \overline{Q}(a,w) g_{\delta}^0(a \mid w) - \Psi_{\delta}\right).$$

Suppose a simple working MSM $\mathbb{E} Y_{g^0_{\delta}} = \beta_0 + \beta_1 \delta$; then a TML estimator targeting β0 and β1 may be constructed as $$\overline{Q}_{n, \epsilon}(A,W) = \overline{Q}_n(A,W) + \epsilon (H_1(g), H_2(g)),$$ for all δ, where H1(g) is the auxiliary covariate for β0 and H2(g) is the auxiliary covariate for β1.

To construct a targeted maximum likelihood estimator that directly targets the parameters of the working marginal structural model, we may use the tmle_vimshift_msm Spec (instead of the tmle_vimshift_delta Spec that appears above):

# what's the grid of shifts we wish to consider?
delta_grid <- seq(-1, 1, 1)

# initialize a tmle specification
tmle_msm_spec <- tmle_vimshift_msm(shift_fxn = shift_additive_bounded,
                                   shift_fxn_inv = shift_additive_bounded_inv,
                                   shift_grid = delta_grid,
                                   max_shifted_ratio = 2)

# fit the TML estimator and examine the results
tmle_msm_fit <- tmle3(tmle_msm_spec, data, node_list, learner_list)
## 2% of observations outside training support...predictions trimmed.
## 
## Iter: 1 fn: 1384.1240     Pars:  0.39096 0.29352 0.31552
## Iter: 2 fn: 1384.1240     Pars:  0.39096 0.29352 0.31552
## solnp--> Completed in 2 iterations
## ... (many further "N% of observations outside training support...predictions trimmed." messages omitted)
tmle_msm_fit
## A tmle3_Fit that took 100 step(s)
##          type          param init_est  tmle_est          se     lower    upper
##        <char>         <char>    <num>     <num>       <num>     <num>    <num>
## 1: MSM_linear MSM(intercept) 1.553775 1.5540594 0.059376693 1.4376832 1.670436
## 2: MSM_linear     MSM(slope) 0.999536 0.9995316 0.006167658 0.9874432 1.011620
##    psi_transformed lower_transformed upper_transformed
##              <num>             <num>             <num>
## 1:       1.5540594         1.4376832          1.670436
## 2:       0.9995316         0.9874432          1.011620

References

Díaz, Iván, and Mark J van der Laan. 2011. “Super Learner Based Conditional Density Estimation with Application to Marginal Structural Models.” The International Journal of Biostatistics 7 (1): 1–20.
———. 2012. “Population Intervention Causal Effects Based on Stochastic Interventions.” Biometrics 68 (2): 541–49.
———. 2018. “Stochastic Treatment Regimes.” In Targeted Learning in Data Science: Causal Inference for Complex Longitudinal Studies, 167–80. Springer Science & Business Media.
Pearl, Judea. 2000. Causality. Cambridge University Press.
van der Laan, Mark J, Eric C Polley, and Alan E Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1).
van der Laan, Mark J, and Sherri Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Science & Business Media.
———. 2018. Targeted Learning in Data Science: Causal Inference for Complex Longitudinal Studies. Springer Science & Business Media.