* Renamed `CV_lrnr_sl` to `cv_sl`.
* Added `Lrnr_glmtree`, which uses the `partykit` R package to fit recursive
  partitioning and regression trees in a generalized linear model.
* Updated `cv_sl`, and removed the coefficients column from the returned
  `cv_risk` table.
* Added a `get_sl_revere_risk` argument to `Lrnr_sl`'s `cv_risk` method, which
  provides the option (with default `FALSE`) to add a super learner's
  revere-based risk (not a true cross-validated risk) to the `cv_risk` output.
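  A minimal sketch of the new argument (reusing `loss_squared_error`, which
  ships with `sl3`; the simulated data and fit here are illustrative only):

  ```r
  library(sl3)
  set.seed(573)
  d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
  d$y <- d$x1 - d$x2 + rnorm(200)
  task <- sl3_Task$new(d, covariates = c("x1", "x2"), outcome = "y")
  sl <- Lrnr_sl$new(learners = list(Lrnr_mean$new(), Lrnr_glm$new()))
  sl_fit <- sl$train(task)

  # Add the super learner's revere-based risk row to the usual table of
  # cross-validated candidate risks (default is get_sl_revere_risk = FALSE).
  sl_fit$cv_risk(loss_squared_error, get_sl_revere_risk = TRUE)
  ```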
* Updated `Lrnr_nnls` for binary and continuous outcomes.
* Added a `cv_control` argument to `Lrnr_sl`, which allows users to define
  specific cross-validation structures for fitting the super learner. This is
  intended for use in a nested cross-validation scheme (such as the
  cross-validated super learner, `cv_sl`, or when `Lrnr_sl` is considered in
  the list of candidate learners in another `Lrnr_sl`). In addition to
  constructing clustered cross-validation with respect to `id`, `cv_control`
  can also be used to construct stratified cross-validation folds for
  `Lrnr_sl`.
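  A hedged sketch of the nesting use case; the exact elements accepted by
  `cv_control` are documented in `Lrnr_sl`, and a simple
  `origami::make_folds`-style `V` is assumed here.

  ```r
  # Inner super learner with its own 5-fold CV, e.g. for use inside cv_sl or
  # as a candidate in another Lrnr_sl (assumption: cv_control accepts
  # make_folds-style arguments such as V).
  inner_sl <- Lrnr_sl$new(
    learners = list(Lrnr_mean$new(), Lrnr_glm$new()),
    cv_control = list(V = 5)
  )
  ```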
* `Lrnr_caret` now works for binary and categorical outcomes. Previous
  versions stated that these discrete outcome types were supported by
  `Lrnr_caret`, but the functionality would break.
* Added a new function to `sl3_Task`, `get_folds`, which takes in
  `origami::make_folds` arguments and returns the folds. This function is now
  called by `task$folds`, and it can also be called within a learner's `train`
  method to obtain folds with a non-default structure from a task.
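  For example (reusing the simulated `task` from the sketch above; `V` is one
  of the `origami::make_folds` arguments that `get_folds` forwards):

  ```r
  # Default fold structure attached to the task vs. a custom 10-fold
  # structure requested on the fly.
  default_folds <- task$folds
  ten_folds <- task$get_folds(V = 10)
  ```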
* `Lrnr_caret`, `Lrnr_glmnet`, `Lrnr_hal9001`, and `Lrnr_sl` use
  `task$get_folds` to create folds. The learners' folds respect the default CV
  fold structure in `sl3` tasks (clustered CV when `id` is supplied in the
  task; stratified CV when outcomes are categorical or binary, with `id`
  nested within strata when `id` is supplied to the task). However, `V` can be
  modified according to the learner-specific parameters. (`Lrnr_sl` has a few
  extra CV tuning arguments, which are thoroughly documented in `cv_control`;
  modifications are only recommended for advanced use of `Lrnr_sl`.)
* Fixed a `formula` bug, which was causing formulas with "." to return an
  empty task, and therefore learners with these formulas to fail.
* Fixed a bug in the `Lrnr_cv_selector` metalearner, which was using the wrong
  folds to calculate the cross-validated risk estimate. This impacted
  `Lrnr_cv_selector` when `eval_function` was not a loss function (e.g., AUC).
  By calling `task$folds` on the metalearner's training task, we were deriving
  folds from the matrix of cross-validated predictions, and not using the
  folds for cross-validating the candidates. We now require the folds for
  cross-validating the candidates (i.e., the folds in the task for training
  `Lrnr_sl`) to be supplied when `Lrnr_cv_selector`'s `eval_function` is not a
  loss function.
* `Lrnr_caret` and `Lrnr_rpart` factor binary outcomes in their `train`
  methods, thereby considering a classification prediction problem. To avoid
  this behavior and consider a regression prediction problem with a binary
  outcome (e.g., to minimize the squared error or negative log-likelihood loss
  in a binary outcome prediction problem), users can set
  `factor_binary_outcome = FALSE` when they instantiate the learner.
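  For example (reusing the simulated data `d` from the first sketch):

  ```r
  # Treat a binary outcome as a regression problem in Lrnr_rpart, so that,
  # e.g., squared-error loss is minimized rather than a classification loss.
  d$y_bin <- rbinom(nrow(d), 1, plogis(d$x1))
  task_bin <- sl3_Task$new(d, covariates = c("x1", "x2"), outcome = "y_bin")
  rpart_reg <- Lrnr_rpart$new(factor_binary_outcome = FALSE)
  rpart_fit <- rpart_reg$train(task_bin)
  ```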
* When the outcome type (`outcome_type`) is necessary for a learner's
  `predict` method (e.g., if categorical outcome predictions need to be
  "packed" together), the outcome type in the training task should be used.
  That is, `private$.training_outcome_type` should be used to obtain the
  outcome type in a learner's `predict` method; the task supplied to `predict`
  should not be used. The following learners were referring to the task
  supplied to `predict` in order to retain the outcome type, and they were
  modified to use the training task's outcome type instead: `Lrnr_svm`,
  `Lrnr_randomForest`, `Lrnr_ranger`, `Lrnr_rpart`, and `Lrnr_polspline`. The
  issue with pulling the outcome type from the task supplied to `predict` is
  that the outcome type of that task might be "none" if the `outcome` argument
  is not supplied to it.
* Updated documentation of `sl3_Task` parameters (man-roxygen/sl3_Task_extra.R).
  Specifically, `drop_missing_outcome` and `flag` were added; the `offset`
  description was fixed; a description of `folds` was added, including the
  default and how to modify it; and a description was added of how the default
  cross-validation structure considers `id` and discrete (binary and
  categorical) outcome types to construct clustered and stratified
  cross-validation schemes, respectively.
* Added documentation for `process_data` (R/process_data.R), which is called
  when instantiating a task to process the covariates and identify missingness
  in the outcome.
* Added `Lrnr_grfcate`, a prediction function estimator for the conditional
  average treatment effect (CATE), which uses the `causal_forest` function in
  the `grf` package. This learner is intended for use in the `tmle3mopttx`
  package, where CATE estimation and prediction is required.
* Added more flexibility to the `sl3_Task` argument `outcome_type`. Either
  `"binomial"`, `"binary"`, or `binomial()` can be supplied for a binary
  outcome; `"continuous"`, `"gaussian"`, or `gaussian()` for a continuous
  outcome; and `"categorical"`, `"multinomial"`, or `multinomial()` for a
  categorical outcome. As before, when `outcome_type` is not supplied, we try
  to detect it from the outcome values. If the supplied `outcome_type` differs
  from the detected one, a warning is now thrown. If `outcome_type` is
  supplied but invalid, then an error is thrown upon `sl3_Task` instantiation,
  as opposed to at learner training.
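  For example (reusing `d` and the binary outcome from the sketch above), the
  following are equivalent ways to declare a binary outcome:

  ```r
  t1 <- sl3_Task$new(d, covariates = c("x1", "x2"), outcome = "y_bin",
                     outcome_type = "binomial")
  t2 <- sl3_Task$new(d, covariates = c("x1", "x2"), outcome = "y_bin",
                     outcome_type = binomial())
  # An invalid value now errors here, at task construction, rather than later
  # at learner training.
  ```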
* The cross-validated super learner (`cv_sl`) now returns the cross-validated
  predictions for the super learner and its candidates.
* Updates to `Lrnr_nnls` to support binary outcomes, including support for
  convexity of the resultant model fit and warnings on prediction quality.
* Updates to `Lrnr_define_interactions`.
* Updates to `Lrnr_bound` to better support more flexible bounding for
  continuous outcomes (automatically setting a maximum of infinity).
* Updates to `Lrnr_cv_selector` to support improved computation of the
  CV-risk, averaging the risk strictly across validation/holdout sets.
* Minor updates to `Lrnr_earth` (improving formals recognition), `Lrnr_glmnet`
  (allowing offsets), and `Lrnr_caret` (reformatting of arguments).
* `Lrnr_lstm_keras` and `Lrnr_gru_keras` now support a list of callback
  functions and two-layer networks. The default `callbacks` list provides
  early stopping criteria with respect to 'Keras' defaults and a `patience` of
  10 epochs. Also, these two 'Keras' learners now call `args_to_list` upon
  initialization, and set the verbose argument according to
  `options("keras.fit_verbose")` or `options("sl3.verbose")`.
* Updates to `Lrnr_xgboost` to support prediction tasks consisting of one
  observation (e.g., leave-one-out cross-validation).
* Updates to `Lrnr_sl`, adding a new private slot `.cv_risk` to store the risk
  estimates and using it to avoid unnecessary re-computation in the `print`
  method (the `.cv_risk` slot is populated on the first `print` call, and only
  ever re-printed thereafter).
* Updates to `default_metalearner` documentation to use native markdown
  tables.
* Fixed `Lrnr_screener_importance`'s pairing of (a) covariates returned by the
  importance function with (b) covariates as they are defined in the task.
  This issue only arose when discrete covariates were automatically one-hot
  encoded upon task initiation (i.e., when
  `colnames(task$X) != task$nodes$covariates`).
* Modified `importance_plot` to plot variables in decreasing order of
  importance, so the most important variables are placed at the top of the
  dotchart.
* Modified the `sl3` task's `add_interactions` method to support interactions
  that involve factors. This method is most commonly used by
  `Lrnr_define_interactions`, which is intended for use with another learner
  (e.g., `Lrnr_glmnet` or `Lrnr_glm`) in a `Pipeline`.
.Lrnr_gam
formula (if not specified by user) to not use mgcv
's
default k=10
degrees of freedom for each smooth s
term when there are
less than k=10
degrees of freedom. This bypasses an mgcv::gam
error, and
tends to be relevant only for small n.options(java.parameters = "-Xmx2500m")
and warning message when
Lrnr_bartMachine
is initialized, if this option has not already been set.
This option was incorporated since the default RAM of 500MB for a Java
virtual machine often errors due to memory issues with Lrnr_bartMachine
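  For example, setting the option manually before any Java-backed package is
  loaded avoids the warning:

  ```r
  # Raise the JVM memory cap (the warning suggests this when unset), then
  # construct the learner as usual.
  options(java.parameters = "-Xmx2500m")
  bart_lrnr <- Lrnr_bartMachine$new()
  ```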
* Added a `stratify_cv` argument to `Lrnr_glmnet`, which stratifies internal
  cross-validation folds such that the binary outcome prevalence in training
  and validation folds roughly matches the prevalence in the training task.
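  For example (reusing the binary-outcome task from the earlier sketch):

  ```r
  # Stratify glmnet's internal CV folds so each fold's outcome prevalence
  # roughly matches the training task's prevalence.
  glmnet_strat <- Lrnr_glmnet$new(stratify_cv = TRUE)
  glmnet_fit <- glmnet_strat$train(task_bin)
  ```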
* Added a `min_screen` argument to `Lrnr_screener_coefs`, which tries to
  ensure that at least `min_screen` covariates are selected. If this argument
  is specified and the `learner` argument in `Lrnr_screener_coefs` is a
  `Lrnr_glmnet`, then `lambda` is increased until `min_screen` covariates are
  selected, and a warning is produced. If `min_screen` is specified and the
  `learner` argument in `Lrnr_screener_coefs` is not a `Lrnr_glmnet`, then an
  error is thrown.
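  For example (reusing the simulated `task`; the screener is typically placed
  in a `Pipeline` ahead of another learner):

  ```r
  # Screen on lasso coefficients but keep at least 2 covariates; lambda is
  # relaxed (with a warning) if fewer than min_screen covariates survive.
  screener <- Lrnr_screener_coefs$new(learner = Lrnr_glmnet$new(),
                                      min_screen = 2)
  screened_glm <- Pipeline$new(screener, Lrnr_glm$new())
  screened_fit <- screened_glm$train(task)
  ```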
* Updated `Lrnr_hal9001` to work with v0.4.0 of the `hal9001` package.
* Added a `formula` parameter and a `process_formula` function to the base
  learner, `Lrnr_base`, whose methods carry over to all other learners. When a
  `formula` is supplied as a learner parameter, the `process_formula` function
  constructs a design matrix by supplying the `formula` to `model.matrix`.
  This implementation allows a `formula` to be supplied to all learners, even
  those without native `formula` support. The `formula` should be an object of
  class `"formula"`, or a character string that can be coerced to that class.
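  For example (reusing the simulated `task`; whether one-sided or two-sided
  formulas are expected is not shown here, so a one-sided specification is
  assumed):

  ```r
  # The formula is passed through model.matrix to build the design matrix, so
  # interaction terms can be requested from any learner; a character string
  # coercible to a formula works as well.
  glm_inter <- Lrnr_glm$new(formula = "~ x1 + x2 + x1:x2")
  glm_inter_fit <- glm_inter$train(task)
  ```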
* Added support for `ROCR` performance measures via `custom_ROCR_risk`, which
  supports cutoff-dependent and scalar `ROCR` performance measures. The risk
  is defined as 1 - performance, and is transformed back to the performance
  measure in the `cv_risk` and `importance` functions. This change prompted
  the renaming of the arguments `loss_fun` and `loss_function` to `eval_fun`
  and `eval_function`, respectively, since the evaluation of predictions
  relative to the observations can be either a risk or a loss function. This
  argument name change impacted the following: `Lrnr_solnp`, `Lrnr_optim`,
  `Lrnr_cv_selector`, `cv_risk`, `importance`, and `CV_Lrnr_sl`.
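  A sketch of the renamed argument for `Lrnr_cv_selector`; passing an `ROCR`
  measure through `custom_ROCR_risk` (e.g. `custom_ROCR_risk("auc")`) is an
  assumed usage of that helper's interface and is not shown:

  ```r
  # The discrete super learner selector now takes eval_function (formerly
  # loss_function); here a plain loss shipped with sl3 is used.
  selector <- Lrnr_cv_selector$new(eval_function = loss_squared_error)
  discrete_sl <- Lrnr_sl$new(
    learners = list(Lrnr_mean$new(), Lrnr_glm$new()),
    metalearner = selector
  )
  ```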
* The `cv_risk` and `importance` tables now swap "risk" with the evaluation
  function's name attribute, when one is available.
* Stratified cross-validation folds are now constructed by default when
  `folds` are not supplied to the `sl3_Task` and the outcome is a discrete
  (i.e., binary or categorical) variable.
* Added to the `importance` method the option to evaluate importance over
  `covariate_groups`, by removing/permuting all covariates in the same group
  together.
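  A hedged sketch (reusing `sl_fit` from the first sketch; the exact
  `covariate_groups` format, assumed here to be a named list of character
  vectors, is documented with `importance`):

  ```r
  # Grouped variable importance: x1 and x2 are removed/permuted as one block.
  grp_importance <- importance(
    sl_fit,
    covariate_groups = list(x_block = c("x1", "x2"))
  )
  importance_plot(grp_importance)
  ```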
* Added `Lrnr_ga` as another metalearner.
* Added `importance_plot` to summarize variable importance findings.
* Added the functions `reparameterize` and `retrain` to `Lrnr_base`, which
  allow modification of the covariate set while training on a conserved task,
  and prediction on a new task using previously trained learners,
  respectively.
* [missing]
* [missing]
* [missing]
* Updated `Lrnr_hal9001` and `Lrnr_glmnet` to respect observation-level IDs.
* Removal of `Remotes` and deprecation of `Lrnr_rfcde` and `Lrnr_condensier`:
  * `Lrnr_rfcde` wrapped https://github.com/tpospisi/RFCDE, a sporadically
    maintained tool for conditional density estimation (CDE). Support for this
    has been removed in favor of built-in CDE tools, including, among others,
    `Lrnr_density_semiparametric`.
  * `Lrnr_condensier` wrapped https://github.com/osofr/condensier, which
    provided a pooled hazards approach to CDE. This package contained an
    implementation error (https://github.com/osofr/condensier/issues/15) and
    was removed from CRAN. Support for this has been removed in favor of
    `Lrnr_density_semiparametric` and `Lrnr_haldensify`, both of which more
    reliably provide CDE support.
* Updates to `Stack` objects for time series learners.
* Updates to `README.Rmd`.
* Updates to `Lrnr_nnls`.
* Improved handling of `NA`s.
* New learners based on the `gam` and `caret` packages.
* New learners based on the `gbm`, `earth`, and `polspline` packages.
* Updates to learners based on `xgboost` and `ranger`.