Title: Machine Learning and Visualization
Description: Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.
Authors: E.D. Gennatas [aut, cre]
Maintainer: E.D. Gennatas <[email protected]>
License: GPL (>= 3)
Version: 0.98.1
Built: 2024-11-19 19:26:18 UTC
Source: https://github.com/egenn/rtemis

Advanced Machine Learning made easy, efficient, reproducible.

There are some options you can set in your .Rprofile (usually found in your home directory), so that you do not have to define them each time you call a function:

- General plotting theme; set to e.g. "whitegrid" or "darkgraygrid"
- Name of the default palette to use in plots; see options by running rtpalette()
- Font family to use in plots
- Number of cores to use. By default, rtemis will use the available cores reported by future::availableCores(). On shared systems, you should limit this as appropriate.
- Default plan to use for parallel processing
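For example, a minimal .Rprofile sketch; "rt.theme" appears elsewhere in this manual (as a theme default), while the remaining option names are assumptions, not confirmed API:

# Hypothetical .Rprofile settings for rtemis.
# Only "rt.theme" is confirmed in this manual; the other option names are assumptions.
options(
  rt.theme = "whitegrid", # general plotting theme
  rt.font = "Helvetica",  # font family for plots (option name assumed)
  rt.cores = 4            # limit number of cores on shared systems (option name assumed)
)
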
Static graphics are handled using the mplot3 family.
Dynamic graphics are handled using the dplot3 family.

Functions for Regression and Classification begin with s_*. Run select_learn to get a list of available algorithms. The documentation of each supervised learning function indicates in brackets, after the title, whether the function supports classification, regression, and survival analysis [C, R, S].

Functions for Clustering begin with c_*. Run select_clust to get a list of available algorithms.

Functions for Decomposition and Dimensionality reduction begin with d_*. Run select_decom to get a list of available algorithms.

Functions for Cross-Decomposition begin with x_*. Run xselect_decom to get a list of available algorithms.

Meta models are trained using meta* functions.

Function documentation includes input type (e.g. "String", "Integer", "Float"/"Numeric", etc.) and range in interval notation where applicable. For example, "Float: [0, 1)" means floats between 0 and 1, including 0 but excluding 1.

For all classification models, the outcome should be provided as a factor, with the first level of the factor being the 'positive' class, if applicable. A character vector supplied as outcome will be converted to a factor, whose levels are by default set alphabetically; the positive class may therefore not be set correctly.

Maintainer: E.D. Gennatas <[email protected]> (ORCID)

Useful links:
Report bugs at https://github.com/egenn/rtemis/issues

Binary matrix times character vector

x %BC% labels

x: A binary matrix or data.frame
labels: Character vector with length equal to the number of columns of x

Returns a character vector.

E.D. Gennatas
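
A usage sketch; the exact output format when a row has multiple positive columns is an assumption, not confirmed by this manual:

x <- matrix(c(1, 0, 0,
              0, 1, 1), nrow = 2, byrow = TRUE)
# Map each row's positive (1) entries to the corresponding labels
x %BC% c("A", "B", "C")
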
Check whether any column of a data frame has zero variance

any_constant(x)

x: Input data frame

E.D. Gennatas
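
A minimal sketch; the logical return value is inferred from the description:

df <- data.frame(a = 1:5, b = rep(1, 5))
any_constant(df) # b has zero variance, so this should return TRUE
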
Convert linadleaves to data.tree object

as.data.tree.linadleaves(object)

object: linadleaves object

Convert an rpart object to a data.tree object, which can be plotted with dplot3_cart

as.data.tree.rpart(object, verbose = FALSE)

object: rpart object
verbose: Logical: If TRUE, print messages to console

Returns a data.tree object.

E.D. Gennatas

Convert shyoptleaves to data.tree object

as.data.tree.shyoptleaves(object)

object: shyoptleaves object

Get the Area under the ROC curve to assess classifier performance.

auc(preds, labels, method = c("pROC", "ROCR", "auc_pairs"), verbose = FALSE, trace = 0)

preds: Numeric vector: Probabilities or model scores (e.g. c(.32, .75, .63))
labels: True labels of outcomes (e.g. c(0, 1, 1))
method: Character: "pROC", "ROCR", or "auc_pairs": Method to use
verbose: Logical: If TRUE, print messages to output
trace: Integer: If > 0, print more messages to output

Important Note: We assume that true labels are a factor where the first level is the "positive" case, a.k.a. the event. All methods used here ("pROC", "auc_pairs", "ROCR") have been set up to expect this. This goes against the default setting of both "pROC" and "ROCR", which will not give an AUC less than .5 because they reorder levels. We don't want this, because a classifier can perform worse than .5, and it can be very confusing if levels are reordered automatically and different functions give you different AUC.

EDG

## Not run:
preds <- c(0.7, 0.55, 0.45, 0.25, 0.6, 0.7, 0.2)
labels <- factor(c("a", "a", "a", "b", "b", "b", "b"))
auc(preds, labels, method = "ROCR")
auc(preds, labels, method = "pROC")
auc(preds, labels, method = "auc_pairs")
## End(Not run)

Get the Area under the ROC curve to assess classifier performance using pairwise concordance

auc_pairs(estimated.score, true.labels, verbose = TRUE)

estimated.score: Float vector: Probabilities or model scores (e.g. c(.32, .75, .63))
true.labels: True labels of outcomes (e.g. c(0, 1, 1))
verbose: Logical: If TRUE, print messages to output

The first level of true.labels must be the positive class, and high numbers in estimated.score should correspond to the positive class.

## Not run:
true.labels <- factor(c("a", "a", "a", "b", "b", "b", "b"))
estimated.score <- c(0.7, 0.55, 0.45, 0.25, 0.6, 0.7, 0.2)
auc_pairs(estimated.score, true.labels, verbose = TRUE)
## End(Not run)

Balanced Accuracy of a binary classifier

bacc(true, predicted, harmonize = FALSE, verbosity = 1)

true: True labels
predicted: Estimated labels
harmonize: Logical: passed to sensitivity and specificity, which use factor_harmonize. Default = FALSE
verbosity: Integer: If > 0, print messages to console

BAcc = .5 * (Sensitivity + Specificity)
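
A worked sketch of the formula above, assuming the first factor level ("a") is the positive class, per the package-wide convention:

true      <- factor(c("a", "a", "b", "b"))
predicted <- factor(c("a", "b", "b", "b"))
# Sensitivity = 1/2, Specificity = 2/2
# BAcc = .5 * (0.5 + 1) = 0.75
bacc(true, predicted)
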
Extract coefficients from Additive Tree leaves

betas.lihad(object, newdata, verbose = FALSE, trace = 0)

object: lihad object
newdata: matrix/data.frame of features
verbose: Logical: If TRUE, print output to console
trace: Integer, 0:2: Increase verbosity

E.D. Gennatas
Bias-Variance Decomposition

bias_variance(x, y, mod, res1_train.p = 0.7, params = list(),
  resample.params = setup.resample(n.resamples = 100), seed = NULL,
  verbose = TRUE, res.verbose = FALSE, ...)

x: Predictors
y: Outcome
mod: Character: rtemis learner
res1_train.p: Numeric: Proportion of cases to use for training
params: List of parameters to pass to mod
resample.params: Output of setup.resample
seed: Integer: Seed for initial train/test split
verbose: Logical: If TRUE, print messages to console
res.verbose: Logical: passed to the learning function
...: Additional arguments passed to mod

E.D. Gennatas
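
A minimal sketch using simulated data and the "cart" learner (listed elsewhere in this manual as an available algorithm):

## Not run:
set.seed(2024)
x <- rnorm(300)
y <- x^2 + rnorm(300)
bv <- bias_variance(x, y, mod = "cart")
## End(Not run)
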
Binary matrix times character vector

binmat2vec(x, labels = colnames(x))

x: A binary matrix or data.frame
labels: Character vector with length equal to the number of columns of x

Returns a character vector.

String formatting utilities

bold(...)
italic(...)
underline(...)
hilite(..., col = "69;1", bold = TRUE)
hilitebig(x)
red(..., bold = FALSE)
green(..., bold = FALSE)
orange(..., bold = FALSE)
cyan(..., bold = FALSE)
magenta(..., bold = FALSE)
gray(..., bold = FALSE, sep = " ")
reset(...)

...: Character objects to format
bold: Logical: If TRUE, use bold font
x: Numeric: Input
sep: Character: Separator

Train an ensemble using boosting of any learner

boost(x, y = NULL, x.valid = NULL, y.valid = NULL, x.test = NULL, y.test = NULL,
  mod = "cart", resid = NULL, boost.obj = NULL, mod.params = list(), case.p = 1,
  weights = NULL, learning.rate = 0.1,
  earlystop.params = setup.earlystop(window = 30, window_decrease_pct_min = 0.01),
  earlystop.using = "train", tolerance = 0, tolerance.valid = 1e-05, max.iter = 10,
  init = NULL, x.name = NULL, y.name = NULL, question = NULL, base.verbose = FALSE,
  verbose = TRUE, trace = 0, print.progress.every = 5, print.error.plot = "final",
  prefix = NULL, plot.theme = rtTheme, plot.fitted = NULL, plot.predicted = NULL,
  print.plot = FALSE, print.base.plot = FALSE, plot.type = "l", outdir = NULL, ...)

x: Numeric vector or matrix / data frame of features, i.e. independent variables
y: Numeric vector of outcome, i.e. dependent variable
x.valid: Data.frame; optional: Validation data
y.valid: Float vector; optional: Validation outcome
x.test: Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x
y.test: Numeric vector of testing set outcome
mod: Character: Algorithm to train base learners; for options, see select_learn. Default = "cart"
resid: Float vector, length = length(y): Residuals to work on. Do not change unless you know what you're doing. Default = NULL, for regular boosting
boost.obj: (Internal use)
mod.params: Named list of arguments for mod
case.p: Float (0, 1]: Train each iteration using this percent of cases. Default = 1, i.e. use all cases
weights: Numeric vector: Weights for cases. For classification, ...
learning.rate: Float (0, 1]: Learning rate for the additive steps
earlystop.params: List with early stopping parameters. Set using setup.earlystop
earlystop.using: Character: "train" or "valid". The latter requires x.valid and y.valid
tolerance: Float: If training error <= this value, training stops
tolerance.valid: Float: If validation error <= this value, training stops
max.iter: Integer: Maximum number of iterations (additive steps) to perform. Default = 10
init: Float: Initial value for prediction. Default = mean(y)
x.name: Character: Name for feature set
y.name: Character: Name for outcome
question: Character: the question you are attempting to answer with this model, in plain language
base.verbose: Logical: verbose argument passed to the base learner
verbose: Logical: If TRUE, print summary to screen
trace: Integer: If > 0, print diagnostic info to console
print.progress.every: Integer: Print progress every this many iterations
print.error.plot: String or Integer: "final" plots a training and validation (if available) error curve at the end of training. If integer, plot training and validation error curves every this many iterations during training. "none" for no plot
prefix: Internal
plot.theme: Character: "zero", "dark", "box", "darkbox"
plot.fitted: Logical: if TRUE, plot True (y) vs Fitted
plot.predicted: Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test
print.plot: Logical: if TRUE, produce plot using mplot3
print.base.plot: Logical: Passed to the print.plot argument of the base learner
plot.type: Character: "l" or "p": Plot using lines or points
outdir: Path to output directory. If defined, will save Predicted vs. True plot, if available, as well as full model output, if save.mod is TRUE
...: Additional parameters to be passed to the learner defined by mod

If learning.rate is set to 0, a nullmod will be created.

E.D. Gennatas
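
A minimal sketch with simulated data, using the defaults above (CART base learners, 10 additive steps):

## Not run:
set.seed(2024)
x <- rnorm(500)
y <- x^3 + rnorm(500)
mod <- boost(x, y, mod = "cart", learning.rate = 0.1, max.iter = 10)
## End(Not run)
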
Bootstrap Resampling

bootstrap(x, n.resamples = 10, seed = NULL)

x: Input vector
n.resamples: Integer: Number of resamples to make. Default = 10
seed: Integer: If provided, set seed for reproducibility. Default = NULL

E.D. Gennatas
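
A usage sketch; we assume the return value holds the resampled case indices, one element per resample:

res <- bootstrap(1:20, n.resamples = 5, seed = 2024)
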
Calculate the Brier Score for classification: the mean squared difference between true labels and estimated probabilities

brier_score(true, estimated.prob)

true: Numeric vector, 0, 1: True labels
estimated.prob: Numeric vector, [0, 1]: Estimated probabilities

E.D. Gennatas
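
A worked sketch, assuming brier_score implements the mean squared difference described above:

true <- c(0, 1, 1)
estimated.prob <- c(0.2, 0.8, 0.6)
# mean((true - estimated.prob)^2) = mean(c(.04, .04, .16)) = 0.08
brier_score(true, estimated.prob)
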
Perform fuzzy C-means clustering using e1071::cmeans

c_CMeans(x, k = 2, iter.max = 100, dist = "euclidean", method = "cmeans", m = 2,
  rate.par = NULL, weights = 1, control = list(), verbose = TRUE, ...)

x: Input data
k: Integer: Number of clusters to get. Default = 2
iter.max: Integer: Maximum number of iterations. Default = 100
dist: Character: Distance measure to use: "euclidean" or "manhattan". Default = "euclidean"
method: Character: "cmeans" for fuzzy c-means clustering; "ufcl" for on-line update. Default = "cmeans"
m: Float (>1): Degree of fuzzification. Default = 2
rate.par: Float (0, 1): Learning rate for the online variant. Default = .3
weights: Float (>0): Case weights
control: List of control parameters. See e1071::cmeans
verbose: Logical: If TRUE, print messages to console
...: Additional parameters to be passed to e1071::cmeans

Returns an rtClust object.

E.D. Gennatas

Other Clustering: c_DBSCAN(), c_EMC(), c_H2OKMeans(), c_HARDCL(), c_HOPACH(), c_KMeans(), c_MeanShift(), c_NGAS(), c_PAM(), c_PAMK(), c_SPEC()

Perform DBSCAN clustering

c_DBSCAN(x, x.test = NULL, eps = 1, minPts = NCOL(x) + 1, weights = NULL,
  borderPoints = TRUE, search = c("kdtree", "linear", "dist"), verbose = TRUE, ...)

x: Input matrix / data.frame
x.test: Testing set matrix / data.frame
eps: Numeric: Radius of the epsilon neighborhood
minPts: Integer: Minimum number of points required in the eps neighborhood for core points (including the point itself)
weights: Numeric vector: Data points' weights. Needed for weighted clustering
borderPoints: Logical: If TRUE, assign border points to clusters; otherwise they are considered noise
search: Character: "kdtree", "linear", or "dist": Nearest neighbor search strategy
verbose: Logical: If TRUE, print messages to screen
...: Additional parameters to be passed to dbscan::dbscan

See dbscan::dbscan for info on how to choose eps and minPts.

Efstathios D. Gennatas

Other Clustering: c_CMeans(), c_EMC(), c_H2OKMeans(), c_HARDCL(), c_HOPACH(), c_KMeans(), c_MeanShift(), c_NGAS(), c_PAM(), c_PAMK(), c_SPEC()

Perform clustering by EM using EMCluster::emcluster

c_EMC(x, x.test = NULL, k = 2, lab = NULL, EMC = EMCluster::.EMC, verbose = TRUE, ...)

x: Input matrix / data.frame
x.test: Testing set matrix / data.frame
k: Integer: Number of clusters to get
lab: Vector, length NROW(x)
EMC: List of control parameters for EMCluster::emcluster
verbose: Logical: If TRUE, print messages to screen
...: Additional parameters to be passed to EMCluster::emcluster

First, EMCluster::simple.init(x, nclass = k) is run, followed by EMCluster::emcluster(x, emobj = emobj, assign.class = TRUE, ...). This can be very slow.

E.D. Gennatas

Other Clustering: c_CMeans(), c_DBSCAN(), c_H2OKMeans(), c_HARDCL(), c_HOPACH(), c_KMeans(), c_MeanShift(), c_NGAS(), c_PAM(), c_PAMK(), c_SPEC()

Perform K-Means clustering using h2o::h2o.kmeans

c_H2OKMeans(x, x.test = NULL, k = 2, estimate.k = FALSE, nfolds = 0,
  max.iterations = 10, ip = "localhost", port = 54321, n.cores = rtCores,
  seed = -1, init = c("Furthest", "Random", "PlusPlus", "User"),
  categorical.encoding = c("AUTO", "Enum", "OneHotInternal", "OneHotExplicit",
    "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited"),
  verbose = TRUE, ...)

x: Input matrix / data.frame
x.test: Testing set matrix / data.frame
k: Integer: Number of clusters to get
estimate.k: Logical: If TRUE, estimate k up to a maximum set by the k argument
nfolds: Integer: Number of cross-validation folds
max.iterations: Integer: Maximum number of iterations
ip: Character: IP address of H2O server. Default = "localhost"
port: Integer: Port number of H2O server. Default = 54321
n.cores: Integer: Number of cores to use
seed: Integer: Seed for H2O's random number generator. Default = -1 (time-based random number)
init: Character: Initialization mode: "Furthest", "Random", "PlusPlus", "User". Default = "Furthest"
categorical.encoding: Character: How to encode categorical variables: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Default = "AUTO"
verbose: Logical: If TRUE, print messages to screen
...: Additional arguments to pass to h2o::h2o.kmeans

Check out the H2O Flow at [ip]:[port]; the default IP:port is "localhost:54321", so if running on localhost, point your web browser to localhost:54321. For additional information, see help on h2o::h2o.kmeans.

Returns an rtMod object.

E.D. Gennatas

Other Clustering: c_CMeans(), c_DBSCAN(), c_EMC(), c_HARDCL(), c_HOPACH(), c_KMeans(), c_MeanShift(), c_NGAS(), c_PAM(), c_PAMK(), c_SPEC()

Perform clustering by Hard Competitive Learning using flexclust::cclust

c_HARDCL(x, x.test = NULL, k = 2, dist = "euclidean", verbose = TRUE, ...)

x: Input matrix / data.frame
x.test: Optional test set data
k: Integer: Number of clusters to get
dist: Character: Distance measure to use: "euclidean" or "manhattan"
verbose: Logical: If TRUE, print messages to console
...: Additional parameters to be passed to flexclust::cclust

E.D. Gennatas

Other Clustering: c_CMeans(), c_DBSCAN(), c_EMC(), c_H2OKMeans(), c_HOPACH(), c_KMeans(), c_MeanShift(), c_NGAS(), c_PAM(), c_PAMK(), c_SPEC()

Perform HOPACH clustering using hopach::hopach

c_HOPACH(x, dmat = NULL, metric = c("cosangle", "abscosangle", "euclid",
  "abseuclid", "cor", "abscor"), k = 15, kmax = 9, khigh = 9, trace = 0,
  verbose = TRUE, ...)

x: Input matrix / data.frame
dmat: Matrix (numeric, no missing values) or hdist object of pairwise distances
metric: Character: Dissimilarity metric to be used. Options: "cosangle", "abscosangle", "euclid", "abseuclid", "cor", "abscor"
k: Integer, (0:15]: Maximum number of levels
kmax: Integer, [1:9]: Maximum number of children at each node in the tree
khigh: Integer, [1:9]: Maximum number of children at each node in the tree when computing the Mean/Median Split Silhouette. Usually the same as kmax
trace: Integer: If trace > 0, print messages during HOPACH run. Default = 0
verbose: Logical: If TRUE, print messages to console
...: Additional parameters to pass to hopach::hopach

E.D. Gennatas

Other Clustering: c_CMeans(), c_DBSCAN(), c_EMC(), c_H2OKMeans(), c_HARDCL(), c_KMeans(), c_MeanShift(), c_NGAS(), c_PAM(), c_PAMK(), c_SPEC()

Perform K-means clustering using flexclust::cclust

c_KMeans(x, x.test = NULL, k = 2, dist = "euclidean", verbose = TRUE, ...)

x: Input matrix / data.frame
x.test: Testing set matrix / data.frame
k: Integer: Number of clusters to get
dist: Character: Distance measure to use: "euclidean" or "manhattan"
verbose: Logical: If TRUE, print messages to screen
...: Additional parameters to pass to flexclust::cclust

E.D. Gennatas

Other Clustering: c_CMeans(), c_DBSCAN(), c_EMC(), c_H2OKMeans(), c_HARDCL(), c_HOPACH(), c_MeanShift(), c_NGAS(), c_PAM(), c_PAMK(), c_SPEC()
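
A minimal sketch; k = 3 matches the three species in the iris dataset:

## Not run:
cl <- c_KMeans(iris[, 1:4], k = 3)
## End(Not run)
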
Perform Mean Shift clustering using meanShiftR::meanShift

c_MeanShift(x, nNeighbors = NROW(x), algorithm = c("LINEAR", "KDTREE"),
  kernelType = c("NORMAL", "EPANECHNIKOV", "BIWEIGHT"), bandwidth = rep(1, NCOL(x)),
  alpha = 0, iterations = 10, epsilon = 1e-08, epsilonCluster = 1e-04,
  parameters = NULL, verbose = TRUE, ...)

x: Input matrix
nNeighbors: Integer: Number of neighbors to consider for kernel density estimate
algorithm: Character: "LINEAR" or "KDTREE"
kernelType: Character: "NORMAL", "EPANECHNIKOV", or "BIWEIGHT"
bandwidth: Numeric vector, length = ncol(x): Used in kernel density estimation for steepest ascent classification
alpha: Numeric: A scalar tuning parameter for normal kernels. When set to zero, the mean shift algorithm operates as usual. When set to one, the mean shift algorithm is approximated through Newton's Method. When set to a value between zero and one, a generalization of Newton's Method and mean shift is used instead, providing a means to balance convergence speed with stability
iterations: Integer: Number of iterations to perform
epsilon: Numeric: Used to determine when to terminate the iteration of an individual query point. If the distance between the query point at iteration i and i+1 is less than epsilon, iteration ceases on this point
epsilonCluster: Numeric: Used to determine the minimum distance between distinct clusters. This distance is applied after all iterations have finished, in order of the rows of queryData
parameters: A scalar or vector of parameters used by the specific algorithm. There are no optional parameters for the "LINEAR" method; "KDTREE" supports optional parameters for the maximum number of points to store in a leaf node and the maximum value for the quadratic form in the normal kernel, ignoring the constant value -0.5
verbose: Logical: If TRUE, print messages to console
...: Additional parameters to be passed to meanShiftR::meanShift

E.D. Gennatas

Other Clustering: c_CMeans(), c_DBSCAN(), c_EMC(), c_H2OKMeans(), c_HARDCL(), c_HOPACH(), c_KMeans(), c_NGAS(), c_PAM(), c_PAMK(), c_SPEC()

Perform Neural Gas clustering using flexclust::cclust

c_NGAS(x, x.test = NULL, k = 2, dist = "euclidean", verbose = TRUE, ...)

x: Input matrix / data.frame
x.test: Testing set matrix / data.frame
k: Integer: Number of clusters to get
dist: Character: Distance measure to use: "euclidean" or "manhattan"
verbose: Logical: If TRUE, print messages to screen
...: Additional parameters to be passed to flexclust::cclust

Returns an rtClust object.

E.D. Gennatas

Other Clustering: c_CMeans(), c_DBSCAN(), c_EMC(), c_H2OKMeans(), c_HARDCL(), c_HOPACH(), c_KMeans(), c_MeanShift(), c_PAM(), c_PAMK(), c_SPEC()

Perform PAM clustering using cluster::pam

c_PAM(x, k = 2, diss = FALSE, metric = "euclidean", do.swap = TRUE, verbose = TRUE, ...)

x: Input matrix / data.frame
k: Integer: Number of clusters to get
diss: Logical: If TRUE, treat x as a dissimilarity matrix
metric: Character: Dissimilarity metric to be used. Options: "euclidean", "manhattan"
do.swap: Logical: If TRUE, perform the swap phase (see cluster::pam)
verbose: Logical: If TRUE, print messages to screen
...: Additional parameters to be passed to cluster::pam

E.D. Gennatas

Other Clustering: c_CMeans(), c_DBSCAN(), c_EMC(), c_H2OKMeans(), c_HARDCL(), c_HOPACH(), c_KMeans(), c_MeanShift(), c_NGAS(), c_PAMK(), c_SPEC()

Estimate PAM clustering solution and optimal k using fpc::pamk

c_PAMK(x, krange = 2:10, criterion = "asw", usepam = ifelse(nrow(x) < 2000, TRUE, FALSE),
  scaling = TRUE, diss = inherits(data, "dist"), metric = "euclidean",
  do.swap = TRUE, trace = 0, verbose = TRUE, ...)

x: Input matrix / data.frame
krange: Integer vector: Range of k values to try
criterion: Character: Criterion to use for selecting k: "asw", "multiasw", or "ch". See fpc::pamk
usepam: Logical: If TRUE, use cluster::pam
scaling: Logical or Numeric vector: If TRUE, scale input. If a numeric vector of length equal to the number of features, the features are divided by the corresponding value
diss: Logical: If TRUE, treat x as a dissimilarity matrix
metric: Character: Dissimilarity metric to be used. Options: "euclidean", "manhattan"
do.swap: Logical: If TRUE, perform the swap phase. See cluster::pam
trace: Integer [0, 3]: Trace level for fpc::pamk
verbose: Logical: If TRUE, print messages to console
...: Additional parameters to be passed to fpc::pamk

Returns an rtClust object.

E.D. Gennatas

Other Clustering: c_CMeans(), c_DBSCAN(), c_EMC(), c_H2OKMeans(), c_HARDCL(), c_HOPACH(), c_KMeans(), c_MeanShift(), c_NGAS(), c_PAM(), c_SPEC()

Perform Spectral Clustering using kernlab::specc

c_SPEC(x, k = 2, kernel = "rbfdot", kpar = "automatic", nystrom.red = FALSE,
  nystrom.sample = dim(x)[1]/6, iterations = 200, mod.sample = 0.75,
  na.action = na.omit, verbose = TRUE, ...)

x: Input matrix / data.frame
k: Integer: Number of clusters to get
kernel: Character: Kernel to use: "rbfdot", "polydot", "vanilladot", "tanhdot", "laplacedot", "besseldot", "anovadot", "splinedot", "stringdot"
kpar: String or List: "automatic", "local", or a list with: sigma (for "rbfdot", "laplacedot"); degree, scale, offset (for "polydot"); scale, offset (for "tanhdot"); sigma, order, degree (for "besseldot"); sigma, degree (for "anovadot"); length, lambda, normalized (for "stringdot")
nystrom.red: Logical: If TRUE, use the Nystrom method to calculate eigenvectors. Default = FALSE
nystrom.sample: Integer: Number of points to use for estimating the eigenvalues when nystrom.red = TRUE
iterations: Integer: Number of iterations allowed
mod.sample: Float (0, 1): Proportion of data to use when estimating sigma. Default = .75
na.action: Function: Action to perform on NA (Default = na.omit)
verbose: Logical: If TRUE, print messages to screen
...: Additional parameters to be passed to kernlab::specc

E.D. Gennatas

Other Clustering: c_CMeans(), c_DBSCAN(), c_EMC(), c_H2OKMeans(), c_HARDCL(), c_HOPACH(), c_KMeans(), c_MeanShift(), c_NGAS(), c_PAM(), c_PAMK()

Calibrate predicted probabilities using a generalized additive model (GAM).

calibrate(true.labels, predicted.prob, pos.class = NULL, mod = c("gam", "glm"),
  k = 5, verbose = TRUE)

true.labels: Factor with true class labels
predicted.prob: Numeric vector with predicted probabilities
pos.class: Integer: Index of the positive class
mod: Character: Model to use for calibration: "gam" or "glm"
k: Integer: GAM degrees of freedom
verbose: Logical: If TRUE, print messages to the console

Returns mod, the fitted calibration model. Use mod$fitted.values to get the calibrated input probabilities; use predict(mod, newdata = newdata, type = "response") to calibrate other estimated probabilities.

EDG

## Not run:
data(segment_logistic, package = "probably")

# Plot the calibration curve of the original predictions
dplot3_calibration(
  true.labels = segment_logistic$Class,
  predicted.prob = segment_logistic$.pred_poor,
  n_windows = 10,
  pos.class = 2
)

# Plot the calibration curve of the calibrated predictions
dplot3_calibration(
  true.labels = segment_logistic$Class,
  predicted.prob = calibrate(
    segment_logistic$Class,
    segment_logistic$.pred_poor
  )$fitted.values,
  n_windows = 10,
  pos.class = 2
)
## End(Not run)

Calibrate a cross-validated model trained using train_cv

calibrate_cv(mod, alg = "gam", learn.params = list(),
  resample.params = setup.resample(resampler = "kfold", n.resamples = 5, seed = NULL),
  which.repeat = 1, verbosity = 1, debug = FALSE)

mod: Model trained using train_cv
alg: Character: "gam" or "glm": Algorithm to use for calibration
learn.params: List: Parameters to pass to the learning algorithm
resample.params: List of parameters to pass to the resampling algorithm. Build using setup.resample
which.repeat: Integer: Which repeat to use for calibration
verbosity: Integer: 0: silent; > 0: print messages
debug: Logical: If TRUE, run without parallel processing, to allow better debugging

This is a work in progress that may be incorporated into train_cv. You start by training a cross-validated model using train_cv; this function can then be used to calibrate that model. In order to use all available data, each outer resample of the input mod is resampled (using 5-fold CV by default) to train and test calibration models. This allows using the original label-based metrics of mod and also extracting calibration metrics based on the same data, after aggregating the test-set predictions of the calibration models.

Returns a list: Calibrated models, test-set labels, test-set performance metrics, estimated probabilities (uncalibrated), calibrated probabilities.

E.D. Gennatas

Print the range of a continuous variable

catrange(x, ddSci = TRUE, decimal.places = 1, na.rm = TRUE)

x: Numeric vector
ddSci: Logical: If TRUE, format the range using ddSci. Default = TRUE
decimal.places: Integer: Number of decimal places to use if ddSci = TRUE
na.rm: Logical: passed to range()

E.D. Gennatas
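
A usage sketch:

catrange(rnorm(100))
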
Get NCOL(x) and NROW(x)

catsize(x, name = NULL, verbose = TRUE, newline = TRUE)

x: R object (usually one that inherits from matrix or data.frame)
name: Character: Name of input object
verbose: Logical: If TRUE, print NROW and NCOL to console
newline: Logical: If TRUE, end with a newline character

Returns a vector of NROW, NCOL invisibly.

E.D. Gennatas

catsize(iris)

Check Data

check_data(x, name = NULL, get_duplicates = TRUE, get_na_case_pct = FALSE,
  get_na_feature_pct = FALSE)

x: data.frame, data.table, or similar structure
name: Character: Name of dataset
get_duplicates: Logical: If TRUE, check for duplicate cases
get_na_case_pct: Logical: If TRUE, calculate the percent of NA values per case
get_na_feature_pct: Logical: If TRUE, calculate the percent of NA values per feature

E.D. Gennatas

## Not run:
n <- 1000
x <- rnormmat(n, 50, return.df = TRUE)
x$char1 <- sample(letters, n, TRUE)
x$char2 <- sample(letters, n, TRUE)
x$fct <- factor(sample(letters, n, TRUE))
x <- rbind(x, x[1, ])
x$const <- 99L
x[sample(nrow(x), 20), 3] <- NA
x[sample(nrow(x), 20), 10] <- NA
x$fct[30:35] <- NA
check_data(x)
## End(Not run)

Check that file(s) exist

check_files(paths, verbose = TRUE, pad = 0)

paths: Character vector of paths
verbose: Logical: If TRUE, print messages to console
pad: Integer: Number of spaces to pad to the left

E.D. Gennatas

Returns a list with the relative variance over n.steps, the absolute threshold, the last value, and a logical "stop", which is TRUE if conditions are met and training should stop.

The final stop decision is:

check.thresh | (check.rthresh & check.rvar)  if combine.relative.thresholds = "AND"

or

check.thresh | (check.rthresh | check.rvar)  if combine.relative.thresholds = "OR"

checkpoint_earlystop(x, absolute.threshold = NA, relative.threshold = NA,
  minimize = TRUE, relativeVariance.threshold = NA, n.steps = 10,
  combine.relative.thresholds = "AND", min.steps = 50,
  na.response = c("stop", "continue"), verbose = TRUE)

x: Float vector: Input; this would normally be the loss at each iteration
absolute.threshold: Float: If set and the last value of x has reached this threshold, stop
relative.threshold: Float: If set, check whether the relative change from the first to the last value of x has reached this threshold
minimize: Logical: See absolute.threshold
relativeVariance.threshold: Float: If the relative variance over the last n.steps values of x is below this threshold, stop
n.steps: Integer, > 1: Calculate relative variance over this many last values of x
combine.relative.thresholds: Character: "AND" or "OR": How to combine the relative-threshold criteria
min.steps: Integer: Do not calculate relativeVariance unless x is at least this long
na.response: Character: "stop" or "continue": What should happen if the last value of x is NA
verbose: Logical: If TRUE, print messages to console

Returns a list with the following items:
last.value: Float: Last value of x
relativeVariance: Float: Relative variance of the last n.steps values
check.thresh: Logical: TRUE if the absolute threshold was reached
check.rvar: Logical: TRUE if the relative variance threshold was reached
stop: Logical: TRUE if either criterion was met: absolute.threshold or relativeVariance.threshold

E.D. Gennatas
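
A sketch of checking a flattening loss curve; the threshold values are illustrative only:

## Not run:
loss <- c(1, .8, .65, .55, .5, .48, .47, .468, .467, .467, .467, .467)
checkpoint_earlystop(
  loss,
  relativeVariance.threshold = 1e-4,
  n.steps = 5,
  min.steps = 10
)
## End(Not run)
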
Relax. Use Ctrl-C to exit (but try to stay relaxed)

chill(sleep = 0.5, text = NULL, max = 1000)

sleep: Float: Time in seconds between drawings. Default = .5
text: Character: Text to display
max: Integer: Max times to repeat. Default = 1000

Calculate Classification Metrics

class_error(true, estimated, estimated.prob = NULL, calc.auc = TRUE,
  calc.brier = TRUE, auc.method = c("pROC", "ROCR", "auc_pairs"), trace = 0)

true: Factor: True labels
estimated: Factor: Estimated values
estimated.prob: Numeric vector: Estimated probabilities
calc.auc: Logical: If TRUE, calculate AUC. May be slow in very large datasets
calc.brier: Logical: If TRUE, calculate the Brier Score
auc.method: Character: "pROC", "ROCR", or "auc_pairs": Method to use, passed to auc
trace: Integer: If > 0, print diagnostic messages. Default = 0

Note that auc.method = "pROC" is the only method that will output an AUC even if one or more estimated probabilities are NA.

Returns an S3 object of class "class_error".

E.D. Gennatas

## Not run:
true <- factor(c("a", "a", "a", "b", "b", "b", "b", "b", "b", "b"))
estimated <- factor(c("a", "a", "b", "b", "a", "a", "b", "b", "a", "a"))
estimated.prob <- c(0.7, 0.55, 0.45, 0.25, 0.6, 0.7, 0.2, .37, .57, .61)
class_error(true, estimated, estimated.prob, auc.method = "pROC")
class_error(true, estimated, estimated.prob, auc.method = "ROCR")
class_error(true, estimated, estimated.prob, auc.method = "auc_pairs")
## End(Not run)

Calculate class imbalance as given by:

I = K * sum((n_i / N - 1 / K)^2)

where K is the number of classes, n_i is the number of instances of class i, and N is the total number of instances.

class_imbalance(x)

x: Vector, factor: Labels of outcome

E.D. Gennatas
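
A worked sketch, assuming the formula above:

x <- factor(c(rep("a", 7), rep("b", 3)))
# K = 2, N = 10: 2 * ((7/10 - 1/2)^2 + (3/10 - 1/2)^2) = 0.16
class_imbalance(x)
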
Clean column names by replacing all spaces and punctuation with a single underscore

clean_colnames(x)

x: Character vector

E.D. Gennatas

clean_colnames(iris)

Clean a character vector by replacing all symbols and sequences of symbols with single underscores, ensuring no name begins or ends with a symbol

clean_names(x, prefix_digits = "V_")

x: Character vector
prefix_digits: Character: Prefix to add to names beginning with a digit. Set to NA to skip

E.D. Gennatas

x <- c("Patient ID", "_Date-of-Birth", "SBP (mmHg)")
x
clean_names(x)

Convenience function to perform any rtemis clustering

clust(x, clust = "kmeans", x.test = NULL, verbose = TRUE, ...)

x: Numeric matrix / data frame: Input data
clust: Character: Clustering algorithm name, e.g. "kmeans" (case-insensitive)
x.test: Numeric matrix / data frame: Testing set data, if supported by clust
verbose: Logical: If TRUE, print messages to screen
...: Additional arguments to be passed to the clusterer

Returns an rtClust object.

E.D. Gennatas
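
A minimal sketch; arguments such as k are passed through to the underlying clusterer:

## Not run:
cl <- clust(iris[, 1:4], clust = "kmeans", k = 3)
## End(Not run)
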
Extract coefficients from Hybrid Additive Tree leaves

## S3 method for class 'lihad'
coef(object, newdata, verbose = FALSE, trace = 0, ...)

object: lihad object
newdata: matrix/data.frame of features
verbose: Logical: If TRUE, print output to console
trace: Integer, 0:2: Increase verbosity
...: Not used

E.D. Gennatas

Convert a color to grayscale

col2grayscale(x, what = c("color", "decimal"))

x: Color to convert to grayscale
what: Character: "color" returns a hexadecimal color; "decimal" returns a decimal between 0 and 1

Uses the NTSC grayscale conversion: 0.299 * R + 0.587 * G + 0.114 * B

col2grayscale("red")
col2grayscale("red", "dec")

Convert a color that R understands into the corresponding hexadecimal code

col2hex(color)

color: Color(s) that R understands

E.D. Gennatas

col2hex(c("gray50", "skyblue"))

Collapse a data.frame to a vector by getting each column's max

colMax(x, na.rm = TRUE)

x: Matrix or data frame input
na.rm: Logical: passed to max()

E.D. Gennatas
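
A usage sketch:

colMax(mtcars)
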
Fade a color towards a target color

color_fade(x, to = "#000000", pct = 0.5)

x: Source color
to: Target color
pct: Numeric (0, 1): Fraction of the distance in RGBA space between x and to to travel

Returns color in hex notation.

E.D. Gennatas

Invert Color in RGB space

color_invertRGB(x)

x: Color, vector

Returns the inverted colors using hexadecimal notation #RRGGBBAA.

E.D. Gennatas

## Not run:
cols <- c("red", "green", "blue")
previewcolor(cols)
cols |>
  color_invertRGB() |>
  previewcolor()
## End(Not run)

Average colors

color_mean(x, space = c("RGB", "HSV"))

x: Color vector
space: Character: "RGB" or "HSV": Space to average in

E.D. Gennatas

## Not run:
color_mean(c("red", "blue")) |> previewcolor()
color_mean(c("red", "blue"), "HSV") |> previewcolor()
## End(Not run)

Order colors by RGB distance

color_order(x, start_with = 1, order_by = c("similarity", "dissimilarity"))

x: Vector of colors
start_with: Integer: Which color to output in first position
order_by: Character: "similarity" or "dissimilarity"

E.D. Gennatas

Separate colors by RGB distance

color_separate(x, start_with = 1)

x: Vector of colors
start_with: Integer: Which color to output in first position

Starting with the first color, defined by start_with, each next color is chosen to be at maximum distance from all preceding colors.

E.D. Gennatas

Get the squared RGB distance between two colors

color_sqdist(x, y)

x: Color
y: Color

E.D. Gennatas

color_sqdist("red", "green")
color_sqdist("#16A0AC", "#FA6E1E")

Modify alpha, hue, saturation, and value (HSV) of a color

colorAdjust(color, alpha = NULL, hue = 0, sat = 0, val = 0)

color: Input color. Any format that grDevices::col2rgb() recognizes
alpha: Numeric: Scale alpha by this amount. Future: replace with absolute setting
hue: Float: How much hue to add to color
sat: Float: How much saturation to add to color
val: Float: How much to increase the value of color

Returns the adjusted color.

E.D. Gennatas
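
A usage sketch, halving a color's alpha:

colorAdjust("red", alpha = 0.5)
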
Create a gradient of colors and optionally a colorbar

colorGrad(n = 21, colors = NULL, space = c("rgb", "Lab"), lo = "#18A3AC",
  lomid = NULL, mid = NULL, midhi = NULL, hi = "#F48024", preview = FALSE,
  colorbar = FALSE, cb.n = 21, cb.mar = c(1, 1, 1, 1), cb.add = FALSE,
  cb.add.mar = c(5, 0, 2, 5), cb.axis.pos = 1.1, cb.axis.las = 1,
  cb.axis.hadj = 0, cb.cex = 6, bar.min = -1, bar.mid = 0, bar.max = 1,
  cex = 1.2, filename = NULL, pdf.width = 3, pdf.height = 7,
  theme = getOption("rt.theme", "light"), bg = NULL, col.text = NULL,
  plotlycb = FALSE, plotly.width = 80, plotly.height = 500,
  rtrn.plotly = FALSE, margins = c(0, 0, 0, 0), pad = 0, par.reset = TRUE)

n: Integer: How many distinct colors you want. If not odd, converted to the next odd number
colors: Character: Acts as a shortcut to defining lo, mid, etc. at once (see Details)
space: Character: Which colorspace to use: "rgb" or "Lab". Default = "rgb". Recommendation: If ...
lo: Color for low end
lomid: Color for low-mid
mid: Color for middle of the range, or "mean", which will result in the mean of lo and hi
midhi: Color for middle-high
hi: Color for high end
preview: Logical: Plot the colors horizontally
colorbar: Logical: Create a vertical colorbar
cb.n: Integer: How many steps you would like in the colorbar
cb.mar: Vector, length 4: Colorbar margins. Default: c(1, 1, 1, 1)
cb.add: Logical: If TRUE, the colorbar will be added to an existing plot
cb.add.mar: Vector: Margins for colorbar
cb.axis.pos: Float: Position of axis
cb.axis.las: Integer, 0:3: Style of axis labels. 0: Always parallel to the axis; 1: Horizontal; 2: Perpendicular; 3: Vertical. Default = 1
cb.axis.hadj: Float: Adjustment parallel to the reading direction
cb.cex: Float: Character expansion factor for colorbar
bar.min: Numeric: Lowest value in colorbar
bar.mid: Numeric: Middle value in colorbar
bar.max: Numeric: Max value in colorbar
cex: Float: Character expansion for axis
filename: String, optional: Path to file to save colorbar
pdf.width: Float: Width for PDF output. Default = 3
pdf.height: Float: Height for PDF output. Default = 7
theme: Character: "light", "dark"
bg: Color: Background color
col.text: Color: Colorbar text color
plotlycb: Logical: Create colorbar using plotly
plotly.width: Float: Width for plotly colorbar. Default = 80
plotly.height: Float: Height for plotly colorbar. Default = 500
rtrn.plotly: Logical: If TRUE, return plotly object
margins: Vector: plotly margins. Default = c(0, 0, 0, 0)
pad: Float: Padding for plotly
par.reset: Logical: If TRUE (default), reset par settings before exiting

It is best to provide an odd number, so that there is always an equal number of colors on either side of the midpoint. For example, if you want a gradient from -1 to 1 or equivalent, n = 11 will give 5 colors on either side of 0, each representing a 20% step.

colors can be defined as a sequence of 3-letter color abbreviations of 2, 3, 4, or 5 colors, which will correspond to the values {"lo", "hi"}; {"lo", "mid", "hi"}; {"lo", "mid", "midhi", "hi"}; and {"lo", "lomid", "mid", "midhi", "hi"}, respectively. For example, try colorGrad(21, "blugrnblkredyel", colorbar = TRUE).

3-letter color abbreviations: wht: white; blk: black; red: red; grn: green; blu: blue; yel: yellow; rng: orange; prl: purple

Returns an invisible vector of hexadecimal colors, or a plotly object if rtrn.plotly = TRUE.

E.D. Gennatas

Color gradient for a continuous variable

colorGrad.x(x, color = c("gray20", "#18A3AC"), space = "Lab")

x: Float vector
color: Color vector, length 2
space: Character: "rgb" or "Lab". Default = "Lab"

E.D. Gennatas

Color gradient for a continuous variable

colorgradient.x(x, symmetric = FALSE, lo.col = "#0290EE", mid.col = "#1A1A1A",
  hi.col = "#FFBD4F", space = "Lab")

x: Float vector
symmetric: Logical: If TRUE, make a symmetric gradient between ...
lo.col: Low color
mid.col: Middle color
hi.col: High color
space: Character: "rgb" or "Lab". Default = "Lab"

E.D. Gennatas

## Not run:
x <- seq(-10, 10, length.out = 51)
previewcolor(colorgradient.x(x))
x <- sort(rnorm(40))
previewcolor(colorgradient.x(x, mid.col = "white"))
# Notice how most values are near zero, therefore almost white
## End(Not run)

Create an alternating sequence of graded colors

colorMix(color, n = 4)

color: List of two or more elements, each containing two colors. A gradient will be created from the first to the second color of each element
n: Integer: Number of steps in each gradient

E.D. Gennatas

color <- list(blue = c("#82afd3", "#000f3a"), gray = c("gray10", "gray85"))
previewcolor(desaturate(colorMix(color, 6), .3))
color <- list(blue = c("#82afd3", "#57000a"), gray = c("gray10", "gray85"))
previewcolor(desaturate(colorMix(color, 6), .3))
color <- list(blue = c("#82afd3", "#000f3a"), purple = c("#23001f", "#c480c1"))
previewcolor(desaturate(colorMix(color, 5), .3))

Invert a color or calculate the mean of two colors in HSV or RGB space. This may be useful in creating colors for plots

colorOp(col, fn = c("invert", "mean"), space = c("HSV", "RGB"))

col: Input color(s)
fn: Character: "invert" or "mean": Function to perform
space: Character: "HSV" or "RGB": Colorspace to operate in (for averaging only)

The average of two colors in RGB space will often pass through gray, which is likely undesirable. Averaging in HSV space is better for most applications.

Returns a color.

E.D. Gennatas

Convenience function to create a list out of data frame columns

cols2list(x)

x: Input: Will be coerced to data.frame, then each column will become an element of a list

E.D. Gennatas
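
A usage sketch:

cols2list(iris[1:3, ])
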
Define a complete predictive modeling pipeline and save it as a JSON file.

create_config(data_path, target = NULL, binclass_posid = 1, alg = "lightgbm",
  train.params = NULL,
  inner.resampling = setup.resample(resampler = "cv", n.resamples = 5),
  outer.resampling = setup.resample(resampler = "cv", n.resamples = 10),
  config.path = "rtemis-config.json", model.outdir = NULL,
  allow.overwrite = FALSE, verbose = TRUE)

data_path: Character: Path to data file. Can be any file recognized by read, commonly CSV, Excel, or RDS
target: Character: Name of the target variable in the data. If not specified, the last column of the data is used
alg: Character: Algorithm to use. Any of the available learners (see select_learn). Default = "lightgbm"
train.params: List: Parameters for the training algorithm
inner.resampling: List: Resampling method for the inner loop, i.e. hyperparameter tuning, a.k.a. model selection. Set using setup.resample
outer.resampling: List: Resampling method for the outer loop, i.e. testing. Set using setup.resample
config.path: Character: Path to save the configuration file
model.outdir: Character: Directory to save the trained model and associated files. If NULL, the directory of ... is used

Returns the config as a list, invisibly.

EDG
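
A minimal sketch; the data path and target column name are hypothetical:

## Not run:
create_config(
  data_path = "data/mydata.csv", # hypothetical dataset
  target = "outcome",            # hypothetical target column
  alg = "lightgbm",
  config.path = "rtemis-config.json"
)
## End(Not run)
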
Combine rules

crules(...)

...: Character: Rules

E.D. Gennatas

Train an Autoencoder using h2o::h2o.deeplearning

Check out the H2O Flow at [ip]:[port]; the default IP:port is "localhost:54321", so if running on localhost, point your web browser to localhost:54321.

d_H2OAE(x, x.test = NULL, x.valid = NULL, ip = "localhost", port = 54321,
  n.hidden.nodes = c(ncol(x), 3, ncol(x)),
  extract.layer = ceiling(length(n.hidden.nodes)/2), epochs = 5000,
  activation = "Tanh", loss = "Automatic", input.dropout.ratio = 0,
  hidden.dropout.ratios = rep(0, length(n.hidden.nodes)), learning.rate = 0.005,
  learning.rate.annealing = 1e-06, l1 = 0, l2 = 0, stopping.rounds = 50,
  stopping.metric = "AUTO", scale = TRUE, center = TRUE, n.cores = rtCores,
  verbose = TRUE, save.mod = FALSE, outdir = NULL, ...)

x: Vector / Matrix / Data Frame: Training set predictors
x.test: Vector / Matrix / Data Frame: Testing set predictors
x.valid: Vector / Matrix / Data Frame: Validation set predictors
ip: Character: IP address of H2O server. Default = "localhost"
port: Integer: Port number for server. Default = 54321
n.hidden.nodes: Integer vector of length equal to the number of hidden layers you wish to create
extract.layer: Integer: Which layer to extract. For a regular autoencoder, this is the middle layer. Default = ceiling(length(n.hidden.nodes)/2)
epochs: Integer: How many times to iterate through the dataset. Default = 5000
activation: Character: Activation function to use: "Tanh" (default), "TanhWithDropout", "Rectifier", "RectifierWithDropout", "Maxout", "MaxoutWithDropout"
loss: Character: "Automatic" (default), "CrossEntropy", "Quadratic", "Huber", "Absolute"
input.dropout.ratio: Float (0, 1): Dropout ratio for inputs
hidden.dropout.ratios: Vector, Float (0, 2): Dropout ratios for hidden layers
learning.rate: Float: Learning rate. Default = .005
learning.rate.annealing: Float: Learning rate annealing. Default = 1e-06
l1: Float (0, 1): L1 regularization (introduces sparseness, i.e. sets many weights to 0; reduces variance, increases generalizability)
l2: Float (0, 1): L2 regularization (prevents very large absolute weights; reduces variance, increases generalizability)
stopping.rounds: Integer: Stop if the simple moving average of length stopping.rounds of the stopping.metric does not improve
stopping.metric: Character: Stopping metric to use: "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "lift_top_group", "misclassification", "mean_per_class_error". Default = "AUTO" ("logloss" for Classification, "deviance" for Regression)
scale: Logical: If TRUE, scale input before training autoencoder. Default = TRUE
center: Logical: If TRUE, center input before training autoencoder. Default = TRUE
n.cores: Integer: Number of cores to use
verbose: Logical: If TRUE, print summary to screen
save.mod: Logical: If TRUE, save all output to an RDS file in outdir
outdir: Path to output directory. If defined, will save Predicted vs. True plot, if available, as well as full model output, if save.mod is TRUE
...: Additional arguments to pass to h2o::h2o.deeplearning

Returns an rtDecom object.

E.D. Gennatas

Other Decomposition: d_H2OGLRM(), d_ICA(), d_Isomap(), d_KPCA(), d_LLE(), d_MDS(), d_NMF(), d_PCA(), d_SPCA(), d_SVD(), d_TSNE(), d_UMAP()

Other Deep Learning: s_H2ODL(), s_TFN()

Perform GLRM decomposition using h2o::h2o.glrm
Given Input matrix A
:
A(m x n) = X(m x k) \%*\% Y(k x n)
d_H2OGLRM( x, x.test = NULL, x.valid = NULL, k = 3, ip = "localhost", port = 54321, transform = "NONE", loss = "Quadratic", regularization.x = "None", regularization.y = "None", gamma.x = 0, gamma.y = 0, max_iterations = 1000, max_updates = 2 * max_iterations, init_step_size = 1, min_step_size = 1e-04, seed = -1, init = "PlusPlus", svd.method = "Randomized", verbose = TRUE, print.plot = TRUE, plot.theme = rtTheme, n.cores = rtCores, ... )
d_H2OGLRM( x, x.test = NULL, x.valid = NULL, k = 3, ip = "localhost", port = 54321, transform = "NONE", loss = "Quadratic", regularization.x = "None", regularization.y = "None", gamma.x = 0, gamma.y = 0, max_iterations = 1000, max_updates = 2 * max_iterations, init_step_size = 1, min_step_size = 1e-04, seed = -1, init = "PlusPlus", svd.method = "Randomized", verbose = TRUE, print.plot = TRUE, plot.theme = rtTheme, n.cores = rtCores, ... )
x |
Input data |
x.test |
Optional test set. Will be projected on to NMF basis |
x.valid |
Optional validation set |
k |
Integer: Rank of decomposition |
ip |
Character: IP address of H2O server. Default = "localhost" |
port |
Integer: Port number for server. Default = 54321 |
transform |
Character: Transformation of input prior to decomposition |
loss |
Character: Numeric loss function: "Quadratic", "Absolute", "Huber", "Poisson", "Hinge", "Logistic", "Periodic". Default = "Quadratic" |
regularization.x |
Character: Regularization function for X matrix: "None", "Quadratic", "L2", "L1", "NonNegative", "OneSparse", "UnitOneSparse", "Simplex". Default = "None" |
regularization.y |
Character: Regularization function for Y matrix: "None", "Quadratic", "L2", "L1", "NonNegative", "OneSparse", "UnitOneSparse", "Simplex". Default = "None" |
gamma.x |
Float: Regularization weight on X matrix. Default = 0 |
gamma.y |
Float: Regularization weight on Y matrix. Default = 0 |
max_iterations |
Integer: Maximum number of iterations. Default = 1000 |
max_updates |
Integer: Maximum number of updates. Default = 2 * |
init_step_size |
Float: Initial step size. Default = 1 |
min_step_size |
Float: Minimum step size. Default = .0001 |
seed |
Integer: Seed for random number generator. Default = -1 (time-based) |
init |
Character: Initialization mode: "Random", "SVD", "PlusPlus", "User". Default = "PlusPlus" |
svd.method |
Character: SVD method for initialization: "GramSVD", "Power", "Randomized". Default = "Randomized" |
verbose |
Logical: If TRUE, print console messages |
print.plot |
Logical: If TRUE, print objective score against iteration number |
plot.theme |
Character: Theme to pass to mplot3_xy if |
n.cores |
Integer: Number of cores to use |
... |
Additional parameters to be passed to |
Learn more about GLRM from the H2O tutorial https://github.com/h2oai/h2o-tutorials/blob/master/tutorials/glrm/glrm-tutorial.md
rtDecom
object
E.D. Gennatas
Other Decomposition:
d_H2OAE()
,
d_ICA()
,
d_Isomap()
,
d_KPCA()
,
d_LLE()
,
d_MDS()
,
d_NMF()
,
d_PCA()
,
d_SPCA()
,
d_SVD()
,
d_TSNE()
,
d_UMAP()
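A minimal usage sketch (not part of the package's documented examples); it assumes the h2o package is installed and a local H2O instance can be started on the default port:
## Not run:
# Rank-3 GLRM of the iris features; X and Y are estimated by h2o::h2o.glrm
x <- iris[, 1:4]
iris_glrm <- d_H2OGLRM(x, k = 3)
## End(Not run)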
Perform ICA decomposition using the fastICA algorithm in fastICA::fastICA
or
ica::fastica
d_ICA( x, k = 3, package = c("fastICA", "ica"), alg.type = "parallel", maxit = 100, scale = TRUE, center = TRUE, verbose = TRUE, trace = 0, ... )
x |
Input data |
k |
Integer vector of length 1 or greater. Rank of decomposition |
package |
Character: Which package to use for ICA. "fastICA" will
use |
alg.type |
Character: For |
maxit |
Integer: Maximum N of iterations |
scale |
Logical: If TRUE, scale input data before decomposition. |
center |
Logical: If TRUE, also center input data if |
verbose |
Logical: If TRUE, print messages to screen. Default = TRUE |
trace |
Integer: If > 0, print messages during ICA run. Default = 0 |
... |
Additional parameters to be passed to |
Project scaled variables to ICA components. Input must be n by p, where n represents number of cases, and p represents number of features. fastICA will be applied to the transpose of the n x p matrix. fastICA will fail if there are any NA values or constant features: remove them using preprocess
rtDecom
object
E.D. Gennatas
Other Decomposition:
d_H2OAE()
,
d_H2OGLRM()
,
d_Isomap()
,
d_KPCA()
,
d_LLE()
,
d_MDS()
,
d_NMF()
,
d_PCA()
,
d_SPCA()
,
d_SVD()
,
d_TSNE()
,
d_UMAP()
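A minimal usage sketch, assuming the fastICA package is installed; the iris features contain no NA values or constant columns, so no preprocessing is needed:
## Not run:
x <- iris[, 1:4]
iris_ica <- d_ICA(x, k = 2, package = "fastICA")
## End(Not run)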
Perform ISOMAP decomposition using vegan::isomap
d_Isomap( x, k = 2, dist.method = "euclidean", nsd = 0, path = c("shortest", "extended"), center = TRUE, scale = TRUE, verbose = TRUE, n.cores = rtCores, ... )
x |
Input data |
k |
Integer vector of length 1 or greater. Rank of decomposition |
dist.method |
Character: Distance calculation method. See |
nsd |
Integer: Number of shortest dissimilarities retained |
path |
Character: The |
center |
Logical: If TRUE, center data prior to decomposition. Default = TRUE |
scale |
Logical: If TRUE, scale data prior to decomposition. Default = TRUE |
verbose |
Logical: If TRUE, print messages to output |
n.cores |
Integer: Number of cores to use |
... |
Additional parameters to be passed to |
Project scaled variables to ISOMAP components. Input must be n by p, where n represents number of cases, and p represents number of features. ISOMAP will be applied to the transpose of the n x p matrix.
rtDecom
object
E.D. Gennatas
Other Decomposition:
d_H2OAE()
,
d_H2OGLRM()
,
d_ICA()
,
d_KPCA()
,
d_LLE()
,
d_MDS()
,
d_NMF()
,
d_PCA()
,
d_SPCA()
,
d_SVD()
,
d_TSNE()
,
d_UMAP()
Perform kernel PCA decomposition using kernlab::kpca
d_KPCA( x, x.test = NULL, k = 2, th = 1e-04, kernel = "rbfdot", kpar = NULL, center = TRUE, scale = TRUE, verbose = TRUE, ... )
x |
Input data |
x.test |
Optional test set. Will be projected on to KPCA basis |
k |
Integer vector of length 1 or greater. N of components to return
If set to 0, |
th |
Threshold for eigenvalue below which PCs are ignored if |
kernel |
Character: Type of kernel to use. See |
kpar |
List of hyperparameters: See |
center |
Logical: If TRUE, center data prior to decomposition. Default = TRUE |
scale |
Logical: If TRUE, scale data prior to decomposition. Default = TRUE |
verbose |
Logical: If TRUE, print messages to screen. Default = TRUE |
... |
Additional parameters to be passed to |
Project scaled variables to KPCA components. Input must be n by p, where n represents number of cases, and p represents number of features. KPCA will be applied to the transpose of the n x p matrix.
rtDecom
object
E.D. Gennatas
Other Decomposition:
d_H2OAE()
,
d_H2OGLRM()
,
d_ICA()
,
d_Isomap()
,
d_LLE()
,
d_MDS()
,
d_NMF()
,
d_PCA()
,
d_SPCA()
,
d_SVD()
,
d_TSNE()
,
d_UMAP()
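A minimal usage sketch, assuming the kernlab package is installed:
## Not run:
x <- iris[, 1:4]
iris_kpca <- d_KPCA(x, k = 2, kernel = "rbfdot")
## End(Not run)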
Perform LLE decomposition using RDRToolbox::lle
d_LLE(x, k = 2, nn = 6, verbose = TRUE)
x |
Input data |
k |
Integer: dimensionality of the embedding |
nn |
Integer: Number of neighbors. |
verbose |
Logical: If TRUE, print messages to screen. Default = TRUE |
Project scaled variables to LLE components. Input must be n by p, where n represents number of cases, and p represents number of features.
rtDecom
object
E.D. Gennatas
Other Decomposition:
d_H2OAE()
,
d_H2OGLRM()
,
d_ICA()
,
d_Isomap()
,
d_KPCA()
,
d_MDS()
,
d_NMF()
,
d_PCA()
,
d_SPCA()
,
d_SVD()
,
d_TSNE()
,
d_UMAP()
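A minimal usage sketch, assuming the RDRToolbox package (Bioconductor) is installed:
## Not run:
x <- as.matrix(iris[, 1:4])
iris_lle <- d_LLE(x, k = 2, nn = 6)
## End(Not run)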
Perform MDS decomposition using stats::cmdscale
d_MDS( x, k = 2, dist.method = c("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski"), eig = FALSE, add = FALSE, x.ret = FALSE, scale = TRUE, center = TRUE, verbose = TRUE, ... )
x |
Input data |
k |
Integer vector of length 1 or greater. Rank of decomposition |
dist.method |
Character: method to use to calculate distance. See |
eig |
Logical: If TRUE, return eigenvalues. Default = FALSE |
add |
Logical: If TRUE, an additive constant |
x.ret |
Logical: If TRUE, return the doubly centered symmetric distance matrix. Default = FALSE |
scale |
Logical: If TRUE, scale input data before decomposition. Default = TRUE |
center |
Logical: If TRUE, also center input data if |
verbose |
Logical: If TRUE, print messages to screen. Default = TRUE |
... |
Additional parameters to be passed to |
Project scaled variables to MDS components. Input must be n by p, where n represents number of cases, and p represents number of features. cmdscale will be applied to the transpose of the n x p matrix. cmdscale will fail if there are any NA values or constant features: remove them using preprocess
rtDecom
object
E.D. Gennatas
Other Decomposition:
d_H2OAE()
,
d_H2OGLRM()
,
d_ICA()
,
d_Isomap()
,
d_KPCA()
,
d_LLE()
,
d_NMF()
,
d_PCA()
,
d_SPCA()
,
d_SVD()
,
d_TSNE()
,
d_UMAP()
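A minimal usage sketch using base R's cmdscale via d_MDS:
x <- iris[, 1:4]
iris_mds <- d_MDS(x, k = 2, dist.method = "euclidean")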
Perform NMF decomposition using NMF::nmf
d_NMF( x, x.test = NULL, k = 2, method = "brunet", nrun = 30, scale = TRUE, center = FALSE, verbose = TRUE, ... )
x |
Input data |
x.test |
Optional test set. Will be projected on to NMF basis |
k |
Integer vector of length 1 or greater. Rank of decomposition |
method |
NMF method. Defaults to "brunet". See |
nrun |
Integer: Number of runs to perform |
scale |
Logical: If TRUE, scale input data before projecting |
center |
Logical: If TRUE, also center input data if |
verbose |
Logical: If TRUE, print messages to screen. Default = TRUE |
... |
Additional parameters to be passed to |
Project scaled variables to NMF bases. Input must be n by p, where n represents number of cases, and p represents number of features. NMF will be applied to the transpose of the n x p matrix.
rtDecom
object
E.D. Gennatas
Other Decomposition:
d_H2OAE()
,
d_H2OGLRM()
,
d_ICA()
,
d_Isomap()
,
d_KPCA()
,
d_LLE()
,
d_MDS()
,
d_PCA()
,
d_SPCA()
,
d_SVD()
,
d_TSNE()
,
d_UMAP()
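A minimal usage sketch, assuming the NMF package is installed; note the input must be non-negative, which holds for the iris features:
## Not run:
x <- iris[, 1:4]
iris_nmf <- d_NMF(x, k = 2, nrun = 5)
## End(Not run)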
Perform PCA decomposition using stats::prcomp
d_PCA( x, x.test = NULL, k = NULL, scale = TRUE, center = TRUE, verbose = TRUE, ... )
x |
Input matrix |
x.test |
Optional test set. Will be projected on to PCA basis |
k |
Integer: Number of right singular vectors to compute ( |
scale |
Logical: If TRUE, scale input data before doing SVD |
center |
Logical: If TRUE, also center input data if |
verbose |
Logical: If TRUE, print messages to screen. Default = TRUE |
... |
Additional parameters to be passed to |
Same solution as d_SVD. d_PCA runs prcomp, which has useful summary output
rtDecom
object
E.D. Gennatas
Other Decomposition:
d_H2OAE()
,
d_H2OGLRM()
,
d_ICA()
,
d_Isomap()
,
d_KPCA()
,
d_LLE()
,
d_MDS()
,
d_NMF()
,
d_SPCA()
,
d_SVD()
,
d_TSNE()
,
d_UMAP()
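A minimal usage sketch using stats::prcomp via d_PCA:
x <- iris[, 1:4]
iris_pca <- d_PCA(x, k = 2)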
Perform sparse and/or non-negative PCA or cumulative PCA decomposition
using nsprcomp::nsprcomp
or nsprcomp::nscumcomp
respectively
d_SPCA( x, x.test = NULL, k = 1, nz = floor(0.5 * NCOL(x)), nneg = FALSE, gamma = 0, method = c("cumulative", "vanilla"), scale = TRUE, center = TRUE, verbose = TRUE, ... )
x |
Input matrix |
x.test |
Optional test set. Will be projected on to SPCA basis |
k |
Integer vector of length 1 or greater. N of components to return
If set to 0, |
nz |
Integer: Upper bound on non-zero loadings. See |
nneg |
Logical: If TRUE, calculate non-negative loadings only. Default = FALSE |
gamma |
Float (>0): Penalty on the divergence from orthonormality of the pseudo-rotation matrix. Default = 0, i.e. no penalty. May need to increase with collinear features. |
method |
Character: "cumulative" or "vanilla" sparse PCA. Default = "cumulative" |
scale |
Logical: If TRUE, scale input data before projecting. Default = TRUE |
center |
Logical: If TRUE, also center input data if |
verbose |
Logical: If TRUE, print messages to screen. Default = TRUE |
... |
Additional parameters to be passed to |
Project scaled variables to sparse and/or non-negative PCA components. Input must be n by p, where n represents number of cases, and p represents number of features. SPCA will be applied to the transpose of the n x p matrix.
rtDecom
object
E.D. Gennatas
Other Decomposition:
d_H2OAE()
,
d_H2OGLRM()
,
d_ICA()
,
d_Isomap()
,
d_KPCA()
,
d_LLE()
,
d_MDS()
,
d_NMF()
,
d_PCA()
,
d_SVD()
,
d_TSNE()
,
d_UMAP()
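A minimal usage sketch, assuming the nsprcomp package is installed:
## Not run:
x <- iris[, 1:4]
iris_spca <- d_SPCA(x, k = 2, nneg = TRUE)
## End(Not run)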
Perform SVD decomposition using base::svd
d_SVD( x, x.test = NULL, k = 2, nu = 0, scale = TRUE, center = TRUE, verbose = TRUE, ... )
x |
Input matrix |
x.test |
Optional test set matrix. Will be projected on to SVD bases |
k |
Integer: Number of right singular vectors to compute ( |
nu |
Integer: Number of left singular vectors to compute |
scale |
Logical: If TRUE, scale input data before doing SVD. Default = TRUE |
center |
Logical: If TRUE, also center input data if |
verbose |
Logical: If TRUE, print messages to screen. Default = TRUE |
... |
Additional parameters to be passed to |
Same solution as d_PCA
rtDecom
object
E.D. Gennatas
Other Decomposition:
d_H2OAE()
,
d_H2OGLRM()
,
d_ICA()
,
d_Isomap()
,
d_KPCA()
,
d_LLE()
,
d_MDS()
,
d_NMF()
,
d_PCA()
,
d_SPCA()
,
d_TSNE()
,
d_UMAP()
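A minimal usage sketch using base::svd via d_SVD:
x <- as.matrix(iris[, 1:4])
iris_svd <- d_SVD(x, k = 2)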
Perform t-SNE decomposition using Rtsne::Rtsne
d_TSNE( x, k = 3, initial.dims = 50, perplexity = 15, theta = 0, check.duplicates = TRUE, pca = TRUE, max.iter = 1000, scale = FALSE, center = FALSE, is.distance = FALSE, verbose = TRUE, outdir = "./", ... )
x |
Input matrix |
k |
Integer. Number of t-SNE components required |
initial.dims |
Integer: Number of dimensions to retain in initial PCA. Default = 50 |
perplexity |
Numeric: Perplexity parameter |
theta |
Float: 0.0 for exact t-SNE; increase for higher speed at lower accuracy. Default = 0 |
check.duplicates |
Logical: If TRUE, check whether duplicates are present. It is best to check for duplicates manually and set this to FALSE |
pca |
Logical: If TRUE, perform initial PCA step. Default = TRUE |
max.iter |
Integer: Maximum number of iterations. Default = 1000 |
scale |
Logical: If TRUE, scale before running t-SNE using |
center |
Logical: If TRUE, and |
is.distance |
Logical: If TRUE, |
verbose |
Logical: If TRUE, print messages to output |
outdir |
Path to output directory |
... |
Options for |
rtDecom
object
E.D. Gennatas
Other Decomposition:
d_H2OAE()
,
d_H2OGLRM()
,
d_ICA()
,
d_Isomap()
,
d_KPCA()
,
d_LLE()
,
d_MDS()
,
d_NMF()
,
d_PCA()
,
d_SPCA()
,
d_SVD()
,
d_UMAP()
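A minimal usage sketch, assuming the Rtsne package is installed; duplicate rows are removed first, since check.duplicates defaults to TRUE:
## Not run:
x <- unique(iris[, 1:4])
iris_tsne <- d_TSNE(x, k = 2, perplexity = 15)
## End(Not run)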
Perform UMAP decomposition using uwot::umap
d_UMAP( x, x.test = NULL, k = 2, n.neighbors = 15, init = "spectral", metric = c("euclidean", "cosine", "manhattan", "hamming", "categorical"), epochs = NULL, learning.rate = 1, scale = TRUE, verbose = TRUE, ... )
x |
Input matrix |
x.test |
Optional test set matrix. Will be projected on to UMAP bases |
k |
Integer: Number of projections |
n.neighbors |
Integer: Number of neighbors |
init |
Character: Initialization type. See |
metric |
Character: Distance metric to use: "euclidean", "cosine", "manhattan", "hamming", "categorical". Default = "euclidean" |
epochs |
Integer: Number of epochs |
learning.rate |
Float: Learning rate. Default = 1 |
scale |
Logical: If TRUE, scale input data before doing UMAP. Default = TRUE |
verbose |
Logical: If TRUE, print messages to screen. Default = TRUE |
... |
Additional parameters to be passed to |
Updated 2023-12-09: See GitHub issue and related comment
rtDecom
object
E.D. Gennatas
Other Decomposition:
d_H2OAE()
,
d_H2OGLRM()
,
d_ICA()
,
d_Isomap()
,
d_KPCA()
,
d_LLE()
,
d_MDS()
,
d_NMF()
,
d_PCA()
,
d_SPCA()
,
d_SVD()
,
d_TSNE()
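A minimal usage sketch, assuming the uwot package is installed:
## Not run:
x <- iris[, 1:4]
iris_umap <- d_UMAP(x, k = 2, n.neighbors = 15)
## End(Not run)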
Convert a dataset to its b-spline basis set
dat2bsplinemat( x, df = NULL, knots = NULL, degree = 3L, intercept = FALSE, Boundary.knots = range(x, na.rm = TRUE), return.deriv = FALSE, as.data.frame = TRUE )
x |
data.frame: Input |
df |
Integer: Degrees of freedom. See |
knots |
Float, vector: Internal breakpoints. See |
degree |
Integer (>0): Degree of the piecewise polynomial. See |
intercept |
Logical: If TRUE, an intercept is included. Default = FALSE |
Boundary.knots |
Float, vector (length = 2): Boundary points to anchor the spline basis. |
return.deriv |
Logical: If TRUE, return list containing a data frame with the splines and another data frame with their derivatives |
as.data.frame |
Logical: If TRUE, return data.frame, otherwise matrix. Default = TRUE
See |
If return.deriv = FALSE, a data frame where each original feature is replaced with its basis set; otherwise, a list containing a data frame with the splines and a data frame with their derivatives
E.D. Gennatas
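A minimal usage sketch; with df = 5 and the default degree = 3 (via splines::bs), each feature should be replaced by 5 basis columns:
x <- data.frame(a = rnorm(100), b = rnorm(100))
x_bs <- dat2bsplinemat(x, df = 5)
dim(x_bs) # expected 100 x 10: 5 basis functions per original feature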
This is a convenience function that takes each column of the input, calculates powers 1:degree, and concatenates them into a data.frame of dimensions n * (degree * p), given an n * p input
dat2poly( dat, method = c("simple", "poly"), degree = 2, raw = FALSE, as.data.frame = TRUE )
dat |
Numeric, matrix / data.frame: Input |
method |
Character: "simple", "poly". "simple": raise each column of |
degree |
Integer: degree of polynomials to create. Default = 2 |
raw |
Logical: If TRUE, create simple polynomial, not orthogonalized. Default = FALSE |
as.data.frame |
Logical: If TRUE, return data.frame. Default = TRUE |
E.D. Gennatas
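A minimal usage sketch; with method = "simple" and degree = 2, the output should contain each original column and its square:
dat <- data.frame(a = rnorm(50), b = rnorm(50))
dat_poly <- dat2poly(dat, method = "simple", degree = 2)
dim(dat_poly) # expected 50 x 4, i.e. n x (degree * p)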
Convert Date to time bin factor.
date2factor( x, time_bin = c("year", "quarter", "month", "day"), make_bins = c("range", "present"), bin_range = range(x, na.rm = TRUE), ordered = FALSE )
x |
Date vector |
time_bin |
Character: "year", "quarter", "month", or "day" |
make_bins |
Character: "range" or "preseent". If "range" the factor levels will include all
time periods define by |
bin_range |
Date, vector, length 2: Range of dates to make levels for. Defaults to range of
input dates |
ordered |
Logical: If TRUE, factor output is ordered. Default = FALSE |
Order of levels will be chronological (important e.g. for plotting)
Additionally, can output ordered factor with ordered = TRUE
factor of time periods
E.D. Gennatas
## Not run:
library(data.table)
startDate <- as.Date("2018-01-01")
endDate <- as.Date("2020-12-31")
time <- sample(seq(startDate, endDate, length.out = 100))
date2factor(time)
date2factor(time, "quarter")
date2factor(time, "month")
date2factor(time, "day")
# range vs present
x <- sample(seq(as.Date("2018-01-01"), as.Date("2021-01-01"), by = 1), 10)
date2factor(x, time_bin = "quarter", make_bins = "present")
date2factor(x, time_bin = "quarter", make_bins = "range")
## End(Not run)
Date to year-month factor
date2ym(x, ordered = FALSE)
x |
Date vector |
ordered |
Logical: If TRUE, return ordered factor. Default = FALSE |
E.D. Gennatas
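A minimal usage sketch:
x <- as.Date(c("2020-01-15", "2020-07-01", "2021-03-20"))
date2ym(x)
date2ym(x, ordered = TRUE)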
Date to year-quarter factor
date2yq(x, ordered = FALSE)
x |
Date vector |
ordered |
Logical: If TRUE, return ordered factor. Default = FALSE |
E.D. Gennatas
Collect a table read with ddb_data(x, collect = FALSE)
ddb_collect(sql, progress = TRUE, returnobj = c("data.frame", "data.table"))
sql |
Character: DuckDB SQL query, usually output of
ddb_data with |
progress |
Logical: If TRUE, show progress bar |
returnobj |
Character: data.frame or data.table: class of object to return |
E.D. Gennatas
## Not run:
sql <- ddb_data("/Data/iris.csv", collect = FALSE)
ir <- ddb_collect(sql)
## End(Not run)
Lazy-read a CSV file, optionally filter rows, remove duplicates, clean column names, convert character to factor, and collect.
ddb_data( filename, datadir = NULL, sep = ",", header = TRUE, quotechar = "", ignore_errors = TRUE, make_unique = TRUE, select_columns = NULL, filter_column = NULL, filter_vals = NULL, character2factor = FALSE, collect = TRUE, progress = TRUE, returnobj = c("data.table", "data.frame"), data.table.key = NULL, clean_colnames = TRUE, verbose = TRUE )
filename |
Character: file name; either full path or just the file name,
if |
datadir |
Character: Optional path if |
sep |
Character: Field delimiter/separator |
header |
Logical: If TRUE, first line will be read as column names |
quotechar |
Character: Quote character |
ignore_errors |
Logical: If TRUE, ignore parsing errors (sometimes it's either this or no data) |
make_unique |
Logical: If TRUE, keep only unique rows |
select_columns |
Character vector: Column names to select |
filter_column |
Character: Name of column to filter on, e.g. "ID" |
filter_vals |
Numeric or Character vector: Values in
|
character2factor |
Logical: If TRUE, convert character columns to factors |
collect |
Logical: If TRUE, collect data and return structure class
as defined by |
progress |
Logical: If TRUE, print progress (no indication this works) |
returnobj |
Character: "data.frame" or "data.table" object class to
return. If "data.table", data.frame object returned from
|
data.table.key |
Character: If set, this corresponds to a column name in the dataset. This column will be set as the key in the data.table output |
clean_colnames |
Logical: If TRUE, clean colnames with clean_colnames |
verbose |
Logical: If TRUE, print messages to console |
E.D. Gennatas
## Not run:
ir <- ddb_data("/Data/massive_dataset.csv",
  filter_column = "ID",
  filter_vals = 8001:9999
)
## End(Not run)
2 Decimal places, otherwise scientific notation
ddSci(x, decimal.places = 2, hi = 1e+06, asNumeric = FALSE)
x |
Vector of numbers |
decimal.places |
Integer: Return this many decimal places. Default = 2 |
hi |
Float: Threshold at or above which scientific notation is used. Default = 1e+06 |
asNumeric |
Logical: If TRUE, convert to numeric before returning. Default = FALSE.
This will not force all numbers to print 2 decimal places. For example:
1.2035 becomes "1.20" if |
Numbers will be formatted to 2 decimal places, unless this results in 0.00 (e.g. if the input was .0032), in which case they will be converted to scientific notation with 2 significant figures. ddSci will return 0.00 if the input is exactly zero. This function can be used to format numbers in plots, on the console, in logs, etc.
Formatted number
E.D. Gennatas
x <- .34876549
ddSci(x) # "0.35"
x <- .00000000457823
ddSci(x) # "4.6e-09"
Convenience function to perform any rtemis decomposition
decom(x, decom = "PCA", verbose = TRUE, ...)
x |
Numeric matrix / data frame: Input data |
decom |
Character: Decomposer name. See select_decom. |
verbose |
Logical: if TRUE, print messages to console |
... |
Additional arguments to be passed to |
decom returns an R6 class object rtDecom
rtDecom
object
E.D. Gennatas
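A minimal usage sketch; additional arguments such as k are passed through to the selected decomposer:
x <- iris[, 1:4]
iris_decom <- decom(x, decom = "PCA", k = 2)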
Checks if dependencies can be loaded; names missing dependencies if not.
dependency_check(..., verbose = FALSE)
... |
List or vector of strings defining namespaces to be checked |
verbose |
Logical. If TRUE, print messages to console. Note: An error will always be printed if dependencies are missing. Setting this to FALSE stops it from printing "Dependencies check passed". |
E.D. Gennatas
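A minimal usage sketch using base namespaces that are always available:
dependency_check("stats", "utils", verbose = TRUE)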
Lower a color's saturation by a given percent in the HSV color system
desaturate(color, s = 0.3)
color |
Color, vector: Color(s) to operate on |
s |
Float: Decrease saturation by this fraction. Default = .3, which means if the saturation of a given color is 1, it will become .7 |
List of adjusted colors
E.D. Gennatas
color <- c("red", "green", "blue")
color.p <- desaturate(color)
Describe generic
describe(object, ...)
object |
object to describe |
... |
Additional arguments passed to |
E.D. Gennatas
Move data frame column
df_movecolumn(x, from, to = ncol(x))
x |
data.frame |
from |
String or Integer: Define which column holds the vector you want to move |
to |
Integer: Define which column number you want the vector to be moved to.
Default = |
E.D. Gennatas
mtcars_hp <- df_movecolumn(mtcars, "hp")
Extract rules from RF or GBM model, prune, and remove unnecessary rules using inTrees
distillTreeRules( mod, x, y = NULL, n.trees = NULL, maxdepth = 100, maxDecay = 0.05, typeDecay = 2, verbose = TRUE )
mod |
A trained RF or GBM model |
x |
The training set features |
y |
The training set outcomes. If NULL, assumed to be last column of |
n.trees |
Integer: Number of trees to extract |
maxdepth |
Integer: Max depth to consider |
maxDecay |
Float: See |
typeDecay |
Integer: See |
verbose |
Logical: If TRUE, print messages to output |
Models must be trained with s_RF or s_GBM
E.D. Gennatas
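A minimal sketch, assuming the inTrees package is installed and that s_RF accepts a data.frame with the outcome as its last column (the rtemis convention):
## Not run:
dat <- iris[51:150, ]
dat$Species <- droplevels(dat$Species)
mod <- s_RF(dat)
rules <- distillTreeRules(mod, x = dat[, 1:4], y = dat$Species)
## End(Not run)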
Plot AddTree trees trained with s_AddTree using data.tree::plot.Node
dplot3_addtree( addtree, col.positive = "#F48024DD", col.negative = "#18A3ACDD", node.col = "#666666", node.shape = "none", node.labels = TRUE, node.labels.pct.pos = NULL, pos.name = NULL, edge.col = "#999999", layout = "dot", rankdir = "TB", splines = "polyline", fontname = "helvetica", bg.color = "#ffffff", overlap = "false", prune = NULL, prune.empty.leaves = TRUE, remove.bad.parents = FALSE )
addtree |
Additive Tree object created by s_AddTree |
col.positive |
Color for outcome positive. |
col.negative |
Color for negative outcome. |
node.col |
Color for non-terminal leaves. |
node.shape |
Character: Node shape, passed to |
node.labels |
Logical: If |
node.labels.pct.pos |
Logical: If |
pos.name |
Character: Name for "positive" outcome. |
edge.col |
Color for edges. |
layout |
Character: Passed to |
rankdir |
Character: Passed to |
splines |
Character: Passed to |
fontname |
Character: Passed to |
bg.color |
Background color. |
overlap |
Character: Passed to |
prune |
Logical: If |
prune.empty.leaves |
Logical: If |
remove.bad.parents |
Logical: If TRUE, remove nodes with no siblings but children and give their children to their parent. |
Edge info and styles have been removed because of problems with DiagrammeR
E.D. Gennatas
Draw interactive barplots using plotly
dplot3_bar( x, main = NULL, xlab = NULL, ylab = NULL, col = NULL, alpha = 1, horizontal = FALSE, theme = rtTheme, palette = rtPalette, barmode = c("group", "relative", "stack", "overlay"), group.names = NULL, order.by.val = FALSE, ylim = NULL, hovernames = NULL, feature.names = NULL, font.size = 16, annotate = FALSE, annotate.col = theme$labs.col, legend = NULL, legend.col = NULL, legend.xy = c(1, 1), legend.orientation = "v", legend.xanchor = "left", legend.yanchor = "auto", hline = NULL, hline.col = NULL, hline.width = 1, hline.dash = "solid", hline.annotate = NULL, hline.annotation.x = 1, margin = list(b = 65, l = 65, t = 50, r = 10, pad = 0), automargin.x = TRUE, automargin.y = TRUE, padding = 0, displayModeBar = TRUE, modeBar.file.format = "svg", filename = NULL, file.width = 500, file.height = 500, file.scale = 1, trace = 0, ... )
x |
vector (possibly named), matrix, or data.frame: If matrix or data.frame, rows are groups (can be 1 row), columns are features |
main |
Character: Main plot title. |
xlab |
Character: x-axis label. |
ylab |
Character: y-axis label. |
col |
Color, vector: Color for bars. Default NULL, which will draw
colors from |
alpha |
Float (0, 1]: Transparency for bar colors. Default = .8 |
horizontal |
Logical: If TRUE, plot bars horizontally |
theme |
List or Character: Either the output of a |
palette |
Character: Name of rtemis palette to use.
Default = "rtCol1". Only used if |
barmode |
Character: Type of bar plot to make: "group", "relative", "stack", "overlay". Default = "group". Use "relative" for stacked bars, which handles negative values correctly, unlike "stack", as of writing. |
group.names |
Character, vector, length = NROW(x): Group names.
Default = NULL, which uses |
order.by.val |
Logical: If TRUE, order bars by increasing value. Only use for single group data. Default = FALSE |
ylim |
Float, vector, length 2: y-axis limits. |
hovernames |
Character, vector: Optional character vector to show on hover over each bar. |
feature.names |
Character, vector, length = NCOL(x): Feature names.
Default = NULL, which uses |
font.size |
Float: Font size for all labels. Default = 16 |
annotate |
Logical: If TRUE, annotate stacked bars |
annotate.col |
Color for annotations |
legend |
Logical: If TRUE, draw legend. Default = NULL, and will be turned on if there is more than one feature present |
legend.col |
Color: Legend text color. Default = NULL, determined by theme |
legend.xy |
Numeric, vector, length 2: x and y for plotly's legend |
legend.orientation |
"v" or "h" for vertical or horizontal |
legend.xanchor |
Character: Legend's x anchor: "left", "center", "right", "auto" |
legend.yanchor |
Character: Legend's y anchor: "top", "middle", "bottom", "auto" |
hline |
Float: If defined, draw a horizontal line at this y value. |
hline.col |
Color for |
hline.width |
Float: Width for |
hline.dash |
Character: Type of line to draw: "solid", "dot", "dash", "longdash", "dashdot", or "longdashdot" |
hline.annotate |
Character: Text of horizontal line annotation if
|
hline.annotation.x |
Numeric: x position to place annotation with paper as reference. 0: to the left of the plot area; 1: to the right of the plot area |
margin |
Named list: plot margins. |
automargin.x |
Logical: If TRUE, automatically set x-axis margins |
automargin.y |
Logical: If TRUE, automatically set y-axis margins |
padding |
Integer: N pixels to pad plot. |
displayModeBar |
Logical: If TRUE, show plotly's modebar |
modeBar.file.format |
Character: "svg", "png", "jpeg", "pdf" / any output file type supported by plotly and your system |
filename |
Character: Path to file to save static plot. Default = NULL |
file.width |
Integer: File width in pixels for when |
file.height |
Integer: File height in pixels for when |
file.scale |
Numeric: If saving to file, scale plot by this number |
trace |
Integer: The higher the number, the more diagnostic info is printed to the console |
... |
Additional arguments passed to theme |
E.D. Gennatas
## Not run:
dplot3_bar(VADeaths, legend.xy = c(0, 1))
dplot3_bar(VADeaths, legend.xy = c(1, 1), legend.xanchor = "left")
# simple individual bars
a <- c(4, 7, 2)
dplot3_bar(a)
# if input is a data.frame, each row is a group and each column is a feature
b <- data.frame(x = c(3, 5, 7), y = c(2, 1, 8), z = c(4, 5, 2))
rownames(b) <- c("Jen", "Ben", "Ren")
dplot3_bar(b)
# stacked
dplot3_bar(b, barmode = "stack")
## End(Not run)
Draw interactive boxplots or violin plots using plotly
dplot3_box( x, time = NULL, time.bin = c("year", "quarter", "month", "day"), type = c("box", "violin"), group = NULL, x.transform = c("none", "scale", "minmax"), main = NULL, xlab = "", ylab = NULL, col = NULL, alpha = 0.6, bg = NULL, plot.bg = NULL, theme = rtTheme, palette = rtPalette, boxpoints = "outliers", quartilemethod = "linear", xlim = NULL, ylim = NULL, violin.box = TRUE, orientation = "v", annotate_n = FALSE, annotate_n_y = 1, annotate_mean = FALSE, annotate_meansd = FALSE, annotate_meansd_y = 1, annotate.col = theme$labs.col, xnames = NULL, group.lines = FALSE, group.lines.dash = "dot", group.lines.col = NULL, group.lines.alpha = 0.5, labelify = TRUE, order.by.fn = NULL, font.size = 16, ylab.standoff = 18, legend = NULL, legend.col = NULL, legend.xy = NULL, legend.orientation = "v", legend.xanchor = "auto", legend.yanchor = "auto", xaxis.type = "category", cataxis_tickangle = "auto", margin = list(b = 65, l = 65, t = 50, r = 12, pad = 0), automargin.x = TRUE, automargin.y = TRUE, boxgroupgap = NULL, hovertext = NULL, show_n = FALSE, pvals = NULL, htest = "none", htest.compare = 0, htest.y = NULL, htest.annotate = TRUE, htest.annotate.x = 0, htest.annotate.y = -0.065, htest.star.col = theme$labs.col, htest.bracket.col = theme$labs.col, starbracket.pad = c(0.04, 0.05, 0.09), use.plotly.group = FALSE, width = NULL, height = NULL, displayModeBar = TRUE, modeBar.file.format = "svg", filename = NULL, file.width = 500, file.height = 500, file.scale = 1, ... )
x |
Vector or List of vectors: Input |
time |
Date or date-time vector |
time.bin |
Character: "year", "quarter", "month", or "day". Period to bin by |
type |
Character: "box" or "violin" |
group |
Factor to group by |
x.transform |
Character: "none", "scale", or "minmax" to use raw values, scaled and centered values or min-max normalized to 0-1, respectively. Transform is applied to each variable before grouping, so that groups are comparable |
main |
Character: Plot title. |
xlab |
Character: x-axis label. |
ylab |
Character: y-axis label. |
col |
Color, vector: Color for boxes. If NULL, which will draw
colors from |
alpha |
Float (0, 1]: Transparency for box colors. |
bg |
Color: Background color. Default = "white" |
plot.bg |
Color: Background color for plot area. |
theme |
Character: Theme to use: Run |
palette |
Character: Name of rtemis palette to use.
Default = "rtCol1". Only used if |
boxpoints |
Character or FALSE: "all", "suspectedoutliers", "outliers" See https://plotly.com/r/box-plots/#choosing-the-algorithm-for-computing-quartiles |
quartilemethod |
Character: "linear", "exclusive", "inclusive" |
xlim |
Numeric vector: x-axis limits |
ylim |
Numeric vector: y-axis limits |
violin.box |
Logical: If TRUE and type is "violin" show box within violin plot |
orientation |
Character: "v" or "h" for vertical, horizontal |
annotate_n |
Logical: If TRUE, annotate with N in each box |
annotate_n_y |
Numeric: y position for |
annotate_mean |
Logical: If TRUE, annotate with mean of each box |
annotate_meansd |
Logical: If TRUE, annotate with mean (SD) of each box |
annotate_meansd_y |
Numeric: y position for |
annotate.col |
Color for annotations |
xnames |
Character, vector, length = NROW(x): x-axis names. Default = NULL, which tries to set names appropriately |
group.lines |
Logical: If TRUE, add separating lines between groups of boxplots |
group.lines.dash |
Character: "solid", "dot", "dash", "longdash", "dashdot", or "longdashdot" |
group.lines.col |
Color for |
group.lines.alpha |
Numeric: transparency for |
labelify |
Logical: If TRUE, labelify x names |
order.by.fn |
Function: If defined, order boxes by increasing value of this function (e.g. median). |
font.size |
Float: Font size for all labels. |
ylab.standoff |
Numeric: Standoff for y-axis label |
legend |
Logical: If TRUE, draw legend. Default = TRUE |
legend.col |
Color: Legend text color. Default = NULL, determined by the theme |
legend.xy |
Float, vector, length 2: Relative x, y position for legend. |
legend.orientation |
"v" or "h" for vertical, horizontal |
legend.xanchor |
Character: Legend's x anchor: "left", "center", "right", "auto" |
legend.yanchor |
Character: Legend's y anchor: "top", "middle", "bottom", "auto" |
xaxis.type |
Character: "linear", "log", "date", "category", "multicategory" |
cataxis_tickangle |
Numeric: Angle for categorical axis tick labels |
margin |
Named list: plot margins.
Default = |
automargin.x |
Logical: If TRUE, automatically set x-axis margins |
automargin.y |
Logical: If TRUE, automatically set y-axis margins |
boxgroupgap |
Numeric: Sets the gap (in plot fraction) between boxes of the same location coordinate |
hovertext |
Character vector: Text to show on hover for each data point |
show_n |
Logical: If TRUE, show N in each box |
pvals |
Numeric vector: Precomputed p-values. Should correspond to each box.
Bypasses |
htest |
Character: e.g. "t.test", "wilcox.test" to compare each box to
the first box. If grouped, compare within each group to the first box.
If p-value of test is less than |
htest.compare |
Integer: 0: Compare all distributions against the first one;
2: Compare every second box to the one before it. Requires |
htest.y |
Numeric: y coordinate for |
htest.annotate |
Logical: if TRUE, include htest annotation |
htest.annotate.x |
Numeric: x-axis paper coordinate for htest annotation |
htest.annotate.y |
Numeric: y-axis paper coordinate for htest annotation |
htest.star.col |
Color for htest annotation stars |
htest.bracket.col |
Color for htest annotation brackets |
starbracket.pad |
Numeric: Padding for htest annotation brackets |
use.plotly.group |
If TRUE, use plotly's |
width |
Numeric: Force plot size to this width. Default = NULL, i.e. fill available space |
height |
Numeric: Force plot size to this height. Default = NULL, i.e. fill available space |
displayModeBar |
Logical: If TRUE, show plotly's modebar |
modeBar.file.format |
Character: "svg", "png", "jpeg", "pdf" |
filename |
Character: Path to file to save static plot. |
file.width |
Integer: File width in pixels for when |
file.height |
Integer: File height in pixels for when |
file.scale |
Numeric: If saving to file, scale plot by this number |
... |
Additional arguments passed to theme |
For multiple box plots, the recommendation is:
x = dat[, columnindex] for multiple variables of a data.frame
x = list(a = ..., b = ..., etc.) for multiple variables of potentially different length
x = split(var, group) for one variable with multiple groups: group names appear below boxplots
x = dat[, columnindex], group = factor for grouping multiple variables: group names appear in legend
If orientation == "h", xlab is applied to the y-axis and vice versa. Similarly, xaxis.type applies to the y-axis; this defaults to "category" and would not normally need changing.
E.D. Gennatas
## Not run:
# A.1 Box plot of 4 variables
dplot3_box(iris[, 1:4])
# A.2 Grouped Box plot
dplot3_box(iris[, 1:4], group = iris$Species)
dplot3_box(iris[, 1:4], group = iris$Species, annotate_n = TRUE)
# B. Boxplot binned by time periods
# Synthetic data with an instantaneous shift in distributions
set.seed(2021)
dat1 <- data.frame(alpha = rnorm(200, 0), beta = rnorm(200, 2), gamma = rnorm(200, 3))
dat2 <- data.frame(alpha = rnorm(200, 5), beta = rnorm(200, 8), gamma = rnorm(200, -3))
x <- rbind(dat1, dat2)
startDate <- as.Date("2019-12-04")
endDate <- as.Date("2021-03-31")
time <- seq(startDate, endDate, length.out = 400)
dplot3_box(x[, 1], time, "year", ylab = "alpha")
dplot3_box(x, time, "year", legend.xy = c(0, 1))
dplot3_box(x, time, "quarter", legend.xy = c(0, 1))
dplot3_box(x, time, "month",
  legend.orientation = "h",
  legend.xy = c(0, 1),
  legend.yanchor = "bottom"
)
# (Note how the boxplots widen when the period includes data from both dat1 and dat2)
## End(Not run)
Draw calibration plot
dplot3_calibration( true.labels, predicted.prob, n.bins = 10, bin.method = c("quantile", "equidistant"), pos.class = NULL, main = NULL, subtitle = NULL, xlab = "Mean predicted probability", ylab = "Empirical risk", show.marginal.x = TRUE, marginal.x.y = -0.02, marginal.col = NULL, marginal.size = 10, mode = "markers+lines", show.brier = TRUE, theme = rtTheme, filename = NULL, ... )
true.labels |
Factor or list of factors with true class labels |
predicted.prob |
Numeric vector or list of numeric vectors with predicted probabilities |
n.bins |
Integer: Number of windows to split the data into |
bin.method |
Character: "quantile" or "equidistant": Method to bin the estimated probabilities. |
pos.class |
Integer: Index of the positive class |
main |
Character: Main title |
subtitle |
Character: Subtitle, placed bottom right of plot |
xlab |
Character: x-axis label |
ylab |
Character: y-axis label |
show.marginal.x |
Logical: Add marginal plot of distribution of estimated probabilities |
marginal.x.y |
Numeric: Y position of marginal markers on x-axis |
marginal.col |
Color for marginal markers |
marginal.size |
Numeric: Size of marginal markers |
mode |
Character: "lines", "markers", "lines+markers": How to plot. |
show.brier |
Logical: If TRUE, add Brier scores to trace names. |
theme |
List or Character: Either the output of a |
filename |
Character: Path to save output. |
... |
Additional arguments passed to dplot3_xy |
EDG
## Not run:
data(segment_logistic, package = "probably")
# Plot the calibration curve of the original predictions
dplot3_calibration(
  true.labels = segment_logistic$Class,
  predicted.prob = segment_logistic$.pred_poor,
  n.bins = 10,
  pos.class = 2
)
# Plot the calibration curve of the calibrated predictions
dplot3_calibration(
  true.labels = segment_logistic$Class,
  predicted.prob = calibrate(
    segment_logistic$Class,
    segment_logistic$.pred_poor
  )$fitted.values,
  n.bins = 10,
  pos.class = 2
)
## End(Not run)
Plot rpart decision trees using data.tree::plot.Node
dplot3_cart( object, col.positive = "#F48024DD", col.negative = "#18A3ACDD", col.lo = "#80ffff", col.mid = "gray20", col.hi = "#F4A0FF", node.col = "#666666", node.shape = "none", node.labels = TRUE, node.cond = TRUE, node.prob = TRUE, node.estimate = NULL, node.n = TRUE, edge.col = "#999999", edge.width = 2, edge.labels = FALSE, arrowhead = "vee", layout = "dot", drop.leaves = FALSE, rankdir = "TB", splines = "polyline", fontname = "helvetica", bg.color = "white", overlap = "false", prune = FALSE, rpart.cp = NULL, verbose = TRUE )
object |
Either |
col.positive |
Color for outcome positive. |
col.negative |
Color for negative outcome. |
col.lo |
Low color for estimated outcome |
col.mid |
Middle color for estimated outcome |
col.hi |
High color for estimated outcome |
node.col |
Color for non-terminal leaves. |
node.shape |
Shape of node. Default = "none" |
node.labels |
Logical: If TRUE, print the node labels. |
node.cond |
Logical: If TRUE, print the splitting condition inside each node. |
node.prob |
Logical: If TRUE, print the probability estimate for the first class of the outcome inside each node. |
node.estimate |
Logical: If TRUE, print the estimated outcome level inside each node. |
node.n |
Logical: If TRUE, print the number of cases (from training data) that matched this condition |
edge.col |
Color for edges. |
edge.width |
Width of edges. |
edge.labels |
Logical: If TRUE, print the splitting condition on the edge. |
arrowhead |
Character: Arrowhead shape. |
layout |
Character: Passed to |
drop.leaves |
Logical: If TRUE, position leaves at the bottom of the plot. |
rankdir |
Character: Passed to |
splines |
Character: Passed to |
fontname |
Character: Passed to |
bg.color |
Background color. |
overlap |
Character: Passed to |
prune |
Logical: If TRUE, prune tree using |
rpart.cp |
Numeric: Complexity parameter for pruning. If NULL, no pruning is performed. |
verbose |
Logical: If TRUE, print messages. |
If you want to show split conditions as edge labels (edge.labels = TRUE), it is recommended to set rankdir = "LR" and node.cond = FALSE. Edge labels in graphviz are shown to the right of the edge when rankdir = "TB" and above it when rankdir = "LR".
E.D. Gennatas
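A minimal sketch, assuming the rpart package is installed, following the recommendation above for showing split conditions as edge labels:
## Not run:
library(rpart)
iris_cart <- rpart(Species ~ ., data = iris)
dplot3_cart(iris_cart, edge.labels = TRUE, rankdir = "LR", node.cond = FALSE)
## End(Not run)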
Plot confusion matrix
dplot3_conf( x, true.col = "#72CDF4", false.col = "#FEB2E0", pos.class = rtenv$binclasspos, font.size = 18, main = NULL, main.y = 1, main.yanchor = "bottom", theme = rtTheme, margin = list(l = 20, r = 5, b = 5, t = 20), filename = NULL, file.width = 500, file.height = 500, file.scale = 1, ... )
x |
Confusion matrix where rows are the reference and columns are the estimated classes or
rtemis |
true.col |
Color for true positives & true negatives |
false.col |
Color for false positives & false negatives |
pos.class |
Integer: Index of factor level to treat as the positive class |
font.size |
Integer: font size |
main |
Character: plot title |
main.y |
Numeric: y position of the title |
main.yanchor |
Character: y anchor of the title |
theme |
List or Character: Either the output of a |
margin |
List: Plot margins |
filename |
Character: Path to file to save static plot. |
file.width |
Integer: File width in pixels for when |
file.height |
Integer: File height in pixels for when |
file.scale |
Numeric: If saving to file, scale plot by this number |
... |
Additional arguments passed to theme function. |
A plotly object
EDG
## Not run:
true <- factor(c("a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b"))
predicted <- factor(c("a", "a", "b", "a", "b", "b", "a", "a", "b", "b", "a", "a"))
predicted.prob <- c(0.7, 0.55, 0.45, 0.62, 0.41, 0.32, 0.59, .63, .32, .21, .52, .58)
error <- mod_error(true, predicted, predicted.prob)
dplot3_conf(error)
## End(Not run)
A dplot3_xy wrapper for plotting true vs. predicted values
dplot3_fit(x, y, fit = "gam", se.fit = TRUE, ...)
x |
Numeric, vector/data.frame/list: True values. If y is NULL and
|
y |
Numeric, vector/data.frame/list: Predicted values |
fit |
Character: rtemis model to calculate |
se.fit |
Logical: If TRUE, draw the standard error of the fit |
... |
Additional arguments passed to dplot3_xy |
EDG
## Not run:
x <- rnorm(500)
y <- x + rnorm(500)
dplot3_fit(x, y)
## End(Not run)
Plot graph using networkD3
dplot3_graphd3( net, groups = NULL, color.scale = NULL, edge.col = NULL, node.col = NULL, node.alpha = 0.5, edge.alpha = 0.33, zoom = TRUE, legend = FALSE, palette = rtPalette, theme = rtTheme, ... )
net |
igraph network |
groups |
Vector, length n nodes indicating group/cluster/community membership of nodes in
|
color.scale |
D3 colorscale (e.g. |
edge.col |
Color for edges |
node.col |
Color for nodes |
node.alpha |
Float [0, 1]: Node opacity. Default = .5 |
edge.alpha |
Float [0, 1]: Edge opacity. Default = .33 |
zoom |
Logical: If TRUE, graph is zoomable. Default = TRUE |
legend |
Logical: If TRUE, display legend for groups |
palette |
Vector of colors, or Character defining a builtin palette - get options with
|
theme |
rtemis theme to use |
... |
Additional arguments to pass to |
E.D. Gennatas
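A minimal sketch, assuming the igraph and networkD3 packages are installed:
## Not run:
library(igraph)
set.seed(2024)
net <- sample_gnp(30, 0.15)
dplot3_graphd3(net)
## End(Not run)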
Interactive plotting of an igraph net using threejs
dplot3_graphjs( net, vertex.size = 1, vertex.col = NULL, vertex.label.col = NULL, vertex.label.alpha = 0.66, vertex.frame.col = NA, vertex.label = NULL, vertex.shape = "circle", edge.col = NULL, edge.alpha = 0.5, edge.curved = 0.35, edge.width = 2, layout = c("fr", "dh", "drl", "gem", "graphopt", "kk", "lgl", "mds", "sugiyama"), coords = NULL, layout_params = list(), cluster = NULL, groups = NULL, cluster_params = list(), cluster_mark_groups = TRUE, cluster_color_vertices = FALSE, main = "", theme = rtTheme, theme_extra_args = list(), palette = rtPalette, mar = rep(0, 4), par.reset = TRUE, filename = NULL, verbose = TRUE, ... )
net |
|
vertex.size |
Numeric: Vertex size |
vertex.col |
Color for vertices |
vertex.label.col |
Color for vertex labels |
vertex.label.alpha |
Numeric: transparency for |
vertex.frame.col |
Color for vertex border (frame) |
vertex.label |
Character vector: Vertex labels. Default = NULL, which will keep existing
names in |
vertex.shape |
Character, vector, length 1 or N nodes: Vertex shape.
See |
edge.col |
Color for edges |
edge.alpha |
Numeric: Transparency for edges |
edge.curved |
Numeric: Curvature of edges. Default = .35 |
edge.width |
Numeric: Edge thickness |
layout |
Character: one of: "fr", "dh", "drl", "gem", "graphopt", "kk", "lgl", "mds", "sugiyama", corresponding to all the available layouts in igraph |
coords |
Output of precomputed igraph layout. If provided,
|
layout_params |
List of parameters to pass to |
cluster |
Character: one of: "edge_betweenness", "fast_greedy", "infomap", "label_prop", "leading_eigen", "louvain", "optimal", "spinglass", "walktrap", corresponding to all the available igraph clustering functions |
groups |
Output of precomputed igraph clustering. If provided,
|
cluster_params |
List of parameters to pass to |
cluster_mark_groups |
Logical: If TRUE, draw polygons to indicate
clusters, if |
cluster_color_vertices |
Logical: If TRUE, color vertices by cluster membership |
main |
Character: main title |
theme |
rtemis theme to use |
theme_extra_args |
List of extra arguments to pass to the theme function
defined by |
palette |
Color vector or name of rtemis palette |
mar |
Numeric vector, length 4: |
par.reset |
Logical: If TRUE, reset par before exiting. Default = TRUE |
filename |
Character: If provided, save plot to this filepath |
verbose |
Logical, If TRUE, print messages to console. Default = TRUE |
... |
Extra arguments to pass to |
E.D. Gennatas
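A minimal sketch, assuming the igraph and threejs packages are installed; "louvain" selects igraph's cluster_louvain:
## Not run:
library(igraph)
set.seed(2024)
net <- sample_gnp(50, 0.1)
dplot3_graphjs(net, layout = "fr", cluster = "louvain")
## End(Not run)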
Draw interactive heatmaps using heatmaply
dplot3_heatmap( x, Rowv = TRUE, Colv = TRUE, cluster = FALSE, symm = FALSE, cellnote = NULL, colorGrad.n = 101, colors = NULL, space = "rgb", lo = "#18A3AC", lomid = NULL, mid = NULL, midhi = NULL, hi = "#F48024", k_row = 1, k_col = 1, grid.gap = 0, limits = NULL, margins = NULL, main = NULL, xlab = NULL, ylab = NULL, key.title = NULL, showticklabels = NULL, colorbar_len = 0.7, plot_method = "plotly", theme = rtTheme, row_side_colors = NULL, row_side_palette = NULL, col_side_colors = NULL, col_side_palette = NULL, font.size = NULL, padding = 0, displayModeBar = TRUE, modeBar.file.format = "svg", filename = NULL, file.width = 500, file.height = 500, file.scale = 1, ... )
x |
Input matrix |
Rowv |
Logical or dendrogram.
If Logical: Compute dendrogram and reorder rows. Defaults to TRUE
If dendrogram: use as is, without reordering
See more at |
Colv |
Logical or dendrogram.
If Logical: Compute dendrogram and reorder columns. Defaults to TRUE
If dendrogram: use as is, without reordering
See more at |
cluster |
Logical: If TRUE, set |
symm |
Logical: If TRUE, treat |
cellnote |
Matrix with values to be displayed on hover. Defaults to |
colorGrad.n |
Integer: Number of distinct colors to generate using colorGrad. Default = 101 |
colors |
Character: Acts as a shortcut to defining |
space |
Character: Which colorspace to use. Options: "rgb" or "Lab". Default = "rgb".
Recommendation: If mid is "white" or "black", use "rgb", otherwise "Lab" |
lo |
Color for low end |
lomid |
Color for low-mid |
mid |
Color for middle of the range or "mean", which will result in |
midhi |
Color for middle-high |
hi |
Color for high end |
k_row |
Integer: Desired number of groups by which to color dendrogram branches in the rows.
NA: determined automatically. See |
k_col |
Integer: Desired number of groups by which to color dendrogram branches in the columns.
NA: determined automatically. See |
grid.gap |
Integer: Space between cells. Default = 0 (no space) |
limits |
Float, length 2: Determine color range. Default = NULL, which automatically centers values around 0 |
margins |
Float, length 4: Heatmap margins. |
main |
Character: Plot title |
xlab |
Character: x-axis label |
ylab |
Character: y-axis label |
key.title |
Character: Title for the color key. |
showticklabels |
Logical: If TRUE, show tick labels. |
colorbar_len |
Numeric: Length of the colorbar. |
plot_method |
Character: "plotly" or "ggplot". Default = "plotly"; as of February 2021, "ggplot" may cause the R session to hang on macOS |
theme |
Character: "light", "dark" |
row_side_colors |
Data frame: Column names will be label names, cells
should be label colors. See |
row_side_palette |
Color palette function:
See |
col_side_colors |
Data frame: Column names will be label names, cells |
col_side_palette |
Color palette function:
See |
font.size |
Numeric: Font size |
padding |
Numeric: Padding between cells |
displayModeBar |
Logical: If TRUE, display the plotly mode bar |
modeBar.file.format |
Character: File format for image exports from the mode bar |
filename |
Character (optional): Path to file to save the plot |
file.width |
Numeric: Width of exported image |
file.height |
Numeric: Height of exported image |
file.scale |
Numeric: Scale of exported image |
... |
Additional arguments to be passed to |
E.D. Gennatas
## Not run:
x <- rnormmat(200, 20)
xcor <- cor(x)
dplot3_heatmap(xcor)
## End(Not run)
Plot interactive choropleth map using leaflet
dplot3_leaflet( fips, values, names = NULL, fillOpacity = 1, palette = NULL, color.mapping = c("Numeric", "Bin"), col.lo = "#0290EE", col.hi = "#FE4AA3", col.na = "#303030", col.highlight = "#FE8A4F", col.interpolate = c("linear", "spline"), col.bins = 21, domain = NULL, weight = 0.5, color = "black", alpha = 1, bg.tile.provider = leaflet::providers$Stamen.TonerBackground, bg.tile.alpha = 0.67, fg.tile.provider = leaflet::providers$Stamen.TonerLabels, legend.position = c("topright", "bottomright", "bottomleft", "topleft"), legend.alpha = 0.8, legend.title = NULL, init.lng = -98.5418083333333, init.lat = 39.2074138888889, init.zoom = 3, stroke = TRUE )
fips |
Character vector of FIPS codes. (If numeric, it will be appropriately zero-padded) |
values |
Values to map to |
names |
Character vector: Optional county names to appear on hover
along |
fillOpacity |
Float: Opacity for fill colors. Default = 1 |
palette |
Character: Color palette to use |
color.mapping |
Character: "Numeric" or "Bin" |
col.lo |
Overlay color mapped to lowest value |
col.hi |
Overlay color mapped to highest value |
col.na |
Color mapped to NA values |
col.highlight |
Hover border color. Default = "#FE8A4F" (orange) |
col.interpolate |
Character: "linear" or "spline" |
col.bins |
Integer: Number of color bins to create if color.mapping = "Bin". Default = 21 |
domain |
Limits for mapping colors to values. Default = NULL, which uses the range of the values |
weight |
Float: Weight of county border lines. Default = .5 |
color |
Color of county border lines. Default = "black" |
alpha |
Float: Overlay transparency. Default = 1 |
bg.tile.provider |
Background tile (below overlay colors), one of
|
bg.tile.alpha |
Float: Background tile transparency. Default = .67 |
fg.tile.provider |
Foreground tile (above overlay colors), one of
|
legend.position |
Character: One of: "topright", "bottomright", "bottomleft", "topleft". Default = "topright" |
legend.alpha |
Float: Legend box transparency. Default = .8 |
legend.title |
Character: Defaults to name of |
init.lng |
Float: Center map around this longitude (in decimal form). Default = -98.54180833333334 (US geographic center) |
init.lat |
Float: Center map around this latitude (in decimal form). Default = 39.207413888888894 (US geographic center) |
init.zoom |
Integer: Initial zoom level (depends on device, i.e. window, size). Default = 3 |
stroke |
Logical: If TRUE, draw polygon borders. Default = TRUE |
E.D. Gennatas
## Not run:
fips <- c(06075, 42101)
population <- c(874961, 1579000)
names <- c("SF", "Philly")
dplot3_leaflet(fips, population, names)
## End(Not run)
Plot a Linear Additive Tree trained by s_LINAD using visNetwork
dplot3_linad( x, main = NULL, bg = "#FFFFFF", shape = "box", nodelabels = TRUE, ncases.inlabels = TRUE, rules.on.edges = FALSE, top = NULL, root.col = "#202020", node.col = "#5a5a5a", leaf.col = "#178CCB", edge.col = "#848484", edge.width = 4, arrow.scale = 0.7, arrow.middle = FALSE, col.highlight = "#FE4AA3", node.font.col = NULL, edge.font.col = "#000000", sort.coefs = FALSE, height = NULL, width = NULL, levelSeparation = 100, tree.font.size = 22, edgethickness.by.ncases = FALSE, font.family = "Lato", uselog = FALSE, tooltip.coefs = TRUE, tooltip.delay = 50, table.font.size = "16px", table.dat.padding = "0px", table.lo.col = "#0290EE", table.hi.col = "#FE4AA3", dragNodes = FALSE, zoomView = FALSE, nodeSpacing = 150, blockShifting = TRUE, edgeMinimization = TRUE, parentCentralization = TRUE, direction = "UD", trace = 0 )
x |
|
main |
Character: Title. |
bg |
Background color. |
shape |
Character: Node shape; one of: "square", "triangle", "box", "circle", "dot", "star", "ellipse", "database", "text", "diamond". |
nodelabels |
Logical: If TRUE, include node labels. |
ncases.inlabels |
Logical: If TRUE, include number of cases with the node labels. |
rules.on.edges |
Logical: If TRUE, display rules on edges instead of nodes. |
top |
Integer: If not NULL, only show the top |
root.col |
Color for root node. |
node.col |
Color for nodes. |
leaf.col |
Color for leaf nodes. |
edge.col |
Color for edges. |
edge.width |
Numeric: Width for edges. |
arrow.scale |
Numeric: Scale factor for arrows. |
arrow.middle |
Logical: If TRUE, draw arrows in the middle of edges. |
col.highlight |
Color for surrounding edges when node is selected. |
node.font.col |
Color for node labels. Default varies by |
edge.font.col |
Color for edge labels. |
sort.coefs |
Logical: If TRUE, sort each coefs table. |
height |
Numeric: Height for |
width |
Numeric: Width for |
levelSeparation |
Numeric: N of pixels to separate tree levels. |
tree.font.size |
Integer: Font size for tree labels. Default = 22 |
edgethickness.by.ncases |
Logical: If TRUE, scale edge thickness by number of cases with weight = 1 |
font.family |
Character: Font to use throughout. Default = "Lato"; less common fonts may fail to render in a number of external viewers. |
uselog |
Logical: If TRUE, use log10 scale for coefficient colors. |
tooltip.coefs |
Logical: If TRUE, show html coefficient tables on hover over nodes. These use a custom html table creation function that replaced some much slower alternatives. |
tooltip.delay |
Numeric: Delay (in milliseconds) on mouse over before showing tooltip. |
table.font.size |
Character: Font size for html coefficient on-hover tables. |
table.dat.padding |
Character: html table padding. Currently has no visible effect and can be ignored. |
table.lo.col |
Color for lowest coefficient values (negative) |
table.hi.col |
Color for highest coefficient values (positive). |
dragNodes |
Logical: If TRUE, allow dragging nodes. |
zoomView |
Logical: If TRUE, allow zooming. |
nodeSpacing |
Numeric: Spacing between nodes. |
blockShifting |
Logical: If TRUE, allow block shifting. |
edgeMinimization |
Logical: If TRUE, minimize edge length. |
parentCentralization |
Logical: If TRUE, centralize parent nodes. |
direction |
Character: Direction of tree. One of: "UD", "DU", "LR", "RL". |
trace |
Integer: If > 0, print info to console (not particularly informative). |
E.D. Gennatas
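A minimal usage sketch, not from the package documentation: it assumes s_LINAD accepts a feature matrix and outcome vector like other s_* functions; the synthetic data are illustrative.

## Not run:
x <- rnormmat(500, 10)
y <- x[, 3] + x[, 5]^2 + rnorm(500)
# Train a Linear Additive Tree, then plot it
mod <- s_LINAD(x, y)
dplot3_linad(mod)
## End(Not run)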
Draw interactive pie charts using plotly
dplot3_pie( x, main = NULL, xlab = NULL, ylab = NULL, col = NULL, alpha = 0.8, bg = NULL, plot.bg = NULL, theme = getOption("rt.theme", "black"), palette = rtPalette, category.names = NULL, textinfo = "label+percent", font.size = 16, labs.col = NULL, legend = TRUE, legend.col = NULL, sep.col = NULL, margin = list(b = 50, l = 50, t = 50, r = 20), padding = 0, displayModeBar = TRUE, modeBar.file.format = "svg", filename = NULL, file.width = 500, file.height = 500, file.scale = 1, ... )
x |
data.frame: Input: Either a) 1 numeric column with categories defined by rownames, or b) two columns, the first with category names, the second numeric, or c) a numeric vector with categories defined using the category.names argument |
main |
Character: Plot title. Default = NULL, which results in colnames(x)[1] |
xlab |
Character: x-axis label. |
ylab |
Character: y-axis label. |
col |
Color, vector: Color for bars. Default NULL, which will draw
colors from |
alpha |
Float (0, 1]: Transparency for bar colors. Default = .8 |
bg |
Background color |
plot.bg |
Plot background color |
theme |
Character: "light", "dark". Default = |
palette |
Character: Name of rtemis palette to use.
Default = "rtCol1". Only used if |
category.names |
Character, vector, length = NROW(x): Category names. Default = NULL, which uses
either |
textinfo |
Character: Info to show over each slice: "label", "percent", or "label+percent". Default = "label+percent" |
font.size |
Float: Font size for all labels. Default = 16 |
labs.col |
Color of labels |
legend |
Logical: If TRUE, draw legend. Default = NULL, and will be turned on if there is more than one feature present |
legend.col |
Color: Legend text color. Default = NULL, determined by theme |
sep.col |
Separator color |
margin |
Named list: plot margins. |
padding |
Integer: N pixels to pad plot. |
displayModeBar |
Logical: If TRUE, show plotly's modebar |
modeBar.file.format |
Character: "svg", "png", "jpeg", "pdf" / any output file type supported by plotly and your system |
filename |
Character: Path to file to save static plot. Default = NULL |
file.width |
Integer: File width in pixels for when |
file.height |
Integer: File height in pixels for when |
file.scale |
Numeric: If saving to file, scale plot by this number |
... |
Additional arguments passed to theme |
E.D. Gennatas
## Not run:
dplot3_pie(VADeaths[, 1, drop = FALSE])
## End(Not run)
Plot the amino acid sequence with annotations
dplot3_protein( x, site = NULL, region = NULL, ptm = NULL, clv = NULL, variant = NULL, disease.variants = NULL, n.per.row = NULL, main = NULL, main.xy = c(0.055, 0.975), main.xref = "paper", main.yref = "paper", main.xanchor = "middle", main.yanchor = "top", layout = c("simple", "grid", "1curve", "2curve"), show.markers = TRUE, show.labels = TRUE, font.size = 18, label.col = NULL, scatter.mode = "markers+lines", marker.size = 28, marker.col = NULL, marker.alpha = 1, marker.symbol = "circle", line.col = NULL, line.alpha = 1, line.width = 2, show.full.names = TRUE, region.scatter.mode = "markers+lines", region.style = 3, region.marker.size = marker.size, region.marker.alpha = 0.6, region.marker.symbol = "circle", region.line.dash = "solid", region.line.shape = "line", region.line.smoothing = 1, region.line.width = 1, region.line.alpha = 0.6, theme = rtTheme, region.palette = rtPalette, region.outline.only = FALSE, region.outline.pad = 2, region.pad = 0.35, region.fill.alpha = 0.1666666, region.fill.shape = "line", region.fill.smoothing = 1, bpadcx = 0.5, bpadcy = 0.5, site.marker.size = marker.size, site.marker.symbol = marker.symbol, site.marker.alpha = 1, site.border.width = 1.5, site.palette = rtPalette, variant.col = "#FA6E1E", disease.variant.col = "#E266AE", showlegend.ptm = TRUE, ptm.col = NULL, ptm.symbol = "circle", ptm.offset = 0.12, ptm.pad = 0.35, ptm.marker.size = marker.size/4.5, clv.col = NULL, clv.symbol = "triangle-down", clv.offset = 0.12, clv.pad = 0.35, clv.marker.size = marker.size/4, annotate.position.every = 10, annotate.position.alpha = 0.5, annotate.position.ay = -0.4 * marker.size, position.font.size = font.size - 6, legend.xy = c(0.97, 0.954), legend.xanchor = "left", legend.yanchor = "top", legend.orientation = "v", legend.col = NULL, legend.bg = "#FFFFFF00", legend.border.col = "#FFFFFF00", legend.borderwidth = 0, legend.group.gap = 0, margin = list(b = 0, l = 0, t = 0, r = 0, pad = 0), showgrid.x = FALSE, showgrid.y = FALSE, automargin.x = TRUE, automargin.y = TRUE, xaxis.autorange = TRUE, yaxis.autorange = "reversed", scaleanchor.y = "x", scaleratio.y = 1, hoverlabel.align = "left", displayModeBar = TRUE, modeBar.file.format = "svg", scrollZoom = TRUE, filename = NULL, file.width = 1320, file.height = 990, file.scale = 1, width = NULL, height = NULL, verbosity = 1, ... )
x |
Character vector: amino acid sequence (1-letter abbreviations) OR a UniProt accession number (see Examples) |
site |
Named list of lists with indices of sites. These will be highlighted by coloring the border of markers |
region |
Named list of lists with indices of regions. These will be
highlighted by coloring the markers and lines of regions using the
|
ptm |
List of post-translational modifications |
clv |
List of cleavage sites |
variant |
List of variant information |
disease.variants |
List of disease variant information |
n.per.row |
Integer: Number of amino acids to show per row |
main |
Character: Main title |
main.xy |
Numeric vector, length 2: x and y coordinates for title.
e.g. if |
main.xref |
Character: xref for title |
main.yref |
Character: yref for title |
main.xanchor |
Character: xanchor for title |
main.yanchor |
Character: yanchor for title |
layout |
Character: "1curve", "grid": type of layout to use |
show.markers |
Logical: If TRUE, show amino acid markers |
show.labels |
Logical: If TRUE, annotate amino acids with elements |
font.size |
Integer: Font size for labels |
label.col |
Color for labels |
scatter.mode |
Character: Mode for scatter plot |
marker.size |
Integer: Size of markers |
marker.col |
Color for markers |
marker.alpha |
Numeric: Alpha for markers |
marker.symbol |
Character: Symbol for markers |
line.col |
Color for lines |
line.alpha |
Numeric: Alpha for lines |
line.width |
Numeric: Width for lines |
show.full.names |
Logical: If TRUE, show full names of amino acids |
region.scatter.mode |
Character: Mode for scatter plot |
region.style |
Integer: Style for regions |
region.marker.size |
Integer: Size of region markers |
region.marker.alpha |
Numeric: Alpha for region markers |
region.marker.symbol |
Character: Symbol for region markers |
region.line.dash |
Character: Dash for region lines |
region.line.shape |
Character: Shape for region lines |
region.line.smoothing |
Numeric: Smoothing for region lines |
region.line.width |
Numeric: Width for region lines |
region.line.alpha |
Numeric: Alpha for region lines |
theme |
Character: Theme to use: Run |
region.palette |
Named list of colors for regions |
region.outline.only |
Logical: If TRUE, only show outline of regions |
region.outline.pad |
Numeric: Padding for region outline |
region.pad |
Numeric: Padding for region |
region.fill.alpha |
Numeric: Alpha for region fill |
region.fill.shape |
Character: Shape for region fill |
region.fill.smoothing |
Numeric: Smoothing for region fill |
bpadcx |
Numeric: Padding for region border |
bpadcy |
Numeric: Padding for region border |
site.marker.size |
Integer: Size of site markers |
site.marker.symbol |
Character: Symbol for site markers |
site.marker.alpha |
Numeric: Alpha for site markers |
site.border.width |
Numeric: Width for site borders |
site.palette |
Named list of colors for sites |
variant.col |
Color for variants |
disease.variant.col |
Color for disease variants |
showlegend.ptm |
Logical: If TRUE, show legend for PTMs |
ptm.col |
Named list of colors for PTMs |
ptm.symbol |
Character: Symbol for PTMs |
ptm.offset |
Numeric: Offset for PTMs |
ptm.pad |
Numeric: Padding for PTMs |
ptm.marker.size |
Integer: Size of PTM markers |
clv.col |
Color for cleavage site annotations |
clv.symbol |
Character: Symbol for cleavage site annotations |
clv.offset |
Numeric: Offset for cleavage site annotations |
clv.pad |
Numeric: Padding for cleavage site annotations |
clv.marker.size |
Integer: Size of cleavage site annotation markers |
annotate.position.every |
Integer: Annotate every nth position |
annotate.position.alpha |
Numeric: Alpha for position annotations |
annotate.position.ay |
Numeric: Y offset for position annotations |
position.font.size |
Integer: Font size for position annotations |
legend.xy |
Numeric vector, length 2: x and y coordinates for legend |
legend.xanchor |
Character: xanchor for legend |
legend.yanchor |
Character: yanchor for legend |
legend.orientation |
Character: Orientation for legend |
legend.col |
Color for legend |
legend.bg |
Color for legend background |
legend.border.col |
Color for legend border |
legend.borderwidth |
Numeric: Width for legend border |
legend.group.gap |
Numeric: Gap between legend groups |
margin |
List: Margin settings |
showgrid.x |
Logical: If TRUE, show x grid |
showgrid.y |
Logical: If TRUE, show y grid |
automargin.x |
Logical: If TRUE, use automatic margin for x axis |
automargin.y |
Logical: If TRUE, use automatic margin for y axis |
xaxis.autorange |
Logical: If TRUE, use automatic range for x axis |
yaxis.autorange |
Logical or Character: Autorange setting for the y axis: TRUE, FALSE, or "reversed". Default = "reversed" |
scaleanchor.y |
Character: Scale anchor for y axis |
scaleratio.y |
Numeric: Scale ratio for y axis |
hoverlabel.align |
Character: Alignment for hover label |
displayModeBar |
Logical: If TRUE, display mode bar |
modeBar.file.format |
Character: File format for mode bar |
scrollZoom |
Logical: If TRUE, enable scroll zoom |
filename |
Character: File name to save plot |
file.width |
Integer: Width for saved file |
file.height |
Integer: Height for saved file |
file.scale |
Numeric: Scale for saved file |
width |
Integer: Width for plot |
height |
Integer: Height for plot |
verbosity |
Integer: If > 0, print messages to console. If > 1, print trace messages |
... |
Additional arguments to pass to the theme function |
A plotly object
E.D. Gennatas
## Not run:
tau <- seqinr::read.fasta(
  "https://rest.uniprot.org/uniprotkb/P10636.fasta",
  seqtype = "AA"
)
dplot3_protein(as.character(tau[[1]]))
# or directly using the UniProt accession number:
dplot3_protein("P10636")
## End(Not run)
Plot 1 - p-values as a barplot
dplot3_pvals( x, xnames = NULL, yname = NULL, p.adjust.method = "none", pval.hline = 0.05, hline.col = "#FE4AA3", hline.dash = "dash", ... )
x |
Float, vector: p-values |
xnames |
Character, vector: feature names |
yname |
Character: outcome name |
p.adjust.method |
Character: method for p.adjust. Default = "none" |
pval.hline |
Float: Significance level at which to plot horizontal line. Default = .05 |
hline.col |
Color for |
hline.dash |
Character: type of line to draw. Default = "dash" |
... |
Additional arguments passed to dplot3_bar |
E.D. Gennatas
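A minimal usage sketch, not from the package documentation; the p-values and feature names are made up.

## Not run:
pvals <- c(0.001, 0.02, 0.08, 0.4)
dplot3_pvals(pvals, xnames = paste0("Feature_", seq_along(pvals)))
## End(Not run)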
Draw interactive spectrograms using plotly
dplot3_spectrogram( x, y, z, colorGrad.n = 101, colors = NULL, xlab = "Time", ylab = "Frequency", zlab = "Power", hover.xlab = xlab, hover.ylab = ylab, hover.zlab = zlab, zmin = NULL, zmax = NULL, zauto = TRUE, hoverlabel.align = "right", colorscale = "Jet", colorbar.y = 0.5, colorbar.yanchor = "middle", colorbar.xpad = 0, colorbar.ypad = 0, colorbar.len = 0.75, colorbar.title.side = "bottom", showgrid = FALSE, space = "rgb", lo = "#18A3AC", lomid = NULL, mid = NULL, midhi = NULL, hi = "#F48024", grid.gap = 0, limits = NULL, main = NULL, key.title = NULL, showticklabels = NULL, theme = rtTheme, font.size = NULL, padding = 0, displayModeBar = TRUE, modeBar.file.format = "svg", filename = NULL, file.width = 500, file.height = 500, file.scale = 1, ... )
x |
Numeric: Time |
y |
Numeric: Frequency |
z |
Numeric: Power |
colorGrad.n |
Integer: Number of distinct colors to generate using colorGrad. Default = 101 |
colors |
Character: Acts as a shortcut to defining |
xlab |
Character: x-axis label |
ylab |
Character: y-axis label |
zlab |
Character: z-axis label |
hover.xlab |
Character: x-axis label for hover |
hover.ylab |
Character: y-axis label for hover |
hover.zlab |
Character: z-axis label for hover |
zmin |
Numeric: Minimum value for color scale |
zmax |
Numeric: Maximum value for color scale |
zauto |
Logical: If TRUE, automatically set zmin and zmax |
hoverlabel.align |
Character: Alignment of hover labels |
colorscale |
Character: Color scale. Default = "Jet" |
colorbar.y |
Numeric: Y position of colorbar |
colorbar.yanchor |
Character: Y anchor of colorbar |
colorbar.xpad |
Numeric: X padding of colorbar |
colorbar.ypad |
Numeric: Y padding of colorbar |
colorbar.len |
Numeric: Length of colorbar |
colorbar.title.side |
Character: Side of colorbar title |
showgrid |
Logical: If TRUE, show grid |
space |
Character: Which colorspace to use. Options: "rgb" or "Lab". Default = "rgb".
Recommendation: If mid is "white" or "black", use "rgb", otherwise "Lab" |
lo |
Color for low end |
lomid |
Color for low-mid |
mid |
Color for middle of the range or "mean", which will result in |
midhi |
Color for middle-high |
hi |
Color for high end |
grid.gap |
Integer: Space between cells. Default = 0 (no space) |
limits |
Numeric, length 2: Determine color range. Default = NULL, which automatically centers values around 0 |
main |
Character: Plot title |
key.title |
Character: Title of the key |
showticklabels |
Logical: If TRUE, show tick labels |
theme |
Character: "light", "dark" |
font.size |
Numeric: Font size |
padding |
Numeric: Padding between cells |
displayModeBar |
Logical: If TRUE, display the plotly mode bar |
modeBar.file.format |
Character: File format for image exports from the mode bar |
filename |
Character (optional): Path to file to save the plot |
file.width |
Numeric: Width of exported image |
file.height |
Numeric: Height of exported image |
file.scale |
Numeric: Scale of exported image |
... |
Additional arguments to be passed to |
To set custom colors, set colorscale = NULL and define at a minimum lo and hi, optionally also lomid, mid, and midhi.
E.D. Gennatas
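A minimal usage sketch, not from the package documentation: it assumes z is supplied as a matrix of power values with one row per frequency and one column per time point, the usual convention for plotly heatmap-style traces; the synthetic signal is illustrative.

## Not run:
time <- seq(0, 10, length.out = 200)
frequency <- seq(1, 40, length.out = 100)
# Power matrix: rows = frequencies, columns = time points
power <- outer(frequency, time, function(f, t) abs(sin(f * t / 5)))
dplot3_spectrogram(time, frequency, power)
## End(Not run)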
Draw an html table using plotly
dplot3_table( x, .ddSci = TRUE, main = NULL, main.col = "black", main.x = 0, main.xanchor = "auto", fill.col = "#18A3AC", table.bg = "white", bg = "white", line.col = "white", lwd = 1, header.font.col = "white", table.font.col = "gray20", font.size = 14, font.family = "Helvetica Neue", margin = list(l = 0, r = 5, t = 30, b = 0, pad = 0) )
x |
data.frame: Table to draw |
.ddSci |
Logical: If TRUE, apply ddSci to numeric columns. |
main |
Character: Table title. |
main.col |
Color: Title color. |
main.x |
Float [0, 1]: Align title: 0: left, .5: center, 1: right. |
main.xanchor |
Character: "auto", "left", "right": plotly's layout xanchor for title. Default = "auto" |
fill.col |
Color: Used to fill header with column names and first column with row names. |
table.bg |
Color: Table background. |
bg |
Color: Background. |
line.col |
Color: Line color. |
lwd |
Float: Line width. Default = 1 |
header.font.col |
Color: Header font color. |
table.font.col |
Color: Table font color. |
font.size |
Integer: Font size. |
font.family |
Character: Font family. |
margin |
List: plotly's margins. |
E.D. Gennatas
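A minimal usage sketch, not from the package documentation; the data frame contents are made up.

## Not run:
df <- data.frame(
  Model = c("GLM", "CART", "RF"),
  MSE = c(1.42, 1.18, 0.92)
)
dplot3_table(df, main = "Model comparison")
## End(Not run)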
Draw interactive timeseries plots using plotly
dplot3_ts( x, time, window = 7L, group = NULL, roll.fn = c("mean", "median", "max", "none"), roll.col = NULL, roll.alpha = 1, roll.lwd = 2, roll.name = NULL, alpha = NULL, align = "center", group.names = NULL, xlab = "Time", n.xticks = 12, scatter.type = "scatter", legend = TRUE, x.showspikes = TRUE, y.showspikes = FALSE, spikedash = "solid", spikemode = "across", spikesnap = "hovered data", spikecolor = NULL, spikethickness = 1, displayModeBar = TRUE, modeBar.file.format = "svg", theme = rtTheme, palette = rtPalette, filename = NULL, file.width = 500, file.height = 500, file.scale = 1, ... )
x |
Numeric vector of values to plot or list of vectors |
time |
Numeric or Date vector of time corresponding to values of |
window |
Integer: Width of rolling window over which to apply roll.fn. Default = 7 |
group |
Factor defining groups |
roll.fn |
Character: "mean", "median", "max", or "sum": Function to apply on
rolling windows of |
roll.col |
Color for rolling line |
roll.alpha |
Numeric: transparency for rolling line |
roll.lwd |
Numeric: width of rolling line |
roll.name |
Rolling function name (for annotation) |
alpha |
Numeric [0, 1]: Transparency |
align |
Character: "center", "right", or "left" |
group.names |
Character vector of group names |
xlab |
Character: x-axis label |
n.xticks |
Integer: number of x-axis ticks to use (approximately) |
scatter.type |
Character: "scatter" or "lines" |
legend |
Logical: If TRUE, show legend |
x.showspikes |
Logical: If TRUE, show x-axis spikes on hover |
y.showspikes |
Logical: If TRUE, show y-axis spikes on hover |
spikedash |
Character: dash type string ("solid", "dot", "dash", "longdash", "dashdot", or "longdashdot") or a dash length list in px (eg "5px,10px,2px,2px") |
spikemode |
Character: If "toaxis", spike line is drawn from the data point to the axis the series is plotted on. If "across", the line is drawn across the entire plot area, and supercedes "toaxis". If "marker", then a marker dot is drawn on the axis the series is plotted on |
spikesnap |
Character: "data", "cursor", "hovered data". Determines whether spikelines are stuck to the cursor or to the closest datapoints. |
spikecolor |
Color for spike lines |
spikethickness |
Numeric: spike line thickness |
displayModeBar |
Logical: If TRUE, display plotly's modebar |
modeBar.file.format |
Character: modeBar image export file format |
theme |
Character: theme name or list of theme parameters |
palette |
Character: palette name, or list of colors |
filename |
Character: Path to filename to save plot |
file.width |
Numeric: image export width |
file.height |
Numeric: image export height |
file.scale |
Numeric: image export scale |
... |
Additional arguments to be passed to dplot3_xy |
E.D. Gennatas
## Not run:
time <- sample(seq(as.Date("2020-03-01"), as.Date("2020-09-23"), length.out = 140))
x1 <- rnorm(140)
x2 <- rnorm(140, 1, 1.2)
# Single timeseries
dplot3_ts(x1, time)
# Multiple timeseries input as list
dplot3_ts(list(Alpha = x1, Beta = x2), time)
# Multiple timeseries grouped by group, different lengths
time1 <- sample(seq(as.Date("2020-03-01"), as.Date("2020-07-23"), length.out = 100))
time2 <- sample(seq(as.Date("2020-05-01"), as.Date("2020-09-23"), length.out = 140))
time <- c(time1, time2)
x <- c(rnorm(100), rnorm(140, 1, 1.5))
group <- c(rep("Alpha", 100), rep("Beta", 140))
dplot3_ts(x, time, 7, group)
## End(Not run)
Plot variable importance using plotly
dplot3_varimp( x, names = NULL, main = NULL, xlab = "Variable Importance", ylab = "", plot.top = 1, labelify = TRUE, col = NULL, alpha = 1, palette = NULL, mar = NULL, font.size = 16, axis.font.size = 14, theme = rtTheme, showlegend = TRUE, ... )
x |
Vector, numeric: Input |
names |
Vector, string: Names of features |
main |
Character: main title |
xlab |
Character: x-axis label |
ylab |
Character: y-axis label |
plot.top |
Float or Integer: If <= 1, plot this proportion of the highest absolute values; otherwise, plot this many top values. |
labelify |
Logical: If TRUE, convert names to human-readable labels |
col |
Vector, colors: Single value, or multiple values to define bar (feature) color(s) |
alpha |
Numeric: Transparency |
palette |
Character: Name of rtemis palette to use. |
mar |
Vector, numeric, length 4: Plot margins in pixels (NOT inches). Default = c(50, 110, 50, 50) |
font.size |
Integer: Overall font size to use (essentially for the title at this point). Default = 16 |
axis.font.size |
Integer: Font size to use for axis labels and tick labels
(Seems not to be in same scale as |
theme |
Output of an rtemis theme function (list of parameters) or theme
name. Use |
showlegend |
Logical: If TRUE, show legend |
... |
Additional arguments passed to theme |
A simple plotly wrapper to plot horizontal barplots, sorted by value, which can be used to visualize variable importance, model coefficients, etc.
E.D. Gennatas
# made-up data
x <- rnorm(10)
names(x) <- paste0("Feature_", seq(x))
dplot3_varimp(x)
Volcano Plot
dplot3_volcano( x, pvals, xnames = NULL, group = NULL, x.thresh = 0, p.thresh = 0.05, p.transform = function(x) -log10(x), p.adjust.method = c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"), legend = NULL, legend.lo = NULL, legend.hi = NULL, label.lo = "Low", label.hi = "High", main = NULL, xlab = NULL, ylab = NULL, margin = list(b = 65, l = 65, t = 50, r = 10, pad = 0), xlim = NULL, ylim = NULL, alpha = NULL, hline = NULL, hline.col = NULL, hline.width = 1, hline.dash = "solid", hline.annotate = NULL, hline.annotation.x = 1, annotate = TRUE, annotate.col = theme$labs.col, theme = rtTheme, font.size = 16, palette = NULL, legend.x.lo = NULL, legend.x.hi = NULL, legend.y = 0.97, annotate.n = 7, ax.lo = NULL, ay.lo = NULL, ax.hi = NULL, ay.hi = NULL, annotate.alpha = 0.7, hovertext = NULL, displayModeBar = FALSE, filename = NULL, file.width = 500, file.height = 500, file.scale = 1, verbose = TRUE, ... )
x |
Numeric vector: Input values, e.g. log2 fold change, coefficients, etc. |
pvals |
Numeric vector: p-values |
xnames |
Character vector: Names for the values in x |
group |
Factor: Used to color code points. If NULL, significant points
below |
x.thresh |
Numeric x-axis threshold separating low from high |
p.thresh |
Numeric: p-value threshold of significance. Default = .05 |
p.transform |
function. Default = |
p.adjust.method |
Character: p-value adjustment method. "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none" Default = "holm". Use "none" for raw p-values. |
legend |
Logical: If TRUE, show legend. Will default to FALSE, if
|
legend.lo |
Character: Legend to annotate significant points below the x.thresh threshold |
legend.hi |
Character: Legend to annotate significant points above the x.thresh threshold |
label.lo |
Character: label for low values |
label.hi |
Character: label for high values |
main |
Character: Main plot title. |
xlab |
Character: x-axis label |
ylab |
Character: y-axis label |
margin |
Named list of plot margins. Default = list(b = 65, l = 65, t = 50, r = 10, pad = 0) |
xlim |
Numeric vector, length 2: x-axis limits |
ylim |
Numeric vector, length 2: y-axis limits |
alpha |
Numeric: point transparency |
hline |
Numeric: If defined, draw a horizontal line at this y value. |
hline.col |
Color for |
hline.width |
Numeric: Width for |
hline.dash |
Character: Type of line to draw: "solid", "dot", "dash", "longdash", "dashdot", or "longdashdot" |
hline.annotate |
Character: Text of horizontal line annotation if
|
hline.annotation.x |
Numeric: x position to place annotation with paper as reference. 0: to the left of the plot area; 1: to the right of the plot area |
annotate |
Logical: If TRUE, annotate significant points |
annotate.col |
Color for annotations |
theme |
List or Character: Either the output of a |
font.size |
Float: Font size for all labels. Default = 16 |
palette |
Character: Name of rtemis palette to use.
Default = "rtCol1". Only used if |
legend.x.lo |
Numeric: x position of |
legend.x.hi |
Numeric: x position of |
legend.y |
Numeric: y position for |
annotate.n |
Integer: Number of significant points to annotate |
ax.lo |
Numeric: Sets the x component of the arrow tail about the arrow head for significant points below x.thresh |
ay.lo |
Numeric: Sets the y component of the arrow tail about the arrow head for significant points below x.thresh |
ax.hi |
Numeric: Sets the x component of the arrow tail about the arrow head for significant points above x.thresh |
ay.hi |
Numeric: Sets the y component of the arrow tail about the arrow head for significant points above x.thresh |
annotate.alpha |
Numeric: Transparency for annotations |
hovertext |
List of character vectors with hovertext to include for each group of markers |
displayModeBar |
Logical: If TRUE, show plotly's modebar |
filename |
Character: Path to file to save static plot. Default = NULL |
file.width |
Integer: File width in pixels for when |
file.height |
Integer: File height in pixels for when |
file.scale |
Numeric: If saving to file, scale plot by this number |
verbose |
Logical: If TRUE, print messages to console |
... |
Additional parameters passed to dplot3_xy |
E.D. Gennatas
## Not run:
set.seed(2019)
x <- rnormmat(500, 500)
y <- x[, 3] + x[, 5] - x[, 9] + x[, 15] + rnorm(500)
mod <- massGLM(y, x)
dplot3_volcano(mod$summary$`Coefficient y`, mod$summary$`p_value y`)
## End(Not run)
Draw interactive univariate plots using plotly
dplot3_x( x, type = c("density", "histogram"), mode = c("overlap", "ridge"), group = NULL, main = NULL, xlab = NULL, ylab = NULL, col = NULL, alpha = 0.75, plot.bg = NULL, theme = rtTheme, palette = rtPalette, axes.square = FALSE, group.names = NULL, font.size = 16, font.alpha = 0.8, legend = NULL, legend.xy = c(0, 1), legend.col = NULL, legend.bg = "#FFFFFF00", legend.border.col = "#FFFFFF00", bargap = 0.05, vline = NULL, vline.col = theme$fg, vline.width = 1, vline.dash = "dot", text = NULL, text.x = 1, text.xref = "paper", text.xanchor = "left", text.y = 1, text.yref = "paper", text.yanchor = "top", text.col = theme$fg, margin = list(b = 65, l = 65, t = 50, r = 10, pad = 0), automargin.x = TRUE, automargin.y = TRUE, zerolines = FALSE, density.kernel = "gaussian", density.bw = "SJ", histnorm = c("", "density", "percent", "probability", "probability density"), histfunc = c("count", "sum", "avg", "min", "max"), hist.n.bins = 20, barmode = "overlay", ridge.sharex = TRUE, ridge.y.labs = FALSE, ridge.order.on.mean = TRUE, displayModeBar = TRUE, modeBar.file.format = "svg", width = NULL, height = NULL, filename = NULL, file.width = 500, file.height = 500, file.scale = 1, ... )
x |
Numeric vector / data.frame / list: Input. If not a vector, each column (of a data.frame) or each element (of a list) is plotted as a separate group |
type |
Character: "density" or "histogram" |
mode |
Character: "overlap", "ridge". How to plot different groups; on the same axes ("overlap"), or on separate plots with the same x-axis ("ridge") |
group |
Vector: Will be converted to factor; levels define group members. Default = NULL |
main |
Character: Main plot title. |
xlab |
Character: x-axis label. |
ylab |
Character: y-axis label. |
col |
Color, vector: Color for bars. Default NULL, which will draw
colors from |
alpha |
Float (0, 1]: Transparency for bar colors. Default = .8 |
plot.bg |
Color: Background color for plot area |
theme |
List or Character: Either the output of a |
palette |
Character: Name of rtemis palette to use.
Default = "rtCol1". Only used if |
axes.square |
Logical: If TRUE: draw a square plot to fill the graphic device. Default = FALSE. Note: If TRUE, the device size at time of call is captured and height and width are set so as to draw the largest square available. This means that resizing the device window will not automatically resize the plot. |
group.names |
Character, vector, length = NROW(x): Group names.
Default = NULL, which uses |
font.size |
Float: Font size for all labels. Default = 16 |
font.alpha |
Float: Alpha transparency for font. |
legend |
Logical: If TRUE, draw legend. Default = NULL, which will be set to TRUE if x is a list of more than 1 element |
legend.xy |
Float, vector, length 2: Relative x, y position for legend. Default = c(0, 1), which places the legend top left within the plot area. Set to NULL to place legend top right beside the plot area |
legend.col |
Color: Legend text color. Default = NULL, determined by theme |
legend.bg |
Color: Background color for legend |
legend.border.col |
Color: Border color for legend |
bargap |
Float: The gap between adjacent histogram bars in plot fraction. |
vline |
Float, vector: If defined, draw a vertical line at this x value(s). Default = NULL |
vline.col |
Color for |
vline.width |
Float: Width for |
vline.dash |
Character: Type of line to draw: "solid", "dot", "dash", "longdash", "dashdot", or "longdashdot" |
text |
Character: If defined, add this text over the plot |
text.x |
Float: x-coordinate for |
text.xref |
Character: "x": |
text.xanchor |
Character: "auto", "left", "center", "right" |
text.y |
Float: y-coordinate for |
text.yref |
Character: "y": |
text.yanchor |
Character: "auto", "top", "middle", "bottom" |
text.col |
Color for |
margin |
Named list: plot margins. |
automargin.x |
Logical: If TRUE, automatically set x-axis margins |
automargin.y |
Logical: If TRUE, automatically set y-axis margins |
zerolines |
Logical: If TRUE, draw lines at y = 0. |
density.kernel |
Character: Kernel to use for density estimation. |
density.bw |
Character: Bandwidth to use for density estimation. |
histnorm |
Character: NULL, "percent", "probability", "density", "probability density" |
histfunc |
Character: "count", "sum", "avg", "min", "max". |
hist.n.bins |
Integer: Number of bins to use if type = "histogram". |
barmode |
Character: Type of bar plot to make: "group", "relative", "stack", "overlay". Default = "overlay". Use "relative" for stacked bars, which handles negative values correctly, unlike "stack", as of writing. |
ridge.sharex |
Logical: If TRUE, draw a single x-axis when mode = "ridge" |
ridge.y.labs |
Logical: If TRUE, show individual y-axis labels when mode = "ridge" |
ridge.order.on.mean |
Logical: If TRUE, order groups by mean value
when |
displayModeBar |
Logical: If TRUE, show plotly's modebar |
modeBar.file.format |
Character: "svg", "png", "jpeg", "pdf" / any output file type supported by plotly and your system |
width |
Float: Force plot size to this width. Default = NULL, i.e. fill available space |
height |
Float: Force plot size to this height. Default = NULL, i.e. fill available space |
filename |
Character: Path to file to save static plot. |
file.width |
Integer: File width in pixels for when |
file.height |
Integer: File height in pixels for when |
file.scale |
Numeric: If saving to file, scale plot by this number |
... |
Additional arguments passed to theme function. |
If input is data.frame, non-numeric variables will be removed
A plotly object
E.D. Gennatas
## Not run:
dplot3_x(iris)
dplot3_x(split(iris$Sepal.Length, iris$Species), xlab = "Sepal Length")
## End(Not run)
Plot timeseries data
dplot3_xt( x, y = NULL, x2 = NULL, y2 = NULL, which.xy = NULL, which.xy2 = NULL, shade.bin = NULL, shade.interval = NULL, shade.col = NULL, shade.x = NULL, shade.name = "", shade.showlegend = FALSE, ynames = NULL, y2names = NULL, xlab = NULL, ylab = NULL, y2lab = NULL, xunits = NULL, yunits = NULL, y2units = NULL, yunits.col = NULL, y2units.col = NULL, zt = NULL, show.zt = TRUE, show.zt.every = NULL, zt.nticks = 18L, main = NULL, main.y = 1, main.yanchor = "bottom", x.nticks = 0, y.nticks = 0, show.rangeslider = NULL, slider.start = NULL, slider.end = NULL, theme = rtTheme, palette = rtpalette(rtPalette), font.size = 16, yfill = "none", y2fill = "none", fill.alpha = 0.2, yline.width = 2, y2line.width = 2, x.showspikes = TRUE, spike.dash = "solid", spike.col = NULL, x.spike.thickness = -2, tickfont.size = 16, x.tickmode = "auto", x.tickvals = NULL, x.ticktext = NULL, x.tickangle = NULL, legend.x = 0, legend.y = 1.1, legend.xanchor = "left", legend.yanchor = "top", legend.orientation = "h", margin = list(l = 75, r = 75, b = 75, t = 75), x.standoff = 20L, y.standoff = 20L, y2.standoff = 20L, hovermode = "x", displayModeBar = TRUE, modeBar.file.format = "svg", scrollZoom = TRUE, filename = NULL, file.width = 960, file.height = 500, file.scale = 1, ... )
x |
Datetime vector or list of vectors OR object of class |
y |
Numeric vector or named list of vectors: y-axis data. |
x2 |
Datetime vector or list of vectors, optional: must be provided if |
y2 |
Numeric vector, optional: If provided, a second y-axis will be added to the right side of the plot |
which.xy |
Integer vector: Indices of |
which.xy2 |
Integer vector: Indices of |
shade.bin |
Integer vector 0, 1: Time points in |
shade.interval |
List of numeric vectors: Intervals to shade on the plot. Only set
|
shade.col |
Color: Color to shade intervals. |
shade.x |
Numeric vector: x-values to use for shading. |
shade.name |
Character: Name for shaded intervals. |
shade.showlegend |
Logical: If TRUE, show legend for shaded intervals. |
ynames |
Character vector, optional: Names for each vector in |
y2names |
Character vector, optional: Names for each vector in |
xlab |
Character: x-axis label. |
ylab |
Character: y-axis label. |
y2lab |
Character: y2-axis label. |
xunits |
Character: x-axis units. |
yunits |
Character: y-axis units. |
y2units |
Character: y2-axis units. |
yunits.col |
Color for y-axis units. |
y2units.col |
Color for y2-axis units. |
zt |
Numeric vector: Zeitgeber time. If provided, will be shown on the x-axis instead of x |
show.zt |
Logical: If TRUE, show zt on x-axis, if zt is provided. |
show.zt.every |
Integer: Show zt every |
zt.nticks |
Integer: Number of zt ticks to show. Only used if show.zt.every is NULL |
main |
Character: Main title. |
main.y |
Numeric: Y position of main title. |
main.yanchor |
Character: "top", "middle", "bottom". |
x.nticks |
Integer: Number of ticks on x-axis. |
y.nticks |
Integer: Number of ticks on y-axis. |
show.rangeslider |
Logical: If TRUE, show a range slider. |
slider.start |
Numeric: Start of range slider. |
slider.end |
Numeric: End of range slider. |
theme |
Character or list: Name of theme or list of plot parameters. |
palette |
Color list: will be used to draw each vector in |
font.size |
Numeric: Font size for text. |
yfill |
Character: Fill type for y-axis: "none", "tozeroy", "tonexty" |
y2fill |
Character: Fill type for y2-axis: "none", "tozeroy", "tonexty" |
fill.alpha |
Numeric: Fill opacity for y-axis. |
yline.width |
Numeric: Line width for y-axis lines. |
y2line.width |
Numeric: Line width for y2-axis lines. |
x.showspikes |
Logical: If TRUE, show spikes on x-axis. |
spike.dash |
Character: Dash type for spikes: "solid", "dot", "dash", "longdash", "dashdot", "longdashdot". |
spike.col |
Color for spikes. |
x.spike.thickness |
Numeric: Thickness of spikes. |
tickfont.size |
Numeric: Font size for tick labels. |
x.tickmode |
Character: "auto", "linear", "array". |
x.tickvals |
Numeric vector: Tick positions. |
x.ticktext |
Character vector: Tick labels. |
x.tickangle |
Numeric: Angle of tick labels. |
legend.x |
Numeric: X position of legend. |
legend.y |
Numeric: Y position of legend. |
legend.xanchor |
Character: "left", "center", "right". |
legend.yanchor |
Character: "top", "middle", "bottom". |
legend.orientation |
Character: "v" for vertical, "h" for horizontal. |
margin |
Named list with 4 numeric values: "l", "r", "t", "b" for left, right, top, bottom margins. |
x.standoff |
Numeric: Distance from x-axis to x-axis label. |
y.standoff |
Numeric: Distance from y-axis to y-axis label. |
y2.standoff |
Numeric: Distance from y2-axis to y2-axis label. |
hovermode |
Character: "closest", "x", "x unified" |
displayModeBar |
Logical: If TRUE, display plotly mode bar. |
modeBar.file.format |
Character: "png", "svg", "jpeg", "webp", "pdf": file format for mode bar image export. |
scrollZoom |
Logical: If TRUE, enable zooming by scrolling. |
filename |
Character: Path to file to save static plot. |
file.width |
Integer: File width in pixels for when |
file.height |
Integer: File height in pixels for when |
file.scale |
Numeric: If saving to file, scale plot by this number |
... |
Additional theme arguments. |
Note: we are switching to palette being a color vector instead of the name of a built-in palette.
A plotly object
EDG
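A minimal usage sketch, not from the package documentation: it assumes x accepts a datetime vector and y a named list of equal-length numeric vectors, per the argument descriptions above; the signals are illustrative.

## Not run:
x <- seq(as.POSIXct("2023-01-01 00:00"), by = "hour", length.out = 72)
y <- list(
  Sine = sin(seq_len(72) / 6),
  Noise = rnorm(72, sd = 0.4)
)
dplot3_xt(x, y, xlab = "Time", ylab = "Signal")
## End(Not run)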
Draw interactive scatter plots using plotly
dplot3_xy( x, y = NULL, fit = NULL, se.fit = FALSE, se.times = 1.96, include.fit.name = TRUE, cluster = NULL, cluster.params = list(k = 2), group = NULL, formula = NULL, rsq = TRUE, mode = "markers", order.on.x = NULL, main = NULL, subtitle = NULL, xlab = NULL, ylab = NULL, col = NULL, alpha = NULL, theme = rtTheme, palette = rtPalette, axes.square = FALSE, group.names = NULL, font.size = 16, marker.col = NULL, marker.size = 8, symbol = "circle", fit.col = NULL, fit.alpha = 0.8, fit.lwd = 2.5, se.col = NULL, se.alpha = 0.4, scatter.type = "scatter", show.marginal.x = FALSE, show.marginal.y = FALSE, marginal.x = x, marginal.y = y, marginal.x.y = NULL, marginal.y.x = NULL, marginal.col = NULL, marginal.alpha = 0.333, marginal.size = 10, legend = NULL, legend.xy = c(0, 0.98), legend.xanchor = "left", legend.yanchor = "auto", legend.orientation = "v", legend.col = NULL, legend.bg = "#FFFFFF00", legend.border.col = "#FFFFFF00", legend.borderwidth = 0, legend.group.gap = 0, x.showspikes = FALSE, y.showspikes = FALSE, spikedash = "solid", spikemode = "across", spikesnap = "hovered data", spikecolor = NULL, spikethickness = 1, margin = list(b = 65, l = 65, t = 50, r = 10, pad = 0), main.y = 1, main.yanchor = "bottom", subtitle.x = 0.02, subtitle.y = 0.99, subtitle.xref = "paper", subtitle.yref = "paper", subtitle.xanchor = "left", subtitle.yanchor = "top", automargin.x = TRUE, automargin.y = TRUE, xlim = NULL, ylim = NULL, axes.equal = FALSE, diagonal = FALSE, diagonal.col = NULL, diagonal.alpha = 0.2, fit.params = list(), vline = NULL, vline.col = theme$fg, vline.width = 1, vline.dash = "dot", hline = NULL, hline.col = theme$fg, hline.width = 1, hline.dash = "dot", hovertext = NULL, width = NULL, height = NULL, displayModeBar = TRUE, modeBar.file.format = "svg", scrollZoom = TRUE, filename = NULL, file.width = 500, file.height = 500, file.scale = 1, trace = 0, ... )
x |
Numeric, vector/data.frame/list: x-axis data. If y is NULL and
|
y |
Numeric, vector/data.frame/list: y-axis data |
fit |
Character: rtemis model to calculate |
se.fit |
Logical: If TRUE, draw the standard error of the fit |
se.times |
Draw polygon or lines at +/- |
include.fit.name |
Logical: If TRUE, include fit name in legend. |
cluster |
Character: Clusterer name. Will cluster
|
cluster.params |
List: Named list of parameters to pass to the
|
group |
Vector: Will be converted to factor; levels define group members. Default = NULL |
formula |
Formula: Provide a formula to be solved using s_NLS.
If provided, |
rsq |
Logical: If TRUE, print R-squared values in legend if |
mode |
Character, vector: "markers", "lines", "markers+lines". |
order.on.x |
Logical: If TRUE, order |
main |
Character: Main plot title. |
subtitle |
Character: Subtitle |
xlab |
Character: x-axis label. |
ylab |
Character: y-axis label. |
col |
Color for markers. Default=NULL, which will draw colors from palette |
alpha |
Float (0, 1]: Transparency for marker colors. Default = .8 |
theme |
List or Character: Either the output of a |
palette |
Character: Name of rtemis palette to use.
Default = "rtCol1". Only used if |
axes.square |
Logical: If TRUE: draw a square plot to fill the graphic device. Default = FALSE. Note: If TRUE, the device size at time of call is captured and height and width are set so as to draw the largest square available. This means that resizing the device window will not automatically resize the plot. |
group.names |
Character, vector, length = NROW(x): Group names.
Default = NULL, which uses |
font.size |
Float: Font size for all labels. Default = 16 |
marker.col |
Color for marker |
marker.size |
Numeric: Marker size. |
symbol |
Character: Marker symbol. |
fit.col |
Color: Color of the fit line. |
fit.alpha |
Float [0, 1]: Transparency for fit line |
fit.lwd |
Float: Fit line width |
se.col |
Color for |
se.alpha |
Alpha for |
scatter.type |
Character: "scatter", "scattergl", "scatter3d", "scatterternary", "scatterpolar", "scattermapbox", |
show.marginal.x |
Logical: If TRUE, add marginal distribution line markers on x-axis |
show.marginal.y |
Logical: If TRUE, add marginal distribution line markers on y-axis |
marginal.x |
Numeric: Data whose distribution will be shown on x-axis. Only
specify if different from |
marginal.y |
Numeric: Data whose distribution will be shown on y-axis. Only
specify if different from |
marginal.x.y |
Numeric: Y position of marginal markers on x-axis |
marginal.y.x |
Numeric: X position of marginal markers on y-axis |
marginal.col |
Color for marginal markers |
marginal.alpha |
Numeric: Alpha for marginal markers |
marginal.size |
Numeric: Size of marginal markers |
legend |
Logical: If TRUE, draw legend. Default = NULL, which will be set to
TRUE if there is more than one group, or |
legend.xy |
Numeric, vector, length 2: x and y for plotly's legend |
legend.xanchor |
Character: Legend's x anchor: "left", "center", "right", "auto" |
legend.yanchor |
Character: Legend's y anchor: "top", "middle", "bottom", "auto" |
legend.orientation |
"v" or "h" for vertical or horizontal |
legend.col |
Color: Legend text color. Default = NULL, determined by theme |
legend.bg |
Color: Background color for legend |
legend.border.col |
Color: Border color for legend |
legend.borderwidth |
Numeric: Border width for legend |
legend.group.gap |
Numeric: Gap between legend groups |
x.showspikes |
Logical: If TRUE, show spikes on x-axis |
y.showspikes |
Logical: If TRUE, show spikes on y-axis |
spikedash |
Character: Dash type for spikes |
spikemode |
Character: "across", "toaxis", "marker", or any combination of those
joined by |
spikesnap |
Character: "data", "cursor", "hovered data" |
spikecolor |
Color for spikes |
spikethickness |
Numeric: Thickness of spikes |
margin |
Named list: plot margins. |
main.y |
Numeric: Y position of main title |
main.yanchor |
Character: "top", "middle", "bottom" |
subtitle.x |
Numeric: X position of subtitle relative to paper |
subtitle.y |
Numeric: Y position of subtitle relative to paper |
subtitle.xref |
Character: "paper", "x", "y" |
subtitle.yref |
Character: "paper", "x", "y" |
subtitle.xanchor |
Character: "left", "center", "right" |
subtitle.yanchor |
Character: "top", "middle", "bottom" |
automargin.x |
Logical: If TRUE, automatically set x-axis margins |
automargin.y |
Logical: If TRUE, automatically set y-axis margins |
xlim |
Float vector, length 2: x-axis limits |
ylim |
Float, vector, length 2: y-axis limits. |
axes.equal |
Logical: Should axes be equal? Defaults to FALSE |
diagonal |
Logical: If TRUE, draw diagonal line. |
diagonal.col |
Color: Color for |
diagonal.alpha |
Float: Alpha for |
fit.params |
List: Arguments for learner defined by |
vline |
Float, vector: If defined, draw a vertical line at this x value(s). Default = NULL |
vline.col |
Color for |
vline.width |
Float: Width for |
vline.dash |
Character: Type of line to draw: "solid", "dot", "dash", "longdash", "dashdot", or "longdashdot" |
hline |
Float: If defined, draw a horizontal line at this y value. |
hline.col |
Color for |
hline.width |
Float: Width for |
hline.dash |
Character: Type of line to draw: "solid", "dot", "dash", "longdash", "dashdot", or "longdashdot" |
hovertext |
List of character vectors with hovertext to include for each group of markers |
width |
Numeric: Force plot size to this width. Default = NULL, i.e. fill available space |
height |
Numeric: Force plot size to this height. Default = NULL, i.e. fill available space |
displayModeBar |
Logical: If TRUE, show plotly's modebar |
modeBar.file.format |
Character: "svg", "png", "jpeg", "pdf" / any output file type supported by plotly and your system |
scrollZoom |
Logical: If TRUE, enable scroll zoom |
filename |
Character: Path to file to save static plot. Default = NULL |
file.width |
Integer: File width in pixels for when |
file.height |
Integer: File height in pixels for when |
file.scale |
Numeric: If saving to file, scale plot by this number |
trace |
Integer: The higher the number, the more diagnostic info is printed to the console |
... |
Additional arguments passed to theme |
use theme$tick.labels.col for both tick color and tick label color - this may change
E.D. Gennatas
## Not run: dplot3_xy(iris$Sepal.Length, iris$Petal.Length, fit = "gam", se.fit = TRUE, group = iris$Species ) ## End(Not run)
Draw interactive 3D plots using plotly
dplot3_xyz( x, y = NULL, z = NULL, fit = NULL, cluster = NULL, cluster.params = list(k = 2), group = NULL, formula = NULL, rsq = TRUE, mode = "markers", order.on.x = NULL, main = NULL, xlab = NULL, ylab = NULL, zlab = NULL, col = NULL, alpha = 0.8, bg = NULL, plot.bg = NULL, theme = rtTheme, palette = rtPalette, axes.square = FALSE, group.names = NULL, font.size = 16, marker.col = NULL, marker.size = 8, fit.col = NULL, fit.alpha = 0.7, fit.lwd = 2.5, tick.font.size = 12, spike.col = NULL, legend = NULL, legend.xy = c(0, 1), legend.xanchor = "left", legend.yanchor = "auto", legend.orientation = "v", legend.col = NULL, legend.bg = "#FFFFFF00", legend.border.col = "#FFFFFF00", legend.borderwidth = 0, legend.group.gap = 0, margin = list(t = 30, b = 0, l = 0, r = 0), fit.params = list(), width = NULL, height = NULL, padding = 0, displayModeBar = TRUE, modeBar.file.format = "svg", trace = 0, filename = NULL, file.width = 500, file.height = 500, file.scale = 1, ... )
x |
Numeric, vector/data.frame/list: x-axis data. If y is NULL and
|
y |
Numeric, vector/data.frame/list: y-axis data |
z |
Numeric, vector/data.frame/list: z-axis data |
fit |
Character: rtemis model to calculate |
cluster |
Character: Clusterer name. Will cluster
|
cluster.params |
List: Named list of parameters to pass to the
|
group |
Vector: Will be converted to factor; levels define group members. Default = NULL |
formula |
Formula: Provide a formula to be solved using s_NLS.
If provided, |
rsq |
Logical: If TRUE, print R-squared values in legend if |
mode |
Character, vector: "markers", "lines", "markers+lines". |
order.on.x |
Logical: If TRUE, order |
main |
Character: Main plot title. |
xlab |
Character: x-axis label. |
ylab |
Character: y-axis label. |
zlab |
Character: z-axis label |
col |
Color for markers. Default=NULL, which will draw colors from palette |
alpha |
Float (0, 1]: Transparency for marker colors. Default = .8 |
bg |
Background color |
plot.bg |
Plot background color |
theme |
List or Character: Either the output of a |
palette |
Character: Name of rtemis palette to use.
Default = "rtCol1". Only used if |
axes.square |
Logical: If TRUE: draw a square plot to fill the graphic device. Default = FALSE. Note: If TRUE, the device size at time of call is captured and height and width are set so as to draw the largest square available. This means that resizing the device window will not automatically resize the plot. |
group.names |
Character, vector, length = NROW(x): Group names.
Default = NULL, which uses |
font.size |
Float: Font size for all labels. Default = 16 |
marker.col |
Color for marker |
marker.size |
Numeric: Marker size. |
fit.col |
Color: Color of the fit line. |
fit.alpha |
Float [0, 1]: Transparency for fit line |
fit.lwd |
Float: Fit line width |
tick.font.size |
Numeric: Tick font size |
spike.col |
Spike lines color |
legend |
Logical: If TRUE, draw legend. Default = NULL, which will be set to
TRUE if there is more than one group, or |
legend.xy |
Numeric, vector, length 2: x and y for plotly's legend |
legend.xanchor |
Character: Legend's x anchor: "left", "center", "right", "auto" |
legend.yanchor |
Character: Legend's y anchor: "top", "middle", "bottom", "auto" |
legend.orientation |
"v" or "h" for vertical or horizontal |
legend.col |
Color: Legend text color. Default = NULL, determined by theme |
legend.bg |
Color: Background color for legend |
legend.border.col |
Color: Border color for legend |
legend.borderwidth |
Numeric: Border width for legend |
legend.group.gap |
Numeric: Gap between legend groups |
margin |
Numeric, named list: Margins for top, bottom, left, right.
Default = |
fit.params |
List: Arguments for learner defined by |
width |
Numeric: Force plot size to this width. Default = NULL, i.e. fill available space |
height |
Numeric: Force plot size to this height. Default = NULL, i.e. fill available space |
padding |
Numeric: Graph padding. |
displayModeBar |
Logical: If TRUE, show plotly's modebar |
modeBar.file.format |
Character: "svg", "png", "jpeg", "pdf" / any output file type supported by plotly and your system |
trace |
Integer: The height the number the more diagnostic info is printed to the console |
filename |
Character: Path to file to save static plot. Default = NULL |
file.width |
Integer: File width in pixels for when |
file.height |
Integer: File height in pixels for when |
file.scale |
Numeric: If saving to file, scale plot by this number |
... |
Additional arguments passed to theme |
Note that dplot3_xyz
uses the theme's plot.bg
as grid.col
E.D. Gennatas
## Not run: dplot3_xyz(iris, group = iris$Species, theme = "darkgrid") ## End(Not run)
rtemis preproc: Adjusts the dynamic range of a vector or matrix input. By default, normalizes to the 0-1 range.
drange(x, lo = 0, hi = 1, byCol = TRUE)
x |
Numeric vector or matrix / data frame: Input |
lo |
Target range minimum. Defaults to 0 |
hi |
Target range maximum. Defaults to 1 |
byCol |
Logical: If TRUE: if |
E.D. Gennatas
x <- runif(20, -10, 10)
x <- drange(x)
Check if all levels in a column are unique
dt_check_unique(x, on)
x |
data.frame or data.table |
on |
Integer or character: column to check |
E.D. Gennatas
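A usage sketch with hypothetical data (the exact return value is not documented here):

x <- data.table::data.table(ID = c(1, 2, 2, 3), V1 = rnorm(4))
dt_check_unique(x, on = "ID")  # the ID column contains a duplicate value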
Describe data.table
dt_describe(x)
x |
data.table |
## Not run:
origin <- as.POSIXct("2022-01-01 00:00:00", tz = "America/Los_Angeles")
x <- data.table(
  ID = paste0("ID", 1:10),
  V1 = rnorm(10),
  V2 = rnorm(10, 20, 3),
  V1_datetime = as.POSIXct(seq(1, 1e7, length.out = 10), origin = origin),
  V2_datetime = as.POSIXct(seq(1, 1e7, length.out = 10), origin = origin),
  C1 = sample(c("alpha", "beta", "gamma"), 10, TRUE),
  F1 = factor(sample(c("delta", "epsilon", "zeta"), 10, TRUE))
)
## End(Not run)
Tabulate column attributes
dt_get_column_attr(x, attr = "source", useNA = "always")
x |
data.table |
attr |
Character: Attribute to get |
useNA |
Character: Passed to |
E.D. Gennatas
Get index of duplicate values
dt_get_duplicates(x, on)
x |
data.frame or data.table |
on |
Integer or character: column to check |
E.D. Gennatas
Get factor levels from data.table
dt_get_factor_levels(dat)
dat |
data.table. Note: If |
Named list of factor levels. Names correspond to column names.
Index columns by attribute name & value
dt_index_attr(x, name, value)
x |
data.frame or compatible |
name |
Character: Name of attribute |
value |
Character: Value of attribute |
E.D. Gennatas
Will attempt to identify columns that should be numeric but are either character or factor by running inspect_type on each column.
dt_inspect_type(x, cols = NULL, verbose = TRUE)
x |
data.table |
cols |
Character vector: columns to inspect. |
verbose |
Logical: If TRUE, print messages to console. |
E.D. Gennatas
Reshape a long format data.table
using key-value pairs with
data.table::dcast
dt_keybin_reshape( x, id_name, key_name, positive = 1, negative = 0, xname = NULL, verbose = TRUE )
x |
A |
id_name |
Character: Name of column in |
key_name |
Character: Name of column in |
positive |
Numeric or Character: Used to fill id ~ key combination
present in the long format input |
negative |
Numeric or Character: Used to fill id ~ key combination
NOT present in the long format input |
xname |
Character: Name of |
verbose |
Logical: If TRUE, print messages to the console |
E.D. Gennatas
## Not run:
x <- data.table(
  ID = rep(1:3, each = 2),
  Dx = c("A", "C", "B", "C", "D", "A")
)
dt_keybin_reshape(x, id_name = "ID", key_name = "Dx")
## End(Not run)
Merge data.tables
dt_merge( left, right, on = NULL, left_on = NULL, right_on = NULL, how = "left", left_name = NULL, right_name = NULL, left_suffix = NULL, right_suffix = NULL, verbose = TRUE, ... )
left |
data.table |
right |
data.table |
on |
Character: Name of column to join on |
left_on |
Character: Name of column on left table |
right_on |
Character: Name of column on right table |
how |
Character: Type of join: "inner", "left", "right", "outer". |
left_name |
Character: Name of left table |
right_name |
Character: Name of right table |
left_suffix |
Character: If provided, add this suffix to all left column names, excluding on/left_on |
right_suffix |
Character: If provided, add this suffix to all right column names, excluding on/right_on |
verbose |
Logical: If TRUE, print messages to console |
... |
Additional arguments to be passed to |
E.D. Gennatas
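A minimal sketch with illustrative tables:

left <- data.table::data.table(ID = 1:4, x = letters[1:4])
right <- data.table::data.table(ID = 3:6, y = LETTERS[3:6])
dt_merge(left, right, on = "ID", how = "left")  # left join on the shared ID column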
List column names by attribute
dt_names_by_attr(x, which, exact = TRUE, sorted = TRUE)
x |
data.table |
which |
Character: name of attribute |
exact |
Logical: If TRUE, use exact matching |
sorted |
Logical: If TRUE, sort the output |
E.D. Gennatas
List column names by class
dt_names_by_class(x, sorted = TRUE, item.format = hilite, maxlength = 24)
x |
data.table |
sorted |
Logical: If TRUE, sort the output |
item.format |
Function: Function to format each item |
maxlength |
Integer: Maximum number of items to print |
E.D. Gennatas
Get N and percent match of values between two columns of two data.tables
dt_pctmatch(x, y, on = NULL, left_on = NULL, right_on = NULL, verbose = TRUE)
x |
data.table |
y |
data.table |
on |
Integer or character: column to match on |
left_on |
Integer or character: column in x to match on |
right_on |
Integer or character: column in y to match on |
verbose |
Logical: If TRUE, print messages to console |
E.D. Gennatas
Get percent of missing values from every column
dt_pctmissing(x, verbose = TRUE)
x |
data.frame or data.table |
verbose |
Logical: If TRUE, print messages to console |
E.D. Gennatas
This function inspects a data.table and attempts to identify columns that should be numeric but have been read in as character, because one or more fields contain non-numeric characters
dt_set_autotypes(x, cols = NULL, verbose = TRUE)
x |
data.table |
cols |
Character vector: columns to work on. If not defined, will work on all columns |
verbose |
Logical: If TRUE, print messages to console |
E.D. Gennatas
Clean column names and factor levels in-place
dt_set_clean_all(x, prefix_digits = NA)
x |
data.table |
prefix_digits |
Character: prefix to add to names beginning with a digit. Set to NA to skip. Note: If |
E.D. Gennatas
Finds all factors in a data.table and cleans factor levels to include only underscore symbols
dt_set_cleanfactorlevels(x, prefix_digits = NA)
x |
data.table |
prefix_digits |
Character: If not NA, add this prefix to all factor levels that are numbers |
E.D. Gennatas
## Not run:
x <- as.data.table(iris)
levels(x$Species) <- c("setosa:iris", "versicolor$iris", "virginica iris")
dt_set_cleanfactorlevels(x)
x
## End(Not run)
Convert data.table logical columns to factor with custom labels in-place
dt_set_logical2factor( x, cols = NULL, labels = c("False", "True"), maintain_attributes = TRUE, fillNA = NULL )
x |
data.table |
cols |
Integer or character: columns to convert, if NULL, operates on all logical columns |
labels |
Character: labels for factor levels |
maintain_attributes |
Logical: If TRUE, maintain column attributes |
fillNA |
Character: If not NULL, fill NA values with this constant |
E.D. Gennatas
## Not run:
library(data.table)
x <- data.table(a = 1:5, b = c(T, F, F, F, T))
x
dt_set_logical2factor(x)
x
z <- data.table(alpha = 1:5, beta = c(T, F, T, NA, T), gamma = c(F, F, T, F, NA))
z
# You can use fillNA to fill NA values with a constant
dt_set_logical2factor(z, cols = "beta", labels = c("No", "Yes"), fillNA = "No")
z
w <- data.table(mango = 1:5, banana = c(F, F, T, T, F))
w
dt_set_logical2factor(w, cols = 2, labels = c("Ugh", "Huh"))
w
# Column attributes are maintained by default:
z <- data.table(alpha = 1:5, beta = c(T, F, T, NA, T), gamma = c(F, F, T, F, NA))
for (i in seq_along(z)) setattr(z[[i]], "source", "Guava")
str(z)
dt_set_logical2factor(z, cols = "beta", labels = c("No", "Yes"))
str(z)
## End(Not run)
Check a loss vector for early stopping criteria: either total percent decrease from the starting error (e.g. if predictions started at the expectation), or minimum percent decrease (relative to the first value of the vector) over a window of the last n steps.
earlystop( x, window = 10, window_decrease_pct_min = 0.01, total_decrease_pct_max = NULL, verbose = TRUE )
x |
Numeric vector: loss at each iteration |
window |
Integer: Number of steps to consider |
window_decrease_pct_min |
Float: Stop if improvement is less than this percent over last |
total_decrease_pct_max |
Float: Stop if improvement from first to last step exceeds this percent. If defined, overrides |
verbose |
Logical: If TRUE, print messages to console. Default = TRUE |
If the first loss value was set to be the loss when yhat = mean(y) (e.g. in boosting), then total_decrease_pct_max corresponds to R-squared and window_decrease_pct_min to percent R-squared improvement over the last window steps.
E.D. Gennatas
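For illustration, a sketch with a loss curve that flattens out; the stopping decision is assumed to be based on the last window of steps, as described above:

loss <- 10 * exp(-seq(0.1, 6, length.out = 50))  # exponentially decaying loss
# Stop when improvement over the last 10 steps falls below 1%
earlystop(loss, window = 10, window_decrease_pct_min = 0.01)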
Expand a boost object by adding more iterations
expand.boost( object, x, y = NULL, x.valid = NULL, y.valid = NULL, x.test = NULL, y.test = NULL, mod = NULL, resid = NULL, mod.params = NULL, max.iter = 10, learning.rate = NULL, case.p = 1, prefix = NULL, verbose = TRUE, trace = 0, print.error.plot = "final", print.plot = FALSE )
object |
boost object |
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.valid |
Data.frame; optional: Validation data |
y.valid |
Float, vector; optional: Validation outcome |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
mod |
Character: Algorithm to train base learners, for options, see select_learn. Default = "cart" |
resid |
Float, vector, length = length(y): Residuals to work on. Do not change unless you know what you're doing. Default = NULL, for regular boosting |
mod.params |
Named list of arguments for |
max.iter |
Integer: Maximum number of iterations (additive steps) to perform. Default = 10 |
learning.rate |
Float (0, 1] Learning rate for the additive steps |
case.p |
Float (0, 1]: Train each iteration using this percent of cases. Default = 1, i.e. use all cases |
prefix |
Internal |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If > 0, print diagnostic info to console |
print.error.plot |
String or Integer: "final" plots a training and validation (if available) error curve at the end of training. If integer, plot training and validation error curve every this many iterations during training. "none" for no plot. |
print.plot |
Logical: if TRUE, produce plot using |
E.D. Gennatas
Explain individual-level model predictions
explain(mod, x, digits = 2, top = NULL, trace = 0)
mod |
|
x |
Single-row data.frame of predictors. |
digits |
Integer: Number of digits to round coefficients to. |
top |
Integer: Number of top rules to show by absolute coefficient. |
trace |
Integer: If > 0, print more messages to output. |
E.D. Gennatas
Calculate the F1 score for classification: the harmonic mean of precision and recall.
f1(precision, recall)
precision |
Float [0, 1]: Precision a.k.a. Positive Predictive Value |
recall |
Float [0, 1]: Recall a.k.a. Sensitivity |
E.D. Gennatas
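F1 is the harmonic mean of precision and recall, i.e. 2 * (precision * recall) / (precision + recall). For example:

f1(precision = 0.75, recall = 0.6)
# 2 * (0.75 * 0.6) / (0.75 + 0.6) = 0.9 / 1.35 = 0.667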
Factor harmonize
factor_harmonize(reference, x, verbosity = 1)
reference |
Reference factor |
x |
Input factor |
verbosity |
Integer: If > 0, print messages to console. |
Set NA values of a factor vector to a new level indicating missingness
factor_NA2missing(x, na_level_name = "missing")
x |
Factor |
na_level_name |
Character: Name of new level to create that will be assigned to all current NA values. Default = "missing" |
E.D. Gennatas
x <- factor(sample(letters[1:3], 100, TRUE))
x[sample(1:100, 10)] <- NA
xm <- factor_NA2missing(x)
Perform parallel analysis, factor analysis, bifactor analysis and hierarchical clustering
factoryze( x, n.factors = NULL, method = "minres", rotation = "oblimin", scores = "regression", cor = "cor", fa.n.iter = 100, omega.method = "minres", omega.rotation = c("oblimin", "simplimax", "promax", "cluster", "target"), omega.n.iter = 1, x.name = NULL, print.plot = TRUE, do.pa = TRUE, do.fa = TRUE, do.bifactor = TRUE, do.hclust = FALSE, verbose = TRUE, ... )
x |
Data. Will be coerced to data frame |
n.factors |
Integer: If NULL, will be estimated using parallel analysis |
method |
Character: Factor analysis method: "minres": minimum residual (OLS), "wls": weighted least squares (WLS); "gls": generalized weighted least squares (GLS); "pa": principal factor solution; "ml": maximum likelihood; "minchi": minimize the sample size weighted chi square when treating pairwise correlations with different number of subjects per pair; "minrank": minimum rank factor analysis. Default = "minres" |
rotation |
Character: Rotation methods. No rotation: "none"; Orthogonal: "varimax", "quartimax", "bentlerT", "equamax", "varimin", "geominT", "bifactor"; Oblique: "promax", "oblimin", "simplimax", "bentlerQ", "geominQ", "biquartimin", "cluster". Default = "oblimin" |
scores |
Character: Factor score estimation method. Options: "regression", "Thurstone": simple regression, "tenBerge": correlation-preserving, "Anderson", "Bartlett". Default = "regression" |
cor |
Character: Correlation method: "cor": Pearson correlation, "cov": Covariance, "tet": tetrachoric, "poly": polychoric, "mixed": mixed cor for a mixture of tetrachorics, polychorics, Pearsons, biserials, and polyserials, "Yuleb": Yule-Bonett, "Yuleq" and "YuleY": Yule coefficients |
fa.n.iter |
Integer: Number of iterations for factor analysis. Default = 100 |
omega.method |
Character: Factor analysis method for the bifactor analysis. Same options as |
omega.rotation |
Character: Rotation method for bifactor analysis: "oblimin", "simplimax", "promax", "cluster", "target". Default = "oblimin" |
omega.n.iter |
Integer: Number of iterations for bifactor analysis. Default = 1 |
x.name |
Character: Name your dataset. Used for plotting |
print.plot |
Logical: If TRUE, print plots along the way. Default = TRUE |
do.pa |
Logical: If TRUE, perform parallel analysis. Default = TRUE |
do.fa |
Logical: If TRUE, perform factor analysis. Default = TRUE |
do.bifactor |
Logical: If TRUE, perform bifactor analysis. Default = TRUE |
do.hclust |
Logical: If TRUE, perform hierarchical cluster analysis. Default = FALSE |
verbose |
Logical: If TRUE, print messages to output. Default = TRUE |
... |
Additional arguments to pass to |
Consult psych::fa
for more information on the parameters
E.D. Gennatas
Outputs a single character with names and counts of each level of the input factor
fct_describe(x, max_n = 5, return_ordered = TRUE)
x |
factor |
max_n |
Integer: Return counts for up to this many levels |
return_ordered |
Logical: If TRUE, return levels ordered by count, otherwise return in level order |
Character with level counts
E.D. Gennatas
## Not run:
# Small number of levels
fct_describe(iris$Species)
# Large number of levels: show top n by count
x <- factor(sample(letters, 1000, TRUE))
fct_describe(x)
fct_describe(x, 3)
## End(Not run)
call
objectsFormat method for call
objects
## S3 method for class 'call' format(x, as.html = FALSE, class = "rtcode", ...)
x |
|
as.html |
Logical: If TRUE, output HTML span element |
class |
Character: CSS class to assign to span containing code |
... |
Not used |
E.D. Gennatas
## Not run:
irmod <- elevate(
  iris,
  mod = "cart",
  maxdepth = 2:3,
  n.resamples = 9,
  train.p = .85
)
format(irmod$call) |> cat()
## End(Not run)
Converts R-executable logical expressions to a more human-friendly format
formatLightRules(x, space.after.comma = FALSE, decimal.places = NULL)
x |
Vector, string: Logical expressions |
space.after.comma |
Logical: If TRUE, place spaces after commas. Default = FALSE |
decimal.places |
Integer: Limit all floats (numbers of the form 9.9) to this many decimal places |
E.D. Gennatas
Converts R-executable logical expressions to a more human-friendly format
formatRules(x, space.after.comma = FALSE, decimal.places = NULL)
x |
Vector, string: Logical expressions |
space.after.comma |
Logical: If TRUE, place spaces after commas. Default = FALSE |
decimal.places |
Integer: Limit all floats (numbers of the form 9.9) to this many decimal places |
E.D. Gennatas
Convert Full width at half maximum values to sigma
fwhm2sigma(fwhm)
fwhm |
FWHM value |
sigma
E.D. Gennatas
fwhm2sigma(8) # FWHM of 8 is equivalent to sigma = 3.397287
Get version of all loaded packages (namespaces)
get_loaded_pkg_version()
Data frame with columns "Package_Name" and "Version"
E.D. Gennatas
Returns the mode of a factor or integer
get_mode(x, na.exclude = TRUE, getlast = TRUE, retain.class = TRUE)
x |
Vector, factor or integer: Input data |
na.exclude |
Logical: If TRUE, exclude NAs |
getlast |
Logical: If TRUE, get |
retain.class |
Logical: If TRUE, output is always same class as input |
The mode of x
E.D. Gennatas
x <- c(9, 3, 4, 4, 0, 2, 2, NA)
get_mode(x)
x <- c(9, 3, 2, 2, 0, 4, 4, NA)
get_mode(x)
get_mode(x, getlast = FALSE)
Get rules generated by s_RuleFit or s_LightRuleFit
get_rules( mod, formatted = FALSE, collapse = TRUE, collapse.keep.names = FALSE, collapse.unique = TRUE )
mod |
Model created by s_RuleFit or s_LightRuleFit |
formatted |
Logical: If TRUE, return human-readable rules, otherwise return R-parsable rules |
collapse |
Logical: If TRUE, collapse all rules to a single character vector |
collapse.keep.names |
Logical: If TRUE, keep names when collapsing (will
be able to tell which run each rule came from). However, has no effect if
|
collapse.unique |
Logical: If TRUE and |
E.D. Gennatas
Extract variable names from rules
get_vars_from_rules(rules, unique = FALSE)
rules |
Character vector: Rules. |
unique |
Logical: If TRUE, return only unique variables. |
Character vector: Variable names.
E.D. Gennatas
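A sketch with illustrative rule strings:

rules <- c("Sepal.Length > 5.1 & Petal.Width < 1.8", "Petal.Width < 1.8")
get_vars_from_rules(rules, unique = TRUE)
# expected: "Sepal.Length" "Petal.Width"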
Get factor/numeric/logical/character names from data.frame/data.table
getfactornames(x)
x |
data.frame or data.table (or data.frame-compatible object) |
E.D. Gennatas
Get names by string matching
getnames(
  x, pattern = NULL, starts_with = NULL, ends_with = NULL,
  ignore.case = TRUE
)
getnumericnames(x)
getlogicalnames(x)
getcharacternames(x)
getdatenames(x)
x |
object with |
pattern |
Character: pattern to match anywhere in names of x |
starts_with |
Character: pattern to match in the beginning of names of x |
ends_with |
Character: pattern to match at the end of names of x |
ignore.case |
Logical: If TRUE, ignore case when matching. Default = TRUE |
E.D. Gennatas
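For example, matching iris column names by prefix or suffix (expected output shown as comments):

getnames(iris, starts_with = "Sepal")  # "Sepal.Length" "Sepal.Width"
getnames(iris, ends_with = "Width")  # "Sepal.Width" "Petal.Width"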
Get data.frame names and types
getnamesandtypes(x)
x |
data.frame / data.table or similar |
character vector of column names with attribute "type" holding the class of each column
ggplot2
dark themertemis ggplot2
dark theme
ggtheme_dark( base_size = 14, base_family = "Helvetica Neue", base_line_size = base_size/22, base_rect_size = base_size/22, axis.text.size.rel = 1, legend.key.fill = NA, legend.text.size.rel = 1, legend.position = "right", strip.background.fill = "gray25" )
base_size |
Float: Base font size. Default = 14 |
base_family |
Character: Font family. Default = "Helvetica Neue" |
base_line_size |
Float: Line size. Default = base_size/22 |
base_rect_size |
Float: Size for rect elements. Default = base_size/22 |
axis.text.size.rel |
Float: Relative size for axis text. Default = 1 |
legend.key.fill |
Color: Fill color for legend. Default = NA (no color) |
legend.text.size.rel |
Float: Relative size for legend text. Default = 1 |
legend.position |
Character: Legend position, "top", "bottom", "right", "left" Default = "right" |
strip.background.fill |
Color: Fill color for facet labels. Default = "gray25" |
E.D. Gennatas
## Not run: (p <- ggplot(iris, aes(Sepal.Length, Petal.Length, color = Species)) + geom_point() + ggtheme_dark()) ## End(Not run)
ggplot2
light themertemis ggplot2
light theme
ggtheme_light( base_size = 14, base_family = "Helvetica Neue", base_line_size = base_size/22, base_rect_size = base_size/22, axis.text.size.rel = 1, legend.key.fill = NA, legend.text.size.rel = 1, legend.position = "right", strip.background.fill = "grey85" )
base_size |
Float: Base font size. Default = 14 |
base_family |
Character: Font family. Default = "Helvetica Neue" |
base_line_size |
Float: Line size. Default = base_size/22 |
base_rect_size |
Float: Size for rect elements. Default = base_size/22 |
axis.text.size.rel |
Float: Relative size for axis text. Default = 1 |
legend.key.fill |
Color: Fill color for legend. Default = NA (no color) |
legend.text.size.rel |
Float: Relative size for legend text. Default = 1 |
legend.position |
Character: Legend position, "top", "bottom", "right", "left" Default = "right" |
strip.background.fill |
Color: Fill color for facet labels. Default = "grey85" |
E.D. Gennatas
## Not run: (p <- ggplot(iris, aes(Sepal.Length, Petal.Length, color = Species)) + geom_point() + ggtheme_light()) ## End(Not run)
A super-stripped-down linear model (via lincoef) for when space and performance are critical
glmLite( x, y, weights = NULL, method = c("glmnet", "cv.glmnet", "lm.ridge", "allSubsets", "forwardStepwise", "backwardStepwise", "glm", "sgd", "solve"), alpha = 0, lambda = 0.01, lambda.seq = NULL, cv.glmnet.nfolds = 5, which.cv.glmnet.lambda = c("lambda.min", "lambda.1se"), nbest = 1, nvmax = 8, sgd.model = "glm", sgd.model.control = list(lambda1 = 0, lambda2 = 0), sgd.control = list(method = "ai-sgd"), save.fitted = FALSE, ... )
x |
Feature matrix or data.frame. Will be coerced to data.frame for method = "allSubsets", "forwardStepwise", or "backwardStepwise" |
y |
Outcome |
weights |
Float, vector: Case weights |
method |
Character: Method to use:
|
alpha |
Float: |
lambda |
Float: The lambda value for |
lambda.seq |
Float, vector: lambda sequence for |
cv.glmnet.nfolds |
Integer: Number of folds for |
which.cv.glmnet.lambda |
Character: Which lambda to pick from cv.glmnet: "lambda.min": Lambda that gives minimum cross-validated error; |
nbest |
Integer: For |
nvmax |
Integer: For |
sgd.model |
Character: Model to use for |
sgd.model.control |
List: |
sgd.control |
List: |
save.fitted |
Logical: If TRUE, save fitted values in output. Default = FALSE |
... |
Additional arguments to pass to lincoef |
E.D. Gennatas
Geometric mean
gmean(x)
x |
Numeric vector |
E.D. Gennatas
x <- c(1, 3, 5)
mean(x)
gmean(x) # same as, but a little faster than: exp(mean(log(x)))
Fit a Gaussian process
gp( x, y, new.x = NULL, x.name = "x", y.name = "y", print.plot = TRUE, lwd = 3, cex = 1.2, par.reset = TRUE, ... )
x |
Numeric vector or matrix of features, i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
new.x |
(Optional) Numeric vector or matrix of new set of features
Must have same set of columns as |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
print.plot |
Logical: if TRUE, draw plot when done |
lwd |
Line width for plotting |
cex |
Character expansion factor for plotting |
par.reset |
Logical. Reset |
... |
Additional arguments to be passed to tgp::bgp |
E.D. Gennatas
Node-wise (i.e. vertex-wise) graph metrics
graph_node_metrics(x, verbose = TRUE)
x |
igraph network |
verbose |
Logical: If TRUE, print messages to console |
E.D. Gennatas
## Not run:
datcor <- cor(rnormmat(20, 20, seed = 2021))
datcor[sample(seq(datcor), 250)] <- 0
x <- igraph::graph_from_adjacency_matrix(
  adjmatrix = datcor, mode = "lower", weighted = TRUE, diag = FALSE
)
graph_node_metrics(x)
## End(Not run)
Checks if grid search needs to be performed.
All tunable parameters should be passed to this function, individually or as
a list. If any argument has more than one assigned values, the function
returns TRUE, otherwise FALSE. This can be used to check whether
gridSearchLearn
must be run.
gridCheck(...)
... |
Parameters; will be converted to a list |
The idea is that if you know which parameter values you want to use, you define them directly, e.g. alpha = 0, lambda = .2. If you don't know, you enter the set of values to be tested, e.g. alpha = c(0, .5, 1), lambda = seq(.1, 1, .1).
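For example:

gridCheck(alpha = 0, lambda = .2)  # FALSE: single values, no tuning needed
gridCheck(alpha = c(0, .5, 1), lambda = seq(.1, 1, .1))  # TRUE: grid search needed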
Compare vectors element-wise, and tabulate N times each vector is greater than the others
gtTable(x = list(), x.name = NULL, na.rm = TRUE, verbose = TRUE)
x |
List of vectors of same length |
x.name |
Character: Name of measure being compared |
na.rm |
Passed to |
verbose |
Logical: If TRUE, write output to console |
E.D. Gennatas
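A minimal sketch, assuming each vector holds the same measure across resamples for a different model:

gtTable(
  list(ModelA = c(0.81, 0.78, 0.85), ModelB = c(0.79, 0.80, 0.83)),
  x.name = "Accuracy"
)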
Basic Bivariate Hypothesis Testing and Plotting
htest( y, group = NULL, x = NULL, yname = NULL, groupname = NULL, xname = NULL, test = c("t.test", "wilcox.test", "aov", "kruskal.test", "chisq.test", "fisher.test", "cor.test", "pearson", "kendall", "spearman", "ks"), print.plot = TRUE, plot.args = list(), theme = rtTheme, verbose = TRUE, ... )
y |
Float, vector: Outcome of interest |
group |
Factor: Groups to compare |
x |
Float, vector: Second outcome for correlation tests |
yname |
Character: y variable name |
groupname |
Character: group variable name |
xname |
Character: x variable name |
test |
Character: Test to use; one of:
|
print.plot |
Logical: If TRUE, print plot. Default = TRUE |
plot.args |
List of arguments to pass to plotting function |
theme |
Character: Run |
verbose |
Logical: If TRUE, print messages to console. Default = TRUE |
... |
Additional arguments to pass to test call |
E.D. Gennatas
## Not run:
# t.test, wilcoxon
y <- c(rnorm(200, 2, 1.2), rnorm(300, 2.5, 1.4))
group <- c(rep(1, 200), rep(2, 300))
ht_ttest <- htest(y, group, test = "t.test")
ht_wilcoxon <- htest(y, group, test = "wilcox.test")
# aov, kruskal
y <- c(rnorm(200, 2, 1.2), rnorm(300, 2.5, 1.4), rnorm(100, 2.3, 1.1))
group <- c(rep(1, 200), rep(2, 300), rep(3, 100))
ht_aov <- htest(y, group, test = "aov")
ht_kruskal <- htest(y, group, test = "kruskal.test")
# chisq, fisher
y <- c(sample(c(1, 2), 100, TRUE, c(.7, .3)), sample(c(1, 2), 100, TRUE, c(.35, .65)))
group <- c(rep(1, 100), rep(2, 100))
ht_chisq <- htest(y, group, test = "chisq")
ht_fisher <- htest(y, group, test = "fisher")
# cor.test
x <- rnorm(300)
y <- x * .3 + rnorm(300)
ht_pearson <- htest(x = x, y = y, test = "pearson")
ht_kendall <- htest(x = x, y = y, test = "kendall")
ht_spearman <- htest(x = x, y = y, test = "spearman")
## End(Not run)
Checks character or factor vector to determine whether it might be best to convert to numeric.
inspect_type(x, xname = NULL, verbose = TRUE, thresh = 0.5, na.omit = TRUE)
x |
Character or factor vector. |
xname |
Character: Name of input vector |
verbose |
Logical: If TRUE, print messages to console. |
thresh |
Numeric: Threshold for determining whether to convert to numeric. |
na.omit |
Logical: If TRUE, remove NA values before checking. |
E.D. Gennatas
## Not run:
x <- c("3", "5", "undefined", "21", "4", NA)
inspect_type(x)
z <- c("mango", "banana", "tangerine", NA)
inspect_type(z)
## End(Not run)
Inverse Logit
invlogit(x)
x |
Float: Input data |
The inverse logit of the input
E.D. Gennatas
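The inverse logit maps the real line to (0, 1): invlogit(x) = 1 / (1 + exp(-x)). For example:

invlogit(0)  # 0.5
invlogit(2)  # 1 / (1 + exp(-2)), approximately 0.881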
Check if vector is constant
is_constant(x, skip_missing = FALSE)
x |
Vector: Input |
skip_missing |
Logical: If TRUE, skip NA values before testing |
E.D. Gennatas
## Not run:
x <- rep(9, 1000000)
is_constant(x)
x[10] <- NA
is_constant(x)
is_constant(x, skip_missing = TRUE)
## End(Not run)
Check if variable is discrete (factor or integer)
is_discrete(x)
x |
Input |
E.D. Gennatas
K-fold Resampling
kfold( x, k = 10, stratify.var = NULL, strat.n.bins = 4, seed = NULL, verbosity = TRUE )
x |
Input Vector |
k |
Integer: Number of folds. Default = 10 |
stratify.var |
Numeric vector (optional): Variable used for stratification. |
strat.n.bins |
Integer: Number of groups to use for stratification for
|
seed |
Integer: (Optional) Set seed for random number generator, in order to make
output reproducible. See |
verbosity |
Logical: If TRUE, print messages to console |
E.D. Gennatas
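A usage sketch; resampling functions such as this one presumably return one set of training-case indices per fold:

res <- kfold(1:100, k = 5, seed = 2024)
length(res)  # presumably 5, one resample per fold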
Format text for label printing
labelify( x, underscoresToSpaces = TRUE, dotsToSpaces = TRUE, toLower = FALSE, toTitleCase = TRUE, capitalize.strings = c("id"), stringsToSpaces = c("\\$", "`") )
x |
Character: Input |
underscoresToSpaces |
Logical: If TRUE, convert underscores to spaces. |
dotsToSpaces |
Logical: If TRUE, convert dots to spaces. |
toLower |
Logical: If TRUE, convert to lowercase (precedes |
toTitleCase |
Logical: If TRUE, convert to Title Case. Default = TRUE (This does not change
all-caps words, set |
capitalize.strings |
Character, vector: Always capitalize these strings, if present. Default = |
stringsToSpaces |
Character, vector: Replace these strings with spaces. Escape as needed for |
E.D. Gennatas
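For example, with the defaults above (expected output shown as comments):

labelify("patient_id")  # "Patient ID" ("id" is capitalized by default)
labelify("systolic.pressure")  # "Systolic Pressure"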
Get linear model coefficients
lincoef( x, y, weights = NULL, method = "glmnet", type = c("Regression", "Classification", "Survival"), learning.rate = 1, alpha = 1, lambda = 0.05, lambda.seq = NULL, cv.glmnet.nfolds = 5, which.cv.glmnet.lambda = c("lambda.min", "lambda.1se"), nbest = 1, nvmax = 8, sgd.model = "glm", sgd.model.control = list(lambda1 = 0, lambda2 = 0), sgd.control = list(method = "ai-sgd"), trace = 0 )
x |
Feature matrix or data.frame. Will be coerced to data.frame for method = "allSubsets", "forwardStepwise", or "backwardStepwise" |
y |
Outcome |
weights |
Float, vector: Case weights |
method |
Character: Method to use:
|
type |
Character: "Regression", "Classification", or "Survival" |
learning.rate |
Numeric: Coefficients will be multiplied by this number |
alpha |
Float: |
lambda |
Float: The lambda value for |
lambda.seq |
Float, vector: lambda sequence for |
cv.glmnet.nfolds |
Integer: Number of folds for |
which.cv.glmnet.lambda |
Character: Which lambda to pick from cv.glmnet: "lambda.min": Lambda that gives minimum cross-validated error; |
nbest |
Integer: For |
nvmax |
Integer: For |
sgd.model |
Character: Model to use for |
sgd.model.control |
List: |
sgd.control |
List: |
trace |
Integer: If set to zero, all warnings are ignored |
This function minimizes checks for speed. It doesn't check the dimensionality of x. Only use methods "glm", "sgd", or "solve" if there is only one feature in x.
Named numeric vector of linear coefficients
E.D. Gennatas
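A minimal sketch using penalized regression via glmnet (the default method); data and parameter values are illustrative:

x <- iris[, c("Sepal.Width", "Petal.Length", "Petal.Width")]
y <- iris$Sepal.Length
lincoef(x, y, method = "glmnet", alpha = 0, lambda = 0.05)  # named coefficient vector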
Write list elements to CSV files
list2csv(x, outdir)
x |
List containing R objects to be written to CSV (e.g. data.frames, matrices, etc.) |
outdir |
Character: Path to output directory |
E.D. Gennatas
Logistic function
logistic(x, x0 = 0, L = 1, k = 1)
x |
Float: Input |
x0 |
x-value of the midpoint. |
L |
maximum value. |
k |
steepness of the curve. |
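This is the general logistic function L / (1 + exp(-k * (x - x0))). For example:

logistic(0)  # 0.5: value at the midpoint with default parameters
logistic(2, k = 2)  # 1 / (1 + exp(-4)), approximately 0.982
logistic(0, L = 10)  # 5: L scales the maximum value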
Log Loss for a binary classifier
logloss(true, estimated.prob)
true |
Factor: True labels. First level is the positive case |
estimated.prob |
Float, vector: Estimated probabilities |
E.D. Gennatas
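A worked sketch; the first factor level is the positive case, and estimated.prob holds the predicted probability of that class:

true <- factor(c("a", "a", "b"), levels = c("a", "b"))
estimated.prob <- c(0.9, 0.8, 0.2)
logloss(true, estimated.prob)
# equivalent to -mean(log(c(0.9, 0.8, 1 - 0.2)))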
Leave-one-out Resampling
loocv(x)
x |
Input vector |
E.D. Gennatas
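A sketch; as with the other resamplers, this presumably returns one resample per case, each leaving a single case out:

res <- loocv(1:10)
length(res)  # presumably 10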
Turn the lower triangle of a connectivity matrix (e.g. correlation matrix or similar) to an edge list of the form: Source, Target, Weight
lotri2edgeList(A, filename = NULL, verbose = TRUE)
A |
Square matrix |
filename |
Character: Path for csv file. Defaults to "conmat2edgelist.csv" |
verbose |
Logical: If TRUE, print messages to console |
The output can be read, for example, into Gephi.
E.D. Gennatas
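A minimal sketch; the correlation matrix stands in for any symmetric connectivity matrix:

A <- cor(matrix(rnorm(100), nrow = 20, ncol = 5))  # 5 x 5 correlation matrix
lotri2edgeList(A, filename = NULL)  # Source, Target, Weight rows from the lower triangle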
lsapply
lsapply
lsapply(X, FUN, ..., outnames = NULL, simplify = FALSE)
X |
a vector (atomic or list) or an |
FUN |
the function to be applied to each element of |
... |
optional arguments to |
outnames |
Character vector: Optional names to apply to output |
simplify |
logical or character string; should the result be
simplified to a vector, matrix or higher dimensional array if
possible? For |
Make key from data.table id - description columns
make_key(x, code_name, description_name, filename = NULL)
x |
Input data.table |
code_name |
Character: Name of column name that holds codes |
description_name |
Character: Name of column that holds descriptions |
filename |
Character: Path to file to save CSV with key |
E.D. Gennatas
Fits a GAM for each of multiple outcomes using a fixed set of features (many y's, one X).
massGAM( x, y, covariates = NULL, x.name = NULL, y.name = NULL, k = NULL, family = gaussian(), weights = NULL, method = "REML", n.cores = rtCores, save.mods = FALSE, save.summary = TRUE, print.plots = FALSE, outdir = NULL, save.plots = FALSE, new.x.breaks = 9 )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric matrix / data frame: Outcomes |
covariates |
Numeric matrix / data.frame of additional covariates |
x.name |
Character: Name of the predictor |
y.name |
Character, vector: Names of the outcomes |
k |
Integer: Basis dimension for smoothing spline |
family |
|
weights |
Vector, numeric: Weights for GAM |
method |
Estimation method for GAM |
n.cores |
Integer. Number of cores to use |
save.mods |
Logical. Should models be saved |
save.summary |
Logical. Should model summary be saved |
print.plots |
Logical Should plots be shown |
outdir |
Path to save output |
save.plots |
Logical. Should plots be saved |
new.x.breaks |
Integer. Number of splits in the range of x to form vector of features for estimation of fitted values |
NA in the input will be kept as NA in the results, maintaining the number of cases.
E.D. Gennatas
Run a mass-univariate analysis with either: a) a single outcome (y) and multiple predictors (x), one at a time, with an optional common set of covariates in each model ("massx"), or b) multiple different outcomes (y) with a fixed set of predictors (x) ("massy"). The term mass-univariate therefore refers to looking at one variable of interest (with potential covariates of no interest) at a time.
massGLAM( x, y, scale.x = FALSE, scale.y = FALSE, mod = c("glm", "gam"), type = NULL, xnames = NULL, ynames = NULL, spline.index = NULL, gam.k = 6, save.mods = TRUE, print.plot = FALSE, include_anova_pvals = NA, verbose = TRUE, trace = 0, n.cores = 1 )
x |
Matrix / data frame of features |
y |
Matrix / data frame of outcomes |
scale.x |
Logical: If TRUE, scale and center |
scale.y |
Logical: If TRUE, scale and center |
mod |
Character: "glm" or "gam". |
type |
Character: "massx" or "massy". Default = NULL, where if (NCOL(x) > NCOL(y)) "massx" else "massy" |
xnames |
Character vector: names of |
ynames |
Character vector: names of |
spline.index |
Integer vector: indices of features to fit splines for. |
gam.k |
Integer: The dimension of the spline basis. |
save.mods |
Logical: If TRUE, save models. Default = TRUE |
print.plot |
Logical: If TRUE, print plot. Default = FALSE (best to choose which p-values you want to plot directly) |
include_anova_pvals |
Logical: If TRUE, include ANOVA p-values,
generated by |
verbose |
Logical: If TRUE, print messages during run |
trace |
Integer: If > 0, print more verbose output to console. |
n.cores |
Integer: Number of cores to use. (Testing only, do not change from 1) |
E.D. Gennatas
## Not run:
# Common usage is "reversed":
# x: outcome of interest as first column, optional covariates in the other columns
# y: features whose association with x we want to study
set.seed(2022)
features <- rnormmat(500, 40)
outcome <- features[, 3] - features[, 5] + features[, 14] + rnorm(500)
massmod <- massGLAM(outcome, features)
plot(massmod)
plot(massmod, what = "coef")
plot(massmod, what = "volcano")
## End(Not run)
Run a mass-univariate analysis with either: a) a single outcome (y) and multiple predictors (x), one at a time, with an optional common set of covariates in each model ("massx"), or b) multiple different outcomes (y) with a fixed set of predictors (x) ("massy"). The term mass-univariate therefore refers to looking at one variable of interest (with potential covariates of no interest) at a time.
massGLM( x, y, scale.x = FALSE, scale.y = FALSE, type = NULL, xnames = NULL, ynames = NULL, coerce.y.numeric = FALSE, save.mods = FALSE, print.plot = FALSE, include_anova_pvals = NA, verbose = TRUE, trace = 0 )
x |
Matrix / data frame of features |
y |
Matrix / data frame of outcomes |
scale.x |
Logical: If TRUE, scale and center |
scale.y |
Logical: If TRUE, scale and center |
type |
Character: "massx" or "massy". Default = NULL, where if (NCOL(x) > NCOL(y)) "massx" else "massy" |
xnames |
Character vector: names of |
ynames |
Character vector: names of |
coerce.y.numeric |
Logical: If TRUE, coerce y to numeric |
save.mods |
Logical: If TRUE, save models. |
print.plot |
Logical: If TRUE, print plot. |
include_anova_pvals |
Logical: If TRUE, include ANOVA p-values,
(generated by |
verbose |
Logical: If TRUE, print messages during run |
trace |
Integer: If > 0, print more verbose output to console. |
E.D. Gennatas
## Not run:
# Common usage is "reversed":
# x: outcome of interest as first column, optional covariates in the other columns
# y: features whose association with x we want to study
set.seed(2022)
features <- rnormmat(500, 40)
outcome <- features[, 3] - features[, 5] + features[, 14] + rnorm(500)
massmod <- massGLM(outcome, features)
plot(massmod)
plot(massmod, what = "coef")
plot(massmod, what = "volcano")
## End(Not run)
Run a mass-univariate analysis: same features (predictors) on multiple outcomes
massUni( x, y, mod = "gam", save.mods = FALSE, verbose = TRUE, n.cores = rtCores, ... )
x |
Matrix / data frame of features |
y |
Matrix / data frame of outcomes |
mod |
rtemis algorithm to use. Options: run select_learn |
save.mods |
Logical: If TRUE, save fitted models |
verbose |
Logical: If TRUE, print messages during run |
n.cores |
Integer: Number of cores to use |
... |
Arguments to be passed to the learner defined by mod |
E.D. Gennatas
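A minimal sketch on simulated data, fitting one model per outcome column with a fixed feature set (mod options per select_learn):
## Not run:
set.seed(2022)
x <- rnormmat(100, 5)
y <- data.frame(y1 = x[, 1] + rnorm(100), y2 = rnorm(100))
mods <- massUni(x, y, mod = "glm")
## End(Not run)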
Find one or more cases from a pool data.frame that match cases in a target data.frame. Match exactly and/or by distance (sum of squared distances).
matchcases( target, pool, n.matches = 1, target.id = NULL, pool.id = NULL, exactmatch.factors = TRUE, exactmatch.cols = NULL, distmatch.cols = NULL, norepeats = TRUE, ignore.na = FALSE, verbose = TRUE )
target |
data.frame you are matching against |
pool |
data.frame you are looking for matches from |
n.matches |
Integer: Number of matches to return |
target.id |
Character: Name of column in target that holds case IDs |
pool.id |
Character: Same as target.id, for pool |
exactmatch.factors |
Logical: If TRUE, selected cases will have to exactly match factors available in target |
exactmatch.cols |
Character: Names of columns that should be matched exactly |
distmatch.cols |
Character: Names of columns that should be distance-matched |
norepeats |
Logical: If TRUE, cases in pool will not be matched more than once |
ignore.na |
Logical: If TRUE, ignore NA values during exact matching. |
verbose |
Logical: If TRUE, print messages to console. Default = TRUE |
E.D. Gennatas
set.seed(2021)
cases <- data.frame(
  PID = paste0("PID", seq(4)),
  Sex = factor(c(1, 1, 0, 0)),
  Handedness = factor(c(1, 1, 0, 1)),
  Age = c(21, 27, 39, 24),
  Var = c(.7, .8, .9, .6),
  Varx = rnorm(4)
)
controls <- data.frame(
  CID = paste0("CID", seq(50)),
  Sex = factor(sample(c(0, 1), 50, TRUE)),
  Handedness = factor(sample(c(0, 1), 50, TRUE, c(.1, .9))),
  Age = sample(16:42, 50, TRUE),
  Var = rnorm(50),
  Vary = rnorm(50)
)
mc <- matchcases(cases, controls, 2, "PID", "CID")
Merge long format treatment and outcome data from multiple sources with possibly hierarchical matching IDs using data.table
mergelongtreatment( x, group_varnames, time_varname = "Date", start_date, end_date, interval_days = 14, verbose = TRUE, trace = 1 )
x |
Named list: Long form datasets to merge. Will be converted to data.table |
group_varnames |
Vector, character: Variable names to merge by, in order. If first is present on a given pair of datasets, merge on that, otherwise try the next in line. |
time_varname |
Character: Name of column that should be present in all datasets containing time information. Default = "Date" |
start_date |
Date or character: Start date for final dataset in format "YYYY-MM-DD" |
end_date |
Date or character: End date for final dataset in format "YYYY-MM-DD" |
interval_days |
Integer: Starting with start_date, aggregate data into intervals of this many days |
verbose |
Logical: If TRUE, print messages to console. Default = TRUE |
trace |
Integer: If > 0 print additional info to console. Default = 1 |
Merged data.table
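A minimal sketch of the expected input structure (a named list of long-format data.tables sharing an ID variable and a Date column); all values are illustrative:
## Not run:
library(data.table)
rx <- data.table(ID = c("a", "a", "b"),
                 Date = as.Date(c("2020-01-01", "2020-01-20", "2020-01-05")),
                 Dose = c(10, 20, 10))
labs <- data.table(ID = c("a", "b"),
                   Date = as.Date(c("2020-01-10", "2020-01-12")),
                   Glucose = c(98, 104))
merged <- mergelongtreatment(list(Rx = rx, Labs = labs),
                             group_varnames = "ID",
                             start_date = "2020-01-01",
                             end_date = "2020-03-01",
                             interval_days = 14)
## End(Not run)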
Train a meta model from the output of base learners trained using different learners (algorithms)
meta_mod( x, y = NULL, x.test = NULL, y.test = NULL, base.mods = c("mars", "ranger"), base.params = vector("list", length(base.mods)), base.resample.params = setup.resample(resampler = "kfold", n.resamples = 4), meta.mod = "gam", meta.params = list(), x.name = NULL, y.name = NULL, save.base.res = TRUE, save.base.full = FALSE, col = NULL, se.lty = 3, print.base.plot = FALSE, print.plot = TRUE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose.base.res.mods = FALSE, verbose.base.mods = FALSE, verbose = TRUE, trace = 0, base.n.cores = 1, n.cores = rtCores, save.mod = FALSE, outdir = NULL, ... )
x |
Numeric vector or matrix of features, i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
(Optional) Numeric vector or matrix of validation set features; must have the same set of columns as x |
y.test |
(Optional) Numeric vector of validation set outcomes |
base.mods |
Character vector: Two or more base learners. Options: select_learn |
base.params |
List of length equal to the number of base.mods |
meta.mod |
String. Meta learner. Options: select_learn |
x.name |
Character: Name for predictor set. (What kind of data is it?) |
y.name |
Character: Name for outcome |
se.lty |
How to plot standard errors. If a number, it corresponds to par("lty") line types and is plotted with lines(). If "solid", a transparent polygon is plotted using polygon() |
resampler |
String. Resampling method to use. Options: "bootstrap", "kfold", "strat.boot", "strat.sub" |
This is included mainly for educational purposes. The procedure is:
1. Train a set of base learners on resamples of the training set x
2. Train a meta learner to map the base learners' validation set predictions to outcomes
3. Train the base learners on the full training set x
4. Use the meta learner to predict the test set outcome y.test from the test set (x.test)
E.D. Gennatas
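A minimal sketch on simulated regression data, using the default base and meta learners:
## Not run:
set.seed(2022)
x <- rnormmat(200, 5)
y <- x[, 1] + x[, 2]^2 + rnorm(200)
mod <- meta_mod(x, y, base.mods = c("mars", "ranger"))
## End(Not run)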
Get names by string matching multiple patterns
mgetnames( x, pattern = NULL, starts_with = NULL, ends_with = NULL, ignore.case = TRUE, return.index = FALSE )
x |
Character vector, or object whose names will be matched |
pattern |
Character vector: pattern(s) to match anywhere in names of x |
starts_with |
Character: pattern to match in the beginning of names of x |
ends_with |
Character: pattern to match at the end of names of x |
ignore.case |
Logical: If TRUE, ignore case. Default = TRUE |
return.index |
Logical: If TRUE, return integer index of matches instead of names |
Character vector of matched names or integer index
E.D. Gennatas
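For example, matching column names of the built-in iris data (case-insensitive by default):
## Not run:
mgetnames(iris, starts_with = "Sepal")
mgetnames(iris, pattern = "width", return.index = TRUE)
## End(Not run)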
Draws a histogram using lines.
mhist( x, breaks = "Sturges", measure = c("density", "counts"), lwd = 3, xlim = NULL, ylim = NULL, plot.axes = FALSE, xaxis = TRUE, yaxis = TRUE, xaxis.line = 0, yaxis.line = 0, xlab = NULL, ylab = measure, xaxs = "r", yaxs = "r", box = FALSE, grid = FALSE, col = pennCol$lighterBlue, horiz = FALSE, main = "", add = FALSE, ... )
x |
Input vector |
breaks |
See hist |
measure |
Character: "density"(Default), "counts" |
lwd |
Float: Line width |
xlim |
Vector, length 2: x-axis limits |
ylim |
Vector, length 2: y-axis limits |
plot.axes |
Logical: If TRUE, draws plot axes. Separate from xaxis and yaxis |
xaxis |
Logical: If TRUE, draws x-axis |
yaxis |
Logical: If TRUE, draws y-axis |
xaxis.line |
Float: Number of lines into the margin to position the x-axis |
yaxis.line |
Float: Number of lines into the margin to position the y-axis |
xlab |
Character: x-axis label |
ylab |
Character: y-axis label |
xaxs |
Character: 'r' (Default): Extends x-axis range by 4 percent at each end, 'i': Does not extend x-axis range |
yaxs |
Character: 'r' (Default): Extends y-axis range by 4 percent at each end, 'i': Does not extend y-axis range |
box |
Logical: If TRUE, draws a box around plot |
grid |
Logical: If TRUE, draws a grid |
col |
Color to use for histogram lines |
horiz |
Logical: If TRUE, switches x and y axes. Important: Provide all other arguments as if for a non-rotated plot, i.e. x-axis arguments will apply to the vertical axis |
main |
Character: Main title |
add |
Logical: If TRUE, add histogram to existing plot (Caution: make sure the axes line up!) |
... |
Additional arguments to be passed to |
Using horiz = TRUE, you can draw vertical histograms (as used by mplot3_xym).
E.D. Gennatas
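For example, on simulated data:
## Not run:
x <- rnorm(500)
mhist(x)
# Vertical histogram, e.g. for the margin of a scatterplot
mhist(x, horiz = TRUE)
## End(Not run)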
Add legend to mplot3 plot
mlegend( lims, title = NULL, group.names, title.col = "black", col = rtpalette("rtCol1"), horiz.pad = 0.04, footer = NULL, font = 1, font.family = "Helvetica Neue" )
lims |
List with plot limits in the form list(xlim = xlim, ylim = ylim) e.g. as returned by mplot3_xy |
title |
Character: Legend title |
group.names |
Character: group names |
title.col |
Title color |
col |
Color vector |
horiz.pad |
Numeric: Proportion of plot width to pad by |
footer |
Character: Footer annotation |
font |
1 or 2 for regular and bold |
font.family |
Character: Font family to use |
E.D. Gennatas
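A minimal sketch; lims would normally come from an mplot3 plot (e.g. mplot3_xy), and is given explicitly here only for illustration:
## Not run:
mlegend(list(xlim = c(0, 1), ylim = c(0, 1)),
        title = "Group",
        group.names = c("Cases", "Controls"))
## End(Not run)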
Calculate error metrics for a pair of vectors, e.g. true and estimated values from a model
mod_error( true, estimated, estimated.prob = NULL, type = NULL, rho = FALSE, tau = FALSE, na.rm = TRUE, verbosity = 0 )
true |
Vector: True values |
estimated |
Vector: Estimated values |
estimated.prob |
Vector: Estimated probabilities for Classification, if available. |
type |
Character: "Regression", "Classification", or "Survival". If not provided, will be set to Regression if y is numeric. |
rho |
Logical: If TRUE, calculate Spearman's rho. |
tau |
Logical: If TRUE, calculate Kendall's tau. This can be slow for long vectors |
na.rm |
Logical: Passed to |
verbosity |
Integer: If > 0, print messages to console. |
In regression, NRMSE = RMSE / range(observed)
Object of class mod_error
E.D. Gennatas
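For example, regression metrics for simulated true and estimated values:
## Not run:
set.seed(2022)
true <- rnorm(100)
estimated <- true + rnorm(100, sd = 0.5)
mod_error(true, estimated)
## End(Not run)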
Plot AGGTEobj object from the did package.
mplot_AGGTEobj( x, x.factor = 1, y.factor = 1, error = c("se", "95%ci"), main = "Average Effect by Length of Exposure", legend.title = "", group.names = c("Pre", "Post"), xlab = NULL, ylab = NULL, mar = c(2.5, 3.5, 2, 7), theme = rtTheme, font.family = "Helvetica", col = c("#EC1848", "#18A3AC"), filename = NULL, file.width = 6.5, file.height = 5.5, par.reset = TRUE, ... )
x |
AGGTEobj object |
main |
Character: Plot title |
group.names |
(Optional) If multiple groups are plotted, use these names in the legend |
xlab |
Character: x-axis label |
ylab |
Character: y-axis label |
mar |
Float, vector, length 4: Margins; see par("mar") |
theme |
Character: Run themes() for available themes |
filename |
Character: Path to file to save plot. Default = NULL |
par.reset |
Logical: If TRUE, reset par settings before exit |
... |
Additional arguments to be passed to theme function |
E.D. Gennatas
Plot HSV color range
mplot_hsv( h.steps = seq(0, 1, 0.025), s.steps = seq(0, 1, 0.05), v = 1, alpha = 1, pch = 16, bg = "black", axes = TRUE, pty = "s", cex = 2, mar = c(3, 3, 2, 0.5), lab.col = NULL, type = c("radial", "square"), line.col = "gray50", show.grid = TRUE, show.radial.grid = FALSE, show.grid.labels = 1, cex.axis = 1, cex.lab = 1, par.reset = TRUE )
h.steps |
Float, vector: Hue values to plot. Default = seq(0, 1, 0.025) |
s.steps |
Float, vector: Saturation values to plot. Default = seq(0, 1, 0.05) |
v |
Float: Value. |
alpha |
Float: Alpha. |
pch |
Integer: pch plot parameter. Default = 16 (filled circle) |
bg |
Color: Background color. Default = "black" |
axes |
Logical: If TRUE, draw axes |
pty |
Character: par("pty"); "s" for a square plot region |
cex |
Float: Point character expansion factor |
mar |
Float, vector: Margins; see par("mar") |
lab.col |
Color: Color for axes and labels. Defaults to inverse of bg |
type |
Character: "square" for square plot, "radial" for radial plot. |
show.grid |
Logical: If TRUE, show grid when type is "radial" |
par.reset |
Logical: If TRUE, reset par before exit |
E.D. Gennatas
## Not run:
mplot_hsv()
## End(Not run)
Plots 2D (grayscale) or 3D (color) array as Raster Image
mplot_raster( x, max.value = max(x), mar = NULL, main = NULL, main.line = 0, main.side = 3, main.col = "#ffffff", main.adj = 0, main.font = 2, mono = FALSE, mono.fn = mean, bg = "gray10", par.set = TRUE, par.reset = TRUE, verbose = TRUE )
x |
Array, 2D or 3D: Input describing grayscale or color image in RGB space |
mono |
Logical: If TRUE, plot as grayscale using |
mono.fn |
Function: Apply this function to the array to convert to 2D for grayscale plotting. Default = mean |
bg |
Color: Background color (around the plotted image when window proportions do not match image). Default = "gray10" |
par.reset |
Logical: If TRUE, reset par settings before exiting. Default = TRUE |
verbose |
Logical: If TRUE, print messages to console. Default = TRUE |
E.D. Gennatas
## Not run:
img <- imager::load.image("https://www.r-project.org/logo/Rlogo.png")
mplot_raster(img)
## End(Not run)
mplot3: ADSR Plot
Plot Attack Decay Sustain Release Envelope Generator using mplot3_xy
mplot3_adsr( Attack = 300, Decay = 160, Sustain = 40, Release = 500, Value = 80, I = 200, O = 1800, lty = 1, lwd = 4, main = "ADSR Envelope", main.line = 1.6, main.col = "white", Attack.col = "#44A6AC", Decay.col = "#F4A362", Sustain.col = "#3574A7", Release.col = "#C23A70", draw.poly = FALSE, poly.alpha = 0.15, draw.verticals = TRUE, v.lty = 1, v.lwd = 0.8, arrow.code = 2, arrow.length = 0.09, grid = FALSE, grid.lty = 1, grid.lwd = 0.4, grid.col = NULL, zerolines.col = "gray50", theme = "darkgray", labs.col = "gray70", tick.col = "gray70", on.col = "gray70", off.col = "gray70", pty = "m", mar = c(3, 3, 3.2, 0.5), xaxs = "i", yaxs = "i", par.reset = TRUE, ... )
Attack |
Numeric: Attack time (in milliseconds) |
Decay |
Numeric: Decay time (in milliseconds) |
Sustain |
Numeric: Sustain Level (percent) |
Release |
Numeric: Release time (in milliseconds) |
Value |
Numeric: Value (percent) |
I |
Numeric: Note on time (in milliseconds) |
O |
Numeric: Note off time (in milliseconds) |
lty |
Integer: Line type |
lwd |
Numeric: Line width |
main |
Character: Main title |
main.line |
Numeric: Main title line height |
main.col |
Main title color |
Attack.col |
Attack color |
Decay.col |
Decay color |
Sustain.col |
Sustain color |
Release.col |
Release color |
draw.poly |
Logical: If TRUE, draw polygons for each segment |
poly.alpha |
Numeric: Polygon alpha |
draw.verticals |
Logical: If TRUE, draw vertical lines |
v.lty |
Integer: Vertical line type |
v.lwd |
Numeric: Vertical line width |
arrow.code |
Integer: Arrow code |
arrow.length |
Numeric: Arrow length |
grid |
Logical: If TRUE, draw grid |
grid.lty |
Integer: Grid line type |
grid.lwd |
Numeric: Grid line width |
grid.col |
Grid line color |
zerolines.col |
Color for zero lines |
theme |
Character: "light" or "dark" (Default) |
labs.col |
Color for axis labels |
tick.col |
Color for axis ticks |
on.col |
Color for "on" line |
off.col |
Color for "off" line |
pty |
Character: "s" gives a square plot; "m" gives a plot that fills
graphics device size. Default = "m" (See |
mar |
Float, vector, length 4: Margins; see |
xaxs |
Character: "r": Extend plot x-axis limits by 4% on either end; "i": Use exact x-axis limits. |
yaxs |
Character: as |
par.reset |
Logical: If TRUE, reset par before exit |
... |
Additional arguments to pass to mplot3_xy |
Learn more: ADSR envelope (Wikipedia): https://en.wikipedia.org/wiki/Synthesizer#Attack_Decay_Sustain_Release_.28ADSR.29_envelope
E.D. Gennatas
## Not run:
mplot3_adsr()
## End(Not run)
mplot3: Barplot
Draw barplots
mplot3_bar( x, error = NULL, col = NULL, error.col = "white", error.lwd = 2, alpha = 1, beside = TRUE, border = NA, width = 1, space = NULL, xlim = NULL, ylim = NULL, xlab = NULL, ylab = NULL, main = NULL, las = 1.5, xnames = NULL, xnames.srt = 0, xnames.adj = ifelse(xnames.srt == 0, 0.5, 1), xnames.line = 0.5, xnames.font = 1, xnames.cex = 1, xnames.y.pad = 0.08, xnames.at = NULL, color.bygroup = FALSE, group.legend = NULL, legend.x = NULL, legend.y = NULL, group.names = NULL, legend.font = 1, bartoplabels = NULL, bartoplabels.line = 0, bartoplabels.font = 1, mar = c(2.5, 3, 2, 1), pty = "m", barplot.axes = FALSE, yaxis = TRUE, ylim.pad = 0.04, theme = rtTheme, palette = rtPalette, autolabel = letters, par.reset = TRUE, pdf.width = 6, pdf.height = 6, filename = NULL, ... )
x |
Vector or Matrix: If Vector, each value will be drawn as a bar. If Matrix, each column is a vector, so multiple columns signify a different group. e.g. Columns could be months and rows could be N days sunshine, N days rainfall, N days snow, etc. |
error |
Vector or Matrix: If Vector, each value will be drawn as an error bar. If Matrix, each column is a vector, so multiple columns signify a different group. |
col |
Vector of colors to use |
alpha |
Float: Alpha to be applied to |
border |
Color if you wish to draw border around bars, NA for no borders (Default) |
space |
Float: Space left free on either side of the bars, as a fraction of bar width. A single number or a
vector, one value per bar. If |
xlim |
Float vector, length 2: x-axis limits |
ylim |
Float vector, length 2: y-axis limits |
xlab |
Character: x-axis label |
ylab |
Character: y-axis label |
main |
Character: Plot title |
color.bygroup |
Logical: If TRUE, and input is a matrix, each group's bars will be given the same color, otherwise bars across groups will be given the same sequence of colors. Default = FALSE |
group.legend |
Logical: If TRUE, include a group legend |
group.names |
(Optional) If multiple groups are plotted, use these names if group.legend = TRUE |
mar |
Float, vector, length 4: Margins; see par("mar") |
pty |
Character: "s" gives a square plot; "m" gives a plot that fills
graphics device size. Default = "m" (See |
theme |
Character: Run themes() for available themes |
palette |
Vector of colors, or Character defining a builtin palette - get options with rtpalette() |
autolabel |
Character vector to be used to generate autolabels when using rtlayout with autolabel = TRUE |
par.reset |
Logical: If TRUE, reset par before exit |
pdf.width |
Float: Width in inches for pdf output (if filename is set) |
pdf.height |
Float: Height in inches for pdf output. |
filename |
Character: Path to file to save plot. Default = NULL |
... |
Additional arguments to |
legend |
Logical: If TRUE, and input is matrix, draw legend for each case. Note: you may need to adjust legend.x and legend.y |
E.D. Gennatas
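For example, using the built-in VADeaths matrix, whose columns define groups:
## Not run:
mplot3_bar(VADeaths, group.legend = TRUE)
## End(Not run)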
mplot3: Boxplot
Draw boxplots of a vector (single box), data.frame (one box per column), or list (one box per element - good for variables of different lengths)
mplot3_box( x, col = NULL, alpha = 0.66, border = NULL, border.alpha = 1, group.spacing = 0.25, xlim = NULL, ylim = NULL, xlab = NULL, ylab = NULL, boxwex = NULL, staplewex = 0.5, horizontal = FALSE, main = NULL, groupnames = NULL, xnames = NULL, xnames.at = NULL, xnames.y = NULL, xnames.font = 1, xnames.adj = NULL, xnames.pos = NULL, xnames.srt = NULL, order.by.fn = NULL, legend = FALSE, legend.names = NULL, legend.position = "topright", legend.inset = c(0, 0), mar = NULL, oma = rep(0, 4), pty = "m", yaxis = TRUE, ylim.pad = 0, theme = rtTheme, labelify = TRUE, autolabel = letters, na.rm = TRUE, palette = rtPalette, par.reset = TRUE, pdf.width = 6, pdf.height = 6, filename = NULL, ... )
x |
Vector, data.frame or list: Each data.frame column or list element will be drawn as a box |
col |
Vector of colors to use |
alpha |
Numeric: Transparency for box colors |
border |
Color for lines around boxes |
border.alpha |
Numeric: Transparency for border colors |
group.spacing |
Numeric: Spacing between groups of boxes (when input is data.frame or list) |
xlim |
Float vector, length 2: x-axis limits |
ylim |
Float vector, length 2: y-axis limits |
xlab |
Character: x-axis label |
ylab |
Character: y-axis label |
boxwex |
Numeric: Scale factor for box width. Default = .5 |
staplewex |
Numeric: max and min line ("staple") width proportional to box. Default = .5 |
horizontal |
Logical: If TRUE, draw horizontal boxplot(s). |
main |
Character: Plot title |
groupnames |
Character vector: Group names |
xnames |
Character vector: Names for individual boxes |
xnames.at |
Numeric: Position of xnames |
order.by.fn |
Character: "mean", "median" or any function that outputs a single number: E stimate function on each vector and order boxes (when input is data.frame or list) by ascending order. Default = NULL, i.e. no reordering |
mar |
Float, vector, length 4: Margins; see par("mar") |
oma |
Float, vector, length 4: Outer margins; see par("oma") |
pty |
Character: "s" gives a square plot; "m" gives a plot that fills
graphics device size. Default = "m" (See |
theme |
Character: Run themes() for available themes |
autolabel |
Character vector to be used to generate autolabels when using rtlayout with autolabel = TRUE |
na.rm |
Logical: If TRUE, remove NA values, otherwise function will give error. Default = TRUE |
palette |
Vector of colors, or Character defining a builtin palette - get options with rtpalette() |
par.reset |
Logical: If TRUE, reset par before exit |
pdf.width |
Float: Width in inches for pdf output (if filename is set) |
pdf.height |
Float: Height in inches for pdf output. |
filename |
Character: Path to file to save plot. Default = NULL |
... |
Additional arguments to |
Note that argument xnames refers to the x-axis labels below each box. If not specified, these are inferred from the input when possible. Argument xlab is a single label for the x-axis as per usual, and is often omitted if xnames suffice.
E.D. Gennatas
## Not run:
## vector
x <- rnorm(500)
mplot3_box(x)

## data.frame - each column one boxplot
x <- data.frame(alpha = rnorm(50), beta = rnorm(50), gamma = rnorm(50))
mplot3_box(x)

## list of vectors - allows different length vectors
x <- list(alpha = rnorm(50), beta = rnorm(80, 4, 1.5), gamma = rnorm(30, -3, .5))
mplot3_box(x)

## grouped boxplots: input a list of lists. outer list: groups; inner lists: matched data vectors
x <- list(Cases = list(Weight = rnorm(50), Temperature = rnorm(45, 1)),
          Controls = list(Weight = rnorm(80), Temperature = rnorm(72)))
mplot3_box(x)
## End(Not run)
Plots confusion matrix and classification metrics
mplot3_conf( object, main = "auto", xlab = "Reference", ylab = "Predicted", plot.metrics = TRUE, mod.name = NULL, oma = c(0, 0, 0, 0), dim.main = NULL, dim.lab = 1, dim.in = 4, dim.out = -1, font.in = 2, font.out = 1, cex.main = 1.2, cex.in = 1.2, cex.lab = 1.2, cex.lab2 = 1.2, cex.lab3 = 1, cex.out = 1, col.main = "auto", col.lab = "auto", col.text.out = "auto", col.bg = "auto", col.bg.out1 = "auto", col.bg.out2 = "auto", col.text.hi = "auto", col.text.lo = "auto", show.ba = TRUE, theme = getOption("rt.theme", "white"), mid.col = "auto", hi.color.pos = "#18A3AC", hi.color.neg = "#C23A70", autolabel = letters, par.reset = TRUE, pdf.width = 7, pdf.height = 7, filename = NULL, ... )
object |
Either a classification rtMod object or a confusion matrix, e.g. the output of table(predicted, true) |
main |
Character: Plot title. |
xlab |
Character: x-axis label. |
ylab |
Character: y-axis label. |
plot.metrics |
Logical: If TRUE, draw classification metrics next to confusion matrix. |
mod.name |
Character: Name of the algorithm used to make predictions. If NULL, will be inferred from object if possible |
oma |
Numeric, vector, length 4: Outer margins. |
dim.main |
Numeric: Height for title. |
dim.lab |
Numeric: Height for labels. |
dim.in |
Numeric: Height/Width for confusion matrix cells. |
dim.out |
Numeric: Height for metrics cells. Default = -1, which autoadjusts depending on number of output classes. |
font.in |
Integer: Font (par font) for confusion matrix cell values |
font.out |
Integer: Font (par font) for metric values |
cex.main |
Numeric: cex for main title |
cex.in |
Numeric: cex for confusion matrix cell values |
cex.lab |
Numeric: cex for axis labels |
cex.lab2 |
Numeric: cex for secondary labels |
cex.lab3 |
Numeric: cex for tertiary labels |
cex.out |
Numeric: cex for metric values |
col.main |
Color for title. Default = "auto", determined by theme |
col.lab |
Color for labels. Default = "auto", determined by theme |
col.text.out |
Color for metrics cells' text. Default = "auto", determined by theme |
col.bg |
Color for background. Default = "auto", determined by theme |
col.bg.out1 |
Color for metrics cells' background (row 1). Default = "auto", determined by theme |
col.bg.out2 |
Color for metrics cells' background (row 2). Default = "auto", determined by theme |
col.text.hi |
Color for high confusion matrix values. Default = "auto", determined by theme |
col.text.lo |
Color for low confusion matrix values. Default = "auto", determined by theme |
show.ba |
Logical: If TRUE, show Balanced Accuracy at bottom right corner. |
theme |
Character: "light", or "dark". Set to
|
mid.col |
Color: The mid color for the confusion matrix. Default = "auto", determined by theme |
hi.color.pos |
Color: The hi color for correct classification. |
hi.color.neg |
Color: The hi color for misclassification. |
autolabel |
Character vector to be used to generate autolabels when using rtlayout with autolabel = TRUE |
par.reset |
Logical: If TRUE, reset par before exit. |
pdf.width |
Numeric: PDF width, if saving to file |
pdf.height |
Numeric: PDF height, if saving to file |
filename |
Character: If specified, save plot to this path. |
... |
Additional arguments passed to |
This function uses its multiple cex arguments instead of the theme's cex parameter.
List of metrics, invisibly
E.D. Gennatas
## Not run:
true <- c("alpha", "alpha", "alpha", "alpha", "beta", "beta", "beta", "beta")
predicted <- c("alpha", "alpha", "alpha", "beta", "beta", "alpha", "alpha", "beta")
mplot3_conf(table(predicted, true))
## End(Not run)
Plots an extended confusion matrix using mplot3_img
mplot3_confbin( object, main = NULL, xlab = "True", ylab = "Estimated", mod.name = NULL, mar = c(4, 5, 4, 3), dim.lab = 1, dim.in = 4, dim.out = 2, font.in = 2, font.out = 2, cex.in = 1.2, cex.lab = 1.2, cex.lab2 = 1, cex.out = 1, col.text.out = "white", col.bg.out = "gray50", theme = "light", mid.color = NULL, hi.color.pos = "#18A3AC", hi.color.neg = "#716FB2", par.reset = TRUE, pdf.width = 8.7, pdf.height = 8.7, filename = NULL, ... )
object |
Either 1. a classification rtMod object, or 2. a confusion matrix, e.g. the output of table(predicted, true) |
main |
Character: Plot title |
xlab |
Character: x-axis label |
ylab |
Character: y-axis label |
mod.name |
Character: Name of the algorithm used to make predictions. If NULL, will be inferred from object if possible |
mar |
Numeric, vector, length 4: Overall margins |
dim.lab |
Float: Height for labels |
dim.in |
Float: Width and height for confusion matrix cells |
dim.out |
Float: Height for metrics cells |
font.in |
Integer: Font (par font) for confusion matrix cell values |
font.out |
Integer: Font (par font) for metric values |
cex.in |
Float: cex for confusion matrix cell values |
cex.lab |
Float: cex for labels |
cex.lab2 |
Float: cex for secondary labels |
cex.out |
Float: cex for metric values |
col.text.out |
Color for metrics cells' text |
col.bg.out |
Color for metrics cells' background |
theme |
Character: "light", or "dark" |
mid.color |
Color: The mid color for the confusion matrix. Default = "white" for theme = "light", "black" for "dark" |
hi.color.pos |
Color: The hi color for correct classification. |
hi.color.neg |
Color: The hi color for misclassification |
par.reset |
Logical: If TRUE, reset par before exit. Default = TRUE |
pdf.width |
Float: PDF width, if saving to file |
pdf.height |
Float: PDF height, if saving to file |
filename |
Character: If specified, save plot to this path. Default = NULL |
... |
Not used |
List of metrics, invisibly
E.D. Gennatas
mplot3: Decision boundaries
Plot classification decision boundaries of rtemis models
mplot3_decision( rtmod, data, vars = c(1, 2), dots.per.axis = 100, bg.cex = 0.5, bg.alpha = 0.4, bg.pch = 15, par.reset = TRUE, theme = "white", col = c("#18A3AC", "#F48024"), contour.col = "black", contour.lwd = 0.1, point.pch = c(3, 4), point.alpha = 1 )
rtmod |
rtemis trained model |
data |
Matrix / data frame of features; last column is class |
vars |
Integer vector, length 2: Index of features (columns of data) to use for the plot axes |
dots.per.axis |
Integer: Draw a grid with this many dots on each axis. Default = 100 |
bg.cex |
Float: Point cex for background / decision surface. Default = .5 |
bg.alpha |
Float: Point alpha for background / decision surface. Default = .4 |
bg.pch |
Integer: pch for background / decision surface. Default = 15 |
par.reset |
Logical: If TRUE, reset par before exit |
theme |
Character: Theme for mplot3_xy. Default = "white" |
col |
Color vector for classes. Default = c("#18A3AC", "#F48024") |
contour.col |
Color for decision boundary. Default = "black" |
contour.lwd |
Float: Line width for decision boundary. Default = .1 |
point.pch |
Integer: pch for data points. Default = c(3, 4) |
point.alpha |
Float: Alpha for data points. Default = 1 |
If data has more than 2 variables, any variables not selected using vars will be fixed to their means.
The underlying model (e.g. randomForest, rpart, etc.) must support the standard R predict format for classification: predict(model, newdata, type = "class")
Predicted labels for background grid (invisibly)
E.D. Gennatas
## Not run:
dat <- as.data.frame(mlbench::mlbench.2dnormals(200))
mod.cart <- s_CART(dat)
mod.rf <- s_RF(dat)
mplot3_decision(mod.cart, dat)
mplot3_decision(mod.rf, dat)
## End(Not run)
An mplot3_xy wrapper with defaults for plotting a learner's performance
mplot3_fit( x, y, fit = "lm", se.fit = TRUE, fit.error = TRUE, axes.equal = TRUE, diagonal = TRUE, theme = rtTheme, marker.col = NULL, fit.col = NULL, pty = "s", fit.legend = FALSE, mar = NULL, ... )
x |
Numeric vector: True values |
y |
Numeric vector: Predicted values |
fit |
Character: rtemis algorithm to use for the fit line |
se.fit |
Logical: If TRUE, draw the standard error of the fit |
fit.error |
Logical: If TRUE, draw fit error annotation. Default = TRUE |
axes.equal |
Logical: Should axes be equal? Defaults to TRUE |
diagonal |
Logical: If TRUE, draw diagonal line. |
theme |
Character: Run themes() for available themes |
marker.col |
Color for marker |
fit.col |
Color: Color of the fit line. |
pty |
Character: "s" for square plot, "m" to fill device. Default = "s" |
fit.legend |
Logical: If TRUE, show fit legend |
mar |
Float, vector, length 4: Margins; see par("mar") |
... |
Additional arguments to be passed to mplot3_conf (classification) or mplot3_xy (regression) |
E.D. Gennatas
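For example, plotting simulated true vs. predicted values:
## Not run:
set.seed(2022)
x <- rnorm(200)
y <- x + rnorm(200, sd = 0.3)
mplot3_fit(x, y)
## End(Not run)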
mplot3: Guitar Fretboard
Draw color-coded notes on a guitar fretboard for standard E-A-D-G-B-e tuning
mplot3_fret( theme = rtTheme, useSharps = FALSE, strings.col = "auto", frets.col = "auto", inlays = TRUE, inlays.col = "auto", inlays.cex = 2, par.reset = TRUE, ... )
theme |
Character: "light" or "dark" |
useSharps |
Logical: If TRUE, draw sharp instead of flat notes. Default = FALSE |
strings.col |
Color for strings |
frets.col |
Color for frets |
inlays |
Logical: Draw fretboard inlays. Default = TRUE |
inlays.col |
Color for inlays |
inlays.cex |
Numeric: Character expansion factor for inlays. Default = 2 |
par.reset |
Logical: If TRUE, reset par before exit |
... |
Additional arguments to theme |
Plot is very wide and short. Adjust plot window accordingly. Practice every day.
E.D. Gennatas
Plot igraph networks
mplot3_graph( net, vertex.size = 12, vertex.col = NULL, vertex.alpha = 0.33, vertex.label.col = NULL, vertex.label.alpha = 0.66, vertex.frame.col = NA, vertex.label = NULL, vertex.shape = "circle", edge.col = NULL, edge.alpha = 0.2, edge.curved = 0.35, edge.width = 2, layout = c("fr", "dh", "drl", "gem", "graphopt", "kk", "lgl", "mds", "sugiyama"), coords = NULL, layout_params = list(), cluster = NULL, groups = NULL, cluster_params = list(), cluster_mark_groups = TRUE, mark.col = NULL, mark.alpha = 0.3, mark.border = NULL, mark.border.alpha = 1, cluster_color_vertices = FALSE, theme = rtTheme, theme_extra_args = list(), palette = rtPalette, mar = rep(0, 4), par.reset = TRUE, filename = NULL, pdf.width = 6, pdf.height = 6, verbose = TRUE, ... )
net |
igraph network object |
vertex.size |
Numeric: Vertex size |
vertex.col |
Color for vertices |
vertex.alpha |
Numeric: Transparency for vertex.col |
vertex.label.col |
Color for vertex labels |
vertex.label.alpha |
Numeric: Transparency for vertex.label.col |
vertex.frame.col |
Color for vertex border (frame) |
vertex.label |
Character vector: Vertex labels. Default = NULL, which will keep existing names in net |
vertex.shape |
Character: Vertex shape. See igraph::shapes |
edge.col |
Color for edges |
edge.alpha |
Numeric: Transparency for edges. Default = .2 |
edge.curved |
Numeric: Curvature of edges. Default = .35 |
edge.width |
Numeric: Edge thickness |
layout |
Character: one of: "fr", "dh", "drl", "gem", "graphopt", "kk", "lgl", "mds", "sugiyama", corresponding to all the available layouts in igraph |
coords |
Output of precomputed igraph layout. If provided, layout is ignored |
layout_params |
List of parameters to pass to the layout function |
cluster |
Character: one of: "edge_betweenness", "fast_greedy", "infomap", "label_prop", "leading_eigen", "louvain", "optimal", "spinglass", "walktrap", corresponding to all the available igraph clustering functions |
groups |
Output of precomputed igraph clustering. If provided, cluster is ignored |
cluster_params |
List of parameters to pass to the cluster function |
cluster_mark_groups |
Logical: If TRUE, draw polygons to indicate clusters, if cluster or groups is provided |
mark.col |
Colors, one per group for polygon surrounding cluster. Note: You won't know the number of groups unless they are precomputed. The colors will be recycled as needed. |
mark.alpha |
Float [0, 1]: Transparency for mark.col |
mark.border |
Colors, similar to mark.col, for cluster polygon borders |
mark.border.alpha |
Float [0, 1]: Transparency for mark.border |
cluster_color_vertices |
Logical: If TRUE, color vertices by cluster membership. |
theme |
rtemis theme to use |
theme_extra_args |
List of extra arguments to pass to the theme function defined by theme |
palette |
Vector of colors, or Character defining a builtin palette - get options with rtpalette() |
mar |
Numeric vector, length 4: Margins; see par("mar") |
par.reset |
Logical: If TRUE, reset par before exiting. |
filename |
Character: If provided, save plot to this filepath |
pdf.width |
Float: Width in inches for pdf output (if filename is set) |
pdf.height |
Float: Height in inches for pdf output. |
verbose |
Logical, If TRUE, print messages to console. |
... |
Extra arguments to pass to |
E.D. Gennatas
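A minimal sketch on a random igraph network; the layout and cluster values are among the documented options:
## Not run:
set.seed(2022)
net <- igraph::sample_gnp(50, 0.05)
mplot3_graph(net, layout = "fr", cluster = "louvain")
## End(Not run)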
Plot a harmonograph
mplot3_harmonograph( steps = seq(1, 500, by = 0.01), seed = NULL, col = "white", alpha = 0.2, bg = "black", lwd = 1, text = NULL, text.side = 1, text.line = -1, text.adj = 0, text.padj = 0, text.col = NULL, mar = c(0, 0, 0, 0), oma = c(0, 0, 0, 0), xlim = NULL, ylim = NULL, new = FALSE, par.reset = TRUE )
steps |
Float, vector |
seed |
Integer |
col |
Line color. Default = "white" |
alpha |
Alpha for line color |
bg |
Color for background. Default = "black" |
lwd |
Float: Line width |
text |
Character: Text you want printed along with the harmonograph. Default = NULL |
text.side |
Integer 1, 2, 3, 4: side argument of mtext |
text.line |
Float: line argument of mtext |
text.adj |
Float: adj argument of mtext |
text.padj |
Float: padj argument of mtext |
text.col |
Color: Text color. Default is same as col |
mar |
Float vector, length 4: Plot margins (par("mar")) |
oma |
Float vector, length 4: Outer margins (par("oma")) |
xlim |
Float vector, length 2: x-axis limits |
ylim |
Float vector, length 2: y-axis limits |
new |
Logical. If TRUE, do not clear plot before drawing |
par.reset |
Logical. If TRUE, reset par before exit |
Unless you define a seed, each graph will be random. Try different seeds if you want to reproduce your graphs. Some seeds to try: 9, 17, 26, 202, 208, ...
E.D. Gennatas
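For example, using one of the suggested seeds for a reproducible graph:
## Not run:
mplot3_harmonograph(seed = 9)
## End(Not run)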
mplot3 Heatmap (image; modified heatmap)
Customized heatmap with optional colorbar
mplot3_heatmap( x, colorGrad.n = 101, colorGrad.col = NULL, lo = "#18A3AC", lomid = NULL, mid = NULL, midhi = NULL, hi = "#F48024", space = "rgb", theme = getOption("rt.theme", "white"), colorbar = TRUE, cb.n = 21, cb.title = NULL, cb.cex = NULL, cb.title.cex = 1, cb.mar = NULL, Rowv = TRUE, Colv = TRUE, distfun = dist, hclustfun = hclust, reorderfun = function(d, w) reorder(d, w), add.expr, symm = FALSE, revC = identical(Colv, "Rowv"), scale = "none", na.rm = TRUE, margins = NULL, group.columns = NULL, group.legend = !is.null(group.columns), column.palette = rtPalette, group.rows = NULL, row.palette = rtPalette, ColSideColors, RowSideColors, cexRow = 0.2 + 1/log10(nr), cexCol = 0.2 + 1/log10(nc), labRow = NULL, labCol = NULL, labCol.las = NULL, main = "", main.adj = 0, main.line = NA, xlab = NULL, ylab = NULL, xlab.line = NULL, ylab.line = NULL, keep.dendro = FALSE, trace = 0, zlim = NULL, autorange = TRUE, autolabel = letters, filename = NULL, par.reset = TRUE, pdf.width = 7, pdf.height = 7, ... )
x |
Input matrix |
colorGrad.n |
Integer: Number of distinct colors to generate using colorGrad. Default = 101 |
colorGrad.col |
Character: Colors to pass to colorGrad |
lo |
Color for low end |
lomid |
Color for low-mid |
mid |
Color for middle of the range or "mean", which will result in colorOp(c(lo, hi), "mean"). If mid = NA, then only lo and hi are used to create the color gradient. |
midhi |
Color for middle-high |
hi |
Color for high end |
space |
Character: Which colorspace to use. Option: "rgb", or "Lab". Default = "rgb". Recommendation: If mid is "white" or "black" (default), use "rgb", otherwise "Lab" |
theme |
Character: Defaults to option "rt.theme", if set, otherwise "light" |
colorbar |
Logical: If TRUE, plot colorbar next to heatmap. Default = TRUE |
cb.n |
Integer: Number of steps in colorbar. Default = 21, which gives 10 above and 10 below midline. If midline is zero, this corresponds to 10 percent increments / decrements |
cb.title |
Character: Title for the colorbar. Default = NULL |
cb.cex |
Float: Character expansion (cex) for colorbar |
cb.title.cex |
Float: cex for colorbar title |
cb.mar |
Float, vector, length 4: Margins for colorbar (passed to colorGrad) |
Rowv |
Logical OR a dendrogram OR integer vector that determines index for reordering OR NA to suppress. Default = TRUE |
Colv |
See Rowv |
distfun |
Function: used to compute the distance/dissimilarity matrix between rows and columns. Default = dist |
hclustfun |
Function: used to determine hierarchical clustering when Rowv or Colv are not dendrograms |
reorderfun |
Function (d, w): function of dendrogram and weights that determines reordering of row and column dendrograms. Default uses reorder.dendrogram |
add.expr |
Expression: will be evaluated after the call to image |
symm |
Logical: If TRUE, treat x symmetrically; can only be used if x is a square matrix |
revC |
Logical: If TRUE, reverse column order for plotting. Default = TRUE, if Rowv and Colv are identical |
scale |
Character: "row", "column", or "none". Determines whether values are centered and scaled in either the row or column direction. Default = "none" |
na.rm |
Logical: If TRUE, NAs are removed. Default = TRUE |
margins |
Float, vector, length 2: bottom and right side margins. Automatically determined by length of variable names |
ColSideColors |
Color, vector, length = ncol(x): Colors for a horizontal side bar to annotate columns of x |
RowSideColors |
Color, vector, length = nrow(x): Like ColSideColors, but for rows |
cexRow |
Float: cex for row labels |
cexCol |
Float: cex for column labels |
labRow |
Character, vector: Row labels to use. Default = rownames(x) |
labCol |
Character, vector: Column labels to use. Default = colnames(x) |
labCol.las |
Integer 0:3: las for column labels |
main |
Character: Plot title |
main.adj |
Float: adj for main title |
main.line |
Float: line for main title |
xlab |
Character: x-axis label |
ylab |
Character: y-axis label |
xlab.line |
Float: line for x-axis label |
ylab.line |
Float: line for y-axis label |
keep.dendro |
Logical: If TRUE, dendrogram is returned invisibly. Default = FALSE |
trace |
Integer: If > 0, print diagnostic messages to console. Default = 0 |
zlim |
Float, vector, length 2: Passed to image |
autorange |
Logical: See |
filename |
Character: If provided, save heatmap to file. Default = NULL |
par.reset |
Logical: If TRUE, reset par before exit |
pdf.width |
Float: Width of PDF output, if filename is set |
pdf.height |
Float: Height of PDF output, if filename is set |
... |
Additional arguments passed to image |
The main difference from the original stats::heatmap is the addition of a colorbar on the side. This is achieved with colorGrad.
Other differences: Dendrograms are not drawn by default; set Rowv = TRUE and Colv = TRUE to get them. Column labels are only drawn perpendicular to the x-axis if any one is longer than two characters.
Otherwise, the arguments are the same as in stats::heatmap.
E.D. Gennatas, modified from the original stats::heatmap by Andy Liaw, R. Gentleman, M. Maechler, W. Huber
## Not run:
x <- rnormmat(200, 20)
xcor <- cor(x)
mplot3_heatmap(xcor)
## End(Not run)
Draw a bitmap from a matrix of values.
mplot3_img( z, as.mat = TRUE, col = NULL, xnames = NULL, xnames.y = 0, ynames = NULL, main = NULL, main.adj = 0, x.axis.side = 3, y.axis.side = 2, x.axis.line = -0.5, y.axis.line = -0.5, x.axis.las = 0, y.axis.las = 1, x.tick.labs.adj = NULL, y.tick.labs.adj = NULL, x.axis.font = 1, y.axis.font = 1, xlab = NULL, ylab = NULL, xlab.adj = 0.5, ylab.adj = 0.5, xlab.line = 1.7, ylab.line = 1.7, xlab.padj = 0, ylab.padj = 0, xlab.side = 1, ylab.side = 2, main.col = NULL, axlab.col = NULL, axes.col = NULL, labs.col = NULL, tick.col = NULL, cell.lab.hi.col = NULL, cell.lab.lo.col = NULL, cex = 1.2, cex.ax = NULL, cex.x = NULL, cex.y = NULL, zlim = NULL, autorange = TRUE, pty = "m", mar = NULL, asp = NULL, ann = FALSE, axes = FALSE, cell.labs = NULL, cell.labs.col = NULL, cell.labs.autocol = TRUE, bg = NULL, theme = getOption("rt.theme", "white"), autolabel = letters, filename = NULL, file.width = NULL, file.height = NULL, par.reset = TRUE, ... )
z |
Input matrix |
as.mat |
Logical: If FALSE, rows and columns of z correspond to x and y coordinates accordingly. This is the image default |
col |
Colors to use. Defaults to |
cell.labs |
Matrix of same dimensions as z (Optional): Will be printed as strings over cells |
cell.labs.col |
Color for cell.labs |
bg |
Background color |
filename |
String (Optional): Path to file where image should be saved. R-supported extensions: ".pdf", ".jpeg", ".png", ".tiff". |
file.width |
Output Width in inches |
file.height |
Output height in inches |
par.reset |
Logical: If TRUE, par will be reset to original settings before exit. Default = TRUE |
... |
Additional arguments to be passed to image |
This is also a good way to plot a large heatmap. This function calls image, which is a lot faster than drawing heatmaps.
E.D. Gennatas
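For example, plotting a correlation matrix as a bitmap:
## Not run:
x <- rnormmat(200, 20)
mplot3_img(cor(x))
## End(Not run)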
Laterality scatter plot
mplot3_laterality( x, regionnames, main = NULL, ylab = "Left to Right", summary.fn = "median", summary.lty = 1, summary.lwd = 2.5, summary.col = NULL, arrowhead.length = 0.075, deltas = TRUE, line.col = theme$fg, line.alpha = 0.25, lty = 1, lwd = 0.3, ylim = NULL, theme = rtTheme, labelify = TRUE, autolabel = letters, mar = NULL, oma = rep(0, 4), pty = "m", palette = rtPalette, par.reset = TRUE, pdf.width = 6, pdf.height = 6, filename = NULL, ... )
x |
data.frame or data.table which includes columns with ROI names ending in "_L" or "_R" |
regionnames |
Character, vector: Regions to plot, without the "_L"/"_R" suffix. For example, if x contains columns "Hippocampus_L" and "Hippocampus_R", use regionnames = "Hippocampus" |
main |
Character: Plot title |
ylab |
Character: y-axis label |
summary.fn |
Character: Name of function to summarize left and right values. Default = "median" |
summary.lty |
Integer: line type for summary arrows |
summary.lwd |
Float: line width for summary arrows |
summary.col |
Color for summary arrows |
arrowhead.length |
Float: arrowhead length in inches. Default = .075 |
deltas |
Logical, If TRUE, show summary statistics. Default = TRUE |
line.col |
Color for individual cases' lines |
line.alpha |
Float: transparency for individual lines |
lty |
Integer: Line type for individual lines. Default = 1 |
lwd |
Float: Line width for individual lines. Default = .3 |
ylim |
Float, vector, length 2: y-axis limits |
theme |
Character: Run themes() for available themes |
labelify |
Logical: If TRUE, labelify regionnames |
autolabel |
Character vector to be used to generate autolabels when using
rtlayout with |
mar |
Float, vector, length 4: Margins; see |
oma |
Float, vector, length 4: Outer margins; see |
pty |
Character: "s" gives a square plot; "m" gives a plot that fills
graphics device size. Default = "m" (See |
palette |
Vector of colors, or Character defining a builtin palette -
get options with |
par.reset |
Logical: If TRUE, reset |
pdf.width |
Float: Width in inches for pdf output (if |
pdf.height |
Float: Height in inches for pdf output. |
filename |
Character: Path to file to save plot. Default = NULL |
... |
Additional arguments to be passed to theme function |
E.D. Gennatas
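A minimal sketch with made-up left/right data; the data frame and region name are hypothetical, chosen only to satisfy the "_L"/"_R" column naming convention described above:

## Not run: 
x <- data.frame(
  Hippocampus_L = rnorm(20, mean = 10),
  Hippocampus_R = rnorm(20, mean = 10.5)
)
mplot3_laterality(x, regionnames = "Hippocampus")
## End(Not run)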
mplot3 Lollipop Plot
mplot3_lolli( x, order.on.x = TRUE, plot.top = 1, orientation = c("horizontal", "vertical"), xnames = NULL, points = TRUE, segments = TRUE, main = NULL, col = NULL, cex = 1.2, matching.segment.col = FALSE, segment.alpha = 0.333, lty = 3, lwd = 2, theme = rtTheme, palette = rtPalette, autolabel = letters, par.reset = TRUE, pdf.width = 6, pdf.height = 6, mar = c(2.5, 3, 2, 1), pty = "m", pch = 16, x.axis.at = NULL, y.axis.at = NULL, xlab = NULL, ylab = NULL, label.las = 1, label.padj = 0.5, xaxs = "r", yaxs = "r", xlab.adj = 0.5, ylab.adj = 0.5, filename = NULL, ... )
x |
Float, vector: Input data |
order.on.x |
Logical: If TRUE, order by value of |
plot.top |
Float or Integer: If <= 1, plot this percent highest absolute values, otherwise plot this many top values.
i.e.: |
xnames |
Character, vector: Names of |
main |
Character: Main title |
col |
Color, vector: Lollipop color |
cex |
Float: Character expansion factor for points. Default = 1.2 |
matching.segment.col |
Logical: If TRUE, color line segments using the same color as the points. Default = FALSE, in which case they are colored with the theme foreground color |
segment.alpha |
Float: Transparency for line segments. Default = .333 |
lty |
Integer: Line type for line segments. See |
lwd |
Float: Width for line segments. See |
theme |
Character: Run |
palette |
Vector of colors, or Character defining a builtin palette -
get options with |
autolabel |
Character vector to be used to generate autolabels when using
rtlayout with |
par.reset |
Logical: If TRUE, reset |
pdf.width |
Float: Width in inches for pdf output (if |
pdf.height |
Float: Height in inches for pdf output. |
mar |
Float, vector, length 4: Margins; see |
pty |
Character: "s" gives a square plot; "m" gives a plot that fills
graphics device size. Default = "m" (See |
pch |
Integer: Point character. |
x.axis.at |
Float, vector: x coordinates to place tick marks.
Default = NULL, determined by |
y.axis.at |
As |
xlab |
Character: x-axis label |
ylab |
Character: y-axis label |
xaxs |
Character: "r": Extend plot x-axis limits by 4% on either end; "i": Use exact x-axis limits. |
yaxs |
Character: as |
xlab.adj |
Float: |
ylab.adj |
Float: |
filename |
Character: Path to file to save plot. Default = NULL |
... |
Additional arguments to be passed to theme function |
E.D. Gennatas
## Not run: 
x <- rnorm(12)
mplot3_lolli(x)
# a "rounded" barplot
mplot3_lolli(x, segments = TRUE, points = FALSE, lty = 1,
  matching.segment.col = TRUE, lwd = 10, segment.alpha = 1)
## End(Not run)
Plot missingness
mplot3_missing( x, feat.names = NULL, case.names = NULL, main = NULL, col.missing = "#FE4AA3", show = c("percent", "total"), names.srt = 90, case.names.x = 0.25, case.names.every = NULL, theme = rtTheme, alpha = 1, mar = c(3, 3.5, 5.5, 1), oma = c(0.5, 0.5, 0.5, 0.5), par.reset = TRUE, ... )
x |
Data matrix or data.frame |
feat.names |
Character: Feature names. Defaults to |
case.names |
Character: Case names. Defaults to |
main |
Character: Main title |
col.missing |
Color for missing cases. |
show |
Character: "percent" or "total". Show percent missing or total missing per column on the x-axis |
names.srt |
Numeric: Angle of feature names in degrees. |
case.names.x |
Numeric: x position of case names |
case.names.every |
Numeric: Show case names every this many cases |
theme |
Character: Run |
alpha |
Numeric: Multiply theme's |
mar |
Float, vector, length 4: Margins; see |
oma |
Float, vector, length 4: Outer margins; see |
par.reset |
Logical: If TRUE, reset |
... |
Additional arguments to be passed to theme function |
## Not run: 
dat <- iris
dat[c(1, 5, 17:20, 110, 115, 140), 1] <-
  dat[c(12, 15, 55, 73, 100:103), 2] <-
  dat[sample(1:150, 25), 4] <- NA
mplot3_missing(dat)
## End(Not run)
Draw a mosaic plot using graphics::mosaicplot
mplot3_mosaic( x, main = NULL, xlab = NULL, ylab = NULL, border = FALSE, theme = rtTheme, theme.args = list(), palette = rtPalette, mar = NULL, oma = rep(0, 4), par.reset = TRUE, new = FALSE, autolabel = letters, filename = NULL, pdf.width = 5, pdf.height = 5, ... )
x |
contingency table, e.g. output of |
main |
Character: Main title |
xlab |
Character: x-axis label |
ylab |
Character: y-axis label |
border |
Color vector for cell borders or FALSE to turn off. Default = FALSE |
theme |
Character: Run |
theme.args |
List of arguments to pass to |
palette |
Vector of colors, or Character defining a builtin palette - get options with
|
new |
Logical: If TRUE, add plot to existing plot. See |
filename |
Character: Path to file to save plot. Default = NULL |
pdf.width |
Float: Width in inches for PDF output, if |
pdf.height |
Float: Height in inches for PDF output, if |
E.D. Gennatas
## Not run: 
party <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(party) <- list(
  gender = c("F", "M"),
  party = c("Democrat", "Independent", "Republican")
)
mplot3_mosaic(party)
## End(Not run)
mplot3 Precision Recall curves
Plot Precision Recall curve for a binary classifier
mplot3_pr( prob, labels, f1 = FALSE, main = "", col = NULL, cex = 1.2, lwd = 2.5, diagonal = FALSE, hline.lty = 1, hline.lwd = 1, hline.col = "red", diagonal.lwd = 2.5, diagonal.lty = 3, group.legend = FALSE, annotation = TRUE, annotation.side = 3, annotation.col = col, annot.line = NULL, annot.adj = 1, annot.font = 1, mar = c(2.5, 3, 2.5, 1), theme = rtTheme, palette = rtPalette, par.reset = TRUE, verbose = TRUE, filename = NULL, pdf.width = 5, pdf.height = 5 )
prob |
Vector, Float [0, 1]: Predicted probabilities (i.e. c(.1, .8, .2, .9)) |
labels |
Vector, Integer 0, 1: True labels (i.e. c(0, 1, 0, 1)) |
f1 |
Logical: If TRUE, annotate the point of maximal F1 score. |
main |
Character: Plot title. |
col |
Color, vector: Colors to use for ROC curve(s) |
cex |
Float: Character expansion factor. |
lwd |
Float: Line width. |
diagonal |
Logical: If TRUE, draw diagonal. |
hline.lty |
Integer: Line type for horizontal line(s) |
hline.lwd |
Float: Width for horizontal line(s) |
hline.col |
Color for horizontal line(s) |
diagonal.lwd |
Float: Line width for diagonal. |
diagonal.lty |
Integer: Line type for diagonal. |
group.legend |
Logical |
annotation |
Character: Add annotation at the bottom right of the plot |
annotation.side |
Integer: Side of plot to place annotation. |
annotation.col |
Color: Color of annotation. |
annot.line |
Numeric: Line number for annotation. |
annot.adj |
Numeric: Adjustment for annotation. |
annot.font |
Integer: Font for annotation. |
mar |
Float, vector, length 4: Margins; see |
theme |
Character: Run |
palette |
Vector of colors, or Character defining a builtin palette -
get options with |
par.reset |
Logical: If TRUE, reset |
verbose |
Logical: If TRUE, print messages to console. |
filename |
Path to file: If supplied, plot will be printed to file |
pdf.width |
Float: Width in inches for pdf output (if |
pdf.height |
Float: Height in inches for pdf output. |
List with Precision, Recall, and Threshold values, invisibly
E.D. Gennatas
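A minimal sketch using simulated probabilities and labels (illustrative data, not from the package docs):

## Not run: 
set.seed(2020)
prob <- runif(100)
labels <- rbinom(100, 1, prob)
# Annotate the point of maximal F1 score
mplot3_pr(prob, labels, f1 = TRUE)
## End(Not run)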
Plot output of a regression or classification tree created using rpart
A wrapper for rpart.plot::rpart.plot
mplot3_prp( object, type = 0, extra = "auto", branch.lty = 1, under = FALSE, fallen.leaves = TRUE, palette = NULL, filename = NULL, pdf.width = 7, pdf.height = 5, ... )
object |
Output of s_CART |
palette |
Color vector |
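A minimal sketch, assuming s_CART accepts a data frame whose last column is the outcome, per the rtemis convention for supervised learners:

## Not run: 
mod <- s_CART(iris)
mplot3_prp(mod)
## End(Not run)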
mplot3: Plot resample
Visualizes resampling output using mplot3_img
mplot3_res(res, col = NULL, mar = NULL, theme = rtTheme, ...)
res |
rtemis resample object |
col |
Color vector |
mar |
Numeric vector: image margins |
theme |
rtemis theme |
... |
Additional theme arguments |
For resampling with no replacement, where each case may be selected 0 or 1 time, 0 is white and 1 is teal. For resampling with replacement, 0 is white, 1 is blue, and 2 is teal.
E.D. Gennatas
## Not run: 
x <- rnorm(500)
res <- resample(x)
mplot3_res(res)
## End(Not run)
mplot3 ROC curves
Plot ROC curve for a binary classifier
mplot3_roc( prob, labels, method = c("pROC", "rt"), type = "TPR.FPR", balanced.accuracy = FALSE, main = "", col = NULL, alpha = 1, cex = 1.2, lwd = 2.5, diagonal = TRUE, diagonal.lwd = 1, diagonal.lty = 1, diagonal.col = "red", group.legend = FALSE, annotation = TRUE, annotation.col = col, annot.line = NULL, annot.adj = 1, annot.font = 1, pty = "s", mar = c(2.5, 3, 2, 1), theme = rtTheme, palette = rtPalette, verbose = TRUE, par.reset = TRUE, filename = NULL, pdf.width = 5, pdf.height = 5 )
prob |
Numeric vector or list of numeric vectors [0, 1]: Predicted probabilities (e.g. c(.1, .8, .2, .9)) |
labels |
Integer vector or list of integer vectors 0, 1: True labels (e.g. c(0, 1, 0, 1)) |
method |
Character: "rt" or "pROC" will use rtROC and |
type |
Character: "TPR.FPR" or "Sens.Spec". Only changes the x and y labels. True positive rate vs. False positive rate and Sensitivity vs. Specificity. |
balanced.accuracy |
Logical: If TRUE, annotate the point of maximal Balanced Accuracy. |
main |
Character: Plot title. |
col |
Color, vector: Colors to use for ROC curve(s) |
alpha |
Numeric: Alpha transparency for lines |
cex |
Float: Character expansion factor. |
lwd |
Float: Line width. |
diagonal |
Logical: If TRUE, draw diagonal. |
diagonal.lwd |
Float: Line width for diagonal. |
diagonal.lty |
Integer: Line type for diagonal. |
diagonal.col |
Color: Color for diagonal. |
group.legend |
Logical: If TRUE, print group legend |
annotation |
Character: Add annotation at the bottom right of the plot |
annotation.col |
Color: Color for annotation. |
annot.line |
Numeric: Line position for annotation. |
annot.adj |
Numeric: Text adjustment for annotation. |
annot.font |
Integer: Font for annotation. |
pty |
Character: "s" gives a square plot; "m" gives a plot that fills
graphics device size. Default = "m" (See |
mar |
Float, vector, length 4: Margins; see |
theme |
Character: Run |
palette |
Vector of colors, or Character defining a builtin palette -
get options with |
verbose |
Logical: If TRUE, print messages to console. |
par.reset |
Logical: If TRUE, reset |
filename |
Path to file: If supplied, plot will be printed to file |
pdf.width |
Float: Width in inches for pdf output (if |
pdf.height |
Float: Height in inches for pdf output. |
E.D. Gennatas
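A minimal sketch using simulated probabilities and labels (illustrative data, not from the package docs):

## Not run: 
set.seed(2020)
prob <- runif(200)
labels <- rbinom(200, 1, prob)
# Annotate the point of maximal Balanced Accuracy
mplot3_roc(prob, labels, balanced.accuracy = TRUE)
## End(Not run)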
mplot3: Survival Plots
Plots survival step functions using mplot3_xy
mplot3_surv( x, lty = 1, lwd = 2, alpha = 1, col = NULL, mark.censored = TRUE, normalize.time = FALSE, cex = 1.2, xlab = NULL, ylab = "Survival", main = "Kaplan-Meier curve", theme = rtTheme, palette = rtPalette, plot.error = FALSE, error.lty = 2, error.alpha = 0.5, group.legend = NULL, group.title = "", group.names = NULL, group.side = 3, group.adj = 0.98, group.padj = 2, group.at = NA, par.reset = TRUE, ... )
x |
Survival object / list of Survival objects created using |
lty |
Integer: Line type. Default = 1. See |
lwd |
Float: Line width. Default = 2 |
alpha |
Float: Alpha for lines. Default = 1 |
normalize.time |
Logical: If TRUE, convert each input's time to 0-1 range. This is useful when survival estimates are not provided in original time scale. Default = FALSE. |
xlab |
Character: x-axis label |
ylab |
Character: y-axis label |
main |
Character: Plot title |
theme |
Character: Run |
palette |
Vector of colors, or Character defining a builtin palette -
get options with |
group.legend |
Logical: If TRUE, place |
group.title |
Character: Group title, shown above group names. e.g.
if group names are c("San Francisco", "Philadelphia"), |
group.names |
(Optional) If multiple groups are plotted, use these
names if |
group.side |
Integer: Side to show group legend |
group.adj |
Float: |
group.padj |
Float: |
group.at |
Float: location for group legend. See |
par.reset |
Logical: If TRUE, reset |
... |
Additional arguments to pass to mplot3_xy |
E.D. Gennatas
## Not run: 
library(survival)
mplot3_surv(Surv(time = lung$time, event = lung$status))
## End(Not run)
mplot3: Plot survfit objects
Plots survival step functions using mplot3_xy
mplot3_survfit( x, lty = 1, lwd = 1.5, alpha = 1, col = NULL, plot.median = FALSE, group.median = FALSE, median.lty = 3, median.lwd = 2, median.col = theme$fg, median.alpha = 0.5, censor.mark = TRUE, censor.col = NULL, censor.alpha = 0.4, censor.pch = "I", censor.cex = 0.8, mark.censored = FALSE, nrisk.table = FALSE, nrisk.pos = "below", nrisk.spacing = 0.9, table.font = 1, time.at = NULL, time.by = NULL, xlim = NULL, ylim = NULL, xlab = "Time", ylab = "Survival", main = "", theme = rtTheme, palette = rtPalette, plot.error = FALSE, error.alpha = 0.33, autonames = TRUE, group.legend = NULL, group.legend.type = c("legend", "mtext"), group.names = NULL, group.title = NULL, group.line = NULL, group.side = NULL, legend.x = NULL, mar = c(2.5, 3, 2, 1), oma = NULL, par.reset = TRUE, pdf.width = 6, pdf.height = 6, filename = NULL, ... )
x |
survfit object (output of |
lty |
Integer: Line type. See |
lwd |
Float: Line width. |
alpha |
Float: Alpha for lines. |
col |
Color, vector: Color(s) to use for survival curves and annotations. If NULL,
taken from |
plot.median |
Logical: If TRUE, draw lines at 50 percent median survival. |
group.median |
Logical: If TRUE, include median survival times with group legend |
median.lty |
Integer: Median survival line type |
median.lwd |
Float: Median line width. |
median.col |
Color for median survival lines |
median.alpha |
Float, (0, 1): Transparency for median survival lines. |
censor.mark |
Logical: If TRUE, mark each censored case. |
censor.col |
Color to mark censored cases if |
censor.alpha |
Transparency for |
censor.pch |
Character: Point character for censored marks. |
censor.cex |
Float: Character expansion factor for censor marks. |
mark.censored |
Logical: This is an alternative to |
nrisk.table |
Logical: If TRUE, print Number at risk table. |
nrisk.pos |
Character: "above" or "below": where to place |
nrisk.spacing |
Float: Determines spacing between |
table.font |
Integer: 1: regular font, 2: bold. |
time.at |
Float, vector: x-axis positions to place tickmarks and labels as well as n at risk
values if |
time.by |
Float: Divide time by this amount to determine placing of tickmarks |
xlim |
Float, vector, length 2: x-axis limits |
ylim |
Float, vector, length 2: y-axis limits |
xlab |
Character: x-axis label |
ylab |
Character: y-axis label |
main |
Character: main title |
theme |
Character: Run |
palette |
Vector of colors, or Character defining a builtin palette - get options with
|
autonames |
Logical: If TRUE, extract grouping variable names and level labels from |
group.legend |
Logical: If TRUE, include group legend |
group.names |
Character, vector: Group names to use. If NULL, extracted from |
group.title |
Character: Group legend title |
group.line |
Float, vector: Lines to print group legend using |
group.side |
Integer: Side to print group legend. Default is determined by survival curves, to avoid overlap of legend with curves. |
mar |
Float, vector, length 4: Margins. See |
oma |
Float, vector, length 4: Outer margins. See |
par.reset |
Logical: If TRUE, reset par to initial values before exit |
... |
Additional arguments to pass to theme |
E.D. Gennatas
## Not run: 
# Get the lung dataset
data(cancer, package = "survival")
sf1 <- survival::survfit(survival::Surv(time, status) ~ 1, data = lung)
mplot3_survfit(sf1)
sf2 <- survival::survfit(survival::Surv(time, status) ~ sex, data = lung)
mplot3_survfit(sf2)
# with N at risk table
mplot3_survfit(sf2, nrisk.table = TRUE)
## End(Not run)
mplot3: Variable Importance
Draw horizontal barplots for variable importance
mplot3_varimp( x, error = NULL, names = NULL, names.pad = 0.02, plot.top = 1, labelify = TRUE, col = NULL, palette = rtPalette, alpha = 1, error.col = theme$fg, error.lwd = 2, beside = TRUE, border = NA, width = 1, space = 0.75, xlim = NULL, ylim = NULL, xlab = "Variable Importance", xlab.line = 1.3, ylab = NULL, ylab.line = 1.5, main = NULL, names.arg = NULL, axisnames = FALSE, sidelabels = NULL, mar = NULL, pty = "m", barplot.axes = FALSE, xaxis = TRUE, x.axis.padj = -1.2, tck = -0.015, theme = rtTheme, zerolines = FALSE, par.reset = TRUE, autolabel = letters, pdf.width = NULL, pdf.height = NULL, trace = 0, filename = NULL, ... )
x |
Vector, numeric: Input |
error |
Vector, numeric; length = length(x): Plot error bars with given error. |
names |
Vector, string; optional: Names of variables in |
plot.top |
Float or Integer: If <= 1, plot this percent highest absolute values, otherwise plot this many top values.
i.e.: |
labelify |
Logical: If TRUE convert |
col |
Colors: Gradient to use for barplot fill. |
alpha |
Float (0, 1): Alpha for |
error.col |
Color: For error bars |
trace |
Integer: If |
"NA" values in input are set to zero.
Position of bar centers (invisibly)
E.D. Gennatas
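A minimal sketch with made-up importance values and variable names (illustrative only):

## Not run: 
x <- c(0.42, 0.12, 0.95, 0.31)
mplot3_varimp(x, names = c("Age", "BMI", "Glucose", "BP"))
## End(Not run)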
mplot3: Univariate plots: index, histogram, density, QQ-line
Draw plots of 1-dimensional data: index, histogram, density, and Q-Q plots.
mplot3_x( x, type = c("density", "histogram", "hd", "lhist", "index", "ts", "qqline"), group = NULL, data = NULL, xlab = NULL, ylab = NULL, main = NULL, xlim = NULL, ylim = NULL, index.ypad = 0.1, axes.swap = FALSE, axes.col = NULL, tick.col = NULL, cex = 1.2, col = NULL, alpha = 0.75, index.type = c("p", "l"), hist.breaks = "Sturges", hist.type = c("bars", "lines"), hist.probability = FALSE, hist.lwd = 3, density.line = FALSE, density.shade = TRUE, density.legend.side = 3, density.legend.adj = 0.98, density.bw = "nrd0", density.kernel = "gaussian", density.params = list(na.rm = na.rm), qqline.col = "#18A3AC", qqline.alpha = 1, pch = 16, point.col = NULL, point.cex = 1, point.bg.col = NULL, point.alpha = 0.66, hline = NULL, vline = NULL, diagonal = FALSE, grid = FALSE, grid.col = NULL, grid.alpha = 0.5, grid.lty = 3, grid.lwd = 1, annotation = NULL, annot.col = NULL, group.legend = NULL, group.title = "", group.names = NULL, group.side = 3, group.adj = 0.02, group.at = NA, text.xy = NULL, text.x = NULL, text.y = NULL, text.xy.cex = 1, text.xy.col = "white", line.col = "#008E00", x.axis.padj = -1.1, y.axis.padj = 0.9, labs.col = NULL, lab.adj = 0.5, density.avg = ifelse(type == "density", TRUE, FALSE), density.avg.fn = c("median", "mean"), density.avg.line = FALSE, density.avg.lwd = 1.5, density.avg.lty = 3, hline.col = "black", hline.lwd = 1, hline.lty = 1, vline.col = "black", vline.lwd = 1, vline.lty = 1, lty = 1, lwd = 2, qqline.lwd = lwd, density.lwd = lwd, theme = rtTheme, palette = rtPalette, pty = "m", mar = NULL, oma = rep(0, 4), xaxs = "r", yaxs = "r", autolabel = letters, new = FALSE, alpha.off = FALSE, na.rm = TRUE, par.reset = TRUE, filename = NULL, pdf.width = 6, pdf.height = 6, ... )
x |
Numeric vector or list of vectors, one for each group.
If |
type |
Character: "density", "histogram", "hd" (histogram bars & density lines),
"lhist" (line histogram like mhist; same as |
group |
Vector denoting group membership. Will be converted to factor.
If |
data |
Optional data frame containing x data |
xlab |
Character: x-axis label |
ylab |
Character: y-axis label |
main |
Character: Plot title |
xlim |
Float vector, length 2: x-axis limits |
ylim |
Float vector, length 2: y-axis limits |
index.ypad |
Float: Expand ylim by this much for plot type "index" Default = .1 (Stops points being cut off) |
index.type |
Character: "p" for points (Default), "l" for lines (timeseries) |
hist.breaks |
See |
hist.type |
Character: "bars" or "lines". Default = "bars" |
hist.lwd |
Float: Line width for |
density.line |
Logical: If TRUE, draw line for |
density.shade |
Logical: If TRUE, draw shaded polygon for |
qqline.col |
Color for Q-Q line |
qqline.alpha |
Float: Alpha for Q-Q line |
pch |
Integer: Point character. |
point.cex |
Float: Character expansion for points. |
point.bg.col |
Color: point background |
hline |
Vector: y-value(s) for horizontal lines. |
vline |
Vector: x-value(s) for vertical lines. |
diagonal |
Logical: If TRUE, draw diagonal line. |
annotation |
Character: Add annotation at the bottom right of the plot |
group.legend |
Logical: If TRUE, include legend with group names |
group.title |
Character: Title above group names |
group.names |
(Optional) If multiple groups are plotted, use these
names if |
group.side |
Integer: Side to show group legend |
group.adj |
Float: |
group.at |
Float: location for group legend. See |
line.col |
Color for lines |
labs.col |
Color for labels |
lab.adj |
Adjust the axes labels. 0 = left adjust; 1 = right adjust; .5 = center (Default) |
density.avg |
Logical: If TRUE, print mean of |
density.avg.fn |
Character: "median" or "mean". Function to use if
|
density.avg.line |
Logical: If TRUE, draw vertical lines at the density average x-value |
density.avg.lwd |
Float: Line width for |
density.avg.lty |
Integer: Line type for |
hline.col |
Color for horizontal line(s) |
hline.lwd |
Float: Width for horizontal line(s) |
hline.lty |
Integer: Line type for horizontal line(s) |
vline.col |
Color for vertical lines |
vline.lwd |
Float: Width for vertical lines |
vline.lty |
Integer: Line type for vertical lines |
lty |
Integer: Line type. See |
lwd |
Integer: Line width. Used for |
theme |
Character: Run |
palette |
Vector of colors, or Character defining a builtin palette -
get options with |
pty |
Character: "s" gives a square plot; "m" gives a plot that fills
graphics device size. Default = "m" (See |
mar |
Float, vector, length 4: Margins; see |
oma |
Float, vector, length 4: Outer margins; see |
xaxs |
Character: "r": Extend plot x-axis limits by 4% on either end; "i": Use exact x-axis limits. |
yaxs |
Character: as |
autolabel |
Character vector to be used to generate autolabels when using
rtlayout with |
new |
Logical: If TRUE, add plot to existing plot. See |
na.rm |
Logical: Will be passed to all functions that support it. If set to FALSE,
input containing NA values will result in error, depending on the |
par.reset |
Logical: If TRUE, reset |
filename |
Path to file: If supplied, plot will be printed to file |
pdf.width |
Float: Width in inches for pdf output (if |
pdf.height |
Float: Height in inches for pdf output. |
... |
Additional arguments to be passed to theme function |
You can group data either by supplying x as a list with one vector per group, or as a data frame where each column represents a group, or by providing a group variable, which will be converted to factor.
For bivariate plots, see mplot3_xy and mplot3_xym. For heatmaps, see mplot3_heatmap.
To plot histograms of multiple groups, it is best to use hist.type = "lines", which will use mhist and space apart the breaks for each group.
Invisibly returns the output of density, hist, qqnorm, or NULL.
E.D. Gennatas
mplot3_xy, mplot3_xym, mplot3_heatmap
## Not run: 
mplot3_x(iris)
mplot3_x(split(iris$Sepal.Length, iris$Species), xlab = "Sepal Length")
## End(Not run)
mplot3: XY Scatter and line plots
Plot points and lines with optional fits and standard error bands
mplot3_xy( x, y = NULL, fit = NULL, formula = NULL, se.fit = FALSE, fit.params = NULL, error.x = NULL, error.y = NULL, cluster = NULL, cluster.params = list(), data = NULL, type = "p", group = NULL, xlab = NULL, ylab = NULL, main = NULL, xlim = NULL, ylim = NULL, xpd = TRUE, xaxs = "r", yaxs = "r", log = "", rsq = NULL, rsq.pval = FALSE, rsq.side = 1, rsq.adj = 0.98, rsq.col = NULL, rsq.line = NULL, fit.error = FALSE, fit.error.side = 1, fit.error.padj = NA, xaxp = NULL, yaxp = NULL, scatter = TRUE, axes.equal = FALSE, pty = "m", annotation = NULL, annotation.col = NULL, x.axis.at = NULL, x.axis.labs = TRUE, y.axis.at = NULL, y.axis.labs = TRUE, xlab.adj = 0.5, ylab.adj = 0.5, mar = NULL, oma = rep(0, 4), point.cex = 1, point.bg.col = NULL, pch = ifelse(is.null(point.bg.col), 16, 21), line.col = NULL, line.alpha = 0.66, lwd = 1, lty = 1, marker.col = NULL, marker.alpha = NULL, error.x.col = NULL, error.y.col = NULL, error.x.lty = 1, error.y.lty = 1, error.x.lwd = 1, error.y.lwd = 1, error.arrow.code = 3, fit.col = NULL, fit.lwd = 2.5, fit.alpha = 1, fit.legend = ifelse(is.null(fit), FALSE, TRUE), se.lty = "poly", se.lwd = 1, se.col = NULL, se.alpha = 0.5, se.times = 1.96, se.border = FALSE, se.density = NULL, hline = NULL, hline.col = NULL, hline.lwd = 1.5, hline.lty = 3, vline = NULL, vline.lwd = 1.5, vline.col = "blue", vline.lty = 3, diagonal = FALSE, diagonal.inv = FALSE, diagonal.lwd = 1.5, diagonal.lty = 1, diagonal.col = "gray50", diagonal.alpha = 1, group.legend = NULL, group.title = NULL, group.names = NULL, group.side = 3, group.adj = 0.02, group.padj = 2, group.at = NA, fit.legend.col = NULL, fit.legend.side = 3, fit.legend.adj = 0.02, fit.legend.padj = 2, fit.legend.at = NA, rm.na = TRUE, theme = rtTheme, palette = rtPalette, order.on.x = NULL, autolabel = letters, new = FALSE, par.reset = TRUE, return.lims = FALSE, pdf.width = 6, pdf.height = 6, trace = 0, filename = NULL, ... )
x |
Numeric vector or list of vectors for x-axis.
If |
y |
Numeric vector of list of vectors for y-axis
If |
fit |
Character: rtemis model to calculate |
formula |
Formula: Provide a formula to be solved using s_NLS.
If provided, |
se.fit |
Logical: If TRUE, draw the standard error of the fit |
fit.params |
List: Arguments for learner defined by |
error.x |
Vector, float: Error in |
error.y |
Vector, float: Error in |
cluster |
Character: Clusterer name. Will cluster
|
cluster.params |
List: Names list of parameters to pass to the
|
data |
(Optional) data frame, where |
type |
Character: "p" for points, "l" for lines, "s" for steps.
Default = "p". If |
group |
Vector: will be converted to factor.
If |
xlab |
Character: x-axis label |
ylab |
Character: y-axis label |
main |
Character: Plot title |
xlim |
Float vector, length 2: x-axis limits |
ylim |
Float vector, length 2: y-axis limits |
xpd |
Logical or NA: FALSE: plotting clipped to plot region; TRUE: plotting clipped to figure region; NA: plotting clipped to device region. |
xaxs |
Character: "r": Extend plot x-axis limits by 4% on either end; "i": Use exact x-axis limits. |
yaxs |
Character: as |
log |
Character: "x", "y", or "xy", defines if either or both axes should be log-transformed. |
rsq |
Logical: If TRUE, add legend with R-squared (if fit is not NULL) |
rsq.pval |
Logical: If TRUE, add legend with R-squared and its p-value, if fit is not NULL |
rsq.side |
Integer: [1:4] Where to place the |
rsq.adj |
Float: Adjust |
rsq.col |
Color: Color for |
rsq.line |
Numeric: Passed to |
fit.error |
Logical: If TRUE: draw fit error annotation. Default = NULL, which results in TRUE, if fit is set |
fit.error.side |
Integer [1:4]: Which side to draw |
fit.error.padj |
Float: See |
xaxp |
See |
yaxp |
See |
scatter |
Logical: If TRUE, plot (x, y) scatter points. |
axes.equal |
Logical: Should axes be equal? Defaults to FALSE |
pty |
Character: "s" gives a square plot; "m" gives a plot that fills
graphics device size. Default = "m" (See |
annotation |
Character: Add annotation at the bottom right of the plot |
annotation.col |
Color for annotation |
x.axis.at |
Float, vector: x coordinates to place tick marks.
Default = NULL, determined by |
x.axis.labs |
See |
y.axis.at |
As |
y.axis.labs |
See |
xlab.adj |
Float: |
ylab.adj |
Float: |
mar |
Float, vector, length 4: Margins; see |
oma |
Float, vector, length 4: Outer margins; see |
point.cex |
Float: Character expansion for points. |
point.bg.col |
Color: point background |
pch |
Integer: Point character. |
line.col |
Color for lines |
line.alpha |
Float [0, 1]: Transparency for lines |
lwd |
Float: Line width |
lty |
Integer: Line type. See |
marker.col |
Color for marker |
marker.alpha |
Float [0, 1]: Transparency for markers |
error.x.col |
Color for x-axis error bars |
error.y.col |
Color for y-axis error bars |
error.x.lty |
Integer: line type for x-axis error bars |
error.y.lty |
Integer: line type for y-axis error bars |
error.x.lwd |
Float: Line width for x-axis error bars |
error.y.lwd |
Float: Line width for y-axis error bars |
error.arrow.code |
Integer: Type of arrow to draw for error bars.
See |
fit.col |
Color: Color of the fit line. |
fit.lwd |
Float: Fit line width |
fit.alpha |
Float [0, 1]: Transparency for fit line |
fit.legend |
Logical: If TRUE, show fit legend |
se.lty |
How to draw the |
se.lwd |
Float: Line width for standard error bounds |
se.col |
Color for |
se.alpha |
Alpha for |
se.times |
Draw polygon or lines at +/- |
se.border |
Define border of polygon for |
se.density |
Density of shading line of polygon for |
hline |
Vector: y-value(s) for horizontal lines. |
hline.col |
Color for horizontal line(s) |
hline.lwd |
Float: Width for horizontal line(s) |
hline.lty |
Integer: Line type for horizontal line(s) |
vline |
Vector: x-value(s) for vertical lines. |
vline.lwd |
Float: Width for vertical lines |
vline.col |
Color for vertical lines |
vline.lty |
Integer: Line type for vertical lines |
diagonal |
Logical: If TRUE, draw diagonal line. |
diagonal.inv |
Logical: If TRUE, draw inverse diagonal line. Will use
|
diagonal.lwd |
Float: Line width for |
diagonal.lty |
Integer: Line type for |
diagonal.col |
Color: Color for |
diagonal.alpha |
Float: Alpha for |
group.legend |
Logical: If TRUE, place |
group.title |
Character: Group title, shown above group names. e.g.
if group names are c("San Francisco", "Philadelphia"), |
group.names |
(Optional) If multiple groups are plotted, use these
names if |
group.side |
Integer: Side to show group legend |
group.adj |
Float: |
group.padj |
Float: |
group.at |
Float: location for group legend. See |
fit.legend.col |
Color for fit legend |
fit.legend.side |
Integer: Side for fit legend |
fit.legend.adj |
Float: |
fit.legend.padj |
Float: |
fit.legend.at |
Float: location for fit legend. See |
rm.na |
Logical: If TRUE, remove all NA values pairwise between x and y. Set to FALSE if you know your data has no missing values. |
theme |
Character: Run |
palette |
Vector of colors, or Character defining a builtin palette -
get options with |
order.on.x |
Logical: If TRUE, order (x, y) by increasing x. Default = NULL: will be set to TRUE if fit is set, otherwise FALSE |
autolabel |
Character vector to be used to generate autolabels when using
rtlayout with |
new |
Logical: If TRUE, add plot to existing plot. See |
par.reset |
Logical: If TRUE, reset |
return.lims |
Logical: If TRUE, return xlim and ylim. |
pdf.width |
Float: Width in inches for pdf output (if |
pdf.height |
Float: Height in inches for pdf output. |
trace |
Integer: If > 0, pass |
filename |
Character: Path to file to save plot. Default = NULL |
... |
Additional arguments to be passed to theme function |
E.D. Gennatas
## Not run: 
set.seed(1999)
x <- rnorm(500)
ycu <- x^3 + 12 + rnorm(500)
mplot3_xy(x, ycu)
mplot3_xy(x, ycu, fit = "gam")
ysq <- x^2 + 3 + rnorm(500)
mplot3_xy(x, list(squared = ysq, cubed = ycu), fit = "gam")
## End(Not run)
Draw a scatter plot with fit line and marginal density and/or histogram
mplot3_xym( x, y, margin = c("histogram", "density", "both"), fit = "gam", se.fit = TRUE, xlim = NULL, ylim = NULL, col = "#18A3AC", density.alpha = 0.66, hist.breaks = 30, hist.alpha = 0.66, hist.space = 0.05, hist.lwd = 3, lwd = 4, main = NULL, main.adj = 0, axes.density = FALSE, pty = "m", mar = c(3, 3, 0, 0), margin.mar = 0.2, xaxs = "r", yaxs = "r", theme = rtTheme, par.reset = TRUE, widths = NULL, heights = NULL, filename = NULL, pdf.width = 7, pdf.height = 7, ... )
x |
Numeric vector: x-axis data |
y |
Numeric vector: y-axis data |
margin |
Character: "density", "histogram", or "both". Type of marginal plots to draw. |
fit |
Character: Algorithm to use to draw |
se.fit |
Logical: If TRUE: plot +/- 2 * Standard Error of fit |
xlim |
Float vector, length 2: x-axis limits |
ylim |
Float vector, length 2: y-axis limits |
col |
Color for marginal plots |
density.alpha |
Numeric: Alpha for density plots |
hist.breaks |
Integer: Number of histogram breaks |
hist.alpha |
Numeric: Alpha for barplots |
hist.space |
Numeric: Space between bars in barplots |
hist.lwd |
Numeric: Line width for barplots |
lwd |
Numeric: Line width |
main |
Character: Main title |
main.adj |
Numeric: Main title adjustment |
axes.density |
Logical: If TRUE, plot margin plot axes for density (debugging only) |
pty |
Character: "s" gives a square plot; "m" gives a plot that fills
graphics device size. Default = "m" (See |
mar |
Float, vector, length 4: Margins; see |
margin.mar |
Numeric: Margin for marginal plots |
xaxs |
Character: "r": Extend plot x-axis limits by 4% on either end; "i": Use exact x-axis limits. |
yaxs |
Character: as |
theme |
Character: Run |
par.reset |
Logical: Resest |
filename |
Character: Path to file to save plot. Default = NULL |
pdf.width |
Float: Width in inches for pdf output (if |
pdf.height |
Float: Height in inches for pdf output. |
... |
Additional arguments to passed to mplot3_xy |
To make a wide plot, change widths, e.g. widths = c(7, 1).
E.D. Gennatas
## Not run: 
x <- rnorm(500)
y <- x^3 + 12 + rnorm(500)
mplot3_xym(x, y)
## End(Not run)
Convenience functions for calculating loss. These can be passed as arguments to learners that support custom loss functions.
mse(x, y, na.rm = TRUE)
msew(x, y, weights = rep(1, length(y)), na.rm = TRUE)
rmse(x, y, na.rm = TRUE)
mae(x, y, na.rm = TRUE)
x |
Vector of True values |
y |
Vector of Estimated values |
na.rm |
Logical: If TRUE, remove NA values before computation. Default = TRUE |
weights |
Float, vector: Case weights |
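A minimal sketch with simulated true values and noisy estimates (illustrative data):

set.seed(2021)
x <- rnorm(100)               # true values
y <- x + rnorm(100, sd = 0.5) # estimates
mse(x, y)
rmse(x, y)
mae(x, y)
msew(x, y, weights = runif(100))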
Plot a panel of ggplot2 plots
multigplot(plots = NULL, nrows = NULL, byrow = TRUE)
plots |
List of ggplot2 plots |
nrows |
Integer: number of rows for panel arrangement. Defaults to number of rows required to plot 2 plots per row |
byrow |
Logical: If TRUE, draw plots in order provided by row, otherwise by column. Default = TRUE |
E.D. Gennatas
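A minimal sketch; the two ggplot2 plots below are illustrative:

## Not run: 
library(ggplot2)
p1 <- ggplot(iris, aes(Sepal.Length)) + geom_histogram()
p2 <- ggplot(iris, aes(Species, Sepal.Length)) + geom_boxplot()
multigplot(list(p1, p2), nrows = 1)
## End(Not run)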
Calculate number of combinations
nCr(n, r)
n |
Integer: Total number of items |
r |
Integer: Number of items in each combination |
In plain language: you have n items. How many different combinations of r items can you make?
Integer: Number of combinations
E.D. Gennatas
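For example, choosing 2 items out of 5:

nCr(5, 2)    # 10
choose(5, 2) # base R equivalent, also 10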
Number of unique values per feature
nunique_perfeat(x, excludeNA = FALSE)
x |
data.table |
excludeNA |
Logical: If TRUE, exclude NA values |
E.D. Gennatas
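A minimal sketch; iris is converted to a data.table, the input type documented above:

## Not run: 
nunique_perfeat(data.table::as.data.table(iris))
## End(Not run)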
Calculate odds ratio for a 2x2 contingency table
oddsratio(x, verbose = TRUE)
x |
2x2 contingency table (created with |
verbose |
Logical: If TRUE, print messages to console |
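A minimal sketch with illustrative counts; the dimension names are arbitrary:

## Not run: 
x <- as.table(matrix(
  c(30, 15, 20, 35), nrow = 2,
  dimnames = list(exposure = c("yes", "no"), outcome = c("case", "control"))
))
oddsratio(x)
## End(Not run)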
Odds ratio table from logistic regression
oddsratiotable(x, confint.method = c("default", "profilelikelihood"))
x |
glm object fit with |
confint.method |
"default" or "profilelikelihood" |
matrix with 4 columns: OR, 2.5% & 97.5% CI, p_val
E.D. Gennatas
## Not run: 
ir2 <- iris[51:150, ]
ir2$Species <- factor(ir2$Species)
ir.fit <- glm(Species ~ ., data = ir2, family = binomial)
oddsratiotable(ir.fit)
## End(Not run)
One hot encode a vector or factors in a data.frame
oneHot(x, xname = NULL, verbose = FALSE)
## Default S3 method:
oneHot(x, xname = NULL, verbose = TRUE)
## S3 method for class 'data.frame'
oneHot(x, xname = NULL, verbose = TRUE)
## S3 method for class 'data.table'
oneHot(x, xname = NULL, verbose = TRUE)
dt_set_oneHot(x, xname = NULL, verbose = TRUE)
x |
Vector or data.frame |
xname |
Character: Variable name |
verbose |
Logical: If TRUE, print messages to console |
A vector input will be one-hot encoded regardless of type by looking at all unique values. With data.frame input, only columns of type factor will be one-hot encoded. This function is used by preprocess.
oneHot.data.table operates on a copy of its input, while dt_set_oneHot performs one-hot encoding in-place on a data.table.
For vector input, returns a one-hot-encoded matrix; for data.frame input, an expanded data.frame where all factors are one-hot encoded.
E.D. Gennatas
## Not run: 
iris_oh <- oneHot(iris)
# factor with only one unique value but 2 levels:
vf <- factor(rep("alpha", 20), levels = c("alpha", "beta"))
vf_onehot <- oneHot(vf)
## End(Not run)
oneHot(iris) |> head()
ir <- data.table::as.data.table(iris)
ir_oh <- oneHot(ir)
ir_oh
ir <- data.table::as.data.table(iris)
# dt_set_oneHot operates in-place; therefore no assignment is used:
dt_set_oneHot(ir)
ir
Convert one-hot encoded matrix to factor
onehot2factor(x, labels = colnames(x))
x |
one-hot encoded matrix or data.frame |
labels |
Character vector of level names. Default = |
If input has a single column, it will be converted to factor and returned
E.D. Gennatas
## Not run: 
x <- data.frame(matrix(FALSE, 10, 3))
colnames(x) <- c("Dx1", "Dx2", "Dx3")
x$Dx1[1:3] <- x$Dx2[4:6] <- x$Dx3[7:10] <- TRUE
onehot2factor(x)
## End(Not run)
Filter and order a set of colors to produce a palette suitable for multicolor plots
palettize( x, grayscale_hicut = 0.8, start_with = "#16A0AC", order_by = c("separation", "dissimilarity", "similarity") )
x |
Color vector |
grayscale_hicut |
Numeric: exclude colors whose grayscale equivalent is greater than this value |
start_with |
Integer or color: For integer, start with this color out
of |
order_by |
Character: "separation", "dissimilarity", "similarity" |
E.D. Gennatas
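A minimal sketch, filtering and ordering R's built-in named colors (illustrative input):

## Not run: 
pal <- palettize(grDevices::colors())
## End(Not run)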
Creates all possible permutations
permute(n)
n |
Integer: Length of elements to permute |
n higher than 10 will take a while, and may run out of memory on systems with limited RAM.
Matrix where each row is a different permutation
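For example, all 3! = 6 permutations of 3 elements:

permute(3)  # 6 x 3 matrix; each row is one ordering of 1:3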
fread delimited file in parts
pfread( x, part_nrows, nrows = NULL, header = TRUE, sep = "auto", verbose = TRUE, stringsAsFactors = TRUE, ... )
x |
Character: Path to delimited file |
part_nrows |
Integer: Number of rows to read in each part |
nrows |
Integer: Number of rows in the file |
header |
Logical: If TRUE, the file is assumed to include a header row |
sep |
Character: Delimiter |
verbose |
Logical: If TRUE, print messages to console |
stringsAsFactors |
Logical: If TRUE, characters will be converted to factors |
... |
Additional arguments to pass to |
E.D. Gennatas
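A minimal sketch; "big.csv" is a placeholder path, not a real file:

## Not run: 
# Read a large delimited file 100,000 rows at a time
dat <- pfread("big.csv", part_nrows = 1e5)
## End(Not run)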
Plot massGAM object
Plots a massGAM object using dplot3_bar
## S3 method for class 'massGAM'
plot( x, predictor = NULL, main = NULL, what = "pvals", p.adjust.method = "none", p.transform = function(x) -log10(x), show = c("all", "signif"), pval.hline = c(0.05, 0.001), hline.col = NULL, hline.dash = "dash", hline.annotate = as.character(pval.hline), ylim = NULL, xlab = NULL, ylab = NULL, group = NULL, grouped.nonsig.alpha = 0.5, order.by.group = TRUE, palette = rtPalette, col.sig = "#43A4AC", col.ns = "#7f7f7f", theme = rtTheme, alpha = NULL, margin = NULL, displayModeBar = FALSE, trace = 0, filename = NULL, ... )
x |
|
xlab |
Character: x-axis label for volcano plot |
E.D. Gennatas
Plot massGLM object
Plots a massGLM object using dplot3_bar
## S3 method for class 'massGLM'
plot( x, predictor = NULL, main = NULL, what = c("volcano", "coefs", "pvals"), p.adjust.method = "holm", p.transform = function(x) -log10(x), show = c("all", "signif"), xnames = NULL, pval.hline = c(0.05, 0.001), hline.col = "#ffffff", hline.dash = "dash", hline.annotate = as.character(pval.hline), ylim = NULL, xlab = NULL, ylab = NULL, group = NULL, col.neg = "#43A4AC", col.pos = "#FA9860", col.ns = "#7f7f7f", theme = rtTheme, alpha = NULL, volcano.annotate = TRUE, volcano.annotate.n = 7, volcano.hline = NULL, volcano.hline.dash = "dot", volcano.hline.annotate = NULL, volcano.p.transform = function(x) -log10(x), margin = NULL, displayModeBar = TRUE, trace = 0, filename = NULL, ... )
x |
|
what |
Character: "adjusted" or "raw" p-values to plot |
xlab |
Character: x-axis label for volcano plot |
E.D. Gennatas
plot method for resample object
Run mplot3_res on a resample object
## S3 method for class 'resample'
plot(x, col = NULL, ...)
x |
Vector; numeric or factor: Outcome used for resampling |
col |
Vector, color |
... |
Additional arguments passed to mplot3_res |
E.D. Gennatas
Plot rtModCVCalibration object
Plot calibration plots and Brier score boxplots for rtModCVCalibration object
## S3 method for class 'rtModCVCalibration'
plot( x, what = c("calibration", "brier"), type = c("aggregate.all", "aggregate.by.resample"), bin.method = c("quantile", "equidistant"), filename = NULL, ... )
x |
|
what |
Character: "calibration" or "brier" |
type |
Character: "aggregate.all" or "aggregate.by.resample" |
bin.method |
Character: "quantile" or "equidistant" |
filename |
Character: Path to save plot as pdf |
... |
Additional arguments |
For calibration plots, type = "aggregate.all" is likely the more informative option. It shows calibration curves before and after calibration by aggregating across all outer test sets. The type = "aggregate.by.resample" option shows the calibration curves after calibration for each outer resample.
For Brier boxplots, type = "aggregate.all" shows one score per outer resample prior to calibration and multiple Brier scores (equal to the n.resamples used in calibrate_cv) per outer resample after calibration. This is mainly for diagnostic purposes. For presentation, the type = "aggregate.by.resample" option makes more sense: it shows the mean Brier score per outer resample before and after calibration. The uncalibrated estimates could be resampled using the calibration model resamples to produce a more comparable boxplot of Brier scores when using the type = "aggregate.all" option, but that seems artifactual.
More options are certainly possible, e.g. an "aggregate.none" that would show all calibration resamples for all outer resamples; these can be added in the future if needed.
plotly object
E.D. Gennatas
Plot rtTest object
## S3 method for class 'rtTest'
plot( x, main = NULL, mar = NULL, uni.type = c("density", "histogram", "hd"), boxplot.xlab = FALSE, theme = rtTheme, par.reset = TRUE, ... )
x |
|
main |
Character: Main title |
theme |
Character: Run |
E.D. Gennatas
Draw a heatmap using plotly
plotly.heat( z, x = NULL, y = NULL, title = NULL, col = penn.heat(21), xlab = NULL, ylab = NULL, zlab = NULL, transpose = TRUE )
z |
Input matrix |
x , y
|
Vectors for x, y axes |
title |
Plot title |
col |
Set of colors to make gradient from |
xlab |
x-axis label |
ylab |
y-axis label |
zlab |
z value label |
transpose |
Logical: If TRUE, transpose matrix |
E.D. Gennatas
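A minimal sketch using a correlation matrix as illustrative input:

## Not run: 
plotly.heat(cor(iris[, 1:4]), zlab = "r")
## End(Not run)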
The first factor level is considered the positive case.
precision(true, estimated, harmonize = FALSE, verbosity = 1)
true |
Factor: True labels |
estimated |
Factor: Estimated labels |
harmonize |
Logical: If TRUE, run factor_harmonize first |
verbosity |
Integer: If > 0, print messages to console. |
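A minimal sketch with made-up labels; "a" is the first factor level and therefore treated as the positive class:

true <- factor(c("a", "a", "a", "b", "b"))
estimated <- factor(c("a", "b", "a", "b", "b"))
precision(true, estimated)  # 2 true positives / 2 predicted positives = 1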
Obtains predictions from a trained MediBoost model
## S3 method for class 'addtree'
predict(object, newdata, verbose = FALSE, ...)
object |
A trained model of class |
newdata |
Optional: a matrix / data.frame of features with which to predict |
verbose |
Logical: If TRUE, print messages to output |
... |
Not used |
E.D. Gennatas
Predict method for boost object
## S3 method for class 'boost'
predict( object, newdata = NULL, n.feat = NCOL(newdata), n.iter = NULL, as.matrix = FALSE, verbose = FALSE, n.cores = rtCores, ... )
object |
boost object |
newdata |
data.frame: New data to predict on |
n.feat |
Integer: Number of features to use from |
n.iter |
Integer: Number of iterations to use |
as.matrix |
Logical: If TRUE, return matrix of predictions for each iteration, otherwise return vector |
verbose |
Logical: If TRUE, print messages to console |
n.cores |
Integer: Number of cores to use |
... |
Not used |
E.D. Gennatas
Predict method for cartLite object
## S3 method for class 'cartLite' predict(object, newdata, verbose = FALSE, ...)
object |
|
newdata |
Data frame of predictors |
verbose |
Logical: If TRUE, print messages to console. |
... |
Unused |
E.D. Gennatas
Predict method for cartLiteBoostTV object
## S3 method for class 'cartLiteBoostTV' predict( object, newdata = NULL, n.feat = NCOL(newdata), n.iter = NULL, as.matrix = FALSE, verbose = FALSE, n.cores = rtCores, ... )
object |
|
newdata |
Set of predictors |
n.feat |
Integer: N of features to use. Default = NCOL(newdata) |
n.iter |
Integer: N of iterations to predict from. Default = (all available) |
as.matrix |
Logical: If TRUE, return predictions from each iterations. Default = FALSE |
verbose |
Logical: If TRUE, print messages to console. Default = FALSE |
n.cores |
Integer: Number of cores to use. Default = |
... |
Unused |
E.D. Gennatas
Predict method for glmLite object
## S3 method for class 'glmLite' predict(object, newdata, verbose = FALSE, ...)
object |
glmLite object |
newdata |
Data frame of predictors |
verbose |
Logical: If TRUE, print messages to console. Default = FALSE |
... |
Unused |
E.D. Gennatas
Predict method for glmLiteBoostTV object
## S3 method for class 'glmLiteBoostTV' predict( object, newdata = NULL, n.feat = NCOL(newdata), n.iter = NULL, as.matrix = FALSE, verbose = FALSE, n.cores = rtCores, ... )
object |
|
newdata |
Set of predictors |
n.feat |
Integer: N of features to use. |
n.iter |
Integer: N of iterations to predict from. |
as.matrix |
Logical: If TRUE, return predictions from each iteration. |
verbose |
Logical: If TRUE, print messages to console. |
n.cores |
Integer: Number of cores to use. |
... |
Unused |
E.D. Gennatas
Predict method for hytboost object
## S3 method for class 'hytboost' predict( object, newdata = NULL, n.iter = NULL, fixed.cxr = NULL, as.matrix = FALSE, n.cores = 1, verbose = FALSE, ... )
object |
|
newdata |
data.frame of predictors |
n.iter |
Integer: Use the first so many trees for prediction |
fixed.cxr |
(internal use) Matrix: Cases by rules to use instead of matching
cases to rules using |
as.matrix |
Logical: If TRUE, output |
n.cores |
Integer: Number of cores to use |
verbose |
Logical: If TRUE, print messages to console |
... |
Not used |
E.D. Gennatas
Predict method for hytboostnow object
## S3 method for class 'hytboostnow' predict( object, newdata = NULL, n.feat = NCOL(newdata), n.iter = NULL, fixed.cxr = NULL, as.matrix = FALSE, n.cores = 1, verbose = FALSE, ... )
object |
|
E.D. Gennatas
Predict method for hytreeLite object
## S3 method for class 'hytreenow' predict( object, newdata, n.feat = NCOL(newdata), fixed.cxr = NULL, cxr.newdata = NULL, cxr = FALSE, cxrcoef = FALSE, verbose = FALSE, ... )
object |
|
newdata |
Data frame of predictors |
n.feat |
(internal use) Integer: Use first |
fixed.cxr |
(internal use) Matrix: Cases by rules to use instead of matching cases to rules using
|
cxr.newdata |
(internal use) Data frame: Use these values to match cases by rules |
cxr |
Logical: If TRUE, return list which includes cases-by-rules matrix along with predicted values |
cxrcoef |
Logical: If TRUE, return cases-by-rules * coefficients matrix along with predicted values |
verbose |
Logical: If TRUE, print messages to console |
... |
Not used |
E.D. Gennatas
Predict method for hytreew object
## S3 method for class 'hytreew' predict( object, newdata, n.feat = NCOL(newdata), fixed.cxr = NULL, cxr.newdata = NULL, cxr = FALSE, cxrcoef = FALSE, verbose = FALSE, ... )
object |
|
newdata |
Data frame of predictors |
n.feat |
(internal use) Integer: Use first |
fixed.cxr |
(internal use) Matrix: Cases by rules to use instead of matching cases to rules using
|
cxr.newdata |
(internal use) Data frame: Use these values to match cases by rules |
cxr |
Logical: If TRUE, return list which includes cases-by-rules matrix along with predicted values |
cxrcoef |
Logical: If TRUE, return cases-by-rules * coefficients matrix along with predicted values |
verbose |
Logical: If TRUE, print messages to console |
... |
Not used |
E.D. Gennatas
predict method for LightRuleFit object
## S3 method for class 'LightRuleFit' predict( object, newdata = NULL, return.cases.by.rules = FALSE, verbose = TRUE, ... )
object |
|
newdata |
Feature matrix / data.frame: will be converted to |
return.cases.by.rules |
Logical: If TRUE, return cases by rules matrix |
verbose |
Logical: If TRUE, print messages during execution. Default = TRUE |
... |
Ignored |
Vector of estimated values
Predict method for lihad object
## S3 method for class 'lihad' predict( object, newdata = NULL, learning.rate = NULL, n.feat = NULL, verbose = FALSE, cxrcoef = FALSE, ... )
object |
an |
newdata |
data frame of predictor features |
learning.rate |
Float: learning rate if |
n.feat |
Integer: internal use only |
verbose |
Logical: If TRUE, print messages to console. |
cxrcoef |
Logical: If TRUE, return matrix of cases by coefficients along with predictions. |
... |
Not used |
E.D. Gennatas
Predict method for linadleaves object
## S3 method for class 'linadleaves' predict( object, newdata, type = c("response", "probability", "all", "step"), n.leaves = NULL, fixed.cxr = NULL, cxr.newdata = NULL, cxr = FALSE, cxrcoef = FALSE, verbose = FALSE, ... )
object |
|
newdata |
Data frame of predictors |
type |
Character: "response", "probability", "all", "step" |
n.leaves |
Integer: Use the first |
fixed.cxr |
(internal use) Matrix: Cases by rules to use instead of matching cases to rules using
|
cxr.newdata |
(internal use) Data frame: Use these values to match cases by rules |
cxr |
Logical: If TRUE, return list which includes cases-by-rules matrix along with predicted values |
cxrcoef |
Logical: If TRUE, return cases-by-rules * coefficients matrix along with predicted values |
verbose |
Logical: If TRUE, print messages to console |
... |
Not used |
E.D. Gennatas
Predict method for nlareg object
## S3 method for class 'nlareg' predict(object, newdata, ...)
object |
nlareg object |
newdata |
Data frame of predictors |
... |
Unused |
E.D. Gennatas
rtemis internal: predict for an object of class nullmod
## S3 method for class 'nullmod' predict(object, newdata = NULL, ...)
object |
Object of class |
newdata |
Not used |
... |
Not used |
Predict S3 method for rtBSplines
## S3 method for class 'rtBSplines' predict(object, newdata = NULL, ...)
object |
|
newdata |
|
... |
Not used. |
E.D. Gennatas
Predict using a calibrated model returned by calibrate_cv
## S3 method for class 'rtModCVCalibration' predict(object, mod, newdata, ...)
object |
|
mod |
|
newdata |
Data frame: New data to predict on |
... |
Additional arguments - Use to define |
EDG
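A hypothetical sketch, assuming cal was returned by running calibrate_cv on the cross-validated model mod:
## Not run:
calibrated.prob <- predict(cal, mod = mod, newdata = newdata)
## End(Not run)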
predict.rtTLS: predict method for rtTLS object
## S3 method for class 'rtTLS' predict(object, newdata, ...)
object |
|
newdata |
|
... |
Not used. |
predict method for rulefit object
## S3 method for class 'rulefit' predict(object, newdata = NULL, verbose = TRUE, ...)
object |
|
newdata |
Feature matrix / data.frame: will be converted to |
verbose |
Logical: If TRUE, print messages during execution. Default = TRUE |
... |
Ignored |
Vector of estimated values
Prepare data for analysis and visualization
preprocess(
  x,
  completeCases = FALSE,
  removeCases.thres = NULL,
  removeFeatures.thres = NULL,
  missingness = FALSE,
  impute = FALSE,
  impute.type = c("missRanger", "micePMM", "meanMode"),
  impute.missRanger.params = list(pmm.k = 3, maxiter = 10, num.trees = 500),
  impute.discrete = get_mode,
  impute.numeric = mean,
  integer2factor = FALSE,
  integer2numeric = FALSE,
  logical2factor = FALSE,
  logical2numeric = FALSE,
  numeric2factor = FALSE,
  numeric2factor.levels = NULL,
  numeric.cut.n = 0,
  numeric.cut.labels = FALSE,
  numeric.quant.n = 0,
  numeric.quant.NAonly = FALSE,
  len2factor = 0,
  character2factor = FALSE,
  factorNA2missing = FALSE,
  factorNA2missing.level = "missing",
  factor2integer = FALSE,
  factor2integer_startat0 = TRUE,
  scale = FALSE,
  center = scale,
  removeConstants = FALSE,
  removeConstants.skipMissing = TRUE,
  removeDuplicates = FALSE,
  oneHot = FALSE,
  exclude = NULL,
  xname = NULL,
  verbose = TRUE
)
x |
data.frame to be preprocessed |
completeCases |
Logical: If TRUE, only retain complete cases (no missing data). Default = FALSE |
removeCases.thres |
Float (0, 1): Remove cases with >= this fraction of missing features. |
removeFeatures.thres |
Float (0, 1): Remove features with missing values in >= this fraction of cases. |
missingness |
Logical: If TRUE, generate new boolean columns for each feature with missing values, indicating which cases were missing data. |
impute |
Logical: If TRUE, impute missing cases. See |
impute.type |
Character: How to impute data: "missRanger" and
"missForest" use the packages of the same name to impute by iterative random
forest regression. "rfImpute" uses |
impute.missRanger.params |
Named list with elements "pmm.k" and
"maxiter", which are passed to |
impute.discrete |
Function that returns single value: How to impute
discrete variables for |
impute.numeric |
Function that returns single value: How to impute
continuous variables for |
integer2factor |
Logical: If TRUE, convert all integers to factors. This includes
|
integer2numeric |
Logical: If TRUE, convert all integers to numeric
(will only work if |
logical2factor |
Logical: If TRUE, convert all logical variables to factors |
logical2numeric |
Logical: If TRUE, convert all logical variables to numeric |
numeric2factor |
Logical: If TRUE, convert all numeric variables to factors |
numeric2factor.levels |
Character vector: Optional - will be passed to
|
numeric.cut.n |
Integer: If > 0, convert all numeric variables to factors by
binning using |
numeric.cut.labels |
Logical: The |
numeric.quant.n |
Integer: If > 0, convert all numeric variables to factors by
binning using |
numeric.quant.NAonly |
Logical: If TRUE, only bin numeric variables with missing values |
len2factor |
Integer (>=2): Convert all variables with less
than or equal to this number of unique values to factors. Default = NULL.
For example, if binary variables are encoded with 1, 2, you could use
|
character2factor |
Logical: If TRUE, convert all character variables to factors |
factorNA2missing |
Logical: If TRUE, make NA values in factors be of
level |
factorNA2missing.level |
Character: Name of level if
|
factor2integer |
Logical: If TRUE, convert all factors to integers |
factor2integer_startat0 |
Logical: If TRUE, start integer coding at 0 |
scale |
Logical: If TRUE, scale columns of |
center |
Logical: If TRUE, center columns of |
removeConstants |
Logical: If TRUE, remove constant columns. |
removeConstants.skipMissing |
Logical: If TRUE, skip missing values, before checking if feature is constant |
removeDuplicates |
Logical: If TRUE, remove duplicate cases. |
oneHot |
Logical: If TRUE, convert all factors using one-hot encoding |
exclude |
Integer, vector: Exclude these columns from preprocessing. |
xname |
Character: Name of |
verbose |
Logical: If TRUE, write messages to console. |
Order of operations (reflected by order of arguments in usage):
keep complete cases only
remove constants
remove duplicates
remove cases by missingness threshold
remove features by missingness threshold
integer to factor
integer to numeric
logical to factor
logical to numeric
numeric to factor
cut numeric to n bins
cut numeric to n quantiles
numeric with less than N unique values to factor
character to factor
factor NA to named level
add missingness column
impute
scale and/or center
one-hot encoding
E.D. Gennatas
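A minimal sketch of common preprocessing steps; the data and argument values are illustrative:
dat <- data.frame(
  a = c(1L, 2L, 3L, NA),
  b = c("x", "y", "y", "x"),
  c = c(2.2, NA, 1.8, 2.0)
)
# Impute missing values by mean/mode and convert characters to factors
dat.pre <- preprocess(dat, impute = TRUE, impute.type = "meanMode", character2factor = TRUE)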
Prepare data for analysis and visualization
preprocess_(
  x,
  removeFeatures.thres = NULL,
  missingness = FALSE,
  integer2factor = FALSE,
  integer2numeric = FALSE,
  logical2factor = FALSE,
  logical2numeric = FALSE,
  numeric2factor = FALSE,
  numeric2factor.levels = NULL,
  len2factor = 0,
  character2factor = FALSE,
  factorNA2missing = FALSE,
  factorNA2missing.level = "missing",
  scale = FALSE,
  center = scale,
  removeConstants = FALSE,
  oneHot = FALSE,
  exclude = NULL,
  verbose = TRUE
)
x |
data.frame or data.table to be preprocessed. If data.frame, will be converted to data.table in-place. |
removeFeatures.thres |
Float (0, 1): Remove features with missing values in >= to this fraction of cases. |
missingness |
Logical: If TRUE, generate new boolean columns for each feature with missing values, indicating which cases were missing data. |
integer2factor |
Logical: If TRUE, convert all integers to factors |
integer2numeric |
Logical: If TRUE, convert all integers to numeric
(will only work if |
logical2factor |
Logical: If TRUE, convert all logical variables to factors |
logical2numeric |
Logical: If TRUE, convert all logical variables to numeric |
numeric2factor |
Logical: If TRUE, convert all numeric variables to factors |
numeric2factor.levels |
Character vector: Optional - If |
len2factor |
Integer (>=2): Convert all numeric variables with less
than or equal to this number of unique values to factors.
For example, if binary variables are encoded with 1, 2, you could use
|
character2factor |
Logical: If TRUE, convert all character variables to factors |
factorNA2missing |
Logical: If TRUE, make NA values in factors be of
level |
factorNA2missing.level |
Character: Name of level if
|
scale |
Logical: If TRUE, scale columns of |
center |
Logical: If TRUE, center columns of |
removeConstants |
Logical: If TRUE, remove constant columns. |
oneHot |
Logical: If TRUE, convert all factors using one-hot encoding |
exclude |
Integer, vector: Exclude these columns from preprocessing. |
verbose |
Logical: If TRUE, write messages to console. |
This function (ending in "_") performs operations in-place and returns the preprocessed data.table silently (e.g. for piping). Note that imputation is not currently supported - use preprocess for imputation.
Order of operations is the same as the order of arguments in usage:
keep complete cases only
remove duplicates
remove cases by missingness threshold
remove features by missingness threshold
integer to factor
integer to numeric
logical to factor
logical to numeric
numeric to factor
numeric with less than N unique values to factor
character to factor
factor NA to named level
add missingness column
scale and/or center
remove constants
one-hot encoding
E.D. Gennatas
## Not run:
x <- data.table(
  a = sample(c(1:3), 30, TRUE),
  b = rnorm(30, 12),
  c = rnorm(30, 200),
  d = sample(c(21:22), 30, TRUE),
  e = rnorm(30, -100),
  f = rnorm(30, 950),
  g = rnorm(30),
  h = rnorm(30)
)
## add duplicates
x <- rbind(x, x[c(1, 3), ])
## add constant
x[, z := 99]
preprocess_(x)
## End(Not run)
Plot training and testing performance boxplots of multiple rtModCV objects created by train_cv using dplot3_box
present(
  ...,
  mod.names = NULL,
  which.repeat = 1,
  metric = NULL,
  plot.train = TRUE,
  plot.test = TRUE,
  boxpoints = "all",
  annotate_meansd = TRUE,
  main = NULL,
  ylim = NULL,
  htest = "none",
  htest.annotate.y = NULL,
  col = NULL,
  theme = rtTheme,
  margin = list(b = 65, l = 100, t = 60, r = 18, pad = 0),
  subplot.margin = 0.0666,
  filename = NULL,
  file.width = 500,
  file.height = 550,
  file.scale = 1
)
... |
rtModCV objects created with train_cv |
mod.names |
Character: Names of models being plotted. |
which.repeat |
Integer: which |
metric |
Character: which metric to plot. |
plot.train |
Logical: If TRUE, plot training performance. |
plot.test |
Logical: If TRUE, plot testing performance. |
boxpoints |
Character or FALSE: "all", "suspectedoutliers", "outliers" See https://plotly.com/r/box-plots/#choosing-the-algorithm-for-computing-quartiles |
annotate_meansd |
Logical: If TRUE, annotate with mean (SD) of each box |
main |
Character: Plot title. |
ylim |
Numeric vector: y-axis limits |
htest |
Character: e.g. "t.test", "wilcox.test" to compare each box to
the first box. If grouped, compare within each group to the first box.
If p-value of test is less than |
htest.annotate.y |
Numeric: y-axis paper coordinate for htest annotation |
col |
Color, vector: Color for boxes. If NULL, colors will be drawn from |
theme |
Character: Theme to use: Run |
margin |
Named list: plot margins.
Default = |
subplot.margin |
Numeric: margin between subplots. |
filename |
Character: Path to file to save static plot. |
file.width |
Integer: File width in pixels for when |
file.height |
Integer: File height in pixels for when |
file.scale |
Numeric: If saving to file, scale plot by this number |
E.D. Gennatas
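A hypothetical sketch comparing two cross-validated models; the train_cv calls are illustrative only:
## Not run:
mod.cart <- train_cv(iris, mod = "cart")
mod.rf <- train_cv(iris, mod = "ranger")
present(mod.cart, mod.rf, mod.names = c("CART", "Random Forest"))
## End(Not run)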
Present gridsearch results
present_gridsearch(x, rtModCV.repeat = 1, ...)
x |
rtMod or rtModCV objects |
rtModCV.repeat |
Integer: Which repeat to use, when x is rtModCV object |
... |
Additional arguments to pass to |
E.D. Gennatas
## Not run:
mod <- s_CART(iris, cp = c(0, .1), maxdepth = c(3, 5))
mod_10ss <- elevate(iris, mod = "cart", cp = c(0, .1), maxdepth = c(3, 5))
present_gridsearch(mod)
present_gridsearch(mod_10ss)
## End(Not run)
Preview one or multiple colors using little rhombi with their little labels up top
previewcolor(
  x,
  main = NULL,
  bg = "#333333",
  main.col = "#b3b3b3",
  main.x = 0.7,
  main.y = 0.2,
  main.adj = 0,
  main.cex = 0.9,
  main.font = 1,
  width = NULL,
  xlim = NULL,
  ylim = c(0, 2.2),
  asp = 1,
  labels.y = 1.55,
  label.cex = NULL,
  mar = c(0, 0, 0, 1),
  par.reset = TRUE,
  filename = NULL,
  pdf.width = 8,
  pdf.height = 2.5
)
x |
Color, vector: One or more colors that R understands |
main |
Character: Title. Default = NULL, which results in
|
bg |
Background color. |
main.col |
Color: Title color |
main.x |
Float: x coordinate for |
main.y |
Float: y coordinate for |
main.adj |
Float: |
main.cex |
Float: character expansion factor for |
main.font |
Integer, 1 or 2: Weight of |
width |
Float: Plot width. Default = NULL, i.e. set automatically |
xlim |
Vector, length 2: x-axis limits. Default = NULL, i.e. set automatically |
ylim |
Vector, length 2: y-axis limits. |
asp |
Float: Plot aspect ratio. |
labels.y |
Float: y coord for labels. Default = 1.55 (rhombi are fixed and range y .5 - 1.5) |
label.cex |
Float: Character expansion for labels. Default = NULL, and is
calculated automatically based on length of |
mar |
Numeric vector, length 4: margin size. |
par.reset |
Logical: If TRUE, reset |
filename |
Character: Path to save plot as PDF. |
pdf.width |
Numeric: Width of PDF in inches. |
pdf.height |
Numeric: Height of PDF in inches. |
Nothing, prints plot
colors <- colorgradient.x(seq(-5, 5))
previewcolor(colors)
Print method for addtree object created using s_AddTree
## S3 method for class 'addtree' print(x, ...)
x |
|
... |
Not used |
E.D. Gennatas
Print method for boost object
## S3 method for class 'boost' print(x, ...)
x |
boost object |
... |
Not used |
E.D. Gennatas
Print method for cartLiteBoostTV object
## S3 method for class 'cartLiteBoostTV' print(x, ...)
x |
|
... |
Additional arguments |
E.D. Gennatas
Print CheckData object
## S3 method for class 'CheckData' print( x, type = c("plaintext", "html"), name = NULL, check_integers = FALSE, css = list(font.family = "Helvetica", color = "#fff", background.color = "#242424"), ... )
x |
|
type |
Character: Output type: "plaintext" or "html". |
name |
Character: Dataset name. |
check_integers |
Logical: If TRUE and there are integer features, prints a message to consider converting to factors. |
css |
List with |
... |
Not used. |
E.D. Gennatas
Print class_error
## S3 method for class 'class_error' print(x, decimal.places = 4, ...)
x |
Object of type class_error |
decimal.places |
Integer: Number of decimal places to print |
... |
Not used |
E.D. Gennatas
Print method for glmLiteBoostTV object
## S3 method for class 'glmLiteBoostTV' print(x, ...)
x |
|
... |
Not used |
E.D. Gennatas
print method for gridSearch object
## S3 method for class 'gridSearch' print(x, ...)
x |
Object of class |
... |
Unused |
E.D. Gennatas
Print method for hytboost object
## S3 method for class 'hytboost' print(x, ...)
x |
|
... |
Not used |
E.D. Gennatas
Print method for boost object
## S3 method for class 'hytboostnow' print(x, ...)
x |
|
... |
Not used |
E.D. Gennatas
Print method for lihad object
## S3 method for class 'lihad' print(x, ...)
x |
|
... |
Not used |
E.D. Gennatas
Print method for linadleaves object
## S3 method for class 'linadleaves' print(x, ...)
x |
|
... |
Not used |
E.D. Gennatas
print massGAM object
## S3 method for class 'massGAM' print(x, ...)
x |
massGAM object |
... |
Not used |
E.D. Gennatas
print massGLM object
## S3 method for class 'massGLM' print(x, ...)
x |
massGLM object |
... |
Not used |
E.D. Gennatas
print regError object
## S3 method for class 'regError' print(x, ...)
x |
|
... |
Not used |
E.D. Gennatas
Print resample information
## S3 method for class 'resample' print(x, ...)
x |
resample object |
... |
Not used |
E.D. Gennatas
Print method for bias_variance
## S3 method for class 'rtBiasVariance' print(x, ...)
x |
Output of bias_variance |
... |
Not used |
E.D. Gennatas
print.rtDecom: print method for rtDecom object
## S3 method for class 'rtDecom' print(x, ...)
x |
|
... |
Not used |
print.rtTLS: print method for rtTLS object
## S3 method for class 'rtTLS' print(x, ...)
x |
|
... |
Not used. |
Print surv_error
## S3 method for class 'surv_error' print(x, decimal.places = 4, ...)
x |
Object of type surv_error |
decimal.places |
Integer: Number of decimal places to print. Default = 4 |
... |
Not used |
E.D. Gennatas
Prune an AddTree tree in Node format using data.tree to remove sister nodes with the same class estimate.
prune.addtree( addtree, prune.empty.leaves = TRUE, remove.bad.parents = TRUE, verbose = TRUE )
addtree |
rtMod trained with s_AddTree |
prune.empty.leaves |
Logical: If TRUE, remove leaves with 0 cases. |
remove.bad.parents |
Logical: If TRUE, remove nodes with no siblings but children and give their children to their parent. |
verbose |
Logical: If TRUE, print messages to console. |
E.D. Gennatas
Estimate the population standard deviation.
psd(x)
x |
Numeric vector |
This will be particularly useful when the machines finally collect data on all humans. Caution is advised, however, as you never know how many may be hiding underground.
Population standard deviation
E.D. Gennatas
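The population standard deviation divides by n rather than n - 1. A minimal sketch of the computation, for illustration (not necessarily the internal implementation):
x <- rnorm(100)
sqrt(mean((x - mean(x))^2)) # population SD
sd(x) * sqrt((length(x) - 1) / length(x)) # equivalent, via the sample SD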
Read data and optionally clean column names, keep unique rows, and convert characters to factors
read(
  filename,
  datadir = NULL,
  make.unique = TRUE,
  character2factor = FALSE,
  clean.colnames = TRUE,
  delim.reader = c("data.table", "vroom", "duckdb", "arrow"),
  xlsx.sheet = 1,
  sep = NULL,
  quote = "\"",
  na.strings = c(""),
  output = c("data.table", "default"),
  attr = NULL,
  value = NULL,
  verbose = TRUE,
  fread_verbose = FALSE,
  timed = verbose,
  ...
)
filename |
Character: filename or full path if |
datadir |
Character: Optional path to directory where |
make.unique |
Logical: If TRUE, keep unique rows only |
character2factor |
Logical: If TRUE, convert character variables to factors |
clean.colnames |
Logical: If TRUE, clean columns names using clean_colnames |
delim.reader |
Character: package to use for reading delimited data |
xlsx.sheet |
Integer or character: Name or number of XLSX sheet to read |
sep |
Single character: field separator. If |
quote |
Single character: quote character |
na.strings |
Character vector: Strings to be interpreted as NA values.
For |
output |
Character: "default" or "data.table", If default, return the delim.reader's default data structure, otherwise convert to data.table |
attr |
Character: Attribute to set (Optional) |
value |
Character: Value to set (if |
verbose |
Logical: If TRUE, print messages to console |
fread_verbose |
Logical: Passed to |
timed |
Logical: If TRUE, time the process and print to console |
... |
Additional parameters to pass to |
read
is a convenience function to read:
Delimited files using data.table::fread(), arrow::read_delim_arrow(), vroom::vroom(), duckdb::duckdb_read_csv()
ARFF files using farff::readARFF()
Parquet files using arrow::read_parquet()
XLSX files using readxl::read_excel()
DTA files from Stata using haven::read_dta()
FASTA files using seqinr::read.fasta()
RDS files using readRDS()
E.D. Gennatas
## Not run:
datadir <- "~/icloud/Data"
dat <- read("iris.csv", datadir)
## End(Not run)
Reads rtemis configuration file.
read_config(config.path, verbose = TRUE)
config.path |
Character: Path to configuration file created by create_config. |
List.
EDG
Recycle values of vector to match length of target
recycle(x, target)
x |
Vector to be recycled |
target |
Object whose length defines target length |
E.D. Gennatas
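A minimal sketch, assuming values of x are repeated as needed to match the length of target:
recycle(c(1, 2, 3), letters[1:7]) # 1 2 3 1 2 3 1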
Calculate error metrics for regression
reg_error( x, y, rho = FALSE, tau = FALSE, pct.red = FALSE, na.rm = FALSE, verbosity = 0 )
x |
Numeric vector: True values |
y |
Numeric vector: Predicted values |
rho |
Logical: If TRUE, calculate Spearman's rho |
tau |
Logical: If TRUE, calculate Kendall's tau |
pct.red |
Logical: If TRUE, calculate percent reduction in error |
na.rm |
Logical: If TRUE, remove NA values before computation |
verbosity |
Integer: If > 0, print messages to console |
Object of class regError
E.D. Gennatas
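A minimal usage sketch:
set.seed(2024)
true <- rnorm(100)
predicted <- true + rnorm(100, sd = 0.3)
reg_error(true, predicted, rho = TRUE) # regError object, including Spearman's rho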
ReLU - Rectified Linear Unit
relu(x)
x |
Numeric: Input |
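ReLU is the elementwise max(0, x); for example:
relu(c(-2, -0.5, 0, 1, 3)) # 0 0 0 1 3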
Create resamples of your data, e.g. for model building or validation. "bootstrap" gives the standard bootstrap, i.e. random sampling with replacement, using bootstrap; "strat.sub" creates stratified subsamples using strat.sub; "strat.boot" uses strat.boot, which runs strat.sub and then randomly duplicates some of the training cases to reach the original length of the input (default) or the length defined by target.length.
resample(
  y,
  n.resamples = 10,
  resampler = c("strat.sub", "strat.boot", "kfold", "bootstrap", "loocv"),
  index = NULL,
  group = NULL,
  stratify.var = y,
  train.p = 0.75,
  strat.n.bins = 4,
  target.length = NROW(y),
  id.strat = NULL,
  rtset = NULL,
  seed = NULL,
  verbosity = TRUE
)
y |
Vector or data.frame: Usually the outcome; |
n.resamples |
Integer: Number of training/testing sets required |
resampler |
Character: Type of resampling to perform: "bootstrap", "kfold", "strat.boot", "strat.sub". |
index |
List where each element is a vector of training set indices. Use this for manual/pre-defined train/test splits |
group |
Integer, vector, length = |
stratify.var |
Numeric vector (optional): Variable used for stratification. |
train.p |
Float (0, 1): Fraction of cases to assign to the training set for
|
strat.n.bins |
Integer: Number of groups to use for stratification for
|
target.length |
Integer: Number of cases for training set for
|
id.strat |
Vector of IDs which may be replicated: resampling should force replicates of each ID to only appear in the training or the testing set. |
rtset |
List: Output of setup.resample (or named list with same structure). NOTE: Overrides all other arguments. Default = NULL |
seed |
Integer: (Optional) Set seed for random number generator, in order to make
output reproducible. See |
verbosity |
Logical: If TRUE, print messages to console |
resample is used by multiple rtemis learners, gridSearchLearn, and train_cv. Note that the "kfold" option, which uses kfold, produces resamples of slightly different lengths when the length of y is small, so avoid operations that rely on equal-length vectors. For example, you cannot place resamples in a data.frame; use a list instead.
E.D. Gennatas
y <- rnorm(200)
# 10-fold (stratified)
res <- resample(y, 10, "kfold")
# 25 stratified subsamples
res <- resample(y, 25, "strat.sub")
# 100 stratified bootstraps
res <- resample(y, 100, "strat.boot")
Reverse the order of a factor's levels
reverseLevels(x)
x |
Factor |
E.D. Gennatas
Reverse factor level order
revfactorlevels(x)
x |
factor |
E.D. Gennatas
Select important variables from a set of features based on RF-estimated variable importance
rfVarSelect(x, y, p = 0.2, print.plot = TRUE, verbose = TRUE)
x |
Predictors |
y |
outcome |
p |
Float (0, 1): Fraction of variables in x to select. |
print.plot |
Logical: If TRUE, print plot of variable importance |
verbose |
Logical: If TRUE, print messages to console. |
Please note that this function is included for academic and exploratory purposes. It may be best to rely on each supervised learning algorithm's own variable selection approach.
E.D. Gennatas
Create a matrix or data frame of defined dimensions, whose columns are random normal vectors
rnormmat( nrow = 10, ncol = 10, mean = 0, sd = 1, return.df = FALSE, seed = NULL )
nrow |
Integer: Number of rows. Default = 10 |
ncol |
Integer: Number of columns. Default = 10 |
mean |
Float: Mean. Default = 0 |
sd |
Float: Standard deviation. Default = 1 |
return.df |
Logical: If TRUE, return data.frame, otherwise matrix. Default = FALSE |
seed |
Integer: Set seed for |
E.D. Gennatas
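A minimal usage sketch:
rnormmat(5, 3, mean = 10, seed = 2020) # 5 x 3 matrix of N(10, 1) values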
Collapse data.frame to vector by getting row max
rowMax(x, na.rm = TRUE)
x |
Input data.frame or matrix |
na.rm |
Logical. If TRUE, missing values are not considered. |
E.D. Gennatas
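A minimal sketch, assuming the maximum of each row is returned:
rowMax(data.frame(a = 1:3, b = c(5, 1, 0))) # 5 2 3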
Calculates the coefficient of variation, also known as relative standard deviation, which is given by sd(x) / mean(x), multiplied by 100 if as.percentage = TRUE.
rsd(x, as.percentage = TRUE, na.rm = TRUE, adjust = FALSE, adjust.lo = 1)
x |
Numeric: Input |
as.percentage |
Logical: If TRUE, multiply by 100 |
na.rm |
Logical: If TRUE, remove missing values before computation |
adjust |
Logical: If TRUE, if |
adjust.lo |
Float: Threshold to be used if |
This is not meaningful if the mean is close to 0. For such cases, set adjust = TRUE, which adjusts x using min(x) and adjust.lo prior to computation.
## Not run:
# RSD of rnorm() without adjustment is all over the place
mplot3_x(sapply(1:100, function(i) rsd(rnorm(100))), 'd',
  xlab = 'rnorm(100) x 100 times')
# RSD after shifting above 1 is what you probably want
mplot3_x(sapply(1:100, function(i) rsd(rnorm(100), adjust = TRUE)), 'd',
  xlab = 'rnorm(100) x 100 times')
## End(Not run)
R-squared
rsq(x, y)
x |
Float, vector: True values |
y |
Float, vector: Estimated values |
E.D. Gennatas
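A minimal usage sketch:
set.seed(2024)
x <- rnorm(100)
y <- x + rnorm(100, sd = 0.5)
rsq(x, y)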
Apply the rtemis RStudio theme, an adaptation of the rscodeio theme (https://github.com/anthonynorth/rscodeio) Recommended to use the Fira Code font with the theme (https://fonts.google.com/specimen/Fira+Code?query=fira+code)
rstudio_theme_rtemis(theme = "dark")
theme |
Character: "dark" or "light" |
E.D. Gennatas
View table using reactable
rt_reactable( x, datatypes = NULL, lightsout = TRUE, bg = "#121212", pagination = TRUE, searchable = TRUE, bordered = TRUE, ... )
x |
data.frame, data.table or similar |
datatypes |
Character vector: Data types of columns in x,
e.g. |
lightsout |
Logical: If TRUE, use dark theme. |
bg |
Background color. |
pagination |
Logical: If TRUE, paginate table. |
searchable |
Logical: If TRUE, add search box. |
bordered |
Logical: If TRUE, add border. |
... |
Additional arguments passed to |
E.D. Gennatas
Write rtemis model to RDS file
rt_save(rtmod, outdir, file.prefix = "s_", verbose = TRUE)
rtmod |
rtemis model |
outdir |
Path to output directory |
file.prefix |
Character: Prefix for filename |
verbose |
Logical: If TRUE, print messages to output |
E.D. Gennatas
S3 methods for rtClust class.
## S3 method for class 'rtClust' print(x, ...)
x |
|
... |
Not used |
Allows you to get n colors of a defined palette, useful for passing to other functions, like ggplot.
rtemis_palette(n, palette = rtPalette)
n |
Integer: Number of colors to output |
palette |
Character: Palette to use. See available options with |
E.D. Gennatas
rtemis_palette(3)
Initializes Directory Structure: "R", "Data", "Results"
rtInitProjectDir(verbose = TRUE)
verbose |
Logical, If TRUE, print messages to console |
E.D. Gennatas
Set layout for drawing multiple plots in the same view (mplot3 family)
rtlayout( nrows = NULL, ncols = NULL, byrow = FALSE, autolabel = FALSE, pdf.width = NULL, pdf.height = NULL, filename = NULL )
nrows |
Integer: N of rows |
ncols |
Integer: N of columns |
byrow |
Logical: If TRUE, add plots by row. Default = FALSE |
autolabel |
Logical: If TRUE, place letter labels on the top left corner of each figure. Default = FALSE |
pdf.width |
Float: Width of PDF to save, if |
pdf.height |
Float: Height of PDF to save, if |
filename |
String, optional: Save multiplot to file. Default = NULL |
E.D. Gennatas
S3 methods for rtMeta class that differ from those of the rtMod superclass
## S3 method for class 'rtMeta' predict(object, newdata, fn = median, ...)
object |
|
newdata |
Testing set features |
fn |
Function to average predictions |
... |
Additional arguments passed to |
rtMod S3 methods
S3 methods for rtMod class. Excludes functions print and plot defined within the rtMod class itself.
Get coefficients or relative variable importance for rtMod object.
## S3 method for class 'rtMod'
print(x, ...)

## S3 method for class 'rtMod'
fitted(object, ...)

## S3 method for class 'rtMod'
predict(
  object,
  newdata,
  classification.output = c("prob", "class"),
  trace = 0,
  verbose = TRUE,
  ...
)

## S3 method for class 'rtMod'
residuals(object, ...)

## S3 method for class 'rtMod'
plot(x, estimate = NULL, theme = rtTheme, filename = NULL, ...)

## S3 method for class 'rtMod'
summary(
  object,
  plots = TRUE,
  cex = 1,
  fit.true.line = "lm",
  resid.fit.line = "gam",
  fit.legend = TRUE,
  se.fit = TRUE,
  single.fig = TRUE,
  summary = TRUE,
  theme = rtTheme,
  title.col = NULL,
  ...
)

## S3 method for class 'rtMod'
coef(object, verbose = TRUE, ...)

## S3 method for class 'rtModLite'
predict(object, newdata, ...)
x |
|
... |
Additional argument passed to |
object |
|
newdata |
Testing set features |
classification.output |
Character: "prob" or "class" for classification models |
trace |
Integer: Set trace level |
verbose |
Logical: If TRUE, output messages to console |
estimate |
Character: "fitted" or "predicted" |
theme |
Character: theme to use. Options: "box", "darkbox", "light", "dark" |
filename |
Character: Path to file to save plot |
plots |
Logical: If TRUE, print plots. Default = TRUE |
cex |
Float: Character expansion factor |
fit.true.line |
rtemis algorithm to use for fitted vs. true line
Options: |
resid.fit.line |
rtemis algorithm to use for residuals vs. fitted line.
Options: |
fit.legend |
Logical: If TRUE, print fit legend. Default = TRUE |
se.fit |
Logical: If TRUE, plot 2 * standard error bands. Default = TRUE |
single.fig |
Logical: If TRUE, draw all plots in a single figure. Default = TRUE |
summary |
Logical: If TRUE, print summary. Default = TRUE |
title.col |
Color for main title |
E.D. Gennatas
S3 methods for rtModBag class that differ from those of the rtMod superclass
## S3 method for class 'rtModBag' predict(object, newdata, aggr.fn = NULL, n.cores = 1, verbose = FALSE, ...)
object |
|
newdata |
Testing set features |
aggr.fn |
Character: Function to aggregate models' prediction. If NULL, defaults to "median" |
n.cores |
Integer: Number of cores to use |
verbose |
Logical: If TRUE, print messages to console. |
... |
Not used |
rtemis Classification Model Class
R6 Class for rtemis Classification Models
rtemis::rtMod -> rtModClass
fitted.prob
Training set probability estimates
predicted.prob
Testing set probability estimates
new()
Initialize rtModClass object
rtModClass$new( mod.name = character(), y.train = numeric(), y.test = numeric(), x.name = character(), y.name = character(), xnames = character(), mod = list(), type = character(), gridsearch = NULL, parameters = list(), fitted = numeric(), fitted.prob = numeric(), se.fit = numeric(), error.train = list(), predicted = NULL, predicted.prob = NULL, se.prediction = NULL, error.test = NULL, varimp = NULL, question = character(), extra = list() )
mod.name
Character: Algorithm name
y.train
Training set output
y.test
Testing set output
x.name
Character: Feature set name
y.name
Character: Output name
xnames
Character vector: Feature names
mod
Trained model
type
Character: Type of model (Regression, Classification, Survival)
gridsearch
Grid search output
parameters
List of training parameters
fitted
Fitted values (training set predictions)
fitted.prob
Training set probability estimates
se.fit
Standard error of the fit
error.train
Training set error
predicted
Predicted values (Testing set predictions)
predicted.prob
Testing set probability estimates
se.prediction
Testing set standard error
error.test
Testing set error
varimp
Variable importance
question
Question the model is trying to answer
extra
List of extra model info
sessionInfo
R session info at time of training
plotROC()
plot ROC. Uses testing set if available, otherwise training
rtModClass$plotROC(theme = rtTheme, filename = NULL, ...)
theme
Theme to pass to plotting function
filename
Character: Path to file to save plot
...
Extra arguments to pass to plotting function
plotROCfitted()
Plot training set ROC
rtModClass$plotROCfitted( main = "ROC Training", theme = rtTheme, filename = NULL, ... )
main
Character: Main title
theme
Theme to pass to plotting function
filename
Character: Path to file to save plot
...
Extra arguments to pass to plotting function
plotROCpredicted()
plot testing set ROC
rtModClass$plotROCpredicted( main = "ROC Testing", theme = rtTheme, filename = NULL, ... )
main
Character: Main title
theme
Theme to pass to plotting function
filename
Character: Path to file to save plot
...
Extra arguments to pass to plotting function
plotPR()
plot Precision-Recall curve. Uses testing set if available, otherwise training
rtModClass$plotPR(theme = rtTheme, filename = NULL, ...)
theme
Theme to pass to plotting function
filename
Character: Path to file to save plot
...
Extra arguments to pass to plotting function
plotPRfitted()
Plot training set Precision-Recall curve.
rtModClass$plotPRfitted( main = "P-R Training", theme = rtTheme, filename = NULL, ... )
main
Character: Main title
theme
Theme to pass to plotting function
filename
Character: Path to file to save plot
...
Extra arguments to pass to plotting function
plotPRpredicted()
plot testing set Precision-Recall curve.
rtModClass$plotPRpredicted( main = "P-R Testing", theme = rtTheme, filename = NULL, ... )
main
Character: Main title
theme
Theme to pass to plotting function
filename
Character: Path to file to save plot
...
Extra arguments to pass to plotting function
clone()
The objects of this class are cloneable with this method.
rtModClass$clone(deep = FALSE)
deep
Whether to make a deep clone.
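A hypothetical sketch, assuming mod is a trained binary classification model of class rtModClass:
## Not run:
mod$plotROC() # ROC on the testing set if available, otherwise training
mod$plotPRfitted() # Precision-Recall curve on the training set
## End(Not run)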
S3 methods for rtModCV class that differ from those of the rtMod superclass
plot.rtModCV: plot method for rtModCV object
summary.rtModCV: summary method for rtModCV object
predict.rtModCV: predict method for rtModCV object
describe method for rtModCV object
## S3 method for class 'rtModCV'
plot(x, ...)

## S3 method for class 'rtModCV'
summary(object, ...)

## S3 method for class 'rtModCV'
predict(
  object,
  newdata,
  which.repeat = 1,
  classification.output = c("prob", "class"),
  output = c("array", "avg"),
  ...
)

## S3 method for class 'rtModCV'
describe(object, ...)
x |
|
... |
Not used |
object |
|
newdata |
Set of predictors to use |
which.repeat |
Integer: Which repeat to use for prediction |
classification.output |
Character: "prob" or "class" for classification models
If "class" and |
output |
Character: "matrix" or "avg". Produce either a matrix with predictions of each model in different columns, or the mean/mode of the predictions across models |
n.cores |
Integer: Number of cores to use |
S3 methods for rtModLite class.
## S3 method for class 'rtModLite' print(x, ...)
x |
|
... |
Not used |
rtemis Supervised Model Log Class
mod.name
Learner algorithm name
parameters
List of hyperparameters used when building model
error.train
Training error
error.test
Testing error
sessionInfo
The output of sessionInfo()
at the time the model was trained
new()
Initialize rtModLog object
rtModLog$new( mod.name = character(), parameters = list(), error.train = list(), error.test = NULL )
mod.name
Learner algorithm name
parameters
List of hyperparameters used when building model
error.train
Training error
error.test
Testing error
print()
Print method for rtModLog object
rtModLog$print()
clone()
The objects of this class are cloneable with this method.
rtModLog$clone(deep = FALSE)
deep
Whether to make a deep clone.
E.D. Gennatas
rtemis model logger
R6 class to save trained models' parameters and performance. Keep your experiment results tidy in one place, with an option to write out to a multi-sheet Excel file.
mods
List of trained models
new()
Initialize rtModLogger object
rtModLogger$new(mods = list())
mods
List of trained models
print()
Print method for rtModLogger object
rtModLogger$print()
add()
Add model to logger
rtModLogger$add(mod, verbose = TRUE)
mod
Model to add
verbose
Logical: If TRUE, print messages to console
summarize()
Summary method for rtModLogger
rtModLogger$summarize( class.metric = "Balanced Accuracy", reg.metric = "Rsq", surv.metric = "Coherence", decimal.places = 3, print.metric = FALSE )
class.metric
Character: Metric to use for Classification models
reg.metric
Character: Metric to use for Regression models
surv.metric
Character: Metric to use for Survival models
decimal.places
Integer: Number of decimal places to display
print.metric
Logical: If TRUE, print metric name
summary()
Summary method for rtModLogger
rtModLogger$summary( class.metric = "Balanced Accuracy", reg.metric = "Rsq", surv.metric = "Coherence" )
class.metric
Character: Metric to use for Classification models
reg.metric
Character: Metric to use for Regression models
surv.metric
Character: Metric to use for Survival models
tabulate()
Tabulate models' parameters and performance
rtModLogger$tabulate(filename = NULL)
filename
Character: Path to file to save parameters and performance - will be saved as .xlsx file with multiple sheets
plot()
Plot method for rtModLogger
rtModLogger$plot( names = NULL, col = unlist(rtpalette(rtPalette)), mar = NULL, ... )
names
Character: Model names
col
Colors to use
mar
Float, vector: plot margins
...
Additional arguments to pass to plotting function
clone()
The objects of this class are cloneable with this method.
rtModLogger$clone(deep = FALSE)
deep
Whether to make a deep clone.
E.D. Gennatas
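A minimal usage sketch, assuming mod1 and mod2 are trained rtemis models (the output path is hypothetical):
## Not run:
logger <- rtModLogger$new()
logger$add(mod1)
logger$add(mod2)
logger$summary()
logger$tabulate("results.xlsx") # write parameters and performance to a multi-sheet Excel file
## End(Not run)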
rtpalette() prints the names of available color palettes. Each palette is a named list of hexadecimal color definitions which can be used with any graphics function. rtpalette(palette_name) returns a list of colors for a given palette.
rtpalette(palette = NULL, verbose = TRUE)
palette |
Character: Name of palette to return. Default = NULL: available palette names are printed and no palette is returned |
verbose |
Logical: If |
A list of available palettes, invisibly
rtpalette("imperial")
rtpalette("imperial")
ucsfCol
: UCSF color palette (https://identity.ucsf.edu/brand-guide/color)
ucsfPalette
: Subset of ucsfCol
ucdCol
: UC Davis color palette
(https://marketingtoolbox.ucdavis.edu/visual-identity/color.html)
berkeleyCol
: Berkeley color palette
(https://brand.berkeley.edu/colors/)
ucscCol
: UC Santa Cruz color palette
(https://communications.ucsc.edu/visual-design/color/)
ucmercedCol
: UC Merced color palette
(https://publicrelations.ucmerced.edu/color-guidelines)
ucsbCol
: UC Santa Barbara color palette
(https://www.ucsb.edu/visual-identity/color)
uclaCol
: UCLA color palette (http://brand.ucla.edu/identity/colors)
ucrCol
: UC Riverside color palette (https://brand.ucr.edu/ucr-colors)
uciCol
: UCI color palette (https://communications.uci.edu/campus-resources/graphic-standards/colors.php)
ucsdCol
: UC San Diego color palette
(https://ucpa.ucsd.edu/brand/elements/color-palette/)
californiaCol
: University of California color palette
(http://brand.universityofcalifornia.edu/guidelines/color.html#!primary-colors)
stanfordCol
: Stanford color palette
(https://identity.stanford.edu/color.html#digital-color)
csuCol
: California State University color palette
(https://www2.calstate.edu/csu-system/csu-branding-standards/Documents/CSU-Brand-Guidelines-8-2018.pdf)
calpolyCol
: Cal Poly color palette
(https://universitymarketing.calpoly.edu/brand-guidelines/colors/)
caltechCol
: Caltech color palette (http://identity.caltech.edu/colors)
scrippsCol
: Scripps Research color palette
pennCol
: Penn color palette
(http://www.upenn.edu/about/styleguide-color-type)
cmuCol
: Carnegie Mellon color palette
(https://www.cmu.edu/marcom/brand-standards/web-standards.html#colors)
mitCol
: MIT color palette
(http://web.mit.edu/graphicidentity/colors.html)
princetonCol
: Princeton color palette
(https://communications.princeton.edu/guides-tools/logo-graphic-identity)
columbiaCol
: Columbia color palette
(https://visualidentity.columbia.edu/content/web-0)
brownCol
: Brown color palette
(https://www.brown.edu/university-identity/sites/university-identity/files/Brown_Visual_Identity_Policy_2016-07-22.pdf)
yaleCol
: Yale color palette (https://yaleidentity.yale.edu/web)
cornellCol
: Cornell color palette
(https://brand.cornell.edu/design-center/colors/)
hmsCol
: Harvard Medical School color palette
(https://identityguide.hms.harvard.edu/color)
dartmouthCol
: Dartmouth color palette
(https://communications.dartmouth.edu/visual-identity/design-elements/color-palette#web%20palette)
usfCol
: USF color palette
(https://myusf.usfca.edu/marketing-communications/resources/graphics-resources/brand-standards/color-palette)
Color conversions performed using https://www.pantone.com/color-finder/
uwCol
: University of Washington color palette
(http://www.washington.edu/brand/graphic-elements/primary-color-palette/)
jhuCol
: Johns Hopkins University color palette
(https://brand.jhu.edu/color/)
nyuCol
: NYU color palette
(https://www.nyu.edu/employees/resources-and-services/media-and-communications/styleguide/website/graphic-visual-design.html)
washuCol
: WashU color palette
(https://marcomm.wustl.edu/resources/branding-logo-toolkit/color-palettes/)
chicagoCol
: University of Chicago color palette
(https://news.uchicago.edu/sites/default/files/attachments/_uchicago.identity.guidelines.pdf)
pennstateCol
: Penn State color palette
(https://brand.psu.edu/design-essentials.html#color)
sfsuCol
: SF State color palette
(https://logo.sfsu.edu/color-system)
illinoisCol
: University of Illinois color palette
(https://www.uillinois.edu/OUR/brand/color_palettes)
umdCol
: University of Maryland color palette
(https://osc.umd.edu/licensing-trademarks/brand-standards/logos/#color)
msuCol
: MSU color palette
(https://brand.msu.edu/visual/color-palette)
michiganCol
: Michigan color palette
(https://brand.umich.edu/design-resources/colors/)
iowaCol
: University of Iowa color palette
(https://brand.uiowa.edu/color)
texasCol
: University of Texas color palette
(https://brand.utexas.edu/identity/color/)
emoryCol
: Emory color palette
(https://brand.emory.edu/color.html)
techCol
: Georgia Tech color palette
(http://www.licensing.gatech.edu/super-block/239)
vanderbiltCol
: Vanderbilt color palette
(https://www.vanderbilt.edu/communications/brand/color.php)
jeffersonCol
: Jefferson color palette (http://creative.jefferson.edu/downloads/Jefferson-Brand-Guidelines.pdf)
hawaiiCol
: University of Hawaii color palette (https://www.hawaii.edu/offices/eaur/graphicsstandards.pdf)
nihCol
: NIH color palette (https://www.nlm.nih.gov/about/nlm_logo_guidelines_030414_508.pdf)
imperialCol
: Imperial College London colour palette
(https://www.imperial.ac.uk/brand-style-guide/visual-identity/brand-colours/)
uclCol
: UCL colour palette (https://www.ucl.ac.uk/cam/brand/guidelines/colour)
oxfordCol
: Oxford University colour palette (https://www.ox.ac.uk/sites/files/oxford/media_wysiwyg/Oxford%20Blue%20LR.pdf)
nhsCol
: NHS colour palette (https://www.england.nhs.uk/nhsidentity/identity-guidelines/colours/)
ubcCol
: UBC color palette (http://assets.brand.ubc.ca/downloads/ubc_colour_guide.pdf)
torontoCol
: U Toronto color palette (https://trademarks.utoronto.ca/colors-fonts/)
mcgillCol
: McGill color palette (https://www.mcgill.ca/visual-identity/visual-identity-guide)
ethCol
: ETH color palette (https://ethz.ch/services/en/service/communication/corporate-design/colour.html)
rwthCol
: RWTH Aachen color palette (http://www9.rwth-aachen.de/global/show_document.asp?id=aaaaaaaaaadpbhq)
mozillaCol
: Mozilla design colors
(https://mozilla.design/mozilla/color/)
firefoxCol
: Firefox design colors
(https://mozilla.design/firefox/color/)
appleCol
: Apple Human Interface Guidelines color palette
(https://developer.apple.com/design/human-interface-guidelines/ios/visual-design/color/)
googleCol
: Google brand palette (https://brandpalettes.com/google-colors/)
amazonCol
: Amazon brand palette
(https://images-na.ssl-images-amazon.com/images/G/01/AdvertisingSite/pdfs/AmazonBrandUsageGuidelines.pdf)
microsoftCol
: Microsoft brand palette
(https://brandcolors.net/b/microsoft)
ucsfLegacyCol ucsfPalette ucdCol berkeleyCol ucscCol ucmercedCol ucsbCol uclaCol ucrColor uciCol ucsdCol californiaCol stanfordCol csuCol calpolyCol caltechCol scrippsCol pennCol pennPalette pennLightPalette cmuCol mitCol princetonCol columbiaCol brownCol yaleCol cornellCol hmsCol dartmouthCol usfCol uwCol jhuCol nyuCol washuCol chicagoCol pennstateCol sfsuCol illinoisCol umdCol msuCol michiganCol iowaCol texasCol emoryCol techCol vanderbiltCol jeffersonCol hawaiiCol nihCol imperialCol uclCol oxfordCol nhsCol ubcCol torontoCol mcgillCol ethCol rwthCol mozillaCol firefoxCol appleCol googleCol amazonCol microsoftCol
Each of the above palettes is an object of class list; lengths range from 2 to 60 colors.
Calculate the points of an ROC curve and the AUC
rtROC( true.labels, predicted.probabilities, thresholds = NULL, plot = FALSE, theme = rtTheme, verbose = TRUE )
true.labels |
Factor with true labels |
predicted.probabilities |
Numeric vector of predicted probabilities / estimated score |
thresholds |
Numeric vector of thresholds to consider |
plot |
Logical: If TRUE, plot the ROC curve |
theme |
rtemis theme to use |
verbose |
Logical: If TRUE, print messages to console |
true.labels should be a factor (will be coerced to one) where the first level is the "positive" case. predicted.probabilities should be a vector of floats in [0, 1], where [0, .5) corresponds to the first level and [.5, 1] corresponds to the second level.
E.D. Gennatas
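For example, a minimal usage sketch (illustrative only; the synthetic labels and scores below are hypothetical):

set.seed(2024)
true <- factor(sample(c("pos", "neg"), 100, replace = TRUE),
               levels = c("pos", "neg"))  # first level = 'positive' class
# Per the convention above, scores in [0, .5) map to the first level
prob <- ifelse(true == "pos", rbeta(100, 2, 5), rbeta(100, 5, 2))
roc <- rtROC(true, prob, plot = FALSE)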
These functions output lists of default settings for different rtemis functions. This removes the need to pass named lists of arguments and provides autocompletion, making it easier to set up functions without having to refer to the manual.
E.D. Gennatas
Get rtemis and OS version info
rtversion()
R6 class for rtemis cross-decompositions
rtemis cross-decomposition R6 object
xdecom.name
Character: Name of cross-decomposition algorithm
k
Integer: Number of projections
xnames
Character vector: Column names of x
znames
Character vector: Column names of z
xdecom
Cross-decomposition model output
xprojections.train
x data training set projections
xprojections.test
x data test set projections
zprojections.train
z data training set projections
zprojections.test
z data test set projections
parameters
Cross-decomposition parameters
extra
List: Algorithm-specific output
new()
rtXDecom$new( xdecom.name = character(), k = integer(), xnames = character(), znames = character(), xdecom = list(), xprojections.train = numeric(), xprojections.test = numeric(), zprojections.train = numeric(), zprojections.test = numeric(), parameters = list(), extra = list() )
xdecom.name
Character: Name of cross-decomposition algorithm
k
Integer: Number of projections
xnames
Character vector: Column names of x
znames
Character vector: Column names of z
xdecom
Cross-decomposition model output
xprojections.train
x data training set projections
xprojections.test
x data test set projections
zprojections.train
z data training set projections
zprojections.test
z data test set projections
parameters
Cross-decomposition parameters
extra
List: Algorithm-specific output
print()
Print method for rtXDecom
objects
rtXDecom$print()
clone()
The objects of this class are cloneable with this method.
rtXDecom$clone(deep = FALSE)
deep
Whether to make a deep clone.
E.D. Gennatas
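A minimal sketch of populating the container manually, assuming the class is exported; all field values below are placeholders:

xd <- rtXDecom$new(
  xdecom.name = "CCA",        # hypothetical algorithm name
  k = 2L,
  xnames = paste0("x", 1:5),
  znames = paste0("z", 1:4),
  parameters = list(k = 2L)
)
xd$print()

In practice, objects of this class are created by the x_* cross-decomposition functions rather than by hand.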
Calculate pairwise distance among a set of rules or between two sets of rules, where each rule defines a subpopulation
ruleDist( x, rules1, rules2 = NULL, print.plot = TRUE, plot.type = c("static", "interactive"), heat.lo = "black", heat.mid = NA, heat.hi = "#F48024", verbose = TRUE )
x |
Data frame / matrix: Input features (cases by features) |
rules1 |
Character, vector: Rules as combination of conditions on the features of |
rules2 |
Character vector, optional: Rules as combination of conditions on the features of |
print.plot |
Logical: If TRUE, plot heatmap for calculated distance |
plot.type |
Character: "static", "interactive": type of graphics to use, base or plotly, respectively. Default = "static" |
heat.lo |
Color: Heatmap low color. Default = "black" |
heat.mid |
Color: Heatmap mid color. Default = NA (i.e. create gradient from |
heat.hi |
Color: Heatmap high color. Default = "#F48024" (orange) |
verbose |
Logical: If TRUE, print console messages. Default = TRUE |
If only rules1 is provided, computes pairwise distance among rules1, otherwise computes pairwise distance between rules1 and rules2
E.D. Gennatas
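A toy sketch, assuming rules are written as R logical expressions on the columns of x (the data and rules below are made up):

set.seed(2024)
x <- data.frame(age = rnorm(200, 50, 10), bmi = rnorm(200, 27, 4))
rules <- c("age > 50", "age > 50 & bmi < 25", "bmi >= 30")
rd <- ruleDist(x, rules1 = rules, print.plot = FALSE)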
Convert rules from cutoffs to median (range)
and mode (range)
format
rules2medmod(rules, x, .ddSci = TRUE, verbose = TRUE, trace = 0)
rules |
Character, vector: Input rules |
x |
Data frame: Data to evaluate rules |
.ddSci |
Logical: If TRUE, format all continuous variables using ddSci, which will give either 2 decimal places, or scientific notation if two decimal places result in 0.00 |
verbose |
Logical: If TRUE, print messages to console. |
trace |
Integer: If greater than zero, print progress |
E.D. Gennatas
Create a matrix or data frame of defined dimensions, whose columns are random uniform vectors
runifmat( nrow = 10, ncol = 10, min = 0, max = 1, return.df = FALSE, seed = NULL )
nrow |
Integer: Number of rows. |
ncol |
Integer: Number of columns. |
min |
Float: Minimum value. |
max |
Float: Maximum value. |
return.df |
Logical: If TRUE, return data.frame, otherwise matrix. |
seed |
Integer: If provided, seed for the random number generator, for reproducibility |
E.D. Gennatas
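For example:

m <- runifmat(nrow = 5, ncol = 3, min = 0, max = 10, seed = 42)  # matrix
d <- runifmat(nrow = 5, ncol = 3, return.df = TRUE, seed = 42)   # data.frame with the same values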
Train an Adaboost Classifier using ada::ada
s_AdaBoost( x, y = NULL, x.test = NULL, y.test = NULL, loss = "exponential", type = "discrete", iter = 50, nu = 0.1, bag.frac = 0.5, upsample = FALSE, downsample = FALSE, resample.seed = NULL, x.name = NULL, y.name = NULL, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, trace = 0, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
loss |
Character: "exponential" (Default), "logistic" |
type |
Character: "discrete", "real", "gentle" |
iter |
Integer: Number of boosting iterations to perform. Default = 50 |
nu |
Float: Shrinkage parameter for boosting. Default = .1 |
bag.frac |
Float (0, 1]: Sampling fraction for out-of-bag samples |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only) Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If higher than 0, will print more information to the console. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments |
ada::ada
does not support case weights
rtMod
object
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Tree-based methods: s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()
Other Ensembles: s_GBM(), s_RF(), s_Ranger()
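A minimal sketch on synthetic data (requires the ada package; data and class labels are illustrative):

set.seed(2024)
x <- data.frame(matrix(rnorm(300 * 4), 300, 4))
y <- factor(ifelse(x[, 1] + x[, 2] + rnorm(300) > 0, "pos", "neg"),
            levels = c("pos", "neg"))  # first level = 'positive' class
idx <- sample(300, 200)
mod <- s_AdaBoost(x[idx, ], y[idx],
                  x.test = x[-idx, ], y.test = y[-idx],
                  iter = 50, nu = 0.1)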
Train an Additive Tree model
s_AddTree( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, weights = NULL, update = c("exponential", "polynomial"), min.update = ifelse(update == "polynomial", 0.035, 1000), min.hessian = 0.001, min.membership = 1, steps.past.min.membership = 0, gamma = 0.8, max.depth = 30, learning.rate = 0.1, ifw = TRUE, ifw.type = 2, upsample = FALSE, downsample = FALSE, resample.seed = NULL, imetrics = TRUE, grid.resample.params = setup.resample("kfold", 5), metric = "Balanced Accuracy", maximize = TRUE, rpart.params = NULL, match.rules = TRUE, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, prune.verbose = FALSE, trace = 1, grid.verbose = verbose, outdir = NULL, save.rpart = FALSE, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), n.cores = rtCores )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
weights |
Numeric vector: Weights for cases. For classification, |
update |
Character: "exponential" or "polynomial". Type of weight update. Default = "exponential" |
min.update |
Float: Minimum update for gradient step |
min.hessian |
[gS] Float: Minimum second derivative to continue splitting. Default = .001 |
min.membership |
Integer: Minimum number of cases in a node. Default = 1 |
steps.past.min.membership |
Integer: N steps to make past |
gamma |
[gS] Float: acceleration factor = lambda/(1 + lambda). Default = .8 |
max.depth |
[gS] Integer: maximum depth of the tree. Default = 30 |
learning.rate |
[gS] Float: Learning rate for the Newton-Raphson step that updates the function values of the node |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, 2. 1: class.weights as in 0, divided by min(class.weights); 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only) Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
imetrics |
Logical: If TRUE, save interpretability metrics, i.e. N total nodes in tree and depth, in output. Default = TRUE |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
rpart.params |
List: |
match.rules |
Logical: If TRUE, match cases to rules to get statistics per node, i.e. what percent of cases match each rule. If available, these are used by dplot3_addtree when plotting. Default = TRUE |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
prune.verbose |
Logical: If TRUE, print messages to console during pruning. |
trace |
Integer: 0, 1, 2. The higher the number, the more verbose the output. |
grid.verbose |
Logical: Passed to |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.rpart |
Logical: passed to |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
n.cores |
Integer: Number of cores to use. |
This function is for binary classification. The outcome must be a factor with two levels; the first level is the 'positive' class. Ensure there are no missing values in the data and that variables are either numeric (including integers) or factors. Use preprocess as needed to impute and convert characters to factors.
Factor levels should not contain the "/" character (it is used to separate conditions in the addtree object).
[gS] indicates that more than one value can be supplied, which will result in grid search using internal resampling. Note: lambda = gamma/(1 - gamma)
Object of class rtMod
E.D. Gennatas
Jose Marcio Luna, Efstathios D Gennatas, Lyle H Ungar, Eric Eaton, Eric S Diffenderfer, Shane T Jensen, Charles B Simone, Jerome H Friedman, Timothy D Solberg, Gilmer Valdes. Building more accurate decision trees with the additive tree. Proc Natl Acad Sci U S A. 2019 Oct 1;116(40):19887-19893. doi: 10.1073/pnas.1816748116
Other Supervised Learning: s_AdaBoost(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Tree-based methods: s_AdaBoost(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()
Other Interpretable models: s_C50(), s_CART(), s_GLM(), s_GLMNET(), s_GLMTree(), s_LMTree()
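A minimal binary-classification sketch on synthetic data (illustrative; note the positive class is the first factor level, per the details above):

set.seed(2024)
x <- data.frame(matrix(rnorm(400 * 5), 400, 5))
y <- factor(ifelse(x[, 1] - x[, 3] + rnorm(400) > 0, "case", "control"),
            levels = c("case", "control"))
mod <- s_AddTree(x, y, gamma = 0.8, max.depth = 10)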
Trains a Bayesian Additive Regression Tree (BART) model using package bartMachine
s_BART( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, n.trees = c(100, 200), k_cvs = c(2, 3), nu_q_cvs = list(c(3, 0.9), c(10, 0.75)), k_folds = 5, n.burnin = 250, n.iter = 1000, n.cores = rtCores, upsample = FALSE, downsample = FALSE, resample.seed = NULL, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, trace = 0, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), java.mem.size = 12, ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only) Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If higher than 0, will print more information to the console. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: if TRUE, sets |
... |
Additional arguments to be passed to |
Be warned: this can take a very long time to train.
If you are having trouble with rJava in Rstudio on macOS, see:
https://support.rstudio.com/hc/en-us/community/posts/203663956/comments/249073727
bartMachine
does not support case weights
Object of class rtemis
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Tree-based methods: s_AdaBoost(), s_AddTree(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()
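A minimal regression sketch (requires bartMachine and a working rJava setup; data are synthetic and the reduced java.mem.size is illustrative):

set.seed(2024)
x <- data.frame(matrix(rnorm(200 * 3), 200, 3))
y <- x[, 1]^2 + x[, 2] + rnorm(200)
mod <- s_BART(x, y, n.trees = 100, java.mem.size = 4)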
Train a Bayesian GLM using arm::bayesglm
s_BayesGLM( x, y = NULL, x.test = NULL, y.test = NULL, family = NULL, prior.mean = 0, prior.scale = NULL, prior.df = 1, prior.mean.for.intercept = 0, prior.scale.for.intercept = NULL, prior.df.for.intercept = 1, min.prior.scale = 1e-12, scaled = TRUE, keep.order = TRUE, drop.baseline = TRUE, maxit = 100, x.name = NULL, y.name = NULL, weights = NULL, ifw = TRUE, ifw.type = 2, upsample = FALSE, downsample = FALSE, resample.seed = NULL, metric = NULL, maximize = NULL, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, grid.verbose = verbose, verbose = TRUE, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
family |
Character or function for the error distribution and link function to
be used. See |
prior.mean |
Numeric, vector: Prior mean for the coefficients. If scalar, it will be replicated to length N features. |
prior.scale |
Numeric, vector: Prior scale for the coefficients. Default = NULL, which results in 2.5 for logit, 2.5*1.6 for probit. If scalar, it will be replicated to length N features. |
prior.df |
Numeric: Prior degrees of freedom for the coefficients. Set to 1 for t distribution; set to Inf for normal prior distribution. If scalar, it will be replicated to length N features. |
prior.mean.for.intercept |
Numeric: Prior mean for the intercept. |
prior.scale.for.intercept |
Numeric: Default = NULL, which results in 10 for a logit model, and 10*1.6 for probit model. |
prior.df.for.intercept |
Numeric: Prior df for the intercept. |
min.prior.scale |
Numeric: Minimum prior scale for the coefficients. |
scaled |
Logical: If TRUE, the scale for the prior distributions are:
For feature with single value, use |
keep.order |
Logical: If TRUE, the feature positions are maintained, otherwise they are reordered: main effects, interactions, second-order, third-order, etc. |
drop.baseline |
Logical: If TRUE, drop the base level of factor features. |
maxit |
Integer: Maximum number of iterations |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, 2. 1: class.weights as in 0, divided by min(class.weights); 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only) Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
grid.verbose |
Logical: Passed to |
verbose |
Logical: If TRUE, print summary to screen. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional parameters to pass to |
E.D. Gennatas
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Trains a BRUTO model and validates it
s_BRUTO( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, grid.resample.params = setup.grid.resample(), weights = NULL, weights.col = NULL, dfmax = 6, cost = 2, maxit.select = 20, maxit.backfit = 20, thresh = 1e-04, start.linear = TRUE, n.cores = rtCores, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, trace = 0, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
weights |
Numeric vector: Weights for cases. For classification, |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If higher than 0, will print more information to the console. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments to be passed to |
Object of class rtemis
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Train a C5.0 decision tree using C50::C5.0
s_C50( x, y = NULL, x.test = NULL, y.test = NULL, trials = 10, rules = FALSE, weights = NULL, ifw = TRUE, ifw.type = 2, upsample = FALSE, downsample = FALSE, resample.seed = NULL, control = C50::C5.0Control(), costs = NULL, x.name = NULL, y.name = NULL, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, trace = 0, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
trials |
Integer [1, 100]: Number of boosting iterations |
rules |
Logical: If |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, 2. 1: class.weights as in 0, divided by min(class.weights); 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only) Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
control |
List: output of |
costs |
Matrix: Cost matrix. See |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If higher than 0, will print more information to the console. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments |
rtMod
object
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Tree-based methods: s_AdaBoost(), s_AddTree(), s_BART(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()
Other Interpretable models: s_AddTree(), s_CART(), s_GLM(), s_GLMNET(), s_GLMTree(), s_LMTree()
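A minimal sketch on synthetic data (requires the C50 package; rules = TRUE decomposes the boosted trees into rule sets):

set.seed(2024)
x <- data.frame(matrix(rnorm(300 * 4), 300, 4))
y <- factor(ifelse(x[, 2] + rnorm(300) > 0, "yes", "no"),
            levels = c("yes", "no"))
mod <- s_C50(x, y, trials = 10, rules = TRUE)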
Train a CART for regression or classification using rpart
s_CART( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, weights = NULL, ifw = TRUE, ifw.type = 2, upsample = FALSE, downsample = FALSE, resample.seed = NULL, method = "auto", parms = NULL, minsplit = 2, minbucket = round(minsplit/3), cp = 0.01, maxdepth = 20, maxcompete = 0, maxsurrogate = 0, usesurrogate = 2, surrogatestyle = 0, xval = 0, cost = NULL, model = TRUE, prune.cp = NULL, use.prune.rpart.rt = TRUE, return.unpruned = FALSE, grid.resample.params = setup.resample("kfold", 5), gridsearch.type = c("exhaustive", "randomized"), gridsearch.randomized.p = 0.1, save.gridrun = FALSE, metric = NULL, maximize = NULL, n.cores = rtCores, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, grid.verbose = verbose, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE) )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, 2. 1: class.weights as in 0, divided by min(class.weights); 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only) Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
method |
Character: "auto", "anova", "poisson", "class" or "exp". |
parms |
List of additional parameters for the splitting function.
See |
minsplit |
[gS] Integer: Minimum number of cases that must belong in a node before considering a split. |
minbucket |
[gS] Integer: Minimum number of cases allowed in a child node. |
cp |
[gS] Float: Complexity threshold for allowing a split. |
maxdepth |
[gS] Integer: Maximum depth of tree. |
maxcompete |
Integer: The number of competitor splits saved in the output |
maxsurrogate |
Integer: The number of surrogate splits retained in the
output (See |
usesurrogate |
See |
surrogatestyle |
See |
xval |
Integer: Number of cross-validations |
cost |
Vector, Float (> 0): One for each variable in the model.
See |
model |
Logical: If TRUE, keep a copy of the model. |
prune.cp |
[gS] Numeric: Complexity for cost-complexity pruning after tree is built |
use.prune.rpart.rt |
(Testing only, do not change) |
return.unpruned |
Logical: If TRUE and |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
gridsearch.type |
Character: Type of grid search to perform: "exhaustive" or "randomized". |
gridsearch.randomized.p |
Float (0, 1): If
|
save.gridrun |
Logical: If TRUE, save grid search models. |
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
n.cores |
Integer: Number of cores to use. |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
grid.verbose |
Logical: Passed to |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
[gS] indicates grid search will be performed automatically if more than one value is passed
Object of class rtMod
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Tree-based methods: s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()
Other Interpretable models: s_AddTree(), s_C50(), s_GLM(), s_GLMNET(), s_GLMTree(), s_LMTree()
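A minimal regression sketch showing automatic grid search: passing more than one value to a [gS] argument triggers tuning via the internal resampling defined by grid.resample.params (data are synthetic):

set.seed(2024)
x <- data.frame(matrix(rnorm(300 * 4), 300, 4))
y <- x[, 1] + 0.5 * x[, 2]^2 + rnorm(300)
mod <- s_CART(x, y,
              maxdepth = c(2, 4, 6),  # [gS] multiple values => grid search
              cp = c(0.01, 0.001))    # [gS]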
Train a conditional inference tree using partykit::ctree
s_CTree( x, y = NULL, x.test = NULL, y.test = NULL, weights = NULL, control = partykit::ctree_control(), ifw = TRUE, ifw.type = 2, upsample = FALSE, downsample = FALSE, resample.seed = NULL, x.name = NULL, y.name = NULL, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
weights |
Numeric vector: Weights for cases. For classification, |
control |
List of parameters for the CTree algorithms. Set using
|
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, 2. 1: class.weights as in 0, divided by min(class.weights); 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only) Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments |
rtMod
object
E.D. Gennatas
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Tree-based methods: s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()
Train an EVTree for regression or classification using evtree
s_EVTree( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, weights = NULL, ifw = TRUE, ifw.type = 2, upsample = FALSE, downsample = FALSE, resample.seed = NULL, control = evtree::evtree.control(), na.action = na.exclude, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, 2. 1: class.weights as in 0, divided by min(class.weights); 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only) Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
control |
Passed to |
na.action |
How to handle missing values. See |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments to be passed to |
Object of class rtMod
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Tree-based methods: s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()
Trains a GAM using mgcv::gam
and validates it.
Input will be used to create a formula of the form: y ~ s(x_1, k = k) + s(x_2, k = k) + ... + s(x_n, k = k)
s_GAM( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, k = 6, family = NULL, weights = NULL, ifw = TRUE, ifw.type = 2, upsample = FALSE, downsample = FALSE, resample.seed = NULL, method = "REML", select = FALSE, removeMissingLevels = TRUE, spline.index = NULL, verbose = TRUE, trace = 0, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, na.action = na.exclude, question = NULL, n.cores = rtCores, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
k |
Integer: Number of basis functions for each smoothing spline |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, 2. 1: class.weights as in 0, divided by min(class.weights); 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only) Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
method |
Character: "auto", "anova", "poisson", "class" or "exp". |
select |
Logical: Passed to |
verbose |
Logical: If TRUE, print summary to screen. |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
n.cores |
Integer: Number of cores to use. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments to be passed to |
rtMod
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
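A minimal regression sketch on synthetic nonlinear data (requires mgcv; k = 6 bases per smooth term, as in the default):

set.seed(2024)
x <- data.frame(x1 = runif(300, -2, 2), x2 = runif(300, -2, 2))
y <- sin(2 * x$x1) + x$x2^2 + rnorm(300, sd = 0.3)
mod <- s_GAM(x, y, k = 6)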
Train a GBM model using gbm::gbm.fit
s_GBM( x, y = NULL, x.test = NULL, y.test = NULL, weights = NULL, ifw = TRUE, ifw.type = 2, upsample = FALSE, downsample = FALSE, resample.seed = NULL, distribution = NULL, interaction.depth = 2, shrinkage = 0.01, bag.fraction = 0.9, n.minobsinnode = 5, n.trees = 2000, max.trees = 5000, force.n.trees = NULL, gbm.select.smooth = FALSE, n.new.trees = 500, min.trees = 50, failsafe.trees = 500, imetrics = FALSE, .gs = FALSE, grid.resample.params = setup.resample("kfold", 5), gridsearch.type = "exhaustive", metric = NULL, maximize = NULL, plot.tune.error = FALSE, n.cores = rtCores, relInf = TRUE, varImp = FALSE, offset = NULL, var.monotone = NULL, keep.data = TRUE, var.names = NULL, response.name = "y", checkmods = FALSE, group = NULL, plot.perf = FALSE, plot.res = ifelse(!is.null(outdir), TRUE, FALSE), plot.fitted = NULL, plot.predicted = NULL, print.plot = FALSE, plot.theme = rtTheme, x.name = NULL, y.name = NULL, question = NULL, verbose = TRUE, trace = 0, grid.verbose = verbose, gbm.fit.verbose = FALSE, outdir = NULL, save.gridrun = FALSE, save.res = FALSE, save.res.mod = FALSE, save.mod = ifelse(!is.null(outdir), TRUE, FALSE) )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, 2 1: class.weights as in 0, divided by min(class.weights) 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only) Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
distribution |
Character: Distribution of the response variable. See gbm::gbm |
interaction.depth |
[gS] Integer: Interaction depth. |
shrinkage |
[gS] Float: Shrinkage (learning rate). |
bag.fraction |
[gS] Float (0, 1): Fraction of cases to use to train each tree. Helps avoid overfitting. |
n.minobsinnode |
[gS] Integer: Minimum number of observations allowed in a node. |
n.trees |
Integer: Initial number of trees to fit |
max.trees |
Integer: Maximum number of trees to fit |
force.n.trees |
Integer: If specified, use this number of trees instead of tuning number of trees |
gbm.select.smooth |
Logical: If TRUE, smooth the validation error curve. |
n.new.trees |
Integer: Number of new trees to train if stopping criteria have not been met. |
min.trees |
Integer: Minimum number of trees to fit. |
failsafe.trees |
Integer: If tuning fails to find n.trees, use this number instead. |
imetrics |
Logical: If TRUE, save |
.gs |
Internal use only |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
gridsearch.type |
Character: Type of grid search to perform: "exhaustive" or "randomized". |
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
plot.tune.error |
Logical: If TRUE, plot the tuning error curve. |
n.cores |
Integer: Number of cores to use. |
relInf |
Logical: If TRUE (Default), estimate variables' relative influence. |
varImp |
Logical: If TRUE, estimate variable importance by permutation (as in random forests; noted as experimental in gbm). Takes longer than (default) relative influence. The two measures are highly correlated. |
offset |
Numeric vector of offset values, passed to |
var.monotone |
Integer vector with values 0, 1, -1 and length = N features.
Used to define monotonicity constraints. |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
print.plot |
Logical: if TRUE, produce plot using |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
grid.verbose |
Logical: Passed to |
outdir |
Character: If defined, save log, 'plot.all' plots (see above) and RDS file of complete output |
save.gridrun |
Logical: If TRUE, save grid search models. |
save.res.mod |
Logical: If TRUE, save gbm model for each grid run. For diagnostic purposes only: Object size adds up quickly |
save.mod |
Logical: If TRUE, save all output to an RDS file in outdir |
Early stopping is implemented by fitting n.trees initially, checking the optionally smoothed validation error curve, and adding n.new.trees as needed, until the error stops decreasing or max.trees is reached.
[gS] in an argument's description indicates that a vector of values can be passed, in which case grid search will be performed automatically using the resampling scheme defined by grid.resample.params.
This function includes a workaround for cases where gbm.fit fails: if an error is detected, gbm.fit is rerun until successful and the procedure continues normally.
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Tree-based methods: s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()
Other Ensembles: s_AdaBoost(), s_RF(), s_Ranger()
Train a Generalized Linear Model for Regression or Classification (i.e. Logistic Regression) using stats::glm
.
If outcome y
has more than two classes, Multinomial Logistic Regression is performed using
nnet::multinom
s_GLM( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, family = NULL, interactions = NULL, class.method = NULL, weights = NULL, ifw = TRUE, ifw.type = 2, upsample = FALSE, downsample = FALSE, resample.seed = NULL, intercept = TRUE, polynomial = FALSE, poly.d = 3, poly.raw = FALSE, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, na.action = na.exclude, removeMissingLevels = TRUE, question = NULL, verbose = TRUE, trace = 0, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
family |
Error distribution and link function. See |
interactions |
List of character pairs denoting column names in |
class.method |
Character: Define "logistic" or "multinom" for classification. The only purpose
of this is so you can try |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, 2 1: class.weights as in 0, divided by min(class.weights) 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only) Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
intercept |
Logical: If TRUE, fit an intercept term. |
polynomial |
Logical: if TRUE, run lm on |
poly.d |
Integer: degree of polynomial. |
poly.raw |
Logical: if TRUE, use raw polynomials. Default, which should not really be changed is FALSE |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
na.action |
How to handle missing values. See |
removeMissingLevels |
Logical: If TRUE, finds factors in |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If higher than 0, will print more information to the console. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if save.mod is TRUE |
save.mod |
Logical: If TRUE, save all output to an RDS file in outdir |
... |
Additional arguments |
A common problem with glm arises when the testing set contains a predictor with more levels than the same predictor in the training set, resulting in an error and no prediction. This can happen when training on resamples of a data set, especially after stratifying against a different outcome. s_GLM automatically finds such cases and substitutes levels present in x.test but not in x with NA.
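For instance, a hypothetical sketch of this scenario (data and names are illustrative only):
set.seed(2021)
x <- data.frame(grp = factor(sample(c("a", "b"), 100, TRUE)), v = rnorm(100))
y <- as.numeric(x$grp == "a") + x$v + rnorm(100)
# The testing set contains level "c", absent from training:
# s_GLM substitutes it with NA instead of failing to predict
x.test <- data.frame(grp = factor(sample(c("a", "b", "c"), 30, TRUE)), v = rnorm(30))
mod <- s_GLM(x, y, x.test = x.test)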
rtMod
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Interpretable models: s_AddTree(), s_C50(), s_CART(), s_GLMNET(), s_GLMTree(), s_LMTree()
x <- rnorm(100)
y <- .6 * x + 12 + rnorm(100) / 2
mod <- s_GLM(x, y)
Train an elastic net model
s_GLMNET( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, grid.resample.params = setup.resample("kfold", 5), gridsearch.type = c("exhaustive", "randomized"), gridsearch.randomized.p = 0.1, intercept = TRUE, nway.interactions = 0, family = NULL, alpha = seq(0, 1, 0.2), lambda = NULL, nlambda = 100, which.cv.lambda = c("lambda.1se", "lambda.min"), penalty.factor = NULL, weights = NULL, ifw = TRUE, ifw.type = 2, upsample = FALSE, downsample = FALSE, resample.seed = NULL, res.summary.fn = mean, metric = NULL, maximize = NULL, .gs = FALSE, n.cores = rtCores, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
gridsearch.type |
Character: Type of grid search to perform: "exhaustive" or "randomized". |
gridsearch.randomized.p |
Float (0, 1): If
|
intercept |
Logical: If TRUE, include intercept in the model. |
nway.interactions |
Integer: Number of n-way interactions to include in the model. |
family |
Error distribution and link function. See |
alpha |
[gS] Float [0, 1]: The elasticnet mixing parameter:
|
lambda |
[gS] Float vector: Best left to NULL, |
nlambda |
Integer: Number of lambda values to compute |
which.cv.lambda |
Character: Which lambda to use for prediction: "lambda.1se" or "lambda.min" |
penalty.factor |
Float vector: Multiply the penalty for each coefficient by the values in this vector. This is most useful for specifying different penalties for different groups of variables |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, 2 1: class.weights as in 0, divided by min(class.weights) 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only) Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
res.summary.fn |
Function: Used to average resample runs. |
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
.gs |
(Internal use only) |
n.cores |
Integer: Number of cores to use. |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if save.mod is TRUE |
save.mod |
Logical: If TRUE, save all output to an RDS file in outdir |
... |
Additional arguments |
s_GLMNET runs glmnet::cv.glmnet for each value of alpha, for each resample in grid.resample.params. Mean values for min.lambda and MSE (Regression) or Accuracy (Classification) are aggregated for each alpha and resample combination.
[gS] indicates tunable hyperparameters: if more than a single value is provided, grid search will be performed automatically.
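A minimal sketch of tuning alpha by grid search on synthetic data (all values here are arbitrary):
## Not run:
x <- rnormmat(200, 5)
y <- x[, 1] - .5 * x[, 3] + rnorm(200)
dat <- data.frame(x, y)
# alpha is [gS]: each value is assessed with cv.glmnet across the grid resamples
mod <- s_GLMNET(dat, alpha = c(0, .5, 1))
## End(Not run)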
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Interpretable models: s_AddTree(), s_C50(), s_CART(), s_GLM(), s_GLMTree(), s_LMTree()
Train a GLMTree for regression or classification using
partykit::glmtree
s_GLMTree( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, weights = NULL, alpha = 0.05, bonferroni = TRUE, minsize = NULL, maxdepth = Inf, prune = NULL, minsplit = minsize, minbucket = minsize, epsilon = 1e-08, maxit = 25, ifw = TRUE, ifw.type = 2, upsample = FALSE, downsample = FALSE, resample.seed = NULL, na.action = na.exclude, grid.resample.params = setup.resample("kfold", 5), gridsearch.type = c("exhaustive", "randomized"), gridsearch.randomized.p = 0.1, metric = NULL, maximize = NULL, n.cores = rtCores, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, grid.verbose = verbose, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
weights |
Numeric vector: Weights for cases. For classification, |
maxdepth |
[gS] Integer: Maximum depth of tree. |
minsplit |
[gS] Integer: Minimum number of cases that must belong in a node before considering a split. |
minbucket |
[gS] Integer: Minimum number of cases allowed in a child node. |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, 2 1: class.weights as in 0, divided by min(class.weights) 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only) Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
gridsearch.type |
Character: Type of grid search to perform: "exhaustive" or "randomized". |
gridsearch.randomized.p |
Float (0, 1): If
|
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
n.cores |
Integer: Number of cores to use. |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
grid.verbose |
Logical: Passed to |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if save.mod is TRUE |
save.mod |
Logical: If TRUE, save all output to an RDS file in outdir |
... |
Additional arguments passed to |
Object of class rtMod
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Tree-based methods: s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()
Other Interpretable models: s_AddTree(), s_C50(), s_CART(), s_GLM(), s_GLMNET(), s_LMTree()
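A minimal usage sketch on synthetic data with a subgroup-dependent effect (values here are assumed, illustrative only):
## Not run:
x <- rnormmat(400, 4)
y <- ifelse(x[, 1] > 0, 2 * x[, 2], -x[, 2]) + rnorm(400)
dat <- data.frame(x, y)
mod <- s_GLMTree(dat, maxdepth = 3)
## End(Not run)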
Train a Generalized Least Squares regression model using nlme::gls
s_GLS( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, interactions = FALSE, nway.interactions = 0, covariate = NULL, weights = NULL, intercept = TRUE, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, na.action = na.exclude, question = NULL, verbose = TRUE, trace = 0, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
interactions |
List of character pairs denoting column names in |
nway.interactions |
Integer: Include n-way interactions. This integer defines
the n in: |
covariate |
Character: Name of column. Will include interactions between all features and this variable. |
weights |
Numeric vector: Weights for cases. For classification, |
intercept |
Logical: If TRUE, fit an intercept term. |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
na.action |
How to handle missing values. See |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If higher than 0, will print more information to the console. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if save.mod is TRUE |
save.mod |
Logical: If TRUE, save all output to an RDS file in outdir |
... |
Additional arguments |
rtMod
E.D. Gennatas
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
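A minimal usage sketch (synthetic data):
x <- rnorm(100)
y <- 2 * x + rnorm(100)
mod <- s_GLS(x, y)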
Trains a Deep Neural Net using H2O (http://www.h2o.ai)
Check out the H2O Flow at [ip]:[port]. The default IP:port is "localhost:54321", so if running locally, point your web browser to localhost:54321.
s_H2ODL( x, y = NULL, x.test = NULL, y.test = NULL, x.valid = NULL, y.valid = NULL, x.name = NULL, y.name = NULL, ip = "localhost", port = 54321, n.hidden.nodes = c(20, 20), epochs = 1000, activation = "Rectifier", mini.batch.size = 1, learning.rate = 0.005, adaptive.rate = TRUE, rho = 0.99, epsilon = 1e-08, rate.annealing = 1e-06, rate.decay = 1, momentum.start = 0, momentum.ramp = 1e+06, momentum.stable = 0, nesterov.accelerated.gradient = TRUE, input.dropout.ratio = 0, hidden.dropout.ratios = NULL, l1 = 0, l2 = 0, max.w2 = 3.4028235e+38, nfolds = 0, initial.biases = NULL, initial.weights = NULL, loss = "Automatic", distribution = "AUTO", stopping.rounds = 5, stopping.metric = "AUTO", upsample = FALSE, downsample = FALSE, resample.seed = NULL, na.action = na.fail, n.cores = rtCores, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, trace = 0, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Vector / Matrix / Data Frame: Training set Predictors |
y |
Vector: Training set outcome |
x.test |
Vector / Matrix / Data Frame: Testing set Predictors |
y.test |
Vector: Testing set outcome |
x.valid |
Vector / Matrix / Data Frame: Validation set Predictors |
y.valid |
Vector: Validation set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
ip |
Character: IP address of H2O server. Default = "localhost" |
port |
Integer: Port number for server. Default = 54321 |
n.hidden.nodes |
Integer vector of length equal to the number of hidden layers you wish to create |
epochs |
Integer: How many times to iterate through the dataset. Default = 1000 |
activation |
Character: Activation function to use: "Tanh", "TanhWithDropout", "Rectifier", "RectifierWithDropout", "Maxout", "MaxoutWithDropout". Default = "Rectifier" |
learning.rate |
Float: Learning rate to use for training. Default = .005 |
adaptive.rate |
Logical: If TRUE, use adaptive learning rate. Default = TRUE |
rate.annealing |
Float: Learning rate annealing: rate / (1 + rate_annealing * samples). Default = 1e-6 |
input.dropout.ratio |
Float (0, 1): Dropout ratio for inputs |
hidden.dropout.ratios |
Vector, Float (0, 2): Dropout ratios for hidden layers |
l1 |
Float (0, 1): L1 regularization (introduces sparseness; i.e. sets many weights to 0; reduces variance, increases generalizability) |
l2 |
Float (0, 1): L2 regularization (prevents very large absolute weights; reduces variance, increases generalizability) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only) Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
na.action |
How to handle missing values. See |
n.cores |
Integer: Number of cores to use |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If higher than 0, will print more information to the console. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if save.mod is TRUE |
save.mod |
Logical: If TRUE, save all output to an RDS file in outdir |
... |
Additional parameters to pass to |
x & y form the training set. x.test & y.test form the testing set, used only to evaluate model generalizability. x.valid & y.valid form the validation set, used to monitor training progress.
rtMod
object
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Deep Learning: d_H2OAE(), s_TFN()
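A minimal usage sketch, assuming the h2o package is installed and an H2O instance is reachable (all values arbitrary):
## Not run:
x <- rnormmat(300, 10)
y <- x[, 2] + x[, 7] + rnorm(300)
dat <- data.frame(x, y)
# Two hidden layers of 32 nodes each
mod <- s_H2ODL(dat, n.hidden.nodes = c(32, 32), epochs = 100)
## End(Not run)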
Trains a Gradient Boosting Machine using H2O (http://www.h2o.ai)
s_H2OGBM( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, ip = "localhost", port = 54321, h2o.init = TRUE, gs.h2o.init = FALSE, h2o.shutdown.at.end = TRUE, grid.resample.params = setup.resample("kfold", 5), metric = NULL, maximize = NULL, n.trees = 10000, force.n.trees = NULL, max.depth = 5, n.stopping.rounds = 50, stopping.metric = "AUTO", p.col.sample = 1, p.row.sample = 0.9, minobsinnode = 5, min.split.improvement = 1e-05, quantile.alpha = 0.5, learning.rate = 0.01, learning.rate.annealing = 1, weights = NULL, ifw = TRUE, ifw.type = 2, upsample = FALSE, downsample = FALSE, resample.seed = NULL, na.action = na.fail, grid.n.cores = 1, n.cores = rtCores, imetrics = FALSE, .gs = FALSE, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, trace = 0, grid.verbose = verbose, save.mod = FALSE, outdir = NULL, ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
ip |
Character: IP address of H2O server. Default = "localhost" |
port |
Integer: Port number for server. Default = 54321 |
h2o.shutdown.at.end |
Logical: If TRUE, run |
n.trees |
Integer: Number of trees to grow. Maximum number of trees if |
max.depth |
[gS] Integer: Depth of trees to grow |
n.stopping.rounds |
Integer: If > 0, stop training if |
stopping.metric |
Character: "AUTO" (Default), "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "lift_top_group", "misclassification", "mean_per_class_error" |
p.col.sample |
[gS] |
p.row.sample |
[gS] |
minobsinnode |
[gS] |
learning.rate |
[gS] |
learning.rate.annealing |
[gS] |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, 2 1: class.weights as in 0, divided by min(class.weights) 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only) Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
na.action |
How to handle missing values. See |
n.cores |
Integer: Number of cores to use |
.gs |
Internal use only |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If higher than 0, will print more information to the console. |
save.mod |
Logical: If TRUE, save all output to an RDS file in outdir |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if save.mod is TRUE |
... |
Additional arguments |
[gS] denotes tunable hyperparameters.
Warning: If you get an HTTP 500 error at random, use h2o.shutdown() to shut down the server. It will be restarted when s_H2OGBM is called.
rtMod
object
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Tree-based methods: s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()
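A minimal usage sketch, assuming a reachable H2O instance (values arbitrary):
## Not run:
x <- rnormmat(300, 10)
y <- x[, 2] - x[, 5] + rnorm(300)
dat <- data.frame(x, y)
# max.depth is [gS]: a vector triggers grid search
mod <- s_H2OGBM(dat, max.depth = c(3, 5))
## End(Not run)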
Trains a Random Forest model using H2O (http://www.h2o.ai)
s_H2ORF( x, y = NULL, x.test = NULL, y.test = NULL, x.valid = NULL, y.valid = NULL, x.name = NULL, y.name = NULL, ip = "localhost", port = 54321, n.trees = 500, max.depth = 20, n.stopping.rounds = 0, mtry = -1, nfolds = 0, weights = NULL, balance.classes = TRUE, upsample = FALSE, downsample = FALSE, resample.seed = NULL, na.action = na.fail, h2o.shutdown.at.end = TRUE, n.cores = rtCores, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, trace = 0, save.mod = FALSE, outdir = NULL, ... )
x |
Training set features |
y |
Training set outcome |
x.test |
Testing set features (Used to evaluate model performance) |
y.test |
Testing set outcome |
x.valid |
Validation set features (Used to build model / tune hyperparameters) |
y.valid |
Validation set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
ip |
Character: IP address of H2O server. Default = "localhost" |
port |
Integer: Port to connect to at |
n.trees |
Integer: Number of trees to grow |
max.depth |
Integer: Maximum tree depth |
n.stopping.rounds |
Integer: Stop training early if the simple moving average over this many rounds does not improve. Set to 0 to disable early stopping. |
mtry |
Integer: Number of variables randomly sampled and considered for
splitting at each round. If set to -1, defaults to |
nfolds |
Integer: Number of folds for K-fold CV used by |
weights |
Numeric vector: Weights for cases. For classification, |
balance.classes |
Logical: If TRUE, |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only) Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
na.action |
How to handle missing values. See |
h2o.shutdown.at.end |
Logical: If TRUE, run |
n.cores |
Integer: Number of cores to use |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If higher than 0, will print more information to the console. |
save.mod |
Logical: If TRUE, save all output to an RDS file in outdir |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if save.mod is TRUE |
... |
Additional parameters to pass to |
rtMod
object
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Tree-based methods: s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()
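A minimal usage sketch, assuming a reachable H2O instance (values arbitrary):
## Not run:
x <- rnormmat(300, 10)
y <- x[, 1] + x[, 4] + rnorm(300)
dat <- data.frame(x, y)
mod <- s_H2ORF(dat, n.trees = 200)
## End(Not run)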
Train a Highly Adaptive Lasso (HAL) model
s_HAL( x, y = NULL, x.test = NULL, y.test = NULL, family = NULL, max.degree = ifelse(ncol(x) >= 20, 2, 3), lambda = NULL, x.name = NULL, y.name = NULL, grid.resample.params = setup.resample("kfold", 5), gridsearch.type = c("exhaustive", "randomized"), gridsearch.randomized.p = 0.1, upsample = FALSE, downsample = FALSE, resample.seed = NULL, metric = NULL, maximize = NULL, .gs = FALSE, n.cores = rtCores, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
family |
Error distribution and link function. See |
max.degree |
Integer: The highest order of interaction terms to generate basis functions for. |
lambda |
Float vector: hal9001::fit_hal lambda |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
gridsearch.type |
Character: Type of grid search to perform: "exhaustive" or "randomized". |
gridsearch.randomized.p |
Float (0, 1): If
|
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only) Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
.gs |
Internal use only |
n.cores |
Integer: Number of cores to use. |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if save.mod is TRUE |
save.mod |
Logical: If TRUE, save all output to an RDS file in outdir |
... |
Additional arguments |
[gS] indicates tunable hyperparameters: if more than a single value is provided, grid search will be performed automatically.
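A minimal usage sketch, assuming the hal9001 package is installed (values arbitrary):
## Not run:
x <- rnormmat(200, 4)
y <- x[, 1] + sin(x[, 2]) + rnorm(200)
dat <- data.frame(x, y)
mod <- s_HAL(dat, max.degree = 2)
## End(Not run)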
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Train a k-Nearest Neighbors learner for regression or classification using FNN
s_KNN( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, k = 3, algorithm = "kd_tree", print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE) )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
k |
Integer: Number of neighbors considered |
algorithm |
Character: Algorithm to use. Options: "kd_tree", "cover_tree", "brute" |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
outdir |
Optional. Path to directory to save output |
save.mod |
Logical: If TRUE, save all output to an RDS file in outdir |
Object of class rtMod
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
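A minimal usage sketch, assuming the FNN package is installed (values arbitrary):
## Not run:
x <- rnormmat(200, 3)
y <- x[, 1] + rnorm(200)
dat <- data.frame(x, y)
mod <- s_KNN(dat, k = 5)
## End(Not run)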
Train an LDA Classifier using MASS::lda
s_LDA( x, y = NULL, x.test = NULL, y.test = NULL, prior = NULL, method = "moment", nu = NULL, upsample = TRUE, downsample = FALSE, resample.seed = NULL, x.name = NULL, y.name = NULL, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
prior |
Numeric: Prior probabilities of class membership |
method |
"moment" for standard estimators of the mean and variance, "mle" for MLEs, "mve" to use cov.mve, or "t" for robust estimates based on a t distribution |
nu |
Integer: Degrees of freedom for method = "t" |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only) Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if save.mod is TRUE |
save.mod |
Logical: If TRUE, save all output to an RDS file in outdir |
... |
Additional arguments passed to |
Note: LDA requires all predictors to be numeric. The variable importance output ("varimp") is the vector of coefficients for LD1
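For example, a minimal sketch using the built-in iris data, whose predictors are all numeric:
mod <- s_LDA(iris[, 1:4], iris$Species)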
rtMod
object
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Train a single decision tree using LightGBM.
s_LightCART( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, weights = NULL, ifw = TRUE, ifw.type = 2, upsample = FALSE, downsample = FALSE, resample.seed = NULL, objective = NULL, num_leaves = 32L, max_depth = -1L, lambda_l1 = 0, lambda_l2 = 0, max_cat_threshold = 32L, min_data_per_group = 32L, linear_tree = FALSE, .gs = FALSE, grid.resample.params = setup.resample("kfold", 5), gridsearch.type = "exhaustive", metric = NULL, maximize = NULL, importance = TRUE, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, grid.verbose = FALSE, lightgbm_verbose = -1, save.gridrun = FALSE, n.cores = 1, n_threads = 0, force_col_wise = FALSE, force_row_wise = FALSE, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, 2 1: class.weights as in 0, divided by min(class.weights) 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only) Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
objective |
(Default = NULL) |
num_leaves |
Integer: [gS] Maximum tree leaves for base learners. |
max_depth |
Integer: [gS] Maximum tree depth for base learners, <=0 means no limit. |
lambda_l1 |
Numeric: [gS] L1 regularization term |
lambda_l2 |
Numeric: [gS] L2 regularization term |
max_cat_threshold |
Integer: Max number of splits to consider for categorical variable |
min_data_per_group |
Integer: Minimum number of observations per categorical group |
linear_tree |
Logical: [gS] If |
.gs |
(Internal use only) |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
gridsearch.type |
Character: Type of grid search to perform: "exhaustive" or "randomized". |
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
importance |
Logical: If |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
grid.verbose |
Logical: Passed to |
lightgbm_verbose |
Integer: Passed to |
save.gridrun |
Logical: If |
n.cores |
Integer: Number of cores to use. |
n_threads |
Integer: Number of threads for lightgbm using OpenMP. Only
parallelize resamples using |
force_col_wise |
Logical: If |
force_row_wise |
Logical: If |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if save.mod is TRUE |
save.mod |
Logical: If TRUE, save all output to an RDS file in outdir |
... |
Extra arguments appended to |
[gS] indicates a parameter will be tuned automatically by grid search if multiple values are passed. LightGBM trains trees leaf-wise (best-first) rather than depth-wise. Convert categorical variables to integer and flag them as categorical for LightGBM, so that they are not treated as numeric.
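A minimal usage sketch, assuming the lightgbm package is installed (values arbitrary):
## Not run:
x <- rnormmat(500, 10)
y <- x[, 3] + .5 * x[, 5]^2 + rnorm(500)
dat <- data.frame(x, y)
# A single leaf-wise tree, capped at 16 leaves
mod <- s_LightCART(dat, num_leaves = 16L)
## End(Not run)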
rtMod
object
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Tree-based methods: s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()
## Not run:
x <- rnormmat(500, 10)
y <- x[, 3] + .5 * x[, 5]^2 + rnorm(500)
dat <- data.frame(x, y)
mod <- s_LightGBM(dat)
## End(Not run)
Tune hyperparameters using grid search and resampling, train a final model, and validate it
s_LightGBM( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, weights = NULL, ifw = TRUE, ifw.type = 2, upsample = FALSE, downsample = FALSE, resample.seed = NULL, boosting = "gbdt", objective = NULL, max_nrounds = 1000L, force_nrounds = NULL, early_stopping_rounds = 10L, nrounds_default = 100L, num_leaves = 32L, max_depth = -1L, learning_rate = 0.01, feature_fraction = 1, subsample = 0.8, subsample_freq = 1L, lambda_l1 = 0, lambda_l2 = 0, max_cat_threshold = 32L, min_data_per_group = 32L, linear_tree = FALSE, tree_learner = "serial", .gs = FALSE, grid.resample.params = setup.resample("kfold", 5), gridsearch.type = "exhaustive", metric = NULL, maximize = NULL, importance = TRUE, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, grid.verbose = FALSE, lightgbm_verbose = -1, save.gridrun = FALSE, n.cores = 1, n_threads = 0, force_col_wise = FALSE, force_row_wise = FALSE, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, or 2. 1: class.weights as in 0, divided by min(class.weights); 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
boosting |
Character: [gS] "gbdt", "rf", "dart", "goss" |
objective |
(Default = NULL) |
max_nrounds |
Integer: Maximum number of rounds to run. Can be set to a high number as early stopping will limit nrounds by monitoring inner CV error |
force_nrounds |
Integer: Number of rounds to run if not estimating optimal number by CV |
early_stopping_rounds |
Integer: Training on resamples of |
nrounds_default |
Integer: Default number of rounds to run if cross-validation fails - likely will never be used |
num_leaves |
Integer: [gS] Maximum tree leaves for base learners. |
max_depth |
Integer: [gS] Maximum tree depth for base learners, <=0 means no limit. |
learning_rate |
Numeric: [gS] Boosting learning rate |
feature_fraction |
Numeric (0, 1): [gS] Fraction of features to consider at each iteration (i.e. tree) |
subsample |
Numeric: [gS] Subsample ratio of the training set. |
subsample_freq |
Integer: Subsample every this many iterations |
lambda_l1 |
Numeric: [gS] L1 regularization term |
lambda_l2 |
Numeric: [gS] L2 regularization term |
max_cat_threshold |
Integer: Max number of splits to consider for categorical variable |
min_data_per_group |
Integer: Minimum number of observations per categorical group |
linear_tree |
Logical: [gS] If |
tree_learner |
Character: [gS] "serial", "feature", "data", "voting" |
.gs |
(Internal use only) |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
gridsearch.type |
Character: Type of grid search to perform: "exhaustive" or "randomized". |
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
importance |
Logical: If |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
grid.verbose |
Logical: Passed to |
lightgbm_verbose |
Integer: Passed to |
save.gridrun |
Logical: If |
n.cores |
Integer: Number of cores to use. |
n_threads |
Integer: Number of threads for lightgbm using OpenMP. Only
parallelize resamples using |
force_col_wise |
Logical: If |
force_row_wise |
Logical: If |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Extra arguments appended to |
[gS] indicates a parameter that will be tuned by grid search if multiple values are passed. LightGBM trains trees leaf-wise (best-first) rather than depth-wise. For categorical variables, convert them to integer and tell LightGBM they are categorical, so that they are not treated as numeric.
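As a minimal illustration of that categorical handling using the lightgbm package directly (rather than the s_LightGBM wrapper); the data frame dat, its factor column "grp", and the outcome column y are hypothetical:
## Not run:
# Hypothetical: dat is a data.frame whose last column is the outcome y
# and which contains a factor feature "grp"
dat$grp <- as.integer(dat$grp)  # convert factor levels to integer codes
dtrain <- lightgbm::lgb.Dataset(
  data = as.matrix(dat[, -ncol(dat)]),
  label = dat$y,
  categorical_feature = "grp"  # so "grp" is not treated as numeric
)
## End(Not run)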
rtMod object
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Tree-based methods: s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()
## Not run:
x <- rnormmat(500, 10)
y <- x[, 3] + .5 * x[, 5]^2 + rnorm(500)
dat <- data.frame(x, y)
mod <- s_LightGBM(dat)
## End(Not run)
Random Forest using LightGBM
s_LightRF(
  x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL,
  weights = NULL, ifw = TRUE, ifw.type = 2, upsample = FALSE,
  downsample = FALSE, resample.seed = NULL, objective = NULL,
  nrounds = 500L, early_stopping_rounds = -1L, num_leaves = 4096L,
  max_depth = -1L, learning_rate = 1, feature_fraction = 1,
  subsample = 0.623, subsample_freq = 1L, lambda_l1 = 0, lambda_l2 = 0,
  max_cat_threshold = 32L, min_data_per_group = 32L, linear_tree = FALSE,
  tree_learner = "data_parallel",
  grid.resample.params = setup.resample("kfold", 5),
  gridsearch.type = "exhaustive", metric = NULL, maximize = NULL,
  importance = TRUE, print.plot = FALSE, plot.fitted = NULL,
  plot.predicted = NULL, plot.theme = rtTheme, question = NULL,
  verbose = TRUE, grid.verbose = FALSE, lightgbm_verbose = -1,
  save.gridrun = FALSE, n.cores = 1, n_threads = rtCores,
  force_col_wise = FALSE, force_row_wise = FALSE, outdir = NULL,
  save.mod = ifelse(!is.null(outdir), TRUE, FALSE), .gs = FALSE, ...
)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, or 2. 1: class.weights as in 0, divided by min(class.weights); 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
objective |
(Default = NULL) |
nrounds |
Integer: Number of trees to grow |
early_stopping_rounds |
Integer: Training on resamples of |
num_leaves |
Integer: [gS] Maximum tree leaves for base learners. |
max_depth |
Integer: [gS] Maximum tree depth for base learners, <=0 means no limit. |
learning_rate |
Numeric: [gS] Boosting learning rate |
feature_fraction |
Numeric (0, 1): [gS] Fraction of features to consider at each iteration (i.e. tree) |
subsample |
Numeric: [gS] Subsample ratio of the training set. |
subsample_freq |
Integer: Subsample every this many iterations |
lambda_l1 |
Numeric: [gS] L1 regularization term |
lambda_l2 |
Numeric: [gS] L2 regularization term |
max_cat_threshold |
Integer: Max number of splits to consider for categorical variable |
min_data_per_group |
Integer: Minimum number of observations per categorical group |
linear_tree |
Logical: [gS] If |
tree_learner |
Character: [gS] "serial", "feature", "data", "voting" |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
gridsearch.type |
Character: Type of grid search to perform: "exhaustive" or "randomized". |
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
importance |
Logical: If |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
grid.verbose |
Logical: Passed to |
lightgbm_verbose |
Integer: Passed to |
save.gridrun |
Logical: If |
n.cores |
Integer: Number of cores to use. |
n_threads |
Integer: Number of threads for lightgbm using OpenMP. Only
parallelize resamples using |
force_col_wise |
Logical: If |
force_row_wise |
Logical: If |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
.gs |
(Internal use only) |
... |
Extra arguments appended to |
E.D. Gennatas
## Not run:
x <- rnormmat(500, 10)
y <- x[, 3] + .5 * x[, 5]^2 + rnorm(500)
dat <- data.frame(x, y)
mod <- s_LightRF(dat)
## End(Not run)
Train a LightGBM gradient boosting model, extract rules, and fit using LASSO
s_LightRuleFit(
  x, y = NULL, x.test = NULL, y.test = NULL, lgbm.mod = NULL,
  n_trees = 200, num_leaves = 32L, max_depth = 4, learning_rate = 0.1,
  subsample = 0.666, subsample_freq = 1L, lambda_l1 = 0, lambda_l2 = 0,
  objective = NULL, importance = FALSE, lgbm.ifw = TRUE,
  lgbm.grid.resample.params = setup.resample(resampler = "kfold", n.resamples = 5),
  glmnet.ifw = TRUE, alpha = 1, lambda = NULL,
  glmnet.grid.resample.params = setup.resample(resampler = "kfold", n.resamples = 5),
  grid.resample.params = setup.resample("kfold", 5),
  gridsearch.type = "exhaustive", metric = NULL, maximize = NULL,
  grid.verbose = FALSE, save.gridrun = FALSE, weights = NULL,
  empirical_risk = TRUE, cases_by_rules = NULL, save_cases_by_rules = FALSE,
  x.name = NULL, y.name = NULL, n.cores = rtCores, question = NULL,
  print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL,
  plot.theme = rtTheme, outdir = NULL,
  save.mod = if (!is.null(outdir)) TRUE else FALSE,
  verbose = TRUE, trace = 0, .gs = FALSE
)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
lgbm.mod |
rtMod object created by s_LightGBM. If provided, the gradient boosting step is skipped. |
num_leaves |
Integer: [gS] Maximum tree leaves for base learners. |
max_depth |
Integer: [gS] Maximum tree depth for base learners, <=0 means no limit. |
learning_rate |
Numeric: [gS] Boosting learning rate |
subsample |
Numeric: [gS] Subsample ratio of the training set. |
subsample_freq |
Integer: Subsample every this many iterations |
lambda_l1 |
Numeric: [gS] L1 regularization term |
lambda_l2 |
Numeric: [gS] L2 regularization term |
objective |
(Default = NULL) |
importance |
Logical: If |
alpha |
[gS] Float [0, 1]: The elasticnet mixing parameter:
|
lambda |
[gS] Float vector: Best left to NULL, |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
gridsearch.type |
Character: Type of grid search to perform: "exhaustive" or "randomized". |
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
grid.verbose |
Logical: Passed to |
save.gridrun |
Logical: If |
weights |
Numeric vector: Weights for cases. For classification, |
empirical_risk |
Logical: If TRUE, calculate empirical risk |
cases_by_rules |
Matrix of cases by rules from a previous RuleFit run. If provided, the GBM step is skipped. |
save_cases_by_rules |
Logical: If TRUE, save cases_by_rules to object |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
n.cores |
Integer: Number of cores to use |
question |
Character: the question you are attempting to answer with this model, in plain language. |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: Verbosity level |
.gs |
(Internal use only) |
Based on "Predictive Learning via Rule Ensembles" by Friedman and Popescu http://statweb.stanford.edu/~jhf/ftp/RuleFit.pdf
rtMod object
E.D. Gennatas
Friedman JH, Popescu BE, "Predictive Learning via Rule Ensembles", http://statweb.stanford.edu/~jhf/ftp/RuleFit.pdf
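No example accompanies this entry; the following is a minimal usage sketch in the style of the other s_* examples, assuming the default objective on synthetic regression data:
## Not run:
x <- rnormmat(500, 10)
y <- x[, 3] + .5 * x[, 5]^2 + rnorm(500)
dat <- data.frame(x, y)
mod <- s_LightRuleFit(dat)
## End(Not run)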
Train a Linear Hard Hybrid Tree for Regression
s_LIHAD(
  x, y = NULL, x.test = NULL, y.test = NULL, max.depth = 3, alpha = 0,
  lambda = 0.1, lincoef.params = setup.lincoef("glmnet"), minobsinnode = 2,
  minobsinnode.lin = 10, learning.rate = 1, part.minsplit = 2,
  part.xval = 0, part.max.depth = 1, part.cp = 0, weights = NULL,
  metric = "MSE", maximize = FALSE,
  grid.resample.params = setup.grid.resample(), keep.x = FALSE,
  simplify = TRUE, cxrcoef = FALSE, n.cores = rtCores, verbose = TRUE,
  verbose.predict = FALSE, trace = 0, x.name = NULL, y.name = NULL,
  question = NULL, outdir = NULL, print.plot = FALSE, plot.fitted = NULL,
  plot.predicted = NULL, plot.theme = rtTheme, save.mod = FALSE
)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
max.depth |
[gS] Integer: Max depth of additive tree. Default = 3 |
alpha |
[gS] Float: |
lambda |
[gS] Float: |
lincoef.params |
Named List: Output of setup.lincoef |
minobsinnode |
[gS] Integer: Minimum N observations needed in node, before considering splitting |
minobsinnode.lin |
Integer: Minimum N observations needed in node in order to train linear model. |
learning.rate |
[gS] Float (0, 1): Learning rate. |
part.max.depth |
Integer: Max depth for each tree model within the additive tree |
part.cp |
[gS] Float: Minimum complexity needed to allow split by |
weights |
Numeric vector: Weights for cases. For classification, |
cxrcoef |
Logical: Passed to predict.lihad, if TRUE, returns cases by coefficients matrix |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If higher than 0, will print more information to the console. |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
question |
Character: the question you are attempting to answer with this model, in plain language. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
The Hybrid Tree grows a tree using a sequence of regularized linear models and tree stumps. Use s_LINAD for the standard Linear Additive Tree algorithm, which grows branches stepwise and includes all observations, weighted by gamma.
Grid searched parameters: max.depth, alpha, lambda, minobsinnode, learning.rate, part.cp
E.D. Gennatas
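A minimal usage sketch on synthetic regression data, following the pattern of the other s_* examples:
## Not run:
x <- rnormmat(500, 10)
y <- x[, 3] + .5 * x[, 5]^2 + rnorm(500)
dat <- data.frame(x, y)
mod <- s_LIHAD(dat, max.depth = 3)
## End(Not run)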
Boost a Linear Hard Additive Tree (i.e. LIHAD, i.e. LINAD with hard splits)
s_LIHADBoost(
  x, y = NULL, x.test = NULL, y.test = NULL, resid = NULL,
  boost.obj = NULL, learning.rate = 0.5, case.p = 1, max.depth = 5,
  gamma = 0.1, alpha = 0, lambda = 1, lambda.seq = NULL, minobsinnode = 2,
  minobsinnode.lin = 10, shrinkage = 1, part.minsplit = 2, part.xval = 0,
  part.max.depth = 1, part.cp = 0, part.minbucket = 5,
  lin.type = c("glmnet", "cv.glmnet", "lm.ridge", "allSubsets",
    "forwardStepwise", "backwardStepwise", "glm", "sgd", "solve", "none"),
  cv.glmnet.nfolds = 5, which.cv.glmnet.lambda = "lambda.min",
  max.iter = 10, tune.n.iter = TRUE, earlystop.params = setup.earlystop(),
  lookback = TRUE, init = NULL, .gs = FALSE,
  grid.resample.params = setup.resample("kfold", 5),
  gridsearch.type = "exhaustive", metric = NULL, maximize = NULL,
  cxrcoef = FALSE, print.progress.every = 5, print.error.plot = "final",
  x.name = NULL, y.name = NULL, question = NULL, base.verbose = FALSE,
  verbose = TRUE, grid.verbose = FALSE, trace = 0, prefix = NULL,
  plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme,
  print.plot = FALSE, print.base.plot = FALSE, print.tune.plot = TRUE,
  plot.type = "l", save.gridrun = FALSE, outdir = NULL, n.cores = rtCores,
  save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ...
)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
learning.rate |
Float (0, 1] Learning rate for the additive steps |
max.iter |
Integer: Maximum number of iterations (additive steps) to perform. Default = 10 |
init |
Float: Initial value for prediction. Default = mean(y) |
print.error.plot |
String or Integer: "final" plots a training and validation (if available) error curve at the end of training. If integer, plot training and validation error curve every this many iterations during training |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
question |
Character: the question you are attempting to answer with this model, in plain language. |
base.verbose |
Logical: |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If > 0, print diagnostic info to console |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
print.plot |
Logical: if TRUE, produce plot using |
print.base.plot |
Logical: Passed to |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional parameters to be passed to learner |
By default, early stopping works by checking training loss.
E.D. Gennatas
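A minimal usage sketch on synthetic regression data (max.iter as documented above):
## Not run:
x <- rnormmat(500, 10)
y <- x[, 3] + .5 * x[, 5]^2 + rnorm(500)
dat <- data.frame(x, y)
mod <- s_LIHADBoost(dat, max.iter = 10)
## End(Not run)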
Train a Linear Additive Tree for Regression or Binary Classification
s_LINAD(
  x, y = NULL, x.test = NULL, y.test = NULL, weights = NULL,
  max.leaves = 20, lookback = TRUE, force.max.leaves = NULL,
  learning.rate = 0.5, ifw = TRUE, ifw.type = 1, upsample = FALSE,
  downsample = FALSE, resample.seed = NULL,
  leaf.model = c("line", "spline"), gamlearner = "gamsel",
  gam.params = list(), nvmax = 3, gamma = 0.5, gamma.on.lin = FALSE,
  lin.type = c("glmnet", "forwardStepwise", "cv.glmnet", "lm.ridge",
    "allSubsets", "backwardStepwise", "glm", "solve", "none"),
  first.lin.type = "cv.glmnet", first.lin.learning.rate = 1,
  first.lin.alpha = 1, first.lin.lambda = NULL, cv.glmnet.nfolds = 5,
  which.cv.glmnet.lambda = "lambda.min", alpha = 1, lambda = 0.05,
  lambda.seq = NULL, minobsinnode.lin = 10, part.minsplit = 2,
  part.xval = 0, part.max.depth = 1, part.cp = 0, part.minbucket = 1,
  .rho = TRUE, rho.max = 1000, init = NULL, metric = "auto",
  maximize = NULL, grid.resample.params = setup.resample("kfold", 5),
  gridsearch.type = "exhaustive", save.gridrun = FALSE,
  select.leaves.smooth = FALSE, cluster = FALSE, keep.x = FALSE,
  simplify = TRUE, cxrcoef = FALSE, n.cores = rtCores, .preprocess = NULL,
  verbose = TRUE, grid.verbose = FALSE, plot.tuning = FALSE,
  verbose.predict = FALSE, trace = 1, x.name = NULL, y.name = NULL,
  question = NULL, outdir = NULL, print.plot = FALSE, plot.fitted = NULL,
  plot.predicted = NULL, plot.theme = rtTheme, save.mod = FALSE,
  .gs = FALSE
)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
weights |
Numeric vector: Weights for cases. For classification, |
max.leaves |
Integer: Maximum number of terminal nodes to grow. Setting
this to a value > 1, triggers cross-validation to find best number of leaves.
To force a given number of leaves and not cross-validate, set
|
lookback |
Logical: If TRUE, use validation error to decide best number of leaves to use. |
force.max.leaves |
Integer: If set, |
learning.rate |
[gS] Numeric: learning rate for steps after initial linear model |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, or 2. 1: class.weights as in 0, divided by min(class.weights); 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
nvmax |
[gS] Integer: Number of max features to use for lin.type
"allSubsets", "forwardStepwise", or "backwardStepwise". If values greater
than n of features in |
gamma |
[gS] Numeric: Soft weighting parameter. Weights of cases that do not belong to node get multiplied by this amount |
lin.type |
Character: One of "glmnet", "forwardStepwise", "cv.glmnet", "lm.ridge", "allSubsets", "backwardStepwise", "glm", "solve", or "none" to not fit linear models See lincoef for more |
first.lin.type |
Character: same options as |
first.lin.alpha |
Numeric: alpha for the first linear model, if
|
lambda |
[gS] Numeric: lambda value for lin.type |
minobsinnode.lin |
[gS] Integer: Minimum number of observation needed to fit linear model |
part.minsplit |
[gS] Integer: Minimum number of observations in node to consider splitting |
part.max.depth |
Integer: Max depth for each tree model within the additive tree |
part.cp |
[gS] Numeric: Split must decrease complexity but at least this much to be considered |
part.minbucket |
[gS] Integer: Minimum number of observations allowed in child node to allow splitting |
init |
Initial value. Default = |
verbose |
Logical: If TRUE, print summary to screen. |
plot.tuning |
Logical: If TRUE, plot validation error during gridsearch |
trace |
Integer: If higher than 0, will print more information to the console. |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
question |
Character: the question you are attempting to answer with this model, in plain language. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
.gs |
internal use only |
The Linear Additive Tree trains a tree using a sequence of regularized
linear models and splits. We specify an upper threshold of leaves using
max.leaves
instead of directly defining a number, because depending
on the other parameters and the datasets, splitting may stop early.
[gS] indicates tunable hyperparameters that can accept a vector of possible values
E.D. Gennatas
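A minimal usage sketch on synthetic regression data; as described above, max.leaves > 1 triggers cross-validation for the best number of leaves:
## Not run:
x <- rnormmat(500, 10)
y <- x[, 3] + .5 * x[, 5]^2 + rnorm(500)
dat <- data.frame(x, y)
mod <- s_LINAD(dat, max.leaves = 20)
## End(Not run)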
Train a Linear Optimized Additive Tree
s_LINOA(
  x, y = NULL, x.test = NULL, y.test = NULL, weights = NULL, ifw = TRUE,
  ifw.type = 2, upsample = FALSE, downsample = FALSE, resample.seed = NULL,
  max.leaves = 8, learning.rate = 0.5, select.leaves.smooth = TRUE,
  force.max.leaves = NULL, lookback = TRUE, gamma = 0, n.quantiles = 20,
  minobsinnode = NULL, minbucket = NULL,
  lin.type = c("forwardStepwise", "glmnet", "cv.glmnet", "lm.ridge",
    "allSubsets", "backwardStepwise", "glm", "solve", "none"),
  alpha = 1, lambda = 0.05, lambda.seq = NULL, cv.glmnet.nfolds = 5,
  which.cv.glmnet.lambda = "lambda.min", nbest = 1, nvmax = 3,
  .rho = TRUE, rho.max = 1000, init = NULL, metric = "auto",
  maximize = NULL, grid.resample.params = setup.resample("kfold", 5),
  gridsearch.type = "exhaustive", save.gridrun = FALSE,
  grid.verbose = verbose, keep.x = FALSE, simplify = TRUE, cxrcoef = FALSE,
  n.cores = rtCores, splitline.cores = 1, .preprocess = NULL,
  plot.tuning = TRUE, verbose.predict = FALSE, x.name = NULL,
  y.name = NULL, question = NULL, outdir = NULL, print.plot = FALSE,
  plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme,
  save.mod = FALSE, .gs = FALSE, verbose = TRUE, trace = 1
)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, or 2. 1: class.weights as in 0, divided by min(class.weights); 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
max.leaves |
Integer: Maximum number of terminal nodes to grow |
lookback |
Logical: If TRUE, check validation error to decide when to stop growing tree. Default = FALSE |
minobsinnode |
Integer: Minimum N observations needed in node, before considering splitting |
lambda |
Float: lambda parameter for |
nvmax |
[gS] Integer: Number of max features to use for lin.type "allSubsets", "forwardStepwise", or
"backwardStepwise". If values greater than n of features in |
init |
Initial value. Default = |
plot.tuning |
Logical: If TRUE, plot validation error during gridsearch |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
question |
Character: the question you are attempting to answer with this model, in plain language. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
.gs |
internal use only |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If higher than 0, will print more information to the console. |
The Linear Optimized Additive Tree grows a tree by finding splits that minimize loss after linear
models are fit on each child.
We specify an upper threshold of leaves using max.leaves
instead of directly defining a number,
because depending on the other parameters and the datasets, splitting may stop early.
E.D. Gennatas
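A minimal usage sketch on synthetic regression data, using the default upper threshold of leaves:
## Not run:
x <- rnormmat(500, 10)
y <- x[, 3] + .5 * x[, 5]^2 + rnorm(500)
dat <- data.frame(x, y)
mod <- s_LINOA(dat, max.leaves = 8)
## End(Not run)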
Fit a linear model and validate it. Options include base lm(), robust linear model using MASS::rlm(), generalized least squares using nlme::gls, or polynomial regression using stats::poly to transform features
s_LM(
  x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL,
  weights = NULL, ifw = TRUE, ifw.type = 2, upsample = FALSE,
  downsample = FALSE, resample.seed = NULL, intercept = TRUE,
  robust = FALSE, gls = FALSE, polynomial = FALSE, poly.d = 3,
  poly.raw = FALSE, print.plot = FALSE, plot.fitted = NULL,
  plot.predicted = NULL, plot.theme = rtTheme, na.action = na.exclude,
  question = NULL, verbose = TRUE, trace = 0, outdir = NULL,
  save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ...
)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, or 2. 1: class.weights as in 0, divided by min(class.weights); 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
intercept |
Logical: If TRUE, fit an intercept term. |
robust |
Logical: if TRUE, use |
gls |
Logical: if TRUE, use |
polynomial |
Logical: if TRUE, run lm on |
poly.d |
Integer: degree of polynomial |
poly.raw |
Logical: if TRUE, use raw polynomials. The default, which should not really be changed, is FALSE |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
na.action |
How to handle missing values. See |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If higher than 0, will print more information to the console. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical. If TRUE, save all output as RDS file in |
... |
Additional arguments to be passed to |
GLS can be useful in place of a standard linear model when there is correlation among the residuals and/or they have unequal variances. Warning: nlme's implementation is buggy, and predict will not work because of environment problems, which means it fails to get predicted values if x.test is provided.
robust = TRUE trains a robust linear model using MASS::rlm.
gls = TRUE trains a generalized least squares model using nlme::gls.
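To illustrate, minimal sketches of these variants using the documented arguments:
## Not run:
x <- rnorm(100)
y <- .6 * x + 12 + rnorm(100) / 2
mod_rlm <- s_LM(x, y, robust = TRUE)                   # MASS::rlm
mod_gls <- s_LM(x, y, gls = TRUE)                      # nlme::gls (see warning above)
mod_poly <- s_LM(x, y, polynomial = TRUE, poly.d = 3)  # stats::poly-transformed features
## End(Not run)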
rtMod
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
x <- rnorm(100)
y <- .6 * x + 12 + rnorm(100) / 2
mod <- s_LM(x, y)
Train a LMTree for regression or classification using partykit::lmtree
s_LMTree(
  x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL,
  weights = NULL, offset = NULL, ifw = TRUE, ifw.type = 2,
  upsample = FALSE, downsample = FALSE, resample.seed = NULL,
  na.action = na.exclude, print.plot = FALSE, plot.fitted = NULL,
  plot.predicted = NULL, plot.theme = rtTheme, question = NULL,
  verbose = TRUE, outdir = NULL,
  save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ...
)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
weights |
Numeric vector: Weights for cases. For classification, |
offset |
Numeric vector of a priori known offsets |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, or 2. 1: class.weights as in 0, divided by min(class.weights); 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
na.action |
Character: How to handle missing values. See |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments passed to |
Object of class rtMod
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Tree-based methods: s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()
Other Interpretable models: s_AddTree(), s_C50(), s_CART(), s_GLM(), s_GLMNET(), s_GLMTree()
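A minimal usage sketch on synthetic piecewise-linear data, where a model-based tree such as lmtree is a natural fit:
## Not run:
x <- rnormmat(400, 5)
y <- ifelse(x[, 1] > 0, 2 * x[, 2], -x[, 2]) + rnorm(400)
dat <- data.frame(x, y)
mod <- s_LMTree(dat)
## End(Not run)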
Fits a LOESS curve or surface
s_LOESS(
  x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL,
  print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL,
  plot.theme = rtTheme, question = NULL, verbose = TRUE, trace = 0,
  outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ...
)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If higher than 0, will print more information to the console. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments to |
A maximum of 4 features is allowed in this implementation (stats::loess). The main use for this algorithm would be fitting curves in bivariate plots, where a GAM or similar is preferable anyway. It is included in rtemis mainly for academic purposes, not for building predictive models.
Object of class rtemis
E.D. Gennatas
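A minimal bivariate sketch, reflecting the use case noted above:
## Not run:
x <- rnorm(200)
y <- sin(x) + rnorm(200) / 5
mod <- s_LOESS(x, y)
## End(Not run)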
Convenience alias for s_GLM(family = binomial(link = "logit")).
s_LOGISTIC(x, y, x.test = NULL, y.test = NULL, family = binomial(link = "logit"), ...)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
family |
Error distribution and link function. See |
... |
Additional arguments |
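A minimal usage sketch with a synthetic binary outcome supplied as a factor (the level names here are hypothetical):
## Not run:
x <- rnormmat(300, 4)
y <- factor(ifelse(x[, 1] + rnorm(300) > 0, "pos", "neg"), levels = c("pos", "neg"))
mod <- s_LOGISTIC(x, y)
## End(Not run)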
Trains a MARS model using earth::earth.
[gS] in an argument's description indicates that the hyperparameter will be tuned if more than one value is provided.
For more info on algorithm hyperparameters, see ?earth::earth
s_MARS(
  x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL,
  grid.resample.params = setup.grid.resample(), weights = NULL,
  ifw = TRUE, ifw.type = 2, upsample = FALSE, downsample = FALSE,
  resample.seed = NULL, glm = NULL, degree = 2, penalty = 3, nk = NULL,
  thresh = 0, minspan = 0, endspan = 0, newvar.penalty = 0, fast.k = 2,
  fast.beta = 1, linpreds = FALSE, pmethod = "forward", nprune = NULL,
  nfold = 4, ncross = 1, stratify = TRUE, wp = NULL, na.action = na.fail,
  metric = NULL, maximize = NULL, n.cores = rtCores, print.plot = FALSE,
  plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme,
  question = NULL, verbose = TRUE, trace = 0, save.mod = FALSE,
  outdir = NULL, ...
)
x |
Numeric vector or matrix of features, i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
(Optional) Numeric vector or matrix of validation set features
must have set of columns as |
y.test |
(Optional) Numeric vector of validation set outcomes |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, or 2. 1: class.weights as in 0, divided by min(class.weights); 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
glm |
List of parameters to pass to glm |
degree |
[gS] Integer: Maximum degree of interaction. Default = 2 |
penalty |
[gS] Float: GCV penalty per knot. 0 penalizes only terms, not knots. -1 means no penalty. Default = 3 |
nk |
[gS] Integer: Maximum number of terms created by the forward pass.
See |
thresh |
[gS] Numeric: Forward stepping threshold. Forward pass terminates if RSq reduction is less than this. |
minspan |
Numeric: Minimum span of the basis functions. Default = 0 |
pmethod |
[gS] Character: Pruning method: "backward", "none", "exhaustive", "forward", "seqrep", "cv". Default = "forward" |
nprune |
[gS] Integer: Max N of terms (incl. intercept) in the pruned model |
na.action |
How to handle missing values. See |
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
n.cores |
Integer: Number of cores to use. |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If higher than 0, will print more information to the console. |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
... |
Additional parameters to pass to |
Object of class rtMod
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
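Since [gS] arguments are tuned when given more than one value, a minimal sketch that grid-searches degree and nprune on synthetic data:
## Not run:
x <- rnormmat(500, 10)
y <- x[, 3] + .5 * x[, 5]^2 + rnorm(500)
dat <- data.frame(x, y)
mod <- s_MARS(dat, degree = 1:2, nprune = c(5, 10))  # vectors trigger grid search
## End(Not run)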
Train an MLlib Random Forest model on Spark
s_MLRF(
  x, y = NULL, x.test = NULL, y.test = NULL, upsample = FALSE,
  downsample = FALSE, resample.seed = NULL, n.trees = 500L,
  max.depth = 30L, subsampling.rate = 1, min.instances.per.node = 1,
  feature.subset.strategy = "auto", max.bins = 32L, x.name = NULL,
  y.name = NULL, spark.master = "local", print.plot = FALSE,
  plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme,
  question = NULL, verbose = TRUE, trace = 0, outdir = NULL,
  save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ...
)
x |
vector, matrix or dataframe of training set features |
y |
vector of outcomes |
x.test |
vector, matrix or dataframe of testing set features |
y.test |
vector of testing set outcomes |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
n.trees |
Integer: Number of trees to train |
max.depth |
Integer: Max depth of each tree |
subsampling.rate |
Numeric: Fraction of cases to use for training each tree |
min.instances.per.node |
Integer: Min N of cases per node. |
feature.subset.strategy |
Character: The number of features to consider for splits at each tree node. Supported options: "auto" (choose automatically for task: If numTrees == 1, set to "all." If numTrees > 1 (forest), set to "sqrt" for classification and to "onethird" for regression), "all" (use all features), "onethird" (use 1/3 of the features), "sqrt" (use sqrt(number of features)), "log2" (use log2(number of features)), "n": (when n is in the range (0, 1.0], use n * number of features. When n is in the range (1, number of features), use n features). Default is "auto". |
max.bins |
Integer. Max N of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity. |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
spark.master |
Spark cluster URL or "local" |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If higher than 0, will print more information to the console. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments |
The overhead incurred by Spark means this is best used for large datasets on a Spark cluster.
See also: Spark MLlib documentation
rtMod object
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Tree-based methods: s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()
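A minimal usage sketch; assumes a working local Spark installation:
## Not run:
x <- rnormmat(10000, 10)
y <- x[, 1] - x[, 2] + rnorm(10000)
mod <- s_MLRF(x, y, n.trees = 100L, spark.master = "local")
## End(Not run)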
Convenience alias for s_GLM(class.method = "multinom").
s_MULTINOM(x, y, x.test = NULL, y.test = NULL, class.method = "multinom", ...)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
class.method |
Character: Define "logistic" or "multinom" for classification. The only purpose
of this is so you can try |
... |
Additional arguments |
Train a Naive Bayes Classifier using e1071::naiveBayes
s_NBayes(
  x, y = NULL, x.test = NULL, y.test = NULL, laplace = 0, x.name = NULL,
  y.name = NULL, print.plot = FALSE, plot.fitted = NULL,
  plot.predicted = NULL, plot.theme = rtTheme, question = NULL,
  verbose = TRUE, outdir = NULL,
  save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ...
)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
laplace |
Float (>0): Laplace smoothing. Default = 0 (no smoothing). This only affects categorical features |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments |
The laplace argument only affects categorical predictors.
rtMod
object
E.D. Gennatas
Other Supervised Learning:
s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
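A minimal usage sketch (not run), using synthetic categorical features so the laplace argument has an effect; the data are illustrative only.
## Not run:
set.seed(2020)
x <- data.frame(
  f1 = factor(sample(letters[1:3], 200, replace = TRUE)),
  f2 = factor(sample(letters[1:4], 200, replace = TRUE))
)
y <- factor(sample(c("neg", "pos"), 200, replace = TRUE))
# laplace smoothing only affects the categorical features above
mod <- s_NBayes(x, y, laplace = 1)
## End(Not run)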
Train an equivalent of a 1 hidden unit neural network with a defined nonlinear activation function
using optim
s_NLA( x, y = NULL, x.test = NULL, y.test = NULL, activation = softplus, b_o = mean(y), W_o = 1, b_h = 0, W_h = 0.01, optim.method = "BFGS", control = list(), x.name = NULL, y.name = NULL, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, trace = 0, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
activation |
Function: Activation function to use. Default = softplus |
b_o |
Float, vector (length y): Output bias. Defaults to |
W_o |
Float: Output weight. Defaults to 1 |
b_h |
Float: Hidden layer bias. Defaults to 0 |
W_h |
Float, vector (length |
optim.method |
Character: Optimization method to use: "Nelder-Mead", "BFGS", "CG", "L-BFGS-B",
"SANN", "Brent". See |
control |
List: Control parameters passed to |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If > 0, print model summary. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments to be passed to |
Since we are using optim, results will be sensitive to the combination of optimizer method (see the method argument of stats::optim for details), initialization values, and activation function.
Object of class rtemis
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning:
s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
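A minimal usage sketch (not run); the softplus-generated outcome is illustrative only, and activation = softplus and optim.method = "BFGS" are the documented defaults.
## Not run:
set.seed(2020)
x <- rnorm(500)
y <- log(1 + exp(x)) + rnorm(500, sd = .1) # softplus signal plus noise
mod <- s_NLA(x, y, activation = softplus, optim.method = "BFGS")
## End(Not run)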
Build a Nonlinear Least Squares (NLS) model
s_NLS( x, y = NULL, x.test = NULL, y.test = NULL, formula = NULL, weights = NULL, start = NULL, control = nls.control(maxiter = 200), .type = NULL, default.start = 0.1, algorithm = "default", nls.trace = FALSE, x.name = NULL, y.name = NULL, save.func = TRUE, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, verbosity = 0, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
formula |
Formula for the model. If NULL, a model is built with all predictors. |
weights |
Numeric vector: Weights for cases. For classification, |
start |
List of starting values for the parameters in the model. |
control |
Control parameters for |
.type |
Type of model to build. If NULL, a linear model is built. If "sig", a sigmoid model is built. |
default.start |
Numeric: Default starting value for all parameters |
algorithm |
Character: Algorithm to use for |
nls.trace |
Logical: If TRUE, trace information is printed during the optimization process. |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
save.func |
Logical: If TRUE, save model as character string |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
verbosity |
Integer: If > 0, print model summary |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments to be passed to |
Object of class rtemis
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning:
s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
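A minimal usage sketch (not run), fitting the sigmoid model selected by .type = "sig"; the synthetic data are illustrative only.
## Not run:
set.seed(2020)
x <- rnorm(300)
y <- 1 / (1 + exp(-4 * x)) + rnorm(300, sd = .05)
mod <- s_NLS(x, y, .type = "sig")
## End(Not run)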
Computes a kernel regression estimate using np::npreg()
s_NW( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, bw = NULL, plot.bw = FALSE, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, trace = 0, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
bw |
Bandwidth as calculate by |
plot.bw |
Logical. Plot bandwidth selector results |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If higher than 0, will print more information to the console. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional parameters to be passed to |
np::npreg allows inputs with mixed data types.
Like PPR, NW automatically models interactions, but PPR is a lot faster.
Object of class rtemis
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning:
s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
## Not run:
x <- rnorm(100)
y <- .6 * x + 12 + rnorm(100)
mod <- s_NW(x, y)
## End(Not run)
Convenience alias for s_GLM(polynomial = TRUE). Substitutes all features with poly(x, poly.d).
s_POLY(x, y, x.test = NULL, y.test = NULL, poly.d = 3, poly.raw = FALSE, ...)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
poly.d |
Integer: degree of polynomial(s) to use |
poly.raw |
Logical: if TRUE, use raw polynomials. Defaults to FALSE, resulting in
orthogonal polynomials. See |
... |
Additional arguments |
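A minimal usage sketch (not run); the cubic signal is illustrative only.
## Not run:
set.seed(2020)
x <- rnorm(200)
y <- x^3 - 2 * x + rnorm(200)
mod <- s_POLY(x, y, poly.d = 3)
## End(Not run)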
Trains a POLYMARS model using polspline::polymars and validates it.
s_PolyMARS( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, grid.resample.params = setup.grid.resample(), weights = NULL, ifw = TRUE, ifw.type = 2, upsample = FALSE, downsample = FALSE, resample.seed = NULL, maxsize = ceiling(min(6 * (nrow(x)^{ 1/3 }), nrow(x)/4, 100)), n.cores = rtCores, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, trace = 0, save.mod = FALSE, outdir = NULL, ... )
x |
Numeric vector or matrix of features, i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
(Optional) Numeric vector or matrix of validation set features
must have set of columns as |
y.test |
(Optional) Numeric vector of validation set outcomes |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer: 0, 1, or 2. 1: class.weights as in 0, divided by min(class.weights); 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness. |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
maxsize |
Integer: Maximum number of basis functions to use |
n.cores |
Integer: Number of cores to use. |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
... |
Additional parameters to pass to |
Object of class rtMod
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning:
s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
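A minimal usage sketch (not run); the synthetic data and maxsize value are illustrative only.
## Not run:
set.seed(2020)
x <- data.frame(a = rnorm(300), b = rnorm(300))
y <- x$a^2 + x$b + rnorm(300, sd = .1)
mod <- s_PolyMARS(x, y, maxsize = 10)
## End(Not run)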
Train a Projection Pursuit Regression model
s_PPR( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, grid.resample.params = setup.grid.resample(), gridsearch.type = c("exhaustive", "randomized"), gridsearch.randomized.p = 0.1, weights = NULL, nterms = NULL, max.terms = nterms, optlevel = 3, sm.method = "spline", bass = 0, span = 0, df = 5, gcvpen = 1, metric = "MSE", maximize = FALSE, n.cores = rtCores, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, trace = 0, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
gridsearch.type |
Character: Type of grid search to perform: "exhaustive" or "randomized". |
gridsearch.randomized.p |
Float (0, 1): If
|
weights |
Numeric vector: Weights for cases. For classification, |
nterms |
[gS] Integer: number of terms to include in the final model |
max.terms |
Integer: maximum number of terms to consider in the model |
optlevel |
[gS] Integer [0, 3]: optimization level (Default = 3).
See Details in |
sm.method |
[gS] Character: "supsmu", "spline", or "gcvspline". Smoothing method. Default = "spline" |
bass |
[gS] Numeric [0, 10]: for |
span |
[gS] Numeric [0, 1]: for |
df |
[gS] Numeric: for |
gcvpen |
[gS] Numeric: for |
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
n.cores |
Integer: Number of cores to use. |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
Integer: If greater than 0, print additional information to console |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments to be passed to |
[gS]: If more than one value is passed, parameter tuning via grid search will be performed on resamples of the training set prior to training the model on the full training set.
Interactions: PPR automatically models interactions; there is no need to specify them.
Object of class rtMod
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning:
s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
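A minimal usage sketch (not run); the multiplicative signal illustrates that interactions need not be specified.
## Not run:
set.seed(2020)
x <- data.frame(a = rnorm(400), b = rnorm(400))
y <- x$a * x$b + rnorm(400, sd = .1) # interaction, modeled automatically
mod <- s_PPR(x, y, nterms = 3)
## End(Not run)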
Fit a parametric survival regression model using survival::survreg
s_PSurv( x, y, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, weights = NULL, dist = "weibull", control = survival::survreg.control(), print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, trace = 0, save.mod = FALSE, outdir = NULL, ... )
x |
Numeric vector or matrix of features, i.e. independent variables |
y |
Object of class "Surv" created using |
x.test |
(Optional) Numeric vector or matrix of testing set features
must have set of columns as |
y.test |
(Optional) Object of class "Surv" created using |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
weights |
Float: Vector of case weights |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
... |
Additional parameters to pass to |
Object of class rtMod
E.D. Gennatas
train_cv for external cross-validation
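A minimal usage sketch (not run); the simulated survival times and censoring indicator are illustrative only.
## Not run:
library(survival)
set.seed(2020)
x <- rnorm(200)
time <- rexp(200, rate = exp(-x)) # survival time depends on x
status <- rbinom(200, 1, .8)      # 1 = event, 0 = censored
mod <- s_PSurv(x, Surv(time, status), dist = "weibull")
## End(Not run)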
Train a QDA Classifier using MASS::qda
s_QDA( x, y = NULL, x.test = NULL, y.test = NULL, prior = NULL, method = "moment", nu = NULL, x.name = NULL, y.name = NULL, upsample = FALSE, downsample = FALSE, resample.seed = NULL, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
prior |
Numeric, vector (length = N classes of outcome variable): Prior probabilities |
method |
Character: "moment", "mle", "mve", or "t". See |
nu |
Integer: Degrees of freedom for method "t" |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness. |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments |
rtMod
object
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning:
s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
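A minimal usage sketch (not run):
## Not run:
x <- iris[, 1:4]
y <- iris$Species
mod <- s_QDA(x, y)
## End(Not run)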
Train an ensemble of Neural Networks to perform Quantile Regression using qrnn
s_QRNN( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, n.hidden = 1, tau = 0.5, n.ensemble = 5, iter.max = 5000, n.trials = 5, bag = TRUE, lower = -Inf, eps.seq = 2^(-8:-32), Th = qrnn::sigmoid, Th.prime = qrnn::sigmoid.prime, penalty = 0, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
n.hidden |
Integer: Number of hidden nodes. |
tau |
Numeric: tau-quantile. |
n.ensemble |
Integer: Number of NNs to train. |
iter.max |
Integer: Max N of iteration of the optimization algorithm. |
n.trials |
Integer: N of trials. Used to avoid local minima. |
bag |
Logical: If TRUE, use bagging. |
lower |
Numeric: Left censoring point. |
eps.seq |
Numeric: sequence of eps values for the finite smoothing algorithm. |
Th |
Function: hidden layer transfer function; use |
Th.prime |
Function: derivative of hidden layer transfer function. |
penalty |
Numeric: weight penalty for weight decay regularization. |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments to be passed to |
For more details on hyperparameters, see qrnn::qrnn.fit
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning:
s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
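A minimal usage sketch (not run); the heteroskedastic noise motivates estimating an upper quantile via tau.
## Not run:
set.seed(2020)
x <- rnorm(400)
y <- x + rnorm(400) * (1 + abs(x)) # noise grows with |x|
mod90 <- s_QRNN(x, y, tau = .9)    # 90th conditional percentile
## End(Not run)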
Train a Random Forest for regression or classification using ranger
s_Ranger( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, n.trees = 1000, weights = NULL, ifw = TRUE, ifw.type = 2, ifw.case.weights = TRUE, ifw.class.weights = FALSE, upsample = FALSE, downsample = FALSE, resample.seed = NULL, autotune = FALSE, classwt = NULL, n.trees.try = 500, stepFactor = 2, mtry = NULL, mtryStart = NULL, inbag.resample = NULL, stratify.on.y = FALSE, grid.resample.params = setup.resample("kfold", 5), gridsearch.type = c("exhaustive", "randomized"), gridsearch.randomized.p = 0.1, metric = NULL, maximize = NULL, probability = NULL, importance = "impurity", local.importance = FALSE, replace = TRUE, min.node.size = NULL, splitrule = NULL, strata = NULL, sampsize = if (replace) nrow(x) else ceiling(0.632 * nrow(x)), tune.do.trace = FALSE, imetrics = FALSE, n.cores = rtCores, print.tune.plot = FALSE, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, grid.verbose = verbose, verbose = TRUE, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
n.trees |
Integer: Number of trees to grow. Default = 1000 |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer: 0, 1, or 2. 1: class.weights as in 0, divided by min(class.weights); 2: class.weights as in 0, divided by max(class.weights) |
ifw.case.weights |
Logical: If TRUE, define ranger's
|
ifw.class.weights |
Logical: If TRUE, define ranger's
|
upsample |
Logical: If TRUE, upsample training set cases not belonging to the majority outcome group |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
autotune |
Logical: If TRUE, use |
classwt |
Vector, Float: Priors of the classes for
|
n.trees.try |
Integer: Number of trees to train for tuning, if |
stepFactor |
Float: If |
mtry |
[gS] Integer: Number of features sampled randomly at each split. Defaults to square root of n of features for classification, and a third of n of features for regression. |
mtryStart |
Integer: If |
inbag.resample |
List, length |
stratify.on.y |
Logical: If TRUE, overrides |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
gridsearch.type |
Character: Type of grid search to perform: "exhaustive" or "randomized". |
gridsearch.randomized.p |
Float (0, 1): If
|
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
probability |
Logical: If TRUE, grow a probability forest.
See |
importance |
Character: "none", "impurity", "impurity_corrected", or "permutation" Default = "impurity" |
local.importance |
Logical: If TRUE, return local importance values.
Only applicable if
|
replace |
Logical: If TRUE, sample cases with replacement during training. |
min.node.size |
[gS] Integer: Minimum node size |
splitrule |
Character: For classification: "gini" (Default) or "extratrees"; For regression: "variance" (Default), "extratrees" or "maxstat". For survival "logrank" (Default), "extratrees", "C" or "maxstat". |
strata |
Vector, Factor: Will be used for stratified sampling |
sampsize |
Integer: Size of sample to draw. In Classification, if |
tune.do.trace |
Same as |
imetrics |
Logical: If TRUE, calculate interpretability metrics
(N of trees and N of nodes) and save under the |
n.cores |
Integer: Number of cores to use. |
print.tune.plot |
Logical: passed to |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
grid.verbose |
Logical: Passed to |
verbose |
Logical: If TRUE, print summary to screen. |
outdir |
String, Optional: Path to directory to save output |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments to be passed to |
You should consider, or try, setting mtry to NCOL(x), especially for a small number of features.
By default mtry is set to NCOL(x) for NCOL(x) <= 20.
For imbalanced datasets, setting stratify.on.y = TRUE should improve performance.
If autotune = TRUE, randomForest::tuneRF will be run to determine the best mtry value.
[gS]: the indicated parameter will be tuned by grid search if more than one value is passed.
See the Tech Report comparing balanced (ifw.case.weights = TRUE) and weighted (ifw.class.weights = TRUE) Random Forests.
rtMod
object
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning:
s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Tree-based methods:
s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_XGBoost(), s_XRF()
Other Ensembles:
s_AdaBoost(), s_GBM(), s_RF()
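A minimal usage sketch (not run); passing multiple mtry values triggers grid search, per the [gS] notation above.
## Not run:
x <- iris[, 1:4]
y <- iris$Species
mod <- s_Ranger(x, y, mtry = 2:4) # [gS] tuned by grid search
## End(Not run)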
Train a Random Forest for regression or classification using randomForest
s_RF( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, n.trees = 1000, autotune = FALSE, n.trees.try = 1000, stepFactor = 1.5, mtry = NULL, nodesize = NULL, maxnodes = NULL, mtryStart = mtry, grid.resample.params = setup.resample("kfold", 5), metric = NULL, maximize = NULL, classwt = NULL, ifw = TRUE, ifw.type = 2, upsample = FALSE, downsample = FALSE, resample.seed = NULL, importance = TRUE, proximity = FALSE, replace = TRUE, strata = NULL, sampsize = if (replace) nrow(x) else ceiling(0.632 * nrow(x)), sampsize.ratio = NULL, do.trace = NULL, tune.do.trace = FALSE, imetrics = FALSE, n.cores = rtCores, print.tune.plot = FALSE, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, proximity.tsne = FALSE, discard.forest = FALSE, tsne.perplexity = 5, plot.tsne.train = FALSE, plot.tsne.test = FALSE, question = NULL, verbose = TRUE, grid.verbose = verbose, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
n.trees |
Integer: Number of trees to grow. Default = 1000 |
autotune |
Logical: If TRUE, use |
n.trees.try |
Integer: Number of trees to train for tuning, if |
stepFactor |
Float: If |
mtry |
[gS] Integer: Number of features sampled randomly at each split |
nodesize |
[gS]: Integer: Minimum size of terminal nodes. Default = 5 (Regression); 1 (Classification) |
maxnodes |
[gS]: Integer: Maximum number of terminal nodes in a tree. Default = NULL; trees grown to maximum possible |
mtryStart |
Integer: If |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
classwt |
Vector, Float: Priors of the classes for classification only. Need not add up to 1 |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer: 0, 1, or 2. 1: class.weights as in 0, divided by min(class.weights); 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample training set cases not belonging to the majority outcome group |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
importance |
Logical: If TRUE, estimate variable relative importance. |
proximity |
Logical: If TRUE, calculate proximity measure among cases. |
replace |
Logical: If TRUE, sample cases with replacement during training. |
strata |
Vector, Factor: Will be used for stratified sampling |
sampsize |
Integer: Size of sample to draw. In Classification, if |
sampsize.ratio |
Float (0, 1): Heuristic to increase sensitivity in unbalanced
cases. Sample with replacement from the minority class to create bootstraps of length N cases.
Select |
do.trace |
Logical or integer: If TRUE, |
tune.do.trace |
Same as |
imetrics |
Logical: If TRUE, calculate interpretability metrics
(N of trees and N of nodes) and save under the |
n.cores |
Integer: Number of cores to use. |
print.tune.plot |
Logical: passed to |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
proximity.tsne |
Logical: If TRUE, perform t-SNE on proximity matrix. Will be saved under 'extra' field of
|
discard.forest |
Logical: If TRUE, remove forest from |
tsne.perplexity |
Numeric: Perplexity parameter for |
plot.tsne.train |
Logical: If TRUE, plot training set tSNE projections |
plot.tsne.test |
Logical: If TRUE, plot testing set tSNE projections |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
grid.verbose |
Logical: Passed to |
outdir |
String, Optional: Path to directory to save output |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments to be passed to |
If autotune = TRUE, randomForest::tuneRF will be run to determine the best mtry value.
rtMod
object
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning:
s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Tree-based methods:
s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RFSRC(), s_Ranger(), s_XGBoost(), s_XRF()
Other Ensembles:
s_AdaBoost(), s_GBM(), s_Ranger()
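A minimal usage sketch (not run), using autotune to select mtry via randomForest::tuneRF:
## Not run:
x <- iris[, 1:4]
y <- iris$Species
mod <- s_RF(x, y, autotune = TRUE)
## End(Not run)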
Train a Random Forest for Regression, Classification, or Survival Regression
using randomForestSRC
s_RFSRC( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, n.trees = 1000, weights = NULL, ifw = TRUE, ifw.type = 2, upsample = FALSE, downsample = FALSE, resample.seed = NULL, bootstrap = "by.root", mtry = NULL, importance = TRUE, proximity = TRUE, nodesize = if (!is.null(y) && !is.factor(y)) 5 else 1, nodedepth = NULL, na.action = "na.impute", trace = FALSE, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix of features, i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
(Optional) Numeric vector or matrix of validation set features
must have set of columns as |
y.test |
(Optional) Numeric vector of validation set outcomes |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
n.trees |
Integer: Number of trees to grow. The more the merrier. |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer: 0, 1, or 2. 1: class.weights as in 0, divided by min(class.weights); 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness. |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
bootstrap |
Character: |
mtry |
Integer: Number of features sampled randomly at each split |
importance |
Logical: If TRUE, calculate variable importance. |
proximity |
Character or Logical: "inbag", "oob", "all", TRUE, or FALSE; passed
to |
nodesize |
Integer: Minimum size of terminal nodes. |
nodedepth |
Integer: Maximum tree depth. |
na.action |
Character: How to handle missing values. |
trace |
Integer: Number of seconds between messages to the console. |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
outdir |
Optional. Path to directory to save output |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments to be passed to |
For Survival Regression, y must be an object of type Surv, created using survival::Surv(time, status).
mtry is the only tunable parameter, but it usually only makes a small difference and is often not tuned.
Object of class rtMod
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning:
s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
Other Tree-based methods:
s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_Ranger(), s_XGBoost(), s_XRF()
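A minimal Survival Regression sketch (not run); the simulated data are illustrative only.
## Not run:
library(survival)
set.seed(2020)
x <- data.frame(a = rnorm(200), b = rnorm(200))
time <- rexp(200)
status <- rbinom(200, 1, .7)
mod <- s_RFSRC(x, Surv(time, status))
## End(Not run)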
Convenience alias for s_LM(robust = TRUE). Uses MASS::rlm.
s_RLM(x, y, x.test = NULL, y.test = NULL, ...)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
... |
Additional parameters to be passed to |
Train a gradient boosting model, extract rules, and fit using LASSO
s_RuleFit( x, y = NULL, x.test = NULL, y.test = NULL, gbm.params = list(list(n.trees = 300, bag.fraction = 1, shrinkage = 0.1, interaction.depth = 3, ifw = TRUE)), meta.alpha = 1, meta.lambda = NULL, meta.extra.params = list(ifw = TRUE), cases.by.rules = NULL, x.name = NULL, y.name = NULL, n.cores = rtCores, question = NULL, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, outdir = NULL, save.mod = if (!is.null(outdir)) TRUE else FALSE, verbose = TRUE )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
gbm.params |
List of named lists: A list, each element of which is a named list
of parameters for s_GBM. i.e. If you want to train a single GBM model, this could
be:
|
meta.alpha |
Float [0, 1]: |
meta.lambda |
Float: |
meta.extra.params |
Named list: Parameters for s_GLMNET for the feature selection step |
cases.by.rules |
Matrix of cases by rules from a previous RuleFit run. If provided, the GBM step is skipped. Default = NULL |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
n.cores |
Integer: Number of cores to use |
question |
Character: the question you are attempting to answer with this model, in plain language. |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
outdir |
Character: If defined, save log, 'plot.all' plots (see above) and RDS file of complete output |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
verbose |
Logical: If TRUE, print summary to screen. |
Based on "Predictive Learning via Rule Ensembles" by Friedman and Popescu http://statweb.stanford.edu/~jhf/ftp/RuleFit.pdf
rtMod
object
E.D. Gennatas
Friedman JH, Popescu BE, "Predictive Learning via Rule Ensembles", http://statweb.stanford.edu/~jhf/ftp/RuleFit.pdf
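A minimal usage sketch (not run), relying on the default single-GBM gbm.params; the two-class iris subset is illustrative only.
## Not run:
datc2 <- iris[51:150, ]
datc2$Species <- factor(datc2$Species) # two-class outcome
mod <- s_RuleFit(datc2[, 1:4], datc2$Species)
## End(Not run)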
Train an SDA Classifier using sparseLDA::sda
s_SDA( x, y = NULL, x.test = NULL, y.test = NULL, lambda = 1e-06, stop = NULL, maxIte = 100, Q = NULL, tol = 1e-06, .preprocess = setup.preprocess(scale = TRUE, center = TRUE), upsample = TRUE, downsample = FALSE, resample.seed = NULL, x.name = NULL, y.name = NULL, grid.resample.params = setup.resample("kfold", 5), gridsearch.type = c("exhaustive", "randomized"), gridsearch.randomized.p = 0.1, metric = NULL, maximize = NULL, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, grid.verbose = verbose, trace = 0, outdir = NULL, n.cores = rtCores, save.mod = ifelse(!is.null(outdir), TRUE, FALSE) )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
lambda |
L2-norm weight for elastic net regression |
stop |
If stop is negative, its absolute value corresponds to the desired number of variables. If stop is positive, it corresponds to an upper bound on the L1-norm of the b coefficients. There is a one-to-one correspondence between stop and t. The default is -p (minus the number of variables). |
maxIte |
Integer: Maximum number of iterations |
Q |
Integer: Number of components |
tol |
Numeric: Tolerance for change in RSS, which is the stopping criterion |
.preprocess |
List of preprocessing parameters. Scaling and centering are enabled by default, because they are crucial for the algorithm to learn. |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness. |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
gridsearch.type |
Character: Type of grid search to perform: "exhaustive" or "randomized". |
gridsearch.randomized.p |
Float (0, 1): If
|
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
grid.verbose |
Logical: Passed to |
trace |
Integer: passed to |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
n.cores |
Integer: Number of cores to use. |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
rtMod
object
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning:
s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
## Not run:
datc2 <- iris[51:150, ]
datc2$Species <- factor(datc2$Species)
resc2 <- resample(datc2)
datc2_train <- datc2[resc2$Subsample_1, ]
datc2_test <- datc2[-resc2$Subsample_1, ]
# Without scaling or centering, fails to learn
mod_c2 <- s_SDA(datc2_train, datc2_test, .preprocess = NULL)
# Learns fine with default settings (scaling & centering)
mod_c2 <- s_SDA(datc2_train, datc2_test)
## End(Not run)
Train a model by Stochastic Gradient Descent using sgd::sgd
s_SGD( x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL, y.name = NULL, model = NULL, model.control = list(lambda1 = 0, lambda2 = 0), sgd.control = list(method = "ai-sgd"), upsample = FALSE, downsample = FALSE, resample.seed = NULL, print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
model |
character specifying the model to be used: |
model.control |
a list of parameters for controlling the model.
|
sgd.control |
an optional list of parameters for controlling the estimation.
|
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness. |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments to be passed to |
From sgd::sgd: "Models: The Cox model assumes that the survival data is ordered when passed in, i.e., such that the risk set of an observation i is all data points after it."
Object of class rtemis
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning:
s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
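A minimal usage sketch (not from the package's own examples; data are simulated for illustration) requiring the sgd package:
## Not run:
x <- rnorm(200)
y <- 0.5 * x + rnorm(200)
mod <- s_SGD(x, y)
## End(Not run)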
Train an SPLS model using spls::spls
(Regression) and spls::splsda
(Classification)
s_SPLS(
  x, y = NULL, x.test = NULL, y.test = NULL,
  x.name = NULL, y.name = NULL,
  upsample = TRUE, downsample = FALSE, resample.seed = NULL,
  k = 2, eta = 0.5, kappa = 0.5,
  select = "pls2", fit = "simpls",
  scale.x = TRUE, scale.y = TRUE, maxstep = 100,
  classifier = c("lda", "logistic"),
  grid.resample.params = setup.resample("kfold", 5),
  gridsearch.type = c("exhaustive", "randomized"),
  gridsearch.randomized.p = 0.1,
  metric = NULL, maximize = NULL,
  print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL,
  plot.theme = rtTheme, question = NULL, verbose = TRUE,
  trace = 0, grid.verbose = verbose,
  outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE),
  n.cores = rtCores, ...
)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness. |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
k |
[gS] Integer: Number of components to estimate. |
eta |
[gS] Float [0, 1): Thresholding parameter. |
kappa |
[gS] Float [0, .5]: Only relevant for multivariate responses: controls effect of concavity of objective function. |
select |
[gS] Character: "pls2", "simpls". PLS algorithm for variable selection. |
fit |
[gS] Character: "kernelpls", "widekernelpls", "simpls", "oscorespls". Algorithm for model fitting. |
scale.x |
Logical: if TRUE, scale features by dividing each column by its sample standard deviation |
scale.y |
Logical: if TRUE, scale outcomes by dividing each column by its sample standard deviation |
maxstep |
[gS] Integer: Maximum number of iteration when fitting direction vectors. |
classifier |
Character: Classifier used by |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
gridsearch.type |
Character: Type of grid search to perform: "exhaustive" or "randomized". |
gridsearch.randomized.p |
Float (0, 1): If
|
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
trace |
If > 0 print diagnostic messages |
grid.verbose |
Logical: Passed to |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
n.cores |
Integer: Number of cores to be used by
|
... |
Additional parameters to be passed to |
[gS] denotes that an argument can be passed as a vector of values, which will trigger a grid search using gridSearchLearn.
Object of class rtemis
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning:
s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SVM(), s_TFN(), s_XGBoost(), s_XRF()
## Not run:
x <- rnorm(100)
y <- .6 * x + 12 + rnorm(100)
mod <- s_SPLS(x, y)
## End(Not run)
Train an SVM learner using e1071::svm
s_SVM(
  x, y = NULL, x.test = NULL, y.test = NULL,
  x.name = NULL, y.name = NULL,
  grid.resample.params = setup.resample("kfold", 5),
  gridsearch.type = c("exhaustive", "randomized"),
  gridsearch.randomized.p = 0.1,
  class.weights = NULL, ifw = TRUE, ifw.type = 2,
  upsample = FALSE, downsample = FALSE, resample.seed = NULL,
  kernel = "radial", degree = 3, gamma = NULL, coef0 = 0, cost = 1,
  probability = TRUE, metric = NULL, maximize = NULL,
  plot.fitted = NULL, plot.predicted = NULL,
  print.plot = FALSE, plot.theme = rtTheme,
  n.cores = rtCores, question = NULL, verbose = TRUE,
  grid.verbose = verbose,
  outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE),
  ...
)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
gridsearch.type |
Character: Type of grid search to perform: "exhaustive" or "randomized". |
gridsearch.randomized.p |
Float (0, 1): If
|
class.weights |
Float, length = n levels of outcome: Weights for each
outcome class. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, or 2. 1: class.weights as in 0, divided by min(class.weights). 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness. |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
kernel |
Character: "linear", "polynomial", "radial", "sigmoid" |
degree |
[gS] Integer: Degree for |
gamma |
[gS] Float: Parameter used in all kernels except |
coef0 |
[gS] Float: Parameter used by kernels |
cost |
[gS] Float: Cost of constraints violation; the C constant of the regularization term in the Lagrange formulation. |
probability |
Logical: If TRUE, model allows probability estimates |
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
print.plot |
Logical: if TRUE, produce plot using |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
n.cores |
Integer: Number of cores to use. |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
grid.verbose |
Logical: Passed to |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments to be passed to |
[gS] denotes parameters that will be tuned by cross-validation if more than one value is passed. For SVM tuning, the guide from the LIBSVM authors can be useful: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf. They suggest searching over cost = 2^seq(-5, 15, 2) and gamma = 2^seq(-15, 3, 2).
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning:
s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_TFN(), s_XGBoost(), s_XRF()
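A hedged tuning sketch based on the LIBSVM grid suggested in the Details; passing vectors to [gS] parameters triggers the internal grid search (dataset and values are illustrative only):
## Not run:
dat <- iris[51:150, ]
dat$Species <- factor(dat$Species)
mod <- s_SVM(dat,
             cost = 2^seq(-5, 15, 2),
             gamma = 2^seq(-15, 3, 2))
## End(Not run)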
Train a Feedforward Neural Network using keras and tensorflow
s_TFN(
  x, y = NULL, x.test = NULL, y.test = NULL,
  class.weights = NULL, ifw = TRUE, ifw.type = 2,
  upsample = FALSE, downsample = FALSE, resample.seed = NULL,
  net = NULL, n.hidden.nodes = NULL,
  initializer = c("glorot_uniform", "glorot_normal", "he_uniform",
    "he_normal", "lecun_uniform", "lecun_normal", "random_uniform",
    "random_normal", "variance_scaling", "truncated_normal",
    "orthogonal", "zeros", "ones", "constant"),
  initializer.seed = NULL, dropout = 0,
  activation = c("relu", "selu", "elu", "sigmoid", "hard_sigmoid",
    "tanh", "exponential", "linear", "softmax", "softplus", "softsign"),
  kernel_l1 = 0.1, kernel_l2 = 0,
  activation_l1 = 0, activation_l2 = 0,
  batch.normalization = TRUE,
  output = NULL, loss = NULL,
  optimizer = c("rmsprop", "adadelta", "adagrad", "adam", "adamax",
    "nadam", "sgd"),
  learning.rate = NULL, metric = NULL,
  epochs = 100, batch.size = NULL, validation.split = 0.2,
  callback = keras::callback_early_stopping(patience = 150),
  scale = TRUE, x.name = NULL, y.name = NULL,
  print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL,
  plot.theme = rtTheme, question = NULL, verbose = TRUE,
  outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE),
  ...
)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
class.weights |
Numeric vector: Class weights for training. |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, or 2. 1: class.weights as in 0, divided by min(class.weights). 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness. |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
net |
Pre-defined keras network to be trained (optional) |
n.hidden.nodes |
Integer vector: Length must be equal to the number of hidden layers you wish to create. Can be zero, in which case you get a linear model. Default = N of features, i.e. NCOL(x) |
initializer |
Character: Initializer to use for each layer: "glorot_uniform", "glorot_normal", "he_uniform", "he_normal", "lecun_uniform", "lecun_normal", "random_uniform", "random_normal", "variance_scaling", "truncated_normal", "orthogonal", "zeros", "ones", "constant". Glorot is also known as Xavier initialization. |
initializer.seed |
Integer: Seed to use for each initializer for reproducibility. |
dropout |
Float, vector, (0, 1): Probability of dropping nodes. Can be a vector of length equal to N of layers, otherwise will be recycled. Default = 0 |
activation |
String vector: Activation type to use: "relu", "selu", "elu", "sigmoid", "hard_sigmoid", "tanh", "exponential", "linear", "softmax", "softplus", "softsign". Defaults to "relu" for Classification and "tanh" for Regression |
kernel_l1 |
Float: l1 penalty on weights. |
kernel_l2 |
Float: l2 penalty on weights. |
activation_l1 |
Float: l1 penalty on layer output. |
activation_l2 |
Float: l2 penalty on layer output. |
batch.normalization |
Logical: If TRUE, batch normalize after each hidden layer. |
output |
Character: Activation to use for output layer. Can be any as in |
loss |
Character: Loss to use: Default = "mean_squared_error" for regression, "binary_crossentropy" for binary classification, "sparse_categorical_crossentropy" for multiclass |
optimizer |
Character: Optimization to use: "rmsprop", "adadelta", "adagrad", "adam", "adamax", "nadam", "sgd". Default = "rmsprop" |
learning.rate |
Float: learning rate. Defaults depend on |
metric |
Character: Metric used for evaluation during train. Default = "mse" for regression, "accuracy" for classification. |
epochs |
Integer: Number of epochs. Default = 100 |
batch.size |
Integer: Batch size. Default = N of cases |
validation.split |
Float (0, 1): proportion of training data to use for validation. Default = .2 |
callback |
Function to be called by keras during fitting.
Default = |
scale |
Logical: If TRUE, scale features before training.
Column means and standard deviations will be saved in |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional parameters |
For more information on arguments and hyperparameters, see https://keras.rstudio.com/ and https://keras.io/. It is important to define the network structure and adjust hyperparameters based on your problem; do not expect the defaults to work well on any given dataset.
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning:
s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_XGBoost(), s_XRF()
Other Deep Learning:
d_H2OAE(), s_H2ODL()
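A minimal hedged sketch, assuming keras and tensorflow are installed and configured; the architecture shown is illustrative, not a recommendation:
## Not run:
x <- matrix(rnorm(300 * 10), 300, 10)
y <- as.numeric(x %*% rnorm(10) + rnorm(300))
mod <- s_TFN(x, y,
             n.hidden.nodes = c(16, 8),
             activation = "tanh",
             epochs = 200)
## End(Not run)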
A minimal function to perform total least squares regression
s_TLS( x, y = NULL, x.test = NULL, y.test = NULL, x.name = "x", y.name = "y", print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL, plot.theme = rtTheme, question = NULL, verbose = TRUE, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments |
The main difference between a linear model and TLS is that the latter assumes error in the features as well as the outcome. The solution is essentially the projection onto the first principal axis.
E.D. Gennatas
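A short sketch of the geometric interpretation above: with a single feature, the TLS slope can be recovered from the first principal axis of the centered (x, y) cloud (illustrative, using base R's prcomp):
## Not run:
x <- rnorm(100)
y <- 2 * x + rnorm(100)
v <- prcomp(cbind(x, y))$rotation[, 1] # first principal axis
slope_tls <- v[2] / v[1]
mod <- s_TLS(x, y) # expected to estimate a similar slope
## End(Not run)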
Tune hyperparameters using grid search and resampling, train a final model, and validate it
s_XGBoost(
  x, y = NULL, x.test = NULL, y.test = NULL,
  x.name = NULL, y.name = NULL,
  booster = c("gbtree", "gblinear", "dart"),
  missing = NA, nrounds = 1000L, force.nrounds = NULL,
  weights = NULL, ifw = TRUE, ifw.type = 2,
  upsample = FALSE, downsample = FALSE, resample.seed = NULL,
  obj = NULL, feval = NULL, xgb.verbose = NULL,
  print_every_n = 100L, early_stopping_rounds = 50L,
  eta = 0.01, gamma = 0, max_depth = 2, min_child_weight = 5,
  max_delta_step = 0, subsample = 0.75,
  colsample_bytree = 1, colsample_bylevel = 1,
  lambda = 0, alpha = 0,
  tree_method = "auto", sketch_eps = 0.03,
  num_parallel_tree = 1, base_score = NULL, objective = NULL,
  sample_type = "uniform", normalize_type = "forest",
  rate_drop = 0, one_drop = 0, skip_drop = 0,
  grid.resample.params = setup.resample("kfold", 5),
  gridsearch.type = "exhaustive",
  metric = NULL, maximize = NULL, importance = NULL,
  print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL,
  plot.theme = rtTheme, question = NULL, verbose = TRUE,
  grid.verbose = FALSE, trace = 0, save.gridrun = FALSE,
  n.cores = 1, nthread = rtCores,
  outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE),
  .gs = FALSE, ...
)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
booster |
Character: "gbtree", "gblinear": Booster to use. |
missing |
String or Numeric: Which values to consider as missing. |
nrounds |
Integer: Maximum number of rounds to run. Can be set to a high number as early stopping will limit nrounds by monitoring inner CV error |
force.nrounds |
Integer: Number of rounds to run if not estimating optimal number by CV |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, or 2. 1: class.weights as in 0, divided by min(class.weights). 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness. |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
obj |
Function: Custom objective function. See |
feval |
Function: Custom evaluation function. See |
xgb.verbose |
Integer: Verbose level for XGB learners used for tuning. |
print_every_n |
Integer: Print evaluation metrics every this many iterations |
early_stopping_rounds |
Integer: Training on resamples of |
eta |
[gS] Numeric (0, 1): Learning rate. |
gamma |
[gS] Numeric: Minimum loss reduction required to make further partition |
max_depth |
[gS] Integer: Maximum tree depth. |
min_child_weight |
[gS] Numeric: Minimum sum of instance weight needed in a child. |
max_delta_step |
[gS] Numeric: Maximum delta step we allow each leaf output to be. 0 means no constraint. 1-10 may help control the update, especially with imbalanced outcomes. |
subsample |
[gS] Numeric: subsample ratio of the training instance |
colsample_bytree |
[gS] Numeric: subsample ratio of columns when constructing each tree |
colsample_bylevel |
[gS] Numeric |
lambda |
[gS] L2 regularization on weights |
alpha |
[gS] L1 regularization on weights |
tree_method |
[gS] XGBoost tree construction algorithm |
sketch_eps |
[gS] Numeric (0, 1): |
num_parallel_tree |
Integer: N of trees to grow in parallel: Results in Random Forest -like algorithm. (Default = 1; i.e. regular boosting) |
base_score |
Numeric: The mean outcome response. |
objective |
(Default = NULL) |
sample_type |
Character: Type of sampling algorithm for |
normalize_type |
Character. |
rate_drop |
[gS] Numeric: Dropout rate for |
one_drop |
[gS] Integer 0, 1: When this flag is enabled, at least one tree is always dropped during the dropout. |
skip_drop |
[gS] Numeric [0, 1]: Probability of skipping the dropout
procedure during a boosting iteration. If a dropout is skipped, new trees are added
in the same manner as gbtree. Non-zero |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
gridsearch.type |
Character: Type of grid search to perform: "exhaustive" or "randomized". |
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
importance |
Logical: If TRUE, calculate variable importance. |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
grid.verbose |
Logical: Passed to |
trace |
Integer: If > 0, print parameter values to console. |
save.gridrun |
Logical: If TRUE, save grid search models. |
n.cores |
Integer: Number of cores to use. |
nthread |
Integer: Number of threads for xgboost using OpenMP. Only parallelize resamples
using |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
.gs |
Internal use only |
... |
Additional arguments passed to |
[gS]: indicates parameter will be autotuned by grid search if multiple values are passed. Learn more about XGBoost's parameters here: http://xgboost.readthedocs.io/en/latest/parameter.html
rtMod object
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning:
s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XRF()
Other Tree-based methods:
s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XRF()
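A hedged sketch of grid-search tuning: passing vectors for [gS] parameters (here eta and max_depth, chosen for illustration) triggers tuning via the internal grid search:
## Not run:
dat <- iris[51:150, ]
dat$Species <- factor(dat$Species)
mod <- s_XGBoost(dat,
                 eta = c(0.01, 0.1),
                 max_depth = c(2, 4))
## End(Not run)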
Tune hyperparameters using grid search and resampling, train a final model, and validate it
s_XRF(
  x, y = NULL, x.test = NULL, y.test = NULL,
  x.name = NULL, y.name = NULL,
  num_parallel_tree = 1000,
  booster = c("gbtree", "gblinear", "dart"),
  missing = NA, nrounds = 1,
  weights = NULL, ifw = TRUE, ifw.type = 2,
  upsample = FALSE, downsample = FALSE, resample.seed = NULL,
  obj = NULL, feval = NULL, xgb.verbose = NULL,
  print_every_n = 100L, early_stopping_rounds = 50L,
  eta = 1, gamma = 0, max_depth = 12, min_child_weight = 1,
  max_delta_step = 0, subsample = 0.75,
  colsample_bytree = 1, colsample_bylevel = 1,
  lambda = 0, alpha = 0,
  tree_method = "auto", sketch_eps = 0.03,
  base_score = NULL, objective = NULL,
  sample_type = "uniform", normalize_type = "forest",
  rate_drop = 0, one_drop = 0, skip_drop = 0,
  .gs = FALSE,
  grid.resample.params = setup.resample("kfold", 5),
  gridsearch.type = "exhaustive",
  metric = NULL, maximize = NULL, importance = TRUE,
  print.plot = FALSE, plot.fitted = NULL, plot.predicted = NULL,
  plot.theme = rtTheme, question = NULL, verbose = TRUE,
  grid.verbose = FALSE, trace = 0, save.gridrun = FALSE,
  n.cores = 1, nthread = rtCores,
  outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE),
  ...
)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
num_parallel_tree |
Integer: Number of trees to grow |
booster |
Character: Booster to use. Options: "gbtree", "gblinear" |
missing |
String or Numeric: Which values to consider as missing. Default = NA |
nrounds |
Integer: Maximum number of rounds to run. Can be set to a high number as early stopping will limit nrounds by monitoring inner CV error |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, or 2. 1: class.weights as in 0, divided by min(class.weights). 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness. |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
obj |
Function: Custom objective function. See |
feval |
Function: Custom evaluation function. See |
xgb.verbose |
Integer: Verbose level for XGB learners used for tuning. |
print_every_n |
Integer: Print evaluation metrics every this many iterations |
early_stopping_rounds |
Integer: Training on resamples of |
eta |
[gS] Numeric (0, 1): Learning rate. |
gamma |
[gS] Numeric: Minimum loss reduction required to make further partition |
max_depth |
[gS] Integer: Maximum tree depth. |
min_child_weight |
[gS] Numeric: Minimum sum of instance weight needed in a child. |
max_delta_step |
[gS] Numeric: Maximum delta step we allow each leaf output to be. 0 means no constraint. 1-10 may help control the update, especially with imbalanced outcomes. |
subsample |
[gS] Numeric: subsample ratio of the training instance |
colsample_bytree |
[gS] Numeric: subsample ratio of columns when constructing each tree |
colsample_bylevel |
[gS] Numeric |
lambda |
[gS] L2 regularization on weights |
alpha |
[gS] L1 regularization on weights |
tree_method |
[gS] XGBoost tree construction algorithm |
sketch_eps |
[gS] Numeric (0, 1): |
base_score |
Numeric: The mean outcome response (Defaults to mean) |
objective |
(Default = NULL) |
sample_type |
Character. Default = "uniform" |
normalize_type |
Character. Default = "forest" |
rate_drop |
[gS] Numeric: Dropout rate for |
one_drop |
[gS] Integer 0, 1: When this flag is enabled, at least one tree is always dropped during the dropout. |
skip_drop |
[gS] Numeric [0, 1]: Probability of skipping the dropout
procedure during a boosting iteration. If a dropout is skipped, new trees are added
in the same manner as gbtree. Non-zero |
.gs |
Internal use only |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
gridsearch.type |
Character: Type of grid search to perform: "exhaustive" or "randomized". |
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
importance |
Logical: If TRUE, calculate variable importance. |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
grid.verbose |
Logical: Passed to |
trace |
Integer: If higher than 0, will print more information to the console. |
save.gridrun |
Logical: If TRUE, save grid search models. |
n.cores |
Integer: Number of cores to use. |
nthread |
Integer: Number of threads for xgboost using OpenMP. Only parallelize resamples
using |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
... |
Additional arguments |
[gS]: indicates parameter will be autotuned by grid search if multiple values are passed. Learn more about XGBoost's parameters here: http://xgboost.readthedocs.io/en/latest/parameter.html
rtMod object
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning:
s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XGBoost()
Other Tree-based methods:
s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XGBoost()
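The s_XRF defaults (nrounds = 1, eta = 1, num_parallel_tree = 1000) amount to growing a random forest with the XGBoost backend. A minimal hedged sketch (data simulated for illustration):
## Not run:
x <- rnorm(200)
y <- x^2 + rnorm(200)
mod <- s_XRF(x, y, num_parallel_tree = 500)
## End(Not run)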
Save rtemis model to PMML file
savePMML( x, filename, transforms = NULL, model_name = NULL, model_version = NULL, description = NULL, copyright = NULL, ... )
x |
rtemis model |
filename |
Character: path to file |
transforms |
List of PMML transformations |
model_name |
Character: name of the model |
model_version |
Character: version of the model |
description |
Character: description of the model |
copyright |
Character: copyright information |
... |
Additional arguments passed to pmml::pmml() |
E.D. Gennatas
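A hedged usage sketch, assuming the pmml package is installed and the trained model type is supported by pmml::pmml():
## Not run:
x <- rnorm(100)
y <- 0.8 * x + rnorm(100)
mod <- s_GLM(x, y)
savePMML(mod, "model.pmml", model_name = "example_GLM")
## End(Not run)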
Returns mod$se.fit
se(x)
x |
An |
Standard error of fitted values of mod
E.D. Gennatas
Accepts clusterer name (supports abbreviations) and returns rtemis function name or the function itself. If run with no parameters, prints list of available algorithms.
select_clust(clust, fn = FALSE, desc = FALSE)
clust |
Character: Clustering algorithm name. Case insensitive, supports partial matching. e.g. "hop" for HOPACH |
fn |
Logical: If TRUE, return function, otherwise name of function. |
desc |
Logical: If TRUE, return full name of algorithm |
Name of function (default), the function itself (fn = TRUE), or full name of algorithm (desc = TRUE)
E.D. Gennatas
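Hedged examples of the lookup behavior described above:
## Not run:
select_clust() # print list of available algorithms
select_clust("hop") # name of the HOPACH clustering function
select_clust("hop", fn = TRUE) # the function itself
## End(Not run)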
Accepts decomposer name (supports abbreviations) and returns rtemis function name or the function itself. If run with no parameters, prints list of available algorithms.
select_decom(decom, fn = FALSE, desc = FALSE)
decom |
Character: Decomposition name. Case insensitive. e.g. "iso" for isomap |
fn |
Logical: If TRUE, return function, otherwise name of function. Defaults to FALSE |
desc |
Logical: If TRUE, return full name of algorithm |
Function or name of function (see param fn), or full name of algorithm (desc = TRUE)
E.D. Gennatas
Accepts learner name (supports abbreviations) and returns rtemis function name or the function itself. If run with no parameters, prints list of available algorithms.
select_learn(alg, fn = FALSE, name = FALSE, desc = FALSE)
alg |
Character: Model name. Case insensitive. e.g. "XGB" for xgboost |
fn |
Logical: If TRUE, return function, otherwise name of function. Defaults to FALSE |
name |
Logical: If TRUE, return canonical name of algorithm |
desc |
Logical: If TRUE, return full name / description of algorithm |
Function or name of function (see param fn), short algorithm name (name = TRUE), or full algorithm name (desc = TRUE)
E.D. Gennatas
Select N of learning iterations based on loss
selectiter( loss.valid, loss.train, smooth = TRUE, plot = FALSE, verbose = FALSE )
loss.valid |
Float, vector: Validation loss. Can be NULL |
loss.train |
Float, vector: Training loss |
smooth |
Logical: If TRUE, smooth loss before finding minimum. |
plot |
Logical: If TRUE, plot loss curve. |
verbose |
Logical: If TRUE, print messages to console. |
E.D. Gennatas
Calculate sensitivity (true positive rate). The first factor level is considered the positive case.
sensitivity(true, estimated, harmonize = FALSE, verbosity = 1)
true |
True labels |
estimated |
Estimated labels |
harmonize |
Logical: If TRUE, run factor_harmonize first |
verbosity |
Integer: If > 0, print messages to console. |
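A small worked example (illustrative); the first level "a" is the positive class:
true <- factor(c("a", "a", "a", "b", "b"))
estimated <- factor(c("a", "a", "b", "b", "b"))
sensitivity(true, estimated) # 2 of 3 positive cases identified: 0.667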
Sequence generation with automatic cycling
seql(x, target)
x |
R object of some |
target |
R object of some |
E.D. Gennatas
color <- c("red", "blue") target <- 1:5 color[seql(color, target)] # "red" "blue" "red" "blue" "red" color <- c("red", "green", "blue", "yellow", "orange") target <- 1:3 color[seql(color, target)] # "red" "green" "blue"
color <- c("red", "blue") target <- 1:5 color[seql(color, target)] # "red" "blue" "red" "blue" "red" color <- c("red", "green", "blue", "yellow", "orange") target <- 1:3 color[seql(color, target)] # "red" "green" "blue"
Symmetric Set Difference
setdiffsym(x, y)
x |
vector |
y |
vector of same type as |
E.D. Gennatas
setdiff(1:10, 1:5)
setdiff(1:5, 1:10)
setdiffsym(1:10, 1:5)
setdiffsym(1:5, 1:10)
Set resample parameters for rtMod bagging
setup.bag.resample( resampler = "strat.sub", n.resamples = 10, stratify.var = NULL, train.p = 0.75, strat.n.bins = 4, target.length = NULL, verbosity = 1 )
resampler |
Character: Type of resampling to perform: "bootstrap", "kfold", "strat.boot", "strat.sub". |
n.resamples |
Integer: Number of training/testing sets required |
stratify.var |
Numeric vector (optional): Variable used for stratification. |
train.p |
Float (0, 1): Fraction of cases to assign to training set for
|
strat.n.bins |
Integer: Number of groups to use for stratification for
|
target.length |
Integer: Number of cases for training set for
|
verbosity |
Logical: If TRUE, print messages to console |
Set colorGrad parameters
setup.color( n = 101, colors = NULL, space = "rgb", lo = "#01256E", lomid = NULL, mid = "white", midhi = NULL, hi = "#95001A", colorbar = FALSE, cb.mar = c(1, 1, 1, 1), ... )
n |
Integer: How many distinct colors you want. If not odd, converted to |
colors |
Character: Acts as a shortcut to defining |
space |
Character: Which colorspace to use. Option: "rgb", or "Lab". Default = "rgb".
Recommendation: If |
lo |
Color for low end |
lomid |
Color for low-mid |
mid |
Color for middle of the range or "mean", which will result in |
midhi |
Color for middle-high |
hi |
Color for high end |
colorbar |
Logical: Create a vertical colorbar |
cb.mar |
Vector, length 4: Colorbar margins. Default: c(1, 1, 1, 1) |
... |
Additional arguments |
setup.cv.resample: resample defaults for cross-validation
setup.cv.resample( resampler = "strat.sub", n.resamples = 10, stratify.var = NULL, train.p = 0.8, strat.n.bins = 4, target.length = NULL, id.strat = NULL, verbosity = 1 )
resampler |
Character: Type of resampling to perform: "bootstrap", "kfold", "strat.boot", "strat.sub". |
n.resamples |
Integer: Number of training/testing sets required |
stratify.var |
Numeric vector (optional): Variable used for stratification. |
train.p |
Float (0, 1): Fraction of cases to assign to training set for
|
strat.n.bins |
Integer: Number of groups to use for stratification for
|
target.length |
Integer: Number of cases for training set for
|
id.strat |
Vector of IDs which may be replicated: resampling will ensure all replicates of each ID appear only in the training set or only in the testing set. |
verbosity |
Logical: If TRUE, print messages to console |
Set decomposition parameters for train_cv's .decompose argument
setup.decompose(decom = "ICA", k = 2, ...)
decom |
Character: Name of decomposer to use. |
k |
Integer: Number of dimensions to project to. |
... |
Additional arguments to be passed to decomposer |
Set earlystop parameters
setup.earlystop( window = 150, window_decrease_pct_min = 0.01, total_decrease_pct_max = NULL )
window |
Integer: Number of steps to consider |
window_decrease_pct_min |
Float: Stop if improvement is less than this percent over last |
total_decrease_pct_max |
Float: Stop if improvement from first to last step exceeds this percent. If defined, overrides |
Set s_GBM parameters
setup.GBM( interaction.depth = 2, shrinkage = 0.001, max.trees = 5000, min.trees = 100, bag.fraction = 0.9, n.minobsinnode = 5, grid.resample.params = setup.resample("kfold", 5), ifw = TRUE, upsample = FALSE, downsample = FALSE, resample.seed = NULL, ... )
interaction.depth |
[gS] Integer: Interaction depth. |
shrinkage |
[gS] Float: Shrinkage (learning rate). |
max.trees |
Integer: Maximum number of trees to fit |
min.trees |
Integer: Minimum number of trees to fit. |
bag.fraction |
[gS] Float (0, 1): Fraction of cases to use to train each tree. Helps avoid overfitting. |
n.minobsinnode |
[gS] Integer: Minimum number of observation allowed in node. |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness. |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
... |
Additional arguments |
Set resample parameters for gridSearchLearn
setup.grid.resample( resampler = "kfold", n.resamples = 5, stratify.var = NULL, train.p = 0.75, strat.n.bins = 4, target.length = NULL, verbosity = 1 )
resampler |
Character: Type of resampling to perform: "bootstrap", "kfold", "strat.boot", "strat.sub". |
n.resamples |
Integer: Number of training/testing sets required |
stratify.var |
Numeric vector (optional): Variable used for stratification. |
train.p |
Float (0, 1): Fraction of cases to assign to training set for
|
strat.n.bins |
Integer: Number of groups to use for stratification for
|
target.length |
Integer: Number of cases for training set for
|
verbosity |
Logical: If TRUE, print messages to console |
Sets parameters for the LightGBM and GLMNET (LASSO) steps of s_LightRuleFit
setup.LightRuleFit(
  n_trees = 200, num_leaves = 32L, max_depth = 3,
  learning_rate = 0.1, subsample = 0.666, subsample_freq = 1L,
  lambda_l1 = 0, lambda_l2 = 0,
  objective = NULL, extra.lgbm.params = NULL,
  lightgbm.ifw = TRUE,
  lightgbm.resample.params = setup.resample("kfold", 5),
  glmnet.ifw = TRUE, importance = FALSE,
  alpha = 1, lambda = NULL,
  glmnet.resample.params = setup.resample("kfold", 5)
)
num_leaves |
Integer: [gS] Maximum tree leaves for base learners. |
max_depth |
Integer: [gS] Maximum tree depth for base learners, <=0 means no limit. |
learning_rate |
Numeric: [gS] Boosting learning rate |
subsample |
Numeric: [gS] Subsample ratio of the training set. |
subsample_freq |
Integer: Subsample every this many iterations |
lambda_l1 |
Numeric: [gS] L1 regularization term |
lambda_l2 |
Numeric: [gS] L2 regularization term |
objective |
(Default = NULL) |
lightgbm.ifw |
Logical: Passed to s_LightGBM's |
glmnet.ifw |
Logical: Passed to s_GLMNET's |
importance |
Logical: If |
alpha |
[gS] Float [0, 1]: The elasticnet mixing parameter:
|
lambda |
[gS] Float vector: Best left to NULL, |
E.D. Gennatas
Set s_LIHAD parameters
setup.LIHAD( max.depth = 2, learning.rate = 1, lincoef.params = setup.lincoef("glmnet"), alpha = 0, lambda = 0.1, minobsinnode = 2, minobsinnode.lin = 20, ... )
max.depth |
[gS] Integer: Max depth of additive tree. Default = 2 |
learning.rate |
[gS] Float (0, 1): Learning rate. |
lincoef.params |
Named List: Output of setup.lincoef |
alpha |
[gS] Float: |
lambda |
[gS] Float: |
minobsinnode |
[gS] Integer: Minimum N observations needed in node, before considering splitting |
minobsinnode.lin |
Integer: Minimum N observations needed in node in order to train linear model. |
... |
Additional arguments |
Set lincoef parameters
setup.lincoef(
  method = c("glmnet", "cv.glmnet", "lm.ridge", "allSubsets",
    "forwardStepwise", "backwardStepwise", "glm", "sgd", "solve"),
  alpha = 0, lambda = 0.01, lambda.seq = NULL,
  cv.glmnet.nfolds = 5,
  which.cv.glmnet.lambda = c("lambda.min", "lambda.1se"),
  nbest = 1, nvmax = 8,
  sgd.model = "glm",
  sgd.model.control = list(lambda1 = 0, lambda2 = 0),
  sgd.control = list(method = "ai-sgd")
)
method |
Character: Method to use:
|
alpha |
Float: |
lambda |
Float: The lambda value for |
lambda.seq |
Float, vector: lambda sequence for |
cv.glmnet.nfolds |
Integer: Number of folds for |
which.cv.glmnet.lambda |
Character: Which lambda to pick from cv.glmnet: "lambda.min": Lambda that gives minimum cross-validated error; |
nbest |
Integer: For |
nvmax |
Integer: For |
sgd.model |
Character: Model to use for |
sgd.model.control |
List: |
sgd.control |
List: |
Set s_MARS parameters
setup.MARS( hidden = 1, activation = NULL, learning.rate = 0.8, momentum = 0.5, learningrate_scale = 1, output = NULL, numepochs = 100, batchsize = NULL, hidden_dropout = 0, visible_dropout = 0, ... )
... |
Additional parameters to pass to |
Set resample parameters for meta model training
setup.meta.resample( resampler = "strat.sub", n.resamples = 4, stratify.var = NULL, train.p = 0.75, strat.n.bins = 4, target.length = NULL, verbosity = TRUE )
resampler |
Character: Type of resampling to perform: "bootstrap", "kfold", "strat.boot", "strat.sub". |
n.resamples |
Integer: Number of training/testing sets required |
stratify.var |
Numeric vector (optional): Variable used for stratification. |
train.p |
Float (0, 1): Fraction of cases to assign to training set for
|
strat.n.bins |
Integer: Number of groups to use for stratification for
|
target.length |
Integer: Number of cases for training set for
|
verbosity |
Logical: If TRUE, print messages to console |
Set preprocess parameters for train_cv's .preprocess argument
setup.preprocess(
  completeCases = FALSE,
  removeCases.thres = NULL, removeFeatures.thres = NULL,
  impute = FALSE, impute.type = "missRanger",
  impute.missRanger.params = list(pmm.k = 0, maxiter = 10),
  impute.discrete = get_mode, impute.numeric = mean,
  integer2factor = FALSE, integer2numeric = FALSE,
  logical2factor = FALSE, logical2numeric = FALSE,
  numeric2factor = FALSE, numeric2factor.levels = NULL,
  numeric.cut.n = 0, numeric.cut.labels = FALSE, numeric.quant.n = 0,
  character2factor = FALSE,
  scale = FALSE, center = FALSE,
  removeConstants = TRUE, oneHot = FALSE, exclude = NULL
)
completeCases |
Logical: If TRUE, only retain complete cases (no missing data). Default = FALSE |
removeCases.thres |
Float (0, 1): Remove cases with this fraction or more of their features missing. |
removeFeatures.thres |
Float (0, 1): Remove features with missing values in this fraction or more of cases. |
impute |
Logical: If TRUE, impute missing cases. See |
impute.type |
Character: How to impute data: "missRanger" and
"missForest" use the packages of the same name to impute by iterative random
forest regression. "rfImpute" uses |
impute.missRanger.params |
Named list with elements "pmm.k" and
"maxiter", which are passed to |
impute.discrete |
Function that returns single value: How to impute
discrete variables for |
impute.numeric |
Function that returns single value: How to impute
continuous variables for |
integer2factor |
Logical: If TRUE, convert all integers to factors. This includes
|
integer2numeric |
Logical: If TRUE, convert all integers to numeric
(will only work if |
logical2factor |
Logical: If TRUE, convert all logical variables to factors |
logical2numeric |
Logical: If TRUE, convert all logical variables to numeric |
numeric2factor |
Logical: If TRUE, convert all numeric variables to factors |
numeric2factor.levels |
Character vector: Optional - will be passed to
|
numeric.cut.n |
Integer: If > 0, convert all numeric variables to factors by
binning using |
numeric.cut.labels |
Logical: The |
numeric.quant.n |
Integer: If > 0, convert all numeric variables to factors by
binning using |
character2factor |
Logical: If TRUE, convert all character variables to factors |
scale |
Logical: If TRUE, scale columns of |
center |
Logical: If TRUE, center columns of |
removeConstants |
Logical: If TRUE, remove constant columns. |
oneHot |
Logical: If TRUE, convert all factors using one-hot encoding |
exclude |
Integer, vector: Exclude these columns from preprocessing. |
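A hedged configuration sketch for train_cv's .preprocess argument (parameter choices are illustrative):
## Not run:
prep <- setup.preprocess(impute = TRUE, scale = TRUE, center = TRUE)
## End(Not run)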
Set s_Ranger parameters
setup.Ranger( n.trees = 1000, min.node.size = 1, mtry = NULL, grid.resample.params = setup.resample("kfold", 5), ifw = TRUE, upsample = FALSE, downsample = FALSE, resample.seed = NULL, ... )
n.trees |
Integer: Number of trees to grow. Default = 1000 |
min.node.size |
[gS] Integer: Minimum node size |
mtry |
[gS] Integer: Number of features sampled randomly at each split. Defaults to square root of n of features for classification, and a third of n of features for regression. |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
upsample |
Logical: If TRUE, upsample training set cases not belonging to the majority outcome group |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
... |
Additional arguments to be passed to |
Set resample settings
setup.resample( resampler = c("strat.sub", "strat.boot", "kfold", "bootstrap", "loocv"), n.resamples = 10, stratify.var = NULL, train.p = 0.8, strat.n.bins = 4, target.length = NULL, id.strat = NULL, seed = NULL )
resampler |
Character: Type of resampling to perform: "strat.sub", "strat.boot", "kfold", "bootstrap", "loocv". |
n.resamples |
Integer: Number of training/testing sets required |
stratify.var |
Numeric vector (optional): Variable used for stratification. |
train.p |
Float (0, 1): Fraction of cases to assign to training set for
|
strat.n.bins |
Integer: Number of groups to use for stratification for
|
target.length |
Integer: Number of cases for training set for
|
id.strat |
Vector of IDs which may be replicated: resampling should force replicates of each ID to appear only in the training or the testing set. |
seed |
Integer: (Optional) Set seed for random number generator, in order to make
output reproducible. See |
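For example, a minimal sketch requesting 10 stratified subsamples with a 75% training fraction and a fixed seed:

res_params <- setup.resample(
  resampler = "strat.sub",
  n.resamples = 10,
  train.p = 0.75,
  seed = 2024
)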
Submit expression to SGE grid
sge_submit( expr, obj_names = NULL, packages = NULL, queue = NULL, n_threads = 4, sge_out = file.path(getwd(), "./sge_out"), sge_error = sge_out, sge_env = "#! /usr/bin/env bash", sge_opts = "#$ -cwd", R_command = NULL, system_command = NULL, h_rt = "00:25:00", mem_free = NULL, temp_dir = file.path(getwd(), ".sge_tempdir"), verbose = TRUE, trace = 1 )
expr |
R expression |
obj_names |
Character vector: Names of objects to copy to cluster R session |
packages |
Character vector: Names of packages to load in cluster R session |
queue |
Character: Name of SGE queue to submit to |
n_threads |
Integer: Number of threads to request from scheduler |
sge_out |
Character: Path to directory to write standard out message files |
sge_error |
Character: Path to directory to write error message files |
sge_env |
Character: Shell environment for script to be submitted to SGE |
sge_opts |
Character: SGE options that will be written in shell script. Default = "#$ -cwd" |
R_command |
Character: Optional R command(s) to run at the beginning of the R script |
system_command |
Character: system command to be run by shell script before executing R code. For example, a command that exports the R executable to use |
h_rt |
Character: Max time to request. Default = "00:25:00", i.e. 25 minutes |
mem_free |
Character: Amount of memory to request from the scheduler |
temp_dir |
Character: Temporary directory that is accessible to all
execution nodes.
Default = |
verbose |
Logical: If TRUE, print messages to console. Default = TRUE |
trace |
Integer: If > 0 print diagnostic messages to console. |
E.D. Gennatas
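A hedged sketch of a minimal submission; the queue name and workload are hypothetical, and whether expr is captured unevaluated depends on the implementation:

## Not run:
sge_submit(
  expr = expression(saveRDS(rnorm(1000), "out.rds")), # hypothetical workload
  queue = "long.q", # hypothetical queue name
  n_threads = 2,
  h_rt = "00:10:00"
)
## End(Not run)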
Sigmoid function
sigmoid(x)
x |
Vector, float: Input |
Return the size of a matrix or vector as (Nrows, Ncolumns). Are you tired of getting NULL when you run dim() on a vector?
size(x)
x |
Vector or matrix input |
Integer vector of length 2: c(Nrow, Ncols)
E.D. Gennatas
x <- rnorm(20)
size(x) # 20 1
x <- matrix(rnorm(100), 20, 5)
size(x) # 20 5
Softmax function
softmax(x)
x |
Vector, Float: Input |
Softplus function: log(1 + exp(x))
softplus(x)
x |
Vector, Float: Input |
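These helpers are assumed to follow the standard definitions sigmoid(x) = 1 / (1 + exp(-x)), softmax(x) = exp(x) / sum(exp(x)), and softplus(x) = log(1 + exp(x)); a quick sanity check:

sigmoid(0) # 0.5
sum(softmax(c(-2, 0, 2))) # 1: softmax outputs sum to 1
softplus(0) # log(2), approximately 0.693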
lines, but sorted
sortedlines(x, y, col = "red", ...)
x |
Input vector |
y |
Input vector |
col |
Line color. Default = "red" |
... |
Extra params to pass to |
E.D. Gennatas
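A minimal sketch, assuming sortedlines adds lines to an existing plot after sorting the pairs by x:

x <- rnorm(50)
y <- x^2 + rnorm(50)
plot(x, y)
sortedlines(x, y)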
A sparse version of stats::rnorm
Outputs a vector where a fraction of values are zeros (determined by sparseness) and the rest are drawn from a random normal distribution using stats::rnorm
sparsernorm(n, sparseness = 0.1, mean = 0, sd = 1)
n |
Integer: Length of output vector |
sparseness |
Float (0, 1): Fraction of required nonzero elements, i.e. output will have
|
mean |
Float: Target mean of nonzero elements, passed to |
sd |
Float: Target sd of nonzero elements, passed to |
E.D. Gennatas
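For example:

x <- sparsernorm(1000, sparseness = 0.2)
mean(x != 0) # approximately 0.2: fraction of nonzero elements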
Get sparseness measure on a matrix of vectors
sparseVectorSummary(vectors, y = NULL)
vectors |
Matrix of column vectors |
y |
Optional numeric vector. |
Keep top x% of values of a vector
sparsify(x, sparseness)
x |
Input vector |
sparseness |
Percent of values of |
E.D. Gennatas
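A minimal sketch; whether sparseness is given as a fraction or a percentage is an assumption here:

x <- rnorm(100)
xs <- sparsify(x, sparseness = 0.1) # assumed: keep the top 10% of values, zero the rest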
Specificity
The first factor level is considered the positive case.
specificity(true, estimated, harmonize = FALSE, verbosity = 1)
true |
True labels |
estimated |
Estimated labels |
harmonize |
Logical: If TRUE, run factor_harmonize first |
verbosity |
Integer: If > 0, print messages to console. |
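For example, with "a" as the first level and therefore the positive class:

true <- factor(c("a", "a", "b", "b"), levels = c("a", "b"))
estimated <- factor(c("a", "b", "b", "b"), levels = c("a", "b"))
specificity(true, estimated) # both negatives ("b") correctly identified: 1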
Calculate the standard error of the mean, which is equal to the standard deviation divided by the square root of the sample size. NA values are automatically removed
stderror(x)
x |
Vector, numeric: Input data |
E.D. Gennatas
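For example:

x <- rnorm(100)
stderror(x) # standard error of the mean
sd(x) / sqrt(length(x)) # same value when x has no NAs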
Stratified Bootstrap Resampling
strat.boot( x, n.resamples = 10, train.p = 0.75, stratify.var = NULL, strat.n.bins = 4, target.length = NULL, seed = NULL, verbosity = TRUE )
x |
Input vector |
n.resamples |
Integer: Number of training/testing sets required |
train.p |
Float (0, 1): Fraction of cases to assign to training set for
|
stratify.var |
Numeric vector (optional): Variable used for stratification. |
strat.n.bins |
Integer: Number of groups to use for stratification for
|
target.length |
Integer: Number of cases for training set for
|
seed |
Integer: (Optional) Set seed for random number generator, in order to make
output reproducible. See |
verbosity |
Logical: If TRUE, print messages to console |
E.D. Gennatas
Resample using Stratified Subsamples
strat.sub( x, n.resamples = 10, train.p = 0.75, stratify.var = NULL, strat.n.bins = 4, seed = NULL, verbosity = TRUE )
x |
Input vector |
n.resamples |
Integer: Number of training/testing sets required |
train.p |
Float (0, 1): Fraction of cases to assign to training set for
|
stratify.var |
Numeric vector (optional): Variable used for stratification. |
strat.n.bins |
Integer: Number of groups to use for stratification for
|
seed |
Integer: (Optional) Set seed for random number generator, in order to make
output reproducible. See |
verbosity |
Logical: If TRUE, print messages to console |
E.D. Gennatas
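For example, a minimal sketch generating 10 stratified subsamples of a numeric outcome:

y <- rnorm(100)
res <- strat.sub(y, n.resamples = 10, train.p = 0.75, seed = 2024)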
Convert a survfit object's strata to a factor
strata2factor(x)
x |
|
factor
E.D. Gennatas
Summarize numeric variables
summarize( x, varname, group_by = NULL, type = c("all", "median-range", "mean-sd"), na.rm = TRUE )
x |
data.frame or data.table (will be coerced to data.table) |
varname |
Character, vector: Variable name(s) to summarize. Must be column names in |
group_by |
Character, vector: Variable name(s) of factors to group by. Must be column names
in |
type |
Character: "all", "median-range" or "mean-sd". Default = "all", which returns Mean, SD, Median, Range, NA (number of NA values) |
na.rm |
Logical: Passed to |
data.table with summary
E.D. Gennatas
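For example, using the built-in iris data:

summarize(iris, varname = "Sepal.Length", group_by = "Species")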
massGAM object summary
## S3 method for class 'massGAM' summary(object, ...)
object |
A |
... |
Not used |
E.D. Gennatas
massGLM object summary
## S3 method for class 'massGLM' summary(object, ...)
object |
An object created by massGLM |
... |
Not used |
E.D. Gennatas
Survival Analysis Metrics
surv_error(true, estimated)
true |
Vector, numeric: True survival times |
estimated |
Vector, numeric: Estimated survival times |
E.D. Gennatas
rtemis-internals: Project Variables to First Eigenvector
Convenience function for SVD with k = 1
svd1(x, x.test = NULL)
x |
Input matrix / data frame |
x.test |
Optional test matrix / data frame |
E.D. Gennatas
Create "Multimodal" Synthetic Data using squares and arctangents
synth_multimodal( n.cases = 10000, init.fn = "runifmat", init.fn.params = list(min = -10, max = 10), n.groups = 4, n.feat.per.group = round(seq(10, 300, length.out = n.groups)), contrib.p = 0.33, linear.p = 0.66, square.p = 0.1, atan.p = 0.1, pair.multiply.p = 0.05, pair.square.p = 0.05, pair.atan.p = 0.05, verbose = TRUE, seed = NULL, filename = NULL )
n.cases |
Integer: Number of cases to create. Default = 10000 |
init.fn |
Character: "runifmat" or "rnormmat". Use the respective functions to generate features as random uniform and random normal variables, respectively. Default = "runifmat" |
init.fn.params |
Named list with arguments "min", "max" for "runifmat" and
"mean", "sd" for "rnormmat". Default = |
n.groups |
Integer: Number of feature groups / modalities to create. Default = 4 |
n.feat.per.group |
Integer, vector, length |
contrib.p |
Float (0, 1]: Ratio of features contributing to outcome per group. Default = .33, i.e. a third of the features in each group will be used to produce the outcome y |
linear.p |
Float [0, 1]: Ratio of contributing features to be included linearly. Default = .66, i.e. .66 of .33 of features in each group will be included linearly |
square.p |
Float [0, 1]: Ratio of contributing features to be squared. Default = .1, i.e. .1 of .33 of features in each group will be squared |
atan.p |
Float [0, 1]: Ratio of contributing features whose |
pair.multiply.p |
Float [0, 1]: Ratio of features which will be divided into pairs and multiplied. Default = .05 |
pair.square.p |
Float [0, 1] Ratio of features which will be divided into pairs, multiplied and squared. |
pair.atan.p |
Float [0, 1] Ratio of features which will be divided into pairs, multiplied and transformed using
|
verbose |
Logical: If TRUE, print messages to console. |
seed |
Integer: If set, pass to |
filename |
Character: Path to file to save output. |
There are no checks yet for compatibility among inputs, and certain combinations may not work.
List with elements x, y, index.square, index.atan, index.pair.square
E.D. Gennatas
xmm <- synth_multimodal(
  n.cases = 10000,
  init.fn = "runifmat",
  init.fn.params = list(min = -10, max = 10),
  n.groups = 5,
  n.feat.per.group = c(20, 50, 100, 200, 300),
  contrib.p = .33,
  linear.p = .66,
  square.p = .1,
  atan.p = .1,
  pair.multiply.p = .1,
  pair.square.p = .1,
  pair.atan.p = .1,
  seed = 2019
)
Synthesize Simple Regression Data
synth_reg_data( nrow = 500, ncol = 50, noise.sd.factor = 1, resample.params = setup.resample(), seed = NULL, verbose = FALSE )
nrow |
Integer: Number of rows. Default = 500 |
ncol |
Integer: Number of columns. Default = 50 |
noise.sd.factor |
Numeric: Add rnorm(nrow, sd = noise.sd.factor * sd(y)). Default = 1 |
resample.params |
Output of setup.resample defining training/testing split. The first resulting resample
will be used to create |
seed |
Integer: Seed for random number generator. Default = NULL |
verbose |
Logical: If TRUE, print messages to console. Default = FALSE |
List with elements dat, dat.train, dat.test, resamples, w, seed
E.D. Gennatas
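For example:

dat <- synth_reg_data(nrow = 200, ncol = 10, seed = 2024)
names(dat) # dat, dat.train, dat.test, resamples, w, seed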
Build Table 1. Subject characteristics
table1( x, summaryFn1 = mean, summaryFn2 = sd, summaryFn1.extraArgs = list(na.rm = TRUE), summaryFn2.extraArgs = list(na.rm = TRUE), labelify = TRUE, verbose = TRUE, filename = NULL )
x |
data.frame or matrix: Input data, cases by features |
summaryFn1 |
Function: Summary function 1. Default = |
summaryFn2 |
Function: Summary function 2. Default = |
summaryFn1.extraArgs |
List: Extra arguments for |
summaryFn2.extraArgs |
List: Extra arguments for |
labelify |
Logical: If TRUE, apply labelify to column names of |
verbose |
Logical: If TRUE, print messages to console. |
filename |
Character: Path to output CSV file to save table. |
The output will look like "summaryFn1 (summaryFn2)". Using the defaults, this will be "mean (sd)".
A data.frame, invisibly, with two columns: "Feature", "Value mean (sd) | N"
E.D. Gennatas
table1(iris)
Themes for mplot3 and dplot3 functions
theme_black( bg = "#000000", plot.bg = "transparent", fg = "#ffffff", pch = 16, cex = 1, lwd = 2, bty = "n", box.col = fg, box.alpha = 1, box.lty = 1, box.lwd = 0.5, grid = FALSE, grid.nx = NULL, grid.ny = NULL, grid.col = fg, grid.alpha = 0.2, grid.lty = 1, grid.lwd = 1, axes.visible = TRUE, axes.col = "transparent", tick.col = fg, tick.alpha = 0.5, tick.labels.col = fg, tck = -0.01, tcl = NA, x.axis.side = 1, y.axis.side = 2, labs.col = fg, x.axis.line = 0, x.axis.las = 0, x.axis.padj = -1.1, x.axis.hadj = 0.5, y.axis.line = 0, y.axis.las = 1, y.axis.padj = 0.5, y.axis.hadj = 0.5, xlab.line = 1.4, ylab.line = 2, zerolines = TRUE, zerolines.col = fg, zerolines.alpha = 0.5, zerolines.lty = 1, zerolines.lwd = 1, main.line = 0.25, main.adj = 0, main.font = 2, main.col = fg, font.family = getOption("rt.font", "Helvetica") ) theme_blackgrid( bg = "#000000", plot.bg = "transparent", fg = "#ffffff", pch = 16, cex = 1, lwd = 2, bty = "n", box.col = fg, box.alpha = 1, box.lty = 1, box.lwd = 0.5, grid = TRUE, grid.nx = NULL, grid.ny = NULL, grid.col = fg, grid.alpha = 0.2, grid.lty = 1, grid.lwd = 1, axes.visible = TRUE, axes.col = "transparent", tick.col = fg, tick.alpha = 1, tick.labels.col = fg, tck = -0.01, tcl = NA, x.axis.side = 1, y.axis.side = 2, labs.col = fg, x.axis.line = 0, x.axis.las = 0, x.axis.padj = -1.1, x.axis.hadj = 0.5, y.axis.line = 0, y.axis.las = 1, y.axis.padj = 0.5, y.axis.hadj = 0.5, xlab.line = 1.4, ylab.line = 2, zerolines = TRUE, zerolines.col = fg, zerolines.alpha = 0.5, zerolines.lty = 1, zerolines.lwd = 1, main.line = 0.25, main.adj = 0, main.font = 2, main.col = fg, font.family = getOption("rt.font", "Helvetica") ) theme_blackigrid( bg = "#000000", plot.bg = "#1A1A1A", fg = "#ffffff", pch = 16, cex = 1, lwd = 2, bty = "n", box.col = fg, box.alpha = 1, box.lty = 1, box.lwd = 0.5, grid = TRUE, grid.nx = NULL, grid.ny = NULL, grid.col = bg, grid.alpha = 1, grid.lty = 1, grid.lwd = 1, axes.visible = TRUE, axes.col = "transparent", tick.col = fg, tick.alpha = 1, tick.labels.col = fg, tck = -0.01, tcl = NA, x.axis.side = 1, y.axis.side = 2, labs.col = fg, x.axis.line = 0, x.axis.las = 0, x.axis.padj = -1.1, x.axis.hadj = 0.5, y.axis.line = 0, y.axis.las = 1, y.axis.padj = 0.5, y.axis.hadj = 0.5, xlab.line = 1.4, ylab.line = 2, zerolines = TRUE, zerolines.col = fg, zerolines.alpha = 0.5, zerolines.lty = 1, zerolines.lwd = 1, main.line = 0.25, main.adj = 0, main.font = 2, main.col = fg, font.family = getOption("rt.font", "Helvetica") ) theme_darkgray( bg = "#121212", plot.bg = "transparent", fg = "#ffffff", pch = 16, cex = 1, lwd = 2, bty = "n", box.col = fg, box.alpha = 1, box.lty = 1, box.lwd = 0.5, grid = FALSE, grid.nx = NULL, grid.ny = NULL, grid.col = fg, grid.alpha = 0.2, grid.lty = 1, grid.lwd = 1, axes.visible = TRUE, axes.col = "transparent", tick.col = fg, tick.alpha = 0.5, tick.labels.col = fg, tck = -0.01, tcl = NA, x.axis.side = 1, y.axis.side = 2, labs.col = fg, x.axis.line = 0, x.axis.las = 0, x.axis.padj = -1.1, x.axis.hadj = 0.5, y.axis.line = 0, y.axis.las = 1, y.axis.padj = 0.5, y.axis.hadj = 0.5, xlab.line = 1.4, ylab.line = 2, zerolines = TRUE, zerolines.col = fg, zerolines.alpha = 0.5, zerolines.lty = 1, zerolines.lwd = 1, main.line = 0.25, main.adj = 0, main.font = 2, main.col = fg, font.family = getOption("rt.font", "Helvetica") ) theme_darkgraygrid( bg = "#121212", plot.bg = "transparent", fg = "#ffffff", pch = 16, cex = 1, lwd = 2, bty = "n", box.col = fg, box.alpha = 1, box.lty = 1, box.lwd = 0.5, grid = TRUE, grid.nx = NULL, grid.ny = NULL, 
grid.col = "#404040", grid.alpha = 1, grid.lty = 1, grid.lwd = 1, axes.visible = TRUE, axes.col = "transparent", tick.col = "#00000000", tick.alpha = 1, tick.labels.col = fg, tck = -0.01, tcl = NA, x.axis.side = 1, y.axis.side = 2, labs.col = fg, x.axis.line = 0, x.axis.las = 0, x.axis.padj = -1.1, x.axis.hadj = 0.5, y.axis.line = 0, y.axis.las = 1, y.axis.padj = 0.5, y.axis.hadj = 0.5, xlab.line = 1.4, ylab.line = 2, zerolines = TRUE, zerolines.col = fg, zerolines.alpha = 0.5, zerolines.lty = 1, zerolines.lwd = 1, main.line = 0.25, main.adj = 0, main.font = 2, main.col = fg, font.family = getOption("rt.font", "Helvetica") ) theme_darkgrayigrid( bg = "#121212", plot.bg = "#202020", fg = "#ffffff", pch = 16, cex = 1, lwd = 2, bty = "n", box.col = fg, box.alpha = 1, box.lty = 1, box.lwd = 0.5, grid = TRUE, grid.nx = NULL, grid.ny = NULL, grid.col = bg, grid.alpha = 1, grid.lty = 1, grid.lwd = 1, axes.visible = TRUE, axes.col = "transparent", tick.col = "transparent", tick.alpha = 1, tick.labels.col = fg, tck = -0.01, tcl = NA, x.axis.side = 1, y.axis.side = 2, labs.col = fg, x.axis.line = 0, x.axis.las = 0, x.axis.padj = -1.1, x.axis.hadj = 0.5, y.axis.line = 0, y.axis.las = 1, y.axis.padj = 0.5, y.axis.hadj = 0.5, xlab.line = 1.4, ylab.line = 2, zerolines = TRUE, zerolines.col = fg, zerolines.alpha = 0.5, zerolines.lty = 1, zerolines.lwd = 1, main.line = 0.25, main.adj = 0, main.font = 2, main.col = fg, font.family = getOption("rt.font", "Helvetica") ) theme_white( bg = "#ffffff", plot.bg = "transparent", fg = "#000000", pch = 16, cex = 1, lwd = 2, bty = "n", box.col = fg, box.alpha = 1, box.lty = 1, box.lwd = 0.5, grid = FALSE, grid.nx = NULL, grid.ny = NULL, grid.col = fg, grid.alpha = 1, grid.lty = 1, grid.lwd = 1, axes.visible = TRUE, axes.col = "transparent", tick.col = fg, tick.alpha = 0.5, tick.labels.col = fg, tck = -0.01, tcl = NA, x.axis.side = 1, y.axis.side = 2, labs.col = fg, x.axis.line = 0, x.axis.las = 0, x.axis.padj = -1.1, x.axis.hadj = 0.5, y.axis.line = 0, y.axis.las = 1, y.axis.padj = 0.5, y.axis.hadj = 0.5, xlab.line = 1.4, ylab.line = 2, zerolines = TRUE, zerolines.col = fg, zerolines.alpha = 0.5, zerolines.lty = 1, zerolines.lwd = 1, main.line = 0.25, main.adj = 0, main.font = 2, main.col = fg, font.family = getOption("rt.font", "Helvetica") ) theme_whitegrid( bg = "#ffffff", plot.bg = "transparent", fg = "#000000", pch = 16, cex = 1, lwd = 2, bty = "n", box.col = fg, box.alpha = 1, box.lty = 1, box.lwd = 0.5, grid = TRUE, grid.nx = NULL, grid.ny = NULL, grid.col = "#c0c0c0", grid.alpha = 1, grid.lty = 1, grid.lwd = 1, axes.visible = TRUE, axes.col = "transparent", tick.col = "#00000000", tick.alpha = 1, tick.labels.col = fg, tck = -0.01, tcl = NA, x.axis.side = 1, y.axis.side = 2, labs.col = fg, x.axis.line = 0, x.axis.las = 0, x.axis.padj = -1.1, x.axis.hadj = 0.5, y.axis.line = 0, y.axis.las = 1, y.axis.padj = 0.5, y.axis.hadj = 0.5, xlab.line = 1.4, ylab.line = 2, zerolines = TRUE, zerolines.col = fg, zerolines.alpha = 0.5, zerolines.lty = 1, zerolines.lwd = 1, main.line = 0.25, main.adj = 0, main.font = 2, main.col = fg, font.family = getOption("rt.font", "Helvetica") ) theme_whiteigrid( bg = "#ffffff", plot.bg = "#E6E6E6", fg = "#000000", pch = 16, cex = 1, lwd = 2, bty = "n", box.col = fg, box.alpha = 1, box.lty = 1, box.lwd = 0.5, grid = TRUE, grid.nx = NULL, grid.ny = NULL, grid.col = bg, grid.alpha = 1, grid.lty = 1, grid.lwd = 1, axes.visible = TRUE, axes.col = "transparent", tick.col = "transparent", tick.alpha = 1, tick.labels.col = fg, tck = -0.01, tcl 
= NA, x.axis.side = 1, y.axis.side = 2, labs.col = fg, x.axis.line = 0, x.axis.las = 0, x.axis.padj = -1.1, x.axis.hadj = 0.5, y.axis.line = 0, y.axis.las = 1, y.axis.padj = 0.5, y.axis.hadj = 0.5, xlab.line = 1.4, ylab.line = 2, zerolines = TRUE, zerolines.col = fg, zerolines.alpha = 0.5, zerolines.lty = 1, zerolines.lwd = 1, main.line = 0.25, main.adj = 0, main.font = 2, main.col = fg, font.family = getOption("rt.font", "Helvetica") ) theme_lightgraygrid( bg = "#dfdfdf", plot.bg = "transparent", fg = "#000000", pch = 16, cex = 1, lwd = 2, bty = "n", box.col = fg, box.alpha = 1, box.lty = 1, box.lwd = 0.5, grid = TRUE, grid.nx = NULL, grid.ny = NULL, grid.col = "#c0c0c0", grid.alpha = 1, grid.lty = 1, grid.lwd = 1, axes.visible = TRUE, axes.col = "transparent", tick.col = "#00000000", tick.alpha = 1, tick.labels.col = fg, tck = -0.01, tcl = NA, x.axis.side = 1, y.axis.side = 2, labs.col = fg, x.axis.line = 0, x.axis.las = 0, x.axis.padj = -1.1, x.axis.hadj = 0.5, y.axis.line = 0, y.axis.las = 1, y.axis.padj = 0.5, y.axis.hadj = 0.5, xlab.line = 1.4, ylab.line = 2, zerolines = TRUE, zerolines.col = fg, zerolines.alpha = 0.5, zerolines.lty = 1, zerolines.lwd = 1, main.line = 0.25, main.adj = 0, main.font = 2, main.col = fg, font.family = getOption("rt.font", "Helvetica") ) theme_mediumgraygrid( bg = "#b3b3b3", plot.bg = "transparent", fg = "#000000", pch = 16, cex = 1, lwd = 2, bty = "n", box.col = fg, box.alpha = 1, box.lty = 1, box.lwd = 0.5, grid = TRUE, grid.nx = NULL, grid.ny = NULL, grid.col = "#d0d0d0", grid.alpha = 1, grid.lty = 1, grid.lwd = 1, axes.visible = TRUE, axes.col = "transparent", tick.col = "#00000000", tick.alpha = 1, tick.labels.col = fg, tck = -0.01, tcl = NA, x.axis.side = 1, y.axis.side = 2, labs.col = fg, x.axis.line = 0, x.axis.las = 0, x.axis.padj = -1.1, x.axis.hadj = 0.5, y.axis.line = 0, y.axis.las = 1, y.axis.padj = 0.5, y.axis.hadj = 0.5, xlab.line = 1.4, ylab.line = 2, zerolines = TRUE, zerolines.col = fg, zerolines.alpha = 0.5, zerolines.lty = 1, zerolines.lwd = 1, main.line = 0.25, main.adj = 0, main.font = 2, main.col = fg, font.family = getOption("rt.font", "Helvetica") )
bg |
Color: Figure background |
plot.bg |
Color: Plot region background |
fg |
Color: Foreground color used as default for multiple elements like axes and labels, which can be defined separately |
pch |
Integer: Point character. |
cex |
Float: Character expansion factor. |
lwd |
Float: Line width. |
bty |
Character: Box type: "o", "l", "7", "c", "u", "]", or "n". Default = "n" (no box) |
box.col |
Box color if |
box.alpha |
Float: Box alpha |
box.lty |
Integer: Box line type |
box.lwd |
Float: Box line width |
grid |
Logical: If TRUE, draw grid in plot regions |
grid.nx |
Integer: N of vertical grid lines |
grid.ny |
Integer: N of horizontal grid lines |
grid.col |
Grid color |
grid.alpha |
Float: Grid alpha |
grid.lty |
Integer: Grid line type |
grid.lwd |
Float: Grid line width |
axes.visible |
Logical: If TRUE, draw axes |
axes.col |
Axes colors |
tick.col |
Tick color |
tick.alpha |
Float: Tick alpha |
tick.labels.col |
Tick labels' color |
tck |
|
tcl |
|
x.axis.side |
Integer: Side to place x-axis. Default = 1 (bottom) |
y.axis.side |
Integer: Side to place y-axis. Default = 2 (left) |
labs.col |
Labels' color |
x.axis.line |
Numeric: |
x.axis.las |
Numeric: |
x.axis.padj |
Numeric: x-axis' |
x.axis.hadj |
Numeric: x-axis' |
y.axis.line |
Numeric: |
y.axis.las |
Numeric: |
y.axis.padj |
Numeric: y-axis' |
y.axis.hadj |
Numeric: y-axis' |
xlab.line |
Numeric: Line to place |
ylab.line |
Numeric: Line to place |
zerolines |
Logical: If TRUE, draw lines on x = 0, y = 0, if within plot limits |
zerolines.col |
Zerolines color |
zerolines.alpha |
Float: Zerolines alpha |
zerolines.lty |
Integer: Zerolines line type |
zerolines.lwd |
Float: Zerolines line width |
main.line |
Float: How many lines away from the plot region to draw title. |
main.adj |
Float: How to align title. Default = 0 (left-align) |
main.font |
Integer: 1: Regular, 2: Bold |
main.col |
Title color |
font.family |
Character: Font to be used throughout plot. |
timeProc measures how long it takes for a process to run
timeProc(..., verbose = TRUE)
... |
Command to be timed. (Will be converted using |
verbose |
Logical: If TRUE, print messages to console |
E.D. Gennatas
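For example:

timeProc(Sys.sleep(0.5)) # times a half-second sleep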
Generate CheckData object description in HTML
tohtml( x, name = NULL, css = list(font.family = "Helvetica", color = "#fff", background.color = "#242424") )
x |
|
name |
Character: Name of the data set |
css |
List: CSS styles |
E.D. Gennatas
train_cv is a high-level function to tune, train, and test an rtemis model by nested resampling, with optional preprocessing and decomposition of input features.
train_cv( x, y = NULL, alg = "ranger", train.params = list(), .preprocess = NULL, .decompose = NULL, weights = NULL, n.repeats = 1, outer.resampling = setup.resample(resampler = "strat.sub", n.resamples = 10), inner.resampling = setup.resample(resampler = "kfold", n.resamples = 5), bag.fn = median, x.name = NULL, y.name = NULL, save.mods = TRUE, save.tune = TRUE, bag.fitted = FALSE, outer.n.workers = 1, print.plot = FALSE, plot.fitted = FALSE, plot.predicted = TRUE, plot.theme = rtTheme, print.res.plot = FALSE, question = NULL, verbose = TRUE, res.verbose = FALSE, trace = 0, headless = FALSE, outdir = NULL, save.plots = FALSE, save.rt = ifelse(!is.null(outdir), TRUE, FALSE), save.mod = TRUE, save.res = FALSE, debug = FALSE, ... )
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
alg |
Character: Learner to use. Options: select_learn |
train.params |
Optional named list of parameters to be passed to
|
.preprocess |
Optional named list of parameters to be passed to
preprocess. Set using setup.preprocess,
e.g. |
.decompose |
Optional named list of parameters to be used for
decomposition / dimensionality reduction. Set using setup.decompose,
e.g. |
weights |
Numeric vector: Weights for cases. For classification, |
n.repeats |
Integer: Number of times to repeat the outer resampling. This was added for completeness; in practice, we use either k-fold cross-validation, e.g. 10-fold, especially in large samples, or a higher number of stratified subsamples, e.g. 25, for smaller samples |
outer.resampling |
List: Output of setup.resample to define outer resampling scheme |
inner.resampling |
List: Output of setup.resample to define inner resampling scheme |
bag.fn |
Function to use to average prediction if
|
x.name |
Character: Name of predictor dataset |
y.name |
Character: Name of outcome |
save.mods |
Logical: If TRUE, retain trained models in object, otherwise discard (save space if running many resamples). |
save.tune |
Logical: If TRUE, save the best.tune data frame for each resample (output of gridSearchLearn) |
bag.fitted |
Logical: If TRUE, use all models to also get a bagged prediction on the full sample. To get a bagged prediction on new data using the same models, use predict.rtModCV |
outer.n.workers |
Integer: Number of cores to use for the outer i.e. testing resamples. You are likely parallelizing either in the inner (tuning) or the learner itself is parallelized. Don't parallelize the parallelization |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
print.res.plot |
Logical: If TRUE, print model performance plot for each resample. Default = FALSE |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
res.verbose |
Logical: Passed to each individual learner's |
trace |
Integer: (Not really used) Print additional information if > 0. |
headless |
Logical: If TRUE, turn off all plotting. |
outdir |
Character: Path where output should be saved |
save.plots |
Logical: If TRUE, save plots to outdir |
save.rt |
Logical: If TRUE and |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
save.res |
Logical: If TRUE, save the full output of each model trained
on different resamples under subdirectories of |
debug |
Logical: If TRUE, sets |
... |
Additional train.params to be passed to learner. Will be concatenated
with |
Note on resampling: You should never use an outer resampling method with replacement if you will also be using an inner resampling (for tuning). The duplicated cases from the outer resampling may appear both in the training and testing sets of the inner resamples, leading to underestimated testing error.
If there is an error while running either the outer or inner resamples in parallel, the error message returned by R will likely be unhelpful. Repeat the command after setting both the inner and outer resampling to use a single core, which should produce an informative message.
The train_cv command replaces elevate.
Note: specifying id.strat for the inner resampling is not yet supported.
Object of class rtModCV (Regression) or rtModCVClass (Classification)
error.test.repeats |
the mean or aggregate error, as appropriate, for each repeat |
error.test.repeats.mean |
the mean error of all repeats, i.e. the mean of |
error.test.repeats.sd |
if |
error.test.res |
the error for each resample, for each repeat |
E.D. Gennatas
## Not run:
# Regression
x <- rnormmat(100, 50)
w <- rnorm(50)
y <- x %*% w + rnorm(100)
mod <- train_cv(x, y)
# Classification
data(Sonar, package = "mlbench")
mod <- train_cv(Sonar)
## End(Not run)
Print tunable hyperparameters for a supervised learning algorithm
tunable( alg = c("glmnet", "svm", "cart", "ranger", "gbm", "xgboost", "lightgbm") )
alg |
Character string: Algorithm name. |
Prints tunable hyperparameters for the specified algorithm.
EDG
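For example:

tunable("ranger") # prints tunable hyperparameters for the ranger algorithm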
Given an index of columns, convert identified columns of a data.frame to factor, ordered factor, or integer. A number of datasets are distributed with an accompanying index of this sort, especially to define which variables should be treated as categorical (here, factors) for predictive modeling. This function aims to make data type conversions in those cases easier.
typeset( x, factor.index = NULL, orderedfactor.index = NULL, integer.index = NULL )
x |
data frame: input whose columns' types you want to edit |
factor.index |
Integer, vector: Index of columns to be converted to
factors using |
orderedfactor.index |
Integer, vector: Index of columns to be
converted to ordered factors using |
integer.index |
Integer, vector: Index of columns to be converted to
integers using |
E.D. Gennatas
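A minimal sketch with a hypothetical index marking the second and third columns as factors:

dat <- data.frame(
  a = rnorm(5),
  b = sample(0:1, 5, replace = TRUE),
  c = sample(1:3, 5, replace = TRUE)
)
dat <- typeset(dat, factor.index = c(2, 3))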
Heart failure clinical records data from the UCI Machine Learning Repository
uci_heart_failure
uci_heart_failure
A data frame with 299 rows and 13 columns:
Boolean: If the patient died during the follow-up period
https://archive.ics.uci.edu/dataset/519/heart+failure+clinical+records
Get protein sequence from UniProt
uniprot_get( accession = "Q9UMX9", baseURL = "https://rest.uniprot.org/uniprotkb", verbosity = 1 )
accession |
Character: UniProt Accession number - e.g. "Q9UMX9" |
baseURL |
Character: UniProt rest API base URL. Default = "https://rest.uniprot.org/uniprotkb" |
verbosity |
Integer: If > 0, print messages to console |
List with two elements: Annotation & Sequence
E.D. Gennatas
## Not run:
mapt <- uniprot_get("Q9UMX9")
## End(Not run)
Get number of unique values per feature
uniquevalsperfeat(x, excludeNA = FALSE)
x |
matrix or data frame input |
excludeNA |
Logical: If TRUE, exclude NA values from unique count. |
Vector, integer of length NCOL(x) with number of unique values per column/feature
E.D. Gennatas
## Not run:
uniquevalsperfeat(iris)
## End(Not run)
Replace extreme values by absolute or quantile threshold
winsorize( x, lo = NULL, hi = NULL, prob.lo = 0.025, prob.hi = 0.975, quantile.type = 7, verbose = TRUE )
x |
Numeric vector: Input data |
lo |
Numeric: If not NULL, replace any values in |
hi |
Numeric: If not NULL, replace any values in |
prob.lo |
Numeric (0, 1): If not NULL and |
prob.hi |
Numeric (0, 1): If not NULL and |
quantile.type |
Integer: passed to |
verbose |
Logical: If TRUE, print messages to console. |
If both lo and prob.lo or both hi and prob.hi are NULL, cut-off is set to min(x) and max(x) respectively, i.e. no values are changed
E.D. Gennatas
# Winsorize a normally distributed variable
x <- rnorm(500)
xw <- winsorize(x)
# Winsorize an exponentially distributed variable only on
# the top 5% highest values
x <- rexp(500)
xw <- winsorize(x, prob.lo = NULL, prob.hi = .95)
Run a sparse Canonical Correlation Analysis using the PMA package
x_CCA( x, z, x.test = NULL, z.test = NULL, y = NULL, outcome = NULL, k = 3, niter = 20, nperms = 50, permute.niter = 15, typex = "standard", typez = "standard", penaltyx = NULL, penaltyz = NULL, standardize = TRUE, upos = FALSE, vpos = FALSE, verbose = TRUE, n.cores = rtCores, outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ... )
x |
Matrix: Training x dataset |
z |
Matrix: Training z dataset |
x.test |
Matrix (Optional): Testing x set |
z.test |
Matrix (Optional): Testing z set |
y |
Outcome vector (Optional): If supplied, linear combinations of |
outcome |
Character: Type of outcome |
k |
Integer: Number of components |
niter |
Integer: Number of iterations |
nperms |
Integer: Number of permutations to run with |
permute.niter |
Integer: Number of iterations to run for each permutation with |
typex |
Character: "standard", "ordered". Use "standard" if columns of x are unordered; lasso penalty is applied to enforce sparsity. Otherwise, use "ordered"; fused lasso penalty is applied, to enforce both sparsity and smoothness. |
typez |
Character: "standard", "ordered". Same as |
penaltyx |
Float: The penalty to be applied to the matrix x, i.e. the penalty that results in the canonical vector u. If typex is "standard" then the L1 bound on u is penaltyx*sqrt(ncol(x)). In this case penaltyx must be between 0 and 1 (larger L1 bound corresponds to less penalization). If "ordered" then it's the fused lasso penalty lambda, which must be non-negative (larger lambda corresponds to more penalization). |
penaltyz |
Float: The penalty to be applied to the matrix z, i.e. the penalty that results in the canonical vector v. If typez is "standard" then the L1 bound on v is penaltyz*sqrt(ncol(z)). In this case penaltyz must be between 0 and 1 (larger L1 bound corresponds to less penalization). If "ordered" then it's the fused lasso penalty lambda, which must be non-negative (larger lambda corresponds to more penalization). |
standardize |
Logical: If TRUE, center and scale columns of |
upos |
Logical: Require elements of u to be positive |
vpos |
Logical: Require elements of v to be positive |
verbose |
Logical: Print messages, including |
n.cores |
Integer: Number of cores to use |
outdir |
Path to output directory. Default = NULL |
save.mod |
Logical: If TRUE, and |
... |
Additional arguments to be passed to |
x_CCA runs PMA::CCA. If penaltyx is NULL, penaltyx and penaltyz will be estimated automatically using x_CCA.permute (adapted to run in parallel)
E.D. Gennatas
Other Cross-Decomposition:
xselect_decom()
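A minimal sketch with random data; requires the PMA package:

## Not run:
x <- matrix(rnorm(100 * 20), 100)
z <- matrix(rnorm(100 * 30), 100)
xcca <- x_CCA(x, z, k = 2) # penalties estimated automatically when NULL
## End(Not run)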
Read all sheets of an XLSX file into a list
xlsx2list( x, sheet = NULL, startRow = 1, colNames = TRUE, na.strings = "NA", detectDates = TRUE, skipEmptyRows = TRUE, skipEmptyCols = TRUE )
x |
Character: path or URL to XLSX file |
sheet |
Integer, vector: Sheet(s) to read. If NULL, will read all
sheets in |
startRow |
Integer, vector: First row to start reading. Will be recycled as needed for all sheets |
colNames |
Logical: If TRUE, use the first row of data as column names |
na.strings |
Character vector: strings to be interpreted as NA |
detectDates |
Logical: If TRUE, try to automatically detect dates |
skipEmptyRows |
Logical: If TRUE, skip empty rows |
skipEmptyCols |
Logical: If TRUE, skip empty columns |
E.D. Gennatas
Accepts decomposer name (supports abbreviations) and returns rtemis function name or the function itself. If run with no parameters, prints list of available algorithms.
xselect_decom(xdecom, fn = FALSE, desc = FALSE)
xdecom |
Character: Cross-decomposition name; case insensitive |
fn |
Logical: If TRUE, return function, otherwise return name of function. Default = FALSE |
desc |
Logical: If TRUE, return full name of algorithm. Default = FALSE |
Function or name of function (see param fn) or full name of algorithm (see param desc)
E.D. Gennatas
Other Cross-Decomposition:
x_CCA()
An attempt to emulate the xtdescribe function in Stata.
xtdescribe(x, ID_col = 1, time_col = 2, n_patterns = 9)
x |
data.frame with longitudinal data |
ID_col |
Integer: The column position of the ID variable |
time_col |
Integer: The column position of the time variable |
n_patterns |
Integer: The number of patterns to display |
EDG
Returns a data.table of longitude and latitude for one or more zip codes, given an input dataset
zip2longlat(x, zipdt)
x |
Character vector: Zip code(s) |
zipdt |
data.table with "zip", "lng", and "lat" columns |
Get distance between pairs of zip codes
zipdist(x, y, zipdt)
x |
Character vector |
y |
Character vector, same length as |
zipdt |
data.table with columns |
data.table
with distances in meters
E.D. Gennatas
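A hedged sketch; zipdt must be supplied by the user, e.g. from an external zip code dataset with "zip", "lng", and "lat" columns:

## Not run:
zipdist("94110", "10001", zipdt = zipdt) # zipdt: user-supplied lookup table
## End(Not run)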