Title: Tools for Flexible Survival Analysis Using Machine Learning
Description: Statistical tools for analyzing time-to-event data using machine learning. Implements survival stacking for conditional survival estimation, standardized survival function estimation for current status data, and methods for algorithm-agnostic variable importance. See Wolock CJ, Gilbert PB, Simon N, and Carone M (2024) <doi:10.1080/10618600.2024.2304070>.
Authors: Charles Wolock [aut, cre, cph]
Maintainer: Charles Wolock <[email protected]>
License: GPL (>= 3)
Version: 1.2.0.9000
Built: 2025-03-10 04:51:40 UTC
Source: https://github.com/cwolock/survml
Generate cross-fitted oracle prediction function estimates
crossfit_oracle_preds( time, event, X, folds, nuisance_preds, pred_generator, ... )
time |
n x 1 numeric vector of observed follow-up times. If there is censoring, these are the minimum of the event and censoring times. |
event |
n x 1 numeric vector of status indicators of whether an event was observed. |
X |
n x p data.frame of observed covariate values. |
folds |
n x 1 numeric vector of cross-fitting fold identifiers. |
nuisance_preds |
Named list of conditional event and censoring survival functions that will be used to estimate the oracle prediction function. |
pred_generator |
Function to be used to estimate oracle prediction function. |
... |
Additional arguments to be passed to pred_generator. |
Named list of cross-fitted oracle prediction estimates
Generate cross-fitted conditional survival predictions
crossfit_surv_preds(time, event, X, newtimes, folds, pred_generator, ...)
time |
n x 1 numeric vector of observed follow-up times. If there is censoring, these are the minimum of the event and censoring times. |
event |
n x 1 numeric vector of status indicators of whether an event was observed. |
X |
n x p data.frame of observed covariate values. |
newtimes |
Numeric vector of times at which to estimate the conditional survival functions |
folds |
n x 1 numeric vector of cross-fitting fold identifiers. |
pred_generator |
Function to be used to estimate conditional survival function. |
... |
Additional arguments to be passed to pred_generator. |
Named list of cross-fitted conditional survival predictions
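For orientation, here is a minimal usage sketch. The interface expected of pred_generator (its argument names and the S_hat/G_hat slots of its return value) is an assumption made for illustration; consult the package source for the authoritative contract.

## Not run:
# Hypothetical generator wrapping stackG(); this interface is assumed, not documented
my_generator <- function(time, event, X, newX, newtimes, ...) {
  fit <- stackG(time = time, event = event, X = X, newX = newX,
                newtimes = newtimes, bin_size = 0.5,
                time_basis = "continuous",
                SL_control = list(SL.library = "SL.mean", V = 2))
  list(S_hat = fit$S_T_preds, G_hat = fit$S_C_preds)
}

set.seed(1)
n <- 100
X <- data.frame(X1 = rnorm(n))
time <- rexp(n)
event <- rbinom(n, size = 1, prob = 0.8)
folds <- sample(rep(1:2, length.out = n))

preds <- crossfit_surv_preds(time = time, event = event, X = X,
                             newtimes = c(0.5, 1), folds = folds,
                             pred_generator = my_generator)
## End(Not run)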
Estimate a survival function under current status sampling
currstatCIR( time, event, X, SL_control = list(SL.library = c("SL.mean", "SL.glm"), V = 3), HAL_control = list(n_bins = c(5), grid_type = c("equal_mass"), V = 3), deriv_method = "m-spline", eval_region, n_eval_pts = 101, alpha = 0.05 )
time |
n x 1 numeric vector of observed monitoring times; may be NA for individuals who were never monitored. |
event |
n x 1 numeric vector of status indicators of whether the event had occurred by the monitoring time; may be NA for individuals who were never monitored. |
X |
n x p data.frame of observed covariate values. |
SL_control |
List of SuperLearner control parameters. This should be a named list; see the SuperLearner documentation for details. |
HAL_control |
List of haldensify control parameters, used for fitting the highly adaptive lasso. This should be a named list; see the haldensify documentation for details. |
deriv_method |
Method for computing the derivative. Options are "m-spline" (the default, based on fitting a monotone spline and differentiating) and "linear". |
eval_region |
Region over which to estimate the survival function. |
n_eval_pts |
Number of points in the grid at which to evaluate the survival function.
The points will be evenly spaced, on the quantile scale, between the endpoints of eval_region. |
alpha |
The level at which to compute confidence intervals and hypothesis tests. Defaults to 0.05 |
Data frame giving results, with columns:
t |
Time at which survival function is estimated |
S_hat_est |
Survival function estimate |
S_hat_cil |
Lower bound of confidence interval |
S_hat_ciu |
Upper bound of confidence interval |
## Not run:
# This is a small simulation example
set.seed(123)
n <- 300
x <- cbind(2*rbinom(n, size = 1, prob = 0.5) - 1,
           2*rbinom(n, size = 1, prob = 0.5) - 1)
t <- rweibull(n, shape = 0.75, scale = exp(0.4*x[,1] - 0.2*x[,2]))
y <- rweibull(n, shape = 0.75, scale = exp(0.4*x[,1] - 0.2*x[,2]))

# round y to nearest quantile of y, just so there aren't so many unique values
quants <- quantile(y, probs = seq(0, 1, by = 0.05), type = 1)
for (i in 1:length(y)){
  y[i] <- quants[which.min(abs(y[i] - quants))]
}
delta <- as.numeric(t <= y)

dat <- data.frame(y = y, delta = delta, x1 = x[,1], x2 = x[,2])
dat$delta[dat$y > 1.8] <- NA
dat$y[dat$y > 1.8] <- NA

eval_region <- c(0.05, 1.5)
res <- survML::currstatCIR(time = dat$y,
                           event = dat$delta,
                           X = dat[,3:4],
                           SL_control = list(SL.library = c("SL.mean", "SL.glm"),
                                             V = 3),
                           HAL_control = list(n_bins = c(5),
                                              grid_type = c("equal_mass"),
                                              V = 3),
                           eval_region = eval_region)

xvals <- res$t
yvals <- res$S_hat_est
fn <- stepfun(xvals, c(yvals[1], yvals))
plot.function(fn, from = min(xvals), to = max(xvals))
## End(Not run)
Generate oracle prediction function estimates using doubly-robust pseudo-outcome regression with SuperLearner
DR_pseudo_outcome_regression( time, event, X, newX, approx_times, S_hat, G_hat, newtimes, outcome, SL.library, V )
time |
n x 1 numeric vector of observed follow-up times. If there is censoring, these are the minimum of the event and censoring times. |
event |
n x 1 numeric vector of status indicators of whether an event was observed. |
X |
n x p data.frame of observed covariate values. |
newX |
m x p data.frame of new covariate values at which to obtain predictions. Must have the same names and structure as X. |
approx_times |
Numeric vector of length J2 giving times at which to approximate the integral appearing in the pseudo-outcomes |
S_hat |
n x J2 matrix of conditional event time survival function estimates at the times given in approx_times. |
G_hat |
n x J2 matrix of conditional censoring time survival function estimates at the times given in approx_times. |
newtimes |
Numeric vector of times at which to generate oracle prediction function estimates |
outcome |
Outcome type, either "survival_probability" or "restricted_survival_time". |
SL.library |
Super Learner library |
V |
Number of cross-validation folds, to be passed to SuperLearner. |
Matrix of predictions.
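A minimal workflow sketch, assuming the outcome options listed above and using stackG() to produce the nuisance estimates (this pairing is illustrative, not prescribed by the documentation):

## Not run:
set.seed(1)
n <- 100
X <- data.frame(X1 = rnorm(n))
T0 <- rexp(n, rate = exp(-1 + 0.5 * X$X1))
C0 <- rexp(n, rate = 0.2)
time <- pmin(T0, C0)
event <- as.numeric(T0 <= C0)
approx_times <- sort(unique(time))

# Event and censoring survival estimates on the approximation grid
nuis <- stackG(time = time, event = event, X = X, newX = X,
               newtimes = approx_times, bin_size = 0.25,
               time_basis = "continuous",
               SL_control = list(SL.library = "SL.mean", V = 2))

preds <- DR_pseudo_outcome_regression(
  time = time, event = event, X = X, newX = X,
  approx_times = approx_times,
  S_hat = nuis$S_T_preds, G_hat = nuis$S_C_preds,
  newtimes = stats::median(time),
  outcome = "survival_probability",
  SL.library = c("SL.mean", "SL.glm"),
  V = 3)
## End(Not run)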
Generate cross-fitting and sample-splitting folds
generate_folds(n, V, sample_split)
n |
Total sample size |
V |
Number of cross-fitting folds to use |
sample_split |
Logical, whether or not sample-splitting is being used |
Named list of cross-fitting and sample-splitting folds
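For instance:

# Create 5 cross-fitting folds, with sample-splitting, for 100 observations
folds <- generate_folds(n = 100, V = 5, sample_split = TRUE)
str(folds)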
Obtain predicted conditional survival and cumulative hazard functions from a global survival stacking object
## S3 method for class 'stackG' predict( object, newX, newtimes, surv_form = object$surv_form, time_grid_approx = object$time_grid_approx, ... )
object |
Object of class stackG. |
newX |
m x p data.frame of new observed covariate values at which to obtain predictions. Must have the same names and structure as the training covariates. |
newtimes |
k x 1 numeric vector of times at which to obtain predicted conditional survivals. |
surv_form |
Mapping from hazard estimate to survival estimate.
Can be either "PI" (product-integral mapping) or "exp" (exponentiated cumulative hazard). Defaults to the value saved in object. |
time_grid_approx |
Numeric vector of times at which to
approximate the product integral or cumulative hazard integral. Defaults to the value
saved in object. |
... |
Further arguments passed to or from other methods. |
A named list with the following components:
S_T_preds |
An m x k matrix of estimated event time survival probabilities at the m covariate vector values and k times provided by the user in newX and newtimes, respectively. |
S_C_preds |
An m x k matrix of estimated censoring time survival probabilities at the same covariate values and times. |
Lambda_T_preds |
An m x k matrix of estimated event time cumulative hazard values at the same covariate values and times. |
Lambda_C_preds |
An m x k matrix of estimated censoring time cumulative hazard values at the same covariate values and times. |
time_grid_approx |
The approximation grid for the product integral or cumulative hazard integral (user-specified). |
surv_form |
Exponential or product-integral form (user-specified). |
# This is a small simulation example
set.seed(123)
n <- 250
X <- data.frame(X1 = rnorm(n), X2 = rbinom(n, size = 1, prob = 0.5))

S0 <- function(t, x){
  pexp(t, rate = exp(-2 + x[,1] - x[,2] + .5 * x[,1] * x[,2]), lower.tail = FALSE)
}
T <- rexp(n, rate = exp(-2 + X[,1] - X[,2] + .5 * X[,1] * X[,2]))

G0 <- function(t, x){
  as.numeric(t < 15) * .9 * pexp(t,
                                 rate = exp(-2 - .5*x[,1] - .25*x[,2] + .5*x[,1]*x[,2]),
                                 lower.tail = FALSE)
}
C <- rexp(n, exp(-2 - .5 * X[,1] - .25 * X[,2] + .5 * X[,1] * X[,2]))
C[C > 15] <- 15

entry <- runif(n, 0, 15)

time <- pmin(T, C)
event <- as.numeric(T <= C)

sampled <- which(time >= entry)
X <- X[sampled,]
time <- time[sampled]
event <- event[sampled]
entry <- entry[sampled]

# Note that this is a very small Super Learner library, for computational purposes.
SL.library <- c("SL.mean", "SL.glm")

fit <- stackG(time = time,
              event = event,
              entry = entry,
              X = X,
              newX = X,
              newtimes = seq(0, 15, .1),
              direction = "prospective",
              bin_size = 0.1,
              time_basis = "continuous",
              time_grid_approx = sort(unique(time)),
              surv_form = "exp",
              learner = "SuperLearner",
              SL_control = list(SL.library = SL.library, V = 5))

preds <- predict(object = fit,
                 newX = X,
                 newtimes = seq(0, 15, 0.1))

plot(preds$S_T_preds[1,], S0(t = seq(0, 15, .1), X[1,]))
abline(0, 1, col = 'red')
Obtain predicted conditional survival function from a local survival stacking object
## S3 method for class 'stackL' predict(object, newX, newtimes, ...)
object |
Object of class stackL. |
newX |
m x p data.frame of new observed covariate values at which to obtain predictions. Must have the same names and structure as the training covariates. |
newtimes |
k x 1 numeric vector of times at which to obtain predicted conditional survivals. |
... |
Further arguments passed to or from other methods. |
A named list with the following components:
S_T_preds |
An m x k matrix of estimated event time survival probabilities at the m covariate vector values and k times provided by the user in newX and newtimes, respectively. |
# This is a small simulation example
set.seed(123)
n <- 500
X <- data.frame(X1 = rnorm(n), X2 = rbinom(n, size = 1, prob = 0.5))

S0 <- function(t, x){
  pexp(t, rate = exp(-2 + x[,1] - x[,2] + .5 * x[,1] * x[,2]), lower.tail = FALSE)
}
T <- rexp(n, rate = exp(-2 + X[,1] - X[,2] + .5 * X[,1] * X[,2]))

G0 <- function(t, x){
  as.numeric(t < 15) * .9 * pexp(t,
                                 rate = exp(-2 - .5*x[,1] - .25*x[,2] + .5*x[,1]*x[,2]),
                                 lower.tail = FALSE)
}
C <- rexp(n, exp(-2 - .5 * X[,1] - .25 * X[,2] + .5 * X[,1] * X[,2]))
C[C > 15] <- 15

entry <- runif(n, 0, 15)

time <- pmin(T, C)
event <- as.numeric(T <= C)

sampled <- which(time >= entry)
X <- X[sampled,]
time <- time[sampled]
event <- event[sampled]
entry <- entry[sampled]

# Note that this is a very small Super Learner library, for computational purposes.
SL.library <- c("SL.mean", "SL.glm")

fit <- stackL(time = time,
              event = event,
              entry = entry,
              X = X,
              newX = X,
              newtimes = seq(0, 15, .1),
              direction = "prospective",
              bin_size = 0.1,
              time_basis = "continuous",
              SL_control = list(SL.library = SL.library, V = 5))

preds <- predict(object = fit,
                 newX = X,
                 newtimes = seq(0, 15, 0.1))

plot(preds$S_T_preds[1,], S0(t = seq(0, 15, .1), X[1,]))
abline(0, 1, col = 'red')
Estimate a conditional survival function using global survival stacking
stackG( time, event = rep(1, length(time)), entry = NULL, X, newX = NULL, newtimes = NULL, direction = "prospective", time_grid_fit = NULL, bin_size = NULL, time_basis, time_grid_approx = sort(unique(time)), surv_form = "PI", learner = "SuperLearner", SL_control = list(SL.library = c("SL.mean"), V = 10, method = "method.NNLS", stratifyCV = FALSE), tau = NULL )
time |
n x 1 numeric vector of observed follow-up times. If there is censoring, these are the minimum of the event and censoring times. |
event |
n x 1 numeric vector of status indicators of whether an event was observed. Defaults to a vector of 1s, i.e. no censoring. |
entry |
Study entry variable, if applicable. Defaults to NULL, indicating no truncation. |
X |
n x p data.frame of observed covariate values on which to train the estimator. |
newX |
m x p data.frame of new observed covariate values at which to obtain m predictions. Must have the same names and structure as X. |
newtimes |
k x 1 numeric vector of times at which to obtain k predicted conditional survivals. |
direction |
Whether the data come from a prospective or retrospective study.
This determines whether the data are treated as subject to left truncation and
right censoring ("prospective", the default) or right truncation alone ("retrospective"). |
time_grid_fit |
Named list of numeric vectors of times on which to discretize
for estimation of cumulative probability functions. This is an alternative to
bin_size, allowing a tailored discretization grid for each cumulative probability function rather than a single quantile-based grid. |
bin_size |
Size of time bin on which to discretize for estimation
of cumulative probability functions. Can be a number between 0 and 1,
indicating the size of the quantile grid (e.g. 0.1 estimates the cumulative probability functions on a grid based on deciles of observed times). If NULL, a grid of all observed times is used. |
time_basis |
How to treat time for training the binary
classifier. Options are "continuous" and "dummy", meaning, respectively, that time is treated continuously or as a dummy variable. |
time_grid_approx |
Numeric vector of times at which to
approximate the product integral or cumulative hazard integral.
Defaults to sort(unique(time)). |
surv_form |
Mapping from hazard estimate to survival estimate.
Can be either "PI" (product-integral mapping, the default) or "exp" (exponentiated cumulative hazard); see the illustration after this argument list. |
learner |
Which binary regression algorithm to use. Currently, only
"SuperLearner" is supported. |
SL_control |
Named list of parameters controlling the Super Learner fitting
process. These parameters are passed directly to the SuperLearner function. |
tau |
The maximum time of interest in a study, used for
retrospective conditional survival estimation. Rather than dealing
with right truncation separately from left truncation, it is simpler to
estimate the survival function of tau minus the observed time. Defaults to NULL. |
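To make the surv_form options concrete, the following sketch (plain R, independent of the package) shows the two hazard-to-survival mappings applied to an illustrative vector of discrete hazard increments:

# illustrative discrete hazard increments on a time grid
dLambda <- c(0.05, 0.10, 0.08)
S_PI <- cumprod(1 - dLambda)    # product-integral form ("PI")
S_exp <- exp(-cumsum(dLambda))  # exponentiated cumulative hazard ("exp")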
A named list of class stackG, with the following components:
S_T_preds |
An m x k matrix of estimated event time survival probabilities at the m covariate vector values and k times provided by the user in newX and newtimes, respectively. |
S_C_preds |
An m x k matrix of estimated censoring time survival probabilities at the same covariate values and times. |
Lambda_T_preds |
An m x k matrix of estimated event time cumulative hazard values at the same covariate values and times. |
Lambda_C_preds |
An m x k matrix of estimated censoring time cumulative hazard values at the same covariate values and times. |
time_grid_approx |
The approximation grid for the product integral or cumulative hazard integral (user-specified). |
direction |
Whether the data come from a prospective or retrospective study (user-specified). |
tau |
The maximum time of interest in a study, used for retrospective conditional survival estimation (user-specified). |
surv_form |
Exponential or product-integral form (user-specified). |
time_basis |
Whether time was included in the regression as "continuous" or "dummy" (user-specified). |
SL_control |
Named list of parameters controlling the Super Learner fitting process (user-specified). |
fits |
A named list of fitted regression objects corresponding to the constituent regressions needed for
global survival stacking. Includes the binary regressions used to estimate the discretized conditional cumulative probability functions for the event, censoring, and (if applicable) entry variables. |
Wolock C.J., Gilbert P.B., Simon N., and Carone M. (2024). "A framework for leveraging machine learning tools to estimate personalized survival curves." Journal of Computational and Graphical Statistics.
predict.stackG for the stackG prediction method.
# This is a small simulation example
set.seed(123)
n <- 250
X <- data.frame(X1 = rnorm(n), X2 = rbinom(n, size = 1, prob = 0.5))

S0 <- function(t, x){
  pexp(t, rate = exp(-2 + x[,1] - x[,2] + .5 * x[,1] * x[,2]), lower.tail = FALSE)
}
T <- rexp(n, rate = exp(-2 + X[,1] - X[,2] + .5 * X[,1] * X[,2]))

G0 <- function(t, x){
  as.numeric(t < 15) * .9 * pexp(t,
                                 rate = exp(-2 - .5*x[,1] - .25*x[,2] + .5*x[,1]*x[,2]),
                                 lower.tail = FALSE)
}
C <- rexp(n, exp(-2 - .5 * X[,1] - .25 * X[,2] + .5 * X[,1] * X[,2]))
C[C > 15] <- 15

entry <- runif(n, 0, 15)

time <- pmin(T, C)
event <- as.numeric(T <= C)

sampled <- which(time >= entry)
X <- X[sampled,]
time <- time[sampled]
event <- event[sampled]
entry <- entry[sampled]

# Note that this is a very small Super Learner library, for computational purposes.
SL.library <- c("SL.mean", "SL.glm")

fit <- stackG(time = time,
              event = event,
              entry = entry,
              X = X,
              newX = X,
              newtimes = seq(0, 15, .1),
              direction = "prospective",
              bin_size = 0.1,
              time_basis = "continuous",
              time_grid_approx = sort(unique(time)),
              surv_form = "exp",
              learner = "SuperLearner",
              SL_control = list(SL.library = SL.library, V = 5))

plot(fit$S_T_preds[1,], S0(t = seq(0, 15, .1), X[1,]))
abline(0, 1, col = 'red')
Estimate a conditional survival function via local survival stacking
stackL( time, event = rep(1, length(time)), entry = NULL, X, newX, newtimes, direction = "prospective", bin_size = NULL, time_basis = "continuous", learner = "SuperLearner", SL_control = list(SL.library = c("SL.mean"), V = 10, method = "method.NNLS", stratifyCV = FALSE), tau = NULL )
time |
n x 1 numeric vector of observed follow-up times. If there is censoring, these are the minimum of the event and censoring times. |
event |
n x 1 numeric vector of status indicators of whether an event was observed. Defaults to a vector of 1s, i.e. no censoring. |
entry |
Study entry variable, if applicable. Defaults to NULL, indicating no truncation. |
X |
n x p data.frame of observed covariate values on which to train the estimator. |
newX |
m x p data.frame of new observed covariate values at which to obtain m predictions. Must have the same names and structure as X. |
newtimes |
k x 1 numeric vector of times at which to obtain k predicted conditional survivals. |
direction |
Whether the data come from a prospective or retrospective study.
This determines whether the data are treated as subject to left truncation and
right censoring ("prospective", the default) or right truncation alone ("retrospective"). |
bin_size |
Size of bins for the discretization of time. A value between 0 and 1 indicating the size of observed event time quantiles on which to grid times (e.g. 0.02 creates a grid of 50 times, evenly spaced on the quantile scale). If NULL, defaults to every observed event time. See the sketch after this argument list. |
time_basis |
How to treat time for training the binary
classifier. Options are "continuous" and "dummy", meaning, respectively, that time is treated continuously or as a dummy variable. |
learner |
Which binary regression algorithm to use. Currently, only
"SuperLearner" is supported. |
SL_control |
Named list of parameters controlling the Super Learner fitting
process. These parameters are passed directly to the SuperLearner function. |
tau |
The maximum time of interest in a study, used for
retrospective conditional survival estimation. Rather than dealing
with right truncation separately from left truncation, it is simpler to
estimate the survival function of tau minus the observed time. Defaults to NULL. |
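As a concrete illustration of the bin_size discretization (a sketch of the quantile-grid idea, not the package's internal code):

time <- rexp(100)
# bin_size = 0.1 corresponds to a grid at the deciles of the observed times
grid <- quantile(time, probs = seq(0, 1, by = 0.1), type = 1)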
A named list of class stackL, with the following components:
S_T_preds |
An m x k matrix of estimated event time survival probabilities at the m covariate vector values and k times provided by the user in newX and newtimes, respectively. |
fit |
The Super Learner fit for binary classification on the stacked dataset. |
Polley E.C. and van der Laan M.J. (2011). "Super Learning for Right-Censored Data" in Targeted Learning.
Craig E., Zhong C., and Tibshirani R. (2021). "Survival stacking: casting survival analysis as a classification problem."
predict.stackL for the stackL prediction method.
# This is a small simulation example
set.seed(123)
n <- 500
X <- data.frame(X1 = rnorm(n), X2 = rbinom(n, size = 1, prob = 0.5))

S0 <- function(t, x){
  pexp(t, rate = exp(-2 + x[,1] - x[,2] + .5 * x[,1] * x[,2]), lower.tail = FALSE)
}
T <- rexp(n, rate = exp(-2 + X[,1] - X[,2] + .5 * X[,1] * X[,2]))

G0 <- function(t, x){
  as.numeric(t < 15) * .9 * pexp(t,
                                 rate = exp(-2 - .5*x[,1] - .25*x[,2] + .5*x[,1]*x[,2]),
                                 lower.tail = FALSE)
}
C <- rexp(n, exp(-2 - .5 * X[,1] - .25 * X[,2] + .5 * X[,1] * X[,2]))
C[C > 15] <- 15

entry <- runif(n, 0, 15)

time <- pmin(T, C)
event <- as.numeric(T <= C)

sampled <- which(time >= entry)
X <- X[sampled,]
time <- time[sampled]
event <- event[sampled]
entry <- entry[sampled]

# Note that this is a very small Super Learner library, for computational purposes.
SL.library <- c("SL.mean", "SL.glm")

fit <- stackL(time = time,
              event = event,
              entry = entry,
              X = X,
              newX = X,
              newtimes = seq(0, 15, .1),
              direction = "prospective",
              bin_size = 0.1,
              time_basis = "continuous",
              SL_control = list(SL.library = SL.library, V = 5))

plot(fit$S_T_preds[1,], S0(t = seq(0, 15, .1), X[1,]))
abline(0, 1, col = 'red')
Estimate a variable importance measure (VIM)
vim( type, time, event, X, landmark_times = stats::quantile(time[event == 1], probs = c(0.25, 0.5, 0.75)), restriction_time = max(time[event == 1]), approx_times = NULL, large_feature_vector, small_feature_vector, conditional_surv_preds = NULL, large_oracle_preds = NULL, small_oracle_preds = NULL, conditional_surv_generator = NULL, conditional_surv_generator_control = NULL, large_oracle_generator = NULL, large_oracle_generator_control = NULL, small_oracle_generator = NULL, small_oracle_generator_control = NULL, cf_folds = NULL, cf_fold_num = 5, sample_split = TRUE, ss_folds = NULL, robust = TRUE, scale_est = FALSE, alpha = 0.05, verbose = FALSE )
type |
Type of VIM to compute. Options include "accuracy", "AUC", "Brier", "C-index", "R-squared", and "survival_time_MSE". |
time |
n x 1 numeric vector of observed follow-up times. If there is censoring, these are the minimum of the event and censoring times. |
event |
n x 1 numeric vector of status indicators of whether an event was observed. |
X |
n x p data.frame of observed covariate values. |
landmark_times |
Numeric vector of length J1 giving
landmark times at which to estimate the VIM (for the landmark-time-based VIM types). Defaults to the quartiles of the observed event times. |
restriction_time |
Maximum follow-up time for calculation of the C-index and restricted survival time MSE VIMs. Defaults to the maximum observed event time. |
approx_times |
Numeric vector of length J2 giving times at which to approximate integrals. Defaults to a grid of 100 timepoints, evenly spaced on the quantile scale of the distribution of observed event times. |
large_feature_vector |
Numeric vector giving indices of features to include in the 'large' prediction model. |
small_feature_vector |
Numeric vector giving indices of features to include in the 'small' prediction model. Must be a
subset of |
conditional_surv_preds |
User-provided estimates of the conditional survival functions of the event and censoring
variables given the full covariate vector (if not using the built-in nuisance estimation procedure). |
large_oracle_preds |
User-provided estimates of the oracle prediction function using the large feature vector. |
small_oracle_preds |
User-provided estimates of the oracle prediction function using the small feature vector. |
conditional_surv_generator |
A user-written function to estimate the conditional survival functions of the event and censoring variables. Must take arguments
time, event, X, and newtimes. |
conditional_surv_generator_control |
A list of arguments to pass to conditional_surv_generator. |
large_oracle_generator |
A user-written function to estimate the oracle prediction function using the large feature vector. |
large_oracle_generator_control |
A list of arguments to pass to large_oracle_generator. |
small_oracle_generator |
A user-written function to estimate the oracle prediction function using the small feature vector. |
small_oracle_generator_control |
A list of arguments to pass to small_oracle_generator. |
cf_folds |
Numeric vector of length n giving cross-fitting fold identifiers. |
cf_fold_num |
The number of cross-fitting folds, if not providing cf_folds. |
sample_split |
Logical indicating whether or not to sample split |
ss_folds |
Numeric vector of length n giving sample-splitting fold identifiers. |
robust |
Logical, whether or not to use the doubly-robust debiasing approach. This option
is meant for illustration purposes only; it should be left as TRUE. |
scale_est |
Logical, whether or not to force the VIM estimate to be nonnegative |
alpha |
The level at which to compute confidence intervals and hypothesis tests. Defaults to 0.05 |
verbose |
Whether to print progress messages. |
Named list with the following elements:
result |
Data frame giving results. See the documentation of the individual vim_* functions (e.g. vim_AUC) for the columns of this data frame. |
folds |
A named list giving the cross-fitting fold IDs (cf_folds) and sample-splitting fold IDs (ss_folds). |
approx_times |
A vector of times used to approximate integrals appearing in the form of the VIM estimator. |
conditional_surv_preds |
A named list containing the estimated conditional event and censoring survival functions. |
large_oracle_preds |
A named list containing the estimated large oracle prediction function. |
small_oracle_preds |
A named list containing the estimated small oracle prediction function. |
See also: vim_accuracy, vim_AUC, vim_brier, vim_cindex, vim_rsquared, vim_survival_time_mse.
# This is a small simulation example
set.seed(123)
n <- 100
X <- data.frame(X1 = rnorm(n), X2 = rbinom(n, size = 1, prob = 0.5))

T <- rexp(n, rate = exp(-2 + X[,1] - X[,2] + .5 * X[,1] * X[,2]))
C <- rexp(n, exp(-2 - .5 * X[,1] - .25 * X[,2] + .5 * X[,1] * X[,2]))
C[C > 15] <- 15

time <- pmin(T, C)
event <- as.numeric(T <= C)

# landmark times for AUC
landmark_times <- c(3)

output <- vim(type = "AUC",
              time = time,
              event = event,
              X = X,
              landmark_times = landmark_times,
              large_feature_vector = 1:2,
              small_feature_vector = 2,
              conditional_surv_generator_control = list(SL.library = c("SL.mean", "SL.glm")),
              large_oracle_generator_control = list(SL.library = c("SL.mean", "SL.glm")),
              small_oracle_generator_control = list(SL.library = c("SL.mean", "SL.glm")),
              cf_fold_num = 2,
              sample_split = FALSE,
              scale_est = TRUE)

print(output$result)
Estimate classification accuracy VIM
vim_accuracy( time, event, approx_times, landmark_times, f_hat, fs_hat, S_hat, G_hat, cf_folds, sample_split, ss_folds, scale_est = FALSE, alpha = 0.05 )
time |
n x 1 numeric vector of observed follow-up times. If there is censoring, these are the minimum of the event and censoring times. |
event |
n x 1 numeric vector of status indicators of whether an event was observed. |
approx_times |
Numeric vector of length J1 giving times at which to approximate integrals. |
landmark_times |
Numeric vector of length J2 giving times at which to estimate accuracy |
f_hat |
Full oracle predictions (n x J2 matrix, with columns indexed by landmark_times) |
fs_hat |
Residual oracle predictions (n x J2 matrix) |
S_hat |
Estimates of conditional event time survival function (n x J1 matrix, with columns indexed by approx_times) |
G_hat |
Estimate of conditional censoring time survival function (n x J1 matrix) |
cf_folds |
Numeric vector of length n giving cross-fitting folds |
sample_split |
Logical indicating whether or not to sample split |
ss_folds |
Numeric vector of length n giving sample-splitting folds |
scale_est |
Logical, whether or not to force the VIM estimate to be nonnegative |
alpha |
The level at which to compute confidence intervals and hypothesis tests. Defaults to 0.05 |
A data frame giving results, with the following columns:
landmark_time |
Time at which the classification accuracy is evaluated. |
est |
VIM point estimate. |
var_est |
Estimated variance of the VIM estimate. |
cil |
Lower bound of the VIM confidence interval. |
ciu |
Upper bound of the VIM confidence interval. |
cil_1sided |
Lower bound of a one-sided confidence interval. |
p |
p-value corresponding to a hypothesis test of null importance. |
large_predictiveness |
Estimated predictiveness of the large oracle prediction function. |
small_predictiveness |
Estimated predictiveness of the small oracle prediction function. |
vim |
VIM type. |
large_feature_vector |
Group of features available for the large oracle prediction function. |
small_feature_vector |
Group of features available for the small oracle prediction function. |
Estimate AUC VIM
vim_AUC( time, event, approx_times, landmark_times, f_hat, fs_hat, S_hat, G_hat, cf_folds, sample_split, ss_folds, robust = TRUE, scale_est = FALSE, alpha = 0.05 )
time |
n x 1 numeric vector of observed follow-up times. If there is censoring, these are the minimum of the event and censoring times. |
event |
n x 1 numeric vector of status indicators of whether an event was observed. |
approx_times |
Numeric vector of length J1 giving times at which to approximate integrals. |
landmark_times |
Numeric vector of length J2 giving times at which to estimate AUC |
f_hat |
Full oracle predictions (n x J2 matrix, with columns indexed by landmark_times) |
fs_hat |
Residual oracle predictions (n x J2 matrix) |
S_hat |
Estimates of conditional event time survival function (n x J1 matrix, with columns indexed by approx_times) |
G_hat |
Estimate of conditional censoring time survival function (n x J1 matrix) |
cf_folds |
Numeric vector of length n giving cross-fitting folds |
sample_split |
Logical indicating whether or not to sample split |
ss_folds |
Numeric vector of length n giving sample-splitting folds |
robust |
Logical, whether or not to use the doubly-robust debiasing approach. This option
is meant for illustration purposes only; it should be left as TRUE. |
scale_est |
Logical, whether or not to force the VIM estimate to be nonnegative |
alpha |
The level at which to compute confidence intervals and hypothesis tests. Defaults to 0.05 |
A data frame giving results, with the following columns:
landmark_time |
Time at which AUC is evaluated. |
est |
VIM point estimate. |
var_est |
Estimated variance of the VIM estimate. |
cil |
Lower bound of the VIM confidence interval. |
ciu |
Upper bound of the VIM confidence interval. |
cil_1sided |
Lower bound of a one-sided confidence interval. |
p |
p-value corresponding to a hypothesis test of null importance. |
large_predictiveness |
Estimated predictiveness of the large oracle prediction function. |
small_predictiveness |
Estimated predictiveness of the small oracle prediction function. |
vim |
VIM type. |
large_feature_vector |
Group of features available for the large oracle prediction function. |
small_feature_vector |
Group of features available for the small oracle prediction function. |
vim for example usage
Estimate Brier score VIM
vim_brier( time, event, approx_times, landmark_times, f_hat, fs_hat, S_hat, G_hat, cf_folds, ss_folds, sample_split, scale_est = FALSE, alpha = 0.05 )
time |
n x 1 numeric vector of observed follow-up times. If there is censoring, these are the minimum of the event and censoring times. |
event |
n x 1 numeric vector of status indicators of whether an event was observed. |
approx_times |
Numeric vector of length J1 giving times at which to approximate integrals. |
landmark_times |
Numeric vector of length J2 giving times at which to estimate Brier score |
f_hat |
Full oracle predictions (n x J2 matrix, with columns indexed by landmark_times) |
fs_hat |
Residual oracle predictions (n x J2 matrix) |
S_hat |
Estimates of conditional event time survival function (n x J1 matrix, with columns indexed by approx_times) |
G_hat |
Estimate of conditional censoring time survival function (n x J1 matrix) |
cf_folds |
Numeric vector of length n giving cross-fitting folds |
ss_folds |
Numeric vector of length n giving sample-splitting folds |
sample_split |
Logical indicating whether or not to sample split |
scale_est |
Logical, whether or not to force the VIM estimate to be nonnegative |
alpha |
The level at which to compute confidence intervals and hypothesis tests. Defaults to 0.05 |
A data frame giving results, with the following columns:
landmark_time |
Time at which the Brier score is evaluated. |
est |
VIM point estimate. |
var_est |
Estimated variance of the VIM estimate. |
cil |
Lower bound of the VIM confidence interval. |
ciu |
Upper bound of the VIM confidence interval. |
cil_1sided |
Lower bound of a one-sided confidence interval. |
p |
p-value corresponding to a hypothesis test of null importance. |
large_predictiveness |
Estimated predictiveness of the large oracle prediction function. |
small_predictiveness |
Estimated predictiveness of the small oracle prediction function. |
vim |
VIM type. |
large_feature_vector |
Group of features available for the large oracle prediction function. |
small_feature_vector |
Group of features available for the small oracle prediction function. |
vim for example usage
Estimate concordance index VIM
vim_cindex( time, event, approx_times, restriction_time, f_hat, fs_hat, S_hat, G_hat, cf_folds, sample_split, ss_folds, scale_est = FALSE, alpha = 0.05 )
time |
n x 1 numeric vector of observed follow-up times. If there is censoring, these are the minimum of the event and censoring times. |
event |
n x 1 numeric vector of status indicators of whether an event was observed. |
approx_times |
Numeric vector of length J1 giving times at which to approximate integrals. |
restriction_time |
Restriction time (upper bound for event times to be compared in computing the C-index) |
f_hat |
Full oracle predictions (n x J1 matrix) |
fs_hat |
Residual oracle predictions (n x J1 matrix) |
S_hat |
Estimates of conditional event time survival function (n x J1 matrix, with columns indexed by approx_times) |
G_hat |
Estimate of conditional censoring time survival function (n x J1 matrix) |
cf_folds |
Numeric vector of length n giving cross-fitting folds |
sample_split |
Logical indicating whether or not to sample split |
ss_folds |
Numeric vector of length n giving sample-splitting folds |
scale_est |
Logical, whether or not to force the VIM estimate to be nonnegative |
alpha |
The level at which to compute confidence intervals and hypothesis tests. Defaults to 0.05 |
A data frame giving results, with the following columns:
restriction_time |
Restriction time (upper bound for event times to be compared in computing the C-index). |
est |
VIM point estimate. |
var_est |
Estimated variance of the VIM estimate. |
cil |
Lower bound of the VIM confidence interval. |
ciu |
Upper bound of the VIM confidence interval. |
cil_1sided |
Lower bound of a one-sided confidence interval. |
p |
p-value corresponding to a hypothesis test of null importance. |
large_predictiveness |
Estimated predictiveness of the large oracle prediction function. |
small_predictiveness |
Estimated predictiveness of the small oracle prediction function. |
vim |
VIM type. |
large_feature_vector |
Group of features available for the large oracle prediction function. |
small_feature_vector |
Group of features available for the small oracle prediction function. |
vim for example usage
Estimate R-squared (proportion of explained variance) VIM based on event occurrence by a landmark time
vim_rsquared( time, event, approx_times, landmark_times, f_hat, fs_hat, S_hat, G_hat, cf_folds, ss_folds, sample_split, scale_est = FALSE, alpha = 0.05 )
time |
n x 1 numeric vector of observed follow-up times. If there is censoring, these are the minimum of the event and censoring times. |
event |
n x 1 numeric vector of status indicators of whether an event was observed. |
approx_times |
Numeric vector of length J1 giving times at which to approximate integrals. |
landmark_times |
Numeric vector of length J2 giving times at which to estimate R-squared |
f_hat |
Full oracle predictions (n x J2 matrix, with columns indexed by landmark_times) |
fs_hat |
Residual oracle predictions (n x J2 matrix) |
S_hat |
Estimates of conditional event time survival function (n x J1 matrix, with columns indexed by approx_times) |
G_hat |
Estimate of conditional censoring time survival function (n x J1 matrix) |
cf_folds |
Numeric vector of length n giving cross-fitting folds |
ss_folds |
Numeric vector of length n giving sample-splitting folds |
sample_split |
Logical indicating whether or not to sample split |
scale_est |
Logical, whether or not to force the VIM estimate to be nonnegative |
alpha |
The level at which to compute confidence intervals and hypothesis tests. Defaults to 0.05 |
A data frame giving results, with the following columns:
landmark_time |
Time at which the R-squared is evaluated. |
est |
VIM point estimate. |
var_est |
Estimated variance of the VIM estimate. |
cil |
Lower bound of the VIM confidence interval. |
ciu |
Upper bound of the VIM confidence interval. |
cil_1sided |
Lower bound of a one-sided confidence interval. |
p |
p-value corresponding to a hypothesis test of null importance. |
large_predictiveness |
Estimated predictiveness of the large oracle prediction function. |
small_predictiveness |
Estimated predictiveness of the small oracle prediction function. |
vim |
VIM type. |
large_feature_vector |
Group of features available for the large oracle prediction function. |
small_feature_vector |
Group of features available for the small oracle prediction function. |
vim for example usage
Estimate restricted predicted survival time MSE VIM
vim_survival_time_mse( time, event, approx_times, restriction_time, f_hat, fs_hat, S_hat, G_hat, cf_folds, sample_split, ss_folds, scale_est = FALSE, alpha = 0.05 )
time |
n x 1 numeric vector of observed follow-up times. If there is censoring, these are the minimum of the event and censoring times. |
event |
n x 1 numeric vector of status indicators of whether an event was observed. |
approx_times |
Numeric vector of length J1 giving times at which to approximate integrals. |
restriction_time |
Restriction time (upper bound for event times used in computing the restricted survival time). |
f_hat |
Full oracle predictions (n x J1 matrix) |
fs_hat |
Residual oracle predictions (n x J1 matrix) |
S_hat |
Estimates of conditional event time survival function (n x J1 matrix, with columns indexed by approx_times) |
G_hat |
Estimate of conditional censoring time survival function (n x J1 matrix) |
cf_folds |
Numeric vector of length n giving cross-fitting folds |
sample_split |
Logical indicating whether or not to sample split |
ss_folds |
Numeric vector of length n giving sample-splitting folds |
scale_est |
Logical, whether or not to force the VIM estimate to be nonnegative |
alpha |
The level at which to compute confidence intervals and hypothesis tests. Defaults to 0.05 |
A data frame giving results, with the following columns:
restriction_time |
Restriction time (upper bound for event times to be compared in computing the restricted survival time). |
est |
VIM point estimate. |
var_est |
Estimated variance of the VIM estimate. |
cil |
Lower bound of the VIM confidence interval. |
ciu |
Upper bound of the VIM confidence interval. |
cil_1sided |
Lower bound of a one-sided confidence interval. |
p |
p-value corresponding to a hypothesis test of null importance. |
large_predictiveness |
Estimated predictiveness of the large oracle prediction function. |
small_predictiveness |
Estimated predictiveness of the small oracle prediction function. |
vim |
VIM type. |
large_feature_vector |
Group of features available for the large oracle prediction function. |
small_feature_vector |
Group of features available for the small oracle prediction function. |
vim for example usage