Title: | Prediction Error Curves for Risk Prediction Models in Survival Analysis |
---|---|
Description: | Validation of risk predictions obtained from survival models and competing risk models based on censored data using inverse weighting and cross-validation. Most of the 'pec' functionality has been moved to 'riskRegression'. |
Authors: | Thomas A. Gerds |
Maintainer: | Thomas A. Gerds <[email protected]> |
License: | GPL (>= 2) |
Version: | 2023.04.12 |
Built: | 2024-11-04 04:19:58 UTC |
Source: | https://github.com/tagteam/pec |
Calibration plots for risk prediction models in right censored survival and competing risks data
calPlot( object, time, formula, data, splitMethod = "none", B = 1, M, pseudo, type, showPseudo, pseudo.col = NULL, pseudo.pch = NULL, method = "nne", round = TRUE, bandwidth = NULL, q = 10, bars = FALSE, hanging = FALSE, names = "quantiles", showFrequencies = FALSE, jack.density = 55, plot = TRUE, add = FALSE, diag = !add, legend = !add, axes = !add, xlim = c(0, 1), ylim = c(0, 1), xlab, ylab, col, lwd, lty, pch, cause = 1, percent = TRUE, giveToModel = NULL, na.action = na.fail, cores = 1, verbose = FALSE, cex = 1, ... )
calPlot( object, time, formula, data, splitMethod = "none", B = 1, M, pseudo, type, showPseudo, pseudo.col = NULL, pseudo.pch = NULL, method = "nne", round = TRUE, bandwidth = NULL, q = 10, bars = FALSE, hanging = FALSE, names = "quantiles", showFrequencies = FALSE, jack.density = 55, plot = TRUE, add = FALSE, diag = !add, legend = !add, axes = !add, xlim = c(0, 1), ylim = c(0, 1), xlab, ylab, col, lwd, lty, pch, cause = 1, percent = TRUE, giveToModel = NULL, na.action = na.fail, cores = 1, verbose = FALSE, cex = 1, ... )
object |
A named list of prediction models, where allowed
entries are (1) R-objects for which a predictSurvProb method
exists (see details), (2) a |
time |
The evaluation time point at predicted event probabilities are plotted against pseudo-observed event status. |
formula |
A survival or event history formula. The left hand
side is used to compute the expected event status. If
|
data |
A data frame in which to validate the prediction models
and to fit the censoring model. If |
splitMethod |
Defines the internal validation design:
|
B |
The number of cross-validation steps. |
M |
The size of the subsamples for cross-validation. |
pseudo |
Logical. Determines the method for estimating expected event status:
|
type |
Either "risk" or "survival". |
showPseudo |
If |
pseudo.col |
Colour for pseudo-values. |
pseudo.pch |
Dot type (see par) for pseudo-values. |
method |
The method for estimating the calibration curve(s):
|
round |
If |
bandwidth |
The bandwidth for |
q |
The number of quantiles for |
bars |
If |
hanging |
Barplots only. If |
names |
Barplots only. Names argument passed to |
showFrequencies |
Barplots only. If |
jack.density |
Gray scale for pseudo-observations. |
plot |
If |
add |
If |
diag |
If |
legend |
If |
axes |
If |
xlim |
Limits of x-axis. |
ylim |
Limits of y-axis. |
xlab |
Label for y-axis. |
ylab |
Label for x-axis. |
col |
Vector with colors, one for each element of
object. Passed to |
lwd |
Vector with line widths, one for each element of
object. Passed to |
lty |
lwd Vector with line style, one for each element of
object. Passed to |
pch |
Passed to |
cause |
For competing risks models, the cause of failure or event of interest |
percent |
If TRUE axes labels are multiplied by 100 and thus interpretable on a percent scale. |
giveToModel |
List of with exactly one entry for each entry in
|
na.action |
Passed to |
cores |
Number of cores for parallel computing. Passed as
value of argument |
verbose |
if |
cex |
Default cex used for legend and labels. |
... |
Used to control the subroutines: plot, axis, lines, barplot,
legend. See |
For method "nne" the optimal bandwidth with respect to is obtained with the
function dpik
from the package KernSmooth
for a box
kernel function.
list with elements: time, pseudoFrame and bandwidth (NULL for method quantile).
Thomas Alexander Gerds [email protected]
library(prodlim) library(lava) library(riskRegression) library(survival) # survival dlearn <- SimSurv(40) dval <- SimSurv(100) f <- coxph(Surv(time,status)~X1+X2,data=dlearn,x=TRUE,y=TRUE) cf=calPlot(f,time=3,data=dval) print(cf) plot(cf) g <- coxph(Surv(time,status)~X2,data=dlearn,x=TRUE,y=TRUE) cf2=calPlot(list("Cox regression X1+X2"=f,"Cox regression X2"=g), time=3, type="risk", data=dval) print(cf2) plot(cf2) calPlot(f,time=3,data=dval,type="survival") calPlot(f,time=3,data=dval,bars=TRUE,pseudo=FALSE) calPlot(f,time=3,data=dval,bars=TRUE,type="risk",pseudo=FALSE) ## show a red line which follows the hanging bars calPlot(f,time=3,data=dval,bars=TRUE,hanging=TRUE) a <- calPlot(f,time=3,data=dval,bars=TRUE,hanging=TRUE,abline.col=NULL) lines(c(0,1,ceiling(a$xcoord)), c(a$offset[1],a$offset,a$offset[length(a$offset)]), col=2,lwd=5,type="s") calPlot(f,time=3,data=dval,bars=TRUE,type="risk",hanging=TRUE) set.seed(13) m <- crModel() regression(m, from = "X1", to = "eventtime1") <- 1 regression(m, from = "X2", to = "eventtime1") <- 1 m <- addvar(m,c("X3","X4","X5")) distribution(m, "X1") <- binomial.lvm() distribution(m, "X4") <- binomial.lvm() d1 <- sim(m,100) d2 <- sim(m,100) csc <- CSC(Hist(time,event)~X1+X2+X3+X4+X5,data=d1) fgr <- FGR(Hist(time,event)~X1+X2+X3+X4+X5,data=d1,cause=1) if ((requireNamespace("cmprsk",quietly=TRUE))){ predict.crr <- cmprsk:::predict.crr cf3=calPlot(list("Cause-specific Cox"=csc,"Fine-Gray"=fgr), time=5, legend.x=-0.3, legend.y=1.35, ylab="Observed event status", legend.legend=c("Cause-specific Cox regression","Fine-Gray regression"), legend.xpd=NA) print(cf3) plot(cf3) b1 <- calPlot(list("Fine-Gray"=fgr),time=5,bars=TRUE,hanging=FALSE) print(b1) plot(b1) calPlot(fgr,time=5,bars=TRUE,hanging=TRUE) }
library(prodlim) library(lava) library(riskRegression) library(survival) # survival dlearn <- SimSurv(40) dval <- SimSurv(100) f <- coxph(Surv(time,status)~X1+X2,data=dlearn,x=TRUE,y=TRUE) cf=calPlot(f,time=3,data=dval) print(cf) plot(cf) g <- coxph(Surv(time,status)~X2,data=dlearn,x=TRUE,y=TRUE) cf2=calPlot(list("Cox regression X1+X2"=f,"Cox regression X2"=g), time=3, type="risk", data=dval) print(cf2) plot(cf2) calPlot(f,time=3,data=dval,type="survival") calPlot(f,time=3,data=dval,bars=TRUE,pseudo=FALSE) calPlot(f,time=3,data=dval,bars=TRUE,type="risk",pseudo=FALSE) ## show a red line which follows the hanging bars calPlot(f,time=3,data=dval,bars=TRUE,hanging=TRUE) a <- calPlot(f,time=3,data=dval,bars=TRUE,hanging=TRUE,abline.col=NULL) lines(c(0,1,ceiling(a$xcoord)), c(a$offset[1],a$offset,a$offset[length(a$offset)]), col=2,lwd=5,type="s") calPlot(f,time=3,data=dval,bars=TRUE,type="risk",hanging=TRUE) set.seed(13) m <- crModel() regression(m, from = "X1", to = "eventtime1") <- 1 regression(m, from = "X2", to = "eventtime1") <- 1 m <- addvar(m,c("X3","X4","X5")) distribution(m, "X1") <- binomial.lvm() distribution(m, "X4") <- binomial.lvm() d1 <- sim(m,100) d2 <- sim(m,100) csc <- CSC(Hist(time,event)~X1+X2+X3+X4+X5,data=d1) fgr <- FGR(Hist(time,event)~X1+X2+X3+X4+X5,data=d1,cause=1) if ((requireNamespace("cmprsk",quietly=TRUE))){ predict.crr <- cmprsk:::predict.crr cf3=calPlot(list("Cause-specific Cox"=csc,"Fine-Gray"=fgr), time=5, legend.x=-0.3, legend.y=1.35, ylab="Observed event status", legend.legend=c("Cause-specific Cox regression","Fine-Gray regression"), legend.xpd=NA) print(cf3) plot(cf3) b1 <- calPlot(list("Fine-Gray"=fgr),time=5,bars=TRUE,hanging=FALSE) print(b1) plot(b1) calPlot(fgr,time=5,bars=TRUE,hanging=TRUE) }
In survival analysis, a pair of patients is called concordant if the risk of the event predicted by a model is lower for the patient who experiences the event at a later timepoint. The concordance probability (C-index) is the frequency of concordant pairs among all pairs of subjects. It can be used to measure and compare the discriminative power of a risk prediction models. The function provides an inverse of the probability of censoring weigthed estimate of the concordance probability to adjust for right censoring. Cross-validation based on bootstrap resampling or bootstrap subsampling can be applied to assess and compare the discriminative power of various regression modelling strategies on the same set of data.
cindex( object, formula, data, eval.times, pred.times, cause, lyl = FALSE, cens.model = "marginal", ipcw.refit = FALSE, ipcw.args = NULL, ipcw.limit, tiedPredictionsIn = TRUE, tiedOutcomeIn = TRUE, tiedMatchIn = TRUE, splitMethod = "noPlan", B, M, model.args = NULL, model.parms = NULL, keep.models = FALSE, keep.residuals = FALSE, keep.pvalues = FALSE, keep.weights = FALSE, keep.index = FALSE, keep.matrix = FALSE, multiSplitTest = FALSE, testTimes, confInt = FALSE, confLevel = 0.95, verbose = TRUE, savePath = NULL, slaveseed = NULL, na.action = na.fail, ... )
cindex( object, formula, data, eval.times, pred.times, cause, lyl = FALSE, cens.model = "marginal", ipcw.refit = FALSE, ipcw.args = NULL, ipcw.limit, tiedPredictionsIn = TRUE, tiedOutcomeIn = TRUE, tiedMatchIn = TRUE, splitMethod = "noPlan", B, M, model.args = NULL, model.parms = NULL, keep.models = FALSE, keep.residuals = FALSE, keep.pvalues = FALSE, keep.weights = FALSE, keep.index = FALSE, keep.matrix = FALSE, multiSplitTest = FALSE, testTimes, confInt = FALSE, confLevel = 0.95, verbose = TRUE, savePath = NULL, slaveseed = NULL, na.action = na.fail, ... )
object |
A named list of prediction models, where allowed entries are
(1) R-objects for which a predictSurvProb method exists (see
details), (2) a |
formula |
A survival formula. The left hand side is used to finde the
status response variable in |
data |
A data frame in which to validate the prediction models and to
fit the censoring model. If |
eval.times |
A vector of timepoints for evaluating the discriminative
ability. At each timepoint the c-index is computed using only those pairs
where one of the event times is known to be earlier than this timepoint. If
|
pred.times |
A vector of timepoints for evaluating the prediction
models. This should either be exactly one timepoint used for all
|
cause |
For competing risks, the event of interest. Defaults to the
first state of the response, which is obtained by evaluating the left hand
side of |
lyl |
If |
cens.model |
Method for estimating inverse probability of censoring weigths:
|
ipcw.refit |
If |
ipcw.args |
List of arguments passed to function specified by argument |
ipcw.limit |
Value between 0 and 1 (but no equal to 0!) used to cut for small weights in order to stabilize the estimate at late times were few individuals are observed. |
tiedPredictionsIn |
If |
tiedOutcomeIn |
If |
tiedMatchIn |
If |
splitMethod |
Defines the internal validation design:
|
B |
Number of bootstrap samples. The default depends on argument
|
M |
The size of the bootstrap samples for resampling without replacement. Ignored for resampling with replacement. |
model.args |
List of extra arguments that can be passed to the
|
model.parms |
Experimental. List of with exactly one entry for each
entry in |
keep.models |
Logical. If |
keep.residuals |
Experimental. |
keep.pvalues |
Experimental. |
keep.weights |
Experimental. |
keep.index |
Logical. If |
keep.matrix |
Logical. If |
multiSplitTest |
Experimental. |
testTimes |
A vector of time points for testing differences between models in the time-point specific Brier scores. |
confInt |
Experimental. |
confLevel |
Experimental. |
verbose |
if |
savePath |
Place in your filesystem (directory) where training models
fitted during cross-validation are saved. If |
slaveseed |
Vector of seeds, as long as |
na.action |
Passed immediately to model.frame. Defaults to na.fail. If set otherwise most prediction models will not work. |
... |
Not used. |
Pairs with identical observed times, where one is uncensored and one is
censored, are always considered usuable (independent of the value of
tiedOutcomeIn
), as it can be assumed that the event occurs at a later
timepoint for the censored observation.
For uncensored response the result equals the one obtained with the
functions rcorr.cens
and rcorrcens
from the Hmisc
package (see examples).
Estimates of the C-index.
Thomas A Gerds [email protected]
TA Gerds, MW Kattan, M Schumacher, and C Yu. Estimating a time-dependent concordance index for survival prediction models with covariate dependent censoring. Statistics in Medicine, Ahead of print:to appear, 2013. DOI = 10.1002/sim.5681
Wolbers, M and Koller, MT and Witteman, JCM and Gerds, TA (2013) Concordance for prognostic models with competing risks Research report 13/3. Department of Biostatistics, University of Copenhagen
Andersen, PK (2012) A note on the decomposition of number of life years lost according to causes of death Research report 12/2. Department of Biostatistics, University of Copenhagen
Paul Blanche, Michael W Kattan, and Thomas A Gerds. The c-index is not proper for the evaluation of-year predicted risks. Biostatistics, 20(2): 347–357, 2018.
# simulate data based on Weibull regression library(prodlim) set.seed(13) dat <- SimSurv(100) # fit three different Cox models and a random survival forest # note: low number of trees for the purpose of illustration library(survival) cox12 <- coxph(Surv(time,status)~X1+X2,data=dat,x=TRUE,y=TRUE) cox1 <- coxph(Surv(time,status)~X1,data=dat,x=TRUE,y=TRUE) cox2 <- coxph(Surv(time,status)~X2,data=dat,x=TRUE,y=TRUE) # # compute the apparent estimate of the C-index at a single time point # A1 <- pec::cindex(list("Cox X1"=cox1), formula=Surv(time,status)~X1+X2, data=dat, eval.times=10) # # compute the apparent estimate of the C-index at different time points # ApparrentCindex <- pec::cindex(list("Cox X1"=cox1, "Cox X2"=cox2, "Cox X1+X2"=cox12), formula=Surv(time,status)~X1+X2, data=dat, eval.times=seq(1,15,1)) print(ApparrentCindex) plot(ApparrentCindex) # # compute the bootstrap-crossvalidation estimate of # the C-index at different time points # set.seed(142) bcvCindex <- pec::cindex(list("Cox X1"=cox1, "Cox X2"=cox2, "Cox X1+X2"=cox12), formula=Surv(time,status)~X1+X2, data=dat, splitMethod="bootcv", B=5, eval.times=seq(1,15,1)) print(bcvCindex) plot(bcvCindex) # for uncensored data the results are the same # as those obtained with the function rcorr.cens from Hmisc set.seed(16) dat <- SimSurv(30) dat$staus=1 fit12 <- coxph(Surv(time,status)~X1+X2,data=dat,x=TRUE,y=TRUE) fit1 <- coxph(Surv(time,status)~X1,data=dat,x=TRUE,y=TRUE) fit2 <- coxph(Surv(time,status)~X2,data=dat,x=TRUE,y=TRUE) Cpec <- pec::cindex(list("Cox X1+X2"=fit12,"Cox X1"=fit1,"Cox X2"=fit2), formula=Surv(time,status)~1, data=dat) p1 <- predictSurvProb(fit1,newdata=dat,times=10) p2 <- predictSurvProb(fit2,newdata=dat,times=10) p12 <- predictSurvProb(fit12,newdata=dat,times=10) if (requireNamespace("Hmisc",quietly=TRUE)){ library(Hmisc) harrelC1 <- rcorr.cens(p1,with(dat,Surv(time,status))) harrelC2 <- rcorr.cens(p2,with(dat,Surv(time,status))) harrelC12 <- rcorr.cens(p12,with(dat,Surv(time,status))) harrelC1[["C Index"]]==Cpec$AppCindex[["Cox.X1"]] harrelC2[["C Index"]]==Cpec$AppCindex[["Cox.X2"]] harrelC12[["C Index"]]==Cpec$AppCindex[["Cox.X1.X2"]] } # # competing risks # library(riskRegression) library(prodlim) set.seed(30) dcr.learn <- SimCompRisk(30) dcr.val <- SimCompRisk(30) pec::cindex(CSC(Hist(time,event)~X1+X2,data=dcr.learn),data=dcr.val) fit <- CSC(Hist(time,event)~X1+X2,data=dcr.learn) cif <- predictRisk(fit,newdata=dcr.val,times=3,cause=1) pec::cindex(list(fit),data=dcr.val,times=3)
# simulate data based on Weibull regression library(prodlim) set.seed(13) dat <- SimSurv(100) # fit three different Cox models and a random survival forest # note: low number of trees for the purpose of illustration library(survival) cox12 <- coxph(Surv(time,status)~X1+X2,data=dat,x=TRUE,y=TRUE) cox1 <- coxph(Surv(time,status)~X1,data=dat,x=TRUE,y=TRUE) cox2 <- coxph(Surv(time,status)~X2,data=dat,x=TRUE,y=TRUE) # # compute the apparent estimate of the C-index at a single time point # A1 <- pec::cindex(list("Cox X1"=cox1), formula=Surv(time,status)~X1+X2, data=dat, eval.times=10) # # compute the apparent estimate of the C-index at different time points # ApparrentCindex <- pec::cindex(list("Cox X1"=cox1, "Cox X2"=cox2, "Cox X1+X2"=cox12), formula=Surv(time,status)~X1+X2, data=dat, eval.times=seq(1,15,1)) print(ApparrentCindex) plot(ApparrentCindex) # # compute the bootstrap-crossvalidation estimate of # the C-index at different time points # set.seed(142) bcvCindex <- pec::cindex(list("Cox X1"=cox1, "Cox X2"=cox2, "Cox X1+X2"=cox12), formula=Surv(time,status)~X1+X2, data=dat, splitMethod="bootcv", B=5, eval.times=seq(1,15,1)) print(bcvCindex) plot(bcvCindex) # for uncensored data the results are the same # as those obtained with the function rcorr.cens from Hmisc set.seed(16) dat <- SimSurv(30) dat$staus=1 fit12 <- coxph(Surv(time,status)~X1+X2,data=dat,x=TRUE,y=TRUE) fit1 <- coxph(Surv(time,status)~X1,data=dat,x=TRUE,y=TRUE) fit2 <- coxph(Surv(time,status)~X2,data=dat,x=TRUE,y=TRUE) Cpec <- pec::cindex(list("Cox X1+X2"=fit12,"Cox X1"=fit1,"Cox X2"=fit2), formula=Surv(time,status)~1, data=dat) p1 <- predictSurvProb(fit1,newdata=dat,times=10) p2 <- predictSurvProb(fit2,newdata=dat,times=10) p12 <- predictSurvProb(fit12,newdata=dat,times=10) if (requireNamespace("Hmisc",quietly=TRUE)){ library(Hmisc) harrelC1 <- rcorr.cens(p1,with(dat,Surv(time,status))) harrelC2 <- rcorr.cens(p2,with(dat,Surv(time,status))) harrelC12 <- rcorr.cens(p12,with(dat,Surv(time,status))) harrelC1[["C Index"]]==Cpec$AppCindex[["Cox.X1"]] harrelC2[["C Index"]]==Cpec$AppCindex[["Cox.X2"]] harrelC12[["C Index"]]==Cpec$AppCindex[["Cox.X1.X2"]] } # # competing risks # library(riskRegression) library(prodlim) set.seed(30) dcr.learn <- SimCompRisk(30) dcr.val <- SimCompRisk(30) pec::cindex(CSC(Hist(time,event)~X1+X2,data=dcr.learn),data=dcr.val) fit <- CSC(Hist(time,event)~X1+X2,data=dcr.learn) cif <- predictRisk(fit,newdata=dcr.val,times=3,cause=1) pec::cindex(list(fit),data=dcr.val,times=3)
This data set contains a subset of the data from the Copenhagen stroke study.
This data frame contains the observations of 518 stroke patients :
Age of the patients in years.
A factor
with two levels female
and male
.
Hypertension,
a factor with two levels no
and yes
.
History of
ischemic heart disease at admission, a factor with two levels no
and
yes
.
History of previous strokes before admission,
a factor with two levels no
and yes
.
History of other disabling diseases (e.g. severe
dementia), a factor with two levels no
and yes
.
Daily alcohol consumption, a factor with two levels no
and yes
.
Diabetes mellitus status indicating if the
glucose level was higher than 11 mmol/L, a factor with two levels no
and yes
.
Daily smoking status, a factor with two levels
no
and yes
.
Atrial fibrillation, a factor
with two levels no
and yes
.
Hemorrhage (stroke
subtype), a factor with two levels no
(infarction) and yes
(hemorrhage).
Scandinavian stroke score at admission to the hospital. Ranges from 0 (worst) to 58 (best).
Cholesterol level
Survival time (in days).
Status (0
: censored, 1
: event).
Joergensen HS, Nakayama H, Reith J, Raaschou HO, and Olsen TS. Acute stroke with atrial fibrillation. The Copenhagen Stroke Study. Stroke, 27(10):1765-9, 1996.
Mogensen UB, Ishwaran H, and Gerds TA. Evaluating random forests for survival analysis using prediction error curves. Technical Report 8, University of Copenhagen, Department of Biostatistics, 2010.
CoxBoost
of package CoxBoost
.Formula interface for function CoxBoost
of package CoxBoost
.
coxboost(formula, data, cv = TRUE, cause = 1, penalty, ...)
coxboost(formula, data, cv = TRUE, cause = 1, penalty, ...)
formula |
An event-history formula for competing risks of the
form |
data |
A data.frame in which the variables of formula are defined. |
cv |
If |
cause |
The cause of interest in competing risk models. |
penalty |
See |
... |
Arguments passed to either |
See CoxBoost
.
See CoxBoost
.
Thomas Alexander Gerds [email protected]
See CoxBoost
.
See CoxBoost
.
Computes the cumulative prediction error curves, aka integrated Brier scores, in ranges of time.
crps(object, models, what, times, start)
crps(object, models, what, times, start)
object |
An object with estimated prediction error curves obtained with the function pec |
models |
Which models in |
what |
The name of the entry in |
times |
Time points at which the integration of the prediction error curve stops. |
start |
The time point at which the integration of the prediction error curve is started. |
The cumulative prediction error (continuous ranked probability score) is defined as the area under the prediction error curve, hence the alias name, ibs, which is short for integrated Brier score.
A matrix with a column for the crps (ibs) at every requested time point and a row for each model
Thomas A. Gerds [email protected]
E. Graf et al. (1999), Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine, vol 18, pp= 2529–2545.
Gerds TA, Cai T & Schumacher M (2008) The performance of risk prediction models Biometrical Journal, 50(4), 457–479
set.seed(18713) library(prodlim) library(survival) dat=SimSurv(100) pmodel=coxph(Surv(time,status)~X1+X2,data=dat,x=TRUE,y=TRUE) perror=pec(list(Cox=pmodel),Hist(time,status)~1,data=dat) ## cumulative prediction error crps(perror,times=1) # between min time and 1 ## same thing: ibs(perror,times=1) # between min time and 1 crps(perror,times=1,start=0) # between 0 and 1 crps(perror,times=seq(0,1,.2),start=0) # between 0 and seq(0,1,.2)
set.seed(18713) library(prodlim) library(survival) dat=SimSurv(100) pmodel=coxph(Surv(time,status)~X1+X2,data=dat,x=TRUE,y=TRUE) perror=pec(list(Cox=pmodel),Hist(time,status)~1,data=dat) ## cumulative prediction error crps(perror,times=1) # between min time and 1 ## same thing: ibs(perror,times=1) # between min time and 1 crps(perror,times=1,start=0) # between 0 and 1 crps(perror,times=seq(0,1,.2),start=0) # between 0 and seq(0,1,.2)
A data frame containing the observations from the GBSG2 study.
This data frame contains the observations of 686 women:
hormonal therapy, a factor at two levels no
and
yes
.
of the patients in years.
menopausal status, a factor at two levels pre
(premenopausal) and post
(postmenopausal).
tumor size (in mm).
tumor grade, a ordered factor at levels I < II
< III
.
number of positive nodes.
progesterone receptor (in fmol).
estrogen receptor (in fmol).
recurrence free survival time (in days).
censoring indicator (0- censored, 1- event).
M. Schumacher, G. Basert, H. Bojar, K. Huebner, M. Olschewski,
W. Sauerbrei, C. Schmoor, C. Beyerle, R.L.A. Neumann and H.F. Rauschecker
for the German Breast Cancer Study Group (1994), Randomized
trial evaluating hormonal treatment and the duration of chemotherapy in
node-positive breast cancer patients. Journal of Clinical Oncology,
12, 2086–2093.
This function is used internally by the function pec
to obtain
inverse of the probability of censoring weights.
ipcw( formula, data, method, args, times, subjectTimes, subjectTimesLag = 1, what )
ipcw( formula, data, method, args, times, subjectTimes, subjectTimesLag = 1, what )
formula |
A survival formula like, |
data |
The data used for fitting the censoring model |
method |
Censoring model used for estimation of the (conditional) censoring distribution. |
args |
A list of arguments which is passed to method |
times |
For |
subjectTimes |
For |
subjectTimesLag |
If equal to |
what |
Decide about what to do: If equal to
|
Inverse of the probability of censoring weights (IPCW) usually refer to the probabilities of not being censored at certain time points. These probabilities are also the values of the conditional survival function of the censoring time given covariates. The function ipcw estimates the conditional survival function of the censoring times and derives the weights.
IMPORTANT: the data set should be ordered, order(time,-status)
in
order to get the values IPCW.subjectTimes
in the right order for some
choices of method
.
times |
The times at which weights are estimated |
IPCW.times |
Estimated weights at |
IPCW.subjectTimes |
Estimated weights at individual time values
|
fit |
The fitted censoring model |
method |
The method for modelling the censoring distribution |
call |
The call |
Thomas A. Gerds [email protected]
library(prodlim) library(rms) library(survival) dat=SimSurv(30) dat <- dat[order(dat$time),] # using the marginal Kaplan-Meier for the censoring times WKM=ipcw(Hist(time,status)~X2, data=dat, method="marginal", times=sort(unique(dat$time)), subjectTimes=dat$time) plot(WKM$fit) WKM$fit # using the Cox model for the censoring times given X2 library(survival) WCox=ipcw(Hist(time=time,event=status)~X2, data=dat, method="cox", times=sort(unique(dat$time)), subjectTimes=dat$time) WCox$fit plot(WKM$fit) lines(sort(unique(dat$time)), 1-WCox$IPCW.times[1,], type="l", col=2, lty=3, lwd=3) lines(sort(unique(dat$time)), 1-WCox$IPCW.times[5,], type="l", col=3, lty=3, lwd=3) # using the stratified Kaplan-Meier # for the censoring times given X2 WKM2=ipcw(Hist(time,status)~X2, data=dat, method="nonpar", times=sort(unique(dat$time)), subjectTimes=dat$time) plot(WKM2$fit,add=FALSE)
library(prodlim) library(rms) library(survival) dat=SimSurv(30) dat <- dat[order(dat$time),] # using the marginal Kaplan-Meier for the censoring times WKM=ipcw(Hist(time,status)~X2, data=dat, method="marginal", times=sort(unique(dat$time)), subjectTimes=dat$time) plot(WKM$fit) WKM$fit # using the Cox model for the censoring times given X2 library(survival) WCox=ipcw(Hist(time=time,event=status)~X2, data=dat, method="cox", times=sort(unique(dat$time)), subjectTimes=dat$time) WCox$fit plot(WKM$fit) lines(sort(unique(dat$time)), 1-WCox$IPCW.times[1,], type="l", col=2, lty=3, lwd=3) lines(sort(unique(dat$time)), 1-WCox$IPCW.times[5,], type="l", col=3, lty=3, lwd=3) # using the stratified Kaplan-Meier # for the censoring times given X2 WKM2=ipcw(Hist(time,status)~X2, data=dat, method="nonpar", times=sort(unique(dat$time)), subjectTimes=dat$time) plot(WKM2$fit,add=FALSE)
PBC3 was a multi-centre randomized clinical trial conducted in six European hospitals. Between 1 Jan. 1983 and 1 Jan. 1987, 349 patients with the liver disease primary biliary cirrhosis (PBC) were randomized to either treatment with Cyclosporin A (CyA, 176 patients) or placebo (173 patients). The purpose of the trial was to study the effect of treatment on the survival time. However, during the course of the trial an increased use of liver transplantation for patients with this disease made the investigators redefine the main response variable to be time to “failure of medical treatment” defined as either death or liver transplantation. Patients were then followed from randomization until treatment failure, drop-out or 1 Jan, 1989; 61 patients died (CyA: 30, placebo: 31), another 29 were transplanted (CyA: 14, placebo: 15) and 4 patients were lost to follow-up before 1 Jan. 1989. At entry a number of clinical, biochemical and histological variables, including serum bilirubin, serum albumin, sex, age were recorded.
A data frame with 349 observations on the following 15 variables.
patient identification
hospital (1: Hvidovre, 2: London, 3: Copenhagen, 4: Barcelona, 5: Munich, 6: Lyon)
treatment (0: placebo, 1: CyA)
(1: males, 0: females)
age in years
histological stage (1, 2, 3, 4)
previous gastrointestinal bleeding (1: yes, 0: no)
creatinine (micromoles/L)
albumin (g/L)
bilirubin (micromoles/L)
alkaline phosphatase (IU/L)
aspartate transaminase (IU/L)
body weight (kg)
observation time (days)
status at observation time (0: censored, 1: liver transplantation, 2 : dead)
Andersen and Skovgaard. Regression with linear predictors.
Andersen and Skovgaard. Regression with linear predictors. Springer, 2010.
data(Pbc3)
data(Pbc3)
Evaluating the performance of risk prediction models in survival analysis. The Brier score is a weighted average of the squared distances between the observed survival status and the predicted survival probability of a model. Roughly the weights correspond to the probabilities of not being censored. The weights can be estimated depend on covariates. Prediction error curves are obtained when the Brier score is followed over time. Cross-validation based on bootstrap resampling or bootstrap subsampling can be applied to assess and compare the predictive power of various regression modelling strategies on the same set of data.
pec( object, formula, data, traindata, times, cause, start, maxtime, exact = TRUE, exactness = 100, fillChar = NA, cens.model = "cox", ipcw.refit = FALSE, ipcw.args = NULL, splitMethod = "none", B, M, reference = TRUE, model.args = NULL, model.parms = NULL, keep.index = FALSE, keep.matrix = FALSE, keep.models = FALSE, keep.residuals = FALSE, keep.pvalues = FALSE, noinf.permute = FALSE, multiSplitTest = FALSE, testIBS, testTimes, confInt = FALSE, confLevel = 0.95, verbose = TRUE, savePath = NULL, slaveseed = NULL, na.action = na.fail, ... )
pec( object, formula, data, traindata, times, cause, start, maxtime, exact = TRUE, exactness = 100, fillChar = NA, cens.model = "cox", ipcw.refit = FALSE, ipcw.args = NULL, splitMethod = "none", B, M, reference = TRUE, model.args = NULL, model.parms = NULL, keep.index = FALSE, keep.matrix = FALSE, keep.models = FALSE, keep.residuals = FALSE, keep.pvalues = FALSE, noinf.permute = FALSE, multiSplitTest = FALSE, testIBS, testTimes, confInt = FALSE, confLevel = 0.95, verbose = TRUE, savePath = NULL, slaveseed = NULL, na.action = na.fail, ... )
object |
A named list of prediction models, where allowed entries are
(1) R-objects for which a predictSurvProb method exists (see
details), (2) a |
formula |
A survival formula as obtained either
with |
data |
A data frame in which to validate the prediction models and to
fit the censoring model. If |
traindata |
A data frame in which the models are trained. This argument is used only in the absence of crossvalidation, in which case it is passed to the predictHandler function predictSurvProb |
times |
A vector of time points. At each time point the prediction
error curves are estimated. If |
cause |
For competing risks, the event of interest. Defaults to the
first state of the response, which is obtained by evaluating the left hand
side of |
start |
Minimal time for estimating the prediction error curves. If
missing and |
maxtime |
Maximal time for estimating the prediction error curves. If missing the largest value of the response variable is used. |
exact |
Logical. If |
exactness |
An integer that determines how many equidistant gridpoints
are used between |
fillChar |
Symbol used to fill-in places where the values of the
prediction error curves are not available. The default is |
cens.model |
Method for estimating inverse probability of censoring weigths:
|
ipcw.refit |
If |
ipcw.args |
List of arguments passed to function specified by argument |
splitMethod |
SplitMethod for estimating the prediction error curves.
The random splitting is repeated
|
B |
Number of bootstrap samples. The default depends on argument
|
M |
The size of the bootstrap samples for resampling without replacement. Ignored for resampling with replacement. |
reference |
Logical. If |
model.args |
List of extra arguments that can be passed to the
|
model.parms |
Experimental. List of with exactly one entry for each
entry in |
keep.index |
Logical. If |
keep.matrix |
Logical. If |
keep.models |
Logical. If |
keep.residuals |
Logical. If |
keep.pvalues |
For |
noinf.permute |
If |
multiSplitTest |
If |
testIBS |
A range of time points for testing differences between models in the integrated Brier scores. |
testTimes |
A vector of time points for testing differences between models in the time-point specific Brier scores. |
confInt |
Experimental. |
confLevel |
Experimental. |
verbose |
if |
savePath |
Place in your file system (i.e., a directory on your
computer) where training models fitted during cross-validation are saved. If
|
slaveseed |
Vector of seeds, as long as |
na.action |
Passed immediately to model.frame. Defaults to na.fail. If set otherwise most prediction models will not work. |
... |
Not used. |
Note that package riskRegression provides very similar functionality (and much more) but not yet every feature of pec.
Missing data in the response or in the input matrix cause a failure.
The status of the continuous response variable at cutpoints (times
),
ie status=1 if the response value exceeds the cutpoint and status=0
otherwise, is compared to predicted event status probabilities which are
provided by the prediction models on the basis of covariates. The
comparison is done with the Brier score: the quadratic difference between
0-1 response status and predicted probability.
There are two different sources for bias when estimating prediction error in
right censored survival problems: censoring and high flexibility of the
prediction model. The first is controlled by inverse probability of
censoring weighting, the second can be controlled by special Monte Carlo
simulation. In each step, the resampling procedures reevaluate the
prediction model. Technically this is done by replacing the argument
object$call$data
by the current subset or bootstrap sample of the
full data.
For each prediction model there must be a predictSurvProb
method: for
example, to assess a prediction model which evaluates to a myclass
object one defines a function called predictSurvProb.myclass
with
arguments object,newdata,cutpoints,...
Such a function takes the object and derives a matrix with predicted event status probabilities for each subject in newdata (rows) and each cutpoint (column) of the response variable that defines an event status.
Currently, predictSurvProb
methods are readily available for
various survival models, see methods(predictSurvProb)
A pec
object. See also the help pages of the corresponding
print
, summary
, and plot
methods. The object includes
the following components:
PredErr |
The estimated prediction error
according to the |
AppErr |
The training error or apparent error obtained when the
model(s) are evaluated in the same data where they were trained. Only if
|
BootCvErr |
The prediction error when the model(s)
are trained in the bootstrap sample and evaluated in the data that are not
in the bootstrap sample. Only if |
NoInfErr |
The prediction
error when the model(s) are evaluated in the permuted data. Only if
|
weight |
The weight used to linear combine the
|
overfit |
Estimated |
call |
The call that produced the object |
time |
The time points at which the prediction error curves change. |
ipcw.fit |
The fitted censoring model that was used for re-weighting the Brier score residuals. See Gerds and Schumacher (2006, Biometrical Journal) |
n.risk |
The number of subjects at risk for all time points. |
models |
The prediction models fitted in their own data. |
cens.model |
The censoring models. |
maxtime |
The latest timepoint where the prediction error curves are estimated. |
start |
The earliest timepoint where the prediction error curves are estimated. |
exact |
|
splitMethod |
The splitMethod used for estimation of the overfitting bias. |
Thomas Alexander Gerds [email protected]
Gerds TA, Kattan MW. Medical Risk Prediction Models: With Ties to Machine Learning. Chapman and Hall/CRC https://www.routledge.com/9781138384477
Ulla B. Mogensen, Hemant Ishwaran, Thomas A. Gerds (2012). Evaluating Random Forests for Survival Analysis Using Prediction Error Curves. Journal of Statistical Software, 50(11), 1-23. DOI 10.18637/jss.v050.i11
E. Graf et al. (1999), Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine, vol 18, pp= 2529–2545.
Efron, Tibshirani (1997) Journal of the American Statistical Association 92, 548–560 Improvement On Cross-Validation: The .632+ Bootstrap Method.
Gerds, Schumacher (2006), Consistent estimation of the expected Brier score in general survival models with right-censored event times. Biometrical Journal, vol 48, 1029–1040.
Thomas A. Gerds, Martin Schumacher (2007) Efron-Type Measures of Prediction Error for Survival Analysis Biometrics, 63(4), 1283–1287 doi:10.1111/j.1541-0420.2007.00832.x
Martin Schumacher, Harald Binder, and Thomas Gerds. Assessment of survival prediction models based on microarray data. Bioinformatics, 23(14):1768-74, 2007.
Mark A. van de Wiel, Johannes Berkhof, and Wessel N. van Wieringen Testing the prediction error difference between 2 predictors Biostatistics (2009) 10(3): 550-560 doi:10.1093/biostatistics/kxp011
plot.pec
, summary.pec
,
R2
, crps
# simulate an artificial data frame # with survival response and two predictors set.seed(130971) library(prodlim) library(survival) dat <- SimSurv(100) # fit some candidate Cox models and compute the Kaplan-Meier estimate Models <- list("Cox.X1"=coxph(Surv(time,status)~X1,data=dat,x=TRUE,y=TRUE), "Cox.X2"=coxph(Surv(time,status)~X2,data=dat,x=TRUE,y=TRUE), "Cox.X1.X2"=coxph(Surv(time,status)~X1+X2,data=dat,x=TRUE,y=TRUE)) # compute the apparent prediction error PredError <- pec(object=Models, formula=Surv(time,status)~X1+X2, data=dat, exact=TRUE, cens.model="marginal", splitMethod="none", B=0, verbose=TRUE) print(PredError,times=seq(5,30,5)) summary(PredError) plot(PredError,xlim=c(0,30)) # Comparison of Weibull model and Cox model library(survival) library(rms) library(pec) data(pbc) pbc <- pbc[sample(1:NROW(pbc),size=100),] f1 <- psm(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc) f2 <- coxph(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc,x=TRUE,y=TRUE) f3 <- cph(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc,surv=TRUE) brier <- pec(list("Weibull"=f1,"CoxPH"=f2,"CPH"=f3),data=pbc,formula=Surv(time,status!=0)~1) print(brier) plot(brier) # compute the .632+ estimate of the generalization error set.seed(130971) library(prodlim) library(survival) dat <- SimSurv(100) set.seed(17100) PredError.632plus <- pec(object=Models, formula=Surv(time,status)~X1+X2, data=dat, exact=TRUE, cens.model="marginal", splitMethod="Boot632plus", B=3, verbose=TRUE) print(PredError.632plus,times=seq(4,12,4)) summary(PredError.632plus) plot(PredError.632plus,xlim=c(0,30)) # do the same again but now in parallel ## Not run: set.seed(17100) # library(doMC) # registerDoMC() PredError.632plus <- pec(object=Models, formula=Surv(time,status)~X1+X2, data=dat, exact=TRUE, cens.model="marginal", splitMethod="Boot632plus", B=3, verbose=TRUE) ## End(Not run) # assessing parametric survival models in learn/validation setting learndat <- SimSurv(50) testdat <- SimSurv(30) library(survival) library(rms) f1 <- psm(Surv(time,status)~X1+X2,data=learndat) f2 <- psm(Surv(time,status)~X1,data=learndat) pf <- pec(list(f1,f2),formula=Surv(time,status)~1,data=testdat,maxtime=200) plot(pf) summary(pf) # ---------------- competing risks ----------------- library(survival) library(riskRegression) if(requireNamespace("cmprsk",quietly=TRUE)){ library(cmprsk) library(pec) cdat <- SimCompRisk(100) f1 <- CSC(Hist(time,event)~X1+X2,cause=2,data=cdat) f2 <- CSC(Hist(time,event)~X1,data=cdat,cause=2) f3 <- FGR(Hist(time,event)~X1+X2,cause=2,data=cdat) f4 <- FGR(Hist(time,event)~X1+X2,cause=2,data=cdat) p1 <- pec(list(f1,f2,f3,f4),formula=Hist(time,event)~1,data=cdat,cause=2) }
# simulate an artificial data frame # with survival response and two predictors set.seed(130971) library(prodlim) library(survival) dat <- SimSurv(100) # fit some candidate Cox models and compute the Kaplan-Meier estimate Models <- list("Cox.X1"=coxph(Surv(time,status)~X1,data=dat,x=TRUE,y=TRUE), "Cox.X2"=coxph(Surv(time,status)~X2,data=dat,x=TRUE,y=TRUE), "Cox.X1.X2"=coxph(Surv(time,status)~X1+X2,data=dat,x=TRUE,y=TRUE)) # compute the apparent prediction error PredError <- pec(object=Models, formula=Surv(time,status)~X1+X2, data=dat, exact=TRUE, cens.model="marginal", splitMethod="none", B=0, verbose=TRUE) print(PredError,times=seq(5,30,5)) summary(PredError) plot(PredError,xlim=c(0,30)) # Comparison of Weibull model and Cox model library(survival) library(rms) library(pec) data(pbc) pbc <- pbc[sample(1:NROW(pbc),size=100),] f1 <- psm(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc) f2 <- coxph(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc,x=TRUE,y=TRUE) f3 <- cph(Surv(time,status!=0)~edema+log(bili)+age+sex+albumin,data=pbc,surv=TRUE) brier <- pec(list("Weibull"=f1,"CoxPH"=f2,"CPH"=f3),data=pbc,formula=Surv(time,status!=0)~1) print(brier) plot(brier) # compute the .632+ estimate of the generalization error set.seed(130971) library(prodlim) library(survival) dat <- SimSurv(100) set.seed(17100) PredError.632plus <- pec(object=Models, formula=Surv(time,status)~X1+X2, data=dat, exact=TRUE, cens.model="marginal", splitMethod="Boot632plus", B=3, verbose=TRUE) print(PredError.632plus,times=seq(4,12,4)) summary(PredError.632plus) plot(PredError.632plus,xlim=c(0,30)) # do the same again but now in parallel ## Not run: set.seed(17100) # library(doMC) # registerDoMC() PredError.632plus <- pec(object=Models, formula=Surv(time,status)~X1+X2, data=dat, exact=TRUE, cens.model="marginal", splitMethod="Boot632plus", B=3, verbose=TRUE) ## End(Not run) # assessing parametric survival models in learn/validation setting learndat <- SimSurv(50) testdat <- SimSurv(30) library(survival) library(rms) f1 <- psm(Surv(time,status)~X1+X2,data=learndat) f2 <- psm(Surv(time,status)~X1,data=learndat) pf <- pec(list(f1,f2),formula=Surv(time,status)~1,data=testdat,maxtime=200) plot(pf) summary(pf) # ---------------- competing risks ----------------- library(survival) library(riskRegression) if(requireNamespace("cmprsk",quietly=TRUE)){ library(cmprsk) library(pec) cdat <- SimCompRisk(100) f1 <- CSC(Hist(time,event)~X1+X2,cause=2,data=cdat) f2 <- CSC(Hist(time,event)~X1,data=cdat,cause=2) f3 <- FGR(Hist(time,event)~X1+X2,cause=2,data=cdat) f4 <- FGR(Hist(time,event)~X1+X2,cause=2,data=cdat) p1 <- pec(list(f1,f2,f3,f4),formula=Hist(time,event)~1,data=cdat,cause=2) }
S3-wrapper function for cforest from the party package
pecCforest(formula, data, ...)
pecCforest(formula, data, ...)
formula |
Passed on as is. See |
data |
Passed on as is. See |
... |
Passed on as they are. See |
See cforest
of the party
package.
list with two elements: cforest and call
Ulla B. Mogensen, Hemant Ishwaran, Thomas A. Gerds (2012). Evaluating Random Forests for Survival Analysis Using Prediction Error Curves. Journal of Statistical Software, 50(11), 1-23. DOI 10.18637/jss.v050.i11
The call is added to an ctree object
pecCtree(...)
pecCtree(...)
... |
passed to ctree |
list with two elements: ctree and call
Thomas A. Gerds <[email protected]>
pecCforest
if (requireNamespace("party",quietly=TRUE)){ library(prodlim) library(survival) set.seed(50) d <- SimSurv(50) nd <- data.frame(X1=c(0,1,0),X2=c(-1,0,1)) f <- pecCtree(Surv(time,status)~X1+X2,data=d) predictSurvProb(f,newdata=nd,times=c(3,8)) }
if (requireNamespace("party",quietly=TRUE)){ library(prodlim) library(survival) set.seed(50) d <- SimSurv(50) nd <- data.frame(X1=c(0,1,0),X2=c(-1,0,1)) f <- pecCtree(Surv(time,status)~X1+X2,data=d) predictSurvProb(f,newdata=nd,times=c(3,8)) }
Combines the rpart result with a stratified Kaplan-Meier (prodlim) to predict survival
pecRpart(formula, data, ...)
pecRpart(formula, data, ...)
formula |
passed to rpart |
data |
passed to rpart |
... |
passed to rpart |
list with three elements: ctree and call
library(prodlim) if (!requireNamespace("rpart",quietly=TRUE)){ library(rpart) library(survival) set.seed(50) d <- SimSurv(50) nd <- data.frame(X1=c(0,1,0),X2=c(-1,0,1)) f <- pecRpart(Surv(time,status)~X1+X2,data=d) predictSurvProb(f,newdata=nd,times=c(3,8)) }
library(prodlim) if (!requireNamespace("rpart",quietly=TRUE)){ library(rpart) library(survival) set.seed(50) d <- SimSurv(50) nd <- data.frame(X1=c(0,1,0),X2=c(-1,0,1)) f <- pecRpart(Surv(time,status)~X1+X2,data=d) predictSurvProb(f,newdata=nd,times=c(3,8)) }
calPlot
Calibration plots
## S3 method for class 'calibrationPlot' plot(x, ...)
## S3 method for class 'calibrationPlot' plot(x, ...)
x |
Object obtained with |
... |
Not used. |
Nothing
Thomas A. Gerds <[email protected]>
calPlot
Plotting prediction error curves for one or more prediction models.
## S3 method for class 'pec' plot( x, what, models, xlim = c(x$start, x$minmaxtime), ylim = c(0, 0.3), xlab = "Time", ylab, axes = TRUE, col, lty, lwd, type, smooth = FALSE, add.refline = FALSE, add = FALSE, legend = ifelse(add, FALSE, TRUE), special = FALSE, ... )
## S3 method for class 'pec' plot( x, what, models, xlim = c(x$start, x$minmaxtime), ylim = c(0, 0.3), xlab = "Time", ylab, axes = TRUE, col, lty, lwd, type, smooth = FALSE, add.refline = FALSE, add = FALSE, legend = ifelse(add, FALSE, TRUE), special = FALSE, ... )
x |
Object of class |
what |
The name of the entry in |
models |
Specifies models in |
xlim |
Plotting range on the x-axis. |
ylim |
Plotting range on the y-axis. |
xlab |
Label given to the x-axis. |
ylab |
Label given to the y-axis. |
axes |
Logical. If |
col |
Vector of colors given to the curves of |
lty |
Vector of lty's given to the curves of |
lwd |
Vector of lwd's given to the curves of |
type |
Plotting type: either |
smooth |
Logical. If |
add.refline |
Logical. If |
add |
Logical. If |
legend |
if TRUE a legend is plotted by calling the function legend.
Optional arguments of the function |
special |
Logical. If |
... |
Extra arguments that are passed to |
From version 2.0.1 on the arguments legend.text, legend.args, lines.type,
lwd.lines, specials are obsolete and only available for backward
compatibility. Instead arguments for the invoked functions legend
,
axis
, Special
are simply specified as legend.lty=2
. The
specification is not case sensitive, thus Legend.lty=2
or
LEGEND.lty=2
will have the same effect. The function axis
is
called twice, and arguments of the form axis1.labels
, axis1.at
are used for the time axis whereas axis2.pos
, axis1.labels
,
etc. are used for the y-axis.
These arguments are processed via ...{}
of plot.pec
and
inside by using the function resolveSmartArgs
. Documentation of
these arguments can be found in the help pages of the corresponding
functions.
The (invisible) object.
Ulla B. Mogensen [email protected], Thomas A. Gerds [email protected]
# simulate data # with a survival response and two predictors library(prodlim) library(survival) set.seed(280180) dat <- SimSurv(100) # fit some candidate Cox models and # compute the Kaplan-Meier estimate Models <- list("Kaplan.Meier"=survfit(Surv(time,status)~1,data=dat), "Cox.X1"=coxph(Surv(time,status)~X1,data=dat,x=TRUE,y=TRUE), "Cox.X2"=coxph(Surv(time,status)~X2,data=dat,x=TRUE,y=TRUE), "Cox.X1.X2"=coxph(Surv(time,status)~X1+X2,data=dat,x=TRUE,y=TRUE)) Models <- list("Cox.X1"=coxph(Surv(time,status)~X1,data=dat,x=TRUE,y=TRUE), "Cox.X2"=coxph(Surv(time,status)~X2,data=dat,x=TRUE,y=TRUE), "Cox.X1.X2"=coxph(Surv(time,status)~X1+X2,data=dat,x=TRUE,y=TRUE)) # compute the .632+ estimate of the generalization error set.seed(17100) PredError.632plus <- pec(object=Models, formula=Surv(time,status)~X1+X2, data=dat, exact=TRUE, cens.model="marginal", splitMethod="boot632plus", B=5, keep.matrix=TRUE, verbose=TRUE) # plot the .632+ estimates of the generalization error plot(PredError.632plus,xlim=c(0,30)) # plot the bootstrapped curves, .632+ estimates of the generalization error # and Apparent error for the Cox model 'Cox.X1' with the 'Cox.X2' model # as benchmark plot(PredError.632plus, xlim=c(0,30), models="Cox.X1", special=TRUE, special.bench="Cox.X2", special.benchcol=2, special.addprederr="AppErr")
# simulate data # with a survival response and two predictors library(prodlim) library(survival) set.seed(280180) dat <- SimSurv(100) # fit some candidate Cox models and # compute the Kaplan-Meier estimate Models <- list("Kaplan.Meier"=survfit(Surv(time,status)~1,data=dat), "Cox.X1"=coxph(Surv(time,status)~X1,data=dat,x=TRUE,y=TRUE), "Cox.X2"=coxph(Surv(time,status)~X2,data=dat,x=TRUE,y=TRUE), "Cox.X1.X2"=coxph(Surv(time,status)~X1+X2,data=dat,x=TRUE,y=TRUE)) Models <- list("Cox.X1"=coxph(Surv(time,status)~X1,data=dat,x=TRUE,y=TRUE), "Cox.X2"=coxph(Surv(time,status)~X2,data=dat,x=TRUE,y=TRUE), "Cox.X1.X2"=coxph(Surv(time,status)~X1+X2,data=dat,x=TRUE,y=TRUE)) # compute the .632+ estimate of the generalization error set.seed(17100) PredError.632plus <- pec(object=Models, formula=Surv(time,status)~X1+X2, data=dat, exact=TRUE, cens.model="marginal", splitMethod="boot632plus", B=5, keep.matrix=TRUE, verbose=TRUE) # plot the .632+ estimates of the generalization error plot(PredError.632plus,xlim=c(0,30)) # plot the bootstrapped curves, .632+ estimates of the generalization error # and Apparent error for the Cox model 'Cox.X1' with the 'Cox.X2' model # as benchmark plot(PredError.632plus, xlim=c(0,30), models="Cox.X1", special=TRUE, special.bench="Cox.X2", special.benchcol=2, special.addprederr="AppErr")
Ploting time-dependent event risk predictions.
plotPredictEventProb( x, newdata, times, cause = 1, xlim, ylim, xlab, ylab, axes = TRUE, col, density, lty, lwd, add = FALSE, legend = TRUE, percent = FALSE, ... )
plotPredictEventProb( x, newdata, times, cause = 1, xlim, ylim, xlab, ylab, axes = TRUE, col, density, lty, lwd, add = FALSE, legend = TRUE, percent = FALSE, ... )
x |
Object specifying an event risk prediction model. |
newdata |
A data frame with the same variable names as those that were
used to fit the model |
times |
Vector of times at which to return the estimated probabilities. |
cause |
Show predicted risk of events of this cause |
xlim |
Plotting range on the x-axis. |
ylim |
Plotting range on the y-axis. |
xlab |
Label given to the x-axis. |
ylab |
Label given to the y-axis. |
axes |
Logical. If |
col |
Vector of colors given to the survival curve. |
density |
Densitiy of the color – useful for showing many (overlapping) curves. |
lty |
Vector of lty's given to the survival curve. |
lwd |
Vector of lwd's given to the survival curve. |
add |
Logical. If |
legend |
Logical. If TRUE a legend is plotted by calling the function
legend. Optional arguments of the function |
percent |
Logical. If |
... |
Parameters that are filtered by |
Arguments for the invoked functions legend
and axis
are simply
specified as legend.lty=2
. The specification is not case sensitive,
thus Legend.lty=2
or LEGEND.lty=2
will have the same effect.
The function axis
is called twice, and arguments of the form
axis1.labels
, axis1.at
are used for the time axis whereas
axis2.pos
, axis1.labels
, etc. are used for the y-axis.
These arguments are processed via ...{}
of
plotPredictEventProb
and inside by using the function
SmartControl
.
The (invisible) object.
Ulla B. Mogensen [email protected], Thomas A. Gerds [email protected]
Ulla B. Mogensen, Hemant Ishwaran, Thomas A. Gerds (2012). Evaluating Random Forests for Survival Analysis Using Prediction Error Curves. Journal of Statistical Software, 50(11), 1-23. DOI 10.18637/jss.v050.i11
# competing risk data library(riskRegression) library(pec) set.seed(9) d=sampleData(80) csc1 <- CSC(Hist(time,event)~X1+X8, data=d) nd=sampleData(3) plotPredictEventProb(csc1, newdata=nd, cause=1, col=1:3)
# competing risk data library(riskRegression) library(pec) set.seed(9) d=sampleData(80) csc1 <- CSC(Hist(time,event)~X1+X8, data=d) nd=sampleData(3) plotPredictEventProb(csc1, newdata=nd, cause=1, col=1:3)
Ploting prediction survival curves for one prediction model using
predictSurvProb
.
plotPredictSurvProb( x, newdata, times, xlim, ylim, xlab, ylab, axes = TRUE, col, density, lty, lwd, add = FALSE, legend = TRUE, percent = FALSE, ... )
plotPredictSurvProb( x, newdata, times, xlim, ylim, xlab, ylab, axes = TRUE, col, density, lty, lwd, add = FALSE, legend = TRUE, percent = FALSE, ... )
x |
A survival prediction model including |
newdata |
A data frame with the same variable names as those that were
used to fit the model |
times |
Vector of times at which to return the estimated probabilities. |
xlim |
Plotting range on the x-axis. |
ylim |
Plotting range on the y-axis. |
xlab |
Label given to the x-axis. |
ylab |
Label given to the y-axis. |
axes |
Logical. If |
col |
Vector of colors given to the survival curve. |
density |
Densitiy of the color – useful for showing many (overlapping) curves. |
lty |
Vector of lty's given to the survival curve. |
lwd |
Vector of lwd's given to the survival curve. |
add |
Logical. If |
legend |
Logical. If TRUE a legend is plotted by calling the function
legend. Optional arguments of the function |
percent |
Logical. If |
... |
Parameters that are filtered by |
Arguments for the invoked functions legend
and axis
are simply
specified as legend.lty=2
. The specification is not case sensitive,
thus Legend.lty=2
or LEGEND.lty=2
will have the same effect.
The function axis
is called twice, and arguments of the form
axis1.labels
, axis1.at
are used for the time axis whereas
axis2.pos
, axis1.labels
, etc. are used for the y-axis.
These arguments are processed via ...{}
of
plotPredictSurvProb
and inside by using the function
SmartControl
.
The (invisible) object.
Ulla B. Mogensen [email protected], Thomas A. Gerds [email protected]
Ulla B. Mogensen, Hemant Ishwaran, Thomas A. Gerds (2012). Evaluating Random Forests for Survival Analysis Using Prediction Error Curves. Journal of Statistical Software, 50(11), 1-23. DOI 10.18637/jss.v050.i11
# generate some survival data library(prodlim) d <- SimSurv(100) # then fit a Cox model library(survival) library(rms) coxmodel <- cph(Surv(time,status)~X1+X2,data=d,surv=TRUE) # plot predicted survival probabilities for all time points ttt <- sort(unique(d$time)) # and for selected predictor values: ndat <- data.frame(X1=c(0.25,0.25,-0.05,0.05),X2=c(0,1,0,1)) plotPredictSurvProb(coxmodel,newdata=ndat,times=ttt)
# generate some survival data library(prodlim) d <- SimSurv(100) # then fit a Cox model library(survival) library(rms) coxmodel <- cph(Surv(time,status)~X1+X2,data=d,surv=TRUE) # plot predicted survival probabilities for all time points ttt <- sort(unique(d$time)) # and for selected predictor values: ndat <- data.frame(X1=c(0.25,0.25,-0.05,0.05),X2=c(0,1,0,1)) plotPredictSurvProb(coxmodel,newdata=ndat,times=ttt)
Function to extract event probability predictions from various modeling
approaches. The most prominent one is the combination of cause-specific Cox
regression models which can be fitted with the function cumincCox
from the package compRisk
.
predictEventProb(object, newdata, times, cause, ...)
predictEventProb(object, newdata, times, cause, ...)
object |
A fitted model from which to extract predicted event probabilities |
newdata |
A data frame containing predictor variable combinations for which to compute predicted event probabilities. |
times |
A vector of times in the range of the response variable, for which the cumulative incidences event probabilities are computed. |
cause |
Identifies the cause of interest among the competing events. |
... |
Additional arguments that are passed on to the current method. |
The function predictEventProb is a generic function that means it invokes specifically designed functions depending on the 'class' of the first argument.
See predictSurvProb
.
A matrix with as many rows as NROW(newdata)
and as many
columns as length(times)
. Each entry should be a probability and in
rows the values should be increasing.
Thomas A. Gerds [email protected]
See predictSurvProb
.
library(pec) library(survival) library(riskRegression) library(prodlim) train <- SimCompRisk(100) test <- SimCompRisk(10) cox.fit <- CSC(Hist(time,cause)~X1+X2,data=train) predictEventProb(cox.fit,newdata=test,times=seq(1:10),cause=1) ## with strata cox.fit2 <- CSC(list(Hist(time,cause)~strata(X1)+X2,Hist(time,cause)~X1+X2),data=train) predictEventProb(cox.fit2,newdata=test,times=seq(1:10),cause=1)
library(pec) library(survival) library(riskRegression) library(prodlim) train <- SimCompRisk(100) test <- SimCompRisk(10) cox.fit <- CSC(Hist(time,cause)~X1+X2,data=train) predictEventProb(cox.fit,newdata=test,times=seq(1:10),cause=1) ## with strata cox.fit2 <- CSC(list(Hist(time,cause)~strata(X1)+X2,Hist(time,cause)~X1+X2),data=train) predictEventProb(cox.fit2,newdata=test,times=seq(1:10),cause=1)
Function to extract predicted life years lost from various modeling
approaches. The most prominent one is the combination of cause-specific Cox
regression models which can be fitted with the function cumincCox
from the package compRisk
.
predictLifeYearsLost(object, newdata, times, cause, ...)
predictLifeYearsLost(object, newdata, times, cause, ...)
object |
A fitted model from which to extract predicted event probabilities |
newdata |
A data frame containing predictor variable combinations for which to compute predicted event probabilities. |
times |
A vector of times in the range of the response variable, for which the cumulative incidences event probabilities are computed. |
cause |
Identifies the cause of interest among the competing events. |
... |
Additional arguments that are passed on to the current method. |
The function predictLifeYearsLost is a generic function that means it invokes specifically designed functions depending on the 'class' of the first argument.
See predictSurvProb
.
A matrix with as many rows as NROW(newdata)
and as many
columns as length(times)
. Each entry should be a positive value and
in rows the values should be increasing.
Thomas A. Gerds [email protected]
predictSurvProb
, predictEventProb
.
library(pec) library(riskRegression) library(survival) library(prodlim) train <- SimCompRisk(100) test <- SimCompRisk(10) fit <- CSC(Hist(time,cause)~X1+X2,data=train,cause=1) predictLifeYearsLost(fit,newdata=test,times=seq(1:10),cv=FALSE,cause=1)
library(pec) library(riskRegression) library(survival) library(prodlim) train <- SimCompRisk(100) test <- SimCompRisk(10) fit <- CSC(Hist(time,cause)~X1+X2,data=train,cause=1) predictLifeYearsLost(fit,newdata=test,times=seq(1:10),cv=FALSE,cause=1)
Function to extract predicted mean times from various modeling approaches.
## S3 method for class 'aalen' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'riskRegression' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'cox.aalen' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'cph' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'coxph' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'matrix' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'selectCox' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'prodlim' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'psm' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'survfit' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'pecRpart' predictRestrictedMeanTime(object,newdata,times,...) #' \method{predictRestrictedMeanTime}{pecCtree}(object,newdata,times,...)
## S3 method for class 'aalen' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'riskRegression' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'cox.aalen' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'cph' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'coxph' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'matrix' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'selectCox' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'prodlim' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'psm' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'survfit' predictRestrictedMeanTime(object,newdata,times,...) ## S3 method for class 'pecRpart' predictRestrictedMeanTime(object,newdata,times,...) #' \method{predictRestrictedMeanTime}{pecCtree}(object,newdata,times,...)
object |
A fitted model from which to extract predicted survival probabilities |
newdata |
A data frame containing predictor variable combinations for which to compute predicted survival probabilities. |
times |
A vector of times in the range of the response variable, e.g. times when the response is a survival object, at which to return the survival probabilities. |
... |
Additional arguments that are passed on to the current method. |
The function predictRestrictedMeanTime is a generic function, meaning that it invokes a different function dependent on the 'class' of the first argument.
See also predictSurvProb
.
A matrix with as many rows as NROW(newdata)
and as many
columns as length(times)
. Each entry should be a probability and in
rows the values should be decreasing.
In order to assess the predictive performance of a new survival model
a specific predictRestrictedMeanTime
S3 method has to be written. For examples,
see the bodies of the existing methods.
The performance of the assessment procedure, in particular for resampling
where the model is repeatedly evaluated, will be improved by supressing in
the call to the model all the computations that are not needed for
probability prediction. For example, se.fit=FALSE
can be set in the
call to cph
.
Thomas A. Gerds [email protected]
Ulla B. Mogensen, Hemant Ishwaran, Thomas A. Gerds (2012). Evaluating Random Forests for Survival Analysis Using Prediction Error Curves. Journal of Statistical Software, 50(11), 1-23. DOI 10.18637/jss.v050.i11
# generate some survival data library(prodlim) set.seed(100) d <- SimSurv(100) # then fit a Cox model library(rms) library(survival) coxmodel <- cph(Surv(time,status)~X1+X2,data=d,surv=TRUE) # predicted survival probabilities can be extracted # at selected time-points: ttt <- quantile(d$time) # for selected predictor values: ndat <- data.frame(X1=c(0.25,0.25,-0.05,0.05),X2=c(0,1,0,1)) # as follows predictRestrictedMeanTime(coxmodel,newdata=ndat,times=ttt) # stratified cox model sfit <- coxph(Surv(time,status)~strata(X1)+X2,data=d,x=TRUE,y=TRUE) predictRestrictedMeanTime(sfit,newdata=d[1:3,],times=c(1,3,5,10)) ## simulate some learning and some validation data learndat <- SimSurv(100) valdat <- SimSurv(100) ## use the learning data to fit a Cox model library(survival) fitCox <- coxph(Surv(time,status)~X1+X2,data=learndat,x=TRUE,y=TRUE) ## suppose we want to predict the survival probabilities for all patients ## in the validation data at the following time points: ## 0, 12, 24, 36, 48, 60 psurv <- predictRestrictedMeanTime(fitCox,newdata=valdat,times=seq(0,60,12)) ## This is a matrix with survival probabilities ## one column for each of the 5 time points ## one row for each validation set individual
# generate some survival data library(prodlim) set.seed(100) d <- SimSurv(100) # then fit a Cox model library(rms) library(survival) coxmodel <- cph(Surv(time,status)~X1+X2,data=d,surv=TRUE) # predicted survival probabilities can be extracted # at selected time-points: ttt <- quantile(d$time) # for selected predictor values: ndat <- data.frame(X1=c(0.25,0.25,-0.05,0.05),X2=c(0,1,0,1)) # as follows predictRestrictedMeanTime(coxmodel,newdata=ndat,times=ttt) # stratified cox model sfit <- coxph(Surv(time,status)~strata(X1)+X2,data=d,x=TRUE,y=TRUE) predictRestrictedMeanTime(sfit,newdata=d[1:3,],times=c(1,3,5,10)) ## simulate some learning and some validation data learndat <- SimSurv(100) valdat <- SimSurv(100) ## use the learning data to fit a Cox model library(survival) fitCox <- coxph(Surv(time,status)~X1+X2,data=learndat,x=TRUE,y=TRUE) ## suppose we want to predict the survival probabilities for all patients ## in the validation data at the following time points: ## 0, 12, 24, 36, 48, 60 psurv <- predictRestrictedMeanTime(fitCox,newdata=valdat,times=seq(0,60,12)) ## This is a matrix with survival probabilities ## one column for each of the 5 time points ## one row for each validation set individual
Function to extract survival probability predictions from various modeling approaches. The most prominent one is the Cox regression model which can be fitted for example with ‘coxph’ and with ‘cph’.
## S3 method for class 'aalen' predictSurvProb(object,newdata,times,...) ## S3 method for class 'riskRegression' predictSurvProb(object,newdata,times,...) ## S3 method for class 'cox.aalen' predictSurvProb(object,newdata,times,...) ## S3 method for class 'cph' predictSurvProb(object,newdata,times,...) ## S3 method for class 'coxph' predictSurvProb(object,newdata,times,...) ## S3 method for class 'matrix' predictSurvProb(object,newdata,times,...) ## S3 method for class 'selectCox' predictSurvProb(object,newdata,times,...) ## S3 method for class 'pecCforest' predictSurvProb(object,newdata,times,...) ## S3 method for class 'prodlim' predictSurvProb(object,newdata,times,...) ## S3 method for class 'psm' predictSurvProb(object,newdata,times,...) ## S3 method for class 'survfit' predictSurvProb(object,newdata,times,...) ## S3 method for class 'pecRpart' predictSurvProb(object,newdata,times,...) #' \method{predictSurvProb}{pecCtree}(object,newdata,times,...)
## S3 method for class 'aalen' predictSurvProb(object,newdata,times,...) ## S3 method for class 'riskRegression' predictSurvProb(object,newdata,times,...) ## S3 method for class 'cox.aalen' predictSurvProb(object,newdata,times,...) ## S3 method for class 'cph' predictSurvProb(object,newdata,times,...) ## S3 method for class 'coxph' predictSurvProb(object,newdata,times,...) ## S3 method for class 'matrix' predictSurvProb(object,newdata,times,...) ## S3 method for class 'selectCox' predictSurvProb(object,newdata,times,...) ## S3 method for class 'pecCforest' predictSurvProb(object,newdata,times,...) ## S3 method for class 'prodlim' predictSurvProb(object,newdata,times,...) ## S3 method for class 'psm' predictSurvProb(object,newdata,times,...) ## S3 method for class 'survfit' predictSurvProb(object,newdata,times,...) ## S3 method for class 'pecRpart' predictSurvProb(object,newdata,times,...) #' \method{predictSurvProb}{pecCtree}(object,newdata,times,...)
object |
A fitted model from which to extract predicted survival probabilities |
newdata |
A data frame containing predictor variable combinations for which to compute predicted survival probabilities. |
times |
A vector of times in the range of the response variable, e.g. times when the response is a survival object, at which to return the survival probabilities. |
... |
Additional arguments that are passed on to the current method. |
The function predictSurvProb is a generic function that means it invokes specifically designed functions depending on the 'class' of the first argument.
The function pec
requires survival probabilities for each row in
newdata at requested times. These probabilities are extracted from a fitted
model of class CLASS
with the function predictSurvProb.CLASS
.
Currently there are predictSurvProb
methods for objects of class cph
(library rms), coxph (library survival), aalen (library timereg), cox.aalen
(library timereg),
rpart (library rpart), product.limit (library prodlim),
survfit (library survival), psm (library rms)
A matrix with as many rows as NROW(newdata)
and as many
columns as length(times)
. Each entry should be a probability and in
rows the values should be decreasing.
In order to assess the predictive performance of a new survival model
a specific predictSurvProb
S3 method has to be written. For examples,
see the bodies of the existing methods.
The performance of the assessment procedure, in particular for resampling
where the model is repeatedly evaluated, will be improved by supressing in
the call to the model all the computations that are not needed for
probability prediction. For example, se.fit=FALSE
can be set in the
call to cph
.
Thomas A. Gerds [email protected]
Ulla B. Mogensen, Hemant Ishwaran, Thomas A. Gerds (2012). Evaluating Random Forests for Survival Analysis Using Prediction Error Curves. Journal of Statistical Software, 50(11), 1-23. DOI 10.18637/jss.v050.i11
# generate some survival data library(prodlim) set.seed(100) d <- SimSurv(100) # then fit a Cox model library(survival) library(rms) coxmodel <- cph(Surv(time,status)~X1+X2,data=d,surv=TRUE) # Extract predicted survival probabilities # at selected time-points: ttt <- quantile(d$time) # for selected predictor values: ndat <- data.frame(X1=c(0.25,0.25,-0.05,0.05),X2=c(0,1,0,1)) # as follows predictSurvProb(coxmodel,newdata=ndat,times=ttt) # stratified cox model sfit <- coxph(Surv(time,status)~strata(X1)+X2,data=d,,x=TRUE,y=TRUE) predictSurvProb(sfit,newdata=d[1:3,],times=c(1,3,5,10)) ## simulate some learning and some validation data learndat <- SimSurv(100) valdat <- SimSurv(100) ## use the learning data to fit a Cox model library(survival) fitCox <- coxph(Surv(time,status)~X1+X2,data=learndat,x=TRUE,y=TRUE) ## suppose we want to predict the survival probabilities for all patients ## in the validation data at the following time points: ## 0, 12, 24, 36, 48, 60 psurv <- predictSurvProb(fitCox,newdata=valdat,times=seq(0,60,12)) ## This is a matrix with survival probabilities ## one column for each of the 5 time points ## one row for each validation set individual ## Cox with ridge option f1 <- coxph(Surv(time,status)~X1+X2,data=learndat,x=TRUE,y=TRUE) f2 <- coxph(Surv(time,status)~ridge(X1)+ridge(X2),data=learndat,x=TRUE,y=TRUE) plot(predictSurvProb(f1,newdata=valdat,times=10), pec:::predictSurvProb.coxph(f2,newdata=valdat,times=10), xlim=c(0,1), ylim=c(0,1), xlab="Unpenalized predicted survival chance at 10", ylab="Ridge predicted survival chance at 10")
# generate some survival data library(prodlim) set.seed(100) d <- SimSurv(100) # then fit a Cox model library(survival) library(rms) coxmodel <- cph(Surv(time,status)~X1+X2,data=d,surv=TRUE) # Extract predicted survival probabilities # at selected time-points: ttt <- quantile(d$time) # for selected predictor values: ndat <- data.frame(X1=c(0.25,0.25,-0.05,0.05),X2=c(0,1,0,1)) # as follows predictSurvProb(coxmodel,newdata=ndat,times=ttt) # stratified cox model sfit <- coxph(Surv(time,status)~strata(X1)+X2,data=d,,x=TRUE,y=TRUE) predictSurvProb(sfit,newdata=d[1:3,],times=c(1,3,5,10)) ## simulate some learning and some validation data learndat <- SimSurv(100) valdat <- SimSurv(100) ## use the learning data to fit a Cox model library(survival) fitCox <- coxph(Surv(time,status)~X1+X2,data=learndat,x=TRUE,y=TRUE) ## suppose we want to predict the survival probabilities for all patients ## in the validation data at the following time points: ## 0, 12, 24, 36, 48, 60 psurv <- predictSurvProb(fitCox,newdata=valdat,times=seq(0,60,12)) ## This is a matrix with survival probabilities ## one column for each of the 5 time points ## one row for each validation set individual ## Cox with ridge option f1 <- coxph(Surv(time,status)~X1+X2,data=learndat,x=TRUE,y=TRUE) f2 <- coxph(Surv(time,status)~ridge(X1)+ridge(X2),data=learndat,x=TRUE,y=TRUE) plot(predictSurvProb(f1,newdata=valdat,times=10), pec:::predictSurvProb.coxph(f2,newdata=valdat,times=10), xlim=c(0,1), ylim=c(0,1), xlab="Unpenalized predicted survival chance at 10", ylab="Ridge predicted survival chance at 10")
Print the important arguments of call and the prediction error values at selected time points.
## S3 method for class 'pec' print(x, times, digits = 3, what = NULL, ...)
## S3 method for class 'pec' print(x, times, digits = 3, what = NULL, ...)
x |
Object of class |
times |
Time points at which to show the values of the prediction error curve(s) |
digits |
Number of decimals used in tables. |
what |
What estimate of the prediction error curve to show. Should be a string matching an element of x. The default is determined by splitMethod. |
... |
Not used |
print |
Set to FALSE to suppress printing. |
The first argument in the invisible cloak.
Thomas A. Gerds [email protected]
This function computes a time-dependent $R^2$ like measure of the variation explained by a survival prediction model, by dividing the mean squared error (Brier score) of the model by the mean squared error (Brier score) of a reference model which ignores all the covariates.
R2(object, models, what, times, reference = 1)
R2(object, models, what, times, reference = 1)
object |
An object with estimated prediction error curves obtained with the function pec |
models |
For which of the models in |
what |
The name of the entry in |
times |
Time points at which the summaries are shown. |
reference |
Position of the model whose prediction error is used as the reference in the denominator when constructing $R^2$ |
In survival analysis the prediction error of the Kaplan-Meier estimator plays a similar role as the total sum of squares in linear regression. Hence, it is a sensible reference model for $R^2$.
A matrix where the first column holds the times and the following columns are the corresponding $R^2$ values for the requested prediction models.
Thomas A. Gerds [email protected]
E. Graf et al. (1999), Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine, vol 18, pp= 2529–2545.
Gerds TA, Cai T and Schumacher M (2008) The performance of risk prediction models Biometrical Journal, 50(4), 457–479
set.seed(18713) library(prodlim) library(survival) dat=SimSurv(100) nullmodel=prodlim(Hist(time,status)~1,data=dat) pmodel1=coxph(Surv(time,status)~X1+X2,data=dat,x=TRUE,y=TRUE) pmodel2=coxph(Surv(time,status)~X2,data=dat,x=TRUE,y=TRUE) perror=pec(list(Cox1=pmodel1,Cox2=pmodel2),Hist(time,status)~1,data=dat,reference=TRUE) R2(perror,times=seq(0,1,.1),reference=1)
set.seed(18713) library(prodlim) library(survival) dat=SimSurv(100) nullmodel=prodlim(Hist(time,status)~1,data=dat) pmodel1=coxph(Surv(time,status)~X1+X2,data=dat,x=TRUE,y=TRUE) pmodel2=coxph(Surv(time,status)~X2,data=dat,x=TRUE,y=TRUE) perror=pec(list(Cox1=pmodel1,Cox2=pmodel2),Hist(time,status)~1,data=dat,reference=TRUE) R2(perror,times=seq(0,1,.1),reference=1)
Retrospective table of risks predicted by two different methods, models, algorithms
reclass( object, reference, formula, data, time, cause, cuts = seq(0, 100, 25), digits = 2 )
reclass( object, reference, formula, data, time, cause, cuts = seq(0, 100, 25), digits = 2 )
object |
Either a
list with two elements. Each element should either
be a vector with probabilities, or an object for which
|
reference |
Reference prediction model. |
formula |
A survival formula as obtained either with
|
data |
Used to extract the response from the data and passed
on to |
time |
Time interest for prediction. |
cause |
For competing risk models the cause of interest. Defaults to all available causes. |
cuts |
Risk quantiles to group risks. |
digits |
Number of digits to show for the predicted risks |
All risks are multiplied by 100 before
reclassification tables: overall table and one conditional table for each cause and for subjects event free at time interest.
Thomas A. Gerds <[email protected]>
predictStatusProb
## Not run: library(survival) set.seed(40) d <- prodlim::SimSurv(400) nd <- prodlim::SimSurv(400) Models <- list("Cox.X2"=coxph(Surv(time,status)~X2,data=d,x=TRUE,y=TRUE), "Cox.X1.X2"=coxph(Surv(time,status)~X1+X2,data=d,x=TRUE,y=TRUE)) rc <- reclass(Models,formula=Surv(time,status)~1,data=nd,time=5) print(rc) plot(rc) set.seed(40) library(riskRegression) library(prodlim) dcr <- prodlim::SimCompRisk(400) ndcr <- prodlim::SimCompRisk(400) crPred5 <- list("X2"=predictEventProb(CSC(Hist(time,event)~X2,data=dcr),newdata=ndcr,times=5), "X1+X2"=predictEventProb(CSC(Hist(time,event)~X1+X2,data=dcr),newdata=ndcr,times=5)) rc <- reclass(crPred5,Hist(time,event)~1,data=ndcr,time=3) print(rc) reclass(crPred5,Hist(time,event)~1,data=ndcr,time=5,cuts=100*c(0,0.05,0.1,0.2,1)) ## End(Not run)
## Not run: library(survival) set.seed(40) d <- prodlim::SimSurv(400) nd <- prodlim::SimSurv(400) Models <- list("Cox.X2"=coxph(Surv(time,status)~X2,data=d,x=TRUE,y=TRUE), "Cox.X1.X2"=coxph(Surv(time,status)~X1+X2,data=d,x=TRUE,y=TRUE)) rc <- reclass(Models,formula=Surv(time,status)~1,data=nd,time=5) print(rc) plot(rc) set.seed(40) library(riskRegression) library(prodlim) dcr <- prodlim::SimCompRisk(400) ndcr <- prodlim::SimCompRisk(400) crPred5 <- list("X2"=predictEventProb(CSC(Hist(time,event)~X2,data=dcr),newdata=ndcr,times=5), "X1+X2"=predictEventProb(CSC(Hist(time,event)~X1+X2,data=dcr),newdata=ndcr,times=5)) rc <- reclass(crPred5,Hist(time,event)~1,data=ndcr,time=3) print(rc) reclass(crPred5,Hist(time,event)~1,data=ndcr,time=5,cuts=100*c(0,0.05,0.1,0.2,1)) ## End(Not run)
The function computes a matrix of random indices obtained by drawing from the row numbers of a data set either with or without replacement. The matrix can be used to repeatedly set up independent training and validation sets.
resolvesplitMethod(splitMethod, B, N, M)
resolvesplitMethod(splitMethod, B, N, M)
splitMethod |
String that determines the splitMethod to use. Available splitMethods are none/noPlan (no splitting), bootcv or outofbag (bootstrap cross-validation), cvK (K-fold cross-validation, e.g. cv10 gives 10-fold), boot632, boot632plus or boot632+, loocv (leave-one-out) |
B |
The number of repetitions. |
N |
The sample size |
M |
For subsampling bootstrap the size of the subsample. Note M<N. |
A list with the following components
name |
the official name of the splitMethod |
internal.name |
the internal name of the splitMethod |
index |
a matrix of indices with B columns and either N or M rows, dependent on splitMethod |
B |
the value of the argument B |
N |
the value of the argument N |
M |
the value of the argument M |
Thomas Alexander Gerds [email protected]
# BootstrapCrossValidation: Sampling with replacement resolvesplitMethod("BootCv",N=10,B=10) # 10-fold cross-validation: repeated 2 times resolvesplitMethod("cv10",N=10,B=2) # leave-one-out cross-validation resolvesplitMethod("loocv",N=10) resolvesplitMethod("bootcv632plus",N=10,B=2)
# BootstrapCrossValidation: Sampling with replacement resolvesplitMethod("BootCv",N=10,B=10) # 10-fold cross-validation: repeated 2 times resolvesplitMethod("cv10",N=10,B=2) # leave-one-out cross-validation resolvesplitMethod("loocv",N=10) resolvesplitMethod("bootcv632plus",N=10,B=2)
This is a wrapper function which first selects variables in the Cox
regression model using fastbw
from the rms
package and then
returns a fitted Cox regression model with the selected variables.
selectCox(formula, data, rule = "aic")
selectCox(formula, data, rule = "aic")
formula |
A formula object with a |
data |
Name of an data frame containing all needed variables. |
rule |
The method for selecting variables. See |
This function first calls cph
then fastbw
and finally
cph
again.
Ulla B. Mogensen, Hemant Ishwaran, Thomas A. Gerds (2012). Evaluating Random Forests for Survival Analysis Using Prediction Error Curves. Journal of Statistical Software, 50(11), 1-23. DOI 10.18637/jss.v050.i11
data(GBSG2) library(survival) f <- selectCox(Surv(time,cens)~horTh+age+menostat+tsize+tgrade+pnodes+progrec+estrec , data=GBSG2)
data(GBSG2) library(survival) f <- selectCox(Surv(time,cens)~horTh+age+menostat+tsize+tgrade+pnodes+progrec+estrec , data=GBSG2)
This is a wrapper function which first selects variables in the Fine & Gray
regression model using crrstep
from the crrstep
package and
then returns a fitted Fine & Gray regression model with the selected
variables.
selectFGR(formula, data, cause = 1, rule = "AIC", direction = "backward", ...)
selectFGR(formula, data, cause = 1, rule = "AIC", direction = "backward", ...)
formula |
A formula whose left hand side is a |
data |
A data.frame in which all the variables of
|
cause |
The failure type of interest. Defaults to |
rule |
Rule to pass on to crrstep ("AIC", "BIC" or "BICcr"),
also see |
direction |
see |
... |
Further arguments passed to |
Rob C.M. van Kruijsdijk [email protected]
Thomas Alexander Gerds [email protected]
## Not run: library(riskRegression) library(prodlim) library(lava) if (!requireNamespace("cmprsk",quietly=TRUE)){ library(cmprsk) library(pec) m <- crModel() m <- addvar(m,c('X1','X2','X3','X4','X5','X6','X7','X8','X9','X10')) distribution(m,c("X2","X7","X9")) <- binomial.lvm() regression(m,eventtime1~X1+X2+X5+X9) <- c(-1,1,0.5,0.8) set.seed(100) d <- sim(m,100) ## full formula ff <- Hist(time, event) ~ X1 + X2 + X3 + X4 +X5 + X6 + X7+ X8 + X9 + X10 # Fit full model with FGR fg <- FGR(ff,cause=1,data=d) # Backward selection based on the AIC sfgAIC <- selectFGR(ff, data=d, rule="AIC", direction="backward") sfgAIC$fit # Final FGR-model with selected variables # Risk reclassification plot at time = 4 plot(predictEventProb(fg,times=4,newdata=d), predictEventProb(sfgAIC,times=4,newdata=d)) # Backward selection based on the BIC, while forcing # the last two variables (X9 and X10) in the model sfgBIC <- selectFGR(ff, data=d, rule="BIC", direction="backward", scope.min=~X9+X10) ## apparent performance pec(list(full.model=fg,selectedAIC=sfgAIC,selectedBIC=sfgBIC), formula=Hist(time, event)~1, data=d) ## bootstrap cross-validation performance set.seed(7) pec(list(full.model=fg,selectedAIC=sfgAIC,selectedBIC=sfgBIC), formula=Hist(time, event)~1, data=d, B=5, splitMethod="bootcv") } ## End(Not run)
## Not run: library(riskRegression) library(prodlim) library(lava) if (!requireNamespace("cmprsk",quietly=TRUE)){ library(cmprsk) library(pec) m <- crModel() m <- addvar(m,c('X1','X2','X3','X4','X5','X6','X7','X8','X9','X10')) distribution(m,c("X2","X7","X9")) <- binomial.lvm() regression(m,eventtime1~X1+X2+X5+X9) <- c(-1,1,0.5,0.8) set.seed(100) d <- sim(m,100) ## full formula ff <- Hist(time, event) ~ X1 + X2 + X3 + X4 +X5 + X6 + X7+ X8 + X9 + X10 # Fit full model with FGR fg <- FGR(ff,cause=1,data=d) # Backward selection based on the AIC sfgAIC <- selectFGR(ff, data=d, rule="AIC", direction="backward") sfgAIC$fit # Final FGR-model with selected variables # Risk reclassification plot at time = 4 plot(predictEventProb(fg,times=4,newdata=d), predictEventProb(sfgAIC,times=4,newdata=d)) # Backward selection based on the BIC, while forcing # the last two variables (X9 and X10) in the model sfgBIC <- selectFGR(ff, data=d, rule="BIC", direction="backward", scope.min=~X9+X10) ## apparent performance pec(list(full.model=fg,selectedAIC=sfgAIC,selectedBIC=sfgBIC), formula=Hist(time, event)~1, data=d) ## bootstrap cross-validation performance set.seed(7) pec(list(full.model=fg,selectedAIC=sfgAIC,selectedBIC=sfgBIC), formula=Hist(time, event)~1, data=d, B=5, splitMethod="bootcv") } ## End(Not run)
Simulate data alike the data from the Copenhagen stroke study (COST)
simCost(N)
simCost(N)
N |
Sample size |
This uses functionality of the lava package.
Data frame
Thomas Alexander Gerds
This function is invoked and controlled by plot.pec
.
Special( x, y, addprederr, models, bench, benchcol, times, maxboot, bootcol, col, lty, lwd )
Special( x, y, addprederr, models, bench, benchcol, times, maxboot, bootcol, col, lty, lwd )
x |
an object of class 'pec' as returned by the |
y |
Prediction error values. |
addprederr |
Additional prediction errors. The options are bootstrap cross-validation errors or apparent errors. |
models |
One model also specified in |
bench |
A benchmark model (also specified in |
benchcol |
Color of the benchmark curve. |
times |
Time points at which the curves must be plotted. |
maxboot |
Maximum number of bootstrap curves to be added. Default is all. |
bootcol |
Color of the bootstrapped curves. Default is 'gray77'. |
col |
Color of the different error curves for |
lty |
Line type of the different error curves for |
lwd |
Line width of the different error curves for |
This function should not be called directly. The arguments can be specified
as Special.arg
in the call to plot.pec
.
Invisible object.
Extracted data from a french population based cohort (Three-City cohort). The dataset includes followup information on dementia outcome and predicted 5-year risks based on based on the subject specific information which includes age, gender, education level and cognitive decline measured by a psychometric test (Mini Mental State Examination). The prediction model from which the predictions have been computed has been fitted on independent training data from the Paquid cohort, another french population based cohort with similar design (see Reference Blanche et al. 2015 for details) .
A subsample consisting of 2000 observations on the following 3 variables.
5-year absolute risk predictions of dementia.
0=censored, 1=dementia, 2=death dementia free
time to event (i.e., time to either dementia, death dementia free or loss of follow-up)
Web-appendix of Blanche et al. (2015).
Blanche, P., Proust-Lima, C., Loubere, L., Berr, C., Dartigues, J. F., Jacqmin-Gadda, H. (2015). Quantifying and comparing dynamic predictive accuracy of joint models for longitudinal marker and time-to-event in presence of censoring and competing risks. Biometrics, 71(1), 102-113.
data(threecity)
data(threecity)