---
title: "IPCW Cumulative Cost"
author: Klaus Holst & Thomas Scheike
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    fig_caption: yes
    # fig_width: 7.15
    # fig_height: 5.5 
header-includes: 
  - \usepackage{tikz}
vignette: >
  %\VignetteIndexEntry{IPCW Cumulative Cost}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  #dev="svg",
  dpi=50,
  fig.width=7, fig.height=5.5,
  out.width="600px",
  fig.retina=1,
  comment = "#>"
)
fullVignette <- Sys.getenv("_R_FULL_VIGNETTE_") %in% c("1","TRUE")
library(mets)
```

Overview
========

We describe how to perform regression modelling for cumulative cost
\begin{align*}
{\cal U}(t) & = \int_0^t Z(s) dN(s)
\end{align*}
where $N(s)$ is a counting process that registers the times at which costs are realised and accumulated,
and $Z(t)$ is the cost (or mark) at the event times. The counting process can be
a mix of random and fixed times, and the data are represented in counting process format with
the marks/costs attached to the event times. There are many additional uses of
such cumulative processes; for example, when considering time lost in a recurrent events setting, which we return to
below.

We can estimate the marginal mean of the cumulative process
\begin{align*}
\nu(t) & = E ( {\cal U}(t) )  
\end{align*}
possibly for strata with standard errors based on the derived influence function. 

We provide semi-parametric regression modelling using the  proportional model 
\begin{align*}
E ( {\cal U}(t) | X) & = \Lambda_0(t) \exp( X^T \beta).
\end{align*}

In addition for a fixed time-point $t \in [0,\tau]$ we can estimate the mean given covariates
\begin{align*}
E ( {\cal U}(t) | X) & = \exp( X^T \beta) 
\end{align*}
where $\tau$ is some maximum follow-up time. 

 - These quantities are estimated under independent right-censoring given $X$, using IPCW-adjusted estimating equations,
    - similarly to the Ghosh-Lin model for recurrent events.
 - A terminal event can be specified.

We also estimate the probability of exceeding thresholds over time
\begin{align*}
 P ( {\cal U}(t)  > k ) & = \mu_k(t),
\end{align*}
and in the setting with a terminal event this is based on a derived competing risks data structure that
keeps track of the competing terminal event.

\begin{tikzpicture}[
    node distance=3.5cm,
    box/.style={draw, rectangle, minimum width=2.8cm, minimum height=1cm, align=center}
]
% Nodes
\node[box] (start) {At Risk};
\node[box, right of=start, yshift=2.4cm] (exceed) {Exceed-k};
\node[box, right of=start, yshift=-2.4cm] (death) {D};

% Arrows with hazard labels
\draw[->, thick] (start) -- node[below] {\hspace{2 cm} $F_{exceed-k}(t)$} (exceed);
\draw[->, thick] (start) -- node[below] {$F_{D}(t)$} (death);
\end{tikzpicture}

Regression modelling of this quantity is also possible using competing risks
regression models, for example via the `cifreg` function in mets.


HF-action data
==============

Using the HF-action data, we simulate a severity score for each event.


```{r}
library(mets)
data(hfactioncpx12)
hf <- hfactioncpx12
hf$severity <- abs((5+rnorm(741)*2))[hf$id]

## marginal mean using formula  
outNZ <- recurrent_marginal(Event(entry,time,status)~strata(treatment)+cluster(id)
			 +marks(severity),hf,cause=1,death.code=2)
plot(outNZ,se=TRUE)
summary(outNZ,times=3) 

outN <- recurrent_marginal(Event(entry,time,status)~strata(treatment)+cluster(id),data=hf,
		       cause=1,death.code=2)
plot(outN,se=TRUE,add=TRUE)
summary(outN,times=3) 
```

For comparison we also compute the IPCW estimates with and without marks at time 3 using the linear model,
and note that they are identical. Standard errors are based on different formulae that are asymptotically
equivalent, and we note that they are very similar.

```{r}
outNZ3 <- recregIPCW(Event(entry,time,status)~-1+treatment+cluster(id)+marks(severity),data=hf,
		  cause=1,death.code=2,time=3,cens.model=~strata(treatment),model="lin")
summary(outNZ3)
head(iid(outNZ3))

outN3 <- recregIPCW(Event(entry,time,status)~-1+treatment+cluster(id),data=hf,cause=1,death.code=2,time=3,
		 cens.model=~strata(treatment),model="lin")
summary(outN3)
head(iid(outN3))
```

We also apply the semiparametric proportional cost model with IPCW adjustment: 

```{r}
propNZ <- recreg(Event(entry,time,status)~treatment+marks(severity)+cluster(id),data=hf,cause=1,death.code=2)
summary(propNZ) 
plot(propNZ,main="Baselines")
     
GL <- recreg(Event(entry,time,status)~treatment+cluster(id),hf,cause=1,death.code=2)
summary(GL)
plot(GL,add=TRUE,col=2)
```
Those treated have 14\% lower cumulative severity and 11\% lower expected number of events.


Exceed threshold
================

Finally, we estimate the probability of exceeding cumulative severity thresholds of 1, 5, and 10:

```{r}
ooNZ <- prob_exceed_recurrent(Event(entry,time,status)~strata(treatment)+cluster(id)+marks(severity),data=hf,
			      cause=1,death.code=2,exceed=c(1,5,10,20))
plot(ooNZ,strata=1)
plot(ooNZ,strata=2,add=TRUE)
summary(ooNZ,times=3)
```

Cumulative time lost for recurrent events
=========================================

The cumulative time lost for recurrent events is defined as
\begin{align*}
  {\cal M}(t) = E\left[ \int_0^\tau (\tau-s) dN(s) \right] = \int_0^\tau \mu(s) ds
\end{align*}
where $\mu(t) = E( N(t) )$ is the marginal mean of the recurrent events at time $t$.


```{r}
hf$lost5 <- 5-hf$time

RecLost <- recregIPCW(Event(entry,time,status)~-1+treatment+cluster(id)+marks(lost5),data=hf,
		   cause=1,death.code=2,time=5,cens.model=~strata(treatment),model="lin")
summary(RecLost)
head(iid(RecLost))
```


SessionInfo
============


```{r}
sessionInfo()
```
