The clmplus package provides practitioners with a fast and user-friendly implementation of the modeling framework we derived in our paper.
We connect the well-known hazard models developed in life insurance to claims development in non-life run-off triangles. The flexibility of this approach goes beyond the methodological novelty: we hope to provide a user-friendly set of tools built on the point of contact between non-life insurance and life insurance in actuarial science.
This vignette is organized as follows:
We show the connection between the age-period representation and run-off triangles.
We replicate the chain ladder within our modeling framework. As shown in the paper, the clmplus approach yields a model with fewer parameters than the standard GLM approach.
We show an example where adding a cohort effect improves the model fit.
In this tutorial, we work through an example on the AutoBIPaid run-off triangle from the ChainLadder package.
Consider the data set we chose for this tutorial.
The run-off triangle representation is displayed below:
library(ChainLadder)
data("AutoBI")
dataset=AutoBI$AutoBIPaid
dataset
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1904 5398 7496 8882 9712 10071 10199 10256
#> [2,] 2235 6261 8691 10443 11346 11754 12031 NA
#> [3,] 2441 7348 10662 12655 13748 14235 NA NA
#> [4,] 2503 8173 11810 14176 15383 NA NA NA
#> [5,] 2838 8712 12728 15278 NA NA NA NA
#> [6,] 2405 7858 11771 NA NA NA NA NA
#> [7,] 2759 9182 NA NA NA NA NA NA
#> [8,] 2801 NA NA NA NA NA NA NA
colnames(dataset)=c(0:(dim(dataset)[1]-1))
rownames(dataset)=c(0:(dim(dataset)[1]-1))
Practitioners in general insurance refer to the x axis of this representation as development years and to the y axis as accident years. The third dimension that matters is the diagonals: the calendar years. There is a one-to-one correspondence between the age-period representation and run-off triangles. Life insurance actuaries use the following terminology:
ages are development years.
cohorts are accident years.
periods are calendar years.
Indeed, the age-period representation of the run-off triangle is the following:
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1904 2235 2441 2503 2838 2405 2759 2801
#> [2,] NA 5398 6261 7348 8173 8712 7858 9182
#> [3,] NA NA 7496 8691 10662 11810 12728 11771
#> [4,] NA NA NA 8882 10443 12655 14176 15278
#> [5,] NA NA NA NA 9712 11346 13748 15383
#> [6,] NA NA NA NA NA 10071 11754 14235
#> [7,] NA NA NA NA NA NA 10199 12031
#> [8,] NA NA NA NA NA NA NA 10256
Observe that the y axis is now the development years (or age) component.
Calendar years (or periods) are displayed on the x axis.
Accident years (or cohorts) are on the diagonals.
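The change of representation described above can be sketched in base R. This is a minimal illustration rather than the package's own routine: the helper name `to_age_period` is hypothetical, and the triangle is typed in from the printout so that the block runs on its own.

```r
# AutoBIPaid cumulative payments, copied from the printout above
# (rows: accident years, columns: development years).
paid <- c(
  1904, 5398,  7496,  8882,  9712, 10071, 10199, 10256,
  2235, 6261,  8691, 10443, 11346, 11754, 12031,    NA,
  2441, 7348, 10662, 12655, 13748, 14235,    NA,    NA,
  2503, 8173, 11810, 14176, 15383,    NA,    NA,    NA,
  2838, 8712, 12728, 15278,    NA,    NA,    NA,    NA,
  2405, 7858, 11771,    NA,    NA,    NA,    NA,    NA,
  2759, 9182,    NA,    NA,    NA,    NA,    NA,    NA,
  2801,   NA,    NA,    NA,    NA,    NA,    NA,    NA
)
triangle <- matrix(paid, nrow = 8, byrow = TRUE)

# Hypothetical helper: move each cell from (accident year i, development year j)
# to (age j, period i + j - 1), so that cohorts end up on the diagonals.
to_age_period <- function(tri) {
  n <- nrow(tri)
  out <- matrix(NA_real_, n, n)
  for (i in seq_len(n)) {              # accident year (cohort)
    for (j in seq_len(n - i + 1)) {    # development year (age)
      out[j, i + j - 1] <- tri[i, j]
    }
  }
  out
}

age_period <- to_age_period(triangle)
age_period
```

Each row of `age_period` is a development year and each column a calendar year, so the first row collects the first payment of every accident year.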
clmplus package
Our package is an out-of-the-box set of tools to compute the claims reserve. We now show how to replicate the chain ladder model. Note that the run-off triangle first needs to be converted to an RtTriangle object.
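The initialization step can be sketched as follows. This is an assumption-laden sketch: we assume that `RtTriangle()` accepts the cumulative triangle as its first argument and that the result is what the later calls refer to as `rtt`. The call is guarded so that the block does nothing when the package, or that constructor, is not available in the installed version.

```r
# Stand-in for `dataset`: the AutoBIPaid cumulative triangle, typed in by hand.
paid <- c(
  1904, 5398,  7496,  8882,  9712, 10071, 10199, 10256,
  2235, 6261,  8691, 10443, 11346, 11754, 12031,    NA,
  2441, 7348, 10662, 12655, 13748, 14235,    NA,    NA,
  2503, 8173, 11810, 14176, 15383,    NA,    NA,    NA,
  2838, 8712, 12728, 15278,    NA,    NA,    NA,    NA,
  2405, 7858, 11771,    NA,    NA,    NA,    NA,    NA,
  2759, 9182,    NA,    NA,    NA,    NA,    NA,    NA,
  2801,   NA,    NA,    NA,    NA,    NA,    NA,    NA
)
triangle <- matrix(paid, nrow = 8, byrow = TRUE,
                   dimnames = list(0:7, 0:7))

# Assumption: RtTriangle() takes the cumulative triangle as first argument.
has_constructor <- requireNamespace("clmplus", quietly = TRUE) &&
  exists("RtTriangle", envir = asNamespace("clmplus"), inherits = FALSE)

if (has_constructor) {
  rtt <- clmplus::RtTriangle(triangle)
}
```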
In the paper we showed that the chain ladder can be replicated with an age model. The model is fitted to the RtTriangle object with the clmplus function, specifying a hazard model.
a.model=clmplus(RtTriangle = rtt,
                hazard.model = "a")
#> StMoMo: The following cohorts have been zero weighted: -7 -6 -5 -4 -3 -2 -1
#> StMoMo: Start fitting with gnm
#> StMoMo: Finish fitting with gnm
We show the consistency of our approach by comparing our estimates
with those obtained with the Mack chain ladder method as implemented in
the ChainLadder
package.
mck.chl <- MackChainLadder(dataset)
ultimate.chl=mck.chl$FullTriangle[,dim(mck.chl$FullTriangle)[2]]
diagonal=rev(t2c(mck.chl$FullTriangle)[,dim(mck.chl$FullTriangle)[2]])
Estimates are gathered in a data.frame for ease of comparison.
data.frame(ultimate.cost.mack=ultimate.chl,
ultimate.cost.fchl=a.model$ultimate.cost,
reserve.mack=ultimate.chl-diagonal,
reserve.fchl=a.model$reserve
)
#> ultimate.cost.mack ultimate.cost.fchl reserve.mack reserve.fchl
#> 0 10256.00 10256.00 0.00000 0.00000
#> 1 12098.24 12098.24 67.23865 67.23865
#> 2 14580.19 14580.19 345.18727 345.18727
#> 3 16323.69 16323.69 940.68770 940.68770
#> 4 17628.86 17628.86 2350.85562 2350.85562
#> 5 16237.77 16237.77 4466.77443 4466.77443
#> 6 18285.24 18285.24 9103.24335 9103.24335
#> 7 17281.44 17281.44 14480.43832 14480.43832
cat('\n Total reserve:',
sum(a.model$reserve))
#>
#> Total reserve: 31754.43
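The Mack point estimates in the table are the usual volume-weighted chain ladder projections, which can be reproduced by hand. The sketch below uses base R only; the helper name `chain_ladder` is hypothetical, and the triangle is typed in from the printout so that the block is self-contained.

```r
# AutoBIPaid cumulative payments (rows: accident years, columns: development years).
paid <- c(
  1904, 5398,  7496,  8882,  9712, 10071, 10199, 10256,
  2235, 6261,  8691, 10443, 11346, 11754, 12031,    NA,
  2441, 7348, 10662, 12655, 13748, 14235,    NA,    NA,
  2503, 8173, 11810, 14176, 15383,    NA,    NA,    NA,
  2838, 8712, 12728, 15278,    NA,    NA,    NA,    NA,
  2405, 7858, 11771,    NA,    NA,    NA,    NA,    NA,
  2759, 9182,    NA,    NA,    NA,    NA,    NA,    NA,
  2801,   NA,    NA,    NA,    NA,    NA,    NA,    NA
)
triangle <- matrix(paid, nrow = 8, byrow = TRUE)

# Hypothetical helper: volume-weighted chain ladder on a square triangle.
chain_ladder <- function(tri) {
  n <- ncol(tri)
  # Development factor j: ratio of column sums over rows observed in column j + 1.
  f <- sapply(seq_len(n - 1), function(j) {
    obs <- !is.na(tri[, j + 1])
    sum(tri[obs, j + 1]) / sum(tri[obs, j])
  })
  # Fill the lower triangle column by column.
  full <- tri
  for (j in seq_len(n - 1)) {
    missing <- is.na(full[, j + 1])
    full[missing, j + 1] <- full[missing, j] * f[j]
  }
  # Latest observed cumulative payment of each accident year.
  latest <- sapply(seq_len(nrow(tri)), function(i) tri[i, n - i + 1])
  list(factors = f, ultimate = full[, n], reserve = full[, n] - latest)
}

cl <- chain_ladder(triangle)
round(cl$reserve, 5)
sum(cl$reserve)
```

The reserves by accident year and their total can be compared with the `reserve.mack` column and the total reserve printed above.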
We fit the standard GLM model with the apc package. As shown in the paper, the chain-ladder model can be replicated by fitting an age-cohort model.
library(apc)
ds.apc = apc.data.list(cum2incr(dataset),
                       data.format = "CL")
ac.model.apc = apc.fit.model(ds.apc,
                             model.family = "od.poisson.response",
                             model.design = "AC")
Inspect the model coefficients derived from the output:
ac.model.apc$coefficients.canonical[,'Estimate']
#> level age slope cohort slope DD_age_3 DD_age_4 DD_age_5
#> 7.41596168 0.74105900 0.16519698 -1.16411707 -0.02909890 -0.17467013
#> DD_age_6 DD_age_7 DD_age_8 DD_cohort_3 DD_cohort_4 DD_cohort_5
#> -0.17533888 0.17408744 -0.55360421 0.02140672 -0.07364998 -0.03603392
#> DD_cohort_6 DD_cohort_7 DD_cohort_8
#> -0.15911660 0.20095088 -0.17521544
ac.fcst.apc = apc.forecast.ac(ac.model.apc)
data.frame(reserve.mack=ultimate.chl-diagonal,
reserve.apc=c(0,ac.fcst.apc$response.forecast.coh[,'forecast']),
reserve.fchl=a.model$reserve
)
#> reserve.mack reserve.apc reserve.fchl
#> 0 0.00000 0.00000 0.00000
#> 1 67.23865 67.23865 67.23865
#> 2 345.18727 345.18727 345.18727
#> 3 940.68770 940.68770 940.68770
#> 4 2350.85562 2350.85562 2350.85562
#> 5 4466.77443 4466.77443 4466.77443
#> 6 9103.24335 9103.24335 9103.24335
#> 7 14480.43832 14480.43832 14480.43832
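As a cross-check that does not depend on the apc package, the age-cohort GLM can also be sketched with base R's `glm`. This is an illustration under our own parametrization (treatment contrasts on accident-year and development-year factors), not the canonical parametrization printed above; the variable names are hypothetical.

```r
# AutoBIPaid cumulative payments (rows: accident years, columns: development years).
paid <- c(
  1904, 5398,  7496,  8882,  9712, 10071, 10199, 10256,
  2235, 6261,  8691, 10443, 11346, 11754, 12031,    NA,
  2441, 7348, 10662, 12655, 13748, 14235,    NA,    NA,
  2503, 8173, 11810, 14176, 15383,    NA,    NA,    NA,
  2838, 8712, 12728, 15278,    NA,    NA,    NA,    NA,
  2405, 7858, 11771,    NA,    NA,    NA,    NA,    NA,
  2759, 9182,    NA,    NA,    NA,    NA,    NA,    NA,
  2801,   NA,    NA,    NA,    NA,    NA,    NA,    NA
)
triangle <- matrix(paid, nrow = 8, byrow = TRUE)

# Incremental payments: first column unchanged, then column differences.
incr <- triangle
incr[, -1] <- triangle[, -1] - triangle[, -ncol(triangle)]

# Long format with accident year (cohort) and development year (age) as factors.
long <- data.frame(
  paid = as.vector(incr),
  ay   = factor(as.vector(row(incr))),
  dy   = factor(as.vector(col(incr)))
)
observed <- long[!is.na(long$paid), ]
future   <- long[is.na(long$paid), ]

# Over-dispersed Poisson GLM with log link and age + cohort effects.
fit <- glm(paid ~ ay + dy, family = quasipoisson(link = "log"), data = observed)

# Reserve by accident year: sum of predicted future incremental payments.
future$pred <- predict(fit, newdata = future, type = "response")
reserve <- tapply(future$pred, future$ay, sum)
reserve[is.na(reserve)] <- 0   # the first accident year has no future cells

length(coef(fit))
round(as.numeric(reserve), 5)
```

The number of coefficients is one intercept plus seven accident-year and seven development-year effects, in line with the fifteen canonical parameters listed above.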
Our method is able to replicate the chain-ladder results with no need to add the cohort component.
a.model$model.fit$ax
#> 1 2 3 4 5 6
#> 0.69314718 0.02366899 -1.01313612 -1.72538123 -2.48027780 -3.34130515
#> 7 8
#> -3.99615988 -5.18978418
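The fitted age effects can be related back to the chain ladder development factors. The sketch below rests on an assumption that is not quoted from the paper: occurrences are taken to be the incremental payments and exposure the average of the two adjacent cumulative payments, which gives a rate of `2 * (f - 1) / (f + 1)` for a development factor `f`. Under this assumption the first age has rate 2, because there is no earlier cumulative payment.

```r
# AutoBIPaid cumulative payments (rows: accident years, columns: development years).
paid <- c(
  1904, 5398,  7496,  8882,  9712, 10071, 10199, 10256,
  2235, 6261,  8691, 10443, 11346, 11754, 12031,    NA,
  2441, 7348, 10662, 12655, 13748, 14235,    NA,    NA,
  2503, 8173, 11810, 14176, 15383,    NA,    NA,    NA,
  2838, 8712, 12728, 15278,    NA,    NA,    NA,    NA,
  2405, 7858, 11771,    NA,    NA,    NA,    NA,    NA,
  2759, 9182,    NA,    NA,    NA,    NA,    NA,    NA,
  2801,   NA,    NA,    NA,    NA,    NA,    NA,    NA
)
triangle <- matrix(paid, nrow = 8, byrow = TRUE)

# Volume-weighted chain ladder development factors.
n <- ncol(triangle)
f <- sapply(seq_len(n - 1), function(j) {
  obs <- !is.na(triangle[, j + 1])
  sum(triangle[obs, j + 1]) / sum(triangle[obs, j])
})

# Assumed mapping from development factor to rate (see the lead-in).
mu <- 2 * (f - 1) / (f + 1)

# Log rates for all eight ages; the first one is log(2) by the same assumption.
log_mu <- c(log(2), log(mu))
round(log_mu, 8)
```

The resulting values can be compared with the eight age effects printed above.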
Further inspection can be performed with the fchl package, which provides graphical tools to inspect the fitted effects. Because we model the rate in continuous time, a line plot is the consistent choice.
p.a=plot(a.model)
# The beneficial effect of adding the cohort component
From a statistical perspective, a model with fewer parameters is desirable. Our approach, however, goes beyond parsimony.
By adding the cohort effect we are able to improve the model fit.
We show these results by inspecting the residual plots.
#make it triangular
plotresiduals(a.model)
Clearly, the red and blue areas suggest some trends that the model failed to capture. Consider now the age-cohort model and its residual plot.
ac.model <- clmplus(rtt, hazard.model="ac")
#> StMoMo: The following cohorts have been zero weighted: -7 -6 -5 -4 -3 -2 -1
#> StMoMo: Start fitting with gnm
#> StMoMo: Finish fitting with gnm
plotresiduals(ac.model)
Even without extrapolating a period component, the fit already improves. Similarly, it is possible to add a period component and choose an age-period model or an age-period-cohort model.
ap.model = clmplus(rtt,hazard.model = "ap")
#> StMoMo: The following cohorts have been zero weighted: -7 -6 -5 -4 -3 -2 -1
#> StMoMo: Start fitting with gnm
#> StMoMo: Finish fitting with gnm
apc.model = clmplus(rtt,hazard.model = "apc")
#> StMoMo: The following cohorts have been zero weighted: -7 -6 -5 -4 -3 -2 -1
#> StMoMo: Start fitting with gnm
#> StMoMo: Finish fitting with gnm
plotresiduals(ap.model)
The age-period model does not show any substantial improvement over the age-cohort model. It is worth noting once more that the age-cohort model does not require any extrapolation. In a similar fashion, we plot the residuals of the age-period-cohort model below, which seems to lead to a small improvement.
plotresiduals(apc.model)
In this vignette we showed the flexibility of our modeling approach with respect to the well-known chain-ladder model.
By modeling the hazard rate we are able to replicate the chain-ladder results with fewer parameters. Indeed, we model the age component directly and add a cohort effect if needed.
Going from an age model to an age-cohort model may lead to a substantial improvement in the model fit.