.. _lrdb:

================================================================================
The CAS Loss Reserving Database
================================================================================

The |LRDB|__ is a collection of loss data originally compiled by Glenn Meyers 
intended to be used for claims reserving studies. The data includes major 
personal and commercial lines of business from U.S. property casualty insurers.
The claims data comes from 
*Schedule P Analysis of Losses and Loss Expenses* in the National
Association of Insurance Commissioners (NAIC) database. The final product is a 
dataset that contains run-off triangles of six lines of business for all U.S. 
property casualty insurers. The triangle data correspond to claims of accident 
year 1988-1997 with 10 years development lag. Both upper and lower triangles
are included so that one could use the data to develop a model and then test 
its performance retrospectively [#f1]_.   

     
| The |LRDB| contains the following fields:


**loss_key**
    Currently limited to ``"com_auto"``, but will be expanded in a future release (see mapping below)
               
**grcode**
    NAIC company code (including insurer groups and single insurers)

**grname**
    NAIC company name (including insurer groups and single insurers)
  
**origin**
    Accident year

**dev**
    Development year

**incrd_loss**
    Incurred losses and allocated expenses reported at year end

**paid_loss**
    paid losses and allocated expenses at year end

**ep_dir**
    Premiums earned at incurral year - direct and assumed

**ep_ceded**
    Premiums earned at incurral year - ceded

**ep_net**
    earned_prem_dir - earned_prem_ceded

**single**
    1 indicates a single entity, 0 indicates a group insurer

**posted_reserve_97**
    Posted reserves in year 1997 taken from the Underwriting and Investment
    Exhibit Part 2A, including net losses unpaid and unpaid loss adjustment
    expenses

**train_ind**
    A 1 indicates that the value associated with a particular origin-dev combination
    would fall in the upper-left of a typical loss triangle. * indicates what would
    typically be indentified as a future observation at the time triangles are constructed.
    The lower right triangle values can be treated as a holdout set to test the adequacy
    of a reserve estimator.

The following table provides a description of the type of losses associated 
with each unique combination of ``loss_key`` and
``loss_type_suffix``:


.. csv-table:: Loss Description
    :header: "loss_type_suffix", "loss_key", "description"

    "D", "wc", "Workers Compensation"
    "R", "prod_liab","Products Liability - Occurance"
    "B", "pp_auto","Private Passenger Auto Liability/Medical"
    "H", "othr_liab", "Other Liability - Occurrence"
    "F", "med_mal", "Medical Malpractice - Claims Made"
    "C", "comauto", "Commercial Auto Liability/Medical"


A copy of the |LRDB| is distributed along with trikit, in addition to
a number of convenience functions intended to simplify database
interaction. In what follows, |LRDB| convenience functions are detailed.


Loss Reserve Database API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


.. function:: load_lrdb(tri_type=None, loss_type="incurred", lob="comauto", grcode=1767, grname=None, train_only=True)

    Load the CAS Loss Reserving Database subset of losses. Additional
    keyword arguments are used to subset the CAS Loss Reserving Database to the
    records of interest.
    Within the Loss Reserving Database, "lob" and "grcode" uniquely
    partition losses into 100 record blocks if ``lower_right_ind=True``,
    otherwise losses are partitioned into 55 record blocks. All available
    combinations of "lob" and "grcode" (referred to as "specs")
    can be obtained by calling ``get_lrdb_specs``.
    Note that when ``tri_type`` is "cum" or "incr", the remaining subset
    of records after applying ``lob``, ``grcode`` and ``grname`` filters will
    be aggregated into a single triangle of the specified type.

    :param str tri_type: If ``None``, lrdb subset is returned as pd.DataFrame. Otherwise, return subset as either incremental or cumulative triangle type. Default value is None.
    :param str lob: Specifies the line of business to return. Available options are ``['comauto', 'ppauto', 'wkcomp', 'medmal', 'prodliab', 'othliab']``. Default value is "comauto".
    :param int grcode: NAIC company code including insurer groups and single insurers. For a full listing, call ``get_lrdb_specs``. Default value is ``1767``.
    :param str grname: NAIC company name (including insurer groups and single insurers). The complete mapping of available grcodes can be obtained by calling ``get_lrdb_specs``. Default value is None.
    :param str loss_type: Specifies which loss data to load. Can be one of "paid" or "incurred". Defaults to "incurred". Note that bulk losses have already been subtracted from schedule P incurred losses. Default value is "incurred".
    :param bool train_only: If True, the upper-left portion of the triangle will be returned. The upper-left portion of the triangle typically consists of actual loss experience. If False, the squared triangle, consisting of 100 observations is returned. Default value is True.
    :return: Either pd.DataFrame, trikit.triangle.IncrTriangle or trikit.triangle.CumTriangle.


|LRDB| datasets are loaded via ``load_lrdb``. A number of additional keyword arguments can be
passed to ``load_lrdb``. For example, to retrieve the subset of commercial auto losses for
``grcode=1767``::

    In [1]: from trikit import load_lrdb
    In [2]: df = load_lrdb(grcode=1767, lob="comauto")
    In [3]: type(df)
    Out[3]: pandas.core.frame.DataFrame


Alternatively, the resulting subset can be transformed to a triangle of the desired type::

    In [4]: tri = load_lrdb(tri_type="cum", grcode=1767, lob="comauto")
    In [5]: tri
    Out[5]:
              1       2       3       4       5         6         7         8         9         10
    1988 110,231 263,079 431,216 611,278 797,428   985,570 1,174,922 1,366,229 1,558,096 1,752,096
    1989 121,678 279,896 456,640 644,767 837,733 1,033,837 1,233,015 1,432,670 1,633,619       nan
    1990 123,376 298,615 500,570 714,683 934,671 1,157,979 1,383,820 1,610,193       nan       nan
    1991 117,457 280,058 463,396 662,003 865,401 1,071,271 1,278,228       nan       nan       nan
    1992 124,611 291,399 481,170 682,203 889,029 1,101,390       nan       nan       nan       nan
    1993 137,902 323,854 533,211 753,639 980,180       nan       nan       nan       nan       nan
    1994 150,582 345,110 561,315 792,392     nan       nan       nan       nan       nan       nan
    1995 150,511 345,241 560,278     nan     nan       nan       nan       nan       nan       nan
    1996 142,301 326,584     nan     nan     nan       nan       nan       nan       nan       nan
    1997 143,970     nan     nan     nan     nan       nan       nan       nan       nan       nan


Notice that with ``grcode`` and ``lob`` specified as above, the returned DataFrame contains 55 records
as expected (recall that by default, ``train_ind`` is set to False, otherwise the shape of the returned
DataFrame would be (100, 3).


|LRDB| lines of business can be listed by calling ``get_lrdb_lobs``::

    In [1]: from trikit import get_lrdb_lobs
    In [2]: get_lrdb_lobs()
    Out[2]: ['comauto', 'ppauto', 'wkcomp', 'medmal', 'prodliab', 'othliab']


Unique combinations of "loss_key", "grname" and "grcode" can be listed by calling ``get_lrdb_specs``::

        In [1]: from trikit import get_lrdb_specs
        In [2]: get_lrdb_specs()
        Out[2]:
            loss_key  grcode                               grname
        0    comauto     266              Public Underwriters Grp
        1    comauto     337                   California Cas Grp
        2    comauto     353                       Celina Mut Grp
        3    comauto     388                   Federal Ins Co Grp
        4    comauto     460                      Buckeye Ins Grp
        ..       ...     ...                                  ...
        431   wkcomp   41580                    Red Shield Ins Co
        432   wkcomp   42439                Toa-Re Ins Co Of Amer
        433   wkcomp   43915                       Rainier Ins Co
        434   wkcomp   44091  Dowa Fire & Marine Ins Co Ltd Us Br
        435   wkcomp   44300                   Tower Ins Co Of NY


.. |LRDB| replace:: CAS Loss Reserving Database
__ https://www.casact.org/research/index.cfm?fa=loss_reserves_data


.. rubric:: Footnotes

.. [#f1] https://www.casact.org/research/index.cfm?fa=loss_reserves_data