No. 139: MIES – Schema Changes and Claims Reporting

This entry is part of a series dedicated to MIES – a miniature insurance economic simulator. The source code for the project is available on GitHub.

Current Status

When programming MIES, I face a trade-off between rolling out enough features for a weekly blog post and spending more time trying to learn Python in order to code them in the most elegant way possible. If I implement things too quickly, I face the problem of accumulating bugs and technical debt, but if I spend too much time trying learn the latest libraries, I’d never get anything done. Therefore, I’ve decided to just write what I can most weeks, but revisiting the code to make it better every 3rd or 4th week, so I can implement any new technical knowledge I’d gained during that time.

Last month, I demonstrated the phenomenon of adverse selection using MIES. However, the underlying database was not a realistic depiction of the real world:

There was only one database, shared by all entities involved in the simulation – firms, people, and the environment. In business, it’s more realistic for companies to have their own internal databases, without the ability to access any other company’s database. I’ve reached point where I’ve decided that it would be difficult to add insurance operations, specifically the reporting of claims, if all entities used the same database. Therefore, I’ve decided to spend time this week setting up separate schemas for each company.

The reason why I didn’t do this earlier is because I hadn’t gone far enough in the SQLAlchemy documentation to figure out how to establish multiple metadata bases to allow each company in the simulation to access its own database. The decision I was facing last month was to either write the adverse selection demo with the knowledge I had, or to keep reading further to set things up the right way and then write about how I built nothing, but read some documentation in the hope that I’d build something later. I’d gotten tired of doing the latter one too many times, so I decided to just go for it.

This week, I was confident that I’d be able to fix the single-schema problem and add a claims reporting facility as well, so that’s what I’ll go over today.

Splitting the Database

Splitting the database involved two separate challenges:

Defining the schema for each entity
Rewriting the code to accept the new backend structure

This second challenge was the main motivation for updating the schema now rather than later. The more features I added to MIES, the more of them I would have to rewrite when I inevitably split the database.

First, I envisioned the database structure I wanted for each firm:

Much of this was taken from the original schema, as seen at the top of this post. The policy table contains the same information as before, minus the company id. This is because a company does not need to have this field if all of its policies belong to itself.

The customer table is new. Here, I faced the challenge of whether I wanted to maintain data integrity between the information a company knows about its own customers, and their actual attributes as they exist outside the knowledge of the firm. For example, if a person switches insurers and then changes their profession, should the original insurer know about it?

In the real world, the answer is no. This also applies to things like credit scores which can change after the company does an initial credit check. Therefore, I’ve decided to discard data integrity between companies and the environment. I may want to rethink whether I want the all-knowing environment class to at least have a canonical record of all the information in the simulation (the answer is usually yes when I write these thoughts out in a blog post).

The customer table is analogous to the person table as before, but only contains information that the firm knows about each customer. This information typically costs money to acquire when quoting policies, and is considered an underwriting expense that I will need to add later.

Lastly, I’ve added a new table for claims. I mentioned last month that since we assumed each loss was fully covered, there was no need to distinguish between losses and claims. Now that we’re facing the challenge of assigning a losses to different companies as claims, I decided to lay the groundwork now so that we can loosen this assumption when we need to. Gradually, we will see the what the differences is between a loss and a claim. A loss is a fortuitous event that happens to a person, and an insurance claim is a financial claim that a person makes against their insurer for indemnification under their policy. There is also a distinction between the loss a person endures and the loss that a company needs to pay, due to things like coverage exclusions, limits, and deductibles. The claim table will take care of many of these distinctions.

Therefore, in a two-firm simulation, we will have a database for each firm. Below, we have two ER-diagrams side-by-side for each company:

This idea carries over to the case of an n-firm simulation, where we’ll have n databases. The code that defines the firm schema now resides in a new module:

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, Date, Float, String
from sqlalchemy import ForeignKey
from sqlalchemy.orm import relationship


Base = declarative_base()


class Customer(Base):
    __tablename__ = 'customer'

    person_id = Column(
        Integer,
        primary_key=True
    )
    age_class = Column(String)
    profession = Column(String)
    health_status = Column(String)
    education_level = Column(String)

    policy = relationship(
        "Policy",
        back_populates="customer"
    )

    def __repr__(self):
        return "<Customer(" \
               "age_class='%s', " \
               "profession='%s', " \
               "health_status='%s', " \
               "education_level='%s'" \
               ")>" % (
                   self.age_class,
                   self.profession,
                   self.health_status,
                   self.education_level,
               )


class Policy(Base):
    __tablename__ = 'policy'

    policy_id = Column(
        Integer,
        primary_key=True
    )
    person_id = Column(
        Integer,
        ForeignKey('customer.person_id')
    )
    effective_date = Column(Date)
    expiration_date = Column(Date)
    premium = Column(Float)

    customer = relationship(
        "Customer",
        back_populates="policy"
    )

    def __repr__(self):
        return "<Policy(person_id ='%s'," \
               "effective_date ='%s', " \
               "expiration_date='%s', " \
               "premium='%s')>" % (
                self.person_id,
                self.effective_date,
                self.expiration_date,
                self.premium
                )


class Claim(Base):
    __tablename__ = 'claim'

    claim_id = Column(
        Integer,
        primary_key=True
    )
    policy_id = Column(
        Integer,
        ForeignKey('policy.policy_id')
    )
    person_id = Column(Integer)
    event_id = Column(Integer)
    occurrence_date = Column(Date)
    report_date = Column(Date)
    incurred_loss = Column(Float)

    def __repr__(self):
        return "<Claim(policy_id ='%s'," \
               "person_id ='%s', " \
               "event_id='%s', " \
               "occurrence_date='%s'," \
               "report_date='%s'," \
               "incurred_loss='%s')>" % (
                self.policy_id,
                self.person_id,
                self.event_id,
                self.occurence_date,
                self.report_date,
                self.incurred_loss
                )

100

101

102

103

from sqlalchemy.ext.declarative import declarative_base

from sqlalchemy import Column, Integer, Date, Float, String

from sqlalchemy import ForeignKey

from sqlalchemy.orm import relationship

Base = declarative_base()

class Customer(Base):

__tablename__ = 'customer'

person_id = Column(

Integer,

primary_key=True

)

age_class = Column(String)

profession = Column(String)

health_status = Column(String)

education_level = Column(String)

policy = relationship(

"Policy",

back_populates="customer"

)

def __repr__(self):

return "<Customer(" \

"age_class='%s', " \

"profession='%s', " \

"health_status='%s', " \

"education_level='%s'" \

")>" % (

self.age_class,

self.profession,

self.health_status,

self.education_level,

)

class Policy(Base):

__tablename__ = 'policy'

policy_id = Column(

Integer,

primary_key=True

)

person_id = Column(

Integer,

ForeignKey('customer.person_id')

)

effective_date = Column(Date)

expiration_date = Column(Date)

premium = Column(Float)

customer = relationship(

"Customer",

back_populates="policy"

)

def __repr__(self):

return "<Policy(person_id ='%s'," \

"effective_date ='%s', " \

"expiration_date='%s', " \

"premium='%s')>" % (

self.person_id,

self.effective_date,

self.expiration_date,

self.premium

)

class Claim(Base):

__tablename__ = 'claim'

claim_id = Column(

Integer,

primary_key=True

)

policy_id = Column(

Integer,

ForeignKey('policy.policy_id')

)

person_id = Column(Integer)

event_id = Column(Integer)

occurrence_date = Column(Date)

report_date = Column(Date)

incurred_loss = Column(Float)

def __repr__(self):

return "<Claim(policy_id ='%s'," \

"person_id ='%s', " \

"event_id='%s', " \

"occurrence_date='%s'," \

"report_date='%s'," \

"incurred_loss='%s')>" % (

self.policy_id,

self.person_id,

self.event_id,

self.occurence_date,

self.report_date,

self.incurred_loss

)

There is also a schema for the environment, which is similar to before but minus the policy-specific information:

Query Encapsulation

MIES relies on querying its databases to run the simulations. Up until now, many of these queries resided within the environment, broker, and insurer classes. The tricky thing about querying databases is that you should close the connections to them once you no longer need them. However, establishing and closing a database requires a few lines of code, and the queries themselves can be verbose as well. This can get repetitive and can lead to bugs if I forget to close a connection or make modifications to reused queries.

Therefore, I thought it would be a good idea to encapsulate them into functions. In order to do this, I created two modules in a folder called utilities that are used to connect to and then query the databases, respectively. For example, the connections file contains code to connect to either the universe (environment) database or the database for a particular company:

import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker
import schema.universe as universe


def connect_universe():
    engine = sa.create_engine(
        'sqlite:///db/universe.db',
        echo=True
    )
    session = sessionmaker(bind=engine)()
    connection = engine.connect()
    return session, connection


def connect_company(company_name):
    engine = sa.create_engine(
        'sqlite:///db/companies/' + company_name + '.db',
        echo=True)
    session = sessionmaker(bind=engine)()
    connection = engine.connect()
    return session, connection

import sqlalchemy as sa

from sqlalchemy.orm import sessionmaker

import schema.universe as universe

def connect_universe():

engine = sa.create_engine(

'sqlite:///db/universe.db',

echo=True

)

session = sessionmaker(bind=engine)()

connection = engine.connect()

return session, connection

def connect_company(company_name):

engine = sa.create_engine(

'sqlite:///db/companies/' + company_name + '.db',

echo=True)

session = sessionmaker(bind=engine)()

connection = engine.connect()

return session, connection

You can see that you need to import some things, and also define an engine, session, and connection. That’s quite a bit of code that I don’t want to repeat every time I need to access a database.

Moving onto the queries, we likewise have code that we don’t want to repeat. For example, the query_population() function in the queries file returns the PersonTable from the environment schema as a dataframe:

import pandas as pd

from schema.universe import Company, PersonTable
from schema.insco import Customer, Policy
from utilities.connections import (
    connect_universe,
    connect_company)

def query_population():
    session, connection = connect_universe()
    query = session.query(PersonTable).statement
    population = pd.read_sql(query, connection)
    connection.close()
    return population

import pandas as pd

from schema.universe import Company, PersonTable

from schema.insco import Customer, Policy

from utilities.connections import (

connect_universe,

connect_company)

def query_population():

session, connection = connect_universe()

query = session.query(PersonTable).statement

population = pd.read_sql(query, connection)

connection.close()

return population

Now, rather than calling all the code that we see within these functions, we can just get the information in one line, that is, query_population(). Other queries within the file access other parts of the database and return information we might want to ask about frequently:

def query_company():
    session, connection = connect_universe()
    companies_query = session.query(Company).statement
    companies = pd.read_sql(
        companies_query,
        connection
    )
    connection.close()
    return companies


def query_all_policies():
    companies = query_company()
    policies = pd.DataFrame()

    for index, row in companies.iterrows():
        company_id = row['company_id']
        company_name = row['company_name']
        session, connection = connect_company(company_name)
        query = session.query(Policy).statement
        policy_c = pd.read_sql(
            query,
            connection
        )
        policy_c['company_id'] = company_id
        policy_c['company_name'] = company_name
        policies = policies.append(policy_c)
        connection.close()

    return policies


def query_in_force_policies(curr_date):
    companies = get_company_names()
    in_force = pd.DataFrame()

    for company in companies:
        session, connection = connect_company(company)

        exp_query = session.query(
            Policy
        ).filter(
            Policy.expiration_date == curr_date
        ).statement

        company_in_force = pd.read_sql(
            exp_query,
            connection)

        in_force = in_force.append(
            company_in_force,
            ignore_index=True
        )

        connection.close()
    return in_force


def get_company_names():
    session, connection = connect_universe()
    companies_query = session.query(Company.company_name).statement
    companies = pd.read_sql(
        companies_query,
        connection
    )
    connection.close()
    return list(companies['company_name'])


def get_company_ids():
    session, connection = connect_universe()
    companies_query = session.query(Company.company_id).statement
    companies = pd.read_sql(companies_query, connection)
    connection.close()
    return list(companies['company_id'])


def get_uninsured_ids(curr_date):
    population = query_population()
    in_force = query_in_force_policies(curr_date)
    uninsureds = population[~population['person_id'].isin(in_force['person_id'])]['person_id']
    return uninsureds


def get_customer_ids(company):
    session, connection = connect_company(company)
    id_query = session.query(Customer.person_id).statement
    ids = pd.read_sql(id_query, connection)
    return ids

def query_company():

session, connection = connect_universe()

companies_query = session.query(Company).statement

companies = pd.read_sql(

companies_query,

connection

)

connection.close()

return companies

def query_all_policies():

companies = query_company()

policies = pd.DataFrame()

for index, row in companies.iterrows():

company_id = row['company_id']

company_name = row['company_name']

session, connection = connect_company(company_name)

query = session.query(Policy).statement

policy_c = pd.read_sql(

query,

connection

)

policy_c['company_id'] = company_id

policy_c['company_name'] = company_name

policies = policies.append(policy_c)

connection.close()

return policies

def query_in_force_policies(curr_date):

companies = get_company_names()

in_force = pd.DataFrame()

for company in companies:

session, connection = connect_company(company)

exp_query = session.query(

Policy

).filter(

Policy.expiration_date == curr_date

).statement

company_in_force = pd.read_sql(

exp_query,

connection)

in_force = in_force.append(

company_in_force,

ignore_index=True

)

connection.close()

return in_force

def get_company_names():

session, connection = connect_universe()

companies_query = session.query(Company.company_name).statement

companies = pd.read_sql(

companies_query,

connection

)

connection.close()

return list(companies['company_name'])

def get_company_ids():

session, connection = connect_universe()

companies_query = session.query(Company.company_id).statement

companies = pd.read_sql(companies_query, connection)

connection.close()

return list(companies['company_id'])

def get_uninsured_ids(curr_date):

population = query_population()

in_force = query_in_force_policies(curr_date)

uninsureds = population[~population['person_id'].isin(in_force['person_id'])]['person_id']

return uninsureds

def get_customer_ids(company):

session, connection = connect_company(company)

id_query = session.query(Customer.person_id).statement

ids = pd.read_sql(id_query, connection)

return ids

Claims Reporting

If you looked carefully, you may have noticed an additional date field added since last month. In addition to the event date, synonymous with occurrence date, we now have the report date – that is, the date the claim is reported to the insurer. This distinction has important implications in actuarial science, since different types of policies exist that may only offer coverage on claims that either occur during or are reported during the period the policy is in effect.

For now, I have assumed that all policies are written on an occurrence basis, which means that a policy covers all claims that occur within the coverage period. This means that even if an insured reports a claim several years after is has occurred and can prove that its valid and occurred during the policy period, it is still covered under the terms of the policy.

I decided to add the claims reporting facility as a method in the broker class, which now resembles more of a place where transactions happen than an actual brokerage:

    def report_claims(self, report_date):
        # match events to policies in which they are covered
        session, connection = connect_universe()

        event_query = session.query(Event).\
            filter(Event.report_date == report_date).\
            statement

        events = pd.read_sql(event_query, connection)
        connection.close()

        policies = query_all_policies()
        claims = events.merge(
            policies,
            on='person_id',
            how='left'
        )
        claims = claims.query(
            'event_date >= effective_date '
            'and event_date <= expiration_date'
        )

        claims = claims.drop([
            'effective_date',
            'expiration_date',
            'premium',
            'company_id'
        ], axis=1)

        companies = get_company_names()
        for company in companies:
            reported_claims = claims[claims['company_name'] == company]

            reported_claims = reported_claims.rename(columns={
                'event_date': 'occurrence_date',
                'ground_up_loss': 'incurred_loss'
            })

            reported_claims = reported_claims.drop(['company_name'], axis=1)

            session, connection = connect_company(company)
            reported_claims.to_sql(
                'claim',
                connection,
                index=False,
                if_exists='append'
            )
            connection.close()

def report_claims(self, report_date):

# match events to policies in which they are covered

session, connection = connect_universe()

event_query = session.query(Event).\

filter(Event.report_date == report_date).\

statement

events = pd.read_sql(event_query, connection)

connection.close()

policies = query_all_policies()

claims = events.merge(

policies,

on='person_id',

how='left'

)

claims = claims.query(

'event_date >= effective_date '

'and event_date <= expiration_date'

)

claims = claims.drop([

'effective_date',

'expiration_date',

'premium',

'company_id'

], axis=1)

companies = get_company_names()

for company in companies:

reported_claims = claims[claims['company_name'] == company]

reported_claims = reported_claims.rename(columns={

'event_date': 'occurrence_date',

'ground_up_loss': 'incurred_loss'

})

reported_claims = reported_claims.drop(['company_name'], axis=1)

session, connection = connect_company(company)

reported_claims.to_sql(

'claim',

connection,

index=False,

if_exists='append'

)

connection.close()

This function takes the report date as an argument, and then combs through both the event table from the environment and the policy table from each company to match the event to the appropriate company and policy, and then populate that company’s claim table.

The actual claims cycle is much more complex, involving adjusters, claimants, lawyers, agents, and other parties. These have been abstracted away for now, but I plan to add more detail later since claims modeling is itself an important part of an insurer’s strategy.

I’ve also made many changes to the other classes, but I wanted to highlight claims reporting since it’s a new feature. You can examine the other changes on GitHub.

Simulation

By encapsulating the queries, I’ve removed the number of arguments and lines of code needed to run the simulations. For example, we can now define an Insurer with just two arguments:

company_1 = Insurer(4000000, 'company_1')

company_2 = Insurer(4000000, 'company_2')

company_1 = Insurer(4000000, 'company_1')

company_2 = Insurer(4000000, 'company_2')

Whereas previously, it took five aguments:

company_1 = Insurer(gsession, engine, 4000000, Company, 'company_1')

company_2 = Insurer(gsession, engine, 4000000, Company, 'company_2')

company_1 = Insurer(gsession, engine, 4000000, Company, 'company_1')

company_2 = Insurer(gsession, engine, 4000000, Company, 'company_2')

The need to specify the connection, engine, and table was cumbersome and is now handled automatically by the Insurer class.

Finally, I tested out how the data flows from policy issuance to claims occurrence, and then to renewals, and MIES is still able to demonstrate the phenomenon of adverse selection:

Further Improvements

I’d like to get back to the subject of economics next week, so now I’m working on implementing the Slutsky Equation, which decomposes the motivations a consumer might have for changing their consumption as prices change.

No. 139: MIES – Schema Changes and Claims Reporting

Current Status

Splitting the Database

Query Encapsulation

Claims Reporting

Simulation

Further Improvements

Leave a Reply Cancel reply

Post Navigation

Archives

Categories

Links

Texas Cycling