Quantifying the unquantifiable

Reinsurers and ILS funds are frequently in the business of risking large amounts of investors' capital based on very little relevant data. Some contracts cover very remote risks, such as meteor strikes; others cover risks, like terrorism, that change significantly from year to year. Statistical methods are of little help in these situations, and expert judgement becomes a crucial tool. But experts often disagree, and it can be difficult for non-experts to draw strong conclusions from contradictory opinions. Dr Raveem Ismail and Scott Reid discuss a new approach for systematically quantifying expert judgements.

“There are no hard facts, just endless opinions. Every day, the news media deliver forecasts without reporting, or even asking, how good the forecasters who made the forecasts really are. Every day, corporations and governments pay for forecasts that may be prescient or worthless or something in between. And every day, all of us - leaders of nations, corporate executives, investors, and voters - make critical decisions on the basis of forecasts whose quality is unknown.” (Superforecasting: The Art & Science Of Prediction)

Ideally, all decision-aiding models would be based on objective criteria such as exhaustive data and sound physical principles. This ideal situation rarely occurs, and reinsurers and ILS funds frequently act in data-poor environments, relying heavily on expert judgement. This is particularly true in low-frequency, high-severity practice areas such as Life and Health & Care, where unusual, rare, and catastrophic risk appraisal is almost entirely based on expert judgement. Given that Solvency II requires assessment of 1-in-200-year events, the regulatory capital regime across the EU also rests on the application of expert judgement.

Decision makers can and should demand the most unbiased expert judgement procedures, with objective criteria to appraise expert performance. But how? Drawing on a first study of its kind, we discuss one approach, used in other fields but not yet in reinsurance: Structured Expert Judgement (SEJ), an auditable and objective combination of multiple judgements, each weighted by its skill in gauging uncertainty. This produces a better overall judgement within a plausible range of outcomes.

Expert opinion
A single expert’s judgement might be an outlier, but consulting ten experts will yield ten different answers. Each answer is an (unknowable) function of an expert’s previous experience, grasp of data, judgemental capability, biases, mood on the day, and so on. Without a method of selecting between so many different judgements, customers (insurance companies) often simply stick with what they know best: a longstanding provider/relationship, or market reputation/brand. None of these is any indicator of capability: the client cannot know the quality, since no performance-based appraisal of forecasting ability has occurred. Simple averaging yields limited gains, since each expert is weighted equally without regard for capability: the final answer may actually be less accurate than some individual answers due to outliers, as the toy example below shows.
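To see the dilution concretely, here is a toy Python sketch (all numbers hypothetical, chosen purely for illustration): an equal-weighted average of expert medians is dragged toward an outlier, while a performance-informed weighting can discount it.

```python
def pool(weights, medians):
    """Weighted average of the experts' point judgements."""
    return sum(w * m for w, m in zip(weights, medians))

medians = [4.0, 5.0, 4.5, 20.0]     # four experts' medians; expert 4 is an outlier
equal_w = [0.25] * 4                # equal weighting ignores capability
perf_w = [0.40, 0.35, 0.20, 0.05]   # hypothetical performance-based weights

print(f"equal-weighted:       {pool(equal_w, medians):.2f}")  # 8.38, inflated by the outlier
print(f"performance-weighted: {pool(perf_w, medians):.2f}")   # 5.25, nearer the consensus
```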

Structured Expert Judgement (SEJ)

SEJ differs from and extends previous opinion-pooling methods. Each expert is first rated on prior performance by being asked a set of seed questions, whose answers are already known to the elicitation facilitator but not necessarily to the expert. Each expert's performance on these seed questions determines that expert's weighting. Experts are then asked the target questions: the actual judgements being sought, to which answers are not known. The weightings drawn from the seed questions are then used to combine the experts' judgements on the target questions, producing one outcome which truly combines the different expert judgements in a performance-based way, and which is thus potentially better than each individual answer. Seed question design is critical: seeds must be tightly aligned with the target questions, testing the same ability, to maximise the utility of performance weighting. Poorly designed seed questions will undermine the effectiveness of the weighting.


Political Violence frequency

Cooke’s Classical Model[1] for SEJ involves asking each expert for two metrics: an interval within which they think the true value lies (the 5% to 95% range) and a central median value. These are then used to calculate how well the expert gauges uncertainty spreads (“information”), and how reliably they capture true values within their ranges (statistical accuracy, or “calibration”).
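To make these two metrics concrete, here is a minimal Python sketch of the Classical Model's scoring, assuming each expert supplies 5%/50%/95% quantiles for every seed question. The 10% overshoot and the per-expert intrinsic range are illustrative simplifications (the full model sets the intrinsic range jointly across all experts), and all names are ours, not the study's:

```python
import math

import numpy as np
from scipy.stats import chi2

P_BINS = np.array([0.05, 0.45, 0.45, 0.05])  # theoretical inter-quantile masses

def calibration_score(quantiles, realizations):
    """p-value testing whether realizations fall between the expert's
    quantiles at the theoretical rates. quantiles: rows of
    (5%, 50%, 95%) values; realizations: the known seed answers."""
    n = len(realizations)
    counts = np.zeros(4)
    for (q05, q50, q95), x in zip(quantiles, realizations):
        counts[np.searchsorted([q05, q50, q95], x)] += 1  # which bin x landed in
    s = counts / n
    # 2n * KL(s || p) is asymptotically chi-squared with 3 degrees of freedom.
    kl = sum(si * math.log(si / pi) for si, pi in zip(s, P_BINS) if si > 0)
    return chi2.sf(2 * n * kl, df=3)  # higher = better calibrated

def information_score(quantiles, realizations, overshoot=0.10):
    """Mean relative entropy of the expert's piecewise-uniform density
    against a uniform background over an intrinsic range: higher =
    tighter, more informative judgements."""
    scores = []
    for (q05, q50, q95), x in zip(quantiles, realizations):
        lo, hi = min(q05, x), max(q95, x)
        pad = overshoot * (hi - lo)
        L, U = lo - pad, hi + pad
        widths = [q05 - L, q50 - q05, q95 - q50, U - q95]
        scores.append(sum(p * math.log(p * (U - L) / w)
                          for p, w in zip(P_BINS, widths) if w > 0))
    return float(np.mean(scores))

def performance_weight(cal, info, alpha=0.05):
    """Unnormalised Classical Model weight: calibration times information,
    with a cutoff screening out poorly calibrated experts entirely."""
    return cal * info if cal >= alpha else 0.0
```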

A first elicitation was performed[2] in January 2016, with 18 seed and 8 target questions. This was for an inherently unknowable future metric: the 2016 frequency of Strikes, Riots & Civil Commotion in blocs of countries (Central Asia, Maghreb, etc.), with participants drawn from across the reinsurance profession. An example of their judgements on a single seed question (prior Strikes, Riots & Civil Commotion events in South-East Asia) is shown in the first figure.

The table shows information and calibration scores across the full seed question set. Two experts emerge with notably strong performance-based weights. If all experts were weighted equally (last column), this discovered capability would be diluted away (“equal-weighted” row, table foot).

However, if the experts’ judgements are combined using the weights from the calibration exercise (penultimate column), then a combination emerges which capitalises on these high-performing experts to produce better results than all of them (“performance-weighted” row at table foot). When this performance-weighted combination is used for a target question, the second figure results.

Now, for this forward-looking question there is no known answer, yet we see that the performance-weighted process has allowed the influence of Experts 1 and 4 to provide a much tighter and more informative judgement than most individual experts would, or than the equal-weighted combination (which is inflated by outliers). In the performance-weighted combination, outliers are ameliorated and the identified high-performing experts are given more weight, as sketched below. Such a final frequency, with its associated range, could then feed a pricing/catastrophe model with greater assurance than customary approaches.
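As an illustration of that combination step, the sketch below forms the performance-weighted "decision maker" for one target question as a weighted mixture of the experts' distributions (approximated here by piecewise-linear CDFs through their quantiles), then reads combined quantiles off the mixture. All quantile values and weights are hypothetical, not the study's figures:

```python
import numpy as np

def dm_quantiles(expert_quantiles, weights, lo, hi, grid=10_000):
    """Combined (5%, 50%, 95%) quantiles of the weighted mixture of the
    experts' distributions, over a common support [lo, hi]."""
    xs = np.linspace(lo, hi, grid)
    mix_cdf = np.zeros(grid)
    for (q05, q50, q95), w in zip(expert_quantiles, weights):
        # Piecewise-linear CDF through (lo, 0), (q05, .05), (q50, .5), (q95, .95), (hi, 1).
        mix_cdf += w * np.interp(xs, [lo, q05, q50, q95, hi], [0, .05, .5, .95, 1])
    # Invert the mixture CDF at the probabilities of interest.
    return [float(np.interp(p, mix_cdf, xs)) for p in (.05, .5, .95)]

# Hypothetical target-question judgements (annual event frequency):
experts = [(2.0, 5.0, 9.0), (3.0, 6.0, 11.0), (1.0, 4.0, 30.0), (2.5, 5.5, 8.0)]
w_perf = [0.45, 0.10, 0.00, 0.45]  # weight concentrated on experts 1 and 4
w_equal = [0.25] * 4

print("performance-weighted:", dm_quantiles(experts, w_perf, 0, 40))
print("equal-weighted:      ", dm_quantiles(experts, w_equal, 0, 40))
# The equal-weighted 95% quantile is dragged far out by expert 3's wide tail.
```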

Conclusion

Structured Expert Judgement is still judgement. But it is not guesswork: it is a transparent method of pooling multiple opinions, weighted according to performance criteria aligned with the actual judgements being sought. Where data or models are lacking, it offers an objective and auditable method of producing decision-making judgements and inputs to models. We have described a first SEJ elicitation in our area of interest, in which the method identified high-performing experts and outperformed uncalibrated pooling.

It should be noted that SEJ is not a silver bullet: where there are science-based models or suitable data, these should trump expert judgement (or be used in tandem). But in their absence, in classes of business such as political violence, and in situations where tail risk is being gauged, SEJ naturally promises a significant enhancement to decision-making and risk appraisal.


Dr Raveem Ismail. DPhil, MSc, MPhys (Oxon), MInstP. (raveem.ismail@oxon.org) Raveem is a Specialty Treaty Underwriter (focussing on challenging risks such as terror and cyber) at Ariel Re, Bermuda, and chair of the Reinsurance Special Interest Group of the COST Action (see below). He was previously at Validus and Aon Benfield, and has consulted on quantitative political violence at IHS Exclusive Analysis. Raveem is a triple graduate of Oxford University, where his research was in atmospheric physics modelling.

Scott Reid. FFA, BSc (Hons). Scott is a Fellow of the Institute & Faculty of Actuaries, Head of Pricing & Reinsurance at AIG Life, UK, and a member of the IFoA Health & Care Research committee. He was previously a broker at Aon Benfield, and worked at the reinsurers SCOR and Gen Re.


www.expertsinuncertainty.net The ISCH EU COST Action IS1304 for Structured Expert Judgement aims to bridge the gap between scientific uncertainty and evidence-based decision making. The political violence elicitation referenced here took place in London in January 2016, kindly hosted by Dickie Whitaker and the Lighthill Risk Network, and run by the COST Action’s Reinsurance Special Interest Group, with principal investigators Dr Raveem Ismail, Christoph Werner (Strathclyde University), and Professor Willy Aspinall (Bristol University). The full write-up of the study will be available at http://1drv.ms/1VZkuGh.

[1] R. M. Cooke, Experts In Uncertainty: Opinion and Subjective Probability in Science. Oxford University Press, 1991.

[2] The full study, Structured Expert Judgement (SEJ) For Political Violence In (Re)insurance, is forthcoming in a scientific journal (permanent URL: http://1drv.ms/1VZkuGh, corresponding author raveem.ismail@oxon.org).

Posted: Monday, May 16th, 2016