The dangers of model inaccuracy over time
Financial institutions use models to predict customer behavior, make decisions, and plan strategy. While the performance of these financial models is often key to their success, institutions cannot always tell if they’re working effectively.
Consider a bank that has created a capital stress-testing model. Management has signed off on the methodology, the assumptions are carefully documented, and the model consistently produces results within a couple of percentage points of the previous quarter. Sounds like it is working perfectly, right? Not necessarily.
Even if a model makes the right assumptions and employs the right techniques, it may lose effectiveness over time if changes in market conditions, customer behavior, or other factors are not monitored and incorporated. As noted in the Federal Reserve’s Supervisory Guidance on Model Risk Management (SR 11-7), one of the major sources of model risk is fundamental errors in the model itself, and these can occur at any point from design through implementation.
Fortunately, tests of model performance allow model developers, validators, and management to determine whether a model’s effectiveness has degraded over time and make changes before imprecise outputs lead to bad decisions.
Financial models are built using a wide range of techniques and methodologies, from a basic linear regression to black-box artificial intelligence (AI) and machine learning (ML). The best method to assess model performance will depend on the type and complexity of the model and its variables. The more common performance testing methodologies and indicators include:
Back testing assesses model viability by comparing actual historical results to predicted model results. One of the simpler performance tests, it can be used for a variety of model types. If the actual historical results of a process that is being modeled differ materially from the model’s output for the same period, then investigation is warranted. This requires testing the model’s assumptions and mechanisms to find potential causes of the inaccuracy.
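As a minimal sketch of the back test described above, the function below compares actual and predicted values period by period and flags any period where the relative error exceeds a tolerance. The function name `backtest`, the 5% tolerance, and the sample figures are all illustrative assumptions; a real materiality threshold would come from the institution's own model governance.

```python
def backtest(actual, predicted, tolerance=0.05):
    """Flag periods where the prediction misses the actual by more than `tolerance`.

    `tolerance` is a relative error (0.05 = 5%) and is an illustrative
    assumption, not a recommended threshold.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must cover the same periods")
    breaches = []
    for period, (a, p) in enumerate(zip(actual, predicted)):
        if a == 0:
            raise ValueError("relative error is undefined when the actual is zero")
        relative_error = abs(p - a) / abs(a)
        if relative_error > tolerance:
            breaches.append((period, a, p, relative_error))
    return breaches


# Made-up quarterly loss rates (in percent), for illustration only.
actual_losses = [2.0, 2.1, 2.4, 3.0]
predicted_losses = [2.05, 2.0, 2.1, 2.2]

for period, a, p, err in backtest(actual_losses, predicted_losses):
    print(f"period {period}: actual={a}, predicted={p}, relative error={err:.1%}")
```

Each flagged period is a prompt to investigate the model's assumptions and mechanics, not a verdict on the model by itself.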
Sensitivity analysis applies a small change to a key variable and measures how much the model output moves in response. A large change in output signals that the model is highly sensitive to that variable, and the model developer can take that into account when weighing assumptions and results.
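One way to picture this is to bump a single input by a small percentage and compare the percentage change in output to the percentage change in input. The sketch below does this for a toy loss model; `toy_loss_model`, its coefficients, and the input values are hypothetical and stand in for whatever model is being tested.

```python
def toy_loss_model(inputs):
    """Hypothetical expected-loss model: balance x probability of default x LGD."""
    probability_of_default = 0.01 + 0.004 * inputs["unemployment_rate"]
    return inputs["balance"] * probability_of_default * inputs["lgd"]


def sensitivity(model, base_inputs, variable, bump=0.01):
    """Return the % change in output per 1% change in `variable`."""
    base_output = model(base_inputs)
    shocked_inputs = dict(base_inputs)  # copy so the base case is untouched
    shocked_inputs[variable] = base_inputs[variable] * (1 + bump)
    output_change = (model(shocked_inputs) - base_output) / base_output
    return output_change / bump


# Made-up base case, for illustration only.
base_case = {"balance": 1_000_000.0, "unemployment_rate": 5.0, "lgd": 0.4}

for name in ("unemployment_rate", "lgd"):
    print(f"{name}: {sensitivity(toy_loss_model, base_case, name):.2f}")
```

A value near 1.0 means the output moves roughly in proportion to the input; values well above 1.0 would mark the variables that deserve the closest scrutiny.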
While sensitivity analysis uses small changes, stress testing involves multiple, extreme shocks to envision what could happen in a variety of “worst case scenarios.”
For example, if a bank is worried about having enough capital to withstand a major economic shock, it can run stress test scenarios that shock key variables like interest rates and unemployment. The results will help it understand which scenarios would still leave it with adequate capital in spite of extreme market conditions. Since it enables proper risk mitigation planning, stress testing is used regularly by financial regulators.
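The capital example above can be sketched as a handful of named scenarios run through a simple loss model. Everything here is assumed for illustration: the loss-rate coefficients, the balance sheet figures, the scenario shocks, and the 8% minimum ratio are hypothetical and are not regulatory values.

```python
MINIMUM_CAPITAL_RATIO = 0.08  # hypothetical internal threshold


def stressed_capital_ratio(capital, risk_weighted_assets, loan_book, scenario):
    """Capital ratio after losses implied by a scenario (toy linear loss model)."""
    loss_rate = (
        0.01
        + 0.005 * scenario["unemployment_change"]  # percentage-point rise
        + 0.002 * scenario["rate_change"]          # percentage-point rise
    )
    losses = loan_book * loss_rate
    return (capital - losses) / risk_weighted_assets


# Made-up scenarios: shocks are in percentage points.
scenarios = {
    "baseline": {"unemployment_change": 0.0, "rate_change": 0.0},
    "adverse": {"unemployment_change": 3.0, "rate_change": 2.0},
    "severely_adverse": {"unemployment_change": 7.0, "rate_change": 4.0},
}

results = {}
for name, scenario in scenarios.items():
    ratio = stressed_capital_ratio(120.0, 1000.0, 800.0, scenario)
    results[name] = {"ratio": ratio, "passes": ratio >= MINIMUM_CAPITAL_RATIO}
    print(f"{name}: ratio={ratio:.2%}, passes={results[name]['passes']}")
```

Unlike the sensitivity check, each scenario moves several variables at once and by large amounts, which is what distinguishes a stress test.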
Mean Absolute Percentage Error (MAPE)
MAPE measures the accuracy of a model as the average absolute difference between actual results and the results forecasted by the model, expressed as a percentage of the actual results. While useful, MAPE should not be the only reason to select one model over another. A higher or lower MAPE can often be explained and may not be as important as other factors.
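As a small illustration, MAPE can be computed as the mean of |actual - forecast| / |actual|, multiplied by 100. The function name `mape` and the sample figures are assumptions made for this sketch. Note that the measure is undefined when any actual value is zero, so the sketch raises an error in that case.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Made-up figures: the forecast misses by 10%, 5%, and 5% respectively.
actual_values = [100.0, 200.0, 400.0]
forecast_values = [110.0, 190.0, 380.0]

print(f"MAPE: {mape(actual_values, forecast_values):.2f}%")
```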
Multicollinearity occurs when multiple independent variables in a model are correlated with one another. Independent variables should be independent, or at least not strongly correlated; when they are correlated, it becomes difficult to know how each independent variable interacts with the dependent variable and what effect a change in each will have. Multicollinearity can be detected using variance inflation factors (VIF). The VIF score will tell you approximately how well each of your independent variables is explained by the other independent variables.
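A VIF for a given variable equals 1 / (1 - R²), where R² comes from regressing that variable on the other independent variables. Equivalently, the VIFs are the diagonal of the inverse of the correlation matrix, which is what the stdlib-only sketch below computes. The variable names and data are made up for illustration, and the helper names (`pearson`, `invert`, `vif`) are assumptions of this sketch; in practice a statistics library would normally be used instead.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Return {variable name: VIF} from the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Made-up data: `prime_rate` is almost a linear function of `policy_rate`.
predictors = {
    "policy_rate": [1, 2, 3, 4, 5, 6, 7, 8],
    "prime_rate": [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1],
    "unemployment": [5, 3, 6, 2, 7, 3, 6, 4],
}

vif_scores = vif(predictors)
for name, score in vif_scores.items():
    print(f"{name}: VIF = {score:.1f}")
```

A VIF close to 1 suggests a variable is largely unexplained by the others. Common rules of thumb treat values above roughly 5 or 10 as a sign of problematic multicollinearity, though the appropriate cutoff is a judgment call.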
One of the assumptions of linear regression modeling is homoscedasticity, i.e., the variance of the residual term is constant or nearly so. When heteroscedasticity (residual variance that changes across the range of a predictor or of the fitted values) is evident, a careful review of the model variables should follow.
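Heteroscedasticity is often easy to see on a graph because the residual errors will tend to “fan out” over time or across the range of fitted values. If heteroscedasticity is present in a regression model, it is a serious issue: the coefficient estimates remain unbiased, but the standard errors, and with them the significance tests and confidence intervals, become unreliable.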
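The "fanning out" pattern can be screened for numerically as well as visually. The sketch below is a rough check in the spirit of a Goldfeld-Quandt comparison, not a formal statistical test: it sorts residuals by fitted value and compares the mean squared residual in the top third to that in the bottom third. The function name `residual_spread_ratio` and the data are illustrative assumptions.

```python
def residual_spread_ratio(fitted, residuals):
    """Ratio of mean squared residuals: highest-fitted third vs lowest third.

    A ratio near 1 is consistent with constant variance; a ratio far above
    (or below) 1 suggests the spread changes with the fitted value.
    """
    if len(fitted) != len(residuals) or len(fitted) < 6:
        raise ValueError("need at least 6 paired observations")
    ordered = [r for _, r in sorted(zip(fitted, residuals))]
    group_size = len(ordered) // 3
    low, high = ordered[:group_size], ordered[-group_size:]
    low_ms = sum(r * r for r in low) / group_size
    high_ms = sum(r * r for r in high) / group_size
    if low_ms == 0:
        raise ValueError("lowest group has zero residual spread")
    return high_ms / low_ms


# Made-up residuals: one set fans out, the other has a steady spread.
fitted_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
fanning_residuals = [0.1, -0.1, 0.1, -0.3, 0.3, -0.3, 0.9, -0.9, 0.9]
steady_residuals = [0.2, -0.2, 0.2, -0.2, 0.2, -0.2, 0.2, -0.2, 0.2]

print(residual_spread_ratio(fitted_values, fanning_residuals))
print(residual_spread_ratio(fitted_values, steady_residuals))
```

A large ratio is a reason to plot the residuals and consider a formal test such as Breusch-Pagan, rather than a conclusion in its own right.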
The case of AI/ML models
If a model uses AI and ML techniques, its performance will not always be easy to tie to a particular variable or attribute. Most of the variable-level tests above are difficult to apply because the model combines its inputs through multiple layers of computation that are hard to parse individually.
A simple way to assess AI and ML model performance is to look at the model’s predictions and compare them to the actual outcomes. How many false positives are there? How many true negatives? This will help you identify potential issues even if the nuances underlying the methodology are not transparent.
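Counting the four outcome types gives a confusion matrix, from which simple rates such as precision and recall follow. The sketch below assumes binary labels coded as 1 (positive, for example "opened a credit card") and 0 (negative); the function name and the sample labels are made up for illustration.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary 0/1 labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["tp"] += 1
        elif p == 1 and a == 0:
            counts["fp"] += 1
        elif p == 0 and a == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


# Made-up outcomes and model predictions, for illustration only.
actual_labels = [1, 0, 0, 0, 1, 1, 1, 0]
predicted_labels = [1, 1, 0, 0, 1, 0, 1, 0]

counts = confusion_counts(actual_labels, predicted_labels)
precision = counts["tp"] / (counts["tp"] + counts["fp"])  # share of flagged cases that were right
recall = counts["tp"] / (counts["tp"] + counts["fn"])     # share of real positives that were caught
print(counts, f"precision={precision:.2f}", f"recall={recall:.2f}")
```

Tracking these counts over time can reveal degradation even when the model's internals are opaque.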
AI and ML models are typically used to forecast patterns, for example to predict whether someone will open a credit card or not. They are more usefully viewed as “trained” rather than “built.” The key is to examine the data on which they are trained and to retrain the models regularly so that they reflect their current context.
Model assessment is important both before and after launch—to proactively identify structural issues and quickly discern developing issues. Make sure your teams are conducting periodic performance tests and the results are within established thresholds. If not, escalate the issue.
Model performance will not always be based on something quantitative. If a model predicts outcomes that can be greatly affected by the regulatory environment, cultural trends, or economic markets, conduct ongoing research in those areas to determine if you need to adjust qualitative or quantitative model factors.
Thresholds are helpful, but don’t rely on one number to determine whether a model is underperforming. An unexpected result should only be the catalyst for further examination, not rash changes.
While model evaluation is technical, it is important to regularly share results with management. Subject matter experts should provide context so executives understand what they are seeing.
The performance testing methods outlined here provide a useful starting point. However, with the proliferation of new technology and advanced statistical methods, your team should consider alternative performance tests that more closely fit your model type or environment.
Whether you use established or emerging tests, focus on model performance, ongoing monitoring, and governance. To translate your efforts into business performance, you must review results on a timely basis and, most importantly, do so within a context informed by technical expertise and business judgment.