Consistency and Stability of Composite-Reliant Bank Failure Models Over Time

Stuart Paul

Major Professor: Carlos D Ramirez, PhD, Department of Economics

Committee Members: Lawrence H White, Garett B Jones

Online Location, Online
March 17, 2021, 01:00 PM to 03:00 PM


Early warning systems designed to indicate which banks are most at risk of failure have generally relied on relatively traditional statistical techniques, such as Logit and Probit models. The predictive power of these traditional techniques is low by modern standards, but the models are relatively interpretable, and the determinants of bank failure can be readily identified. Over the past 40 years, however, as computing power has improved, the techniques available for forecasting bank failures have become increasingly sophisticated. With increased sophistication and improved predictive ability has come decreased interpretability.

Chapter 1 considers the nature of this problem. The academic literature has embraced these modern techniques and has devoted much research time and ink to horse races between them. As a result, the ability of regulators and academics alike to identify the determinants of bank failure has decreased precipitously, and models are largely viewed as “black boxes”. Because machine learning models often rely on the creation of composite variables, the use of decision trees, and the selection of the subset of independent variables explaining the most variance in banks’ outcomes, a new question must be considered: can we at least expect these “black box” techniques to be consistently constructed over time?

Chapter 2 begins to address this question by examining how frequently the commonly used predictors of bank failure are grouped together to form the same composite variables. I conduct principal components analysis (PCA) each quarter on an expanding dataset intended to simulate the quarterly filing of bank call reports. The collinearity of the independent variables included in the analysis determines their assignment to composite variables, and each composite is interpretable as a reflective construct. I find that although the same sets of independent variables are commonly grouped together to form the same composite variables, the relative importance of each independent variable in the construction of each composite is much less consistent. Machine learning methods relying on composite variable construction may therefore be consistently constructed, but methods relying on variable subset selection are far less likely to be. Using one method for predictive purposes and another for interpretive purposes in tandem is likely to mislead those who rely on the models to direct bank examinations.
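The quarterly grouping exercise can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's code: it uses unrotated PCA on standardized predictors, assigns each predictor to the component on which its absolute loading is largest, and records, for every pair of predictors, the share of quarters in which they land on the same component. The helper names (`leading_loadings`, `co_assignment`, `pairing_frequency`, `simulate_quarters`) and the simulated data are hypothetical. Comparing pairs rather than component labels sidesteps the fact that component order and sign can change from one quarter to the next.

```python
import numpy as np

def leading_loadings(X, n_components):
    """Loadings of the first n_components principal components of standardized X."""
    # Standardizing each column makes this PCA on the predictors' correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[:n_components].T  # shape: (n_predictors, n_components)

def co_assignment(loadings):
    """1.0 where two predictors load most heavily on the same component, else 0.0."""
    # Assumption: a predictor belongs to the component with its largest |loading|.
    group = np.argmax(np.abs(loadings), axis=1)
    return (group[:, None] == group[None, :]).astype(float)

def pairing_frequency(quarters, n_components):
    """Share of quarters in which each pair of predictors shares a composite."""
    pooled, total = [], 0.0
    for X_q in quarters:
        pooled.append(X_q)  # expanding window: all reports filed so far
        loadings = leading_loadings(np.vstack(pooled), n_components)
        total = total + co_assignment(loadings)
    return total / len(quarters)

def simulate_quarters(n_quarters=8, banks=150, seed=0):
    """Hypothetical data: predictors 0-2 share one latent factor, 3-5 another."""
    rng = np.random.default_rng(seed)
    quarters = []
    for _ in range(n_quarters):
        f1, f2 = rng.standard_normal((2, banks))
        block1 = f1[:, None] + 0.3 * rng.standard_normal((banks, 3))
        block2 = f2[:, None] + 1.0 * rng.standard_normal((banks, 3))
        quarters.append(np.hstack([block1, block2]))
    return quarters

freq = pairing_frequency(simulate_quarters(), n_components=2)
print(np.round(freq, 2))
```

Tracking `np.abs(loadings)` within each group across quarters would give the analogous check on the relative importance of each predictor, which is the dimension the chapter finds far less stable.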

Chapter 3 seeks to explain the frequency with which the most commonly constructed composite variables are created. I estimate a correlation matrix for each quarter for which data are publicly available, once again using a dataset that expands to simulate the increasing availability of data as banks file new quarterly call reports. This process creates a panel of quarterly estimates of the correlation coefficient for each pair of independent variables. I find that the independent variables most commonly paired in the construction of composite variables have non-stationary pairwise correlations; the level of their correlation, and hence their grouping on a single composite variable, therefore cannot be predicted per se. Their average relative correlation, however, is typically high enough that they are still likely to contribute to the same composite variables over time.
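A sketch of the correlation panel and a stationarity check, assuming numpy only. The simple Dickey-Fuller regression below omits lagged differences (an augmented test such as `statsmodels.tsa.stattools.adfuller` would normally include them), and -2.86 is the standard asymptotic 5% critical value for the constant-only case. The specific unit-root test used in the dissertation is not stated here; the function names and the simulated data are hypothetical.

```python
import numpy as np

def correlation_panel(quarters):
    """Correlation matrix re-estimated on all data available through each quarter."""
    pooled, panel = [], []
    for X_q in quarters:
        pooled.append(X_q)  # expanding window of quarterly filings
        panel.append(np.corrcoef(np.vstack(pooled), rowvar=False))
    return np.array(panel)  # shape: (n_quarters, n_predictors, n_predictors)

def dickey_fuller_t(y):
    """t-statistic on rho in dy_t = alpha + rho * y_{t-1} + e_t (no lagged differences)."""
    y = np.asarray(y, dtype=float)
    dy, lag = np.diff(y), y[:-1]
    X = np.column_stack([np.ones_like(lag), lag])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])  # OLS residual variance
    se_rho = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se_rho

DF_CRITICAL_5PCT = -2.86  # asymptotic 5% value, constant-only case

# Hypothetical data: 40 quarters of 100 banks reporting 4 predictors.
rng = np.random.default_rng(1)
panel = correlation_panel([rng.standard_normal((100, 4)) for _ in range(40)])
series = panel[:, 0, 1]  # quarterly estimates of corr(predictor 0, predictor 1)
t_stat = dickey_fuller_t(series)
verdict = "cannot reject a unit root" if t_stat > DF_CRITICAL_5PCT else "reject a unit root"
print(f"DF t = {t_stat:.2f}: {verdict} at the 5% level")
```

Failing to reject a unit root for a pair's correlation series corresponds to the non-stationary case described above; each pair in `panel` would be tested in the same way.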

In Chapter 4, I examine the tradeoff between a relatively simple early warning system's predictive power and its interpretability by comparing a logistic regression model estimated using all available independent variables with a logistic regression model estimated using the principal components constructed in Chapter 2. The two specifications represent opposite sides of this potential tradeoff: the full model may gain predictive power, while the composite model gains interpretability. I find that when the Logit cutoff is set to achieve competitive specificity, using composite variables entails no loss of predictive power. Banking researchers can, therefore, jointly consider model performance and interpretability to a greater extent than is commonly believed. I then offer concluding remarks and policy recommendations.
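The comparison can be sketched as below. This is a hedged illustration, not the dissertation's specification: it assumes Newton-Raphson maximum likelihood for the logit, PCA fitted on a training half and applied to a held-out half, and a cutoff placed at the survivors' 90th-percentile predicted probability so that both models share 90% specificity before their sensitivities are compared. The simulated data, the 90% target, and all function names are hypothetical.

```python
import numpy as np

def fit_logit(X, y, n_iter=30):
    """Maximum-likelihood logit via Newton-Raphson; returns [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        hessian = X1.T @ (X1 * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hessian, X1.T @ (y - p))
    return beta

def predict(beta, X):
    """Fitted failure probabilities."""
    return 1.0 / (1.0 + np.exp(-(beta[0] + X @ beta[1:])))

def pca_transform(X_train, k):
    """Map data onto the first k principal components of the training data."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd((X_train - mu) / sd, full_matrices=False)
    return lambda X: ((X - mu) / sd) @ Vt[:k].T

def sensitivity_at_specificity(p, y, specificity):
    """Share of failures flagged when the cutoff leaves `specificity` of survivors unflagged."""
    cutoff = np.quantile(p[y == 0], specificity)
    return np.mean(p[y == 1] > cutoff), cutoff

# Hypothetical data: six predictors driven by two latent factors; failure depends on f1.
rng = np.random.default_rng(2)
n = 4000
f1, f2 = rng.standard_normal((2, n))
X = np.hstack([f1[:, None] + 0.3 * rng.standard_normal((n, 3)),
               f2[:, None] + 1.0 * rng.standard_normal((n, 3))])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-3.0 + 1.5 * f1)))).astype(float)
train, test = np.arange(n) < n // 2, np.arange(n) >= n // 2

beta_full = fit_logit(X[train], y[train])  # all predictors
to_pcs = pca_transform(X[train], k=2)
beta_pca = fit_logit(to_pcs(X[train]), y[train])  # composite variables only

for name, p in [("all predictors", predict(beta_full, X[test])),
                ("2 principal components", predict(beta_pca, to_pcs(X[test])))]:
    sens, cut = sensitivity_at_specificity(p, y[test], specificity=0.90)
    print(f"{name}: sensitivity {sens:.2f} at cutoff {cut:.3f}")
```

Fixing specificity first and then comparing sensitivity keeps the two models on the same footing, since a raw probability cutoff means something different for each specification.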

The results of this research program serve as a warning to financial regulators and others who expect to deploy early warning systems on a continual basis. It is reasonable to expect “black box” models built on composite variables to be consistently constructed; it is unreasonable, however, to claim that we can predict the composition of those composite variables or the relative importance of the independent variables used in their construction. We must therefore be wary of systems that use multiple machine learning techniques jointly, as the consistency of each model’s construction will differ. Lastly, the tradeoff between predictive power and interpretability in relatively simple early warning systems is not as severe as the bimodal state of the current literature would suggest.