What we can learn from stress testing for AI regulation

This work was done as part of the call for case studies to inform AI regulation. Thanks to Patrick Levermore for conversation and feedback on this project.

Bullet Point Summary

Stress tests assess if banks can withstand economic shocks without public bailouts
Fed, ECB, and BoE implemented stress tests after the 2008 crisis
Tests have quantitative and qualitative components
Tests assess solvency and liquidity
Evidence suggests tests restored confidence but hard to prove they prevent crises
Central bank independence was key to the force of tests
Credit rating agencies failed before the crisis, analogous to AI auditors
No race to the bottom between jurisdictions
Industry standards have heavily influenced regulation
Banks haven't substantially gamed the stress test system
The crisis was critical to implementing the stress test system

Executive Summary:

After the 2008 financial crisis, major central banks implemented stress testing regimes for banks to assess their ability to withstand economic shocks, and in particular to ensure that contagion of financial crisis could be prevented without recourse to public funds. The US Federal Reserve, European Central Bank, and Bank of England instituted stress tests to evaluate bank solvency and liquidity. The tests have quantitative components using economic models and scenarios as well as qualitative evaluations of risk management. There is evidence that the stress tests have restored confidence in banking systems, but it is difficult to conclusively demonstrate the tests prevent crises due to the rarity of crises.

There are a number of lessons we can take from this report for AI regulation. Firstly, it seems likely that good legislation will only be passed after a crisis has already happened. The Basel Accords provide some evidence against this - none of them was precipitated by the crisis - but they weren’t sufficiently strong to prevent the crisis and took many years to be passed and implemented. On the other hand, despite the well-documented and well-known ability of banks to engage in regulatory arbitrage, there has been no race to the bottom with respect to stress tests. On the contrary, there’s been an extraordinary proliferation of stress testing standards around the world, including in China, without any explicit coordination between central banks. The key factor behind their spread seems to be the initial success they had in the US at calming the financial markets, and potentially also simply their sound logic.

An important finding from this report is the very strong evidence for the persistence of private regulatory standards and of private safety and risk management practices. A key driver seems to be that adopting them reduces the workload for policymakers and provides a model that has already had success in the industry being regulated. The lesson for AI is that it’s highly plausible that lab-based governance practices will be adopted into law.

A finding particularly relevant for evals of AI systems is that keeping the specifics of evals secret from firms is important to prevent firms from gaming the system by designing their models to specifically pass the tests they know they’ll face. This appears to be accepted best practice with stress tests, and a key failure in the financial crisis seems to have been securities that were created in concert with credit rating agencies and thereby given good ratings by construction.

A potential problem that could arise from the regime of private firms acting as evaluators and auditors of AI models, particularly if the AI firms are paying the evaluating and auditing organisations, is that AI firms are able to use their market power to pressure the evaluators and auditors into giving less stringent tests. This dynamic occurred prior to the financial crisis, and there is a broad consensus in the literature that it played a part in why securities were incorrectly rated, which in turn played a part in the financial crisis itself.

Finally, the independence of central banks appears to have been important for two reasons - firstly it meant that they were able to respond quickly and drastically to the financial crisis without the need for new legislation to be approved. Secondly their independence, particularly in a US government context where central banks are structured much more similarly to Weberian bureaucracy than other parts of the US government with the exception of the military, meant that they had a lot more capacity and independence from lobbying to carry out stress tests than previous organisations tasked with carrying out stress tests.

This pattern was repeated in the EU where stress tests performed by the central bank were much more successful than initial stress tests that were not. All countries seemed to have converged on their central banks carrying out stress tests rather than other financial regulators they may have. Ideally, I think a US AI regulatory body would have the same structure as the Fed but this seems unlikely to be feasible. On the margin, this seems to strengthen the case for the US military playing a larger role in AI regulation.

Introduction:

Stress testing is a risk management practice in which regulators assess how a bank's balance sheet would respond to a hypothetical adverse economic scenario. Major central banks implemented macroprudential stress testing regimes for banks after the 2008 financial crisis revealed risks that existing bank risk models had failed to capture. The goal of the macroprudential stress testing regimes of the US Federal Reserve, European Central Bank, and Bank of England, amongst other major central banks, was to evaluate whether banks had sufficient capital and liquidity to avoid taxpayer-funded bailouts during crises. By testing banks concurrently, stress tests also aim to capture risks of contagion across financial institutions. Stress tests typically have quantitative components based on econometric models, as well as qualitative evaluations of risk management practices. Stress tests also parallel the regulatory role of credit rating agencies before the crisis, serving as a public regulatory function carried out by independent central banks rather than fully private companies.

An important piece of general background comes from Acemoglu et al. They model financial contagion as banks failing and, connected by debtor-creditor relations, this failure propagating through the network of banks leading to other banks failing. They find that densely connected networks are protected from small shocks by the density of the network because the liquidity in the network can be shifted around the network to protect all of an insolvent bank's creditors. However, if a shock is sufficiently large that liquidity can’t protect an insolvent bank's creditors then the density of the network means that the failure propagates throughout the network to a greater degree than if the network had been less connected.

They then show that there’s a financial stability negative externality - banks contract to be able to internalise the externality of banks failing on their neighbours but don’t contract to internalise the externalities for banks more than 1 step away from them - i.e. in a network with only 3 firms all the externalities of firms collapsing are internalised by all firms.

The implication of this is that there is a negative externality of the risk of the financial crisis in equilibrium without either a public or private effort at regulation to internalise the externality.
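The robust-yet-fragile intuition can be illustrated with a toy simulation. This is a minimal sketch of my own and not the model in Acemoglu et al: the loss-sharing rule, the equal capital buffers, the network sizes and the shock sizes are all simplifying assumptions chosen purely for illustration. A failed bank passes any loss above its buffer equally to its surviving creditors, and we compare a fully connected network with one split into two unconnected clusters.

```python
def complete_network(n):
    """Every bank is a creditor of every other bank."""
    return [[j for j in range(n) if j != i] for i in range(n)]


def clustered_network(n, cluster_size):
    """Banks only lend within their own cluster (a less connected system)."""
    return [
        [j for j in range(n) if j != i and j // cluster_size == i // cluster_size]
        for i in range(n)
    ]


def count_failures(creditors, shock, buffer, start=0):
    """Count failed banks after `shock` hits bank `start`.

    Toy rule (an assumption, not the paper's model): a bank fails when its
    accumulated loss exceeds `buffer`, and the excess is split equally among
    its creditors that have not yet failed.
    """
    loss = [0.0] * len(creditors)
    loss[start] = shock
    failed = set()
    to_check = [start]
    while to_check:
        bank = to_check.pop()
        if bank in failed or loss[bank] <= buffer:
            continue
        failed.add(bank)
        excess = loss[bank] - buffer
        alive = [c for c in creditors[bank] if c not in failed]
        for c in alive:
            loss[c] += excess / len(alive)
            to_check.append(c)
    return len(failed)


dense = complete_network(8)
sparse = clustered_network(8, cluster_size=4)

for shock in (5.0, 20.0):  # hypothetical "small" and "large" shocks
    print(
        f"shock={shock}:",
        "complete ->", count_failures(dense, shock, buffer=1.0),
        "clustered ->", count_failures(sparse, shock, buffer=1.0),
    )
```

With these made-up numbers the small shock topples one bank in the complete network and four in the clustered one, while the large shock topples all eight banks in the complete network but only the four in the affected cluster.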

How Stress Tests Work:

Federal Reserve Stress Testing

The Federal Reserve's stress testing regime has two main components:

Dodd-Frank Act Stress Tests (DFAST): Assess bank capital adequacy assuming dividends paid at current rates and no share buybacks. Conducted annually for banks with over $250 billion in assets, although initially the threshold was only $50 billion.
Comprehensive Capital Analysis and Review (CCAR): Banks submit capital plans including detailed dividend and buyback proposals. Plans are evaluated based on quantitative stress tests and qualitative review. CCAR occurs annually for banks with over $100 billion in assets.

Both DFAST and CCAR have quantitative and qualitative components:

Quantitative: Models using over 25 macroeconomic variables assess if banks remain solvent under adverse scenarios like 10% unemployment or equity market shocks. The Fed's models and scenarios are not disclosed to banks to prevent gaming.
Qualitative: Fed evaluates risk management, governance, internal controls, and capital planning. Banks submit detailed capital policies for review.

If banks fail either the quantitative or qualitative elements, they must resubmit less-risky capital plans. This acts as a binding constraint on bank capital distributions. CCAR in particular ties payouts directly to stress test results.
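To make the "binding constraint" point concrete, here is a minimal sketch of how a quantitative pass/fail check could limit payouts. It is not the Fed's model, which is undisclosed: the single scenario loss rate, the 4.5% minimum ratio and the example balance sheet are all hypothetical assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Bank:
    name: str
    capital: float               # loss-absorbing capital, in $bn
    risk_weighted_assets: float  # in $bn
    planned_payouts: float       # dividends plus buybacks in the capital plan, in $bn


def post_stress_ratio(bank: Bank, loss_rate: float) -> float:
    """Capital ratio after scenario losses and the planned payouts."""
    losses = bank.risk_weighted_assets * loss_rate
    remaining = bank.capital - losses - bank.planned_payouts
    return remaining / bank.risk_weighted_assets


def passes(bank: Bank, loss_rate: float, minimum_ratio: float = 0.045) -> bool:
    """True if the bank stays at or above the (assumed) minimum ratio."""
    return post_stress_ratio(bank, loss_rate) >= minimum_ratio


def max_allowed_payout(bank: Bank, loss_rate: float, minimum_ratio: float = 0.045) -> float:
    """Largest payout that still leaves the bank at the minimum ratio."""
    losses = bank.risk_weighted_assets * loss_rate
    headroom = bank.capital - losses - minimum_ratio * bank.risk_weighted_assets
    return max(0.0, headroom)


# Hypothetical bank and scenario: 5% of risk-weighted assets lost under stress.
example = Bank("Example Bank", capital=12.0, risk_weighted_assets=100.0, planned_payouts=3.0)
print(passes(example, loss_rate=0.05))
print(max_allowed_payout(example, loss_rate=0.05))
```

In this sketch a bank that fails would have to resubmit a plan with payouts at or below the headroom figure, which is the sense in which the test constrains capital distributions.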

ECB Stress Testing

The ECB's stress testing also has quantitative and qualitative elements:

Quantitative: Banks across the euro area are modelled under hypothetical three-year adverse scenarios provided by the ECB. Similar to the Fed, scenarios cover shocks to economic growth, unemployment, interest rates, and asset prices.
Qualitative: Assesses bank risk management
Unlike the Fed, the ECB initially didn’t include liquidity stress tests as standard, although this has now changed.

Their stress tests covered over 99 banks in 2023. If capital falls below required levels, banks are legally required to recapitalise.

Similarly to the Fed, the ECB uses models it keeps private to evaluate banks, often requiring banks to plan for more severe crisis scenarios than those required by the banks' internal stress tests.

Bank of England stress tests

Bank of England stress tests are similar to the Fed’s and ECB’s but include an explicitly anti-cyclical element where stress scenarios are designed to be more stringent during “peacetime” and less stringent when demand is weak.
The Bank of England only began regular macroprudential stress tests in 2014, significantly later than either the Fed or the ECB. My speculation is that this is because the British financial system was not subject to the same levels of stress as either the European or US banking systems, meaning stress tests were started explicitly as macroprudential measures rather than as a way to try to restore confidence in the banking system, as they were in the US and EU.

How Effective Are Stress Tests?

There are conceptual challenges in assessing stress test efficacy:

- No crises during the regime to test performance

- In practice, the observable outcome is just a continuation of the status quo of no bank failures

- The Acemoglu results suggest that large shocks that would cause failure contagion in the highly interconnected global financial system happen rarely and that small shocks won’t cause contagion

Some positive evidence:

- Market reactions to stress test results

- Banks recapitalizing after failing the stress tests

- Restored confidence in the banking system

Restoring confidence in the banking system

There is a strong consensus that the US stress tests in 2009 were critical to restoring confidence in the banking system, with then-Fed chair Ben Bernanke describing them as a turning point in the financial crisis. Similar, though less glowing, assessments were made of the 2014 ECB stress tests in conjunction with the asset quality review. Large banks failed both stress tests and were subsequently forced to recapitalise.

The significance of the restoration of confidence in the banking system is that it is a measurable positive effect of stress tests that can be assessed on the relatively short timescale we have available for judging their efficacy.

There is also statistical evidence for the success of stress tests. Morgan et al and Georgescu et al look at the evidence for whether stress tests provided new information to markets for the 2009 US stress test based on whether returns for firms were statistically different from normal. In both cases, the identifying assumption is that the timing of the announcement of the stress test results on a specific date was exogenous to the share price. I find this identifying assumption plausible - it seems like there was essentially no other event on the day of the announcement that could have affected stock prices to a similar degree to the announcement. However, neither paper uses techniques such as placebo tests to attempt to verify the identifying assumptions.

Both Morgan et al and Georgescu et al find that the announcement of the results produced abnormal returns, implying that the stress tests provided information for the markets. It is notable that this finding held for both the crisis stress tests of 2009 and 2014 and the start of the EU macroprudential tests in 2016. This provides some evidence that stress tests do have some positive effects, although it of course doesn’t demonstrate that they pass the cost-benefit test.
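For readers unfamiliar with the method, an abnormal return is the gap between a stock's actual return on the announcement day and the return a simple model would have predicted. The sketch below uses a generic market-model event study, which I am assuming as a stand-in and not claiming to be the exact specification in either paper, and the return series are made-up numbers.

```python
from statistics import mean, stdev


def market_model(stock_returns, market_returns):
    """Ordinary least squares fit of stock = alpha + beta * market."""
    mx, my = mean(market_returns), mean(stock_returns)
    beta = sum(
        (x - mx) * (y - my) for x, y in zip(market_returns, stock_returns)
    ) / sum((x - mx) ** 2 for x in market_returns)
    alpha = my - beta * mx
    return alpha, beta


def abnormal_return(stock_est, market_est, stock_event, market_event):
    """Abnormal return on the event day and a rough t-statistic.

    The t-statistic divides by the sample standard deviation of the
    estimation-window residuals, a simplification of a full event study.
    """
    alpha, beta = market_model(stock_est, market_est)
    residuals = [y - (alpha + beta * x) for x, y in zip(market_est, stock_est)]
    ar = stock_event - (alpha + beta * market_event)
    return ar, ar / stdev(residuals)


# Hypothetical daily returns in the estimation window before the announcement.
market = [0.010, -0.020, 0.005, 0.015, -0.010, 0.000, 0.020, -0.005]
bank = [0.013, -0.025, 0.008, 0.016, -0.011, -0.001, 0.024, -0.006]

# Hypothetical announcement day: the bank jumps 8% while the market rises 1%.
ar, t_stat = abnormal_return(bank, market, stock_event=0.08, market_event=0.01)
print(f"abnormal return: {ar:.3f}, t-statistic: {t_stat:.1f}")
```

A large t-statistic here only says the announcement-day return was unusual relative to the estimation window. Whether the announcement caused it still rests on the identifying assumption discussed above.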

Evidence from the 2023 banking crisis

Silicon Valley Bank Failure

The 2023 failure of Silicon Valley Bank (SVB) highlights challenges in regulating fast-growing regional banks. SVB expanded rapidly from a niche lender to tech startups into a mid-sized bank with inadequate risk management. When interest rates rose in 2022, SVB took large losses on its bond portfolio due to inadequate hedging. This sparked a run on uninsured deposits exacerbated by social media.

SVB was regulated by the San Francisco Fed with less stringent rules than systemic banks. The Fed review blamed poor oversight during its growth and disruption from COVID-19. SVB’s management incentives rewarding risk-taking without balancing risks also played a role.

On balance, this seems like weak evidence against the efficacy of stress tests. SVB seemed, all things considered, a well-run bank albeit with poor risk management policies, and certainly was solvent. It therefore seems that stress tests should have caught the failure to hedge against interest rate increases. However, there wasn’t substantial contagion as a result of the failure (although SVB was still a small bank) and most of the failures weren’t on the part of the stress tests aimed at the structurally important banks where most of the focus of stress tests lies.

See the appendix for more details on the collapse of SVB.

Credit Suisse Failure

The failure of Credit Suisse in 2023 after large losses shows that orderly resolution is possible for global systemically important banks. Credit Suisse had become undercapitalized after scandals and bad investments. Declining profits led to shareholder dilution, deposit outflows, and debt concerns. The collapse of SVB further strained finances.

UBS acquired Credit Suisse with significant losses forced onto shareholders and contingent convertible (CoCo) bondholders. No public bailout was required. Regulations since 2008 enabled the absorption of losses and transfer of assets outside of insolvency. This orderly wind-down of a major global bank suggests that post-crisis reforms made the financial system more resilient, including the macroprudential stress testing regime.

The converse of this result is that the stress tests didn’t lead to Credit Suisse being capitalised to the degree that it could withstand the shock from the failure of SVB. I mostly don’t think that this is an especially strong critique - Credit Suisse was a poorly managed bank and it was correct that it was taken over by a better-managed bank and good that this could be achieved without contagion. When setting the stress test requirements there must be acceptance of both false negatives and false positives - there is nothing in the failure of Credit Suisse that suggests to me that the ECB stress tests weren’t reasonably close to the Pareto frontier of the false positive and false negative rate.

The difficulty of course in evaluating the degree to which stress tests successfully prevented contagion following the failure of Credit Suisse comes from the Acemoglu result - it is unclear if the failure would have been large enough to cause contagion. Additionally, Credit Suisse - like Bear Stearns and unlike Lehman Brothers - didn’t fail but was bought out, in this case by UBS. The experience of the financial crisis suggests that a failing firm being bought out can be sufficient to prevent a crisis where, had the firm failed, a financial crisis would have followed.

See further details of the collapse of Credit Suisse in the appendix.

Key Findings

Central Bank Independence Critical

The independence of central banks from political pressure was critical for implementing credible stress tests. The Federal Reserve has statutory independence that shields it from industry regulatory capture and political interventions, unlike agencies like OFHEO that regulated Fannie Mae and Freddie Mac with inadequate stringency. This independence meant the Fed could take an aggressive supervisory stance despite industry opposition. The Fed's stress tests were more extensive than international Basel II standards or pre-crisis bank regulations. Without the Fed's political autonomy, the stress testing regime likely would have faced greater industry lobbying or legislative opposition. The evidence comes from contrasting the Fed's actions with OFHEO's weak oversight of Fannie Mae and Freddie Mac, which succumbed to regulatory capture.

Scott et al show that the OFHEO stress tests had three key flaws that meant that, despite Fannie Mae and Freddie Mac passing the stress tests with flying colours, they needed $191bn in bailout money in 2008 to stay solvent. Firstly, the model was calibrated using the 1993 data that was available when it was first created and was never updated afterwards. The authors argue that this is due to the requirement on OFHEO, stemming from stipulations in the original legislation, to fully inform firms of updates to the model, rather than having the authority to update it unilaterally. Secondly, the model just didn’t consider sufficiently adverse scenarios. The most severe scenario in the stress test had US house prices falling 13% while the actual fall in the financial crisis was 18%. It’s not clear though ex-ante that it was a mistake for the stress tests not to include more adverse scenarios, since there is an unavoidable tradeoff between false positives and false negatives in setting the upper threshold on the adverse scenarios that Fannie Mae and Freddie Mac were forced to consider. Finally, the model didn’t consider the effects on future revenues in addition to the effects on the return on assets Fannie Mae and Freddie Mac already owned following an adverse shock to the housing market.

The Fed’s own review of the failings that led to the collapse of SVB provides additional evidence for the importance of independence in achieving good outcomes. In their report, they identified pressure by the Trump administration to reduce the regulatory burden on banks as a reason why stress tests on SVB weren’t carried out to the degree that would have prevented the collapse, in addition to concerns from their political masters about whether stress tests were violating due process. It’s unclear to me how far this should be discounted on account of it being in the Fed’s corporate interest to assert its independence.

Prior to the more successful stress tests in 2014, the EU, via a regulatory agency rather than the ECB, carried out multiple stress tests that completely failed to identify weaknesses in banks that failed soon after the tests were carried out. It’s very hard to assess the causal effect of the ECB taking over the 2014 stress tests, which were much more successful than previous stress tests judged by market reaction and false negative rate, but it is at least suggestive of a continuation in the pattern of independent central banks carrying out stress tests better than other organisations.

This example relates to competence rather than decisions being distorted by political interference or hamstrung by a lack of operational independence.

Both explanations for the importance of independence are supported by much broader literature on the importance of the independence of state institutions for the development of state capacity. See the appendix on state capacity for more details. This is particularly pertinent in the US context where the only two state institutions staffed at the senior levels by professional civil servants rather than political appointees who are subject to cheap removal by the President are the Fed and the US military. The US is very unusual amongst rich countries in this respect.

Credit Rating Agencies Analogue

There are analogies between the credit ratings agency regime before the crisis and the emerging paradigm of independent organisations carrying out AI auditing and evaluations.

Credit ratings agencies arose in the early 20th century and their ratings were progressively given legal authority. The Securities and Exchange Act of 1934 gave official roles to designated rating agencies. The 1975 NRSRO rules required certain regulated institutional investors to only hold securities rated investment grade by approved rating agencies.

This codification into law gave huge weight to the judgments of a small number of private agencies. However, flawed incentives and inadequate risk models led rating agencies to systematically underestimate the risks of mortgage-backed securities. Only a few large banks engineered these complex securities, and they were the issuers paying rating agencies to evaluate them. This concentrated market power meant rating agencies had skewed incentives towards pleasing large issuers. Their inaccurate ratings on mortgage-backed securities and related derivatives contributed to the 2008 crisis.

Perhaps the most worrying analogy between the role that rating agencies played in the financial crisis and the emerging paradigm in AI auditing and evaluations is the potential similarity between credit rating agencies playing a role in designing securities that they later rate and AI firms knowing in advance the sorts of tests that their models will face. Rating agencies played a role in designing the securities they later rated, meaning that by construction those securities achieved high ratings. A similar problem could arise where AI firms construct their models to get safe ratings from the auditors, which they wouldn’t have been able to do had they not known the structure of the tests in advance. Central banks seem to have thought along similar lines by structuring stress tests to have novel scenarios every year, and the Fed by keeping its models private.

See the appendix for further details on the role of rating agencies in the financial crisis.

No International Race to Bottom

Despite the risks of international regulatory arbitrage, there is no evidence it affected the adoption of bank stress tests. All major jurisdictions implemented stress tests following the 2008 financial crisis. Critically, there has been no international coordination around the implementation of stress tests.

There is substantial qualitative and empirical evidence that financial institutions engage in regulatory arbitrage in other contexts. For example, US money market funds faced less regulation than banks so grew substantially as an alternative investment vehicle. However, this dynamic has thus far not been replicated in response to post-crisis stress tests. The widespread voluntary adoption shows central banks were relatively unconcerned by regulatory competition undercutting the efficacy of their standards.

There is also substantial statistical evidence that banks engage in regulatory arbitrage. Houston et al test for regulatory arbitrage using a variety of methods and with a variety of identification strategies. Fundamentally, it’s very difficult to do causal inference in this area because there aren’t good natural experiments where there are plausibly exogenous changes in financial regulation. Furthermore, it’s difficult to rule out reverse causation - it is difficult to show that banks don’t, for instance, lobby harder for softer financial regulation in places where they have more assets.

Houston et al though put together a battery of evidence that consistently shows meaningful, but not overwhelmingly large, effects of the severity of financial regulation on where banks hold their assets. None of their instruments is especially convincing - they use the Gini coefficient, years since independence, and models of regulatory contagion amongst others - but they consistently point to there being an effect. In sum, I think this provides some evidence that there is cross-jurisdiction regulatory arbitrage, but the effect sizes aren’t extremely large.

Clark and Ehbrahiem look at regulatory arbitrage specifically by seeing if firms moved risk into “operational risk”, which wasn’t covered by the Basel I capital accords. Operational risk is essentially a catch-all term covering risks from human error and legal issues. They find that banks that are more constrained by regulation in the risks they take take on more operational risk. Theory suggests that banks closer to the somewhat arbitrary leverage limits imposed on them by regulation would, if they weren’t constrained by regulation, take on more risk in exchange for higher expected returns.

Clark and Ehbrahiem don’t establish more than an associative relationship and also don’t test if this effect goes away after Basel II is implemented. They do however conduct a placebo test which supports their results. Specifically, they test whether operational risk actually being realised has the same effect which they find that it doesn’t. This is evidence against the possibility that the relationship between operational risk and distance-to-leverage ratio minimums is being driven by something other than an optimisation decision on the part of banks.

The coordination of central banks around stress tests - which are quite costly for the regulated banks - requires explanation. The Acemoglu et al results establish that there is a financial crisis externality - the costs of a large financial crisis are not contained to the country in which it occurs because there is contagion, as happened in the 2008 financial crisis, the 1997 Asian financial crisis and the Great Depression. Furthermore, it’s also been established that banks do move assets to places where there’s less stringent regulation, as well as finding other ways for the regulation not to apply. This seems to create a prisoner's dilemma where each central bank is incentivized to defect (at least to a degree - lots of the costs of a financial crisis are internalised, particularly in less than perfectly connected financial systems) by having weak or absent stress tests. Instead, we observe the proliferation of stress tests and the tests in fact getting stronger over time, for instance with the ECB’s addition of stress tests that focus on liquidity.

At least part of the answer is provided explicitly by Alex Brazier, at the time a senior member of staff at the BoE, in an article in a special issue on stress testing - the 2009 US stress tests were judged to be extremely successful and other central banks wanted to emulate that success.

It’s notable that nowhere in the literature on stress tests, including that written by central bankers, was the issue of regulatory arbitrage discussed. This suggests to me that a cultural explanation is important - engaging in this kind of competition between jurisdictions isn’t in the mandate or culture of central banks and so it isn’t considered as an action. I speculate that the effect sizes on the movement of assets in response to financial regulation aren’t especially large, meaning that the costs of increased financial regulation aren’t especially high.

Overall though this seems to be an important avenue for future research.

A final point to consider is the possibility that stress tests actually aren’t costly for the large banks that are most able to influence regulation, because they create greater barriers to entry for smaller banks and therefore protect the large banks' oligopoly profits. I think this is unlikely to be the explanation. US banks aggressively and successfully lobbied for less regulation throughout the last quarter of the 20th century. From this point, finance salaries diverged from non-finance salaries. This strongly suggests to me that the straightforward story where stricter financial regulation hurts large banks is correct.

Industry Standards Influenced Regulations

Stress tests emerged in the early 1990s in banks and were adopted very quickly by Basel II. Stress tests were pioneered to measure mortgage credit risks more accurately, since the mortgage market was unusually exposed to macroeconomic shocks like increases in the unemployment rate. The practices diffused through the banking sector during the 1990s and 2000s. Following the crisis, central bank stress test regulations built directly on these industry testing norms as well as on the specific techniques used. This created continuity between existing private practices and public oversight, facilitating adoption. Current stress tests make extensive use of banks' internal stress testing models, for instance.

Credit rating agencies are another example of industry standards influencing regulation, both in terms of the ratings of agencies becoming law and in terms of subsequent regulation of rating agencies coming out of industry standards. The wholesale nature of the adoption of ratings agencies' ratings in 1975 suggests to me that the driving force behind their adoption was that it substantially reduced the workload for the US government. It seems unlikely to me that this emerged as a result of rating agency lobbying. Firstly, I’ve found no discussion of this in the literature. Secondly, rating agencies are quite small organisations in comparison to the financial organisations that they became de facto regulators of. Moody’s, the largest of the agencies, has only 3000 employees as of 2023 in comparison to Goldman’s 48,500, also as of 2023. This, in conjunction with the soft power that large financial institutions exert via the revolving door between them and the US Treasury and Fed, makes it seem very unlikely to me that the rating agencies would be able to exert more political power than those firms in the area of their core interests.

Rating agencies became more heavily regulated in the US in the wake of the dot com crash and in the EU in the wake of the financial crisis. In both cases the regulation was based on the IOSCO code, ratings agencies' own international standards.

Banks Largely Unable to Game System

While there is minor evidence of stress test “gaming,” the regime remains binding on bank capital and lending overall.

Cornett et al use a regression discontinuity design based around the cutoff of initially $100bn and later $50bn in assets for the Fed’s stress tests of systemically important banks to assess changes in bank behaviour around stress tests.

They find that banks subject to the stress tests for systemically important banks reduce their dividends and share buybacks in the quarter before the stress tests to a level below those of banks not under the stress testing regime, while in other quarters their dividends and share buybacks are higher than those other banks' - this suggests some gaming of the system.

They don’t defend their identification strategy by testing either whether there is in fact something structurally different about banks with assets just above or below the $50bn or $100bn cutoffs or (more plausibly) whether banks intentionally keep their asset levels below one of the cutoffs to avoid the regulatory burden.

I don’t think either of those challenges to the identification strategy is particularly dire - Cornett et al look only at stress tests up to 2013, and the stress tests only began regularly in 2011, which makes it seem unlikely to me that banks had enough time to implement a strategy to keep their assets below either of the cutoffs.
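The logic of a regression discontinuity estimate, and of the bunching concern raised above, can be sketched in a few lines. This is a generic illustration under my own assumptions and not Cornett et al's specification: the local linear fit, the bandwidth, the $50bn cutoff used in the example and the data points are all hypothetical.

```python
def fit_line(xs, ys):
    """Least squares line through the points; returns (intercept, slope)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope


def rd_estimate(assets, outcome, cutoff, bandwidth):
    """Jump in the outcome at the cutoff, from separate local linear fits."""
    below = [(a - cutoff, y) for a, y in zip(assets, outcome)
             if cutoff - bandwidth <= a < cutoff]
    above = [(a - cutoff, y) for a, y in zip(assets, outcome)
             if cutoff <= a <= cutoff + bandwidth]
    intercept_below, _ = fit_line(*zip(*below))
    intercept_above, _ = fit_line(*zip(*above))
    return intercept_above - intercept_below


def bunching_counts(assets, cutoff, window):
    """Crude manipulation check: number of banks just below vs just above."""
    just_below = sum(cutoff - window <= a < cutoff for a in assets)
    just_above = sum(cutoff <= a < cutoff + window for a in assets)
    return just_below, just_above


# Hypothetical banks: assets in $bn and payout ratio in the pre-test quarter.
assets = [40, 42, 44, 46, 48, 50, 52, 54, 56, 58]
payout = [0.290, 0.292, 0.294, 0.296, 0.298, 0.250, 0.252, 0.254, 0.256, 0.258]

print(rd_estimate(assets, payout, cutoff=50, bandwidth=10))
print(bunching_counts(assets, cutoff=50, window=4))
```

If many more banks sat just below the cutoff than just above it, that would suggest banks were managing their size to avoid the tests, which would undermine the comparison. A formal density test would be needed to say this with any confidence.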

They also find that firms crossing above the systemically important bank stress test cutoff increase their spending on lobbying. The authors cite other literature that shows that lobbying spending by firms in financial services is associated with a lower regulatory burden, all else equal. However, they don’t demonstrate in the paper that in this specific case firms which spent more on lobbying were able to reduce their regulatory burden.

However, Schneider et al. find no evidence that large banks face less stringent stress tests than other banks. They test this by analyzing whether large banks are more likely to pass CCAR tests given their quantitative performance on the Fed's undisclosed stress test models. The authors find no advantage for large banks in CCAR pass rates, suggesting political or regulatory capture did not affect outcomes.

This paper doesn’t have an especially robust identification strategy for estimating whether large banks were in fact treated more or equally stringently than other banks relative to the risk level that they would be perceived to be at had they been a non-large bank. It’s perfectly conceivable that prior to the financial crisis these firms were extremely overleveraged relative to the systemic risk that they pose due to size, such that the substantial decrease in capital payouts and leverage ratios post-financial crisis doesn’t cover the difference between these firms and the others.

Large banks have been unsuccessful in avoiding the US stress tests. At a high level, all of the largest banks did have to undergo stress tests while various smaller banks didn’t, and banks were semi-regularly found to be undercapitalised and forced to raise more capital.

Schneider et al test formally whether large banks were able to engage in regulatory capture by testing whether the large banks had less severe stress tests compared to other firms. They find that across a wide variety of measures larger and more connected banks face harsher stress tests - they find that these firms are more likely to fail stress tests given how well the firms performed on the Fed’s private model, implying that either those banks had more aggressive levels of dividend payouts and share buybacks relative to profitability given leverage, or they were more likely to fail the qualitative section of the stress tests. Furthermore, they also find that, as a percentage of total assets, large banks have lower payouts and that

On a more macro level, large banks have been forced to recapitalise, markets have reacted extremely negatively to firms failing stress tests, and the threat of banning firms from rewarding shareholders with either dividends or share buybacks has been carried out. A notable example of this followed Citi failing the 2014 US stress tests, with their market cap dropping 6% as a result. I find this high-level evidence quite compelling - it demonstrates that the stress tests were strenuous and that systemically important banks did fail them sometimes and had to bear the costs.

Finally, Coombs provides qualitative evidence that stress tests are having the desired effect. He interviews employees involved in the process at both the BoE and banks subject to stress tests. He finds in particular that the specific scenarios are taken seriously by banks and that their responses really are tailored to the specific scenario the BoE has constructed.

The crisis was critical to the adoption

The 2008 financial crisis was a crucial catalyst motivating the adoption of tougher bank stress testing. Gradual phase-in of stress tests had been occurring under the international Basel II accord, but although negotiations for the accord started in 1999 and finished in 2004, by 2008 Basel II was still in the process of being implemented. By contrast, the Fed and ECB responded rapidly to the post-crisis need to restore confidence, with the first US-wide stress tests in 2009. The crisis shattered the deregulation paradigm that had ruled prior to 2008 and provided the impetus for new regulation.

In general, policy takes months or years to pass through Congress and often comes out of many years of work in academia and think tanks. However, the first stress tests were used by the Fed in 2009 in the midst of the crisis, having previously been a quite marginal tool in the central bankers' financial regulation toolbox. This is much faster than such a large piece of legislation would normally take to pass and be implemented, and was driven by the crisis.

The public and political support for harsh regulation also seems important and contingent on the financial crisis. The prior 20 years of US financial regulation policy had been, all things considered, driven by a conviction that markets could regulate themselves, culminating in the elimination of the separation between investment and commercial banking in 1999. The financial crisis shattered the perception of the efficacy of private regulation and seems likely to me to have played a role in the severity of the (although not punitive or retributive) stress testing regime that was adopted.

A caveat to this is that there had already been a shift towards a more pro-regulatory stance by the Fed following the dot com crash, and in particular after the high priest of deregulation, Alan Greenspan, ended his tenure as Fed Chair in 2006 and was replaced by the more pro-regulation Ben Bernanke.

This is part of a broader pattern in financial regulation. The Glass-Steagall Act, the centrepiece of US financial regulation that established the FDIC and separated commercial and investment banking activities, was enacted in 1933 in the wake of the Great Depression. The Fed itself was established in 1913 as a lender of last resort in response to an earlier crisis, the Panic of 1907, a policy that eventually became standard practice.

As noted previously, the US adopted more stringent regulations for rating agencies following their perceived failure over the dotcom crash in 2000, and the EU only adopted more stringent regulations after the financial crisis.

Failure of SVB appendix

The Fed’s internal assessment of what went wrong at SVB places the blame on poor incentives for management combined with a relatively basic failure to hedge against interest rate increases.

Mechanistically, SVB bought US Treasuries while interest rates were low. Interest rates then rose dramatically in 2022 and early 2023, meaning that the Treasuries bought by SVB lost value, and when SVB sold those bonds the losses were realised. This sparked a panic about SVB’s solvency, which was heightened by the great majority of SVB’s deposits being uninsured. What followed was a classic bank run exacerbated by the speed at which information spreads on Twitter.
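The mechanism can be sketched with a simple bond pricing calculation. The Python below is an illustration under stated assumptions, not a reconstruction of SVB’s actual portfolio: the function name bond_price, the 10-year maturity, the 1.5% coupon and the 4% market yield are all hypothetical. It discounts a fixed-coupon bond's cash flows at the prevailing yield, showing that a bond bought at par when yields were low is worth less once yields rise.

```python
def bond_price(face, coupon_rate, market_yield, years):
    """Present value of an annual-coupon bond discounted at market_yield."""
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1 + market_yield) ** t
                     for t in range(1, years + 1))
    pv_face = face / (1 + market_yield) ** years
    return pv_coupons + pv_face


# Hypothetical bond: 10 years, 1.5% coupon, bought when yields were also 1.5%.
price_at_purchase = bond_price(100, 0.015, 0.015, 10)
# The same bond revalued after market yields rise to a hypothetical 4%.
price_after_rise = bond_price(100, 0.015, 0.04, 10)
loss_pct = 100 * (price_at_purchase - price_after_rise) / price_at_purchase

print(f"price at purchase: {price_at_purchase:.2f}")
print(f"price after yields rise: {price_after_rise:.2f}")
print(f"mark-to-market loss: {loss_pct:.1f}%")
```

The loss stays on paper for as long as the bond is held to maturity; it is only realised when the bond has to be sold before then.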

At a higher level, SVB’s compensation structure for management incentivised imprudent risk-taking. SVB managers' pay was tied to performance but not to risk-adjusted performance. There’s much less evidence that this played a part in the causal chain for SVB’s collapse - only the Fed’s report has highlighted this as a factor amongst the literature I’ve read, and more seriously there’s no statistical evidence for this theory, nor have I seen any qualitative evidence such as interviews with SVB managers.

The Fed assessed its own failures as stemming from: the very rapid growth of SVB; the disruption from Covid-19; and policymaking during the Trump administration that pressured the Fed to be less strict in its banking regulation.

SVB grew extremely rapidly over the 2010s. It started the decade as a small regional bank, where stress testing was conducted by the SF Fed and the statutory requirements for liquidity, solvency and risk management practices were lower than for systemically important banks. SVB passed the $100bn asset mark in 2020, having only passed $50bn in 2017.

At $100bn the standards used to assess banks are changed and the team that assesses them changes. In the Fed’s own assessment this, in addition to the disruption caused by Covid-19, meant that stress tests carried out on SVB were lower quality than they would otherwise have been.

Failure of Credit Suisse appendix

Credit Suisse failed following the distress of the US banking system in 2023, but this was only the straw that broke the camel’s back.

In a sense, Credit Suisse was a success of the post-financial crisis regulatory regime. Credit Suisse was a poorly managed bank that made bad investments and had a large number of scandals. As a result, Credit Suisse was becoming progressively undercapitalised as its share price dropped and clients pulled their funds out of the bank. The role of the collapse of SVB, and more generally the stress on the US financial system, in the failure of Credit Suisse was that it made investors more reluctant to have capital tied to a risky institution. The critical event was the refusal of the Saudi National Bank - Credit Suisse’s largest shareholder - to inject any more capital into the bank.

There was no contagion following the failure of the bank, even though Credit Suisse was a very large, systemically important bank with assets of around 70% of Swiss GDP.

The failure was resolved via the sale of Credit Suisse to UBS and the use of the “bail-in” bonds - CoCos - pioneered after the financial crisis. CoCos are high-yield bonds which, in exchange for the high yield they earn, aren’t senior to equity, unlike other bonds issued by banks. Credit Suisse made use of this - specifically, the highest yield CoCos were entirely written off - to reduce its liabilities and facilitate its sale to UBS.
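The balance sheet effect of writing off CoCos can be illustrated with a toy example. This is a sketch with hypothetical round numbers, not Credit Suisse’s actual accounts, and the class and function names are my own. Because equity is whatever is left after liabilities are subtracted from assets, cancelling the CoCos removes a liability and raises equity by the same amount without any new money being injected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BalanceSheet:
    assets: float
    senior_liabilities: float  # deposits and ordinary bonds
    cocos: float               # bail-in bonds that can be written off

    @property
    def liabilities(self) -> float:
        return self.senior_liabilities + self.cocos

    @property
    def equity(self) -> float:
        return self.assets - self.liabilities


def write_off_cocos(sheet: BalanceSheet, fraction: float = 1.0) -> BalanceSheet:
    """Return a new balance sheet with the given fraction of CoCos cancelled."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return BalanceSheet(
        assets=sheet.assets,
        senior_liabilities=sheet.senior_liabilities,
        cocos=sheet.cocos * (1.0 - fraction),
    )


# Hypothetical bank (figures in billions, invented for illustration).
before = BalanceSheet(assets=500.0, senior_liabilities=480.0, cocos=15.0)
after = write_off_cocos(before)  # full write-off

print(f"equity before write-off: {before.equity:.1f}")
print(f"equity after write-off:  {after.equity:.1f}")
```

In this toy version the depositors and ordinary bondholders are untouched, which is the sense in which the cost falls on CoCo holders rather than on public funds.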

Credit Suisse, therefore, looks in some ways like a success - a formerly too-big-to-fail bank was allowed to fail and its assets transferred to a better-run bank without the use of public funds or contagion to the rest of the financial system.

Credit rating agencies appendix

Credit rating agencies played a key role in the financial crisis. The subprime mortgage-backed securities that were at the heart of the financial crisis were systematically rated as less risky than they in fact were by credit rating agencies, who also played a key role in engineering the products and their derivatives. Not only were they systematically rated as less risky than they in fact were, but the senior tranches of mortgage-backed securities were also often rated as prime securities, the highest grade, and so were used as short-term collateral for the repo markets that various financial institutions used to raise capital short term to pay short term debts. When the value of mortgage-backed securities dropped dramatically it meant that banks faced both questions over solvency and over short-term liquidity because they could borrow much less in the repo markets because the value of their collateral had dropped so substantially.

The solvency problem generically comes about when a large asset class is incorrectly priced, while the liquidity problem came about because only very safe assets are used as collateral in the repo market (mostly treasury bonds, generally considered the safest possible debt). Therefore, had rating agencies not classed mortgage-backed securities as extremely low risk, there would not have been the liquidity crisis that, in particular, sunk Lehman Brothers. Lehman was probably solvent but lacked the liquidity to pay its short-term debts since it could no longer rely on the repo market to the degree it previously had. It is important to note though that Lehman only came to rely on the repo market so heavily as a result of the solvency issues caused by its investments in securities tied to the housing market, after the downturn in the housing market that began the financial crisis.

The key institutional factors behind the failure of credit rating agencies to properly assess risk and why this was so catastrophic:

Credit ratings from agencies were given the force of law - as a result, pension funds and other financial institutions required by law to only invest in very safe assets took the ratings as close to gospel
Sell-side paid for ratings to be carried out - for MBS and derivatives on MBS only a small number of very powerful firms were engineering these assets giving them a lot of market power
Unlike for corporate and government debt, the models for assessing MBS and derivatives debt were opaque - it was much less clear if rating agencies were systematically undervaluing risks, and so reputational consequences could act as less strong constraints
Agencies were involved in the financial engineering of MBS and derivatives, meaning that almost by construction the financial instruments would be rated as investment grade
There are negative externalities from underestimating the risk of assets because of the contagion of bad loans in sufficiently large sufficiently interconnected financial systems meaning that we should expect risks to be systematically underestimated relative to the social optimum (although to some degree ratings agencies internalise this in that they lose business if the financial sector shrinks)

Post financial crisis there have been substantial reforms of the credit rating agencies, specifically preventing agencies from rating financial products they were involved in designing.

State capacity appendix

The concept of a Weberian bureaucracy, proposed by German sociologist Max Weber, refers to an ideal type of administrative structure characterized by hierarchical authority, formal rules and procedures, division of labour based on expertise, and impersonal relations between administrators and the public. A Weberian bureaucracy is meant to operate in a rational and efficient manner.

In his 2011 book The Origins of Political Order, Francis Fukuyama argues that the development of modern, centralized state bureaucracies has been crucial to state capacity. According to Fukuyama, a professional bureaucracy with merit-based recruitment and organized hierarchy enables the state to effectively implement policies and provide public services. He points to China's long tradition of bureaucratic governance as an important factor enabling its state capacity. Fukuyama contends that Weberian bureaucracies lead to greater state capacity by insulating public administration from patrimonialism and politicization.

Similarly, in his 1968 book Political Order in Changing Societies, Samuel Huntington emphasized the need for developing countries to establish coherent, autonomous bureaucracies. He argued that many post-colonial states failed to build effective state capacity because their bureaucracies remained personalized instruments of particular leaders or factions, rather than becoming impersonal administrative structures. For Huntington, modernizing states requires a bureaucracy with a strong sense of corporate identity and morale. Like Fukuyama, he sees merit-based recruitment and promotion as key to having an effective state bureaucracy.

In his 1957 book The Soldier and the State, Huntington looked specifically at civil-military relations. He argued that a professionalized, apolitical military bureaucracy was essential for state capacity and stability. Huntington contended that keeping the military out of politics required recruiting career soldiers on the basis of merit rather than personal ties.

In their 2019 book The Narrow Corridor, Daron Acemoglu and James Robinson argue that inclusive economic and political institutions enable the development of state capacity. They contend that extractive institutions that concentrate power undermine the creation of effective bureaucracies. Like Fukuyama and Huntington, Acemoglu and Robinson emphasize building impersonal bureaucracies not beholden to special interests.

In the United States, the increased political appointment of senior civil servants has raised concerns about politicization eroding bureaucratic professionalism and autonomy. For example, critics argue that appointing agency heads based on loyalty rather than expertise damages state capacity by undermining competent governance.

In India, the overrepresentation of upper castes in bureaucratic positions, particularly in northern states, has limited state capacity by excluding lower castes from the bureaucracy. Nepotism and patronage in civil service recruitment and promotion have weakened meritocracy and accountability. Caste-based politics have prevented the Indian bureaucracy from becoming fully impersonal and professionalized.

In conclusion, the literature emphasizes that developing modern bureaucratic structures based on Weberian principles of meritocracy and insulation from partisan politics is essential for state capacity. The experiences of the US and India illustrate the costs to effective governance when bureaucracies are politicized or dominated by narrow interests.

Effective Altruism Forum