The Algorithmic Transformation of Finance
The financial sector was among the earliest and most aggressive adopters of computational automation, and today it represents one of the most AI-saturated industries in the global economy. The transformation unfolded over several decades, beginning with the computerisation of exchanges and the introduction of electronic trading in the 1980s and 1990s, accelerating with the proliferation of quantitative hedge funds through the 2000s, and entering a new phase with the deployment of machine learning for credit assessment, fraud detection, portfolio optimisation, and algorithmic trading in the 2010s and 2020s.
The financial sector's early and deep adoption of algorithmic methods was driven by factors specific to its structure. Financial markets generate enormous volumes of structured data — price series, order flow, economic indicators, corporate filings — that are ideal inputs for statistical models. The returns to superior information processing are immediate and quantifiable, creating strong incentives to invest in more capable systems. And the competitive dynamics of financial markets mean that advantages in speed, information, and processing capability compound rapidly: a firm that can execute trades in microseconds rather than milliseconds has a structural edge that translates directly into profits.
Pasquale (2015), in his examination of algorithmic power in financial markets, described the resulting environment as one characterised by opacity, speed, and the progressive displacement of human judgment by processes that even their designers could not fully explain or predict. The "black box" character of financial algorithms — the gap between the inputs, the processing, and the outputs that human observers could not easily bridge — was not incidental. It was a competitive advantage: firms with superior algorithms had strong incentives to protect them from inspection, whether by competitors, regulators, or the public.
O'Neil (2016) identified financial models as among the most consequential examples of what she called "weapons of math destruction" — mathematical models that were widely used, opaque, and harmful, particularly to vulnerable populations. Her analysis covered credit scoring, insurance pricing, and employment screening, documenting how models that claimed mathematical objectivity systematically produced outcomes aligned with pre-existing social stratifications. The appearance of scientific rigour, she argued, was itself a political act: it immunised consequential decisions from the kind of democratic accountability that would otherwise apply to choices with such significant effects on people's lives.
The scope of AI deployment in financial services by the mid-2020s was extensive. Machine learning models were used in credit underwriting by banks, fintech lenders, and mortgage companies. Algorithmic trading systems executed the majority of equity market volume in the United States and Europe. Robo-advisors managed hundreds of billions of dollars in retail and institutional assets. Fraud detection systems evaluated tens of billions of transactions daily. Natural language processing systems parsed central bank communications, earnings calls, and news feeds to inform trading decisions. The financial system had, in a meaningful sense, become an AI-operated system with human oversight rather than a human-operated system with algorithmic assistance.
The Bank for International Settlements (2021) documented this transformation in its analysis of machine learning in central banking, noting that the same techniques that commercial financial institutions were using for competitive advantage were also being adopted by central banks and regulatory bodies — for systemic risk monitoring, for financial stability analysis, and for the supervision of institutions whose own behaviour was increasingly driven by the same machine learning techniques. This created a supervisory environment in which the regulators and the regulated were using comparable tools, with uncertain implications for the quality and independence of oversight.
Credit Scoring and Algorithmic Discrimination
Credit scoring is one of the most consequential applications of algorithmic decision-making in daily life. The decision about whether to extend credit, at what price and on what terms, shapes individuals' ability to purchase homes, finance education, smooth consumption through periods of income disruption, and build businesses. Traditional credit scoring — using FICO and similar models based primarily on repayment history, utilisation rates, and account age — had well-documented disparities in outcomes across racial and ethnic groups, reflecting both historical patterns of credit access and the compounding effects of financial exclusion. Machine learning-based credit scoring was widely promoted as an opportunity to improve on these models by incorporating richer data and more sophisticated analytical techniques. The evidence that it did so was, at best, mixed.
Fuster and colleagues (2022) conducted an influential analysis of the effects of machine learning on credit markets, using mortgage application data to compare traditional and ML-based credit models. Their findings were nuanced and important. They found that ML models were more accurate predictors of default risk, and that this improved accuracy could extend credit access to some borrowers previously denied — particularly among minority borrowers who had thin credit files that traditional models handled poorly. But they also found that ML models amplified existing racial disparities in mortgage approval rates and pricing, because the additional variables they incorporated — even without explicitly using race — captured information that was correlated with race in ways that reproduced racial gaps.
The mechanism was what fair lending researchers call "disparate impact": the use of variables that are facially race-neutral but that, in the context of historically segregated housing markets, racially concentrated poverty, and differential access to financial services, produce systematically worse outcomes for minority applicants. Zip code, for example, is not a racial variable — but in the context of decades of racially discriminatory housing policy, it is correlated with race in ways that make its use in credit scoring produce racially disparate outcomes. Machine learning models, able to identify and exploit such correlations across thousands of variables, could achieve impressive aggregate predictive accuracy while embedding racial disparities at a granularity that simple regression models could not.
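To make the mechanism concrete, the following deliberately stylised sketch (synthetic data, invented variable names and parameters) trains a credit model that never sees the protected attribute. Because the historical repayment labels embed past disadvantage, the model learns to penalise a facially neutral neighbourhood variable that proxies for group membership:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 50_000

group = rng.binomial(1, 0.5, n)        # protected attribute, never shown to the model
ability = rng.normal(0, 1, n)          # true repayment ability, identical across groups
# Historical labels embed past disadvantage: group 1 defaulted more often for
# reasons unrelated to ability (the 0.8 penalty is an invented stand-in).
repaid = rng.binomial(1, 1 / (1 + np.exp(-(ability - 0.8 * group))))

signal = ability + rng.normal(0, 1, n)          # noisy but legitimate credit signal
neighbourhood = group + rng.normal(0, 0.5, n)   # facially neutral proxy for group

X = np.column_stack([signal, neighbourhood])
model = LogisticRegression().fit(X, repaid)
print("coefficients (signal, neighbourhood):", model.coef_.round(2))

approved = model.predict_proba(X)[:, 1] >= 0.5
for g in (0, 1):
    print(f"group {g}: approval rate {approved[group == g].mean():.1%}")
```

The negative weight the model places on the proxy, and the resulting approval-rate gap, arise entirely from the training data; nothing labelled "race" appears anywhere in the pipeline, which is precisely why such disparities survive facially race-neutral modelling.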
The Equal Credit Opportunity Act (ECOA) and the Fair Housing Act prohibit discrimination in credit decisions on the basis of race, sex, national origin, and other protected characteristics. Their application to algorithmic credit models raised significant interpretive challenges. The legal doctrine of disparate impact — which makes facially neutral practices unlawful if they produce discriminatory effects without business justification — was applicable in principle but difficult to enforce in practice against proprietary ML models whose internal workings regulators could not easily inspect.
Pasquale (2015) argued that the opacity of financial algorithms was itself a regulatory challenge: regulators could not effectively evaluate compliance with anti-discrimination law if they could not examine the decision processes that credit models were implementing. This led to proposals for algorithmic auditing — third-party examination of credit models for disparate impact — that required regulators to have the technical capacity and legal authority to conduct such examinations, neither of which was consistently present. The Consumer Financial Protection Bureau's (CFPB) guidance on fair lending in AI and ML-based credit models, issued in 2023, represented a step toward this framework, but enforcement remained limited by the CFPB's resource constraints and the complexity of algorithmic discrimination litigation.
The fintech sector's use of "alternative data" — social media activity, smartphone usage patterns, purchasing behaviour, educational history, and other non-traditional variables — in credit scoring raised additional concerns. Proponents argued that alternative data could extend credit to the "credit invisible" — the approximately 26 million Americans with no credit bureau file — by providing signals of creditworthiness not captured in traditional scoring. Critics, including the National Consumer Law Center and the CFPB, documented that alternative data sources frequently introduced proxy discrimination, that the correlations between alternative data variables and creditworthiness were often spurious, and that the use of such data raised privacy concerns that the consent frameworks built into traditional credit disclosure did not adequately address.
High-Frequency Trading and Market Fragility
High-frequency trading (HFT) represents the most complete realisation of the principle that speed is a competitive advantage in financial markets. HFT firms use servers co-located within exchange data centres, fibre-optic and microwave communication links optimised for minimum latency, and algorithms capable of making thousands of trading decisions per second to gain advantages measured in microseconds over other market participants. At its peak, HFT activity accounted for approximately 50–60% of daily equity volume in US markets, though this share declined somewhat after the initial burst of profitability attracted excess competition.
Hendershott, Jones, and Menkveld (2011) published one of the foundational empirical analyses of algorithmic trading's effects on market quality, finding that the introduction of NYSE's AutoQuote system — which increased algorithmic participation in quote-setting — improved market liquidity as measured by bid-ask spreads. This finding was widely cited by HFT proponents as evidence that algorithmic trading benefited markets. The methodology and the generalisability of the finding were contested; subsequent research found that the liquidity improvements associated with HFT were concentrated in normal market conditions and tended to deteriorate precisely when liquidity was most needed — during periods of market stress.
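The liquidity measures at issue are standard and simple to compute: the quoted spread is the gap between the best bid and ask relative to the midpoint, while the effective spread measures what a trade actually paid relative to the midpoint. A minimal sketch with illustrative numbers:

```python
def quoted_spread_bps(bid: float, ask: float) -> float:
    """Quoted spread in basis points of the quote midpoint."""
    mid = (bid + ask) / 2
    return (ask - bid) / mid * 1e4

def effective_spread_bps(price: float, bid: float, ask: float, side: int) -> float:
    """Effective spread for one trade: twice the signed distance from the
    midpoint, in basis points. side is +1 for buyer-initiated trades,
    -1 for seller-initiated."""
    mid = (bid + ask) / 2
    return 2 * side * (price - mid) / mid * 1e4

# A buy executed at 10.02 inside a 10.00/10.03 quote:
print(quoted_spread_bps(10.00, 10.03))                      # ~30 bps quoted
print(effective_spread_bps(10.02, 10.00, 10.03, side=+1))   # ~10 bps effective
```

Narrowing quoted and effective spreads is what "improved liquidity" means in this literature; the critique summarised above is that the improvement is measured predominantly in the calm conditions where it matters least.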
The fragility concern was theoretical before the Flash Crash of May 6, 2010, made it empirically undeniable. Kirilenko and colleagues (2017), in their influential analysis of the Flash Crash using millisecond-level trading data, documented how HFT firms' behaviour during the crash was not stabilising — as market microstructure theory had suggested it would be — but destabilising. HFT firms detected the rapid price declines being triggered by a large algorithmic sell order and responded by aggressively pulling their quotes and flipping from buyers to sellers, accelerating the price decline rather than absorbing it. The liquidity that HFT firms had provided in normal conditions evaporated at the moment of stress, precisely when market participants most needed it to remain.
The economics of HFT are predicated on capturing small price advantages at very high frequency, not on providing liquidity as a service with obligations that persist in adverse conditions. This created what regulators came to understand as a distinction between market-making as a commitment and market-making as an opportunistic activity. Traditional market makers operating under affirmative obligations had contractual and regulatory commitments to provide two-sided markets even in difficult conditions. HFT firms providing liquidity through proprietary activity had no such obligations and withdrew it when it was unprofitable to provide — which was precisely when market stress required it most.
The structural change in market microstructure that HFT represented — markets in which prices moved at speeds far beyond human perception, in which a significant fraction of displayed liquidity could evaporate within milliseconds, and in which the behaviour of major participants was fundamentally unpredictable in stress scenarios — created a category of market fragility that existing circuit breakers and regulatory safeguards were poorly designed to address. The Securities and Exchange Commission implemented market-wide circuit breakers and limit-up/limit-down price bands after 2010 as part of its response to the Flash Crash, and these measures reduced the frequency of extreme events. But the underlying dynamic — markets in which algorithmic participants could produce feedback loops of self-reinforcing price movements at speeds that precluded human intervention — remained structurally present.
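The limit-up/limit-down mechanism itself is straightforward: trading pauses when prices move outside bands computed around a rolling reference price. A simplified sketch follows; the real mechanism uses a five-minute average trade price as the reference and band percentages that vary by security tier, price level, and time of day, with the 5% figure below corresponding to Tier 1 securities in regular hours:

```python
def luld_bands(reference_price: float, pct: float = 0.05) -> tuple[float, float]:
    """Simplified limit-up/limit-down bands around a reference price."""
    return reference_price * (1 - pct), reference_price * (1 + pct)

def check_trade(price: float, reference_price: float) -> str:
    lower, upper = luld_bands(reference_price)
    if price < lower:
        return "limit-down: candidate for trading pause"
    if price > upper:
        return "limit-up: candidate for trading pause"
    return "within bands"

# A print 5.25% below the reference breaches the lower band:
print(check_trade(37.90, reference_price=40.00))
```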
The 2010 Flash Crash and Its Successors
At 2:32 p.m. on May 6, 2010, the Dow Jones Industrial Average began one of the most precipitous declines in its history, falling approximately 1,000 points — nearly 10% — in a matter of minutes, before recovering almost as rapidly. At the lowest point of the crash, stocks including Accenture, which had been trading at approximately $40, briefly traded for one cent. The event demonstrated, in a way that no theoretical analysis could have, that algorithmic trading could produce systemic outcomes that no individual participant had intended or anticipated, through the interaction of multiple automated systems each operating rationally within its own narrow parameters.
The SEC and CFTC's joint report (2010) attributed the crash to a large institutional sell order — a mutual fund company's algorithmic programme to sell approximately $4.1 billion worth of E-mini futures contracts — that triggered a cascade of reactions from HFT firms and other algorithmic market participants. The analysis identified the key failure mode as a "hot potato" dynamic: HFT firms that initially absorbed the sell order quickly passed the contracts to other HFT firms, who passed them to others, with each firm holding the position only briefly before offloading it to maintain its preferred near-zero inventory position. This rapid turnover among a small number of HFT participants created the appearance of large trading volume with no actual price discovery occurring, until prices collapsed.
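The hot-potato dynamic is easy to reproduce in a toy simulation: intermediaries who refuse to hold more than a few contracts generate large trading volume while absorbing almost none of the fundamental sell order, and each forced pass marks the price down. Every parameter below (trader count, inventory cap, tick size, exit probability) is invented for illustration:

```python
import random

random.seed(7)
N_TRADERS, CAP = 10, 3          # intermediaries and the most contracts each will hold
inventory = [0] * N_TRADERS
price, volume, absorbed_outside = 100.0, 0, 0

for _ in range(200):            # contracts arriving from the fundamental seller
    holder = random.randrange(N_TRADERS)
    inventory[holder] += 1
    volume += 1
    # A trader pushed over cap immediately re-sells at a lower price; the
    # contract circulates until a trader with spare capacity, or occasionally
    # an outside fundamental buyer, finally absorbs it.
    while inventory[holder] > CAP:
        inventory[holder] -= 1
        volume += 1
        price -= 0.01           # each forced pass marks the price down a tick
        if random.random() < 0.05:
            absorbed_outside += 1
            break
        holder = random.randrange(N_TRADERS)
        inventory[holder] += 1

print(f"volume: {volume}, held by intermediaries: {sum(inventory)}, "
      f"absorbed outside: {absorbed_outside}, final price: {price:.2f}")
```

The output shows trading volume many times larger than the net quantity anyone actually absorbed, the signature of turnover without price discovery that the joint report described.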
Kirilenko and colleagues (2017) provided a more detailed microstructural analysis of the Flash Crash, finding that HFT firms' behaviour was best described as "aggressive" during the crash period — actively trading in ways that amplified price movements rather than dampening them. They estimated that HFT firms' collective activity during the crash produced trading profits of approximately $13 million while contributing to the price dislocation that cost other market participants far more. The distributional implications were stark: algorithmic speed advantages produced significant profits for their possessors precisely from the market instability that those same speed advantages helped create.
The 2010 Flash Crash was the most dramatic but not the only example of algorithmic feedback dynamics producing rapid, severe market dislocations. The "Knight Capital incident" of August 2012, in which a software deployment error caused Knight Capital's algorithmic trading system to flood markets with erroneous orders, resulting in a $440 million loss in 45 minutes and the effective destruction of the firm, demonstrated that failure modes in algorithmic trading systems were not limited to cascade dynamics involving multiple firms. A single firm's algorithmic failure could, through its market impact, affect prices across hundreds of securities simultaneously. The "ETF dislocation" of August 24, 2015, in which exchange-traded funds traded at significant discounts to their net asset values and trading halts were triggered well over a thousand times across hundreds of securities, provided further evidence of how algorithmic market structure could produce unexpected and harmful outcomes in stress conditions.
The regulatory response to these events — market-wide circuit breakers, limit-up/limit-down bands, kill-switch requirements for trading systems, and enhanced pre-trade risk controls — represented meaningful improvements in the market's ability to contain the worst outcomes of algorithmic failures. Whether they were sufficient to prevent a major algorithmic event from triggering systemic financial instability remained contested. The European Central Bank's (2020) report on machine learning for systemic risk assessment noted that the feedback loops created by multiple AI systems responding to the same market signals were a distinct category of systemic risk that required new analytical frameworks and monitoring approaches.
Robo-Advisors and Retail Investor Risk
The rise of robo-advisors — automated investment platforms that construct and manage diversified portfolios using algorithms rather than human advisers — represented the democratisation of algorithmic investment management. Prior to robo-advisors, professionally managed, diversified investment portfolios were accessible primarily to wealthy individuals who could afford the minimum investment thresholds and advisory fees of traditional wealth management. Platforms such as Betterment and Wealthfront (both founded in 2008), along with the robo-advisory services of established firms such as Vanguard and Schwab, brought algorithmic portfolio management to investors with as little as $1 in assets.
The investment approach used by robo-advisors was, at its core, a systematic implementation of Modern Portfolio Theory — constructing diversified portfolios of low-cost index funds optimised for expected return at a given level of risk, with automatic rebalancing and tax-loss harvesting. This approach had solid theoretical foundations and empirical support: the evidence that most actively managed funds underperformed their benchmarks over long periods was robust, and low-cost passive investing consistently produced better risk-adjusted returns for retail investors than actively managed alternatives.
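At its core, the optimisation robo-advisors run can be written in a few lines. The sketch below computes the classical tangency (maximum Sharpe ratio) portfolio from invented excess-return and covariance assumptions; production platforms add estimation machinery, box constraints, and tax-aware logic that this omits:

```python
import numpy as np

# Invented annualised excess returns and covariance for three index funds
# (US equities, international equities, bonds); real platforms estimate these.
mu = np.array([0.060, 0.065, 0.010])
cov = np.array([[0.0324, 0.0288, 0.0009],
                [0.0288, 0.0400, 0.0010],
                [0.0009, 0.0010, 0.0025]])

# Tangency portfolio: weights proportional to inverse(covariance) @ mu,
# rescaled here to sum to one.
raw = np.linalg.solve(cov, mu)
w = raw / raw.sum()

print("weights:", w.round(3))
print(f"expected excess return: {w @ mu:.2%}, volatility: {np.sqrt(w @ cov @ w):.2%}")
```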
The risks robo-advisors introduced were distinct from the risks of HFT and largely invisible in normal market conditions. They concentrated in stress scenarios and in the interaction between algorithmic portfolio management and retail investor behaviour. During market downturns, algorithmic rebalancing (selling assets that had risen above their target allocation and buying assets that had fallen below it) generated correlated order flow: if sufficient assets were managed by algorithms responding to the same signals with the same trades at the same time, those flows could themselves move prices. The scale of robo-advisor assets under management, which reached into the hundreds of billions of dollars by the early 2020s, meant that such correlated algorithmic responses to market movements could have measurable price impacts in stressed conditions.
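A typical robo-advisor rebalancing rule is a threshold trigger of the kind sketched below (the band width and the cash handling vary by platform; the 60/40 example is invented). The stress-scenario concern is not the rule itself, which trades against the market move, but many platforms executing the same rule against the same move at the same time:

```python
import numpy as np

def rebalance_trades(values: np.ndarray, targets: np.ndarray,
                     band: float = 0.05) -> np.ndarray:
    """Dollar trades restoring target weights once any asset has drifted
    more than `band` from its target; zero trades inside the band."""
    total = values.sum()
    if np.all(np.abs(values / total - targets) <= band):
        return np.zeros_like(values)
    return targets * total - values     # sell overweight, buy underweight

# A 60/40 portfolio after an equity sell-off: equities fell, bonds held up.
values = np.array([48_000.0, 42_000.0])
targets = np.array([0.60, 0.40])
print(rebalance_trades(values, targets))   # buys equities, sells bonds
```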
The transparency and disclosure practices of robo-advisors raised distinct concerns. The algorithms driving portfolio construction and rebalancing were proprietary, with users receiving limited information about how decisions were made on their behalf. Regulatory requirements for investment advisers to provide "adequate disclosure" about their methods and conflicts of interest were interpreted variably by robo-advisory platforms, with some providing detailed algorithmic disclosures and others offering only summary descriptions that a retail investor could not meaningfully evaluate. The SEC's guidance on robo-adviser compliance, issued in 2017 and updated subsequently, attempted to clarify the disclosure standard, but practical enforcement was limited by the SEC's resource constraints and the pace of platform evolution.
The suitability requirement — the obligation of investment advisers to recommend investments that are suitable for the client's financial situation and objectives — was interpreted differently in the robo-advisory context. Traditional human advisers assessed suitability through an ongoing relationship that incorporated information about changes in the client's circumstances, risk tolerance, and financial goals. Robo-advisors typically assessed suitability through an onboarding questionnaire and updated their assessment only if the client proactively reported changes. Research on investor questionnaire responses suggested that stated risk tolerance was an unreliable predictor of actual investor behaviour during market downturns — many investors who reported high risk tolerance in calm conditions behaved very differently when their portfolios declined significantly in real time.
Systemic Risk and Correlated AI Behaviour
The individual risks of algorithmic credit models, HFT, and robo-advisors were significant. The systemic risks arising from the interaction of multiple AI-driven systems operating simultaneously across the financial ecosystem were potentially larger and substantially less well-understood. Systemic risk in finance referred to the risk that the failure of one institution or market segment would propagate through the financial system to cause broader disruption — the mechanism that turned the subprime mortgage crisis of 2007–2008 into a global financial crisis. AI introduced new transmission channels for systemic risk that existing regulatory frameworks were not designed to monitor or contain.
The Financial Stability Board's (2022) report on AI and machine learning in financial services identified correlated behaviour among AI systems as a primary systemic risk concern. If multiple financial institutions trained their credit models, trading algorithms, or risk management systems on the same or similar datasets using the same or similar techniques, those systems would tend to identify the same patterns and respond to the same signals in the same ways. This algorithmic homogeneity would create the conditions for highly correlated, simultaneous behaviour across the financial system — a kind of herd behaviour at machine speed, in which market stress events triggered similar responses from many institutions simultaneously, amplifying shocks rather than absorbing them.
The concern was not hypothetical. The 2007–2008 financial crisis had, in part, been driven by the correlated behaviour of risk models across major financial institutions: Value at Risk (VaR) models, using similar statistical methods and similar historical data, had produced similar assessments of portfolio risk that were simultaneously wrong in the same direction — systematically underestimating the tail risks that materialised in the crisis. Machine learning models, trained on larger and more recent datasets, would presumably perform better than pre-crisis VaR models in many respects, but the fundamental dynamic — multiple institutions' risk management systems behaving similarly and potentially erroneously — was reproduced rather than eliminated.
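The correlated-failure dynamic is easy to reproduce in miniature: every institution estimating historical-simulation VaR from the same calm window gets roughly the same answer, and all of those answers understate risk together when the return distribution shifts. A stylised sketch with synthetic returns:

```python
import numpy as np

def historical_var(returns: np.ndarray, confidence: float = 0.99) -> float:
    """One-day historical-simulation VaR: the loss exceeded on only
    (1 - confidence) of days in the lookback window."""
    return -np.quantile(returns, 1 - confidence)

rng = np.random.default_rng(0)
calm = rng.normal(0.0003, 0.01, 750)      # ~3 years of synthetic calm-market returns
print(f"99% one-day VaR estimated from calm data: {historical_var(calm):.2%}")

# When the regime shifts to fat-tailed stress returns, the calm-window
# estimate is simultaneously wrong for everyone who relied on it:
crisis = rng.standard_t(3, 250) * 0.025
print(f"worst day in the stress period: {-crisis.min():.2%}")
```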
The Bank for International Settlements (2021) noted that the opacity of ML-based financial models — which were generally more complex and harder to interpret than traditional statistical models — made it more difficult for regulators and institutions themselves to understand the assumptions embedded in their models and to identify when those assumptions were being violated. The 2007–2008 crisis had demonstrated that complex financial models could embed flawed assumptions — about the independence of mortgage defaults, about the stability of housing prices, about the liquidity of securitised instruments — that were not visible in the model outputs until the assumptions were violated catastrophically. Machine learning models, with their ability to identify non-obvious correlations in historical data, could embed equally non-obvious assumptions about the stability of those correlations in future conditions.
The concentration of financial AI infrastructure in a small number of technology providers — primarily the major cloud computing platforms and AI tool vendors — introduced a new category of single-point-of-failure risk to the financial system. If a significant fraction of financial industry AI workloads ran on a common cloud platform, a failure or cyberattack affecting that platform could simultaneously affect the risk management, trading, and credit systems of a large number of financial institutions. This concentration risk was recognised by regulators including the Bank of England and the European Banking Authority, which issued guidance on operational resilience and third-party risk management, but the commercial economics of cloud computing — which strongly favoured concentration in a small number of providers — made it difficult to achieve the diversification that prudent risk management would prescribe.
Regulatory Frameworks for Algorithmic Finance
The regulatory framework for algorithmic finance evolved reactively, responding to crises and market failures after they occurred rather than anticipating them. This reactive posture reflected both the genuine difficulty of regulating systems whose speed and complexity outpaced conventional oversight and the political economy of financial regulation, in which the regulated sector had strong incentives and substantial resources to shape the regulatory response in directions that preserved its competitive advantages.
The post-2010 regulatory response to HFT and algorithmic trading in the United States focused primarily on market structure reforms: circuit breakers, limit-up/limit-down bands, mandatory pre-trade risk controls for algorithmic trading systems, and enhanced reporting requirements. Reg NMS, originally designed for a slower market, was supplemented by rules specifically addressing high-speed trading. The SEC and CFTC developed more sophisticated surveillance capabilities for identifying manipulative algorithmic trading strategies including spoofing and layering. These measures addressed specific, identified failure modes without comprehensively addressing the systemic fragility that HFT participation in market-making created.
In the European Union, MiFID II (Markets in Financial Instruments Directive II), implemented in 2018, imposed more extensive requirements on algorithmic trading firms, including requirements to test algorithms before deployment, to have kill-switch capability, to maintain records of algorithmic activity, and to have monitoring systems capable of identifying when algorithms were behaving abnormally. These requirements represented genuine improvements in governance of algorithmic trading, though enforcement quality varied across EU member states and the disclosure requirements were limited by legitimate confidentiality concerns about proprietary trading strategies.
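In engineering terms, the kill-switch and abnormality-monitoring requirements amount to a runtime guard: pre-set limits on message rate, position, and loss, with an immediate halt on breach. A minimal sketch, with hypothetical limits and interface rather than regulatory values:

```python
import time
from dataclasses import dataclass, field

@dataclass
class KillSwitch:
    """Toy pre-set-limit monitor of the kind MiFID II obliges algorithmic
    trading firms to operate. All thresholds here are invented."""
    max_orders_per_sec: int = 100
    max_abs_position: int = 10_000
    max_loss: float = 50_000.0
    halted: bool = False
    _order_times: list = field(default_factory=list)

    def check(self, position: int, pnl: float) -> bool:
        """Record one order event; return False once any limit is breached."""
        now = time.monotonic()
        self._order_times.append(now)
        self._order_times = [t for t in self._order_times if now - t < 1.0]
        if (len(self._order_times) > self.max_orders_per_sec
                or abs(position) > self.max_abs_position
                or pnl < -self.max_loss):
            self.halted = True   # real systems also cancel all resting orders
        return not self.halted

switch = KillSwitch()
if not switch.check(position=12_500, pnl=-1_200.0):
    print("algorithm halted: limit breached")
```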
The regulation of AI in credit decisions attracted growing regulatory attention in the United States through the CFPB's implementation of fair lending laws, including guidance on the use of alternative data and the explainability requirements for adverse action notices — the notifications lenders were required to provide to rejected applicants explaining the reasons for denial. The traditional adverse action notice framework — listing specific factors that negatively affected the credit decision — was developed for simple credit models whose factors could be readily identified and explained. Machine learning models, which might use thousands of variables in ways that did not decompose into simple factor-by-factor explanations, were difficult to fit into this framework.
The CFPB's 2023 guidance on adverse action notices in AI-based credit decisions required lenders to provide "principal reasons" for denial in terms that were meaningful to the applicant, regardless of the complexity of the underlying model. Technically, this required approaches to model interpretability — methods for explaining individual model decisions in terms of input variable contributions — that were active research areas in machine learning but had not achieved consensus standards for regulatory purposes. The guidance established the regulatory objective without fully resolving the technical challenges its implementation required.
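For a model that is linear in its inputs, or can be locally approximated as such, one workable reading of "principal reasons" is to decompose an applicant's score into per-feature contributions relative to the population average and report the largest negative ones. The sketch below uses invented feature names and coefficients; for complex models the decomposition requires approximation methods such as Shapley-value attribution, whose regulatory sufficiency is exactly the question the guidance left open:

```python
import numpy as np

# Hypothetical fitted linear credit model: the score rises with w @ x.
features = ["utilisation", "late_payments", "account_age_yrs", "income_k"]
w = np.array([-2.0, -1.5, 0.3, 0.02])
population_mean = np.array([0.30, 0.5, 8.0, 55.0])

def principal_reasons(x: np.ndarray, top_k: int = 2) -> list[str]:
    """Features whose deviation from the population average pushed this
    applicant's score down the most (exact for a linear model)."""
    contribution = w * (x - population_mean)
    order = np.argsort(contribution)            # most negative first
    return [features[i] for i in order[:top_k] if contribution[i] < 0]

applicant = np.array([0.92, 3.0, 7.0, 48.0])    # high utilisation, late payments
print(principal_reasons(applicant))             # ['late_payments', 'utilisation']
```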
O'Neil (2016) argued that the regulation of consequential algorithms required not merely technical auditing but democratic accountability — public oversight of systems that made decisions affecting large numbers of people's lives, with meaningful recourse for those adversely affected. This argument had political as well as technical dimensions: it implied that the proprietary protections surrounding financial algorithms needed to be balanced against public interests in understanding and contesting consequential decisions. That balance remained contested, with the financial sector maintaining that disclosure requirements sufficient for genuine democratic accountability would destroy the competitive value of proprietary models, and critics maintaining that the status quo effectively placed consequential public functions beyond democratic oversight.
The direction of regulatory development across jurisdictions was toward greater transparency, more prospective governance of AI model deployment, and more explicit attention to systemic risk from AI concentration and correlation. The pace was slower than the rate of AI adoption in financial services, creating persistent gaps between the scale of AI deployment and the adequacy of the oversight framework. Whether the gap would narrow before a major AI-enabled financial disruption demonstrated its consequences at systemic scale remained the central unresolved question of AI governance in the financial sector.
References
- Pasquale, F. (2015). The Black Box Society. Harvard University Press.
- O'Neil, C. (2016). Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown.
- U.S. Commodity Futures Trading Commission & U.S. Securities and Exchange Commission (2010). Findings Regarding the Market Events of May 6, 2010: Report of the Staffs of the CFTC and SEC to the Joint Advisory Committee on Emerging Regulatory Issues.
- Kirilenko, A., Kyle, A. S., Samadi, M., & Tuzun, T. (2017). The flash crash: High-frequency trading in an electronic market. Journal of Finance, 72(3), 967–998.
- Fuster, A., Goldsmith-Pinkham, P., Ramadorai, T., & Walther, A. (2022). Predictably unequal? The effects of machine learning on credit markets. Journal of Finance, 77(1), 5–47.
- European Central Bank (2020). Report on Machine Learning Applications for Systemic Risk Assessment. ECB.
- Bank for International Settlements (2021). BIS Working Papers No. 930: Big data and machine learning in central banking. BIS.
- Hendershott, T., Jones, C.M., & Menkveld, A.J. (2011). Does algorithmic trading improve liquidity? Journal of Finance, 66(1), 1–33.
- Financial Stability Board (2022). Artificial Intelligence and Machine Learning in Financial Services. FSB.
Further Reading
- AI and the Crisis of Journalism: Synthetic News, Newsroom Automation, and the Erosion of Trust
- AI in Healthcare: Algorithmic Bias, the Commodification of Health Data, and Automation in Clinical Practice
- AI in Education: Academic Fraud, the Credential Crisis, and the Surveillance of Students
- AI and Labour Displacement: Automation, the Future of Work, and Economic Inequality
- AI in Legal Services: Algorithmic Justice, Access to Law, and the Limits of Automation
- AI and Social Media: Engagement Algorithms, Mental Health, and Manufactured Consent
- AI in Automotive: Autonomous Vehicles, Liability, and the Infrastructure of Trust
- AI in eCommerce: Personalisation, Dark Patterns, and the Algorithmic Shopper
- AI for Business Owners: Productivity, Risk, and the Small Business Frontier