Algorithms now make decisions that were historically reserved for judges, doctors, and loan officers. These decisions are called "objective" because they are mathematical. This is a category error. A model trained on historical data inherits the biases of that history. When society delegates authority to algorithms without understanding this, it does not remove bias from decision-making. It launders bias through computation and calls it neutral.
The COMPAS Problem
The clearest documented case of algorithmic authority is COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), a risk assessment tool used across US courts to predict the likelihood that a defendant will re-offend.
In 2016, ProPublica published an investigation that remains the landmark study in algorithmic fairness. Their analysis of over 7,000 defendants in Broward County, Florida found:
Black defendants who did not re-offend were almost twice as likely to be falsely flagged as high-risk (45% false positive rate vs. 23% for white defendants). White defendants were more often incorrectly labeled as low-risk when they did go on to re-offend. The system was not predicting crime. It was predicting proximity to the criminal justice system, a measurement deeply shaped by policing patterns, sentencing disparities, and socioeconomic factors that correlate with race.
COMPAS's developer, Northpointe (now Equivant), responded that the tool maintained equal predictive accuracy across racial groups: defendants assigned a score of 7 re-offended at a similar rate regardless of race. Researchers at Cornell, Carnegie Mellon, Stanford, and other institutions subsequently showed that equal predictive accuracy and equal error rates across groups are mathematically incompatible when the groups have different base rates, unless the tool predicts perfectly. You cannot have both. Any imperfect tool that achieves equal predictive accuracy will produce unequal error rates, and vice versa.
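The incompatibility can be checked with a few lines of arithmetic. The sketch below is illustrative only: it treats "predictive accuracy" as positive predictive value (PPV, the share of high-risk flags that turn out correct), and every number in it is hypothetical rather than taken from COMPAS or the ProPublica data. It holds PPV and the false negative rate equal for two groups and shows that the false positive rate then follows mechanically from each group's base rate.

```python
def false_positive_rate(base_rate: float, ppv: float, fnr: float) -> float:
    """False positive rate implied by a group's base rate, PPV, and FNR.

    All quantities are fractions of the group's population.
    """
    true_pos = base_rate * (1.0 - fnr)        # re-offenders correctly flagged
    false_pos = true_pos * (1.0 - ppv) / ppv  # follows from PPV = TP / (TP + FP)
    return false_pos / (1.0 - base_rate)      # FP as a share of non-re-offenders


# Hypothetical values: the tool is equally "accurate" for both groups.
PPV = 0.6  # 60% of high-risk flags are correct in each group
FNR = 0.3  # 30% of re-offenders are missed in each group

fpr_a = false_positive_rate(base_rate=0.5, ppv=PPV, fnr=FNR)
fpr_b = false_positive_rate(base_rate=0.3, ppv=PPV, fnr=FNR)

print(f"Group A (base rate 0.5): FPR = {fpr_a:.3f}")
print(f"Group B (base rate 0.3): FPR = {fpr_b:.3f}")
```

With these assumed inputs, the group with the higher base rate ends up with more than double the false positive rate, even though the tool treats a flag identically in both groups. Equalizing the false positive rates instead would force PPV or the false negative rate apart.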
This is not a bug in COMPAS. It is a property of all prediction models applied to populations with different base rates. The choice of which fairness metric to prioritize, equal accuracy or equal error rates, is not a technical question. It is a moral one. And it is being made, by default, by engineers and product managers rather than by the communities affected.
Healthcare: Bias in Clinical AI
The same dynamics operate in healthcare, where the stakes are measured in lives.
A widely cited 2019 study published in Science examined a healthcare algorithm used by a major US health system to allocate follow-up care. The algorithm predicted patient health needs using prior healthcare spending as a proxy. The result: Black patients had to be significantly sicker than white patients to receive the same algorithmic risk score. Black patients had, on average, historically spent less on healthcare, not because they needed less care but because they had less access to it.
The algorithm did not use race as an input. But healthcare expenditure is a reliable proxy for race in the US, because access to care is distributed along racial lines. The algorithm encoded the consequences of structural inequality into a "neutral" prediction and automated the inequality's continuation.
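The proxy mechanism can be sketched with a toy example. Everything here is hypothetical: the two groups, the access factors, the cost per unit of need, and the enrollment threshold are invented for illustration and are not drawn from the Science study. The score never sees group membership, yet a spending-based cutoff demands more illness from the group with less access to care.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    group: str
    need: int         # hypothetical measure of illness, e.g. chronic conditions
    spending: float   # prior spending, used as the proxy for need


# Hypothetical assumption: group B receives 70% of the care per unit of need.
ACCESS = {"A": 1.0, "B": 0.7}
COST_PER_NEED = 1000.0


def make_patients() -> list[Patient]:
    """Both groups have the same distribution of true need (1 through 10)."""
    return [
        Patient(group, need, need * COST_PER_NEED * ACCESS[group])
        for group in ("A", "B")
        for need in range(1, 11)
    ]


def enrolled(patients: list[Patient], threshold: float) -> list[Patient]:
    """Select patients whose proxy score (spending) meets the threshold."""
    return [p for p in patients if p.spending >= threshold]


def min_need_by_group(selected: list[Patient]) -> dict[str, int]:
    """Lowest true need that still qualified for extra care, per group."""
    result: dict[str, int] = {}
    for p in selected:
        result[p.group] = min(result.get(p.group, p.need), p.need)
    return result


selected = enrolled(make_patients(), threshold=5000.0)
print(min_need_by_group(selected))  # {'A': 5, 'B': 8}
```

In this toy setup a group A patient qualifies at a need of 5, while a group B patient must reach a need of 8 to cross the same spending threshold.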
An algorithm trained on historical data does not predict the future. It projects the past. If the past was shaped by discrimination, the projection perpetuates it, at machine speed, at machine scale, and with machine credibility.
Dermatology AI presents a case where the bias is visible on the skin, literally. Skin cancer detection models depend on training images to classify lesions as benign or malignant. The International Skin Imaging Collaboration (ISIC) dataset, one of the most widely used sources for training these models, contains over 90% images of Fitzpatrick skin types I through III (light to medium skin). Research published in JAMA Dermatology between 2021 and 2023 demonstrated that models trained on these datasets perform significantly worse on darker skin tones, with accuracy dropping by as much as 20 percentage points for Fitzpatrick types V and VI. Melanoma is already harder to detect on dark skin through visual inspection alone; an AI system that compounds this gap does not supplement clinical judgment. It automates its worst failure mode.
This is not a hypothetical. Skin cancer is the most common cancer in the United States, and delayed detection increases mortality. The five-year survival rate for melanoma detected at stage I is 99%, according to the American Cancer Society. At stage IV, it drops to 32%. A diagnostic tool that performs well for white patients and poorly for Black patients is not a neutral technology deployed in an unequal context. It is a technology that widens the gap between who lives and who dies.
Research published in 2025 found that large language models used for treatment recommendations can exhibit racial bias even when race is never explicitly mentioned. LLMs infer demographic information from linguistic patterns, geographic context, and described symptoms, and adjust their recommendations accordingly. The bias is not in the explicit inputs. It is in the training data.
The Lending Machine
Algorithmic discrimination in lending follows the same pattern.
The Equal Credit Opportunity Act prohibits lenders from discriminating based on race, color, religion, national origin, sex, marital status, or age. Algorithmic lending models typically do not use these features directly. They use features that correlate with them: ZIP code (which correlates with race due to residential segregation), educational institution attended (which correlates with socioeconomic background), browsing patterns, and social network connections.
A UC Berkeley study found that algorithmic mortgage lenders charged Black and Hispanic borrowers 5-9 basis points more than white borrowers with identical credit profiles. The disparity was smaller than that of traditional (human) lenders, but it was present and profitable, generating an estimated $765 million per year in excess interest charges.
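A basis point is one hundredth of a percentage point, so the per-loan effect looks small until it is multiplied across a mortgage term and a national market. The sketch below uses the standard fixed-rate amortization formula. The loan size, term, baseline rate, and the 7-basis-point markup (a point inside the 5-9 range above) are assumptions chosen for illustration, not figures from the study.

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Payment on a fixed-rate, fully amortizing loan."""
    r = annual_rate / 12   # monthly interest rate
    n = years * 12         # number of monthly payments
    return principal * r / (1 - (1 + r) ** -n)


# Hypothetical loan: all four values are illustrative assumptions.
PRINCIPAL = 300_000.0
YEARS = 30
BASE_RATE = 0.0650
MARKUP_BPS = 7  # 1 basis point = 0.01 percentage points = 0.0001

marked_up_rate = BASE_RATE + MARKUP_BPS / 10_000
extra_per_month = (
    monthly_payment(PRINCIPAL, marked_up_rate, YEARS)
    - monthly_payment(PRINCIPAL, BASE_RATE, YEARS)
)
extra_over_term = extra_per_month * YEARS * 12

print(f"Extra per month: ${extra_per_month:,.2f}")
print(f"Extra over {YEARS} years: ${extra_over_term:,.2f}")
```

The difference on a single loan is modest. The aggregate figure arises because the same small markup is applied to a very large number of loans.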
The algorithm did not "decide" to discriminate. It found patterns in historical data that were profitable to exploit, and those patterns reflected decades of redlining, wealth disparities, and unequal access to financial education.
The Hiring Filter
In 2018, Reuters reported that Amazon had built an internal recruiting tool designed to automate resume screening. The system, trained on ten years of hiring data, learned to penalize resumes that contained the word "women's," as in "women's chess club" or "women's lacrosse captain." It also downgraded graduates of two all-women's colleges. Amazon did not instruct the algorithm to discriminate against women. The algorithm observed that Amazon had historically hired mostly men, concluded that male candidates were preferable, and translated that pattern into automated filtering. Amazon scrapped the tool after internal testing revealed the bias, but the episode demonstrated a structural problem: any hiring algorithm trained on a company's historical decisions will reproduce whatever demographic imbalances already existed in that company.
Video interview platforms introduced a second vector of algorithmic discrimination. HireVue, used by over 700 companies including Hilton, Unilever, and Goldman Sachs, analyzed candidates' facial expressions, word choice, and vocal tone during recorded video interviews, then generated a score that ranked candidates against each other. In 2019, the Electronic Privacy Information Center (EPIC) filed a complaint with the FTC arguing that the system was opaque, unvalidated, and disproportionately likely to penalize candidates whose facial expressions or speech patterns deviated from the majority of the training set. HireVue dropped facial analysis from its product in January 2021, but continued to use vocal and linguistic analysis.
Legislatures have started responding. Illinois passed the Artificial Intelligence Video Interview Act in 2020, requiring employers to notify candidates when AI analyzes their video interviews and to obtain consent before using the technology. New York City went further with Local Law 144, effective July 2023, which requires any employer using an "automated employment decision tool" to commission an independent bias audit before deployment and to publish the audit results publicly. The law defines specific metrics: selection rates and scoring distributions must be broken down by race, ethnicity, and sex. Violations carry fines of $500 to $1,500 per candidate affected.
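A selection-rate breakdown of the kind these audits report can be sketched as follows. The impact ratio here divides each category's selection rate by the rate of the most-selected category, which is how bias audits under the law are generally described. The category names and counts are invented, and the 0.8 cutoff is borrowed from the EEOC's "four-fifths" rule of thumb as an assumption, since the law itself requires disclosure rather than a pass/fail threshold.

```python
from collections import Counter


def selection_rates(records):
    """records: iterable of (category, selected) pairs, selected being a bool."""
    totals, chosen = Counter(), Counter()
    for category, selected in records:
        totals[category] += 1
        if selected:
            chosen[category] += 1
    return {category: chosen[category] / totals[category] for category in totals}


def impact_ratios(rates):
    """Each category's selection rate relative to the most-selected category."""
    best = max(rates.values())
    if best == 0:
        return {category: 0.0 for category in rates}
    return {category: rate / best for category, rate in rates.items()}


# Hypothetical audit data: 100 candidates per category.
records = (
    [("group_x", True)] * 40 + [("group_x", False)] * 60
    + [("group_y", True)] * 24 + [("group_y", False)] * 76
)

rates = selection_rates(records)
ratios = impact_ratios(rates)

FOUR_FIFTHS = 0.8  # assumed benchmark, not a threshold set by Local Law 144
for category in sorted(ratios):
    flag = "review" if ratios[category] < FOUR_FIFTHS else "ok"
    print(f"{category}: rate={rates[category]:.2f} ratio={ratios[category]:.2f} {flag}")
```

A real audit would also have to handle intersectional categories (for example, sex by race) and categories too small to support a stable rate, both of which this sketch ignores.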
These laws represent the first attempts to impose the same evidentiary standards on algorithmic hiring that Title VII of the Civil Rights Act of 1964 imposed on human hiring. The gap remains large. Most US states have no AI hiring regulations at all, and federal legislation has stalled repeatedly in committee.
The Transparency Problem
The fundamental challenge with algorithmic authority is opacity.
Proprietary algorithms are trade secrets. COMPAS's specific features and weightings are not public. Healthcare risk models are often proprietary to the companies that sell them. Credit scoring models reveal only broad categories of factors, not the specific logic. When an algorithm denies parole, denies care, or raises a loan rate, the affected person has no meaningful ability to interrogate the reasoning.
This creates an accountability vacuum. A human judge who denies bail must state reasons on the record. A human doctor who recommends against treatment must document clinical reasoning. A human loan officer who denies a mortgage must provide a written explanation. An algorithm that does any of these things produces a score, and the score is treated as self-justifying.
The EU AI Act, effective in phases from 2024, classifies AI systems used in criminal justice, employment, credit scoring, and healthcare as "high-risk" and imposes transparency, human oversight, and explainability requirements. The US regulatory response is more fragmented, with the CFPB and FTC asserting jurisdiction over algorithmic discrimination in lending and consumer products, but without a comprehensive federal AI law.
An algorithm's authority derives from its perceived objectivity. Remove the perception of objectivity, and the authority collapses. This is why transparency is existentially threatening to proprietary algorithmic systems: once the public understands that a "risk score" is a weighted sum of proxy variables trained on biased historical data, the score loses its air of mathematical certainty. The companies that sell these systems have a financial incentive to maintain opacity. The communities affected by these systems have a fundamental interest in transparency. These interests are directly opposed.
The Audit Infrastructure
A technical response is emerging: independent algorithmic auditing.
NIST published the AI Risk Management Framework in 2023, providing voluntary standards for identifying and mitigating algorithmic risk. The framework is not binding, but it establishes a vocabulary and methodology for systematic evaluation.
New York City's Local Law 144, which took effect in July 2023, became the first US law to require bias audits of automated employment decision tools. Employers must hire independent auditors to evaluate disparate impact before deploying any algorithm that substantially assists or replaces human decision-making in hiring or promotion. The law requires public disclosure of audit results, including selection rates broken down by race, ethnicity, and sex. Early compliance has been uneven: a Cornell University study of the first year of enforcement found that many audits were superficial, conducted by firms with no prior experience in algorithmic fairness, and that some employers reclassified their tools to avoid the law's definitions entirely.
The EU AI Act, which entered into force in August 2024 with obligations phasing in over the following years, takes a broader approach. It classifies AI systems into four risk tiers: unacceptable, high, limited, and minimal. AI used in criminal justice (including recidivism prediction), employment screening, credit scoring, and healthcare triage falls into the "high-risk" category. High-risk systems must meet specific requirements before deployment: technical documentation of training data composition, mandatory bias testing across protected characteristics, human oversight mechanisms that allow a qualified person to override algorithmic decisions, and post-market monitoring obligations that continue for the system's entire operational life. Fines under the Act reach up to 35 million euros or 7% of global annual turnover, whichever is higher, for the most serious violations.
The practical requirements for algorithmic accountability include:
Pre-deployment testing. Before an algorithm is deployed in a consequential context (bail, healthcare, lending, hiring), it should be tested for disparate impact across protected categories using held-out evaluation data that represents the actual deployment population.
Continuous monitoring. Algorithmic performance drifts over time as deployment populations change. A model trained on 2020 data may perform differently on 2025 populations. Ongoing monitoring of accuracy, error rates, and disparate impact is required, not as a one-time audit, but as a continuous process.
Explainability requirements. Any person materially affected by an algorithmic decision, whether denied parole, denied insurance, denied a loan, or rejected from a job, should receive an explanation of the factors that drove the decision, in language they can understand, with a meaningful opportunity to challenge errors.
Independent review. The entity that profits from the algorithm should not be the sole entity that audits it. Independent third-party auditing, with access to the model's logic and training data, is the structural equivalent of financial auditing. The Sarbanes-Oxley Act of 2002 required independent auditing of corporate financial statements after Enron and WorldCom collapsed under fraudulent self-reporting. Algorithmic auditing requires the same structural independence for the same reason: self-interested parties cannot be trusted to evaluate their own systems honestly.
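The testing and monitoring requirements above can be made concrete with a small sketch. It computes false positive and false negative rates per group from labeled outcomes, then repeats the check across time-stamped batches and flags any batch where the gap in false positive rates exceeds a tolerance. The group labels, counts, batch names, and the 0.05 tolerance are all hypothetical, and a real audit would add confidence intervals and more than one fairness metric.

```python
from collections import defaultdict


def error_rates_by_group(rows):
    """rows: iterable of (group, predicted_positive, actual_positive) tuples."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, predicted, actual in rows:
        # "t"/"f" = prediction correct or not; "p"/"n" = what was predicted.
        key = ("t" if predicted == actual else "f") + ("p" if predicted else "n")
        counts[group][key] += 1
    report = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["fn"] + c["tp"]
        report[group] = {
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
        }
    return report


def fpr_gap(report):
    """Largest difference in false positive rate between any two groups."""
    values = [m["fpr"] for m in report.values() if m["fpr"] is not None]
    return max(values) - min(values) if values else 0.0


def monitor(batches, tolerance=0.05):
    """Return the labels of batches whose FPR gap exceeds the tolerance."""
    return [
        label for label, rows in batches
        if fpr_gap(error_rates_by_group(rows)) > tolerance
    ]


def make_rows(group, tp, fp, tn, fn):
    """Build synthetic rows from confusion-matrix counts (illustration only)."""
    return (
        [(group, True, True)] * tp + [(group, True, False)] * fp
        + [(group, False, False)] * tn + [(group, False, True)] * fn
    )


# Hypothetical batches: group B's false positives grow between evaluations.
batches = [
    ("2020", make_rows("A", 30, 10, 40, 20) + make_rows("B", 30, 12, 38, 20)),
    ("2025", make_rows("A", 30, 10, 40, 20) + make_rows("B", 30, 25, 25, 20)),
]
flagged = monitor(batches)
print(flagged)  # ['2025']
```

Running the same function on every new batch is what turns a one-time audit into continuous monitoring. Who sets the tolerance, and which metric it applies to, remains the moral choice described earlier.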
COMPAS demonstrated that criminal justice risk models produce racially disparate false positive rates (45% for Black defendants vs. 23% for white defendants). Healthcare AI trained on spending-as-proxy systematically undertreated minority patients; dermatology AI trained on 90%+ light-skinned image datasets shows accuracy drops of up to 20 percentage points for darker skin tones, directly affecting cancer survival. Mortgage algorithms charged Black and Hispanic borrowers 5-9 basis points more with identical credit profiles, extracting $765 million annually. Amazon's hiring algorithm penalized resumes containing the word "women's," and HireVue's facial analysis scored candidates against a non-representative training baseline before dropping the feature in 2021. These outcomes are not algorithm failures. They are the predictable result of training prediction models on historical data shaped by structural inequality. NYC Local Law 144 (2023) became the first US law requiring bias audits of automated hiring tools. The EU AI Act imposes fines of up to 35 million euros for non-compliant high-risk AI. The required infrastructure parallels post-Enron financial auditing: pre-deployment disparate impact testing, continuous monitoring, individual explainability, and structurally independent third-party review. The choice of fairness metric is moral, not mathematical, and that choice should not default to engineers.