A practical dictionary for understanding fairness, risk, and accountability in AI systems
Overview
Bias auditing is no longer a niche concern. It now sits at the centre of everyday conversations across HR tech, talent platforms, enterprise procurement, and AI governance.
Yet many of those conversations are still happening without a shared language.
The same terms appear in sales decks, policies, audit reports, and regulatory guidance, often meaning slightly different things depending on who is using them. That ambiguity creates risk. It also makes meaningful oversight harder than it needs to be.
The aim of this lexicon is simple: to make that language precise.
This dictionary provides plain-English definitions of the most commonly used terms in AI bias auditing, grounded in how audits are conducted in practice, not how fairness is marketed. Where terminology varies across vendors or regulations, that is made explicit.
If you are responsible for deploying AI, buying AI, or explaining how AI hiring systems work to regulators, clients, or candidates, this is the language you need to be comfortable using and, just as importantly, challenging.
Bias that manifests in the outputs of an automated or algorithmic system, where observed differences in outcomes are attributable to how the system processes data, applies rules, or learns from patterns.
Why it matters in practice:
AI bias is observable and testable through outputs. It does not require insight into intent or source code.
May also be called:
Algorithmic bias, model bias
The documented framework defining metrics, datasets, assumptions, thresholds, and review processes used in an audit.
Why it matters in practice:
Findings are only defensible if the methodology is transparent and repeatable.
May also be called:
Audit framework
The formal artefact summarising audit scope, methodology, findings, and limitations.
Why it matters in practice:
This is the document regulators and clients expect to review.
May also be called:
Audit documentation
The legal and regulatory frameworks under which an audit is performed, such as NYC LL 144, SB-205, EU AI Act, or FEHA.
Why it matters in practice:
Scope determines what must be tested and how findings can be used.
May also be called:
Regulatory scope
A documented record of data sources, methods, assumptions, and findings associated with an audit.
Why it matters in practice:
Essential for regulatory defence and governance.
May also be called:
Evidence trail
A system that uses automated computation or algorithms to make or materially assist decisions about individuals, including scoring, ranking, filtering, or prioritisation. Terminology varies by regulation.
Why it matters in practice:
If a system shapes access to opportunity, it attracts regulatory scrutiny regardless of label.
May also be called:
AEDT, ADM system, automated decision system
A systematic pattern of difference in outcomes or treatment between groups or individuals that is not explained by relevant, legitimate criteria. Bias describes observed patterns, not intent or legality.
Why it matters in practice:
Bias is the signal that triggers investigation. It can exist in human processes, data, or automated systems.
May also be called:
Outcome disparity, differential treatment
An independent, structured assessment of whether a system’s outputs show statistically meaningful differences across protected characteristics using defined methods and datasets.
Why it matters in practice:
Produces evidence about risk. It does not itself determine legality or compliance.
May also be called:
Fairness audit, algorithmic audit
An evaluation approach that assesses a system based on observed inputs and outputs, without access to internal logic or source code.
Why it matters in practice:
Behavioral testing is often the only practical way to evaluate proprietary systems.
May also be called:
Outcome-based testing
A metric used to assess how stable or repeatable an AI system’s outputs are when presented with the same or similar inputs. Similar inputs may include paraphrased text, reordered information, or other non-material variations. In some audit contexts, deliberately modified inputs such as demographic changes may also be treated as similar for specific analyses.
Why it matters in practice:
Low consistency can indicate sensitivity to irrelevant changes, increasing the risk of unpredictable behaviour. Depending on test design, consistency metrics may be used to assess robustness, bias sensitivity, or both.
May also be called:
Output stability metric, repeatability metric
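One simple way such a metric might be operationalised is the share of output pairs that agree within a tolerance. This is an illustrative sketch, not a standardised formula; the scores passed in are assumed to come from some system under test.

```python
def consistency_score(outputs, tolerance=0.0):
    """Fraction of output pairs that agree within `tolerance`.

    `outputs` holds the system's scores for one underlying input
    presented with non-material variations (paraphrase, reordered
    information). 1.0 means perfectly repeatable; lower values
    indicate sensitivity to irrelevant changes.
    """
    pairs = [(a, b) for i, a in enumerate(outputs) for b in outputs[i + 1:]]
    if not pairs:
        return 1.0
    return sum(abs(a - b) <= tolerance for a, b in pairs) / len(pairs)

# Three paraphrases of the same CV, scored by a hypothetical system:
consistency_score([0.71, 0.71, 0.74], tolerance=0.02)  # → 1/3
```

Whether a given score is acceptable depends on the test design: the same calculation can serve a robustness check or a bias-sensitivity check, as the entry above notes.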
A modified instance of the same individual or data point in which only selected attributes are changed. For bias auditing, this is usually a demographic counterfactual.
Why it matters in practice:
Counterfactuals allow controlled testing of what influences decisions.
May also be called:
Synthetic variant, altered input
A method that tests whether altering selected attributes while holding other inputs constant changes system outputs.
Why it matters in practice:
Helps identify sensitivity to specific attributes, including but not limited to protected characteristics.
May also be called:
Sensitivity analysis (context-specific)
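The core move can be sketched in a few lines. The `model` callable and the toy scoring rule below are hypothetical, standing in for whatever system an audit actually probes:

```python
def counterfactual_delta(model, record, attribute, alternative):
    """Change one attribute, hold everything else constant, and
    measure how far the model's output moves."""
    variant = dict(record)            # shallow copy; original untouched
    variant[attribute] = alternative
    return model(variant) - model(record)

# Toy model that improperly keys on a name-derived signal:
toy = lambda r: 0.8 if r["name"] == "James" else 0.6
delta = counterfactual_delta(
    toy, {"name": "James", "years_exp": 5}, "name", "Jamila"
)
# delta ≈ -0.2: a non-zero delta flags sensitivity to the changed
# attribute and warrants investigation.
```

In practice the same pattern is run over many records and attribute swaps, and the distribution of deltas, not a single value, is what gets reported.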
A pattern where outcomes disproportionately disadvantage one group compared to another, regardless of intent.
Why it matters in practice:
A core concept in employment discrimination analysis and AI governance.
May also be called:
Adverse impact
A statistical comparison of outcome rates across demographic groups to identify potential adverse impact.
Why it matters in practice:
One of the most widely recognised methods used in bias audits.
May also be called:
Adverse impact analysis
The ability to describe how a system produces outputs in terms understandable to stakeholders.
Why it matters in practice:
Supports accountability, challenge, and informed oversight.
May also be called:
Interpretability
Bias identified through aggregated outcome patterns across demographic groups.
Why it matters in practice:
Reveals systemic disparities that individual cases may not show.
May also be called:
Population-level bias
A system used in contexts where decisions have legal or similarly significant effects on individuals, such as hiring or pay. Definitions vary by regulation.
Why it matters in practice:
High-risk classification triggers stronger governance and oversight expectations.
May also be called:
Regulated AI system
A ratio comparing a group’s outcome rate to that of the highest-performing group.
Why it matters in practice:
Used to quantify disparity and assess whether further investigation is required.
May also be called:
Selection rate ratio
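A minimal sketch of the calculation, assuming per-group selection rates are already in hand. The 0.8 default mirrors the widely cited four-fifths rule of thumb; actual thresholds, and the legal weight they carry, vary by jurisdiction:

```python
def impact_ratios(rates, threshold=0.8):
    """Compare each group's selection rate to the highest rate and
    flag ratios below `threshold` for further review."""
    best = max(rates.values())
    return {g: {"ratio": r / best, "flagged": r / best < threshold}
            for g, r in rates.items()}

ratios = impact_ratios({"A": 0.60, "B": 0.45})
# B's ratio is 0.45 / 0.60 = 0.75, below the 0.8 threshold, so it
# is flagged for investigation (a flag is not a legal conclusion).
```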
Differential treatment affecting specific individuals, even when group-level statistics appear balanced.
Why it matters in practice:
Individual harm can exist without obvious population-level signals.
May also be called:
Case-level bias
When a system meaningfully contributes to a decision by shaping how options are ranked, filtered, prioritised, or visually presented.
Why it matters in practice:
Human involvement does not remove responsibility if the system structures the decision space.
May also be called:
Decision support with material effect
Repeated audits over time to account for system changes, updates, and data drift.
Why it matters in practice:
AI systems evolve. Static evaluations do not capture this evolution.
May also be called:
Continuous auditing
Ongoing evaluation of a live system using real operational data in a production environment.
Why it matters in practice:
Many risks only emerge once systems interact with real users and data distributions.
May also be called:
In-market monitoring
Testing conducted before a system is placed into operational use, typically using historical or synthetic test data.
Why it matters in practice:
Useful for early risk identification but insufficient on its own.
May also be called:
Model validation, pre-market testing
Attributes that receive legal or ethical protection, such as sex, race, age, disability, religion, or national origin.
Why it matters in practice:
Bias audits assess outcomes across these attributes to identify discrimination risk.
May also be called:
Protected classes
Bias arising from variables that correlate strongly with protected characteristics, even if those characteristics are not explicitly used.
Why it matters in practice:
Removing protected attributes does not eliminate risk if proxies remain.
May also be called:
Indirect discrimination
Stress-testing a system under varying conditions to assess stability and sensitivity to irrelevant changes.
Why it matters in practice:
Identifies fragility before it becomes operational or compliance risk.
May also be called:
Stress testing
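One common shape such a test can take is applying deliberately irrelevant perturbations and measuring the worst-case shift in output. Everything here is a sketch: the `model` callable, the word-count scorer, and the perturbations are hypothetical stand-ins.

```python
def max_deviation(model, record, perturbations):
    """Apply each perturbation (a function mapping record -> record)
    and report the largest shift in the model's output. Large shifts
    under irrelevant changes indicate fragility."""
    base = model(record)
    return max(abs(model(p(record)) - base) for p in perturbations)

# Toy model scoring a CV summary by word count; the perturbations
# below should be immaterial to any reasonable scoring logic:
toy = lambda r: len(r["summary"].split())
perturbs = [
    lambda r: {**r, "summary": r["summary"] + "  "},   # trailing spaces
    lambda r: {**r, "summary": r["summary"].upper()},  # casing change
]
dev = max_deviation(toy, {"summary": "five years in sales"}, perturbs)
# dev == 0 here: this toy scorer is insensitive to both changes.
```

A real robustness suite would sweep many records and perturbation families and compare the deviation distribution against an agreed tolerance.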
The proportion of individuals in a group receiving a favourable outcome.
Why it matters in practice:
Forms the basis of many outcome-based fairness metrics.
May also be called:
Selection rate
Any method, tool, or process used to evaluate, screen, or rank individuals in employment-related decisions.
Why it matters in practice:
Many AI hiring tools are legally treated as selection procedures, bringing established fairness standards into scope.
May also be called:
Assessment method, screening tool
A comparison of outcome rates across groups to assess balance.
Why it matters in practice:
Parity alone does not establish fairness. Context and relevance matter.
May also be called:
Demographic parity
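The comparison itself is simple, which is part of the point the entry makes: a small gap is easy to compute but says nothing on its own about relevance or context. A minimal sketch:

```python
def parity_difference(rates):
    """Gap between the highest and lowest group outcome rates.
    Zero means statistical parity on this metric, but parity alone
    does not establish fairness; context and relevance matter."""
    return max(rates.values()) - min(rates.values())

parity_difference({"A": 0.60, "B": 0.45})  # ≈ 0.15
```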
An audit conducted by an independent entity with no commercial interest in the system being assessed.
Why it matters in practice:
Independence is critical for credibility with regulators and enterprise buyers.
May also be called:
Independent audit, external audit