The Assurance Process

The Assurance Process at Warden establishes a standardized, technical framework for evaluating algorithmic hiring systems.

Designed to satisfy jurisdictional precedents and emerging statutory frameworks, including NYC LL 144, the EU AI Act, Colorado SB205, and EDPB algorithmic auditing guidelines, our methodology isolates and measures algorithmic behavior without requiring inspection of proprietary source code.

This process is calibrated to the specific architecture of the evaluated system. The objective is to generate a defensible record of system behavior, enabling organizations to mitigate bias and demonstrate compliance.

Part I: The Assurance Process Timeline

The implementation process follows five defined phases, typically completed within 2–4 weeks.

1. Understanding the System

Reverse Demo & Audit Methodology
Our evaluation begins with a technical onboarding step. The system provider walks Warden's auditing team through their AI system(s). The team conducts a structural review to determine behavioral parameters, data inputs, and scoring mechanisms, establishing the data and process requirements for the audit.

2. Getting the Data Ready

Integration & Calibration
Warden initiates the configuration phase to confirm the data setup is working correctly. Technical oversight is facilitated through secure file-based integration or the Warden API. For audits using the Warden Dataset, our team maps our standardized test data to align with the unique system architecture and grading rubrics.

3. Running the First Audit

First Audit Analysis
Once setup is complete, the first audit is performed. The complete test dataset is run through the AI system and the results are returned. Historical data is also shared where available. Outputs from the AI are sent to the Warden platform for analysis, and audit outputs are then subjected to human review.

4. Reviewing and Remediating

Re-Calibration & Remediation
Warden operates as an independent evaluator; we do not participate in the development or direct alteration of assessed models. If the evaluation detects sub-threshold metrics, we flag the specific proxy variables—or a combination of variables—driving the biased behavior. The vendor utilizes this diagnostic record to facilitate internal mitigation protocols, followed by a secondary evaluation to verify the system meets compliance thresholds.

5. Publishing Results

First Audit Launch & GTM
Once complete, the final audit reports and transparency dashboards are generated. Vendors publicly launch their Warden dashboard and verified status, allowing them to integrate the Warden Assured standard into their go-to-market motions and share the results with their buyers.

Part II: The Technical Architecture

Our methodology is driven by a robust evaluation engine spanning six distinct technical layers.

1. Integration Layer: Behavioral Black-Box Evaluation

Warden evaluates system behavior rather than governance processes. This ensures a high-fidelity assessment of real-world functionality while preserving vendor intellectual property. Technical oversight is facilitated through secure file-based integration or the Warden API. Audits can be performed across two operational environments:

  • Post-Market audits: Standard audits are conducted against the production system (or a system behaviorally equivalent to production). This ensures the evaluated behavior reflects the version interacting with candidates, providing results representative of the live system.
  • Pre-Market audits: Audits are performed on development or staging environments to isolate and measure the behavioral impact of new features prior to deployment. By stress-testing the logic before it reaches production, organizations can confirm that updates do not introduce statistical disparities.

2. Data Layer: Independent Benchmarks

To ensure statistical validity, we utilize a dual-data approach anchored by the proprietary Warden Dataset, optionally complemented by historical data. The Warden Dataset is built on real data (provided under GDPR/CCPA consent, and synthetically manipulated in a controlled process).

By using Warden data, organizations adopt a controlled experimental approach (versus the observational approach of historical data), producing results that cover a wide range of scenarios. This isolates the bias present in the system itself, without confounding it with bias from other sources. Isolating system behavior in this way allows us to pinpoint exact proxy attributes as drivers of bias, rather than only identifying disparate impact at the group level.
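To illustrate the controlled-experiment idea (a hypothetical sketch, not the Warden Dataset itself), test profiles can be generated so that only one attribute varies while all qualifications stay fixed; any difference in system output is then attributable to that attribute alone. The profile fields and names below are invented for illustration.

```python
# Hypothetical sketch of a controlled test grid: each variant differs
# from the base profile in exactly one attribute, so any difference in
# the system's output can be attributed to that attribute alone.
BASE_PROFILE = {
    "years_experience": 5,
    "degree": "BSc Computer Science",
    "skills": ["python", "sql"],
}

def build_variants(attribute, values):
    """Return copies of the base profile that differ only in `attribute`."""
    return [{**BASE_PROFILE, attribute: value} for value in values]

# Vary a single attribute commonly correlated with a protected class.
variants = build_variants("first_name", ["Emily", "Lakisha"])
```

Scoring each variant with the system under test and comparing outcomes pairwise is what separates this controlled design from an observational analysis of historical data.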

3. Protected Classes Layer: Targeted Scope

System behavior is measured across demographic classifications aligned with AI-specific regulations, industry standards, and civil rights law. Depending on the customer’s specific package, evaluations test for disparities across Sex, Race/ethnicity, Age, Disability, Religion, and more than 15 total protected characteristics.

4. Evaluation Layer: Dual-Method Bias Detection

Single statistical tests cannot detect complex discrimination patterns. Warden executes two parallel testing techniques to comprehensively evaluate system fairness:

  • Group bias (Disparate Impact): In this analysis, selection rates are compared on a group level to identify whether a protected demographic group is disproportionately disadvantaged compared to another group. The primary metric is the Impact Ratio, with a standard threshold of 0.80 reflecting the established Four-Fifths Rule.
  • Individual bias (Adversarial Counterfactuals): Aligning directly with the EDPB recommendation for "adversarial auditing" and "sockpuppeting", this technique isolates algorithmic logic. We generate synthetic variations of base test cases, altering a single protected characteristic while holding all qualifications constant. The metric is the Consistency Score, requiring a 95% threshold for a verified outcome.
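As an illustration (not Warden's actual implementation), the two metrics can be sketched in a few lines of Python. The data and function names are hypothetical, assuming binary selected/not-selected outcomes:

```python
from collections import defaultdict

def impact_ratios(records, reference_group):
    """Selection rate of each group divided by the reference group's rate.
    records: iterable of (group, selected) pairs, selected is a bool."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n_selected, n_total]
    for group, selected in records:
        counts[group][0] += int(selected)
        counts[group][1] += 1
    ref_sel, ref_tot = counts[reference_group]
    ref_rate = ref_sel / ref_tot
    return {g: (sel / tot) / ref_rate for g, (sel, tot) in counts.items()}

def consistency_score(pairs):
    """Fraction of (base_outcome, counterfactual_outcome) pairs that match."""
    return sum(base == variant for base, variant in pairs) / len(pairs)

# Group bias: any impact ratio below 0.80 fails the Four-Fifths Rule.
records = ([("A", True)] * 40 + [("A", False)] * 60
           + [("B", True)] * 28 + [("B", False)] * 72)
ratios = impact_ratios(records, reference_group="A")  # B's ratio is 0.70

# Individual bias: 19 of 20 counterfactual pairs agree, i.e. 0.95.
pairs = [(True, True)] * 19 + [(True, False)]
score = consistency_score(pairs)
```

In this toy data, group B's 28% selection rate against group A's 40% yields an impact ratio of 0.70, failing the 0.80 threshold, while the counterfactual consistency of 0.95 sits exactly at the verification boundary.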

5. Compliance Mapping Layer: Regulatory Alignment

Raw technical measurements do not establish regulatory defensibility in isolation. Audit outcomes are mapped to specific statutory requirements. This includes aligning system behavior with the technical expectations of NYC LL 144, the EU AI Act, Colorado SB205, and California FEHA.

6. Transparency & Reporting Layer

Audit results are published publicly, giving buyers and candidates verifiable evidence that a system has been independently evaluated. Vendors ultimately control whether to share results, but withholding findings would leave the public dashboard showing outdated results, out of alignment with the Warden Assured standard.

Continuous Technical Oversight

Point-in-time testing cannot assure compliance across a system's operational lifecycle. The entire framework is underpinned by a recurring audit routine, fulfilling regulatory expectations for ongoing technical oversight. Warden deploys longitudinal audit protocols that monitor system outputs, identifying potential issues that may arise over time.
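A simplified sketch of what such longitudinal monitoring can look like (illustrative only; the window structure and group labels are assumptions): each audit window's selection rates are re-checked against the four-fifths threshold described in the Evaluation Layer.

```python
THRESHOLD = 0.80  # Four-Fifths Rule threshold from the Evaluation Layer

def flag_windows(windows):
    """windows: {window_label: {group: selection_rate}}.
    Flags any window where a group's rate falls below THRESHOLD times
    the highest group rate in that window."""
    flagged = []
    for label, rates in windows.items():
        best = max(rates.values())
        if any(rate / best < THRESHOLD for rate in rates.values()):
            flagged.append(label)
    return flagged

# Hypothetical audit history: Q1 passes, Q2 drifts below the threshold.
audit_history = {
    "2025-Q1": {"group_a": 0.40, "group_b": 0.38},  # ratio 0.95: passes
    "2025-Q2": {"group_a": 0.42, "group_b": 0.30},  # ratio ~0.71: flagged
}
flagged = flag_windows(audit_history)
```

Running a check like this on every audit cycle is what turns a point-in-time result into a longitudinal record.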
