The Mobley v. Workday case has sparked concern across the HR and AI community. While headlines focus on allegations of algorithmic bias, our recent panel highlighted a deeper issue:
Too many decisions are being driven by assumptions. Not enough are grounded in evidence.
We brought together a group of experts to unpack the case and its wider implications.
Here’s how we see assumptions getting in the way, and what we need to do differently.
Many people are treating the Mobley case as confirmation that Workday’s AI system is biased. But we’re still at the allegation stage. As Jung put it:
“No evidence has been produced yet. This case is years away from being resolved, and we’re still in a heated discovery battle over what data is even available.”
Jumping to conclusions before evidence is introduced (whether as buyers, vendors, or regulators) is risky and reactionary.
There’s a widespread assumption that because AI is trained on historical data (and historical data reflects human bias), all AI must be biased too.
But as I shared during the session:
“We’ve seen in many cases that AI systems, when tested properly, are actually less biased than human decision-making.”
In fact, our own data suggests that across a range of hiring scenarios, AI systems audited for bias tend to perform more fairly on average than their human counterparts.
That fairness isn’t guaranteed. But it’s achievable when we treat bias measurement as a foundational requirement.
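To make "bias measurement" concrete, here is a minimal sketch of the kind of check we mean: the adverse impact ratio (the familiar four-fifths rule) computed from a hiring-funnel export, applied to both the AI recommendation and the human decision so each is held to the same measurable standard. The CSV file and column names are hypothetical placeholders, not any particular vendor's schema.

```python
# Minimal sketch: adverse impact ratio ("four-fifths rule") from hiring-funnel data.
# The file name and column names are hypothetical; adapt them to your own ATS export.
import pandas as pd

def adverse_impact_ratios(df: pd.DataFrame, group_col: str, outcome_col: str) -> pd.Series:
    """Selection rate of each group divided by the highest group's rate.

    Values below 0.8 are the conventional red flag for disparate impact.
    """
    rates = df.groupby(group_col)[outcome_col].mean()  # share selected per group
    return rates / rates.max()

applicants = pd.read_csv("applicants.csv")  # one row per applicant

# Run the same check on the AI recommendation and on the recruiter decision,
# so both are measured rather than assumed.
print(adverse_impact_ratios(applicants, "gender", "ai_recommended"))
print(adverse_impact_ratios(applicants, "gender", "recruiter_advanced"))
```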
There’s a persistent belief that human decision-making is the safer fallback. But that assumes human judgment is consistent, fair, or even measurable. It often isn’t.
Sarah put it plainly:
“We couldn’t tell if bias was in the system or in our recruiters. We had to assemble a cross-functional team (legal, data scientists, architects) just to start untangling it.”
AI gets held to a high standard. Human decision-making rarely does. But if we don’t test either, we’re managing neither.
Sarah shared a telling example. Her team implemented an AI matching tool based on vendor promises, only to discover later it wasn’t doing what they thought.
“We assumed it was smart AI. It turned out to be basic keyword matching, not a learning algorithm. And when we dug deeper, it was hard to tell if any of the reporting could actually show whether bias was happening or not.”
Vendor branding shouldn’t replace scrutiny. If you don’t validate how a tool performs in your environment, you’re running on trust, not facts.
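One way to put that scrutiny into practice is to compare the tool's output against a naive baseline on your own data. The sketch below is an illustration under assumptions, not any vendor's actual API: it assumes you can export the tool's match scores alongside the underlying text, and it uses a hypothetical CSV and column names.

```python
# Minimal sketch: does the tool's "match score" add signal beyond naive keyword overlap?
# Assumes an exported CSV with job_text, resume_text, vendor_score columns (hypothetical).
import pandas as pd
from scipy.stats import spearmanr

def keyword_overlap(job_text: str, resume_text: str) -> float:
    """Fraction of job-description words that also appear in the resume."""
    job_words = set(job_text.lower().split())
    resume_words = set(resume_text.lower().split())
    return len(job_words & resume_words) / max(len(job_words), 1)

exported = pd.read_csv("vendor_scores.csv")
baseline = [
    keyword_overlap(row.job_text, row.resume_text) for row in exported.itertuples()
]

rho, _ = spearmanr(exported.vendor_score, baseline)
print(f"Rank correlation with plain keyword overlap: {rho:.2f}")
# A correlation near 1.0 suggests the "AI" is doing little more than keyword matching.
```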
Many vendors (and customers) assume that if an audit has been done, the system must be fair. But most audits only cover race and gender. And even those are typically run just once a year.
“Less than 5% of third-party audits we see include characteristics like age or disability,” I noted. “That’s a major gap. And Mobley, focused on age discrimination, shows why it matters.”
Without comprehensive and regular testing, there’s no reliable picture of how a system behaves and no defensible evidence if it’s challenged.
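As an illustration of what "comprehensive and regular" could look like, here is a sketch that extends the same adverse-impact check beyond race and gender to age bands and disability status, and is meant to run on a schedule rather than as an annual snapshot. Again, the column names are hypothetical placeholders.

```python
# Minimal sketch: the adverse-impact check extended to more protected characteristics,
# intended to run on a recurring schedule (e.g. monthly). Column names are hypothetical.
import pandas as pd

AUDITED_CHARACTERISTICS = ["race", "gender", "age_band", "disability_status"]

def audit(df: pd.DataFrame, outcome_col: str) -> pd.DataFrame:
    """Adverse impact ratio per group for every audited characteristic."""
    rows = []
    for col in AUDITED_CHARACTERISTICS:
        rates = df.groupby(col)[outcome_col].mean()
        ratios = rates / rates.max()
        for group, ratio in ratios.items():
            rows.append({"characteristic": col, "group": group, "impact_ratio": ratio})
    return pd.DataFrame(rows)

applicants = pd.read_csv("applicants.csv")
report = audit(applicants, "ai_recommended")
print(report[report.impact_ratio < 0.8])  # groups falling below the four-fifths threshold
```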
What Mobley v. Workday really underscores is that lack of measurement and evidence is the core risk. Not AI itself.
We don’t need to pause AI adoption. But we do need to pause assumptions. Bias in hiring can’t be assessed (let alone reduced) without robust measurement. And right now, most organizations simply don’t have the data.
Fairness in AI isn’t automatic. But it’s not hypothetical either. The data shows it’s possible.
If we want to build fairer hiring systems, we have to start by measuring them.