Why Agents Amplify Trust Violations at Scale

One human’s bad day affects 20 people. One agent’s bug affects 20,000.

Mar 23, 2026

Customer service rep has a bad day. Gives wrong information to a handful of customers. Manager catches it. Retrains rep. Problem contained. Twenty people affected. Unfortunate, not catastrophic.

Now: Agent has a bug. Gives wrong information to thousands before anyone notices.

Automation doesn’t just scale execution.
It scales failure.

The Amplification Effect

When humans handle conversations, both trust-building and trust-destruction happen slowly:

Human Trust-Building:

One good interaction at a time
Quality varies by individual and context
Self-correcting (rep notices confusion, adjusts)
Limited throughput

Human Trust-Destruction:

One bad interaction at a time
Contained to individual rep’s shift
Usually caught before massive damage
Recovery localized

Agents change everything.

You can build trust faster than ever.
Or destroy it faster than ever.

What This Looks Like in Practice

Imagine a large financial services company. They launch a loan application agent to handle pre-qualification questions—routine stuff that should be straightforward.

The agent works beautifully in testing. Week one in production looks great: 3,000 successful interactions, customers praising the speed and convenience. Then week two hits.

A configuration error causes the agent to miscalculate debt-to-income ratios. Not wildly wrong—off by about 15%—but enough to pre-qualify people who shouldn’t be… and reject people who should have qualified. The team catches it within 18 hours. By then, 1,200 applicants have received incorrect guidance.

But the trust collapse isn’t just among those 1,200 people. Their financial advisors hear about it. Consumer protection forums pick it up. Within a week, loan officers are fielding calls from customers asking, “Can I trust what the agent told me?” The damage spreads far beyond the direct impact.

When they fix the bug and relaunch the agent with additional safeguards three weeks later, adoption is half of what was initially projected. People have learned: “That agent gets financial calculations wrong.” The perception sticks even though the bug is fixed and the agent now calculates correctly.

One bug can impact thousands.
One failure can define perception.

Automation Amplifies Both Sides

Agent Trust-Building:

Thousands of good interactions simultaneously
Consistent quality (when working correctly)
24/7 availability
Unlimited throughput

Agent Trust-Destruction:

Thousands of bad interactions simultaneously
Consistent failure (when broken)
24/7 damage
Unlimited destruction

The scale cuts both ways. You can build Conversational Capital faster than ever before. Or destroy it faster than ever before.

Most teams design for scale.

They don’t design for failure at scale.

This isn’t a bug problem.
It’s a system design problem.

This is also a financial problem.

When agents fail at scale, you don’t just get defects.

You get:

increased churn
higher cost to serve (more escalations, more recovery work)
reduced adoption of automation
long-term brand damage

What looks like a technical issue becomes a capital destruction event.

The Real-World Math

Healthcare system launches appointment scheduling agent.

Week 1 (working correctly):

5,000 successful bookings
Users delighted: “This is so convenient!”
Massive trust deposits
Net capital: +5,000 units

Week 2 (bug introduced):

Bug causes agent to double-book 800 appointments
Users show up, told appointment doesn’t exist
Massive trust withdrawals
Net capital: -12,000 units (remember the asymmetry)

Total net capital: -7,000 units

Week 1’s gains completely erased. Now deeply in the negative.

And here’s the worst part: The 4,200 users whose appointments worked fine in Week 2? They heard about the 800 failures. Trust contaminated.
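To make the ledger concrete, here is a minimal Python sketch of the arithmetic above. The weekly nets are the figures from the example. The linear model and the function name `implied_cost_per_failure` are illustrative assumptions, not a stated formula: Week 1's numbers imply one unit per successful booking, and the sketch backs out what each failure would have to cost for Week 2 to land at -12,000.

```python
# Weekly net trust capital, taken from the example above.
week1_net = 5_000     # 5,000 successful bookings, stated as +5,000 units
week2_net = -12_000   # stated net for the week with 800 double-bookings

total_net = week1_net + week2_net


def implied_cost_per_failure(week_net, successes, failures, gain_per_success=1.0):
    """Back out the per-failure cost that produces week_net.

    Assumes a hypothetical linear model:
        week_net = successes * gain_per_success - failures * cost_per_failure
    """
    return (successes * gain_per_success - week_net) / failures


# Week 2: 800 bookings failed, 4,200 worked.
cost = implied_cost_per_failure(week2_net, successes=4_200, failures=800)

print(total_net)  # -7000
print(cost)       # 20.25
```

Under that assumed model, each failed booking costs roughly twenty times what a successful one earns. The exact ratio depends on how you score Week 2's successes. The order of magnitude is the point.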

Why “Move Fast and Break Things” Fails

Silicon Valley mantra: Ship fast. Break things. Fix them. Users forgive.

Works for features. Catastrophic for agents.

Feature breaks:

User frustrated
User waits for fix
Fix deployed
User moves on

Agent breaks:

User learns “I can’t trust this”
Fix deployed
User still doesn’t trust it
Learning persists

You can patch code quickly.
You can’t patch broken trust.

The Network Effect

Human rep gives wrong info to 20 people. Those 20 people: inconvenienced. Maybe 5 tell someone. Maybe 1 posts online. Limited spread.

Agent gives wrong info to 2,000 people. Those 2,000 people: violated at scale. 500 tell colleagues. 100 post online. 50 write reviews.

Within days, 10,000+ people know “that agent failed.” Most haven’t even used it. Already distrustful.

Network effects amplify trust destruction beyond direct user base.
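A quick sketch of why the spread differs. The share rates below are derived from the numbers above (5 of 20 and 500 of 2,000 tell someone; 1 of 20 and 100 of 2,000 post online). The function name `spread` and the idea of treating the rates as fixed are assumptions for illustration.

```python
def spread(affected, tell_rate=0.25, post_rate=0.05):
    """Estimate how many affected people share the failure.

    The default rates are the ones implied by the example figures;
    real rates would need to be measured.
    """
    return {
        "tell_someone": round(affected * tell_rate),
        "post_online": round(affected * post_rate),
    }


human_case = spread(20)
agent_case = spread(2_000)

print(human_case)  # {'tell_someone': 5, 'post_online': 1}
print(agent_case)  # {'tell_someone': 500, 'post_online': 100}
```

People don't complain more about agents. The per-person rates are identical. Only the volume changes, and one online post versus a hundred is the difference between an anecdote and a reputation.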

The “Good Enough” Trap at Scale

80% accuracy sounds reasonable. Human reps aren’t perfect either.

But scale changes the math:

Human rep, 80% accuracy:

Affects 50 customers/day
10 failures/day
Contained, correctable

Agent, 80% accuracy:

Affects 5,000 customers/day
1,000 failures/day
Viral, catastrophic

Same percentage. Completely different impact.
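The same math as a short Python sketch. The volumes and the 80% figure come from the comparison above. The second function, `accuracy_for_failure_budget`, is a hypothetical helper that asks the reverse question: how accurate would the agent need to be to cause no more daily failures than one human rep?

```python
def daily_failures(customers_per_day, accuracy):
    """Expected number of wrong answers per day at a given accuracy."""
    return round(customers_per_day * (1 - accuracy))


def accuracy_for_failure_budget(customers_per_day, max_failures):
    """Accuracy needed to stay within a daily failure budget."""
    return 1 - max_failures / customers_per_day


human_failures = daily_failures(50, 0.80)
agent_failures = daily_failures(5_000, 0.80)
needed = accuracy_for_failure_budget(5_000, human_failures)

print(human_failures)    # 10
print(agent_failures)    # 1000
print(f"{needed:.1%}")   # 99.8%
```

Matching a single rep's absolute failure count at agent volume takes about 99.8% accuracy. Whether absolute count is the right yardstick is a judgment call, but it shows why "as good as a human" is the wrong bar.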

Why Early Failures Are Exponentially Worse

Launch agent to 1,000 users. Week 1: 10% failure rate (100 bad experiences).

Those 100 users:

Had no prior trust capital accumulated
First impression = failure
Tell others
Create distrust before most users even try it

Pre-poisoned the well.

By the time you fix it, market perception: “That agent doesn’t work.” Even though it works fine now.

Recovery from a bad launch tends to be extraordinarily difficult, sometimes impossible.

The Implication for Design

Given amplification at scale:

Consider HIGHER reliability for agents than humans. Not the same. Higher.

Human reps: 85% might be an acceptable baseline. Agents: Typically need 95%+ before launch, 98%+ in production. Because one mistake affects thousands, not dozens.

Consider SLOWER rollout than traditional features. Not ship to everyone day one. Phased: Internal → Pilot → Limited → Full. Catch problems when blast radius is small.
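One way to sketch a phased rollout in Python. The stage names come from the paragraph above and the 95% and 98% bars from the reliability guidance. The exposure fractions, the mapping of bars to stages, and the names `Stage`, `STAGES`, and `next_stage` are all assumptions for illustration, not a prescribed process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    exposure: float      # fraction of customers who see the agent (hypothetical)
    min_accuracy: float  # measured accuracy required to move past this stage


STAGES = [
    Stage("Internal", 0.00, 0.95),
    Stage("Pilot", 0.01, 0.95),
    Stage("Limited", 0.10, 0.98),
    Stage("Full", 1.00, 0.98),
]


def next_stage(current, measured_accuracy):
    """Return the index of the stage to run next.

    Advance one step only when measured accuracy clears the current
    stage's bar. Otherwise hold at the current stage.
    """
    stage = STAGES[current]
    if measured_accuracy >= stage.min_accuracy and current < len(STAGES) - 1:
        return current + 1
    return current


print(STAGES[next_stage(0, 0.96)].name)  # Pilot
print(STAGES[next_stage(2, 0.96)].name)  # Limited
```

The design choice that matters is that exposure grows only after accuracy is measured at the smaller exposure. A 96% agent gets out of Internal but stays stuck at Limited until it clears 98%.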

Adopt CONSERVATIVE design. When uncertain: escalate to human. When stakes high: require confirmation. When knowledge contradictory: admit limitation. Protect against catastrophic failures at scale.
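The three rules above can be sketched as a single decision function. This is a minimal illustration, assuming the agent exposes a confidence score and flags for stakes and conflicting sources. The names `Action` and `decide`, the 0.9 threshold, and the order in which the rules are checked are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    ANSWER = "answer"
    ESCALATE_TO_HUMAN = "escalate_to_human"
    REQUIRE_CONFIRMATION = "require_confirmation"
    ADMIT_LIMITATION = "admit_limitation"


def decide(confidence, high_stakes, sources_conflict, min_confidence=0.9):
    """Pick the most conservative action that applies.

    The check order is an assumption: contradictory knowledge first,
    then low confidence, then high stakes.
    """
    if sources_conflict:
        return Action.ADMIT_LIMITATION
    if confidence < min_confidence:
        return Action.ESCALATE_TO_HUMAN
    if high_stakes:
        return Action.REQUIRE_CONFIRMATION
    return Action.ANSWER


print(decide(0.97, high_stakes=False, sources_conflict=False).value)  # answer
print(decide(0.60, high_stakes=False, sources_conflict=False).value)  # escalate_to_human
print(decide(0.97, high_stakes=True, sources_conflict=False).value)   # require_confirmation
```

Answering directly is the last branch, not the first. The agent has to earn its way past every guard before it speaks on its own.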

Scale is a multiplier.
It multiplies both success and failure.

The Bottom Line

Automation amplifies everything.

When your agent works: trust builds faster than ever. When your agent fails: trust is destroyed faster than ever.

The asymmetry we discussed earlier? Scale makes it worse. One bug can destroy more capital in hours than months of good service have built.

This is why:

80% accuracy often isn’t good enough
“Move fast and break things” typically doesn’t work
Bad launches are difficult to recover from
Conservative design tends to be essential

Scale is a multiplier. It multiplies both success and failure.

Design knowing that every decision affects thousands, not dozens.

What patterns are you seeing with agent failures at scale? I’d love to hear how teams are handling this in your context.

Part of a series on Agentic Experience Design — the discipline of designing AI systems that act autonomously while building trust, not destroying it.

Pawel Jozefiak

The 20 vs 20,000 framing is clean and I keep coming back to it. Running a Mac Mini agent 24/7 for months, I've had the equivalent of 'one bad day' situations - a config getting deleted, wrong data sent somewhere.

Small blast radius because the system was still limited. But you're right that the threshold shifts as you hand over more. The '95% before launch, 98% in production' bar is interesting - I haven't seen it stated that precisely before.

The implication is that most agents people are deploying right now don't actually meet it, and nobody's measuring.
