2 Comments
Pawel Jozefiak

The 20 vs 20,000 framing is clean and I keep coming back to it. Running a Mac Mini agent 24/7 for months, I've had the equivalent of 'one bad day' situations - a config getting deleted, wrong data sent somewhere.

Small blast radius because the system was still limited. But you're right that the threshold shifts as you hand over more. The '95% before launch, 98% in production' bar is interesting - I haven't seen it stated that precisely before.

The implication is that most agents people are deploying right now don't actually meet it, and nobody's measuring.

Hans van Dam

Good points. It's also important not to treat all deployments the same. There is a big difference between building a personal agent and building one for a small business, a mid-size company, or a public company in a highly regulated industry. Each organization has its own risk thresholds.

what are the chances of something going wrong?

what are the consequences if something goes wrong?

and how do we feel about that risk profile?

Those questions should help teams figure out what they should and shouldn't set free.