Discussion about this post

Pawel Jozefiak

The 20 vs 20,000 framing is clean and I keep coming back to it. Running a Mac Mini agent 24/7 for months, I've had my own 'one bad day' situations: a config getting deleted, wrong data sent somewhere.

The blast radius was small because the system was still limited. But you're right that the threshold shifts as you hand over more. The '95% before launch, 98% in production' bar is interesting - I haven't seen it stated that precisely before.

The implication is that most agents people are deploying right now don't actually meet it, and nobody's measuring.
