Discussion about this post

Giving Lab

Great framing — “toddler, not coworker” matches what we’re seeing in production.

One thing that helped us was treating autonomy like a budget, not a binary: agents can auto-execute low-risk tasks, but anything with destructive potential (deletes, sends, edits at scale) requires explicit checkpoints plus morning audit logs. That keeps the upside without pretending supervision is optional.
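The "autonomy as a budget" idea above can be sketched in a few lines. Everything here is illustrative, not anyone's actual implementation: the action names, the `DESTRUCTIVE` set, and the `AutonomyBudget` class are all invented to show the shape of the pattern (low-risk actions auto-execute, destructive ones need an explicit checkpoint, and every decision lands in an audit log for the morning review).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical set of destructive action types that always
# require a human checkpoint before execution.
DESTRUCTIVE = {"delete", "send", "bulk_edit"}

@dataclass
class AutonomyBudget:
    audit_log: list = field(default_factory=list)

    def request(self, action: str, approved: bool = False) -> bool:
        """Return True if the agent may execute the action now."""
        needs_checkpoint = action in DESTRUCTIVE
        allowed = approved or not needs_checkpoint
        # Every decision is logged, allowed or not, so the
        # morning audit sees the full picture.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "checkpoint_required": needs_checkpoint,
            "executed": allowed,
        })
        return allowed

budget = AutonomyBudget()
print(budget.request("summarize_inbox"))        # low-risk: auto-executes
print(budget.request("delete"))                 # destructive, unapproved: blocked
print(budget.request("delete", approved=True))  # destructive but checkpointed
```

The point of logging denied requests too is that the morning audit should answer "what did the agent *try* to do," not just "what did it do."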

Pawel Jozefiak

That toddler analogy is accurate. I've been running an AI agent (Wiz) on night shifts for months now, and the honest description is: it does real work, but it also does real surprising things.

The turning point wasn't making it smarter. It was building structure around it: explicit memory files, audit logs, defined autonomy levels per context. Not because the agent needed more guardrails, but because I needed clearer signals to know what to review in the morning.
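A minimal version of "autonomy levels per context" might look like the sketch below. The context names, levels, and the `allowed_mode` helper are all made up for illustration; the key design choice is that unknown contexts fall through to the safest level rather than the most permissive one.

```python
# Hypothetical per-context autonomy levels:
# 0 = read-only, 1 = propose-only, 2 = auto-execute.
AUTONOMY = {
    "research": 2,    # safe to run unattended overnight
    "email": 1,       # drafts only; nothing actually sends
    "filesystem": 0,  # read-only during night shifts
}

def allowed_mode(context: str) -> str:
    """Map a context to what the agent may do in it overnight."""
    level = AUTONOMY.get(context, 0)  # unlisted contexts default to read-only
    return {0: "read-only", 1: "propose-only", 2: "auto-execute"}[level]

print(allowed_mode("research"))  # auto-execute
print(allowed_mode("email"))     # propose-only
print(allowed_mode("payments"))  # read-only: unlisted, so safest default
```

Defaulting unlisted contexts to read-only is what makes the table reviewable in the morning: anything the agent couldn't do shows up as a gap in the config, not as a surprise in production.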

The messiness you're describing is actually the right phase. It means you're past demos and into something real. I wrote about the early night shift setup here: https://thoughts.jock.pl/p/building-ai-agent-night-shifts-ep1

Curious what monitoring patterns you've landed on. That's where I see the biggest gap in how people talk about agents publicly.
