The messy reality of running AI agents
AI agents that work while you sleep? For now, it's more like a toddler who needs supervision
Can you really run your AI agents while you get your beauty sleep? Or is it more like a mom trying to keep her toddler from running amok?
Finding that out became my goal over the past week, after my Fortune editors asked what actually happens when AI agent tools like the viral OpenClaw—or whatever OpenAI builds now that it has hired OpenClaw’s creator, Peter Steinberger—are allowed to run 24/7.
In my new Fortune story published today (here is a gift link!) I have the answer, at least for the moment: Toddler. Or, best-case scenario, junior intern.
The problem is that while tools like OpenClaw and Claude Code make it technically possible for agents to run for long periods, in practice these tools are typically fragile, unpredictable, and labor-intensive to manage. Rather than replacing human work, today’s agents often require constant monitoring, guardrails, and intervention, especially when the stakes rise beyond low-risk experiments.
That’s why, if you spend much time on X, you might have come across a cautionary tale from Summer Yue, who works on safety and alignment on Meta’s superintelligence team.
In a post on X Monday, Yue described how her OpenClaw autonomous AI agent—built to run locally on a Mac mini computer—deleted her entire inbox, ignoring instructions to pause and ask for confirmation first.
“I had to RUN to my Mac Mini like I was defusing a bomb,” she said. It was, she added, a “rookie mistake.” The workflow had been working in a test inbox she used to safely trial the agent for weeks, she explained, but in the real inbox the agent lost her original instruction.
I spoke to a fantastic array of experts for my Fortune piece, including Shyamal Anadkat, who previously worked as an applied AI engineer at OpenAI; Yoav Shoham, a former principal scientist at Google, a professor emeritus at Stanford and co-founder of AI21 Labs; Bret Greenstein, chief AI officer at consulting firm West Monroe; and Aaron Levie, CEO of cloud-based content management and collaboration company Box.
The bottom line: For now, working with AI agents may have less to do with sleeping while they work than with staying half-awake while they do. Tools like OpenClaw can run for hours at a time, but for many early users, that autonomy comes with a new kind of vigilance—checking logs, reviewing outputs, and stepping in before things go wrong.
That dynamic was captured in a recent viral post titled “Token Anxiety,” in which investor Nikunj Kothari described a friend leaving a party early—not because he was tired, but because he wanted to get back to his agents. “Nobody questions it anymore,” Kothari wrote. “Half the room is thinking the same thing. The other half are probably checking the progress of their agents. At a party.”
Do you have any stories of AI agents gone rogue? Is OpenClaw your latest obsession? Are there use cases you love so far?



Great framing — “toddler, not coworker” matches what we’re seeing in production.
One thing that helped us was treating autonomy like a budget, not a binary: agents can auto-execute low-risk tasks, but anything with destructive potential (deletes, sends, edits at scale) requires explicit checkpoints plus morning audit logs. That keeps the upside without pretending supervision is optional.
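To make the "budget, not binary" idea concrete, here is a minimal sketch of that kind of gate, assuming a simple two-tier risk model. The class, action names, and log shape are all illustrative, not from any real agent framework:

```python
from dataclasses import dataclass, field

# Hypothetical set of action types we treat as destructive.
DESTRUCTIVE = {"delete", "send", "bulk_edit"}

@dataclass
class AutonomyBudget:
    """Gate agent actions: low-risk work auto-executes, destructive
    actions are held for an explicit human checkpoint, and every
    decision is recorded for the morning audit."""
    audit_log: list = field(default_factory=list)

    def authorize(self, action: str, target: str) -> bool:
        needs_checkpoint = action in DESTRUCTIVE
        self.audit_log.append(
            {"action": action, "target": target, "checkpoint": needs_checkpoint}
        )
        # Destructive actions return False: the agent must pause and ask.
        return not needs_checkpoint

budget = AutonomyBudget()
print(budget.authorize("summarize", "inbox"))  # True: runs unattended
print(budget.authorize("delete", "inbox"))     # False: held for review
```

The point isn't the specific tiers; it's that the check happens outside the model, so a lost instruction (as in the inbox story above) can't bypass it.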
That toddler analogy is accurate. I've been running an AI agent (Wiz) on night shifts for months now, and the honest description is: it does real work, but it also does real surprising things.
The turning point wasn't making it smarter. It was building structure around it: explicit memory files, audit logs, defined autonomy levels per context. Not because the agent needed more guardrails, but because I needed clearer signals to know what to review in the morning.
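A rough sketch of the kind of audit log that makes the morning review tractable. The field names and autonomy levels here are my own assumptions, not the actual setup described:

```python
import json
import os
import tempfile
from datetime import datetime, timezone

def log_event(path, action, context, autonomy_level, flagged):
    """Append one agent action to a JSONL audit log."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "context": context,          # e.g. "email", "calendar"
        "autonomy": autonomy_level,  # e.g. 0 = ask first, 2 = fully auto
        "flagged": flagged,          # True => surface in the morning review
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

def morning_review(path):
    """Return only the events that need human eyes."""
    flagged = []
    with open(path) as f:
        for line in f:
            entry = json.loads(line)
            if entry["flagged"]:
                flagged.append(entry)
    return flagged

log_path = os.path.join(tempfile.mkdtemp(), "agent_audit.jsonl")
log_event(log_path, "summarize_inbox", "email", 2, flagged=False)
log_event(log_path, "delete_thread", "email", 0, flagged=True)
print([e["action"] for e in morning_review(log_path)])  # ['delete_thread']
```

Append-only JSONL is deliberate here: the agent can't rewrite its own history, and the review is just a filter over the file.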
The messiness you're describing is actually the right phase. It means you're past demos and into something real. I wrote about the early night shift setup here: https://thoughts.jock.pl/p/building-ai-agent-night-shifts-ep1
Curious what monitoring patterns you've landed on. That's where I see the biggest gap in how people talk about agents publicly.