It’s been one of those weeks.
You know, one of those weeks when even my AI reporter peers say the scale and speed of the news cycle is “crazy” and “unsustainable.” When I almost can’t think straight from all of the AI announcements from Google. Microsoft. OpenAI. Anthropic. What do I stick my pin in today? I have to commit. I must come up with a hot take. Or give up and take a nap. Just ten minutes, I promise.
Still, a good idea wakes me right up. So does some great social media buzz, which sends me down rabbit holes to figure out what’s real and what’s hype.
Here’s what I stuck my pin in this week:
The AI model wars are SO not over.
Weren’t models supposed to turn into cheap commodities, with no one caring who built what? Fuhgeddaboutit. Gemini Diffusion was an experimental research model with a demo that didn’t get stage time at Google I/O. But I was fascinated by the social media hype that piled on immediately because of the blazing speed of the demo and because the diffusion technique was applied to text rather than just images. Even so, some experts I respect, like Nathan Lambert, cautioned that it’s still just a demo: Gemini Diffusion might be the “biggest endorsement yet of the [text diffusion] model, but we have no details so can’t compare well.”
But the model wars heated up further yesterday when Anthropic dropped Claude Opus 4 and Claude Sonnet 4, claiming that the Opus model is “the world’s best coding model,” and that on a benchmark comparing how well different large language models perform on software engineering tasks, Anthropic’s two models beat OpenAI’s latest models, while Google’s best model lagged behind.
I didn’t have bandwidth or space to get into it, but I actually dislike focusing on these benchmarks — I’ve spoken to many experts, including Lambert, about benchmark gaming. And my AI reporting peer, Kyle Wiggers at TechCrunch, has written extensively about this. I’d love to dive more deeply into this, but I’ll have to put a pin in it for now, lol.
Note to self: Read the 123-page Anthropic Claude 4 system card. (Or just feed it into Claude to decipher, ha ha.)
Wanted: AI for common sense.
For my weekly Eye on AI newsletter for Fortune (it’s free to sign up; I write it on Thursdays and my colleague Jeremy Kahn writes it on Tuesdays), I spoke to Salesforce AI research chief scientist Silvio Savarese about AI and common sense. Turns out even many of the most sophisticated models, which can solve highly complex math problems, can have a hard time with simple riddles, like a version of the classic one about the farmer and the fox and the chicken and the grain.
In Savarese’s test, the rowboat could hold the farmer, fox, chicken and grain together — but the LLMs still reverted to many steps, as in the classic riddle. However, TIME’s Billy Perrigo pinged me to let me know Claude 4 had solved Savarese’s version. Perhaps this is a common sense issue that will soon be solved across the board?
What I haven’t read yet but I bought on my Kindle and plan to read over the holiday weekend:
Karen Hao’s Empire of AI: Dreams and Nightmares in Sam Altman’s OpenAI
Hope it doesn’t disrupt my beauty sleep.
Altman ❤️ Ive
Speaking of OpenAI, I didn’t have time to tackle the BIG news that OpenAI hired former Apple design chief Jony Ive, the designer of the iPhone, as part of a $6.5 billion acquisition that holds the potential to create an entirely new way for people to interact with AI technology.
Besides the massive Big Tech implications, I’m also simply fascinated by how drastically new user interfaces are going to change the way we use the internet. I actually spoke to a really interesting startup this week, with big names attached, that is going deep into a consumer play on this. More to come on that.
Ch-Ch-China
I attended a really interesting dinner/salon this week with guest speaker Dmitri Alperovitch, author of World on the Brink: How America Can Beat China in the Race for the Twenty-First Century.
The geopolitics of AI related to China is big industry news every day of the week, and Alperovitch had some spicy takes on Nvidia that I plan to circle back on.
Note to self: Read World on the Brink. After Karen Hao’s book, perhaps, or maybe keep switching back and forth until I start screaming.
Other tidbits from me apropos of nothing:
I’ve been binge-watching Mad Men. I haven’t watched it since the initial run. The seventh-season episodes in which the agency gets the IBM 360 computer, which takes over the creative offices and freaks everyone out, are very reminiscent of today’s AI moment.
I’m a musician in my other life, and I spent an hour last week chatting with ChatGPT about my new sound system, which I was struggling to use at my latest gig. It was incredibly helpful. I mean, what’s not to love about a response like this?
🚀 Want me to write out a step-by-step plan for connecting the A/B switch and DI box into your current setup? I can make it super clear and organized so you don’t have to stress during rehearsal. Want me to do that?
Yes. Yes, in fact, I do.
Note to self: Delete X, Threads, LinkedIn, Slack and Outlook for entire holiday weekend. Substack app can stay.