2024: An AI 'hot mess'? Chatting with Nathan Lambert
As the New Year approaches, I spoke with the former Hugging Face and current AI2 researcher on AI uncertainty, open vs. closed models, and the potential 2024 "hot mess"
Happy Solstice!
(Here’s a song I wrote about the Solstice back in 2019)
As we hit the brakes on 2023 and get ready to rev up 2024, for this week's AI Extra I took some time to chat with Nathan Lambert, who was formerly at Hugging Face "bootstrapping an RLHF team" and is now a machine learning researcher at the Allen Institute for AI. He also writes a great Substack and co-hosts the Retort AI podcast with another researcher I'm a fan of, Thomas Krendl Gilbert (who I talked to back in September about AI as 'alchemy').
Lambert and I met on a Google Meet a few days ago — where I was disappointed to find that the recording function doesn't allow for a split screen (like Zoom), so the below 15-minute video only shows the speaker. But we had a great chat about everything from how, at the end of 2023, we still don't know how LLMs work, to open vs. closed models and the potential 2024 election "hot mess":
Sharon: What does it mean that we're, you know, just charging along in this way around something where there's still so much unknown and so much uncertainty?
Nathan Lambert: It is definitely true that we don't have good causal mechanisms to understand why these things are working, or even to understand our own experiments to see how they're working. I think a lot of this goes back to this famous Ilya Sutskever quote that was on something like a tutorial or talk at a top conference, where the last slide is just like: you have a big model and a lot of data, success is guaranteed. It's just the kind of theme of this year.
I mean, in my own work, too, most of the things that you try, as long as you're pretty principled about the data and have the right infrastructure to do it, most of the stuff works. My personal experience is there's this direct preference optimization method that's becoming popular in fine-tuning. And in my career, I left Hugging Face, and then we released the Zephyr model, which was trained with this method, and then I joined AI2, and they were playing with it. And then, like, one week later, we ran another model, and it was good. And we released it. And it's just like, things don't normally just work like that. Machine learning was a very ripe space for people to expand into.
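A quick aside for readers unfamiliar with the method Lambert mentions: direct preference optimization (DPO) fine-tunes a model directly on pairs of preferred and rejected responses, skipping the separate reward model used in classic RLHF. Here's a minimal sketch of the core loss in PyTorch; the tensor names and toy numbers are purely illustrative and not from Zephyr's actual training code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Core DPO loss over a batch of preference pairs.

    Each argument is a tensor of summed log-probabilities that the trainable
    policy (or the frozen reference model) assigns to the chosen / rejected
    completion of each prompt.
    """
    # Implicit "rewards": how much the policy has moved away from the reference
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the margin between chosen and rejected rewards to be large
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probabilities for a batch of 3 preference pairs
policy_chosen = torch.tensor([-12.3, -8.7, -15.1])
policy_rejected = torch.tensor([-14.0, -9.9, -15.0])
ref_chosen = torch.tensor([-12.8, -9.0, -15.5])
ref_rejected = torch.tensor([-13.5, -9.5, -15.2])
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```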
Sharon: Is that really what's driven some of the excitement, at least on the research side? Do you think that's part of what keeps driving this is that, wow, this is stuff we've been working on for so long. And now it's just kind of working. We don't know why. But things are just working. And that's really cool. Because we don't always get that.
Nathan Lambert: Yeah, I mean, it's pretty odd. One of the models that we released, the biggest one, we literally ran one run on this big computer, and it worked, and we were like, oh, we'll release it. We didn't do other experiments; it was good enough, so we did. But it was literally one and done, which is bizarre.
In the bigger context, it seems to me and a lot of people that OpenAI were the first ones to find some really strong recipes that worked for them. They were just earlier and really good at it, and they've just been mining the vein. And that's how they got GPT-4 well before anyone else had a model like that. And they're still doing this, I'm sure there's more behind the scenes. But that doesn't mean that their vein is based on some intricate first-principles understanding that no one else has; it's just that they've been working in this space a little bit longer and have tried some more things and have a few budding intuitions. But I don't think they're like, 'We know exactly what to put in our models.' They just try a lot of things, and they're ahead in the timeline of things and have the best models from that.
Sharon: But what does that mean for the general public? It would certainly be bizarre to know that these are products and applications being created on a foundation where the underlying sand is sort of shifting, and that we don't really know how it works exactly. What does it mean to not know how it works? Does that matter? Should we be concerned about that? Is it just the nature of the way these things evolve?
Nathan Lambert: I mean, I think this shaky foundation is really the primary motivator behind many of the little ideologies of AI that we have. So not knowing why it works is why people that are focused on AI safety and existential risk will say that we don't know how bad these unknown unknowns will be. But then also, for the AI critics in the market cycle that we have, it's "we can't rely on this; we don't know how far scaling laws are gonna go because we don't know exactly why they're happening."
Sharon: And the AI accelerationists, what would they say? They probably say, who cares? Let's go!
Nathan Lambert: Yeah, they're like, we're good as long as we're good. Not everyone will deeply — it's also extremely hard to deeply reckon with this. It's like, yes, it's definitely a context in which all of our principles and approaches operate. But I don't know anyone that's like, the fact that we're uncertain of it means we take x action. I think it just means that things can change, and we see a lot change throughout the year — like 2023, the amount of things that have come and gone is really fast on the model side, and I think that just kind of makes it this uncertain space, which is fun for some people, but it is serious. I think if you're somebody that wants AI to be understood, it means that you should be really serious about how you're reading everything, because the intentions of most work are not necessarily based on scientific principles.
Sharon: Well, that leads to my other question that I wanted to ask you, jumping off of that: a tweet that you posted, I don't know, last week or something like that, about how the creators of these models can 'juice' their model scores — I guess because if you don't understand exactly how things work, then measuring the success in different areas is hard to validate. Is that what you meant?
Nathan Lambert: It's more that we don't have access to the information on what they're training on, and the incentives of people building the best models are not the same as the incentives of the people trying to understand them. In NLP and most machine learning, especially on the academic side, there's a long history of trying to use set benchmarks to evaluate progress in the field. So you'll hear things like MMLU — massive multitask language understanding — used and thrown around; there's a long list of benchmarks. And the thing is, if you're trying to make a model for an enterprise customer, your goal is to make the best possible model. And there's no reason to not train on any data that's not your internal evaluation metric. You need to have something internally that is how you're judging if your experiments are going well or not, and I'm sure they all do.
But anything that's not on that list, it makes sense to just train on the data, because there's not that much high quality data out there. And it could make your model better for the end case that you are interested in, which is not an academic competition. And there's a lot of rumors and ways by which that can be done, both intentionally and unintentionally. But it's really just that we need different ways of understanding this boundary between things like GPT-4 and Gemini compared to the models that people are trying to train in the open, where they document the data and everything like that. I think it's a new way for everyone to think about it that's hard to do, because this is the first year that people really had to do that.
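To make concrete what "training on the data" means for evaluation: one common (and imperfect) way researchers look for benchmark contamination is to check n-gram overlap between training documents and test items. The sketch below is a generic illustration of that idea under simple assumptions, not a description of how any particular lab audits its data.

```python
def ngrams(text: str, n: int = 8) -> set:
    """Return the set of lowercase word n-grams in a string."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_rate(train_docs, benchmark_items, n: int = 8) -> float:
    """Fraction of benchmark items sharing at least one n-gram with the training data."""
    train_grams = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)
    flagged = sum(1 for item in benchmark_items if ngrams(item, n) & train_grams)
    return flagged / max(len(benchmark_items), 1)

# Hypothetical example: a training document quotes a benchmark-style question verbatim
train_docs = [
    "study guide: the process by which plants convert light into chemical energy is called photosynthesis"
]
benchmark_items = [
    "The process by which plants convert light into chemical energy is called what?"
]
print(contamination_rate(train_docs, benchmark_items))  # 1.0 -> the item is flagged
```

The catch, as Lambert notes, is that this kind of check requires seeing the training data in the first place, which is exactly what closed labs don't share.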
Sharon: Well, I did think it was interesting, in just the last week or two, with Gemini being released and then Mistral releasing their models. Just kind of coming to the end of this crazy year and having that open source vs. closed theme kind of wrapping it all up with a bow, plus the EU AI Act. I was wondering what your take was on all of that?
Nathan Lambert: At a practical level, I think the AI community loves to toot their own horn before big conferences, and NeurIPS was right around the corner. I mean ChatGPT also came out at NeurIPS, it’s like people like to be the talk of the town at the big conferences, because, it's odd, but it really seems like machine learning researchers are a pretty gossipy and cliquey bunch. And this is one of the dynamics of this.
But I honestly see that open and closed models are really developing in different ways. In the open, there's way more incentive to have compute efficiency so that people can actually run the model and fine-tune the model themselves and have it on their laptop in the next few years. For the Googles or OpenAIs of the world, they don't need that, because everything will be accessed via an API. There's an aside that Google has a model for the Pixel, but in the big picture like that, the incentives are a little bit different. And right now, I think it looks like they're kind of competing with each other because it's still in the short term.
But as scaling laws become bigger, eventually we get to this point where the models that OpenAI and the other labs are training are just so big, they've 10x'd or 100x'd the compute for training or inference, and the laptops that we have are not going to grow that much in compute, just from a basic energy-requirements point of view. That doesn't mean that people aren't going to want the best open model on their laptop; it's just that I think the open ecosystem is going to have incentives grounded in practical hardware constraints for local development. Whereas API models are going to be: you just ping a cloud server, and it returns your text, and the whole economic ecosystem around that will be very different. They'll intersect in many ways. But when you look at where we are now, at the beginning of this fork, it looks like they're kind of winding back and forth, but I don't really think that they are.
Sharon: What are your biggest thoughts for the coming year? And is there anything that you thought would happen this year that really didn't?
Nathan Lambert: I think the interesting things are trying to come up with predictions on specific capabilities and/or revenue numbers. I think it was Ben Thompson, though I don't remember if it was him or someone else I follow, who was asking which will be the first company to get to $10 billion in annual revenue with an AI product. Microsoft is making a lot of money now. And these are the types of predictions that I think matter a lot. Because once these sorts of milestones are hit, other large tech companies really start to be like, oh, I should be in this space. That's a lot of revenue to make off of a product.
There's kind of naive things that I have on my mind, like Llama 3 is definitely going to be released. The question is what percentage probability you put on Llama 4, and whether Mixture of Experts and specific architectures are important. I think with all of these things it's easy to kind of look into the distance, and there will be really good runway on open models in 2024. At least in the first half of the year, it seems like there'll still be a lot of people releasing seven to 10 billion parameter pretrained models that have really good traction, so it doesn't feel like we're slowing down at all. But eventually there'll be a slowdown in people training models just to show that they can train models.
Sharon: And what are you most excited about in the new year around AI?
Nathan Lambert: I really see a continuation of what is coming. It's like, we had this Mixtral model, and I expect Llama 3 to be released in January or February. It feels the same. I think people are asking the right questions around evaluation. And there's a lot of pressure to try to create evaluations that are harder to game and more meaningful for things like chat; I think that would be really good. I think people are generally looking in the right directions.
I'm really focused on trying to understand what information is encoded in preference data, and I'm working on evaluation there, just to try to be more specific about what RLHF may be doing. Because in the open, people are starting to catch on there; it took until the fall for people to be able to do that with open-source models. Llama 2 was the first one, but they didn't quite release code or anything, so it was hard to reproduce. I kind of see it as a continuation.
If you're on the business side, it's an interesting year to see which companies have traction. Like last year, there were companies like Jasper which grew really wildly, and they're already having to do some cuts; in that sense, things will continue. There are going to be open companies that have trouble raising towards the end of the year, and companies that are working with open APIs that have a hard time. It's just kind of like the Wild West. I think 2024 in that way will be similar.
Sharon: You mentioned geopolitics. Is there anything else that you think is being under-discussed, that you and Tom will need to podcast about?
Nathan Lambert: Honestly, not really. We'll see what the UK does, because they've been hosting a lot of summits, but I don't think they've really tried to do anything in the legislative arena. I don't expect regulation to come in the US, given that it's an election year and it's a pretty hot topic. So I don't see any major legislation on the radar; the EU AI Act is being finalized, so it's not going to change much.
Sharon: I didn't think of that, that the US election might slow things up as well.
Nathan Lambert: I think the US election will be the biggest determining factor in the narrative, to kind of see what positions different candidates take, how people misuse AI products, how that attribution is given, and how that's handled by the media. People are going to be using ChatGPT and DALL-E and things to create content for the election machine. And whether people attribute that to the campaigns doing it or to bad actors, or if they say that OpenAI is responsible, it's going to be a hot mess, where things will determine the narrative because it's impossible to keep the information as sanitized as it needs to be to understand this stuff.
Sharon: Yeah, a hot mess. Well, on that note — 2024 — a hot mess!
Nathan Lambert: It's fun. It's fun to follow along, as long as you don't have too many eggs in one basket. I still think it's extremely stressful to be a founder because you have to be really on top of things. It just changes so fast.
Sharon: Yeah, better to be an observer. We're like the sideline commentators.
Thanks for reading and have a great holiday weekend! I appreciate any comments and shares! :)
~ Sharon