TNAI: Jonathan Frankle on building MosaicML


MosaicML Chief Scientist Jonathan Frankle joined Thursday Nights in AI, where he shared some hot takes on Mosaic’s recent acquisition by Databricks, open-source versus closed-source models, and AI policy.

We would like to thank Jonathan Frankle for such an informative talk. If you are interested, join us for our upcoming firesides! See the full list here.

This event is brought to you by @OutsetCap and @imbue_ai.

On the danger of centralization, and his mandate to go toe-to-toe with OpenAI:

Jonathan: “I hope there’s an alternative to the world we’re being presented with now. The world right now is: there are a few big monolithic models, and you can take them or leave them. You better hope they behave the way you want them to. You better be okay with giving away all of your data, and you better be okay with whatever prices are set. And you better be okay that they may change under you on a day-to-day basis or get deprecated six months after they come out, or what have you. You have no control. It doesn’t reflect anything you want and you better live with it. I don’t like that world. I'm not a big fan of centralization in general, and I'm not a big fan of taking away control from people, and I'm excited to present an alternative to OpenAI. I was told in this deal to go toe to toe with OpenAI my own way, and I intend to do it my own way. That's to give people choice, customization, efficiency, and reduced cost. There shouldn't be four LLMs, there should be a hundred thousand. And everybody's data and everybody's opinion should get reflected in those models. It's really simple. Mosaic was pushing toward that. And now we've been given a ton more resources, a mandate, and a lot of amazing collaborators to go pull that off. So I hope there’s a choice now.”

On today’s nosebleed valuations in AI, and whether we’re going to see more $1B+ exits:

Jonathan: “Valuations and exiting are very different... I look at some companies that have very high valuations and I think, is that justified? How is it justified? And what happens next? How do you get acquired at a certain valuation if the investors are looking for 4x… A lot of valuations are getting to the point where I think it’s hard to contemplate acquiring a company that big…”
Jonathan: “The analog I look to is the AI chip space… I can name two exits in that space, both to Intel. I can’t think of any other exits in the AI chip space. Some incredible valuations and incredible technology, but so far it’s unclear whether there are going to be more exits. And it’s entirely possible we look back at the LLM space and say, wow, those were a lot of amazing companies with great technology and great valuations. Did anybody exit? I don’t know, but it’s certainly a question we should ask ourselves. It’s great to have a 4 billion valuation. It’s a lot better to actually be able to exit, and, you know, realize some valuation.”
Jonathan: “I definitely see a lot of pairing up. I mean, Snowflake acquired Neeva, and I think that was a really savvy move. But I think a lot of valuations, from what I understand, are getting to the point where it's hard to contemplate acquiring a company that big. It's a lot harder to acquire a company for 10 billion than for 1 billion. It was a very different animal to be fundraising on the VC market than to be talking about an acquisition.”

On why the goal shouldn’t be just trying to catch up with OpenAI:

Jonathan: “I would argue with the premise that the goal is to close the gap with OpenAI. My personal goal is not to close the gap, my personal goal is to do something different. I don’t think that just because a ladder has been defined means we all have to climb it. There are lots of other ways to solve problems and lots of other ways to come at this. My goal is not to go to Databricks and then build a GPT-4-scale model. That is decidedly not the plan. The plan is to continue doing exactly what we’ve been doing at Mosaic and try to find ways to build specialized models that can be useful to anyone.”

On why Mosaic’s $1.2B price tag is good for Databricks:

Jonathan: “I actually think it's fully justified. I think they got a good deal, to be completely frank. I mean that, I mean that seriously. There were all these interesting articles that came out, like, '$21 million per employee,' and then it was, 'yeah, and they had three salespeople and they were heading toward however many millions or tens of millions in ARR.' Yeah, we were, because when you're at a startup like this, everybody's on the sales team. I'm on the sales team. Our CEO is on the sales team. My researchers are on the sales team. And people really want custom models. So I honestly think the valuation was fully justified, and perhaps they got a good deal in the process, and good for them.”

On why Databricks’ acquisition of MosaicML is good for the world:

Jonathan: “I think it's really one of those situations where one plus one equals three. They are really good at data. We're really good at training models … In general, I think this is going to be really good for the world. Everybody knows that data is what powers deep learning these days. There were all these leaks of the GPT-4 architecture, and my response is: "who cares?" You could have done it a thousand different ways and gotten a model just as good. The data was what mattered. And I'm really excited to hook up to data, have the world take advantage of their own data, and build really great models.”

On why transformers may continue to be the dominant architecture:

Jonathan: “Honestly, my belief about this is we love to think that the technology is changing really quickly, and in many ways it is, but by and large, the underlying good inductive biases and good models are really hard to come by… Good architectures seem to last for decades, not years, and transformers, I think, are going to be around for a while because it's really hard to replace them.”

On why Jonathan doesn’t believe in existential risk:

Jonathan: “I care about the world, and I care about actually making an impact on the world. And I think all this existential risk bullshit is a great way for a lot of people to make themselves feel like they’re doing something good and just keep doing what they were doing before…”
Jonathan: "There are real, tangible societal and policy issues that can be addressed today, and existential risk is a way to distract people whose time is valuable and who actually make these decisions from working on the right problems."
Jonathan: "I’m a technologist. My job is to share what I know and help, not to try to make things up and reinvent the wheel for a bunch of people who have been working in this field and know how to actually operate the levers of policy and think about all of society.”
Jonathan: “In AI and tech in general, we have this belief that we can reinvent or disrupt anything and do it better than the people who have been doing it for a long time. When it comes to policy and society, we need to stop acting like we are experts in this space just to make ourselves feel good.”

On how technologists can get involved in policy:

Jonathan: “Ask to join a conversation and go where you’re invited. Don’t go shouting about it. White House meetings are very fun and pretty, but nothing gets done there; that’s not where it happens. I work with congressional staffers on this stuff; that's where the work gets done. And there's no recognition. You don't get in the New York Times for that, but it's useful.”
Jonathan: “The thing about policy is it is slow and boring and incremental, and you have to build relationships and build trust over time. I'm still building those very slowly, and I've been building them for almost 10 years. So the first step is to just have any conversation. Talk to me, and I'm happy to put you in touch with someone, and then earn their trust, and then they'll put you in touch with other people, but it is a gradual process. The thing about technical subjects is you tend to be able to measure results for the most part. I mean, we all talk about how publications are impossible to replicate, but at the end of the day, you can kind of evaluate what a good piece of science is and what a good piece of science isn't, at least if you take a step back. Policy is not measurable. It's really completely in the eye of the beholder, and it comes down to values and these really fuzzy things. And so you can't measure "Is someone good at policy?" It's really about trust, and you have to earn that trust over time. You can't get trust immediately just by saying, 'I'm a founder. Trust me.' That's not how this works.”

On why it’s not as simple as open- versus closed-source:

Jonathan: “I get a lot of questions at these events about open source versus closed source, and I really don’t like that. It's about transparency versus lack of transparency and control versus lack of control. Imagine that OpenAI told you literally everything they did in GPT-4. You still have no control over that. Imagine I released a model tomorrow and told you nothing about how I built it. There's no transparency. Those are really the two axes that matter. Open source can get you some amount of control and get you transparency if someone's willing to share how they built it. So for me, open source is really about that. It's about transparency and control. I care a lot about control for my customers. I care a lot about transparency for my customers. So it's natural to open source things and tell people how I did it.”
Jonathan: “And honestly we take a lot from the community. Everything we do at MosaicML is built on what the community’s done, and I’m not a big fan of taking and not giving back. We need to sustain the community, especially right now, when all the big labs that have sustained the community are shutting down and closing up. Brain doesn't exist anymore. DeepMind's not going to publish anything. FAIR is very confused about whether they're going to be a very open lab or not, and we'll probably find out in the next couple of weeks which direction they're going to go, and I really hope they go the open route. I feel like overnight I was suddenly running the third-largest open industry research lab and that's really scary given that I have a 20 person team.”

On why open-source models could provide a viable alternative to proprietary LLMs:

Jonathan: "I think it's going to be like any open source project. Linux is going to be behind Windows, perpetually, on a lot of things. But the gap remains about the same. Given how quickly things are improving, the open source models in the fall are going to be so much better than LLaMA-7B, which was the de facto best open source model a few months ago, and even that's been left in the dust at this point by things like Falcon. So I think we're going to keep seeing that curve improve and, I hope, we'll be a part of that… I'm sure OpenAI is doing a bunch of crazy things right now to try to keep that gap widening, but it's hard to bet against the community. Microsoft bought GitHub, and Linux now runs inside Windows. It's really hard to beat the open source community in the long run. And we're in it for the long game now."

Jonathan Frankle: @jefrankle
Ali Rohde: @RohdeAli
Josh Albrecht: @joshalbrecht