Understanding AI

Meta is back in the LLM game after a year-long break

What Muse Spark tells us about Meta’s new AI strategy.

Kai Williams
Apr 20, 2026

The April 8 release of Meta’s new model Muse Spark got overshadowed by Claude Mythos Preview, which was announced one day earlier. But Meta’s new model family — and the 158-page safety report Meta released about it last week — are still significant for what they tell us about the company’s future role in the AI industry.

Mark Zuckerberg spent billions of dollars to assemble the team that built Muse Spark. The model’s release gives us our first hints about whether Meta will be able to break into the top tier of AI labs.

Meta has all of the advantages of a well-resourced technology company: lots of AI chips, proprietary data, and lavish salaries. Those resources have enabled the Meta team to produce a model with strong benchmark scores. But I suspect that those scores still overstate the model’s real-world utility.

The companies that produce today’s best models — Anthropic and OpenAI — excel at the subtle art of post-training. This is the step that gives a model its “personality” — the combination of creativity, resourcefulness, and ethical grounding that turns a good model into a great one.

I don’t think Meta’s new AI team is there yet. And it’s not clear if Zuckerberg will be able to build a team with top-tier post-training capabilities, no matter how many billions of dollars he spends on the effort. Meta’s metrics-obsessed culture may help the company catch up to leaders like Anthropic and OpenAI, but I predict it will be a poor guide for further innovation once Meta’s models are closer to the frontier.

The Llama 4 stumble

Muse Spark was a long time coming; Meta's previous model release, Llama 4, came more than a year earlier.

On April 5, 2025, Meta heralded the release of the Llama 4 model family as “our most advanced models yet and the best in their class for multimodality.” Meta claimed that Llama 4 Maverick, the mid-sized model in the series, outperformed OpenAI’s GPT-4o and Google’s Gemini 2.0 Flash “across a broad range of widely accepted benchmarks.”

But the Internet wasn’t impressed.

“Genuinely astonished how bad it is,” one Redditor commented on a post titled “I’m incredibly disappointed with Llama-4.” Other commenters concurred. “Pathetic release from one of the richest corporations on the planet,” one wrote.

It wasn’t just Reddit: Llama 4 performed “mid” or “less than mid” on just about every independent benchmark, writer Zvi Mowshowitz observed.

While previous Llama models, especially the Llama 3 series, are still popular with researchers, Llama 4 has been relegated to the dustbin of history.

The release of Llama 4 hurt Meta’s reputation in the AI community. Llama 4 models had only done well on benchmarks because — as Meta’s then chief AI scientist Yann LeCun later told the Financial Times — the “results were fudged a little bit.” Meta had fine-tuned specific models to do well on prominent benchmarks and reported those results. Then it released different models to the public.

“I am placing Meta in that category of AI labs whose pronouncements about model capabilities are not to be trusted, that cannot be relied upon to follow industry norms, and which are clearly not on the frontier,” Mowshowitz wrote at the time.

For the next year, Meta did not release any LLMs — not even Llama 4 Behemoth, which it had previewed in the Llama 4 announcement.

But Mark Zuckerberg didn’t give up. Last June, he began restructuring Meta’s AI efforts. Meta invested $14.3 billion in the data labeling startup Scale AI to hire its then-28-year-old CEO Alexandr Wang, a deal widely described as an acquihire. Wang became Meta’s chief AI officer and leads a new unit called Meta Superintelligence Labs (MSL).

Meta Chief AI Officer Alexandr Wang. (Photo by Ludovic MARIN / AFP via Getty Images)

Meta splurged on more than Wang. In July, the New York Times reported that one 24-year-old researcher was offered $250 million, including $100 million in the first year. Meta offered engineers pay packages that “hovered in the mid-tens of millions of dollars,” according to the Times. Meta poached several researchers from OpenAI, which prompted the latter’s chief of research to write an internal memo saying it felt “as if someone has broken into our home and stolen something.”

By August, Meta had recruited more than 50 new researchers and started work on a new model, codenamed Avocado. Meta laid off 600 researchers from older AI units in October, but the new team kept working. By the end of December, it had completed the pre-training process for Avocado.

In mid-March, the New York Times reported that Avocado was being delayed from a planned March release because it performed worse than leading AI models from Google, OpenAI, and Anthropic “on internal tests for reasoning, coding, and writing.”

Finally, on April 8, Meta announced it was releasing a new LLM: Muse Spark.

Initial reviews were mostly positive — or at least not relentlessly negative like the reviews for Llama 4.

© 2026 Timothy B Lee