Summary
Meta has released Llama 4, a new crop of flagship AI models. One of them, Maverick, ranks second on LM Arena, but the version deployed to LM Arena differs from the one widely available to developers, raising concerns about benchmark reliability and how AI companies fine-tune their models.
Key Points
The Maverick model deployed to LM Arena has been optimized for conversationality
The version deployed to LM Arena is experimental, while the widely available version is not
This raises concerns about the reliability of benchmarks and how AI companies fine-tune their models
Why It Matters
The gap between the benchmarked model and the released one makes leaderboard results a poor guide to real-world performance, highlighting the need for transparency in AI model development and for reliable benchmarks.
Author
Kyle Wiggers