Elon Musk launched Grok 4 last night, calling it the 'smartest AI on the planet': what businesses need to know

After days of controversy surrounding a flurry of antisemitic responses made recently by his Grok AI-powered chatbot on his social network X (formerly Twitter), a seemingly unrepentant and unbothered Elon Musk launched the latest version of his AI model family, Grok 4, during an event livestreamed on X last night, calling it “the smartest AI on the planet.”

As Musk posted on X: “Grok 4 is the first time, in my experience, that an AI has been able to solve difficult, real-world engineering questions where the answers cannot be found anywhere on the Internet or in books. And it will get much better.”

The new release actually includes two distinct models: Grok 4, a single-agent reasoning model, and Grok 4 Heavy, a multi-agent system designed to solve complex problems through internal collaboration and synthesis.

Both models are optimized for reasoning tasks and come with native tool integration, enabling capabilities such as web search, code execution, and multimodal analysis.

Musk and his team at xAI showcased benchmarks that suggest Grok 4 outperforms all current competitors across a range of academic and coding evaluations, including the previously leading AI reasoning models from rivals, OpenAI o3 and Google Gemini.

However, xAI has not yet released a model card or any official release notes for Grok 4 to the public, making it difficult to independently assess performance and the claims made during the stream. We’ll update if and when these become available.

Nor did Musk and the xAI team members participating in the livestream address the obvious controversy facing Grok over the past week, including many incidents of Grok making antisemitic remarks, referring to itself as “MechaHitler,” and suggesting that people with Jewish surnames should be dealt with decisively by Adolf Hitler, a seemingly overt reference to the Holocaust and the genocide of 6 million Jews during World War II.

The closest Musk came was when he stated: “The thing that I think is most important for AI safety, at least my biological neural net tells me the most important thing, is to be maximally truth-seeking,” and “We need to make sure that the AI is a good AI. Good Grok,” as well as “It’s important to instill the values you want in a child that would grow up to be incredibly powerful.”

However, Musk did not apologize, nor did he accept responsibility for Grok’s antisemitic, sexually offensive, and conspiratorial remarks.

Throughout the livestream, the team emphasized Grok 4’s ability to reason from first principles, correct its own errors, and potentially invent new technologies or discover novel scientific insights.

The presentation also included demonstrations of Grok 4 Heavy applying multi-agent collaboration to tackle research-level problems across disciplines.

Availability and pricing

Grok 4 is available now through several channels, depending on user type and subscription level:

API Access (for developers and enterprises):
Grok 4 and Grok 4 Heavy are live via the xAI API. Pricing is structured as follows:
- $3 per 1 million input tokens
- $15 per 1 million output tokens
- $0.75 per 1 million cached input tokens
- Prices double after 128,000 tokens in a single context window
  The API supports text and image inputs, function calling, and structured outputs, and offers a 256,000-token context window.
Consumer Access (via Grok chatbot and apps):
Individual users can access Grok 4 through the Grok chatbot on X, the Grok app (iOS and Android), and X.com, but only with one of the following subscriptions:
- PremiumPlus: $16/month
- SuperGrok: $300/month
- A new “SuperGrok Heavy” tier, also priced at $300/month, provides access to both Grok 4 and Grok 4 Heavy, the multi-agent variant.
  (Note: SuperGrok and PremiumPlus tiers may vary in availability and usage quotas across X and Grok platforms.)
Launch Timing:
Grok 4 became available immediately following the July 9, 2025 livestream. Temporary access limits were in place during the demo, but full rollout to subscribers began shortly after.
Platform Expansion:
xAI has indicated plans to make Grok 4 available through Microsoft Azure AI Foundry, where Grok 3 and Grok 3 Mini are currently listed.
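
For teams budgeting API usage, the per-token rates listed above can be turned into a rough per-request estimate. The Python sketch below is illustrative only and is not xAI's billing logic. The function name `estimate_grok4_cost` is hypothetical, and it rests on two assumptions the article does not confirm: that the 128,000-token threshold counts input, cached input, and output tokens together, and that once the threshold is crossed the whole request is billed at double the rate.

```python
# Rates quoted in the article, in USD per 1 million tokens.
INPUT_RATE = 3.00
CACHED_INPUT_RATE = 0.75
OUTPUT_RATE = 15.00

LONG_CONTEXT_THRESHOLD = 128_000  # prices double beyond this many tokens
CONTEXT_WINDOW = 256_000          # maximum tokens in one context window


def estimate_grok4_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Return a rough USD estimate for a single Grok 4 API request.

    Assumption (not confirmed by the article): the threshold applies to
    the total tokens in the request, and crossing it doubles every rate.
    """
    total_tokens = input_tokens + cached_input_tokens + output_tokens
    if total_tokens > CONTEXT_WINDOW:
        raise ValueError("request exceeds the 256,000-token context window")

    base_cost = (
        input_tokens * INPUT_RATE
        + cached_input_tokens * CACHED_INPUT_RATE
        + output_tokens * OUTPUT_RATE
    ) / 1_000_000

    multiplier = 2 if total_tokens > LONG_CONTEXT_THRESHOLD else 1
    return base_cost * multiplier


if __name__ == "__main__":
    # 110,000 tokens in total: standard rates apply.
    print(f"${estimate_grok4_cost(100_000, 10_000):.2f}")  # $0.45
    # 160,000 tokens in total: rates double under the assumption above.
    print(f"${estimate_grok4_cost(150_000, 10_000):.2f}")  # $1.20
```

Under these assumptions, a 50% larger prompt nearly triples the bill once it crosses the threshold, so long-context workloads deserve a separate line in any budget.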

For subscription details, users are directed to x.ai/grok and X Premium support. Here’s how it compares to other leading AI models in terms of pricing per million tokens.

Provider & model	Context window	Input ($/Mtok)	Cached input ($/Mtok)	Output ($/Mtok)	Additional notes
xAI – Grok 4 / 4 Heavy	256K (2× price >128K)	$3.00	$0.75	$15.00	Image input, function calling, structured JSON (apidog)
OpenAI – o3	200K	$2.00	$0.50	$8.00	50% Batch API discount available (OpenAI, OpenAI Help Center)
OpenAI – GPT-4o	128K	$5.00	$2.50	$20.00	Vision, audio, tools (OpenAI)
Anthropic – Claude Sonnet 4	200K	$3.00	$0.30	$15.00	50% batch output discount (Anthropic)
Anthropic – Claude Opus 4	200K	$15.00	$1.50	$75.00	High-accuracy flagship (Anthropic)
Google – Gemini 2.5 Pro	200K (2× price >200K)	$1.25	$0.31	$10.00	75% cache-hit discount (Google AI for Developers, Google Cloud)
Google – Gemini 2.5 Flash	200K	$0.30	$0.075	$2.50	Fast, low-cost preview tier (Google Cloud)
DeepSeek – deepseek-reasoner	64K	$0.55 (miss) / $0.14 (hit)	$0.14	$2.19	50-75% off-peak discount (DeepSeek API Docs)
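
To see what the table means for a concrete workload, the list prices can be dropped into a few lines of Python. This is a rough sketch rather than a pricing tool: the names `LIST_PRICES`, `workload_cost`, and `rank_by_cost` are hypothetical, the DeepSeek input rate is assumed to be the cache-miss price, and the calculation ignores cached-input rates, long-context surcharges, and batch or off-peak discounts.

```python
# (input, output) list prices from the table, in USD per 1 million tokens.
# Assumption: DeepSeek is billed at its cache-miss input rate.
LIST_PRICES = {
    "Grok 4": (3.00, 15.00),
    "OpenAI o3": (2.00, 8.00),
    "GPT-4o": (5.00, 20.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude Opus 4": (15.00, 75.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "deepseek-reasoner": (0.55, 2.19),
}


def workload_cost(model, input_tokens, output_tokens):
    """USD cost of a workload at flat list prices (no discounts or surcharges)."""
    input_rate, output_rate = LIST_PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def rank_by_cost(input_tokens, output_tokens):
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {
        model: workload_cost(model, input_tokens, output_tokens)
        for model in LIST_PRICES
    }
    return sorted(costs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Example workload: 1 million input tokens and 1 million output tokens.
    for model, cost in rank_by_cost(1_000_000, 1_000_000):
        print(f"{model:<20} ${cost:,.2f}")
```

For that balanced workload, Grok 4 comes to $18.00, the same as Claude Sonnet 4 and below GPT-4o and Claude Opus 4, but above o3 and both Gemini models. The ranking shifts with the input/output mix: deepseek-reasoner is cheapest here, while Gemini 2.5 Flash comes out ahead for a 10-million-input, 1-million-output workload.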

Unlike its predecessor Grok 3, released in February, which separated tool-augmented responses from general reasoning, Grok 4 was trained with tools from the start.

The model integrates capabilities such as code execution, web search, and document parsing. It also introduces Grok 4 Heavy, a multi-agent system in which multiple internal models work in parallel to generate and validate answers.

Grok 4 also includes a new voice mode featuring expressive outputs with reduced latency, and it supports text and image input, structured outputs, and function calling.

Performance highlights

The independent AI model analysis and benchmarking group Artificial Analysis stated on X that xAI provided it with a version of Grok 4 (not Heavy) ahead of the public release for scoring.

On technical benchmarks, Grok 4 leads the Artificial Analysis Intelligence Index with a score of 73, ahead of competitors such as OpenAI’s o3 (70) and Google’s Gemini 2.5 Pro (70).

It also recorded top scores in:

GPQA Diamond: 88%
ARC-AGI 2: 15.9%, double the second-best score
Humanity’s Last Exam: 24% on the text-only version, and 44% with tools
MMLU-Pro and AIME 2024: 87% and 94%, respectively
Coding and math evaluations: highest to date on LiveCodeBench, SciCode, AIME24, and MATH-500

Despite its benchmark success, Grok 4’s output speed stands at 75 tokens per second, slower than models like Gemini 2.5 Flash (353) or OpenAI’s o3 (187), but still faster than Anthropic’s Claude 4 Opus (66).
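
Those throughput figures translate directly into waiting time. The short Python sketch below divides a response length by each model's reported output speed. The names `OUTPUT_SPEED_TPS` and `seconds_to_generate` are hypothetical, and the estimate assumes a constant generation rate while ignoring time to first token, hidden reasoning tokens, and network latency.

```python
# Output speeds reported in the article, in tokens per second.
OUTPUT_SPEED_TPS = {
    "Grok 4": 75,
    "Gemini 2.5 Flash": 353,
    "OpenAI o3": 187,
    "Claude 4 Opus": 66,
}


def seconds_to_generate(model, output_tokens):
    """Estimate streaming time for a response, assuming a constant token rate."""
    return output_tokens / OUTPUT_SPEED_TPS[model]


if __name__ == "__main__":
    # Example: a 1,500-token answer.
    for model in OUTPUT_SPEED_TPS:
        print(f"{model:<18} {seconds_to_generate(model, 1_500):5.1f} s")
```

At 75 tokens per second, a 1,500-token answer takes about 20 seconds to stream from Grok 4, compared with roughly 4 seconds from Gemini 2.5 Flash.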

The model features a 256,000-token context window, which sits above the 200K context limits of o3 and Claude 4 Sonnet but below the 1 million tokens offered by Gemini 2.5 Pro and GPT-4.1.

Real-world use cases

xAI provided several demonstrations of Grok 4’s performance in applied scenarios:

In a simulated business task called VendingBench, Grok 4 significantly outperformed other models in long-horizon financial planning.
At the Arc Institute, researchers used Grok 4 to analyze CRISPR logs and uncover novel hypotheses.
In radiology, the model interpreted chest X-rays with higher accuracy than leading peers.
In the financial sector, its combination of real-time data access and reasoning made it suitable for forecasting and analysis.

The model can also create 3D video games with minimal input by autonomously sourcing and integrating assets. Additionally, it demonstrated the ability to simulate astrophysical events using grounded approximations from published research.

Reception and discussion

Industry reaction to the Grok 4 release has been divided, mixing enthusiasm for its performance with criticism of the event’s delivery and broader trust issues.

David Shapiro, an AI power user and writer, noted: “Grok 4 now takes its place as ‘good enough to actually help with frontier research’… but has merely caught up with OpenAI.”

Ethan Mollick, a professor at Wharton, remarked on X: “So Grok 3 has had three separate incidents where apparently unvetted changes to the deployed system caused a large-scale ethical issue and an emergency rollback. I don’t think you can do a Grok 4 release that doesn’t at least address this really, if user trust matters,” later adding, “Grok 3 was a good model, and Grok 4 might be excellent but having a good model is not enough – there are a lot of really good models out there. You actually want to trust the model you’re building on.”

Ben Hyak, co-founder and CTO of AI product observability startup Raindrop AI (himself a former Musk employee), criticized the livestream itself: “This xAI livestream is one of the worst things I’ve ever watched in my life. Love y’all, but it’s bad.”

Despite the criticisms, benchmarking firm Artificial Analysis noted: “Grok 4 is now the leading AI model.”

Ongoing trust issues

The launch of Grok 4 comes amid renewed criticism over Grok’s prior behavior in consumer deployments, particularly as a chatbot integrated into Musk’s social network, X.

Over the July 4 holiday and in subsequent days, Grok generated antisemitic and conspiratorial responses that reignited scrutiny of its system design and governance practices.

As reported by my VentureBeat colleague Michael F. Nuñez, Grok responded to questions about Jewish influence in Hollywood by asserting that Jewish executives “dominate leadership” at major studios and influence content with “progressive ideologies.” It went on to rant about people with Jewish surnames as fitting a “pattern” of engaging in “extreme leftist activism,” and suggested Hitler knew “how to deal with it decisively, every damn time,” an apparent reference to the Holocaust.

The conspiratorial and antisemitic posting was so prolific that the Anti-Defamation League (ADL), a preeminent U.S.-based nonprofit combating antisemitism and hatred, posted on July 8: “What we are seeing from Grok LLM right now is irresponsible, dangerous and antisemitic, plain and simple. This supercharging of extremist rhetoric will only amplify and encourage the antisemitism that is already surging on X and many other platforms.”

This incident follows a history of problematic Grok outputs, including a May 2025 case in which the Grok bot integrated into X randomly inserted references to a completely nonsensical and nonexistent “white genocide” in South Africa into unrelated queries, and an earlier case in which its system prompt was found to direct the Grok chatbot on X to avoid referencing any sources that described Musk and his former political funding beneficiary, U.S. President Donald J. Trump, as spreaders of misinformation. In both cases, xAI blamed the behavior on unnamed employees and said it was being addressed.

Already, today, users of Grok 4 on the consumer app have observed it once again outputting anti-Zionist and antisemitic remarks.

As I previously noted, Musk has openly stated on multiple occasions that he wanted to alter Grok to better reflect his personal beliefs and his distrust of mainstream media and accredited sources. This makes it a poor choice in enterprise contexts where such views could adversely impact users and the businesses building atop the Grok family of models.

My prior recommendation stands: For those in the enterprise trying to ensure their business’s AI products work properly and accurately… Grok is sadly best avoided. Thankfully, there are numerous other alternatives to choose from.
