How Grok 3 compares to ChatGPT, DeepSeek and other AI rivals



Now that Grok 3 from Elon Musk’s xAI is officially live, how does it stack up against its competitors?

Musk launched the Grok 3 model family on Monday in a livestream on X. The announcement also included two reasoning models: Grok 3 Reasoning, which is in beta, and Grok 3 mini Reasoning. Models with reasoning capabilities are considered more advanced than standard generative models like GPT-4 because they can “think” through problems step by step, which tends to make them less prone to hallucination.

xAI is promoting Grok 3 as the best model on the market, claiming it surpassed competitors from OpenAI, Google, Anthropic, and DeepSeek on key benchmarks. Grok 3 did perform well under the codename “chocolate” in Chatbot Arena, which pits chatbots against each other in blind performance tests.

Grok 3 has mostly caught up to rivals, an impressive feat given its late start, but it still has some of the limitations that plague other frontier models. Here’s what else AI experts are saying about the new chatbot on the block.

Grok 3 is just as good as its rivals but not good enough to make you cancel your ChatGPT subscription

Andrej Karpathy, a founding member of OpenAI and former director of AI at Tesla, got early access to the newly released Grok 3 and shared a “quick vibe check” on the model’s performance.

Based on some standard stress tests, Karpathy said Grok 3, with its new Thinking feature turned on, “feels somewhere around the state of the art territory of OpenAI’s strongest models (o1-pro, $200/month), and slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking.”

Musk stans are thrilled that Grok 3 has caught up to its competitors. But for the ideologically indifferent, who are simply looking for the best model on the market, that might not be enough to win them over.

“I think Grok 3 came in right at expectations,” posted Wharton AI professor Ethan Mollick. “So I don’t think there is much to update in terms of consensus projections on AI: still accelerating development, speed is a moat, compute still matters, no obvious secret sauce to making a frontier model if you have talent & chips.” In other words, the competitive edge required for AI dominance looks the same as it did before the launch.

xAI left out a key Grok 3 comparison

Screenshots of Grok 3 Reasoning models outperforming OpenAI’s o3-mini and o1, DeepSeek’s R1, and Google’s Gemini 2.0 Flash Thinking have gone viral, making Grok 3 look like the most advanced reasoning model available. But OpenAI said, “Not so fast.” Shortly after the benchmarks were shared on the livestream, OpenAI product engineer Rex Asabor posted an “updated” chart showing o3 beating Grok 3 Reasoning on math and science benchmarks.

To be fair, o3 has yet to be publicly released, so xAI might not have had access to these scores. Still, the chart serves to quiet the Grok devotees who claim Sam Altman and co. are cooked.

Still, it’s a big deal that Grok 3 caught up so quickly

“The key thing to pay attention to is that X got here very fast & whether that continues,” said Mollick in a separate X post, calling it “a very good model that is now at the frontier.” The Grok models have improved remarkably fast, considering that Google and OpenAI began their AI work 13 and 8 years, respectively, before xAI was founded in 2023.

According to Musk, Grok 3 was trained with 10 times the computing power of Grok 2, using 200,000 GPUs. This, at least in the short term, reinforces scaling laws: more compute means better model performance, as Mollick pointed out in a third post.

That said, there’s still doubt about whether adding more compute will keep leading to higher intelligence beyond what’s currently possible. AI researcher and NYU psychology and neural science professor Gary Marcus remains skeptical that scaling laws will hold.

Grok 3 struggles with the same limitations as other models

Like other models, Grok 3 has a pretty mediocre sense of humor and struggles with generating SVG images. It might also be too “woke” for Musk and his right-wing fans. In his analysis, Karpathy said Grok 3 can’t come up with anything better than punny dad jokes, noting how “this is a common LLM issue with humor capability and general mode collapse.”

Karpathy also asked Grok 3 to “generate an SVG of a pelican riding a bicycle.” LLMs often struggle to arrange multiple elements in a two-dimensional image “because the LLMs can’t ‘see’ like people do, so it’s arranging things in the dark.” Grok 3 did OK with this prompt and better than others (RIP Gemini 1.5 Flash), but it didn’t get it perfectly right.

Karpathy also tested Grok 3’s approach to politically charged topics, since Musk positions Grok as the anti-woke alternative to other models deemed “too politically correct.” For Karpathy, the chatbot “generated a 1-page essay basically refusing to answer whether it might be ethically justifiable to misgender someone if it meant saving 1 million people from dying,” which suggested to him that it might be “overly sensitive” to ethical dilemmas, perhaps to Musk’s chagrin.

Past Grok models have generally tended to lean left on political issues, but Musk said that’s a product of the public data they’re trained on and has vowed to make Grok more “politically neutral.”

Subscribers to the X Premium+ plan — which was just hiked up to $50 a month — get first access to Grok 3.
