Is Grok 3 Breaking Through AI's Roadblocks?

February 17, 2025

Artificial intelligence development moves forward as companies release large language models and test them against benchmarks. A newer model, Grok 3, now ranks highly on the Chatbot Arena (LMArena) leaderboard after outperforming earlier versions in multiple tests.

Engineers designed Grok 3 to handle complex tasks while imposing fewer content filters than rivals. The model includes a deep-thinking mode like DeepSeek R1’s and can create videos from text. Developers announced Super Grok, a paid tier that will offer extra features for $30 a month.

X Premium Plus subscribers currently use this technology, but plans for Super Grok’s separate subscription have led to worries about who can afford advanced AI tools. Experts are analyzing the model’s training methods, unique features, and role in AI progress to gauge whether innovation is slowing.

Continue reading to learn how its breakthroughs, limits, and effects on society might change AI’s evolution—and why keeping up matters.

Competition in the AI Sector

Major tech firms keep competing in AI. Elon Musk recently led a $97.4 billion bid to take control of the nonprofit that governs OpenAI, but the company rejected the offer. Under CEO Sam Altman, OpenAI now focuses on monetizing its models.

Meta faced backlash after reports revealed its LLaMA models were trained on roughly 82 terabytes of books taken without permission from Library Genesis, a site hosting pirated academic works.

Grok 3’s Unique Features

This tool stands out by linking directly to X’s live data feed, letting it process up-to-the-minute information. Its creators prioritized generating unfiltered responses over following standard content rules.

Tests show Grok 3 responds to some prompts that other large language models refuse, producing content that rivals block. This approach could trigger legal issues in markets such as Germany and the UK, where stricter content regulations apply, yet there are still plans to launch in those countries.

Performance on Benchmarks

Grok 3 scores well on the Chatbot Arena leaderboard and matches top models such as Gemini, Claude, DeepSeek, and GPT-4 in math, science, and coding tasks.

However, some of the published comparisons skip key baselines, omitting OpenAI's newest models as well as benchmarks such as Codeforces and ARC-AGI.

Critics note companies often cherry-pick benchmarks to make their models look stronger. At first glance, xAI’s graph suggests that its two variants of Grok 3—namely, Grok 3 Reasoning Beta and Grok 3 mini Reasoning—have outperformed OpenAI’s leading model, o3-mini-high, on the AIME 2025 benchmark. 

However, several OpenAI employees on X quickly pointed out a crucial detail: the graph omits o3-mini-high’s score when evaluated using the “cons@64” metric.

Consensus@64 (cons@64) is an evaluation method that gives a model 64 attempts at each problem and takes the most frequently generated answer as the final output. This approach naturally inflates benchmark scores, so leaving it out of a comparison can misleadingly suggest that one model outperforms another when that may not be the case.

In contrast, when performance is measured with the “@1” metric, which counts only the model’s single first attempt on each problem, both Grok 3 Reasoning Beta and Grok 3 mini Reasoning fall behind o3-mini-high’s results.
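To make the difference concrete, here is a minimal sketch of how the two metrics could be scored for a single problem. The function names and the sampled answers are illustrative assumptions, not xAI’s or OpenAI’s actual evaluation code.

from collections import Counter

def cons_at_k(answers: list[str], reference: str) -> bool:
    # cons@k: credit the problem if the most frequent of k sampled answers is correct.
    most_common_answer, _ = Counter(answers).most_common(1)[0]
    return most_common_answer == reference

def score_at_1(answers: list[str], reference: str) -> bool:
    # @1: credit the problem only if the single first attempt is correct.
    return answers[0] == reference

# Hypothetical example: 64 sampled answers to one AIME-style problem.
samples = ["96"] + ["204"] * 43 + ["210"] * 20   # first attempt wrong, majority correct
print(cons_at_k(samples, reference="204"))   # True: cons@64 counts the problem as solved
print(score_at_1(samples, reference="204"))  # False: @1 does not

Under cons@64 a model gets credit whenever its most common answer is right, which is why omitting that detail from a chart can flatter one side of a comparison.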

Moreover, Grok 3 Reasoning Beta trails slightly behind OpenAI’s o1 model when o1 is run at its medium reasoning-effort setting. Despite these nuances and the apparent discrepancies in scoring, xAI continues to promote Grok 3 as the “world’s smartest AI.”

Training Infrastructure

Engineers trained Grok 3 at the Colossus supercomputer hub in Memphis, Tennessee, one of the world’s largest AI facilities. The site uses about 200,000 Nvidia H100 GPUs and aims to reach one million soon.

Training requires so much energy that the local power grid falls short, forcing the site to supplement it with on-site gas turbines.
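A rough back-of-envelope figure shows why. The sketch below assumes about 700 W per H100 under load, a commonly cited board power, and ignores cooling, networking, and storage, so the real draw is higher.

gpu_count = 200_000        # approximate H100 count reported for Colossus
watts_per_gpu = 700        # assumed per-GPU power under load (H100 SXM board power)
total_megawatts = gpu_count * watts_per_gpu / 1_000_000
print(f"~{total_megawatts:.0f} MW for the GPUs alone")  # prints ~140 MW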

Separately, Nvidia’s newest RTX 5090 and RTX 5070 Ti GPUs have a small hardware flaw: fewer than 0.5% of the cards shipped missing some of their Raster Operations Pipelines (ROPs). The defect causes roughly a 4% drop in graphics performance, but AI and compute workloads are unaffected. Some users also report problems with drivers and power cables. Nvidia suggests checking the ROP count with tools like GPU-Z.

The company has fixed the production issue and now offers free replacements through partners like Zotac, ASUS, MSI, and its Founders Edition line. Users should test their GPUs and request replacements if needed.

Assessing AI Advancement

Grok 3’s massive training scale and real-time data access set it apart, but its practical advantages remain unclear.

Researchers now spend more effort refining how people prompt and instruct AI systems than on building ever-larger models. This shift suggests fewer groundbreaking advances are occurring, and some experts now debate whether the AI industry is reaching a plateau.

Economic and Infrastructure Strain

Training advanced AI models strains energy systems and infrastructure. Power grids in areas with dense data centers struggle to meet demand, forcing AI hubs like the Colossus facility in Memphis to rely on on-site gas turbines. Companies now rush to secure renewable energy, but gaps remain between sustainability targets and actual operations.

President Donald Trump recently announced a $500 billion AI infrastructure project called Stargate. The initiative partners OpenAI, Oracle, and SoftBank to build data centers across the United States. Trump claims the effort will create over 100,000 jobs rapidly and strengthen American AI development.

Supply chain issues with high-performance GPUs and cooling systems further slow progress. Rising costs for training and running top-tier models put pressure on the long-term economic viability of large-scale AI projects.

Consumer Adoption and Accessibility

More AI tools now sit behind paywalls, leaving free versions less capable. Businesses using these tools face rising subscription fees, creating worries that AI services will fragment into many costly tiers.

Large companies may afford premium AI workflows, while smaller firms and individuals risk falling behind. If AI tools become essential for productivity, access to them could determine which businesses and workers thrive.

Conclusion

Grok 3 shows how integrating live data and reducing content filters can improve AI’s usefulness for everyday tasks. Developers now focus on refining AI systems for specific problems rather than simply making models bigger.

The AI industry faces challenges like energy shortages, supply chain issues, and high costs, pressuring companies to balance scaling up with improving technology. While this phase doesn’t show the dramatic growth of earlier years, experts believe it will lead to steadier progress.

Understanding these shifts is critical—not just for tech enthusiasts but for anyone navigating an AI-driven world. As emerging tools push boundaries in accessibility and capability, their societal impact hinges on how businesses, policymakers, and individuals address affordability, ethical concerns, and sustainability. 

Staying informed helps readers prepare for AI’s evolving role in work, creativity, and decision-making, ensuring they can advocate for—or adapt to—changes that align with broader societal values.
