The End of the Scaling Era
When Sam Altman spoke at MIT in 2023 and said, "I think we're at the end of the era where it's going to be these, like, giant, giant models… We'll make them better in other ways" [1], he was signaling that scaling had reached saturation.
For years, the field had followed a simple recipe: make models bigger, train them on more data, and watch their capabilities grow. And for a while, that approach worked remarkably well. But the evidence from 2022 to 2025 shows that this scaling strategy is hitting its limits, reaching what I call the 'diminishing point'.
This isn't speculation; the signs are everywhere. The scaling laws described as early as 2020 already showed that loss falls only as a power law in parameters and data, so each additional order of magnitude of scale buys a smaller improvement than the last [2]. Since then, benchmark performance has begun to flatten, with each new generation of large models delivering less dramatic leaps than the one before.
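To make those diminishing returns concrete, here is a minimal Python sketch of the parameter scaling law reported by Kaplan et al. [2], L(N) ≈ (N_c / N)^α_N with α_N ≈ 0.076. The constants are the paper's fitted values, but the snippet is only illustrative, not a reproduction of their fits; its point is the shape of the curve: each tenfold increase in parameters shaves only a few percent off the predicted loss.

```python
# Illustrative sketch of the Kaplan et al. [2] parameter scaling law,
# L(N) ~ (N_c / N)^alpha_N, using the paper's fitted constants.
# The point is the shape of the curve, not the exact numbers.

ALPHA_N = 0.076   # fitted exponent for model size (Kaplan et al., 2020)
N_C = 8.8e13      # fitted constant, in non-embedding parameters

def loss(n_params: float) -> float:
    """Predicted cross-entropy loss for a model with n_params parameters."""
    return (N_C / n_params) ** ALPHA_N

previous = None
for n in [1e9, 1e10, 1e11, 1e12]:  # 1B -> 1T parameters
    l = loss(n)
    gain = f"({(1 - l / previous) * 100:.1f}% lower than 10x smaller)" if previous else ""
    print(f"{n:>8.0e} params: predicted loss ≈ {l:.3f} {gain}")
    previous = l
```

Running this shows the predicted loss improving by only a mid-teens percentage for every tenfold jump in parameters, which is exactly the flattening the benchmarks have started to reflect.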
Training costs for frontier models have soared from millions to hundreds of millions of dollars per run, with billion-dollar budgets on the horizon. This makes brute-force scaling unsustainable for all but a handful of companies at the very top.
The Data Bottleneck
In my opinion, the bottleneck is no longer algorithms or compute—it is data. Models like GPT-5, Claude, and Gemini already train on trillions of tokens. A 2024 analysis by Epoch AI estimated that the world's supply of high-quality English text—everything from books and articles to curated websites—could be exhausted by 2027–2028 [3].
"What remains is increasingly noisy, repetitive, or low-quality text. If you filter aggressively, you reduce the usable text. Either loosen the filters to include more data, but do so at the cost of quality (you might train junk data in), or keep quality high but run out of volume."
This creates a fundamental dilemma: we're running out of the high-quality data that has fueled the AI revolution. The internet is vast, but most of it is redundant, low-quality, or unsuitable for training sophisticated models.
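The tradeoff is easy to see in a toy filtering pass. Everything below is hypothetical: the "documents" are random (token_count, quality) pairs, and the quality value merely stands in for whatever perplexity- or classifier-based score a real pipeline would use. The only point it illustrates is that each notch of added strictness removes a large slice of the remaining volume.

```python
# Toy sketch of the quality-vs-volume tradeoff in pretraining data filtering.
# The corpus and quality scores are fake; quality here is a random number
# standing in for a real filter's score (e.g., perplexity or a classifier).

import random

random.seed(0)
# Pretend corpus: each "document" is just a (token_count, quality) pair.
corpus = [(random.randint(200, 2000), random.random()) for _ in range(100_000)]

def usable_tokens(min_quality: float) -> int:
    """Total tokens that survive filtering at a given quality threshold."""
    return sum(tokens for tokens, quality in corpus if quality >= min_quality)

total = usable_tokens(0.0)
for threshold in (0.0, 0.5, 0.8, 0.95):
    kept = usable_tokens(threshold)
    print(f"threshold {threshold:.2f}: {kept / total:6.1%} of tokens remain")
```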
Smaller Models, Smarter Approaches
And yet, smaller models are showing that size alone no longer decides performance. Berkeley's BAIR lab showed that "small" models can use high-quality data to compensate for a lack of quantity: its Koala model, a 13-billion-parameter open model fine-tuned from Meta's LLaMA on a carefully curated dialogue dataset, produced responses that users frequently rated as comparable to ChatGPT's [4].
Key Insight
The success of smaller, specialized models proves that the quality of training data and the sophistication of training approaches matter more than raw parameter count. This represents a fundamental shift in how we should think about AI development.
The Path Forward
So where does this leave us? I don't see an end to progress; rather, I see a turning point. The era of ever-larger models is coming to an end. Now, success comes from shifting from "more" to "smarter."
Foundation models give us breadth: strong general knowledge and fluent reasoning as a baseline. But real-world excellence now comes from:
- Fine-tuning on context-aware, expert-annotated datasets whose labels encode source, intent, constraints, and rationale
- Integration with retrieval and external tools to extend capabilities beyond what's stored in parameters
- Smarter architectures like sparsity and Mixture-of-Experts that achieve more with less (a minimal sketch follows this list)
- Targeted training strategies that turn capable generalists into reliable specialists
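As a concrete illustration of the "more with less" point, here is a minimal, hypothetical sketch of a sparse Mixture-of-Experts feed-forward layer with top-1 routing, written in PyTorch: a router scores every token and only the single best expert MLP runs for it, so the layer holds many parameters but spends only a fraction of them per token. The dimensions and expert count are arbitrary, and real MoE systems add load-balancing losses, capacity limits, and multi-expert routing that are omitted here.

```python
# Minimal sketch of a sparse Mixture-of-Experts feed-forward layer
# with top-1 routing. Sizes are arbitrary; real systems add load-balancing
# losses, capacity limits, and expert parallelism.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # scores each token per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Route each token to its single best expert.
        gate = F.softmax(self.router(x), dim=-1)      # (tokens, n_experts)
        weight, choice = gate.max(dim=-1)             # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = choice == i
            if mask.any():                            # only the chosen expert runs
                out[mask] = weight[mask, None] * expert(x[mask])
        return out

# Tiny smoke test: 8 tokens, 64-dim model, 4 experts; each token touches
# only one expert, roughly a quarter of the layer's expert compute.
moe = SparseMoE(d_model=64, d_hidden=256, n_experts=4)
print(moe(torch.randn(8, 64)).shape)  # torch.Size([8, 64])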
"The lesson is clear: the future won't be won by the biggest model, but by the teams with the best, most contextually labeled data and the craft to use it."
References
- [1] S. Levy, "OpenAI's CEO Says the Age of Giant AI Models Is Already Over," WIRED, Apr. 13, 2023. [Online]. Available: https://www.wired.com/story/openai-ceo-sam-altman-the-age-of-giant-ai-models-is-already-over/
- [2] J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei, "Scaling Laws for Neural Language Models," arXiv preprint arXiv:2001.08361, 2020.
- [3] P. Villalobos et al., "Will We Run Out of Data? Limits of LLM Scaling Based on Human-Generated Data," Epoch AI, Jun. 6, 2024. [Online]. Available: https://epochai.org/blog/will-we-run-out-of-data-limits-of-llm-scaling-based-on-human-generated-data
- [4] X. Geng, A. Gudibande, H. Liu, E. Wallace, P. Abbeel, S. Levine, and D. Song, "Koala: A Dialogue Model for Academic Research," The Berkeley Artificial Intelligence Research Blog, 2023. [Online]. Available: https://perma.cc/9HUC-K9KC