Allow me to introduce Laurent Giraid, a renowned technologist whose groundbreaking work in artificial intelligence has reshaped our understanding of machine learning and natural language processing. With a keen focus on the ethics of AI, Laurent has been at the forefront of exploring how reasoning models—a new generation of large language models—mimic human thought processes in fascinating ways. In this interview, we dive into the concept of the “cost of thinking,” the parallels between human and machine problem-solving, and the innovative methods behind measuring computational effort in AI. We also explore the surprising ways these models approach complex challenges and what this means for the future of intelligent systems.
Can you explain what reasoning models are and how they differ from earlier large language models?
Sure, reasoning models are a newer breed of large language models designed specifically to tackle complex tasks like math or intricate problem-solving, which older models often struggled with. Earlier systems relied heavily on recognizing language patterns to generate responses, but they’d often falter when faced with anything requiring step-by-step logic. Reasoning models, on the other hand, are trained to break problems down into smaller parts, working through them methodically, much like a human would. This shift in approach allows them to handle tasks that demand deeper thought, producing more accurate and reliable outcomes.
What challenges did older models face with tasks like math or complex reasoning, and how do reasoning models address these differently?
Older models had a tough time with math and complex reasoning because they were essentially guessing based on patterns in their training data rather than truly understanding the problem. For instance, they might spit out a plausible-looking answer to a math question that was completely wrong because they couldn’t perform actual calculations. Reasoning models address this by incorporating a stepwise process during training, often reinforced through rewards for correct answers. This allows them to explore the problem space, test different approaches, and arrive at solutions that are not just plausible but correct, even if it takes a bit longer.
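To make the reward idea concrete, here is a minimal sketch of an outcome-based reward of the kind used to reinforce correct final answers. The function name and the exact-match comparison are simplifications of my own, not the training recipe of any particular model.

```python
def outcome_reward(model_answer: str, reference_answer: str) -> float:
    """Return 1.0 if the model's final answer matches the reference, else 0.0.

    A deliberately simple stand-in: real pipelines normalize answers, use
    programmatic verifiers or graders, and feed this signal into a
    reinforcement-learning update rather than comparing raw strings.
    """
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0


# Example: two sampled solutions to the same math problem.
print(outcome_reward("408", "408"))   # 1.0 -> this reasoning path gets reinforced
print(outcome_reward("398", "408"))   # 0.0 -> this one does not
```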
Why do reasoning models need extra time to think through problems, and how does this compare to human thinking?
The extra time is necessary because reasoning models are simulating a deliberative process. They don’t just blurt out the first answer that comes to mind; they break the problem into manageable chunks, evaluate each step, and build toward a solution. This mirrors how humans often need time to think through a tough question—if you’re asked to solve a tricky puzzle on the spot, you’re likely to stumble unless you can mull it over. For these models, that extra time ensures they’re not just guessing but actually reasoning, which leads to better results.
How does breaking down a problem into steps help these models arrive at better answers, and what’s the trade-off with speed?
Breaking down a problem into steps allows the model to tackle each part systematically, reducing the chance of errors that come from trying to solve everything at once. It’s like working through a math equation: if you skip steps, you’re more likely to mess up. By focusing on smaller pieces, the model can check its logic along the way and adjust if needed. The trade-off, of course, is speed. This stepwise approach takes longer than the instant responses of older models, but since the answers are more accurate, most would argue it’s worth the wait.
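As a loose analogy (an illustration of my own, not the models' internals), compare answering in one jump with decomposing the same multiplication and checking each partial result along the way:

```python
# One-jump answer: a single unverified guess (plausible-looking but wrong).
one_jump_guess = 371

# Stepwise answer: decompose 23 * 17 and verify each partial result.
part_a = 23 * 10                          # 230
part_b = 23 * 7                           # 161
assert part_a == 230 and part_b == 161    # sanity checks along the way
total = part_a + part_b                   # 391, the correct product

print(one_jump_guess == total)            # False: the shortcut answer was off
print(f"23 * 17 = {total}")
```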
Your research highlights a similarity between the ‘cost of thinking’ for reasoning models and humans. Can you explain what this term means?
The ‘cost of thinking’ refers to the effort or resources required to solve a problem. For humans, this might mean the time and mental energy we spend puzzling something out. For reasoning models, it’s more about the computational effort—how many internal steps or processes they go through to reach an answer. What’s striking is that the types of problems that demand the most effort from humans, like certain logic puzzles, also require the most computational effort from these models. It’s a fascinating overlap that suggests some shared principles in how we approach challenges, even if the underlying mechanisms are very different.
How did you measure this cost for humans compared to the models, and what did you find?
For humans, we measured the cost by tracking response times down to the millisecond—basically, how long it took someone to solve a problem. For the models, we couldn’t just use processing time since that depends on hardware. Instead, we looked at ‘tokens,’ which are like internal units of thought the model generates as it works through a problem. What we found was a remarkable alignment: the harder a problem was for humans, as shown by longer response times, the more tokens the model generated to solve it. This pattern held across different problem types, suggesting a parallel in how effort scales with complexity.
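To illustrate the shape of that comparison (with made-up numbers, and a rank correlation as a stand-in for whatever statistics the actual study used), the analysis boils down to something like this:

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical data: per-problem human solve times (ms) and the number of
# reasoning tokens a model generated on the same problems. Illustrative only.
human_rt_ms  = np.array([ 850, 1200, 2600, 4300, 9800, 15200])
model_tokens = np.array([ 120,  180,  410,  760, 1900,  3100])

# Rank correlation: do problems that take humans longer also cost the model
# more tokens?
rho, p = spearmanr(human_rt_ms, model_tokens)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```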
Can you share more about the types of problems you tested on both humans and reasoning models, and which ones were the toughest?
We tested a variety of problems, ranging from basic arithmetic to more abstract tasks like the ARC challenge, where you have to infer a transformation rule from pairs of colored grids and apply it to a new grid. Both humans and models found arithmetic relatively easy, but the ARC challenge was by far the toughest. I think it’s because it requires a kind of intuitive reasoning and pattern recognition that’s not straightforward—it’s less about rote calculation and more about grasping an underlying concept, which demands a lot of mental or computational effort from both groups.
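For a flavor of what an ARC-style item demands, here is a toy example of my own, far simpler than the real benchmark: the solver has to infer a rule from an example grid pair and apply it to a new grid.

```python
import numpy as np

# Toy ARC-style task: the training pair recolors every 1 to 2; the rule has
# to be inferred from the example and then applied to an unseen grid.
train_input  = np.array([[1, 0],
                         [0, 1]])
train_output = np.array([[2, 0],
                         [0, 2]])

# Infer a per-color mapping from aligned cells of the training pair.
mapping = {int(src): int(dst)
           for src, dst in zip(train_input.ravel(), train_output.ravel())}

test_input = np.array([[0, 1],
                       [1, 1]])
test_output = np.vectorize(mapping.get)(test_input)
print(test_output)   # [[0 2]
                     #  [2 2]]
```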
You used ‘tokens’ to measure the effort of reasoning models. Can you explain what tokens are in simple terms and why they’re useful?
Tokens are essentially pieces of information that a model generates internally as it thinks through a problem. Think of them as little building blocks of thought—words, numbers, or symbols that the model uses to talk to itself while working out a solution. They’re not meant for the user to see; they’re just part of the model’s internal process. By counting tokens, we get a sense of how much computational effort the model is putting in. It’s a more reliable way to measure effort than something like processing time, which can vary based on the computer’s speed rather than the model’s actual workload.
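As a rough illustration (using OpenAI's tiktoken tokenizer as a stand-in; actual reasoning APIs typically report reasoning-token counts directly rather than exposing the trace), counting the tokens in a short reasoning trace looks like this:

```python
import tiktoken  # pip install tiktoken

# A short, human-readable stand-in for a model's internal reasoning trace.
reasoning_trace = (
    "First break 23 * 17 into 23 * 10 and 23 * 7. "
    "That gives 230 and 161, so the answer is 391."
)

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode(reasoning_trace)

print(len(tokens), "tokens")   # the effort proxy: how many tokens were generated
print(tokens[:8])              # token IDs, the model's internal vocabulary
print(enc.decode(tokens[:8]))  # the same IDs mapped back to text
```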
You’ve suggested that reasoning models might not use language to think, despite generating internal monologues. Can you unpack that idea for us?
Absolutely. While these models produce what look like internal monologues—strings of text or tokens as they reason through a problem—that doesn’t mean they’re thinking in language the way humans might. Often, the text they generate internally contains errors or nonsense, yet they still arrive at the correct answer. This suggests their actual thinking happens in a more abstract, non-linguistic space—a kind of conceptual framework that’s hard to pin down. It’s similar to how humans don’t always think in words; sometimes, we just have a gut sense or a mental image that guides us to a solution.
Looking ahead, what is your forecast for the future of reasoning models and their role in advancing our understanding of both AI and human cognition?
I’m really optimistic about where reasoning models are headed. I think they’ll continue to get better at handling even more complex tasks, potentially bridging gaps in areas like scientific discovery or personalized education by answering questions we haven’t even thought to ask yet. At the same time, studying these models will likely teach us a lot about human cognition—how we reason, where our blind spots are, and why we struggle with certain challenges. The convergence between human and machine thinking, even if it’s not by design, opens up exciting possibilities for collaboration between AI and neuroscience, potentially reshaping how we approach intelligence itself in the coming years.