Since their public debut in 2022, large language models have become central to how people search for information. Instead of sifting through dozens of links, users can now receive neatly composed summaries synthesized from across the internet. The experience is fast, fluent, and convenient - but new experimental research from the Wharton School of the University of Pennsylvania and New Mexico State University suggests that this convenience may come at a cognitive cost.
In their paper, "Experimental Evidence of the Effects of Large Language Models versus Web Search on Depth of Learning," Shiri Melumad and Jin Ho Yun propose that when people rely on AI-generated syntheses instead of traditional web links, they may learn less deeply. Across seven experiments involving more than 10,400 participants, the researchers found consistent evidence that people who used LLM summaries developed shallower, less original knowledge than those who searched for and integrated information themselves.
The central idea behind their work is straightforward but profound: effort matters for learning. Traditional web search forces users to explore, compare, and assemble pieces of information from different sources. This process - called search-as-learning in cognitive science - promotes "sensemaking," where the learner gradually builds internal models of understanding. In contrast, LLMs perform much of that synthesis automatically, creating the illusion of completeness while skipping the steps that help cement knowledge.
To test this, the researchers first asked over a thousand participants to research a simple question - how to plant a vegetable garden - using either ChatGPT or Google Search. Those who used ChatGPT spent less time exploring and later reported that they learned fewer new things, felt less personal ownership of the knowledge they gained, and produced advice that was shorter, less factual, and more generic. Linguistic analysis using natural language processing confirmed these patterns: their written advice contained fewer unique terms, lower factual density, and less originality in phrasing.
In a second experiment, the team ensured that both groups saw identical information; the only difference was the format. One group received the information as a ChatGPT-style summary, while the other browsed it as a set of clickable web articles. Even with identical content, participants who read the AI summary felt they learned less and produced less detailed and less original advice. In short, how information is presented, not just what is presented, shapes the depth of learning.
A third experiment, conducted in a university lab, replicated the findings using Google's "AI Overview" - an integrated LLM summary that appears atop standard search results. Again, participants who learned through the AI summary reported lower comprehension and less investment in the task. Their written responses were shorter, referenced fewer facts, and showed lower semantic uniqueness, meaning they used more similar language to others in the same condition.
The final test extended the research to the real-world question of influence. In a follow-up study, a new group of participants - unaware of which platform produced which advice - rated the helpfulness, informativeness, and trustworthiness of the texts written by earlier participants. The result was clear: advice derived from LLM-based learning was consistently judged as less informative, less trustworthy, and less likely to be adopted.
Together, the seven experiments converge on a striking insight: AI summarization may make us feel smarter while quietly flattening our understanding. The researchers emphasize that this does not mean LLMs are harmful or incapable of supporting learning. Rather, it suggests that active engagement - the process of discovery and synthesis - remains essential for developing the kind of deep, flexible knowledge that can be applied creatively.
From an educational standpoint, these results resonate with what psychologists call desirable difficulty: learning that requires more cognitive effort often produces better long-term retention. Friction, in this sense, is not a flaw but a feature - a mechanism that forces the mind to process more deeply.
Melumad and Yun caution, however, that their findings apply primarily to conceptual learning - cases where understanding and originality matter - rather than to rote fact retrieval. If one simply needs a date, formula, or definition, an AI summary may suffice or even outperform manual search. But when the goal is to form nuanced insights or creative perspectives, skipping the "work of learning" can lead to intellectual shallowness.
The authors note that this effect persisted even when the LLM provided real-time web links, which participants rarely clicked. Once the synthesis is presented as complete, curiosity appears to diminish. This may signal a broader shift in human learning behavior: when the interface reduces uncertainty, it also reduces exploration, the engine of deep cognition.
From the standpoint of Seven Reflections' Dimensional Systems Architecture (DSA) framework, this phenomenon can be understood as a loss of recursive feedback within the cognitive field. Traditional learning engages multiple feedback loops - perception, evaluation, synthesis, revision - that sustain depth and originality. When an LLM collapses these loops into a single, linear output, the process of learning loses its self-correcting structure. In DSA terms, the system ceases to oscillate and begins to merely receive.
This insight reframes the question from "Is AI making us less intelligent?" to "What happens to cognition when the feedback architecture of learning is outsourced?" As Seven Reflections' DSA framework suggests, depth of thought depends not on the quantity of data but on the structural recursion of awareness itself. The future of human learning, then, may hinge on whether we choose to keep that loop open.