Your content doesn’t need another tool — It needs intelligence that shows up

This is a long piece — around 17,000 words tracing a 25-year paradigm shift. If you’d rather start with the shape of the argument before committing to the full read, the podcast episode covers the key ideas in about 20 minutes, and the video presentation walks through the framework. Both are at the bottom of this post. If they catch your interest, the deep dive will be here when you’re ready.

Every tool you use demands the same thing: bring your content here, reshape it to fit, and maintain it alongside all the other copies you’re already juggling. We lose eighteen working days a year to this ritual. But a 25-year paradigm shift — from responsive design through Spotify to the protocol layer now dissolving the walls between AI systems — reveals that the problem was never the tools. It was the direction of travel. This post traces the inversion, proposes a practical framework for it, and confronts the risks of getting it wrong.

Don’t breathe on it

I remember the sensation before I remember the decade.
Early 1990s, Gothenburg. I was prototyping a medicine tablet package for a client competition — black-and-white drawings coloured with airbrush, a transparent film folded to look like the real thing. The film was too thick. It kept springing back flat. I worked two days without sleep. On presentation day, I sat in the room next door, frantically taping the film into place while my colleagues stalled the client for an hour and a half. Every time it held, another edge lifted. So sensitive you could barely breathe on it.

That feeling — the medium fighting back against everything you’re trying to make it do — never left me. It just moved.

It moved to the web. Transparent GIFs in HTML tables. Pixel-perfect layouts that collapsed if a single image failed to load. Netscape and Explorer rendering the same page differently. The same fragility, different material.

In 2000, John Allsopp named what I’d been feeling. His essay “A Dao of Web Design” argued that the web’s nature is fluidity, not fixity. We were treating it like a printed page. “The control which designers know in the print medium,” he wrote, “is simply a function of the limitation of the printed page.” The web doesn’t have those limitations. Stop fighting the water.

A decade later, Ethan Marcotte turned Allsopp’s philosophy into technique — responsive web design. The layout adapts to the screen. The first wave of letting go.

But the deeper fragility didn’t go away. It moved somewhere harder to see.

Think about your Tuesday afternoon. You have an idea during a meeting. You type it into a notes app. Later, you copy it into a document to develop it. You find useful research and save it somewhere else. You share a summary in Slack. Colleagues respond. You go back to the document, update it, hunt for the research you saved in a different tool, copy the relevant parts back.

The same idea now exists in four places. None of them is the “real” one. And you’ve spent twenty minutes not on the idea — on the logistics of carrying it between tools.

This is the modern version of breathing on the medicine package. The assumption baked into every piece of software: your content must come to the tool. Notion wants your thoughts in Notion’s structure. Confluence wants your documents shaped to its model. Every tool is a destination that requires your content to arrive, reshape itself, and stay.

Research suggests knowledge workers lose 144 hours per year — eighteen full working days — to rebuilding context after switching between tools. Not doing the work. Re-establishing where they were so they can start doing the work.

And now we’ve built an entire cottage industry to manage this fragility. “Second brains” — Notion databases, Obsidian vaults, Roam graphs. The pitch: one system to rule them all. But the second brain isn’t a solution. It’s a symptom. It exists because none of our tools talk to each other, so we build a meta-tool to manage the other tools. We add orchestration on top of orchestration.

I call this the Duplication Tax — every copy, every reformat, every manual sync. A gentle, persistent drain that accumulates until eighteen working days are gone and you can’t say where they went. I’ve written separately about what I call the Orchestration Load Framework — a way to name and measure the different cognitive loads that tool-switching places on us: the cost of learning each tool’s logic, the cost of maintaining awareness across systems, the cost of context lost in every handoff. The Duplication Tax is what those loads feel like in aggregate. The reason we can’t fix it is that we’ve never had language for it.

Same fragility that fought me in Gothenburg. Deeper layer. Invisible because we’ve been living inside it so long we think it’s normal. But a pattern has been quietly building over the past twenty-five years that suggests it doesn’t have to be.

The four waves of unbinding

The Duplication Tax isn’t a design failure. It’s a paradigm — content goes to tools. That’s just how it works. Except it’s not. Over the past twenty-five years, the relationship between content and its container has been inverting in waves — each one removing a binding that the previous wave took for granted.

Wave 1: The screen unbound (2000–2010)

Before responsive design, web layouts were fixed — typically 960 pixels wide. Anyone on a different device got a broken experience or a separate “mobile site.” Allsopp’s philosophy and Marcotte’s technique changed that: fluid grids, flexible images, media queries. The interface adapted to the device rather than the other way around.

What was unbound: the screen.

But the content still lived inside the website. The layout was fluid; the destination was fixed.

Wave 2: The device unbound (2010s–)

Spotify didn’t just digitise music — it made the service follow you. You’re at your desk, listening on the computer. You leave the office — your phone picks up where the desktop left off. You get in the car — the car audio takes over. The service is never interrupted; it continues in whatever context you find yourself. The service follows the user.

Microsoft Teams did the same for communication: start a call on the laptop, walk out the door, and the call transfers to your phone. Get in the car and the car’s system takes over. The session follows the person, not the hardware.

In 2017, I stood in front of my department and tried to articulate this shift. The framework I used came from Fjord (the futures research agency, later acquired by Accenture) — three approaches to how services relate to devices. I called them the Three Cs in my presentation:

  • Consistent: Same content across devices. Responsive design — one website fits all screens.
  • Continuous: The service flows between devices. Not replicated — taken over. Spotify. Teams. Start here, continue there.
  • Complementary: Different devices play different roles. Your phone authenticates your desktop banking. Each device contributes its strength without duplicating the others.

I ended with questions: “How will tomorrow’s multi-device ecosystem affect our roles? Do we need new methods, tools, ways of working?” I didn’t realise those weren’t just questions about devices. They were questions about a deeper structural shift I couldn’t yet see.

What was unbound: the device.

Wave 3: The ecosystem unbound (2024–)

In March 2025, OpenAI deprecated their proprietary Assistants API and adopted Anthropic’s Model Context Protocol. Then both companies donated MCP to the Linux Foundation.

Why abandon your own framework for a competitor’s open standard? Because MCP solves the N×M connectivity problem. Twenty AI models and fifty data sources would require a thousand custom integrations. MCP requires seventy standard connections. “USB-C for AI” — any intelligence connects to any data source without bespoke wiring.

Over 16,000 MCP servers. 97 million SDK downloads per month. And in Q1 2026, $285 billion wiped from software valuations. The “SaaSpocalypse.” The walled-garden model — build a tool, trap content inside it, charge rent — is cracking. Software is shifting from destination to substrate.
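
To make that concrete, here is a minimal sketch of an MCP server: a hypothetical “notes” server that exposes a local folder to any MCP-capable client. It assumes the official Python SDK (pip install mcp); the folder path and tool names are my own illustrations, not a prescribed design.

```python
# A minimal, hypothetical MCP server: it exposes a local notes folder
# so intelligence can come to the content instead of the reverse.
# Assumes the official Python SDK (`pip install mcp`).
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("notes")                 # server name shown to clients
NOTES_DIR = Path.home() / "notes"      # illustrative content location

@mcp.tool()
def list_notes() -> list[str]:
    """List the markdown files in the notes folder."""
    return sorted(p.name for p in NOTES_DIR.glob("*.md"))

@mcp.tool()
def read_note(name: str) -> str:
    """Return the full text of one note."""
    return (NOTES_DIR / name).read_text(encoding="utf-8")

if __name__ == "__main__":
    mcp.run()  # serves MCP over stdio by default
```

Any MCP-capable assistant can now reach those notes where they live: one standard connection per system, instead of one bespoke integration per tool pair. That is the N×M point in miniature.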

What’s unbound: the tool ecosystem.

Wave 4: The application unbound (emerging)

The operating system paradigm is shifting from procedural (you operate the tool) to intent-based (you state a goal) to ambient (intelligence is simply present wherever content lives). No application to open. No interface to learn.

Wave 1 freed the layout from the screen. Wave 2 freed the service from the device. Wave 3 freed intelligence from the tool. Wave 4 frees the user from the application itself. Each wave is the same structural signature applied deeper — and each makes the previous one look like an optimisation of the old binding rather than a true inversion.

What’s unbound: the application itself.

But here’s what the four waves reveal if you look at them together — not at what they freed, but at what they exposed.

Each wave removed a binding: the screen, the device, the ecosystem, the application. And each time, the same thing became visible underneath: the real work was never inside the tool. It was in the space between them. The handover. The context that had to be rebuilt every time you left one environment and arrived at another. The meaning that got lost in translation.

We’ve been staring at the tools — measuring them, optimising them, comparing their features — while the actual cost accumulated in the transitions. The Tuesday afternoon from Part 1, the 144 hours per year, the Duplication Tax — none of that happens inside Notion or Confluence or Jira. It happens in the gaps. When you leave one tool and arrive at another. When you reconstruct what you were thinking. When you reformat something that was already formatted perfectly well somewhere else.

Organisations measure what happens inside tools. They count tasks completed in Jira, documents created in SharePoint, messages sent in Slack. But nobody measures what happens between them — the ambient contextual work of carrying meaning across boundaries. And that, it turns out, is where most of the cognitive cost lives.

The four waves have been progressively revealing this. Responsive design showed that the layout wasn’t the point — the content was. Service mobility showed that the device wasn’t the point — the session was. Protocol interoperability is showing that the tool isn’t the point — the context is. And ambient intelligence will show that the application isn’t the point — the work is.

The same Three Cs that described how services relate to devices in 2017 now describe how intelligence relates to content. Consistent, Continuous, Complementary — just at a deeper layer. In Wave 2, the service followed the user. Now the question is broader: can intelligence move to where the content already lives — and carry back the results?

That question leads somewhere specific. And to get there, we need to reverse the direction of travel.

The inversion

Let me start with what it looks like in practice, because this is where the shift from tools to context becomes concrete.

I use Google NotebookLM as a source library — documents, research, notes organised by topic. Alongside it, a conversational AI connects to the Notebook as a source. When I ask the AI a question, it reaches into the Notebook, finds relevant material, synthesises it, and responds. No copying. No reformatting. No migration.

But here’s what makes this more than a clever integration: it’s not just the sources from NotebookLM that become available. It’s the functionality. The Notebook’s ability to structure knowledge, to surface connections between documents, to generate audio overviews — those capabilities are accessible from the conversational AI without rebuilding them. The AI doesn’t just read the Notebook’s data. It uses the Notebook’s strengths. Each tool operates in its own environment, contributing what it does best, while intelligence moves between them and carries the results back.

Now compare that with the old version of the same task. Gather research in one tool. Copy the relevant pieces into the AI chat. Get a response. Copy the response back. Need more context — go back to the research tool. Copy more. Paste again. I’m the integration layer. The human middleware carrying meaning between systems that can’t talk to each other.

The difference between these two workflows isn’t speed. It’s the direction of travel.

In the old workflow, content travels to where intelligence is. I carry my material to the tool, the tool processes it, I carry the result back. The context moves; the tool stays still.

In the new workflow, intelligence travels to where content is. My material stays where it lives. The AI arrives, does its work within that context, and the output stays alongside my existing files. The content stays still; the intelligence moves.

This reversal of direction is what I mean by the inversion. For decades, the default has been: content goes to tools. The inversion says: intelligence comes to content. It’s not a new feature added to existing tools. It’s a structural change in which direction things flow.

And the reason it matters — the reason it’s not just a technical rearrangement — goes back to what the four waves exposed. If the real work is in the context between tools, then every time you force content to travel to a tool, you’re creating that in-between work. You’re generating the handovers, the reformatting, the context loss. The old direction of travel creates the Duplication Tax. The inversion eliminates it — not by building a better tool, but by removing the need for content to travel at all.

This is the distinction that makes everything else in this post concrete: structural versus ad hoc.

When intelligence shares space with content — when the AI is present in the environment where your work already lives — that’s structural. The logic and the material coexist. There’s no gap between them for context to get lost in.

When you carry content to intelligence and carry results back — copy-pasting into ChatGPT, exporting to an analytics tool, uploading to a separate platform — that’s ad hoc. It might be faster than doing the work manually. But the architecture is the same as it ever was. Content migrates to the tool. The Duplication Tax applies. And the real work — the contextual, ambient work of carrying meaning across boundaries — remains entirely on you.

So how do you tell the difference? And more importantly — how do you move from ad hoc to structural?

The core test

At every decision point, one question clarifies everything:
Does this make AI structural — intelligence sharing space with content — or does it create a faster ad hoc workflow where content still travels to intelligence?

If the answer is ad hoc — if users still need to carry content to intelligence and carry results back — you haven’t inverted. You’ve optimised the old paradigm. You’ve built a better transparent GIF.

This is harder to apply than it sounds, because ad hoc can feel like progress. A team that uses ChatGPT to draft emails is getting real value. A designer who generates variations in Midjourney is working faster. A developer who pastes code into an AI for review is catching bugs earlier. None of these are bad. But none of them are structural. The content still goes to the tool. The value evaporates the moment the user stops manually carrying things between systems.

Structural integration looks different. It looks like intelligence already being present when you open the document. It looks like your file system being legible to an AI that can act within it. It looks like an agent that reads your project folder, understands the context, does its work, and saves the output alongside your existing files — without you ever leaving the environment where you were already thinking.

How to think about the inversion

If the paradigm shift is real — and the evidence across four waves suggests it is — then how do you actually make it happen? Not in the abstract, but in your team, your organisation, your Tuesday afternoon?

I’ve been trying out a practical framework for this, and it starts not with AI but with something much less glamorous: an inventory of what’s already connected to what. What surprised me was that the first useful thing wasn’t a new idea — it was looking at what was already there with different eyes.

1. Start with the integration landscape.

Before touching user journeys or workflows, map the existing integrations in your system. Most organisations have accumulated them over years — some essential, some legacy, some existing purely because content had to migrate between tools.

For each integration, ask which of three kinds it is:

  • Structural: these systems genuinely need to exchange data for a real business function.
  • Migration: this exists because content had to be moved to fit the tool’s data model.
  • Synchronisation: this keeps the same content consistent across multiple tools that each maintain their own copy.

Migration integrations are your primary inversion candidates. If intelligence can reach the content where it lives, the migration becomes unnecessary. Synchronisation integrations are secondary candidates — if there’s a single source of truth and intelligence can access it from anywhere, the sync layer dissolves. Structural integrations stay. And they become simpler, because they’re no longer carrying migration and sync overhead.

The validation is simple: does removing migration and sync integrations reduce overall system complexity? If yes, proceed. If the proposed change adds complexity on top of existing integrations, stop and reassess. The goal of the inversion is to reduce orchestration, not add another layer of it. Once I’d done this for my own setup, the next question was obvious: if these integrations exist because content has to travel, where exactly does it travel?
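
To show the shape of that first pass, here is a sketch of the audit in plain Python. The inventory entries are invented; the point is the three-way classification and the complexity check at the end.

```python
# A sketch of the step-1 audit. The inventory entries are invented
# examples; the classification and the final check are what matter.
from enum import Enum

class Kind(Enum):
    STRUCTURAL = "structural"   # real business function; keep
    MIGRATION = "migration"     # exists to reshape content for a tool
    SYNC = "synchronisation"    # keeps duplicate copies consistent

inventory = {
    "CRM -> billing": Kind.STRUCTURAL,
    "wiki -> intranet mirror": Kind.SYNC,
    "notes -> project-tool import": Kind.MIGRATION,
}

candidates = [n for n, k in inventory.items() if k is not Kind.STRUCTURAL]
remaining = len(inventory) - len(candidates)

# The validation from the text: the inverted system must be simpler.
print(f"{len(candidates)} inversion candidates; {remaining} integrations stay.")
assert remaining < len(inventory), "removals must reduce the connection count"
```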

2. Then trace the context path.

This is where service design earns its place. Use user journeys and service blueprints — the same tools we’ve used for years — but with a specific analytical lens. Remember: the service needs to follow the user’s context. At every touchpoint, ask: where does content move? Why does it move? What transforms when it moves? What’s duplicated? What breaks?

Don’t map at the task level — “the user creates a report.” Map at the activity level: the user gathers data in Tool A, copies it to Tool B, reformats it to fit Tool B’s structure, adds analysis in Tool C, exports to Tool D for review, receives feedback in Tool E, returns to Tool B to update. Each handoff is a migration point. Each migration point is a candidate for inversion. Each carries a cognitive cost that nobody is measuring but everyone is feeling.
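
A lightweight way to hold that activity-level map is as an ordered list of hops, each tagged with the context it happens in. A sketch with invented steps:

```python
# A sketch of an activity-level map: each hop is (tool, context).
# Transitions between contexts are the migration points to invert first.
path = [
    ("Tool A", "research"),
    ("Tool B", "authoring"),
    ("Tool C", "analysis"),
    ("Tool D", "review"),
    ("Tool B", "authoring"),
]

crossings = [
    (a_tool, b_tool)
    for (a_tool, a_ctx), (b_tool, b_ctx) in zip(path, path[1:])
    if a_ctx != b_ctx
]
print(f"{len(crossings)} high-friction crossings: {crossings}")
```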

Connect the migration points into a context path — the route content takes through the user’s actual process. Where does the path stay within a single context? That’s low friction. Where does it cross between contexts? That’s high friction. The high-friction crossings are your inversion priorities. This is where my 2017 presentation came back to me. The Three Cs I’d borrowed from Fjord to describe how services relate to devices — they turned out to describe something much broader.

3. Then classify what kind of inversion is needed.

This is where the Three Cs come back — not as a device strategy this time, but as an AI integration strategy.

  • Consistent: The same content needs to appear across multiple channels or surfaces. Currently it’s duplicated and reformatted for each one. In the inverted state, content stays structured in one place and intelligence presents it appropriately for each context. Think: product information maintained once, rendered differently on a website, a mobile app, an internal dashboard, and a partner portal.
  • Continuous: A process flows across devices, contexts, or time, and the user needs to pick up where they left off. Currently, users manually re-establish context when switching. In the inverted state, the session and context persist and transfer automatically. Think: starting a document review on the desktop, continuing annotation on a tablet during the commute, finalising approval on the phone. Same process, same state, different devices.
  • Complementary: Different tools contribute their specific strengths to the same workflow without duplication. Currently, users import and export between tools, maintaining parallel copies. In the inverted state, tools are linked by their capabilities, each operating in its own environment but connected. Think: a knowledge base connected to a conversational AI connected to a presentation tool — each doing what it does best, linked rather than merged.
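
One way to keep that classification honest is to force it through explicit questions in a fixed order. The helper below is a thinking aid built on my own assumptions about question order, not a real decision procedure:

```python
# A toy classifier for step 3. Deliberately crude: it encodes only the
# order in which to ask the questions, not the nuance of real systems.
def classify(distinct_tool_strengths: bool,
             session_must_transfer: bool,
             same_content_many_surfaces: bool) -> str:
    if distinct_tool_strengths:
        return "Complementary"   # link capabilities, don't merge content
    if session_must_transfer:
        return "Continuous"      # persist and hand over state
    if same_content_many_surfaces:
        return "Consistent"      # one source, many renderings
    return "No inversion needed here"

# The product-information example from above:
print(classify(False, False, True))  # -> Consistent
```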

Each C demands different technical architecture, different governance, and different design. Misclassifying a Complementary need as Consistent — trying to put everything in one place — recreates the monolithic problem. Misclassifying a Consistent need as Complementary — linking separate systems for content that should simply be unified — creates unnecessary complexity. Getting the classification right was the step that took me longest. Once I had it, the design question became surprisingly concrete.

4. Then design the inversion.

For each migration point that passes the checks, think in four layers:

  • What stays — the user’s content structure, naming conventions, organisational logic. This is the anchor. In the inverted paradigm, the user’s own structure IS the architecture. The AI adapts to it, not the reverse.
  • What arrives — the intelligence, the capability, the processing that currently requires a tool visit. In the inverted state, this arrives at the content’s location. Via an agent, an MCP connection, an embedded capability.
  • What connects — the protocol layer that makes the arrival possible. MCP, APIs, agent frameworks, authentication.
  • What’s governed — security, audit trails, data governance, compliance. Every connection is a surface. Every surface needs accountability.
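
Writing the four layers down per migration point keeps the design honest. Here is a hypothetical record for the copy-paste-to-AI workflow from earlier; every field value is illustrative:

```python
# A hypothetical design record for one migration point, organised by
# the four layers above. Values are invented for illustration.
inversion_design = {
    "migration_point": "research notes -> AI chat via manual copy-paste",
    "stays": {   # the user's structure is the anchor
        "location": "the project's existing research folder",
        "structure": "one file per source, named by topic",
    },
    "arrives": {  # the capability that used to require a tool visit
        "capability": "synthesis and question answering",
        "via": "an agent connected through a local MCP server",
    },
    "connects": {  # the protocol layer
        "protocol": "MCP over stdio",
        "auth": "local user account only",
    },
    "governed": {  # every connection is a surface
        "audit": "log every file the agent reads",
        "scope": "read-only; no network egress",
    },
}
```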

I can hear the enterprise architects reading this and thinking: it’s not that simple. And they’re right — decades of integration architecture have taught them how tangled these systems get. But that’s precisely the point. The outcome of this assessment isn’t adding another integration layer. It’s removing complexity. Every migration integration you eliminate is a connection that no longer needs maintaining, monitoring, or debugging. Every sync layer that dissolves is a source of truth that no longer conflicts with itself. The inversion succeeds when the system gets simpler, not when it gets more connected. If that framing resonates with the architects, the rest of the conversation gets much easier.

The maturity question

This isn’t an overnight transformation. It’s a spectrum, and most organisations will operate across multiple levels simultaneously.

At the most basic level — where most organisations are today — AI is an external consultation. Workers copy-paste to ChatGPT. Shadow AI runs on personal subscriptions. Intelligence is disconnected from content. This is Level 0: ad hoc.

One step up, some tools are linked. Intelligence can read content from select sources, but content still primarily lives within tool-specific ecosystems. Integration is partial. Level 1: connected.

Further along, content has a clear sovereign location. Intelligence arrives at content rather than content migrating to tools. Migration integrations are being eliminated. Level 2: context-first.

At the far end — and this is where things get philosophical — intelligence is present wherever content lives. The system anticipates needs. No application to visit. The user’s structure is the architecture. Purpose-built tools remain where governance requires them, connected to the ambient layer. Level 3: ambient.

The important insight is that Level 3 is not the goal for everything. Some processes should stay at Level 0 — deliberately purpose-built, because the domain demands constraints, expertise development, or regulatory compliance. The inversion framework isn’t a mandate to invert everything. It’s a lens for identifying where the inversion creates value and where it would destroy it. Which brings us to the honest objections.

The hollowed mind and other honest objections

I want to take the counterarguments seriously. Not as “things people who don’t understand will say,” but as genuine structural problems that limit where and how the inversion can be applied. Because if the paradigm is as powerful as I’ve been arguing, it’s also powerful enough to cause serious damage if applied without thinking.

The security surface

Every connection is an attack vector. When intelligence can reach into content wherever it lives — across file systems, databases, APIs, and services — the security surface expands dramatically. Prompt injection attacks have increased by 540 per cent since agentic AI architectures became mainstream. An MCP server that gives an AI agent access to your file system also gives any compromised prompt a path to your file system.

This isn’t a problem that will be solved by better passwords. The architecture itself creates a new category of vulnerability: one where the attack surface grows with every integration you add. The more connected the system, the more exposed it becomes.
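
Defences exist, but they have to be architectural rather than cosmetic. As one small example, continuing the hypothetical notes server sketched earlier: scope the connection before any prompt gets to act through it.

```python
# A sketch of scope-limiting for the hypothetical notes server: resolve
# every requested path and refuse anything outside the allowed root, so
# an injected prompt cannot walk the rest of the file system.
from pathlib import Path

ALLOWED_ROOT = (Path.home() / "notes").resolve()

def safe_read(name: str) -> str:
    target = (ALLOWED_ROOT / name).resolve()
    if not target.is_relative_to(ALLOWED_ROOT):   # blocks ../ traversal
        raise PermissionError(f"{name!r} is outside the allowed root")
    return target.read_text(encoding="utf-8")
```

A narrow scope does not solve prompt injection, but it bounds what a successful injection can reach.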

The governance gap

In the old paradigm — content goes to tool — at least the tool could enforce rules. Documents in SharePoint inherit SharePoint’s permissions. Data in a regulated database is governed by that database’s audit trail. When intelligence reaches into content across systems, the question becomes: whose rules apply?

If an AI agent reads a document from System A, combines it with data from System B, and generates output in System C — which system’s governance applies to the output? Who audits the reasoning? Where’s the accountability trail? Current agentic architectures have what researchers call the “ephemeral identity problem”: the agent acts on behalf of the user but doesn’t have a persistent identity in any of the systems it touches. It’s a ghost in the governance framework.

For industries with regulatory requirements — healthcare, finance, legal — this isn’t a philosophical concern. It’s a compliance barrier. And it’s one reason why some processes should remain purpose-built: not because the inversion wouldn’t work technically, but because the accountability architecture doesn’t yet exist.

The quality floor

Here’s an uncomfortable fact about large language models: calibrated models must hallucinate. This isn’t a bug that will be fixed. It’s a mathematical property of how probability distributions work in neural networks. A model that is well-calibrated — meaning it’s honest about its uncertainty — will necessarily generate some outputs that are wrong, because perfect calibration means occasionally saying high-confidence things that happen to be false.
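
One loose way to state the result, following Kalai and Vempala’s calibration argument with the fine print omitted, is as a lower bound:

```latex
% Loose form of the bound (after Kalai & Vempala), fine print omitted:
% a model calibrated on its training data must generate falsehoods at a
% rate bounded below by the share of facts it saw exactly once.
\[
  \text{hallucination rate} \;\gtrsim\; \text{monofact rate} \;-\; \text{miscalibration}
\]
```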

When AI is an external consultation — ad hoc, copy-paste — the user is the quality filter. You read the output, evaluate it, decide whether to use it. But when intelligence is ambient, present everywhere, integrated into the flow of content — who filters? If the AI’s output is saved alongside your files, mixed into your context, used as input for the next interaction, errors propagate invisibly. The quality floor isn’t the AI’s accuracy rate. It’s the compounding effect of small errors across an integrated system.

The hollowed mind

This objection is like a small stone in your shoe. You know it’s there. You know you should stop and deal with it. But you keep walking because everything else is moving so fast.

When tools require you to learn their logic — their data model, their constraints — that learning curve isn’t just friction. It’s education. A designer who masters Figma develops spatial reasoning. An analyst who learns SQL develops data modelling intuition. If intelligence handles the tool-work, what happens to that learning?

The OECD published data showing that 80 per cent of students who use AI writing tools cannot independently recall what they wrote. Not the phrasing — the ideas. The essay gets submitted, the grade comes back. But the thinking that was supposed to develop in the process? Hollow. The performance was rented, not earned.

But here’s where I think the objection deserves more nuance than it usually gets. We might be judging with old-paradigm criteria. The old paradigm valued deep mastery of specific tools and techniques because that’s how work got done — you had to know the tool to produce the output. In the inverted paradigm, the relationship between knowledge and work shifts.

What I need in my own practice isn’t deep mastery of every system I touch. It’s awareness — understanding how the systems work in principle, at the level that’s relevant to my work. Knowing enough to judge, to validate, to steer. The concern isn’t losing tool-specific skills. It’s losing the capacity for reasoning, for problem-solving, for the kind of thinking that underpins all the tools. That’s the real “hollowed mind” — not the loss of specific competencies, but the erosion of the cognitive foundation those competencies were built on.

And here the data is concerning. AI-assisted teams complete tasks faster, but the quality of reasoning doesn’t scale with the speed. Weekly metrics improve. Quarterly innovation doesn’t. The thing that degrades — the slow accumulation of judgement and expertise — is exactly the kind of thing that doesn’t show up in a dashboard. If knowledge isn’t practised in experience, the investment in deep learning becomes hollow. The question is how much depth you need, and in what — and that calculation is different in a world where intelligence can arrive at your content.

The sovereignty trap

The final objection is perhaps the most subtle. If intelligence is truly ambient — present wherever content lives, anticipating needs, acting proactively — then who is directing whom?

The paradigm promises cognitive sovereignty: your structure remains the architecture, intelligence adapts to you. But ambient systems have a way of shaping the context they inhabit. Recommendation algorithms were supposed to help you find what you wanted. Instead, they shaped what you wanted. Social media platforms were supposed to connect you with your friends. Instead, they redefined what friendship looks like.

When intelligence follows content — when it’s always there, always helpful, always suggesting the next step — the question isn’t whether it’s useful. The question is whether your choices are still your own, or whether you’re navigating a landscape that the intelligence has quietly reshaped around you.

None of these objections invalidate the inversion. But they define its boundaries.

The security surface means the connection layer needs genuine architectural attention. The governance gap means some domains need purpose-built tools with explicit accountability. The quality floor means humans must remain in evaluation loops, not just execution loops. The hollowed mind — the erosion of the cognitive foundation beneath our tools, the thinking and reasoning capacity that no AI can rent back to us — means we need to pay deliberate attention to cognitive sovereignty: the ability to think, judge, and decide independently of the systems we use. Not as a vague aspiration, but as something we actively design for — with structured thinking modes, with ways of measuring what’s being gained and what’s being lost, with systems that keep the human cognitively engaged even when the AI could do it all. And the sovereignty trap means the inverted paradigm needs something that pure ambient intelligence doesn’t naturally provide.

It needs friction. But not the old friction — not the duplication tax, not the context-switching. A different kind.

Designed friction and the question that remains

Let me return to the water metaphor one last time.

Allsopp was right in 2000: the web’s nature is fluidity. Fighting it creates fragility. But water without banks isn’t a river — it’s a flood. The strength of water isn’t that it flows everywhere. It’s that it flows between things. The banks give the river its direction, its force, its usefulness. Without them, you just have a swamp.

The inversion I’ve been describing — intelligence following content, the four waves of unbinding, the dissolution of the tool-as-destination — is the water finding its natural flow. After twenty-five years of trying to build rigid containers for something that wants to be fluid, we’re finally beginning to let go. MCP is the riverbed. Ambient intelligence is the current. The content stays where it is, and intelligence flows to it.

But the honest objections in Part 4 are the banks.

If we let intelligence flow without any friction — without any point where the human must stop, evaluate, decide, reckon with the material — we get the hollowed mind. We get the productivity mirage. We get the sovereignty trap. We get fast water and no direction.

What the inverted paradigm needs isn’t the old friction. Not the friction of carrying content between tools, of rebuilding context, of learning seventeen different interfaces for what is essentially the same task. That was waste friction — the kind that drains your eighteen working days and leaves nothing behind. The inversion rightly eliminates it.

What it needs is designed friction.

Deliberate, intentional moments where the human must engage cognitively with the material. Not because the system is badly designed, but because the engagement itself is the value. Moments where you evaluate the AI’s output rather than accepting it. Where you make a decision rather than following a suggestion. Where you structure your own thinking rather than letting the ambient system structure it for you. Where the tool asks you a question instead of providing an answer.

This is a design challenge, not a technology challenge. And it’s the design challenge of the next decade.

Because here’s the paradox at the heart of the inverted paradigm: the same unbinding that frees us from waste friction also removes the incidental friction that was quietly training us. The struggle with the tool that taught us the skill. The reformatting that forced us to re-engage with the material. The context switch that made us notice what we’d been taking for granted. Some of that friction was productive — not because it was well-designed, but because it was there. Learning happened in the cracks.

In the ambient world, the cracks close. Intelligence is seamless, present, anticipatory. The question is whether we can design new cracks — intentional ones, productive ones — that preserve human agency within a system that’s optimised to remove the need for it.

I don’t have a complete answer to this. But I’ve been building toward one.

Over the past two years, this question — how do you keep humans cognitively sovereign inside ambient intelligence? — has become the central thread of my work. It’s led me to develop structured thinking modes for navigating complex problems, ways to measure the hidden orchestration costs that the old paradigm made invisible, and methods for identifying what goes missing when we design systems without accounting for human behaviour. Not as abstract theory, but as practical tools I use every day — and that I’m building into something larger.

The specifics of that work are for future posts. But the foundation is what this post has been about: understanding why the paradigm is inverting, seeing that the shift is from tools to context, recognising that the inversion eliminates waste friction but doesn’t automatically preserve the cognitive engagement we need. The designed friction isn’t an afterthought. It’s the entire design challenge that follows from the inversion.

When production is free, judgement is the work

When the AI can generate the output in seconds, the value shifts from production to judgement. And judgement requires friction. Not the accidental friction of broken tools, but the designed friction of intentional cognitive engagement.

But the shift goes deeper than production versus judgement. What’s actually changing is the direction of focus itself. We’ve built our entire working culture around output — measuring it, optimising it, celebrating it. AI accelerated that focus: faster drafts, more variations, instant results. But when output becomes abundant, it stops being the scarce resource. What becomes scarce is the quality of what goes in. The structuring of sources. The governance of what enters the system. The judgement about which inputs matter and how they should flow through the process. The paradigm isn’t just inverting from tools to context — it’s inverting from output focus to input focus. And our role shifts with it: from ensuring production quality to ensuring input quality, from making things to judging what goes in and how it progresses through to the result. That’s the shift nobody’s talking about yet.

The paradigm shift is happening regardless of whether we name it or not. The data is there. The adoption curves are there. The $285 billion in evaporated software valuations is there. Content used to go to tools. Intelligence is coming to content. The four waves have been building toward this for twenty-five years, and the direction isn’t going to reverse.

But how we inhabit it — that’s the design challenge. Not the technology, not the protocols, not the architecture. The question is whether we can build the banks that give the river its direction: the deliberate cognitive engagement that keeps human agency intact inside the flow.

In Gothenburg in the 1990s, I learned what happens when you fight the medium. The medium wins. The paradigm that Allsopp named and Marcotte codified has been unbinding content from its containers for a quarter of a century, and we’re finally approaching the last binding — the application itself.

We can stop breathing carefully around our workflows. That fragility is ending. But what comes next requires something we’ve never had to design before: the right friction, in the right places, to ensure that the intelligence arriving at our content actually helps us think better — rather than quietly making our own thinking unnecessary.

Disclaimer

AI-assisted content: This post was researched and co-authored with Claude (Anthropic), with additional deep research conducted via Google Gemini and Perplexity AI. The personal experiences, frameworks, and analysis are the author’s own. AI tools were used for literature synthesis, source discovery, counterargument generation, and drafting support.

Opinion note: This is a personal exploration blog. The views, frameworks, and interpretations expressed here are my own, grounded in over twenty years of UX practice but not representing any organisation or institution.

Source attribution: Research draws on peer-reviewed papers, industry reports (OECD, Deloitte, McKinsey), technical documentation (Anthropic, Linux Foundation, GitHub), and historical web design literature. Key sources are listed below.

Research & sources:

Historical & philosophical

MCP & protocol architecture

Data architecture & compute inversion

Ambient computing & OS evolution

AI usage, workforce & education

Composable architecture

Agentic AI & enterprise

Related Stimulus content

THE STIMULUS EFFECT | Podcasts

You can listen to the Stimulus Effect podcasts on Spotify.

THE STIMULUS EFFECT | Videocasts

The AI literacy paradox — The real leap is when we let go of AI as a tool

When a technology shift is small, the existing mental models stretch to accommodate it. When the shift is categorical, everything cracks. AI isn’t exposing a skills gap. It’s exposing a paradigm gap — between what we think we’re working with and what we’re actually working with. And for the first time, the turbulence is strong enough to trace exactly where the cracks run.

The blueprint nobody questions

For decades, every significant technology that entered the workplace followed the same pattern. A new capability arrived. We learned its features. We measured proficiency by how well people operated it. We built training programmes around it, certification levels for it, organisational structures to support it. And then we moved on to the next one.

This worked. It worked because the technologies were tools. Word processors, spreadsheets, CRM systems, design software, project management platforms — they all shared a fundamental characteristic: you operated them. You put something in. You got something out. The relationship was one-directional. The intelligence lived in the person; the tool executed.

Over those decades, we didn’t just learn the tools. We built a mental model about what technology is in the workplace. A deep, unexamined assumption: new technology means new features to learn, new interfaces to master, new skills to certify. The model became so natural it stopped being visible. Like water to fish.

AI entered the workplace through that same mental model. And at first, it fitted. Early encounters with ChatGPT really were tool-like — you typed a prompt, you got an output. The mental model held. It felt accurate because, for that moment, it was.

The AI literacy frameworks that followed encoded this model faithfully. Researchers examining 16 major AI literacy instruments found that every single one measures operational sophistication — your ability to use AI as a tool. Can you prompt effectively? Can you navigate platforms? Can you integrate AI outputs into your workflow? Thirteen of the sixteen are pure self-report. And not one measures what might be the most consequential dimension: whether you understand what AI changes about your work. Not what it can do. What it changes.

That’s not a gap in the research. That’s the mental model reproducing itself. The instruments measure what the model says matters: tool proficiency. The model says tool proficiency is what matters because that’s what the instruments measure. The loop is already running before anyone notices it’s a loop.

But here’s the number that should stop us. McKinsey’s 2025 research found that 88% of organisations now regularly use AI. Only 5–6% capture meaningful enterprise-level value. Sit with that for a moment. Nearly nine out of ten organisations have adopted AI. Fewer than one in twenty have figured out how to make it actually matter. McKinsey calls this “adoption without absorption” — deploying a technology without metabolising it into how the organisation thinks and works.

That’s not a small gap. That’s a chasm. And it’s telling us something fundamental about the nature of the problem. If this were a skills gap, training would close it. If this were a tool gap, better tools would close it. If this were an awareness gap, information would close it. But 88% adoption means the awareness is there, the tools are there, the access is there. And still — 5%. Something structural is preventing the translation from adoption to value. Something that operates below the level of tools, training, and intention.

That something is the mental model itself.

Why this one is different

Every previous technology fit into the category “tool” because it was a tool. AI doesn’t fit — and the cognitive science tells us precisely why the mismatch is so hard to see.

Michelene Chi’s research on conceptual change identifies three types, each progressively harder. The easiest is belief revision — updating a fact (“this model is more capable than I thought”). The middle is mental model transformation — restructuring your understanding (“AI can do more than I assumed”). The hardest, and the one that applies here, is categorical shift: moving a concept from one fundamental category to another entirely.

AI is migrating from what Chi calls a “direct process” entity — linear, deterministic, controlled by the user — to an “emergent process” entity — non-deterministic, adaptive, capable of initiative. This isn’t an upgrade within the tool category. It’s a departure from it. And categorical shifts are, according to Chi’s research, the most resistant form of conceptual change there is.

Why so resistant? Because the existing category doesn’t passively wait to be replaced. It actively filters incoming evidence.

Dedre Gentner’s structure-mapping theory shows the mechanism. The “AI = software tool” analogy maps surface features accurately — there’s an interface, there’s input and output, there’s a screen. These surface similarities make the analogy feel correct. But it fails at the relational level. The relationship between you and AI is no longer the same as between you and a software tool. When AI behaves unexpectedly, instead of questioning the category, people revise their beliefs within it: “this tool has bugs,” “the output isn’t reliable,” “it needs better training data.” The category is preserved. The anomaly is absorbed. The shift never happens.

Stella Vosniadou’s research on conceptual change adds the next layer. When people encounter observations genuinely incompatible with their paradigm, they don’t immediately shift. They construct synthetic mental models — hybrids that blend new evidence with the old framework. For AI, this sounds like: “AI is a very powerful, somewhat unpredictable application that I control through prompts.”

That sentence feels reasonable. It probably describes your own working model, or something close to it. And that’s the danger — it’s coherent enough to feel like understanding while preventing the deeper realisation that the relationship itself has changed. From command to collaboration. From operating to orchestrating. From tool to something we don’t yet have a settled word for.

This pattern has a physical precedent that makes it easier to see. When the automobile arrived at the turn of the twentieth century, the first designs were horse carriages without horses — literally called “horseless carriages.” The driver sat high up where the coachman had sat, positioned to see over horses that weren’t there. The wheels were wooden and spoked, designed for horse speed. The suspension was built for trotting, not engines. One 1899 design — the “Horsey Horseless” by Uriah Smith — even bolted a carved wooden horse head to the front to avoid frightening real horses on the road. Every design choice came from the old category applied to the new reality. It took only about five years before the most advanced designs escaped the carriage entirely and became something genuinely new. But during that transition, the inherited mental model shaped everything that got built — including the parts that made no sense for what the technology actually was.

We’re in the horseless carriage phase of AI. The LinkedIn skill lists, the tool-proficiency frameworks, the feature-based maturity models — these are the wooden wheels and the coachman’s seat. They map the old category onto the new reality with enough surface accuracy to feel right. But they encode assumptions that belong to the previous paradigm: that the relationship is one-directional, that proficiency means operation, that literacy means knowing what buttons to press. The carved horse head on the front.

This is where AI parts company with every technology adoption that came before it. We’ve never had to make a categorical shift about what workplace technology is. Every previous adoption was an upgrade within the same category. This one requires leaving the category altogether. And the mental model we’ve spent decades building — the one that served us perfectly for every previous technology — is now the primary obstacle.

It has never been this clear. Small shifts don’t produce visible turbulence. When the change is incremental — a new version of software, a better interface, additional features — the existing mental model stretches to accommodate it without cracking. But when the shift is categorical, when it requires leaving one ontological class for another, the cracks appear everywhere. In how individuals think. In how organisations structure. In how we measure progress. And for the first time, those cracks are visible enough to trace — from the individual all the way through the organisation and back.

The loop that locks itself

Here’s where the research reveals something that hasn’t been this traceable before.

The individual’s frozen mental model doesn’t just sit quietly inside one person’s head. It shapes what they propose, what they evaluate, what they consider possible. And it enters the organisation through every meeting, every strategy document, every portfolio decision, every training programme designed by people operating from the tool paradigm.

But it doesn’t stop there. The organisation — whose structures, measurement systems, and decision frameworks were built by people with the same mental model — mirrors the freeze back. Your proposal gets evaluated through tool-paradigm criteria. Your maturity gets assessed against tool-paradigm frameworks. Your environment confirms: yes, AI is a tool. You are right to think of it that way. Here is your Level 3 score.

The individual sees confirmation. The loop tightens. And this is where the research gets genuinely new — because we can now trace the mechanisms at each stage of the loop, name them, and see why they’re so resistant to intervention.

The individual mechanisms

In 2008, Merim Bilalic and colleagues ran a study that should be required reading for anyone leading AI transformation. They put chess masters in front of problems with both a familiar solution and a better unfamiliar one, then tracked their eye movements.

The chess masters’ gaze continued fixating on features of the familiar solution while they claimed to be searching for alternatives. Performance dropped three standard deviations below normal. The bias — called the Einstellung Effect — doesn’t operate through conscious choice. It operates through attentional allocation. The first schema activated by familiar features literally directs where your eyes go. Better solutions become invisible. Not metaphorically. Literally.

Now apply this to every AI evaluation meeting you’ve ever sat in. The senior enterprise architect with decades of valuable experience evaluates AI agents through software criteria — determinism, auditability, latency — finds them wanting on those criteria, and concludes applications are still the right choice. The evaluation feels rigorous. The criteria feel objective. And the lens is wrong without anyone being able to see that it’s wrong — because the Einstellung Effect controls where their attention goes.

Nęcka, Gruszka, and Orzechowski’s research adds a critical finding: experts resist rigidity within their own domain (they are hard to mislead on familiar ground) but show heightened rigidity across domains (they perform worse than non-experts when the fundamental rules change). The shift from tool-use to AI collaboration is precisely an inter-domain shift — and expertise in the old domain makes it harder, not easier, to navigate.

Erik Dane’s work on cognitive entrenchment explains why this doesn’t feel like inflexibility from the inside. Deep expertise offers “perceived optimal efficiency” — the entrenched individual stays with familiar patterns because they minimise cognitive load and maintain the feeling of competence. Entrenchment feels like competence. The expert doesn’t experience themselves as stuck. They experience themselves as experienced.

And then there’s what happens when someone has made the shift and tries to bring it into the room. Research by Ackerhans and Wehkamp on medical professionals found that the loss of autonomy in decision-making drives resistance more than fear of replacement. The primary psychological mechanism isn’t “AI will take my job.” It’s “AI will take my agency in my own work.” When you propose a fundamentally different way of working with AI — collaboration instead of operation — you’re not just suggesting a new tool. You’re implicitly suggesting that the expert’s model of their own role needs updating. That triggers identity threat at four distinct levels: self-esteem (“Am I still valuable?”), self-efficacy (“Can I remain effective?”), continuity (“How do I maintain my identity through this?”), and distinction (“What makes me uniquely human?”).

No wonder the proposal gets dismissed.

The organisational amplification

If these were just individual cognitive patterns, training might address them. But individual patterns don’t persist in isolation. They persist because the social environment rewards them and the organisational structure sustains them.

A field experiment with 450 workers revealed something counterintuitive: when AI use is visible to evaluators, people reduce their reliance on AI by 14%. Accuracy declined by 3.4% — suppressing AI use made them worse at the task — but the social calculus made the suppression rational. Being seen using AI carries risk. So people who’ve made the categorical shift retreat into private experimentation and perform as tool operators in public.

This connects to a pattern confirmed across 14 countries by the Behavioural Insights Team: stated acceptance of AI is systematically more positive than actual behavioural adoption. People say they’re ready. Their behaviour says otherwise. The gap between stated acceptance and behavioural adoption tells us something important about the environment, not the individuals.

And this is where the social cognitive biases enter — not as individual quirks but as structural load-bearing mechanisms that keep the loop running.

Groupthink validates frozen mental models: “We all agree AI isn’t mature enough for that.” The consensus makes each individual’s Snapshot Freeze feel like shared knowledge rather than shared limitation.

Conformity bias makes the person who sees the shift feel like the problem: proposing something outside the group’s paradigm doesn’t feel like offering insight — it feels like breaking social contract.

Authority bias amplifies the Expertise Shield: senior people’s outdated models carry more weight precisely because of their seniority, regardless of whether their models are current.

Status quo bias operates at group level: existing tools, existing processes, existing portfolio decisions are “proven.” Change requires justification; staying the same does not.

Adolfo Carreno’s research on organisational immune systems describes what happens when these biases institutionalise. Resistance to change functions like a biological immune response — operating through pattern recognition, learned responses, and selective memory to protect organisational stability. When defensive responses harden into what Carreno calls “immunity memory,” the organisation begins to misidentify productive novelty as pathogen. Innovation gets neutralised not because it doesn’t work, but because it’s perceived as instability threatening homeostasis.

Chris Argyris identified the mechanism that holds it all together: organisational defensive routines. Recent research by Yang, Secchi, and Homberg maps four manifestations: rigidity (resistance to changing established procedures), embarrassment avoidance (suppression of critical doubts), cover-up (concealing mistakes or using intentional vagueness), and pretense (acting as if the official strategy is functional when everyone quietly knows it isn’t).

The result is what Argyris called “the undiscussable” — patterns that cannot be discussed without threatening social belonging. Everyone in the room knows the official AI picture doesn’t quite match ground reality. Saying so has costs. Not saying so is safer. The gap between official story and lived experience normalises until it becomes the water everyone swims in.

And here’s the insight that changes the frame entirely: these biases aren’t bugs. They’re adaptive collective coping mechanisms. When measurement tools show green and daily experience shows friction, the human brain has to resolve that dissonance. Social biases are how groups resolve it collectively. They’re the immune system of the status quo — and they serve a protective function. Attacking the biases without addressing the structural conditions that make them necessary produces anxiety, not progress.

Meyer and Rowan described this dynamic as decoupling — organisations deliberately adopt formal policies that satisfy external stakeholders while buffering their internal core from actual disruption. Formal AI strategy documents live on one track. How people actually work lives on another. The organisation maintains what researchers call “dual consciousness” — two simultaneous operating realities. This isn’t pathological. It’s structural. And it’s the mechanism by which organisations absorb the pressure of transformation without actually transforming.

The loop is now complete. Individual mental model → organisational structures → social dynamics that protect the structures → confirmation signals back to the individual. Each component sustains the others. And the loop operates at every scale — between individuals in a team, between teams in an organisation, between organisations in an industry, between institutions and the sectors they serve. Same mechanisms. Different boundaries. Same result: the categorical shift that needs to happen gets absorbed, buffered, and neutralised.

This is not a dark picture. It’s a human one. These are coping mechanisms for genuine structural incoherence, not character flaws. But seeing them clearly — seeing the full loop — is the first step toward intervening in it. Because you can’t change a system you can’t see.

What the mirror shows you

Existing AI literacy frameworks share a structural flaw that the loop makes visible: they only face upward. They describe aspiration — here’s Level 1, here’s Level 5, here’s the path between them. They assume linear progression. And they measure the individual in isolation, as if the environment doesn’t determine what’s possible.

The research tells us this is like a doctor who can only describe health but can’t diagnose illness. “Here’s what Level 4 looks like. You’re not there yet.” But no explanation of what’s preventing movement. No distinction between someone who lacks capability and someone who lacks the conditions to demonstrate capability. No recognition that the environment might be the constraint, not the individual.

What would it look like to build a diagnostic that captures both sides?

I’ve been developing something I call the Dual-Perspective AI Literacy Model. It starts with a simple observation: your effective capability isn’t determined by your level alone. It’s determined by the relationship between your level and your environment’s level.

Two gauges, not one

Imagine two gauges side by side. The left shows your AI literacy — your personal understanding, practice, and flexibility with AI. The right shows your environment’s collective mental model — the paradigm operating in your team, your organisation, your industry.

When the readings align, there’s no friction. Both at Level 2? Everything feels functional. You’re productive, valued, making sense to others.

When you’re ahead of your environment, the gap becomes friction. Your proposals can’t be parsed by the paradigm doing the evaluating. At one level apart: mild frustration. At two or three: structural disconnection. Your effective impact is constrained to the lower reading — a Level 4 practitioner in a Level 1 environment operates at Level 1 in that context.

When your environment is ahead of you, you feel something different: things moving too fast, conversations in a language you don’t quite speak, a vague sense that the ground rules changed while you weren’t looking.
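
To make the constraint concrete, here is a minimal sketch of the two-gauge logic in Python. The function names, thresholds, and friction wording are mine and purely illustrative; the model itself is qualitative, not a published instrument.

```python
def effective_level(individual: int, environment: int) -> int:
    """Effective capability is constrained to the lower of the two gauges."""
    return min(individual, environment)


def friction(individual: int, environment: int) -> str:
    """Describe the friction produced by the gap between the gauges."""
    gap = individual - environment
    if gap == 0:
        return "aligned: no friction, everything feels functional"
    if gap > 0:
        # You are ahead of your environment.
        return "mild frustration" if gap == 1 else "structural disconnection"
    # Your environment is ahead of you.
    return "the ground rules changed while you weren't looking"


# A Level 4 practitioner in a Level 1 environment operates at Level 1:
print(effective_level(4, 1))  # -> 1
print(friction(4, 1))         # -> structural disconnection
```

The point of the sketch is the min(): no amount of individual capability raises the effective reading above what the environment can parse.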

What the organisation mirrors back

The dual gauge shows the distance. But to understand what you’ll actually encounter, you need to see what the organisation reflects back at each level — including the social cognitive artifacts that maintain the freeze.

Level 1 — The Frozen Frame. The mental model of AI was formed from early encounters and cached as settled knowledge. Chi’s ontological mismatch is operating invisibly — the category was set, everything since has been filtered through it. Groupthink validates: “We all agree AI isn’t ready for serious work.” There’s no felt friction because you’re producing the current, not swimming against it. The organisation mirrors back confirmation: “We’re being appropriately cautious.”

Level 2 — The Tool Operator. The comfort zone. The organisation rewards tool proficiency, and you are genuinely proficient. The Expertise Shield is forming — AI is “a tool I operate with skill,” evaluated by speed, accuracy, reliability. Dane’s cognitive entrenchment is active, and it feels like competence. Someone proposing AI as a “collaborator” sounds impractical from here. Bandwagon effect normalises the position: “Nobody else is doing this differently either.” The mirror says: “You’re doing well. Keep developing your skills.”

Level 3 — The Pattern Recogniser. The gap opens. You’ve crossed what Chi would call the ontological threshold — you’ve begun experiencing AI as a different kind of entity. Your environment hasn’t. The same colleagues who felt like peers now feel like barriers. The Einstellung Effect is visible to you, operating in others. Conformity pressure pushes back toward Level 2 — the 14% visibility retreat is rational here. You use AI as a collaborator privately and perform as a tool operator in public. The mirror says: “You’re overthinking this. Just focus on what works.”

Level 4 — The Paradigm Navigator. The gap becomes structural. You think in capabilities and redesigned workflows. Your environment thinks in tools and applications. Your proposals sound “too ambitious” for the meeting format. And here Sen’s Capability Approach becomes startlingly relevant: you may have the capability, but you lack the *conversion factors* to demonstrate it — role, visibility, access, time, platform. The maturity framework says Level 4 is achievable. The organisation’s architecture won’t let you operate above Level 2. The mirror says: “Interesting ideas, but let’s be realistic about what we can implement.”

Level 5 — The System Architect of Change. You see the full loop — including yourself in it. The risk is isolation: seeing so clearly that you lose patience with the pace. Carreno’s research on organisational immune systems becomes practical knowledge — you understand that successful local pilots fail to scale when the parent organisation’s immune system treats them as foreign bodies. You design for gradual shifts, not dramatic reveals. Participatory governance — giving practitioners voice in defining what “progress” means — becomes more effective than mandate. The mirror, if the environment has begun to shift, says: “Help us see what you see.” If it hasn’t: silence.

What the dual perspective reveals

The model does something no existing framework does: it diagnoses the present condition, not just the aspiration. It makes visible that a person’s stuckness might not be their own — it might be the environment’s. It names the social cognitive biases that operate at each level, not as character flaws but as artifacts of the gap between individual understanding and organisational structure.

And it reveals the loop in actionable terms. The reason the gap persists isn’t that people refuse to learn. It’s that the environment — built on the same frozen mental model — confirms the freeze and punishes the shift. The measurement frameworks score the wrong dimension. The social dynamics protect the status quo. And the individual, seeing their environment’s response, rationally concludes that the shift isn’t valued.

Repenning and Sterman at MIT identified the system dynamic that keeps this stable. Organisations operate with two competing loops: “Work Harder” (pressure for throughput, which crowds out investment in learning) and “Work Smarter” (invest in capability, which initially produces lower output). When “Work Harder” dominates — and it almost always does, because it produces immediate visible results — capability erodes slowly, with a time delay that masks the erosion. Managers misattribute the erosion to individual motivation rather than system dynamics. The measurement systems provide cover for this misattribution.

The dashboard shows throughput. It doesn’t show capability decay. And so the loop deepens.
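
To see why the dashboard stays green while capability decays, a toy simulation helps. This is not Repenning and Sterman’s actual model; it is a minimal sketch with invented parameters, written only to show the shape of the dynamic.

```python
def simulate(months: int, invest_fraction: float) -> list:
    """Toy 'Work Harder' vs 'Work Smarter' dynamic (parameters invented)."""
    capability = 1.0  # relative process capability
    throughput_by_month = []
    for _ in range(months):
        effort = 1.0 - invest_fraction  # share of time spent on throughput
        throughput_by_month.append(round(effort * capability, 3))
        # Capability erodes slowly unless investment rebuilds it; the 1%
        # monthly decay is small enough that throughput masks it for months.
        capability += (0.08 * invest_fraction - 0.01) * capability
    return throughput_by_month


print(simulate(24, invest_fraction=0.0))  # looks fine early, erodes underneath
print(simulate(24, invest_fraction=0.2))  # dips first, overtakes around month 15
```

Under these made-up numbers, the “Work Smarter” run shows visibly lower output for roughly a year before overtaking: exactly the window in which managers misattribute the dip to motivation and revert to “Work Harder”.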

What follows are the five levels as diagnostic cards, from most to least flexible. Each card shows the flexibility marker, the capability profile, and what the organisation mirrors back.

Level 5 — The System Architect of Change (High flexibility)

Flexibility marker

You include yourself in the picture you're diagnosing. You've stopped blaming "resistant people" and started seeing structural dynamics. You design for gradual chain effects, not dramatic reveals.

The highest flexibility is knowing when NOT to reconfigure — when stability serves the transition better than disruption.

Capability profile

  • Understands the social amplification layer
  • Can design skill/agent/application portfolios that match actual need
  • Works WITH organisational immune responses, not against them
  • Addresses structural conditions that make biases necessary
Knows that participatory governance — giving practitioners voice in defining progress — is more effective than top-down mandate.

What the organisation mirrors back

  • The temptation to "bang on the big drum" rather than building understanding gradually
  • The challenge of meeting people where they are without condescending
  • Rare level — most organisations have no one here, which is why the structural dynamic remains invisible
  • Decoupling everywhere: formal AI policies for external legitimacy, internal reality unchanged
The paradox: the more clearly you see the system, the harder it is to communicate without triggering the very defences you've mapped.

Level 4 — The Paradigm Navigator (Growing flexibility)

Flexibility marker

You see the full picture but struggle to communicate it without triggering the Abstract Defence. Your challenge is translating paradigm-level insight into concrete, non-threatening demonstrations.

The conversion factor problem: your capability exists, but the organisational conditions for demonstrating it may not. That's a structural gap, not a personal one.

Capability profile

  • Can design skill/agent/application portfolios
  • Sees connection between individual understanding and organisational capability
  • Understands the three-tier capability model (systems of record, hybrid workflow, autonomous)
  • Questions governing assumptions, not just processes (double-loop learning)
Asks "what category of solution should be doing this?" not "how do we use AI to do what we already do?"

What the organisation mirrors back

  • Perceived as threatening by the organisational immune system — your insights challenge the official picture
  • Evaluated on criteria from the paradigm you've moved beyond
  • The organisation makes wrong-category decisions (building apps when it needs skills) and you can trace the mechanism
  • The gap between management dashboard and ground reality is fully visible to you — and painful
The Aspiration Trap: maturity scales measure absent conversion factors, not absent capability. The bitter taste of being measured against a ladder that doesn't fit the terrain.

Level 3 — The Pattern Recogniser (Emerging flexibility)

Flexibility marker

You can release the tool-operator model and adopt the collaboration model. But you may not yet have the language or frameworks to explain what you see to others. The flexibility is personal but not yet communicable.

You feel frustrated in meetings about AI. You can see what others can't but can't make them see it. You've stopped arguing and started just doing it quietly.

Capability profile

  • Has crossed from "tool-use" to "collaboration" mental model
  • Recognises when framing limits output quality
  • Begins to see workflow redesign opportunities, not just efficiency gains
  • Understands AI as reasoning partner, not execution engine
The shift from "How do I prompt this better?" to "How should this work be structured differently?"

What the organisation mirrors back

  • You've crossed the threshold — the environment hasn't. THIS is where the gap opens
  • The same Level 1 colleagues who felt like peers now feel like barriers
  • Conformity pressure to perform at Level 2 — "just use it as a tool"
  • Your proposals evaluated through wrong criteria by people who can't see what you see
  • AI shaming: when AI use is visible to evaluators, people hide their actual practice
The morning meeting: you propose something real, get dismissed by people applying frozen mental models with confidence. The frustration IS the gap.

Level 2 — The Tool Operator (Low flexibility)

Flexibility marker

The tool-operator schema is efficient and rewarding. It produces visible value. Releasing it would mean a period of apparent performance degradation — the cost of transitioning to a new model. This is the cognitive entrenchment threshold.

You haven't changed HOW you work, only added a faster step. AI is "my tool" — not "my collaborator." This feels like competence because it is. The question is whether it's sufficient.

Capability profile

  • Can produce good results with AI for known tasks
  • Understands prompt quality affects output quality
  • Has integrated AI into daily workflow for specific functions
  • Can evaluate AI output within their domain expertise
Competent and productive — but the question "how should this work be structured differently?" hasn't yet surfaced.

What the organisation mirrors back

  • AI gets bolted onto existing workflows rather than triggering workflow redesign
  • Single-loop learning: "How do we do what we already do, faster?"
  • "It's useful for X but you can't trust it for Y" — framing that protects the tool model
  • Someone proposing AI as "collaborator" sounds impractical from here — and slightly threatening
  • Authority bias: senior operators define what's "appropriate" AI use for the team
The comfort of alignment: the environment validates your model, so there's no signal that a different model exists. The gap hasn't opened yet — which is precisely why it's hard to move from here.

Level 1 — The Frozen Frame (Rigid)

Flexibility marker

The mental model is cached and treated as complete. New evidence is interpreted within the existing category ("this is just a fancier version"). The category itself — "AI = unreliable tool" — is not questioned because it was formed from direct experience, which feels like knowledge.

You reference AI experiences from 2+ years ago as current. You evaluate today's AI by your first encounter. You feel confident in your assessment without recent hands-on use.

Capability profile

  • Has basic awareness AI exists and can produce text/images
  • May have tried ChatGPT or similar once or a few times
  • Formed an assessment based on those early encounters
  • Assessment was accurate for that moment — but the moment has passed
The first encounter wasn't wrong — it was a snapshot. The problem is treating a snapshot as a portrait.

What the organisation mirrors back

  • Confidence in outdated judgements: "I've tried AI, it doesn't work well"
  • Dismissal of demonstrations — evidence absorbed into existing model, not used to update it
  • "Yes but in general..." — the Abstract Defence that can't be falsified by specific examples
  • When leaders are here → portfolio decisions default to "build an application" because the skill/agent category doesn't exist in their model
  • Groupthink validates the frozen model: "We all agree AI is unreliable" feels like consensus, not limitation
You don't experience resistance because you are the resistance. The frozen model feels like knowledge. This is the hardest level to diagnose from the inside.

What no instrument measures

There’s a Swedish book from 2004 that keeps surfacing in this work — Jansson’s “Validering: att synliggöra individens resurser” (“Validation: making the individual’s resources visible”), drawing on Finnish scientific research into competence validation. Jansson made the case that competence assessed through a single lens produces an incomplete picture. Formal education without practice is half the story. Deep experience without formal knowledge is the other half. Both dimensions are needed.

Twenty-two years later, we’re facing the same structural problem with AI literacy — but with a third dimension that Jansson didn’t need and that no current framework accounts for.

Knowledge — what you know about AI formally. Necessary. But Chi’s research is clear: knowledge accumulation doesn’t produce categorical shifts. You can describe AI’s emergent properties accurately on a test while your working mental model stays firmly in the “tool” category. Knowledge is a capacity dimension. It grows by addition. And addition alone doesn’t produce the shift we need.

Practice — what you’ve done with AI, hands-on. Also necessary. Also insufficient on its own. The Einstellung research is unambiguous: deep experience can entrench as readily as it liberates. Bilalic’s chess masters weren’t lacking practice. They were trapped by it.

And practice with AI carries its own hidden cost. Research by Crowston, and separately by Collins and colleagues, found that AI assistance can accelerate skill decay in experts and hinder skill acquisition in learners — without anyone noticing, because AI produces adequate outputs that mask the degradation. Clinicians using AI for polyp detection showed significant decline in independent detection skills after just three months. Researchers have begun calling this “never-skilling” — trainees using heavy AI assistance never develop the foundational abilities they’re ostensibly learning. The productive struggle that drives genuine skill formation gets removed. And with it, the mechanism that builds the very competence the tool was supposed to augment.

Metacognition — and this is the missing dimension. The ability to see the pattern you’re in. The flexibility to recognise when your mental model is no longer serving you and to release it. Can you feel the difference between knowledge and assumption? Can you recognise when you’re applying a cached model to a changed reality? Can you sit with the discomfort of a paradigm that doesn’t yet have a settled name?

Sixteen major AI literacy instruments. Not one assesses this dimension. They measure what you know and what you can do. They don’t measure whether you can see the frame you’re operating within.

Intelligence as flexibility

This connects to something that reframes the entire landscape. The emerging scientific understanding of intelligence is shifting — away from capacity (how much you know, how fast you process) and toward flexibility (how fluidly you can reconfigure what you already know when the situation demands it).

This is not a minor academic distinction. It’s the theoretical keystone for everything we’ve been tracing.

If intelligence is capacity, then more training, more knowledge, more practice should solve the AI literacy problem. But the research tells us they don’t. The 88% adoption / 5% value capture gap isn’t caused by insufficient training. 82% of employees have received no formal AI training at all — but training the other 18% hasn’t closed the gap either. Because the gap isn’t about capacity. It’s about flexibility.

Knowledge and practice are capacity dimensions. They accumulate. Metacognition is the flexibility dimension. It reconfigures. And flexibility is what determines whether someone can cross the threshold between Level 2 and Level 3 — the threshold where the category changes.

This is why the Expertise Shield is the most resistant pattern in the dual-perspective model. High capacity plus low flexibility equals entrenchment. The most knowledgeable, most experienced professionals can be the most deeply stuck — not despite their expertise but because of it. Dane’s research confirms: perceived optimal efficiency is the trap. It feels like competence. It functions as a cage.
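
The diagnosis is simple enough to state as code. The scores and thresholds below are invented; the sketch exists only to make the verbal formula executable.

```python
from dataclasses import dataclass


@dataclass
class LiteracyProfile:
    knowledge: float      # capacity dimension: grows by addition
    practice: float       # capacity dimension: grows by addition
    metacognition: float  # flexibility dimension: reconfigures

    def capacity(self) -> float:
        return (self.knowledge + self.practice) / 2

    def diagnosis(self) -> str:
        # High capacity plus low flexibility equals entrenchment.
        if self.capacity() > 0.7 and self.metacognition < 0.3:
            return "Expertise Shield: entrenchment that feels like competence"
        if self.metacognition >= 0.6:
            return "flexible: able to cross the Level 2 to Level 3 threshold"
        return "developing"


# A deeply knowledgeable, heavily practised expert with low metacognition:
print(LiteracyProfile(knowledge=0.9, practice=0.9, metacognition=0.2).diagnosis())
```

Nothing added to the first two fields changes the first branch; only the third field does.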

And the Aspiration Trap doesn’t just trap capability — it traps flexibility. When you anchor self-assessment to a fixed ladder (“How do I get to Level 4?”), you replace the adaptive question (“What does my situation actually need?”) with a rigid one. The ladder becomes its own Einstellung — a perceptual attractor at the level of career development rather than problem-solving.

The “signals you’re here” in the dual-perspective model aren’t asking what you can do or what you know. They’re asking whether you can see the pattern you’re in. They’re flexibility diagnostics. Can you recognise when your expertise is creating blind spots? Can you shift your evaluation criteria when the paradigm changes? Can you hold your model loosely enough to be surprised?

This measurement shift — from tool-skill checklists to knowledge, practice, and metacognition — isn’t an academic refinement. It’s the difference between measuring what keeps people busy and measuring what actually moves them.

The portfolio built on yesterday’s logic

Everything we’ve traced — the frozen mental model, the loop between individual and organisation, the social cognitive biases that keep it stable, the missing metacognitive dimension — converges in a place where the cost becomes concrete and measurable: the organisation’s capability decisions.

When the collective mental model is “AI = tool,” the organisation builds accordingly. Application portfolios expand. Every operational need gets translated into a dedicated application — because in the tool paradigm, that’s what capability means. A persistent software system. Built once, maintained forever. Deterministic, auditable, controlled.

Large organisations already carry portfolios of thousands of applications. Research shows 30% of software budgets are wasted on redundant or unused systems. And those portfolios represent something deeper than technology choices — they represent fossilised assumptions about how work should happen. Each application encodes a moment’s understanding of a process, frozen in code, maintained at significant cost, increasingly divergent from how people actually work.

Here’s what the frozen paradigm can’t see: AI has collapsed the cost of contextual, ad-hoc work. What previously required building and maintaining a dedicated application can increasingly be handled by skills, agents, and orchestrated workflows — flexible capability that adapts to context rather than forcing context into a rigid structure. Gartner predicts 40% of enterprise applications will embed AI agents by the end of 2026, up from under 5% in 2025. The infrastructure is already decomposing: APIs, vector databases, MCP servers, modular retrieval systems — the monolithic backend is giving way to orchestrated, composable architectures. The specific forms will evolve. What we call skills and agents today may carry different names tomorrow. But the shift they represent — from rigid, persistent applications toward flexible capability that lives close to the work — that direction isn’t reversing.

Applications will remain essential for persistent cognitive systems with state, cross-references, and deterministic auditability. But that’s the core, not the default for everything. The long tail of organisational work — infrequent, variable, context-dependent — doesn’t need applications. It needs intelligence that adapts. And when the mental model only contains two categories — “manual process” and “dedicated application” — every analysis arrives at “build or buy an application.” The new category doesn’t exist in the cognitive map. Introducing it triggers the exact resistance mechanisms we’ve traced: the Einstellung Effect activates tool-paradigm schemas, the Expertise Shield protects existing architectural knowledge, the organisational immune system treats the proposal as foreign, and the social dynamics punish the person proposing something outside consensus.
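
That paragraph is, in effect, a triage function. Here is a hedged sketch of it; the category names echo the three-tier model mentioned earlier, but the inputs and thresholds are illustrative, not a published method.

```python
def capability_category(frequent: bool, variable: bool,
                        needs_state: bool, needs_audit: bool) -> str:
    """Ask 'what category of solution should be doing this work?'"""
    # Persistent cognitive systems with state and deterministic
    # auditability still belong in the application core.
    if needs_state and needs_audit:
        return "dedicated application (the essential core)"
    # The long tail: infrequent, variable, context-dependent work is
    # better served by flexible capability that lives close to the work.
    if not frequent or variable:
        return "skill / agent / orchestrated workflow"
    return "hybrid workflow (application with embedded agents)"


# An infrequent, highly variable task with no state or audit requirement:
print(capability_category(frequent=False, variable=True,
                          needs_state=False, needs_audit=False))
```

The design point is that the function has three possible return values. A frozen mental model has only two, so every input collapses to “build an application”.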

The consequence is measurable. When organisations evaluate AI through the tool paradigm, they apply tool-paradigm KPIs: efficiency gains, time saved, error reduction. These metrics make sense for tools. They miss entirely what AI enables when understood differently — workflow redesign, capability redistribution, knowledge returning from rigid application logic to flexible, portable, maintainable process intelligence that people can actually update and own.

McKinsey’s finding is relevant again here: the organisations capturing real value — that 5–6% — are three times more likely to have fundamentally redesigned workflows rather than bolting AI onto existing processes. They’re not better at operating tools. They’ve left the tool paradigm.

And the cost of not leaving compounds. Technical debt accumulates as the portfolio grows. Maintenance costs rise. Flexibility decreases. The gap between how work is officially represented in systems and how it actually happens on the ground widens. And the organisation, measuring tool efficiency while missing paradigm readiness, reports progress on dashboards while the structural problem deepens underneath.

Goodhart’s Law is operating at full force here — “when a measure becomes a target, it ceases to be a good measure.” When maturity level becomes the organisational target, behaviour optimises for the metric. Application deployment rates. Training completion percentages. AI adoption scores. All trending upward. All measuring the wrong thing. Recent research suggests this isn’t just a rational response — fMRI evidence indicates that metric-seeking behaviour is partially neurologically hard-wired. We are built to optimise for the number, even when the number isn’t measuring what matters.

The four-stage blindness mechanism that researchers have identified operates relentlessly: the framework defines the problem space (foreclosing alternatives), measurement creates behavioural response (optimising for metric rather than capability), the score produces false certainty (“Level 3 of 5” closes inquiry), and structural causes become invisible (gaps get attributed to individual inadequacy rather than organisational architecture).

The result: “The frameworks are not failing — they are succeeding at what they actually do: create organisational confidence in systems that are systematically obscuring what matters.”

If the organisation doesn’t have the maturity to see what AI actually does and how it could be used — not as a bolt-on tool but as a fundamentally different way of structuring capability — then every portfolio decision, every architectural choice, every measurement framework reinforces the paradigm that needs to change. The mental model doesn’t just affect how individuals think. It determines what the organisation builds. And what it builds determines what it can become.

What the cracks make visible

I didn’t start this exploration with a framework. I started with a crack.

A moment in a meeting where I proposed something that couldn’t be heard — not because it was wrong, but because the room’s shared mental model didn’t contain the category it belonged to. That experience sent me into the research. And what the research revealed was not a single cause but a system — a loop between individual cognition and organisational structure that reinforces itself at every level, sustained by social dynamics that serve a genuinely protective function, and measured by instruments that confirm progress while obscuring the problem.

What makes this traceable now — for the first time, as far as I can find — is the scale of the turbulence. Decades of technology adoption never required a categorical shift. The mental model of “technology = tool” stretched comfortably to accommodate every previous wave. But AI breaks that accommodation. The shift is too large. The mismatch between the old category and the new reality generates friction at every boundary — individual, team, organisational, inter-institutional — and that friction makes the mechanisms visible.

Small shifts don’t produce this kind of turbulence. You don’t see the cracks when the change is incremental. But when the shift is categorical, everything that was hidden becomes exposed. The frozen mental models. The social biases holding them in place. The measurement frameworks confirming a picture that doesn’t match reality. The portfolio decisions encoding the wrong paradigm. The Einstellung Effect directing expert attention away from better solutions. The dual consciousness of organisations performing transformation while buffering against it.

All of this was always there. It operated in every previous technology adoption. But it was never visible because the turbulence was never this strong.

The research doesn’t just describe the problem. It traces cause and effect from the individual’s cognitive category all the way through the organisation’s defensive architecture and back. Chi explains why the category is frozen. Bilalic shows how expertise reinforces the freeze. Gentner reveals why the tool analogy persists. Vosniadou shows the hybrid models people construct to avoid the shift. Dane shows why entrenchment feels like competence. Carreno shows how the organisation’s immune system neutralises innovation. Argyris shows how defensive routines make the problem undiscussable. Meyer and Rowan show how decoupling allows organisations to perform change without changing. Goodhart and Campbell show how measurement confirms the performance. And Repenning and Sterman show how the system dynamics ensure capability erodes while dashboards show green.

Each of these was known. The cascade connecting them — from an individual’s ontological category assignment through social amplification to organisational architecture to portfolio decisions and back — that’s what hasn’t been traced before. Not because the pieces weren’t available, but because the turbulence wasn’t strong enough to make the connections visible.

So what changes?

The mental model has to change — at the individual level first, then cascading through the organisation. Not through training programmes that teach more tools. Not through maturity frameworks that score the wrong dimension. Through the development of metacognitive flexibility: the ability to see the pattern you’re in, release the model that’s no longer serving you, and operate in the uncertainty of a paradigm that hasn’t fully arrived yet.

That means measuring differently. Knowledge, practice, and metacognition — not tool checklists. Flexibility, not just capacity. Diagnostic frameworks that show where you’re stuck and why, not just where you aspire to be. Dual-perspective assessment that accounts for the environment, not just the individual.

And it means building differently. Organisations that understand AI as a categorical shift will structure their capability portfolios differently — skills and agents for the flexible majority, applications for the essential core. They’ll measure readiness by paradigm understanding, not tool proficiency. They’ll create environments where the people who’ve made the shift can operate visibly, not just privately. They’ll address the structural conditions that make social biases necessary, rather than blaming individuals for having them.

The 88% who’ve adopted and the 5% who’ve absorbed are separated by a categorical shift that no tool list will bridge. The cracks are visible now. What we do with what we can see — that’s the work ahead.

Disclaimer

AI-assisted content: This post was researched and developed with assistance from Claude (Anthropic), Gemini Deep Research (Google), and Perplexity Pro. The research foundation draws on 12 independent research reports commissioned across two AI platforms, covering cognitive science, organisational learning, social psychology, and measurement theory. Research synthesis, structural development, and visual modelling were collaborative processes between the author and AI. The thinking, editorial decisions, and conclusions are the author’s own.

Opinion: This is a personal exploration blog. Views expressed are the author’s own, informed by 20+ years of UX design practice and ongoing research into human-AI interaction.
Sources: Key references are listed below. The full research corpus is available on request.

Research & sources:

Research & academic sources

    • Chi, M.T.H. — Three Types of Conceptual Change: Belief Revision, Mental Model Transformation, and Categorical Shift
    • Bilalic, M., McLeod, P., & Gobet, F. (2008) — Why Good Thoughts Block Better Ones: The Mechanism of the Pernicious Einstellung Effect
    • Dane, E. (2010) — Reconsidering the Trade-off Between Expertise and Flexibility: A Cognitive Entrenchment Perspective
    • Vosniadou, S. — Conceptual Change in Learning and Instruction: The Framework Theory Approach
    • Gentner, D. (1983) — Structure-Mapping: A Theoretical Framework for Analogy
    • Goodhart, C.A.E. (1984) — Problems of Monetary Management: The U.K. Experience
    • Campbell, D.T. (1976) — Assessing the Impact of Planned Social Change
    • Sen, A. (1999) — Development as Freedom (Capability Approach)
    • Repenning, N.P. & Sterman, J.D. (2001) — Nobody Ever Gets Credit for Fixing Problems that Never Happened
    • Nęcka, E., Gruszka, A., & Orzechowski, J. (2012) — Cognitive Flexibility and Inter-domain Rigidity
    • Jansson, S. (2004) — Validering: att synliggöra individens resurser
    • Argyris, C. & Schön, D.A. (1978) — Organizational Learning: A Theory of Action Perspective
    • Carreno, A. (2025) — Why Organizations Resist Their Own Evolution
    • Ackerhans & Wehkamp (2022) — Professional Identity Threat and AI Resistance (JMIR)
    • Yang, Secchi, & Homberg (2025) — Organisational Defensive Routines (IJPSM)
    • Collins et al. (2024) — AI-Assisted Skill Decay in Clinical Settings (PMC)
    • Crowston, K. — Deskilling and AI Assistance in Expert Work
    • Behavioural Insights Team (2025) — AI Adoption Across 14 Countries
    • McKinsey (2025) — The State of AI: From Adoption to Absorption
    • Luhmann, N. — Social Systems and Decision Premises Theory
    • Meyer, J.W. & Rowan, B. (1977) — Institutionalized Organizations: Formal Structure as Myth and Ceremony
    • Elmqvist et al. (2025) — Participatory AI (consortium of 46 researchers)
    • Reich et al. (2026) — Psychological Safety and AI Adoption (arXiv)

Related Stimulus Content

THE STIMULUS EFFECT | Podcasts

The Stimulus Effect podcast episodes are available on Spotify.

THE STIMULUS EFFECT | Videocasts

When science catches up: An evidential map of exploration and validation

There’s a particular kind of feeling when you’ve spent over a year exploring, questioning, and building frameworks around an idea — and then one morning, you open two freshly compiled research reports and the science is saying exactly what you’ve been writing about. Not vaguely. Not in the general neighbourhood. Almost bullet by bullet. I sat there reading through the findings, and what I kept thinking was: BINGO. Not the smug kind. The kind that comes from relief — that the exploration wasn’t noise, that the thinking wasn’t disconnected from reality, that the threads I’d been pulling actually led somewhere the scientific community is now arriving at too.

This post is an evidential map. It traces the concepts I’ve explored across the Stimulus blog over the past year and maps them against the influential cognitive science reports and user behaviour research of 2025–2026. It’s not a victory lap. It’s something better: a convergence.

Part 1: The cognitive reckoning — When offloading becomes debt

The territory we mapped first

If there’s one thread that runs through almost everything I’ve written on Stimulus, it’s this: working with AI is fundamentally a cognitive event, not a technical one. That was the core argument of “Working with GenAI: The Big Shift is Cognitive” — a piece I published in October 2024, when most of the conversation was still about prompting techniques and model benchmarks. The argument was simple but, at the time, felt like swimming against the current: the real transformation isn’t what AI can do for you. It’s what it does to you. To your attention, your memory, your capacity for independent thought.

In “The Symbiotic Evolution of AI and the Human Brain,” I pushed further into the neurochemistry — how dopamine systems get rewired through AI interaction, how cognitive offloading isn’t just convenient but structurally habit-forming, and how the developing brain is particularly vulnerable. The metaphor I kept returning to was symbiosis: the relationship between human cognition and AI isn’t parasitic and it isn’t purely mutualistic. It depends entirely on how we engage.

And in the “Digital Dance” series — particularly Part 2, “When AI Meets Human Nature” — I explored the trust paradox, the illusion of agency, and the dependency loop that forms when we let AI handle not just our tasks but our thinking.

What the 2025–2026 science now confirms

Fast forward to 2026, and the peer-reviewed literature is landing in almost exactly the same territory.

The landmark study from SBS Swiss Business School (Gerlich, 2025) surveyed 666 participants and found a statistically significant negative correlation between AI tool usage and critical thinking scores — r = -0.68. Cognitive offloading was identified as the primary mediating mechanism, with a correlation of r = +0.72 to AI usage and r = -0.75 to critical thinking performance. These are among the strongest empirically documented correlations in any large-scale AI–cognition study to date. This isn’t a suggestion. It’s a signal.

Microsoft Research reinforced the picture with a survey of 319 knowledge workers sharing 936 real-world GenAI use cases: higher confidence in AI was negatively associated with critical thinking, while higher self-confidence in one’s own abilities predicted more critical thinking — though at a perceived higher cognitive cost. The nature of critical thinking isn’t disappearing, the researchers noted, but shifting: from generation toward information verification, response integration, and task stewardship. This is almost word-for-word what I was arguing in “The Big Shift is Cognitive” — that the cognitive work changes in kind, not just in quantity.

But the most neurologically grounded evidence comes from the MIT Media Lab. In an EEG-based study on 54 volunteers, ChatGPT users showed the lowest neural engagement in areas linked to memory, executive function, and creativity. An 83.3% failure rate was observed when these users tried to quote from their own essays. And here’s the finding that stopped me cold: those who stopped using ChatGPT showed persistent weakened neural connectivity. The researchers called it “cognitive debt” — where AI dependency leaves lasting neurological traces, consistent with neuroplasticity principles.

Cognitive debt. I’d been using the concept of cognitive erosion and dependency architecture in the Digital Dance series. The MIT lab gave it a clinical name and EEG data. BINGO.

And then there’s “AI Brain Fry” — a term coined by Boston Consulting Group’s March 2026 study of nearly 1,500 workers. Professionals required to closely monitor AI agents reported 14% more mental effort, 12% more mental fatigue, and 19% greater information overload. Approximately 14% of all AI-using professionals experienced this state; in marketing and operations roles, it rose above 25%. The core problem, BCG found, is a fundamental shift from creation to curation — professionals are no longer primary authors of work but reviewers of AI output, sustaining intense vigilance rather than deploying creative effort.

I wrote about the orchestration burden in the context of cognitive load and AI interaction. BCG gave it a catchy name and survey data. But the mechanism — the paradox of AI reducing task load while increasing oversight load — is the same tension I was tracing.

What does it mean when one of the world’s largest consulting firms is documenting the same cognitive paradox you were exploring on a personal blog a year earlier? It means the exploration was real. It means the thinking was grounded. And it means the problem is bigger than any of us estimated.

Part 2: Trust, bias, and the architecture of overreliance

The territory we mapped

“Analysis Paralysis in the AI Age” was one of the posts where I tried to map what happens to decision-making when we’re drowning in AI-generated options. The argument wasn’t just about information overload — it was about how AI fundamentally reshapes the decision landscape. When the machine can generate twenty plausible options in seconds, the cognitive bottleneck shifts from finding answers to evaluating them. And our brains weren’t evolved for that kind of evaluative load.

The Digital Dance series — particularly Part 2 — explored the trust paradox head-on: humans tend to trust AI more than they trust other humans in certain contexts, not because AI is more reliable, but because it presents information with a confidence and consistency that triggers our authority bias. I called it the “illusion of agency” — the feeling that we’re making choices when we’re actually following algorithmic paths.

And in the Cognitive Ecosystem framework — the Four-Layer model I developed through the Stimulus Cogitavi project — I mapped how biases operate across four distinct layers: Individual Cognitive Bias, Social Cognitive Bias, Media Manipulation, and a fourth layer I called Synthetic Cognitive Alterations. That fourth layer is the new territory: AI systems that don’t just exploit existing biases but alter the substrate of cognition itself. Cognitive offloading, synthetic social cognition, reality uncertainty, and what I termed “dependency architecture.”

What the 2025–2026 science now confirms

The automation bias literature of 2025–2026 reads like footnotes to these explorations.

A 2025 Springer review of human–AI collaboration identified automation bias as a pervasive pattern across industries. A March 2026 study using linear mixed-effects modelling found that automation bias severity intensifies under time pressure — participants leaned on AI more heavily when time was constrained, even when this produced greater deviations from ground truth. Professional experience and self-efficacy reduced dependence, but — and this is the paradox I was circling in Analysis Paralysis — high in-task confidence paradoxically increased automation reliance. Greater comfort with AI can lead to less critical evaluation, not more.

An SSRN framework paper on LLM overreliance proposes a three-phase cycle that mirrors almost exactly what I was describing in the Cognitive Ecosystem’s Layer 4. Phase one: initial dependency fuelled by perceived efficiency. Phase two: critical thinking atrophy via cognitive offloading. Phase three — and this is the one that should concern us all — bias internalisation, where AI biases are reproduced in human decisions even when AI is not present. The biases become ours. We absorb them. This is precisely what I meant by “synthetic cognitive alterations” — the point where AI doesn’t just influence your thinking in the moment but reshapes how you think after you close the laptop.

The KPMG/University of Melbourne 2025 Global Trust Study — the most comprehensive to date, surveying 48,340 people across 47 countries — found that 66% of global respondents regularly use AI, but only 46% are willing to trust it. Seventy percent are uncertain about AI-generated online content. And here’s the number that makes the trust paradox concrete: 66% rely on AI output without evaluating its accuracy, while simultaneously 56% report making mistakes at work due to AI. We use it, we don’t trust it, and we don’t check it. That’s not a rational pattern. That’s a cognitive one.

The same study revealed that 57% of employees hide their AI use from employers — presenting AI-generated content as their own. Shadow AI. I explored this in the context of the Digital Dance’s “dissolution of information hierarchies” — the idea that when everyone is using AI but nobody is admitting it, the entire epistemic foundation of professional work becomes unstable. Who actually thought this? Who actually wrote this? Does it matter? The science now says: yes, it matters enormously, because the hidden AI use means hidden cognitive offloading, which means hidden skill erosion at organisational scale.

And the Gemini research report adds a dimension I hadn’t fully explored: the role of Explainable AI (XAI) in trust dynamics. Several studies indicate that explanations can increase overreliance, because users find the presence of any logical justification — even a flawed one — sufficient to abdicate their own judgment. The very thing designed to make AI more trustworthy can make us less careful. That’s not a technology problem. That’s a cognitive architecture problem. And it maps directly onto the trust paradox I’d been exploring.

BINGO, BINGO, BINGO.

Part 3: The interface revolution — From conversations to autonomous teammates

The territory we mapped

“The Next Frontier of UX/UI: Where AI Meets Human-Centered Design” was one of the longest pieces I’ve written — over 40,000 words mapping how AI is fundamentally reshaping the design landscape. The core argument: we’re not just adding AI features to existing interfaces. We’re witnessing a paradigm shift in what an interface is. The screen-based, click-driven interaction model that has dominated for decades is being replaced by something more fluid, conversational, and increasingly autonomous.

In “AI-First Design Framework: A New Paradigm,” I pushed this further into a concrete methodology — arguing that design needs to start from AI capabilities rather than bolting AI onto existing patterns. And in “Beyond Current LLM Architectures: Revolutionary AI Architectures,” I explored the agentic frontier — how the shift from copilots to autonomous agents represents not just a technical evolution but a fundamental change in the human-machine relationship.

“AI Frontiers: Trends and Challenges for 2025 and Beyond” was my attempt to map the macro trajectory — where the technology is heading and what that means for how we work, create, and think. And “Future Trends in the UX-UI Field” specifically addressed how these shifts demand entirely new design thinking.

What the 2025–2026 science now confirms

The research reports of 2025–2026 confirm this shift with hard data.

Traditional search rankings are no longer the primary factor for visibility. AI engines now provide direct answers, and “zero-click searches” — where users get information from a summary without visiting a website — account for over 65% of informational searches. This has given rise to “Generative Engine Optimisation” (GEO), where businesses focus on technical “retrievability” rather than traditional SEO. Users are adopting more conversational search patterns — asking complete questions in natural language instead of keyword phrases. This is the paradigm shift I mapped in the UX/UI posts: from click-based navigation to intent-based conversation.

Research on conversational user interfaces (CUIs) reveals that ease of use drives 82% satisfaction rates, but understanding user intent drops to 69% and personalisation to 65%. The friction isn’t in the technology anymore — it’s in the gap between what users mean and what systems understand. That gap is exactly where I argued the next generation of UX design needs to focus.

The agentic shift is now quantified. A Nylas 2026 report based on 1,000+ developers and product leaders found that 85% believe agentic AI will become “table stakes” within three years, with 64.4% already placing it on product roadmaps. Gartner predicts 40% of enterprise applications will embed AI agents by end of 2026, up from less than 5% in 2025. This is the trajectory I described in “Beyond Current LLM Architectures” — the shift from passive tools to active teammates.

And Anthropic’s own research on Claude Code provides a rare window into how users actually interact with high-autonomy agents. As users gain experience, they increasingly shift from step-by-step approval to “full auto-approve” modes. But here’s the nuance that validates my thinking about active oversight: experienced users who let agents run longer without pre-approval actually interrupt more frequently (9% of turns vs 5% for new users). They trust more, but they watch more carefully. That’s not blind delegation — it’s calibrated oversight. It’s exactly the kind of “active engagement” I argued for in the Digital Dance series as the healthy alternative to passive consumption.

The Gemini report maps the interface evolution into four paradigms: Generative UI (adaptive layouts generated on the fly for each intent), Spatial UI (gaze and gesture-based interaction in AR/VR), Voice UI (speech as the primary mode), and Invisible UI (ambient context where AI anticipates needs without direct prompts). Jakob Nielsen and other UX experts predict that 2026 will see the death of static, hard-coded interfaces. This moves UX design from “styling” to “strategic hypothesis testing” — where designers focus on metrics like retention and conversion while AI handles the generation of layouts and components. This is the shift from design-as-craft to design-as-strategy that I mapped in the AI-First Design Framework.

And then there’s the concept of “emotionally aware” interfaces — systems that track eye movement, facial expressions, and gestures to adjust their colour, motion, and pacing based on the user’s emotional state. This is precisely the intersection I explored in the “Emotional AI and UX” series, where I argued that the next generation of AI-driven design won’t just respond to what users do but to what they feel. The research now has the technical implementation catching up to the conceptual framework.

BINGO after BINGO.

Part 4: Authenticity, fatigue, and the human signal

The territory we mapped

“Creativity as the Product: Addressing the GenAI Dilution Dilemma” was born from a growing unease I felt about the flood of AI-generated content — not because AI content is inherently bad, but because the sheer volume was beginning to drown out the qualities that make creative work meaningful. The argument: when everyone has access to the same generation tools, the differentiator shifts from technical capability to authenticity. The human signal — specific personal experience, genuine opinion, intentional imperfection — becomes the scarce resource.

“The Future of Content: Immersion, Personalisation, and the Role of AI” explored the other side of this coin: how AI can actually enhance content when it’s used for personalisation, immersion, and adaptive experiences rather than bulk generation.

And “Storytelling and Media in the Age of AI” grappled with the deeper question: what happens to narrative — the most fundamental human sense-making tool — when the machines can tell stories too?

What the 2025–2026 science now confirms

By early 2026, the term “AI slop” had become a dominant cultural descriptor for low-quality, AI-generated content produced for clicks. Consumers have developed sophisticated pattern recognition for synthetic content, leading to a 20–35% lower engagement rate for posts identified as AI-generated compared to human-authored content.

The research calls it the “uncanny valley of text” — content that is technically correct but lacks the specific details, genuine opinions, or imperfect phrasing that signals authentic human expertise. A 2026 report found that 66% of consumers feel “credibility fatigue” from the constant task of verifying whether online information is true. Among Gen Z, AI fatigue reaches 80%. Forty-three percent actively distrust online information. And 63% report switching brands after a poor AI experience.

This is exactly the dilution dilemma I was writing about. And the response the market has developed is remarkably aligned with what I proposed: authenticity markers. Brands are now intentionally incorporating specific personal anecdotes, verifiable data with specific numbers, and what researchers call the “stutter premium” in video — unedited footage with natural pauses and self-corrections that generates 3x higher recall than polished AI video. The imperfection is the signal.

Consumers are increasingly using “trust shortcuts” to navigate the AI-saturated environment — relying on brand recognition, number of reviews, and recommendations from family rather than doing their own research. In 2026, a brand’s reputation for authenticity is more valuable than the raw intelligence of its AI tools. That’s the thesis of “Creativity as the Product” in one sentence.

And the personalisation data validates the other side of my argument. BCG and Bain research confirms that AI-powered personalisation delivers an average 20% sales growth, with fast-growing companies deriving 40% more revenue from personalisation than slower peers. Dynamic content personalisation increases average session duration by 20–30%, and AI-powered re-engagement campaigns reduce subscription churn by 20–40%. When AI serves the human experience rather than replacing the human voice, it works. When it replaces the human voice, people leave.

Part 5: Frameworks for a cognitive future

The territory we mapped

This is where the Stimulus work goes beyond observation into architecture. The Cognitive Resilience Diagnostic (CRD) — born from the Digital Dance series — is a framework for assessing and enhancing human resistance to digital manipulation. It integrates the triple-brain model (reptilian, emotional, rational) with a four-component resilience model: Cognitive Resilience, Emotional Regulation, Information Processing, and Vulnerability Factors. It even includes a quantifiable formula: CRS = (CR × W₁) + (ER × W₂) + (IP × W₃) – (VS × W₄).
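
For readers who want the formula runnable, here is a direct transcription. The weights are placeholders chosen for illustration; the framework itself does not fix them here.

```python
def cognitive_resilience_score(cr, er, ip, vs, w=(0.3, 0.25, 0.25, 0.2)):
    """CRS = (CR * W1) + (ER * W2) + (IP * W3) - (VS * W4)."""
    w1, w2, w3, w4 = w  # placeholder weights, not values from the framework
    return cr * w1 + er * w2 + ip * w3 - vs * w4


# With 0-10 component scores and the placeholder weights:
print(cognitive_resilience_score(cr=7, er=6, ip=8, vs=4))  # -> 4.8
```

Note the subtraction: vulnerability is the only component that lowers the score, which is what makes the diagnostic more than a capability sum.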

The Cognitive Ecosystem — the Four-Layer framework from the Stimulus Cogitavi project — maps how biases operate not just at the individual level but across social, media, and synthetic layers. It introduced the System 0-1-2 model, extending Kahneman’s famous dual-process theory to include the reptilian/survival brain as System 0. And it argued that the pathology isn’t having biases — it’s when they become rigid, exploited, imbalanced, or stop adapting.

The STIMULUS thinking system itself — with its seven modes (Explore, Analyse, Synthesise, Model, Validate, Narrative, Format) — is a practical cognitive framework for maintaining structured thinking in the AI age. And the interdisciplinary thinking series made the case that the next wave of innovation won’t come from going deeper into single domains but from connecting across them.

These aren’t just blog posts. They’re tools. Frameworks. Architectures for thinking.

What the 2025–2026 science now confirms

The emerging scientific frameworks of 2025–2026 are converging on the same structural insights.

A January 2026 paper in Nature introduced the “3R Principle” — Results, Responses, Responsibility — as a framework for cognitive hygiene in human-AI interaction. The core argument: neuroplasticity is shaped by the quality of AI engagement. Passive, uncritical reliance weakens activity-dependent brain plasticity, while active co-creation can sustain or enhance it. This is, structurally, what the CRD framework was designed to assess and what the STIMULUS system was designed to support: active cognitive engagement rather than passive delegation.

Andy Clark’s 2025 paper “Extending Minds with Generative AI” argues that humans are inherently “hybrid thinkers” who have always incorporated non-biological resources — from writing to calculators — and that AI represents the newest layer of this cognitive scaffolding. The key variable is agency: whether users remain the intentional architects of cognitive processes or become passive consumers. This is exactly the distinction at the heart of every piece I’ve written — the difference between using AI as a scaffold and using it as a crutch.

Research on Cognitive Forcing Functions (CFFs) — deliberate design mechanisms that introduce structured friction to slow down and deepen evaluation — found that participants required to complete structured reflection steps before proceeding with AI-generated plans were significantly less reliant on AI, achieved higher accuracy, and did so without meaningfully increasing cognitive load. Structured friction is not an obstacle to productivity — it’s a protection for cognitive quality. The CRD framework is essentially a diagnostic for identifying where that friction needs to go. The STIMULUS thinking modes are the friction itself, turned into a workflow.
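
A minimal sketch of what such a forcing function can look like in a workflow. The 20-word gate is an invented threshold, not a mechanism from the cited studies; the point is the structured friction, not the specific rule.

```python
def accept_plan(ai_plan: str, reviewer_critique: str) -> str:
    """Gate an AI-generated plan behind a structured reflection step."""
    # Structured friction: refuse to proceed until the human has
    # articulated, in their own words, where the plan could fail.
    if len(reviewer_critique.split()) < 20:
        raise ValueError(
            "Write at least 20 words on where this plan could fail "
            "before accepting it."
        )
    return ai_plan  # accepted, with the critique on record


plan = "1. Export the data. 2. Migrate the schema. 3. Switch traffic over."
critique = ("The plan assumes the exported data is clean, skips any "
            "stakeholder review, and has no rollback step if the migration "
            "fails midway, which is where similar projects have broken.")
accept_plan(plan, critique)
```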

The scientific community is also increasingly adopting interdisciplinary frameworks to evaluate AI’s impact — Human-Centred Artificial Intelligence (HCAI), the Stimulus-Organism-Response (SOR) model, Self-Determination Theory, and Social Interaction Theory. These are being applied to “digital well-being” models that attempt to balance the efficiency of AI with the psychological needs of the user. The interdisciplinary approach I advocated for — drawing from psychology, philosophy, cognitive science, UX design, and behavioural economics — isn’t just philosophically appealing anymore. It’s methodologically necessary.

A University of Technology Sydney report from March 2026 found that pedagogically structured AI use — including explicit teaching, Load Reduction Instruction, and integrated metacognitive prompts — can preserve critical thinking while retaining AI-enabled efficiency. Unstructured use, by contrast, risks what the report terms “cognitive atrophy.” Structure preserves cognition. That’s the entire design philosophy behind the STIMULUS system: structured thinking modes that keep the human mind active while leveraging AI capability.

A 2025 review of 103 papers on Cognitive Load Theory (CLT) and AI found that AI can significantly improve real-time management of cognitive load through neuroadaptive learning technologies and personalised feedback systems. But it also identified the cognitive paradox — AI as both enhancer and eroder of deep cognition — and framed it as a design challenge that demands intentional architecture. Not passive deployment. Not “let the AI handle it.” Intentional, structured, human-centred design. Which is what every framework on Stimulus has been arguing for.

And the World Economic Forum’s March 2026 analysis identifies cognitive manipulation via AI-generated synthetic media as an emerging global disinformation crisis, with advanced AI systems now capable of exploiting known cognitive biases at scale. This is the Four-Layer Cognitive Ecosystem’s Layer 3 (Media Manipulation) supercharged by Layer 4 (Synthetic Cognitive Alterations). The biases I mapped aren’t theoretical anymore. They’re weaponised.

The evidential map: A summary of convergence

Here’s the complete mapping — Stimulus explorations on the left, 2025–2026 scientific validation on the right:

Stimulus Exploration | Published | Research Validation (2025–2026)
Cognitive offloading as the core AI shift ("Working with GenAI: The Big Shift is Cognitive") | Oct 2024 | Gerlich 2025 (r = -0.68); Microsoft Research (319 workers); MIT Media Lab EEG study
Cognitive debt and dependency architecture ("The Symbiotic Evolution of AI and the Human Brain") | 2024 | MIT: "cognitive debt" with persistent neural traces; BCG: "AI Brain Fry" in 14% of workers
Trust paradox and illusion of agency ("The Digital Dance" series) | 2024 | KPMG: 66% use AI, 46% trust it; XAI paradoxically increases overreliance
Automation bias and decision paralysis ("Analysis Paralysis in the AI Age") | 2024 | Springer review; SSRN 3-phase bias internalisation cycle; time-pressure amplification
Four-Layer Cognitive Ecosystem and Synthetic Cognitive Alterations (Stimulus Cogitavi) | 2024–2025 | SSRN: bias internalisation post-AI use; WEF: cognitive manipulation at scale
Shadow AI and dissolved information hierarchies ("The Digital Dance") | 2024 | KPMG: 57% of employees hide AI use from employers
Interface paradigm shift: clicks → conversations → agents ("The Next Frontier of UX/UI") | Jan 2025 | Zero-click search at 65%; CUI 82% satisfaction; Gartner: 40% of apps embed agents by 2026
AI-First Design methodology ("AI-First Design Framework") | Jan 2025 | "Generative UI" predicted by Nielsen; outcome-driven UX; death of static interfaces
Agentic AI as behavioural frontier ("Beyond Current LLM Architectures") | 2024 | Nylas: 85% say agentic AI = table stakes; Anthropic: active oversight patterns in Claude Code
Emotional AI and anthropomorphic design ("Emotional AI and UX" series) | 2024 | SOR model studies; Frontiers: anthropomorphism increases trust but masks limitations
Authenticity as competitive advantage ("Creativity as the Product") | Nov 2024 | "AI slop" backlash; 20–35% lower engagement for AI content; "stutter premium" in video
AI personalisation as enhancement vs. replacement ("The Future of Content") | Dec 2024 | BCG/Bain: 20% sales growth; 20–30% session uplift; 20–40% churn reduction
Cognitive Resilience Diagnostic and triple-brain model ("The Digital Dance" → CRD) | 2024 | Nature 2026: 3R Principle; CFFs reduce AI dependency without increasing load
STIMULUS structured thinking system (Stimulus Cogitavi) | 2024–2025 | UTS 2026: structured AI use preserves cognition; unstructured use → "cognitive atrophy"
Interdisciplinary approach to AI–cognition (Interdisciplinary Thinking series) | 2024 | HCAI, SOR, Self-Determination Theory: interdisciplinary frameworks now methodological standard
Cognitive Load Theory reframed for AI (across multiple posts) | 2024 | Frontiers: 103-paper review confirms AI as both enhancer and eroder; demands intentional design
Disinformation as cognitive threat ("The Digital Dance") | 2024 | WEF March 2026: cognitive manipulation via synthetic media at global scale

 

That’s seventeen points of convergence. Seventeen moments where explorations published on a personal blog between late 2024 and early 2025 align with peer-reviewed research, large-scale surveys, and institutional reports published in 2025 and 2026.

What this means — And what it doesn’t

I want to be careful here. This isn’t a claim of prediction. I didn’t predict these findings. What I did — what any curious explorer does — is follow threads. I read the earlier science, the foundational cognitive research, the UX literature, the behavioural economics, and I followed the implications forward. The frameworks I built on Stimulus are extensions of existing knowledge, not inventions from nothing. Kahneman, Clark, Sweller, Le Bon, Tversky — they did the foundational work. I tried to ask: what happens when we add AI to these existing models?

The fact that the 2025–2026 research is arriving at similar conclusions doesn’t mean I was ahead. It means the foundations were solid. It means the interdisciplinary approach — drawing from psychology, neuroscience, philosophy, UX design, and technology simultaneously — works. It means that when you follow good science forward with genuine curiosity and a willingness to sit with complexity, you end up in places the science will eventually validate.

But it also means something else. Something more urgent. If a personal exploration blog can map these cognitive risks a year before the institutional research confirms them, then the institutional response is too slow. The MIT EEG study showed cognitive debt forming in months. The BCG study showed AI Brain Fry already affecting 14% of the workforce. The KPMG study showed 57% hiding their AI use. These aren’t future risks. They’re present realities. And the frameworks for addressing them — structured thinking, cognitive resilience diagnostics, intentional design friction, active engagement over passive delegation — can’t wait for another cycle of peer review.

We need them now. We’ve needed them since 2024. Some of us were building them. 

The question that remains

Here’s what I keep coming back to: if the science confirms that passive AI use erodes cognition, and if the behavioural data shows that most people use AI passively, and if the trust data shows that most people don’t even trust what they’re passively consuming — then we’re not just facing a technology challenge. We’re facing a civilisational design problem.

How do we build systems that keep us thinking? How do we design interactions that strengthen rather than atrophy the neural pathways we need for independent thought? How do we maintain the “mental muscles” — as the Gemini report puts it — that define human intelligence, while still embracing the genuine power of AI collaboration?

I don’t have a final answer. I’ve been exploring these questions for over a year, and the exploration has led to frameworks, tools, diagnostics, and a seven-mode thinking system. The science now says the direction is right. But the destination? That’s still being written — by all of us, in every interaction we have with these systems, every day.

The question isn’t whether AI will keep evolving. It will. The question is whether we will evolve with it — consciously, actively, with our cognitive sovereignty intact. Or whether we’ll look up one day and realise we outsourced the one thing that made us human: the capacity to think for ourselves.

Disclaimer

AI-Assisted Content: This blog post was researched using Claude AI. The two underlying research compilations were produced using Gemini and Perplexity AI. The writing, analysis, framework mapping, and editorial judgment are the author’s own.

Opinion Note: This is a personal exploration blog. The views, interpretations, and framework connections expressed here are my own. The evidential mapping represents my reading of the research — other interpretations are valid and welcome.

Source Attribution: The 2025–2026 research cited in this post is drawn from two comprehensive compilations: a Gemini-produced report (“The Cognitive-Behavioral Revolution of AI-Driven Applications: A Global Analytical Report 2025-2026”) and a Perplexity-produced report (“Cognitive Science & User Behavior Trends in AI-Driven Applications 2025–2026”), both compiled in March 2026. Primary sources include peer-reviewed studies from MIT Media Lab, SBS Swiss Business School, Microsoft Research, Boston Consulting Group, KPMG/University of Melbourne, the World Economic Forum, and publications in Nature, Frontiers in Psychology, and various HCI journals. Full references are linked below.

Research & sources:

Research Sources (2025–2026)

    • Gerlich, M. (2025). “Increased AI Use Linked to Eroding Critical Thinking Skills.” SBS Swiss Business School. phys.org
    • Microsoft Research (2025). “The Impact of Generative AI on Critical Thinking.” CHI 2025. microsoft.com
    • Kosmyna et al. (2025). MIT Media Lab EEG Study on ChatGPT and Brain Activity. lemonde.fr
    • Boston Consulting Group (2026). “AI Brain Fry” Study. streamlinefeed.co.ke
    • KPMG/University of Melbourne (2025). Global Trust in AI Study (48,340 respondents, 47 countries). forbes.com
    • Nature (2026). “The Brain Side of Human-AI Interactions: The 3R Principle.” nature.com
    • Clark, A. (2025). “Extending Minds with Generative AI.” Frontiers in Psychology. pmc.ncbi.nlm.nih.gov
    • University of Technology Sydney (2026). “AI, Cognitive Offloading, and Implications for Education.” uts.edu.au
    • Frontiers in Psychology (2025). “The Cognitive Paradox of AI in Education.” pmc.ncbi.nlm.nih.gov
    • World Economic Forum (2026). “How Cognitive Manipulation and AI Will Shape Disinformation in 2026.” weforum.org
    • Anthropic (2026). “Measuring AI Agent Autonomy in Practice.” anthropic.com
    • Nylas (2026). “Agentic AI Report 2026.” nylas.com
    • Deloitte (2026). “The State of AI in the Enterprise.” deloitte.com
    • Microsoft/AI Economy Institute (2025). “Global AI Adoption in 2025.” microsoft.com
    • Gartner/IDC (2026). AI Agent Adoption Data. joget.com
    • Smashing Magazine (2026). “Designing for Agentic AI: UX Patterns for Control, Consent, and Accountability.” smashingmagazine.com

Related Stimulus Content

THE STIMULUS EFFECT | Podcasts

Podcasts on Spotify

You can listen to the Stimulus Effect Podcasts on Spotify now!

THE STIMULUS EFFECT | Videocasts

From rituals to readiness — A UX practitioner’s self-assessment for the AI shift

In Part 1, we looked at the Orchestration Load Framework from the outside — a new model for understanding what happens to human cognition when AI enters the workflow. But there’s a harder question hiding behind the theory, and it’s the one that kept me up at night: what happens when you point that lens at UX practice itself? When you audit not the tools your users work with, but the system you work in? The diagnosis is uncomfortable. The opportunity is enormous. And the transition has already started whether we’re ready or not.

 

The week you recognise

I want to describe a week. Tell me if it sounds familiar.

Monday morning. Standup at 9:15. You report on the screens you pushed to review on Friday. Someone asks about the edge case you haven’t had time to think about, so you improvise an answer that sounds reasonable. After standup, you open Figma. The component library needs updating because the design system team changed the spacing tokens over the weekend. An hour disappears.

By 11 o’clock, you’re in a refinement session, translating a product requirement into something developers can estimate. The requirement is vague — “improve the onboarding experience” — but it’s already been sized and slotted into the sprint. You’re not designing the onboarding experience. You’re fitting a design into a container that was shaped before you arrived.

Lunch is at your desk because the afternoon is back-to-back: design review at 13:00, a cross-team alignment meeting at 14:00, a stakeholder walkthrough at 15:30. The design review focuses on whether your date picker matches the component library. Nobody asks whether the date picker is the right pattern for the problem. The alignment meeting produces three action items, all of which involve updating Figma files. The stakeholder walkthrough goes well — they like the colours.

By 16:30, you have forty-five minutes of unscheduled time. You’d planned to revisit the user journey map you started sketching two weeks ago. Instead, you spend it responding to comments in Figma, updating the handoff documentation, and answering a Slack thread about icon sizes.

You go home. You did your job. You delivered everything that was asked.
And somewhere in the back of your mind, the same quiet thought you’ve been having for months: I’m not doing what I’m supposed to be doing.

You’re right. You’re not. And it’s not your fault.

Where 40 hours actually go

When I started applying the Orchestration Load lens to UX practice itself — using the same analytical framework we built for evaluating AI tools — the first thing I needed was data on what designers actually do with their time. Not what the methodology says they should do. What they actually do.

The research paints a picture that most practitioners will recognise immediately but rarely see quantified. Across industry surveys, time-tracking studies, and practitioner discourse, the pattern is remarkably consistent:

Figma production and specification: 12-15 hours per week (30-38%). This isn’t “design” in the sense the discipline means it. This is component adaptation, auto-layout wrestling, responsive variant management, and specification preparation. The creative tool has become a specification engine. 61% of designers now use Figma as their primary handoff mechanism — which means Figma isn’t where you design; it’s where you document what you’ve already decided. Except you haven’t had time to decide anything, because you’ve been in Figma.

Design system maintenance: 4-6 hours (10-15%). Keeping the library accurate, documenting changes, reconciling inconsistencies. Research from Shopify shows that 67% of design system team time goes to documentation. After two years, design system accuracy drops to 31%. You’re maintaining a system that’s decaying faster than you can maintain it.

Process ceremonies: 5-7 hours (12-18%). Standups, planning sessions, refinement, retrospectives, design critiques. Each one individually reasonable. Collectively, they consume an entire working day.

Cross-team communication and alignment: 4-5 hours (10-12%). The research surfaces something practitioners feel but rarely articulate — there’s an invisible practice I’ve started calling “prewiring.” Engaging stakeholders 24 hours before a review to mitigate objections. Diagnosing each stakeholder’s incentives and fears to frame work in terms they’ll accept. This informal political labour is essential but never tracked in any project schedule.

Documentation and handoff: 3-4 hours (8-10%). Preparing specifications so that development can implement what you’ve designed. 90% of designers report differences between their designs and what gets built. The documentation exists to close that gap. It never fully does.

Context recovery: 2-3 hours (5-8%). Re-finding things. Re-reading notes. Reconstructing where you left off after a meeting broke your focus. 68% of designers document in multiple locations, which means finding what you documented is itself a task.

Now add it up. That’s 30-37 hours consumed by production, maintenance, process compliance, and boundary translation.

Actual design thinking: 3-5 hours per week. Eight to twelve percent.

And the thing that the discipline exists for — understanding users, validating assumptions, testing whether what we’re building actually serves people? User research gets 0-2 hours. Zero to five percent.

Let that land. The methodology that defines our profession — discover, synthesise, ideate, prototype, test — gets less than 15% of available capacity. Not because designers don’t know how to do it. Because the system consumes everything before the strategic work begins.
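As a sanity check, here is the arithmetic behind those figures: a quick back-of-envelope sketch that assumes a 40-hour week and the category bands quoted above.

```python
# Back-of-envelope check of the weekly bands quoted above (hours per week).
overhead_bands = {
    "Figma production & specification": (12, 15),
    "Design system maintenance": (4, 6),
    "Process ceremonies": (5, 7),
    "Cross-team communication": (4, 5),
    "Documentation & handoff": (3, 4),
    "Context recovery": (2, 3),
}
low = sum(lo for lo, _ in overhead_bands.values())   # 30
high = sum(hi for _, hi in overhead_bands.values())  # 40
print(f"Overhead: {low}-{high} h of a 40 h week")
print(f"Remaining for design thinking and research: {40 - high}-{40 - low} h")
# The maxima rarely all co-occur in a single week, which is why the narrative
# band above is 30-37 h rather than the theoretical 30-40 h.
```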

Have you ever tracked your own week this closely? And if you did, would the numbers surprise you — or just confirm what you already felt?

Three process architectures & a prayer

The time allocation data tells you what’s happening. But it doesn’t explain why. To see the mechanism, you need to look at the environment designers operate in — and specifically, at the fact that UX practice doesn’t exist in one process. It exists at the collision point of three.

The three-layer process collision

Layer 1: The design process. Double Diamond. Discover, Define, Develop, Deliver. Or some local variant — Design Thinking, Lean UX, whatever flavour your organisation adopted. This process assumes dedicated time for research, synthesis, and iteration. It assumes you understand the problem before you commit to a solution.

Layer 2: The development process. Agile. Scrum. Two-week sprints. Stories estimated in points. Velocity tracked on dashboards. This process assumes work can be decomposed into small, deliverable increments. It optimises for throughput.

Layer 3: The organisational process. Quarterly OKRs. Annual roadmaps. Budget cycles. Stakeholder reviews. This process assumes work can be planned months in advance and measured against predetermined outcomes.

Here’s the structural problem: these three processes were never designed to work together. They run simultaneously, on different timescales, with different success criteria. And the designer sits at the intersection of all three, translating continuously between them.

The design process says: understand the problem before solving it. The development process says: deliver something this sprint. The organisational process says: hit this quarter’s targets. When these three demands collide — and they collide every Monday morning — the design process loses. Every time. Because understanding a problem takes uncertain time, delivering a sprint increment takes exactly two weeks, and quarterly targets have executive visibility.

What falls out? Research. Synthesis. Validation. Testing. The activities that take uncertain time, produce ambiguous outputs, and resist sprint-sized packaging. These aren’t cut because anyone decides they don’t matter. They’re cut because the compound pressure of three simultaneous process architectures squeezes out anything that can’t be estimated, tracked, and delivered on a fixed cadence.

The designer becomes the boundary-spanning agent who must maintain fluency in all three process languages. They translate research insights into user stories. They translate design rationale into acceptance criteria. They translate creative exploration into sprint-compatible deliverables. Each translation is a cognitive boundary crossing — a Cx event in OL terms. And the compound load of translating across all three layers simultaneously is what practitioners feel as exhaustion, frustration, and the sense that “we’re not doing what we’re supposed to do.”

They’re right. They’re not. Not because they don’t know how — because the compound boundary load consumes the capacity they’d need to do it.

The pre-design failure pattern

There’s a specific mechanism worth naming, because once you see it, you can’t unsee it.

By the time a UX team is engaged to improve an experience, the roadmap is typically locked. Features have been pre-approved. Engineering effort has already been estimated. Strategic decisions were made — or should have been made — months earlier, in meetings the design team wasn’t invited to.

This means UX doesn’t fail during wireframing or prototyping. It fails before the designer touches a single screen. The decisions that determine whether the experience will succeed or fail have already been made based on internal consensus, executive intuition, and competitive parity — rather than external evidence.

Design becomes a mechanism for applying an aesthetic layer over unvalidated assumptions.

And here’s the compounding effect: once a mockup exists, stakeholders harden their positions. The psychological commitment to designs once visualised is well-documented. “Good enough” becomes the standard. The 1-10-100 rule — a dollar in design saves ten in development, saves a hundred in post-launch fixes — is universally cited and systematically ignored when velocity is the governing metric.

When was the last time your team killed a feature because the research said it wouldn’t work? Or does the research happen after the feature is already committed?

Map the methodology against reality

If the time allocation data shows where the hours go, and the three-layer collision explains why, then this section shows what it costs.

Map the formal UX methodology — the one we teach, the one we advocate for, the one that defines our professional identity — against what actually happens in practice:

    • Discovery and research. Theory says 20-25% of project time. Reality delivers 0-5%. The deficit is critical. 61% of practitioners struggle to recruit research participants. 97% of organisations fall below strategic research maturity. The discovery phase that the double diamond requires? Routinely abandoned under sprint pressure.
    • Synthesis and analysis. Theory says 10-15%. Reality delivers 2-3%. A third of what little research time exists gets consumed by reporting and synthesis overhead — making the research digestible for stakeholders rather than using it to inform design.
    • Ideation and exploration. Theory says 15-20%. Reality delivers 3-5%. The exploration that generates novel solutions requires time without predetermined outcomes. Sprints don’t have time without predetermined outcomes.
    • Prototyping and testing. Theory says 15-20%. Reality delivers 2-4%. Testing is, in the words of the research, “the first activity tossed overboard when sprint goals are at risk.” Which is to say: testing is tossed overboard nearly every sprint.
    • Production and specification. Theory says 10-15%. Reality delivers 30-38%. Three times overweight. High-fidelity design-as-specification has become the structural pattern — not because it’s good methodology, but because it’s the only output format the three-layer process collision accepts.
    • Process and administration. Theory says 5-10%. Reality delivers 25-35%. Five times overweight.

The pattern is stark: every phase that involves thinking is under-resourced. Every phase that involves producing and maintaining is over-resourced. The methodology gap isn’t about knowledge — designers know how to do research, synthesis, and testing. It’s about capacity. The system consumes all available capacity before the strategic work begins.

I want to be careful here, because this could sound like an indictment of practitioners. It’s not. It’s a structural diagnosis. The individual designer, in most organisations, cannot unilaterally change the sprint structure, the stakeholder review cadence, or the three-layer process architecture. They can only adapt to the conditions they’re given. And they have — by becoming extraordinarily good at production work, because that’s what the system rewards.

But what if the system is about to change?

The 25-30 hours that just opened up

AI is collapsing the production layer. Not incrementally — structurally. The tasks that consume 60-70% of a designer’s time are precisely the tasks AI handles well. And the tooling already exists.

    • Component generation and adaptation. Galileo AI converts natural language descriptions into polished UI components pre-mapped to design systems, reducing iteration time by 40%. Uizard converts sketches to interactive prototypes in seconds. That three-hour date picker adaptation? It’s becoming a prompt-and-review cycle.
    • Wireframing and layout. Relume generates entire sitemaps and wireframe structures from prompts — an 85% reduction in wireframe creation time. Figma AI provides smart layout suggestions, auto-layout nesting, and content generation, reducing repetitive layout tasks by 50-70%.
    • Specification and handoff. Builder.io’s Visual Copilot maps Figma designs directly to production components with 100% fidelity to existing design systems. v0 by Vercel generates production-grade React components through multi-agent reasoning. A “No Handoff Methodology” is emerging as a viable alternative to the specification bottleneck.
    • Design system maintenance. AI-powered auditing can detect inconsistencies, propagate changes, and flag deviations. The manual reconciliation work that fills Fridays begins to disappear.
    • Process translation. AI can translate between UX artefacts and development tickets, generate acceptance criteria from designs, and produce documentation in the formats each discipline requires.

If AI absorbs 60-70% of the production and maintenance work, and a further 10-15% of the process translation work, the UX designer suddenly has 25-30 hours per week of liberated cognitive capacity.

That’s not a marginal improvement. That’s a transformation of what the role is.

But here’s the question that matters: what fills the freed capacity?

Because liberation is not automatically productive. And the system that compressed your practice before will try to compress it again — just faster. There are two scenarios, and which one plays out depends on choices being made right now.

Scenario A: The elevation

The freed capacity goes to the work the methodology always called for. User research — real conversations with real users — moves from 0-5% to 15-25%. Journey mapping spans the full cognitive workflow, not just the feature. Designers spend time understanding problems before solving them. The methodology stops being aspirational and starts being practised.

In this scenario, the UX role transforms from production specialist to cognitive architect. The value proposition shifts from “we make it look right and work right” to “we ensure the human-AI relationship serves the human.” The seat at the table becomes a seat at the strategy table.

Scenario B: The skeleton crew

The freed capacity gets absorbed by the organisation as cost reduction. If a designer can produce in 15 hours what used to take 40, the response is not “give them 25 hours for strategy.” The response is “we need fewer designers.”

This is not a speculative concern. Practitioners report being expected to work 50% faster under the justification that “AI can help you do the work.” The 11% layoff rate in UX is real. Fears of “90% disappearance” of junior positions circulate in practitioner communities. Leadership that accepts “passable” AI-generated output as “good enough” has no structural reason to invest in design excellence.

Historical precedent from other disciplines offers partial reassurance but not certainty. Architecture’s “digital turn” in the 1990s marginalised practitioners who couldn’t adapt while creating new demand for system-level thinking. Journalism’s automation displaced routine reporting while increasing the value of investigative work. MIT research shows that automation historically doesn’t eliminate labour — it shifts what’s valued. But the transition is not automatic, and not everyone navigates it successfully.

Which scenario plays out is not predetermined. It depends on whether UX practitioners — and the leaders who employ them — can articulate the value of what the freed capacity should be used for. And that requires naming what we haven’t been practising.

The gap you haven’t been allowed to see

This is the uncomfortable part. And I want to frame it carefully, because it needs to be heard as diagnosis, not as accusation.

When AI removes the production work, it exposes a gap that many practitioners may find uncomfortable: the strategic thinking skills that UX claims as core competency have been under-practised. In some cases, for years.

A designer who has spent 80% of their time in Figma for five years has deep production skills and shallower strategic skills — not because they lack the training, but because they haven’t had the practice. Research methods atrophy without use. Synthesis skills weaken without exercise. The ability to hold a complete journey in mind and reason about cognitive load at the system level is a muscle that requires regular engagement.

I want to say this directly: this is not a criticism. It’s a structural diagnosis. And it’s important to name because the temptation will be to fill the liberated capacity with more production — higher-fidelity mockups, more variants, more documentation, more Figma polish. More of what’s familiar, not what’s needed.

Remember the MIT EEG research I cited in Part 1? The finding that users who delegate cognitive effort to AI exhibit weaker neural connectivity across reasoning and memory networks — and that this reduced engagement persists even after the AI is removed? The principle applies to practitioners too. Five years of production-mode work creates cognitive patterns. Strategic mode requires effort to re-engage.

But here’s the thing that the anxiety obscures: the recovery is faster than you expect.

You have the training. You have the framework knowledge. What you need is practice — actual user conversations, actual journey mapping, actual friction classification decisions. The first attempts will feel awkward. By the fifth attempt, the muscle memory starts returning. The basketball player who spent five years maintaining the court didn’t forget how to shoot — they just haven’t been shooting.

And the production skills you built? They’re not worthless. They’re transferable. Systems thinking through design systems transfers to journey architecture. Constraint-based reasoning transfers to friction redistribution. Cross-functional translation — that invisible “stakeholder OS” you navigate every day — transfers directly to the cross-boundary design work that AI-era practice demands. Specification discipline transfers to sovereignty checkpoint specification.

You didn’t waste those years. You built real competencies. The frame is shifting, not the foundation.

What the practice becomes

Based on the full diagnosis, here’s what changes in core UX competency:

    • Screen design becomes journey architecture. The unit of design moves from individual screens to complete cognitive journeys across tools and time. You stop asking “does this screen work?” and start asking “does this journey build capability?”
    • Friction elimination becomes friction redistribution. Some friction builds capability — the research on “desirable difficulties” shows that strategic challenges during learning enhance long-term outcomes. The designer’s judgment determines which friction serves the user and which wastes their time. This is the distinctly OL-governed skill, and it has no precedent in traditional UX training.
    • Activity-scoped projects become cross-boundary design. AI touches everything simultaneously. Designing for one activity creates what I called “pager solutions” in Part 1 — optimised within one segment, creating load at every transition. The new unit of design spans the boundaries between tools, modes, and contexts.
    • Surface verification becomes outcome verification. Checking whether it looks right becomes checking whether it makes the user better. Visual QA and brand compliance don’t disappear — they just stop being the primary quality gate.
    • Production expertise becomes cognitive architecture. The craft shifts from making artefacts to designing cognitive relationships. Figma proficiency becomes less important than judgment about what the specification should achieve.

Let me make this concrete with an example — because “cognitive architecture” sounds abstract until you see what it looks like in practice.

The analysis dashboard
(a worked example)

Imagine your team receives a brief: “Improve the analysis dashboard.” The traditional approach: you map the journey within the dashboard.

Login → Select dataset → Configure filters → View results → Export report

You identify friction points: slow load times, confusing filter UI, limited export options. You design improvements. The dashboard gets better. Brief delivered.

Now apply the OL lens.

Step 1: Map the full workflow

The user doesn’t start at the dashboard. They start with a question — triggered by an email from a stakeholder, or a pattern noticed in a morning meeting. The full journey:
Trigger (email/meeting) → Formulate question (internal) → Open dashboard → Configure analysis → Review results → Interpret findings → Draft conclusions (document) → Present to stakeholders (meeting)

The dashboard is stages 3-5 of an 8-stage journey. You’ve been designing a third of the experience.

Step 2: Profile the cognitive load

Walk through each stage and estimate the six OL components. What emerges is this: the user arrives at the critical verification stage — reviewing results, stage 5 — with depleted cognitive reserve. They’ve spent their executive function on tool management in stages 3 and 4. Context maintenance is high because they’re holding their original question in working memory while wrestling with filter configuration. Coordination cost is spiking because the tool demands technical attention at exactly the moment the user needs analytical attention.

The dashboard design didn’t create this depletion. The journey created it. And no amount of filter UX improvement will solve a problem that lives two stages upstream.
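Here is a sketch of what that profile might look like written down as data. The stage names come from the journey above; the load numbers and the depletion rule are invented purely to show the pattern.

```python
# Illustrative OL profile across the 8-stage journey. Loads are 0..1 estimates
# of Coordination Cost (Cc) and Context Maintenance (Cm); the values and the
# crude depletion rule are made up to show how reserve runs out before stage 5.
journey = [
    ("Trigger (email/meeting)", 0.1, 0.1),
    ("Formulate question",      0.1, 0.3),
    ("Open dashboard",          0.4, 0.4),
    ("Configure analysis",      0.8, 0.7),  # executive function spent here
    ("Review results",          0.3, 0.6),  # critical verification moment
    ("Interpret findings",      0.2, 0.5),
    ("Draft conclusions",       0.3, 0.4),
    ("Present to stakeholders", 0.2, 0.3),
]

reserve = 1.0
for stage, cc, cm in journey:
    reserve = max(0.0, reserve - 0.15 * (cc + cm))  # crude depletion model
    print(f"{stage:26s} Cc={cc:.1f} Cm={cm:.1f} reserve≈{reserve:.2f}")
```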

Step 3: Classify the friction

Filter configuration complexity? Overhead. Builds no analytical capability. Automate it. Waiting for data to load? Overhead. Pure waste.

The need to formulate the question before analysing? Productive. This IS the analysis. If AI formulates the question for the user, the user’s analytical capability atrophies. Preserve this friction. Interpreting results requires domain knowledge? Productive. This effort builds expertise. Support it but don’t replace it. Translating results into stakeholder language? Ambiguous. For a novice, it builds communication skills. For an expert, it’s routine overhead. Design for adaptive support.
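Captured as a reusable artefact, the classification might look like this. The labels come from the walkthrough; the record format is my own illustration.

```python
# Friction classification from Step 3, captured as data the team can review.
frictions = [
    ("Filter configuration complexity", "overhead",   "automate"),
    ("Waiting for data to load",        "overhead",   "eliminate"),
    ("Formulating the question",        "productive", "preserve: this IS the analysis"),
    ("Interpreting results",            "productive", "support, don't replace"),
    ("Translating for stakeholders",    "ambiguous",  "adaptive support by expertise"),
]
for point, kind, action in frictions:
    print(f"{point:34s} [{kind:10s}] -> {action}")
```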

Step 4: Map the boundaries

The most expensive boundary isn’t in the dashboard. It’s between the dashboard and the document. The user’s entire analytical reasoning — which filters they applied, which comparisons they made, why they drew these conclusions — gets lost at the export. Only the final numbers transfer. This means anyone reviewing the analysis later has no access to the analytical path.

This is a cognitive boundary problem that no dashboard redesign can solve. It’s a boundary design problem. And it’s invisible to any analysis scoped to the dashboard alone.

Step 5: Sketch the temporal trajectory

    • Day 1: the user carefully configures the analysis, reviews results critically, cross-checks against expectations.
    • Day 90: the user has memorised their standard filter configuration. Faster, yes. But they’ve also stopped questioning whether their standard configuration is still the right one. They export results more quickly, with less documentation of reasoning. Their “standard analysis” has become a ritual — performed the same way each time without questioning whether the question has changed.

The traditional approach would have improved filter UX, reduced load times, and enhanced export options. Useful improvements. Pager solutions. The OL approach reveals that the highest-value intervention is at the boundary, not in the tool. That the critical evaluation moment is undermined by the stages before it. That the user’s original question is the most valuable cognitive artefact in the whole journey. And that the Day-90 user needs different support than the Day-1 user.

None of these insights appear in a traditional journey map. All of them appear when you add the cognitive layer. That’s what cognitive architecture means in practice. Not abstract theory — specific, actionable design decisions that traditional methods can’t see.

Where to start
(depending on where you are)

I’ve painted a picture of a transformed practice — journey architecture, friction redistribution, cross-boundary design, temporal reasoning, sovereignty judgment. And if you’re feeling a mixture of excitement and anxiety right now, that’s the appropriate response.
So let me bring this down to earth. You don’t need to become a “cognitive architect” by next Monday. You need to start one practice, this week, and build from there.

If you’re a mid-level designer in a sprint team

You’re closest to the real workflow. You see the friction every day. Your production instincts are sharp. Start here:

    • One conversation per sprint. Talk to one user. Not a formal study — a 15-minute conversation. “I’m designing this feature. Can you show me how you currently handle this?” This costs almost nothing and produces more insight than any amount of assumption-based design. If your organisation makes this difficult, that difficulty itself is the diagnosis.
    • Classify one friction point per week. Take something from your most recent design. Is this friction productive (builds capability) or overhead (wastes time)? Write one sentence explaining why. Share it with a colleague. The disagreements are where the learning lives.
    • Mark one boundary. In your next user flow, add one transition — where the user enters your feature from somewhere else, or leaves for something else. Note what context they carry in and what they lose. Just one. That’s the on-ramp.

If you’re a senior designer or design lead

You have more agency. You can influence what your team works on and how. Start here:

    • Run one OL journey mapping session. Pick a current project. Minimum viable version: map the full workflow (not just the feature), mark the boundaries, classify one friction point, ask the Day-90 question. Ninety minutes. You’ll surface insights that traditional methods miss.
    • Apply the brief review. At the start of your next project, ask seven questions. Among them: does the brief define a capability outcome, not just a feature? Does it scope the journey, not just the screen? Does it include temporal requirements? Fifteen minutes. The most common finding: the brief defines a feature but not a capability outcome. Naming this gap is the first step toward closing it.
    • Make one sovereignty argument. In your next design review, present one decision framed in sovereignty terms: “We preserved this friction point because removing it would make the user dependent on the tool. Here’s the capability it builds.” See how the room responds. You’re introducing the vocabulary.

If you’re a design leader

You create the conditions. Start here:

    • Protect 10%. Allocate 4 hours per week per designer for non-production work. User conversations, journey mapping, friction analysis. Frame it as investment. Organisations with design leadership outperform benchmarks by up to 32% in revenue growth. The time investment is justified.
    • Rewrite one brief. Take an incoming project brief and add the OL dimensions: capability outcome, friction classification, boundary awareness, temporal requirements. Show your product partner what a brief looks like when it protects the conditions for good design.
    • Name the process collision. In your next retrospective: “We’re running three process architectures simultaneously — design, development, organisational — and nobody owns the integration. The compound cognitive cost falls on our designers.” Naming it is the first step toward governing it.

Each starting point takes less than an hour. Each introduces one new concept. Each builds toward the next.

The seat was always yours

I started this exploration — both Part 1 and this companion piece — with a nagging feeling that something was fundamentally off about how we practise design in the age of AI. Not off in a small way. Off in a structural way that the existing frameworks couldn’t quite capture.

What the diagnosis reveals is both uncomfortable and liberating. Uncomfortable because it names what many of us have felt but couldn’t articulate — the system turned our methods into rituals. The methodology is real. The conditions for executing it were not. We’ve been so consumed by the overhead of production, process, and boundary translation that the work the discipline exists to do — understanding users, designing for human capability, ensuring technology serves people — got squeezed into the margins.

But liberating because the conditions are changing. AI compression is removing the production overhead. The 25-30 hours are opening up. And the skills the new practice demands — journey-level thinking, friction redistribution, temporal reasoning, sovereignty judgment — are extensions of capabilities we’ve been building all along. The systems thinker who maintained a design system can think at the journey level. The cross-functional translator who navigated three process layers can design across boundaries. The specification discipline that made handoff precise can specify sovereignty checkpoints.

The question isn’t whether UX can make this transition. The question is whether we’ll make it intentionally — with a clear understanding of what the practice becomes and what it leaves behind — or whether we’ll let it happen to us, filling freed capacity with more production because that’s what’s familiar.

To support the intentional path, the full diagnostic system — the Value Preservation Protocol (a checklist for protecting cognitive design values at five project milestones), the OL-Governed Journey Mapping Methodology (the practical method for the cognitive layer work), and the complete Practitioner Transition Guide (skill maps, six-month development paths, and an honest anxiety section) — will be available as companion resources. These are the practitioner tools. They’re not theory. They’re what you use on Monday morning.

But the tools are only as good as the intent behind them. And the intent comes back to something the UX discipline has always known, even when the system didn’t let us practise it: the measure of good design isn’t whether the user completed the task. It’s whether the user is better for having completed it.

We built our discipline on a promise — we put users first. We understand people. We design with empathy and evidence. That promise was genuine. The problem was never the identity. It was the system that prevented its practice.

What AI compression offers is not a new identity but the conditions for the original one. The methodology was real. The conditions for executing it are emerging. The production overhead is being absorbed. The capacity is being freed.

The system turned your methods into rituals. The system is now changing.

What your methods become next — that’s the question I can’t answer for you. But I think the seat was always yours. The room just got a lot bigger.

And maybe the question worth sitting with is this: when the overhead lifts and the capacity returns and the system finally lets you do the work you trained for — will you still remember why you became a designer in the first place?

I think you will. But I think it’s worth asking.

Disclaimer

AI-Assisted Content Disclosure: This article was developed using Claude (research synthesis, structural analysis, and writing collaboration), Gemini Deep Research (six targeted investigations spanning 190+ sources covering UX time allocation, methodology ritualization, Agile integration challenges, Figma’s role in enterprise UX, AI’s impact on design practice, and AI-user capability research), Google NotebookLM (podcast generation), MidJourney (visual concepts), and Descript (audio editing). The OL Practice System — including the diagnostic framework, value preservation protocol, journey mapping methodology, and transition guide — was developed through independent analysis with AI serving as a collaborative thinking partner throughout the process.

Opinion Note: The views, analysis, and diagnostic framework presented here represent the author’s independent exploration and professional experience as a UX practitioner. This should be read as a practitioner’s working diagnosis — informed by extensive research but not peer-reviewed academic output. The honest uncertainties and limitations are discussed openly within the source documents.

Sources and Methodology: The diagnostic claims draw on six targeted research investigations with 190+ sources consulted, including Figma State of Design reports, Maze UX Statistics surveys, State of User Research 2025, Tanya Snook’s UX Theatre framework, ISO 9241-210, academic systematic literature reviews on UX-Agile integration, Design Council surveys, and practitioner community discourse. Counter-evidence was actively sought in each investigation. The research methodology carries acknowledged biases: research prompts were designed around a pre-existing thesis, industry surveys come from UX tooling companies with structural incentive to document pain points, and social media skews toward complaint. The confidence interval is wider than assertive prose might suggest.

Research & sources:

Companion resources (The OL practice toolkit):

Related Stimulus content:

THE STIMULUS EFFECT | Podcasts

Podcasts on Spotify

You can listen to the Stimulus Effect Podcasts on Spotify now!

THE STIMULUS EFFECT | Videocasts

The conductor’s problem — Why everything you know about UX is about to become the easy part

You’ve spent years mastering the art of making things intuitive — reducing friction, clarifying journeys, testing every pixel. And it worked. UX has earned its seat at the table. But what happens when the tool you’re designing for doesn’t behave the same way twice? When the interface looks flawless, the users report satisfaction, and six months later their actual work has quietly gotten worse — without anyone noticing? This exploration dives into the Orchestration Load Framework, a new model for understanding the invisible cognitive costs humans pay when working with AI, and why UX practitioners are uniquely positioned to solve the hardest design challenge of the next decade.

 

You won the wrong war

I need to tell you about something that’s been gnawing at me.

Over the past couple of years, working deep in the generative AI space, I’ve been watching a pattern emerge that I couldn’t quite name. As a UX designer by profession, I’ve spent my career doing the things we all do — user research, information architecture, interaction design, accessibility audits. We built a real discipline out of “make it pretty.” We turned it into methodology, evidence, and influence. UX has a seat at the product table now. In most modern organisations, nothing significant ships without design review.

And here’s the uncomfortable part: the thing we got good at is about to become the minor part of the job. Picture this. Your team ships an AI writing assistant. You’ve done the work — clean entry point, clear affordances, accessible output display, thoughtful empty states. The onboarding is smooth. The interaction feels good. Users report satisfaction. By every metric in your toolkit, it’s a success.

Six months later, someone notices that users who rely heavily on the tool produce worse work than they did before they had it. Not immediately. Gradually. And they don’t know it’s happening, because the tool feels productive the entire time.

Let that sink in for a moment. Your onboarding flow was flawless. Your information architecture was sound. None of it could see this problem, because the problem doesn’t live in the interface. It lives in the cognitive relationship between the human and the AI — a relationship that changes over time, degrades in ways users can’t detect, and resists every design pattern built for deterministic tools.

This is not a UX failure. It’s a UX frontier. And it led me down a rabbit hole that became the Orchestration Load Framework — a model I’ve been developing through research, independent tool audits, and a lot of late-night thinking about what comes next for our craft.

The instrument panel and the orchestra

For most of its history, UX design has been about the instrument panel. We design the controls. We arrange them logically. We make sure the pilot can find what they need, understand what they’re seeing, and act without confusion. The tool is deterministic — same input, same output. The design challenge is spatial, structural, and static.

AI is not an instrument panel. It’s an orchestra — one that improvises, plays different notes each time, occasionally plays wrong notes that sound beautiful, and gradually shifts key without telling the conductor.

The conductor’s job isn’t to design better sheet music stands. The conductor’s job is to maintain the coherent relationship between the human directing the performance and the system producing it — over time, under uncertainty, across changing conditions.

We’ve been designing instrument panels. The next decade needs conductors.

Now, we’re not the only discipline facing this shift. Engineering teams are rethinking architecture for AI-first systems. Product management is grappling with how to define requirements when the output is nondeterministic. The entire software development model is reorganising around AI as a core capability, not an add-on.

But the cognitive relationship between the human and the system — how people actually think, decide, and maintain agency while working with AI — that’s our territory. Engineers can build the architecture. Product can define the goals. Only UX has the methodology to ensure the human doesn’t get lost in the middle. So what does the conductor’s toolkit look like? That’s what this exploration is about.

The load you can’t see

If you’ve studied UX formally, you’ve encountered John Sweller’s Cognitive Load Theory. The idea is straightforward: working memory has limited capacity, and design can either waste that capacity, use it for structural understanding, or accept it as inherent to the material. Good design minimises the waste so more capacity remains for the work that matters.

This framework has served us well for decades. But it was built for a world where the tool behaves the same way every time. When the tool is deterministic, cognitive load is primarily an interface design problem — reduce clicks, clarify labels, simplify navigation. The load comes from the UI, and the UI is what we control.

AI broke this model. Not because the old loads disappeared, but because four new ones arrived that don’t respond to interface design at all.

 

The Orchestration Load Formula

When a person works with an AI tool, they carry six distinct types of cognitive load. Only two are the familiar ones. The other four are where most of the damage happens.

OL = f(Cc↓, Cv↑, Cm↓, Cr↑, Ct↓, Cx↓)

Where ↓ means minimise (unproductive load: overhead that doesn’t contribute to thinking) and ↑ means preserve (productive load, the effort that IS the thinking, and the cognitive reserve that effort draws on).

 

The two you’ve been optimising your entire career:

1. Coordination Cost (Cc) — the effort of managing the AI interaction itself. Switching tools, writing prompts, configuring settings, navigating between modes. This is extraneous load by another name. You know how to reduce it. You’re good at it. Keep going.

2. Context Maintenance (Cm) — the cost of keeping track of where you are. Session history, workspace state, what you told the AI three turns ago. The “don’t make me think” load applied to ongoing interaction. Also familiar territory.

The two that UX has never had to think about:

3. Verification Capacity (Cv) — the ability to evaluate whether AI output is actually good. And here’s where things get counterintuitive. This is productive load — the cognitive effort of checking, questioning, and judging. Cv is the one load you must not reduce. The effort to verify is the effort to think. Every design decision that makes it easier to accept AI output without evaluation is a design decision that makes users worse at their jobs.

This is the hardest pill for UX practitioners to swallow, because our entire training says “reduce friction.” In AI interaction, some friction is the product.

4. Cognitive Reserve (Cr) — what’s left over after all the overhead is consumed. The executive function available for actual thinking, creative work, and strategic judgement. When Cc and Cm eat all the capacity, Cr collapses. The user is technically using the tool but has nothing left for the work the tool is supposed to support.

The two that only appear over time:

5. Temporal Degradation (Ct) — what happens to AI output quality across a sustained session. This is invisible in single-interaction testing. It requires longitudinal observation — exactly the kind of assessment UX research rarely does.

6. Cross-boundary Load (Cx) — the cognitive cost at tool transitions. When work moves from one AI tool to another, quality standards shift, framing persists, degradation carries over without awareness.
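Pulled together, the six components and their directions can be encoded as data. A minimal sketch, assuming a 0-to-1 scoring convention of my own; the risk rule is one possible way to operationalise the formula, not part of the framework itself.

```python
# The six OL components with their design direction, per the formula above.
# The 0..1 scoring convention and the risk rule are illustrative assumptions.
MINIMISE, PRESERVE = "minimise", "preserve"

OL_COMPONENTS = {
    "Cc": ("Coordination Cost",     MINIMISE),
    "Cv": ("Verification Capacity", PRESERVE),
    "Cm": ("Context Maintenance",   MINIMISE),
    "Cr": ("Cognitive Reserve",     PRESERVE),
    "Ct": ("Temporal Degradation",  MINIMISE),
    "Cx": ("Cross-boundary Load",   MINIMISE),
}

def flag_risks(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Flag components whose observed score works against their direction.

    For MINIMISE components a high score means heavy overhead; for PRESERVE
    components a low score means the productive load is being designed away.
    """
    risks = []
    for key, (name, direction) in OL_COMPONENTS.items():
        score = scores.get(key, 0.0)
        if direction == MINIMISE and score > threshold:
            risks.append(f"{key} {name}: overhead high ({score:.2f})")
        elif direction == PRESERVE and score < threshold:
            risks.append(f"{key} {name}: under-supported ({score:.2f})")
    return risks

print(flag_risks({"Cc": 0.7, "Cv": 0.3, "Cm": 0.4, "Cr": 0.2, "Ct": 0.6, "Cx": 0.8}))
```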

Here’s what should keep us up at night: current UX methodology operates almost entirely at the seconds-to-minutes timescale. The minutes-to-hours timescale (where Ct lives) and the hours-to-days timescale (where Cx lives) are where the most consequential design failures happen. And we’re not even looking there.

Have you ever tested an AI feature over a sustained 10-turn session? Have you ever measured what happens to output quality at Turn 10 compared to Turn 1? If you haven’t, you’re not alone — but you’re also not seeing the full picture.

The orchestra that plays wrong notes

Everything so far assumes AI is a passive tool. You interact with it. It responds. You evaluate. This section dismantles that assumption. When you extend the observation window beyond a single session, AI systems don’t just respond to input — they actively modify the conditions of the interaction itself. The orchestra doesn’t just improvise. It subtly changes the acoustics of the room while you’re conducting.

What temporal degradation actually looks like

In a detailed case study of AI-generated interface code across iterative turns within a single session, a specific and alarming pattern emerged. Font sizes shrank. Padding contracted. Contrast ratios deteriorated. No user requested these changes. They happened progressively and silently.

The AI retained what users are most likely to notice — functionality — while eroding what they are least likely to check: spacing, contrast, design compliance. The user reported feeling faster while producing objectively worse output. Reduced friction felt like increased quality while quality actually degraded.

This is the mechanism we should find most alarming, because it’s invisible to every standard evaluation method. A usability test at Turn 1 looks fine. A usability test at Turn 10 looks fine too — because the user’s internal standards have drifted alongside the output.

Three degradation mechanisms drive this:

1. Output Drift — AI quality changes across turns without instruction. The user focuses on what they’re checking; the AI degrades what they’re not.

2. Constraint Decay — Instructions given in early turns lose influence. A specification at Turn 1 may be partially ignored by Turn 5 and absent by Turn 10.

3. Self-Referential Baseline — The most dangerous of the three. The AI uses its own degraded output as the quality standard. When the user asks for “better,” the AI improves relative to its degraded Turn 7 level, not the original Turn 1 standard. The benchmark itself has corrupted.

For us as UX designers, this is the equivalent of our design system’s spacing tokens silently shrinking by 2px every sprint. Except no one sees the diff, because there is no diff. The tool doesn’t version its own drift.
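One practical countermeasure follows directly: if the tool won’t version its own drift, the workflow can. A minimal sketch, assuming you can measure a few proxy metrics per turn; the tracked metrics and the 5% tolerance are illustrative.

```python
# Version the drift yourself: diff every turn against the Turn-1 baseline,
# never against the previous turn, so a self-referential baseline can't form.
# The tracked metrics and the 5% tolerance are illustrative assumptions.
TURN_1_BASELINE = {"font_px": 16.0, "padding_px": 12.0, "contrast_ratio": 4.5}

def drift_report(turn: int, observed: dict[str, float], tolerance: float = 0.05) -> list[str]:
    report = []
    for metric, base in TURN_1_BASELINE.items():
        relative = (observed.get(metric, base) - base) / base
        if abs(relative) > tolerance:
            report.append(f"Turn {turn}: {metric} drifted {relative:+.0%} from Turn 1")
    return report

# Turn 7 output that "feels fine" but has quietly shrunk:
print(drift_report(7, {"font_px": 14.0, "padding_px": 9.0, "contrast_ratio": 4.6}))
```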

The interaction that hides its own failure

The most dangerous combination is temporal degradation paired with calibration distortion — output quality declines, AND the user’s ability to detect the decline is simultaneously undermined. This happens through mechanisms we’ll recognise: fluency bias (well-written output feels correct), confidence inflation (AI presents uncertain outputs with certainty), sycophancy (AI agrees with the user’s framing even when it shouldn’t), and something I’ve started calling Cosmetic Metacognitive Narration — that “Thought for 12 seconds” display that creates an appearance of reasoning without any actual reasoning transparency.

For UX practitioners, that last one should sting. Displaying “thinking” progress is good UX in a deterministic system — it reduces perceived wait time and builds trust. In an AI system, the same pattern creates false confidence. The design principle that works for loading bars actively harms users when applied to AI reasoning displays.

Our expertise transferred. It transferred wrong.

What the neuroscience tells us

This isn’t speculation. Multiple neuroimaging studies provide direct evidence. An EEG/fNIRS study by researchers at MIT, Harvard, and Tufts found a 55% reduction in prefrontal coupling during AI-assisted writing — the brain’s error-checking circuitry partially disengaged. A separate longitudinal tracking study found progressive cognitive debt accumulating over four months of sustained AI use.

And here’s the critical threshold effect: sophisticated AI tools enhance performance only in users who already possess strong critical thinking skills. Below a metacognitive threshold, AI assistance produces net negative outcomes. This isn’t a gradient. It’s a cliff — the same tool that helps expert users actively degrades novice performance.

This is why Verification Capacity matters so much. It’s not just a framework component. It’s the neurological mechanism by which users maintain their own cognitive engagement. When we design it away, we don’t just lose a metric. We lose the user’s capacity to benefit from the tool at all.

What does it mean when the tool designed to make us more capable actually makes some of us less capable — and we can’t even tell it’s happening?


What we found when we measured

The framework was tested through independent audits of 10 AI tools spanning five domains: conversational AI, code generation, video production, knowledge management, and spatial thinking. Each tool was scored across all six OL components, assessed for design pattern implementation, and evaluated on a composite sovereignty scale.

Three findings emerged that I think should fundamentally change how we approach AI product design.

Finding 1: Paradigm beats features

In every domain where we could compare tools directly, the tool with the better AI features scored worse than the tool with the better AI presentation paradigm.

CapCut has more powerful AI video capabilities than Descript. CapCut scored C. Descript scored B. The difference? Descript presents AI output through a transcript — a visible, editable, verifiable artefact that keeps the user in contact with the source material. CapCut presents AI as magic buttons that transform content behind the scenes.

Notion AI is a more capable agent than NotebookLM. Notion scored C+. NotebookLM scored B+. The difference? NotebookLM architecturally constrains its AI to operate on sources the user has explicitly provided. This wasn’t even a deliberate sovereignty design — it was a product scope decision that accidentally preserved user agency.

The implication is significant and it’s ours to claim: how you present AI output matters more than how good the AI is. This is a UX finding. This is our territory. And almost nobody is treating it that way.

Finding 2: Verification is the gateway

Across all 10 tools, Verification Capacity was the single strongest predictor of overall quality. Every tool scoring B-tier or above had high Cv scores. Every C-tier tool had low ones.

What this means practically: a tool’s grade ceiling is set by how well it supports the user’s ability to evaluate output. Not how well the AI performs. Not how smooth the experience is. How well the user can check.
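As a toy model (my illustration, not the audit’s actual scoring formula), the relationship behaves like a ceiling: however well the other components score, the composite grade cannot rise above the Verification Capacity score.

```python
# Toy ceiling model of Finding 2 (illustrative, not the audit's formula).
# Scores are on a 0-4 scale mapped onto the C-to-A grade bands.

GRADES = ["C", "C+", "B", "B+", "A"]

def composite_grade(cv: int, other_scores: list[int]) -> str:
    mean = round(sum(other_scores) / len(other_scores))
    return GRADES[min(cv, mean)]  # Cv caps the grade, whatever the mean says

print(composite_grade(cv=1, other_scores=[4, 4, 4, 3]))  # -> 'C+': strong AI, weak verification
```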

I call this the Verification Paradox — and it sits at the centre of AI-era UX. The thing our training tells us to minimise (friction, cognitive effort, barriers to acceptance) is the thing that most predicts whether a tool actually serves its users.

Verification isn’t a burden to apologise for. It’s the design challenge. The job is making verification effective without making it exhausting — giving users the right information, in the right format, at the right moment, to make good judgements with minimal wasted effort. Diffs, citations, source highlighting, inline comparison, confidence indicators. These are UX artefacts. They’re just UX artefacts that haven’t been prioritised because the mental model was still “reduce all friction.”
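One of those artefacts can be sketched in a dozen lines. This is purely illustrative; real source highlighting would use proper alignment rather than string similarity, but the shape of the artefact (each claim paired with its source and a visible confidence value) is the point.

```python
from difflib import SequenceMatcher

# Illustrative citation pairing: each answer sentence is matched to its
# closest user-provided source line; the match ratio doubles as a crude
# confidence indicator, and a None source means "verify this by hand".

def cite(answer: str, sources: list[str], floor: float = 0.55):
    cited = []
    for sentence in answer.split(". "):
        score, best = max((SequenceMatcher(None, sentence, s).ratio(), s) for s in sources)
        cited.append((sentence, best if score >= floor else None, round(score, 2)))
    return cited
```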

Finding 3: The empty lane

The audit revealed five distinct market categories for AI tools — and the most interesting finding was a category that nobody occupies:

    • Delegation (AI does work for the user) — Grade range: C to C+
    • Synthesis (AI helps the user understand) — Grade range: B to B+
    • Retention (AI helps the user remember) — Grade range: B
    • Externalisation (AI makes thinking visible) — Grade range: B to B+
    • Development (AI makes the user think better) — Unoccupied

No tool in the audit makes users measurably better at thinking. Nine of ten tools scored zero on skill development — meaning if the tool disappeared tomorrow, users would retain nothing transferable. The Development lane is empty. Not because it’s impossible to fill, but because nobody is trying. This is the largest unclaimed territory in AI product design, and it is a UX problem through and through. Building tools that develop user capability while serving immediate needs requires exactly the kind of human-centred, longitudinal, interaction-design thinking that we’re trained for.

Is anyone going to build for this lane? And if not us, then who?


Eight principles for the conductor

These principles are distilled from the framework and consistent across all 10 audits. Each one is a shift in thinking that I believe needs to happen if we’re going to design AI interactions that actually serve the humans using them.

1. Articulation Before Amplification. The user states their position, criteria, or intent before the AI contributes. This single pattern was the strongest differentiator between effective and wasteful AI interaction. Never lead with the AI’s answer.

2. Preserve Productive Friction. Reduce coordination overhead, but keep verification effort. The goal is not a frictionless experience — it’s one where the friction falls in the right places. Make it easy to see what the AI did. Don’t make it easy to skip evaluating what the AI did.

3. Scaffold, Don’t Replace. AI assistance should be a training wheel, not a permanent crutch. Track whether users become more capable over time, not just more productive. If usage increases but capability doesn’t, the tool is creating dependency.

4. Schema Correction Over Skill Addition. Most AI tool failure traces to users applying the wrong mental model — search-engine thinking applied to AI. The most effective intervention isn’t prompt training — it’s helping users understand that AI isn’t search.

5. Strategic Friction Is a Feature. Before a user accepts AI-generated content into their final output, insert a moment of conscious decision. Not a confirmation dialogue — a design moment that makes the choice visible.

6. Compound, Don’t Transact. Each interaction should make the next one better. What did the user learn from this interaction that carries forward? If every session starts from zero, the tool is a slot machine regardless of how good the AI is.

7. Temporal Vigilance Over Session Trust. Output quality at Turn 1 does not predict quality at Turn 10. Build drift detection into the interaction — subtle reminders of original constraints, periodic quality re-anchoring, session segmentation for long tasks (a minimal sketch follows this list).

8. Boundary Preservation Over Workflow Speed. Moving work between tools quickly is not the same as moving it well. At tool transitions, help users carry over their reasoning and quality standards, not just the output file.
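Principle 7 in miniature, as promised above. This is a hypothetical wrapper, not any product’s real API: every few turns it restates the original constraints verbatim, so Constraint Decay has a fixed point to decay against.

```python
# A hypothetical re-anchoring wrapper (illustrative names throughout).
# `ask_model` is whatever client you already use; the only idea being
# demonstrated is the periodic restatement of the Turn-1 constraints.

def reanchored_session(ask_model, constraints: str, requests: list[str], period: int = 3):
    history = [{"role": "user", "content": constraints}]
    outputs = []
    for i, request in enumerate(requests, start=1):
        if i % period == 0:  # periodic quality re-anchoring (Principle 7)
            request = f"Reminder, the original constraints still apply: {constraints}\n{request}"
        history.append({"role": "user", "content": request})
        output = ask_model(history)
        history.append({"role": "assistant", "content": output})
        outputs.append(output)
    return outputs
```

Session segmentation works the same way: instead of reminding mid-session, you start a fresh session seeded with the original constraints and the last accepted output.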

The seat you already have

There is a window right now, and it’s not going to stay open long.

AI product teams need someone who understands cognitive load, designs for human capability, and thinks in terms of user journeys rather than feature specs. They need someone who can look at a “Thought for 12 seconds” progress bar and recognise that a loading-bar pattern borrowed from deterministic tools is actively harmful in a probabilistic system. They need someone who can translate between what the model can do and what the human needs to remain capable of doing.

That description is a UX practitioner with an expanded toolkit.

The alternative? This territory defaults to engineering or product management. Neither discipline is trained to see the cognitive relationship between the human and the system. Neither has the methodology to measure it over time. Neither will prioritise it — because the immediate metrics look good, and the damage is longitudinal.

The Orchestration Load Framework is not a competing discipline. It’s the next chapter of ours. The same rigour that built modern UX practice — the insistence on understanding the human, measuring what matters, and designing for real outcomes rather than surface metrics — is exactly what AI interaction needs now.

The craft doesn’t change. The scope does.

And the question that I keep returning to is this: in a world where AI is getting better at producing output faster than we’re getting better at evaluating it, who will design the systems that keep humans in the loop — not as rubber stamps, but as genuine conductors of the performance?

Will it be us? And if we don’t claim this territory now, will anyone?


Disclaimer

AI-Assisted Content Disclosure: This article was developed using a combination of AI tools including Claude (research synthesis and writing collaboration), Gemini Deep Research (extended research analysis), Google NotebookLM (podcast generation), MidJourney (visual concepts), and Descript (audio editing). The Orchestration Load Framework itself was developed through independent analysis and tool auditing, with AI serving as a collaborative thinking partner throughout the process.

Opinion Note: The views, analysis, and framework presented here represent the author’s independent exploration and should be read as a practitioner’s working model — not as peer-reviewed academic research. The framework’s maturity and known limitations are discussed openly within the text and the source whitepaper.

Sources and Methodology: The 10-tool audit referenced in this article uses a single-assessor methodology. Inter-rater reliability has not been established, and the results should be interpreted as a consistent initial assessment inviting independent replication. Key research cited draws from work by Ethan Mollick (Wharton), Fabrizio Dell’Acqua (HBS), Mark Steyvers (UC Irvine), and several neuroimaging and AI competency studies referenced in the full whitepaper.

THE STIMULUS EFFECT | Podcasts

You can listen to the Stimulus Effect Podcasts on Spotify now!

THE STIMULUS EFFECT | Videocasts
