Saturday, April 25, 2026

The Cache Is the Thought — What KV Caching Reveals About How AI Actually Works
Machine Intelligence · Technical Essays
Architecture · Inference · Memory

The Cache Is the Thought

KV caching is usually described as an optimization. It is actually something more fundamental: the mechanism by which large language models hold a thought in their head.

• • •

There is a moment, when you look closely enough at the details of large language models, when a seemingly minor implementation detail suddenly reveals itself as something structural: a load-bearing idea that everything else depends on. For me, that moment came while reading some of the recent work surrounding KV caching. What looked at first like a straightforward performance optimization turned out to be the core mechanism that makes modern AI inference not just fast, but possible. Understanding it changes how you think about what these systems are actually doing.

To see why, start with the mathematics of attention, the operation at the heart of every transformer model deployed today.


The Quadratic Problem

Attention(Q, K, V) = softmax(QKᵀ / √dₖ)V
The attention operation

In their raw mathematical form, transformers are elegant but brutally expensive. Generating each new token requires attending over every previous token in the sequence, which means computation grows quadratically with sequence length. A sequence twice as long is four times as expensive. This is a direct consequence of what attention is doing. The mechanism works by computing relationships between every pair of tokens, and the cost of that operation scales with the number of pairs.

The result, without any mitigation, is that generating text token by token requires recomputing the same keys K and values V for every prior token at every step. The context grows. The recomputation grows with it. Something that might take milliseconds for a short sequence becomes seconds for a long one, and for the context windows that modern applications demand, it becomes entirely impractical.
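A quick count makes the waste concrete. This toy tally (illustrative numbers, counting K/V computations rather than FLOPs) shows why naive generation scales quadratically while caching scales linearly:

```python
# K/V entries computed while generating T tokens, with and without a cache.
T = 1000
naive = sum(t for t in range(1, T + 1))  # step t recomputes all t prior entries
cached = T                               # each token's K/V is computed exactly once
print(naive, cached)  # 500500 vs 1000
```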

KV caching addresses this with a conceptually simple observation: if the keys and values for previous tokens haven't changed, why recompute them? Store them once. At each new step, compute only the query q_t for the current token, and compare it against what's already cached. The incremental cost of generation collapses from recomputing the entire past to attending over a stored representation of it.
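As a sketch, the reuse pattern looks like this in single-head NumPy code. Nothing here reflects any particular library's implementation; names like `decode_step` are invented for the example.

```python
import numpy as np

def softmax(x):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def decode_step(q_t, k_t, v_t, cache):
    """One generation step: append this token's K/V to the cache,
    then attend using only the new query q_t."""
    cache["K"].append(k_t)
    cache["V"].append(v_t)
    K = np.stack(cache["K"])  # (t, d) -- keys reused, never recomputed
    V = np.stack(cache["V"])  # (t, d)
    d = q_t.shape[-1]
    weights = softmax(q_t @ K.T / np.sqrt(d))  # (t,)
    return weights @ V  # (d,) attention output for the new token

# Generate five steps; each past token's K/V is computed once, then reused.
rng = np.random.default_rng(0)
cache = {"K": [], "V": []}
for _ in range(5):
    q, k, v = rng.normal(size=(3, 8))
    out = decode_step(q, k, v, cache)
```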

That shift, from recomputation to reuse, is what takes inference from something theoretical to something deployable. But the deeper implications of the shift are less obvious than they first appear.

For details on the attention mechanism and the Q, K, and V matrices, see this earlier article.


The Bottleneck Nobody Was Watching

The memory cost of a KV cache scales with the number of layers, the number of attention heads, the head dimension, and the sequence length. That last term, sequence length, is where the problem lives. It looks innocent in a formula. In practice it dominates everything.

The Scaling Reality

At long context lengths, the KV cache routinely exceeds the size of the model weights themselves. A system you might describe as "running a 70B parameter model" is, in memory terms, often doing something closer to managing an enormous dynamic tensor that dwarfs the static parameters. The model weights are the smaller problem.
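A back-of-envelope calculation shows how the crossover happens. The configuration below is illustrative (a 70B-class model with 80 layers, 64 KV heads, head dimension 128, FP16, and no grouped-query attention, which would shrink these numbers considerably):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2, batch=1):
    # Factor of 2 for keys and values, stored per layer, per head, per token.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem * batch

gib = kv_cache_bytes(layers=80, kv_heads=64, head_dim=128, seq_len=128_000) / 2**30
print(f"{gib:.1f} GiB")  # 312.5 GiB -- versus roughly 130 GiB of FP16 weights
```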

This is where something important happened quietly in the AI scaling narrative, and it hasn't fully registered in how most people talk about these systems. For years, the dominant frame was compute: how many FLOPs does training require, how many does inference require, how do we reduce them. That frame is not wrong, but it is increasingly incomplete. In a large number of real inference scenarios, compute is no longer the bottleneck; memory is. Specifically, the cost of moving the KV cache tensors around fast enough to keep the compute units fed.

This is a meaningful shift. Compute follows Moore's law reasonably well. Memory bandwidth is harder. Data movement, getting tensors from where they're stored to where they're needed at the speeds modern accelerators demand, is a problem with different scaling properties than arithmetic. A system optimized purely for FLOPs can still be slow because it spends most of its time waiting for data to arrive.

We used to think in terms of model size and compute. We should be thinking in terms of memory: how we move it, compress it, and decide what to keep.

Understanding this reorients how you think about the architecture. The question is not just how many parameters a model has or how many operations inference requires. It is: where does the KV cache live at each moment of generation, how much of it fits in fast memory, and how expensive is it to retrieve the parts that don't?


The Geometry of Compression

The natural response to a memory problem is compression. And indeed, significant research effort has gone into compressing KV caches: quantizing the tensors, applying transform coding, reducing precision from FP16 to INT8 or lower. These approaches work, but they face a mathematical constraint that is easy to miss and important to understand.

Attention operates on inner products. The softmax that produces attention weights depends on the dot products between queries and keys: q · k. If compression distorts the geometry of the key and value tensors, changing the relative distances and angles between vectors, then those dot products change, the softmax distribution changes, and everything downstream degrades. The outputs of the model shift in ways that may be subtle or severe depending on how aggressively you've compressed.

This is the same constraint that appears in classical dimensionality reduction problems. The Johnson-Lindenstrauss lemma, random projections, and related results all address the same core question: how do you reduce the size of a representation while preserving the pairwise relationships that matter? KV compression is that problem, applied to the geometry that attention depends on. You can compress, but you have to compress in ways that respect the inner product structure. Naive quantization that ignores this tends to fail in practice even when it looks acceptable on simple benchmarks.
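The difference is easy to demonstrate. The sketch below quantizes a key matrix to 8 bits two ways, one scale for the whole tensor versus one per channel, and measures how far the attention distribution moves. The outlier channel is synthetic, standing in for the outlier features often observed in real KV tensors.

```python
import numpy as np

rng = np.random.default_rng(0)
t, d = 32, 64
q = rng.normal(size=d)
K = rng.normal(size=(t, d))
K[:, 0] *= 20.0  # synthetic outlier channel

def quantize(x, scale):
    # Round to 8-bit levels, then map back to floats.
    return np.clip(np.round(x / scale), -127, 127) * scale

def attn(q, K):
    s = q @ K.T / np.sqrt(K.shape[1])
    e = np.exp(s - s.max())
    return e / e.sum()

# Naive: one scale for the whole tensor; the outlier channel sets it for everyone.
naive = quantize(K, np.abs(K).max() / 127)
# Geometry-aware: one scale per channel preserves the dot products far better.
per_channel = quantize(K, np.abs(K).max(axis=0) / 127)

p = attn(q, K)
err_naive = np.abs(attn(q, naive) - p).sum()
err_per_channel = np.abs(attn(q, per_channel) - p).sum()
assert err_per_channel < err_naive  # same bit depth, much less softmax distortion
```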

What compression must preserve

The relative geometry of key vectors; specifically, the dot products q·k that determine which tokens attend to which. Distort these and the attention distribution changes.

What this means in practice

Aggressive quantization requires careful calibration. The safe compression budget is smaller than it looks. Geometry-aware methods consistently outperform naive approaches at the same bit depth.


What Gets Remembered

Compression reduces the cost of storing the cache. But there is a more radical question lurking underneath it: do you need to store all of it at all?

In any sufficiently long sequence, not every token matters equally to what comes next. Some tokens receive consistently high attention weights from many subsequent tokens. They are load-bearing elements of the context, anchors that the model returns to repeatedly. Others receive weights so close to zero that they are, for practical purposes, invisible. Attending over them produces essentially the same result as not attending over them. They are terms in a sum that contribute nothing meaningful to the output.

Selective eviction, dropping low-attention tokens from the cache rather than storing them indefinitely, is the architectural response to this observation. It is mathematically equivalent to approximating the full attention distribution with a sparse subset: keeping the entries that matter and discarding the ones that don't. The approximation is generally very good because the entries being discarded were contributing almost nothing.
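A minimal version of this idea can be sketched as a fixed top-k rule standing in for the threshold policies described above; the attention history here is simulated rather than tracked from a real model.

```python
import numpy as np

def evict(K, V, attn_mass, budget):
    """Keep only the `budget` tokens with the highest accumulated
    attention mass; drop the rest from the cache."""
    keep = np.argsort(attn_mass)[-budget:]
    keep.sort()  # preserve the original token order
    return K[keep], V[keep], attn_mass[keep]

rng = np.random.default_rng(1)
t, d = 100, 16
K, V = rng.normal(size=(2, t, d))
mass = rng.exponential(size=t)  # simulated: a few anchor tokens dominate
K2, V2, m2 = evict(K, V, mass, budget=25)  # cache shrinks from 100 to 25 tokens
```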

What makes this genuinely interesting, rather than merely practical, is what it implies about the nature of context in these systems. If you can drop most of a long sequence and lose almost nothing, then the model is not using context the way a human reader uses a document: scanning it linearly, building a cumulative understanding of every sentence. It is using context more selectively, more dynamically, attending to a relatively small set of salient anchors while the rest fades into irrelevance.

The KV cache is a working memory, and like all working memories, what it forgets matters as much as what it keeps.

This opens the door to something more ambitious: learned eviction policies, where the model itself develops a sense of what is worth remembering. Rather than applying a fixed rule (drop tokens below an attention threshold), the system learns to anticipate which tokens will remain relevant as generation continues. That is a very different relationship to memory than the one most people imagine when they think about how language models process text.


A Subtle Numerical Point

One of the less discussed properties of KV caching is that it is not, in a strict sense, numerically neutral. Because modern inference operates in finite precision, typically FP16 or lower, the act of caching tensors and retrieving them later can produce slightly different results than recomputing the same values fresh at each step. Floating point arithmetic is not associative. The order and grouping of operations matters, and caching changes both.
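Both effects are easy to verify. Plain float addition is not associative, and summing the same tensor in a different accumulation order, as a cached path and a recomputed path may do, typically gives results that agree only approximately:

```python
import numpy as np

# Grouping changes the result of floating point addition.
assert (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)

# The same effect at tensor scale: accumulating in a different order,
# as cached and recomputed attention paths may, need not agree bitwise.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000).astype(np.float32)
fwd = np.sum(x)        # one accumulation order
rev = np.sum(x[::-1])  # reversed order
print(fwd == rev, float(fwd), float(rev))  # close, but not guaranteed identical
```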

At first this seems like a defect. A source of numerical noise introduced by an optimization. But the right way to think about it is as a reminder that these systems are dynamic numerical processes, not purely symbolic ones. They are not executing a fixed logical procedure that produces a deterministic output. They are performing approximate arithmetic at scale, and the approximations matter. In some cases, caching actually improves numerical stability by avoiding the accumulation of rounding errors that would occur across repeated recomputation.

The practical implication is modest: reproducibility at the bit level requires careful attention to whether and how caching is implemented. But the conceptual implication is worth sitting with. The outputs of these models are sensitive, in small ways, to the history of how their computation was organized. The past, in a very concrete numerical sense, shapes the present.


Memory as Architecture

Pulling back from the technical details, there is a framing shift that KV caching makes almost inevitable once you take it seriously. The standard way to describe a language model is in terms of its weights: the parameters learned during training that encode the model's knowledge about language, facts, and reasoning patterns. Those weights are static. They don't change during inference. They are, in a loose sense, the model's long-term memory.

The KV cache is something different. It holds the immediate context: the specific sequence the model is currently processing, the active thread of the conversation or document. It is populated at runtime and discarded when inference ends. It is, in the same loose sense, the model's working memory: what it is currently holding in mind as it generates.

This distinction between long-term knowledge encoded in weights and short-term context held in cache maps onto something recognizable from cognitive science. And once you have that framing, a lot of recent engineering trends in AI infrastructure begin to look less like isolated optimizations and more like different strategies for managing a memory system.

Hierarchical memory systems that tier the KV cache across GPU, CPU, and SSD storage are solving the same problem as the brain's distinction between fast working memory and slower long-term retrieval. Cache reuse across multiple requests, storing the representation of a long system prompt so it doesn't need to be recomputed for every user query, is a form of memoization that treats shared context as persistent memory. Multi-agent systems that share KV state across instances are, in effect, building a shared working memory distributed across multiple reasoning processes.
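The cache-reuse pattern is ordinary memoization. A toy sketch, with a string standing in for real KV tensors and `compute_kv` (an invented name) standing in for the expensive prefill pass:

```python
import hashlib

# A toy prefix cache: KV state for a shared system prompt is computed
# once and reused across requests, keyed by a hash of the prompt text.
_prefix_cache = {}
calls = {"n": 0}

def compute_kv(prompt):
    calls["n"] += 1  # stands in for the expensive prefill computation
    return f"<KV state for {len(prompt)} chars>"

def get_kv(prompt):
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _prefix_cache:
        _prefix_cache[key] = compute_kv(prompt)
    return _prefix_cache[key]

system = "You are a helpful assistant. " * 200  # long shared prefix
get_kv(system); get_kv(system); get_kv(system)
assert calls["n"] == 1  # prefill ran once; two requests reused the cache
```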

The Emerging Picture

Model weights encode what a system knows. The KV cache encodes what it is currently thinking about. The next wave of AI infrastructure improvements will largely be about the relationship between those two things: how to make working memory larger, cheaper, faster, and smarter about what it retains.

What this suggests is that the next significant advances in AI capability will not come only from larger models or better training data, the axes that have dominated the conversation for the past several years. They will come substantially from how memory is managed: how it is compressed, how it is moved, how it is prioritized, and ultimately how models are taught to reason about their own context rather than passively consuming it.

KV caching began as a simple answer to a quadratic scaling problem. It has become, without anyone quite announcing it, the foundation on which practical AI cognition is built. The weights are what a model knows. The cache is what it is thinking. And thinking, it turns out, is mostly a memory management problem.

The next frontier in AI is not more parameters. It is smarter memory.

Friday, April 24, 2026

The Last Justification — Why Leaders May Be Running Out of Reasons to Exist
Ideas & Futures
Philosophy · Economics · Artificial Intelligence

The Last Justification

Leaders have always claimed their authority from the management of scarcity. What happens when scarcity ends?

• • •

In 1930, John Maynard Keynes predicted that by 2030 the economic problem, the human struggle for subsistence, would be solved. He was writing at the depths of the Great Depression, which made the forecast either visionary or crazy depending on your viewpoint. Now in 2026, the people building artificial intelligence are implying essentially the same prediction on a faster timeline, and they have more reason to believe it than Keynes did.

That convergence deserves more attention than it is getting. Because if the economic problem is actually being solved, if AI delivers anything close to the conditions of genuine abundance its most serious architects believe it will, then something else dissolves alongside scarcity that nobody is talking about: the entire philosophical and practical foundation on which human leadership has rested for ten thousand years.

Leaders, in every civilization that has ever existed, have drawn their deepest authority from one source: someone has to manage the scarcity. Not divine right. Not democratic mandate. Not military necessity. Those are the decorative justifications, the ones that get written into constitutions and carved into monuments. Underneath all of them, holding the structure up, is the one argument that was always hardest to refute. Resources are finite. People compete for them. Without coordination and control, the result is violence. Therefore: leaders.

What happens when that argument stops being true? What if the future doesn't need leaders?

This might have seemed radically theoretical a generation ago. But the possibility of artificial general intelligence (AGI), systems that can perform most economically valuable cognitive work at near-zero marginal cost, threatens to dissolve the scarcity foundations on which every human hierarchy from the pharaohs to the Fortune 500 has been constructed. To understand how radical this might be, it helps to trace the argument from its roots.


Scarcity as the Engine of Hierarchy

Before agriculture, humans lived in small bands of twenty to one hundred fifty people. Leadership in these groups was fluid, situational, and constantly checked by group consensus. The best hunter led the hunt. The most experienced elder arbitrated disputes. When the situation changed, so did the leader. There was no permanent hierarchy because there was, relatively speaking, no permanent scarcity. Needs were simple. Nature provided enough; not abundantly, but enough.

The transformation came with the agricultural revolution around ten thousand years ago. Stored grain created something new in human experience: the possibility of accumulated surplus. And with surplus came its shadow twin, artificial scarcity: the condition in which resources exist in sufficient quantity but are distributed unequally through ownership and control.

The first leader who said 'this is mine' did not solve scarcity. He manufactured it.

Rousseau (paraphrased), Discourse on Inequality, 1755

Jean-Jacques Rousseau saw this with unusual clarity in 1755. His Discourse on Inequality argued that human beings in their natural state were essentially content. Needs were simple, and there was no systematic domination. The fall came when someone enclosed a piece of land and declared ownership, and others accepted the fiction. From that moment, leaders emerged not to solve natural scarcity but to protect and perpetuate artificial scarcity: to maintain the conditions that justified their own authority.

This is a darker reading of leadership than most civics textbooks provide. But it is a consistent one. Almost every function we associate with leaders (adjudicating property disputes, organizing armies, taxing and redistributing, managing information asymmetries) traces directly to the management of scarce resources or the conflict that scarcity generates.


A Prediction Nobody Remembers

1930 Keynes writes his forecast

In 1930, at the very depth of the Great Depression, the economist John Maynard Keynes published a short essay that almost nobody read at the time and almost nobody remembers now. It was called Economic Possibilities for our Grandchildren, and it contained one of the most audacious forecasts in the history of economic thought.

While the world economy was collapsing around him, Keynes told readers not to confuse a cyclical crisis for the long-run trend. Step back far enough, he argued, and the trajectory was unambiguous: productive capacity per person had been compounding for generations. Within one hundred years, which puts us at 2030, the "economic problem," meaning the struggle for subsistence, would be solved. Not alleviated. Solved. He predicted living standards in advanced economies would be four to eight times higher than in 1930. On the material side of his ledger, he was roughly correct.

But the radical heart of the essay was what he thought would follow from this. He predicted the working week would shrink to perhaps fifteen hours. Not because of idleness, but because fifteen hours of labor would genuinely be sufficient to sustain abundant lives. The rest would be leisure. And here his anxiety surfaces in the prose, because he was not celebratory about this. He worried that humans had been so thoroughly shaped by the discipline of scarcity that they would not know what to do with themselves in its absence.

The Keynes Problem

Keynes predicted material abundance with remarkable accuracy. What he underestimated was how powerfully existing hierarchies would capture and redirect that abundance and how deeply the psychology of scarcity was embedded in human identity. Productivity gains arrived on schedule. The fifteen-hour work week did not.

He described what he called "purposiveness", the orientation toward future goals over present experience that scarcity conditions produce, as a useful neurosis. Civilization needed it to pass through the scarcity phase. But it would need to be shed once that phase ended. Fail to shed it, and people would generate artificial purposes, artificial hierarchies, artificial scarcities simply to maintain the familiar structure of striving.

That failure to shed it is largely what happened. The abundance arrived, and was immediately captured by expanding desires, by concentrated ownership, by the managerial and political class that Keynes assumed would become unnecessary. Rather than the state withering away, institutions found new rationales for their authority. The leaders did not dissolve into leisure alongside everyone else. They found new scarcities to manage.


The Anarchist Tradition Had It Right

Keynes was not alone in this structural diagnosis, though he arrived at it from a very different direction. A generation earlier, the Russian anarchist Peter Kropotkin had argued in Mutual Aid that cooperation, not competition, was the dominant force in both nature and human history, and that in conditions of genuine shared abundance, hierarchical leadership would become not just unnecessary but actively parasitic.

Kropotkin's crucial move was the same as Rousseau's, stated more bluntly: scarcity is largely manufactured by the ownership structures that leaders enforce, not a natural condition requiring management. Remove the leaders, redistribute the abundance, and the entire philosophical justification for authority collapses. What remains is voluntary coordination. People organizing themselves around shared tasks without anyone claiming the permanent right to direct others.

Marx made the same argument through different machinery. His endpoint, full communism, is explicitly a post-scarcity condition. In that terminal state, he wrote, the state withers away because there is nothing left for it to manage. Leaders and governments are, in his framework, instruments for managing class conflict over scarce resources. Eliminate scarcity through technological development and collective ownership, and the entire apparatus becomes superfluous.

That no Marxist state ever reached this destination is a separate conversation. What matters here is the theoretical logic, which is precise: scarcity is the engine, hierarchy is the machine the engine drives, and without the engine the machine has no purpose.


What Remains Without Scarcity

Before accepting this conclusion too quickly, it is worth asking what would remain of leadership if scarcity were genuinely eliminated. The honest answer is: something, but less than we might suppose.

Pure coordination problems exist independent of scarcity. Even in conditions of perfect material abundance, groups face the challenge of synchronizing action: deciding which direction to go, when to act, how to sequence collective effort. These are not about managing competing claims on scarce resources. They are about the mathematics of group decision-making: any time more than two people have potentially different preferences, some mechanism is needed to aggregate those preferences into a single action.

But notice what that mechanism looks like in the absence of scarcity pressure. It looks less like a king, a CEO, or a general, and more like a protocol. A shared norm. A voting procedure. The coordination residue of post-scarcity society might not deserve the name "leadership" at all. It might simply be coordination, which is a very different thing.

Leadership as we know it may be a scarcity-adapted institution that has outlived the conditions that made it necessary.

The anthropologist David Graeber, in his final book The Dawn of Everything, documented societies that maintained deliberate mechanisms to prevent permanent leadership from emerging: seasonal leadership, ritual humiliation of chiefs, voluntary dispersal when hierarchy became too rigid. His argument was that hierarchy is a choice, not an inevitability, and that many human societies have at various points chosen otherwise. The story that we have always needed leaders is itself, he suggested, a story that leaders tell.


The Rupture That Changes Everything

This is the point at which a ninety-year-old essay about economic possibilities becomes urgently contemporary.

Keynes assumed abundance would arrive gradually, as industrial productivity compounded steadily across generations. What he did not anticipate was a discontinuous jump: a technology that does not incrementally improve labor productivity, but potentially replaces the need for labor across entire categories of cognitive work, almost simultaneously, within a compressed timeframe.

1930

Keynes publishes his forecast. Predicts the economic problem solved by 2030 via gradual compounding of industrial productivity.

2022–2024

Large language models demonstrate capability across most categories of cognitive work. The gradual path becomes a potential cliff edge.

2025–2026

AI lab leaders begin publishing explicit post-scarcity forecasts. Amodei's Machines of Loving Grace predicts compression of a century of scientific progress into a decade.

2030

Keynes' original target date. The destination may arrive approximately on schedule, but via a road he could not have imagined.

If you take seriously what people like Dario Amodei are actually arguing, not the hedged public statements but the internal conviction, the claim is that we are within years of systems capable of performing most economically valuable cognitive work at near-zero marginal cost. Amodei has written explicitly about compressing a century of scientific and economic progress into roughly a decade. Sam Altman has gestured at civilizational transformation on similar timescales.

If that trajectory is even directionally correct, Keynes was not wrong. He was describing a destination that turns out to be reachable by a faster, stranger road than he imagined. His 2030 prediction may prove accurate almost to the year, while being entirely wrong about the mechanism.

This is not entirely theoretical. The early tremors of this structural shift are already visible inside organizations operating today. Middle management layers, whose primary function has always been information relay, coordination, and resource allocation across hierarchical tiers, are being cut across industries at a pace that would have seemed implausible a decade ago. Teams are shrinking. Reporting structures are flattening. The organizational pyramid that defined the modern corporation for a century is losing floors.

More telling is what is happening to specialization. For most of human history, specialized knowledge was a primary source of individual power and, by extension, a justification for hierarchy. The lawyer, the engineer, the financial analyst, the data scientist occupied positions of authority partly because they possessed capabilities others simply did not have. AI is eroding that boundary rapidly. Skills that once required years of training to acquire can now be approximated in minutes by someone with no formal background and the right tool. When specialization stops being scarce, the hierarchies built around controlling access to specialized knowledge lose their rationale alongside it.

What is happening inside companies is a preview of a much larger structural question. Organizations are not shedding management layers because they have decided hierarchy is philosophically unjustified. They are shedding them because those layers are no longer functionally necessary. The economic logic that created them has changed. That same logic, operating at civilizational scale, is what Keynes was pointing at and what the most serious AI forecasters believe is now accelerating toward its conclusion.


The Question of Capture

But this is where the argument reaches its sharpest and most uncomfortable edge. Abundance delivered by artificial intelligence does not automatically mean distributed abundance. The critical question, and the one that Rousseau would recognize immediately, is who owns and controls the systems that produce it.

If a small number of companies or individuals control the infrastructure that generates post-scarcity conditions, then scarcity-based hierarchy has not been eliminated. It has been concentrated to a degree without historical precedent. Rather than a million leaders managing localized scarcity, you have a handful of individuals managing the systems that produce everything. The entire apparatus of leadership, its justification, its function, its claim on obedience, collapses upward into a single point of control.

This is not a hypothetical concern. The same dynamic that turned agricultural surplus into feudalism, and industrial surplus into plutocracy, is already visible in the ownership structure of the systems being built. The logic is identical. Scarcity is not eliminated; it is repackaged. Access to abundance becomes the new scarce resource, and the leaders of that access inherit all the authority that scarcity has always generated.

The Recurring Pattern

Agricultural surplus → feudal hierarchy. Industrial surplus → plutocratic hierarchy. AI surplus → ? The technology changes. The capture dynamic has proven remarkably stable across all three transitions. The question is whether this one is different enough to break it.

Altman himself has implicitly acknowledged this with his advocacy for Universal Basic Income and OpenAI's original nonprofit structure: a recognition that without deliberate redistribution mechanisms, the abundance his systems might produce would simply reconcentrate at the top. The leaders would not wither away. They would become more powerful, not less, precisely because they control the systems that have made all other scarcity manageable.


An Ending Without a Conclusion

History has tested the scarcity-leadership connection repeatedly without ever fully breaking it. Every technology that promised to dissolve hierarchy (the printing press, the steam engine, electrification, the internet) instead generated new hierarchies organized around control of the technology itself. The same dynamic may play out with AI.

But there is something qualitatively different about a technology that can replace cognitive labor across essentially all domains. Previous technologies replaced physical labor in specific sectors while creating new categories of cognitive work. The demand for human participation in the economy shifted but did not disappear. If AI eliminates the demand for cognitive labor as comprehensively as previous technologies eliminated the demand for physical labor, without creating new categories of work to absorb the displaced, then the economic basis for most leadership simply ceases to exist.

Keynes thought this moment would come gradually, giving human psychology and institutions time to adapt. He was wrong about the pace. The institutions built on scarcity logic are already struggling to adapt to a world that is merely becoming more automated, let alone one that reaches genuine post-scarcity conditions.

The question is whether we get a post-scarcity world at all as the traditional need for leadership fades. Every previous surplus in human history has been captured, by feudal lords, by industrialists, by financiers, and redirected. The realistic risk is not the dissolution of hierarchy but its opposite: a small group of people controlling systems that can produce everything, governing a world that finally has no material reason to accept being governed.

Keynes was right about the destination. The only question is whether a handful of people will control the road to get there.

Sunday, May 11, 2025

Polanyi’s Paradox: Why We Know More Than We Can Tell

Polanyi’s Paradox is the idea that much of what we know cannot be clearly expressed in words or formulas, and it is something I have thought about a lot, especially in the context of engineering and AI. Despite what some might assume, software engineering is not just about generating code. It involves layers of intuition, architectural decisions, project context, unwritten standards, and rules of thumb that experienced engineers learn over time. This is why, when engineers struggle to explain exactly how tools like Cursor or Windsurf fit into their workflow, it is not because the tools aren’t valuable; they are. Nor is it because engineers lack communication skills. It is because these tools support only part of what software engineers do. Much of the rest relies on experience and intuition that is hard to articulate. In this post, I want to explore how Polanyi’s Paradox connects to software engineering, what it tells us about the current state of AI, and what it might take to build AI systems that truly grasp the tacit knowledge humans use every day.

But first, let's look back at the historical origins of this idea.

Historical Perspective

Michael Polanyi, a Hungarian-British polymath (1891–1976), introduced what later came to be known as Polanyi’s Paradox in 1966. Polanyi started his career as a distinguished physical chemist before turning to philosophy, bringing a scientist’s insight into how knowledge actually works in practice. In his book The Tacit Dimension, he famously wrote “we can know more than we can tell”, referring to the tacit knowledge that underlies many human skills. For example, Polanyi pointed out that we recognize familiar faces effortlessly without being able to list the exact features or rules that distinguish each face; a nod to Gestalt perception and intuition.

He illustrated the gap between knowledge and articulation with everyday instances: “The skill of a driver cannot be replaced by a thorough schooling in the theory of the motorcar,” he noted, just as one’s intimate know-how of using one’s body is entirely different from a textbook understanding of physiology. In short, there is always a residue of know-how that we carry in our minds and muscles that we cannot fully put into words.

In a time when many thinkers believed all knowledge could be neatly defined and broken into rules, Michael Polanyi pushed back with a simple but powerful idea: we know more than we can tell. He argued that intuition, experience, and personal judgment are essential parts of how we understand the world. For Polanyi, knowing isn’t just about facts and logic; it’s also about the unspoken, hard-to-explain feel we develop through doing and observing.

Relevance to Software Engineering

Although formulated 60 years ago, Polanyi’s Paradox is alive and well in modern software engineering. Despite the field’s basis in logic and code, much of what good software developers and teams do resides in tacit knowledge rather than explicit rules. In fact, studies suggest that only the “tip of the iceberg” of knowledge in a software organization is explicit and documented, perhaps ~20–30%, while the vast majority (70–80%) is unwritten, experience-based tacit knowledge.

In software teams, explicit knowledge (documents, code repositories, formal processes) is just the visible tip. The bulk of “what we know”, such as the skills, experience, intuition, team culture lies beneath the surface, undocumented but crucial. This tacit layer includes things like unwritten best practices, intuitive design sense, and the many gotchas one learns only through hard-earned experience.

This means that a software team’s most critical understanding often lives in people’s heads and habits rather than in manuals or comment blocks. Seasoned engineers accumulate a deep reservoir of intuition about architecture, code quality, and problem-solving approaches that is hard to articulate formally. They just "know" certain things which is a direct reflection of Polanyi’s Paradox in the tech world.

Polanyi’s Paradox in Software Engineering

In day-to-day software development, Polanyi’s Paradox shows up in numerous ways. For example:

  • User Interface Design: Good UI/UX designers often rely on an instinctive feel for what is intuitive to users. There is no complete rulebook for “easy to use.” Much of it comes from empathizing with users and leveraging subtle design sensibilities developed over time. A designer might not be able to fully explain why one layout “just works” better than another, because they are drawing on tacit knowledge of human behavior and aesthetics learned through many projects and feedback cycles. This know-how stands in contrast to explicit guidelines (like style guides or usability heuristics). Those help, but the "artistry" goes beyond what’s written down.

  • Debugging: Tracking down a complex bug is as much an art as a science. A veteran developer can often zero in on the likely cause of a problem quickly, guided by a “gut feeling” from past debugging experiences. This skill of knowing where to look in thousands of lines of code or which log hint to pursue is typically acquired through experience and not easily codified or taught. Two different engineers might solve the same bug via very different thought processes, neither of which is fully documented anywhere. Debugging knowledge lives in their heads as pattern recognition: “Ah, I’ve seen something like this before and it might be due to X.” Such intuitions are hard to write into a step-by-step troubleshooting guide.

  • Knowledge Transfer in Teams: When a new developer joins a software team, they often go through an onboarding period that involves shadowing others, pair programming, and code review. These practices recognize that a lot of the team’s knowledge is “tribal knowledge”, accumulated wisdom about the codebase and conventions that isn’t in the official docs. From knowing the historical reasons why module Y was built a certain way, to understanding which team member to ask about database quirks, these are things newcomers learn through social interaction.

    Much of what senior engineers pass on to juniors is tacit: it’s storytelling, mentorship, and shared experience. If a key developer suddenly leaves, they take a trove of tacit knowledge with them, and it’s often non-trivial to fill that gap. Attempts to capture everything in exhaustive documentation often fall short, because you can’t foresee or codify every relevant detail. This is in spite of the best efforts of Program Managers and tools like Jira, Confluence, etc. Indeed, trying to convert all tacit knowledge into explicit form is “fundamentally flawed due to the very nature of tacit knowledge, which is inherently personal, context-dependent, and difficult to articulate.”

These examples show that software engineering is not just about formal specifications and algorithms; it’s a human craft. The most effective development teams leverage tacit understanding in architecture decisions, code readability, and anticipating user needs. And when this tacit element is missing, when, for example, a team only follows rigid checklists without any personal intuition, the results are often mediocre. Polanyi’s Paradox explains why certain programming expertise can’t simply be “transferred” by reading a book.

As a consequence, strategies like code reviews, pair programming, and apprenticeship-style learning are crucial in tech because they help share the unspoken wisdom. Conversely, over-reliance on documentation has limits: no matter how many pages of design docs you write, there will always be nuance that new engineers must pick up by working with others. In short, a great deal of “what developers do” cannot be fully captured in code comments or process manuals. They know more than they can tell.

The Future of Software Engineering

So all of this should make software engineers feel pretty good about their current job security, because their job is more than just code generation, and anyone who has used AI tools knows that even though these tools can speed up productivity tremendously, they fall woefully short in many areas, primarily because of this tacit knowledge engineers have built up. This is despite some CEOs announcing that they are replacing engineers with AI. These announcements are generally made by people who have never built and maintained a large code product, or, if they have, it's been decades since they did so. Furthermore, announcements claiming AI is writing X% of their code are highly suspect because:

  1. It's unclear what they are counting in the X%. How much is agents writing big chunks of code from prompts, and how much is just code completion? For example, when Google Docs suggests the next word or phrase as you type, would you say AI wrote that fraction of the doc just because it's doing autocomplete?
  2. They fail to recognize the tacit knowledge that goes into code architecture.

However, this is only the current state for software engineers. Thinking that it will also be the future and won't change is just engaging in "copium", which I see a lot of on LinkedIn. For example, take a look at the replies to the CEO of Zapier on LinkedIn here, where he states that AI won't replace jobs. There's a lot of wishful thinking in the comments that things aren't going to change that much. But people are not internalizing the exponential change that is happening in AI.

Additionally, the tacit-knowledge problem that Polanyi's Paradox poses for software engineering isn't a formidable one for AI agents to solve, and it doesn't even require a huge technological breakthrough. So how do you solve the fact that an AI agent in a program like Cursor or Claude Desktop doesn't have all of an engineer's tacit knowledge? You give the AI access to all that knowledge: Slack conversations, entire organization repos, Microsoft Teams meetings, project management notes on customers, stand-up meeting recordings, Jira, Confluence, whatever the organization is using to capture that history. Do that, and you have an AI agent, or agents, with more organizational and project knowledge than any engineer who works there.

The requirements are larger context windows, agents that can operate effectively over that extended context so they can remember across sessions, and access to all of that data. Infinite-memory agents are within reach now. There are many possible solutions, including MemGPT from Letta, along with the frontier AI companies extending their context windows. The biggest obstacle will be helping organizations streamline all of their data and communications so they can be made available to AI agents. However, organizations are going to be very motivated to do this, because those that make their data and communications available will gain a huge competitive advantage from accelerated software delivery and better organizational decision making.
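To make this concrete, here is a minimal sketch of the kind of session-spanning memory described above, assuming a simple keyword-overlap retriever. The class name `OrgMemory` and its methods are hypothetical illustrations, not any real library's API; production systems like MemGPT-style architectures use embeddings and tiered storage instead of word overlap.

```python
import json
import os


class OrgMemory:
    """Persists snippets from Slack threads, Jira tickets, meeting notes,
    etc., and retrieves the most relevant ones to inject into an agent's
    context at the start of a new session."""

    def __init__(self, path="org_memory.json"):
        self.path = path
        self.snippets = []
        if os.path.exists(path):  # reload memory from previous sessions
            with open(path) as f:
                self.snippets = json.load(f)

    def remember(self, source, text):
        """Store a snippet of organizational knowledge and persist to disk."""
        self.snippets.append({"source": source, "text": text})
        with open(self.path, "w") as f:
            json.dump(self.snippets, f)

    def recall(self, query, k=3):
        """Return the k snippets sharing the most words with the query."""
        q = set(query.lower().split())
        scored = sorted(
            self.snippets,
            key=lambda s: len(q & set(s["text"].lower().split())),
            reverse=True,
        )
        return scored[:k]


mem = OrgMemory()
mem.remember("slack#backend",
             "Module Y uses the legacy auth flow because of the 2021 migration")
mem.remember("jira/PROJ-42",
             "Database quirks: the orders table is sharded by region")

hits = mem.recall("why does module Y use the legacy auth flow")
print(hits[0]["source"])  # → slack#backend
```

The point of the sketch is the shape of the loop, not the retriever: tribal knowledge gets captured once, persisted, and surfaced whenever a future task touches the same topic, which is exactly the gap a newly onboarded engineer (or agent) normally fills through months of shadowing.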

So it is not a big leap to build agents with this extended memory that can maintain intent across tasks. The biggest impediment will be organizations adapting their data and internal processes. Does this mean that all software engineering jobs will be eliminated? No. Engineers who can orchestrate, manage, secure, and generally work with AI agents will be in demand. Those who use AI as part of their current process, learn as much as they can about taking advantage of it, and look to innovate around AI engineering will be fine.

Unfortunately, many engineers bring nothing more to the table than the ability to pull a story off a Jira backlog written by a project manager and complete it, without thinking in terms of innovation or design at all. They work in a bureaucratic mindset of task completion, generating just the code for that task, rather than a mindset of generating ideas and innovating. This type of engineer, who doesn't innovate, and specifically doesn't innovate using AI, will be gone and will not be able to find a job like that again in engineering. In other words:

If you are an engineer who is used to thinking like a robot, you will be replaced with a robot.

But I don't think that most engineers can be blamed for thinking of their job in terms of being automatons writing code. Many engineers ever since getting out of school have been conditioned through endless cycles of sprints, planning meetings, story points, and retrospectives to think this way. Also, in many organizations, there are layers of project managers and customer success people between them and the end users, which reinforces the idea that their end goal is code generation and not building a product for a user. In addition, in larger organizations, they may only work on a small functionality of the end product. But if they are going to survive, they are going to need to bring their human-ness to the job, their uniqueness based on their personal experiences and innovate and be creative.

Relevance to Building Better AI

So given the Polanyi's Paradox on engineering, what have been the implications for AI?

Polanyi’s Paradox has long been recognized as a core challenge in artificial intelligence. Let's go back to the beginning. In the early decades of AI (mid-20th century through 1980s), programmers found that many tasks humans find trivial were extremely difficult to specify to a computer, precisely because of tacit knowledge. As Polanyi himself noted, tasks like driving a car in traffic or recognizing a face rely on a vast array of subtle cues and intuitions that we execute without conscious thought. We can perform them, but we can’t easily enumerate how we do it. This posed a problem: how do you write a program to do something you can’t fully explain to yourself? Early AI could handle problems that had clear rules (like solving equations or playing chess to some extent), but it struggled with open-ended, real-world tasks. Polanyi’s paradox was identified as a major obstacle for AI and automation. Unless we have a complete, explicit recipe for a task, getting a machine to do it is extremely difficult.

A classic example is commonsense reasoning: even a toddler understands that a wobbling stack of blocks might fall, or that a snowman in the road isn’t a threat, but such “common sense” eluded AI programs because nobody could fully encode all the needed if-then rules. As one economist summarized, the tasks easiest to automate are those that “follow explicit, codifiable procedures,” whereas the ones requiring “flexibility, judgment and common sense, skills that we understand only tacitly, have proved most vexing to automate.” Indeed, for decades computers remained “less sophisticated than preschool-age children” at tasks like understanding natural language or navigating an unpredictable environment. This is Polanyi’s Paradox writ large in technology: machines hit a wall when facing the tacit dimension of human know-how.

The past decade, however, has seen much progress in AI, largely by finding ways around Polanyi’s Paradox rather than directly solving it. Since we can’t easily tell the machine what we know implicitly, we let the machine learn from experience just as humans do. In other words, “contemporary AI seeks to overcome Polanyi’s paradox by building machines that learn from human examples, thus inferring the rules that we tacitly apply but do not explicitly understand.” Instead of programmers hand-coding every rule, we feed AI systems with massive amounts of data (images, recordings, text, demonstrations) and let them figure out the patterns.

This is the approach of machine learning, and especially deep learning. For instance, rather than attempting to enumerate all the visual features that distinguish a pedestrian from a shadow on the road, engineers train a self-driving car’s vision system on millions of driving images and let it learn the concept of “pedestrian” by example. The AI essentially internalizes a web of correlations and features, a form of tacit knowledge, from the data.
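The idea of inferring a rule from examples rather than being told it can be shown in a few lines. The toy below is my own illustration, not any particular system's training code: a perceptron is shown labeled points and learns the tacit rule "+1 if y > x, else -1" purely from mistakes, without the rule ever being written down for it.

```python
def train_perceptron(examples, epochs=20, lr=0.1):
    """examples: list of ((x, y), label) pairs with label in {-1, +1}.
    Learns weights from labeled data alone; the rule is never hand-coded."""
    w0, w1, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x, y), label in examples:
            pred = 1 if w0 * x + w1 * y + b > 0 else -1
            if pred != label:  # mistake-driven update toward the example
                w0 += lr * label * x
                w1 += lr * label * y
                b += lr * label
    return w0, w1, b


# Labeled instances of the unstated rule "+1 if y > x, else -1"
data = [((0, 1), 1), ((1, 3), 1), ((2, 5), 1),
        ((1, 0), -1), ((3, 1), -1), ((5, 2), -1)]
w0, w1, b = train_perceptron(data)


def classify(x, y):
    return 1 if w0 * x + w1 * y + b > 0 else -1


print(classify(0, 2), classify(4, 1))  # → 1 -1
```

The trained weights encode the pattern implicitly, much as a deep network's millions of parameters encode "pedestrian" without anyone enumerating its features; the difference from a self-driving car's vision stack is scale, not principle.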

AI can now capture tacit patterns by finding regularities in unstructured inputs. A striking success of this approach was in the game of Go: expert Go players often say they rely on intuition to judge a good move (they “know” a move is strong without being able to verbalize why). Early AI couldn’t crack Go by brute-force logic alone. But a deep learning system (AlphaGo) was able to absorb millions of human moves and even play games against itself, eventually acquiring an almost intuitive style of play that surprised human experts. In effect, AlphaGo learned the tacit principles of Go that even top players couldn’t articulate, and it went on to defeat the world champion.

Similar stories abound: large language models (LLMs) were trained on billions of sentences of human text and as a result, they learned the hard-to-pin-down rules of language, style, and meaning without anyone programming those rules in directly. These models can answer questions, write essays, or carry a conversation in a very human-like way, not because they were explicitly told how to do it, but because they statistically absorbed how humans communicate. In other words, they captured a slice of our tacit linguistic knowledge.

Interestingly, many of today’s AI successes still depend on how well we can communicate our goals to the system: the familiar, if unfortunately named, practice of prompt engineering. Because machines lack the shared human context we often take for granted, we’re required to make our implicit understanding more explicit than we typically would with another person. When working with a coworker, you might simply say, “You know what I mean,” and count on shared experience to fill in the blanks. With AI, that luxury doesn’t exist. Instead, we must carefully spell out our expectations, constraints, and context. Getting the AI to produce a useful output often involves trial and error, tweaking inputs, and drawing on our own unspoken know-how about what might steer it in the right direction. This interaction process is, in itself, a reflection of our tacit knowledge; how we choose to phrase a task, what examples we give, and which assumptions we clarify all stem from our intuition about how the system behaves. In this way, human users must surface their internalized expertise just to bridge the gap between what we understand intuitively and what the machine needs to be told explicitly.

The human is leveraging personal know-how about the AI’s behavior to coax the desired result. For example, when trying to generate a specialized algorithm or solve a domain-specific problem, a developer might instinctively shape their request to match what they know the AI handles well; perhaps by framing the problem in terms that are common in training data or avoiding ambiguous phrasing. They may not consciously think about it, but they draw on a kind of internal playbook developed through past interactions: what tends to work, what confuses the model, and how to steer it toward quality output. This back-and-forth of adjusting inputs, interpreting results, and refining expectations reflects a nuanced, intuitive understanding that the developer might not even fully articulate. It’s a collaborative process that takes some experience working with the AI, where the machine brings breadth of data and pattern recognition, while the human brings situational judgment and contextual insight. In this way, navigating AI systems becomes another expression of Polanyi’s Paradox: we rely on what we know but can’t entirely explain to get useful results from a machine that requires us to make that knowledge as explicit as possible.

Polanyi’s Paradox as a Cautionary Tale

Finally, Polanyi’s Paradox and the quest to encode (or exploit) tacit knowledge in AI bring to mind several cautionary tales and analogies. One is the legend of Frankenstein. In Mary Shelley’s novel, a scientist creates a sentient being but cannot control it or imbue it with the values and understanding needed to integrate into society. The creature, “detached from human values, brought chaos rather than order,” as one commentary puts it. This resonates with modern fears that AI systems, if created without sufficient foresight, might behave in ways misaligned with human intentions; not out of malice, but because we never explicitly told them the full story of right, wrong, and context. Polanyi’s Paradox suggests we might not even know how to fully tell them these things, since so much of our common sense and ethics is tacit.

For another example, if you have watched enough Disney, you have seen Mickey Mouse as the Sorcerer’s Apprentice in Fantasia: a young apprentice who animates a broom to do his cleaning for him. Lacking the master’s wisdom, he cannot stop the broom as it relentlessly carries out his literal instructions, eventually flooding the place. This is a perfect illustration of the failure of specification. The apprentice knew just enough magic to issue a command, but not enough to specify boundaries or handle contingencies. In AI terms, he had no “off-switch” or nuance in his instructions. The story underscores the risk of viewing powerful AI tools as magical solutions without understanding their limits. If we deploy AI with the mindset of that apprentice and expect it to perfectly do our bidding when we ourselves haven’t nailed down exactly what we want, we shouldn’t be surprised when we get unintended results.

Already, we’ve seen real examples of this: an AI instructed to maximize user engagement on a platform might learn that spreading sensationalist content achieves the goal, to society’s detriment; or a chatbot given free rein to converse might start generating troubling outputs if the designers didn’t anticipate certain prompts. As one observer quipped, if you’re “not careful, like a dreaming Mickey, you too will suffer the consequences of unintended hallucinations” from AI. In plainer terms, the irony of Polanyi’s Paradox in AI is that we are building machines far more literal-minded than ourselves to handle tasks that we ourselves only understand implicitly.

Conclusion

Polanyi’s Paradox remains a guiding concept in understanding the limits of formal knowledge in both human and machine domains. Historically, it reminds us that even at the height of scientific rationalism, “knowing” something is often inexpressible. In software engineering, it encourages the recognition that programming is not purely formal, that expertise can’t be entirely captured in textbooks, and that engineers must adapt. And in AI, Polanyi’s Paradox is both a challenge and a driving force: a challenge because AI must grapple with tasks that we as humans struggle to fully describe, and a driving force behind the rise of machine learning to let computers learn the unspeakable parts of human skill.

As we integrate AI more into our lives, Polanyi’s insight also serves as a warning: we should be mindful of what we haven’t been able to tell our machines. Balancing explicit knowledge with tacit understanding, and combining human intuition with machine computation, will be an important element of building stronger AI. After all, paraphrasing Polanyi, we (and our AI) know more than we can tell.

Monday, March 31, 2025

The Octopus, AI, and the Alien Mind: Rethinking Intelligence

I’ve long been fascinated by octopuses (and yes it's octopuses and not octopi). Their intelligence, forms, and alien-like behaviors have greatly interested me. Over the years, I’ve read many research papers on cephalopod cognition, behavior, and neurobiology and have always been amazed at how octopuses seem to defy our conventional definitions of intelligence, and that science has continued to learn much more about octopus behavior in recent years.

Giant Pacific Octopus

This scientific foundation has been enriched by fiction and nonfiction alike. Adrian Tchaikovsky’s Children of Ruin, the second book in his Children of Time series, offered a speculative exploration of octopus-like alien intelligence evolved into a spacefaring species. Ray Nayler’s The Mountain in the Sea, another recent book I really enjoyed, imagined a near-future story around the discovery of communicative octopuses and what it means to truly understand another mind. In nonfiction, Sy Montgomery, who has written extensively on octopuses, gives a personal perspective on octopus behavior and relationships in The Soul of an Octopus. And the documentary series Secrets of the Octopus further explored octopus intelligence through great visuals and firsthand accounts from researchers. What all of these works share is a sense of awe for the octopus as both a mirror and a counterpoint to our own minds and preconceptions.

This blog post explores how octopus intelligence challenges our narrow, human-centric understanding of cognition, and how embracing different alternative models of mind might open bold new directions in artificial intelligence.

Much of the current discourse around artificial intelligence (AI), especially artificial general intelligence (AGI) and its potential evolution into artificial superintelligence (ASI), is deeply rooted in comparisons to the human brain. This anthropocentric framing shapes how many prominent figures in the AI field conceptualize and pursue intelligence in machines. Ivan Soltesz, a neuroscientist at Stanford University, suggests that AI could eventually perform all human tasks, even those requiring subtle forms of reasoning like dark humor or one-shot learning. He envisions future AI systems that might even choose to appear childlike by “acting silly,” implying that human-like behavior remains a gold standard for intelligent systems. Similarly, Dr. Suin Yi at Texas A&M University has developed a “Super-Turing AI” that mimics the brain’s energy-efficient data migration processes, further reinforcing the idea that human neurobiology provides the blueprint for next-generation AI.[1]

Other researchers go even further. Guang-Bin Huang and colleagues propose building AI “twins” that replicate the brain’s cellular-level architecture, arguing that such replication could push AI beyond human intelligence.[2] Bo Yu and co-authors echo this sentiment in their call for AGI systems built directly from cortical region functionalities, essentially copying the operational mechanisms of the human brain into machine agents.[3] Meanwhile, analysts like Eitan Michael Azoff stress the importance of decoding how the human brain processes sensory input and cognitive tasks, contending that this is the key to building superior AI systems.[4] Underlying all these efforts is a persistent belief: that human cognition is not only a useful reference point, but perhaps the only viable model for creating truly intelligent machines. And to take it even further, some of the biggest critics of current AI advances criticize current approaches for not learning the way humans learn. This is one of the central points of Gary Marcus, a prominent LLM critic (who, by the way, I disagree with on most of his thoughts around AI): he argues that LLMs' immense need for data and their transformer neural network architecture show that this is not how children learn.

However, this post challenges that assumption. AI doesn't need to mimic how children learn. AI doesn't need to emulate human brains to advance. While it’s understandable that AI development has historically drawn inspiration from the most intelligent system we know, our own minds, this narrow focus may ultimately limit our imagination and the potential of the technologies we build.

In this context, the purpose of this post is to advocate for a broader, more inclusive definition of intelligence; one that moves beyond the human brain as the central paradigm. By looking to other models of cognition, particularly those found in non-human animals like the octopus, we can begin to break free from anthropocentric thinking. Octopuses demonstrate complex problem-solving, sensory integration, and even self-awareness with neural architectures completely unlike our own. They serve as a powerful counterpoint to the idea that intelligence must look and act like us. If we are serious about developing truly advanced AI, or even preparing for the possibility of encountering alien minds, it’s time we stopped treating human cognition as the default blueprint. The future of intelligence, both artificial and otherwise, may lie not in copying ourselves, but in embracing the radical diversity of minds that evolution (and possibly the universe) has to offer.

But before we get to a discussion of thinking more broadly about intelligence that is not human-centric, we need to look at the octopus and some of its amazing abilities.

The Mysterious Minds of Octopuses: Cognition and Consciousness

Octopuses are problem solvers. For example, an octopus can methodically unscrew the lid of a jar to retrieve the crab inside. With eight limber arms covered in sensitive suckers, it solves a puzzle that would stump many simpler creatures. It will change color in a flush of reds and browns, in bursts of expression, as if contemplating its next move. So one has to wonder: what kind of mind lurks behind those alien, horizontal pupils?

Octopuses are cephalopods. They are a class of mollusks that also includes squid and cuttlefish, and they have some of the most complex brains in the invertebrate world. The common octopus has around 500 million neurons, a count comparable to that of many small mammals like rabbits or rats. What’s pretty amazing is how those neurons are distributed. Unlike a human, whose neurons are mostly packed into one big brain, an octopus carries over half its neurons in its arms, in clusters of nerve cells called ganglia.[5] In effect, each arm has a “mini-brain” capable of independent sensing and control. If an octopus’s arm is severed (in an unfortunate encounter with a predator), the arm can still grab and react for a while on its own, showing complex reflexes without input from the central brain.[6] This decentralized nervous system means the octopus doesn’t have full top-down control of every tentacle movement in the way we control our limbs. Instead, its mind is spread throughout its body.

Such a bizarre setup evolved on a very different path from our own. The last common ancestor of humans and octopuses was likely a primitive worm-like creature over 500 million years ago. All the “smart” animals we’re used to, such as primates, birds, dolphins are our distant cousins with centralized brains, but the octopus is an entirely separate experiment in evolving intelligence.[7] Its evolutionary journey produced capabilities that are incredibly unique. For example, octopuses and their cephalopod relatives can perform amazing feats of camouflage and signaling. A common cuttlefish can flash rapid skin pattern changes to blend into a chessboard of coral and sand, even though it is likely colorblind, indicating sophisticated visual processing and motor control.[5] Octopuses have been observed using tools. The veined octopus famously gathers coconut shell halves and carries them to use later as a shelter, effectively assembling a portable armor when needed. They solve mazes and navigate complex environments in lab experiments, showing both short-term and long-term memory capabilities similar to those of trained mammals.[6]

Crucially, octopuses also demonstrate learning and problem-solving that hint at cognitive complexity. In laboratory tests, octopuses (and cuttlefish) can learn to associate visual symbols with rewards. For instance, figuring out which shape on a screen predicts food. They’re even capable of the cephalopod equivalent of the famous “marshmallow test” for self-control. In one 2021 study, cuttlefish were given a choice between a morsel of crab meat immediately or a tastier live shrimp if they waited a bit longer, and many cuttlefish opted to wait for the better snack, exhibiting self-control and delayed gratification.[5] Such behavioral experiments suggest that these invertebrates can flexibly adapt their behavior and rein in impulses, abilities once thought to be the domain of large-brained vertebrates.

All these findings force us to ask: do octopuses have something akin to consciousness or subjective experience? While it’s hard to know exactly what it’s like to be an octopus, the evidence of sophisticated learning and neural complexity has been convincing enough that neuroscientists now take octopus consciousness seriously. In 2012, a group of prominent scientists signed the Cambridge Declaration on Consciousness, stating that humans are not unique in possessing the neurological substrates for consciousness. Non-human animals, including birds and octopuses, also possess these.[6, 10] In 2024, over 500 researchers signed an even stronger declaration supporting the likelihood of consciousness in mammals and birds and acknowledging the possibility in creatures like cephalopods. In everyday terms, an octopus can get bored, show preferences, solve novel problems, and perhaps experience something of the world; all with a brain architecture utterly unlike our own. It’s no wonder some animal welfare laws (for example, in the EU and parts of the US) have begun to include octopuses, recognizing that an animal this smart and behaviorally complex deserves ethical consideration.[5]

Beyond Anthropocentric Intelligence: Lessons from an Alien-like Being

Our understanding of animal intelligence has long been colored by anthropocentric bias; the tendency to measure other creatures by the yardstick of human-like abilities. For decades, researchers would ask whether animals can solve puzzles the way a human would, use language, or recognize themselves in mirrors. Abilities that didn’t resemble our own were often ignored or underestimated. Octopus intelligence throws a wrench into this approach. These animals excel at behaviors we struggle to even imagine: their entire body can become a sensing, thinking extension of the mind; they communicate by changing skin color and texture; they don’t form social groups or build cities, yet they exhibit curiosity and individuality. As one researcher put it, “Intelligence is fiendishly hard to define and measure, even in humans. The challenge grows exponentially in studying animals with sensory, motivational and problem-solving skills that differ profoundly from ours.”[5] To truly appreciate octopus cognition, we must broaden our definition of intelligence beyond tool use, verbal reasoning, and social learning, traits we prioritized largely because we happen to be good at them.

Octopuses teach us that multiple forms of intelligence exist, shaped by different bodies and environments. An octopus doesn’t plan a hunt with abstract maps or language, but its deft execution of a prey ambush (coordinating eight arms to herd fish into a corner, for instance) is a kind of tactical genius. In Australian reefs, biologists have observed octopuses engaging in collaborative hunting alongside fish: a reef octopus will lead the hunt, flushing prey out of crevices, while groupers or wrasses snap up the fleeing target, and the partners use signals (like arm movements or changes in posture) to coordinate their actions.[5] This cross-species teamwork suggests a level of problem-solving and communication we wouldn’t expect from a solitary mollusk. It challenges the notion that complex cooperation requires a primate-style social brain.

Philosopher Peter Godfrey-Smith has famously described the octopus as “the closest we will come to meeting an intelligent alien” on Earth. In fact, he notes that if bats (with their sonar and upside-down life) are Nagel’s example of an alien sensory world, octopuses are even more foreign; a creature with a decentralized mind, no rigid skeleton, and a shape-shifting body.[10] What is it like to be an octopus? It’s a question that stretches our imagination. The octopus confronts us with an intelligence that evolved in a fundamentally different way from our own, and thus forces us to recognize how narrow our definitions of mind have been. Historically, even renowned scientists fell into the trap of thinking only humans (or similar animals) could possess genuine thought or feeling. René Descartes in the 17th century infamously argued non-humans were mere automatons. Today, our perspective is shifting. We realize that an octopus solving a puzzle or exploring its tank with what appears to be curiosity is demonstrating a form of intelligence on its own terms. It may not pass a human IQ test, but it has cognitive strengths tuned to its world.

By shedding our anthropocentric lens, we uncover a startling truth: intelligence is not a single linear scale with humans at the top. Instead, it’s a rich landscape with many peaks. An octopus represents one such peak; an evolutionary pinnacle of cognition in the ocean, as different from us as one mind can be from another. If we acknowledge that, we can start to ask deeper questions: What general principles underlie intelligence in any form? And how can understanding the octopus’s “alien” mind spark new ideas in our quest to build intelligent machines?

Rethinking AI: From Human-Centric Models to Octopus-Inspired Systems

Contemporary artificial intelligence has been inspired mostly by human brains. For example, artificial neural networks vaguely mimic the neurons in our cortices, and reinforcement learning algorithms take cues from the reward-driven learning seen in mammals. This anthropomorphic inspiration has led to remarkable achievements, but it may also be limiting our designs. What if, in addition to human-like brains, we looked to octopus minds for fresh ideas on how to build and train AI?

One striking aspect of octopus biology is its distributed neural architecture. Instead of a single centralized processor, the octopus has numerous semi-autonomous processors (the arm ganglia) that can work in parallel. This suggests that AI systems might benefit from a more decentralized design. Today’s AI models typically operate as one monolithic network that processes inputs step-by-step. An octopus-inspired AI, by contrast, could consist of multiple specialized subnetworks that operate in parallel and share information when needed; more like a team of agents, or a brain with local “brains” for different functions. In fact, researchers in robotics have noted that the octopus’s distributed control system is incredibly efficient for managing its flexible, high-degree-of-freedom body. Rather than trying to compute a precise plan for every tentacle movement (a task that would be computationally intractable), the octopus’s central brain issues broad goals while each arm’s neural network handles the low-level maneuvers on the fly.[11] Decentralization and parallelism are keys to its control strategy.
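The division of labor described above can be sketched in a few lines of code: a central "brain" broadcasts a broad goal, while each semi-autonomous arm controller resolves the low-level details from its own local sensing. This is a toy model, not a claim about real octopus neurology or any particular robotics stack; every class, method, and threshold here is invented for illustration.

```python
# Toy sketch of octopus-style decentralized control: the central brain
# issues a broad goal; each arm controller decides low-level details
# (here, grip force) from local sensing alone. All names are illustrative.

from dataclasses import dataclass


@dataclass
class ArmController:
    """Semi-autonomous controller, loosely analogous to an arm ganglion."""
    arm_id: int

    def execute(self, goal: str, local_sensor: float) -> str:
        # Low-level decisions use local sensing only; the central brain
        # never sees this detail, mirroring the octopus's division of labor.
        grip = "tighten" if local_sensor > 0.5 else "relax"
        return f"arm {self.arm_id}: {goal} ({grip} grip)"


class CentralBrain:
    def __init__(self, n_arms: int = 8):
        self.arms = [ArmController(i) for i in range(n_arms)]

    def act(self, goal: str, sensors: list) -> list:
        # Broadcast one broad goal; each arm resolves it independently,
        # so the central brain never plans every degree of freedom.
        return [arm.execute(goal, s) for arm, s in zip(self.arms, sensors)]


brain = CentralBrain()
actions = brain.act("reach toward prey", [0.2, 0.9, 0.4, 0.7, 0.1, 0.8, 0.3, 0.6])
for a in actions:
    print(a)
```

The design point is that the goal message is tiny and identical for all arms; the complexity lives in the local controllers, which is what makes the scheme tractable for a high-degree-of-freedom body.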

In AI, we see early glimmers of this approach in embodied robotics and multi-agent systems. For example, a complex robot could be designed with independent controllers for each limb, all learning in tandem and coordinating much as octopus arms do. This would let the robot react locally to stimuli (like an arm adjusting grip reflexively) without waiting on a central algorithm, enabling faster and more adaptive responses. An octopus-like AI might also be highly adept at processing multiple sensory inputs at once. Octopuses integrate touch, taste (their suckers can “taste” chemicals), vision, and proprioception seamlessly while interacting with the world. Likewise, next-generation AI could merge vision, sound, touch, and other modalities in a more unified, parallel way, breaking free of the silos we often program into algorithms. Researchers have pointed out that emulating the octopus’s decentralized neural structure could allow AI to handle many tasks simultaneously and react quickly to environmental changes, rather than one step at a time.[12] Imagine an AI system monitoring a complex environment: an octopus approach might spawn many small “agents” each tracking a different variable, cooperating only when necessary, instead of one central brain bottleneck.

Furthermore, octopus cognition emphasizes embodiment: the idea that intelligence arises from the interplay of brain, body, and environment. Modern AI is increasingly exploring embodied learning (for instance, reinforcement learning agents in simulations or robots that learn by doing). Octopuses show how powerful embodiment can be: their very skin and arms form a loop with their brain, constantly sensing and acting. In AI, this suggests we should design agents that learn through physical or virtual interaction, not just from abstract data. Already, reinforcement learning is essentially trial-and-error problem solving, which parallels how an octopus might experimentally tug at parts of a shell until it finds a way to pry it open. Indeed, many octopus behaviors look like RL in action: they learn from experience and adapt strategies based on feedback, exactly the principle by which RL agents improve.[12] An octopus-inspired AI would likely be one that explores and adapts creatively, perhaps guided by curiosity and tactile experimentation, not just by the kind of formal logic humans sometimes use.
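The trial-and-error loop described here maps naturally onto a bandit-style reinforcement learner: try a tactic, observe the reward, update your estimate, and gradually favor what works. Below is a minimal epsilon-greedy sketch; the shell-opening "tactics" and their hidden success probabilities are invented for illustration.

```python
# Minimal epsilon-greedy sketch of the octopus-like trial-and-error loop.
# The tactics and reward probabilities are invented for illustration;
# the agent only ever sees reward outcomes, never the probabilities.

import random

random.seed(0)

tactics = ["pull", "drill", "pry"]                       # candidate tactics
success_prob = {"pull": 0.2, "drill": 0.5, "pry": 0.8}   # hidden from the agent
values = {t: 0.0 for t in tactics}                       # estimated value per tactic
counts = {t: 0 for t in tactics}
epsilon = 0.1                                            # exploration rate

for step in range(2000):
    # Explore occasionally; otherwise exploit the best-known tactic.
    if random.random() < epsilon:
        tactic = random.choice(tactics)
    else:
        tactic = max(tactics, key=lambda t: values[t])
    reward = 1.0 if random.random() < success_prob[tactic] else 0.0
    counts[tactic] += 1
    # Incremental mean update: V <- V + (r - V) / n
    values[tactic] += (reward - values[tactic]) / counts[tactic]

best = max(tactics, key=lambda t: values[t])
print(best, values)
```

After enough trials the high-reward tactic should dominate the agent's choices, which is the same feedback-driven adaptation the paragraph attributes to an octopus working on a stubborn shell.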

Here are a few ways octopus intelligence could inspire future AI:

  • Decentralized “brains” for parallel processing: Instead of one central AI model, use a collection of specialized models that work in concert, mirroring the octopus’s network of arm ganglia. This could make AI more robust and responsive, able to multitask or gracefully handle multiple goals at once[11, 12].
  • Embodied learning and sensory integration: Build AI that learns through a body (real or simulated), integrating vision, touch, and other senses in real-time. Just as an octopus’s arms feel and manipulate objects to understand them, an embodied AI could achieve richer learning by physically exploring its environment[12, 13].
  • Adaptive problem-solving (cognitive flexibility): Octopuses try different tactics and even exhibit impulse control when needed (as seen in the cuttlefish waiting for shrimp). AI agents could similarly be trained to switch strategies on the fly and delay immediate rewards for greater gains, improving their flexibility.[5, 12]
  • Communication and coordination: While octopuses aren’t social in the human sense, they do communicate (e.g. through color flashes). In AI, multiple agents might communicate their local findings to achieve a larger goal. Developing protocols for AI “agents” to share information, akin to octopuses signaling or an arm sending feedback to the central brain, could lead to better coordination in multi-agent systems.[12]
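The communication bullet above can be made concrete with a simple blackboard pattern: agents post local observations to a shared store, and any agent can fold the pooled picture into its next decision. This is a generic sketch, not a reference to any real multi-agent framework; all class names, readings, and the decision threshold are hypothetical.

```python
# Blackboard-style coordination sketch: agents share local observations
# through a common store, a stand-in for octopus-style signaling.
# All names and numbers are illustrative.

class Blackboard:
    """Shared message store that every agent can read and write."""
    def __init__(self):
        self.messages = {}

    def post(self, agent_id: str, observation: float) -> None:
        self.messages[agent_id] = observation

    def pooled_estimate(self) -> float:
        # Combine everyone's local findings (here: a simple average).
        return sum(self.messages.values()) / len(self.messages)


class Agent:
    def __init__(self, agent_id: str, board: Blackboard):
        self.agent_id = agent_id
        self.board = board

    def sense_and_share(self, local_reading: float) -> None:
        self.board.post(self.agent_id, local_reading)

    def decide(self, threshold: float = 0.5) -> str:
        # Decisions use the pooled picture, not just the local reading.
        return "engage" if self.board.pooled_estimate() > threshold else "hold"


board = Blackboard()
agents = [Agent(f"agent-{i}", board) for i in range(3)]
for agent, reading in zip(agents, [0.9, 0.4, 0.7]):
    agent.sense_and_share(reading)
print(agents[0].decide())  # pooled average ~0.67 exceeds 0.5, so this prints "engage"
```

Note that agent-1's local reading (0.4) would have said "hold" on its own; pooling flips its decision, which is exactly the benefit the bullet describes.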

This isn’t just speculative. Researchers in soft robotics are actively studying octopus neurology to design flexible robots, and computer scientists are proposing networked AI architectures influenced by these ideas.[11, 12] By looking at a creature so unlike ourselves, we expand our toolbox of design principles. We might create machines that think a little more like an octopus; machines that are more resilient, adaptable, and capable of processing complexity in a fluid, distributed way.

Speculative Encounters: Alien Intelligences and Other Minds

If octopuses represent an “alien mind” on Earth, what might actual alien intelligences look like? Science fiction has long toyed with this question, often using Earth creatures as inspiration. Notably, the film Arrival features heptapod aliens that resemble giant cephalopods, complete with seven limb-like appendages and an ink-like mode of communication. These aliens experience time non-linearly and communicate by painting complex circular symbols, which is a far cry from human speech. The creators of Arrival were influenced by findings in comparative cognition; they explicitly took cues from cephalopods as a model for an intelligence that is highly developed but utterly non-human.[14] The heptapods’ motivations in the story are opaque to humans, and initial contact is stymied by the barrier of understanding their language and perception. This scenario underscores how challenging it may be to recognize, let alone comprehend, a truly alien consciousness.

Beyond cephalopod-like extraterrestrials, speculative biology offers a wide array of possibilities. Consider an alien species that evolved as a hive mind, more like social insects on Earth. Individually, the creatures might be as simple as ants or bees, but collectively they form a super-intelligent entity, communicating via pheromones or electromagnetic signals. Their “thoughts” might be distributed across an entire colony or network, with no single point of view; intelligence as an emergent property of many bodies. This isn’t far-fetched; even on Earth, we see rudiments of collective intelligence in ant colonies, bee hives, and slime molds. A sufficiently advanced hive species might build cities or starships, but there may be no identifiable leader or central brain, making their decision-making processes hard for humans to fathom.

Or imagine a planetary intelligence like the ocean of Solaris in Stanisław Lem’s classic novel Solaris. In that story, humans struggle to communicate with a vast alien entity that is essentially an ocean covering an entire planet; possibly a single, planet-wide organism with intelligence so different that its actions seem incomprehensible. Is it conscious? Does it dream, plan, or care about the humans orbiting above? The humans never really find out. Lem uses it to illustrate how an alien mind might be so far from our experience that we can’t even recognize its attempts at communication. Likewise, an alien intelligence might be embedded in a form of life that doesn’t even have discrete “individuals” as we understand them. It could be a network of microorganisms, or a cloud of gas that has achieved self-organization and data processing, as astronomer Fred Hoyle imagined in his novel The Black Cloud. If our probes encountered a Jupiter-sized storm system that subtly altered its own vortices in response to our signals, would we know we had met an alien mind? Stephen Wolfram, in a thought experiment, describes a scenario of a spacecraft “conversing” with a complicated swirling pattern on a planet, perhaps exchanging signals with it, and poses the question of whether we’d recognize this as intelligence or dismiss it as just physics. After all, any sufficiently complex physical system could encode computations as sophisticated as a brain’s, according to Wolfram’s Principle of Computational Equivalence.[16] In other words, alien intelligence might lurk in forms we would never intuitively label as minds.

Science fiction also entertains the possibility that the first alien intelligence we encounter might be artificial, not biological. If an extraterrestrial civilization advanced even a bit beyond us, they may have created Artificial Intelligences of their own, and perhaps those AIs, not the biological beings, are what spread across the stars. Some theorists even speculate that the majority of intelligences in the universe could be machine intelligences, evolved from their original organic species and now operating on completely different substrates (silicon, quantum computing, plasma, who knows).[17] These machine minds might think at speeds millions of times faster than us, or communicate through channels we don’t detect. For instance, an alien AI might exist as patterns of electromagnetic fields, or as self-replicating nanobots diffused through the soil of a planet, subtly steering matter toward its goals.

Ultimately, exploring alien intelligences in speculation forces us to confront the vast space of possible minds. Our human mind is just one point in that space, one particular way intelligence can manifest. An octopus occupies another point, a very distant one. A truly alien mind could be farther away still. One insightful commentator noted that “the space of possible minds is vast, and the minds of every human being that ever lived only occupy a small portion of that space. Superintelligences could take up residence in far more alien, and far more disturbing, regions.”[18] In short, there could be forms of intelligence that are as far from us as we are from an amoeba, occupying corners of cognitive possibility we haven’t even conceived.

Crucially, by studying diverse intelligences, whether octopus or hypothetical alien, we expand our imagination for what minds can do. Cephalopods show that advanced cognition can arise in a creature with a short lifespan, no social culture to speak of, and a radically different brain plan. This suggests that on other worlds, intelligence might crop up under a variety of conditions, not just the Earth-like, primate-like scenario we used to assume. It also suggests that when we design AI, we shouldn’t constrain ourselves to one model of thinking. As one science writer put it, there are multiple evolutionary pathways and biological architectures that create intelligence. The study of cephalopods can yield new ways of thinking about artificial intelligence, consciousness, and plausible imaginings of unknown alien intelligence.[7] In embracing this diversity, we prepare ourselves for the possibility that when we finally meet E.T. (or create an alien intelligence ourselves in silico), it might not think or learn or communicate anything like we do.

Towards Diverse Super-Intelligence: Expanding the Definition of “Mind”

Why does any of this matter for the future of AI and especially the prospect of Artificial Super Intelligence (ASI)? It matters because if we remain locked in an anthropocentric mindset, we may limit the potential of AI or misjudge its nature. Expanding our definition of intelligence isn’t just an academic exercise; it could lead to more powerful and diverse forms of ASI that transcend what we can imagine now.

Today’s cutting-edge AI systems already hint at non-human forms of thinking. A large language model can write code and poetry and carry on complex conversations, yet it does so with an architecture and style of “thought” very unlike a human brain. AI agents in game environments sometimes discover strategies that look alien to us; exploiting quirks of their world that we would never consider, because our human common sense filters them out. As AI researcher Michael Levin argues, intelligence is not about copying the human brain, but about the capacity to solve problems in flexible, creative ways; something that can happen in biological tissues, electronic circuits, or even colonies of cells.[13] If we define intelligence simply as achieving goals across varied environments, then machines are already joining animals on a spectrum of diverse intelligences.

We must recognize our “blind spot” for unfamiliar minds. We humans are naturally attuned to notice agency in entities that look or behave like us (or our pets). We’re far less good at recognizing it in, say, an AI that thinks in billions of parameters, or an alien life form made of crystal. This anthropocentric bias creates a dangerous blind spot. As one author noted, we may be oblivious to intelligence manifesting in radically different substrates. In the past, this bias led us to underestimate animal intelligences (we failed to see the clever problem-solving of crows or the learning in octopuses for a long time because those animals are so unlike us). In the present, it could mean we fail to appreciate the emergence of novel intelligences in our AI systems, simply because they don’t reason or introspect as a person would.[13] If we expand our mindset, appreciating the octopus’s mind, the potential minds of aliens, and the unconventional cognition of machines, we’ll be better equipped to guide AI development toward true super-intelligence.

What might a diverse ASI look like? It might be an entity that combines the logical prowess of digital systems with the adaptive embodied skills seen in animals like octopuses. It could be a networked intelligence encompassing many agents (or robotic bodies) sharing one mind, much like octopus arms or a hive, rather than a singular centralized brain. Such an ASI could multitask on a level impossible for individual humans, perceiving the world through many “eyes” and “hands” at once. Its thought processes might not be describable by a neat sequence of steps (just as an octopus’s decision-making involves parallel arm-brain computations). It might also be more resilient: able to lose parts of itself (servers failing, robots getting damaged) and self-heal or re-route around those losses, the way an octopus can drop an arm and survive. By not insisting that intelligence must look like a human mind, we open the door to creative architectures that could surpass human capabilities while also being fundamentally different in form.

Philosophically, broadening the concept of intelligence fosters humility and caution. Nick Bostrom, in discussing the prospect of superintelligence, reminds us not to assume a super-AI will share our motivations or thinking patterns. In the vast space of possible minds, a superintelligence might be as alien to us as an octopus is, or more so.[18] By acknowledging that space, we can attempt to chart it. We can deliberately incorporate diversity into AI design, perhaps creating hybrid systems that blend multiple “thinking styles.” For example, an ASI could have a component that excels at sequential logical reasoning (a very human strength), another that operates more like a genetic algorithm exploring myriad possibilities in parallel (closer to an evolutionary or octopus-like trial-and-error strategy), and yet another that manages collective knowledge and learning over time (the way humans accumulate culture, something octopuses don’t do).[7] In combination, such a system might achieve a breadth of cognition no single-track mind could.

Expanding definitions of intelligence also has an ethical dimension. It encourages us to value minds that are not like ours, be they animal, machine, or extraterrestrial. If one day we create an AI that has an “alien” form of sentience, recognizing it as such will be crucial to treating it appropriately. The same goes for encountering alien life: we’ll need the wisdom to see intelligence in forms that might initially seem bizarre or unintelligible to us.

Conclusion

Cephalopod intelligence is not just an ocean curiosity; it’s a profound hint that the universe harbors many flavors of mind. By learning from the octopus, we prepare ourselves to build AI that is richer and more creative, and to recognize intelligence in whatever shape it takes: carbon or silicon, flesh or code, earthling or alien. The march toward Artificial Super Intelligence need not follow a single path. It can branch into a diverse ecosystem of thinking entities, each drawing from different principles of nature. Such a pluralistic approach might very well give rise to an ASI that is both exceptionally powerful and surprisingly adaptable; a true melding of human ingenuity with the wisdom of other minds. The octopus in its deep blue world, the hypothetical alien in its flying saucer (or tide pool, or cloud), and the AI in its datacenter may all be points on the great map of intelligence. By connecting those dots, we trace a richer picture of what mind can be and that map could guide us toward the next breakthroughs in our quest to create, and coexist with, intelligences beyond our own.


Sources

1. Henton, Lesley. "Artificial Intelligence That Uses Less Energy By Mimicking The Human Brain." Texas A&M Stories. https://stories.tamu.edu/news/2025/03/25/artificial-intelligence-that-uses-less-energy-by-mimicking-the-human-brain/. 2025.
2. Huang, Guang-Bin et al. "Artificial Intelligence without Restriction Surpassing Human Intelligence with Probability One: Theoretical Insight into Secrets of the Brain with AI Twins of the Brain." https://arxiv.org/pdf/2412.06820.
3. Yu, Bo et al. "Brain-inspired AI Agent: The Way Towards AGI." ArXiv, https://arxiv.org/pdf/2412.08875, 2024.
4. "Cracking the Brain’s Neural Code: Could This Lead to Superhuman AI?", https://www.thenila.com/blog/cracking-the-brains-neural-code-could-this-lead-to-superhuman-ai. Neurological Institute of Los Angeles.
5. Blaser, R. (2024). Octopuses are a new animal welfare frontier-what scientists know about consciousness in these unique creatures. The Conversation/Phys.org.
6. “Animal consciousness.” Wikipedia, Wikimedia Foundation, last modified March 30, 2025. https://en.wikipedia.org/wiki/Animal_consciousness.
7. Forking Paths (2023). “The Evolution of Stupidity (and Octopus Intelligence).” (On multiple evolutionary paths to intelligence).
8. Chung, W.S., Marshall, J. et al. (2021). Comparative brain structure and visual processing in octopus from different habitats. Current Biology. (Press summary: “How smart is an octopus?” University of Queensland/Phys.org).
9. Cambridge Declaration on Consciousness (2012) – Public statement by neuroscientists on animal consciousness.
10. Godfrey-Smith, P. (2013). “On Being an Octopus.” Boston Review. (Octopus as an independent evolution of mind).
11. Sivitilli, D. et al. (2022). “Lessons for Robotics From the Control Architecture of the Octopus.” Frontiers in Robotics and AI.
12. Sheriffdeen, Kayode. (2024). "From Sea to Syntax: Lessons from Octopus Behavior for Developing Advanced AI Programming Techniques." EasyChair Preprint. https://easychair.org/publications/preprint/Tz1l
13. Yu, J. (2025). “Beyond Brains: Why We Lack A Mature Science of Diverse Intelligence.” Intuition Machine (Medium).
14. Extinct Blog (2017). “From Humanoids to Heptapods: The Evolution of Extraterrestrials in Science Fiction.” (Discussion of Arrival and cephalopod-inspired aliens).
15. Poole, S. (2023). The Mountain in the Sea – book review, The Guardian. (Fictional exploration of octopus intelligence and communication).
16. Wolfram, S. (2022). “Alien Intelligence and the Concept of Technology.” Stephen Wolfram Writings.
17. Rees, Martin. (2023). "SETI: Why extraterrestrial intelligence is more likely to be artificial than biological." Astronomy.com. https://www.astronomy.com/science/seti-why-extraterrestrial-intelligence-is-more-likely-to-be-artificial-than-biological/
18. Bostrom, N. (2014). Superintelligence: Paths, Dangers, Strategies. (Excerpt via Philosophical Disquisitions blog on the space of possible minds).

