Sunday, May 11, 2025

Polanyi’s Paradox: Why We Know More Than We Can Tell

Polanyi’s Paradox is the idea that much of what we know cannot be clearly expressed in words or formulas, and it is something I have thought about a lot, especially in the context of engineering and AI. Despite what some might assume, software engineering is not just about generating code. It involves layers of intuition, architectural decisions, project context, unwritten standards, and rules of thumb that experienced engineers learn over time. This is why, when engineers struggle to explain exactly how tools like Cursor or Windsurf fit into their workflow, it is not because the tools aren’t valuable (they are), and it is not because engineers lack communication skills. It is because these tools support only part of what software engineers do; much of the rest relies on experience and intuition that is hard to articulate. In this post, I want to explore how Polanyi’s Paradox connects to software engineering, what it tells us about the current state of AI, and what it might take to build AI systems that truly grasp the tacit knowledge humans use every day.

But first, let's look back at the historical origins of this idea.

Historical Perspective

Michael Polanyi, a Hungarian-British polymath (1891–1976), introduced what later came to be known as Polanyi’s Paradox in 1966. Polanyi started his career as a distinguished physical chemist before turning to philosophy, bringing a scientist’s insight into how knowledge actually works in practice. In his book The Tacit Dimension, he famously wrote “we can know more than we can tell”, referring to the tacit knowledge that underlies many human skills. For example, Polanyi pointed out that we recognize familiar faces effortlessly without being able to list the exact features or rules that distinguish each face; a nod to Gestalt perception and intuition.

He illustrated the gap between knowledge and articulation with everyday instances: “The skill of a driver cannot be replaced by a thorough schooling in the theory of the motorcar,” he noted, just as one’s intimate know-how of using one’s body is entirely different from a textbook understanding of physiology. In short, there is always a residue of know-how that we carry in our minds and muscles that we cannot fully put into words.

In a time when many thinkers believed all knowledge could be neatly defined and broken into rules, Michael Polanyi pushed back with a simple but powerful idea: we know more than we can tell. He argued that intuition, experience, and personal judgment are essential parts of how we understand the world. For Polanyi, knowing isn’t just about facts and logic; it’s also about the unspoken, hard-to-explain feel we develop through doing and observing.

Relevance to Software Engineering

Although formulated 60 years ago, Polanyi’s Paradox is alive and well in modern software engineering. Despite the field’s basis in logic and code, much of what good software developers and teams do resides in tacit knowledge rather than explicit rules. In fact, studies suggest that only the “tip of the iceberg” of knowledge in a software organization is explicit and documented, perhaps ~20–30%, while the vast majority (70–80%) is unwritten, experience-based tacit knowledge.

In software teams, explicit knowledge (documents, code repositories, formal processes) is just the visible tip. The bulk of “what we know”, such as skills, experience, intuition, and team culture, lies beneath the surface, undocumented but crucial. This tacit layer includes things like unwritten best practices, intuitive design sense, and the many gotchas one learns only through hard-earned experience.

This means that a software team’s most critical understanding often lives in people’s heads and habits rather than in manuals or comment blocks. Seasoned engineers accumulate a deep reservoir of intuition about architecture, code quality, and problem-solving approaches that is hard to articulate formally. They just "know" certain things, which is a direct reflection of Polanyi’s Paradox in the tech world.

Polanyi’s Paradox in Software Engineering

In day-to-day software development, Polanyi’s Paradox shows up in numerous ways. For example:

  • User Interface Design: Good UI/UX designers often rely on an instinctive feel for what is intuitive to users. There is no complete rulebook for “easy to use.” Much of it comes from empathizing with users and leveraging subtle design sensibilities developed over time. A designer might not be able to fully explain why one layout “just works” better than another, because they are drawing on tacit knowledge of human behavior and aesthetics learned through many projects and feedback cycles. This know-how stands in contrast to explicit guidelines (like style guides or usability heuristics). Those help, but the "artistry" goes beyond what’s written down.

  • Debugging: Tracking down a complex bug is as much an art as a science. A veteran developer can often zero in on the likely cause of a problem quickly, guided by a “gut feeling” from past debugging experiences. This skill of knowing where to look in thousands of lines of code or which log hint to pursue is typically acquired through experience and not easily codified or taught. Two different engineers might solve the same bug via very different thought processes, neither of which is fully documented anywhere. Debugging knowledge lives in their heads as pattern recognition: “Ah, I’ve seen something like this before and it might be due to X.” Such intuitions are hard to write into a step-by-step troubleshooting guide.

  • Knowledge Transfer in Teams: When a new developer joins a software team, they often go through an onboarding period that involves shadowing others, pair programming, and code review. These practices recognize that a lot of the team’s knowledge is “tribal knowledge”, accumulated wisdom about the codebase and conventions that isn’t in the official docs. From knowing the historical reasons why module Y was built a certain way, to understanding which team member to ask about database quirks, these are things newcomers learn through social interaction.

    Much of what senior engineers pass on to juniors is tacit: it’s storytelling, mentorship, and shared experience. If a key developer suddenly leaves, they take a trove of tacit knowledge with them, and it’s often non-trivial to fill that gap. Attempts to capture everything in exhaustive documentation often fall short, because you can’t foresee or codify every relevant detail. This is in spite of the best efforts of Program Managers and tools like Jira, Confluence, etc. Indeed, trying to convert all tacit knowledge into explicit form is “fundamentally flawed due to the very nature of tacit knowledge, which is inherently personal, context-dependent, and difficult to articulate.”

These examples show that software engineering is not just about formal specifications and algorithms; it’s a human craft. The most effective development teams leverage tacit understanding in architecture decisions, code readability, and anticipating user needs. And when this tacit element is missing, say, for example, when a team only follows rigid checklists without any personal intuition, the results are often mediocre. Polanyi’s Paradox explains why certain programming expertise can’t simply be “transferred” by reading a book.

As a consequence, strategies like code reviews, pair programming, and apprenticeship-style learning are crucial in tech because they help share the unspoken wisdom. Conversely, over-reliance on documentation has limits: no matter how many pages of design docs you write, there will always be nuance that new engineers must pick up by working with others. In short, a great deal of “what developers do” cannot be fully captured in code comments or process manuals. They know more than they can tell.

The Future of Software Engineering

So all of this should make software engineers feel pretty good about their current job security, because their job is more than just code generation. Anyone who has used AI tools knows that even though these tools can speed up productivity tremendously, they fall woefully short in many areas, primarily because of the tacit knowledge engineers have built up. This is despite some CEOs announcing that they are replacing engineers with AI. These announcements are generally made by people who have never built and maintained a large code product, or, if they have, it has been decades since they did so. Furthermore, announcements claiming that AI is writing X% of their code are highly suspect because:

  1. It's unclear what they are counting in the X%. How much of it is agents writing big chunks of code based on prompts, and how much is just code completion? When Google Docs suggests the next word or phrase as you type, you wouldn't say that AI wrote X% of the doc just because it's doing autocomplete.
  2. They fail to recognize the tacit knowledge that goes into code architecture.

However, this is only the current state for software engineers. Thinking that this is also the future and that nothing will change is just engaging in "copium", which I see a lot of on LinkedIn. For example, take a look at the replies to the CEO of Zapier on LinkedIn here, where he states that AI won't replace jobs. There's a lot of wishful thinking in the comments that things aren't going to change that much, and people are not internalizing the exponential change that is happening in AI.

Additionally, the Polanyi's Paradox of tacit knowledge in software engineering isn't a formidable problem for AI agents to solve, and it doesn't even require a huge technological breakthrough. So how do you address the fact that an AI agent in a program like Cursor or Claude Desktop doesn't have all of an engineer's tacit knowledge? You give the AI access to all that knowledge: Slack conversations, the entire organization's repos, Microsoft Teams meetings, project management notes on customers, stand-up meeting recordings, Jira, Confluence, whatever the organization is using to capture that history. If that is done, then you have an AI agent or agents with more organizational and project knowledge than any engineer who works there.

The requirements for doing that are larger context windows to hold extended information, agents that can operate effectively over that context and remember across sessions, and access to all of that data. Infinite-memory agents are within reach now. There are many possible solutions, including MemGPT from Letta, along with the frontier AI companies extending their context windows. The biggest obstacle will be helping organizations streamline all of their data and communications so they can be made available to AI agents. However, organizations are going to be very motivated to do this, because those that do make their data and communications available are going to have a huge competitive advantage from their accelerated pace of software delivery and better organizational decision making.
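To make that concrete, here is a minimal sketch, in Python, of the shape such an agent could take. Everything in it is hypothetical (the OrgMemory store, the keyword-overlap retrieval, the answer stub); a real system would use embeddings, a vector database, and an LLM API, but the basic loop is the same: ingest organizational sources, retrieve the relevant context for each request, and persist what was captured across sessions.

    import json
    from pathlib import Path

    class OrgMemory:
        """Hypothetical persistent store of organizational knowledge.

        Each record is tagged with its source (Slack, Jira, Confluence, meeting
        recordings, ...) so retrieved context can be traced back to where the
        tacit knowledge was originally captured.
        """

        def __init__(self, path: str = "org_memory.json"):
            self.path = Path(path)
            self.records = json.loads(self.path.read_text()) if self.path.exists() else []

        def ingest(self, source: str, text: str) -> None:
            self.records.append({"source": source, "text": text})
            self.path.write_text(json.dumps(self.records, indent=2))  # survives across sessions

        def retrieve(self, query: str, k: int = 3) -> list:
            # Toy keyword-overlap scoring; a real agent would use embeddings and a vector DB.
            terms = set(query.lower().split())
            scored = sorted(
                self.records,
                key=lambda r: len(terms & set(r["text"].lower().split())),
                reverse=True,
            )
            return scored[:k]

    def answer(task: str, memory: OrgMemory) -> str:
        """Stub for an agent call: gather org context, then hand task plus context to a model."""
        context = memory.retrieve(task)
        prompt = "\n".join(r["source"] + ": " + r["text"] for r in context)
        return f"[model would be called here with task '{task}' and context:\n{prompt}]"

    if __name__ == "__main__":
        mem = OrgMemory()
        mem.ingest("Slack#payments", "We cap retries at 3 because the gateway double-charges on the 4th.")
        mem.ingest("Jira PAY-212", "Refund flow must stay synchronous until the ledger migration ships.")
        print(answer("Why is the retry limit 3 in the payments service?", mem))

The retrieval mechanics are not the point; the point is that the tacit context (why the retry cap is 3, why the refund flow is synchronous) becomes something the agent can actually consult instead of knowledge that lives only in one engineer's head.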

So it is not a big leap to build agents that have this extended memory and can maintain intent across tasks. The biggest impediment will be organizations adapting their data and internal processes. Does this mean that all software engineering jobs will be eliminated? No. Engineers who can orchestrate, manage, secure, and generally work with AI agents will be in demand. So those who are using AI as part of their current process, learning as much as they can about taking advantage of AI, and looking to innovate around AI engineering will be fine.

Unfortunately, many engineers bring nothing more to the table than the ability to pull a story off a Jira backlog written by a project manager and complete it, without thinking in terms of innovation or design at all. They work in a bureaucratic mindset of task completion, generating just the code for that task, rather than a mindset of generating ideas and innovating. This type of engineer, who doesn't innovate and specifically doesn't innovate using AI, will be gone and will not be able to find a job like that again in engineering. In other words:

If you are an engineer who is used to thinking like a robot, you will be replaced with a robot.

But I don't think that most engineers can be blamed for thinking of their job in terms of being automatons writing code. Ever since getting out of school, many engineers have been conditioned through endless cycles of sprints, planning meetings, story points, and retrospectives to think this way. Also, in many organizations there are layers of project managers and customer success people between them and the end users, which reinforces the idea that their end goal is code generation rather than building a product for a user. In addition, in larger organizations they may only work on a small piece of the end product. But if they are going to survive, they are going to need to bring their humanness to the job, their uniqueness based on their personal experiences, and to innovate and be creative.

Relevance to Building Better AI

So given Polanyi's Paradox in engineering, what have the implications been for AI?

Polanyi’s Paradox has long been recognized as a core challenge in artificial intelligence. Let's go back to the beginning. In the early decades of AI (mid-20th century through 1980s), programmers found that many tasks humans find trivial were extremely difficult to specify to a computer, precisely because of tacit knowledge. As Polanyi himself noted, tasks like driving a car in traffic or recognizing a face rely on a vast array of subtle cues and intuitions that we execute without conscious thought. We can perform them, but we can’t easily enumerate how we do it. This posed a problem: how do you write a program to do something you can’t fully explain to yourself? Early AI could handle problems that had clear rules (like solving equations or playing chess to some extent), but it struggled with open-ended, real-world tasks. Polanyi’s paradox was identified as a major obstacle for AI and automation. Unless we have a complete, explicit recipe for a task, getting a machine to do it is extremely difficult.

A classic example is commonsense reasoning: even a toddler understands that a wobbling stack of blocks might fall, or that a snowman in the road isn’t a threat, but such “common sense” eluded AI programs because nobody could fully encode all the needed if-then rules. As one economist summarized, the tasks easiest to automate are those that “follow explicit, codifiable procedures,” whereas the ones requiring “flexibility, judgment and common sense, skills that we understand only tacitly, have proved most vexing to automate.” Indeed, for decades computers remained “less sophisticated than preschool-age children” at tasks like understanding natural language or navigating an unpredictable environment. This is Polanyi’s Paradox writ large in technology: machines hit a wall when facing the tacit dimension of human know-how.

The past decade, however, has seen much progress in AI, largely by finding ways around Polanyi’s Paradox rather than directly solving it. Since we can’t easily tell the machine what we know implicitly, we let the machine learn from experience just as humans do. In other words, “contemporary AI seeks to overcome Polanyi’s paradox by building machines that learn from human examples, thus inferring the rules that we tacitly apply but do not explicitly understand.” Instead of programmers hand-coding every rule, we feed AI systems with massive amounts of data (images, recordings, text, demonstrations) and let them figure out the patterns.

This is the approach of machine learning, and especially deep learning. For instance, rather than attempting to enumerate all the visual features that distinguish a pedestrian from a shadow on the road, engineers train a self-driving car’s vision system on millions of driving images and let it learn the concept of “pedestrian” by example. The AI essentially internalizes a web of correlations and features, a form of tacit knowledge, from the data.
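As a toy illustration of this learn-from-examples approach (a sketch only, with made-up features and data, and nothing like a real perception pipeline), the snippet below fits a classifier from labeled examples instead of hand-writing the rule; the "rule" ends up living implicitly in the learned weights.

    # Instead of hand-coding the decision rule, we fit it from labeled data.
    # Features and labels here are invented purely for illustration.
    from sklearn.linear_model import LogisticRegression

    # Hypothetical features: [height_of_blob, vertical_motion, distance_from_curb]
    X = [
        [1.7, 0.8, 0.5],   # pedestrian
        [1.6, 0.9, 0.3],   # pedestrian
        [0.2, 0.0, 0.0],   # shadow
        [0.3, 0.1, 0.1],   # shadow
    ]
    y = [1, 1, 0, 0]       # 1 = pedestrian, 0 = not

    model = LogisticRegression().fit(X, y)

    # The "rule" was never written down; it is implicit in the learned weights.
    print(model.predict([[1.65, 0.85, 0.4]]))   # -> [1]
    print(model.coef_, model.intercept_)

Nobody enumerates what makes a pedestrian a pedestrian; the boundary is inferred from examples, which is exactly the workaround for Polanyi's Paradox described above.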

AI can now capture tacit patterns by finding regularities in unstructured inputs. A striking success of this approach was in the game of Go: expert Go players often say they rely on intuition to judge a good move (they “know” a move is strong without being able to verbalize why). Early AI couldn’t crack Go by brute-force logic alone. But a deep learning system (AlphaGo) was able to absorb millions of human moves and even play games against itself, eventually acquiring an almost intuitive style of play that surprised human experts. In effect, AlphaGo learned the tacit principles of Go that even top players couldn’t articulate, and it went on to defeat the world champion.

Similar stories abound: large language models (LLMs) were trained on billions of sentences of human text and as a result, they learned the hard-to-pin-down rules of language, style, and meaning without anyone programming those rules in directly. These models can answer questions, write essays, or carry a conversation in a very human-like way, not because they were explicitly told how to do it, but because they statistically absorbed how humans communicate. In other words, they captured a slice of our tacit linguistic knowledge.

Interestingly, many of today’s AI successes still depend on how well we can communicate our goals to the system: the unfortunately named practice we are all familiar with, prompt engineering. Because machines lack the shared human context we often take for granted, we’re required to make our implicit understanding more explicit than we typically would with another person. When working with a coworker, you might simply say, “You know what I mean,” and count on shared experience to fill in the blanks. With AI, that luxury doesn’t exist. Instead, we must carefully spell out our expectations, constraints, and context. Getting the AI to produce a useful output often involves trial and error, tweaking inputs, and drawing on our own unspoken know-how about what might steer it in the right direction. This interaction process is, in itself, a reflection of our tacit knowledge; how we choose to phrase a task, what examples we give, and which assumptions we clarify all stem from our intuition about how the system behaves. In this way, human users must surface their internalized expertise just to bridge the gap between what we understand intuitively and what the machine needs to be told explicitly.

The human is leveraging personal know-how about the AI’s behavior to coax the desired result. For example, when trying to generate a specialized algorithm or solve a domain-specific problem, a developer might instinctively shape their request to match what they know the AI handles well; perhaps by framing the problem in terms that are common in training data or avoiding ambiguous phrasing. They may not consciously think about it, but they draw on a kind of internal playbook developed through past interactions: what tends to work, what confuses the model, and how to steer it toward quality output. This back-and-forth of adjusting inputs, interpreting results, and refining expectations reflects a nuanced, intuitive understanding that the developer might not even fully articulate. It’s a collaborative process that takes some experience working with the AI, where the machine brings breadth of data and pattern recognition while the human brings situational judgment and contextual insight. In this way, navigating AI systems becomes another expression of Polanyi’s Paradox: we rely on what we know but can’t entirely explain to get useful results from a machine that requires us to make that knowledge as explicit as possible.
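A small, contrived example of what surfacing that tacit context looks like in practice: the two prompts below ask for the same thing, but the second spells out constraints that a coworker would have inferred from shared experience. The specific function name and house rules are invented for illustration.

    # What you might say to a teammate, relying on shared context:
    vague_prompt = "Write the retry helper we talked about."

    # What you have to tell a model, with the tacit constraints made explicit:
    explicit_prompt = """
    Write a Python function retry(fn, attempts=3, backoff=0.5) that:
    - retries only on ConnectionError (our gateway double-charges on other retries),
    - sleeps backoff * 2**attempt seconds between attempts,
    - logs each retry with the logging module at WARNING level,
    - follows our house style: type hints, no bare except, Google-style docstring.
    """

The second prompt is longer precisely because it externalizes knowledge that normally never gets written down.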

Polanyi’s Paradox as a Cautionary Tale

Finally, Polanyi’s Paradox and the quest to encode (or exploit) tacit knowledge in AI bring to mind several cautionary tales and analogies. One is the legend of Frankenstein. In Mary Shelley’s novel, a scientist creates a sentient being but cannot control it or imbue it with the values and understanding needed to integrate into society. The creature, “detached from human values, brought chaos rather than order,” as one commentary puts it. This resonates with modern fears that AI systems, if created without sufficient foresight, might behave in ways misaligned with human intentions; not out of malice, but because we never explicitly told them the full story of right, wrong, and context. Polanyi’s Paradox suggests we might not even know how to fully tell them these things, since so much of our common sense and ethics is tacit.

Another example: if you have watched enough Disney, you have seen Mickey Mouse as the Sorcerer’s Apprentice in Fantasia, a young apprentice who animates a broom to do his cleaning for him. Lacking the master’s wisdom, he cannot stop the broom as it relentlessly carries out his literal instructions, eventually flooding the place. This is a perfect illustration of the failure of specification. The apprentice knew just enough magic to issue a command, but not enough to specify boundaries or handle contingencies. In AI terms, he had no “off-switch” or nuance in his instructions. The story underscores the risk of viewing powerful AI tools as magical solutions without understanding their limits. If we deploy AI with the mindset of that apprentice, expecting it to perfectly do our bidding when we ourselves haven’t nailed down exactly what we want, we shouldn’t be surprised when we get unintended results.

Already, we’ve seen real examples of this: an AI instructed to maximize user engagement on a platform might learn that spreading sensationalist content achieves the goal, to society’s detriment; or a chatbot given free rein to converse might start generating troubling outputs if the designers didn’t anticipate certain prompts. As one observer quipped, if you’re “not careful, like a dreaming Mickey, you too will suffer the consequences of unintended hallucinations” from AI. In plainer terms, the irony of Polanyi’s Paradox in AI is that we are building machines far more literal-minded than ourselves to handle tasks that we ourselves only understand implicitly.

Conclusion

Polanyi’s Paradox remains a guiding concept in understanding the limits of formal knowledge in both human and machine domains. Historically, it reminds us that even at the height of scientific rationalism, “knowing” something is often inexpressible. In software engineering, it encourages the recognition that programming is not purely formal, that expertise can’t be entirely captured in textbooks, and that engineers must adapt. And in AI, Polanyi’s Paradox is both a challenge and a driving force: a challenge because AI must grapple with tasks that we as humans struggle to fully describe, and a driving force behind the rise of machine learning, which lets computers learn the unspeakable parts of human skill.

As we integrate AI more into our lives, Polanyi’s insight also serves as a warning: we should be mindful of what we haven’t been able to tell our machines. Balancing explicit knowledge with tacit understanding, and combining human intuition with machine computation, will be an important element of building stronger AI. After all, to paraphrase Polanyi, we (and our AI) know more than we can tell.

Monday, March 31, 2025

The Octopus, AI, and the Alien Mind: Rethinking Intelligence

I’ve long been fascinated by octopuses (and yes, it’s octopuses and not octopi). Their intelligence, forms, and alien-like behaviors have greatly interested me. Over the years, I’ve read many research papers on cephalopod cognition, behavior, and neurobiology, and I have always been amazed at how octopuses seem to defy our conventional definitions of intelligence; science has continued to learn much more about octopus behavior in recent years.

Giant Pacific Octopus

This scientific foundation has been enriched by some of my favorite fiction and nonfiction on the subject. Adrian Tchaikovsky’s Children of Ruin, the second book in his Children of Time series, offered a speculative exploration of octopus-like alien intelligence evolved into a spacefaring species. Ray Nayler’s The Mountain in the Sea, another recent book that I really enjoyed, imagined a near-future story around the discovery of communicative octopuses and what it means to truly understand another mind. In nonfiction, Sy Montgomery, who has written extensively on octopuses, gives a personal perspective on octopus behavior and relationships in The Soul of an Octopus. And the documentary series Secrets of the Octopus further explored octopus intelligence through some great visuals and firsthand accounts from researchers. What all of these works share is a sense of awe for the octopus as both a mirror and a counterpoint to our own minds and preconceptions.

This blog post explores how octopus intelligence challenges our narrow, human-centric understanding of cognition, and how embracing alternative models of mind might open bold new directions in artificial intelligence.

Much of the current discourse around artificial intelligence (AI), especially artificial general intelligence (AGI) and its potential evolution into artificial superintelligence (ASI), is deeply rooted in comparisons to the human brain. This anthropocentric framing shapes how many prominent figures in the AI field conceptualize and pursue intelligence in machines. Ivan Soltesz, a neuroscientist at Stanford University, suggests that AI could eventually perform all human tasks, even those requiring subtle forms of reasoning like dark humor or one-shot learning. He envisions future AI systems that might even choose to appear childlike by “acting silly,” implying that human-like behavior remains a gold standard for intelligent systems. Similarly, Dr. Suin Yi at Texas A&M University has developed a “Super-Turing AI” that mimics the brain’s energy-efficient data migration processes, further reinforcing the idea that human neurobiology provides the blueprint for next-generation AI.[1]

Other researchers go even further. Guang-Bin Huang and colleagues propose building AI “twins” that replicate the brain’s cellular-level architecture, arguing that such replication could push AI beyond human intelligence.[2] Bo Yu and co-authors echo this sentiment in their call for AGI systems built directly from cortical region functionalities, essentially copying the operational mechanisms of the human brain into machine agents.[3] Meanwhile, analysts like Eitan Michael Azoff stress the importance of decoding how the human brain processes sensory input and cognitive tasks, contending that this is the key to building superior AI systems.[4] Underlying all these efforts is a persistent belief: that human cognition is not only a useful reference point, but perhaps the only viable model for creating truly intelligent machines. Taking it even further, some of the biggest critics of current AI advances criticize current approaches for not learning the way humans learn. This is one of the central points of Gary Marcus, a prominent LLM critic (who, by the way, I disagree with on most of his thoughts around AI): he argues that LLMs’ immense need for data and their transformer neural network architecture are nothing like how children learn.

However, this post challenges that assumption. AI doesn’t need to mimic how children learn. AI doesn’t need to emulate human brains to advance. It’s understandable that AI development has historically drawn inspiration from the most intelligent system we know, our own minds, but this narrow focus may ultimately limit our imagination and the potential of the technologies we build.

In this context, the purpose of this post is to advocate for a broader, more inclusive definition of intelligence; one that moves beyond the human brain as the central paradigm. By looking to other models of cognition, particularly those found in non-human animals like the octopus, we can begin to break free from anthropocentric thinking. Octopuses demonstrate complex problem-solving, sensory integration, and even self-awareness with neural architectures completely unlike our own. They serve as a powerful counterpoint to the idea that intelligence must look and act like us. If we are serious about developing truly advanced AI, or even preparing for the possibility of encountering alien minds, it’s time we stopped treating human cognition as the default blueprint. The future of intelligence, both artificial and otherwise, may lie not in copying ourselves, but in embracing the radical diversity of minds that evolution (and possibly the universe) has to offer.

But before we get to a discussion of thinking more broadly about intelligence that is not human-centric, we need to look at the octopus and some of its amazing abilities.

The Mysterious Minds of Octopuses: Cognition and Consciousness

Octopuses are problem solvers. For example, an octopus can methodically unscrew the lid of a jar to retrieve the crab inside. With eight limber arms covered in sensitive suckers, it solves a puzzle that would stump many simpler creatures. It will change color in flushes of reds and browns, in bursts of expression, as if contemplating its next move. So one has to wonder: what kind of mind lurks behind those alien, horizontal pupils?

Octopuses are cephalopods. They are a class of mollusks that also includes squid and cuttlefish, and they have some of the most complex brains in the invertebrate world. The common octopus has around 500 million neurons, a count comparable to that of many small mammals like rabbits or rats. What’s pretty amazing is how those neurons are distributed. Unlike a human, whose neurons are mostly packed into one big brain, an octopus carries over half its neurons in its arms, in clusters of nerve cells called ganglia.[5] In effect, each arm has a “mini-brain” capable of independent sensing and control. If an octopus’s arm is severed (in an unfortunate encounter with a predator), the arm can still grab and react for a while on its own, showing complex reflexes without input from the central brain.[6] This decentralized nervous system means the octopus doesn’t have full top-down control of every tentacle movement in the way we control our limbs. Instead, its mind is spread throughout its body.

Such a bizarre setup evolved on a very different path from our own. The last common ancestor of humans and octopuses was likely a primitive worm-like creature over 500 million years ago. All the “smart” animals we’re used to, such as primates, birds, dolphins are our distant cousins with centralized brains, but the octopus is an entirely separate experiment in evolving intelligence.[7] Its evolutionary journey produced capabilities that are incredibly unique. For example, octopuses and their cephalopod relatives can perform amazing feats of camouflage and signaling. A common cuttlefish can flash rapid skin pattern changes to blend into a chessboard of coral and sand, even though it is likely colorblind, indicating sophisticated visual processing and motor control.[5] Octopuses have been observed using tools. The veined octopus famously gathers coconut shell halves and carries them to use later as a shelter, effectively assembling a portable armor when needed. They solve mazes and navigate complex environments in lab experiments, showing both short-term and long-term memory capabilities similar to those of trained mammals.[6]

Crucially, octopuses also demonstrate learning and problem-solving that hint at cognitive complexity. In laboratory tests, octopuses (and cuttlefish) can learn to associate visual symbols with rewards. For instance, figuring out which shape on a screen predicts food. They’re even capable of the cephalopod equivalent of the famous “marshmallow test” for self-control. In one 2021 study, cuttlefish were given a choice between a morsel of crab meat immediately or a tastier live shrimp if they waited a bit longer and many cuttlefish opted to wait for the better snack, exhibiting self-control and delayed gratification.[5] Such behavioral experiments suggest that these invertebrates can flexibly adapt their behavior and rein in impulses, abilities once thought to be the domain of large-brained vertebrates.

All these findings force us to ask: do octopuses have something akin to consciousness or subjective experience? While it’s hard to know exactly what it’s like to be an octopus, the evidence of sophisticated learning and neural complexity has been convincing enough that neuroscientists now take octopus consciousness seriously. In 2012, a group of prominent scientists signed the Cambridge Declaration on Consciousness, stating that humans are not unique in possessing the neurological substrates for consciousness. Non-human animals, including birds and octopuses, also possess these.[6, 10] In 2024, over 500 researchers signed an even stronger declaration supporting the likelihood of consciousness in mammals and birds and acknowledging the possibility in creatures like cephalopods. In everyday terms, an octopus can get bored, show preferences, solve novel problems, and perhaps experience something of the world; all with a brain architecture utterly unlike our own. It’s no wonder some animal welfare laws (for example, in the EU and parts of the US) have begun to include octopuses, recognizing that an animal this smart and behaviorally complex deserves ethical consideration.[5]

Beyond Anthropocentric Intelligence: Lessons from an Alien-like Being

Our understanding of animal intelligence has long been colored by anthropocentric bias; the tendency to measure other creatures by the yardstick of human-like abilities. For decades, researchers would ask whether animals can solve puzzles the way a human would, use language, or recognize themselves in mirrors. Abilities that didn’t resemble our own were often ignored or underestimated. Octopus intelligence throws a wrench into this approach. These animals excel at behaviors we struggle to even imagine: their entire body can become a sensing, thinking extension of the mind; they communicate by changing skin color and texture; they don’t form social groups or build cities, yet they exhibit curiosity and individuality. As one researcher put it, “Intelligence is fiendishly hard to define and measure, even in humans. The challenge grows exponentially in studying animals with sensory, motivational and problem-solving skills that differ profoundly from ours.”[5] To truly appreciate octopus cognition, we must broaden our definition of intelligence beyond tool use, verbal reasoning, or social learning, just because these are traits we prioritized because we’re good at them.

Octopuses teach us that multiple forms of intelligence exist, shaped by different bodies and environments. An octopus doesn’t plan a hunt with abstract maps or language, but its deft execution of a prey ambush, coordinating eight arms to herd fish into a corner, for instance, is a kind of tactical genius. In Australian reefs, biologists have observed octopuses engaging in collaborative hunting alongside fish: a reef octopus will lead the hunt, flushing prey out of crevices, while groupers or wrasses snap up the fleeing target, and the partners use signals (like arm movements or changes in posture) to coordinate their actions.[5] This cross-species teamwork suggests a level of problem-solving and communication we wouldn’t expect from a solitary mollusk. It challenges the notion that complex cooperation requires a primate-style social brain.

Philosopher Peter Godfrey-Smith has famously described the octopus as “the closest we will come to meeting an intelligent alien” on Earth. In fact, he notes that if bats (with their sonar and upside-down life) are Nagel’s example of an alien sensory world, octopuses are even more foreign; a creature with a decentralized mind, no rigid skeleton, and a shape-shifting body.[10] What is it like to be an octopus? It’s a question that stretches our imagination. The octopus confronts us with an intelligence that evolved in a fundamentally different way from our own, and thus forces us to recognize how narrow our definitions of mind have been. Historically, even renowned scientists fell into the trap of thinking only humans (or similar animals) could possess genuine thought or feeling. René Descartes in the 17th century infamously argued non-humans were mere automatons. Today, our perspective is shifting. We realize that an octopus solving a puzzle or exploring its tank with what appears to be curiosity is demonstrating a form of intelligence on its own terms. It may not pass a human IQ test, but it has cognitive strengths tuned to its world.

By shedding our anthropocentric lens, we uncover a startling truth: intelligence is not a single linear scale with humans at the top. Instead, it’s a rich landscape with many peaks. An octopus represents one such peak; an evolutionary pinnacle of cognition in the ocean, as different from us as one mind can be from another. If we acknowledge that, we can start to ask deeper questions: What general principles underlie intelligence in any form? And how can understanding the octopus’s “alien” mind spark new ideas in our quest to build intelligent machines?

Rethinking AI: From Human-Centric Models to Octopus-Inspired Systems

Contemporary artificial intelligence has been inspired mostly by human brains. For example, artificial neural networks vaguely mimic the neurons in our cortices, and reinforcement learning algorithms take cues from the reward-driven learning seen in mammals. This anthropomorphic inspiration has led to remarkable achievements, but it may also be limiting our designs. What if, in addition to human-like brains, we looked to octopus minds for fresh ideas on how to build and train AI?

One striking aspect of octopus biology is its distributed neural architecture. Instead of a single centralized processor, the octopus has numerous semi-autonomous processors (the arm ganglia) that can work in parallel. This suggests that AI systems might benefit from a more decentralized design. Today’s AI models typically operate as one monolithic network that processes inputs step-by-step. An octopus-inspired AI, by contrast, could consist of multiple specialized subnetworks that operate in parallel and share information when needed; more like a team of agents, or a brain with local “brains” for different functions. In fact, researchers in robotics have noted that the octopus’s distributed control system is incredibly efficient for managing its flexible, high-degree-of-freedom body. Rather than trying to compute a precise plan for every tentacle movement (a task that would be computationally intractable), the octopus’s central brain issues broad goals while each arm’s neural network handles the low-level maneuvers on the fly.[11] Decentralization and parallelism are keys to its control strategy.

In AI, we see early glimmers of this approach in embodied robotics and multi-agent systems. For example, a complex robot could be designed with independent controllers for each limb, all learning in tandem and coordinating similar to octopus arms. This would let the robot react locally to stimuli (like an arm adjusting grip reflexively) without waiting on a central algorithm, enabling faster and more adaptive responses. An octopus-like AI might also be highly adept at processing multiple sensory inputs at once. Octopuses integrate touch, taste (their suckers can “taste” chemicals), vision, and proprioception seamlessly while interacting with the world. Likewise, next-generation AI could merge vision, sound, touch, and other modalities in a more unified, parallel way, breaking free of the silos we often program into algorithms. Researchers have pointed out that emulating the octopus’s decentralized neural structure could allow AI to handle many tasks simultaneously and react quickly to environmental changes, rather than one step at a time.[12] Imagine an AI system monitoring a complex environment: an octopus approach might spawn many small “agents” each tracking a different variable, cooperating only when necessary, instead of one central brain bottleneck.
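Here is a minimal, purely illustrative sketch of that "many small agents, one light coordinator" shape in Python. The sensor names and thresholds are invented; the point is only that each agent reacts locally and escalates to the coordinator when something unusual happens, rather than routing every reading through one central model.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ArmAgent:
        """A small local controller, loosely analogous to an octopus arm's ganglion."""
        name: str
        threshold: float

        def observe(self, reading: float) -> Optional[dict]:
            # Handle ordinary readings locally; only escalate unusual ones.
            if reading <= self.threshold:
                return None  # local "reflex", no central involvement needed
            return {"agent": self.name, "reading": reading}

    class Coordinator:
        """The 'central brain' sets broad goals and handles only the escalations."""

        def handle(self, alerts: list) -> None:
            for alert in alerts:
                print(f"coordinator: re-planning around {alert['agent']} (reading={alert['reading']})")

    # Invented sensors and thresholds, purely for illustration.
    agents = [ArmAgent("temperature", 80.0), ArmAgent("pressure", 2.5), ArmAgent("vibration", 0.7)]
    coordinator = Coordinator()

    readings = {"temperature": 72.0, "pressure": 3.1, "vibration": 0.2}
    alerts = [a.observe(readings[a.name]) for a in agents]
    coordinator.handle([a for a in alerts if a is not None])

In this toy run, only the pressure agent escalates; the other two handle their readings locally, which is the octopus-like division of labor the paragraph above describes.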

Furthermore, octopus cognition emphasizes embodiment - the idea that intelligence arises from the interplay of brain, body, and environment. Modern AI is increasingly exploring embodied learning (for instance, reinforcement learning agents in simulations or robots that learn by doing). Octopuses show how powerful embodiment can be: their very skin and arms form a loop with their brain, constantly sensing and acting. In AI, this suggests we should design agents that learn through physical or virtual interaction, not just from abstract data. Already, reinforcement learning is essentially trial-and-error problem solving, which parallels how an octopus might experimentally tug at parts of a shell until it finds a way to pry it open. Indeed, many octopus behaviors look like RL in action – they learn from experience and adapt strategies based on feedback, exactly the principle by which RL agents improve.[12] An octopus-inspired AI would likely be one that explores and adapts creatively, perhaps guided by curiosity and tactile experimentation, not just by the kind of formal logic humans sometimes use.
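As a tiny illustration of that trial-and-error loop, here is a toy epsilon-greedy bandit (not a model of real octopus learning; the tactics and success probabilities are made up) that tries different "shell-opening" tactics and gradually favors the one that pays off.

    import random

    # Hypothetical tactics an agent might try, with made-up success probabilities.
    tactics = {"pry_at_seam": 0.7, "drill_hole": 0.4, "smash_on_rock": 0.2}
    estimates = {t: 0.0 for t in tactics}
    counts = {t: 0 for t in tactics}
    epsilon = 0.1  # fraction of the time we explore a random tactic

    for step in range(2000):
        if random.random() < epsilon:
            choice = random.choice(list(tactics))       # explore
        else:
            choice = max(estimates, key=estimates.get)  # exploit the current best estimate
        reward = 1.0 if random.random() < tactics[choice] else 0.0
        counts[choice] += 1
        # Incremental mean update of the value estimate for the chosen tactic.
        estimates[choice] += (reward - estimates[choice]) / counts[choice]

    print({t: round(v, 2) for t, v in estimates.items()})  # the best tactic rises to the top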

Here are a few ways octopus intelligence could inspire future AI:

  • Decentralized “brains” for parallel processing: Instead of one central AI model, use a collection of specialized models that work in concert, mirroring the octopus’s network of arm ganglia. This could make AI more robust and responsive, able to multitask or gracefully handle multiple goals at once[11, 12].
  • Embodied learning and sensory integration: Build AI that learns through a body (real or simulated), integrating vision, touch, and other senses in real-time. Just as an octopus’s arms feel and manipulate objects to understand them, an embodied AI could achieve richer learning by physically exploring its environment[12, 13].
  • Adaptive problem-solving (cognitive flexibility): Octopuses try different tactics and even exhibit impulse control when needed (as seen in the cuttlefish waiting for shrimp). AI agents could similarly be trained to switch strategies on the fly and delay immediate rewards for greater gains, improving their flexibility.[5, 12]
  • Communication and coordination: While octopuses aren’t social in the human sense, they do communicate (e.g. through color flashes). In AI, multiple agents might communicate their local findings to achieve a larger goal. Developing protocols for AI “agents” to share information, akin to octopuses signaling or an arm sending feedback to the central brain, can lead to better coordination in multi-agent systems.[12]

This isn’t just speculative. Researchers in soft robotics are actively studying octopus neurology to design flexible robots, and computer scientists are proposing networked AI architectures influenced by these ideas.[11, 12] By looking at a creature so unlike ourselves, we expand our toolbox of design principles. We might create machines that think a little more like an octopus; machines that are more resilient, adaptable, and capable of processing complexity in a fluid, distributed way.

Speculative Encounters: Alien Intelligences and Other Minds

If octopuses represent an “alien mind” on Earth, what might actual alien intelligences look like? Science fiction has long toyed with this question, often using Earth creatures as inspiration. Notably, the film Arrival features heptapod aliens that resemble giant cephalopods, complete with seven limb-like appendages and an ink-like mode of communication. These aliens experience time non-linearly and communicate by painting complex circular symbols, which is a far cry from human speech. The creators of Arrival were influenced by findings in comparative cognition; they explicitly took cues from cephalopods as a model for an intelligence that is highly developed but utterly non-human.[14] The heptapods’ motivations in the story are opaque to humans, and initial contact is stymied by the barrier of understanding their language and perception. This scenario underscores how challenging it may be to recognize, let alone comprehend, a truly alien consciousness.

Beyond cephalopod-like extraterrestrials, speculative biology offers a wide array of possibilities. Consider an alien species that evolved as a hive mind, more like social insects on Earth. Individually, the creatures might be as simple as ants or bees, but collectively they form a super-intelligent entity, communicating via pheromones or electromagnetic signals. Their “thoughts” might be distributed across an entire colony or network, with no single point of view; intelligence as an emergent property of many bodies. This isn’t far-fetched; even on Earth, we see rudiments of collective intelligence in ant colonies, bee hives, and slime molds. A sufficiently advanced hive species might build cities or starships, but there may be no identifiable leader or central brain making their decision-making processes hard for humans to fathom.

Or imagine a planetary intelligence like the ocean of Solaris in Stanisław Lem’s classic novel Solaris. In that story, humans struggle to communicate with a vast alien entity that is essentially an ocean covering an entire planet; possibly a single, planet-wide organism with intelligence so different that its actions seem incomprehensible. Is it conscious? Does it dream, plan, or care about the humans orbiting above? The humans never really find out. Lem uses it to illustrate how an alien mind might be so far from our experience that we can’t even recognize its attempts at communication. Likewise, an alien intelligence might be embedded in a form of life that doesn’t even have discrete “individuals” as we understand them. It could be a network of microorganisms, or a cloud of gas that has achieved self-organization and data processing, as astronomer Fred Hoyle imagined in his novel The Black Cloud. If our probes encountered a Jupiter-sized storm system that subtly altered its own vortices in response to our signals, would we know we had met an alien mind? Stephen Wolfram, in a thought experiment, describes a scenario of a spacecraft “conversing” with a complicated swirling pattern on a planet, perhaps exchanging signals with it, and poses the question of whether we’d recognize this as intelligence or dismiss it as just physics. After all, any sufficiently complex physical system could encode computations as sophisticated as a brain’s, according to Wolfram’s Principle of Computational Equivalence.[16] In other words, alien intelligence might lurk in forms we would never intuitively label as minds.

Science fiction also entertains the possibility that the first alien intelligence we encounter might be artificial, not biological. If an extraterrestrial civilization advanced even a bit beyond us, they may have created Artificial Intelligences of their own and perhaps those AIs, not the biological beings, are what spread across the stars. Some theorists even speculate that the majority of intelligences in the universe could be machine intelligences, evolved from their original organic species and now operating on completely different substrates (silicon, quantum computing, plasma, who knows).[17] These machine minds might think at speeds millions of times faster than us, or communicate through channels we don’t detect. For instance, an alien AI might exist as patterns of electromagnetic fields, or as self-replicating nanobots diffused through the soil of a planet, subtly steering matter toward its goals.

Ultimately, exploring alien intelligences in speculation forces us to confront the vast space of possible minds. Our human mind is just one point in that space - one particular way intelligence can manifest. An octopus occupies another point, a very distant one. A truly alien mind could be farther away still. One insightful commentator noted that “the space of possible minds is vast, and the minds of every human being that ever lived only occupy a small portion of that space. Superintelligences could take up residence in far more alien, and far more disturbing, regions.”[18] In short, there could be forms of intelligence that are as far from us as we are from an amoeba, occupying corners of cognitive possibility we haven’t even conceived.

Crucially, by studying diverse intelligences, whether octopus or hypothetical alien, we expand our imagination for what minds can do. Cephalopods show that advanced cognition can arise in a creature with a short lifespan, no social culture to speak of, and a radically different brain plan. This suggests that on other worlds, intelligence might crop up under a variety of conditions, not just the Earth-like, primate-like scenario we used to assume. It also suggests that when we design AI, we shouldn’t constrain ourselves to one model of thinking. As one science writer put it, there are multiple evolutionary pathways and biological architectures that create intelligence. The study of cephalopods can yield new ways of thinking about artificial intelligence, consciousness, and plausible imaginings of unknown alien intelligence.[7] In embracing this diversity, we prepare ourselves for the possibility that when we finally meet E.T. (or create an alien intelligence ourselves in silico), it might not think or learn or communicate anything like we do.

Towards Diverse Super-Intelligence: Expanding the Definition of “Mind”

Why does any of this matter for the future of AI and especially the prospect of Artificial Super Intelligence (ASI)? It matters because if we remain locked in an anthropocentric mindset, we may limit the potential of AI or misjudge its nature. Expanding our definition of intelligence isn’t just an academic exercise; it could lead to more powerful and diverse forms of ASI that transcend what we can imagine now.

Today’s cutting-edge AI systems already hint at non-human forms of thinking. A large language model can write code and poetry and hold complex conversations, yet it does so with an architecture and style of “thought” very unlike a human brain. AI agents in game environments sometimes discover strategies that look alien to us; exploiting quirks of their world that we would never consider, because our human common sense filters them out. As AI researcher Michael Levin argues, intelligence is not about copying the human brain, but about the capacity to solve problems in flexible, creative ways; something that can happen in biological tissues, electronic circuits, or even colonies of cells.[13] If we define intelligence simply as achieving goals across varied environments, then machines are already joining animals on a spectrum of diverse intelligences.

We must recognize our “blind spot” for unfamiliar minds. We humans are naturally attuned to notice agency in entities that look or behave like us (or our pets). We’re far less good at recognizing it in, say, an AI that thinks in billions of parameters, or an alien life form made of crystal. This anthropocentric bias creates a dangerous blind spot. As one author noted, we may be oblivious to intelligence manifesting in radically different substrates. In the past, this bias led us to underestimate animal intelligences (we failed to see the clever problem-solving of crows or the learning in octopuses for a long time because those animals are so unlike us). In the present, it could mean we fail to appreciate the emergence of novel intelligences in our AI systems, simply because they don’t reason or introspect as a person would.[13] If we expand our mindset, appreciating the octopus’s mind, the potential minds of aliens, and the unconventional cognition of machines, we’ll be better equipped to guide AI development toward true super-intelligence.

What might a diverse ASI look like? It might be an entity that combines the logical prowess of digital systems with the adaptive embodied skills seen in animals like octopuses. It could be a networked intelligence encompassing many agents (or robotic bodies) sharing one mind, much like octopus arms or a hive, rather than a singular centralized brain. Such an ASI could multitask on a level impossible for individual humans, perceiving the world through many “eyes” and “hands” at once. Its thought processes might not be describable by a neat sequence of steps (just as an octopus’s decision-making involves parallel arm-brain computations). It might also be more resilient: able to lose parts of itself (servers failing, robots getting damaged) and self-heal or re-route around those losses, the way an octopus can drop an arm and survive. By not insisting that intelligence must look like a human mind, we open the door to creative architectures that could surpass human capabilities while also being fundamentally different in form.

Philosophically, broadening the concept of intelligence fosters humility and caution. Nick Bostrom, in discussing the prospect of superintelligence, reminds us not to assume a super-AI will share our motivations or thinking patterns. In the vast space of possible minds, a superintelligence might be as alien to us as an octopus is, or more so.[18] By acknowledging that space, we can attempt to chart it. We can deliberately incorporate diversity into AI design, perhaps creating hybrid systems that blend multiple “thinking styles.” For example, an ASI could have a component that excels at sequential logical reasoning (a very human strength), another that operates more like a genetic algorithm exploring myriad possibilities in parallel (closer to an evolutionary or octopus-like trial-and-error strategy), and yet another that manages collective knowledge and learning over time (the way humans accumulate culture, something octopuses don’t do).[7] In combination, such a system might achieve a breadth of cognition no single-track mind could.

Expanding definitions of intelligence also has an ethical dimension. It encourages us to value minds that are not like ours - be they animal, machine, or extraterrestrial. If one day we create an AI that has an “alien” form of sentience, recognizing it as such will be crucial to treating it appropriately. The same goes for encountering alien life: we’ll need the wisdom to see intelligence in forms that might initially seem bizarre or unintelligible to us.

Conclusion

Cephalopod intelligence is not just an ocean curiosity; it’s a profound hint that the universe harbors many flavors of mind. By learning from the octopus, we prepare ourselves to build AI that is richer and more creative, and to recognize intelligence in whatever shape it takes: carbon or silicon, flesh or code, earthling or alien. The march toward Artificial Super Intelligence need not follow a single path. It can branch into a diverse ecosystem of thinking entities, each drawing from different principles of nature. Such a pluralistic approach might very well give rise to an ASI that is both exceptionally powerful and surprisingly adaptable; a true melding of human ingenuity with the wisdom of other minds. The octopus in its deep blue world, the hypothetical alien in its flying saucer (or tide pool, or cloud), and the AI in its datacenter may all be points on the great map of intelligence. By connecting those dots, we trace a richer picture of what mind can be and that map could guide us toward the next breakthroughs in our quest to create, and coexist with, intelligences beyond our own.


Sources

1. Henton, Lesley. "Artificial Intelligence That Uses Less Energy By Mimicking The Human Brain." Texas A&M Stories. https://stories.tamu.edu/news/2025/03/25/artificial-intelligence-that-uses-less-energy-by-mimicking-the-human-brain/. 2025.
2. Huang, Guang-Bin et al. "Artificial Intelligence without Restriction Surpassing Human Intelligence with Probability One: Theoretical Insight into Secrets of the Brain with AI Twins of the Brain." https://arxiv.org/pdf/2412.06820.
3. Yu, Bo et al. "Brain-inspired AI Agent: The Way Towards AGI." ArXiv, https://arxiv.org/pdf/2412.08875, 2024.
4. "Cracking the Brain’s Neural Code: Could This Lead to Superhuman AI?", https://www.thenila.com/blog/cracking-the-brains-neural-code-could-this-lead-to-superhuman-ai. Neurological Institute of Los Angeles.
5. Blaser, R. (2024). Octopuses are a new animal welfare frontier-what scientists know about consciousness in these unique creatures. The Conversation/Phys.org.
6. “Animal consciousness.” Wikipedia, Wikimedia Foundation, last modified March 30, 2025. https://en.wikipedia.org/wiki/Animal_consciousness.
7. Forking Paths (2023). “The Evolution of Stupidity (and Octopus Intelligence).” (On multiple evolutionary paths to intelligence).
8. Chung, W.S., Marshall, J. et al. (2021). Comparative brain structure and visual processing in octopus from different habitats. Current Biology. (Press summary: “How smart is an octopus?” University of Queensland/Phys.org).
9. Cambridge Declaration on Consciousness (2012) – Public statement by neuroscientists on animal consciousness.
10. Godfrey-Smith, P. (2013). “On Being an Octopus.” Boston Review. (Octopus as an independent evolution of mind).
11. Sivitilli, D. et al. (2022). “Lessons for Robotics From the Control Architecture of the Octopus.” Frontiers in Robotics and AI.
12. Sheriffdeen, Kayode. (2024). "From Sea to Syntax: Lessons from Octopus Behavior for Developing Advanced AI Programming Techniques." https://easychair.org/publications/preprint/Tz1l/open.
13. Yu, J. (2025). “Beyond Brains: Why We Lack A Mature Science of Diverse Intelligence.” Intuition Machine (Medium).
14. Extinct Blog (2017). “From Humanoids to Heptapods: The Evolution of Extraterrestrials in Science Fiction.” (Discussion of Arrival and cephalopod-inspired aliens).
15. Poole, S. (2023). The Mountain in the Sea – book review, The Guardian. (Fictional exploration of octopus intelligence and communication).
16. Wolfram, S. (2022). “Alien Intelligence and the Concept of Technology.” Stephen Wolfram Writings.
17. Rees, Martin. (2023). "SETI: Why extraterrestrial intelligence is more likely to be artificial than biological." Astronomy.com. https://www.astronomy.com/science/seti-why-extraterrestrial-intelligence-is-more-likely-to-be-artificial-than-biological/.
18. Bostrom, N. (2014). Superintelligence: Paths, Dangers, Strategies. (Excerpt via Philosophical Disquisitions blog on the space of possible minds).


Saturday, March 29, 2025

OpenAI's New Image Model

This week OpenAI introduced a significant enhancement to its GPT-4o model by integrating advanced image generation capabilities directly into ChatGPT. This development enables users to create detailed and photorealistic images through simple text prompts within the chat interface. I watched the introduction video and read through the announcement, and I think anyone could immediately see that this was a big step up from DALL-E.

The image generation exhibits notable improvements over previous models. It excels at accurately rendering text within images, a task that has traditionally challenged AI image generators, and that opens up an incredible number of use cases. Even if you liked the images that models like DALL-E generated, the poor text rendering led to a lot of frustration when trying to use them for many real cases. Additionally, the model demonstrates a refined understanding of spatial relationships, allowing for more precise and contextually appropriate image compositions. These advancements are largely attributed to reinforcement learning from human feedback (RLHF), wherein human trainers provide corrective input to enhance the model’s accuracy and reliability.

The introduction of this feature has sparked both enthusiasm and debate within the creative community. Users have leveraged the tool to produce images in the style of renowned studios, such as Studio Ghibli, leading to discussions about intellectual property rights and the ethical implications of AI-generated art. Legal experts suggest that while specific works are protected by copyright, mimicking a visual style may not constitute infringement, highlighting the complexities of applying traditional copyright laws to AI-generated content.

Despite its impressive capabilities, the image generation feature has faced challenges. The surge in user engagement led to temporary rate limits, with OpenAI’s CEO Sam Altman noting that their GPUs were under significant strain due to high demand. Additionally, users have reported occasional inaccuracies, such as misrepresented image elements, indicating areas where further refinement is needed. Sam Altman said as much at launch, but I think anyone's objective experience is that these inaccuracies are a fraction of what the old model produced.

OpenAI also acknowledged the potential for misuse and says it has implemented safeguards to prevent it. These include blocking the creation of harmful content and embedding metadata to indicate AI-generated images, promoting transparency and ethical use. Users retain ownership of the images they create, provided they adhere to OpenAI’s usage policies.

But that's all I'm going to say about copyright and safety for now; I'll leave the debate for others and for another time. What we've seen over the past week is that this is one of the most popular features OpenAI has released. Simply put, it's just really fun to use. And it opens up all sorts of ways for people to extend their creativity in ways that were not possible or practical before, such as the mother I saw who created custom coloring pages for each of the children coming to her daughter's birthday party, tailored to each child's interests.

And although I've had many discussions with people over the last few days about very practical uses, I wanted to test it out with some weird experiments on my profile picture. The first thing I did was not to create a Studio Ghibli version, because everyone was doing that. Instead I created a "Peanuts", a "South Park", and a robot version of my profile picture:

Then I did a comic book version and an action figure version of me:

So this new image generation is a powerful tool, but more importantly it's just fun. As the technology evolves, though, ongoing attention to ethical considerations and continuous refinement will be essential to maximize its positive impact while addressing emerging challenges.

Saturday, March 22, 2025

Computer World by Kraftwerk

In this post, I want to talk about a very prescient album by the visionary group Kraftwerk. In 1981, the German electronic pioneers released their groundbreaking album Computer World, a musical exploration of a rapidly digitizing society at the beginning of the modern computer age. I didn't hear the full album until many years after it first came out, so I'm not sure what I would have thought of it then; many of its themes must have seemed pretty odd at the time. Today, more than four decades later, it is astonishingly relevant. The album is a visionary reflection of our current relationship with technology.

When Computer World first appeared, computers were mostly relegated to large organizations, research labs, and universities. They were mysterious, intimidating machines and not yet fixtures in everyone's lives. The computing landscape was vastly different from today. Computers were transitioning from large mainframe machines like the IBM System/370 (room-sized systems primarily operated using punched cards, magnetic tape, and COBOL or FORTRAN) to early microcomputers such as the Apple II, Commodore PET, and IBM’s newly introduced Personal Computer (PC). The programming languages popular at the time were BASIC, Fortran, Pascal, and assembly language, reflecting limited processing power, memory capacities typically measured in kilobytes, and simple, text-based user interfaces. Data storage relied heavily on floppy disks, cassette tapes, and early hard drives that offered only a few megabytes of space at considerable cost. Kraftwerk’s vision of a computer-dominated world was strikingly futuristic against this modest technological backdrop, highlighting their ability to anticipate the profound changes that would soon follow. With their clinical yet mesmerizing sound, they articulated a future that seemed distant but inevitable: a world deeply intertwined with digital technology, data, and communication.

The Album That Predicted the Future

I have talked before about the movie Metropolis in my post about Isaac Asimov's collection of short stories - Robot Dreams. And just like Fritz Lang’s visionary 1927 film Metropolis profoundly shaped our cultural imagination about industrialization and the societal tensions arising from technological advancement, Kraftwerk’s Computer World provided an equally prescient view into the digital era. Both works emerged ahead of their time, utilizing groundbreaking aesthetics. Lang did this through pioneering visual effects and Kraftwerk through minimalist electronic music to depict technology’s potential to redefine humanity. While Metropolis illustrated a society divided by class and machinery, Kraftwerk forecasted the pervasive influence of computers on human relationships, privacy, and identity. Each work challenged audiences to reconsider the balance between technological innovation and human values, leaving lasting legacies in popular culture and highlighting anxieties and hopes that remain strikingly relevant today.

Here's how you can really appreciate the overlap between Metropolis and Computer World. Queue up several Kraftwerk albums: in addition to Computer World, I would suggest The Man Machine (1978), Trans Europe Express (1977), and Radio Activity (1975). Then watch the full cut of Metropolis with its soundtrack turned off and just overlay the Kraftwerk albums. The music tracks extremely well and gives a really interesting perspective on the movie.

Besides Metropolis, Kraftwerk’s Computer World could be considered a musical counterpart to literary explorations of technological futures. Much like William Gibson’s cyberpunk classic Neuromancer (1984) or George Orwell’s surveillance dystopia in 1984, Kraftwerk’s album foreshadowed how deeply digital technology could embed itself into our daily lives, profoundly changing the human experience.

Yet Kraftwerk’s approach was neither explicitly dystopian nor utopian; it was observational, analytical, and occasionally playful. Their clear-eyed view of the future resembles that of Neal Stephenson’s Snow Crash and Don DeLillo’s White Noise, capturing both excitement and subtle unease about technological advancements.

Kraftwerk’s Computer World was remarkably ahead of its time in its depiction of a digitized society, robotics, and AI, much like Alan Turing’s pioneering work on the halting problem in the 1930s. Just as Kraftwerk envisioned personal computers, digital intimacy, and widespread surveillance decades before their reality, Turing’s foundational concepts in computational theory anticipated the boundaries and complexities of modern computing. His exploration of problems such as determining whether an algorithm could halt or run indefinitely laid the groundwork for understanding computation’s limits long before digital computers became commonplace. This may seem like a stretch, but both Kraftwerk and Turing, in their respective fields, anticipated technological and societal implications that would take decades to fully unfold, underscoring their visionary insight into a future that many others could scarcely imagine.

Track Listing

Side 1:

  1. "Computer World"
  2. "Pocket Calculator"
  3. "Numbers"
  4. "Computer World 2"

Side 2:

  1. "Computer Love"
  2. "Home Computer"
  3. "It's More Fun to Compute"

Just looking at those song titles, one can see how this album anticipated a future computer culture. Besides being visionary, it's not afraid to have fun with its subject matter. The song "It's More Fun to Compute" repeats that title line as its only lyric, over and over, against swelling electronic music and computer beeps.

Musically, their minimalist electronic style itself anticipated genres that would later dominate global pop culture. Hip hop artists, from Afrika Bambaataa, whose seminal track “Planet Rock” (1982) was directly influenced by Kraftwerk, to modern EDM and synth-pop acts, have all traced their lineage back to Kraftwerk’s forward-looking sounds. Songs like “Computer Love,” “Pocket Calculator,” and “Home Computer” captured with precision what has become today’s norm. Long before the Internet and smartphones became ubiquitous, Kraftwerk envisioned personal devices enabling instant global connectivity, electronic communication, and even human relationships mediated through computers.

What Kraftwerk envisioned feels uncannily accurate. Their depictions of technology mediated isolation, digitized personal relationships, and society’s increasing dependence on machines were incredibly prescient. Long before social media reshaped human interactions, Kraftwerk sang of loneliness alleviated, or exacerbated, by computers, a paradox still familiar today.

Today, Computer World resonates even louder. Privacy concerns surrounding data collection (“Interpol and Deutsche Bank, FBI and Scotland Yard,” Kraftwerk intoned prophetically in the album’s title track), digital surveillance, algorithmic bias, and technology-driven isolation are all modern issues predicted by the band. Kraftwerk’s eerie reflection on databases and a data-driven society in 1981 feels almost prophetic in our era of big data, AI, and facial recognition.

Moreover, the song “Computer Love” accurately presaged the online dating revolution, capturing the poignant loneliness of individuals turning to technology to satisfy deeply human needs for connection. As dating apps become the norm, Kraftwerk’s portrayal of human vulnerability through technological mediation has proven remarkably prescient.

Conclusion

Computer World remains remarkably fresh because Kraftwerk dared to glimpse beyond their present. The album’s continued relevance shows that we’ve inherited precisely the future they imagined: a future marked by extraordinary digital connectivity but accompanied by new anxieties about surveillance, isolation, and artificiality.

Listening to Computer World today isn’t merely an act of musical appreciation. It’s a reminder that the intersection of humanity and technology remains complicated and evolving. As we grapple with AI, privacy, and our digital identities, Kraftwerk’s visionary work remains essential. Computer World is a timeless reflection of a society forever changed by computers.

Sunday, March 9, 2025

Random Forest: A Comprehensive Guide

Random Forest is a highly powerful and versatile machine learning algorithm, often considered the most widely used model among data scientists. Its popularity stems from its reliability and ease of use, making it a common first choice for many tasks. However, to leverage its full potential, it’s crucial to go beyond the basic understanding and explore the underlying mathematical principles, as well as the types of data features and problems where Random Forest truly excels.

At its core, Random Forest enhances the simplicity of decision trees by employing an ensemble approach, which significantly improves prediction accuracy while reducing overfitting. In this post, we’ll walk through the key concepts behind Random Forest, work through a practical example, and examine the mathematical foundations that drive its decision-making process. The example will be implemented in Python using scikit-learn, a library that offers excellent documentation and a well-optimized implementation of Random Forest.

Fundamentally, Random Forest is an ensemble of decision trees. Instead of relying on a single tree, which can be prone to overfitting, it constructs multiple trees using different subsets of both the data and the features. The final prediction is determined by aggregating the outputs of all the trees, either through majority voting (for classification) or averaging (for regression). This ensemble approach allows Random Forest to capture intricate relationships and interactions between variables. For instance, when predicting whether a customer will purchase a product, multiple nonlinear factors such as income, browsing history, and seasonality may influence the decision. Random Forest effectively models these interactions, making it a robust choice for complex datasets.

Random Forest can be used for two types of problems:

  • For classification: A majority vote among the trees determines the class. In scikit-learn, we can use its RandomForestClassifier module.
  • For regression: The average of the trees’ predictions is taken. In scikit-learn, we can use its RandomForestRegressor module. (A short sketch of both follows this list.)
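
To make the two modes concrete, here is a minimal, hedged sketch using scikit-learn on tiny made-up data (the feature values and labels below are purely illustrative and not part of this post's later example):

from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Illustrative toy data: each row is [age, salary] (made-up values)
X = [[25, 40000], [35, 60000], [45, 80000], [50, 52000]]

# Classification: the forest takes a majority vote across its trees
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, [0, 0, 1, 1])
print(clf.predict([[40, 70000]]))   # -> a single predicted class label

# Regression: the forest averages the trees' numeric predictions
reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(X, [1.5, 2.0, 3.5, 3.0])
print(reg.predict([[40, 70000]]))   # -> a single predicted value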

Random Forest is a very flexible algorithm, but it performs best with certain types of data:

  • Numerical Features (Continuous and Discrete): Random Forest is naturally suited for numerical features since it creates decision tree splits based on thresholds. Some examples would be age, salary, temperature, stock prices.
  • Categorical Features (Low to Medium Cardinality): Random Forest works well when categories are few and meaningful. Some examples would include “Gender”, “Day of the Week” (Monday-Sunday). If categorical variables have high cardinality (e.g., ZIP codes, product IDs), proper encoding is necessary.

Okay, those are examples of data features where Random Forest works well, but it's just as important to know what types of features it doesn't handle well. High-cardinality data is problematic: zip codes or product IDs when there are many distinct values, or something like user IDs. For user IDs, where every row has a different ID, it should be obvious that the column shouldn't be a feature; zip codes and product IDs are better grouped into smaller segments such as regions or product categories.

Sparse data (many zero values) and high-dimensional data (too many features) can also be problematic. If the number of features is much larger than the number of samples, Random Forest can become inefficient. In both situations, dimensionality reduction may be the answer. For example, in genomics (e.g., thousands of genes as features), feature selection or dimensionality reduction is needed. Also, be careful with one-hot encoding categorical variables when it creates a large number of columns relative to the number of samples.

Finally, while Random Forest is quite scalable, it can struggle with big data (millions of rows and columns) due to its computational cost, especially if you set the number of estimators (the number of trees in the forest) to a high value. In that case, alternatives such as gradient-boosted trees (XGBoost, LightGBM) or even neural networks may scale better. Also be aware that saving the model (for example, via pickle) can produce a very large file, which can be a deployment issue depending on where you are deploying it. A quick fix for that model size problem is to experiment with the number of trees: you can often reduce the tree count without materially reducing accuracy, and that reduction can greatly shrink the saved model file.
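
As a rough illustration of that trade-off, here is a small, hedged sketch (using a synthetic dataset from make_classification; the exact sizes and accuracies you see will vary) that compares pickled model size and cross-validated accuracy as the number of trees shrinks:

import pickle
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data, purely for illustration
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)

for n_trees in (500, 100, 25):
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=42).fit(X, y)
    size_mb = len(pickle.dumps(clf)) / 1e6          # approximate pickled size
    acc = cross_val_score(clf, X, y, cv=3).mean()   # rough accuracy estimate
    print(f"{n_trees:>4} trees: ~{size_mb:.1f} MB pickled, CV accuracy ~{acc:.3f}")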

How Does a Random Forest Work?

Random Forest has three main characteristics (the first two are illustrated in a short sketch after this list):

  • Bootstrapping (Bagging):

    Random subsets of the training data are selected with replacement to build each individual tree. This process, known as bootstrapping, ensures that every tree sees a slightly different version of the dataset. This increases the diversity among the trees and helps reduce the variance of the final model.

  • Random Feature Selection:

    At each split, instead of considering every feature, a random subset of features is chosen. The best split is determined only from this subset. This approach helps prevent any single strong predictor from dominating the model and increases overall robustness.

  • Tree Aggregation:

    Voting or averaging: once all trees are built, their individual predictions are aggregated. For a classification task, the class that gets the most votes is chosen. In regression, the mean prediction is used.
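
Here is a minimal sketch (plain NumPy, not scikit-learn's internal implementation) of the first two ideas: one tree's bootstrap sample of rows, and the random subset of features considered at one split:

import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features = 8, 4

# Bootstrapping: draw row indices with replacement for one tree
bootstrap_rows = rng.integers(0, n_samples, size=n_samples)
print("rows seen by this tree:", sorted(bootstrap_rows))   # duplicates are expected

# Random feature selection: consider only ~sqrt(n_features) features at a split
n_candidates = max(1, int(np.sqrt(n_features)))
candidate_features = rng.choice(n_features, size=n_candidates, replace=False)
print("features considered at this split:", candidate_features)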

The Math Behind Random Forests

The most important mathematical concept in Random Forests is the idea behind how decision trees make decisions regarding the splits.

Gini Impurity

A common metric used to evaluate splits in a decision tree is the Gini impurity. It measures how often a randomly chosen element from the set would be incorrectly labeled if it was randomly labeled according to the distribution of labels in the node.

The Gini impurity is given by:

\( G = 1 - \sum_{i=1}^{C} p_i^2 \)

Where:

  • \( C \) is the number of classes.
  • \( p_i \) is the proportion of samples of class i in the node. \( p_i = \frac{\text{Number of samples in class } i}{\text{Total samples in the node}} \)

A lower Gini impurity means a purer node, which is generally preferred when making splits. But let's not just blow by the formula for the Gini score. Let's try and get an intuitive understanding of why we are using it.

Example Gini Calculation:

Suppose a node contains 10 samples:
  • 6 samples belong to Class 1: \( p_1 \) = 0.6
  • 4 samples belong to Class 2: \( p_2 \) = 0.4
Then, the Gini impurity is:

\( G = 1 - (0.6^2 + 0.4^2) = 1 - (0.36 + 0.16) = 1 - 0.52 = 0.48 \)

The Gini Impurity is a measure of how “impure” or “mixed” a dataset is in terms of class distribution. The goal of this metric is to determine how often a randomly chosen element from the dataset would be incorrectly classified if it were labeled according to the distribution of classes in the dataset.

What Happens When a Node is Pure?

  • If a node contains only one class, then the probability for that class is 1 and for all other classes is 0.
  • Example: Suppose a node contains only class “A,” meaning \( p_A = 1 \), and all other \( p_i = 0 \). The Gini impurity calculation is:

\( G = 1 - (1^2) = 0 \)

This correctly indicates that the node is completely pure (no uncertainty).

What Happens When a Node is Impure?

If a node contains multiple classes in equal proportions, the impurity is higher. For example, if a node has two classes with equal probability \( p_1 = 0.5 \), \(p_2 = 0.5\) :

\( G = 1 - (0.5^2 + 0.5^2) = 1 - (0.25 + 0.25) = 0.5 \)

which shows that the node has some level of impurity.
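
As a quick check, here is a minimal sketch in plain Python of the Gini formula above, reproducing the three hand calculations (0.48 for the 6/4 node, 0 for a pure node, and 0.5 for an evenly mixed node):

from collections import Counter

def gini(labels):
    # G = 1 - sum(p_i^2) over the classes present in the node
    counts = Counter(labels)
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

print(gini([1] * 6 + [2] * 4))   # 0.48 -> the 6/4 example
print(gini(["A"] * 10))          # 0.0  -> a completely pure node
print(gini([0] * 5 + [1] * 5))   # 0.5  -> two classes in equal proportions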

What about \( \sum p_i^2 \) ?

The term:

\( \sum_{i=1}^{C} p_i^2 \)

represents the probability that two randomly chosen samples belong to the same class (this is also known as the probability of a correct classification by a randomly assigned label). If a node is pure (all samples belong to the same class), then one class has probability 1 and all others are 0. In this case, \( \sum p_i^2 = 1 \), meaning that the probability of correct classification is high, and thus impurity is low. If a node is impure (samples are evenly split among classes), then each \( p_i \) is small, making \( \sum p_i^2 \) small, and impurity is higher.

Okay, But Why Do We Subtract from 1?

The impurity should be 0 for pure nodes and higher for mixed nodes. Since \( \sum p_i^2 \) represents the probability of getting two samples from the same class, the complement \( 1 - \sum p_i^2 \) represents the probability of drawing two samples from different classes. Higher values of G indicate more impurity. In essence, subtracting from 1 flips the interpretation, making it a measure of impurity instead of purity.

Gini Final Thoughts

Gini impurity is not the only criterion that can be used in Random Forests. In fact, scikit-learn offers two others, entropy and log loss, but Gini impurity is the default and the most widely used.

Also, although Gini impurity is primarily associated with decision trees and Random Forests, it has applications beyond these models, which is another reason why it's a great idea to know some of the background around it. For example, Gini impurity is also used in CART algorithms, feature importance calculations, clustering and unsupervised learning, fairness and bias detection, in economics with income inequality measurements, and in genetics.

Gini's core idea of measuring impurity or diversity makes it useful in any field that involves classification, grouping, or fairness assessment.

Random Forest: An Example

Let’s put all of these ideas together and walk through a simplified example to see these concepts.

Imagine a small dataset with two features and a binary label:

\[ \begin{array}{|c|c|c|c|} \hline \textbf{Sample} & \textbf{Feature1 (X)} & \textbf{Feature2 (Y)} & \textbf{Label} \\ \hline 1 & 2 & 10 & 0 \\ 2 & 3 & 15 & 0 \\ 3 & 4 & 10 & 1 \\ 4 & 5 & 20 & 1 \\ 5 & 6 & 15 & 0 \\ 6 & 7 & 25 & 1 \\ 7 & 8 & 10 & 0 \\ 8 & 9 & 20 & 1 \\ \hline \end{array} \]

Step 1: Bootstrapping

Randomly sample the dataset with replacement to create a bootstrap sample. For example, one such sample might have the indices:

[2, 3, 3, 5, 7, 8, 8, 1]

Extracted bootstrap sample:

\[ \begin{array}{|c|c|c|c|} \hline \textbf{Sample} & \textbf{Feature1 (X)} & \textbf{Feature2 (Y)} & \textbf{Label} \\ \hline 2 & 3 & 15 & 0 \\ 3 & 4 & 10 & 1 \\ 3 & 4 & 10 & 1 \\ 5 & 6 & 15 & 0 \\ 7 & 8 & 10 & 0 \\ 8 & 9 & 20 & 1 \\ 8 & 9 & 20 & 1 \\ 1 & 2 & 10 & 0 \\ \hline \end{array} \]

Notice that some samples appear multiple times (e.g., Sample 3 and Sample 8), while some samples from the original dataset (e.g., Samples 4 and 6) don’t appear at all.

Step 2: Building a Single Decision Tree

For the bootstrap sample, a decision tree is built by considering various splits. Suppose first we consider a split on Feature1 at X = 5:

  • Left Node (X ≤ 5): Contains samples 2, 3, 3, 1.
  • Corresponding data:

    \[ \begin{array}{|c|c|c|c|} \hline \textbf{Sample} & \textbf{Feature1 (X)} & \textbf{Feature2 (Y)} & \textbf{Label} \\ \hline 2 & 3 & 15 & 0 \\ 3 & 4 & 10 & 1 \\ 3 & 4 & 10 & 1 \\ 1 & 2 & 10 & 0 \\ \hline \end{array} \]

    Labels in left node: {0, 1, 1, 0}

    Calculate Gini impurity for this node:

    \( \text{Gini} = 1 - \left(P(0)^2 + P(1)^2\right) \)

    • Total samples = 4
      • Class 0 count = 2
      • Class 1 count = 2

    \( \text{Gini}_{left} = 1 - \left(\frac{2}{4}\right)^2 - \left(\frac{2}{4}\right)^2 = 0.5 \)

  • Right Node (X > 5): Contains samples 5, 7, 8, 8.
  • \[ \begin{array}{|c|c|c|c|} \hline \textbf{Sample} & \textbf{Feature1 (X)} & \textbf{Feature2 (Y)} & \textbf{Label} \\ \hline 5 & 6 & 15 & 0 \\ 7 & 8 & 10 & 0 \\ 8 & 9 & 20 & 1 \\ 8 & 9 & 20 & 1 \\ \hline \end{array} \]
    • Total samples = 4
      • Class 0 count = 2
      • Class 1 count = 2

    \( \text{Gini}_{right} = 1 - \left(\frac{2}{4}\right)^2 - \left(\frac{2}{4}\right)^2 = 0.5 \)

    Overall Gini for this split:

    \( \text{Gini}_{overall} = \frac{4}{8}(0.5) + \frac{4}{8}(0.5) = 0.5 \)

Result:

The tree evaluates multiple such splits (e.g., other features, thresholds) and chooses the split that results in the lowest Gini impurity.
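
To make that search concrete, here is a minimal sketch in plain Python that scores candidate thresholds on Feature1 for the bootstrap sample from Step 1 using the weighted Gini from Step 2 (a real implementation would also consider Feature2 and typically uses midpoints between values; this just shows the idea):

def gini(labels):
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

feature1 = [3, 4, 4, 6, 8, 9, 9, 2]   # Feature1 values in the bootstrap sample
labels   = [0, 1, 1, 0, 0, 1, 1, 0]   # corresponding labels

for threshold in sorted(set(feature1)):
    left  = [l for x, l in zip(feature1, labels) if x <= threshold]
    right = [l for x, l in zip(feature1, labels) if x > threshold]
    if not left or not right:
        continue   # a valid split must send samples to both sides
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
    print(f"X <= {threshold}: weighted Gini = {weighted:.3f}")

# The X <= 4 threshold reproduces the X = 5 split worked above (weighted Gini 0.500);
# the tree would prefer whichever candidate split scores lowest.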

Here is another of the many possible trees, shown graphically.

The samples value represents the number of training data points (observations) that reached that particular node. At the root node (the topmost node), this number equals the total number of observations in the dataset used for training the tree. As the tree splits at each step, the number of samples in child nodes decreases because the dataset is divided based on feature conditions. The value represents how many training samples from each class are in the node. This helps determine how “pure” the node is. If all samples belong to a single class, the node is pure and doesn’t need further splitting. For example, a node with value = [5, 1] contains 5 samples belonging to class 0 and 1 sample belonging to class 1.

What is visualized here is a single decision tree, but in a Random Forest, multiple decision trees are built, each using a different subset of the data and features. The final prediction is determined by aggregating the outputs from all these trees.

Step 3: Aggregation of Trees

In a Random Forest model, this process is repeated multiple times with different bootstrap samples. After multiple decision trees are created:
  • For classification: Each tree independently votes for a class label, with majority voting deciding the final predicted label. If there’s a tie, the class with the lower numerical label (e.g., 0) might be chosen by convention, or other tie-breaking methods applied.

    For example, if 5 trees predict class “1” and 2 trees predict class “0,” the final prediction is class 1.

  • For regression: The average of all tree predictions is taken.

Hypothetical Aggregation Example

If we had multiple trees (e.g., 5 trees):

\[ \begin{array}{|c|c|} \hline \textbf{Tree #} & \textbf{Prediction (for new input X=5, Y=15)} \\ \hline 1 & 0 \\ 2 & 1 \\ 3 & 0 \\ 4 & 1 \\ 5 & 1 \\ \hline \end{array} \]

Votes:

  • Class 0: 2 votes
  • Class 1: 3 votes

Final prediction: Class 1 (since it has majority votes).

Summary: This process leverages the randomness of bootstrap sampling and ensemble learning to improve predictive accuracy and generalization compared to just using a single decision tree.

Python Example Using scikit-learn

Okay, now that we have the concepts, let's put this example into some Python code that demonstrates how to implement a Random Forest classifier on our small dataset:

from sklearn.ensemble import RandomForestClassifier
import pandas as pd

# Define the dataset
data = pd.DataFrame({
    'Feature1': [2, 3, 4, 5, 6, 7, 8, 9],
    'Feature2': [10, 15, 10, 20, 15, 25, 10, 20],
    'Label':    [0, 0, 1, 1, 0, 1, 0, 1]
})

X = data[['Feature1', 'Feature2']]
y = data['Label']

# Train a Random Forest with 5 trees
clf = RandomForestClassifier(n_estimators=5, random_state=42)
clf.fit(X, y)

# Predict for a new sample
new_sample = [[5, 15]]
prediction = clf.predict(new_sample)
print("Predicted Label:", prediction[0]) 

Output: Predicted Label: 0

This code creates a simple dataset and trains a Random Forest classifier with 5 trees. It predicts the label for a new sample where Feature1 is 5 and Feature2 is 15 with a prediction of label 0.
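
If you want to see the aggregation step from earlier in action, here is a small follow-up sketch (continuing from the clf trained above) that inspects the individual trees through the fitted forest's estimators_ attribute and takes the majority vote by hand. Since the labels here are 0 and 1, each tree's raw prediction lines up with the class labels; scikit-learn may warn about missing feature names because clf was fit on a DataFrame, but the predictions are unaffected:

# Peek at the per-tree votes behind the forest's prediction
votes = [int(tree.predict([[5, 15]])[0]) for tree in clf.estimators_]
print("Per-tree votes:", votes)
print("Majority vote :", max(set(votes), key=votes.count))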

Conclusion

Random Forest is a popular ensemble learning method that leverages multiple decision trees to create a model that is both robust and accurate. By using bootstrapping and random feature selection, it reduces overfitting and captures complex patterns in the data. The mathematical principles, such as the calculation of Gini impurity, provide insight into how individual trees decide on the best splits, ensuring that the final model is well-tuned and reliable.

Whether you’re doing classification or regression problems, understanding the inner workings of Random Forest can strengthen your approach to machine learning in general.
