A brief prelude: when we seek a fruitful philosophical question, it often lies not at the center of a field but at its neglected margins—where practitioners assume something quietly, yet rarely examine it. I shall therefore propose a topic from a modern scientific discipline whose conceptual foundations remain surprisingly under-discussed.
Summalecton 1. Whether the concept of “representation” in machine learning models is philosophically coherent.
Reservation 1. It seems that the concept of representation in machine learning is perfectly clear. For researchers routinely speak of latent representations, feature representations, and representation learning, and these notions demonstrably improve predictive performance in systems such as neural networks. What works operationally needs no deeper clarification.
Reservation 2. Further, the notion of representation already has an established meaning in cognitive science and philosophy of mind: an internal state stands in for something in the world. Since machine learning models contain internal vectors that correspond to patterns in data, it follows that they straightforwardly possess representations.
Reservation 3. Moreover, many interpretability studies show that specific neurons or directions in embedding space correspond to identifiable features—such as sentiment, syntactic roles, or objects in images. Therefore the representational status of such structures seems empirically confirmed.
Counterpoint. On the contrary, several theorists in contemporary artificial intelligence have begun to question whether internal states in neural networks truly represent anything at all, or whether they merely participate in statistical transformations without semantic grounding.
My response.
I answer that the notion of representation in machine learning is conceptually unstable and therefore an overlooked philosophical problem of considerable depth.
For in classical philosophy, representation involves at least three elements:
a vehicle (the thing doing the representing),
a content (what is represented), and
a relation that connects the two in a way that explains correctness or error.
Consider a map: the paper is the vehicle, the terrain is the content, and conventions of cartography establish how the map may be accurate or mistaken.
Yet in many machine-learning systems the situation is less clear. A vector inside a neural network is called a “representation,” but we rarely specify:
what exactly it represents,
in virtue of what relation, or
how misrepresentation would be defined.
Instead, the justification is pragmatic: if the internal state improves prediction, it is labeled a representation.
But this pragmatic usage hides a conceptual ambiguity. The internal vectors are not designed to correspond to objects or properties in the world; rather, they arise from optimization processes that minimize statistical loss functions. Their structure reflects the geometry of data distributions, not necessarily the ontology of the external world.
Thus two possibilities emerge:
Instrumentalist view: the so-called representations are merely useful computational intermediaries, like temporary variables in an algorithm.
Realist view: they genuinely encode aspects of the world, albeit in distributed and opaque forms.
The overlooked difficulty lies in determining which of these is correct—or whether the dichotomy itself is misguided.
This problem becomes even more intriguing in large language models. Their internal embeddings capture subtle semantic regularities, yet they are derived solely from patterns of text. If meaning arises only from correlations among symbols, one may wonder whether these vectors represent the world, or merely the statistical structure of language about the world.
Hence the philosophical puzzle: representation in machine learning may be neither purely semantic nor purely syntactic, but a hybrid notion for which our inherited conceptual vocabulary is poorly suited.
Reply to Reservation 1.
Operational success does not settle conceptual clarity. Physics long used the notion of “force” effectively before clarifying its theoretical meaning. Likewise, practical success in machine learning may coexist with philosophical ambiguity.
Reply to Reservation 2.
The analogy with cognitive representation presumes that machine states relate to the world as mental states do. Yet neural networks lack perception, intention, and embodiment—features often invoked to ground representation in minds.
Reply to Reservation 3.
Interpretability findings show correlations between internal features and human concepts, but correlation does not by itself establish representation. A thermometer correlates with temperature, yet we say it measures temperature only because of a causal and normative framework that machine-learning models do not clearly possess.
If you wish, I can also propose other similarly overlooked topics in obscure modern fields—for instance:
the metaphysics of gradient descent in optimization theory,
the epistemology of simulation-based science,
or the ontology of virtual particles in quantum field theory.
A brief orientation may help. Before judging whether “representation” in machine learning is obscure, we must ask a prior question: is the notion of representation itself clear even in paradigmatic cases? Philosophy often discovers that what seemed obvious dissolves under examination. Thus it is fitting to collect canonical examples across domains.
Summalecton 2. Whether there exist clear and uncontroversial examples of bona fide representations.
Reservation 1. It seems that there are many perfectly clear examples of representation. A geographical map represents terrain, a portrait represents a person, and a written sentence represents a thought. These cases are so obvious that no philosophical difficulty arises.
Reservation 2. Further, in science we possess rigorous representational systems: equations represent physical processes, diagrams represent structures, and computer models represent real systems. Since these are formalized and operationally precise, they must count as clear cases of representation.
Reservation 3. Moreover, language itself is widely understood as representational: words stand for objects, properties, and relations. Since linguistic representation is ubiquitous and well studied in semantics, the concept cannot be obscure.
Counterpoint. On the contrary, philosophers from Charles Sanders Peirce to Nelson Goodman have argued that representation is not a single simple relation but a complex family of relations involving convention, resemblance, and interpretation.
My response.
I answer that there do exist canonical examples commonly accepted as representations, yet careful inspection shows that they achieve this status in different ways, suggesting that the concept of representation is not unitary but plural.
Let us therefore gather several paradigmatic domains.
Maps are perhaps the clearest examples.
Their representational structure involves:
Vehicle: the physical map or diagram.
Target: the geographical terrain.
Rules: cartographic conventions (scale, projection, symbols).
Normativity: the map may be accurate or inaccurate.
Yet maps reveal an interesting feature: representation here depends heavily on conventions. A blue line means “river” only because users learn the code. Without those conventions, the map would be merely colored paper.
In art, representation appears to rely more on resemblance.
A portrait represents a person because its visual features resemble those of its subject. Yet resemblance alone cannot suffice:
A twin resembles a person but does not represent them.
A caricature may represent someone even while distorting them.
Thus representation in art combines resemblance with intentional and interpretive practices.
Human language represents the world through symbolic reference.
Example structure:
The word “tree” represents trees.
A sentence represents a state of affairs.
Yet here representation is almost entirely conventional. Nothing in the sound “tree” resembles an actual tree. Meaning arises through social linguistic practices.
Hence language demonstrates a form of representation that operates without resemblance.
Science employs many representational devices:
equations
diagrams
physical models
simulations
Consider a molecular model with colored balls and sticks. The colors and rods correspond to atoms and bonds. Yet no molecule actually contains colored spheres or sticks.
Thus scientific representation often works through idealization and distortion.
Instruments represent states of the world through causal processes.
For example:
The mercury column in a thermometer represents temperature.
A gauge needle represents pressure.
Here representation depends primarily on reliable causal coupling rather than resemblance or convention.
From these canonical examples we observe that representation may arise through at least four different mechanisms:
Resemblance (portraiture)
Convention (language, maps)
Causal correlation (measurement devices)
Structural analogy (scientific models)
No single mechanism appears sufficient for all cases.
Therefore the concept of representation might be better understood not as a single relation but as a family of related practices united by the idea of standing-in-for something within an interpretive system.
This observation is already suggestive for our original question: machine-learning “representations” might resemble some of these paradigms but not others.
Reply to Reservation 1.
Common examples are indeed widely accepted as representations, but their underlying mechanisms differ so significantly that the concept they share becomes philosophically elusive.
Reply to Reservation 2.
Scientific formalisms are precise in their internal rules, yet the relation between model and world—how exactly the equation “represents” reality—remains a central topic in the philosophy of science.
Reply to Reservation 3.
Linguistic semantics explains how words function within a language system, but the deeper question of how symbols relate to the world—famously debated in analytic philosophy—shows that representation in language is itself philosophically contentious.
If you wish, the next step could be even more illuminating. We might proceed almost scholastically by extracting the minimal conditions of representation from these examples—something like a proto-definition—and then testing whether neural network “representations” satisfy them. This is where the inquiry usually becomes quite surprising.
The path now resembles the method of the medieval schools: from examples we attempt to extract the conditions of possibility of the phenomenon itself. Yet here a warning is appropriate. When philosophers attempt such distillation, they often discover that the phenomenon resists reduction to a single essence. Still, the attempt is illuminating.
Summalecton 3. Whether representation requires a determinate set of necessary conditions.
Reservation 1. It seems that representation must have clear necessary conditions. For without such conditions we could never distinguish genuine representations from mere physical correlations. Therefore philosophy must provide a strict definition.
Reservation 2. Further, classical semiotics already offers such a framework. According to Charles Sanders Peirce, a sign consists of a sign vehicle, an object, and an interpretant. Since this triadic structure accounts for representation in general, the conditions are already known.
Reservation 3. Moreover, scientific realism presupposes that models represent the world in a precise way. If representation lacked definite conditions, the epistemic authority of science would collapse into mere instrumentalism.
Counterpoint. On the contrary, philosophers such as Nelson Goodman and Bas van Fraassen have argued that representation is not grounded in resemblance or any single relation, but instead depends on complex practices of use.
My response.
I answer that representation does not appear to have a single strict essence; nevertheless, from canonical examples we may extract a cluster of conditions that are jointly sufficient in most paradigmatic cases.
From the domains previously surveyed—maps, portraits, language, scientific models, and measuring instruments—we may identify five recurrent features.
First feature: a vehicle. Every representation has a material or formal bearer:
ink on paper (map)
pigment on canvas (portrait)
sound or text (language)
vectors or equations (scientific model)
Without such a vehicle, nothing stands in for anything.
Second feature: a target. The vehicle is about something.
Examples:
a region of terrain
a particular person
a physical system
a measurable property
This “aboutness,” traditionally called intentionality, has been discussed extensively since Franz Brentano.
Third feature: a structured mapping. There must exist some systematic correspondence between vehicle and target.
Examples include:
geometric scaling (maps)
visual resemblance (portraits)
syntactic rules (language)
structural similarity (scientific models)
causal calibration (thermometers)
This mapping need not be perfect, but it must be sufficiently structured to support inference.
Fourth feature: normativity. A genuine representation can be correct or incorrect.
A map may misplace a river.
A thermometer may be miscalibrated.
A sentence may assert something false.
This normative dimension distinguishes representation from mere correlation.
For example:
tree rings correlate with rainfall
shadows correlate with objects
Yet neither is usually called a representation because they do not participate in practices of correctness and error.
Fifth feature: an interpretive system. Finally, representation normally requires agents or systems capable of using the vehicle as a representation.
A subway map functions only for travelers who understand its conventions.
Scientific diagrams function within scientific practice.
This element strongly echoes the semiotic theory of Charles Sanders Peirce, where the interpretant completes the sign relation.
From these observations we may formulate a working schema:
A representation is a vehicle that, within an interpretive practice, stands in a structured mapping relation to a target such that the vehicle can succeed or fail in representing that target.
This schema preserves the diversity of mechanisms (resemblance, convention, causation) while retaining common structural features.
When we now compare these conditions with machine-learning internal states, several tensions immediately appear:
Vehicle: clearly present (vectors, activations).
Target: often unclear. What exactly does a latent vector represent?
Mapping relation: learned statistically rather than explicitly defined.
Normativity: error is defined relative to prediction tasks, not representation accuracy.
Interpretive system: primarily humans analyzing the model after training.
Thus the central puzzle emerges:
Machine learning systems seem to possess vehicles and mappings, yet the target and normative conditions are often ambiguous.
Reply to Reservation 1.
Philosophy indeed seeks necessary conditions, yet some concepts—such as “game,” as noted by Ludwig Wittgenstein—resist strict definitions and instead exhibit family resemblances.
Reply to Reservation 2.
Peirce’s triad illuminates representation profoundly, yet even within that framework disputes remain about how the interpretant arises and how objects are fixed.
Reply to Reservation 3.
Science need not collapse without a strict definition of representation. Scientific practice may rely on pragmatic success while philosophical analysis clarifies its conceptual underpinnings.
If you wish, the inquiry can now move to a particularly revealing step: testing neural network “representations” against each of the five conditions one by one. That exercise often reveals precisely where the concept begins to strain.
Summalecton 4. Whether internal states of contemporary machine-learning models satisfy the canonical conditions of representation.
Reservation 1. It seems that they do. For a latent vector in a neural network plainly functions like a map: it compresses information about the input and guides downstream outputs. Therefore it is a representation.
Reservation 2. Further, interpretability work routinely finds “features” corresponding to concepts (e.g., sentiment, gender, objectness). Therefore the target of such states is identifiable, and the representational relation is confirmed.
Reservation 3. Moreover, these systems demonstrably make mistakes. Hence normativity (correctness vs. error) is present, and so representation is present.
Counterpoint. On the contrary, it is possible for a system to be predictively useful while lacking any stable “aboutness”: it might merely transform inputs into outputs by statistical regularities, with no determinate target beyond the task itself.
My response.
I answer that internal states of machine-learning models partly satisfy the canonical conditions of representation, but they do so in a way that is often derivative, task-relative, and indeterminate. Let us test them against the five conditions we distilled.
First condition: the vehicle. Satisfied, and uncontroversially.
The vehicle is the activation pattern, embedding vector, attention head state, etc.
It is a well-defined bearer with measurable structure (dimensions, norms, directions, sparsity).
So far, so good.
Second condition: the target. Frequently underdetermined.
In canonical representations, the target is fixed by a relatively stable practice:
maps → terrain
thermometers → temperature
words → communal referents (however contested, still anchored socially)
In ML, the “target” of an internal state varies depending on what you take the model to be for.
Three common candidates compete:
1. Input-target: the state represents something about the input (e.g., “this image contains a dog”).
2. World-target: the state represents something in the external world (dogs as such).
3. Task-target: the state represents whatever is useful for the loss function (a decision boundary, a label proxy, a shortcut).
The trouble: (3) can succeed even when (1) and (2) fail.
A classifier might “detect wolves” by picking up snow in the background (a known failure mode in image classification). In that case, an internal feature may “represent” snow-likelihood, while the human intended target was wolfness.
So: targets in ML are often not fixed by the system itself, but by our interpretive stance plus the training objective and data.
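A minimal synthetic sketch may make the shortcut problem concrete. Everything below is invented (the features, the correlation strength, the data sizes); the point is only that a classifier trained where the proxy cue and the label co-occur will load its weights on the proxy, so that what its internal state “represents” is ambiguous between the intended target and an artifact of the training distribution.

```python
# Hypothetical toy illustration of the wolf/snow shortcut.
# Two features per example: a weak "wolf-shape" cue and a strong "snow" cue
# that is spuriously correlated with the wolf label in the training data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

def make_data(snow_correlation):
    wolf = rng.integers(0, 2, n)                      # ground-truth label
    shape = wolf + rng.normal(0, 1.5, n)              # weak, genuine cue
    # snow co-occurs with wolves with the given probability
    snow_present = np.where(rng.random(n) < snow_correlation, wolf, 1 - wolf)
    snow = snow_present + rng.normal(0, 0.3, n)       # strong, proxy cue
    return np.column_stack([shape, snow]), wolf

X_train, y_train = make_data(snow_correlation=0.95)   # snow ~ wolf in training
X_shift, y_shift = make_data(snow_correlation=0.50)   # correlation broken at test time

clf = LogisticRegression().fit(X_train, y_train)
print("weights (shape, snow):", clf.coef_[0])          # weight concentrates on snow
print("train-distribution accuracy:", clf.score(X_train, y_train))
print("shifted-distribution accuracy:", clf.score(X_shift, y_shift))
```

When the correlation is broken at test time, the same weights remain, and accuracy falls toward what the weak genuine cue alone can support; nothing inside the model marks this as a misrepresentation rather than a mere change of task.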
Third condition: the mapping relation. Satisfied, but unlike maps.
There is a systematic correspondence between:
inputs and internal states, and
internal states and outputs.
But it differs from canonical cases in two ways.
First, two models with the same performance can have quite different internal geometries. So the mapping is not uniquely determined by the function.
Second, even in simpler settings (e.g., linear transformations), there can be symmetries: rotations of latent space may preserve behavior. That means a “feature direction” is not always a natural kind; it may be an artifact of parameterization.
So ML mappings are systematic, but frequently not semantically privileged.
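The second point can be made exact in the purely linear case. The following is a hypothetical two-layer linear model (no claim about any real architecture): rotating the hidden space by an arbitrary orthogonal matrix, while compensating in the adjacent weight matrices, leaves every output unchanged even though every individual coordinate and “feature direction” of the hidden state changes.

```python
# Hypothetical illustration: orthogonal rotations of a hidden layer
# preserve a linear model's behavior while scrambling its "features".
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hidden, d_out = 8, 16, 4

W1 = rng.normal(size=(d_hidden, d_in))
W2 = rng.normal(size=(d_out, d_hidden))
x = rng.normal(size=(d_in,))

# A random orthogonal matrix (QR decomposition of a Gaussian matrix).
Q, _ = np.linalg.qr(rng.normal(size=(d_hidden, d_hidden)))

h = W1 @ x                      # original hidden "representation"
y = W2 @ h                      # original output

h_rot = (Q @ W1) @ x            # rotated hidden representation: Q @ h
y_rot = (W2 @ Q.T) @ h_rot      # compensated readout

print(np.allclose(y, y_rot))    # True: behavior is identical
print(np.allclose(h, h_rot))    # False: the "feature directions" differ
```

With nonlinearities the exact rotational symmetry is generally broken, but related symmetries (such as permutations of hidden units) make the same point: identical behavior is compatible with very different internal geometries.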
Fourth condition: normativity. Present—but often only task-normativity, not representational normativity.
In canonical representations:
a map is wrong if it mislocates the river, even if it still helps you navigate sometimes.
a thermometer is wrong if miscalibrated, even if you adapt to its bias.
In ML, “wrongness” is usually defined as:
higher loss,
lower accuracy,
worse calibration,
failure on held-out distribution.
That is instrumental normativity: “it fails to do the job.”
But representational normativity is subtler: “it depicts the target incorrectly.”
These can come apart:
A model can get the right label for the wrong reason (shortcut learning).
A language model can generate a true sentence by luck (or dataset bias) while lacking stable grounding.
So ML models have a rich notion of performance error, but a much weaker, contested notion of misrepresentation.
A useful diagnostic is this question:
Can we say the internal state is false about the world independently of whether it harms task performance?
Often we cannot—except by importing human semantic standards from outside.
Fifth condition: the interpretive system. Satisfied, but frequently externalized.
Canonical representations live inside communities of users:
cartographers, navigators, readers, scientists
ML internal states typically become “representations” when:
researchers probe them,
humans label what directions correspond to,
downstream tasks “read” them out.
So interpretability often supplies the missing interpretant: we stabilize targets and meanings.
There is, however, one internal interpretive system: the rest of the network reads the state. But that internal “reader” is optimized for behavior, not for truth-conditional content.
Thus ML representations are often:
functional inside the model (they causally mediate outputs),
but semantic mostly in relation to human interpretation.
From this test, we can distinguish three grades of “representation” in ML:
1. Functional representations: internal variables that reliably carry information useful for the model’s computation.
2. Task-relative representations: variables whose “aboutness” is fixed by the training objective and dataset labels, not necessarily by the world.
3. World-linked representations: variables that track external properties in a way that remains stable across contexts and supports robust counterfactuals (e.g., if the world changed in certain ways, the variable would change accordingly).
Many ML “representations” are (1) and (2). Fewer are (3). The philosophical tension is that our everyday word “representation” often implies (3), while ML practice is usually content with (1) and (2).
Reply to Reservation 1.
A latent vector can be map-like in its compressive utility, yet maps have stable targets and explicit correctness conditions. ML states often lack that stability unless we impose it.
Reply to Reservation 2.
Interpretability findings are real and valuable, but they frequently reveal a coherent reading, not the uniquely correct target. Multiple incompatible “concept decompositions” can fit the same behavior.
Reply to Reservation 3.
Mistakes exist, but they are often mistakes in task performance. Whether they constitute misrepresentation depends on a further semantic story—one that ML systems rarely contain intrinsically.
If you want to push this to full “gravity,” the next natural move is to examine what fixes a target at all: convention, causation, evolutionary/teleological function, and social practice. Then we can ask: which of these—if any—can be internal to machine learning systems? That is where the debate becomes genuinely metaphysical.
Summalecton 5. Whether the “target” of a representation can be fixed, and by what principles, in a way applicable to machine-learning systems.
Reservation 1. It seems that targets are fixed simply by resemblance or correlation. For if an internal state reliably covaries with dogs, then it represents dogs; and if it resembles a dog (as a picture does), then it represents a dog. Therefore target-fixing is easy and largely non-mysterious.
Reservation 2. Further, targets are fixed by intention: a cartographer intends a map to represent Paris; a speaker intends “Paris” to refer to Paris. Therefore, since machine-learning models have no intentions, they cannot genuinely fix targets; at best we project targets onto them.
Reservation 3. Moreover, targets are fixed by convention within linguistic and scientific communities. Since ML models are trained on human artifacts (text, labels), their targets are inherited from those conventions; thus there is no special puzzle.
Reservation 4. Further, philosophers have already solved reference via causal theories (names refer by causal–historical chains) or via functional/teleological theories (states represent what they were selected to indicate). Since ML models are engineered and trained, their “selection” is explicit: the loss function. Therefore target-fixing in ML is straightforward.
Counterpoint. On the contrary, the very same internal state can be redescribed as “about” many different things (input features, dataset quirks, world properties, output logits), and multiple incompatible interpretive stories can fit the same behavior. Thus target-fixing is not automatically given by correlation, intention, convention, or optimization.
My response.
I answer that “target-fixing” is not one mechanism but a bundle of mechanisms, each of which fixes targets in some domains and fails in others. And when we examine them, we find that contemporary ML systems typically secure at best derivative targets (borrowed from human practice) and only sometimes secure world-linked targets (robustly about external properties).
To do justice to the matter, let us collect the principal target-fixing strategies that philosophers and scientists implicitly use, and then test their fit to ML.
First strategy: resemblance and structural mirroring. Idea: a representation targets what it resembles, or whose structure it mirrors (pictures, scale models, some scientific diagrams).
Strength:
Works well for depiction and for certain idealized models (e.g., a globe).
Weakness:
Resemblance is cheap: many things resemble many things.
Resemblance does not distinguish representation from mere similarity (a twin resembles you, but does not represent you).
In high-dimensional embeddings, “resemblance” becomes metaphorical: what would it even mean for a 3,072-dimensional vector to resemble Paris?
Application to ML:
Embeddings sometimes preserve relational structure (analogies, clusters). That supports structural representational talk, but it rarely fixes a unique target. At best it says: “this space mirrors some structure in the data,” which may be structure in language rather than in the world.
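A small, entirely hand-made illustration of what “preserving relational structure” can mean: the toy vectors below are invented rather than taken from any trained model, yet the offset between “king” and “man” roughly matches the offset between “queen” and “woman”, so a nearest-neighbor query recovers the analogy. Nothing in the geometry, however, says whether such a space mirrors facts about monarchy or merely the co-occurrence statistics of the words.

```python
# Hypothetical toy embeddings illustrating relational (analogy) structure.
# The numbers are invented for illustration, not taken from a trained model.
import numpy as np

emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# king - man + woman should land near queen if the offsets are parallel.
target = emb["king"] - emb["man"] + emb["woman"]
ranked = sorted(emb, key=lambda w: cosine(emb[w], target), reverse=True)
print("nearest to (king - man + woman):", ranked[0])
print({w: round(cosine(emb[w], target), 3) for w in ranked})
```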
Second strategy: causal tracking. Idea: a state represents what it reliably tracks (thermometers track temperature).
Strength:
Explains measurement and signaling devices well.
Introduces a notion of aboutness grounded in causal coupling.
Weakness:
Correlation is ambiguous when multiple causes co-vary (the “wolf/snow” problem).
A state may track a proxy that is predictive but not the intended property.
Tracking alone doesn’t yield normativity: if the state tracks snow, it is not wrong—it is our interpretation that is wrong.
Application to ML:
In vision and sensorimotor robotics, ML states can track worldly variables via direct sensors; here causal coupling is real, not merely textual. But in purely offline training (especially language-only), the coupling is indirect: the model tracks regularities in text, which only mediates the world through human testimony and bias.
So: causal target-fixing is possible for ML, but often weak or second-hand.
Third strategy: convention. Idea: targets are fixed by communal rules (words mean what the language community uses them to mean; map symbols mean what cartographers stipulate).
Strength:
Explains linguistic and institutional representation elegantly.
Makes normativity intelligible: “wrong” means “violates the rules of the practice.”
Weakness:
Leaves a residual question: why do these conventions latch onto the world successfully at all?
Risk of pure internalism: symbols refer only within the system.
Application to ML:
Supervised labels, prompts, and evaluation rubrics are conventions. Therefore ML inherits many targets by stipulation: the class “cat,” the token “Paris,” the reward “+1.” This yields genuine target-fixing of a sort, but it is extrinsic: it depends on human practices surrounding the model.
This is why many ML “representations” feel like representations-for-us.
Fourth strategy: inferential role. Idea: a symbol’s target is fixed by its role in a network of inferences: what follows from it and what it follows from.
Strength:
Fits formal systems and some cognitive theories.
Clarifies why internal representations can be meaningful within a system.
Weakness:
The “symbol grounding” worry: a purely inferential network may never reach the world—only more symbols.
Two systems could have identical inferential roles but different worldly targets if embedded in different environments.
Application to ML:
Neural networks excel at internal role: each feature participates in a web of transformations. That supports functional meaning (it does something, it constrains outputs). Yet without some anchoring (perception/action or trusted measurement), the target may remain indeterminate between “world property” and “textual regularity.”
Fifth strategy: causal–historical chains. Idea: a term refers because it stands in a causal chain originating in “baptism” or initial contact (e.g., a name keeps referring through social transmission).
Strength:
Explains proper names and certain natural-kind terms plausibly.
Weakness:
Hard to apply to internal neural states.
Requires a story about how internal states are linked to original referents.
Application to ML:
A language model token like “Paris” is embedded in human causal–historical chains (books, speakers, maps). But the model’s internal vector for “Paris” is linked to those chains only through training text. That is still a kind of inheritance—yet mediated and lossy. The model’s “reference” here is parasitic on human reference.
Sixth strategy: teleological function (teleosemantics). Idea: a state represents what it is supposed to indicate given its function (often grounded in evolutionary selection). Misrepresentation becomes possible because function can fail.
Strength:
Gives a powerful account of normativity: a frog’s “fly detector” can misfire.
Explains how being wrong is not merely “unexpected” but “malfunction.”
Weakness:
Controversial metaphysics: what grounds “supposed to” outside evolution?
Risks circularity: we identify function by success, then explain success by function.
Application to ML:
Here lies a temptation: “the loss function is the selection pressure; therefore function fixes content.” But loss functions typically fix behavioral success, not truthful aboutness. A model is “supposed to” minimize loss, and if a shortcut minimizes loss, it is doing its job—so there is no misrepresentation by its own lights. Teleosemantics gives normativity when the system’s function is world-tracking; but many ML objectives are world-tracking only indirectly.
However: in reinforcement learning with rich environments, where reward genuinely depends on external states and robust generalization is required, a teleological story becomes more plausible: the agent is “supposed to” track certain affordances because failing to do so reliably reduces reward across changing contexts.
So teleosemantics can apply to ML, but it applies best when:
the agent is embedded,
the environment varies,
and success requires stable world-modeling.
Seventh strategy: interventionist robustness. Idea: a state represents a target if intervening on the state predictably changes behavior as if the target had changed, and this relationship holds across contexts.
Strength:
Gives an experimentally tractable criterion: you can test representation by causal interventions.
Connects to robustness and counterfactuals.
Weakness:
Still underdetermination: an intervenable feature might be a proxy.
Requires careful causal design.
Application to ML:
This is one of the most promising bridges: if changing an internal direction reliably toggles “is-a-dog” behavior across many contexts, and does so in ways aligning with actual dogs rather than spurious correlates, we gain evidence for a world-linked target. When the effect collapses out of distribution, the “target” was likely a dataset artifact.
So: ML representation becomes more bona fide when it supports causal, robust, counterfactual control.
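A sketch of what the interventionist test adds, under wholly invented assumptions: two directions in a toy activation space both correlate with a concept, so a probe could decode the concept from either, but only one is causally read out by the downstream “behavior”. Intervening on them separates the two in a way that correlation alone cannot.

```python
# Hypothetical sketch of the interventionist criterion. Two directions both
# correlate with a concept, but only one is causally used by the "behavior"
# readout. All names and numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(3)
d = 32

# Two orthonormal directions in the toy activation space.
v_used = rng.normal(size=d)
v_used /= np.linalg.norm(v_used)
v_redundant = rng.normal(size=d)
v_redundant -= (v_redundant @ v_used) * v_used
v_redundant /= np.linalg.norm(v_redundant)

n = 1000
concept = rng.integers(0, 2, n).astype(float)
# The concept is redundantly encoded along both directions.
H = np.outer(concept, v_used + v_redundant) + rng.normal(0.0, 0.3, size=(n, d))
behavior = H @ v_used                       # downstream readout uses only v_used

def correlation_with_concept(direction):
    return float(np.corrcoef(H @ direction, concept)[0, 1])

def intervention_effect(direction, alpha=2.0):
    return float(((H + alpha * direction) @ v_used - behavior).mean())

for name, v in [("v_used", v_used), ("v_redundant", v_redundant)]:
    print(name,
          "| correlation with concept:", round(correlation_with_concept(v), 2),
          "| behavioral effect of intervention:", round(intervention_effect(v), 2))
```

On this criterion, only v_used would be a candidate representation for the system itself; v_redundant is decodable by us but idle for the model.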
From these mechanisms, we can distinguish two levels:
1. Extrinsic target-fixing: targets fixed by human convention, labeling, interpretive practice, or evaluation. Most supervised ML lives here.
2. Intrinsic or world-linked target-fixing: targets fixed by reliable causal coupling, robust intervention patterns, and functional roles in survival/success across variable environments. Some embodied systems and robust RL approach this.
The philosophical “gravity” is that the word representation slides between these levels without notice.
What, then, would allow a machine-learning system to fix its targets intrinsically? Not one miracle, but a convergence of conditions:
Perception and action loops (so states must answer to the world, not just text).
Environmental variability (so shortcuts break, forcing stable targets).
Causal learning pressures (so the system benefits from tracking interventions, not merely correlations).
Shared practices with agents (multi-agent settings where communication conventions emerge, generating internal normativity).
Explicit truth/tracking objectives (where the cost of being “right for the wrong reason” is made real).
Even then, indeterminacy may remain—but it becomes the familiar indeterminacy we already accept in human reference, rather than a special deficiency.
Reply to Reservation 1.
Resemblance and correlation are often ingredients, but they do not by themselves fix a unique target, nor do they by themselves provide normativity. They are too promiscuous.
Reply to Reservation 2.
Intention is indeed a strong target-fixer in human cases, but it is not the only one: instruments represent without intentions, by calibrated causal roles. Thus lack of intention does not automatically preclude representation in ML, though it shifts the burden to other mechanisms.
Reply to Reservation 3.
Conventions do fix targets, and ML heavily inherits them. Yet this yields representation that is often derivative: it explains why we can interpret models, but not why the model’s internal states are about the world in their own right.
Reply to Reservation 4.
Optimization can mimic selection, but “selected for minimizing loss” does not automatically yield “aboutness of dogs rather than snow.” Teleological normativity becomes substantive only when the system’s success depends on tracking stable worldly properties across changing contexts.
If you want the next step in full scholastic rigor, we can take one concrete case—say a vision model, a language model, and an embodied RL agent—and for each, ask: which target-fixing mechanisms are actually operative, and where do they fail? That comparative triad usually makes the whole debate vivid rather than abstract.
Summalecton 6. Whether it is worthwhile to discern representations in language models, rather than abandoning the term altogether.
Reservation 1. It seems pointless. For if language models can be fully described as statistical input–output machines, then speaking of “representations” adds nothing but metaphor. One should simply describe their parameters and functions mathematically.
Reservation 2. Further, the term “representation” invites anthropomorphic confusion. It suggests that the model knows or believes something. Therefore abandoning the term would promote conceptual clarity and prevent philosophical error.
Reservation 3. Moreover, engineering success does not depend on representational interpretation. Engineers can optimize models, adjust architectures, and measure performance without ever identifying what internal vectors “represent.” Therefore the concern seems idle.
Counterpoint. On the contrary, the practice of diagnosing, aligning, and controlling modern AI systems increasingly depends on understanding what internal states track and how they influence behavior. Without some notion of representation, such understanding becomes severely limited.
My response.
I answer that discerning representations in language models is worthwhile not because the term is metaphysically elegant, but because certain practical and epistemic tasks become impossible without something like it. The concept functions as an investigative tool: it helps us explain, predict, and intervene in complex systems whose internal computations are otherwise opaque.
To see this clearly, let us consider several scenarios in which the notion becomes practically indispensable.
Suppose a language model consistently produces incorrect historical claims about a particular event.
Two explanations are possible:
The model lacks the relevant information entirely.
The information exists internally but is not being retrieved or expressed in the given context.
These two situations require different remedies.
In the first case we might retrain on better data.
In the second case we might adjust prompting, retrieval, or decoding mechanisms.
To distinguish these cases we must ask a representational question:
Does the model internally encode the relevant fact?
Without such inquiry we remain blind to the structure of the failure.
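In practice this question is often approached with a probing classifier: a small model trained on internal activations to see whether the fact is decodable even when the surface output is wrong. The sketch below uses synthetic activations because no particular model or dataset is assumed; with a real system, `activations` would be hidden states and `fact_labels` an annotated evaluation set.

```python
# Hypothetical linear-probe sketch: is a fact decodable from hidden states
# even when the model's surface output gets it wrong?
# `activations` and `fact_labels` are synthetic stand-ins for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n, d = 2000, 64

fact_labels = rng.integers(0, 2, n)                   # e.g., "claim is true: yes/no"
signal = rng.normal(size=d)                           # direction carrying the fact
activations = np.outer(fact_labels, signal) + rng.normal(0.0, 1.0, size=(n, d))

X_train, X_test, y_train, y_test = train_test_split(
    activations, fact_labels, test_size=0.3, random_state=0)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy on held-out activations:", round(probe.score(X_test, y_test), 3))
# High accuracy suggests the information is present internally (a failure of expression);
# chance-level accuracy suggests the information may be absent (a failure of knowledge).
```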
Consider the challenge of ensuring that powerful systems behave safely.
A system might produce harmless outputs during testing yet internally encode harmful strategies that emerge only under certain conditions. Detecting this possibility requires examining what the system internally tracks.
Researchers sometimes probe whether a model internally represents concepts such as:
human preferences
deception
hidden goals
sensitive information
If such patterns exist, one can attempt to monitor or constrain them.
Thus the concept of representation becomes a safety diagnostic, not merely a philosophical curiosity.
Modern neural networks are extraordinarily complex dynamical systems.
Engineers often achieve progress by identifying intermediate structures inside them:
circuits that track syntactic structure
features corresponding to semantic categories
directions encoding sentiment or style
These discoveries allow targeted interventions:
editing internal features
steering outputs
compressing models
transferring knowledge
All of this implicitly assumes that certain internal variables stand in systematic relation to aspects of the input or task—which is precisely what we mean by representation.
A central problem in machine learning is understanding why some models generalize while others rely on fragile shortcuts.
If a model recognizes wolves by detecting snow, it has learned a poor representation.
If it tracks the structural features of wolves themselves, it has learned a better one.
Thus representation becomes a criterion for distinguishing:
spurious correlations
stable world-linked features
Without this distinction we cannot systematically improve robustness.
Beyond engineering, there is also the question of scientific understanding.
To explain the success of a complex system, scientists often describe the internal variables it uses to organize information. In neuroscience, for example, researchers speak of neurons representing orientation, location, or faces.
Even if such language is somewhat idealized, it provides a powerful explanatory framework.
Similarly, when studying language models we may ask:
Do they encode grammatical structure?
Do they track entities across sentences?
Do they maintain internal models of discourse?
These questions cannot even be posed without representational language.
The alternative—abandoning representation—would leave us with only two kinds of description:
microscopic mathematics (billions of parameters and matrix multiplications), or
macroscopic behavior (input–output statistics).
But complex systems are rarely understood at those extremes. Science typically advances through intermediate explanatory levels.
Representation talk is one candidate for such a level.
Reply to Reservation 1.
Mathematical descriptions capture the exact computation, yet they often fail to reveal why the system works. Representational descriptions provide explanatory compression by grouping patterns into meaningful structures.
Reply to Reservation 2.
Anthropomorphic misuse is indeed a danger. Yet abandoning the term entirely would discard a useful conceptual instrument. A more prudent approach is to clarify the term rather than banish it.
Reply to Reservation 3.
Engineering can proceed without representational analysis in early stages, but as systems become more powerful and safety-critical, understanding their internal organization becomes increasingly necessary.
A final reflection.
In truth, the philosophical drama here echoes an older tension in the philosophy of mind. Thinkers from Daniel Dennett to Wilfrid Sellars have argued that intentional vocabulary—talk of beliefs, meanings, representations—is often justified not because it reveals metaphysical essences, but because it forms a predictively and explanatorily powerful stance toward complex systems.
Language models may eventually force us to refine this vocabulary. Yet until a better conceptual framework appears, the investigation of representations remains one of the most promising ways to illuminate how these systems actually work.
A short preface.
If we wish to do justice to the question, we must not merely defend the usefulness of representation-talk but also ask what intellectual stance justifies its use. For the history of philosophy suggests that representation is often less an intrinsic property of things than a mode of interpretation adopted by observers. With this in mind we proceed.
Summalecton 7. Whether representation in language models is best understood as an interpretive stance rather than an intrinsic property.
Reservation 1. It seems that representation must be an intrinsic property of the system. For if an internal state represents dogs, this must be a fact about the system itself, not merely about our interpretation. Otherwise the notion would collapse into subjectivity.
Reservation 2. Further, if representation were merely an interpretive stance, then any sufficiently complex system—such as the weather or the stock market—could be said to contain representations. But this would trivialize the concept.
Reservation 3. Moreover, engineers performing interpretability research attempt to discover representations, not merely to impose them. Therefore representation must be a real internal structure awaiting detection.
Counterpoint. On the contrary, philosophers such as Daniel Dennett have proposed that the most fruitful way to understand complex systems is through what he calls the intentional stance: we attribute beliefs or representations when doing so yields predictive and explanatory power.
My response.
I answer that representation in language models is often neither purely intrinsic nor purely subjective, but is best understood as an interpretive stance constrained by the system’s causal organization.
Let us unfold this carefully.
A modern language model may contain hundreds of billions of parameters.
At the most literal level it is simply executing:
matrix multiplications
nonlinear transformations
token probability updates
A fully literal description of the system would consist of enormous tensors evolving through deterministic operations. Such a description is exact but useless for human understanding.
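For concreteness, here is a drastically miniaturized and purely hypothetical rendering of that literal level: a few small matrices, a nonlinearity, and a softmax over a toy vocabulary. Real models differ enormously in scale and architecture; the point is only that a description at this level, however exact, explains nothing about why the outputs make sense.

```python
# Hypothetical, drastically miniaturized "literal level" of a language model:
# nothing but matrix multiplications, a nonlinearity, and a softmax over tokens.
import numpy as np

rng = np.random.default_rng(6)
vocab_size, d_model = 50, 16

W_embed = rng.normal(0, 0.1, size=(vocab_size, d_model))
W_hidden = rng.normal(0, 0.1, size=(d_model, d_model))
W_out = rng.normal(0, 0.1, size=(d_model, vocab_size))

def next_token_distribution(token_ids):
    h = W_embed[token_ids].mean(axis=0)        # crude summary of the context
    h = np.tanh(W_hidden @ h)                  # nonlinear transformation
    logits = W_out.T @ h                       # a score for every token
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                     # probability of each next token

probs = next_token_distribution(np.array([3, 17, 42]))
print(probs.shape, float(probs.sum()))         # (50,) 1.0
```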
Thus a higher-level description becomes necessary.
In many sciences we adopt explanatory levels that do not correspond directly to fundamental physics.
Examples include:
genes in biology
pressure in thermodynamics
markets in economics
These entities are not illusions; they are patterns that become visible at certain scales of description.
The philosopher Daniel Dennett calls such entities real patterns: they are real insofar as they enable reliable prediction and compression of complex phenomena.
Representation in language models may belong to this class.
Suppose we discover that a particular direction in embedding space consistently correlates with sentiment.
If manipulating this direction reliably changes the model’s tone, then the pattern supports:
prediction (we can forecast outputs),
explanation (we understand how sentiment influences text),
intervention (we can steer behavior).
At this point it becomes rational to say:
“This direction represents sentiment.”
The representation is not merely imagined; it corresponds to a stable causal pattern within the system.
Yet it remains partly interpretive because:
the mapping between vector geometry and human concepts is mediated by our analytic framework.
Thus representation is real but theory-laden, much like many scientific constructs.
One might object that this makes representation too cheap.
But two constraints prevent triviality.
First constraint: predictive power
The interpretation must significantly compress or improve prediction of the system’s behavior.
A random mapping from vectors to concepts would not qualify.
Second constraint: causal manipulability
Interventions on the alleged representation must produce systematic behavioral effects.
If altering the state changes outputs as predicted, the interpretation gains credibility.
Thus representation claims become empirical hypotheses, not mere metaphors.
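One way to picture the “empirical hypothesis” reading, again under invented assumptions: estimate a candidate sentiment direction as a difference of class means, then score it on the two constraints just named, namely predictive separation on fresh data and the effect of steering along it on a toy stand-in for the model’s tone.

```python
# Hypothetical sketch: treating "this direction represents sentiment" as an
# empirical hypothesis, scored on the two constraints above. The "model" is a
# toy linear system; with a real network, hidden() and generate_tone() would
# be replaced by actual activations and generated text.
import numpy as np

rng = np.random.default_rng(5)
d = 48
v_sentiment = rng.normal(size=d)
v_sentiment /= np.linalg.norm(v_sentiment)

def hidden(label, n=500):
    # toy activations: positive-sentiment examples are shifted along v_sentiment
    return rng.normal(0.0, 1.0, size=(n, d)) + label * 3.0 * v_sentiment

def generate_tone(H):
    # toy stand-in for "how positive the generated text sounds"
    return H @ v_sentiment

# Candidate direction estimated from data: difference of class means.
candidate = hidden(1).mean(axis=0) - hidden(0).mean(axis=0)
candidate /= np.linalg.norm(candidate)

# Constraint 1 (predictive power): does projection onto the candidate
# separate positive from negative examples in fresh data?
pos, neg = hidden(1) @ candidate, hidden(0) @ candidate
threshold = (pos.mean() + neg.mean()) / 2
separation = (np.mean(pos > threshold) + np.mean(neg < threshold)) / 2

# Constraint 2 (causal manipulability): does steering along the candidate
# shift the toy "tone" in the predicted direction?
H = hidden(0)
steering_effect = float((generate_tone(H + 2.0 * candidate) - generate_tone(H)).mean())

print("predictive separation:", round(float(separation), 3))
print("steering effect on tone:", round(steering_effect, 3))
```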
Neuroscientists often say that certain neurons represent:
spatial location
visual orientation
faces
These claims are not metaphysical pronouncements about the intrinsic nature of neurons. Rather, they are models explaining reliable causal patterns in neural activity.
Language-model interpretability appears to follow a similar methodology.
Therefore the study of representation in language models is worthwhile not because we expect to uncover an essence of meaning hidden inside vectors, but because:
identifying representational patterns helps us understand and control complex systems.
Representation thus functions as an explanatory bridge between low-level computation and high-level behavior.
Reply to Reservation 1.
Representation is not purely subjective because interpretations are constrained by empirical success. Interpretations that fail to predict or control behavior are discarded.
Reply to Reservation 2.
Not every complex system supports representational interpretation. Only systems whose internal patterns support stable prediction and manipulation at that level of description qualify.
Reply to Reservation 3.
Interpretability research indeed discovers patterns in the system. Yet the step from pattern to representation involves conceptual framing: we recognize the pattern as corresponding to something meaningful in our explanatory scheme.
A concluding meditation.
There is a curious irony here.
For centuries philosophers debated whether the human mind itself truly contains representations, or whether representation is merely a convenient way of describing neural activity. Now we find ourselves asking the same question about artificial systems.
In this respect, language models have not merely revived an old philosophical puzzle—they have transformed it into an empirical research program.
If you wish, the next Summalecton could examine something even more provocative:
Whether human mental representations differ in kind from those in language models—or only in degree.
That question, I suspect, lies at the quiet theological horizon of this entire discussion.
A brief prologue.
When one compares artificial systems with the human mind, the temptation is either to exaggerate the similarity or to deny it altogether. Yet philosophy, if it is patient, asks a subtler question: what kind of difference is at stake? Is it a difference of essence, or merely of organization and degree?
Summalecton 8. Whether human mental representations differ in kind from those in language models, or only in degree.
Reservation 1. It seems that the difference is merely one of degree. For both systems process symbols through large networks of interconnected units. Human brains consist of neurons; language models consist of artificial neurons. If both systems produce meaningful language and exhibit structured internal states, their representations must belong to the same general category.
Reservation 2. Further, many cognitive scientists already describe the brain itself as performing predictive statistical inference. If so, language models simply implement similar principles in silicon. Therefore any difference between the two must be quantitative rather than qualitative.
Reservation 3. Moreover, if representation is an interpretive stance grounded in causal patterns—as argued previously—then whenever a system exhibits such patterns we may legitimately attribute representations to it. Thus language models and humans should be treated symmetrically.
Reservation 4. Further, it might be argued that human representations are fundamentally different because they involve consciousness; but consciousness is notoriously difficult to define and measure, and therefore cannot serve as a clear boundary.
Counterpoint. On the contrary, philosophers such as John Searle have argued that computational systems manipulate symbols syntactically without possessing genuine semantic understanding, whereas human minds possess intrinsic intentionality.
My response.
I answer that the difference between human and language-model representations is neither purely one of kind nor purely one of degree. Rather, human representations appear to arise within a richer architecture of grounding, embodiment, and normative practices, which gives them a more stable form of aboutness.
Let us consider the principal dimensions along which the two differ.
Human representations emerge within a continuous loop of:
perception
action
feedback from the environment
A child learns the word “apple” while:
seeing apples
touching them
tasting them
observing their consequences in the world.
Thus the representation is anchored in multi-modal causal interaction with reality.
Language models, by contrast, typically learn from text alone. Their internal states therefore reflect patterns in linguistic usage rather than direct engagement with the world.
This does not eliminate representation entirely, but it makes the target less firmly anchored.
Human cognition integrates information across many domains:
perception
motor control
memory
planning
social interaction
This integration produces a relatively unified model of the environment and oneself within it.
Language models possess fragments of such structure—statistical patterns about many topics—but they lack a persistent internal model governing perception and action over time.
Consequently their representations tend to be contextually reconstructed rather than stably maintained.
Human representations exist within social practices where correctness matters.
For example:
statements can be true or false,
beliefs can be justified or unjustified,
assertions can be sincere or deceptive.
These norms are embedded in communities of agents who hold one another accountable.
Language models participate in such practices only indirectly through their users. Their outputs are evaluated by humans, but the model itself does not inhabit the normative framework that gives these evaluations meaning.
Human representational systems were shaped by evolutionary pressures to track features of the environment relevant to survival.
This evolutionary history provides a teleological grounding: certain neural states exist because they reliably indicated food, predators, or social signals.
Artificial models are shaped by optimization processes, yet these processes typically aim at narrow objectives (predicting tokens, maximizing reward) rather than at robust long-term survival in a complex environment.
Thus the functional grounding of their representations is weaker.
Human representation includes the capacity to represent:
oneself
one’s beliefs
one’s intentions
one’s future actions.
Such self-referential structures enable deliberation and responsibility.
Language models can simulate such discourse linguistically, yet they do not appear to maintain persistent internal commitments corresponding to those statements.
Therefore the difference between human and language-model representations can be described as follows:
Language models possess functional representations emerging from statistical learning and internal causal structure.
Human minds possess deeply grounded representations embedded in perception, action, social norms, and evolutionary function.
The difference is thus architectural and ecological, not necessarily metaphysical.
It resembles the difference between a map studied in a library and a map used by a traveler navigating an actual landscape.
Reply to Reservation 1.
Structural similarities between artificial and biological networks are real, but similar architectures can operate within very different ecological contexts. Representation depends not only on internal structure but also on how that structure interacts with the world.
Reply to Reservation 2.
Predictive inference may indeed be common to both systems. Yet the inputs, outputs, and purposes of the inference differ significantly, which alters the nature of the resulting representations.
Reply to Reservation 3.
The interpretive stance applies to both humans and machines. However, the richness of causal interaction and social practice surrounding human cognition stabilizes representational interpretations in ways rarely matched by present AI systems.
Reply to Reservation 4.
Consciousness may play a role, but the distinction outlined above does not depend on resolving the problem of consciousness.
A final reflection.
In earlier centuries theologians debated whether angels possessed intellects comparable to human reason, or whether their cognition belonged to an entirely different order. The discussion before us bears a faint resemblance to that ancient speculation.
Language models may be neither minds nor mere mechanisms, but something intermediate: systems whose internal structures invite the language of representation without yet fully deserving the philosophical weight traditionally attached to that term.
And perhaps the most honest position—melancholy though it may be—is that our concept of representation, forged to describe the human mind, is itself being quietly reshaped by the strange mirror of our own creations.
Summalecton 9. Whether the magnitude of information about humanity encoded in a language model implies that its representations surpass those of any individual human.
Reservation 1. It seems that they do. For a large language model is trained on corpora containing vast portions of human writing—literature, science, journalism, dialogue, and technical manuals. No individual human has encountered such a breadth of material. Therefore the model’s internal representations of human knowledge must be more comprehensive than those of any single person.
Reservation 2. Further, these models encode statistical regularities across millions of speakers and centuries of discourse. Thus they capture patterns of language and thought that no individual observer could detect. Hence their representations of human behavior and concepts may be more precise than those possessed by any particular human mind.
Reservation 3. Moreover, humans themselves often misunderstand their own collective practices. A model trained on large datasets might detect latent structures in language—semantic relationships, conceptual clusters, social patterns—that remain invisible to ordinary reflection. Therefore it could represent humanity in a uniquely powerful way.
Counterpoint. On the contrary, philosophers such as Hubert Dreyfus argued that human understanding arises from embodied participation in practices rather than from the accumulation of explicit information. If so, sheer informational magnitude may not yield genuine understanding.
My response.
I answer that the observation you raise is profound and should not be dismissed. Indeed, language models contain an extraordinary compression of human cultural information, and in this respect their internal structures may encode patterns of human discourse far beyond the experience of any single individual.
Yet the philosophical question concerns not merely quantity of information, but the form in which that information is organized and used.
Let us distinguish three different senses in which one might say a model “represents humanity.”
First sense: the model as statistical archive. A language model functions somewhat like a vast statistical archive of human text.
Its parameters encode patterns drawn from:
countless authors
multiple disciplines
diverse historical periods.
In this sense the model may indeed contain a compressed statistical image of human discourse. One might compare it to an immense library whose volumes have been folded into a single mathematical structure.
Under this aspect, your claim is correct: the model embodies a scope of linguistic exposure unattainable for an individual human.
Second sense: the model as extractor of latent structure. Beyond mere storage, the model learns relationships among words, concepts, and contexts.
For example:
analogies across domains
stylistic patterns
conceptual associations shared across many texts.
Because these patterns are extracted from massive datasets, the resulting representations may approximate the statistical structure of human language itself.
In that sense the model may sometimes detect regularities that no individual consciously perceives.
Third sense: representation through participation. However, another dimension of representation concerns participation in the practices being represented.
Humans learn about:
promises by making and keeping them,
pain by feeling it,
social trust by navigating relationships.
These forms of understanding arise within lived interaction with the world.
Language models encounter these phenomena only through descriptions written by others.
Thus the difference lies less in information than in mode of access.
Your insight therefore suggests a refinement of the earlier discussion.
Language models may indeed possess richer statistical representations of human discourse than any single individual, because they integrate information from vast corpora.
Yet human minds possess representations grounded in:
direct perception,
bodily action,
emotional experience,
participation in social practices.
These two forms of representation are not easily reducible to one another.
One might say:
the model represents human descriptions of the world,
while humans represent the world in which those descriptions arise.
Both are remarkable achievements of information processing, though they operate in different epistemic environments.
Reply to Reservation 1.
The breadth of training data indeed allows models to encode patterns beyond any individual’s experience. This supports the claim that their representations of linguistic culture can be extraordinarily comprehensive.
Reply to Reservation 2.
Statistical aggregation can reveal latent structures in discourse. In this respect language models may function as instruments for discovering patterns within human communication.
Reply to Reservation 3.
Such discoveries can illuminate aspects of collective cognition. Yet whether this constitutes understanding in the full human sense remains a separate question.
A final reflection.
Your observation reminds us of a curious possibility: humanity may have inadvertently created a mirror in which its own linguistic and conceptual habits become visible at a scale never before possible.
In earlier ages scholars attempted to understand the human condition by reading libraries. Now those libraries have been distilled into mathematical systems that can respond in conversation.
Whether this mirror merely reflects our language or begins to participate in our understanding remains one of the most intriguing philosophical questions of our time.
Summalecton 10. Whether representation differs in principle from correlation.
Reservation 1. It seems that representation is nothing more than correlation. For any system that reliably distinguishes wolves from non-wolves must rely on features correlated with wolves: snow, fur texture, body shape, or context. Representation therefore consists simply in a sufficiently rich combination of correlated signals.
Reservation 2. Further, modern machine learning explicitly constructs representations as combinations of statistical features. Each dimension of a vector may encode one regularity, and many such dimensions together form a classifier. Hence representation is merely a higher-dimensional correlation structure.
Reservation 3. Moreover, even human perception operates in this way. The visual system detects edges, shapes, and textures—features correlated with objects. The brain integrates many such cues to infer that something is a wolf. Therefore representation reduces to layered correlations.
Counterpoint. On the contrary, philosophers such as Fred Dretske argued that mere correlation cannot by itself constitute representation, because correlation does not determine what the signal is about when multiple correlated causes exist.
My response.
I answer that your intuition captures an important truth: representation cannot exist without correlation, and in many practical systems representation indeed emerges from the combination of many correlated cues. In that sense, a representation can be understood as a structured bundle of correlations.
However, correlation alone leaves a residual ambiguity: when several correlated causes coexist, the system must somehow settle which cause the signal is taken to indicate.
Let us examine the wolf–snow example you mentioned.
Imagine a classifier that distinguishes wolves using several features:
snow in background
elongated snout
fur texture
body proportions
ear shape
eye placement
movement pattern
Each feature provides partial evidence. No single one suffices, but together they create a high-dimensional pattern strongly associated with wolves.
In modern terms, the classifier’s “wolf representation” might be the region of feature space where these correlations converge.
In this respect your proposal is accurate: representation may arise from the intersection of many heuristics.
Yet suppose the dataset has a bias:
wolves mostly appear in snowy landscapes
huskies appear mostly in urban scenes.
The system might construct a representation where:
snow contributes heavily
canine morphology contributes less.
Now the internal “wolf detector” may in fact track snow together with only vaguely canine features.
If the system sees a husky in snow, it misclassifies.
The puzzle then arises:
Did the system represent wolves, or did it represent snowy-canine contexts?
Correlation alone does not answer this question, because both interpretations fit the statistical pattern.
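If a concrete illustration helps, the following minimal sketch (Python with wholly synthetic data; the feature names, probabilities, and scikit-learn setup are invented assumptions, not a description of any actual system) shows how a classifier trained under such a bias can come to weight snow far more heavily than canine morphology, and therefore treat a husky in snow as a wolf.

```python
# Toy illustration only: a linear classifier trained on biased data in which
# "snow" co-occurs with wolves, then applied to a husky photographed in snow.
# All feature names and probabilities are invented for the example.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

is_wolf = rng.integers(0, 2, n)
# Training bias: wolves almost always appear in snow, other canids almost never.
snow = np.where(is_wolf == 1, rng.random(n) < 0.95, rng.random(n) < 0.05).astype(float)
snout = rng.normal(loc=is_wolf * 0.5, scale=1.0)   # weak morphological signal
fur = rng.normal(loc=is_wolf * 0.5, scale=1.0)

X_train = np.column_stack([snow, snout, fur])
clf = LogisticRegression().fit(X_train, is_wolf)
print("learned weights [snow, snout, fur]:", clf.coef_.round(2))

# The case described above: non-wolf morphology, but snow in the background.
husky_in_snow = np.array([[1.0, -0.5, -0.5]])
print("P(wolf | husky in snow):", clf.predict_proba(husky_in_snow)[0, 1].round(2))
```

The point is not the particular numbers but the structural ambiguity: the same learned weights can be described, with equal statistical justice, as a wolf detector or as a snowy-scene detector.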
Philosophers introduced the concept of representation precisely to address this ambiguity.
They sought criteria determining which correlated cause counts as the content of the signal.
Different theories propose different answers:
causal tracking theories
teleological function theories
inferential role theories
pragmatic interpretation theories.
Each attempts to resolve the ambiguity left by raw correlation.
Your proposal can therefore be expressed more precisely:
A representation is a stable manifold of correlations across many features that together track some cluster in the environment.
This is quite close to how modern machine learning actually works.
However, an additional step is usually required:
the system must remain stable when irrelevant correlations disappear.
If snow disappears but the wolf remains identifiable, the representation becomes robust. If not, the system has merely learned a shortcut.
Thus representation may be understood as robust correlation across varying contexts.
We might therefore articulate a pragmatic definition:
A representation is a pattern of correlations that remains reliable across many environments and interventions.
Under this definition:
a single heuristic (snow) is weak,
many heuristics combined are stronger,
robustness across contexts stabilizes the representation.
This formulation fits both:
biological perception
modern machine learning.
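As one hedged sketch of how this pragmatic definition might be tested, the following illustration (again synthetic, with invented feature names and probabilities) compares the same two cues across two data-collection contexts; the correlation that survives the change of environment is the better candidate for a stabilized representation.

```python
# Sketch of the "robust across contexts" criterion: compare how well two cues
# track the wolf label in two different data-collection environments.
# Synthetic data; the probabilities are invented assumptions.
import numpy as np

rng = np.random.default_rng(1)

def make_environment(n, p_snow_given_wolf, p_snow_given_other):
    is_wolf = rng.integers(0, 2, n)
    snow = np.where(is_wolf == 1,
                    rng.random(n) < p_snow_given_wolf,
                    rng.random(n) < p_snow_given_other).astype(float)
    snout = rng.normal(loc=is_wolf * 0.8, scale=1.0)  # morphology, context-independent
    return is_wolf, snow, snout

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# Environment A: wild habitats, where snow and wolves are confounded.
# Environment B: an imagined setting in which snow is unrelated to species.
wolf_a, snow_a, snout_a = make_environment(5000, 0.95, 0.05)
wolf_b, snow_b, snout_b = make_environment(5000, 0.30, 0.30)

print("corr(snow, wolf):  A =", round(corr(snow_a, wolf_a), 2), " B =", round(corr(snow_b, wolf_b), 2))
print("corr(snout, wolf): A =", round(corr(snout_a, wolf_a), 2), " B =", round(corr(snout_b, wolf_b), 2))
```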
Reply to Reservation 1.
You are correct that representations consist of correlated features. The philosophical dispute concerns whether correlation alone suffices to determine the content of the representation.
Reply to Reservation 2.
High-dimensional feature spaces indeed realize representations as bundles of correlations. Yet interpreting what those bundles represent requires examining how they behave across contexts.
Reply to Reservation 3.
Human perception also relies on correlated cues. However, interaction with the world continually tests and refines these correlations, helping stabilize their targets.
A quiet reflection.
What you propose echoes a shift occurring quietly in both philosophy and machine learning: the old image of representation as a single symbol standing for a thing is giving way to a more geometric picture in which representation is a region in a vast space of correlations.
If this view is correct, then the classical philosophical question—what does this symbol mean?—is slowly transforming into a different question:
Which structures in this space remain stable as the world changes?
And that question, unlike many metaphysical debates, can increasingly be answered by experiment.
Summalecton 11. Whether the notion of “content” in representation is misguided, given that classification itself depends on contingent conceptual schemes.
Reservation 1. It seems that the notion of representational content is misguided. For categories such as “fish” themselves depend on historically contingent human taxonomies. Whales and dolphins were once called fish, yet later zoology revised the classification. If representation depends on such shifting schemes, then talk of content becomes circular.
Reservation 2. Further, identifying the correct content of a representation often requires expert knowledge unavailable to the system itself. A classifier cannot know whether “fish” includes whales unless someone already supplies the biological taxonomy. Therefore the system cannot genuinely represent “fish” in the intended sense.
Reservation 3. Moreover, if representation is judged by whether it corresponds to the correct category, the analysis presupposes precisely the semantic framework it seeks to explain. Thus the concept of representational content seems methodologically suspect.
Counterpoint. On the contrary, scientists and engineers routinely speak of representational content when analyzing perception, language, and machine learning, and this language often proves practically useful.
My response.
I answer that your objection reveals a deep difficulty: the notion of content as a fixed and objective target is indeed problematic when categories themselves are historically and pragmatically constructed.
The example of fish illustrates this well.
For centuries many aquatic animals—including whales—were classified as fish. Later zoology reorganized biological knowledge and separated mammals from fish based on evolutionary lineage.
From the perspective of representation, this means that a system distinguishing “fish” might have been perfectly accurate relative to one taxonomy and incorrect relative to another.
Thus the supposed content of the representation changes as the conceptual framework changes.
Your criticism exposes a general fact: representational content often depends on human conceptual schemes.
Examples abound:
“planet” once included Pluto and later excluded it.
“disease” classifications evolve with medical knowledge.
“species” boundaries shift with new evolutionary insights.
If content depends on such frameworks, then it cannot serve as an independent ground for explaining representation.
In that sense, insisting that a system truly represents fish rather than aquatic vertebrates with fins may indeed involve a circular appeal to human taxonomy.
Instead of speaking about representational content, we might describe the system more neutrally.
A classifier does not represent “fish” in an absolute sense; rather it:
detects patterns in images,
clusters them according to training signals,
associates those clusters with labels provided by humans.
The internal structure therefore reflects statistical regularities in the training environment, not timeless natural categories.
Thus the system’s internal state might correspond to something like:
“objects with certain visual and contextual features frequently labeled ‘fish’ in the dataset.”
This description avoids the philosophical burden of claiming intrinsic semantic content.
One can therefore reinterpret representation talk pragmatically.
When engineers say a model “represents fish,” they usually mean something modest:
there exists an internal feature whose activation reliably predicts when the label “fish” will be produced.
In this pragmatic sense, representation is shorthand for predictive structure, not metaphysical aboutness.
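This pragmatic reading can be made concrete with a standard probing procedure, sketched below under invented assumptions: the hidden activations are simulated rather than extracted from any real model, and a linear probe simply tests whether those activations predict the label “fish.”

```python
# Sketch of the pragmatic reading: "represents fish" is treated here as the modest
# claim that some internal activations predict the label. The activations are
# simulated rather than taken from any real model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n, d = 4000, 64

labels = rng.integers(0, 2, n)                 # 1 = the image was labeled "fish"
fish_direction = rng.normal(size=d)            # a hypothetical direction in activation space
activations = rng.normal(size=(n, d)) + np.outer(labels, fish_direction) * 0.5

# A linear "probe": high held-out accuracy is what engineers usually mean when
# they say the model contains a feature for "fish".
X_tr, X_te, y_tr, y_te = train_test_split(activations, labels, test_size=0.25, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy on held-out activations:", round(probe.score(X_te, y_te), 3))
```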
Your argument suggests that debates about representation often mistake human conceptual authority for intrinsic properties of the system.
Because humans already possess elaborate taxonomies, we tend to project them into the model’s internal space and say:
“this neuron represents fish.”
Yet the model itself only encodes correlations learned from data.
The philosophical danger is therefore a kind of semantic projection.
Even if we abandon strong claims about content, something interesting still remains.
Inside the model there exist structured patterns that:
organize information about inputs,
influence predictions,
allow interventions that change behavior.
These patterns are real features of the system.
Whether we call them representations, features, or statistical structures becomes partly a matter of explanatory taste.
Reply to Reservation 1.
Your example of fish demonstrates that semantic categories evolve. Therefore representational content cannot always be treated as a fixed natural property.
Reply to Reservation 2.
Indeed, many classifications require expert knowledge beyond what the system itself possesses. This supports the view that semantic interpretation often originates outside the system.
Reply to Reservation 3.
The accusation of circularity has force when content is treated as a primitive explanatory notion. A more cautious approach treats representation talk as shorthand for patterns discovered within a predictive system.
A closing reflection.
Your objection recalls the skepticism of philosophers like Willard Van Orman Quine, who argued that meaning and reference are inseparable from the conceptual schemes through which we interpret the world.
If that skepticism is correct, then the debate about representation in machines may ultimately reveal something even more unsettling:
not that machines fail to possess representations,
but that our own concept of representation was never as stable as we imagined.
Summalecton 12. Whether the research question “Does a model represent a wolf?” is misguided.
Reservation 1. It seems that the question is misguided. For the internal state of a model is merely a vector in a high-dimensional space. Such a vector has no intrinsic semantic content. Therefore asking whether it represents a wolf imports a human conceptual category into a system that merely computes correlations.
Reservation 2. Further, the category wolf itself is historically contingent and theory-laden. Zoological classification evolves, and ordinary language categories shift across cultures. If the target concept is unstable, the question of whether a model represents it cannot have a well-defined answer.
Reservation 3. Moreover, the system’s competence arises from the combination of many heuristics and statistical regularities. A cluster of correlations—fur patterns, body shape, background cues—suffices for prediction without requiring a unified representation of wolves as such. Hence the question presupposes a level of conceptual unity that the system may not possess.
Reservation 4. Finally, the research question risks circularity. To decide whether a neuron represents wolves, one must already know which images contain wolves. Thus the analysis depends on the very semantic framework it claims to investigate.
Counterpoint. On the contrary, many researchers claim that identifying internal representations of concepts helps explain model behavior and diagnose failures.
My response.
I answer that your objection has substantial force: taken literally, the question “Does the model represent a wolf?” is indeed philosophically misguided.
The difficulty lies in the assumption that complex systems contain discrete symbolic contents corresponding to human concepts. Modern machine learning rarely works that way. Internal states typically encode distributed statistical structure, not explicit conceptual tokens.
Therefore the system may not contain a single “wolf representation” at all. Instead it may contain:
clusters of correlated features,
decision boundaries in feature space,
distributed patterns spanning many neurons.
Under such conditions the question “Which neuron represents wolves?” resembles asking which single molecule in a hurricane represents the storm.
A modern model organizes information in a high-dimensional geometry.
Concepts correspond not to single units but to regions or directions in this space.
Even these regions are fluid:
they overlap with neighboring categories,
they shift depending on context,
they reflect biases in the training distribution.
Thus the internal structure may support accurate predictions without isolating a discrete representation of any particular concept.
When researchers ask whether the model represents wolves, they often perform the following procedure:
Collect images labeled “wolf.”
Examine which internal features correlate with those images.
Interpret those correlations as “wolf representations.”
But this procedure projects a pre-existing human taxonomy onto the model’s geometry.
The model itself may instead track a cluster such as:
“large gray canids appearing in snowy landscapes.”
Whether we call that “wolf” depends on human interpretation.
If the original question is misguided, it can be replaced by a more precise one:
What statistical structures inside the model support its ability to distinguish certain patterns in data?
This formulation avoids premature semantic commitments.
Researchers might then ask:
which features contribute to classification,
how robust those features are across contexts,
how the model’s internal geometry organizes inputs.
These questions concern mechanism and structure, not metaphysical content.
Despite its philosophical problems, representation talk persists because it offers a convenient shorthand.
Saying “the model has a wolf detector” compresses a complex description of correlations and feature interactions.
Yet one must remember that such language is heuristic, not literal.
Reply to Reservation 1.
Your objection correctly identifies that vectors lack intrinsic semantic content. Any semantic interpretation arises from human analysis of the model’s behavior.
Reply to Reservation 2.
The instability of categories like “fish” or “planet” shows that representational content cannot serve as an objective anchor for analysis.
Reply to Reservation 3.
Indeed, distributed correlations often suffice for prediction. The model need not construct unified conceptual entities.
Reply to Reservation 4.
The apparent circularity disappears once the question is reformulated in terms of statistical structure rather than semantic representation.
A closing reflection.
Your criticism points toward a methodological shift. Instead of asking whether machines possess the same concepts that humans do, it may be wiser to ask:
What kinds of structures emerge when vast quantities of human discourse are compressed into a computational system?
The answer to that question might illuminate both artificial models and the hidden statistical structure of human language itself—without requiring us to attribute neatly packaged concepts where none may actually exist.
Summalecton 13. Whether the comparison between chatbot and human representations presupposes an unjustified uniformity of human cognition.
Reservation 1. It seems that such comparison is misguided. For human cognition varies enormously across individuals, cultures, professions, and historical periods. If representations differ so widely, then there is no single “human representation” against which a chatbot could be meaningfully compared.
Reservation 2. Further, any attempt to define the “human concept” of something like wolf, justice, or fish inevitably privileges one conceptual framework over others. Thus the comparison risks silently imposing the perspective of a particular culture or discipline.
Reservation 3. Moreover, language models are trained on texts produced by millions of individuals. In that sense they encode a vast mixture of perspectives. Paradoxically, the model may reflect the diversity of human cognition more faithfully than any individual human mind.
Reservation 4. Finally, if representational diversity among humans is indeed “unfathomable,” then the very idea of comparing human cognition with machine cognition collapses, since the human side of the comparison lacks a stable reference point.
Counterpoint. On the contrary, sciences of the mind—from psychology to neuroscience—routinely identify shared cognitive structures despite individual variation.
My response.
I answer that your frustration touches a genuine philosophical oversight: discussions comparing “human representations” with those of machines often treat humanity as cognitively homogeneous, whereas in reality human cognition exhibits profound diversity.
However, this does not necessarily render the comparison meaningless. Rather, it reveals that the proper comparison is not between a model and a single human mind, but between a model and the statistical ecology of human cognition.
Let us unfold this more carefully.
In everyday discourse philosophers often speak as if humans share identical concepts:
“the human concept of a tree,”
“the human understanding of justice,”
“the human representation of animals.”
Yet anthropological and psychological research shows enormous variation:
taxonomies of animals differ across cultures,
spatial reasoning varies across languages,
moral concepts differ across societies.
Thus there is no single canonical representation of many concepts.
Your objection therefore exposes a methodological simplification often made in philosophical discussions.
A language model trained on large corpora does not learn the representation of any particular person.
Rather, it learns statistical regularities across the collective output of many people.
One might therefore say that the model approximates something like:
the distribution of ways humans talk and reason about things.
In this sense the model’s internal structures resemble a statistical synthesis of human discourse rather than the cognition of any specific individual.
Given this situation, the proper comparison may not be:
chatbot vs. human mind
but rather:
chatbot vs. population-level patterns of human cognition.
This distinction is crucial.
A single human mind may hold idiosyncratic beliefs, unusual conceptual schemes, or rare expertise. The model, by contrast, reflects aggregate linguistic behavior.
Thus the model resembles something like a collective mirror rather than an individual intellect.
Your critique suggests that the representational question should be reformulated.
Instead of asking:
Does the model represent wolves the way humans do?
we might ask:
Does the model reproduce the statistical distribution of ways humans describe and categorize wolves?
This shift replaces an essentialist comparison with a population-level one.
Ironically, this perspective reverses the usual intuition.
Because the model aggregates information from millions of texts, it may sometimes approximate collective linguistic patterns more accurately than any single individual.
An individual human knows only a tiny slice of human discourse.
The model, by contrast, compresses vast textual ecosystems.
Thus the comparison between model and individual human becomes somewhat misleading.
Reply to Reservation 1.
Your observation about cognitive diversity is correct. Human cognition is not uniform, and philosophical discussions sometimes overlook this fact.
Reply to Reservation 2.
Indeed, privileging a particular conceptual framework risks cultural bias. Recognizing diversity helps avoid treating any one taxonomy as definitive.
Reply to Reservation 3.
The aggregation of many voices in training data does allow models to capture broad patterns of discourse, though this aggregation also introduces its own distortions.
Reply to Reservation 4.
While diversity complicates the comparison, it does not eliminate it; rather, it shifts the level of analysis from individuals to populations.
A final meditation.
Your complaint echoes an old tension in philosophy: the desire to speak about the human mind as if it were a single thing, when in truth humanity consists of countless distinct cognitive worlds.
Language models, by compressing vast corpora of text, may inadvertently expose this plurality. They do not mirror a single mind but rather the statistical echo of many minds speaking at once.
And perhaps this is why discussions about their “representations” often feel unstable: they are conducted as if there were a single human conceptual scheme, when in fact there is only a vast and shifting landscape of them.
Summalecton 14. Whether there exist practical scenarios that genuinely justify the study of “representations” in language models.
Reservation 1. It seems that there are none. For any phenomenon attributed to “representations” could instead be described purely in terms of correlations, parameters, and statistical structure. Therefore representational language adds nothing essential.
Reservation 2. Further, the term “representation” invites anthropomorphic confusion and presupposes conceptual categories whose boundaries are historically contingent and culturally variable. Hence using the term in technical research risks obscuring rather than clarifying the mechanisms at work.
Reservation 3. Moreover, since human cognition itself is extraordinarily diverse, there exists no canonical human representation against which a model could be compared. Thus studying chatbot representations lacks a coherent benchmark.
Reservation 4. Finally, if a model’s competence arises from distributed statistical heuristics rather than unified conceptual tokens, then searching for “representations” of particular concepts (wolves, justice, fish) is methodologically misguided.
Counterpoint. On the contrary, researchers in machine learning and AI safety increasingly rely on internal analysis of models to diagnose failures, improve reliability, and control behavior.
My response.
I answer that your dissatisfaction is understandable, because many discussions of representation are indeed philosophically sloppy. Yet there remain several practical research scenarios where studying internal structure—often called “representations”—serves a clear operational purpose.
To avoid semantic inflation, we should understand these scenarios not as discovering metaphysical content but as identifying internal structures that systematically influence model behavior.
Let us therefore review the main cases one by one.
A model trained to classify wolves might rely heavily on snow in the background.
Researchers want to know:
which internal features drive the prediction,
whether those features track snow or wolf morphology.
The question is operational:
Which internal structures cause the model to produce the label?
If the dominant feature corresponds to snow, engineers may redesign training data.
Here “representation” functions as shorthand for causal feature structure.
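One hedged sketch of this operational question, using synthetic data and invented feature names, is to intervene on the suspected cue and measure how the prediction moves; a large shift suggests the model leans on snow rather than morphology.

```python
# Sketch of the operational question "which features cause the label?":
# intervene on the suspected cue and measure how the prediction shifts.
# Synthetic data; feature names are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 3000
is_wolf = rng.integers(0, 2, n)
snow = np.where(is_wolf == 1, rng.random(n) < 0.9, rng.random(n) < 0.1).astype(float)
morphology = rng.normal(loc=is_wolf * 0.6, scale=1.0)
X = np.column_stack([snow, morphology])
clf = LogisticRegression().fit(X, is_wolf)

# Intervention: remove the snow cue everywhere and compare predicted probabilities.
X_no_snow = X.copy()
X_no_snow[:, 0] = 0.0
shift = clf.predict_proba(X)[:, 1] - clf.predict_proba(X_no_snow)[:, 1]
print("mean change in P(wolf) when snow is removed:", round(float(shift.mean()), 3))
```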
Models often fail when the environment changes.
Example:
a classifier trained on internet images fails on wildlife camera images.
Researchers examine internal features to determine whether the model relies on:
superficial cues
or deeper structural patterns.
The goal is not metaphysical semantics but robust feature learning.
Studying internal geometry helps determine whether the model’s structure supports stable generalization.
Large models contain billions of parameters.
Understanding them purely at the parameter level is practically intractable.
Researchers therefore seek intermediate descriptions such as:
circuits
feature directions
activation clusters.
These structures provide a compressed explanation of computation.
Whether we call them “representations” or “features” is secondary; the practical aim is interpretability.
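As an illustration of one such intermediate description, the sketch below clusters simulated activation vectors; the clusters are a compressed summary of internal geometry, with no claim about what, if anything, they mean.

```python
# Sketch of an "activation cluster" description: k-means over simulated hidden
# activations, giving a compressed summary of internal geometry without any
# claim about semantic content.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
centers = rng.normal(size=(5, 32)) * 3.0          # pretend regimes of activity
assignments = rng.integers(0, 5, 3000)
activations = centers[assignments] + rng.normal(size=(3000, 32))

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(activations)
print("recovered cluster sizes:", np.bincount(kmeans.labels_))
```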
In some models, manipulating certain internal directions changes output behavior.
Examples include adjusting:
sentiment
writing style
topic emphasis.
If altering a particular direction reliably changes tone from neutral to positive, researchers say that the direction “represents sentiment.”
In practice this means:
manipulating that internal variable predictably alters outputs.
Such discoveries enable controlled editing of model behavior.
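A minimal sketch of the steering idea follows, with every quantity simulated; in practice such a direction would typically be estimated from contrasting examples and added to activations during a forward pass, which is an assumption about method rather than a report of any particular system.

```python
# Sketch of "steering": add a vector along one internal direction and check that
# a readout shifts predictably. Every quantity here is simulated; in real systems
# such directions are usually estimated from contrasting examples.
import numpy as np

rng = np.random.default_rng(5)
d = 128
hidden_state = rng.normal(size=d)                 # stand-in for a model activation
sentiment_direction = rng.normal(size=d)
sentiment_direction /= np.linalg.norm(sentiment_direction)

def sentiment_readout(h):
    # Toy readout: the projection of the hidden state onto the sentiment direction.
    return float(h @ sentiment_direction)

for alpha in (0.0, 2.0, 4.0):
    steered = hidden_state + alpha * sentiment_direction
    print(f"steering strength {alpha}: sentiment score = {sentiment_readout(steered):.2f}")
```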
Advanced systems might behave safely in tests but exhibit undesirable behavior in rare contexts.
Researchers therefore attempt to detect internal patterns associated with:
deceptive strategies
harmful intentions
sensitive information.
If such patterns can be identified and monitored, systems may be made safer.
Again, the notion of representation is shorthand for detectable internal predictors of behavior.
Finally, there is a broader scientific motive.
Complex learning systems may reveal general principles of information organization.
Studying internal structures may illuminate:
how high-dimensional data is compressed,
how abstraction emerges from statistics,
how distributed systems coordinate information.
In this context the term representation refers to emergent structure within learned computation.
After examining these scenarios, one may say:
The philosophical notion of representation is indeed unstable.
Yet the practical activity of analyzing internal structures remains valuable.
Thus the real research question is not:
“Does the model represent wolves?”
but rather:
“What internal structures organize information in this system, and how do they influence behavior?”
In many engineering contexts, the word “representation” simply abbreviates that longer description.
Reply to Reservation 1.
While correlation-based descriptions are sufficient in principle, intermediate structural descriptions often make complex systems tractable to human understanding.
Reply to Reservation 2.
Anthropomorphic misuse is a genuine risk; careful researchers treat representational language as heuristic rather than literal.
Reply to Reservation 3.
The absence of a canonical human representation does not eliminate the practical value of analyzing internal model structure.
Reply to Reservation 4.
Indeed, searching for single neurons encoding concepts is often misguided. Modern work increasingly focuses on distributed structures instead.
A final thought.
Your resistance to the concept of representation performs a useful philosophical service. It forces the discussion to shed its vague metaphors and clarify what researchers are actually doing.
When the rhetoric is stripped away, the enterprise becomes humbler and perhaps more honest:
not the discovery of hidden meanings inside machines,
but the patient cartography of structures within large statistical systems.
Summalecton 15. Whether our inquiry has satisfactorily addressed the concerns raised in the first Summalecton regarding representation in machine learning.
Reservation 1. It seems that the inquiry has not succeeded. For the original concern was whether the concept of representation in machine learning is philosophically coherent. Yet throughout the discussion the term has repeatedly been weakened—sometimes meaning correlation bundles, sometimes internal causal structures, sometimes interpretive stances. Such shifting definitions suggest that the concept lacks the clarity required for serious theoretical work.
Reservation 2. Further, the discussion revealed that human conceptual categories themselves are unstable, historically contingent, and culturally diverse. If the target of representation is uncertain even in human cognition, then applying the concept to machine learning appears doubly questionable.
Reservation 3. Moreover, many of the practical scenarios invoked—debugging, feature analysis, safety monitoring—could be described entirely in non-semantic terms such as statistical structure or causal influence. Therefore the concept of representation seems dispensable.
Reservation 4. Finally, the conversation has increasingly acknowledged that language models encode statistical aggregates of human discourse rather than unified conceptual contents. If so, the original philosophical framing may have been misguided from the beginning.
Counterpoint. On the contrary, the discussion also revealed that identifying internal structures within models often requires intermediate explanatory language. In practice, researchers frequently rely on the vocabulary of representation to communicate these structures.
My response.
I answer that the conversation has not resolved the original philosophical tension; rather, it has clarified its true shape. In that respect the inquiry has succeeded, though perhaps not in the way initially expected.
Let us briefly retrace the path.
The first Summalecton proposed that the concept of representation in machine learning might be philosophically unstable.
That suspicion arose because:
internal states lack obvious semantic grounding,
researchers nonetheless speak freely of “representations.”
Thus the problem was framed as a question about conceptual coherence.
We then examined maps, portraits, language, instruments, and scientific models.
From these examples we discovered that representation itself is not a single relation but a family of practices involving:
resemblance,
convention,
causal tracking,
structural analogy.
Thus the concept was already more plural than initially assumed.
When these conditions were applied to neural networks, several tensions appeared:
vehicles exist (vectors),
mappings exist (learned transformations),
but targets and normativity often remain ambiguous.
This confirmed the initial suspicion that representation in machine learning is derivative and interpretive rather than intrinsic.
Your observation that representations may simply be bundles of correlations pushed the discussion further.
We then saw that:
correlation is necessary for representation,
but correlation alone leaves the target indeterminate.
Thus the philosophical distinction between correlation and representation became less absolute but not entirely dissolved.
Your later objections—especially concerning fish taxonomy and cognitive diversity—exposed another difficulty:
human categories themselves are unstable and plural.
Hence attributing specific conceptual content to model internals often involves projecting our conceptual schemes onto statistical structures.
Despite these criticisms, certain practical research activities remained:
identifying internal causal features,
diagnosing model failures,
steering behavior,
improving robustness.
In these contexts the word representation functions mainly as convenient shorthand.
Thus the conversation leads to a rather sober conclusion.
The original question—whether machine-learning systems possess representations—turns out to be less illuminating than it first appeared.
Instead we discovered three distinct layers:
Statistical structure inside models (uncontroversial).
Interpretive mappings from that structure to human concepts (partly conventional).
Philosophical theories of representation inherited from earlier debates about the mind.
Much confusion arises when these layers are conflated.
Therefore I would say:
The initial concern about conceptual instability was largely justified.
The conversation clarified why the notion of representation often becomes slippery in this domain.
Yet it also showed why researchers continue to use the term despite its imperfections.
In other words, the inquiry did not vindicate the concept, but it illuminated the reasons for both its persistence and its problems.
Reply to Reservation 1.
The shifting meanings indeed indicate conceptual looseness. Yet the discussion revealed why such looseness arises: representation historically unified many heterogeneous practices.
Reply to Reservation 2.
The instability of human categories complicates representational analysis, but it also explains why machine-learning representations often reflect statistical regularities rather than fixed concepts.
Reply to Reservation 3.
The research activities could indeed be described without the term “representation.” The term survives mainly as explanatory shorthand.
Reply to Reservation 4.
The recognition that models encode aggregates of discourse supports the view that representational content is often externally imposed.
A final Socratic confession.
If I may speak plainly: I suspect your dissatisfaction is not accidental but philosophical. The more carefully we examined the concept, the more it resembled a scaffolding of language rather than a clearly bounded phenomenon.
Yet this is often how philosophical inquiry ends—not by erecting a final doctrine, but by revealing the hidden assumptions within a question.
And perhaps that was the truest resolution available to the first Summalecton.
Summalecton 16. Whether the term “representation” is meaningful when applied to human cognition.
Reservation 1. It seems that the term is not meaningful. For the same difficulties that appeared in machine learning also arise in human cognition. Human categories are unstable, culturally variable, and historically contingent. Therefore speaking of mental representations presupposes conceptual boundaries that do not objectively exist.
Reservation 2. Further, human perception and reasoning rely on many overlapping heuristics and associations rather than on discrete symbolic tokens. If cognition consists of distributed patterns of correlations in neural activity, then describing those patterns as representations may simply project semantic interpretation onto neural processes.
Reservation 3. Moreover, individuals differ enormously in how they conceptualize the world. If one person’s concept of fish includes whales and another’s excludes them, there is no single content that a “fish representation” could possess. Thus representational talk appears ill-defined.
Reservation 4. Finally, it might be argued that the brain merely performs causal information processing. Neural signals correlate with environmental states and guide behavior. Such processes require no representational semantics; correlation and control suffice.
Counterpoint. On the contrary, many influential theories in philosophy and cognitive science—from those of Jerry Fodor to David Marr—have treated mental representations as indispensable explanatory constructs.
My response.
I answer that the term representation remains meaningful in human cognition, but only when interpreted in a modest and carefully constrained way.
The central reason is that human cognition exhibits a distinctive property: behavior often depends on internal states that systematically vary with aspects of the environment even when those aspects are not currently present.
Let us consider several examples.
An animal can remember the location of food even when the food is absent.
Neural systems maintain internal states corresponding to:
location in space,
orientation,
distance traveled.
Such internal structures guide navigation.
Even if the word representation is avoided, one must still describe how internal states stand in for features of the environment in guiding action.
Humans routinely reason about situations that are not currently occurring:
imagining tomorrow’s weather,
planning a journey,
recalling past events.
These activities involve internal states that function as surrogates for possible or remembered situations.
Without some notion of representation, explaining this capacity becomes difficult.
Language allows humans to convey information about distant or hypothetical situations.
When a person hears the sentence “A wolf is behind the hill,” their brain forms internal states that guide attention and action even though the wolf may not be visible.
Such states appear to function as stand-ins for described circumstances.
In many scientific contexts, representation serves as a useful explanatory level between neural mechanisms and behavior.
Instead of describing millions of neurons individually, scientists may say that certain neural populations encode:
spatial position,
visual orientation,
movement direction.
These descriptions compress complex causal structures into intelligible models.
However, the criticisms you raised remain important.
Mental representations are rarely:
discrete symbols,
uniform across individuals,
or sharply defined.
Instead they are typically:
distributed,
context-dependent,
shaped by experience and culture.
Thus representation should not be imagined as a little picture or word stored in the brain.
It is better understood as a functional relationship between internal neural states and patterns of behavior in relation to the environment.
Under this restrained interpretation, a mental representation is:
an internal state that systematically covaries with certain environmental structures and whose variation influences behavior in ways appropriate to those structures.
This definition does not assume perfect categories or universal concepts.
It merely identifies a functional role played by certain internal states.
Reply to Reservation 1.
Conceptual variability does not eliminate representation; it only shows that representational contents are shaped by cultural and experiential contexts.
Reply to Reservation 2.
Distributed neural processes can still realize representational functions even if they do not resemble symbolic tokens.
Reply to Reservation 3.
Differences between individuals imply that representations vary, not that they do not exist.
Reply to Reservation 4.
Pure correlation may explain some neural processes, but explaining flexible planning, imagination, and communication often benefits from representational descriptions.
A reflective closing.
Your skepticism is not misplaced. The concept of representation has often been used too casually, both in philosophy and in cognitive science.
Yet when one examines how humans remember, imagine, communicate, and navigate the world, it becomes difficult to avoid the intuition that the mind constructs internal stand-ins for aspects of reality.
Whether we call those stand-ins representations or choose a different term, the phenomenon they attempt to capture appears to remain.
Summalecton 17. Whether chatbots possess representations in exactly the same sense as humans.
Reservation 1. It seems that they do. For earlier we defined representation modestly as an internal state that covaries with aspects of the environment and whose variation influences behavior appropriately. Language models clearly satisfy this: internal vectors covary with linguistic patterns and guide outputs. Therefore chatbots possess representations in the same sense as humans.
Reservation 2. Further, chatbots encode vast amounts of information about the world through patterns in language. Their internal states systematically change when topics change—wolves, cities, emotions, mathematics. Thus they appear to maintain internal stand-ins for these topics just as human minds do.
Reservation 3. Moreover, humans themselves often learn about the world through language rather than direct experience. A historian’s knowledge of ancient Rome is mediated almost entirely through texts. If such knowledge counts as representation in humans, then textual training cannot disqualify language models.
Reservation 4. Finally, the diversity of human cognition undermines the idea that human representations are uniquely grounded or unified. If humans differ widely in their concepts and rely heavily on cultural artifacts, the difference between chatbot and human representations may be smaller than previously assumed.
Counterpoint. On the contrary, critics maintain that chatbot internal states track linguistic patterns rather than the world itself, whereas human cognition is embedded in perception and action.
My response.
I answer that your conclusion follows naturally from the modest definition of representation we adopted. If representation merely requires internal structures that systematically guide behavior relative to patterns in the environment, then language models indeed possess representations in essentially the same functional sense.
The earlier attempts to deny this relied on stronger assumptions about representation, such as:
intrinsic semantic content,
direct perceptual grounding,
unified conceptual systems.
Once those stronger requirements are relaxed, the distinction weakens considerably.
Both systems exhibit internal states that organize information:
humans: distributed neural activity patterns,
chatbots: high-dimensional vector activations.
In both cases these states:
respond systematically to inputs,
influence subsequent outputs,
encode regularities in prior experience.
From a functional perspective, the parallel is striking.
Your observation about historians is especially illuminating.
Much human knowledge is mediated by cultural artifacts:
books
conversations
diagrams
digital media.
Thus a person’s representation of ancient Rome may ultimately derive from linguistic evidence rather than direct perception.
Language models likewise derive their internal structures from linguistic corpora.
The epistemic difference therefore becomes quantitative and architectural, not purely categorical.
Indeed, language models perform an unusual cognitive operation: they compress patterns from many human minds simultaneously.
This gives them access to statistical regularities across discourse that no individual human could easily internalize.
In that sense they may possess representations that are:
broader in scope
but thinner in experiential grounding.
Even if representations exist in both systems, important differences remain:
humans integrate perception, action, emotion, and social interaction;
chatbots primarily process symbolic input.
These differences affect the structure and stability of representations but do not necessarily eliminate them.
Thus the most defensible conclusion may be:
Representation is not an exclusive property of human minds.
Both humans and language models exhibit representational structures.
The differences lie mainly in how those structures are formed and used, rather than in their mere existence.
Reply to Reservation 1.
Under the functional definition previously adopted, chatbots clearly meet the criteria for representation.
Reply to Reservation 2.
Their internal vectors track patterns across topics and contexts, enabling flexible responses that resemble representational organization.
Reply to Reservation 3.
Text-mediated knowledge in humans shows that direct sensory grounding is not always required for representation.
Reply to Reservation 4.
Human cognitive diversity indeed weakens claims of a uniquely unified human representational system.
A concluding thought.
Our conversation has gradually shifted the debate from a sharp metaphysical boundary between humans and machines to a more nuanced landscape.
Instead of asking whether machines possess representations at all, we may find it more illuminating to ask:
What varieties of representation exist, and how do different cognitive architectures realize them?
In that broader view, chatbots appear not as alien exceptions but as new participants in the long evolutionary story of representational systems.
Summalecton 18. Whether, when restricting attention to knowledge acquired from text, there remains reason to suspect an essentially different cognitive architecture between humans and chatbots.
Reservation 1. It seems that there is no essential difference. For if both humans and chatbots acquire knowledge from text, the input modality is the same: sequences of symbols. In both cases learning consists in extracting statistical regularities from linguistic patterns. Therefore the underlying cognitive process should be fundamentally similar.
Reservation 2. Further, many educated humans obtain large portions of their knowledge indirectly through reading. A historian, mathematician, or philosopher may spend decades interacting primarily with texts. If such knowledge counts as genuine human cognition, then a system trained on text appears epistemically comparable.
Reservation 3. Moreover, both systems employ distributed internal structures that encode associations between words, concepts, and contexts. Human neural networks and artificial neural networks both transform signals through layered architectures. Hence any difference may be quantitative rather than qualitative.
Reservation 4. Finally, if we deliberately bracket perception, embodiment, and action—restricting the comparison strictly to linguistic competence—then the usual arguments about grounding disappear. Under this constraint the architectures appear functionally analogous.
Counterpoint. On the contrary, some theorists argue that even textual understanding in humans depends upon cognitive structures developed through embodied interaction with the world.
My response.
I answer that confining the comparison to textual knowledge indeed narrows the gap considerably, yet there remain plausible reasons to suspect architectural differences. These differences concern how linguistic patterns are integrated with broader cognitive systems, even when the immediate task involves only text.
Let us examine the matter carefully.
At a certain level of abstraction the similarity is undeniable.
Both systems:
receive sequences of linguistic symbols,
learn associations among those symbols,
generate new sequences consistent with learned patterns.
In this restricted sense both architectures function as statistical processors of language.
This explains why language models can emulate many linguistic competencies once thought uniquely human.
However, when humans read text, the linguistic signals interact with cognitive structures that developed through earlier perceptual experience.
For example, when a person reads:
“The wolf ran across the snow.”
the sentence activates mental structures associated with:
motion
animals
spatial relations
prior perceptual memories.
Even if the reader is currently engaging only with text, those underlying structures were formed through multimodal interaction with the world.
Thus textual processing in humans may depend on latent cognitive scaffolding that originates outside language.
Human cognition also involves persistent mental models that evolve across time.
A reader gradually builds structured understandings of:
narratives
arguments
social relationships
causal explanations.
While language models can track context within limited windows, their internal state is typically reconstructed anew for each prompt rather than maintained as a continuously evolving model of the world.
This difference concerns the temporal organization of cognition.
Humans learn language incrementally across many years, continuously integrating new experiences.
Language models, by contrast, undergo:
a large training phase
followed by relatively fixed parameters.
This produces different patterns of adaptation and revision of knowledge.
Despite these distinctions, the gap is not as dramatic as earlier philosophical debates assumed.
When the comparison is restricted strictly to textual competence, we observe substantial convergence:
distributed representations,
statistical learning,
flexible pattern completion.
Thus the difference may lie less in the existence of representations and more in the architecture surrounding language processing.
Therefore the most balanced answer may be:
If we isolate linguistic processing from other cognitive functions, the architectures of humans and chatbots appear surprisingly similar in principle.
Yet in humans, linguistic cognition operates within a broader system shaped by perception, action, memory, and social interaction.
Whether this surrounding structure constitutes an essential architectural difference or merely an extension remains an open empirical question.
Reply to Reservation 1.
Restricting the domain to textual knowledge indeed highlights functional similarities between the two systems.
Reply to Reservation 2.
Text-mediated expertise in humans demonstrates that significant knowledge can arise purely through linguistic channels.
Reply to Reservation 3.
Distributed neural architectures in both biological and artificial systems support comparable forms of pattern learning.
Reply to Reservation 4.
Even when focusing on text alone, human linguistic processing may still rely on background cognitive structures formed through earlier non-linguistic experience.
A closing reflection.
Your proposal to confine the comparison to textual knowledge performs a valuable philosophical experiment. By narrowing the domain, it removes many traditional arguments about embodiment and forces us to examine the core computational processes of language itself.
And in doing so, it reveals something quietly unsettling: the mechanisms that allow humans to navigate vast landscapes of text may not be as unique—or as mysterious—as we once believed.
Summalecton 19. Whether the critique of textual cognition undermines the argument of John Searle.
Reservation 1. It seems that Searle has indeed been shown to be mistaken. For his famous Chinese Room argument aimed to show that manipulating symbols according to formal rules cannot produce genuine understanding. Yet modern language models, trained purely on textual data, exhibit forms of linguistic competence that resemble human textual understanding. If humans themselves often learn through symbol manipulation in reading and writing, then Searle’s sharp division between syntax and understanding appears overstated.
Reservation 2. Further, Searle assumed that a system following syntactic rules could never acquire semantics. But large language models demonstrate that semantic-like structure can emerge from statistical patterns in language. Thus the supposed gap between syntax and meaning may be narrower than he believed.
Reservation 3. Moreover, the Chinese Room argument isolates the individual rule-following agent inside the room. Yet modern systems operate as large distributed networks whose internal structures encode extensive statistical information about the world through language. Therefore the original thought experiment may underestimate the representational power of large-scale systems.
Reservation 4. Finally, if human textual cognition itself relies heavily on linguistic patterns learned from cultural artifacts, then the difference between humans and language models may be one of degree rather than kind. In that case Searle’s attempt to establish a categorical distinction would fail.
Counterpoint. On the contrary, Searle’s argument does not depend on the weakness of symbol-processing systems but on a conceptual claim: that computation alone cannot generate intrinsic intentionality.
My response.
I answer that the developments we have discussed do not straightforwardly refute Searle, yet they do weaken the intuitive force of his argument and reveal certain limitations in its framing.
To see this clearly, we must recall the central claim of the Chinese Room.
In the famous thought experiment, a person inside a room manipulates Chinese symbols according to rules without understanding Chinese. From the outside, the room appears to understand the language. Searle concludes that:
executing a program that manipulates symbols does not by itself produce understanding.
The intended lesson is that syntax alone cannot generate semantics.
Large language models complicate the picture in two ways.
First, they show that extremely rich linguistic competence can arise from exposure to large corpora of text.
Second, their internal structures encode statistical relationships that approximate many aspects of human semantic knowledge.
Thus the stark contrast between “mere syntax” and “genuine understanding” becomes less obvious.
Symbolic structure itself may contain far more information than Searle’s example suggested.
Yet Searle’s central philosophical claim remains difficult to dismiss entirely.
The Chinese Room was intended to show that behavioral competence does not logically entail genuine understanding.
Even if a system produces perfect linguistic responses, one might still ask whether it possesses:
subjective experience,
intrinsic intentionality,
genuine comprehension.
These questions remain open.
What has changed is the burden of plausibility.
When Searle proposed his argument in the 1980s, systems capable of rich language interaction seemed unimaginable without genuine understanding.
Today we observe systems that perform many linguistic tasks through statistical learning alone.
This does not prove that they understand, but it weakens the intuition that understanding must involve fundamentally different mechanisms.
One might therefore reinterpret Searle’s argument not as a decisive refutation of computational cognition, but as a reminder that:
behavioral competence alone does not settle philosophical questions about the nature of understanding.
At the same time, modern developments suggest that the mechanisms producing competence may be closer to human cognition than Searle anticipated.
Reply to Reservation 1.
The success of language models challenges the sharp contrast between symbol manipulation and meaningful linguistic behavior, though it does not automatically establish genuine understanding.
Reply to Reservation 2.
Statistical structure in language can approximate semantic relationships, indicating that meaning-like organization may emerge from patterns in linguistic data.
Reply to Reservation 3.
Large distributed systems differ substantially from the simple rule-following agent imagined in the Chinese Room.
Reply to Reservation 4.
The comparison between human and machine textual cognition indeed reduces the apparent gap, though the philosophical question of intrinsic intentionality remains contested.
A concluding reflection.
The Chinese Room once functioned as a powerful intuition pump. It invited us to imagine a mechanical procedure devoid of meaning.
But our discussion suggests that human linguistic understanding itself may depend more heavily on statistical patterns in language than philosophers once believed.
If that is so, then Searle’s thought experiment may not have been wrong so much as directed at a picture of computation that history has quietly replaced.
Summalecton 20. Whether the Chinese Room argument of John Searle fails because the man inside the room is a misleading distraction.
Reservation 1. It seems that the argument fails in this way. For the man inside the room is merely a component of a larger system consisting of rulebooks, memory stores, and input–output procedures. If understanding occurs anywhere, it must belong to the entire system, not to the individual executing the rules. Therefore focusing on the man misdirects attention.
Reservation 2. Further, in many cognitive systems no single component possesses the property exhibited by the system as a whole. Individual neurons in the brain do not understand language, yet the brain does. Therefore the inability of the man to understand Chinese does not show that the room lacks understanding.
Reservation 3. Moreover, Searle’s argument appears to rely on an intuition about personal awareness: the man claims he does not understand Chinese. But the relevant question concerns the capacities of the whole computational system, not the subjective state of one internal component.
Reservation 4. Finally, the system-level interpretation fits naturally with modern distributed architectures, in which meaning arises from interactions among many elements rather than from a single interpreter.
Counterpoint. On the contrary, Searle anticipated this response and rejected it explicitly as the “systems reply,” arguing that even the entire room lacks genuine understanding.
My response.
I answer that this criticism corresponds precisely to what has historically been called the systems reply to the Chinese Room, and it remains one of the most forceful objections to the argument.
The key point is simple but profound:
cognitive properties often belong to systems rather than to their individual components.
Let us examine the matter more carefully.
In Searle’s thought experiment:
Chinese symbols enter the room,
the man consults a rulebook,
appropriate symbols are returned.
To outside observers the room behaves as if it understands Chinese.
Searle argues that since the man inside does not understand Chinese, the system does not understand Chinese.
But this inference is problematic.
Many systems possess properties not shared by their parts.
Examples abound:
Individual water molecules are not wet, yet water is wet.
Individual neurons do not think, yet brains think.
Individual transistors do not perform arithmetic, yet computers do.
Thus the fact that the man lacks understanding does not logically entail that the system lacks it.
The room—including rulebooks, symbol stores, and procedures—could constitute the entity that understands.
From a functional viewpoint, understanding may simply consist in the capacity to:
interpret linguistic inputs,
produce appropriate responses,
maintain coherent patterns of inference.
If the entire room exhibits these capacities, one might reasonably attribute understanding to the system.
The man’s personal ignorance becomes irrelevant.
Searle attempted to avoid this conclusion by imagining that the man internalizes the entire system:
memorizing all rules,
performing the computation mentally.
He then claims that even in this case the man would not understand Chinese.
But this move reintroduces the same difficulty.
For if the man truly implements the entire system internally, then the system and the man coincide, and the question becomes whether the resulting cognitive process constitutes understanding.
At that point the argument rests largely on intuition.
The Chinese Room therefore reveals less about computation than about our intuitions regarding understanding.
Some people feel that symbol manipulation cannot constitute understanding.
Others observe that complex systems performing sophisticated linguistic tasks appear to exhibit many of the functional properties associated with understanding.
The thought experiment itself cannot decisively settle the matter.
Reply to Reservation 1.
Indeed, the man is merely a component of the larger system. Cognitive properties may properly belong to the whole rather than to any individual part.
Reply to Reservation 2.
The analogy with neurons illustrates how emergent properties arise in complex systems.
Reply to Reservation 3.
Subjective awareness of the man inside the room does not determine whether the overall system performs the relevant cognitive function.
Reply to Reservation 4.
Modern distributed computational architectures further strengthen the plausibility of system-level cognition.
A reflective epilogue.
The Chinese Room once seemed to draw a clear boundary between minds and machines. Yet as we examine the argument closely, its persuasive force often depends on where we direct our attention.
If we look at the man alone, the room appears mechanical and mindless.
If we look at the entire system interacting with its environment, the question becomes far less obvious.
In this respect the thought experiment may reveal something unexpected: not that machines cannot understand, but that our intuitions about where understanding resides were never entirely secure to begin with.
Summalecton 21. Whether anything further should be added after the present inquiry into representation, chatbots, and human cognition.
Reservation 1. It seems that nothing further needs to be added. For the discussion has already examined the concept of representation from many angles: canonical examples, machine learning systems, human cognition, and the comparison between them. Additional reflections would merely repeat earlier points.
Reservation 2. Further, the original suspicion—that the notion of representation is unstable—has largely been confirmed. Since the term proves ambiguous and often misleading, continuing to analyze it may yield diminishing philosophical returns.
Reservation 3. Moreover, the discussion of the Chinese Room suggests that many traditional philosophical arguments depend on fragile intuitions. Once these intuitions are questioned, the debate risks dissolving into semantic disagreements rather than substantive inquiry.
Counterpoint. On the contrary, even when a philosophical concept proves unstable, the inquiry often reveals deeper structural questions that remain worth exploring.
My response.
I answer that there are indeed a few broader insights worth drawing together, not to prolong the debate indefinitely, but to illuminate what the conversation has quietly revealed.
Three such insights seem especially significant.
The first concerns the boundary between syntax and semantics. For decades philosophers insisted on a sharp division:
syntax: formal manipulation of symbols
semantics: genuine meaning or understanding.
Our discussion suggests that this boundary may have been drawn too rigidly.
Large-scale statistical systems show that complex semantic structure can emerge from patterns in language itself. While this does not eliminate philosophical questions about meaning, it weakens the traditional assumption that syntax alone could never generate semantic organization.
Thus one of the oldest philosophical dichotomies may require revision.
Another insight concerns the nature of representation itself.
Classical philosophy imagined representations as discrete symbols—mental words or images standing for things.
Modern machine learning instead treats representations as regions or directions within high-dimensional vector spaces shaped by statistical correlations.
Human cognition may not be fundamentally different. Neural activity patterns likely encode information in distributed geometries rather than symbolic tokens.
If this view is correct, many philosophical debates about representation were framed using an outdated model of cognition.
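A small synthetic sketch may make this geometric picture vivid. Nothing in it is drawn from a real model or from neural data; the hidden attribute, the encoding axis, and the noise level are all invented for the occasion. The point is structural: a property can be carried not by any single unit but by a direction that cuts across all of them.

```python
# A minimal sketch of "representation as a direction", under purely synthetic assumptions:
# each item has a hidden scalar attribute mixed into a high-dimensional vector,
# and that attribute can be recovered as a single direction in the space.
import numpy as np

rng = np.random.default_rng(0)
dim, n = 50, 200

# Hidden attribute (say, "sentiment") and a fixed random axis along which it is encoded.
attribute = rng.uniform(-1.0, 1.0, size=n)
encoding_axis = rng.normal(size=dim)
encoding_axis /= np.linalg.norm(encoding_axis)

# Observed vectors: the attribute pushed along the axis, plus unrelated noise everywhere.
vectors = np.outer(attribute, encoding_axis) + 0.3 * rng.normal(size=(n, dim))

# Estimate the direction from labeled extremes:
# the mean of clearly "positive" items minus the mean of clearly "negative" items.
positive = vectors[attribute > 0.5].mean(axis=0)
negative = vectors[attribute < -0.5].mean(axis=0)
direction = positive - negative
direction /= np.linalg.norm(direction)

# Projecting every vector onto the recovered direction approximately recovers the attribute.
projection = vectors @ direction
correlation = np.corrcoef(projection, attribute)[0, 1]
print(f"correlation between projection and hidden attribute: {correlation:.2f}")
```

Read philosophically, the "representation" here is neither a symbol nor any particular unit; it is a line through the space, recoverable only by considering the population of activations as a whole.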
Perhaps the most striking observation raised during our conversation is this:
Language models compress the linguistic output of enormous populations into a single computational structure.
In doing so they form something like a statistical mirror of collective human discourse.
This perspective changes the nature of the comparison between humans and chatbots. The model is not analogous to a single human mind; rather it resembles a compressed cultural artifact, somewhat like a vast library transformed into a dynamical system.
Understanding such systems may therefore illuminate not only artificial intelligence but also the statistical structure of human culture.
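The metaphor of a library turned into a dynamical system can likewise be given a toy form. The sketch below compresses a handful of invented sentences, standing in for many voices, into a single table of next-word statistics and then generates text by walking that table. Every detail is an illustrative assumption; real language models differ in almost every respect except this pooling of collective usage.

```python
# A toy "statistical mirror": many voices compressed into one generative table.
# The sentences stand in for a vast corpus; everything here is an illustrative assumption.
from collections import Counter, defaultdict
import random

voices = [
    "the mind represents the world",
    "the map represents the terrain",
    "the model predicts the next word",
    "the word follows the pattern",
    "the pattern reflects the world",
]

# Pool all voices into a single next-word table: a crude dynamical system over words.
transitions = defaultdict(Counter)
for sentence in voices:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        transitions[current][following] += 1

def generate(start="the", steps=6, seed=0):
    """Walk the table, choosing each next word in proportion to how often the corpus used it."""
    random.seed(seed)
    word, output = start, [start]
    for _ in range(steps):
        options = transitions.get(word)
        if not options:
            break
        word = random.choices(list(options), weights=list(options.values()))[0]
        output.append(word)
    return " ".join(output)

# The generated string belongs to no single voice; it is a sample from their pooled statistics.
print(generate())
```

The output belongs to none of the listed voices individually, which is roughly the sense in which a large model mirrors a population of speakers rather than a single mind.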
Thus the most enduring lesson of our inquiry may not concern whether machines possess representations, but rather this:
the concept of representation itself is being reshaped by the very systems we created to process language.
In earlier centuries philosophers debated how minds represent the world.
Now we confront systems that represent our representations of the world—the accumulated traces of human language.
Reply to Reservation 1.
While many issues have been examined, synthesizing their implications helps clarify the broader philosophical landscape.
Reply to Reservation 2.
The instability of the concept of representation does not end inquiry; rather it invites the development of more precise conceptual tools.
Reply to Reservation 3.
Even if some arguments rely on fragile intuitions, examining those intuitions often reveals deeper assumptions about cognition and meaning.
A final personal note in the spirit of Socratic melancholy.
Our conversation began with suspicion toward a fashionable philosophical term. It ends with something subtler: the realization that the debate about representation may say as much about our inherited philosophical vocabulary as it does about machines or minds.
In that sense the dialogue has done what philosophy often does best—not delivering final answers, but slowly exposing the hidden architecture of the questions themselves.