Lexical meaning is lower dimensional in psychosis

In a paper published last year in Scientific Reports, we explored a relatively simple but interesting question: could psychosis-related speech be characterized by the underlying geometry of meaning itself?

But what do I mean by the geometry of meaning? One way to think about it is to imagine that words and ideas live on a kind of abstract map. Just as cities can be close to or far from one another on a geographic map, meanings can be close to or distant from one another in language. For example, words like “dog”, “cat”, and “animal” would occupy nearby regions because they are related in meaning, while words like “justice” or “Cheerios” would lie much farther away. Current language models learn these relationships automatically from large amounts of text, which allows them to represent words as positions in a vast semantic landscape. But unlike a regular map, which unfolds in two dimensions on paper, or physical space with its three observable dimensions, these spaces are modeled using many dimensions, thousands of them in some language models. When we speak, we can imagine our discourse as moving through that landscape, jumping from one region of meaning to another.
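To make the map metaphor concrete, here is a minimal sketch of how closeness is commonly measured in embedding spaces: cosine similarity between vectors. The four-dimensional vectors below are invented purely for illustration (real models use far more dimensions), and the `toy_embeddings` dictionary and `cosine_similarity` helper are hypothetical names, not taken from the paper.

```python
import numpy as np

# Hypothetical 4-dimensional "embeddings", invented for illustration only.
# Real language models use hundreds or thousands of dimensions.
toy_embeddings = {
    "dog":     np.array([0.9, 0.8, 0.1, 0.0]),
    "cat":     np.array([0.8, 0.9, 0.2, 0.1]),
    "animal":  np.array([0.7, 0.7, 0.3, 0.1]),
    "justice": np.array([0.0, 0.1, 0.9, 0.8]),
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

for other in ["cat", "animal", "justice"]:
    sim = cosine_similarity(toy_embeddings["dog"], toy_embeddings[other])
    print(f"dog vs {other}: {sim:.2f}")
# dog vs cat: 0.99
# dog vs animal: 0.97
# dog vs justice: 0.12
```

With these made-up vectors, “dog” lands close to “cat” and “animal” and far from “justice”, mirroring the map intuition. Cosine similarity ignores vector length and compares only direction, which is one reason it is a common choice for comparing embeddings.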

Large language models, like the ones behind chatbots such as ChatGPT, rely on this kind of representation, in which each word is a point in a high-dimensional semantic space. This allows speech to be imagined as a trajectory through the landscape of meaning. When people speak, their discourse can move across many semantic directions or remain confined to a narrower region. If meanings repeatedly cluster around similar areas, the trajectory explores less of the space, and the speech may become more repetitive or semantically constrained.

A large body of computational work on psychosis and language has focused on semantic similarity, and studies repeatedly find that the words produced by patients tend to be more semantically similar to one another than those produced by healthy controls. Previous work using word embeddings and language models has shown this pattern across different datasets and tasks.
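As a rough illustration of what “more semantically similar” can mean computationally, the sketch below takes a speech sample already converted into a matrix of word vectors (one row per word) and computes two common summaries: the mean cosine similarity between consecutive words, and the mean similarity across all word pairs. The function names are hypothetical, the data are synthetic random vectors rather than real embeddings, and this is not necessarily how the cited studies defined their measures.

```python
import numpy as np

def normalize_rows(X):
    """Scale each row (one word vector) to unit length."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def consecutive_similarity(X):
    """Mean cosine similarity between each word and the word that follows it."""
    Xn = normalize_rows(X)
    return float(np.mean(np.sum(Xn[:-1] * Xn[1:], axis=1)))

def mean_pairwise_similarity(X):
    """Mean cosine similarity over all distinct pairs of words in the sample."""
    Xn = normalize_rows(X)
    S = Xn @ Xn.T                          # full cosine-similarity matrix
    i, j = np.triu_indices(len(Xn), k=1)   # upper triangle, excluding the diagonal
    return float(S[i, j].mean())

rng = np.random.default_rng(0)
dim = 300
# Synthetic stand-ins for two speech samples (not real data):
# "spread" words point in unrelated random directions,
# "clustered" words are small perturbations around one shared direction.
spread = rng.normal(size=(40, dim))
center = rng.normal(size=dim)
clustered = center + 0.5 * rng.normal(size=(40, dim))

for name, X in [("spread", spread), ("clustered", clustered)]:
    print(name, round(consecutive_similarity(X), 2), round(mean_pairwise_similarity(X), 2))
```

The clustered sample scores far higher on both measures than the spread-out sample, whose random directions are nearly orthogonal in 300 dimensions.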

In this paper, we tried to push that idea one step further.

Instead of only asking whether words are more similar, we asked whether the dimensionality of semantic space itself is reduced in psychosis. The intuition is that if speech explores fewer independent semantic directions, then the number of effective dimensions needed to describe that speech should also be smaller. To test this, we used embeddings from language models to represent speech samples as vectors in semantic space. We then applied dimensionality reduction techniques, mainly Principal Component Analysis (PCA), together with estimates of intrinsic dimensionality (ID), to evaluate how reducible those semantic spaces were. To do so, we asked questions such as:

  • How many components are needed to explain most of the variance?
  • How much variance is captured by the first few components?
  • What is the minimum effective dimensionality required to describe the speech sample?
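These three questions map onto simple quantities computed from the PCA spectrum. Below is a minimal sketch, assuming each speech sample is a matrix of embedding vectors (one row per word). The 90% threshold, the helper names, and the use of the participation ratio as an effective-dimensionality estimate are assumptions made for this illustration; the paper's exact preprocessing and ID estimators may differ, and nearest-neighbour estimators such as TwoNN are a common alternative family. The data are synthetic, not from the paper.

```python
import numpy as np

def pca_explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component, largest first."""
    Xc = X - X.mean(axis=0)                    # center each embedding dimension
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, in descending order
    var = s ** 2                               # component variances (up to a constant)
    return var / var.sum()

def n_components_for(ratio, threshold=0.9):
    """Smallest number of components whose cumulative variance reaches `threshold`."""
    cumulative = np.cumsum(ratio)
    return int(min(np.searchsorted(cumulative, threshold) + 1, len(ratio)))

def participation_ratio(ratio):
    """(sum of variances)^2 / (sum of squared variances): an effective dimensionality."""
    return float(ratio.sum() ** 2 / np.sum(ratio ** 2))

rng = np.random.default_rng(1)
n_words, dim = 60, 300
# Unconstrained sample: words scattered in all directions.
rich = rng.normal(size=(n_words, dim))
# Constrained sample: words built from 5 hypothetical "semantic directions" plus small noise.
basis = rng.normal(size=(5, dim))
constrained = rng.normal(size=(n_words, 5)) @ basis + 0.1 * rng.normal(size=(n_words, dim))

for name, X in [("rich", rich), ("constrained", constrained)]:
    r = pca_explained_variance_ratio(X)
    print(name,
          n_components_for(r, 0.9),            # components needed for 90% of variance
          round(float(r[:3].sum()), 2),        # variance captured by the first 3 components
          round(participation_ratio(r), 1))    # effective dimensionality
```

On this synthetic data, the constrained sample needs at most five components to reach 90% of the variance and concentrates far more variance in its first three components, whereas the unconstrained sample spreads its variance across dozens of components. Higher reducibility and a lower participation ratio are the kind of signature these questions are designed to detect.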

The datasets included speech samples in three different languages, which was important because many computational findings in psychiatry remain heavily English-centered. Despite linguistic differences, the pattern was remarkably consistent: speech from psychosis groups showed higher reducibility and lower effective dimensionality across datasets.

What makes this interesting is that it potentially reframes several previous findings within a more unified perspective. Higher semantic similarity, repetitive associations, restricted semantic exploration, and related phenomena may all reflect a deeper geometric property of discourse organization. Rather than being isolated markers, they could emerge from changes in the structure of semantic space itself.

This also connects computational psychiatry more directly with broader ideas from dynamical systems, geometry, and network organization. Recent years have seen increasing interest in describing cognition using concepts such as manifolds, trajectories, intrinsic dimensionality, and geometric constraints. The present work suggests that these ideas may apply not only to neural activity or artificial systems, but also to the organization of meaning in natural language.

The paper does not argue that psychosis means “less complex language.” The phenomenon is likely more subtle. A lower-dimensional semantic space may reflect stronger attraction toward certain semantic regions, altered contextual transitions, or reduced flexibility in navigating conceptual space. In other words, the issue may concern how meanings are organized and traversed, not merely vocabulary size or grammatical ability.

There are also important methodological implications. Much of current NLP-based psychiatric research relies on large collections of individual features whose interpretation can become obscure. Looking at the geometry of semantic organization offers a potentially more foundational level of analysis. It shifts attention from isolated markers toward the global structure underlying them.

Of course, there are still many open questions. We do not know exactly which cognitive or neurobiological mechanisms produce these geometries. Nor is it clear how these semantic constraints evolve longitudinally, relate to symptoms, or interact with clinical states such as remission or relapse. But the results suggest that language models can capture not only surface linguistic patterns, but also deeper organizational properties of meaning. I believe that this study contributes to a growing literature treating language as a window into latent cognitive organization, and language models are finally giving us tools to study those properties.

Palominos, C., Stein, F., Kircher, T., Ayesa-Arriola, R., Palaniyappan, L., Homan, P., ... & Hinzen, W. (2025). Lexical meaning is lower dimensional in psychosis. Scientific Reports.