Incoherence seems to me preferable to a distorting order. —Barthes
One of the ideas I thought about a lot when studying metaphysics, and that has kept finding me everywhere since, is Projectivism: in perceiving and making sense of the world, we attribute to it structure that it doesn’t actually have. Hume’s argument against causation used a form of this diagnosis. We often perceive two events, or a series of events, in sequence—B always follows A—and imagine that there is some causal relation between them, so that A causes B. But we never observe the causal relation itself, so Hume thinks we have no justification for believing in it. All we observe is correlation.
Visualizations are useful for understanding data, but they blur the line between what the data actually show and the patterns we project onto them. Every visualization embeds assumptions about the data’s structure, and the challenge isn’t just validating our assumptions — it’s recognizing when we’ve convinced ourselves we see meaningful patterns that exist only in our interpretation and not in reality. This distinction becomes especially dangerous when the same dataset can be made to tell completely different stories depending on our analytical choices.
Before we look at experiments, let’s understand why these visualizations can be problematic. Several landmark studies have revealed some of our interpretive biases and misunderstandings:
Albert et al. (2018) showed that even randomly sampled crime incidents produced illusory “hot-spots,” leading participants to re-allocate police resources. Working within the VALCRI (Visual Analytics for Sense-making in CRiminal Intelligence analysis) project, the authors gave participants tools showing the spatial and chronological distribution of crime incidents in two city districts, asked whether they would increase police presence in one of them, and followed up with questions about how the data influenced their decisions and whether they could justify those decisions. The data were either a “random condition” (incidents randomly selected from a large set) or a “pattern condition” (incidents reflecting real spatial and temporal patterns), and were presented either in an “interactive condition,” where participants could use the tools to inspect incidents from different perspectives, or a “static condition,” where they could not.
McNutt et al. (2020) later coined the term visualization mirages for silent but significant failures that arise at any stage of the analytic pipeline.
Cleveland & McGill’s classic experiments rank visual encodings by accuracy: position ≫ length/angle ≫ area ≫ color. Yet much of what viewers take away from a t-SNE or UMAP plot (cluster size, point density, color) lives in those weaker channels rather than in precisely judged positions, pushing readers toward less reliable perception and making misreading almost inevitable.
Embedding visualizations face three key challenges: algorithmic sensitivity to parameter choices, method-specific trade-offs between local and global structure preservation, and the gap between what the algorithms optimize for versus what viewers need to interpret. While humans are capable of perceiving patterns and relative distances, different algorithms make different implicit choices about which aspects of high-dimensional structure to prioritize—choices that can dramatically change the story the same data appear to tell.
Let’s look at how this happens through a few experiments. I’ll limit the focus here to t-SNE and UMAP, two popular embedding methods. Both are powerful, but both need to be treated with some caution, as I hope to illustrate below.
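Throughout, the embeddings behind these experiments are produced roughly like the minimal sketch below (Python, using scikit-learn and umap-learn; the synthetic blobs and parameter values are illustrative placeholders, not the exact setup behind the experiments discussed here).

```python
# Minimal sketch: embed the same data with t-SNE and UMAP.
# Assumes scikit-learn and umap-learn are installed; the synthetic
# data and parameter values are illustrative only.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
import umap

X, y = make_blobs(n_samples=500, n_features=50, centers=4, random_state=0)

tsne_xy = TSNE(n_components=2, perplexity=30, init="pca",
               random_state=0).fit_transform(X)
umap_xy = umap.UMAP(n_neighbors=15, min_dist=0.1,
                    random_state=0).fit_transform(X)

print(tsne_xy.shape, umap_xy.shape)  # both (500, 2)
```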
The first of these experiments demonstrates what researchers call “apophenia”: our tendency to see meaningful patterns in random data. The algorithm’s parameters don’t just reveal structure; they can impose structure that isn’t there.
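You can reproduce the effect with a sketch like the one below (the input is pure Gaussian noise; the cluster count and perplexity are arbitrary choices for illustration).

```python
# Sketch: t-SNE on pure noise can look clustered.
# Illustrative setup, not the exact experiment shown above.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))            # pure Gaussian noise, no clusters

emb = TSNE(n_components=2, perplexity=2,  # low perplexity exaggerates blobs
           init="pca", random_state=0).fit_transform(X)

for name, data in [("original 50-D noise", X), ("t-SNE embedding", emb)]:
    labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(data)
    print(name, round(float(silhouette_score(data, labels)), 3))
# The embedding usually scores far higher: "clusters" the data never had.
```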
Research on ensemble perception shows a similar pitfall: observers can summarize large point clouds quickly, but their subjective confidence often diverges from ground-truth accuracy. Two examples are the survey of ensemble coding tasks by Szafir et al. (2016) and the “Regression by Eye” experiments by Correll & Heer (2017).
Wattenberg’s Distill guide to t-SNE (henceforth Wattenberg et al. (2016)) explains that the algorithm expands dense areas and contracts sparse ones—“cluster sizes … mean nothing”—and that lowering perplexity can manufacture clusters in pure noise.
UMAP’s own documentation warns that it “does not completely preserve density” and “can also create false tears in clusters” (UMAP-learn docs).
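The density point shows up in even a small sketch like the one below (illustrative data and parameters; umap-learn’s optional densMAP mode is one documented attempt to better preserve density): two clusters whose spreads differ by a factor of twenty come out looking far more alike in the embedding.

```python
# Sketch: UMAP tends to equalize very different cluster densities.
# Illustrative construction; the parameters are arbitrary.
import numpy as np
import umap

rng = np.random.default_rng(0)
tight = rng.normal(loc=0.0, scale=0.1, size=(300, 20))   # dense cluster
loose = rng.normal(loc=5.0, scale=2.0, size=(300, 20))   # 20x more spread out
X = np.vstack([tight, loose])

emb = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=0).fit_transform(X)

def spread(points):
    return points.std(axis=0).mean()

print("spread ratio, original: ", spread(loose) / spread(tight))         # ~20
print("spread ratio, embedding:", spread(emb[300:]) / spread(emb[:300]))
# The embedded ratio is usually far below 20: the density contrast is mostly gone.
```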
Kobak & Berens (2019) provide biological case studies where such artifacts mislead interpretation, and Chari & Pachter (2023) show that, in large single-cell benchmarks, neighbor-overlap often falls below 0.3.
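A neighbor-overlap score of that kind is straightforward to compute for your own embeddings. The sketch below is a generic k-nearest-neighbor overlap, my own simplification rather than the paper’s exact metric.

```python
# Sketch: fraction of each point's k nearest neighbors that are shared
# between the original space and the 2-D embedding (1.0 = perfect).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_overlap(X_high, X_low, k=10):
    def knn_indices(X):
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
        # drop column 0: each point is its own nearest neighbor
        return nn.kneighbors(X, return_distance=False)[:, 1:]
    high, low = knn_indices(X_high), knn_indices(X_low)
    return float(np.mean([len(set(h) & set(l)) / k for h, l in zip(high, low)]))

# Usage, with X and an embedding `emb` from the earlier sketches:
# print(knn_overlap(X, emb, k=10))
```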
Watch how the same data tell different stories based on parameter choices (“Those hyper-parameters really matter,” Wattenberg et al., 2016). The same article shows that t-SNE is so sensitive to its hyperparameters that you can conjure up distinct clusters, or continuous structure, that the data simply don’t have.
In this experiment the range is a bit small (perplexity 2-50), and some of the variation only really comes out at higher perplexity values, around 100 or so. But you can still see some differences!
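If you want to push past that range yourself, a sweep like the sketch below is enough to see the effect (the synthetic blobs stand in for whatever dataset you care about).

```python
# Sketch: the same data re-embedded across a range of perplexities.
# Illustrative data; plot the embeddings side by side to compare.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

X, _ = make_blobs(n_samples=400, n_features=30, centers=3, random_state=0)

embeddings = {
    perplexity: TSNE(n_components=2, perplexity=perplexity,
                     init="pca", random_state=0).fit_transform(X)
    for perplexity in (2, 5, 30, 50, 100)
}
# Low values tend to fragment the data into many small blobs; high values
# smear it out. Same data, very different-looking stories.
```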
Running embeddings until one “looks interesting” (a real thing that happens!) is a visual form of p-hacking. A 2025 CHI study on confirmation bias in dashboard “data facts” shows that analysts overwhelmingly choose views that confirm prior beliefs and ignore contradictory ones.
The research is clear about several critical issues:
Wattenberg (on t-SNE) and Kobak & Berens both demonstrate that visually separate islands can be artifacts. Users assume that visual clusters represent meaningful groups in the data, but dimensionality reduction algorithms often violate this assumption. As noted in the t-SNE literature, visual clusters can appear even in structured data with no clear clustering, making them potentially spurious findings; the sketch a little further below illustrates this with data that lie on a single continuous curve.
Wattenberg also notes that visual “stability” can be forced by parameter tweaking without improving fidelity.
Once users see a pattern, they create stories to explain it. Two papers from Cindy Xiong and collaborators (2019 and 2023) offer complementary findings on this point.
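Here is the promised sketch for the first issue: the data lie on one smooth curve, with no clusters at all, yet a low-perplexity t-SNE run will often break that curve into visually separate islands (the curve construction is just an illustrative stand-in).

```python
# Sketch: continuous, cluster-free data can still come out of t-SNE
# as separate-looking islands. Illustrative construction only.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 600)
# a smooth 1-D trajectory embedded in 20 dimensions, plus a little noise
curve = np.stack([np.sin(3 * np.pi * t * (d + 1) / 20) for d in range(20)], axis=1)
X = curve + rng.normal(scale=0.02, size=curve.shape)

emb = TSNE(n_components=2, perplexity=5, init="pca",
           random_state=0).fit_transform(X)
# Plotting `emb` colored by t often shows the single continuous path
# fragmented into clumps that invite a (spurious) cluster story.
```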
The research also points to a few evidence-based recommendations, and to some critical questions worth asking before trusting any embedding plot.
The solution isn’t to abandon embedding visualizations entirely—they can be useful exploratory tools when used responsibly. The key is to treat them as hypothesis generators, not hypothesis confirmers.
Always validate patterns found in embeddings using other methods:
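For example (a sketch of one possible check, assuming you still have the original high-dimensional matrix `X` alongside its 2-D embedding `emb`): ask whether the groups you see in the plot hold up as clusters in the original space.

```python
# Sketch of one validation step (not the only one): do the clusters
# visible in the 2-D plot still exist in the original space?
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score

def validate_embedding_clusters(X_high, X_low, n_clusters):
    labels_low = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=0).fit_predict(X_low)
    labels_high = KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=0).fit_predict(X_high)
    return {
        # are the embedding's clusters separable in the original space?
        "silhouette_in_original_space": silhouette_score(X_high, labels_low),
        # do independent clusterings of the two spaces even agree?
        "high_vs_low_agreement": adjusted_rand_score(labels_high, labels_low),
    }

# Usage, with X and `emb` from the earlier sketches:
# print(validate_embedding_clusters(X, emb, n_clusters=4))
```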
The research makes one thing clear: our visual system and cognitive biases make us sitting ducks for embedding deceptions. We see patterns where none exist, sometimes create stories to explain randomness, and remain confident in our misinterpretations. This is important, especially if real decisions about research and resource allocation are to be made on the basis of how people interpret these figures.
On the technical end, we need better tools that communicate uncertainty and parameter sensitivity. Interpretively, we shouldn’t think of embedding visualizations as persuasive devices but instead as exploratory tools that we treat with appropriate skepticism.
The ability to create beautiful visualizations is not the same as the ability to reveal truth. Sometimes, the most honest thing we can say about complex data is that it’s complex—and no amount of algorithmic magic will change that.