Incoherence seems to me preferable to a distorting order. —Barthes
One of the ideas I thought about a lot when studying metaphysics, and that has kept finding me everywhere since, is Projectivism: in perceiving and making sense of the world, we attribute to it structure that it doesn’t actually have. Hume’s argument against causation used a form of this diagnosis. We often perceive two events, or a series of events, in sequence—B always follows A—and imagine that there is some causal relation between them, so that A causes B. But we never observe the causal relation itself, so Hume thinks we have no justification for believing in it. All we observe is correlation.
Visualizations are useful for understanding data, but they blur the line between what the data actually show and the patterns we project onto them. Every visualization embeds assumptions about the data’s structure, and the challenge isn’t just validating our assumptions — it’s recognizing when we’ve convinced ourselves we see meaningful patterns that exist only in our interpretation and not in reality. This distinction becomes especially dangerous when the same dataset can be made to tell completely different stories depending on our analytical choices.
Before we look at experiments, let’s understand why these visualizations can be problematic. Several landmark studies have revealed some of our interpretive biases and misunderstandings:
Albert et al. (2018) showed that even randomly sampled crime incidents produced illusory “hot-spots,” leading participants to re-allocate police resources. Working within the VALCRI (Visual Analytics for Sense-making in CRiminal Intelligence analysis) project, the authors gave participants tools showing the spatial and chronological distribution of crime incidents in two city districts, asked whether they would increase police presence in one of them, and followed up with questions about how the data influenced their decisions and whether they could justify those decisions. The data were either a “random condition” (incidents randomly selected from a large set) or a “pattern condition” (incidents reflecting real spatial and temporal patterns), and were presented either in an “interactive condition,” where participants could use the tools to inspect incidents from different perspectives, or a “static condition,” where they could not.
McNutt et al. (2020) later coined the term visualization mirages for silent but significant failures that arise at any stage of the analytic pipeline.
Cleveland & McGill’s classic experiments rank visual encodings by accuracy: position ≫ length/angle ≫ area ≫ color. Yet much of what viewers take away from a t-SNE or UMAP plot (cluster size, point density, color) lives in those weaker channels rather than in precisely judged positions, pushing readers toward less reliable perception and making misreading almost inevitable.
Embedding visualizations face three key challenges: algorithmic sensitivity to parameter choices, method-specific trade-offs between local and global structure preservation, and the gap between what the algorithms optimize for versus what viewers need to interpret. While humans are capable of perceiving patterns and relative distances, different algorithms make different implicit choices about which aspects of high-dimensional structure to prioritize—choices that can dramatically change the story the same data appear to tell.
Let’s look at how this happens through a few experiments. I’ll limit the focus here to t-SNE and UMAP, two popular embedding methods. Both are powerful, but both need to be treated with some caution, as I hope to illustrate below.
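Throughout, the embeddings behind these experiments are produced roughly like the minimal sketch below (Python, using scikit-learn and umap-learn; the synthetic blobs and parameter values are illustrative placeholders, not the exact setup behind the experiments discussed here).

```python
# Minimal sketch: embed the same data with t-SNE and UMAP.
# Assumes scikit-learn and umap-learn are installed; the synthetic
# data and parameter values are illustrative only.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
import umap

X, y = make_blobs(n_samples=500, n_features=50, centers=4, random_state=0)

tsne_xy = TSNE(n_components=2, perplexity=30, init="pca",
               random_state=0).fit_transform(X)
umap_xy = umap.UMAP(n_neighbors=15, min_dist=0.1,
                    random_state=0).fit_transform(X)

print(tsne_xy.shape, umap_xy.shape)  # both (500, 2)
```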
The first of these experiments demonstrates what researchers call “apophenia”: our tendency to see meaningful patterns in random data. The algorithm’s parameters don’t just reveal structure; they can impose structure that isn’t there.
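You can reproduce the effect with a sketch like the one below (the input is pure Gaussian noise; the cluster count and perplexity are arbitrary choices for illustration).

```python
# Sketch: t-SNE on pure noise can look clustered.
# Illustrative setup, not the exact experiment shown above.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))            # pure Gaussian noise, no clusters

emb = TSNE(n_components=2, perplexity=2,  # low perplexity exaggerates blobs
           init="pca", random_state=0).fit_transform(X)

for name, data in [("original 50-D noise", X), ("t-SNE embedding", emb)]:
    labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(data)
    print(name, round(float(silhouette_score(data, labels)), 3))
# The embedding usually scores far higher: "clusters" the data never had.
```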
Research on ensemble perception shows a similar pitfall: observers can summarize large point clouds quickly, but their subjective confidence often diverges from ground-truth accuracy. Two examples are the survey of ensemble coding tasks by Szafir et al. (2016) and the “Regression by Eye” experiments by Correll & Heer (2017).
Wattenberg’s Distill guide to t-SNE (henceforth Wattenberg et al. (2016)) explains that the algorithm expands dense areas and contracts sparse ones—“cluster sizes … mean nothing”—and that lowering perplexity can manufacture clusters in pure noise.
UMAP’s own documentation warns that it “does not completely preserve density” and “can also create false tears in clusters” (UMAP-learn docs).
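The density point shows up in even a small sketch like the one below (illustrative data and parameters; umap-learn’s optional densMAP mode is one documented attempt to better preserve density): two clusters whose spreads differ by a factor of twenty come out looking far more alike in the embedding.

```python
# Sketch: UMAP tends to equalize very different cluster densities.
# Illustrative construction; the parameters are arbitrary.
import numpy as np
import umap

rng = np.random.default_rng(0)
tight = rng.normal(loc=0.0, scale=0.1, size=(300, 20))   # dense cluster
loose = rng.normal(loc=5.0, scale=2.0, size=(300, 20))   # 20x more spread out
X = np.vstack([tight, loose])

emb = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=0).fit_transform(X)

def spread(points):
    return points.std(axis=0).mean()

print("spread ratio, original: ", spread(loose) / spread(tight))         # ~20
print("spread ratio, embedding:", spread(emb[300:]) / spread(emb[:300]))
# The embedded ratio is usually far below 20: the density contrast is mostly gone.
```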
Kobak & Berens (2019) provide biological case studies where such artifacts mislead interpretation, and Chari & Pachter (2023) show that, in large single-cell benchmarks, neighbor-overlap often falls below 0.3.
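A neighbor-overlap score of that kind is straightforward to compute for your own embeddings. The sketch below is a generic k-nearest-neighbor overlap, my own simplification rather than the paper’s exact metric.

```python
# Sketch: fraction of each point's k nearest neighbors that are shared
# between the original space and the 2-D embedding (1.0 = perfect).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_overlap(X_high, X_low, k=10):
    def knn_indices(X):
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
        # drop column 0: each point is its own nearest neighbor
        return nn.kneighbors(X, return_distance=False)[:, 1:]
    high, low = knn_indices(X_high), knn_indices(X_low)
    return float(np.mean([len(set(h) & set(l)) / k for h, l in zip(high, low)]))

# Usage, with X and an embedding `emb` from the earlier sketches:
# print(knn_overlap(X, emb, k=10))
```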
Watch how the same data tell different stories based on parameter choices (“Those hyper-parameters really matter,” Wattenberg et al., 2016). The same article shows that t-SNE is so sensitive to its hyperparameters that you can conjure up distinct clusters, or continuous structure, that the data simply don’t have.
In this experiment the range is a bit small (perplexity 2-50), and some of the variation only really comes out at higher perplexity values, around 100 or so. But you can still see some differences!
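If you want to push past that range yourself, a sweep like the sketch below is enough to see the effect (the synthetic blobs stand in for whatever dataset you care about).

```python
# Sketch: the same data re-embedded across a range of perplexities.
# Illustrative data; plot the embeddings side by side to compare.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

X, _ = make_blobs(n_samples=400, n_features=30, centers=3, random_state=0)

embeddings = {
    perplexity: TSNE(n_components=2, perplexity=perplexity,
                     init="pca", random_state=0).fit_transform(X)
    for perplexity in (2, 5, 30, 50, 100)
}
# Low values tend to fragment the data into many small blobs; high values
# smear it out. Same data, very different-looking stories.
```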
Running embeddings until one “looks interesting” (a real thing that happens!) is a visual form of p-hacking. A 2025 CHI study on confirmation bias in dashboard “data facts” shows that analysts overwhelmingly choose views that confirm prior beliefs and ignore contradictory ones.
The research is clear about several critical issues:
Wattenberg (on t-SNE) and Kobak & Berens both demonstrate that visually separate islands can be artifacts. Users assume that visual clusters represent meaningful groups in the data, but dimensionality reduction algorithms often violate this assumption. As noted in the t-SNE literature, visual clusters can appear even in structured data with no clear clustering, making them potentially spurious findings; the sketch a little further below illustrates this with data that lie on a single continuous curve.
Wattenberg also notes that visual “stability” can be forced by parameter tweaking without improving fidelity.
Once users see a pattern, they create stories to explain it. Two papers from Cindy Xiong and collaborators (2019 and 2023) offer complementary findings on this point.
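Here is the promised sketch for the first issue: the data lie on one smooth curve, with no clusters at all, yet a low-perplexity t-SNE run will often break that curve into visually separate islands (the curve construction is just an illustrative stand-in).

```python
# Sketch: continuous, cluster-free data can still come out of t-SNE
# as separate-looking islands. Illustrative construction only.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 600)
# a smooth 1-D trajectory embedded in 20 dimensions, plus a little noise
curve = np.stack([np.sin(3 * np.pi * t * (d + 1) / 20) for d in range(20)], axis=1)
X = curve + rng.normal(scale=0.02, size=curve.shape)

emb = TSNE(n_components=2, perplexity=5, init="pca",
           random_state=0).fit_transform(X)
# Plotting `emb` colored by t often shows the single continuous path
# fragmented into clumps that invite a (spurious) cluster story.
```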
The research also points to a few evidence-based recommendations, and to some critical questions worth asking before trusting any embedding plot.
The solution isn’t to abandon embedding visualizations entirely—they can be useful exploratory tools when used responsibly. The key is to treat them as hypothesis generators, not hypothesis confirmers.
Always validate patterns found in embeddings using other methods:
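For example (a sketch of one possible check, assuming you still have the original high-dimensional matrix `X` alongside its 2-D embedding `emb`): ask whether the groups you see in the plot hold up as clusters in the original space.

```python
# Sketch of one validation step (not the only one): do the clusters
# visible in the 2-D plot still exist in the original space?
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score

def validate_embedding_clusters(X_high, X_low, n_clusters):
    labels_low = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=0).fit_predict(X_low)
    labels_high = KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=0).fit_predict(X_high)
    return {
        # are the embedding's clusters separable in the original space?
        "silhouette_in_original_space": silhouette_score(X_high, labels_low),
        # do independent clusterings of the two spaces even agree?
        "high_vs_low_agreement": adjusted_rand_score(labels_high, labels_low),
    }

# Usage, with X and `emb` from the earlier sketches:
# print(validate_embedding_clusters(X, emb, n_clusters=4))
```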
The research makes one thing clear: our visual system and cognitive biases make us sitting ducks for embedding deceptions. We see patterns where none exist, sometimes create stories to explain randomness, and remain confident in our misinterpretations. This is important, especially if real decisions about research and resource allocation are to be made on the basis of how people interpret these figures.
On the technical end, we need better tools that communicate uncertainty and parameter sensitivity. Interpretively, we shouldn’t think of embedding visualizations as persuasive devices but instead as exploratory tools that we treat with appropriate skepticism.
The ability to create beautiful visualizations is not the same as the ability to reveal truth. Sometimes, the most honest thing we can say about complex data is that it’s complex—and no amount of algorithmic magic will change that.