Daniel Bashir

some notes on value alignment

Lewis Mumford:

To perfect and extend the range of machines without perfecting and giving humane direction to the organs of social action and social control is to create dangerous tensions in the structure of society (Technics and Civilization, 367).

I’m a big fan of Iason Gabriel’s work. He brings a much-needed lens to questions about how AI systems will affect our social structure and the distribution of goods and resources, and whether they will be used in ways conducive to social and moral good. If we believe AI systems might become a foundational part of how we make decisions and organize society, then it is vital to understand what the resulting effects will be and how we ought to develop our social contracts and systems of governance in the face of those changes. In particular, Gabriel cares about value alignment.

Gabriel understands value alignment as having two parts: the technical challenge of aligning AI systems with human values, and the normative question of what or whose values we try to align them with. In “Artificial Intelligence, Values, and Alignment,” Gabriel considers six potential goals for alignment (it is worth noting that the paper focuses on one-person-one-agent scenarios): (1) instructions, (2) expressed intentions, (3) revealed preferences, (4) informed preferences or desires, (5) interest or well-being, and (6) values. On the final goal, Gabriel makes the interesting point that, in practice, AI would have to be aligned with a set of beliefs about value rather than with value itself. The important role that values play in social life makes Gabriel confident that alignment with a community’s moral beliefs is a good target.

Across his papers, I see Gabriel’s normative center of gravity as liberal egalitarian. He applies Rawlsian justice to sociotechnical systems: in his essay “Toward a Theory of Justice for Artificial Intelligence,” he argues that AI is now part of the basic structure of society. In Rawls’ thinking, this basic structure encompasses how the major social institutions (the organization of the economy, the nature of the family) cohere in a single system, how they assign basic rights and duties, and how they shape the socially mediated division of advantages. In Gabriel’s reckoning, this basic structure is a composite of sociotechnical systems. Because AI systems shape the fundamental institutions that compose the basic structure of society, they are subject to egalitarian norms of justice.

Another theme in Gabriel’s papers is the move toward identifying a fair, broadly endorsed way to develop principles for the alignment or governance of AI systems. He recognizes that there are often reasonable disagreements between communities, individuals, or value systems—disagreements that cannot be resolved by appealing to shared first principles—so approaches that all parties can recognize as procedurally fair (even if not all are equally satisfied by the outcomes of those procedures) seem just in an important sense.

While Gabriel’s focus on procedures for identifying fair principles feels reasonable, he largely brackets who controls the procedures. If we did identify fair principles (via something like the Veil of Ignorance—which I believe is not practical—or overlapping consensus), who would implement those principles? Gabriel warns against “value imposition and domination” and emphasizes “non-domination” as a criterion. I believe his thinking would successfully cash out in the deployment of real systems if we could assume relatively benign actors who want to do the right thing.

I also worry that fair procedures, while attractive in theory, may be difficult to implement when people come to the table with incommensurable fundamental commitments. When there is no neutral ground to appeal to, control of the procedures becomes even more consequential.

Consider economic distribution: a frequently posed question is whether the gains from AI-driven automation should be redistributed to displaced workers. A libertarian might say no, because property rights are inviolable. A socialist might say yes, because automation’s benefits should be socialized. On content moderation, a US conservative might say AI should not filter “hate speech,” because doing so violates a commitment to absolute free speech. A European social democrat might say it should, because dignity trumps unlimited speech; an Islamic scholar might also say yes, but with a different definition of prohibited speech.

The worry about incommensurable values points to a deeper question about whether alignment frameworks assume too much universality or aggregate coherence in values. I think an interesting perspective on Gabriel’s thinking comes from Yuk Hui. Known for his ideas of “technodiversity” and “cosmotechnics,” Hui argues that there is not one universal technology, but multiple cosmotechnics embedded in different cosmologies and philosophies.

A way I like to frame this question is: had the technologies we know today (in the US, for me) developed in a society with totally different social and political foundations, how would they look? In what ways would we admit them into our lives? In a move against Heidegger, Hui argues for different relationships between technology and cosmos/nature depending on cultural-philosophical traditions.

For example, in this LARB interview, Hui discusses 天下 (tianxia), a cosmotechnics that legitimized government through cosmic alignment rather than procedural fairness. A cosmotechnics of this type is not possible in a time with no conception of “Heaven”: if we do not recognize the heavens as a morally legitimizing power, we cannot properly conceive of the values that would hold sway over us under 天下. This sort of recognition shapes how we construct meaning from experience. Under such a cosmotechnics, the question of “alignment with whose values” might not arise in the same way. If technology is understood as harmonizing cosmic and moral order (and is developed in that spirit), alignment is not a separate normative problem to solve—it belongs to the conception of technology itself.

Looking back to Gabriel’s framework—which takes liberal-democratic categories as starting points—cosmotechnics tells us something about what is already embedded in technologies before we ever consider alignment, about the difficulty of translating concepts across frameworks that do not share basic categories, and about the limits on the scope of any single philosophical approach.

For a society like ours, Gabriel’s liberal egalitarian framework feels natural and appropriate. But understanding how technologies already embed worldviews helps us see what alignment projects can and can’t accomplish. Alignment efforts work downstream of fundamental choices about how technologies are conceived and built—by the time we ask whose values a particular system should reflect, we are often already operating with a system that embeds values and choices.[^1] Tom Mullaney’s work in The Chinese Typewriter shows how the very concept we develop of a technology constrains what it can do, and for whom. In alignment, then, we work within constraints that reflect not just technical choices but entire ways of relating technology to human life—ways that procedures alone cannot remake.

An aside / post-script: I really like Hui’s use of the Kantian antinomy in this essay: (1) technology is anthropologically universal, understood as the exteriorization of memory, and (2) technology is not anthropologically universal, because it is conditioned by particular cosmologies. Each thesis is insufficient on its own, and their tension is productive.

[^1]: To be clear, Gabriel understands and considers this in his work—he would be one of the first to say that our development of technology should be guided by inputs that represent different people’s values. Even then, our imagination of technological possibilities is constrained by a history of value judgments, and we are never operating from a “neutral position.”