Thalience and the Semantic Web

There's an interesting science fiction book called Ventus, by Karl Schroeder, set on a world created by nanotechnology and ruled by sentient nanotech called the Winds, which infest every blade of grass and keep the planet stable and earthlike.

The catch (SPOILER WARNING) is that in the process of creating this world, the Winds had to revise their semantic concepts several times over to be up to the task of managing a planetary ecology. When the colonists arrived on Ventus, they discovered that the Winds didn't understand them. The Winds were speaking Thalience.

The interesting part of Ventus (minus the obligatory Vernor Vinge-style space opera stuff) is that it specifically mentions Wittgenstein's concept of language-games as a prelude to Thalience. Thalience is not simply a language of words and definitions. It is a set of assumptions; the concepts used for speaking are also the concepts used for believing. Thalience is both the operating system and the protocol of the Winds. In fact, there are two competing language-games at work on Ventus: Meditation, the original language-game used by the colonists, is still spoken by some of the Winds, and it works out a compromise with Thalience that prevents them from being wiped out completely.

The end result is strangely parallel to the work of Stanislaw Lem in Solaris or His Master's Voice -- the inhuman beings are clearly sentient, but even identifying common ground proves impossible despite repeated attempts at communication from both sides. There is an even greater irony in the implication that Ventus is only possible because of the evolution of Thalience.

It's a mistake to think that the failure of Thalience to understand the colonists is a restriction of the language itself. Even Orwell's Newspeak would allow Winston to say "Big Brother is doubleplusungood" -- the whole point of 1984 is that it would never occur to Winston to say it. Newspeak is a stylistic trick by Orwell that replaces semantics with syntax, simplifying the words themselves rather than the concepts to which they refer. Similarly, it is the semantic content of Thalience that causes the incomprehension at the higher layers, rather than any Sapir-Whorf-style constraint on what the language can express.

One of the fundamental constructs of Thalience is the mutability of identity. To humans, who have concepts thousands of years old for naming and identifying the world around them, from Eden onwards, it would be a little unnerving to have trees, rocks and soil identifying themselves. However, this is precisely what the omnipresent Winds must do; as matter changes form from sand grain to rock to brick, the Winds exchange information about what they are and what they do. When there isn't a clear word for what they are, they invent one and scatter consensus information about it. The end result is that while a human can pick up a sand grain and talk to it individually, he has no hope of talking to the desert, which is the communal intelligence of billions of sand grains and an uncountable number of nanotech entities.

The same problem of communication occurs in human cultures all the time. Hacker culture has enough in common across the world to have its own dictionary. Even hackers in non-English-speaking countries use language from the Hacker's Dictionary because the meaning is more precise than their native language allows. Conversely, Americans and British people frequently run into problems and misunderstandings despite supposedly speaking the same language and having similar cultural values. This is generally known: it's why international diplomacy exists, and why it's so tricky; what North Korea means when it talks about "diplomacy" and "reunification" is radically different from what most people would assume it means.

The reason Thalience is important is that it's not just fiction. Once we have Artificial Intelligence capable of redefining semantic concepts and creating new ones, there is no guarantee that those concepts will be understandable by humans. In fact, it's highly likely that they won't be. We'll have to use diplomacy to translate between its language and ours. The current situation is even more frustrating, because in order to create a useful semantic web there has to be a machine-understandable set of human concepts, and we don't have one single set of concepts.

The problem is that people have different words for the same thing, and use the same words for different things. Mathematical concepts are easy to agree on. Words like 'author' or 'title' are harder, but can be categorized thanks to the work done in library science and accessed through Z39.50 and Dublin Core. But cultural concepts like 'liberal' or 'date rape' are subject to so much cultural revisionism that definition is almost impossible. One of Feynman's stories is about a group of philosophers discussing Whitehead's idea of the 'essential object'. They're all using abstract terminology and coming to various conclusions, and finally they turn to Feynman and ask, "Is an electron an essential object?" Feynman turns around and says, "I'll tell you whether an electron is an essential object if you can tell me what an essential object is." And lo and behold, all the philosophers have different ideas of what an essential object is.

You can get a sense of how hard constructing a semantic web is just by thinking about the concept of a date. A date is a very simple, obvious concept that can be wrapped up in a ridiculous number of formats, and it varies with a country's locale and even with the month of the year (if you count daylight saving time). And yet, even in systems which are designed to be integrated together, I've seen messages described as "date field, time field, first name, last name" without any indication of the number of characters used, the separators, or the date and time format.
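To make the ambiguity concrete, here is a minimal sketch (the field layout and the sample value are hypothetical, not taken from any particular system): the same raw text parses successfully under several incompatible conventions, and nothing in the message says which one the sender meant.

    # "01/02/03 04:05" is a valid date-time under at least three common
    # conventions; each one yields a different moment in time.
    from datetime import datetime

    raw = "01/02/03 04:05"
    candidate_formats = [
        "%d/%m/%y %H:%M",  # day/month/year: 1 February 2003
        "%m/%d/%y %H:%M",  # month/day/year: 2 January 2003
        "%y/%m/%d %H:%M",  # year/month/day: 3 February 2001
    ]
    for fmt in candidate_formats:
        print(fmt, "->", datetime.strptime(raw, fmt))

Three different systems can each parse the message without error and silently end up with three different dates, which is exactly the kind of failure that never shows up until the data is already wrong.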

For comparison, take Web Services. For people who have always wanted to look up the stock price of a publicly traded company, Web Services would seem to be the definitive answer. And yet, even with a protocol that has been reworked several times over, different implementations can't agree on a common representation of something as simple as an integer. (Note: the last time I checked on SOAP was June 2002, so things may have improved since then.) Interoperability is a very hard problem even at the simplest levels, which is partly why there are so many standards bodies floating around trying to define everything.
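As an illustration of that kind of mismatch (the element name and type annotations below are invented for this example, not taken from any real stock-quote service), a client that expects one encoding of a number will stumble over another, even though both responses "obviously" mean the same price:

    # Two hypothetical service responses carrying the "same" price.
    import xml.etree.ElementTree as ET

    XSI = "http://www.w3.org/2001/XMLSchema-instance"
    response_a = '<price xmlns:xsi="%s" xsi:type="xsd:int">42</price>' % XSI
    response_b = '<price xmlns:xsi="%s" xsi:type="xsd:float">42.0</price>' % XSI

    for doc in (response_a, response_b):
        elem = ET.fromstring(doc)
        declared = elem.get("{%s}type" % XSI)
        try:
            # A naive client that assumes the value is a plain integer.
            print(declared, "->", int(elem.text))
        except ValueError:
            print(declared, "-> can't read", repr(elem.text), "as an integer")

Both documents are well-formed XML; the disagreement is entirely about what the characters are supposed to mean, which is the semantic web problem in miniature.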

Imagine trying to define "punk" to a machine. That's the problem we'll have in ten or twenty years.

Now imagine a machine trying to explain a semantic concept as culturally dependent to machines as "punk" is to humans. That's Thalience.
