Posts Tagged ‘meaning’

Back in the 1960s and 1970s, the Whorfian hypothesis was a hot subject on college campuses. This was the idea that one’s native language, its syntax and semantics, strongly shaped one’s worldview. For example, Eskimos speaking Inuit supposedly had thirty different words for snow and so had a more complex relationship with their environment than someone speaking English with only one word for snow.

The problem of course is that skiers can make plenty of distinctions about kinds of snow even in English. Despite the Whorfian hypothesis being theoretically attractive, it did not square in the end with our actual experience with language. That pretty much took the steam out of the Whorfian hypothesis, but now in the 21st century, empirical support has been accumulating for a weaker version of it. This was the subject of an article in the New York Times Magazine (http://nyti.ms/boqzs5).

The weak Whorfian hypothesis rejects the idea that language establishes an absolute limit on thinking. Thus we can learn about distinctions in types of snow if we really need them. The structure of a language, however, definitely can bias our thinking; and this could have consequences in practical matters like the ranking of retrieved documents. The choice of a particular semantic framework like RDF may therefore affect the performance of an information system in unexpected ways.

So far, experimental results on language and thought have focused on highly specific biases in areas of language like giving spatial directions, assigning gender to nouns, and dividing the spectrum into colors. It seems plausible, though, that this should generalize to the overall semantic problem of dividing up meaning into some kind of compact space. There is more than one way to skin a cat here, and there are probably advantages and disadvantages in each possibility.

A dogmatist might be tempted to argue here that RDF with certain standard taxonomies is the right way and everything else is wrong, but that is probably overreaching. We are not yet savvy enough about semantics to carve tablets in stone about its implementation. At present, one can say only whether a given scheme is optimal in some formal sense; but if it makes no obvious sense to people, then something more comprehensible might be better in the long run even if it is less than optimal.

The weak Whorfian hypothesis forces us to be more honest. If each semantic scheme introduces its own biases, then we need to experiment to see how different approaches work out for a given target application. Given that humans operate with more than one linguistic framework, we should not be so quick to assume that machines can do better at semantics with just a single framework.

Basics

5 Oct 2010

Linguists have long debated whether human language ability is innate or is simply learned by highly plastic neurocircuitry of a general sort. Recent studies with fMRI scans indicate, however, that cognitive skills like language understanding tend to be associated with highly specific brain locations across different individuals, supporting the idea that some kind of language-related structure exists. Studies of people impaired by strokes occurring in language regions also have shown this.

So when a young child learns that Mama is related to a concept of MOTHER, which applies to more than a single individual, this seems to draw upon specialized built-in logic within the human brain. This kind of symbolic capability is not unique to humans, being found to some extent in other large-brained social animals like elephants, whales, dolphins, and chimpanzees; but we certainly have more of it. This can be seen in the relative size and organizational complexity of human brains.

The implication here is that concepts like MOTHER, BIRD, HOUSE, or FOOD are real in some sense at the genetic level. We of course do not necessarily all learn the same particular concepts; for example, speakers of different languages in different cultures can be expected to develop divergent concept frameworks. Nevertheless, it is possible to translate between unrelated languages like Inuit and English, meaning that there is still a large overlap in their linguistic repertoires of concepts.

Consequently, when we technologists talk about incorporating semantics into search engines and other applications, we need to remember that semantics existed a long time before the first boolean electronic circuit and that what we call “semantics” should be consistent with what goes on in our own heads. This is perhaps only a marketing concern, but the business of selling semantic technology will be that much harder if we cannot agree on what we really mean.

The concept of CONCEPT would seem to be a focus point for semantics that everyone can grasp. Whether we approach language and meaning like Wittgenstein or like Russell or like Korzybski or like Chomsky or like Miller or like Berners-Lee, it helps to get grounded properly.

Learning

21 Dec 2009

Consider how we humans learn language. Even with formal education, it takes a child about 15 years starting from infancy to be able to read and understand general news articles in the New York Times. Over this period, one would probably hear or read at least on the order of 10 billion words. Even so, most high schoolers will need many additional years of schooling to become able to comprehend technical material.

So, how can anyone expect a computer to understand something like medical text after training on only about 100 million words of data? A computer of course runs on nanosecond cycles while the human brain operates on millisecond cycles; but we have had about 50,000 generations to evolve our language software, while the electronic computer has had only about 10 generations.
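To put those numbers side by side, here is a minimal back-of-envelope sketch. It uses only the rough figures quoted above; the constant names and the `scale_gaps` helper are made up for illustration, and the ratios are order-of-magnitude estimates rather than measurements.

```python
# Back-of-envelope comparison using only the rough figures quoted above.
# These are order-of-magnitude estimates, not measurements.

HUMAN_WORDS = 10_000_000_000            # ~10 billion words heard or read by about age 15
MACHINE_WORDS = 100_000_000             # ~100 million words of training data

MACHINE_CYCLES_PER_SEC = 1_000_000_000  # nanosecond cycles
HUMAN_CYCLES_PER_SEC = 1_000            # millisecond cycles

HUMAN_GENERATIONS = 50_000
COMPUTER_GENERATIONS = 10


def scale_gaps():
    """Return the three ratios implied by the figures above."""
    return {
        "data": HUMAN_WORDS // MACHINE_WORDS,
        "speed": MACHINE_CYCLES_PER_SEC // HUMAN_CYCLES_PER_SEC,
        "generations": HUMAN_GENERATIONS // COMPUTER_GENERATIONS,
    }


if __name__ == "__main__":
    for name, ratio in scale_gaps().items():
        print(f"{name}: {ratio:,}x")
```

On these figures the machine has a raw speed advantage of about a million to one, but it sees roughly a hundredth of the data and has had a tiny fraction of the generations. Raw cycle speed does not translate directly into learning ability, so the ratios are best read as a sense of scale only.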

The bottom line here is that language learning is difficult; and it requires sifting through immense amounts of data. There probably is no magic technological shortcut here, but we have now reached the stage where our systems can routinely handle the volumes of data that would support semantic capabilities equivalent to an 8th-grade education. Decent commercial language processing tools are also now available.

Consequently, we are making major progress on semantic dictionaries, but have to be realistic about the work still ahead of us. Expect no overnight miracles from us or anyone else, especially when these are based on measly samples of data. There is still no royal road to semantics.

In Chapter 8 of Lewis Carroll’s “Through the Looking-Glass,” Alice, our intrepid logical adventurer, is talking to the White Knight, who wants to sing to her. He says, “The name of the song is called ‘HADDOCKS’ EYES.’”

It turns out of course that the name of the song is really “THE AGED AGED MAN,” though the song is actually called “WAYS AND MEANS.” The confusion here about naming is quite understandable to anyone who has ever ordered TenderSweet™ clams at HoJo’s and discovered that they are neither tender nor sweet.

All of this would be hilarious except that we have to build semantic dictionaries that must deal extensively with the meaning of names in text. This problem will take a while to talk about adequately; and so please tune in tomorrow.