BLOG

Archive for May, 2008

Google and language evolution

May 12th, 2008, Discussion, 9 Comments

Language traits and fashions advance extremely quickly and, if left alone, seem to be one of the rawest, most observable forms of cultural or memetic evolution. Language also seems to be the facet of culture that we hold most dear to our self-identity, and any drift is immediately heralded as a decline in standards.

Various self-appointed mavens frequently take the moral high ground on how language ought to be, and you only have to question the general population to discover that perceived language erosion by the younger generations is top of the threat list in how they feel alienated from their own species in later life (related post: progression).

Yet scanning the etymology of any given word reveals a rocky and fascinating history, and any notion of a golden age of language is immediately debunked. Someone’s God be with ye is someone else’s Goodbye, which is yet another’s Bye, which is by now something I probably don’t understand. What we may consider slang is actually highly evolved language reduction. Just think about how much emotion and meaning can be conveyed by the shortest and “dumbest” idioms that seem to flow out of the USA! Genius!

There are two approaches that governments take to language: dictation or reaction. Ownership of a language by a governing body seems to be the memetic equivalent of eugenics: an attempt to control and command the hereditary traits of something that no living being can possibly judge. Blonde hair and blue eyes are the best, you say? Hmmm.

The French are of course famous for their stringent Académie française. Here the appointed members (known as “immortals”) scrutinise daily life for signs of decay while cleansing society of all foreign loan words. Danish and Norwegian are very similar languages, except that the Norwegian Language Council decided to invent new “Norwegian” words for every part of the microcomputer, while Denmark’s own body, the Dansk Sprognævn, is more than happy to let CPU, RAM, bits, bytes and indeed “computer” itself through the iron curtain.

The difference here is that Denmark’s bureau appears to understand that its role is to document and record a naturally occurring phenomenon. Its main objective: “new words which have appeared enough in print and speech to be considered notable are added to the Danish dictionary”. Note, though, that this doesn’t stop the population’s sky-is-falling reaction to the recent American-English overload they are experiencing.

So while a country taking ownership of its genes or planning its economy is generally considered morally dubious or fascist, dictating totally irrational language policies is still rife. Just check out the list of the world’s language regulators. Of course, in reality, language dictation can never have the reach or control of eugenics or communism in the countries we are discussing (although that didn’t stop the Welsh from trying to dictate their own suicide), but that just highlights further how futile their purist approach is!

English appears to be relatively unusual because not only does it have no dictatorship, it doesn’t even have an appointed body. Whether its touted rise as the first “global language” is because of this, or a consequence of it being so widely distributed in the “free world” that it’s impossible to control or monitor (although France seems to try hard with French), is a topic for debate. But it seems clear that its sheer diversity and richness can in some part be attributed to the cultural freedom it has received.

The nearest that British English has to an authority is the Oxford University Press whose dictionary is the result of a long-running mission to “record the word’s most-known usages and variants in all varieties of English past and present, world-wide”. More like an ornithologist than a genetic engineer then.

So, how does this analysis of linguistic imperialism and study relate to Google?

The internet is fast approaching a tipping point where it will contain almost all human knowledge, past and present, in textual form and from a multitude of different authors and viewpoints. It’s only a short step to proclaim that this can be considered a complete data bank of language. Google therefore, as the world’s leading organiser of this data, has on-tap access to the historical sum of human language, limited only by the integrity of their algorithms.

Their seemingly benign, but useful, “Did you mean” feature (the one that corrects your spelling errors and lazy typing) works on a simple premise that is made powerful by its knowledge rather than its process. Unlike a typical computer spell-checker, which works from static word lists, “Did you mean” compares similar phrases to the one entered to see if they might produce more search results. Because it indiscriminately uses occurrences of all words on the internet, it can find common-usage spellings for proper nouns and slang. And remember, common usage is what matters for language norms at any given time.

The service is therefore essentially a rapid, constantly updated language usage analyser that is performing an automated version of the Oxford University Press’ mission, only on a scale unimaginable in a manual world. The natural reason that “football” is not “foot ball” is because of usage frequencies, whether or not dictation played a part in the past. It’s also the reason why “dubstep” is not “dub step”.
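To make the premise concrete, here is a toy sketch in Python of a usage-driven suggester. It is not Google's actual algorithm, which I have no inside knowledge of. The usage counts are invented stand-ins for "number of search results", and the names USAGE_COUNTS, variants, did_you_mean and the factor threshold are all hypothetical. The idea is simply to generate phrases one keystroke away from the query and suggest the most common one if it is far more widely used than what was typed.

```python
# Invented usage counts standing in for "number of search results".
USAGE_COUNTS = {
    "football": 9000,
    "foot ball": 40,
    "dubstep": 5000,
    "dub step": 120,
    "goodbye": 7000,
}

# Lowercase letters plus the space, so word joins and splits count as edits.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def variants(phrase):
    """Return every string exactly one edit away from phrase."""
    splits = [(phrase[:i], phrase[i:]) for i in range(len(phrase) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return (deletes | swaps | replaces | inserts) - {phrase}


def did_you_mean(query, counts=USAGE_COUNTS, factor=10):
    """Suggest a near-variant used over `factor` times more often, else None."""
    query = query.lower()
    own = counts.get(query, 0)
    best = max(variants(query), key=lambda v: counts.get(v, 0), default=None)
    if best is not None and counts.get(best, 0) > own * factor:
        return best
    return None


if __name__ == "__main__":
    for q in ["foot ball", "dub step", "godbye", "football"]:
        print(q, "->", did_you_mean(q))
```

Note that there is no word list anywhere in this sketch, only counts. If “dub step” ever overtook “dubstep” in usage, the suggestion would flip with no one having to decide that it should.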

As the information age takes hold and language enters a free-fall state of growth due to the thirst for global communication, hopefully it will shake free of its oppressive regimes and the more archaic forms of language planning, to join eugenics on the list of ethical horrors and pseudoscience.

Launch of a new website design

May 11th, 2008, General, Good things, 4 Comments

I’ve had a personal homepage on the internet for over ten years now. If I remember correctly, my first site was hosted on Cardiff University’s computer science server in 1997 and featured a web calculator written in Perl. I soon progressed to something designed in CorelDraw that was littered with bevels, embossed lettering and lens flare. I was proud then, but hindsight is a painful thing.

Anyway, I am pleased to launch the latest iteration of my website today. It retains the content of my previous blog, but with a fresh new design (with thanks to Casper), a little more information about some of the projects I’m involved in and big hopes for the future as I attempt to angle the content more towards a discussion platform.

For those of you reading this via the RSS feed… now is the time to actually visit its home. For those of you browsing by… syndicate here!

How to get the most out of your taxes

May 8th, 2008, General, 1 Comment

With my final days in Denmark rushing by, I started to think about all of the high Scandinavian taxes I’ve paid. It seemed a shame not to experience the wonderful welfare state I’ve been contributing to for three years, so I decided to start cultivating abdominal pains over the last few weeks.

By last Monday morning I finally hit the jackpot: shooting cramps, nausea and a tender spot right by my appendix! After an emergency appointment with my doctor I was duly shipped off to Bispebjerg Hospital for 24 hours of blood tests, ultrasounds and CAT scans. After everything very serious and mildly serious was ruled out, the final verdict was an inconclusive guess that maybe a hernia operation I had three years ago had become aggravated. Oddly it turned out that said operation was exactly three years ago to the very day. I can’t help thinking that some guarantee ran out.

“Bispebjerg Hospital was designed by the architect Martin Nyrop and covers 48 acres. During the German occupation of Denmark 1940 to 1945 in World War II, the hospital treated those illegally resisting the occupying forces, harboured Jews and helped transport about 2,000 of them to safety in neutral Sweden.”