Archive for November, 2008

The semantic web is stupid

For whatever reason, I read a blog post about the “semantic web”. Often, and to my even greater annoyance, people capitalize semantic web as if it were a proper noun. It is not even a properly thought-out idea, let alone a proper noun.

What is the semantic web anyway? Well, the notion is that the internet is full of “stuff”, most of it unstructured and hard to understand, like web pages which contain text and images. So the semantic web is some sort of web of semantically connected ideas. The name originates with Sir Tim Berners-Lee, the inventor of the World Wide Web (which, in contrast to the semantic web, actually does exist as a unique entity and thus deserves its capital letters). I don’t hold Berners-Lee in particularly high regard, and at the time (1999) when he suggested there would be a “semantic web” separate from, or extending, the WWW, the internet was a much stupider and much less interconnected place. Now most good sites have APIs, cloud computing is an everyday reality, and many people spend a significant part of the day inside web apps, which may use HTML as an interface but are still based on structured data.

So two things have become clear in the past decade: the semantic web has already happened without any lame W3C standards to guide it, and the semantic web itself is a dumb idea that will never exist and is used by wayward companies to gain traction in a crowded market.

What really happened
First of all, let’s address why the semantic web has already happened. It’s called APIs. The real problem the semantic web addresses is that there is little portability of data across the web. APIs of all types, and interoperability in general, solve this problem, but not through some grand plan of the W3C. Hilariously, the semantic web stack starts with “User Interface and Applications” at the top, and then “Trust”. Those two items are basically most of the Internet. Worse than that, below this is “Proof”, a clear sign that this is in the dark realm of academics. If you want to create a Facebook app that shows you a map with travel videos from all your friends, you can use the Facebook, YouTube, and Google Maps APIs to achieve this. Why would you want to port everything down to RDF, or whatever the hell the semantic web specifies, and then build it back up? And since cloud computing platforms already have great toolkits built for them, you can manipulate the data in whatever language you’re using. You simply don’t need some lowest-common-denominator tool.

The second item, that the semantic web is a stupid idea, is ultimately the real reason there isn’t and never will be a semantic web. So far all the specification work has been at the bottom of the stack, most businesses haven’t been leveraging it, and only a scant few actually claim to be “semantic”. Nearly all of those perform some sort of natural language processing, or NLP. This basically means that they analyze text (usually only English) to extract meaning (semantics) and then send that somewhere. This kind of semantic analysis has a far longer history than the semantic web, which perhaps explains why people want to associate the two. Good examples of companies applying the “semantic web” tag to their work are Powerset, Spock, and TripIt.

Powerset
Powerset is an NLP-driven search engine. I personally find their work within the Wikipedia corpus pretty impressive, but there isn’t any demo outside that corpus, precisely because it is hard to make NLP work across all domains. Semantic webbers might argue that if everyone were to publish in RDF (aka do all the work of classifying and tagging the data), then it would be simple to make Powerset’s NLP work everywhere. That assumes that Powerset isn’t really flexing its NLP muscle the way it claims to be and is instead relying on Wikipedia’s consistent structure. The truth is that Powerset has trouble applying NLP to the web as a whole because of the nature of NLP analysis: it is very compute-intensive and complex, and it does not scale at all.

Why Powerset isn’t semantic
It’s tough to know what Powerset’s algorithms are actually doing, but it’s clear that they are doing a lot of work on the Wikipedia corpus. However, I think they stop short of extracting meaning. They use the same techniques as any search engine to find your answer, and on top of that they use very basic and ineffective summarization. My single query to Powerset hints that they are trying hard but haven’t gotten very far: “How many countries are there in the world?” yields a reasonable article as the first result, List of Countries. The answer (one of the many possible answers, anyway) is right in the article, but it’s not in the snippet under the search result, and Powerset can’t find it on the page using the right sidebar either. Google, on the other hand, puts an answer in the snippet along with an indication that the answer is ambiguous. That’s because Google relies on information being duplicated across the Internet and assumes that somewhere someone will have phrased the question in the exact same way you have, and other people will have linked to it, so you’ll get the right answer. Powerset doesn’t have faith in people and their behavior; it places much greater faith in its machines’ ability to analyze text and pull out answers. That faith is misplaced, at least for the time being. At some point in the future machines may be able to answer questions by understanding the semantics of the question and all the information on the internet, but not today.

Spock
Spock and TripIt are similarly limited in domain. Spock is a “people search” that apparently thinks I am 51 years old and live in Nicholasville. Neither is true, but it’s scraping a very small number of sites trying to find structured bits of data to tie together and present to me. To call it useless is being kind. It is absolutely filled with ads and devoid of useful info. A Google search tells you far more about me than Spock, and better yet doesn’t seem creepy. I’m not 51, but if I were, is that what you’d want to know about me?

Why Spock isn’t semantic
Spock looks through a few sites which tend to have people on them, checks the typical spots where interesting points of data are, and then constructs a profile. The best reason I can give for why this isn’t semantic is that there is a ton of data on the net about me that it never finds. You can quickly find out that I had trouble with an AIC7xxx driver for Linux in 1999 if you’re interested; just use Google. If you go one step deeper and figure out all my aliases (not difficult), you can unlock reams of information. Spock doesn’t do that because it is stupid. It may aspire to construct a semantic profile, but right now a human being with Google can do far better, with fewer ads.

TripIt
TripIt is the only one of these I vaguely like, although I don’t use it. Basically you forward all your travel emails to TripIt and it scrapes them and combines them together. So if you are flying to Chicago, staying at the Hilton, and renting a car from Enterprise, it will tell you all of that in one place.

Why TripIt isn’t semantic
This is supposedly semantic because it extracts the text of the email and figures out where you are going and when. I think that stops well short of “semantic”. It knows a bunch of places and formats for dates, and it scans the email for dates and places. I sent TripIt the plans for the trip I’m currently on, and it didn’t combine the hotel and the flight, so I have two entries. Both have dates, and both say San Francisco. Perhaps it expects that I will be in the hotel for the entire time, when I only have it booked for part of the trip. The reason is that it has no clue about the “semantics” of a trip to San Francisco. If it can’t even combine a flight and a hotel stay, good luck with understanding anything more esoteric.
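To make that complaint concrete, here is a minimal Python sketch of the kind of shallow scanning I’m describing. It is purely hypothetical and not TripIt’s actual code: the KNOWN_PLACES list, the single YYYY-MM-DD date format, the extract and build_itinerary names, and the sample emails are all assumptions for illustration. It matches strings and compares date spans, and that is all it knows about a trip.

```python
import re
from datetime import date

# Hypothetical gazetteer: the "bunch of places" a naive scanner knows about.
KNOWN_PLACES = {"San Francisco", "Chicago", "Zurich"}

# Only one date format (YYYY-MM-DD); a real scanner would need many more.
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract(email_text):
    """Return (places, dates) found by plain substring and regex matching."""
    places = sorted(p for p in KNOWN_PLACES if p in email_text)
    dates = sorted(
        date(int(y), int(m), int(d)) for y, m, d in DATE_RE.findall(email_text)
    )
    return places, dates


def build_itinerary(emails):
    """Group emails into entries keyed on (places, first date, last date).

    Two emails merge only if they mention the same places AND span exactly
    the same dates; there is no notion of one booking falling inside a trip.
    """
    entries = {}
    for text in emails:
        places, dates = extract(text)
        if not places or not dates:
            continue  # nothing recognizable, so the email is ignored
        key = (tuple(places), dates[0], dates[-1])
        entries.setdefault(key, []).append(text)
    return entries


# Hypothetical sample emails: a flight, and a hotel booked for part of the stay.
flight = "Flight to San Francisco departing 2008-11-10, returning 2008-11-16"
hotel = "Hotel in San Francisco, check-in 2008-11-10, check-out 2008-11-13"

for (places, start, end), texts in build_itinerary([flight, hotel]).items():
    print(places, start, end, len(texts))
```

Because the hotel covers only part of the flight’s date span, the two emails get different keys and come out as two separate entries, roughly the behavior I saw. Merging them correctly would require knowing that a hotel stay sits inside a trip, which is exactly the “semantics” a scanner like this lacks.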

Will anything ever be semantic?
My gut feeling is that over time we’ll be able to leverage NLP and machine learning in more clever ways, but I don’t believe that will be based on any type of semantic tagging. Instead it will come from loads of data and loads of processing time using relatively unsophisticated algorithms. Google has two parallel mechanisms for connecting search queries to the (regular) web: results and ads. Results are generated by analyzing the link structure, and ads leverage the principles of economics and scarcity. If I want “Tumi T3 luggage” and I ask Google for it, by damn, I get it. Google doesn’t need to know what that is ontologically (classifying Tumi as a manufacturer of luggage and T3 as a line that Tumi makes, and using “luggage” to reinforce the previous two classifications), but it does know that there are images of Tumi T3 luggage as well as a load of sellers who are willing to pay to be in front of me when I type “Tumi T3 luggage”. Simply put, there’s no additional value in knowing the semantics if you can provide me good links without them. I simply know of no situation where this sort of semantic information is hugely useful, and I challenge someone to suggest one.

Finally, I think the entire idea of structuring the data of the web to be more machine-readable is a fantasy of lazy academics. Google has done fine without such structure, and it’s not clear to me that it would do any better with it. Further, if you are, say, Delta, there is little incentive for you to use some lame duck format like RDF to make things easier for TripIt. You want to make your customers happy, not TripIt. Customers want email and web sites, and care very little about RDF. If you do have data you want to share, you create an API and require people to use it to access your apps, because that puts the onus on them to convert it from whatever format is convenient for you into whatever format is convenient for them (including RDF).

It’s sad to say that the semantic web is empty except for academics and wishful thinkers, but that’s what happens when you take what one guy says too seriously. You end up chasing the rabbit down the hole without checking to see if anyone’s following or if it’s even worth the time.


Why is the electoral map still so red?

OK, this is fast turning into a political blog. I’m trying to talk about interesting things, not just the political mudslinging that appears on the rest of the Internet.

This is for my foreign friends who are essentially baffled by the electoral college and the electoral map. I personally find the electoral college system a very good one, save one little fact: electoral college members are under no obligation to vote with their state’s popular vote. This is a very thorny issue, and I think that at some point in the future it will become central to some electoral dispute. It was exactly the founders’ intent that the electoral college be independent of the people, and frankly, at this point in the nation’s development, the concerns that led to this compromise (about dumb citizens, mob rule, and a deluded public) may not be out of the question, but they run against the spirit of democracy.

At any rate, the map as it appears on most websites is overwhelmingly red. This reflects the fact that the center of America is both more conservative and much larger in land area than its share of the population would suggest. The electoral college gives each state a number of electoral votes equal to its total number of members in the House and Senate.
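For readers outside the US, that allocation rule fits in a one-line formula. Here E_s is the number of electoral votes for state s and H_s is the number of House seats apportioned to it by population (at least one per state); every state also has two senators, and since the 23rd Amendment the District of Columbia gets three electors:

```latex
E_s = H_s + 2,
\qquad
E_{\text{total}} = \sum_{s=1}^{50} E_s + E_{\text{DC}} = 435 + 2 \times 50 + 3 = 538
```

Because the “+2” is the same for every state regardless of population, the least populous states get somewhat more electoral votes per resident than the most populous ones.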

M. E. J. Newman in Michigan has created some cartograms that stretch the map based on population instead of land area. The most detailed of these (at the county level) clearly show that cities tend Democratic and rural areas tend Republican, while suburban areas break either way.

States sized by number of electoral votes

Counties sized by population

Counties sized by population, with shades of purple indicating close votes

If you’re familiar with America, you’ll notice that in the second map above basically all cities, even those in the “big red middle”, show up blue. The states that voted red were often states without a significant metro area.


America, a change is gonna come

I was walking out of the Hauptbahnhof, the main train station in Zurich, at about nine in the morning. Jane and I had taken the night train from Florence back to Zurich, so I hadn’t stayed up all night waiting. The respectable broadsheets all had publishing deadlines before the election had been called, so they only published ambiguous stories, unable to name a winner. To my surprise, though, a winner wasn’t even on the sandwich boards that announce the papers’ headlines. So I walked out of the train station at a near gallop to get home and turn on CNN, which happens to be the only English-language channel on my TV.

I was waiting to cross the street when Jane pointed out a paper held at another man’s waist. It had a picture of Obama and a German headline, which included the word president. My heart sprang forward several beats. I couldn’t really trust this paper: it was free, called “20 Minuten” (20 minutes, i.e. the amount of time it would take you to read it), and generally contained equal amounts of foam about fashion and sports and real news content. I also figured that it would have been printed before, not after, the presses printed the real papers. But the idea was in my head: it was at least a race close enough to call for Obama.

My fear throughout the entire election had been that this good man would implode. I have watched politics long enough to be completely jaded by the process, and never before had it been so explicit how the sausage was made, and yet Obama had never really been tarnished. I have a habit of couching my hopes in negativity (“He’ll be elected by everyone but the voters”), and even in the last week I still had a pretty negative view of my fellow citizens (“They had better do what they say for once” was my response to good polling, and “He’d better fucking win” more frankly).

I wanted to ditch Jane and run home to confirm, but being wise in the ways of keeping myself out of trouble, I walked at an agonizingly slow but reasonably brisk pace. Jane is afraid of crossing streets in Zurich. I got home, dropped everything on the floor, and at about 9:15 I turned on CNN, and he had won. I had thought about my reaction beforehand; I figured I might shed a tear, as I assumed a lot of people had. Or maybe hug Jane. Instead I just said something really lame, like “He did it”.

But inside it was the most fantastic feeling, the feeling of waking up refreshed, throwing open the curtains, and feeling warm sunlight streaming in. I thought a lot of things later that week. I walked down the street and literally smiled at the thought of being American. Maybe this is pathetic, but when was the last time somebody did that? I thought that finally we could do something with our country that I would be happy to defend, instead of just defending it because I thought I should.

For most people, the moment was more like “I had been watching the television all night and Obama got more and more states, and then at 11pm, he was predicted the winner.” So for most it may not be the generation-defining moment, because it was the result of a long buildup, without the sudden surprise. But for me it came almost at once, and I will always remember where I was when Obama was elected President of the United States.

I buy completely into the story of America. I’m one of the most American people I know. I’m also extremely skeptical: of this choice, of myself, and particularly of the proclamations of messianic glory that surround our President-Elect. What I can say is that for the first time, as an American, I feel like it’s not business as usual, like it’s not going to get worse before it gets better. Most of all, I’m proud to be an American.

No one in my family supported Obama, and we’re an extremely political family. I am the black sheep in many ways. I live abroad, I am better educated, and I am more “intellectual” in a family where it might seem that being intellectual is related to being dishonest. Although none of them voted for Obama, none of them seem particularly disheartened that he won. Had McCain won, though, I would have been devastated. I was nowhere near as engaged in the past two presidential campaigns, though I was extremely disappointed in the results.

I can’t help but think that this is a victory not for my parents or my grandparents. It is not a victory for any number of demographics who, while not necessarily opposed to the idea of an Obama presidency, have little interest in his victory or defeat. It is a victory for those of us who have never had an interest in the political process until this election, and who have never considered politics a vehicle for change: social, political, and economic change. We have been led to believe that government can provide little, because it represents someone else and is represented by someone else. For too long we have been convinced that the mechanisms of democracy themselves mean that we will never be satisfied, that we would receive less than a compromise. We have believed that government is equal to gridlock, inaction, and apathy.

So far as my political history tells me, this apathy stretches back to Nixon v. McGovern. The chance for change was thwarted by smoke-filled politics of the worst kind. Devising ways to slice Americans into special interests, figuring out every person’s hot-button issue, and courting votes by welding together a bunch of basically incompatible hot-button policies that would ensure victory: this has been the major political accomplishment of the post-Kennedy era. After the inspirational Kennedy dynasty fell so quickly, there was no sensible opposition to this divide-and-conquer strategy. So we had LBJ. And we had Nixon. And we had Carter. And Reagan. And Bush. And Clinton. And Bush. All of them have been cut from the fabric woven by Nixon and his divisive politics. No Democrat, certainly not Gore or Kerry, was able to divide people in any comparable way, so Republicans usually won. But it wasn’t for lack of trying on the Democrats’ part. It took a once-in-a-lifetime blundering of executive power, George Bush’s presidency, to open the way for someone like Obama.

Obama represents little to my family. They are disengaged from national politics. My parents were in primary school when Kennedy was assassinated, and they may have a sense of the mythos, but they knew little of the politics surrounding him. Their experience of federal power has been one of continual disappointment, so much so that they consider any attempt to better the nation through federal power wasteful and not worth attempting. He may represent change, but not for the better, because they would never believe that any president could change anything for the better.

For me, for my generation, he represents our chance to take hold of government and make it function for the people once again. After years of pathetic and petty politics, we are beyond outsiders; we are the completely disenfranchised. We see our vote not as a birthright but as some sort of free prize that comes included in our US citizen happy meal. After the gross devaluation of our votes in the 2000 and 2004 elections, we have considered government worthless, and our votes as well. But we’re young enough to still hope, and young enough to realize that we actually should care. After months of having it drilled into us that Obama = change, we finally believe that if we can vote just once more, it might actually be worth something.

That is what it is to believe in this man, as a young American. We desire and we desperately need government to be worth something. Our identity as a people and as a nation is intrinsically linked to it. It is clear that there is little difference between Clintonian and Reaganian politics, between Democrats and Republicans. The young believe that essentially both parties get screwed both ways, that it is the essence of a mud fight with other people’s money: everyone ends up dirty. All we do is become more divided, and agree on less and less until there is nothing any two Americans can discuss any longer without either completely agreeing or disagreeing.

This is to say, I have great hope for Obama. I also have great fears that he will squander his chance to create a new political climate. After the twin fists of the Republican and Democratic parties pummeled the honor and soul of public duty to complete emptiness, perhaps no single man can change America that much, or that quickly. But he might.

And that is the olive branch that I extend to my family, to those who voted for Obama and those who didn’t. Like him or not, he is the once-in-a-generation chance we have to make public service serve the public. And I believe that specifics aside, politics aside, it is an undertaking that all Americans should support. Whether you agree or disagree with his politics and policies, give him the chance for change. And let him preside over the nation with the honor that his position deserves.
