Tuesday, April 28, 2009

A Sneak Preview of Wolfram|Alpha

I went to a sneak preview of Wolfram|Alpha: Computational Knowledge Engine at the Berkman Center today. The site is supposed to be available "in a few weeks."

It was basically a demo by Stephen Wolfram himself. The first 40 minutes were spent watching him type in various queries and briefly show the results. Each query returned a few different "pods" of information. E.g., entering 2+2 showed 4, then "four", and then a visual representation with 4 dots in a row. Other queries quite often showed graphs. Some other things he entered:

gdp france
gdp france / italy
internet users in europe
333g of gold
5 molar H2SO4
decane 2 atm 50C
LDL 180 male age 40
567bp upstream of MSH3
ARM 5% 20 yr
D# minor
lawyers median wage
france fish production vs. poland
height of mt everest / length of the golden gate bridge
weather in princeton NJ when Kurt Godel died
hurricane andrew
president of brazil 1922
tide NYC 11/5/2015
next total solar eclipse chicago

The important thing to realize is that all of these returned multiple bits of data in readable and usable form. The GDP queries returned lots of variations on that (per capita, etc.). The gene queries gave lots of data and diagrams. The stock queries gave all the other related info, not just the quote. The weather results gave various pages and even forecasts into the future. The stock graphs gave forecasts into the future! The ISS query graphed the orbital locations of the space shuttle. It generated a nutrition label for "2 cups of OJ".

In general it was quite impressive. They've collected a lot of data, normalized it, added live feeds for some of it, and made it all available to simple queries or more involved formulas. If it doesn't know what you mean, it makes an attempt and shows what it didn't understand or other options.

There were four big areas that they worked on:

Data curation - they have both free and licensed data, much of it from feeds. Incorporating it is partially automated and partially done by hand by a domain expert. He said they have "a reasonable start on 90% of a reference library."

Algorithms and Computations - implemented in 5-6 million lines of Mathematica code.

Linguistic Analysis - there are no manuals or docs. It's different from the general natural language problem because they concentrate on short utterances. That may be harder or easier than the general problem, but he said he's surprised by how useful it already is.

Automated Presentation - they can compute lots of different graphs, and the question is which ones to show. They use domain experts to figure this out.

It will be a free site with corporate sponsors. You can embed results in other sites. There will be pro versions available via subscription where you can upload your own data sets. They'll expose their ontology via RDF.

A fun bug was when they entered info about people and came across 50¢.

If you enter "meaning of life" the result 42 comes back. If you enter "42" you don't get the meaning of life or 6*7.


Patrick said...

Did you have a chance to ask "What is Fidel Castro's favorite color?"

Howard said...

I didn't, though someone else asked something similar and he didn't even try it, saying it doesn't know that.

kim said...

Not surprisingly, Google has the beginnings of this as well. An example for unemployment. I'm sure it is a coincidence they posted the blog the same day Wolfram did his first public web demo.

Howard said...

My 15 minutes of fame... Universal Hub points to this post.