Stephen Wolfram generously gave me a two-hour demo of Wolfram Alpha last evening, and I was quite positively impressed. As he said, it’s not AI, and not aiming to be, so it shouldn’t be measured by contrasting it with HAL or Cyc but with Google or Yahoo.
At its heart is a formal Mathematica representation. Its inference engine is basically a large number of individually hand-engineered scripts for tapping into data which he and his team have spent the last several years gathering and “curating”. For example, he has assembled tables of historical financial information about countries’ GDPs and about companies’ stock prices. In a small number of cases, he also connects via API to third-party information, but mostly for real-time data such as a current stock price or current temperature. Rather than connecting to and relying on the current or future Semantic Web, Alpha computes its answers primarily from his own curated data to the extent possible; he sees Alpha as the home for almost all the information it needs and will use to answer users’ queries.
In an important sense, Alpha is a logical extension of Mathematica: it extends the range of types of information for which significant power can be gained by manually, and exhaustively, enumerating a large set of cases: airplane designs, cities, currencies, etc. I.e., Alpha extends what Mathematica has done previously for things like chemical compounds, geometric surfaces, topological configurations, arithmetic series, trigonometric ratios, and equations. In the new cases, as Mathematica did in those abstract math cases, Alpha excels at not just retrieving the stored data but performing various appropriate numeric calculations on the data, and displaying the results in beautiful graphs and easily comprehended tables for the user.
The resulting mosaic covers a large portion of the space of queries that the average person might genuinely want to ask in the course of their day. The interface is not exactly natural language, but can be treated by the user as though it were, just as users of browsers can treat them as though they parsed sentences even though they don’t. A better way to think of it is as a DWIMM (“do what I might mean”) interface, so if you type in something like “gdp France / Germany”, it calculates and returns a graph of the ratio of France’s annual GDP to Germany’s GDP over the last 30 years or so. If you just type in “gdp”, it looks up your local host and (in my case) displays the GDP of the USA over the last 30 years, plus various pieces of information about what gross domestic product is, from a mathematical formula perspective but not from a semantic one. It does not have an ontology, so what it knows about, say, GDP, or population, or stock price, is no more nor less than the equations that involve that term. One vulnerability that this engenders in Alpha is that errors in the data may go unnoticed for a long time; a positive way of saying this is that one could align Alpha’s terms to an ontology and knowledge base, and use it to catch some fraction of errors as outright implausible violations of basic knowledge (e.g., Miami’s population dropping by exactly a factor of ten during the month of October 2006).
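The kind of plausibility check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything Alpha or Cyc actually does: the function name `implausible_jumps`, the 1.5x threshold, and the population figures are all hypothetical and invented for the example.

```python
from typing import List, Tuple

def implausible_jumps(
    series: List[Tuple[str, float]], max_ratio: float = 1.5
) -> List[Tuple[str, str]]:
    """Return (previous_label, current_label) pairs of adjacent samples
    whose values differ by more than max_ratio in either direction."""
    flagged = []
    for (label_a, a), (label_b, b) in zip(series, series[1:]):
        if a <= 0 or b <= 0:
            # A non-positive population is implausible in itself.
            flagged.append((label_a, label_b))
            continue
        ratio = max(a / b, b / a)
        if ratio > max_ratio:
            flagged.append((label_a, label_b))
    return flagged

# Hypothetical monthly population figures, made up for illustration only.
population = [
    ("2006-08", 400_000),
    ("2006-09", 401_000),
    ("2006-10", 40_100),   # a dropped digit: exactly a factor of ten too small
    ("2006-11", 401_500),
]

for before, after in implausible_jumps(population):
    print(f"Suspicious change between {before} and {after}")
```

A rule this crude knows nothing about cities or people; the point of aligning the data with a knowledge base would be to derive such thresholds from what is known about the quantity rather than hard-coding them.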
Another example of DWIMM occurs if you type in a complicated mathematical formula, sloppily, with run-on variables, parenthesis errors, typos, etc. In those cases, Alpha does a great job of guessing what you could possibly have meant by that, something close to what you typed in which would be a nontrivial graph, and displays that graph. If you type in a string of letters that’s parsable only as a chemical compound, it assumes that you want information about that compound. If you type in IL where it expects a state, it will interpret that as Illinois; where it expects a country, it will interpret that as Israel.
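The “IL” case amounts to resolving the same token differently depending on what kind of thing is expected in that position. A minimal sketch in Python, assuming a hypothetical lookup table `CODES` and function `interpret` (neither reflects how Alpha is actually implemented):

```python
from typing import Optional

# Hypothetical lookup tables: US postal abbreviations and ISO 3166-1
# alpha-2 country codes happen to share some two-letter strings.
CODES = {
    "state": {"IL": "Illinois", "CA": "California"},
    "country": {"IL": "Israel", "CA": "Canada"},
}

def interpret(token: str, expected_kind: str) -> Optional[str]:
    """Resolve an ambiguous token using the kind of entity expected
    in that position; return None when no reading fits."""
    return CODES.get(expected_kind, {}).get(token.upper())

print(interpret("IL", "state"))    # Illinois
print(interpret("IL", "country"))  # Israel
```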
For those who are familiar with and enamored of Mathematica’s powerful theorem prover, it should be mentioned that it is, for the moment, turned off, for reasons having to do with computational cost (i.e., response time) and also to prevent “explosions” of less and less relevant answers from being produced. Cautiously, conditionally, at some time in the future, expect to see that theorem prover come into play.
There are two important dimensions I want to discuss about Wolfram Alpha, besides the remarks I’ve already made here. (1) What sorts of queries does it not handle, and (2) When it returns information, how much does it actually “understand” of what it’s displaying to you? There are two sorts of queries not (yet) handled: those where the data falls outside the mosaic I sketched above — such as: When is the first day of Summer in Sydney this year? Do Muslims believe that Mohammed was divine? Who did Hezbollah take prisoner on April 18, 1987? Which animals have fingers? — and those where the query requires logically reasoning out a way to combine (logically or arithmetically combine) two or more pieces of information which the system can individually fetch for you. One example of this is: “How old was Obama when Mitterrand was elected president of France?” It can tell you demographic information about Obama, if you ask, and it can tell you information about Mitterrand (including his ruleStartDate), but doesn’t make or execute the plan to calculate a person’s age on a certain date given his birth date, which is what is being asked for in this query. If it knows that exactly 17 people were killed in a certain attack, and if it also knows that 17 American soldiers were killed in that attack, it doesn’t return that attack if you ask for ones in which there were no civilian casualties, or only American casualties. It doesn’t perform that sort of deduction. If you ask “How fast does hair grow?”, it can’t parse or answer that query. But if you type in a speed, say “10cm/year”, it gives you a long and quite interesting list of things that happen at about that speed, involving glaciers melting, tectonic shift, and… hair growing.
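Both of the missing deductions above are small once the individual facts have been fetched; the hard part is deciding that this is the plan to execute. The following Python sketch shows only the easy part. The fact keys (`birthDate`, `electionDate`) and function names are hypothetical; the two dates are the publicly known ones (Obama was born on 4 August 1961, and Mitterrand won the presidential election on 10 May 1981).

```python
from datetime import date

def age_on(birth_date: date, on_date: date) -> int:
    """Whole years elapsed between birth_date and on_date."""
    years = on_date.year - birth_date.year
    # Subtract one if the birthday has not yet occurred in on_date's year.
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

def civilian_casualties(total_killed: int, soldiers_killed: int) -> int:
    """If every death is either a soldier or a civilian, the civilian
    count is whatever remains after subtracting the soldiers."""
    return total_killed - soldiers_killed

# Individually retrievable facts, stored under hypothetical keys.
facts = {
    ("Barack Obama", "birthDate"): date(1961, 8, 4),
    ("François Mitterrand", "electionDate"): date(1981, 5, 10),
}

age = age_on(
    facts[("Barack Obama", "birthDate")],
    facts[("François Mitterrand", "electionDate")],
)
print(age)                          # 19
print(civilian_casualties(17, 17))  # 0
```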
This brings up the final issue I wanted to discuss: how much of what it returns does it understand. At one extreme is, say, Google, which responds to almost anything like a faithful puppy bringing in the morning newspaper without understanding much of anything it’s fetching (recognizing words in what it returns, often leading to amusing or hair-raising inappropriate “ads” being displayed, and leading to tons of false positives and false negatives). At the other extreme is, say, Cyc, which only can answer a small fraction of user queries, but can answer ones that require common sense (not just common sense queries like “Do surgeons often operate on themselves?”, but ones where the logical application of such knowledge is required to correctly disambiguate and parse the user’s query containing pronouns, elisions, ambiguous words, ellipsis, and so on) and where every piece of the query and every piece of the answer is as deeply understood as, say, arithmetic. Wolfram Alpha is somewhere around the geometric mean of those two extremes. It handles a much wider range of queries than Cyc, but much narrower than Google; it understands some of what it is displaying as an answer, but only some of it — e.g., the above example about it displaying the fact that hair grows 10cm/year if you ask for things that happen at 10cm/year but not if you ask how fast hair grows; or being able to report the number of cattle in Chicago but not (even a lower bound on) the number of mammals because it doesn’t know taxonomy and reason that way. If the connection between turbulent air and plane travel isn’t represented via an equation, it isn’t represented at all. As with many of these sentences, I want to add “…yet”, because Dr. Wolfram is very much aware of the limitations of his system, and has plans for addressing many of them as Alpha continues to develop.
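The cattle-versus-mammals example shows what even a tiny taxonomy buys. Here is a minimal Python sketch, assuming a hypothetical `IS_A` table and invented counts (these are not real figures for Chicago or anywhere else), and assuming the counted kinds do not overlap:

```python
from typing import Dict, Optional

# Hypothetical fragment of a taxonomy: each kind points to its parent kind.
IS_A: Dict[str, str] = {
    "cattle": "mammal",
    "dog": "mammal",
    "mammal": "animal",
    "chicken": "bird",
    "bird": "animal",
}

def is_kind_of(kind: Optional[str], ancestor: str) -> bool:
    """Walk up the taxonomy from kind, looking for ancestor."""
    while kind is not None:
        if kind == ancestor:
            return True
        kind = IS_A.get(kind)
    return False

def lower_bound(counts: Dict[str, int], ancestor: str) -> int:
    """Sum the counts of every kind that falls under ancestor.
    Uncounted kinds can only add to the true total, so this is a lower bound,
    provided the counted kinds are disjoint from one another."""
    return sum(n for kind, n in counts.items() if is_kind_of(kind, ancestor))

# Made-up counts for illustration only.
counts = {"cattle": 120, "chicken": 300}

print(lower_bound(counts, "mammal"))  # 120
print(lower_bound(counts, "animal"))  # 420
```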
The bottom line is that there is a large range of queries it can’t parse, and a large range of parsable queries it can’t answer even when it can answer the constituents out of which they should be answerable, but it handles a huge range of numeric and scientific queries correctly even in its current state. And Dr. Wolfram and his team are chipping away at the natural language blocks, at the holes in the curated data repository, and at increasing the type and depth of logical combination of constituents, one by one, in priority order, just as they should. I went into the demo concerned that this might be a competitor to Cyc, given its “hand-curate knowledge and engineer it, versus let anyone add anything” philosophy, but came out of last night’s demo and discussion seeing Alpha as a complementary technology. I would invest in this, literally and figuratively. If it is not gobbled up by one of the existing industry superpowers, his company may well grow to become one of them in a small number of years, with most of us setting our default browser to be Wolfram Alpha.
Comments
GDP and Stock Market Prices
Hello, can you give me more details on where to see these tables of GDP and stock prices? I am very interested in this as I would like to develop a similar platform which performs this for stock options. Any info would be greatly appreciated.
Quite impressive but limited in choice to multi definition terms
I found its mathematics and physics results to be most useful, but when typing in the term ‘Java’ I get the island’s geographical definition instead of Sun’s software. Fair enough, though, and a credit to Wolfram’s efforts in design; I gather that in future tweaks it may present a text option and an accurate search zone for all definitions relating to a particular word.
osorio
It’s a step in the right
It’s a step in the right direction, but I think we’ve still got a long way to go to make queries return 100% relevant results to the user.
Sounds Powerful but
I haven’t been following Wolfram Alpha that much, but I am familiar with engines like Cyc. I do have a few thoughts based on your analysis.
It might seem that Wolfram Alpha is putting too much emphasis on building their database as opposed to “understanding” their database. This is where Cyc has its power, in its drive for a “Semantic Web”.
I’m not saying they aren’t improving but after reading this and some over at their site it seems they might be trying to improve their database, say laterally, instead of understanding or analyzing their current data, which could catch up with them as they try harder to attain their goal… and end up working against them.
It does sound like they are right now more of a hybrid in terms of their database size and the level at which they can understand it. You gave great examples of, say, the different ways a question can be asked, and of how WA might only be able to handle some, not all, of them. Parsing of our English language is where the power of these engines lies; well, any language for that matter. But this almost becomes an exponentially hard problem as you consider how many languages there actually are.
Alans Miller
Oh, check out this video too that I found on YouTube, a good visual representation of what Doug was laying out in this article.
http://www.youtube.com/watch?v=pvngZAx1-PU
Correction of deductive queries
“How fast does hair grow?” actually does get processed correctly…
0.4 mm/day (millimeters per day)
jin
Alpha is not even trying to be like Google
Greg,
I agree with your overall conclusion…Alpha is not a Google-killer. Its purpose is quite different. Alpha is focused on queries that require or benefit from some degree of data analysis, rather than broad web text searches. For example, if you search Alpha for “$15 per hour”, you’ll get responses that tell you what $15 per hour comes to as an annual wage, or what the equivalent of US$15/hour is in British pounds or Japanese yen. If you do the same search on Google you’ll get a list of jobs advertised for $15 per hour. Same search, completely different responses, both of which are potentially useful if you know what your objective is and use the appropriate service.
The other thing is that Alpha does not search the web. It queries a very large set of data which is “curated” by the Wolfram folks, plus some trusted external sources for certain real-time data such as stock quotes, weather data, etc. So Alpha can almost never be as current as Google or any other search engine which is constantly combing the web for new information. Many folks see this as a limiting factor for Alpha – how can it scale if everything is curated by one organization – but if it is useful enough for its intended purpose (most facts don’t change constantly) then it can still be very successful for users, and as a business.
But I agree – forget this silly “Google-killer” stuff. This is not coming from the Wolfram folks themselves, but merely from over-excited, under-informed writers and analysts looking for dramatic angles on the story.
Tony Shaw
WolframAlpha is NOT a Google Killer
Heck of a fine write-up on Wolfram Alpha. Thank you very much. I appreciate that you took the time to explain some of the similarities (heritage?) to Mathematica. Overall I’m very surprised to hear so many people refer to this as a Google Killer. I clearly think (and you summed it up well) that this doesn’t seem to be the intent behind this. I wrote up a brief article on why I think this isn’t going to be the case and would love to hear what you think on the matter: http://www.sagerock.com/blog/wolframalpha-not-google-killer/
Wolfram | Alpha piped to Google results
Think about this. You search for a term like heart health and Google brings back its “Results 1 – 10 of about 96,400,000 for heart health with Safesearch on. (0.24 seconds)”. Now Wolfram can analyze the heck out of that and tell us how many articles were written in the previous month, year, 5 years, etc. Also, what is the average length of each article, what is the Flesch index, reading level, etc. This kind of computational analytics on the fly is going to change the way people interface to all the data around the net. I don’t think search. I think interface to data.
Hope Wolfram lives up to the hype
I am hearing labels such as “Google Killer” associated with Wolfram. Let’s hope it really does meet all expectations and does not fall short on data, as it seems to at this moment.
You may also be interested in reading this article
http://www.blogpandit.com/2009/05/wolfram-alpha-give-me-more-thats-not.html
Thanks Doug, Good post.
What is the Airspeed Velocity of an Unladen Swallow
Sorry, my first inclination of what to ask it is probably outside its expertise as well…
What is the Airspeed Velocity of an Unladen Swallow?
Wolfram/Alpha knew an answer for a European swallow (25 mph), but not for an African swallow.
Catchphrase
So…does this mean we are all going to be answering people’s questions with “I don’t know, why don’t you Wolfram it?” in about 3 years?? (By the way…3 nuts hold down the engine of a 1964 1/2 Mustang…thought you would want to add that to the database).
Great work
When I read “Wolfram”, my first thought is Mathematica. Mathematica is a big player at universities, and even poor students can afford it (student price)! So it is a logical step forward that Wolfram Alpha will continue the line of great work. Definitely a very interesting piece of software!
Application to Data Warehousing
If a web service were created, a resource such as this might have very real application for data analytics.
http://www.datamartist.com/wolfram-alpha-dimensional-generator
Currently, adding economic, demographic or other information (calendar information for example) can be a significant amount of work for a data warehouse administrator. Having a way to tap into this huge resource would be of real interest.
Sounds great.
Wolfram Alpha sounds like something very special. This article was really great in answering some of my questions that I was thinking. It definitely is very innovative and could become a major player. I am excited to eventually try Wolfram|Alpha out for myself.
It is very impressive. If new
It is very impressive. If new BI software did come out, it would solve a lot of problems. Nice work.
Some thoughts on Wolfram Alpha
Perhaps the folks reading this will find the material that I am writing here of interest: http://www.initialsingularity.com/alphatips/blog. Feel free to make suggestions for other topics to cover.
Very interesting, thank you
This was an extremely interesting and helpful article. When I heard about Wolfram|Alpha in the headlines a week or so ago, I was full of questions, and this has answered a lot of them.
I find it interesting that there is so much focus on deduction, it seems, whereas I would find a search engine that could return answers, even non-deductive ones, very enticing. Powerset appears to do that, which is encouraging, but I’m a little surprised that there isn’t more competition in this arena.
Anyway, I’ll have to wait until I can try Wolfram|Alpha out for myself, but thanks again for the great description.
collaboration
Any chances of an eventual collaboration between Cycorp and Wolfram Alpha?
collaboration
I believe there are several possible paths that such collaboration could take, almost all of which involve at least a partial alignment of Cyc’s ontology with the Alpha terms. The possibilities include Alpha assimilating some content from Cyc, thereafter curating it; Alpha mapping in real time to certain Cyc content; Alpha’s natural language front end using Cyc to help disambiguate user queries or to help formulate connected prose answers when such answers are of nontrivial length/complexity; Cyc viewing Alpha as a sort of Heuristic Level module which it calls to process certain (sub-sub-…)problems when appropriate, or to display results of a certain form such as graphs, surfaces, etc. There are even more possibilities when you factor in third parties who might harness both Alpha and Cyc, adding some third capability, and producing some useful new functionality.