Wolfram|Alpha: The First Year

Years ago I wondered if it would ever be possible to systematically make human knowledge computable. And today, one year after the official launch of Wolfram|Alpha, I think I can say for sure: it is possible.

It takes a stack of technology and ideas that I’ve been assembling for nearly 30 years. And in many ways it’s a profoundly difficult project. But this year has shown that it is possible.

Wolfram|Alpha is of course a very long-term undertaking. But much has been built, the direction is set, and things are moving with accelerating speed.

Over the past year, we’ve roughly doubled the amount that Wolfram|Alpha knows. We’ve doubled the number of domains it handles, and the number of algorithms it can use. And we’ve actually much more than doubled the amount of raw data in it.

Things seem to be scaling better and better. The more we put into Wolfram|Alpha, the easier it becomes to add still more. We’ve honed both our automated and human processes, progressively building on what Wolfram|Alpha already does.

When we launched Wolfram|Alpha a year ago, about 2/3 of all queries generated a response. Now over 90% do.

So, what are some of the things we’ve learned over the past year?

First, encouragingly, people really seem to “get” the idea of Wolfram|Alpha. Perhaps those early precursors in science fiction helped. But people seem to understand the idea that they can ask Wolfram|Alpha a specific question, and it will compute a specific answer.

And indeed, right now, something over 50% of all queries to Wolfram|Alpha give zero hits in web searches: they are fresh new questions that aren’t explicitly written down anywhere on the web.

Another wonderful thing we’ve learned over the past year is that there are lots of people in the world who really want to see Wolfram|Alpha succeed. It’s been great to see how much support there is for what we’re trying to do, and to get so much helpful feedback.

Particularly valuable have been all the experts in so many areas who have volunteered their time and expertise—as well as their data and methods—to help us achieve our goal of deep, accurate coverage of as many domains as possible.

I suppose one lesson we’ve really absorbed this year is how important it is to be working with the best, definitive primary sources. By now, practically none of the raw data in Wolfram|Alpha is just “foraged from the web”.

Mostly it’s fed directly into Wolfram|Alpha from primary sources, based on relationships that we’ve developed—especially over the past year—with whoever is responsible for the data.

Something else we’ve learned, though, is that importing the raw data is perhaps only 5% of the work. After that we have to actually understand the data: how it’s represented—its units, conventions, etc.—and what it means. We have to align it with data we already have. And then we have to see how to compute from it, how to pick out what’s important, and how to present it.
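
As a concrete, if toy, picture of that alignment step, here is a minimal sketch in Python. Everything in it (the entity aliases, the unit factors, the record itself) is a hypothetical example of mine, not Wolfram|Alpha’s internal representation.

```python
# Toy illustration of aligning an imported record with existing data:
# canonicalize the entity name, convert the value to a canonical unit.
# (All names and numbers here are hypothetical, not Wolfram|Alpha internals.)

UNIT_FACTORS = {            # conversion factors to the canonical unit "head"
    "thousand head": 1000,  # livestock counts often arrive in thousands
    "head": 1,
}

ENTITY_ALIASES = {          # align source spellings with canonical entity IDs
    "FR": "France",
    "France (metropolitan)": "France",
}

def normalize(record):
    """Return the record with a canonical entity name and canonical units."""
    entity = ENTITY_ALIASES.get(record["entity"], record["entity"])
    value = record["value"] * UNIT_FACTORS[record["unit"]]
    return {"entity": entity, "property": record["property"],
            "year": record["year"], "value": value, "unit": "head"}

raw = {"entity": "FR", "property": "goat population",
       "year": 2008, "unit": "thousand head", "value": 1200.0}
print(normalize(raw))
# {'entity': 'France', 'property': 'goat population',
#  'year': 2008, 'value': 1200000.0, 'unit': 'head'}
```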

We also have to work out how people will refer to the data: what they’ll call the entities; how they’ll describe the properties they want. There’s almost never a systematic source. The web—and things like Wikipedia—are where we start, doing automatic and manual “linguistic discovery” to build up the right lexicons and grammars.
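
As a cartoon of what such a lexicon does, here is a tiny Python sketch: a table of surface forms mapped to canonical entities and properties, scanned with a longest-match-first rule. The vocabulary is invented for illustration; the real lexicons and grammars are vastly larger and richer.

```python
# Toy lexicon: surface forms -> canonical (type, id) pairs.
# (Purely illustrative; not Wolfram|Alpha's actual linguistics.)
LEXICON = {
    "france": ("Country", "France"),
    "french republic": ("Country", "France"),
    "goats": ("Property", "GoatPopulation"),
    "goat population": ("Property", "GoatPopulation"),
}

def interpret(query):
    """Scan a query left to right, matching the longest known phrase first."""
    words = query.lower().split()
    found, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):   # longest candidate phrase first
            if " ".join(words[i:j]) in LEXICON:
                found.append(LEXICON[" ".join(words[i:j])])
                i = j
                break
        else:
            i += 1                           # skip an unrecognized word
    return found

print(interpret("France goats"))
# [('Country', 'France'), ('Property', 'GoatPopulation')]
```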

But of course now we have another crucial source: the huge stream of actual queries that have been fed to Wolfram|Alpha.

I’ve looked at data in many different fields. And I have to say that one of the surprises to me this year is how incredibly precise the “laws” of the Wolfram|Alpha query stream are. Perfect exponentials. Perfect power laws. Better than almost any physics experiment I’ve ever seen.
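
To show what checking such a law involves: if query frequency falls off as a power of rank, the rank-frequency plot is a straight line on log-log axes, and the exponent drops out of a linear fit. A minimal sketch, on synthetic counts (the actual query-stream data isn’t public), with a hypothetical exponent of 1.1:

```python
# If frequency ~ C * rank^(-alpha), then log(freq) is linear in log(rank),
# so a least-squares line on log-log axes recovers the exponent alpha.
import numpy as np

rng = np.random.default_rng(0)
ranks = np.arange(1, 10_001)
alpha = 1.1                                   # hypothetical exponent
counts = 1e6 * ranks ** (-alpha) * rng.lognormal(0.0, 0.05, ranks.size)

slope, intercept = np.polyfit(np.log(ranks), np.log(counts), 1)
print(f"fitted exponent: {-slope:.2f}")       # ~1.10: a near-perfect power law
```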

These laws tell us something we already knew: Wolfram|Alpha will never be “finished”. There’s always more tail. But they also tell us that with all the knowledge we’ve put into Wolfram|Alpha, we’re not in bad shape.

We study the Wolfram|Alpha query stream, distilling it to get a giant “to do” list. There are still some “obvious” things on it—like deeper coverage of popular culture, sports, and local information. And we’re working on these.

But already some of the things that end up high on the list seem quite esoteric. Card games. Sunspots. Mouse genes. Foreign mutual funds. But we’re systematically going through and doing all of these.

When we first launched Wolfram|Alpha there were some things I thought were just too obscure ever to cover. I kept on giving the example of “france goats” as a query we’d never be able to respond to.

But then, a few months ago, I tried this query again—and it worked! We’ve got data on livestock in France. With a time series of goat numbers going back to 1971!

And I’ve now had this kind of experience many times. As we get deeper and deeper into the data that exists in the world, I keep on being surprised at how much is actually knowable, or computable.

One lesson we’ve learned, though, is that nothing is ever truly finished. Even before Wolfram|Alpha launched a year ago, we had by far the largest, most scholarly treatment of units of measure the world has ever seen. Nearly 8000 units, with all their patterns of usage carefully analyzed, and organized in computable form.

But over the course of this year, every week—from the corners of Wolfram|Alpha—we find more units to add. Like the “boepd” (barrel of oil equivalent per day), the “slinchf” (slinch-force), the “spat” (unit of solid angle), the “digney” (unit of resistance), or the “new hay load” (unit of mass).
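
To show what having a unit like that in computable form means, here is the arithmetic for the first of them. One caveat: the conversion convention is an assumption of mine, since “boe” has several competing definitions; I’m using the common 5.8-million-BTU one.

```python
# Converting "boepd" (barrel of oil equivalent per day) to SI power.
# Assumed convention: 1 boe = 5.8 million BTU (common, but not universal).
BTU_IN_JOULES = 1055.06
BOE_IN_JOULES = 5.8e6 * BTU_IN_JOULES          # ~6.12e9 J per boe
SECONDS_PER_DAY = 86_400

watts_per_boepd = BOE_IN_JOULES / SECONDS_PER_DAY
print(f"1 boepd ~ {watts_per_boepd / 1e3:.1f} kW")   # ~70.8 kW
```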

One principle of mine is always to have a portfolio of development projects going on. From little enhancements to core multi-year engineering efforts to pie-in-the-sky research investigations.

New data flows into Wolfram|Alpha every second. But this year we were able to release a new, fully tested version of the whole Wolfram|Alpha codebase every single week.

We’ve introduced a few important new general frameworks this year. And as it happens, we have some major new ones not far off in the pipeline. Involving data. And computation. And linguistics. And presentation.

For me, Wolfram|Alpha is an exciting intellectual adventure. Not just because of all the areas of knowledge it covers. But also because it represents a whole new paradigm for computing and for thinking about knowledge.

One of my consistent observations in the past has been that it takes me a decade to really absorb a new paradigm, and to see how to take the next big steps with it. And perhaps that will be the case with Wolfram|Alpha too.

But I’m happy to say that—perhaps because of the terrific team we have—I think there are some pretty big steps already visible.

We’ve recently made some breakthroughs, for example, in understanding how to bring together Wolfram|Alpha and Mathematica—to create a fascinating hybrid of ordinary human language and precise computer language that I suspect represents the future of systematic interactions with computers.

We’re understanding how to make Wolfram|Alpha not just operate on its internal data and knowledge, but also absorb new input from documents and sensors and feeds.

I even think that with the Wolfram|Alpha paradigm, I may have figured out something quite fundamental about a very abstract topic: the systematic automation of mathematics.

A lot has happened in the practical deployment of Wolfram|Alpha this year. The API. The beginnings of integration with Microsoft’s Bing search engine. The iPhone app, and now the iPad app. The first ebook with integrated Wolfram|Alpha. And also the delivery of the first Wolfram|Alpha appliances for deploying custom versions of Wolfram|Alpha in enterprise environments.
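
For anyone curious what using the API looks like, here is a minimal sketch in Python. It follows the v2 query API as I understand it (an HTTP GET that returns XML “pods”); the AppID below is a placeholder, and the current documentation is the authority on endpoints and parameters.

```python
# Minimal call to the Wolfram|Alpha query API over HTTP.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

params = urllib.parse.urlencode({
    "appid": "YOUR-APPID-HERE",   # placeholder: substitute a real AppID
    "input": "france goats",
    "format": "plaintext",        # ask for text-only results
})
url = "http://api.wolframalpha.com/v2/query?" + params

with urllib.request.urlopen(url) as response:
    tree = ET.parse(response)

# Each "pod" is one block of a result page; print its plain text.
for pod in tree.getroot().iter("pod"):
    for plaintext in pod.iter("plaintext"):
        if plaintext.text:
            print(pod.get("title"), "->", plaintext.text)
```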

But this is only the very beginning.

In many ways we’ve been holding back in expanding the use of Wolfram|Alpha—waiting until we felt we’d reached the right point. But now we’re there. And this year we’re going to be energetically making Wolfram|Alpha as broadly available as possible.

To mark our anniversary right now, we’re releasing a little burst of new features.

[Image: the new Wolfram|Alpha homepage]

A simpler, customizable home page. As well as lots of content additions. Like local street maps. And coverage of thousands of diseases and symptoms. And—since we’ve done terrestrial weather quite thoroughly—space weather.

We’re also making a systematic addition to how we interpret queries. Usually, Wolfram|Alpha works by trying to understand each query precisely and completely. And that is what one wants, if it’s possible.

But the linguistic and content space covered by Wolfram|Alpha has now filled in to the point where there’s also something else to try. Even if Wolfram|Alpha can’t interpret a particular specific query, it can still try to find the “nearest” query that it can interpret.

And as of today, that’s how Wolfram|Alpha is set up to work. Over the next little while, there’ll be considerably more sophistication added to the notion of “nearest queries”. But already this technique is adding quite a bit to the typical experience of using Wolfram|Alpha.
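
To make the idea concrete, here is a toy version in Python, using simple string similarity against a list of queries the system can already interpret. Wolfram|Alpha’s actual “nearest queries” machinery is certainly far more sophisticated; this sketch just conveys the fallback logic.

```python
# Fallback: if a query can't be interpreted exactly, try the most
# similar query that *can* be interpreted, within a similarity cutoff.
from difflib import get_close_matches

KNOWN_QUERIES = ["france goats", "weather in paris", "population of france"]

def nearest_query(query):
    """Return the closest interpretable query, or None if nothing is close."""
    matches = get_close_matches(query.lower(), KNOWN_QUERIES, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(nearest_query("frqnce goat"))       # -> 'france goats'
print(nearest_query("chickens on mars"))  # -> None
```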

Needless to say, Wolfram|Alpha still can’t do everything. A few days ago, just after the team had made a test version of the new “nearest queries” capability live, I was looking at our real-time monitor of queries that we still couldn’t respond to.

And flashing by came “chickens on mars”. Well, I think that one will be a while. Then, a moment later, came “where is my hat”. I guess that might not be so long. Whether through RFID or vision or something else, I think we’re on a path to make Wolfram|Alpha able to respond to that!

I’ve spent most of my adult life doing very large projects. Wolfram|Alpha is surely the largest and most complex so far. Over the course of this year we’ve continued to build up a terrific team. That’s turning what at one time seemed like an impossible goal into a practical, highly scalable engineering effort.

With the help of many people, we’re building a remarkable intellectual structure—one that’s steadily moving from being “interesting”, to “convenient”, to absolutely necessary. Leading to a time when we’ll all wonder how on earth it was that, before May 18, 2009, we could ever exist without Wolfram|Alpha….