Stephen Wolfram Blog
Stephen Wolfram's Personal Blog

Oh My Gosh, It’s Covered in Rule 30s!
Thu, 01 Jun 2017 06:22:57 +0000
Stephen Wolfram

A British Train Station

A week ago a new train station, named “Cambridge North”, opened in Cambridge, UK. Normally such an event would be far outside my sphere of awareness. (I think I last took a train to Cambridge in 1975.) But last week people started sending me pictures of the new train station, wondering if I could identify the pattern on it:

Cambridge North train station

And, yes, it does indeed look a lot like patterns I’ve spent years studying—that come from simple programs in the computational universe. My first—and still favorite—examples of simple programs are one-dimensional cellular automata like this:

One-dimensional cellular automata

The system evolves line by line from the top, determining the color of each cell according to the rule underneath. This particular cellular automaton I called “rule 182”, because the bit pattern in the rule corresponds to the number 182 in binary. There are altogether 256 possible cellular automata like this, and this is what all of them do:

256 possible cellular automata
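To make the numbering concrete, here is a minimal sketch of how a rule number unpacks into its eight cases. The variable names are chosen only for illustration, and the ordering of neighborhoods (111 first, 000 last) is the usual convention, assumed here.

```wolfram
(* The 8 possible neighborhoods (left, center, right), from 111 down to 000 *)
neighborhoods = Tuples[{1, 0}, 3];

(* The 8 binary digits of 182 give the new color of the center cell in each case *)
outputs = IntegerDigits[182, 2, 8];  (* {1, 0, 1, 1, 0, 1, 1, 0} *)

(* Pair each neighborhood with its output *)
Thread[neighborhoods -> outputs]

(* Reading the digits back as a binary number recovers the rule number *)
FromDigits[outputs, 2]  (* 182 *)
```

Since there are 8 neighborhoods and 2 choices for each, there are 2^8 = 256 rules of this kind. RulePlot[CellularAutomaton[182]] draws the same information as a row of icons.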

Many of them show fairly simple behavior. But the huge surprise I got when I first ran all these cellular automata in the early 1980s is that even though all the rules are very simple to state, some of them generate very complex behavior. The first in the list that does that—and still my favorite example—is rule 30:

Rule 30

If one runs it for 400 steps one gets this:

After 400 steps
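For readers who want to see the rule as a formula rather than as icons, here is a small sketch. It assumes the standard description of rule 30 (new cell = left XOR (center OR right)); the helper name rule30Step is hypothetical, not something from the post.

```wolfram
(* One step of rule 30 on a finite row, treating everything outside it as white (0).
   The row grows by one cell on each side. *)
rule30Step[row_List] := Module[{p = ArrayPad[row, 2]},
  (* new cell = left XOR (center OR right) *)
  BitXor[p[[;; -3]], BitOr[p[[2 ;; -2]], p[[3 ;;]]]]];

(* Start from a single black cell and apply the step 5 times *)
rows = NestList[rule30Step, {1}, 5];
rows[[3]]  (* {1, 1, 0, 0, 1} *)

(* Pad every row to the same width and compare with the built-in function *)
(ArrayPad[#, (11 - Length[#])/2] & /@ rows) === CellularAutomaton[30, {{1}, 0}, 5]

(* The 400-step picture *)
ArrayPlot[CellularAutomaton[30, {{1}, 0}, 400]]
```

The hand-written step is only there to expose the logic; in practice CellularAutomaton does the same job directly.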

And, yes, it’s remarkable that starting from one black cell at the top, and just repeatedly following a simple rule, it’s possible to get all this complexity. I think it’s actually an example of a hugely important phenomenon, that’s central to how complexity gets made in nature, as well as to how we can get a new level of technology. And in fact, I think it’s important enough that I spent more than a decade writing a 1200-page book (that just celebrated its 15th anniversary) based on it.

And for years I’ve actually had rule 30 on my business cards:

Business cards

But back to the Cambridge North train station. Its pattern is obviously not completely random. But if it was made by a rule, what kind of rule? Could it be a cellular automaton?

I zoomed in on a photograph of the pattern:

Enlarged pattern

Suddenly, something seemed awfully familiar: the triangles, the stripes, the L shapes. Wait a minute… it couldn’t actually be my favorite rule of all time, rule 30?

Clearly the pattern is tipped 45° from how I’d usually display a cellular automaton. And there are black triangles in the photograph, not white ones like in rule 30. But if one black-white inverts the rule (so it’s now rule 135), one gets this:

Black-white inversion of the pattern
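The step from rule 30 to rule 135 can be worked out mechanically. The sketch below assumes that "black-white inversion" means complementing both the inputs and the output of the rule; invertRule is a hypothetical helper name.

```wolfram
(* Complementing the inputs reverses the order of the 8 cases;
   complementing the output flips each digit *)
invertRule[n_Integer] := FromDigits[1 - Reverse[IntegerDigits[n, 2, 8]], 2];

invertRule[30]  (* 135 *)

(* A check one can run: inverting the rule 30 picture should match rule 135
   started from a single white cell on a black background *)
1 - CellularAutomaton[30, {{1}, 0}, 20] ===
  CellularAutomaton[invertRule[30], {{0}, 1}, 20]
```

Applying invertRule twice gives back the original rule number, as one would expect of an inversion.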

And, yes, it’s the same kind of pattern as in the photograph! But if it’s rule 30 (or rule 135) what’s its initial condition? Rule 30 can actually be used as a cryptosystem, because it can be hard (maybe even NP-complete) to reconstruct its initial condition.

But, OK, if it’s my favorite rule, I wondered if maybe it’s also my favorite initial condition—a single black cell. And, yes, it is! The train station pattern comes exactly from the (inverted) right-hand edge of my favorite rule 30 pattern!

Edge of rule 30

Here’s the Wolfram Language code. First run the cellular automaton, then rotate the pattern:

Rotate[ArrayPlot[CellularAutomaton[135, {{1}, 0}, 40], Mesh -> True], -45 Degree]

It’s a little trickier to pull out precisely the section of the pattern that’s used. Here’s the code (the PlotRange is what determines the part of the pattern that’s shown):

Graphics[Rotate[First[ArrayPlot[CellularAutomaton[135, {{1}, 0}, 80], Mesh -> True]], -45 Degree], PlotRange -> {{83, 104}, {-12, 60}}]

OK, so where is this pattern actually used at the train station? Everywhere!

Cambridge North collage

It’s made of perforated aluminum. You can actually look through it, reminiscent of an old latticed window. From inside, the pattern is left-right reversed—so if it’s rule 135 from outside, it’s rule 149 from inside. And at night, the pattern is black-white inverted, because there’s light coming from inside—so from the outside it’s “rule 135 by day, and rule 30 at night”.
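The left-right reversal can be computed in the same spirit. This is a sketch under the assumption that viewing the panel from the other side simply swaps the roles of left and right neighbors; reflectRule is a hypothetical helper name.

```wolfram
(* For each neighborhood (a, b, c), use the original rule's output for (c, b, a).
   The case with binary value v sits at position 8 - v in the digit list. *)
reflectRule[n_Integer] := Module[{digits = IntegerDigits[n, 2, 8]},
  FromDigits[digits[[8 - FromDigits[Reverse[#], 2]]] & /@ Tuples[{1, 0}, 3], 2]];

reflectRule[135]  (* 149 *)

(* A check one can run: mirroring every row of the rule 135 picture
   should give the rule 149 picture *)
(Reverse /@ CellularAutomaton[135, {{1}, 0}, 20]) ===
  CellularAutomaton[149, {{1}, 0}, 20]
```

The same function applied to rule 30 itself gives rule 86, its mirror image.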

What are some facts about the rule 30 pattern? It’s extremely hard to rigorously prove things about it (and that’s interesting in itself—and closely related to the fundamental phenomenon of computational irreducibility). But, for example—like, say, the digits of π—many aspects of it seem random. And, for instance, black and white squares appear to occur with equal frequency—meaning that at the train station the panels let in about 50% of the outside light.
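The 50% figure is easy to probe empirically. The following is only a rough sketch: the number of steps and the choice of sampling region are arbitrary assumptions, and a finite sample can of course only suggest, not prove, equal frequencies.

```wolfram
steps = 1000;
ca = CellularAutomaton[30, {{1}, 0}, steps];

(* Fraction of black cells in the center column *)
N[Mean[ca[[All, steps + 1]]]]

(* Fraction of black cells in a block of late rows near the center,
   chosen to lie well inside the triangle of activity *)
N[Mean[Flatten[ca[[501 ;;, steps + 1 - 200 ;; steps + 1 + 200]]]]]
```

Both numbers would be expected to come out close to 0.5 if the claim in the text holds for this sample.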

If one looks at sequences of n cells, it seems that all 2^n configurations will occur on average with equal frequency. But not everything is random. And so, for example, if one looks at 3×2 blocks of cells, only 24 of the 32 possible ones ever occur. (Maybe some people waiting for trains will figure out which blocks are missing…)
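Both statements can be explored with a few lines of code. This sketch makes two assumptions of its own: that "sequences of n cells" are read horizontally (with n = 3 here), and that a 3×2 block means 3 cells wide by 2 cells tall. On that reading, the count of 32 presumably reflects the fact that the rule itself fixes the middle cell of the lower row.

```wolfram
steps = 1000;
ca = CellularAutomaton[30, {{1}, 0}, steps];

(* A region well inside the triangle of activity: late rows, columns near the center *)
region = ca[[501 ;; 1001, steps + 1 - 200 ;; steps + 1 + 200]];

(* Relative frequencies of the horizontal sequences of 3 cells that occur *)
rowBlocks = Flatten[Partition[#, 3, 1] & /@ region, 1];
N[SortBy[Tally[rowBlocks], First][[All, 2]]/Length[rowBlocks]]

(* Number of distinct blocks 3 cells wide and 2 cells tall that occur *)
blocks = Flatten[Partition[region, {2, 3}, {1, 1}], 1];
Length[Union[blocks]]
```

If the figures in the text are right, the first result should be eight numbers near 1/8 and the second should be 24. Complement[Tuples[{0, 1}, {2, 3}], Union[blocks]] then lists the blocks that never showed up in the sample.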

When we look at the pattern, our visual system particularly picks out the black triangles. And, yes, it seems as if triangles of any size can ultimately occur, albeit with frequency decreasing exponentially with size.

If one looks carefully at the right-hand edge of the rule 30 pattern, one can see that it repeats. However, the repetition period seems to increase exponentially as one goes in from the edge.
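One way to look at the repetition is to follow the cells that lie a fixed number of steps in from the right-hand edge and measure how often each such diagonal repeats. The sketch below is a rough empirical probe, not an analysis: the tail length of 512 and the maximum period of 256 are arbitrary cutoffs, and the helper names are hypothetical.

```wolfram
steps = 1000;
ca = CellularAutomaton[30, {{1}, 0}, steps];

(* Cells lying n steps in from the right-hand edge, one per row *)
diagonal[n_] := Table[ca[[t + 1, steps + 1 + t - n]], {t, n, steps}];

(* Smallest shift (up to 256) that maps the last 512 values onto themselves *)
period[seq_] := With[{tail = Take[seq, -512]},
  SelectFirst[Range[256], Drop[tail, #] === Drop[tail, -#] &]];

Table[n -> period[diagonal[n]], {n, 0, 10}]
```

The outermost diagonal (n = 0) is all black, so its period is 1. A result of Missing["NotFound"] for some n would just mean the period exceeds the cutoff used here.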

At the train station, there are lots of identical panels. But rule 30 is actually an inexhaustible source of new patterns. So what would happen if one just continued the evolution, and rendered it on successive panels? Here’s the result. It’s a pity about the hint of periodicity on the right-hand edge, and the big triangle on panel 5 (which might be a safety problem at the train station).

Successive panels

Fifteen more steps in from the edge, there’s no hint of that anymore:

Fifteen more steps

What about other initial conditions? If the initial conditions repeat, then so will the pattern. But otherwise, so far as one can tell, the pattern will look essentially the same as with a single-cell initial condition.

One can try other rules too. Here are a few from the same simplest 256-rule set as rule 30:

Simple 256-rule set

Moving deeper from the edge the results look a little different (for aficionados, rule 89 is a transformed version of rule 45, rule 182 of rule 90, and rule 193 of rule 110):

Moving deeper from the edge

And starting from random initial conditions, rather than a single black cell, things again look different:

Starting from random initial conditions

And here are a few more rules, started from random initial conditions:

A few more rules

Here’s a website (made in a couple of minutes with a tiny piece of Wolfram Language code) that lets you experiment (including with larger rule numbers, based on longer-range rules). (And if you want to explore more systematically, here’s a Wolfram Notebook to try.)

Cellular automaton panel explorer

It’s amazing what’s out there in the computational universe of possible programs. There’s an infinite range of possible patterns. But it’s cool that the Cambridge North train station uses my all-time favorite discovery in the computational universe—rule 30! And it looks great!

The Bigger Picture

There’s something curiously timeless about algorithmically generated forms. A dodecahedron from ancient Egypt still looks crisp and modern today. As do periodic tilings—or nested forms—even from centuries ago:

Periodic tilings and nested forms

But can one generate richer forms algorithmically? Before I discovered rule 30, I’d always assumed that any form generated from simple rules would somehow end up being obviously simple. But rule 30 was a big shock to my intuition, and from it I realized that in the computational universe of all possible rules, it’s actually very easy to get rich and complex behavior, even from simple underlying rules.

And what’s more, the patterns that are generated often have remarkable visual interest. Here are a few produced by cellular automata (now with 3 possible colors for each cell, rather than 2):

Three-color cellular automata

There’s an amazing diversity of forms. And, yes, they’re often complicated. But because they’re based on simple underlying rules, they always have a certain logic to them: in a sense each of them tells a definite “algorithmic story”.

One thing that’s notable about forms we see in the computational universe is that they often look a lot like forms we see in nature. And I don’t think that’s a coincidence. Instead, I think what’s going on is that rules in the computational universe capture the essence of laws that govern lots of systems in nature—whether in physics, biology or wherever. And maybe there’s a certain familiarity or comfort associated with forms in the computational universe that comes from their similarity to forms we’re used to in nature.

But is what we get from the computational universe art? When we pick out something like rule 30 for a particular purpose, what we’re doing is conceptually a bit like photography: we’re not creating the underlying forms, but we are selecting the ones we choose to use.

In the computational universe, though, we can be more systematic. Given some aesthetic criterion, we can automatically search through perhaps even millions or billions of possible rules to find optimal ones: in a sense automatically “discovering art” in the computational universe.

We did an experiment on this for music back in 2007: WolframTones. And what’s remarkable is that even by sampling fairly small numbers of rules (cellular automata, as it happens), we’re able to produce all sorts of interesting short pieces of music—that often seem remarkably “creative” and “inventive”.

From a practical point of view, automatic discovery in the computational universe is important because it allows for mass customization. It makes it easy to be “original” (and “creative”)—and to find something different every time, or to fit constraints that have never been seen before (say, a pattern in a complicated geometric region).

The Cambridge North train station uses a particular rule from the computational universe to make what amounts to an ornamental pattern. But one can also use rules from the computational universe for other things in architecture. And one can even imagine a building in which everything—from overall massing down to details of moldings—is completely determined by something close to a single rule.

One might assume that such a building would somehow be minimalist and sterile. But the remarkable fact is that this doesn’t have to be true—and that instead there are plenty of rich, almost “organic” forms to be “mined” from the computational universe.

Ever since I started writing about one-dimensional cellular automata back in the early 1980s, there’s been all sorts of interesting art done with them. Lots of different rules have been used. Sometimes they’ve been what I called “class 4” rules that have a particularly organic look. But often it’s been other rules—and rule 30 has certainly made its share of appearances—whether it’s on floors, shirts, tea cosies, kinetic installations, or, recently, mass-customized scarves (with the knitting machine actually running the cellular automaton):

CA art

But today we’re celebrating a new and different manifestation of rule 30. Formed from permanent aluminum panels, in an ancient university town, a marvellous corner of the computational universe adorns one of the most practical of structures: a small train station. My compliments to the architects. May what they’ve made give generations of rail travelers a little glimpse of the wonders of the computational universe. And perhaps a few, echoing the last words attributed to the traveler in the movie 2001: A Space Odyssey, will exclaim “oh my gosh, it’s covered in rule 30s!”

(Thanks to Wolfram Summer School alum Alyssa Adams for sending us the photos of Cambridge North.)

This post has been updated to include an image of Cambridge North at night, courtesy of Quintin Doyle, Senior Architectural Designer, Atkins.

A New Kind of Science: A 15-Year View
Tue, 16 May 2017 13:43:21 +0000
Stephen Wolfram

Starting now, in celebration of its 15th anniversary, A New Kind of Science will be freely available in its entirety, with high-resolution images, on the web or for download.

A New Kind of Science

It’s now 15 years since I published my book A New Kind of Science—more than 25 since I started writing it, and more than 35 since I started working towards it. But with every passing year I feel I understand more about what the book is really about—and why it’s important. I wrote the book, as its title suggests, to contribute to the progress of science. But as the years have gone by, I’ve realized that the core of what’s in the book actually goes far beyond science—into many areas that will be increasingly important in defining our whole future.

So, viewed from a distance of 15 years, what is the book really about? At its core, it’s about something profoundly abstract: the theory of all possible theories, or the universe of all possible universes. But for me one of the achievements of the book is the realization that one can explore such fundamental things concretely—by doing actual experiments in the computational universe of possible programs. And in the end the book is full of what might at first seem like quite alien pictures made just by running very simple such programs.

Back in 1980, when I made my living as a theoretical physicist, if you’d asked me what I thought simple programs would do, I expect I would have said “not much”. I had been very interested in the kind of complexity one sees in nature, but I thought—like a typical reductionistic scientist—that the key to understanding it must lie in figuring out detailed features of the underlying component parts.

In retrospect I consider it incredibly lucky that all those years ago I happened to have the right interests and the right skills to actually try what is in a sense the most basic experiment in the computational universe: to systematically take a sequence of the simplest possible programs, and run them.

I could tell as soon as I did this that there were interesting things going on, but it took a couple more years before I began to really appreciate the force of what I’d seen. For me it all started with one picture:

Rule 30

Or, in modern form:

Rule 30, modern form

I call it rule 30. It’s my all-time favorite discovery, and today I carry it around everywhere on my business cards. What is it? It’s one of the simplest programs one can imagine. It operates on rows of black and white cells, starting from a single black cell, and then repeatedly applies the rules at the bottom. And the crucial point is that even though those rules are by any measure extremely simple, the pattern that emerges is not.

It’s a crucial—and utterly unexpected—feature of the computational universe: that even among the very simplest programs, it’s easy to get immensely complex behavior. It took me a solid decade to understand just how broad this phenomenon is. It doesn’t just happen in programs (“cellular automata”) like rule 30. It basically shows up whenever you start enumerating possible rules or possible programs whose behavior isn’t obviously trivial.

Similar phenomena had actually been seen for centuries in things like the digits of pi and the distribution of primes—but they were basically just viewed as curiosities, and not as signs of something profoundly important. It’s been nearly 35 years since I first saw what happens in rule 30, and with every passing year I feel I come to understand more clearly and deeply what its significance is.

Four centuries ago it was the discovery of the moons of Jupiter and their regularities that sowed the seeds for modern exact science, and for the modern scientific approach to thinking. Could my little rule 30 now be the seed for another such intellectual revolution, and a new way of thinking about everything?

In some ways I might personally prefer not to take responsibility for shepherding such ideas (“paradigm shifts” are hard and thankless work). And certainly for years I have just quietly used such ideas to develop technology and my own thinking. But as computation and AI become increasingly central to our world, I think it’s important that the implications of what’s out there in the computational universe be more widely understood.

Implications of the Computational Universe

Here’s the way I see it today. From observing the moons of Jupiter we came away with the idea that—if looked at right—the universe is an ordered and regular place, that we can ultimately understand. But now, in exploring the computational universe, we quickly come upon things like rule 30 where even the simplest rules seem to lead to irreducibly complex behavior.

One of the big ideas of A New Kind of Science is what I call the Principle of Computational Equivalence. The first step is to think of every process—whether it’s happening with black and white squares, or in physics, or inside our brains—as a computation that somehow transforms input to output. What the Principle of Computational Equivalence says is that above an extremely low threshold, all processes correspond to computations of equivalent sophistication.

It might not be true. It might be that something like rule 30 corresponds to a fundamentally simpler computation than the fluid dynamics of a hurricane, or the processes in my brain as I write this. But what the Principle of Computational Equivalence says is that in fact all these things are computationally equivalent.

It’s a very important statement, with many deep implications. For one thing, it implies what I call computational irreducibility. If something like rule 30 is doing a computation just as sophisticated as our brains or our mathematics, then there’s no way we can “outrun” it: to figure out what it will do, we have to do an irreducible amount of computation, effectively tracing each of its steps.

The mathematical tradition in exact science has emphasized the idea of predicting the behavior of systems by doing things like solving mathematical equations. But what computational irreducibility implies is that out in the computational universe that often won’t work, and instead the only way forward is just to explicitly run a computation to simulate the behavior of the system.

A Shift in Looking at the World

One of the things I did in A New Kind of Science was to show how simple programs can serve as models for the essential features of all sorts of physical, biological and other systems. Back when the book appeared, some people were skeptical about this. And indeed at that time there was a 300-year unbroken tradition that serious models in science should be based on mathematical equations.

But in the past 15 years something remarkable has happened. For now, when new models are created—whether of animal patterns or web browsing behavior—they are overwhelmingly more often based on programs than on mathematical equations.

Year by year, it’s been a slow, almost silent, process. But by this point, it’s a dramatic shift. Three centuries ago pure philosophical reasoning was supplanted by mathematical equations. Now in these few short years, equations have been largely supplanted by programs. For now, it’s mostly been something practical and pragmatic: the models work better, and are more useful.

But when it comes to understanding the foundations of what’s going on, one’s led not to things like mathematical theorems and calculus, but instead to ideas like the Principle of Computational Equivalence. Traditional mathematics-based ways of thinking have made concepts like force and momentum ubiquitous in the way we talk about the world. But now as we think in fundamentally computational terms we have to start talking in terms of concepts like undecidability and computational irreducibility.

Will some type of tumor always stop growing in some particular model? It might be undecidable. Is there a way to work out how a weather system will develop? It might be computationally irreducible.

These concepts are pretty important when it comes to understanding not only what can and cannot be modeled, but also what can and cannot be controlled in the world. Computational irreducibility in economics is going to limit what can be globally controlled. Computational irreducibility in biology is going to limit how generally effective therapies can be—and make highly personalized medicine a fundamental necessity.

And through ideas like the Principle of Computational Equivalence we can start to discuss just what it is that allows nature—seemingly so effortlessly—to generate so much that seems so complex to us. Or how even deterministic underlying rules can lead to computationally irreducible behavior that for all practical purposes can seem to show “free will”.

Cellular automata

Mining the Computational Universe

A central lesson of A New Kind of Science is that there’s a lot of incredible richness out there in the computational universe. And one reason that’s important is that it means that there’s a lot of incredible stuff out there for us to “mine” and harness for our purposes.

Want to automatically make an interesting custom piece of art? Just start looking at simple programs and automatically pick out one you like—as in our WolframTones music site from a decade ago. Want to find an optimal algorithm for something? Just search enough programs out there, and you’ll find one.

We’ve normally been used to creating things by building them up, step by step, with human effort—progressively creating architectural plans, or engineering drawings, or lines of code. But the discovery that there’s so much richness so easily accessible in the computational universe suggests a different approach: don’t try building anything; just define what you want, and then search for it in the computational universe.

Sometimes it’s really easy to find. Like let’s say you want to generate apparent randomness. Well, then just enumerate cellular automata (as I did in 1984), and very quickly you come upon rule 30, which turns out to be one of the very best known generators of apparent randomness (look down the center column of cell values, for example). In other situations you might have to search 100,000 cases (as I did in finding the simplest axiom system for logic, or the simplest universal Turing machine), or you might have to search millions or even trillions of cases. But in the past 25 years, we’ve had incredible success in just discovering algorithms out there in the computational universe, and we rely on many of them in implementing the Wolfram Language.
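As a toy version of this kind of search, one can enumerate all 256 of the simplest cellular automata and rank them by how random their center columns look. The scoring function here (entropy of overlapping 4-cell blocks) is a crude stand-in chosen for this sketch; it is not the criterion used in the original search, and the names are hypothetical.

```wolfram
steps = 300;

(* Center column of a rule started from a single black cell *)
centerColumn[rule_] := CellularAutomaton[rule, {{1}, 0}, steps][[All, steps + 1]];

(* Entropy in bits of overlapping length-4 blocks; 4 is the maximum possible *)
blockEntropy[seq_] := Module[{p},
  p = N[Tally[Partition[seq, 4, 1]][[All, 2]]/(Length[seq] - 3)];
  -Total[p Log2[p]]];

(* Score every rule and list the ten highest-scoring ones *)
scores = Table[rule -> blockEntropy[centerColumn[rule]], {rule, 0, 255}];
Take[SortBy[scores, -Last[#] &], 10]
```

One would expect rule 30 and its mirrored and inverted relatives to score near the top, though a measure this simple cannot distinguish real randomness from merely varied behavior.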

At some level it’s quite sobering. One finds some tiny program out in the computational universe. One can tell it does what one wants. But when one looks at what it’s doing, one doesn’t have any real idea how it works. Maybe one can analyze some part—and be struck by how “clever” it is. But there just isn’t a way for us to understand the whole thing; it’s not something familiar from our usual patterns of thinking.

Of course, we’ve often had similar experiences before—when we use things from nature. We may notice that some particular substance is a useful drug or a great chemical catalyst, but we may have no idea why. But in doing engineering and in most of our modern efforts to build technology, the great emphasis has instead been on constructing things whose design and operation we can readily understand.

In the past we might have thought that was enough. But what our explorations of the computational universe show is that it’s not: selecting only things whose operation we can readily understand misses most of the immense power and richness that’s out there in the computational universe.

A World of Discovered Technology

What will the world look like when more of what we have is mined from the computational universe? Today the environment we build for ourselves is dominated by things like simple shapes and repetitive processes. But the more we use what’s out there in the computational universe, the less regular things will look. Sometimes they may look a bit “organic”, or like what we see in nature (since after all, nature follows similar kinds of rules). But sometimes they may look quite random, until perhaps suddenly and incomprehensibly they achieve something we recognize.

For several millennia we as a civilization have been on a path to understand more about what happens in our world—whether by using science to decode nature, or by creating our own environment through technology. But to use more of the richness of the computational universe we must at least to some extent forsake this path.

In the past, we somehow counted on the idea that between our brains and the tools we could create we would always have fundamentally greater computational power than the things around us—and as a result we would always be able to “understand” them. But what the Principle of Computational Equivalence says is that this isn’t true: out in the computational universe there are lots of things just as powerful as our brains or the tools we build. And as soon as we start using those things, we lose the “edge” we thought we had.

Today we still imagine we can identify discrete “bugs” in programs. But most of what’s powerful out there in the computational universe is rife with computational irreducibility—so the only real way to see what it does is just to run it and watch what happens.

We ourselves, as biological systems, are a great example of computation happening at a molecular scale—and we are no doubt rife with computational irreducibility (which is, at some fundamental level, why medicine is hard). I suppose it’s a tradeoff: we could limit our technology to consist only of things whose operation we understand. But then we would miss all that richness that’s out there in the computational universe. And we wouldn’t even be able to match the achievements of our own biology in the technology we create.

Machine Learning and the Neural Net Renaissance

There’s a common pattern I’ve noticed with intellectual fields. They go for decades and perhaps centuries with only incremental growth, and then suddenly, usually as a result of a methodological advance, there’s a burst of “hypergrowth” for perhaps 5 years, in which important new results arrive almost every week.

I was fortunate enough that my own very first field—particle physics—was in its period of hypergrowth right when I was involved in the late 1970s. And for myself, the 1990s felt like a kind of personal period of hypergrowth for what became A New Kind of Science—and indeed that’s why I couldn’t pull myself away from it for more than a decade.

But today, the obvious field in hypergrowth is machine learning, or, more specifically, neural nets. It’s funny for me to see this. I actually worked on neural nets back in 1981, before I started on cellular automata, and several years before I found rule 30. But I never managed to get neural nets to do anything very interesting—and actually I found them too messy and complicated for the fundamental questions I was concerned with.

And so I “simplified them”—and wound up with cellular automata. (I was also inspired by things like the Ising model in statistical physics, etc.) At the outset, I thought I might have simplified too far, and that my little cellular automata would never do anything interesting. But then I found things like rule 30. And I’ve been trying to understand its implications ever since.

In building Mathematica and the Wolfram Language, I’d always kept track of neural nets, and occasionally we’d use them in some small way for some algorithm or another. But about 5 years ago I suddenly started hearing amazing things: that somehow the idea of training neural nets to do sophisticated things was actually working. At first I wasn’t sure. But then we started building neural net capabilities in the Wolfram Language, and finally two years ago we released our website—and now we’ve got our whole symbolic neural net system. And, yes, I’m impressed. There are lots of tasks that had traditionally been viewed as the unique domain of humans, but which now we can routinely do by computer.

But what’s actually going on in a neural net? It’s not really to do with the brain; that was just the inspiration (though in reality the brain probably works more or less the same way). A neural net is really a sequence of functions that operate on arrays of numbers, with each function typically taking quite a few inputs from around the array. It’s not so different from a cellular automaton. Except that in a cellular automaton, one’s usually dealing with, say, just 0s and 1s, not arbitrary numbers like 0.735. And instead of taking inputs from all over the place, in a cellular automaton each step takes inputs only from a very well-defined local region.

Now, to be fair, it’s pretty common to study “convolutional neural nets”, in which the patterns of inputs are very regular, just like in a cellular automaton. And it’s becoming clear that having precise (say 32-bit) numbers isn’t critical to the operation of neural nets; one can probably make do with just a few bits.
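The resemblance can be made explicit by writing both kinds of step in the same form. This is an illustrative sketch only: the function names, the weights and the bias are made up for the example, and cyclic boundaries are assumed for simplicity.

```wolfram
(* Cellular automaton step: fixed integer weights 4, 2, 1 turn each neighborhood
   into a number from 0 to 7, then a lookup table picks 0 or 1 *)
caStep[rule_Integer, a_List] :=
  IntegerDigits[rule, 2, 8][[8 - (4 RotateRight[a] + 2 a + RotateLeft[a])]];

(* One-dimensional convolutional layer: real-valued weights w and bias b,
   with a smooth nonlinearity in place of the lookup table *)
convLayer[w_List, b_, a_List] :=
  LogisticSigmoid[w[[1]] RotateRight[a] + w[[2]] a + w[[3]] RotateLeft[a] + b];

row = {0, 0, 0, 1, 0, 0, 0};
caStep[30, row]                          (* {0, 0, 1, 1, 1, 0, 0} *)
convLayer[{0.3, -1.2, 0.735}, 0.1, row]  (* seven real numbers between 0 and 1 *)
```

In both cases each output depends on the same three neighboring inputs. The difference is whether the values and parameters are discrete or continuous.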

But a big feature of neural nets is that we know how to make them “learn”. In particular, they have enough features from traditional mathematics (like involving continuous numbers) that techniques like calculus can be applied to provide strategies to make them incrementally change their parameters to “fit their behavior” to whatever training examples they’re given.
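A deliberately tiny example shows the idea of using calculus to adjust a parameter. Everything in it is hypothetical: a one-parameter "network" y = w x, three made-up training examples, and an arbitrary step size of 0.05.

```wolfram
(* Hypothetical training examples {x, y}, roughly following y = 2 x *)
data = {{1., 2.}, {2., 4.1}, {3., 5.9}};

(* Mean squared error of the one-parameter model y = w x *)
loss[w_] := Mean[(w #[[1]] - #[[2]])^2 & /@ data];

(* Calculus supplies the direction in which to nudge w *)
gradient[w_] = D[loss[w], w];

(* Twenty small steps downhill, starting from w = 0 *)
Last[NestList[# - 0.05 gradient[#] &, 0., 20]]  (* approaches 27.9/14, about 1.993 *)
```

Real neural nets do the same thing with millions of parameters at once, but the principle of following the derivative of the error is the same.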

It’s far from obvious how much computational effort, or how many training examples, will be needed. But the breakthrough of about five years ago was the discovery that for many important practical problems, what’s available with modern GPUs and modern web-collected training sets can be enough.

Pretty much nobody ends up explicitly setting or “engineering” the parameters in a neural net. Instead, what happens is that they’re found automatically. But unlike with simple programs like cellular automata, where one’s typically enumerating all possibilities, in current neural nets there’s an incremental process, essentially based on calculus, that manages to progressively improve the net—a little like the way biological evolution progressively improves the “fitness” of an organism.
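For contrast, here is what "enumerating all possibilities" looks like in the simplest setting. The sketch treats ten steps of output from an unknown rule as the training data and tests all 256 candidates; the setup is invented for illustration.

```wolfram
(* "Training data": 10 steps of behavior produced by some unknown rule (here rule 30) *)
target = CellularAutomaton[30, {{1}, 0}, 10];

(* Try every one of the 256 elementary rules and keep those that reproduce the data *)
Select[Range[0, 255], CellularAutomaton[#, {{1}, 0}, 10] === target &]  (* {30} *)
```

No derivative is involved, and nothing is adjusted incrementally: a rule either reproduces the data or it does not. That works here only because the space of rules is so small.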

It’s plenty remarkable what comes out from training a neural net in this way, and it’s plenty difficult to understand how the neural net does what it does. But in some sense the neural net isn’t venturing too far across the computational universe: it’s always basically keeping the same basic computational structure, and just changing its behavior by changing parameters.

But to me the success of today’s neural nets is a spectacular endorsement of the power of the computational universe, and another validation of the ideas of A New Kind of Science. Because it shows that out in the computational universe, away from the constraints of explicitly building systems whose detailed behavior one can foresee, there are immediately all sorts of rich and useful things to be found.

NKS Meets Modern Machine Learning

Is there a way to bring the full power of the computational universe—and the ideas of A New Kind of Science—to the kinds of things one does with neural nets? I suspect so. And in fact, as the details become clear, I wouldn’t be surprised if exploration of the computational universe saw its own period of hypergrowth: a “mining boom” of perhaps unprecedented proportions.

In current work on neural nets, there’s a definite tradeoff one sees. The more what’s going on inside the neural net is like a simple mathematical function with essentially arithmetic parameters, the easier it is to use ideas from calculus to train the network. But the more what’s going on is like a discrete program, or like a computation whose whole structure can change, the more difficult it is to train the network.

It’s worth remembering, though, that the networks we’re routinely training now would have looked utterly impractical to train only a few years ago. It’s effectively just all those quadrillions of GPU operations that we can throw at the problem that make training feasible. And I won’t be surprised if even quite pedestrian (say, local exhaustive search) techniques will fairly soon let one do significant training even in cases where no incremental numerical approach is possible. And perhaps it will even be possible to invent some major generalization of things like calculus that will operate in the full computational universe. (I have some suspicions, based on thinking about generalizing basic notions of geometry to cover things like cellular automaton rule spaces.)

What would this let one do? Likely it would let one find considerably simpler systems that could achieve particular computational goals. And maybe that would bring within reach some qualitatively new level of operations, perhaps beyond what we’re used to being possible with things like brains.

There’s a funny thing that’s going on with modeling these days. As neural nets become more successful, one begins to wonder: why bother to simulate what’s going on inside a system when one can just make a black-box model of its output using a neural net? Well, if we manage to get machine learning to reach deeper into the computational universe, we won’t have as much of this tradeoff any more—because we’ll be able to learn models of the mechanism as well as the output.

I’m pretty sure that bringing the full computational universe into the purview of machine learning will have spectacular consequences. But it’s worth realizing that computational universality—and the Principle of Computational Equivalence—make it less a matter of principle. Because they imply that even neural nets of the kinds we have now are universal, and are capable of emulating anything any other system can do. (In fact, this universality result was essentially what launched the whole modern idea of neural nets, back in 1943.)
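To give a feel for the building block behind that 1943 result, here is a small Python sketch of threshold units in the style of the McCulloch-Pitts model. The particular weights and thresholds are illustrative choices, and the sketch shows only that such units can be wired into arbitrary Boolean logic; full universality also needs some way to feed outputs back in over time, which is not shown.

```python
# Threshold units in the style of the 1943 McCulloch-Pitts model.
# The specific weights and thresholds below are illustrative choices.

def neuron(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of the inputs reaches the threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def nand(a, b):
    # Inhibitory weights: the unit fires unless both inputs are on.
    return neuron([a, b], [-1, -1], -1)

def xor(a, b):
    # Four NAND units wired together; NAND alone suffices for any Boolean circuit.
    c = nand(a, b)
    return nand(nand(a, c), nand(b, c))

truth_table = {(a, b): xor(a, b) for a in (0, 1) for b in (0, 1)}
```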

And as a practical matter, the fact that current neural net primitives are being built into hardware and so on will make them a desirable foundation for actual technology systems, even if they’re far from optimal. But my guess is that there are tasks where for the foreseeable future access to the full computational universe will be necessary to make them even vaguely practical.

Finding AI

What will it take to make artificial intelligence? As a kid, I was very interested in figuring out how to make a computer know things, and be able to answer questions from what it knew. And when I studied neural nets in 1981, it was partly in the context of trying to understand how to build such a system. As it happens, I had just developed SMP, which was a forerunner of Mathematica (and ultimately the Wolfram Language)—and which was very much based on symbolic pattern matching (“if you see this, transform it to that”). At the time, though, I imagined that artificial intelligence was somehow a “higher level of computation”, and I didn’t know how to achieve it.
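The "if you see this, transform it to that" idea can be sketched in a few lines of Python. This is a toy rewriter under assumptions of my own (expressions as nested tuples, names ending in `_` as pattern variables); it is not how SMP or the Wolfram Language is implemented, just an indication of the flavor.

```python
# Toy symbolic rewriter: expressions are nested tuples such as ("plus", "x", 0).
# Names ending in "_" are pattern variables. Purely an illustrative sketch.

def match(pattern, expr, bindings):
    """Return bindings that make pattern equal expr, or None if there are none."""
    if isinstance(pattern, str) and pattern.endswith("_"):
        if pattern in bindings:
            return bindings if bindings[pattern] == expr else None
        return {**bindings, pattern: expr}
    if isinstance(pattern, tuple) and isinstance(expr, tuple):
        if len(pattern) != len(expr):
            return None
        for p, e in zip(pattern, expr):
            bindings = match(p, e, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == expr else None

def substitute(template, bindings):
    """Fill pattern variables in template with their bound values."""
    if isinstance(template, tuple):
        return tuple(substitute(t, bindings) for t in template)
    return bindings.get(template, template) if isinstance(template, str) else template

def rewrite(expr, rules):
    """Rewrite subexpressions first, then apply rules at the top until none fire."""
    if isinstance(expr, tuple):
        expr = tuple(rewrite(e, rules) for e in expr)
    for pattern, replacement in rules:
        bindings = match(pattern, expr, {})
        if bindings is not None:
            return rewrite(substitute(replacement, bindings), rules)
    return expr

rules = [
    (("plus", "a_", 0), "a_"),   # if you see a + 0, transform it to a
    (("times", "a_", 1), "a_"),  # if you see a * 1, transform it to a
]
result = rewrite(("plus", ("times", "x", 1), 0), rules)
```

Here `result` comes out as the bare symbol `"x"`: the inner `x * 1` is simplified first, and then the outer `x + 0`.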

I returned to the problem every so often, and kept putting it off. But then when I was working on A New Kind of Science it struck me: if I’m to take the Principle of Computational Equivalence seriously, then there can’t be any fundamentally “higher level of computation”—so AI must be achievable just with the standard ideas of computation that I already know.

And it was this realization that got me started building Wolfram|Alpha. And, yes, what I found is that lots of those very “AI-oriented things”, like natural language understanding, could be done just with “ordinary computation”, without any magic new AI invention. Now, to be fair, part of what was happening was that we were using ideas and methods from A New Kind of Science: we weren’t just engineering everything; we were often searching the computational universe for rules and algorithms to use.

So what about “general AI”? Well, I think at this point that with the tools and understanding we have, we’re in a good position to automate essentially anything we can define. But definition is a more difficult and central issue than we might imagine.

The way I see things at this point is that there’s a lot of computation even near at hand in the computational universe. And it’s powerful computation. As powerful as anything that happens in our brains. But we don’t recognize it as “intelligence” unless it’s aligned with our human goals and purposes.

Ever since I was writing A New Kind of Science, I’ve been fond of quoting the aphorism “the weather has a mind of its own”. It sounds so animistic and pre-scientific. But what the Principle of Computational Equivalence says is that actually, according to the most modern science, it’s true: the fluid dynamics of the weather is the same in its computational sophistication as the electrical processes that go on in our brains.

But is it “intelligent”? When I talk to people about A New Kind of Science, and about AI, I’ll often get asked when I think we’ll achieve “consciousness” in a machine. Life, intelligence, consciousness: they are all concepts that we have a specific example of, here on Earth. But what are they in general? All life on Earth shares RNA and the structure of cell membranes. But surely that’s just because all life we know is part of one connected thread of history; it’s not that such details are fundamental to the very concept of life.

And so it is with intelligence. We have only one example we’re sure of: us humans. (We’re not even sure about animals.) But human intelligence as we experience it is deeply entangled with human civilization, human culture and ultimately also human physiology—even though none of these details are presumably relevant in the abstract definition of intelligence.

We might think about extraterrestrial intelligence. But what the Principle of Computational Equivalence implies is that actually there’s “alien intelligence” all around us. But somehow it’s just not quite aligned with human intelligence. We might look at rule 30, for example, and be able to see that it’s doing sophisticated computation, just like our brains. But somehow it just doesn’t seem to have any “point” to what it’s doing.

We imagine that in doing the things we humans do, we operate with certain goals or purposes. But rule 30, for example, just seems to be doing what it’s doing—just following some definite rule. In the end, though, one realizes we’re not so very different. After all, there are definite laws of nature that govern our brains. So anything we do is at some level just playing out those laws.

Any process can actually be described either in terms of mechanism (“the stone is moving according to Newton’s laws”), or in terms of goals (“the stone is moving so as to minimize potential energy”). The description in terms of mechanism is usually what’s most useful in connecting with science. But the description in terms of goals is usually what’s most useful in connecting with human intelligence.

And this is crucial in thinking about AI. We know we can have computational systems whose operations are as sophisticated as anything. But can we get them to do things that are aligned with human goals and purposes?

In a sense this is what I now view as the key problem of AI: it’s not about achieving underlying computational sophistication, but instead it’s about communicating what we want from this computation.

The Importance of Language

I’ve spent much of my life as a computer language designer—most importantly creating what is now the Wolfram Language. I’d always seen my role as a language designer being to imagine the possible computations people might want to do, then—like a reductionist scientist—trying to “drill down” to find good primitives from which all these computations could be built up. But somehow from A New Kind of Science, and from thinking about AI, I’ve come to think about it a little differently.

Now what I more see myself as doing is making a bridge between our patterns of human thinking, and what the computational universe is capable of. There are all sorts of amazing things that can in principle be done by computation. But what the language does is to provide a way for us humans to express what we want done, or want to achieve—and then to get this actually executed, as automatically as possible.

Language design has to start from what we know and are familiar with. In the Wolfram Language, we name the built-in primitives with English words, leveraging the meanings that those words have acquired. But the Wolfram Language is not like natural language. It’s something more structured, and more powerful. It’s based on the words and concepts that we’re familiar with through the shared corpus of human knowledge. But it gives us a way to build up arbitrarily sophisticated programs that in effect express arbitrarily complex goals.

Yes, the computational universe is capable of remarkable things. But they’re not necessarily things that we humans can describe or relate to. But in building the Wolfram Language my goal is to do the best I can in capturing everything we humans want—and being able to express it in executable computational terms.

When we look at the computational universe, it’s hard not to be struck by the limitations of what we know how to describe or think about. Modern neural nets provide an interesting example. For the ImageIdentify function of the Wolfram Language we’ve trained a neural net to identify thousands of kinds of things in the world. And to cater to our human purposes, what the network ultimately does is to describe what it sees in terms of concepts that we can name with words—tables, chairs, elephants, etc.

But internally what the network is doing is to identify a series of features of any object in the world. Is it green? Is it round? And so on. And what happens as the neural network is trained is that it identifies features it finds useful for distinguishing different kinds of things in the world. But the point is that almost none of these features are ones to which we happen to have assigned words in human language.

Out in the computational universe it’s possible to find what may be incredibly useful ways to describe things. But they’re alien to us humans. They’re not something we know how to express, based on the corpus of knowledge our civilization has developed.

Now of course new concepts are being added to the corpus of human knowledge all the time. Back a century ago, if someone saw a nested pattern they wouldn’t have any way to describe it. But now we’d just say “it’s a fractal”. But the problem is that in the computational universe there’s an infinite collection of “potentially useful concepts”—with which we can never hope to ultimately keep up.

The Analogy in Mathematics

When I wrote A New Kind of Science I viewed it in no small part as an effort to break away from the use of mathematics—at least as a foundation for science. But one of the things I realized is that the ideas in the book also have a lot of implications for pure mathematics itself.

What is mathematics? Well, it’s a study of certain abstract kinds of systems, based on things like numbers and geometry. In a sense it’s exploring a small corner of the computational universe of all possible abstract systems. But still, plenty has been done in mathematics: indeed, the 3 million or so published theorems of mathematics represent perhaps the largest single coherent intellectual structure that our species has built.

Ever since Euclid, people have at least notionally imagined that mathematics starts from certain axioms (say, a+b=b+a, a+0=a, etc.), then builds up derivations of theorems. Why is math hard? The answer is fundamentally rooted in the phenomenon of computational irreducibility—which here is manifest in the fact that there’s no general way to shortcut the series of steps needed to derive a theorem. In other words, it can be arbitrarily hard to get a result in mathematics. But worse than that—as Gödel’s Theorem showed—there can be mathematical statements where there just aren’t any finite ways to prove or disprove them from the axioms. And in such cases, the statements just have to be considered “undecidable”.

And in a sense what’s remarkable about math is that one can usefully do it at all. Because it could be that most mathematical results one cares about would be undecidable. So why doesn’t that happen?

Well, if one considers arbitrary abstract systems it happens a lot. Take a typical cellular automaton—or a Turing machine—and ask whether it’s true that the system, say, always settles down to periodic behavior regardless of its initial state. Even something as simple as that will often be undecidable.

So why doesn’t this happen in mathematics? Maybe there’s something special about the particular axioms used in mathematics. And certainly if one thinks they’re the ones that uniquely describe science and the world there might be a reason for that. But one of the whole points of the book is that actually there’s a whole computational universe of possible rules that can be useful for doing science and describing the world.

And in fact I don’t think there’s anything abstractly special about the particular axioms that have traditionally been used in mathematics: I think they’re just accidents of history.

What about the theorems that people investigate in mathematics? Again, I think there’s a strong historical character to them. For all but the most trivial areas of mathematics, there’s a whole sea of undecidability out there. But somehow mathematics picks the islands where theorems can actually be proved—often particularly priding itself on places close to the sea of undecidability where the proof can only be done with great effort.

I’ve been interested in the whole network of published theorems in mathematics (it’s a thing to curate, like wars in history, or properties of chemicals). And one of the things I’m curious about is whether there’s an inexorable sequence to the mathematics that’s done, or whether, in a sense, random parts are being picked.

And here, I think, there’s a considerable analogy to the kind of thing we were discussing before with language. What is a proof? Basically it’s a way of explaining to someone why something is true. I’ve made all sorts of automated proofs in which there are hundreds of steps, each perfectly verifiable by computer. But—like the innards of a neural net—what’s going on looks alien and not understandable by a human.

For a human to understand, there have to be familiar “conceptual waypoints”. It’s pretty much like with words in languages. If some particular part of a proof has a name (“Smith’s Theorem”), and has a known meaning, then it’s useful to us. But if it’s just a lump of undifferentiated computation, it won’t be meaningful to us.

In pretty much any axiom system, there’s an infinite set of possible theorems. But which ones are “interesting”? That’s really a human question. And basically it’s going to end up being ones with “stories”. In the book I show that for the simple case of basic logic, the theorems that have historically been considered interesting enough to be given names happen to be precisely the ones that are in some sense minimal.

But my guess is that for richer axiom systems pretty much anything that’s going to be considered “interesting” is going to have to be reached from things that are already considered interesting. It’s like building up words or concepts: you don’t get to introduce new ones unless you can directly relate them to existing ones.

In recent years I’ve wondered quite a bit about how inexorable or not progress is in a field like mathematics. Is there just one historical path that can be taken, say from arithmetic to algebra to the higher reaches of modern mathematics? Or is there an infinite diversity of possible paths, with completely different histories for mathematics?

The answer is going to depend on—in a sense—the “structure of metamathematical space”: just what is the network of true theorems that avoid the sea of undecidability? Maybe it’ll be different for different fields of mathematics, and some will be more “inexorable” (so it feels like the math is being “discovered”) than others (where it seems more like the math is arbitrary, and “invented”).

But to me one of the most interesting things is how close—when viewed in these kinds of terms—questions about the nature and character of mathematics end up being to questions about the nature and character of intelligence and AI. And it’s this kind of commonality that makes me realize just how powerful and general the ideas in A New Kind of Science actually are.

When Is There a Science?

There are some areas of science—like physics and astronomy—where the traditional mathematical approach has done quite well. But there are others—like biology, social science and linguistics—where it’s had a lot less to say. And one of the things I’ve long believed is that what’s needed to make progress in these areas is to generalize the kinds of models one’s using, to consider a broader range of what’s out there in the computational universe.

And indeed in the past 15 or so years there’s been increasing success in doing this. And there are lots of biological and social systems, for example, where models have now been constructed using simple programs.

But unlike with mathematical models which can potentially be “solved”, these computational models often show computational irreducibility, and are typically used by doing explicit simulations. This can be perfectly successful for making particular predictions, or for applying the models in technology. But, a bit as with the automated proofs of mathematical theorems, one might still ask, “is this really science?”

Yes, one can simulate what a system does, but does one “understand” it? Well, the problem is that computational irreducibility implies that in some fundamental sense one can’t always “understand” things. There might be no useful “story” that can be told; there may be no “conceptual waypoints”—only lots of detailed computation.

Imagine that one’s trying to make a science of how the brain understands language—one of the big goals of linguistics. Well, perhaps we’ll get an adequate model of the precise rules which determine the firing of neurons or some other low-level representation of the brain. And then we look at the patterns generated in understanding some whole collection of sentences.

Well, what if those patterns look like the behavior of rule 30? Or, closer at hand, the innards of some recurrent neural network? Can we “tell a story” about what’s happening? To do so would basically require that we create some kind of higher-level symbolic representation: something where we effectively have words for core elements of what’s going on.

But computational irreducibility implies that there may ultimately be no way to create such a thing. Yes, it will always be possible to find patches of computational reducibility, where some things can be said. But there won’t be a complete story that can be told. And one might say there won’t be a useful reductionistic piece of science to be done. But that’s just one of the things that happens when one’s dealing with (as the title says) a new kind of science.

Controlling the AIs

People have gotten very worried about AI in recent years. They wonder what’s going to happen when AIs “get much smarter” than us humans. Well, the Principle of Computational Equivalence has one piece of good news: at some fundamental level, AIs will never be “smarter”—they’ll just be able to do computations that are ultimately equivalent to what our brains do, or, for that matter, what all sorts of simple programs do.

As a practical matter, of course, AIs will be able to process larger amounts of data more quickly than actual brains. And no doubt we’ll choose to have them run many aspects of the world for us—from medical devices, to central banks to transportation systems, and much more.

So then it’s important to figure out how we’ll tell them what to do. As soon as we’re making serious use of what’s out there in the computational universe, we’re not going to be able to give a line-by-line description of what the AIs are going to do. Rather, we’re going to have to define goals for the AIs, then let them figure out how best to achieve those goals.

In a sense we’ve already been doing something like this for years in the Wolfram Language. There’s some high-level function that describes something you want to do (“lay out a graph”, “classify data”, etc.). Then it’s up to the language to automatically figure out the best way to do it.

And in the end the real challenge is to find a way to describe goals. Yes, you want to search for cellular automata that will make a “nice carpet pattern”, or a “good edge detector”. But what exactly do those things mean? What you need is a language that a human can use to say as precisely as possible what they mean.
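To make the difficulty concrete, here is a minimal Python sketch of a goal-directed search over the 256 elementary cellular automata. The criterion `is_nice_carpet` (every row mirror-symmetric, and no row ever repeated) is purely a hypothetical stand-in for "nice carpet pattern", and the width and number of steps are arbitrary assumptions. The search itself is trivial; everything interesting is hidden in how the goal gets defined.

```python
# Goal-directed search over all 256 elementary cellular automaton rules.
# The goal below is a hypothetical formalization of "nice carpet pattern".

def step(cells, rule):
    """One update of an elementary cellular automaton on a cyclic row of cells."""
    n = len(cells)
    # The neighborhood (left, center, right) read as a 3-bit number selects
    # which bit of the rule number gives the new cell value.
    return [(rule >> (4 * cells[i - 1] + 2 * cells[i] + cells[(i + 1) % n])) & 1
            for i in range(n)]

def evolve(rule, width=31, steps=15):
    """Rows produced by starting from a single black cell in the center."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = step(row, rule)
        rows.append(row)
    return rows

def is_nice_carpet(rows):
    """Assumed goal: mirror-symmetric rows, and the pattern never repeats a row."""
    symmetric = all(row == row[::-1] for row in rows)
    never_repeats = len({tuple(row) for row in rows}) == len(rows)
    return symmetric and never_repeats

matches = [rule for rule in range(256) if is_nice_carpet(evolve(rule))]
```

With this particular definition, rule 90 (the nested pattern) is among the matches, while rule 30 is rejected for being asymmetric. A different notion of "nice" would need a different `is_nice_carpet`, which is exactly the problem of describing goals.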

It’s really the same problem as I’ve been talking about a lot here. One has to have a way for humans to be able to talk about things they care about. There’s infinite detail out there in the computational universe. But through our civilization and our shared cultural history we’ve come to identify certain concepts that are important to us. And when we describe our goals, it’s in terms of these concepts.

Three hundred years ago people like Leibniz were interested in finding a precise symbolic way to represent the content of human thoughts and human discourse. They were far too early. But now I think we’re finally in a position to actually make this work. In fact, we’ve already gotten a long way with the Wolfram Language in being able to describe real things in the world. And I’m hoping it’ll be possible to construct a fairly complete “symbolic discourse language” that lets us talk about the things we care about.

Right now we write legal contracts in “legalese” as a way to make them slightly more precise than ordinary natural language. But with a symbolic discourse language we’ll be able to write true “smart contracts” that describe in high-level terms what we want to have happen—and then machines will automatically be able to verify or execute the contract.

But what about the AIs? Well, we need to tell them what we generally want them to do. We need to have a contract with them. Or maybe we need to have a constitution for them. And it’ll be written in some kind of symbolic discourse language, that both allows us humans to express what we want, and is executable by the AIs.

There’s lots to say about what should be in an AI Constitution, and how the construction of such things might map onto the political and cultural landscape of the world. But one of the obvious questions is: can the constitution be simple, like Asimov’s Laws of Robotics?

And here what we know from A New Kind of Science tells us the answer: it can’t be. In a sense the constitution is an attempt to sculpt what can happen in the world and what can’t. But computational irreducibility says that there will be an unbounded collection of cases to consider.

For me it’s interesting to see how theoretical ideas like computational irreducibility end up impinging on these very practical—and central—societal issues. Yes, it all started with questions about things like the theory of all possible theories. But in the end it turns into issues that everyone in society is going to end up being concerned about.

There’s an Endless Frontier

Will we reach the end of science? Will we—or our AIs—eventually invent everything there is to be invented?

For mathematics, it’s easy to see that there’s an infinite number of possible theorems one can construct. For science, there’s an infinite number of possible detailed questions to ask. And there’s also an infinite array of possible inventions one can construct.

But the real question is: will there always be interesting new things out there?

Well, computational irreducibility says there will always be new things that need an irreducible amount of computational work to reach from what’s already there. So in a sense there’ll always be “surprises”, that aren’t immediately evident from what’s come before.

But will it just be like an endless array of different weirdly shaped rocks? Or will there be fundamental new features that appear, that we humans consider interesting?

It’s back to the very same issue we’ve encountered several times before: for us humans to find things “interesting” we have to have a conceptual framework that we can use to think about them. Yes, we can identify a “persistent structure” in a cellular automaton. Then maybe we can start talking about “collisions between structures”. But when we just see a whole mess of stuff going on, it’s not going to be “interesting” to us unless we have some higher-level symbolic way to talk about it.

In a sense, then, the rate of “interesting discovery” isn’t going to be limited by our ability to go out into the computational universe and find things. Instead, it’s going to be limited by our ability as humans to build a conceptual framework for what we’re finding.

It’s a bit like what happened in the whole development of what became A New Kind of Science. People had seen related phenomena for centuries if not millennia (distribution of primes, digits of pi, etc.). But without a conceptual framework they just didn’t seem “interesting”, and nothing was built around them. And indeed as I understand more about what’s out there in the computational universe—and even about things I saw long ago there—I gradually build up a conceptual framework that lets me go further.

By the way, it’s worth realizing that inventions work a little differently from discoveries. One can see something new happen in the computational universe, and that might be a discovery. But an invention is about figuring out how something can be achieved in the computational universe.

And—like in patent law—it isn’t really an invention if you just say “look, this does that”. You have to somehow understand a purpose that it’s achieving.

In the past, the focus of the process of invention has tended to be on actually getting something to work (“find the lightbulb filament that works”, etc.). But in the computational universe, the focus shifts to the question of what you want the invention to do. Because once you’ve described the goal, finding a way to achieve it is something that can be automated.

That’s not to say that it will always be easy. In fact, computational irreducibility implies that it can be arbitrarily difficult. Let’s say you know the precise rules by which some chemicals can interact. Can you find a chemical synthesis pathway that will let you get to some particular chemical structure? There may be a way, but computational irreducibility implies that there may be no way to find out how long the pathway may be. And if you haven’t found a pathway you may never be sure if it’s because there isn’t one, or just because you didn’t reach it yet.
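The pathway problem can be sketched in Python as a bounded breadth-first search. The "reactions" here are hypothetical string rewrites, not real chemistry, and the bound `max_steps` is an assumption one has to pick in advance. The key point is in the return value: a result of `None` only means nothing was found within the bound, not that no pathway exists.

```python
from collections import deque

def find_pathway(start, target, rules, max_steps):
    """Breadth-first search for a shortest pathway of at most max_steps rewrites.

    Returns the list of intermediate strings, or None if nothing was found
    within the bound. None is inconclusive: a longer pathway may still exist.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == target:
            return path
        if len(path) - 1 == max_steps:
            continue  # this pathway has used up its step budget
        for lhs, rhs in rules:
            # Apply the rule at every position where its left-hand side occurs.
            position = current.find(lhs)
            while position != -1:
                product = current[:position] + rhs + current[position + len(lhs):]
                if product not in seen:
                    seen.add(product)
                    queue.append(path + [product])
                position = current.find(lhs, position + 1)
    return None

# Hypothetical "reactions" on strings of atoms; these are not real chemistry.
rules = [("A", "AB"), ("B", "BA"), ("AA", "C")]
short = find_pathway("A", "ABA", rules, max_steps=3)
bounded = find_pathway("A", "CB", rules, max_steps=2)
```

Here `short` is a two-step pathway, while `bounded` is `None`. Because the rules can make strings grow without limit, removing the bound would not turn this into a guaranteed yes-or-no answer; the search could simply run forever.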

The Fundamental Theory of Physics

If one thinks about reaching the edge of science, one cannot help but wonder about the fundamental theory of physics. Given everything we’ve seen in the computational universe, is it conceivable that our physical universe could just correspond to one of those programs out there in the computational universe?

Of course, we won’t really know until or unless we find it. But in the years since A New Kind of Science appeared, I’ve become ever more optimistic about the possibilities.

Needless to say, it would be a big change for physics. Today there are basically two major frameworks for thinking about fundamental physics: general relativity and quantum field theory. General relativity is a bit more than 100 years old; quantum field theory maybe 90. And both have achieved spectacular things. But neither has succeeded in delivering us a complete fundamental theory of physics. And if nothing else, I think after all this time, it’s worth trying something new.

But there’s another thing: from actually exploring the computational universe, we have a huge amount of new intuition about what’s possible, even in very simple models. We might have thought that the kind of richness we know exists in physics would require some very elaborate underlying model. But what’s become clear is that that kind of richness can perfectly well emerge even from a very simple underlying model.

What might the underlying model be like? I’m not going to discuss this in great detail here, but suffice it to say that I think the most important thing about the model is that it should have as little as possible built in. We shouldn’t have the hubris to think we know how the universe is constructed; we should just take a general type of model that’s as unstructured as possible, and do what we typically do in the computational universe: just search for a program that does what we want.

My favorite formulation for a model that’s as unstructured as possible is a network: just a collection of nodes with connections between them. It’s perfectly possible to formulate such a model as an algebraic-like structure, and probably many other kinds of things. But we can think of it as a network. And in the way I’ve imagined setting it up, it’s a network that’s somehow “underneath” space and time: every aspect of space and time as we know it must emerge from the actual behavior of the network.
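As one small illustration of how a familiar feature of space might be read off a bare network, here is a Python sketch that estimates an effective dimension by counting how many nodes lie within a given number of connections of a starting node. The grid-shaped test network, the radii and the function names are all assumptions made for the example; a network that actually underlies physics would not be handed to us as a grid.

```python
import math
from collections import deque

def grid_network(size):
    """Hypothetical test network: a size x size square grid with wraparound."""
    network = {}
    for x in range(size):
        for y in range(size):
            network[(x, y)] = [((x + 1) % size, y), ((x - 1) % size, y),
                               (x, (y + 1) % size), (x, (y - 1) % size)]
    return network

def ball_size(network, start, radius):
    """Count the nodes within graph distance radius of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not look beyond the requested radius
        for neighbor in network[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return len(distance)

def estimate_dimension(network, start, r1, r2):
    """Slope of log(ball size) against log(radius) between two radii."""
    n1, n2 = ball_size(network, start, r1), ball_size(network, start, r2)
    return math.log(n2 / n1) / math.log(r2 / r1)

network = grid_network(101)
dimension = estimate_dimension(network, (50, 50), 10, 40)
```

The network itself contains no coordinates that the estimate uses, only connections, yet the growth rate comes out close to 2. The idea is that something like "dimension" does not have to be built in; it can be a property of how the connections happen to be arranged.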

Over the past decade or so there’s been increasing interest in things like loop quantum gravity and spin networks. They’re related to what I’ve been doing in that they also involve networks. And maybe there’s some deeper relationship. But in their usual formulation, they’re much more mathematically elaborate.

From the point of view of the traditional methods of physics, this might seem like a good idea. But with the intuition we have from studying the computational universe—and using it for science and technology—it seems completely unnecessary. Yes, we don’t yet know the fundamental theory of physics. But it seems sensible to start with the simplest hypothesis. And that’s definitely something like a simple network of the kind I’ve studied.

At the outset, it’ll look pretty alien to people (including myself) trained in traditional theoretical physics. But some of what emerges isn’t so alien. A big result I found nearly 20 years ago (that still hasn’t been widely understood) is that when you look at a large enough network of the kind I studied you can show that its averaged behavior follows Einstein’s equations for gravity. In other words, without putting any fancy physics into the underlying model, it ends up automatically emerging. I think it’s pretty exciting.

People ask a lot about quantum mechanics. Yes, my underlying model doesn’t build in quantum mechanics (just as it doesn’t build in general relativity). Now, it’s a little difficult to pin down exactly what the essence of “being quantum mechanical” actually is. But there are some very suggestive signs that my simple networks actually end up showing what amounts to quantum behavior—just like in the physics we know.

OK, so how should one set about actually finding the fundamental theory of physics if it’s out there in the computational universe of possible programs? Well, the obvious thing is to just start searching for it, starting with the simplest programs.

I’ve been doing this—more sporadically than I would like—for the past 15 years or so. And my main discovery so far is that it’s actually quite easy to find programs that aren’t obviously not our universe. There are plenty of programs where space or time are obviously completely different from the way they are in our universe, or there’s some other pathology. But it turns out it’s not so difficult to find candidate universes that aren’t obviously not our universe.

But we’re immediately bitten by computational irreducibility. We can simulate the candidate universe for billions of steps. But we don’t know what it’s going to do—and whether it’s going to grow up to be like our universe, or completely different.

It’s pretty unlikely that in looking at that tiny fragment of the very beginning of a universe we’re going to ever be able to see anything familiar, like a photon. And it’s not at all obvious that we’ll be able to construct any kind of descriptive theory, or effective physics. But in a sense the problem is bizarrely similar to the one we have even in systems like neural networks: there’s computation going on there, but can we identify “conceptual waypoints” from which we can build up a theory that we might understand?

It’s not at all clear our universe has to be understandable at that level, and it’s quite possible that for a very long time we’ll be left in the strange situation of thinking we might have “found our universe” out in the computational universe, but not being sure.

Of course, we might be lucky, and it might be possible to deduce an effective physics, and see that some little program that we found ends up reproducing our whole universe. It would be a remarkable moment for science. But it would immediately raise a host of new questions—like why this universe, and not another?

Box of a Trillion Souls

Right now we humans exist as biological systems. But in the future it’s certainly going to be technologically possible to reproduce all the processes in our brains in some purely digital (computational) form. So insofar as those processes represent “us”, we’re going to be able to be “virtualized” on pretty much any computational substrate. And in this case we might imagine that the whole future of a civilization could wind up in effect as a “box of a trillion souls”.

Inside that box there would be all kinds of computations going on, representing the thoughts and experiences of all those disembodied souls. Those computations would reflect the rich history of our civilization, and all the things that have happened to us. But at some level they wouldn’t be anything special.

It’s perhaps a bit disappointing, but the Principle of Computational Equivalence tells us that ultimately these computations will be no more sophisticated than the ones that go on in all sorts of other systems—even ones with simple rules, and no elaborate history of civilization. Yes, the details will reflect all that history. But in a sense without knowing what to look for—or what to care about—one won’t be able to tell that there’s anything special about it.

OK, but what about for the “souls” themselves? Will one be able to understand their behavior by seeing that they achieve certain purposes? Well, in our current biological existence, we have all sorts of constraints and features that give us goals and purposes. But in a virtualized “uploaded” form, most of these just go away.

I’ve thought quite a bit about how “human” purposes might evolve in such a situation, recognizing, of course, that in virtualized form there’s little difference between human and AI. The disappointing vision is that perhaps the future of our civilization consists in disembodied souls in effect “playing videogames” for the rest of eternity.

But what I’ve slowly realized is that it’s actually quite unrealistic to project our view of goals and purposes from our experience today into that future situation. Imagine talking to someone from a thousand years ago and trying to explain that people in the future would be walking on treadmills every day, or continually sending photographs to their friends. The point is that such activities don’t make sense until the cultural framework around them has developed.

It’s the same story yet again as with trying to characterize what’s interesting or what’s explainable. It relies on the development of a whole network of conceptual waypoints.

Can we imagine what the mathematics of 100 years from now will be like? It depends on concepts we don’t yet know. So similarly if we try to imagine human motivation in the future, it’s going to rely on concepts we don’t know. Our best description from today’s viewpoint might be that those disembodied souls are just “playing videogames”. But to them there might be a whole subtle motivation structure that they could only explain by rewinding all sorts of steps in history and cultural development.

By the way, if we know the fundamental theory of physics then in a sense we can make the virtualization complete, at least in principle: we can just run a simulation of the universe for those disembodied souls. Of course, if that’s what’s happening, then there’s no particular reason it has to be a simulation of our particular universe. It could as well be any universe from out in the computational universe.

Now, as I’ve mentioned, even in any given universe one will never in a sense run out of things to do, or discover. But I suppose I myself at least find it amusing to imagine that at some point those disembodied souls might get bored with just being in a simulated version of our physical universe—and might decide it’s more fun (whatever that means to them) to go out and explore the broader computational universe. Which would mean that in a sense the future of humanity would be an infinite voyage of discovery in the context of none other than A New Kind of Science!

The Economics of the Computational Universe

Long before we have to think about disembodied human souls, we’ll have to confront the issue of what humans should be doing in a world where more and more can be done automatically by AIs. Now in a sense this issue is nothing new: it’s just an extension of the long-running story of technology and automation. But somehow this time it feels different.

And I think the reason is in a sense just that there’s so much out there in the computational universe, that’s so easy to get to. Yes, we can build a machine that automates some particular task. We can even have a general-purpose computer that can be programmed to do a full range of different tasks. But even though these kinds of automation extend what we can do, it still feels like there’s effort that we have to put into them.

But the picture now is different—because in effect what we’re saying is that if we can just define the goal we want to achieve, then everything else will be automatic. All sorts of computation, and, yes, “thinking”, may have to be done, but the idea is that it’s just going to happen, without human effort.

At first, something seems wrong. How could we get all that benefit, without putting in more effort? It’s a bit like asking how nature could manage to make all the complexity it does, even though when we build artifacts, even with great effort, they end up far less complex. The answer, I think, is that it’s mining the computational universe. And it’s exactly the same thing for us: by mining the computational universe, we can achieve essentially an unbounded level of automation.

If we look at the important resources in today’s world, many of them still depend on actual materials. And often these materials are literally mined from the Earth. Of course, there are accidents of geography and geology that determine by whom and where that mining can be done. And in the end there’s a limit (if often very large) to the amount of material that’ll ever be available.

But when it comes to the computational universe, there’s in a sense an inexhaustible supply of material—and it’s accessible to anyone. Yes, there are technical issues about how to “do the mining”, and there’s a whole stack of technology associated with doing it well. But the ultimate resource of the computational universe is a global and infinite one. There’s no scarcity and no reason to be “expensive”. One just has to understand that it’s there, and take advantage of it.

The Path to Computational Thinking

Probably the greatest intellectual shift of the past century has been the one towards the computational way of thinking about things. I’ve often said that if one picks almost any field “X”, from archaeology to zoology, then by now there either is, or soon will be, a field called “computational X”—and it’s going to be the future of the field.

I myself have been deeply involved in trying to enable such computational fields, in particular through the development of the Wolfram Language. But I’ve also been interested in what is essentially the meta problem: how should one teach abstract computational thinking, for example to kids? The Wolfram Language is certainly important as a practical tool. But what about the conceptual, theoretical foundations?

Well, that’s where A New Kind of Science comes in. Because at its core it’s discussing the pure abstract phenomenon of computation, independent of its applications to particular fields or tasks. It’s a bit like with elementary mathematics: there are things to teach and understand just to introduce the ideas of mathematical thinking, independent of their specific applications. And so it is too with the core of A New Kind of Science. There are things to learn about the computational universe that give intuition and introduce patterns of computational thinking—quite independent of detailed applications.

One can think of it as a kind of “pre computer science”, or “pre computational X”. Before one gets into discussing the specifics of particular computational processes, one can just study the simple but pure things one finds in the computational universe.

And, yes, even before kids learn to do arithmetic, it’s perfectly possible for them to fill out something like a cellular automaton coloring book—or to execute for themselves or on a computer a whole range of different simple programs. What does it teach? Well, it certainly teaches the idea that there can be definite rules or algorithms for things—and that if one follows them one can create useful and interesting results. And, yes, it helps that systems like cellular automata make obvious visual patterns, that for example one can even find in nature (say on mollusc shells).
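To make “following definite rules” concrete, here is a minimal sketch of the kind of step one would carry out by hand in such a coloring book, using rule 30. The five-cell starting row is an arbitrary choice for illustration, and CellularAutomaton treats a plain list like this as cyclic.

```wolfram
(* The 8 outcomes of rule 30, for neighborhoods 111, 110, ..., 000 *)
IntegerDigits[30, 2, 8]
(* → {0, 0, 0, 1, 1, 1, 1, 0} *)

(* One step from a single black cell in a row of 5 (cyclic boundaries) *)
CellularAutomaton[30, {0, 0, 1, 0, 0}]
(* → {0, 1, 1, 1, 0} *)
```

Each new cell is found by looking up the three cells above it (left neighbor, itself, right neighbor) in the list of outcomes, which is exactly the lookup a kid does with a printed rule icon.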

As the world becomes more computational, with more things done by AIs and by mining the computational universe, there’s going to be an extremely high value not only in understanding computational thinking, but also in having the kind of intuition that develops from exploring the computational universe and that is, in a sense, the foundation for A New Kind of Science.

What’s Left to Figure Out?

My goal over the decade that I spent writing A New Kind of Science was, as much as possible, to answer all the first round of “obvious questions” about the computational universe. And looking back 15 years later I think that worked out pretty well. Indeed, today, when I wonder about something to do with the computational universe, I find it’s incredibly likely that somewhere in the main text or notes of the book I already said something about it.

But one of the biggest things that’s changed over the past 15 years is that I’ve gradually begun to understand more of the implications of what the book describes. There are lots of specific ideas and discoveries in the book. But in the longer term I think what’s most significant is how they serve as foundations, both practical and conceptual, for a whole range of new things that one can now understand and explore.

But even in terms of the basic science of the computational universe, there are certainly specific results one would still like to get. For example, it would be great to get more evidence for or against the Principle of Computational Equivalence, and its domain of applicability.

Like most general principles in science, the whole epistemological status of the Principle of Computational Equivalence is somewhat complicated. Is it like a mathematical theorem that can be proved? Is it like a law of nature that might (or might not) be true about the universe? Or is it like a definition, say of the very concept of computation? Well, much like, say, the Second Law of Thermodynamics or Evolution by Natural Selection, it’s a combination of these.

But one thing that’s significant is that it’s possible to get concrete evidence for (or against) the Principle of Computational Equivalence. The principle says that even systems with very simple rules should be capable of arbitrarily sophisticated computation—so that in particular they should be able to act as universal computers.

And indeed one of the results of the book is that this is true for one of the simplest possible cellular automata (rule 110). Five years after the book was published I decided to put up a prize for evidence about another case: the simplest conceivably universal Turing machine. And I was very pleased that in just a few months the prize was won, the Turing machine was proved universal, and there was another piece of evidence for the Principle of Computational Equivalence.
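For reference, rule 110 is easy to run directly. This is only a sketch of how one might look at its behavior: the single-cell initial condition and the 200 steps are arbitrary choices, and a picture like this illustrates the rule’s complexity but is not in itself evidence of universality.

```wolfram
(* Evolve rule 110 from a single black cell on a white background *)
evolution = CellularAutomaton[110, {{1}, 0}, 200];

(* Each row is one step; the first row is the initial condition *)
Length[evolution]   (* → 201 *)

(* Draw the evolution, one row per step, from the top down *)
ArrayPlot[evolution]
```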

There’s a lot to do in developing the applications of A New Kind of Science. There are models to be made of all sorts of systems. There’s technology to be found. Art to be created. There’s also a lot to do in understanding the implications.

But it’s important not to forget the pure investigation of the computational universe. In the analogy of mathematics, there are applications to be pursued. But there’s also a “pure mathematics” that’s worth pursuing in its own right. And so it is with the computational universe: there’s a huge amount to explore just at an abstract level. And indeed (as the title of the book implies) there’s enough to define a whole new kind of science: a pure science of the computational universe. And it’s the opening of that new kind of science that I think is the core achievement of A New Kind of Science—and the one of which I am most proud.

For the 10th anniversary of A New Kind of Science, I wrote three posts:

The complete high-resolution A New Kind of Science is now available on the web. There are also a limited number of print copies of the book still available (all individually coded!).

]]> 0
<![CDATA[Machine Learning for Middle Schoolers]]> Thu, 11 May 2017 19:50:25 +0000 Stephen Wolfram Machine Learning for Middle Schoolers(An Elementary Introduction to the Wolfram Language is available in print, as an ebook, and free on the web—as well as in Wolfram Programming Lab in the Wolfram Open Cloud. There’s also now a free online hands-on course based on it.) A year ago I published a book entitled An Elementary Introduction to the Wolfram [...]]]> Machine Learning for Middle Schoolers

(An Elementary Introduction to the Wolfram Language is available in print, as an ebook, and free on the web—as well as in Wolfram Programming Lab in the Wolfram Open Cloud. There’s also now a free online hands-on course based on it.)

A year ago I published a book entitled An Elementary Introduction to the Wolfram Language, as part of my effort to teach computational thinking to the next generation. I just published the second edition of the book, with (among other things) a significantly extended section on modern machine learning.

I originally expected my book’s readers would be high schoolers and up. But it’s actually also found a significant audience among middle schoolers (11- to 14-year-olds). So the question now is: can one teach the core concepts of modern machine learning even to middle schoolers? Well, the interesting thing is that—thanks to the whole technology stack we’ve now got in the Wolfram Language—the answer seems to be “yes”!

Here’s what I did in the book:

After this main text, the book has Exercises, Q&A and Tech Notes.

Exercises, Q&A, Tech Notes

The Backstory

What was my thinking behind this machine learning section? Well, first, it has to fit into the flow of the book—using only concepts that have already been introduced, and, when possible, reinforcing them. So it can talk about images, and real-world data, and graphs, and text—but not functional programming or external data resources.

Chapter list

With modern machine learning, it’s easy to show “wow” examples—like our website from 2015 (based on the Wolfram Language ImageIdentify function). But my goal in the book was also to communicate a bit of the background and intuition of how machine learning works, and where it can be used.

I start off by explaining that machine learning is different from traditional “programming”, because it’s based on learning from examples, rather than on explicitly specifying computational steps. The first thing I discuss is something that doesn’t really need all the fanciness of modern neural-net machine learning: it’s recognizing what languages text fragments are from:

LanguageIdentify[{"thank you", "merci", "dar las gracias", "感謝", "благодарить"}]

Kids (and other people) can sort of imagine (or discuss in a classroom) how something like this might work—looking words up in dictionaries, etc. And I think it’s useful to give a first example that doesn’t seem like “pure magic”. (In reality, LanguageIdentify uses a combination of traditional lookup, and modern machine learning techniques.)

But then I give a much more “magic” example—of ImageIdentify:


I don’t immediately try to explain how it works, but instead go on to something different: sentiment analysis. Kids have lots of fun trying out sentiment analysis. But the real point here is that it shows the idea of making a “classifier”: there are an infinite number of possible inputs, but only (in this case) 3 possible outputs:

Classify["Sentiment", "I'm so excited to be programming"]

Having seen this, we’re ready to give a little more indication of how something like this works. And what I do is to show the function Classify classifying handwritten digits into 0s and 1s. I’m not saying what’s going on inside, but people can get the idea that Classify is given a bunch of examples, and then it’s using those to classify a particular input as being 0 or 1:


OK, but how does it do this? In reality one’s dealing with ideas about attractors—and inputs that lie in the basins of attraction for particular outputs. But in a first approximation, one can say that inputs that are “nearer to”, say, the 0 examples are taken to be 0s, and inputs that are nearer to the 1 examples are taken to be 1s.

People don’t usually have much difficulty with that explanation—unless they start to think too hard about what “nearest” might really mean in this context. But rather than concentrating on that, what I do in the book is just to talk about the case of numbers, where it’s really easy to see what “nearest” means:

Nearest[{10, 20, 30, 40, 50, 60, 70, 80}, 22]

Nearest isn’t the most exciting function to play with: one potentially puts a lot of things in, and then just one “nearest thing” comes out. Still, Nearest is nice because its functionality is pretty easy to understand (and one can have reasonable guesses about algorithms it could use).
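The “nearer to” idea can be sketched as a toy classifier built on Nearest. The labeled numbers below are made up for illustration (they stand in for the 0 and 1 example images), and nearestLabel is a hypothetical helper name; this is not a description of how Classify is actually implemented.

```wolfram
(* Made-up training examples: number -> label *)
examples = {0.1 -> 0, 0.2 -> 0, 0.3 -> 0, 0.8 -> 1, 0.9 -> 1, 1.1 -> 1};

(* Label a new input with the label of its nearest example *)
nearestLabel[x_] := First[Nearest[examples, x]]

nearestLabel[0.22]   (* → 0 *)
nearestLabel[0.7]    (* → 1 *)
```

Given rules of the form element -> label, Nearest returns the label attached to the closest element, so the whole “classifier” is a single lookup.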

Having seen Nearest for numbers, I show Nearest for colors. In the book, I’ve already talked about how colors are represented by red-green-blue triples of numbers, so this isn’t such a stretch—but seeing Nearest operate on colors begins to make it a little more plausible that it could operate on things like images too.
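Since colors are just red-green-blue triples, a rough sketch of the same idea is to run Nearest on the triples themselves. The particular triples below are illustrative choices (not the example from the book), and plain Euclidean distance between triples is an assumption here, only a crude stand-in for perceived color difference.

```wolfram
(* Pure red, green and blue as RGB triples *)
colors = {{1., 0., 0.}, {0., 1., 0.}, {0., 0., 1.}};

(* An orange-ish triple is closest to red *)
Nearest[colors, {1., 0.5, 0.}]
(* → {{1., 0., 0.}} *)
```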


Next I show the case of words. In the book, I’ve already done quite a bit with strings and words. In the main text I don’t talk about the precise definition of “nearness” for words, but again, kids easily get the basic idea. (In a Tech Note, I do talk about EditDistance, another good algorithmic operation that people can think about and try out.)

Nearest[WordList[], "good", 10]
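As a small sketch of one possible notion of “nearness” for words, EditDistance counts the single-character insertions, deletions and substitutions needed to turn one string into another. The word pairs here are arbitrary examples chosen for illustration.

```wolfram
(* One substitution: g -> f *)
EditDistance["good", "food"]       (* → 1 *)

(* Two substitutions and one insertion *)
EditDistance["kitten", "sitting"]  (* → 3 *)
```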

OK, so how does one get from here to something like ImageIdentify? The approach I used is to talk next about OCR and TextRecognize. This doesn’t seem as “magic” as ImageIdentify (and lots of people know about “OCR’ing documents”), but it’s a good place to get a further idea of what ImageIdentify is doing.

Turning a piece of text into an image, and then back into the same text again, doesn’t seem that impressive or useful. But it gets more interesting if one blurs the text out (and, yes, blurring an image is something I talked about earlier in the book):

Table[Blur[Rasterize[Style["hello", 20]], r], {r, 0, 4}]

Given the blurred image, the question is: can one still recognize the text? At this stage in the book I haven’t talked about /@ (Map) or % (last output) yet, so I have to write the code out a bit more verbosely. But the result is:

TextRecognize /@ %
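For comparison, a version that avoids /@ and % is sketched below. It simply nests the earlier Table expression inside TextRecognize; this is a guess at the general shape of the more verbose code, not a quotation from the book. What it returns depends on the font and on the version of TextRecognize.

```wolfram
(* Blur "hello" by radius r = 0..4 and try to read each image back *)
Table[
 TextRecognize[Blur[Rasterize[Style["hello", 20]], r]],
 {r, 0, 4}]
```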

And, yes, when the image isn’t too blurred, TextRecognize can recognize the text, but when the text gets too blurred, it stops being able to. I like this example, because it shows something impressive—but not “magic”—happening. And I think it’s useful to show both where machine learning-based functions succeed, and where they fail. By the way, the result here is different from the one in the book—because the text font is different, and those details matter when one’s on the edge of what can be recognized. (If one was doing this in a class, for example, one might try some different fonts and sizes, and discuss why some survive more blurring than others.)

TextRecognize shows how one can effectively do something like ImageIdentify, but with just 26 letterforms (well, actually, TextRecognize handles many more glyphs than that). But now in the book I show ImageIdentify again, blurring like we did with letters:

Table[Blur[img, r], {r, 0, 22, 2}]  (* img: placeholder for the cheetah photo shown inline in the original post *)

ImageIdentify /@ %

It’s fun to see what it does, but it’s also helpful. Because it gives a sense of the “attractor” around the “cheetah” concept: stay fairly close and the cheetah can still be recognized; go too far away and it can’t. (A slightly tricky issue is that we’re continually producing new, better neural nets for ImageIdentify—so even between when the book was finished and today there’ve been some new nets—and it so happens they give different results for the not-a-cheetah cases. Presumably the new results are “better”, though it’s not clear what that means, given that we don’t have an official right-answer “blurred cheetah” category, and who’s to say whether the blurriest image is more like a whortleberry or a person.)

I won’t go through my whole discussion of machine learning from the book here. Suffice it to say that after discussing explicitly trained functions like TextRecognize and ImageIdentify, I start discussing “unsupervised learning”, and things like clustering in feature space. I think our new FeatureSpacePlot is particularly helpful.

It’s fairly clear what it means to arrange colors:


But then one can “do the same thing” with images of letters. (In the book the code is a little longer, because I haven’t talked about /@ yet.)

FeatureSpacePlot[Rasterize /@ Alphabet[]]

And what’s nice about this is that—as well as being useful in its own right—it also reinforces the idea of how something like TextRecognize might work by finding the “nearest letter” to whatever input it’s given.

My final example in the section uses photographs. FeatureSpacePlot does a nice job of separating images of different kinds of things—again giving an idea of how ImageIdentify might work:


Obviously in just 10 pages in an elementary book I’m not able to give a complete exposition of modern machine learning. But I was pleased to see how many of the core concepts I was able to touch on.

Of course, the fact that this was possible at all depends critically on our whole Wolfram Language technology stack. Whether it’s the very fact that we have machine learning in the language, or the fact that we can seamlessly work with images or text or whatever, or the whole (28-year-old!) Wolfram Notebook system that lets us put all these pieces together—all these pieces are critical to making it possible to bring modern machine learning to people like middle schoolers.

And what I really like is that what one gets to do isn’t toy stuff: one can take what I’m discussing in the book, and immediately apply it in real-world situations. At some level the fact that this works is a reflection of the whole automation emphasis of the Wolfram Language: there’s very sophisticated stuff going on inside, but it’s automated at all levels, so one doesn’t need to be an expert and understand the details to be able to use it—or to get a good intuition about what can work and what can’t.

Going Further

OK, so how would one go further in teaching machine learning?

One early thing might be to start talking about probabilities. ImageIdentify has various possible choices of identifications, but what probabilities does it assign to them?

ImageIdentify[img, All, 10, "Probability"]  (* img: placeholder for the image shown inline in the original post *)

This can lead to a useful discussion about prior probabilities, and about issues like trading off specificity for certainty.

But the big thing to talk about is training. (After all, “machine learning trainer” will surely be a big future career for some of today’s middle schoolers…) And the good news is that in the Wolfram Language environment, it’s possible to make training work with only a modest amount of data.

Let’s get some examples of images of characters from Guardians of the Galaxy by searching the web (we’re using an external search API, so you unfortunately can’t do exactly this on the Open Cloud):

data = AssociationMap[WebImageSearch[#, "Thumbnails"] &, {"Star-Lord", "Gamora", "Groot", "Rocket Raccoon"}]

Now we can use these images as training material to create a classifier:


And, sure enough, it can identify Rocket:


And, yes, it thinks a real raccoon is him too:


How does it do it? Well, let’s look at FeatureSpacePlot:


Some of this looks good—but some looks confusing. Because it’s arranging some of the images not according to who they’re of, but just according to their background colors. And here we begin to see some of the subtlety of machine learning. The actual classifier we built works only because in the training examples for each character there were ones with different backgrounds—so it can figure out that background isn’t the only distinguishing feature.

Actually, there’s another critical thing as well: Classify isn’t starting from scratch in classifying the images. Because it’s already been pre-trained to pick out “good features” that help distinguish real-world images. In fact, it’s actually using everything it learned from the creation of ImageIdentify—and the tens of millions of images it saw in connection with that—to know up front what features it should pay attention to.

It’s a bit weird to see, but internally Classify is characterizing each image as a list of numbers, each associated with a different “feature”:


One can do an extreme version of this in which one insists that each image is reduced to just two numbers—and that’s essentially how FeatureSpacePlot determines where to position an image:


Under the Hood

OK, but what’s going on under the hood? Well, it’s complicated. But in the Wolfram Language it’s easy to see—and getting a look at it helps in terms of getting an intuition about how neural nets really work. So, for example, here’s the low-level Wolfram Language symbolic representation of the neural net that powers ImageIdentify:

net = NetModel["Wolfram ImageIdentify Net for WL 11.1"]

And there’s actually even more: just click and keep drilling down:

net = NetModel["Wolfram ImageIdentify Net for WL 11.1"]

And yes, this is hard to understand—certainly for middle schoolers, and even for professionals. But if we take this whole neural net object, and apply it to a picture of a tiger, it’ll do what ImageIdentify does, and tell us it’s a tiger:


But here’s a neat thing, made possible by a whole stack of functionality in the Wolfram Language: we can actually go “inside” the neural net, to get a sense of what’s happening. As an example, let’s just take the first 3 “layers” of the network, apply them to the tiger, and visualize what comes out:

Image /@ Take[net, 3][tiger]  (* tiger: placeholder for the tiger photo shown inline in the original post *)

Basically what’s happening is that the network has made lots of copies of the original image, and then processed each of them to pick out a different aspect of the image. (What’s going on actually seems to be remarkably similar to the first few levels of visual processing in the brain.)

What if we go deeper into the network? Here’s what happens at layer 10. The images are more abstracted, and presumably pick out higher-level features:

Image /@ Take[Take[net, 10][tiger], 20]

Go to level 20, and the network is “thinking about” lots of little images:

ImageAssemble[Partition[Image /@ Take[net, 20][tiger], 30]]

But by level 28, it’s beginning to “come to some conclusions”, with only a few of its possible channels of activity “lighting up”:

ImageAdjust[ImageAssemble[Partition[Image /@ Take[net, 28][tiger], 50]]]

Finally, by level 31, all that’s left is an array of numbers, with a few peaks visible:

ListLinePlot[Take[net, 31][tiger]]

And after applying the very last layer of the network (a “softmax” layer), only a couple of peaks are left:

ListLinePlot[net[tiger, None], PlotRange -> All]
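A softmax layer itself is simple to write down: it exponentiates each number and divides by the total, so that the outputs are positive and add up to 1, with the largest inputs dominating. A minimal sketch, where softmax is a helper defined here for illustration and the input numbers are made up:

```wolfram
(* Exponentiate, then normalize so the results sum to 1 *)
softmax[v_List] := Exp[v]/Total[Exp[v]]

p = softmax[{1., 2., 5.}];

Total[p]          (* 1, up to rounding *)
Ordering[p, -1]   (* → {3}: the largest input gets the largest probability *)
```

Because of the exponential, modest differences between the inputs turn into large differences between the outputs, which is why only a couple of peaks survive.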

And the highest one is exactly the one that corresponds to the concept of “tiger”:


I’m not imagining that middle schoolers will follow all these details (and no, nobody should be learning neural net layer types like they learn parts of the water cycle). But I think it’s really useful to see “inside” ImageIdentify, and get even a rough sense of how it works. To someone like me it still seems a little like magic that it all comes together as it does. But what’s great is that now with our latest Wolfram Language tools one can easily look inside, and start getting an intuition about what’s going on.

The Process of Training

The idea of the Wolfram Language Classify function is to do machine learning at the highest possible level—as automatically as possible, and building on as much pre-training as possible. But if one wants to get a more complete feeling for what machine learning is like, it’s useful to see what happens if one instead tries to just train a neural net from scratch.

There is an immediate practical issue though: to get a neural net, starting from scratch, to actually do anything useful, one typically has to give it a very large amount of training data—which is hard to collect and wrangle. But the good news here is that with the recent release of the Wolfram Data Repository we have a growing collection of ready-to-use training sets immediately available for use in the Wolfram Language.

Like here’s the classic MNIST handwritten digit training set, with its 60,000 training examples:


One thing one can do with a training set like this is just feed a random sample of it into Classify. And sure enough this gives one a classifier function that’s essentially a simple version of TextRecognize for handwritten digits:

c = Classify[RandomSample[ResourceData["MNIST"], 1000]]

And even with just 1000 training examples, it does pretty well:


And, yes, we can use FeatureSpacePlot to see how the different digits tend to separate in feature space:

FeatureSpacePlot[First /@ RandomSample[ResourceData["MNIST"], 250]]

But, OK, what if we want to actually train a neural net from scratch, with none of the fancy automation of Classify? Well, first we have to set up a raw neural net. And conveniently, the Wolfram Language has a bunch of classic neural nets built in. Here’s one called LeNet:

lenet = NetModel["LeNet"]

It’s much simpler than the ImageIdentify net, but it’s still pretty complicated. But we don’t have to understand what’s inside it to start training it. Instead, in the Wolfram Language, we can just use NetTrain (which, needless to say, automatically applies all the latest GPU tricks and so on):

net = NetTrain[lenet, RandomSample[ResourceData["MNIST"], 1000]]

It’s pretty neat to watch the training happening, and to see the orange line of the neural net’s error rate for fitting the examples keep going down. After about 20 seconds, NetTrain decides it’s gone far enough, and generates a finally trained net—which works pretty well:


If you stop the training early, it won’t do quite so well:

net = NetTrain[lenet, RandomSample[ResourceData["MNIST"], 1000], MaxTrainingRounds -> 1]


In the professional world of machine learning, there’s a whole art and science of figuring out the best parameters for training. But with what we’ve got now in the Wolfram Language, nothing is stopping a middle schooler from doing their own experiments, visualizing and analyzing the results, and getting as good an intuition as anyone.

What Are Neural Nets Made Of?

OK, so if we want to really get down to the lowest level, we have to talk about what neural nets are made of. I’m not sure how much of this is middle-school stuff—but as soon as one knows about graphs of functions, one can already explain quite a bit. Because, you see, the “layers” in a neural net are actually just functions, that take numbers in, and put numbers out.

Take layer 2 of LeNet. It’s essentially just a simple Ramp function, which we can immediately plot (and, yes, it looks like a ramp):

Plot[Ramp[x], {x, -1, 1}]

Neural nets don’t typically just deal with individual numbers, though. They deal with arrays (or “tensors”) of numbers—represented in the Wolfram Language as nested lists. And each layer takes an array of numbers in, and puts an array of numbers out. Here’s a typical single layer:

layer = NetInitialize[LinearLayer[4, "Input" -> 2]]

This particular layer is set up to take 2 numbers as input, and put 4 numbers out:

layer[{2, 3}]

It might seem to be doing something quite “random”, and actually it is. Because the actual function the layer is implementing is determined by yet another array of numbers, or “weights”—which NetInitialize here just sets randomly. Here’s what it set them to in this particular case:

NetExtract[layer, "Weights"]
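One can check by hand what the layer is doing with these weights. A linear layer also has a second array, its "Biases", and the output should just be the weight matrix times the input, plus the biases. Here is a small sketch of that check (the names w, b and manual are just for illustration):

```wolfram
layer = NetInitialize[LinearLayer[4, "Input" -> 2]];

(* Normal turns any packed array form into ordinary lists; on plain lists it does nothing *)
w = Normal[NetExtract[layer, "Weights"]];  (* 4x2 matrix *)
b = Normal[NetExtract[layer, "Biases"]];   (* length-4 vector *)

(* matrix times input, plus biases, done by hand *)
manual = w . {2, 3} + b;

(* compare with what the layer itself returns; agreement is up to machine rounding *)
{manual, layer[{2, 3}]}
```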

Why is any of this useful? Well, the crucial point is that what NetTrain does is to progressively tweak the weights in each layer of a neural network to try to get the overall behavior of the net to match the training examples you gave.

There are two immediate issues, though. First, the structure of the network has to be such that it’s possible to get the behavior you want by using some appropriate set of weights. And second, there has to be some way to progressively tweak weights so as to get to appropriate values.
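To get a feeling for what “progressively tweaking weights” can mean, here is a toy sketch with a single weight, written in ordinary Wolfram Language rather than with the neural net framework. It uses plain gradient descent, which is a simpler relative of the methods NetTrain actually uses, and the data, the step size of 0.01 and the 50 steps are all arbitrary choices for illustration.

```wolfram
(* toy training examples: inputs x paired with targets y = 2 x *)
examples = {{1., 2.}, {2., 4.}, {3., 6.}};

(* total squared error of the one-weight "net" x -> w x *)
loss[w_] := Total[(w #[[1]] - #[[2]])^2 & /@ examples];

(* derivative of the loss with respect to w *)
grad[w_] := Total[2 (w #[[1]] - #[[2]]) #[[1]] & /@ examples];

(* start from w = 0 and repeatedly take a small step downhill *)
ws = NestList[# - 0.01 grad[#] &, 0., 50];

(* final weight, and the error that remains *)
{Last[ws], loss[Last[ws]]}
```

With these particular numbers the weight ends up very close to 2, the value that fits the examples exactly. ListLinePlot[ws] shows how it gets there.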

Well, it turns out a single LinearLayer like the one above can’t do anything interesting. Here’s a contour plot of (the first element of) its output, as a function of its two inputs. And as the name LinearLayer might suggest, we always get something flat and linear out:

ContourPlot[First[layer[{x, y}]], {x, -1, 1}, {y, -1, 1}]

But here’s the big discovery that makes neural nets useful: if we chain together several layers, it’s easy to get something much more complicated. (And, yes, in the Wolfram Language outputs from one layer get knitted into inputs to the next layer in a nice, automatic way.) Here’s an example with 4 layers—two linear layers and two ramps:

net = NetInitialize[NetChain[{LinearLayer[10], Ramp, LinearLayer[1], Ramp}, "Input" -> 2]]

And now when we plot the function, it’s more complicated:

ContourPlot[net[{x, y}], {x, -1, 1}, {y, -1, 1}]

We can actually also look at an even simpler case—of a neural net with 3 layers, and just one number as final output. (For technical reasons, it’s nice to still have 2 inputs, though we’ll always set one of those inputs to the constant value of 1.)

net = NetInitialize[NetChain[{LinearLayer[3], Ramp, LinearLayer[1]}, "Input" -> 2]]

Here’s what this particular network does as a function of its input:

Plot[net[{x, 1}], {x, -2, 2}]

Inside the network, there’s an array of 3 numbers being generated—and it turns out that “3” causes there to be at most 3 (+1) distinct linear parts in the function. Increase the 3 to 100, and things can get more complicated:

net = NetInitialize[NetChain[{LinearLayer[100], Ramp, LinearLayer[1]}, "Input" -> 2]]
Plot[net[{x, 1}], {x, -2, 2}]
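One can see where the distinct linear parts come from by going back to the 3-unit case and looking at the first layer’s weights. Each of the 3 units feeds a Ramp, and the function can only change slope at the points where a unit’s value crosses zero. The sketch below computes those points (net3, w1, b1 and kinks are names introduced here; some of the points may fall outside the plotted range):

```wolfram
net3 = NetInitialize[NetChain[{LinearLayer[3], Ramp, LinearLayer[1]}, "Input" -> 2]];

(* weights (3x2) and biases (length 3) of the first linear layer *)
w1 = Normal[NetExtract[net3, {1, "Weights"}]];
b1 = Normal[NetExtract[net3, {1, "Biases"}]];

(* with input {x, 1}, unit i computes w1[[i, 1]] x + w1[[i, 2]] + b1[[i]];
   its Ramp switches on or off where that expression crosses zero *)
kinks = Sort[-(w1[[All, 2]] + b1)/w1[[All, 1]]]
```

Three such points cut the line into at most four pieces, which is the “3 (+1)” above.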

Now, the point is that this is in a sense a “random function”, determined by the particular random weights picked by NetInitialize. If we run NetInitialize a bunch of times, we’ll get a bunch of different results:

Table[With[{net = NetInitialize[NetChain[{LinearLayer[100], Ramp, LinearLayer[1]}, "Input" -> 2]]}, Plot[net[{x, 1}], {x, -2, 2}]], 8]

But the big question is: can we find an instance of this “random function” that’s useful for whatever we’re trying to do? Or, more particularly, can we find a random function that reproduces particular training examples?

Let’s imagine that our training examples give the values of the function at the dots in this plot (by the way, the setup here is more like machine learning in the style of Predict than Classify):

ListLinePlot[Table[Mod[n^2, 5], {n, 15}], Mesh -> All]

Here’s an instance of our network again:

net = NetInitialize[NetChain[{LinearLayer[100], Ramp, LinearLayer[1]}, "Input" -> 2]]

And here’s a plot of what it initially does over the range of the training examples (and, yes, it’s obviously completely wrong):

Plot[net[{n, 1}], {n, 1, 15}]

Well, let’s just try training our network on our training data using NetTrain:

net = NetTrain[net, data = Table[{n, 1} -> {Mod[n^2, 5]}, {n, 15}]]

After about 20 seconds of training on my computer, there’s some vague sign that we’re beginning to reproduce at least some aspects of the original training data. But it’s at best slow going—and it’s not clear what’s eventually going to happen.

Plot[net[{n, 1}], {n, 1, 15}]

It’s a frontier question in neural net research just what structure of net will work best in any particular case (yes, we’re working on this question). But here let’s just try a slightly more complicated network:

net = NetInitialize[NetChain[{LinearLayer[100], Tanh, LinearLayer[10], Ramp, LinearLayer[1]}, "Input" -> 2]]

Random instances of this network don’t give very different results from our last network (though the presence of that Tanh layer makes the functions a bit smoother):

Tanh layer

But now let’s do some training (data was defined above):

net = NetTrain[net, data]

And here’s the result—which is surprisingly decent:

Plot[net[{n, 1}], {n, 1, 15}]

In fact, if we compare it to our original training data we see that the training values lie right on the function that the neural net produced:

Show[Plot[net[{n, 1}], {n, 1, 15}], ListPlot[Table[Mod[n^2, 5], {n, 1, 15}], PlotStyle -> Red]]
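The same comparison can be made numerically, by looking at how far the net’s output is from each training value. This is just a sketch: it retrains the same network from a fresh random start, so the exact numbers will differ from run to run, and residuals is a name introduced here.

```wolfram
data = Table[{n, 1} -> {Mod[n^2, 5]}, {n, 15}];

(* same structure as above; NetTrain initializes the weights itself *)
net = NetTrain[
   NetChain[{LinearLayer[100], Tanh, LinearLayer[10], Ramp, LinearLayer[1]}, "Input" -> 2],
   data];

(* the net returns a list with one number, so take its first element *)
residuals = Table[First[net[{n, 1}]] - Mod[n^2, 5], {n, 15}];

(* worst-case miss over the 15 training points *)
Max[Abs[residuals]]
```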

Here’s what happened during the training process. The neural net effectively “tried out” a bunch of different possibilities, finally settling on the result here:

Machine learning animation

In what sense is the result “correct”? Well, it fits the training examples, and that’s really all we can ask. Because that’s all the input we gave. How it “interpolates” between the training examples is really its own business.  We’d like it to learn to “generalize” from the data it’s given—but it can’t really deduce much about the whole distribution of the data from the few points it’s being given here, so the kind of smooth interpolation it’s doing is as good as anything.

Outside the range of the training values, the neural net does what seem to be fairly random things—but again, there’s no “right answer” so one can’t really fault it:

Plot[net[{n, 1}], {n, -5, 25}]

But the fact that with the arbitrariness and messiness of our original neural net, we were able to successfully train it at all is quite remarkable. Neural nets of pretty much the type we’re talking about here had actually been studied for more than 60 years—but until the modern “deep learning revolution” nobody knew that it was going to be practical to train them for real problems.

But now—particularly with everything we have now in the Wolfram Language—it’s easy for anyone to do this.

So Much to Explore

Modern machine learning is very new—so even many of the obvious experiments haven’t been tried yet. But with our whole Wolfram Language setup there’s a lot that even middle schoolers can do. For example (and I admit I’m curious about this as I write this post): one can ask just how much something like the tiny neural net we were studying can learn.

Here’s a plot of the lengths of the first 60 Roman numerals:

ListLinePlot[Table[StringLength[RomanNumeral[n]], {n, 60}]]

After a small amount of training, here’s what the network managed to reproduce:

NetTrain[net, Table[{n, 1} -> {StringLength[RomanNumeral[n]]}, {n, 60}]];
Plot[%[{n, 1}], {n, 1, 60}]

And one might think that maybe this is the best it’ll ever do. But I was curious if it could eventually do better—and so I just let it train for 2 minutes on my computer. And here’s the considerably better result that came out:

NetTrain[net, Table[{n, 1} -> {StringLength[RomanNumeral[n]]}, {n, 60}], MaxTrainingRounds -> Quantity[2, "Minutes"]];

Plot[%[{n, 1}], {n, 1, 60}]

I think I can see why this particular thing works the way it does.  But seeing it suggests all sorts of new questions to pursue. But to me the most exciting point is the overarching one of just how wide open this territory is—and how easy it is now to explore it.

Yes, there are plenty of technical details—some fundamental, some superficial. But transcending all of these, there’s intuition to be developed. And that’s something that can perfectly well start with the middle schoolers…

]]> 1
<![CDATA[Launching the Wolfram Data Repository: Data Publishing that Really Works]]> Thu, 20 Apr 2017 16:04:20 +0000 Stephen Wolfram launching-the-wolfram-data-repositoryAfter a Decade, It’s Finally Here! I’m pleased to announce that as of today, the Wolfram Data Repository is officially launched! It’s been a long road. I actually initiated the project a decade ago—but it’s only now, with all sorts of innovations in the Wolfram Language and its symbolic ways of representing data, as well [...]]]> launching-the-wolfram-data-repository

After a Decade, It’s Finally Here!

I’m pleased to announce that as of today, the Wolfram Data Repository is officially launched! It’s been a long road. I actually initiated the project a decade ago—but it’s only now, with all sorts of innovations in the Wolfram Language and its symbolic ways of representing data, as well as with the arrival of the Wolfram Cloud, that all the pieces are finally in place to make a true computable data repository that works the way I think it should.

Wolfram Data Repository

It’s happened to me a zillion times: I’m reading a paper or something, and I come across an interesting table or plot. And I think to myself: “I’d really like to get the data behind that, to try some things out”. But how can I get the data?

If I’m lucky there’ll be a link somewhere in the paper. But it’s usually a frustrating experience to follow it. Because even if there’s data there (and often there actually isn’t), it’s almost never in a form where one can readily use it. It’s usually quite raw—and often hard to decode, and perhaps even intertwined with text. And even if I can see the data I want, I almost always find myself threading my way through footnotes to figure out what’s going on with it. And in the end I usually just decide it’s too much trouble to actually pull out the data I want.

And I suppose one might think that this is just par for the course in working with data. But in modern times, we have a great counterexample: the Wolfram Language. It’s been one of my goals with the Wolfram Language to build into it as much data as possible—and make all of that data immediately usable and computable. And I have to say that it’s worked out great. Whether you need the mass of Jupiter, or the masses of all known exoplanets, or Alan Turing’s date of birth—or a trillion much more obscure things—you just ask for them in the language, and you’ll get them in a form where you can immediately compute with them.

Here’s the mass of Jupiter (and, yes, one can use “Wolfram|Alpha-style” natural language to ask for it):


Dividing it by the mass of the Earth immediately works:

Entity["Planet", "Jupiter"]["Mass"]/Entity["Planet", "Earth"]["Mass"]

Here’s a histogram of the masses of known exoplanets, divided by the mass of Jupiter:

Histogram[EntityClass["Exoplanet", All]["Mass"]/Entity["Planet", "Jupiter"]["Mass"]]

And here, for good measure, is Alan Turing’s date of birth, in an immediately computable form:

Alan Turing(person)["BirthDate"]

Of course, it’s taken many years and lots of work to make everything this smooth, and to get to the point where all those thousands of different kinds of data are fully integrated into the Wolfram Language—and Wolfram|Alpha.

But what about other data—say data from some new study or experiment? It’s easy to upload it someplace in some raw form. But the challenge is to make the data actually useful.

And that’s where the new Wolfram Data Repository comes in. Its idea is to leverage everything we’ve done with the Wolfram Language—and Wolfram|Alpha, and the Wolfram Cloud—to make it as easy as possible to make data as broadly usable and computable as possible.

There are many parts to this. But let me state our basic goal. I want it to be the case that if someone is dealing with data they understand well, then they should be able to prepare that data for the Wolfram Data Repository in as little as 30 minutes—and then have that data be something that other people can readily use and compute with.

It’s important to set expectations. Making data fully computable—to the standard of what’s built into the Wolfram Language—is extremely hard. But there’s a lower standard that still makes data extremely useful for many purposes. And what’s important about the Wolfram Data Repository (and the technology around it) is it now makes that standard easy to achieve—with the result that it’s now practical to publish data in a form that can really be used by many people.

An Example

Each item published in the Wolfram Data Repository gets its own webpage. Here, for example, is the page for a public dataset about meteorite landings:

Meteorite Landings

At the top is some general information about the dataset. But then there’s a piece of a Wolfram Notebook illustrating how to use the dataset in the Wolfram Language. And by looking at this notebook, one can start to see some of the real power of the Wolfram Data Repository.

One thing to notice is that it’s very easy to get the data. All you do is ask for ResourceData["Meteorite Landings"]. And whether you’re using the Wolfram Language on a desktop or in the cloud, this will give you a nice symbolic representation of data about 45716 meteorite landings (and, yes, the data is carefully cached so this is as fast as possible, etc.):

And then the important thing is that you can immediately start to do whatever computation you want on that dataset. As an example, this takes the "Coordinates" element from all rows, then takes a random sample of 1000 results, and geo plots them:

GeoListPlot[RandomSample[ResourceData["Meteorite Landings"][All, "Coordinates"], 1000]]

Many things have to come together for this to work. First, the data has to be reliably accessible—as it is in the Wolfram Cloud. Second, one has to be able to tell where the coordinates are—which is easy if one can see the dataset in a Wolfram Notebook. And finally, the coordinates have to be in a form in which they can immediately be computed with.

This last point is critical. Just storing the textual form of a coordinate—as one might in something like a spreadsheet—isn’t good enough. One has to have it in a computable form. And needless to say, the Wolfram Language has such a form for geo coordinates: the symbolic construct GeoPosition[{lat, lon}].
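To see what “computable” buys you, here is a tiny sketch with two hand-entered positions (the latitude and longitude values are purely illustrative and do not come from the dataset):

```wolfram
(* two positions given as {latitude, longitude} in degrees *)
p1 = GeoPosition[{52.2, 0.12}];
p2 = GeoPosition[{40.1, -88.2}];

(* because these are symbolic geo positions, geo functions accept them directly *)
GeoDistance[p1, p2]

GeoListPlot[{p1, p2}]
```

With coordinates stored only as text, one would first have to parse them and decide what convention they follow before doing anything like this.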

There are other things one can immediately see from the meteorites dataset too. Like notice there’s a "Mass" column. And because we’re using the Wolfram Language, masses don’t have to just be numbers; they can be symbolic Quantity objects that correctly include their units. There’s also a "Year" column in the data, and again, each year is represented by an actual, computable, symbolic DateObject construct.
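Here is a small sketch of what those symbolic forms allow, using made-up values rather than rows from the dataset:

```wolfram
(* masses carry their units, so mixed units combine correctly *)
m = Quantity[21, "Grams"] + Quantity[1.5, "Kilograms"];
UnitConvert[m, "Kilograms"]

(* a year as a symbolic date, from which one can compute directly *)
y = DateObject[{1880}];
DateValue[y, "Year"]
```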

There are lots of different kinds of possible data, and one needs a sophisticated data ontology to handle them. But that’s exactly what we’ve built for the Wolfram Language, and for Wolfram|Alpha, and it’s now been very thoroughly tested. It involves 10,000 kinds of units, and tens of millions of “core entities”, like cities and chemicals and so on. We call it the Wolfram Data Framework (WDF)—and it’s one of the things that makes the Wolfram Data Repository possible.

What’s in the Wolfram Data Repository So Far?

Today is the initial launch of the Wolfram Data Repository, and to get ready for this launch we’ve been adding sample content to the repository for several months. Some of what we’ve added are “obvious” famous datasets. Some are datasets that we found for some reason interesting, or curious. And some are datasets that we created ourselves—and in some cases that I created myself, for example, in the course of writing my book A New Kind of Science.

WDR Home Page Categories

There’s plenty already in the Wolfram Data Repository that’ll immediately be useful in a variety of applications. But in a sense what’s there now is just an example of what can be there—and the kinds of things we hope and expect will be contributed by many other people and organizations.

The fact that the Wolfram Data Repository is built on top of our Wolfram Language technology stack immediately gives it great generality—and means that it can handle data of any kind. It’s not just tables of numerical data as one might have in a spreadsheet or simple database. It’s data of any type and structure, in any possible combination or arrangement.

Home Page Types

There are time series:

Take[ResourceData["US Federal Outlays by Agency"], 3]

There are training sets for machine learning:

RandomSample[ResourceData["MNIST"], 30]

There’s gridded data:

ResourceData["GMM-3 Mars Gravity Map"]

There’s the text of many books:

WordCloud[ResourceData["On the Origin of Species"]]

There’s geospatial data:

ResourceData["Global Air Navigation Aids"]

GeoHistogram[ResourceData["Global Air Navigation Aids"][All, "Geoposition"], 50, GeoRange -> Entity["Country", "UnitedStates"]]

Many of the data resources currently in the Wolfram Data Repository are quite tabular in nature. But unlike traditional spreadsheets or tables in databases, they’re not restricted to having just one level of rows and columns—because they’re represented using symbolic Wolfram Language Dataset constructs, which can handle arbitrarily ragged structures, of any depth.

ResourceData["Sample Data: Solar System Planets and Moons"]
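To make “ragged” concrete, here is a hand-built miniature in the same spirit (the structure and names are made up for illustration and are not the actual layout of the sample resource):

```wolfram
(* each planet has a list of moons, and the lists have different lengths *)
ds = Dataset[<|
    "Earth" -> <|"Moons" -> {"Moon"}|>,
    "Mars" -> <|"Moons" -> {"Phobos", "Deimos"}|>,
    "Venus" -> <|"Moons" -> {}|>|>];

(* number of moons for each planet *)
ds[All, "Moons", Length]
```

A flat table would need either a fixed number of moon columns or a separate table of moons. Here the nesting is simply part of the data.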

But what about data that normally lives in relational or graph databases? Well, there’s a construct called EntityStore that was recently added to the Wolfram Language. We’ve actually been using something like it for years inside Wolfram|Alpha. But what EntityStore now does is to let you set up arbitrary networks of entities, properties and values, right in the Wolfram Language. It typically takes more curation than setting up something like a Dataset—but the result is a very convenient representation of knowledge, on which all the same functions can be used as with built-in Wolfram Language knowledge.

Here’s a data resource that’s an entity store:

ResourceData["Museum of Modern Art Holdings and Artists"]

This adds the entity store to the list of entity stores to be used automatically:

PrependTo[$EntityStores, %];

Now here are 5 random entities of type "MoMAArtist" from the entity store:

RandomEntity["MoMAArtist", 5]

For each artist, one can extract a dataset of values:

Entity["MoMAArtist", "Otto Mühl"]["Dataset"]

This queries the entity store to find artists with the most recent birth dates:

EntityList[EntityClass["MoMAArtist", "BirthDate" -> TakeLargest[5]]]
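It can help to see how small an entity store can be. The following sketch sets one up by hand. The "Painter" type, its two entities and the "BirthYear" property are all hypothetical, chosen only to show the shape of the construct:

```wolfram
(* a hypothetical entity type with two made-up entities *)
store = EntityStore["Painter" -> <|
     "Entities" -> <|
       "A" -> <|"BirthYear" -> 1900|>,
       "B" -> <|"BirthYear" -> 1950|>|>|>];

(* make the store available to the usual entity functions *)
PrependTo[$EntityStores, store];

(* look up one property of one entity *)
Entity["Painter", "B"]["BirthYear"]

(* or the same property for every entity of the type *)
EntityValue["Painter", "BirthYear"]
```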

How It Works

The Wolfram Data Repository is built on top of a new, very general thing in the Wolfram Language called the “resource system”. (Yes, expect all sorts of other repository and marketplace-like things to be rolling out shortly.)

The resource system has “resource objects”, that are stored in the cloud (using CloudObject), then automatically downloaded and cached on the desktop if necessary (using LocalObject). Each ResourceObject contains both primary content and metadata. For the Wolfram Data Repository, the primary content is data, which you can access using ResourceData.
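As a rough sketch of the split between metadata and primary content, one can pull both out of a resource object. The property names "Name" and "Description" used here are assumed to be among the metadata the resource carries:

```wolfram
ro = ResourceObject["Meteorite Landings"];

(* metadata stored with the resource *)
ro["Name"]
ro["Description"]

(* the primary content, downloaded and cached on first use *)
data = ResourceData[ro];
```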


The Wolfram Data Repository that we’re launching today is a public resource, that lives in the public Wolfram Cloud. But we’re also going to be rolling out private Wolfram Data Repositories, that can be run in Enterprise Private Clouds—and indeed inside our own company we’ve already set up several private data repositories, that contain internal data for our company.

There’s no limit in principle on the size of the data that can be stored in the Wolfram Data Repository. But for now, the “plumbing” is optimized for data that’s at most a few gigabytes in size—and indeed the existing examples in the Wolfram Data Repository make it clear that an awful lot of useful data never even gets bigger than a few megabytes in size.

The Wolfram Data Repository is primarily intended for the case of definitive data that’s not continually changing. For data that’s constantly flowing in—say from IoT devices—we released last year the Wolfram Data Drop. Both Data Repository and Data Drop are deeply integrated into the Wolfram Language, and through our resource system, there’ll be some variants and combinations coming in the future.

Delivering Data to the World

Our goal with the Wolfram Data Repository is to provide a central place for data from all sorts of organizations to live—in such a way that it can readily be found and used.

Each entry in the Wolfram Data Repository has an associated webpage, which describes the data it contains, and gives examples that can immediately be run in the Wolfram Cloud (or downloaded to the desktop).

Open in Cloud

On the webpage for each repository entry (and in the ResourceObject that represents it), there’s also metadata, for indexing and searching—including standard Dublin Core bibliographic data. To make it easier to refer to the Wolfram Data Repository entries, every entry also has a unique DOI.

The way we’re managing the Wolfram Data Repository, every entry also has a unique readable registered name, that’s used both for the URL of its webpage, and for the specification of the ResourceObject that represents the entry.

It’s extremely easy to use data from the Wolfram Data Repository inside a Wolfram Notebook, or indeed in any Wolfram Language program. The data is ultimately stored in the Wolfram Cloud. But you can always download it—for example right from the webpage for any repository entry.

The richest and most useful form in which to get the data is the Wolfram Language or the Wolfram Data Framework (WDF)—either in ASCII or in binary. But we’re also setting it up so you can download in other formats, like JSON (and in suitable cases CSV, TXT, PNG, etc.) just by pressing a button.

Data Downloads

Of course, even formats like JSON don’t have native ways to represent entities, or quantities with units, or dates, or geo positions—or all those other things that WDF and the Wolfram Data Repository deal with. So if you really want to handle data in its full form, it’s much better to work directly in the Wolfram Language. But then with the Wolfram Language you can always process some slice of the data into some simpler form that does make sense to export in a lower-level format.
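As a sketch of that last step, here are a couple of made-up rows (standing in for a slice of a real resource) being flattened into plain numbers and strings before export. The field names are illustrative only:

```wolfram
(* made-up rows standing in for a slice of a data resource *)
rows = {
   <|"name" -> "A", "mass" -> Quantity[21, "Grams"]|>,
   <|"name" -> "B", "mass" -> Quantity[1.5, "Kilograms"]|>};

(* replace each Quantity by a plain number in a stated unit *)
simple = Map[<|"name" -> #name, "massGrams" -> QuantityMagnitude[#mass, "Grams"]|> &, rows];

(* now everything is something JSON can represent *)
ExportString[simple, "RawJSON"]
```

The unit has to move into the field name (or into documentation), which is exactly the kind of information that WDF keeps attached to the value itself.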

How Contributions Work

The Wolfram Data Repository as we’re releasing it today is a platform for publishing data to the world. And to get it started, we’ve put in about 500 sample entries. But starting today we’re accepting contributions from anyone. We’re going to review and vet contributions much like we’ve done for the past decade for the Wolfram Demonstrations Project. And we’re going to emphasize contributions and data that we feel are of general interest.

But the technology of the Wolfram Data Repository—and the resource system that underlies it—is quite general, and allows people not just to publish data freely to the world, but also to share data in a more controlled fashion. The way it works is that people prepare their data just like they would for submission to the public Wolfram Data Repository. But then instead of actually submitting it, they just deploy it to their own Wolfram Cloud accounts, giving access to whomever they want.

And in fact, the general workflow is that even when people are submitting to the public Wolfram Data Repository, we’re going to expect them to have first deployed their data to their own Wolfram Cloud accounts. And as soon as they do that, they’ll get webpages and everything—just like in the public Wolfram Data Repository.

OK, so how does one create a repository entry? You can either do it programmatically using Wolfram Language code, or do it more interactively using Wolfram Notebooks. Let’s talk about the notebook way first.

You start by getting a template notebook. You can either do this through the menu item File > New > Data Resource, or you can use CreateNotebook["DataResource"]. Either way, you’ll get something that looks like this:

Data Resource Construction Notebook

Basically it’s then a question of “filling out the form”. A very important section is the one that actually provides the content for the resource:

Resource Content

Yes, it’s Wolfram Language code—and what’s nice is that it’s flexible enough to allow for basically any content you want. You can either just enter the content directly in the notebook, or you can have the notebook refer to a local file, or to a cloud object you have.

An important part of the Construction Notebook (at least if you want to have a nice webpage for your data) is the section that lets you give examples. When the examples are actually put up on the webpage, they’ll reference the data resource you’re creating. But when you’re filling in the Construction Notebook the resource hasn’t been created yet. The symbolic character of the Wolfram Language comes to the rescue, though. Because it lets you reference the content of the data resource symbolically as $$Data in the inputs that’ll be displayed, but lets you set $$Data to actual data when you’re working in the Construction Notebook to build up the examples.

Heart Rate Data

Alright, so once you’ve filled out the Construction Notebook, what do you do? There are two initial choices: set up the resource locally on your computer, or set it up in the cloud:

Private Deploy

And then, if you’re ready, you can actually submit your resource for publication in the public Wolfram Data Repository (yes, you need to get a Publisher ID, so your resource can be associated with your organization rather than just with your personal account):

Submit to the Wolfram Data Repository Page

It’s often convenient to set up resources in notebooks. But like everything else in our technology stack, there’s a programmatic Wolfram Language way to do it too—and sometimes this is what will be best.

Remember that everything that is going to be in the Wolfram Data Repository is ultimately a ResourceObject. And a ResourceObject—like everything else in the Wolfram Language—is just a symbolic expression, which happens to contain an association that gives the content and metadata of the resource object.

Well, once you’ve created an appropriate ResourceObject, you can just deploy it to the cloud using CloudDeploy. And when you do this, a private webpage associated with your cloud account will automatically be created. That webpage will in turn correspond to a CloudObject. And by setting the permissions of that cloud object, you can determine who will be able to look at the webpage, and who will be able to get the data that’s associated with it.

When you’ve got a ResourceObject, you can submit it to the public Wolfram Data Repository just by using ResourceSubmit.
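Put together, the programmatic route might look something like the following sketch. The name, description and content here are placeholders, and the deployment steps need a signed-in Wolfram Cloud account, so treat this as an outline rather than a recipe:

```wolfram
(* a hypothetical resource; name, description and content are placeholders *)
ro = ResourceObject[<|
    "Name" -> "Example Temperature Readings",
    "ResourceType" -> "DataResource",
    "Description" -> "A few made-up temperature readings",
    "Content" -> Dataset[{
       <|"Time" -> DateObject[{2017, 4, 20}], "Temperature" -> Quantity[18.5, "DegreesCelsius"]|>,
       <|"Time" -> DateObject[{2017, 4, 21}], "Temperature" -> Quantity[21.0, "DegreesCelsius"]|>}]|>];

(* the content is immediately available locally *)
ResourceData[ro]

(* deploy privately to one's own cloud account *)
CloudDeploy[ro, Permissions -> "Private"]

(* and, when ready, submit for review: *)
(* ResourceSubmit[ro] *)
```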

By the way, all this stuff works not just for the main Wolfram Data Repository in the public Wolfram Cloud, but also for data repositories in private clouds. The administrator of an Enterprise Private Cloud can decide how they want to vet data resources that are submitted (and how they want to manage things like name collisions)—though often they may choose just to publish any resource that’s submitted.

The procedure we’ve designed for vetting and editing resources for the public Wolfram Data Repository is quite elaborate—though in any given case we expect it to run quickly. It involves doing automated tests on the incoming data and examples—and then ensuring that these continue working as changes are made, for example in subsequent versions of the Wolfram Language. Administrators of private clouds definitely don’t have to use this procedure—but we’ll be making our tools available if they want to.

Making a Data-Backed Publication

OK, so let’s say there’s a data resource in the Wolfram Data Repository. How can it actually be used to create a data-backed publication? The most obvious answer is just for the publication to include a link to the webpage for the data resource in the Wolfram Data Repository. And once people go to the page, it immediately shows them how to access the data in the Wolfram Language, use it in the Wolfram Open Cloud, download it, or whatever.

But what about an actual visualization or whatever that appears in the paper? How can people know how to make it? One possibility is that the visualization can just be included among the examples on the webpage for the data resource. But there’s also a more direct way, which uses Source Links in the Wolfram Cloud.

Here’s how it works. You create a Wolfram Notebook that takes data from the Wolfram Data Repository and creates the visualization:

Creating Visualizations with a Wolfram Notebook

Then you deploy this visualization to the Wolfram Cloud—either using Wolfram Language functions like CloudDeploy and EmbedCode, or using menu items. But when you do the deployment, you say to include a source link (SourceLink->Automatic in the Wolfram Language). And this means that when you get an embeddable graphic, it comes with a source link that takes you back to the notebook that made the graphic:

Sourcelink to the Notebook

So if someone is reading along and they get to that graphic, they can just follow its source link to see how it was made, and to see how it accesses data from the Wolfram Data Repository. With the Wolfram Data Repository you can do data-backed publishing; with source links you can also do full notebook-backed publishing.
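In code, the deployment step might be sketched like this, reusing the meteorite plot from earlier. It assumes one is signed in to the Wolfram Cloud and is working in a notebook that the source link can point back to:

```wolfram
plot = GeoListPlot[
   RandomSample[ResourceData["Meteorite Landings"][All, "Coordinates"], 1000]];

(* deploy publicly, asking for a source link back to the notebook that made the graphic *)
obj = CloudDeploy[plot, Permissions -> "Public", SourceLink -> Automatic];

(* HTML that embeds the deployed graphic in a webpage *)
EmbedCode[obj]
```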

The Big Win

Now that we’ve talked a bit about how the Wolfram Data Repository works, let’s talk again about why it’s important—and why having data in it is so valuable.

The #1 reason is simple: it makes data immediately useful, and computable.

There’s nice, easy access to the data (just use ResourceData["..."]). But the really important—and unique—thing is that data in the Wolfram Data Repository is stored in a uniform, symbolic way, as WDF, leveraging everything we’ve done with data over the course of so many years in the Wolfram Language and Wolfram|Alpha.

Why is it good to have data in WDF? First, because in WDF the meaning of everything is explicit: whether it’s an entity, or quantity, or geo position, or whatever, it’s a symbolic element that’s been carefully designed and documented. (And it’s not just a disembodied collection of numbers or strings.) And there’s another important thing: data in WDF is already in precisely the form needed to immediately visualize, analyze or otherwise compute with it using any of the many thousands of functions that are built into the Wolfram Language.
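To make that a bit more concrete, here is a minimal sketch of what computing directly with such symbolic elements can look like. The particular values are made up for illustration and are not taken from any data resource.

```wolfram
(* Illustrative values only, not taken from any real data resource *)
mass = Quantity[720, "Grams"];
place = GeoPosition[{50.8, 6.1}];
when = DateObject[{1880}, "Year"];

UnitConvert[mass, "Kilograms"]                 (* the same mass in kilograms *)
GeoDistance[place, GeoPosition[{52.2, 0.1}]]   (* distance between two positions *)
DateValue[when, "Year"]                        (* recover the year as an integer *)
```

Because each value carries its own meaning (units, coordinates, calendar granularity), no extra bookkeeping is needed before passing it to built-in functions.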

Wolfram Notebooks are also an important part of the picture—because they make it easy to show how to work with the data, and give immediately runnable examples. Also critical is the fact that the Wolfram Language is so succinct and easy to read—because that’s what makes it possible to give standalone examples that people can readily understand, modify and incorporate into their own work.

In many cases using the Wolfram Data Repository will consist of identifying some data resource (say through a link from a document), then using the Wolfram Language in Wolfram Notebooks to explore the data in it. But the Wolfram Data Repository is fully integrated into the Wolfram Language, so it can be used wherever the language is used. Which means the data from the Wolfram Data Repository can be used not just in the cloud or on the desktop, but also in servers and so on. And, for example, it can also be used in APIs or scheduled tasks, using the exact same ResourceData functions as ever.
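As one possible illustration of the API case, here is a sketch of an APIFunction that counts the meteorites with a given classification string. The function name classCount and the parameter name "class" are made up; the resource name and the "Classification" column are the ones used elsewhere in this post. Defining the function does not fetch anything; the data is only retrieved when the API is actually called.

```wolfram
(* Hypothetical API: count meteorites with a given classification string *)
classCount = APIFunction[{"class" -> "String"},
   Count[
     Normal[ResourceData["Meteorite Landings"][All, "Classification"]],
     #class] &];

(* To make it callable over the web, assuming a Wolfram Cloud connection:
   CloudDeploy[classCount, Permissions -> "Public"]
*)
```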

The most common way the Wolfram Data Repository will be used is one resource at a time. But what’s really great about the uniformity and standardization that WDF provides is that it allows different data resources to be used together: those dates or geo positions mean the same thing even in different data resources, so they can immediately be put together in the same analysis, visualization, or whatever.

The Wolfram Data Repository builds on the whole technology stack that we’ve been assembling for the past three decades. In some ways it’s just a sophisticated piece of infrastructure that makes a lot of things easier to do. But I can already tell that its implications go far beyond that—and that it’s going to have a qualitative effect on the extent to which people can successfully share and reuse a wide range of kinds of data.

The Process of Data Curation

It’s a big win to have data in the Wolfram Data Repository. But what’s involved in getting it there? There’s almost always a certain amount of data curation required.

Let’s take a look again at the meteorite landings dataset I showed earlier in this post. It started from a collection of data made available in a nicely organized way by NASA. (Quite often one has to scrape webpages or PDFs; this is a case where the data happens to be set up to be downloadable in a variety of convenient formats.)

Meteorite Landing Data from NASA

As is fairly typical, the basic elements of the data here are numbers and strings. So the first thing to do is to figure out how to map these to meaningful symbolic constructs in WDF. For example, the “mass” column is labeled as being “(g)”, i.e. in grams—so each element in it should get converted to Quantity[value,"Grams"]. It’s a little trickier, though, because for some rows—corresponding to some meteorites—the value is just blank, presumably because it isn’t known.

So how should that be represented? Well, because the Wolfram Language is symbolic it’s pretty easy. And in fact there’s a standard symbolic construct Missing[...] for indicating missing data, which is handled consistently in analysis and visualization functions.
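Here is a minimal sketch of that idea, assuming blank cells arrive as empty strings and known masses as numbers. The helper name toMass and the sample column are made up for illustration.

```wolfram
(* Hypothetical converter for one raw "mass (g)" cell *)
toMass[""] := Missing["NotAvailable"]
toMass[x_?NumericQ] := Quantity[x, "Grams"]

masses = toMass /@ {21, "", 720};   (* made-up sample column *)

DeleteMissing[masses]               (* keep only the known masses *)
Total[DeleteMissing[masses]]        (* total of the known masses *)
```

The blank entry stays in the data as an explicit symbolic marker, so it can be dropped or counted deliberately instead of silently turning into a zero.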

As we start to look further into the dataset, we see all sorts of other things. There’s a column labeled “year”. OK, we can convert that into DateObject[{value}]—though we need to be careful about any BC dates (how would they appear in the raw data?).

Next there are columns “reclat” and “reclong”, as well as a column called “GeoLocation” that seems to combine these, but with numbers quoted to a different precision. A little bit of searching suggests that we should just take reclat and reclong as the latitude and longitude of the meteorite—then convert these into the symbolic form GeoPosition[{lat,lon}].

To do this in practice, we’d start by just importing all the data:

data = Import["\ DOWNLOAD", "CSV"]

OK, let’s extract a sample row:

data[[2]]

Already there’s something unexpected: the date isn’t just the year, but instead it’s a precise time. So this needs to be converted:

Interpreter["DateTime"][data[[2, 7]]]

Now we’ve got to reset this to correspond only to a date at a granularity of a year:

DateObject[{DateValue[Interpreter["DateTime"][data[[2, 7]]], "Year"]}, "Year"]

Here is the geo position:

GeoPosition[{data[[2, -3]], data[[2, -2]]}]

And we can keep going, gradually building up code that can be applied to each row of the imported data. In practice there are often little things that go wrong. There’s something missing in some row. There’s an extra piece of text (a “footnote”) somewhere. There’s something in the data that got misinterpreted as a delimiter when the data was provided for download. Each one of these needs to be handled—preferably with as much automation as possible.
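As a sketch of what handling one of these "little things" might look like, here is a hypothetical cleaner for numeric cells that sometimes carry a footnote marker. The name cleanNumber and the two marker styles it recognizes (a trailing asterisk, or a bracketed number such as [1]) are assumptions; real data will have its own quirks.

```wolfram
(* Hypothetical cleaner: numbers pass through; strings lose footnote markers
   and are converted if what remains looks like a number *)
cleanNumber[x_?NumericQ] := x
cleanNumber[s_String] :=
  With[{t = StringTrim[
      StringDelete[s, "*" | ("[" ~~ DigitCharacter .. ~~ "]")]]},
   If[StringMatchQ[t, NumberString], ToExpression[t], Missing["NotAvailable"]]]

cleanNumber /@ {720, "720*", "12.5[1]", "", "n/a"}
```

Anything that still cannot be read as a number comes back as Missing["NotAvailable"], which makes the leftover problem rows easy to find.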

But in the end we have a big list of rows, each of which needs to be assembled into an association, then all combined to make a Dataset object that can be checked to see if it’s good to go into the Wolfram Data Repository.
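That final assembly step might look roughly like this. The rows, the column order, the key names other than what the text describes, and the helper name curateRow are all made up for illustration; the actual curation code for the meteorite resource may well differ.

```wolfram
(* Made-up rows in the order: name, mass in grams, year, latitude, longitude *)
rows = {
   {"Sample A", 21, 1880, 50.8, 6.1},
   {"Sample B", "", 1952, 54.2, -113.0}};

(* Hypothetical per-row converter from raw values to symbolic values *)
curateRow[{name_, mass_, year_, lat_, lon_}] := <|
   "Name" -> name,
   "Mass" -> If[mass === "", Missing["NotAvailable"], Quantity[mass, "Grams"]],
   "Year" -> DateObject[{year}, "Year"],
   "Coordinates" -> GeoPosition[{lat, lon}]|>

meteorites = Dataset[curateRow /@ rows]

(* A quick check before submitting: which rows lack a mass? *)
meteorites[Select[MissingQ[#Mass] &], "Name"]
```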

The Data Curation Hierarchy

The example above is fairly typical of basic curation that can be done in less than 30 minutes by any decently skilled user of the Wolfram Language. (A person who’s absorbed my book An Elementary Introduction to the Wolfram Language should, for example, be able to do it.)

It’s a fairly simple example—where notably the original form of the data was fairly clean. But even in this case it’s worth understanding what hasn’t been done. For example, look at the column labeled "Classification" in the final dataset. It’s got a bunch of strings in it. And, yes, we can do something like make a word cloud of these strings:

WordCloud[Normal[ResourceData["Meteorite Landings"][All, "Classification"]]]

But to really make these values computable, we’d have to do more work. We’d have to figure out some kind of symbolic representation for meteorite classification, then we’d have to do curation (and undoubtedly ask some meteorite experts) to fit everything nicely into that representation. The advantage of doing this is that we could then ask questions about those values (“what meteorites are above L3?”), and expect to compute answers. But there’s plenty we can already do with this data resource without that.
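Just to give a flavor of what a symbolic representation could look like, here is a deliberately naive sketch. It assumes classification strings of the form "L5", "H6" or "LL3" (a group code followed by a single digit), and it reads "above L3" as "group L with a number greater than 3". The parser name, the key names and that reading are all assumptions; a real scheme would need the expert input described above.

```wolfram
(* Hypothetical parser: handles only codes like "L5", "H6" or "LL3" *)
parseClass[s_String] := First[
   StringCases[s,
    StartOfString ~~ g : ("LL" | "L" | "H") ~~ t : DigitCharacter ~~ EndOfString :>
     <|"Group" -> g, "Type" -> FromDigits[t]|>],
   Missing["Unparsed", s]]

parsed = parseClass /@ {"L5", "H6", "LL3", "Iron, IVA"};   (* sample strings *)

(* Entries in group "L" whose number is above 3 *)
Select[parsed, AssociationQ[#] && #Group === "L" && #Type > 3 &]
```

Strings the parser does not understand are kept as Missing["Unparsed", ...], so nothing is silently dropped.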

My experience in general has been that there’s a definite hierarchy of effort and payoff in getting data to be computable at different levels—starting with the data just existing in digital form, and ending with the data being cleanly computable enough that it can be fully integrated in the core Wolfram Language, and used for repeated, systematic computations.

Data Hierarchy Levels


Let’s talk about this hierarchy a bit.

The zeroth thing, of course, is that the data has to exist. And the next thing is that it has to be in digital form. If it started on handwritten index cards, for example, it had better have been entered into a document or spreadsheet or something.

But then the next issue is: how are people supposed to get access to that document or spreadsheet? Well, a good answer is that it should be in some kind of accessible cloud—perhaps referenced with a definite URI. And for a lot of data repositories that exist out there, just making the data accessible like this is the end of the story.

But one has to go a lot further to make the data actually useful. The next step is typically to make sure that the data is arranged in some definite structure. It might be a set of rows and columns, or it might be something more elaborate, and, say, hierarchical. But the point is to have a definite, known structure.

In the Wolfram Language, it’s typically trivial to take data that’s stored in any reasonable format, and use Import to get it into the Wolfram Language, arranged in some appropriate way. (As I’ll talk about later, it might be a Dataset, it might be an EntityStore, it might just be a list of Image objects, or it might be all sorts of other things.)

But, OK, now things start getting more difficult. We need to be able to recognize, say, that such-and-such a column has entries representing countries, or pairs of dates, or animal species, or whatever. SemanticImport uses machine learning and does a decent job of automatically importing many kinds of data. But there are often things that have to be fixed. How exactly is missing data represented? Are there extra annotations that get in the way of automatic interpretation? This is where one starts needing experts, who really understand the data.

But let’s say one’s got through this stage. Well, then in my experience, the best thing to do is to start visualizing the data. And very often one will immediately see things that are horribly wrong. Some particular quantity was represented in several inconsistent ways in the data. Maybe there was some obvious transcription or other error. And so on. But with luck it’s fairly easy to transform the data to handle the obvious issues—though to actually get it right almost always requires someone who is an expert on the data.
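A tiny sketch of this kind of check, using made-up coordinate pairs in which one latitude has an obvious transcription error:

```wolfram
(* Made-up latitude/longitude pairs; the third one has a transcription error *)
coords = {{50.8, 6.1}, {54.2, -113.0}, {508.0, 6.1}, {-33.2, 64.6}};

(* Plotting longitude against latitude makes the stray point stand out *)
ListPlot[Reverse /@ coords, AxesLabel -> {"longitude", "latitude"}]

(* The same check done programmatically: latitudes must lie between -90 and 90 *)
Select[coords, Abs[First[#]] > 90 &]
```

The plot is what tends to reveal the problem in the first place; the Select is the kind of check one then adds to the curation code so the same error gets caught automatically next time.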

What comes out of this process is typically very useful for many purposes—and it’s the level of curation that we’re expecting for things submitted to the Wolfram Data Repository.

It’ll be possible to do all sorts of analysis and visualization and other things with data in this form.

But if one wants, for example, to actually integrate the data into Wolfram|Alpha, there’s considerably more that has to be done. For a start, everything that can realistically be represented symbolically has to be represented symbolically. It’s not good enough to have random strings giving values of things—because one can’t ask systematic questions about those. And this typically requires inventing systematic ways to represent new kinds of concepts in the world—like the "Classification" for meteorites.

Wolfram|Alpha works by taking natural language input. So the next issue is: when there’s something in the data that can be referred to, how do people refer to it in natural language? Often there’ll be a whole collection of names for something, with all sorts of variations. One has to algorithmically capture all of the possibilities.

Next, one has to think about what kinds of questions will be asked about the data. In Wolfram|Alpha, the fact that the questions get asked in natural language forces a certain kind of simplicity on them. But one also needs to figure out just what the linguistics of the questions can be (and typically this is much more complicated than the linguistics for entities or other definite things). And then—and this is often a very difficult part—one has to figure out what people want to compute, and how they want to compute it.

At least in the world of Wolfram|Alpha, it turns out to be quite rare for people just to ask for raw pieces of data. They want answers to questions—that have to be computed with models, or methods, or algorithms, from the underlying data. For meteorites, they might want to know not the raw information about when a meteorite fell, but compute the weathering of the meteorite, based on when it fell, what climate it’s in, what it’s made of, and so on. And to have data successfully be integrated into Wolfram|Alpha, those kinds of computations all need to be there.

For full Wolfram|Alpha there’s even more. Not only does one have to be able to give a single answer, one has to be able to generate a whole report, that includes related answers, and presents them in a well-organized way.

It’s ultimately a lot of work. There are very few domains that have been added to Wolfram|Alpha with less than a few skilled person-months of work. And there are plenty of domains that have taken person-years or tens of person-years. And to get the right answers, there always has to be a domain expert involved.

Getting data integrated into Wolfram|Alpha is a significant achievement. But one can go further—and indeed to integrate data into the Wolfram Language one has to go further. In Wolfram|Alpha people are asking one-off questions—and the goal is to do as well as possible on individual questions. But if there’s data in the Wolfram Language, people won’t just ask one-off questions with it: they’ll also do large-scale systematic computations. And this demands a much greater level of consistency and completeness—which in my experience rarely takes less than person-years per domain to achieve.

But OK. So where does this leave the Wolfram Data Repository? Well, the good news is that all that work we’ve put into Wolfram|Alpha and the Wolfram Language can be leveraged for the Wolfram Data Repository. It would take huge amounts of work to achieve what’s needed to actually integrate data into Wolfram|Alpha or the Wolfram Language. But given all the technology we have, it takes very modest amounts of work to make data already very useful. And that’s what the Wolfram Data Repository is about.

The Data Publishing Ecosystem

With the Wolfram Data Repository (and Wolfram Notebooks) there’s finally a great way to do true data-backed publishing—and to ensure that data can be made available in an immediately useful and computable way.

For at least a decade there’s been lots of interest in sharing data in areas like research and government. And there’ve been all sorts of data repositories created—often with good software engineering—with the result that instead of data just sitting on someone’s local computer, it’s now pretty common for it to be uploaded to a central server or cloud location.

But the problem has been that the data in these repositories is almost always in a quite raw form—and not set up to be generally meaningful and computable. And in the past—except in very specific domains—there’s been no really good way to do this, at least in any generality. But the point of the Wolfram Data Repository is to use all the development we’ve done on the Wolfram Language and WDF to finally be able to provide a framework for having data in an immediately computable form.

The effect is dramatic. One goes from a situation where people are routinely getting frustrated trying to make use of data to one in which data is immediately and readily usable. Often there’s been lots of investment and years of painstaking work put into accumulating some particular set of data. And it’s often sad to see how little the data actually gets used—even though it’s in principle accessible to anyone. But I’m hoping that the Wolfram Data Repository will provide a way to change this—by allowing data not just to be accessible, but also computable, and easy for anyone to immediately and routinely use as part of their work.

There’s great value to having data be computable—but there’s also some cost to making it so. Of course, if one’s just collecting the data now, and particularly if it’s coming from automated sources, like networks of sensors, then one can just set it up to be in nice, computable WDF right from the start (say by using the data semantics layer of the Wolfram Data Drop). But at least for a while there’s going to still be a lot of data that’s in the form of things like spreadsheets and traditional databases—that don’t even have the technology to support the kinds of structures one would need to directly represent WDF and computable data.

So that means that there’ll inevitably have to be some effort put into curating the data to make it computable. Of course, with everything that’s now in the Wolfram Language, the level of tools available for curation has become extremely high. But to do curation properly, there’s always some level of human effort—and some expert input—that’s required. And a key question in understanding the post-Wolfram-Data-Repository data publishing ecosystem is who is actually going to do this work.

In a first approximation, it could be the original producers of the data—or it could be professional or other “curation specialists”—or some combination. There are advantages and disadvantages to all of these possibilities. But I suspect that at least for things like research data it’ll be most efficient to start with the original producers of the data.

The situation now with data curation is a little similar to the historical situation with document production. Back when I was first doing science (yes, in the 1970s) people handwrote papers, then gave them to professional typists to type. Once typed, papers would be submitted to publishers, who would then get professional copyeditors to copyedit them, and typesetters to typeset them for printing. It was all quite time consuming and expensive. But over the course of the 1980s, authors began to learn to type their own papers on a computer—and then started just uploading them directly to servers, in effect putting them immediately in publishable form.

It’s not a perfect analogy, but in both data curation and document editing there are issues of structure and formatting—and then there are issues that require actual understanding of the content. (Sometimes there are more global “policy” issues too.) And for producing computable data, as for producing documents, almost always the most efficient thing will be to start with authors “typing their own papers”—or in the case of data, putting their data into WDF themselves.

Of course, to do this requires learning at least a little about computable data, and about how to do curation. And to assist with this we’re working with various groups to develop materials and provide training about such things. Part of what has to be communicated is about mechanics: how to move data, convert formats, and so on. But part of it is also about principles—and about how to make the best judgement calls in setting up data that’s computable.

We’re planning to organize “curate-a-thons” where people who know the Wolfram Language and have experience with WDF data curation can pair up with people who understand particular datasets—and hopefully quickly get all sorts of data that they may have accumulated over decades into computable form—and into the Wolfram Data Repository.

In the end I’m confident that a very wide range of people (not just techies, but also humanities people and so on) will be able to become proficient at data curation with the Wolfram Language. But I expect there’ll always be a certain mixture of “type it yourself” and “have someone type it for you” approaches to data curation. Some people will make their data computable themselves—or will have someone right there in their lab or whatever who does. And some people will instead rely on outside providers to do it.

Who will these providers be? There’ll be individuals or companies set up much like the ones who provide editing and publishing services today. And to support this we’re planning a “Certified Data Curator” program to help define consistent standards for people who will work with the originators of a wide range of different kinds of data, putting it into computable form.

But in addition to individuals or specific “curation companies”, there are at least two other kinds of entities that have the potential to be major facilitators of making data computable.

The first is research libraries. The role of libraries at many universities is somewhat in flux these days. But something potentially very important for them to do is to provide a central place for organizing—and making computable—data from the university and beyond. And in many ways this is just a modern analog of traditional library activities like archiving and cataloging.

It might involve the library actually having a private cloud version of the Wolfram Data Repository—and it might involve the library having its own staff to do curation. Or it might just involve the library providing advice. But I’ve found there’s quite a bit of enthusiasm in the library community for this kind of direction (and it’s perhaps an interesting sign that at our company people involved in data curation have often originally been trained in library science).

In addition to libraries, another type of organization that should be involved in making data computable is publishing companies. Some might say that publishing companies have had it a bit easy in the last couple of decades. Back in the day, every paper they published involved all sorts of production work, taking it from manuscript to final typeset version. But for years now, authors have been delivering their papers in digital forms that publishers don’t have to do much work on.

With data, though, there’s again something for publishers to do, and again a place for them to potentially add great value. Authors can pretty much put raw data into public repositories for themselves. But what would make publishers visibly add value is for them to process (or “edit”) the data—putting in the work to make it computable. The investment and processes will be quite similar to what was involved on the text side in the past—it’s just that now instead of learning about phototypesetting systems, publishers should be learning about WDF and the Wolfram Language.

It’s worth saying that as of today all data that we accept into the Wolfram Data Repository is being made freely available. But we’re anticipating in the near future we’ll also incorporate a marketplace in which data can be bought and sold (and even potentially have meaningful DRM, at least if it’s restricted to being used in the Wolfram Language). It’ll also be possible to have a private cloud version of the Wolfram Data Repository—in which whatever organization runs it can set up whatever rules it wants about contributions, subscriptions and access.

One feature of traditional paper publishing is the sense of permanence it provides: once even just a few hundred printed copies of a paper are on shelves in university libraries around the world, it’s reasonable to assume that the paper is going to be preserved forever. With digital material, preservation is more complicated.

If someone just deploys a data resource to their Wolfram Cloud account, then it can be available to the world—but only so long as the account is maintained. The Wolfram Data Repository, though, is intended to be something much more permanent. Once we’ve accepted a piece of data for the repository, our goal is to ensure that it’ll continue to be available, come what may. It’s an interesting question how best to achieve that, given all sorts of possible future scenarios in the world. But now that the Wolfram Data Repository is finally launched, we’re going to be working with several well-known organizations to make sure that its content is as securely maintained as possible.

Data-Backed Journals

The Wolfram Data Repository—and private versions of it—is basically a powerful, enabling technology for making data available in computable form. And sometimes all one wants to do is to make the data available.

But at least in academic publishing, the main point usually isn’t the data. There’s usually a “story to be told”—and the data is just backup for that story. Of course, having that data backing is really important—and potentially quite transformative. Because when one has the data, in computable form, it’s realistic for people to work with it themselves, reproducing or checking the research, and directly building on it themselves.

But, OK, how does the Wolfram Data Repository relate to traditional academic publishing? For our official Wolfram Data Repository we’re going to have definite standards for what we accept—and we’re going to concentrate on data that we think is of general interest or use. We have a whole process for checking the structure of data, and applying software quality assurance methods, as well as expert review, to it.

And, yes, each entry in the Wolfram Data Repository gets a DOI, just like a journal article. But for our official Wolfram Data Repository we’re focused on data—and not the story around it. We don’t see it as our role to check the methods by which the data was obtained, or to decide whether conclusions drawn from it are valid or not.

But given the Wolfram Data Repository, there are lots of new opportunities for data-backed academic journals that do in effect “tell stories”, but now have the infrastructure to back them up with data that can readily be used.

I’m looking forward, for example, to finally making the journal Complex Systems that I founded 30 years ago a true data-backed journal. And there are many existing journals where it makes sense to use versions of the Wolfram Data Repository (often in a private cloud) to deliver computable data associated with journal articles.

But what’s also interesting is that now that one can take computable data for granted, there’s a whole new generation of “Journal of Data-Backed ____” journals that become possible—that not only use data from the Wolfram Data Repository, but also actually present their results as Wolfram Notebooks that can immediately be rerun and extended (and can also, for example, contain interactive elements).

The Corporate Version

I’ve been talking about the Wolfram Data Repository in the context of things like academic journals. But it’s also important in corporate settings. Because it gives a very clean way to have data shared across an organization (or shared with customers, etc.).

Typically in a corporate setting one’s talking about private cloud versions. And of course these can have their own rules about how contributions work, and who can access what. And the data can not only be immediately used in Wolfram Notebooks, but also in automatically generated reports, or instant APIs.

It’s been interesting to see—during the time we’ve been testing the Wolfram Data Repository—just how many applications we’ve found for it within our own company.

There’s information that used to be on webpages, but is now in our private Wolfram Data Repository, and is now immediately usable for computation. There’s information that used to be in databases, and which required serious programming to access, but is now immediately accessible through the Wolfram Language. And there are all sorts of even quite small lists and so on that used to exist only in textual form, but are now computable data in our data repository.

It’s always been my goal to have a truly “computable company”—and putting in place our private Wolfram Data Repository is an important step in achieving this.

My Very Own Data

In addition to public and corporate uses, there are also great uses of Wolfram Data Repository technology for individuals—and particularly for individual researchers. In my own case, I’ve got huge amounts of data that I’ve collected or generated over the course of my life. I happen to be pretty organized at keeping things—but it’s still usually something of an adventure to remember enough to “bring back to life” data I haven’t dealt with in a decade or more. And in practice I make much less use of older data than I should—even though in many cases it took me immense effort to collect or generate the data in the first place.

But now it’s a different story. Because all I have to do is to upload data once and for all to the Wolfram Data Repository, and then it’s easy for me to get and use the data whenever I want to. Some data (like medical or financial records) I want just for myself, so I use a private cloud version of the Wolfram Data Repository. But other data I’ve been getting uploaded into the public Wolfram Data Repository.

Here’s an example. It comes from a page in my book A New Kind of Science:

Page 833 from A New Kind of Science

The page says that by searching about 8 trillion possible systems in the computational universe I found 199 that satisfy some particular criterion. And in the book I show examples of some of these. But where’s the data?

Well, because I’m fairly organized about such things, I can go into my file system, and find the actual Wolfram Notebook from 2001 that generated the picture in the book. And that leads me to a file that contains the raw data—which then takes a very short time to turn into a data resource for the Wolfram Data Repository:

Three-Color Cellular Automaton Rules that Double Their Input
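For what it's worth, the step of turning recovered raw data into a data resource might be sketched like this. The content, name and description here are placeholders, and the association keys shown are my assumption about a minimal set; actually sharing the resource requires a Wolfram Cloud connection.

```wolfram
(* Placeholder content standing in for raw data recovered from an old notebook *)
rules = {"rule A", "rule B", "rule C"};

(* Build a local data resource; the name and description are made up *)
ro = ResourceObject[<|
    "Name" -> "My Sample Rule List",
    "ResourceType" -> "DataResource",
    "Description" -> "Illustrative list of rule identifiers",
    "Content" -> rules|>];

ResourceData[ro]   (* retrieves the stored content *)

(* To share it, assuming a Wolfram Cloud connection:
   CloudDeploy[ro]       deploys to one's own cloud account
   ResourceSubmit[ro]    submits for review for the public repository *)
```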

We’ve been systematically mining data from my research going back into the 1980s—even from Mathematica Version 1 notebooks from 1988 (which, yes, still work today). Sometimes the experience is a little less inspiring. Like to find a list of people referenced in the index of A New Kind of Science, together with their countries and dates, the best approach seemed to be to scrape the online book website:

ResourceData["People Mentioned in Stephen Wolfram\[CloseCurlyQuote]s \[OpenCurlyDoubleQuote]A New Kind of Science\[CloseCurlyDoubleQuote]"]

And to get a list of the books I used while working on A New Kind of Science required going into an ancient FileMaker database. But now all the data—nicely merged with Open Library information deduced from ISBNs—is in a clean WDF form in the Wolfram Data Repository. So I can do such things as immediately make a word cloud of the titles of the books:

WordCloud[StringRiffle[Normal[ResourceData["Books in Stephen Wolfram's Library"][All, "Title"]]]]

What It Means

Many things have had to come together to make today’s launch of the Wolfram Data Repository possible. In the modern software world it’s easy to build something that takes blobs of data and puts them someplace in the cloud for people to access. But what’s vastly more difficult is to have the data actually be immediately useful—and making that possible is what’s required the whole development of our Wolfram Language and Wolfram Cloud technology stack, which are now the basis for the Wolfram Data Repository.

But now that the Wolfram Data Repository exists—and private versions of it can be set up—there are lots of new opportunities. For the research community, the most obvious is finally being able to do genuine data-backed publication, where one can routinely make underlying data from pieces of research available in a way that people can actually use. There are variants of this in education—making data easy to access and use for educational exercises and projects.

In the corporate world, it’s about making data conveniently available across an organization. And for individuals, it’s about maintaining data in such a way that it can be readily used for computation, and built on.

But in the end, I see the Wolfram Data Repository as a key enabling technology for defining how one can work with data in the future—and I’m excited that after all this time it’s finally now launched and available to everyone.


]]> 0
<![CDATA[The R&D Pipeline Continues: Launching Version 11.1]]> Thu, 16 Mar 2017 15:52:54 +0000 Stephen Wolfram v11-1-thumbA Minor Release That’s Not Minor I’m pleased to announce the release today of Version 11.1 of the Wolfram Language (and Mathematica). As of now, Version 11.1 is what’s running in the Wolfram Cloud—and desktop versions are available for immediate download for Mac, Windows and Linux. What’s new in Version 11.1? Well, actually a remarkable [...]]]> v11-1-thumb

A Minor Release That’s Not Minor

I’m pleased to announce the release today of Version 11.1 of the Wolfram Language (and Mathematica). As of now, Version 11.1 is what’s running in the Wolfram Cloud—and desktop versions are available for immediate download for Mac, Windows and Linux.

What’s new in Version 11.1? Well, actually a remarkable amount. Here’s a summary:

Summary of new features

There’s a lot here. One might think that a .1 release, nearly 29 years after Version 1.0, wouldn’t have much new any more. But that’s not how things work with the Wolfram Language, or with our company. Instead, as we’ve built our technology stack and our procedures, rather than progressively slowing down, we’ve been continually accelerating. And now even something as notionally small as the Version 11.1 release packs an amazing amount of R&D, and new functionality.

A Visual Change

There’s one very obvious change in 11.1: the documentation looks different. We’ve spiffed up the design, and on the web we’ve made everything responsive to window width—so it looks good even when it’s in a narrow sidebar in the cloud, or on a phone.

Wolfram Language documentation

We’ve also introduced some new design elements—like the mini-view of the Details section. Most people like to see examples as soon as they get to a function page. But it’s important not to forget the Details—and the mini-view provides what amounts to a little “ad” for them.

Examples and details

Lots of New Functions

Here’s a word cloud of new functions in Version 11.1:

Word cloud of new functions

Altogether there are an impressive 132 new functions—together with another 98 that have been significantly enhanced. These functions represent the finished output of our R&D pipeline in just the few months that have elapsed since Version 11.0 was released.

When we bring out a major “integer” release—like Version 11—we’re typically introducing a bunch of complete, new frameworks. In (supposedly) minor .1 releases like Version 11.1, we’re not aiming for complete new frameworks. Instead, there’s typically new functionality that’s adding to existing frameworks—together with a few (sometimes “experimental”) hints of major new frameworks to come. Oh, and if a complete, new framework does happen to be finished in time for a .1 release, it’ll be there too.

Neural Nets

One very hot area in which Version 11.1 makes some big steps forward is neural nets. It’s been exciting over the past few years to see this area advance so quickly in the world at large, and it’s been great to see the Wolfram Language at the very leading edge of what’s being done.

Our goal is to define a very high-level interface to neural nets, that’s completely integrated into the Wolfram Language. Version 11.1 adds some new recently developed building blocks—in particular 30 new types of neural net layers (more than double what was there in 11.0), together with automated support for recurrent nets. The concept is always to let the neural net be specified symbolically in the Wolfram Language, then let the language automatically fill in the details, interface with low-level libraries, etc. It’s something that’s very convenient for ordinary feed-forward networks (tensor sizes are all knitted together automatically, etc.)—but for recurrent nets (with variable-length sequences, etc.) it’s something that’s basically essential if one’s going to avoid lots of low-level programming.
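
As a minimal sketch of what symbolic specification looks like in practice (the layer sizes and the input vector are arbitrary choices for illustration, not taken from any particular model), one can write down a small feed-forward chain, give only the input size, and let the language work out the intermediate tensor dimensions:

```wolfram
(* Illustrative only: the layer sizes 16 and 2 are arbitrary assumptions *)
net = NetChain[{LinearLayer[16], Ramp, LinearLayer[2], SoftmaxLayer[]}, "Input" -> 4];

(* Fill in random initial weights; all array shapes are inferred from "Input" -> 4 *)
initialized = NetInitialize[net];

(* Apply the untrained net to a length-4 vector; the result is a length-2 probability vector *)
out = initialized[{1., 2., 3., 4.}]
```

Nothing here states the shape of any weight matrix; those all follow from the single input specification.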

Another crucial feature of neural nets in the Wolfram Language is that it’s set up to be automatic to encode images, text or whatever in an appropriate way. In Version 11.1, NetEncoder and NetDecoder cover a lot of new cases—extending what’s integrated into the Wolfram Language.

It’s worth saying that underneath the whole integrated symbolic interface, the Wolfram Language is using a very efficient low-level library—currently MXNet—which takes care of optimizing ultimate performance for the latest CPU and GPU configurations. By the way, another feature enhanced in 11.1 is the ability to store complete neural net specifications, complete with encoders, etc. in a portable and reusable .wlnet file.

There’s a lot of power in treating neural nets as symbolic objects. In 11.1 there are now functions like NetMapOperator and NetFoldOperator that symbolically build up new neural nets. And because the neural nets are symbolic, it’s easy to manipulate them, for example breaking them apart to monitor what they’re doing inside, or systematically comparing the performance of different structures of net.
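
To make the idea of breaking a net apart concrete, here is a small sketch on a toy net (the net itself is made up for illustration). It relies only on Normal to get the list of layers, the same idiom used in the LeNet visualization further down, and then applies the layers one at a time:

```wolfram
(* A toy net; the sizes are arbitrary and chosen only for illustration *)
toy = NetInitialize[NetChain[{LinearLayer[3], Tanh, LinearLayer[1]}, "Input" -> 2]];

(* Normal gives the individual layers as ordinary symbolic objects *)
layers = Normal[toy];

(* Feed an input through layer by layer, keeping every intermediate result *)
activations = FoldList[#2[#1] &, {0.5, -0.5}, layers]
```

The last element of the result should match what the whole net gives on the same input, which is an easy way to confirm that nothing was lost in taking the net apart.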

In some sense, neural net layers are like the machine code of a neural net programming system. In 11.1 there’s a convenient function—NetModel—that provides pre-built trained or untrained neural net models. As of today, there are a modest number of famous neural nets included, but we plan to add more every week—surfing the leading edge of what’s being developed in the neural net research community, as well as adding some ideas of our own.

Here’s a simple example of NetModel at work:

net = NetModel["LeNet Trained on MNIST Data"]

Now apply the network to some actual data—and see it gets the right answer:


But because the net is specified symbolically, it’s easy to “go inside” and “see what it’s thinking”. Here’s a tiny (but neat) piece of functional programming that visualizes what happens at every layer in the net—and, yes, in the end the first square lights up red to show that the output is 0:

FoldPairList[{ArrayPlot[ArrayFlatten[Partition[#1, UpTo[5]]],      ColorFunction -> "Rainbow"], #2[#1]} &,   NetExtract[net, "Input"][0], Normal[net]]

More Machine Learning

Neural nets are an important method for machine learning. But one of the core principles of the Wolfram Language is to provide highly automated functionality, independent of underlying methods. And in 11.1 there’s a bunch more of this in the area of machine learning. (As it happens, much of it uses the latest deep learning neural net methods, but for users what’s important is what it does, not how it does it.)

My personal favorite new machine learning function in 11.1 is FeatureSpacePlot. Give it any collection of objects, and it’ll try to lay them out in an appropriate “feature space”. Like here are the flags of countries in Europe:

FeatureSpacePlot[EntityValue[=countries in Europe, "FlagImage"]]

What’s particularly neat about FeatureSpacePlot is that it’ll immediately use sophisticated pre-trained feature extractors for specific classes of input—like photographs, texts, etc. And there’s also now a FeatureNearest function that’s the analog of Nearest, but operates in feature space. Oh, and all the stuff with NetModel and pre-trained net models immediately flows into these functions, so it becomes trivial, say, to experiment with “meaning spaces”:

FeatureSpacePlot[{"dog", "ant", "bear", "moose", "cucumber", "bean", "broccoli", "cabbage"}, FeatureExtractor -> NetModel["GloVe 50-Dimensional Word Vectors Trained on Wikipedia and Gigaword-5 Data"]]

Particularly with NetModel, there are all sorts of very useful few-line neural net programs that one can construct. But in 11.1 there are also some major new, more infrastructural, machine learning capabilities. Notable examples are ActiveClassification and ActivePrediction—which build classifiers and predictors by actively sampling a space, learning how to do this as efficiently as possible. There will be lots of end-user applications for ActiveClassification and ActivePrediction, but for us internally the most immediately interesting thing is that we can use these functions to optimize all sorts of meta-algorithms that are built into the Wolfram Language.

Audio

Version 11.0 began the process of making audio—like images—something completely integrated into the Wolfram Language. Version 11.1 continues that process. For example, for desktop systems, it adds AudioCapture to immediately capture audio from a microphone on your computer. (Yes, it’s nontrivial to automatically handle out-of-core storage and processing of large audio samples, etc.) Here’s an example of me saying “hello”:

Play Audio

You can immediately take this, and, say, make a cepstrogram (yes, that’s another new audio function in 11.1):


Images & Visualization

Version 11.1 has quite an assortment of new features for images and visualization. CurrentImage got faster and better. ImageEffect has lots of new effects added. There are new functions and options to support the latest in computational photography and computational microscopy. And images got even more integrated as first-class objects—that one can for example now immediately do arithmetic with:

Sqrt[2 Wolfie Image]-EdgeDetect[Wolfie Image]

Something else with images—that I’ve long wanted—is the ability to take a bitmap image, and find an approximate vector graphics representation of it:

ImageGraphics[Poke Spikey]

TextRecognize has also become significantly stronger—in particular being able to pick out structure in text, like paragraphs and columns and the like.

Oh, and in visualization, there are things like GeoBubbleChart, here showing the populations of the largest cities in the US:

GeoBubbleChart[EntityValue[United States["LargestCities"], {"Position",     "Population"}]]

There’s lots of little (but nice) stuff too. Like support for arbitrary callouts in pie charts, optimized labeling of discrete histograms and full support of scaling functions for Plot3D, etc.

More Data

There’s always new data flowing into the Wolfram Knowledgebase, and there’ve also been plenty of completely new things added since 11.0: 130,000+ new types of foods, 250,000+ atomic spectral lines, 12,000+ new mountains, 10,000+ new notable buildings, 300+ types of neurons, 650+ new waterfalls, 200+ new exoplanets (because they’ve recently been discovered), and lots else (not to mention 7,000+ new spelling words). There’s also, for example, much higher resolution geo elevation data—so now a 3D-printable Mount Everest can have much more detail:

ListPlot3D[GeoElevationData[GeoDisk[Mount Everest]], Mesh -> None]

Integrated External Services

Something new in Version 11.1 is integrated external services, which allow built-in functions that work by calling external APIs. Two examples are WebSearch and WebImageSearch. Here are thumbnail images found by searching the web for “colorful birds”:

WebImageSearch["colorful birds", "Thumbnails"]

For the heck of it, let’s see what ImageIdentify thinks they are (oh, and in 11.1 ImageIdentify is much more accurate, and you can even play with the network inside it by using NetModel):

ImageIdentify /@ %

Since WebSearch and WebImageSearch use external APIs, users need to pay for them separately. But we’ve set up what we call Service Credits to make this seamless. (Everything’s in the language, of course, so there’s for example $ServiceCreditsAvailable.)

There will be quite a few more examples of integrated services in future versions, but in 11.1, beyond web searching, there’s also TextTranslation. WordTranslation (new in 11.0) handles individual word translation for hundreds of languages; now in 11.1 TextTranslation uses external services to also translate complete pieces of text between several tens of languages:

TextTranslation["This is an integrated external service.", "French"]

More Math, More Algorithms

A significant part of our R&D organization is devoted to continuing our three-decade effort to push the frontiers of mathematical and algorithmic computation. So it should come as no surprise that Version 11.1 has all sorts of advances in these areas. There’s space-filling curves, fractal meshes, ways to equidistribute points on a sphere:

{Graphics[HilbertCurve[5]], MengerMesh[3, 3], Graphics3D[Sphere[SpherePoints[200], 0.1]]}

There are new kinds of spatial, robust and multivariate statistics. There are Hankel transforms, built-in modular inverses, and more. Even in differentiation, there’s something new: nth order derivatives, for symbolic n:

D[x Exp[x], {x, n}]
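
One quick sanity check on a symbolic-n result like this is to substitute a specific n and compare against the ordinary derivative. This is just a sketch of that check; the variable names are arbitrary:

```wolfram
general = D[x Exp[x], {x, n}];   (* nth derivative for symbolic n *)
explicit = D[x Exp[x], {x, 3}];  (* ordinary third derivative *)

(* The difference should simplify to zero if the general formula is right for n = 3 *)
check = Simplify[(general /. n -> 3) - explicit]
```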

Here’s something else about differentiation: there are now functions RealAbs and RealSign that are versions of Abs and Sign that are defined only on the real axis, and so can freely be differentiated, without having to give any assumptions about variables.
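
A small sketch of the difference: differentiating RealAbs gives an explicit formula that can be evaluated at a real point, whereas Abs (which has to allow for complex arguments) is left in terms of a formal derivative.

```wolfram
dReal = D[RealAbs[x], x];   (* an explicit expression in x *)
dAbs = D[Abs[x], x];        (* stays in terms of the formal derivative of Abs *)

(* Evaluate the RealAbs derivative at a negative real point: the slope of |x| there is -1 *)
slope = dReal /. x -> -3.
```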

In Version 10.1, we introduced the function AnglePath, that computes a path from successive segments with specified lengths and angles. At some level, AnglePath is like an industrial-scale version of Logo (or Scratch) “turtle geometry”. But AnglePath has turned out to be surprisingly broadly useful, so for Version 11.1, we’ve generalized it to AnglePath3D (and, yes, there are all sorts of subtleties about frames and Euler angles and so on).
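
For readers who haven’t used the 2D version, here is a minimal turtle-style sketch with AnglePath (the pentagon is just an illustrative choice): five unit steps, each preceded by a turn of 2π/5, bring the path back to where it started.

```wolfram
(* Each pair is {step length, turn angle}; five 72-degree turns close the path *)
path = AnglePath[Table[{1, 2 Pi/5}, 5]];

(* Draw the resulting regular pentagon *)
Graphics[Line[path]]

(* The path starts at {0, 0} and numerically returns there *)
endpoint = Chop[N[Last[path]]]
```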

A Language of Granular Dates

When we say “June 23, 1988”, what do we mean? The beginning of that day? The whole 24-hour period from midnight to midnight? Or what? In Version 11.1 we’ve introduced the notion of granularity for dates—so you can say whether a date is supposed to represent a day, a year, a second, a week starting Sunday—or for that matter just an instant in time.

It’s a nice application of the symbolic character of the Wolfram Language—and it solves all sorts of problems in dealing with dates and times. In a way, it’s a little like precision for numbers, but it’s really its own thing. Here for example is how we now represent “the current week”:


Here’s the current decade:


This is the next month from now:


This says we want to start from next month, then add 7 weeks—getting another month:

NextDate["Month"] + Quantity[7, "Weeks"]

And here’s the result to the granularity of a month:

CurrentDate[%, "Month"]
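
Pulling the pieces of this section together, a sketch of the granular date calls might look like the following. CurrentDate["Week"] and CurrentDate["Decade"] are inferred from the phrases “the current week” and “the current decade”, by analogy with the NextDate["Month"] and CurrentDate[..., "Month"] forms shown above, so treat them as assumptions rather than a record of the original inputs.

```wolfram
week = CurrentDate["Week"];       (* the current week, as a single granular date *)
decade = CurrentDate["Decade"];   (* the current decade *)
nextMonth = NextDate["Month"];    (* the month after the current one *)

(* Add 7 weeks to a month-granularity date, then ask which month that lands in *)
later = nextMonth + Quantity[7, "Weeks"];
CurrentDate[later, "Month"]

(* Each date object carries its granularity *)
{week["Granularity"], decade["Granularity"], nextMonth["Granularity"]}
```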

Talking of dates, by the way, one of the things that’s coming across the system is the use of Dated as a qualifier, for example for properties of entities of the knowledgebase (so this asks for the population of New York City in 1970):

New York City [ Dated[ "Population", 1970 ] ]

Language Tweaks

I’m very proud of how smooth the Wolfram Language is to use—and part of how that’s been achieved is that for 30 years we’ve been continually polishing it. We’re always making sure everything fits perfectly together—and we’re always adding little conveniences.

One of our principles is that if there’s a lump of computational work that people repeatedly do, then so long as there’s a good name for it (that people will readily remember, and readily recognize when they see it in a piece of code), it should be inserted as a built-in function. A very simple example in Version 11.1 is ReverseSort:

ReverseSort[{1, 2, 3, 4}]

(One might think: what’s the point of this—it’s just Reverse[Sort[...]]. But it’s very common to want to map what’s now ReverseSort over a bunch of objects, and it’s smoother to be able to say ReverseSort /@ ... rather than Reverse[Sort[#]]& /@ ... or Reverse@*Sort /@ ...).
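
As a small illustration of that point (the sample lists are made up):

```wolfram
lists = {{3, 1, 2}, {5, 9, 7}, {"b", "c", "a"}};

(* Mapping the built-in function reads directly *)
sorted = ReverseSort /@ lists
(* {{3, 2, 1}, {9, 7, 5}, {"c", "b", "a"}} *)

(* The two older spellings give the same result *)
same = sorted === (Reverse[Sort[#]] & /@ lists) === (Reverse@*Sort /@ lists)
```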

Another little convenience: Nearest now has special ways to specify useful things to return. For example, this gives the distances from 2.7 to the 5 nearest values:

Nearest[{1, 2, 3, 4, 5, 6, 7} -> "Distance", 2.7, 5]

CellularAutomaton is a very broad function. Version 11.1 makes it easier to use for common cases by allowing rules to be specified by associations with labeled elements:

ArrayPlot[  CellularAutomaton[<|"OuterTotalisticCode" -> 110, "Dimension" -> 2,     "Neighborhood" -> 5|>, {{{1}}, 0}, {{{50}}}]]

We’re always trying to make sure that patterns we’ve established get used as broadly as possible. Like in 11.1, you can use UpTo in lots of new places, like in ImageSize specifications.
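
For anyone who hasn’t met UpTo, the basic idea is “as many as requested, or fewer if that’s all there is”. A quick sketch with two long-standing uses:

```wolfram
(* Asking for 5 elements from a 3-element list just returns all 3 *)
taken = Take[{a, b, c}, UpTo[5]]
(* {a, b, c} *)

(* Partition into blocks of up to 3, keeping the shorter final block *)
blocks = Partition[Range[7], UpTo[3]]
(* {{1, 2, 3}, {4, 5, 6}, {7}} *)
```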

We’re also always trying to make sure that things are as general as possible. Like IntegerString now works not only with the standard representation of integers, but also with traditional ones used for different purposes around the world:

IntegerString[12345, "TraditionalChineseFinancial"]

And IntegerName can also now handle different types and languages of names:

IntegerName[12345, {"French", "Ordinal"}]

And there are lots more examples—each making the experience of using the Wolfram Language just a little bit smoother.

A Language of Persistence

If you make a definition like x=7, or $TimeZone=11, the definition will persist until you clear it, or until your session is over. But what if you want a definition that persists longer—say across all your sessions? Well, in Version 11.1 that’s now possible, thanks to PersistentValue.

PersistentValue lets you specify a name (like "foo"), and a "persistence location". (It also allows options like PersistenceTime and ExpirationDate.) The persistence location can just be "KernelSession"—which means that the value lasts only for a single kernel session. But it can also be "FrontEndSession", or "Local" (meaning that it should be the same whenever you use the same computer), or "Cloud" (meaning that it’s globally synchronized across the cloud).

PersistentValue is pretty general. It lets you have values in different places (like different private clouds, for example); then there’s a $PersistencePath that defines the order to look at them in, and a MergingFunction that specifies how (if at all) the values should be merged.
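
A minimal sketch of the basic usage, with the name "foo" as in the text and an arbitrary value. The "KernelSession" location is used here so that nothing is written to disk or to the cloud; swapping in "Local" or "Cloud" would be the way to get longer persistence.

```wolfram
(* Store a value under the name "foo" for the current kernel session only *)
PersistentValue["foo", "KernelSession"] = 42;

(* Retrieve it by name and location *)
stored = PersistentValue["foo", "KernelSession"]
```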

Systems-Level Programming

One of the goals of the Wolfram Language is to be able to interact as broadly as possible with all computational ecosystems. Version 11.1 adds support for the M4A audio format, the .ubj binary JSON format, as well as .ini files and Java .properties files. There’s also a new function, BinarySerialize, that converts any Wolfram Language expression into a new binary (“WXF”) form, optimized for speed or size:

BinarySerialize[RandomGraph[{50, 100}]]

BinaryDeserialize gets it back:
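
A round trip can be sketched like this (the sample expression is arbitrary):

```wolfram
expr = <|"name" -> "example", "values" -> {1, 2.5, 1/3}|>;

(* Serialize to a ByteArray in WXF form *)
bytes = BinarySerialize[expr];

(* Deserializing recovers an identical expression *)
roundTrip = BinaryDeserialize[bytes] === expr
```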


Version 11.0 introduced WolframScript—a command-line interface to the Wolfram Language, running either locally or in the cloud. With WolframScript you can create standalone Wolfram Language programs that run from the shell. There are several enhancements to WolframScript itself in 11.1, but there’s also now a new New > Script menu item that gives you a notebook interface for creating .wls (=“Wolfram Language Script”) files to be run by WolframScript:


Strengthening the Infrastructure

One of the major ways the Wolfram Language has advanced in recent times has been in its deployability. We’ve put a huge amount of work into making sure that the Wolfram Language can be robustly deployed at scale (and there are now lots of examples of successes out in the world).

We make updates to the Wolfram Cloud very frequently (and invisibly), steadily enhancing server performance and user interface capabilities. Along with Version 11.1 we’ve made some major updates. There are a few signs of this in the language.

Like there’s now an option AutoCopy that can be set for any cloud object—and that means that every time the object is accessed, one should get a fresh copy of it. This is very useful if, for example, you want to have a notebook that lots of people can separately modify. (“Explore these ideas; here’s a notebook to start from…”, etc.)

CloudDeploy[APIFunction[...]] makes it extremely easy to deploy web APIs. In Version 11.1 there are some options to automate aspects of how those APIs behave. For example, there’s AllowedCloudExtraParameters, which lets you say that APIs can have parameters like "_timeout" or "_geolocation" automated. There’s also AllowedCloudParameterExtensions (no, it’s not the longest name in the system; that honor currently goes to MultivariateHypergeometricDistribution). What AllowedCloudParameterExtensions does is to let you say not just x=value, but x__url=..., or x__json=....
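
For context, here is a sketch of the basic pattern these options build on. The parameter name and the squaring function are made up for illustration, and the deployment line is left commented out since it needs a cloud connection:

```wolfram
(* An API with one integer parameter "x"; string inputs are interpreted automatically *)
api = APIFunction[{"x" -> "Integer"}, #x^2 &];

(* An APIFunction can be tried locally by giving it an association of parameter values *)
result = api[<|"x" -> "5"|>]

(* Deploying to the cloud returns a CloudObject with a callable URL *)
(* CloudDeploy[api] *)
```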

Another thing about Version 11.1 is that it’s got various features added to support private instances of the Wolfram Cloud—and our major new Wolfram Enterprise Private Cloud product (with a first version released late last year). For example, in addition to $WolframID for the Wolfram Cloud, there’s also $CloudUserID that’s generalized to allow authentication on private clouds. And inside the system, there are all sorts of new capabilities associated with “multicloud authentication” and so on. (Yes, it’s a complicated area—but the symbolic character of the Wolfram Language lets one handle it rather beautifully.)

And There’s More

OK, so I’ve summarized some of what’s in 11.1. There’s a lot more I could say. New functions, and new capabilities—each of which is going to be exciting to somebody. But to me it’s actually pretty amazing that I can write this long a post about a .1 release! It’s a great testament to the strength of the R&D pipeline—and to how much can be done with the framework we’ve built in the Wolfram Language over the past 30 years.

We always work on a portfolio of projects—from small ones that get delivered very quickly, to ones that may take a decade or more to mature. Version 11.1 has the results of several multi-year projects (e.g. in machine learning, computational geometry, etc.), and a great many shorter projects. It’s exciting for us to be able to deliver the fruits of our efforts, and I look forward to hearing what people do with Version 11.1—and to seeing successive versions continue to be developed and delivered.


]]> 0
<![CDATA[Two Hours of Experimental Mathematics]]> Mon, 06 Mar 2017 20:15:15 +0000 Stephen Wolfram experiment-thumbA Talk, a Performance… a Live Experiment “In the next hour I’m going to try to make a new discovery in mathematics.” So I began a few days ago at two different hour-long Math Encounters events at the National Museum of Mathematics (“MoMath”) in New York City. I’ve been a trustee of the museum since [...]]]> experiment-thumb

Stephen Wolfram leading a live experiment at MoMath

A Talk, a Performance… a Live Experiment

“In the next hour I’m going to try to make a new discovery in mathematics.” So I began a few days ago at two different hour-long Math Encounters events at the National Museum of Mathematics (“MoMath”) in New York City. I’ve been a trustee of the museum since before it opened in 2012, and I was looking forward to spending a couple of hours trying to “make some math” there with a couple of eclectic audiences from kids to retirees.

People usually assume that new discoveries aren’t things one can ever see being made in real time. But the wonderful thing about the computational tools I’ve spent decades building is that they make it so fast to implement ideas that it becomes realistic to make discoveries as a kind of real-time performance art.
Try the experiments for yourself in the Wolfram Open Cloud »

But mathematics is an old field. Haven’t all the “easy” discoveries already been made? Absolutely not! Mathematics has progressed along definite lines, steadily adding theorems about all sorts of things. Many great mathematicians (Gauss, Ramanujan, etc.) have done experimental mathematics to find out what’s true. But in general, experimental mathematics hasn’t been pursued nearly as much as it could or should have been. And that means that there’s a huge amount of “low-hanging fruit” still to be picked—even if one’s only got a couple of hours to spend doing it.

Experiment #1

My rule for live experiments is that to keep everything fresh I think of the topic only a few minutes before I start. But since this was my first-ever such event at the museum, I thought I should have a topic that’s somehow a big one for me. So I decided it should be something related to cellular automata—which were the very first examples I explored in the multi-decade journey that led to A New Kind of Science.

While their setup is nice and easy to understand, cellular automata are fundamentally systems from the computational universe, not “mathematical” systems. But what I thought I’d do for my first experiment was to look at some cellular automata that somehow have a traditional mathematical interpretation.

After introducing cellular automata (and the Wolfram Language), I started off talking about Pascal’s triangle—formed by making each number be the sum of its left and right neighbors at each step. Here’s the code I wrote to make Pascal’s triangle (yes, replacing 0 by “” is a bit hacky, but it makes everything much easier to read):

NestList[RotateLeft[#] + RotateRight[#] &, CenterArray[{1}, 21, 0], 10] /. 0 -> "" // Grid

If one does the same thing mod 2, one gets a rather clear pattern:

NestList[Mod[RotateLeft[#] + RotateRight[#],2] &, CenterArray[{1}, 21, 0], 10]/. 0 -> "" // Grid

And one can think of this as a cellular automaton, with this rule:


Here’s what happens if one runs this, starting from a single 1:

ArrayPlot[CellularAutomaton[90, {{1}, 0}, 50]]

And here’s the same result, from the “mathematical” code:

ArrayPlot[  NestList[Mod[RotateLeft[#] + RotateRight[#], 2] &,    CenterArray[{1}, 101, 0], 50]]
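
The claim that the two pictures are the same can be checked directly rather than by eye. A small sketch: with 50 steps the rule 90 pattern is exactly 101 cells wide, so the cellular automaton output and the modular-arithmetic version should agree cell for cell.

```wolfram
ca = CellularAutomaton[90, {{1}, 0}, 50];

math = NestList[Mod[RotateLeft[#] + RotateRight[#], 2] &, CenterArray[{1}, 101, 0], 50];

(* Both are 51 x 101 arrays of 0s and 1s *)
{Dimensions[ca], Dimensions[math], ca === math}
```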

OK, so what happens if we change the math a bit? Instead of using mod 2, let’s use mod 5:

ArrayPlot[  NestList[Mod[RotateLeft[#] + RotateRight[#], 5] &,    CenterArray[{1}, 101, 0], 50]]

It’s still a regular pattern. But here was my idea for the experiment: explore what happens if the rule involves mathematical operations other than pure addition.

So what about multiplication? I was mindful of the fact that all the 0s in the initial conditions would tend to make a lot of 0s. So I thought: let’s try adding constants before doing the multiplication. And here’s the first thing I tried:

ArrayPlot[  NestList[Mod[(1 + RotateLeft[#])*(2 + RotateRight[#]), 5] &,    CenterArray[{1}, 101, 0], 50]]

I was pretty surprised. I wasn’t expecting anything that complicated. But, OK, I thought, let’s back off and try an even simpler rule: let’s use mod 3 instead of mod 5. (Mod 2 would already have been covered by my exhaustive study of the “elementary cellular automata”.)

Here’s the result I got:

ArrayPlot[  NestList[Mod[(1 + RotateLeft[#])*(2 + RotateRight[#]), 3] &,    CenterArray[{1}, 101, 0], 50]]

I immediately said, “I wonder how fast that pattern grows.” I guessed it might be a logarithm or a square root.

But before going on, I wanted to scope out what else was there in the space of rules like this. Just to check, I ran the mod 2 case. As expected, nothing interesting.

Table[ArrayPlot[   NestList[Mod[(a + RotateLeft[#])*(b + RotateRight[#]), 2] &,     CenterArray[{1}, 101, 0], 50]], {a, 0, 1}, {b, 0, 1}]

OK, now the mod 3 case:

Table[ArrayPlot[   NestList[Mod[(a + RotateLeft[#])*(b + RotateRight[#]), 3] &,     CenterArray[{1}, 101, 0], 50]], {a, 0, 2}, {b, 0, 2}]

An interesting little collection. But then it was time to analyze the growth of those patterns.

The first step, as suggested by someone in the audience, was just to rotate the list every step, to make the straight edge be vertical:

ArrayPlot[  NestList[RotateRight[     Mod[(1 + RotateLeft[#])*(2 + RotateRight[#]), 3]] &,    CenterArray[{1}, 100, 0], 50]]

Then we picked every other step, to get rid of the horizontal stripes:

ArrayPlot[  Take[NestList[    RotateRight[Mod[(1 + RotateLeft[#])*(2 + RotateRight[#]), 3]] &,     CenterArray[{1}, 200, 0], 200], 1 ;; -1 ;; 2]]

And—when in doubt—just run it longer, here for 3000 steps. Well, my guess about square root or logarithm was wrong: this looks roughly linear, albeit irregular.

ArrayPlot[  Take[NestList[    RotateRight[Mod[(1 + RotateLeft[#])*(2 + RotateRight[#]), 3]] &,     CenterArray[{1}, 3000, 0], 3000], 1 ;; -1 ;; 2]]

I was disappointed that this was so gray and hard to read. Trying colors didn’t help, though; the pattern is just rather sparse.

Well, then I tried to just plot the position of the right-hand edge. Here’s the code I came up with:

data = (First[#] - #) &[    Flatten[FirstPosition[Reverse[#], 1 | 2] & /@       Take[NestList[        RotateRight[          Mod[(1 + RotateLeft[#])*(2 + RotateRight[#]), 3]] &,         CenterArray[{1}, 2000, 0], 2000], 1 ;; -1 ;; 2]]];

Here’s a fit:

Fit[data, {1, x}, x]

OK, how can one get some better analysis? First, I took differences to see the growth at each step: always either 0 or 2 cells. Then I looked for runs of growth or no growth. And then I looked specifically for runs of growth, and saw how long the successive runs were.

runs = Length /@ Take[Split[Differences[data]], 1 ;; -1 ;; 2];
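
To see what that one-liner is doing, here is the same idiom on a short made-up list of differences. Split groups identical consecutive values, and the 1 ;; -1 ;; 2 span then keeps every other group (the first, third, and so on), so one ends up with the lengths of alternate runs:

```wolfram
toyDifferences = {2, 2, 0, 2, 0, 0, 2, 2, 2};

(* Group consecutive equal values *)
groups = Split[toyDifferences]
(* {{2, 2}, {0}, {2}, {0, 0}, {2, 2, 2}} *)

(* Keep every other run, starting with the first, and measure each *)
toyRuns = Length /@ Take[groups, 1 ;; -1 ;; 2]
(* {2, 1, 3} *)
```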

What is this? Being New York, there were lots of finance people in the audience—including in the front row a world expert on power laws. So the obvious question was, did the spikes have a power-law distribution of sizes? The results, based on the data I had, were inconclusive:


But instead of looking further at this particular rule, I decided to take a quick look at the case of higher moduli. These are the results I got for mod 4:

Table[ArrayPlot[   NestList[Mod[(a + RotateLeft[#])*(b + RotateRight[#]), 4] &,     CenterArray[{1}, 100, 0], 50], PlotLabel -> {a, b}], {a, 0, 3}, {b,    0, 3}]

There was one that looked interesting here:

ArrayPlot[  NestList[Mod[(1 + RotateLeft[#])*(1 + RotateRight[#]), 4] &,    CenterArray[{1, 2, 1}, 100, 0], 50]]

Would it end up having lots of different possible structures? Trying it with random initial conditions made it look like it was never going to have anything other than repetitive behavior:

ArrayPlot[  NestList[Mod[(1 + RotateLeft[#])*(1 + RotateRight[#]), 4] &,    RandomInteger[2, 100], 50]]

Well, by this point our time was basically up. But it was hard to stop. I quickly tried the case of mod 5—and discovered all sorts of interesting behavior:

Table[ArrayPlot[   NestList[Mod[(a + RotateLeft[#])*(b + RotateRight[#]), 5] &,     CenterArray[{1, 2}, 100, 0], 50], PlotLabel -> {a, b}], {a, 0,    4}, {b, 0, 4}]

I just had to check out a couple of these. One that has an overall nested pattern, but with lots of complicated stuff going on “in the background”:

ArrayPlot[  NestList[Mod[(3 + RotateLeft[#])*(3 + RotateRight[#]), 5] &,    CenterArray[{1, 2}, 400, 0], 200]]

And one with a mixture of regular and irregular growth:

ArrayPlot[  Take[NestList[Mod[(2 + RotateLeft[#])*(4 + RotateRight[#]), 5] &,     CenterArray[{1, 2}, 2000, 0], 1000], 1 ;; -1 ;; 2]]

It was time to stop. But I was pretty satisfied. Live experiments are always risky. And we might have found nothing interesting. But instead we found something really interesting: rich and complex behavior based on iterating rules given by simple algebraic formulas. In a sense what we found is an example of a bridge between traditional mathematical constructs (like algebraic formulas), and pure computational systems, with arbitrary computational rules. In an hour we certainly didn’t finish—but we found a seed for all sorts of future research—on what we might call “MoMath Cellular Automata.”

Experiment #2

After a break, it was time for experiment #2. This time I decided to do something more related to numbers. I started by talking about reversal-addition systems—where at each step one adds a number to the number obtained by reversing its digits. I showed the result for base 10, starting from the number 123:

PadLeft[IntegerDigits[    NestList[FromDigits[Reverse[IntegerDigits[#]]] + # &, 123,      100]]] // ArrayPlot

Then I said, “Instead of reversing the digits, let’s just rotate them to the left. And let’s make the system simpler, by using base 2 instead of base 10.”

This was the sequence of numbers obtained, starting from 1:

NestList[FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + # &, 1, 10]
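
It can help to spell out a single step. In this sketch the helper name rotateAdd is made up; it just packages the pure function above. Taking 6 as an example: its binary digits are {1, 1, 0}, rotating left gives {1, 0, 1}, which is 5, and 6 + 5 = 11.

```wolfram
(* Hypothetical helper: rotate the base-2 digits left, then add the original number *)
rotateAdd[k_Integer] := FromDigits[RotateLeft[IntegerDigits[k, 2]], 2] + k

rotateAdd[6]
(* 11 *)

NestList[rotateAdd, 1, 6]
(* {1, 2, 3, 6, 11, 18, 23} *)
```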

Someone asked whether it was a recognizable sequence. FindSequenceFunction didn’t think so:


Then the question was, what’s the overall pattern? Here’s the result for 100 steps:

PadLeft[IntegerDigits[NestList[FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + # &, 1, 100], 2]] // ArrayPlot

It looks remarkably complex. And doing 1000 steps doesn’t make it look any simpler:

PadLeft[IntegerDigits[NestList[FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + # &, 1, 1000], 2]] // ArrayPlot

What about starting with something other than 1?

Table[PadLeft[    IntegerDigits[     NestList[FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + # &, n,       100], 2]] // ArrayPlot, {n, 10}]

All pretty similar. I wondered if rotating right, rather than left, would make a difference. It really didn’t:

Table[PadLeft[IntegerDigits[NestList[FromDigits[RotateRight[IntegerDigits[#, 2]], 2] + # &, n, 100], 2]] // ArrayPlot, {n, 10}]

I thought maybe it’d be interesting to have a fixed number of digits, so I tried reducing mod 2^20, to keep only the last 20 digits:

Table[PadLeft[IntegerDigits[NestList[Mod[FromDigits[RotateRight[IntegerDigits[#, 2]], 2] + #, 2^20] &, n, 100], 2]] // ArrayPlot, {n, 10}]

Table[FindTransientRepeat[NestList[Mod[FromDigits[RotateRight[IntegerDigits[#, 2]], 2] + #, 2^n] &, 1, 1000], 4], {n, 8}] // Column
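With only a few digits kept, the repetition is easy to see directly. The sketch below works through the 3-digit case; modStep is a hypothetical helper name, and the choice of 3 digits and 8 steps is just for illustration.

```wolfram
(* one step of the rotate-right-and-add rule, keeping only the last n binary digits *)
modStep[n_Integer][x_Integer] :=
  Mod[FromDigits[RotateRight[IntegerDigits[x, 2]], 2] + x, 2^n]

(* with 3 digits, the sequence starting from 1 returns to 1 after four steps *)
NestList[modStep[3], 1, 8]
(* {1, 2, 3, 6, 1, 2, 3, 6, 1} *)

(* FindTransientRepeat splits a long run into {transient, repeat} *)
FindTransientRepeat[NestList[modStep[3], 1, 1000], 4]
```

The step from 6 is the one where the modulus matters: 6 is 110, rotating right gives 011, which is 3, and 6 + 3 = 9 reduces to 1 mod 8.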

Then I decided to make complete transition graphs for all 2^n states in each case. Curious-looking pictures, but not immediately illuminating.

Table[Labeled[   Graph[# ->        Mod[FromDigits[RotateRight[IntegerDigits[#, 2]], 2] + #,         2^n] & /@ Range[2^n]], n], {n, 2, 9}]
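As a small sketch of what goes into one of these graphs, here are the transition rules for the 2-digit case written out explicitly. This version assumes the states are the residues 0 through 3, which differs slightly from the Range[2^n] used in the code above (that range runs from 1 to 2^n); modStep is again a hypothetical helper name.

```wolfram
(* rotate-right-and-add step on n-digit states *)
modStep[n_Integer][x_Integer] :=
  Mod[FromDigits[RotateRight[IntegerDigits[x, 2]], 2] + x, 2^n]

(* one rule per state: state -> successor *)
rules = Table[x -> modStep[2][x], {x, 0, 3}]
(* {0 -> 0, 1 -> 2, 2 -> 3, 3 -> 2} *)

Graph[rules, VertexLabels -> "Name"]
```

Here 0 is a fixed point, while 1 feeds into the two-state cycle between 2 and 3. Each state has exactly one outgoing edge, so every connected piece of such a graph consists of trees feeding into a single cycle.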

By now I was wondering: “Is there a still simpler system involving digit rotation that does something interesting?” I wondered what would happen if instead of adding in the original number at each step, I just multiplied by 2, and added some constant. This didn’t immediately lead to anything interesting:

Table[PadLeft[    IntegerDigits[     NestList[2 FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + n &,       1, 100], 2]] // ArrayPlot, {n, 6}]

So then I wondered about multiplying by 3:

Table[PadLeft[    IntegerDigits[     NestList[3 FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + n &,       1, 100], 2]] // ArrayPlot, {n, 7}]

Again, nothing too exciting. But—just to be complete—I thought I’d better run the experiment of looking at a sequence of other multipliers.

Table[Labeled[   PadLeft[IntegerDigits[      NestList[a FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + 1 &,        1, 100], 2]] // ArrayPlot, a], {a, 20}]

Similar behavior until—aha!—something weird and complicated happens when one gets to multiplier 13.
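To see what a single multiplier does at the level of numbers rather than pictures, here is a minimal sketch. The helper rotMulStep and its argument names are introduced for illustration only; the rule itself is the one in the code above with the added constant fixed at 1.

```wolfram
(* multiply the left-rotated binary digits by a, then add the constant c *)
rotMulStep[a_Integer, c_Integer][x_Integer] :=
  a FromDigits[RotateLeft[IntegerDigits[x, 2]], 2] + c

(* multiplier 5: in these first steps each value is 4 times the previous one plus 2 *)
NestList[rotMulStep[5, 1], 1, 4]
(* {1, 6, 26, 106, 426} *)

IntegerDigits[426, 2]
(* {1, 1, 0, 1, 0, 1, 0, 1, 0} *)

(* multiplier 13, for comparison; run more steps to see the digits become irregular *)
IntegerDigits[#, 2] & /@ NestList[rotMulStep[13, 1], 1, 10]
```

For multiplier 5 the binary digits settle into an alternating pattern, which is what shows up as a simple picture in the array plot.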

There was an immediate guess from the audience that primes might be special. But that theory was quickly exploded by the case of multiplier 21.

Table[Labeled[   PadLeft[IntegerDigits[      NestList[a FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + 1 &,        1, 100], 2]] // ArrayPlot, a], {a, 20, 24}]

OK, so then the hunt was on for what was special about the multipliers that led to complex behavior. But first we had to figure out how to recognize complex behavior. I thought I’d try something newfangled: using machine learning to make a feature space plot of the images for different multipliers.

It was somewhat interesting—and a nice application of machine learning—but not immediately too useful. (To make it better, one would have to think harder about the feature extractor to use.)

imags = Table[    PadLeft[IntegerDigits[       NestList[a FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + 1 &,         1, 100], 2]] // Image, {a, 50}]; FeatureSpacePlot[imags]

So how could one tell from that which were the complex patterns? A histogram of entropies wasn’t obviously illuminating:

Histogram[Entropy /@ imags]

As I was writing this blog post, I thought I should find the entropy distribution more accurately; even including 1000 possible multipliers, it still doesn’t seem terribly helpful:

Histogram[  Entropy /@    Table[PadLeft[      IntegerDigits[       NestList[a FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + 1 &,         1, 100], 2]] // Image, {a, 1000}]]

An expert in telecom math in the front row suggested taking a Fourier transform. I said I wasn’t hopeful:


Yes, there are better ways to do the Fourier transform. But someone else (a hedge-fund CEO, as it happened) suggested looking at the occurrences of particular 2×2 blocks in each pattern. For the case of multiplier 13, lots of blocks occur:

Counts[Flatten[   Partition[    PadLeft[IntegerDigits[      NestList[13 FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + 1 &,        1, 20], 2]], {2, 2}], 1]]

But for the case of multiplier 5, where the pattern is simple, most blocks never occur:

Counts[Flatten[   Partition[    PadLeft[IntegerDigits[      NestList[5 FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + 1 &,        1, 20], 2]], {2, 2}], 1]]
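The block counting can be checked on a tiny array. The sketch below uses a 4×4 checkerboard as a stand-in for the padded digit array (the matrix m is made up for illustration) and shows that Partition with {2, 2} cuts the array into non-overlapping blocks, so a pattern shifted by one cell is only seen when an explicit offset is given.

```wolfram
(* a 4x4 checkerboard standing in for the padded digit array *)
m = {{1, 0, 1, 0}, {0, 1, 0, 1}, {1, 0, 1, 0}, {0, 1, 0, 1}};

(* Partition[m, {2, 2}] gives a 2x2 array of non-overlapping 2x2 blocks;
   Flatten[..., 1] turns that into a flat list of four blocks *)
blocks = Flatten[Partition[m, {2, 2}], 1];
Counts[blocks]
(* <|{{1, 0}, {0, 1}} -> 4|> *)

(* sliding the window one cell at a time also picks up the shifted block *)
Counts[Flatten[Partition[m, {2, 2}, {1, 1}], 1]]
(* <|{{1, 0}, {0, 1}} -> 5, {{0, 1}, {1, 0}} -> 4|> *)
```

The counts in the experiment use the non-overlapping form, so the number of distinct blocks it reports is a lower bound on what a sliding window would find.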

So this suggested that we just generate a list of how many of the 16 possible blocks actually do occur, for each multiplier:

blks = Table[   Length[Union[     Flatten[Partition[       PadLeft[IntegerDigits[         NestList[          a FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + 1 &, 1,           20], 2]], {2, 2}], 1]]], {a, 50}]

Here’s a plot:


Where are the 16s?

Flatten[Position[blks, 16]]

FindSequenceFunction didn’t have any luck with these numbers. Plotting the “block count” for longer gave this:

ListLinePlot[  Table[Length[    Union[Flatten[      Partition[       PadLeft[IntegerDigits[         NestList[          a FromDigits[RotateLeft[IntegerDigits[#, 2]], 2] + 1 &, 1,           20], 2]], {2, 2}], 1]]], {a, 1000}]]

Definitely some structure. But it’s not clear what it is.

And once again, we were out of time—having found an interesting kind of system with the curious property that it’s usually complex in its behavior, but for some special cases, isn’t.

The Live Experiment Process

I’ve done many live experiments over the years—though it’s been a while since they were about math. And as the Wolfram Language has evolved, it’s become easier and easier to do the experiments nicely and smoothly—without time wasted on glitches and debugging.

Wolfram Notebooks have the nice little feature that they (by default) keep a Notebook History (see the Cell menu)—that shows when each cell in the notebook has been modified. Here are the results for Experiment #1 and Experiment #2. Mostly they show rather linear progress, with comparatively little backtracking. (There’s a gap in Experiment #2, which came because my network connection suddenly stopped working. Conveniently, there were some networking experts in the audience—and eventually it was determined that the USB-C connection from my fine new computer to the projector had somehow misnegotiated itself as an Ethernet connection…)

Cell history

Every year at our Summer School I start out by doing a live experiment or two—because I think live experiments are a great way to show just how accessible discovery can be if one approaches it the right way, and with the right tools. I’m expecting that live experiments will be an important part of the process of educating people about computational thinking too.

With the Wolfram Language, one can do live experiments—and live coding—about all sorts of things. (We even tried doing a prototype Live Coding Competition at our last Wolfram Technology Conference; it worked well, and we’ll probably develop it into a whole program.)

But whether they’re live or not, computer experiments are an incredibly powerful methodology for making discoveries—not least in mathematics.

Of course, it’s easy to generate all kinds of random facts about mathematics. The issue is: how does one generate “interesting” facts? In a first approximation, for a fact to be interesting to us humans, it has to relate to things we care about. Those things could be technological applications, observations about the real world—or just pieces of mathematics that have, for whatever reason, historically been studied (think Fermat’s Last Theorem, for example).

I like to think that my book A New Kind of Science significantly broadened the kinds of “math-like facts” that one might consider “interesting”—by providing a general intellectual framework (about computation, complexity, and so on) into which those facts can be fit.

But part of the skill needed to do good experimental mathematics is to look for facts that somehow can ultimately be related to larger frameworks, and ultimately to the traditions of mathematics. Like in any area of research, it takes experience and intuition—and luck can help too.

But in experimental mathematics, it’s extremely easy to get started: there’s plenty of fertile territory to be explored, even with quite elementary mathematical ideas. We just happen to live at a time when the tools to make this kind of exploration feasible first exist. (Of course, I’ve spent a lot of my life building them…)

How should experimental mathematics be done? Perhaps there could be “math-a-thons” (or “discover-a-thons”), analogous to hackathons, where the output is math papers, not software projects.

More than 30 years ago I started the journal Complex Systems—and one of my long-term goals was to make it a repository for results in experimental mathematics. It certainly has published plenty of them, but the standard form of modern academic papers isn’t optimized for experimental mathematics. Instead, one can imagine some kind of “Discoveries in Experimental Mathematics,” that is much more oriented towards straightforward reports of the results of experiments.

In some ways it would be a return to an earlier style of scientific publishing, like all those papers from the 1800s reporting sightings of strange new animals or physical phenomena. But what’s new today is that with the Wolfram Language, and particularly with Notebook documents, it’s possible not just to report on what one’s seen, but instead to give a completely reproducible version of it, that anyone else can also explore. (And if there’s a giant computation involved, to store the results in a cloud object.)

I’m hoping that finally it’ll now be possible to establish a serious ecosystem for experimental mathematics. A place where results can be found, presented clearly with good visualization and so on, and published in a form where others can build on them. It’s been a long time coming, but I think it’s going to be an important direction for mathematics going forward.

And it was fun for me (and I hope for the audience too) to spend a couple of hours prototyping it live and in public a few days ago.

Download the complete notebooks:
Session #1/Experiment #1 »
Session #2/Experiment #2 »

Try the experiments for yourself in the Wolfram Open Cloud »

]]> 1
<![CDATA[Launching Wolfram|Alpha Open Code]]> Mon, 12 Dec 2016 18:56:18 +0000 Stephen Wolfram opencode-thumbCode for Everyone Computational thinking needs to be an integral part of modern education—and today I’m excited to be able to launch another contribution to this goal: Wolfram|Alpha Open Code. Every day, millions of students around the world use Wolfram|Alpha to compute answers. With Wolfram|Alpha Open Code they’ll now not just be able to get [...]]]> opencode-thumb

Wolfram|Alpha and Wolfram Language logos

Code for Everyone

Computational thinking needs to be an integral part of modern education—and today I’m excited to be able to launch another contribution to this goal: Wolfram|Alpha Open Code.

Every day, millions of students around the world use Wolfram|Alpha to compute answers. With Wolfram|Alpha Open Code they’ll now not just be able to get answers, but also be able to get code that lets them explore further and immediately apply computational thinking.

It takes a lot of sophisticated technology to make this possible. But to the user, it’s simple. Do a computation with Wolfram|Alpha. Now in almost every section of the output you’ll see an “Open Code” link. Click it and Wolfram|Alpha will generate code for you, then open it in a fully runnable and editable notebook that you can immediately use in the Wolfram Open Cloud:

x^2 sin x in Wolfram|Alpha

The sections of the notebook parallel the sections of your Wolfram|Alpha output. But now each section contains not results, but instead core Wolfram Language code needed to get those results. You can run any piece of code by clicking the [>] button (or typing Shift+Enter):

Running code in the cloud

But the really important thing is that right there on the web you can change and extend the code, and then instantly run it again:

Plot[x^2Sin[x]/(1+Tan[x]), {x, -6.3, 6.3}]

The Power of Code

If all someone wants is a single, quick result, then classic Wolfram|Alpha should be all they’ll need. But as soon as they want to go further—that’s where Wolfram|Alpha Open Code comes in.

Let’s say you just got a mathematical result from Wolfram|Alpha:


But then you wonder: “what happens for a whole range of exponents?” Well, it’s going to get pretty complicated to tell Wolfram|Alpha what you want just using natural language. But it’s easy to say what to do by giving a tiny bit of Wolfram Language code (and, yes, you can interactively spin those 3D surfaces around):

Table[Plot3D[x^2 Cos[n x] Sin[y], {x, -3.1, 3.1}, {y, -6.6, 6.6}], {n, 0, 4}]

You could give code to interactively change the parameters too:

Manipulate[Plot3D[x^2 Cos[n x] Sin[y], {x, -3.1, 3.1}, {y, -6.6, 6.6}], {n, 0, 10}]

Starting with Wolfram|Alpha, then extending using the Wolfram Language, is very powerful. Here’s what happens with some real-world data. Start in Wolfram|Alpha, then get the underlying Wolfram Language code (it can be made shorter, but then it’s a little less clear what’s going on):

Italy GDP

Evaluate the code to get a time series. Then plot it. And divide by the corresponding result for the US:

DateListPlot[%]

DateListPlot[Entity["Country", "Italy"][EntityProperty["Country", "GDP", {"Date" -> All, "CurrencyUnit" -> "CurrentUSDollar"}]]/Entity["Country", "UnitedStates"][EntityProperty["Country", "GDP", {"Date" -> All, "CurrencyUnit" -> "CurrentUSDollar"}]], Filling -> Axis]

An important feature of notebooks is that they’re full, computable documents—and you can add whatever you want to them. You can do a whole series of computations. You can put in text to annotate what you’re doing. You can add section headings. You can edit out parts you don’t need. And so on. And of course you can do all of this in the cloud, using any modern web browser.

The Ulterior Motive

Wolfram|Alpha Open Code is going to be really useful to a lot of people—not just students. But when I invented it my immediate objective was very much educational: I wanted to be able to give the millions of students who use Wolfram|Alpha every day a taste of the power of code, and what can be achieved if one learns about code and computational thinking.

Computational thinking is a critically important skill for the future. And after 30 years of development we’re at the exciting point with the Wolfram Language of being able to directly teach serious computational thinking to a very wide range of students. I see Wolfram|Alpha Open Code as opening a window into the world of computational thinking for all the students who use Wolfram|Alpha.

There’s no learning curve to climb with Wolfram|Alpha: you just type in your question, directly in natural language. But now with Wolfram|Alpha Open Code you can explicitly see how your question gets interpreted computationally. And as soon as you want to go further, you’re immediately doing computational thinking, and writing code. You’re not doing an abstract coding exercise, or creating code in some toy context. You’re immediately using code to formulate computational ideas and get results about something you’re working on.

Of course, what makes this feasible is the character of the Wolfram Language—and its uniquely high-level knowledge-based nature. Because that’s what allows real computations that you want to do to be expressed in small amounts of code that can readily be understood and modified or extended.

Yes, the Wolfram Language has a definite structure and syntax, based on definite principles. But that’s a lot of what makes it easy to understand and to write. And in a notebook you’re always getting suggestions about what to type—and if your browser language is set to something other than English you’ll often get annotations in that language too. And the code you get from using Wolfram|Alpha Open Code will continually illustrate the core principles of the Wolfram Language.

Paths into Computational Thinking

Over the course of the past year, we’ve introduced two important paths into computational thinking, both supported by Wolfram Programming Lab, and available free in the Wolfram Open Cloud.

The first path is to start from Explorations: small projects created using code, that a student can immediately dive into, and then modify and interact with. The second path is to systematically learn the Wolfram Language, for example using my book An Elementary Introduction to the Wolfram Language.

And now Wolfram|Alpha Open Code provides a third path: start from a question that a student has asked, and then automatically generate custom code that provides a starting point for further work and thinking.

It’s a nice complement to the other two paths—and perhaps it’ll often provide encouragement to pursue one or the other of them. But it’s a perfectly good path all by itself—and students can go a long way following it.

Of course, under the hood, there’s a remarkable amount of sophisticated technology that’s being used. There’s the whole natural-language understanding system of Wolfram|Alpha that’s understanding the original question. There’s the Wolfram|Alpha computational knowledge system that’s formulating what pieces of code to generate. Then there’s the Wolfram Open Cloud, providing an interactive notebook environment on the web capable of running the code. And at the center of all of it is the Wolfram Language, with its whole integrated design and vast built-in capabilities and knowledge.

It’s taken 30 years of development to get to this point. But now we’ve been able to put everything together to create a very powerful path for students to get into computational thinking.

And I have to say that for me it’s exciting to think about kids out there using Wolfram|Alpha just for homework, but then pressing the Open Code button, and suddenly being transported into the world of code and computational thinking—and perhaps starting on a lifelong journey.

I’m thrilled to be able to provide the tools that make this possible. Try it out. Tell us what you think. Share what you do, and show others what’s possible.

To comment, please visit the copy of this post at the Wolfram Blog »

]]> 0
<![CDATA[Quick, How Might the Alien Spacecraft Work?]]> Thu, 10 Nov 2016 15:58:50 +0000 Stephen Wolfram arrival-thumb[This post is about the movie Arrival; there are no movie spoilers here.] Connecting with Hollywood “It’s an interesting script” said someone on our PR team. It’s pretty common for us to get requests from movie-makers about showing our graphics or posters or books in movies. But the request this time was different: could we [...]]]> arrival-thumb

[This post is about the movie Arrival; there are no movie spoilers here.]

Arrival trailer image

Connecting with Hollywood

“It’s an interesting script” said someone on our PR team. It’s pretty common for us to get requests from movie-makers about showing our graphics or posters or books in movies. But the request this time was different: could we urgently help make realistic screen displays for a big Hollywood science fiction movie that was just about to start shooting?

Well, in our company unusual issues eventually land in my inbox, and so it was with this one. Now it so happens that through some combination of relaxation and professional interest I’ve probably seen basically every mainstream science fiction movie that’s appeared over the past few decades. But just based on the working title (“Story of Your Life”) I wasn’t even clear that this movie was science fiction, or what it was at all.

But then I heard that it was about first contact with aliens, and so I said “sure, I’ll read the script”. And, yes, it was an interesting script. Complicated, but interesting. I couldn’t tell if the actual movie would be mostly science fiction or mostly a love story. But there were definitely interesting science-related themes in it—albeit mixed with things that didn’t seem to make sense, and a liberal sprinkling of minor science gaffes.

When I watch science fiction movies I have to say I quite often cringe, thinking, “someone’s spent $100 million on this movie—and yet they’ve made some gratuitous science mistake that could have been fixed in an instant if they’d just asked the right person”. So I decided that even though it was a very busy time for me, I should get involved in what’s now called Arrival and personally try to give it the best science I could.

There are, I think, several reasons Hollywood movies often don’t get as much science input as they should. The first is that movie-makers usually just aren’t sensitive to the “science texture” of their movies. They can tell if things are out of whack at a human level, but they typically can’t tell if something is scientifically off. Sometimes they’ll get as far as calling a local university for help, but too often they’re sent to a hyper-specialized academic who’ll not-very-usefully tell them their whole story is wrong. Of course, to be fair, science content usually doesn’t make or break movies. But I think having good science content—like, say, good set design—can help elevate a good movie to greatness.

As a company we’ve had a certain amount of experience working with Hollywood, for example writing all the math for six seasons of the television show Numb3rs. I hadn’t personally been involved—though I have quite a few science friends who’ve helped with movies. There’s Jack Horner, who worked on Jurassic Park, and ended up (as he tells it) pretty much having all his paleontology theories in the movie, including ones that turned out to be wrong. And then there’s Kip Thorne (famous for the recent triumph of detecting gravitational waves), who as a second career in his 80s was the original driving force behind Interstellar—and who made the original black-hole visual effects with Mathematica. From an earlier era there was Marvin Minsky who consulted on AI for 2001: A Space Odyssey, and Ed Fredkin who ended up as the model for the rather eccentric Dr. Falken in WarGames. And recently there was Manjul Bhargava, who for a decade shepherded what became The Man Who Knew Infinity, eventually carefully “watching the math” in weeks of editing sessions.

All of these people had gotten involved with movies much earlier in their production. But I figured that getting involved when the movie was about to start shooting at least had the advantage that one knew the movie was actually going to get made (and yes, there’s often a remarkably high noise-to-signal ratio about such things in Hollywood). It also meant that my role was clear: all I could do was try to uptick and smooth out the science; it wasn’t even worth thinking about changing anything significant in the plot.

The inspiration for the movie had come from an interesting 1998 short story by Ted Chiang. But it was a conceptually complicated story, riffing off a fairly technical idea in mathematical physics—and I wasn’t alone in wondering how anyone could possibly make a movie out of it. Still, there it was, a 120-page script that basically did it, with some science from the original story, and quite a lot added, mostly still in a rather “lorem ipsum” state. And so I went to work, making comments, suggesting fixes, and so on.

A Few Weeks Later…

Cut to a few weeks later. My son Christopher and I arrive on the set of Arrival in Montreal. The latest X-Men movie is filming at a huge facility next door. Arrival is at a more modest facility. We get there when they’re in the middle of filming a scene inside a helicopter. We can’t see the actors, but we’re watching on the “video village” monitor, along with a couple of producers and other people.

The first line I hear is “I’ve prepared a list of questions [for the aliens], starting with some binary sequences…”. And I’m like “Wow, I suggested saying that! This is great!” But then there’s another take. And a word changes. And then there are more takes. And, yes, the dialogue sounds smoother. But the meaning isn’t right. And I’m realizing: this is more difficult than I thought. Lots of tradeoffs. Lots of complexity. (Happily, in the final movie, it ends up being a blend, with the right meaning, and sounding good.)

After a while there’s a break in filming. We talk to Amy Adams, who plays a linguist assigned to communicate with the aliens. She’s spent some time shadowing a local linguistics professor, and is keen to talk about the question of how much the language one uses determines how one thinks—which is a topic that as a computer-language designer I’ve long been interested in. But what the producers really want is for me to talk to Jeremy Renner, who plays a physicist in the movie. He’s feeling out of sorts right then—so off we go to look at the “science tent” set they’ve built and think about what visuals will work with it.

Me and Christopher on set

Writing Code

The script made it clear that there were going to be lots of opportunities for interesting visuals. But much as I might have found it fun, I just didn’t personally have the time to work on creating them. Fortunately, though, my son Christopher—who is a very fast and creative programmer—was interested in doing it. We’d hoped to just be able to ship him off to the set for a week or two, but it was decided he was still too young, so he started off working remotely.

His basic strategy was simple: just ask “if we were doing this for real, what analysis and computations would we be doing?”. We’ve got a list of alien landing sites; what’s the pattern? We’ve got geometric data on the shape of the spacecraft; what’s its significance? We’ve got alien “handwriting”; what does it mean?

Collage of visualizations

The movie-makers were giving Christopher raw data, just like in real life, and he was trying to analyze it. And he was turning each question that was asked into all sorts of Wolfram Language code and visualizations.

Christopher was well aware that code shown in movies often doesn’t make sense (a favorite, regardless of context, seems to be the source code for nmap.c in Linux). But he wanted to create code that would make sense, and would actually do the analyses that would be going on in the movie.

GeoGraphics[{Thickness[0.001], {Red,     GeoPath /@ (List @@@        EdgeList[NearestNeighborGraph[landingSites, 3]])},    Table[GeoDisk[#, Quantity[n, "Miles"]] & /@ landingSites, {n, 0,      1000, 250}], Red, GeoStyling[Opacity[1]],    GeoDisk[#, Quantity[50, "Miles"]] & /@ landingSites},   GeoRange -> "World",   GeoProjection -> "WagnerII", GeoZoomLevel -> 3]

Module[{i = image from previous output , corners = ImageCorners[i, 3, 0.1, 5]},  Show[{i,    Graphics[{      {Orange, Thickness[0.003],        Outer[If[#1 === #2, {}, {Opacity[            3000/EuclideanDistance[#1, #2]^2], Line[{#1, #2}]}] &,         corners, corners, 1]},      {EdgeForm[Green], FaceForm[],        Rectangle[# - 10, # + 10] & /@ corners}      }]}]]

In the final movie, the screen visuals are a mixture of ones Christopher created, ones derived from what he created, and ones that were put in separately. Occasionally one can see code. Like there’s a nice shot of rearranging alien “handwriting”, in which one sees a Wolfram Language notebook with rather elegant Wolfram Language code in it. And, yes, those lines of code actually do the transformation that’s in the notebook. It’s real stuff, with real computations being done.

A Theory of Interstellar Travel

When I first started looking at the script for the movie, I quickly realized that to make coherent suggestions I really needed to come up with a concrete theory for the science of what might be going on. Unfortunately there wasn’t much time—and in the end I basically had just one evening to invent how interstellar space travel might work. Here’s the beginning of what I wrote for the movie-makers about what I came up with that evening (to avoid spoilers I’m not showing more):

Science (Fiction) of Interstellar Spacecraft

Obviously all these physics details weren’t directly needed in the movie. But thinking them through was really useful in making consistent suggestions about the script. And they led to all sorts of science-fictiony ideas for dialogue. Here are a few of the ones that (probably for the better) didn’t make it into the final script. “The whole ship goes through space like one giant quantum particle”. “The aliens must directly manipulate the spacetime network at the Planck scale”. “There’s spacetime turbulence around the skin of the ship”. “It’s like the skin of the ship has an infinite number of types of atoms, not just the 115 elements we know” (that was going to be related to shining a monochromatic laser at the ship and seeing it come back looking like a rainbow). It’s fun for an “actual scientist” like me to come up with stuff like this. It’s kind of liberating. Especially since every one of these science-fictiony pieces of dialogue can lead one into a long, serious, physics discussion.

For the movie, I wanted to have a particular theory for interstellar travel. And who knows, maybe one day in the distant future it’ll turn out to be correct. But as of now, we certainly don’t know. In fact, for all we know, there’s just some simple “hack” in existing physics that’ll immediately make interstellar travel possible. For example, there’s even some work I did back in 1982 that implies that with standard quantum field theory one should, almost paradoxically, be able to continually extract “zero point energy” from the vacuum. And over the years, this basic mechanism has become what’s probably the most quoted potential propulsion source for interstellar travel, even if I myself don’t actually believe in it. (I think it takes idealizations of materials much too far.)

Maybe (as has been popular recently) there’s a much more prosaic way to propel at least a tiny spacecraft, by pushing it at least to nearby stars with radiation pressure from a laser. Or maybe there’s some way to do “black hole engineering” to set up appropriate distortions in spacetime, even in the standard Einsteinian theory of gravity. It’s important to realize that even if (when?) we know the fundamental theory of physics, we still may not immediately be able to determine, for example, whether faster-than-light travel is possible in our universe. Is there some way to set up some configuration of quantum fields and black holes and whatever so that things behave just so? Computational irreducibility (related to undecidability, Gödel’s Theorem, the Halting Problem, etc.) tells one that there’s no upper bound on just how elaborate and difficult-to-set-up the configuration might need to be. And in the end one could use up all the computation that can be done in the history of the universe—and more—trying to invent the structure that’s needed, and never know for sure if it’s impossible.

What Are Physicists Like?

When we’re visiting the set, we eventually meet up with Jeremy Renner. We find him sitting on the steps of his trailer smoking a cigarette, looking every bit the gritty action-adventurer that I realize I’ve seen him as in a bunch of movies. I wonder about the most efficient way to communicate what physicists are like. I figure I should just start talking about physics. So I start explaining the physics theories that are relevant to the movie. We’re talking about space and time and quantum mechanics and faster-than-light travel and so on. I’m sprinkling in a few stories I heard from Richard Feynman about “doing physics in the field” on the Manhattan Project. It’s an energetic discussion, and I’m wondering what mannerisms I’m displaying—that might or might not be typical of physicists. (I can’t help remembering Oliver Sacks telling me how uncanny it was for him to see how many of his mannerisms Robin Williams had picked up for Awakenings after only a little exposure, so I’m wondering what Jeremy is going to pick up from me in these few hours.)

Jeremy is keen to understand how the science relates to the arc of the story for the movie, and what the aliens as well as humans must be feeling at different points. I try to talk about what it’s like to figure stuff out in science. Then I realize the best thing is to actually show it a bit, by doing some Wolfram Language live coding. And it turns out that the way the script is written right then, Jeremy is actually supposed to be on camera using Wolfram Language himself (just like—I’m happy to say—so many real-life physicists do).

Christopher shows some of the code he’s written for the movie, and how the controls for the dynamics work. Then we start talking about how one sets about figuring out the code. We do some preliminaries. Then we’re off and running, doing live coding. And here’s the first example we make—based on the digits of pi that we’d been discussing in relation to SETI or Contact (the book version) or something:


What to Say to the Aliens

Arrival is partly about interstellar travel. But it’s much more about how we’d communicate with the aliens once they’ve showed up here. I’ve actually thought a lot about alien intelligence. But mostly I’ve thought about it in a more difficult case than in Arrival—where there are no aliens or spaceships in evidence, and where the only thing we have is some thin stream of data, say from a radio transmission, and where it’s difficult even to know if what we’ve got should be considered evidence of “intelligence” at all (remember, for example, that it often seems that even the weather can be complex enough to seem like it “has a mind of its own”).

But in Arrival, the aliens are right here. So then how should we start communicating with them? We need something universal that doesn’t depend on the details of human language or human history. Well, OK, if you’re right there with the aliens, there are physical objects to point to. (Yes, that assumes the aliens have some notion of discrete objects, rather than just a continuum, but by the time they’ve got spaceships and so on, that seems like a decently safe bet.) But what if you want to be more abstract?

Well, then there’s always mathematics. But is mathematics actually universal? Does anyone who builds spaceships necessarily have to know about prime numbers, or integrals, or Fourier series? It’s certainly true that in our human development of technology, those are things we’ve needed to understand. But are there other (and perhaps better) paths to technology? I think so.

For me, the most general form of abstraction that seems relevant to the actual operation of our universe is what we get by looking at the computational universe of possible programs. Mathematics as we’ve practiced it does show up there. But so do an infinite diversity of other abstract collections of rules. And what I realized a while back is that many of these are very relevant—and actually very good—for producing technology.

So, OK, if we look across the computational universe of possible programs, what might we pick out as reasonable universals to start an abstract discussion with aliens who’ve come to visit us?

Once one can point to discrete objects, one has the potential to start talking about numbers, first in unary, then perhaps in binary. Here’s the beginning of a notebook I made about this for the movie. The words and code are for human consumption; for the aliens there’d just be “flash cards” of the main graphics:

Establishing Communication
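The unary-then-binary progression can be sketched in a few lines. This is only an illustration in Python, not the Wolfram Language notebook made for the movie; the function names and the choice of symbols are assumptions made up for the example.

```python
def unary(n: int, mark: str = "●") -> str:
    """Show n as a row of n identical marks (one mark per object)."""
    return mark * n


def binary_card(n: int, on: str = "■", off: str = "□") -> str:
    """Show n in base 2, most significant digit first, as filled/empty squares."""
    return "".join(on if bit == "1" else off for bit in format(n, "b"))


# A "flash card" sequence: each row pairs the unary and binary forms of n.
for n in range(1, 9):
    print(f"{unary(n):<8}  {binary_card(n)}")
```

Putting the two representations side by side is the point: the unary column can be matched against physical objects, and the binary column then has to be decoded by comparison.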

OK, so after basic numbers, and maybe some arithmetic, what’s next? It’s interesting to realize that even what we’ve discussed so far doesn’t reflect the history of human mathematics: despite how fundamental they are (as well as their appearance in very old traditions like the I Ching) binary numbers only got popular quite recently—long after lots of much-harder-to-explain mathematical ideas.

So, OK, we don’t need to follow the history of human mathematics or science—or, for that matter, the order in which it’s taught to humans. But we need to find things that can be understood very directly—without outside knowledge or words. Things that for example we’d recognize if we just unearthed them without context in some archeological dig.

Well, it so happens that there’s a class of computational systems that I’ve studied for decades that I think fit the bill remarkably well: cellular automata. They’re based on simple rules that are easy to display visually. And they work by repeatedly applying these rules, and often generating complex patterns—that we now know can be used as the basis for all sorts of interesting technology.

Cellular Automata
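To make "repeatedly applying simple rules" concrete, here is a minimal sketch of a one-dimensional, two-color, nearest-neighbor cellular automaton. It is written in Python purely for illustration; the helper names `rule_table` and `evolve` are hypothetical, and the numbering follows the usual convention in which the 8 bits of the rule number give the new cell color for each of the 8 possible neighborhoods.

```python
def rule_table(rule: int) -> dict:
    """Map each (left, center, right) neighborhood to the new cell value.

    Neighborhood (l, c, r) is read as the 3-bit number 4*l + 2*c + r,
    and that number selects one bit of the rule number.
    """
    return {(i >> 2 & 1, i >> 1 & 1, i & 1): (rule >> i) & 1 for i in range(8)}


def evolve(rule: int, steps: int) -> list:
    """Run the rule from a single black cell; return steps + 1 rows."""
    width = 2 * steps + 1          # wide enough that the pattern never hits the edge
    row = [0] * width
    row[steps] = 1                 # single black cell in the middle
    table = rule_table(rule)
    rows = [row]
    for _ in range(steps):
        padded = [0] + row + [0]   # white cells beyond both ends
        row = [table[(padded[i - 1], padded[i], padded[i + 1])]
               for i in range(1, width + 1)]
        rows.append(row)
    return rows


for r in evolve(30, 10):
    print("".join("█" if cell else " " for cell in r))
```

Changing the first argument of `evolve` to any number from 0 to 255 gives one of the 256 rules of this kind.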

From looking at cellular automata one can actually start to build up a whole world view, or, as I called the book I wrote about such things, A New Kind of Science. But what if we want to communicate more traditional ideas in human science and mathematics? What should we do then?

Pythagorean Theorem

Maybe we could start by showing 2D geometrical figures. Gauss suggested back around 1820 that one could carve a picture of the standard visual for the Pythagorean theorem out of the Siberian forest, for aliens to see.

It’s easy to get into trouble, though. We might think of showing Platonic solids. And, yes, 3D printouts should work. But 2D perspective renderings depend in a lot of detail on our particular visual systems. Networks are even worse: how are we to know that those lines joining nodes represent abstract connections?

One might think about logic: perhaps start showing the true theorems of logic. But how would one present them? Somehow one has to have a symbolic representation: textual, expression trees, or something. From what we know now about computational knowledge, logic isn’t a particularly good global starting point for representing general concepts. But in the 1950s this wasn’t clear, and there was a charming book (my copy of which wound up on the set of Arrival) that tried to build up a whole way to communicate with aliens using logic:

LINCOS book cover

But what about things with numbers? In Contact (the movie), prime numbers are key. Well, despite their importance in the history of human mathematics, primes actually don’t figure much in today’s technology, and when they do (like in public-key cryptosystems) it usually seems somehow incidental that they’re what’s used.

In a radio signal, primes might at first seem like good “evidence for intelligence”. But of course primes can be generated by programs—and actually by fairly simple ones, including for example cellular automata. And so if one sees a sequence of primes, it’s not immediate evidence that there’s a whole elaborate civilization behind it; it might just come from a simple program that somehow “arose naturally”.

One can easily illustrate primes visually (not least as numbers of objects that can’t be arranged in non-trivial rectangles). But going further with them seems to require concepts that can’t be represented so directly.
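The rectangle picture of primes translates directly into code. The following Python sketch is only an illustration (the function names are made up for the example): a number of objects is prime exactly when the only rectangles it can fill are single rows.

```python
def rectangles(n: int) -> list:
    """All ways to arrange n objects in an a x b rectangle with 1 < a <= b."""
    return [(a, n // a) for a in range(2, int(n ** 0.5) + 1) if n % a == 0]


def is_prime(n: int) -> bool:
    """n is prime when it has no non-trivial rectangular arrangement."""
    return n > 1 and not rectangles(n)


for n in range(2, 13):
    shapes = rectangles(n)
    label = "prime" if not shapes else ", ".join(f"{a}x{b}" for a, b in shapes)
    print(f"{n:>2}: {label}")
```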

Hydrogen diagram from Pioneer 10 plaque

It’s awfully easy to fall into implicitly assuming a lot of human context. Pioneer 10—one of the human artifacts that’s gone furthest into interstellar space (currently about 11 billion miles, which is about 0.05% of the distance to α Centauri)—provides one of my favorite examples. There’s a plaque on that spacecraft that includes a representation of the wavelength of the 21-centimeter spectral line of hydrogen. Now the most obvious way to represent that would probably just be a line 21 cm long. But back in 1972 Carl Sagan and others decided to do something “more scientific”, and instead made a schematic diagram of the quantum mechanical process leading to the spectral line. The problem is that this diagram relies on conventions from human textbooks—like using arrows to represent quantum spins—that really have nothing to do with the underlying concepts and are incredibly specific to the details of how science happened to develop for us humans.
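The distance comparison is easy to check. This Python sketch uses the 11 billion mile figure quoted above; the other two constants (about 4.37 light-years to α Centauri, about 5.879 trillion miles per light-year) are approximate values assumed for the calculation.

```python
MILES_PER_LIGHT_YEAR = 5.879e12   # approximate
ALPHA_CENTAURI_LY = 4.37          # approximate distance in light-years
PIONEER_10_MILES = 11e9           # figure quoted in the text


def percent_of_way(distance_miles: float, target_light_years: float) -> float:
    """Distance travelled as a percentage of the distance to the target."""
    return 100 * distance_miles / (target_light_years * MILES_PER_LIGHT_YEAR)


pct = percent_of_way(PIONEER_10_MILES, ALPHA_CENTAURI_LY)
print(f"{pct:.3f}% of the way to Alpha Centauri")
```

With these inputs the result comes out a little over 0.04%, which is consistent with "about 0.05%" as a round figure.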

But back to Arrival. To ask a question like “what is your purpose on Earth?” one has to go a lot further than just talking about things like binary sequences or cellular automata. It’s a very interesting problem, and one that’s strangely analogous to something that’s becoming very important right now in the world: communicating with AIs, and defining what goals or purposes they should have (notably “be nice to the humans”).

In a sense, AIs are a little like alien intelligences, right now, here on Earth. The only intelligence we really understand so far is human intelligence. But inevitably every example we see of it shares all the details of the human condition and of human history. So what is intelligence like when it doesn’t share those details?

Well, one of the things that’s emerged from basic science I’ve done is that there isn’t really a bright line between the “intelligent” and the merely “computational”. Things like cellular automata—or the weather—are doing things just as complex as our brains. But even if in some sense they’re “thinking”, they’re not doing so in human-like ways. They don’t share our context and our details.

But if we’re going to “communicate” about things like purpose, we’ve got to find some way to align things. In the AI case, I’ve in fact been working on creating what I call a “symbolic discourse language” that’s a way of expressing concepts that are important to us humans, and communicating them to AIs. There are short-term practical applications, like setting up smart contracts. And there are long-term goals, like defining some analog of a “constitution” for how AIs should generally behave.

Well, in communicating with aliens, we’ve got to build up a common “universal” language that allows us to express concepts that are important to us. That’s not going to be easy. Human natural languages are based on the particulars of the human condition and the history of human civilization. And my symbolic discourse language is really just trying to capture things that are important to humans—not what might be important to aliens.

Of course, in Arrival, we already know that the aliens share some things with us. After all, like the monolith in 2001: A Space Odyssey, even from their shape we recognize the aliens’ spaceships as artifacts. They don’t seem like weird meteorites or something; they seem like something that was made “on purpose”.

But what purpose? Well, purpose is not really something that can be defined abstractly. It’s really something that can be defined only relative to a whole historical and cultural framework. So to ask aliens what their purpose is, we first have to have them understand the historical and cultural framework in which we operate.

Somehow I wonder about the day when we’ll have developed our AIs to the point where we can start asking them what their purpose is. At some level I think it’s going to be disappointing. Because, as I’ve said, I don’t think there’s any meaningful abstract definition of purpose. So there’s nothing “surprising” the AI will tell us. What it considers its purpose will just be a reflection of its detailed history and context. Which in the case of the AI—as its ultimate creators—we happen to have considerable control over.

For aliens, of course, it’s a different story. But that’s part of what Arrival is about.

The Movie Process

I’ve spent a lot of my life doing big projects—and I’m always curious how big projects of any kind are organized. When I see a movie I’m one of those people who sits through to the end of the credits. So it was pretty interesting for me to see the project of making a movie a little closer up in Arrival.

In terms of scale, making a movie like Arrival is a project of about the same size as releasing a major new version of the Wolfram Language. And it’s clear there are some similarities—as well as lots of differences.

Both involve all sorts of ideas and creativity. Both involve pulling together lots of different kinds of skills. Both have to have everything fit together to make a coherent product in the end.

In some ways I think movie-makers have it easier than us software developers. After all, they just have to make one thing that people can watch. In software—and particularly in language design—we have to make something that different people can use in an infinite diversity of different ways, including ones we can’t directly foresee. Of course, in software you always get to make new versions that incrementally improve things; in movies you just get one shot.

And in terms of human resources, there are definitely ways software has it easier than a movie like Arrival. Well-managed software development tends to have a somewhat steady rhythm, so one can have consistent work going on, with consistent teams, for years. In making a movie like Arrival one’s usually bringing in a whole sequence of people—who might never even have met before—each for a very short time. To me, it’s amazing this can work at all. But I guess over the years many of the tasks in the movie industry have become standardized enough that someone can be there for a week or two and do something, then successfully hand it on to another person.

I’ve led a few dozen major software releases in my life. And one might think that by now I’d have got to the point where doing a software release would just be a calm and straightforward process. But it never is. Perhaps it’s because we’re always trying to do majorly new and innovative things. Or perhaps it’s just the nature of such projects. But I’ve found that to get the project done to the quality level I want always requires a remarkable degree of personal intensity. Yes, at least in the case of our company, there are always extremely talented people working on the project. But somehow there are always things to do that nobody expected, and it takes a lot of energy, focus and pushing to get them all together.

At times, I’ve imagined that the process might be a little like making a movie. And in fact in the early years of Mathematica, for example, we even used to have “software credits” that looked very much like movie credits—except that the categories of contributors were things that often had to be made up by me (“lead package developers”, “expression formatting”, “lead font designer”, …). But after a decade or so, recognizing the patchwork of contributions to different versions just became too complex, and so we had to give up on software credits. Still, for a while I thought we’d try having “wrap parties”, just like for movies. But somehow when the scheduled party came around, there was always some critical software issue that had come up, and the key contributors couldn’t come to the party because they were off fixing it.

Software development—or at least language development—also has some structural similarities to movie making. One starts from a script—an overall specification of what one wants the finished product to be like. Then one actually tries to build it. Then, inevitably, at the end when one looks at what one has, one realizes one has to change the specification. In movies like Arrival, that’s post-production. In software, it’s more an iteration of the development process.

It was interesting to me to see how the script and the suggestions I made for it propagated through the making of Arrival. It reminded me quite a lot of how I, at least, do software design: everything kept on getting simpler. I’d suggest some detailed way to fix a piece of dialogue. “You shouldn’t say [the Amy Adams character] flunked calculus; she’s way too analytical for that.” “You shouldn’t say the spacecraft came a million light years; that’s outside the galaxy; say a trillion miles instead.” The changes would get made. But then things would get simpler, and the core idea would get communicated in some more minimal way. I didn’t see all the steps (though that would have been interesting). But the results reminded me quite a lot of the process of software design I’ve done so many times—cut out any complexity one can, and make everything as clear and minimal as possible.

Can You Write a Whiteboard?

My contributions to Arrival were mostly concentrated around the time the movie was shooting early in the summer of 2015. And for almost a year all I heard was that the movie was “in post-production”. But then suddenly in May of this year I get an email: could I urgently write a bunch of relevant physics on a whiteboard for the movie?

There was a scene with Amy Adams in front of a whiteboard, and somehow what was written on the whiteboard when the scene was shot was basic high-school-level physics—not the kind of top-of-the-line physics one would expect from people like the Jeremy Renner character in the movie.

Somewhat amusingly, I don’t think I’ve ever written much on a whiteboard before. I’ve used computers for essentially all my work and presentations for more than 30 years, and before that the prevailing technologies were blackboards and overhead projector transparencies. Still, I duly got a whiteboard set up in my office, and got to work writing (in my now-very-rarely-used handwriting) some things I imagined a good physicist might think of if they were trying to understand an interstellar spacecraft that had just showed up.

Here’s what I came up with. The big spaces on the whiteboard were there to make it easier to composite in Amy Adams (and particularly her hair) moving around in front of the whiteboard. (In the end, the whiteboard got rewritten yet again for the final movie, so what’s here isn’t in detail what’s in the movie.)


In writing the whiteboard, I imagined it as a place where the Jeremy Renner character or his colleagues would record notable ideas about the spacecraft, and formulas related to them. And after a little while, I ended up with quite a tale of physics fact and speculation.

Here’s a key:

Whiteboard key

(1) Maybe the spacecraft has its strange (here, poorly drawn) rattleback-like shape because it spins as it travels, generating gravitational waves in spacetime in the process.

(2) Maybe the shape of the spacecraft is somehow optimized for producing a maximal intensity of some pattern of gravitational radiation.

(3) This is Einstein’s original formula for the strength of gravitational radiation emitted by a changing mass distribution. Q_ij is the quadrupole moment of the distribution, computed from the integral shown.

(4) There are higher-order terms, that depend on higher-order multipole moments, computed by these integrals of the spacecraft mass density ρ(Ω) weighted by spherical harmonics.

(5) The gravitational waves would lead to a perturbation in the structure of spacetime, represented by the 4-dimensional tensor h_μν.

(6) Maybe the spacecraft somehow “swims” through spacetime, propelled by the effects of these gravitational waves.

(7) Maybe around the skin of the spacecraft, there’s “gravitational turbulence” in the structure of spacetime, with power-law correlations like the turbulence one sees around objects moving in fluids. (Or maybe the spacecraft just “boils spacetime” around it…)

(8) This is the Papapetrou equation for how a spin tensor evolves in General Relativity, as a function of proper time τ.

(9) The equation of geodesic motion describing how things move in (potentially curved) spacetime. Γ is the Christoffel symbol determined by the structure of spacetime. And, yes, one can just go ahead and solve such equations using NDSolve in the Wolfram Language.

(10) Einstein’s equation for the gravitational field produced by a moving mass (the field determines the motion of the mass, which in turn reacts back to change the field).

(11) A different idea is that the spacecraft might somehow have negative mass, or at least negative pressure. A photon gas has pressure 1/3 ρ; the most common version of dark energy would have pressure −ρ.

(12) The equation for the energy–momentum tensor, that specifies the combination of mass, pressure and velocity that appears in relativistic computations for perfect fluids.

(13) Maybe the spacecraft represents a “bubble” in which the structure of spacetime is different. (The arrow pointed to a schematic spacecraft shape pre-drawn on the whiteboard.)

(14) Is there anything special about the Christoffel symbols (“coefficients of the connection on the tangent fiber bundle”) for the shape of the spacecraft, as computed from its spatial metric tensor?

(15) A gravitational wave can be described as a perturbation in the metric of spacetime relative to flat background Minkowski space where Special Relativity operates.

(16) The equation for the propagation of a gravitational wave, taking into account the first few “nonlinear” effects of the wave on itself.

(17) The relativistic Boltzmann equation describing motion (“transport”) and collision in a gas of Bose–Einstein particles like gravitons.

(18) A far-out idea: maybe there’s a way of making a “laser” using gravitons rather than photons, and maybe that’s how the spacecraft works.

(19) Lasers are a quantum phenomenon. This is a Feynman diagram of self-interaction of gravitons in a cavity. (Photons don’t have these kinds of direct “nonlinear” self interactions.)

(20) How might one make a mirror for gravitons? Maybe one can make a metamaterial with a carefully constructed microscopic structure all the way down to the Planck scale.

(21) Lasers involve coherent states made from superpositions of infinite numbers of photons, as formed by infinitely nested creation operators applied to the quantum field theoretic vacuum.

(22) There’s a Feynman diagram for that: this is a Bethe–Salpeter-type self-consistent equation for a graviton bound state (which we don’t know exists) that might be relevant to a graviton laser.

(23) Basic nonlinear interactions of gravitons in a perturbative approximation to quantum gravity.

(24) A possible correction term for the Einstein–Hilbert action of General Relativity from quantum effects.

Eek, I can see how these explanations might seem like they’re in an alien language themselves! Still, they’re actually fairly tame compared to “full physics-speak”. But let me explain a bit of the “physics story” on the whiteboard.

It starts from an obvious feature of the spacecraft: its rather unusual, asymmetrical shape. It looks a bit like one of those rattleback tops that one can start spinning one way, but then it changes direction. So I thought: maybe the spacecraft spins around. Well, any massive (non-spherical) object spinning around will produce gravitational waves. Usually they’re absurdly too weak to detect, but if the object is sufficiently massive or spins sufficiently rapidly, they can be substantial. And indeed, late last year, after a 30-year odyssey, gravitational waves from two black holes spinning around and merging were detected—and they were sufficiently intense to detect from a third of the way across the universe. (Accelerating masses effectively generate gravitational waves like accelerating electric charges generate electromagnetic waves.)

OK, so let’s imagine the spacecraft somehow spins rapidly enough to generate lots of gravitational waves. And what if we could somehow confine those gravitational waves in a small region, maybe even by using the motion of the spacecraft itself? Well, then the waves would interfere with themselves. But what if the waves got coherently amplified, like in a laser? Well, then the waves would get stronger, and they’d inevitably start having a big effect on the motion of the spacecraft—like perhaps pushing it through spacetime.

But why should the gravitational waves get amplified? In an ordinary laser that uses photons (“particles of light”), one basically needs to continually make new photons by pumping energy into a material. Photons are so-called Bose–Einstein particles (“bosons”) which means that they tend to all “do the same thing”—which is why the light in a laser comes out as a coherent wave. (Electrons are fermions, which means that they try never to do the same thing, leading to the Exclusion Principle that’s crucial in making matter stable, etc.)

Just as light waves can be thought of as made up of photons, so gravitational waves can most likely be thought of as made up of gravitons (though, to be fair, we don’t yet have any fully consistent theory of gravitons). Photons don’t interact directly with each other—basically because photons interact with things like electrons that have electric charge, but photons themselves don’t have electric charge. Gravitons, on the other hand, do interact directly with each other—basically because they interact with things that have any kind of energy, and they themselves can have energy.

These kinds of nonlinear interactions can have wild effects. For example, gluons in QCD have nonlinear interactions that have the effect of keeping them permanently confined inside the particles like protons that they keep “glued” together. It’s not at all clear what nonlinear interactions of gravitons might do. The idea here is that perhaps they’d lead to some kind of self-sustaining “graviton laser”.

The formulas at the top of the whiteboard are basically about the generation and effects of gravitational waves. The ones at the bottom are mostly about gravitons and their interactions. The formulas at the top are basically all associated with Einstein’s General Theory of Relativity (which for 100 years has been the theory of gravity used in physics). The formulas at the bottom give a mixture of classical and quantum approaches to gravitons and their interactions. The diagrams are so-called Feynman diagrams, in which wavy lines schematically represent gravitons propagating through spacetime.
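For readers who want to see what some of the named General Relativity formulas look like, here are standard textbook forms of items (3), (5)/(15), (9), (10) and (12) from the key. These are not transcribed from the whiteboard; the notation, the metric signature (−,+,+,+) and the choice to treat ρ as an energy density are assumptions made for this summary.

```latex
% (3) Quadrupole formula: power radiated in gravitational waves,
%     with Q_{ij} the traceless mass quadrupole moment
P = \frac{G}{5c^{5}} \left\langle \dddot{Q}_{ij}\, \dddot{Q}_{ij} \right\rangle ,
\qquad
Q_{ij} = \int \rho(\mathbf{x}) \left( x_i x_j - \tfrac{1}{3}\, \delta_{ij}\, r^{2} \right) d^{3}x

% (5), (15) Gravitational wave as a small perturbation of flat Minkowski space
g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu} , \qquad |h_{\mu\nu}| \ll 1

% (9) Geodesic motion, with proper time \tau and Christoffel symbols \Gamma
\frac{d^{2}x^{\mu}}{d\tau^{2}}
  + \Gamma^{\mu}_{\alpha\beta}\, \frac{dx^{\alpha}}{d\tau}\, \frac{dx^{\beta}}{d\tau} = 0

% (10) Einstein's field equations
R_{\mu\nu} - \tfrac{1}{2}\, R\, g_{\mu\nu} = \frac{8\pi G}{c^{4}}\, T_{\mu\nu}

% (12) Energy-momentum tensor of a perfect fluid
%      (\rho = energy density, p = pressure, u^{\mu} u_{\mu} = -c^{2})
T^{\mu\nu} = \frac{(\rho + p)}{c^{2}}\, u^{\mu} u^{\nu} + p\, g^{\mu\nu}

% (11) Equations of state mentioned in the key
p = \tfrac{1}{3}\, \rho \quad \text{(photon gas)} , \qquad
p = -\rho \quad \text{(cosmological-constant-like dark energy)}
```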

I have no real idea if a “graviton laser” is possible, or how it would work. But in an ordinary photon laser, the photons always effectively bounce around inside some kind of cavity whose walls act as mirrors. Unfortunately, however, we don’t know how to make a graviton mirror—just like we don’t know any way of making something that will shield a gravitational field (well, dark matter sort of would, if it actually exists). For the whiteboard, I made the speculation that perhaps there’s some weird way of making a “metamaterial” down at the Planck scale of 10^-34 meters (where quantum effects in gravity basically have to become important) that could act as a graviton mirror. (Another possibility is that a graviton laser could work more like a free-electron laser without a cavity as such.)

Now, remember, my idea with the whiteboard was to write what I thought a typical good physicist, say plucked from a government lab, might think about if confronted with the situation in the movie. It’s more “conventional” than the theory I personally came up with for how to make an interstellar spacecraft. But that’s because my theory depends on a bunch of my own ideas about how fundamental physics works, that aren’t yet mainstream in the physics community.

What’s the correct theory of interstellar travel? Needless to say, I don’t know. I’d be amazed if either the main theory I invented for the movie or the theory on the whiteboard were correct as they stand. But who knows? And of course it’d be extremely helpful if some aliens showed up in interstellar spaceships to even show us that interstellar travel is possible…

What Is Your Purpose on Earth?

If aliens show up on Earth, one of the obvious big questions is: why are you here? What is your purpose? It’s something the characters in Arrival talk about a lot. And when Christopher and I were visiting the set we were asked to make a list of possible answers, that could be put on a whiteboard or a clipboard. Here’s what we came up with:

What Is Your Purpose on Earth?

As I mentioned before, the whole notion of purpose is something that’s very tied into cultural and other context. And it’s interesting to think about what purposes one would have put on this list at different times in human history. It’s also interesting to imagine what purposes humans—or AIs—might give for doing things in the future. Perhaps I’m too pessimistic but I rather expect that for future humans, AIs, and aliens, the answer will very often be something out there in the computational universe of possibilities—that we today aren’t even close to having words or concepts for.

And Now It’s a Movie…

The movie came together really well, the early responses look great… and it’s fun to see things like this (yes, that’s Christopher’s code):


It’s been interesting and stimulating to be involved with Arrival. It’s let me understand a little more about just what’s involved in creating all those movies I see—and what it takes to merge science with compelling fiction. It’s also led me to ask some science questions beyond any I’ve asked before—but that relate to all sorts of things I’m interested in.

But through all of this, I can’t help wondering: “What if it was real, and aliens did arrive on Earth?” I’d like to think that being involved with Arrival has made me a little more prepared for that. And certainly if their spaceships do happen to look like giant black rattlebacks, we’ll even already have some nice Wolfram Language code for that…

A Short Talk on AI Ethics

Mon, 17 Oct 2016 19:49:55 +0000, Stephen Wolfram

Last week I gave a talk (and did a panel discussion) at a conference entitled “Ethics of Artificial Intelligence” held at the NYU Philosophy Department’s Center for Mind, Brain and Consciousness. Here’s the video and a transcript:

Thanks for inviting me here today.

You know, it’s funny to be here. My mother was a philosophy professor in Oxford. And when I was a kid I always said the one thing I’d never do was do or talk about philosophy. But, well, here I am.

Before I really get into AI, I think I should say a little bit about my worldview. I’ve basically spent my life alternating between doing basic science and building technology. I’ve been interested in AI for about as long as I can remember. But as a kid I started out doing physics and cosmology and things. That got me into building technology to automate stuff like math. And that worked so well that I started thinking about, like, how to really know and compute everything about everything. That was in about 1980—and at first I thought I had to build something like a brain, and I was studying neural nets and so on. But I didn’t get too far.

And meanwhile I got interested in an even bigger problem in science: how to make the most general possible theories of things. The dominant idea for 300 years had been to use math and equations. But I wanted to go beyond them. And the big thing I realized was that the way to do that was to think about programs, and the whole computational universe of possible programs.

Cellular automata grid

And that led to my personal Galileo-like moment. I just pointed my “computational telescope” at these simplest possible programs, and I saw this amazing one I called rule 30—that just seemed to go on producing complexity forever from essentially nothing.

Rule 30

Well, after I’d seen this, I realized this is actually something that happens all over the computational universe—and all over nature. It’s really the secret that lets nature make all the complicated stuff we see. But it’s something else too: it’s a window into what raw, unfettered computation is like. At least traditionally when we do engineering we’re always building things that are simple enough that we can foresee what they’ll do.

But if we just go out into the computational universe, things can be much wilder. Our company has done a lot of mining out there, finding programs that are useful for different purposes, like rule 30 is for randomness. And modern machine learning is kind of part way from traditional engineering to this kind of free-range mining.
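As an illustration of the "rule 30 for randomness" idea, here is a minimal Python sketch that reads bits off the center column of rule 30 started from a single black cell. It is an assumption-laden toy (the function name is made up, and it is not a description of how any production random number generator is implemented); it uses the fact that rule 30 can be written as left XOR (center OR right).

```python
def rule30_center_bits(n: int) -> list:
    """First n values of the center column of rule 30 from a single black cell."""
    width = 2 * n + 1              # wide enough that the edges are never reached
    row = [0] * width
    row[n] = 1                     # single black cell in the middle
    bits = []
    for _ in range(n):
        bits.append(row[n])        # record the center cell of the current row
        padded = [0] + row + [0]   # white cells beyond both ends
        # Rule 30: new cell = left XOR (center OR right)
        row = [padded[i - 1] ^ (padded[i] | padded[i + 1])
               for i in range(1, width + 1)]
    return bits


bits = rule30_center_bits(64)
print("".join(str(b) for b in bits))
print("fraction of 1s:", sum(bits) / len(bits))
```

The rule itself is completely deterministic; what makes the sequence useful is that no obvious regularity shows up in it.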

But, OK, what can one say in general about the computational universe? Well, all these programs can be thought of as doing computations. And years ago I came up with what I call the Principle of Computational Equivalence—that says that if behavior isn’t obviously simple, it typically corresponds to a computation that’s maximally sophisticated. There are lots of predictions and implications of this. Like that universal computation should be ubiquitous. As should undecidability. And as should what I call computational irreducibility.

An example of cellular automata

Can you predict what it’s going to do? Well, it’s probably computationally irreducible, which means you can’t figure out what it’s going to do without effectively tracing every step and going through the same computational effort it does. It’s completely deterministic. But to us it’s got what seems like free will—because we can never know what it’s going to do.

Here’s another thing: what’s intelligence? Well, our big unifying principle says that everything, from a tiny program to our brains, is computationally equivalent. There’s no bright line between intelligence and mere computation. The weather really does have a mind of its own: it’s doing computations just as sophisticated as our brains. To us, though, it’s pretty alien computation. Because it’s not connected to our human goals and experiences. It’s just raw computation that happens to be going on.

So how do we tame computation? We have to mold it to our goals. And the first step there is to describe our goals. And for the past 30 years what I’ve basically been doing is creating a way to do that.

I’ve been building a language—that’s now called the Wolfram Language—that allows us to express what we want to do. It’s a computer language. But it’s not really like other computer languages. Because instead of telling a computer what to do in its terms, it builds in as much knowledge as possible about computation and the world, so that we humans can describe in our terms what we want, and then it’s up to the language to get it done as automatically as possible.

This basic idea has worked really well, and in the form of Mathematica it’s been used to make endless inventions and discoveries over the years. It’s also what’s inside Wolfram|Alpha. Where the idea is to take pure natural language questions, understand them, and use the kind of curated knowledge and algorithms of our civilization to answer them. And, yes, it’s a very classic AIish thing. And of course it’s computed answers to billions and billions of questions from humans, for example inside Siri.

I had an interesting experience recently, figuring out how to use what we’ve built to teach computational thinking to kids. I was writing exercises for a book. At the beginning, it was easy: “make a program to do X”. But later on, it was like “I know what to say in the Wolfram Language, but it’s really hard to express in English”. And of course that’s why I just spent 30 years building the Wolfram Language.

English has maybe 25,000 common words; the Wolfram Language has about 5000 carefully designed built-in constructs—including all the latest machine learning—together with millions of things based on curated data. And the idea is that once one can think about something in the world computationally, it should be as easy as possible to express it in the Wolfram Language. And the cool thing is, it really works. Humans, including kids, can read and write the language. And so can computers. It’s a kind of high-level bridge between human thinking, in its cultural context, and computation.

OK, so what about AI? Technology has always been about finding things that exist, and then taming them to automate the achievement of particular human goals. And in AI the things we’re taming exist in the computational universe. Now, there’s a lot of raw computation seething around out there—just as there’s a lot going on in nature. But what we’re interested in is computation that somehow relates to human goals.

So what about ethics? Well, maybe we want to constrain the computation, the AI, to only do things we consider ethical. But somehow we have to find a way to describe what we mean by that.

Well, in the human world, one way we do this is with laws. But how do we connect laws to computations? We may call them “legal codes”, but today laws and contracts are basically written in natural language. There’ve been simple computable contracts in areas like financial derivatives. And now people are talking about smart contracts around cryptocurrencies.

But what about the vast mass of law? Well, Leibniz—who died 300 years ago next month—was always talking about making a universal language to, as we would say now, express it all in a computable way. He was a few centuries too early, but I think now we’re finally in a position to do this.

I just posted a long blog post about all this last week, but let me try to summarize. With the Wolfram Language we’ve managed to express a lot of kinds of things in the world, like the ones people ask Siri about. And I think we’re now within sight of what Leibniz wanted: to have a general symbolic discourse language that represents everything involved in human affairs.

I see it basically as a language design problem. Yes, we can use natural language to get clues, but ultimately we have to build our own symbolic language. It’s actually the same kind of thing I’ve done for decades in the Wolfram Language. Take even a word like “plus”. Well, in the Wolfram Language there’s a function called Plus, but it doesn’t mean the same thing as the word. It’s a very specific version, that has to do with adding things mathematically. And as we design a symbolic discourse language, it’s the same thing. The word “eat” in English can mean lots of things. But we need a concept—that we’ll probably refer to as “eat”—that’s a specific version, that we can compute with.

So let’s say we’ve got a contract written in natural language. One way to get a symbolic version is to use natural language understanding—just like we do for billions of Wolfram|Alpha inputs, asking humans about ambiguities. Another way might be to get machine learning to describe a picture. But the best way is just to write in symbolic form in the first place, and actually I’m guessing that’s what lawyers will be doing before too long.

And of course once you have a contract in symbolic form, you can start to compute about it, automatically seeing if it’s satisfied, simulating different outcomes, automatically aggregating it in bundles, and so on. Ultimately the contract has to get input from the real world. Maybe that input is “born digital”, like data about accessing a computer system, or transferring bitcoin. Often it’ll come from sensors and measurements, and it’ll take machine learning to turn it into something symbolic.
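To make "automatically seeing if it's satisfied" concrete, here is a toy Python sketch. Everything in it is hypothetical: the `PaymentClause` structure, its field names and the transfer records are invented for illustration, and a real symbolic contract would need a far richer representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentClause:
    """Hypothetical clause: `payer` owes `payee` at least `amount` by day `due_day`."""
    payer: str
    payee: str
    amount: float
    due_day: int


def is_satisfied(clause, transfers):
    """Check the clause against observed (day, sender, receiver, amount) transfers."""
    paid = sum(
        amt
        for day, sender, receiver, amt in transfers
        if sender == clause.payer
        and receiver == clause.payee
        and day <= clause.due_day
    )
    return paid >= clause.amount
```

Simulating different outcomes then amounts to calling `is_satisfied` on different hypothetical lists of transfers.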

Well, if we can express laws in computable form maybe we can start telling AIs how we want them to act. Of course it might be better if we could boil everything down to simple principles, like Asimov’s Laws of Robotics, or utilitarianism or something.

But I don’t think anything like that is going to work. What we’re ultimately trying to do is to find perfect constraints on computation, but computation is something that’s in some sense infinitely wild. The issue already shows up in Gödel’s Theorem. Like let’s say we’re looking at integers and we’re trying to set up axioms to constrain them to just work the way we think they do. Well, what Gödel showed is that no finite set of axioms can ever achieve this. With any set of axioms you choose, there won’t just be the ordinary integers; there’ll also be other wild things.

And the phenomenon of computational irreducibility implies a much more general version of this. Basically, given any set of laws or constraints, there’ll always be “unintended consequences”. This isn’t particularly surprising if one looks at the evolution of human law. But the point is that there’s theoretically no way around it. It’s ubiquitous in the computational universe.

Now I think it’s pretty clear that AI is going to get more and more important in the world—and is going to eventually control much of the infrastructure of human affairs, a bit like governments do now. And like with governments, perhaps the thing to do is to create an AI Constitution that defines what AIs should do.

What should the constitution be like? Well, it’s got to be based on a model of the world, and inevitably an imperfect one, and then it’s got to say what to do in lots of different circumstances. And ultimately what it’s got to do is provide a way of constraining the computations that happen, so that they are ones that align with our goals. But what should those goals be? I don’t think there’s any ultimate right answer. In fact, one can enumerate goals just like one can enumerate programs out in the computational universe. And there’s no abstract way to choose between them.

But for us there’s a way to choose. Because we have particular biology, and we have a particular history of our culture and civilization. It’s taken us a lot of irreducible computation to get here. But now we’re just at some point in the computational universe, that corresponds to the goals that we have.

Human goals have clearly evolved through the course of history. And I suspect they’re about to evolve a lot more. I think it’s pretty inevitable that our consciousness will increasingly merge with technology. And eventually maybe our whole civilization will end up as something like a box of a trillion uploaded human souls.

But then the big question is: “what will they choose to do?”. Well, maybe we don’t even have the language yet to describe the answer. If we look back even to Leibniz’s time, we can see all sorts of modern concepts that hadn’t formed yet. And when we look inside a modern machine learning or theorem proving system, it’s humbling to see how many concepts it effectively forms—that we haven’t yet absorbed in our culture.

Maybe looked at from our current point of view, it’ll just seem like those disembodied virtual souls are playing videogames for the rest of eternity. At first maybe they’ll operate in a simulation of our actual universe. Then maybe they’ll start exploring the computational universe of all possible universes.

But at some level all they’ll be doing is computation—and the Principle of Computational Equivalence says it’s computation that’s fundamentally equivalent to all other computation. It’s a bit of a letdown. Our proud future ending up being computationally equivalent just to plain physics, or to little rule 30.

Of course, that’s just an extension of the long story of science showing us that we’re not fundamentally special. We can’t look for ultimate meaning in where we’ve reached. We can’t define an ultimate purpose. Or ultimate ethics. And in a sense we have to embrace the details of our existence and our history.

There won’t be a simple principle that encapsulates what we want in our AI Constitution. There’ll be lots of details that reflect the details of our existence and history. And the first step is just to understand how to represent those things. Which is what I think we can do with a symbolic discourse language.

And, yes, conveniently I happen to have just spent 30 years building the framework to create such a thing. And I’m keen to understand how we can really use it to create an AI Constitution.

So I’d better stop talking about philosophy, and try to answer some questions.

After the talk there was a lively Q&A (followed by a panel discussion), included on the video. Some questions were:

  • When will AI reach human-level intelligence?
  • What are the difficulties you foresee in developing a symbolic discourse language?
  • Do we live in a deterministic universe?
  • Is our present reality a simulation?
  • Does free will exist, and how does consciousness arise from computation?
  • Can we separate rules and principles in a way that is computable for AI?
  • How can AI navigate contradictions in human ethical systems?
Computational Law, Symbolic Discourse and the AI Constitution
Wed, 12 Oct 2016 18:30:45 +0000, Stephen Wolfram

Leibniz’s Dream

Gottfried Leibniz, who died 300 years ago this November, worked on many things. But a theme that recurred throughout his life was the goal of turning human law into an exercise in computation. Of course, as we know, he didn’t succeed. But three centuries later, I think we’re finally ready to give it a serious try again. And I think it’s a really important thing to do, not just because it’ll enable all sorts of new societal opportunities and structures, but because I think it’s likely to be critical to the future of our civilization in its interaction with artificial intelligence.

Human law, almost by definition, dates from the very beginning of civilization—and undoubtedly it’s the first system of rules that humans ever systematically defined. Presumably it was a model for the axiomatic structure of mathematics as defined by the likes of Euclid. And when science came along, “natural laws” (as their name suggests) were at first viewed as conceptually similar to human laws, except that they were supposed to define constraints for the universe (or God) rather than for humans.

Over the past few centuries we’ve had amazing success formalizing mathematics and exact science. And out of this there’s a more general idea that’s emerged: the idea of computation. In computation, we’re dealing with arbitrary systems of rules—not necessarily ones that correspond to mathematical concepts we know, or features of the world we’ve identified. So now the question is: can we use the ideas of computation, in very much the way Leibniz imagined, to formalize human law?

The basic issue is that human law talks about human activities, and (unlike say for the mechanics of particles) we don’t have a general formalism for describing human activities. When it comes to talking about money, for example, we often can be precise. And as a result, it’s pretty easy to write a very formal contract for paying a subscription, or determining how an option on a publicly traded stock should work.

But what about all the things that typical legal contracts deal with? Well, clearly we have one way to write legal contracts: just use natural language (like English). It’s often very stylized natural language, because it’s trying to be as precise as possible. But ultimately it’s never going to be precise. Because at the lowest level it’s always going to depend on the meanings of words, which for natural language are effectively defined just by the practice and experience of the users of the language.

A New Kind of Language

For a computer language, though, it’s a different story. Because now the constructs in the language are absolutely precise: instead of having a vague, societally defined effect on human brains, they’re defined to have a very specific effect on a computer. Of course, traditional computer languages don’t directly talk about things relevant to human activities: they only directly talk about things like setting values for variables, or calling abstractly defined functions.

But what I’m excited about is that we’re starting to have a bridge between the precision of traditional computer languages and the ability to talk about real-world constructs. And actually, it’s something I’ve personally been working on for more than three decades now: our knowledge-based Wolfram Language.

The Wolfram Language is precise: everything in it is defined to the point where a computer can unambiguously work with it. But its unique feature among computer languages is that it’s knowledge based. It’s not just a language to describe the low-level operations of a computer; instead, built right into the language is as much knowledge as possible about the real world. And this means that the language includes not just numbers like 2.7 and strings like “abc”, but also constructs like the United States, or the Consumer Price Index, or an elephant. And that’s exactly what we need in order to start talking about the kinds of things that appear in legal contracts or human laws.

I should make it clear that the Wolfram Language as it exists today doesn’t include everything that’s needed. We’ve got a large and solid framework, and we’re off to a good start. But there’s more about the world that we have to encode to be able to capture the full range of human activities and human legal specifications.

The Wolfram Language has, for example, a definition of what a banana is, broken down by all kinds of details. So if one says “you should eat a banana”, the language has a way to represent “a banana”. But as of now, it doesn’t have a meaningful way to represent “you”, “should” or “eat”.

Is it possible to represent things like this in a precise computer language? Absolutely! But it takes language design to set up how to do it. Language design is a difficult business—in fact, it’s probably the most intellectually demanding thing I know, requiring a strange mixture of high abstraction together with deep knowledge and down-to-earth practical judgment. But I’ve been doing it now for nearly four decades, and I think I’m finally ready for the challenge of doing language design for everyday discourse.

So what’s involved? Well, let’s first talk about it in a simpler case: the case of mathematics. Consider the function Plus, which adds things like numbers together. When we use the English word “plus” it can have all sorts of meanings. One of those meanings is adding numbers together. But there are other meanings, that are related, say, by various analogies (“product X plus”, “the plus wire”, “it’s a real plus”, …).

When we come to define Plus in the Wolfram Language we want to build on the everyday notion of “plus”, but we want to make it precise. And we can do that by picking the specific meaning of “plus” that’s about adding things like numbers together. And once we know that this is what Plus means, we immediately know all sorts of properties, and can do explicit computations with it.

Now consider a concept like “magnesium”. It’s not as perfect and abstract a concept as Plus. But physics and chemistry give us a clear definition of the element magnesium—which we can then use in the Wolfram Language to have a well-defined “magnesium” entity.

It’s very important that the Wolfram Language is a symbolic language—because it means that the things in it don’t immediately have to have “values”; they can just be symbolic constructs that stand for themselves. And so, for example, the entity “magnesium” is represented as a symbolic construct, that doesn’t itself “do” anything, but can still appear in a computation, just like, for example, a number (like 9.45) can appear.

There are many kinds of constructs that the Wolfram Language supports. Like “New York City” or “last Christmas” or “geographically contained within”. And the point is that the design of the language has defined a precise meaning for them. New York City, for example, is taken to mean the precise legal entity considered to be New York City, with geographical borders defined by law. Internal to the Wolfram Language, there’s always a precise canonical representation for something like New York City (it’s Entity["City", {"NewYork", "NewYork", "UnitedStates"}]). And this internal representation is all that matters when it comes to computation. Yes, it’s convenient to refer to New York City as “nyc”, but in the Wolfram Language that natural language form is immediately converted to the precise internal form.
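The idea of a canonical internal form can be mimicked in a few lines of Python. This is only a sketch: the `Entity` class, the alias table and the `interpret` function are hypothetical stand-ins, and only the canonical name for New York City is taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A symbolic construct that stands for itself: a type plus a canonical name."""
    kind: str
    name: tuple


NEW_YORK_CITY = Entity("City", ("NewYork", "NewYork", "UnitedStates"))

# Hypothetical alias table; a real system would use natural language understanding.
ALIASES = {
    "nyc": NEW_YORK_CITY,
    "new york city": NEW_YORK_CITY,
}


def interpret(text):
    """Map a natural-language form to its canonical entity, or fail loudly."""
    key = " ".join(text.lower().split())
    if key not in ALIASES:
        raise KeyError(f"no canonical entity known for {text!r}")
    return ALIASES[key]
```

Whatever surface form comes in, all later computation sees the same `Entity` value, which is the point of having a canonical representation.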

So what about “you should eat a banana”? Well, we’ve got to go through the same language design process for something like “eat” as for Plus (or “banana”). And the basic idea is that we’ve got to figure out a standard meaning for “eat”. For example, it might be “ingestion of food by a person (or animal)”. Now, there are plenty of other possible meanings for the English word “eat”—for example, ones that use analogies, as in “this function eats its arguments”. But the idea—like for Plus—is to ignore these, and just to define a standard notion of “eat” that is precise, and suitable for computation.

One gets a reasonable idea of what kinds of constructs one has to deal with just by thinking about parts of speech in English. There are nouns. Sometimes (as in “banana” or “elephant”) there’s a pretty precise definition of what these correspond to, and usually the Wolfram Language already knows about them. Sometimes it’s a little vaguer but still concrete (as in “chair” or “window”), and sometimes it’s abstract (like “happiness” or “justice”). But in each case one can imagine one or several entities that capture a definite meaning for the noun—just like the Wolfram Language already has entities for thousands of kinds of things.

Beyond nouns, there are verbs. There’s typically a certain superstructure that exists around verbs. Grammatically there might be a subject for the verb, and an object, and so on. Verbs are similar to functions in the Wolfram Language: each one deals with certain arguments, that for example correspond to its subject, object, etc. Now of course in English (or any other natural language) there are all sorts of elaborate special cases and extra features that can be associated with verbs. But basically we don’t care about these. Because we’re really just trying to define symbolic constructs that represent certain concepts. We don’t have to capture every detail of how a particular verb works; we’re just using the English verb as a way to give us a kind of “cognitive hook” for the concept.

We can go through other parts of speech. Adverbs that modify verbs; adjectives that modify nouns. These can sometimes be represented in the Wolfram Language by constructs like EntityInstance, and sometimes by options to functions. But the important point in all cases is that we’re not trying to faithfully reproduce how the natural language works; we’re just using the natural language as a guide to how concepts are set up.

Pronouns are interesting. They work a bit like variables in pure anonymous functions. In “you should eat a banana”, the “you” is like a free variable that’s going to be filled in with a particular person.
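As a rough illustration of verbs as functions and pronouns as slots, one might sketch "you should eat a banana" like this in Python. The `Expr` class and the heads `Should` and `Eat` are invented for this example; they are not existing Wolfram Language constructs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """A symbolic expression: a head (the concept) applied to arguments."""
    head: str
    args: tuple

    def __str__(self):
        return f"{self.head}[{', '.join(map(str, self.args))}]"


def should_eat_banana(you):
    """'You should eat a banana', with 'you' left as a slot to be filled in."""
    return Expr("Should", (Expr("Eat", (you, "Banana")),))


statement = should_eat_banana("Alice")
print(statement)  # Should[Eat[Alice, Banana]]
```

The expression does not "do" anything by itself; it is just a structure that later computations can inspect.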

Parts of speech and grammatical structures suggest certain general features to capture in a symbolic representation of discourse. There are a bunch of others, though. For example, there are what amount to “calculi” that one needs to represent notions of time (“within the time interval”, “starting later”, etc.) or of space (“on top of”, “contained within”, etc.). We’ve already got many calculi like these in the Wolfram Language; the most straightforward are ones about numbers (“greater than”, etc.) or sets (“member of”), etc. Some calculi have long histories (“temporal logic”, “set theory”, etc.); others still have to be constructed.
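A tiny piece of such a calculus for time might look like the following Python sketch. The function names are hypothetical, and treating interval endpoints as inclusive is an assumption that a real design would have to decide deliberately.

```python
from datetime import date


def within(moment, start, end):
    """'Within the time interval': start <= moment <= end (inclusive by assumption)."""
    return start <= moment <= end


def starting_later(a_start, b_start):
    """'Starting later': interval A begins strictly after interval B begins."""
    return a_start > b_start
```

For example, `within(date(2016, 10, 12), date(2016, 10, 1), date(2016, 10, 31))` is true, while a date in November falls outside the interval.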

Is there a global theory of what to do? Well, no more than there’s a global theory of how the world works. There are concepts and constructs that are part of how our world works, and we need to capture these. No doubt there’ll be new things that come along in the future, and we’ll want to capture those too. And my experience from building Wolfram|Alpha is that the best thing to do is just to build each thing one needs, without starting off with any kind of global theory. After a while, one may notice that one’s built similar things several times, and one may go in and unify them.

One can get deep into the foundations of science and philosophy about this. Yes, there’s a computational universe out there of all the possible rules by which systems can operate (and, yes, I’ve spent a good part of my life studying the basic science of this). And there’s our physical universe that presumably operates according to certain rules from the computational universe. But from these rules can emerge all sorts of complex behavior, and in fact the phenomenon of computational irreducibility implies that in a sense there’s no limit to what can be built up.

But there’s not going to be an overall way to talk about all this stuff. And if we’re going to be dealing with any finite kind of discourse it’s going to only capture certain features. Which features we choose to capture is going to be determined by what concepts have evolved in the history of our society. And usually these concepts will be mirrored in the words that exist in the languages we use.

At a foundational level, computational irreducibility implies that there’ll always be new concepts that could be introduced. Back in antiquity, Aristotle introduced logic as a way to capture certain aspects of human discourse. And there are other frameworks that have been introduced in the history of philosophy and, more recently, in natural language processing and artificial intelligence research. But computational irreducibility effectively implies that none of them can ever ultimately be complete. And we must expect that as the concepts we consider relevant evolve, so too must the symbolic representation we have for discourse.

The Discourse Workflow

OK, so let’s say we’ve got a symbolic representation for discourse. How’s it actually going to be used? Well, there are some good clues from the way natural language works.

In standard discussions of natural language, it’s common to talk about “interrogative statements” that ask a question, “declarative statements” that assert something and “imperative statements” that say to do something. (Let’s ignore “exclamatory statements”, like expletives, for now.)

Interrogative statements are what we’re dealing with all the time in Wolfram|Alpha: “what is the density of gold?”, “what is 3+7?”, “what was the latest reading from that sensor?”, etc. They’re also common in notebooks used to interact with the Wolfram Language: there’s an input (In[1]:= 2+2) and then there’s a corresponding output (Out[1]= 4).

Declarative statements are all about filling in particular values for variables. In a very coarse way, one can set values (x=7), as in typical procedural languages. But it’s typically better to think about having environments in which one’s asserting things. Maybe those environments are supposed to represent the real world, or some corner of it. Or maybe they’re supposed to represent some fictional world, where for example dinosaurs didn’t go extinct, or something.

Imperative statements are about making things happen in the world: “open the pod bay doors”, “pay Bob 0.23 bitcoin”, etc.

In a sense, interrogative statements determine the state of the world, declarative statements assert things about the state of the world, and imperative statements change the state of the world.
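The three kinds of statements can be pictured as three operations on an environment. The Python class below is a toy with hypothetical method names, not a proposal for how a symbolic discourse language would really store its world.

```python
class World:
    """A toy environment holding facts as key/value pairs."""

    def __init__(self):
        self.facts = {}
        self.log = []

    def declare(self, key, value):
        """Declarative: assert that something is so in this environment."""
        self.facts[key] = value

    def ask(self, key):
        """Interrogative: find out the current state; None if unknown."""
        return self.facts.get(key)

    def do(self, action, key, value):
        """Imperative: make something happen, changing the state and logging it."""
        self.log.append(action)
        self.facts[key] = value


world = World()
world.declare("pod bay doors", "closed")
before = world.ask("pod bay doors")
world.do("open the pod bay doors", "pod bay doors", "open")
after = world.ask("pod bay doors")
```

A separate `World` instance could stand for a fictional environment, with its own assertions that never touch the first one.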

In different situations, we can mean different things by “the world”. We could be talking about abstract constructs, like integers or logic operations, that just are the way they are. We could be talking about natural laws or other features of our physical universe that we can’t change. Or we could be talking about our local environment, where we can move around tables and chairs, choose to eat bananas, and so on. Or we could be talking about our mental states, or the internal state of something like a computer.

There are lots of things one can do if one has a general symbolic representation for discourse. But one of them—which is the subject of this post—is to express things like legal contracts. The beginning of a contract, with its various whereas clauses, recitals, definitions and so on tends to be dense with declarative statements (“this is so”). Then the actual terms of the contract tend to end up with imperative statements (“this should happen”), perhaps depending on certain things determined by interrogative statements (“did this happen?”).

It’s not hard to start seeing the structure of contracts as being much like programs. In simple cases, they just contain logical conditionals: “if X then Y”. In other cases they’re more modeled on math: “if this amount of X happens, that amount of Y should happen”. Sometimes there’s iteration: “keep doing X until Y happens”. Occasionally there’s some recursion: “keep applying X to every Y”. And so on.

There are already some places where legal contracts are routinely represented by what amount to programs. The most obvious are financial contracts for things like bonds and options—which just amount to little programs that define payouts based on various formulas and conditionals.
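For instance, the payout of a simple call option at expiry is just a formula with a conditional, sketched here in Python. The payoff max(spot - strike, 0) per share is the standard one; the function name and the default multiplier of 100 shares per contract are assumptions for illustration.

```python
def call_option_payout(spot_price, strike_price, contracts=1, multiplier=100):
    """Payout at expiry of a call option: max(spot - strike, 0) per share."""
    per_share = max(spot_price - strike_price, 0.0)
    return per_share * contracts * multiplier
```

With a strike of 100, a spot price of 105 pays out 500.0 on one contract, and a spot price of 95 pays nothing.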

There’s a whole industry of using “rules engines” to encode certain kinds of regulations as “if then” rules, usually mixed with formulas. In fact, such things are almost universally used for tax and insurance computations. (They’re also common in pricing engines and the like.)
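A rules engine of this kind can be caricatured in Python as an ordered list of conditions paired with formulas. The thresholds and rates below are invented purely for illustration and do not correspond to any real tax code.

```python
# Each rule is (condition, formula); the first matching rule wins.
# The thresholds and rates below are invented for illustration only.
RULES = [
    (lambda income: income <= 10_000, lambda income: 0.0),
    (lambda income: income <= 50_000, lambda income: 0.10 * (income - 10_000)),
    (lambda income: True, lambda income: 4_000 + 0.20 * (income - 50_000)),
]


def tax_due(income):
    """Evaluate the rules in order and apply the formula of the first match."""
    for condition, formula in RULES:
        if condition(income):
            return formula(income)
    raise ValueError("no rule matched")
```

Keeping the rules as data rather than nested `if` statements is what lets such engines be edited and audited by people who are not programmers.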

Of course, it’s no coincidence that one talks about “legal codes”. The word code—which comes from the Latin codex—originally referred to systematic collections of legal rules. And when programming came along a couple of millennia later, it used the word “code” because it basically saw itself as similarly setting up rules for how things should work, except now the things had to do with the operation of computers rather than the conduct of worldly affairs.

But now, with our knowledge-based computer language and the idea of a symbolic discourse language, what we’re trying to do is to make it so we can talk about a broad range of worldly affairs in the same kind of way that we talk about computational processes, so that we can put all those legal codes and contracts into computational form.

Code versus Language

How should we think about symbolic discourse language compared to ordinary natural language? In a sense, the symbolic discourse language is a representation in which all the nuance and “poetry” have been “crushed” out of the natural language. The symbolic discourse language is precise, but it’ll almost inevitably lose the nuance and poetry of the original natural language.

If someone says “2+2” to Wolfram|Alpha, it’ll dutifully answer “4”. But what if instead they say, “hey, will you work out 2+2 for me”. Well, that sets up a different mood. But Wolfram|Alpha will take that input and convert it to exactly the same symbolic form as “2+2”, and similarly just respond “4”.
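The effect can be imitated with a deliberately crude Python sketch. This is not how Wolfram|Alpha works: the regular expression and the tuple-based symbolic form are assumptions chosen only to show two differently phrased inputs collapsing to the same representation.

```python
import re

# Toy pattern: find "<integer> + <integer>" anywhere in the input.
_SUM = re.compile(r"(\d+)\s*\+\s*(\d+)")


def to_symbolic(text):
    """Strip the pleasantries and keep only the computable content."""
    match = _SUM.search(text)
    if match is None:
        raise ValueError(f"could not interpret {text!r}")
    return ("Plus", int(match.group(1)), int(match.group(2)))


def evaluate(expr):
    """Evaluate the symbolic form; only Plus is supported in this sketch."""
    head, a, b = expr
    if head != "Plus":
        raise ValueError(f"unsupported head {head!r}")
    return a + b
```

Both "2+2" and "hey, will you work out 2+2 for me" come out as `("Plus", 2, 2)`, and the mood of the request is gone by the time anything is computed.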

This is exactly the kind of thing that’ll happen all the time with symbolic discourse language. And if the goal is to answer precise questions (or, for that matter, to create a precise legal contract), it’s exactly what one wants. One just needs the hard content that will actually have a consequence for what one’s trying to do, and in this case one doesn’t need the “extras” or “pleasantries”.

Of course, what one chooses to capture depends on what one’s trying to do. If one’s trying to get psychological information, then the “mood” of a piece of natural language can be very important. Those “exclamatory statements” (like expletives) carry meaning one cares about. But one can still perfectly well imagine capturing things like that in a symbolic way—for example by having an “emotion track” in one’s symbolic discourse language. (Very coarsely, this might be represented by sentiment or by position in an emotion space—or, for that matter, by a whole symbolic language derived, say, from emoji.)

In actual human communication through natural language, “meaning” is a slippery concept, that inevitably depends on the context of the communication, the history of whoever is communicating, and so on. My notion of a symbolic discourse language isn’t to try to magically capture the “true meaning” of a piece of natural language. Instead, my goal is just to capture some meaning that one can then compute with.

For convenience, one might choose to start with natural language, and then try to translate it into the symbolic discourse language. But the point is for the symbolic discourse language to be the real representation: the natural language is just a guide for trying to generate it. And in the end, the notion is that if one really wants to be sure one’s accurate in what one’s saying, one should say it directly in the symbolic discourse language, without ever using natural language.

Back in the 1600s, one of Leibniz’s big concerns was to have a representation that was independent of which natural language people were using (French, German, Latin, etc.). And one feature of a symbolic discourse language is that it has to operate “below” the level of specific natural languages.

There’s a rough kind of universality among human languages, in that it seems to be possible to represent any human concept at least to some approximation in any language. But there are plenty of nuances that are extremely hard to translate—between different languages, or the different cultures that surround them (or even the same language at different times in history). But in the symbolic discourse language, one’s effectively “crushing out” these differences—and getting something that is precise, even though it typically won’t correspond exactly to any particular human natural language.

A symbolic discourse language is about representing things in the world. Natural language is just one way to try to describe those things. But there are others. For example, one might give a picture. One could try to describe certain features of the picture in natural language (“a cat with a hat on its head”)—or one could go straight from the picture to the symbolic discourse language.

In the example of a picture, it’s very obvious that the symbolic discourse language isn’t going to capture everything. Maybe it could capture something like “he is taking the diamond”. But it’s not going to specify the color of every pixel, and it’s not going to describe all conceivable features of a scene at every level of detail.

In some sense, what the symbolic discourse language is doing is to specify a model of the system it’s describing. And like any model, it’s capturing some features, and idealizing others away. But the importance of it is that it provides a solid foundation on which computations can be done, conclusions can be drawn, and actions can be taken.

Why Now?

I’ve been thinking about creating what amounts to a general symbolic discourse language for nearly 40 years. But it’s only recently—with the current state of the Wolfram Language—that I’ve had the framework to actually do it. And it’s also only recently that I’ve understood how to think about the problem in a sufficiently practical way.

Yes, it’s nice in principle to have a symbolic way to represent things in the world. And in specific cases—like answering questions in Wolfram|Alpha—it’s completely clear why it’s worth doing this. But what’s the point of dealing with more general discourse? Like, for example, when do we really want to have a “general conversation” with a machine?

The Turing test says that being able to do this is a sign of achieving general AI. But “general conversations” with machines—without any particular purpose in mind—so far usually seem in practice to devolve quickly into party tricks and Easter eggs. At least that’s our experience looking at interactions people have with Wolfram|Alpha, and it also seems to be the experience with decades of chatbots and the like.

But the picture quickly changes if there’s a purpose to the conversation: if you’re actually trying to get the machine to do something, or learn something from the machine. Still, in most of these cases, there’s no real reason to have a general representation of things in the world; it’s sufficient just to represent specific machine actions, particular customer service goals, or whatever. But if one wants to tackle the general problem of law and contracts, it’s a different story. Because inevitably one’s going to have to represent the full spectrum of human affairs and issues. And so now there’s a definite goal to having a symbolic representation of the world: one needs it to be able to say what should happen and have machines understand it.

Sometimes it’s useful to do that because one wants the machines just to be able to check whether what was supposed to happen actually did; sometimes one wants to actually have the machines automatically enforce or do things. But either way, one needs the machine to be able to represent general things in the world—and so one needs a symbolic discourse language to be able to do this.

Some History

In a sense, it’s a very obvious idea to have something like a symbolic discourse language. And indeed it’s an idea that’s come up repeatedly across the course of centuries. But it’s proved a very difficult idea to make work, and it has a history littered with (sometimes quite wacky) failures.

Things in a sense started well. Back in antiquity, logic as discussed by Aristotle provided a very restricted example of a symbolic discourse language. And when the formalism of mathematics began to emerge it provided another example of a restricted symbolic discourse language.

But what about more general concepts in the world? There’d been many efforts—between the Tetractys of the Pythagoreans and the I Ching of the Chinese—to assign symbols or numbers to a few important concepts. But around 1300 Ramon Llull took it further, coming up with a whole combinatorial scheme for representing concepts—and then trying to implement this with circles of paper that could supposedly mechanically determine the validity of arguments, particularly religious ones.

Four centuries later, Gottfried Leibniz was an enthusiast of Llull’s work, at first imagining that perhaps all concepts could be converted to numbers and truth then determined by doing something like factoring into primes. Later, Leibniz started talking about a characteristica universalis (or, as Descartes called it, an “alphabet of human thoughts”), essentially a universal symbolic language. But he never really tried to construct such a thing, instead chasing what one might consider “special cases”, including the one that led him to calculus.

With the decline of Latin as the universal natural language in the 1600s, particularly in areas like science and diplomacy, there had already been efforts to invent “philosophical languages” (as they were called) that would represent concepts in an abstract way, not tied to any specific natural language. The most advanced of these was by John Wilkins—who in 1668 produced a book cataloging over 10,000 concepts and representing them using strange-looking glyphs, with a rendering of the Lord’s Prayer as an example.

In some ways these efforts evolved into the development of encyclopedias and later thesauruses, but as language-like systems, they basically went nowhere. Two centuries later, though, as the concept of internationalization spread, there was a burst of interest in constructing new, country-independent languages—and out of this emerged Volapük and then Esperanto. These languages were really just artificial natural languages; they weren’t an attempt to produce anything like a symbolic discourse language. I always used to enjoy seeing signs in Esperanto at European airports, and was disappointed in the 1980s when these finally disappeared. But, as it happens, right around that time, there was another wave of language construction. There were languages like Lojban, intended to be as unambiguous as possible, and ones like the interestingly minimal Toki Pona intended to support the simple life, as well as the truly bizarre Ithkuil, intended to encompass the broadest range of linguistic and supposedly cognitive structures.

Along the way, there were also attempts to simplify languages like English by expressing everything in terms of 1000 or 2000 basic words (instead of the usual 20,000–30,000)—as in the “Simple English” version of Wikipedia or the xkcd Thing Explainer.

There were a few, more formal, efforts. One example was Hans Freudenthal’s 1960 Lincos “language for cosmic intercourse” (i.e. communication with extraterrestrials) which attempted to use the notation of mathematical logic to capture everyday concepts. In the early days of the field of artificial intelligence, there were plenty of discussions of “knowledge representation”, with approaches based variously on the grammar of natural language, the structure of predicate logic or the formalism of databases. Very few large-scale projects were attempted (Doug Lenat’s Cyc being a notable counterexample), and when I came to develop Wolfram|Alpha I was disappointed at how little of relevance to our needs seemed to have emerged.

In a way I find it remarkable that something as fundamental as the construction of a symbolic discourse language should have had so little serious attention paid to it in the past. But at some level it’s not so surprising. It’s a difficult, large project, and it somehow lies in between established fields. It’s not a linguistics project. Yes, it may ultimately illuminate how languages work, but that’s not its main point. It’s not a computer science project because it’s really about content, not algorithms. And it’s not a philosophy project because it’s mostly about specific nitty-gritty and not much about general principles.

There’ve been a few academic efforts in the last half century or so, discussing ideas like “semantic primes” and “natural semantic metalanguage”. Usually such efforts have tried to attach themselves to the field of linguistics—but their emphasis on abstract meaning rather than pure linguistic structure has put them at odds with prevailing trends, and none have turned into large-scale projects.

Outside of academia, there’s been a steady stream of proposals—sometimes promoted by wonderfully eccentric individuals—for systems to organize and name concepts in the world. It’s not clear how far this pursuit has come since Ramon Llull—and usually it’s only dealing with pure ontology, and never with full meaning of the kind that can be conveyed in natural language.

I suppose one might hope that with all the recent advances in machine learning there’d be some magic way to automatically learn an abstract representation for meaning. And, yes, one can take Wikipedia, for example, or a text corpus, and use dimension reduction to derive some effective “space of concepts”. But, not too surprisingly, simple Euclidean space doesn’t seem to be a very good model for the way concepts relate (one can’t even faithfully represent graph distances). And even the problem of taking possible meanings for words—as a dictionary might list them—and breaking them into clusters in a space of concepts doesn’t seem to be easy to do effectively.
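To make the parenthetical about graph distances concrete, here is a small worked example (an added illustration, not part of the original argument). Take a star graph with a center $c$ and three leaves $x_1, x_2, x_3$, with every edge of length 1, and suppose points $p_c, p_1, p_2, p_3$ in some Euclidean space reproduced all the graph distances exactly:

```latex
\begin{align*}
&\|p_i - p_c\| = 1 \quad (i = 1,2,3), \qquad \|p_i - p_j\| = 2 \quad (i \neq j). \\
&\|p_1 - p_2\| = 2 = \|p_1 - p_c\| + \|p_c - p_2\|
  \;\Longrightarrow\; p_c = \tfrac{1}{2}(p_1 + p_2), \\
&\|p_1 - p_3\| = 2 = \|p_1 - p_c\| + \|p_c - p_3\|
  \;\Longrightarrow\; p_c = \tfrac{1}{2}(p_1 + p_3), \\
&\text{hence } p_2 = p_3, \text{ contradicting } \|p_2 - p_3\| = 2.
\end{align*}
```

Equality in the triangle inequality forces the center to be the midpoint of each pair of leaves, which is impossible for three distinct leaves. So even this four-node graph has no distance-preserving embedding in Euclidean space of any dimension.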

Still, as I’ll discuss later, I think there’s a very interesting interplay between symbolic discourse language and machine learning. But for now my conclusion is that there’s not going to be any alternative but to use human judgment to construct the core of any symbolic discourse language that’s intended for humans to use.

Contracts into Code

But let’s get back to contracts. Today, there are hundreds of billions of them being signed every year around the world (and vastly more being implicitly entered into), though the number of “original” ones that aren’t just simple modifications is probably just in the millions (and is perhaps comparable to the number of original computer programs or apps being written).

So can these contracts be represented in precise symbolic form, as Leibniz hoped 300 years ago? Well, if we can develop a decently complete symbolic discourse language, it should be possible. (Yes, every contract would have to be defined relative to some underlying set of “governing law” rules, etc., that are in some ways like the built-in functions of the symbolic discourse language.)

But what would it mean? Among other things, it would mean that contracts themselves would become computable things. A contract would be converted to a program in the symbolic discourse language. And one could do abstract operations just on this program. And this means one can imagine formally determining—in effect through a kind of generalization of logic—whether, say, a given contract has some particular implication, could ever lead to some particular outcome, or is equivalent to some other contract.
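As a rough illustration of what such abstract operations might look like, here is a minimal Python sketch. It is not a symbolic discourse language: it assumes a toy world in which a contract is just a Boolean function of a few yes/no facts, so that implication and equivalence can be checked by exhaustive enumeration. All the names (`FACTS`, `contract_a`, `implies`, and so on) are hypothetical.

```python
from itertools import product

# Hypothetical yes/no facts that a toy contract can refer to.
FACTS = ("delivered", "on_time", "paid")

def all_worlds():
    """Yield every assignment of True/False to the facts."""
    for values in product([False, True], repeat=len(FACTS)):
        yield dict(zip(FACTS, values))

# Each toy contract returns True when it is satisfied in a given world.
def contract_a(w):
    # If the goods were delivered on time, payment must have been made.
    return w["paid"] or not (w["delivered"] and w["on_time"])

def contract_b(w):
    # The same obligation, written differently.
    return not w["delivered"] or not w["on_time"] or w["paid"]

def contract_c(w):
    # A stricter contract: payment is due on any delivery, timely or not.
    return w["paid"] or not w["delivered"]

def equivalent(c1, c2):
    """True if the two contracts agree in every possible world."""
    return all(c1(w) == c2(w) for w in all_worlds())

def implies(c1, c2):
    """True if every world that satisfies c1 also satisfies c2."""
    return all(c2(w) for w in all_worlds() if c1(w))
```

Here `equivalent(contract_a, contract_b)` holds, and `contract_c` implies `contract_a` but not the other way round. Enumeration only works because this toy world has eight possible states; with richer constructs the same questions can become much harder.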

Ultimately, though, there’s a theoretical problem with this. Because questions like this can run into issues of formal undecidability, which means there’s no guarantee that any systematic finite computation will answer them. The same problem arises in reasoning about typical software programs people write, and in practice it’s a mixed bag, with some things being decidable, and others not.

Of course, even in the Wolfram Language as it is today, there are plenty of things (such as the very basic “are these expressions equal?”) that are ultimately in principle undecidable. And there are certainly questions one can ask that run right into such issues. But an awful lot of the kinds of questions that people naturally ask turn out to be answerable with modest amounts of computation. And I wouldn’t be surprised if this were true for questions about contracts too. (It’s worth noting that human-formulated questions tend to run into undecidability much less than questions picked, say at random, from the whole computational universe of possibilities.)

If one has contracts in computational form, there are other things one can expect to do too. Like to be able to automatically work out what the contracts imply for a large range of possible inputs. The 1980s revolution in quantitative finance started when it became clear one could automatically compute distributions of outcomes for simple options contracts. If one had lots (perhaps billions) of contracts in computational form, there’d be a lot more that could be done along these lines—and no doubt, for better or worse, whole new areas of financial engineering that could be developed.
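For a sense of what computing a distribution of outcomes involves, here is a minimal Python sketch for one simple options contract, a European call. It assumes the textbook model in which the price at expiry is lognormal; the function names and all parameter values are made up for illustration.

```python
import math
import random

def simulate_call_payoffs(spot, strike, rate, vol, years, n_paths, seed=0):
    """Sample payoffs max(S_T - strike, 0) of a European call.

    Assumes, for illustration only, that the terminal price S_T is
    lognormal (the standard geometric Brownian motion model).
    """
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol ** 2) * years
    scale = vol * math.sqrt(years)
    payoffs = []
    for _ in range(n_paths):
        terminal = spot * math.exp(drift + scale * rng.gauss(0.0, 1.0))
        payoffs.append(max(terminal - strike, 0.0))
    return payoffs

def summarize(payoffs):
    """Return a few summary statistics of the sampled payoffs."""
    ordered = sorted(payoffs)
    n = len(ordered)
    return {
        "mean": sum(ordered) / n,
        "median": ordered[n // 2],
        "share_worthless": sum(1 for p in ordered if p == 0.0) / n,
    }
```

A call such as `summarize(simulate_call_payoffs(100, 105, 0.01, 0.2, 1.0, 2000))` gives a rough picture of how often the contract pays nothing and how large the payoff is on average. The same pattern, run the contract on many sampled inputs and look at the results, would apply to any contract available in computational form.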

Where Do the Inputs Come From?

OK, so let’s say one has a computational contract. What can one directly do with it? Well, it depends somewhat on what the form of its inputs is. One important possibility is that they’re in a sense “born computational”: that they’re immediately statements about a computational system (“how many accesses has this ID made today?”, “what is the ping time for this connection?”, “how much bitcoin got transferred?”, etc.). And in that case, it should be possible to immediately and unambiguously “evaluate” the contract—and find out if it’s being satisfied.
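Here is a minimal Python sketch of what evaluating a contract on “born computational” inputs could look like. The clauses mirror the three example questions above; the class names, field names and limits are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurements:
    """Hypothetical machine-generated inputs to a contract."""
    accesses_today: int
    ping_ms: float
    bitcoin_transferred: float

@dataclass(frozen=True)
class Terms:
    """Hypothetical limits agreed between the parties."""
    max_accesses_per_day: int
    max_ping_ms: float
    price_btc: float

def evaluate(terms, m):
    """Return a dict mapping each clause name to whether it is satisfied."""
    return {
        "access_limit": m.accesses_today <= terms.max_accesses_per_day,
        "latency": m.ping_ms <= terms.max_ping_ms,
        "payment": m.bitcoin_transferred >= terms.price_btc,
    }

def satisfied(terms, m):
    """The contract as a whole is satisfied only if every clause is."""
    return all(evaluate(terms, m).values())
```

Because every input is already a number produced by a computational system, there is nothing to interpret: `evaluate` reports exactly which clauses hold, and `satisfied` gives a single yes or no.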

This is something that’s very useful for lots of purposes—both for humans interacting with machines, and machines interacting with machines. In fact, there are plenty of cases where versions of it are already in use. One can think of computer security provisions such as firewall rules as one example. There are others that are gradually emerging, such as automated SLAs (service-level agreements) and automated terms-of-service. (I’m certainly hoping our company, for example, will be able to make these a routine part of our business practices before too long.)

But, OK, it’s certainly not true that every input for every contract is “born computational”: plenty of inputs have to come from seeing what happens in the “outside” world (“did the person actually go to place X?”, “was the package maintained in a certain environment?”, “did the information get leaked to social media?”, “is the parrot dead?”, etc.). And the first thing to say is that in modern times it’s become vastly easier to automatically determine things about the world, not least because one can just make measurements with sensors. Check the GPS trace. Look at the car counting sensor. And so on. The whole Internet of Things is out there to provide input about the real world for computational contracts.

Having said this, though, there’s still an issue. Yes, with a GPS trace there’s a definite answer (assuming the GPS is working properly) for whether someone or something went to a particular place. But let’s say one’s trying to determine something less obviously numerical. Let’s say, for example, that one’s trying to determine whether a piece of fruit should be considered “Fancy Grade” or not. Well, given some pictures of the piece of fruit an expert can pretty unambiguously tell. But how can we make this computational?

Well, here’s a place where we can use modern machine learning. We can set up some neural net, say in the Wolfram Language, and then show it lots of examples of fruit that’s Fancy Grade and that’s not. And from my experience (and that of our customers!) most of the time we’ll get a system that’s really good at a task like grading fruit. It’ll certainly be much faster than humans, and it’ll probably be more reliable and more consistent too.

And this gives a whole new way to set up contracts about things in the world. Two parties can just agree that the contract should say “if the machine learning system says X then do Y”. In a sense it’s like any other kind of computational contract: the machine learning system is just a piece of code. But it’s a little different. Because normally one expects that one can readily examine everything that a contract says: one can in effect read and understand the code. But with machine learning in the middle, there can no longer be any expectation of that.
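The shape of such a clause can be sketched in a few lines of Python. The point of the sketch is that the contract treats the grader as an opaque function: it depends only on the label that comes out. The `placeholder_grader` below is a deliberately crude stand-in, not a trained neural net, and every name and threshold is hypothetical.

```python
from typing import Callable, Sequence

# A grader maps an image (here just a flat list of pixel brightness
# values in [0, 1]) to a label. In practice this would be a trained
# machine learning system whose internals nobody reads.
Grader = Callable[[Sequence[float]], str]

def placeholder_grader(pixels):
    """Crude stand-in: call the fruit Fancy Grade if few pixels are dark."""
    dark_fraction = sum(1 for p in pixels if p < 0.3) / len(pixels)
    return "Fancy Grade" if dark_fraction < 0.05 else "Not Fancy Grade"

def settle(grader, pixels, fancy_price, standard_price):
    """Contract clause: the price owed depends only on the grader's label."""
    label = grader(pixels)
    price = fancy_price if label == "Fancy Grade" else standard_price
    return {"label": label, "price_owed": price}
```

The `settle` function is easy to read and reason about. What cannot be read off in the same way, once a real trained model is substituted for the placeholder, is why the grader returns the label it does.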

Nobody specifically set up all those millions of numerical weights in the neural net; they were just determined by some approximate and somewhat random process from whatever training data was given. Yes, in principle we can measure everything about what’s happening inside the neural net. But there’s no reason to expect that we’ll ever be able to get an understandable explanation, or prediction, of what the net will do in any particular case. Most likely it’s an example of the phenomenon I call computational irreducibility, which means there really isn’t any way to see what will happen much more efficiently than just by running it.

What’s the difference from asking a human expert, then, whose thought processes one can’t understand? Well, in practice machine learning is much faster so one can make much more use of “expert judgment”. And one can set things up so they’re repeatable, and one can for example systematically test for biases one thinks might be there, and so on.

Of course, one can always imagine cheating the machine learning. If it’s repeatable, one could use machine learning itself to try to learn cases where it would fail. And in the end it becomes rather like computer security, where holes are being found, patches are being applied, and so on. And in some sense this is no different from the typical situation with contracts too: one tries to cover all situations, then it becomes clear that something hasn’t been correctly addressed, and one tries to write a new contract to address it, and so on.

But the important bottom line is that with machine learning one can expect to get “judgment oriented” input into contracts. I expect the typical pattern will be this: in the contract there’ll be something stated in the symbolic discourse language (like “X will personally do Y”). And at the level of the symbolic discourse language there’ll be a clear meaning to this, from which, for example, all sorts of implications can be drawn. But then there’s the question of whether what the contract said is actually what happened in the real world. And, sure, there can be lots of sensor data that gives information on this. But in the end there’ll be a “judgment call” that has to be made. Did the person actually personally do this? Well—like for a remote exam proctoring system—one can have a camera watching the person, one can record their pattern of keystrokes, and maybe even measure their EEG. But something’s got to synthesize this data, and make the judgment call about what happened, and turn this in effect into a symbolic statement. And in practice I expect it will typically end up being a machine learning system that does this.

Smart Contracts

OK, so let’s say we’ve got ways to set up computational contracts. How can we enforce them? Well, ones that basically just involve computational processes can at some level enforce themselves. A particular piece of software can be built to issue licenses only in such-and-such a way. A cloud system can be built to make a download available only if it receives a certain amount of bitcoin. And so on.

But how far do we trust what’s going on? Maybe someone hacked the software, or the cloud. How can we be sure nothing bad has happened? The basic answer is to use the fact that the world is a big place. As a (sometime) physicist it makes me think of measurement in quantum mechanics. If we’re just dealing with a little quantum effect, there’s always interference that can happen. But when we do a real measurement, we’re amplifying that little quantum effect to the point where so many things (atoms, etc.) are involved that it’s unambiguous what happened—in much the same way as the Second Law of Thermodynamics makes it inconceivable that all the air molecules in a room will spontaneously line up on one side.

And so it is with bitcoin, Ethereum, etc. The idea is that some particular thing that happened (“X paid Y such-and-such” or whatever) is shared and recorded in so many places that there can’t be any doubt about it. Yes, it’s in principle possible that all the few thousand places that actually participate in something like bitcoin today could collude to give a fake result. But the idea is that it’s like with gas molecules in a room: the probability is inconceivably small. (As it happens, my Principle of Computational Equivalence suggests that there’s more than an analogy with the gas molecules, and that actually the underlying principles at work are basically exactly the same. And, yes, there are lots of interesting technical details about the operation of distributed blockchain ledgers, distributed consensus protocols, etc., but I’m not going to get into them here.)

It’s popular these days to talk about “smart contracts”. When I talk about “computational contracts” I mean contracts that can be expressed computationally. But by “smart contracts” people usually mean contracts that can both be expressed computationally and execute automatically. Most often the idea is to set up a smart contract in a distributed computation environment like Ethereum, and then to have the code in the contract evaluate based on inputs from the computation environment.

Sometimes the input is intrinsic—like the passage of time (who could possibly tamper with the clock of the whole internet?), or physically generated random numbers. And in cases like this, one has fairly pure smart contracts, say for paying subscriptions, or for running distributed lotteries.

But more often there has to be some input from the outside—from something that happens in the world. Sometimes one just needs public information: the price of a stock, the temperature at a weather station, or a seismic event like a nuclear explosion. But somehow the smart contract needs access to an “oracle” that can give it this information. And conveniently enough, there is one good such oracle available in the world: Wolfram|Alpha. And indeed Wolfram|Alpha is becoming widely used as an oracle for smart contracts. (Yes, our general public terms of service say you currently just shouldn’t rely on Wolfram|Alpha for anything you consider critical—though hopefully soon those terms of service will get more sophisticated, and computational.)
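The role of an oracle can be sketched in Python as follows. This is only an illustration of the pattern: the contract code receives a function it can ask for public information, and acts on the answer. The `fake_oracle` below is a hard-coded stand-in and does not reflect the interface of Wolfram|Alpha or of any real smart contract platform; all names are hypothetical.

```python
from typing import Callable

# An oracle answers a question about the outside world with a number.
Oracle = Callable[[str], float]

def make_frost_insurance(oracle, station, threshold_c, payout):
    """Build a toy contract that pays out if the reported temperature is low."""
    def execute():
        temperature = oracle(f"temperature at {station}")
        return payout if temperature < threshold_c else 0.0
    return execute

def fake_oracle(question):
    """Stand-in for an external data source, with one canned answer."""
    answers = {"temperature at station-7": -2.5}
    return answers[question]
```

With `fake_oracle`, the contract built by `make_frost_insurance(fake_oracle, "station-7", 0.0, 100.0)` pays 100.0 when executed. Everything about the outcome then rests on how far the parties trust whatever stands behind the oracle function.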

But what about non-public information from the outside world? The current thinking for smart contracts tends to be that one has to get humans in the loop to verify the information: that in effect one has to have a jury (or a democracy) to decide whether something is true. But is that really the best one can do? I tend to suspect there’s another path, that’s like using machine learning to inject human-like judgment into things. Yes, one can use people, with all their inscrutable and hard-to-systematically-influence behavior. But what if one replaces those people in effect by AIs—or even a collection of today’s machine-learning systems?

One can think of a machine-learning system as being a bit like a cryptosystem. To attack it and spoof its input one has to do something like inverting how it works. Well, given a single machine-learning system there’s a certain effort needed to achieve this. But if one has a whole collection of sufficiently independent systems, the effort goes up. It won’t be good enough just to change a few parameters in the system. But if one just goes out into the computational universe and picks systems at random then I think one can expect to have the same kind of independence as by having different people. (To be fair, I don’t yet quite know how to apply the mining of the computational universe that I’ve done for programs like cellular automata to the case of systems like neural nets.)

There’s another point as well: if one has a sufficiently dense net of sensors in the world, then it becomes increasingly easy to be sure about what’s happened. If there’s just one motion sensor in a room, it might be easy to cover it. And maybe even if there are several sensors, it’s still possible to avoid them, Mission Impossible-style. But if there are enough sensors, then by synthesizing information from them one can inevitably build up an understanding of what actually happened. In effect, one has a model of how the world works, and with enough sensors one can validate that the model is correct.

It’s not surprising, but it always helps to have redundancy. More nodes to ensure the computation isn’t tampered with. More machine-learning algorithms to make sure they aren’t spoofed. More sensors to make sure they’re not fooled. But in the end, there has to be something that says what should happen—what the contract is. And the contract has to be expressed in some language in which there are definite concepts. So somehow from the various redundant systems one has in the world, one has to make a definite conclusion—one has to turn the world into something symbolic, on which the contract can operate.
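One very simple way to go from redundant systems to a definite conclusion is a quorum rule, sketched below in Python. This is an assumption made for illustration, not a description of how any particular blockchain, sensor network or machine-learning ensemble works; the function name and the default two-thirds quorum are hypothetical.

```python
from collections import Counter

def definite_conclusion(reports, quorum_num=2, quorum_den=3):
    """Turn redundant reports into one symbolic statement, or None.

    A value is accepted only if at least quorum_num/quorum_den of the
    independent reports agree on it. Integer arithmetic is used so the
    threshold comparison is exact.
    """
    if not reports:
        return None
    value, count = Counter(reports).most_common(1)[0]
    if count * quorum_den >= quorum_num * len(reports):
        return value
    return None
```

Given five reports of which four say `"went_to_X"`, the function returns `"went_to_X"`; given an even split it returns `None`, signalling that the world has not yet been turned into something the contract can operate on.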

Writing Computational Contracts

Let’s say we have a good symbolic discourse language. Then how should contracts actually get written in it?

One approach is to take existing contracts written in English or any other natural language, and try to translate (or parse) them into the symbolic discourse language. Well, what will happen is somewhat like what happens with Wolfram|Alpha today. The translator will not know exactly what the natural language was supposed to mean, and so it will give several possible alternatives. Maybe there was some meaning that the original writer of the natural-language contract had in mind. But maybe the “poetry” of that meaning can’t be expressed in the symbolic discourse language: it requires something more definite. And a human is going to have to decide which alternative to pick.

Translating from natural-language contracts may be a good way to start, but I suspect it will quickly give way to writing contracts directly in the symbolic discourse language. Today lawyers have to learn to write legalese. In the future, they’re going to have to learn to write what amounts to code: contracts expressed precisely in a symbolic discourse language.

One might think that writing everything as code, rather than natural-language legalese, would be a burden. But my guess is that it will actually be a great benefit. And it’s not just because it will let contracts operate more easily. It’s also that it will help lawyers think better about contracts. It’s an old claim (the Sapir–Whorf hypothesis) that the language one uses affects the way one thinks. And this is no doubt somewhat true for natural languages. But in my experience it’s dramatically true for computer languages. And indeed I’ve been amazed over the years at how my thinking has changed as we’ve added more to the Wolfram Language. When I didn’t have a way to express something, it didn’t enter my thinking. But once I had a way to express it, I could think in terms of it.

And so it will be, I believe, for legal thinking. When there’s a precise symbolic discourse language, it’ll become possible to think more clearly about all sorts of things.

Of course, in practice it’ll help that there’ll no doubt be all sorts of automated annotation: “if you add that clause, it’ll imply X, Y and Z”, etc. It’ll also help that it’ll routinely be possible to take some contract and simulate its consequences for a range of inputs. Sometimes one will want statistical results (“is this biased?”). Sometimes one will want to hunt for particular “bugs” that will only be found by trying lots of inputs.
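Hunting for “bugs” by trying lots of inputs can be sketched in Python like this. The toy clause reduces a supplier’s payment for each day of delay, and the search looks for inputs where the result breaks a property the parties presumably intended. The clause, the property and all names are hypothetical.

```python
from itertools import product

def payment(price, days_late, penalty_per_day=5):
    """Toy clause: the supplier's payment is reduced for each day late."""
    return price - penalty_per_day * days_late

def find_violations(clause, prices, delays, is_acceptable):
    """Try every combination of inputs and collect those that break the property."""
    return [
        (price, days)
        for price, days in product(prices, delays)
        if not is_acceptable(price, clause(price, days))
    ]

def never_negative(price, paid):
    """Intended property: the payment stays between zero and the full price."""
    return 0 <= paid <= price
```

Running `find_violations(payment, [100], range(0, 31), never_negative)` reports every delay from 21 to 30 days, where the clause as written would have the supplier paying the buyer. A clause capping the penalty is the kind of fix that such a search might prompt.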

Yes, one can read a contract in natural language, like one can read a math paper. But if one really wants to know its implications one needs it in computational form, so one can run it and see what it implies—and also so one can give it to a computer to implement.

The World with Computational Contracts

Back in ancient Babylon it was a pretty big deal when there started to be written laws like the Code of Hammurabi. Of course, with very few people able to read, there was all sorts of clunkiness at first—like having people recite the laws in order from memory. Over the centuries things got more streamlined, and then about 500 years ago, with the advent of widespread literacy, laws and contracts started to be able to get more complex (which among other things allowed them to be more nuanced, and to cover more situations).

In recent decades the trend has accelerated, particularly now that it’s so easy to copy and edit documents of any length. But things are still limited by the fact that humans are in the loop, authoring and interpreting the documents. Back 50 years ago, pretty much the only way to define a procedure for anything was to write it down, and have humans implement it. But then along came computers, and programming. And very soon it started to be possible to define vastly more complex procedures—to be implemented not by humans, but instead by computers.

And so, I think, it will be with law. Once computational law becomes established, the complexity of what can be done will increase rapidly. Typically a contract defines some model of the world, and specifies what should happen in different situations. Today the logical and algorithmic structure of models defined by contracts still tends to be fairly simple. But with computational contracts it’ll be feasible for them to be much more complex—so that they can for example more faithfully capture how the world works.

Of course, that just makes defining what should happen even more complex—and before long it might feel a bit like constructing an operating system for a computer, that tries to cover all the different situations the computer might find itself in.

In the end, though, one’s going to have to say what one wants. One might be able to get a certain distance by just giving specific examples. But ultimately I think one’s going to have to use a symbolic discourse language that can express a higher level of abstraction.

Sometimes one will be able to just write everything in the symbolic discourse language. But often, I suspect, one will use the symbolic discourse language to define what amount to goals, and then one will have to use machine-learning kinds of methods to fill in how to define a contract that actually achieves them.

And as soon as there’s computational irreducibility involved, it’ll typically be impossible to know for sure that there are no bugs, or “unintended consequences”. Yes, one can do all kinds of automated tests. But in the end it’s theoretically impossible to have any finite procedure that can guarantee to check all possibilities.

Today there are plenty of legal situations that are too complex to handle without expert lawyers. And in a world where computational law is common, it won’t just be convenient to have computers involved, it’ll be necessary.

In a sense it’s similar to what’s already happened in many areas of engineering. Back when humans had to design everything themselves, humans could typically understand the structures that were being built. But once computers are involved in design it becomes inevitable that they’re needed in figuring out how things work too.

Today a fairly complex contract might involve a hundred pages of legalese. But once there’s computational law—and particularly contracts constructed automatically from goals—the lengths are likely to increase rapidly. At some level it won’t matter, though—just as it doesn’t really matter how long the code of a program one’s using is. Because the contract will in effect just be run automatically by computer.

Leibniz saw computation as a simplifying element in the practice of law. And, yes, some things will become simpler and better defined. But a vast ocean of complexity will also open up.

What Does It Mean for AIs?

How should one tell an AI what to do? Well, you have to have some form of communication that both humans and AIs can understand—and that is rich enough to describe what one wants. And as I’ve described elsewhere, what I think this basically means is that one has to have a knowledge-based computer language—which is precisely what the Wolfram Language is—and ultimately one needs a full symbolic discourse language.

But, OK, so one tells an AI to do something, like “go get some cookies from the store”. But what one says inevitably won’t be complete. The AI has to operate within some model of the world, and with some code of conduct. Maybe it can figure out how to steal the cookies, but it’s not supposed to do that; presumably one wants it to follow the law, or a certain code of conduct.

And this is where computational law gets really important: because it gives us a way to provide that code of conduct in a way that AIs can readily make use of.
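To make this a little more concrete, here is a deliberately tiny sketch in Python of what it might look like for a code of conduct to be something an AI can "readily make use of". Everything in it is hypothetical: the `Plan` type, the action labels and the two rules are invented purely for illustration, and a real symbolic discourse language would need to be vastly richer. The only point is that once the rules are computable, filtering candidate plans against them becomes mechanical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional

@dataclass(frozen=True)
class Plan:
    """A candidate way of achieving a goal (hypothetical representation)."""
    name: str
    cost: float                 # made-up effort score; lower is "easier"
    actions: FrozenSet[str]     # symbolic labels for what the plan involves

# A rule takes a plan and says whether the plan is permitted.
Rule = Callable[[Plan], bool]

def no_theft(plan: Plan) -> bool:
    return "take_without_paying" not in plan.actions

def no_deception(plan: Plan) -> bool:
    return "lie" not in plan.actions

# Illustrative stand-in for a code of conduct: just a list of rules.
CODE_OF_CONDUCT: List[Rule] = [no_theft, no_deception]

def choose_plan(plans: List[Plan], rules: List[Rule]) -> Optional[Plan]:
    """Return the cheapest plan that every rule permits, or None."""
    permitted = [p for p in plans if all(rule(p) for rule in rules)]
    return min(permitted, key=lambda p: p.cost, default=None)

plans = [
    Plan("steal cookies", 1.0, frozenset({"enter_store", "take_without_paying"})),
    Plan("buy cookies", 3.0, frozenset({"enter_store", "pay"})),
]

best = choose_plan(plans, CODE_OF_CONDUCT)
print(best.name if best else "no permitted plan")
```

Without any rules the cheapest plan is the theft; with the rules in place it is filtered out and the purchase is chosen. The hard part, of course, is not the filtering but writing rules that say what one actually means.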

In principle, we could have AIs ingest the complete corpus of laws and historical cases and so on, and try to learn from these examples. But as AIs become more and more important in our society, it’s going to be necessary to define all sorts of new laws, and many of these are likely to be “born computational”, not least, I suspect, because they’ll be too algorithmically complex to be usefully described in traditional natural language.

There’s another problem too: we really don’t just want AIs to follow the letter of the law (in whatever venue they happen to be), we want them to behave ethically too, whatever that may mean. Even if it’s within the law, we probably don’t want our AIs lying and cheating; we want them somehow to enhance our society along the lines of whatever ethical principles we follow.

Well, one might think, why not just teach AIs ethics like we could teach them laws? In practice, it’s not so simple. Because whereas laws have been somewhat decently codified, the same can’t be said for ethics. Yes, there are philosophical and religious texts that talk about ethics. But they’re a lot vaguer and less extensive than what exists for law.

Still, if our symbolic discourse language is sufficiently complete, it certainly should be able to describe ethics too. And in effect we should be able to set up a system of computational laws that defines a whole code of conduct for AIs.

But what should it say? One might have a few immediate ideas. Perhaps one could combine all the ethical systems of the world. Obviously hopeless. Perhaps one could have the AIs just watch what humans do and learn their system of ethics from it. Similarly hopeless. Perhaps one could try something more local, where the AIs switch their behavior based on geography, cultural context, etc. (think “protocol droid”). Perhaps useful in practice, but hardly a complete solution.

So what can one do? Well, perhaps there are a few principles one might agree on. For example, at least the way we think about things today, most of us don’t want humans to go extinct (of course, maybe in the future, having mortal beings will be thought too disruptive, or whatever). And actually, while most people think there are all sorts of things wrong with our current society and civilization, people usually don’t want it to change too much, and they definitely don’t want change forced upon them.

So what should we tell the AIs? It would be wonderful if we could just give the AIs some simple set of almost axiomatic principles that would make them always do what we want. Maybe they could be based on Asimov’s Three Laws of Robotics. Maybe they could be something seemingly more modern based on some kind of global optimization. But I don’t think it’s going to be that easy.

The world is a complicated place; if nothing else, that’s basically guaranteed by the phenomenon of computational irreducibility. And it’s pretty much inevitable that there’s not going to be any finite procedure that’ll force everything to “come out the way one wants” (whatever that may be).

Let me take a somewhat abstruse, but well defined, example from mathematics. We think we know what integers are. But to really be able to answer all questions about integers (including about infinite collections of them, etc.) we need to set up axioms that define how integers work. And that’s what Giuseppe Peano tried to do in the late 1800s. For a while it looked good, but then in 1931 Kurt Gödel surprised the world with his Incompleteness Theorem, which implied, among other things, that try as one might, there was never going to be a finite set of axioms that would define the integers as we expect them to be, and nothing else.

In some sense, Peano’s original axioms actually got quite close to defining just the integers we want. But Gödel showed that they also allow bizarre non-standard integers, where for example the operation of addition isn’t finitely computable.

Well, OK, that’s abstract mathematics. What about the real world? Well, one of the things that we’ve learned since Gödel’s time is that the real world can be thought of in computational terms, pretty much just like the mathematical systems Gödel considered. And in particular, one can expect the same phenomenon of computational irreducibility (which itself is closely related to Gödel’s Theorem). And the result of this is that whatever simple intuitive goal we may define, it’s pretty much inevitable we’ll have to build up what amount to an arbitrarily complicated collection of rules to try to achieve it—and whatever we do, there’ll always be at least some “unintended consequences”.

None of this should really come as much of a surprise. After all, if we look at actual legal systems as they’ve evolved over the past couple of thousand years, there always end up being a lot of laws. It’s not like there’s a single principle from which everything else can be derived; there inevitably end up being lots of different situations that have to be covered.

Principles of the World?

But is all this complexity just a consequence of the “mechanics” of how the world works? Imagine—as one expects—that AIs get more and more powerful. And that more and more of the systems of the world, from money supplies to border controls, are in effect put in the hands of AIs. In a sense, then, the AIs play a role a little bit like governments, providing an infrastructure for human activities.

So, OK, perhaps we need a “constitution” for the AIs, just like we set up constitutions for governments. But again the question comes: what should the constitution have in it?

Let’s say that the AIs could mold human society in pretty much any way. How would we want it molded? Well, that’s an old question in political philosophy, debated since antiquity. At first an idea like utilitarianism might sound good: somehow maximize the well-being of as many people as possible. But imagine actually trying to do this with AIs that in effect control the world. Immediately one is thrust into concrete versions of questions that philosophers and others have debated for centuries. Let’s say one can sculpt the probability distribution for happiness among people in the world. Well, now we’ve got to get precise about whether it’s the mean or the median or the mode or a quantile or, for that matter, the kurtosis of the distribution that we’re trying to maximize.
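A small numerical sketch shows how much the choice matters. The two "worlds" below are made-up happiness scores for ten people each (the numbers and the scale are assumptions chosen only to make the point), and the three criteria are the mean, the median and the score of the worst-off person, which is in effect the lowest quantile.

```python
import statistics
from typing import Dict, List

# Hypothetical happiness scores for ten people in each of two worlds.
worlds: Dict[str, List[float]] = {
    "A": [6, 6, 6, 6, 6, 6, 6, 6, 6, 6],     # everyone moderately content
    "B": [4, 4, 4, 4, 4, 4, 4, 4, 4, 30],    # most worse off, one ecstatic
}

def summarize(values: List[float]) -> Dict[str, float]:
    """Compute a few candidate quantities one might try to maximize."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "worst_off": min(values),
    }

def preferred(criterion: str) -> str:
    """Name of the world that scores highest on the given criterion."""
    return max(worlds, key=lambda name: summarize(worlds[name])[criterion])

for criterion in ("mean", "median", "worst_off"):
    print(criterion, "prefers world", preferred(criterion))
```

Maximizing the mean picks world B, while maximizing the median or the worst-off score picks world A. Nothing in the arithmetic says which criterion is the right one.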

No doubt one can come up with rhetoric that argues for some particular choice. But there just isn’t an abstract “right answer”. Yes, we can have a symbolic discourse language that expresses any choice. But there’s no mathematical derivation of the answer and there’s no law of nature that forces a particular answer. I suppose there could be a “best answer given our biological nature”. But as things advance, this won’t be on solid ground either, as we increasingly manage to use technology to transcend the biology that evolution has delivered to us.

Still, we might argue, there’s at least one constraint: we don’t want a scheme where we’ll go extinct—and where nothing will in the end exist. Even this is going to be a complicated thing to discuss, because we need to say what the “we” here is supposed to be: just how “evolved” relative to the current human condition can things be, and not consider “us” to have gone extinct?

But even independent of this, there’s another issue: given any particular setup, computational irreducibility can make it in a sense irreducibly difficult to find out its consequences. And so in particular, given any specific optimization criterion (or constitution), there may be no finite procedure that will determine whether it allows for infinite survival, or whether in effect it implies civilization will “halt” and go extinct.
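One can get a feel for this with a toy model. The sketch below runs an elementary cellular automaton on an unbounded line of cells and treats "all cells white" as extinction. This is only an analogy (the function names and the step budget are my own choices for illustration), but it shows the asymmetry: a bounded simulation can confirm that extinction happened, yet if it runs out of steps it has learned nothing definite about what happens later.

```python
from typing import Optional, Set

def step(cells: Set[int], rule: int) -> Set[int]:
    """One step of an elementary cellular automaton on an unbounded line.

    `cells` holds the positions of black cells. The rule number must be
    even, so that a white neighborhood stays white and the set stays finite.
    """
    if not cells:
        return set()
    new_cells = set()
    for p in range(min(cells) - 1, max(cells) + 2):
        # Encode (left, center, right) as a number from 0 to 7.
        neighborhood = 4 * ((p - 1) in cells) + 2 * (p in cells) + ((p + 1) in cells)
        if (rule >> neighborhood) & 1:
            new_cells.add(p)
    return new_cells

def extinction_step(rule: int, cells: Set[int], max_steps: int) -> Optional[int]:
    """Return the step at which all cells die, or None if undetermined.

    None does NOT mean the pattern survives forever, only that no
    extinction was seen within `max_steps` steps.
    """
    if rule % 2 != 0:
        raise ValueError("rule must be even so that white space stays white")
    current = set(cells)
    if not current:
        return 0
    for t in range(1, max_steps + 1):
        current = step(current, rule)
        if not current:
            return t
    return None

print(extinction_step(128, {0, 1, 2}, max_steps=100))  # dies out quickly
print(extinction_step(30, {0}, max_steps=100))         # no verdict in budget
```

For these particular rules one can reason out the answer by hand, but in general there is no budget large enough to turn "not yet extinct" into "never extinct".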

OK, so things are complicated. What can one actually do? For a little while there’ll probably be the notion that AIs must ultimately have human owners, who must act according to certain principles, following the usual way human society operates. But realistically this won’t last long.

Who would be responsible for a public-domain AI system that’s spread across the internet? What happens when the bots it spawns start misbehaving on social media (yes, the notion that social media accounts are just for humans will soon look very “early 21st century”)?

Of course, there’s an important question of why AIs should “follow the rules” at all. After all, humans certainly don’t always do that. It’s worth remembering, though, that we humans are probably a particularly difficult case: after all, we’re the product of a multibillion-year process of natural selection, in which there’s been a continual competitive struggle for survival. AIs are presumably coming into the world in very different circumstances, and without the same need for “brutish instincts”. (Well, I can’t help thinking of AIs from different companies or countries being imbued by their creators with certain brutish instincts, but that’s surely not a necessary feature of AI existence.)

In the end, though, the best hope for getting AIs to “follow the rules” is probably by more or less the same mechanism that seems to maintain human society today: that following the rules is the way some kind of dynamic equilibrium is achieved. But if we can get the AIs to “follow the rules”, we still have to define what the rules—the AI Constitution—should be.

And, of course, this is a hard problem, with no “right answer”. But perhaps one approach is to see what’s happened historically with humans. And one important and obvious thing is that there are different countries, with different laws and customs. So perhaps at the very least we have to expect that there’d be multiple AI Constitutions, not just one.

Even looking at countries today, an obvious question is how many there should be. Is there some easy way to say that—with technology as it exists, for example—7 billion people should be expected to organize themselves into about 200 countries?

It sounds a bit like asking how many planets the solar system should end up with. For a long time this was viewed as a “random fact of nature” (and widely used by philosophers as an example of something that, unlike 2+2=4, doesn’t “have to be that way”). But particularly having seen so many exoplanet systems, it’s become clear that our solar system actually pretty much has to have about the number of planets it does.

And maybe after we’ve seen the sociologies of enough video-game virtual worlds, we’ll know something about how to “derive” the number of countries. But of course it’s not at all clear that AI Constitutions should be divided up anything like the way countries are.

The physicality of humans has the convenient consequence that at least at some level one can divide the world geographically. But AIs don’t need to have that kind of spatial locality. One can imagine some other schemes, of course. Like let’s say one looks at the space of personalities and motivations, and finds clusters in it. Perhaps one could start to say “here’s an AI Constitution for that cluster” and so on. Maybe the constitutions could fork, perhaps almost arbitrarily (a “Git-like model of society”). I don’t know how things like this would ultimately work, but they seem more plausible than what amounts to a single, consensus, AI Constitution for everywhere and everyone.

There are so many issues, though. Like here’s one. Let’s assume AIs are the dominant power in our world. But let’s assume that they successfully follow some constitution or constitutions that we’ve defined for them. Well, that’s nice—but does it mean nothing can ever change in the world? I mean, just think if we were still all operating according to laws that had been set up 200 years ago: most of society has moved on since then, and wants different laws (or at least different interpretations) to reflect its principles.

But what if precise laws for AIs were burnt in around the year 2020, for all eternity? Well, one might say, real constitutions always have explicit clauses that allow for their own modification (in the US Constitution it’s Article V). But looking at the actual constitutions of countries around the world isn’t terribly encouraging. Some just say basically that the constitution can be changed if some supreme leader (a person) says so. Many say that the constitution can be changed through some democratic process—in effect by some sequence of majority or similar votes. And some basically define a bureaucratic process for change so complex that one wonders if it’s formally undecidable whether it would ever come to a conclusion.

At first, the democratic scheme seems like an obvious winner. But it’s fundamentally based on the concept that people are somehow easy to count (of course, one can argue about which people, etc.). But what happens when personhood gets more complicated? When, for example, there are in effect uploaded human consciousnesses, deeply intertwined with AIs? Well, one might say, there’s always got to be some “indivisible person” involved. And yes, I can imagine little clumps of pineal gland cells that are maintained to define “a person”, just like in the past they were thought to be the seat of the soul. But from the basic science I’ve done I think I can say for certain that none of this will ultimately work—because in the end the computational processes that define things just don’t have this kind of indivisibility.

So what happens to “democracy” when there are no longer “people to count”? One can imagine all sorts of schemes, involving identifying the density of certain features in “people space”. I suppose one can also imagine some kind of bizarre voting involving transfinite numbers of entities, in which perhaps the axiomatization of set theory has a key effect on the future of history.

It’s an interesting question how to set up a constitution in which change is “burned in”. There’s a very simple example in bitcoin, where the protocol just defines by fiat that the reward for mining a block is cut in half roughly every four years. Of course, that setup is in a sense based on a model of the world, and in particular on something like Moore’s Law and the apparent short-term predictability of technological development. But following the same general idea, one might start thinking about a constitution that says “change 1% of the symbolic code in this every year”. But then one’s back to having to decide “which 1%?”. Maybe it’d be based on usage, or observations of the world, or some machine-learning procedure. But whatever algorithm or meta-algorithm is involved, there’s still at some point something that has to be defined once and for all.
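For what it’s worth, the bitcoin rule is simple enough to write down in a few lines. In the protocol as I understand it, the schedule is tied to block height rather than to the calendar: the reward for mining a block starts at 50 bitcoins and is halved every 210,000 blocks, which works out to roughly every four years. The sketch below is my own Python rendering of that rule (the function and constant names are illustrative, not taken from any particular client), working in satoshis, where 1 bitcoin is 100,000,000 satoshis.

```python
HALVING_INTERVAL = 210_000             # blocks between halvings
INITIAL_SUBSIDY = 50 * 100_000_000     # 50 bitcoins, expressed in satoshis

def block_subsidy(height: int) -> int:
    """Newly created satoshis awarded for the block at the given height."""
    if height < 0:
        raise ValueError("block height must be non-negative")
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        # Guard against oversized shifts; the subsidy is long since zero.
        return 0
    # Integer right shift halves the subsidy once per elapsed interval.
    return INITIAL_SUBSIDY >> halvings

for height in (0, 210_000, 420_000, 6_930_000):
    print(height, block_subsidy(height))
```

The whole "amendment process" here is one integer division and one bit shift, fixed in advance. That is exactly what makes it easy to state, and also what makes it a very limited model for how a constitution might change.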

Can one make a general theory of change? At first, this might seem hopeless. But in a sense exploring the computational universe of programs is like seeing a spectrum of all possible changes. And there’s definitely some general science that can be done on such things. And maybe there’s some setup—beyond just “fork whenever there could be a change”—that would let one have a constitution that appropriately allows for change, as well as changing the way one allows for change, and so on.

Making It Happen

OK, we’ve talked about some far-reaching and foundational issues. But what about the here and now? Well, I think the exciting thing is that 300 years after Gottfried Leibniz died, we’re finally in a position to do what he dreamed of: to create a general symbolic discourse language, and to apply it to build a framework for computational law.

With the Wolfram Language we have the foundational symbolic system—as well as a lot of knowledge of the world—to start from. There’s still plenty to do, but I think there’s now a definite path forward. And it really helps that in addition to the abstract intellectual challenge of creating a symbolic discourse language, there’s now also a definite target in mind: being able to set up practical systems for computational law.

It’s not going to be easy. But I think the world is ready for it, and needs it. There are simple smart contracts already in things like bitcoin and Ethereum, but there’s vastly more that can be done—and with a full symbolic discourse language the whole spectrum of activities covered by law becomes potentially accessible to structured computation. It’s going to lead to all sorts of both practical and conceptual advances. And it’s going to enable new legal, commercial and societal structures—in which, among other things, computers are drawn still further into the conduct of human affairs.

I think it’s also going to be critical in defining the overall framework for AIs in the future. What ethics, and what principles, should they follow? How do we communicate these to them? For ourselves and for the AIs we need a way to formulate what we want. And for that we need a symbolic discourse language. Leibniz had the right idea, but 300 years too early. Now in our time I’m hoping we’re finally going to get to build for real what he only imagined. And in doing so we’re going to take yet another big step forward in harnessing the power of the computational paradigm.

]]> 10