Logic is a foundation for many things. But what are the foundations of logic itself?
In symbolic logic, one introduces symbols like p and q to stand for statements (or “propositions”) like “this is an interesting essay”. Then one has certain “rules of logic”, like that, for any p and any q, NOT (p AND q) is the same as (NOT p) OR (NOT q).
But where do these “rules of logic” come from? Well, logic is a formal system. And, like Euclid’s geometry, it can be built on axioms. But what are the axioms? We might start with things like p AND q = q AND p, or NOT NOT p = p. But how many axioms does one need? And how simple can they be?
It was a nagging question for a long time. But at 8:31pm on Saturday, January 29, 2000, out on my computer screen popped a single axiom. I had already shown there couldn’t be anything simpler, but I soon established that this one little axiom was enough to generate all of logic:
✕
((p·q)·r)·(p·((p·r)·p))==r 
But how did I know it was correct? Well, because I had a computer prove it. And here’s the proof, as I printed it in 4point type in A New Kind of Science (and it’s now available in the Wolfram Data Repository):
With the latest version of the Wolfram Language, anyone can now generate this proof in under a minute. And given this proof, it’s straightforward to verify each step. But why is the result true? What’s the explanation?
That’s the same kind of question that’s increasingly being asked about all sorts of computational systems, and all sorts of applications of machine learning and AI. Yes, we can see what happens. But can we understand it?
I think this is ultimately a deep question—that’s actually critical to the future of science and technology, and in fact to the future of our whole intellectual development.
But before we talk more about this, let’s talk about logic, and about the axiom I found for it.
Logic as a formal discipline basically originated with Aristotle in the 4th century BC. As part of his lifelong effort to catalog things (animals, causes, etc.), Aristotle cataloged valid forms of arguments, and created symbolic templates for them which basically provided the main content of logic for two thousand years.
By the 1400s, however, algebra had been invented, and with it came cleaner symbolic representations of things. But it was not until 1847 that George Boole finally formulated logic in the same kind of way as algebra, with logical operations like AND and OR being thought of as operating according to algebralike rules.
Within a few years, people were explicitly writing down axiom systems for logic. A typical example was:
✕
and[p,q]==and[q,p],or[p,q]==or[q,p],and[p,or[q,not[q]]]==p,or[p,and[q,not[q]]]==p,and[p,or[q,r]]==or[and[p,q],and[p,r]],or[p,and[q,r]]==and[or[p,q],or[p,r]]} 
But does logic really need AND and OR and NOT? After the first decade of the 1900s several people had discovered that actually the single operation that we now call NAND is enough, with for example p OR q being computed as (p NAND p) NAND (q NAND q). (The “functional completeness” of NAND could have remained forever a curiosity but for the development of semiconductor technology—which implements all the billions of logic operations in a modern microprocessor with combinations of transistors that perform none other than NAND or the related function NOR.)
But, OK, so what do the axioms of logic (or “Boolean algebra”) look like in terms of NAND? Here’s the first known version of them, from Henry Sheffer in 1913 (here dot · stands for NAND):
✕
{(p\[CenterDot]p)\[CenterDot](p\[CenterDot]p)==p,p\[CenterDot](q\[CenterDot](q\[CenterDot]q))==p\[CenterDot]p,(p\[CenterDot](q\[CenterDot]r))\[CenterDot](p\[CenterDot](q\[CenterDot]r))==((q\[CenterDot]q)\[CenterDot]p)\[CenterDot]((r\[CenterDot]r)\[CenterDot]p)} 
Back in 1910 Whitehead and Russell’s Principia Mathematica had popularized the idea that perhaps all of mathematics could be derived from logic. And particularly with this in mind, there was significant interest in seeing just how simple the axioms for logic could be. Some of the most notable work on this was done in Lviv and Warsaw (then both part of Poland), particularly by Jan Łukasiewicz (who, as a side effect of his work, invented in 1920 parenthesisfree Łukasiewicz or “Polish” notation). In 1944, at the age of 66, Łukasiewicz fled from the approaching Soviets—and in 1947 ended up in Ireland.
Meanwhile, the Irishborn Carew Meredith, who had been educated at Winchester and Cambridge, and had become a mathematics coach in Cambridge, had been forced by his pacifism to go back to Ireland in 1939. And in 1947, Meredith went to lectures by Łukasiewicz in Dublin, which inspired him to begin a search for simple axioms, which would occupy most of the rest of his life.
Already by 1949, Meredith found the twoaxiom system:
✕
{(p\[CenterDot](q\[CenterDot]r))\[CenterDot](p\[CenterDot](q\[CenterDot]r))==((r\[CenterDot]p)\[CenterDot]p)\[CenterDot]((q\[CenterDot]p)\[CenterDot]p),(p\[CenterDot]p)\[CenterDot](q\[CenterDot]p)==p} 
After nearly 20 years of work, he had succeeded by 1967 in simplifying this to:
✕
{p\[CenterDot](q\[CenterDot](p\[CenterDot]r))==((r\[CenterDot]q)\[CenterDot]q)\[CenterDot]p,(p\[CenterDot]p)\[CenterDot](q\[CenterDot]p)==p} 
But could it get any simpler? Meredith had been picking away for years trying to see how a NAND could be removed here or there. But after 1967 he apparently didn’t get any further (he died in 1976), though in 1969 he did find the threeaxiom system:
✕
{p\[CenterDot](q\[CenterDot](p\[CenterDot]r))==p\[CenterDot](q\[CenterDot](q\[CenterDot]r)),(p\[CenterDot]p)\[CenterDot](q\[CenterDot]p)==p,p\[CenterDot]q==q\[CenterDot]p} 
I actually didn’t know about Meredith’s work when I started exploring axiom systems for logic. I’d gotten into the subject as part of trying to understand what kinds of behavior simple rules could produce. Back in the early 1980s I’d made the surprising discovery that even cellular automata with some of the simplest possible rules—like my favorite rule 30—could generate behavior of great complexity.
And having spent the 1990s basically trying to figure out just how general this phenomenon was, I eventually wanted to see how it might apply to mathematics. It’s an immediate observation that in mathematics one’s basically starting from axioms (say for arithmetic, or geometry, or logic), and then trying to prove a whole collection of sophisticated theorems from them.
But just how simple can the axioms be? Well, that was what I wanted to discover in 1999. And as my first example, I decided to look at logic (or, equivalently, Boolean algebra). Contrary to what I would ever have expected beforehand, my experience with cellular automata, Turing machines, and many other kinds of systems—including even partial differential equations—was that one could just start enumerating the simplest possible cases, and after not too long one would start seeing interesting things.
But could one “discover logic” this way? Well, there was only one way to tell. And in late 1999 I set things up to start exploring what amounts to the space of all possible axiom systems—starting with the simplest ones.
In a sense any axiom system provides a set of constraints, say on p · q. It doesn’t say what p · q “is”; it just gives properties that p · q must satisfy (like, for example, it could say that p · q = p · q). Then the question is whether from these properties one can derive all the theorems of logic that hold when p · q is Nand[p, q]: no more and no less.
There’s a direct way to test some of this. Just take the axiom system, and see what explicit forms of satisfy the axioms if and can, say, be True or False. If the axiom system were just then, yes, could be Nand[, ]—but it doesn’t have to be. It could also be And[, ] or Equal[, ]—or lots of other things which won’t satisfy the same theorems as the NAND function in logic. But by the time one gets to the axiom system one’s reached the point where Nand[, ] (and the basically equivalent Nor[, ]) are the only “models” of that work—at least assuming and have only two possible values.
So is this then an axiom system for logic? Well, no. Because it implies, for example, that there’s a possible form for with 3 values for and , whereas there’s no such thing for logic. But, OK, the fact that this axiom system with just one axiom even gets close suggests it might be worth looking for a single axiom that reproduces logic. And that’s what I did back in January 2000 (it’s gotten a bit easier these days, thanks notably to the handy, fairly new Wolfram Language function Groupings).
It was easy to see that no axioms with 3 or fewer “NANDs” (or, really, 3 or fewer “dot operators”) could work. And by 5am on Saturday, January 29 (yes, I was a night owl then), I’d found that none with 4 NANDs could work either. By the time I stopped working on it a little after 6am, I’d gotten 14 possible candidates with 5 NANDs. But when I started work again on Saturday evening and did more tests, every one of these candidates failed.
So, needless to say, the next step was to try cases with 6 NANDs. There were 288,684 of these in all. But my code was efficient, and it didn’t take long before out popped on my screen (yes, from Mathematica Version 4):
At first I didn’t know what I had. All I knew was that these were the 25 inequivalent 6NAND axioms that got further than any of the 5NAND ones. But were any of them really an axiom system for logic? I had a (rather computationintensive) empirical method that could rule axioms out. But the only way to know for sure whether any axiom was actually correct was to prove that it could successfully reproduce, say, the Sheffer axioms for logic.
It took a little software wrangling, but before many days had gone by, I’d discovered that most of the 25 couldn’t work. And in the end, just two survived:
✕
{((p\[CenterDot]q)\[CenterDot]r)\[CenterDot](p\[CenterDot]((p\[CenterDot]r)\[CenterDot]p))==r} 
✕
{(p\[CenterDot]((r\[CenterDot]p)\[CenterDot]p))\[CenterDot](r\[CenterDot](q\[CenterDot]p))==r} 
And to my great excitement, I was successfully able to have my computer prove that both are axioms for logic. The procedure I’d used ensured that there could be no simpler axioms for logic. So I knew I’d come to the end of the road: after a century (or maybe even a couple of millennia), we could finally say that the simplest possible axiom for logic was known.
Not long after, I found two 2axiom systems, also with 6 NANDs in total, that I proved could reproduce logic:


And if one chooses to take commutativity for granted, then these show that all it takes to get logic is one tiny 4NAND axiom.
OK, so it’s neat to be able to say that one’s “finished what Aristotle started” (or at least what Boole started) and found the very simplest possible axiom system for logic. But is it just a curiosity, or is there real significance to it?
Before the whole framework I developed in A New Kind of Science, I think one would have been hardpressed to view it as much more than a curiosity. But now one can see that it’s actually tied into all sorts of foundational questions, like whether one should consider mathematics to be invented or discovered.
Mathematics as humans practice it is based on a handful of particular axiom systems—each in effect defining a certain field of mathematics (say logic, or group theory, or geometry, or set theory). But in the abstract, there are an infinite number of possible axiom systems out there—in effect each defining a field of mathematics that could in principle be studied, even if we humans haven’t ever done it.
Before A New Kind of Science I think I implicitly assumed that pretty much anything that’s just “out there” in the computational universe must somehow be “less interesting” than things we humans have explicitly built and studied. But my discoveries about simple programs made it clear that at the very least there’s often lots of richness in systems that are just “out there” than in ones that we carefully select.
So what about axiom systems for mathematics? Well, to compare what’s just “out there” with what we humans have studied, we have to know where the axiom systems for existing areas of mathematics that we’ve studied—like logic—actually lie. And based on traditional humanconstructed axiom systems we’d conclude that they have to be far, far out there—in effect only findable if one already knows where they are.
But my axiomsystem discovery basically answered the question, “How far out is logic?” For something like cellular automata, it’s particularly easy to assign a number (as I did in the early 1980s) to each possible cellular automaton. It’s slightly harder to do this with axiom systems, though not much. And in one approach, my axiom can be labeled as 411;3;7;118—constructed in the Wolfram Language as:
✕
Groupings[{p, q, r}[[1 + IntegerDigits[411, 3, 7]]], CenterDot > 2][[118]] == r 
And at least in the space of possible functional forms (not accounting for variable labeling), here’s a visual indication of where the axiom lies:
Given how fundamental logic is to so many formal systems we humans study, we might have thought that in any reasonable representation, logic corresponds to one of the very simplest conceivable axiom systems. But at least with the (NANDbased) representation we’re using, that’s not true. There’s still by most measures a very simple axiom system for it, but it’s perhaps the hundred thousandth possible axiom system one would encounter if one just started enumerating axiom systems starting from the simplest one.
So given this, the obvious next question is, what about all the other axiom systems? What’s the story with those? Well, that’s exactly the kind of investigation that A New Kind of Science is all about. And indeed in the book I argue that things like the systems we see in nature are often best captured precisely by those “other rules” that we can find by enumerating possibilities.
In the case of axiom systems, I made a picture that represents what happens in “fields of mathematics” corresponding to different possible axiom systems. Each row shows the consequences of a particular axiom system, with the boxes across the page indicating whether a particular theorem is true in that axiom system. (Yes, at some point Gödel’s Theorem bites one, and it becomes irreducibly difficult to prove or disprove a given theorem in a given axiom system; in practice, with my methods that happened just a little further to the right than the picture shows…)
Is there something fundamentally special about “humaninvestigated” fields of mathematics? From this picture, and other things I’ve studied, there doesn’t seem to be anything obvious. And I suspect actually that the only thing that’s really special about these fields of mathematics is the historical fact that they are what have been studied. (One might make claims like that they arise because they “describe the real world”, or because they’re “related to how our brains work”, but the results in A New Kind of Science argue against these.)
Alright, well then what’s the significance of my axiom system for logic? The size of it gives a sense of the ultimate information content of logic as an axiomatic system. And it makes it look like—at least for now—we should view logic as much more having been “invented as a human construct” than having been “discovered” because it was somehow “naturally exposed”.
If history had been different, and we’d routinely looked (in the manner of A New Kind of Science) at lots of possible simple axiom systems, then perhaps we would have “discovered” the axiom system for logic as one with particular properties we happened to find interesting. But given that we have explored so few of the possible simple axiom systems, I think we can only reasonably view logic as something “invented”—by being constructed in an essentially “discretionary” way.
In a sense this is how logic looked, say, back in the Middle Ages—when the possible syllogisms (or valid forms of argument) were represented by (Latin) mnemonics like bArbArA and cElErAnt. And to mirror this, it’s fun to find mnemonics for what we now know is the simplest possible axiom system for logic.
Starting with , we can represent each in prefix or Polish form (the reverse of the “reverse Polish” of an HP calculator) as Dpq—so the whole axiom can be written =DDDpqrDpDDprpr. Then (as Ed Pegg found for me) there’s an English mnemonic for this: FIGure OuT Queue, where are u, r, e. Or, looking at first letters of words (with operator B, and being a, p, c): “Bit by bit, a program computed Boolean algebra’s best binary axiom covering all cases”.
OK, so how does one actually prove that my axiom system is correct? Well, the most immediate thing to do is just to show that from it one can derive a known axiom system for logic—like Sheffer’s axiom system:
✕
{(p\[CenterDot]p)\[CenterDot](p\[CenterDot]p)==p,p\[CenterDot](q\[CenterDot](q\[CenterDot]q))==p\[CenterDot]p,(p\[CenterDot](q\[CenterDot]r))\[CenterDot](p\[CenterDot](q\[CenterDot]r))==((q\[CenterDot]q)\[CenterDot]p)\[CenterDot]((r\[CenterDot]r)\[CenterDot]p)} 
There are three axioms here, and we’ve got to derive each of them. Well, with the latest version of the Wolfram Language, here’s what we do to derive the first one:
✕
pf = FindEquationalProof[(p\[CenterDot]p)\[CenterDot](p\[CenterDot]p) == p, \!\( \*SubscriptBox[\(\[ForAll]\), \({p, q, r}\)]\(\((\((p\[CenterDot]q)\)\[CenterDot]r)\)\[CenterDot]\((p\ \[CenterDot]\((\((p\[CenterDot]r)\)\[CenterDot]p)\))\) == r\)\)] 
It’s pretty remarkable that it’s now possible to just do this. The “proof object” records that 54 steps were used in the proof. And from this proof object we can generate a notebook that describes each of those steps:
✕
pf["ProofNotebook"] 
In outline, what happens is that a whole sequence of intermediate lemmas are proved, which eventually allow the final result to be derived. There’s a whole network of interdependencies between lemmas, as this visualization shows:
✕
pf["ProofGraph"] 
Here are the networks involved in deriving all three of the axioms in the Sheffer axiom system—with the last one involving a somewhat whopping 504 steps:
And, yes, it’s clear these are pretty complicated. But before we discuss what that complexity means, let’s talk about what actually goes on in the individual steps of these proofs.
The basic idea is straightforward. Let’s imagine we had an axiom that just said . (Mathematically, this corresponds to the statement that · is commutative.) More precisely, what the axiom says is that for any expressions and , is equivalent to .
OK, so let’s say we wanted to derive from this axiom that . We could do this by using the axiom to transform to , to , and then finally to .
FindEquationalProof does essentially the same thing, though it chooses to do the steps in a slightly different order, and modifies the lefthand side as well as the righthand side:
✕
pf = FindEquationalProof[(a\[CenterDot]b)\[CenterDot](c\[CenterDot]d) \ == (d\[CenterDot]c)\[CenterDot](b\[CenterDot]a), \!\( \*SubscriptBox[\(\[ForAll]\), \({p, q}\)]\(p\[CenterDot]q == q\[CenterDot]p\)\)]["ProofNotebook"] 
Once one’s got a proof like this, it’s straightforward to just run through each of its steps, and check that they produce the result that’s claimed. But how does one find the proof? There are lots of different possible sequences of substitutions and transformations that one could do. So how does one find a sequence that successfully gets to the final result?
One might think: why not just try all possible sequences, and if there is any sequence that works, one will eventually find it? Well, the problem is that one quickly ends up with an astronomical number of possible sequences to check. And indeed the main art of automated theorem proving consists of finding ways to prune the number of sequences one has to check.
This quickly gets pretty technical, but the most important idea is easy to talk about if one knows basic algebra. Let’s say you’re trying to prove an algebraic result like:
✕
(1 + x^2) (1  x + x^2) (1 + x + x^2) == (1 + x) (1 + x + x^2) (1 + x^3) 
Well, there’s a guaranteed way to do this: just apply the rules of algebra to expand out each side—and immediately one can see they’re the same:
✕
{Expand[(1 + x^2) (1  x + x^2) (1 + x + x^2)], Expand[(1 + x) (1 + x + x^2) (1 + x^3)]} 
Why does this work? Well, it’s because there’s a way of taking algebraic expressions like this, and always systematically reducing them so that eventually they get to a standard form. OK, but so can one do the same thing for proofs with arbitrary axiom systems?
The answer is: not immediately. It works in algebra because algebra has a special property that guarantees one can always “make progress” in reducing expressions. But what was discovered independently several times in the 1970s (under names like the Knuth–Bendix and the Gröbner Basis algorithm) is that even if an axiom system doesn’t intrinsically have the appropriate property, one can potentially find “completions” of it that do.
And that’s what’s going on in typical proofs produced by FindEquationalProof (which is based on the Waldmeister (“master of trees”) system). There are socalled “critical pair lemmas” that don’t directly “make progress” themselves, but make it possible to set up paths that do. And the reason things get complicated is that even if the final expression one’s trying to get to is fairly short, one may have to go through all sorts of much longer intermediate expressions to get there. And so, for example, for the proof of the first Sheffer axiom above, here are the intermediate steps:
In this case, the largest intermediate form is about 4 times the size of the original axiom. Here it is:
✕
(p\[CenterDot]q)\[CenterDot](((p\[CenterDot]q)\[CenterDot](q\ \[CenterDot]((q\[CenterDot]q)\[CenterDot]q)))\[CenterDot](p\ \[CenterDot]q)) == (((q\[CenterDot](q\[CenterDot]((q\[CenterDot]q)\ \[CenterDot]q)))\[CenterDot]r)\[CenterDot]((p\[CenterDot]q)\ \[CenterDot](((p\[CenterDot]q)\[CenterDot](q\[CenterDot]((q\ \[CenterDot]q)\[CenterDot]q)))\[CenterDot](p\[CenterDot]q))))\ \[CenterDot](q\[CenterDot]((q\[CenterDot]q)\[CenterDot]q)) 
One can represent expressions like this as a tree. Here’s this one, compared to the original axiom:
And here’s how the sizes of intermediate steps evolve through the proofs found for each of the Sheffer axioms:
Is it surprising that these proofs are so complicated? In some ways, not really. Because, after all, we know perfectly well that math can be hard. In principle it might have been that anything that’s true in math would be easy to prove. But one of the side effects of Gödel’s Theorem from 1931 was to establish that even things we can eventually prove can have proofs that are arbitrarily long.
And actually this is a symptom of the much more general phenomenon I call computational irreducibility. Consider a system governed, say, by the simple rule of a cellular automaton (and of course, every essay of mine must have a cellular automaton somewhere!). Now just run the system:
✕
ArrayPlot[CellularAutomaton[{2007,{3,1}},RandomInteger[2,3000],1500],Frame>False,ColorRules>{0>White,1>Hue[0.09, 1., 1.],2>Hue[0.03, 1., 0.76]}] 
One might have thought that given that there’s a simple rule that underlies the system, there’d always be a quick way to figure out what the system will do. But that’s not the case. Because according to my Principle of Computational Equivalence the operation of the system will often correspond to a computation that’s just as sophisticated as any computation that we could set up to figure out the behavior of the system. And this means that the actual behavior of the system in effect corresponds to an irreducible amount of computational work that we can’t in general shortcut in any way.
In the picture above, let’s say we want to know whether the pattern eventually dies out. Well, we could just keep running it, and if we’re lucky it’ll eventually resolve to something whose outcome is obvious. But in general there’s no upper bound to how far we’ll have to go to, in effect, prove what happens.
When we do things like the logic proofs above, it’s a slightly different setup. Instead of just running something according to definite rules, we’re asking whether there exists a way to get to a particular result by taking some series of steps that each follow a particular rule. And, yes, as a practical computational problem, this is immediately more difficult. But the core of the difficulty is still the same phenomenon of computational irreducibility—and that this phenomenon implies that there isn’t any general way to shortcut the process of working out what a system will do.
Needless to say, there are plenty of things in the world—especially in technology and scientific modeling, as well as in areas where there are various forms of regulation—that have traditionally been set up to implicitly avoid computational irreducibility, and to operate in ways whose outcome can readily be foreseen without an irreducible amount of computation.
But one of the implications of my Principle of Computational Equivalence is that this is a rather singular and contrived situation—because it says that computational irreducibility is in fact ubiquitous across systems in the computational universe.
OK, but what about mathematics? Maybe somehow the rules of mathematics are specially chosen to show computational reducibility. And there are indeed some cases where that’s true (and in some sense it even happens in logic). But for the most part it appears that the axiom systems of mathematics are not untypical of the space of all possible axiom systems—where computational irreducibility is inevitably rampant.
At some level, the point of a proof is to know that something is true. Of course, particularly in modern times, proof has very much taken a back seat to pure computation. Because in practice it’s much more common to want to generate things by explicit computation than it is to want to “go back” and construct a proof that something is true.
In pure mathematics, though, it’s fairly common to deal with things that at least nominally involve an infinite number of cases (“true for all primes”, etc.), for which at least direct computation can’t work. And when it comes to questions of verification (“can this program ever crash?” or “can this cryptocurrency ever get spent twice?”) it’s often more reasonable to attempt a proof than to do something like run all possible cases.
But in the actual practice of mathematics, there’s more to proof than just establishing if things are true. Back when Euclid first wrote his Elements, he just gave results, and proofs were “left to the reader”. But for better or worse, particularly over the past century, proof has become something that doesn’t just happen behind the scenes, but is instead actually the primary medium through which things are supposed to be communicated.
At some level I think it’s a quirk of history that proofs are typically today presented for humans to understand, while programs are usually just thought of as things for computers to run. Why has this happened? Well, at least in the past, proofs could really only be represented in essentially textual form—so if they were going to be used, it would have to be by humans. But programs have essentially always been written in some form of computer language. And for the longest time, that language tended to be set up to map fairly directly onto the lowlevel operations of the computer—which meant that it was readily “understandable” by the computer, but not necessarily by humans.
But as it happens, one of the main goals of my own efforts over the past several decades has been to change this—and to develop in the Wolfram Language a true “computational communication language” in which computational ideas can be communicated in a way that is readily understandable to both computers and humans.
There are many consequences of having such a language. But one of them is that it changes the role of proof. Let’s say one’s looking at some mathematical result. Well, in the past the only plausible way to communicate how one should understand it was to give a proof that people could read. But now something different is possible: one can give a Wolfram Language program that computes the result. And in many ways this is a much more powerful way to communicate why the result is true. Because every piece of the program is something precise and unambiguous—that if one wants to, one can actually run. There’s no issue of trying to divine what some piece of text means, perhaps filling in some implicit assumptions. Instead, everything is right there, in absolutely explicit form.
OK, so what about proof? Are there in fact unambiguous and precise ways to write proofs? Well, potentially yes, though it’s not particularly easy. And even though the main Wolfram Language has now existed for 30 years, it’s taken until pretty much now to figure out a reasonable way to represent in it even such structurally comparatively straightforward proofs as the one for my axiom system above.
One can imagine authoring proofs in the Wolfram Language much like one authors programs—and indeed we’re working on seeing how to provide highlevel versions of this kind of “proof assistant” functionality. But the proof of my axiom system that I showed above is not something anyone authored; it’s something that was found by the computer. And as such, it’s more like the output of running a program than like a program itself. (Like a program, though, the proof can in some sense be “run” to verify the result.)
Most of the time when people use the Wolfram Language—or WolframAlpha—they just want to compute things. They’re interested in getting results, not in understanding why they get the results they do. But in WolframAlpha, particularly in areas like math and chemistry, a popular feature for students is “stepbystep solutions”:
When WolframAlpha does something like computing an integral, it’s using all sorts of powerful systematic algorithmic techniques optimized for getting answers. But when it’s asked to show steps it needs to do something different: it needs instead to explain step by step why it gets the result it does.
It wouldn’t be useful for it to explain how it actually got the result; it’s a very nonhuman process. Instead, it basically has to figure out how the kinds of operations humans learn can be used to get the result. Often it’ll figure out some trick that can be used. Yes, there’ll be a systematic way to do it that’ll always work. But it involves too many “mechanical” steps. The “trick” (“trig substitution”, “integration by parts”, whatever) won’t work in general, but in this particular case it’ll provide a faster way to get to the answer.
OK, but what about getting understandable versions of other things? Like the operation of programs in general. Or like the proof of my axiom system.
Let’s start by talking about programs. Let’s say one’s written a program, and one wants to explain how it works. One traditional approach is just to “include comments” in the code. Well, if one’s writing in a traditional lowlevel language, that may be the best one can do. But the whole point of the Wolfram Language being a computational communication language is that the language itself is supposed to allow you to communicate ideas, without needing extra pieces of English text.
It takes effort to make a Wolfram Language program be a good piece of exposition, just like it takes effort to make English text a good piece of exposition. But one can end up with a piece of Wolfram Language code that really explains very clearly how it works just through the code itself.
Of course, it’s very common for the actual execution of the code to do things that can’t readily be foreseen just from the program. I’ll talk about extreme cases like cellular automata soon. But for now let’s imagine that one’s constructed a program where there’s some ability to foresee the broad outlines of what it does.
And in such a case, I’ve found that computational essays (presented as Wolfram Notebooks) are a great tool in explaining what’s going on. It’s crucial that the Wolfram Language is symbolic, so it’s possible to run even the tiniest fragments of any program on their own (with appropriate symbolic expressions as input or output). And when one does this, one can present a succession of steps in the program as a succession of elements in the dialog that forms the core of a computational notebook.
In practice, it’s often critical to create visualizations of inputs or outputs. Yes, everything can be represented as an explicit symbolic expression. But we humans often have a much easier time understanding things when they’re presented visually, rather than as some kind of onedimensional languagelike string.
Of course, there’s something of an art to creating good visualizations. But in the Wolfram Language we’ve managed to go a long way towards automating this art—often using pretty sophisticated machine learning and other algorithms to do things like lay out networks or graphics elements.
What about just starting from the raw execution trace for a program? Well, it’s hard. I’ve done experiments on this for decades, and never been very satisfied with the results. Yes, you can zoom in to see lots of details of what’s going on. But when it comes to knowing the “big picture” I’ve never found any particularly good techniques for automatically producing things that are terribly useful.
At some level it’s similar to the general problem of reverse engineering. You are shown some final machine code, chip design, or whatever. But now you want to go backwards to reconstruct the higherlevel description that some human started from, that was somehow “compiled” to what you see.
In the traditional approach to engineering, where one builds things up incrementally, always somehow being able to foresee the consequences of what one’s building, this approach can in principle work. But if one does engineering by just searching the computational universe to find an optimal program (much like I searched possible axiom systems to find one for logic), then there’s no guarantee that there’s any “human story” or explanation behind this program.
It’s a similar problem in natural science. You see some elaborate set of things happening in some biological system. Can one “reverse engineer” these to find an “explanation” for them? Sometimes one might be able to say, for example, that evolution by natural selection would be likely to lead to something. Or that it’s just common in the computational universe and so is likely to occur. But there’s no guarantee that the natural world is set up in any way that necessarily allows human explanation.
Needless to say, when one makes models for things, one inevitably considers only the particular aspects that one’s interested in, and idealizes everything else away. And particularly in areas like medicine, it’s not uncommon to end up with some approximate model that’s a fairly shallow decision tree that’s easy to explain, at least as far as it goes.
What does it mean to say that something is explainable? Basically it’s that humans can understand it.
So what does it take for humans to understand something? Well, somehow we have to be able to “wrap our brains around it”. Let’s take a typical cellular automaton with complex behavior. A computer has no problem following each step in the evolution. And with immense effort a human could laboriously reproduce what a computer does.
But one wouldn’t say that means the human “understands” what the cellular automaton is doing. To get to that point, the human would have to be readily able to reason about how the cellular automaton behaves, at some high level. Or put another way, the human would have to be able to “tell a story” that other humans could readily understand, about how the cellular automaton behaves.
Is there a general way to do this? Well, no, because of computational irreducibility. But it can still be the case that certain features that humans choose to care about can be explained in some reduced, higherlevel way.
How does this work? Well, in a sense it requires that some higherlevel language be constructed that can describe the features one’s interested in. Looking at a typical cellular automaton pattern, one might try to talk not in terms of colors of huge numbers of individual cells, but instead in terms of the higherlevel structures one can pick out. And the key point is that it’s possible to make at least a partial catalog of these structures: even though there are lots of details that don’t quite fit, there are still particular structures that occur often.
And if we were going to start “explaining” the behavior of the cellular automaton, we’d typically begin by giving the structures names, and then we’d start talking about what’s going on in terms of these named things.
The case of a cellular automaton has an interesting simplifying feature: because it operates according to simple deterministic rules, there are structures that just repeat identically. If we’re dealing with things in the natural world, for example, we typically won’t see this kind of identical repetition. Instead, it’ll just be that this tiger, say, is extremely similar to this other one, so we can call them both “tigers”, even though their atoms are not identical in their arrangement.
What’s the bigger picture of what’s going on? Well, it’s basically that we’re using the idea of symbolic representation. We’re saying that we can assign something—often a word—that we can use to symbolically describe a whole class of things, without always having to talk about all the detailed parts of each thing.
In effect it’s a kind of information compression: we’re using symbolic constructs to find a shorter way to describe what we’re interested in.
Let’s imagine we’ve generated a giant structure, say a mathematical one:
✕
Solve[a x^4 + b x^3 + c x^2 + d x + e == 0, x] 
Well, a first step is to generate a kind of internal higherlevel representation. For example, we might find substructures that appear repeatedly. And we might then assign names to them. And then display a “skeleton” of the whole structure in terms of these names:
And, yes, this kind of “dictionary compression”–like scheme is useful in bringing a first level of explainability.
But let’s go back to the proof of my axiom system. The lemmas that were generated in this proof are precisely set up to be elements that are used repeatedly (a bit like shared common subexpressions). But even having in effect factored them out, we’re still left with a proof that is not something that we humans can readily understand.
So how can we go further? Well, basically we have to come up with some yethigherlevel description. But what might this be?
If you’re trying to explain something to somebody, it’s a lot easier when there’s something similar that they’ve already understood. Imagine trying to explain a modern drone to someone from the Stone Age. It’d probably be pretty difficult. But explaining it to someone from 50 years ago, who’d already seen helicopters and model airplanes etc., would be a lot easier.
And ultimately the point is that when we explain something, we do it in some language that both we and whoever we’re explaining it to knows. And the richer this language is, the fewer new elements we have to introduce in order to communicate whatever it is that we’re trying to explain.
There’s a pattern that’s been repeated throughout intellectual history. Some particular collection of things gets seen a bunch of times. And gradually it’s understood that these things are all somehow abstractly similar. And they can all be described in terms of some particular new concept, often referred to by some new word or phrase.
Let’s say one had seen things like water and blood and oil. Well, at some point one realizes that there’s a general concept of “liquid”, and all of these can be described as liquids. And once one has this concept, one can start reasoning in terms of it, and identifying more concepts—like, say, viscosity—that build on it.
When does it makes sense to group things into a concept? Well, that’s a difficult question, which can’t ultimately be answered without foreseeing everything that might be done with that concept. And in practice, in the evolution of human language and human ideas there’s some kind of process of progressive approximation that goes on.
There’s a much more rapid recapitulation that happens in a modern machine learning system. Imagine taking all sorts of objects that one’s seen in the world, and just feeding them to FeatureSpacePlot and seeing what comes out. Well, if one gets definite clusters in feature space, then one might reasonably think that each of these clusters should be identified as corresponding to a “concept”, that we could for example label with a word.
Now, to be fair, what’s happening with FeatureSpacePlot—like in human intellectual development—is in some ways incremental. Because to lay the objects out in feature space, FeatureSpacePlot is using features that it’s learned how to extract from previous categorizations it knows about.
But, OK, given the world as it is, what are the best categories—or best concepts—one can use to describe things? Well, it’s an evolving story. And in fact breakthroughs—whether in science, technology or elsewhere—are very often precisely associated with the realization that some new category or concept can usefully be identified.
But in the actual evolution of our civilization, there’s a kind of spiral at work. First some particular concept is identified—say the idea of a program. And once some concept has been identified, people start using it, and thinking in terms of it. And pretty soon all sorts of new things have been constructed on the basis of that concept. But then another level of abstraction is identified, and new concepts get constructed, building on top of the previous one.
It’s pretty much the same story for the technology stack of modern civilization, and its “intellectual stack”. Both involve towers of concepts, and successive levels of abstraction.
In order for people to be able to communicate using some concept, they have to have learned about it. And, yes, there are some concepts (like object permanence) that humans automatically learn by themselves just by observing the natural world. But looking for example at a list of common words in modern English, it’s pretty clear that most of the concepts that we now use in modern civilization aren’t ones that people can just learn for themselves from the natural world.
Instead—much like a modern machine learning system—at the very least they need some “specially curated” experience of the world, organized to highlight particular concepts. And for more abstract areas (like mathematics) they probably need explicit exposure to the concepts themselves in their raw abstract forms.
But, OK, as the “intellectual stack” of civilization advances, will we always have to learn progressively more? We might worry that at some point our brains just won’t be able to keep up, and we’d have to add some kind of augmentation. But perhaps fortunately, I think it’s one of those cases where the problem can instead most likely be “solved in software”.
The issue is this: At any given point in history, there’s a certain set of concepts that are important in being able to operate in the world as it is at that time. And, yes, as civilization progresses new things are discovered, and new concepts are introduced. But there’s another process at work as well: new concepts bring new levels of abstraction, which typically subsume large numbers of earlier concepts.
We often see this in technology. There was a time when to operate a computer you needed to know all sorts of lowlevel details. But over time those got abstracted away, so all you need to know is some general concept. You click an icon and things start to happen—and you don’t have to understand operating systems, or interrupt handlers or schedulers, or any of those details.
Needless to say, the Wolfram Language provides a great example of all this. Because it goes to tremendous trouble to “automate out” lots of lowlevel details (for example about what specific algorithm to use) and let human users just think about things in terms of higherlevel concepts.
Yes, there still need to be some people who understand the details “underneath” the abstraction (though I’m not sure how many flint knappers modern society needs). But mostly education can concentrate on teaching at a higher level.
There’s often an implicit assumption in education that to reach higherlevel concepts one has to somehow recapitulate the history of how those concepts were historically arrived at. But usually—and perhaps always—this doesn’t seem to be true. In an extreme case, one might imagine that to teach about computers, one would have to recapitulate the history of mathematical logic. But actually we know that people can go straight to modern concepts of computing, without recapitulating any of the history.
But what is ultimately the understandability network of concepts? Are there concepts that can only be understood if one already understands other concepts? Given a particular ambient experience for a human (or particular background training for a neural network) there is presumably some ordering.
But I suspect that something analogous to computation universality probably implies that if one’s just dealing with a “raw brain” then one could start anywhere. So if some alien were exposed to category theory and little else from the very beginning, they’d no doubt build a network of concepts where this is at the root, and maybe what for us is basic arithmetic would be something only reached in their analog of math graduate school.
Of course, such an alien might form their technology stack and their built environment in a quite different way from us—much as the recent history of our own civilization might have been very different if computers had successfully been developed in the 1800s rather than in the mid1900s.
I’ve often wondered to what extent the historical trajectory of human mathematics is an “accident”, and to what extent it’s somehow inexorable. As I mentioned earlier, at the level of formal systems there are many possible axiom systems from which one could construct something that is formally like mathematics.
But the actual history of mathematics did not start with arbitrary axiom systems. It started—in Babylonian times—with efforts to use arithmetic for commerce and geometry for land surveying. And from these very practical origins, successive layers of abstraction have been added that have led eventually to modern mathematics—with for example numbers being generalized from positive integers, to rationals, to roots, to all integers, to decimals, to complex numbers, to algebraic numbers, to quaternions and so on.
Is there an inexorability to this progression of abstraction? I suspect to some extent there is. And probably it’s a similar story as with other kinds of concept formation. Given some stage that’s been reached, there are various things that can readily get studied, and after a while groups of them are seen to be examples of more general and abstract constructs—which then in turn define another stage from which new things can be studied.
Are there ways to break out of this cycle? One possibility would be through doing experiments in mathematics. Yes, one can systematically prove things about particular mathematical systems. But one can also just empirically notice mathematical facts—like Ramanujan’s observation that is numerically close to an integer. And the question is: are things like this just “random facts of mathematics” or do they somehow fit into the whole “fabric of mathematics”?
One can ask the same kind of thing about questions in mathematics. Is the question of whether odd perfect numbers exist (which has been unanswered since Pythagoras) a core question in mathematics, or is it, in a sense, a random question that doesn’t connect into the fabric of mathematics?
Just like one can enumerate things like axiom systems, so also one can imagine enumerating possible questions in mathematics. But if one does this, I suspect there’s immediately an issue. Gödel’s Theorem establishes that in axiom systems like the one for arithmetic there are “formally undecidable” propositions, that can’t be proved or disproved from within the axiom system.
But the particular examples that Gödel constructed seemed far from anything that would arise naturally in doing mathematics. And for a long time it was assumed that somehow the phenomenon of undecidability was something that, while in principle present, wasn’t going to be relevant in “real mathematics”.
However, with my Principle of Computational Equivalence and my experience in the computational universe, I’ve come to the strong conclusion that this isn’t correct—and that instead undecidability is actually close at hand even in typical mathematics as it’s been practiced. Indeed, I won’t be surprised if a fair fraction of the current famous unsolved problems of mathematics (Riemann Hypothesis, P=NP, etc.) actually turn out to be in effect undecidable.
But if there’s undecidability all around, how come there’s so much mathematics that’s successfully been done? Well, I think it’s because the things that have been done have implicitly been chosen to avoid undecidability, basically just by virtue of the way mathematics has been built up. Because if what one’s doing is basically to form progressive levels of abstraction based on things one has shown are true, one’s basically setting up a path that’s going to be able to move forward without being forced into undecidability.
Of course, doing experimental mathematics or asking “random questions” may immediately land one in some area that’s full of undecidability. But at least so far in its history, this hasn’t been the way the mainstream discipline of mathematics has evolved.
So what about those “random facts of mathematics”? Well, it’s pretty much like in other areas of intellectual endeavor. “Random facts” don’t really get integrated into a line of intellectual development until some structure—and typically some abstract concepts—are built around them.
My own favorite discoveries about the origins of complexity in systems like the rule 30 cellular automaton are a good example. Yes, similar phenomena had been seen—even millennia earlier (think: randomness in the sequence of primes). But without a broader conceptual framework nobody really paid attention to them.
Nested patterns are another example. There are isolated examples of these in mosaics from the 1200s, but nobody really paid attention to them until the whole framework around nesting and fractals emerged in the 1980s.
It’s the same story over and over again: until abstract concepts around them have been identified, it’s hard to really think about new things, even when one encounters phenomena that exhibit them.
And so, I suspect, it is with mathematics: there’s a certain inevitable layering of abstract concept on top of abstract concept that defines the trajectory of mathematics. Is it a unique path? Undoubtedly not. In the vast space of possible mathematical facts, there are particular directions that get picked, and built along. But others could have been picked instead.
So does this mean that the subject matter of mathematics is inevitably dominated by historical accidents? Not as much as one might think. Because—as mathematics has discovered over and over again, starting with things like algebra and geometry—there’s a remarkable tendency for different directions and different approaches to wind up having equivalences or correspondences in the end.
And probably at some level this is a consequence of the Principle of Computational Equivalence, and the phenomenon of computational universality: even though the underlying rules (or underlying “language”) used in different areas of mathematics are different, there ends up being some way to translate between them—so that at the next level of abstraction the path that was taken no longer critically matters.
OK, so let’s go back to the logic proof. How does it connect to typical mathematics? Well, right now, it basically doesn’t. Yes, the proof has the same nominal form as a standard mathematical proof. But it isn’t “humanmathematician friendly”. It’s all just mechanical details. It doesn’t connect to higherlevel abstract concepts that a human mathematician can readily understand.
It would help a lot if we discovered that nontrivial lemmas in the proof already appeared in the mathematics literature. (I don’t think any of them do, but our theoremsearching capabilities haven’t gotten to the point where one can be sure.) But if they did appear, then this would likely give us a way to connect these lemmas to other things in mathematics, and in effect to identify a circle of abstract concepts around them.
But without that, how can the proof become explainable?
Well, maybe there’s just a different way to do the proof that’s fundamentally more connected to existing mathematics. But even given the proof as we have it now, one could imagine “building out” new concepts that would define a higher level of abstraction and put the proof in a more general context.
I’m not sure how to do either of these things. I’ve considered sponsoring a prize (analogous to my 2007 Turing machine prize) for “making the proof explainable”. But it’s not at all clear how one could objectively judge “explainability”. (Maybe one could ask for a 1hour video that would successfully explain the proof to a typical mathematician—but this is definitely rather subjective.)
But just like we can automate things like finding aesthetic layouts for networks, perhaps we can automate the process of making a proof explainable. The proof as it is right now basically just says (without explanation), “Consider these few hundred lemmas”. But let’s say we could identify a modest number of “interesting” lemmas. Maybe we could somehow add these to our canon of known mathematics and then be able to use them to understand the proof.
There’s an analogy here with language design. In building up the Wolfram Language what I’ve basically done is to try to identify “lumps of computational work” that people will often want. Then we make these into builtin functions in the language, with particular names that people can use to refer to them.
A similar process goes on—though in a much less organized way—in the evolution of human natural languages. “Lumps of meaning” that turn out to be useful eventually get represented by words in the language. Sometimes they start as phrases constructed out of a few existing words. But the most impactful ones are typically sufficiently far away from anything that has come before that they just arrive as new words with potentially quitehardtogive definitions.
In the design of the Wolfram Language—with functions named with English words—I leverage the “ambient understanding” that comes from the English words (and sometimes from their meanings in common applications of computation).
One would want to do something similar in identifying lemmas to add to our canon of mathematics. Not only would one want to make sure that each lemma was somehow “intrinsically interesting”, but one would also want when possible to select lemmas that are “easy to reach” from existing known mathematical results and concepts.
But what does it mean for a lemma to be “intrinsically interesting”? I have to say that before I worked on A New Kind of Science, I assumed that there was great arbitrariness and historical accident in the choice of lemmas (or theorems) in any particular areas of mathematics that get called out and given names in typical textbooks.
But when I looked in detail at theorems in basic logic, I was surprised to find something different. Let’s say one arranges all the true theorems of basic logic in order of their sizes (e.g. might come first; AND a bit later, and so on). When one goes through this list there’s lots of redundancy. Indeed, most of the theorems end up being trivial extensions of theorems that have already appeared in the list.
But just sometimes one gets to a theorem that essentially gives new information—and that can’t be proved from the theorems that have already appeared in the list. And here’s the remarkable fact: there are 14 such theorems, and they essentially correspond exactly with the theorems that are typically given names in textbooks of logic. (Here AND is ∧, OR is ∨, and NOT is ¬.)
In other words, at least in this case, the named or “interesting” theorems are the ones that give minimal statements of new information. (Yes, after a while, by this definition there will be no new information, because one will have encountered all the axioms needed to prove anything that can be proved—though one can go a bit further with this approach by starting to discuss limiting the complexity of proofs that are allowed.)
What about with NAND theorems, like the ones in the proof? Once again, one can arrange all true NAND theorems in order—and then find which of them can’t be proved from any earlier in the list:
✕
{{a\[CenterDot]b==b\[CenterDot]a,a==(a\[CenterDot]a)\[CenterDot](a\[CenterDot]a),a==(a\[CenterDot]a)\[CenterDot](a\[CenterDot]b),(a\[CenterDot]a)\[CenterDot]a==(b\[CenterDot]b)\[CenterDot]b,(a\[CenterDot]a)\[CenterDot]b==(a\[CenterDot]b)\[CenterDot]b,a\[CenterDot]((a\[CenterDot]b)\[CenterDot]c)==a\[CenterDot]((b\[CenterDot]b)\[CenterDot]c)},{a\[CenterDot]b==b\[CenterDot]a,a==(a\[CenterDot]a)\[CenterDot](a\[CenterDot]a),a==(a\[CenterDot]a)\[CenterDot](a\[CenterDot]b),a==(a\[CenterDot]a)\[CenterDot](b\[CenterDot]a),a==(a\[CenterDot]b)\[CenterDot](a\[CenterDot]a),a==(b\[CenterDot]a)\[CenterDot](a\[CenterDot]a),(a\[CenterDot]a)\[CenterDot]a==a\[CenterDot](a\[CenterDot]a),(a\[CenterDot]a)\[CenterDot]a==(b\[CenterDot]b)\[CenterDot]b,a\[CenterDot](a\[CenterDot]a)==(b\[CenterDot]b)\[CenterDot]b,(a\[CenterDot]a)\[CenterDot]a==b\[CenterDot](b\[CenterDot]b),a\[CenterDot](a\[CenterDot]a)==b\[CenterDot](b\[CenterDot]b),a\[CenterDot](a\[CenterDot]b)==(a\[CenterDot]b)\[CenterDot]a,a\[CenterDot](a\[CenterDot]b)==a\[CenterDot](b\[CenterDot]a),(a\[CenterDot]a)\[CenterDot]b==(a\[CenterDot]b)\[CenterDot]b,a\[CenterDot](a\[CenterDot]b)==a\[CenterDot](b\[CenterDot]b),a\[CenterDot](a\[CenterDot]b)==(b\[CenterDot]a)\[CenterDot]a,(a\[CenterDot]a)\[CenterDot]b==b\[CenterDot](a\[CenterDot]a),(a\[CenterDot]a)\[CenterDot]b==(b\[CenterDot]a)\[CenterDot]b,(a\[CenterDot]a)\[CenterDot]b==b\[CenterDot](a\[CenterDot]b),a\[CenterDot](a\[CenterDot]b)==(b\[CenterDot]b)\[CenterDot]a,(a\[CenterDot]a)\[CenterDot]b==b\[CenterDot](b\[CenterDot]a),(a\[CenterDot]b)\[CenterDot]a==a\[CenterDot](b\[CenterDot]a),(a\[CenterDot]b)\[CenterDot]a==a\[CenterDot](b\[CenterDot]b),a\[CenterDot](b\[CenterDot]a)==a\[CenterDot](b\[CenterDot]b),(a\[CenterDot]b)\[CenterDot]a==(b\[CenterDot]a)\[CenterDot]a,a\[CenterDot](b\[CenterDot]a)==(b\[CenterDot]a)\[CenterDot]a,(a\[CenterDot]b)\[CenterDot]a==(b\[CenterDot]b)\[CenterDot]a,a\[CenterDot](b\[CenterDot]a)==(b\[CenterDot]b)\[CenterDot]a,a\[CenterDot](b\[CenterDot]b)==(b\[CenterDot]a)\[CenterDot]a,(a\[CenterDot]b)\[CenterDot]b==b\[CenterDot](a\[CenterDot]a),(a\[CenterDot]b)\[CenterDot]b==(b\[CenterDot]a)\[CenterDot]b,(a\[CenterDot]b)\[CenterDot]b==b\[CenterDot](a\[CenterDot]b),a\[CenterDot](b\[CenterDot]b)==(b\[CenterDot]b)\[CenterDot]a,(a\[CenterDot]b)\[CenterDot]b==b\[CenterDot](b\[CenterDot]a),a\[CenterDot](b\[CenterDot]c)==a\[CenterDot](c\[CenterDot]b),(a\[CenterDot]b)\[CenterDot]c==(b\[CenterDot]a)\[CenterDot]c,a\[CenterDot](b\[CenterDot]c)==(b\[CenterDot]c)\[CenterDot]a,(a\[CenterDot]b)\[CenterDot]c==c\[CenterDot](a\[CenterDot]b),a\[CenterDot](b\[CenterDot]c)==(c\[CenterDot]b)\[CenterDot]a,(a\[CenterDot]b)\[CenterDot]c==c\[CenterDot](b\[CenterDot]a),(a\[CenterDot]a)\[CenterDot](a\[CenterDot]a)==(a\[CenterDot]a)\[CenterDot](a\[CenterDot]b),(a\[CenterDot]a)\[CenterDot](a\[CenterDot]a)==(a\[CenterDot]a)\[CenterDot](b\[CenterDot]a),(a\[CenterDot]a)\[CenterDot](a\[CenterDot]a)==(a\[CenterDot]b)\[CenterDot](a\[CenterDot]a),(a\[CenterDot]a)\[CenterDot](a\[CenterDot]a)==(b\[CenterDot]a)\[CenterDot](a\[CenterDot]a),(a\[CenterDot]a)\[CenterDot](a\[CenterDot]b)==(a\[CenterDot]a)\[CenterDot](a\[CenterDot]c),(a\[CenterDot]a)\[CenterDot](a\[CenterDot]b)==(a\[CenterDot]a)\[CenterDot](b\[CenterDot]a),(a\[CenterDot]a)\[CenterDot](a\[CenterDot]b)==(a\[CenterDot]a)\[CenterDot](c\[CenterDot]a),(a\[CenterDot]a)\[CenterDot](a\[CenterDot]b)==(a\[CenterDot]b)\[CenterDot](a\[CenterDot]a)},{a\[CenterDot]((a\[CenterDot]b)\[CenterDot]b)==(c\[CenterDot](a\[CenterDot]a))\[CenterDot]a,a\[CenterDot]((a\[CenterDot]b)\[CenterDot]b)==((c\[CenterDot]a)\[CenterDot]c)\[CenterDot]a,a\[CenterDot]((a\[CenterDot]b)\[CenterDot]b)==(c\[CenterDot](a\[CenterDot]c))\[CenterDot]a,(a\[CenterDot](a\[CenterDot]b))\[CenterDot]b==(c\[CenterDot](b\[CenterDot]b))\[CenterDot]b,(a\[CenterDot](a\[CenterDot]b))\[CenterDot]b==((c\[CenterDot]b)\[CenterDot]c)\[CenterDot]b,(a\[CenterDot](a\[CenterDot]b))\[CenterDot]b==(c\[CenterDot](b\[CenterDot]c))\[CenterDot]b,a\[CenterDot]((a\[CenterDot]b)\[CenterDot]b)==(c\[CenterDot](c\[CenterDot]a))\[CenterDot]a,(a\[CenterDot](a\[CenterDot]b))\[CenterDot]b==(c\[CenterDot](c\[CenterDot]b))\[CenterDot]b,a\[CenterDot]((a\[CenterDot]b)\[CenterDot]b)==((c\[CenterDot]c)\[CenterDot]c)\[CenterDot]a,a\[CenterDot]((a\[CenterDot]b)\[CenterDot]b)==(c\[CenterDot](c\[CenterDot]c))\[CenterDot]a,(a\[CenterDot](a\[CenterDot]b))\[CenterDot]b==((c\[CenterDot]c)\[CenterDot]c)\[CenterDot]b,(a\[CenterDot](a\[CenterDot]b))\[CenterDot]b==(c\[CenterDot](c\[CenterDot]c))\[CenterDot]b,a\[CenterDot](a\[CenterDot](b\[CenterDot]c))==a\[CenterDot](a\[CenterDot](c\[CenterDot]b)),(a\[CenterDot](a\[CenterDot]b))\[CenterDot]c==((a\[CenterDot]b)\[CenterDot]a)\[CenterDot]c,(a\[CenterDot](a\[CenterDot]b))\[CenterDot]c==(a\[CenterDot](b\[CenterDot]a))\[CenterDot]c,a\[CenterDot]((a\[CenterDot]b)\[CenterDot]c)==a\[CenterDot]((b\[CenterDot]a)\[CenterDot]c),((a\[CenterDot]a)\[CenterDot]b)\[CenterDot]c==((a\[CenterDot]b)\[CenterDot]b)\[CenterDot]c,(a\[CenterDot](a\[CenterDot]b))\[CenterDot]c==(a\[CenterDot](b\[CenterDot]b))\[CenterDot]c,a\[CenterDot]((a\[CenterDot]b)\[CenterDot]c)==a\[CenterDot]((b\[CenterDot]b)\[CenterDot]c),a\[CenterDot]((a\[CenterDot]b)\[CenterDot]c)==((a\[CenterDot]b)\[CenterDot]c)\[CenterDot]a,a\[CenterDot](a\[CenterDot](b\[CenterDot]c))==(a\[CenterDot](b\[CenterDot]c))\[CenterDot]a,a\[CenterDot](a\[CenterDot](b\[CenterDot]c))==a\[CenterDot]((b\[CenterDot]c)\[CenterDot]a),a\[CenterDot](a\[CenterDot](b\[CenterDot]c))==((a\[CenterDot]b)\[CenterDot]c)\[CenterDot]c,(a\[CenterDot](a\[CenterDot]b))\[CenterDot]c==(a\[CenterDot](b\[CenterDot]c))\[CenterDot]c,a\[CenterDot]((a\[CenterDot]b)\[CenterDot]c)==a\[CenterDot]((b\[CenterDot]c)\[CenterDot]c),a\[CenterDot]((a\[CenterDot]b)\[CenterDot]c)==a\[CenterDot](c\[CenterDot](a\[CenterDot]b)),a\[CenterDot](a\[CenterDot](b\[CenterDot]c))==(a\[CenterDot](c\[CenterDot]b))\[CenterDot]a,a\[CenterDot](a\[CenterDot](b\[CenterDot]c))==a\[CenterDot]((c\[CenterDot]b)\[CenterDot]a),a\[CenterDot]((a\[CenterDot]b)\[CenterDot]c)==a\[CenterDot](c\[CenterDot](b\[CenterDot]a)),a\[CenterDot](a\[CenterDot](b\[CenterDot]c))==((a\[CenterDot]c)\[CenterDot]b)\[CenterDot]b,a\[CenterDot]((a\[CenterDot]b)\[CenterDot]c)==a\[CenterDot](c\[CenterDot](b\[CenterDot]b)),((a\[CenterDot]a)\[CenterDot]b)\[CenterDot]c==((a\[CenterDot]c)\[CenterDot]b)\[CenterDot]c,(a\[CenterDot](a\[CenterDot]b))\[CenterDot]c==(a\[CenterDot](c\[CenterDot]b))\[CenterDot]c,a\[CenterDot]((a\[CenterDot]b)\[CenterDot]c)==a\[CenterDot]((c\[CenterDot]b)\[CenterDot]c),a\[CenterDot]((a\[CenterDot]b)\[CenterDot]c)==a\[CenterDot](c\[CenterDot](b\[CenterDot]c)),a\[CenterDot]((a\[CenterDot]b)\[CenterDot]c)==a\[CenterDot](c\[CenterDot](c\[CenterDot]b))}} 
NAND doesn’t have the same kind of historical traditional as AND, OR and NOT. (And there doesn’t seem to be any human language that, for example, has a single ordinary word for NAND.) But in the list of NAND theorems, the first highlighted one is easy to recognize as commutativity of NAND. After that, one really has to do a bit of translation to name the theorems: is like the law of double negation, is like the absorption law, is like “weakening”, and so on.
But, OK, so if one’s going to learn just a few “key theorems” of NAND logic, which should they be? Perhaps they should be theorems that appear as “popular lemmas” in proofs.
Of course, there are many possible proofs of any given theorem. But let’s say we just use the particular proofs that FindEquationalProof generates. Then it turns out that in the proofs of the first thousand NAND theorems the single most popular lemma is , followed by such lemmas like .
What are these? Well, for the particular methods that FindEquationalProof uses, they’re useful. But for us humans they don’t seem terribly helpful.
But what about popular lemmas that happen to be short? is definitely not the most popular lemma, but it is the shortest. is more popular, but longer. And then there are lemmas like .
But how useful are these lemmas? Here’s a way to test. Look at the first thousand NAND theorems, and see how much adding the lemmas shortens the proofs of these theorems (at least as found by FindEquationalProof):
is very successful, often cutting down the proof by nearly 100 steps. is much less successful; in fact, it actually sometimes seems to “confuse” FindEquationalProof, causing it to take more rather than fewer steps (visible as negative values in the plot). is OK at shortening, but not as good as . Though if one combines it with , the result is more consistent shortening.
One could go on with this analysis, say including a comparison of how much shortening is produced by a given lemma, relative to how long its own proof was. But the problem is that if one adds several “useful lemmas”, like and , there are still plenty of long proofs—and thus a lot left to “understand”:
There are different ways to create models of things. For a few hundred years, exact science was dominated by the idea of finding mathematical equations that could be solved to say how things should behave. But in pretty much the time since A New Kind of Science appeared, there’s been a strong shift to instead set up programs that can be run to say how things should behave.
Sometimes those programs are explicitly constructed for a particular purpose; sometimes they’re exhaustively searched for. And in modern times, at least one class of such programs is deduced using machine learning, essentially by going backwards from examples of how the system is known to behave.
OK, so with these different forms of modeling, how easy is it to “understand what’s going on”? With mathematical equations, it’s a big plus when it’s possible to find an “exact solution”—in which the behavior of the system can be represented by something like an explicit mathematical formula. And even when this doesn’t happen, it’s fairly common to be able to make at least some mathematical statements that are abstract enough to connect to other systems and other behaviors.
As I discussed above, with a program—like a cellular automaton—it can be a different story. Because it’s common to be thrust immediately into computational irreducibility, which ultimately limits how much one can ever hope to shortcut or “explain” what’s going on.
But what about with machine learning, and, say, with neural nets? At some level, the training of a neural net is like recapitulating inductive discovery in natural science. One’s trying to start from examples and deduce a model for how a system behaves. But then can one understand the model?
Again there are issues of computational irreducibility. But let’s talk about a case where we can at least imagine what it would look like to understand what’s going on.
Instead of using a neural net to model how some system behaves, let’s consider making a neural net that classifies some aspect of the world: say, takes images and classifies them according to what they’re images of (“boat”, “giraffe”, etc.). When we train the neural net, it’s learning to give correct final outputs. But potentially one can think of the way it does this as being to internally make a sequence of distinctions (a bit like playing a game of Twenty Questions) that eventually determines the correct output.
But what are those distinctions? Sometimes we can recognize some of them. “Is there a lot of blue in the image?”, for example. But most of the time they’re essentially features of the world that we humans don’t notice. Maybe there’s an alternative history of natural science where some of them would have shown up. But they’re not things that are part of our current canon of perception or analysis.
If we wanted to add them, we’d probably end up inventing words for them. But the situation is very similar to the one with the logic proof. An automated system has created things that it’s effectively using as “waypoints” in generating a result. But they’re not waypoints we recognize or relate to.
Once again, if we found that particular distinctions were very common for neural nets, we might decide that those are distinctions that are worth us humans learning, and adding to our standard canon of ways to describe the world.
Can we expect that a modest number of such distinctions would go a long way? It’s analogous to asking whether a modest number of theorems would go a long way in understanding something like the logic proof.
My guess is that the answer is fuzzy. If one looks, for example, at a large corpus of math papers, one can ask how common different theorems are. It turns out that the frequency of theorems follows an almost perfect Zipf law (with the Central Limit Theorem, the Implicit Function Theorem and Fubini’s Theorem as the top three). And it’s probably the same with distinctions that are “worth knowing”, or new theorems that are “worth knowing”.
Knowing a few will get one a certain distance, but there’ll be an infinite powerlaw tail, and one will never get to the end.
Whether one looks at mathematics, science or technology, one sees the same basic qualitative progression of building a stack of increasing abstraction. It would be nice to be able to quantify this process. Perhaps one could look at how certain terms or descriptions that are common at one time later get subsumed into higher levels of abstraction, which then in turn have new terms or descriptions associated with them.
Maybe one could create an idealized model of this process using some formal model of computation, like Turing machines. Imagine that at the lowest level one has a basic Turing machine, with no abstraction. Now imagine selecting programs for this Turing machine according to some defined random process. Then run these programs and analyze them to see what “higherlevel” model of computation can successfully reproduce the aggregate behavior of these programs without having to run each step in each program.
One might have thought that computational irreducibility would imply that this higherlevel model of computation would inevitably be more complicated in its construction. But the key point is that we’re only trying to reproduce the aggregate behavior of the programs, not their individual behavior.
OK, but so then what happens if you iterate this process—essentially recapitulating idealized human intellectual history and building a progressive tower of abstraction?
Conceivably there’s some analogy to critical phenomena in physics, and to the renormalization group. And if so, one might imagine being able to identify a definite trajectory in the space of what amount to concept representation frameworks. What will the trajectory do?
Maybe it’ll have some kind of fixedpoint behavior, representing the guess that at any point in history there are about the same number of abstract concepts that are worth learning—with new ones slowly being invented, and old ones being subsumed.
What might any of this mean for mathematics? One guess might be that any “random fact of mathematics”, say discovered empirically, would eventually be covered when some level of abstraction is reached. It’s not obvious how this process would work. After all, at any given level of abstraction, there are always new empirical facts to be “jumped to”. And it might very well be that the “rising tide of abstraction” would move only slowly compared to the rate at which such jumps could be made.
OK, so what does all this mean for the future of understanding?
In the past, when humans looked, say, at the natural world, they had few pretensions to understand it. Sometimes they would personify certain aspects of it in terms of spirits or deities. But they saw it as just acting as it did, without any possibility for humans to understand in detail why.
But with the rise of modern science—and especially as more of our everyday existence came to be in built environments dominated by technology (or regulatory structures) that we had designed—these expectations changed. And as we look at computation or AI today, it seems unsettling that we might not be able to understand it.
But ultimately there’s always going to be a competition between what the systems in our world do, and what our brains are capable of computing about them. If we choose to interact only with systems that are computationally much simpler than our brains, then, yes, we can expect to use our brains to systematically understand what the systems are doing.
But if we actually want to make full use of the computational capabilities that our universe makes possible, then it’s inevitable that the systems we’re dealing with will be equivalent in their computational capabilities to our brains. And this means that—as computational irreducibility implies—we’ll never be able to systematically “outthink” or “understand” those systems.
But then how can we use them? Well, pretty much like people have always used systems from the natural world. Yes, we don’t know everything about how they work or what they might do. But at some level of abstraction we know enough to be able to see how to get purposes we care about achieved with them.
What about in an area like mathematics? In mathematics we’re used to building our stack of knowledge so that each step is something we can understand. But experimental mathematics—as well as things like automated theorem proving—make it clear that there are places to go that won’t have this feature.
Will we call this “mathematics”? I think we should. But it’s a different tradition from what we’ve mostly used for the past millennium. It’s one where we can still build abstractions, and we can still construct new levels of understanding.
But somewhere underneath there will be all sorts of computational irreducibility that we’ll never really be able to bring into the realm of human understanding. And that’s basically what’s going on in the proof of my little axiom for logic. It’s an early example of what I think will be the dominant experience in mathematics—and a lot else—in the future.
]]>It was 1968. I was 8 years old. The “space race” was in full swing. For the first time, a space probe had recently landed on another planet (Venus). And I was eagerly studying everything I could to do with space.
Then on April 3, 1968 (May 15 in the UK), the movie 2001: A Space Odyssey was released—and I was keen to see it. So in the early summer of 1968 there I was, the first time I’d ever been in an actual cinema (yes, it was called that in the UK). I’d been dropped off for a matinee, and was pretty much the only person in the theater. And to this day, I remember sitting in a plush seat and eagerly waiting for the curtain to go up, and the movie to begin.
It started with an impressive extraterrestrial sunrise. But then what was going on? Those weren’t space scenes. Those were landscapes, and animals. I was confused, and frankly a little bored. But just when I was getting concerned, there was a bone thrown in the air that morphed into a spacecraft, and pretty soon there was a rousing waltz—and a big space station turning majestically on the screen.
The next two hours had a big effect on me. It wasn’t really the spacecraft (I’d seen plenty of them in books by then, and in fact made many of my own concept designs). And at the time I didn’t care much about the extraterrestrials. But what was new and exciting for me in the movie was the whole atmosphere of a world full of technology—and the notion of what might be possible there, with all those bright screens doing things, and, yes, computers driving it all.
It would be another year before I saw my first actual computer in real life. But those two hours in 1968 watching 2001 defined an image of what the computational future could be like, that I carried around for years.
I think it was during the intermission to the movie that some seller of refreshments—perhaps charmed by a solitary kid so earnestly pondering the movie—gave me a “cinema program” about the movie. Half a century later I still have that program, complete with a food stain, and faded writing from my 8yearold self, recording (with some misspelling) where and when I saw the movie:
A lot has happened in the past 50 years, particularly in technology, and it’s an interesting experience for me to watch 2001 again—and compare what it predicted with what’s actually happened. Of course, some of what’s actually been built over the past 50 years has been done by people like me, who were influenced in larger or smaller ways by 2001.
When WolframAlpha was launched in 2009—showing some distinctly HALlike characteristics—we paid a little homage to 2001 in our failure message (needless to say, one piece of notable feedback we got at the beginning was someone asking: “How did you know my name was Dave?!”):
One very obvious prediction of 2001 that hasn’t panned out, at least yet, is routine, luxurious space travel. But like many other things in the movie, it doesn’t feel like what was predicted was off track; it’s just that—50 years later—we still haven’t got there yet.
So what about the computers in the movie? Well, they have lots of flatscreen displays, just like real computers today. In the movie, though, one obvious difference is that there’s one physical display per functional area; the notion of windows, or dynamically changeable display areas, hadn’t arisen yet.
Another difference is in how the computers are controlled. Yes, you can talk to HAL. But otherwise, it’s lots and lots of mechanical buttons. To be fair, cockpits today still have plenty of buttons—but the centerpiece is now a display. And, yes, in the movie there weren’t any touchscreens—or mice. (Both had actually been invented a few years before the movie was made, but neither was widely known.)
There also aren’t any keyboards to be seen (and in the hightech spacecraft full of computers going to Jupiter, the astronauts are writing with pens on clipboards; presciently, no slide rules and no tape are shown—though there is one moment when a printout that looks awfully like a punched card is produced). Of course, there were keyboards for computers back in the 1960s. But in those days, very few people could type, and there probably didn’t seem to be any reason to think that would change. (Being something of a committed tool user, I myself was routinely using a typewriter even in 1968, though I didn’t know any other kids who were—and my hands at the time weren’t big or strong enough to do much other than type fast with one finger, a skill whose utility returned decades later with the advent of smartphones.)
What about the content of the computer displays? That might have been my favorite thing in the whole movie. They were so graphical, and communicating so much information so quickly. I had seen plenty of diagrams in books, and had even painstakingly drawn quite a few myself. But back in 1968 it was amazing to imagine that a computer could generate information, and display it graphically, so quickly.
Of course there was television (though color only arrived in the UK in 1968, and I’d only seen black and white). But television wasn’t generating images; it was just showing what a camera saw. There were oscilloscopes too, but they just had a single dot tracing out a line on the screen. So the computer displays in 2001 were, at least for me, something completely new.
At the time it didn’t seem odd that in the movie there were lots of printed directions (how to use the “Picturephone”, or the zerogravity toilet, or the hibernation modules). Today, any such instructions (and they’d surely be much shorter, or at least broken up a lot, for today’s less patient readers) would be shown onscreen. But when 2001 was made, the idea of word processing, and of displaying text to read onscreen, was still several years in the future—probably not least because at the time people thought of computers as machines for calculation, and there didn’t seem to be anything calculational about text.
There are lots of different things shown on the displays in 2001. Even though there isn’t the idea of dynamically movable windows, the individual displays, when they’re not showing anything, go into a kind of “iconic” state, just showing in large letters codes like NAV or ATM or FLX or VEH or GDE.
When the displays are active they sometimes show things like tables of numbers, and sometimes show lightly animated versions of a whole variety of textbooklike diagrams. A few of them show 1980sstyle animated 3D line graphics (“what’s the alignment of the spacecraft?”, etc.)—perhaps modeled after analog airplane controls.
But very often there’s also something else—and occasionally it fills a whole display. There’s something that looks like code, or a mixture of code and math.
It’s usually in a fairly “modernlooking” sans serif font (well, actually, a font called Manifold for IBM Selectric electric typewriters). Everything’s uppercase. And with stars and parentheses and names like TRAJ04, it looks a bit like early Fortran code (except that given the profusion of semicolons, it was more likely modeled on IBM’s PL/I language). But then there are also superscripts, and builtup fractions—like math.
Looking at this now, it’s a bit like trying to decode an alien language. What did the makers of the movie intend this to be about? A few pieces make sense to me. But a lot of it looks random and nonsensical—meaningless formulas full of unreasonably highprecision numbers. Considering all the care put into the making of 2001, this seems like a rare lapse—though perhaps 2001 started the long and somewhat unfortunate tradition of showing meaningless code in movies. (A recent counterexample is my son Christopher’s alienlanguageanalysis code for Arrival, which is actual Wolfram Language code that genuinely makes the visualizations shown.)
But would it actually make sense to show any form of code on real displays like the ones in 2001? After all, the astronauts aren’t supposed to be building the spacecraft; they’re only operating it. But here’s a place where the future is only just now arriving. During most of the history of computing, code has been something that humans write, and computers read. But one of my goals with the Wolfram Language is to create a true computational communication language that is highlevel enough that not only computers, but also humans, can usefully read.
Yes, one might be able to describe in words some procedure that a spacecraft is executing. But one of the points of the Wolfram Language is to be able to state the procedure in a form that directly fits in with human computational thinking. So, yes, on the first real manned spacecraft going to Jupiter, it’ll make perfect sense to display code, though it won’t look quite like what’s in 2001.
I’ve watched 2001 several times over the years, though not specifically in the year 2001 (that year for me was dominated by finishing my magnum opus A New Kind of Science). But there are several very obvious things in the movie 2001 that don’t ring true for the real year 2001—quite beyond the very different state of space travel.
One of the most obvious is that the haircuts and clothing styles and general formality look wrong. Of course these would have been very hard to predict. But perhaps one could at least have anticipated (given the hippie movement etc.) that clothing styles and so on would get less formal. But back in 1968, I certainly remember for example getting dressed up even to go on an airplane.
Another thing that today doesn’t look right in the movie is that nobody has a personal computer. Of course, back in 1968 there were still only a few thousand computers in the whole world—each weighing at least some significant fraction of a ton—and basically nobody imagined that one day individual people would have computers, and be able to carry them around.
As it happens, back in 1968 I’d recently been given a little plastic kit mechanical computer (called DigiComp I) that could (very laboriously) do 3digit binary operations. But I think it’s fair to say that I had absolutely no grasp of how this could scale up to something like the computers in 2001. And indeed when I saw 2001 I imagined that to have access to technology like I saw in the movie, I’d have to be joining something like NASA when I was grown up.
What of course I didn’t foresee—and I’m not sure anyone did—is that consumer electronics would become so small and cheap. And that access to computers and computation would therefore become so ubiquitous.
In the movie, there’s a sequence where the astronauts are trying to troubleshoot a piece of electronics. Lots of nice computeraided, engineeringstyle displays come up. But they’re all of printed circuit boards with discrete components. There are no integrated circuits or microprocessors—which isn’t surprising, because in 1968 these basically hadn’t been invented yet. (Correctly, there aren’t vacuum tubes, though. Apparently the actual prop used—at least for exterior views—was a gyroscope.)
It’s interesting to see all sorts of little features of technology that weren’t predicted in the movie. For example, when they’re taking commemorative pictures in front of the monolith on the Moon, the photographer keeps tipping the camera after each shot—presumably to advance the film inside. The idea of digital cameras that could electronically take pictures simply hadn’t been imagined then.
In the history of technology, there are certain things that just seem inevitable—even though sometimes they may take decades to finally arrive. An example are videophones. There were early ones even back in the 1930s. And there were attempts to consumerize them in the 1970s and 1980s. But even by the 1990s they were still exotic—though I remember that with some effort I successfully rented a pair of them in 1993—and they worked OK, even over regular phone lines.
On the space station in 2001, there’s a Picturephone shown, complete with an AT&T logo—though it’s the old Bell System logo that looks like an actual bell. And as it happens, when 2001 was being made, there was a real project at AT&T called the Picturephone.
Of course, in 2001 the Picturephone isn’t a cellphone or a mobile device. It’s a builtin object, in a kiosk—a pay Picturephone. In the actual course of history, though, the rise of cellphones occurred before the consumerization of videochat—so payphone and videochat technology basically never overlapped.
Also interesting in 2001 is that the Picturephone is a pushbutton phone, with exactly the same numeric button layout as today (though without the * and # [“octothorp”]). Pushbutton phones actually already existed in 1968, although they were not yet widely deployed. And, of course, because of the details of our technology today, when one actually does a videochat, I don’t know of any scenario in which one ends up pushing mechanical buttons.
There’s a long list of instructions printed on the Picturephone—but in actuality, just like today, its operation seems quite straightforward. Back in 1968, though, even direct longdistance dialing (without an operator) was fairly new—and wasn’t yet possible at all between different countries.
To use the Picturephone in 2001, one inserts a credit card. Credit cards had existed for a while even in 1968, though they were not terribly widely used. The idea of automatically reading credit cards (say, using a magnetic stripe) had actually been developed in 1960, but it didn’t become common until the 1980s. (I remember that in the mid1970s in the UK, when I got my first ATM card, it consisted simply of a piece of plastic with holes like a punched card—not the most secure setup one can imagine.)
At the end of the Picturephone call in 2001, there’s a charge displayed: $1.70. Correcting for inflation, that would be about $12 today. By the standards of modern cellphones—or internet videochatting—that’s very expensive. But for a presentday satellite phone, it’s not so far off, even for an audio call. (Today’s handheld satphones can’t actually support the necessary data rates for videocalls, and networks on planes still struggle to handle videocalls.)
On the space shuttle (or, perhaps better, space plane) the cabin looks very much like a modern airplane—which probably isn’t surprising, because things like Boeing 737s already existed in 1968. But in a correct (at least for now) modern touch, the seat backs have TVs—controlled, of course, by a row of buttons. (And there’s also futuristicforthe1960s programming, like a televised women’s judo match.)
A curious filmschoollike fact about 2001 is that essentially every major scene in the movie (except the ones centered on HAL) shows the consumption of food. But how would food be delivered in the year 2001? Well, like everything else, it was assumed that it would be more automated, with the result that in the movie a variety of elaborate food dispensers are shown. As it’s turned out, however, at least for now, food delivery is something that’s kept humans firmly in the loop (think McDonald’s, Starbucks, etc.).
In the part of the movie concerned with going to Jupiter, there are “hibernaculum pods” shown—with people inside in hibernation. And above these pods there are vitalsign displays, that look very much like modern ICU displays. In a sense, that was not such a stretch of a prediction, because even in 1968, there had already been oscilloscopestyle EKG displays for some time.
Of course, how to put people into hibernation isn’t something that’s yet been figured out in real life. That it—and cryonics—should be possible has been predicted for perhaps a century. And my guess is that—like cloning or gene editing—to do it will take inventing some clever tricks. But in the end I expect it will pretty much seem like a historical accident in which year it’s figured out. It just so happens not to have happened yet.
There’s a scene in 2001 where one of the characters arrives on the space station and goes through some kind of immigration control (called “Documentation”)—perhaps imagined to be set up as some kind of extension to the Outer Space Treaty from 1967. But what’s particularly notable in the movie is that the clearance process is handled automatically, using biometrics, or specifically, voiceprint identification. (The US insignia displayed are identical to the ones on today’s US passports, but in typical pre1980s form, the system asks for “surname” and “Christian name”.)
There had been primitive voice recognition systems even in the 1950s (“what digit is that?”), and the idea of identifying speakers by voice was certainly known. But what was surely not obvious is that serious voice systems would need the kind of computer processing power that only became available in the late 2000s.
And in just the last few years, automatic biometric immigration control systems have started to become common at airports—though using face and sometimes fingerprint recognition rather than voice. (Yes, it probably wouldn’t work well to have lots of people talking at different kiosks at the same time.)
In the movie, the kiosk has buttons for different languages: English, Dutch, Russian, French, Italian, Japanese. It would have been very hard to predict what a more appropriate list for 2001 might have been.
Even though 1968 was still in the middle of the Cold War, the movie correctly portrays international use of the space station—though, like in Antarctica today, it portrays separate moon bases for different countries. Of course, the movie talks about the Soviet Union. But the fact the Berlin Wall would fall 21 years after 1968 isn’t the kind of thing that ever seems predictable in human history.
The movie shows logos from quite a few companies as well. The space shuttle is proudly branded Pan Am. And in at least one scene, its instrument panel has “IBM” in the middle. (There’s also an IBM logo on spacesuit controls during an EVA near Jupiter.) On the space station there are two hotels shown: Hilton and Howard Johnson’s. There’s also a Whirlpool “TV dinner” dispenser in the galley of the spacecraft going to the Moon. And there’s the AT&T (Bell System) Picturephone, as well as an Aeroflot bag, and a BBC newscast. (The channel is “BBC 12”, though in reality the expansion has only been from BBC 2 to BBC 4 in the past 50 years.)
Companies have obviously risen and fallen over the course of 50 years, but it’s interesting how many of the ones featured in the movie still exist, at least in some form. Many of their logos are even almost the same—though AT&T and BBC are two exceptions, and the IBM logo got stripes added in 1972.
It’s also interesting to look at the fonts used in the movie. Some seem quite dated to us today, while others (like the title font) look absolutely modern. But what’s strange is that at times over the past 50 years some of those “modern” fonts would have seemed old and tired. But such, I suppose, is the nature of fashion. And it’s worth remembering that even those “serifed fonts” from stone inscriptions in ancient Rome are perfectly capable of looking sharp and modern.
Something else that’s changed since 1968 is how people talk, and the words they use. The change seems particularly notable in the technospeak. “We are running crosschecking routines to determine reliability of this conclusion” sounds fine for the 1960s, but not so much for today. There’s mention of the risk of “social disorientation” without “adequate preparation and conditioning”, reflecting a kind of behaviorist view of psychology that at least wouldn’t be expressed the same way today.
It’s sort of charming when a character in 2001 says that whenever they “phone” a moon base, they get “a recording which repeats that the phone lines are temporarily out of order”. One might not say something too different about landlines on Earth today, but it feels like with a moon base one should at least be talking about automatically finding out if their network is down, rather than about having a person call on the phone and listen to a recorded message.
Of course, had a character in 2001 talked about “not being able to ping their servers”, or “getting 100% packet loss” it would have been completely incomprehensible to 1960s moviegoers—because those are concepts of a digital world which basically had just not been invented yet (even though the elements for it definitely existed).
The most notable and enduring character from 2001 is surely the HAL 9000 computer, described (with exactly the same words as might be used today) as “the latest in machine intelligence”. HAL talks, lipreads, plays chess, recognizes faces from sketches, comments on artwork, does psychological evaluations, reads from sensors and cameras all over the spaceship, predicts when electronics will fail, and—notably to the plot—shows a variety of humanlike emotional responses.
It might seem remarkable that all these AIlike capabilities would be predicted in the 1960s. But actually, back then, nobody yet thought that AI would be hard to create—and it was widely assumed that before too long computers would be able to do pretty much everything humans can, though probably better and faster and on a larger scale.
But already by the 1970s it was clear that things weren’t going to be so easy, and before long the whole field of AI basically fell into disrepute—with the idea of creating something like HAL beginning to seem as fictional as digging up extraterrestrial artifacts on the Moon.
In the movie, HAL’s birthday is January 12, 1992 (though in the book version of 2001, it was 1997). And in 1997, in Urbana, Illinois, fictional birthplace of HAL (and, also, as it happens, the headquarters location of my company), I went to a celebration of HAL’s fictional birthday. People talked about all sorts of technologies relevant to HAL. But to me the most striking thing was how low the expectations had become. Almost nobody even seemed to want to mention “general AI” (probably for fear of appearing kooky), and instead people were focusing on solving very specific problems, with specific pieces of hardware and software.
Having read plenty of popular science (and some science fiction) in the 1960s, I certainly started from the assumption that one day HALlike AIs would exist. And in fact I remember that in 1972, when I happened to end up delivering a speech to my whole school—and picking the topic of what amounts to AI ethics. I’m afraid that what I said I would now consider naive and misguided (and in fact I was perhaps partly misled by 2001). But, heck, I was only 12 at the time. And what I find interesting today is just that I thought AI was an important topic even back then.
For the remainder of the 1970s I was personally mostly very focused on physics (which, unlike AI, was thriving at the time). AI was still in the back of my mind, though, when for example I wanted to understand how brains might or might not relate to statistical physics and to things like the formation of complexity. But what made AI really important again for me was that in 1981 I had launched my first computer language (SMP) and had seen how successful it was at doing mathematical and scientific computations—and I got to wondering what it would take to do computations about (and know about) everything.
My immediate assumption was that it would require full brainlike capabilities, and therefore general AI. But having just lived through so many advances in physics, this didn’t immediately faze me. And in fact, I even had a fairly specific plan. You see, SMP—like the Wolfram Language today—was fundamentally based on the idea of defining transformations to apply when expressions match particular patterns. I always viewed this as a rough idealization of certain forms of human thinking. And what I thought was that general AI might effectively just require adding a way to match not just precise patterns, but also approximate ones (e.g. “that’s a picture of an elephant, even though its pixels aren’t exactly the same as in the sample”).
I tried a variety of schemes for doing this, one of them being neural nets. But somehow I could never formulate experiments that were simple enough to even have a clear definition of success. But by making simplifications to neural nets and a couple of other kinds of systems, I ended up coming up with cellular automata—which quickly allowed me to make some discoveries that started me on my long journey of studying the computational universe of simple programs, and made me set aside approximate pattern matching and the problem of AI.
At the time of HAL’s fictional birthday in 1997, I was actually right in the middle of my intense 10year process of exploring the computational universe and writing A New Kind of Science—and it was only out of my great respect for 2001 that I agreed to break out of being a hermit for a day and talk about HAL.
It so happened that just three weeks before there had been the news of the successful cloning of Dolly the sheep.
And, as I pointed out, just like general AI, people had discussed cloning mammals for ages. But it had been assumed to be impossible, and almost nobody had worked on it—until the success with Dolly. I wasn’t sure what kind of discovery or insight would lead to progress in AI. But I felt certain that eventually it would come.
Meanwhile, from my study of the computational universe, I’d formulated my Principle of Computational Equivalence—which had important things to say about artificial intelligence. And at some level, what it said is that there isn’t some magic “bright line” that separates the “intelligent” from the merely computational.
Emboldened by this—and with the Wolfram Language as a tool—I then started thinking again about my quest to solve the problem of computational knowledge. It certainly wasn’t an easy thing. But after quite a few years of work, in 2009, there it was: WolframAlpha—a general computational knowledge engine with a lot of knowledge about the world. And particularly after WolframAlpha was integrated with voice input and voice output in things like Siri, it started to seem in many ways quite HALlike.
HAL in the movie had some more tricks, though. Of course he had specific knowledge about the spacecraft he was running—a bit like the custom Enterprise WolframAlpha systems that now exist at various large corporations. But he had other capabilities too—like being able to do visual recognition tasks.
And as computer science developed, such things had hardened into tough nuts that basically “computers just can’t do”. To be fair, there was lots of practical progress in things like OCR for text, and face recognition. But it didn’t feel general. And then in 2012, there was a surprise: a trained neural net was suddenly discovered to perform really well on standard image recognition tasks.
It was a strange situation. Neural nets had first been discussed in the 1940s, and had seen several rounds of waxing and waning enthusiasm over the decades. But suddenly just a few years ago they really started working. And a whole bunch of “HALlike tasks” that had seemed out of range suddenly began to seem achievable.
In 2001, there’s the idea that HAL wasn’t just “programmed”, but somehow “learned”. And in fact HAL mentions at one point that HAL had a (human) teacher. And perhaps the gap between HAL’s creation in 1992 and deployment in 2001 was intended to correspond to HAL’s humanlike period of education. (Arthur C. Clarke probably changed the birth year to 1997 for the book because he thought that a 9yearold computer would be obsolete.)
But the most important thing that’s made modern machine learning systems actually start to work is precisely that they haven’t been trained at humantype rates. Instead, they’ve immediately been fed millions or billions of example inputs—and then they’ve been expected to burn huge amounts of CPU time systematically finding what amount to progressively better fits to those examples. (It’s conceivable that an “active learning” machine could be set up to basically find the examples it needs within a humanschoolroomlike environment, but this isn’t how the most important successes in current machine learning have been achieved.)
So can machines now do what HAL does in the movie? Unlike a lot of the tasks presumably needed to run an actual spaceship, most of the tasks the movie concentrates on HAL doing are ones that seem quintessentially human. And most of these turn out to be wellsuited to modern machine learning—and month by month more and more of them have now been successfully tackled.
But what about knitting all these tasks together, to make a “complete HAL”? One could conceivably imagine having some giant neural net, and “training it for all aspects of life”. But this doesn’t seem like a good way to do things. After all, if we’re doing celestial mechanics to work out the trajectory of a spacecraft, we don’t have to do it by matching examples; we can do it by actual calculation, using the achievements of mathematical science.
We need our HAL to be able to know about a lot of kinds of things, and to be able to compute about a lot of kinds of things, including ones that involve humanlike recognition and judgement.
In the book version of 2001, the name HAL was said to stand for “Heuristically programmed ALgorithmic computer”. And the way Arthur C. Clarke explained it is that this was supposed to mean “it can work on a program that’s already set up, or it can look around for better solutions and you get the best of both worlds”.
And at least in some vague sense, this is actually a pretty good description of what I’ve built over the past 30 years as the Wolfram Language. The “programs that are already set up” happen to try to encompass a lot of the systematic knowledge about computation and about the world that our civilization has accumulated.
But there’s also the concept of searching for new programs. And actually the science that I’ve done has led me to do a lot of work searching for programs in the computational universe of all possible programs. We’ve had many successes in finding useful programs that way, although the process is not as systematic as one might like.
In recent years, the Wolfram Language has also incorporated modern machine learning—in which one is effectively also searching for programs, though in a restricted domain defined for example by weights in a neural network, and constructed so that incremental improvement is possible.
Could we now build a HAL with the Wolfram Language? I think we could at least get close. It seems well within range to be able to talk to HAL in natural language about all sorts of relevant things, and to have HAL use knowledgebased computation to control and figure out things about the spaceship (including, for example, simulating components of it).
The “computer as everyday conversation companion” side of things is less well developed, not least because it’s not as clear what the objective might be there. But it’s certainly my hope that in the next few years—in part to support applications like computational smart contracts (and yes, it would have been good to have one of those set up for HAL)—that things like my symbolic discourse language project will provide a general framework for doing this.
Do computers “make mistakes”? When the first electronic computers were made in the 1940s and 1950s, the big issue was whether the hardware in them was reliable. Did the electrical signals do what they were supposed to, or did they get disrupted, say because a moth (“bug”) flew inside the computer?
By the time mainframe computers were developed in the early 1960s, such hardware issues were pretty well under control. And so in some sense one could say (and marketing material did) that computers were “perfectly reliable”.
HAL reflects this sentiment in 2001. “The 9000 series is the most reliable computer ever made. No 9000 computer has ever made a mistake or distorted information. We are all, by any practical definition of the words, foolproof and incapable of error.”
From a modern point of view, saying this kind of thing seems absurd. After all, everyone knows that computer systems—or, more specifically, software systems—inevitably have bugs. But in 1968, bugs weren’t really understood.
After all, computers were supposed to be perfect, logical machines. And so, the thinking went, they must operate in a perfect way. And if anything went wrong, it must, as HAL says in the movie, “be attributable to human error”. Or, in other words, that if the human were smart and careful enough, the computer would always “do the right thing”.
When Alan Turing did his original theoretical work in 1936 to show that universal computers could exist, he did it by writing what amounts to a program for his proposed universal Turing machine. And even in this very first program (which is only a page long), it turns out that there were already bugs.
But, OK, one might say, with enough effort, surely one can get rid of any possible bug. Well, here’s the problem: to do so requires effectively foreseeing every aspect of what one’s program could ever do. But in a sense, if one were able to do that, one almost doesn’t need the program in the first place.
And actually, pretty much any program that’s doing nontrivial things is likely to show what I call computational irreducibility, which implies that there’s no way to systematically shortcut what the program does. To find out what it does, there’s basically no choice but just to run it and watch what it does. Sometimes this might be seen like a desirable feature—for example if one’s setting up a cryptocurrency that one wants to take irreducible effort to mine.
And, actually, if there isn’t computational irreducibility in a computation, then it’s a sign that the computation isn’t being done as efficiently as it could be.
What is a bug? One might define it as a program doing something one doesn’t want. So maybe we want the pattern on the left created by a very simple program to never die out. But the point is that there may be no way in anything less than an infinite time to answer the “halting problem” of whether it can in fact die out. So, in other words, figuring out if the program “has a bug” and does something one doesn’t want may be infinitely hard.
And of course we know that bugs are not just a theoretical problem; they exist in all largescale practical software. And unless HAL only does things that are so simple that we foresee every aspect of them, it’s basically inevitable that HAL will exhibit bugs.
But maybe, one might think, HAL could at least be given some overall directives—like “be nice to humans”, or other potential principles of AI ethics. But here’s the problem: given any precise specification, it’s inevitable that there will unintended consequences. One might say these are “bugs in the specification”, but the problem is they’re inevitable. When computational irreducibility is present, there’s basically never any finite specification that can avoid any conceivable “unintended consequence”.
Or, said in terms of 2001, it’s inevitable that HAL will be capable of exhibiting unexpected behavior. It’s just a consequence of being a system that does sophisticated computation. It lets HAL “show creativity” and “take initiative”. But it also means HAL’s behavior can’t ever be completely predicted.
The basic theoretical underpinnings to know this already existed in the 1950s or even earlier. But it took experience with actual complex computer systems in the 1970s and 1980s for intuition about bugs to develop. And it took my explorations of the computational universe in the 1980s and 1990s to make it clear how ubiquitous the phenomenon of computational irreducibility actually is, and how much it affects basically any sufficiently broad specification.
It’s interesting to see what the makers of 2001 got wrong about the future, but it’s impressive how much they got right. So how did they do it? Well, between Stanley Kubrick and Arthur C. Clarke (and their “scientific consultant” Fred Ordway III), they solicited input from a fair fraction of the top technology companies of the day—and (though there’s nothing in the movie credits about them) received a surprising amount of detailed information about the plans and aspirations of these companies, along with quite a few designs custommade for the movie as a kind of product placement.
In the very first space scene in the movie, for example, one sees an assortment of differently shaped spacecraft, that were based on concept designs from the likes of Boeing, Grumman and General Dynamics, as well as NASA. (In the movie, there are no aerospace manufacturer logos—and NASA also doesn’t get a mention; instead the assorted spacecraft carry the flags of various countries.)
But so where did the notion of having an intelligent computer come from? I don’t think it had an external source. I think it was just an idea that was very much “in the air” at the time. My late friend Marvin Minsky, who was one of the pioneers of AI in the 1960s, visited the set of 2001 during its filming. But Kubrick apparently didn’t ask him about AI; instead he asked about things like computer graphics, the naturalness of computer voices, and robotics. (Marvin claims to have suggested the configuration of arms that was used for the pods on the Jupiter spacecraft.)
But what about the details of HAL? Where did those come from? The answer is that they came from IBM.
IBM was at the time by far the world’s largest computer company, and it also conveniently happened to be headquartered in New York City, which is where Kubrick and Clarke were doing their work. IBM—as now—was always working on advanced concepts that they could demo. They worked on voice recognition. They worked on image recognition. They worked on computer chess. In fact, they worked on pretty much all the specific technical features of HAL shown in 2001. Many of these features are even shown in the “Information Machine” movie IBM made for the 1964 World’s Fair in New York City (though, curiously, that movie has a dynamic multiwindow form of presentation that wasn’t adopted for HAL).
In 1964, IBM had proudly introduced their System/360 mainframe computers:
And the rhetoric about HAL having a flawless operational record could almost be out of IBM’s marketing material for the 360. And of course HAL was physically big—like a mainframe computer (actually even big enough that a person could go inside the computer). But there was one thing about HAL that was very nonIBM. Back then, IBM always strenuously avoided ever saying that computers could themselves be smart; they just emphasized that computers would do what people told them to. (Somewhat ironically, the internal slogan that IBM used for its employees was “Think”. It took until the 1980s for IBM to start talking about computers as smart—and for example in 1980 when my friend Greg Chaitin was advising the thenhead of research at IBM he was told it was deliberate policy not to pursue AI, because IBM didn’t want its human customers to fear they might be replaced by AIs.)
An interesting letter from 1966 surfaced recently. In it, Kubrick asks one of his producers (a certain Roger Caras, who later became well known as a wildlife TV personality): “Does I.B.M. know that one of the main themes of the story is a psychotic computer?”. Kubrick is concerned that they will feel “swindled”. The producer writes back, talking about IBM as “the technical advisor for the computer”, and saying that IBM will be OK so long as they are “not associated with the equipment failure by name”.
But was HAL supposed to be an IBM computer? The IBM logo appears a couple of times in the movie, but not on HAL. Instead, HAL has a nameplate that looks like this:
It’s certainly interesting that the blue is quite like IBM’s characteristic “big blue” blue. It’s also very curious that if you go one step forward in the alphabet from the letters H A L, you get I B M. Arthur C. Clarke always claimed this was a coincidence, and it probably was. But my guess is that at some point, that blue part of HAL’s nameplate was going to say “IBM”.
Like some other companies, IBM was fond of naming its products with numbers. And it’s interesting to look at what numbers they used. In the 1960s, there were a lot of 3 and 4digit numbers starting with 3’s and 7’s, including a whole 7000 series, etc. But, rather curiously, there was not a single one starting with 9: there was no IBM 9000 series. In fact, IBM didn’t have a single product whose name started with 9 until the 1990s. And I suspect that was due to HAL.
By the way, the IBM liaison for the movie was their head of PR, C. C. Hollister, who was interviewed in 1964 by the New York Times about why IBM—unlike its competitors—ran general advertising (think Super Bowl), given that only a thin stratum of corporate executives actually made purchasing decisions about computers. He responded that their ads were “designed to reach… the articulators or the 8 million to 10 million people that influence opinion on all levels of the nation’s life” (today one would say “opinion makers”, not “articulators”).
He then added “It is important that important people understand what a computer is and what it can do.” And in some sense, that’s what HAL did, though not in the way Hollister might have expected.
OK, so now we know—at least over the span of 50 years—what happened to the predictions from 2001, and in effect how science fiction did (or did not) turn into science fact. So what does this tell us about predictions we might make today?
In my observation things break into three basic categories. First, there are things people have been talking about for years, that will eventually happen—though it’s not clear when. Second, there are surprises that basically nobody expects, though sometimes in retrospect they may seem somewhat obvious. And third, there are things people talk about, but that potentially just won’t ever be possible in our universe, given how its physics works.
Something people have talked about for ages, that surely will eventually happen, is routine space travel. When 2001 was released, no humans had ever ventured beyond Earth orbit. But even by the very the next year, they’d landed on the Moon. And 2001 made what might have seemed like a reasonable prediction that by the year 2001 people would routinely be traveling to the Moon, and would be able to get as far as Jupiter.
Now of course in reality this didn’t happen. But actually it probably could have, if it had been considered a sufficient priority. But there just wasn’t the motivation for it. Yes, space has always been more broadly popular than, say, ocean exploration. But it didn’t seem important enough to put the necessary resources into.
Will it ever happen? I think it’s basically a certainty. But will it take 5 years or 50? It’s very hard to tell—though based on recent developments I would guess about halfway between.
People have been talking about space travel for well over a hundred years. They’ve been talking about what’s now called AI for even longer. And, yes, at times there’ve been arguments about how some feature of human intelligence is so fundamentally special that AI will never capture it. But I think it’s pretty clear at this point that AI is on an inexorable path to reproduce any and all features of whatever we would call intelligence.
A more mundane example of what one might call “inexorable technology development” is videophones. Once one had phones and one had television, it was sort of inevitable that eventually one would have videophones. And, yes, there were prototypes in the 1960s. But for detailed reasons of computer and telecom capacity and cost, videophone technology didn’t really become broadly available for a few more decades. But it was basically inevitable that it eventually would.
In science fiction, basically ever since radio was invented, it was common to imagine that in the future everyone would be able to communicate through radio instantly. And, yes, it took the better part of a century. But eventually we got cellphones. And in time we got smartphones that could serve as magic maps, and magic mirrors, and much more.
An example that’s today still at an earlier stage in its development is virtual reality. I remember back in the 1980s trying out early VR systems. But back then, they never really caught on. But I think it’s basically inevitable that they eventually will. Perhaps it will require having video that’s at the same quality level as human vision (as audio has now been for a couple of decades). And whether it’s exactly VR, or instead augmented reality, that eventually becomes widespread is not clear. But something like that surely will. Though exactly when is not clear.
There are endless examples one can cite. People have been talking about selfdriving cars since at least the 1960s. And eventually they will exist. People have talked about flying cars for even longer. Maybe helicopters could have gone in this direction, but for detailed reasons of control and reliability that didn’t work out. Maybe modern drones will solve the problem. But again, eventually there will be flying cars. It’s just not clear exactly when.
Similarly, there will eventually be robotics everywhere. I have to say that this is something I’ve been hearing will “soon happen” for more than 50 years, and progress has been remarkably slow. But my guess is that once it’s finally figured out how to really do “generalpurpose robotics”—like we can do generalpurpose computation—things will advance very quickly.
And actually there’s a theme that’s very clear over the past 50+ years: what once required the creation of special devices is eventually possible by programming something that is general purpose. In other words, instead of relying on the structure of physical devices, one builds up capabilities using computation.
What is the end point of this? Basically it’s that eventually everything will be programmable right down to atomic scales. In other words, instead of specifically constructing computers, we’ll basically build everything “out of computers”. To me, this seems like an inevitable outcome. Though it happens to be one that hasn’t yet been much discussed, or, say, explored in science fiction.
Returning to more mundane examples, there are other things that will surely be possible one day, like drilling into the Earth’s mantle, or having cities under the ocean (both subjects of science fiction in the past—and there’s even an ad for a “Pan Am Underwater Hotel” visible on the space station in 2001). But whether these kinds of things will be considered worth doing is not so clear. Bringing back dinosaurs? It’ll surely be possible to get a good approximation to their DNA. How long all the necessary bioscience developments will take I don’t know, but one day one will surely be able to have a live stegosaurus again.
Perhaps one of the oldest “science fiction” ideas ever is immortality. And, yes, human lifespans have been increasing. But will there come a point where humans can for practical purposes be immortal? I am quite certain that there will. Quite whether the path will be primarily biological, or primarily digital, or some combination involving molecularscale technology, I do not know. And quite what it will all mean, given the inevitable presence of an infinite number of possible bugs (today’s “medical conditions”), I am not sure. But I consider it a certainty that eventually the old idea of human immortality will become a reality. (Curiously, Kubrick—who was something of an enthusiast for things like cryonics—said in an interview in 1968 that one of the things he thought might have happened by the year 2001 is the “elimination of old age”.)
So what’s an example of something that won’t happen? There’s a lot we can’t be sure about without knowing the fundamental theory of physics. (And even given such a theory, computational irreducibility means it can be arbitrarily hard to work out the consequence for some particular issue.) But two decent candidates for things that won’t ever happen are HoneyIShrunktheKids miniaturization and fasterthanlight travel.
Well, at least these things don’t seem likely to happen the way they are typically portrayed in science fiction. But it’s still possible that things that are somehow functionally equivalent will happen. For example, it perfectly well could be possible to “scan an object” at an atomic scale, and then “reinterpret it”, and build up using molecularscale construction at least a very good approximation to it that happens to be much smaller.
What about fasterthanlight travel? Well, maybe one will be able to deform spacetime enough that it’ll effectively be possible. Or conceivably one will be able to use quantum mechanics to effectively achieve it. But these kinds of solutions assume that what one cares about are things happening directly in our physical universe.
But imagine that in the future everyone has effectively been “uploaded” into some digital system—so that the “physics” one’s experiencing is instead something virtualized. And, yes, at the level of the underlying hardware maybe there will be restrictions based on the speed of light. But for purposes of the virtualized experience, there’ll be no such constraint. And, yes, in a setup like this, one can also imagine another science fiction favorite: time travel (notwithstanding its many philosophical issues).
OK, so what about surprises? If we look at the world today, compared to 50 years ago, it’s easy to identify some surprises. Computers are far more ubiquitous than almost anyone expected. And there are things like the web, and social media, that weren’t really imagined (even though perhaps in retrospect they seem “obvious”).
There’s another surprise, whose consequences are so far much less well understood, but that I’ve personally been very involved with: the fact that there’s so much complexity and richness to be found in the computational universe.
Almost by definition, “surprises” tend to occur when understanding what’s possible, or what makes sense, requires a change of thinking, or some kind of “paradigm shift”. Often in retrospect one imagines that such changes of thinking just occur—say in the mind of one particular person—out of the blue. But in reality what’s almost always going on is that there’s a progressive stack of understanding developed—which, perhaps quite suddenly, allows one to see something new.
And in this regard it’s interesting to reflect on the storyline of 2001. The first part of the movie shows an alien artifact—a black monolith—that appears in the world of our ape ancestors, and starts the process that leads to modern civilization. Maybe the monolith is supposed to communicate critical ideas to the apes by some kind of telepathic transmission.
But I like to have another interpretation. No ape 4 million years ago had ever seen a perfect black monolith, with a precise geometrical shape. But as soon as they saw one, they could tell that something they had never imagined was possible. And the result was that their worldview was forever changed. And—a bit like the emergence of modern science as a result of Galileo seeing the moons of Jupiter—that’s what allowed them to begin constructing what became modern civilization.
When I first saw 2001 fifty years ago nobody knew whether there would turn out to be life on Mars. People didn’t expect large animals or anything. But lichens or microorganisms seemed, if anything, more likely than not.
With radio telescopes coming online, and humans just beginning to venture out into space, it also seemed quite likely that before long we’d find evidence of extraterrestrial intelligence. But in general people seemed neither particularly excited, or particularly concerned, about this prospect. Yes, there would be mention of the time when a radio broadcast of H. G. Wells’s War of the Worlds story was thought to be a real alien invasion in New Jersey. But 20 or so years after the end of World War II, people were much more concerned about the ongoing Cold War, and what seemed like the real possibility that the world would imminently blow itself up in a giant nuclear conflagration.
The seed for what became 2001 was a rather nice 1951 short story by Arthur C. Clarke called “The Sentinel” about a mysterious pyramid discovered on the Moon, left there before life emerged on Earth, and finally broken open by humans using nuclear weapons, but found to have contents that were incomprehensible. Kubrick and Clarke worried that before 2001 was released, their story might have been overtaken by the actual discovery of extraterrestrial intelligence (and they even explored taking out insurance against this possibility).
But as it is, 2001 became basically the first serious movie exploration of what the discovery of extraterrestrial intelligence might be like. As I’ve recently discussed at length, deciding in the abstract whether or not something was really “produced by intelligence” is a philosophically deeply challenging problem. But at least in the world as it is today, we have a pretty good heuristic: things that look geometrically simpler (with straight edges, circles, etc.) are probably artifacts. Of course, at some level it’s a bit embarrassing that nature seems to quite effortlessly make things that look more complex than what we typically produce, even with all our engineering prowess. And, as I’ve argued elsewhere, as we learn to take advantage of more of the computational universe, this will no doubt change. But at least for now, the “if it’s geometrically simple, it’s probably an artifact” heuristic works quite well.
And in 2001 we see it in action—when the perfectly cuboidal black monolith appears on the 4millionyearold Earth: it’s visually very obvious that it isn’t something that belongs, and that it’s something that was presumably deliberately constructed.
A little later in the movie, another black monolith is discovered on the Moon. It’s noticed because of what’s called in the movie the “Tycho Magnetic Anomaly” (“TMA1”)—probably named by Kubrick and Clarke after the South Atlantic Anomaly associated with the Earth’s radiation belts, that was discovered in 1958. The magnetic anomaly could have been natural (“a magnetic rock”, as one of the characters says). But once it’s excavated and found to be a perfect black cuboidal monolith, extraterrestrial intelligence seems the only plausible origin.
As I’ve discussed elsewhere, it’s hard to even recognize intelligence that doesn’t have any historical or cultural connection to our own. And it’s essentially inevitable that this kind of alien intelligence will seem to us in many ways incomprehensible. (It’s a curious question, though, what would happen if the alien intelligence had already inserted itself into the distant past of our own history, as in 2001.)
Kubrick and Clarke at first assumed that they’d have to actually show extraterrestrials somewhere in the movie. And they worried about things like how many legs they might have. But in the end Kubrick decided that the only alien that had the degree of impact and mystery that he wanted was an alien one never actually saw.
And so, for the last 17% of 2001, after Dave Bowman goes through the “star gate” near Jupiter, one sees what was probably supposed to be purposefully incomprehensible—if aesthetically interesting. Are these scenes of the natural world elsewhere in the universe? Or are these artifacts created by some advanced civilization?
We see some regular geometric structures, that read to us like artifacts. And we see what appear to be more fluid or organic forms, that do not. For just a few frames there are seven strange flashing octahedra.
I’m pretty sure I never noticed these when I first saw 2001 fifty years ago. But in 1997, when I studied the movie in connection with HAL’s birthday, I’d been thinking for years about the origins of complexity, and about the differences between natural and artificial systems—so the octahedra jumped out at me (and, yes, I spent quite a while wrangling the LaserDisc version of 2001 I had back then to try to look at them more carefully).
I didn’t know what the octahedra were supposed to be. With their regular flashing, I at first assumed they were meant to be some kind of space beacons. But I’m told that actually they were supposed to be the extraterrestrials themselves, appearing in a little cameo. Apparently there’d been an earlier version of the script in which the octahedra wound up riding in a ticker tape parade in New York City—but I think the cameo was a better idea.
When Kubrick was interviewed about 2001, he gave an interesting theory for the extraterrestrials: “They may have progressed from biological species, which are fragile shells for the mind at best, into immortal machine entities—and then, over innumerable eons, they could emerge from the chrysalis of matter transformed into beings of pure energy and spirit. Their potentialities would be limitless and their intelligence ungraspable by humans.”
It’s interesting to see Kubrick grappling with the idea that minds and intelligence don’t have to have physical form. Of course, in HAL he’d already in a sense imagined a “nonphysical mind”. But back in the 1960s, with the idea of software only just emerging, there wasn’t yet a clear notion that computation could be something meaningful in its own right, independent of the particulars of its “hardware” implementation.
That universal computation was possible had arisen as an essentially mathematical idea in the 1930s. But did it have physical implications? In the 1980s I started talking about things like computational irreducibility, and about some of the deep connections between universal computation and physics. But back in the 1950s, people looked for much more direct implications of universal computation. And one of the notable ideas that emerged was of “universal constructors”—that would somehow be able to construct anything, just as universal computers could compute anything.
In 1952—as part of his attempt to “mathematicize” biology—John von Neumann wrote a book about “selfreproducing automata” in which he came up with what amounts to an extremely complicated 2D cellular automaton that can have a configuration that reproduces itself. And of course—as was discovered in 1953—it turns out to be correct that digital information, as encoded in DNA, is what specifies the construction of biological organisms.
But in a sense von Neumann’s efforts were based on the wrong intuition. For he assumed (as I did, before I saw evidence to the contrary) that to make something that has a sophisticated feature like selfreproduction, the thing itself must somehow be correspondingly complicated.
But as I discovered many years later by doing experiments in the computational universe of simple programs, it’s just not true that it takes a complicated system to show complicated behavior: even systems (like cellular automata) with some of the simplest imaginable rules can do it. And indeed, it’s perfectly possible to have systems with very simple rules that show selfreproduction—and in the end selfreproduction doesn’t seem like a terribly special feature at all (think computer code that copies itself, etc.).
But back in the 1950s von Neumann and his followers didn’t know that. And given the enthusiasm for things to do with space, it was inevitable that the idea of “selfreproducing machines” would quickly find its way into notions of selfreproducing space probes (as well as selfreproducing lunar factories, etc.)
I’m not sure if these threads had come together by the time 2001 was made, but certainly by the time of the 2010 sequel, Arthur C. Clarke had decided that the black monoliths were selfreproducing machines. And in a scene reminiscent of the modern idea that AIs, when given the directive to make more paperclips, might turn everything (including humans) into paperclips, the 2010 movie includes black monoliths turning the entire planet of Jupiter into a giant collection of black monoliths.
What are the aliens trying to do in 2001? I think Kubrick recognized that their motivations would be difficult to map onto anything human. Why for example does Dave Bowman wind up in what looks like a LouisXVstyle hotel suite—that’s probably the most timeless humancreated backdrop of the movie (except for the fact that in keeping with 1960s practices, there’s a bathtub but no shower in the suite)?
It’s interesting that 2001 contains both artificial and extraterrestrial intelligence. And it’s interesting that 50 years after 2001 was released, we’re getting more and more comfortable with the idea of artificial intelligence, yet we believe we’ve seen no evidence of extraterrestrial intelligence.
As I’ve argued extensively elsewhere, I think the great challenge of thinking about extraterrestrial intelligence is defining what we might mean by intelligence. It’s very easy for us humans to have the analog of a preCopernican view in which we assume that our intelligence and capabilities are somehow fundamentally special, just like the Earth used to be assumed to be at the center of the universe.
But what my Principle of Computational Equivalence suggests is that in fact we’ll never be able to define anything fundamentally special about our intelligence; what’s special about it is its particular history and connections. Does the weather “have a mind of its own”? Well, based on the Principle of Computational Equivalence I don’t think there’s anything fundamentally different about the computations it’s doing from the ones that go on in our brains.
And similarly, when we look out into the cosmos, it’s easy to see examples of sophisticated computation going on. Of course, we don’t think of the complex processes in a pulsar magnetosphere as “extraterrestrial intelligence”; we just think of them as something “natural”. In the past we might have argued that however complex such a process looks, it’s really somehow fundamentally simpler than human intelligence. But given the Principle of Computational Equivalence we know this isn’t true.
So why don’t we consider a pulsar magnetosphere to be an example of “intelligence”? Well, because in it we don’t recognize anything like our own history, or our own detailed behavior. And as a result, we don’t have a way to connect what it does with purposes that we humans understand.
The computational universe of all possible programs is full of sophisticated computations that aren’t aligned with any existing human purposes. But as we try to develop AI, what we are effectively doing is to mine that computational universe for programs that do things we want done.
Out there in the computational universe, though, there’s an infinite collection of “possible AIs”. And there’s nothing less capable about the ones that we don’t yet choose to use; we just don’t see how they align with things we want.
Artificial intelligence is in a sense the first example of alien intelligence that we’re seeing (yes, there are animals too, but it’s easier to connect with AI). We’re still at the very early stages of getting widespread intuition about AI. But as we understand more about what AI really can be, and how it relates to everything else in the computational universe, I think we’ll get a clearer perspective on the forms intelligence can take.
Will we find extraterrestrial intelligence? Well, in many respects I think we already have. It’s all around us in the universe—doing all kinds of sophisticated computations.
Will there ever be a dramatic moment, like in 2001, where we find extraterrestrial intelligence that’s aligned enough with our own intelligence that we can recognize the perfect black monoliths it makes—even if we can’t figure out their “purpose”? My current suspicion is that it’ll be more “push” than “pull”: instead of seeing something that we suddenly recognize, we’ll instead gradually generalize our notion of intelligence, until we start to be comfortable attributing it not just to ourselves and our AIs, but also to other things in the universe.
When I first saw 2001 I don’t think I ever even calculated how old I’d be in the year 2001. I was always thinking about what the future might be like, but I didn’t internalize actually living through it. Back when I was 8 years old, in 1968, space was my greatest interest, and I made lots of little carefully stapled booklets, full of typewritten text and neatly drawn diagrams. I kept detailed notes on every space probe that was launched, and tried to come up with spacecraft (I wrote it “spacecraft”) designs of my own.
What made me do this? Well, presaging quite a bit that I’ve done in my life, I did it just because I found it personally interesting. I never showed any of it to anyone, and never cared what anyone might think of it. And for nearly 50 years I’ve just had it all stored away. But looking at it again now, I found one unique example of something related to my interests that I did for school: a booklet charmingly titled “The Future”, written when I was 9 or 10 years old, and containing what’s to me now a cringingly embarrassing page of my predictions for the future of space exploration (complete with a nod to 2001):
Fortunately perhaps, I didn’t wait around to find out how wrong these predictions were, and within a couple of years my interest in space had transformed into interests in more foundational fields, first physics and then computation and the study of the computational universe. When I first started using computers around 1972, it was a story of paper tape and teleprinters—far from the flashing screens of 2001.
But I’ve been fortunate enough to live through a time when the computer technology of 2001 went from pure fiction to something close to fact. And I’ve been even more fortunate to have been able to contribute a bit to that.
I’ve often said—in a kind of homage to 2001—that my favorite personal aspiration is to build “alien artifacts”: things that are recognizable once they’re built, but which nobody particularly expected would exist or be possible. I like to think that WolframAlpha is some kind of example—as is what the Wolfram Language has become. And in a sense so have my efforts been in exploring the computational universe.
I never interacted with Stanley Kubrick. But I did interact with Arthur C. Clarke, particularly when my big book A New Kind of Science was being published. (I like to think that the book is big in content, but it is definitely big in size, with 1280 pages, weighing nearly 6 pounds.) Arthur C. Clarke asked for a prepublication copy, which I duly sent, and on March 1, 2002, I received an email from him saying that “A ruptured postman has just staggered away from my front door… Stay tuned…..”.
Then, three days later, I got another piece of mail: “Well, I have <looked> at (almost) every page and am still in a state of shock. Even with computers, I don’t see how you could have done it.” Wow! I actually succeeded in making what seemed to Arthur C. Clarke like an alien artifact!
He offered me a backcover quote for the book: “… Stephen’s magnum opus may be the book of the decade, if not the century. It’s so comprehensive that perhaps he should have called it ‘A New Kind of Universe’, and even those who skip the 1200 pages of (extremely lucid) text will find the computergenerated illustrations fascinating. My friend HAL is very sorry he hadn’t thought of them first…” (In the end Steve Jobs talked me out of having quotes on the book, though, saying “Isaac Newton didn’t have backcover quotes; why do you want them?”)
It’s hard for me to believe it’s been 50 years since I first saw 2001. Not all of 2001 has come true (yet). But for me what was important was that it presented a vision of what might be possible—and an idea of how different the future might be. It helped me set the course of my life to try to define in whatever ways I can what the future will be. And not just waiting for aliens to deliver monoliths, but trying to build some “alien artifacts” myself.
]]>What happens if you take four of today’s most popular buzzwords and string them together? Does the result mean anything? Given that today is April 1 (as well as being Easter Sunday), I thought it’d be fun to explore this. Think of it as an Easter egg… from which something interesting just might hatch. And to make it clear: while I’m fooling around in stringing the buzzwords together, the details of what I’ll say here are perfectly real.
But before we can really launch into talking about the whole string of buzzwords, let’s discuss some of the background to each of the buzzwords on their own.
Saying something is “quantum” sounds very modern. But actually, quantum mechanics is a century old. And over the course of the past century, it’s been central to understanding and calculating lots of things in the physical sciences. But even after a century, “truly quantum” technology hasn’t arrived. Yes, there are things like lasers and MRIs and atomic force microscopes that rely on quantum phenomena, and needed quantum mechanics in order to be invented. But when it comes to the practice of engineering, what’s done is still basically all firmly classical, with nothing quantum about it.
Today, though, there’s a lot of talk about quantum computing, and how it might change everything. I actually worked on quantum computing back in the early 1980s (so, yes, it’s not that recent an idea). And I have to say, I was always a bit skeptical about whether it could ever really work—or whether any “quantum gains” one might get would be counterbalanced by inefficiencies in measuring what was going on.
But in any case, in the past 20 years or so there’s been all sorts of nice theoretical work on formulating the idea of quantum circuits and quantum computing. Lots of things have been done with the Wolfram Language, including an ongoing project of ours to produce a definitive symbolic way of representing quantum computations. But so far, all we can ever do is calculate about quantum computations, because the Wolfram Language itself just runs on ordinary, classical computers.
There are companies that have built what they say are (small) true quantum computers. And actually, we’ve been hoping to hook the Wolfram Language up to them, so we can implement a QuantumEvaluate function. But so far, this hasn’t happened. So I can’t really vouch for what QuantumEvaluate will (or will not) do.
But the big idea is basically this. In ordinary classical physics, one can pretty much say that definite things happen in the world. A billiard ball goes in this direction, or that. But in any particular case, it’s a definite direction. In quantum mechanics, though, the idea is that an electron, say, doesn’t intrinsically go in a particular, definite direction. Instead, it essentially goes in all possible directions, each with a particular amplitude. And it’s only when you insist on measuring where it went that you’ll get a definite answer. And if you do many measurements, you’ll just see probabilities for it to go in each direction.
Well, what quantum computing is trying to do is somehow to make use of the “all possible directions” idea in order to in effect get lots of computations done in parallel. It’s a tricky business, and there are only a few types of problems where the theory’s been worked out—the most famous being integer factoring. And, yes, according to the theory, a big quantum computer should be able to factor a big integer fast enough to make today’s cryptography infrastructure implode. But the only thing anyone so far even claims to have built along these lines is a tiny quantum computer—that definitely can’t yet do anything terribly interesting.
But, OK, so one critical aspect of quantum mechanics is that there can be interference between different paths that, say, an electron can take. This is mathematically similar to the interference that happens in light, or even in water waves, just in classical physics. In quantum mechanics, though, there’s supposed to be something much more intrinsic about the interference, leading to the phenomenon of entanglement, in which one basically can’t ever “see the wave that’s interfering”—only the effect.
In computing, though, we’re not making use of any kind of interference yet. Because (at least in modern times) we’re always trying to deal with discrete bits—while the typical phenomenon of interference (say in light) basically involves continuous numbers. And my personal guess is that optical computing—which will surely come—will succeed in delivering some spectacular speedups. It won’t be truly “quantum”, though (though it might be marketed like that). (For the technically minded, it’s a complicated question how computationtheoretic results apply to continuous processes like interferencebased computing.)
A decade ago computers didn’t have any systematic way to tell whether a picture was of an elephant or a teacup. But in the past five years, thanks to neural networks, this has basically become easy. (Interestingly, the image identifier we made three years ago remains basically state of the art.)
So what’s the big idea? Well, back in the 1940s people started thinking seriously about the brain being like an electrical machine. And this led to mathematical models of “neural networks”—which were proved to be equivalent in computational power to mathematical models of digital computers. Over the years that followed, billions of actual digital electronic computers were built. And along the way, people (including me) experimented with neural networks, but nobody could get them to do anything terribly interesting. (Though for years they were quietly used for things like optical character recognition.)
But then, starting in 2012, a lot of people suddenly got very excited, because it seemed like neural nets were finally able to do some very interesting things, at first especially in connection with images.
So what happened? Well, a neural net basically corresponds to a big mathematical function, formed by connecting together lots of smaller functions, each involving a certain number of parameters (“weights”). At the outset, the big function basically just gives random outputs. But the way the function is set up, it’s possible to “train the neural net” by tuning the parameters inside it so that the function will give the outputs one wants.
It’s not like ordinary programming where one explicitly defines the steps a computer should follow. Instead, the idea is just to give examples of what one wants the neural net to do, and then to expect it to interpolate between them to work out what to do for any particular input. In practice one might show a bunch of images of elephants, and a bunch of images of teacups, and then do millions of little updates to the parameters to get the network to output “elephant” when it’s fed an elephant, and “teacup” when it’s fed a teacup.
But here’s the crucial idea: the neural net is somehow supposed to generalize from the specific examples it’s shown—and it’s supposed to say that anything that’s “like” an elephant example is an elephant, even if its particular pixels are quite different. Or, said another way, there are lots of images that might be fed to the network that are in the “basin of attraction” for “elephant” as opposed to “teacup”. In a mechanical analogy, one might say that there are lots of places water might fall on a landscape, while still ending up flowing to one lake rather than another.
At some level, any sufficiently complicated neural net can in principle be trained to do anything. But what’s become clear is that for lots of practical tasks (that turn out to overlap rather well with some of what our brains seem to do easily) it’s realistic with feasible amounts of GPU time to actually train neural networks with a few million elements to do useful things. And, yes, in the Wolfram Language we’ve now got a rather sophisticated symbolic framework for training and using neural networks—with a lot of automation (that itself uses neural nets) for everything.
The word “blockchain” was first used in connection with the invention of Bitcoin in 2008. But of course the idea of a blockchain had precursors. In its simplest form, a blockchain is like a ledger, in which successive entries are coded in a way that depends on all previous entries.
Crucial to making this work is the concept of hashing. Hashing has always been one of my favorite practical computation ideas (and I even independently came up with it when I was about 13 years old, in 1973). What hashing does is to take some piece of data, like a text string, and make a number (say between 1 and a million) out of it. It does this by “grinding up the data” using some complicated function that always gives the same result for the same input, but will almost always give different results for different inputs. There’s a function called Hash in the Wolfram Language, and for example applying it to the previous paragraph of text gives 8643827914633641131.
OK, but so how does this relate to blockchain? Well, back in the 1980s people invented “cryptographic hashes” (and actually they’re very related to things I’ve done on computational irreducibility). A cryptographic hash has the feature that while it’s easy to work out the hash for a particular piece of data, it’s very hard to find a piece of data that will generate a given hash.
So let’s say you want to prove that you created a particular document at a particular time. Well, you could compute a hash of that document, and publish it in a newspaper (and I believe Bell Labs actually used to do this every week back in the 1980s). And then if anyone ever says “no, you didn’t have that document yet” on a certain date, you can just say “but look, its hash was already in every copy of the newspaper!”.
The idea of a blockchain is that one has a series of blocks, with each containing certain content, together with a hash. And then the point is that the data from which that hash is computed is a combination of the content of the block, and the hash of the preceding block. So this means that each block in effect confirms everything that came before it on the blockchain.
In cryptocurrencies like Bitcoin the big idea is to be able to validate transactions, and, for example, be able to guarantee just by looking at the blockchain that nobody has spent the same bitcoin twice.
How does one know that the blocks are added correctly, with all their hashes computed, etc.? Well, the point is that there’s a whole decentralized network of thousands of computers around the world that store the blockchain, and there are lots of people (well, actually not so many in practice these days) competing to be the one to add each new block (and include transactions people have submitted that they want in it).
The rules are (more or less) that the first person to add a block gets to keep the fees offered on the transactions in it. But each block gets “confirmed” by lots of people including this block in their copy of the blockchain, and then continuing to add to the blockchain with this block in it.
In the latest version of the Wolfram Language, BlockchainBlockData[−1, BlockchainBase > "Bitcoin"] gives a symbolic representation of the latest block that we’ve seen be added to the Bitcoin blockchain. And by the time maybe 5 more blocks have been added, we can be pretty sure everyone’s satisfied that the block is correct. (Yes, there’s an analogy with measurement in quantum mechanics here, which I’ll be talking about soon.)
Traditionally, when people keep ledgers, say of transactions, they’ll have one central place where a master ledger is maintained. But with a blockchain the whole thing can be distributed, so you don’t have to trust any single entity to keep the ledger correct.
And that’s led to the idea that cryptocurrencies like Bitcoin can flourish without central control, governments or banks involved. And in the last couple of years there’s been lots of excitement generated by people making large amounts of money speculating on cryptocurrencies.
But currencies aren’t the only thing one can use blockchains for, and Ethereum pioneered the idea that in addition to transactions, one can run arbitrary computations at each node. Right now with Ethereum the results of each computation are confirmed by being run on every single computer in the network, which is incredibly inefficient. But the bigger point is just that computations can be running autonomously on the network. And the computations can interact with each other, defining “smart contracts” that run autonomously, and say what should happen in different circumstances.
Pretty much any nontrivial smart contract will eventually need to know about something in the world (“did it rain today?”, “did the package arrive?”, etc.), and that has to come from off the blockchain—from an “oracle”. And it so happens (yes, as a result of a few decades of work) that our Wolfram Knowledgebase, which powers WolframAlpha, etc., provides the only realistic foundation today for making such oracles.
Back in the 1950s, people thought that pretty much anything human intelligence could do, it’d soon be possible to make artificial (machine) intelligence do better. Of course, this turned out to be much harder than people expected. And in fact the whole concept of “creating artificial intelligence” pretty much fell into disrepute, with almost nobody wanting to market their systems as “doing AI”.
But about five years ago—particularly with the unexpected successes in neural networks—all that changed, and AI was back, and cooler than ever.
What is AI supposed to be, though? Well, in the big picture I see it as being the continuation of a long trend of automating things that humans previously had to do for themselves—and in particular doing that through computation. But what makes a computation an example of AI, and not just, well, a computation?
I’ve built a whole scientific and philosophical structure around something I call the Principle of Computational Equivalence, that basically says that the universe of possible computations—even done by simple systems—is full of computations that are as sophisticated as one can ever get, and certainly as our brains can do.
In doing engineering, and in building programs, though, there’s been a tremendous tendency to try to prevent anything too sophisticated from happening—and to set things up so that the systems we build just follow exactly steps we can foresee. But there’s much more to computation than that, and in fact I’ve spent much of my life building systems that make use of this.
WolframAlpha is a great example. Its goal is to take as much knowledge about the world as possible, and make it computable, then to be able to answer questions as expertly as possible about it. Experientially, it “feels like AI”, because you get to ask it questions in natural language like a human, then it computes answers often with unexpected sophistication.
Most of what’s inside WolframAlpha doesn’t work anything like brains probably do, not least because it’s leveraging the last few hundred years of formalism that our civilization has developed, that allow us to be much more systematic than brains naturally are.
Some of the things modern neural nets do (and, for example, our machine learning system in the Wolfram Language does) perhaps work a little more like brains. But in practice what really seems to make things “seem like AI” is just that they’re operating on the basis of sophisticated computations whose behavior we can’t readily understand.
These days the way I see it is that out in the computational universe there’s amazing computational power. And the issue is just to be able to harness that for useful human purposes. Yes, “an AI” can go off and do all sorts of computations that are just as sophisticated as our brains. But the issue is: can we align what it does with things we care about doing?
And, yes, I’ve spent a large part of my life building the Wolfram Language, whose purpose is to provide a computational communication language in which humans can express what they want in a form suitable for computation. There’s lots of “AI power” out there in the computational universe; our challenge is to harness it in a way that’s useful to us.
Oh, and we want to have some kind of computational smart contracts that define how we want the AIs to behave (e.g. “be nice to humans”). And, yes, I think the Wolfram Language is going to be the right way to express those things, and build up the “AI constitutions” we want.
At the outset, it might seem as if “quantum”, “neural”, “blockchain” and “AI” are all quite separate concepts, without a lot of commonality. But actually it turns out that there are some amazing common themes.
One of the strongest has to do with complexity generation. And in fact, in their different ways, all the things we’re talking about rely on complexity generation.
What do I mean by complexity generation? One day I won’t have to explain this. But for now I probably still do. And somehow I find myself always showing the same picture—of my alltime favorite science discovery, the rule 30 automaton. Here it is:
And the point here is that even though the rule (or program) is very simple, the behavior of the system just spontaneously generates complexity, and apparent randomness. And what happens is complicated enough that it shows what I call “computational irreducibility”, so that you can’t reduce the computational work needed to see how it will behave: you essentially just have to follow each step to find out what will happen.
There are all sorts of important phenomena that revolve around complexity generation and computational irreducibility. The most obvious is just the fact that sophisticated computation is easy to get—which is in a sense what makes something like AI possible.
But OK, how does this relate to blockchain? Well, complexity generation is what makes cryptographic hashing possible. It’s what allows a simple algorithm to make enough apparent randomness to successfully be used as a cryptographic hash.
In the case of something like Bitcoin, there’s another connection too: the protocol needs people to have to make some investment to be able to add blocks to the blockchain, and the way this is achieved is (bizarrely enough) by forcing them to do irreducible computations that effectively cost computer time.
What about neural nets? Well, the very simplest neural nets don’t involve much complexity at all. If one drew out their “basins of attraction” for different inputs, they’d just be simple polygons. But in useful neural nets the basins of attraction are much more complicated.
It’s most obvious when one gets to recurrent neural nets, but it happens in the training process for any neural net: there’s a computational process that effectively generates complexity as a way to approximate things like the distinctions (“elephant” vs. “teacup”) that get made in the world.
Alright, so what about quantum mechanics? Well, quantum mechanics is at some level full of randomness. It’s essentially an axiom of the traditional mathematical formalism of quantum mechanics that one can only compute probabilities, and that there’s no way to “see under the randomness”.
I personally happen to think it’s pretty likely that that’s just an approximation, and that if one could get “underneath” things like space and time, we’d see how the randomness actually gets generated.
But even in the standard formalism of quantum mechanics, there’s a kind of complementary place where randomness and complexity generation is important, and it’s in the somewhat mysterious process of measurement.
Let’s start off by talking about another phenomenon in physics: the Second Law of Thermodynamics, or Law of Entropy Increase. This law says that if you start, for example, a bunch of gas molecules in a very orderly configuration (say all in one corner of a box), then with overwhelming probability they’ll soon randomize (and e.g. spread out randomly all over the box). And, yes, this kind of trend towards randomness is something we see all the time.
But here’s the strange part: if we look at the laws for, say, the motion of individual gas molecules, they’re completely reversible—so just as they say that the molecules can randomize themselves, so also they say that they should be able to unrandomize themselves.
But why do we never see that happen? It’s always been a bit mysterious, but I think there’s a clear answer, and it’s related to complexity generation and computational irreducibility. The point is that when the gas molecules randomize themselves, they’re effectively encrypting the initial conditions they were given.
It’s not impossible to place the gas molecules so they’ll unrandomize rather than randomize; it’s just that to work out how to do this effectively requires breaking the encryption—or in essence doing something very much like what’s involved in Bitcoin mining.
OK, so how does this relate to quantum mechanics? Well, quantum mechanics itself is fundamentally based on probability amplitudes, and interference between different things that can happen. But our experience of the world is that definite things happen. And the bridge from quantum mechanics to this involves the rather “boltedon” idea of quantum measurement.
The notion is that some little quantum effect (“the electron ends up with spin up, rather than down”) needs to get amplified to the point where one can really be sure what happened. In other words, one’s measuring device has to make sure that the little quantum effect associated with one electron cascades so that it’s spread across lots and lots of electrons and other things.
And here’s the tricky part: if one wants to avoid interference being possible (so we can really perceive something “definite” as having happened), then one needs to have enough randomness that things can’t somehow equally well go backwards—just like in thermodynamics.
So even though pure quantum circuits as one imagines them for practical quantum computers typically have a sufficiently simple mathematical structure that they (presumably) don’t intrinsically generate complexity, the process of measuring what they do inevitably must generate complexity. (And, yes, it’s a reasonable question whether that’s in some sense where the randomness one sees “really” comes from… but that’s a different story.)
Reversibility and irreversibility are a strangely common theme, at least between “quantum”, “neural” and “blockchain”. If one ignores measurement, a fundamental feature of quantum mechanics is that it’s reversible. What this means is that if one takes a quantum system, and lets it evolve in time, then whatever comes out one will always, at least in principle, be able to take and run backwards, to precisely reproduce where one started from.
Typical computation isn’t reversible like that. Consider an OR gate, that might be a basic component in a computer. In p OR q, the result will be true if either p or q is true. But just knowing that the result is “true”, you can’t figure out which of p and q (or both) is true. In other words, the OR operation is irreversible: it doesn’t preserve enough information for you to invert it.
In quantum circuits, one uses gates that, say, take two inputs (say p and q), and give two outputs (say p' and q'). And from those two outputs one can always uniquely reproduce the two inputs.
OK, but now let’s talk about neural nets. Neural nets as they’re usually conceived are fundamentally irreversible. Here’s why. Imagine (again) that you make a neural network to distinguish elephants and teacups. To make that work, a very large number of different possible input images all have to map, say, to “elephant”. It’s like the OR gate, but more so. Just knowing the result is “elephant” there’s no unique way to invert the computation. And that’s the whole point: one wants anything that’s enough like the elephant pictures one showed to still come out as “elephant”; in other words, irreversibility is central to the whole operation of at least this kind of neural net.
So, OK, then how could one possibly make a quantum neural net? Maybe it’s just not possible. But if so, then what’s going on with brains? Because brains seem to work very much like neural nets. And yet brains are physical systems that presumably follow quantum mechanics. So then how are brains possible?
At some level the answer has to do with the fact that brains dissipate heat. Well, what is heat? Microscopically, heat is the random motion of things like molecules. And one way to state the Second Law of Thermodynamics (or the Law of Entropy Increase) is that under normal circumstances those random motions never spontaneously organize themselves into any kind of systematic motion. In principle all those molecules could start moving in just such a way as to turn a flywheel. But in practice nothing like that ever happens. The heat just stays as heat, and doesn’t spontaneously turn into macroscopic mechanical motion.
OK, but so let’s imagine that microscopic processes involving, say, collisions of molecules, are precisely reversible—as in fact they are according to quantum mechanics. Then the point is that when lots of molecules are involved, their motions can get so “encrypted” that they just seem random. If one could look at all the details, there’d still be enough information to reverse everything. But in practice one can’t do that, and so it seems like whatever was going on in the system has just “turned into heat”.
So then what about producing “neural net behavior”? Well, the point is that while one part of a system is, say, systematically “deciding to say elephant”, the detailed information that would be needed to go back to the initial state is getting randomized, and turning into heat.
To be fair, though, this is glossing over quite a bit. And in fact I don’t think anyone knows how one can actually set up a quantum system (say a quantum circuit) that behaves in this kind of way. It’d be pretty interesting to do so, because it’d potentially tell us a lot about the quantum measurement process.
To explain how one goes from quantum mechanics in which everything is just an amplitude, to our experience of the world in which definite things seem to happen, people sometimes end up trying to appeal to mystical features of consciousness. But the point about a quantum neural net is that it’s quantum mechanical, yet it “comes to definite conclusions” (e.g. elephant vs. teacup).
Is there a good toy model for such a thing? I suspect one could create one from a quantum version of a cellular automaton that shows phase transition behavior—actually not unlike the detailed mechanics of a real quantum magnetic material. And what will be necessary is that the system has enough components (say spins) that the “heat” needed to compensate for its apparent irreversible behavior will stay away from the part where the irreversible behavior is observed.
Let me make a perhaps slightly confusing side remark. When people talk about “quantum computers”, they are usually talking about quantum circuits that operate on qubits (quantum analog of binary bits). But sometimes they actually mean something different: they mean quantum annealing devices.
Imagine you’ve got a bunch of dominoes and you’re trying to arrange them on the plane so that some matching condition associated with the markings on them is always satisfied. It turns out this can be a very hard problem. It’s related to computational irreducibility (and perhaps to problems like integer factoring). But in the end, to find out, say, the configuration that does best in satisfying the matching condition everywhere, one may effectively have to essentially just try out all possible configurations, and see which one works best.
Well, OK, but let’s imagine that the dominoes were actually molecules, and the matching condition corresponds to arranging molecules to minimize energy. Then the problem of finding the best overall configuration is like the problem of finding the minimum energy configuration for the molecules, which physically should correspond to the most stable solid structure that can be formed from the molecules.
And, OK, it might be hard to compute that. But what about an actual physical system? What will the molecules in it actually do when one cools it down? If it’s easy for the molecules to get to the lowest energy configuration, they’ll just do it, and one will have a nice crystalline solid.
People sometimes assume that “the physics will always figure it out”, and that even if the problem is computationally hard, the molecules will always find the optimal solution. But I don’t think this is actually true—and I think what instead will happen is that the material will turn mushy, not quite liquid and not quite solid, at least for a long time.
Still, there’s the idea that if one sets up this energy minimization problem quantum mechanically, then the physical system will be successful at finding the lowest energy state. And, yes, in quantum mechanics it might be harder to get stuck in local minima, because there is tunneling, etc.
But here’s the confusing part: when one trains a neural net, one ends up having to effectively solve minimization problems like the one I’ve described (“which values of weights make the network minimize the error in its output relative to what one wants?”). So people end up sometimes talking about “quantum neural nets”, meaning dominolike arrays which are set up to have energy minimization problems that are mathematically equivalent to the ones for neural nets.
(Yet another connection is that convolutional neural nets—of the kind used for example in image recognition—are structured very much like cellular automata, or like dynamic spin systems. But in training neural nets to handle multiscale features in images, one seems to end up with scale invariance similar to what one sees at critical points in spin systems, or their quantum analogs, as analyzed by renormalization group methods.)
OK, but let’s return to our whole buzzword string. What about blockchain? Well, one of the big points about a blockchain is in a sense to be as irreversible as possible. Once something has been added to a blockchain, one wants it to be inconceivable that it should ever be reversed out.
How is that achieved? Well, it’s curiously similar to how it works in thermodynamics or in quantum measurement. Imagine someone adds a block to their copy of a blockchain. Well, then the idea is that lots of other people all over the world will make their own copies of that block on their own blockchain nodes, and then go on independently adding more blocks from there.
Bad things would happen if lots of the people maintaining blockchain nodes decided to collude to not add a block, or to modify it, etc. But it’s a bit like with gas molecules (or degrees of freedom in quantum measurement). By the time everything is spread out among enough different components, it’s extremely unlikely that it’ll all concentrate together again to have some systematic effect.
Of course, people might not be quite like gas molecules (though, frankly, their observed aggregate behavior, e.g. jostling around in a crowd, is often strikingly similar). But all sorts of things in the world seem to depend on an assumption of randomness. And indeed, that’s probably necessary to maintain stability and robustness in markets where trading is happening.
OK, so when a blockchain tries to ensure that there’s a “definite history”, it’s doing something very similar to what a quantum measurement has to do. But just to close the loop a little more, let’s ask what a quantum blockchain might be like.
Yes, one could imagine using quantum computing to somehow break the cryptography in a standard blockchain. But the more interesting (and in my view, realistic) possibility is to make the actual operation of the blockchain itself be quantum mechanical.
In a typical blockchain, there’s a certain element of arbitrariness in how blocks get added, and who gets to do it. In a “proof of work” scheme (as used in Bitcoin and currently also Ethereum), to find out how to add a new block one searches for a “nonce”—a number to throw in to make a hash come out in a certain way. There are always many possible nonces (though each one is hard to find), and the typical strategy is to search randomly for them, successively testing each candidate.
But one could imagine a quantum version in which one is in effect searching in parallel for all possible nonces, and as a result producing many possible blockchains, each with a certain quantum amplitude. And to fill out the concept, imagine that—for example in the case of Ethereum—all computations done on the blockchain were reversible quantum ones (achieved, say, with a quantum version of the Ethereum Virtual Machine).
But what would one do with such a blockchain? Yes, it would be an interesting quantum system with all kinds of dynamics. But to actually connect it to the world, one has get data on and off the blockchain—or, in other words, one has to do a measurement. And the act of that measurement would in effect force the blockchain to pick a definite history.
OK, so what about a “neural blockchain”? At least today, by far the most common strategy with neural nets is first to train them, then to put them to work. (One can train them “passively” by just feeding them a fixed set of examples, or one can train them “actively” by having them in effect “ask” for the examples they want.) But by analogy with people, neural nets can also have “lifelong learning”, in which they’re continually getting updated based on the “experiences” they’re having.
So how do the neural nets record these experiences? Well, by changing various internal weights. And in some ways what happens is like what happens with blockchains.
Science fiction sometimes talks about direct braintobrain transfer of memories. And in a neural net context this might mean just taking a big block of weights from one neural net and putting it into another. And, yes, it can work well to transfer definite layers in one network to another (say to transfer information on what features of images are worth picking out). But if you try to insert a “memory” deep inside a network, it’s a different story. Because the way a memory is represented in a network will depend on the whole history of the network.
It’s like in a blockchain: you can’t just replace one block and expect everything else to work. The whole thing has been knitted into the sequence of things that happen through time. And it’s the same thing with memories in neural nets: once a memory has formed in a certain way, subsequent memories will be built on top of this one.
At the outset, one might have thought that “quantum”, “neural” and “blockchain” (not to mention “AI”) didn’t have much in common (other than that they’re current buzzwords)—and that in fact they might in some sense be incompatible. But what we’ve seen is that actually there are all sorts of connections between them, and all sorts of fundamental phenomena that are shared between systems based on them.
So what might a “quantum neural blockchain AI” (“QNBAI”) be like?
Let’s look at the pieces again. A single blockchain node is a bit like a single brain, with a definite memory. But in a sense the whole blockchain network becomes robust through all the interactions between different blockchain nodes. It’s a little like how human society and human knowledge develop.
Let’s say we’ve got a “raw AI” that can do all sorts of computation. Well, the big issue is whether we can find a way to align what it can do with things that we humans think we want to do. And to make that alignment, we essentially have to communicate with the AI at a level of abstraction that transcends the details of how it works: in effect, we have to have some symbolic language that we both understand, and that for example AI can translate into the details of how it operates.
Inside the AI it may end up using all kinds of “concepts” (say to distinguish one class of images from another). But the question is whether those concepts are ones that we humans in a sense “culturally understand”. In other words, are those concepts (and, for example, the words for them) ones that there’s a whole widely understood story about?
In a sense, concepts that we humans find useful for communication are ones that have been used in all sorts of interactions between different humans. The concepts become robust by being “knitted into” the thought patterns of many interacting brains, a bit like the data put on a blockchain becomes a robust part of “collective blockchain memory” through the interactions between blockchain nodes.
OK, so there’s something strange here. At first it seemed like QNBAIs would have to be something completely exotic and unfamiliar (and perhaps impossible). But somehow as we go over their features they start to seem awfully familiar—and actually awfully like us.
Yup, according to the physics, we know we are “quantum”. Neural nets capture many core features of how our brains seem to work. Blockchain—at least as a general concept—is somehow related to individual and societal memory. And AI, well, AI in effect tries to capture what’s aligned with human goals and intelligence in the computational universe—which is also what we’re doing.
OK, so what’s the closest thing we know to a QNBAI? Well, it’s probably all of us!
Maybe that sounds crazy. I mean, why should a string of buzzwords from 2018 connect like that? Well, at some level perhaps there’s an obvious answer: we tend to create and study things that are relevant to us, and somehow revolve around us. And, more than that, the buzzwords of today are things that are somehow just within the scope that we can now think about with the concepts we’ve currently developed–and that are somehow connected through them.
I must say that when I chose these buzzwords I had no idea they’d connect at all. But as I’ve tried to work through things in writing this, it’s been remarkable how much connection I’ve found. And, yes, in a fittingly bizarre end to a somewhat bizarre journey, it does seem to be the case that a string plucked from today’s buzzword universe has landed very close to home. And maybe in the end—at least in some sense—we are our buzzwords!
]]>Starting now, in celebration of its 15th anniversary, A New Kind of Science will be freely available in its entirety, with highresolution images, on the web or for download.
It’s now 15 years since I published my book A New Kind of Science—more than 25 since I started writing it, and more than 35 since I started working towards it. But with every passing year I feel I understand more about what the book is really about—and why it’s important. I wrote the book, as its title suggests, to contribute to the progress of science. But as the years have gone by, I’ve realized that the core of what’s in the book actually goes far beyond science—into many areas that will be increasingly important in defining our whole future.
So, viewed from a distance of 15 years, what is the book really about? At its core, it’s about something profoundly abstract: the theory of all possible theories, or the universe of all possible universes. But for me one of the achievements of the book is the realization that one can explore such fundamental things concretely—by doing actual experiments in the computational universe of possible programs. And in the end the book is full of what might at first seem like quite alien pictures made just by running very simple such programs.
Back in 1980, when I made my living as a theoretical physicist, if you’d asked me what I thought simple programs would do, I expect I would have said “not much”. I had been very interested in the kind of complexity one sees in nature, but I thought—like a typical reductionistic scientist—that the key to understanding it must lie in figuring out detailed features of the underlying component parts.
In retrospect I consider it incredibly lucky that all those years ago I happened to have the right interests and the right skills to actually try what is in a sense the most basic experiment in the computational universe: to systematically take a sequence of the simplest possible programs, and run them.
I could tell as soon as I did this that there were interesting things going on, but it took a couple more years before I began to really appreciate the force of what I’d seen. For me it all started with one picture:
Or, in modern form:
I call it rule 30. It’s my alltime favorite discovery, and today I carry it around everywhere on my business cards. What is it? It’s one of the simplest programs one can imagine. It operates on rows of black and white cells, starting from a single black cell, and then repeatedly applies the rules at the bottom. And the crucial point is that even though those rules are by any measure extremely simple, the pattern that emerges is not.
It’s a crucial—and utterly unexpected—feature of the computational universe: that even among the very simplest programs, it’s easy to get immensely complex behavior. It took me a solid decade to understand just how broad this phenomenon is. It doesn’t just happen in programs (“cellular automata”) like rule 30. It basically shows up whenever you start enumerating possible rules or possible programs whose behavior isn’t obviously trivial.
Similar phenomena had actually been seen for centuries in things like the digits of pi and the distribution of primes—but they were basically just viewed as curiosities, and not as signs of something profoundly important. It’s been nearly 35 years since I first saw what happens in rule 30, and with every passing year I feel I come to understand more clearly and deeply what its significance is.
Four centuries ago it was the discovery of the moons of Jupiter and their regularities that sowed the seeds for modern exact science, and for the modern scientific approach to thinking. Could my little rule 30 now be the seed for another such intellectual revolution, and a new way of thinking about everything?
In some ways I might personally prefer not to take responsibility for shepherding such ideas (“paradigm shifts” are hard and thankless work). And certainly for years I have just quietly used such ideas to develop technology and my own thinking. But as computation and AI become increasingly central to our world, I think it’s important that the implications of what’s out there in the computational universe be more widely understood.
Here’s the way I see it today. From observing the moons of Jupiter we came away with the idea that—if looked at right—the universe is an ordered and regular place, that we can ultimately understand. But now, in exploring the computational universe, we quickly come upon things like rule 30 where even the simplest rules seem to lead to irreducibly complex behavior.
One of the big ideas of A New Kind of Science is what I call the Principle of Computational Equivalence. The first step is to think of every process—whether it’s happening with black and white squares, or in physics, or inside our brains—as a computation that somehow transforms input to output. What the Principle of Computational Equivalence says is that above an extremely low threshold, all processes correspond to computations of equivalent sophistication.
It might not be true. It might be that something like rule 30 corresponds to a fundamentally simpler computation than the fluid dynamics of a hurricane, or the processes in my brain as I write this. But what the Principle of Computational Equivalence says is that in fact all these things are computationally equivalent.
It’s a very important statement, with many deep implications. For one thing, it implies what I call computational irreducibility. If something like rule 30 is doing a computation just as sophisticated as our brains or our mathematics, then there’s no way we can “outrun” it: to figure out what it will do, we have to do an irreducible amount of computation, effectively tracing each of its steps.
The mathematical tradition in exact science has emphasized the idea of predicting the behavior of systems by doing things like solving mathematical equations. But what computational irreducibility implies is that out in the computational universe that often won’t work, and instead the only way forward is just to explicitly run a computation to simulate the behavior of the system.
One of the things I did in A New Kind of Science was to show how simple programs can serve as models for the essential features of all sorts of physical, biological and other systems. Back when the book appeared, some people were skeptical about this. And indeed at that time there was a 300year unbroken tradition that serious models in science should be based on mathematical equations.
But in the past 15 years something remarkable has happened. For now, when new models are created—whether of animal patterns or web browsing behavior—they are overwhelmingly more often based on programs than on mathematical equations.
Year by year, it’s been a slow, almost silent, process. But by this point, it’s a dramatic shift. Three centuries ago pure philosophical reasoning was supplanted by mathematical equations. Now in these few short years, equations have been largely supplanted by programs. For now, it’s mostly been something practical and pragmatic: the models work better, and are more useful.
But when it comes to understanding the foundations of what’s going on, one’s led not to things like mathematical theorems and calculus, but instead to ideas like the Principle of Computational Equivalence. Traditional mathematicsbased ways of thinking have made concepts like force and momentum ubiquitous in the way we talk about the world. But now as we think in fundamentally computational terms we have to start talking in terms of concepts like undecidability and computational irreducibility.
Will some type of tumor always stop growing in some particular model? It might be undecidable. Is there a way to work out how a weather system will develop? It might be computationally irreducible.
These concepts are pretty important when it comes to understanding not only what can and cannot be modeled, but also what can and cannot be controlled in the world. Computational irreducibility in economics is going to limit what can be globally controlled. Computational irreducibility in biology is going to limit how generally effective therapies can be—and make highly personalized medicine a fundamental necessity.
And through ideas like the Principle of Computational Equivalence we can start to discuss just what it is that allows nature—seemingly so effortlessly—to generate so much that seems so complex to us. Or how even deterministic underlying rules can lead to computationally irreducible behavior that for all practical purposes can seem to show “free will”.
A central lesson of A New Kind of Science is that there’s a lot of incredible richness out there in the computational universe. And one reason that’s important is that it means that there’s a lot of incredible stuff out there for us to “mine” and harness for our purposes.
Want to automatically make an interesting custom piece of art? Just start looking at simple programs and automatically pick out one you like—as in our WolframTones music site from a decade ago. Want to find an optimal algorithm for something? Just search enough programs out there, and you’ll find one.
We’ve normally been used to creating things by building them up, step by step, with human effort—progressively creating architectural plans, or engineering drawings, or lines of code. But the discovery that there’s so much richness so easily accessible in the computational universe suggests a different approach: don’t try building anything; just define what you want, and then search for it in the computational universe.
Sometimes it’s really easy to find. Like let’s say you want to generate apparent randomness. Well, then just enumerate cellular automata (as I did in 1984), and very quickly you come upon rule 30—which turns out to be one of the very best known generators of apparent randomness (look down the center column of cell values, for examples). In other situations you might have to search 100,000 cases (as I did in finding the simplest axiom system for logic, or the simplest universal Turing machine), or you might have to search millions or even trillions of cases. But in the past 25 years, we’ve had incredible success in just discovering algorithms out there in the computational universe—and we rely on many of them in implementing the Wolfram Language.
At some level it’s quite sobering. One finds some tiny program out in the computational universe. One can tell it does what one wants. But when one looks at what it’s doing, one doesn’t have any real idea how it works. Maybe one can analyze some part—and be struck by how “clever” it is. But there just isn’t a way for us to understand the whole thing; it’s not something familiar from our usual patterns of thinking.
Of course, we’ve often had similar experiences before—when we use things from nature. We may notice that some particular substance is a useful drug or a great chemical catalyst, but we may have no idea why. But in doing engineering and in most of our modern efforts to build technology, the great emphasis has instead been on constructing things whose design and operation we can readily understand.
In the past we might have thought that was enough. But what our explorations of the computational universe show is that it’s not: selecting only things whose operation we can readily understand misses most of the immense power and richness that’s out there in the computational universe.
What will the world look like when more of what we have is mined from the computational universe? Today the environment we build for ourselves is dominated by things like simple shapes and repetitive processes. But the more we use what’s out there in the computational universe, the less regular things will look. Sometimes they may look a bit “organic”, or like what we see in nature (since after all, nature follows similar kinds of rules). But sometimes they may look quite random, until perhaps suddenly and incomprehensibly they achieve something we recognize.
For several millennia we as a civilization have been on a path to understand more about what happens in our world—whether by using science to decode nature, or by creating our own environment through technology. But to use more of the richness of the computational universe we must at least to some extent forsake this path.
In the past, we somehow counted on the idea that between our brains and the tools we could create we would always have fundamentally greater computational power than the things around us—and as a result we would always be able to “understand” them. But what the Principle of Computational Equivalence says is that this isn’t true: out in the computational universe there are lots of things just as powerful as our brains or the tools we build. And as soon as we start using those things, we lose the “edge” we thought we had.
Today we still imagine we can identify discrete “bugs” in programs. But most of what’s powerful out there in the computational universe is rife with computational irreducibility—so the only real way to see what it does is just to run it and watch what happens.
We ourselves, as biological systems, are a great example of computation happening at a molecular scale—and we are no doubt rife with computational irreducibility (which is, at some fundamental level, why medicine is hard). I suppose it’s a tradeoff: we could limit our technology to consist only of things whose operation we understand. But then we would miss all that richness that’s out there in the computational universe. And we wouldn’t even be able to match the achievements of our own biology in the technology we create.
There’s a common pattern I’ve noticed with intellectual fields. They go for decades and perhaps centuries with only incremental growth, and then suddenly, usually as a result of a methodological advance, there’s a burst of “hypergrowth” for perhaps 5 years, in which important new results arrive almost every week.
I was fortunate enough that my own very first field—particle physics—was in its period of hypergrowth right when I was involved in the late 1970s. And for myself, the 1990s felt like a kind of personal period of hypergrowth for what became A New Kind of Science—and indeed that’s why I couldn’t pull myself away from it for more than a decade.
But today, the obvious field in hypergrowth is machine learning, or, more specifically, neural nets. It’s funny for me to see this. I actually worked on neural nets back in 1981, before I started on cellular automata, and several years before I found rule 30. But I never managed to get neural nets to do anything very interesting—and actually I found them too messy and complicated for the fundamental questions I was concerned with.
And so I “simplified them”—and wound up with cellular automata. (I was also inspired by things like the Ising model in statistical physics, etc.) At the outset, I thought I might have simplified too far, and that my little cellular automata would never do anything interesting. But then I found things like rule 30. And I’ve been trying to understand its implications ever since.
In building Mathematica and the Wolfram Language, I’d always kept track of neural nets, and occasionally we’d use them in some small way for some algorithm or another. But about 5 years ago I suddenly started hearing amazing things: that somehow the idea of training neural nets to do sophisticated things was actually working. At first I wasn’t sure. But then we started building neural net capabilities in the Wolfram Language, and finally two years ago we released our ImageIdentify.com website—and now we’ve got our whole symbolic neural net system. And, yes, I’m impressed. There are lots of tasks that had traditionally been viewed as the unique domain of humans, but which now we can routinely do by computer.
But what’s actually going on in a neural net? It’s not really to do with the brain; that was just the inspiration (though in reality the brain probably works more or less the same way). A neural net is really a sequence of functions that operate on arrays of numbers, with each function typically taking quite a few inputs from around the array. It’s not so different from a cellular automaton. Except that in a cellular automaton, one’s usually dealing with, say, just 0s and 1s, not arbitrary numbers like 0.735. And instead of taking inputs from all over the place, in a cellular automaton each step takes inputs only from a very welldefined local region.
Now, to be fair, it’s pretty common to study “convolutional neural nets”, in which the patterns of inputs are very regular, just like in a cellular automaton. And it’s becoming clear that having precise (say 32bit) numbers isn’t critical to the operation of neural nets; one can probably make do with just a few bits.
But a big feature of neural nets is that we know how to make them “learn”. In particular, they have enough features from traditional mathematics (like involving continuous numbers) that techniques like calculus can be applied to provide strategies to make them incrementally change their parameters to “fit their behavior” to whatever training examples they’re given.
It’s far from obvious how much computational effort, or how many training examples, will be needed. But the breakthrough of about five years ago was the discovery that for many important practical problems, what’s available with modern GPUs and modern webcollected training sets can be enough.
Pretty much nobody ends up explicitly setting or “engineering” the parameters in a neural net. Instead, what happens is that they’re found automatically. But unlike with simple programs like cellular automata, where one’s typically enumerating all possibilities, in current neural nets there’s an incremental process, essentially based on calculus, that manages to progressively improve the net—a little like the way biological evolution progressively improves the “fitness” of an organism.
It’s plenty remarkable what comes out from training a neural net in this way, and it’s plenty difficult to understand how the neural net does what it does. But in some sense the neural net isn’t venturing too far across the computational universe: it’s always basically keeping the same basic computational structure, and just changing its behavior by changing parameters.
But to me the success of today’s neural nets is a spectacular endorsement of the power of the computational universe, and another validation of the ideas of A New Kind of Science. Because it shows that out in the computational universe, away from the constraints of explicitly building systems whose detailed behavior one can foresee, there are immediately all sorts of rich and useful things to be found.
Is there a way to bring the full power of the computational universe—and the ideas of A New Kind of Science—to the kinds of things one does with neural nets? I suspect so. And in fact, as the details become clear, I wouldn’t be surprised if exploration of the computational universe saw its own period of hypergrowth: a “mining boom” of perhaps unprecedented proportions.
In current work on neural nets, there’s a definite tradeoff one sees. The more what’s going on inside the neural net is like a simple mathematical function with essentially arithmetic parameters, the easier it is to use ideas from calculus to train the network. But the more what’s going is like a discrete program, or like a computation whose whole structure can change, the more difficult it is to train the network.
It’s worth remembering, though, that the networks we’re routinely training now would have looked utterly impractical to train only a few years ago. It’s effectively just all those quadrillions of GPU operations that we can throw at the problem that makes training feasible. And I won’t be surprised if even quite pedestrian (say, local exhaustive search) techniques will fairly soon let one do significant training even in cases where no incremental numerical approach is possible. And perhaps even it will be possible to invent some major generalization of things like calculus that will operate in the full computational universe. (I have some suspicions, based on thinking about generalizing basic notions of geometry to cover things like cellular automaton rule spaces.)
What would this let one do? Likely it would let one find considerably simpler systems that could achieve particular computational goals. And maybe that would bring within reach some qualitatively new level of operations, perhaps beyond what we’re used to being possible with things like brains.
There’s a funny thing that’s going on with modeling these days. As neural nets become more successful, one begins to wonder: why bother to simulate what’s going on inside a system when one can just make a blackbox model of its output using a neural net? Well, if we manage to get machine learning to reach deeper into the computational universe, we won’t have as much of this tradeoff any more—because we’ll be able to learn models of the mechanism as well as the output.
I’m pretty sure that bringing the full computational universe into the purview of machine learning will have spectacular consequences. But it’s worth realizing that computational universality—and the Principle of Computational Equivalence—make it less a matter of principle. Because they imply that even neural nets of the kinds we have now are universal, and are capable of emulating anything any other system can do. (In fact, this universality result was essentially what launched the whole modern idea of neural nets, back in 1943.)
And as a practical matter, the fact that current neural net primitives are being built into hardware and so on will make them a desirable foundation for actual technology systems, though, even if they’re far from optimal. But my guess is that there are tasks where for the foreseeable future access to the full computational universe will be necessary to make them even vaguely practical.
What will it take to make artificial intelligence? As a kid, I was very interested in figuring out how to make a computer know things, and be able to answer questions from what it knew. And when I studied neural nets in 1981, it was partly in the context of trying to understand how to build such a system. As it happens, I had just developed SMP, which was a forerunner of Mathematica (and ultimately the Wolfram Language)—and which was very much based on symbolic pattern matching (“if you see this, transform it to that”). At the time, though, I imagined that artificial intelligence was somehow a “higher level of computation”, and I didn’t know how to achieve it.
I returned to the problem every so often, and kept putting it off. But then when I was working on A New Kind of Science it struck me: if I’m to take the Principle of Computational Equivalence seriously, then there can’t be any fundamentally “higher level of computation”—so AI must be achievable just with the standard ideas of computation that I already know.
And it was this realization that got me started building WolframAlpha. And, yes, what I found is that lots of those very “AIoriented things”, like natural language understanding, could be done just with “ordinary computation”, without any magic new AI invention. Now, to be fair, part of what was happening was that we were using ideas and methods from A New Kind of Science: we weren’t just engineering everything; we were often searching the computational universe for rules and algorithms to use.
So what about “general AI”? Well, I think at this point that with the tools and understanding we have, we’re in a good position to automate essentially anything we can define. But definition is a more difficult and central issue than we might imagine.
The way I see things at this point is that there’s a lot of computation even near at hand in the computational universe. And it’s powerful computation. As powerful as anything that happens in our brains. But we don’t recognize it as “intelligence” unless it’s aligned with our human goals and purposes.
Ever since I was writing A New Kind of Science, I’ve been fond of quoting the aphorism “the weather has a mind of its own”. It sounds so animistic and prescientific. But what the Principle of Computational Equivalence says is that actually, according to the most modern science, it’s true: the fluid dynamics of the weather is the same in its computational sophistication as the electrical processes that go on in our brains.
But is it “intelligent”? When I talk to people about A New Kind of Science, and about AI, I’ll often get asked when I think we’ll achieve “consciousness” in a machine. Life, intelligence, consciousness: they are all concepts that we have a specific example of, here on Earth. But what are they in general? All life on Earth shares RNA and the structure of cell membranes. But surely that’s just because all life we know is part of one connected thread of history; it’s not that such details are fundamental to the very concept of life.
And so it is with intelligence. We have only one example we’re sure of: us humans. (We’re not even sure about animals.) But human intelligence as we experience it is deeply entangled with human civilization, human culture and ultimately also human physiology—even though none of these details are presumably relevant in the abstract definition of intelligence.
We might think about extraterrestrial intelligence. But what the Principle of Computational Equivalence implies is that actually there’s “alien intelligence” all around us. But somehow it’s just not quite aligned with human intelligence. We might look at rule 30, for example, and be able to see that it’s doing sophisticated computation, just like our brains. But somehow it just doesn’t seem to have any “point” to what it’s doing.
We imagine that in doing the things we humans do, we operate with certain goals or purposes. But rule 30, for example, just seems to be doing what it’s doing—just following some definite rule. In the end, though, one realizes we’re not so very different. After all, there are definite laws of nature that govern our brains. So anything we do is at some level just playing out those laws.
Any process can actually be described either in terms of mechanism (“the stone is moving according to Newton’s laws”), or in terms of goals (“the stone is moving so as to minimize potential energy”). The description in terms of mechanism is usually what’s most useful in connecting with science. But the description in terms of goals is usually what’s most useful in connecting with human intelligence.
And this is crucial in thinking about AI. We know we can have computational systems whose operations are as sophisticated as anything. But can we get them to do things that are aligned with human goals and purposes?
In a sense this is what I now view as the key problem of AI: it’s not about achieving underlying computational sophistication, but instead it’s about communicating what we want from this computation.
I’ve spent much of my life as a computer language designer—most importantly creating what is now the Wolfram Language. I’d always seen my role as a language designer being to imagine the possible computations people might want to do, then—like a reductionist scientist—trying to “drill down” to find good primitives from which all these computations could be built up. But somehow from A New Kind of Science, and from thinking about AI, I’ve come to think about it a little differently.
Now what I more see myself as doing is making a bridge between our patterns of human thinking, and what the computational universe is capable of. There are all sorts of amazing things that can in principle be done by computation. But what the language does is to provide a way for us humans to express what we want done, or want to achieve—and then to get this actually executed, as automatically as possible.
Language design has to start from what we know and are familiar with. In the Wolfram Language, we name the builtin primitives with English words, leveraging the meanings that those words have acquired. But the Wolfram Language is not like natural language. It’s something more structured, and more powerful. It’s based on the words and concepts that we’re familiar with through the shared corpus of human knowledge. But it gives us a way to build up arbitrarily sophisticated programs that in effect express arbitrarily complex goals.
Yes, the computational universe is capable of remarkable things. But they’re not necessarily things that we humans can describe or relate to. But in building the Wolfram Language my goal is to do the best I can in capturing everything we humans want—and being able to express it in executable computational terms.
When we look at the computational universe, it’s hard not to be struck by the limitations of what we know how to describe or think about. Modern neural nets provide an interesting example. For the ImageIdentify function of the Wolfram Language we’ve trained a neural net to identify thousands of kinds of things in the world. And to cater to our human purposes, what the network ultimately does is to describe what it sees in terms of concepts that we can name with words—tables, chairs, elephants, etc.
But internally what the network is doing is to identify a series of features of any object in the world. Is it green? Is it round? And so on. And what happens as the neural network is trained is that it identifies features it finds useful for distinguishing different kinds of things in the world. But the point is that almost none of these features are ones to which we happen to have assigned words in human language.
Out in the computational universe it’s possible to find what may be incredibly useful ways to describe things. But they’re alien to us humans. They’re not something we know how to express, based on the corpus of knowledge our civilization has developed.
Now of course new concepts are being added to the corpus of human knowledge all the time. Back a century ago, if someone saw a nested pattern they wouldn’t have any way to describe it. But now we’d just say “it’s a fractal”. But the problem is that in the computational universe there’s an infinite collection of “potentially useful concepts”—with which we can never hope to ultimately keep up.
When I wrote A New Kind of Science I viewed it in no small part as an effort to break away from the use of mathematics—at least as a foundation for science. But one of the things I realized is that the ideas in the book also have a lot of implications for pure mathematics itself.
What is mathematics? Well, it’s a study of certain abstract kinds of systems, based on things like numbers and geometry. In a sense it’s exploring a small corner of the computational universe of all possible abstract systems. But still, plenty has been done in mathematics: indeed, the 3 million or so published theorems of mathematics represent perhaps the largest single coherent intellectual structure that our species has built.
Ever since Euclid, people have at least notionally imagined that mathematics starts from certain axioms (say, a+b=b+a, a+0=a, etc.), then builds up derivations of theorems. Why is math hard? The answer is fundamentally rooted in the phenomenon of computational irreducibility—which here is manifest in the fact that there’s no general way to shortcut the series of steps needed to derive a theorem. In other words, it can be arbitrarily hard to get a result in mathematics. But worse than that—as Gödel’s Theorem showed—there can be mathematical statements where there just aren’t any finite ways to prove or disprove them from the axioms. And in such cases, the statements just have to be considered “undecidable”.
And in a sense what’s remarkable about math is that one can usefully do it at all. Because it could be that most mathematical results one cares about would be undecidable. So why doesn’t that happen?
Well, if one considers arbitrary abstract systems it happens a lot. Take a typical cellular automaton—or a Turing machine—and ask whether it’s true that the system, say, always settles down to periodic behavior regardless of its initial state. Even something as simple as that will often be undecidable.
So why doesn’t this happen in mathematics? Maybe there’s something special about the particular axioms used in mathematics. And certainly if one thinks they’re the ones that uniquely describe science and the world there might be a reason for that. But one of the whole points of the book is that actually there’s a whole computational universe of possible rules that can be useful for doing science and describing the world.
And in fact I don’t think there’s anything abstractly special about the particular axioms that have traditionally been used in mathematics: I think they’re just accidents of history.
What about the theorems that people investigate in mathematics? Again, I think there’s a strong historical character to them. For all but the most trivial areas of mathematics, there’s a whole sea of undecidability out there. But somehow mathematics picks the islands where theorems can actually be proved—often particularly priding itself on places close to the sea of undecidability where the proof can only be done with great effort.
I’ve been interested in the whole network of published theorems in mathematics (it’s a thing to curate, like wars in history, or properties of chemicals). And one of the things I’m curious about is whether something there’s an inexorable sequence to the mathematics that’s done, or whether, in a sense, random parts are being picked.
And here, I think, there’s a considerable analogy to the kind of thing we were discussing before with language. What is a proof? Basically it’s a way of explaining to someone why something is true. I’ve made all sorts of automated proofs in which there are hundreds of steps, each perfectly verifiable by computer. But—like the innards of a neural net—what’s going on looks alien and not understandable by a human.
For a human to understand, there have to be familiar “conceptual waypoints”. It’s pretty much like with words in languages. If some particular part of a proof has a name (“Smith’s Theorem”), and has a known meaning, then it’s useful to us. But if it’s just a lump of undifferentiated computation, it won’t be meaningful to us.
In pretty much any axiom system, there’s an infinite set of possible theorems. But which ones are “interesting”? That’s really a human question. And basically it’s going to end up being ones with “stories”. In the book I show that for the simple case of basic logic, the theorems that have historically been considered interesting enough to be given names happen to be precisely the ones that are in some sense minimal.
But my guess is that for richer axiom systems pretty much anything that’s going to be considered “interesting” is going to have to be reached from things that are already considered interesting. It’s like building up words or concepts: you don’t get to introduce new ones unless you can directly relate them to existing ones.
In recent years I’ve wondered quite a bit about how inexorable or not progress is in a field like mathematics. Is there just one historical path that can be taken, say from arithmetic to algebra to the higher reaches of modern mathematics? Or are there an infinite diversity of possible paths, with completely different histories for mathematics?
The answer is going to depend on—in a sense—the “structure of metamathematical space”: just what is the network of true theorems that avoid the sea of undecidability? Maybe it’ll be different for different fields of mathematics, and some will be more “inexorable” (so it feels like the math is being “discovered”) than others (where it seems more like the math is arbitrary, and “invented”).
But to me one of the most interesting things is how close—when viewed in these kinds of terms—questions about the nature and character of mathematics end up being to questions about the nature and character of intelligence and AI. And it’s this kind of commonality that makes me realize just how powerful and general the ideas in A New Kind of Science actually are.
There are some areas of science—like physics and astronomy—where the traditional mathematical approach has done quite well. But there are others—like biology, social science and linguistics—where it’s had a lot less to say. And one of the things I’ve long believed is that what’s needed to make progress in these areas is to generalize the kinds of models one’s using, to consider a broader range of what’s out there in the computational universe.
And indeed in the past 15 or so years there’s been increasing success in doing this. And there are lots of biological and social systems, for example, where models have now been constructed using simple programs.
But unlike with mathematical models which can potentially be “solved”, these computational models often show computational irreducibility, and are typically used by doing explicit simulations. This can be perfectly successful for making particular predictions, or for applying the models in technology. But a bit like for the automated proofs of mathematical theorems one might still ask, “is this really science?”.
Yes, one can simulate what a system does, but does one “understand” it? Well, the problem is that computational irreducibility implies that in some fundamental sense one can’t always “understand” things. There might be no useful “story” that can be told; there may be no “conceptual waypoints”—only lots of detailed computation.
Imagine that one’s trying to make a science of how the brain understands language—one of the big goals of linguistics. Well, perhaps we’ll get an adequate model of the precise rules which determine the firing of neurons or some other lowlevel representation of the brain. And then we look at the patterns generated in understanding some whole collection of sentences.
Well, what if those patterns look like the behavior of rule 30? Or, closer at hand, the innards of some recurrent neural network? Can we “tell a story” about what’s happening? To do so would basically require that we create some kind of higherlevel symbolic representation: something where we effectively have words for core elements of what’s going on.
But computational irreducibility implies that there may ultimately be no way to create such a thing. Yes, it will always be possible to find patches of computational reducibility, where some things can be said. But there won’t be a complete story that can be told. And one might say there won’t be a useful reductionistic piece of science to be done. But that’s just one of the things that happens when one’s dealing with (as the title says) a new kind of science.
People have gotten very worried about AI in recent years. They wonder what’s going to happen when AIs “get much smarter” than us humans. Well, the Principle of Computational Equivalence has one piece of good news: at some fundamental level, AIs will never be “smarter”—they’ll just be able to do computations that are ultimately equivalent to what our brains do, or, for that matter, what all sorts of simple programs do.
As a practical matter, of course, AIs will be able to process larger amounts of data more quickly than actual brains. And no doubt we’ll choose to have them run many aspects of the world for us—from medical devices, to central banks to transportation systems, and much more.
So then it’s important to figure how we’ll tell them what to do. As soon as we’re making serious use of what’s out there in the computational universe, we’re not going to be able to give a linebyline description of what the AIs are going to do. Rather, we’re going to have to define goals for the AIs, then let them figure out how best to achieve those goals.
In a sense we’ve already been doing something like this for years in the Wolfram Language. There’s some highlevel function that describes something you want to do (“lay out a graph”, “classify data”, etc.). Then it’s up to the language to automatically figure out the best way to do it.
And in the end the real challenge is to find a way to describe goals. Yes, you want to search for cellular automata that will make a “nice carpet pattern”, or a “good edge detector”. But what exactly do those things mean? What you need is a language that a human can use to say as precisely as possible what they mean.
It’s really the same problem as I’ve been talking about a lot here. One has to have a way for humans to be able to talk about things they care about. There’s infinite detail out there in the computational universe. But through our civilization and our shared cultural history we’ve come to identify certain concepts that are important to us. And when we describe our goals, it’s in terms of these concepts.
Three hundred years ago people like Leibniz were interested in finding a precise symbolic way to represent the content of human thoughts and human discourse. He was far too early. But now I think we’re finally in a position to actually make this work. In fact, we’ve already gotten a long way with the Wolfram Language in being able to describe real things in the world. And I’m hoping it’ll be possible to construct a fairly complete “symbolic discourse language” that lets us talk about the things we care about.
Right now we write legal contracts in “legalese” as a way to make them slightly more precise than ordinary natural language. But with a symbolic discourse language we’ll be able to write true “smart contracts” that describe in highlevel terms what we want to have happen—and then machines will automatically be able to verify or execute the contract.
But what about the AIs? Well, we need to tell them what we generally want them to do. We need to have a contract with them. Or maybe we need to have a constitution for them. And it’ll be written in some kind of symbolic discourse language, that both allows us humans to express what we want, and is executable by the AIs.
There’s lots to say about what should be in an AI Constitution, and how the construction of such things might map onto the political and cultural landscape of the world. But one of the obvious questions is: can the constitution be simple, like Asimov’s Laws of Robotics?
And here what we know from A New Kind of Science tells us the answer: it can’t be. In a sense the constitution is an attempt to sculpt what can happen in the world and what can’t. But computational irreducibility says that there will be an unbounded collection of cases to consider.
For me it’s interesting to see how theoretical ideas like computational irreducibility end up impinging on these very practical—and central—societal issues. Yes, it all started with questions about things like the theory of all possible theories. But in the end it turns into issues that everyone in society is going to end up being concerned about.
Will we reach the end of science? Will we—or our AIs—eventually invent everything there is to be invented?
For mathematics, it’s easy to see that there’s an infinite number of possible theorems one can construct. For science, there’s an infinite number of possible detailed questions to ask. And there’s also an infinite array of possible inventions one can construct.
But the real question is: will there always be interesting new things out there?
Well, computational irreducibility says there will always be new things that need an irreducible amount of computational work to reach from what’s already there. So in a sense there’ll always be “surprises”, that aren’t immediately evident from what’s come before.
But will it just be like an endless array of different weirdly shaped rocks? Or will there be fundamental new features that appear, that we humans consider interesting?
It’s back to the very same issue we’ve encountered several times before: for us humans to find things “interesting” we have to have a conceptual framework that we can use to think about them. Yes, we can identify a “persistent structure” in a cellular automaton. Then maybe we can start talking about “collisions between structures”. But when we just see a whole mess of stuff going on, it’s not going to be “interesting” to us unless we have some higherlevel symbolic way to talk about it.
In a sense, then, the rate of “interesting discovery” isn’t going to be limited by our ability to go out into the computational universe and find things. Instead, it’s going to be limited by our ability as humans to build a conceptual framework for what we’re finding.
It’s a bit like what happened in the whole development of what became A New Kind of Science. People had seen related phenomena for centuries if not millennia (distribution of primes, digits of pi, etc.). But without a conceptual framework they just didn’t seem “interesting”, and nothing was built around them. And indeed as I understand more about what’s out there in the computational universe—and even about things I saw long ago there—I gradually build up a conceptual framework that lets me go further.
By the way, it’s worth realizing that inventions work a little differently from discoveries. One can see something new happen in the computational universe, and that might be a discovery. But an invention is about figuring out how something can be achieved in the computational universe.
And—like in patent law—it isn’t really an invention if you just say “look, this does that”. You have to somehow understand a purpose that it’s achieving.
In the past, the focus of the process of invention has tended to be on actually getting something to work (“find the lightbulb filament that works”, etc.). But in the computational universe, the focus shifts to the question of what you want the invention to do. Because once you’ve described the goal, finding a way to achieve it is something that can be automated.
That’s not to say that it will always be easy. In fact, computational irreducibility implies that it can be arbitrarily difficult. Let’s say you know the precise rules by which some chemicals can interact. Can you find a chemical synthesis pathway that will let you get to some particular chemical structure? There may be a way, but computational irreducibility implies that there may be no way to find out how long the pathway may be. And if you haven’t found a pathway you may never be sure if it’s because there isn’t one, or just because you didn’t reach it yet.
If one thinks about reaching the edge of science, one cannot help but wonder about the fundamental theory of physics. Given everything we’ve seen in the computational universe, is it conceivable that our physical universe could just correspond to one of those programs out there in the computational universe?
Of course, we won’t really know until or unless we find it. But in the years since A New Kind of Science appeared, I’ve become ever more optimistic about the possibilities.
Needless to say, it would be a big change for physics. Today there are basically two major frameworks for thinking about fundamental physics: general relativity and quantum field theory. General relativity is a bit more than 100 years old; quantum field theory maybe 90. And both have achieved spectacular things. But neither has succeeded in delivering us a complete fundamental theory of physics. And if nothing else, I think after all this time, it’s worth trying something new.
But there’s another thing: from actually exploring the computational universe, we have a huge amount of new intuition about what’s possible, even in very simple models. We might have thought that the kind of richness we know exists in physics would require some very elaborate underlying model. But what’s become clear is that that kind of richness can perfectly well emerge even from a very simple underlying model.
What might the underlying model be like? I’m not going to discuss this in great detail here, but suffice it to say that I think the most important thing about the model is that it should have as little as possible built in. We shouldn’t have the hubris to think we know how the universe is constructed; we should just take a general type of model that’s as unstructured as possible, and do what we typically do in the computational universe: just search for a program that does what we want.
My favorite formulation for a model that’s as unstructured as possible is a network: just a collection of nodes with connections between them. It’s perfectly possible to formulate such a model as an algebraiclike structure, and probably many other kinds of things. But we can think of it as a network. And in the way I’ve imagined setting it up, it’s a network that’s somehow “underneath” space and time: every aspect of space and time as we know it must emerge from the actual behavior of the network.
Over the past decade or so there’s been increasing interest in things like loop quantum gravity and spin networks. They’re related to what I’ve been doing in the same way that they also involve networks. And maybe there’s some deeper relationship. But in their usual formulation, they’re much more mathematically elaborate.
From the point of view of the traditional methods of physics, this might seem like a good idea. But with the intuition we have from studying the computational universe—and using it for science and technology—it seems completely unnecessary. Yes, we don’t yet know the fundamental theory of physics. But it seems sensible to start with the simplest hypothesis. And that’s definitely something like a simple network of the kind I’ve studied.
At the outset, it’ll look pretty alien to people (including myself) trained in traditional theoretical physics. But some of what emerges isn’t so alien. A big result I found nearly 20 years ago (that still hasn’t been widely understood) is that when you look at a large enough network of the kind I studied you can show that its averaged behavior follows Einstein’s equations for gravity. In other words, without putting any fancy physics into the underlying model, it ends up automatically emerging. I think it’s pretty exciting.
People ask a lot about quantum mechanics. Yes, my underlying model doesn’t build in quantum mechanics (just as it doesn’t build in general relativity). Now, it’s a little difficult to pin down exactly what the essence of “being quantum mechanical” actually is. But there are some very suggestive signs that my simple networks actually end up showing what amounts to quantum behavior—just like in the physics we know.
OK, so how should one set about actually finding the fundamental theory of physics if it’s out there in the computational universe of possible programs? Well, the obvious thing is to just start searching for it, starting with the simplest programs.
I’ve been doing this—more sporadically than I would like—for the past 15 years or so. And my main discovery so far is that it’s actually quite easy to find programs that aren’t obviously not our universe. There are plenty of programs where space or time are obviously completely different from the way they are in our universe, or there’s some other pathology. But it turns out it’s not so difficult to find candidate universes that aren’t obviously not our universe.
But we’re immediately bitten by computational irreducibility. We can simulate the candidate universe for billions of steps. But we don’t know what it’s going to do—and whether it’s going to grow up to be like our universe, or completely different.
It’s pretty unlikely that in looking at that tiny fragment of the very beginning of a universe we’re going to ever be able to see anything familiar, like a photon. And it’s not at all obvious that we’ll be able to construct any kind of descriptive theory, or effective physics. But in a sense the problem is bizarrely similar to the one we have even in systems like neural networks: there’s computation going on there, but can we identify “conceptual waypoints” from which we can build up a theory that we might understand?
It’s not at all clear our universe has to be understandable at that level, and it’s quite possible that for a very long time we’ll be left in the strange situation of thinking we might have “found our universe” out in the computational universe, but not being sure.
Of course, we might be lucky, and it might be possible to deduce an effective physics, and see that some little program that we found ends up reproducing our whole universe. It would be a remarkable moment for science. But it would immediately raise a host of new questions—like why this universe, and not another?
Right now us humans exist as biological systems. But in the future it’s certainly going to be technologically possible to reproduce all the processes in our brains in some purely digital—computational—form. So insofar as those processes represent “us”, we’re going to be able to be “virtualized” on pretty much any computational substrate. And in this case we might imagine that the whole future of a civilization could wind up in effect as a “box of a trillion souls”.
Inside that box there would be all kinds of computations going on, representing the thoughts and experiences of all those disembodied souls. Those computations would reflect the rich history of our civilization, and all the things that have happened to us. But at some level they wouldn’t be anything special.
It’s perhaps a bit disappointing, but the Principle of Computational Equivalence tells us that ultimately these computations will be no more sophisticated than the ones that go on in all sorts of other systems—even ones with simple rules, and no elaborate history of civilization. Yes, the details will reflect all that history. But in a sense without knowing what to look for—or what to care about—one won’t be able to tell that there’s anything special about it.
OK, but what about for the “souls” themselves? Will one be able to understand their behavior by seeing that they achieve certain purposes? Well, in our current biological existence, we have all sorts of constraints and features that give us goals and purposes. But in a virtualized “uploaded” form, most of these just go away.
I’ve thought quite a bit about how “human” purposes might evolve in such a situation, recognizing, of course, that in virtualized form there’s little difference between human and AI. The disappointing vision is that perhaps the future of our civilization consists in disembodied souls in effect “playing videogames” for the rest of eternity.
But what I’ve slowly realized is that it’s actually quite unrealistic to project our view of goals and purposes from our experience today into that future situation. Imagine talking to someone from a thousand years ago and trying to explain that people in the future would be walking on treadmills every day, or continually sending photographs to their friends. The point is that such activities don’t make sense until the cultural framework around them has developed.
It’s the same story yet again as with trying to characterize what’s interesting or what’s explainable. It relies on the development of a whole network of conceptual waypoints.
Can we imagine what the mathematics of 100 years from now will be like? It depends on concepts we don’t yet know. So similarly if we try to imagine human motivation in the future, it’s going to rely on concepts we don’t know. Our best description from today’s viewpoint might be that those disembodied souls are just “playing videogames”. But to them there might be a whole subtle motivation structure that they could only explain by rewinding all sorts of steps in history and cultural development.
By the way, if we know the fundamental theory of physics then in a sense we can make the virtualization complete, at least in principle: we can just run a simulation of the universe for those disembodied souls. Of course, if that’s what’s happening, then there’s no particular reason it has to be a simulation of our particular universe. It could as well be any universe from out in the computational universe.
Now, as I’ve mentioned, even in any given universe one will never in a sense run out of things to do, or discover. But I suppose I myself at least find it amusing to imagine that at some point those disembodied souls might get bored with just being in a simulated version of our physical universe—and might decide it’s more fun (whatever that means to them) to go out and explore the broader computational universe. Which would mean that in a sense the future of humanity would be an infinite voyage of discovery in the context of none other than A New Kind of Science!
Long before we have to think about disembodied human souls, we’ll have to confront the issue of what humans should be doing in a world where more and more can be done automatically by AIs. Now in a sense this issue is nothing new: it’s just an extension of the longrunning story of technology and automation. But somehow this time it feels different.
And I think the reason is in a sense just that there’s so much out there in the computational universe, that’s so easy to get to. Yes, we can build a machine that automates some particular task. We can even have a generalpurpose computer that can be programmed to do a full range of different tasks. But even though these kinds of automation extend what we can do, it still feels like there’s effort that we have to put into them.
But the picture now is different—because in effect what we’re saying is that if we can just define the goal we want to achieve, then everything else will be automatic. All sorts of computation, and, yes, “thinking”, may have to be done, but the idea is that it’s just going to happen, without human effort.
At first, something seems wrong. How could we get all that benefit, without putting in more effort? It’s a bit like asking how nature could manage to make all the complexity it does—even though when we build artifacts, even with great effort, they end up far less complex. The answer, I think, is it’s mining the computational universe. And it’s exactly the same thing for us: by mining the computational universe, we can achieve essentially an unbounded level of automation.
If we look at the important resources in today’s world, many of them still depend on actual materials. And often these materials are literally mined from the Earth. Of course, there are accidents of geography and geology that determine by whom and where that mining can be done. And in the end there’s a limit (if often very large) to the amount of material that’ll ever be available.
But when it comes to the computational universe, there’s in a sense an inexhaustible supply of material—and it’s accessible to anyone. Yes, there are technical issues about how to “do the mining”, and there’s a whole stack of technology associated with doing it well. But the ultimate resource of the computational universe is a global and infinite one. There’s no scarcity and no reason to be “expensive”. One just has to understand that it’s there, and take advantage of it.
Probably the greatest intellectual shift of the past century has been the one towards the computational way of thinking about things. I’ve often said that if one picks almost any field “X”, from archaeology to zoology, then by now there either is, or soon will be, a field called “computational X”—and it’s going to be the future of the field.
I myself have been deeply involved in trying to enable such computational fields, in particular through the development of the Wolfram Language. But I’ve also been interested in what is essentially the meta problem: how should one teach abstract computational thinking, for example to kids? The Wolfram Language is certainly important as a practical tool. But what about the conceptual, theoretical foundations?
Well, that’s where A New Kind of Science comes in. Because at its core it’s discussing the pure abstract phenomenon of computation, independent of its applications to particular fields or tasks. It’s a bit like with elementary mathematics: there are things to teach and understand just to introduce the ideas of mathematical thinking, independent of their specific applications. And so it is too with the core of A New Kind of Science. There are things to learn about the computational universe that give intuition and introduce patterns of computational thinking—quite independent of detailed applications.
One can think of it as a kind of “pre computer science” , or “pre computational X”. Before one gets into discussing the specifics of particular computational processes, one can just study the simple but pure things one finds in the computational universe.
And, yes, even before kids learn to do arithmetic, it’s perfectly possible for them to fill out something like a cellular automaton coloring book—or to execute for themselves or on a computer a whole range of different simple programs. What does it teach? Well, it certainly teaches the idea that there can be definite rules or algorithms for things—and that if one follows them one can create useful and interesting results. And, yes, it helps that systems like cellular automata make obvious visual patterns, that for example one can even find in nature (say on mollusc shells).
As the world becomes more computational—and more things are done by AIs and by mining the computational universe—there’s going to an extremely high value not only in understanding computational thinking, but also in having the kind of intuition that develops from exploring the computational universe and that is, in a sense, the foundation for A New Kind of Science.
My goal over the decade that I spent writing A New Kind of Science was, as much as possible, to answer all the first round of “obvious questions” about the computational universe. And looking back 15 years later I think that worked out pretty well. Indeed, today, when I wonder about something to do with the computational universe, I find it’s incredibly likely that somewhere in the main text or notes of the book I already said something about it.
But one of the biggest things that’s changed over the past 15 years is that I’ve gradually begun to understand more of the implications of what the book describes. There are lots of specific ideas and discoveries in the book. But in the longer term I think what’s most significant is how they serve as foundations, both practical and conceptual, for a whole range of new things that one can now understand and explore.
But even in terms of the basic science of the computational universe, there are certainly specific results one would still like to get. For example, it would be great to get more evidence for or against the Principle of Computational Equivalence, and its domain of applicability.
Like most general principles in science, the whole epistemological status of the Principles of Computational Equivalence is somewhat complicated. Is it like a mathematical theorem that can be proved? Is it like a law of nature that might (or might not) be true about the universe? Or is it like a definition, say of the very concept of computation? Well, much like, say, the Second Law of Thermodynamics or Evolution by Natural Selection, it’s a combination of these.
But one thing that’s significant is that it’s possible to get concrete evidence for (or against) the Principle of Computational Equivalence. The principle says that even systems with very simple rules should be capable of arbitrarily sophisticated computation—so that in particular they should be able to act as universal computers.
And indeed one of the results of the book is that this is true for one of the simplest possible cellular automata (rule 110). Five years after the book was published I decided to put up a prize for evidence about another case: the simplest conceivably universal Turing machine. And I was very pleased that in just a few months the prize was won, the Turing machine was proved universal, and there was another piece of evidence for the Principle of Computational Equivalence.
There’s a lot to do in developing the applications of A New Kind of Science. There are models to be made of all sorts of systems. There’s technology to be found. Art to be created. There’s also a lot to do in understanding the implications.
But it’s important not to forget the pure investigation of the computational universe. In the analogy of mathematics, there are applications to be pursued. But there’s also a “pure mathematics” that’s worth pursuing in its own right. And so it is with the computational universe: there’s a huge amount to explore just at an abstract level. And indeed (as the title of the book implies) there’s enough to define a whole new kind of science: a pure science of the computational universe. And it’s the opening of that new kind of science that I think is the core achievement of A New Kind of Science—and the one of which I am most proud.
For the 10th anniversary of A New Kind of Science, I wrote three posts:
The complete highresolution A New Kind of Science is now available on the web. There are also a limited number of print copies of the book still available (all individually coded!).
]]>(An Elementary Introduction to the Wolfram Language is available in print, as an ebook, and free on the web—as well as in Wolfram Programming Lab in the Wolfram Open Cloud. There’s also now a free online handson course based on it.)
A year ago I published a book entitled An Elementary Introduction to the Wolfram Language—as part of my effort to teach computational thinking to the next generation. I just published the second edition of the book—with (among other things) a significantly extended section on modern machine learning.
I originally expected my book’s readers would be high schoolers and up. But it’s actually also found a significant audience among middle schoolers (11 to 14yearolds). So the question now is: can one teach the core concepts of modern machine learning even to middle schoolers? Well, the interesting thing is that—thanks to the whole technology stack we’ve now got in the Wolfram Language—the answer seems to be “yes”!
Here’s what I did in the book:
After this main text, the book has Exercises, Q&A and Tech Notes.
What was my thinking behind this machine learning section? Well, first, it has to fit into the flow of the book—using only concepts that have already been introduced, and, when possible, reinforcing them. So it can talk about images, and realworld data, and graphs, and text—but not functional programming or external data resources.
With modern machine learning, it’s easy to show “wow” examples—like our imageidentify.com website from 2015 (based on the Wolfram Language ImageIdentify function). But my goal in the book was also to communicate a bit of the background and intuition of how machine learning works, and where it can be used.
I start off by explaining that machine learning is different from traditional “programming”, because it’s based on learning from examples, rather than on explicitly specifying computational steps. The first thing I discuss is something that doesn’t really need all the fanciness of modern neuralnet machine learning: it’s recognizing what languages text fragments are from:
Kids (and other people) can sort of imagine (or discuss in a classroom) how something like this might work—looking words up in dictionaries, etc. And I think it’s useful to give a first example that doesn’t seem like “pure magic”. (In reality, LanguageIdentify uses a combination of traditional lookup, and modern machine learning techniques.)
But then I give a much more “magic” example—of ImageIdentify:
I don’t immediately try to explain how it works, but instead go on to something different: sentiment analysis. Kids have lots of fun trying out sentiment analysis. But the real point here is that it shows the idea of making a “classifier”: there are an infinite number of possible inputs, but only (in this case) 3 possible outputs:
Having seen this, we’re ready to give a little more indication of how something like this works. And what I do is to show the function Classify classifying handwritten digits into 0s and 1s. I’m not saying what’s going on inside, but people can get the idea that Classify is given a bunch of examples, and then it’s using those to classify a particular input as being 0 or 1:
OK, but how does it do this? In reality one’s dealing with ideas about attractors—and inputs that lie in the basins of attraction for particular outputs. But in a first approximation, one can say that inputs that are “nearer to”, say, the 0 examples are taken to be 0s, and inputs that are nearer to the 1 examples are taken to be 1s.
People don’t usually have much difficulty with that explanation—unless they start to think too hard about what “nearest” might really mean in this context. But rather than concentrating on that, what I do in the book is just to talk about the case of numbers, where it’s really easy to see what “nearest” means:
Nearest isn’t the most exciting function to play with: one potentially puts a lot of things in, and then just one “nearest thing” comes out. Still, Nearest is nice because its functionality is pretty easy to understand (and one can have reasonable guesses about algorithms it could use).
Having seen Nearest for numbers, I show Nearest for colors. In the book, I’ve already talked about how colors are represented by redgreenblue triples of numbers, so this isn’t such a stretch—but seeing Nearest operate on colors begins to make it a little more plausible that it could operate on things like images too.
Next I show the case of words. In the book, I’ve already done quite a bit with strings and words. In the main text I don’t talk about the precise definition of “nearness” for words, but again, kids easily get the basic idea. (In a Tech Note, I do talk about EditDistance, another good algorithmic operation that people can think about and try out.)
OK, so how does one get from here to something like ImageIdentify? The approach I used is to talk next about OCR and TextRecognize. This doesn’t seem as “magic” as ImageIdentify (and lots of people know about “OCR’ing documents”), but it’s a good place to get a further idea of what ImageIdentify is doing.
Turning a piece of text into an image, and then back into the same text again, doesn’t seem that impressive or useful. But it gets more interesting if one blurs the text out (and, yes, blurring an image is something I talked about earlier in the book):
Given the blurred image, the question is: can one still recognize the text? At this stage in the book I haven’t talked about /@ (Map) or % (last output) yet, so I have to write the code out a bit more verbosely. But the result is:
And, yes, when the image isn’t too blurred, TextRecognize can recognize the text, but when the text gets too blurred, it stops being able to. I like this example, because it shows something impressive—but not “magic”—happening. And I think it’s useful to show both where machine learningbased functions succeed, and where they fail. By the way, the result here is different from the one in the book—because the text font is different, and those details matter when one’s on the edge of what can be recognized. (If one was doing this in a class, for example, one might try some different fonts and sizes, and discuss why some survive more blurring than others.)
TextRecognize shows how one can effectively do something like ImageIdentify, but with just 26 letterforms (well, actually, TextRecognize handles many more glyphs than that). But now in the book I show ImageIdentify again, blurring like we did with letters:
It’s fun to see what it does, but it’s also helpful. Because it gives a sense of the “attractor” around the “cheetah” concept: stay fairly close and the cheetah can still be recognized; go too far away and it can’t. (A slightly tricky issue is that we’re continually producing new, better neural nets for ImageIdentify—so even between when the book was finished and today there’ve been some new nets—and it so happens they give different results for the notacheetah cases. Presumably the new results are “better”, though it’s not clear what that means, given that we don’t have an official rightanswer “blurred cheetah” category, and who’s to say whether the blurriest image is more like a whortleberry or a person.)
I won’t go through my whole discussion of machine learning from the book here. Suffice it to say that after discussing explicitly trained functions like TextRecognize and ImageIdentify, I start discussing “unsupervised learning”, and things like clustering in feature space. I think our new FeatureSpacePlot is particularly helpful.
It’s fairly clear what it means to arrange colors:
But then one can “do the same thing” with images of letters. (In the book the code is a little longer, because I haven’t talked about /@ yet.)
And what’s nice about this is that—as well as being useful in its own right—it also reinforces the idea of how something like TextRecognize might work by finding the “nearest letter” to whatever input it’s given.
My final example in the section uses photographs. FeatureSpacePlot does a nice job of separating images of different kinds of things—again giving an idea of how ImageIdentify might work:
Obviously in just 10 pages in an elementary book I’m not able to give a complete exposition of modern machine learning. But I was pleased to see how many of the core concepts I was able to touch on.
Of course, the fact that this was possible at all depends critically on our whole Wolfram Language technology stack. Whether it’s the very fact that we have machine learning in the language, or the fact that we can seamlessly work with images or text or whatever, or the whole (28yearold!) Wolfram Notebook system that lets us put all these pieces together—all these pieces are critical to making it possible to bring modern machine learning to people like middle schoolers.
And what I really like is that what one gets to do isn’t toy stuff: one can take what I’m discussing in the book, and immediately apply it in realworld situations. At some level the fact that this works is a reflection of the whole automation emphasis of the Wolfram Language: there’s very sophisticated stuff going on inside, but it’s automated at all levels, so one doesn’t need to be an expert and understand the details to be able to use it—or to get a good intuition about what can work and what can’t.
OK, so how would one go further in teaching machine learning?
One early thing might be to start talking about probabilities. ImageIdentify has various possible choices of identifications, but what probabilities does it assign to them?
This can lead to a useful discussion about prior probabilities, and about issues like trading off specificity for certainty.
But the big thing to talk about is training. (After all, “machine learning trainer” will surely be a big future career for some of today’s middle schoolers…) And the good news is that in the Wolfram Language environment, it’s possible to make training work with only a modest amount of data.
Let’s get some examples of images of characters from Guardians of the Galaxy by searching the web (we’re using an external search API, so you unfortunately can’t do exactly this on the Open Cloud):
Now we can use these images as training material to create a classifier:
And, sure enough, it can identify Rocket:
And, yes, it thinks a real raccoon is him too:
How does it do it? Well, let’s look at FeatureSpacePlot:
Some of this looks good—but some looks confusing. Because it’s arranging some of the images not according to who they’re of, but just according to their background colors. And here we begin to see some of the subtlety of machine learning. The actual classifier we built works only because in the training examples for each character there were ones with different backgrounds—so it can figure out that background isn’t the only distinguishing feature.
Actually, there’s another critical thing as well: Classify isn’t starting from scratch in classifying the images. Because it’s already been pretrained to pick out “good features” that help distinguish realworld images. In fact, it’s actually using everything it learned from the creation of ImageIdentify—and the tens of millions of images it saw in connection with that—to know up front what features it should pay attention to.
It’s a bit weird to see, but internally Classify is characterizing each image as a list of numbers, each associated with a different “feature”:
One can do an extreme version of this in which one insists that each image is reduced to just two numbers—and that’s essentially how FeatureSpacePlot determines where to position an image:
OK, but what’s going on under the hood? Well, it’s complicated. But in the Wolfram Language it’s easy to see—and getting a look at it helps in terms of getting an intuition about how neural nets really work. So, for example, here’s the lowlevel Wolfram Language symbolic representation of the neural net that powers ImageIdentify:
And there’s actually even more: just click and keep drilling down:
And yes, this is hard to understand—certainly for middle schoolers, and even for professionals. But if we take this whole neural net object, and apply it to a picture of a tiger, it’ll do what ImageIdentify does, and tell us it’s a tiger:
But here’s a neat thing, made possible by a whole stack of functionality in the Wolfram Language: we can actually go “inside” the neural net, to get a sense of what’s happening. As an example, let’s just take the first 3 “layers” of the network, apply them to the tiger, and visualize what comes out:
Basically what’s happening is that the network has made lots of copies of the original image, and then processed each of them to pick out a different aspect of the image. (What’s going on actually seems to be remarkably similar to the first few levels of visual processing in the brain.)
What if we go deeper into the network? Here’s what happens at layer 10. The images are more abstracted, and presumably pick out higherlevel features:
Go to level 20, and the network is “thinking about” lots of little images:
But by level 28, it’s beginning to “come to some conclusions”, with only a few of its possible channels of activity “lighting up”:
Finally, by level 31, all that’s left is an array of numbers, with a few peaks visible:
And applying the very last layer of the network (a “softmax” layer) only a couple of peaks are left:
And the highest one is exactly the one that corresponds to the concept of “tiger”:
I’m not imagining that middle schoolers will follow all these details (and no, nobody should be learning neural net layer types like they learn parts of the water cycle). But I think it’s really useful to see “inside” ImageIdentify, and get even a rough sense of how it works. To someone like me it still seems a little like magic that it all comes together as it does. But what’s great is that now with our latest Wolfram Language tools one can easily look inside, and start getting an intuition about what’s going on.
The idea of the Wolfram Language Classify function is to do machine learning at the highest possible level—as automatically as possible, and building on as much pretraining as possible. But if one wants to get a more complete feeling for what machine learning is like, it’s useful to see what happens if one instead tries to just train a neural net from scratch.
There is an immediate practical issue though: to get a neural net, starting from scratch, to actually do anything useful, one typically has to give it a very large amount of training data—which is hard to collect and wrangle. But the good news here is that with the recent release of the Wolfram Data Repository we have a growing collection of readytouse training sets immediately available for use in the Wolfram Language.
Like here’s the classic MNIST handwritten digit training set, with its 60,000 training examples:
One thing one can do with a training set like this is just feed a random sample of it into Classify. And sure enough this gives one a classifier function that’s essentially a simple version of TextRecognize for handwritten digits:
And even with just 1000 training examples, it does pretty well:
And, yes, we can use FeatureSpacePlot to see how the different digits tend to separate in feature space:
But, OK, what if we want to actually train a neural net from scratch, with none of the fancy automation of Classify? Well, first we have to set up a raw neural net. And conveniently, the Wolfram Language has a bunch of classic neural nets built in. Here one’s called LeNet:
It’s much simpler than the ImageIdentify net, but it’s still pretty complicated. But we don’t have to understand what’s inside it to start training it. Instead, in the Wolfram Language, we can just use NetTrain (which, needless to say, automatically applies all the latest GPU tricks and so on):
It’s pretty neat to watch the training happening, and to see the orange line of the neural net’s error rate for fitting the examples keep going down. After about 20 seconds, NetTrain decides it’s gone far enough, and generates a finally trained net—which works pretty well:
If you stop the training early, it won’t do quite so well:
In the professional world of machine learning, there’s a whole art and science of figuring out the best parameters for training. But with what we’ve got now in the Wolfram Language, nothing is stopping a middle schooler from doing their own experiments, visualizing and analyzing the results, and getting as good an intuition as anyone.
OK, so if we want to really get down to the lowest level, we have to talk about what neural nets are made of. I’m not sure how much of this is middleschool stuff—but as soon as one knows about graphs of functions, one can already explain quite a bit. Because, you see, the “layers” in a neural net are actually just functions, that take numbers in, and put numbers out.
Take layer 2 of LeNet. It’s essentially just a simple Ramp function, which we can immediately plot (and, yes, it looks like a ramp):
Neural nets don’t typically just deal with individual numbers, though. They deal with arrays (or “tensors”) of numbers—represented in the Wolfram Language as nested lists. And each layer takes an array of numbers in, and puts an array of numbers out. Here’s a typical single layer:
This particular layer is set up to take 2 numbers as input, and put 4 numbers out:
It might seem to be doing something quite “random”, and actually it is. Because the actual function the layer is implementing is determined by yet another array of numbers, or “weights”—which NetInitialize here just sets randomly. Here’s what it set them to in this particular case:
Why is any of this useful? Well, the crucial point is that what NetTrain does is to progressively tweak the weights in each layer of a neural network to try to get the overall behavior of the net to match the training examples you gave.
There are two immediate issues, though. First, the structure of the network has to be such that it’s possible to get the behavior you want by using some appropriate set of weights. And second, there has to be some way to progressively tweak weights so as to get to appropriate values.
Well, it turns out a single LinearLayer like the one above can’t do anything interesting. Here’s a contour plot of (the first element of) its output, as a function of its two inputs. And as the name LinearLayer might suggest, we always get something flat and linear out:
But here’s the big discovery that makes neural nets useful: if we chain together several layers, it’s easy to get something much more complicated. (And, yes, in the Wolfram Language outputs from one layer get knitted into inputs to the next layer in a nice, automatic way.) Here’s an example with 4 layers—two linear layers and two ramps:
And now when we plot the function, it’s more complicated:
We can actually also look at an even simpler case—of a neural net with 3 layers, and just one number as final output. (For technical reasons, it’s nice to still have 2 inputs, though we’ll always set one of those inputs to the constant value of 1.)
Here’s what this particular network does as a function of its input:
Inside the network, there’s an array of 3 numbers being generated—and it turns out that “3” causes there to be at most 3 (+1) distinct linear parts in the function. Increase the 3 to 100, and things can get more complicated:
Now, the point is that this is in a sense a “random function”, determined by the particular random weights picked by NetInitialize. If we run NetInitialize a bunch of times, we’ll get a bunch of different results:
But the big question is: can we find an instance of this “random function” that’s useful for whatever we’re trying to do? Or, more particularly, can we find a random function that reproduces particular training examples?
Let’s imagine that our training examples give the values of the function at the dots in this plot (by the way, the setup here is more like machine learning in the style of Predict than Classify):
Here’s an instance of our network again:
And here’s a plot of what it initially does over the range of the training examples (and, yes, it’s obviously completely wrong):
Well, let’s just try training our network on our training data using NetTrain:
After about 20 seconds of training on my computer, there’s some vague sign that we’re beginning to reproduce at least some aspects of the original training data. But it’s at best slow going—and it’s not clear what’s eventually going to happen.
It’s a frontier question in neural net research just what structure of net will work best in any particular case (yes, we’re working on this question). But here let’s just try a slightly more complicated network:
Random instances of this network don’t give very different results from our last network (though the presence of that Tanh layer makes the functions a bit smoother):
But now let’s do some training (data was defined above):
And here’s the result—which is surprisingly decent:
In fact, if we compare it to our original training data we see that the training values lie right on the function that the neural net produced:
Here’s what happened during the training process. The neural net effectively “tried out” a bunch of different possibilities, finally settling on the result here:
In what sense is the result “correct”? Well, it fits the training examples, and that’s really all we can ask. Because that’s all the input we gave. How it “interpolates” between the training examples is really its own business. We’d like it to learn to “generalize” from the data it’s given—but it can’t really deduce much about the whole distribution of the data from the few points it’s being given here, so the kind of smooth interpolation it’s doing is as good as anything.
Outside the range of the training values, the neural net does what seem to be fairly random things—but again, there’s no “right answer” so one can’t really fault it:
But the fact that with the arbitrariness and messiness of our original neural net, we were able to successfully train it at all is quite remarkable. Neural nets of pretty much the type we’re talking about here had actually been studied for more than 60 years—but until the modern “deep learning revolution” nobody knew that it was going to be practical to train them for real problems.
But now—particularly with everything we have now in the Wolfram Language—it’s easy for anyone to do this.
Modern machine learning is very new—so even many of the obvious experiments haven’t been tried yet. But with our whole Wolfram Language setup there’s a lot that even middle schoolers can do. For example (and I admit I’m curious about this as I write this post): one can ask just how much something like the tiny neural net we were studying can learn.
Here’s a plot of the lengths of the first 60 Roman numerals:
After a small amount of training, here’s what the network managed to reproduce:
And one might think that maybe this is the best it’ll ever do. But I was curious if it could eventually do better—and so I just let it train for 2 minutes on my computer. And here’s the considerably better result that came out:
I think I can see why this particular thing works the way it does. But seeing it suggests all sorts of new questions to pursue. But to me the most exciting point is the overarching one of just how wide open this territory is—and how easy it is now to explore it.
Yes, there are plenty of technical details—some fundamental, some superficial. But transcending all of these, there’s intuition to be developed. And that’s something that can perfectly well start with the middle schoolers…
]]>Last week I gave a talk (and did a panel discussion) at a conference entitled “Ethics of Artificial Intelligence” held at the NYU Philosophy Department’s Center for Mind, Brain and Consciousness. Here’s the video and a transcript:
Thanks for inviting me here today.
You know, it’s funny to be here. My mother was a philosophy professor in Oxford. And when I was a kid I always said the one thing I’d never do was do or talk about philosophy. But, well, here I am.
Before I really get into AI, I think I should say a little bit about my worldview. I’ve basically spent my life alternating between doing basic science and building technology. I’ve been interested in AI for about as long as I can remember. But as a kid I started out doing physics and cosmology and things. That got me into building technology to automate stuff like math. And that worked so well that I started thinking about, like, how to really know and compute everything about everything. That was in about 1980—and at first I thought I had to build something like a brain, and I was studying neural nets and so on. But I didn’t get too far.
And meanwhile I got interested in an even bigger problem in science: how to make the most general possible theories of things. The dominant idea for 300 years had been to use math and equations. But I wanted to go beyond them. And the big thing I realized was that the way to do that was to think about programs, and the whole computational universe of possible programs.
And that led to my personal Galileolike moment. I just pointed my “computational telescope” at these simplest possible programs, and I saw this amazing one I called rule 30—that just seemed to go on producing complexity forever from essentially nothing.
Well, after I’d seen this, I realized this is actually something that happens all over the computational universe—and all over nature. It’s really the secret that lets nature make all the complicated stuff we see. But it’s something else too: it’s a window into what raw, unfettered computation is like. At least traditionally when we do engineering we’re always building things that are simple enough that we can foresee what they’ll do.
But if we just go out into the computational universe, things can be much wilder. Our company has done a lot of mining out there, finding programs that are useful for different purposes, like rule 30 is for randomness. And modern machine learning is kind of part way from traditional engineering to this kind of freerange mining.
But, OK, what can one say in general about the computational universe? Well, all these programs can be thought of as doing computations. And years ago I came up with what I call the Principle of Computational Equivalence—that says that if behavior isn’t obviously simple, it typically corresponds to a computation that’s maximally sophisticated. There are lots of predictions and implications of this. Like that universal computation should be ubiquitous. As should undecidability. And as should what I call computational irreducibility.
Can you predict what it’s going to do? Well, it’s probably computationally irreducible, which means you can’t figure out what it’s going to do without effectively tracing every step and going through the same computational effort it does. It’s completely deterministic. But to us it’s got what seems like free will—because we can never know what it’s going to do.
Here’s another thing: what’s intelligence? Well, our big unifying principle says that everything—from a tiny program, to our brains, is computationally equivalent. There’s no bright line between intelligence and mere computation. The weather really does have a mind of its own: it’s doing computations just as sophisticated as our brains. To us, though, it’s pretty alien computation. Because it’s not connected to our human goals and experiences. It’s just raw computation that happens to be going on.
So how do we tame computation? We have to mold it to our goals. And the first step there is to describe our goals. And for the past 30 years what I’ve basically been doing is creating a way to do that.
I’ve been building a language—that’s now called the Wolfram Language—that allows us to express what we want to do. It’s a computer language. But it’s not really like other computer languages. Because instead of telling a computer what to do in its terms, it builds in as much knowledge as possible about computation and the world, so that we humans can describe in our terms what we want, and then it’s up to the language to get it done as automatically as possible.
This basic idea has worked really well, and in the form of Mathematica it’s been used to make endless inventions and discoveries over the years. It’s also what’s inside WolframAlpha. Where the idea is to take pure natural language questions, understand them, and use the kind of curated knowledge and algorithms of our civilization to answer them. And, yes, it’s a very classic AIish thing. And of course it’s computed answers to billions and billions of questions from humans, for example inside Siri.
I had an interesting experience recently, figuring out how to use what we’ve built to teach computational thinking to kids. I was writing exercises for a book. At the beginning, it was easy: “make a program to do X”. But later on, it was like “I know what to say in the Wolfram Language, but it’s really hard to express in English”. And of course that’s why I just spent 30 years building the Wolfram Language.
English has maybe 25,000 common words; the Wolfram Language has about 5000 carefully designed builtin constructs—including all the latest machine learning—together with millions of things based on curated data. And the idea is that once one can think about something in the world computationally, it should be as easy as possible to express it in the Wolfram Language. And the cool thing is, it really works. Humans, including kids, can read and write the language. And so can computers. It’s a kind of highlevel bridge between human thinking, in its cultural context, and computation.
OK, so what about AI? Technology has always been about finding things that exist, and then taming them to automate the achievement of particular human goals. And in AI the things we’re taming exist in the computational universe. Now, there’s a lot of raw computation seething around out there—just as there’s a lot going on in nature. But what we’re interested in is computation that somehow relates to human goals.
So what about ethics? Well, maybe we want to constrain the computation, the AI, to only do things we consider ethical. But somehow we have to find a way to describe what we mean by that.
Well, in the human world, one way we do this is with laws. But so how do we connect laws to computations? We may call them “legal codes”, but today laws and contracts are basically written in natural language. There’ve been simple computable contracts in areas like financial derivatives. And now one’s talking about smart contracts around cryptocurrencies.
But what about the vast mass of law? Well, Leibniz—who died 300 years ago next month—was always talking about making a universal language to, as we would say now, express it all in a computable way. He was a few centuries too early, but I think now we’re finally in a position to do this.
I just posted a long blog about all this last week, but let me try to summarize. With the Wolfram Language we’ve managed to express a lot of kinds of things in the world—like the ones people ask Siri about. And I think we’re now within sight of what Leibniz wanted: to have a general symbolic discourse language that represents everything involved in human affairs.
I see it basically as a language design problem. Yes, we can use natural language to get clues, but ultimately we have to build our own symbolic language. It’s actually the same kind of thing I’ve done for decades in the Wolfram Language. Take even a word like “plus”. Well, in the Wolfram Language there’s a function called Plus, but it doesn’t mean the same thing as the word. It’s a very specific version, that has to do with adding things mathematically. And as we design a symbolic discourse language, it’s the same thing. The word “eat” in English can mean lots of things. But we need a concept—that we’ll probably refer to as “eat”—that’s a specific version, that we can compute with.
So let’s say we’ve got a contract written in natural language. One way to get a symbolic version is to use natural language understanding—just like we do for billions of WolframAlpha inputs, asking humans about ambiguities. Another way might be to get machine learning to describe a picture. But the best way is just to write in symbolic form in the first place, and actually I’m guessing that’s what lawyers will be doing before too long.
And of course once you have a contract in symbolic form, you can start to compute about it, automatically seeing if it’s satisfied, simulating different outcomes, automatically aggregating it in bundles, and so on. Ultimately the contract has to get input from the real world. Maybe that input is “born digital”, like data about accessing a computer system, or transferring bitcoin. Often it’ll come from sensors and measurements—and it’ll take machine learning to turn into something symbolic.
Well, if we can express laws in computable form maybe we can start telling AIs how we want them to act. Of course it might be better if we could boil everything down to simple principles, like Asimov’s Laws of Robotics, or utilitarianism or something.
But I don’t think anything like that is going to work. What we’re ultimately trying to do is to find perfect constraints on computation, but computation is something that’s in some sense infinitely wild. The issue already shows up in Gödel’s Theorem. Like let’s say we’re looking at integers and we’re trying to set up axioms to constrain them to just work the way we think they do. Well, what Gödel showed is that no finite set of axioms can ever achieve this. With any set of axioms you choose, there won’t just be the ordinary integers; there’ll also be other wild things.
And the phenomenon of computational irreducibility implies a much more general version of this. Basically, given any set of laws or constraints, there’ll always be “unintended consequences”. This isn’t particularly surprising if one looks at the evolution of human law. But the point is that there’s theoretically no way around it. It’s ubiquitous in the computational universe.
Now I think it’s pretty clear that AI is going to get more and more important in the world—and is going to eventually control much of the infrastructure of human affairs, a bit like governments do now. And like with governments, perhaps the thing to do is to create an AI Constitution that defines what AIs should do.
What should the constitution be like? Well, it’s got to be based on a model of the world, and inevitably an imperfect one, and then it’s got to say what to do in lots of different circumstances. And ultimately what it’s got to do is provide a way of constraining the computations that happen to be ones that align with our goals. But what should those goals be? I don’t think there’s any ultimate right answer. In fact, one can enumerate goals just like one can enumerate programs out in the computational universe. And there’s no abstract way to choose between them.
But for us there’s a way to choose. Because we have particular biology, and we have a particular history of our culture and civilization. It’s taken us a lot of irreducible computation to get here. But now we’re just at some point in the computational universe, that corresponds to the goals that we have.
Human goals have clearly evolved through the course of history. And I suspect they’re about to evolve a lot more. I think it’s pretty inevitable that our consciousness will increasingly merge with technology. And eventually maybe our whole civilization will end up as something like a box of a trillion uploaded human souls.
But then the big question is: “what will they choose to do?”. Well, maybe we don’t even have the language yet to describe the answer. If we look back even to Leibniz’s time, we can see all sorts of modern concepts that hadn’t formed yet. And when we look inside a modern machine learning or theorem proving system, it’s humbling to see how many concepts it effectively forms—that we haven’t yet absorbed in our culture.
Maybe looked at from our current point of view, it’ll just seem like those disembodied virtual souls are playing videogames for the rest of eternity. At first maybe they’ll operate in a simulation of our actual universe. Then maybe they’ll start exploring the computational universe of all possible universes.
But at some level all they’ll be doing is computation—and the Principle of Computational Equivalence says it’s computation that’s fundamentally equivalent to all other computation. It’s a bit of a letdown. Our proud future ending up being computationally equivalent just to plain physics, or to little rule 30.
Of course, that’s just an extension of the long story of science showing us that we’re not fundamentally special. We can’t look for ultimate meaning in where we’ve reached. We can’t define an ultimate purpose. Or ultimate ethics. And in a sense we have to embrace the details of our existence and our history.
There won’t be a simple principle that encapsulates what we want in our AI Constitution. There’ll be lots of details that reflect the details of our existence and history. And the first step is just to understand how to represent those things. Which is what I think we can do with a symbolic discourse language.
And, yes, conveniently I happen to have just spent 30 years building the framework to create such a thing. And I’m keen to understand how we can really use it to create an AI Constitution.
So I’d better stop talking about philosophy, and try to answer some questions.
After the talk there was a lively Q&A (followed by a panel discussion), included on the video. Some questions were:
Gottfried Leibniz—who died 300 years ago this November—worked on many things. But a theme that recurred throughout his life was the goal of turning human law into an exercise in computation. Of course, as we know, he didn’t succeed. But three centuries later, I think we’re finally ready to give it a serious try again. And I think it’s a really important thing to do—not just because it’ll enable all sorts of new societal opportunities and structures, but because I think it’s likely to be critical to the future of our civilization in its interaction with artificial intelligence.
Human law, almost by definition, dates from the very beginning of civilization—and undoubtedly it’s the first system of rules that humans ever systematically defined. Presumably it was a model for the axiomatic structure of mathematics as defined by the likes of Euclid. And when science came along, “natural laws” (as their name suggests) were at first viewed as conceptually similar to human laws, except that they were supposed to define constraints for the universe (or God) rather than for humans.
Over the past few centuries we’ve had amazing success formalizing mathematics and exact science. And out of this there’s a more general idea that’s emerged: the idea of computation. In computation, we’re dealing with arbitrary systems of rules—not necessarily ones that correspond to mathematical concepts we know, or features of the world we’ve identified. So now the question is: can we use the ideas of computation, in very much the way Leibniz imagined, to formalize human law?
The basic issue is that human law talks about human activities, and (unlike say for the mechanics of particles) we don’t have a general formalism for describing human activities. When it comes to talking about money, for example, we often can be precise. And as a result, it’s pretty easy to write a very formal contract for paying a subscription, or determining how an option on a publicly traded stock should work.
But what about all the things that typical legal contracts deal with? Well, clearly we have one way to write legal contracts: just use natural language (like English). It’s often very stylized natural language, because it’s trying to be as precise as possible. But ultimately it’s never going to be precise. Because at the lowest level it’s always going to depend on the meanings of words, which for natural language are effectively defined just by the practice and experience of the users of the language.
For a computer language, though, it’s a different story. Because now the constructs in the language are absolutely precise: instead of having a vague, societally defined effect on human brains, they’re defined to have a very specific effect on a computer. Of course, traditional computer languages don’t directly talk about things relevant to human activities: they only directly talk about things like setting values for variables, or calling abstractly defined functions.
But what I’m excited about is that we’re starting to have a bridge between the precision of traditional computer languages and the ability to talk about realworld constructs. And actually, it’s something I’ve personally been working on for more than three decades now: our knowledgebased Wolfram Language.
The Wolfram Language is precise: everything in it is defined to the point where a computer can unambiguously work with it. But its unique feature among computer languages is that it’s knowledge based. It’s not just a language to describe the lowlevel operations of a computer; instead, built right into the language is as much knowledge as possible about the real world. And this means that the language includes not just numbers like 2.7 and strings like “abc”, but also constructs like the United States, or the Consumer Price Index, or an elephant. And that’s exactly what we need in order to start talking about the kinds of things that appear in legal contracts or human laws.
I should make it clear that the Wolfram Language as it exists today doesn’t include everything that’s needed. We’ve got a large and solid framework, and we’re off to a good start. But there’s more about the world that we have to encode to be able to capture the full range of human activities and human legal specifications.
The Wolfram Language has, for example, a definition of what a banana is, broken down by all kinds of details. So if one says “you should eat a banana”, the language has a way to represent “a banana”. But as of now, it doesn’t have a meaningful way to represent “you”, “should” or “eat”.
Is it possible to represent things like this in a precise computer language? Absolutely! But it takes language design to set up how to do it. Language design is a difficult business—in fact, it’s probably the most intellectually demanding thing I know, requiring a strange mixture of high abstraction together with deep knowledge and downtoearth practical judgment. But I’ve been doing it now for nearly four decades, and I think I’m finally ready for the challenge of doing language design for everyday discourse.
So what’s involved? Well, let’s first talk about it in a simpler case: the case of mathematics. Consider the function Plus, which adds things like numbers together. When we use the English word “plus” it can have all sorts of meanings. One of those meanings is adding numbers together. But there are other meanings, that are related, say, by various analogies (“product X plus”, “the plus wire”, “it’s a real plus”, …).
When we come to define Plus in the Wolfram Language we want to build on the everyday notion of “plus”, but we want to make it precise. And we can do that by picking the specific meaning of “plus” that’s about adding things like numbers together. And once we know that this is what Plus means, we immediately know all sorts of properties, and can do explicit computations with it.
Now consider a concept like “magnesium”. It’s not as perfect and abstract a concept as Plus. But physics and chemistry give us a clear definition of the element magnesium—which we can then use in the Wolfram Language to have a welldefined “magnesium” entity.
It’s very important that the Wolfram Language is a symbolic language—because it means that the things in it don’t immediately have to have “values”; they can just be symbolic constructs that stand for themselves. And so, for example, the entity “magnesium” is represented as a symbolic construct, that doesn’t itself “do” anything, but can still appear in a computation, just like, for example, a number (like 9.45) can appear.
There are many kinds of constructs that the Wolfram Language supports. Like “New York City” or “last Christmas” or “geographically contained within”. And the point is that the design of the language has defined a precise meaning for them. New York City, for example, is taken to mean the precise legal entity considered to be New York City, with geographical borders defined by law. Internal to the Wolfram Language, there’s always a precise canonical representation for something like New York City (it’s Entity["City", {"NewYork", "NewYork", "UnitedStates"}]). And this internal representation is all that matters when it comes to computation. Yes, it’s convenient to refer to New York City as “nyc”, but in the Wolfram Language that natural language form is immediately converted to the precise internal form.
So what about “you should eat a banana”? Well, we’ve got to go through the same language design process for something like “eat” as for Plus (or “banana”). And the basic idea is that we’ve got to figure out a standard meaning for “eat”. For example, it might be “ingestion of food by a person (or animal)”. Now, there are plenty of other possible meanings for the English word “eat”—for example, ones that use analogies, as in “this function eats its arguments”. But the idea—like for Plus—is to ignore these, and just to define a standard notion of “eat” that is precise, and suitable for computation.
One gets a reasonable idea of what kinds of constructs one has to deal with just by thinking about parts of speech in English. There are nouns. Sometimes (as in “banana” or “elephant”) there’s a pretty precise definition of what these correspond to, and usually the Wolfram Language already knows about them. Sometimes it’s a little vaguer but still concrete (as in “chair” or “window”), and sometimes it’s abstract (like “happiness” or “justice”). But in each case one can imagine one or several entities that capture a definite meaning for the noun—just like the Wolfram Language already has entities for thousands of kinds of things.
Beyond nouns, there are verbs. There’s typically a certain superstructure that exists around verbs. Grammatically there might be a subject for the verb, and an object, and so on. Verbs are similar to functions in the Wolfram Language: each one deals with certain arguments, that for example correspond to its subject, object, etc. Now of course in English (or any other natural language) there are all sorts of elaborate special cases and extra features that can be associated with verbs. But basically we don’t care about these. Because we’re really just trying to define symbolic constructs that represent certain concepts. We don’t have to capture every detail of how a particular verb works; we’re just using the English verb as a way to give us a kind of “cognitive hook” for the concept.
We can go through other parts of speech. Adverbs that modify verbs; adjectives that modify nouns. These can sometimes be represented in the Wolfram Language by constructs like EntityInstance, and sometimes by options to functions. But the important point in all cases is that we’re not trying to faithfully reproduce how the natural language works; we’re just using the natural language as a guide to how concepts are set up.
Pronouns are interesting. They work a bit like variables in pure anonymous functions. In “you should eat a banana”, the “you” is like a free variable that’s going to be filled in with a particular person.
Parts of speech and grammatical structures suggest certain general features to capture in a symbolic representation of discourse. There are a bunch of others, though. For example, there are what amount to “calculi” that one needs to represent notions of time (“within the time interval”, “starting later”, etc.) or of space (“on top of”, “contained within”, etc.). We’ve already got many calculi like these in the Wolfram Language; the most straightforward are ones about numbers (“greater than”, etc.) or sets (“member of”), etc. Some calculi have long histories (“temporal logic”, “set theory”, etc.); others still have to be constructed.
Is there a global theory of what to do? Well, no more than there’s a global theory of how the world works. There are concepts and constructs that are part of how our world works, and we need to capture these. No doubt there’ll be new things that come along in the future, and we’ll want to capture those too. And my experience from building WolframAlpha is that the best thing to do is just to build each thing one needs, without starting off with any kind of global theory. After a while, one may notice that one’s built similar things several times, and one may go in and unify them.
One can get deep into the foundations of science and philosophy about this. Yes, there’s a computational universe out there of all the possible rules by which systems can operate (and, yes, I’ve spent a good part of my life studying the basic science of this). And there’s our physical universe that presumably operates according to certain rules from the computational universe. But from these rules can emerge all sorts of complex behavior, and in fact the phenomenon of computational irreducibility implies that in a sense there’s no limit to what can be built up.
But there’s not going to be an overall way to talk about all this stuff. And if we’re going to be dealing with any finite kind of discourse it’s going to only capture certain features. Which features we choose to capture is going to be determined by what concepts have evolved in the history of our society. And usually these concepts will be mirrored in the words that exist in the languages we use.
At a foundational level, computational irreducibility implies that there’ll always be new concepts that could be introduced. Back in antiquity, Aristotle introduced logic as a way to capture certain aspects of human discourse. And there are other frameworks that have been introduced in the history of philosophy, and more recently, natural language processing and artificial intelligence research. But computational irreducibility effectively implies that none of them can ever ultimately be complete. And we must expect that as the concepts we consider relevant evolve, so too must the symbolic representation we have for discourse.
OK, so let’s say we’ve got a symbolic representation for discourse. How’s it actually going to be used? Well, there are some good clues from the way natural language works.
In standard discussions of natural language, it’s common to talk about “interrogative statements” that ask a question, “declarative statements” that assert something and “imperative statements” that say to do something. (Let’s ignore “exclamatory statements”, like expletives, for now.)
Interrogative statements are what we’re dealing with all the time in WolframAlpha: “what is the density of gold?”, “what is 3+7?”, “what was the latest reading from that sensor?”, etc. They’re also common in notebooks used to interact with the Wolfram Language: there’s an input (In[1]:= 2+2) and then there’s a corresponding output (Out[1]= 4).
Declarative statements are all about filling in particular values for variables. In a very coarse way, one can set values (x=7), as in typical procedural languages. But it’s typically better to think about having environments in which one’s asserting things. Maybe those environments are supposed to represent the real world, or some corner of it. Or maybe they’re supposed to represent some fictional world, where for example dinosaurs didn’t go extinct, or something.
Imperative statements are about making things happen in the world: “open the pod bay doors”, “pay Bob 0.23 bitcoin”, etc.
In a sense, interrogative statements determine the state of the world, declarative statements assert things about the state of the world, and imperative statements change the state of the world.
In different situations, we can mean different things by “the world”. We could be talking about abstract constructs, like integers or logic operations, that just are the way they are. We could be talking about natural laws or other features of our physical universe that we can’t change. Or we could be talking about our local environment, where we can move around tables and chairs, choose to eat bananas, and so on. Or we could be talking about our mental states, or the internal state of something like a computer.
There are lots of things one can do if one has a general symbolic representation for discourse. But one of them—which is the subject of this post—is to express things like legal contracts. The beginning of a contract, with its various whereas clauses, recitals, definitions and so on tends to be dense with declarative statements (“this is so”). Then the actual terms of the contract tend to end up with imperative statements (“this should happen”), perhaps depending on certain things determined by interrogative statements (“did this happen?”).
It’s not hard to start seeing the structure of contracts as being much like programs. In simple cases, they just contain logical conditionals: “if X then Y”. In other cases they’re more modeled on math: “if this amount of X happens, that amount of Y should happen”. Sometimes there’s iteration: “keep doing X until Y happens”. Occasionally there’s some recursion: “keep applying X to every Y”. And so on.
There are already some places where legal contracts are routinely represented by what amount to programs. The most obvious are financial contracts for things like bonds and options—which just amount to little programs that define payouts based on various formulas and conditionals.
There’s a whole industry of using “rules engines” to encode certain kinds of regulations as “if then” rules, usually mixed with formulas. In fact, such things are almost universally used for tax and insurance computations. (They’re also common in pricing engines and the like.)
Of course, it’s no coincidence that one talks about “legal codes”. The word code—which comes from the Latin codex—originally referred to systematic collections of legal rules. And when programming came along a couple of millennia later, it used the word “code” because it basically saw itself as similarly setting up rules for how things should work, except now the things had to do with the operation of computers rather than the conduct of worldly affairs.
But now, with our knowledgebased computer language and the idea of a symbolic discourse language, what we’re trying to do is to make it so we can talk about a broad range of worldly affairs in the same kind of way that we talk about computational processes—so we put all those legal codes and contracts into computational form.
How should we think about symbolic discourse language compared to ordinary natural language? In a sense, the symbolic discourse language is a representation in which all the nuance and “poetry” have been “crushed” out of the natural language. The symbolic discourse language is precise, but it’ll almost inevitably lose the nuance and poetry of the original natural language.
If someone says “2+2” to WolframAlpha, it’ll dutifully answer “4”. But what if instead they say, “hey, will you work out 2+2 for me”. Well, that sets up a different mood. But WolframAlpha will take that input and convert it to exactly the same symbolic form as “2+2”, and similarly just respond “4”.
This is exactly the kind of thing that’ll happen all the time with symbolic discourse language. And if the goal is to answer precise questions—or, for that matter, to create a precise legal contract, it’s exactly what one wants. One just needs the hard content that will actually have a consequence for what one’s trying to do, and in this case one doesn’t need the “extras” or “pleasantries”.
Of course, what one chooses to capture depends on what one’s trying to do. If one’s trying to get psychological information, then the “mood” of a piece of natural language can be very important. Those “exclamatory statements” (like expletives) carry meaning one cares about. But one can still perfectly well imagine capturing things like that in a symbolic way—for example by having an “emotion track” in one’s symbolic discourse language. (Very coarsely, this might be represented by sentiment or by position in an emotion space—or, for that matter, by a whole symbolic language derived, say, from emoji.)
In actual human communication through natural language, “meaning” is a slippery concept, that inevitably depends on the context of the communication, the history of whoever is communicating, and so on. My notion of a symbolic discourse language isn’t to try to magically capture the “true meaning” of a piece of natural language. Instead, my goal is just to capture some meaning that one can then compute with.
For convenience, one might choose to start with natural language, and then try to translate it into the symbolic discourse language. But the point is for the symbolic discourse language to be the real representation: the natural language is just a guide for trying to generate it. And in the end, the notion is that if one really wants to be sure one’s accurate in what one’s saying, one should say it directly in the symbolic discourse language, without ever using natural language.
Back in the 1600s, one of Leibniz’s big concerns was to have a representation that was independent of which natural language people were using (French, German, Latin, etc.). And one feature of a symbolic discourse language is that it has to operate “below” the level of specific natural languages.
There’s a rough kind of universality among human languages, in that it seems to be possible to represent any human concept at least to some approximation in any language. But there are plenty of nuances that are extremely hard to translate—between different languages, or the different cultures that surround them (or even the same language at different times in history). But in the symbolic discourse language, one’s effectively “crushing out” these differences—and getting something that is precise, even though it typically won’t correspond exactly to any particular human natural language.
A symbolic discourse language is about representing things in the world. Natural language is just one way to try to describe those things. But there are others. For example, one might give a picture. One could try to describe certain features of the picture in natural language (“a cat with a hat on its head”)—or one could go straight from the picture to the symbolic discourse language.
In the example of a picture, it’s very obvious that the symbolic discourse language isn’t going to capture everything. Maybe it could capture something like “he is taking the diamond”. But it’s not going to specify the color of every pixel, and it’s not going to describe all conceivable features of a scene at every level of detail.
In some sense, what the symbolic discourse language is doing is to specify a model of the system it’s describing. And like any model, it’s capturing some features, and idealizing others away. But the importance of it is that it provides a solid foundation on which computations can be done, conclusions can be drawn, and actions can be taken.
I’ve been thinking about creating what amounts to a general symbolic discourse language for nearly 40 years. But it’s only recently—with the current state of the Wolfram Language—that I’ve had the framework to actually do it. And it’s also only recently that I’ve understood how to think about the problem in a sufficiently practical way.
Yes, it’s nice in principle to have a symbolic way to represent things in the world. And in specific cases—like answering questions in WolframAlpha—it’s completely clear why it’s worth doing this. But what’s the point of dealing with more general discourse? Like, for example, when do we really want to have a “general conversation” with a machine?
The Turing test says that being able to do this is a sign of achieving general AI. But “general conversations” with machines—without any particular purpose in mind—so far usually seem in practice to devolve quickly into party tricks and Easter eggs. At least that’s our experience looking at interactions people have with WolframAlpha, and it also seems to be the experience with decades of chatbots and the like.
But the picture quickly changes if there’s a purpose to the conversation: if you’re actually trying to get the machine to do something, or learn something from the machine. Still, in most of these cases, there’s no real reason to have a general representation of things in the world; it’s sufficient just to represent specific machine actions, particular customer service goals, or whatever. But if one wants to tackle the general problem of law and contracts, it’s a different story. Because inevitably one’s going to have to represent the full spectrum of human affairs and issues. And so now there’s a definite goal to having a symbolic representation of the world: one needs it to be able to say what should happen and have machines understand it.
Sometimes it’s useful to do that because one wants the machines just to be able to check whether what was supposed to happen actually did; sometimes one wants to actually have the machines automatically enforce or do things. But either way, one needs the machine to be able to represent general things in the world—and so one needs a symbolic discourse language to be able to do this.
In a sense, it’s a very obvious idea to have something like a symbolic discourse language. And indeed it’s an idea that’s come up repeatedly across the course of centuries. But it’s proved a very difficult idea to make work, and it has a history littered with (sometimes quite wacky) failures.
Things in a sense started well. Back in antiquity, logic as discussed by Aristotle provided a very restricted example of a symbolic discourse language. And when the formalism of mathematics began to emerge it provided another example of a restricted symbolic discourse language.
But what about more general concepts in the world? There’d been many efforts—between the Tetractys of the Pythagoreans and the I Ching of the Chinese—to assign symbols or numbers to a few important concepts. But around 1300 Ramon Llull took it further, coming up with a whole combinatorial scheme for representing concepts—and then trying to implement this with circles of paper that could supposedly mechanically determine the validity of arguments, particularly religious ones.
Four centuries later, Gottfried Leibniz was an enthusiast of Llull’s work, at first imagining that perhaps all concepts could be converted to numbers and truth then determined by doing something like factoring into primes. Later, Leibniz starting talking about a characteristica universalis (or, as Descartes called it, an “alphabet of human thoughts”)—essentially a universal symbolic language. But he never really tried to construct such a thing, instead chasing what one might consider “special cases”—including the one that led him to calculus.
With the decline of Latin as the universal natural language in the 1600s, particularly in areas like science and diplomacy, there had already been efforts to invent “philosophical languages” (as they were called) that would represent concepts in an abstract way, not tied to any specific natural language. The most advanced of these was by John Wilkins—who in 1668 produced a book cataloging over 10,000 concepts and representing them using strangelooking glyphs, with a rendering of the Lord’s Prayer as an example.
In some ways these efforts evolved into the development of encyclopedias and later thesauruses, but as languagelike systems, they basically went nowhere. Two centuries later, though, as the concept of internationalization spread, there was a burst of interest in constructing new, countryindependent languages—and out of this emerged Volapük and then Esperanto. These languages were really just artificial natural languages; they weren’t an attempt to produce anything like a symbolic discourse language. I always used to enjoy seeing signs in Esperanto at European airports, and was disappointed in the 1980s when these finally disappeared. But, as it happens, right around that time, there was another wave of language construction. There were languages like Lojban, intended to be as unambiguous as possible, and ones like the interestingly minimal Toki Pona intended to support the simple life, as well as the truly bizarre Ithkuil, intended to encompass the broadest range of linguistic and supposedly cognitive structures.
Along the way, there were also attempts to simplify languages like English by expressing everything in terms of 1000 or 2000 basic words (instead of the usual 20,000–30,000)—as in the “Simple English” version of Wikipedia or the xkcd Thing Explainer.
There were a few, more formal, efforts. One example was Hans Freudenthal’s 1960 Lincos “language for cosmic intercourse” (i.e. communication with extraterrestrials) which attempted to use the notation of mathematical logic to capture everyday concepts. In the early days of the field of artificial intelligence, there were plenty of discussions of “knowledge representation”, with approaches based variously on the grammar of natural language, the structure of predicate logic or the formalism of databases. Very few largescale projects were attempted (Doug Lenat’s Cyc being a notable counterexample), and when I came to develop WolframAlpha I was disappointed at how little of relevance to our needs seemed to have emerged.
In a way I find it remarkable that something as fundamental as the construction of a symbolic discourse language should have had so little serious attention paid to it in the past. But at some level it’s not so surprising. It’s a difficult, large project, and it somehow lies in between established fields. It’s not a linguistics project. Yes, it may ultimately illuminate how languages work, but that’s not its main point. It’s not a computer science project because it’s really about content, not algorithms. And it’s not a philosophy project because it’s mostly about specific nittygritty and not much about general principles.
There’ve been a few academic efforts in the last half century or so, discussing ideas like “semantic primes” and “natural semantic metalanguage”. Usually such efforts have tried to attach themselves to the field of linguistics—but their emphasis on abstract meaning rather than pure linguistic structure has put them at odds with prevailing trends, and none have turned into largescale projects.
Outside of academia, there’s been a steady stream of proposals—sometimes promoted by wonderfully eccentric individuals—for systems to organize and name concepts in the world. It’s not clear how far this pursuit has come since Ramon Llull—and usually it’s only dealing with pure ontology, and never with full meaning of the kind that can be conveyed in natural language.
I suppose one might hope that with all the recent advances in machine learning there’d be some magic way to automatically learn an abstract representation for meaning. And, yes, one can take Wikipedia, for example, or a text corpus, and use dimension reduction to derive some effective “space of concepts”. But, not too surprisingly, simple Euclidean space doesn’t seem to be a very good model for the way concepts relate (one can’t even faithfully represent graph distances). And even the problem of taking possible meanings for words—as a dictionary might list them—and breaking them into clusters in a space of concepts doesn’t seem to be easy to do effectively.
Still, as I’ll discuss later, I think there’s a very interesting interplay between symbolic discourse language and machine learning. But for now my conclusion is that there’s not going to be any alternative but to use human judgment to construct the core of any symbolic discourse language that’s intended for humans to use.
But let’s get back to contracts. Today, there are hundreds of billions of them being signed every year around the world (and vastly more being implicitly entered into)—though the number of “original” ones that aren’t just simple modifications is probably just in the millions (and is perhaps comparable to the number of original computer programs or apps being written.)
So can these contracts be represented in precise symbolic form, as Leibniz hoped 300 years ago? Well, if we can develop a decently complete symbolic discourse language, it should be possible. (Yes, every contract would have to be defined relative to some underlying set of “governing law” rules, etc., that are in some ways like the builtin functions of the symbolic discourse language.)
But what would it mean? Among other things, it would mean that contracts themselves would become computable things. A contract would be converted to a program in the symbolic discourse language. And one could do abstract operations just on this program. And this means one can imagine formally determining—in effect through a kind of generalization of logic—whether, say, a given contract has some particular implication, could ever lead to some particular outcome, or is equivalent to some other contract.
Ultimately, though, there’s a theoretical problem with this. Because questions like this can run into issues of formal undecidability, which means there’s no guarantee that any systematic finite computation will answer them. The same problem arises in reasoning about typical software programs people write, and in practice it’s a mixed bag, with some things being decidable, and others not.
Of course, even in the Wolfram Language as it is today, there are plenty of things (such as the very basic “are these expressions equal?”) that are ultimately in principle undecidable. And there are certainly questions one can ask that run right into such issues. But an awful lot of the kinds of questions that people naturally ask turn out to be answerable with modest amounts of computation. And I wouldn’t be surprised if this were true for questions about contracts too. (It’s worth noting that humanformulated questions tend to run into undecidability much less than questions picked, say at random, from the whole computational universe of possibilities.)
If one has contracts in computational form, there are other things one can expect to do too. Like to be able to automatically work out what the contracts imply for a large range of possible inputs. The 1980s revolution in quantitative finance started when it became clear one could automatically compute distributions of outcomes for simple options contracts. If one had lots (perhaps billions) of contracts in computational form, there’d be a lot more that could be done along these lines—and no doubt, for better or worse, whole new areas of financial engineering that could be developed.
OK, so let’s say one has a computational contract. What can one directly do with it? Well, it depends somewhat on what the form of its inputs is. One important possibility is that they’re in a sense “born computational”: that they’re immediately statements about a computational system (“how many accesses has this ID made today?”, “what is the ping time for this connection?”, “how much bitcoin got transferred?”, etc.). And in that case, it should be possible to immediately and unambiguously “evaluate” the contract—and find out if it’s being satisfied.
This is something that’s very useful for lots of purposes—both for humans interacting with machines, and machines interacting with machines. In fact, there are plenty of cases where versions of it are already in use. One can think of computer security provisions such as firewall rules as one example. There are others that are gradually emerging, such as automated SLAs (servicelevel agreements) and automated termsofservice. (I’m certainly hoping our company, for example, will be able to make these a routine part of our business practices before too long.)
But, OK, it’s certainly not true that every input for every contract is “born computational”: plenty of inputs have to come from seeing what happens in the “outside” world (“did the person actually go to place X?”, “was the package maintained in a certain environment?”, “did the information get leaked to social media?”, “is the parrot dead?”, etc.). And the first thing to say is that in modern times it’s become vastly easier to automatically determine things about the world, not least because one can just make measurements with sensors. Check the GPS trace. Look at the car counting sensor. And so on. The whole Internet of Things is out there to provide input about the real world for computational contracts.
Having said this, though, there’s still an issue. Yes, with a GPS trace there’s a definite answer (assuming the GPS is working properly) for whether someone or something went to a particular place. But let’s say one’s trying to determine something less obviously numerical. Let’s say, for example, that one’s trying to determine whether a piece of fruit should be considered “Fancy Grade” or not. Well, given some pictures of the piece of fruit an expert can pretty unambiguously tell. But how can we make this computational?
Well, here’s a place where we can use modern machine learning. We can set up some neural net, say in the Wolfram Language, and then show it lots of examples of fruit that’s Fancy Grade and that’s not. And from my experience (and those of our customers!) most of the time we’ll get a system that’s really good at a task like grading fruit. It’ll certainly be much faster than humans, and it’ll probably be more reliable and more consistent too.
And this gives a whole new way to set up contracts about things in the world. Two parties can just agree that the contract should say “if the machine learning system says X then do Y”. In a sense it’s like any other kind of computational contract: the machine learning system is just a piece of code. But it’s a little different. Because normally one expects that one can readily examine everything that a contract says: one can in effect read and understand the code. But with machine learning in the middle, there can no longer be any expectation of that.
Nobody specifically set up all those millions of numerical weights in the neural net; they were just determined by some approximate and somewhat random process from whatever training data that was given. Yes, in principle we can measure everything about what’s happening inside the neural net. But there’s no reason to expect that we’ll ever be able to get an understandable explanation—or prediction—of what the net will do in any particular case. Most likely it’s an example of the phenomenon I call computational irreducibility–which means there really isn’t any way to see what will happen much more efficiently than just by running it.
What’s the difference with asking a human expert, then, whose thought processes one can’t understand? Well, in practice machine learning is much faster so one can make much more use of “expert judgment”. And one can set things up so they’re repeatable, and one can for example systematically test for biases one thinks might be there, and so on.
Of course, one can always imagine cheating the machine learning. If it’s repeatable, one could use machine learning itself to try to learn cases where it would fail. And in the end it becomes rather like computer security, where holes are being found, patches are being applied, and so on. And in some sense this is no different from the typical situation with contracts too: one tries to cover all situations, then it becomes clear that something hasn’t been correctly addressed, and one tries to write a new contract to address it, and so on.
But the important bottom line is that with machine learning one can expect to get “judgment oriented” input into contracts. I expect the typical pattern will be this: in the contract there’ll be something stated in the symbolic discourse language (like “X will personally do Y”). And at the level of the symbolic discourse language there’ll be a clear meaning to this, from which, for example, all sorts of implications can be drawn. But then there’s the question of whether what the contract said is actually what happened in the real world. And, sure, there can be lots of sensor data that gives information on this. But in the end there’ll be a “judgment call” that has to be made. Did the person actually personally do this? Well—like for a remote exam proctoring system—one can have a camera watching the person, one can record their pattern of keystrokes, and maybe even measure their EEG. But something’s got to synthesize this data, and make the judgment call about what happened, and turn this in effect into a symbolic statement. And in practice I expect it will typically end up being a machine learning system that does this.
OK, so let’s say we’ve got ways to set up computational contracts. How can we enforce them? Well, ones that basically just involve computational processes can at some level enforce themselves. A particular piece of software can be built to issue licenses only in suchandsuch a way. A cloud system can be built to make a download available only if it receives a certain amount of bitcoin. And so on.
But how far do we trust what’s going on? Maybe someone hacked the software, or the cloud. How can we be sure nothing bad has happened? The basic answer is to use the fact that the world is a big place. As a (sometime) physicist it makes me think of measurement in quantum mechanics. If we’re just dealing with a little quantum effect, there’s always interference that can happen. But when we do a real measurement, we’re amplifying that little quantum effect to the point where so many things (atoms, etc.) are involved that it’s unambiguous what happened—in much the same way as the Second Law of Thermodynamics makes it inconceivable that all the air molecules in a room will spontaneously line up on one side.
And so it is with bitcoin, Ethereum, etc. The idea is that some particular thing that happened (“X paid Y suchandsuch” or whatever) is shared and recorded in so many places that there can’t be any doubt about it. Yes, it’s in principle possible that all the few thousand places that actually participate in something like bitcoin today could collude to give a fake result. But the idea is that it’s like with gas molecules in a room: the probability is inconceivably small. (As it happens, my Principle of Computational Equivalence suggests that there’s more than an analogy with the gas molecules, and that actually the underlying principles at work are basically exactly the same. And, yes, there are lots of interesting technical details about the operation of distributed blockchain ledgers, distributed consensus protocols, etc., but I’m not going to get into them here.)
It’s popular these days to talk about “smart contracts”. When I’ve been talking about “computational contracts” I mean contracts that can be expressed computationally. But by “smart contracts” people usually mean contracts that can both be expressed computationally and execute automatically. Most often the idea is to set up a smart contract in a distributed computation environment like Ethereum, and then to have the code in the contract evaluate based on inputs from the computation environment.
Sometimes the input is intrinsic—like the passage of time (who could possibly tamper with the clock of the whole internet?), or physically generated random numbers. And in cases like this, one has fairly pure smart contracts, say for paying subscriptions, or for running distributed lotteries.
But more often there has to be some input from the outside—from something that happens in the world. Sometimes one just needs public information: the price of a stock, the temperature at a weather station, or a seismic event like a nuclear explosion. But somehow the smart contract needs access to an “oracle” that can give it this information. And conveniently enough, there is one good such oracle available in the world: WolframAlpha. And indeed WolframAlpha is becoming widely used as an oracle for smart contracts. (Yes, our general public terms of service say you currently just shouldn’t rely on WolframAlpha for anything you consider critical—though hopefully soon those terms of service will get more sophisticated, and computational.)
But what about nonpublic information from the outside world? The current thinking for smart contracts tends to be that one has to get humans in the loop to verify the information: that in effect one has to have a jury (or a democracy) to decide whether something is true. But is that really the best one can do? I tend to suspect there’s another path, that’s like using machine learning to inject humanlike judgment into things. Yes, one can use people, with all their inscrutable and hardtosystematicallyinfluence behavior. But what if one replaces those people in effect by AIs—or even a collection of today’s machinelearning systems?
One can think of a machinelearning system as being a bit like a cryptosystem. To attack it and spoof its input one has to do something like inverting how it works. Well, given a single machinelearning system there’s a certain effort needed to achieve this. But if one has a whole collection of sufficiently independent systems, the effort goes up. It won’t be good enough just to change a few parameters in the system. But if one just goes out into the computational universe and picks systems at random then I think one can expect to have the same kind of independence as by having different people. (To be fair, I don’t yet quite know how to apply the mining of the computational universe that I’ve done for programs like cellular automata to the case of systems like neural nets.)
There’s another point as well: if one has a sufficiently dense net of sensors in the world, then it becomes increasingly easy to be sure about what’s happened. If there’s just one motion sensor in a room, it might be easy to cover it. And maybe even if there are several sensors, it’s still possible to avoid them, Mission Impossiblestyle. But if there are enough sensors, then by synthesizing information from them one can inevitably build up an understanding of what actually happened. In effect, one has a model of how the world works, and with enough sensors one can validate that the model is correct.
It’s not surprising, but it always helps to have redundancy. More nodes to ensure the computation isn’t tampered with. More machinelearning algorithms to make sure they aren’t spoofed. More sensors to make sure they’re not fooled. But in the end, there has to be something that says what should happen—what the contract is. And the contract has to be expressed in some language in which there are definite concepts. So somehow from the various redundant systems one has in the world, one has to make a definite conclusion—one has to turn the world into something symbolic, on which the contract can operate.
Let’s say we have a good symbolic discourse language. Then how should contracts actually get written in it?
One approach is to take existing contracts written in English or any other natural language, and try to translate (or parse) them into the symbolic discourse language. Well, what will happen is somewhat like what happens with WolframAlpha today. The translator will not know exactly what the natural language was supposed to mean, and so it will give several possible alternatives. Maybe there was some meaning that the original writer of the naturallanguage contract had in mind. But maybe the “poetry” of that meaning can’t be expressed in the symbolic discourse language: it requires something more definite. And a human is going to have to decide which alternative to pick.
Translating from naturallanguage contracts may be a good way to start, but I suspect it will quickly give way to writing contracts directly in the symbolic discourse language. Today lawyers have to learn to write legalese. In the future, they’re going to have to learn to write what amounts to code: contracts expressed precisely in a symbolic discourse language.
One might think that writing everything as code, rather than naturallanguage legalese, would be a burden. But my guess is that it will actually be a great benefit. And it’s not just because it will let contracts operate more easily. It’s also that it will help lawyers think better about contracts. It’s an old claim (the Sapir–Whorf hypothesis) that the language one uses affects the way one thinks. And this is no doubt somewhat true for natural languages. But in my experience it’s dramatically true for computer languages. And indeed I’ve been amazed over the years at how my thinking has changed as we’ve added more to the Wolfram Language. When I didn’t have a way to express something, it didn’t enter my thinking. But once I had a way to express it, I could think in terms of it.
And so it will be, I believe, for legal thinking. When there’s a precise symbolic discourse language, it’ll become possible to think more clearly about all sorts of things.
Of course, in practice it’ll help that there’ll no doubt be all sorts of automated annotation: “if you add that clause, it’ll imply X, Y and Z”, etc. It’ll also help that it’ll routinely be possible to take some contract and simulate its consequences for a range of inputs. Sometimes one will want statistical results (“is this biased?”). Sometimes one will want to hunt for particular “bugs” that will only be found by trying lots of inputs.
Yes, one can read a contract in natural language, like one can read a math paper. But if one really wants to know its implications one needs it in computational form, so one can run it and see what it implies—and also so one can give it to a computer to implement.
Back in ancient Babylon it was a pretty big deal when there started to be written laws like the Code of Hammurabi. Of course, with very few people able to read, there was all sorts of clunkiness at first—like having people recite the laws in order from memory. Over the centuries things got more streamlined, and then about 500 years ago, with the advent of widespread literacy, laws and contracts started to be able to get more complex (which among other things allowed them to be more nuanced, and to cover more situations).
In recent decades the trend has accelerated, particularly now that it’s so easy to copy and edit documents of any length. But things are still limited by the fact that humans are in the loop, authoring and interpreting the documents. Back 50 years ago, pretty much the only way to define a procedure for anything was to write it down, and have humans implement it. But then along came computers, and programming. And very soon it started to be possible to define vastly more complex procedures—to be implemented not by humans, but instead by computers.
And so, I think, it will be with law. Once computational law becomes established, the complexity of what can be done will increase rapidly. Typically a contract defines some model of the world, and specifies what should happen in different situations. Today the logical and algorithmic structure of models defined by contracts still tends to be fairly simple. But with computational contracts it’ll be feasible for them to be much more complex—so that they can for example more faithfully capture how the world works.
Of course, that just makes defining what should happen even more complex—and before long it might feel a bit like constructing an operating system for a computer, that tries to cover all the different situations the computer might find itself in.
In the end, though, one’s going to have to say what one wants. One might be able to get a certain distance by just giving specific examples. But ultimately I think one’s going to have to use a symbolic discourse language that can express a higher level of abstraction.
Sometimes one will be able to just write everything in the symbolic discourse language. But often, I suspect, one will use the symbolic discourse language to define what amount to goals, and then one will have to use machinelearning kinds of methods to fill in how to define a contract that actually achieves them.
And as soon as there’s computational irreducibility involved, it’ll typically be impossible to know for sure that there are no bugs, or “unintended consequences”. Yes, one can do all kinds of automated tests. But in the end it’s theoretically impossible to have any finite procedure that can guarantee to check all possibilities.
Today there are plenty of legal situations that are too complex to handle without expert lawyers. And in a world where computational law is common, it won’t just be convenient to have computers involved, it’ll be necessary.
In a sense it’s similar to what’s already happened in many areas of engineering. Back when humans had to design everything themselves, humans could typically understand the structures that were being built. But once computers are involved in design it becomes inevitable that they’re needed in figuring out how things work too.
Today a fairly complex contract might involve a hundred pages of legalese. But once there’s computational law—and particularly contracts constructed automatically from goals—the lengths are likely to increase rapidly. At some level it won’t matter, though—just as it doesn’t really matter how long the code of a program one’s using is. Because the contract will in effect just be run automatically by computer.
Leibniz saw computation as a simplifying element in the practice of law. And, yes, some things will become simpler and better defined. But a vast ocean of complexity will also open up.
How should one tell an AI what to do? Well, you have to have some form of communication that both humans and AIs can understand—and that is rich enough to describe what one wants. And as I’ve described elsewhere, what I think this basically means is that one has to have a knowledgebased computer language—which is precisely what the Wolfram Language is—and ultimately one needs a full symbolic discourse language.
But, OK, so one tells an AI to do something, like “go get some cookies from the store”. But what one says inevitably won’t be complete. The AI has to operate within some model of the world, and with some code of conduct. Maybe it can figure out how to steal the cookies, but it’s not supposed to do that; presumably one wants it to follow the law, or a certain code of conduct.
And this is where computational law gets really important: because it gives us a way to provide that code of conduct in a way that AIs can readily make use of.
In principle, we could have AIs ingest the complete corpus of laws and historical cases and so on, and try to learn from these examples. But as AIs become more and more important in our society, it’s going to be necessary to define all sorts of new laws, and many of these are likely to be “born computational”, not least, I suspect, because they’ll be too algorithmically complex to be usefully described in traditional natural language.
There’s another problem too: we really don’t just want AIs to follow the letter of the law (in whatever venue they happen to be), we want them to behave ethically too, whatever that may mean. Even if it’s within the law, we probably don’t want our AIs lying and cheating; we want them somehow to enhance our society along the lines of whatever ethical principles we follow.
Well, one might think, why not just teach AIs ethics like we could teach them laws? In practice, it’s not so simple. Because whereas laws have been somewhat decently codified, the same can’t be said for ethics. Yes, there are philosophical and religious texts that talk about ethics. But it’s a lot vaguer and less extensive than what exists for law.
Still, if our symbolic discourse language is sufficiently complete, it certainly should be able to describe ethics too. And in effect we should be able to set up a system of computational laws that defines a whole code of conduct for AIs.
But what should it say? One might have a few immediate ideas. Perhaps one could combine all the ethical systems of the world. Obviously hopeless. Perhaps one could have the AIs just watch what humans do and learn their system of ethics from it. Similarly hopeless. Perhaps one could try something more local, where the AIs switch their behavior based on geography, cultural context, etc. (think “protocol droid”). Perhaps useful in practice, but hardly a complete solution.
So what can one do? Well, perhaps there are a few principles one might agree on. For example, at least the way we think about things today, most of us don’t want humans to go extinct (of course, maybe in the future, having mortal beings will be thought too disruptive, or whatever). And actually, while most people think there are all sorts of things wrong with our current society and civilization, people usually don’t want it to change too much, and they definitely don’t want change forced upon them.
So what should we tell the AIs? It would be wonderful if we could just give the AIs some simple set of almost axiomatic principles that would make them always do what we want. Maybe they could be based on Asimov’s Three Laws of Robotics. Maybe they could be something seemingly more modern based on some kind of global optimization. But I don’t think it’s going to be that easy.
The world is a complicated place; if nothing else, that’s basically guaranteed by the phenomenon of computational irreducibility. And it’s pretty much inevitable that there’s not going to be any finite procedure that’ll force everything to “come out the way one wants” (whatever that may be).
Let me take a somewhat abstruse, but well defined, example from mathematics. We think we know what integers are. But to really be able to answer all questions about integers (including about infinite collections of them, etc.) we need to set up axioms that define how integers work. And that’s what Giuseppe Peano tried to do in the late 1800s. For a while it looked good, but then in 1931 Kurt Gödel surprised the world with his Incompleteness Theorem, which implied among other things, that actually, try as one might, there was never going to be a finite set of axioms that would define the integers as we expect them to be, and nothing else.
In some sense, Peano’s original axioms actually got quite close to defining just the integers we want. But Gödel showed that they also allow bizarre nonstandard integers, where for example the operation of addition isn’t finitely computable.
Well, OK, that’s abstract mathematics. What about the real world? Well, one of the things that we’ve learned since Gödel’s time is that the real world can be thought of in computational terms, pretty much just like the mathematical systems Gödel considered. And in particular, one can expect the same phenomenon of computational irreducibility (which itself is closely related to Gödel’s Theorem). And the result of this is that whatever simple intuitive goal we may define, it’s pretty much inevitable we’ll have to build up what amount to an arbitrarily complicated collection of rules to try to achieve it—and whatever we do, there’ll always be at least some “unintended consequences”.
None of this should really come as much of a surprise. After all, if we look at actual legal systems as they’ve evolved over the past couple of thousand years, there always end up being a lot of laws. It’s not like there’s a single principle from which everything else can be derived; there inevitably end up being lots of different situations that have to be covered.
But is all this complexity just a consequence of the “mechanics” of how the world works? Imagine—as one expects—that AIs get more and more powerful. And that more and more of the systems of the world, from money supplies to border controls, are in effect put in the hands of AIs. In a sense, then, the AIs play a role a little bit like governments, providing an infrastructure for human activities.
So, OK, perhaps we need a “constitution” for the AIs, just like we set up constitutions for governments. But again the question comes: what should the constitution have in it?
Let’s say that the AIs could mold human society in pretty much any way. How would we want it molded? Well, that’s an old question in political philosophy, debated since antiquity. At first an idea like utilitarianism might sound good: somehow maximize the wellbeing of as many people as possible. But imagine actually trying to do this with AIs that in effect control the world. Immediately one is thrust into concrete versions of questions that philosophers and others have debated for centuries. Let’s say one can sculpt the probability distribution for happiness among people in the world. Well, now we’ve got to get precise about whether it’s the mean or the median or the mode or a quantile or, for that matter, the kurtosis of the distribution that we’re trying to maximize.
No doubt one can come up with rhetoric that argues for some particular choice. But there just isn’t an abstract “right answer”. Yes, we can have a symbolic discourse language that expresses any choice. But there’s no mathematical derivation of the answer and there’s no law of nature that forces a particular answer. I suppose there could be a “best answer given our biological nature”. But as things advance, this won’t be on solid ground either, as we increasingly manage to use technology to transcend the biology that evolution has delivered to us.
Still, we might argue, there’s at least one constraint: we don’t want a scheme where we’ll go extinct—and where nothing will in the end exist. Even this is going to be a complicated thing to discuss, because we need to say what the “we” here is supposed to be: just how “evolved” relative to the current human condition can things be, and not consider “us” to have gone extinct?
But even independent of this, there’s another issue: given any particular setup, computational irreducibility can make it in a sense irreducibly difficult to find out its consequences. And so in particular, given any specific optimization criterion (or constitution), there may be no finite procedure that will determine whether it allows for infinite survival, or whether in effect it implies civilization will “halt” and go extinct.
OK, so things are complicated. What can one actually do? For a little while there’ll probably be the notion that AIs must ultimately have human owners, who must act according to certain principles, following the usual way human society operates. But realistically this won’t last long.
Who would be responsible for a publicdomain AI system that’s spread across the internet? What happens when the bots it spawns start misbehaving on social media (yes, the notion that social media accounts are just for humans will soon look very “early 21st century”)?
Of course, there’s an important question of why AIs should “follow the rules” at all. After all, humans certainly don’t always do that. It’s worth remembering, though, that we humans are probably a particularly difficult case: after all, we’re the product a multibillionyear process of natural selection, in which there’s been a continual competitive struggle for survival. AIs are presumably coming into the world in very different circumstances, and without the same need for “brutish instincts”. (Well, I can’t help thinking of AIs from different companies or countries being imbued by their creators with certain brutish instincts, but that’s surely not a necessary feature of AI existence.)
In the end, though, the best hope for getting AIs to “follow the rules” is probably by more or less the same mechanism that seems to maintain human society today: that following the rules is the way some kind of dynamic equilibrium is achieved. But if we can get the AIs to “follow the rules”, we still have to define what the rules—the AI Constitution—should be.
And, of course, this is a hard problem, with no “right answer”. But perhaps one approach is to see what’s happened historically with humans. And one important and obvious thing is that there are different countries, with different laws and customs. So perhaps at the very least we have to expect that there’d be multiple AI Constitutions, not just one.
Even looking at countries today, an obvious question is how many there should be. Is there some easy way to say that—with technology as it exists, for example—7 billion people should be expected to organize themselves into about 200 countries?
It sounds a bit like asking how many planets the solar system should end up with. For a long time this was viewed as a “random fact of nature” (and widely used by philosophers as an example of something that, unlike 2+2=4, doesn’t “have to be that way”). But particularly having seen so many exoplanet systems, it’s become clear that our solar system actually pretty much has to have about the number of planets it does.
And maybe after we’ve seen the sociologies of enough videogame virtual worlds, we’ll know something about how to “derive” the number of countries. But of course it’s not at all clear that AI Constitutions should be divided anything like countries.
The physicality of humans has the convenient consequence that at least at some level one can divide the world geographically. But AIs don’t need to have that kind of spatial locality. One can imagine some other schemes, of course. Like let’s say one looks at the space of personalities and motivations, and finds clusters in it. Perhaps one could start to say “here’s an AI Constitution for that cluster” and so on. Maybe the constitutions could fork, perhaps almost arbitrarily (a “Gitlike model of society”). I don’t know how things like this would ultimately work, but they seem more plausible than what amounts to a single, consensus, AI Constitution for everywhere and everyone.
There are so many issues, though. Like here’s one. Let’s assume AIs are the dominant power in our world. But let’s assume that they successfully follow some constitution or constitutions that we’ve defined for them. Well, that’s nice—but does it mean nothing can ever change in the world? I mean, just think if we were still all operating according to laws that had been set up 200 years ago: most of society has moved on since then, and wants different laws (or at least different interpretations) to reflect its principles.
But what if precise laws for AIs were burnt in around the year 2020, for all eternity? Well, one might say, real constitutions always have explicit clauses that allow for their own modification (in the US Constitution it’s Article V). But looking at the actual constitutions of countries around the world isn’t terribly encouraging. Some just say basically that the constitution can be changed if some supreme leader (a person) says so. Many say that the constitution can be changed through some democratic process—in effect by some sequence of majority or similar votes. And some basically define a bureaucratic process for change so complex that one wonders if it’s formally undecidable whether it would ever come to a conclusion.
At first, the democratic scheme seems like an obvious winner. But it’s fundamentally based on the concept that people are somehow easy to count (of course, one can argue about which people, etc.). But what happens when personhood gets more complicated? When, for example, there are in effect uploaded human consciousnesses, deeply intertwined with AIs? Well, one might say, there’s always got to be some “indivisible person” involved. And yes, I can imagine little clumps of pineal gland cells that are maintained to define “a person”, just like in the past they were thought to be the seat of the soul. But from the basic science I’ve done I think I can say for certain that none of this will ultimately work—because in the end the computational processes that define things just don’t have this kind of indivisibility.
So what happens to “democracy” when there are no longer “people to count”? One can imagine all sorts of schemes, involving identifying the density of certain features in “people space”. I suppose one can also imagine some kind of bizarre voting involving transfinite numbers of entities, in which perhaps the axiomatization of set theory has a key effect on the future of history.
It’s an interesting question how to set up a constitution in which change is “burned in”. There’s a very simple example in bitcoin, where the protocol just defines by fiat that the value of mined bitcoin goes down every year. Of course, that setup is in a sense based on a model of the world—and in particular on something like Moore’s Law and the apparent shortterm predictability of technological development. But following the same general idea, one might starting thinking about a constitution that says “change 1% of the symbolic code in this every year”. But then one’s back to having to decide “which 1%?”. Maybe it’d be based on usage, or observations of the world, or some machinelearning procedure. But whatever algorithm or metaalgorithm is involved, there’s still at some point something that has to be defined once and for all.
Can one make a general theory of change? At first, this might seem hopeless. But in a sense exploring the computational universe of programs is like seeing a spectrum of all possible changes. And there’s definitely some general science that can be done on such things. And maybe there’s some setup—beyond just “fork whenever there could be a change”—that would let one have a constitution that appropriately allows for change, as well as changing the way one allows for change, and so on.
OK, we’ve talked about some farreaching and foundational issues. But what about the here and now? Well, I think the exciting thing is that 300 years after Gottfried Leibniz died, we’re finally in a position to do what he dreamed of: to create a general symbolic discourse language, and to apply it to build a framework for computational law.
With the Wolfram Language we have the foundational symbolic system—as well as a lot of knowledge of the world—to start from. There’s still plenty to do, but I think there’s now a definite path forward. And it really helps that in addition to the abstract intellectual challenge of creating a symbolic discourse language, there’s now also a definite target in mind: being able to set up practical systems for computational law.
It’s not going to be easy. But I think the world is ready for it, and needs it. There are simple smart contracts already in things like bitcoin and Ethereum, but there’s vastly more that can be done—and with a full symbolic discourse language the whole spectrum of activities covered by law becomes potentially accessible to structured computation. It’s going to lead to all sorts of both practical and conceptual advances. And it’s going to enable new legal, commercial and societal structures—in which, among other things, computers are drawn still further into the conduct of human affairs.
I think it’s also going to be critical in defining the overall framework for AIs in the future. What ethics, and what principles, should they follow? How do we communicate these to them? For ourselves and for the AIs we need a way to formulate what we want. And for that we need a symbolic discourse language. Leibniz had the right idea, but 300 years too early. Now in our time I’m hoping we’re finally going to get to build for real what he only imagined. And in doing so we’re going to take yet another big step forward in harnessing the power of the computational paradigm.
]]>Not many years ago, the idea of having a computer broadly answer questions asked in plain English seemed like science fiction. But when we released WolframAlpha in 2009 one of the big surprises (not least to me!) was that we’d managed to make this actually work. And by now people routinely ask personal assistant systems—many powered by WolframAlpha—zillions of questions in ordinary language every day.
It all works fairly well for quick questions, or short commands (though we’re always trying to make it better!). But what about more sophisticated things? What’s the best way to communicate more seriously with AIs?
I’ve been thinking about this for quite a while, trying to fit together clues from philosophy, linguistics, neuroscience, computer science and other areas. And somewhat to my surprise, what I’ve realized recently is that a big part of the answer may actually be sitting right in front of me, in the form of what I’ve been building towards for the past 30 years: the Wolfram Language.
Maybe this is a case of having a hammer and then seeing everything as a nail. But I’m pretty sure there’s more to it. And at the very least, thinking through the issue is a way to understand more about AIs and their relation to humans.
The first key point—that I came to understand clearly only after a series of discoveries I made in basic science—is that computation is a very powerful thing, that lets even tiny programs (like cellular automata, or neural networks) behave in incredibly complicated ways. And it’s this kind of thing that an AI can harness.
Looking at pictures like this we might be pessimistic: how are we humans going to communicate usefully about all that complexity? Ultimately, what we have to hope is that we can build some kind of bridge between what our brains can handle and what computation can do. And although I didn’t look at it quite this way, this turns out to be essentially just what I’ve been trying to do all these years in designing the Wolfram Language.
I have seen my role as being to identify lumps of computation that people will understand and want to use, like FindShortestTour, ImageIdentify or Predict. Traditional computer languages have concentrated on lowlevel constructs close to the actual hardware of computers. But in the Wolfram Language I’ve instead started from what we humans understand, and then tried to capture as much of it as possible in the language.
In the early years, we were mostly dealing with fairly abstract concepts, about, say, mathematics or logic or abstract networks. But one of the big achievements of recent years—closely related to WolframAlpha—has been that we’ve been able to extend the structure we built to cover countless real kinds of things in the world—like cities or movies or animals.
One might wonder: why invent a language for all this; why not just use, say, English? Well, for specific things, like “hot pink”, “new york city” or “moons of pluto”, English is good—and actually for such things the Wolfram Language lets people just use English. But when one’s trying to describe more complex things, plain English pretty quickly gets unwieldy.
Imagine for example trying to describe even a fairly simple algorithmic program. A backandforth dialog—“Turingtest style”—would rapidly get frustrating. And a straight piece of English would almost certainly end up with incredibly convoluted prose like one finds in complex legal documents.
But the Wolfram Language is built precisely to solve such problems. It’s set up to be readily understandable to humans, capturing the way humans describe and think about things. Yet it also has a structure that allows arbitrary complexity to be assembled and communicated. And, of course, it’s readily understandable not just by humans, but also by machines.
I realize I’ve actually been thinking and communicating in a mixture of English and Wolfram Language for years. When I give talks, for example, I’ll say something in English, then I’ll just start typing to communicate my next thought with a piece of Wolfram Language code that executes right there.
But let’s get back to AI. For most of the history of computing, we’ve built programs by having human programmers explicitly write lines of code, understanding (apart from bugs!) what each line does. But achieving what can reasonably be called AI requires harnessing more of the power of computation. And to do this one has to go beyond programs that humans can directly write—and somehow automatically sample a broader swath of possible programs.
We can do this through the kind of algorithm automation we’ve long used in Mathematica and the Wolfram Language, or we can do it through explicit machine learning, or through searching the computational universe of possible programs. But however we do it, one feature of the programs that come out is that they have no reason to be understandable by humans.
At some level it’s unsettling. We don’t know how the programs work inside, or what they might be capable of. But we know they’re doing elaborate computation that’s in a sense irreducibly complex to analyze.
There’s another, very familiar place where the same kind of thing happens: the natural world. Whether we look at fluid dynamics, or biology, or whatever, we see all sorts of complexity. And in fact the Principle of Computational Equivalence that emerged from the basic science I did implies that this complexity is in a sense exactly the same as the complexity that can occur in computational systems.
Over the centuries we’ve been able to identify aspects of the natural world that we can understand, and then harness them to create technology that’s useful to us. And our traditional engineering approach to programming works more or less the same way.
But for AI, we have to venture out into the broader computational universe, where—as in the natural world—we’re inevitably dealing with things we cannot readily understand.
Let’s imagine we have a perfect, complete AI, that’s able to do anything we might reasonably associate with intelligence. Maybe it’ll get input from lots of IoT sensors. And it has all sorts of computation going on inside. But what is it ultimately going to try to do? What is its purpose going to be?
This is about to dive into some fairly deep philosophy, involving issues that have been batted around for thousands of years—but which finally are going to really matter in dealing with AIs.
One might think that as an AI becomes more sophisticated, so would its purposes, and that eventually the AI would end up with some sort of ultimate abstract purpose. But this doesn’t make sense. Because there is really no such thing as abstractly defined absolute purpose, derivable in some purely formal mathematical or computational way. Purpose is something that’s defined only with respect to humans, and their particular history and culture.
An “abstract AI”, not connected to human purposes, will just go along doing computation. And as with most cellular automata and most systems in nature, we won’t be able to identify—or attribute—any particular “purpose” to that computation, or to the system that does it.
Technology has always been about automating things so humans can define goals, and then those goals can automatically be achieved by the technology.
For most kinds of technology, those goals have been tightly constrained, and not too hard to describe. But for a general computational system they can be completely arbitrary. So then the challenge is how to describe them.
What do you say to an AI to tell it what you want it to do for you? You’re not going to be able to tell it exactly what to do in each and every circumstance. You’d only be able to do that if the computations the AI could do were tightly constrained, like in traditional software engineering. But for the AI to work properly, it’s going to have to make use of broader parts of the computational universe. And it’s then a consequence of a phenomenon I call computational irreducibility that you’ll never be able to determine everything it’ll do.
So what’s the best way to define goals for an AI? It’s complicated. If the AI can experience your life alongside you—seeing what you see, reading your email, and so on—then, just like with a person you know well, you might be able to tell the AI at least simple goals just by saying them in natural language.
But what if you want to define more complex goals, or goals that aren’t closely associated with what the AI has already experienced? Then small amounts of natural language wouldn’t be enough. Perhaps the AI could go through a whole education. But a better idea would be to leverage what we have in the Wolfram Language, which in effect already has lots of knowledge of the world built into it, in a way that both the human and the AI can use.
Thinking about how humans communicate with AIs is one thing. But how will AIs communicate with one another? One might imagine they could do literal transfers of their underlying representations of knowledge. But that wouldn’t work, because as soon as two AIs have had different “experiences”, the representations they use will inevitably be at least somewhat different.
And so, just like humans, the AIs are going to end up needing to use some form of symbolic language that represents concepts abstractly, without specific reference to the underlying representations of those concepts.
One might then think the AIs should just communicate in English; at least that way we’d be able to understand them! But it wouldn’t work out. Because the AIs would inevitably need to progressively extend their language—so even if it started as English, it wouldn’t stay that way.
In human natural languages, new words get added when there are new concepts that are widespread enough to make representing them in the language useful. Sometimes a new concept is associated with something new in the world (“blog”, “emoji”, “smartphone”, “clickbait”, etc.); sometimes it’s associated with a new distinction among existing things (“road” vs. “freeway”, “pattern” vs. “fractal”).
Often it’s science that gives us new distinctions between things, by identifying distinct clusters of behavior or structure. But the point is that AIs can do that on a much larger scale than humans. For example, our Image Identification Project is set up to recognize the 10,000 or so kinds of objects that we humans have everyday names for. But internally, as it’s trained on images from the world, it’s discovering all sorts of other distinctions that we don’t have names for, but that are successful at robustly separating things.
I’ve called these “postlinguistic emergent concepts” (or PLECs). And I think it’s inevitable that in a population of AIs, an everexpanding hierarchy of PLECs will appear, forcing the language of the AIs to progressively expand.
But how could the framework of English support that? I suppose each new concept could be assigned a word formed from some hashcodelike collection of letters. But a structured symbolic language—as the Wolfram Language is—provides a much better framework. Because it doesn’t require the units of the language to be simple “words”, but allows them to be arbitrary lumps of symbolic information, such as collections of examples (so that, for example, a word can be represented by a symbolic structure that carries around its definitions).
So should AIs talk to each other in Wolfram Language? It seems to make a lot of sense—because it effectively starts from the understanding of the world that’s been developed through human knowledge, but then provides a framework for going further. It doesn’t matter how the syntax is encoded (input form, XML, JSON, binary, whatever). What matters is the structure and content that are built into the language.
Over the course of the billions of years that life has existed on Earth, there’ve been a few different ways of transferring information. The most basic is genomics: passing information at the hardware level. But then there are neural systems, like brains. And these get information—like our Image Identification Project—by accumulating it from experiencing the world. This is the mechanism that organisms use to see, and to do many other “AIish” things.
But in a sense this mechanism is fundamentally limited, because every different organism—and every different brain—has to go through the whole process of learning for itself: none of the information obtained in one generation can readily be passed to the next.
But this is where our species made its great invention: natural language. Because with natural language it’s possible to take information that’s been learned, and communicate it in abstract form, say from one generation to the next. There’s still a problem however, because when natural language is received, it still has to be interpreted, in a separate way in each brain.
And this is where the idea of a computationalknowledge language—like the Wolfram Language—is important: because it gives a way to communicate concepts and facts about the world, in a way that can immediately and reproducibly be executed, without requiring separate interpretation on the part of whatever receives it.
It’s probably not a stretch to say that the invention of human natural language was what led to civilization and our modern world. So then what are the implications of going to another level: of having a precise computationalknowledge language, that carries not just abstract concepts, but also a way to execute them?
One possibility is that it may define the civilization of the AIs, whatever that may turn out to be. And perhaps this may be far from what we humans—at least in our present state—can understand. But the good news is that at least in the case of the Wolfram Language, precise computationalknowledge language isn’t incomprehensible to humans; in fact, it was specifically constructed to be a bridge between what humans can understand, and what machines can readily deal with.
So let’s imagine a world in which in addition to natural language, it’s also common for communication to occur through a computationalknowledge language like the Wolfram Language. Certainly, a lot of the computationalknowledgelanguage communication will be between machines. But some of it will be between humans and machines, and quite possibly it would be the dominant form of communication here.
In today’s world, only a small fraction of people can write computer code—just as, 500 or so years ago, only a small fraction of people could write natural language. But what if a wave of computer literacy swept through, and the result was that most people could write knowledgebased code?
Natural language literacy enabled many features of modern society. What would knowledgebased code literacy enable? There are plenty of simple things. Today you might get a menu of choices at a restaurant. But if people could read code, there could be code for each choice, that you could readily modify to your liking. (And actually, something very much like this is soon going be possible—with Wolfram Language code—for biology and chemistry lab experiments.) Another implication of people being able to read code is for rules and contracts: instead of just writing prose to be interpreted, one can have code to be read by humans and machines alike.
But I suspect the implications of widespread knowledgebased code literacy will be much deeper—because it will not only give a wide range of people a new way to express things, but will also give them a new way to think about them.
So, OK, let’s say we want to use the Wolfram Language to communicate with AIs. Will it actually work? To some extent we know it already does. Because inside WolframAlpha and the systems based on it, what’s happening is that natural language questions are being converted to Wolfram Language code.
But what about more elaborate applications of AI? Many places where the Wolfram Language is used are examples of AI, whether they’re computing with images or text or data or symbolic structures. Sometimes the computations involve algorithms whose goals we can precisely define, like FindShortestTour; sometimes they involve algorithms whose goals are less precise, like ImageIdentify. Sometimes the computations are couched in the form of “things to do”, sometimes as “things to look for” or “things to aim for”.
We’ve come a long way in representing the world in the Wolfram Language. But there’s still more to do. Back in the 1600s it was quite popular to try to create “philosophical languages” that would somehow symbolically capture the essence of everything one could think about. Now we need to really do this. And, for example, to capture in a symbolic way all the kinds of actions and processes that can happen, as well as things like people’s beliefs and mental states. As our AIs become more sophisticated and more integrated into our lives, representing these kinds of things will become more important.
For some tasks and activities we’ll no doubt be able to use pure machine learning, and never have to build up any kind of intermediate structure or language. But much as natural language was crucial in enabling our species to get where we have, so also having an abstract language will be important for the progress of AI.
I’m not sure what it would look like, but we could perhaps imagine using some kind of pure emergent language produced by the AIs. But if we do that, then we humans can expect to be left behind, and to have no chance of understanding what the AIs are doing. But with the Wolfram Language we have a bridge, because we have a language that’s suitable for both humans and AIs.
There’s much to be said about the interplay between language and computation, humans and AIs. Perhaps I need to write a book about it. But my purpose here has been to describe a little of my current thinking, particularly my realizations about the Wolfram Language as a bridge between human understanding and AI.
With pure natural language or traditional computer language, we’ll be hard pressed to communicate much to our AIs. But what I’ve been realizing is that with Wolfram Language there’s a much richer alternative, readily extensible by the AIs, but built on a base that leverages human natural language and human knowledge to maintain a connection with what we humans can understand. We’re seeing early examples already… but there’s a lot further to go, and I’m looking forward to actually building what’s needed, as well as writing about it…
]]>“What is this a picture of?” Humans can usually answer such questions instantly, but in the past it’s always seemed out of reach for computers to do this. For nearly 40 years I’ve been sure computers would eventually get there—but I’ve wondered when.
I’ve built systems that give computers all sorts of intelligence, much of it far beyond the human level. And for a long time we’ve been integrating all that intelligence into the Wolfram Language.
Now I’m excited to be able to say that we’ve reached a milestone: there’s finally a function called ImageIdentify built into the Wolfram Language that lets you ask, “What is this a picture of?”—and get an answer.
And today we’re launching the Wolfram Language Image Identification Project on the web to let anyone easily take any picture (drag it from a web page, snap it on your phone, or load it from a file) and see what ImageIdentify thinks it is:
It won’t always get it right, but most of the time I think it does remarkably well. And to me what’s particularly fascinating is that when it does get something wrong, the mistakes it makes mostly seem remarkably human.
It’s a nice practical example of artificial intelligence. But to me what’s more important is that we’ve reached the point where we can integrate this kind of “AI operation” right into the Wolfram Language—to use as a new, powerful building block for knowledgebased programming.
In a Wolfram Language session, all you need do to identify an image is feed it to the ImageIdentify function:
What you get back is a symbolic entity, that the Wolfram Language can then do more computation with—like, in this case, figure out if you’ve got an animal, a mammal, etc. Or just ask for a definition:
Or, say, generate a word cloud from its Wikipedia entry:
And if one had lots of photographs, one could immediately write a Wolfram Language program that, for example, gave statistics on the different kinds of animals, or planes, or devices, or whatever, that appear in the photographs.
With ImageIdentify built right into the Wolfram Language, it’s easy to create APIs, or apps, that use it. And with the Wolfram Cloud, it’s also easy to create websites—like the Wolfram Language Image Identification Project.
For me personally, I’ve been waiting a long time for ImageIdentify. Nearly 40 years ago I read books with titles like The Computer and the Brain that made it sound inevitable we’d someday achieve artificial intelligence—probably by emulating the electrical connections in a brain. And in 1980, buoyed by the success of my first computer language, I decided I should think about what it would take to achieve fullscale artificial intelligence.
Part of what encouraged me was that—in an early premonition of the Wolfram Language—I’d based my first computer language on powerful symbolic pattern matching that I imagined could somehow capture certain aspects of human thinking. But I knew that while tasks like image identification were also based on pattern matching, they needed something different—a more approximate form of matching.
I tried to invent things like approximate hashing schemes. But I kept on thinking that brains manage to do this; we should get clues from them. And this led me to start studying idealized neural networks and their behavior.
Meanwhile, I was also working on some fundamental questions in natural science—about cosmology and about how structures arise in our universe—and studying the behavior of selfgravitating collections of particles.
And at some point I realized that both neural networks and selfgravitating gases were examples of systems that had simple underlying components, but somehow achieved complex overall behavior. And in getting to the bottom of this, I wound up studying cellular automata and eventually making all the discoveries that became A New Kind of Science.
So what about neural networks? They weren’t my favorite type of system: they seemed a little too arbitrary and complicated in their structure compared to the other systems that I studied in the computational universe. But every so often I would think about them again, running simulations to understand more about the basic science of their behavior, or trying to see how they could be used for practical tasks like approximate pattern matching:
Neural networks in general have had a remarkable rollercoaster history. They first burst onto the scene in the 1940s. But by the 1960s, their popularity had waned, and the word was that it had been “mathematically proven” that they could never do anything very useful.
It turned out, though, that that was only true for onelayer “perceptron” networks. And in the early 1980s, there was a resurgence of interest, based on neural networks that also had a “hidden layer”. But despite knowing many of the leaders of this effort, I have to say I remained something of a skeptic, not least because I had the impression that neural networks were mostly getting used for tasks that seemed like they would be easy to do in lots of other ways.
I also felt that neural networks were overly complex as formal systems—and at one point even tried to develop my own alternative. But still I supported people at my academic research center studying neural networks, and included papers about them in my Complex Systems journal.
I knew that there were practical applications of neural networks out there—like for visual character recognition—but they were few and far between. And as the years went by, little of general applicability seemed to emerge.
Meanwhile, we’d been busy developing lots of powerful and very practical ways of analyzing data, in Mathematica and in what would become the Wolfram Language. And a few years ago we decided it was time to go further—and to try to integrate highly automated general machine learning. The idea was to make broad, general functions with lots of power; for example, to have a single function Classify that could be trained to classify any kind of thing: say, day vs. night photographs, sounds from different musical instruments, urgency level of email, or whatever.
We put in lots of stateoftheart methods. But, more importantly, we tried to achieve complete automation, so that users didn’t have to know anything about machine learning: they just had to call Classify.
I wasn’t initially sure it was going to work. But it does, and spectacularly.
People can give training data on pretty much anything, and the Wolfram Language automatically sets up classifiers for them to use. We’re also providing more and more builtin classifiers, like for languages, or country flags:
And a little while ago, we decided it was time to try a classic largescale classifier problem: image identification. And the result now is ImageIdentify.
What is image identification really about? There are some number of named kinds of things in the world, and the point is to tell which of them a particular picture is of. Or, more formally, to map all possible images into a certain set of symbolic names of objects.
We don’t have any intrinsic way to describe an object like a chair. All we can do is just give lots of examples of chairs, and effectively say, “Anything that looks like one of these we want to identify as a chair.” So in effect we want images that are “close” to our examples of chairs to map to the name “chair”, and others not to.
Now, there are lots of systems that have this kind of “attractor” behavior. As a physical example, think of a mountainscape. A drop of rain may fall anywhere on the mountains, but (at least in an idealized model) it will flow down to one of a limited number of lowest points. Nearby drops will tend to flow to the same lowest point. Drops far away may be on the other side of a watershed, and so will flow to other lowest points.
The drops of rain are like our images; the lowest points are like the different kinds of objects. With raindrops we’re talking about things physically moving, under gravity. But images are composed of digital pixels. And instead of thinking about physical motion, we have to think about digital values being processed by programs.
And exactly the same “attractor” behavior can happen there. For example, there are lots of cellular automata in which one can change the colors of a few cells in their initial conditions, but still end up in the same fixed “attractor” final state. (Most cellular automata actually show more interesting behavior, that doesn’t go to a fixed state, but it’s less clear how to apply this to recognition tasks.)
So what happens if we take images and apply cellular automaton rules to them? In effect we’re doing image processing, and indeed some common image processing operations (both done on computers and in human visual processing) are just simple 2D cellular automata.
It’s easy to get cellular automata to pick out certain features of an image—like blobs of dark pixels. But for real image identification, there’s more to do. In the mountain analogy, we have to “sculpt” the mountainscape so that the right raindrops flow to the right points.
So how do we do this? In the case of digital data like images, it isn’t known how to do this in one fell swoop; we only know how to do it iteratively, and incrementally. We have to start from a base “flat” system, and gradually do the “sculpting”.
There’s a lot that isn’t known about this kind of iterative sculpting. I’ve thought about it quite extensively for discrete programs like cellular automata (and Turing machines), and I’m sure something very interesting can be done. But I’ve never figured out just how.
For systems with continuous (realnumber) parameters, however, there’s a great method called back propagation—that’s based on calculus. It’s essentially a version of the very common method of gradient descent, in which one computes derivatives, then uses them to work out how to change parameters to get the system one is using to better fit the behavior one wants.
So what kind of system should one use? A surprisingly general choice is neural networks. The name makes one think of brains and biology. But for our purposes, neural networks are just formal, computational, systems, that consist of compositions of multiinput functions with continuous parameters and discrete thresholds.
How easy is it to make one of these neural networks perform interesting tasks? In the abstract, it’s hard to know. And for at least 20 years my impression was that in practice neural networks could mostly do only things that were also pretty easy to do in other ways.
But a few years ago that began to change. And one started hearing about serious successes in applying neural networks to practical problems, like image identification.
What made that happen? Computers (and especially linear algebra in GPUs) got fast enough that—with a variety of algorithmic tricks, some actually involving cellular automata—it became practical to train neural networks with millions of neurons, on millions of examples. (By the way, these were “deep” neural networks, no longer restricted to having very few layers.) And somehow this suddenly brought largescale practical applications within reach.
I don’t think it’s a coincidence that this happened right when the number of artificial neurons being used came within striking distance of the number of neurons in relevant parts of our brains.
It’s not that this number is significant on its own. Rather, it’s that if we’re trying to do tasks—like image identification—that human brains do, then it’s not surprising if we need a system with a similar scale.
Humans can readily recognize a few thousand kinds of things—roughly the number of picturable nouns in human languages. Lower animals likely distinguish vastly fewer kinds of things. But if we’re trying to achieve “humanlike” image identification—and effectively map images to words that exist in human languages—then this defines a certain scale of problem, which, it appears, can be solved with a “humanscale” neural network.
There are certainly differences between computational and biological neural networks—although after a network is trained, the process of, say, getting a result from an image seems rather similar. But the methods used to train computational neural networks are significantly different from what it seems plausible for biology to use.
Still, in the actual development of ImageIdentify, I was quite shocked at how much was reminiscent of the biological case. For a start, the number of training images—a few tens of millions—seemed very comparable to the number of distinct views of objects that humans get in their first couple of years of life.
There were also quirks of training that seemed very close to what’s seen in the biological case. For example, at one point, we’d made the mistake of having no human faces in our training. And when we showed a picture of Indiana Jones, the system was blind to the presence of his face, and just identified the picture as a hat. Not surprising, perhaps, but to me strikingly reminiscent of the classic vision experiment in which kittens reared in an environment of vertical stripes are blind to horizontal stripes.
Probably much like the brain, the ImageIdentify neural network has many layers, containing a variety of different kinds of neurons. (The overall structure, needless to say, is nicely described by a Wolfram Language symbolic expression.)
It’s hard to say meaningful things about much of what’s going on inside the network. But if one looks at the first layer or two, one can recognize some of the features that it’s picking out. And they seem to be remarkably similar to features we know are picked out by real neurons in the primary visual cortex.
I myself have long been interested in things like visual texture recognition (are there “texture primitives”, like there are primary colors?), and I suspect we’re now going to be able to figure out a lot about this. I also think it’s of great interest to look at what happens at later layers in the neural network—because if we can recognize them, what we should see are “emergent concepts” that in effect describe classes of images and objects in the world—including ones for which we don’t yet have words in human languages.
Like many projects we tackle for the Wolfram Language, developing ImageIdentify required bringing many diverse things together. Largescale curation of training images. Development of a general ontology of picturable objects, with mapping to standard Wolfram Language constructs. Analysis of the dynamics of neural networks using physicslike methods. Detailed optimization of parallel code. Even some searching in the style of A New Kind of Science for programs in the computational universe. And lots of judgement calls about how to create functionality that would actually be useful in practice.
At the outset, it wasn’t clear to me that the whole ImageIdentify project was going to work. And early on, the rate of utterly misidentified images was disturbingly high. But one issue after another got addressed, and gradually it became clear that finally we were at a point in history when it would be possible to create a useful ImageIdentify function.
There were still plenty of problems. The system would do well on certain things, but fail on others. Then we’d adjust something, and there’d be new failures, and a flurry of messages with subject lines like “We lost the anteaters!” (about how pictures that ImageIdentify used to correctly identify as anteaters were suddenly being identified as something completely different).
Debugging ImageIdentify was an interesting process. What counts as reasonable input? What’s reasonable output? How should one make the choice between getting morespecific results, and getting results that one’s more certain aren’t incorrect (just a dog, or a hunting dog, or a beagle)?
Sometimes we saw things that at first seemed completely crazy. A pig misidentified as a “harness”. A piece of stonework misidentified as a “moped”. But the good news was that we always found a cause—like confusion from the same irrelevant objects repeatedly being in training images for a particular type of object (e.g. “the only time ImageIdentify had ever seen that type of Asian stonework was in pictures that also had mopeds”).
To test the system, I often tried slightly unusual or unexpected images:
And what I found was something very striking, and charming. Yes, ImageIdentify could be completely wrong. But somehow the errors seemed very understandable, and in a sense very human. It seemed as if what ImageIdentify was doing was successfully capturing some of the essence of the human process of identifying images.
So what about things like abstract art? It’s a kind of Rorschachlike test for both humans and machines—and an interesting glimpse into the “mind” of ImageIdentify:
Something like ImageIdentify will never truly be finished. But a couple of months ago we released a preliminary version in the Wolfram Language. And today we’ve updated that version, and used it to launch the Wolfram Language Image Identification Project.
We’ll continue training and developing ImageIdentify, not least based on feedback and statistics from the site. Like for WolframAlpha in the domain of natural language understanding, without actual usage by humans there’s no real way to realistically assess progress—or even to define just what the goals should be for “natural image understanding”.
I must say that I find it fun to play with the Wolfram Language Image Identification Project. It’s satisfying after all these years to see this kind of artificial intelligence actually working. But more than that, when you see ImageIdentify respond to a weird or challenging image, there’s often a certain “aha” feeling, like one was just shown in a very humanlike way some new insight—or joke—about an image.
Underneath, of course, it’s just running code—with very simple inner loops that are pretty much the same as, for example, in my neural network programs from the beginning of the 1980s (except that now they’re Wolfram Language functions, rather than lowlevel C code).
It’s a fascinating—and extremely unusual—example in the history of ideas: neural networks were studied for 70 years, and repeatedly dismissed. Yet now they are what has brought us success in such a quintessential example of an artificial intelligence task as image identification. I expect the original pioneers of neural networks—like Warren McCulloch and Walter Pitts—would find little surprising about the core of what the Wolfram Language Image Identification Project does, though they might be amazed that it’s taken 70 years to get here.
But to me the greater significance is what can now be done by integrating things like ImageIdentify into the whole symbolic structure of the Wolfram Language. What ImageIdentify does is something humans learn to do in each generation. But symbolic language gives us the opportunity to represent shared intellectual achievements across all of human history. And making all these things computational is, I believe, something of monumental significance, that I am only just beginning to understand.
But for today, I hope you will enjoy the Wolfram Language Image Identification Project. Think of it as a celebration of where artificial intelligence has reached. Think of it as an intellectual recreation that helps build intuition for what artificial intelligence is like. But don’t forget the part that I think is most exciting: it’s also practical technology, that you can use here and now in the Wolfram Language, and deploy wherever you want.
]]>Two weeks ago I spoke at SXSW Interactive in Austin, TX. Here’s a slightly edited transcript (it’s the “speaker’s cut”, including some demos I had to abandon during the talk):
Well, I’ve got a lot planned for this hour.
Basically, I want to tell you a story that’s been unfolding for me for about the last 40 years, and that’s just coming to fruition in a really exciting way. And by just coming to fruition, I mean pretty much today. Because I’m planning to show you today a whole lot of technology that’s the result of that 40year story—that I’ve never shown before, and that I think is going to be pretty important.
I always like to do live demos. But today I’m going to be pretty extreme. Showing you a lot of stuff that’s very very fresh. And I hope at least a decent fraction of it is going to work.
OK, here’s the big theme: taking computation seriously. Really understanding the idea of computation. And then building technology that lets one inject it everywhere—and then seeing what that means.
I’ve pretty much been chasing this idea for 40 years. I’ve been kind of alternating between science and technology—and making these bigger and bigger building blocks. Kind of making this taller and taller stack. And every few years I’ve been able to see a bit farther. And I think making some interesting things. But in the last couple of years, something really exciting has happened. Some kind of grand unification—which is leading to a kind of Cambrian explosion of technology. Which is what I’m going to be showing you pieces of for the first time here today.
But just for context, let me tell you a bit of the backstory. Forty years ago, I was a 14yearold kid who’d just started using a computer—which was then about the size of a desk. I was using it not so much for its own sake, but instead to try to figure out things about physics, which is what I was really interested in. And I actually figured out a few things—which even still get used today. But in retrospect, I think the most important thing I figured out was kind of a meta thing. That the better the tools one uses, the further one can get. Like I was never good at doing math by hand, which in those days was a problem if you wanted to be a physicist. But I realized one could do math by computer. And I started building tools for that. And pretty soon me with my tools were better than almost anyone at doing math for physics.
And back in 1981—somewhat shockingly in those days for a 21yearold professor type—I turned that into my first product and my first company. And one important thing is that it made me realize that products can really drive intellectual thinking. I needed to figure out how to make a language for doing math by computer, and I ended up figuring out these fundamental things about computation to be able to do that. Well, after that I dived back into basic science again, using my computer tools.
And I ended up deciding that while math was fine, the whole idea of it really needed to be generalized. And I started looking at the whole universe of possible formal systems—in effect the whole computational universe of possible programs. I started doing little experiments. Kind of pointing my computational telescope into this computational universe, and seeing what was out there. And it was pretty amazing. Like here are a few simple programs.
Some of them do simple things. But some of them—well, they’re not simple at all.
This is my alltime favorite, because it’s the first one like this that I saw. It’s called rule 30, and I still have it on the back of my business cards 30 years later.
Trivial program. Trivial start. But it does something crazy. It sort of just makes complexity from nothing. Which is a pretty interesting phenomenon. That I think, by the way, captures a big secret of how things work in nature. And, yes, I’ve spent years studying this, and it’s really interesting.
But when I was first studying it, the big thing I realized was: I need better tools. And basically that’s why I built Mathematica. It’s sort of ironic that Mathematica has math in its name. Because in a sense I built it to get beyond math. In Mathematica my original big idea was to kind of drill down below all the math and so on that one wanted to do—and find the computational bedrock that it could all be built on. And that’s how I ended up inventing the language that’s in Mathematica. And over the years, it’s worked out really well. We’ve been able to build ever more and more on it.
And in fact Mathematica celebrated its 25th anniversary last year—and in those 25 years it’s gotten used to invent and discover and learn a zillion things—in pretty much all the universities and big companies and so on around the world. And actually I myself managed to carve out a decade to actually use Mathematica to do science myself. And I ended up discovering lots of things—scientific, technological and philosophical—and wrote this big book about them.
Well, OK, back when I was a kid something I was always interested in was systematizing information. And I had this idea that one day one should be able to automate being able to answer questions about basically anything. I figured out a lot about how to answer questions about math computations. But somehow I imagined that to do this in general, one would need some kind of general artificial intelligence—some sort of brainlike AI. And that seemed very hard to make.
And every decade or so I would revisit that. And conclude that, yes, that was still hard to make. But doing the science I did, I realized something. I realized that if one even just runs a tiny program, it can end up doing something of sort of brainlike complexity.
There really isn’t ultimately a distinction between brainlike intelligence, and this. And that’s got lots of implications for things like free will versus determinism, and the search for extraterrestrial intelligence. But for me it also made me realize that you shouldn’t need a brainlike AI to be able to answer all those questions about things. Maybe all you need is just computation. Like the kind we’d spent years building in Mathematica.
I wasn’t sure if it was the right decade, or even the right century. But I guess that’s the advantage of having a simple private company and being in charge; I just decided to do the experiment anyway. And, I’m happy to say, it turned out it was possible. And we built WolframAlpha.
You type stuff in, in natural language. And it uses all the curated data and knowledge and methods and algorithms that we’ve put into it, to basically generate a report about what you asked. And, yes, if you’re a WolframAlpha user, you might notice that WolframAlpha on the web just got a new spiffier look yesterday. WolframAlpha knows about all sorts of things. Thousands of domains, covering a really broad area. Trillions of pieces of data.
And indeed, every day many millions of people ask it all sorts of things—directly on the website, or through its apps or things like Siri that use it.
Well, OK, so we have Mathematica, which has this kind of bedrock language for describing computations—and for doing all sorts of technical computations. And we also have WolframAlpha—which knows a lot about the world—and which people interact with in this sort of much messier way through natural language. Well, Mathematica has been growing for more than 25 years, WolframAlpha for nearly 5. We’ve continually been inventing ways to take the basic ideas of these systems further and further. But now something really big and amazing has happened. And actually for me it was catalyzed by another piece: the cloud.
Now I didn’t think the cloud was really an intellectual thing. I thought it was just sort of a utility. But I was wrong. Because I finally understood how it’s the missing piece that lets one take kind of the two big approaches to computation in Mathematica and in WolframAlpha and make something just dramatically bigger from them.
Now, I’ve got to tell you that what comes out of all of this is pretty intellectually complicated. But it’s also very very directly practical. I always like these situations. Where big ideas let one make actually really useful new products. And that’s what’s happened here. We’ve taken one big idea, and we’re making a bunch of products—that I hope will be really useful. And at some level each product is pretty easy to explain. But the most exciting thing is what they all mean together. And that’s what I’m going to try to talk about here. Though I’ll say up front that even though I think it’s a really important story, it’s not an easy story to tell.
But let’s start. At the core of pretty much everything is what we call the Wolfram Language. Which is something we’re just starting to release now.
The core of the Wolfram Language has been sort of incubating in Mathematica for more than 25 years. It’s kind of been proven there. But what just happened is that we got all these new ideas and technology from WolframAlpha, and from the Cloud. And they’ve let us make something that’s really qualitatively different. And that I’m very excited about.
So what’s the idea? It’s really to make a language that’s knowledge based. A language where built right into the language is huge amounts of knowledge about computation and about the world. You see, most computer languages kind of stay close to the basic operations of the machine. They give you lots of good ways to manage code you build. And maybe they have addon libraries to do specific things.
But our idea with the Wolfram Language is kind of the opposite. It’s to make a language that has as much built in as possible. Where the language itself does as much as possible. To make everything as automated as possible for the programmer.
OK. Well let’s give it a try.
You can use the Wolfram Language completely interactively, using the notebook interface we built for Mathematica.
OK, that’s good. Let’s do something a little harder:
Yup, that’s a big number. Kind of looks like a bunch of random digits. Might be like 60,000 data points of sensor data.
How do we analyze it? Well, the Wolfram Language has all that stuff built in.
So like here’s the mean:
And the skewness:
Or hundreds of other statistical tests. Or visualizations.
That’s kind of weird actually. But let me not get derailed trying to figure out why it looks like that.
OK. Here’s something completely different. Let’s have the Wolfram Language go to some kind volunteer’s Facebook account and pull out their friend network:
OK. So that’s a network. The Wolfram Language knows how to deal with those. Like let’s compute how that breaks into communities:
Let’s try something different. Let’s get an image from this little camera:
OK. Well now let’s do something to that. We can just take that image and feed it to a function:
So now we’ve gotten the image broken into little pieces. Let’s make that dynamic:
Let’s rotate those around:
Let’s like even sort them. We can make some funky stuff:
OK. That’s kind of cool. Why don’t we tweet it?
OK. So the whole point is that the Wolfram Language just intrinsically knows a lot of stuff. It knows how to analyze networks. It knows how to deal with images—doing all the fanciest image processing. But it also knows about the world. Like we could ask it when the sun rose this morning here:
Or the time from sunrise to sunset today:
Or we could get the current recorded air temperature here:
Or the time series for the past day:
OK. Here’s a big thing. Based on what we’ve done for WolframAlpha, we can understand lots of natural language. And what’s really powerful is that we can use that to refer to things in the real world.
Let’s just type control= nyc:
And that just gives us the entity of New York City. So now we can find the temperature difference between here and New York City:
OK. Let’s do some more:
Let’s find the lengths of those borders:
Let’s put that in a grid:
Or maybe let’s make a word cloud out of that:
Or we could find all the former Soviet countries:
And let’s find their flags:
And let’s like find which is closest to the French flag:
Pretty neat, eh?
Or let’s take the first few former Soviet republics. And generate maps of their capital cities. With 10mile discs marked:
I think it’s pretty amazing that you can do that kind of thing right from inside a programming language, with just a line of code.
And, you know, there’s a huge amount of knowledge built into the Wolfram Language. We’ve been building this for more than a quarter of a century.
There’s knowledge about algorithms. And about the world.
There are two big principles here. The first is maximum automation: automate as much as possible. You define what you want the language to do, then it’s up to it to figure out how to do it. There might be hundreds of algorithms for doing different cases of something. But what we want to do is to make a metaalgorithm that selects the best way to do it. So kind of all the human has to do is to define their goal, then it’s up to the system to do things in the way that’s fastest, most accurate, best looking.
Like here’s an example. There’s a function Classify that tries to classify things. You just type Classify. Like here’s a very small training set of handwritten digits:
And this makes a classifier.
Which we can then apply to something we draw:
OK, well here’s another big thing about the Wolfram Language: coherence. Unification. We want to make everything in the language fit together. Even though it’s a huge system, if you’re doing something over here with geographic data, we want to make sure it fits perfectly with what you’re doing over there with networks.
I’ve spent a decent fraction of the last 25 years of my life implementing the kind of design discipline that’s needed. It’s been fascinating, but it’s been hard work. Spending all that time to make things obvious. To make it so it’s easy for people to learn and remember and guess. But you know, having all these building blocks fit together: that’s also where the most powerful new algorithms come from. And we’ve had a great time inventing tons and tons of new algorithms that are really only possible in our language—where we have all these different areas integrated.
And there’s actually a really fundamental reason that we can do this kind of integration. It’s because the Wolfram Language has this very fundamental feature of being symbolic. If you just type x into the language, it doesn’t give some error about x being undefined. x is just a thing—symbolic x—that the language can deal with. Of course that’s very nice for math.
But as far as I am concerned, one of the big discoveries is that this idea of a symbolic language is incredibly powerful for zillions of other things too. Everything in our language is symbolic. Math expressions.
Or entities, like Austin, TX:
Or like a piece of graphics. Here’s a sphere:
Here are a bunch of cylinders:
And because everything is just a symbolic expression, we could pick this up, and, like, do image processing on it:
You know, everything is just a symbolic expression. Like another example is interfaces. Here’s a symbolic slider:
Here’s a whole array of sliders:
You know, once everything is symbolic, there’s just a whole lot you can do. Here’s nesting some purely symbolic function f:
Here’s nesting, like, a function that makes a frame:
And here’s symbolically nesting, like, an interface element:
My gosh, it’s a fractal interface!
You know, once things are symbolic, it’s really easy to hook everything up. Like here’s a plot:
And now it’s trivial to make it interactive:
You can do that with anything:
OK. Here’s another thing that can be made symbolic: documents.
The document I’m typing into here is just another symbolic expression. And you can create whatever you want in it symbolically.
Like here’s some text. We could twirl it around if we want to:
All just symbolic expressions.
OK. So here’s yet another thing that’s a symbolic expression: code. Every piece of code in the Wolfram Language is just a symbolic expression, that can be picked up and manipulated, and passed around, and run, wherever you want. That’s incredibly important for programming. Because it means you can build things in a really modular way. Every piece can stand on its own.
It’s also important for another reason: it’s a great way to deal with the cloud, sort of treating it as a giant active repository for symbolic lumps of computation. And in fact we’ve built this whole infrastructure for that, that I’m going to demo for the first time here today.
Well, let’s say we have a symbolic expression:
Now we can just deploy it to the Cloud like this:
And we’ve got a symbolic CloudObject, with a URL we can go to from anywhere. And there’s our material.
Now let’s make this not static content, but an actual program. And on the web, a good way to do that is to have an API. But with our whole notion of everything being symbolic, we can represent that as just another symbolic expression:
And now we can deploy that to the Cloud:
And we’ve got an Instant API. Now we can just fill in an API parameter ?size=150 and we can run this from anywhere on the web:
And every time what’ll happen is that you’ll be calling that piece of Wolfram Language code in the Wolfram Cloud, and getting the result back. OK.
Here’s another thing to do: make a form. Just change the APIFunction to a FormFunction:
Now what we’ve got is a form:
Let’s add a feature:
Now let’s fill some values into the form:
And when we press Submit, here’s the result:
OK. Let’s try a different case. Here’s a form that takes two cities, and draws a map of the path between them:
Let’s deploy it in the Cloud:
Now let’s fill in the form:
And when we press Submit, here’s what we get:
One line of code and an actual little web app! It’s got quite a bit of technology inside it. Like you see these fields. They’re what we call smart fields. That leverage our natural language understanding stack:
If you don’t give a city, here’s what happens:
When you do give a city, the system is automatically interpreting the inputs as city entities. Let me show you what happens inside. Let’s just define a form that just returns a list of its inputs:
Now if we enter cities, we just get Wolfram Language symbolic entity objects. Which of course we can then compute with:
All right, let’s try something else.
Let’s do a sort of modern programming example. Let’s make a silly app that shows us pictures through the eyes of a cat or a dog. OK, let’s build the framework:
Now let’s pull in an actual algorithm for dog vision. Color channels, and acuity.
OK. Let’s deploy with that:
Now we can send that over as an app. But first let’s build an icon for it:
And now let’s deploy it as a public app:
Now let’s go to the Wolfram Cloud app on an iPad:
And there’s the app we just published:
Now we click that icon—and there we have it: a mobile app running against the Wolfram Language in the Cloud:
And we can just use the iPad camera to input a picture, and then run the app on it:
Pretty neat, eh?
OK, but there’s more. Actually, let me tell you about the first product that’s coming out of our Wolfram Language technology stack. It should be available very soon. We call it the Wolfram Programming Cloud.
It’s all the stuff I’m showing you, but all happening in the Cloud. Including the programming. And, yes, there’s a desktop version too.
OK, so here’s the Programming Cloud:
Deploy from the Cloud. Define a function and just use CloudDeploy[]:
Or use the GUI:
Oh, another thing is to take CDF and deploy it to run in the Cloud.
Let’s take some code from the Wolfram Demonstrations Project. Actually, as it happens, this was the very first Demonstration I wrote when were originally building that site:
Now here’s the deployed Cloud CDF:
It just needs a web browser. And gives arbitrary interactivity by running against the Wolfram Engine in the Cloud.
OK, well, using this technology, another product we’re building is our Data Science Platform.
And the idea is that data comes in, from all sorts of sources. And then we have all these automatic ways to analyze it. Using sort of a giant metaalgorithm. As well as using all the knowledge of the actual world that we have.
Well, then you can program whatever you want with the Wolfram Language. And in the end you can make reports. On demand, like from an API or an app. Or just on a schedule. And we can use our whole CDF symbolic documents to set up these reports.
Like here’s a template for a report on the state of my email inbox. It’s just defined as a symbolic document. That I go ahead and edit.
And then programmatically generate reports from:
You know, there are some really spectacular things we can do with data using our whole symbolic language technology stack. And actually just recently we realized that we can use it to make a very clean unification and generalization of SQL and NoSQL databases. And we’re implementing that in sort of four transparent levels. In memory. In files. In databases. And distributed.
But OK. Another thing is that we’ve got a really good way to represent individual pieces of data. We call it WDF—the Wolfram Data Framework.
And basically what it is, is taking the kind of algorithmic ontology that we built for WolframAlpha—and that we know works—and exposing that. And using our natural language understanding to be able to take unstructured data, and automatically convert it to something that’s structured and computable. And that for example our Data Science Platform can do really good things with.
Well, OK. Here’s another thing. A rapidly increasing source of data out there in the world are connected devices. And we’ve been pretty deeply involved with those. And actually one thing I wanted to do recently was just to find out what devices there are out there. So we started our Connected Devices Project, to just curate the devices out there—just like we curate all sorts of other things in WolframAlpha.
We have about 2500 devices in here now, growing every day. And, yes, we’re using WDF to organize this, and, yes, all this data is available from WolframAlpha.
Well, OK. So there are all these devices. And they measure things and do things. And at some point they typically make web contact. And one thing we’re doing—with our Data Science Platform and everything—is to create a really smooth infrastructure for handling things from there on. For visualizing and analyzing and computing everything that comes from that Internet of Things.
You know, even for devices that haven’t yet made web contact, it can be a bit messier, but we’ve got a framework for handling those too. Like here’s an accelerometer connected to an Arduino:
Let’s see if we can get that data into the Wolfram Language. It’s not too hard:
And now we can immediately plot this:
So that’s connecting a device to the Wolfram Language. But there’s something else coming too. And that’s actually putting the Wolfram Language onto devices. And this is where 25 years of tight software engineering pays back. Because as soon as devices run things like Linux, we can run the Wolfram Language on them. And actually there’s now a preliminary version of the Wolfram Language bundled with the standard operating system for every Raspberry Pi.
It’s pretty neat being able to have little $25 devices that persistently run the Wolfram Language. And connect to sensors and actuators and things. And every little computer out there just gets represented as yet another symbolic object in the Wolfram Language. And, like, it’s trivial to use the builtin parallel computation capabilities of the Wolfram Language to pull data from lots of such machines.
And going forward, you can expect to see the Wolfram Language running on lots of embedded processors. There’s another kind of embedding we’re interested in too. And that’s software embedding. We want to have a Universal Deployment System for the Wolfram Language.
Given a Wolfram Language program, there are lots of ways to deploy it.
Here’s one: being able to call Wolfram Language code from other languages.
And we have a really easy way to do that. There’s a GUI, but in the Wolfram Language, you can just take an API function, and say: create embed code for this for Python. Or Java. Or whatever.
And you can then just insert that code in your external program, and it’ll call the Wolfram Cloud to get a computation done. Actually, there are going to be ways to do this from inside IDEs, like Wolfram Workbench.
This is really easy to set up, and as I said, it just calls the Wolfram Cloud to run Wolfram Language code. But there’s even another concept. There’s an Embedded Wolfram Engine that you can run locally too. And essentially the same code will then work. But now you’re running on your local machine, not in the Cloud. And things get pretty interesting, being able to put Embedded Wolfram Engines inside all kinds of software, to immediately add all that knowledgebased capability, and all those algorithms, and natural language and so on. Here’s what the Embedded Wolfram Engine looks like inside the Unity Game Engine IDE:
Well, talking of embedding, let me mention yet another part of our technology stack. The Wolfram Language is supposed to describe the world. And so what about describing devices and machines and so on.
Well, conveniently enough we have a product related to our Mathematica business called SystemModeler, which does largescale system modeling and simulation:
And now that’s all getting integrated into the Wolfram Language too.
So here’s a representation of a rectifier circuit:
And this is all it takes to simulate this device:
And to plot parameters from the simulation:
And here’s yet another thing. We’re taking the natural language understanding capabilities that we created for WolframAlpha, and we’re setting them up to be customizable. Now of course that’s big when one’s querying databases, or controlling devices. It’s also really interesting when one’s interacting with simulations. Looking at some machine out in the field, and being able to figure out things about it by talking to one’s mobile device, and then getting a simulation done in the Cloud.
There are lots of possibilities. But OK, so how can people actually use these things? Well, in the next couple of weeks there’ll be an open sandbox on the web for people to use the Wolfram Language. We’ve got a gallery of examples that gives good places to start.
Oh, as well as 100,000 live examples in the Wolfram Language documentation.
And, OK, the Wolfram Programming Cloud is also coming very soon. And it’ll be completely free to start developing with it, and even to do smallscale deployments.
So what does this mean?
Well, I think it’s pretty exciting. Because I think we just really changed the economics of going from algorithmic ideas to deployed products. If you come by our booth at the South By trade show, we’ll be doing a bunch of live coding there. And perhaps we’ll even be able to create little products for people right there. But I think our Programming Cloud is going to open up a surge of algorithmic startups. And I’ll be really interested to see what comes out.
OK. Here’s another thing that’s going to change I think: programming education. I think the Wolfram Language is sort of uniquely good for education. Because it’s a language where you get to do real things incredibly easily. You get to see computation at work in an incredibly powerful way. And, by the way, rather effortlessly see a bunch of modern computer science ideas… and immediately connect to the real world.
And the natural language aspect makes it really easy to get started. For serious programmers, I think having snippets of natural language programming, particularly in places where one’s connecting to the real world, is very powerful. But for people getting started, it’s really nice to be able to create things just with natural language.
Like here we can just say:
And have the code generated automatically.
We’re really interested in all the educational possibilities here. Certainly there’s the raw material for a zillion great hackathon projects.
You know, every summer for the past dozen years we’ve done a very successful summer school about the new kind of science I’ve worked on:
Where we’re effectively doing realtime science. We’ve also for a few years had a summer camp for highschool students:
And we’re using our experience here to build out a bunch of ways to use the Wolfram Language for programming education. You know, we’ve been involved in education for a long time—more than 25 years. Mathematica is incredibly widely used there. WolframAlpha I’m happy to say has become sort of a universal tool for students.
There’s more and more coming.
Like here’s a version of WolframAlpha in Chinese that’s coming soon:
Here’s a Problem Generator created with the Wolfram Language and available through WolframAlpha Pro:
And we’re going to be doing all sorts of elaborate educational analytics and things through our Cloud system. You know, there are just so many possibilities. Like we have our CDF—Computable Document Format—that people have used for quite a few years to make interactive Demonstrations.
In fact here’s our site with nearly 10,000 of them:
And now with our Cloud system we can just run all of these directly in a web browser, using Cloud CDF, so they become easy to integrate into web learning environments. Like here’s an example that just got done by Versal:
Well, OK, at kind of the other end of things from education, there’s a lot going on in the corporate area. We’ve been doing largescale custom deployments of WolframAlpha for several years. But now with our Data Science Platform coming, we’ve got a kind of infinitely customizable version of that. And of course everything is integrated between cloud and desktop. And we’re going to have private clouds too.
But all this is just the beginning. Because what we’ve got with the whole Wolfram Language stack is a kind of universal platform for creating products. And we’ve got a whole sequence of products in the pipeline. It’s an exciting feeling having all this stuff that we’ve been doing for more than a quarter of a century come together like this.
Of course, it’s big challenge dealing with all the possibilities. I mean, we’re just a little private company with about 700—admittedly very talented—people.
We’ve started spinning off companies. Like Touch Press which makes iPad ebooks.
And we’ll be doing more of that, though we need more entrepreneurs. And we might even take investors.
But, OK, what about the broader future?
I think about that a fair amount. I don’t have time to say much here. But let me say just a few things. In what we’ve done with computation and knowledge, we’re trying to take the knowledge of our civilization, and put it in computable form. So we can essentially inject it everywhere. In something like WolframAlpha, we’re essentially doing ondemand computation. You ask for something, and WolframAlpha will do it.
Increasingly, we’re going to have preemptive computation. We’re building towards that a lot with the Wolfram Language. Being able to model the world, and make predictions about what’s going to happen. Being able to tell you what you might want to do next. In fact, whenever you use the Wolfram Language interactively, you’ll see this little Suggestions Bar that’s using some fairly fancy computation to suggest what to do next.
But the real way to have that work is to use knowledge about you. I’ve been an enthusiast of personal analytics for a long time. Like here’s a 25year history of my diurnal email rhythm:
And as we have more sensors and outsource more of our memory, our machines will be better and better at telling us what to do. And at some level the machines take over just because the humans tend to follow the autosuggests they make.
But OK. Here’s something I realized recently. I’m interested in history, and I was visiting the archives of Gottfried Leibniz, who lived about 300 years ago, and had a lot of rather modern ideas about computing. But in his time he had only one—very primitive—protocomputer that he built:
Today we have billions of computers. So I was thinking about the extrapolation. And I realized that one day there won’t just be lots more computers—everything will actually be made of computers.
Biology has already a little bit figured out this idea. But one day it won’t be worth making anything out of dumb materials; instead everything will be made out of stuff that’s completely programmable.
So what does that mean? Well, of course it really blurs the distinction between hardware and software. And it means that these languages we create sort of become what everything is made of. You know, I’ve been interested for a long time in the fundamental theory of physics. And in fact with a bunch of science I’ve done, I think there’s a real possibility that we’ve finally got a new way to find such a theory. In effect a way to find our physical universe out in the computational universe of all possible universes.
But here’s the funny thing: once everything is made of computers, even though it’ll be really cool to find the fundamental theory of physics—and I still want to do it—it’s not going to matter so much. Because in effect that actually physics is just the machine code for the universe. But everything we deal with is on top of a layer that we can program however we want.
Well, OK, what does that mean for us humans? No doubt we’ll get to deploy in that sort of muchmorethanbiologyprogrammable world. Where in effect you can just build any universe for yourself. I sort of imagine this moment where there’s a box of a trillion souls. Running in whatever pieces of the computational universe they want.
And what happens? Well, there’s lots of computation going on. But from the science I’ve done—and particularly the Principle of Computational Equivalence—I think it’s sort of a very Copernican situation. I don’t think there’s anything fundamentally different about that computation, from what goes on all over the universe, and even in rather simple programs.
And at some level the only thing that’s special about that particular box of a trillion souls is that it’s based on our particular history. Now, you know, I deal with all this tech stuff. But I happen to like people; I guess that’s why I’ve liked building a company, and mentoring lots of people. And in a sense seeing how much is possible, and how much can sort of be generalized and virtualized with technology, actually makes me think people are more important rather than less. Because when everything is possible, what matters is just what one wants or chooses to do.
It’s sort of a big version of what we’re doing with the Wolfram Language. Humans define the goals, then technology automatically tries to achieve them. And the more we can inject computation into everything, the more this becomes possible. And, you know, I happen to think that the injection of computation into everything will be a defining feature—perhaps the defining feature—of this time in history.
And I have to say I’m personally pleased to have lived at the right time to make some contribution to this. It’s a great privilege. And I’m very pleased to have been able to tell you a little bit about it here today.
Thank you very much.
]]>