*Last week I gave a talk (and did a panel discussion) at a conference entitled “Ethics of Artificial Intelligence” held at the NYU Philosophy Department’s Center for Mind, Brain and Consciousness. Here’s the video and a transcript:*

Thanks for inviting me here today.

You know, it’s funny to be here. My mother was a philosophy professor in Oxford. And when I was a kid I always said the one thing I’d never do was do or talk about philosophy. But, well, here I am.

Before I really get into AI, I think I should say a little bit about my worldview. I’ve basically spent my life alternating between doing basic science and building technology. I’ve been interested in AI for about as long as I can remember. But as a kid I started out doing physics and cosmology and things. That got me into building technology to automate stuff like math. And that worked so well that I started thinking about, like, how to really know and compute everything about everything. That was in about 1980—and at first I thought I had to build something like a brain, and I was studying neural nets and so on. But I didn’t get too far.

And meanwhile I got interested in an even bigger problem in science: how to make the most general possible theories of things. The dominant idea for 300 years had been to use math and equations. But I wanted to go beyond them. And the big thing I realized was that the way to do that was to think about programs, and the whole computational universe of possible programs.

And that led to my personal Galileo-like moment. I just pointed my “computational telescope” at these simplest possible programs, and I saw this amazing one I called rule 30—that just seemed to go on producing complexity forever from essentially nothing.
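For readers who want to see rule 30 for themselves, here is a minimal Python sketch. It is not Wolfram Language, and not the code behind the talk’s pictures. It assumes the standard convention for elementary cellular automata: each cell’s new value is the bit of the rule number indexed by its three-cell neighborhood, read as a binary number. The evolution starts from a single black cell.

```python
RULE = 30  # rule number: bit k gives the new cell value for neighborhood k


def step(cells: list[int]) -> list[int]:
    """Apply the rule once. Cells beyond both ends are treated as 0 (white)."""
    padded = [0] + cells + [0]
    return [
        # Read (left, center, right) as a 3-bit number and look up that bit of RULE.
        (RULE >> (padded[i - 1] << 2 | padded[i] << 1 | padded[i + 1])) & 1
        for i in range(1, len(padded) - 1)
    ]


def run(steps: int) -> list[list[int]]:
    """Evolve from a single black cell; the row is wide enough
    that the pattern never reaches the edges."""
    cells = [0] * (2 * steps + 1)
    cells[steps] = 1
    history = [cells]
    for _ in range(steps):
        cells = step(cells)
        history.append(cells)
    return history


for row in run(15):
    print("".join("#" if c else "." for c in row))
```

Even 16 rows are enough to show the irregular texture that appears from a single cell. The names `step` and `run` are illustrative only.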

Well, after I’d seen this, I realized this is actually something that happens all over the computational universe, and all over nature. It’s really the secret that lets nature make all the complicated stuff we see. But it’s something else too: it’s a window into what raw, unfettered computation is like. At least traditionally, when we do engineering, we’re always building things that are simple enough that we can foresee what they’ll do.

But if we just go out into the computational universe, things can be much wilder. Our company has done a lot of mining out there, finding programs that are useful for different purposes, like rule 30 is for randomness. And modern machine learning is kind of partway from traditional engineering to this kind of free-range mining.
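One way to use rule 30 as a source of randomness is to read off the sequence of cells in its center column. Here is a rough Python sketch of that idea. The function name is hypothetical, and this illustrates the principle only. It is not a vetted or cryptographic random number generator, and not the generator any product actually ships.

```python
def rule30_center_bits(n: int) -> list[int]:
    """Return the center-column cell of rule 30 at steps 0..n-1,
    starting from a single black cell."""
    width = 2 * n + 1          # wide enough that the edges never matter
    cells = [0] * width
    cells[n] = 1
    bits = []
    for _ in range(n):
        bits.append(cells[n])
        padded = [0] + cells + [0]
        # Rule 30 lookup: neighborhood (l, c, r) -> bit (4l + 2c + r) of 30.
        cells = [
            (30 >> (padded[i - 1] << 2 | padded[i] << 1 | padded[i + 1])) & 1
            for i in range(1, width + 1)
        ]
    return bits


bits = rule30_center_bits(64)
print("".join(map(str, bits)))
print("fraction of 1s:", sum(bits) / len(bits))
```

Nothing in the code is random: each bit is fully determined. The apparent randomness comes entirely from the rule’s own evolution.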

But, OK, what can one say in general about the computational universe? Well, all these programs can be thought of as doing computations. And years ago I came up with what I call the Principle of Computational Equivalence, which says that if behavior isn’t obviously simple, it typically corresponds to a computation that’s maximally sophisticated. There are lots of predictions and implications of this. Like that universal computation should be ubiquitous. As should undecidability. And as should what I call computational irreducibility.

Take one of these programs, like rule 30. Can you predict what it’s going to do? Well, it’s probably computationally irreducible, which means you can’t figure out what it’s going to do without effectively tracing every step and going through the same computational effort it does. It’s completely deterministic. But to us it’s got what seems like free will, because we can never know in advance what it’s going to do.

Here’s another thing: what’s intelligence? Well, our big unifying principle says that everything, from a tiny program to our brains, is computationally equivalent. There’s no bright line between intelligence and mere computation. The weather really does have a mind of its own: it’s doing computations just as sophisticated as our brains. To us, though, it’s pretty alien computation, because it’s not connected to our human goals and experiences. It’s just raw computation that happens to be going on.

So how do we tame computation? We have to mold it to our goals. And the first step there is to describe our goals. And for the past 30 years what I’ve basically been doing is creating a way to do that.

I’ve been building a language—that’s now called the Wolfram Language—that allows us to express what we want to do. It’s a computer language. But it’s not really like other computer languages. Because instead of telling a computer what to do in its terms, it builds in as much knowledge as possible about computation and the world, so that we humans can describe in our terms what we want, and then it’s up to the language to get it done as automatically as possible.

This basic idea has worked really well, and in the form of Mathematica it’s been used to make endless inventions and discoveries over the years. It’s also what’s inside Wolfram|Alpha. Where the idea is to take pure natural language questions, understand them, and use the kind of curated knowledge and algorithms of our civilization to answer them. And, yes, it’s a very classic AIish thing. And of course it’s computed answers to billions and billions of questions from humans, for example inside Siri.

I had an interesting experience recently, figuring out how to use what we’ve built to teach computational thinking to kids. I was writing exercises for a book. At the beginning, it was easy: “make a program to do X”. But later on, it was like “I know what to say in the Wolfram Language, but it’s really hard to express in English”. And of course that’s why I just spent 30 years building the Wolfram Language.

English has maybe 25,000 common words; the Wolfram Language has about 5000 carefully designed built-in constructs—including all the latest machine learning—together with millions of things based on curated data. And the idea is that once one can think about something in the world computationally, it should be as easy as possible to express it in the Wolfram Language. And the cool thing is, it really works. Humans, including kids, can read and write the language. And so can computers. It’s a kind of high-level bridge between human thinking, in its cultural context, and computation.

OK, so what about AI? Technology has always been about finding things that exist, and then taming them to automate the achievement of particular human goals. And in AI the things we’re taming exist in the computational universe. Now, there’s a lot of raw computation seething around out there—just as there’s a lot going on in nature. But what we’re interested in is computation that somehow relates to human goals.

So what about ethics? Well, maybe we want to constrain the computation, the AI, to only do things we consider ethical. But somehow we have to find a way to describe what we mean by that.

Well, in the human world, one way we do this is with laws. So how do we connect laws to computations? We may call them “legal codes”, but today laws and contracts are basically written in natural language. There’ve been simple computable contracts in areas like financial derivatives. And now one’s talking about smart contracts around cryptocurrencies.

But what about the vast mass of law? Well, Leibniz—who died 300 years ago next month—was always talking about making a universal language to, as we would say now, express it all in a computable way. He was a few centuries too early, but I think now we’re finally in a position to do this.

I just posted a long blog post about all this last week, but let me try to summarize. With the Wolfram Language we’ve managed to express a lot of kinds of things in the world, like the ones people ask Siri about. And I think we’re now within sight of what Leibniz wanted: to have a general symbolic discourse language that represents everything involved in human affairs.

I see it basically as a language design problem. Yes, we can use natural language to get clues, but ultimately we have to build our own symbolic language. It’s actually the same kind of thing I’ve done for decades in the Wolfram Language. Take even a word like “plus”. Well, in the Wolfram Language there’s a function called `Plus`, but it doesn’t mean the same thing as the word. It’s a very specific version, that has to do with adding things mathematically. And as we design a symbolic discourse language, it’s the same thing. The word “eat” in English can mean lots of things. But we need a concept—that we’ll probably refer to as “eat”—that’s a specific version, that we can compute with.

So let’s say we’ve got a contract written in natural language. One way to get a symbolic version is to use natural language understanding—just like we do for billions of Wolfram|Alpha inputs, asking humans about ambiguities. Another way might be to get machine learning to describe a picture. But the best way is just to write in symbolic form in the first place, and actually I’m guessing that’s what lawyers will be doing before too long.

And of course once you have a contract in symbolic form, you can start to compute about it, automatically seeing if it’s satisfied, simulating different outcomes, automatically aggregating it in bundles, and so on. Ultimately the contract has to get input from the real world. Maybe that input is “born digital”, like data about accessing a computer system, or transferring bitcoin. Often it’ll come from sensors and measurements, and it’ll take machine learning to turn it into something symbolic.
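Here is a hedged Python sketch that makes “computing about a contract” concrete. The classes `Pay` and `PaymentEvent` are hypothetical stand-ins for a symbolic discourse language that does not yet exist. The point is only this: once a term is data, checking whether it’s satisfied and simulating outcomes become ordinary computations over born-digital input such as recorded payments.

```python
from dataclasses import dataclass
from decimal import Decimal


# Hypothetical symbolic terms; not a real contract language.
@dataclass(frozen=True)
class Pay:
    payer: str
    payee: str
    amount: Decimal
    currency: str
    deadline_day: int


@dataclass(frozen=True)
class PaymentEvent:
    """'Born digital' input, e.g. a recorded transfer."""
    payer: str
    payee: str
    amount: Decimal
    currency: str
    day: int


def obligation_met(term: Pay, events: list[PaymentEvent]) -> bool:
    """Matching payments made by the deadline must add up to the amount owed."""
    paid = sum(
        (e.amount for e in events
         if (e.payer, e.payee, e.currency) == (term.payer, term.payee, term.currency)
         and e.day <= term.deadline_day),
        Decimal(0),
    )
    return paid >= term.amount


def satisfied(contract: list[Pay], events: list[PaymentEvent]) -> bool:
    return all(obligation_met(term, events) for term in contract)


contract = [Pay("Alice", "Bob", Decimal("0.23"), "BTC", deadline_day=30)]

# Simulate outcomes: evaluate the same contract under different scenarios.
scenarios = {
    "on time": [PaymentEvent("Alice", "Bob", Decimal("0.23"), "BTC", day=12)],
    "late": [PaymentEvent("Alice", "Bob", Decimal("0.23"), "BTC", day=45)],
    "in two parts": [
        PaymentEvent("Alice", "Bob", Decimal("0.10"), "BTC", day=5),
        PaymentEvent("Alice", "Bob", Decimal("0.13"), "BTC", day=20),
    ],
    "short": [PaymentEvent("Alice", "Bob", Decimal("0.10"), "BTC", day=5)],
}
for name, events in scenarios.items():
    print(f"{name}: {satisfied(contract, events)}")
```

`Decimal` keeps amounts such as 0.10 + 0.13 exact, so that comparison against the amount owed is not disturbed by floating-point rounding.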

Well, if we can express laws in computable form maybe we can start telling AIs how we want them to act. Of course it might be better if we could boil everything down to simple principles, like Asimov’s Laws of Robotics, or utilitarianism or something.

But I don’t think anything like that is going to work. What we’re ultimately trying to do is to find perfect constraints on computation, but computation is something that’s in some sense infinitely wild. The issue already shows up in Gödel’s Theorem. Like let’s say we’re looking at integers and we’re trying to set up axioms to constrain them to just work the way we think they do. Well, what Gödel showed is that no finite set of axioms can ever achieve this. With any set of axioms you choose, there won’t just be the ordinary integers; there’ll also be other wild things.

And the phenomenon of computational irreducibility implies a much more general version of this. Basically, given any set of laws or constraints, there’ll always be “unintended consequences”. This isn’t particularly surprising if one looks at the evolution of human law. But the point is that there’s theoretically no way around it. It’s ubiquitous in the computational universe.

Now I think it’s pretty clear that AI is going to get more and more important in the world—and is going to eventually control much of the infrastructure of human affairs, a bit like governments do now. And like with governments, perhaps the thing to do is to create an AI Constitution that defines what AIs should do.

What should the constitution be like? Well, it’s got to be based on a model of the world, and inevitably an imperfect one, and then it’s got to say what to do in lots of different circumstances. And ultimately what it’s got to do is provide a way of constraining the computations that happen, so that they’re ones that align with our goals. But what should those goals be? I don’t think there’s any ultimate right answer. In fact, one can enumerate goals just like one can enumerate programs out in the computational universe. And there’s no abstract way to choose between them.

But for us there’s a way to choose. Because we have particular biology, and we have a particular history of our culture and civilization. It’s taken us a lot of irreducible computation to get here. But now we’re just at some point in the computational universe, that corresponds to the goals that we have.

Human goals have clearly evolved through the course of history. And I suspect they’re about to evolve a lot more. I think it’s pretty inevitable that our consciousness will increasingly merge with technology. And eventually maybe our whole civilization will end up as something like a box of a trillion uploaded human souls.

But then the big question is: “What will they choose to do?” Well, maybe we don’t even have the language yet to describe the answer. If we look back even to Leibniz’s time, we can see all sorts of modern concepts that hadn’t formed yet. And when we look inside a modern machine learning or theorem proving system, it’s humbling to see how many concepts it effectively forms that we haven’t yet absorbed in our culture.

Maybe looked at from our current point of view, it’ll just seem like those disembodied virtual souls are playing videogames for the rest of eternity. At first maybe they’ll operate in a simulation of our actual universe. Then maybe they’ll start exploring the computational universe of all possible universes.

But at some level all they’ll be doing is computation, and the Principle of Computational Equivalence says it’s computation that’s fundamentally equivalent to all other computation. It’s a bit of a letdown. Our proud future ends up being computationally equivalent to just plain physics, or to little rule 30.

Of course, that’s just an extension of the long story of science showing us that we’re not fundamentally special. We can’t look for ultimate meaning in where we’ve reached. We can’t define an ultimate purpose. Or ultimate ethics. And in a sense we have to embrace the details of our existence and our history.

There won’t be a simple principle that encapsulates what we want in our AI Constitution. There’ll be lots of details that reflect the details of our existence and history. And the first step is just to understand how to represent those things. Which is what I think we can do with a symbolic discourse language.

And, yes, conveniently I happen to have just spent 30 years building the framework to create such a thing. And I’m keen to understand how we can really use it to create an AI Constitution.

So I’d better stop talking about philosophy, and try to answer some questions.

*After the talk there was a lively Q&A (followed by a panel discussion), included on the video. Some questions were:*

- When will AI reach human-level intelligence?
- What are the difficulties you foresee in developing a symbolic discourse language?
- Do we live in a deterministic universe?
- Is our present reality a simulation?
- Does free will exist, and how does consciousness arise from computation?
- Can we separate rules and principles in a way that is computable for AI?
- How can AI navigate contradictions in human ethical systems?

Gottfried Leibniz—who died 300 years ago this November—worked on many things. But a theme that recurred throughout his life was the goal of turning human law into an exercise in computation. Of course, as we know, he didn’t succeed. But three centuries later, I think we’re finally ready to give it a serious try again. And I think it’s a really important thing to do—not just because it’ll enable all sorts of new societal opportunities and structures, but because I think it’s likely to be critical to the future of our civilization in its interaction with artificial intelligence.

Human law, almost by definition, dates from the very beginning of civilization—and undoubtedly it’s the first system of rules that humans ever systematically defined. Presumably it was a model for the axiomatic structure of mathematics as defined by the likes of Euclid. And when science came along, “natural laws” (as their name suggests) were at first viewed as conceptually similar to human laws, except that they were supposed to define constraints for the universe (or God) rather than for humans.

Over the past few centuries we’ve had amazing success formalizing mathematics and exact science. And out of this there’s a more general idea that’s emerged: the idea of computation. In computation, we’re dealing with arbitrary systems of rules—not necessarily ones that correspond to mathematical concepts we know, or features of the world we’ve identified. So now the question is: can we use the ideas of computation, in very much the way Leibniz imagined, to formalize human law?

The basic issue is that human law talks about human activities, and (unlike, say, for the mechanics of particles) we don’t have a general formalism for describing human activities. When it comes to talking about money, for example, we often can be precise. And as a result, it’s pretty easy to write a very formal contract for paying a subscription, or determining how an option on a publicly traded stock should work.

But what about all the things that typical legal contracts deal with? Well, clearly we have one way to write legal contracts: just use natural language (like English). It’s often very stylized natural language, because it’s trying to be as precise as possible. But ultimately it’s never going to be precise. Because at the lowest level it’s always going to depend on the meanings of words, which for natural language are effectively defined just by the practice and experience of the users of the language.

For a computer language, though, it’s a different story. Because now the constructs in the language are absolutely precise: instead of having a vague, societally defined effect on human brains, they’re defined to have a very specific effect on a computer. Of course, traditional computer languages don’t directly talk about things relevant to human activities: they only directly talk about things like setting values for variables, or calling abstractly defined functions.

But what I’m excited about is that we’re starting to have a bridge between the precision of traditional computer languages and the ability to talk about real-world constructs. And actually, it’s something I’ve personally been working on for more than three decades now: our knowledge-based Wolfram Language.

The Wolfram Language is precise: everything in it is defined to the point where a computer can unambiguously work with it. But its unique feature among computer languages is that it’s knowledge based. It’s not just a language to describe the low-level operations of a computer; instead, built right into the language is as much knowledge as possible about the real world. And this means that the language includes not just numbers like 2.7 and strings like “abc”, but also constructs like the United States, or the Consumer Price Index, or an elephant. And that’s exactly what we need in order to start talking about the kinds of things that appear in legal contracts or human laws.

I should make it clear that the Wolfram Language as it exists today doesn’t include everything that’s needed. We’ve got a large and solid framework, and we’re off to a good start. But there’s more about the world that we have to encode to be able to capture the full range of human activities and human legal specifications.

The Wolfram Language has, for example, a definition of what a banana is, broken down by all kinds of details. So if one says “you should eat a banana”, the language has a way to represent “a banana”. But as of now, it doesn’t have a meaningful way to represent “you”, “should” or “eat”.

Is it possible to represent things like this in a precise computer language? Absolutely! But it takes language design to set up how to do it. Language design is a difficult business—in fact, it’s probably the most intellectually demanding thing I know, requiring a strange mixture of high abstraction together with deep knowledge and down-to-earth practical judgment. But I’ve been doing it now for nearly four decades, and I think I’m finally ready for the challenge of doing language design for everyday discourse.

So what’s involved? Well, let’s first talk about it in a simpler case: the case of mathematics. Consider the function `Plus`, which adds things like numbers together. When we use the English word “plus” it can have all sorts of meanings. One of those meanings is adding numbers together. But there are other meanings, that are related, say, by various analogies (“product X plus”, “the plus wire”, “it’s a real plus”, …).

When we come to define `Plus` in the Wolfram Language we want to build on the everyday notion of “plus”, but we want to make it precise. And we can do that by picking the specific meaning of “plus” that’s about adding things like numbers together. And once we know that this is what `Plus` means, we immediately know all sorts of properties, and can do explicit computations with it.
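Here is a rough Python analogue of what it buys us to give `Plus` a precise meaning. Knowing that it means addition tells us it is associative and commutative. Sums can therefore be flattened and reordered, and their numeric parts combined, while symbols with no values simply stay symbolic. The sketch loosely mirrors the Flat and Orderless attributes of the Wolfram Language’s `Plus`. `Sym`, `Plus` and `plus` are hypothetical Python names, and this is not how the Wolfram Language is implemented.

```python
from dataclasses import dataclass
from numbers import Number


@dataclass(frozen=True)
class Sym:
    """A symbol with no value: it just stands for itself."""
    name: str


@dataclass(frozen=True)
class Plus:
    args: tuple


def plus(*args):
    """Build a canonical sum using the properties of addition:
    associativity (flatten nested sums), commutativity (sort the terms)
    and arithmetic on whatever parts are already numbers.
    Assumes every non-number argument is a Sym or a Plus."""
    flat = []
    for a in args:
        flat.extend(a.args if isinstance(a, Plus) else [a])
    total = sum(a for a in flat if isinstance(a, Number))
    symbols = sorted((a for a in flat if not isinstance(a, Number)), key=lambda s: s.name)
    terms = ([total] if total != 0 or not symbols else []) + symbols
    return terms[0] if len(terms) == 1 else Plus(tuple(terms))


x, y = Sym("x"), Sym("y")
print(plus(x, 2, plus(3, y)))   # Plus(args=(5, Sym(name='x'), Sym(name='y')))
print(plus(2, 2))               # 4
```

Because sums are kept in one canonical form, two expressions that mean the same sum compare equal, which is exactly the kind of consequence a precise definition makes available.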

Now consider a concept like “magnesium”. It’s not as perfect and abstract a concept as `Plus`. But physics and chemistry give us a clear definition of the element magnesium—which we can then use in the Wolfram Language to have a well-defined “magnesium” entity.

It’s very important that the Wolfram Language is a symbolic language—because it means that the things in it don’t immediately have to have “values”; they can just be symbolic constructs that stand for themselves. And so, for example, the entity “magnesium” is represented as a symbolic construct, that doesn’t itself “do” anything, but can still appear in a computation, just like, for example, a number (like 9.45) can appear.

There are many kinds of constructs that the Wolfram Language supports. Like “New York City” or “last Christmas” or “geographically contained within”. And the point is that the design of the language has defined a precise meaning for them. New York City, for example, is taken to mean the precise legal entity considered to be New York City, with geographical borders defined by law. Internal to the Wolfram Language, there’s always a precise canonical representation for something like New York City (it’s `Entity["City", {"NewYork", "NewYork", "UnitedStates"}]`). And this internal representation is all that matters when it comes to computation. Yes, it’s convenient to refer to New York City as “nyc”, but in the Wolfram Language that natural language form is immediately converted to the precise internal form.
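The idea of a single canonical internal form can be sketched in Python. The entity is an immutable value that stands for itself, and every accepted surface form maps to that same value. The `Entity` class, `interpret` function and alias table are assumptions for illustration. Only the canonical name tuple is taken from the text above, and real natural language understanding does far more than a dictionary lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A symbolic construct: it has no 'value' and simply stands for itself."""
    kind: str
    name: tuple


NYC = Entity("City", ("NewYork", "NewYork", "UnitedStates"))

# Hypothetical alias table standing in for natural language understanding.
ALIASES = {
    "nyc": NYC,
    "new york city": NYC,
    "new york, ny": NYC,
}


def interpret(text: str) -> Entity:
    """Convert a surface form to the one canonical internal representation."""
    return ALIASES[" ".join(text.lower().split())]


print(interpret("NYC") == interpret("New  York   City"))  # True
```

Once the conversion has happened, only the canonical `Entity` value takes part in any further computation. The surface form that produced it is irrelevant.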

So what about “you should eat a banana”? Well, we’ve got to go through the same language design process for something like “eat” as for `Plus` (or “banana”). And the basic idea is that we’ve got to figure out a standard meaning for “eat”. For example, it might be “ingestion of food by a person (or animal)”. Now, there are plenty of other possible meanings for the English word “eat”—for example, ones that use analogies, as in “this function eats its arguments”. But the idea—like for `Plus`—is to ignore these, and just to define a standard notion of “eat” that is precise, and suitable for computation.

One gets a reasonable idea of what kinds of constructs one has to deal with just by thinking about parts of speech in English. There are nouns. Sometimes (as in “banana” or “elephant”) there’s a pretty precise definition of what these correspond to, and usually the Wolfram Language already knows about them. Sometimes it’s a little vaguer but still concrete (as in “chair” or “window”), and sometimes it’s abstract (like “happiness” or “justice”). But in each case one can imagine one or several entities that capture a definite meaning for the noun—just like the Wolfram Language already has entities for thousands of kinds of things.

Beyond nouns, there are verbs. There’s typically a certain superstructure that exists around verbs. Grammatically there might be a subject for the verb, and an object, and so on. Verbs are similar to functions in the Wolfram Language: each one deals with certain arguments, that for example correspond to its subject, object, etc. Now of course in English (or any other natural language) there are all sorts of elaborate special cases and extra features that can be associated with verbs. But basically we don’t care about these. Because we’re really just trying to define symbolic constructs that represent certain concepts. We don’t have to capture every detail of how a particular verb works; we’re just using the English verb as a way to give us a kind of “cognitive hook” for the concept.

We can go through other parts of speech. Adverbs that modify verbs; adjectives that modify nouns. These can sometimes be represented in the Wolfram Language by constructs like `EntityInstance`, and sometimes by options to functions. But the important point in all cases is that we’re not trying to faithfully reproduce how the natural language works; we’re just using the natural language as a guide to how concepts are set up.

Pronouns are interesting. They work a bit like variables in pure anonymous functions. In “you should eat a banana”, the “you” is like a free variable that’s going to be filled in with a particular person.
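The points about verbs, adverbs and pronouns can be sketched together in Python. In this hypothetical encoding, `Eat` fixes one sense of “eat” and turns grammatical roles into named arguments, and an adverb becomes an option. “You” becomes the parameter of an anonymous function (a Python `lambda`), filled in later with a particular person. None of these names come from the Wolfram Language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str  # hypothetical identifiers such as "Banana" or "Person:Alice"


@dataclass(frozen=True)
class Eat:
    """One fixed sense of "eat": ingestion of food by a person or animal.
    Grammatical roles become named arguments; modifiers become options."""
    subject: Concept
    obj: Concept
    options: tuple = ()  # e.g. (("Manner", "Quickly"),) for an adverb


@dataclass(frozen=True)
class Should:
    action: object


banana = Concept("Banana")

# "you should eat a banana": "you" is a free variable, filled in later,
# like the argument slot of a pure anonymous function.
you_should_eat_a_banana = lambda you: Should(Eat(subject=you, obj=banana))

print(you_should_eat_a_banana(Concept("Person:Alice")))
print(Eat(Concept("Person:Bob"), banana, options=(("Manner", "Quickly"),)))
```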

Parts of speech and grammatical structures suggest certain general features to capture in a symbolic representation of discourse. There are a bunch of others, though. For example, there are what amount to “calculi” that one needs to represent notions of time (“within the time interval”, “starting later”, etc.) or of space (“on top of”, “contained within”, etc.). We’ve already got many calculi like these in the Wolfram Language; the most straightforward are ones about numbers (“greater than”, etc.) or sets (“member of”), etc. Some calculi have long histories (“temporal logic”, “set theory”, etc.); others still have to be constructed.
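To give a taste of what a small “calculus” of time might look like, here is a Python sketch with a handful of interval relations, loosely inspired by Allen’s interval algebra. The relation names and the choice of inclusive end dates are assumptions made for illustration. A real calculus would need many more relations, plus rules for combining them.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Interval:
    start: date
    end: date  # inclusive

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("interval ends before it starts")


def within(inner: Interval, outer: Interval) -> bool:
    """'within the time interval': inner lies entirely inside outer."""
    return outer.start <= inner.start and inner.end <= outer.end


def starts_later(a: Interval, b: Interval) -> bool:
    """'starting later': a begins after b begins."""
    return a.start > b.start


def before(a: Interval, b: Interval) -> bool:
    """a ends before b begins (no shared day)."""
    return a.end < b.start


def intersects(a: Interval, b: Interval) -> bool:
    """a and b share at least one day."""
    return a.start <= b.end and b.start <= a.end


term = Interval(date(2024, 1, 1), date(2024, 12, 31))
delivery = Interval(date(2024, 3, 1), date(2024, 3, 5))
print(within(delivery, term), starts_later(delivery, term), before(term, delivery))  # True True False
```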

Is there a global theory of what to do? Well, no more than there’s a global theory of how the world works. There are concepts and constructs that are part of how our world works, and we need to capture these. No doubt there’ll be new things that come along in the future, and we’ll want to capture those too. And my experience from building Wolfram|Alpha is that the best thing to do is just to build each thing one needs, without starting off with any kind of global theory. After a while, one may notice that one’s built similar things several times, and one may go in and unify them.

One can get deep into the foundations of science and philosophy about this. Yes, there’s a computational universe out there of all the possible rules by which systems can operate (and, yes, I’ve spent a good part of my life studying the basic science of this). And there’s our physical universe that presumably operates according to certain rules from the computational universe. But from these rules can emerge all sorts of complex behavior, and in fact the phenomenon of computational irreducibility implies that in a sense there’s no limit to what can be built up.

But there’s not going to be an overall way to talk about all this stuff. And if we’re going to be dealing with any finite kind of discourse, it’s only going to capture certain features. Which features we choose to capture is going to be determined by what concepts have evolved in the history of our society. And usually these concepts will be mirrored in the words that exist in the languages we use.

At a foundational level, computational irreducibility implies that there’ll always be new concepts that could be introduced. Back in antiquity, Aristotle introduced logic as a way to capture certain aspects of human discourse. And there are other frameworks that have been introduced in the history of philosophy, and more recently, natural language processing and artificial intelligence research. But computational irreducibility effectively implies that none of them can ever ultimately be complete. And we must expect that as the concepts we consider relevant evolve, so too must the symbolic representation we have for discourse.

OK, so let’s say we’ve got a symbolic representation for discourse. How’s it actually going to be used? Well, there are some good clues from the way natural language works.

In standard discussions of natural language, it’s common to talk about “interrogative statements” that ask a question, “declarative statements” that assert something and “imperative statements” that say to do something. (Let’s ignore “exclamatory statements”, like expletives, for now.)

Interrogative statements are what we’re dealing with all the time in Wolfram|Alpha: “what is the density of gold?”, “what is 3+7?”, “what was the latest reading from that sensor?”, etc. They’re also common in notebooks used to interact with the Wolfram Language: there’s an input (`In[1]:= 2+2`) and then there’s a corresponding output (`Out[1]= 4`).

Declarative statements are all about filling in particular values for variables. In a very coarse way, one can set values (`x=7`), as in typical procedural languages. But it’s typically better to think about having environments in which one’s asserting things. Maybe those environments are supposed to represent the real world, or some corner of it. Or maybe they’re supposed to represent some fictional world, where for example dinosaurs didn’t go extinct, or something.

Imperative statements are about making things happen in the world: “open the pod bay doors”, “pay Bob 0.23 bitcoin”, etc.

In a sense, interrogative statements determine the state of the world, declarative statements assert things about the state of the world, and imperative statements change the state of the world.
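The three kinds of statements can be caricatured in a few lines of Python. Here a “world” is just a dictionary of properties, which is a simplifying assumption. Asking reads that dictionary. Declaring yields a new environment in which the assertion holds, so a fictional world can differ from the real one. Commanding changes the state in place. The function names are hypothetical.

```python
# Hypothetical model: a "world" is just a mapping from properties to values.
real_world = {"pod_bay_doors": "closed", "dinosaurs_extinct": True}


def ask(world: dict, prop: str):
    """Interrogative: find out the state of the world, changing nothing."""
    return world.get(prop)


def declare(world: dict, prop: str, value) -> dict:
    """Declarative: assert something, giving an environment in which it holds.
    A new dict is returned so the assertion stays scoped to that environment."""
    return {**world, prop: value}


def command(world: dict, prop: str, value) -> None:
    """Imperative: change the state of the world itself."""
    world[prop] = value


fiction = declare(real_world, "dinosaurs_extinct", False)  # a fictional world
command(real_world, "pod_bay_doors", "open")
print(ask(fiction, "dinosaurs_extinct"), ask(real_world, "dinosaurs_extinct"))  # False True
print(ask(real_world, "pod_bay_doors"), ask(fiction, "pod_bay_doors"))  # open closed
```

Returning a fresh environment from `declare` is just one design choice. It keeps assertions about a fictional world from leaking into the real one, while `command` deliberately mutates.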

In different situations, we can mean different things by “the world”. We could be talking about abstract constructs, like integers or logic operations, that just are the way they are. We could be talking about natural laws or other features of our physical universe that we can’t change. Or we could be talking about our local environment, where we can move around tables and chairs, choose to eat bananas, and so on. Or we could be talking about our mental states, or the internal state of something like a computer.

There are lots of things one can do if one has a general symbolic representation for discourse. But one of them, which is the subject of this post, is to express things like legal contracts. The beginning of a contract, with its various whereas clauses, recitals, definitions and so on, tends to be dense with declarative statements (“this is so”). Then the actual terms of the contract tend to end up with imperative statements (“this should happen”), perhaps depending on certain things determined by interrogative statements (“did this happen?”).

It’s not hard to start seeing the structure of contracts as being much like programs. In simple cases, they just contain logical conditionals: “if X then Y”. In other cases they’re more modeled on math: “if this amount of X happens, that amount of Y should happen”. Sometimes there’s iteration: “keep doing X until Y happens”. Occasionally there’s some recursion: “keep applying X to every Y”. And so on.
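Some of these patterns can be sketched as Python functions for a hypothetical supply contract. The first combines a conditional with a proportional amount. The second is an iteration that accrues a daily late fee until payment, stopping at an assumed cap. The clause terms, names and numbers are invented for illustration. `Decimal` keeps the money amounts exact.

```python
from decimal import Decimal


def payment_due(delivered: bool, units: int, unit_price: Decimal) -> Decimal:
    """'If X then Y', combined with 'this amount of X means that amount of Y':
    nothing is owed before delivery; afterwards the price scales with quantity."""
    if not delivered:
        return Decimal(0)
    return units * unit_price


def late_fees(days_late: int, daily_fee: Decimal, cap: Decimal) -> Decimal:
    """'Keep doing X until Y happens': add a fee for each day until payment,
    stopping early once the cap is reached."""
    total = Decimal(0)
    day = 0
    while day < days_late and total < cap:
        total = min(total + daily_fee, cap)
        day += 1
    return total


print(payment_due(True, 40, Decimal("2.50")))        # 100.00
print(late_fees(10, Decimal("5"), Decimal("30")))    # 30
```

The loop could be replaced by a closed-form `min(days_late * daily_fee, cap)`. It is kept as a loop here because “keep doing X until Y” is the structure being illustrated.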

There are already some places where legal contracts are routinely represented by what amount to programs. The most obvious are financial contracts for things like bonds and options—which just amount to little programs that define payouts based on various formulas and conditionals.
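For instance, the payouts of a European call option and of a plain fixed-coupon bond really are just small formulas. The Python below is a simplified sketch with hypothetical function names. It ignores discounting, day-count conventions and default risk, all of which real contracts and pricing systems must handle.

```python
from decimal import Decimal


def call_payoff(spot_at_expiry: Decimal, strike: Decimal) -> Decimal:
    """European call: pays the excess of the final price over the strike, if any."""
    return max(spot_at_expiry - strike, Decimal(0))


def bond_cash_flows(face: Decimal, coupon_rate: Decimal, years: int) -> list[Decimal]:
    """Plain annual-coupon bond: a coupon each year, plus the face value at the end."""
    if years < 1:
        raise ValueError("a bond needs at least one payment date")
    coupon = face * coupon_rate
    return [coupon] * (years - 1) + [coupon + face]


print(call_payoff(Decimal("120"), Decimal("100")))   # 20
print(call_payoff(Decimal("90"), Decimal("100")))    # 0
print(bond_cash_flows(Decimal("1000"), Decimal("0.05"), 3))
```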

There’s a whole industry of using “rules engines” to encode certain kinds of regulations as “if then” rules, usually mixed with formulas. In fact, such things are almost universally used for tax and insurance computations. (They’re also common in pricing engines and the like.)
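Here is a toy rules engine in Python. Each rule pairs a condition with a formula, and the engine fires every rule whose condition holds. The thresholds, rates and credit are hypothetical, not those of any real tax system. Production rules engines add much more, such as rule priorities, chaining and explanation facilities.

```python
from decimal import Decimal
from typing import Callable

Facts = dict  # e.g. {"income": Decimal("50000"), "dependents": 2}

# Each rule: (name, condition, formula). All thresholds, rates and the
# credit amount are hypothetical, chosen only for illustration.
RULES: list[tuple[str, Callable[[Facts], bool], Callable[[Facts], Decimal]]] = [
    ("10% band",
     lambda f: f["income"] > 10_000,
     lambda f: (min(f["income"], 40_000) - 10_000) * Decimal("0.10")),
    ("20% band",
     lambda f: f["income"] > 40_000,
     lambda f: (f["income"] - 40_000) * Decimal("0.20")),
    ("dependent credit",
     lambda f: f["dependents"] > 0,
     lambda f: Decimal(-500) * f["dependents"]),
]


def evaluate(facts: Facts) -> tuple[Decimal, list[str]]:
    """Fire every rule whose condition holds and add up the formulas.
    Also return the names of the rules that fired, as a simple audit trail."""
    fired = [(name, formula(facts)) for name, condition, formula in RULES if condition(facts)]
    total = sum((amount for _, amount in fired), Decimal(0))
    return max(total, Decimal(0)), [name for name, _ in fired]


print(evaluate({"income": Decimal("50000"), "dependents": 2}))
```

Keeping the rules as data, separate from the small `evaluate` loop, is what lets them be listed, audited or changed without touching the engine.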

Of course, it’s no coincidence that one talks about “legal codes”. The word code—which comes from the Latin *codex*—originally referred to systematic collections of legal rules. And when programming came along a couple of millennia later, it used the word “code” because it basically saw itself as similarly setting up rules for how things should work, except now the things had to do with the operation of computers rather than the conduct of worldly affairs.

But now, with our knowledge-based computer language and the idea of a symbolic discourse language, what we’re trying to do is to make it so we can talk about a broad range of worldly affairs in the same kind of way that we talk about computational processes—so we put all those legal codes and contracts into computational form.

How should we think about symbolic discourse language compared to ordinary natural language? In a sense, the symbolic discourse language is a representation in which all the nuance and “poetry” have been “crushed” out of the natural language: it gains precision, but almost inevitably at the cost of that nuance.

If someone says “2+2” to Wolfram|Alpha, it’ll dutifully answer “4”. But what if instead they say, “hey, will you work out 2+2 for me?” Well, that sets up a different mood. But Wolfram|Alpha will take that input and convert it to exactly the same symbolic form as “2+2”, and similarly just respond “4”.

This is exactly the kind of thing that’ll happen all the time with symbolic discourse language. And if the goal is to answer precise questions (or, for that matter, to create a precise legal contract), it’s exactly what one wants. One just needs the hard content that will actually have a consequence for what one’s trying to do, and in this case one doesn’t need the “extras” or “pleasantries”.
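A toy Python sketch of that behavior follows. Two differently worded requests reduce to the same symbolic expression, and only that expression is evaluated. The regular-expression “parser” and the tuple form `("Plus", a, b)` are stand-ins chosen for illustration. They are not how Wolfram|Alpha actually interprets input.

```python
import re

# Matches an addition such as "2+2" or "2 + 2" anywhere in the input.
ADDITION = re.compile(r"(\d+)\s*\+\s*(\d+)")


def to_symbolic(utterance: str):
    """Keep only the hard content (a sum); greetings and pleasantries are discarded."""
    match = ADDITION.search(utterance)
    if match is None:
        return None
    return ("Plus", int(match.group(1)), int(match.group(2)))


def evaluate_symbolic(expr) -> int:
    """Evaluate the symbolic form; the mood of the original wording plays no part."""
    head, *args = expr
    if head == "Plus":
        return sum(args)
    raise ValueError(f"unknown head: {head}")


if __name__ == "__main__":
    plain = to_symbolic("2+2")
    chatty = to_symbolic("hey, will you work out 2+2 for me?")
    print(plain == chatty, plain, evaluate_symbolic(chatty))  # True ('Plus', 2, 2) 4
```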

Of course, what one chooses to capture depends on what one’s trying to do. If one’s trying to get psychological information, then the “mood” of a piece of natural language can be very important. Those “exclamatory statements” (like expletives) carry meaning one cares about. But one can still perfectly well imagine capturing things like that in a symbolic way—for example by having an “emotion track” in one’s symbolic discourse language. (Very coarsely, this might be represented by sentiment or by position in an emotion space—or, for that matter, by a whole symbolic language derived, say, from emoji.)

In actual human communication through natural language, “meaning” is a slippery concept that inevitably depends on the context of the communication, the history of whoever is communicating, and so on. My notion of a symbolic discourse language isn’t to try to magically capture the “true meaning” of a piece of natural language. Instead, my goal is just to capture some meaning that one can then compute with.

For convenience, one might choose to start with natural language, and then try to translate it into the symbolic discourse language. But the point is for the symbolic discourse language to be the real representation: the natural language is just a guide for trying to generate it. And in the end, the notion is that if one really wants to be sure one’s accurate in what one’s saying, one should say it directly in the symbolic discourse language, without ever using natural language.

Back in the 1600s, one of Leibniz’s big concerns was to have a representation that was independent of which natural language people were using (French, German, Latin, etc.). And one feature of a symbolic discourse language is that it has to operate “below” the level of specific natural languages.

There’s a rough kind of universality among human languages, in that it seems to be possible to represent any human concept at least to some approximation in any language. But there are plenty of nuances that are extremely hard to translate—between different languages, or the different cultures that surround them (or even the same language at different times in history). But in the symbolic discourse language, one’s effectively “crushing out” these differences—and getting something that is precise, even though it typically won’t correspond exactly to any particular human natural language.

A symbolic discourse language is about representing things in the world. Natural language is just one way to try to describe those things. But there are others. For example, one might give a picture. One could try to describe certain features of the picture in natural language (“a cat with a hat on its head”)—or one could go straight from the picture to the symbolic discourse language.

In the example of a picture, it’s very obvious that the symbolic discourse language isn’t going to capture everything. Maybe it could capture something like “he is taking the diamond”. But it’s not going to specify the color of every pixel, and it’s not going to describe all conceivable features of a scene at every level of detail.

In some sense, what the symbolic discourse language is doing is to specify a model of the system it’s describing. And like any model, it’s capturing some features, and idealizing others away. But the importance of it is that it provides a solid foundation on which computations can be done, conclusions can be drawn, and actions can be taken.

I’ve been thinking about creating what amounts to a general symbolic discourse language for nearly 40 years. But it’s only recently—with the current state of the Wolfram Language—that I’ve had the framework to actually do it. And it’s also only recently that I’ve understood how to think about the problem in a sufficiently practical way.

Yes, it’s nice in principle to have a symbolic way to represent things in the world. And in specific cases—like answering questions in Wolfram|Alpha—it’s completely clear why it’s worth doing this. But what’s the point of dealing with more general discourse? Like, for example, when do we really want to have a “general conversation” with a machine?

The Turing test says that being able to do this is a sign of achieving general AI. But “general conversations” with machines—without any particular purpose in mind—so far usually seem in practice to devolve quickly into party tricks and Easter eggs. At least that’s our experience looking at interactions people have with Wolfram|Alpha, and it also seems to be the experience with decades of chatbots and the like.

But the picture quickly changes if there’s a purpose to the conversation: if you’re actually trying to get the machine to do something, or learn something from the machine. Still, in most of these cases, there’s no real reason to have a general representation of things in the world; it’s sufficient just to represent specific machine actions, particular customer service goals, or whatever. But if one wants to tackle the general problem of law and contracts, it’s a different story. Because inevitably one’s going to have to represent the full spectrum of human affairs and issues. And so now there’s a definite goal to having a symbolic representation of the world: one needs it to be able to say what should happen and have machines understand it.

Sometimes it’s useful to do that because one wants the machines just to be able to check whether what was supposed to happen actually did; sometimes one wants to actually have the machines automatically enforce or do things. But either way, one needs the machine to be able to represent general things in the world—and so one needs a symbolic discourse language to be able to do this.

In a sense, it’s a very obvious idea to have something like a symbolic discourse language. And indeed it’s an idea that’s come up repeatedly across the course of centuries. But it’s proved a very difficult idea to make work, and it has a history littered with (sometimes quite wacky) failures.

Things in a sense started well. Back in antiquity, logic as discussed by Aristotle provided a very restricted example of a symbolic discourse language. And when the formalism of mathematics began to emerge it provided another example of a restricted symbolic discourse language.

But what about more general concepts in the world? There’d been many efforts, from the Tetractys of the Pythagoreans to the I Ching of the Chinese, to assign symbols or numbers to a few important concepts. But around 1300 Ramon Llull took it further, coming up with a whole combinatorial scheme for representing concepts, and then trying to implement this with circles of paper that could supposedly mechanically determine the validity of arguments, particularly religious ones.

Four centuries later, Gottfried Leibniz was an enthusiast of Llull’s work, at first imagining that perhaps all concepts could be converted to numbers and truth then determined by doing something like factoring into primes. Later, Leibniz started talking about a *characteristica universalis* (or, as Descartes called it, an “alphabet of human thoughts”): essentially a universal symbolic language. But he never really tried to construct such a thing, instead chasing what one might consider “special cases”, including the one that led him to calculus.

With the decline of Latin as the universal natural language in the 1600s, particularly in areas like science and diplomacy, there had already been efforts to invent “philosophical languages” (as they were called) that would represent concepts in an abstract way, not tied to any specific natural language. The most advanced of these was by John Wilkins—who in 1668 produced a book cataloging over 10,000 concepts and representing them using strange-looking glyphs, with a rendering of the Lord’s Prayer as an example.

In some ways these efforts evolved into the development of encyclopedias and later thesauruses, but as language-like systems, they basically went nowhere. Two centuries later, though, as the concept of internationalization spread, there was a burst of interest in constructing new, country-independent languages—and out of this emerged Volapük and then Esperanto. These languages were really just artificial natural languages; they weren’t an attempt to produce anything like a symbolic discourse language. I always used to enjoy seeing signs in Esperanto at European airports, and was disappointed in the 1980s when these finally disappeared. But, as it happens, right around that time, there was another wave of language construction. There were languages like Lojban, intended to be as unambiguous as possible, and ones like the interestingly minimal Toki Pona intended to support the simple life, as well as the truly bizarre Ithkuil, intended to encompass the broadest range of linguistic and supposedly cognitive structures.

Along the way, there were also attempts to simplify languages like English by expressing everything in terms of 1000 or 2000 basic words (instead of the usual 20,000–30,000)—as in the “Simple English” version of Wikipedia or the xkcd Thing Explainer.

There were a few, more formal, efforts. One example was Hans Freudenthal’s 1960 Lincos “language for cosmic intercourse” (i.e. communication with extraterrestrials) which attempted to use the notation of mathematical logic to capture everyday concepts. In the early days of the field of artificial intelligence, there were plenty of discussions of “knowledge representation”, with approaches based variously on the grammar of natural language, the structure of predicate logic or the formalism of databases. Very few large-scale projects were attempted (Doug Lenat’s Cyc being a notable counterexample), and when I came to develop Wolfram|Alpha I was disappointed at how little of relevance to our needs seemed to have emerged.

In a way I find it remarkable that something as fundamental as the construction of a symbolic discourse language should have had so little serious attention paid to it in the past. But at some level it’s not so surprising. It’s a difficult, large project, and it somehow lies in between established fields. It’s not a linguistics project. Yes, it may ultimately illuminate how languages work, but that’s not its main point. It’s not a computer science project because it’s really about content, not algorithms. And it’s not a philosophy project because it’s mostly about specific nitty-gritty and not much about general principles.

There’ve been a few academic efforts in the last half century or so, discussing ideas like “semantic primes” and “natural semantic metalanguage”. Usually such efforts have tried to attach themselves to the field of linguistics—but their emphasis on abstract meaning rather than pure linguistic structure has put them at odds with prevailing trends, and none have turned into large-scale projects.

Outside of academia, there’s been a steady stream of proposals—sometimes promoted by wonderfully eccentric individuals—for systems to organize and name concepts in the world. It’s not clear how far this pursuit has come since Ramon Llull—and usually it’s only dealing with pure ontology, and never with full meaning of the kind that can be conveyed in natural language.

I suppose one might hope that with all the recent advances in machine learning there’d be some magic way to automatically learn an abstract representation for meaning. And, yes, one can take Wikipedia, for example, or a text corpus, and use dimension reduction to derive some effective “space of concepts”. But, not too surprisingly, simple Euclidean space doesn’t seem to be a very good model for the way concepts relate (one can’t even faithfully represent graph distances). And even the problem of taking possible meanings for words—as a dictionary might list them—and breaking them into clusters in a space of concepts doesn’t seem to be easy to do effectively.

Still, as I’ll discuss later, I think there’s a very interesting interplay between symbolic discourse language and machine learning. But for now my conclusion is that there’s not going to be any alternative but to use human judgment to construct the core of any symbolic discourse language that’s intended for humans to use.

But let’s get back to contracts. Today, there are hundreds of billions of them being signed every year around the world (and vastly more being implicitly entered into), though the number of “original” ones that aren’t just simple modifications is probably just in the millions (and is perhaps comparable to the number of original computer programs or apps being written).

So can these contracts be represented in precise symbolic form, as Leibniz hoped 300 years ago? Well, if we can develop a decently complete symbolic discourse language, it should be possible. (Yes, every contract would have to be defined relative to some underlying set of “governing law” rules, etc., that are in some ways like the built-in functions of the symbolic discourse language.)

But what would it mean? Among other things, it would mean that contracts themselves would become computable things. A contract would be converted to a program in the symbolic discourse language. And one could do abstract operations just on this program. And this means one can imagine formally determining—in effect through a kind of generalization of logic—whether, say, a given contract has some particular implication, could ever lead to some particular outcome, or is equivalent to some other contract.
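When a contract’s inputs range over a small finite set, such questions can be settled by brute force. The Python sketch below uses two hypothetical delivery contracts, drafted differently. It checks which inputs lead to a given outcome, and whether the two contracts are equivalent, by enumerating every combination of inputs. This only works because the input space is finite and tiny. It is not a general decision procedure.

```python
from itertools import product


# Two hypothetical delivery contracts, drafted differently.
def contract_a(on_time: bool, quality_ok: bool) -> str:
    if on_time and quality_ok:
        return "pay_full"
    if quality_ok:
        return "pay_discounted"
    return "refund"


def contract_b(on_time: bool, quality_ok: bool) -> str:
    if not quality_ok:
        return "refund"
    return "pay_full" if on_time else "pay_discounted"


# Every possible combination of the two yes/no inputs.
ALL_INPUTS = list(product([False, True], repeat=2))


def inputs_leading_to(contract, outcome: str) -> list[tuple[bool, bool]]:
    """All input combinations under which the contract produces the given outcome."""
    return [inputs for inputs in ALL_INPUTS if contract(*inputs) == outcome]


def counterexample(c1, c2):
    """Return an input on which the contracts disagree, or None if they are equivalent."""
    for inputs in ALL_INPUTS:
        if c1(*inputs) != c2(*inputs):
            return inputs
    return None


if __name__ == "__main__":
    print(inputs_leading_to(contract_a, "refund"))  # [(False, False), (True, False)]
    print(counterexample(contract_a, contract_b))   # None, so the contracts are equivalent
```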

Ultimately, though, there’s a theoretical problem with this. Because questions like this can run into issues of formal undecidability, which means there’s no guarantee that any systematic finite computation will answer them. The same problem arises in reasoning about typical software programs people write, and in practice it’s a mixed bag, with some things being decidable, and others not.

Of course, even in the Wolfram Language as it is today, there are plenty of things (such as the very basic “are these expressions equal?”) that are ultimately in principle undecidable. And there are certainly questions one can ask that run right into such issues. But an awful lot of the kinds of questions that people naturally ask turn out to be answerable with modest amounts of computation. And I wouldn’t be surprised if this were true for questions about contracts too. (It’s worth noting that human-formulated questions tend to run into undecidability much less than questions picked, say, at random from the whole computational universe of possibilities.)

If one has contracts in computational form, there are other things one can expect to do too. Like to be able to automatically work out what the contracts imply for a large range of possible inputs. The 1980s revolution in quantitative finance started when it became clear one could automatically compute distributions of outcomes for simple options contracts. If one had lots (perhaps billions) of contracts in computational form, there’d be a lot more that could be done along these lines—and no doubt, for better or worse, whole new areas of financial engineering that could be developed.
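Here is a minimal Python sketch of that kind of computation. It simulates many possible prices at expiry and looks at the resulting distribution of payouts for a call option. It assumes a lognormal price model with zero interest rates and made-up parameters. It is a textbook Monte Carlo illustration, not a production pricing method.

```python
import math
import random
import statistics


def simulate_call_payouts(spot: float, strike: float, volatility: float,
                          years: float, n_paths: int, seed: int = 0) -> list[float]:
    """Sample call payouts at expiry under a zero-interest-rate lognormal price model."""
    rng = random.Random(seed)  # seeded so results are repeatable
    drift = -0.5 * volatility**2 * years  # keeps the expected terminal price equal to spot
    scale = volatility * math.sqrt(years)
    payouts = []
    for _ in range(n_paths):
        terminal = spot * math.exp(drift + scale * rng.gauss(0.0, 1.0))
        payouts.append(max(terminal - strike, 0.0))
    return payouts


def summarize(payouts: list[float]) -> dict[str, float]:
    """A few features of the outcome distribution that a party to the contract might care about."""
    return {
        "mean_payout": statistics.fmean(payouts),
        "prob_pays_out": sum(p > 0.0 for p in payouts) / len(payouts),
        "p95_payout": statistics.quantiles(payouts, n=20)[-1],  # 95th percentile
    }


if __name__ == "__main__":
    payouts = simulate_call_payouts(spot=100.0, strike=100.0, volatility=0.2,
                                    years=1.0, n_paths=100_000)
    print(summarize(payouts))
```

With these parameters the mean payout should land close to the closed-form Black–Scholes value for a zero-rate, at-the-money, one-year call with 20% volatility, which is roughly 7.97.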

OK, so let’s say one has a computational contract. What can one directly do with it? Well, it depends somewhat on what the form of its inputs is. One important possibility is that they’re in a sense “born computational”: that they’re immediately statements about a computational system (“how many accesses has this ID made today?”, “what is the ping time for this connection?”, “how much bitcoin got transferred?”, etc.). And in that case, it should be possible to immediately and unambiguously “evaluate” the contract—and find out if it’s being satisfied.
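Here is a minimal Python sketch of such an immediate evaluation for a hypothetical service-level clause: at most 1000 accesses per day, and at least 99% of measured ping times at or under 100 ms. The thresholds and the shape of the observation record are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observations:
    accesses_today: int         # e.g. counted from an access log
    ping_times_ms: list[float]  # e.g. collected by a network monitor


def evaluate_sla(obs: Observations, max_accesses: int = 1000,
                 max_ping_ms: float = 100.0,
                 required_fraction: float = 0.99) -> dict[str, bool]:
    """Evaluate each clause directly on machine-generated data, then the contract as a whole."""
    access_ok = obs.accesses_today <= max_accesses
    if obs.ping_times_ms:
        fast = sum(1 for t in obs.ping_times_ms if t <= max_ping_ms)
        latency_ok = fast / len(obs.ping_times_ms) >= required_fraction
    else:
        latency_ok = False  # no measurements: treat the latency clause as unverified
    return {"access_ok": access_ok, "latency_ok": latency_ok,
            "satisfied": access_ok and latency_ok}


if __name__ == "__main__":
    obs = Observations(accesses_today=850, ping_times_ms=[20.0] * 199 + [250.0])
    print(evaluate_sla(obs))  # {'access_ok': True, 'latency_ok': True, 'satisfied': True}
```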

This is something that’s very useful for lots of purposes—both for humans interacting with machines, and machines interacting with machines. In fact, there are plenty of cases where versions of it are already in use. One can think of computer security provisions such as firewall rules as one example. There are others that are gradually emerging, such as automated SLAs (service-level agreements) and automated terms-of-service. (I’m certainly hoping our company, for example, will be able to make these a routine part of our business practices before too long.)

But, OK, it’s certainly not true that every input for every contract is “born computational”: plenty of inputs have to come from seeing what happens in the “outside” world (“did the person actually go to place X?”, “was the package maintained in a certain environment?”, “did the information get leaked to social media?”, “is the parrot dead?”, etc.). And the first thing to say is that in modern times it’s become vastly easier to automatically determine things about the world, not least because one can just make measurements with sensors. Check the GPS trace. Look at the car counting sensor. And so on. The whole Internet of Things is out there to provide input about the real world for computational contracts.

Having said this, though, there’s still an issue. Yes, with a GPS trace there’s a definite answer (assuming the GPS is working properly) for whether someone or something went to a particular place. But let’s say one’s trying to determine something less obviously numerical. Let’s say, for example, that one’s trying to determine whether a piece of fruit should be considered “Fancy Grade” or not. Well, given some pictures of the piece of fruit, an expert can pretty unambiguously tell. But how can we make this computational?

Well, here’s a place where we can use modern machine learning. We can set up some neural net, say in the Wolfram Language, and then show it lots of examples of fruit that’s Fancy Grade and that’s not. And from my experience (and that of our customers!), most of the time we’ll get a system that’s really good at a task like grading fruit. It’ll certainly be much faster than humans, and it’ll probably be more reliable and more consistent too.

And this gives a whole new way to set up contracts about things in the world. Two parties can just agree that the contract should say “if the machine learning system says X then do Y”. In a sense it’s like any other kind of computational contract: the machine learning system is just a piece of code. But it’s a little different. Because normally one expects that one can readily examine everything that a contract says: one can in effect read and understand the code. But with machine learning in the middle, there can no longer be any expectation of that.

Nobody specifically set up all those millions of numerical weights in the neural net; they were just determined by some approximate and somewhat random process from whatever training data was given. Yes, in principle we can measure everything about what’s happening inside the neural net. But there’s no reason to expect that we’ll ever be able to get an understandable explanation (or prediction) of what the net will do in any particular case. Most likely it’s an example of the phenomenon I call computational irreducibility, which means there really isn’t any way to see what will happen much more efficiently than just by running it.

What’s the difference from asking a human expert, then, whose thought processes one can’t understand? Well, in practice machine learning is much faster, so one can make much more use of “expert judgment”. And one can set things up so they’re repeatable, and one can for example systematically test for biases one thinks might be there, and so on.

Of course, one can always imagine cheating the machine learning. If it’s repeatable, one could use machine learning itself to try to learn cases where it would fail. And in the end it becomes rather like computer security, where holes are being found, patches are being applied, and so on. And in some sense this is no different from the typical situation with contracts too: one tries to cover all situations, then it becomes clear that something hasn’t been correctly addressed, and one tries to write a new contract to address it, and so on.

But the important bottom line is that with machine learning one can expect to get “judgment oriented” input into contracts. I expect the typical pattern will be this: in the contract there’ll be something stated in the symbolic discourse language (like “X will personally do Y”). And at the level of the symbolic discourse language there’ll be a clear meaning to this, from which, for example, all sorts of implications can be drawn. But then there’s the question of whether what the contract said is actually what happened in the real world. And, sure, there can be lots of sensor data that gives information on this. But in the end there’ll be a “judgment call” that has to be made. Did the person actually personally do this? Well—like for a remote exam proctoring system—one can have a camera watching the person, one can record their pattern of keystrokes, and maybe even measure their EEG. But something’s got to synthesize this data, and make the judgment call about what happened, and turn this in effect into a symbolic statement. And in practice I expect it will typically end up being a machine learning system that does this.

OK, so let’s say we’ve got ways to set up computational contracts. How can we enforce them? Well, ones that basically just involve computational processes can at some level enforce themselves. A particular piece of software can be built to issue licenses only in such-and-such a way. A cloud system can be built to make a download available only if it receives a certain amount of bitcoin. And so on.

But how far do we trust what’s going on? Maybe someone hacked the software, or the cloud. How can we be sure nothing bad has happened? The basic answer is to use the fact that the world is a big place. As a (sometime) physicist it makes me think of measurement in quantum mechanics. If we’re just dealing with a little quantum effect, there’s always interference that can happen. But when we do a real measurement, we’re amplifying that little quantum effect to the point where so many things (atoms, etc.) are involved that it’s unambiguous what happened—in much the same way as the Second Law of Thermodynamics makes it inconceivable that all the air molecules in a room will spontaneously line up on one side.

And so it is with bitcoin, Ethereum, etc. The idea is that some particular thing that happened (“X paid Y such-and-such” or whatever) is shared and recorded in so many places that there can’t be any doubt about it. Yes, it’s in principle possible that all the few thousand places that actually participate in something like bitcoin today could collude to give a fake result. But the idea is that it’s like with gas molecules in a room: the probability is inconceivably small. (As it happens, my Principle of Computational Equivalence suggests that there’s more than an analogy with the gas molecules, and that actually the underlying principles at work are basically exactly the same. And, yes, there are lots of interesting technical details about the operation of distributed blockchain ledgers, distributed consensus protocols, etc., but I’m not going to get into them here.)
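The flavor of that argument shows up in a deliberately oversimplified model. Suppose each of n record-keeping nodes is independently compromised with probability p, and ask how likely it is that more than half are compromised at once. The independence assumption and the numbers are purely illustrative. Real systems such as bitcoin rest on different mechanisms, for example proof-of-work hash power rather than a simple count of nodes. The sketch uses exact fractions because the probabilities quickly become too small for floating point.

```python
from fractions import Fraction
from math import comb, log10


def prob_majority_compromised(n_nodes: int, p: Fraction) -> Fraction:
    """Exact P(more than half of n independent nodes are compromised), each with probability p."""
    q = 1 - p
    threshold = n_nodes // 2 + 1
    return sum(
        (comb(n_nodes, k) * p**k * q**(n_nodes - k)
         for k in range(threshold, n_nodes + 1)),
        Fraction(0),
    )


def log10_of(x: Fraction) -> float:
    """log10 of a positive Fraction, usable even when the value would underflow a float."""
    return log10(x.numerator) - log10(x.denominator)


if __name__ == "__main__":
    p = Fraction(1, 10)  # hypothetical per-node compromise probability
    for n in (3, 11, 101, 1001):
        print(n, round(log10_of(prob_majority_compromised(n, p)), 1))
```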

It’s popular these days to talk about “smart contracts”. When I’ve been talking about “computational contracts” I mean contracts that can be expressed computationally. But by “smart contracts” people usually mean contracts that can both be expressed computationally and execute automatically. Most often the idea is to set up a smart contract in a distributed computation environment like Ethereum, and then to have the code in the contract evaluate based on inputs from the computation environment.

Sometimes the input is intrinsic—like the passage of time (who could possibly tamper with the clock of the whole internet?), or physically generated random numbers. And in cases like this, one has fairly pure smart contracts, say for paying subscriptions, or for running distributed lotteries.

But more often there has to be some input from the outside—from something that happens in the world. Sometimes one just needs public information: the price of a stock, the temperature at a weather station, or a seismic event like a nuclear explosion. But somehow the smart contract needs access to an “oracle” that can give it this information. And conveniently enough, there is one good such oracle available in the world: Wolfram|Alpha. And indeed Wolfram|Alpha is becoming widely used as an oracle for smart contracts. (Yes, our general public terms of service say you currently just shouldn’t rely on Wolfram|Alpha for anything you consider critical—though hopefully soon those terms of service will get more sophisticated, and computational.)

But what about non-public information from the outside world? The current thinking for smart contracts tends to be that one has to get humans in the loop to verify the information: that in effect one has to have a jury (or a democracy) to decide whether something is true. But is that really the best one can do? I tend to suspect there’s another path, that’s like using machine learning to inject human-like judgment into things. Yes, one can use people, with all their inscrutable and hard-to-systematically-influence behavior. But what if one replaces those people in effect by AIs—or even a collection of today’s machine-learning systems?

One can think of a machine-learning system as being a bit like a cryptosystem. To attack it and spoof its input one has to do something like inverting how it works. Well, given a single machine-learning system there’s a certain effort needed to achieve this. But if one has a whole collection of sufficiently independent systems, the effort goes up. It won’t be good enough just to change a few parameters in the system. But if one just goes out into the computational universe and picks systems at random, then I think one can expect to have the same kind of independence as by having different people. (To be fair, I don’t yet quite know how to apply the mining of the computational universe that I’ve done for programs like cellular automata to the case of systems like neural nets.)
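Here is a minimal Python sketch of the redundancy idea. Several graders vote, and a verdict is accepted only if a quorum agrees, so spoofing any one of them is not enough to change the outcome. The graders are hypothetical threshold rules on invented fruit features. They stand in for independently trained machine-learning systems and are not real classifiers.

```python
from collections import Counter
from typing import Callable, Optional

Observation = dict[str, float]
Grader = Callable[[Observation], str]


def ensemble_verdict(graders: list[Grader], obs: Observation, quorum: int) -> Optional[str]:
    """Return the most common label if at least `quorum` graders agree on it, else None."""
    votes = Counter(grader(obs) for grader in graders)
    label, count = votes.most_common(1)[0]
    return label if count >= quorum else None


# Hypothetical graders, each looking at a different (invented) feature of the fruit.
HONEST_GRADERS: list[Grader] = [
    lambda o: "fancy" if o["color_score"] >= 0.8 else "standard",
    lambda o: "fancy" if o["blemishes"] <= 1 else "standard",
    lambda o: "fancy" if o["diameter_mm"] >= 70 else "standard",
    lambda o: "fancy" if o["symmetry"] >= 0.9 else "standard",
]


def spoofed_grader(o: Observation) -> str:
    """A grader an attacker has compromised: it always says what the attacker wants."""
    return "fancy"


if __name__ == "__main__":
    fruit = {"color_score": 0.5, "blemishes": 4, "diameter_mm": 65, "symmetry": 0.7}
    graders = HONEST_GRADERS + [spoofed_grader]
    print(ensemble_verdict(graders, fruit, quorum=3))  # standard
```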

There’s another point as well: if one has a sufficiently dense net of sensors in the world, then it becomes increasingly easy to be sure about what’s happened. If there’s just one motion sensor in a room, it might be easy to cover it. And maybe even if there are several sensors, it’s still possible to avoid them, *Mission Impossible*-style. But if there are enough sensors, then by synthesizing information from them one can inevitably build up an understanding of what actually happened. In effect, one has a model of how the world works, and with enough sensors one can validate that the model is correct.

It’s not surprising, but it always helps to have redundancy. More nodes to ensure the computation isn’t tampered with. More machine-learning algorithms to make sure they aren’t spoofed. More sensors to make sure they’re not fooled. But in the end, there has to be something that says what should happen—what the contract is. And the contract has to be expressed in some language in which there are definite concepts. So somehow from the various redundant systems one has in the world, one has to make a definite conclusion—one has to turn the world into something symbolic, on which the contract can operate.

Let’s say we have a good symbolic discourse language. Then how should contracts actually get written in it?

One approach is to take existing contracts written in English or any other natural language, and try to translate (or parse) them into the symbolic discourse language. Well, what will happen is somewhat like what happens with Wolfram|Alpha today. The translator will not know exactly what the natural language was supposed to mean, and so it will give several possible alternatives. Maybe there was some meaning that the original writer of the natural-language contract had in mind. But maybe the “poetry” of that meaning can’t be expressed in the symbolic discourse language: it requires something more definite. And a human is going to have to decide which alternative to pick.

Translating from natural-language contracts may be a good way to start, but I suspect it will quickly give way to writing contracts directly in the symbolic discourse language. Today lawyers have to learn to write legalese. In the future, they’re going to have to learn to write what amounts to code: contracts expressed precisely in a symbolic discourse language.

One might think that writing everything as code, rather than natural-language legalese, would be a burden. But my guess is that it will actually be a great benefit. And it’s not just because it will let contracts operate more easily. It’s also that it will help lawyers think better about contracts. It’s an old claim (the Sapir–Whorf hypothesis) that the language one uses affects the way one thinks. And this is no doubt somewhat true for natural languages. But in my experience it’s dramatically true for computer languages. And indeed I’ve been amazed over the years at how my thinking has changed as we’ve added more to the Wolfram Language. When I didn’t have a way to express something, it didn’t enter my thinking. But once I had a way to express it, I could think in terms of it.

And so it will be, I believe, for legal thinking. When there’s a precise symbolic discourse language, it’ll become possible to think more clearly about all sorts of things.

Of course, in practice it’ll help that there’ll no doubt be all sorts of automated annotation: “if you add that clause, it’ll imply X, Y and Z”, etc. It’ll also help that it’ll routinely be possible to take some contract and simulate its consequences for a range of inputs. Sometimes one will want statistical results (“is this biased?”). Sometimes one will want to hunt for particular “bugs” that will only be found by trying lots of inputs.
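As a small illustration of hunting for such bugs, the Python sketch below sweeps a hypothetical tiered refund clause over every delay from 0 to 59 days. It flags any place where a longer delay earns a smaller refund. The clause, its tiers and the monotonicity property being checked are all invented for the example.

```python
def refund(days_late: int) -> float:
    """Hypothetical tiered refund clause, drafted with an unintended cliff."""
    if days_late <= 0:
        return 0.0
    if days_late < 10:
        return 10.0 * days_late  # 10 per day late, reaching 90 at 9 days
    if days_late < 30:
        return 50.0              # drafting slip: less than the 90 paid for 9 days late
    return 200.0


def find_decreases(clause, inputs: list[int]) -> list[tuple[int, int]]:
    """Pairs of consecutive inputs where a worse situation yields a smaller payout."""
    return [(a, b) for a, b in zip(inputs, inputs[1:]) if clause(b) < clause(a)]


if __name__ == "__main__":
    print(find_decreases(refund, list(range(60))))  # [(9, 10)]
```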

Yes, one can read a contract in natural language, like one can read a math paper. But if one really wants to know its implications one needs it in computational form, so one can run it and see what it implies—and also so one can give it to a computer to implement.

Back in ancient Babylon it was a pretty big deal when there started to be written laws like the Code of Hammurabi. Of course, with very few people able to read, there was all sorts of clunkiness at first—like having people recite the laws in order from memory. Over the centuries things got more streamlined, and then about 500 years ago, with the advent of widespread literacy, laws and contracts started to be able to get more complex (which among other things allowed them to be more nuanced, and to cover more situations).

In recent decades the trend has accelerated, particularly now that it’s so easy to copy and edit documents of any length. But things are still limited by the fact that humans are in the loop, authoring and interpreting the documents. Back 50 years ago, pretty much the only way to define a procedure for anything was to write it down, and have humans implement it. But then along came computers, and programming. And very soon it started to be possible to define vastly more complex procedures—to be implemented not by humans, but instead by computers.

And so, I think, it will be with law. Once computational law becomes established, the complexity of what can be done will increase rapidly. Typically a contract defines some model of the world, and specifies what should happen in different situations. Today the logical and algorithmic structure of models defined by contracts still tends to be fairly simple. But with computational contracts it’ll be feasible for them to be much more complex—so that they can for example more faithfully capture how the world works.

Of course, that just makes defining what should happen even more complex—and before long it might feel a bit like constructing an operating system for a computer that tries to cover all the different situations the computer might find itself in.

In the end, though, one’s going to have to say what one wants. One might be able to get a certain distance by just giving specific examples. But ultimately I think one’s going to have to use a symbolic discourse language that can express a higher level of abstraction.

Sometimes one will be able to just write everything in the symbolic discourse language. But often, I suspect, one will use the symbolic discourse language to define what amount to goals, and then one will have to use machine-learning kinds of methods to fill in how to define a contract that actually achieves them.

And as soon as there’s computational irreducibility involved, it’ll typically be impossible to know for sure that there are no bugs, or “unintended consequences”. Yes, one can do all kinds of automated tests. But in the end it’s theoretically impossible to have any finite procedure that can guarantee to check all possibilities.

Today there are plenty of legal situations that are too complex to handle without expert lawyers. And in a world where computational law is common, it won’t just be convenient to have computers involved, it’ll be necessary.

In a sense it’s similar to what’s already happened in many areas of engineering. Back when humans had to design everything themselves, humans could typically understand the structures that were being built. But once computers are involved in design it becomes inevitable that they’re needed in figuring out how things work too.

Today a fairly complex contract might involve a hundred pages of legalese. But once there’s computational law—and particularly contracts constructed automatically from goals—the lengths are likely to increase rapidly. At some level it won’t matter, though—just as it doesn’t really matter how long the code of a program one’s using is. Because the contract will in effect just be run automatically by computer.

Leibniz saw computation as a simplifying element in the practice of law. And, yes, some things will become simpler and better defined. But a vast ocean of complexity will also open up.

How should one tell an AI what to do? Well, you have to have some form of communication that both humans and AIs can understand—and that is rich enough to describe what one wants. And as I’ve described elsewhere, what I think this basically means is that one has to have a knowledge-based computer language—which is precisely what the Wolfram Language is—and ultimately one needs a full symbolic discourse language.

But, OK, so one tells an AI to do something, like “go get some cookies from the store”. But what one says inevitably won’t be complete. The AI has to operate within some model of the world, and with some code of conduct. Maybe it can figure out how to steal the cookies, but it’s not supposed to do that; presumably one wants it to follow the law, or a certain code of conduct.

And this is where computational law gets really important: because it gives us a way to provide that code of conduct in a way that AIs can readily make use of.

In principle, we could have AIs ingest the complete corpus of laws and historical cases and so on, and try to learn from these examples. But as AIs become more and more important in our society, it’s going to be necessary to define all sorts of new laws, and many of these are likely to be “born computational”, not least, I suspect, because they’ll be too algorithmically complex to be usefully described in traditional natural language.

There’s another problem too: we really don’t just want AIs to follow the letter of the law (in whatever venue they happen to be), we want them to behave ethically too, whatever that may mean. Even if it’s within the law, we probably don’t want our AIs lying and cheating; we want them somehow to enhance our society along the lines of whatever ethical principles we follow.

Well, one might think, why not just teach AIs ethics like we could teach them laws? In practice, it’s not so simple. Because whereas laws have been somewhat decently codified, the same can’t be said for ethics. Yes, there are philosophical and religious texts that talk about ethics. But it’s a lot vaguer and less extensive than what exists for law.

Still, if our symbolic discourse language is sufficiently complete, it certainly should be able to describe ethics too. And in effect we should be able to set up a system of computational laws that defines a whole code of conduct for AIs.

But what should it say? One might have a few immediate ideas. Perhaps one could combine all the ethical systems of the world. Obviously hopeless. Perhaps one could have the AIs just watch what humans do and learn their system of ethics from it. Similarly hopeless. Perhaps one could try something more local, where the AIs switch their behavior based on geography, cultural context, etc. (think “protocol droid”). Perhaps useful in practice, but hardly a complete solution.

So what can one do? Well, perhaps there are a few principles one might agree on. For example, at least the way we think about things today, most of us don’t want humans to go extinct (of course, maybe in the future, having mortal beings will be thought too disruptive, or whatever). And actually, while most people think there are all sorts of things wrong with our current society and civilization, people usually don’t want it to change too much, and they definitely don’t want change forced upon them.

So what should we tell the AIs? It would be wonderful if we could just give the AIs some simple set of almost axiomatic principles that would make them always do what we want. Maybe they could be based on Asimov’s Three Laws of Robotics. Maybe they could be something seemingly more modern based on some kind of global optimization. But I don’t think it’s going to be that easy.

The world is a complicated place; if nothing else, that’s basically guaranteed by the phenomenon of computational irreducibility. And it’s pretty much inevitable that there’s not going to be any finite procedure that’ll force everything to “come out the way one wants” (whatever that may be).

Let me take a somewhat abstruse, but well defined, example from mathematics. We think we know what integers are. But to really be able to answer all questions about integers (including about infinite collections of them, etc.) we need to set up axioms that define how integers work. And that’s what Giuseppe Peano tried to do in the late 1800s. For a while it looked good, but then in 1931 Kurt Gödel surprised the world with his Incompleteness Theorem, which implied, among other things, that actually, try as one might, there was never going to be a finite set of axioms that would define the integers as we expect them to be, and nothing else.

In some sense, Peano’s original axioms actually got quite close to defining just the integers we want. But Gödel showed that they also allow bizarre non-standard integers, where for example the operation of addition isn’t finitely computable.

Well, OK, that’s abstract mathematics. What about the real world? Well, one of the things that we’ve learned since Gödel’s time is that the real world can be thought of in computational terms, pretty much just like the mathematical systems Gödel considered. And in particular, one can expect the same phenomenon of computational irreducibility (which itself is closely related to Gödel’s Theorem). And the result of this is that whatever simple intuitive goal we may define, it’s pretty much inevitable we’ll have to build up what amounts to an arbitrarily complicated collection of rules to try to achieve it—and whatever we do, there’ll always be at least some “unintended consequences”.
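A standard small illustration of computational irreducibility is the rule 30 cellular automaton: each cell’s next value is its left neighbor XOR (itself OR its right neighbor). As far as is known, there is no formula that gives, say, the center cell at step n without essentially running all n steps, so the Python sketch below simply runs them. It illustrates the phenomenon; it is not a proof that no shortcut exists.

```python
def rule30_step(cells):
    """One step of rule 30: new cell = left XOR (center OR right).
    Cells beyond either end are treated as 0 (white)."""
    padded = [0] + cells + [0]
    return [padded[i - 1] ^ (padded[i] | padded[i + 1])
            for i in range(1, len(padded) - 1)]

def center_column(steps):
    """Center-cell values for the first `steps` rows, starting from one black cell.
    The row is 2*steps+1 wide, so the growing pattern never reaches the edges."""
    cells = [0] * (2 * steps + 1)
    cells[steps] = 1
    column = []
    for _ in range(steps):
        column.append(cells[steps])
        cells = rule30_step(cells)
    return column

# Draw the first rows as text: '#' for black, ' ' for white.
rows = 16
cells = [0] * (2 * rows + 1)
cells[rows] = 1
for _ in range(rows):
    print("".join("#" if c else " " for c in cells))
    cells = rule30_step(cells)

print("center column:", "".join(map(str, center_column(32))))
```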

None of this should really come as much of a surprise. After all, if we look at actual legal systems as they’ve evolved over the past couple of thousand years, there always end up being a lot of laws. It’s not like there’s a single principle from which everything else can be derived; there inevitably end up being lots of different situations that have to be covered.

But is all this complexity just a consequence of the “mechanics” of how the world works? Imagine—as one expects—that AIs get more and more powerful. And that more and more of the systems of the world, from money supplies to border controls, are in effect put in the hands of AIs. In a sense, then, the AIs play a role a little bit like governments, providing an infrastructure for human activities.

So, OK, perhaps we need a “constitution” for the AIs, just like we set up constitutions for governments. But again the question comes: what should the constitution have in it?

Let’s say that the AIs could mold human society in pretty much any way. How would we want it molded? Well, that’s an old question in political philosophy, debated since antiquity. At first an idea like utilitarianism might sound good: somehow maximize the well-being of as many people as possible. But imagine actually trying to do this with AIs that in effect control the world. Immediately one is thrust into concrete versions of questions that philosophers and others have debated for centuries. Let’s say one can sculpt the probability distribution for happiness among people in the world. Well, now we’ve got to get precise about whether it’s the mean or the median or the mode or a quantile or, for that matter, the kurtosis of the distribution that we’re trying to maximize.
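A toy calculation shows why the choice of statistic is not a technicality. The two “happiness” distributions below are invented for illustration (the same nine people, scores from 0 to 10, under two hypothetical policies). The mean and the mode favor policy A, while the median and the lower quartile favor policy B, so “maximize happiness” gives different answers depending on which summary one picks. Kurtosis is left out only because Python’s `statistics` module does not provide it.

```python
import statistics

# Invented happiness scores (0-10) for the same nine people under two policies.
policy_a = [0, 0, 1, 1, 2, 10, 10, 10, 10]
policy_b = [3, 4, 4, 4, 5, 5, 5, 5, 5]

def summarize(scores):
    """A few candidate 'objective functions' computed from one distribution."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "lower quartile": statistics.quantiles(scores, n=4)[0],
    }

summary_a, summary_b = summarize(policy_a), summarize(policy_b)
for name in summary_a:
    a, b = summary_a[name], summary_b[name]
    winner = "A" if a > b else "B" if b > a else "tie"
    print(f"{name:>14}: A={a:5.2f}  B={b:5.2f}  -> favors {winner}")
```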

No doubt one can come up with rhetoric that argues for some particular choice. But there just isn’t an abstract “right answer”. Yes, we can have a symbolic discourse language that expresses any choice. But there’s no mathematical derivation of the answer and there’s no law of nature that forces a particular answer. I suppose there could be a “best answer given our biological nature”. But as things advance, this won’t be on solid ground either, as we increasingly manage to use technology to transcend the biology that evolution has delivered to us.

Still, we might argue, there’s at least one constraint: we don’t want a scheme where we’ll go extinct—and where nothing will in the end exist. Even this is going to be a complicated thing to discuss, because we need to say what the “we” here is supposed to be: just how “evolved” relative to the current human condition can things be, and not consider “us” to have gone extinct?

But even independent of this, there’s another issue: given any particular setup, computational irreducibility can make it in a sense irreducibly difficult to find out its consequences. And so in particular, given any specific optimization criterion (or constitution), there may be no finite procedure that will determine whether it allows for infinite survival, or whether in effect it implies civilization will “halt” and go extinct.

OK, so things are complicated. What can one actually do? For a little while there’ll probably be the notion that AIs must ultimately have human owners, who must act according to certain principles, following the usual way human society operates. But realistically this won’t last long.

Who would be responsible for a public-domain AI system that’s spread across the internet? What happens when the bots it spawns start misbehaving on social media (yes, the notion that social media accounts are just for humans will soon look very “early 21st century”)?

Of course, there’s an important question of why AIs should “follow the rules” at all. After all, humans certainly don’t always do that. It’s worth remembering, though, that we humans are probably a particularly difficult case: after all, we’re the product of a multibillion-year process of natural selection, in which there’s been a continual competitive struggle for survival. AIs are presumably coming into the world in very different circumstances, and without the same need for “brutish instincts”. (Well, I can’t help thinking of AIs from different companies or countries being imbued by their creators with certain brutish instincts, but that’s surely not a necessary feature of AI existence.)

In the end, though, the best hope for getting AIs to “follow the rules” is probably by more or less the same mechanism that seems to maintain human society today: that following the rules is the way some kind of dynamic equilibrium is achieved. But if we can get the AIs to “follow the rules”, we still have to define what the rules—the AI Constitution—should be.

And, of course, this is a hard problem, with no “right answer”. But perhaps one approach is to see what’s happened historically with humans. And one important and obvious thing is that there are different countries, with different laws and customs. So perhaps at the very least we have to expect that there’d be multiple AI Constitutions, not just one.

Even looking at countries today, an obvious question is how many there should be. Is there some easy way to say that—with technology as it exists, for example—7 billion people should be expected to organize themselves into about 200 countries?

It sounds a bit like asking how many planets the solar system should end up with. For a long time this was viewed as a “random fact of nature” (and widely used by philosophers as an example of something that, unlike 2+2=4, doesn’t “have to be that way”). But particularly having seen so many exoplanet systems, it’s become clear that our solar system actually pretty much has to have about the number of planets it does.

And maybe after we’ve seen the sociologies of enough video-game virtual worlds, we’ll know something about how to “derive” the number of countries. But of course it’s not at all clear that AI Constitutions should be divided anything like countries.

The physicality of humans has the convenient consequence that at least at some level one can divide the world geographically. But AIs don’t need to have that kind of spatial locality. One can imagine some other schemes, of course. Like let’s say one looks at the space of personalities and motivations, and finds clusters in it. Perhaps one could start to say “here’s an AI Constitution for that cluster” and so on. Maybe the constitutions could fork, perhaps almost arbitrarily (a “Git-like model of society”). I don’t know how things like this would ultimately work, but they seem more plausible than what amounts to a single, consensus, AI Constitution for everywhere and everyone.

There are so many issues, though. Like here’s one. Let’s assume AIs are the dominant power in our world. But let’s assume that they successfully follow some constitution or constitutions that we’ve defined for them. Well, that’s nice—but does it mean nothing can ever change in the world? I mean, just think if we were still all operating according to laws that had been set up 200 years ago: most of society has moved on since then, and wants different laws (or at least different interpretations) to reflect its principles.

But what if precise laws for AIs were burnt in around the year 2020, for all eternity? Well, one might say, real constitutions always have explicit clauses that allow for their own modification (in the US Constitution it’s Article V). But looking at the actual constitutions of countries around the world isn’t terribly encouraging. Some just say basically that the constitution can be changed if some supreme leader (a person) says so. Many say that the constitution can be changed through some democratic process—in effect by some sequence of majority or similar votes. And some basically define a bureaucratic process for change so complex that one wonders if it’s formally undecidable whether it would ever come to a conclusion.

At first, the democratic scheme seems like an obvious winner. But it’s fundamentally based on the concept that people are somehow easy to count (of course, one can argue about which people, etc.). But what happens when personhood gets more complicated? When, for example, there are in effect uploaded human consciousnesses, deeply intertwined with AIs? Well, one might say, there’s always got to be some “indivisible person” involved. And yes, I can imagine little clumps of pineal gland cells that are maintained to define “a person”, just like in the past they were thought to be the seat of the soul. But from the basic science I’ve done I think I can say for certain that none of this will ultimately work—because in the end the computational processes that define things just don’t have this kind of indivisibility.

So what happens to “democracy” when there are no longer “people to count”? One can imagine all sorts of schemes, involving identifying the density of certain features in “people space”. I suppose one can also imagine some kind of bizarre voting involving transfinite numbers of entities, in which perhaps the axiomatization of set theory has a key effect on the future of history.

It’s an interesting question how to set up a constitution in which change is “burned in”. There’s a very simple example in bitcoin, where the protocol just defines by fiat that the reward for mining new bitcoin halves at fixed intervals (every 210,000 blocks, or roughly every four years). Of course, that setup is in a sense based on a model of the world—and in particular on something like Moore’s Law and the apparent short-term predictability of technological development. But following the same general idea, one might start thinking about a constitution that says “change 1% of the symbolic code in this every year”. But then one’s back to having to decide “which 1%?”. Maybe it’d be based on usage, or observations of the world, or some machine-learning procedure. But whatever algorithm or meta-algorithm is involved, there’s still at some point something that has to be defined once and for all.
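Bitcoin’s schedule is a concrete case of change fixed in advance. In the protocol the block subsidy starts at 50 BTC and halves every 210,000 blocks (roughly every four years). The Python below is a simplified re-implementation of that rule, modeled on the integer arithmetic of Bitcoin Core’s `GetBlockSubsidy`, not the reference code itself; amounts are in satoshis (100,000,000 per bitcoin).

```python
SATOSHIS_PER_BTC = 100_000_000
INITIAL_SUBSIDY = 50 * SATOSHIS_PER_BTC  # 50 BTC per block at launch
HALVING_INTERVAL = 210_000               # blocks between halvings

def block_subsidy(height: int) -> int:
    """New coins (in satoshis) created by the block at this height."""
    halvings = height // HALVING_INTERVAL
    # Bitcoin Core returns 0 explicitly here, since shifting a 64-bit integer
    # by 64 or more bits is undefined in C++ (Python would just give 0).
    if halvings >= 64:
        return 0
    return INITIAL_SUBSIDY >> halvings  # integer halving, rounding down

def total_supply() -> int:
    """Add up every subsidy era until the subsidy rounds down to zero."""
    total, era = 0, 0
    while (subsidy := block_subsidy(era * HALVING_INTERVAL)) > 0:
        total += subsidy * HALVING_INTERVAL
        era += 1
    return total

for era in range(5):
    height = era * HALVING_INTERVAL
    print(f"from block {height:>9,}: {block_subsidy(height) / SATOSHIS_PER_BTC} BTC")
print(f"total ever issued: {total_supply() / SATOSHIS_PER_BTC} BTC")
```

The entire rule is two constants and a bit shift, fixed once and for all at launch.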

Can one make a general theory of change? At first, this might seem hopeless. But in a sense exploring the computational universe of programs is like seeing a spectrum of all possible changes. And there’s definitely some general science that can be done on such things. And maybe there’s some setup—beyond just “fork whenever there could be a change”—that would let one have a constitution that appropriately allows for change, as well as changing the way one allows for change, and so on.

OK, we’ve talked about some far-reaching and foundational issues. But what about the here and now? Well, I think the exciting thing is that 300 years after Gottfried Leibniz died, we’re finally in a position to do what he dreamed of: to create a general symbolic discourse language, and to apply it to build a framework for computational law.

With the Wolfram Language we have the foundational symbolic system—as well as a lot of knowledge of the world—to start from. There’s still plenty to do, but I think there’s now a definite path forward. And it really helps that in addition to the abstract intellectual challenge of creating a symbolic discourse language, there’s now also a definite target in mind: being able to set up practical systems for computational law.

It’s not going to be easy. But I think the world is ready for it, and needs it. There are simple smart contracts already in things like bitcoin and Ethereum, but there’s vastly more that can be done—and with a full symbolic discourse language the whole spectrum of activities covered by law becomes potentially accessible to structured computation. It’s going to lead to all sorts of both practical and conceptual advances. And it’s going to enable new legal, commercial and societal structures—in which, among other things, computers are drawn still further into the conduct of human affairs.

I think it’s also going to be critical in defining the overall framework for AIs in the future. What ethics, and what principles, should they follow? How do we communicate these to them? For ourselves and for the AIs we need a way to formulate what we want. And for that we need a symbolic discourse language. Leibniz had the right idea, but 300 years too early. Now in our time I’m hoping we’re finally going to get to build for real what he only imagined. And in doing so we’re going to take yet another big step forward in harnessing the power of the computational paradigm.

Computational thinking is going to be a defining feature of the future—and it’s an incredibly important thing to be teaching to kids today. There’s always lots of discussion (and concern) about how to teach traditional mathematical thinking to kids. But looking to the future, this pales in comparison to the importance of teaching computational thinking. Yes, there’s a certain amount of traditional mathematical thinking that’s needed in everyday life, and in many careers. But computational thinking is going to be needed everywhere. And doing it well is going to be a key to success in almost all future careers.

Doctors, lawyers, teachers, farmers, whatever. The future of all these professions will be full of computational thinking. Whether it’s sensor-based medicine, computational contracts, education analytics or computational agriculture—success is going to rely on being able to do computational thinking well.

I’ve noticed an interesting trend. Pick any field X, from archeology to zoology. There either is now a “computational X” or there soon will be. And it’s widely viewed as the future of the field.

So how do we prepare the kids of today for this future? I myself have been involved with computational thinking for nearly 40 years now—building technology for it, applying it in lots of places, studying its basic science—and trying to understand its principles. And by this point I think I have a clear view of what it takes to do computational thinking. So now the question is how to educate kids about it. And I’m excited to say that I think I now have a good answer to that—that’s based on something I’ve spent 30 years building for other purposes: the Wolfram Language. There have been ways to teach the mechanics of low-level programming for a long time, but what’s new and important is that with all the knowledge and automation that we’ve built into the Wolfram Language we’re finally now to the point where we have the technology to be able to directly teach broad computational thinking, even to kids.

I’m personally very committed to the goal of teaching computational thinking—because I believe it’s so crucial to our future. And I’m trying to do everything I can with our technology to support the effort. We’ve had Wolfram|Alpha free on the web for years now. But now we’ve also launched our Wolfram Open Cloud—so that anyone anywhere can start learning computational thinking with the Wolfram Programming Lab, using the Wolfram Language. But this is just the beginning—and as I’ll discuss here, there are many exciting new things that I think are now possible.

But first, let’s try to define what we mean by “computational thinking”. As far as I’m concerned, its intellectual core is about formulating things with enough clarity, and in a systematic enough way, that one can tell a computer how to do them. Mathematical thinking is about formulating things so that one can handle them mathematically, when that’s possible. Computational thinking is a much bigger and broader story, because there are just a lot more things that can be handled computationally.

But how does one “tell a computer” anything? One has to have a language. And the great thing is that today with the Wolfram Language we’re in a position to communicate very directly with computers about things we think about. The Wolfram Language is knowledge based: it knows about things in the world—like cities, or species, or songs, or photos we take—and it knows how to compute with them. And as soon as we have an idea that we can formulate computationally, the point is that the language lets us express it, and then—thanks to 30 years of technology development—lets us as automatically as possible actually execute the idea.

The Wolfram Language is a programming language. So when you write in it, you’re doing programming. But it’s a new kind of programming. It’s programming in which one’s as directly as possible expressing computational thinking—rather than just telling the computer step-by-step what low-level operations it should do. It’s programming where humans—including kids—provide the ideas, then it’s up to the computer and the Wolfram Language to handle the details of how they get executed.

Programming—and programming education—have traditionally been about telling a computer at a low level what to do. But thanks to all the technology we’ve built in the Wolfram Language, one doesn’t have to do that any more. One can express things at a much higher level—so one can concentrate on computational thinking, not mere programming.

Yes, there’s certainly a need for some number of software engineers in the world who can write low-level programs in languages like C++ or Java or JavaScript—and can handle the details of loops and declarations. But that number is tiny compared to the number of people who need to be able to think computationally.

The Wolfram Language—particularly in the form of Mathematica—has been widely used in technical research and development around the world for more than a quarter of a century, and endless important inventions and discoveries have been made with it. And all these years we’ve also been progressively filling out my original vision of having an integrated language in which every possible domain of knowledge is built in and automated. And the exciting thing is that now we’ve actually done this across a vast range of areas—enough to support all kinds of computational thinking, for example across all the fields traditionally taught in schools.

Seven years ago we released Wolfram|Alpha—which kids (and many others) use all the time to answer questions. Wolfram|Alpha takes plain English input, and then uses sophisticated computation from the Wolfram Language to automatically generate pages of results. I think Wolfram|Alpha is a spectacular illustration—for kids and others—of what’s possible with knowledge-based computation in the Wolfram Language. But it’s only intended for quick “drive by” questions that can be expressed in fairly few words, or maybe a bit of notation.

So what about more complicated questions and other things? Plain English doesn’t work well for these. To get enough precision to be able to get definite results one would end up with something like very elaborate and incomprehensible legalese. But the good news is that there’s an alternative: the Wolfram Language—which is built specifically to make it easy to express complex things, yet is always precise and definite.

It doesn’t take any skill to use Wolfram|Alpha. But if one wants to go further in taking advantage of what computation makes possible, one has to learn more about how to formulate and structure what one wants. Or, in other words, one needs to learn to do computational thinking. And the great thing is that the Wolfram Language finally provides the language in which one can do that—because, through all the work we’ve put into it, it’s managed to transcend mere programming, and as directly as possible support computational thinking.

So what’s it like when kids are first exposed to the Wolfram Language? As part of my effort to understand how to teach computational thinking, I’ve spent quite a bit of time in the last few years using the Wolfram Language with kids. Sometimes it’s with large groups, sometimes with small groups—and sometimes I’ll notice a kid at some grown-up event I’m at, and end up getting out my computer and spending time with the kid rather than the grown-ups. I’ve worked with high-school-age kids, and with middle-school-age (11–14) ones.

If it’s one kid, or a small group, I’ll always insist that the kids do the typing. Usually I’ll start off with something everyone knows. Get the computer to compute 2+2. They type it in, and they can see that, yes, the computer gives them the result they know.

They’ll often then try some other basic arithmetic. It’s very important that the Wolfram Language lets them just enter input, and immediately see output from it. There are no extra steps.

After they’ve done some basic arithmetic, I’ll usually suggest they try something that generates more digits.

Often they’ll ask if it’s OK, or if somehow the long number will break the computer. I encourage them to try other examples, and they’ll often do computations that instantly generate pages and pages of numbers. These kinds of big-number computations are something we’ve been able to do for decades, but kids still always seem to get very excited by them. I think the point is that it lets them see that, yes, a computer really can compute nontrivial things. (Just think how long it would take you to compute all those digits…)
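As a rough analogue outside the Wolfram Language, Python’s built-in integers are also exact and arbitrary-precision, so the same kind of experiment works there. The particular expression, 2 to the power 1000, is just an illustrative choice.

```python
# Python integers are exact and grow as needed, so this does not overflow.
big = 2 ** 1000
digits = str(big)

print(len(digits))  # 302 decimal digits
print(digits[:40] + "...")
```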

Next, it’s time for them to try some other functions. The most common function I end up starting with is `Range`.

`Range` is good because it’s easy for kids to see what it does—and they quickly get the sense that, yes, they can tell the computer to do something, and it will do it. `Range` is also good because it’s easy to use it to generate something satisfyingly big. Often I’ll suggest they try `Range[1000]`. They’ll ask if `Range[10000]` is OK too. I tell them to try it…
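For comparison only: Python’s `range` plays a similar role, but it excludes its end point and defaults to starting at 0, so the Wolfram Language’s `Range[10]` (the list 1 through 10) corresponds to `range(1, 11)`. Noticing that kind of convention difference is itself a small piece of computational thinking.

```python
# range(start, stop) stops *before* stop, so Range[10]'s 1..10 is range(1, 11).
ten = list(range(1, 11))
thousand = list(range(1, 1001))  # the analogue of Range[1000]

print(ten)                          # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(len(thousand), thousand[-1])  # 1000 1000
```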

I think I do something different with every kid or group of kids I deal with. But a pretty common next step is to see how to visualize the list we’ve made.

If the kids happen to be into math, I might next try making a table of primes.

And then plotting them.
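A table of primes is also a natural place to look inside an algorithm. The Wolfram Language has prime functions built in; as a Python sketch of one classic way to generate primes, here is the sieve of Eratosthenes (the function name and the bound of 100 are illustrative). Printing stands in for plotting to keep the example dependency-free.

```python
def primes_up_to(limit: int) -> list:
    """Sieve of Eratosthenes: cross out the multiples of each prime in turn."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Smaller multiples of p were already crossed out by smaller primes.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]

primes = primes_up_to(100)
print(primes)
print(len(primes))  # 25 primes below 100
```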

For kids who perhaps don’t think they like math—or tech in general—I might instead make some colors.

Maybe we’d try blending red and blue to make purple.
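What “blending” means can be made concrete with a little arithmetic. One simple interpretation, assumed here, is a weighted average of the red, green and blue components. The Python function below is a hand-rolled illustration of that idea; its name echoes the Wolfram Language’s `Blend` only loosely and it is not that function’s implementation.

```python
def blend(*colors, weights=None):
    """Weighted average of RGB triples whose components lie in [0, 1]."""
    if weights is None:
        weights = [1.0] * len(colors)  # equal parts by default
    total = sum(weights)
    return tuple(
        sum(w * c[i] for w, c in zip(weights, colors)) / total
        for i in range(3)
    )

RED = (1.0, 0.0, 0.0)
BLUE = (0.0, 0.0, 1.0)

purple = blend(RED, BLUE)                    # (0.5, 0.0, 0.5)
reddish = blend(RED, BLUE, weights=[3, 1])   # (0.75, 0.0, 0.25)
print(purple, reddish)
```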

Maybe we’d pick up the current image from the camera.

And we’d find all the “edges” in it.
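Edge finding can seem like magic, so a crude version of the idea may help: an “edge” is where brightness jumps sharply between neighboring pixels. The Python sketch below marks such jumps in a tiny made-up grayscale grid using a simple neighbor-difference threshold (the grid, the threshold of 5 and the function name are all illustrative). Practical edge detectors, presumably including the one the Wolfram Language uses, smooth the image and estimate gradients far more carefully.

```python
# A tiny made-up grayscale "image": 0 = black, 9 = white.
IMAGE = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [9, 9, 9, 9, 9],
]

def find_edges(img, threshold=5):
    """Mark a pixel (1) if it differs sharply from its right or lower neighbor."""
    rows, cols = len(img), len(img[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            right = abs(img[r][c] - img[r][c + 1]) if c + 1 < cols else 0
            down = abs(img[r][c] - img[r + 1][c]) if r + 1 < rows else 0
            if max(right, down) >= threshold:
                edges[r][c] = 1
    return edges

for row in find_edges(IMAGE):
    print("".join("#" if e else "." for e in row))
```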

We might also get a bit more sophisticated with color.

Perhaps then we’d go in another direction, getting a list of common words in English (I’d also try another language if any of the kids know one).

If the kids are into language arts, we might try generating some random words.

We might see how to use `StringTake` to take the first letter of each word.

Then use `WordCloud` to make a word cloud and see the relative frequencies of first letters.

Some kid might ask “what about the first two letters?” Then we’d be off trying that (yes, there’s some computational thinking involved in needing `UpTo`, so that words with only one letter don’t cause trouble).
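The first-letter experiment carries over directly to other languages. Here is a rough Python analogue, with a small hard-coded word list standing in for a built-in dictionary (the words are arbitrary). Python slicing happens to take “up to” n characters, so one-letter words such as “a” and “I” cause no error when asking for the first two letters; that is the kind of short-word case `UpTo` is there to handle.

```python
from collections import Counter

# A tiny stand-in word list (hypothetical; a real dictionary list is far longer).
words = ["apple", "a", "banana", "unhappy", "undo", "cat", "umbrella", "under", "bee", "I"]

# First letter of each word, then how often each letter occurs.
first_letters = Counter(w[:1].lower() for w in words)

# Slicing takes *up to* 2 characters, so "a"[:2] is just "a" rather than an error.
first_two = Counter(w[:2].lower() for w in words)

print(first_letters.most_common(3))
print(first_two.most_common(3))
```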

We might talk for a bit about how many words start with “un-” etc. And maybe we’d investigate some of those words. We could go on and look at translations of words.

Actually, it’d be easy to go on for hours just doing things with what I’ve talked about so far. But let’s look at some other examples. A big thing about the Wolfram Language is that it knows about lots of real-world data. I’d typically build this up through a bunch of steps, but one example is making a collage of flags of countries in Europe, where the size of each flag is determined by the current population of the country.

Since we happen to have talked about color, it’s fun to see where in color space the flags lie (apparently not many “pink countries”, for example).

A big theme is that the Wolfram Language lets one do not just abstract computation, but computation based on real-world knowledge. The Wolfram Language covers a huge range of areas, from traditional STEM-like areas to art, history, music, sports, literature, geography and so on. Kids often like doing things with maps.

We might start from where we are (`Here`). Or from some landmark. Like here’s a map with a 100-mile-radius disk around the Eiffel Tower:

Here’s a “powers of 10” sequence of images:

So what about history, for example? How can the Wolfram Language engage with that? Actually, it’s full of historical knowledge. About countries (plot the growth and decline of the Roman Empire), or movies (compare movie posters over time), or, for example, words. Like here’s a comparison of the use of “horse” and “car” in books over the last 300 years:

Try the same thing for names of countries, or inventions, or whatever; there’s always lots of history to discuss.

There are so many different directions to go. Here’s another one: graphics. Let’s make a 3D sphere:

It’s always fun for kids that they can make something like this in 3D and move it around. If they’re on the more sophisticated end, we might build up 3D graphics like this from 100 random spheres with random colors:

Kids of all ages like making interactive stuff. Here’s a simple “adjustable Cyclops eye” that one can easily build up to in stages:

Another thing I sometimes do is have the Wolfram Language make sound. Here’s a random sequence of musical notes:

There are so many directions to go. For the budding medical person, there’s anatomy in 3D—and you can pick out the geometry of a bone and 3D print it. And so on and so on.

I’d never seriously tried working with kids (though, yes, I do have four kids of my own) before launching into my recent efforts on computational thinking. So I didn’t know quite what to expect. People I talked to seemed somewhat amused about the contrast to my usual life of hard-driving technology development. And they kept on bringing up a couple of issues they thought might be crippling to what I wanted to do. The first was that they were skeptical that kids would actually be able to type raw code in the Wolfram Language; they thought they’d just get too confused and tangled up with syntax and so on. The second was that they didn’t think kids would be motivated to do anything with code unless it led to creating a game they could play.

One of the nice features of working with kids is that if you give them the chance, they’ll very quickly make it very clear to you what works with them and what doesn’t. So what actually happens? Well, it turns out that in my experience neither of the potential problems people brought up ends up being a real issue at all. But the reasons for this are quite interesting, and not particularly what I would have expected.

About typing code, one thing to realize is that in today’s world, most middle-school-age kids are quite used to typing, or at least typing text. Sometimes when they start typing code they at first have to look where the [ ] keys are, or even where the + is. But they don’t have any fundamental problem with typing. They’re also quite used to learning precise rules for how things work (“i comes before e …” in English spelling; the order of operations in math; etc.). So learning a few rules like “functions use square brackets” or “function names start with capital letters” isn’t a big deal. And of course in the Wolfram Language there’s nothing like all those irregularities that exist in a natural language like English.

When I watch kids typing code, the automatic hints we provide are quite important (brackets being purple until they’re matched; things turning red if they’re in the wrong place; autocompletions being suggested for everything; etc.). But the bottom line is that despite the theoretical concerns of adults, actual kids seem to find it extremely easy to type syntactically correct code in the Wolfram Language. In fact, I’ve been amazed at how quickly many kids “get it”. Having seen just a few examples, they immediately generalize. And the great thing is that because the Wolfram Language is designed in a very consistent way, the generalizations they come up with actually work. It’s heartwarming for me as the language designer to see this. Though of course, to the kids it’s just obvious that something must work this-or-that way, and they don’t imagine that it took effort to design it that way.

OK, so kids can type Wolfram Language code. But do they want to? Lots of kids like playing games on computers, and adults often think that’s all they’ll be interested in creating on computers too. But in my observation, this simply isn’t true. The most important thing for most kids about the Wolfram Language is that they can immediately “do something real” with it. They can type whatever code they want, and immediately get the computer to do something for them. They can create pictures or sounds or text. They can make art. They can do science. They can explore human languages. They can analyze Pokémon (yes, the Wolfram Language has extensive Pokémon data). And yes, if they really want to, they can make games.

In my experience, if you ask kids before they’ve seen the Wolfram Language what they might be interested in programming, they’ll often say games. But as soon as they’ve actually seen what’s possible in the Wolfram Language, they’ll stop talking about games, and they’ll want to do something “real” instead.

It’s only a very recent thing (and it’s basically taken 30 years of work) that the Wolfram Language has got to the point where I think it provides an instantly compelling way for kids to learn computational thinking. And actually, it’s not just the raw language—and all the knowledge it contains—that’s important: it’s also the environment.

The first point is that the Wolfram Notebook concept that we invented nearly 30 years ago is a really good way for kids (and others) to interact with the language. The idea of a notebook is to have an interactive document that freely mixes code, results, graphics, text and everything else. One can build up a computation in a notebook, typing code and getting results right there in the document. The results can be dynamic—with their own automatically generated user interfaces. And one can read—or write—explanations or instructions directly in the notebook. It’s taken decades to polish all aspects of notebooks. But now we’ve got an extremely efficient and wonderful environment in which to work and think—and learn computational thinking.

For many years, notebooks and the Wolfram Language were basically available only as desktop software. But now—after a huge software engineering effort—they’re also available in the cloud, directly in a web browser, or on mobile devices. So that means that any kid can just go to a web browser, and immediately start interacting with the Wolfram Language—creating or editing a notebook, and writing whatever code they want.

It takes a big stack of technology to make this possible. And building it has taken a big chunk of my life. It’s been very satisfying to see so many great leading-edge achievements made over the years with our technology. And now I’m really excited to see what’s possible in using it to spread computational thinking to future generations.

I made the decision when we created Wolfram|Alpha to make it available free on the web to the world. And it’s been wonderful to see so many people—and especially kids—using it every day. So a few months ago, when the technology was ready, I made the decision also to provide free access to the whole Wolfram Language in our Wolfram Open Cloud—and to set it up so kids (and others) could learn computational thinking there.

Wolfram|Alpha is set up so anyone can ask it questions, in plain English. And it’s turned out to be great—among other things—as a way to support education in lots of fields. But if one wants to learn true computational thinking for the future, then one’s got to go beyond asking questions in plain English. And that’s where the Wolfram Language comes in.

So what’s the best way to get started with the Wolfram Language, and the computational thinking it makes possible? There are probably many answers to this, which, among other things, depend on the details of the environment and resources that different kids have available. I’d like to think I’ve personally done a decent job working directly with kids—and for example at our Wolfram Summer Camp for high-school students I’ve seen very good things achieved with direct personal mentoring.

But it’s also important to have “self service” solutions—and one thing I’ve done to contribute to that is to write a book called *An Elementary Introduction to the Wolfram Language*. It’s really a book about computational thinking. It doesn’t assume any previous knowledge of programming, or, for example, of math. But in the course of the book it gets people to the point where they can routinely write real programs that do things they’re interested in.

The book is available free online. And it’s also got exercises—which are automatically graded in the cloud. I originally intended the book for high school and up. But it’s turned out that quite a collection of middle-school students (aged 11 and up) have enthusiastically worked their way through it—even as the book has also turned out to be used for things like graduate math courses, trainings at banks, and educating professional software developers.

There’s a (free) online course based on my book that will be available soon, and I know there are quite a few courses under development that use the book to teach modern programming and computational thinking.

But, OK, when a kid walks up to their web browser to learn computational thinking and the Wolfram Language, where can they actually go? A few months ago we launched Wolfram Programming Lab as an answer to this. There’s a version in the Wolfram Open Cloud that’s free (and doesn’t even require login so long as you don’t want to save your work).

Wolfram Programming Lab has two basic branches. The first is a collection of Explorations. Each Exploration is a notebook that’s set up to contain code you can edit and run to do something interesting. After you’ve gone through the code that’s already there, the notebook then suggests ways to go further, and to explore on your own.

Explorations let you get a taste of the Wolfram Language and computational thinking. Kids can typically get through the basics of several in an hour. In a sense they’re like “immersion language learning”: you start from code that “fluent speakers” might write, then you interact with it.

But Wolfram Programming Lab provides a second branch too: an interactive version of my book, that lets people go step-by-step, building up from a very simple start, and progressively creating more and more sophisticated code.

You can use Wolfram Programming Lab entirely through a web browser, in the cloud. But there’s also a desktop version that runs on any standard computer—and lets you get really zippy local interactivity, as well as letting you do bigger computations if you want. And if you have a Raspberry Pi computer, the desktop version of Wolfram Programming Lab comes bundled right with the operating system, including special features for getting data from sensors connected to the Raspberry Pi.

I’ve wanted to make sure that Wolfram Programming Lab is suitable for any kid, anywhere, whether or not they’re embedded in an educational environment that can support what they’re doing. And from what we can tell, this seems to be working nicely—though it certainly helps when kids have actual people they can work with. We plan to set up the structure for informal networks to support this, among other things using the existing, very active Wolfram Community. But we’re also setting things up so Wolfram Programming Lab can easily fit into existing, organized, educational settings—not least by using the Wolfram Language to create some of the world’s best educational analytics to analyze student progress.

It’s worth mentioning that one of the great things about our whole Wolfram Cloud infrastructure is that it lets anyone—whether they’re students or teachers—directly publish things on the web for the world to use. And in Wolfram Programming Lab, for example, it’s routine to end up deploying an app on the web as part of an Exploration.

We’re still in the early days of understanding all the nuances of actually deploying Wolfram Programming Lab in every possible learning environment—and we’re steadily advancing on many fronts. A little while ago I happened to be talking to some kids at a school in Korea, and asked them whether they thought they’d be able to learn the Wolfram Language. One of the kids responded that she thought it looked easy—except for having to read all the English in the names of the functions.

Well, that got me thinking. And the result was that we introduced multilingual code captions, that annotate code in a whole range of different languages. You still type Wolfram Language code using standard function names, but you get an instant explanation in your native language. (By the way, there are also versions of my book that will be available in various languages.)

OK, so I’ve talked a bit about the mechanics of teaching computational thinking. But where does computational thinking fit into the standard educational curriculum? The answer, I think, is simple: everywhere!

One might think that computational thinking was somehow only relevant to STEM education. But it’s not true. Computational thinking is relevant across the whole curriculum. To social studies. To language arts. To music. To art. Even to sports. People have tried to make math relevant to all these areas. But you just can’t do enough with traditional hand-calculation-based math to make this realistic. But with computation and computational thinking it’s a completely different story. In every one of these areas there are very powerful—and often very clarifying—things that can be done with computation and computational thinking. And the great thing is that it’s all accessible to kids. The Wolfram Language takes care of all the internal technicalities—so one can really focus on the pure computational thinking and understanding, without the mechanics getting in the way.

One way to get to this is to redefine what one imagines “math” education to be—and that’s one of the things that’s being achieved in the Computer-Based Math initiative. But another approach is just to think about inserting computational thinking directly into every other area of the curriculum. I’ve noticed that in practice—particularly at the grade school level—the teachers who get enthusiastic about teaching computational thinking may or may not have obvious technical backgrounds. It’s like with the current generation of kids: you don’t have to be a techie to be into knowledge-based programming and computational thinking.

In the past, with low-level computer languages like C++ and Java, you really did have to be a committed, engineering-oriented person to be teaching with them. But it’s a completely different story with the Wolfram Language. Yes, there’s plenty to learn if one wants to know the language well. But one is learning about general computational thinking, not the engineering details of computer systems.

So how should computational thinking be fitted into the school curriculum? Something I hear quite a lot is that teachers already have a hard time fitting everything they’re supposed to teach into the available time. So how can anything else be added? Well, here’s the surprising thing that I’m only just beginning to understand: adding computational thinking actually makes it easier to teach lots of things, so even with the time spent on computational thinking, the total time can actually go down, even though there’s more being learned.

How can this be? The main point is that computational thinking provides a framework that makes things more transparent and easier to understand. When you formulate something computationally, everyone can try it out and explicitly see how it works. There’s nothing hidden that the student somehow has to infer from some comment the teacher made.

Here’s a story from years ago, when the Wolfram Language—in the form of Mathematica—was first being used to teach calculus. It’s pretty common for calculus students to have trouble understanding the concept of a function. But professors told me that they started noticing that when they were learning calculus through Mathematica, somehow none of the students ended up being confused about functions. And the reason was that they had learned about functions through computational thinking—through seeing them explicitly and computationally in the Wolfram Language, rather than hearing about them more indirectly and abstractly as in standard calculus teaching.

Particularly in past decades there was a great tendency for textbooks in almost every subject to “stand on ceremony” in explaining things—so the best explanations often had to be sought out in semi-illicit outline publications. But somehow, with things like MathWorld and Wikipedia, a more direct style of presenting information has become commonplace—and has come to be taken for granted by today’s students. I see the application of computational thinking across every field as being a kind of dramatic continuation of this trend: taking things which could only be talked around, and turning them into things that can be shown through computation directly and explicitly.

Say you’re talking about a Shakespeare play and trying to get a general sense of the flow in it. Well, with computational thinking you can imagine creating a social network for the play (who “knows” who through being in the same scene, etc.). And pretty soon you have a nice summary that’s a place to launch from in talking about the nuances of the play and its themes.
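To make the “same scene” definition concrete, here is a minimal Python sketch of building such a network. It is an illustrative analog, not Wolfram Language code. The scene lists describe an imaginary play rather than real Shakespeare data, and `scene_network` is a hypothetical helper. Each scene is a set of character names, and two characters are linked if they share at least one scene.

```python
from itertools import combinations

def scene_network(scenes):
    """Build an undirected co-occurrence graph from scene casts.

    `scenes` is a list of sets of character names. Returns a dict that
    maps each character to the set of characters they share a scene with.
    """
    graph = {}
    for cast in scenes:
        for name in cast:
            graph.setdefault(name, set())   # every character gets a node
        for a, b in combinations(sorted(cast), 2):
            graph[a].add(b)                 # link each pair in the scene
            graph[b].add(a)
    return graph

# Hypothetical scene casts for an imaginary play (not real data).
scenes = [
    {"Duke", "Daughter", "Servant"},
    {"Daughter", "Suitor"},
    {"Messenger"},
]

network = scene_network(scenes)
# Characters who never share a scene with anyone end up with no links.
isolated = sorted(name for name, links in network.items() if not links)
print(isolated)
```

A character who only ever appears alone, like the hypothetical Messenger, comes out unconnected. A result like that is a prompt to ask whether “same scene” is really the right definition of connectivity.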

Imagine you’re talking about different language families. Well, you can just take some words and use `WordTranslation` to translate them into hundreds of languages. Then you could make a dendrogram to show how the forms of those words cluster in different languages—and you can discover the Indo-European language family.

You could be talking about styles of art—and pull up lots of images of famous paintings that are built into the Wolfram Language. Then you could start comparing the use of color in different paintings—maybe making a plot of how it changed over time, seeing if one can tell when different styles came in.

You could be talking about the economics of different countries—and you could immediately create your own infographics, working with students to see how best to present what’s important. You could be talking about history, and you could use the historical map data in the Wolfram Language to compare the conquests of Alexander the Great and Julius Caesar. Or you could ask about US presidents, make a timeline showing their administrations, and compare them using economic or cultural indicators.

Let’s say you’re teaching English grammar. Well, it certainly helps that the Wolfram Language can automatically diagram sentences. But you can also let students try their own rules for generating sentences—so they can see what generates something they think is grammatically correct, and what doesn’t. How about spelling? Can computational thinking help with that? I’m not sure. It’s certainly easy to take all the common words in English, and start trying out different rules one might think of. And it’s fun to discover exceptions (does “u” always follow “q”? It’s trivial in the Wolfram Language to find out).
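As a rough illustration of how little code such a spelling-rule check needs, here is a minimal Python sketch. It is not the Wolfram Language approach. The sample list is made up, and a real check would load a full English word list. A regular-expression lookahead finds any “q” not immediately followed by “u”, including a “q” at the very end of a word.

```python
import re

# A 'q' with no 'u' right after it (a word-final 'q' also matches).
Q_WITHOUT_U = re.compile(r"q(?!u)")

def q_without_u(words):
    """Return the words that break the rule 'u always follows q'."""
    return [w for w in words if Q_WITHOUT_U.search(w.lower())]

# Small sample list; a real experiment would use a full dictionary.
sample = ["queen", "quiet", "aqua", "qat", "qi", "Iraq", "equal"]
print(q_without_u(sample))
```

Words such as “qat” and “qi” are the kind of exceptions a check like this turns up. Swapping in a different pattern lets students test other spelling rules the same way.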

It’s an interesting exercise to take standard pieces of the curriculum for different subjects and ask “can this be helped by applying computational thinking?”. Sometimes the first thing one thinks of may be a gimmick. But what I’ve found is that if one really asks what the point of that piece of the curriculum is, there will end up being a way that computational thinking can help, right from the foundations.

Over time, there will be a larger and larger inventory of great examples of all this. In the past, with math (the non-computer-based version), it’s been rather disappointing: there just aren’t that many examples that work. Yes, there are things like exponential growth that show up in a bunch of places, but by the time one realizes that the examples in the calculus books are in many cases the same as they were in the 1700s, it’s not looking so good. And with standard programming the picture isn’t much better: there are only so many places that the Fibonacci sequence shows up. But with knowledge-based programming in the Wolfram Language the picture is completely different. Because the language immediately connects to the data and computations that are relevant across essentially every domain.

OK, so if one’s going to be teaching computational thinking, how should it be organized? Should one for example have a Computational Thinking class? At the college level, I think Computational Thinking 101 is a good idea. In fact, it might well be the single most important course many students take. At the high-school level, I think it’s less obvious what should be done, and though I’m certainly no expert, my tendency is to think that computational thinking is better inserted into lots of different modules within different classes.

One obvious question is: what’s the startup cost to having students engage with computational thinking? My feeling is that with the technology we’ve got now, it’s extremely low. With Wolfram|Alpha, it’s zero. With Explorations in the Wolfram Language, it’s very close to zero. With free-form code in the Wolfram Language, there’s a small amount to know, and perhaps it’s better for this to be taught in one go, a little like a miniature version of what would be a “service math course” at the college level.

It’s worth mentioning that computational thinking is rather unique in its breadth of applicability across the curriculum. Everyone would like what’s learned in one class to be applied in others, but it doesn’t happen all that often. I’ve already mentioned the difficulties with traditional math. The situation is a bit better with writing, where one would at least hope that students use what they’ve learned in producing essays in other subjects. But most fields are taught in intellectual silos, with nothing learned in one even being referenced in others. With computational thinking, though, there’s vastly more cross-connection. The social network for the Shakespeare play involves the same computational ideas as a network for international trade, or a diagram of the relations between words in different languages. The visualization technique one might use for economic performance is the same as for sports results. And so on.

Every day lots of top scientists and technologists use the Wolfram Language to do lots of sophisticated things. But of course the big thing in recent times is that the Wolfram Language has got to the point where it can also readily be used by kids. And I’m not talking about some watered-down toy version. I’m talking about the very same Wolfram Language that the fanciest professionals use. (Yes, just like the English language where there are obscure words kids won’t typically use, so there are obscure functions in the Wolfram Language that kids won’t typically use.)

So what’s made this possible? It’s basically the layers and layers of automation that we’ve built into the Wolfram Language over the past thirty years. The goal is to automate as much as possible—so that the humans who use the Wolfram Language, whether they’re sophisticated professionals or middle-school kids, just have to provide the concepts and the computational thinking, and then the language takes over and automates the details of actually getting things done.

In the past, there always had to be separate systems for kids and professionals to use. But thanks to all this automation, they’ve converged. It’s happened before, in other fields. For example, in video editing. Where there used to be simple systems for amateurs and complicated systems for professionals—but now everyone, from kids to makers of the world’s most expensive movies, uses the very same systems.

It’s probably more difficult to achieve this in computational thinking and programming—but that’s what the past thirty years of work on the Wolfram Language has, I think, now definitively achieved.

In many standard curriculum subjects, kids in school only get to do pale shadows of what professionals do. But when it comes to computational thinking, they’ve now got the same tools—and it’s now realistic for them to do the same professional-grade kinds of things.

Most of what kids get to do in school has, in a sense, little visible leverage. Kids spend a lot of effort to produce one answer in math or chemistry or whatever. If kids write essays, they have to explicitly write out each word. But with computational thinking and the Wolfram Language, it’s a different story. Once a kid understands how to formulate something computationally, and how to write it in the Wolfram Language, then the language takes over to build what’s potentially a big and sophisticated result.

A student might have some idea about the growth and decay of historical empires, and might figure out how to formulate the idea in terms of time series of geographic areas of historical countries. And as soon as they write this idea in the Wolfram Language, the language takes over, and pretty soon the student has elaborate tables and infographics and whatever—from which they can then draw all sorts of conclusions.

But what do kids learn from writing things in the Wolfram Language? Well, first and foremost, they learn computational thinking. Computational thinking is really a new way of thinking. But it’s got certain similarities in its character to other things kids do. Like math, for example, it forces a certain precision and clarity of thinking. But like writing, it’s fundamentally about communicating ideas. And also like writing, it’s a fundamentally creative activity. Good code in the Wolfram Language, like good writing, is clear and elegant—and can readily be read and understood. But unlike ordinary writing, humans aren’t the only target audience: it’s also for computers, to tell them what to automatically do.

When students do problems in math or chemistry or other subjects, the only way they can typically tell if they’ve got the right answer is for their teacher to tell them, or for them to “look it up in the back of the book”. But it’s a whole different story with Wolfram Language code. Because kids themselves can tell if they’re on the right track. The code was supposed to make a honeycomb-like array. Well, did it?

The whole process of creating code is a little different from anything else kids normally do. There’s formulating the code, and then there’s debugging it. Debugging is a very interesting intellectual exercise. The mechanics of it are vastly easier in the Wolfram Language than they’ve ever been before—because the Wolfram Language is symbolic, so any fragment of code can always be run on its own, and separately studied.

But debugging is ultimately about understanding, and problem solving. It’s a very pure form of what comes up in a great many things in life. But what’s really nice about it—particularly in the Wolfram Language—is the instant feedback. You changed something; did it help? Or do you have to dive in and figure out something else?

Part of debugging is just about getting a piece of code to produce something. But the other part is understanding if it produces the right thing. Is that really a sensible social network for the Shakespeare play? Why are there lots of characters who don’t seem to connect to anyone else? Let’s understand how we defined “connectivity”. Does it really make sense? Is there a better definition?

This is the kind of thing computational thinking is about. It’s not so much about programming: it’s about what should be programmed; it’s about the overall problem of formulating things so they can be put into computational form. And now—with today’s Wolfram Language—we have an environment for taking what’s been formulated, and actually turning it into something real.

When I show computational thinking and the Wolfram Language to kids, I’ll usually try to figure out what the kids are interested in. Are they into art? Or science? Or history? Or videogames? Or what? Then—and it’s always fun for me to do this—I’ll come up with an example that relates to their interest. And we’ll run it. And it’ll produce some result, maybe some image or visualization. And then the kids will look at it, and think about it based on what they already know. And then, almost always, they’ll ask questions. “How does this extend to that?” “What about doing this instead?” And this is where things get really good. Because when the kids are asking their own questions, you can tell they’re getting seriously engaged; they’re really thinking about what’s going on.

Most subjects that are taught in school are somewhat tightly constrained. Questions can be asked, but they’re more like typical “tech support”: help me to understand this existing feature. They’re not like “let’s talk about something new”. A few times I’ve done “ask me anything” sessions about science with kids. It’s an interesting experience. There’ll be a question where, yes, it can easily be answered from college-level physics. Then another question that might require graduate-level knowledge. And then—whoosh—there’ll be an obvious-sounding question which I know simply hasn’t been answered, even by the latest leading-edge research. Or maybe one where, yes, I know the answer, but only because just last month I happened to talk to the world expert who recently figured it out. Before I tried these kinds of “ask me anything” sessions I didn’t really appreciate how hard it can be when kids ask “free-range” questions. But now I understand why unless one has teachers with broad research-level knowledge there’s little choice but to make traditional school subjects much more tightly constrained.

But there’s something new that’s possible using the Wolfram Language as a tool. Because with the Wolfram Language a teacher doesn’t have to know the whole answer to a question: they just have to be able to formulate the question in a computational way, so the Wolfram Language can compute the answer. Yes, there’s skill required on the part of the teacher to be able to write in the Wolfram Language. But it’s really fun—and educational—for student and teacher together to be getting the answers to questions.

I’ve often done what I call “live experiments”. I take some topic—either one suggested by the audience, or one I thought of just before starting—and then I explore that topic live with the Wolfram Language, and see what I can discover about it. It’s gotten a lot easier over the years, as the capabilities and level of automation in the Wolfram Language have increased. I usually open our Wolfram Summer School by doing a live experiment. And I’ll make the claim that over the course of an hour or so, we’ll build up a notebook where we’ve discovered something new and interesting enough that it could be the seed for an academic paper or the like. It can be quite nerve-wracking for me. But in almost all cases it works out extremely well. And I think it’s an educational and empowering thing to watch. Because most people don’t realize that it’s even faintly possible to go from zero to a publishable discovery in an hour. But that’s what the modern Wolfram Language makes possible. And while it obviously helps that I personally have a lifetime of experience in computational thinking and discovering things, it’s surprisingly easy for anyone with decent knowledge of computational thinking and the Wolfram Language to do a very compelling live experiment.

When I was a kid I was never a fan of exercises in textbooks. I always took the point of view that it wasn’t very exciting to do the same thing lots of people had already done. And so I always tried to think of different questions that I could explore, and where I could potentially see things that nobody had seen before. Well, in modern times with the Wolfram Language, doing things that have never been done before has become vastly easier. Not every kid has the same motivation structure as I had. But for many people there’s extra satisfaction in being able to make something that’s really their own creation—and not just a re-run of what’s been made before. And at a practical level, it’s great that with the Wolfram Cloud it’s easy to share what’s made—and for example to create one’s own active website or app, that one can show to one’s class, one’s friends, or the world.

So where are there discoveries that can be made by kids? Everywhere! Even in a technical, well-developed area like math, there’s endless experimental mathematics to be done, where discoveries can be made. In the sciences, there’s a small additional hurdle, because one’s typically got to deal with actual data. Of course, there’s lots of data built right into the Wolfram Language. And it’s easier than ever to get more data. Perhaps one just uses a camera or a microphone, or, more elaborately, one gets sensors connected through Raspberry Pi or Arduino, or whatever.

So what about the humanities? Well, here again one needs data. But again there’s lots of it that’s built right into the Wolfram Language (images of famous artworks, texts of famous books, information on historical countries, and so on and so on). And in today’s world, it’s become extremely easy to find more data on the web—and to import it into the Wolfram Language. Sometimes there’s some data curation involved (which itself is interesting and educational), but it’s amazing in modern times how easy it’s become to find, for example, even obscure documents from centuries ago on the web. (And, yes, that’s one of the things that’s really helped my own hobby of studying history.)

Computational thinking is an area that really lends itself to project-based learning. Every year for our summer programs, I come up with hundreds of ideas for projects that are accessible to kids. And with a little help, the kids themselves come up with even more. For our summer programs, we have kids work on projects on their own, but it’s easy for kids to collaborate on these projects too. We typically have a definite end point for projects: create a Demonstration, or a web app, and write a description, perhaps to post on the Wolfram Community. (Particularly with Demonstrations for our Wolfram Demonstrations Project, the actual process of review and publication tends to be educational too.)

Of course, even when a particular project has been “done before”, it’ll usually be different if it’s done again. At the very simplest level, writing code is a creative process and different people will write it differently. And if there are visualizations or user interfaces as part of the project, each person can creatively invent new ways to do these.

But, OK, all this creative stuff is well and good. But in practice a lot of education has to be done in more of a production-line mode, with large numbers of students in some sense always doing the same thing. And even with this constraint, there’s something good about computational thinking, and coding in the Wolfram Language. One of the convenient features of math is that when people do exercises, they get definite answers, which are easy to check (well, at least up to issues of equivalence of algebraic expressions, which basically needs our whole math technology stack to get right). When people write essays, there’s basically no choice but to have actual humans read them (yes, one can determine some things with natural language processing and machine learning, but the real point of essays is to communicate with humans, and ultimately to tell if that’s working you really need humans in the loop).

Well, when one writes a piece of code, it’s a creative act, like writing an essay. But now one’s making something that’s set up to be communicated to a computer. And so it makes perfect sense to have a computer read it and assess it. It’s still not a trivial task, though. Because, for example, one wants to check that the student didn’t in effect just put the final answer right into the code they wrote—and that the code really did express, preferably with clarity, a computational idea. It gets pretty high tech, but by using the symbolic character of the Wolfram Language, plus some automated theorem proving and machine learning, it seems to be possible to do very well on this in practice. And that’s for example what’s allowed us to put automatically graded versions of the exercises from my *Elementary Introduction* book on the web.

At one level one can assess what’s going on by looking at the final code students write. Even though there may be an infinite number of different possible programs, one can assess which ones are correct, and even which ones satisfy particular efficiency or elegance criteria. But one can go much further. Because unlike an area like math where students tend to do their thinking on scratch paper, in coding each step in the process of writing a program tends to be done on the computer, with every keystroke able to be captured. I myself have long been an enthusiast of personal analytics, and occasionally I’ve done at least a little analysis on the process by which I write and debug programs. But there’s a great opportunity in education for this, first for producing elaborate educational analytics (for which the Wolfram Language and Wolfram Cloud are a perfect fit), and then for creating deep ways of adapting to the actual behavior and learning process of each individual student.

Ultimately what we presumably want is an accurate computational model of every student. And with the current machine learning technology that we have in the Wolfram Language I think we’re beginning to have what’s needed to build it. Given this model what we’d then presumably do is in effect to run lots of simulations of what would happen if the student were told this or that, trying to determine what the optimal thing to explain, or optimal exercise to give, would be at any given time.

In helping with an area like basic math, this kind of personalization is fairly easy to do with simple heuristics. When it comes to helping with coding and computational thinking, the problem is considerably more complicated. But it’s a place where, with good computational thinking, and sophisticated computation inside the system, I think it’ll be possible to do something really good.

I might mention that there’s always a question of what one should assess to find out if someone has really understood a particular thing. With a good computational model of every student, one could have a very sophisticated answer to this. But somewhere one’s still going to have to invent types of exercises or tests to give (well, assuming that one doesn’t just go for the arguably much better scheme of assessing whole projects).

One fundamental type of exercise—of which my *Elementary Introduction* is full—is of the form “write a piece of code to do X”. But there are others too. One is “simplify this piece of code”, or “find an input where this function will fail”. Of course, there are exercises like “what will this piece of code do?”. But in some sense exercises like that seem silly: after all, one can just run the code to find out.
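An exercise of the “find an input where this function will fail” kind can be checked mechanically by comparing the function against a reference. Here is a small C sketch of that idea, with hypothetical names throughout: `leap_buggy` stands in for a student’s function that forgets the century rules of the Gregorian calendar, `leap_reference` is the full rule, and `find_failing_input` searches a range of years for the first disagreement. It is only an illustration, not a description of how any Wolfram grading system works.

```c
#include <stdbool.h>

/* Hypothetical "student" function: forgets the century rules. */
bool leap_buggy(int year) {
    return year % 4 == 0;
}

/* Reference Gregorian rule: divisible by 4, except century years,
   which are leap years only when divisible by 400. */
bool leap_reference(int year) {
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Return the first year in [from, to] where the two functions disagree,
   or -1 if they agree on the whole range. */
int find_failing_input(int from, int to) {
    for (int y = from; y <= to; y++) {
        if (leap_buggy(y) != leap_reference(y)) {
            return y;
        }
    }
    return -1;
}
```

Searching years 1 through 3000 finds 100, a century year not divisible by 400. A narrower range such as 1901 to 2099 finds nothing, which is one reason this kind of bug can go unnoticed for a long time.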

Now, I have to say I think it’s useful for people to do a bit of “acting like a computer”. It’s helpful in understanding what computation is, and how the process of computation works. But it’s not something to do a lot of. The real focus, I think, should be on educating people about what they themselves actually need to do. There is technology and automation in the world, and there’ll be more of it over time. There’s no point in teaching people to do a computer’s job; one should teach them to do what only they can do, working with the computer as a tool and partner, in the best possible way.

(I’ve heard arguments about teaching kids how to do arithmetic without calculators that go along the lines of “what if you were on a desert island without a calculator?”. And I can already hear someone making the same argument about teaching kids how to work out what programs do by hand. But, er, if you’re on a desert island without a computer, why exactly are you writing code? [Of course, when code literacy becomes more universal, it might be a different story, because humans on a desert island might be writing code to read themselves...])

OK, so what are the important things to teach? Computational thinking is really about thinking. It’s about formulating ideas in a structured way, that, conveniently enough, can in the modern world be communicated to a computer, which can then do interesting things.

There are facts and ideas to know. Some of them are about the abstract process of computation. But some of them are about how things in the world get made systematic. How is color represented? How are points on the Earth specified? How does one represent the glyphs of different human languages? And so on. We made a poster a few years ago of the history of the systematic representation of data. Just the content of that poster would make an interesting course.
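As a small example of the “how is color represented?” kind of fact: one very common convention (the one behind hex codes like `#FFA500`) stores a color as three 8-bit red, green and blue channels packed into a single integer. The C sketch below assumes exactly that layout; the function names are illustrative.

```c
#include <stdint.h>

/* Assumed convention: 8 bits per channel, packed as 0xRRGGBB. */
uint32_t pack_rgb(uint8_t r, uint8_t g, uint8_t b) {
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Recover the individual channels from a packed 0xRRGGBB value. */
void unpack_rgb(uint32_t rgb, uint8_t *r, uint8_t *g, uint8_t *b) {
    *r = (rgb >> 16) & 0xFF;   /* top byte: red */
    *g = (rgb >> 8) & 0xFF;    /* middle byte: green */
    *b = rgb & 0xFF;           /* low byte: blue */
}
```

Here `pack_rgb(255, 165, 0)` gives `0xFFA500`, and `unpack_rgb` inverts it. Other representations (HSB, CMYK, linear versus gamma-encoded RGB) are just as systematic but make different choices.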

But, OK, so if one knows something about how to represent things, and about the processes of computation, what should one learn how to figure out? The fundamental goal is to get to the point where one’s able to take something one wants to know or do, and be able to cast it into computational form.

Often that’s about “inventing an algorithm”, or “inventing a heuristic”. What’s a good way to compare the growth of the Roman Empire with the spread of the Mongols? What’s the right thing to compute? The right thing to display? How can one tell if there are really more craters near the poles of the Moon? What’s a good way to identify a crater from an image anyway?

It’s the analog of things like this that are at the core of making progress in basically every “computational X” field. And it’s people who learn to be good at these kinds of things who will be the most successful in these fields. Around our company, many of these “invent an algorithm; invent a heuristic” kinds of problems are solved every day—and that’s a large part of what’s gone into building up the Wolfram Language, and Wolfram|Alpha, all these years.

Yes, once the algorithm or the heuristic is invented, it’s up to the computer to execute it. But inventing it is typically first and foremost about understanding what’s wanted in a clear and structured enough way that it can be made computational. With effort, one can invent disembodied exercises that are as abstract as possible. But what’s much more common—and useful—is to have questions that connect to the outside world.

Even a question like “Given a bunch of x,y pairs, what’s a good algorithm for deciding if one should plot them as separate points, or with a line joining them?” is really a question that depends on thinking about the way the world is. And from an educational point of view, what’s really nice about questions of computational thinking is that they almost inevitably involve input from other domains of knowledge. They force a certain kind of broad, general thinking, and a certain application of common sense, that is incredibly valuable for so much of what people need to do.
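To make the points-versus-line question concrete, here is one possible heuristic, sketched in C. It is an assumption of this sketch, not an established rule: join the points only when the x values are strictly increasing (so a line reads as y varying with x) and roughly evenly spaced, with the largest gap at most `max_ratio` times the smallest. The name `should_join` and the threshold are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical heuristic: join points with a line only when x is strictly
   increasing and the spacing is roughly regular (largest gap at most
   max_ratio times the smallest gap). Otherwise plot separate points. */
bool should_join(const double *x, size_t n, double max_ratio) {
    if (n < 2) {
        return false;                 /* a single point cannot form a line */
    }
    double min_gap = x[1] - x[0];
    double max_gap = min_gap;
    for (size_t i = 1; i < n; i++) {
        double gap = x[i] - x[i - 1];
        if (gap <= 0.0) {
            return false;             /* unsorted or repeated x: scatter plot */
        }
        if (gap < min_gap) min_gap = gap;
        if (gap > max_gap) max_gap = gap;
    }
    return max_gap <= max_ratio * min_gap;
}
```

With `max_ratio` set to 5, evenly sampled data such as x = 0, 1, 2, 3, 4 gets a line, while unsorted x values, or a single huge jump, get separate points. Choosing that threshold is exactly the kind of judgment about the world the question calls for.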

Teaching “coding” is something that’s been talked about quite a lot in the past few years. Of course, “coding” isn’t the same as computational thinking. It’s a little bit like the relation of handwriting or typing to essay writing. You (normally) need handwriting or typing to be able to actually produce an essay, but it’s not the intellectual core of the activity. But, OK, so how should one teach “coding”?

Well, in the Wolfram Language the idea is that one should be able to take ideas as humans formulate them with computational thinking, and convert them as directly as possible into code in the language. In some small cases (and they’ll gradually get a bit bigger) it’s possible to just specify what one wants in English. But normally one’s writing directly in the Wolfram Language. Which means at some level one’s doing coding, otherwise known as programming.

It’s a much higher-level form of programming, though, than most programmers are used to. And that’s precisely why it’s now accessible to a much broader range of people, and why it makes sense to inject it on a large scale into education.

So how does it relate to “traditional” programming education? There are really two types of programming education that have been tried: what one might call the “high-school version” and the “elementary-school version”. These days the high-school version is mostly about C++ and Java. The elementary-school version is mostly about derivatives of Logo like Scratch. I’ve been shocked, though, that even among technically-oriented kids educated at sophisticated schools in the US, it’s still surprisingly rare to find ones who’ve learned any serious amount of programming in school.

But when they do learn about “programming”, say in high school, what do they actually learn? There’s usually a lot of syntactic detail, but the top concepts tend to be conditionals, loops and variables. As someone who’s spent most of my life thinking about computation, I find this really disappointing. Yes, these concepts are certainly part of low-level computer languages. But they’re not central to what we now broadly understand as computation, and in computational thinking in general they’re at best side shows.

What is important? In practice, probably the single most important concept is just that everything (text, images, networks, user interfaces, whatever) can be represented in computational form. Ideas like functions and lists are also important. And if one’s being intellectual, the notion of universal computation (which is what makes software possible) is important too.

But the problem is that what’s being taught now is not only not general computational thinking, it’s not even general programming. Conditionals, loops and variables were central to the very first practical computer languages that emerged in the 1960s. Today’s computer languages, like C++ and Java, have much better ways to manage large volumes of code. But their underlying computational structure is remarkably similar to the 1960s languages. And in fact kids, who are typically writing very small amounts of code, end up really just dealing with computing as it was in the 1960s (though perhaps with mechanisms aimed at large codebases making it more complicated).

The Wolfram Language is really a language of modern times. It wouldn’t have been practical at all in the 1960s: computers just weren’t big and fast enough, and there wasn’t anything like the cloud in which to maintain a large knowledgebase. (As it happens, there were languages like LISP and APL even in the early 1960s that had higher-level ideas reminiscent of the Wolfram Language, but it took decades before those ideas could really be used in practice.)

So what of loops and conditionals and variables? Well, they all exist in the Wolfram Language. They just aren’t front and center concepts. In my *Elementary Introduction* book, for example, it’s Chapter 38 before I talk about assigning values to variables, and it happens after I’ve discussed deploying sophisticated knowledge-based apps to the web.

To give an example, let’s say one wants to make a table of the first 10 squares. In the Wolfram Language one could do this very simply, with `Table[n^2, {n, 10}]`.

But if one’s working in C for example, it’d be roughly:
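A rough C version, reconstructed from the description in the next paragraph (the exact code originally shown may have differed), is sketched below. To keep it reusable it is written as a function, hypothetically named `first_squares`, that both stores the squares in a caller-supplied table and prints them, rather than as a complete program with `main`.

```c
#include <stdio.h>

#define COUNT 10

/* Build a table of the first COUNT squares, printing each one as we go. */
void first_squares(int table[COUNT]) {
    int n;                               /* storage for the integer n */
    for (n = 1; n <= COUNT; n++) {       /* start at 1, increment up to 10 */
        table[n - 1] = n * n;            /* store the square... */
        printf("%d\n", table[n - 1]);    /* ...and print it */
    }
}
```

Nearly every line is about mechanics: declaring `n`, initializing it, testing it against 10, incrementing it, and indexing into the array.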

A non-programmer might ask: “What the heck is all that stuff?” Well, instead of just saying directly what we want, what it’s doing is telling the computer at a low level exactly what it should do. We’re telling it to allocate memory to store the integer value of n. We’re saying to start with n=1, and keep incrementing n until it gets to 10. And then we’re saying in each case to the computer that it should print the square. There’s a lot of detail. (To be fair, in a more modern language like Python or JavaScript, some of this goes away, but in this example we’re still left dealing with an explicit loop and its variable.)

Now, the crucial point is that the loops and conditionals and variables aren’t the real point of the computation; they’re just details of the particular implementation in a low-level language. I’ve heard people say it’s simpler for kids to understand what’s going on when there are explicit loops and conditionals and variables. From my observations this simply isn’t true. Maybe it’s something that’s changed over the years, as people have gotten more exposed to computation and computational ideas in their everyday lives. But as of now, talking about the details of loops and conditionals and variables just seems to make it harder for kids to understand the concepts of computation.

Is it useful to learn about loops and conditionals and variables at some point? Definitely. They’re part of the whole story of computation and computational thinking. They’re just not the most important part, or the first part to learn. Oh, and by the way, if one’s going to start talking about doing computation with images or networks or whatever, concepts like loops really aren’t what one wants at all.

One important feature of the Wolfram Language is that in its effort to cover general computational thinking it integrates a large number of different computational paradigms. There’s functional programming. And procedural programming. And list-based programming. And symbolic programming. And machine learning and example-based programming. And so on. So when people learn the Wolfram Language, they’re immediately getting exposed to a broad spectrum of computational ideas, conveniently all consistently packaged together.
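Even C can express the same computation in more than one paradigm, though less gracefully. In the sketch below (function names are illustrative), `squares_procedural` spells out the loop and the index, while `map_int` is a generic, functional-style map that takes the operation as a function pointer, so the caller only says what to apply. The loop is still there inside `map_int`; C simply has no built-in way to hide it.

```c
#include <stddef.h>

/* Procedural style: the loop and index variable are spelled out. */
void squares_procedural(const int *in, int *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * in[i];
    }
}

/* Functional style (as far as C allows): apply any function f to every
   element, so the caller names only the operation. */
void map_int(int (*f)(int), const int *in, int *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = f(in[i]);
    }
}

/* The operation to pass to map_int. */
static int square(int v) {
    return v * v;
}
```

A call like `map_int(square, in, out, 3)` produces the same result as `squares_procedural(in, out, 3)`.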

But what happens when someone who’s learned programming in the Wolfram Language wants to do low-level programming in C++ or Java? I’ve seen this a few times, and it’s been quite charming. They seem to have no difficulty at all grasping how to do good programming in these lower-level languages, but they keep on exclaiming about all the quaint things they have to do, and all the things that don’t work. “Oh my gosh, I actually have to allocate memory myself”. “Wow, there’s a limit on the size of an integer”. And so on.
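Both of those exclamations are easy to reproduce. The C sketch below (names are illustrative) allocates a table of factorials by hand with `malloc`, which the caller must later `free`, and finds the first factorial that no longer fits in a signed 32-bit integer. It uses 64-bit arithmetic so the check itself does not overflow, which only postpones the problem: 21! does not fit in 64 bits either.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate, by hand, a table holding 0!, 1!, ..., (count-1)! as 64-bit values.
   count must be between 1 and 21 (21! overflows int64_t).
   Returns NULL on bad input or failed allocation; the caller must free(). */
int64_t *factorial_table(int count) {
    if (count <= 0 || count > 21) {
        return NULL;
    }
    int64_t *table = malloc((size_t)count * sizeof *table);
    if (table == NULL) {
        return NULL;
    }
    int64_t f = 1;
    for (int k = 0; k < count; k++) {
        if (k > 0) {
            f *= k;                  /* k! = (k-1)! * k */
        }
        table[k] = f;
    }
    return table;
}

/* Smallest k such that k! exceeds the largest signed 32-bit integer. */
int first_factorial_over_int32(void) {
    int64_t f = 1;
    int k = 0;
    while (f <= INT32_MAX) {
        k++;
        f *= k;
    }
    return k;
}
```

12! = 479,001,600 still fits in 32 bits, but 13! = 6,227,020,800 does not, so `first_factorial_over_int32` returns 13.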

The transition from the Wolfram Language to lower-level languages seems to be easy. The other way around it’s sometimes a little more challenging. And I must say that I often find it easier to teach computational thinking to kids who know nothing about programming: they pick up the concepts very quickly, and they don’t have to unlearn the idea that everything must turn into loops and conditionals and so on.

When I started considering teaching computational thinking and the Wolfram Language to kids, I imagined it would mostly be high-school kids. But particularly when my *Introduction* book came out, I was surprised to learn that all sorts of 11- and 12-year-olds were going through it. And my current conclusion is that what we’ve got with Wolfram Programming Lab and so on is suitable for kids down to about age 11 or 12.

What about younger kids? Well, in today’s world, all of them are using computers or smartphones, and are getting exposed to all sorts of computational activities. Maybe they’re making and editing videos. Maybe they’re constructing assets for a game. And all of these kinds of activities are good precursors to computational thinking.

Back in the 1960s, a bold experiment was started in the form of Logo. I’m told the original idea was to construct 50 “microworlds” where kids could experiment with computers. The very first one involved a “turtle” moving around on the screen, and over the course of a half-century this evolved into things like Scratch (which has an orange cat rather than a turtle). Unfortunately, the other 49 microworlds never got built. And while the turtle (or cat) is quite cute (and an impressive idea for the 1960s), it seems disappointingly narrow from the point of view of today’s understanding and experience of computation.

Still, lots of kids are exposed to things like Scratch in elementary school—even if sometimes only for a single “hour of code” in a year. In past years, there was clear value in having younger kids get the idea that they could make a computer do what they want at all. But the proliferation of other ways young kids use computation and computational ideas has made this much less significant. And yes, teaching loops and conditionals to elementary-school kids does seem a bit bizarre in modern times.

I strongly suspect that there are some much better ways to teach ideas of computational thinking at young ages—making use of all the technology and automation we have now. One feature of systems like Scratch is that their programs are assembled visually out of brick-like blocks, rather than having to be typed. Usually in practice the programs are quite linear in their structure. But the blocks do two things. First, they avoid the need for any explicit syntax (instead it’s just “does the block fit or not?”). And second, by having a stack of possible blocks on the side of the screen, they immediately document what’s possible.

And perhaps even more important: this whole setup forces one to have only a small collection of possible blocks, in effect a microworld. In the full Wolfram Language, there are over 5000 built-in functions, and just turning them all into blocks would be overwhelming and unhelpful. But the point is to select out of all these possible functions several (50?) microworlds, each involving only a small set of functions, but each chosen so that rich and interesting things can be done with them.

With our current technology, those microworlds can readily involve image computation, or natural language understanding, or machine learning—and, most importantly, can immediately relate to the real world. And I strongly suspect that by including some of these far-past-the-1960s things, we’ll be able to expose young kids much more directly and successfully to ideas about computational thinking that they’ll be able to take with them when they come to learn more later.

The process of educating kids—and the world—about computational thinking is only just beginning. I’m very excited that with the Wolfram Language and the systems around it, we’ve finally got tools that I think solve the core technological problems involved. But there are lots of structural, organizational and other issues that remain.

I’m trying to do my part, for example, by writing my *Elementary Introduction to the Wolfram Language*, releasing Wolfram Programming Lab, and creating the free Wolfram Open Cloud. But these are just first steps. There need to be lots of books and courses aimed at different populations. There need to be online and offline communities and activities defined. There need to be ways to deliver what’s now possible to students. And there need to be ways to teach teachers how to help.

We’ve got quite a few basic things in the works. A packaged course based on the *Elementary Introduction*. A Wolfram Challenges website with coding and computational thinking challenges. A more structured mentorship program for individual students doing projects. A franchisable version of our Wolfram Summer Camp. And more. Some of these are part of Wolfram Research; some come from the Wolfram Foundation. We’re considering a broader non-profit initiative to support delivering computational thinking education. And we’ve even thought about creating a whole school that’s centered around computational thinking—not least to show at least one model of how it can be done.

But beyond anything we’re doing, what I’m most excited about is that other people, and other organizations, are starting to take things forward, too. There are in-school programs, after-school programs, summer programs. There are the beginnings of very large-scale programs across countries.

Our own company and foundation are fairly small. To be able to educate the world about computational thinking, many other people and organizations need to be involved. Thanks to three decades of work we are at the point where we have the technology. But now we have to actually get it delivered to kids all over the world in the right way.

Computational thinking is something that I think can be successfully taught to a very wide range of people, regardless of their economic resources. And because it’s so new, countries or regions with more sophisticated educational setups, or greater technological prowess, don’t really have any great advantage over anyone else in doing it.

Eventually, much of the world’s population will be able to do computational thinking and be able to communicate with computers using code—just as they can now read and write. But today we’re just at the beginning of making this happen. I’m pleased to be able to contribute technology and a little more to this. I look forward to seeing what I hope will be rapid progress on this in the next year or so, and in the years to come.


I’m thrilled today to announce the release of a major new version of Mathematica and the Wolfram Language: Version 11, available immediately for both desktop and cloud. Hundreds of us have been energetically working on building this for the past two years—and in fact I’ve personally put several thousand hours into it. I’m very excited about what’s in it; it’s a major step forward, with a lot of both breadth and depth—and with remarkably central relevance to many of today’s most prominent technology areas.

It’s been more than 28 years since Version 1 came out—and nearly 30 years since I started its development. And all that time I’ve been continuing to pursue a bold vision—and to build a taller and taller stack of technology. With most software, after a few years and a few versions, not a lot of important new stuff ever gets added. But with Mathematica and the Wolfram Language it’s been a completely different story: for three decades we’ve been taking major steps forward at every version, progressively conquering vast numbers of new areas.

It’s been an amazing intellectual journey for me and all of us. From the very beginning we had a strong set of fundamental principles and a strong underlying design—and for three decades we’ve been able to just keep building more and more on these foundations, creating what is by now an unprecedentedly vast system that has nevertheless maintained its unity, elegance and, frankly, modernity. In the early years we concentrated particularly on abstract areas such as mathematics. But over time we’ve dramatically expanded, taking ever larger steps and covering ever more kinds of computation and knowledge.

Each new version represents both a lot of new ideas and a lot of hard work. But more than that, it represents ever greater leverage achieved with our technology. Because one of our key principles is automation, and at every version we’re building on all the automation we’ve achieved before—in effect, we’ve got larger and larger building blocks that we’re able to use to go further and further more and more quickly. And of course what makes this possible is all that effort that I and others have put in over the years maintaining a coherent design for the whole system—so all those building blocks from every different area fit perfectly together.

With traditional approaches to software development, it would have taken a great many years to create what we’ve added in Version 11. And the fact that we can deliver Version 11 now is a direct reflection of the effectiveness of our technology, our principles and our methodology. And as I look at Version 11, it’s very satisfying to see how far we’ve come not only in what’s in the system, but also in how effectively we can develop it. Not to mention that all these directions we’ve been pursuing for so many years as part of the logical development of our system have now turned out to be exactly what’s needed for many of today’s most active areas of technology development.

For many years we called our core system Mathematica. But as we added new directions in knowledge and deployment, and expanded far beyond things related in any way to “math”, we decided to introduce the concept of the Wolfram Language to represent the core of everything we’re doing. And the Wolfram Language now defines the operation not only of Mathematica, but also of Wolfram Development Platform and Wolfram Programming Lab, as well as other products and platforms. And because all our software engineering is unified, today we’re able to release Version 11 of all our Wolfram-Language-based systems, both desktop and cloud.

OK, so what’s the big new thing in Version 11? Well, it’s not one big thing; it’s many big things. To give a sense of scale, there are 555 completely new functions that we’re adding in Version 11—representing a huge amount of new functionality (by comparison, Version 1 had a total of 551 functions altogether). And actually that function count is even an underrepresentation—because it doesn’t include the vast deepening of many existing functions.

The way we manage development, we’ve always got a portfolio of projects going on, from fairly small ones, to ones that may take five years or more. And indeed Version 11 contains the results of several five-year projects. We’re always keen to deliver the results of our development as quickly as possible to users, so we’ve actually had several intermediate releases since Version 10—and effectively Version 11 represents the combination of many completely new developments together with ones that we’ve already previewed in 10.1, 10.2, 10.3 and 10.4. (Many functions that were tagged as “Experimental” in 10.x releases are now in full production form in Version 11.0.)

When you first launch Version 11 on your desktop, the first thing you’ll notice is that notebooks have a new look, with crisper fonts and tighter design. When you type code there are lots of new autocompletions that appear (it’s getting harder and harder to type the wrong thing), and when you type text there’s a new real-time spellchecker, which we’ll be continually updating to make sure it has the latest words included.

If your computer system is set to any of a dozen languages other than English, you’ll also immediately see something else: every function is automatically annotated with a “code caption” in the language you’ve set.

When you actually run code, you’ll notice that messages look different too—and, very helpfully for debugging, they let you immediately see what chain of functions was being called when the message was produced.

There are lots of big, meaty new areas in Version 11. But let’s jump right into one of them: 3D printing. I made my first 3D printout (which didn’t last long before disintegrating) back in 2002. And we’ve had the ability to export to STL for years. But what’s new and exciting in Version 11 is that we’ve built a complete pipeline that goes from creating 3D geometry to having it printed on your 3D printer (or through a printing service).

Often in the past I’ve wanted to take a 3D plot and just make a 3D print of it. And occasionally I’ve been lucky, and it’s been easy to do. But most of the time it’s a fiddly, complicated process. Because graphics that display on the screen don’t necessarily correspond to geometry that can actually be printed on a 3D printer. And it turns out to be a difficult problem of 3D computational geometry to conveniently set up or repair the geometry so it really works on a 3D printer. (Oh, and if you get it wrong, you could have plastic randomly squirting out of your printer.)
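One of the simplest checks in that kind of geometry repair is whether a triangle mesh is closed (watertight): every edge should be shared by exactly two triangles, so there are no holes or dangling fins. The C sketch below tests only that necessary condition, with a deliberately simple quadratic-time search. It is an illustration, not the Version 11 pipeline; real printability checking generally involves much more (orientation, self-intersections, wall thickness and so on). `Triangle` and `is_watertight` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* A triangle given by three vertex indices. */
typedef struct {
    int v[3];
} Triangle;

/* Count how many triangle sides match the undirected edge (a, b). */
static int edge_uses(const Triangle *tris, size_t n, int a, int b) {
    int uses = 0;
    for (size_t i = 0; i < n; i++) {
        for (int e = 0; e < 3; e++) {
            int p = tris[i].v[e];
            int q = tris[i].v[(e + 1) % 3];
            if ((p == a && q == b) || (p == b && q == a)) {
                uses++;
            }
        }
    }
    return uses;
}

/* Necessary condition for a closed solid: every edge of every triangle is
   shared by exactly two triangles. O(n^2), fine for small meshes. */
bool is_watertight(const Triangle *tris, size_t n) {
    for (size_t i = 0; i < n; i++) {
        for (int e = 0; e < 3; e++) {
            int a = tris[i].v[e];
            int b = tris[i].v[(e + 1) % 3];
            if (edge_uses(tris, n, a, b) != 2) {
                return false;
            }
        }
    }
    return n > 0;
}
```

A tetrahedron with faces {0,1,2}, {0,3,1}, {1,3,2}, {2,3,0} passes; remove any one face and the three edges around the hole are each used only once, so the check fails.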

In Version 11 it’s finally realistic to take any 3D plot, and just 3D print it. Or you can get the structure of a molecule or the elevation around a mountain, and similarly just 3D print it. Over the years I’ve personally made many 3D printouts, but each one has been its own little adventure. Now, thanks to Version 11, 3D printouts of everything are easy. And now that I think about it, maybe I need a 3D printout of the growth of the Wolfram Language by area for my desk…

In a sense, Mathematica and the Wolfram Language have always been doing AI. Over the years we’ve certainly been pioneers in solving lots of “AI-ish” problems—from mathematical solving to automated aesthetics to natural language understanding. But back in Version 10 we also made a great step forward in machine learning—developing extremely automated core functions (`Classify` and `Predict`) for learning by example.
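To give a feel for what “learning by example” means at its very simplest, here is a 1-nearest-neighbor classifier in C: a new point gets the label of the closest training example. This is just one of many methods, chosen here as an assumption for illustration; `Classify` selects and tunes methods automatically, and nothing below reflects its internals. The `Example` type and `classify_1nn` are hypothetical names.

```c
#include <stddef.h>

/* A labeled training example with two numeric features. */
typedef struct {
    double x, y;
    int label;
} Example;

/* 1-nearest-neighbor: return the label of the training example closest to
   (x, y), using squared Euclidean distance. Assumes n >= 1. */
int classify_1nn(const Example *train, size_t n, double x, double y) {
    size_t best = 0;
    double best_d = -1.0;                 /* -1 means "no distance yet" */
    for (size_t i = 0; i < n; i++) {
        double dx = train[i].x - x;
        double dy = train[i].y - y;
        double d = dx * dx + dy * dy;
        if (best_d < 0.0 || d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return train[best].label;
}
```

The “training” here is nothing more than storing the examples; everything interesting happens at classification time.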

I have to say that I wasn’t sure how well these functions would do in practice. But actually it’s been amazing to see how well they work—and it’s been very satisfying to see so many of our users being able to incorporate machine learning into their work, just using the automation we’ve built, and without having to consult any machine learning experts.

In Version 11 we’ve made many steps forward in machine learning. We’ve now got clean ways not just to do classification and prediction, but also to do feature extraction, dimension reduction, clustering and so on. And we’ve also done a lot of training ourselves to deliver pre-trained machine-learning functions. Machine-learning training is an interesting new kind of development. At its core, it’s a curation process. It’s just that instead of, say, collecting data on movies, you’re collecting as many images as possible of different kinds of animals.

Built into Version 11 are now functions like `ImageIdentify` that identify over 10,000 different kinds of objects. And through the whole design of the system, it’s easy to take the features that have been learned, and immediately use those to train new image classifiers vastly more efficiently than before.

We’ve done a lot to automate today’s most common machine learning tasks. But it’s become clear in the past few years that a remarkable number of new areas can now be tackled by modern machine learning methods, and particularly by using neural networks. It’s really an amazing episode in the history of science: the field of neural networks, which I’ve followed for almost 40 years, has gone from seeming basically hopeless to being one of the hottest fields around, with major new discoveries being made almost every week.

But, OK, if you want to get involved, what should you do? Yes, you can cobble things together with a range of low-level libraries. But in building Version 11 we set ourselves the goal of creating a streamlined symbolic way to set up and train neural networks—in which as much of what has to be done as possible has been automated. It’s all very new, but in Version 11 we’ve now got functions like `NetGraph` and `NetChain`, together with all sorts of “neural net special functions”, like `DotPlusLayer` and `ConvolutionLayer`. And with these functions it’s easy to take the latest networks and quickly get them set up in the Wolfram Language (recurrent networks didn’t quite make it into Version 11.0, but they’re coming soon).

Of course, what makes this all really work well is the integration into the rest of the Wolfram Language. The neural network is just a `Graph` object like any other graph. And inputs like images or text can immediately and automatically be processed using standard Wolfram Language capabilities into forms appropriate for neural network computation.

From their name, “neural networks” sound like they’re related to brains. But actually they’re perfectly general computational structures: they just correspond to complex combinations of simple functions. They’re not unrelated to the simple programs that I’ve spent so long studying—though they have the special feature that they’re set up to be easy to train from examples.
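To make “complex combinations of simple functions” concrete, here is a hypothetical Python sketch. The helpers `dot_plus` and `chain` loosely echo the idea of a linear layer and a chain of layers (they are not the Wolfram Language’s `DotPlusLayer` or `NetChain`), and `train_linear` fits a single linear layer to examples by plain gradient descent on mean squared error.

```python
import math


def dot_plus(w, b):
    """A scalar 'dot plus' layer: x -> w*x + b (hypothetical helper)."""
    return lambda x: w * x + b


def chain(*layers):
    """Compose layers left to right into one function."""
    def net(x):
        for layer in layers:
            x = layer(x)
        return x
    return net


def train_linear(data, lr=0.05, steps=2000):
    """Fit y ~ w*x + b to (x, y) examples by gradient descent on MSE."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# A tiny "network": linear layer, tanh nonlinearity, another linear layer.
net = chain(dot_plus(2.0, 1.0), math.tanh, dot_plus(3.0, 0.0))
```

Training on examples of `y = 2x + 1` recovers weights close to `w = 2`, `b = 1`; larger networks differ mainly in having many more parameters and layers, with gradients computed automatically.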

We’ve had traditional statistical data fitting and interpolation forever. But what’s new with neural networks is a vastly richer space of possible computational structures to fit data to, or to train. It’s been remarkable just in the past couple of years to see a sequence of fields revolutionized by this—and there will be more to come.

I’m hoping that we can accelerate this with Version 11. Because we’ve managed to make “neural net programming” really just another programming paradigm integrated with all the others in the Wolfram Language. Yes, it’s very efficient and can deal with huge training sets. But ultimately probably the most powerful thing is that it immediately fits in with everything else the Wolfram Language does. And even in Version 11 we’re already using it in many of our internal algorithms in areas like image, signal and text processing. It’s still early days in the history of “neural net programming”—but I’m excited for the Wolfram Language to play a central part in what’s coming.

OK, let’s turn to another new area of Version 11: audio. Our goal is to be able to handle any kind of data directly in the Wolfram Language. We’ve already got graphics and images and geometry and networks and formulas and lots else, all consistently represented as first-class symbolic structures in the language. And as of Version 11, we’ve got another type of first-class data: audio.

Audio is mostly complicated because it’s big. But in Version 11 we’ve got everything set up so it’s seamless to handle, say, an hour of audio directly in the Wolfram Language. Behind the scenes there’s all sorts of engineering that’s caching and streaming and so on. But it’s all automated—and in the language it’s just a simple `Audio` object. And that `Audio` object is immediately amenable to all the sophisticated signal processing and analysis that’s available in the Wolfram Language.
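The basic idea behind handling long audio without loading it all at once can be sketched in Python: process the samples in fixed-size chunks and keep only running totals. This computes an RMS level for a synthetic sine wave; the chunk size, the generator and the function names are assumptions, and the Wolfram Language’s own caching and streaming machinery is far more elaborate.

```python
import math
from itertools import islice


def sine_samples(freq_hz, rate_hz, seconds):
    """Lazily generate samples of a unit-amplitude sine wave."""
    total = int(rate_hz * seconds)
    for n in range(total):
        yield math.sin(2 * math.pi * freq_hz * n / rate_hz)


def streaming_rms(samples, chunk_size=1024):
    """RMS level computed chunk by chunk, never holding the whole signal."""
    it = iter(samples)
    sum_sq = 0.0
    count = 0
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        sum_sq += sum(s * s for s in chunk)
        count += len(chunk)
    return math.sqrt(sum_sq / count) if count else 0.0
```

For a full-scale sine wave sampled over a whole number of periods, the result is 1/√2 ≈ 0.7071, and memory use stays at one chunk no matter how long the signal is.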

The Wolfram Language is a knowledge-based language. Which means it’s got lots of knowledge—about both computation and the world—built into it. And these days the Wolfram Language covers thousands of domains of real-world knowledge—from countries to movies to companies to planets. There’s new data flowing into the central Wolfram Knowledgebase in the cloud all the time, and we’re carefully curating data on new things that exist in the world (who knew, for example, that there were new administrative divisions recently created in Austria?). Lots of this data is visible in Wolfram|Alpha (as well as intelligent assistants powered by it). But it’s in the Wolfram Language that the data really comes alive for full computation—and where all the effort we put into ensuring its alignment and consistency becomes evident.

We’re always working to expand the domains of knowledge covered by the Wolfram Language. And in Version 11 several domains that we’ve been working on for many years are finally now ready. One that’s been particularly difficult is anatomy data. But in Version 11 we’ve now got detailed 3D models of all the significant structures in the human body. So you can see how those complicated bones in the foot fit together. And you can do computations on them. Or 3D print them. And you can understand the network of arteries around the heart. I must say that as I’ve explored this, I’m more amazed than ever at the level of morphological complexity that exists in the human body. But as of Version 11, it’s now a domain where we can actually do computations. And there are perhaps-unexpected new functions like `AnatomyPlot3D` to support it. (There’s certainly more to be done, by the way: for example, our anatomy data is only for a single “average adult male”, and the joints can’t move, etc.)

A completely different domain of data now handled in the Wolfram Language is food. There’s a lot that’s complicated in this domain. First, there are issues of ontology. What is an apple? Well, there’s a generic apple, and there are also many specific types of apples. Then there are issues of defining amounts of things. A cup of strawberries. Three apples. A quarter pounder. It’s taken many years of work, but we’ve now got a very robust symbolic way to represent food—from which we can immediately compute nutritional properties and lots of other things.

Another area that’s also been long coming is historical country data. We’ve had very complete data on countries in modern times (typically from 1960 or 1970 on). But what about earlier history? What about Prussia? What about the Roman Empire? Well, in Version 11 we’ve finally got at least approximate border information for all serious country-like entities, throughout recorded history. So one can do computations about the rise and fall of empires right from within the Wolfram Language.

Talking of history, a small but very useful addition in Version 11 is historical word frequency data. Just ask `WordFrequencyData` for a time series, and you’ll be able to see how much people talked about “war”—or “turnips”—at different times in history. Almost every plot is a history lesson.

Another convenient function in Version 11 is `WikipediaData`, which immediately gives any Wikipedia entry (or various kinds of data it contains). There’s also `WolframLanguageData`, which gives computable data on the Wolfram Language itself—like the examples in its documentation, links between functions, and so on.

In many domains one’s mostly just dealing with static data (“what is the density of gold?”; “what was the population of London in 1959?”). But there are other domains where one’s not really interested in static data so much as data-backed computation. There are several new examples of this in Version 11. Like human mortality data (“what is the probability of dying between age X and Y?”), standard ocean data (“what is the pressure at a depth X?”), radioactive stopping power and human growth data—as well as data on the whole universe according to standard cosmological models.
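For the mortality example, the underlying computation is a standard actuarial identity rather than anything specific to the Wolfram Language. If T is the age at death and S(t) = P(T ≥ t) is the survival function, the probability of dying between ages X and Y, given survival to X, is:

$$
P(X \le T < Y \mid T \ge X) = \frac{S(X) - S(Y)}{S(X)} = 1 - \frac{S(Y)}{S(X)}
$$

With illustrative (not real) values S(X) = 0.9 and S(Y) = 0.8, this gives 1 − 0.8/0.9 = 1/9 ≈ 0.111.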

Also new in Version 11 are `WeatherForecastData` and `MathematicalFunctionData`. Oh, as well as data on Pokémon and lots of other useful things.

One of the very powerful features of the Wolfram Language is its ability to compute directly with real-world entities. To the Wolfram Language, the US, or Russia, or a type of lizard are all just entities that can be manipulated as symbolic constructs using the overall symbolic paradigm of the language. Entities don’t directly have values; they’re just symbolic objects. But their properties can have values: `Entity["Country", "UnitedStates"]["Population"]` is about 322 million.

But let’s say we don’t just want to take some entity (like the US) and find values of its properties. Let’s say instead we want to find which entities have certain properties with particular values, such as the 5 largest countries in the world by population. Well, in Version 11 there’s a new way to do this. Instead of specifying a particular explicit entity, we specify a computation that implicitly defines a class of entities. So, for example, we can get a list of the 5 largest countries by population by asking for the class of countries whose “Population” property satisfies `TakeLargest[5]`.

`TakeLargest[5]` is an operator form of a new function in Version 11 that gets the largest elements in a list. Implicit entities end up making a lot of use of operator forms—much like queries in `Dataset`. And in a sense they’re also making deep use of the symbolic character of the Wolfram Language—because they’re treating the functions that define them just like data.
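The idea of an implicitly defined entity class can be sketched in Python with a toy store. Everything here is hypothetical: the entity names and population numbers are made up, and `take_largest` and `entity_class` only mimic the operator-form pattern described above, not the Wolfram Language’s actual `EntityClass` machinery.

```python
# Toy data: hypothetical entities with made-up property values.
TOY_STORE = {
    ("Country", "C1"): {"Population": 30},
    ("Country", "C2"): {"Population": 120},
    ("Country", "C3"): {"Population": 75},
    ("Country", "C4"): {"Population": 210},
    ("Country", "C5"): {"Population": 5},
    ("Country", "C6"): {"Population": 160},
    ("Planet", "P1"): {"Population": 999},
}


def take_largest(n):
    """Operator form: returns a function that selects the n largest values."""
    return lambda values: sorted(values, reverse=True)[:n]


def entity_value(store, entity, prop):
    """Look up a property value of an explicit entity."""
    return store[entity][prop]


def entity_class(store, entity_type, prop, selector):
    """Entities of a type whose property value survives the selector.

    The class is defined by a computation (the selector), not by listing
    members. Ties at the cutoff may let in more than n entities.
    """
    members = [e for e in store if e[0] == entity_type]
    chosen = selector([store[e][prop] for e in members])
    keep = [e for e in members if store[e][prop] in chosen]
    return sorted(keep, key=lambda e: store[e][prop], reverse=True)
```

Here `entity_class(TOY_STORE, "Country", "Population", take_largest(3))` yields the toy countries C4, C6 and C2, and the “Planet” entity is never considered.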

The whole mechanism of entities and properties and implicit entities works for all the different types of entities that exist in the Wolfram Language. But as of Version 11, it’s not limited to built-in types of entities. There’s a new construct called `EntityStore` that lets you define your own types of entities, and specify their properties and values and so on—and then seamlessly use them in any computation.

Just as `Dataset` is a powerful hierarchical generalization of typical database concepts, so `EntityStore` is a kind of symbolic generalization of a typical relational database. And if you set up a sophisticated entity store, you can just use `CloudDeploy` to immediately deploy it to the cloud, so you can use it whenever you want.

One aspect of “knowing about the real world” is knowing about geography. But the Wolfram Language doesn’t just have access to detailed geographic data (not only for Earth, but also for the Moon, Mars, even Pluto); it can also compute with this data. It’s got a huge collection of geo projections, all immediately computable, and all set up to support very careful and detailed geodesy. Remember spherical trigonometry? Well, the Wolfram Language doesn’t just assume the Earth is a sphere, but correctly computes distances and areas and so on, using the actual shape of the Earth.
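To see why the shape of the Earth matters, here is a Python sketch comparing a spherical (haversine) distance with Vincenty’s inverse formula on the WGS84 ellipsoid. This is a standard textbook method, not necessarily what the Wolfram Language uses internally, and Vincenty’s iteration can fail to converge for nearly antipodal points.

```python
import math

# WGS84 ellipsoid parameters (semi-major axis in meters, flattening).
A_WGS84 = 6378137.0
F_WGS84 = 1 / 298.257223563
B_WGS84 = A_WGS84 * (1 - F_WGS84)


def haversine(lat1, lon1, lat2, lon2, radius=6371008.8):
    """Great-circle distance in meters on a sphere (degrees in)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlam = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))


def vincenty_inverse(lat1, lon1, lat2, lon2, tol=1e-12, max_iter=200):
    """Geodesic distance in meters on the WGS84 ellipsoid (degrees in)."""
    a, f, b = A_WGS84, F_WGS84, B_WGS84
    L = math.radians(lon2 - lon1)
    U1 = math.atan((1 - f) * math.tan(math.radians(lat1)))  # reduced latitudes
    U2 = math.atan((1 - f) * math.tan(math.radians(lat2)))
    sinU1, cosU1, sinU2, cosU2 = math.sin(U1), math.cos(U1), math.sin(U2), math.cos(U2)
    lam = L
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cosU2 * sin_lam, cosU1 * sinU2 - sinU1 * cosU2 * cos_lam)
        if sin_sigma == 0:
            return 0.0  # coincident points
        cos_sigma = sinU1 * sinU2 + cosU1 * cosU2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cosU1 * cosU2 * sin_lam / sin_sigma
        cos2_alpha = 1 - sin_alpha ** 2
        # On the equator cos2_alpha is 0 and this term is conventionally 0.
        cos_2sm = cos_sigma - 2 * sinU1 * sinU2 / cos2_alpha if cos2_alpha != 0 else 0.0
        C = f / 16 * cos2_alpha * (4 + f * (4 - 3 * cos2_alpha))
        lam_prev = lam
        lam = L + (1 - C) * f * sin_alpha * (
            sigma + C * sin_sigma * (cos_2sm + C * cos_sigma * (-1 + 2 * cos_2sm ** 2)))
        if abs(lam - lam_prev) < tol:
            break
    else:
        raise ValueError("no convergence (points nearly antipodal)")
    u2 = cos2_alpha * (a ** 2 - b ** 2) / b ** 2
    A = 1 + u2 / 16384 * (4096 + u2 * (-768 + u2 * (320 - 175 * u2)))
    B = u2 / 1024 * (256 + u2 * (-128 + u2 * (74 - 47 * u2)))
    delta_sigma = B * sin_sigma * (cos_2sm + B / 4 * (
        cos_sigma * (-1 + 2 * cos_2sm ** 2)
        - B / 6 * cos_2sm * (-3 + 4 * sin_sigma ** 2) * (-3 + 4 * cos_2sm ** 2)))
    return b * A * (sigma - delta_sigma)
```

For a pole-to-equator meridian arc, the sphere of mean radius 6,371,008.8 m overestimates the ellipsoidal distance (about 10,001,966 m) by roughly 5.6 km.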

When it comes to making maps, the Wolfram Language now has access not only to the street map of the world, but also to things like historical country borders, as well as at least low-resolution satellite imagery. And given the street map, there’s an important new class of computations that can be done: travel directions (and travel times) along streets from anywhere to anywhere.
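Travel directions along streets reduce, at their core, to shortest-path search on a weighted graph. Here is a Python sketch using Dijkstra’s algorithm on a tiny hypothetical street graph with made-up travel times; real routing also has to handle turn costs, traffic and very large graphs, and this is not a claim about how the Wolfram Language computes directions.

```python
import heapq


def fastest_route(graph, start, goal):
    """Dijkstra's algorithm over a street graph.

    graph: {node: [(neighbor, travel_minutes), ...]} (hypothetical format).
    Returns (total_minutes, [start, ..., goal]) or (inf, []) if unreachable.
    """
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    visited = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue  # stale queue entry; a cheaper route was already found
        visited.add(node)
        if node == goal:
            path = [goal]
            while path[-1] != start:
                path.append(previous[path[-1]])
            return cost, path[::-1]
        for neighbor, minutes in graph.get(node, []):
            new_cost = cost + minutes
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    return float("inf"), []


STREETS = {"A": [("B", 5), ("C", 2)], "B": [("D", 3)], "C": [("B", 1), ("D", 7)], "D": []}
```

On this toy graph, the fastest route from A to D takes 6 minutes via C and B, even though the direct-looking A→B→D route uses fewer streets.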

Version 11 has lots and lots of new capabilities across all areas of the Wolfram Language. But it’s also got lots of new capabilities in traditional Mathematica areas, like calculus. In fact, what we’ve just added for calculus in Version 11 is big enough that in earlier versions it would undoubtedly have been the headline new feature.

One example is differential eigensystems: being able to solve eigenvalue versions of both ODEs and PDEs. There’s a huge stack of algorithmic technology necessary to make this possible—and in fact we’ve been building towards it for more than 25 years. And what’s really important is that it’s general: it’s not something where one has to carefully set up some particular problem using elaborate knowledge of numerical analysis. It’s something where one just specifies the equations and their boundary conditions—and then the system automatically figures out how to solve them.

Back around 1976 I wrote a Fortran program to solve an eigenvalue version of the 1D Schrödinger equation for a particle physics problem I was studying. In 1981 I wrote C programs to do the same thing for some equations in relativistic quantum mechanics. I’ve been patiently waiting for the day when I can just type in these problems, and immediately get answers. And now with Version 11 it’s here.
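For comparison with that kind of hand-written program, here is a Python sketch of the classic shooting method for a 1D Schrödinger eigenvalue problem, using the harmonic oscillator −½ψ″ + ½x²ψ = Eψ (units with ħ = m = ω = 1), whose exact eigenvalues are E = n + ½. The choice of potential, the cutoff `x_max = 6`, the RK4 step count and the bisection brackets are all assumptions; this only shows what used to be hand-coded, not how Version 11 solves such problems.

```python
def shoot(energy, even=True, x_max=6.0, steps=4000):
    """Integrate psi'' = (x^2 - 2E) psi from 0 to x_max with RK4; return psi(x_max)."""
    h = x_max / steps
    x = 0.0
    # Even states start flat at the origin; odd states start at zero.
    psi, dpsi = (1.0, 0.0) if even else (0.0, 1.0)

    def deriv(x, psi, dpsi):
        return dpsi, (x * x - 2.0 * energy) * psi

    for _ in range(steps):
        k1 = deriv(x, psi, dpsi)
        k2 = deriv(x + h / 2, psi + h / 2 * k1[0], dpsi + h / 2 * k1[1])
        k3 = deriv(x + h / 2, psi + h / 2 * k2[0], dpsi + h / 2 * k2[1])
        k4 = deriv(x + h, psi + h * k3[0], dpsi + h * k3[1])
        psi += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        dpsi += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        x += h
    return psi


def eigenvalue(e_lo, e_hi, even=True, iterations=40):
    """Bisect on the sign of psi(x_max), which flips as E crosses an eigenvalue.

    The bracket [e_lo, e_hi] must contain exactly one eigenvalue of that parity.
    """
    f_lo = shoot(e_lo, even)
    for _ in range(iterations):
        mid = 0.5 * (e_lo + e_hi)
        f_mid = shoot(mid, even)
        if (f_mid > 0) == (f_lo > 0):
            e_lo, f_lo = mid, f_mid
        else:
            e_hi = mid
    return 0.5 * (e_lo + e_hi)
```

Below an eigenvalue the wavefunction blows up toward one sign at large x, and just above it toward the other, so bisection pins down E = 0.5 (even, bracket 0.2 to 1.0) and E = 1.5 (odd, bracket 1.0 to 2.0).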

Of course, what’s in Version 11 is much more powerful and more general. I was dealing with simple boundary conditions. But in Version 11 one can use the whole Wolfram Language geometry system—and all the data we have—to set up boundary conditions. So it’s easy to find the eigenmodes of a “drum” of any shape—like the shape of the US.

For something like that, there’s no choice but to do the computation numerically. Still, Version 11 will do differential eigensystem computations symbolically when it can. And Version 11 also adds some major new capabilities for general symbolic differential equations. In particular, we’ve had a large R&D project that’s now gotten us to the point where we can compute a symbolic solution to pretty much any symbolic PDE that would appear in any kind of textbook or the like.

Back in 1979 when I created a precursor to Mathematica I made a list of things I hoped we’d eventually be able to do. One of the things on that list was to solve integral equations. Well, I’m excited to be able to say that 37 years later, we’ve finally got the algorithmic technology stack to make this possible—and Version 11 introduces symbolic solutions to many classes of integro-differential equations.

There’s more in calculus too. Like Green’s functions for general equations in general domains. And, long awaited (at least by me): Mellin transforms. (They’ve been a favorite of mine ever since they were central to a 1977 particle physics paper of mine.)
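For reference, the Mellin transform of a function f is defined as below; a simple worked case is f(x) = e^{−x}, whose Mellin transform is Euler’s gamma function:

$$
\mathcal{M}\{f\}(s) = \int_0^\infty x^{s-1} f(x)\,dx,
\qquad
\mathcal{M}\{e^{-x}\}(s) = \int_0^\infty x^{s-1} e^{-x}\,dx = \Gamma(s) \quad (\operatorname{Re} s > 0).
$$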

It’s not classic calculus fare, but in Version 11 we’ve also added a lot of strength in what one might consider “modern calculus”—the aspects of calculus needed to support areas like machine learning. We’ve got more efficient and robust minimization, and we’ve also got sophisticated Bayesian minimization, suitable for things like unsupervised machine learning.

Things like partial differential equations are sophisticated math that happens to be very important in lots of practical applications in physics and engineering and so on. But what about more basic kinds of math, of the kind that’s relevant, for example, to high-school education? Well, for a long time Mathematica has covered that very thoroughly. But as our algorithmic technology stack has grown, a few new things have become possible, even for more elementary math.

One example new in Version 11 is fully automatic handling of discontinuities, asymptotes, and so on in plots of functions. So now, for example, `Tan[x]` is plotted in the perfect high-school way, without joining –∞ and +∞. For `Tan[x]` that’s pretty simple to achieve. But there’s some seriously sophisticated algorithmic technology inside to handle this for more complicated functions.
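The simple case can be sketched in Python with a crude heuristic: split a sampled curve into separately drawn pieces wherever consecutive values change sign with a large jump. The threshold `jump=10.0` and the function name are assumptions, and heuristics like this break down for harder functions, which is why general discontinuity detection needs real analysis of the function rather than just its samples.

```python
import math


def split_at_jumps(xs, ys, jump=10.0):
    """Split sampled (x, y) points into segments to be drawn separately.

    A new segment starts wherever consecutive y values change sign with a
    jump larger than `jump` (a crude stand-in for detecting a pole).
    """
    segments = [[(xs[0], ys[0])]]
    for y0, (x1, y1) in zip(ys, zip(xs[1:], ys[1:])):
        if y0 * y1 < 0 and abs(y1 - y0) > jump:
            segments.append([])  # don't join +inf to -inf across the pole
        segments[-1].append((x1, y1))
    return segments


N = 1000
xs = [-math.pi + 2 * math.pi * i / (N - 1) for i in range(N)]
ys = [math.tan(x) for x in xs]
```

For tan(x) on [−π, π] this produces three pieces, separated at the poles ±π/2, while the harmless sign change at x = 0 stays joined.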

And, by the way, another huge new thing in Version 11 is `MathematicalFunctionData`—computable access to 100,000 properties and relations about mathematical functions—in a sense encapsulating centuries of mathematical research and making it immediately available for computation.

We’ve been doing a lot recently using the Wolfram Language as a way to teach computational thinking at all levels. And among many other things, we’ve wanted to make sure that any computation that comes up in elementary school education (say, in math) is really elementary to do in the Wolfram Language. So we’ve got little functions like `NumberExpand`, which takes `123` and writes it as `{100, 20, 3}`. And we’ve also got `RomanNumeral` and so on.
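Here is a Python sketch of the two elementary operations mentioned, place-value expansion and Roman numerals. The function names are hypothetical stand-ins for `NumberExpand` and `RomanNumeral`, and details such as keeping zero digits are choices made for this sketch, not documented Wolfram Language behavior.

```python
def number_expand(n):
    """Split a non-negative integer into place-value parts: 123 -> [100, 20, 3].

    This sketch keeps zero digits, e.g. 105 -> [100, 0, 5].
    """
    digits = str(n)
    width = len(digits)
    return [int(d) * 10 ** (width - 1 - i) for i, d in enumerate(digits)]


ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]


def roman_numeral(n):
    """Standard subtractive Roman numerals for 1 <= n <= 3999."""
    if not 1 <= n <= 3999:
        raise ValueError("n must be between 1 and 3999")
    out = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)  # greedily use the largest symbol first
        out.append(symbol * count)
    return "".join(out)
```

So `number_expand(123)` gives `[100, 20, 3]` and `roman_numeral(2016)` gives `"MMXVI"`.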

And, partly as a tribute to the legacy of Logo, we’ve introduced `AnglePath`—a kind of industrial-scale version of “turtle graphics”, that happens to be useful not just for elementary education, but for serious simulations, say of different types of random walks.
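Turtle graphics boils down to accumulating turning angles and stepping forward. Here is a Python sketch of that idea; the conventions (start at the origin facing along +x, turn before each step, angles in radians) are assumptions for illustration rather than a specification of `AnglePath`.

```python
import math


def angle_path(turns, step=1.0):
    """Turtle-style path: start at the origin facing +x; for each angle,
    turn by it (radians, counterclockwise), then move `step` forward."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for turn in turns:
        heading += turn
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        points.append((x, y))
    return points
```

Turning by 0 and then by π/2 three times traces a unit square back to the origin. Feeding it random angles, e.g. `angle_path([random.uniform(-math.pi, math.pi) for _ in range(1000)])`, gives a simple unit-step random walk in the plane.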

One of the central goals of the Wolfram Language is to have everything seamlessly work together. And in Version 11 there are some powerful new examples of this going on.

Time series, for example, now work directly with arithmetic. So you can take two air pressure time series, and just subtract them. Of course, this would be easy if all the time points in the series lined up. But in Version 11 they don’t have to: the Wolfram Language automatically handles arbitrarily irregular time series.
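One straightforward way to subtract irregularly sampled time series is to evaluate both on the union of their time stamps, by linear interpolation, over the interval where they overlap. Here is a Python sketch of that approach; the interpolation scheme and the `[(time, value), ...]` format are assumptions, not a description of what the Wolfram Language does internally.

```python
from bisect import bisect_left


def interpolate(series, t):
    """Linearly interpolate a time-sorted [(time, value), ...] series at t.

    t must lie within the series' time range.
    """
    times = [p[0] for p in series]
    i = bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return series[i][1]  # exact time stamp
    (t0, v0), (t1, v1) = series[i - 1], series[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def subtract_series(a, b):
    """a - b on the union of both series' time stamps within their overlap."""
    start = max(a[0][0], b[0][0])
    end = min(a[-1][0], b[-1][0])
    times = sorted({t for t, _ in a + b if start <= t <= end})
    return [(t, interpolate(a, t) - interpolate(b, t)) for t in times]
```

For example, with `a = [(0, 10.0), (2, 14.0)]` and `b = [(1, 1.0), (3, 3.0)]`, the overlap is from time 1 to 2 and the difference is `[(1, 11.0), (2, 12.0)]`.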

Another example concerns units. In Version 11, statistical distributions now work seamlessly with units. So a normal distribution can have not just a standard deviation of 2.5, but a standard deviation of 2.5 meters. And all computations and unit conversions are handled completely automatically.

Geometry and geometric regions have also been seamlessly integrated into more parts of the system. Solvers that used to just take variables can now be given arbitrary regions to operate over. Another connection is between images and regions: `ImageMesh` now takes any image and constructs a geometric mesh from it. So, for example, you can do serious computational geometry with your favorite cat picture if you want.

One final example: random objects. `RandomInteger` and `RandomReal` are old functions. Version 8 introduced `RandomVariate` for picking random objects from arbitrary symbolic probability distributions. Then in Version 9 came `RandomFunction`, for generating functions from random processes. But now in Version 11 there’s more randomness. There’s `RandomPoint`, which picks a random point in any geometric region. And there’s also `RandomEntity` that picks a random entity, as well as `RandomWord`—that’s useful for natural language processing research, as well as being a nice way to test your vocabulary in lots of languages… And finally, in Version 11 there’s a whole major new mathematical area of randomness: random matrices—implemented with all the depth and completeness that we’ve made a hallmark of Mathematica and the Wolfram Language.
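As a small taste of what uniform sampling from a region involves, here is a Python sketch that samples uniformly from a triangle using the standard reflection trick. General regions need much more machinery (for example, triangulating a polygon and picking triangles in proportion to their areas); the function name here is hypothetical and says nothing about how `RandomPoint` works.

```python
import random


def random_point_in_triangle(a, b, c, rng=random):
    """Uniformly sample a point in triangle abc (each a 2D tuple)."""
    u, v = rng.random(), rng.random()
    if u + v > 1.0:
        # (u, v) fell in the other half of the parallelogram: reflect it back.
        u, v = 1.0 - u, 1.0 - v
    return (a[0] + u * (b[0] - a[0]) + v * (c[0] - a[0]),
            a[1] + u * (b[1] - a[1]) + v * (c[1] - a[1]))
```

Sampling many points this way, the average converges to the triangle’s centroid, as expected for a uniform distribution.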

One of the long-term achievements of Mathematica and the Wolfram Language has been that they’ve made visualization a routine part of everyday work. Our goal has always been to make it as automatic as possible to visualize as much as possible. And Version 11 now makes a whole collection of new things automatic to visualize.

There are very flexible word clouds that let one visualize text and collections of strings. There are timeline plots for visualizing events in time. There are audio plots that immediately visualize short and long pieces of audio. There are dendrograms that use machine learning methods to show hierarchical clustering of images, text, or any other kind of data. There are geo histograms to show geographic density. There’s a `TextStructure` function that diagrams the grammar of English sentences. And there are anatomy plots, to show features in the human body (making use of symbolic specifications, since there aren’t any explicit coordinates).

What other kinds of things are there to visualize? Well, one thing I’ve ended up visualizing a lot (especially in my efforts in basic science) are the rules for simple programs like cellular automata. And in Version 11 we’ve added `RulePlot` for automatically visualizing rules in many different styles.
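The rules for elementary cellular automata follow the standard numbering scheme: bit k of the rule number gives the new cell value for the neighborhood whose three cells spell k in binary. Here is a Python sketch that decodes a rule number, renders a text-only version of the rule icons, and applies one update step; the function names are hypothetical.

```python
def rule_table(rule_number):
    """Map each (left, center, right) neighborhood to the new center cell."""
    table = {}
    for pattern in range(7, -1, -1):  # 111, 110, ..., 000
        neighborhood = tuple((pattern >> shift) & 1 for shift in (2, 1, 0))
        table[neighborhood] = (rule_number >> pattern) & 1
    return table


def rule_icons(rule_number):
    """Text rendering of the rule, one 'icon' line per neighborhood."""
    cell = {1: "#", 0: "."}
    return "\n".join("".join(cell[c] for c in nbhd) + " -> " + cell[new]
                     for nbhd, new in rule_table(rule_number).items())


def step(cells, rule_number):
    """One update of a row of cells, treating cells beyond the ends as 0."""
    table = rule_table(rule_number)
    padded = [0] + list(cells) + [0]
    return [table[tuple(padded[i - 1:i + 2])] for i in range(1, len(padded) - 1)]
```

For rule 30 (binary 00011110), exactly the neighborhoods 100, 011, 010 and 001 produce a black cell, so a single black cell grows into three on the first step.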

Another longstanding visualization challenge has been how to automatically visualize 3D distributions of data. The issue tends to be that it’s hard to “see into” the 3D volume. But in Version 11 we’ve got a bunch of functions that solve this in different ways, often by making slices at positions that are defined by our geometry system.

In the quest for automation in visualization, another big area is labeling. And in Version 11 we’ve added `Callout` to make it possible to specify callouts for points, lines and regions (we’ve already got legends and tooltips and so on). There’s a trivial way to do callouts: just always put them (say) on the left. But that’d be really bad in practice, because lots of callouts could end up colliding. And what Version 11 does instead is something much more sophisticated, involving algorithmically laying out callouts to optimally achieve aesthetic and communication goals.
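To show why callout placement isn’t trivial, here is a deliberately simple Python sketch of greedy label placement: for each point, try a few candidate positions around it and take the first one whose box doesn’t collide with labels already placed. The box size, gap and candidate order are assumptions, and this is far cruder than the optimization-based layout described above.

```python
def overlaps(r1, r2):
    """True if axis-aligned rectangles (x0, y0, x1, y1) overlap (touching is OK)."""
    return r1[0] < r2[2] and r2[0] < r1[2] and r1[1] < r2[3] and r2[1] < r1[3]


def place_callouts(points, w=2.0, h=1.0, gap=0.2):
    """Greedily place one w-by-h label box near each (x, y) point."""
    placed = []
    for x, y in points:
        candidates = [
            (x + gap, y - h / 2, x + gap + w, y + h / 2),  # right
            (x - gap - w, y - h / 2, x - gap, y + h / 2),  # left
            (x - w / 2, y + gap, x + w / 2, y + gap + h),  # above
            (x - w / 2, y - gap - h, x + w / 2, y - gap),  # below
        ]
        # Falls back to the first candidate even if every option collides.
        choice = next((c for c in candidates
                       if not any(overlaps(c, p) for p in placed)), candidates[0])
        placed.append(choice)
    return placed
```

With three points stacked closely, the middle label gets pushed to the left because the right-hand slot is already taken; with many crowded points the fallback would produce collisions, which is where a global optimization pays off.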

Mathematica and the Wolfram Language have always been able to handle character strings. And in Version 10 there was a huge step forward with the introduction of `Interpreter`—in which we took the natural language understanding breakthroughs we made for Wolfram|Alpha, and applied them to interpreting strings in hundreds of domains. Well, in Version 11 we’re taking another big step—providing a variety of functions for large-scale natural language processing and text manipulation.

There are functions like `TextWords` and `TextSentences` for breaking text into words and sentences. (It takes fancy machine learning to do a good job, and not, for example, to be confused by things like the periods in “St. John’s St.”) Then there are functions like `TextCases`, which lets one automatically pick out different natural-language classes, like countries or dates, or, for that matter, nouns or verbs.

It’s pretty interesting being able to treat words as data. `WordList` gives lists of different kinds of words. `WordDefinition` gives definitions.

Then there are multilingual capabilities. `Alphabet` gives pretty much every kind of alphabet; `Transliterate` transliterates between writing systems. And `WordTranslation` gives translations of words into about 200 languages. Great raw material for all sorts of linguistics investigations.

The Wolfram Language is arguably the highest-level language that’s ever been created. But in Version 11 we’ve added a bunch of capabilities for “reaching all the way down” to the lowest level of computer systems. First of all, there’s `ByteArray` that can store and manipulate raw sequences of bytes. Then there are functions that deal with raw networks, like `PingTime` and `SocketConnect`.

There’s a new framework for publish-subscribe “channels”. You can create a channel, then either the Wolfram Language or an external system can send it data, and you can set up a “listener” that will do something in your Wolfram Language session whenever the data arrives. There’s a lot that can be built with this setup, whether it’s connecting to external services and devices, handling notifications and third-party authentication—or creating your very own chat system.
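The publish-subscribe pattern behind channels can be sketched in a few lines of Python. This in-process broker is purely illustrative: the class and method names are hypothetical, and real channels involve network transport, authentication and cloud-side state that this omits.

```python
from collections import defaultdict


class ChannelBroker:
    """Minimal in-process publish-subscribe broker (illustrative only)."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def listen(self, channel, handler):
        """Register handler(message) to run whenever data arrives on channel."""
        self._listeners[channel].append(handler)

    def send(self, channel, message):
        """Deliver message to every listener on channel; return how many ran."""
        handlers = list(self._listeners.get(channel, []))
        for handler in handlers:
            handler(message)
        return len(handlers)
```

Senders never need to know who is listening: they just name a channel, and any number of listeners (or none) react.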

Something else new in Version 11 is built-in cryptography. It’s a very clean symbolic framework that lets you set up pretty much whatever protocol you want, using public or private key systems.
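To make “public or private key systems” concrete, here is textbook RSA in Python with the tiny primes 61 and 53. It is insecure on purpose (no padding, tiny keys) and only illustrates the public/private key relationship; it says nothing about how Version 11’s cryptography functions are implemented.

```python
def make_toy_rsa(p=61, q=53, e=17):
    """Toy RSA key pair from tiny primes. Insecure: illustration only."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)
    return (n, e), (n, d)


def encrypt(public_key, m):
    """Encrypt an integer message m < n with the public key."""
    n, e = public_key
    return pow(m, e, n)


def decrypt(private_key, c):
    """Recover the message with the private key."""
    n, d = private_key
    return pow(c, d, n)
```

With these parameters the public key is (3233, 17), the private exponent is 2753, and the message 65 encrypts to 2790, which only the private key turns back into 65.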

What about interacting with the web? The symbolic character of the Wolfram Language is again very powerful here. Because for example it lets one have `HTTPRequest` and `HTTPResponse` as symbolic structures. And it also lets one have functions like `URLSubmit`, with symbolically defined handler functions for callbacks from asynchronous URL submission. There’s even now a `CookieFunction`, for symbolic handling of cookies.

Yes, one can do systems programming in pretty much any language—or even for example in a shell. But what I’ve found is that doing it in the Wolfram Language is incredibly more powerful. Let’s say you’re exploring the performance of a computer system. Well, first of all, everything you’re doing is kept nicely in a notebook, where you can add comments, etc. Then—very importantly—everything you do can be immediately visualized. Or you can apply machine learning, or whatever. Want to study network performance? Use `PingTime` to generate a list of ping times; then immediately make a histogram, correlate with other data, or whatever.

Something else we’re adding in Version 11 is `FileSystemMap`: being able to treat a file system like a collection of nested lists (or associations) and then mapping any function over it. So for example you can take a directory full of images, and use `FileSystemMap` to apply image processing to all of them.
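The idea of mapping a function over a directory tree treated as nested associations can be sketched in Python, with nested dicts standing in for associations. The name `file_system_map` and its argument order are assumptions that only loosely echo `FileSystemMap`; `demo` builds a small temporary tree so the example is self-contained.

```python
import os
import tempfile


def file_system_map(func, root):
    """Nested dict mirroring the tree under root, with func(path) applied to
    every regular file; subdirectories become nested dicts."""
    result = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            result[name] = file_system_map(func, path)
        elif os.path.isfile(path):
            result[name] = func(path)
    return result


def demo():
    """Build a tiny temporary tree and map file sizes over it."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "a.txt"), "w") as f:
            f.write("hi")
        os.mkdir(os.path.join(root, "sub"))
        with open(os.path.join(root, "sub", "b.txt"), "w") as f:
            f.write("there")
        return file_system_map(os.path.getsize, root)
```

Swapping `os.path.getsize` for an image-processing function gives the “process every image in a directory” pattern described above.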

Oh, and another thing: Version 11 also includes, though it’s still tagged as experimental, a full industrial-strength system for searching text documents, both locally and in the cloud.
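At the heart of most text search systems is an inverted index mapping each term to the documents containing it. Here is a minimal Python sketch with boolean AND queries; the tokenizer and data layout are assumptions, and an industrial-strength system adds ranking, stemming, incremental updates and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """docs: {doc_id: text}. Returns {term: set of doc_ids}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Because each query only touches the posting sets for its terms, lookups stay fast even when the document collection is large.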

An incredibly powerful feature of the Wolfram Language is that it runs not only on the desktop but also in the cloud. And in Version 11 there are lots of new capabilities that use the cloud, for example to create things on the web.

Let’s start with the fairly straightforward stuff. `CloudDeploy[FormFunction[...]]` lets you immediately create a form-based app on the web. But now it’s easy to make the form even more sophisticated. There are lots of new types of “smart fields” that automatically use natural language understanding to interpret your input. There are new constructs, like `RepeatingElement` and `CompoundElement`, that automatically set up fields to get input for lists and associations. And there’s a whole new Programmable Linguistic Interface that lets you define your own grammar to extend the natural language understanding that’s already built into the Wolfram Language.

The forms you specify symbolically in the Wolfram Language can be quite sophisticated—with multiple pages and lots of interdependencies and formatting. But ultimately they’re still just forms where you set up your input, then submit it. Version 11 introduces the new `AskFunction` framework which lets you set up more complex interactions—like back-and-forth dialogs in which you “interview” the user to get data. In the Wolfram Language, the whole process is specified by a symbolic structure—which `CloudDeploy` then makes immediately active on the web.

It’s a goal of the Wolfram Language to make it easy to build complex things on the web. In Version 11, we’ve added `FormPage` to let you build a “recycling form” (like on wolframalpha.com), and `GalleryView` to let you take a list of assets in the Wolfram Language, and immediately deploy them as a “gallery” on the web (like in demonstrations.wolfram.com).

If you want to operate at a lower level, there are lots of new functions, like `URLDispatcher` and `GenerateHTTPResponse`, that let you determine exactly how web requests will be handled by things you set up in the cloud.
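What URL dispatching amounts to can be sketched in Python: match the request path against an ordered list of rules and hand the rest of the path to the first matching handler. The prefix-matching convention here (prefixes written without a trailing slash) is an assumption for illustration, not the semantics of `URLDispatcher`.

```python
def url_dispatcher(rules, default):
    """rules: ordered list of (path_prefix, handler). The first prefix that
    matches the whole path or a leading path segment wins, and its handler
    receives the remaining sub-path; otherwise default(path) is called."""
    def dispatch(path):
        for prefix, handler in rules:
            if path == prefix or path.startswith(prefix + "/"):
                return handler(path[len(prefix):])
        return default(path)
    return dispatch
```

Matching on whole path segments means `/api/users` goes to the `/api` handler, while `/apix` does not.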

Also new in Version 11 are functions like `CloudPublish` and `CloudShare` that let you control access to things you put in the cloud from the Wolfram Language. A small but I think important new feature is `SourceLink`, which lets you automatically link from, say, a graphic that you deploy in the cloud back to the notebook (also in the cloud) in which it was created. I think this will be a great tool for “data-backed publication”—in which every picture you see in a paper, for example, links back to what created it. (Inside our company, I’m also insisting that automated reports we generate—from the Wolfram Language of course—include source links, so I can always get the raw data and analyze it myself or whatever.)

Already there experimentally in Version 10, but now fully supported in Version 11, is the whole Wolfram Data Drop mechanism, which lets you accumulate data from anywhere into the Wolfram Cloud. I have to say I think I underestimated the breadth of usefulness of the Wolfram Data Drop. I thought it would be used primarily to store data from sensors and the like. And, yes, there are lots of applications along these lines. But what I’ve found is that the Data Drop is incredibly useful purely inside the Wolfram Language. Let’s say you’ve got a web form that’s running in the Wolfram Language. You might process each request, but then throw each result into a databin in the Wolfram Data Drop so you can analyze them all together.

Wolfram Data Drop is basically set up to accumulate time series of data. In Version 11 another way to store data in the cloud is `CloudExpression`. You can put any Wolfram Language expression into a cloud expression, and it’ll be persistently stored there, with each part being extracted or set by all the usual operations (like `Part` and `AppendTo`) that one could use on a symbol in a Wolfram Language session. `CloudExpression` is a great way to store structured data where one’s continually modifying parts, but one wants all the data to be persistent in the cloud.
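The key property of a cloud expression, that modifying a part immediately updates persistent storage, can be illustrated locally in Python with a write-through JSON file. This is only an analogy: the class name and methods are hypothetical, and it stores JSON-compatible values rather than arbitrary expressions.

```python
import json
import os
import tempfile


class PersistentDict:
    """Dict-like store whose every modification is written through to a JSON
    file, so the data survives across sessions (local illustration only)."""

    def __init__(self, path):
        self.path = path
        if os.path.exists(path):
            with open(path) as f:
                self._data = json.load(f)
        else:
            self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value
        self._save()

    def append_to(self, key, value):
        """Append to a list-valued part and persist (roughly like AppendTo)."""
        self._data.setdefault(key, []).append(value)
        self._save()

    def _save(self):
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self._data, f)
        os.replace(tmp, self.path)  # swap in the new file in a single step


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "store.json")
        store = PersistentDict(path)
        store["name"] = "v11"
        store.append_to("log", 1)
        store.append_to("log", 2)
        reopened = PersistentDict(path)  # simulates a fresh session
        return reopened["name"], reopened["log"]
```

Writing to a temporary file and then replacing the original avoids leaving a half-written store behind if the process dies mid-write.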

Things you store in the cloud are immediately persistent. In Version 11, there’s also `LocalObject`—which is the local analog of `CloudObject`—and provides persistent local storage on your machine. `LocalCache` is a seamless way of ensuring that things you’re using are cached in the local object store.

In the Wolfram Language we curate lots of data that we include directly in the knowledgebase, but we also curate ways to access more data, such as external APIs. Version 11 includes many new connections, for example to Flickr, Reddit, MailChimp, SurveyMonkey, SeatGeek, arXiv and more.

The Wolfram Language is also an extremely powerful way to deploy your own APIs. And in Version 11 there’s an expanding set of authentication mechanisms that are supported for APIs—for example `PermissionsKey` for giving appids. `CloudLoggingData` also now gives what can be extremely detailed data about how any API or other cloud object you have is being accessed.

An API that you call on the web basically gets data passed to it through the URL that it’s given. In Version 11 we have a new kind of API-like construct that operates not through the web and URLs, but through email. `MailReceiverFunction` is like `APIFunction`, except when it’s deployed it defines an email address, and then any mail that’s sent to that email address gets fed to the code in the mail receiver function. `MailReceiverFunction` lets you get very detailed in separating out different parts of a mail message and its headers—and then lets you apply any Wolfram Language function you want—so you can do arbitrarily sophisticated automation in processing email. And particularly for someone like me who gets a huge amount of email from humans as well as automated systems, this is a pretty nice thing.
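The core of a mail receiver, splitting a raw message into headers and body and handing them to arbitrary code, can be sketched with Python’s standard `email` package. The `mail_receiver` wrapper and the sample message are hypothetical; actually receiving mail at a deployed address is the part `MailReceiverFunction` handles and this sketch does not.

```python
from email import message_from_string

RAW = """From: sensor@example.com
To: receiver@example.com
Subject: Reading 42

temperature=21.5
"""


def mail_receiver(handler):
    """Wrap handler(headers, body) so it can be fed raw RFC 822 message text."""
    def receive(raw_message):
        msg = message_from_string(raw_message)
        headers = dict(msg.items())  # repeated headers keep only the last value
        body = msg.get_payload()     # a string for a non-multipart message
        return handler(headers, body)
    return receive
```

Any function can then process the pieces, for example extracting the subject line and a reading from the body.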

You can access the Wolfram Language through a notebook, on the desktop or in the cloud. You can access it through a scheduled task in the cloud, or through an API or a mail receiver function. It’s always been possible to run the Wolfram Language from a command line too, but in Version 11 there’s a powerful new way to do that, using WolframScript.

The idea of WolframScript is to provide a very simple but flexible interface to the Wolfram Language. WolframScript lets you run on a local Wolfram Engine on your computer—or just by saying `-cloud` it lets you run in the cloud. It lets you run code from a file or directly from the command line. And it lets you get back results in any format—including text or images or sounds or PDF or CDF or whatever. And in the usual Unix way, it lets you use `#!wolframscript` to create a script that can be called standalone and will run with WolframScript.

There’s more too. You can set up WolframScript to operate like a Wolfram Language `FormFunction`—pulling in arguments of whatever types you specify (and doing interpretation when needed). And you can also use WolframScript to call an API you’ve already defined in the cloud.

In our own company, there are lots of places where we’re using the Wolfram Language as part of some large and often distributed system. WolframScript provides a very clean way to just “throw in a Wolfram Language component” anywhere you want.

I’ve talked about all sorts of things that broaden and deepen the algorithmic capabilities of Mathematica and the Wolfram Language. But what about the structure of the core Wolfram Language itself? Of course, we’re committed to always maintaining compatibility (and I’m happy to say that all our attention to design on an ongoing basis tends to make this rather easy). But we also want to progressively strengthen and polish the language.

In natural languages, one process of evolution tends to be the construction of new words from common idioms. And we’re doing essentially the same thing in the Wolfram Language. We’ve made an active effort to study what “lumps of repeated computational work” appear most frequently across lots of Wolfram Language code. Then—assuming we can come up with a good name for a particular lump of computational work—we add it as a new function.

In the early days of Mathematica I used to take the point of view that if there were functions that let you do something using an idiom, then that was fine. But what I realized is that if an idiom is compressed into a single function whose name communicates clearly what it’s for, then one gets code that’s easier to read. And coupled with the convenience of not having to reconstruct an idiom many times, this justifies having a new function.

For the last several years, two initiatives we’ve had within our company are Incremental Language Development (ILD) and Language Consistency & Completeness (LCC). The idea of ILD is to do things like introduce functions that are equivalent to common idioms. The idea of LCC is to do things like make sure that anything—like pattern matching, or units, or symbolic URLs—is supported wherever it makes sense across the system.

So, for example, a typical ILD addition in Version 11 is the function `MinMax`, which returns the min and max of a list (it’s amazing how many fiddly applications of `Map` that saves). A typical LCC addition is support for pattern matching in associations.

Between ILD and LCC there are lots of additions to the core language in Version 11. Functions like `Cases` have been extended to `SequenceCases`—looking for sequences in a list instead of individual elements. There’s also now `SequenceFoldList`, which is like `FoldList`, except it can “look back” to a sequence of elements of any length. In a similar vein, there’s `FoldPairList`, which generalizes `FoldList` by allowing the result “returned” at each step to be different from the result that’s passed on in the folding process. (This might sound abstract, and at some level it is—but this is a very useful operation whenever one wants to maintain a separate internal state while sequentially ingesting data.)
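To make the `FoldPairList` idea more concrete, here is a rough Python analog written only from the description above. It is not the Wolfram Language implementation, and the name `fold_pair_list` is hypothetical. The function `f` receives the current internal state and the next element, and returns a pair: the value to emit, and the state to carry on.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

S = TypeVar("S")  # type of the internal state
A = TypeVar("A")  # type of the incoming elements
R = TypeVar("R")  # type of the emitted results


def fold_pair_list(f: Callable[[S, A], Tuple[R, S]], state: S, items: Iterable[A]) -> List[R]:
    """Emit one result per element while threading a separate state through the fold."""
    results: List[R] = []
    for item in items:
        emitted, state = f(state, item)  # f returns (what to emit, what to carry on)
        results.append(emitted)
    return results


# Delta encoding: emit the difference from the previous value, carry the value itself.
deltas = fold_pair_list(lambda prev, x: (x - prev, x), 0, [3, 5, 9, 10])
# deltas == [3, 2, 4, 1]

# Splitting a list into pieces of given lengths: emit the piece, carry the remainder.
pieces = fold_pair_list(lambda rest, k: (rest[:k], rest[k:]), [1, 2, 3, 4, 5, 6], [2, 3])
# pieces == [[1, 2], [3, 4, 5]]
```

In both examples the emitted values and the carried state differ, which is exactly the case a plain fold that returns a single running value cannot express directly.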

Another new construct that might at first sound weird is `Nothing`. Despite its name, `Nothing` does something very useful: whenever it appears in a list, it’s immediately removed. Which means, for example, that to get rid of something in a list, you just have to replace it by `Nothing`.

There are lots of little conveniences we’ve added in Version 11. `First`, for example, now has a second argument that says what to give if there’s no first element, which avoids having to include an `If` for that case. There’s also a nice general mechanism for things like this: `UpTo`. You can say `Take[list, UpTo[4]]` to get up to 4 elements in `list`, or all of them if there are fewer than 4.

Another little convenience is `Echo`. When you’re trying to tell what’s going on inside a piece of code, you sometimes want to print some intermediate result. `Echo` is a function that prints, then returns what it printed—so you can sprinkle it in your code without changing what the code does.

It’s hard to believe there are useful basic list operations still to add, but Version 11 has a few. `Subdivide` is like `Range` except it subdivides the range into equal parts. `TakeLargest` and related functions generalize `Max` etc. to give not just the largest, but the *n* largest elements in a list.

There’s a function `Groupings`, which I’d been thinking about for years but only now figured out a good design for—and which effectively generates all possible trees formed with certain binary or other combiners (“what numbers can you get by combining a list of 1s in all possible ways with `Plus` and `Times`?”).
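The question in parentheses can be answered with a short recursive sketch. This Python version (the function name is hypothetical) only computes the resulting numbers. Unlike `Groupings`, it does not build the trees themselves. It tries every way of splitting *n* leaves into a left and right subtree, and combines the values of the two sides with + and ×.

```python
from functools import lru_cache
from typing import FrozenSet


@lru_cache(maxsize=None)
def values_from_ones(n: int) -> FrozenSet[int]:
    """All numbers obtainable by combining n ones with binary + and *,
    over every possible binary tree with n leaves."""
    if n == 1:
        return frozenset({1})
    results = set()
    for left in range(1, n):  # number of leaves in the left subtree
        for a in values_from_ones(left):
            for b in values_from_ones(n - left):
                results.add(a + b)
                results.add(a * b)
    return frozenset(results)


for n in range(1, 7):
    print(n, sorted(values_from_ones(n)))
```

For four 1s this gives {1, 2, 3, 4}, and for five 1s it gives {1, 2, 3, 4, 5, 6}, with 6 coming from (1+1)×(1+1+1).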

There are nips and tucks to the language in all sorts of places. Like in `Table`, for example, where you can say `Table[x, n]` rather than needing to say

This has been a long blog post. But I’m not even close to having covered everything that’s new in Version 11. There’s more information on the web. Check out the featured new areas, or the information for existing users or the summary of new features. (See also the list of new features specifically from Version 10.4 to 11.0.)

But most of all, get into using Version 11! If you want quick (and free) exposure to it, try it in the Wolfram Open Cloud. Or just start using Mathematica 11 or Version 11 of any other products based on the Wolfram Language.

I’ve been using test versions of Version 11 for some time now—and to me Version 10 already looks and feels very “old fashioned”, lacking those nice new interface features and all the new functionality and little conveniences. I’m really pleased with the way Version 11 has turned out. It’s yet another big step in what’s already been a 30-year journey of Mathematica and the Wolfram Language. And I’m excited to see all the wonderful things that all sorts of people around the world will now be able to do for the first time with Mathematica 11 and the other Version 11 Wolfram Language products.

*To comment, please visit the copy of this post at the Wolfram Blog »*

I spend most of my time trying to build the future with science and technology. But for many years now I’ve also had two other great interests: people and history. And today I’m excited to be publishing my first book that builds on these interests. It’s called *Idea Makers*, and its subtitle is *Personal Perspectives on the Lives & Ideas of Some Notable People*. It’s based on essays I’ve written over the past decade about a range of people—from ones I’ve personally known (like Richard Feynman and Steve Jobs) to ones who died long before I was born (like Ada Lovelace and Gottfried Leibniz).

The book is about lives and ideas, and how they mix together. At its core it’s a book of stories about people, and what those people managed to create. It’s the first book I’ve written that’s fundamentally non-technical—and I’m hoping all sorts of readers without deep technical interests will be able to enjoy it.

There’s a common stereotype that techies like me aren’t interested in people. But for some reason I always have been. Yes, I like computers and abstract ideas and those sorts of things very much, and I certainly spend a great deal of my time on them. But I also like people, and find them interesting. And no doubt that has something to do with why I’ve chosen to spend the past 30 years building up a company that is—like any company—full of people.

One of the things I always find particularly interesting about people is their life trajectories. I’ve been fortunate enough to mentor many people, and I hope to have had a positive effect on many life trajectories. But I also find life trajectories interesting for their own sake—as things to understand and learn from.

*Idea Makers* is in a sense an exploration of a few life trajectories whose intellectual output happens to have intersected with my own. Some of the people in the book are extremely well known; others less so. I had all sorts of different reasons for writing the pieces that have ended up in the book. Sometimes it was to celebrate an anniversary. Sometimes because of some event. And sometimes—sadly—it was because the person they’re about had just died. And although I didn’t plan it this way, the sixteen people in the book turn out to represent an interesting cross-section. (Yes, it would have been nice to have a bit more diversity among them, but unfortunately, with my subjects being from past generations, it didn’t work out that way.)

So what have I learned from the explorations in the book? The way the history of science and technology is told, it often sounds like new ideas just suddenly arrive in the world. But my experience is that there’s always a story behind them—and usually the story is long and deeply interwoven with the life of a particular person. Even the person themself may sometimes not realize just how important their own life trajectory is in the formation of an idea. But if one digs down, there’s usually a whole long thread to be unearthed.

Some of the people in this book I personally knew, so I was able to watch the stories of their ideas unfold over the course of many years. But for the historical figures in the book I had to do research. I’ve done a certain amount of historical research before—notably for the history notes in *A New Kind of Science*. But I have to say it’s gotten much easier to do historical research in recent years—particularly with so many historical documents now being scanned and searchable—not to mention the advent of large-scale computational knowledge and knowledge-based programming.

It’s rather common to come across what at first seem like mysteries. How on earth did so-and-so manage to figure out such-and-such? One could conclude that it was just a miraculous flash of inspiration. But in reality it almost never is. And indeed my own life experiences have shown me over and over again just how incremental the process of coming up with ideas actually is. One develops some intellectual framework, from which one comes up with conclusions or tools, which then let one extend the framework, and so on—at each stage incrementally generating ideas. And I have to say I’ve found it a lot of fun to try to discover those intellectual missing links—that let one see just how someone went from here to there to manage to arrive at a particular idea.

The story itself is usually interesting—both historically and personally. But I’ve also found that knowing the story of how an idea came to be almost invariably lets me understand the idea itself more deeply.

Ideas are ultimately abstract things. But something I’ve noticed is that their origins are often surprisingly concrete and practical. One might imagine that the best way to arrive at some new idea would just be to think abstractly long and hard. But the stories in the book—and my own life experiences—suggest that a much more common path is something quite different. Instead of heading straight for an abstract goal, what often happens is that the trajectories of people’s lives cause them to work on solving some practical problem. Needless to say, most people will just be satisfied if they solve the practical problem, and will go no further. But some will try to generalize it, and build up an abstract intellectual framework around what they have done. And that, in my observation, is where a great many important new ideas come from.

As I’ve studied the lives and ideas of the people in the book, I think I’ve learned a lot that I can apply in my own life. Perhaps most important is just knowing so many stories of how things worked out in the past—because these give me all sorts of intuition about how things I see today will work out in the future. Even though they’re separated by many details and sometimes centuries, it is remarkable how similar many of the personalities, trends and situations in the book are to ones I see all the time.

The stories in the book involve both triumphs and tragedies. But in the end, I find them all inspiring. Because in their different ways they show how it’s possible to transcend the daily details of human lives and create ideas that can make persistent contributions to our world.

It’s been fun writing the pieces in the book. Of course, there’ve been plenty of challenges. But I feel good about the extent to which I’ve managed to decode history, and get the true stories of how things happened—as well as paint accurate portraits of the people behind them. I’ve had the privilege of personally knowing some of these people; others I’ve come to know only by studying them and poring through what they wrote. I’ve learned a lot—and I’m hoping that with this book I can pass on some of it, and in particular, communicate something about what it takes for people to make ideas, in the past and in the future.

*Idea Makers is now available from bookstores, or online at Amazon.com and Barnes & Noble.*

An octillion. A billion billion billion. That’s a fairly conservative estimate of the number of times a cellphone or other device somewhere in the world has generated a bit using a maximum-length linear-feedback shift register sequence. It’s probably the single most-used mathematical algorithm idea in history. And the main originator of this idea was Solomon Golomb, who died on May 1—and whom I knew for 35 years.

Solomon Golomb’s classic book *Shift Register Sequences*, published in 1967—based on his work in the 1950s—went out of print long ago. But its content lives on in pretty much every modern communications system. Read the specifications for 3G, LTE, Wi-Fi, Bluetooth, or for that matter GPS, and you’ll find mentions of polynomials that determine the shift register sequences these systems use to encode the data they send. Solomon Golomb is the person who figured out how to construct all these polynomials.

He also was in charge when radar was first used to find the distance to Venus, and of working out how to encode images to be sent from Mars. He introduced the world to what he called polyominoes, which later inspired Tetris (“tetromino tennis”). He created and solved countless math and wordplay puzzles. And—as I learned about 20 years ago—he came very close to discovering my all-time-favorite rule 30 cellular automaton all the way back in 1959, the year I was born.

This essay is in *Idea Makers: Personal Perspectives on the Lives & Ideas of Some Notable People* »

Most of the scientists and mathematicians I know I met first through professional connections. But not Sol Golomb. It was 1981, and I was at Caltech, a 21-year-old physicist who’d just received some media attention from being the youngest in the first batch of MacArthur award recipients. I get a knock at my office door—and a young woman is there. Already this was unusual, because in those days there were hopelessly few women to be found around a theoretical high-energy physics group. I was a sheltered Englishman who’d been in California a couple of years, but hadn’t really ventured outside the university—and was ill prepared for the burst of Southern Californian energy that dropped in to see me that day. She introduced herself as Astrid, and said that she’d been visiting Oxford and knew someone I’d been at kindergarten with. She explained that she had a personal mandate to collect interesting acquaintances around the Pasadena area. I think she considered me a difficult case, but persisted nevertheless. And one day when I tried to explain something about the work I was doing she said, “You should meet my father. He’s a bit old, but he’s still as sharp as a tack.” And so it was that Astrid Golomb, oldest daughter of Sol Golomb, introduced me to Sol Golomb.

The Golombs lived in a house perched in the hills near Pasadena. I learned that they had two daughters—Astrid, a little older than me, an aspiring Hollywood person, and Beatrice, about my age, a high-powered science type. The Golomb sisters often had parties, usually at their family’s house. There were themes, like the flamingoes & hedgehogs croquet garden party (“recognition will be given to the person who appears most appropriately attired”), or the Stonehenge party with instructions written using runes. The parties had an interesting cross-section of young and not-so-young people, including various local luminaries. And always there, hanging back a little, was Sol Golomb, a small man with a large beard and a certain elf-like quality to him, typically wearing a dark suit coat.

I gradually learned a little about Sol Golomb. That he was involved in “information theory”. That he worked at USC (the University of Southern California). That he had various unspecified but apparently high-level government and other connections. I’d heard of shift registers, but didn’t really know anything much about them.

Then in the fall of 1982, I visited Bell Labs in New Jersey and gave a talk about my latest results on cellular automata. One topic I discussed was what I called “additive” or “linear” cellular automata—and their behavior with limited numbers of cells. Whenever a cellular automaton has a limited number of cells, it’s inevitable that its behavior will eventually repeat. But as the size increases, the maximum repetition period—say for the rule 90 additive cellular automaton—bounces around seemingly quite randomly: 1, 1, 3, 2, 7, 1, 7, 6, 31, 4, 63, …. A few days before my talk, however, I’d noticed that these periods actually seemed to follow a formula that depended on things like the prime factorization of the number of cells. But when I mentioned this during the talk, someone at the back put up their hand and asked, “Do you know if it works for the case *n*=37?” My experiments hadn’t gotten as far as the size-37 case yet, so I didn’t know. But why would someone ask that?
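Those periods can be checked by brute force. The Python sketch below rests on two assumptions the text does not state explicitly: the cells sit on a ring (cyclic boundary conditions), and the quoted list starts at 3 cells. It runs rule 90 (each cell becomes the XOR of its two neighbors) from every possible initial state and records the longest cycle eventually reached.

```python
def rule90_step(state: int, n: int) -> int:
    """One step of rule 90 on a ring of n cells stored as the bits of an int:
    each new cell is the XOR of its two neighbors, with wraparound."""
    mask = (1 << n) - 1
    rot_left = ((state << 1) | (state >> (n - 1))) & mask   # bit i <- old bit i-1
    rot_right = ((state >> 1) | (state << (n - 1))) & mask  # bit i <- old bit i+1
    return rot_left ^ rot_right


def cycle_length(state: int, n: int) -> int:
    """Length of the cycle that the evolution from `state` eventually falls into."""
    first_seen = {}
    step = 0
    while state not in first_seen:
        first_seen[state] = step
        state = rule90_step(state, n)
        step += 1
    return step - first_seen[state]


def max_period(n: int) -> int:
    """Maximum cycle length over all 2**n initial states of an n-cell ring."""
    return max(cycle_length(s, n) for s in range(1 << n))


print([max_period(n) for n in range(3, 14)])
```

Brute force is only practical for small rings, since the number of states doubles with each added cell. That growth is part of why a formula based on the factorization of the size is so valuable.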

The person who asked turned out to be a certain Andrew Odlyzko, a number theorist at Bell Labs. I asked him, “What on earth makes you think there might be something special about *n*=37?” “Well,” he said, “I think what you’re doing is related to the theory of linear-feedback shift registers,” and he suggested that I look at Sol Golomb’s book (“Oh yes,” I said, “I know his daughters…”). Andrew was indeed correct: there is a very elegant theory of additive cellular automata based on polynomials that is similar to the theory Sol developed for linear-feedback shift registers. Andrew and I ended up writing a now-rather-well-cited paper about it (it’s interesting because it’s a rare case where traditional mathematical methods let one say things about nontrivial cellular automaton behavior). And for me, a side effect was that I learned something about what the somewhat mysterious Sol Golomb actually did. (Remember, this was before the web, so one couldn’t just instantly look everything up.)

Solomon Golomb was born in Baltimore, Maryland, in 1932. His family came from Lithuania. His grandfather had been a rabbi; his father moved to the US when he was young, and got a master’s degree in math before switching to medieval Jewish philosophy and also becoming a rabbi. Sol’s mother came from a prominent Russian family that had made boots for the Tsar’s army and then ran a bank. Sol did well in school, notably being a force in the local debating scene. Encouraged by his father, he developed an interest in mathematics, publishing a problem he invented about primes when he was 17. After high school, Sol enrolled at Johns Hopkins University to study math, narrowly avoiding a quota on Jewish students by promising he wouldn’t switch to medicine—and took twice the usual course load, graduating in 1951 after half the usual time.

From there he would go to Harvard for graduate school in math. But first he took a summer job at the Glenn L. Martin Company, an aerospace firm founded in 1912 that had moved to Baltimore from Los Angeles in the 1920s and mostly become a defense contractor—and that would eventually merge into Lockheed Martin. At Harvard, Sol specialized in number theory, and in particular in questions about characterizations of sets of prime numbers. But every summer he would return to the Martin Company. As he later described it, he found that at Harvard “the question of whether anything that was taught or studied in the mathematics department had any practical applications could not even be asked, let alone discussed”. But at the Martin Company, he discovered that the pure mathematics he knew—even about primes and things—did indeed have practical applications, and very interesting ones, especially to shift registers.

The first summer he was at the Martin Company, Sol was assigned to a control theory group. But by his second summer, he’d been put in a group studying communications. And in June 1954 it so happened that his supervisor had just gone to a conference where he’d heard about strange behavior observed in linear-feedback shift registers (he called them “tapped delay lines with feedback”)—and he asked Sol if he could investigate. It didn’t take Sol long to realize that what was going on could be very elegantly studied using the pure mathematics he knew about polynomials over finite fields. Over the year that followed, he split his time between graduate school at Harvard and consulting for the Martin Company, and in June 1955 he wrote his final report, “Sequences with Randomness Properties”—which would basically become the foundational document of the theory of shift register sequences.

Sol liked math puzzles, and in the process of thinking about a puzzle involving arranging dominoes on a checkerboard, he ended up inventing what he called “polyominoes”. He gave a talk about them in November 1953 at the Harvard Mathematics Club, published a paper about them (his first research publication), won a Harvard math prize for his work on them, and, as he later said, then “found [himself] irrevocably committed to their care and feeding” for the rest of his life.

In June 1955, Sol went to spend a year at the University of Oslo on a Fulbright Fellowship—partly so he could work with some distinguished number theorists there, and partly so he could add Norwegian, Swedish and Danish (and some runic scripts) to his collection of language skills. While he was there, he finished a long paper on prime numbers, but also spent time traveling around Scandinavia, and in Denmark met a young woman named Bo (Bodil Rygaard)—who came from a large family in a rural area mostly known for its peat moss, but had managed to get into university and was studying philosophy. Sol and Bo apparently hit it off, and within months, they were married.

When they returned to the US in July 1956, Sol interviewed in a few places, then accepted a job at JPL—the Jet Propulsion Lab that had spun off from Caltech, initially to do military work. Sol was assigned to the Communications Research Group, as a Senior Research Engineer. It was a time when the people at JPL were eager to try launching a satellite. At first, the government wouldn’t let them do it, fearing it would be viewed as a military act. But that all changed in October 1957 when the Soviet Union launched Sputnik, ostensibly as part of the International Geophysical Year. Amazingly, it took only 3 months for the US to launch Explorer 1. JPL built much of it, and Sol’s lab (where he had technicians building electronic implementations of shift registers) was diverted into doing things like making radiation detectors (including, as it happens, the ones that discovered the Van Allen radiation belts)—while Sol himself worked on using radar to determine the orbit of the satellite when it was launched, taking a little time out to go back to Harvard for his final PhD exam.

It was a time of great energy around JPL and the space program. In May 1958 a new Information Processing Group was formed, and Sol was put in charge—and in the same month, Sol’s first child, the aforementioned Astrid, was born. Sol continued his research on shift register sequences—particularly as applied to jamming-resistant radio control of missiles. In May 1959, Sol’s second child arrived—and was named Beatrice, forming a nice A, B sequence. In the fall of 1959, Sol took a sabbatical at MIT, where he got to know Claude Shannon and a number of other MIT luminaries, and got involved in information theory and the theory of algebraic codes.

As it happens, he’d already done some work on coding theory—in the area of biology. The digital nature of DNA had been discovered by Jim Watson and Francis Crick in 1953, but it wasn’t yet clear just how sequences of the four possible base pairs encoded the 20 amino acids. In 1956, Max Delbrück—Jim Watson’s former postdoc advisor at Caltech—asked around at JPL if anyone could figure it out. Sol and two colleagues analyzed an idea of Francis Crick’s and came up with “comma-free codes” in which overlapping triples of base pairs could encode amino acids. The analysis showed that exactly 20 amino acids could be encoded this way. It seemed like an amazing explanation of what was seen—but unfortunately it isn’t how biology actually works (biology uses a more straightforward encoding, where some of the 64 possible triples just don’t represent anything).

In addition to biology, Sol was also pulled into physics. His shift register sequences were useful for doing range finding with radar (much as they’re used now in GPS), and at Sol’s suggestion, he was put in charge of trying to use them to find the distance to Venus. And so it was that in early 1961—when the Sun, Venus, and Earth were in alignment—Sol’s team used the 85-foot Goldstone radio dish in the Mojave Desert to bounce a radar signal off Venus, and dramatically improve our knowledge of the Earth-Venus and Earth-Sun distances.
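A toy illustration of why such sequences suit range finding follows. It is a simplified sketch, not a description of the actual 1961 Goldstone system. The ±1 version of one period of a maximal-length sequence correlates perfectly with itself at zero shift and gives -1 at every other cyclic shift, so the shift that best matches an echo reveals the round-trip delay, and hence the distance. The 4-bit register, the taps, the 6-chip delay and the noise-free cyclic echo are all assumptions chosen for illustration.

```python
def m_sequence(taps, n):
    """One period (2**n - 1 bits) of a linear-feedback shift register sequence.
    Assumes `taps` (0-based positions) give a maximal-length register."""
    bits = [0] * (n - 1) + [1]
    out = []
    for _ in range((1 << n) - 1):
        out.append(bits[0])
        new_bit = 0
        for t in taps:
            new_bit ^= bits[t]
        bits = bits[1:] + [new_bit]
    return out


def circular_correlation(a, b, shift):
    """Sum of a[k] * b[(k + shift) % N] for sequences a, b of length N."""
    N = len(a)
    return sum(a[k] * b[(k + shift) % N] for k in range(N))


code = [1 - 2 * bit for bit in m_sequence((0, 1), 4)]  # map 0 -> +1, 1 -> -1
N = len(code)  # 15

# Two-valued autocorrelation: N at zero shift, -1 at every other shift.
autocorr = [circular_correlation(code, code, s) for s in range(N)]

# Idealized echo: the transmitted code delayed by 6 chips, with no noise.
true_delay = 6
echo = [code[(k - true_delay) % N] for k in range(N)]
estimated_delay = max(range(N), key=lambda s: circular_correlation(code, echo, s))
print(autocorr, estimated_delay)
```

With real noise the peak no longer stands out perfectly, but the sharp two-valued autocorrelation is what keeps it well separated from the background.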

With his interest in languages, coding and space, it was inevitable that Sol would get involved in the question of communications with extraterrestrials. In 1961 he wrote a paper for the Air Force entitled “A Short Primer for Extraterrestrial Linguistics”, and over the next several years wrote several papers on the subject for broader audiences. He said that “There are two questions involved in communication with Extraterrestrials. One is the mechanical issue of discovering a mutually acceptable channel. The other is the more philosophical problem (semantic, ethic, and metaphysical) of the proper subject matter for discourse. In simpler terms, we first require a common language, and then we must think of something clever to say.” He continued, with a touch of his characteristic humor: “Naturally, we must not risk telling too much until we know whether the Extraterrestrials’ intentions toward us are honorable. The Government will undoubtedly set up a Cosmic Intelligence Agency (CIA) to monitor Extraterrestrial Intelligence. Extreme security precautions will be strictly observed. As H. G. Wells once pointed out [or was it an episode of *The Twilight Zone*?], even if the Aliens tell us in all truthfulness that their only intention is ‘to serve mankind,’ we must endeavor to ascertain whether they wish to serve us baked or fried.”

While at JPL, Sol had also been teaching some classes at the nearby universities: Caltech, USC and UCLA. In the fall of 1962, following some changes at JPL—and perhaps because he wanted to spend more time with his young children—he decided to become a full-time professor. He got offers from all three schools. He wanted to go somewhere where he could “make a difference”. He was told that at Caltech “no one has any influence if they don’t at least have a Nobel Prize”, while at UCLA “the UC bureaucracy is such that no one ever has any ability to affect anything”. The result was that—despite its much-inferior reputation at the time—Sol chose USC. He went there in the spring of 1963 as a Professor of Electrical Engineering—and ended up staying for 53 years.

Before going on with the story of Sol’s life, I should explain what a linear-feedback shift register (LFSR) actually is. The basic idea is simple. Imagine a row of squares, each containing either 1 or 0 (say, black or white). In a pure shift register all that happens is that at each step all values shift one position to the left. The leftmost value is lost, and a new value is “shifted in” from the right. The idea of a feedback shift register is that the value that’s shifted in is determined (or “fed back”) from values at other positions in the shift register. In a linear-feedback shift register, the values from “taps” at particular positions in the register are combined by being added mod 2 (so that 1⊕1=0 instead of 2), or equivalently XOR’ed (“exclusive or”, true if either is true, but not both).
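A minimal Python sketch of the step just described follows. The 4-square register, the taps at 0-based positions 0 and 1, and the starting state are all hypothetical choices for illustration, not the register in the post’s picture.

```python
from typing import List, Sequence


def lfsr_step(bits: List[int], taps: Sequence[int]) -> List[int]:
    """One step of a linear-feedback shift register.

    bits: register contents, index 0 being the leftmost square.
    taps: 0-based indices of the squares whose values are fed back.
    Every value moves one square to the left (the leftmost is lost), and the
    value shifted in on the right is the XOR (sum mod 2) of the tapped values,
    read before the shift.
    """
    new_bit = 0
    for t in taps:
        new_bit ^= bits[t]
    return bits[1:] + [new_bit]


state = [0, 0, 0, 1]
for _ in range(6):
    print("".join(map(str, state)))
    state = lfsr_step(state, (0, 1))
```

This prints 0001, 0010, 0100, 1001, 0011 and 0110. Continuing the loop, this particular choice of taps runs through all 15 nonzero states before repeating.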

If one runs this for a while, here’s what happens:

Obviously the shift register is always shifting bits to the left. And it has a very simple rule for how bits should be added at the right. But if one looks at the sequence of these bits, it seems rather random—though, as the picture shows, it does eventually repeat. What Sol Golomb did was to find an elegant mathematical way to analyze such sequences, and how they repeat.

If a shift register has size *n*, then it has $2^n$ possible states altogether (corresponding to all possible sequences of 0s and 1s of length *n*). Since the rules for the shift register are deterministic, any given state must always go to the same next state. And that means the maximum possible number of steps the shift register could conceivably go through before it repeats is $2^n$ (actually, it’s $2^n - 1$, because the state with all 0s can’t evolve into anything else).

In the example above, the shift register is of size 7, and it turns out to repeat after exactly $2^7 - 1 = 127$ steps. But which shift registers—with which particular arrangements of taps—will produce sequences with maximal lengths? This is the first question Sol Golomb set out to investigate in the summer of 1954. His answer was simple and elegant.
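Both facts can be checked by brute force for small registers. In the Python sketch below, the index convention is an assumption: index 0 is the leftmost square and taps are 0-based. Under that convention, taps at positions $t$ in an *n*-square register correspond to the feedback polynomial $x^n + \sum_t x^t$. So taps (0, 6) on 7 squares give $x^7 + x^6 + 1$, though the post’s own numbering of positions may differ. The search simply tries every tap set and keeps those whose cycle contains all $2^n - 1$ nonzero states.

```python
from itertools import combinations
from typing import Sequence, Tuple


def lfsr_step(state: Tuple[int, ...], taps: Sequence[int]) -> Tuple[int, ...]:
    """Shift left; the new rightmost bit is the XOR of the tapped (0-based) positions."""
    new_bit = 0
    for t in taps:
        new_bit ^= state[t]
    return state[1:] + (new_bit,)


def cycle_length(state: Tuple[int, ...], taps: Sequence[int]) -> int:
    """Number of steps in the cycle eventually reached from `state`."""
    first_seen = {}
    step = 0
    while state not in first_seen:
        first_seen[state] = step
        state = lfsr_step(state, taps)
        step += 1
    return step - first_seen[state]


def maximal_tap_sets(n: int):
    """Tap sets for an n-square register whose sequences have period 2**n - 1."""
    start = (0,) * (n - 1) + (1,)
    return [
        taps
        for size in range(1, n + 1)
        for taps in combinations(range(n), size)
        if cycle_length(start, taps) == (1 << n) - 1
    ]


print(cycle_length((0,) * 6 + (1,), (0, 6)))  # size 7, taps giving x^7 + x^6 + 1
print(cycle_length((0,) * 7, (0, 6)))         # the all-0s state just stays put
print(maximal_tap_sets(4))
```

For 4 squares only the tap sets (0, 1) and (0, 3), corresponding to $x^4 + x + 1$ and $x^4 + x^3 + 1$, reach the maximal period of 15. Exhaustive search like this scales as $2^n$ tap sets times up to $2^n$ steps, which is why an algebraic criterion matters.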

The shift register above has taps at positions 7, 6 and 1. Sol represented this algebraically, using the polynomial $x^7 + x^6 + 1$. Then what he showed was that the sequence that would be generated would be of maximal length if this polynomial is “irreducible modulo 2”, so that it can’t be factored, making it sort of the analog of a prime among polynomials—as well as having some other properties that make it a so-called “primitive polynomial”. Nowadays, with Mathematica and the Wolfram Language, it’s easy to test things like this:
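For readers without Mathematica, here is a rough Python equivalent of such a test. It is a sketch, not the Wolfram Language function. It relies on a standard characterization: a degree-*n* polynomial over GF(2) with constant term 1 is primitive exactly when the smallest $e > 0$ with $x^e \equiv 1 \pmod{p}$ is $2^n - 1$. Polynomials are stored as integer bitmasks (bit *k* is the coefficient of $x^k$), and the helper names are hypothetical.

```python
def x_order_mod2(p: int) -> int:
    """Smallest e > 0 with x**e == 1 modulo p over GF(2); p is a bitmask."""
    n = p.bit_length() - 1
    if n < 1 or p & 1 == 0:
        raise ValueError("need degree >= 1 and constant term 1")
    r, e = 1, 0
    while True:
        r <<= 1             # multiply by x
        if (r >> n) & 1:    # degree reached n: reduce by subtracting (XOR-ing) p
            r ^= p
        e += 1
        if r == 1:
            return e


def is_primitive_mod2(p: int) -> bool:
    n = p.bit_length() - 1
    return x_order_mod2(p) == (1 << n) - 1


def primitive_polynomials(n: int):
    """All primitive polynomials of degree n over GF(2), as bitmasks."""
    return [p for p in range(1 << n, 1 << (n + 1)) if p & 1 and is_primitive_mod2(p)]


def poly_str(p: int) -> str:
    terms = []
    for k in range(p.bit_length() - 1, -1, -1):
        if (p >> k) & 1:
            terms.append("1" if k == 0 else "x" if k == 1 else f"x^{k}")
    return " + ".join(terms)


print(poly_str(0b11000001), is_primitive_mod2(0b11000001))
print([poly_str(p) for p in primitive_polynomials(4)])
print(len(primitive_polynomials(7)))
```

The same loop, run over all candidates of a given degree, is essentially a mechanized version of the kind of table of primitive polynomials that once had to be worked out by hand. For degree 7 it finds 18 of them.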

Back in 1954, Sol had to do all this by hand, but came up with a fairly long table of primitive polynomials corresponding to shift registers that give maximal length sequences:

The idea of maintaining short-term memory by having “delay lines” that circulate digital pulses (say in an actual column of mercury) goes back to the earliest days of electronic computers. By the late 1940s such delay lines were routinely being implemented purely digitally, using sequences of vacuum tubes, and were being called “shift registers”. It’s not clear when the first feedback shift registers were built. Perhaps it was at the end of the 1940s. But it’s still shrouded in mystery—because the first place they seem to have been used was in military cryptography.

The basic idea of cryptography is to take meaningful messages, and then randomize them so they can’t be recognized, but in such a way that the randomization can always be reversed if you know the key that was used to create it. So-called stream ciphers work by generating long sequences of seemingly random bits, then combining these with some representation of the message—then decoding by having the receiver independently generate the same sequence of seemingly random bits, and “backing this out” of the encoded message received.
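A minimal sketch of that stream-cipher idea follows, using a linear-feedback shift register as the keystream generator. It is for illustration only, and as the following paragraph explains, it is not secure. The seed playing the role of the key, the 7-bit register with taps at 0-based positions 0 and 6, and the function names are all hypothetical choices.

```python
def keystream(seed, taps, count):
    """`count` output bits of an LFSR started from `seed` (0-based taps)."""
    bits = list(seed)
    out = []
    for _ in range(count):
        out.append(bits[0])
        new_bit = 0
        for t in taps:
            new_bit ^= bits[t]
        bits = bits[1:] + [new_bit]
    return out


def to_bits(data: bytes):
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]


def from_bits(bits):
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def xor_cipher(data: bytes, seed, taps) -> bytes:
    """XOR the message bits with the keystream. Applying it twice undoes it."""
    message_bits = to_bits(data)
    stream = keystream(seed, taps, len(message_bits))
    return from_bits([m ^ k for m, k in zip(message_bits, stream)])


key = (0, 0, 0, 0, 0, 0, 1)
ciphertext = xor_cipher(b"ATTACK AT DAWN", key, (0, 6))
print(ciphertext, xor_cipher(ciphertext, key, (0, 6)))
```

Encryption and decryption are the same operation here, because XOR-ing the same keystream bit twice cancels it out.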

Linear-feedback shift registers seem at first to have been prized for cryptography because of their long repetition periods. As it turns out, the mathematical analysis Sol used to find things like these periods also makes clear that such shift registers aren’t good for secure cryptography. But in the early days, they seemed pretty good—particularly compared to, say, successive rotor positions in an Enigma machine—and there’s been a persistent rumor that, for example, Soviet military cryptosystems were long based on them.
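One way to see the weakness: suppose an eavesdropper knows 2*n* consecutive output bits of an *n*-bit register, for example from a guessed stretch of plaintext. Then the unknown feedback taps satisfy *n* linear equations mod 2, which ordinary Gaussian elimination solves. The Berlekamp–Massey algorithm is the standard, more efficient route to the same result. The sketch below assumes the register length is known, and its names and parameters are hypothetical.

```python
def lfsr_bits(state, taps, count):
    """Output bits s_0, s_1, ... with s_(k+n) = XOR of s_(k+t) for t in taps."""
    bits = list(state)
    out = []
    for _ in range(count):
        out.append(bits[0])
        new_bit = 0
        for t in taps:
            new_bit ^= bits[t]
        bits = bits[1:] + [new_bit]
    return out


def recover_taps(seq, n):
    """Solve sum_t c_t * s_(k+t) = s_(k+n) (mod 2), k = 0..n-1, for the taps c_t.
    Needs at least 2n consecutive bits of the sequence."""
    rows = [seq[k:k + n] + [seq[k + n]] for k in range(n)]  # augmented matrix
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col]), None)
        if pivot is None:
            raise ValueError("singular system; try a longer or different window")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[col])]
    return tuple(t for t in range(n) if rows[t][n])


secret = lfsr_bits([0, 0, 0, 0, 0, 0, 1], (0, 6), 200)  # the "unknown" keystream
taps = recover_taps(secret[:14], 7)                      # only 14 bits observed
predicted = lfsr_bits(secret[:7], taps, 200)             # regenerate everything else
print(taps, predicted == secret)
```

Fourteen observed bits are enough to regenerate the entire keystream, however long the repetition period is. That is why a long period alone says little about security.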

Back in 2001, when I was working on history notes for my book *A New Kind of Science*, I had a long phone conversation with Sol about shift registers. Sol told me that when he started out, he didn’t know anything about cryptographic work on shift registers. He said that people at Bell Labs, Lincoln Labs and JPL had also started working on shift registers around the same time he did—though perhaps through knowing more pure mathematics, he managed to get further than they did, and in the end his 1955 report basically defined the field.

Over the years that followed, Sol gradually heard about various precursors of his work in the pure mathematical literature. Way back in the year 1202 Fibonacci was already talking about what are now called Fibonacci numbers—and which are generated by a recurrence relation that can be thought of as an analog of a linear-feedback shift register, but working with arbitrary integers rather than 0s and 1s. There was a little work on recurrences with 0s and 1s done in the early 1900s, but the first large-scale study seems to have been by Øystein Ore, who, curiously, came from the University of Oslo, though was by then at Yale. Ore had a student named Marshall Hall—who Sol told me he knew had consulted for the predecessor of the National Security Agency in the late 1940s—possibly about shift registers. But whatever he may have done was kept secret, and so it fell to Sol to discover and publish the story of linear-feedback shift registers—even though Sol did dedicate his 1967 book on shift registers to Marshall Hall.
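To see the analogy, compare the Fibonacci recurrence on integers with the same recurrence taken mod 2, where it becomes a two-bit shift register whose new bit is the XOR of both old bits (characteristic polynomial *x*^{2}+*x*+1). The mod-2 version repeats with period 3 = 2^{2}–1, the maximum possible for two bits. A small illustrative sketch, with hypothetical function names:

```python
def fibonacci(count):
    """Ordinary Fibonacci numbers, using arbitrary integers."""
    a, b = 0, 1
    out = []
    for _ in range(count):
        out.append(a)
        a, b = b, a + b
    return out


def fibonacci_mod2(count):
    """The same recurrence with 0s and 1s: a 2-bit shift register
    whose feedback is the XOR of both bits."""
    a, b = 0, 1
    out = []
    for _ in range(count):
        out.append(a)
        a, b = b, a ^ b
    return out


print(fibonacci(9))       # [0, 1, 1, 2, 3, 5, 8, 13, 21]
print(fibonacci_mod2(9))  # [0, 1, 1, 0, 1, 1, 0, 1, 1]
```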

Over the years I’ve noticed the principle that systems defined by sufficiently simple rules always eventually end up having lots of applications. Shift registers follow this principle in spades. And for example modern hardware (and software) systems are bristling with shift registers: a typical cellphone probably has a dozen or two, implemented usually in hardware but sometimes in software. (When I say “shift register” here, I mean linear-feedback shift register, or LFSR.)

Most of the time, the shift registers that are used are ones that give maximum-length sequences (otherwise known as “m-sequences”). And the reasons they’re used are typically related to some very special properties that Sol discovered about them. One basic property they always have is that they contain the same total number of 0s and 1s (actually, there’s always exactly one extra 1). Sol then showed that they also have the same number of 00s, 01s, 10s and 11s—and the same holds for larger blocks too. This “balance” property is on its own already very useful, for example if one’s trying to efficiently test all possible bit patterns as input to a circuit.
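Here is a minimal sketch for checking the balance property on one full period of the register *x*^{7}+*x*^{6}+1 (the recurrence convention and the helper names are assumptions). Counting cyclically, every nonzero block of length *w* ≤ 7 shows up exactly 2^{7–w} times and the all-zeros block one time fewer, which is the precise form of “exactly one extra 1”:

```python
from collections import Counter


def m_sequence(n=7, lower=(6, 0), seed=1):
    """One full period (2^n - 1 bits) of the LFSR a[k+n] = XOR of a[k+e] for e in `lower`.
    The defaults correspond to x^7 + x^6 + 1."""
    state, bits = seed, []
    for _ in range(2**n - 1):
        bits.append(state & 1)               # output a[k]
        feedback = 0
        for e in lower:
            feedback ^= (state >> e) & 1
        state = (state >> 1) | (feedback << (n - 1))
    return bits


def block_counts(bits, width):
    """How often each length-`width` block occurs, reading the sequence cyclically."""
    N = len(bits)
    return Counter(
        "".join(str(bits[(i + j) % N]) for j in range(width)) for i in range(N)
    )


seq = m_sequence()
print(dict(sorted(block_counts(seq, 1).items())))  # {'0': 63, '1': 64}
print(dict(sorted(block_counts(seq, 2).items())))  # {'00': 31, '01': 32, '10': 32, '11': 32}
```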

But Sol discovered another, even more important property. Replace each 0 in a sequence by –1, then imagine multiplying each element in a cyclically shifted version of the sequence by the corresponding element in the original. What Sol showed is that if one adds up these products, they’ll always sum to –1 (as close to zero as an odd-length sequence allows), except when there’s no shift at all, when they sum to the full length of the sequence. Said more technically, he showed that the sequence has essentially no correlation with shifted versions of itself.
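A quick numerical check of the correlation property, as a sketch using one period of the register *x*^{7}+*x*^{6}+1 under the same assumed recurrence convention: map 0 to –1, then sum the products against every cyclic shift. The function names are hypothetical.

```python
def m_sequence(n=7, lower=(6, 0), seed=1):
    """One full period of the LFSR a[k+n] = XOR of a[k+e] for e in `lower`."""
    state, bits = seed, []
    for _ in range(2**n - 1):
        bits.append(state & 1)
        feedback = 0
        for e in lower:
            feedback ^= (state >> e) & 1
        state = (state >> 1) | (feedback << (n - 1))
    return bits


def periodic_autocorrelation(bits):
    """Sum of products of the +1/-1 sequence with each cyclic shift of itself."""
    signs = [1 if b else -1 for b in bits]   # replace each 0 by -1
    N = len(signs)
    return [sum(signs[i] * signs[(i + shift) % N] for i in range(N)) for shift in range(N)]


corr = periodic_autocorrelation(m_sequence())
print(corr[0])        # 127: with no shift, every product is +1
print(set(corr[1:]))  # {-1}: every nonzero shift gives exactly -1
```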

Both this and the balance property will be approximately true for any sufficiently long random sequence of 0s and 1s. But the surprising thing about maximum-length shift register sequences is that these properties are always exactly true. The sequences in a sense have some of the signatures of randomness—but in a very perfect way, made possible by the fact that they’re not random at all, but instead have a very definite, organized structure.

It’s this structure that makes linear-feedback shift registers ultimately not suitable for strong cryptography. But they’re great for basic “scrambling” and “cheap cryptography”—and they’re used all over the place for these purposes. A very common objective is just to “whiten” (as in “white noise”) a signal. It’s pretty common to want to transmit data that’s got long sequences of 0s in it. But the electronics that pick these up can get confused if they see what amounts to “silence” for too long. One can avoid the problem by scrambling the original data by combining it with a shift register sequence, so there’s always some kind of “chattering” going on. And that’s indeed what’s done in Wi-Fi, Bluetooth, USB, digital TV, Ethernet and lots of other places.
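A minimal sketch of additive scrambling, assuming a 7-bit register (*x*^{7}+*x*^{6}+1) started from all 1s; real standards specify their own polynomials and seeds, which are not reproduced here, and the function names are hypothetical. A long stretch of 0s goes in, and what comes out never repeats a bit more than 7 times in a row, yet XORing with the same sequence again restores the original:

```python
def scrambler_bits(seed=0b1111111, n=7, lower=(6, 0)):
    """Endless LFSR output used as the scrambling sequence (x^7 + x^6 + 1)."""
    state = seed
    while True:
        yield state & 1
        feedback = 0
        for e in lower:
            feedback ^= (state >> e) & 1
        state = (state >> 1) | (feedback << (n - 1))


def scramble(bits, seed=0b1111111):
    """Additive scrambling: XOR the data with the LFSR sequence.
    Applying it twice with the same seed restores the data."""
    return [b ^ s for b, s in zip(bits, scrambler_bits(seed))]


def longest_run(bits):
    """Length of the longest stretch of identical consecutive bits."""
    if not bits:
        return 0
    best = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


silence = [0] * 120                  # a long stretch of nothing but 0s
line_signal = scramble(silence)
print(longest_run(silence), longest_run(line_signal))  # 120, then something no larger than 7
print(scramble(line_signal) == silence)                # True
```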

It’s often a nice side effect that the shift register scrambling makes the signal harder to decode—and this is sometimes used to provide at least some level of security. (DVDs use a combination of a size-17 and a size-25 shift register to attempt to encode their data; many GSM phones use a combination of three shift registers to encode all their signals, in a way that was at first secret.)

GPS makes crucial use of shift register sequences too. Each GPS satellite continuously transmits a shift register sequence (from a size-10 shift register, as it happens). A receiver can tell at exactly what time a signal it’s just received was transmitted from a particular satellite by seeing what part of the sequence it got. And by comparing delay times from different satellites, the receiver can triangulate its position. (There’s also a precision mode of GPS, that uses a size-1024 shift register.)
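The timing idea can be sketched by sliding a received snippet along the known sequence and scoring agreements. This is a simplification: real GPS C/A codes are Gold codes made by combining two 10-bit registers, whereas this sketch uses a single 10-bit register with the primitive polynomial *x*^{10}+*x*^{3}+1 (period 1023) and ignores Doppler shift, carrier tracking and everything else a real receiver does. `find_code_phase` is a hypothetical name.

```python
def m_sequence(n, lower, seed=1):
    """One full period of the LFSR a[k+n] = XOR of a[k+e] for e in `lower`."""
    state, bits = seed, []
    for _ in range(2**n - 1):
        bits.append(state & 1)
        feedback = 0
        for e in lower:
            feedback ^= (state >> e) & 1
        state = (state >> 1) | (feedback << (n - 1))
    return bits


# A 10-bit register with polynomial x^10 + x^3 + 1 repeats every 1023 chips.
CODE = m_sequence(10, (3, 0))


def find_code_phase(chunk, code=CODE):
    """Return the offset into `code` where `chunk` matches best.

    Each candidate offset scores +1 for every agreeing chip and -1 for
    every disagreeing one; the true offset stands out sharply.
    """
    N = len(code)
    scores = [
        sum(1 if c == code[(offset + i) % N] else -1 for i, c in enumerate(chunk))
        for offset in range(N)
    ]
    return max(range(N), key=scores.__getitem__)


true_offset = 517
received = [CODE[(true_offset + i) % len(CODE)] for i in range(100)]
for i in (5, 42, 77):              # a few chips corrupted by noise
    received[i] ^= 1
print(find_code_phase(received))   # 517
```

Knowing the offset tells the receiver how far into its repeating sequence the satellite was when that snippet left it, which is the timing information the position calculation needs.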

A quite different use of shift registers is for error detection. Say one’s transmitting a block of bits, but each one has a small probability of error. A simple way to let one check for a single error is to include a “parity bit” that says whether there should be an odd or even number of 1s in the block of bits. There are generalizations of this called CRCs (cyclic redundancy checks) that can check for a larger number of errors—and that are computed essentially by feeding one’s data into none other than a linear-feedback shift register. (There are also error-correcting codes that let one not only detect but also correct a certain number of errors, and some of these, too, can be computed with shift register sequences—and in fact Sol Golomb used a version of these called Reed–Muller codes to design the video encoding for Mars spacecraft.)
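As a sketch of a CRC being “fed through” a shift register, here is the common CRC-32 computed one bit at a time and checked against Python’s built-in `zlib.crc32`. The reflected constant 0xEDB88320 and the initial and final XOR with 0xFFFFFFFF are the parameters of that particular CRC-32; other CRCs use different polynomials and conventions. A one-bit parity check is included for comparison, and the function names are hypothetical.

```python
import zlib

# The CRC-32 generator polynomial 0x04C11DB7 with its bits mirrored,
# matching the bit order zlib.crc32 uses.
POLY = 0xEDB88320


def parity_bit(data: bytes) -> int:
    """1 if the data contains an odd number of 1 bits, else 0."""
    return sum(bin(byte).count("1") for byte in data) % 2


def crc32_bitwise(data: bytes) -> int:
    """Feed the data one bit at a time through a 32-bit linear-feedback shift register."""
    register = 0xFFFFFFFF
    for byte in data:
        register ^= byte
        for _ in range(8):
            if register & 1:          # the bit shifted out is 1: apply the feedback taps
                register = (register >> 1) ^ POLY
            else:
                register >>= 1
    return register ^ 0xFFFFFFFF


message = b"123456789"
print(hex(crc32_bitwise(message)))                    # 0xcbf43926
print(crc32_bitwise(message) == zlib.crc32(message))  # True
```

Flipping any single bit of the message changes the CRC, just as it changes the parity bit; unlike parity, the CRC also catches many multi-bit errors.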

The list of uses for shift register sequences goes on and on. A fairly exotic example—more popular in the past than now—was to use shift register sequences to jitter the clock in a computer to spread out the frequency at which the CPU would potentially generate radio interference (“select Enable Spread Spectrum in the BIOS”).

One of the single most prominent uses of shift register sequences is in cellphones, for what’s called CDMA (code division multiple access). Cellphones got their name because they operate in “cells”, with all phones in a given cell being connected to a particular tower. But how do different cellphones in a cell not interfere with each other? In the first systems, each phone just negotiated with the tower to use a slightly different frequency. Later, they used different time slices (TDMA, or time division multiple access). But CDMA uses maximum-length shift register sequences to provide a clever alternative.

The idea is to have all phones essentially operate on the same frequency, but to have each phone encode its signal using (in the simplest case) a differently shifted version of a shift register sequence. And because of Sol’s mathematical results, these differently shifted versions have no correlation—so the cellphone signals don’t interfere. And this is how, for example, most 3G cellphone networks operate.
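A toy sketch of the CDMA idea, assuming perfectly synchronized users, no noise, and a shared 127-chip m-sequence (*x*^{7}+*x*^{6}+1) with user codes that are cyclic shifts of it by 0, 40 and 90 chips (arbitrary choices; the function names are hypothetical). Each bit is sent as +1 or –1 times the user’s code, the signals simply add, and each receiver correlates with its own shift. Because each cross term is only –1 against 127, the other users barely register:

```python
def m_sequence(n=7, lower=(6, 0), seed=1):
    """One full period of the LFSR a[k+n] = XOR of a[k+e] for e in `lower`."""
    state, bits = seed, []
    for _ in range(2**n - 1):
        bits.append(state & 1)
        feedback = 0
        for e in lower:
            feedback ^= (state >> e) & 1
        state = (state >> 1) | (feedback << (n - 1))
    return bits


CHIPS = [1 if b else -1 for b in m_sequence()]   # 127 chips of +1/-1
N = len(CHIPS)


def spreading_code(shift):
    """A user's code: the shared m-sequence, cyclically shifted by `shift` chips."""
    return CHIPS[shift:] + CHIPS[:shift]


def transmit(bits_by_user, shifts):
    """Everyone sends at once on one channel: the chip streams simply add up."""
    n_bits = len(bits_by_user[0])
    signal = [0] * (n_bits * N)
    for bits, shift in zip(bits_by_user, shifts):
        code = spreading_code(shift)
        for k, bit in enumerate(bits):
            symbol = 1 if bit else -1
            for i in range(N):
                signal[k * N + i] += symbol * code[i]
    return signal


def receive(signal, shift):
    """Recover one user's bits by correlating each block with that user's code."""
    code = spreading_code(shift)
    return [
        int(sum(signal[k * N + i] * code[i] for i in range(N)) > 0)
        for k in range(len(signal) // N)
    ]


alice, bob = [1, 0, 1, 1, 0], [0, 0, 1, 0, 1]
shared = transmit([alice, bob], shifts=[0, 40])
print(receive(shared, 0) == alice, receive(shared, 40) == bob)  # True True
```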

Sol created the mathematics for this, but he also brought some of the key people together. Back in 1959, he’d gotten to know a certain Irwin Jacobs, who’d recently gotten a PhD at MIT. Meanwhile, he knew Andy Viterbi, who worked at JPL. Sol introduced the two of them—and by 1968 they’d formed a company called Linkabit which did work on coding systems, mostly for the military.

Linkabit had many spinoffs and descendants, and in 1985 Jacobs and Viterbi started a new company called Qualcomm. It didn’t immediately do especially well, but by the early 1990s it began a meteoric rise when it started making the components to deploy CDMA in cellphones—and in 1999 Sol became the “Viterbi Professor of Communications” at USC.

It’s sort of amazing that—although most people have never heard of them—shift register sequences are actually used in one way or another almost whenever bits are moved around in modern communication systems, computers and elsewhere. It’s quite confusing sometimes, because there are lots of things with different names and acronyms that all turn out to be linear-feedback shift register sequences (PN, pseudonoise, M-, FSR, LFSR sequences, spread spectrum communications, MLS, SRS, PRBS, …).

If one looks at cellphones, shift register sequence usage has gone up and down over the years. 2G networks are based on TDMA, so don’t use shift register sequences to encode their data—but still often use CRCs to validate blocks of data. 3G networks are big users of CDMA—so there are shift register sequences involved in pretty much every bit that’s transmitted. 4G networks typically use a combination of time and frequency slots which don’t directly involve shift register sequences—though there are still CRCs used, for example to deal with data integrity when frequency windows overlap. 5G is designed to be more elaborate—with large arrays of antennas dynamically adapting to use optimal time and frequency slots. But half their channels are typically allocated to “pilot signals” that are used to infer the local radio environment—and work by transmitting none other than shift register sequences.

Throughout most kinds of electronics it’s common to want to use the highest data rates and the lowest powers that still get bits transmitted correctly above the “noise floor”. And typically the way one pushes to the edge is to do automatic error detection—using CRCs and therefore shift register sequences. And in fact pretty much every kind of bus (PCIe, SATA, etc.) inside a computer does this: whether it’s connecting parts of CPUs, getting data off devices, or connecting to a display with HDMI. And on disks and in memory, for example, CRCs and other shift-register-sequence-based codes are pretty much universally used to operate at the highest possible rates and densities.

Shift registers are so ubiquitous, it’s a little difficult to estimate just how many of them are in use, and how many bits are being generated by them. There are perhaps 10 billion computers, slightly fewer cellphones, and an increasing number of billions of embedded and IoT (“Internet of Things”) devices. (Even many of the billion cars in the world, for example, have at least 10 microprocessors in them.)

At what rate are the shift registers running? Here, again, things are complicated. In communications systems, for example, there’s a basic carrier frequency—usually in the GHz range—and then there’s what’s called a “chipping rate” (or, confusingly, “chip rate”) that says how fast something like CDMA is done, and this is usually in the MHz range. On the other hand, in buses inside computers, or in connections to a display, all the data is going through shift registers, at the full data rate, which is well into the GHz range.

So it seems safe to estimate that there are at least 10 billion communications links, running for at least a tenth of a billion seconds (which is about 3 years), that use at least 1 billion bits from a shift register every second—meaning that to date Sol’s algorithm has been used at least an octillion times.
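Spelling out that estimate, with an octillion taken as 10^{27} (the short-scale meaning):

```python
links = 10**10            # communications links
seconds = 10**9 // 10     # a tenth of a billion seconds
bits_per_second = 10**9   # shift-register bits per link per second

total_bits = links * seconds * bits_per_second
years = seconds / (365.25 * 24 * 60 * 60)

print(f"{total_bits:.0e} bits over {years:.1f} years")  # 1e+27 bits over 3.2 years
```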

Is it really the most-used mathematical algorithm idea in history? I think so. I suspect the main potential competition would be from arithmetic operations. These days processors are doing perhaps a trillion arithmetic operations per second—and such operations are needed for pretty much every bit that’s generated by a computer. But how is arithmetic done? At some level it’s just a digital electronics implementation of the way people have done arithmetic forever.

But there are some wrinkles—some “algorithmic ideas”—though they’re quite obscure, except to microprocessor designers. Just as when Babbage was making his Difference Engine, carries are a big nuisance in doing arithmetic. (One can actually think of a linear-feedback shift register as being a system that does something like arithmetic, but doesn’t do carries.) There are “carry propagation trees” that optimize carrying. There are also little tricks (“Booth encoding”, “Wallace trees”, etc.) that reduce the number of bit operations needed to do the innards of arithmetic. But unlike with LFSRs, there doesn’t seem to be one algorithmic idea that’s universally used—and so I think it’s still likely that Sol’s maximum-length LFSR sequence idea is the winner for most used.
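To illustrate the “arithmetic without carries” remark, here is a sketch of carry-less multiplication together with one step of a Galois-form LFSR, which amounts to multiplying the state by *x* and reducing modulo the feedback polynomial. Both use XOR where ordinary arithmetic would add and carry. The function names are hypothetical, and *x*^{7}+*x*^{6}+1 is reused as the example polynomial.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product: long multiplication of bit patterns in which the
    shifted partial products are combined with XOR instead of addition,
    so no carries ever propagate (polynomial multiplication mod 2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def galois_lfsr_step(state: int, poly: int, n: int) -> int:
    """One step of a Galois-form LFSR: multiply the state by x, then reduce
    modulo the degree-n polynomial `poly`, again using XOR rather than carries."""
    state <<= 1
    if (state >> n) & 1:
        state ^= poly
    return state


print(3 * 3, clmul(3, 3))  # 9 5, since (x+1)^2 = x^2 + 1 when nothing carries

poly = 0b11000001          # x^7 + x^6 + 1
state, steps = 1, 0
while True:
    state = galois_lfsr_step(state, poly, 7)
    steps += 1
    if state == 1:
        break
print(steps)               # 127: repeated multiplication by x visits every nonzero state
```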

Even though it’s not obvious at first, it turns out there’s a very close relationship between feedback shift registers and something I’ve spent many years studying: cellular automata. The basic setup for a feedback shift register involves computing one bit at a time. In a cellular automaton, one has a line of cells, and at each step all the cells are updated in parallel, based on a rule that depends, say, on the values of their nearest neighbors.

To see how these are related, think about running a feedback shift register of size *n*, but displaying its state only every *n* steps—in other words, letting all the bits be rewritten before one displays again. If one displays every step of a linear-feedback shift register (here with two taps next to each other), as in the first two panels below, nothing much happens at each step, except that things shift to the left. But if one makes a compressed picture, showing only every *n* steps, suddenly a pattern emerges.

It’s a nested pattern, and it’s very close to being the exact same pattern that one gets with a cellular automaton that takes a cell and its neighbor, and adds them mod 2 (or XORs them). Here’s what happens with that cellular automaton, if one arranges its cells so they’re in a circle of the same size as the shift register above:

At the beginning, the cellular automaton and shift register patterns are exactly the same—though when they “hit the edge” they become slightly different because the edges are handled differently. But looking at these pictures it becomes less surprising that the math of shift registers should be relevant to cellular automata. And seeing the regularity of the nested patterns makes it clearer why there might be an elegant mathematical theory of shift registers in the first place.
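The correspondence can be checked directly. This sketch assumes a register whose feedback is the XOR of its two leftmost bits (so the taps are adjacent), records its state only every *n* steps, and compares the result with the XOR-with-right-neighbor cellular automaton on a circle of *n* cells. The function names and the choice of which end is “left” are assumptions. Starting from a single 1 at the right-hand end, the two agree row for row until the pattern wraps around, and then they differ:

```python
def lfsr_every_n(n, rows, init):
    """Run a shift register whose feedback XORs its two leftmost bits
    (two adjacent taps), recording its state only once every n steps."""
    state = list(init)
    out = [state[:]]
    for _ in range(rows - 1):
        for _ in range(n):
            feedback = state[0] ^ state[1]
            state = state[1:] + [feedback]   # everything shifts left
        out.append(state[:])
    return out


def xor_ca(n, rows, init):
    """Cellular automaton on a circle of n cells: each cell becomes
    itself XOR its right-hand neighbor."""
    row = list(init)
    out = [row[:]]
    for _ in range(rows - 1):
        row = [row[i] ^ row[(i + 1) % n] for i in range(n)]
        out.append(row[:])
    return out


def show(rows):
    for row in rows:
        print("".join("#" if b else "." for b in row))


n = 31
init = [0] * (n - 1) + [1]        # a single 1 at the right-hand end
register, automaton = lfsr_every_n(n, n, init), xor_ca(n, n, init)
show(register)
print(register[:n - 1] == automaton[:n - 1])  # True: identical before the edge matters
print(register[n - 1] == automaton[n - 1])    # False: the edges are handled differently
```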

Typical shift registers used in practice don’t tend to make such obviously regular patterns, though. Here are a few examples of shift registers that yield maximum-length sequences. When one’s doing math, like Sol did, it’s very much the same story as for the case of obvious nesting. But here the fact that the taps are far apart makes things get mixed up, leaving no obvious visual trace of nesting.

So how broad is the correspondence between shift registers and cellular automata? In cellular automata the rules for generating new values of cells can be anything one wants. In linear-feedback shift registers, however, they always have to be based on adding mod 2 (or XOR’ing). That’s what the “linear” part of “linear-feedback shift register” means. But it’s also in principle possible to have nonlinear-feedback shift registers (NFSRs) that use whatever rule one wants for combining values.

And in fact, once Sol had worked out his theory for linear-feedback shift registers, he started in on the nonlinear case. When he arrived at JPL in 1956 he got an actual lab, complete with racks of little electronic modules. Sol told me each module was about the size of a cigarette pack—and was built from a Bell Labs design to perform a particular logic operation (AND, OR, NOT, …). The modules could be strung together to implement whatever nonlinear-feedback shift register one wanted, and they ran pretty fast—producing about a million bits per second. (Sol told me that someone tried doing the same thing with a general-purpose computer—and what took 1 second with the custom hardware modules took 6 weeks on the general-purpose computer.)

When Sol had looked at linear-feedback shift registers, the first big thing he’d managed to understand was their repetition periods. And with nonlinear ones he put most of his effort into trying to understand the same thing. He collected all sorts of experimental data. He told me he even tested sequences of length 2^{45}—which must have taken a year. He made summaries, like the one below (notice the visualizations of sequences, shown as oscilloscope-like traces). But he never managed to come up with any kind of general theory as he had with linear-feedback shift registers.
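For small registers, the kind of measurement Sol did can be done exhaustively. This sketch uses as its feedback the 1959 function *p*+*r*+*s*+*qr*+*qs*+*rs* mod 2, under the assumption (not from the text) that *p*, *q*, *r*, *s* are read from bits 0 to 3, with *p* the bit being shifted out. Because *p* enters only as an XOR, each step is invertible, so every state lies on some cycle and the cycle lengths add up to 2^{n}. The function names are hypothetical.

```python
def sol_1959(p, q, r, s):
    """The feedback function p + r + s + qr + qs + rs (mod 2)."""
    return (p + r + s + q * r + q * s + r * s) % 2


def nfsr_step(state, n):
    # Hypothetical tap choice: p, q, r, s are bits 0, 1, 2, 3, where bit 0
    # is shifted out. Since p only enters as an XOR, the step is invertible.
    p, q, r, s = ((state >> t) & 1 for t in (0, 1, 2, 3))
    return (state >> 1) | (sol_1959(p, q, r, s) << (n - 1))


def cycle_lengths(n):
    """Split all 2^n states of an n-bit register (n >= 4) into the cycles it runs around."""
    unvisited = set(range(2**n))
    lengths = []
    while unvisited:
        start = unvisited.pop()
        state = nfsr_step(start, n)
        length = 1
        while state != start:
            unvisited.discard(state)
            state = nfsr_step(state, n)
            length += 1
        lengths.append(length)
    return sorted(lengths, reverse=True)


for n in range(4, 13):
    lengths = cycle_lengths(n)
    print(n, "longest cycle:", lengths[0], "of", 2**n, "states;", len(lengths), "cycles")
```

Unlike the linear case, there is no formula here; the only way this sketch learns the periods is by running the register, which is essentially what Sol’s hardware did at a far larger scale.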

It’s not surprising he couldn’t do it. Because when one looks at nonlinear-feedback shift registers, one’s effectively sampling the whole richness of the computational universe of possible simple programs. Back in the 1950s there were already theoretical results—mostly based on Turing’s ideas of universal computation—about what programs could in principle do. But I don’t think Sol or anyone else ever thought they would apply to the very simple—if nonlinear—functions in NFSRs.

And in the end it basically took until my work around 1981 for it to become clear just how complicated the behavior of even very simple programs could be. My all-time favorite example is rule 30—a cellular automaton in which the values of neighboring cells are combined using a function that can be represented as *p*+*q*+*r*+*qr* mod 2 (or *p* XOR (*q* OR *r*)). And, amazingly, Sol looked at nonlinear-feedback shift registers that were based on incredibly similar functions—like, in 1959, *p*+*r*+*s*+*qr*+*qs*+*rs* mod 2. Here’s what Sol’s function (which can be thought of as “rule 29070”), rule 30, and a couple of other similar rules look like in a shift register:
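The two descriptions of rule 30, and the name “rule 29070” for Sol’s function, can be checked by building truth tables. This sketch assumes the usual Wolfram numbering, in which each input pattern, read as a binary number with the leftmost input as the highest bit, gives the bit position of the output. `rule_number` and `evolve` are hypothetical helpers, and `evolve` runs on a small circle of cells rather than an unbounded line.

```python
from itertools import product


def rule_number(f, arity):
    """Wolfram-style rule number: the output for each input pattern goes
    at the bit position given by that pattern read as a binary number."""
    number = 0
    for bits in product((0, 1), repeat=arity):
        index = int("".join(map(str, bits)), 2)
        number |= f(*bits) << index
    return number


def rule30_polynomial(p, q, r):
    return (p + q + r + q * r) % 2


def rule30_logic(p, q, r):
    return p ^ (q | r)


def sol_1959(p, q, r, s):
    return (p + r + s + q * r + q * s + r * s) % 2


def evolve(rule, width, steps):
    """Print a 3-neighbor cellular automaton on a circle, from one black cell."""
    row = [0] * width
    row[width // 2] = 1
    for _ in range(steps):
        print("".join("#" if b else " " for b in row))
        row = [rule(row[i - 1], row[i], row[(i + 1) % width]) for i in range(width)]


print(rule_number(rule30_polynomial, 3), rule_number(rule30_logic, 3))  # 30 30
print(rule_number(sol_1959, 4))                                          # 29070
evolve(rule30_logic, 63, 30)
```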

And here’s what they look like as cellular automata, without being constrained to a fixed-size register:

Of course, Sol never made pictures like this (and it would, realistically, have been almost impossible to do so in the 1950s). Instead, he concentrated on a kind of aggregate feature: the overall repetition period.

Sol wondered whether nonlinear-feedback shift registers might make good sources of randomness. From what we now know about cellular automata, it’s clear they can. And for example the rule 30 cellular automaton is what we used to generate randomness for Mathematica for 25 years (though we recently retired it in favor of a more efficient rule that we found by searching trillions of possibilities).
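The textbook version of the idea is to read off the center column of rule 30 grown from a single black cell. This sketch does only that; it is not the generator Mathematica actually shipped, whose details aren’t given here, and `rule30_center_column` and `random_int` are hypothetical names.

```python
def rule30_center_column(count):
    """First `count` center-column bits of rule 30 grown from a single black cell."""
    width = 2 * count + 1          # the pattern can't reach the edges within `count` steps
    row = [0] * width
    row[count] = 1
    column = []
    for _ in range(count):
        column.append(row[count])
        # new cell = left XOR (center OR right); the two edge cells stay 0
        row = [0] + [row[i - 1] ^ (row[i] | row[i + 1]) for i in range(1, width - 1)] + [0]
    return column


def random_int(bits_needed, column):
    """Pack successive center-column bits into an integer."""
    value = 0
    for b in column[:bits_needed]:
        value = (value << 1) | b
    return value


column = rule30_center_column(200)
print("".join(map(str, column[:20])))
print(random_int(32, column))
```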

Sol didn’t talk about cryptography much—though I suspect he did quite a bit of government work on it. He did tell me though that in 1959 he’d found a “multi-dimensional correlation attack on nonlinear sequences”, though he said that at the time he “carefully avoided stating that the application was to cryptanalysis”. The fact is that cellular automata like rule 30 (and presumably also nonlinear-feedback shift registers) do seem to be good cryptosystems—though partly because of confusions about whether they’re somehow equivalent to linear-feedback shift registers (they’re not), they’ve never been used as much as they should.

Being a history enthusiast, I’ve tried over the past few decades to identify all precursors to my work on 1D cellular automata. 2D cellular automata had been studied a bit, but there was only quite theoretical work on the 1D case, together with a few specific investigations in the cryptography community (that I’ve never fully found out about). And in the end, of all the things I’ve seen, I think Sol Golomb’s nonlinear-feedback shift registers were in a sense closest to what I actually ended up doing a quarter century later.

Mention the name “Golomb” and some people will think of shift registers. But many more will think of polyominoes. Sol didn’t invent polyominoes—though he did invent the name. But what he did was to make systematic what had appeared only in isolated puzzles before.

The main question Sol was interested in was how and when collections of polyominoes can be arranged to tile particular (finite or infinite) regions. Sometimes it’s fairly obvious, but often it’s very tricky to figure out. Sol published his first paper on polyominoes in 1954, but what really launched polyominoes into the public consciousness was Martin Gardner’s 1957 Mathematical Games column on them in *Scientific American*. As Sol explained in the introduction to his 1964 book, the effect was that he acquired “a steady stream of correspondents from around the world and from every stratum of society—board chairmen of leading universities, residents of obscure monasteries, inmates of prominent penitentiaries…”

Game companies took notice too, and within months, for example, the “New Sensational Jinx Jigsaw Puzzle” had appeared—followed over the course of decades by a long sequence of other polyomino-based puzzles and games (no, the sinister bald guy doesn’t look anything like Sol):

Sol was still publishing papers about polyominoes 50 years after he first discussed them. In 1961 he introduced general subdividable “rep-tiles”, which, it later became clear, can make nested, fractal (“infin-tile”) patterns. But almost everything Sol did with polyominoes involved solving specific tiling problems with them.

For me, polyominoes are most interesting not for their specifics but for the examples they provide of more-general phenomena. One might have thought that given a few simple shapes it would be easy to decide whether they can tile the whole plane. But the example of polyominoes—with all the games and puzzles they support—makes it clear that it’s not necessarily so easy. And in fact it was proved in the 1960s that in general it’s a theoretically undecidable problem.

If one’s only interested in a finite region, then in principle one can just enumerate all conceivable arrangements of the original shapes, and see whether any of them correspond to successful tilings. But if one’s interested in the whole, infinite plane then one can’t do this. Maybe one will find a tiling of size one million, but there’s no guarantee how far the tiling can be extended.
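For small finite regions that enumeration is easy to write down. Here is a sketch by plain backtracking (an illustrative choice of method, not anything from Sol’s work): always cover the first empty cell, trying every rotation and reflection of every piece. The helper names are hypothetical, and the search grows exponentially, so it is only usable for tiny rectangles.

```python
def orientations(cells):
    """All distinct rotations and reflections of a polyomino given as (row, col)
    cells, each shifted so its smallest row and column are 0, cells sorted row by row."""
    shapes = set()
    pts = list(cells)
    for _ in range(4):
        pts = [(c, -r) for r, c in pts]                   # rotate by 90 degrees
        for shape in (pts, [(r, -c) for r, c in pts]):    # and its mirror image
            min_r = min(r for r, _ in shape)
            min_c = min(c for _, c in shape)
            shapes.add(tuple(sorted((r - min_r, c - min_c) for r, c in shape)))
    return sorted(shapes)


def count_tilings(height, width, pieces):
    """Count the ways to tile a height x width rectangle with copies of `pieces`."""
    shapes = sorted({s for piece in pieces for s in orientations(piece)})
    filled = [[False] * width for _ in range(height)]

    def first_empty():
        for r in range(height):
            for c in range(width):
                if not filled[r][c]:
                    return r, c
        return None

    def solve():
        spot = first_empty()
        if spot is None:
            return 1                  # every cell covered: one complete tiling
        r0, c0 = spot
        total = 0
        for shape in shapes:
            # The first empty cell must be the shape's first cell in row-by-row
            # order, since every earlier cell is already filled.
            ar, ac = shape[0]
            cells = [(r0 + r - ar, c0 + c - ac) for r, c in shape]
            if all(0 <= r < height and 0 <= c < width and not filled[r][c] for r, c in cells):
                for r, c in cells:
                    filled[r][c] = True
                total += solve()
                for r, c in cells:
                    filled[r][c] = False
        return total

    return solve()


domino = [(0, 0), (0, 1)]
l_tromino = [(0, 0), (1, 0), (1, 1)]
print(count_tilings(2, 3, [domino]), count_tilings(4, 4, [domino]))  # 3 36
print(count_tilings(2, 3, [l_tromino]))                              # 2
```

Nothing like this works for the infinite plane: the search only ever answers questions about one bounded region at a time.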

It turns out it can be like running a Turing machine—or a cellular automaton. You start from a line of tiles. Then the question of whether there’s an infinite tiling is equivalent to the question of whether there’s a setup for some Turing machine that makes it never halt. And the point then is that if the Turing machine is universal (so that it can in effect be programmed to do any possible computation) then the halting problem for it can be undecidable, which means that the tiling problem is also undecidable.

Of course, whether a tiling problem is undecidable depends on the original set of shapes. And for me an important question is how complicated the shapes have to be so that they can encode universal computation, and yield an undecidable tiling problem. Sol Golomb knew the literature on this kind of question, but wasn’t especially interested in it. But I find myself thinking about materials formed from polyominoes whose pattern of “crystallization” can in effect do an arbitrary computation, or occur at a “melting point” that seems “random” because its value is undecidable.

Complicated, carefully crafted sets of polyominoes are known that in effect support universal computation. But what’s the simplest set—and is it simple enough that one might run across it by accident? My guess is that—just like with other kinds of systems I’ve studied in the computational universe—the simplest set is in fact simple. But finding it is very difficult.

A considerably easier problem is to find polyominoes that successfully tile the plane, but can’t do so periodically. Roger Penrose (of Penrose tiles fame) found an example in 1994. My book *A New Kind of Science* gave a slightly simpler example with 3 polyominoes:

By the time Sol was in his early thirties, he’d established his two most notable pursuits—shift registers and polyominoes—and he’d settled into life as a university professor. He was constantly active, though. He wrote what ended up being a couple of hundred papers, some extending his earlier work, some stimulated by questions people would ask him, and some written, it seems, for the sheer pleasure of figuring out interesting things about numbers, sequences, cryptosystems, or whatever.

Shift registers and polyominoes are both big subjects (they even each have their own category in the AMS classification of mathematical publication topics). Both have had a certain injection of energy in the past decade or two as modern computer experiments started to be done on them—and Sol collaborated with people doing these. But both fields still have many unanswered questions. Even for linear-feedback shift registers there are bigger Hadamard matrices to be found. And very little is known even now about nonlinear-feedback shift registers. Not to mention all the issues about nonperiodic and otherwise exotic polyomino tilings.

Sol was always interested in puzzles, both with math and with words. For a while he wrote a puzzle column for the *Los Angeles Times*—and for 32 years he wrote “Golomb’s Gambits” for the Johns Hopkins alumni magazine. He participated in MegaIQ tests—earning himself a trip to the White House when he and its chief of staff happened to both score in the top five in the country.

He poured immense effort into his work at the university, not only teaching undergraduate courses and mentoring graduate students but also ascending the ranks of university administration (president of the faculty senate, vice provost for research, etc.)—and occasionally opining more generally about university governance (for example writing a paper entitled “Faculty Consulting: Should It Be Curtailed?”; answer: no, it’s good for the university!). At USC, he was particularly involved in recruiting—and over his time at USC he helped it ascend from a school essentially unknown in electrical engineering to one that makes it onto lists of top programs.

And then there was consulting. He was meticulous about not disclosing what he did for government agencies, though at one point he did lament that some newly published work had been anticipated by a classified paper he had written 40 years earlier. In the late 1960s—frustrated that everyone but him seemed to be selling polyomino games—Sol started a company called Recreational Technology, Inc. It didn’t go particularly well, but one side effect was that he got involved in business with Elwyn Berlekamp—a Berkeley professor and fellow enthusiast of coding theory and puzzles—whom he persuaded to start a company called Cyclotomics (in honor of cyclotomic polynomials, the irreducible factors of polynomials of the form *x*^{n}–1) which was eventually sold to Kodak for a respectable sum. (Berlekamp also created an algorithmic trading system that he sold to Jim Simons and that became a starting point for Renaissance Technologies, now one of the world’s largest hedge funds.)

More than 10,000 patents refer to Sol’s work, but Sol himself got only one patent: on a cryptosystem based on quasigroups—and I don’t think he ever did much to directly commercialize his work.

Sol was for many years involved with the Technion (Israel Institute of Technology) and quite devoted to Israel. He characterized himself as a “non-observant orthodox Jew”—but occasionally did things like teach a freshman seminar on the Book of Genesis, as well as working on decoding parts of the Dead Sea Scrolls.

Sol and his wife traveled extensively, but the center of Sol’s world was definitely Los Angeles—his office at USC, and the house in which he and his wife lived for nearly 60 years. He had a circle of friends and students who relied on him for many things. And he had his family. His daughter Astrid remained a local personality, even being portrayed in fiction a few times—as a student in a play about Richard Feynman (she sat as a drawing model for him many times), and as a character in a novel by a friend of mine. Beatrice became an MD/PhD who’s spent her career applying an almost mathematical level of precision to various kinds of medical reasoning and diagnosis (Gulf War illness, statin effects, hiccups, etc.)—even as she often quotes “Beatrice’s Law”, that “everything in biology is more complicated than you think, even taking into account Beatrice’s Law”. (I’m happy to have made at least one contribution to Beatrice’s life: introducing her to her husband, now of 26 years, Terry Sejnowski, one of the founders of modern computational neuroscience.)

In the years I knew Sol, there was always a quiet energy to him. He seemed to be involved in lots of things, even if he often wasn’t particularly forthcoming about the details. Occasionally I would talk to him about actual science and mathematics; usually he was more interested in telling stories (often very engaging ones) about personalities and organizations (“Can you believe that [in 1985] after not going to conferences for years, Claude Shannon just showed up unannounced at the bar at the annual information theory conference?”, “Do you know how much they had to pay the president of Caltech to get him to move to Saudi Arabia?”, etc.)

In retrospect, I wish I’d done more to get Sol interested in some of the math questions brought up by my own work. I don’t think I properly internalized the extent to which he liked cracking problems suggested by other people. And then there was the matter of computers. Despite all his contributions to the infrastructure of the computational world, Sol himself basically never seriously used computers. He took particular pride in his own mental calculation capabilities. And he didn’t really use email until he was in his seventies, and never used a computer at home—though, yes, he did have a cellphone. (A typical email from him was short. I had mentioned last year that I was researching Ada Lovelace; he responded: “The story of Ada Lovelace as Babbage’s programmer is so widespread that everyone seems to accept it as factual, but I’ve never seen original source material on this.”)

Sol’s daughters organized a party for his 80th birthday a few years ago, creating an invitation with characteristic mathematical features:

Sol had a few medical problems, though they didn’t seem to be slowing him down much. His wife’s health, though, was failing, and a few weeks ago her condition suddenly worsened. Sol still went to his office as usual on Friday, but on Saturday night, in his sleep, he died. His wife Bo died just two weeks later, two days before what would have been their 60th wedding anniversary.

Though Sol himself is gone, the work he did lives on—responsible for an octillion bits (and counting) across the digital world. Farewell, Sol. And on behalf of all of us, thanks for all those cleverly created bits.

Fifty years ago today there was a six-year-old at a kindergarten (“nursery school” in British English) in Oxford, England who was walking under some trees and noticed that the patches of light under the trees didn’t look the same as usual. Curious, he looked up at the sun. It was bright, but he could see that one side of it seemed to be missing. And he realized that was why the patches of light looked odd.

He’d heard of eclipses. He didn’t really understand them. But he had the idea that that was what he was seeing. Excited, he told another kid about it. They hadn’t heard of eclipses. But he pointed out that the sun had a bite taken out of it. The other kid looked up. Perhaps the sun was too bright, but they looked away without noticing anything. Then the first kid tried another kid. And then another. None of them believed him about the eclipse and the bite taken out of the sun.

Of course, this is a story about me. And now I can find the eclipse by going to Wolfram|Alpha (or the Wolfram Language):

And, yes, it was fun to see my first eclipse (almost exactly 25 years later, I finally saw a total eclipse too). But my real takeaway from that day was about the world and about people. Even if you notice something as obvious as a bite taken out of the side of the sun, there’s no guarantee that you can convince anyone else that it’s there.

It’s been very helpful to me over the past fifty years to understand that. There’ve been so many times in my life in science, technology and business where things seemed as obvious to me as the bite taken out of the sun. And quite often it’s been easy to get other people to see them too. But sometimes they just don’t.

When they find out that people don’t agree with something that seems obvious to them, many people will just conclude that they’re the ones who are wrong. That even though it seems obvious to them, the “crowd” must be right, and they themselves must somehow be confused. Fifty years ago today I learned that wasn’t true. Perhaps it made me more obstinate, but I could list quite a few pieces of science and technology that I rather suspect wouldn’t exist today if it hadn’t been for that kindergarten experience of mine.

As I write this, I feel an urge to tell a few other stories—and lessons learned—from kindergarten. I should explain that I went to a kindergarten with lots of smart kids, mostly children of Oxford academics. They certainly seemed very bright to me at the time—and, interestingly, many of them have ended up having distinguished lives and careers.

In many ways, the kids were much brighter than most of the teachers. I remember one teacher with the curious theory that children’s minds were like elastic bands—and that if children learned too much, their minds would snap. Of course, those were the days when Bible Study was part of pretty much any school’s curriculum in the UK, and it was probably very annoying that I would come in every day and regale everyone with stories about dinosaurs and geology when the teacher just wanted people to learn Genesis stories.

I don’t think I was great at “doing what the other kids do”. When I was three years old, and first at school, there was a time when everyone was supposed to run around “like a bus” (I guess ignoring the fact that buses go on roads…). I didn’t want to do it, and just stood in one place. “Why aren’t you being a bus?”, the teacher asked. “Well, I am lamp post”, I said. They seemed sufficiently taken aback by that response that they left me alone.

I learned an important lesson when I was about five, from another kid. (The kid in question happened to grow up to become a distinguished mathematician—and she was even knighted recently for her contributions to mathematics—but that’s not really relevant to the story.) We were supposed to be hammering nails into pieces of wood. Yes, in those days in the UK they let five-year-olds do that. Anyway, she had the hammer and said “Can you hold the nail? Trust me, I know what I’m doing.” Needless to say, she missed the nail. My thumb was black for several days. But it was a small price to pay for a terrific life lesson: just because someone claims to know what they’re talking about doesn’t mean they do. And nowadays, when I’m dealing with some expert who says “trust me, I know what I’m talking about”, I can’t help but have my mind wander back half a century to that moment just before the hammer fell.

I’ll relate two more stories. The first one I’m not sure how I feel about now. It had to do with learning addition. Now, realistically, I have a good memory (which is perhaps obvious given that I’m writing about things that happened 50 years ago). So I could perfectly well have just memorized all my addition facts. But somehow I didn’t want to. And one day I noticed that if I put two rulers next to each other, I could make a little machine that would add for me—an “addition slide rule”. So whenever we were doing additions, I always “happened” to have two rulers on my desk. When it came to multiplication, I didn’t memorize that either—though in that case I discovered I could go far by knowing the single fact that 7×8=56—because that was the fact other kids didn’t know. (In the end, it took until I was in my forties before I’d finally learned every part of my multiplication table up to 12×12.) And as I look at Wolfram|Alpha and Mathematica and so on, and think about my addition slide rule, I’m reminded of the theory that people never really change….

My final story comes from around the same time as the eclipse. Back then, the UK used non-decimal currency: there were 12 pennies in a shilling, and 20 shillings in a pound. And one of the exercises for us kids was to do mixed-radix arithmetic with these things. I was very pleased with myself one day when I figured out that money didn’t have to work this way; that everything could be base 10 (well, I didn’t explicitly know the concept of base 10 yet). I told this to a teacher. They were a little confused, but said that currency had worked the same way for hundreds of years, and wasn’t going to change. A couple of years later, the UK announced it was going to decimalize its currency. (I suspect if it had waited longer it would still have non-decimal currency, and there would just be a big market for calculators that could compute with it.) I’ve kept this little incident with me all these years—as a reminder that things can change, even if they’ve been the way they are for a very long time. Oh, and again, that one shouldn’t necessarily believe what one’s told. But I guess that’s a theme….
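The money exercises were mixed-radix arithmetic: the carry happens at 12 (pence to shillings) and then at 20 (shillings to pounds), rather than at 10 every time. Here is a minimal sketch in Python, assuming amounts are whole pence (halfpennies and farthings are ignored). The function names are illustrative only and are not from any library.

```python
PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20


def to_pence(pounds, shillings, pence):
    """Convert a pounds/shillings/pence amount to a single count of pence."""
    return (pounds * SHILLINGS_PER_POUND + shillings) * PENCE_PER_SHILLING + pence


def from_pence(total):
    """Split a count of pence back into (pounds, shillings, pence)."""
    shillings_total, pence = divmod(total, PENCE_PER_SHILLING)
    pounds, shillings = divmod(shillings_total, SHILLINGS_PER_POUND)
    return pounds, shillings, pence


def add_lsd(a, b):
    """Add two (pounds, shillings, pence) tuples, carrying at 12 and at 20."""
    return from_pence(to_pence(*a) + to_pence(*b))


# £1 15s 9d + £2 7s 6d: 15d carries to 1s 3d, then 23s carries to £1 3s.
print(add_lsd((1, 15, 9), (2, 7, 6)))  # (4, 3, 3)
```

Converting to the smallest unit first turns each sum into one ordinary integer addition, and `divmod` then redistributes the carries. With a decimal currency, that conversion step amounts to just reading off digits.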

*This week’s release of the movie The Man Who Knew Infinity (which I saw in rough form last fall through its mathematician-producers Manjul Bhargava and Ken Ono) leads me to write about its subject, Srinivasa Ramanujan…*

This essay is in *Idea Makers: Personal Perspectives on the Lives & Ideas of Some Notable People* »

They used to come by physical mail. Now it’s usually email. From around the world, I have for many years received a steady trickle of messages that make bold claims—about prime numbers, relativity theory, AI, consciousness or a host of other things—but give little or no backup for what they say. I’m always so busy with my own ideas and projects that I invariably put off looking at these messages. But in the end I try to at least skim them—in large part because I remember the story of Ramanujan.

On about January 31, 1913 a mathematician named G. H. Hardy in Cambridge, England received a package of papers with a cover letter that began: “Dear Sir, I beg to introduce myself to you as a clerk in the Accounts Department of the Port Trust Office at Madras on a salary of only £20 per annum. I am now about 23 years of age….” and went on to say that its author had made “startling” progress on a theory of divergent series in mathematics, and had all but solved the longstanding problem of the distribution of prime numbers. The cover letter ended: “Being poor, if you are convinced that there is anything of value I would like to have my theorems published…. Being inexperienced I would very highly value any advice you give me. Requesting to be excused for the trouble I give you. I remain, Dear Sir, Yours truly, S. Ramanujan”.

What followed were at least 11 pages of technical results from a range of areas of mathematics (at least 2 of the pages have now been lost). There are a few things that on first sight might seem absurd, like that the sum of all positive integers can be thought of as being equal to –1/12:

Then there are statements that suggest a kind of experimental approach to mathematics:

But some things get more exotic, with pages of formulas like this:

What are these? Where do they come from? Are they even correct?

The concepts are familiar from college-level calculus. But these are not just complicated college-level calculus exercises. Instead, when one looks closely, each one has something more exotic and surprising going on—and seems to involve a quite different level of mathematics.

Today we can use Mathematica or Wolfram|Alpha to check the results—at least numerically. And sometimes we can even just type in the question and immediately get out the answer:

And the first surprise—just as G. H. Hardy discovered back in 1913—is that, yes, the formulas are essentially all correct. But what kind of person would have made them? And how? And are they all part of some bigger picture—or in a sense just scattered random facts of mathematics?

Needless to say, there’s a human story behind this: the remarkable story of Srinivasa Ramanujan.

He was born in a smallish town in India on December 22, 1887 (which made him not “about 23”, but actually 25, when he wrote his letter to Hardy). His family was of the Brahmin (priests, teachers, …) caste but of modest means. The British colonial rulers of India had put in place a very structured system of schools, and by age 10 Ramanujan stood out by scoring top in his district in the standard exams. He was also known for having an exceptional memory, and for being able to recite digits of numbers like pi as well as things like roots of Sanskrit words. When he graduated from high school at age 17 he was recognized for his mathematical prowess, and given a scholarship for college.

While in high school Ramanujan had started studying mathematics on his own—and doing his own research (notably on the numerical evaluation of Euler’s constant, and on properties of the Bernoulli numbers). He was fortunate at age 16 (in those days long before the web!) to get a copy of a remarkably good and comprehensive (at least as of 1886) 1055-page summary of high-end undergraduate mathematics, organized in the form of results numbered up to 6165. The book was written by a tutor for the ultra-competitive Mathematical Tripos exams in Cambridge—and its terse “just the facts” format was very similar to the one Ramanujan used in his letter to Hardy.

By the time Ramanujan got to college, all he wanted to do was mathematics—and he failed his other classes, and at one point ran away, causing his mother to send a missing-person letter to the newspaper:

Ramanujan moved to Madras (now Chennai), tried different colleges, had medical problems, and continued his independent math research. In 1909, when he was 21, his mother arranged (in keeping with customs of the time) for him to marry a then-10-year-old girl named Janaki, who started living with him a couple of years later.

Ramanujan seems to have supported himself by doing math tutoring—but soon became known around Madras as a math whiz, and began publishing in the recently launched *Journal of the Indian Mathematical Society*. His first paper—published in 1911—was on computational properties of Bernoulli numbers (the same Bernoulli numbers that Ada Lovelace had used in her 1843 paper on the Analytical Engine). Though his results weren’t spectacular, Ramanujan’s approach was an interesting and original one that combined continuous (“what’s the numerical value?”) and discrete (“what’s the prime factorization?”) mathematics.

When Ramanujan’s mathematical friends didn’t succeed in getting him a scholarship, Ramanujan started looking for jobs, and wound up in March 1912 as an accounting clerk—or effectively, a human calculator—for the Port of Madras (which was then, as now, a big shipping hub). His boss, the Chief Accountant, happened to be interested in academic mathematics, and became a lifelong supporter of his. The head of the Port of Madras was a rather distinguished British civil engineer, and partly through him, Ramanujan started interacting with a network of technically oriented British expatriates. They struggled to assess him, wondering whether “he has the stuff of great mathematicians” or whether “his brains are akin to those of the calculating boy”. They wrote to a certain Professor M. J. M. Hill in London, who looked at Ramanujan’s rather outlandish statements about divergent series and declared that “Mr. Ramanujan is evidently a man with a taste for Mathematics, and with some ability, but he has got on to wrong lines.” Hill suggested some books for Ramanujan to study.

Meanwhile, Ramanujan’s expat friends were continuing to look for support for him—and he decided to start writing to British mathematicians himself, though with some significant help at composing the English in his letters. We don’t know exactly whom he wrote to first—although Hardy’s long-time collaborator John Littlewood mentioned two names shortly before he died 64 years later: H. F. Baker and E. W. Hobson. Neither was a particularly good choice: Baker worked on algebraic geometry and Hobson on mathematical analysis, both subjects fairly far from what Ramanujan was doing. But in any event, neither of them responded.

And so it was that on Thursday, January 16, 1913, Ramanujan sent his letter to G. H. Hardy.

G. H. Hardy was born in 1877 to schoolteacher parents based about 30 miles south of London. He was from the beginning a top student, particularly in mathematics. Even when I was growing up in England in the early 1970s, it was typical for such students to go to Winchester for high school and Cambridge for college. And that’s exactly what Hardy did. (The other, slightly more famous, track—less austere and less mathematically oriented—was Eton and Oxford, which happens to be where I went.)

Cambridge undergraduate mathematics was at the time very focused on solving ornately constructed calculus-related problems as a kind of competitive sport—with the final event being the Mathematical Tripos exams, which ranked everyone from the “Senior Wrangler” (top score) to the “Wooden Spoon” (lowest passing score). Hardy thought he should have been top, but actually came in 4th, and decided that what he really liked was the somewhat more rigorous and formal approach to mathematics that was then becoming popular in Continental Europe.

The way the British academic system worked at that time—and basically until the 1960s—was that as soon as they graduated, top students could be elected to “college fellowships” that could last the rest of their lives. Hardy was at Trinity College—the largest and most scientifically distinguished college at Cambridge University—and when he graduated in 1900, he was duly elected to a college fellowship.

Hardy’s first research paper was about doing integrals like these:

For a decade Hardy basically worked on the finer points of calculus, figuring out how to do different kinds of integrals and sums, and injecting greater rigor into issues like convergence and the interchange of limits.

His papers weren’t grand or visionary, but they were good examples of state-of-the-art mathematical craftsmanship. (As a colleague of Bertrand Russell’s, he dipped into the new area of transfinite numbers, but didn’t do much with them.) Then in 1908, he wrote a textbook entitled *A Course of Pure Mathematics*—which was a good book, and was very successful in its time, even if its preface began by explaining that it was for students “whose abilities reach or approach something like what is usually described as ‘scholarship standard’”.

By 1910 or so, Hardy had pretty much settled into a routine of life as a Cambridge professor, pursuing a steady program of academic work. But then he met John Littlewood. Littlewood had grown up in South Africa and was eight years younger than Hardy, a recent Senior Wrangler, and in many ways much more adventurous. And in 1911 Hardy—who had previously always worked on his own—began a collaboration with Littlewood that ultimately lasted the rest of his life.

As a person, Hardy gives me the impression of a good schoolboy who never fully grew up. He seemed to like living in a structured environment, concentrating on his math exercises, and displaying cleverness whenever he could. He could be very nerdy—whether about cricket scores, proving the non-existence of God, or writing down rules for his collaboration with Littlewood. And in a quintessentially British way, he could express himself with wit and charm, but was personally stiff and distant—for example always styling himself as “G. H. Hardy”, with “Harold” basically used only by his mother and sister.

So in early 1913 there was Hardy: a respectable and successful, if personally reserved, British mathematician, who had recently been energized by starting to collaborate with Littlewood—and was being pulled in the direction of number theory by Littlewood’s interests there. But then he received the letter from Ramanujan.

Ramanujan’s letter began in a somewhat unpromising way, giving the impression that he thought he was describing for the first time the already fairly well-known technique of analytic continuation for generalizing things like the factorial function to non-integers. He made the statement that “My whole investigations are based upon this and I have been developing this to a remarkable extent so much so that the local mathematicians are not able to understand me in my higher flights.” But after the cover letter, there followed more than nine pages that listed over 120 different mathematical results.

Again, they began unpromisingly, with rather vague statements about having a method to count the number of primes up to a given size. But by page 3, there were definite formulas for sums and integrals and things. Some of them looked at least from a distance like the kinds of things that were, for example, in Hardy’s papers. But some were definitely more exotic. Their general texture, though, was typical of these types of math formulas. But many of the actual formulas were quite surprising—often claiming that things one wouldn’t expect to be related at all were actually mathematically equal.

At least two pages of the original letter have gone missing. But the last page we have again seems to end inauspiciously—with Ramanujan describing achievements of his theory of divergent series, including the seemingly absurd result about adding up all the positive integers, 1+2+3+4+…, and getting –1/12.

So what was Hardy’s reaction? First he consulted Littlewood. Was it perhaps a practical joke? Were these formulas all already known, or perhaps completely wrong? Some they recognized, and knew were correct. But many they did not. But as Hardy later said with characteristic clever gloss, they concluded that these too “must be true because, if they were not true, no one would have the imagination to invent them.”

Bertrand Russell wrote that by the next day he “found Hardy and Littlewood in a state of wild excitement because they believe they have found a second Newton, a Hindu clerk in Madras making 20 pounds a year.” Hardy showed Ramanujan’s letter to lots of people, and started making enquiries with the government department that handled India. It took him a week to actually reply to Ramanujan, opening with a certain measured and precisely expressed excitement: “I was exceedingly interested by your letter and by the theorems which you state.”

Then he went on: “You will however understand that, before I can judge properly of the value of what you have done, it is essential that I should see proofs of some of your assertions.” It was an interesting thing to say. To Hardy, it wasn’t enough to know what was true; he wanted to know the proof—the story—of why it was true. Of course, Hardy could have taken it upon himself to find his own proofs. But I think part of it was that he wanted to get an idea of how Ramanujan thought—and what level of mathematician he really was.

His letter went on—with characteristic precision—to group Ramanujan’s results into three classes: already known, new and interesting but probably not important, and new and potentially important. But the only things he immediately put in the third category were Ramanujan’s statements about counting primes, adding that “almost everything depends on the precise rigour of the methods of proof which you have used.”

Hardy had obviously done some background research on Ramanujan by this point, since in his letter he makes reference to Ramanujan’s paper on Bernoulli numbers. But in his letter he just says, “I hope very much that you will send me as quickly as possible… a few of your proofs,” then closes with, “Hoping to hear from you again as soon as possible.”

Ramanujan did indeed respond quickly to Hardy’s letter, and his response is fascinating. First, he says he was expecting the same kind of reply from Hardy as he had from the “Mathematics Professor at London”, who just told him “not [to] fall into the pitfalls of divergent series.” Then he reacts to Hardy’s desire for rigorous proofs by saying, “If I had given you my methods of proof I am sure you will follow the London Professor.” He mentions his result 1+2+3+4+…=–1/12 and says that “If I tell you this you will at once point out to me the lunatic asylum as my goal.” He goes on to say, “I dilate on this simply to convince you that you will not be able to follow my methods of proof… [based on] a single letter.” He says that his first goal is just to get someone like Hardy to verify his results—so he’ll be able to get a scholarship, since “I am already a half starving man. To preserve my brains I want food…”

Ramanujan makes a point of saying that it was Hardy’s first category of results—ones that were already known—that he’s most pleased about, “For my results are verified to be true even though I may take my stand upon slender basis.” In other words, Ramanujan himself wasn’t sure if the results were correct—and he’s excited that they actually are.

So how was he getting his results? I’ll say more about this later. But he was certainly doing all sorts of calculations with numbers and formulas—in effect doing experiments. And presumably he was looking at the results of these calculations to get an idea of what might be true. It’s not clear how he figured out what was actually true—and indeed some of the results he quoted weren’t in the end true. But presumably he used some mixture of traditional mathematical proof, calculational evidence, and lots of intuition. But he didn’t explain any of this to Hardy.

Instead, he just started conducting a correspondence about the details of the results, and the fragments of proofs he was able to give. Hardy and Littlewood seemed intent on grading his efforts—with Littlewood writing about some result, for example, “(d) is still wrong, of course, rather a howler.” Still, they wondered if Ramanujan was “an Euler”, or merely “a Jacobi”. But Littlewood had to say, “The stuff about primes is wrong”—explaining that Ramanujan incorrectly assumed the Riemann zeta function didn’t have zeros off the real axis, even though it actually has an infinite number of them, which are the subject of the whole Riemann hypothesis. (The Riemann hypothesis is still a famous unsolved math problem, even though an optimistic teacher suggested it to Littlewood as a project when he was an undergraduate…)

What about Ramanujan’s strange 1+2+3+4+… = –1/12? Well, that has to do with the Riemann zeta function as well. For integers *s* > 1, ζ(*s*) is defined as the sum $\zeta(s) = \sum_{n=1}^{\infty} 1/n^{s}$. And given those values, there’s a nice function—called Zeta[*s*] in the Wolfram Language—that can be obtained by analytically continuing to the rest of the complex plane. Now based on the formula for positive arguments, one can identify Zeta[–1] with 1+2+3+4+… But one can also just evaluate Zeta[–1], and get exactly –1/12.

It’s a weird result, to be sure. But not as crazy as it might at first seem. And in fact it’s a result that’s nowadays considered perfectly sensible for purposes of certain calculations in quantum field theory (in which, to be fair, all actual infinities are intended to cancel out at the end).
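The –1/12 can be checked without any special-function library by using the standard identity ζ(−n) = (−1)^n B_(n+1)/(n+1) for integers n ≥ 0, which ties the continued zeta function at non-positive integers to the Bernoulli numbers (the same numbers as in Ramanujan’s first paper). Below is a minimal sketch in Python with exact fractions, assuming the convention B_1 = −1/2. The function names are hypothetical, and this is not how the Wolfram Language’s Zeta is implemented.

```python
from fractions import Fraction
from math import comb


def bernoulli(n_max):
    """Exact Bernoulli numbers B_0 .. B_n_max, with the convention B_1 = -1/2.

    Uses the recurrence sum_{k=0}^{m} C(m+1, k) * B_k = 0 for m >= 1.
    """
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        s = sum(comb(m + 1, k) * B[k] for k in range(m))
        B.append(-s / (m + 1))
    return B


def zeta_at_negative_integer(n):
    """Value of the analytically continued zeta function at s = -n (n >= 0)."""
    B = bernoulli(n + 1)
    return (-1) ** n * B[n + 1] / (n + 1)


print(zeta_at_negative_integer(1))  # -1/12, the value attached to 1+2+3+4+...
```

Because B_3 = 0, the same code gives ζ(−2) = 0. That is one of the "trivial" zeros on the real axis, which are distinct from the zeros off the real axis that the Riemann hypothesis concerns.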

But back to the story. Hardy and Littlewood didn’t really have a good mental model for Ramanujan. Littlewood speculated that Ramanujan might not be giving the proofs they assumed he had because he was afraid they’d steal his work. (Stealing was a major issue in academia then as it is now.) Ramanujan said he was “pained” by this speculation, and assured them that he was not “in the least apprehensive of my method being utilised by others.” He said that actually he’d invented the method eight years earlier, but hadn’t found anyone who could appreciate it, and now he was “willing to place unreservedly in your possession what little I have.”

Meanwhile, even before Hardy had responded to Ramanujan’s first letter, he’d been investigating with the government department responsible for Indian students how he could bring Ramanujan to Cambridge. It’s not clear quite what got communicated, but Ramanujan responded that he couldn’t go—perhaps because of his Brahmin beliefs, or his mother, or perhaps because he just didn’t think he’d fit in. But in any case, Ramanujan’s supporters started pushing instead for him to get a graduate scholarship at the University of Madras. More experts were consulted, who opined that “His results appear to be wonderful; but he is not, now, able to present any intelligible proof of some of them,” but “He has sufficient knowledge of English and is not too old to learn modern methods from books.”

The university administration said their regulations didn’t allow a graduate scholarship to be given to someone like Ramanujan who hadn’t finished an undergraduate degree. But they helpfully suggested that “Section XV of the Act of Incorporation and Section 3 of the Indian Universities Act, 1904, allow of the grant of such a scholarship [by the Government Educational Department], subject to the express consent of the Governor of Fort St George in Council.” And despite the seemingly arcane bureaucracy, things moved quickly, and within a few weeks Ramanujan was duly awarded a scholarship for two years, with the sole requirement that he provide quarterly reports.

By the time he got his scholarship, Ramanujan had started writing more papers, and publishing them in the *Journal of the Indian Mathematical Society*. Compared to his big claims about primes and divergent series, the topics of these papers were quite tame. But the papers were remarkable nevertheless.

What’s immediately striking about them is how calculational they are—full of actual, complicated formulas. Most math papers aren’t that way. They may have complicated notation, but they don’t have big expressions containing complicated combinations of roots, or seemingly random long integers.

In modern times, we’re used to seeing incredibly complicated formulas routinely generated by Mathematica. But usually they’re just intermediate steps, and aren’t what papers explicitly talk much about. For Ramanujan, though, complicated formulas were often what really told the story. And of course it’s incredibly impressive that he could derive them without computers and modern tools.

(As an aside, back in the late 1970s I started writing papers that involved formulas generated by computer. And in one particular paper, the formulas happened to have lots of occurrences of the number 9. But the experienced typist who typed the paper—yes, from a manuscript—replaced every “9” with a “g”. When I asked her why, she said, “Well, there are never explicit 9’s in papers!”)

Looking at Ramanujan’s papers, another striking feature is the frequent use of numerical approximations in arguments leading to exact results. People tend to think of working with algebraic formulas as an exact process—generating, for example, coefficients that are exactly 16, not just roughly 15.99999. But for Ramanujan, approximations were routinely part of the story, even when the final results were exact.

In some sense it’s not surprising that approximations to numbers are useful. Let’s say we want to know which of two expressions involving square roots is larger. We can start doing all sorts of transformations among square roots, and trying to derive theorems from them. Or we can just evaluate each expression numerically, and find that the first one (2.9755…) is obviously smaller than the second (3.322…). In the mathematical tradition of someone like Hardy—or, for that matter, in a typical modern calculus course—such a direct calculational way of answering the question seems somehow inappropriate and improper.

And of course if the numbers are very close one has to be careful about numerical round-off and so on. But for example in Mathematica and the Wolfram Language today—particularly with their built-in precision tracking for numbers—we often use numerical approximations internally as part of deriving exact results, much like Ramanujan did.
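To make the round-off concern concrete, here is a minimal sketch of deciding which of two sums of square roots is larger. It computes rigorous decimal bounds and raises the precision until the two bounding intervals no longer overlap. The expressions from the comparison above are not reproduced here, so the inputs are hypothetical. The technique is plain interval arithmetic on integer square roots, which is a much cruder approach than the precision tracking built into the Wolfram Language.

```python
from fractions import Fraction
from math import isqrt


def sqrt_bounds(n, digits):
    """Rigorous bounds lo <= sqrt(n) < hi, using `digits` decimal places."""
    scale = 10 ** digits
    r = isqrt(n * scale * scale)  # exactly floor(sqrt(n) * scale)
    return Fraction(r, scale), Fraction(r + 1, scale)


def bounds_for_sum(terms, digits):
    """Bounds on sqrt(terms[0]) + sqrt(terms[1]) + ..."""
    lo = hi = Fraction(0)
    for n in terms:
        a, b = sqrt_bounds(n, digits)
        lo += a
        hi += b
    return lo, hi


def compare_sums_of_sqrts(left, right, max_digits=200):
    """Return -1 if sum of sqrt(left) < sum of sqrt(right), 1 if greater.

    Precision doubles until the bounding intervals separate. If they never
    do (for example, when the sums are exactly equal), raise ValueError.
    """
    digits = 5
    while digits <= max_digits:
        l_lo, l_hi = bounds_for_sum(left, digits)
        r_lo, r_hi = bounds_for_sum(right, digits)
        if l_hi <= r_lo:
            return -1
        if r_hi <= l_lo:
            return 1
        digits *= 2
    raise ValueError("bounds never separated; the sums may be equal")


print(compare_sums_of_sqrts([2, 3], [10]))  # -1: sqrt(2) + sqrt(3) < sqrt(10)
```

Two expressions that are exactly equal (such as √8 and √2 + √2) never separate, which is why the loop needs a cutoff. Establishing equality requires an exact argument, not more digits.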

When Hardy asked Ramanujan for proofs, part of what he wanted was to get a kind of story for each result that explained why it was true. But in a sense Ramanujan’s methods didn’t lend themselves to that. Because part of the “story” would have to be that there’s this complicated expression, and it happens to be numerically greater than this other expression. It’s easy to see it’s true—but there’s no real story of why it’s true.

And the same happens whenever a key part of a result comes from pure computation of complicated formulas, or in modern times, from automated theorem proving. Yes, one can trace the steps and see that they’re correct. But there’s no bigger story that gives one any particular understanding of the results.

For most people it’d be bad news to end up with some complicated expression or long seemingly random number—because it wouldn’t tell them anything. But Ramanujan was different. Littlewood once said of Ramanujan that “every positive integer was one of his personal friends.” And between a good memory and good ability to notice patterns, I suspect Ramanujan could conclude a lot from a complicated expression or a long number. For him, just the object itself would tell a story.

Ramanujan was of course generating all these things by his own calculational efforts. But back in the late 1970s and early 1980s I had the experience of starting to generate lots of complicated results automatically by computer. And after I’d been doing it for a while, something interesting happened: I started being able to quickly recognize the “texture” of results—and often immediately see what was likely to be true. If I was dealing, say, with some complicated integral, it wasn’t that I knew any theorems about it. I just had an intuition about, for example, what functions might appear in the result. And given this, I could then get the computer to go in and fill in the details—and check that the result was correct. But I couldn’t derive why the result was true, or tell a story about it; it was just something that intuition and calculation gave me.

Now of course there’s a fair amount of pure mathematics where one can’t (yet) just routinely go in and do an explicit computation to check whether or not some result is correct. And this often happens for example when there are infinite or infinitesimal quantities or limits involved. And one of the things Hardy had specialized in was giving proofs that were careful in handling such things. In 1910 he’d even written a book called *Orders of Infinity* that was about subtle issues that come up in taking infinite limits. (In particular, in a kind of algebraic analog of the theory of transfinite numbers, he talked about comparing growth rates of things like nested exponential functions—and we even make some use of what are now called Hardy fields in dealing with generalizations of power series in the Wolfram Language.)

So when Hardy saw Ramanujan’s “fast and loose” handling of infinite limits and the like, it wasn’t surprising that he reacted negatively—and thought he would need to “tame” Ramanujan, and educate him in the finer European ways of doing such things, if Ramanujan was actually going to reliably get correct answers.

Ramanujan was surely a great human calculator, and impressive at knowing whether a particular mathematical fact or relation was actually true. But his greatest skill was, I think, something in a sense more mysterious: an uncanny ability to tell what was significant, and what might be deduced from it.

Take for example his paper “Modular Equations and Approximations to π”, published in 1914, in which he calculates (without a computer of course):

Most mathematicians would say, “It’s an amusing coincidence that that’s so close to an integer—but so what?” But Ramanujan realized there was more to it. He found other relations (those “=” should really be ≅):

Then he began to build a theory—that involves elliptic functions, though Ramanujan didn’t know that name yet—and started coming up with new series approximations for π:

Previous approximations to π had in a sense been much more sober, though the best one before Ramanujan’s (Machin’s series from 1706) did involve the seemingly random number 239:

$$\frac{\pi}{4} = 4\arctan\frac{1}{5} - \arctan\frac{1}{239}$$
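To see how a formula like this is actually used, here is a minimal Python sketch (standard library only; the function names are my own) that evaluates Machin's arctangent formula by summing the Taylor series of arctan(1/5) and arctan(1/239) with the `decimal` module. The small arguments are the point: each term of the 1/5 series shrinks by a factor of 25, and each term of the 1/239 series by a factor of 239² = 57121.

```python
from decimal import Decimal, getcontext

def arctan_inv(x: int, digits: int) -> Decimal:
    """arctan(1/x) from the Taylor series 1/x - 1/(3x^3) + 1/(5x^5) - ..."""
    getcontext().prec = digits + 10              # extra guard digits
    eps = Decimal(10) ** -(digits + 5)
    power = Decimal(1) / x                       # current value of 1/x**(2k+1)
    total = power
    k = 0
    while power > eps:
        k += 1
        power /= x * x
        term = power / (2 * k + 1)
        total += -term if k % 2 else term        # signs alternate: -, +, -, ...
    return total

def machin_pi(digits: int) -> Decimal:
    """pi to `digits` significant digits via Machin (1706):
    pi/4 = 4*arctan(1/5) - arctan(1/239)."""
    pi = 4 * (4 * arctan_inv(5, digits) - arctan_inv(239, digits))
    getcontext().prec = digits
    return +pi                                   # unary plus rounds to the new precision

print(machin_pi(40))
```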

But Ramanujan’s series—bizarre and arbitrary as they might appear—had an important feature: they took far fewer terms to compute π to a given accuracy. In 1985, Bill Gosper—himself a rather Ramanujan-like figure, whom I’ve had the pleasure of knowing for more than 35 years—took the last of Ramanujan’s series from the list above, and used it to compute a record number of digits of π. There soon followed other computations, all based directly on Ramanujan’s idea—as is the method we use for computing π in Mathematica and the Wolfram Language.
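The difference in convergence is easy to check numerically. This sketch assumes the series in question is the well-known one built from the constants 1103, 26390, 9801 and 396 (the one usually associated with Gosper's computation). It sums the first few terms with Python's `decimal` module and compares the result against a hard-coded 50-digit value of π. Each extra term adds roughly eight correct digits, whereas each term of the arctan(1/5) series in Machin's formula gains only about 1.4.

```python
from decimal import Decimal, getcontext
from math import factorial

# 50 decimal places of pi, used only as a reference value.
PI_REF = Decimal("3.14159265358979323846264338327950288419716939937510")

def ramanujan_pi(terms: int, digits: int = 60) -> Decimal:
    """Approximate pi from the first `terms` terms of the series
    1/pi = (2*sqrt(2)/9801) * sum_k (4k)! (1103 + 26390k) / ((k!)^4 * 396^(4k))."""
    getcontext().prec = digits
    total = Decimal(0)
    for k in range(terms):
        num = factorial(4 * k) * (1103 + 26390 * k)
        den = factorial(k) ** 4 * 396 ** (4 * k)
        total += Decimal(num) / Decimal(den)
    inv_pi = 2 * Decimal(2).sqrt() / 9801 * total
    return 1 / inv_pi

for n in range(1, 5):
    # error shrinks by roughly a factor of 10**8 per extra term
    print(n, abs(ramanujan_pi(n) - PI_REF))
```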

It’s interesting to see in Ramanujan’s paper that even he occasionally didn’t know what was and wasn’t significant. For example, he noted:

And then—in pretty much his only published example of geometry—he gave a peculiar geometric construction for approximately “squaring the circle” based on this formula:

To Hardy, Ramanujan’s way of working must have seemed quite alien. For Ramanujan was in some fundamental sense an experimental mathematician: going out into the universe of mathematical possibilities and doing calculations to find interesting and significant facts—and only then building theories based on them.

Hardy on the other hand worked like a traditional mathematician, progressively extending the narrative of existing mathematics. Most of his papers begin—explicitly or implicitly—by quoting some result from the mathematical literature, and then proceed by telling the story of how this result can be extended by a series of rigorous steps. There are no sudden empirical discoveries—and no seemingly inexplicable jumps based on intuition from them. It’s mathematics carefully argued, and built, in a sense, brick by brick.

A century later this is still the way almost all pure mathematics is done. And even if it’s discussing the same subject matter, perhaps anything else shouldn’t be called “mathematics”, because its methods are too different. In my own efforts to explore the computational universe of simple programs, I’ve certainly done a fair amount that could be called “mathematical” in the sense that it, for example, explores systems based on numbers.

Over the years, I’ve found all sorts of results that seem interesting. Strange structures that arise when one successively adds numbers to their digit reversals. Bizarre nested recurrence relations that generate primes. Peculiar representations of integers using trees of bitwise xors. But they’re empirical facts—demonstrably true, yet not part of the tradition and narrative of existing mathematics.
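The first of these is simple to reproduce in outline. This sketch (the function name and the `base` parameter are my own choices) just iterates n → n + reverse(n), where reverse flips the digits of n in the given base; the structures referred to above only become visible when one looks at the digit patterns over many steps, which the sketch leaves out.

```python
def reverse_and_add(n: int, steps: int, base: int = 10) -> list[int]:
    """Return [n, n + rev(n), ...] for `steps` steps, where rev reverses
    the digits of its argument written in `base`."""
    def rev(m: int) -> int:
        r = 0
        while m:
            m, d = divmod(m, base)
            r = r * base + d
        return r

    seq = [n]
    for _ in range(steps):
        n += rev(n)
        seq.append(n)
    return seq

print(reverse_and_add(87, 4))   # [87, 165, 726, 1353, 4884]
```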

For many mathematicians—like Hardy—the process of proof is the core of mathematical activity. It’s not particularly significant to come up with a conjecture about what’s true; what’s significant is to create a proof that explains why something is true, constructing a narrative that other mathematicians can understand.

Particularly today, as we start to be able to automate more and more proofs, they can seem a bit like mundane manual labor, where the outcome may be interesting but the process of getting there is not. But proofs can also be illuminating. They can in effect be stories that introduce new abstract concepts that transcend the particulars of a given proof, and provide raw material to understand many other mathematical results.

For Ramanujan, though, I suspect it was facts and results that were the center of his mathematical thinking, and proofs felt a bit like some strange European custom necessary to take his results out of his particular context, and convince European mathematicians that they were correct.

But let’s return to the story of Ramanujan and Hardy.

In the early part of 1913, Hardy and Ramanujan continued to exchange letters. Ramanujan described results; Hardy critiqued what Ramanujan said, and pushed for proofs and traditional mathematical presentation. Then there was a long gap, but finally in December 1913, Hardy wrote again, explaining that Ramanujan’s most ambitious results—about the distribution of primes—were definitely incorrect, commenting that “…the theory of primes is full of pitfalls, to surmount which requires the fullest of trainings in modern rigorous methods.” He also said that if Ramanujan had been able to prove his results it would have been “about the most remarkable mathematical feat in the whole history of mathematics.”

In January 1914 a young Cambridge mathematician named E. H. Neville came to give lectures in Madras, and relayed the message that Hardy was (in Ramanujan’s words) “anxious to get [Ramanujan] to Cambridge”. Ramanujan responded that back in February 1913 he’d had a meeting, along with his “superior officer”, with the Secretary to the Students Advisory Committee of Madras, who had asked whether he was prepared to go to England. Ramanujan wrote that he assumed he’d have to take exams like the other Indian students he’d seen go to England, which he didn’t think he’d do well enough in—and also that his superior officer, a “very orthodox Brahman having scruples to go to foreign land replied at once that I could not go”.

But then he said that Neville had “cleared [his] doubts”, explaining that there wouldn’t be an issue with his expenses, that his English would do, that he wouldn’t have to take exams, and that he could remain a vegetarian in England. He ended by saying that he hoped Hardy and Littlewood would “be good enough to take the trouble of getting me [to England] within a very few months.”

Hardy had assumed it would be bureaucratically trivial to get Ramanujan to England, but actually it wasn’t. Hardy’s own Trinity College wasn’t prepared to contribute any real funding. Hardy and Littlewood offered to put up some of the money themselves. But Neville wrote to the registrar of the University of Madras saying that “the discovery of the genius of S. Ramanujan of Madras promises to be the most interesting event of our time in the mathematical world”—and suggested the university come up with the money. Ramanujan’s expat supporters swung into action, with the matter eventually reaching the Governor of Madras—and a solution was found that involved taking money from a grant that had been given by the government five years earlier for “establishing University vacation lectures”, but that was actually, in the bureaucratic language of “Document No. 182 of the Educational Department”, “not being utilised for any immediate purpose”.

There are strange little notes in the bureaucratic record, like on February 12: “What caste is he? Treat as urgent.” But eventually everything was sorted out, and on March 17, 1914, after a send-off featuring local dignitaries, Ramanujan boarded a ship for England, sailing up through the Suez Canal, and arriving in London on April 14. Before leaving India, Ramanujan had prepared for European life by getting Western clothes, and learning things like how to eat with a knife and fork, and how to tie a tie. Many Indian students had come to England before, and there was a whole procedure for them. But after a few days in London, Ramanujan arrived in Cambridge—with the Indian newspapers proudly reporting that “Mr. S. Ramanujan, of Madras, whose work in the higher mathematics has excited the wonder of Cambridge, is now in residence at Trinity.”

(In addition to Hardy and Littlewood, two other names that appear in connection with Ramanujan’s early days in Cambridge are Neville and Barnes. They’re not especially famous in the overall history of mathematics, but it so happens that in the Wolfram Language they’re both commemorated by built-in functions: NevilleThetaS and BarnesG.)

What was the Ramanujan who arrived in Cambridge like? He was described as enthusiastic and eager, though diffident. He made jokes, sometimes at his own expense. He could talk about politics and philosophy as well as mathematics. He was never particularly introspective. In official settings he was polite and deferential and tried to follow local customs. His native language was Tamil, and earlier in his life he had failed English exams, but by the time he arrived in England, his English was excellent. He liked to hang out with other Indian students, sometimes going to musical events, or boating on the river. Physically, he was described as short and stout—with his main notable feature being the brightness of his eyes. He worked hard, chasing one mathematical problem after another. He kept his living space sparse, with only a few books and papers. He was sensible about practical things, for example in figuring out issues with cooking and vegetarian ingredients. And from what one can tell, he was happy to be in Cambridge.

But then on June 28, 1914—two and a half months after Ramanujan arrived in England—Archduke Ferdinand was assassinated, and on July 28, World War I began. There was an immediate effect on Cambridge. Many students were called up for military duty. Littlewood joined the war effort and ended up developing ways to compute range tables for anti-aircraft guns. Hardy wasn’t a big supporter of the war—not least because he liked German mathematics—but he volunteered for duty too, though he was rejected on medical grounds.

Ramanujan described the war in a letter to his mother, saying for example, “They fly in aeroplanes at great heights, bomb the cities and ruin them. As soon as enemy planes are sighted in the sky, the planes resting on the ground take off and fly at great speeds and dash against them resulting in destruction and death.”

Ramanujan nevertheless continued to pursue mathematics, explaining to his mother that “war is waged in a country that is as far as Rangoon is away from [Madras]”. There were practical difficulties, like a lack of vegetables, which caused Ramanujan to ask a friend in India to send him “some tamarind (seeds being removed) and good cocoanut oil by parcel post”. But of more importance, as Ramanujan reported it, was that the “professors here… have lost their interest in mathematics owing to the present war”.

Ramanujan told a friend that he had “changed [his] plan of publishing [his] results”. He said that he would wait to publish any of the old results in his notebooks until the war was over. But he said that since coming to England he had learned “their methods”, and was “trying to get new results by their methods so that I can easily publish these results without delay”.

In 1915 Ramanujan published a long paper entitled “Highly Composite Numbers” about maxima of the function (DivisorSigma[0, n] in the Wolfram Language) that counts the number of divisors of a given number. Hardy seems to have been quite involved in the preparation of this paper—and it served as the centerpiece of Ramanujan’s analog of a PhD thesis.
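Concretely, a highly composite number is one with more divisors than any smaller positive integer. The brute-force Python sketch below (function names are my own) simply tracks the record divisor count as n increases. It is only an illustration of the definition, not Ramanujan's approach, which works with the structure of such numbers' prime factorizations.

```python
def divisor_count(n: int) -> int:
    """Number of divisors of n (the quantity DivisorSigma[0, n] gives)."""
    count, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            count += 1 if d * d == n else 2   # d and n // d, counted once if equal
        d += 1
    return count

def highly_composite(limit: int) -> list[int]:
    """Numbers up to `limit` with more divisors than every smaller positive integer."""
    best, result = 0, []
    for n in range(1, limit + 1):
        c = divisor_count(n)
        if c > best:
            best = c
            result.append(n)
    return result

print(highly_composite(1000))
# [1, 2, 4, 6, 12, 24, 36, 48, 60, 120, 180, 240, 360, 720, 840]
```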

For the next couple of years, Ramanujan prolifically wrote papers—and despite the war, they were published. A notable paper he wrote with Hardy concerns the partition function (PartitionsP in the Wolfram Language) that counts the number of ways an integer can be written as a sum of positive integers. The paper is a classic example of mixing the approximate with the exact. It begins with the result for large *n*:

$$p(n) \sim \frac{1}{4n\sqrt{3}}\, e^{\pi\sqrt{2n/3}}$$

But then, using ideas Ramanujan developed back in India, it progressively improves the estimate, to the point where the exact integer result can be obtained. In Ramanujan’s day, computing the exact value of PartitionsP[200] was a big deal—and the climax of his paper. But now, thanks to Ramanujan’s method, it’s instantaneous: PartitionsP[200] = 3972999029388.
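The contrast between the approximate and the exact can be reproduced on a small scale. This Python sketch does not implement the Hardy-Ramanujan method. As a plainly swapped-in technique, it gets exact values from Euler's pentagonal-number recurrence, then compares them with the leading Hardy-Ramanujan term e^{π√(2n/3)}/(4n√3). The function names are my own.

```python
import math

def partitions(n: int) -> int:
    """Exact p(n) via Euler's pentagonal-number recurrence
    (a different method from the Hardy-Ramanujan series)."""
    p = [1] + [0] * n
    for m in range(1, n + 1):
        total, k = 0, 1
        while True:
            g1 = k * (3 * k - 1) // 2        # generalized pentagonal numbers
            g2 = k * (3 * k + 1) // 2
            if g1 > m:
                break
            sign = 1 if k % 2 else -1        # signs go +, +, -, -, +, +, ...
            total += sign * p[m - g1]
            if g2 <= m:
                total += sign * p[m - g2]
            k += 1
        p[m] = total
    return p[n]

def hardy_ramanujan_estimate(n: int) -> float:
    """Leading-order asymptotic p(n) ~ exp(pi*sqrt(2n/3)) / (4*n*sqrt(3))."""
    return math.exp(math.pi * math.sqrt(2 * n / 3)) / (4 * n * math.sqrt(3))

exact = partitions(200)
print(exact)                                  # 3972999029388
print(hardy_ramanujan_estimate(200) / exact)  # slightly above 1
```

For n = 200 the leading term alone overshoots by a few percent, which is why the paper's further refinements were needed to reach the exact integer.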

Cambridge was dispirited by the war—with an appalling number of its finest students dying, often within weeks, at the front lines. Trinity College’s big quad had become a war hospital. But through all of this, Ramanujan continued to do his mathematics—and with Hardy’s help continued to build his reputation.

But then in May 1917, there was another problem: Ramanujan got sick. From what we know now, it’s likely that what he had was a parasitic liver infection picked up in India. But back then nobody could diagnose it. Ramanujan went from doctor to doctor, and nursing home to nursing home. He didn’t believe much of what he was told, and nothing that was done seemed to help much. Some months he would be well enough to do a significant amount of mathematics; others not. He became depressed, and at one point apparently suicidal. It didn’t help that his mother had prevented his wife back in India from communicating with him, presumably fearing it would distract him.

Hardy tried to help—sometimes by interacting with doctors, sometimes by providing mathematical input. One doctor told Hardy he suspected “some obscure Oriental germ trouble imperfectly studied at present”. Hardy wrote, “Like all Indians, [Ramanujan] is fatalistic, and it is terribly hard to get him to take care of himself.” Hardy later told the now-famous story that he once visited Ramanujan at a nursing home, telling him that he came in a taxicab with number 1729, and saying that it seemed to him a rather dull number—to which Ramanujan replied: “No, it is a very interesting number; it is the smallest number expressible as the sum of two cubes in two different ways”: 1729 = 1³ + 12³ = 9³ + 10³. (Wolfram|Alpha now reports some other properties too.)
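Ramanujan's claim about 1729 is easy to confirm by brute force. This sketch (the helper name is hypothetical, and the search bound of 50 is an arbitrary choice that comfortably covers 1729) tabulates a³ + b³ for all pairs and returns the smallest sum with two representations. Any representation of a number below 50³ must use cubes of integers below 50, so the bound cannot hide a smaller answer.

```python
from collections import defaultdict

def smallest_two_way_cube_sum(limit: int = 50) -> tuple[int, list[tuple[int, int]]]:
    """Smallest n = a^3 + b^3 (0 < a <= b <= limit) having at least two
    such representations, together with those representations."""
    reps = defaultdict(list)
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):
            reps[a**3 + b**3].append((a, b))
    n = min(k for k, v in reps.items() if len(v) >= 2)
    return n, reps[n]

print(smallest_two_way_cube_sum())   # (1729, [(1, 12), (9, 10)])
```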

But through all of this, Ramanujan’s mathematical reputation continued to grow. He was elected a Fellow of the Royal Society (with his supporters including Hobson and Baker, both of whom had failed to respond to his original letter)—and in October 1918 he was elected a fellow of Trinity College, assuring him financial support. A month later World War I was over—and the threat of U-boat attacks, which had made travel to India dangerous, was gone.

And so on March 13, 1919, Ramanujan returned to India—now very famous and respected, but also very ill. Through it all, he continued to do mathematics, writing a notable letter to Hardy about “mock” theta functions on January 12, 1920. He chose to live humbly, and largely ignored what little medicine could do for him. And on April 26, 1920, at the age of 32, and three days after the last entry in his notebook, he died.

From when he first started doing mathematics research, Ramanujan had recorded his results in a series of hardcover notebooks—publishing only a very small fraction of them. When Ramanujan died, Hardy began to organize an effort to study and publish all 3000 or so results in Ramanujan’s notebooks. Several people were involved in the 1920s and 1930s, and quite a few publications were generated. But through various misadventures the project was not completed—to be taken up again only in the 1970s.

In 1940, Hardy gave all the letters he had from Ramanujan to the Cambridge University Library, but the original cover letter for what Ramanujan sent in 1913 was not among them—so now the only record we have of that is the transcription Hardy later published. Ramanujan’s three main notebooks sat for many years on top of a cabinet in the librarian’s office at the University of Madras, where they suffered damage from insects, but were never lost. His other mathematical documents passed through several hands, and some of them wound up in the incredibly messy office of a Cambridge mathematician—but when he died in 1965 they were noticed and sent to a library, where they languished until they were “rediscovered” with great excitement as Ramanujan’s lost notebook in 1976.

When Ramanujan died, it took only days for his various relatives to start asking for financial support. There were large medical bills from England, and there was talk of selling Ramanujan’s papers to raise money.

Ramanujan’s wife was 21 when he died, but as was the custom, she never remarried. She lived very modestly, making her living mostly from tailoring. In 1950 she adopted the son of a friend of hers who had died. By the 1960s, Ramanujan was becoming something of a general Indian hero, and she started receiving various honors and pensions. Over the years, quite a few mathematicians had come to visit her—and she had supplied them for example with the passport photo that has become the most famous picture of Ramanujan.

She lived a long life, dying in 1994 at the age of 95, having outlived Ramanujan by 73 years.

Hardy was 35 when Ramanujan’s letter arrived, and was 43 when Ramanujan died. Hardy viewed his “discovery” of Ramanujan as his greatest achievement, and described his association with Ramanujan as the “one romantic incident of [his] life”. After Ramanujan died, Hardy put some of his efforts into continuing to decode and develop Ramanujan’s results, but for the most part he returned to his previous mathematical trajectory. His collected works fill seven large volumes (while Ramanujan’s publications make up just one fairly slim volume). The word clouds of the titles of his papers show only a few changes from before he met Ramanujan to after:

Shortly before Ramanujan entered his life, Hardy had started to collaborate with John Littlewood, who he would later say was an even more important influence on his life than Ramanujan. After Ramanujan died, Hardy moved to what seemed like a better job in Oxford, and ended up staying there for 11 years before returning to Cambridge. His absence didn’t affect his collaboration with Littlewood, though—since they worked mostly by exchanging written messages, even when their rooms were less than a hundred feet apart. After 1911 Hardy rarely did mathematics without a collaborator; he worked especially with Littlewood, publishing 95 papers with him over the course of 38 years.

Hardy’s mathematics was always of the finest quality. He dreamed of doing something like solving the Riemann hypothesis—but in reality never did anything truly spectacular. He wrote two books, though, that continue to be read today: *An Introduction to the Theory of Numbers*, with E. M. Wright; and *Inequalities*, with Littlewood and G. Pólya.

Hardy lived his life in the stratum of the intellectual elite. In the 1920s he displayed a picture of Lenin in his apartment, and was briefly president of the “scientific workers” trade union. He always wrote elegantly, mostly about mathematics, and sometimes about Ramanujan. He eschewed gadgets and always lived along with students and other professors in his college. He never married, though near the end of his life his younger sister joined him in Cambridge (she also had never married, and had spent most of her life teaching at the girls’ school where she went as a child).

In 1940 Hardy wrote a small book called *A Mathematician’s Apology*. I remember when I was about 12 being given a copy of this book. I think many people viewed it as a kind of manifesto or advertisement for pure mathematics. But I must say it didn’t resonate with me at all. It felt to me at once sanctimonious and austere, and I wasn’t impressed by its attempt to describe the aesthetics and pleasures of mathematics, or by the pride with which its author said that “nothing I have ever done is of the slightest practical use” (actually, he co-invented the Hardy-Weinberg law used in genetics). I doubt I would have chosen the path of a pure mathematician anyway, but Hardy’s book helped make certain of it.

To be fair, however, Hardy wrote the book at a low point in his own life, when he was concerned about his health and the loss of his mathematical faculties. And perhaps that explains why he made a point of explaining that “mathematics… is a young man’s game”. (And in an article about Ramanujan, he wrote that “a mathematician is often comparatively old at 30, and his death may be less of a catastrophe than it seems.”) I don’t know if the sentiment had been expressed before—but by the 1970s it was taken as an established fact, extending to science as well as mathematics. Kids I knew would tell me I’d better get on with things, because it’d be all over by age 30.

Is that actually true? I don’t think so. It’s hard to get clear evidence, but as one example I took the data we have on notable mathematical theorems in Wolfram|Alpha and the Wolfram Language, and made a histogram of the ages of people who proved them. It’s not a completely uniform distribution (though the peak just before 40 is probably just a theorem-selection effect associated with Fields Medals), but particularly if one corrects for life expectancies now and in the past it’s a far cry from showing that mathematical productivity has all but dried up by age 30.

My own feeling—as someone who’s getting older myself—is that at least up to my age, many aspects of scientific and technical productivity actually steadily increase. For a start, it really helps to know more—and certainly a lot of my best ideas have come from making connections between things I’ve learned decades apart. It also helps to have more experience and intuition about how things will work out. And if one has earlier successes, those can help provide the confidence to move forward more definitively, without second guessing. Of course, one must maintain the constitution to focus with enough intensity—and be able to concentrate for long enough—to think through complex things. I think in some ways I’ve gotten slower over the years, and in some ways faster. I’m slower because I know more about mistakes I make, and try to do things carefully enough to avoid them. But I’m faster because I know more and can shortcut many more things. Of course, for me in particular, it also helps that over the years I’ve built all sorts of automation that I’ve been able to make use of.

A quite different point is that while making specific contributions to an existing area (as Hardy did) is something that can potentially be done by the young, creating a whole new structure tends to require the broader knowledge and experience that comes with age.

But back to Hardy. I suspect it was a lack of motivation rather than ability, but in his last years, he became quite dispirited and all but dropped mathematics. He died in 1947 at the age of 70.

Littlewood, who was eight years younger than Hardy, lived on until 1977. Littlewood was always a little more adventurous than Hardy, a little less austere, and a little less august. Like Hardy, he never married—though he did have a daughter (with the wife of the couple who shared his vacation home) whom he described as his “niece” until she was in her forties. And—giving the lie to Hardy’s claim about math being a young man’s game—Littlewood (helped by getting early antidepressant drugs at the age of 72) had remarkably productive years of mathematics in his 80s.

What became of Ramanujan’s mathematics? For many years, not too much. Hardy pursued it some, but the whole field of number theory—which was where the majority of Ramanujan’s work was concentrated—was out of fashion. Here’s a plot of the fraction of all math papers tagged as “number theory” as a function of time in the Zentralblatt database:

Ramanujan’s interest may have been to some extent driven by the peak in the early 1900s (which would probably go even higher with earlier data). But by the 1930s, the emphasis of mathematics had shifted away from what seemed like particular results in areas like number theory and calculus, towards the greater generality and formality that seemed to exist in more algebraic areas.

In the 1970s, though, number theory suddenly became more popular again, driven by advances in algebraic number theory. (Other subcategories showing substantial increases at that time include automorphic forms, elementary number theory and sequences.)

Back in the late 1970s, I had certainly heard of Ramanujan—though more in the context of his story than his mathematics. And I was pleased in 1982, when I was writing about the vacuum in quantum field theory, that I could use results of Ramanujan’s to give closed forms for particular cases (of infinite sums in various dimensions of modes of a quantum field—corresponding to Epstein zeta functions):

Starting in the 1970s, there was a big effort—still not entirely complete—to prove results Ramanujan had given in his notebooks. And there were increasing connections being found between the particular results he’d got, and general themes emerging in number theory.

A significant part of what Ramanujan did was to study so-called special functions—and to invent some new ones. Special functions—like the zeta function, elliptic functions, theta functions, and so on—can be thought of as defining convenient “packets” of mathematics. There are an infinite number of possible functions one can define, but what get called “special functions” are ones whose definitions survive because they turn out to be repeatedly useful.

And today, for example, in Mathematica and the Wolfram Language we have RamanujanTau, RamanujanTauL, RamanujanTauTheta and RamanujanTauZ as special functions. I don’t doubt that in the future we’ll have more Ramanujan-inspired functions. In the last year of his life, Ramanujan defined some particularly ambitious special functions that he called “mock theta functions”—and that are still in the process of being made concrete enough to routinely compute.

If one looks at the definition of Ramanujan’s tau function it seems quite bizarre (notice the “24”):

$$\sum_{n=1}^{\infty} \tau(n)\, q^n = q \prod_{n=1}^{\infty} \left(1 - q^n\right)^{24}$$

And to my mind, the most remarkable thing about Ramanujan is that he could define something as seemingly arbitrary as this, and have it turn out to be useful a century later.
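To make the definition concrete: τ(k) is the coefficient of q^k in the power series of q·∏_{n≥1}(1 − q^n)^24. The Python sketch below (the function name is my own) expands that product directly with integer arithmetic, truncating after q^N. It is a straightforward O(N²) expansion, not an efficient way to compute τ.

```python
def ramanujan_tau(N: int) -> list[int]:
    """Return [tau(1), ..., tau(N)] by expanding q * prod_{n>=1} (1 - q^n)^24
    as a power series truncated after q^N."""
    # coeffs[i] is the coefficient of q^i in prod_{n>=1} (1 - q^n)^24,
    # kept up to q^(N-1); factors with n >= N cannot affect those terms.
    coeffs = [1] + [0] * (N - 1)
    for n in range(1, N):
        for _ in range(24):
            # multiply in place by (1 - q^n); iterate downward so that
            # coeffs[i - n] still holds its value from before this factor
            for i in range(N - 1, n - 1, -1):
                coeffs[i] -= coeffs[i - n]
    # the leading factor q shifts every exponent up by one: tau(k) = coeffs[k - 1]
    return coeffs

print(ramanujan_tau(10))
# [1, -24, 252, -1472, 4830, -6048, -16744, 84480, -113643, -115920]
```

The output can be checked against the multiplicativity Ramanujan conjectured and Mordell later proved: for example τ(6) = τ(2)·τ(3) = −24 × 252 = −6048.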

In antiquity, the Pythagoreans made much of the fact that 1+2+3+4=10. But to us today, this just seems like a random fact of mathematics, not of any particular significance. When I look at Ramanujan’s results, many of them also seem like random facts of mathematics. But the amazing thing that’s emerged over the past century, and particularly over the past few decades, is that they’re not. Instead, more and more of them are being found to be connected to deep, elegant mathematical principles.

To enunciate these principles in a direct and formal way requires layers of abstract mathematical concepts and language which have taken decades to develop. But somehow, through his experiments and intuition, Ramanujan managed to find concrete examples of these principles. Often his examples look quite arbitrary—full of seemingly random definitions and numbers. But perhaps it’s not surprising that that’s what it takes to express modern abstract principles in terms of the concrete mathematical constructs of the early twentieth century. It’s a bit like a poet trying to express deep general ideas—but being forced to use only the imperfect medium of human natural language.

It’s turned out to be very challenging to prove many of Ramanujan’s results. And part of the reason seems to be that to do so—and to create the kind of narrative needed for a good proof—one actually has no choice but to build up much more abstract and conceptually complex structures, often in many steps.

So how is it that Ramanujan managed in effect to predict all these deep principles of later mathematics? I think there are two basic logical possibilities. The first is that if one drills down from any sufficiently surprising result, say in number theory, one will eventually reach a deep principle in the effort to explain it. And the second possibility is that while Ramanujan did not have the wherewithal to express it directly, he had what amounts to an aesthetic sense of which seemingly random facts would turn out to fit together and have deeper significance.

I’m not sure which of these possibilities is correct, and perhaps it’s a combination. But to understand this a little more, we should talk about the overall structure of mathematics. In a sense mathematics as it’s practiced is strangely perched between the trivial and the impossible. At an underlying level, mathematics is based on simple axioms. And it could be—as it is, say, for the specific case of Boolean algebra—that given the axioms there’s a straightforward procedure to figure out whether any particular result is true. But ever since Gödel’s theorem in 1931 (which Hardy must have been aware of, but apparently never commented on) it’s been known that for an area like number theory the situation is quite different: there are statements one can give within the context of the theory whose truth or falsity is undecidable from the axioms.
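For Boolean algebra the "straightforward procedure" can be as blunt as a truth table: a statement in n variables holds exactly when it is true for all 2ⁿ assignments, so checking every one of them always terminates with an answer. A minimal Python sketch of that decision procedure (names are my own):

```python
from itertools import product
from typing import Callable

def is_tautology(formula: Callable[..., bool], n_vars: int) -> bool:
    """Decide a propositional formula by evaluating it on all 2**n_vars
    assignments of True/False; this always terminates."""
    return all(formula(*vals) for vals in product([False, True], repeat=n_vars))

# De Morgan's law: not (a and b) == (not a) or (not b)
print(is_tautology(lambda a, b: (not (a and b)) == ((not a) or (not b)), 2))  # True
# A non-theorem: (a or b) implies a
print(is_tautology(lambda a, b: (not (a or b)) or a, 2))                      # False
```

There is no analogous exhaustive check for a statement about all integers, since it has infinitely many cases to cover.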

It was proved around 1970 that there are polynomial equations involving integers where it’s undecidable from the axioms of arithmetic—or in effect from the formal methods of number theory—whether or not the equations have solutions. The particular examples of classes of equations where it’s known that this happens are extremely complex. But from my investigations in the computational universe, I’ve long suspected that there are vastly simpler equations where it happens too. Over the past several decades, I’ve had the opportunity to poll some of the world’s leading number theorists on where they think the boundary of undecidability lies. Opinions differ, but it’s certainly within the realm of possibility that for example cubic equations with three variables could exhibit undecidability.

So the question then is, why should the truth of what seem like random facts of number theory even be decidable? In other words, it’s perfectly possible that Ramanujan could have stated a result that simply can’t be proved true or false from the axioms of arithmetic. Conceivably the Goldbach conjecture will turn out to be an example. And so could many of Ramanujan’s results.

Some of Ramanujan’s results have taken decades to prove—but the fact that they’re provable at all is already important information. For it suggests that in a sense they’re not just random facts; they’re actually facts that can somehow be connected by proofs back to the underlying axioms.

And I must say that to me this tends to support the idea that Ramanujan had intuition and aesthetic criteria that in some sense captured some of the deeper principles we now know, even if he couldn’t express them directly.

It’s pretty easy to start picking mathematical statements, say at random, and then get empirical evidence for whether they’re true or not. Gödel’s theorem effectively implies that you’ll never know how far you’ll have to go to be certain of any particular result. Sometimes it won’t be far, but sometimes it may in a sense be arbitrarily far.

Ramanujan no doubt convinced himself of many of his results by what amount to empirical methods—and often it worked well. In the case of the counting of primes, however, as Hardy pointed out, things turn out to be more subtle, and results that might work up to very large numbers can eventually fail.
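A standard illustration of this kind of subtlety (my choice of example; the text does not say which of Ramanujan's formulas is meant) compares π(x), the number of primes up to x, with the logarithmic integral li(x). For every power of 10 checked below, li(x) is the larger of the two, yet Littlewood proved in 1914 that the difference changes sign infinitely often. The first crossing lies far beyond the range a direct computation like this can reach. Function names are my own; li is computed from its standard power series.

```python
import math

EULER_GAMMA = 0.5772156649015329

def prime_count(limit: int) -> list[int]:
    """pi(x) for every x in 0..limit, via a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"                 # 0 and 1 are not prime
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    counts, running = [], 0
    for flag in sieve:
        running += flag
        counts.append(running)
    return counts

def li(x: float) -> float:
    """Logarithmic integral for x > 1, from the series
    li(x) = gamma + ln(ln x) + sum_{k>=1} (ln x)^k / (k * k!)."""
    L = math.log(x)
    total, term, k = 0.0, 1.0, 0
    while True:
        k += 1
        term *= L / k                        # term is now L**k / k!
        total += term / k
        if term / k < 1e-15 * total:         # remaining terms are negligible
            break
    return EULER_GAMMA + math.log(L) + total

counts = prime_count(10**6)
for e in range(1, 7):
    x = 10**e
    print(x, counts[x], round(li(x), 1))
```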

So let’s say one looks at the space of possible mathematical statements, and picks statements that appear empirically at least to some level to be true. Now the next question: are these statements connected in any way?

Imagine one could find proofs of the statements that are true. These proofs effectively correspond to paths through a directed graph that starts with the axioms, and leads to the true results. One possibility is then that the graph is like a star—with every result being independently proved from the axioms. But another possibility is that there are many common “waypoints” in getting from the axioms to the results. And it’s these waypoints that in effect represent general principles.

If there’s a certain sparsity to true results, then it may be inevitable that many of them are connected through a small number of general principles. It might also be that there are results that aren’t connected in this way, but these results, perhaps just because of their lack of connections, aren’t considered “interesting”—and so are effectively dropped when one thinks about a particular subject.

I have to say that these considerations lead to an important question for me. I have spent many years studying what amounts to a generalization of mathematics: the behavior of arbitrary simple programs in the computational universe. And I’ve found that there’s a huge richness of complex behavior to be seen in such programs. But I have also found evidence—not least through my Principle of Computational Equivalence—that undecidability is rife there.

But now the question is, when one looks at all that rich and complex behavior, are there in effect Ramanujan-like facts to be found there? Ultimately there will be much that can’t readily be reasoned about in axiom systems like the ones in mathematics. But perhaps there are networks of facts that can be reasoned about—and that all connect to deeper principles of some kind.

From the ideas around the Principle of Computational Equivalence, we know that there will always be pockets of “computational reducibility”: places where one will be able to identify abstract patterns and draw abstract conclusions without running into undecidability. Repetitive behavior and nested behavior are two almost trivial examples. But now the question is whether among all the specific details of particular programs there are other general forms of organization to be found.
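Nested behavior is a simple case of computational reducibility that can be checked directly. The text doesn't single out a rule, so this Python sketch uses elementary rule 90 as an illustration: each cell becomes the XOR of its two neighbors. Started from one black cell, rule 90 draws a nested, Sierpiński-like pattern. Because the rule is linear mod 2, any cell can also be computed in one step as a binomial coefficient mod 2, instead of by running the whole evolution.

```python
from math import comb

def rule90_rows(steps):
    """Evolve elementary rule 90 (new cell = left XOR right) from a single 1 cell.
    The row is wide enough that the pattern never runs off the edges."""
    width = 2 * steps + 1
    row = [0] * width
    row[steps] = 1
    rows = [row]
    for _ in range(steps):
        row = [
            (row[i - 1] if i > 0 else 0) ^ (row[i + 1] if i < width - 1 else 0)
            for i in range(width)
        ]
        rows.append(row)
    return rows

def rule90_cell(t, x):
    """Shortcut for the cell at offset x from the center after t steps:
    C(t, (t + x) / 2) mod 2, and 0 outside the light cone or on the wrong parity."""
    if abs(x) > t or (t + x) % 2:
        return 0
    return comb(t, (t + x) // 2) % 2

for row in rule90_rows(15):
    print("".join("#" if cell else "." for cell in row))
```

The explicit simulation costs work proportional to the number of steps, while the shortcut jumps straight to the answer. That gap is what "reducibility" means here.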

Of course, whereas repetition and nesting are seen in a great many systems, it could be that another form of organization would be seen only much more narrowly. But we don’t know. And as of now, we don’t really have much of a handle on finding out—at least until or unless there’s a Ramanujan-like figure not for traditional mathematics but for the computational universe.

Will there ever be another Ramanujan? I don’t know if it’s the legend of Ramanujan or just a natural feature of the way the world is set up, but for at least 30 years I’ve received a steady stream of letters that read a bit like the one Hardy got from Ramanujan back in 1913. Just a few months ago, for example, I received an email (from India, as it happens) with an image of a notebook listing various mathematical expressions that are numerically almost integers—very much like the near-integers Ramanujan himself noted.

Are these numerical facts significant? I don’t know. Wolfram|Alpha can certainly generate lots of similar facts, but without Ramanujan-like insight, it’s hard to tell which, if any, are significant.
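A brute-force search is an easy way to generate such facts. This Python sketch is not how Wolfram|Alpha works. It scans the family exp(π√n) for values within a threshold of an integer, using the standard `decimal` module. Because `decimal` has no built-in π, the sketch hard-codes a 60-digit value. The search surfaces the famous exp(π√163) ≈ 262537412640768744, often called Ramanujan's constant, along with weaker hits such as n = 58 and n = 67. Nothing in the output says which hits matter.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60

# decimal has no built-in pi, so it is hard-coded here to 60 significant digits.
PI = Decimal("3.14159265358979323846264338327950288419716939937510582097494")

def distance_to_integer(value):
    """Distance from a Decimal to the nearest integer."""
    return abs(value - value.to_integral_value())

def near_integers(n_max, threshold):
    """(n, distance) pairs for 1 <= n <= n_max where exp(pi*sqrt(n)) lies within
    threshold of an integer, closest first."""
    hits = []
    for n in range(1, n_max + 1):
        value = (PI * Decimal(n).sqrt()).exp()
        distance = distance_to_integer(value)
        if distance < threshold:
            hits.append((n, distance))
    return sorted(hits, key=lambda hit: hit[1])

for n, distance in near_integers(200, Decimal("1e-3")):
    print(n, f"{distance:.2e}")
```

Some of these hits, such as n = 43, 67 and 163 (Heegner numbers), turn out to have a deep explanation in number theory. The search itself can't reveal that.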

Over the years I’ve received countless communications a bit like this one. Number theory is a common topic. So are relativity and gravitation theory. And particularly in recent years, AI and consciousness have been popular too. The nice thing about letters related to math is that there’s typically something immediately concrete in them: some specific formula, or fact, or theorem. In Hardy’s day it was hard to check such things; today it’s a lot easier. But—as in the case of the almost integer above—there’s then the question of whether what’s being said is somehow “interesting”, or whether it’s just a “random uninteresting fact”.

Needless to say, the definition of “interesting” isn’t an easy or objective one. And in fact the issues are very much the same as Hardy faced with Ramanujan’s letter. If one can see how what’s being presented fits into some bigger picture—some narrative—that one understands, then one can tell whether, at least within that framework, something is “interesting”. But if one doesn’t have the bigger picture—or if what’s being presented is just “too far out”—then one really has no way to tell if it should be considered interesting or not.

When I first started studying the behavior of simple programs, there really wasn’t a context for understanding what was going on in them. The pictures I got certainly seemed visually interesting. But it wasn’t clear what the bigger intellectual story was. And it took quite a few years before I’d accumulated enough empirical data to formulate hypotheses and develop principles that let one go back and see what was and wasn’t interesting about the behavior I’d observed.

I’ve put a few decades into developing a science of the computational universe. But it’s still young, and there is much left to discover—and it’s a highly accessible area, with no threshold of elaborate technical knowledge. And one consequence of this is that I frequently get letters that show remarkable behavior in some particular cellular automaton or other simple program. Often I recognize the general form of the behavior, because it relates to things I’ve seen before, but sometimes I don’t—and so I can’t be sure what will or won’t end up being interesting.

Back in Ramanujan’s day, mathematics was a younger field—not quite as easy to enter as the study of the computational universe, but much closer than modern academic mathematics. And there were plenty of “random facts” being published: a particular type of integral done for the first time, or a new class of equations that could be solved. Many years later we would collect as many of these as we could to build them into the algorithms and knowledgebase of Mathematica and the Wolfram Language. But at the time probably the most significant aspect of their publication was the proofs that were given: the stories that explained why the results were true. Because in these proofs, there was at least the potential that concepts were introduced that could be reused elsewhere, and build up part of the fabric of mathematics.

It would take us too far afield to discuss this at length here, but there is a kind of analog in the study of the computational universe: the methodology for computer experiments. Just as a proof can contain elements that define a general methodology for getting a mathematical result, so the particular methods of search, visualization or analysis can define something in computer experiments that is general and reusable, and can potentially give an indication of some underlying idea or principle.

And so, a bit like many of the mathematics journals of Ramanujan’s day, I’ve tried to provide a journal and a forum where specific results about the computational universe can be reported—though there is much more that could be done along these lines.

When a letter one receives contains definite mathematics, in mathematical notation, there is at least something concrete one can understand in it. But plenty of things can’t usefully be formulated in mathematical notation. And too often, unfortunately, letters are in plain English (or worse, for me, other languages) and it’s almost impossible for me to tell what they’re trying to say. But now there’s something much better that people increasingly do: formulate things in the Wolfram Language. And in that form, I’m always able to tell what someone is trying to say—although I still may not know if it’s significant or not.

Over the years, I’ve been introduced to many interesting people through letters they’ve sent. Often they’ll come to our Summer School, or publish something in one of our various channels. I have no story (yet) as dramatic as Hardy and Ramanujan. But it’s wonderful that it’s possible to connect with people in this way, particularly in their formative years. And I can’t forget that a long time ago, I was a 14-year-old who mailed papers about the research I’d done to physicists around the world…

Ramanujan did his calculations by hand—with chalk on slate, or later pencil on paper. Today with Mathematica and the Wolfram Language we have immensely more powerful tools with which to do experiments and make discoveries in mathematics (not to mention the computational universe in general).

It’s fun to imagine what Ramanujan would have done with these modern tools. I rather think he would have been quite an adventurer—going out into the mathematical universe and finding all sorts of strange and wonderful things, then using his intuition and aesthetic sense to see what fits together and what to study further.

Ramanujan unquestionably had remarkable skills. But I think the first step to following in his footsteps is just to be adventurous: not to stay in the comfort of well-established mathematical theories, but instead to go out into the wider mathematical universe and start finding—experimentally—what’s true.

It’s taken the better part of a century for many of Ramanujan’s discoveries to be fitted into a broader and more abstract context. But one of the great inspirations that Ramanujan gives us is that it’s possible with the right sense to make great progress even before the broader context has been understood. And I for one hope that many more people will take advantage of the tools we have today to follow Ramanujan’s lead and make great discoveries in experimental mathematics—whether they announce them in unexpected letters or not.

*Edited transcript of a talk given on March 4, 2016, at the Computer History Museum, Mountain View, California.*

I normally spend my time trying to build the future. But I find history really interesting and informative, and I study it quite a lot. Usually it’s other people’s history. But the Computer History Museum asked me to talk today about my own history, and the history of technology I’ve built. So that’s what I’m going to do here.

This happens to be a really exciting time for me—because a bunch of things that I’ve been working on for more than 30 years are finally coming to fruition. And mostly that’s what I’ve been out in the Bay Area this week talking about.

The focus is the Wolfram Language, which is really a new kind of language—a knowledge-based language—in which as much knowledge as possible about computation and about the world is built in. And in which the language automates as much as possible so one can go as directly as possible from computational thinking to actual implementation.

And what I want to do here is to talk about how all this came to be, and how things like Mathematica and Wolfram|Alpha emerged along the way.

Inevitably a lot of what I’m going to talk about is really my story: basically the story of how I’ve spent most of my life so far building a big stack of technology and science. When I look back, some of what’s happened seems sort of inevitable and inexorable. And some I didn’t see coming.

But let me begin at the beginning. I was born in London, England, in 1959—so, yes, I’m outrageously old, at least by my current standards. My father ran a small company—doing international trading of textiles—for nearly 60 years, and also wrote a few “serious fiction” novels. My mother was a philosophy professor at Oxford. I actually happened to notice her textbook on philosophical logic in the Stanford bookstore last time I was there.

You know, I remember when I was maybe 5 or 6 being bored at some party with a bunch of adults, and somehow ending up talking at great length to some probably very distinguished Oxford philosopher—who I heard say at the end, “One day that child will be a philosopher—but it may take a while.” Well, they were right. It’s sort of funny how these things work out.

Here’s me back then:

I went to elementary school in Oxford—to a place called the Dragon School, which I guess is probably the most famous elementary school in England. Wikipedia seems to think the most famous people now from my class there are myself and the actor Hugh Laurie.

Here’s one of my school reports, from when I was 7. Those are class ranks. So, yes, I did well in poetry and geography, but not in math. (And, yes, it’s England, so they taught “Bible study” in school, at least then.) But at least it said “He is full of spirit and determination; he should go far”…

But OK, that was 1967, and I was learning Latin and things—but what I really liked was the future. And the big future-oriented thing happening back then was the space program. And I was really interested in that, and started collecting all the information I could about every spacecraft launched—and putting together little books summarizing it. And I discovered that even from England one could write to NASA and get all this great stuff mailed to one for free.

Well, back then, there was supposed to be a Mars colony any day, and I started doing little designs for that, and for spacecraft and things.

And that got me interested in propulsion and ion drives and stuff like that—and by the time I was 11 what I was really interested in was physics.

And I discovered—having nothing to do with school—that if one just reads books one can learn stuff pretty quickly. I would pick areas of physics and try to organize knowledge about them. And when I was turning 12 I ended up spending the summer putting together all the facts I could accumulate about physics. And, yes, I suppose you could call some of these “visualizations”. And, yes, like so much else, it’s on the web now:

I found this again a few years ago—around the time Wolfram|Alpha came out—and I thought, “Oh my gosh, I’ve been doing the same thing all my life!” And then of course I started typing in numbers from when I was 11 or 12 to see if Wolfram|Alpha got them right. It did, of course:

Well, when I was 12, following British tradition I went to a so-called public school that’s actually a private school. I went to the most famous such school—Eton—which was founded about 50 years before Columbus came to America. And, oh so impressively, I even got the top scholarship among new kids in 1972.

Yes, everyone wore tailcoats all the time, and King’s Scholars, like me, wore gowns too—which provided excellent rain protection etc. I think I avoided these annual Harry-Potter-like pictures all but one time:

And back in those Latin-and-Greek-and-tailcoat days I had a sort of double life, because my real passion was doing physics.

The summer when I turned 13 I put together a summary of particle physics:

And I made the important meta-discovery that even if one was a kid, one could discover stuff. And I started just trying to answer questions about physics, either by finding answers in books, or by figuring them out myself. And by the time I was 15 I started publishing papers about physics. Yes, nobody asks how old you are when you mail a paper in to a physics journal.

But, OK, something important for me had happened back when I was 12 and first at Eton: I got to know my first computer. It’s an Elliott 903C. This is not the actual one I used, but it’s similar:

It had come to Eton through a teacher of mine named Norman Routledge, who had been a friend of Alan Turing’s. It had 8 kilowords of 18-bit ferrite core memory, and you usually programmed it with paper—or Mylar—tape, most often in a little 16-instruction assembler called SIR.
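For scale, assuming "kilo" means 1024 here (the usual convention for memory), the total capacity works out to:

$$8 \times 1024\ \text{words} \times 18\ \tfrac{\text{bits}}{\text{word}} = 147{,}456\ \text{bits} = 18{,}432\ \text{bytes} = 18\ \text{KiB}$$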

It often seemed like one of the most important skills was rewinding the tape as quickly as possible after it got dumped in a bin after going through the optical reader.

Anyway, I wanted to use the computer to do physics. When I was 12 I had gotten this book:

What’s on the cover is supposed to be a simulation of gas molecules showing increasing randomness and entropy. As it happens, years later I discovered this picture was actually kind of a fake. But back when I was 12, I really wanted to reproduce it—with the computer.

It wasn’t so easy. The molecule positions were supposed to be real numbers; one had to have an algorithm for collisions; and so on. And to make this fit on the Elliott 903 I ended up simplifying a lot—to what was actually a 2D cellular automaton.
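The talk doesn't say which rule ran on the Elliott 903. As an illustration of how a gas of colliding molecules can be boiled down to a 2D cellular automaton, here is a Python sketch of the HPP lattice gas (Hardy, de Pazzis and Pomeau, 1973). HPP is a standard textbook model and is not claimed to be the rule used here. Each site holds up to four particles, one per direction. Head-on pairs scatter at right angles, and then every particle hops one site. Grid size, density and seed are arbitrary choices.

```python
import random

# One bit per particle direction at a site (HPP lattice gas).
N, E, S, W = 1, 2, 4, 8
# Displacement (dx, dy) for each direction; y increases downward.
MOVES = {N: (0, -1), E: (1, 0), S: (0, 1), W: (-1, 0)}

def collide(cell):
    """A lone head-on pair scatters at right angles; every other configuration passes through."""
    if cell == N | S:
        return E | W
    if cell == E | W:
        return N | S
    return cell

def step(grid):
    """Collide at every site, then move each particle one site (periodic boundaries)."""
    h, w = len(grid), len(grid[0])
    new = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            cell = collide(grid[y][x])
            for bit, (dx, dy) in MOVES.items():
                if cell & bit:
                    new[(y + dy) % h][(x + dx) % w] |= bit
    return new

def particle_count(grid):
    return sum(bin(cell).count("1") for row in grid for cell in row)

def random_gas(h, w, density, seed=0):
    """Fill each direction slot at each site independently with probability density."""
    rng = random.Random(seed)
    return [
        [sum(bit for bit in (N, E, S, W) if rng.random() < density) for _ in range(w)]
        for _ in range(h)
    ]

grid = random_gas(32, 32, density=0.2, seed=0)
before = particle_count(grid)
for _ in range(50):
    grid = step(grid)
print(before, particle_count(grid))
```

Collisions only rotate head-on pairs, so particle number and momentum are conserved exactly. That conservation is the minimum a lattice model needs to behave anything like a gas.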

Well, a decade after that, I made some big discoveries about cellular automata. But back then I was unlucky with my cellular automaton rule, and I ended up not discovering anything with it. And in the end my biggest achievement with the Elliott 903 was writing a punched tape loader for it.

You see, the big problem with the Mylar tape that one used for serious programs was that it would build up a static charge and pick up stray bits of punched-out confetti, so the bits would be read wrong. Well, for my loader, I came up with what I later found out were error-correcting codes—and I set it up so that if the checks failed, the tape would stop in the reader, and you could pull it back a couple of feet and re-read it, after shaking out the confetti.
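The loader's actual code isn't described, so this is only a hypothetical Python sketch of the detect-and-re-read idea. Each block of 18-bit words carries a checksum word, and any block that fails its check is simply read again. Strictly, a plain sum like this only detects errors rather than correcting them. The simulated reader also assumes at most one misread bit per pass, which a sum modulo 2^18 always catches.

```python
import random

WORD_BITS = 18                     # Elliott 903 word size
WORD_MASK = (1 << WORD_BITS) - 1

def checksum(words):
    """Sum of the words modulo 2**18: a simple error-detecting check."""
    return sum(words) & WORD_MASK

def encode(words, block_size=8):
    """Split a program into blocks, each followed by its checksum word."""
    blocks = []
    for i in range(0, len(words), block_size):
        block = list(words[i:i + block_size])
        blocks.append(block + [checksum(block)])
    return blocks

def noisy_read(block, rng, error_rate):
    """One pass through the reader: with probability error_rate, one bit of one word is misread."""
    out = list(block)
    if rng.random() < error_rate:
        i = rng.randrange(len(out))
        out[i] ^= 1 << rng.randrange(WORD_BITS)
    return out

def load(blocks, rng, error_rate=0.1, max_tries=20):
    """Load every block, re-reading any block whose checksum fails."""
    memory, rereads = [], 0
    for block in blocks:
        for _ in range(max_tries):
            *data, check = noisy_read(block, rng, error_rate)
            if checksum(data) == check:
                memory.extend(data)
                break
            rereads += 1  # "pull the tape back a couple of feet" and try again
        else:
            raise RuntimeError("block keeps failing its check; clean the tape")
    return memory, rereads

source = random.Random(1)
program = [source.randrange(1 << WORD_BITS) for _ in range(100)]
memory, rereads = load(encode(program), random.Random(2), error_rate=0.3)
print(memory == program, rereads)
```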

OK, so by the time I was 16 I had published some physics papers and was starting to be known in physics circles—and I left school, and went to work at a British government lab called the Rutherford Lab that did particle physics research.

Now you might remember from my age-7 school report that I didn’t do very well in math. Things got a bit better when I started to use a slide rule, and then in 1972 a calculator—of which I was a very early adopter. But I never liked doing school math, or math calculations in general. Well, in particle physics there’s a lot of math to be done—and so my dislike of it was a problem.

At the Rutherford Lab, two things helped. First, a lovely HP desktop computer with a plotter, on which I could do very nice interactive computation. And second, a mainframe for crunchier things, that I programmed in Fortran.

Well, after my time at the Rutherford Lab I went to college at Oxford. Within a very short time I’d decided this was a mistake—but in those days one didn’t actually have to go to lectures for classes—so I was able to just hide out and do physics research. And mostly I spent my time in a nice underground air-conditioned room in the Nuclear Physics building—that had terminals connected to a mainframe, and to the ARPANET.

And that was when—in 1976—I first started using computers to do symbolic math, and algebra and things. Feynman diagrams in particle physics involve lots and lots of algebra. And back in 1962, I think, three physicists had met at CERN and decided to try to use computers to do this. They had three different approaches. One wrote a system called ASHMEDAI in Fortran. One—influenced by John McCarthy at Stanford—wrote a system called Reduce in Lisp. And one wrote a system called SCHOONSCHIP in CDC 6000 series assembly language, with mnemonics in Dutch. Curiously, years later, one of these physicists won a Nobel Prize. It was Tini Veltman—the one who wrote SCHOONSCHIP in assembly language.

Anyway, back in 1976 very few people other than the creators of these systems used them. I started using all of them, but my favorite was a quite different system, one that had been under development in Lisp at MIT since the mid-1960s. It was a system called Macsyma. It ran on the Project MAC PDP-10 computer. And what was really important to me as a 17-year-old kid in England was that I could get to it on the ARPANET.

It was host 236. So I would type @O 236, and there I was in an interactive operating system. Someone had taken the login SW. So I became Swolf, and started to use Macsyma.

I spent the summer of 1977 at Argonne National Lab—where they actually trusted physicists to be right in the room with the mainframe.

Then in 1978 I went to Caltech as a graduate student. By that point, I think I was the world’s largest user of computer algebra. And it was so neat, because I could just compute all this stuff so easily. I used to have fun putting incredibly ornate formulas in my physics papers. Then I could see if anyone was reading the papers, because I’d get letters saying, “How did you derive line such-and-such from the one before?”

I got a reputation for being a great calculator. Which was of course 100% undeserved—because it wasn’t me, it was just the computer. Well, actually, to be fair, there was part that was me. You see, by being able to compute so many different examples, I had gotten a new kind of intuition. I was no good at computing integrals myself, but I could go back and forth with the computer, knowing from intuition what to try, and then doing experiments to see what worked.

I was writing lots of code for Macsyma, and building this whole tower. And sometime in 1979 I hit the edge. Something new was needed. (Notice, for example, the ominous “MACSYMA RELOAD” line in the diagram.)

Well, in November 1979, just after I turned 20, I put together some papers, called it a thesis, and got my PhD. And a couple of days later I was visiting CERN in Geneva—and thinking about my future in, I thought, physics. And the one thing I was sure about was that I needed something beyond Macsyma that would let me compute things. And that was when I decided I had to build a system for myself. And right then and there, I started designing the system, handwriting its specification.

At first it was going to be ALGY–The Algebraic Manipulator. But I quickly realized that I actually had to make it do much more than algebraic manipulation. I knew most of the general-purpose computer languages of the time—both the ALGOL-like ones, and ones like Lisp and APL. But somehow they didn’t seem to capture what I wanted the system to do.

So I guess I did what I’d learned in physics: I tried to drill down to find the atoms—the primitives—of what’s going on. I knew a certain amount about mathematical logic, and the history of attempts to formulate things using logic and so on—even if my mother’s textbook about philosophical logic didn’t exist yet.

The whole history of this effort at formalization—through Aristotle, Leibniz, Frege, Peano, Hilbert, Whitehead, Russell, and so on—is really interesting. But that’s a different talk. Back in 1979, it was thinking about this kind of thing that led me to the design I came up with, based on the idea of symbolic expressions and doing transformations on them.
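To show what "symbolic expressions plus transformations" means in practice, here is a toy Python sketch. It is not SMP's actual design. The tuple representation, the convention that strings ending in `_` are pattern variables, and the four rules are all assumptions made for illustration. Rules are tried in order (first match wins) after subexpressions have been rewritten, and the whole process repeats until the expression stops changing.

```python
# Expressions: an atom (str or int) or a tuple (head, arg1, arg2, ...).
# Pattern variables: strings ending in "_" (a convention assumed for this sketch).

def is_var(p):
    return isinstance(p, str) and p.endswith("_")

def match(pattern, expr, bindings):
    """Match expr against pattern, extending bindings; return the bindings or None."""
    if is_var(pattern):
        if pattern in bindings:
            return bindings if bindings[pattern] == expr else None
        return {**bindings, pattern: expr}
    if isinstance(pattern, tuple):
        if not isinstance(expr, tuple) or len(pattern) != len(expr):
            return None
        for p, e in zip(pattern, expr):
            bindings = match(p, e, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == expr else None

def substitute(template, bindings):
    """Replace pattern variables in template with their bound values."""
    if is_var(template):
        return bindings[template]
    if isinstance(template, tuple):
        return tuple(substitute(t, bindings) for t in template)
    return template

def rewrite_once(expr, rules):
    """Rewrite subexpressions first, then try each rule on the whole; first match wins."""
    if isinstance(expr, tuple):
        expr = tuple(rewrite_once(e, rules) for e in expr)
    for lhs, rhs in rules:
        bindings = match(lhs, expr, {})
        if bindings is not None:
            return substitute(rhs, bindings)
    return expr

def rewrite(expr, rules):
    """Apply rewrite_once until the expression stops changing."""
    while True:
        new = rewrite_once(expr, rules)
        if new == expr:
            return expr
        expr = new

RULES = [
    (("Plus", "x_", 0), "x_"),
    (("Times", "x_", 1), "x_"),
    (("Times", "x_", 0), 0),
    (("Plus", "x_", "x_"), ("Times", 2, "x_")),
]

print(rewrite(("Plus", ("Times", "a", 1), ("Plus", "a", 0)), RULES))  # ('Times', 2, 'a')
```

Because a repeated variable like `x_` in `("Plus", "x_", "x_")` must bind to the same value both times, a single rule can express "x + x becomes 2x" without any special-case code.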

I named what I wanted to build SMP: a Symbolic Manipulation Program, and started recruiting people from around Caltech to help me with it. Richard Feynman came to a bunch of the meetings I had to discuss the design of SMP, offering various ideas—which I have to admit I considered hacky—about shortcuts for interacting with the system. Meanwhile, the physics department had just gotten a VAX-11/780, and after some wrangling, it was made to run Unix. And a young physics grad student named Rob Pike—more recently a co-creator of the Go programming language—persuaded me that I should write the code for my system in the “language of the future”: C.

I got pretty good at writing C, for a while averaging about a thousand lines a day. And with the help of a somewhat colorful collection of characters, by June 1981, the first version of SMP existed—with a big book of documentation I’d written.

OK, you might ask: so can we see SMP? Well, back when we were working on SMP I had the bright idea that we should protect the source code by encrypting it. And—you guessed it—over the span of three decades nobody remembered the password. And until a little while ago, that was the situation.

In another bright idea, I had used a modified version of the Unix crypt program to do the encryption—thinking that would be more secure. Well, as part of the 25th anniversary of Mathematica a couple of years ago, we ran a crowdsourced project to break the encryption—and it worked. Unfortunately, the code wasn’t easy to compile—but thanks to a 15-year-old volunteer, we’ve actually now got something running.

So here it is: running inside a VAX virtual machine emulator, I can show you for the first time in public in 30 years—a running version of SMP.

SMP had a mixture of good ideas and very bad ideas. One example of a bad idea—actually suggested to me by Tini Veltman, author of SCHOONSCHIP—was representing rationals using floating point, so one could make use of the faster floating-point instructions on many processors. But there were plenty of other bad ideas too, like having a garbage collector that had to crawl the stack and realign pointers when it ran.
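Here is a quick Python illustration of why storing rationals as floating-point numbers causes trouble for exact algebra. The standard `fractions.Fraction` type stands in for exact rational arithmetic. The example shows the general pitfall, not SMP's actual representation.

```python
from fractions import Fraction

def add_rationals(pairs, exact=True):
    """Add (numerator, denominator) pairs exactly with Fraction, or approximately as floats."""
    if exact:
        return sum((Fraction(p, q) for p, q in pairs), Fraction(0))
    return sum(p / q for p, q in pairs)

terms = [(1, 10), (2, 10)]
print(add_rationals(terms))               # 3/10
print(add_rationals(terms, exact=False))  # 0.30000000000000004
# The float "0.1" is really this nearby binary fraction, not 1/10:
print(Fraction(0.1))                      # 3602879701896397/36028797018963968
```

Once a rational has been rounded to a binary float, identities like 1/10 + 2/10 = 3/10 can fail, and simplification code can no longer trust equality tests.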

There were some interesting ideas. Like what I called “projections”—which were essentially a unification of functions and lists. They were almost wonderful, but there were confusions about currying—or what I called tiering. And there were weird edge cases about things that were almost vectors with sequential integer indices.

But all in all, SMP worked pretty well, and I certainly found it very useful. So now the next problem was what to do with it. I realized it needed a real team to work on it, and I thought the best way to get that was somehow to make it commercial. But at the time I was a 21-year-old physics-professor type, who didn’t know anything about business.

So I thought, let me go to the tech transfer office at the university, and ask them what to do. But it turned out they didn’t know, because, as they explained, “Mostly professors don’t come to us; they just start their own companies.” “Well,” I said, “can I do that?” And right then and there the lawyer who pretty much was the tech transfer office pulled out the faculty handbook, and looked through it, and said, “Well, yes, it says copyrightable materials are owned by their authors, and software is copyrightable, so, yes, you can do whatever you want.”

And so off I went to try to start a company. Though it turned out not to be so simple—because suddenly the university decided that actually I couldn’t just do what I wanted.

A couple of years ago I was visiting Caltech and I ran into the 95-year-old chap who had been the provost at the time—and he finally filled in for me the remaining details of what he called the “Wolfram Affair”. It was more bizarre than one could possibly imagine. I won’t tell it all here. But suffice it to say that the story starts with Arnold Beckman, Caltech postdoc in 1929, claiming rights to the pH meter, and starting Beckman Instruments—and then in 1980 being chairman of the Caltech board of trustees and being upset when he realized that gene-sequencing technology had been invented at Caltech and had “walked off campus” to turn into Applied Biosystems.

But the company I started weathered this storm—even if I ended up quitting Caltech, and Caltech ended up with a weird software-ownership policy that affected their computer-science recruiting efforts for a long time.

I didn’t do a great job starting what I called Computer Mathematics Corporation. I brought in a person—who happened to be twice my age—to be CEO. And rather quickly things started to diverge from what I thought made sense.

One of my favorite moments of insanity was the idea to get into the hardware business and build a workstation to run SMP on. Well, at the time no workstation had enough memory, and the 68000 didn’t handle virtual memory. So a scheme was concocted whereby two 68000s would run an instruction out of step, and if the first one saw a page fault, it would stop the other one and fetch the data. I thought it was nuts. And I also happened to have visited Stanford, and run into a grad student named Andy Bechtolsheim who was showing off a Stanford University Network—SUN—workstation with a cardboard box as a case.

But worse than all that, this was 1981, and there was the idea that AI—in the form of expert systems—was hot. So the company merged with another company that did expert systems, to form what was called Inference Corporation (which eventually became Nasdaq:INFR). SMP was the cash cow—selling for about $40,000 a copy to industrial and government research labs. But the venture capitalists who’d come in were convinced that the future was expert systems, and after not very long, I left.

Meanwhile I’d become a big expert on the intellectual property policies of universities—and eventually went to work at the Institute for Advanced Study in Princeton, where the director very charmingly said that since they’d “given away the computer” after von Neumann died, it didn’t make much sense for them to claim IP rights to anything now.

I dived into basic science, working a lot on cellular automata, and discovering some things I thought were very interesting. Here’s me with my SUN workstation with cellular automata running on it (and, yes, the mollusc looks like the cellular automaton):

I did some consulting work, mostly on technology strategy, which was very educational, particularly in seeing things not to do. I did quite a lot of work for Thinking Machines Corporation. I think my most important contribution was going to see the movie *WarGames* with Danny Hillis—and as we were walking out of the movie theater, saying to Danny, “Maybe your computer should have flashing lights too.” (The flashing lights ended up being a big feature of the Connection Machine computer—certainly important in its afterlife in museums.)

I was mostly working on basic science—but “because it would be easy” I decided to do a software project: building a C interpreter that we called IXIS. I hired some young people—one of whom was Tsutomu Shimomura, whom I’d already fished out of several hacking disasters. I made the horrible mistake of writing myself the boring code that nobody else wanted to write—so I wrote a (quite lovely) text editor, but the whole project flopped.

I had all kinds of interactions with the computer industry back then. I remember Nathan Myhrvold, then a physics grad student at Princeton, coming to see me to ask what to do with a window system he’d developed. My basic suggestion was “sell it to Microsoft”. As it happens, Nathan later became CTO of Microsoft.

Well, by about 1985 I’d done a bunch of basic science I was pretty pleased with, and I was trying to use it to start the field of what I called complex systems research. I ended up getting a little involved in an outfit called the Rio Grande Institute—that later became the Santa Fe Institute—and encouraging them to pursue this kind of research. But I wasn’t convinced about their chances, and I resolved to start my own research institute.

So I went around to lots of different universities, in effect to get bids. The University of Illinois won, ironically in part because they thought it would help their chances getting funding from the Beckman Foundation—which in fact it did. So in August 1986, off I went to the University of Illinois, and the cornfields of Champaign-Urbana, 100 miles south of Chicago.

I think I did pretty well at recruiting faculty and setting things up for the new Center for Complex Systems Research—and the university lived up to its end of the bargain too. But within a few weeks I started to think it was all a big mistake. I was spending all my time managing things and trying to raise money—and not actually doing science.

So I quickly came up with Plan B. Rather than getting other people to help with the science I wanted to do, I would set things up so I could just do the science myself, as efficiently as possible. And this meant two things: first, I had to have the best possible tools; and second, I needed the best possible environment for myself.

When I was doing my basic science I kept on using different tools. There was some SMP. Quite a lot of C. Some PostScript, and graphics libraries, and things. And a lot of my time was spent gluing all this stuff together. And what I decided was that I should try to build a single system that would just do all the stuff I wanted to do—and that I could expect to keep growing forever.

Well, meanwhile, personal computers were just getting to the point where it was plausible to build a system like this that would run on them. And I knew a lot about what to do—and not do—from my experience with SMP. So I started designing and building what became Mathematica.

My scheme was to write documentation to define what to build. I wrote a bunch of core code—for example for the pattern matcher—a surprising amount of which is still in the system all these years later. The design of Mathematica was in many respects less radical and less extreme than SMP. SMP had insisted on using the idea of transforming symbolic expressions for everything—but in Mathematica I saw my goal as being to design a language that would effectively capture all the possible different paradigms for thinking about programming in a nice seamless way.

At first, of course, Mathematica wasn’t called Mathematica. In a strange piece of later fate, it was actually called Omega. It went through other names. There was Polymath. And Technique. Here’s a list of names. It’s kind of shocking to me how many of these—even the really horrible ones—have actually been used for products in the years since.

Well, meanwhile, I was starting to investigate how to build a company around the system. My original model was something like what Adobe was doing at the time with PostScript: we build core IP, then license it to hardware companies to bundle. And as it happened, the first person to show interest in that was Steve Jobs, who was then in the middle of doing NeXT.

Well, one of the consequences of interacting with Steve was that we talked about the name of the product. With all that Latin I’d learned in school, I’d thought about the name “Mathematica” but I thought it was too long and ponderous. Steve insisted that “that’s the name”—and had a whole theory about taking generic words and romanticizing them. And eventually he convinced me.

It took about 18 months to build Version 1 of Mathematica. I was still officially a professor of physics, math and computer science at the University of Illinois. But apart from that I was spending every waking hour building software and later making deals.

We closed a deal with Steve Jobs at NeXT to bundle Mathematica on the NeXT computer:

We also made a bunch of other deals. With Sun, through Andy Bechtolsheim and Bill Joy. With Silicon Graphics, through Forest Baskett. With Ardent, through Gordon Bell and Cleve Moler. With the AIX/RT part of IBM, basically through Andy Heller and Vicky Markstein.

And eventually we set a release date: June 23, 1988.

Meanwhile, as documentation for the system, I wrote a book called *Mathematica: A System for Doing Mathematics by Computer*. It was going to be published by Addison-Wesley, and it was the longest lead-time element of the release. And it ended up being very tight, because the book was full of fancy PostScript graphics—which nobody could apparently figure out how to render at high-enough resolution. So eventually I just took a hard disk to a friend of mine in Canada who had a phototypesetting company, and he and I babysat his phototypesetting machine over a holiday weekend, after which I flew to Logan Airport in Boston and handed the finished film for the book to a production person from Addison-Wesley.

We decided to do the announcement of Mathematica in Silicon Valley, and specifically at the TechMart place in Santa Clara. In those days, Mathematica couldn’t run under MS-DOS because of the 640K memory limit. So the only consumer version was for the Mac. And the day before the announcement there we were stuffing disks into boxes, and delivering them to the ComputerWare software store in Palo Alto.

The announcement was a nice affair. Steve Jobs came—even though he was not really “out in public” at the time. Larry Tesler came from Apple—courageously doing a demo himself. John Gage from Sun had the sense to get all the speakers to sign a book:

And so that was how Mathematica was launched. *The Mathematica Book* became a bestseller in bookstores, and from that people started understanding how to use Mathematica. It was really neat seeing all these science types and so on—of all ages—who’d basically never used computers themselves before, starting to just compute things themselves.

It was fun looking through registration cards. Lots of interesting and famous names. Sometimes some nice juxtapositions. Like when I’d just seen an article about Roger Penrose and his new book in *Time* magazine with the headline “Those Computers Are Dummies”… but then there was Roger’s registration card for Mathematica.

As part of the growth of Mathematica, we ended up interacting with pretty much all possible computer companies, and collected all kinds of exotic machines. Sometimes that came in handy, like when the Morris worm came through the internet, and our gateway machine was a weird Sony workstation with a Japanese OS that the worm hadn’t been built for.

There were all kinds of porting adventures. Probably my favorite was on the Cray-2. With great effort we’d gotten Mathematica compiled. And there we were, ready for the first calculation. And someone typed 2+2. And—I kid you not—it came out “5”. I think it was an issue with integer vs. floating point representation.

You know, here’s a price list from 1990 that’s a bit of a stroll down computer memory lane:

We got a boost when the NeXT computer came out, with Mathematica bundled on it. I think Steve Jobs made a good deal there, because all kinds of people got NeXT machines to run Mathematica. Like the Theory group at CERN—where the systems administrator was Tim Berners-Lee, who decided to do a little networking experiment on those machines.

Well, a couple of years in, the company was growing nicely—we had maybe 150 employees. And I thought to myself: I built this because I wanted to have a way to do my science, so isn’t it time I started doing that? Also, to be fair, I was injecting new ideas at too high a rate; I was worried the company might just fly apart. But anyway, I decided I would take a partial sabbatical—for maybe six months or a year—to do basic science and write a book about it.

So I moved from Illinois to the Oakland Hills—right before the big fire there, which narrowly missed our house. And I started being a remote CEO—using Mathematica to do science. Well, the good news was that I started discovering lots and lots of science. It was kind of a “turn a telescope to the sky for the first time” moment—except now it was the computational universe of possible programs.

It was really great. But I just couldn’t stop—because there kept on being more and more things to discover. And all in all I kept on doing it for ten and a half years. I was really a hermit, mostly living in Chicago, and mostly interacting only virtually… although my oldest three children were born during that period, so there were humans around!

I had thought maybe there’d be a coup at the company. But there wasn’t. And the company continued to steadily grow. We kept on doing new things.

Here’s our first website, from October 7, 1994:

And it wasn’t too long after that we started doing computation on the web:

I actually took a break from my science in 1996 to finish a big new version of Mathematica. Back in 1988 lots of people used Mathematica through a command line interface. In fact, it’s still there today. 1989^1989 is the basic computation I’ve been using since, yes, 1989, to test speed on a new machine. And actually a basic Raspberry Pi today gives a pretty good sense of what it was like back at the beginning.
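The 1989^1989 benchmark is easy to reproduce in other settings too. Here is a minimal Python sketch, offered as an illustration rather than the actual test described here. It computes the exact integer 1989^1989, counts its decimal digits, and averages the time over repeated runs. It deliberately avoids `str()`, because recent Python versions by default refuse to convert integers with more than 4,300 digits to strings, and this one has 6,561. The function names and the repeat count are hypothetical choices.

```python
import math
import time


def decimal_digits(n: int) -> int:
    """Count the decimal digits of a positive integer without converting it to a string."""
    # Lower-bound estimate from the bit length, since n >= 2**(bits - 1).
    k = max(1, int((n.bit_length() - 1) * math.log10(2)))
    # Step up until 10**k exceeds n; k is then the digit count.
    while 10 ** k <= n:
        k += 1
    return k


def time_power(base: int = 1989, exponent: int = 1989, repeats: int = 1000) -> float:
    """Return the average wall-clock time, in seconds, to compute base**exponent exactly."""
    start = time.perf_counter()
    for _ in range(repeats):
        _ = base ** exponent
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    n = 1989 ** 1989
    print("digits:", decimal_digits(n))
    print(f"average time: {time_power() * 1e6:.1f} microseconds")
```

Absolute timings depend heavily on the machine and the interpreter, so the numbers are only meaningful when compared on the same setup.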

But, OK, on the Mac and on NeXT back in 1988 we’d invented these things we called notebooks that were documents that mixed text and graphics and structure and computation—and that was the UI. It was all very modern, with a clean front-end/kernel architecture where it was easy to run the kernel on a remote machine—and by 1996 a complete symbolic XML-like representation of the structure of the notebooks.

Maybe I should say something about the software engineering of Mathematica. The core code was written in an extension of C—actually an object-oriented version of C that we had to develop ourselves, because C++ wasn’t efficient enough back in 1988. Even from the beginning, some code was written in the Mathematica top-level language—that’s now the Wolfram Language—and over the years a larger and larger fraction of the code was that way.

Well, back at the beginning it was very challenging getting the front end to run on different machines. And we wound up with different codebases on Mac, NeXT, Microsoft Windows, and X Windows. And in 1996 one of the achievements was merging all that together. And for almost 20 years the code was gloriously merged—but now we’ve again got separate codebases for desktop, browser and mobile, and history is repeating itself.

Back in 1996 we had all kinds of ways to get the word out about the new Mathematica Version 3. My original Mathematica book had now become quite large, to accommodate all the things we were adding.

And we had a couple of other “promotional vehicles” that we called the MathMobiles that drove around with the latest gear inside—and served as moving billboard ads for our graphics.

There were Mathematicas everywhere, getting used for all kinds of things. And of course wild things sometimes happened. Like in 1997 when Mike Foale had a PC running Mathematica on the Mir space station. Well, there was an accident, and the PC got stuck in a part of the space station that got depressurized. Meanwhile, the space station was tumbling, and Mike was trying to debug it—and wanted to use Mathematica to do it. So he got a new copy on the next supply mission—and installed it on a Russian PC.

But there was a problem. Because our DRM system immediately said, “That’s a Russian PC; you can’t run a US-licensed Mathematica there!” And that led to what might be our all-time most exotic customer service call: “The user is in a tumbling space station.” But fortunately we could just issue a different password—Mike solved the equations, and the space station was stabilized.

Well, after more than a decade—in 2002—I finally finished my science project and my big book:

During my “science decade” the company had been steadily growing, and we’d built up a terrific team. But not least because of things I’d learned from my science, I thought it could do more. It was refreshing coming back to focus on it again. And I rather quickly realized that the structure we’d built could be applied to lots of new things.

Math had been the first big application of Mathematica, but the symbolic language I’d built was much more general than that. And it was pretty exciting seeing what we could do with it. One of the things in 2006 was representing user interfaces symbolically, and being able to create them computationally. And that led for example to CDF (our Computable Document Format), and things like our Wolfram Demonstrations Project.

We started doing all sorts of experiments. Many went really well. Some went a bit off track. We wanted to make a poster with all the facts we knew about mathematical functions. First it was going to be a small poster, but then it became 36 feet of poster… and eventually The Wolfram Functions Site, with 300,000+ formulas:

It was the time of the cell-phone ringtone craze, and I wanted a personal ringtone. So we came up with a way to use cellular automata to compose an infinite variety of ringtones, and we put it on the web. It was actually an interesting AI-creativity experience, and music people liked it. But after messing around with phone carriers for six months, we pretty much didn’t sell a single ringtone.

(Yes, if you go to that site, it’s currently a bit embarrassing, because it’s not working with the current browser releases. It’ll be fixed soon… but what happened is that the webMathematica server behind it was just running unattended for a decade—and now nobody knows how it works…)

But, anyway, having for many years been a one-product company making Mathematica, we were starting to get the idea that we could not only add new things to Mathematica—but also invent all kinds of other stuff.

Well, I mentioned that back when I was a kid I was really interested in trying to do what I’d now call “making knowledge computable”: take the knowledge of our civilization and build something that could automatically compute answers to questions from it. For a long time I’d assumed that to do that would require making some kind of brain-like AI. So, like, in 1980 I worked on neural networks—and didn’t get them to do anything interesting. And every few years after that I would think some more about the computable knowledge problem.

But then I did the science in *A New Kind of Science*—and I discovered this thing I call the Principle of Computational Equivalence. Which says many things. But one of them is that there can’t be a bright line between the “intelligent” and the “merely computational”. So that made me start to think that maybe I didn’t need to build a brain to solve the computable knowledge problem.

Meanwhile, my younger son, who I think was about six at the time, was starting to use Mathematica a bit. And he asked me, “Why can’t I just tell it what I want to in plain English?” I started explaining how hard that was. But he persisted with, “Well, there just aren’t that many different ways to say any particular thing,” etc. And that got me thinking—particularly about using the science I’d built to try to solve the problem of understanding natural language.

Meanwhile, I’d started a project to curate lots of data of all kinds. It was an interesting thing going into a big reference library and figuring out what it would take to just make all of that computable. Alan Turing had done some estimates of things like that, which were a bit daunting. But anyway, I started getting all kinds of experts on all kinds of topics that tech companies usually don’t care about. And I started building technology and a management system for making data computable.

It was not at all clear this was all going to work, and even a lot of my management team was skeptical. “Another WolframTones” was a common characterization. But the good news was that our main business was strong. And—even though I’d considered it in the early 1990s—I’d never taken the company public, and I didn’t have any investors at all, except I guess myself. So I wasn’t really answering to anyone. And so I could just do Wolfram|Alpha—as I have been able to do all kinds of long-term stuff throughout the history of our company.

And despite the concerns, Wolfram|Alpha did work. And I have to say that when it was finally ready to demo, it took only one meeting for my management team to completely come around, and be enthusiastic about it.

One problem, of course, with Wolfram|Alpha is that—like Mathematica and the Wolfram Language—it’s really an infinite project. But there came a point at which we really couldn’t do much more development without seeing what would happen with real users, asking real questions, in real natural language.

So we picked May 15, 2009 as the date to go live. But there was a problem: we had no idea how high the traffic would spike. And back then we couldn’t use Amazon or anything: to get performance we had to do fancy parallel computations right on the bare metal.

Michael Dell was kind enough to give us a good deal on getting lots of computers for our colos. But I was pretty concerned when I talked to some people who’d had services that had crashed horribly on launch. So I decided on a kind of hack. I decided that we’d launch on live internet TV—so if something horrible happened, at least people would know what was going on, and might have some fun with it. So I contacted Justin Kan, who was then doing justin.tv, and whose first company I’d failed to invest in at the very first Y Combinator—and we arranged to “launch live”.

It was fun building our “mission control”—and we made some very nice dashboards, many of which we actually still use today. But the day of the launch I was concerned that this was going to be the most boring TV ever: that basically at the appointed hour, I’d just click a mouse and we’d be live, and that’d be the end of it.

Well, that was not to be. You know, I’ve never watched the broadcast. I don’t know how much it captures of some of the horrible things that went wrong—particularly with last-minute network configuration issues.

But perhaps the most memorable thing had to do with the weather. We were in central Illinois. And about an hour before our grand launch, there was a weather report—that a tornado was heading straight for us! You can see the wind speed spike in the Wolfram|Alpha historical weather data:

Well, fortunately, the tornado missed. And sure enough, at 9:33:50pm central time on May 15, 2009, I pressed the button, and Wolfram|Alpha went live. Lots of people started using it. Some people even understood that it wasn’t a search engine: it was computing things.

The early bug reports then started flowing in. This was the thing Wolfram|Alpha used to do at the very beginning, when something failed:

And one of the bug reports was someone saying, “How did you know my name was Dave?!” All kinds of bug reports came in the first night—here are a couple:

Well, not only did people start using Wolfram|Alpha; companies did too. Through Bill Gates, Microsoft hooked up Wolfram|Alpha to Bing. And a little company called Siri hooked it up to its app. And some time later Apple bought Siri, and through Steve Jobs, who was by then very sick, Wolfram|Alpha ended up powering the knowledge part of Siri.

OK, so we’re getting to modern times. And the big thing now is the Wolfram Language. Actually, it’s not such a modern thing for us. Back in the early 1990s I was going to break off the language component of Mathematica—we were thinking of calling it the M Language. And we even had people working on it, like Sergey Brin when he was an intern with us in 1993. But we hadn’t quite figured out how to distribute it, or what it should be called.

And in the end, the idea languished. Until we had Wolfram|Alpha, and the cloud existed, and so on. And also I must admit that I was really getting fed up with people thinking of Mathematica as being a “math thing”. It had been growing and growing:

And although we kept on strengthening the math, 90% of it wasn’t math at all. We had kind of a “let’s just implement everything” approach. And that had gone really well. We were really on a roll inventing all those meta-algorithms, and automating things. And combined with Wolfram|Alpha I realized that what we had was a new, very general kind of thing: a knowledge-based language that built in as much knowledge about computation and about the world as possible.

And there was another piece too: realizing that our symbolic programming paradigm could be used to represent not just computation, but also deployment, particularly in the cloud.

Mathematica has been very widely used in R&D and in education—but with notable exceptions, like in the finance industry, it’s not been so widely used for deployed production systems. And one of the ideas of the Wolfram Language—and our cloud—is to change that, and to really make knowledge-based programming something that can be deployed everywhere, from supercomputers to embedded devices. There’s a huge amount to say about all this…

And we’ve done lots of other things too. This shows the growth in the number of functions over the first 10,000 days of Mathematica, and what kinds of things were in it over the years.

We’ve done all kinds of different things with our technology. I don’t know why I have this picture here, but I have to show it anyway; this was a picture on the commemorative T-shirt for our Image Identification Project that we did a year ago. Maybe you can figure out what the caption on this means with respect to debugging the image identifier: it was an anteater in the image identifier because we lost the aardvark, who is pictured here:

And just in the last few weeks, we’ve opened up our Wolfram Open Cloud to let anyone use the Wolfram Language on the web. It’s really the culmination of 30, perhaps 40, years of work.

You know, for nearly 30 years I’ve been working hard to make sure the Wolfram Language is well designed—that as it gets bigger and bigger all the pieces fit nicely together, so you can build on them as well as possible. And I have to say it’s nice to see how well this has paid off now.

It’s pretty cool. We’ve got a very different kind of language—something that’s useful for communicating not just about computation, but about the world, with computers and with humans. You can write tiny programs. There’s Tweet-a-Program for example:

Or you can write big programs—like Wolfram|Alpha, which is 15 million lines of Wolfram Language code.

It’s pretty nice to see companies in all sorts of industries starting to base their technology on the Wolfram Language. And another thing I’m really excited about right now is that with the Wolfram Language I think we finally have a great way to teach computational thinking to kids. I even wrote a book about that recently:

And I can’t help wondering what would have happened if the 12-year-old me had had this—and if my first computer language had been the Wolfram Language rather than the machine code of the Elliott 903. I could certainly have made some of my favorite science discoveries with one-liners. And a lot of my questions about things like AI would already have been answered.

But actually I’m pretty happy to have been living at the time in history I have, and to have been able to be part of these decades in the evolution of the incredibly important idea of computation—and to have had the privilege of being able to discover and invent a few things relevant to it along the way.

The equation that Albert Einstein wrote down for the gravitational field in 1915 is simple enough:
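In standard modern notation, and without the cosmological constant that Einstein introduced only in 1917, the 1915 field equation reads:

$$
R_{\mu\nu} - \frac{1}{2} R\, g_{\mu\nu} = \frac{8\pi G}{c^4}\, T_{\mu\nu}
$$

Here $g_{\mu\nu}$ is the metric, which encodes the gravitational field; $R_{\mu\nu}$ and $R$ are the Ricci tensor and Ricci scalar, built from the metric and its first and second derivatives; $G$ is Newton’s constant; and $T_{\mu\nu}$ describes the energy and momentum of matter.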

But working out its consequences is not. And in fact even after 100 years we’re still just at the beginning of the process.

Millions of lines of algebra have been done along the way (often courtesy of Mathematica and the Wolfram Language). And there have been all sorts of predictions. Like that if two black holes merge, there should be a burst of gravitational radiation generated, with a particular form. And a little more than a week ago—in a triumph of theoretical and experimental science—it was announced that just such gravitational radiation had been detected.

I’ve followed General Relativity and gravitation theory for more than 40 years now—and it’s been inspiring to see how the small community that’s pursued it has progressively increased its theoretical prowess, and how the discussions I saw at Caltech in the late 1970s finally led to a successful detector of gravitational waves.

General Relativity is surely not the whole story of how spacetime and gravity work. But we’ve now just got some spectacular new evidence of how far the theory can be taken. For a long time I myself was a bit skeptical about black holes—and for example about whether true General-Relativity-style ones would actually form in real physical processes. But as of a little more than a week ago I’m finally convinced that black holes exist, just as General Relativity suggests.

OK, so we’ve observed one pair of black holes, a billion light years away. And no doubt now—quite amazingly—we’ll get evidence for a steady stream of others around the universe. But what if somehow we could get our hands on our very own black holes, and maybe even lots of them? What could we—or, for that matter, any putative extraterrestrials—do with them? What kind of perhaps extremely exotic structures or technology could eventually be made with them?

It’s always the same story with technology. We have to take the raw material that our universe provides, and somehow find ways to organize it for purposes we want. It’s remarkable to look through the list of chemical elements, or a list of physics effects that have been discovered, and to realize that—though it sometimes took a while—almost all those that can be readily realized on the time and energy scales of today’s technology have found real applications. So what about black holes? Given how hard it’s been to detect our very first pair of black holes, it might seem almost irreverent to ask. And perhaps our universe just isn’t big enough for the question to be sensible. But as a kind of celebration of the detection of gravitational waves I thought it might be fun to try fast-forwarding a long way—and seeing what one can figure out about technology that black holes could make possible.

It seems inconceivable that we ourselves will ever get to try out anything like this for real—unless we find a way to locally make tiny stable black holes. But if something is possible to do, perhaps some more-advanced civilization out there in the universe has already done it—but we likely couldn’t recognize evidence of it without having more idea of what’s possible.

But before we can get to speculating about black hole technology, we’re going to have to talk a bit about what’s known about black holes, General Relativity and gravitation. There are lots of complicated issues—that are probably most easily explained using some fairly mathematically sophisticated concepts (Riemann tensors, covariant derivatives, spacelike hypersurfaces, Penrose diagrams, etc. etc.). But for the sake of writing a general blog post, I’m going to try to do without these, while still, I hope, correctly communicating what’s known and what’s not. I won’t be able to do it perfectly, and might lapse unwittingly into physics-speak from time to time, but here goes…

General Relativity is often discussed in terms of the geometry of spacetime. But one can also think of it as just saying that gravity is associated with a field that has a certain strength or value at every point. This idea of a field is basically just like in electromagnetism, with its electric and magnetic fields. It’s also like in fluid mechanics, where there’s a velocity field that gives the velocity of the fluid at every point (like a wind velocity map for the weather).

What Einstein did in 1915 was to suggest particular equations that should be satisfied by the gravitational field. Mathematically, they’re partial differential equations, which means that they say how values of the field relate to rates of change (partial derivatives) of these values. They’re the same general kind of equations that we know work for electromagnetic fields, or for the velocity field in a fluid.

So what does one do with these equations? Well, one solves them to find out what the field is in any particular case. It turns out that for electromagnetism, the structure of the equations makes this in principle straightforward. But for fluid mechanics, it’s considerably more complicated—and for Einstein’s equations it’s much more complicated still.

In electromagnetism, one can just think of charges and currents as being sources of electromagnetic field, and there’s no “internal effect” of the field on itself (unless one considers quantum effects). But for fluid mechanics and Einstein’s equations, it’s a different story. In a first approximation, the velocity of a fluid is determined by whatever pressure is applied to it. But what complicates things greatly is that within the fluid there’s an internal effect of each part of the velocity field on others. And it’s similar with the gravitational field: In a first approximation, the field is just determined by whatever configuration of masses exists. But there’s also an “internal effect” of the field on itself. Physically, this is because the gravitational field can be thought of as having energy and momentum, which behave like mass in effectively being a source of the field. (The electromagnetic field has energy and momentum too, but it doesn’t itself have charge, so doesn’t act as a source for itself. In QCD, the color field itself has color, so it has the same general kind of nonlinear character as fluid mechanics or Einstein’s equations.)

In electromagnetism, with its simpler structure, one can’t have any region of static nonzero field unless one has charges or currents explicitly producing it. But when fields can act on themselves it’s a different story, and there can be structures that exist purely in the field, without any external sources being present. For example, in a fluid there can be a vortex that just exists within the fluid—because this happens to be a possible solution to the pure equations for the velocity field of the fluid, without any external forces.

What about the Einstein equations? Well, it’s somewhat the same story, though the details are considerably more complicated. There are nontrivial solutions to the Einstein equations even in the case of “pure gravity”, without any matter or external configuration of masses being present. And that’s exactly what black holes are. They’re examples of solutions to the Einstein equations that correspond to structures that can just exist independently in a gravitational field, a bit like vortices can just exist in the velocity field of a fluid.

From everyday experience and from seeing the operation of programs, we tend to be used to the idea that the way to work out what something will do is to start from the beginning and then go forwards step by step. But in mathematically based science the setup is often much less direct and constructive, and instead is basically “the system obeys such-and-such an equation; whatever the system does must correspond to some solution or another to the equation”. And that’s ultimately the setup with Einstein’s equations.

There can be some serious complications. For example, given particular constraints it’s far from obvious that any solutions to the equations will exist, or be unique. And indeed we’ll encounter difficulties along these lines later. But let’s start off by trying to get some rough idea of the physics of how black holes can be made.

The classic way one imagines a black hole is made is from the collapse of a massive star. And that’s presumably where the two black holes just detected came from.

For the Earth, with its particular mass and radius, we can work out that something launched from the surface must have a velocity of about 25,000 miles per hour to escape Earth’s gravity. But for a body whose mass is larger or whose radius is smaller, the escape velocity will be larger. And what General Relativity (like Newtonian gravity before it) says is that eventually the escape velocity will exceed the speed of light, so that neither light nor anything else will be able to escape, and the object will always appear black: a black hole.
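The escape-velocity argument can be checked with a few lines of arithmetic. This minimal Python sketch uses the Newtonian formula v = √(2GM/r), with rounded standard constants (an assumption of the sketch), to reproduce the roughly 25,000 mph figure for the Earth, and then solves v = c for the radius r = 2GM/c². This naive Newtonian radius happens to coincide with the horizon radius of the General Relativity black hole discussed below, but the Newtonian argument is only a heuristic. The function names are hypothetical.

```python
import math

G = 6.674e-11        # Newton's constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8          # speed of light, m/s (rounded)
M_EARTH = 5.972e24   # kg
R_EARTH = 6.371e6    # mean radius, m
M_SUN = 1.989e30     # kg
M_PER_S_PER_MPH = 0.44704  # exact: 1 mph = 0.44704 m/s


def escape_velocity(mass_kg: float, radius_m: float) -> float:
    """Newtonian escape velocity sqrt(2GM/r), in m/s, from radius r of a body of mass M."""
    return math.sqrt(2 * G * mass_kg / radius_m)


def critical_radius(mass_kg: float) -> float:
    """Radius at which the Newtonian escape velocity equals c: r = 2GM/c^2, in meters."""
    return 2 * G * mass_kg / C**2


if __name__ == "__main__":
    v = escape_velocity(M_EARTH, R_EARTH)
    print(f"Earth: {v / 1000:.2f} km/s, or about {v / M_PER_S_PER_MPH:,.0f} mph")
    for name, mass in [("Earth", M_EARTH), ("Sun", M_SUN), ("36 solar masses", 36 * M_SUN)]:
        print(f"{name}: escape velocity reaches c at r = {critical_radius(mass):.4g} m")
```

For the Sun the critical radius comes out at about 3 km, and for the Earth at under a centimeter, which gives a sense of how extreme the compression has to be.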

When this happens, there’s inevitably also a strong gravitational field. And this gravitational field effectively has mass, which itself serves as a source of gravitational field. And in the end, it’s actually irrelevant if there’s matter there at all: the black hole is in effect a self-sustaining configuration of the gravitational field that exists as a solution to Einstein’s equations. It’s a bit like a vortex in a fluid, which you can start by stirring, but which, once it’s there, effectively just perpetuates itself (though in a real fluid with viscosity it’ll eventually damp out).

It’s not obvious of course that the mass and radius needed to get a black hole would actually occur. It’s known that stars like the Sun will never collapse far enough. But above about 3 or 4 solar masses, there’s at least no known physical process that will prevent a star from collapsing enough to form a black hole. And the 36- and 29-solar-mass black holes recently observed presumably formed this way.

Let’s for a moment ignore how black holes might be formed, and just ask what they can be like. This is really a question about possible solutions to Einstein’s equations. And if we want something that doesn’t change with time, and that’s localized in space, then there are mathematical theorems that say the choices are very limited.

There could have been a whole zoo of possible black hole structures—and in higher dimensions, there are at least a few more. But for 4D spacetime, it actually turns out that all stationary black hole solutions are mathematically similar, and are determined by just two parameters: their overall mass and angular momentum. (If one includes electromagnetism as well, then they’re also determined by charge—and it’d be the same story with any other long-range gauge fields.)
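To illustrate what “determined by just two parameters” means in practice, here is a minimal Python sketch of the standard formula for the outer event horizon of a rotating (Kerr) black hole in Boyer-Lindquist coordinates: r₊ = r_g + √(r_g² − a²), where r_g = GM/c² and a = J/(Mc). With J = 0 it reduces to the non-rotating value 2GM/c²; if a exceeds r_g there is no horizon at all. The constants are rounded standard values, and the function names are hypothetical.

```python
import math

G = 6.674e-11      # Newton's constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8        # speed of light, m/s (rounded)
M_SUN = 1.989e30   # solar mass, kg (rounded)


def max_angular_momentum(mass_kg: float) -> float:
    """Largest J that still gives a horizon (the extremal case a = GM/c^2): J = G M^2 / c."""
    return G * mass_kg**2 / C


def kerr_outer_horizon(mass_kg: float, angular_momentum: float) -> float:
    """Outer horizon radius r+ = r_g + sqrt(r_g^2 - a^2) of a Kerr black hole, in meters.

    r_g = GM/c^2 and a = J/(M c) are both lengths. Raises ValueError when |a| > r_g,
    because the Kerr solution then has no event horizon.
    """
    r_g = G * mass_kg / C**2
    a = abs(angular_momentum) / (mass_kg * C)
    if a > r_g * (1 + 1e-12):  # small tolerance for rounding at the extremal limit
        raise ValueError("|a| > GM/c^2: no event horizon")
    return r_g + math.sqrt(max(0.0, r_g**2 - a**2))


if __name__ == "__main__":
    mass = 36 * M_SUN  # one of the two black holes mentioned above
    for fraction in (0.0, 0.5, 0.9, 1.0):
        r_plus = kerr_outer_horizon(mass, fraction * max_angular_momentum(mass))
        print(f"J = {fraction:.1f} x J_max: outer horizon at {r_plus / 1000:.1f} km")
```

Mass and angular momentum are the only inputs; nothing else about how the black hole formed enters the formula.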

The case of non-rotating black holes (zero angular momentum) is simplest. The relevant solution to the Einstein equations was found already by Karl Schwarzschild in 1915. But it took nearly 50 years for the interpretation of the solution to become clear.

One crucial feature of the Schwarzschild solution is that it has an event horizon. This means that any light rays (or anything else) that originate inside a certain sphere (the event horizon) are trapped forever, and can’t escape. There was confusion for quite a while, because the original formula for the Schwarzschild solution has a singularity at the event horizon. But actually this is just a mathematical artifact that can be removed by using a different coordinate system, and isn’t relevant to anything physically observable.
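For readers who don’t mind a formula, here is the standard form of the Schwarzschild solution in Schwarzschild’s original coordinates, with signature (−, +, +, +) and $r_s = 2GM/c^2$ the radius of the event horizon:

$$
ds^2 = -\left(1 - \frac{r_s}{r}\right) c^2\, dt^2 + \left(1 - \frac{r_s}{r}\right)^{-1} dr^2 + r^2 \left(d\theta^2 + \sin^2\theta\, d\varphi^2\right)
$$

The coefficient of $dr^2$ blows up at $r = r_s$, which is the apparent singularity at the event horizon. A coordinate-independent measure of curvature, the Kretschmann scalar, shows that nothing physically special happens there:

$$
K = R_{abcd}R^{abcd} = \frac{12\, r_s^2}{r^6} = \frac{48\, G^2 M^2}{c^4\, r^6}
$$

$K$ is finite at $r = r_s$ and diverges only as $r \to 0$. Rewriting the metric in, for example, Eddington-Finkelstein or Kruskal-Szekeres coordinates removes the apparent singularity at the horizon altogether.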

But even though there’s no real singularity at the event horizon, there is a singularity at the very center of the black hole—where the curvature of spacetime, and thus the effective strength of the gravitational field, is infinite. And it turns out that this singularity is in effect where the whole mass of the black hole is concentrated. It’s a pretty pathological situation. If this were happening in fluid mechanics, for example, we’d just assume that the continuum differential equations we’re using must break down, and that instead we’d have to work at the level of molecules. But for General Relativity we don’t yet have any established lower-level theory to use (though I certainly have ideas, and string theory has claims of being able to come to the rescue). There’s also elegant mathematics that’s developed around black holes and their singularities—and anyway at least in this case one can say that “It’s all happening inside the event horizon so nobody outside will ever find out about it”. So the current state of the art is just to work with the theory assuming the singularity is real—and what’s interesting now is that calculations based on this seem to have given correct answers for the recent gravitational wave discovery.

I just talked a bit about the mathematical structure of a black hole solution to Einstein’s equations. But how does this correspond to an actual black hole that could form from the collapse of a massive star?

The truest way to find out would be to start from an accurate model of the star and then simulate the whole process of forming the black hole. And at least in some approximation, it’s possible these days to do this. But let’s try a more lightweight approach.

Let’s assume that there’s a black hole solution to Einstein’s equations that exists. Then let’s ask what happens when small things fall into it. Well, there’s already an issue here. Think about an observer far from the black hole. In order to “get the news” that something crossed the event horizon of the black hole, the observer would have to get some signal—say a light pulse. But as the thing gets closer to the event horizon, it’ll take longer and longer for the signal to escape. And the result is that the observer will never see things cross the event horizon: they’ll appear to get closer and closer (and darker and darker), but never actually cross.
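The "longer and longer" can be made quantitative. In the Schwarzschild geometry (units G = c = 1, horizon at r = r_s), a radially outgoing light pulse obeys dr/dt = 1 − r_s/r. Here t is the Schwarzschild time coordinate, which agrees with the clocks of static observers far from the hole. Integrating from the emission radius r out to a distant radius R gives the delay. This is a minimal sketch; the choice R = 100 r_s is arbitrary.

```python
import math


def escape_time(r, R, r_s=1.0):
    """Schwarzschild coordinate time for a radially outgoing light pulse emitted
    at radius r to reach radius R (units G = c = 1).

    Integrating dt = dr / (1 - r_s/r) gives
    t = (R - r) + r_s * ln((R - r_s) / (r - r_s)).
    """
    if not (r_s < r < R):
        raise ValueError("need r_s < r < R")
    return (R - r) + r_s * math.log((R - r_s) / (r - r_s))


# The delay grows without bound as the emitter approaches the horizon r = r_s.
for eps in (1e-1, 1e-3, 1e-6, 1e-9):
    print(f"r = r_s + {eps:g}: t = {escape_time(1.0 + eps, 100.0):.2f}")
```

Each factor of 1000 closer to the horizon adds only about r_s ln 1000 ≈ 6.9 r_s/c to the delay. Even so, the delay has no upper limit, which is why the distant observer never sees the crossing.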

And that’ll be true even when it comes to the formation of the black hole. The star will be seen to be collapsing, but it’ll look as if it’s just freezing when it gets to the point where an event horizon would form.

OK, but what if the observer is also falling into a black hole? Here the experience is completely different. They probably wouldn’t even notice when they cross the event horizon, except that “handshake” signals to the outside world will stop getting responses. But then they’ll get pulled in towards the singularity at the center of the black hole. The gravitational field will steadily increase, and the fact that it’s stronger further in will inevitably stretch any object (or observer!) out. But eventually, splat, they’ll hit the singularity—and in some sense be sucked into it.

Is that really how things will work? Well, it’s hard to tell, but probably not. Outside the event horizon it’s known that small perturbations in the structure of the gravitational field—say associated with the presence of matter—will tend to get damped out, so that what emerges is exactly the official Schwarzschild black hole solution to the Einstein equations.

But inside the event horizon it’s much less clear what happens. As soon as there are perturbations, there’ll be time variations in the gravitational field, and one’s no longer dealing with a static solution to the Einstein equations. The result is that the known theorems no longer apply—and quite possibly there’ll be instabilities that change the structure or even existence of the singularity. But at least in this case, in some sense it doesn’t matter—because none of what happens will ever be visible outside of the event horizon.

In 1963 Roy Kerr found a solution to Einstein’s equations that corresponds to a black hole with angular momentum. Like the solution for a non-rotating black hole, it has a singularity in the middle. But now the singularity is not a point; instead it forms a ring.

And at least so long as the angular momentum *J* is (in suitable units) less than the square of the mass, *M*², the rotating black hole solution has an event horizon. And outside the event horizon, perturbations tend to get damped, just like in the non-rotating case. But inside, things are different.
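In units with G = c = 1, the Kerr solution has horizons at (Boyer–Lindquist) radii r± = M ± √(M² − a²), where a = J/M. These exist only while J ≤ M². Here is a minimal sketch that returns both radii, or nothing when there is no horizon.

```python
import math


def kerr_horizons(M, J):
    """Outer and inner horizon radii of a Kerr black hole (units G = c = 1).

    With spin parameter a = J/M, the horizons sit at r = M +/- sqrt(M^2 - a^2).
    Returns None when J > M^2, where there is no horizon at all.
    """
    a = J / M
    disc = M**2 - a**2
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return M + root, M - root


print(kerr_horizons(1.0, 0.0))   # non-rotating: (2.0, 0.0), the Schwarzschild case
print(kerr_horizons(1.0, 0.7))   # J = 0.7 M^2
print(kerr_horizons(1.0, 1.2))   # J > M^2: None
```

The outer root is the event horizon. The inner root is the so-called inner horizon, and the two merge when J = M².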

In a non-rotating black hole anything that goes inside the event horizon will eventually hit the singularity, but won’t “see it coming”. And if light or anything else originates at the singularity it’ll just stay there, and never “get out”.

But the same isn’t true in a rotating black hole. Here, not everything will hit the singularity, and things that originate at the singularity can “get out”. This latter point is quite a problem—because it means that to know the behavior inside the black hole, you have to know what happens at the singularity. But at the singularity, Einstein’s equations can’t tell one anything: they essentially just say infinity=infinity. So the conclusion is that at least based on Einstein’s equations, one simply can’t predict what will happen.

At least with *J* < *M*², this failure of prediction occurs only inside the so-called inner horizon of the black hole. But even outside this, something weird happens. To an observer falling into the black hole, it’ll seem like a finite time elapses between crossing the event horizon and reaching the inner horizon. But to an observer outside the black hole, this will seem like an infinite time. And that means that any signals that come from outside the black hole, all the way into the infinite future, could be collected by the observer inside the black hole in finite time.

Most likely this is a sign that in practice unbounded amounts of energy will accumulate near the inner horizon, making it unstable. But if somehow stability were maintained, there’d be a really weird effect going on: the observer inside the black hole would get to see, in finite time, the whole infinite future unfolding outside the black hole. And if that future happened to include Turing machines doing computations, then in finite time the observer would get to see computations—like solving the halting problem—that can’t necessarily be done by Turing machines in any finite time.

This might be billed as evidence for “physics going beyond the Turing limit”, but it’s not really convincing, first because the whole theoretical internal structure of rotating black holes probably gets modified in practice; and second, because to really talk about the infinite future we have to consider the structure of the whole universe, not just one specific black hole.

But despite all this complexity about what happens inside the event horizon, General Relativity has clear predictions for outside—and these are what were needed for the pair of black holes just detected.

In a rotating black hole with *J* < *M*², there’s a nasty singularity, but it’s safely inside an event horizon. For *J* > *M*², there’s the same kind of singularity, but now it’s no longer inside an event horizon; instead it’s “naked” and exposed to the outside universe.

If there’s a naked singularity like this, the consequence is simple: General Relativity alone isn’t sufficient to describe what happens in the universe; some additional theory is needed.

Encountering something like this is one of the hazards of using a theory—like General Relativity—that’s based on solving equations (rather than, say, running a program) to deduce how systems behave.

And in fact, it’s still quite possible that something similar happens in the Navier–Stokes equations for fluid mechanics. There are lots of partial results, but it’s still not known whether starting from smooth initial conditions, the Navier–Stokes equations can generate singularities.

From a physics point of view, though, there’s something to say: the Navier–Stokes equations for fluids are derived by assuming that the velocity field doesn’t change too rapidly in space or time. And that’s a fine assumption when the velocities are small. But as soon as there’s supersonic flow, there are shocks where the velocity changes rapidly. Viscosity smooths out the shocks a bit, but by the time one’s in the hypersonic regime, at Mach 4 or so, the shocks get very sharp—in fact, so sharp that their width is less than the typical distance between collisions for molecules in the fluid. And the result of this is that the continuum description of the fluid necessarily breaks down, and one has to start looking at the underlying molecular structure.

OK, so can naked singularities actually occur in practice in General Relativity? We know they occur if you somehow have a *J* > *M*² object. But what if you start from a realistic star, or some other distribution of matter? Can it spontaneously evolve to produce a naked singularity?

It was proved a few decades ago that if you start with something that’s close to ordinary flat spacetime, it can’t spontaneously make singularities. But if you start putting matter in, then the story changes. And in fact there are now several examples known where a smooth initial distribution of matter can evolve to make a naked singularity. However, the singularity only shows up if the initial conditions are very carefully arranged: as soon as there’s any perturbation, it goes away.

Can one get a stable naked singularity without this kind of special setup? So far, nobody knows.

And nobody knows whether *J* > *M*² objects can be formed. If one looks at candidate black holes around the universe, most of them are rotating. The final black hole from the merger announced the week before last had *J* ≃ 0.7 *M*². And it’s certainly interesting to note that while many have *J* close to *M*², none seen so far have *J* > *M*². It’s also interesting that in numerical simulations, pairs of rotating black holes always eventually merge. But if the result would have *J* > *M*², they seem to “delay” their merger and emit lots of gravitational radiation that gets rid of angular momentum, before merging to produce a black hole with *J* < *M*².

People have been talking about gravitational waves for almost a century, and there’s been indirect evidence of them for a while. But the recent announcement of direct detection of gravitational waves is pretty exciting.

So what are gravitational waves? They’re a fairly direct analog of electromagnetic waves. If you take a charge and wiggle it around, it’ll radiate electromagnetic waves—for example, radio waves. And in a directly analogous way, if you take a mass and wiggle it around, it’ll radiate gravitational waves. Usually they’ll be incredibly weak. But if the mass is very big and concentrated, like a black hole, the gravitational waves can be stronger—and, as we’ve now seen, even strong enough to detect.

Why is there radiation when you wiggle something around? It’s not hard to see. Imagine, say, that there’s a charge sitting somewhere, and you’re some distance away. There’ll be an electric field from the charge, pointing, say, towards the charge. Now suddenly move the charge. After things have stabilized again, there’d better be a new version of the electric field, say pointing to the new position of the charge. But how does the transition happen? The answer is that the change somehow has to propagate outward from the charge. The process of that happening is electromagnetic radiation, which (in a vacuum) moves at the speed of light.

In general, the amount of electromagnetic radiation that’s produced is proportional to (the square of) the acceleration of the charge. (Actually, there’s considerable subtlety to this, particularly in the relativistic case—and the details of the globally correct formula are still somewhat debated.) It’s similar for gravitational radiation.
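For electromagnetism, the textbook form of this statement is the non-relativistic Larmor formula, P = q²a²/(6πε₀c³), for a slowly moving point charge. It deliberately sidesteps the relativistic subtleties just mentioned. The gravitational analog has no dipole term; its leading term is the quadrupole formula. Here is a minimal sketch with rounded SI constants (an assumption of convenience).

```python
import math

EPSILON_0 = 8.854e-12   # vacuum permittivity, F/m
C = 2.998e8             # speed of light, m/s
E_CHARGE = 1.602e-19    # elementary charge, C


def larmor_power(q, a):
    """Power (W) radiated by a slowly moving point charge q (C) with
    acceleration a (m/s^2): the non-relativistic Larmor formula
    P = q^2 a^2 / (6 pi epsilon_0 c^3)."""
    return q**2 * a**2 / (6 * math.pi * EPSILON_0 * C**3)


# Doubling the acceleration quadruples the radiated power.
print(larmor_power(E_CHARGE, 2e20) / larmor_power(E_CHARGE, 1e20))
```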

There are some differences though. A minimal antenna for electromagnetic radiation is a straight wire along which electrons can move up and down. For gravitational radiation, the minimal “antenna” has to be something that effectively has motion in two perpendicular directions (or, more technically, a changing quadrupole moment). In practice, two bodies orbiting each other will emit gravitational radiation, more or less as a result of the acceleration necessary to keep them in their orbits. More or less any mass that “blobs around” without being spherically symmetric will also emit gravitational waves.

When something emits gravitational waves, it’s radiating away some of its energy. And in general the emission of gravitational radiation tends to have a damping effect on the motion of things. For example, it will make orbits decay, so that orbiting bodies progressively spiral in towards each other.

For something like the Earth and the Sun, this is an absolutely infinitesimal effect. But for a pair of neutron stars orbiting each other, it’s more significant. And indeed, starting in 1974 such an effect was observed in a binary pulsar. And now, this is what caused two black holes eventually to spiral in so far that they hit each other—and produce the event just announced.
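To see just how infinitesimal, one can use the standard leading-order (quadrupole) results for two point masses in a circular orbit of separation a. The radiated power is P = (32/5) G⁴ (m₁m₂)² (m₁+m₂) / (c⁵ a⁵). The time to spiral in until the separation would shrink to zero is T = (5/256) c⁵ a⁴ / (G³ m₁m₂ (m₁+m₂)). This is a sketch with rounded SI constants. The circular-orbit assumption is fine for the Earth but not for the eccentric binary pulsar, which needs extra eccentricity factors.

```python
G = 6.674e-11       # m^3 kg^-1 s^-2
C = 2.998e8         # m/s
M_SUN = 1.989e30    # kg
M_EARTH = 5.972e24  # kg
AU = 1.496e11       # m
YEAR = 3.156e7      # s


def gw_power(m1, m2, a):
    """Gravitational-wave power (W) from a circular binary with separation a (m)."""
    return 32 / 5 * G**4 * (m1 * m2) ** 2 * (m1 + m2) / (C**5 * a**5)


def inspiral_time(m1, m2, a):
    """Time (s) for a circular point-mass binary to spiral in from separation a to zero."""
    return 5 / 256 * C**5 * a**4 / (G**3 * m1 * m2 * (m1 + m2))


print(f"Earth-Sun GW power: {gw_power(M_SUN, M_EARTH, AU):.0f} W")
print(f"Earth-Sun inspiral time: {inspiral_time(M_SUN, M_EARTH, AU) / YEAR:.1e} years")
```

The Earth–Sun system radiates only about 200 W in gravitational waves. At that rate its orbit would take of order 10²³ years to decay.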

Once two black holes hit, there’s a tremendous amount of gravitational radiation emitted as the resulting object “blobs around” before assuming its final single-black-hole shape. For stellar-sized black holes it all happens in a few hundred milliseconds. And in the case of the event just announced, the total energy in gravitational radiation was equivalent (via *E* = *mc*²) to a whopping 3 solar masses. That is enough for us to detect it a significant fraction of the way across the universe.
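Converting those 3 solar masses into joules is a one-line calculation. The comparison with the Sun's light output uses the nominal solar luminosity, an assumed reference value chosen just for scale.

```python
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg
L_SUN = 3.828e26     # nominal solar luminosity, W
YEAR = 3.156e7       # seconds per year

energy_j = 3 * M_SUN * C**2   # E = m c^2 for three solar masses
print(f"Energy radiated: {energy_j:.2e} J")
print(f"Time for the Sun to emit as much light: {energy_j / L_SUN / YEAR:.1e} years")
```

This gives about 5 × 10⁴⁷ J, roughly what the Sun would emit as light over some 4 × 10¹³ years. That is thousands of times the current age of the universe.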

Pretty much any kind of field or continuous material supports some kind of waves. Start from whatever the stable state of the system is, then perturb it just a little by periodically changing something, and you’ll get waves. When the amplitude of the waves is small enough, the math tends to be fairly straightforward. For example, in a first approximation, the amplitudes of different waves at a particular point will just add linearly.

But when the amplitudes of the waves get bigger, things can get much more complicated. In electromagnetism, everything stays linear however big the amplitude is (well, until one runs into quantum effects). But for pretty much any other kind of waves—including, say, water waves, as well as gravitational waves—there start to be nonlinear effects as soon as the amplitude is larger.

When there’s linearity, one can effectively break down any field configuration into a sequence of non-interacting waves of different frequencies. But that’s no longer true for something nonlinear, and eventually it usually doesn’t make sense to talk about waves at all: one’s just dealing with some field configuration or another.
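The linear/nonlinear distinction is easy to check numerically on a toy field rather than on gravity itself. The sketch below evolves a 1D field obeying u_tt = u_xx − ε u³; the cubic term is an assumed stand-in nonlinearity. It uses a leapfrog finite-difference scheme on a periodic grid. It then compares evolving the sum of two pulses against summing the separately evolved pulses. Grid size, time step, and pulse shapes are illustrative choices.

```python
import math


def evolve(u0, eps, steps=200, dx=0.1, dt=0.05):
    """Evolve u_tt = u_xx - eps * u**3 on a periodic 1D grid, starting at rest,
    with a leapfrog finite-difference scheme. eps = 0 is the linear wave equation."""
    n = len(u0)

    def accel(u):
        # Centered second difference (u[i - 1] wraps around at i = 0) plus the cubic term.
        return [(u[(i + 1) % n] - 2 * u[i] + u[i - 1]) / dx**2 - eps * u[i] ** 3
                for i in range(n)]

    prev = list(u0)
    a0 = accel(prev)
    cur = [prev[i] + 0.5 * dt**2 * a0[i] for i in range(n)]  # first step, zero initial velocity
    for _ in range(steps - 1):
        a = accel(cur)
        prev, cur = cur, [2 * cur[i] - prev[i] + dt**2 * a[i] for i in range(n)]
    return cur


n = 100
f = [math.exp(-((i - 30) * 0.1) ** 2) for i in range(n)]   # pulse centered at x = 3
g = [math.exp(-((i - 70) * 0.1) ** 2) for i in range(n)]   # pulse centered at x = 7
f_plus_g = [a + b for a, b in zip(f, g)]


def superposition_error(eps):
    """Largest gap between evolving f + g together and evolving f and g separately."""
    uf, ug, ufg = evolve(f, eps), evolve(g, eps), evolve(f_plus_g, eps)
    return max(abs(ufg[i] - uf[i] - ug[i]) for i in range(n))


print(f"linear (eps = 0):    {superposition_error(0.0):.1e}")
print(f"nonlinear (eps = 1): {superposition_error(1.0):.1e}")
```

With ε = 0 the gap is at round-off level: the waves simply add. With ε = 1 the pulses pass through each other and interact, so the gap becomes large.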

In the case of gravitational waves, one of the notable features is that one can in principle arrange waves to combine so that they’ll form black holes. Indeed, one can potentially start with low-amplitude waves, but somehow make them converge to a point where they’ll generate a black hole (think “gravitational implosion lens”, etc.).

A single static black hole in an infinite universe is a possible solution to Einstein’s equations. So what about two black holes orbiting each other? Well, there’s no known exact solution to the equations for this case, and it’s only fairly recently that it’s become possible to calculate with any reliability what happens.

Roughly, there are three regimes. First, the black holes are peacefully orbiting, and emitting gravitational radiation. When the black holes are far apart, and have velocities small compared to the speed of light, it’s fairly straightforward. But as they get closer and speed up, it becomes more complicated. Each black hole perturbs the other, but with a lot of algebra it’s possible to calculate the effects (as a power series in v/c).
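As a rough guide to when an expansion in v/c gets into trouble: for a Newtonian circular binary with total mass M and separation r, the relative orbital speed is v = √(GM/r). This tiny sketch works in units G = c = M = 1, so r is measured in units of GM/c². Successive corrections in this kind of expansion come in powers of (v/c)², so that column gives a loose sense of how quickly they shrink.

```python
import math


def relative_speed_over_c(separation_in_M):
    """Relative speed v/c of a Newtonian circular binary of total mass M at
    separation r (units G = c = 1, r in units of G M / c^2): v = sqrt(M / r)."""
    return math.sqrt(1.0 / separation_in_M)


for r in (1000, 100, 10):
    v = relative_speed_over_c(r)
    print(f"r = {r:>4} M: v/c = {v:.3f}, (v/c)^2 = {v * v:.4f}")
```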

Eventually, though, this breaks down, and the only choice is to solve the Einstein equations numerically using many of the same methods traditionally used for fluid mechanics. (There’ve been various efforts to use the same kind of cellular automaton approach on the Einstein equations that I used for the Navier–Stokes equations, but I think what’s more promising is to try something like my network-rewriting models for gravity.)

It’s only in recent years that computers have become fast enough to get sensible answers from computations like this involving strong gravitational fields as well as velocities close to the speed of light. This is the second regime, and in these computations the result is that something like a single black hole is formed. Inevitably it’s a deformed black hole, and the third regime is one where, a bit like a bell, the black hole “rings down” these deformations (either by emitting gravitational radiation, or by absorbing them into the black hole itself).

It’s a pretty complicated stack of computations, requiring a variety of different methods. But the impressive thing is that—judging from the recent announcement—it seems to correctly capture what goes on in the interaction between two black holes.

There are plenty of detailed issues, however. One of them is that you can’t just set up some elaborate initial state with two black holes and expect that it will be a solution to the Einstein equations, even for an instant. So in addition to working out the time evolution, one also has to somehow progressively modify the initial conditions one specifies, so that they actually correspond to a possible configuration of the gravitational field according to Einstein’s equations.

If we want to start thinking about black hole configurations for purposes of technology, it would help to devise a simplified summary of interactions between two—or more—black holes. For example, one might want to have a summary of the effects of the direction of rotation (or “spin”) and of orbiting on black holes’ interactions, organized (in analogy with quantum systems) into spin-orbit, spin-spin, etc. components.

It’s a general feature of fluids that when they flow rapidly, they tend to show turbulence and behave in seemingly random ways. It’s still not completely clear what the origin of this apparent randomness is. It could be that somehow one is seeing an amplified version of small-scale random molecular motions. Or it could be there is enough instability that one is progressively exploring random details of initial conditions (as in chaos theory). I’ve spent a long time studying this, and my conclusion is that the randomness mostly isn’t coming from things that are essentially outside of the fluid; it’s instead coming from the actual dynamics of the fluid, as if the fluid were computing my rule 30 cellular automaton, or running a pseudorandom number generator.
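For reference, here is the rule 30 cellular automaton itself, as a minimal Python sketch. It illustrates intrinsic randomness generation, not a fluid model. It starts from a single black cell and reads off the center column, which looks random even though both the rule and the initial condition are simple.

```python
def rule30_center_column(steps):
    """Run the rule 30 elementary cellular automaton from a single black cell
    and return the sequence of values in the center column."""
    width = 2 * steps + 1          # wide enough that the edges never affect the center
    cells = [0] * width
    cells[steps] = 1
    column = []
    for _ in range(steps):
        column.append(cells[steps])
        # Rule 30: new cell = left XOR (center OR right); edge cells stay 0.
        cells = [0] + [cells[i - 1] ^ (cells[i] | cells[i + 1])
                       for i in range(1, width - 1)] + [0]
    return column


print("".join(map(str, rule30_center_column(32))))
```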

If one works with the standard Navier–Stokes equations for fluid mechanics, it’s not very clear what’s going on—because one ends up having to solve the equations numerically, and whenever something complicated happens, it’s almost impossible to tell if it’s a consequence of the numerical analysis one’s done, or a genuine feature of the equations. I sidestepped these issues by using cellular automaton models for fluids rather than differential equations—and from that it’s pretty clear that intrinsic randomness generation is at least a large part of what’s going on. And having seen this, my expectation would be that if one could solve the equations well enough, one would see exactly the same behavior in the Navier–Stokes equations.

So what about the Einstein equations? Can they show turbulence? I’ve long thought that they should be able to, although establishing this will run into the same kinds of numerical-analysis issues as with the Navier–Stokes equations, probably in an even more difficult form.

In a fluid the typical pattern is that one starts with a large-scale motion (say induced by an airplane going through the air). Then what roughly happens (at least in 3D) is that this motion breaks down into a cascade of smaller and smaller eddies, until the eddies are so small that they are damped out by viscosity in the fluid.

Would something similar happen with turbulence in the gravitational field? It can’t be quite the same, because unlike fluids, which dissipate small-scale motion by turning it into heat, the gravitational field has no such dissipation mechanism, at least according to Einstein’s equations (without adding matter, quantum effects, etc.). (Note that even with ordinary fluid mechanics, things are very different in 2D: there eddies tend not to break into smaller ones, but instead to combine into larger ones, perhaps like the Great Red Spot on Jupiter.)

My guess is that a phenomenon akin to turbulence is endemic in systems that have fields which can interact with themselves. Another potential example is the classical analog of QCD—or, more simply, classical Yang–Mills theory (the theory of a classical self-interacting color field). Yang–Mills theory shares with gravity the feature that it exhibits no dissipation, but is mathematically perhaps simpler. For years I’ve been asking people who do lattice-gauge-theory simulations whether they see any analog of turbulence. But with the randomized sampling (as opposed to evolution) approach they typically use, it’s hard to tell. (There are mathematical connections between versions of gravity and versions of Yang–Mills theory that have been extensively explored in recent years, but I don’t know what implications they have for questions of turbulence.)

In Newton’s theory of gravity, there’s an inverse square law for the force of gravity. Sufficiently far away from a massive object, the same law holds in General Relativity too. With an inverse square law for gravity, any bound orbit of a pointlike object around a spherical mass will be an ellipse (just like Newton said it should be for Halley’s Comet). And every time the object goes around its orbit, it will just retrace the exact same ellipse, keeping the long axis of the ellipse in the same direction.

But what happens in General Relativity, and with black holes? The first important fact is that if something is spherically symmetric, then the gravitational field it produces outside itself must always be given exactly by the Schwarzschild solution to Einstein’s equations. That’s true for a perfectly spherical star, and it’s also true for a non-rotating black hole. And in fact that’s why it was often hard to tell if you were dealing with a genuine black hole: because the gravitational field outside it would be the same as for a star of the same mass.

So what happens according to General Relativity if you’re in orbit around something spherical? In a first approximation, the orbit is still elliptical, but the axis of the ellipse can change (“precess”)—and in fact one of the early successes of General Relativity was to explain an effect like this that had been seen for the orbit of the planet Mercury (the “advance of the perihelion”).
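At leading order, General Relativity gives a perihelion advance per orbit of Δφ = 6πGM/(c² a (1 − e²)), for semi-major axis a and eccentricity e. Here is a sketch for Mercury. The orbital elements are rounded standard values and are assumptions of the example.

```python
import math

GM_SUN = 1.32712e20     # G times the solar mass, m^3/s^2
C = 2.99792458e8        # speed of light, m/s
A_MERCURY = 5.791e10    # semi-major axis, m
E_MERCURY = 0.2056      # eccentricity
PERIOD_DAYS = 87.969    # orbital period, days


def perihelion_advance_per_orbit(gm, a, e):
    """Leading-order GR perihelion advance per orbit, in radians:
    delta_phi = 6 pi G M / (c^2 a (1 - e^2))."""
    return 6 * math.pi * gm / (C**2 * a * (1 - e**2))


orbits_per_century = 36525 / PERIOD_DAYS
arcsec_per_radian = 180 / math.pi * 3600
arcsec_per_century = (perihelion_advance_per_orbit(GM_SUN, A_MERCURY, E_MERCURY)
                      * orbits_per_century * arcsec_per_radian)
print(f"{arcsec_per_century:.1f} arcseconds per century")
```

This gives close to the famous 43 arcseconds per century.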

Here’s what actually happens as the orbital distance goes down:

The object in the middle looks larger and larger relative to the orbit. In the final picture, there’s no orbit at all, and one just spirals into the object in the middle. In the other cases, there are roughly elliptical orbits, but the precession effect gets larger and larger, and typically one ends up eventually visiting a whole ring of possible positions. (There’s an interactive version of this on the Wolfram Demonstrations Project.)
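The growing precession can be reproduced from the standard orbit equation for a test mass around a Schwarzschild black hole. Take u = 1/r, φ as the orbital angle, units G = c = M = 1, and L as the angular momentum per unit mass. The equation reads u'' + u = 1/L² + 3u², and dropping the 3u² term gives the Newtonian ellipse. The sketch below fixes an orbit by its periapsis and apoapsis radii and integrates in φ with RK4. It then measures how far past 2π the next periapsis comes. The step size and the plunge check are choices made for this illustration.

```python
import math


def periapsis_advance(r_peri, r_apo, relativistic=True, h=5e-4):
    """Periapsis advance per orbit (radians) of a test mass orbiting between
    r_peri and r_apo around a Schwarzschild black hole (units G = c = M = 1).

    Integrates u'' + u = 1/L^2 + 3 u^2 (u = 1/r, ' = d/dphi) with classical RK4,
    starting at periapsis, and returns the angle of the next periapsis minus 2*pi.
    With relativistic=False the 3 u^2 term is dropped (Newtonian ellipse).
    """
    u1, u2 = 1.0 / r_peri, 1.0 / r_apo
    if relativistic:
        # A bound orbit needs the third root of the radial equation, 1/2 - u1 - u2, to exceed u1.
        if 2 * u1 + u2 >= 0.5:
            raise ValueError("no bound orbit between these radii (the orbit plunges)")
        inv_L2 = (u1 + u2) / 2 - (u1 * u1 + u1 * u2 + u2 * u2)
        k = 3.0
    else:
        inv_L2 = (u1 + u2) / 2
        k = 0.0

    def rhs(u, w):
        # State is (u, w) with w = du/dphi.
        return w, inv_L2 + k * u * u - u

    u, w, phi = u1, 0.0, 0.0
    while phi < 20 * math.pi:
        a1, b1 = rhs(u, w)
        a2, b2 = rhs(u + 0.5 * h * a1, w + 0.5 * h * b1)
        a3, b3 = rhs(u + 0.5 * h * a2, w + 0.5 * h * b2)
        a4, b4 = rhs(u + h * a3, w + h * b3)
        u_next = u + h / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
        w_next = w + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
        # Periapsis is a maximum of u: w changes sign from + to -; interpolate the crossing.
        if w > 0 >= w_next:
            return phi + h * w / (w - w_next) - 2 * math.pi
        u, w, phi = u_next, w_next, phi + h
    raise RuntimeError("no periapsis found")


print(periapsis_advance(1000, 2000, relativistic=False))  # essentially zero: a closed ellipse
print(periapsis_advance(1000, 2000))                       # weak field: small advance
print(periapsis_advance(10, 30))                           # strong field: much larger advance
```

For r_peri = 1000 and r_apo = 2000, the result is close to the weak-field estimate 6π/p with p = 2 r_peri r_apo/(r_peri + r_apo). Moving the orbit inward makes the advance grow rapidly. Once 2/r_peri + 1/r_apo reaches 1/2, there is no bound orbit with those turning points, matching the final picture where the body just spirals in.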

But does this always happen? The answer is no: one can pick special initial conditions that instead give a variety of closed orbits with various patterns:

So what about a rotating object, or specifically a rotating black hole? One notable feature is a phenomenon called “frame dragging”, which causes orbits to be pulled towards rotating along with the object. A consequence of this is that unless the orbit precisely follows the direction of rotation, it won’t stay in a single plane, and—in a seemingly quite random way—will typically fill up not a ring but a whole 3D torus. (Try out the interactive demonstration to see this.)

Although it eventually fills in a torus, the pattern of the orbit can be fairly different depending on what initial “latitude” one starts from (all these are shown for the same total time):

If you’re sufficiently far away from the black hole, then it turns out that even though you’re pulled by frame dragging, you can in principle overcome the force (say with a powerful enough rocket). But if you’re inside a region called the ergosphere (indicated by the gray region in the pictures), you’d have to be going faster than the speed of light to do that. So the result is that any object that gets into the ergosphere (which extends outside of the event horizon) will inevitably be made to co-rotate with the black hole, just through frame dragging.
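In units G = c = 1 and Boyer–Lindquist coordinates, the outer boundary of the ergosphere is r = M + √(M² − a² cos²θ), where θ is measured from the rotation axis. The event horizon is at r₊ = M + √(M² − a²). Here is a minimal sketch; the spin a = 0.7 M is an illustrative choice.

```python
import math


def outer_horizon(M, a):
    """Outer (event) horizon radius r+ = M + sqrt(M^2 - a^2), units G = c = 1."""
    return M + math.sqrt(M**2 - a**2)


def ergosphere_radius(M, a, theta):
    """Outer boundary of the ergosphere at polar angle theta (0 = rotation axis):
    r = M + sqrt(M^2 - a^2 cos^2 theta), in Boyer-Lindquist coordinates."""
    return M + math.sqrt(M**2 - (a * math.cos(theta)) ** 2)


M, a = 1.0, 0.7   # a = J/M, so this is J = 0.7 M^2
for degrees in (0, 30, 60, 90):
    theta = math.radians(degrees)
    print(f"theta = {degrees:>2} deg: ergosphere r = {ergosphere_radius(M, a, theta):.3f}, "
          f"horizon r = {outer_horizon(M, a):.3f}")
```

The ergosphere touches the horizon at the poles and bulges out to r = 2M at the equator, whatever the spin.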

And this means that if you can put something into the ergosphere, it can gain energy, ultimately by reducing the angular momentum of the black hole. One could imagine using this as a way to harvest the energy of a black hole, and indeed astronomical phenomena like high-energy gamma ray bursts may be related.
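There is an upper limit on how much can be harvested this way. The standard example of such extraction is the Penrose process. Spinning a Kerr black hole all the way down leaves its irreducible mass M_irr = ½√(r₊² + a²) (units G = c = 1). So at most M − M_irr can be extracted. Here is a minimal sketch in terms of the dimensionless spin χ = J/M².

```python
import math


def extractable_fraction(chi):
    """Upper bound on the fraction of a Kerr black hole's mass that can be
    extracted by spinning it down, 1 - M_irr/M, for dimensionless spin
    chi = J/M^2 (units G = c = 1, with M = 1 so that a = chi)."""
    if not 0 <= chi <= 1:
        raise ValueError("need 0 <= chi <= 1")
    a = chi
    r_plus = 1 + math.sqrt(1 - a * a)
    m_irr = 0.5 * math.sqrt(r_plus**2 + a**2)
    return 1 - m_irr


for chi in (0.0, 0.7, 1.0):
    print(chi, round(extractable_fraction(chi), 4))
```

For a maximally spinning black hole the bound is 1 − 1/√2, about 29%. For *J* = 0.7 *M*² it is about 7%.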

OK, so we’ve talked about orbiting a black hole, and earlier about what happens with two black holes. But what about with more black holes? Well, we can start by asking that question just for simple point masses following Newton’s law of gravity—and it turns out that even there things are already extremely complicated.

The pictures below show a bunch of possible trajectories for three equal-mass pointlike objects interacting through ordinary Newtonian gravity. The only difference between the setup for the different pictures is where the objects were started. But one can see that just changing this initial condition leads to an incredible diversity of behavior:

Here are some animated versions:

Solving the necessary differential equations is fast enough these days in the Wolfram Language that one can actually generate these interactively. Here’s a version in 2D where you can interactively move around the initial positions and velocities:

And here’s a version in 3D where you can set all the positions and velocities in 3D:

If we just had two objects (a “two-body problem”), all that would ever happen is that they’d orbit each other in a simple ellipse. But adding a third object (“three-body problem”) immediately allows dramatically more complexity. Sometimes in the end all three objects just go their separate ways. Sometimes two form a binary system and the third goes separately. And sometimes all three make anything from an orderly arrangement to a complicated tangled mess.
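The pictures here were computed in the Wolfram Language. As a rough, language-neutral illustration of what such a computation involves, here is a minimal Python sketch of Newtonian point masses in 2D (G = 1), advanced with the velocity Verlet (leapfrog) scheme. That integrator is an assumption of convenience. A fixed time step like this loses accuracy in very close encounters, where adaptive or regularized methods are normally used. As a check, the usage below starts from the well-known figure-eight periodic solution of the equal-mass problem (initial values quoted to eight digits).

```python
import math


def accelerations(pos, masses, G=1.0):
    """Newtonian gravitational acceleration on each body (2D point masses)."""
    n = len(pos)
    acc = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            inv_r3 = (dx * dx + dy * dy) ** -1.5
            acc[i][0] += G * masses[j] * dx * inv_r3
            acc[i][1] += G * masses[j] * dy * inv_r3
            acc[j][0] -= G * masses[i] * dx * inv_r3
            acc[j][1] -= G * masses[i] * dy * inv_r3
    return acc


def total_energy(pos, vel, masses, G=1.0):
    """Kinetic plus gravitational potential energy."""
    kinetic = sum(0.5 * m * (v[0] ** 2 + v[1] ** 2) for m, v in zip(masses, vel))
    potential = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            r = math.hypot(pos[j][0] - pos[i][0], pos[j][1] - pos[i][1])
            potential -= G * masses[i] * masses[j] / r
    return kinetic + potential


def simulate(pos, vel, masses, dt, steps):
    """Advance the system with velocity Verlet; returns new (positions, velocities)."""
    pos = [list(p) for p in pos]
    vel = [list(v) for v in vel]
    acc = accelerations(pos, masses)
    for _ in range(steps):
        for p, v, a in zip(pos, vel, acc):
            for k in range(2):
                v[k] += 0.5 * dt * a[k]   # half kick
                p[k] += dt * v[k]         # drift
        acc = accelerations(pos, masses)
        for v, a in zip(vel, acc):
            for k in range(2):
                v[k] += 0.5 * dt * a[k]   # second half kick
    return pos, vel


masses = [1.0, 1.0, 1.0]
pos0 = [[0.97000436, -0.24308753], [-0.97000436, 0.24308753], [0.0, 0.0]]
vel0 = [[0.46620368, 0.43236573], [0.46620368, 0.43236573], [-0.93240737, -0.86473146]]
pos1, vel1 = simulate(pos0, vel0, masses, dt=1e-3, steps=2000)
drift = abs(total_energy(pos1, vel1, masses) / total_energy(pos0, vel0, masses) - 1)
print(f"relative energy drift: {drift:.1e}")
```

Changing the initial positions in `pos0` and `vel0` is enough to move from this orderly orbit to the tangled behavior in the pictures.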

The three-body problem turns out to be a classic example of the chaos-theory idea of sensitive dependence on initial conditions: in many situations, even the tiniest change in, say, the initial position of an object will be progressively amplified. And the result is that if one specifies the initial conditions by numbers (say, for coordinate positions), then the evolution of the system will effectively “excavate” more and more digits in these numbers.
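The idea of "excavating digits" is easiest to see in a far simpler chaotic system than the three-body problem: the doubling map x → 2x mod 1. It is swapped in here purely as an illustration. Recording which half of [0, 1) each iterate lies in (a symbolic-dynamics binning) reads off the binary digits of the initial condition one per step. The sketch uses exact rational arithmetic so that no digits are lost to rounding.

```python
from fractions import Fraction


def doubling_map_symbols(x0, steps):
    """Iterate x -> 2x mod 1 exactly and record, at each step, whether x lies
    in the left (0) or right (1) half of [0, 1)."""
    x, symbols = Fraction(x0), []
    for _ in range(steps):
        symbols.append(0 if x < Fraction(1, 2) else 1)
        x = (2 * x) % 1
    return symbols


x0 = Fraction(0b1011001110001, 2**13)      # binary 0.1011001110001
print(doubling_map_symbols(x0, 13))          # the binary digits of x0, in order

# Two starting points differing by 2**-30 give identical symbols for 29 steps,
# then differ: the dynamics has "excavated" the 30th binary digit.
a = doubling_map_symbols(x0, 40)
b = doubling_map_symbols(x0 + Fraction(1, 2**30), 40)
print(next(i for i in range(40) if a[i] != b[i]))   # 29
```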

Here’s a particularly simple example. Imagine having a pair of objects in a simple elliptical orbit. Then a third object (assumed to have infinitesimally small mass) is started a certain distance above the plane of the ellipse. Gravity will make the third object oscillate back and forth through this plane forever. But the tricky thing is that the details of these oscillations depend arbitrarily sensitively on the details of the initial conditions.

This picture shows what happens when one starts that third object at one of four different coordinate positions that differ by one part in a billion. For a while, all of them follow what looks like exactly the same trajectory. But then they start to diverge, and eventually each of them does something completely different:
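When the two primaries have equal mass, this setup is usually called the Sitnikov problem. The sketch assumes that normalization, which may differ from the one used for the pictures. It takes two primaries of mass ½ each, G = 1, and relative semi-major axis 1, so the period is 2π. Each primary then sits a distance r(t) = (1 − e cos E)/2 from the barycenter, where E solves Kepler's equation E − e sin E = t. The light third body obeys z'' = −z/(z² + r²)^{3/2}. This is a minimal fixed-step RK4 sketch in ordinary double precision. The values e = 0.5 and z(0) ≈ 1 are arbitrary choices.

```python
import math


def eccentric_anomaly(mean_anomaly, e):
    """Solve Kepler's equation E - e*sin(E) = mean_anomaly for E by Newton's method."""
    E = mean_anomaly if e < 0.8 else math.pi
    for _ in range(50):
        step = (E - e * math.sin(E) - mean_anomaly) / (1 - e * math.cos(E))
        E -= step
        if abs(step) < 1e-14:
            break
    return E


def primary_distance(t, e):
    """Distance of each equal-mass primary from the barycenter at time t
    (relative semi-major axis 1, period 2*pi, periapsis at t = 0)."""
    E = eccentric_anomaly(math.fmod(t, 2 * math.pi), e)
    return 0.5 * (1 - e * math.cos(E))


def sitnikov(z0, v0=0.0, e=0.5, t_end=30.0, h=5e-3):
    """Integrate z'' = -z / (z^2 + r(t)^2)^(3/2) with fixed-step RK4; return (z, dz/dt)."""
    def acc(t, z):
        r = primary_distance(t, e)
        return -z / (z * z + r * r) ** 1.5

    z, v = z0, v0
    for step in range(int(round(t_end / h))):
        t = step * h
        k1z, k1v = v, acc(t, z)
        k2z, k2v = v + 0.5 * h * k1v, acc(t + 0.5 * h, z + 0.5 * h * k1z)
        k3z, k3v = v + 0.5 * h * k2v, acc(t + 0.5 * h, z + 0.5 * h * k2z)
        k4z, k4v = v + h * k3v, acc(t + h, z + h * k3z)
        z += h / 6 * (k1z + 2 * k2z + 2 * k3z + k4z)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return z, v


# Four starting heights differing by one part in a billion, as in the text
for dz in (0.0, 1e-9, 2e-9, 3e-9):
    z, _ = sitnikov(1.0 + dz)
    print(f"z(0) = 1 + {dz:.0e}:  z(30) = {z:+.6f}")
```

Whether and when the runs visibly separate depends on e, z(0), and how long one integrates. Once trajectories diverge, double-precision round-off is amplified just like the initial difference. That is why the text's pictures need arbitrary-precision arithmetic, and why this sketch is only qualitative.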

Plotting this in 3D (with the initial position *z*(0) shown going into the page), we can see just how random things can get, even though each specific trajectory is precisely determined by the sequence of digits in the real number that represents its initial condition. (It’s not trivial, by the way, to compute these pictures correctly: it requires using the arbitrary-precision arithmetic of the Wolfram Language, and as time goes on more and more digits are needed.)

Not surprisingly, there’s no simple formula that represents these results. But a few interesting things have been proved—for example that if one measures each oscillation by how many orbits are completed while it is happening, then one can get any sequence of integers one wants by choosing the initial conditions appropriately.

The two-body problem was solved in terms of mathematical formulas by Isaac Newton in 1687—as a highlight of his introduction of calculus. And in the 1700s and 1800s it was assumed that eventually someone would find the same kind of solution for the three-body problem. But by the end of the 1800s there were results (notably by Henri Poincaré) that suggested there couldn’t be a solution in terms of at least certain kinds of functions.

It’s still not proved that there can’t be solutions in terms of any kind of known functions (much as, even though general quintic equations can’t be solved in radicals, there are solutions in terms of elliptic or hypergeometric functions). But I strongly suspect that there can never, even in principle, be a complete solution to the three-body problem as an explicit formula.

One can think of the time evolution of a system of masses interacting according to gravity as being a computation: you put in the initial conditions, and then you get out where the masses are after a certain time. But how sophisticated is this computation? For the two-body problem, it’s fairly simple. In fact, however long the actual two-body system runs, one can always find the outcome just by plugging numbers into a straightforward formula.
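To make the "plug numbers into a formula" point concrete, take a Newtonian two-body orbit with semi-major axis a and eccentricity e. Its position at any time follows from the mean anomaly n·t and Kepler's equation E − e sin E = n·t, which needs only a quick root-find. The cost is the same for t = 1 as for t = 10⁶. This is a minimal sketch, assuming units with GM = 1 and time measured from periapsis.

```python
import math


def kepler_position(t, a=1.0, e=0.5, gm=1.0):
    """Position (x, y) of a Newtonian two-body orbit (relative coordinate) at
    time t, with periapsis on the +x axis at t = 0."""
    n = math.sqrt(gm / a**3)                      # mean motion (2*pi / period)
    mean_anomaly = math.fmod(n * t, 2 * math.pi)  # the orbit repeats every period
    E = mean_anomaly if e < 0.8 else math.pi
    for _ in range(50):                           # Newton's method for E - e sin E = M
        step = (E - e * math.sin(E) - mean_anomaly) / (1 - e * math.cos(E))
        E -= step
        if abs(step) < 1e-14:
            break
    x = a * (math.cos(E) - e)
    y = a * math.sqrt(1 - e * e) * math.sin(E)
    return x, y


# The work is the same whether we ask about one orbit or a million orbits ahead.
print(kepler_position(1.0))
print(kepler_position(1.0 + 2 * math.pi * 1_000_000))
```

For very large t, the reduction modulo 2π eventually loses floating-point precision. That is a limitation of finite-precision numbers, not of the formula.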

But what about the three-body problem? The pictures above suggest a very different story. And indeed my guess is that the evolution of a three-body system can correspond to an arbitrarily sophisticated computation—and that with suitable initial conditions it should in fact be able, for example, to emulate any Turing machine, and thus act as a universal computer.

I’ve suspected computational universality in the three-body problem for about 35 years now. But it’s a technically complicated thing to prove. Usually in studying computation we look at fundamentally discrete systems, like Turing machines or cellular automata. But the three-body problem is fundamentally continuous—and can for example make use of arbitrarily many digits in the real numbers it’s given as initial conditions.

Still, at least from a formal point of view, one can set up initial conditions that have, say, a finite sequence of nonzero digits. Then one can look at the output from the evolution of the system, binning the results to get a sequence of discrete data (e.g. using ideas of symbolic dynamics). And then the question is whether by changing the initial conditions we can have the output sequence correspond to the result from any program we want—say one that shows which successive numbers are prime, or computes the digits of pi.

So what would it mean if we could prove this kind of computational universality? One thing it would mean is that the three-body problem must be computationally irreducible, so there couldn’t ever be a way to “shortcut” (say with a formula) the actual computation it does in getting a result. Another thing it would mean is that certain infinite-time questions, like whether a particular body can ever escape for any of a particular range of initial conditions, could in general be undecidable.

(There’s a whole discussion about whether the three-body problem, because it works with real numbers, can compute more than a standard universal computer like a Turing machine, which only works with integers. Suffice it here to say that my strong suspicion is that it can’t, at least if one insists that the initial conditions and the results can be expressed in finite symbolic terms.)

How stable are the seemingly random trajectories in the three-body problem? Some are very sensitive to the details of the initial conditions, but others are quite robust. And for example, if one were designing a trajectory for a spacecraft, it seems perfectly possible that one could find a complex and seemingly random trajectory that would achieve some purpose one wants.

Are there cases where actual star or planetary systems will exhibit apparent randomness? There were undoubtedly examples even in the history of our own solar system. But because randomness tends to bring bodies into regions where they haven’t been before, there’s a higher chance of disruption by external effects—such as collisions—and so the apparent randomness probably doesn’t typically last under “natural selection for solar systems” when there are many bodies in the system.

In the ever-difficult problem of working out whether something is of “intelligent origin”, the three-body problem adds another twist—because it allows astronomical processes to show complexity just as a consequence of their intrinsic dynamics. If it is indeed possible to do arbitrary computation with a three-body system, then such a system could in principle be programmed to, say, generate the digits of pi, and perhaps make them visible in the light curve of a star. But often the system will show just as complex behavior from many different initial conditions—and one won’t be able to tell whether the behavior has any element of “purpose”.

Can one pick initial conditions for the three-body problem to achieve particular kinds of behavior? The answer is certainly yes. One example (already found by Lagrange in 1772) is to have the bodies on the corners of an equilateral triangle, which produces periodic behavior (though it is stable only when one body is much more massive than the other two).
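The equilateral configuration is easy to check numerically. The sketch below (units with G = 1; helper names are hypothetical) places three masses on an equilateral triangle of side a about their center of mass and confirms that each body's gravitational acceleration is exactly the centripetal value −ω²r with ω² = G(m₁+m₂+m₃)/a³, which is what lets the triangle rotate rigidly. It also evaluates Routh's classical linear-stability condition, (m₁+m₂+m₃)² > 27(m₁m₂ + m₂m₃ + m₃m₁).

```python
import math

G = 1.0  # gravitational constant set to 1 (a choice of units)

def equilateral_positions(masses, a):
    """Three bodies on an equilateral triangle of side a, centered on their center of mass."""
    R = a / math.sqrt(3)  # circumradius of a triangle with side a
    verts = [(R * math.cos(2 * math.pi * k / 3), R * math.sin(2 * math.pi * k / 3))
             for k in range(3)]
    M = sum(masses)
    cx = sum(m * v[0] for m, v in zip(masses, verts)) / M
    cy = sum(m * v[1] for m, v in zip(masses, verts)) / M
    return [(x - cx, y - cy) for x, y in verts]

def accelerations(masses, pos):
    """Newtonian gravitational acceleration on each body."""
    acc = []
    for i, (xi, yi) in enumerate(pos):
        ax = ay = 0.0
        for j, (xj, yj) in enumerate(pos):
            if i != j:
                dx, dy = xj - xi, yj - yi
                r3 = math.hypot(dx, dy) ** 3
                ax += G * masses[j] * dx / r3
                ay += G * masses[j] * dy / r3
        acc.append((ax, ay))
    return acc

def routh_stable(m1, m2, m3):
    """Routh's criterion for linear stability of the equilateral solution."""
    return (m1 + m2 + m3) ** 2 > 27 * (m1 * m2 + m2 * m3 + m3 * m1)

masses, a = [1.0, 2.0, 3.0], 1.0
pos = equilateral_positions(masses, a)
omega2 = G * sum(masses) / a**3
for (ax, ay), (x, y) in zip(accelerations(masses, pos), pos):
    print(ax + omega2 * x, ay + omega2 * y)   # both ~0: purely centripetal pull
print(routh_stable(1, 1, 1), routh_stable(1, 1e-3, 1e-9))
```

With equal masses the criterion fails; with one dominant mass (roughly the Sun, Jupiter and a Trojan asteroid) it holds.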

One can find other periodic configurations too:

And indeed, particularly if one allows more bodies, given some specified periodic trajectory, one can probably find (by fairly traditional gradient descent methods) initial conditions that will reproduce it, at least to some accuracy. (A notable example found in 1993 is just three bodies following a figure-eight orbit.)
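A search like this needs an objective to minimize. One natural choice (my assumption, not necessarily what was used for the pictures here) is the "return error": integrate for a trial period T and measure how far the state ends up from where it started. The sketch below uses a fixed-step classical fourth-order Runge–Kutta integrator in units with G = 1, and evaluates the objective at the widely quoted figure-eight initial conditions (three equal unit masses, period T ≈ 6.3259). A gradient-descent search would adjust the initial conditions and T to drive this number toward zero.

```python
import math

def derivs(state, masses):
    """Time derivative of [x1, y1, ..., xn, yn, vx1, vy1, ..., vxn, vyn]
    for planar point masses under Newtonian gravity, with G = 1."""
    n = len(masses)
    pos, vel = state[:2 * n], state[2 * n:]
    acc = [0.0] * (2 * n)
    for i in range(n):
        for j in range(n):
            if i != j:
                dx = pos[2 * j] - pos[2 * i]
                dy = pos[2 * j + 1] - pos[2 * i + 1]
                r3 = (dx * dx + dy * dy) ** 1.5
                acc[2 * i] += masses[j] * dx / r3
                acc[2 * i + 1] += masses[j] * dy / r3
    return vel + acc

def rk4(state, masses, t_end, dt):
    """Classical fixed-step fourth-order Runge-Kutta integration up to t_end."""
    steps = max(1, round(t_end / dt))
    h = t_end / steps
    for _ in range(steps):
        k1 = derivs(state, masses)
        k2 = derivs([s + 0.5 * h * k for s, k in zip(state, k1)], masses)
        k3 = derivs([s + 0.5 * h * k for s, k in zip(state, k2)], masses)
        k4 = derivs([s + h * k for s, k in zip(state, k3)], masses)
        state = [s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state

def return_error(state0, masses, period, dt=1e-3):
    """Phase-space distance between the initial state and the state one trial period later."""
    final = rk4(list(state0), masses, period, dt)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(state0, final)))

# Widely quoted figure-eight initial conditions: equal unit masses, G = 1.
x1, y1 = 0.97000436, -0.24308753
vx3, vy3 = -0.93240737, -0.86473146
figure8 = [x1, y1, -x1, -y1, 0.0, 0.0,
           -vx3 / 2, -vy3 / 2, -vx3 / 2, -vy3 / 2, vx3, vy3]
masses = [1.0, 1.0, 1.0]
T = 6.32591398

err_orbit = return_error(figure8, masses, T)
nudged = figure8[:]
nudged[6] += 0.05              # perturb the x-velocity of body 1
err_nudged = return_error(nudged, masses, T)
print(err_orbit, err_nudged)   # the first is tiny, the second much larger
```

The nudged case also gives the system net momentum, so part of its larger error is simple drift; a practical search would usually fix the center of mass first.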

But what about more-complex trajectories? Clearly, each set of initial conditions gives some kind of behavior. The question is whether it’s useful.

The situation is similar to what I’ve encountered for a long time in studying simple programs like cellular automata: out there in the computational universe of possible programs, there’s all kinds of rich and complex behavior. Now the issue is to “mine” those examples that are actually useful for something.

In practice, I’ve done lots of “algorithm discovery” in the computational universe, setting up criteria and then searching huge numbers of possible programs to find ones that are useful. And I expect exactly the same can be done for gravitational systems like the three-body problem. It’s really a question of formulating some purpose one’s trying to achieve with the system; then one can just start searching, often quite exhaustively, for a case that achieves that purpose.
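Here is a toy version of that search loop, set in the simpler world of the 256 elementary cellular automata rather than a gravitational system. The criterion (the center column grown from a single black cell shows no short repeating period) and the function names are my own illustrative choices; a real search would use a criterion matched to whatever purpose one has in mind.

```python
def center_column(rule, steps):
    """Center-column values of elementary CA `rule` grown from a single 1 cell.
    The ring is wider than the light cone, so its periodic edges never matter."""
    width = 2 * steps + 3
    row = [0] * width
    row[width // 2] = 1
    column = []
    for _ in range(steps):
        column.append(row[width // 2])
        # Standard rule numbering: bit (4*left + 2*center + right) gives the new value.
        row = [(rule >> (4 * row[i - 1] + 2 * row[i] + row[(i + 1) % width])) & 1
               for i in range(width)]
    return column

def has_short_period(seq, max_period, tail):
    """True if the last `tail` values of seq repeat with some period <= max_period."""
    t = seq[-tail:]
    return any(all(t[i] == t[i + p] for i in range(len(t) - p))
               for p in range(1, max_period + 1))

def search(steps=150, max_period=40, tail=100):
    """Exhaustively test all 256 rules against the (illustrative) criterion."""
    return [r for r in range(256)
            if not has_short_period(center_column(r, steps), max_period, tail)]

found = search()
print(len(found), found)
```

Rule 30 turns up in the list, while rules such as 0, 90 and 255, whose center columns quickly settle down, are filtered out.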

So how do black holes work in things like the three-body problem? The basic story is simple: so long as the bodies stay far enough apart, it doesn’t matter whether they’re black holes or just generic masses. But if they get close, there’ll start to be relativistic effects, and that’s where black holes will be important. Presumably, however, one can just set up a constraint that there should be no close approaches, and one will still be able to do plenty of gravitational engineering—with black holes or any other massive objects.

If we’re going to be able to do serious black hole engineering, we’d better have a serious source of black holes. It’s not clear that our universe is going to cooperate on this. There are probably big black holes at the centers of galaxies (and that may be the rather unsatisfying answer to “what’s the ‘equilibrium’ state” of a large number of self-gravitating objects). There’s probably a decent population of black holes from collapsed massive stars—perhaps one per thousand stars or so, which means 100 million spread across our galaxy.

There’s another important point to mention about black holes: if current theories correctly graft certain aspects of quantum mechanics onto the classical physics of the Einstein equations, then any black hole will emit Hawking radiation, and will eventually evaporate away as a result. Star-sized black holes would have huge lifespans (around 10^67 years for one with the mass of the Sun). For less-massive black holes the lifespan goes down, scaling as the cube of the mass, but even a black hole with the mass of Halley’s comet would last something like 10^19 years.

What about tiny black holes? Hawking radiation suggests they should evaporate almost instantly: an electron-mass one should be gone in well under 10^-100 seconds. (When I was 15 or so, I remember asking a distinguished physicist whether electrons could actually be black holes. He said it was a stupid idea, which probably it was. But in writing this blog I discovered that Einstein also considered this idea, though about 50 years before I did. And as it happens, in my network-based models, electrons do end up being made of “pure space”, not so unlike black holes.)
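For rough numbers, the standard photon-only estimate of the evaporation time is t ≈ 5120πG²M³/(ħc⁴). The sketch below simply evaluates that formula with rounded SI constants. It ignores emission of other particle species (which shortens the lifetime of light black holes somewhat), and far below the Planck mass (about 2×10^-8 kg), which includes the electron-mass case, the semiclassical formula is purely formal.

```python
import math

# Physical constants in SI units (rounded CODATA values)
G = 6.674e-11           # m^3 kg^-1 s^-2
HBAR = 1.054571817e-34  # J s
C = 2.99792458e8        # m/s
YEAR = 3.156e7          # s

def hawking_lifetime_seconds(mass_kg):
    """Photon-only Hawking evaporation time: t = 5120*pi*G^2*M^3 / (hbar*c^4)."""
    return 5120 * math.pi * G**2 * mass_kg**3 / (HBAR * C**4)

for name, m in [("electron", 9.109e-31), ("Sun", 1.989e30)]:
    t = hawking_lifetime_seconds(m)
    print(f"{name:>8}: {t:.2e} s = {t / YEAR:.2e} yr")
```

Because the lifetime scales as M³, every factor of 10 in mass changes the lifetime by a factor of 1000.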

Even if it’s hard to get genuine gravitational black holes, one might wonder if there could at least be analogs that are easier to get. And in recent years there’s been some success with making “sonic black holes”—that are at least a rough analog of gravitational black holes, but where it’s sound, rather than light, that’s trapped.

OK, so we’re now finally ready to talk about creating technology with black holes. I should say at the outset that I’m not at all happy with what I’ve managed to figure out. Lots of things I thought might work turn out simply to be impossible when one looks at them in the light of actual black hole physics. And some others, while perhaps interesting, require assembling large numbers of black holes, which seems almost absurdly infeasible in our universe—given how sparse at least larger black holes seem to be, with only perhaps 10^19 spread across our whole universe.

But let’s say we just have one black hole. What can we do with it? One answer is to “bask in its time dilation”—or in some sense to use it to do “time travel to the future”.

Special Relativity already exhibits the phenomenon of time dilation, in which time runs more slowly for an object that’s moving quickly. General Relativity also messes around with the rate at which time runs. In particular, in a place with stronger gravity, time runs slower than in a place with weaker gravity. And so this means, for example, that as one goes further from the Earth, time runs slightly faster. (The clocks on GPS satellites are back-corrected for this—making them at least naively appear to “violate General Relativity”.)
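The size of the GPS effect follows from a weak-field estimate: the gravitational rate difference between a clock at orbital radius r and one on the ground is (GM/c²)(1/R_E − 1/r), while orbital motion slows the satellite clock by about v²/(2c²). The sketch below plugs in round numbers under stated assumptions (a circular orbit of radius 26,560 km, Earth's mean radius, and the ground clock's own rotation neglected), and lands close to the commonly quoted figures of roughly +45 and −7 microseconds per day.

```python
GM_EARTH = 3.986004e14   # m^3/s^2
C = 2.99792458e8         # m/s
R_EARTH = 6.371e6        # m, mean radius (assumed)
R_GPS = 2.656e7          # m, approximate GPS orbital radius (assumed)
DAY = 86400.0            # s

# Weak-field gravitational term: the higher clock runs faster.
grav = GM_EARTH / C**2 * (1 / R_EARTH - 1 / R_GPS)
# Special-relativistic slowdown for a circular orbit, where v^2 = GM/r.
kin = (GM_EARTH / R_GPS) / (2 * C**2)

grav_us = grav * DAY * 1e6
kin_us = kin * DAY * 1e6
print(f"gravity: +{grav_us:.1f} us/day, motion: -{kin_us:.1f} us/day, "
      f"net: {grav_us - kin_us:+.1f} us/day")
```

The net offset of a few tens of microseconds per day is what the satellite clocks are adjusted to cancel.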

Near a black hole, strong gravity can make time run significantly more slowly. There’s a nice example in the movie *Interstellar*, in which there’s a planet orbiting at exactly the right distance from a black hole with exactly the right parameters—so that time runs much more slowly on the planet, but other gravitational effects there aren’t too extreme.

In a sense, as soon as one has a way to make time locally run slower, one can do “time travel to the future”. For the “traveler” a month might have elapsed—but outside it could have been a century. (It’s worth mentioning that one can achieve the same kind of effect without gravity just by doing a trip in which one accelerates to close to the speed of light.)
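To get a feel for how extreme "a month versus a century" is (a slowdown factor of about 1200), here is a minimal sketch comparing the two routes under simplifying assumptions of my own: a clock hovering at fixed radius r outside a non-rotating (Schwarzschild) black hole, whose rate relative to a distant observer is √(1 − r_s/r), and a clock moving at constant speed v, whose rate is √(1 − v²/c²). The movie's setup, with a rapidly spinning black hole and an orbiting planet, is different and is not modeled here.

```python
import math

def hover_radius_for_factor(k):
    """Radius, in units of the Schwarzschild radius r_s, at which a static clock
    runs k times slower than a distant one: sqrt(1 - r_s/r) = 1/k."""
    return 1.0 / (1.0 - 1.0 / k**2)

def speed_for_factor(k):
    """Speed, as a fraction of c, whose Lorentz factor is k."""
    return math.sqrt(1.0 - 1.0 / k**2)

k = 100 * 12   # a century outside for each month experienced by the traveler
r = hover_radius_for_factor(k)
v = speed_for_factor(k)
print(f"hover at r = r_s * (1 + {r - 1:.2e})")
print(f"or travel at v = c * (1 - {1 - v:.2e})")
```

Either way one has to get extraordinarily close to the horizon or to the speed of light; hovering that close would also demand an enormous thrust, which this sketch does not address.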

Of course, even though this would allow “time travel to the future”, it would give no way to get back. For that, one would need so-called closed timelike curves, which do in principle exist in solutions to the Einstein equations (notably, the one found by Kurt Gödel), but which don’t seem to appear in any physically realizable case. (In a system determined by equations, a closed timelike curve is really less about “traveling in time” than it is about defining a consistency condition between what happens in the past and the future.)

In science fiction, black holes and related phenomena tend to be a staple of faster-than-light travel. At a more mundane level, the kind of “gravity assist” maneuvers that real spacecraft do by swinging, say, around Jupiter could be done on a much larger scale if one could swing around a black hole—where the maximum achievable velocity would be essentially the speed of light.
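In the Newtonian picture, a flyby leaves the spacecraft's speed relative to the massive body unchanged and only rotates its direction; the gain seen from outside comes from the body's own motion. The sketch below shows this with 2D velocity tuples. The function name is hypothetical, the deflection angle is taken as given rather than derived from the flyby geometry, and for a fast-moving black hole one would need relativistic velocity addition, which is what keeps the result below the speed of light.

```python
import math

def flyby(v_in, body_v, deflection):
    """Outgoing velocity after an idealized gravity assist.

    v_in, body_v: (x, y) velocities in the outside frame.
    deflection: angle (radians) by which the body bends the relative velocity.
    Newtonian; the body's own velocity is treated as unchanged."""
    rx, ry = v_in[0] - body_v[0], v_in[1] - body_v[1]       # switch to the body's frame
    c, s = math.cos(deflection), math.sin(deflection)
    rx, ry = c * rx - s * ry, s * rx + c * ry               # rotation preserves relative speed
    return rx + body_v[0], ry + body_v[1]                   # back to the outside frame

out = flyby(v_in=(-5.0, 0.0), body_v=(10.0, 0.0), deflection=math.pi)
print(out, math.hypot(*out))   # head-on "U-turn": speed goes from 5 to 2*10 + 5 = 25
```

The best case, a full reversal, adds twice the body's speed, which is why a fast-moving black hole would be such an effective slingshot.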

In General Relativity, the only way to effectively go faster than light is to modify the structure of spacetime. For example, one can imagine a “wormhole” or tube that directly connects different places in space. In General Relativity there’s no way to form such a wormhole if it doesn’t already exist—but there’s nothing to say such wormholes couldn’t already have existed at the beginning of the universe. There is a problem, though, in maintaining an “open wormhole”: the curvature of spacetime at the end would tend to create gravity that would make it collapse.

I don’t know if it can be proved that there’s no configuration of, say, orbiting black holes that would keep the wormhole open. One known way to keep it open is to introduce matter with special properties like negative energy density—which sounds implausible until you consider vacuum fluctuations in quantum field theory, inflationary-universe scenarios or dark-energy ideas.

Introducing exotic matter makes all sorts of new solutions possible for the Einstein equations. A notable example is the Alcubierre solution, which in some sense provides a different way to traverse space at any speed, effectively by warping the space.

Could there be a solution to the Einstein equations that allows something similar, without exotic matter? It hasn’t been proved that it’s impossible. And I suppose one could imagine some configuration of judiciously placed black holes that would make it possible.

It’s perhaps worth mentioning that in the models I’ve studied where the underlying structure of spacetime is a network with no predefined number of space dimensions, wormhole-like phenomena seem more natural—though insofar as the models reproduce General Relativity on large scales, this means such phenomena can’t originate on those scales.

It’s easy to generate high energies with a black hole. Matter that spirals in towards the black hole will gain energy—and indeed, around stellar and larger black holes there’s potentially an accretion disk that contains high-energy matter.
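One standard way to quantify this is the binding energy released by matter that spirals inward through circular orbits down to the innermost stable circular orbit (ISCO). For a non-rotating black hole, in units G = c = M = 1, a circular orbit at radius r has energy per unit rest mass (1 − 2/r)/√(1 − 3/r), and the ISCO sits at r = 6. The sketch below evaluates this textbook formula; it idealizes a thin disk and is not a model of any particular accretion flow.

```python
import math

def circular_orbit_energy(r):
    """Specific energy (per unit rest mass) of a circular geodesic at radius r
    around a Schwarzschild black hole, in units G = c = M = 1 (needs r > 3)."""
    return (1 - 2 / r) / math.sqrt(1 - 3 / r)

R_ISCO = 6.0
efficiency = 1 - circular_orbit_energy(R_ISCO)
print(f"fraction of rest mass radiated down to the ISCO: {efficiency:.4f}")
```

That is about 5.7% of the rest mass; for comparison, fusing hydrogen into helium releases only about 0.7%.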

With rotating black holes, there are some additional energy phenomena. In the ergosphere, objects can gain energy at the expense of the black hole’s own rotational energy (the Penrose process). This is relevant both in accelerating ordinary matter, and in producing “superradiance” where energy is added to waves, say of light, that pass through the ergosphere.
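How much energy is available to tap? For a Kerr black hole of mass M and spin parameter a (units G = c = 1, 0 ≤ a ≤ M), the irreducible mass satisfies M_irr² = M·r₊/2 with horizon radius r₊ = M + √(M² − a²), and at most M − M_irr can be extracted by processes such as the Penrose process. The sketch below evaluates that bound as a function of dimensionless spin; the function name is my own, and the bound says nothing about how closely any real process could approach it.

```python
import math

def extractable_fraction(spin):
    """Upper bound 1 - M_irr/M on the extractable fraction of a Kerr black hole's
    mass-energy, for dimensionless spin a/M in [0, 1] (units G = c = 1)."""
    if not 0.0 <= spin <= 1.0:
        raise ValueError("spin must be between 0 and 1")
    r_plus = 1.0 + math.sqrt(1.0 - spin**2)   # horizon radius in units of M
    m_irr = math.sqrt(r_plus / 2.0)            # irreducible mass in units of M
    return 1.0 - m_irr

for spin in (0.0, 0.5, 0.9, 0.998, 1.0):
    print(f"a/M = {spin}: up to {extractable_fraction(spin):.1%} extractable")
```

A non-rotating hole offers nothing by this route, while a maximally spinning one caps out at 1 − 1/√2, about 29% of its mass-energy.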

Can one do better with multiple black holes than a single one? I don’t know. Maybe there’s a configuration of orbiting black holes that’s somehow optimized for imparting energy to matter—like a kind of particle accelerator made from black holes.

We saw earlier some of the complex trajectories that three bodies interacting through gravity can follow. But what kind of trajectories can we potentially “engineer”, particularly with more bodies?

It’s not too difficult to start with approximate trajectories and then do gradient descent (e.g. in Fourier space) to try to find trajectories that actually correspond, for example, to closed orbits. So can one for example find a “gravitational crystal” that consists of an infinite regular array of interacting gravitational bodies?

There are some mathematical tricks to apply—and one ends up having to use randomized search more than systematic gradient descent—but there do seem to be gravitational crystals to be found. Here are two potential examples that show a kind of checkerboard symmetry:

I suppose a “gravitational wall” like this might be good for stopping things that approach it. With the right parameters, it might be able to capture anything (perhaps up to some speed) that tries to cross it.

Given a “gravitational crystal”, one can ask about implementing things like cellular automata on it. I don’t know how to store “bits” for cellular automaton cells in lattices like these without disrupting the lattice too much, but I suspect there’s a way. (Yes, classical gravity is reversible, so one would have to have reversible cellular automata, but there are plenty of those.)

What’s shown here is something that’s intended to be a regular, periodic “crystal”. One can also potentially imagine creating a “random crystal” in which there’s overall regularity, but at a small scale there’s seemingly random motion. If one could make such a random crystal work, then it might provide a more robust “wall”, less affected by outside perturbations.
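On the reversibility point: one standard way to get reversible cellular automata is the second-order construction, in which the next row is an ordinary elementary rule applied to the current row, XORed with the previous row. The previous row can then always be recovered from the two later ones. The sketch below (periodic boundaries; the rule number, sizes and names are illustrative choices) runs such a system forward, then runs the same rule on the swapped pair, and recovers the starting rows exactly.

```python
import random

def eca_step(row, rule):
    """One step of an elementary cellular automaton with periodic boundaries."""
    n = len(row)
    return [(rule >> (4 * row[i - 1] + 2 * row[i] + row[(i + 1) % n])) & 1
            for i in range(n)]

def second_order_step(prev, cur, rule):
    """(prev, cur) -> (cur, next) with next = rule(cur) XOR prev, an invertible map."""
    nxt = [a ^ b for a, b in zip(eca_step(cur, rule), prev)]
    return cur, nxt

def run(prev, cur, rule, steps):
    for _ in range(steps):
        prev, cur = second_order_step(prev, cur, rule)
    return prev, cur

random.seed(1)
width, rule, steps = 64, 37, 200
a0 = [random.randint(0, 1) for _ in range(width)]
a1 = [random.randint(0, 1) for _ in range(width)]

p, c = run(a0, a1, rule, steps)   # forward in time
q, d = run(c, p, rule, steps)     # swap the pair and apply the same rule again
print(d == a0 and q == a1)        # True: the initial rows are recovered exactly
```

Any of the 256 elementary rules can be made reversible this way, which is one reason there are "plenty" of reversible cellular automata to choose from.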

Modularization is an important general technique in engineering because it lets one break a problem into parts and then solve each one separately. But for gravitational systems modularization is hard, because gravity is a long-range force, dropping off only gradually with distance.

And even with spinning black holes and the like, I don’t know of any way to achieve the analog of gravitational shielding—though this changes if one introduces exotic matter that effectively has negative mass, or if, for example, every black hole has electric charge.

And without modularization, it’s surely more difficult to create something technologically useful—because in effect one has to figure out everything at once. But it’s certainly conceivable that by searching a space of possibilities one could find something—though without modularization it might look very complicated (as long-range simple programs, like combinators, tend to do), and it could be difficult even to tell what the system achieves without looking for specific properties one already knows.

Having said all this, I suspect that there are big things I am missing—and that with the right ways of thinking, there’ll end up being some spectacular kinds of technology that black holes make possible. And for all we know, once we figure this out we’ll realize that an example of it has already existed in our universe for a billion years, whether of “natural” origin or not.

But for now, the discovery of gravitational radiation from merging black holes is a remarkable example of how something like the small equation Einstein wrote down for the gravitational field a hundred years ago can lead to such elaborate consequences. It’s an impressive endorsement of the strength of theoretical science—and perhaps an inspiration to see just how small the rules might be to generate everything we see in our universe.
