It’s been very satisfying to see how successfully Wolfram|Alpha has democratized computational knowledge and how its effects have grown over the years. Now I want to do the same thing with knowledge-based programming—through the Wolfram Open Cloud.

Last week we released Wolfram Programming Lab as an environment for people to learn knowledge-based programming with the Wolfram Language. Today I’m pleased to announce that we’re making Wolfram Programming Lab available for free use on the web in the Wolfram Open Cloud.

Go to wolfram.com, and you’ll see buttons labeled “Immediate Access”. One is for Wolfram|Alpha. But now there are two more: Programming Lab and Development Platform.

Wolfram Programming Lab is for learning to program. Wolfram Development Platform (still in beta) is for doing professional software development. Go to either of these in the Wolfram Open Cloud and you’ll immediately be able to start writing and executing Wolfram Language code in an active Wolfram Language notebook.

Just as with Wolfram|Alpha, you don’t have to log in to use the Wolfram Open Cloud. And you can go pretty far like that. You can create notebook documents that involve computations, text, graphics, interactivity—and all the other things the Wolfram Language can do. You can even deploy active webpages, web apps and APIs that anyone can access and use on the web.

If you want to save things then you’ll need to set up a (free) Wolfram Cloud account. And if you want to get more serious—about computation, deployments or storage—you’ll need to have an actual subscription for Wolfram Programming Lab or Wolfram Development Platform.

But the Wolfram Open Cloud gives anyone a way to do “casual” programming whenever they want—with access to all the core computation, interface, deployment and knowledge capabilities of the Wolfram Language.

In Wolfram|Alpha, you give a single line of natural language input to get your computational knowledge output. In the Wolfram Open Cloud, the power and automation of the Wolfram Language make it possible to write remarkably small amounts of Wolfram Language code that get remarkably sophisticated operations done.

The Wolfram Open Cloud is set up for learning and prototyping and other kinds of casual use. But a great thing about the Wolfram Language is that it’s fully scalable. Start in the Wolfram Open Cloud, then scale up to the full Wolfram Cloud, or to a Wolfram Private Cloud—or instead run in Wolfram Desktop, or, for that matter, in the bundled version for Raspberry Pi computers.

I’ve been working towards what’s now the Wolfram Language for nearly 30 years, and it’s tremendously exciting now to be able to deliver it to anyone anywhere through the Wolfram Open Cloud. It takes a huge stack of technology to make this possible, but what matters most to me is what can be achieved with it.

With Wolfram Programming Lab now available through the Wolfram Open Cloud, anyone anywhere can learn and start doing the latest knowledge-based programming. Last month I published *An Elementary Introduction to the Wolfram Language* (which is free on the web); now there’s a way anyone anywhere can do all the things the book describes.

Ever since the web was young, our company has been creating large-scale public resources for it, from Wolfram MathWorld to the Wolfram Demonstrations Project to Wolfram|Alpha. Today we’re adding what may ultimately be the most significant of all: the Wolfram Open Cloud. In a sense it’s making the web into a true computing environment—in which anyone can use the power of knowledge-based programming to create whatever they want. And it’s an important step towards a world of ubiquitous knowledge-based programming, with all the opportunities that brings for so many people.

*To comment, please visit the copy of this post at the Wolfram Blog »*

That afternoon we were driving through Pasadena, California—and with no apparent concern for the actual process of driving, Feynman’s visitor was energetically pointing out all sorts of things an AI would have to figure out if it were to be able to do the driving. I was a bit relieved when we arrived at our destination, but soon the visitor was on to another topic, talking about how brains work, and then saying that as soon as he’d finished his next book he’d be happy to let someone open up his brain and put electrodes inside, if they had a good plan to figure out how it worked.

Feynman often had eccentric visitors, but I was really wondering who this one was. It took a couple more encounters, but then I got to know that eccentric visitor as Marvin Minsky, pioneer of computation and AI—and was pleased to count him as a friend for more than three decades.

Just a few days ago I was talking about visiting Marvin—and I was so sad when I heard he died. I started reminiscing about all the ways we interacted over the years, and all the interests we shared. Every major project of my life I discussed with Marvin, from SMP, my first big software system back in 1981, through Mathematica, *A New Kind of Science*, Wolfram|Alpha and most recently the Wolfram Language.

This picture is from one of the last times I saw Marvin. His health was failing, but he was keen to talk. Having watched my work for more than 35 years, he wanted to tell me his assessment: “You really did it, Steve.” Well, so did you, Marvin! (I’m always “Stephen”, but somehow Americans of a certain age have a habit of calling me “Steve”.)

The Marvin that I knew was a wonderful mixture of serious and quirky. About almost any subject he’d have something to say, most often quite unusual. Sometimes it’d be really interesting; sometimes it’d just be unusual. I’m reminded of a time in the early 1980s when I was visiting Boston and subletting an apartment from Marvin’s daughter Margaret (who was in Japan at the time). Margaret had a large and elaborate collection of plants, and one day I noticed that some of them had developed nasty-looking spots on their leaves.

Being no expert on such things (and without the web to look anything up!), I called Marvin to ask what to do. What ensued was a long discussion about the possibility of developing microrobots that could chase mealybugs away. Fascinating though it was, at the end of it I still had to ask, “But what should I *actually* do about Margaret’s plants?” Marvin replied, “Oh, I guess you’d better talk to my wife.”

For many decades, Marvin was perhaps the world’s greatest energy source for artificial intelligence research. He was a fount of ideas, which he fed to his long sequence of students at MIT. And though the details changed, he always kept true to his goal of figuring out how thinking works, and how to make machines do it.

By the time I knew Marvin, he tended to talk mostly about theories where things could be figured out by what amounts to common sense, perhaps based on psychological or philosophical reasoning. But earlier in his life, Marvin had taken a different approach. His 1954 PhD thesis from Princeton was about artificial neural networks (“Theory of Neural-Analog Reinforcement Systems and Its Application to the Brain Model Problem”) and it was a mathematics thesis, full of technical math. And in 1956, for example, Marvin published a paper entitled “Some Universal Elements for Finite Automata”, in which he talked about how “complicated machinery can be constructed from a small number of basic elements”.

This particular paper considered only essentially finite machines, based directly on specific models of artificial neural networks. But soon Marvin was looking at more general computational systems, and trying to see what they could do. In a sense, Marvin was beginning just the kind of exploration of the computational universe that years later I would also do, and eventually write *A New Kind of Science* about. And in fact, as early as 1960, Marvin came extremely close to discovering the same core phenomenon I eventually did.

In 1960, as now, Turing machines were used as a standard basic model of computation. And in his quest to understand what computation—and potentially brains—could be built from, Marvin started looking at the very simplest Turing machines (with just 2 states and 2 colors) and using a computer to find out what all 4096 of them actually do. Most, he discovered, just have repetitive behavior, and a few have what we’d now call nested or fractal behavior. But none do anything more complicated, and indeed Marvin based the final exercise in his classic 1967 book *Computation: Finite and Infinite Machines* on this, noting that “D. G. Bobrow and the author did this for all (2,2) machines [1961, unpublished] by a tedious reduction to thirty-odd cases (unpublishable).”
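As a rough modern illustration of the search Minsky and Bobrow did, here is a Python sketch that enumerates all 4096 (2,2) Turing machines and simulates one from a blank tape. The encoding and helper names are my own; this is just a way to see where the number 4096 comes from, not a reconstruction of their analysis:

```python
from itertools import product

def run_tm(rule, steps):
    """Simulate a 2-state, 2-color Turing machine from a blank tape.
    rule maps (state, color) -> (new_color, move, new_state)."""
    tape, pos, state = {}, 0, 0
    for _ in range(steps):
        color = tape.get(pos, 0)
        new_color, move, state = rule[(state, color)]
        tape[pos] = new_color
        pos += move
    return tape

# Enumerate every (2,2) machine: 4 (state, color) cases, each with
# 2 colors x 2 moves x 2 states = 8 possible actions, so 8^4 = 4096 machines.
cases = list(product(range(2), repeat=2))
actions = list(product(range(2), (-1, 1), range(2)))
machines = [dict(zip(cases, choice)) for choice in product(actions, repeat=4)]
assert len(machines) == 4096

# The same counting for (2,3) machines gives 12 actions over 6 cases:
# 12**6 = 2,985,984 -- the "3 million or so" machines mentioned below.
```

Running `run_tm` on any of these machines for a few hundred steps and plotting the tape makes the repetitive and nested behaviors easy to spot by eye.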

Years later, Marvin told me that after all the effort he’d spent on the (2,2) Turing machines he wasn’t inclined to go further. But as I finally discovered in 1991, if one just looks at (2,3) Turing machines, then among the 3 million or so of them, there are a few that don’t just show simple behavior any more—and instead generate immense complexity even from their very simple rules.

Back in the early 1960s, even though he didn’t find complexity just by searching simple “naturally occurring” Turing machines, Marvin still wanted to construct the simplest one he could that would exhibit it. And through painstaking work, he came up in 1962 with a (7,4) Turing machine that he proved was universal (and so, in a sense, capable of arbitrarily complex behavior).

At the time, Marvin’s (7,4) Turing machine was the simplest known universal Turing machine. And it kept that record essentially unbroken for 40 years—until I finally published a (2,5) universal Turing machine in *A New Kind of Science*. I felt a little guilty taking the record away from Marvin’s machine after so long. But Marvin was very nice about it. And a few years later he enthusiastically agreed to be on the committee for a prize I put up to establish whether a (2,3) Turing machine that I had identified as the simplest possible candidate for universality was in fact universal.

It didn’t take long for a proof of universality to be submitted, and Marvin got quite involved in some of the technical details of validating it, noting that perhaps we should all have known something like this was possible, given the complexity that Emil Post had observed with the simple rules of what he called a tag system—back in 1921, before Marvin was even born.
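For concreteness, here is a small Python sketch of the tag system Post studied, with productions 0 → 00 and 1 → 1101 and three symbols deleted per step. The function name and loop are my own illustration of the rules, not anything from Post's 1921 work:

```python
def tag_step(word, rules, deletion=3):
    """One step of a Post tag system: read the first symbol, append its
    production to the end, then delete `deletion` symbols from the front.
    Returns None when the word is too short to continue (the system halts)."""
    if len(word) < deletion:
        return None
    return word[deletion:] + rules[word[0]]

# Post's example: productions 0 -> 00, 1 -> 1101, deletion number 3.
rules = {"0": "00", "1": "1101"}
word, lengths = "1" * 12, []
for _ in range(100):
    lengths.append(len(word))   # watch how the word length fluctuates
    word = tag_step(word, rules)
    if word is None:
        break
```

Even from simple starting words, the sequence of word lengths grows and shrinks in a way that is hard to predict in advance, which is the complexity Post observed.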

When it came to science, it sometimes seemed as if there were two Marvins. One was the Marvin trained in mathematics who could give precise proofs of theorems. The other was the Marvin who talked about big and often quirky ideas far away from anything like mathematical formalization.

I think Marvin was ultimately disappointed with what could be achieved by mathematics and formalization. In his early years he had thought that with simple artificial neural networks—and maybe things like Turing machines—it would be easy to build systems that worked like brains. But it never seemed to happen. And in 1969, with his long-time mathematician collaborator Seymour Papert, Marvin wrote a book that proved that a certain simple class of neural networks known as perceptrons couldn’t (in Marvin’s words) “do anything interesting”.

To Marvin’s later chagrin, people took the book to show that no neural network of any kind could ever do anything interesting, and research on neural networks all but stopped. But a bit like with the (2,2) Turing machines, much richer behavior was actually lurking just out of sight. It started being noticed in the 1980s, but it’s only been in the last couple of years—with computers able to handle almost-brain-scale networks—that the richness of what neural networks can do has begun to become clear.

And although I don’t think anyone could have known it then, we now know that the neural networks Marvin was investigating as early as 1951 were actually on a path that would ultimately lead to just the kind of impressive AI capabilities he was hoping for. It’s a pity it took so long, and Marvin barely got to see it. (When we released our neural-network-based image identifier last year, I sent Marvin a pointer saying “I never thought neural networks would actually work… but…” Sadly, I never ended up talking to Marvin about it.)

Marvin’s earliest approaches to AI were through things like neural networks. But perhaps through the influence of John McCarthy, the inventor of LISP, with whom Marvin started the MIT AI Lab, Marvin began to consider more “symbolic” approaches to AI as well. And in 1961 Marvin got a student of his to write a program in LISP to do symbolic integration. Marvin told me that he wanted the program to be as “human like” as possible—so every so often it would stop and say “Give me a cookie”, and the user would have to respond “A cookie”.

By the standards of Mathematica or Wolfram|Alpha, the 1961 integration program was very primitive. But I’m certainly glad Marvin had it built. Because it started a sequence of projects at MIT that led to the MACSYMA system that I ended up using in the 1970s—that in many ways launched my efforts on SMP and eventually Mathematica.

Marvin himself, though, didn’t go on thinking about using computers to do mathematics, but instead started working on how they might do the kind of tasks that all humans—including children—routinely do. Marvin’s collaborator Seymour Papert, who had worked with developmental psychologist Jean Piaget, was interested in how children learn, and Marvin got quite involved in Seymour’s project of developing a computer language for children. The result was Logo—a direct precursor of Scratch—and for a brief while in the 1970s Marvin and Seymour had a company that tried to market Logo and a hardware “turtle” to schools.

For me there was always a certain mystique around Marvin’s theories about AI. In some ways they seemed like psychology, and in some ways philosophy. But occasionally there’d actually be pieces of software—or hardware—that claimed to implement them, often in ways that I didn’t understand very well.

Probably the most spectacular example was the Connection Machine, developed by Marvin’s student Danny Hillis and his company Thinking Machines (for which Richard Feynman and I were both consultants). It was always in the air that the Connection Machine was built to implement one of Marvin’s theories about the brain, and might be seen one day as like the “transistor of artificial intelligence”. But I, for example, ended up using its massively parallel architecture to implement cellular automaton models of fluids, and not anything AI-ish at all.

Marvin was always having new ideas and theories. And even as the Connection Machine was being built, he was giving me drafts of his book *The Society of Mind*, which talked about new and different approaches to AI. Ever one to do the unusual, Marvin told me he thought about writing the book in verse. But instead the book is structured a bit like so many conversations I had with Marvin: with one idea on each page, often good, but sometimes not—yet always lively.

I think Marvin viewed *The Society of Mind* as his magnum opus, and I think he was disappointed that more people didn’t understand and appreciate it. It probably didn’t help that the book came out in the 1980s, when AI was at its lowest ebb. But somehow I think to really appreciate what’s in the book one would need Marvin there, presenting his ideas with his characteristic personal energy and responding to any objections one might have about them.

Marvin was used to having theories about thinking that could be figured out just by thinking—a bit like the ancient philosophers had done. But Marvin was interested in everything, including physics. He wasn’t an expert on the formalism of physics, though he did make contributions to physics topics (notably patenting a confocal microscope). And through his long-time friend Ed Fredkin, he had already been introduced to cellular automata in the early 1960s. He really liked the philosophy of having physics based on them—and ended up for example writing a paper entitled “Nature Abhors an Empty Vacuum” that talked about how one might in effect engineer certain features of physics from cellular automata.

Marvin didn’t do terribly much with cellular automata, though in 1970 he and Fredkin used something like them in the Triadex Muse digital music synthesizer that they patented and marketed—an early precursor of cellular-automaton-based music composition.

Marvin was very supportive of my work on cellular automata and other simple programs, though I think he found my orientation towards natural science a bit alien. During the decade that I worked on *A New Kind of Science* I interacted with Marvin with some regularity. He was starting work on a book then too, about emotions, that he told me in 1992 he hoped “might reform how people think about themselves”. I talked to him occasionally about his book, trying I suppose to understand the epistemological character of it (I once asked if it was a bit like Freud in this respect, and he said yes). It took 15 years for Marvin to finish what became *The Emotion Machine*. I know he had other books planned too; in 2006, for example, he told me he was working on a book on theology that was “a couple of years away”—but which sadly never saw the light of day.

It was always a pleasure to see Marvin. Often it would be at his big house in Brookline, Massachusetts. As soon as one entered, Marvin would start saying something unusual. It could be, “What would we conclude if the sun didn’t set today?” Or, “You’ve got to come see the actual binary tree in my greenhouse.” Once someone told me that Marvin could give a talk about almost anything, but if one wanted it to be good, one should ask him an interesting question just before he started, and then that’d be what he would talk about. I realized this was how to handle conversations with Marvin too: bring up a topic and then he could be counted on to say something unusual and often interesting about it.

I remember a few years ago bringing up the topic of teaching programming, and how I was hoping the Wolfram Language would be relevant to it. Marvin immediately launched into talking about how programming languages are the only ones that people are expected to learn to write before they can read. He said he’d been trying to convince Seymour Papert that the best way to teach programming was to start by showing people good code. He gave the example of teaching music by giving people *Eine kleine Nachtmusik*, and asking them to transpose it to a different rhythm and see what bugs occur. (Marvin was a long-time enthusiast of classical music.) In just this vein, one way the Wolfram Programming Lab that we launched just last week lets people learn programming is by starting with good code, and then having them modify it.

There was always a certain warmth to Marvin. He liked and supported people; he connected with all sorts of interesting people; he enjoyed telling nice stories about people. His house always seemed to buzz with activity, even as, over the years, it piled up with stuff to the point where the only free space was a tiny part of a kitchen table.

Marvin also had a great love of ideas. Ones that seemed important. Ones that were strange and unusual. But I think in the end Marvin’s greatest pleasure was in connecting ideas with people. He was a hacker of ideas, but I think the ideas became meaningful to him when he used them as a way to connect with people.

I shall miss all those conversations about ideas—both ones I thought made sense and ones I thought didn’t. Of course, Marvin was always a great enthusiast of cryonics, so perhaps this isn’t the end of the story. But at least for now, farewell, Marvin, and thank you.


I’ve long wanted to have a way to let anybody—kids, adults, whoever—get a hands-on introduction to the Wolfram Language and everything it makes possible, even if they’ve had no experience with programming before. Now we have a way!

The startup screen gives four places to go. First, there’s a quick video. Then it’s hands on, with “Try It Yourself”—going through some very simple but interesting computations.

Then there are two different paths. Either start learning systematically—or jump right in and explore. My new book *An Elementary Introduction to the Wolfram Language* is the basis for the systematic approach.

The whole book is available inside Wolfram Programming Lab. And the idea is that as you read the book, you can immediately try things out for yourself—whether you’re making up your own computations, or doing the exercises given in the book.

But there’s also another way to use Wolfram Programming Lab: just jump right in and explore. Programming Lab comes with several dozen Explorations—each providing an activity with a different focus. When you open an Exploration, you see a series of steps with code ready to run.

Press Shift+Enter (or the button) to run each piece of code and see what it does—or edit the code first and then run your own version. The idea is always to start with a piece of code that works, and then modify it to do different things. It’s like you’re starting off learning to read the language; then you’re beginning to write it. You can always press the “Show Details” button to open up an explanation of what’s going on.

Each Exploration goes through a series of steps to build to a final result. But then there’s usually a “Go Further” button that gives you suggestions for free-form projects to do based on the Exploration.

When you create something neat, you can share it with your friends, teachers, or anyone else. Just press the button to create a webpage of what you’ve made.

I first started thinking about making something like Wolfram Programming Lab quite a while ago. I’d had lots of great experiences showing the Wolfram Language in person to people from middle-school-age on up. But I wanted us to find a way for people to get started with the Wolfram Language on their own.

We used our education expertise to put together a whole series of what seemed like good approaches, building prototypes and testing them with groups of kids. It was often a sobering experience—with utter failure in a matter of minutes. Sometimes the problem was that there was nothing the kids found interesting. Sometimes the kids were confused about what to do. Sometimes they’d do a little, but clearly not understand what they were doing.

At first we thought that it was just a matter of finding the one “right approach”: immersion language learning, systematic exercise-based learning, project-based learning, or something else. But gradually we realized we needed to allow not just one approach, but instead several that could be used interchangeably on different occasions or by different people. And once we did this, our tests started to be more and more successful—leading us in the end to the Wolfram Programming Lab that we have today.

I’m very excited about the potential of Wolfram Programming Lab. In fact, we’ve already started developing a whole ecosystem around it—with online and offline educational and community programs, lots of opportunities for students, educators, volunteers and others, and a whole variety of additional deployment channels.

Wolfram Programming Lab can be used by people on their own—but it can also be used by teachers in classrooms. Explain things through a demo based on an Exploration. Do a project based on a Go Further suggestion (with live coding if you’re bold). Use the *Elementary Introduction* book as the basis for lectures or independent reading. Use exercises from the book as class projects or homework.

Wolfram Programming Lab is something that’s uniquely made possible by the Wolfram Language. Because it’s only with the whole knowledge-based programming approach—and all the technology we’ve built—that one gets to the point where simple code can routinely do really interesting and compelling things.

It’s a very important—and in fact transformative—moment for programming education.

In the past one could use a “toy programming language” like Scratch, or one could use a professional low-level programming language like C++ or Java. Scratch is easy to use, but is very limited. C++ or Java can ultimately do much more (though they don’t have built-in knowledge), but you need to put in significant time—and get deep into the engineering details—to make programs that get beyond a toy level of functionality.

With the Wolfram Language, though, it’s a completely different story. Because now even beginners can write programs that do really interesting things. And the programs don’t have to just be “computer science exercises”: they can be programs that immediately connect to the real world, and to what students study across the whole curriculum.

Wolfram Programming Lab gives people a broad way to learn modern programming—and to acquire an incredibly valuable career-building practical skill. But it also helps develop the kind of computational thinking that’s increasingly central to today’s world.

For many students (and others) today, Wolfram|Alpha serves as a kind of “zeroth” programming language. The Wolfram Language is not only an incredibly powerful professional programming language, but also a great first programming language. Wolfram Programming Lab lets people learn the Wolfram Language—and computational thinking—while preserving as much as possible the accessibility and simplicity of Wolfram|Alpha.

I’m excited to see how Wolfram Programming Lab is used. I think it’s going to open up programming like never before—and give all sorts of people around the world the opportunity to join the new generation of programmers who turn ideas into reality using computational thinking and the Wolfram Language.


Ada Lovelace was born 200 years ago today. To some she is a great hero in the history of computing; to others an overestimated minor figure. I’ve been curious for a long time what the real story is. And in preparation for her bicentennial, I decided to try to solve what for me has always been the “mystery of Ada”.

It was much harder than I expected. Historians disagree. The personalities in the story are hard to read. The technology is difficult to understand. The whole story is entwined with the customs of 19th-century British high society. And there’s a surprising amount of misinformation and misinterpretation out there.

But after quite a bit of research—including going to see many original documents—I feel like I’ve finally gotten to know Ada Lovelace, and gotten a grasp on her story. In some ways it’s an ennobling and inspiring story; in some ways it’s frustrating and tragic.

It’s a complex story, and to understand it, we’ll have to start by going over quite a lot of facts and narrative.

Let’s begin at the beginning. Ada Byron, as she was then called, was born in London on December 10, 1815 to recently married high-society parents. Her father, Lord Byron (George Gordon Byron) was 27 years old, and had just achieved rock-star status in England for his poetry. Her mother, Annabella Milbanke, was a 23-year-old heiress committed to progressive causes, who inherited the title Baroness Wentworth. Her father said he gave her the name “Ada” because “It is short, ancient, vocalic”.

Ada’s parents were something of a study in opposites. Byron had a wild life—and became perhaps the top “bad boy” of the 19th century—with dark episodes in childhood, and lots of later romantic and other excesses. In addition to writing poetry and flouting the social norms of his time, he was often doing the unusual: keeping a tame bear in his college rooms in Cambridge, living it up with poets in Italy and “five peacocks on the grand staircase”, writing a grammar book of Armenian, and—had he not died too soon—leading troops in the Greek war of independence (as celebrated by a big statue in Athens), despite having no military training whatsoever.

Annabella Milbanke was an educated, religious and rather proper woman, interested in reform and good works, and nicknamed by Byron “Princess of Parallelograms”. Her very brief marriage to Byron fell apart when Ada was just 5 weeks old, and Ada never saw Byron again (though he kept a picture of her on his desk and famously mentioned her in his poetry). He died at the age of 36, at the height of his celebrityhood, when Ada was 8. There was enough scandal around him to fuel hundreds of books, and the PR battle between the supporters of Lady Byron (as Ada’s mother styled herself) and of him lasted a century or more.

Ada led an isolated childhood on her mother’s rented country estates, with governesses and tutors and her pet cat, Mrs. Puff. Her mother, often absent for various (quite wacky) health cures, enforced a system of education for Ada that involved long hours of study and exercises in self control. Ada learned history, literature, languages, geography, music, chemistry, sewing, shorthand and mathematics (taught in part through experiential methods) to the level of elementary geometry and algebra. When Ada was 11, she went with her mother and an entourage on a year-long tour of Europe. When she returned she was enthusiastically doing things like studying what she called “flyology”—and imagining how to mimic bird flight with steam-powered machines.

But then she got sick with measles (and perhaps encephalitis)—and ended up bedridden and in poor health for 3 years. She finally recovered in time to follow the custom for society girls of the period: on turning 17 she went to London for a season of socializing. On June 5, 1833, 26 days after she was “presented at Court” (i.e. met the king), she went to a party at the house of 41-year-old Charles Babbage (whose oldest son was the same age as Ada). Apparently she charmed the host, and he invited her and her mother to come back for a demonstration of his newly constructed Difference Engine: a 2-foot-high hand-cranked contraption with 2000 brass parts, now to be seen at the Science Museum in London:

Ada’s mother called it a “thinking machine”, and reported that it “raised several Nos. to the 2nd & 3rd powers, and extracted the root of a Quadratic Equation”. It would change the course of Ada’s life.

What was the story of Charles Babbage? His father was an enterprising and successful (if personally distant) goldsmith and banker. After various schools and tutors, Babbage went to Cambridge to study mathematics, but soon was intent on modernizing the way mathematics was done there, and with his lifelong friends John Herschel (son of the discoverer of Uranus) and George Peacock (later a pioneer in abstract algebra), founded the Analytical Society (which later became the Cambridge Philosophical Society) to push for reforms like replacing Newton’s (“British”) dot-based notation for calculus with Leibniz’s (“Continental”) function-based one.

Babbage graduated from Cambridge in 1814 (a year before Ada Lovelace was born), went to live in London with his new wife, and started establishing himself on the London scientific and social scene. He didn’t have a job as such, but gave public lectures on astronomy and wrote respectable if unspectacular papers about various mathematical topics (functional equations, continued products, number theory, etc.)—and was supported, if modestly, by his father and his wife’s family.

In 1819 Babbage visited France, and learned about the large-scale government project there to make logarithm and trigonometry tables. Mathematical tables were of considerable military and commercial significance in those days, being used across science, engineering and finance, as well as in areas like navigation. It was often claimed that errors in tables could make ships run aground or bridges collapse.

Back in England, Babbage and Herschel started a project to produce tables for their new Astronomical Society, and it was in the effort to check these tables that Babbage is said to have exclaimed, “I wish to God these tables had been made by steam!”—and began his lifelong effort to mechanize the production of tables.

There were mechanical calculators long before Babbage. Pascal made one in 1642, and we now know there was even one in antiquity. But in Babbage’s day such machines were still just curiosities, not reliable enough for everyday practical use. Tables were made by human computers, with the work divided across a team, and the lowest-level computations being based on evaluating polynomials (say from series expansions) using the method of differences.

What Babbage imagined is that there could be a machine—a Difference Engine—that could be set up to compute any polynomial up to a certain degree using the method of differences, and then automatically step through values and print the results, taking humans and their propensity for errors entirely out of the loop.
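To see why a purely mechanical engine was plausible, note that the method of differences reduces tabulating a polynomial to nothing but repeated addition: for a degree-d polynomial, the d-th differences are constant, so once the machine is "set up" with the initial differences, each new table entry needs only d additions. A minimal modern sketch (in Python, purely illustrative; the function name and the example polynomial x² + x + 41 are just for this demonstration, not anything from Babbage's designs):

```python
def difference_table(poly, degree, n_values):
    """Tabulate poly(0), poly(1), ... using the method of differences.

    After seeding, each table entry is produced with additions only,
    which is what made a mechanical Difference Engine feasible.
    """
    # Seed: evaluate the polynomial at 0..degree, then take forward differences.
    row = [poly(i) for i in range(degree + 1)]
    diffs = []
    for _ in range(degree + 1):
        diffs.append(row[0])  # k-th forward difference at x = 0
        row = [b - a for a, b in zip(row, row[1:])]

    # Crank the engine: each step is just degree additions.
    results = []
    for _ in range(n_values):
        results.append(diffs[0])
        for k in range(degree):
            diffs[k] += diffs[k + 1]
    return results

# Example: tabulate x^2 + x + 41 (used here only as a familiar demo polynomial)
values = difference_table(lambda x: x * x + x + 41, 2, 5)
```

The point of the setup is exactly what the text describes: a human configures the initial differences once, and the machine then "steps through values" without any further human arithmetic.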


By early 1822, the 30-year-old Babbage was busy studying different types of machinery, and producing plans and prototypes of what the Difference Engine could be. The Astronomical Society he’d co-founded awarded him a medal for the idea, and in 1823 the British government agreed to provide funding for the construction of such an engine.

Babbage was slightly distracted in 1824 by the prospect of joining a life insurance startup, for which he did a collection of life-table calculations. But he set up a workshop in his stable (his “garage”), and kept on having ideas about the Difference Engine and how its components could be made with the tools of his time.

In 1827, Babbage’s table of logarithms—computed by hand—was finally finished, and would be reprinted for nearly 100 years. Babbage had them printed on yellow paper on the theory that this would minimize user error. (When I was in elementary school, logarithm tables were still the fast way to do multiplication.)

Also in 1827, Babbage’s father died, leaving him about £100k, or perhaps $14 million today, setting up Babbage financially for the rest of his life. The same year, though, his wife died. She had had eight children with him, but only three survived to adulthood.

Dispirited by his wife’s death, Babbage took a trip to continental Europe, and being impressed by what he saw of the science being done there, wrote a book entitled *Reflections on the Decline of Science in England*, that ended up being mainly a diatribe against the Royal Society (of which he was a member).

Though often distracted, Babbage continued to work on the Difference Engine, generating thousands of pages of notes and designs. He was quite hands on when it came to personally drafting plans or doing machine-shop experiments. But he was quite hands off in managing the engineers he hired—and he did not do well at managing costs. Still, by 1832 a working prototype of a small Difference Engine (without a printer) had successfully been completed. And this is what Ada Lovelace saw in June 1833.


Ada’s encounter with the Difference Engine seems to be what ignited her interest in mathematics. She had gotten to know Mary Somerville, translator of Laplace and a well-known expositor of science—and partly with her encouragement, was soon, for example, enthusiastically studying Euclid. And in 1834, Ada went along on a philanthropic tour of mills in the north of England that her mother was doing, and was quite taken with the then-high-tech equipment they had.

On the way back, Ada taught some mathematics to the daughters of one of her mother’s friends. She continued by mail, noting that this could be “the commencement of ‘A Sentimental Mathematical Correspondence carried on for years between two ladies of rank’ to be hereafter published no doubt for the edification of mankind, or womankind”. It wasn’t sophisticated math, but what Ada said was clear, complete with admonitions like “You should never select an *indirect* proof, when a *direct* one can be given.” (There’s a lot of underlining, here shown as italics, in all Ada’s handwritten correspondence.)

Babbage seems at first to have underestimated Ada, trying to interest her in the Silver Lady automaton toy that he used as a conversation piece for his parties (and noting his addition of a turban to it). But Ada continued to interact with (as she put it) Mr. Babbage and Mrs. Somerville, both separately and together. And soon Babbage was opening up to her about many intellectual topics, as well as about the trouble he was having with the government over funding of the Difference Engine.

In the spring of 1835, when Ada was 19, she met 30-year-old William King (or, more accurately, William, Lord King). He was a friend of Mary Somerville’s son, had been educated at Eton (the same school where I went 150 years later) and Cambridge, and then had been a civil servant, most recently at an outpost of the British Empire in the Greek islands. William seems to have been a precise, conscientious and decent man, if somewhat stiff. But in any case, Ada and he hit it off, and they were married on July 8, 1835, with Ada keeping the news quiet until the last minute to avoid paparazzi-like coverage.

The next several years of Ada’s life seem to have been dominated by having three children and managing a large household—though she had some time for horse riding, learning the harp, and mathematics (including topics like spherical trigonometry). In 1837, Queen Victoria (then 18) came to the throne, and as a member of high society, Ada met her. In 1838, William was made an earl for his government work, and Ada became the Countess of Lovelace.

Within a few months of the birth of her third child in 1839, Ada decided to get more serious about mathematics again. She told Babbage she wanted to find a “mathematical Instructor” in London, though asked that in making enquiries he not mention her name, presumably for fear of society gossip.

The person identified was Augustus de Morgan, first professor of mathematics at University College London, noted logician, author of several textbooks, and not only a friend of Babbage’s, but also the husband of the daughter of Ada’s mother’s main childhood teacher. (Yes, it was a small world. De Morgan was also a friend of George Boole’s—and was the person who indirectly caused Boolean algebra to be invented.)

In Ada’s correspondence with Babbage, she showed interest in discrete mathematics, and wondered, for example, if the game of solitaire “admits of being put into a mathematical Formula, and solved”. But in keeping with the math education traditions of the time (and still today), de Morgan set Ada on studying calculus.

Her letters to de Morgan about calculus are not unlike letters from a calculus student today—except for the Victorian English. Even many of the confusions are the same—though Ada was more sensitive than some to the bad notations of calculus (“why can’t one multiply by dx?”, etc.). Ada was a tenacious student, and seemed to have had a great time learning more and more about mathematics. She was pleased by the mathematical abilities she discovered in herself, and by de Morgan’s positive feedback about them. She also continued to interact with Babbage, and on one visit to her estate (in January 1841, when she was 25), she charmingly told the then-49-year-old Babbage, “If you are a *Skater*, pray bring *Skates* to Ockham; that being the fashionable occupation here now, & one *I* have much taken to.”

Ada’s relationship with her mother was a complex one. Outwardly, Ada treated her mother with great respect. But in many ways she seems to have found her controlling and manipulative. Ada’s mother was constantly announcing that she had medical problems and might die imminently (she actually lived to age 64). And she increasingly criticized Ada for her child rearing, household management and deportment in society. But by February 6, 1841, Ada was feeling good enough about herself and her mathematics to write a very open letter to her mother about her thoughts and aspirations.

She wrote: “I believe myself to possess a most singular combination of qualities exactly fitted to make me pre-eminently a discoverer of the hidden realities of nature.” She talked of her ambition to do great things. She talked of her “insatiable & restless energy” which she believed she finally had found a purpose for. And she talked about how after 25 years she had become less “secretive & suspicious” with respect to her mother.

But then, three weeks later, her mother dropped a bombshell, claiming that before Ada was born, Byron and his half-sister had had a child together. Incest like that wasn’t actually illegal in England at the time, but it was scandalous. Ada took the whole thing very hard, and it derailed her from mathematics.

Ada had had intermittent health problems for years, but in 1841 they apparently worsened, and she started systematically taking opiates. She was very keen to excel in something, and began to get the idea that perhaps it should be music and literature rather than math. But her husband William seems to have talked her out of this, and by late 1842 she was back to doing mathematics.

What had Babbage been up to while all this had been going on? He’d been doing all sorts of things, with varying degrees of success.

After several attempts, he’d rather honorifically been appointed Lucasian Professor of Mathematics at Cambridge—but never really even spent time in Cambridge. Still, he wrote what turned out to be a fairly influential book, *On the Economy of Machinery and Manufactures*, dealing with such things as how to break up tasks in factories (an issue that had actually come up in connection with the human computation of mathematical tables).

In 1837, he weighed in on the then-popular subject of natural theology, appending his *Ninth Bridgewater Treatise* to the series of treatises written by other people. The central question was whether there is evidence of a deity from the apparent design seen in nature. Babbage’s book is quite hard to read, opening for example with, “The notions we acquire of contrivance and design arise from comparing our observations on the works of other beings with the intentions of which we are conscious in our own undertakings.”

In apparent resonance with some of my own work 150 years later, he talks about the relationship between mechanical processes, natural laws and free will. He makes statements like “computations of great complexity can be effected by mechanical means”, but then goes on to claim (with rather weak examples) that a mechanical engine can produce sequences of numbers that show unexpected changes that are like miracles.

Babbage tried his hand at politics, running for parliament twice on a manufacturing-oriented platform, but failed to get elected, partly because of claims of misuse of government funds on the Difference Engine.

Babbage also continued to have upscale parties at his large and increasingly disorganized house in London, attracting such luminaries as Charles Dickens, Charles Darwin, Florence Nightingale, Michael Faraday and the Duke of Wellington—with his aged mother regularly in attendance. But even though the degrees and honors that he listed after his name ran to 6 lines, he was increasingly bitter about his perceived lack of recognition.

Central to this was what had happened with the Difference Engine. Babbage had hired one of the leading engineers of his day to actually build the engine. But somehow, after a decade of work—and despite lots of precision machine tool development—the actual engine wasn’t done. Back in 1833, shortly after he met Ada, Babbage had tried to rein in the project—but the result was that his engineer quit, and insisted that he got to keep all the plans for the Difference Engine, even the ones that Babbage himself had drawn.

But right around this time, Babbage decided he’d had a better idea anyway. Instead of making a machine that would just compute differences, he imagined an “Analytical Engine” that supported a whole list of possible kinds of operations, that could in effect be done in an arbitrarily programmed sequence. At first, he just thought about having the machine evaluate fixed formulas, but as he studied different use cases, he added other capabilities, like conditionals—and figured out often very clever ways to implement them mechanically. But, most important, he figured out how to control the steps in a computation using punched cards of the kind that had been invented in 1801 by Jacquard for specifying patterns of weaving on looms.
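The conceptual leap can be illustrated with a toy sketch (my own construction, emphatically not Babbage's design): a fixed sequence of "operation cards" drives arithmetic on a store of named variables, and a conditional card can redirect the sequence—the essence of what separated the programmable Analytical Engine from the fixed-function Difference Engine:

```python
def run(cards, registers):
    """Execute a list of (operation, target, source) 'cards' in order.

    A toy model of card-driven control: 'add' and 'mul' operate on the
    store of registers; 'jump_if_zero' is a conditional that redirects
    the card sequence (Babbage's engine handled conditionals quite
    differently in its mechanics; this only illustrates the concept).
    """
    pc = 0  # index of the current card
    while pc < len(cards):
        op, a, b = cards[pc]
        if op == "add":
            registers[a] += registers[b]
        elif op == "mul":
            registers[a] *= registers[b]
        elif op == "jump_if_zero":
            if registers[a] == 0:
                pc = b  # jump to card b instead of the next card
                continue
        pc += 1
    return registers

# Compute (2 + 3)^2 by sequencing two cards over a store of variables
result = run([("add", "v0", "v1"), ("mul", "v0", "v0")],
             {"v0": 2, "v1": 3})
```

The design point is that the *same* machine executes whatever card sequence it is given—which is exactly why the Jacquard-loom analogy mattered.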


Babbage created some immensely complicated designs, and today it seems remarkable that they could work. But back in 1826 Babbage had invented something he called Mechanical Notation—that was intended to provide a symbolic representation for the operation of machinery in the same kind of way that mathematical notation provides a symbolic representation for operations in mathematics.

Babbage was disappointed already in 1826 that people didn’t appreciate his invention. Undoubtedly people didn’t understand it, since even now it’s not clear how it worked. But it may have been Babbage’s greatest invention—because apparently it’s what let him figure out all his elaborate designs.

Babbage’s original Difference Engine project had cost the British government £17,500 or the equivalent of perhaps $2 million today. It was a modest sum relative to other government expenditures, but the project was unusual enough to lead to a fair amount of discussion. Babbage was fond of emphasizing that—unlike many of his contemporaries—he hadn’t taken government money himself (despite chargebacks for renovating his stable as a fireproof workshop, etc.). He also claimed that he eventually spent £20,000 of his own money—or the majority of his fortune (no, I don’t see how the numbers add up)—on his various projects. And he kept on trying to get further government support, and created plans for a Difference Engine No. 2, requiring only 8000 parts instead of 25,000.

By 1842, the government had changed, and Babbage insisted on meeting with the new prime minister (Robert Peel), but ended up just berating him. In parliament the idea of funding the Difference Engine was finally killed with quips like that the machine should be set to compute when it would be of use. (The transcripts of debates about the Difference Engine have a certain charm—especially when they discuss its possible uses for state statistics that strangely parallel computable-country opportunities with Wolfram|Alpha today.)

Despite the lack of support in England, Babbage’s ideas developed some popularity elsewhere, and in 1840 Babbage was invited to lecture on the Analytical Engine in Turin, and given honors by the Italian government.

Babbage had never published a serious account of the Difference Engine, and had never published anything at all about the Analytical Engine. But he talked about the Analytical Engine in Turin, and notes were taken by a certain Luigi Menabrea, who was then a 30-year-old army engineer—but who, 27 years later, became prime minister of Italy (and also made contributions to the mathematics of structural analysis).

In October 1842, Menabrea published a paper in French based on his notes. When Ada saw the paper, she decided to translate it into English and submit it to a British publication. Many years later Babbage claimed he suggested to Ada that she write her own account of the Analytical Engine, and that she had responded that the thought hadn’t occurred to her. But in any case, by February 1843, Ada had resolved to do the translation but add extensive notes of her own.

Over the months that followed she worked very hard—often exchanging letters almost daily with Babbage (despite sometimes having other “pressing and unavoidable engagements”). And though in those days letters were sent by post (which did come 6 times a day in London at the time) or carried by a servant (Ada lived about a mile from Babbage when she was in London), they read a lot like emails about a project might today, apart from being in Victorian English. Ada asks Babbage questions; he responds; she figures things out; he comments on them. She was clearly in charge, but felt she was first and foremost explaining Babbage’s work, so wanted to check things with him—though she got annoyed when Babbage, for example, tried to make his own corrections to her manuscript.

It’s charming to read Ada’s letter as she works on debugging her computation of Bernoulli numbers: “My Dear Babbage. I am in much dismay at having got into so amazing a quagmire & botheration with these *Numbers*, that I cannot possibly get the thing done today. …. I am now going out on horseback. Tant mieux.” Later she told Babbage: “I have worked incessantly, & most successfully, all day. You will admire the Table & Diagram extremely. They have been made out with extreme care, & all the indices most minutely & scrupulously attended to.” Then she added that William (or “Lord L.” as she referred to him) “is at this moment kindly inking it all over for me. I had to do it in pencil…”
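For context on what Ada was wrestling with: the Bernoulli numbers can be generated by a recurrence in which each number is an arithmetic combination of all the earlier ones, which is what made them a good showcase for a machine that could chain results from one computation into the next. A modern sketch using the standard recurrence (a contemporary formulation for illustration, not the exact scheme of Ada's published table):

```python
from fractions import Fraction
from math import comb

def bernoulli(n):
    """Return the Bernoulli numbers B_0..B_n (convention B_1 = -1/2).

    Uses the standard recurrence
        sum_{k=0}^{m} C(m+1, k) * B_k = 0   for m >= 1,
    solved for B_m: each new number depends on all previous ones.
    """
    B = [Fraction(1)]  # B_0 = 1
    for m in range(1, n + 1):
        s = sum(comb(m + 1, k) * B[k] for k in range(m))
        B.append(-s / (m + 1))
    return B

# First few values: 1, -1/2, 1/6, 0, -1/30, ...
numbers = bernoulli(4)
```

Exact rational arithmetic (`Fraction`) is used here because the Bernoulli numbers are fractions whose numerators and denominators grow quickly—one reason checking such a table by hand was genuinely laborious.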

William was also apparently the one who suggested that she sign the translation and notes. As she wrote to Babbage: “It is not my wish to *proclaim* who has written it; at the same time I rather wish to append anything that may tend hereafter to *individualize*, & *identify* it, with the other productions of the said A.A.L.” (for “Augusta Ada Lovelace”).

By the end of July 1843, Ada had pretty much finished writing her notes. She was proud of them, and Babbage was complimentary about them. But Babbage wanted one more thing: he wanted to add an anonymous preface (written by him) that explained how the British government had failed to support the project. Ada thought it a bad idea. Babbage tried to insist, even suggesting that without the preface the whole publication should be withdrawn. Ada was furious, and told Babbage so. In the end, Ada’s translation appeared, signed “AAL”, without the preface, followed by her notes headed “Translator’s Note”.

Ada was clearly excited about it, sending reprints to her mother, and explaining that “No one can estimate the trouble & *interminable* labour of having to revise the printing of *mathematical* formulae. This is a pleasant prospect for the future, as I suppose many hundreds & thousands of such formulae will come forth from my pen, in one way or another.” She said that her husband William had been excitedly giving away copies to his friends too, and Ada wrote, “William especially conceives that it places me in a much *juster* & *truer* position & light, than anything else can. And he tells me that it has already placed *him* in a far more agreeable position in this country.”

Within days, there was also apparently society gossip about Ada’s publication. She explained to her mother that she and William “are by no means desirous of making it a secret, altho’ I do not wish the *importance* of the thing to be exaggerated and overrated”. She saw herself as being a successful expositor and interpreter of Babbage’s work, setting it in a broader conceptual framework that she hoped could be built on.

There’s lots to say about the actual content of Ada’s notes. But before we get to that, let’s finish the story of Ada herself.

While Babbage’s preface wasn’t itself a great idea, one good thing it did for posterity was to cause Ada on August 14, 1843 to write Babbage a fascinating, and very forthright, 16-page letter. (Unlike her usual letters, which were on little folded pages, this was on large sheets.) In it, she explains that while he is often “implicit” in what he says, she is herself “always a very ‘explicit function of x’”. She says that “Your affairs have been, & are, deeply occupying both myself and Lord Lovelace…. And the result is that I have plans for you…” Then she proceeds to ask, “If I am to lay before you in the course of a year or two, explicit & honorable propositions for *executing your engine* … would there be any chance of allowing myself … to conduct the business for you; your own *undivided* energies being devoted to the execution of the work …”

In other words, she basically proposed to take on the role of CEO, with Babbage becoming CTO. It wasn’t an easy pitch to make, especially given Babbage’s personality. But she was skillful in making her case, and as part of it, she discussed their different motivation structures. She wrote, “My own uncompromising principle is to endeavour to love *truth & God before fame & glory* …”, while “Yours is to love truth & God … but to love *fame, glory, honours, yet more*.” Still, she explained, “Far be it from me, to disclaim the influence of ambition & fame. No living soul ever was more imbued with it than myself … but I certainly would not deceive myself or others by pretending it is other than a very important motive & ingredient in my character & nature.”

She ended the letter, “I wonder if you will choose to retain the lady-fairy in your service or not.”

At noon the next day she wrote to Babbage again, asking if he would help in “the *final* revision”. Then she added, “You will have had my long letter this morning. Perhaps you will not choose to have anything more to do with me. But I hope the best…”

At 5 pm that day, Ada was in London, and wrote to her mother: “I am uncertain as yet how the Babbage business will end…. I have written to him … very explicitly; stating my own *conditions* … He has so strong an idea of the *advantage* of having *my* pen as his servant, that he will probably yield; though I demand very strong concessions. If he *does* consent to what I propose, I shall probably be enabled to keep him out of much hot water; & to bring his engine to *consummation*, (which all I have seen of him & his habits the last 3 months, makes me scarcely anticipate it ever *will* be, unless someone really exercises a strong coercive influence over him). He is beyond measure *careless* & *desultory* at times. — I shall be willing to be his Whipper-in during the next 3 years if I see fair prospect of success.”

But on Babbage’s copy of Ada’s letter, he scribbled, “Saw A.A.L. this morning and refused all the conditions”.

Yet on August 18, Babbage wrote to Ada about bringing drawings and papers when he would next come to visit her. The next week, Ada wrote to Babbage that “We are quite delighted at your (somewhat *unhoped* for) proposal” [of a long visit with Ada and her husband]. And Ada wrote to her mother: “Babbage & I are I think more friends than ever. I have never seen him so agreeable, so reasonable, or in such good spirits!”

Then, on Sept. 9, Babbage wrote to Ada, expressing his admiration for her and (famously) describing her as “Enchantress of Number” and “my dear and much admired Interpreter”. (Yes, despite what’s often quoted, he wrote “Number” not “Numbers”.)

The next day, Ada responded to Babbage, “You are a brave man to give yourself wholly up to Fairy-Guidance!”, and Babbage signed off on his next letter as “Your faithful Slave”. And Ada described herself to her mother as serving as the “High-Priestess of Babbage’s Engine”.

But unfortunately that’s not how things worked out. For a while it was just that Ada had to take care of household and family things that she’d neglected while concentrating on her Notes. But then her health collapsed, and she spent many months going between doctors and various “cures” (her mother suggested “mesmerism”, i.e. hypnosis), all the while watching their effects on, as she put it, “that portion of the material forces of the world entitled the body of A.A.L.”

She was still excited about science, though. She’d interacted with Michael Faraday, who apparently referred to her as “the *rising star* of Science”. She talked about her first publication as her “first-born”, “with a colouring & undercurrent (rather *hinted at* & *suggested* than definitely expressed) of *large, general, & metaphysical views*”, and said that “He [the publication] will make an excellent head (I hope) of a large family of brothers & sisters”.

When her notes were published, Babbage had said “You should have written an original paper. The postponement of that will however only render it more perfect.” But by October 1844, it seemed that David Brewster (inventor of the kaleidoscope, among other things) would write about the Analytical Engine, and Ada asked if perhaps Brewster could suggest another topic for her, saying “I rather think some physiological topics would suit me as well as any.”

And indeed later that year, she wrote to a friend (who was also her lawyer, as well as being Mary Somerville’s son): “It does not appear to me that cerebral matter need be more unmanageable to mathematicians than *sidereal* & *planetary* matter & movements; if they would but inspect it from the *right point of view*. I hope to bequeath to the generations a *Calculus of the Nervous System*.” An impressive vision—coming 10 years before, for example, George Boole would talk about similar things.

Both Babbage and Mary Somerville had started their scientific publishing careers with translations, and Ada saw herself as doing the same, saying that perhaps her next works would be reviews of Whewell and Ohm, and that she might eventually become a general “prophet of science”.

There were roadblocks, to be sure. Like that, at that time, as a woman, she couldn’t get access to the Royal Society’s library in London, even though her husband, partly through her efforts, was a member of the society. But the most serious issue was still Ada’s health. She had a whole series of problems, though in 1846 she was still saying optimistically “Nothing is needed but a year or two more of patience & *cure*”.

There were also problems with money. William had a never-ending series of elaborate—and often quite innovative—construction projects (he seems to have been particularly keen on towers and tunnels). And to finance them, they had to turn to Ada’s mother, who often made things difficult. Ada’s children were also approaching teenage-hood, and Ada was exercised by many issues that were coming up with them.

Meanwhile, she continued to have a good social relationship with Babbage, seeing him with considerable frequency, though in her letters talking more about dogs and pet parrots than the Analytical Engine. In 1848 Babbage developed a hare-brained scheme to construct an engine that played tic-tac-toe, and to tour it around the country as a way to raise money for his projects. Ada talked him out of it. The idea was raised for Babbage to meet Prince Albert to discuss his engines, but it never happened.

William also dipped his toe into publishing. He had already written short reports with titles like “Method of growing Beans and Cabbages on the same Ground” and “On the Culture of Mangold-Wurzel”. But in 1848 he wrote one more substantial piece, comparing the productivity of agriculture in France and England, based on detailed statistics, with observations like “It is demonstrable, not only that the Frenchman is much worse off than the Englishman, but that he is less well fed than during the devastating exhaustion of the empire.”

1850 was a notable year for Ada. She and William moved into a new house in London, intensifying their exposure to the London scientific social scene. She had a highly emotional experience visiting for the first time her father’s family’s former estate in the north of England—and got into an argument with her mother about it. And she got more deeply involved in betting on horseracing, and lost some money doing it. (It wouldn’t have been out of character for Babbage or her to invent some mathematical scheme for betting, but there’s no evidence that they did.)

In May 1851 the Great Exhibition opened at the Crystal Palace in London. (When Ada visited the site back in January, Babbage advised, “Pray put on worsted stockings, cork soles and every other thing which can keep you warm.”) The exhibition was a high point of Victorian science and technology, and Ada, Babbage and their scientific social circle were all involved (though Babbage less so than he thought he should be). Babbage gave out many copies of a flyer on his Mechanical Notation. William won an award for brick-making.

But within a year, Ada’s health situation was dire. For a while her doctors were just telling her to spend more time at the seaside. But eventually they admitted she had cancer (from what we know now, probably cervical cancer). Opium no longer controlled her pain; she experimented with cannabis. By August 1852, she wrote, “I begin to understand Death; which is going on quietly & gradually every minute, & will never be a thing of one particular moment”. And on August 19, she asked Babbage’s friend Charles Dickens to visit and read her an account of death from one of his books.

Her mother moved into her house, keeping other people away from her, and on September 1, Ada made an unknown confession that apparently upset William. She seemed close to death, but she hung on, in great pain, for nearly 3 more months, finally dying on November 27, 1852, at the age of 36. Florence Nightingale, nursing pioneer and friend of Ada’s, wrote: “They said she could not possibly have lived so long, were it not for the tremendous vitality of the brain, that would not die.”

Ada had made Babbage the executor of her will. And—much to her mother’s chagrin—she had herself buried in the Byron family vault next to her father, who, like her, died at age 36 (Ada lived 254 days longer). Her mother built a memorial that included a sonnet entitled “The Rainbow” that Ada wrote.

Ada’s funeral was small; neither her mother nor Babbage attended. But the obituaries were kind, if Victorian in their sentiments.

William outlived her by 41 years, eventually remarrying. Her oldest son—with whom Ada had many difficulties—joined the navy several years before she died, but deserted. Ada thought he might have gone to America (he was apparently in San Francisco in 1851), but in fact he died at 26 working in a shipyard in England. Ada’s daughter married a somewhat wild poet, spent many years in the Middle East, and became the world’s foremost breeder of Arabian horses. Ada’s youngest son inherited the family title, and spent most of his life on the family estate.

Ada’s mother died in 1860, but even then the gossip about her and Byron continued, with books and articles appearing, including Harriet Beecher Stowe’s 1870 *Lady Byron Vindicated*. In 1905, a year before he died, Ada’s youngest son—who had been largely brought up by Ada’s mother—published a book about the whole thing, with such choice lines as “Lord Byron’s life contained nothing of any interest except what ought not to have been told”.

When Ada died, there was a certain air of scandal that seemed to hang around her. Had she had affairs? Had she run up huge gambling debts? There’s scant evidence of either. Perhaps it was a reflection of her father’s “bad boy” image. But before long there were claims that she’d pawned the family jewels (twice!), or lost, some said, £20,000, or maybe even £40,000 (equivalent to about $7 million today) betting on horses.

It didn’t help that Ada’s mother and her youngest son both seemed against her. On September 1, 1852—the same day as her confession to William—Ada had written, “It is my earnest and dying request that all my friends who have letters from me will deliver them to my mother Lady Noel Byron after my death.” Babbage refused. But others complied, and, later on, when her son organized them, he destroyed some.

But many thousands of pages of Ada’s documents still remain, scattered around the world. Back-and-forth letters that read like a modern text stream, setting up meetings, or mentioning colds and other ailments. Charles Babbage complaining about the postal service. Three Greek sisters seeking money from Ada because their dead brother had been a page for Lord Byron. Charles Dickens talking about chamomile tea. Pleasantries from a person Ada met at Paddington Station. And household accounts, with entries for note paper, musicians, and ginger biscuits. And then, mixed in with all the rest, serious intellectual discussion about the Analytical Engine and many other things.

So what happened to Babbage? He lived 18 more years after Ada, dying in 1871. He tried working on the Analytical Engine again in 1856, but made no great progress. He wrote papers with titles like “On the Statistics of Light-Houses”, “Table of the Relative Frequency of Occurrences of the Causes of Breaking Plate-Glass Windows”, and “On Remains of Human Art, mixed with the Bones of Extinct Races of Animals”.

Then in 1864 he published his autobiography, *Passages from the Life of a Philosopher*—a strange and rather bitter document. The chapter on the Analytical Engine opens with a quote from a poem by Byron—“Man wrongs, and Time avenges”—and goes on from there. There are chapters on “Theatrical experience”, “Hints for travellers” (including advice on how to get an RV-like carriage in Europe), and, perhaps most peculiar, “Street nuisances”. For some reason Babbage waged a campaign against street musicians, who he claimed woke him up at 6 am and caused him to lose a quarter of his productive time. One wonders why he didn’t invent a sound-masking solution, but his campaign was so notable, and so odd, that when he died it was a leading element of his obituary.

Babbage never remarried after his wife died, and his last years seem to have been lonely ones. A gossip column of the time records impressions of him:

Apparently he was fond of saying that he would gladly give up the remainder of his life if he could spend just 3 days 500 years in the future. When he died, his brain was preserved, and is still on display…

Even though Babbage never finished his Difference Engine, a Swedish company did—and had even displayed part of it at the Great Exhibition. When Babbage died, many documents and spare parts from his Difference Engine project passed to his son Major-General Henry Babbage, who published some of the documents, and privately assembled a few more devices, including part of the Mill for the Analytical Engine. Meanwhile, the fragment of the Difference Engine that had been built in Babbage’s time was deposited at the Science Museum in London.

After Babbage died, his life work on his engines was all but forgotten (though it did, for example, get a mention in the 1911 Encyclopaedia Britannica). Mechanical computers nevertheless continued to be developed, gradually giving way to electromechanical ones, and eventually to electronic ones. And when programming began to be understood in the 1940s, Babbage’s work—and Ada’s Notes—were rediscovered.

People knew that “AAL” was Augusta Ada Lovelace, and that she was Byron’s daughter. Alan Turing read her Notes, and coined the term “Lady Lovelace’s Objection” (“an AI can’t originate anything”) in his 1950 Turing Test paper. But Ada herself was still largely a footnote at that point.

It was a certain Bertram Bowden—a British nuclear physicist who went into the computer industry and eventually became Minister of Science and Education—who “rediscovered” Ada. In researching his 1953 book *Faster Than Thought* (yes, about computers), he located Ada’s granddaughter Lady Wentworth (the daughter of Ada’s daughter), who told him the family lore about Ada, both accurate and inaccurate, and let him look at some of Ada’s papers. Charmingly, Bowden notes that in Ada’s granddaughter’s book *Thoroughbred Racing Stock*, there is use of binary in computing pedigrees. Ada, and the Analytical Engine, of course, used decimal, with no binary in sight.

But even in the 1960s, Babbage—and Ada—weren’t exactly well known. Babbage’s Difference Engine prototype had been given to the Science Museum in London, but even though I spent lots of time at the Science Museum as a child in the 1960s, I’m pretty sure I never saw it there. Still, by the 1980s, particularly after the US Department of Defense named its ill-fated programming language after Ada, awareness of Ada Lovelace and Charles Babbage began to increase, and biographies began to appear, though sometimes with hair-raising errors (my favorite is that the mention of “the problem of three bodies” in a letter from Babbage indicated a romantic triangle between Babbage, Ada and William—while it actually refers to the three-body problem in celestial mechanics!).

As interest in Babbage and Ada increased, so did curiosity about whether the Difference Engine would actually have worked if it had been built from Babbage’s plans. A project was mounted, and in 2002, after a heroic effort, a complete Difference Engine was built, with only one correction in the plans being made. Amazingly, the machine worked. Building it cost about the same, inflation adjusted, as Babbage had requested from the British government back in 1823.

What about the Analytical Engine? So far, no real version of it has ever been built—or even fully simulated.

OK, so now that I’ve talked (at length) about the life of Ada Lovelace, what about the actual content of her Notes on the Analytical Engine?

They start crisply: “The particular function whose integral the Difference Engine was constructed to tabulate, is …”. She then explains that the Difference Engine can compute values of any 6th degree polynomial—but the Analytical Engine is different, because it can perform any sequence of operations. Or, as she says: “The Analytical Engine is an *embodying of the science of operations*, constructed with peculiar reference to abstract number as the subject of those operations. The Difference Engine is the embodying of one particular and very limited set of operations…”

Charmingly, at least for me, considering the years I have spent working on Mathematica, she continues at a later point: “We may consider the engine as the *material and mechanical representative of analysis*, and that our actual working powers in this department of human study will be enabled more effectually than heretofore to keep pace with our theoretical knowledge of its principles and laws, through the complete control which the engine gives us over the executive manipulation of algebraical and numerical symbols.”

A little later, she explains that punched cards are how the Analytical Engine is controlled, and then makes the classic statement that “the Analytical Engine *weaves algebraical patterns* just as the Jacquard-loom weaves flowers and leaves”.

Ada then goes through how a sequence of specific kinds of computations would work on the Analytical Engine, with “Operation Cards” defining the operations to be done, and “Variable Cards” defining the locations of values. Ada talks about “cycles” and “cycles of cycles, etc”, now known as loops and nested loops, giving a mathematical notation for them:

There’s a lot of modern-seeming content in Ada’s notes. She comments that “There is in existence a beautiful woven portrait of Jacquard, in the fabrication of which 24,000 cards were required.” Then she discusses the idea of using loops to reduce the number of cards needed, and the value of rearranging operations to optimize their execution on the Analytical Engine, ultimately showing that just 3 cards could do what might seem like it should require 330.

Ada talks about just how far the Analytical Engine can go in computing what was previously not computable, at least with any accuracy. And as an example she discusses the three-body problem, and the fact that in her time, of “about 295 coefficients of lunar perturbations” there were many on which different people’s computations didn’t agree.

Finally comes Ada’s Note G. Early on, she states: “The Analytical Engine has no pretensions whatever to *originate* anything. It can do whatever we *know how to order it* to perform. … Its province is to assist us in making available what we are already acquainted with.”

Ada seems to have understood with some clarity the traditional view of programming: that we engineer programs to do things we know how to do. But she also notes that in actually putting “the truths and the formulae of analysis” into a form amenable to the engine, “the nature of many subjects in that science are necessarily thrown into new lights, and more profoundly investigated.” In other words—as I often point out—actually programming something inevitably lets one do more exploration of it.

She goes on to say that “in devising for mathematical truths a new form in which to record and throw themselves out for actual use, views are likely to be induced, which should again react on the more theoretical phase of the subject”, or in other words—as I have also often said—representing mathematical truths in a computable form is likely to help one understand those truths themselves better.

Ada seems to have understood, though, that the “science of operations” implemented by the engine would not only apply to traditional mathematical operations. For example, she notes that if “the fundamental relations of pitched sounds in the science of harmony” were amenable to abstract operations, then the engine could use them to “compose elaborate and scientific pieces of music of any degree of complexity or extent”. Not a bad level of understanding for 1843.

What’s become the most famous part of what Ada wrote is the computation of Bernoulli numbers, in Note G. This seems to have come out of a letter she wrote to Babbage, in July 1843. She begins the letter with “I am working very hard for you; like the Devil in fact; (which perhaps I am)”. Then she asks for some specific references, and finally ends with, “I want to put in something about Bernoulli’s Numbers, in one of my Notes, as an example of how an implicit function may be worked out by the engine, without having been worked out by human head & hands first…. Give me the necessary data & formulae.”

Ada’s choice of Bernoulli numbers to show off the Analytical Engine was an interesting one. Back in the 1600s, people spent their lives making tables of sums of powers of integers—in other words, tabulating values of 1^{n} + 2^{n} + ⋯ + *m*^{n} for different *m* and *n*. But Jakob Bernoulli pointed out that all such sums can be expressed as polynomials in *m*, with the coefficients being related to what are now called Bernoulli numbers. And in 1713 Bernoulli was proud to say that he’d computed the first 10 Bernoulli numbers “in a quarter of an hour”—reproducing years of other people’s work.

Today, of course, it’s instantaneous to do the computation in the Wolfram Language:

And, as it happens, a few years ago, just to show off new algorithms, we even computed 10 million of them.

But, OK, so how did Ada plan to do it? She started from the fact that Bernoulli numbers appear in the series expansion of *x*/(*e*^{x} − 1):

Then by rearranging this and matching up powers of *x*, she got a sequence of equations for the Bernoulli numbers *B*_{n}—which she then “unravelled” to give a recurrence relation of the form:

Now Ada had to specify how to actually compute this on the Analytical Engine. First, she used the fact that odd Bernoulli numbers (other than *B*_{1}) are zero; then she computed her *B*_{2*n*−1}, which is our modern *B*_{2*n*}:

On the Analytical Engine, the idea was to have a sequence of operations (specified by “Operation Cards”) performed by the “Mill”, with operands coming from the “Store” (with addresses specified by “Variable Cards”). (In the Store, each number was represented by a sequence of wheels, each turned to the appropriate value for each digit.) To compute Bernoulli numbers the way Ada wanted takes two nested loops of operations. With the Analytical Engine design that existed at the time, Ada had to basically unroll these loops. But in the end she successfully produced a description of how *B*_{8} (which she called *B*_{7}) could be computed:

This is effectively the execution trace of a program that runs for 25 steps (plus a loop) on the Analytical Engine. At each step, the trace shows what operation is performed on which Variable Cards, and which Variable Cards receive the results. Lacking a symbolic notation for loops, Ada just indicated loops in the execution trace using braces, noting in English that parts are repeated.

And in the end, the final result of the computation appears in location 24:

As it’s printed, there’s a bug in Ada’s execution trace on line 4: the fraction is upside down. But if you fix that, it’s easy to get a modern version of what Ada did:

And here’s what the same scheme gives for the next two (nonzero) Bernoulli numbers. As Ada figured out, it doesn’t ultimately take any more storage locations (specified by Variable Cards) to compute higher Bernoulli numbers—just more operations.

The Analytical Engine, as it was designed in 1843, was supposed to store 1000 40-digit numbers, which would in principle have allowed it to compute up to perhaps *B*_{50} (=495057205241079648212477525/66). It would have been reasonably fast too; the Analytical Engine was intended to do about 7 operations per second. So Ada’s *B*_{8} would have taken about 5 seconds and *B*_{50} would have taken perhaps a minute.
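As a modern cross-check of the numbers quoted above, here is a short sketch—in Python with exact fractions, using the standard Bernoulli recurrence rather than Ada’s card-by-card unrolling—that reproduces both Ada’s *B*_{8} and the *B*_{50} value mentioned:

```python
from fractions import Fraction
from math import comb

def bernoulli(n):
    """Bernoulli numbers B_0..B_n (convention B_1 = -1/2) via the
    standard recurrence  sum_{j=0}^{m} C(m+1, j) * B_j = 0  for m >= 1,
    with B_0 = 1."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        s = sum(comb(m + 1, j) * B[j] for j in range(m))
        B.append(Fraction(-s, m + 1))
    return B

B = bernoulli(50)
print(B[8])   # -1/30, the value Ada's trace computes (her B_7)
print(B[50])  # 495057205241079648212477525/66
```

The nested sum here plays the role of Ada’s inner loop over previously computed Bernoulli numbers; the outer loop over *m* corresponds to her “cycle of cycles”.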

Curiously, even in our record-breaking computation of Bernoulli numbers a few years ago, we were basically using the same algorithm as Ada—though now there are slightly faster algorithms that effectively compute Bernoulli number numerators modulo a sequence of primes, then reconstruct the full numbers using the Chinese Remainder Theorem.

The Analytical Engine and its construction were all Babbage’s work. So what did Ada add? Ada saw herself first and foremost as an expositor. Babbage had shown her lots of plans and examples of the Analytical Engine. She wanted to explain what the overall point was—as well as relate it, as she put it, to “large, general, & metaphysical views”.

In the surviving archive of Babbage’s papers (discovered years later in his lawyer’s family’s cowhide trunk), there are a remarkable number of drafts of expositions of the Analytical Engine, starting in the 1830s, and continuing for decades, with titles like “Of the Analytical Engine” and “The Science of Number Reduced to Mechanism”. Why Babbage never published any of these isn’t clear. They seem like perfectly decent descriptions of the basic operation of the engine—though they are definitely more pedestrian than what Ada produced.

When Babbage died, he was writing a “History of the Analytical Engine”, which his son completed. In it, there’s a dated list of “446 Notations of the Analytical Engine”, each essentially a representation of how some operation—like division—could be done on the Analytical Engine. The dates start in the 1830s, and run through the mid-1840s, with not much happening in the summer of 1843.

Meanwhile, in the collection of Babbage’s papers at the Science Museum, there are some sketches of higher-level operations on the Analytical Engine. For example, from 1837 there’s “Elimination between two equations of the first degree”—essentially the evaluation of a rational function:

There are a few very simple recurrence relations:

Then from 1838, there’s a computation of the coefficients in the product of two polynomials:

But there’s nothing as sophisticated—or as clean—as Ada’s computation of the Bernoulli numbers. Babbage certainly helped and commented on Ada’s work, but she was definitely the driver of it.

So what did Babbage say about that? In his autobiography written 26 years later, he had a hard time saying anything nice about anyone or anything. About Ada’s Notes, he writes: “We discussed together the various illustrations that might be introduced: I suggested several, but the selection was entirely her own. So also was the algebraic working out of the different problems, except, indeed, that relating to the numbers of Bernoulli, which I had offered to do to save Lady Lovelace the trouble. This she sent back to me for an amendment, having detected a grave mistake which I had made in the process.”

When I first read this, I thought Babbage was saying that he basically ghostwrote all of Ada’s Notes. But reading what he wrote again, I realize it actually says almost nothing, other than that he suggested things that Ada may or may not have used.

To me, there’s little doubt about what happened: Ada had an idea of what the Analytical Engine should be capable of, and was asking Babbage questions about how it could be achieved. If my own experiences with hardware designers in modern times are anything to go by, the answers will often have been very detailed. Ada’s achievement was to distill from these details a clear exposition of the abstract operation of the machine—something which Babbage never did. (In his autobiography, he basically just refers to Ada’s Notes.)

For all his various shortcomings, the very fact that Babbage figured out how to build even a functioning Difference Engine—let alone an Analytical Engine—is extremely impressive. So how did he do it? I think the key was what he called his Mechanical Notation. He first wrote about it in 1826 under the title “On a Method of Expressing by Signs the Action of Machinery”. His idea was to take a detailed structure of a machine and abstract a kind of symbolic diagram of how its parts act on each other. His first example was a hydraulic device:

Then he gave the example of a clock, showing on the left a kind of “execution trace” of how the components of the clock change, and on the right a kind of “block diagram” of their relationships:

It’s a pretty nice way to represent how a system works, similar in some ways to a modern timing diagram—but not quite the same. And over the years that Babbage worked on the Analytical Engine, his notes show ever more complex diagrams. It’s not quite clear what something like this means:

But it looks surprisingly like a modern Modelica representation—say in Wolfram SystemModeler. (One difference in modern times is that subsystems are represented much more hierarchically; another is that everything is now computable, so that actual behavior of the system can be simulated from the representation.)

But even though Babbage used his various kinds of diagrams extensively himself, he didn’t write papers about them. Indeed, his only other publication about “Mechanical Notation” is the flyer he had printed up for the Great Exhibition in 1851—apparently a pitch for standardization in drawings of mechanical components (and indeed these notations appear on Babbage’s diagrams like the one above).

I’m not sure why Babbage didn’t do more to explain his Mechanical Notation and his diagrams. Perhaps he was just bitter about people’s failure to appreciate it in 1826. Or perhaps he saw it as the secret that let him create his designs. And even though systems engineering has progressed a long way since Babbage’s time, there may yet be inspiration to be had from what Babbage did.

OK, so what’s the bigger picture of what happened with Ada, Babbage and the Analytical Engine?

Charles Babbage was an energetic man who had many ideas, some of them good. At the age of 30 he thought of making mathematical tables by machine, and continued to pursue this idea until he died 49 years later, inventing the Analytical Engine as a way to achieve his objective. He was good—even inspired—at the engineering details. He was bad at keeping a project on track.

Ada Lovelace was an intelligent woman who became friends with Babbage (there’s zero evidence they were ever romantically involved). As something of a favor to Babbage, she wrote an exposition of the Analytical Engine, and in doing so she developed a more abstract understanding of it than Babbage had—and got a glimpse of the incredibly powerful idea of universal computation.

The Difference Engine and things like it are special-purpose computers, with hardware that’s built to do only one kind of thing. One might have thought that to do lots of different kinds of things would necessarily require lots of different kinds of computers. But this isn’t true. And instead it’s a fundamental fact that it’s possible to make general-purpose computers, where a single fixed piece of hardware can be programmed to do any computation. And it’s this idea of universal computation that for example makes software possible—and that launched the whole computer revolution in the 20th century.

Gottfried Leibniz had already had a philosophical concept of something like universal computation back in the 1600s. But it wasn’t followed up. And Babbage’s Analytical Engine is the first explicit example we know of a machine that would have been capable of universal computation.

Babbage didn’t think of it in these terms, though. He just wanted a machine that was as effective as possible at producing mathematical tables. But in the effort to design this, he ended up with a universal computer.

When Ada wrote about Babbage’s machine, she wanted to explain what it did in the clearest way—and to do this she looked at the machine more abstractly, with the result that she ended up exploring and articulating something quite recognizable as the modern notion of universal computation.

What Ada did was lost for many years. But as the field of mathematical logic developed, the idea of universal computation arose again, most clearly in the work of Alan Turing in 1936. Then when electronic computers were built in the 1940s, it was realized they too exhibited universal computation, and the connection was made with Turing’s work.

There was still, though, a suspicion that perhaps some other way of making computers might lead to a different form of computation. And it actually wasn’t until the 1980s that universal computation became widely accepted as a robust notion. And by that time, something new was emerging—notably through work I was doing: that universal computation was not only something that’s possible, but that it’s actually common.

And what we now know (embodied for example in my Principle of Computational Equivalence) is that beyond a low threshold a very wide range of systems—even of very simple construction—are actually capable of universal computation.

A Difference Engine doesn’t get there. But as soon as one adds just a little more, one will have universal computation. So in retrospect, it’s not surprising that the Analytical Engine was capable of universal computation.

Today, with computers and software all around us, the notion of universal computation seems almost obvious: of course we can use software to compute anything we want. But in the abstract, things might not be that way. And I think one can fairly say that Ada Lovelace was the first person ever to glimpse with any clarity what has become a defining phenomenon of our technology and even our civilization: the notion of universal computation.

What if Ada’s health hadn’t failed—and she had successfully taken over the Analytical Engine project? What might have happened then?

I don’t doubt that the Analytical Engine would have been built. Maybe Babbage would have had to revise his plans a bit, but I’m sure he would have made it work. The thing would have been the size of a train locomotive, with maybe 50,000 moving parts. And no doubt it would have been able to compute mathematical tables to 30- or 50-digit precision at the rate of perhaps one result every 4 seconds.

Would they have figured that the machine could be electromechanical rather than purely mechanical? I suspect so. After all, Charles Wheatstone, who was intimately involved in the development of the electric telegraph in the 1830s, was a good friend of theirs. And by transmitting information electrically through wires, rather than mechanically through rods, the hardware for the machine would have been dramatically reduced, and its reliability (which would have been a big issue) would have been dramatically increased.

Another major way that modern computers reduce hardware is by dealing with numbers in binary rather than decimal. Would they have figured that idea out? Leibniz knew about binary. And if George Boole had followed up on his meeting with Babbage at the Great Exhibition, maybe that would have led to something. Binary wasn’t well known in the mid-1800s, but it did appear in puzzles, and Babbage, at least, was quite into puzzles: a notable example being his question of how to make a square of words with “bishop” along the top and side (which now takes just a few lines of Wolfram Language code to solve).
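For the curious, the kind of search Babbage’s word-square puzzle needs can indeed be written in a few lines. Here is a hedged Python sketch—with a hypothetical toy word list and a 3-letter seed word instead of “bishop”, since the real puzzle needs a full dictionary—showing the core idea: backtracking row by row, with symmetry forcing each word to read the same across and down:

```python
def word_square(first, words):
    """Try to complete a symmetric word square whose first row
    (and hence first column) is `first`, drawing rows from `words`.
    Returns the list of rows, or None if no square exists."""
    n = len(first)
    candidates = [w for w in words if len(w) == n]

    def extend(rows):
        i = len(rows)
        if i == n:
            return rows
        # Symmetry: row i must start with the letters already
        # sitting in column i of the rows chosen so far.
        prefix = "".join(r[i] for r in rows)
        for w in candidates:
            if w.startswith(prefix):
                result = extend(rows + [w])
                if result:
                    return result
        return None

    return extend([first])

print(word_square("cat", ["cat", "are", "ten", "dog", "art"]))
# ['cat', 'are', 'ten']
```

The prefix check is what makes the square symmetric: when row *k* is chosen, its *j*-th letter is forced to equal row *j*’s *k*-th letter for every earlier row *j*.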

Babbage’s primary conception of the Analytical Engine was as a machine for automatically producing mathematical tables—either printing them out by typesetting, or giving them as plots by drawing onto a plate. He imagined that humans would be the main users of these tables—although he did think of the idea of having libraries of pre-computed cards that would provide machine-readable versions.

Today—in the Wolfram Language for example—we never store much in the way of mathematical tables; we just compute what we need when we need it. But in Babbage’s day—with the idea of a massive Analytical Engine—this way of doing things would have been unthinkable.

So, OK: would the Analytical Engine have gotten beyond computing mathematical tables? I suspect so. If Ada had lived as long as Babbage, she would still have been around in the 1890s when Herman Hollerith was doing card-based electromechanical tabulation for the census (and founding what would eventually become IBM). The Analytical Engine could have done much more.

Perhaps Ada would have used the Analytical Engine—as she began to imagine—to produce algorithmic music. Perhaps they would have used it to solve things like the three-body problem, maybe even by simulation. If they’d figured out binary, maybe they would even have simulated things like cellular automata.

Neither Babbage nor Ada ever made money commercially (and, as Babbage took pains to point out, his government contracts just paid his engineers, not him). If they had developed the Analytical Engine, would they have found a business model for it? No doubt they would have sold some engines to governments. Maybe they would even have operated a kind of cloud computing service for Victorian science, technology, finance and more.

But none of this actually happened, and instead Ada died young, the Analytical Engine was never finished, and it took until the 20th century for the power of computation to be discovered.

If one had met Charles Babbage, what would he have been like? He was, I think, a good conversationalist. Early in life he was idealistic (“do my best to leave the world wiser than I found it”); later he was almost a Dickensian caricature of a bitter old man. He gave good parties, and placed great value on connecting with the highest strata of intellectual society. But particularly in his later years, he spent most of his time alone in his large house, filled with books and papers and unfinished projects.

Babbage was never a terribly good judge of people, or of how what he said would be perceived by them. And even in his eighties, he was still quite child-like in his polemics. He was also notoriously poor at staying focused; he always had a new idea to pursue. The one big exception to this was his almost-50-year persistence in trying to automate the process of computation.

I myself have shared a modern version of this very goal in my own life (…, Mathematica, Wolfram|Alpha, Wolfram Language, …)—though so far only for 40 years. I am fortunate to have lived in a time when ambient technology made this much easier to achieve, but in every large project I have done it has still taken a certain singlemindedness and gritty tenacity—as well as leadership—to actually get it finished.

So what about Ada? From everything I can tell, she was a clear-speaking, clear-thinking individual. She came from the upper classes, but didn’t wear especially fashionable clothes, and carried herself much less like a stereotypical countess than like an intellectual. As an adult, she was emotionally quite mature—probably more so than Babbage—and seems to have had a good practical grasp of people and the world.

Like Babbage, she was independently wealthy, and had no need to work for a living. But she was ambitious, and wanted to make something of herself. In person, beyond the polished Victorian upper-class exterior, I suspect she was something of a nerd, complete with math jokes and everything. She was also capable of great and sustained focus, for example over the months she spent writing her Notes.

In mathematics, she successfully learned up to the state of the art in her time—probably about the same level as Babbage. But unlike Babbage, she left no specific mathematical research that we know of, so it’s hard to judge how good she would have been; Babbage himself was respectable though unremarkable.

When one reads Ada’s letters, what comes through is a smart, sophisticated person, with a clear, logical mind. What she says is often dressed in Victorian pleasantries—but underneath, the ideas are clear and often quite forceful.

Ada was very conscious of her family background, and of being “Lord Byron’s daughter”. At some level, his story and success no doubt fueled her ambition, and her willingness to try new things. (I can’t help thinking of her leading the engineers of the Analytical Engine as a bit like Lord Byron leading the Greek army.) But I also suspect his troubles loomed over her. For many years, partly at her mother’s behest, she eschewed things like poetry. But she was drawn to abstract ways of thinking, not only in mathematics and science, but also in more metaphysical areas.

And she seems to have concluded that her greatest strength would be in bridging the scientific with the metaphysical—perhaps in what she called “poetical science”. It was likely a correct self-perception. For that is in a sense exactly what she did in the Notes she wrote: she took Babbage’s detailed engineering, and made it more abstract and “metaphysical”—and in the process gave us a first glimpse of the idea of universal computation.

The story of Ada and Babbage has many interesting themes. It is a story of technical prowess meeting abstract “big picture” thinking. It is a story of friendship between old and young. It is a story of people who had the confidence to be original and creative.

It is also a tragedy. A tragedy for Babbage, who lost so many people in his life, and whose personality pushed others away and prevented him from realizing his ambitions. A tragedy for Ada, who was just getting started in something she loved when her health failed.

We will never know what Ada could have become. Another Mary Somerville, famous Victorian expositor of science? A Steve-Jobs-like figure who would lead the vision of the Analytical Engine? Or an Alan Turing, understanding the abstract idea of universal computation?

That Ada touched what would become a defining intellectual idea of our time was good fortune. Babbage did not know what he had; Ada started to see glimpses and successfully described them.

For someone like me the story of Ada and Babbage has particular resonance. Like Babbage, I have spent much of my life pursuing particular goals—though unlike Babbage, I have been able to see a fair fraction of them achieved. And I suspect that, like Ada, I have been put in a position where I can potentially see glimpses of some of the great ideas of the future.

But the challenge is to be enough of an Ada to grasp what’s there—or at least to find an Ada who does. But at least now I think I have an idea of what the original Ada born 200 years ago today was like: a fitting personality on the road to universal computation and the present and future achievements of computational thinking.

It’s been a pleasure getting to know you, Ada.

*Quite a few organizations and people helped in getting information and material for this post. I’d like to thank the British Library; the Museum of the History of Science, Oxford; the Science Museum, London; the Bodleian Library, Oxford (with permission from the Earl of Lytton, Ada’s great-great-grandson, and one of her 10 living descendants); the New York Public Library; St. Mary Magdalene Church, Hucknall, Nottinghamshire (Ada’s burial place); and Betty Toole (author of a collection of Ada’s letters); as well as two old friends: Tim Robinson (re-creator of Babbage engines) and Nathan Myhrvold (funder of the Difference Engine #2 re-creation).*

I wasn’t sure if I was ever going to write another book. My last book—*A New Kind of Science*—took me more than a decade of intensely focused work, and is the largest personal project I’ve ever done.

But a little while ago, I realized there was another book I had to write: a book that would introduce people with no knowledge of programming to the Wolfram Language and the kind of computational thinking it allows.

The result is *An Elementary Introduction to the Wolfram Language*, published today in print, free on the web, etc.

The goal of the book is to take people from zero to the point where they know enough about the Wolfram Language that they can routinely use it to create programs for things they want to do. And when I say “zero”, I really mean “zero”. This is a book for everyone. It doesn’t assume any knowledge of programming, or math (beyond basic arithmetic), or anything else. It just starts from scratch and explains things. I’ve tried to make it appropriate for both adults and kids. I think it’ll work for typical kids aged about 12 and up.

In the past, a book like this would have been inconceivable. The necessary underlying technology just didn’t exist. Serious programming was always difficult, and there wasn’t a good way to connect with real-world concepts. But now we have the Wolfram Language. It’s taken three decades. But now we’ve built in enough knowledge and automated enough of the process of programming that it’s actually realistic to take almost anyone from zero to the frontiers of what can be achieved with computation.

But how should one actually do it? What should one explain, in what order? Those were challenges I had to address to write this book. I’d written a Fast Introduction for Programmers that in 30 pages or so introduces people who already know about modern programming to the core concepts of the Wolfram Language. But what about people who don’t start off knowing anything about programming?

For many years I’ve found various opportunities to show what’s now the Wolfram Language to people like that. And now I’ve used my experience to figure out what to do in the book.

In essence, the book brings the reader into a conversation with the computer. There are two great things about the Wolfram Language that make this really work. First, that the language is symbolic, so that anything one’s dealing with—a color, an image, a graph, whatever—can be right there in the dialog. And second, that the language can be purely functional, so that everything is stateless, and every input can be self-contained.

It’s also very important that the Wolfram Language has built-in knowledge that lets one immediately compute with real-world things.

Oh, and visualization is really important too—so it’s easy to see what one’s computing.

OK, but where should one start? The very first page is about arithmetic—just because that’s a place where everyone can see that a computation is actually happening:
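In code, that first page amounts to nothing more than typing arithmetic into a notebook and evaluating it (these inputs are illustrative, not facsimiles of the book’s actual pages):

```wolfram
(* the very first computations: plain arithmetic *)
2 + 2
(* → 4 *)

5 * 6 + 7
(* → 37 *)
```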

There’s a section called Vocabulary because that’s what it is: one’s learning some “words” in the Wolfram Language. Then there are exercises, which I’ll talk about soon.

OK, but once one’s done arithmetic, where should one go next? What I decided to do was to go immediately to the idea of functions—and to first introduce them in terms of arithmetic. The advantage of this is that while the concept of a function may be new, the operation it’s doing (namely arithmetic) is familiar.

And once one’s understood the function Plus, one can immediately go on to functions like Max that don’t have special operator forms of their own. What Max does isn’t that exciting, though. So as a slightly more exciting function, what I introduce next is RandomInteger—which people often like to run over and over again, to see what it produces.
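The progression looks something like this (a minimal sketch of the sequence described above):

```wolfram
(* Plus is the functional form of the familiar + operator *)
Plus[2, 3]
(* → 5 *)

(* Max just picks out the largest of its arguments *)
Max[4, 1, 7, 2]
(* → 7 *)

(* RandomInteger gives a different result each time it is run *)
RandomInteger[100]
```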

OK, so what next? The obvious answer is that we have to introduce lists. But what should one do with lists? Doing something like picking elements out of them isn’t terribly exciting, and it’s hard immediately to see why it’s important. So instead what I decided was to make the very first function I show for lists be ListPlot. It’s nice to start getting in the idea of visualization—and it’s also a good example of how one can type in a tiny piece of code, and get something bigger and more interesting out.

Actually, the best extremely simple example of that is Range, which I also show at this point. Range is a great way to show the computer actually computing something, with a result that’s easy to understand.
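For example, one might evaluate:

```wolfram
(* Range generates a list of successive integers *)
Range[10]
(* → {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} *)

(* ListPlot turns a list of numbers into a visualization *)
ListPlot[Range[10]]
```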

But OK, so now we want to reinforce the idea of functions, and functions working together. The function Reverse isn’t incredibly common in practice, but it’s very easy to understand, so I introduce it next, followed by Join.

What’s nice then is that between Reverse, Range and Join we have a little microlanguage that’s completely self-contained, but lets us do a variety of computations. And, of course, whatever computations one does, one can immediately see the results, either symbolically or visually.
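A few inputs show how this little microlanguage composes (an illustrative sketch):

```wolfram
Reverse[Range[5]]
(* → {5, 4, 3, 2, 1} *)

Join[Range[4], Reverse[Range[4]]]
(* → {1, 2, 3, 4, 4, 3, 2, 1} *)

(* and whatever one builds, one can immediately visualize *)
ListPlot[Join[Range[20], Reverse[Range[20]]]]
```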

The next couple of sections talk about displaying and operating on lists, reinforcing what’s already been said, and introducing a variety of functions that are useful in practice. Then it’s on to Table—a very common and powerful function, that in effect packages up a lot of what might otherwise need explicit loops and so on.

I start with trivial versions of Table, without any iteration variable. I take it for granted (as people who don’t know “better” do!) that Table can produce a list of graphics just like it can produce a list of numbers. (Of course, the fact that it can do this is a consequence of the fundamentally symbolic character of the Wolfram Language.)

The next big step is to introduce a variable into Table. I thought a lot about how to do this, and decided that the best thing to show first is the purely symbolic version. After all, we’ve already introduced functions, and with the symbolic version, one can immediately see where the variable goes. But now that we’ve got Table with variables, we can really go to town and start doing what people will think of as “real computations”.
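The steps above can be sketched in three inputs (illustrative examples, not the book’s exact ones):

```wolfram
(* a trivial Table: no iteration variable, just repetition *)
Table[10, 5]
(* → {10, 10, 10, 10, 10} *)

(* introducing a variable: n successively takes the values 1 through 5 *)
Table[n^2, {n, 5}]
(* → {1, 4, 9, 16, 25} *)

(* Table can just as well make a list of graphics as a list of numbers *)
Table[Graphics[Circle[{0, 0}, r]], {r, 3}]
```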

In the first few sections of the book, the raw material for our computations is basically numbers and lists. What I wanted to do next was to show that there are other things to compute with. I chose colors as the first example. Colors are good because (a) everyone knows what they are, (b) you can actually compute with them and (c) they make colorful output (!).
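Because colors are symbolic objects, they can be computed with directly, for instance:

```wolfram
(* colors are just symbolic objects *)
{Red, Green, Blue}

(* they can be transformed and combined *)
Blend[{Red, Blue}]
Lighter[Red]

(* and Table works on colors just as it does on numbers *)
Table[Hue[h], {h, 0, 1, 0.1}]
```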

After colors we’re ready for some graphics. I haven’t talked about coordinates yet, so I can only show individual graphical objects, without placement information.

There’s absolutely no reason not to go to 3D, and I do.
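A sketch of what these first graphics inputs look like, with no coordinates yet:

```wolfram
(* individual graphical objects, no placement information needed *)
Graphics[Circle[]]
Graphics[RegularPolygon[5]]

(* going to 3D is just as easy *)
Graphics3D[Sphere[]]
Graphics3D[{Cylinder[], Sphere[{0, 0, 2}]}]
```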

Now we’re all set up for something “advanced”: interactive manipulation. It’s pretty much like Table, except that one gets out a complete interactive user interface. And since we’ve introduced graphics, those can be part of the interface. People have seen interactive interfaces in lots of consumer software. My experience is that they’re pretty excited to be able to create them from scratch themselves.
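A single illustrative line is enough to get a complete interactive interface:

```wolfram
(* a slider controlling the number of sides of a polygon *)
Manipulate[Graphics[RegularPolygon[n]], {n, 3, 20, 1}]
```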

The next, perhaps surprising thing I introduce in the book is image processing. Yes, there’s a lot of sophisticated computation behind image processing. But in the Wolfram Language that’s all internal. And what people see are just functions—like Blur and ColorNegate—whose purposes are easy to understand.

It’s also nice that people—especially kids—can compute with images they take, or drag in. And this is actually the first example in the book where there’s rich data coming into a computation from outside. (I needed a sample image for the section, so, yes, I just snapped one right there—of me working on the book.)
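As a sketch of what such inputs look like, here a standard built-in test image stands in for a photo one might snap or drag in:

```wolfram
(* any image can be fed straight into image-processing functions *)
img = ExampleData[{"TestImage", "Mandrill"}];

Blur[img, 10]
ColorNegate[img]
EdgeDetect[img]
```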

Next I talk about strings and text. String operations on their own are pretty dry. But in the Wolfram Language there’s lots of interesting stuff that’s easy to do with them—like visualizing word clouds from Wikipedia, or looking at common words in different languages.
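For example (the Wikipedia example assumes a connection to the Wolfram knowledgebase):

```wolfram
(* basic string operations *)
StringLength["computation"]
(* → 11 *)

StringReverse["hello"]
(* → "olleh" *)

(* a word cloud from the text of a Wikipedia article *)
WordCloud[WikipediaData["Ada Lovelace"]]
```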

Next I cover sound, and talk about how to generate sequences of musical notes. In the printed book you can’t hear them, of course, though the little score icons give some sense of what’s there.
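A sketch of the kind of input involved:

```wolfram
(* note names are given as strings; Sound plays a sequence of notes *)
Sound[{SoundNote["C"], SoundNote["E"], SoundNote["G"], SoundNote["C5"]}]

(* integers count semitones from middle C, so Table makes a scale *)
Sound[Table[SoundNote[n, 0.1], {n, 0, 12}]]
```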

One might wonder, “Why not talk about sound right after graphics?” Well, first of all, I thought it wasn’t bad to mix things up a bit, to help keep the flow interesting. But more than that, there’s a certain chain of dependencies between different areas. For example, the names of musical notes are specified as strings—so one has to have talked about strings before musical notes.

Next it’s “Arrays, or Lists of Lists”. Then it’s “Coordinates and Graphics”. At first, I worried that coordinates were too “mathy”. But particularly after one’s seen arrays, it’s not so difficult to understand coordinates. And once one’s got the idea of 2D coordinates, it’s easy to go to 3D.

By this point in the book, people already know how to do some useful and real things with the Wolfram Language. So I made the next section a kind of interlude—a meta-section that gives a sense of the overall scope of the Wolfram Language, and also shows how to find information on specific topics and functions.

Now that people have seen a bit about abstract computation, it’s time to talk about real-world data, and to show how to access the vast amount of data that the Wolfram Language shares with Wolfram|Alpha.

Lots of real-world data involves units—so the next section is devoted to working with units. Once that’s done, we can talk about geocomputation: things like finding distances on the Earth, and drawing maps.
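Illustrative examples of both (the entity inputs assume access to the Wolfram knowledgebase):

```wolfram
(* quantities carry their units along with them *)
UnitConvert[Quantity[60, "Miles"/"Hours"], "Meters"/"Seconds"]

(* geocomputation: the distance between two cities *)
GeoDistance[Entity["City", {"NewYork", "NewYork", "UnitedStates"}],
  Entity["City", {"London", "GreaterLondon", "UnitedKingdom"}]]

(* and drawing a map *)
GeoGraphics[
  GeoMarker[Entity["City", {"London", "GreaterLondon", "UnitedKingdom"}]]]
```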

After that I talk about dates and times. One might think this wouldn’t be an interesting or useful topic. But it’s actually a really good example of real-world computation, and it’s also something one uses all over the place.

The Wolfram Language is big. But it’s based on a small number of ideas that are consistently used over and over again. One of the important objectives in the book is to cover these ideas. And the next section—on options—covers one such simple idea that’s widely used in practice.
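The idea is easy to see with a sketch: the same plot, progressively refined by adding options:

```wolfram
ListPlot[Range[10]^2]

(* options are given as name -> value rules after the main arguments *)
ListPlot[Range[10]^2, Filling -> Axis]
ListPlot[Range[10]^2, Filling -> Axis, PlotStyle -> Red, Joined -> True]
```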

After covering options, we’re set to talk about something that’s often viewed as a quite advanced topic: graphs and networks. But my experience is that in modern times, people have seen enough graphs and networks in their everyday life that they don’t have much trouble understanding them in the Wolfram Language. Of course, it helps a lot that the language can manipulate them directly, as just another example of symbolic objects.
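For instance, a graph can be built from a list of rules and then computed with directly:

```wolfram
(* a graph is just another symbolic object *)
Graph[{1 -> 2, 2 -> 3, 3 -> 4, 4 -> 1, 3 -> 1}]

(* and it can be operated on like anything else *)
g = Graph[{1 <-> 2, 2 <-> 3, 3 <-> 1, 3 <-> 4}];
FindShortestPath[g, 1, 4]
(* → {1, 3, 4} *)
```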

After graphs and networks, we’re ready for another seemingly very advanced topic: machine learning. But even though the internal algorithms for machine learning are complicated, the actual functions that do it in the Wolfram Language are perfectly easy to understand. And what’s nice is that by doing a bunch of examples with them, one can start to get pretty good high-level intuition about the core ideas of machine learning.
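Two illustrative one-liners (both rely on built-in classifiers being fetched from the Wolfram servers on first use):

```wolfram
(* identify the language of a piece of text *)
LanguageIdentify["une chatte noire"]

(* apply the built-in sentiment classifier *)
Classify["Sentiment", "This is an absolutely wonderful book!"]
```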

Throughout the book, I try to keep things as simple as possible. But sometimes that means I have to go back for a deeper view of a topic I’ve already covered. “More about Numbers” and “More Forms of Visualization” are two examples of doing this—covering things that would have gotten in the way when numbers and visualization were first introduced, but that need to be said to get a full understanding of these areas.

The next few sections tackle the important and incredibly powerful topic of functional programming. In the past, functional programming tended to be viewed as a sophisticated topic—and certainly not something to teach people who are first learning about programming. But I think in the Wolfram Language the picture has changed—and it’s now possible to explain functional programming in a way that people will find easy to understand. I start by just talking more abstractly about the process of applying a function.

The big thing this does is set me up to talk about pure anonymous functions. In principle I could have talked about these much sooner, but I think it’s important for people to have seen many different kinds of examples of how functions are used in general—because that’s what’s needed to motivate pure functions.

The next section is where some of the real power of functional programming starts to shine through. In the abstract, functions like NestList and NestGraph sound pretty complicated and abstract. But by this point in the book, we’ve covered enough of the Wolfram Language that there are plenty of concrete examples to give—that are quite easy to understand.
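For example, repeatedly applying a pure function is a single input:

```wolfram
(* a pure function, applied repeatedly with NestList *)
NestList[2 # &, 1, 10]
(* → {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024} *)

(* NestGraph shows the structure of repeated application *)
NestGraph[{2 #, 2 # + 1} &, 1, 3]
```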

The next several sections cover areas of the language that are unlocked as soon as one understands pure functions. There are lots of powerful programming techniques that emerge from a small number of ideas.

After functional programming, the next big topics are patterns and pattern-based programming. I could have chosen to talk about patterns earlier in the book, but they weren’t really needed until now.

What makes patterns so powerful in the Wolfram Language is something much more fundamental: the uniform structure of everything in the language, based on symbolic expressions. If I were writing a formal specification of the Wolfram Language, I would start with symbolic expressions. And I might do the same if I were writing a book for theoretical computer scientists or pure mathematicians.

It’s not that symbolic expressions are a difficult concept to understand. It’s just that without seeing how things actually work in practice in the Wolfram Language, it’s difficult to motivate abstractly studying them. But now it makes sense to talk about them, not least because they let one see the full power of what’s possible with patterns.
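A minimal sketch of patterns in action:

```wolfram
(* _ ("blank") stands for anything; _Integer for any integer *)
Cases[{1, "a", 2, "b", 3}, _Integer]
(* → {1, 2, 3} *)

(* a named pattern can be used on the right-hand side of a rule *)
{1, 2, 3, 4} /. x_Integer -> x^2
(* → {1, 4, 9, 16} *)
```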

At this point in the book, we’re getting ready to see how to actually deploy things like web apps. There are a few more pieces to put in place to get there. I talk about associations—and then I talk about natural language understanding. Internally, the way natural language understanding works is complex. But at the level of the Wolfram Language, it’s easy to use—though to see how to connect it into things, it’s helpful to know about pure functions.

OK, so now everything is ready to talk about deploying things to the web. And at this point, people will be able to start creating useful, practical pieces of software that they can share with the world.
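As a sketch, each of these is a complete deployment (actually running them requires being logged in to the Wolfram Cloud):

```wolfram
(* a web API that squares whatever number is supplied as the parameter x *)
CloudDeploy[APIFunction[{"x" -> "Number"}, #x^2 &]]

(* a web form that draws a map of a chosen country *)
CloudDeploy[FormFunction[{"country" -> "Country"},
   GeoGraphics[Polygon[#country]] &]]
```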

It’s taken 220 pages or so. But to me that’s an amazingly small number of pages to go from zero to what are essentially professional-grade web apps. If we’d just been talking about some very specific kind of app, it wouldn’t be so impressive. But we’re talking about extremely general kinds of apps, that do pretty much any kind of computation.

If you open a book about a traditional programming language like C++ or Java, one of the first things you’re likely to see is a discussion of assigning values to variables. But in my book I don’t do this until Section 38. At some level, this might seem bizarre—but it really isn’t. Because in the Wolfram Language you can do an amazing amount—including for example deploying a complete web app—without ever needing to assign a value to a variable.

And this is actually one of the reasons why it’s so easy to learn the Wolfram Language. Because if you don’t assign values to variables, every piece of code in the language stands alone, and will do the same thing whenever it’s run. But as soon as you’re assigning values to variables, there’s hidden state, and your code will do different things depending on what values variables happen to have.

Still, having talked about assigning values to variables—as well as about patterns—we’re ready to talk about defining your own functions, which is the way to build up more and more sophisticated functionality in the Wolfram Language.
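In outline (illustrative definitions, not the book’s exact examples):

```wolfram
(* assignment: = sets a value now; := defines a delayed rule *)
x = 10;
x + 5
(* → 15 *)

(* defining one's own function, using a pattern for the argument *)
square[n_] := n^2
square[12]
(* → 144 *)

(* definitions can have several cases, chosen by pattern matching *)
parity[n_?EvenQ] := "even"
parity[n_?OddQ] := "odd"
parity[7]
(* → "odd" *)
```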

At this point, you’re pretty much set in terms of the basic concepts of the Wolfram Language. But the last few sections of the book cover some important practical extensions. There’s a section on string patterns and templates. There’s a section on storing things, locally and in the cloud. There’s a section on importing and exporting. And there’s a section on datasets. Not everyone who uses the Wolfram Language will ever need datasets, but when you’re dealing with large amounts of structured data they’re very useful. And they provide an interesting example that makes use of many different ideas from the Wolfram Language.

At the end of the book, I have what are basically essay sections: about writing good code, about debugging and about being a programmer. My goal in these sections is to build on the way of thinking that I hope people have developed from reading the rest of the book, and then to communicate some more abstract principles.

I said at the beginning of this post that the book is essentially written as a conversation. In almost every section, I found it convenient to add two additional parts: Q&A and Tech Notes. The goal of Q&A is to have a place to answer obvious questions people might have, without distracting from the main narrative.

There are several different types of questions. Some are about extensions to the functionality that’s been discussed. Some are about the background to it. And some are questions (“What does ‘raised to the power’ mean?”) that will be trivial to some readers but not to others.

In addition to Q&A, I found it useful to include what I call Tech Notes. Their goal is to add technical information—and to help people who already have sophisticated technical knowledge in some particular area to connect it to what they’re reading in this book.

Another part of most sections is a collection of exercises. The vast majority are basically of the form “write a piece of code to do X”—though a few are instead “find a simpler version of this piece of code”.

There are answers to all the exercises at the back of the printed book—and in the web version there are additional exercises. Of course, the answers that are given are just possible answers—and they’re almost never the only possible answers.

Writing the exercises was an interesting experience for me, that was actually quite important in my thinking about topics like how to talk to AIs. Because what most of the exercises effectively say is, “Take this description that’s written in English, and turn it into Wolfram Language code.” If what one’s doing is simple enough, then English works quite well as a description language. But when what one’s doing gets more complicated, English doesn’t do so well. And by later in the book, I was often finding it much easier to write the Wolfram Language answer for an exercise than to create the actual exercise in English.

In a sense this is very satisfying, because it means we really need the Wolfram Language to be able to express ideas. Some things we can express easily in English—and eventually expect Wolfram|Alpha to be able to understand. But there’s plenty that requires the greater structure and precision of the Wolfram Language.

At some level it might seem odd in this day and age to be writing a book that can actually be printed on paper, rather than creating some more flexible online structure. But what I’ve found is that the concept of a book is very useful. Yes, one can have a website where one can reach lots of information by following links. But when people are trying to systematically learn a subject, I think it’s good to have a definite, finite container of information, where there’s an expectation of digesting it sequentially, and where one can readily see the overall structure.

That’s not to say that it’s not useful to have the book online. Right now the book is available as a website, and for many purposes this web version works very well. But somewhat to my surprise, I still find the physical book, with its definite pagination and browsable pages, better for many things.

Of course, if you’re going to learn the Wolfram Language, you actually need to run it. So even if you’re using a physical book, it’s best to have a computer (or tablet) by your side—so you can try the examples, do the exercises, etc. You can do this immediately if you’re reading the book on the web or in the cloud. But some people have told me that they actually find it helpful to retype the examples: they internalize them better that way, and with all the autocompletion and other features in the Wolfram Language, it’s very fast to type in code.

I call the book an “elementary introduction”. And that’s what it is. It’s not a complete book about the Wolfram Language—far from it. It’s intended to be a basic introduction that gets people to the point where they can start writing useful programs. It covers a lot of the core principles of the language—but only a small fraction of the very large number of specific areas of functionality.

Generally I tried to include areas that are either very commonly encountered in practice, or easy for people to understand without external knowledge—and good for illuminating principles. I’m very happy with the sequence of areas I was able to cover—but another book could certainly pick quite different ones.

Of course, I was a little disappointed to have to leave out all sorts of amazing things that the Wolfram Language can do. And at the end of the book I decided to include a short section that gives a taste of what I wasn’t able to talk about.

I see my new book as part of the effort to launch the Wolfram Language. And back in 1988, when we first launched Mathematica, I wrote a book for that, too. But it was a different kind of book: it was a book that was intended to provide a complete tutorial introduction and reference guide to the whole system. The first edition was 767 pages. But by the 5th edition a decade later, the book had grown to 1488 pages. And at that point we decided a book just wasn’t the correct way to deliver the information—and we built a whole online system instead.

It’s just as well we did that, because it allowed us to greatly expand the depth of coverage, particularly in terms of examples. And of course the actual software system grew a lot—with the result that today the full Documentation Center contains more than 50,000 pages of content.

Many people have told me that they liked the original Mathematica book—and particularly the fact that it was short enough to realistically read from cover to cover. My goal with *An Elementary Introduction to the Wolfram Language* was again to have a book that’s short enough that people can actually read all of it.

Looking at the book, it’s interesting to see how much of it is about things that simply didn’t exist in the Wolfram Language—or Mathematica—until very recently. Of course it’s satisfying to me to see that things we’re adding now are important enough to make it into an elementary introduction. But it also means that even people who’ve known parts of the Wolfram Language through Mathematica for many years should find the book interesting to read.

I’ve thought for a while that there should be a book like the one I’ve now written. And obviously there are plenty of people who know the Wolfram Language well and could in principle have written an introduction to it. But I’m happy that I’ve been the one to write this book. It’s reduced my productivity on other things—like writing blogs—for a little while. But it’s been a fascinating experience.

It’s been a bit like being back hundreds of years and asking, “How should one approach explaining math to people?” And working out that first one should talk about arithmetic, then algebra, and so on. Well, now we have to do the same kind of thing for computational thinking. And I see the book as a first effort at communicating the tools of computational thinking to a broad range of people.

It’s been fun to write. I hope people find it fun to read—and that they use what they learn from it to create amazing things with the Wolfram Language.

A hundred years ago today Albert Einstein published his General Theory of Relativity—a brilliant, elegant theory that has survived a century, and provides the only successful way we have of describing spacetime.

There are plenty of theoretical indications, though, that General Relativity isn’t the end of the story of spacetime. And in fact, much as I like General Relativity as an abstract theory, I’ve come to suspect it may actually have led us on a century-long detour in understanding the true nature of space and time.

I’ve been thinking about the physics of space and time for a little more than 40 years now. At the beginning, as a young theoretical physicist, I mostly just assumed Einstein’s whole mathematical setup of Special and General Relativity—and got on with my work in quantum field theory, cosmology, etc. on that basis.

But about 35 years ago, partly inspired by my experiences in creating technology, I began to think more deeply about fundamental issues in theoretical science—and started on my long journey to go beyond traditional mathematical equations and instead use computation and programs as basic models in science. Quite soon I made the basic discovery that even very simple programs can show immensely complex behavior—and over the years I discovered that all sorts of systems could finally be understood in terms of these kinds of programs.

Encouraged by this success, I then began to wonder if perhaps the things I’d found might be relevant to that ultimate of scientific questions: the fundamental theory of physics.

At first, it didn’t seem too promising, not least because the models that I’d particularly been studying (cellular automata) seemed to work in a way that was completely inconsistent with what I knew from physics. But sometime in 1988—around the time the first version of *Mathematica* was released—I began to realize that if I changed my basic way of thinking about space and time then I might actually be able to get somewhere.

In the abstract it’s far from obvious that there should be a simple, ultimate theory of our universe. Indeed, the history of physics so far might make us doubtful—because it seems as if whenever we learn more, things just get more complicated, at least in terms of the mathematical structures they involve. But—as noted, for example, by early theologians—one very obvious feature of our universe is that there is order in it. The particles in the universe don’t just all do their own thing; they follow a definite set of common laws.

But just how simple might the ultimate theory for the universe be? Let’s say we could represent it as a program, say in the Wolfram Language. How long would the program be? Would it be as long as the human genome, or as the code for an operating system? Or would it be much, much smaller?

Before my work on the computational universe of simple programs, I would have assumed that if there’s a program for the universe it must be at least somewhat complicated. But what I discovered is that in the computational universe even extremely simple programs can actually show behavior as complex as anything (a fact embodied in my general Principle of Computational Equivalence). So then the question arises: could one of these simple programs in the computational universe actually be the program for our physical universe?
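That discovery is easy to reproduce for oneself, for instance with the rule 30 cellular automaton:

```wolfram
(* rule 30: an extremely simple program with immensely complex behavior;
   start from a single black cell and run 100 steps *)
ArrayPlot[CellularAutomaton[30, {{1}, 0}, 100]]
```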

But what would such a program be like? One thing is clear: if the program is really going to be extremely simple, it’ll be too small to explicitly encode obvious features of our actual universe, like particle masses, or gauge symmetries, or even the number of dimensions of space. Somehow all these things have to emerge from something much lower level and more fundamental.

So if the behavior of the universe is determined by a simple program, what’s the basic “data structure” on which this program operates? At first, I’d assumed that it must be something simple for us to describe, like the lattice of cells that exists in a cellular automaton. But even though such a structure works well for models of many things, it seems at best incredibly implausible as a fundamental model of physics. Yes, one can find rules that give behavior which on a large scale doesn’t show obvious signs of the lattice. But if there’s really going to be a simple model of physics, it seems wrong that such a rigid structure for space should be burned in, while every other feature of physics just emerges.

So what’s the alternative? One needs something in a sense “underneath” space: something from which space as we know it can emerge. And one needs an underlying data structure that’s as flexible as possible. I thought about this for years, and looked at all sorts of computational and mathematical formalisms. But what I eventually realized was that basically everything I’d looked at could actually be represented in the same way: as a network.

A network—or graph—just consists of a bunch of nodes, joined by connections. And all that’s intrinsically defined in the graph is the pattern of these connections.

So could this be what space is made of? In traditional physics—and General Relativity—one doesn’t think of space as being “made of” anything. One just thinks of space as a mathematical construct that serves as a kind of backdrop, in which there’s a continuous range of possible positions at which things can be placed.

But do we in fact know that space is continuous like this? In the early days of quantum mechanics, it was actually assumed that space would be quantized like everything else. But it wasn’t clear how this could fit in with Special Relativity, and there was no obvious evidence of discreteness. By the time I started doing physics in the 1970s, nobody really talked about discreteness of space anymore, and it was experimentally known that there wasn’t discreteness down to about 10^-18 meters (1/1000 the radius of a proton, or 1 attometer). Forty years—and several tens of billions of dollars’ worth of particle accelerators—later there’s still no discreteness in space that’s been seen, and the limit is about 10^-22 meters (or 100 yoctometers).

Still, there’s long been a suspicion that something has to be quantized about space down at the Planck length of about 10^-34 meters. But when people have thought about this—and discussed spin networks or loop quantum gravity or whatever—they’ve tended to assume that whatever happens there has to be deeply connected to the formalism of quantum mechanics, and to the notion of quantum amplitudes for things.

But what if space—perhaps at something like the Planck scale—is just a plain old network, with no explicit quantum amplitudes or anything? It doesn’t sound so impressive or mysterious—but it certainly takes a lot less information to specify such a network: you just have to say which nodes are connected to which other ones.

But how could this be what space is made of? First of all, how could the apparent continuity of space on larger scales emerge? Actually, that’s not very difficult: it can just be a consequence of having lots of nodes and connections. It’s a bit like what happens in a fluid, like water. On a small scale, there are a bunch of discrete molecules bouncing around. But the large-scale effect of all these molecules is to produce what seems to us like a continuous fluid.

It so happens that I studied this phenomenon a lot in the mid-1980s—as part of my efforts to understand the origins of apparent randomness in fluid turbulence. And in particular I showed that even when the underlying “molecules” are cells in a simple cellular automaton, it’s possible to get large-scale behavior that exactly follows the standard differential equations of fluid flow.

So when I started thinking about the possibility that underneath space there might be a network, I imagined that perhaps the same methods might be used—and that it might actually be possible to derive Einstein’s Equations of General Relativity from something much lower level.

But, OK, if space is a network, what about all the stuff that’s in space? What about all the electrons, and quarks and photons, and so on? In the usual formulation of physics, space is a backdrop, on top of which all the particles, or strings, or whatever, exist. But that gets pretty complicated. And there’s a simpler possibility: maybe in some sense everything in the universe is just “made of space”.

As it happens, in his later years, Einstein was quite enamored of this idea. He thought that perhaps particles, like electrons, could be associated with something like black holes that contain nothing but space. But within the formalism of General Relativity, Einstein could never get this to work, and the idea was largely dropped.

As it happens, nearly 100 years earlier there’d been somewhat similar ideas. That was a time before Special Relativity, when people still thought that space was filled with a fluid-like ether. (Ironically enough, in modern times we’re back to thinking of space as filled with a background Higgs field, vacuum fluctuations in quantum fields, and so on.) Meanwhile, it had been understood that there were different types of discrete atoms, corresponding to the different chemical elements. And so it was suggested (notably by Kelvin) that perhaps these different types of atoms might all be associated with different types of knots in the ether.

It was an interesting idea. But it wasn’t right. But in thinking about space as a network, there’s a related idea: maybe particles just correspond to particular structures in the network. Maybe all that has to exist in the universe is the network, and then the matter in the universe just corresponds to particular features of this network. It’s easy to see similar things in cellular automata on a lattice. Even though every cell follows the same simple rules, there are definite structures that exist in the system—and that behave quite like particles, with a whole particle physics of interactions.

There’s a whole discussion to be had about how this works in networks. But first, there’s something else that’s very important to talk about: time.

Back in the 1800s, there was space and there was time. Both were described by coordinates, and in some mathematical formalisms, both appeared in related ways. But there was no notion that space and time were in any sense “the same thing”. But then along came Einstein’s Special Theory of Relativity—and people started talking about “spacetime”, in which space and time are somehow facets of the same thing.

It makes a lot of sense in the formalism of Special Relativity, in which, for example, traveling at a different velocity is like rotating in 4-dimensional spacetime. And for about a century, physics has pretty much just assumed that spacetime is a thing, and that space and time aren’t in any fundamental way different.

So how does that work in the context of a network model of space? It’s certainly possible to construct 4-dimensional networks in which time works just like space. And then one just has to say that the history of the universe corresponds to some particular spacetime network (or family of networks). Which network it is must be determined by some kind of constraint: our universe is the one which has such-and-such a property, or in effect satisfies such-and-such an equation. But this seems very non-constructive: it’s not telling one how the universe behaves, it’s just saying that if the behavior looks like this, then it can be the universe.

And, for example, in thinking about programs, space and time work very differently. In a cellular automaton, for example, the cells are laid out in space, but the behavior of the system occurs in a sequence of steps in time. But here’s the thing: just because the underlying rules treat space and time very differently, it doesn’t mean that on a large scale they can’t effectively behave similarly, just like in current physics.

OK, so let’s say that underneath space there’s a network. How does this network evolve? A simple hypothesis is to assume that there’s some kind of local rule, which says, in effect, that if you see a piece of network that looks like this, replace it with one that looks like that.

But now things get a bit complicated. Because there might be lots of places in the network where the rule could apply. So what determines in which order each piece is handled?
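String rewriting is a convenient stand-in for network rewriting, and makes the issue concrete. This illustrative Python sketch (my example, not the actual model) shows how a single local rule can match at several places at once, so something must determine the order of updates.

```python
# Illustrative sketch: string rewriting as a stand-in for network
# rewriting. A single local rule can match at several sites at once,
# so the order in which sites are updated has to come from somewhere.
def match_sites(s, lhs):
    return [i for i in range(len(s) - len(lhs) + 1) if s[i:i + len(lhs)] == lhs]

print(match_sites("ABAB", "AB"))   # [0, 2]: two places the rule could apply
```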

In effect, each possible ordering is like a different thread of time. And one could imagine a theory in which all threads are followed—and the universe in effect has many histories.

But that doesn’t need to be how it works. Instead, it’s perfectly possible for there to be just one thread of time—pretty much the way we experience it. And to understand this, we have to do something a bit similar to what Einstein did in formulating Special Relativity: we have to make a more realistic model of what an “observer” can be.

Needless to say, any realistic observer has to exist within our universe. So if the universe is a network, the observer must be just some part of that network. Now think about all those little network updatings that are happening. To “know” that a given update has happened, observers themselves must be updated.

If you trace this all the way through—as I did in my book, *A New Kind of Science*—you realize that the only thing observers can ever actually observe in the history of the universe is the causal network of what event causes what other event.

And then it turns out that there’s a definite class of underlying rules for which different orderings of underlying updates don’t affect that causal network. They’re what I call “causal invariant” rules.

Causal invariance is an interesting property, with analogs in a variety of computational and mathematical systems—for example in the fact that transformations in algebra can be applied in any order and still give the same final result. But in the context of the universe, its consequence is to guarantee that there’s only one thread of time.
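The algebra analogy can be made concrete with a toy string-rewriting system (my own illustration, not the network version). The rule "BA" → "AB" is confluent: whatever order the updates are applied in, the final result is the same, which is a simple analog of causal invariance.

```python
import random

# A toy analog of causal invariance (illustrative, not the network
# version): the rewrite rule "BA" -> "AB" is confluent, so every update
# order leads to the same final string (all A's before all B's).
def run(s, seed):
    rng = random.Random(seed)
    while True:
        sites = [i for i in range(len(s) - 1) if s[i:i + 2] == "BA"]
        if not sites:
            return s
        i = rng.choice(sites)          # apply the rule at an arbitrary site
        s = s[:i] + "AB" + s[i + 2:]

results = {run("BABBA", seed) for seed in range(20)}
print(results)   # {'AABBB'}: one outcome, whatever the update order
```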

So what about spacetime and Special Relativity? Here, as I figured out in the mid-1990s, something exciting happens: as soon as there’s causal invariance, it basically follows that there’ll be Special Relativity on a large scale. In other words, even though at the lowest level space and time are completely different kinds of things, on a larger scale they get mixed together in exactly the way prescribed by Special Relativity.

Roughly what happens is that different “reference frames” in Special Relativity—corresponding, for example, to traveling at different velocities—correspond to different detailed sequencings of the low-level updates in the network. But because of causal invariance, the overall behavior associated with these different detailed sequences is the same—so that the system follows the principles of Special Relativity.

At the beginning it might have looked hopeless: how could a network that treats space and time differently end up with Special Relativity? But it works out. And actually, I don’t know of any other model in which one can successfully derive Special Relativity from something lower level; in modern physics it’s always just inserted as a given.

OK, so one can derive Special Relativity from simple models based on networks. What about General Relativity—which, after all, is what we’re celebrating today? Here the news is very good too: subject to various assumptions, I managed in the late 1990s to derive Einstein’s Equations from the dynamics of networks.

The whole story is somewhat complicated. But here’s roughly how it goes. First, we have to think about how a network actually represents space. Now remember, the network is just a collection of nodes and connections. The nodes don’t say how they’re laid out in one-dimensional, two-dimensional, or any-dimensional space.

It’s easy to see that there are networks that on a large scale seem, say, two-dimensional, or three-dimensional. And actually, there’s a simple test for the effective dimension of a network. Just start from a node, then look at all nodes that are up to *r* connections away. If the network is behaving like it’s *d*-dimensional, then the number of nodes in that “ball” will be about *r^d*.

Here’s where things start to get really interesting. If the network behaves like flat *d*-dimensional space, then the number of nodes will always be close to *r^d*. But if it behaves like curved space, as in General Relativity, then there’s a correction term that’s proportional to a mathematical object called the Ricci scalar. And that’s interesting, because the Ricci scalar is precisely something that occurs in Einstein’s Equations.
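The ball-growth test is easy to try out in code. Here is a minimal Python sketch (my own illustration, not part of the original derivation): it measures ball sizes in a network that is secretly a 2D grid, and the growth rate recovers a dimension close to 2.

```python
import math
from collections import deque

# A minimal sketch of the ball-growth dimension test (illustrative code,
# not the actual physics-model machinery). Count the nodes within r
# connections of a start node, and watch how the count grows with r.
def ball_sizes(neighbors, start, r_max):
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == r_max:
            continue
        for nb in neighbors(node):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return [sum(1 for d in dist.values() if d <= r) for r in range(r_max + 1)]

# A network that's "secretly" a 2D grid: node (x, y) links to 4 neighbors.
def grid(p):
    x, y = p
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

sizes = ball_sizes(grid, (0, 0), 16)

# If sizes[r] ~ r^d, then log2(sizes[2r] / sizes[r]) is roughly d:
print(round(math.log2(sizes[16] / sizes[8]), 2))   # close to 2
```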

There’s lots of mathematical complexity here. One has to look at shortest paths—or geodesics—in the network. One has to see how to do everything not just in space, but in networks evolving in time. And one has to understand how the large-scale limits of networks work.

In deriving mathematical results, it’s important to be able to take certain kinds of averages. It’s actually very much the same kind of thing needed to derive fluid equations from the dynamics of molecules: one needs to be able to assume a certain degree of effective randomness in low-level interactions to justify the taking of averages.

But the good news is that an incredible range of systems, even with extremely simple rules, work a bit like the digits of pi, and generate behavior that is for all practical purposes random. And the result is that even though the details of a causal network are completely determined once one knows the network one’s starting from, many of these details will appear effectively random.
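This point is easy to demonstrate. Below is a small Python sketch (my own illustration) of rule 30, the minimal cellular automaton I often use as the standard example: it is completely deterministic, yet its center column looks statistically random.

```python
# Rule 30: a minimal, fully deterministic cellular automaton whose
# center column nonetheless looks statistically random. (Illustrative
# Python sketch; in rule 30 a cell becomes left XOR (center OR right).)
def rule30_center(steps):
    cells = {0}                        # positions of the 1-cells
    column = []
    for _ in range(steps):
        column.append(1 if 0 in cells else 0)
        lo, hi = min(cells) - 1, max(cells) + 1
        cells = {x for x in range(lo, hi + 1)
                 if (x - 1 in cells) ^ ((x in cells) or (x + 1 in cells))}
    return column

bits = rule30_center(1000)
print(sum(bits) / len(bits))   # roughly 0.5: about as many 1s as 0s
```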

So here’s the final result. If one assumes effective microscopic randomness, and one assumes that the behavior of the overall system does not lead to a change in overall limiting dimensions, then it follows that the large-scale behavior of the system satisfies Einstein’s Equations!

I think this is pretty exciting. From almost nothing, it’s possible to derive Einstein’s Equations. Which means that these simple networks reproduce the features of gravity that we know in current physics.

There are all sorts of technical things to say, not suitable for this general blog. Quite a few of them I already said long ago in *A New Kind of Science*—and particularly the notes at the back.

A few things are perhaps worth mentioning here. First, it’s worth noting that my underlying networks not only have no intrinsically defined embedding in ordinary space, but also don’t intrinsically define topological notions like inside and outside. All these things have to emerge.

When it comes to deriving the Einstein Equations, one creates Ricci tensors by looking at geodesics in the network, and looking at the growth rates of balls that start from each point on the geodesic.

The Einstein Equations one gets are the vacuum Einstein Equations. But just like with gravitational waves, one can effectively separate off features of space considered to be associated with “matter”, and then get Einstein’s full Equations, complete with “matter” energy-momentum terms.

As I write this, I realize how easily I still fall into technical “physics speak”. (I think it must be that I learned physics when I was so young…) But suffice it to say that at a high level the exciting thing is that from the simple idea of networks and causal invariant replacement rules, it’s possible to derive the Equations of General Relativity. One puts remarkably little in, yet one gets out that remarkable beacon of 20th-century physics: General Relativity.

It’s wonderful to be able to derive General Relativity. But that’s not all of physics. Another very important part is quantum mechanics. It’s going to get me too far afield to talk about this in detail here, but presumably particles—like electrons or quarks or Higgs bosons—must exist as certain special regions in the network. In qualitative terms, they might not be that different from Kelvin’s “knots in the ether”.

But then their behavior must follow the rules we know from quantum mechanics—or more particularly, quantum field theory. A key feature of quantum mechanics is that it can be formulated in terms of multiple paths of behavior, each associated with a certain quantum amplitude. I haven’t figured it all out, but there’s definitely a hint of something like this going on when one looks at the evolution of a network with many possible underlying sequences of replacements.

My network-based model doesn’t have official quantum amplitudes in it. It’s more like (but not precisely like) a classical, if effectively probabilistic, model. And for 50 years people have almost universally assumed that there’s a crippling problem with models like that. Because there’s a theorem (Bell’s Theorem) that says that unless there’s instantaneous non-local propagation of information, no such “hidden variables” model can reproduce the quantum mechanical results that are observed experimentally.

But there’s an important footnote. It’s pretty clear what “non-locality” means in ordinary space with a definite dimension. But what about in a network? Here it’s a different story. Because everything is just defined by connections. And even though the network may mostly correspond on a large scale to 3D space, it’s perfectly possible for there to be “threads” that join what would otherwise be quite separated regions. And the tantalizing thing is that there are indications that exactly such threads can be generated by particle-like structures propagating in the network.

OK, so it’s conceivable that some network-based model might be able to reproduce things from current physics. How might we set about finding such a model that actually reproduces our exact universe?

The traditional instinct would be to start from existing physics, and try to reverse engineer rules that could reproduce it. But is that the only way? What about just starting to enumerate possible rules, and seeing if any of them turn out to be our universe?

Before studying the computational universe of simple programs I would have assumed that this would be crazy: that there’s no way the rules for our universe could be simple enough to find by this kind of enumeration. But after seeing what’s out there in the computational universe—and seeing some other examples where amazing things were found just by a search—I’ve changed my mind.

So what happens if one actually starts doing such a search? Here’s the zoo of networks one gets after a fairly small number of steps by using all possible underlying rules of a certain very simple type:

Some of these networks very obviously aren’t our universe. They just freeze after a few steps, so time effectively stops. Or they have far too simple a structure for space. Or they effectively have an infinite number of dimensions. Or other pathologies.
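The same enumerate-and-filter strategy can be sketched in a simpler setting. The Python code below (my own illustration, using elementary cellular automata rather than networks) enumerates all 256 possible rules and flags the ones that reach a fixed point, where time effectively stops.

```python
# Illustrative sketch of the enumerate-and-filter idea, using elementary
# cellular automata (256 possible rules) instead of networks. We discard
# rules that reach a fixed point, where "time effectively stops".
def step(cells, rule):
    n = len(cells)
    return tuple((rule >> (4 * cells[i - 1] + 2 * cells[i] + cells[(i + 1) % n])) & 1
                 for i in range(n))

def freezes(rule, width=32, steps=100):
    cells = tuple(1 if i == width // 2 else 0 for i in range(width))
    for _ in range(steps):
        nxt = step(cells, rule)
        if nxt == cells:
            return True                # fixed point reached
        cells = nxt
    return False

frozen = [r for r in range(256) if freezes(r)]
print(len(frozen), "of 256 rules freeze within 100 steps")
```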

But the exciting thing is that remarkably quickly one finds rules that aren’t obviously not our universe. Telling if they actually are our universe is a difficult matter. Because even if one simulates lots of steps, it can be arbitrarily difficult to know whether the behavior they’re showing is what one would expect in the early moments of a universe that follows the laws of physics as we know them.

There are plenty of encouraging features, though. For example, these universes can start from effectively infinite numbers of dimensions, then gradually settle to a finite number of dimensions—potentially removing the need for explicit inflation in the early universe.

And at a higher level, it’s worth remembering that if the models one’s using are simple enough, there’s a big distance between “neighboring models”, so it’s likely one will either reproduce known physics exactly, or be very wide of the mark.

In the end, though, one needs to reproduce not just the rule, but also the initial condition for the universe. But once one has that, one will in principle know the exact evolution of the universe. So does that mean one would immediately be able to figure out everything about the universe? Absolutely not. Because of the phenomenon I call “computational irreducibility”—which implies that even though one may know the rule and initial condition for a system, it can still require an irreducible amount of computational work to trace through every step in the behavior of the system to find out what it does.

Still, the possibility exists that one could just find a simple rule—and initial condition—that one could hold up and say, “This is our universe!” We’d have found our universe in the computational universe of all possible universes.

Of course this would be an exciting day for science.

But it would raise plenty of other questions. Like: why this rule, and not another? And why should our particular universe have a rule that shows up early enough in our list of all possible universes that we could actually find it just by enumeration?

One might think that it’d just be something about us being in this universe that causes us to choose an enumeration which makes it come up early. But my current guess is that it’d be something much more bizarre, such as that with respect to observers in a universe, all of a large class of nontrivial possible universe rules are actually equivalent, so one could pick any of them and get the exact same results, just in a different way.

But these are all speculations. And until we actually find a serious candidate rule for our universe, it’s probably not worth discussing these things much.

So, OK. Where are we at with all this right now? Most of what I’ve said here I had actually figured out by around 1999—several years before I finished *A New Kind of Science*. And though it was described in simple language rather than physics-speak, I managed to cover the highlights of it in Chapter 9 of the book—giving some of the technical details in the notes at the back.

But after the book was finished in 2002, I started working on the problem of physics again. I found it a bit amusing to say I had a computer in my basement that was searching for the fundamental theory of physics. But that really was what it was doing: enumerating possible rules of certain types, and trying to see if their behavior satisfied certain criteria that could make them plausible as models of physics.

I was pretty organized in what I did, getting intuition from simplified cases, then systematically going through more realistic cases. There were lots of technical issues. Like being able to visualize large evolving sequences of graphs. Or being able to quickly recognize subtle regularities that revealed that something couldn’t be our actual universe.

I accumulated the equivalent of thousands of pages of results, and was gradually beginning to get an understanding of the basic science of what systems based on networks can do.

In a sense, though, this was always just a hobby, done alongside my “day job” of leading our company and its technology development. And there was another “distraction”. For many years I had been interested in the problem of computational knowledge, and in building an engine that could comprehensively embody it. And as a result of my work on *A New Kind of Science*, I became convinced that this might actually be possible—and that this might be the right decade to do it.

By 2005 it was clear that it was indeed possible, and so I decided to devote myself to actually doing it. The result was Wolfram|Alpha. And once Wolfram|Alpha was launched it became clear that even more could be done—and I have spent what I think has probably been my most productive decade ever building a huge tower of ideas and technology, which has now made possible the Wolfram Language and much more.

But over the course of that decade, I haven’t been doing physics. And when I now look at my filesystem, I see a large number of notebooks about physics, all nicely laid out with the things I figured out—and all left abandoned and untouched since the beginning of 2005.

Should I get back to the physics project? I definitely want to. Though there are also other things I want to do.

I’ve spent most of my life working on very large projects. And I work hard to plan what I’m going to do, usually starting to think about projects decades ahead of actually doing them. Sometimes I’ll avoid a project because the ambient technology or infrastructure to do it just isn’t ready yet. But once I embark on a project, I commit myself to finding a way to make it succeed, even if it takes many years of hard work to do so.

Finding the fundamental theory of physics, though, is a project of a rather different character than any I’ve done before. In a sense its definition of success is much harsher: one either solves the problem and finds the theory, or one doesn’t. Yes, one could explore lots of interesting abstract features of the type of theory one’s constructing (as string theory has done). And quite likely such an investigation will have interesting spinoffs.

But unlike building a piece of technology, or exploring an area of science, the definition of the project isn’t under one’s control. It’s defined by our universe. And it could be that I’m simply wrong about how our universe works. Or it could be that I’m right, but there’s too deep a barrier of computational irreducibility for us to know.

One might also worry that one would find what one thinks is the universe, but never be sure. I’m actually not too worried about this. I think there are enough clues from existing physics—as well as from anomalies attributed to things like dark matter—that one will be able to tell quite definitively if one has found the correct theory. It’ll be neat if one can make an immediate prediction that can be verified. But by the time one’s reproducing all the seemingly arbitrary masses of particles, and other known features of physics, one will be pretty sure one has the correct theory.

It’s been interesting over the years to ask my friends whether I should work on fundamental physics. I get three dramatically different kinds of responses.

The first is simply, “You’ve got to do it!” They say that the project is the most exciting and important thing one can imagine, and they can’t see why I’d wait another day before starting on it.

The second class of responses is basically, “Why would you do it?” Then they say something like, “Why don’t you solve the problem of artificial intelligence, or molecular construction, or biological immortality, or at least build a giant multibillion-dollar company? Why do something abstract and theoretical when you can do something practical to change the world?”

There’s also a third class of responses, which I suppose my knowledge of the history of science should make me expect. It’s typically from physicist friends, and typically it’s some combination of, “Don’t waste your time working on that!” and, “Please don’t work on that.”

The fact is that the current approach to fundamental physics—through quantum field theory—is nearly 90 years old. It’s had its share of successes, but it hasn’t brought us the fundamental theory of physics. But for most physicists today, the current approach is almost the definition of physics. So when they think about what I’ve been working on, it seems quite alien—like it isn’t really physics.

And some of my friends will come right out and say, “I hope you don’t succeed, because then all that work we’ve done is wasted.” Well, yes, some work will be wasted. But that’s a risk you take when you do a project where in effect nature decides what’s right. But I have to say that even if one can find a truly fundamental theory of physics, there’s still plenty of use for what’s been done with standard quantum field theory, for example in figuring out phenomena at the scale where we can do experiments with particle accelerators today.

So, OK, if I mounted a project to try to find the fundamental theory of physics, what would I actually do? It’s a complex project that’ll need not just me, but a diverse team of other talented people too.

Whether or not it ultimately works, I think it’ll be quite interesting to watch—and I’d plan to do it as “spectator science”, making it as educational and accessible as possible. (Certainly that would be a pleasant change from the distraction-avoiding hermit mode in which I worked on *A New Kind of Science* for a decade.)

Of course I don’t know how difficult the project is, or whether it will even work at all. Ultimately that depends on what’s true about our universe. But based on what I did a decade ago, I have a clear plan for how to get started, and what kind of team I have to put together.

It’s going to need both good scientists and good technologists. There’s going to be lots of algorithm development for things like network evolution, and for analysis. I’m sure it’ll need abstract graph theory, modern geometry and probably group theory and other kinds of abstract algebra too. And I won’t be surprised if it needs lots of other areas of math and theoretical computer science as well.

It’ll need serious, sophisticated physics—with understanding of the upper reaches of quantum field theory and perhaps string theory and things like spin networks. It’s also likely to need methods that come from statistical physics and the modern theoretical frameworks around it. It’ll need an understanding of General Relativity and cosmology. And—if things go well—it’ll need an understanding of a diverse range of physics experiments.

There’ll be technical challenges too—like figuring out how to actually run giant network computations, and collect and visualize their results. But I suspect the biggest challenges will be in building the tower of new theory and understanding that’s needed to study the kinds of network systems I want to investigate. There’ll be useful support from existing fields. But in the end, I suspect this is going to require building a substantial new intellectual structure that won’t look much like anything that’s been done before.

Is it the right time to actually try doing this project? Maybe one should wait until computers are bigger and faster. Or certain areas of mathematics have advanced further. Or some more issues in physics have been clarified.

I’m not sure. But nothing I have seen suggests that there are any immediate roadblocks—other than putting the effort and resources into trying to do it. And who knows: maybe it will be easier than we think, and we’ll look back and wonder why it wasn’t tried long ago.

One of the key realizations that led to General Relativity 100 years ago was that Euclid’s fifth postulate (“parallel lines never cross”) might not be true in our actual universe, so that curved space is possible. But if my suspicions about space and the universe are correct, then it means there’s actually an even more basic problem in Euclid—with his very first definitions. Because if there’s a discrete network “underneath” space, then Euclid’s assumptions about points and lines that can exist anywhere in space simply aren’t correct.

General Relativity is a great theory—but we already know that it cannot be the final theory. And now we have to wonder how long it will be before we actually know the final theory. I’m hoping it won’t be too long. And I’m hoping that before too many more anniversaries of General Relativity have gone by we’ll finally know what spacetime really is.

It all works fairly well for quick questions, or short commands (though we’re always trying to make it better!). But what about more sophisticated things? What’s the best way to communicate more seriously with AIs?

I’ve been thinking about this for quite a while, trying to fit together clues from philosophy, linguistics, neuroscience, computer science and other areas. And somewhat to my surprise, what I’ve realized recently is that a big part of the answer may actually be sitting right in front of me, in the form of what I’ve been building towards for the past 30 years: the Wolfram Language.

Maybe this is a case of having a hammer and then seeing everything as a nail. But I’m pretty sure there’s more to it. And at the very least, thinking through the issue is a way to understand more about AIs and their relation to humans.

The first key point—that I came to understand clearly only after a series of discoveries I made in basic science—is that computation is a very powerful thing that lets even tiny programs (like cellular automata, or neural networks) behave in incredibly complicated ways. And it’s this kind of thing that an AI can harness.

Looking at pictures like this we might be pessimistic: how are we humans going to communicate usefully about all that complexity? Ultimately, what we have to hope is that we can build some kind of bridge between what our brains can handle and what computation can do. And although I didn’t look at it quite this way, this turns out to be essentially just what I’ve been trying to do all these years in designing the Wolfram Language.

I have seen my role as being to identify lumps of computation that people will understand and want to use, like FindShortestTour, ImageIdentify or Predict. Traditional computer languages have concentrated on low-level constructs close to the actual hardware of computers. But in the Wolfram Language I’ve instead started from what we humans understand, and then tried to capture as much of it as possible in the language.

In the early years, we were mostly dealing with fairly abstract concepts, about, say, mathematics or logic or abstract networks. But one of the big achievements of recent years—closely related to Wolfram|Alpha—has been that we’ve been able to extend the structure we built to cover countless real kinds of things in the world—like cities or movies or animals.

One might wonder: why invent a language for all this; why not just use, say, English? Well, for specific things, like “hot pink”, “new york city” or “moons of pluto”, English is good—and actually for such things the Wolfram Language lets people just use English. But when one’s trying to describe more complex things, plain English pretty quickly gets unwieldy.

Imagine for example trying to describe even a fairly simple algorithmic program. A back-and-forth dialog—“Turing-test style”—would rapidly get frustrating. And a straight piece of English would almost certainly end up with incredibly convoluted prose like one finds in complex legal documents.

But the Wolfram Language is built precisely to solve such problems. It’s set up to be readily understandable to humans, capturing the way humans describe and think about things. Yet it also has a structure that allows arbitrary complexity to be assembled and communicated. And, of course, it’s readily understandable not just by humans, but also by machines.

I realize I’ve actually been thinking and communicating in a mixture of English and Wolfram Language for years. When I give talks, for example, I’ll say something in English, then I’ll just start typing to communicate my next thought with a piece of Wolfram Language code that executes right there.

But let’s get back to AI. For most of the history of computing, we’ve built programs by having human programmers explicitly write lines of code, understanding (apart from bugs!) what each line does. But achieving what can reasonably be called AI requires harnessing more of the power of computation. And to do this one has to go beyond programs that humans can directly write—and somehow automatically sample a broader swath of possible programs.

We can do this through the kind of algorithm automation we’ve long used in *Mathematica* and the Wolfram Language, or we can do it through explicit machine learning, or through searching the computational universe of possible programs. But however we do it, one feature of the programs that come out is that they have no reason to be understandable by humans.

At some level it’s unsettling. We don’t know how the programs work inside, or what they might be capable of. But we know they’re doing elaborate computation that’s in a sense irreducibly complex to analyze.

There’s another, very familiar place where the same kind of thing happens: the natural world. Whether we look at fluid dynamics, or biology, or whatever, we see all sorts of complexity. And in fact the Principle of Computational Equivalence that emerged from the basic science I did implies that this complexity is in a sense exactly the same as the complexity that can occur in computational systems.

Over the centuries we’ve been able to identify aspects of the natural world that we can understand, and then harness them to create technology that’s useful to us. And our traditional engineering approach to programming works more or less the same way.

But for AI, we have to venture out into the broader computational universe, where—as in the natural world—we’re inevitably dealing with things we cannot readily understand.

Let’s imagine we have a perfect, complete AI, that’s able to do anything we might reasonably associate with intelligence. Maybe it’ll get input from lots of IoT sensors. And it has all sorts of computation going on inside. But what is it ultimately going to try to do? What is its purpose going to be?

Answering that takes us into some fairly deep philosophy, involving issues that have been batted around for thousands of years—but which are finally going to really matter in dealing with AIs.

One might think that as an AI becomes more sophisticated, so would its purposes, and that eventually the AI would end up with some sort of ultimate abstract purpose. But this doesn’t make sense. Because there is really no such thing as abstractly defined absolute purpose, derivable in some purely formal mathematical or computational way. Purpose is something that’s defined only with respect to humans, and their particular history and culture.

An “abstract AI”, not connected to human purposes, will just go along doing computation. And as with most cellular automata and most systems in nature, we won’t be able to identify—or attribute—any particular “purpose” to that computation, or to the system that does it.

Technology has always been about automating things so humans can define goals, and then those goals can automatically be achieved by the technology.

For most kinds of technology, those goals have been tightly constrained, and not too hard to describe. But for a general computational system they can be completely arbitrary. So then the challenge is how to describe them.

What do you say to an AI to tell it what you want it to do for you? You’re not going to be able to tell it exactly what to do in each and every circumstance. You’d only be able to do that if the computations the AI could do were tightly constrained, like in traditional software engineering. But for the AI to work properly, it’s going to have to make use of broader parts of the computational universe. And it’s then a consequence of a phenomenon I call computational irreducibility that you’ll never be able to determine everything it’ll do.

So what’s the best way to define goals for an AI? It’s complicated. If the AI can experience your life alongside you—seeing what you see, reading your email, and so on—then, just like with a person you know well, you might be able to tell the AI at least simple goals just by saying them in natural language.

But what if you want to define more complex goals, or goals that aren’t closely associated with what the AI has already experienced? Then small amounts of natural language wouldn’t be enough. Perhaps the AI could go through a whole education. But a better idea would be to leverage what we have in the Wolfram Language, which in effect already has lots of knowledge of the world built into it, in a way that both the human and the AI can use.

Thinking about how humans communicate with AIs is one thing. But how will AIs communicate with one another? One might imagine they could do literal transfers of their underlying representations of knowledge. But that wouldn’t work, because as soon as two AIs have had different “experiences”, the representations they use will inevitably be at least somewhat different.

And so, just like humans, the AIs are going to end up needing to use some form of symbolic language that represents concepts abstractly, without specific reference to the underlying representations of those concepts.

One might then think the AIs should just communicate in English; at least that way we’d be able to understand them! But it wouldn’t work out. Because the AIs would inevitably need to progressively extend their language—so even if it started as English, it wouldn’t stay that way.

In human natural languages, new words get added when there are new concepts that are widespread enough to make representing them in the language useful. Sometimes a new concept is associated with something new in the world (“blog”, “emoji”, “smartphone”, “clickbait”, etc.); sometimes it’s associated with a new distinction among existing things (“road” vs. “freeway”, “pattern” vs. “fractal”).

Often it’s science that gives us new distinctions between things, by identifying distinct clusters of behavior or structure. But the point is that AIs can do that on a much larger scale than humans. For example, our Image Identification Project is set up to recognize the 10,000 or so kinds of objects that we humans have everyday names for. But internally, as it’s trained on images from the world, it’s discovering all sorts of other distinctions that we don’t have names for, but that are successful at robustly separating things.

I’ve called these “post-linguistic emergent concepts” (or PLECs). And I think it’s inevitable that in a population of AIs, an ever-expanding hierarchy of PLECs will appear, forcing the language of the AIs to progressively expand.

But how could the framework of English support that? I suppose each new concept could be assigned a word formed from some hash-code-like collection of letters. But a structured symbolic language—as the Wolfram Language is—provides a much better framework. Because it doesn’t require the units of the language to be simple “words”, but allows them to be arbitrary lumps of symbolic information, such as collections of examples (so that, for example, a word can be represented by a symbolic structure that carries around its definitions).

So should AIs talk to each other in Wolfram Language? It seems to make a lot of sense—because it effectively starts from the understanding of the world that’s been developed through human knowledge, but then provides a framework for going further. It doesn’t matter how the syntax is encoded (input form, XML, JSON, binary, whatever). What matters is the structure and content that are built into the language.

Over the course of the billions of years that life has existed on Earth, there’ve been a few different ways of transferring information. The most basic is genomics: passing information at the hardware level. But then there are neural systems, like brains. And these get information—like our Image Identification Project—by accumulating it from experiencing the world. This is the mechanism that organisms use to see, and to do many other “AI-ish” things.

But in a sense this mechanism is fundamentally limited, because every different organism—and every different brain—has to go through the whole process of learning for itself: none of the information obtained in one generation can readily be passed to the next.

But this is where our species made its great invention: natural language. Because with natural language it’s possible to take information that’s been learned, and communicate it in abstract form, say from one generation to the next. There’s still a problem however, because when natural language is received, it still has to be interpreted, in a separate way in each brain.

And this is where the idea of a computational-knowledge language—like the Wolfram Language—is important: because it gives a way to communicate concepts and facts about the world, in a way that can immediately and reproducibly be executed, without requiring separate interpretation on the part of whatever receives it.

It’s probably not a stretch to say that the invention of human natural language was what led to civilization and our modern world. So then what are the implications of going to another level: of having a precise computational-knowledge language, that carries not just abstract concepts, but also a way to execute them?

One possibility is that it may define the civilization of the AIs, whatever that may turn out to be. And perhaps this may be far from what we humans—at least in our present state—can understand. But the good news is that at least in the case of the Wolfram Language, precise computational-knowledge language isn’t incomprehensible to humans; in fact, it was specifically constructed to be a bridge between what humans can understand, and what machines can readily deal with.

So let’s imagine a world in which, in addition to natural language, it’s also common for communication to occur through a computational-knowledge language like the Wolfram Language. Certainly, a lot of the computational-knowledge-language communication will be between machines. But some of it will be between humans and machines—and quite possibly it will become the dominant form of communication between them.

In today’s world, only a small fraction of people can write computer code—just as, 500 or so years ago, only a small fraction of people could write natural language. But what if a wave of computer literacy swept through, and the result was that most people could write knowledge-based code?

Natural language literacy enabled many features of modern society. What would knowledge-based code literacy enable? There are plenty of simple things. Today you might get a menu of choices at a restaurant. But if people could read code, there could be code for each choice, that you could readily modify to your liking. (And actually, something very much like this is soon going to be possible—with Wolfram Language code—for biology and chemistry lab experiments.) Another implication of people being able to read code is for rules and contracts: instead of just writing prose to be interpreted, one can have code to be read by humans and machines alike.

But I suspect the implications of widespread knowledge-based code literacy will be much deeper—because it will not only give a wide range of people a new way to express things, but will also give them a new way to think about them.

So, OK, let’s say we want to use the Wolfram Language to communicate with AIs. Will it actually work? To some extent we know it already does. Because inside Wolfram|Alpha and the systems based on it, what’s happening is that natural language questions are being converted to Wolfram Language code.
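As a small taste of that conversion, the Wolfram Language’s Interpreter framework turns fragments of free-form text into symbolic expressions one can compute with (the example below is mine, not something from Wolfram|Alpha’s internals):

```wolfram
(* Interpret a free-form string as a symbolic city entity;
   the result is an Entity[...] expression one can compute with *)
Interpreter["City"]["new york"]
```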

But what about more elaborate applications of AI? Many places where the Wolfram Language is used are examples of AI, whether they’re computing with images or text or data or symbolic structures. Sometimes the computations involve algorithms whose goals we can precisely define, like FindShortestTour; sometimes they involve algorithms whose goals are less precise, like ImageIdentify. Sometimes the computations are couched in the form of “things to do”, sometimes as “things to look for” or “things to aim for”.
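For instance, a goal as precisely defined as FindShortestTour’s fits in a single line (the points below are just an arbitrary example of mine):

```wolfram
(* Find the shortest tour through four points;
   returns the tour length together with the visiting order *)
FindShortestTour[{{0, 0}, {1, 0}, {1, 1}, {0, 1}}]
```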

We’ve come a long way in representing the world in the Wolfram Language. But there’s still more to do. Back in the 1600s it was quite popular to try to create “philosophical languages” that would somehow symbolically capture the essence of everything one could think about. Now we need to really do this. And, for example, to capture in a symbolic way all the kinds of actions and processes that can happen, as well as things like peoples’ beliefs and mental states. As our AIs become more sophisticated and more integrated into our lives, representing these kinds of things will become more important.

For some tasks and activities we’ll no doubt be able to use pure machine learning, and never have to build up any kind of intermediate structure or language. But much as natural language was crucial in enabling our species to get to where we are, so also having an abstract language will be important for the progress of AI.

I’m not sure what it would look like, but we could perhaps imagine using some kind of pure emergent language produced by the AIs. But if we do that, then we humans can expect to be left behind, and to have no chance of understanding what the AIs are doing. But with the Wolfram Language we have a bridge, because we have a language that’s suitable for both humans and AIs.

There’s much to be said about the interplay between language and computation, humans and AIs. Perhaps I need to write a book about it. But my purpose here has been to describe a little of my current thinking, particularly my realizations about the Wolfram Language as a bridge between human understanding and AI.

With pure natural language or traditional computer language, we’ll be hard pressed to communicate much to our AIs. But what I’ve been realizing is that with Wolfram Language there’s a much richer alternative, readily extensible by the AIs, but built on a base that leverages human natural language and human knowledge to maintain a connection with what we humans can understand. We’re seeing early examples already… but there’s a lot further to go, and I’m looking forward to actually building what’s needed, as well as writing about it…

When George Boole came onto the scene, the disciplines of logic and mathematics had developed quite separately for more than 2000 years. And George Boole’s great achievement was to show how to bring them together, through the concept of what’s now called Boolean algebra. And in doing so he effectively created the field of mathematical logic, and set the stage for the long series of developments that led for example to universal computation.

When George Boole invented Boolean algebra, his basic goal was to find a set of mathematical axioms that could reproduce the classical results of logic. His starting point was ordinary algebra, with variables like *x* and *y*, and operations like addition and multiplication.

At first, ordinary algebra seems a lot like logic. After all, *p and q* is the same as *q and p*, just as *p×q* = *q×p*. But if one looks in more detail, there are differences. Like *p×p* = *p*², but *p and p* is just *p*.

Boole was rather informal in the way he described his axiom system. But within a few decades, it had been more precisely formalized, and over the course of the century that followed, a few progressively simpler forms of it were found. And then, as it happens, 16 years ago I ended up finishing this 150-year process, by finding—largely as a side effect of other science I was doing—the provably simplest possible axiom system for logic, which actually happens to consist of just a single axiom.

I thought this axiom was pretty neat, and looking at where it lies in the space of possible axioms has interesting implications for the foundations of mathematics and logic. But in the context of George Boole, one can say that it’s a minimal version of his big idea: that one can have a mathematical axiom system that reproduces all the results of logic just by what amount to simple algebra-like transformations.

But let’s talk about George Boole, the person. Who was he, and how did he come to do what he did?

George Boole was born (needless to say) in 1815, in England, in the fairly small town of Lincoln, about 120 miles north of London. His father had a serious interest in science and mathematics, and had a small business as a shoemaker. George Boole was something of a self-taught prodigy, who first became locally famous at age 14 with a translation of a Greek poem that he published in the local newspaper. At age 16 he was hired as a teacher at a local school, and by that time he was reading calculus books, and apparently starting to formulate what would later be his idea about relations between mathematics and logic.

At age 19, George Boole did a startup: he started his own elementary school. It seems to have been decently successful, and in fact Boole continued making his living running (or “conducting” as it was then called) schools until he was in his thirties. He was involved with a few people educated in places like Cambridge, notably through the local Mechanics’ Institute (a little like a modern community college). But mostly he seems just to have learned by reading books on his own.

He took his profession as a schoolteacher seriously, and developed all sorts of surprisingly modern theories about the importance of understanding and discovery (as opposed to rote memorization), and the value of tangible examples in areas like mathematics (he surely would have been thrilled by what’s now possible with computers).

When he was 23, Boole started publishing papers on mathematics. His early papers were about hot topics of the time, such as calculus of variations. Perhaps it was his interest in education and exposition that led him to try creating different formalisms, but soon he became a pioneer in the “calculus of operations”: doing calculus by manipulating operators rather than explicit algebraic expressions.

It wasn’t long before he was interacting with leading British mathematicians of the day, and getting positive feedback. He considered going to Cambridge to become a “university person”, but was put off when told that he would have to start with the standard undergraduate course, and stop doing his own research.

Logic as a field of study had originated in antiquity, particularly with the work of Aristotle. It had been a staple of education throughout the Middle Ages and beyond, fitting into the practice of rote learning by identifying specific patterns of logical arguments (“syllogisms”) with mnemonics like “bArbArA” and “cElArEnt”. In many ways, logic hadn’t changed much in over a thousand years, though by the 1800s there were efforts to make it more streamlined and “formal”. But the question was how. And in particular, should this happen through the methods of philosophy, or mathematics?

In early 1847, Boole’s friend Augustus de Morgan had become embroiled in a piece of academic unpleasantness over the question. And this led Boole quickly to go off and work out his earlier ideas about how logic could be formulated using mathematics. The result was his first book, *The Mathematical Analysis of Logic*, published the same year:

The book was not long—only 86 pages. But it explained Boole’s idea of representing logic using a form of algebra. The notion that one could have an algebra with variables that weren’t just ordinary numbers happened to have just arisen in Hamilton’s 1843 invention of quaternion algebra—and Boole was influenced by this. (Galois had also done something similar in 1832 working on groups and finite fields.)

150 years before Boole, Gottfried Leibniz had also thought about using algebra to represent logic. But he’d never managed to see quite how. And the idea seems to have been all but forgotten until Boole finally succeeded in doing it in 1847.

Looking at Boole’s book today, much of it is quite easy to understand. Here, for example, is him showing how his algebraic formulation reproduces a few standard results in logic:

At a surface level, this all seems fairly straightforward. “And” is represented by multiplication of variables *xy*, “not” by *1–x*, and “(exclusive) or” by *x+y–2xy*. There are also extra constraints, like *x*² = *x*. But when one tries digging deeper, things become considerably murkier. Just what are *x* and *y* really supposed to be? And what does an operation like *x+y* actually mean?
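One can at least check that the algebra does what it should. Here’s a small sketch of my own (not from Boole’s book) verifying in the Wolfram Language that the encoding reproduces De Morgan’s law by nothing more than polynomial expansion (the inclusive “or” used here, *x+y–xy*, follows from encoding *not (not x and not y)*):

```wolfram
(* Boole-style encoding of logic as polynomial algebra *)
and[x_, y_] := x y
not[x_] := 1 - x
or[x_, y_] := x + y - x y  (* inclusive or: not[and[not[x], not[y]]] *)

(* De Morgan's law: not (x and y) should equal (not x) or (not y) *)
Expand[not[and[x, y]] - or[not[x], not[y]]]
(* -> 0, so the two sides agree identically *)
```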

When Boole wrote his first book he was still working as a teacher and running a school. But he had also become well known as a mathematician, and in 1849, when Queen’s College, Cork (now University College Cork) opened in Ireland, Boole was hired as its first math professor. And once in Cork, Boole started to work on what would become his most famous book, *An Investigation of the Laws of Thought*:

His preface began: “The design of the following treatise is to investigate the fundamental laws of those operations of the mind by which reasoning is performed; to give expression to them in the symbolical language of a Calculus, and upon this foundation to establish the science of Logic and construct its method; …”

Boole appears to have seen himself as trying to create a calculus for the “science of intellectual powers” analogous to Newton’s calculus for physical science. But while Newton had been able to rely on concepts like space and time to inform the structure of his calculus, Boole had to build on the basis of a model of how the mind works, which for him was unquestionably logic.

The first part of *Laws of Thought* is basically a recapitulation of Boole’s earlier book on logic, but with additional examples—such as a chapter covering logical proofs about the existence and characteristics of God. The second part of the book is in a sense more mathematically traditional. For instead of interpreting his algebraic variables as related to logic, he interprets them as traditional numbers corresponding to probabilities—and in doing so shows that the laws for combining probabilities of events have the same structure as the laws for combining logical statements.

For the most part *Laws of Thought* reads like a mathematical work, with abstract definitions and formal conclusions. But in the final chapter Boole tries to connect what he has done to empirical questions about the operation of the mind. He discusses how free will can be compatible with definite laws of thought. He talks about how imprecise human experiences can lead to precise concepts. He discusses whether there is truth that humans can recognize that goes beyond what mathematical laws can ever explain. And he talks about how an understanding of human thinking should inform education.

After the publication of *Laws of Thought*, George Boole stayed in Cork, living another decade and dying in 1864 of pneumonia at the age of 49. He continued to publish widely on mathematics, but never published on logic again, though he probably intended to do so.

In his lifetime, Boole was much more recognized for his work on traditional mathematics than on logic. He wrote two textbooks, one in 1859 on differential equations, and one in 1860 on difference equations. Both are clean and elegant expositions. And interestingly, while there are endless modern alternatives to Boole’s *Differential Equations*, sufficiently little has been done on difference equations that when we were implementing them in *Mathematica* in the late 1990s, Boole’s 1860 book was still an important reference, notable especially for its nice examples of the factorization of linear difference operators.

What was Boole like as a person? There’s quite a bit of information on this, not least from his wife’s writings and from correspondence and reminiscences his sister collected when he died. From what one can tell, Boole was organized and diligent, with careful attention to detail. He worked hard, often late into the night, and could be so engrossed in his work that he became quite absent minded. Despite how he looks in pictures, he appears to have been rather genial in person. He was well liked as a teacher, and was a talented lecturer, though his blackboard writing was often illegible. He was a gracious and extensive correspondent, and made many visits to different people and places. He spent many years managing people, first at schools, and then at the university in Cork. He had a strong sense of justice, and while he did not like controversy, he was occasionally involved in it, and was not shy to maintain his position.

Despite his successes, Boole seems to have always thought of himself as a self-taught schoolteacher, rather than a member of the academic elite. And perhaps this helped in his ability to take intellectual risks. Whether it was playing fast and loose with differential operators in calculus, or finding ways to bend the laws of algebra so they could apply to logic, Boole seems to have always taken the attitude of just moving forward and seeing where he could go, trusting his own sense of what was correct and true.

Boole was single most of his life, though finally married at the age of 40. His wife, Mary Everest Boole, was 17 years his junior, and she outlived him by 52 years, dying in 1916. She had an interesting story in her own right, later in her life writing books with titles like *Philosophy and Fun of Algebra*, *Logic Taught by Love*, *The Preparation of the Child for Science* and *The Message of Psychic Science to the World*. George and Mary Boole had five daughters—who, along with their own children, had a wide range of careers and accomplishments, some quite mathematical.

It is something of an irony that George Boole, committed as he was to the methods of algebra, calculus and continuous mathematics, should have come to symbolize discrete variables. But to be fair, this took a while. In the decades after he died, the primary influence of Boole’s work on logic was on the wave of abstraction and formalization that swept through mathematics—involving people like Frege, Peano, Hilbert, Whitehead, Russell and eventually Gödel and Turing. And it was only in 1937, with the work of Claude Shannon on switching networks, that Boolean algebra began to be used for practical purposes.

Today there is a lot on Boolean computation in *Mathematica* and the Wolfram Language, and in fact George Boole is the person with the largest number of distinct functions in the system—15—named after him.

But what has made Boole’s name so widely known is not Boolean algebra, it’s the much simpler notion of Boolean variables, which appear in essentially every computer language—leading to a progressive increase in mentions of the word “boolean” in publications since the 1950s:

Was this inevitable? In some sense, I suspect it was. For when one looks at history, sufficiently simple formal ideas have a remarkable tendency to eventually be widely used, even if they emerge only slowly from quite complex origins. Most often what happens is that at some moment the ideas become relevant to technology, and quickly then go from curiosities to mainstream.

My work on *A New Kind of Science* has made me think about enumerations of what amount to all possible “simple formal ideas”. Some have already become incorporated in technology, but many have not yet. But the story of George Boole and Boolean variables provides an interesting example of what can happen over the course of centuries—and how what at first seems obscure and abstruse can eventually become ubiquitous.

I’ve built systems that give computers all sorts of intelligence, much of it far beyond the human level. And for a long time we’ve been integrating all that intelligence into the Wolfram Language.

Now I’m excited to be able to say that we’ve reached a milestone: there’s finally a function called ImageIdentify built into the Wolfram Language that lets you ask, “What is this a picture of?”—and get an answer.

And today we’re launching the Wolfram Language Image Identification Project on the web to let anyone easily take any picture (drag it from a web page, snap it on your phone, or load it from a file) and see what ImageIdentify thinks it is:

It won’t always get it right, but most of the time I think it does remarkably well. And to me what’s particularly fascinating is that when it does get something wrong, the mistakes it makes mostly seem remarkably human.

It’s a nice practical example of artificial intelligence. But to me what’s more important is that we’ve reached the point where we can integrate this kind of “AI operation” right into the Wolfram Language—to use as a new, powerful building block for knowledge-based programming.

In a Wolfram Language session, all you need do to identify an image is feed it to the ImageIdentify function:

What you get back is a symbolic entity, that the Wolfram Language can then do more computation with—like, in this case, figure out if you’ve got an animal, a mammal, etc. Or just ask for a definition:

Or, say, generate a word cloud from its Wikipedia entry:

And if one had lots of photographs, one could immediately write a Wolfram Language program that, for example, gave statistics on the different kinds of animals, or planes, or devices, or whatever, that appear in the photographs.
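A sketch of such a program (here photos stands in for a list of images you’d supply yourself):

```wolfram
(* Identify each image, then tally how many of each kind appear;
   "photos" is a placeholder for your own list of images *)
Counts[ImageIdentify /@ photos]
```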

With ImageIdentify built right into the Wolfram Language, it’s easy to create APIs, or apps, that use it. And with the Wolfram Cloud, it’s also easy to create websites—like the Wolfram Language Image Identification Project.

For me personally, I’ve been waiting a long time for ImageIdentify. Nearly 40 years ago I read books with titles like *The Computer and the Brain* that made it sound inevitable we’d someday achieve artificial intelligence—probably by emulating the electrical connections in a brain. And in 1980, buoyed by the success of my first computer language, I decided I should think about what it would take to achieve full-scale artificial intelligence.

Part of what encouraged me was that—in an early premonition of the Wolfram Language—I’d based my first computer language on powerful symbolic pattern matching that I imagined could somehow capture certain aspects of human thinking. But I knew that while tasks like image identification were also based on pattern matching, they needed something different—a more approximate form of matching.

I tried to invent things like approximate hashing schemes. But I kept on thinking that brains manage to do this; we should get clues from them. And this led me to start studying idealized neural networks and their behavior.

Meanwhile, I was also working on some fundamental questions in natural science—about cosmology and about how structures arise in our universe—and studying the behavior of self-gravitating collections of particles.

And at some point I realized that both neural networks and self-gravitating gases were examples of systems that had simple underlying components, but somehow achieved complex overall behavior. And in getting to the bottom of this, I wound up studying cellular automata and eventually making all the discoveries that became *A New Kind of Science*.

So what about neural networks? They weren’t my favorite type of system: they seemed a little too arbitrary and complicated in their structure compared to the other systems that I studied in the computational universe. But every so often I would think about them again, running simulations to understand more about the basic science of their behavior, or trying to see how they could be used for practical tasks like approximate pattern matching:

Neural networks in general have had a remarkable roller-coaster history. They first burst onto the scene in the 1940s. But by the 1960s, their popularity had waned, and the word was that it had been “mathematically proven” that they could never do anything very useful.

It turned out, though, that that was only true for one-layer “perceptron” networks. And in the early 1980s, there was a resurgence of interest, based on neural networks that also had a “hidden layer”. But despite knowing many of the leaders of this effort, I have to say I remained something of a skeptic, not least because I had the impression that neural networks were mostly getting used for tasks that seemed like they would be easy to do in lots of other ways.

I also felt that neural networks were overly complex as formal systems—and at one point even tried to develop my own alternative. But still I supported people at my academic research center studying neural networks, and included papers about them in my *Complex Systems* journal.

I knew that there were practical applications of neural networks out there—like for visual character recognition—but they were few and far between. And as the years went by, little of general applicability seemed to emerge.

Meanwhile, we’d been busy developing lots of powerful and very practical ways of analyzing data, in *Mathematica* and in what would become the Wolfram Language. And a few years ago we decided it was time to go further—and to try to integrate highly automated general machine learning. The idea was to make broad, general functions with lots of power; for example, to have a single function Classify that could be trained to classify any kind of thing: say, day vs. night photographs, sounds from different musical instruments, urgency level of email, or whatever.

We put in lots of state-of-the-art methods. But, more importantly, we tried to achieve complete automation, so that users didn’t have to know anything about machine learning: they just had to call Classify.

I wasn’t initially sure it was going to work. But it does, and spectacularly.

People can give training data on pretty much anything, and the Wolfram Language automatically sets up classifiers for them to use. We’re also providing more and more built-in classifiers, like for languages, or country flags:

And a little while ago, we decided it was time to try a classic large-scale classifier problem: image identification. And the result now is ImageIdentify.
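Using it is a one-liner; a minimal sketch with one of the standard built-in test images:

```wolfram
(* ImageIdentify takes an image and returns an Entity for the object it identifies. *)
ImageIdentify[ExampleData[{"TestImage", "House"}]]
```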

What is image identification really about? There are some number of named kinds of things in the world, and the point is to tell which of them a particular picture is of. Or, more formally, to map all possible images into a certain set of symbolic names of objects.

We don’t have any intrinsic way to describe an object like a chair. All we can do is just give lots of examples of chairs, and effectively say, “Anything that looks like one of these we want to identify as a chair.” So in effect we want images that are “close” to our examples of chairs to map to the name “chair”, and others not to.

Now, there are lots of systems that have this kind of “attractor” behavior. As a physical example, think of a mountainscape. A drop of rain may fall anywhere on the mountains, but (at least in an idealized model) it will flow down to one of a limited number of lowest points. Nearby drops will tend to flow to the same lowest point. Drops far away may be on the other side of a watershed, and so will flow to other lowest points.

The drops of rain are like our images; the lowest points are like the different kinds of objects. With raindrops we’re talking about things physically moving, under gravity. But images are composed of digital pixels. And instead of thinking about physical motion, we have to think about digital values being processed by programs.

And exactly the same “attractor” behavior can happen there. For example, there are lots of cellular automata in which one can change the colors of a few cells in their initial conditions, but still end up in the same fixed “attractor” final state. (Most cellular automata actually show more interesting behavior, that doesn’t go to a fixed state, but it’s less clear how to apply this to recognition tasks.)
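A minimal illustration of this attractor behavior uses rule 254, which makes any cell with a black neighbor black, so black regions grow until they fill everything:

```wolfram
(* Two initial conditions differing in a few cells reach the same fixed final state. *)
init1 = {1, 0, 0, 0, 1, 0, 0, 0};
init2 = {1, 1, 0, 0, 1, 0, 1, 0};
Last[CellularAutomaton[254, init1, 8]] === Last[CellularAutomaton[254, init2, 8]]
(* True: both evolve to the all-black state {1, 1, 1, 1, 1, 1, 1, 1} *)
```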

So what happens if we take images and apply cellular automaton rules to them? In effect we’re doing image processing, and indeed some common image processing operations (both on computers and in human visual processing) are just simple 2D cellular automata.

It’s easy to get cellular automata to pick out certain features of an image—like blobs of dark pixels. But for real image identification, there’s more to do. In the mountain analogy, we have to “sculpt” the mountainscape so that the right raindrops flow to the right points.

So how do we do this? In the case of digital data like images, it isn’t known how to do this in one fell swoop; we only know how to do it iteratively, and incrementally. We have to start from a base “flat” system, and gradually do the “sculpting”.

There’s a lot that isn’t known about this kind of iterative sculpting. I’ve thought about it quite extensively for discrete programs like cellular automata (and Turing machines), and I’m sure something very interesting can be done. But I’ve never figured out just how.

For systems with continuous (real-number) parameters, however, there’s a great method called back propagation—that’s based on calculus. It’s essentially a version of the very common method of gradient descent, in which one computes derivatives, then uses them to work out how to change parameters to get the system one is using to better fit the behavior one wants.
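The essence of gradient descent fits in a line; here is a toy sketch for a one-parameter function:

```wolfram
(* Repeatedly step against the derivative; for f(x) = (x - 3)^2 the iterates
   converge toward the minimum at x = 3. *)
f[x_] := (x - 3)^2;
steps = NestList[# - 0.1 f'[#] &, 0.0, 50];
Last[steps]  (* approximately 3. *)
```

Back propagation applies this same idea to networks with millions of parameters, using the chain rule to compute all the derivatives efficiently.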

So what kind of system should one use? A surprisingly general choice is neural networks. The name makes one think of brains and biology. But for our purposes, neural networks are just formal computational systems, consisting of compositions of multi-input functions with continuous parameters and discrete thresholds.

How easy is it to make one of these neural networks perform interesting tasks? In the abstract, it’s hard to know. And for at least 20 years my impression was that in practice neural networks could mostly do only things that were also pretty easy to do in other ways.

But a few years ago that began to change. And one started hearing about serious successes in applying neural networks to practical problems, like image identification.

What made that happen? Computers (and especially linear algebra in GPUs) got fast enough that—with a variety of algorithmic tricks, some actually involving cellular automata—it became practical to train neural networks with millions of neurons, on millions of examples. (By the way, these were “deep” neural networks, no longer restricted to having very few layers.) And somehow this suddenly brought large-scale practical applications within reach.

I don’t think it’s a coincidence that this happened right when the number of artificial neurons being used came within striking distance of the number of neurons in relevant parts of our brains.

It’s not that this number is significant on its own. Rather, it’s that if we’re trying to do tasks—like image identification—that human brains do, then it’s not surprising if we need a system with a similar scale.

Humans can readily recognize a few thousand kinds of things—roughly the number of picturable nouns in human languages. Lower animals likely distinguish vastly fewer kinds of things. But if we’re trying to achieve “human-like” image identification—and effectively map images to words that exist in human languages—then this defines a certain scale of problem, which, it appears, can be solved with a “human-scale” neural network.

There are certainly differences between computational and biological neural networks—although after a network is trained, the process of, say, getting a result from an image seems rather similar. But the methods used to train computational neural networks are significantly different from what it seems plausible for biology to use.

Still, in the actual development of ImageIdentify, I was quite shocked at how much was reminiscent of the biological case. For a start, the number of training images—a few tens of millions—seemed very comparable to the number of distinct views of objects that humans get in their first couple of years of life.

There were also quirks of training that seemed very close to what’s seen in the biological case. For example, at one point, we’d made the mistake of having no human faces in our training data. And when we showed a picture of Indiana Jones, the system was blind to the presence of his face, and just identified the picture as a hat. Not surprising, perhaps, but to me strikingly reminiscent of the classic vision experiment in which kittens reared in an environment of vertical stripes are blind to horizontal stripes.

Probably much like the brain, the ImageIdentify neural network has many layers, containing a variety of different kinds of neurons. (The overall structure, needless to say, is nicely described by a Wolfram Language symbolic expression.)

It’s hard to say meaningful things about much of what’s going on inside the network. But if one looks at the first layer or two, one can recognize some of the features that it’s picking out. And they seem to be remarkably similar to features we know are picked out by real neurons in the primary visual cortex.

I myself have long been interested in things like visual texture recognition (are there “texture primitives”, like there are primary colors?), and I suspect we’re now going to be able to figure out a lot about this. I also think it’s of great interest to look at what happens at later layers in the neural network—because if we can recognize them, what we should see are “emergent concepts” that in effect describe classes of images and objects in the world—including ones for which we don’t yet have words in human languages.

Like many projects we tackle for the Wolfram Language, developing ImageIdentify required bringing many diverse things together. Large-scale curation of training images. Development of a general ontology of picturable objects, with mapping to standard Wolfram Language constructs. Analysis of the dynamics of neural networks using physics-like methods. Detailed optimization of parallel code. Even some searching in the style of *A New Kind of Science* for programs in the computational universe. And lots of judgement calls about how to create functionality that would actually be useful in practice.

At the outset, it wasn’t clear to me that the whole ImageIdentify project was going to work. And early on, the rate of utterly misidentified images was disturbingly high. But one issue after another got addressed, and gradually it became clear that finally we were at a point in history when it would be possible to create a useful ImageIdentify function.

There were still plenty of problems. The system would do well on certain things, but fail on others. Then we’d adjust something, and there’d be new failures, and a flurry of messages with subject lines like “We lost the anteaters!” (about how pictures that ImageIdentify used to correctly identify as anteaters were suddenly being identified as something completely different).

Debugging ImageIdentify was an interesting process. What counts as reasonable input? What’s reasonable output? How should one make the choice between getting more-specific results, and getting results that one’s more certain aren’t incorrect (just a dog, or a hunting dog, or a beagle)?

Sometimes we saw things that at first seemed completely crazy. A pig misidentified as a “harness”. A piece of stonework misidentified as a “moped”. But the good news was that we always found a cause—like confusion from the same irrelevant objects repeatedly being in training images for a particular type of object (e.g. “the only time ImageIdentify had ever seen that type of Asian stonework was in pictures that also had mopeds”).

To test the system, I often tried slightly unusual or unexpected images:

And what I found was something very striking, and charming. Yes, ImageIdentify could be completely wrong. But somehow the errors seemed very understandable, and in a sense very human. It seemed as if what ImageIdentify was doing was successfully capturing some of the essence of the human process of identifying images.

So what about things like abstract art? It’s a kind of Rorschach-like test for both humans and machines—and an interesting glimpse into the “mind” of ImageIdentify:

Something like ImageIdentify will never truly be finished. But a couple of months ago we released a preliminary version in the Wolfram Language. And today we’ve updated that version, and used it to launch the Wolfram Language Image Identification Project.

We’ll continue training and developing ImageIdentify, not least based on feedback and statistics from the site. As with Wolfram|Alpha in the domain of natural language understanding, without actual usage by humans there’s no realistic way to assess progress—or even to define just what the goals should be for “natural image understanding”.

I must say that I find it fun to play with the Wolfram Language Image Identification Project. It’s satisfying after all these years to see this kind of artificial intelligence actually working. But more than that, when you see ImageIdentify respond to a weird or challenging image, there’s often a certain “aha” feeling, as if one has just been shown, in a very human-like way, some new insight—or joke—about an image.

Underneath, of course, it’s just running code—with very simple inner loops that are pretty much the same as, for example, in my neural network programs from the beginning of the 1980s (except that now they’re Wolfram Language functions, rather than low-level C code).

It’s a fascinating—and extremely unusual—example in the history of ideas: neural networks were studied for 70 years, and repeatedly dismissed. Yet now they are what has brought us success in such a quintessential example of an artificial intelligence task as image identification. I expect the original pioneers of neural networks—like Warren McCulloch and Walter Pitts—would find little surprising about the core of what the Wolfram Language Image Identification Project does, though they might be amazed that it’s taken 70 years to get here.

But to me the greater significance is what can now be done by integrating things like ImageIdentify into the whole symbolic structure of the Wolfram Language. What ImageIdentify does is something humans learn to do in each generation. But symbolic language gives us the opportunity to represent shared intellectual achievements across all of human history. And making all these things computational is, I believe, something of monumental significance, that I am only just beginning to understand.

But for today, I hope you will enjoy the Wolfram Language Image Identification Project. Think of it as a celebration of where artificial intelligence has reached. Think of it as an intellectual recreation that helps build intuition for what artificial intelligence is like. But don’t forget the part that I think is most exciting: it’s also practical technology, that you can use here and now in the Wolfram Language, and deploy wherever you want.

My idea was to write code with our standard Wolfram Programming Cloud, but instead of producing a web app or web API, to produce an app for the Apple Watch. And conveniently enough, a preliminary version of our Wolfram Cloud app just became available in the App Store—letting me deploy from the Wolfram Cloud to both mobile devices and the watch.

To some extent it was adventure programming. The Apple Watch was just coming out, and the Wolfram Cloud app was still just preliminary. But of course I was building on nearly 30 years of progressive development of the Wolfram Language. And I’m happy to say that it didn’t take long for me to start getting interesting Wolfram Language apps running on the watch. And after less than a day of work—with help from a handful of other people—I had 25 watch-ready apps:

All of these I built by writing code in the Wolfram Programming Cloud (either on the web or the desktop), then deploying to the Wolfram Cloud, and connecting to the Apple Watch via the Wolfram Cloud app. And although the apps were designed for the Apple Watch, you can actually also use them on the web, or on a phone. There are links to the web versions scattered through this post. To get the apps onto your phone and watch, just go to this page and follow the instructions. That page also has all the Wolfram Language source code for the apps, and you can use any Wolfram Language system—Wolfram Programming Cloud (including the free version), *Mathematica* etc.—to experiment with the code for yourself, and perhaps deploy your own version of any of the apps.

So how does it all work? For my first watch-app-writing session, I decided to start by making a tiny app that just generates a single random number. The core Wolfram Language code to do that is simply:
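The original code image isn’t reproduced here, but it is presumably a single call along the lines of:

```wolfram
RandomInteger[1000]  (* a pseudorandom integer between 0 and 1000 *)
```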

For the watch we want the number to look nice and bold and big, and it might as well be a random color:
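One plausible way to write that (the exact styling in the original may differ):

```wolfram
(* Big, bold, and in a random color. *)
Style[RandomInteger[1000], Bold, 100, RandomColor[]]
```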

We can immediately deploy this publicly to the cloud by saying:
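A sketch of what the deployment call looks like (the result of CloudDeploy is the URL of the cloud object):

```wolfram
(* Delayed defers evaluation until the page is accessed, so every visit
   computes a fresh random number; Permissions -> "Public" opens it to anyone. *)
CloudDeploy[
 Delayed[Style[RandomInteger[1000], Bold, 100, RandomColor[]], "PNG"],
 Permissions -> "Public"]
```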

And if you go to that URL in any web browser, you’ll get to a minimal web app which immediately gives a web page with a random number. (The Delayed in the code says to delay the computation until the moment the page is accessed or refreshed, so you get a fresh random number each time.)
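<!-- intentionally left blank -->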

So what about getting this to the Apple Watch? First, it has to get onto an iPhone. And that’s easy. Because anything that you’ve deployed to the Wolfram Cloud is automatically accessible on an iPhone through the Wolfram Cloud app. To make it easy to find, it’s good to add a recognizable name and icon. And if it’s ultimately headed for the watch, it’s good to put it on a black background:

And now if you go to this URL in a web browser, you’ll find a public version of the app there. Inside the Wolfram Cloud app on an iPhone, the app appears inside the WatchApps folder:

And now, if you touch the app icon, you’ll run the Wolfram Language code in the Wolfram Cloud, and back will come a random number, displayed on the phone:

If you want to run the app again, and get a fresh random number, just pull down from the top of the phone.

To get the app onto the watch, go back to the listing of apps, then touch the watch icon at the top and select the app. This will get the app listed on the watch that’s paired with your phone:

Now just touch the entry for the RandomNumber app and it’ll go to the Wolfram Cloud, run the Wolfram Language code, and display a random number on the watch:

It’s simple to make all sorts of “randomness apps” with the Wolfram Language. Here’s the core of a Coin Flip app:
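It could be as simple as this (images of the two coin faces would make it prettier):

```wolfram
RandomChoice[{"heads", "tails"}]  (* an unbiased random flip *)
```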

And this is all it takes to deploy the app, to the web, mobile and watch:

One might argue that it’s overkill to use our sophisticated technology stack to do this. After all, it’s easy enough to flip a physical coin. But that assumes you have one of those around (which I, for one, don’t any more). Plus, the Coin Flip app will make better randomness.

What about playing Rock, Paper, Scissors with your watch? The core code for that is again trivial:
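For instance (the original presumably adds images for the three gestures):

```wolfram
RandomChoice[{"rock", "paper", "scissors"}]
```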

There’s a huge amount of knowledge built in to the Wolfram Language—including, in one tiny corner, the knowledge to trivially create a Random Pokemon app:

Here it is running on the watch:

Let’s try some slightly more complex Wolfram Language code. Here’s a Word Inventor that makes a “word” by alternating random vowels and consonants (and often the result sounds a lot like a Pokemon, or a tech startup):
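A sketch of how such a generator might look:

```wolfram
(* Alternate random consonants and vowels to invent a pronounceable "word". *)
vowels = Characters["aeiou"];
consonants = Complement[Alphabet[], vowels];
StringJoin[Riffle[RandomChoice[consonants, 4], RandomChoice[vowels, 3]]]
```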

If nothing else, one thing people presumably want to use a watch for is to tell time. And since we’re in the modern internet world, it has to be more fun if there’s a cat or two involved. So here’s the Wolfram Language code for a Kitty Clock:

Which on the watch becomes:

One can get pretty geeky with clocks. Remembering our recent very popular My Pi Day website, here’s some slightly more complicated code to make a Pi Clock where the digits of the current time are displayed in the context where they first occur in pi:

Or adding a little more:

So long as you enable it, the Apple Watch uses GPS, etc. on its paired phone to know where you are. That makes it extremely easy to have a Lat-Long app that shows your current latitude and longitude on the watch (this one is for our company HQ):
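The core of it is a single line; Here evaluates to the device’s current GeoPosition:

```wolfram
LatitudeLongitude[Here]  (* {latitude, longitude} of the current position *)
```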

I’m not quite sure why it’s useful (prove location over Skype?), but here’s a Here & Now QR app that shows your current location and time in a QR code:
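One way to write the core, encoding location and time as text in a QR code:

```wolfram
BarcodeImage[ToString[{LatitudeLongitude[Here], Now}], "QR"]
```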

Among the many things the Wolfram Language knows a lot about is geography. So here’s the code to find the ten volcanoes closest to you:
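Presumably something like:

```wolfram
GeoNearest["Volcano", Here, 10]  (* the ten volcano entities closest to the current location *)
```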

A little more code shows them on a map, and constructs a Nearest Volcanoes app:
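A plausible sketch of that step:

```wolfram
(* Plot the nearest volcanoes on a map, labeling each with its name. *)
GeoListPlot[GeoNearest["Volcano", Here, 10], GeoLabels -> True]
```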

Here’s the code for a 3D Topography app, that shows the (scaled) 3D topography for 10 miles around your location:
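A sketch of the core computation:

```wolfram
(* Elevation data on a 10-mile disk around the current location, rendered in 3D. *)
ListPlot3D[GeoElevationData[GeoDisk[Here, Quantity[10, "Miles"]]],
 MeshFunctions -> {#3 &}]
```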

Since the watch communicates with the Wolfram Cloud, it can make use of all the real-time data that’s flowing into the Wolfram Knowledgebase. That data includes things like the current (*x*,*y*,*z*,*t*) position of the International Space Station:

Given the position, a little bit of Wolfram Language graphics programming gives us an ISS Locator app:

As another example of real-time data, here’s the code for an Apple Quanting app that does some quant-oriented computations on Apple stock:

And here’s the code for a Market Word Cloud app that shows a stock-symbols word cloud weighted by fractional price changes in the past day (Apple up, Google down today):

Here’s the complete code for a geo-detecting Currency Converter app:

It’s easy to make so many apps with the Wolfram Language. Here’s the core code for a Sunrise/Sunset app:
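The core really is tiny:

```wolfram
{Sunrise[], Sunset[]}  (* DateObjects for today's sunrise and sunset at the current location *)
```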

Setting up a convenient display for the watch takes a little more code:

The Wolfram Language includes real-time weather feeds:

Which we can also display iconically:

Here’s the data for the last week of air temperatures:
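A sketch of retrieving it:

```wolfram
(* A time series of air temperatures at the current location over the past week. *)
AirTemperatureData[Here, {Now - Quantity[1, "Weeks"], Now}]
```

Passing the result to DateListPlot gives the kind of plot the app displays.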

And with a little code, we can format this to make a Temperature History app:

Sometimes the easiest way to get a result in the Wolfram Language is just to call Wolfram|Alpha. Here’s what Wolfram|Alpha shows on the web if you ask about the time to sunburn (it detects your current location):

Now here’s a real-time Sunburn Time app created by calling Wolfram|Alpha through the Wolfram Language (the different rows are for different skin tones):

The Wolfram Language has access not only to all its own curated data feeds, but also to private data feeds, especially ones in the Wolfram Data Drop.

As a personal analytics enthusiast, I maintain a databin in the Wolfram Data Drop that tells me my current backlog of unprocessed and unread email messages. I have a scheduled task that runs in the cloud and generates a report of my backlog history. And given this, it’s easy to have an SW Email Backlog app that imports this report on demand, and displays it on a watch:

And, yes, the recent increase in unprocessed and unread email messages is at least in part a consequence of work on this blog.

There are now lots of Wolfram Data Drop databins around, and of course you can make your own. And from any databin you can immediately make a watch app that shows a dashboard for it. Like here’s a Company Fridge app based on a little temperature sensor sitting in a break-room refrigerator at our company HQ (the cycling is from the compressor; the spike is from someone opening the fridge):

Databins often get data from just a single source or single device. But one can also have a databin that gets data from an app running on lots of different devices.

As a simple example, let’s make an app that just shows where in the world that app is being accessed from. Here’s the complete code to deploy such a “Data Droplets” app:
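A hypothetical sketch of how such an app could work (the databin ID "xxxx" is a placeholder, not the one used in the original):

```wolfram
(* Each run records the device's location in a shared databin, then maps
   the last 20 recorded locations. "xxxx" is a placeholder databin ID. *)
CloudDeploy[
 Delayed[
  DatabinAdd[Databin["xxxx"], Here];
  GeoListPlot[Normal[Databin["xxxx", -20]], GeoRange -> "World"],
  "PNG"],
 Permissions -> "Public"]
```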

The app does two things. First, whenever it’s run, it adds the geo location of the device that’s running it to a central databin in the Wolfram Data Drop. And second, it displays a world map that marks the last 20 places in the world where the app has been used:

A typical reason to run an app on the watch is to be able to see results right on your wrist. But another reason is to use the app to make things happen externally, say through APIs.

As one very simple example, here’s the complete code to deploy an app that mails the app’s owner a map of a 1-mile region around wherever they are when they access the app:

So far, all the apps we’ve talked about are built from fixed pieces of Wolfram Language code that get deployed once to the Apple Watch. But the Wolfram Language is symbolic, so it’s easy for it to manipulate the code of an app, just like it manipulates any other data. And that means that it’s straightforward to use the Wolfram Language to build and deploy custom apps on the fly.

Here’s a simple example. Say we want to have an app on the watch that gives a countdown of days to one’s next birthday. It’d be very inconvenient to have to enter the date of one’s birthday directly on the watch. But instead we can have an app on the phone where one enters one’s birthday, and then this app can in real time build a custom watch app that gives the countdown for that specific birthday.

Here we enter a birthday in a standard Wolfram Language “smart field” that accepts any date format:

And as soon as we touch Submit, this app runs Wolfram Language code in the Wolfram Cloud that generates a new custom app for whatever birthday we entered, then deploys that generated app so it shows up on our watch:

Here’s the complete code that’s needed to make the Birthday Countdown app-generating app:
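That code isn’t shown here, but its shape is roughly a FormFunction whose action deploys a freshly generated app; a hypothetical sketch:

```wolfram
(* Hypothetical sketch: a form takes a birthday, then deploys a new app
   that counts the days to its next occurrence. *)
daysToNext[bday_DateObject] := Module[{next},
  next = DateObject[Prepend[DateValue[bday, {"Month", "Day"}],
     DateValue[Today, "Year"]]];
  If[DayCount[Today, next] < 0, next = DatePlus[next, {1, "Year"}]];
  DayCount[Today, next]];
CloudDeploy[
 FormFunction[{"birthday" -> "Date"},
  CloudDeploy[Delayed[Style[daysToNext[#birthday], Bold, 80], "PNG"],
    Permissions -> "Public"] &],
 Permissions -> "Public"]
```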

And here is the result from the generated countdown app for my birthday:

We can make all sorts of apps like this. Here’s a World Clocks example where you fill out a list of any number of places, and create an app that displays an array of clocks for all those places:

You can also use app generation to put *you* into an app. Here’s the code to deploy a “You Clock” app-generating app that lets you take a picture of yourself with your phone, then creates an app that uses that picture as the hands of a clock:

And actually, you can easily go even more meta, and have apps that generate apps that generate apps: apps all the way down!

When I set out to use the Wolfram Language to make apps for the Apple Watch I wasn’t sure how it would go. Would the deployment pipeline to the watch work smoothly enough? Would there be compelling watch apps that are easy to build in the Wolfram Language?

I’m happy to say that everything has gone much better than I expected. The watch is very new, so there were a few initial deployment issues, which are rapidly getting worked out. But it became clear that there are lots and lots of good watch apps that can be made even with tiny amounts of Wolfram Language code (tweet-a-watch-app?). And to me it’s very impressive that in less than one full day’s work I was able to develop and deploy 25 complete apps.

Of course, what ultimately made this possible is the whole Wolfram Language technology stack that I’ve been building for nearly 30 years. But it’s very satisfying to see all the automation we’ve built work so nicely, and make it so easy to turn ideas into yet another new kind of thing: watch apps.

It’s always fun to program in the Wolfram Language, and it’s neat to see one’s code deployed on something like a watch. But what’s ultimately more important is that it’s going to be very useful to lots of people for lots of purposes. The code here is a good way to get started learning what to do. But there are many directions to go, and many important—or simply fun—apps to create. And the remarkable thing is that the Wolfram Language makes it so easy to create watch apps that they can become a routine part of everyday workflow: just another place where functionality can be deployed.

*To comment, please visit the copy of this post at the Wolfram Blog »*