An octillion. A billion billion billion. That’s a fairly conservative estimate of the number of times a cellphone or other device somewhere in the world has generated a bit using a maximum-length linear-feedback shift register sequence. It’s probably the single most-used mathematical algorithm idea in history. And the main originator of this idea was Solomon Golomb, who died on May 1—and whom I knew for 35 years.

Solomon Golomb’s classic book *Shift Register Sequences*, published in 1967—based on his work in the 1950s—went out of print long ago. But its content lives on in pretty much every modern communications system. Read the specifications for 3G, LTE, Wi-Fi, Bluetooth, or for that matter GPS, and you’ll find mentions of polynomials that determine the shift register sequences these systems use to encode the data they send. Solomon Golomb is the person who figured out how to construct all these polynomials.

He also was in charge when radar was first used to find the distance to Venus, and of working out how to encode images to be sent from Mars. He introduced the world to what he called polyominoes, which later inspired Tetris (“tetromino tennis”). He created and solved countless math and wordplay puzzles. And—as I learned about 20 years ago—he came very close to discovering my all-time-favorite rule 30 cellular automaton all the way back in 1959, the year I was born.

This essay is in *Idea Makers: Personal Perspectives on the Lives & Ideas of Some Notable People*

Most of the scientists and mathematicians I know I met first through professional connections. But not Sol Golomb. It was 1981, and I was at Caltech, a 21-year-old physicist who’d just received some media attention from being the youngest in the first batch of MacArthur award recipients. I get a knock at my office door—and a young woman is there. Already this was unusual, because in those days there were hopelessly few women to be found around a theoretical high-energy physics group. I was a sheltered Englishman who’d been in California a couple of years, but hadn’t really ventured outside the university—and was ill prepared for the burst of Southern Californian energy that dropped in to see me that day. She introduced herself as Astrid, and said that she’d been visiting Oxford and knew someone I’d been at kindergarten with. She explained that she had a personal mandate to collect interesting acquaintances around the Pasadena area. I think she considered me a difficult case, but persisted nevertheless. And one day when I tried to explain something about the work I was doing she said, “You should meet my father. He’s a bit old, but he’s still as sharp as a tack.” And so it was that Astrid Golomb, oldest daughter of Sol Golomb, introduced me to Sol Golomb.

The Golombs lived in a house perched in the hills near Pasadena. I learned that they had two daughters—Astrid, a little older than me, an aspiring Hollywood person, and Beatrice, about my age, a high-powered science type. The Golomb sisters often had parties, usually at their family’s house. There were themes, like the flamingoes & hedgehogs croquet garden party (“recognition will be given to the person who appears most appropriately attired”), or the Stonehenge party with instructions written using runes. The parties had an interesting cross-section of young and not-so-young people, including various local luminaries. And always there, hanging back a little, was Sol Golomb, a small man with a large beard and a certain elf-like quality to him, typically wearing a dark suit coat.

I gradually learned a little about Sol Golomb. That he was involved in “information theory”. That he worked at USC (the University of Southern California). That he had various unspecified but apparently high-level government and other connections. I’d heard of shift registers, but didn’t really know anything much about them.

Then in the fall of 1982, I visited Bell Labs in New Jersey and gave a talk about my latest results on cellular automata. One topic I discussed was what I called “additive” or “linear” cellular automata—and their behavior with limited numbers of cells. Whenever a cellular automaton has a limited number of cells, it’s inevitable that its behavior will eventually repeat. But as the size increases, the maximum repetition period—say for the rule 90 additive cellular automaton—bounces around seemingly quite randomly: 1, 1, 3, 2, 7, 1, 7, 6, 31, 4, 63, …. A few days before my talk, however, I’d noticed that these periods actually seemed to follow a formula that depended on things like the prime factorization of the number of cells. But when I mentioned this during the talk, someone at the back put up their hand and asked, “Do you know if it works for the case *n*=37?” My experiments hadn’t gotten as far as the size-37 case yet, so I didn’t know. But why would someone ask that?

The person who asked turned out to be a certain Andrew Odlyzko, a number theorist at Bell Labs. I asked him, “What on earth makes you think there might be something special about *n*=37?” “Well,” he said, “I think what you’re doing is related to the theory of linear-feedback shift registers,” and he suggested that I look at Sol Golomb’s book (“Oh yes,” I said, “I know his daughters…”). Andrew was indeed correct: there is a very elegant theory of additive cellular automata based on polynomials that is similar to the theory Sol developed for linear-feedback shift registers. Andrew and I ended up writing a now-rather-well-cited paper about it (it’s interesting because it’s a rare case where traditional mathematical methods let one say things about nontrivial cellular automaton behavior). And for me, a side effect was that I learned something about what the somewhat mysterious Sol Golomb actually did. (Remember, this was before the web, so one couldn’t just instantly look everything up.)
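The rule 90 periods quoted earlier are easy to reproduce by brute force today. What follows is a minimal Python sketch, not the polynomial method of the paper. It assumes the cells are arranged in a circle, that the quoted list starts at size 3, and that the longest cycle is the one reached from a single nonzero cell (which follows from additivity, since every state is a sum of shifted copies of that one). The function names are illustrative.

```python
def rule90_step(cells):
    """One step of rule 90 on a circle: each cell becomes the XOR of its two neighbors."""
    n = len(cells)
    return tuple(cells[(i - 1) % n] ^ cells[(i + 1) % n] for i in range(n))


def rule90_period(n):
    """Length of the cycle eventually reached from a single 1 among n cells."""
    state = (1,) + (0,) * (n - 1)
    first_seen = {}
    step = 0
    while state not in first_seen:
        first_seen[state] = step
        state = rule90_step(state)
        step += 1
    # The state just produced was already seen; the gap is the cycle length.
    return step - first_seen[state]


periods = [rule90_period(n) for n in range(3, 14)]
print(periods)
```

The same function can be pointed at size 37, where the search takes noticeably longer than for the small sizes above.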

Solomon Golomb was born in Baltimore, Maryland in 1932. His family came from Lithuania. His grandfather had been a rabbi; his father moved to the US when he was young, and got a master’s degree in math before switching to medieval Jewish philosophy and also becoming a rabbi. Sol’s mother came from a prominent Russian family that had made boots for the Tsar’s army and then ran a bank. Sol did well in school, notably being a force in the local debating scene. Encouraged by his father, he developed an interest in mathematics, publishing a problem he invented about primes when he was 17. After high school, Sol enrolled at Johns Hopkins University to study math, narrowly avoiding a quota on Jewish students by promising he wouldn’t switch to medicine—and took twice the usual course load, graduating in 1951 after half the usual time.

From there he would go to Harvard for graduate school in math. But first he took a summer job at the Glenn L. Martin Company, an aerospace firm founded in 1912 that had moved to Baltimore from Los Angeles in the 1920s and mostly become a defense contractor—and that would eventually merge into Lockheed Martin. At Harvard, Sol specialized in number theory, and in particular in questions about characterizations of sets of prime numbers. But every summer he would return to the Martin Company. As he later described it, he found that at Harvard “the question of whether anything that was taught or studied in the mathematics department had any practical applications could not even be asked, let alone discussed”. But at the Martin Company, he discovered that the pure mathematics he knew—even about primes and things—did indeed have practical applications, and very interesting ones, especially to shift registers.

The first summer he was at the Martin Company, Sol was assigned to a control theory group. But by his second summer, he’d been put in a group studying communications. And in June 1954 it so happened that his supervisor had just gone to a conference where he’d heard about strange behavior observed in linear-feedback shift registers (he called them “tapped delay lines with feedback”)—and he asked Sol if he could investigate. It didn’t take Sol long to realize that what was going on could be very elegantly studied using the pure mathematics he knew about polynomials over finite fields. Over the year that followed, he split his time between graduate school at Harvard and consulting for the Martin Company, and in June 1955 he wrote his final report, “Sequences with Randomness Properties”—which would basically become the foundational document of the theory of shift register sequences.

Sol liked math puzzles, and in the process of thinking about a puzzle involving arranging dominoes on a checkerboard, he ended up inventing what he called “polyominoes”. He gave a talk about them in November 1953 at the Harvard Mathematics Club, published a paper about them (his first research publication), won a Harvard math prize for his work on them, and, as he later said, then “found [himself] irrevocably committed to their care and feeding” for the rest of his life.

In June 1955, Sol went to spend a year at the University of Oslo on a Fulbright Fellowship—partly so he could work with some distinguished number theorists there, and partly so he could add Norwegian, Swedish and Danish (and some runic scripts) to his collection of language skills. While he was there, he finished a long paper on prime numbers, but also spent time traveling around Scandinavia, and in Denmark met a young woman named Bo (Bodil Rygaard)—who came from a large family in a rural area mostly known for its peat moss, but had managed to get into university and was studying philosophy. Sol and Bo apparently hit it off, and within months, they were married.

When they returned to the US in July 1956, Sol interviewed in a few places, then accepted a job at JPL—the Jet Propulsion Lab that had spun off from Caltech, initially to do military work. Sol was assigned to the Communications Research Group, as a Senior Research Engineer. It was a time when the people at JPL were eager to try launching a satellite. At first, the government wouldn’t let them do it, fearing it would be viewed as a military act. But that all changed in October 1957 when the Soviet Union launched Sputnik, ostensibly as part of the International Geophysical Year. Amazingly, it took only 3 months for the US to launch Explorer 1. JPL built much of it, and Sol’s lab (where he had technicians building electronic implementations of shift registers) was diverted into doing things like making radiation detectors (including, as it happens, the ones that discovered the Van Allen radiation belts)—while Sol himself worked on using radar to determine the orbit of the satellite when it was launched, taking a little time out to go back to Harvard for his final PhD exam.

It was a time of great energy around JPL and the space program. In May 1958 a new Information Processing Group was formed, and Sol was put in charge—and in the same month, Sol’s first child, the aforementioned Astrid, was born. Sol continued his research on shift register sequences—particularly as applied to jamming-resistant radio control of missiles. In May 1959, Sol’s second child arrived—and was named Beatrice, forming a nice A, B sequence. In the fall of 1959, Sol took a sabbatical at MIT, where he got to know Claude Shannon and a number of other MIT luminaries, and got involved in information theory and the theory of algebraic codes.

As it happens, he’d already done some work on coding theory—in the area of biology. The digital nature of DNA had been discovered by Jim Watson and Francis Crick in 1953, but it wasn’t yet clear just how sequences of the four possible base pairs encoded the 20 amino acids. In 1956, Max Delbrück—Jim Watson’s former postdoc advisor at Caltech—asked around at JPL if anyone could figure it out. Sol and two colleagues analyzed an idea of Francis Crick’s and came up with “comma-free codes” in which overlapping triples of base pairs could encode amino acids. The analysis showed that exactly 20 amino acids could be encoded this way. It seemed like an amazing explanation of what was seen—but unfortunately it isn’t how biology actually works (biology uses a more straightforward encoding, where some of the 64 possible triples just don’t represent anything).

In addition to biology, Sol was also pulled into physics. His shift register sequences were useful for doing range finding with radar (much as they’re used now in GPS), and at Sol’s suggestion, he was put in charge of trying to use them to find the distance to Venus. And so it was that in early 1961—when the Sun, Venus, and Earth were in alignment—Sol’s team used the 85-foot Goldstone radio dish in the Mojave Desert to bounce a radar signal off Venus, and dramatically improve our knowledge of the Earth-Venus and Earth-Sun distances.

With his interest in languages, coding and space, it was inevitable that Sol would get involved in the question of communications with extraterrestrials. In 1961 he wrote a paper for the Air Force entitled “A Short Primer for Extraterrestrial Linguistics”, and over the next several years wrote several papers on the subject for broader audiences. He said that “There are two questions involved in communication with Extraterrestrials. One is the mechanical issue of discovering a mutually acceptable channel. The other is the more philosophical problem (semantic, ethic, and metaphysical) of the proper subject matter for discourse. In simpler terms, we first require a common language, and then we must think of something clever to say.” He continued, with a touch of his characteristic humor: “Naturally, we must not risk telling too much until we know whether the Extraterrestrials’ intentions toward us are honorable. The Government will undoubtedly set up a Cosmic Intelligence Agency (CIA) to monitor Extraterrestrial Intelligence. Extreme security precautions will be strictly observed. As H. G. Wells once pointed out [or was it an episode of *The Twilight Zone*?], even if the Aliens tell us in all truthfulness that their only intention is ‘to serve mankind,’ we must endeavor to ascertain whether they wish to serve us baked or fried.”

While at JPL, Sol had also been teaching some classes at the nearby universities: Caltech, USC and UCLA. In the fall of 1962, following some changes at JPL—and perhaps because he wanted to spend more time with his young children—he decided to become a full-time professor. He got offers from all three schools. He wanted to go somewhere where he could “make a difference”. He was told that at Caltech “no one has any influence if they don’t at least have a Nobel Prize”, while at UCLA “the UC bureaucracy is such that no one ever has any ability to affect anything”. The result was that—despite its much-inferior reputation at the time—Sol chose USC. He went there in the spring of 1963 as a Professor of Electrical Engineering—and ended up staying for 53 years.

Before going on with the story of Sol’s life, I should explain what a linear-feedback shift register (LFSR) actually is. The basic idea is simple. Imagine a row of squares, each containing either 1 or 0 (say, black or white). In a pure shift register all that happens is that at each step all values shift one position to the left. The leftmost value is lost, and a new value is “shifted in” from the right. The idea of a feedback shift register is that the value that’s shifted in is determined (or “fed back”) from values at other positions in the shift register. In a linear-feedback shift register, the values from “taps” at particular positions in the register are combined by being added mod 2 (so that 1⊕1=0 instead of 2), or equivalently XOR’ed (“exclusive or”, true if either is true, but not both).

If one runs this for a while, here’s what happens:

Obviously the shift register is always shifting bits to the left. And it has a very simple rule for how bits should be added at the right. But if one looks at the sequence of these bits, it seems rather random—though, as the picture shows, it does eventually repeat. What Sol Golomb did was to find an elegant mathematical way to analyze such sequences, and how they repeat.
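To make the rule concrete, here is a minimal Python sketch of a size-7 register with taps at positions 7 and 6. The numbering convention is an assumption: positions are counted from the right-hand end, so position 7 is the leftmost cell, and the XOR of the tapped cells is what gets shifted in on the right. `lfsr_step` and `run_lfsr` are illustrative names.

```python
def lfsr_step(state, taps):
    """Shift every bit one place left and feed back the XOR of the tapped cells.

    `state` is a list of 0s and 1s, leftmost cell first.
    `taps` are 1-based positions counted from the right-hand end (assumed convention).
    """
    size = len(state)
    new_bit = 0
    for position in taps:
        new_bit ^= state[size - position]
    # The leftmost value is lost; the new value enters on the right.
    return state[1:] + [new_bit]


def run_lfsr(state, taps, steps):
    """Return the starting state followed by `steps` successive states."""
    history = [state]
    for _ in range(steps):
        state = lfsr_step(state, taps)
        history.append(state)
    return history


# Print a few rows, one per step, using # for 1 and . for 0.
for row in run_lfsr([0, 0, 0, 0, 0, 0, 1], (7, 6), 20):
    print("".join("#" if bit else "." for bit in row))
```

Each printed row is the previous one moved one place to the left, with a single new bit on the right; the sequence of those new bits is the shift register sequence.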

If a shift register has size *n*, then it has 2^{n} possible states altogether (corresponding to all possible sequences of 0s and 1s of length *n*). Since the rules for the shift register are deterministic, any given state must always go to the same next state. And that means the maximum possible number of steps the shift register could conceivably go through before it repeats is 2^{n} (actually, it’s 2^{n}–1, because the state with all 0s can’t evolve into anything else).

In the example above, the shift register is of size 7, and it turns out to repeat after exactly 2^{7}–1=127 steps. But which shift registers—with which particular arrangements of taps—will produce sequences with maximal lengths? This is the first question Sol Golomb set out to investigate in the summer of 1954. His answer was simple and elegant.
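One can check the repetition length directly by counting steps until the starting state comes back. The sketch below is an illustration under assumed conventions (tap positions counted from the right-hand end, and a starting state with a single 1 on the right); it requires the leftmost cell to be tapped, since that is what guarantees every state has a unique predecessor and the loop terminates.

```python
def lfsr_period(size, taps):
    """Number of steps before a left-shifting LFSR returns to its starting state."""
    if size not in taps:
        raise ValueError("the leftmost cell must be tapped for the register to cycle")
    start = [0] * (size - 1) + [1]
    state = start
    steps = 0
    while True:
        new_bit = 0
        for position in taps:
            new_bit ^= state[size - position]
        state = state[1:] + [new_bit]
        steps += 1
        if state == start:
            return steps


print(lfsr_period(7, (7, 6)))  # 127, the maximum for size 7

# Survey every two-tap arrangement that includes position 7.
for other in range(1, 7):
    print((7, other), lfsr_period(7, (7, other)))
```

Only some of the two-tap arrangements reach the full 127 steps, which is exactly the question of which arrangements are maximal.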

The shift register above has taps at positions 7, 6 and 1. Sol represented this algebraically, using the polynomial *x*^{7}+*x*^{6}+1. Then what he showed was that the sequence that would be generated would be of maximal length if this polynomial is “irreducible modulo 2”, so that it can’t be factored, making it sort of the analog of a prime among polynomials—as well as having some other properties that make it a so-called “primitive polynomial”. Nowadays, with Mathematica and the Wolfram Language, it’s easy to test things like this:
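For readers without Mathematica, the primitivity test can be sketched in plain Python; this is a brute-force illustration, not the computation shown in the original essay. It assumes polynomials over GF(2) are stored as integers whose binary digits are the coefficients (so *x*^{7}+*x*^{6}+1 is `0b11000001`), and it uses the fact that a degree-*n* polynomial is primitive exactly when the powers of *x*, reduced modulo the polynomial, take 2^{n}–1 steps to return to 1.

```python
def order_of_x(poly):
    """Smallest k >= 1 with x**k == 1 modulo `poly` over GF(2), or None if there is none.

    `poly` is an integer bitmask: bit i is the coefficient of x**i (assumed encoding).
    """
    degree = poly.bit_length() - 1
    value = 1
    for k in range(1, 2 ** degree):
        value <<= 1                    # multiply by x
        if (value >> degree) & 1:      # degree reached: subtract (XOR) the modulus
            value ^= poly
        if value == 1:
            return k
    return None


def is_primitive(poly):
    """True if `poly` is a primitive polynomial modulo 2."""
    degree = poly.bit_length() - 1
    return order_of_x(poly) == 2 ** degree - 1


print(is_primitive(0b11000001))  # x^7 + x^6 + 1 -> True
print(is_primitive(0b11111))     # x^4 + x^3 + x^2 + x + 1 -> False
```

The second polynomial is irreducible but not primitive: the powers of *x* return to 1 after only 5 steps, so a shift register built on it falls short of the maximal length of 15.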

Back in 1954, Sol had to do all this by hand, but came up with a fairly long table of primitive polynomials corresponding to shift registers that give maximal length sequences:

The idea of maintaining short-term memory by having “delay lines” that circulate digital pulses (say in an actual column of mercury) goes back to the earliest days of electronic computers. By the late 1940s such delay lines were routinely being implemented purely digitally, using sequences of vacuum tubes, and were being called “shift registers”. It’s not clear when the first feedback shift registers were built. Perhaps it was at the end of the 1940s. But it’s still shrouded in mystery—because the first place they seem to have been used was in military cryptography.

The basic idea of cryptography is to take meaningful messages, and then randomize them so they can’t be recognized, but in such a way that the randomization can always be reversed if you know the key that was used to create it. So-called stream ciphers work by generating long sequences of seemingly random bits, then combining these with some representation of the message—then decoding by having the receiver independently generate the same sequence of seemingly random bits, and “backing this out” of the encoded message received.

Linear-feedback shift registers seem at first to have been prized for cryptography because of their long repetition periods. As it turns out, the mathematical analysis Sol used to find things like these periods also makes clear that such shift registers aren’t good for secure cryptography. But in the early days, they seemed pretty good—particularly compared to, say, successive rotor positions in an Enigma machine—and there’s been a persistent rumor that, for example, Soviet military cryptosystems were long based on them.

Back in 2001, when I was working on history notes for my book *A New Kind of Science*, I had a long phone conversation with Sol about shift registers. Sol told me that when he started out, he didn’t know anything about cryptographic work on shift registers. He said that people at Bell Labs, Lincoln Labs and JPL had also started working on shift registers around the same time he did—though perhaps through knowing more pure mathematics, he managed to get further than they did, and in the end his 1955 report basically defined the field.

Over the years that followed, Sol gradually heard about various precursors of his work in the pure mathematical literature. Way back in the year 1202 Fibonacci was already talking about what are now called Fibonacci numbers—and which are generated by a recurrence relation that can be thought of as an analog of a linear-feedback shift register, but working with arbitrary integers rather than 0s and 1s. There was a little work on recurrences with 0s and 1s done in the early 1900s, but the first large-scale study seems to have been by Øystein Ore, who, curiously, came from the University of Oslo, though was by then at Yale. Ore had a student named Marshall Hall—who Sol told me he knew had consulted for the predecessor of the National Security Agency in the late 1940s—possibly about shift registers. But whatever he may have done was kept secret, and so it fell to Sol to discover and publish the story of linear-feedback shift registers—even though Sol did dedicate his 1967 book on shift registers to Marshall Hall.

Over the years I’ve noticed the principle that systems defined by sufficiently simple rules always eventually end up having lots of applications. Shift registers follow this principle in spades. And for example modern hardware (and software) systems are bristling with shift registers: a typical cellphone probably has a dozen or two, implemented usually in hardware but sometimes in software. (When I say “shift register” here, I mean linear-feedback shift register, or LFSR.)

Most of the time, the shift registers that are used are ones that give maximum-length sequences (otherwise known as “m-sequences”). And the reasons they’re used are typically related to some very special properties that Sol discovered about them. One basic property they always have is that they contain the same total number of 0s and 1s (actually, there’s always exactly one extra 1). Sol then showed that they also have the same number of 00s, 01s, 10s and 11s—and the same holds for larger blocks too. This “balance” property is on its own already very useful, for example if one’s trying to efficiently test all possible bit patterns as input to a circuit.
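The balance property is easy to confirm numerically for a small case. This sketch counts single bits and adjacent pairs over one full period of a size-7 register, treating the sequence as a circle; the tap convention (positions counted from the right-hand end) and the function name are assumptions made for illustration.

```python
from collections import Counter


def m_sequence(size=7, taps=(7, 6)):
    """One full period (2**size - 1 bits) of a left-shifting LFSR's output."""
    state = [0] * (size - 1) + [1]
    bits = []
    for _ in range(2 ** size - 1):
        bits.append(state[0])
        new_bit = 0
        for position in taps:
            new_bit ^= state[size - position]
        state = state[1:] + [new_bit]
    return bits


bits = m_sequence()
n = len(bits)

singles = Counter(bits)
pairs = Counter((bits[i], bits[(i + 1) % n]) for i in range(n))  # wrap around

print(singles[1], singles[0])  # 64 63
print(pairs[(0, 0)], pairs[(0, 1)], pairs[(1, 0)], pairs[(1, 1)])  # 31 32 32 32
```

The counts differ by at most one, and the block that comes up short is always the all-0s one, mirroring the fact that the all-0s state never occurs.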

But Sol discovered another, even more important property. Replace each 0 in a sequence by –1, then imagine multiplying each element in a shifted version of the sequence by the corresponding element in the original. What Sol showed is that if one adds up these products over a full period, they’ll always sum to –1 (essentially zero compared with the length of the sequence), except when there’s no shift at all. Said more technically, he showed that the sequence has essentially no correlation with shifted versions of itself.
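The correlation property can be checked the same way. The sketch below, again using an assumed size-7 register with taps counted from the right-hand end, maps 0 to -1, multiplies the sequence by each cyclically shifted copy of itself, and sums over one period. The off-peak sums come out as exactly -1 rather than exactly 0, the same one-off as the extra 1 in the balance property.

```python
def m_sequence(size=7, taps=(7, 6)):
    """One full period (2**size - 1 bits) of a left-shifting LFSR's output."""
    state = [0] * (size - 1) + [1]
    bits = []
    for _ in range(2 ** size - 1):
        bits.append(state[0])
        new_bit = 0
        for position in taps:
            new_bit ^= state[size - position]
        state = state[1:] + [new_bit]
    return bits


def autocorrelation(bits, shift):
    """Sum of products of the +1/-1 sequence with a cyclically shifted copy."""
    signs = [1 if bit else -1 for bit in bits]
    n = len(signs)
    return sum(signs[i] * signs[(i + shift) % n] for i in range(n))


bits = m_sequence()
values = [autocorrelation(bits, shift) for shift in range(len(bits))]
print(values[:5])  # [127, -1, -1, -1, -1]
```

A single sharp peak at zero shift against a flat floor everywhere else is what makes these sequences so useful for timing and for separating signals.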

Both this and the balance property will be approximately true for any sufficiently long random sequence of 0s and 1s. But the surprising thing about maximum-length shift register sequences is that these properties are always exactly true. The sequences in a sense have some of the signatures of randomness—but in a very perfect way, made possible by the fact that they’re not random at all, but instead have a very definite, organized structure.

It’s this structure that makes linear-feedback shift registers ultimately not suitable for strong cryptography. But they’re great for basic “scrambling” and “cheap cryptography”—and they’re used all over the place for these purposes. A very common objective is just to “whiten” (as in “white noise”) a signal. It’s pretty common to want to transmit data that’s got long sequences of 0s in it. But the electronics that pick these up can get confused if they see what amounts to “silence” for too long. One can avoid the problem by scrambling the original data by combining it with a shift register sequence, so there’s always some kind of “chattering” going on. And that’s indeed what’s done in Wi-Fi, Bluetooth, USB, digital TV, Ethernet and lots of other places.
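A toy version of such a scrambler takes only a few lines. The sketch below XORs the data with the output of an assumed size-7 register; the tap choice, starting state and function names are illustrative and do not reproduce the scrambler of any particular standard. Because XOR undoes itself, applying the same operation a second time restores the data.

```python
def keystream(count, size=7, taps=(7, 6)):
    """First `count` output bits of a left-shifting LFSR (assumed tap convention)."""
    state = [0] * (size - 1) + [1]
    bits = []
    for _ in range(count):
        bits.append(state[0])
        new_bit = 0
        for position in taps:
            new_bit ^= state[size - position]
        state = state[1:] + [new_bit]
    return bits


def scramble(data):
    """XOR the data with the shift register sequence; run it again to descramble."""
    return [d ^ k for d, k in zip(data, keystream(len(data)))]


def longest_zero_run(bits):
    """Length of the longest stretch of consecutive 0s."""
    longest = current = 0
    for bit in bits:
        current = current + 1 if bit == 0 else 0
        longest = max(longest, current)
    return longest


silence = [0] * 100
scrambled = scramble(silence)
print(longest_zero_run(silence), longest_zero_run(scrambled))
print(scramble(scrambled) == silence)  # True
```

A hundred bits of "silence" go out as a sequence whose runs of 0s are never longer than a handful of bits, so the receiving electronics always have transitions to lock onto.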

It’s often a nice side effect that the shift register scrambling makes the signal harder to decode—and this is sometimes used to provide at least some level of security. (DVDs use a combination of a size-16 and a size-24 shift register to attempt to encode their data; many GSM phones use a combination of three shift registers to encode all their signals, in a way that was at first secret.)

GPS makes crucial use of shift register sequences too. Each GPS satellite continuously transmits a shift register sequence (from a size-10 shift register, as it happens). A receiver can tell at exactly what time a signal it’s just received was transmitted from a particular satellite by seeing what part of the sequence it got. And by comparing delay times from different satellites, the receiver can triangulate its position. (There’s also a precision mode of GPS that uses a size-1024 shift register.)
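The way a receiver "sees what part of the sequence it got" is by correlation: it slides its own copy of the sequence against what it received and looks for the peak. The sketch below illustrates this with a generic size-10 register; the tap choice is an assumption and this is not the actual GPS signal definition. The received copy is delayed by a known amount and deliberately corrupted (every tenth chip flipped) to show that the peak survives errors.

```python
def m_sequence_signs(size, taps):
    """One period of a left-shifting LFSR's output, with 0 mapped to -1 and 1 to +1."""
    state = [0] * (size - 1) + [1]
    signs = []
    for _ in range(2 ** size - 1):
        signs.append(1 if state[0] else -1)
        new_bit = 0
        for position in taps:
            new_bit ^= state[size - position]
        state = state[1:] + [new_bit]
    return signs


def delayed_noisy_copy(code, delay):
    """Cyclically delay the code by `delay` chips, then flip every tenth chip."""
    n = len(code)
    received = [code[(i - delay) % n] for i in range(n)]
    for i in range(0, n, 10):
        received[i] = -received[i]
    return received


def estimate_delay(received, code):
    """Return the delay whose shifted copy of the code correlates best."""
    n = len(code)

    def correlation(delay):
        return sum(received[i] * code[(i - delay) % n] for i in range(n))

    return max(range(n), key=correlation)


code = m_sequence_signs(10, (10, 7))   # 1023 chips
received = delayed_noisy_copy(code, 345)
print(estimate_delay(received, code))  # 345
```

Even with about a tenth of the chips wrong, the correct delay stands far above every other candidate, because all the wrong shifts correlate at roughly zero.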

A quite different use of shift registers is for error detection. Say one’s transmitting a block of bits, but each one has a small probability of error. A simple way to let one check for a single error is to include a “parity bit” that says whether there should be an odd or even number of 1s in the block of bits. There are generalizations of this called CRCs (cyclic redundancy checks) that can check for a larger number of errors—and that are computed essentially by feeding one’s data into none other than a linear-feedback shift register. (There are also error-correcting codes that let one not only detect but also correct a certain number of errors, and some of these, too, can be computed with shift register sequences—and in fact Sol Golomb used a version of these called Reed–Solomon codes to design the video encoding for Mars spacecraft.)
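Here is a minimal sketch of a CRC computed the shift register way, one bit at a time. The generator polynomial *x*^{3}+*x*+1 (written as the bitmask `0b1011`) and the function names are choices made for illustration, not taken from any particular standard. With the generator *x*+1 the same procedure reduces to an ordinary parity bit.

```python
def poly_remainder(bits, poly):
    """Remainder of the bit string, read as a polynomial over GF(2), divided by `poly`.

    The register is shifted one place per input bit and XORed with `poly`
    whenever a 1 reaches the top position, as in a feedback shift register.
    """
    degree = poly.bit_length() - 1
    register = 0
    for bit in bits:
        register = (register << 1) | bit
        if (register >> degree) & 1:
            register ^= poly
    return register


def crc_bits(message, poly):
    """Check bits to append to `message` so the whole block leaves remainder 0."""
    degree = poly.bit_length() - 1
    remainder = poly_remainder(message + [0] * degree, poly)
    return [(remainder >> (degree - 1 - i)) & 1 for i in range(degree)]


message = [1, 0, 1, 1, 0, 0, 1, 0]
codeword = message + crc_bits(message, 0b1011)
print(poly_remainder(codeword, 0b1011))  # 0: block accepted

damaged = codeword[:]
damaged[4] ^= 1                          # flip one bit in transit
print(poly_remainder(damaged, 0b1011) != 0)  # True: error detected
```

The receiver never needs the original message to check the block; it just runs the same register over what arrived and looks for a nonzero remainder.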

The list of uses for shift register sequences goes on and on. A fairly exotic example—more popular in the past than now—was to use shift register sequences to jitter the clock in a computer to spread out the frequency at which the CPU would potentially generate radio interference (“select Enable Spread Spectrum in the BIOS”).

One of the single most prominent uses of shift register sequences is in cellphones, for what’s called CDMA (code division multiple access). Cellphones got their name because they operate in “cells”, with all phones in a given cell being connected to a particular tower. But how do different cellphones in a cell not interfere with each other? In the first systems, each phone just negotiated with the tower to use a slightly different frequency. Later, they used different time slices (TDMA, or time division multiple access). But CDMA uses maximum-length shift register sequences to provide a clever alternative.

The idea is to have all phones essentially operate on the same frequency, but to have each phone encode its signal using (in the simplest case) a differently shifted version of a shift register sequence. And because of Sol’s mathematical results, these differently shifted versions have no correlation—so the cellphone signals don’t interfere. And this is how, for example, most 3G cellphone networks operate.
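A stripped-down version of this scheme fits in a short sketch. Below, three hypothetical phones share one channel, each multiplying its data bits (written as +1 or -1) by a differently shifted copy of the same size-7 sequence; the channel simply adds the signals, and the receiver recovers each phone's bits by correlating with that phone's shift. The register size, tap choice, offsets and function names are all assumptions for illustration; real systems use much longer sequences and additional layers of coding.

```python
def m_sequence_signs(size=7, taps=(7, 6)):
    """One period of a left-shifting LFSR's output, with 0 mapped to -1 and 1 to +1."""
    state = [0] * (size - 1) + [1]
    signs = []
    for _ in range(2 ** size - 1):
        signs.append(1 if state[0] else -1)
        new_bit = 0
        for position in taps:
            new_bit ^= state[size - position]
        state = state[1:] + [new_bit]
    return signs


def shifted(code, offset):
    """Cyclically shifted copy of the code."""
    return code[offset:] + code[:offset]


def transmit(messages, code):
    """Add up every phone's signal; `messages` maps a shift offset to a list of +1/-1 bits."""
    length = len(next(iter(messages.values())))
    signal = []
    for k in range(length):
        chips = [0] * len(code)
        for offset, bits in messages.items():
            for i, chip in enumerate(shifted(code, offset)):
                chips[i] += bits[k] * chip
        signal.extend(chips)
    return signal


def receive(signal, code, offset):
    """Recover one phone's bits by correlating each block with its shifted code."""
    user_code = shifted(code, offset)
    n = len(code)
    decoded = []
    for start in range(0, len(signal), n):
        block = signal[start:start + n]
        correlation = sum(x * c for x, c in zip(block, user_code))
        decoded.append(1 if correlation > 0 else -1)
    return decoded


code = m_sequence_signs()
messages = {0: [1, -1, 1, 1], 5: [-1, -1, 1, -1], 40: [1, 1, -1, -1]}
signal = transmit(messages, code)
print(all(receive(signal, code, offset) == bits for offset, bits in messages.items()))  # True
```

Each correlation picks up 127 times the wanted bit and only about 1 from each of the other phones, which is why the superimposed signals can be pulled apart cleanly.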

Sol created the mathematics for this, but he also brought some of the key people together. Back in 1959, he’d gotten to know a certain Irwin Jacobs, who’d recently gotten a PhD at MIT. Meanwhile, he knew Andy Viterbi, who worked at JPL. Sol introduced the two of them—and by 1968 they’d formed a company called Linkabit which did work on coding systems, mostly for the military.

Linkabit had many spinoffs and descendants, and in 1985 Jacobs and Viterbi started a new company called Qualcomm. It didn’t immediately do especially well, but by the early 1990s it began a meteoric rise when it started making the components to deploy CDMA in cellphones, and in 1999 Sol became the “Viterbi Professor of Communications” at USC.

It’s sort of amazing that—although most people have never heard of them—shift register sequences are actually used in one way or another almost whenever bits are moved around in modern communication systems, computers and elsewhere. It’s quite confusing sometimes, because there are lots of things with different names and acronyms that all turn out to be linear-feedback shift register sequences (PN, pseudonoise, M-, FSR, LFSR sequences, spread spectrum communications, MLS, SRS, PRBS, …).

If one looks at cellphones, shift register sequence usage has gone up and down over the years. 2G networks are based on TDMA, so don’t use shift register sequences to encode their data—but still often use CRCs to validate blocks of data. 3G networks are big users of CDMA—so there are shift register sequences involved in pretty much every bit that’s transmitted. 4G networks typically use a combination of time and frequency slots which don’t directly involve shift register sequences—though there are still CRCs used, for example to deal with data integrity when frequency windows overlap. 5G is designed to be more elaborate—with large arrays of antennas dynamically adapting to use optimal time and frequency slots. But half their channels are typically allocated to “pilot signals” that are used to infer the local radio environment—and work by transmitting none other than shift register sequences.

Throughout most kinds of electronics it’s common to want to use the highest data rates and the lowest powers that still get bits transmitted correctly above the “noise floor”. And typically the way one pushes to the edge is to do automatic error detection—using CRCs and therefore shift register sequences. And in fact pretty much every kind of bus (PCIe, SATA, etc.) inside a computer does this: whether it’s connecting parts of CPUs, getting data off devices, or connecting to a display with HDMI. And on disks and in memory, for example, CRCs and other shift-register-sequence-based codes are pretty much universally used to operate at the highest possible rates and densities.

Shift registers are so ubiquitous, it’s a little difficult to estimate just how many of them are in use, and how many bits are being generated by them. There are perhaps 10 billion computers, slightly fewer cellphones, and an increasing number of billions of embedded and IoT (“Internet of Things”) devices. (Even many of the billion cars in the world, for example, have at least 10 microprocessors in them.)

At what rate are the shift registers running? Here, again, things are complicated. In communications systems, for example, there’s a basic carrier frequency (usually in the GHz range), and then there’s what’s called a “chipping rate” (or, confusingly, “chip rate”) that says how fast something like CDMA is done, and this is usually in the MHz range. On the other hand, in buses inside computers, or in connections to a display, all the data is going through shift registers, at the full data rate, which is well into the GHz range.

So it seems safe to estimate that there are at least 10 billion communications links, running for at least 1/10 billion seconds (which is 3 years), that use at least 1 billion bits from a shift register every second—meaning that to date Sol’s algorithm has been used at least an octillion times.
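The arithmetic behind this estimate can be spelled out in a few lines. The three inputs are the rough figures quoted above, not measurements, and the variable names are illustrative.

```python
links = 10 ** 10              # at least 10 billion communications links
seconds = 10 ** 9 // 10       # 1/10 of a billion seconds of running time each
bits_per_second = 10 ** 9     # at least a billion shift register bits per second

total_bits = links * seconds * bits_per_second
years = seconds / (365.25 * 24 * 3600)

print(total_bits == 10 ** 27)  # True: an octillion
print(round(years, 2))         # 3.17
```

Since every input is a deliberate underestimate, the product is a floor rather than a best guess.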

Is it really the most-used mathematical algorithm idea in history? I think so. I suspect the main potential competition would be from arithmetic operations. These days processors are doing perhaps a trillion arithmetic operations per second—and such operations are needed for pretty much every bit that’s generated by a computer. But how is arithmetic done? At some level it’s just a digital electronics implementation of the way people have done arithmetic forever.

But there are some wrinkles—some “algorithmic ideas”—though they’re quite obscure, except to microprocessor designers. Just as when Babbage was making his Difference Engine, carries are a big nuisance in doing arithmetic. (One can actually think of a linear-feedback shift register as being a system that does something like arithmetic, but doesn’t do carries.) There are “carry propagation trees” that optimize carrying. There are also little tricks (“Booth encoding”, “Wallace trees”, etc.) that reduce the number of bit operations needed to do the innards of arithmetic. But unlike with LFSRs, there doesn’t seem to be one algorithmic idea that’s universally used—and so I think it’s still likely that Sol’s maximum-length LFSR sequence idea is the winner for most used.

Even though it’s not obvious at first, it turns out there’s a very close relationship between feedback shift registers and something I’ve spent many years studying: cellular automata. The basic setup for a feedback shift register involves computing one bit at a time. In a cellular automaton, one has a line of cells, and at each step all the cells are updated in parallel, based on a rule that depends, say, on the values of their nearest neighbors.

To see how these are related, think about running a feedback shift register of size *n*, but displaying its state only every *n* steps—in other words, letting all the bits be rewritten before one displays again. If one displays every step of a linear-feedback shift register (here with two taps next to each other), as in the first two panels below, nothing much happens at each step, except that things shift to the left. But if one makes a compressed picture, showing only every *n* steps, suddenly a pattern emerges.

It’s a nested pattern, and it’s very close to being the exact same pattern that one gets with a cellular automaton that takes a cell and its neighbor, and adds them mod 2 (or XORs them). Here’s what happens with that cellular automaton, if one arranges its cells so they’re in a circle of the same size as the shift register above:

At the beginning, the cellular automaton and shift register patterns are exactly the same—though when they “hit the edge” they become slightly different because the edges are handled differently. But looking at these pictures it becomes less surprising that the math of shift registers should be relevant to cellular automata. And seeing the regularity of the nested patterns makes it clearer why there might be an elegant mathematical theory of shift registers in the first place.
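The correspondence can be tried out directly. What follows is only a sketch under stated assumptions: the tap positions (the two leftmost bits), the shift direction, and the single-1 starting state are my choices for illustration, and the pictures in the essay may well use a different convention. The register is advanced *n* single-bit shifts per displayed row and compared with an additive cellular automaton on a circle of the same size.

```python
def lfsr_step(state):
    """One shift: drop the leftmost bit, append the XOR of the two leftmost bits."""
    return state[1:] + [state[0] ^ state[1]]

def lfsr_block(state):
    """Run n single-bit shifts, so every bit in the register is rewritten once."""
    for _ in range(len(state)):
        state = lfsr_step(state)
    return state

def ca_step(cells):
    """Additive cellular automaton on a circle: each cell becomes itself XOR its right neighbor."""
    n = len(cells)
    return [cells[i] ^ cells[(i + 1) % n] for i in range(n)]

def first_difference(n):
    """First row at which the compressed register picture and the automaton picture differ."""
    reg = [0] * (n - 1) + [1]      # assumed start: a single 1 at the right edge
    ca = list(reg)
    for row in range(1, 4 * n):
        reg, ca = lfsr_block(reg), ca_step(ca)
        if reg != ca:
            return row
    return None

print(first_difference(16))   # rows agree until the pattern reaches the left edge
```

With these conventions the two pictures agree row for row until the growing pattern reaches the far edge of the register, after which they differ in the way the edge is handled.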

Typical shift registers used in practice don’t tend to make such obviously regular patterns, though. Here are a few examples of shift registers that yield maximum-length sequences. When one’s doing math, like Sol did, it’s very much the same story as for the case of obvious nesting. But here the fact that the taps are far apart makes things get mixed up, leaving no obvious visual trace of nesting.
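To make "maximum-length" concrete, here is a small sketch that measures the repetition period of a linear-feedback shift register by brute force. The function name `lfsr_period`, the tap convention (offsets into the register whose bits are XORed to form the new bit), and the starting state are assumptions made for illustration; they are not taken from Sol's book or from any particular standard.

```python
def lfsr_period(n, taps):
    """Number of shifts before an n-bit LFSR returns to its starting state.

    taps are offsets into the register; the new bit is the XOR of those bits.
    Returns None if the start state does not recur within 2**n shifts.
    """
    start = [0] * (n - 1) + [1]
    state = list(start)
    for steps in range(1, 2**n + 1):
        new_bit = 0
        for t in taps:
            new_bit ^= state[t]
        state = state[1:] + [new_bit]   # shift left, append the feedback bit
        if state == start:
            return steps
    return None

print(lfsr_period(4, (0, 1)))   # 15 = 2**4 - 1, the maximum possible
print(lfsr_period(4, (0, 2)))   # 6, well short of the maximum
```

A register of size *n* has only 2^n − 1 nonzero states, so a period of 2^n − 1 means the sequence visits every one of them before repeating. Which tap choices achieve this is exactly what the polynomial theory determines.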

So how broad is the correspondence between shift registers and cellular automata? In cellular automata the rules for generating new values of cells can be anything one wants. In linear-feedback shift registers, however, they always have to be based on adding mod 2 (or XOR’ing). But that’s what the “linear” part of “linear-feedback shift register” means. And it’s also in principle possible to have nonlinear-feedback shift registers (NFSRs) that use whatever rule one wants for combining values.

And in fact, once Sol had worked out his theory for linear-feedback shift registers, he started in on the nonlinear case. When he arrived at JPL in 1956 he got an actual lab, complete with racks of little electronic modules. Sol told me each module was about the size of a cigarette pack—and was built from a Bell Labs design to perform a particular logic operation (AND, OR, NOT, …). The modules could be strung together to implement whatever nonlinear-feedback shift register one wanted, and they ran pretty fast—producing about a million bits per second. (Sol told me that someone tried doing the same thing with a general-purpose computer—and what took 1 second with the custom hardware modules took 6 weeks on the general-purpose computer.)

When Sol had looked at linear-feedback shift registers, the first big thing he’d managed to understand was their repetition periods. And with nonlinear ones he put most of his effort into trying to understand the same thing. He collected all sorts of experimental data. He told me he even tested sequences of length 2^{45}—which must have taken a year. He made summaries, like the one below (notice the visualizations of sequences, shown as oscilloscope-like traces). But he never managed to come up with any kind of general theory as he had with linear-feedback shift registers.
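The "must have taken a year" remark follows from simple arithmetic, sketched here. The only assumption is the rate of about a million bits per second quoted earlier for the hardware modules.

```python
# How long does a sequence of length 2**45 take at about a million bits per second?
bits = 2**45
bits_per_second = 1_000_000          # rate quoted for the hardware modules
seconds = bits / bits_per_second
years = seconds / (365.25 * 24 * 3600)

print(f"{years:.2f} years")          # a bit over one year
```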

It’s not surprising he couldn’t do it. Because when one looks at nonlinear-feedback shift registers, one’s effectively sampling the whole richness of the computational universe of possible simple programs. Back in the 1950s there were already theoretical results—mostly based on Turing’s ideas of universal computation—about what programs could in principle do. But I don’t think Sol or anyone else ever thought they would apply to the very simple—if nonlinear—functions in NFSRs.

And in the end it basically took until my work around 1981 for it to become clear just how complicated the behavior of even very simple programs could be. My all-time favorite example is rule 30—a cellular automaton in which the values of neighboring cells are combined using a function that can be represented as *p*+*q*+*r*+*qr* mod 2 (or *p* XOR (*q* OR *r*)). And, amazingly, Sol looked at nonlinear-feedback shift registers that were based on incredibly similar functions—like, in 1959, *p*+*r*+*s*+*qr*+*qs*+*rs* mod 2. Here’s what Sol’s function (which can be thought of as “rule 29070”), rule 30, and a couple of other similar rules look like in a shift register:

And here’s what they look like as cellular automata, without being constrained to a fixed-size register:

Of course, Sol never made pictures like this (and it would, realistically, have been almost impossible to do so in the 1950s). Instead, he concentrated on a kind of aggregate feature: the overall repetition period.
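The formulas quoted above for rule 30 and for Sol's 1959 function can be checked mechanically. This sketch assumes the usual numbering convention for such rules: the inputs, read as a binary number with *p* as the most significant digit, give the bit position in the rule number. The helper names are illustrative.

```python
from itertools import product

def rule_number(f, k):
    """Rule number of a k-input Boolean function.

    Assumed convention: the inputs, read as a binary number (first input most
    significant), select which bit of the rule number holds the output.
    """
    number = 0
    for bits in product((0, 1), repeat=k):
        index = int("".join(map(str, bits)), 2)
        number |= f(*bits) << index
    return number

def rule30(p, q, r):
    return (p + q + r + q * r) % 2

def golomb_1959(p, q, r, s):
    return (p + r + s + q * r + q * s + r * s) % 2

# The algebraic form agrees with p XOR (q OR r) on all eight inputs.
print(all(rule30(p, q, r) == p ^ (q | r) for p, q, r in product((0, 1), repeat=3)))

print(rule_number(rule30, 3))        # 30
print(rule_number(golomb_1959, 4))   # 29070
```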

Sol wondered whether nonlinear-feedback shift registers might make good sources of randomness. From what we now know about cellular automata, it’s clear they can. And for example the rule 30 cellular automaton is what we used to generate randomness for Mathematica for 25 years (though we recently retired it in favor of a more efficient rule that we found by searching trillions of possibilities).

Sol didn’t talk about cryptography much—though I suspect he did quite a bit of government work on it. He did tell me though that in 1959 he’d found a “multi-dimensional correlation attack on nonlinear sequences”, though he said that at the time he “carefully avoided stating that the application was to cryptanalysis”. The fact is that cellular automata like rule 30 (and presumably also nonlinear-feedback shift registers) do seem to be good cryptosystems—though partly because of confusions about whether they’re somehow equivalent to linear-feedback shift registers (they’re not), they’ve never been used as much as they should.

Being a history enthusiast, I’ve tried over the past few decades to identify all precursors to my work on 1D cellular automata. 2D cellular automata had been studied a bit, but there was only quite theoretical work on the 1D case, together with a few specific investigations in the cryptography community (that I’ve never fully found out about). And in the end, of all the things I’ve seen, I think Sol Golomb’s nonlinear-feedback shift registers were in a sense closest to what I actually ended up doing a quarter century later.

Mention the name “Golomb” and some people will think of shift registers. But many more will think of polyominoes. Sol didn’t invent polyominoes—though he did invent the name. But what he did was to make systematic what had appeared only in isolated puzzles before.

The main question Sol was interested in was how and when collections of polyominoes can be arranged to tile particular (finite or infinite) regions. Sometimes it’s fairly obvious, but often it’s very tricky to figure out. Sol published his first paper on polyominoes in 1954, but what really launched polyominoes into the public consciousness was Martin Gardner’s 1957 Mathematical Games column on them in *Scientific American*. As Sol explained in the introduction to his 1964 book, the effect was that he acquired “a steady stream of correspondents from around the world and from every stratum of society—board chairmen of leading universities, residents of obscure monasteries, inmates of prominent penitentiaries…”

Game companies took notice too, and within months, for example, the “New Sensational Jinx Jigsaw Puzzle” had appeared—followed over the course of decades by a long sequence of other polyomino-based puzzles and games (no, the sinister bald guy doesn’t look anything like Sol):

Sol was still publishing papers about polyominoes 50 years after he first discussed them. In 1961 he introduced general subdividable “rep-tiles”, which it later became clear can make nested, fractal (“infin-tile”), patterns. But almost everything Sol did with polyominoes involved solving specific tiling problems with them.

For me, polyominoes are most interesting not for their specifics but for the examples they provide of more-general phenomena. One might have thought that given a few simple shapes it would be easy to decide whether they can tile the whole plane. But the example of polyominoes—with all the games and puzzles they support—makes it clear that it’s not necessarily so easy. And in fact it was proved in the 1960s that in general it’s a theoretically undecidable problem.

If one’s only interested in a finite region, then in principle one can just enumerate all conceivable arrangements of the original shapes, and see whether any of them correspond to successful tilings. But if one’s interested in the whole, infinite plane then one can’t do this. Maybe one will find a tiling of size one million, but there’s no guarantee how far the tiling can be extended.
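For a finite region the enumeration really can be carried out. As a minimal sketch, here is an exhaustive count of the ways to tile a small rectangle with dominoes, the simplest polyomino. The choice of dominoes and rectangles is mine, made only to keep the example short; the same backtracking idea extends to other shapes at much greater cost.

```python
def count_domino_tilings(rows, cols):
    """Count tilings of a rows x cols rectangle by 1x2 dominoes, by exhaustive search."""
    filled = [[False] * cols for _ in range(rows)]

    def place():
        # Find the first empty cell in row-major order; it must be covered by a
        # domino extending either to the right or downward.
        for r in range(rows):
            for c in range(cols):
                if not filled[r][c]:
                    total = 0
                    if c + 1 < cols and not filled[r][c + 1]:
                        filled[r][c] = filled[r][c + 1] = True
                        total += place()
                        filled[r][c] = filled[r][c + 1] = False
                    if r + 1 < rows and not filled[r + 1][c]:
                        filled[r][c] = filled[r + 1][c] = True
                        total += place()
                        filled[r][c] = filled[r + 1][c] = False
                    return total
        return 1   # no empty cell left: one complete tiling found

    return place()

print(count_domino_tilings(2, 3))   # 3
print(count_domino_tilings(3, 3))   # 0, since the area is odd
print(count_domino_tilings(4, 4))   # 36
```

The search always terminates because the region is finite. No such bound exists for the whole plane, which is where the difficulty discussed next comes from.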

It turns out it can be like running a Turing machine—or a cellular automaton. You start from a line of tiles. Then the question of whether there’s an infinite tiling is equivalent to the question of whether there’s a setup for some Turing machine that makes it never halt. And the point then is that if the Turing machine is universal (so that it can in effect be programmed to do any possible computation) then the halting problem for it can be undecidable, which means that the tiling problem is also undecidable.

Of course, whether a tiling problem is undecidable depends on the original set of shapes. And for me an important question is how complicated the shapes have to be so that they can encode universal computation, and yield an undecidable tiling problem. Sol Golomb knew the literature on this kind of question, but wasn’t especially interested in it. But I start thinking about materials formed from polyominoes whose pattern of “crystallization” can in effect do an arbitrary computation, or occur at a “melting point” that seems “random” because its value is undecidable.

Complicated, carefully crafted sets of polyominoes are known that in effect support universal computation. But what’s the simplest set—and is it simple enough that one might run across it by accident? My guess is that—just like with other kinds of systems I’ve studied in the computational universe—the simplest set is in fact simple. But finding it is very difficult.

A considerably easier problem is to find polyominoes that successfully tile the plane, but can’t do so periodically. Roger Penrose (of Penrose tiles fame) found an example in 1994. My book *A New Kind of Science* gave a slightly simpler example with 3 polyominoes:

By the time Sol was in his early thirties, he’d established his two most notable pursuits—shift registers and polyominoes—and he’d settled into life as a university professor. He was constantly active, though. He wrote what ended up being a couple of hundred papers, some extending his earlier work, some stimulated by questions people would ask him, and some written, it seems, for the sheer pleasure of figuring out interesting things about numbers, sequences, cryptosystems, or whatever.

Shift registers and polyominoes are both big subjects (they even each have their own category in the AMS classification of mathematical publication topics). Both have had a certain injection of energy in the past decade or two as modern computer experiments started to be done on them—and Sol collaborated with people doing these. But both fields still have many unanswered questions. Even for linear-feedback shift registers there are bigger Hadamard matrices to be found. And very little is known even now about nonlinear-feedback shift registers. Not to mention all the issues about nonperiodic and otherwise exotic polyomino tilings.

Sol was always interested in puzzles, both with math and with words. For a while he wrote a puzzle column for the *Los Angeles Times*—and for 32 years he wrote “Golomb’s Gambits” for the Johns Hopkins alumni magazine. He participated in MegaIQ tests—earning himself a trip to the White House when he and its chief of staff happened to both score in the top five in the country.

He poured immense effort into his work at the university, not only teaching undergraduate courses and mentoring graduate students but also ascending the ranks of university administration (president of the faculty senate, vice provost for research, etc.)—and occasionally opining more generally about university governance (for example writing a paper entitled “Faculty Consulting: Should It Be Curtailed?”; answer: no, it’s good for the university!). At USC, he was particularly involved in recruiting—and over his time at USC he helped it ascend from a school essentially unknown in electrical engineering to one that makes it onto lists of top programs.

And then there was consulting. He was meticulous at not disclosing what he did for government agencies, though at one point he did lament that some newly published work had been anticipated by a classified paper he had written 40 years earlier. In the late 1960s—frustrated that everyone but him seemed to be selling polyomino games—Sol started a company called Recreational Technology, Inc. It didn’t go particularly well, but one side effect was that he got involved in business with Elwyn Berlekamp—a Berkeley professor and fellow enthusiast of coding theory and puzzles—whom he persuaded to start a company called Cyclotomics (in honor of cyclotomic polynomials of the form *x*^{n}–1) which was eventually sold to Kodak for a respectable sum. (Berlekamp also created an algorithmic trading system that he sold to Jim Simons and that became a starting point for Renaissance Technologies, now one of the world’s largest hedge funds.)

More than 10,000 patents refer to Sol’s work, but Sol himself got only one patent: on a cryptosystem based on quasigroups—and I don’t think he ever did much to directly commercialize his work.

Sol was for many years involved with the Technion (Israel Institute of Technology) and quite devoted to Israel. He characterized himself as a “non-observant orthodox Jew”—but occasionally did things like teach a freshman seminar on the Book of Genesis, as well as working on decoding parts of the Dead Sea Scrolls.

Sol and his wife traveled extensively, but the center of Sol’s world was definitely Los Angeles—his office at USC, and the house in which he and his wife lived for nearly 60 years. He had a circle of friends and students who relied on him for many things. And he had his family. His daughter Astrid remained a local personality, even being portrayed in fiction a few times—as a student in a play about Richard Feynman (she sat as a drawing model for him many times), and as a character in a novel by a friend of mine. Beatrice became an MD/PhD who’s spent her career applying an almost mathematical level of precision to various kinds of medical reasoning and diagnosis (Gulf War illness, statin effects, hiccups, etc.)—even as she often quotes “Beatrice’s Law”, that “everything in biology is more complicated than you think, even taking into account Beatrice’s Law”. (I’m happy to have made at least one contribution to Beatrice’s life: introducing her to her husband, now of 26 years, Terry Sejnowski, one of the founders of modern computational neuroscience.)

In the years I knew Sol, there was always a quiet energy to him. He seemed to be involved in lots of things, even if he often wasn’t particularly forthcoming about the details. Occasionally I would talk to him about actual science and mathematics; usually he was more interested in telling stories (often very engaging ones) about personalities and organizations (“Can you believe that [in 1985] after not going to conferences for years, Claude Shannon just showed up unannounced at the bar at the annual information theory conference?”, “Do you know how much they had to pay the president of Caltech to get him to move to Saudi Arabia?”, etc.)

In retrospect, I wish I’d done more to get Sol interested in some of the math questions brought up by my own work. I don’t think I properly internalized the extent to which he liked cracking problems suggested by other people. And then there was the matter of computers. Despite all his contributions to the infrastructure of the computational world, Sol himself basically never seriously used computers. He took particular pride in his own mental calculation capabilities. And he didn’t really use email until he was in his seventies, and never used a computer at home—though, yes, he did have a cellphone. (A typical email from him was short. I had mentioned last year that I was researching Ada Lovelace; he responded: “The story of Ada Lovelace as Babbage’s programmer is so widespread that everyone seems to accept it as factual, but I’ve never seen original source material on this.”)

Sol’s daughters organized a party for his 80th birthday a few years ago:

Sol had a few medical problems, though they didn’t seem to be slowing him down much. His wife’s health, though, was failing, and a few weeks ago her condition suddenly worsened. Sol still went to his office as usual on Friday, but on Saturday night, in his sleep, he died. His wife Bo died just two weeks later, two days before what would have been their 60th wedding anniversary.

Though Sol himself is gone, the work he did lives on—responsible for an octillion bits (and counting) across the digital world. Farewell, Sol. And on behalf of all of us, thanks for all those cleverly created bits.

He’d heard of eclipses. He didn’t really understand them. But he had the idea that that was what he was seeing. Excited, he told another kid about it. They hadn’t heard of eclipses. But he pointed out that the sun had a bite taken out of it. The other kid looked up. Perhaps the sun was too bright, but they looked away without noticing anything. Then the first kid tried another kid. And then another. None of them believed him about the eclipse and the bite taken out of the sun.

Of course, this is a story about me. And now I can find the eclipse by going to Wolfram|Alpha (or the Wolfram Language):

And, yes, it was fun to see my first eclipse (almost exactly 25 years later, I finally saw a total eclipse too). But my real takeaway from that day was about the world and about people. Even if you notice something as obvious as a bite taken out of the side of the sun, there’s no guarantee that you can convince anyone else that it’s there.

It’s been very helpful to me over the past fifty years to understand that. There’ve been so many times in my life in science, technology and business where things seemed as obvious to me as the bite taken out of the sun. And quite often it’s been easy to get other people to see them too. But sometimes they just don’t.

When they find out that people don’t agree with something that seems obvious to them, many people will just conclude that they’re the ones who are wrong. That even though it seems obvious to them, the “crowd” must be right, and they themselves must somehow be confused. Fifty years ago today I learned that wasn’t true. Perhaps it made me more obstinate, but I could list quite a few pieces of science and technology that I rather suspect wouldn’t exist today if it hadn’t been for that kindergarten experience of mine.

As I write this, I feel an urge to tell a few other stories—and lessons learned—from kindergarten. I should explain that I went to a kindergarten with lots of smart kids, mostly children of Oxford academics. They certainly seemed very bright to me at the time—and, interestingly, many of them have ended up having distinguished lives and careers.

In many ways, the kids were much brighter than most of the teachers. I remember one teacher with the curious theory that children’s minds were like elastic bands—and that if children learned too much, their minds would snap. Of course, those were the days when Bible Study was part of pretty much any school’s curriculum in the UK, and it was probably very annoying that I would come in every day and regale everyone with stories about dinosaurs and geology when the teacher just wanted people to learn Genesis stories.

I don’t think I was great at “doing what the other kids do”. When I was three years old, and first at school, there was a time when everyone was supposed to run around “like a bus” (I guess ignoring the fact that buses go on roads…). I didn’t want to do it, and just stood in one place. “Why aren’t you being a bus?”, the teacher asked. “Well, I am a lamp post”, I said. They seemed sufficiently taken aback by that response that they left me alone.

I learned an important lesson when I was about five, from another kid. (The kid in question happened to grow up to become a distinguished mathematician—and she was even knighted recently for her contributions to mathematics—but that’s not really relevant to the story.) We were supposed to be hammering nails into pieces of wood. Yes, in those days in the UK they let five-year-olds do that. Anyway, she had the hammer and said “Can you hold the nail? Trust me, I know what I’m doing.” Needless to say, she missed the nail. My thumb was black for several days. But it was a small price to pay for a terrific life lesson: just because someone claims to know what they’re talking about doesn’t mean they do. And nowadays, when I’m dealing with some expert who says “trust me, I know what I’m talking about”, I can’t help but have my mind wander back half a century to that moment just before the hammer fell.

I’ll relate two more stories. The first one I’m not sure how I feel about now. It had to do with learning addition. Now, realistically, I have a good memory (which is perhaps obvious given that I’m writing about things that happened 50 years ago). So I could perfectly well have just memorized all my addition facts. But somehow I didn’t want to. And one day I noticed that if I put two rulers next to each other, I could make a little machine that would add for me—an “addition slide rule”. So whenever we were doing additions, I always “happened” to have two rulers on my desk. When it came to multiplication, I didn’t memorize that either—though in that case I discovered I could go far by knowing the single fact that 7×8=56—because that was the fact other kids didn’t know. (In the end, it took until I was in my forties before I’d finally learned every part of my multiplication table up to 12×12.) And as I look at Wolfram|Alpha and Mathematica and so on, and think about my addition slide rule, I’m reminded of the theory that people never really change….

My final story comes from around the same time as the eclipse. Back then, the UK used non-decimal currency: there were 12 pennies in a shilling, and 20 shillings in a pound. And one of the exercises for us kids was to do mixed-radix arithmetic with these things. I was very pleased with myself one day when I figured out that money didn’t have to work this way; that everything could be base 10 (well, I didn’t explicitly know the concept of base 10 yet). I told this to a teacher. They were a little confused, but said that currency had worked the same way for hundreds of years, and wasn’t going to change. A couple of years later, the UK announced it was going to decimalize its currency. (I suspect if it had waited longer it would still have non-decimal currency, and there would just be a big market for calculators that could compute with it.) I’ve kept this little incident with me all these years—as a reminder that things can change, even if they’ve been the way they are for a very long time. Oh, and again, that one shouldn’t necessarily believe what one’s told. But I guess that’s a theme….


They used to come by physical mail. Now it’s usually email. From around the world, I have for many years received a steady trickle of messages that make bold claims—about prime numbers, relativity theory, AI, consciousness or a host of other things—but give little or no backup for what they say. I’m always so busy with my own ideas and projects that I invariably put off looking at these messages. But in the end I try to at least skim them—in large part because I remember the story of Ramanujan.

On about January 31, 1913 a mathematician named G. H. Hardy in Cambridge, England received a package of papers with a cover letter that began: “Dear Sir, I beg to introduce myself to you as a clerk in the Accounts Department of the Port Trust Office at Madras on a salary of only £20 per annum. I am now about 23 years of age….” and went on to say that its author had made “startling” progress on a theory of divergent series in mathematics, and had all but solved the longstanding problem of the distribution of prime numbers. The cover letter ended: “Being poor, if you are convinced that there is anything of value I would like to have my theorems published…. Being inexperienced I would very highly value any advice you give me. Requesting to be excused for the trouble I give you. I remain, Dear Sir, Yours truly, S. Ramanujan”.

What followed were at least 11 pages of technical results from a range of areas of mathematics (at least 2 of the pages have now been lost). There are a few things that on first sight might seem absurd, like that the sum of all positive integers can be thought of as being equal to –1/12:
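One standard modern way to give that –1/12 a precise meaning is through the Riemann zeta function, whose values at negative integers are given by Bernoulli numbers: ζ(–n) = –B_{n+1}/(n+1). The sketch below computes the Bernoulli numbers exactly from their usual recurrence. It illustrates that reading only and makes no claim about the route Ramanujan himself took.

```python
from fractions import Fraction
from math import comb

def bernoulli(m_max):
    """Bernoulli numbers B_0..B_m_max (with B_1 = -1/2), from the recurrence
    sum_{k=0}^{m} C(m+1, k) * B_k = 0 for m >= 1."""
    B = [Fraction(1)]
    for m in range(1, m_max + 1):
        acc = sum(comb(m + 1, k) * B[k] for k in range(m))
        B.append(-acc / (m + 1))
    return B

B = bernoulli(4)
zeta_minus_1 = -B[2] / 2     # zeta(-n) = -B_{n+1} / (n + 1), here with n = 1
print(zeta_minus_1)          # -1/12
```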

Then there are statements that suggest a kind of experimental approach to mathematics:

But some things get more exotic, with pages of formulas like this:

What are these? Where do they come from? Are they even correct?

The concepts are familiar from college-level calculus. But these are not just complicated college-level calculus exercises. Instead, when one looks closely, each one has something more exotic and surprising going on—and seems to involve a quite different level of mathematics.

Today we can use Mathematica or Wolfram|Alpha to check the results—at least numerically. And sometimes we can even just type in the question and immediately get out the answer:

And the first surprise—just as G. H. Hardy discovered back in 1913—is that, yes, the formulas are essentially all correct. But what kind of person would have made them? And how? And are they all part of some bigger picture—or in a sense just scattered random facts of mathematics?

Needless to say, there’s a human story behind this: the remarkable story of Srinivasa Ramanujan.

He was born in a smallish town in India on December 22, 1887 (which made him not “about 23”, but actually 25, when he wrote his letter to Hardy). His family was of the Brahmin (priests, teachers, …) caste but of modest means. The British colonial rulers of India had put in place a very structured system of schools, and by age 10 Ramanujan stood out by scoring top in his district in the standard exams. He also was known as having an exceptional memory, and being able to recite digits of numbers like pi as well as things like roots of Sanskrit words. When he graduated from high school at age 17 he was recognized for his mathematical prowess, and given a scholarship for college.

While in high school Ramanujan had started studying mathematics on his own—and doing his own research (notably on the numerical evaluation of Euler’s constant, and on properties of the Bernoulli numbers). He was fortunate at age 16 (in those days long before the web!) to get a copy of a remarkably good and comprehensive (at least as of 1886) 1055-page summary of high-end undergraduate mathematics, organized in the form of results numbered up to 6165. The book was written by a tutor for the ultra-competitive Mathematical Tripos exams in Cambridge—and its terse “just the facts” format was very similar to the one Ramanujan used in his letter to Hardy.

By the time Ramanujan got to college, all he wanted to do was mathematics—and he failed his other classes, and at one point ran away, causing his mother to send a missing-person letter to the newspaper:

Ramanujan moved to Madras (now Chennai), tried different colleges, had medical problems, and continued his independent math research. In 1909, when he was 21, his mother arranged (in keeping with customs of the time) for him to marry a then-10-year-old girl named Janaki, who started living with him a couple of years later.

Ramanujan seems to have supported himself by doing math tutoring—but soon became known around Madras as a math whiz, and began publishing in the recently launched *Journal of the Indian Mathematical Society*. His first paper—published in 1911—was on computational properties of Bernoulli numbers (the same Bernoulli numbers that Ada Lovelace had used in her 1843 paper on the Analytical Engine). Though his results weren’t spectacular, Ramanujan’s approach was an interesting and original one that combined continuous (“what’s the numerical value?”) and discrete (“what’s the prime factorization?”) mathematics.

When Ramanujan’s mathematical friends didn’t succeed in getting him a scholarship, Ramanujan started looking for jobs, and wound up in March 1912 as an accounting clerk—or effectively, a human calculator—for the Port of Madras (which was then, as now, a big shipping hub). His boss, the Chief Accountant, happened to be interested in academic mathematics, and became a lifelong supporter of his. The head of the Port of Madras was a rather distinguished British civil engineer, and partly through him, Ramanujan started interacting with a network of technically oriented British expatriates. They struggled to assess him, wondering whether “he has the stuff of great mathematicians” or whether “his brains are akin to those of the calculating boy”. They wrote to a certain Professor M. J. M. Hill in London, who looked at Ramanujan’s rather outlandish statements about divergent series and declared that “Mr. Ramanujan is evidently a man with a taste for Mathematics, and with some ability, but he has got on to wrong lines.” Hill suggested some books for Ramanujan to study.

Meanwhile, Ramanujan’s expat friends were continuing to look for support for him—and he decided to start writing to British mathematicians himself, though with some significant help at composing the English in his letters. We don’t know exactly who all he wrote to first—although Hardy’s long-time collaborator John Littlewood mentioned two names shortly before he died 64 years later: H. F. Baker and E. W. Hobson. Neither was a particularly good choice: Baker worked on algebraic geometry and Hobson on mathematical analysis, both subjects fairly far from what Ramanujan was doing. But in any event, neither of them responded.

And so it was that on Thursday, January 16, 1913, Ramanujan sent his letter to G. H. Hardy.

G. H. Hardy was born in 1877 to schoolteacher parents based about 30 miles south of London. He was from the beginning a top student, particularly in mathematics. Even when I was growing up in England in the early 1970s, it was typical for such students to go to Winchester for high school and Cambridge for college. And that’s exactly what Hardy did. (The other, slightly more famous, track—less austere and less mathematically oriented—was Eton and Oxford, which happens to be where I went.)

Cambridge undergraduate mathematics was at the time very focused on solving ornately constructed calculus-related problems as a kind of competitive sport—with the final event being the Mathematical Tripos exams, which ranked everyone from the “Senior Wrangler” (top score) to the “Wooden Spoon” (lowest passing score). Hardy thought he should have been top, but actually came in 4th, and decided that what he really liked was the somewhat more rigorous and formal approach to mathematics that was then becoming popular in Continental Europe.

The way the British academic system worked at that time—and basically until the 1960s—was that as soon as they graduated, top students could be elected to “college fellowships” that could last the rest of their lives. Hardy was at Trinity College—the largest and most scientifically distinguished college at Cambridge University—and when he graduated in 1900, he was duly elected to a college fellowship.

Hardy’s first research paper was about doing integrals like these:

For a decade Hardy basically worked on the finer points of calculus, figuring out how to do different kinds of integrals and sums, and injecting greater rigor into issues like convergence and the interchange of limits.

His papers weren’t grand or visionary, but they were good examples of state-of-the-art mathematical craftsmanship. (As a colleague of Bertrand Russell’s, he dipped into the new area of transfinite numbers, but didn’t do much with them.) Then in 1908, he wrote a textbook entitled *A Course of Pure Mathematics*—which was a good book, and was very successful in its time, even if its preface began by explaining that it was for students “whose abilities reach or approach something like what is usually described as ‘scholarship standard’”.

By 1910 or so, Hardy had pretty much settled into a routine of life as a Cambridge professor, pursuing a steady program of academic work. But then he met John Littlewood. Littlewood had grown up in South Africa and was eight years younger than Hardy, a recent Senior Wrangler, and in many ways much more adventurous. And in 1911 Hardy—who had previously always worked on his own—began a collaboration with Littlewood that ultimately lasted the rest of his life.

As a person, Hardy gives me the impression of a good schoolboy who never fully grew up. He seemed to like living in a structured environment, concentrating on his math exercises, and displaying cleverness whenever he could. He could be very nerdy—whether about cricket scores, proving the non-existence of God, or writing down rules for his collaboration with Littlewood. And in a quintessentially British way, he could express himself with wit and charm, but was personally stiff and distant—for example always styling himself “G. H. Hardy”, with “Harold” basically used only by his mother and sister.

So in early 1913 there was Hardy: a respectable and successful, if personally reserved, British mathematician, who had recently been energized by starting to collaborate with Littlewood—and was being pulled in the direction of number theory by Littlewood’s interests there. But then he received the letter from Ramanujan.

Ramanujan’s letter began in a somewhat unpromising way, giving the impression that he thought he was describing for the first time the already fairly well-known technique of analytic continuation for generalizing things like the factorial function to non-integers. He made the statement that “My whole investigations are based upon this and I have been developing this to a remarkable extent so much so that the local mathematicians are not able to understand me in my higher flights.” But after the cover letter, there followed more than nine pages that listed over 120 different mathematical results.

Again, they began unpromisingly, with rather vague statements about having a method to count the number of primes up to a given size. But by page 3, there were definite formulas for sums and integrals and things. Some of them looked at least from a distance like the kinds of things that were, for example, in Hardy’s papers. But some were definitely more exotic. Their general texture, though, was typical of these types of math formulas. But many of the actual formulas were quite surprising—often claiming that things one wouldn’t expect to be related at all were actually mathematically equal.

At least two pages of the original letter have gone missing. But the last page we have again seems to end inauspiciously—with Ramanujan describing achievements of his theory of divergent series, including the seemingly absurd result about adding up all the positive integers, 1+2+3+4+…, and getting –1/12.

So what was Hardy’s reaction? First he consulted Littlewood. Was it perhaps a practical joke? Were these formulas all already known, or perhaps completely wrong? Some they recognized, and knew were correct. But many they did not. But as Hardy later said with characteristic clever gloss, they concluded that these too “must be true because, if they were not true, no one would have the imagination to invent them.”

Bertrand Russell wrote that by the next day he “found Hardy and Littlewood in a state of wild excitement because they believe they have found a second Newton, a Hindu clerk in Madras making 20 pounds a year.” Hardy showed Ramanujan’s letter to lots of people, and started making enquiries with the government department that handled India. It took him a week to actually reply to Ramanujan, opening with a certain measured and precisely expressed excitement: “I was exceedingly interested by your letter and by the theorems which you state.”

Then he went on: “You will however understand that, before I can judge properly of the value of what you have done, it is essential that I should see proofs of some of your assertions.” It was an interesting thing to say. To Hardy, it wasn’t enough to know what was true; he wanted to know the proof—the story—of why it was true. Of course, Hardy could have taken it upon himself to find his own proofs. But I think part of it was that he wanted to get an idea of how Ramanujan thought—and what level of mathematician he really was.

His letter went on—with characteristic precision—to group Ramanujan’s results into three classes: already known, new and interesting but probably not important, and new and potentially important. But the only things he immediately put in the third category were Ramanujan’s statements about counting primes, adding that “almost everything depends on the precise rigour of the methods of proof which you have used.”

Hardy had obviously done some background research on Ramanujan by this point, since in his letter he makes reference to Ramanujan’s paper on Bernoulli numbers. But in his letter he just says, “I hope very much that you will send me as quickly as possible… a few of your proofs,” then closes with, “Hoping to hear from you again as soon as possible.”

Ramanujan did indeed respond quickly to Hardy’s letter, and his response is fascinating. First, he says he was expecting the same kind of reply from Hardy as he had from the “Mathematics Professor at London”, who just told him “not [to] fall into the pitfalls of divergent series.” Then he reacts to Hardy’s desire for rigorous proofs by saying, “If I had given you my methods of proof I am sure you will follow the London Professor.” He mentions his result 1+2+3+4+…=–1/12 and says that “If I tell you this you will at once point out to me the lunatic asylum as my goal.” He goes on to say, “I dilate on this simply to convince you that you will not be able to follow my methods of proof… [based on] a single letter.” He says that his first goal is just to get someone like Hardy to verify his results—so he’ll be able to get a scholarship, since “I am already a half starving man. To preserve my brains I want food…”

Ramanujan makes a point of saying that it was Hardy’s first category of results—ones that were already known—that he’s most pleased about, “For my results are verified to be true even though I may take my stand upon slender basis.” In other words, Ramanujan himself wasn’t sure if the results were correct—and he’s excited that they actually are.

So how was he getting his results? I’ll say more about this later. But he was certainly doing all sorts of calculations with numbers and formulas—in effect doing experiments. And presumably he was looking at the results of these calculations to get an idea of what might be true. It’s not clear how he figured out what was actually true—and indeed some of the results he quoted weren’t in the end true. But presumably he used some mixture of traditional mathematical proof, calculational evidence, and lots of intuition. But he didn’t explain any of this to Hardy.

Instead, he just started conducting a correspondence about the details of the results, and the fragments of proofs he was able to give. Hardy and Littlewood seemed intent on grading his efforts—with Littlewood writing about some result, for example, “(d) is still wrong, of course, rather a howler.” Still, they wondered if Ramanujan was “an Euler”, or merely “a Jacobi”. But Littlewood had to say, “The stuff about primes is wrong”—explaining that Ramanujan incorrectly assumed the Riemann zeta function didn’t have complex zeros, even though it actually has an infinite number of them, which are the subject of the whole Riemann hypothesis. (The Riemann hypothesis is still a famous unsolved math problem, even though an optimistic teacher suggested it to Littlewood as a project when he was an undergraduate…)

What about Ramanujan’s strange 1+2+3+4+… = –1/12? Well, that has to do with the Riemann zeta function as well. For positive integers, ζ(s) is defined as the sum 1/1^s + 1/2^s + 1/3^s + … And given those values, there’s a nice function—called Zeta[*s*] in the Wolfram Language—that can be obtained by continuing to all complex *s*. Now based on the formula for positive arguments, one can identify Zeta[–1] with 1+2+3+4+… But one can also just evaluate Zeta[–1], and the result is –1/12.

It’s a weird result, to be sure. But not as crazy as it might at first seem. And in fact it’s a result that’s nowadays considered perfectly sensible for purposes of certain calculations in quantum field theory (in which, to be fair, all actual infinities are intended to cancel out at the end).
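The value –1/12 can be checked with exact arithmetic. The sketch below is a minimal Python illustration, not how Zeta is implemented in the Wolfram Language: it relies on the standard identity ζ(–n) = –B(n+1)/(n+1) for integers n ≥ 1, where B(k) are the Bernoulli numbers. The function names `bernoulli_numbers` and `zeta_negative` are made up for this example.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(m):
    """Return [B_0, ..., B_m] as exact fractions (convention B_1 = -1/2)."""
    B = [Fraction(1)]
    for n in range(1, m + 1):
        # Recurrence: sum_{k=0}^{n} C(n+1, k) * B_k = 0, solved for B_n.
        s = sum(comb(n + 1, k) * B[k] for k in range(n))
        B.append(-s / (n + 1))
    return B


def zeta_negative(n):
    """Value of the continued zeta function at -n, for an integer n >= 1."""
    if n < 1:
        raise ValueError("this sketch only handles n >= 1")
    B = bernoulli_numbers(n + 1)
    return -B[n + 1] / (n + 1)


print(zeta_negative(1))  # the value assigned to 1+2+3+4+...
```

The same function gives 0 at –2 and 1/120 at –3, the values assigned to the formal sums of squares and cubes.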

But back to the story. Hardy and Littlewood didn’t really have a good mental model for Ramanujan. Littlewood speculated that Ramanujan might not be giving the proofs they assumed he had because he was afraid they’d steal his work. (Stealing was a major issue in academia then as it is now.) Ramanujan said he was “pained” by this speculation, and assured them that he was not “in the least apprehensive of my method being utilised by others.” He said that actually he’d invented the method eight years earlier, but hadn’t found anyone who could appreciate it, and now he was “willing to place unreservedly in your possession what little I have.”

Meanwhile, even before Hardy had responded to Ramanujan’s first letter, he’d been investigating with the government department responsible for Indian students how he could bring Ramanujan to Cambridge. It’s not quite clear what got communicated, but Ramanujan responded that he couldn’t go—perhaps because of his Brahmin beliefs, or his mother, or perhaps because he just didn’t think he’d fit in. But in any case, Ramanujan’s supporters started pushing instead for him to get a graduate scholarship at the University of Madras. More experts were consulted, who opined that “His results appear to be wonderful; but he is not, now, able to present any intelligible proof of some of them,” but “He has sufficient knowledge of English and is not too old to learn modern methods from books.”

The university administration said their regulations didn’t allow a graduate scholarship to be given to someone like Ramanujan who hadn’t finished an undergraduate degree. But they helpfully suggested that “Section XV of the Act of Incorporation and Section 3 of the Indian Universities Act, 1904, allow of the grant of such a scholarship [by the Government Educational Department], subject to the express consent of the Governor of Fort St George in Council.” And despite the seemingly arcane bureaucracy, things moved quickly, and within a few weeks Ramanujan was duly awarded a scholarship for two years, with the sole requirement that he provide quarterly reports.

By the time he got his scholarship, Ramanujan had started writing more papers, and publishing them in the *Journal of the Indian Mathematical Society*. Compared to his big claims about primes and divergent series, the topics of these papers were quite tame. But the papers were remarkable nevertheless.

What’s immediately striking about them is how calculational they are—full of actual, complicated formulas. Most math papers aren’t that way. They may have complicated notation, but they don’t have big expressions containing complicated combinations of roots, or seemingly random long integers.

In modern times, we’re used to seeing incredibly complicated formulas routinely generated by Mathematica. But usually they’re just intermediate steps, and aren’t what papers explicitly talk much about. For Ramanujan, though, complicated formulas were often what really told the story. And of course it’s incredibly impressive that he could derive them without computers and modern tools.

(As an aside, back in the late 1970s I started writing papers that involved formulas generated by computer. And in one particular paper, the formulas happened to have lots of occurrences of the number 9. But the experienced typist who typed the paper—yes, from a manuscript—replaced every “9” with a “g”. When I asked her why, she said, “Well, there are never explicit 9’s in papers!”)

Looking at Ramanujan’s papers, another striking feature is the frequent use of numerical approximations in arguments leading to exact results. People tend to think of working with algebraic formulas as an exact process—generating, for example, coefficients that are exactly 16, not just roughly 15.99999. But for Ramanujan, approximations were routinely part of the story, even when the final results were exact.

In some sense it’s not surprising that approximations to numbers are useful. Let’s say we want to know which is larger: or . We can start doing all sorts of transformations among square roots, and trying to derive theorems from them. Or we can just evaluate each expression numerically, and find that the first one (2.9755…) is obviously smaller than the second (3.322…). In the mathematical tradition of someone like Hardy—or, for that matter, in a typical modern calculus course—such a direct calculational way of answering the question seems somehow inappropriate and improper.

And of course if the numbers are very close one has to be careful about numerical round-off and so on. But for example in Mathematica and the Wolfram Language today—particularly with their built-in precision tracking for numbers—we often use numerical approximations internally as part of deriving exact results, much like Ramanujan did.

When Hardy asked Ramanujan for proofs, part of what he wanted was to get a kind of story for each result that explained why it was true. But in a sense Ramanujan’s methods didn’t lend themselves to that. Because part of the “story” would have to be that there’s this complicated expression, and it happens to be numerically greater than this other expression. It’s easy to see it’s true—but there’s no real story of why it’s true.

And the same happens whenever a key part of a result comes from pure computation of complicated formulas, or in modern times, from automated theorem proving. Yes, one can trace the steps and see that they’re correct. But there’s no bigger story that gives one any particular understanding of the results.

For most people it’d be bad news to end up with some complicated expression or long seemingly random number—because it wouldn’t tell them anything. But Ramanujan was different. Littlewood once said of Ramanujan that “every positive integer was one of his personal friends.” And between a good memory and good ability to notice patterns, I suspect Ramanujan could conclude a lot from a complicated expression or a long number. For him, just the object itself would tell a story.

Ramanujan was of course generating all these things by his own calculational efforts. But back in the late 1970s and early 1980s I had the experience of starting to generate lots of complicated results automatically by computer. And after I’d been doing it awhile, something interesting happened: I started being able to quickly recognize the “texture” of results—and often immediately see what might likely be true. If I was dealing, say, with some complicated integral, it wasn’t that I knew any theorems about it. I just had an intuition about, for example, what functions might appear in the result. And given this, I could then get the computer to go in and fill in the details—and check that the result was correct. But I couldn’t derive why the result was true, or tell a story about it; it was just something that intuition and calculation gave me.

Now of course there’s a fair amount of pure mathematics where one can’t (yet) just routinely go in and do an explicit computation to check whether or not some result is correct. And this often happens for example when there are infinite or infinitesimal quantities or limits involved. And one of the things Hardy had specialized in was giving proofs that were careful in handling such things. In 1910 he’d even written a book called *Orders of Infinity* that was about subtle issues that come up in taking infinite limits. (In particular, in a kind of algebraic analog of the theory of transfinite numbers, he talked about comparing growth rates of things like nested exponential functions—and we even make some use of what are now called Hardy fields in dealing with generalizations of power series in the Wolfram Language.)

So when Hardy saw Ramanujan’s “fast and loose” handling of infinite limits and the like, it wasn’t surprising that he reacted negatively—and thought he would need to “tame” Ramanujan, and educate him in the finer European ways of doing such things, if Ramanujan was actually going to reliably get correct answers.

Ramanujan was surely a great human calculator, and impressive at knowing whether a particular mathematical fact or relation was actually true. But his greatest skill was, I think, something in a sense more mysterious: an uncanny ability to tell what was significant, and what might be deduced from it.

Take for example his paper “Modular Equations and Approximations to π”, published in 1914, in which he calculates (without a computer of course):

Most mathematicians would say, “It’s an amusing coincidence that that’s so close to an integer—but so what?” But Ramanujan realized there was more to it. He found other relations (those “=” should really be ≅):

Then he began to build a theory—that involves elliptic functions, though Ramanujan didn’t know that name yet—and started coming up with new series approximations for π:

Previous approximations to π had in a sense been much more sober, though the best one before Ramanujan’s (Machin’s series from 1706) did involve the seemingly random number 239: π/4 = 4 arctan(1/5) – arctan(1/239), with each arctangent expanded as a power series.

But Ramanujan’s series—bizarre and arbitrary as they might appear—had an important feature: they took far fewer terms to compute π to a given accuracy. In 1977, Bill Gosper—himself a rather Ramanujan-like figure, whom I’ve had the pleasure of knowing for more than 35 years—took the last of Ramanujan’s series from the list above, and used it to compute a record number of digits of π. There soon followed other computations, all based directly on Ramanujan’s idea—as is the method we use for computing π in Mathematica and the Wolfram Language.
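To give a feel for how few terms such a series needs, here is a minimal Python sketch of one widely quoted series from Ramanujan's 1914 paper, 1/π = (2√2/9801) Σ (4k)! (1103 + 26390k) / ((k!)^4 396^(4k)). That this is the particular series referred to above is an assumption, and the function name `ramanujan_pi` and its parameters are illustrative only.

```python
from decimal import Decimal, localcontext
from math import factorial


def ramanujan_pi(terms, digits=50):
    """Approximate pi from the first `terms` terms of the 1103/26390 series."""
    with localcontext() as ctx:
        ctx.prec = digits + 10  # a few guard digits for intermediate rounding
        total = Decimal(0)
        for k in range(terms):
            numerator = factorial(4 * k) * (1103 + 26390 * k)
            denominator = factorial(k) ** 4 * 396 ** (4 * k)
            total += Decimal(numerator) / Decimal(denominator)
        inv_pi = Decimal(2) * Decimal(2).sqrt() / Decimal(9801) * total
        return 1 / inv_pi


for n in (1, 2, 3):
    print(n, ramanujan_pi(n))
```

Each extra term contributes roughly eight more correct digits, so even the single-term value is already within about one part in ten million of π.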

It’s interesting to see in Ramanujan’s paper that even he occasionally didn’t know what was and wasn’t significant. For example, he noted:

And then—in pretty much his only published example of geometry—he gave a peculiar geometric construction for approximately “squaring the circle” based on this formula:

To Hardy, Ramanujan’s way of working must have seemed quite alien. For Ramanujan was in some fundamental sense an experimental mathematician: going out into the universe of mathematical possibilities and doing calculations to find interesting and significant facts—and only then building theories based on them.

Hardy on the other hand worked like a traditional mathematician, progressively extending the narrative of existing mathematics. Most of his papers begin—explicitly or implicitly—by quoting some result from the mathematical literature, and then proceed by telling the story of how this result can be extended by a series of rigorous steps. There are no sudden empirical discoveries—and no seemingly inexplicable jumps based on intuition from them. It’s mathematics carefully argued, and built, in a sense, brick by brick.

A century later this is still the way almost all pure mathematics is done. And even if it’s discussing the same subject matter, perhaps anything else shouldn’t be called “mathematics”, because its methods are too different. In my own efforts to explore the computational universe of simple programs, I’ve certainly done a fair amount that could be called “mathematical” in the sense that it, for example, explores systems based on numbers.

Over the years, I’ve found all sorts of results that seem interesting. Strange structures that arise when one successively adds numbers to their digit reversals. Bizarre nested recurrence relations that generate primes. Peculiar representations of integers using trees of bitwise xors. But they’re empirical facts—demonstrably true, yet not part of the tradition and narrative of existing mathematics.

For many mathematicians—like Hardy—the process of proof is the core of mathematical activity. It’s not particularly significant to come up with a conjecture about what’s true; what’s significant is to create a proof that explains why something is true, constructing a narrative that other mathematicians can understand.

Particularly today, as we start to be able to automate more and more proofs, they can seem a bit like mundane manual labor, where the outcome may be interesting but the process of getting there is not. But proofs can also be illuminating. They can in effect be stories that introduce new abstract concepts that transcend the particulars of a given proof, and provide raw material to understand many other mathematical results.

For Ramanujan, though, I suspect it was facts and results that were the center of his mathematical thinking, and proofs felt a bit like some strange European custom necessary to take his results out of his particular context, and convince European mathematicians that they were correct.

But let’s return to the story of Ramanujan and Hardy.

In the early part of 1913, Hardy and Ramanujan continued to exchange letters. Ramanujan described results; Hardy critiqued what Ramanujan said, and pushed for proofs and traditional mathematical presentation. Then there was a long gap, but finally in December 1913, Hardy wrote again, explaining that Ramanujan’s most ambitious results—about the distribution of primes—were definitely incorrect, commenting that “…the theory of primes is full of pitfalls, to surmount which requires the fullest of trainings in modern rigorous methods.” He also said that if Ramanujan had been able to prove his results it would have been “about the most remarkable mathematical feat in the whole history of mathematics.”

In January 1914 a young Cambridge mathematician named E. H. Neville came to give lectures in Madras, and relayed the message that Hardy was (in Ramanujan’s words) “anxious to get [Ramanujan] to Cambridge”. Ramanujan responded that back in February 1913 he’d had a meeting, along with his “superior officer”, with the Secretary to the Students Advisory Committee of Madras, who had asked whether he was prepared to go to England. Ramanujan wrote that he assumed he’d have to take exams like the other Indian students he’d seen go to England, which he didn’t think he’d do well enough in—and also that his superior officer, a “very orthodox Brahman having scruples to go to foreign land replied at once that I could not go”.

But then he said that Neville had “cleared [his] doubts”, explaining that there wouldn’t be an issue with his expenses, that his English would do, that he wouldn’t have to take exams, and that he could remain a vegetarian in England. He ended by saying that he hoped Hardy and Littlewood would “be good enough to take the trouble of getting me [to England] within a very few months.”

Hardy had assumed it would be bureaucratically trivial to get Ramanujan to England, but actually it wasn’t. Hardy’s own Trinity College wasn’t prepared to contribute any real funding. Hardy and Littlewood offered to put up some of the money themselves. But Neville wrote to the registrar of the University of Madras saying that “the discovery of the genius of S. Ramanujan of Madras promises to be the most interesting event of our time in the mathematical world”—and suggested the university come up with the money. Ramanujan’s expat supporters swung into action, with the matter eventually reaching the Governor of Madras—and a solution was found that involved taking money from a grant that had been given by the government five years earlier for “establishing University vacation lectures”, but that was actually, in the bureaucratic language of “Document No. 182 of the Educational Department”, “not being utilised for any immediate purpose”.

There are strange little notes in the bureaucratic record, like on February 12: “What caste is he? Treat as urgent.” But eventually everything was sorted out, and on March 17, 1914, after a send-off featuring local dignitaries, Ramanujan boarded a ship for England, sailing up through the Suez Canal, and arriving in London on April 14. Before leaving India, Ramanujan had prepared for European life by getting Western clothes, and learning things like how to eat with a knife and fork, and how to tie a tie. Many Indian students had come to England before, and there was a whole procedure for them. But after a few days in London, Ramanujan arrived in Cambridge—with the Indian newspapers proudly reporting that “Mr. S. Ramanujan, of Madras, whose work in the higher mathematics has excited the wonder of Cambridge, is now in residence at Trinity.”

(In addition to Hardy and Littlewood, two other names that appear in connection with Ramanujan’s early days in Cambridge are Neville and Barnes. They’re not especially famous in the overall history of mathematics, but it so happens that in the Wolfram Language they’re both commemorated by built-in functions: NevilleThetaS and BarnesG.)

What was the Ramanujan who arrived in Cambridge like? He was described as enthusiastic and eager, though diffident. He made jokes, sometimes at his own expense. He could talk about politics and philosophy as well as mathematics. He was never particularly introspective. In official settings he was polite and deferential and tried to follow local customs. His native language was Tamil, and earlier in his life he had failed English exams, but by the time he arrived in England, his English was excellent. He liked to hang out with other Indian students, sometimes going to musical events, or boating on the river. Physically, he was described as short and stout—with his main notable feature being the brightness of his eyes. He worked hard, chasing one mathematical problem after another. He kept his living space sparse, with only a few books and papers. He was sensible about practical things, for example in figuring out issues with cooking and vegetarian ingredients. And from what one can tell, he was happy to be in Cambridge.

But then on June 28, 1914—two and a half months after Ramanujan arrived in England—Archduke Ferdinand was assassinated, and on July 28, World War I began. There was an immediate effect on Cambridge. Many students were called up for military duty. Littlewood joined the war effort and ended up developing ways to compute range tables for anti-aircraft guns. Hardy wasn’t a big supporter of the war—not least because he liked German mathematics—but he volunteered for duty too, though he was rejected on medical grounds.

Ramanujan described the war in a letter to his mother, saying for example, “They fly in aeroplanes at great heights, bomb the cities and ruin them. As soon as enemy planes are sighted in the sky, the planes resting on the ground take off and fly at great speeds and dash against them resulting in destruction and death.”

Ramanujan nevertheless continued to pursue mathematics, explaining to his mother that “war is waged in a country that is as far as Rangoon is away from [Madras]”. There were practical difficulties, like a lack of vegetables, which caused Ramanujan to ask a friend in India to send him “some tamarind (seeds being removed) and good cocoanut oil by parcel post”. But of more importance, as Ramanujan reported it, was that the “professors here… have lost their interest in mathematics owing to the present war”.

Ramanujan told a friend that he had “changed [his] plan of publishing [his] results”. He said that he would wait to publish any of the old results in his notebooks until the war was over. But he said that since coming to England he had learned “their methods”, and was “trying to get new results by their methods so that I can easily publish these results without delay”.

In 1915 Ramanujan published a long paper entitled “Highly Composite Numbers” about maxima of the function (DivisorSigma in the Wolfram Language) that counts the number of divisors of a given number. Hardy seems to have been quite involved in the preparation of this paper—and it served as the centerpiece of Ramanujan’s analog of a PhD thesis.
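The definition is simple enough to try directly: a highly composite number is one with more divisors than any smaller positive integer. The following is a brute-force Python sketch of that definition only (it has nothing to do with the analysis in Ramanujan's paper), and the function names are invented for this example.

```python
def divisor_count(n):
    """Count the divisors of n by trial division up to sqrt(n)."""
    count = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            # d and n // d are both divisors; count them once if they coincide.
            count += 1 if d * d == n else 2
        d += 1
    return count


def highly_composite(limit):
    """List the n <= limit whose divisor count exceeds that of every smaller n."""
    best = 0
    result = []
    for n in range(1, limit + 1):
        c = divisor_count(n)
        if c > best:
            best = c
            result.append(n)
    return result


print(highly_composite(400))
```

Up to 400 this yields 1, 2, 4, 6, 12, 24, 36, 48, 60, 120, 180, 240 and 360, numbers that will look familiar from clocks, angles and old units of measure.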

For the next couple of years, Ramanujan prolifically wrote papers—and despite the war, they were published. A notable paper he wrote with Hardy concerns the partition function (PartitionsP in the Wolfram Language) that counts the number of ways an integer can be written as a sum of positive integers. The paper is a classic example of mixing the approximate with the exact. The paper begins with the result for large *n*: p(*n*) ~ e^(π√(2*n*/3)) / (4*n*√3).

But then, using ideas Ramanujan developed back in India, it progressively improves the estimate, to the point where the exact integer result can be obtained. In Ramanujan’s day, computing the exact value of PartitionsP[200] was a big deal—and the climax of his paper. But now, thanks to Ramanujan’s method, it’s instantaneous: PartitionsP[200] gives 3972999029388.
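For a hands-on comparison of the approximate and the exact, here is a minimal Python sketch. Note the swap of technique: the exact count uses Euler's pentagonal-number recurrence, not the Hardy–Ramanujan method, and the estimate is only the leading asymptotic term e^(π√(2n/3)) / (4n√3). The function names are illustrative.

```python
from math import exp, pi, sqrt


def partitions(n):
    """Exact number of partitions of n via Euler's pentagonal recurrence."""
    p = [1] + [0] * n
    for m in range(1, n + 1):
        total = 0
        k = 1
        while True:
            g1 = k * (3 * k - 1) // 2  # generalized pentagonal numbers
            if g1 > m:
                break
            sign = 1 if k % 2 == 1 else -1
            total += sign * p[m - g1]
            g2 = k * (3 * k + 1) // 2
            if g2 <= m:
                total += sign * p[m - g2]
            k += 1
        p[m] = total
    return p[n]


def leading_estimate(n):
    """Leading-order asymptotic estimate for the number of partitions of n."""
    return exp(pi * sqrt(2 * n / 3)) / (4 * n * sqrt(3))


exact = partitions(200)
print(exact, leading_estimate(200) / exact)
```

At n = 200 the leading term alone is within a few percent of the exact value; the further correction terms are what close the gap down to the nearest integer.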

Cambridge was dispirited by the war—with an appalling number of its finest students dying, often within weeks, at the front lines. Trinity College’s big quad had become a war hospital. But through all of this, Ramanujan continued to do his mathematics—and with Hardy’s help continued to build his reputation.

But then in May 1917, there was another problem: Ramanujan got sick. From what we know now, it’s likely that what he had was a parasitic liver infection picked up in India. But back then nobody could diagnose it. Ramanujan went from doctor to doctor, and nursing home to nursing home. He didn’t believe much of what he was told, and nothing that was done seemed to help much. Some months he would be well enough to do a significant amount of mathematics; others not. He became depressed, and at one point apparently suicidal. It didn’t help that his mother had prevented his wife back in India from communicating with him, presumably fearing it would distract him.

Hardy tried to help—sometimes by interacting with doctors, sometimes by providing mathematical input. One doctor told Hardy he suspected “some obscure Oriental germ trouble imperfectly studied at present”. Hardy wrote, “Like all Indians, [Ramanujan] is fatalistic, and it is terribly hard to get him to take care of himself.” Hardy later told the now-famous story that he once visited Ramanujan at a nursing home, telling him that he came in a taxicab with number 1729, and saying that it seemed to him a rather dull number—to which Ramanujan replied: “No, it is a very interesting number; it is the smallest number expressible as the sum of two cubes in two different ways”: 1729 = 1³ + 12³ = 9³ + 10³. (Wolfram|Alpha now reports some other properties too.)
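Ramanujan's claim about 1729 is easy to confirm by brute force. The Python sketch below is only an illustration (the function name and the `limit` parameter are made up here): it collects every sum of two positive cubes up to a bound and returns the smallest one that arises in at least two ways.

```python
from collections import defaultdict


def smallest_two_way_cube_sum(limit=20):
    """Smallest n <= limit**3 that is a sum of two positive cubes in two ways."""
    bound = limit ** 3
    ways = defaultdict(list)
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):
            s = a ** 3 + b ** 3
            # Any sum of two positive cubes not exceeding limit**3 has both
            # terms below limit, so the search is complete under this bound.
            if s <= bound:
                ways[s].append((a, b))
    n = min(s for s, pairs in ways.items() if len(pairs) >= 2)
    return n, ways[n]


print(smallest_two_way_cube_sum())
```

The search returns 1729 together with the two pairs (1, 12) and (9, 10).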

But through all of this, Ramanujan’s mathematical reputation continued to grow. He was elected a Fellow of the Royal Society (with his supporters including Hobson and Baker, both of whom had failed to respond to his original letter)—and in October 1918 he was elected a fellow of Trinity College, assuring him financial support. A month later World War I was over—and the threat of U-boat attacks, which had made travel to India dangerous, was gone.

And so on March 13, 1919, Ramanujan returned to India—now very famous and respected, but also very ill. Through it all, he continued to do mathematics, writing a notable letter to Hardy about “mock” theta functions on January 12, 1920. He chose to live humbly, and largely ignored what little medicine could do for him. And on April 26, 1920, at the age of 32, and three days after the last entry in his notebook, he died.

From when he first started doing mathematics research, Ramanujan had recorded his results in a series of hardcover notebooks—publishing only a very small fraction of them. When Ramanujan died, Hardy began to organize an effort to study and publish all 3000 or so results in Ramanujan’s notebooks. Several people were involved in the 1920s and 1930s, and quite a few publications were generated. But through various misadventures the project was not completed—to be taken up again only in the 1970s.

In 1940, Hardy gave all the letters he had from Ramanujan to the Cambridge University Library, but the original cover letter for what Ramanujan sent in 1913 was not among them—so now the only record we have of that is the transcription Hardy later published. Ramanujan’s three main notebooks sat for many years on top of a cabinet in the librarian’s office at the University of Madras, where they suffered damage from insects, but were never lost. His other mathematical documents passed through several hands, and some of them wound up in the incredibly messy office of a Cambridge mathematician—but when he died in 1965 they were noticed and sent to a library, where they languished until they were “rediscovered” with great excitement as Ramanujan’s lost notebook in 1976.

When Ramanujan died, it took only days for his various relatives to start asking for financial support. There were large medical bills from England, and there was talk of selling Ramanujan’s papers to raise money.

Ramanujan’s wife was 21 when he died, but as was the custom, she never remarried. She lived very modestly, making her living mostly from tailoring. In 1950 she adopted the son of a friend of hers who had died. By the 1960s, Ramanujan was becoming something of a general Indian hero, and she started receiving various honors and pensions. Over the years, quite a few mathematicians had come to visit her—and she had supplied them for example with the passport photo that has become the most famous picture of Ramanujan.

She lived a long life, dying in 1994 at the age of 95, having outlived Ramanujan by 73 years.

Hardy was 35 when Ramanujan’s letter arrived, and was 43 when Ramanujan died. Hardy viewed his “discovery” of Ramanujan as his greatest achievement, and described his association with Ramanujan as the “one romantic incident of [his] life”. After Ramanujan died, Hardy put some of his efforts into continuing to decode and develop Ramanujan’s results, but for the most part he returned to his previous mathematical trajectory. His collected works fill seven large volumes (while Ramanujan’s publications make up just one fairly slim volume). The word clouds of the titles of his papers show only a few changes from before he met Ramanujan to after:

Shortly before Ramanujan entered his life, Hardy had started to collaborate with John Littlewood, who he would later say was an even more important influence on his life than Ramanujan. After Ramanujan died, Hardy moved to what seemed like a better job in Oxford, and ended up staying there for 11 years before returning to Cambridge. His absence didn’t affect his collaboration with Littlewood, though—since they worked mostly by exchanging written messages, even when their rooms were only a few hundred feet apart. After 1911 Hardy rarely did mathematics without a collaborator; he worked especially with Littlewood, publishing 95 papers with him over the course of 38 years.

Hardy’s mathematics was always of the finest quality. He dreamed of doing something like solving the Riemann hypothesis—but in reality never did anything truly spectacular. He wrote two books, though, that continue to be read today: *An Introduction to the Theory of Numbers*, with E. M. Wright; and *Inequalities*, with Littlewood and G. Pólya.

Hardy lived his life in the stratum of the intellectual elite. In the 1920s he displayed a picture of Lenin in his apartment, and was briefly president of the “scientific workers” trade union. He always wrote elegantly, mostly about mathematics, and sometimes about Ramanujan. He eschewed gadgets and always lived along with students and other professors in his college. He never married, though near the end of his life his younger sister joined him in Cambridge (she also had never married, and had spent most of her life teaching at the girls’ school where she went as a child).

In 1940 Hardy wrote a small book called *A Mathematician’s Apology*. I remember when I was about 12 being given a copy of this book. I think many people viewed it as a kind of manifesto or advertisement for pure mathematics. But I must say it didn’t resonate with me at all. It felt to me at once sanctimonious and austere, and I wasn’t impressed by its attempt to describe the aesthetics and pleasures of mathematics, or by the pride with which its author said that “nothing I have ever done is of the slightest practical use” (actually, he co-invented the Hardy-Weinberg law used in genetics). I doubt I would have chosen the path of a pure mathematician anyway, but Hardy’s book helped make certain of it.

To be fair, however, Hardy wrote the book at a low point in his own life, when he was concerned about his health and the loss of his mathematical faculties. And perhaps that explains why he made a point of explaining that “mathematics… is a young man’s game”. (And in an article about Ramanujan, he wrote that “a mathematician is often comparatively old at 30, and his death may be less of a catastrophe than it seems.”) I don’t know if the sentiment had been expressed before—but by the 1970s it was taken as an established fact, extending to science as well as mathematics. Kids I knew would tell me I’d better get on with things, because it’d be all over by age 30.

Is that actually true? I don’t think so. It’s hard to get clear evidence, but as one example I took the data we have on notable mathematical theorems in Wolfram|Alpha and the Wolfram Language, and made a histogram of the ages of people who proved them. It’s not a completely uniform distribution (though the peak just before 40 is probably just a theorem-selection effect associated with Fields Medals), but particularly if one corrects for life expectancies now and in the past it’s a far cry from showing that mathematical productivity has all but dried up by age 30.

My own feeling—as someone who’s getting older myself—is that at least up to my age, many aspects of scientific and technical productivity actually steadily increase. For a start, it really helps to know more—and certainly a lot of my best ideas have come from making connections between things I’ve learned decades apart. It also helps to have more experience and intuition about how things will work out. And if one has earlier successes, those can help provide the confidence to move forward more definitively, without second guessing. Of course, one must maintain the constitution to focus with enough intensity—and be able to concentrate for long enough—to think through complex things. I think in some ways I’ve gotten slower over the years, and in some ways faster. I’m slower because I know more about mistakes I make, and try to do things carefully enough to avoid them. But I’m faster because I know more and can shortcut many more things. Of course, for me in particular, it also helps that over the years I’ve built all sorts of automation that I’ve been able to make use of.

A quite different point is that while making specific contributions to an existing area (as Hardy did) is something that can potentially be done by the young, creating a whole new structure tends to require the broader knowledge and experience that comes with age.

But back to Hardy. I suspect it was a lack of motivation rather than ability, but in his last years, he became quite dispirited and all but dropped mathematics. He died in 1947 at the age of 70.

Littlewood, who was a decade younger than Hardy, lived on until 1977. Littlewood was always a little more adventurous than Hardy, a little less austere, and a little less august. Like Hardy, he never married—though he did have a daughter (with the wife of the couple who shared his vacation home) whom he described as his “niece” until she was in her forties. And—giving the lie to Hardy’s claim about math being a young man’s game—Littlewood (helped by getting early antidepressant drugs at the age of 72) had remarkably productive years of mathematics in his 80s.

What became of Ramanujan’s mathematics? For many years, not too much. Hardy pursued it some, but the whole field of number theory—which was where the majority of Ramanujan’s work was concentrated—was out of fashion. Here’s a plot of the fraction of all math papers tagged as “number theory” as a function of time in the Zentralblatt database:

Ramanujan’s interest may have been to some extent driven by the peak in the early 1900s (which would probably go even higher with earlier data). But by the 1930s, the emphasis of mathematics had shifted away from what seemed like particular results in areas like number theory and calculus, towards the greater generality and formality that seemed to exist in more algebraic areas.

In the 1970s, though, number theory suddenly became more popular again, driven by advances in algebraic number theory. (Other subcategories showing substantial increases at that time include automorphic forms, elementary number theory and sequences.)

Back in the late 1970s, I had certainly heard of Ramanujan—though more in the context of his story than his mathematics. And I was pleased in 1982, when I was writing about the vacuum in quantum field theory, that I could use results of Ramanujan’s to give closed forms for particular cases (of infinite sums in various dimensions of modes of a quantum field—corresponding to Epstein zeta functions):

Starting in the 1970s, there was a big effort—still not entirely complete—to prove results Ramanujan had given in his notebooks. And there were increasing connections being found between the particular results he’d got, and general themes emerging in number theory.

A significant part of what Ramanujan did was to study so-called special functions—and to invent some new ones. Special functions—like the zeta function, elliptic functions, theta functions, and so on—can be thought of as defining convenient “packets” of mathematics. There are an infinite number of possible functions one can define, but what get called “special functions” are ones whose definitions survive because they turn out to be repeatedly useful.

And today, for example, in Mathematica and the Wolfram Language we have RamanujanTau, RamanujanTauL, RamanujanTauTheta and RamanujanTauZ as special functions. I don’t doubt that in the future we’ll have more Ramanujan-inspired functions. In the last year of his life, Ramanujan defined some particularly ambitious special functions that he called “mock theta functions”—and that are still in the process of being made concrete enough to routinely compute.

If one looks at the definition of Ramanujan’s tau function it seems quite bizarre (notice the “24”):
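In its standard form, the definition takes τ(n) to be the coefficient of qⁿ in the expansion of q·∏(1 − qᵏ)²⁴, with the product running over k = 1, 2, 3, …. A minimal Python sketch of that definition follows; it simply multiplies out the truncated product, and the function name is illustrative only (this is not how RamanujanTau is implemented in the Wolfram Language).

```python
def ramanujan_tau_list(n_max):
    """Return [tau(1), ..., tau(n_max)] by expanding q * prod_{k>=1} (1 - q^k)^24."""
    # coeffs[i] is the coefficient of q^i in prod (1 - q^k)^24, kept up to degree n_max - 1.
    coeffs = [0] * n_max
    coeffs[0] = 1
    for k in range(1, n_max):          # factors with k >= n_max cannot affect these degrees
        for _ in range(24):            # the "24" in the definition
            # Multiply by (1 - q^k) in place; going downward keeps coeffs[i - k] unmodified.
            for i in range(n_max - 1, k - 1, -1):
                coeffs[i] -= coeffs[i - k]
    # The extra factor of q shifts degrees by one, so coeffs[n - 1] is tau(n).
    return coeffs

print(ramanujan_tau_list(6))  # [1, -24, 252, -1472, 4830, -6048]
```

Even these first few values hint at hidden structure: τ(6) = τ(2)·τ(3), an instance of the multiplicative property that Ramanujan conjectured and Mordell later proved.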

And to my mind, the most remarkable thing about Ramanujan is that he could define something as seemingly arbitrary as this, and have it turn out to be useful a century later.

In antiquity, the Pythagoreans made much of the fact that 1+2+3+4=10. But to us today, this just seems like a random fact of mathematics, not of any particular significance. When I look at Ramanujan’s results, many of them also seem like random facts of mathematics. But the amazing thing that’s emerged over the past century, and particularly over the past few decades, is that they’re not. Instead, more and more of them are being found to be connected to deep, elegant mathematical principles.

To enunciate these principles in a direct and formal way requires layers of abstract mathematical concepts and language which have taken decades to develop. But somehow, through his experiments and intuition, Ramanujan managed to find concrete examples of these principles. Often his examples look quite arbitrary—full of seemingly random definitions and numbers. But perhaps it’s not surprising that that’s what it takes to express modern abstract principles in terms of the concrete mathematical constructs of the early twentieth century. It’s a bit like a poet trying to express deep general ideas—but being forced to use only the imperfect medium of human natural language.

It’s turned out to be very challenging to prove many of Ramanujan’s results. And part of the reason seems to be that to do so—and to create the kind of narrative needed for a good proof—one actually has no choice but to build up much more abstract and conceptually complex structures, often in many steps.

So how is it that Ramanujan managed in effect to predict all these deep principles of later mathematics? I think there are two basic logical possibilities. The first is that if one drills down from any sufficiently surprising result, say in number theory, one will eventually reach a deep principle in the effort to explain it. And the second possibility is that while Ramanujan did not have the wherewithal to express it directly, he had what amounts to an aesthetic sense of which seemingly random facts would turn out to fit together and have deeper significance.

I’m not sure which of these possibilities is correct, and perhaps it’s a combination. But to understand this a little more, we should talk about the overall structure of mathematics. In a sense mathematics as it’s practiced is strangely perched between the trivial and the impossible. At an underlying level, mathematics is based on simple axioms. And it could be—as it is, say, for the specific case of Boolean algebra—that given the axioms there’s a straightforward procedure to figure out whether any particular result is true. But ever since Gödel’s theorem in 1931 (which Hardy must have been aware of, but apparently never commented on) it’s been known that for an area like number theory the situation is quite different: there are statements one can give within the context of the theory whose truth or falsity is undecidable from the axioms.

It was proved by 1970 that there are polynomial equations involving integers where it’s undecidable from the axioms of arithmetic—or in effect from the formal methods of number theory—whether or not the equations have solutions. The particular examples of classes of equations where it’s known that this happens are extremely complex. But from my investigations in the computational universe, I’ve long suspected that there are vastly simpler equations where it happens too. Over the past several decades, I’ve had the opportunity to poll some of the world’s leading number theorists on where they think the boundary of undecidability lies. Opinions differ, but it’s certainly within the realm of possibility that for example cubic equations with three variables could exhibit undecidability.

So the question then is, why should the truth of what seem like random facts of number theory even be decidable? In other words, it’s perfectly possible that Ramanujan could have stated a result that simply can’t be proved true or false from the axioms of arithmetic. Conceivably the Goldbach conjecture will turn out to be an example. And so could many of Ramanujan’s results.

Some of Ramanujan’s results have taken decades to prove—but the fact that they’re provable at all is already important information. For it suggests that in a sense they’re not just random facts; they’re actually facts that can somehow be connected by proofs back to the underlying axioms.

And I must say that to me this tends to support the idea that Ramanujan had intuition and aesthetic criteria that in some sense captured some of the deeper principles we now know, even if he couldn’t express them directly.

It’s pretty easy to start picking mathematical statements, say at random, and then getting empirical evidence for whether they’re true or not. Gödel’s theorem effectively implies that you’ll never know how far you’ll have to go to be certain of any particular result. Sometimes it won’t be far, but sometimes it may in a sense be arbitrarily far.

Ramanujan no doubt convinced himself of many of his results by what amount to empirical methods—and often it worked well. In the case of the counting of primes, however, as Hardy pointed out, things turn out to be more subtle, and results that might work up to very large numbers can eventually fail.
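A classic illustration of this subtlety compares the number of primes up to x, π(x), with the logarithmic integral li(x). In every range one can feasibly tabulate, li(x) is larger than π(x), yet Littlewood proved in 1914 that the difference changes sign infinitely often. The Python sketch below gathers only the "empirical" side of that story; the function names and the choice of test points are illustrative assumptions.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def prime_count(x):
    """pi(x): the number of primes <= x, via a simple sieve of Eratosthenes."""
    if x < 2:
        return 0
    is_prime = bytearray([1]) * (x + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(x) + 1):
        if is_prime[p]:
            # Cross out the multiples of p, starting from p*p.
            is_prime[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return sum(is_prime)

def li(x, terms=200):
    """Logarithmic integral li(x) for x > 1, summed from its series in u = log(x)."""
    u = math.log(x)
    total = EULER_GAMMA + math.log(u)
    term = 1.0
    for k in range(1, terms + 1):
        term *= u / k        # term is now u**k / k!
        total += term / k    # add u**k / (k * k!)
    return total

for x in (10, 100, 1000, 10**4, 10**5):
    print(x, prime_count(x), round(li(x), 2))
```

Every row printed has li(x) ahead of π(x), which is exactly the kind of evidence that looks conclusive and is not.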

So let’s say one looks at the space of possible mathematical statements, and picks statements that appear empirically at least to some level to be true. Now the next question: are these statements connected in any way?

Imagine one could find proofs of the statements that are true. These proofs effectively correspond to paths through a directed graph that starts with the axioms, and leads to the true results. One possibility is then that the graph is like a star—with every result being independently proved from the axioms. But another possibility is that there are many common “waypoints” in getting from the axioms to the results. And it’s these waypoints that in effect represent general principles.

If there’s a certain sparsity to true results, then it may be inevitable that many of them are connected through a small number of general principles. It might also be that there are results that aren’t connected in this way, but these results, perhaps just because of their lack of connections, aren’t considered “interesting”—and so are effectively dropped when one thinks about a particular subject.

I have to say that these considerations lead to an important question for me. I have spent many years studying what amounts to a generalization of mathematics: the behavior of arbitrary simple programs in the computational universe. And I’ve found that there’s a huge richness of complex behavior to be seen in such programs. But I have also found evidence—not least through my Principle of Computational Equivalence—that undecidability is rife there.

But now the question is, when one looks at all that rich and complex behavior, are there in effect Ramanujan-like facts to be found there? Ultimately there will be much that can’t readily be reasoned about in axiom systems like the ones in mathematics. But perhaps there are networks of facts that can be reasoned about—and that all connect to deeper principles of some kind.

We know from the ideas around the Principle of Computational Equivalence that there will always be pockets of “computational reducibility”: places where one will be able to identify abstract patterns and make abstract conclusions without running into undecidability. Repetitive behavior and nested behavior are two almost trivial examples. But now the question is whether among all the specific details of particular programs there are other general forms of organization to be found.

Of course, whereas repetition and nesting are seen in a great many systems, it could be that another form of organization would be seen only much more narrowly. But we don’t know. And as of now, we don’t really have much of a handle on finding out—at least until or unless there’s a Ramanujan-like figure not for traditional mathematics but for the computational universe.

Will there ever be another Ramanujan? I don’t know if it’s the legend of Ramanujan or just a natural feature of the way the world is set up, but for at least 30 years I’ve received a steady stream of letters that read a bit like the one Hardy got from Ramanujan back in 1913. Just a few months ago, for example, I received an email (from India, as it happens) with an image of a notebook listing various mathematical expressions that are numerically almost integers—very much like Ramanujan’s.

Are these numerical facts significant? I don’t know. Wolfram|Alpha can certainly generate lots of similar facts, but without Ramanujan-like insight, it’s hard to tell which, if any, are significant.
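Checking whether an expression is "almost an integer" is at least mechanical. As a hedged illustration, here is a Python sketch using the well-known example e^(π√163), often called Ramanujan's constant; the 50-digit working precision and the helper names are arbitrary choices, and the π routine follows the recipe in the Python `decimal` documentation.

```python
from decimal import Decimal, getcontext

def decimal_pi():
    """Compute pi to the current Decimal precision (recipe from the decimal docs)."""
    getcontext().prec += 2          # extra digits for intermediate steps
    three = Decimal(3)
    lasts, t, s, n, na, d, da = 0, three, 3, 1, 0, 0, 24
    while s != lasts:
        lasts = s
        n, na = n + na, na + 8
        d, da = d + da, da + 32
        t = (t * n) / d
        s += t
    getcontext().prec -= 2
    return +s                       # unary plus rounds to the restored precision

def nearest_integer_gap(x):
    """Return the integer nearest to x and the absolute distance to it."""
    nearest = x.to_integral_value()
    return nearest, abs(x - nearest)

getcontext().prec = 50
value = (decimal_pi() * Decimal(163).sqrt()).exp()
nearest, gap = nearest_integer_gap(value)
print(nearest, gap)  # the gap comes out a little under 1e-12
```

The computation says nothing about why the gap is so small; that explanation needs the theory of modular functions, which is the kind of bigger picture at issue here.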

Over the years I’ve received countless communications a bit like this one. Number theory is a common topic. So are relativity and gravitation theory. And particularly in recent years, AI and consciousness have been popular too. The nice thing about letters related to math is that there’s typically something immediately concrete in them: some specific formula, or fact, or theorem. In Hardy’s day it was hard to check such things; today it’s a lot easier. But—as in the case of the almost integer above—there’s then the question of whether what’s being said is somehow “interesting”, or whether it’s just a “random uninteresting fact”.

Needless to say, the definition of “interesting” isn’t an easy or objective one. And in fact the issues are very much the same as Hardy faced with Ramanujan’s letter. If one can see how what’s being presented fits into some bigger picture—some narrative—that one understands, then one can tell whether, at least within that framework, something is “interesting”. But if one doesn’t have the bigger picture—or if what’s being presented is just “too far out”—then one really has no way to tell if it should be considered interesting or not.

When I first started studying the behavior of simple programs, there really wasn’t a context for understanding what was going on in them. The pictures I got certainly seemed visually interesting. But it wasn’t clear what the bigger intellectual story was. And it took quite a few years before I’d accumulated enough empirical data to formulate hypotheses and develop principles that let one go back and see what was and wasn’t interesting about the behavior I’d observed.

I’ve put a few decades into developing a science of the computational universe. But it’s still young, and there is much left to discover—and it’s a highly accessible area, with no threshold of elaborate technical knowledge. And one consequence of this is that I frequently get letters that show remarkable behavior in some particular cellular automaton or other simple program. Often I recognize the general form of the behavior, because it relates to things I’ve seen before, but sometimes I don’t—and so I can’t be sure what will or won’t end up being interesting.

Back in Ramanujan’s day, mathematics was a younger field—not quite as easy to enter as the study of the computational universe, but much closer than modern academic mathematics. And there were plenty of “random facts” being published: a particular type of integral done for the first time, or a new class of equations that could be solved. Many years later we would collect as many of these as we could to build them into the algorithms and knowledgebase of Mathematica and the Wolfram Language. But at the time probably the most significant aspect of their publication was the proofs that were given: the stories that explained why the results were true. Because in these proofs, there was at least the potential that concepts were introduced that could be reused elsewhere, and build up part of the fabric of mathematics.

It would take us too far afield to discuss this at length here, but there is a kind of analog in the study of the computational universe: the methodology for computer experiments. Just as a proof can contain elements that define a general methodology for getting a mathematical result, so the particular methods of search, visualization or analysis can define something in computer experiments that is general and reusable, and can potentially give an indication of some underlying idea or principle.

And so, a bit like many of the mathematics journals of Ramanujan’s day, I’ve tried to provide a journal and a forum where specific results about the computational universe can be reported—though there is much more that could be done along these lines.

When a letter one receives contains definite mathematics, in mathematical notation, there is at least something concrete one can understand in it. But plenty of things can’t usefully be formulated in mathematical notation. And too often, unfortunately, letters are in plain English (or worse, for me, other languages) and it’s almost impossible for me to tell what they’re trying to say. But now there’s something much better that people increasingly do: formulate things in the Wolfram Language. And in that form, I’m always able to tell what someone is trying to say—although I still may not know if it’s significant or not.

Over the years, I’ve been introduced to many interesting people through letters they’ve sent. Often they’ll come to our Summer School, or publish something in one of our various channels. I have no story (yet) as dramatic as Hardy and Ramanujan. But it’s wonderful that it’s possible to connect with people in this way, particularly in their formative years. And I can’t forget that a long time ago, I was a 14-year-old who mailed papers about the research I’d done to physicists around the world…

Ramanujan did his calculations by hand—with chalk on slate, or later pencil on paper. Today with Mathematica and the Wolfram Language we have immensely more powerful tools with which to do experiments and make discoveries in mathematics (not to mention the computational universe in general).

It’s fun to imagine what Ramanujan would have done with these modern tools. I rather think he would have been quite an adventurer—going out into the mathematical universe and finding all sorts of strange and wonderful things, then using his intuition and aesthetic sense to see what fits together and what to study further.

Ramanujan unquestionably had remarkable skills. But I think the first step to following in his footsteps is just to be adventurous: not to stay in the comfort of well-established mathematical theories, but instead to go out into the wider mathematical universe and start finding—experimentally—what’s true.

It’s taken the better part of a century for many of Ramanujan’s discoveries to be fitted into a broader and more abstract context. But one of the great inspirations that Ramanujan gives us is that it’s possible with the right sense to make great progress even before the broader context has been understood. And I for one hope that many more people will take advantage of the tools we have today to follow Ramanujan’s lead and make great discoveries in experimental mathematics—whether they announce them in unexpected letters or not.

I normally spend my time trying to build the future. But I find history really interesting and informative, and I study it quite a lot. Usually it’s other people’s history. But the Computer History Museum asked me to talk today about my own history, and the history of technology I’ve built. So that’s what I’m going to do here.

This happens to be a really exciting time for me—because a bunch of things that I’ve been working on for more than 30 years are finally coming to fruition. And mostly that’s what I’ve been out in the Bay Area this week talking about.

The focus is the Wolfram Language, which is really a new kind of language—a knowledge-based language—in which as much knowledge as possible about computation and about the world is built in. And in which the language automates as much as possible so one can go as directly as possible from computational thinking to actual implementation.

And what I want to do here is to talk about how all this came to be, and how things like Mathematica and Wolfram|Alpha emerged along the way.

Inevitably a lot of what I’m going to talk about is really my story: basically the story of how I’ve spent most of my life so far building a big stack of technology and science. When I look back, some of what’s happened seems sort of inevitable and inexorable. And some I didn’t see coming.

But let me begin at the beginning. I was born in London, England, in 1959—so, yes, I’m outrageously old, at least by my current standards. My father ran a small company—doing international trading of textiles—for nearly 60 years, and also wrote a few “serious fiction” novels. My mother was a philosophy professor at Oxford. I actually happened to notice her textbook on philosophical logic in the Stanford bookstore last time I was there.

You know, I remember when I was maybe 5 or 6 being bored at some party with a bunch of adults, and somehow ending up talking at great length to some probably very distinguished Oxford philosopher—who I heard say at the end, “One day that child will be a philosopher—but it may take a while.” Well, they were right. It’s sort of funny how these things work out.

Here’s me back then:

I went to elementary school in Oxford—to a place called the Dragon School, that I guess happens to be probably the most famous elementary school in England. Wikipedia seems to think the most famous people now from my class there are myself and the actor Hugh Laurie.

Here’s one of my school reports, from when I was 7. Those are class ranks. So, yes, I did well in poetry and geography, but not in math. (And, yes, it’s England, so they taught “Bible study” in school, at least then.) But at least it said “He is full of spirit and determination; he should go far”…

But OK, that was 1967, and I was learning Latin and things—but what I really liked was the future. And the big future-oriented thing happening back then was the space program. And I was really interested in that, and started collecting all the information I could about every spacecraft launched—and putting together little books summarizing it. And I discovered that even from England one could write to NASA and get all this great stuff mailed to one for free.

Well, back then, there was supposed to be a Mars colony any day, and I started doing little designs for that, and for spacecraft and things.

And that got me interested in propulsion and ion drives and stuff like that—and by the time I was 11 what I was really interested in was physics.

And I discovered—having nothing to do with school—that if one just reads books one can learn stuff pretty quickly. I would pick areas of physics and try to organize knowledge about them. And when I was turning 12 I ended up spending the summer putting together all the facts I could accumulate about physics. And, yes, I suppose you could call some of these “visualizations”. And, yes, like so much else, it’s on the web now:

I found this again a few years ago—around the time Wolfram|Alpha came out—and I thought, “Oh my gosh, I’ve been doing the same thing all my life!” And then of course I started typing in numbers from when I was 11 or 12 to see if Wolfram|Alpha got them right. It did, of course:

Well, when I was 12, following British tradition I went to a so-called public school that’s actually a private school. I went to the most famous such school—Eton—which was founded about 50 years before Columbus came to America. And, oh so impressively, I even got the top scholarship among new kids in 1972.

Yes, everyone wore tailcoats all the time, and King’s Scholars, like me, wore gowns too—which provided excellent rain protection etc. I think I avoided these annual Harry-Potter-like pictures all but one time:

And back in those Latin-and-Greek-and-tailcoat days I had a sort of double life, because my real passion was doing physics.

The summer when I turned 13 I put together a summary of particle physics:

And I made the important meta-discovery that even if one was a kid, one could discover stuff. And I started just trying to answer questions about physics, either by finding answers in books, or by figuring them out myself. And by the time I was 15 I started publishing papers about physics. Yes, nobody asks how old you are when you mail a paper in to a physics journal.

But, OK, something important for me had happened back when I was 12 and first at Eton: I got to know my first computer. It’s an Elliott 903C. This is not the actual one I used, but it’s similar:

It had come to Eton through a teacher of mine named Norman Routledge, who had been a friend of Alan Turing’s. It had 8 kilowords of 18-bit ferrite core memory, and you usually programmed it with paper—or Mylar—tape, most often in a little 16-instruction assembler called SIR.

It often seemed like one of the most important skills was rewinding the tape as quickly as possible after it got dumped in a bin after going through the optical reader.

Anyway, I wanted to use the computer to do physics. When I was 12 I had gotten this book:

What’s on the cover is supposed to be a simulation of gas molecules showing increasing randomness and entropy. As it happens, years later I discovered this picture was actually kind of a fake. But back when I was 12, I really wanted to reproduce it—with the computer.

It wasn’t so easy. The molecule positions were supposed to be real numbers; one had to have an algorithm for collisions; and so on. And to make this fit on the Elliott 903 I ended up simplifying a lot—to what was actually a 2D cellular automaton.

Well, a decade after that, I made some big discoveries about cellular automata. But back then I was unlucky with my cellular automaton rule, and I ended up not discovering anything with it. And in the end my biggest achievement with the Elliott 903 was writing a punched tape loader for it.

You see, the big problem with the Mylar tape that one used for serious programs is that it would get statically electrically charged and pick up little confetti holes, so the bits would be read wrong. Well, for my loader, I came up with what I later found out were error-correcting codes—and I set it up so that if the checks failed, the tape would stop in the reader, and you could pull it back a couple of feet, and then re-read it, after shaking out the confetti.
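The stop-and-re-read idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 8-bit additive checksum, the block layout, and the names `checksum`, `write_block`, `block_ok` and `load_block` are all hypothetical, and a plain checksum only detects errors rather than correcting them. It is not the actual scheme used on the Elliott 903.

```python
def checksum(payload: bytes) -> int:
    """Hypothetical 8-bit additive checksum over one block of tape."""
    return sum(payload) % 256


def write_block(payload: bytes) -> bytes:
    """Append the checksum byte to the payload, as it would be punched on tape."""
    return payload + bytes([checksum(payload)])


def block_ok(record: bytes) -> bool:
    """Return True if the stored checksum matches the payload that was read."""
    payload, stored = record[:-1], record[-1]
    return checksum(payload) == stored


def load_block(reads, max_retries: int = 3) -> bytes:
    """Accept the first read that passes the check.

    `reads` is an iterable of successive reads of the same block, standing in
    for pulling the tape back and running it through the reader again.
    """
    for attempt, record in enumerate(reads, start=1):
        if block_ok(record):
            return record[:-1]
        if attempt >= max_retries:
            break
    raise IOError("block failed its check on every read")


good = write_block(b"\x01\x02\x03")
# Simulate a piece of confetti: one bit of the first byte is read wrong.
bad = bytes([good[0] ^ 0x10]) + good[1:]

# The first read is corrupted, the second (after shaking out the confetti) is clean.
recovered = load_block([bad, good])
```

The point of the design is that a failed check halts loading at a known place, so only the last stretch of tape has to be read again rather than the whole program.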

OK, so by the time I was 16 I had published some physics papers and was starting to be known in physics circles—and I left school, and went to work at a British government lab called the Rutherford Lab that did particle physics research.

Now you might remember from my age-7 school report that I didn’t do very well in math. Things got a bit better when I started to use a slide rule, and then in 1972 a calculator—of which I was a very early adopter. But I never liked doing school math, or math calculations in general. Well, in particle physics there’s a lot of math to be done—and so my dislike of it was a problem.

At the Rutherford Lab, two things helped. First, a lovely HP desktop computer with a plotter, on which I could do very nice interactive computation. And second, a mainframe for crunchier things, that I programmed in Fortran.

Well, after my time at the Rutherford Lab I went to college at Oxford. Within a very short time I’d decided this was a mistake—but in those days one didn’t actually have to go to lectures for classes—so I was able to just hide out and do physics research. And mostly I spent my time in a nice underground air-conditioned room in the Nuclear Physics building—that had terminals connected to a mainframe, and to the ARPANET.

And that was when—in 1976—I first started using computers to do symbolic math, and algebra and things. Feynman diagrams in particle physics involve lots and lots of algebra. And back in 1962, I think, three physicists had met at CERN and decided to try to use computers to do this. They had three different approaches. One wrote a system called ASHMEDAI in Fortran. One—influenced by John McCarthy at Stanford—wrote a system called Reduce in Lisp. And one wrote a system called SCHOONSCHIP in CDC 6000 series assembly language, with mnemonics in Dutch. Curiously, years later, one of these physicists won a Nobel Prize. It was Tini Veltman—the one who wrote SCHOONSCHIP in assembly language.

Anyway, back in 1976 very few people other than the creators of these systems used them. But I started using all of them. But my favorite was a quite different system, written in Lisp at MIT since the mid-1960s. It was a system called Macsyma. It ran on the Project MAC PDP-10 computer. And what was really important to me as a 17-year-old kid in England was that I could get to it on the ARPANET.

It was host 236. So I would type @O 236, and there I was in an interactive operating system. Someone had taken the login SW. So I became Swolf, and started to use Macsyma.

I spent the summer of 1977 at Argonne National Lab—where they actually trusted physicists to be right in the room with the mainframe.

Then in 1978 I went to Caltech as a graduate student. By that point, I think I was the world’s largest user of computer algebra. And it was so neat, because I could just compute all this stuff so easily. I used to have fun putting incredibly ornate formulas in my physics papers. Then I could see if anyone was reading the papers, because I’d get letters saying, “How did you derive line such-and-such from the one before?”

I got a reputation for being a great calculator. Which was of course 100% undeserved—because it wasn’t me, it was just the computer. Well, actually, to be fair, there was part that was me. You see, by being able to compute so many different examples, I had gotten a new kind of intuition. I was no good at computing integrals myself, but I could go back and forth with the computer, knowing from intuition what to try, and then doing experiments to see what worked.

I was writing lots of code for Macsyma, and building this whole tower. And sometime in 1979 I hit the edge. Something new was needed. (Notice, for example, the ominous “MACSYMA RELOAD” line in the diagram.)

Well, in November 1979, just after I turned 20, I put together some papers, called it a thesis, and got my PhD. And a couple of days later I was visiting CERN in Geneva—and thinking about my future in, I thought, physics. And the one thing I was sure about was that I needed something beyond Macsyma that would let me compute things. And that was when I decided I had to build a system for myself. And right then and there, I started designing the system, handwriting its specification.

At first it was going to be ALGY–The Algebraic Manipulator. But I quickly realized that I actually had to make it do much more than algebraic manipulation. I knew most of the general-purpose computer languages of the time—both the ALGOL-like ones, and ones like Lisp and APL. But somehow they didn’t seem to capture what I wanted the system to do.

So I guess I did what I’d learned in physics: I tried to drill down to find the atoms—the primitives—of what’s going on. I knew a certain amount about mathematical logic, and the history of attempts to formulate things using logic and so on—even if my mother’s textbook about philosophical logic didn’t exist yet.

The whole history of this effort at formalization—through Aristotle, Leibniz, Frege, Peano, Hilbert, Whitehead, Russell, and so on—is really interesting. But it’s a different talk. But back in 1979 it was thinking about this kind of thing that led me to the design I came up with, that was based on the idea of symbolic expressions, and doing transformations on them.

I named what I wanted to build SMP: a Symbolic Manipulation Program, and started recruiting people from around Caltech to help me with it. Richard Feynman came to a bunch of the meetings I had to discuss the design of SMP, offering various ideas—which I have to admit I considered hacky—about shortcuts for interacting with the system. Meanwhile, the physics department had just gotten a VAX 11/780, and after some wrangling, it was made to run Unix. And a young physics grad student named Rob Pike—more recently co-creator of the Go programming language—persuaded me that I should write the code for my system in the “language of the future”: C.

I got pretty good at writing C, for a while averaging about a thousand lines a day. And with the help of a somewhat colorful collection of characters, by June 1981, the first version of SMP existed—with a big book of documentation I’d written.

OK, you might ask: so can we see SMP? Well, back when we were working on SMP I had the bright idea that we should protect the source code by encrypting it. And—you guessed it—over a span of three decades nobody remembers the password. And until a little while ago, that was the situation.

In another bright idea, I had used a modified version of the Unix crypt program to do the encryption—thinking that would be more secure. Well, as part of the 25th anniversary of Mathematica a couple of years ago, we did a crowdsourced project to break the encryption—and we did it. Unfortunately it wasn’t easy to compile the code though—but thanks to a 15-year-old volunteer, we’ve actually now got something running.

So here it is: running inside a VAX virtual machine emulator, I can show you for the first time in public in 30 years—a running version of SMP.

SMP had a mixture of good ideas, and very bad ideas. One example of a bad idea—actually suggested to me by Tini Veltman, author of SCHOONSCHIP—was representing rationals using floating point, so one could make use of the faster floating-point instructions on many processors. But there were plenty of other bad ideas too, like having a garbage collector that had to crawl the stack and realign pointers when it ran.
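To see why floating point is a risky stand-in for rationals, here is a minimal Python sketch. It uses modern IEEE double precision and the standard `fractions` module, which are assumptions for illustration only. The function names are hypothetical, and none of this reflects how SMP stored numbers internally.

```python
from fractions import Fraction


def sum_tenths_float(n: int) -> float:
    """Add 1/10 to itself n times using binary floating point."""
    total = 0.0
    for _ in range(n):
        total += 0.1
    return total


def sum_tenths_exact(n: int) -> Fraction:
    """Add 1/10 to itself n times using exact rational arithmetic."""
    total = Fraction(0)
    for _ in range(n):
        total += Fraction(1, 10)
    return total


float_total = sum_tenths_float(10)
exact_total = sum_tenths_exact(10)

print(float_total == 1.0)  # False: 1/10 has no exact binary representation
print(exact_total == 1)    # True: the rational sum is exactly 1
```

In a symbolic system this matters more than it would in a numerical one, because a result that should simplify to exactly 1 has to compare equal to 1.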

There were some interesting ideas. Like what I called “projections”—which were essentially a unification of functions and lists. They were almost wonderful, but there were confusions about currying—or what I called tiering. And there were weird edge cases about things that were almost vectors with sequential integer indices.

But all in all, SMP worked pretty well, and I certainly found it very useful. So now the next problem was what to do with it. I realized it needed a real team to work on it, and I thought the best way to get that was somehow to make it commercial. But at the time I was a 21-year-old physics-professor type, who didn’t know anything about business.

So I thought, let me go to the tech transfer office at the university, and ask them what to do. But it turned out they didn’t know, because, as they explained, “Mostly professors don’t come to us; they just start their own companies.” “Well,” I said, “can I do that?” And right then and there the lawyer who pretty much was the tech transfer office pulled out the faculty handbook, and looked through it, and said, “Well, yes, it says copyrightable materials are owned by their authors, and software is copyrightable, so, yes, you can do whatever you want.”

And so off I went to try to start a company. Though it turned out not to be so simple—because suddenly the university decided that actually I couldn’t just do what I wanted.

A couple of years ago I was visiting Caltech and I ran into the 95-year-old chap who had been the provost at the time—and he finally filled in for me the remaining details of what he called the “Wolfram Affair”. It was more bizarre than one could possibly imagine. I won’t tell it all here. But suffice it to say that the story starts with Arnold Beckman, Caltech postdoc in 1929, claiming rights to the pH meter, and starting Beckman Instruments—and then in 1980 being chairman of the Caltech board of trustees and being upset when he realized that gene-sequencing technology had been invented at Caltech and had “walked off campus” to turn into Applied Biosystems.

But the company I started weathered this storm—even if I ended up quitting Caltech, and Caltech ended up with a weird software-ownership policy that affected their computer-science recruiting efforts for a long time.

I didn’t do a great job starting what I called Computer Mathematics Corporation. I brought in a person—who happened to be twice my age—to be CEO. And rather quickly things started to diverge from what I thought made sense.

One of my favorite moments of insanity was the idea to get into the hardware business and build a workstation to run SMP on. Well, at the time no workstation had enough memory, and the 68000 didn’t handle virtual memory. So a scheme was concocted whereby two 68000s would run an instruction out of step, and if the first one saw a page fault, it would stop the other one and fetch the data. I thought it was nuts. And I also happened to have visited Stanford, and run into a grad student named Andy Bechtolsheim who was showing off a Stanford University Network—SUN—workstation with a cardboard box as a case.

But worse than all that, this was 1981, and there was the idea that AI—in the form of expert systems—was hot. So the company merged with another company that did expert systems, to form what was called Inference Corporation (which eventually became Nasdaq:INFR). SMP was the cash cow—selling for about $40,000 a copy to industrial and government research labs. But the venture capitalists who’d come in were convinced that the future was expert systems, and after not very long, I left.

Meanwhile I’d become a big expert on the intellectual property policies of universities—and eventually went to work at the Institute for Advanced Study in Princeton, where the director very charmingly said that since they’d “given away the computer” after von Neumann died, it didn’t make much sense for them to claim IP rights to anything now.

I dived into basic science, working a lot on cellular automata, and discovering some things I thought were very interesting. Here’s me with my SUN workstation with cellular automata running on it (and, yes, the mollusc looks like the cellular automaton):

I did some consulting work, mostly on technology strategy, which was very educational, particularly in seeing things not to do. I did quite a lot of work for Thinking Machines Corporation. I think my most important contribution was going to see the movie *WarGames* with Danny Hillis—and as we were walking out of the movie theater, saying to Danny, “Maybe your computer should have flashing lights too.” (The flashing lights ended up being a big feature of the Connection Machine computer—certainly important in its afterlife in museums.)

I was mostly working on basic science—but “because it would be easy” I decided to do a software project of building a C interpreter that we called IXIS. I hired some young people—one of whom was Tsutomu Shimomura, whom I’d already fished out of several hacking disasters. I made the horrible mistake of writing the boring code nobody else wanted to write myself—so I wrote a (quite lovely) text editor, but the whole project flopped.

I had all kinds of interactions with the computer industry back then. I remember Nathan Myhrvold, then a physics grad student at Princeton, coming to see me to ask what to do with a window system he’d developed. My basic suggestion was “sell it to Microsoft”. As it happens, Nathan later became CTO of Microsoft.

Well, by about 1985 I’d done a bunch of basic science I was pretty pleased with, and I was trying to use it to start the field of what I called complex systems research. I ended up getting a little involved in an outfit called the Rio Grande Institute—that later became the Santa Fe Institute—and encouraging them to pursue this kind of research. But I wasn’t convinced about their chances, and I resolved to start my own research institute.

So I went around to lots of different universities, in effect to get bids. The University of Illinois won, ironically in part because they thought it would help their chances getting funding from the Beckman Foundation—which in fact it did. So in August 1986, off I went to the University of Illinois, and the cornfields of Champaign-Urbana, 100 miles south of Chicago.

I think I did pretty well at recruiting faculty and setting things up for the new Center for Complex Systems Research—and the university lived up to its end of the bargain too. But within a few weeks I started to think it was all a big mistake. I was spending all my time managing things and trying to raise money—and not actually doing science.

So I quickly came up with Plan B. Rather than getting other people to help with the science I wanted to do, I would set things up so I could just do the science myself, as efficiently as possible. And this meant two things: first, I had to have the best possible tools; and second, I needed the best possible environment for myself.

When I was doing my basic science I kept on using different tools. There was some SMP. Quite a lot of C. Some PostScript, and graphics libraries, and things. And a lot of my time was spent gluing all this stuff together. And what I decided was that I should try to build a single system that would just do all the stuff I wanted to do—and that I could expect to keep growing forever.

Well, meanwhile, personal computers were just getting to the point where it was plausible to build a system like this that would run on them. And I knew a lot about what to do—and not do—from my experience with SMP. So I started designing and building what became Mathematica.

My scheme was to write documentation to define what to build. I wrote a bunch of core code—for example for the pattern matcher—a surprising amount of which is still in the system all these years later. The design of Mathematica was in many respects less radical and less extreme than SMP. SMP had insisted on using the idea of transforming symbolic expressions for everything—but in Mathematica I saw my goal as being to design a language that would effectively capture all the possible different paradigms for thinking about programming in a nice seamless way.

At first, of course, Mathematica wasn’t called Mathematica. In a strange piece of later fate, it was actually called Omega. It went through other names. There was Polymath. And Technique. Here’s a list of names. It’s kind of shocking to me how many of these—even the really horrible ones—have actually been used for products in the years since.

Well, meanwhile, I was starting to investigate how to build a company around the system. My original model was something like what Adobe was doing at the time with PostScript: we build core IP, then license it to hardware companies to bundle. And as it happened, the first person to show interest in that was Steve Jobs, who was then in the middle of doing NeXT.

Well, one of the consequences of interacting with Steve was that we talked about the name of the product. With all that Latin I’d learned in school, I’d thought about the name “Mathematica” but I thought it was too long and ponderous. Steve insisted that “that’s the name”—and had a whole theory about taking generic words and romanticizing them. And eventually he convinced me.

It took about 18 months to build Version 1 of Mathematica. I was still officially a professor of physics, math and computer science at the University of Illinois. But apart from that I was spending every waking hour building software and later making deals.

We closed a deal with Steve Jobs at NeXT to bundle Mathematica on the NeXT computer:

We also made a bunch of other deals. With Sun, through Andy Bechtolsheim and Bill Joy. With Silicon Graphics, through Forest Baskett. With Ardent, through Gordon Bell and Cleve Moler. With the AIX/RT part of IBM, basically through Andy Heller and Vicky Markstein.

And eventually we set a release date: June 23, 1988.

Meanwhile, as documentation for the system, I wrote a book called *Mathematica: A System for Doing Mathematics by Computer*. It was going to be published by Addison-Wesley, and it was the longest lead-time element of the release. And it ended up being very tight, because the book was full of fancy PostScript graphics—which nobody could apparently figure out how to render at high-enough resolution. So eventually I just took a hard disk to a friend of mine in Canada who had a phototypesetting company, and he and I babysat his phototypesetting machine over a holiday weekend, after which I flew to Logan Airport in Boston and handed the finished film for the book to a production person from Addison-Wesley.

We decided to do the announcement of Mathematica in Silicon Valley, and specifically at the TechMart place in Santa Clara. In those days, Mathematica couldn’t run under MS-DOS because of the 640K memory limit. So the only consumer version was for the Mac. And the day before the announcement there we were stuffing disks into boxes, and delivering them to the ComputerWare software store in Palo Alto.

The announcement was a nice affair. Steve Jobs came—even though he was not really “out in public” at the time. Larry Tesler came from Apple—courageously doing a demo himself. John Gage from Sun had the sense to get all the speakers to sign a book:

And so that was how Mathematica was launched. *The Mathematica Book* became a bestseller in bookstores, and from that people started understanding how to use Mathematica. It was really neat seeing all these science types and so on—of all ages—who’d basically never used computers themselves before, starting to just compute things themselves.

It was fun looking through registration cards. Lots of interesting and famous names. Sometimes some nice juxtapositions. Like when I’d just seen an article about Roger Penrose and his new book in *Time* magazine with the headline “Those Computers Are Dummies”… but then there was Roger’s registration card for Mathematica.

As part of the growth of Mathematica, we ended up interacting with pretty much all possible computer companies, and collected all kinds of exotic machines. Sometimes that came in handy, like when the Morris worm came through the internet, and our gateway machine was a weird Sony workstation with a Japanese OS that the worm hadn’t been built for.

There were all kinds of porting adventures. Probably my favorite was on the Cray-2. With great effort we’d gotten Mathematica compiled. And there we were, ready for the first calculation. And someone typed 2+2. And—I kid you not—it came out “5”. I think it was an issue with integer vs. floating point representation.

You know, here’s a price list from 1990 that’s a bit of a stroll down computer memory lane:

We got a boost when the NeXT computer came out, with Mathematica bundled on it. I think Steve Jobs made a good deal there, because all kinds of people got NeXT machines to run Mathematica. Like the Theory group at CERN—where the systems administrator was Tim Berners-Lee, who decided to do a little networking experiment on those machines.

Well, a couple of years in, the company was growing nicely—we had maybe 150 employees. And I thought to myself: I built this because I wanted to have a way to do my science, so isn’t it time I started doing that? Also, to be fair, I was injecting new ideas at too high a rate; I was worried the company might just fly apart. But anyway, I decided I would take a partial sabbatical—for maybe six months or a year—to do basic science and write a book about it.

So I moved from Illinois to the Oakland Hills—right before the big fire there, which narrowly missed our house. And I started being a remote CEO—using Mathematica to do science. Well, the good news was that I started discovering lots and lots of science. It was kind of a “turn a telescope to the sky for the first time” moment—except now it was the computational universe of possible programs.

It was really great. But I just couldn’t stop—because there kept on being more and more things to discover. And all in all I kept on doing it for ten and a half years. I was really a hermit, mostly living in Chicago, and mostly interacting only virtually… although my oldest three children were born during that period, so there were humans around!

I had thought maybe there’d be a coup at the company. But there wasn’t. And the company continued to steadily grow. We kept on doing new things.

Here’s our first website, from October 7, 1994:

And it wasn’t too long after that we started doing computation on the web:

I actually took a break from my science in 1996 to finish a big new version of Mathematica. Back in 1988 lots of people used Mathematica through a command line interface. In fact, it’s still there today. 1989^1989 is the basic computation I’ve been using since, yes, 1989, to test speed on a new machine. And actually a basic Raspberry Pi today gives a pretty good sense of what it was like back at the beginning.
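For a rough feel of that kind of speed test, here is a small Python sketch. Python is only a stand-in here (in Mathematica the input is simply `1989^1989`), and the timing approach and variable names are assumptions, so the numbers are not comparable to timings from an actual Mathematica kernel.

```python
import time

start = time.perf_counter()
value = 1989 ** 1989  # exact big-integer power, several thousand digits long
elapsed = time.perf_counter() - start

# Report the size in bits rather than printing the digits, because recent
# Python versions limit int-to-str conversion of very large integers by default.
print(f"{value.bit_length()} bits computed in {elapsed:.6f} s")
```

On current hardware this finishes almost instantly, which is part of why the same input makes for a nostalgic benchmark on something like a Raspberry Pi.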

But, OK, on the Mac and on NeXT back in 1988 we’d invented these things we called notebooks that were documents that mixed text and graphics and structure and computation—and that was the UI. It was all very modern, with a clean front-end/kernel architecture where it was easy to run the kernel on a remote machine—and by 1996 a complete symbolic XML-like representation of the structure of the notebooks.

Maybe I should say something about the software engineering of Mathematica. The core code was written in an extension of C—actually an object-oriented version of C that we had to develop ourselves, because C++ wasn’t efficient enough back in 1988. Even from the beginning, some code was written in the Mathematica top-level language—that’s now the Wolfram Language—and over the years a larger and larger fraction of the code was that way.

Well, back at the beginning it was very challenging getting the front end to run on different machines. And we wound up with different codebases on Mac, NeXT, Microsoft Windows, and X Windows. And in 1996 one of the achievements was merging all that together. And for almost 20 years the code was gloriously merged—but now we’ve again got separate codebases for desktop, browser and mobile, and history is repeating itself.

Back in 1996 we had all kinds of ways to get the word out about the new Mathematica Version 3. My original Mathematica book had now become quite large, to accommodate all the things we were adding.

And we had a couple of other “promotional vehicles” that we called the MathMobiles that drove around with the latest gear inside—and served as moving billboard ads for our graphics.

There were Mathematicas everywhere, getting used for all kinds of things. And of course wild things sometimes happened. Like in 1997 when Mike Foale had a PC running Mathematica on the Mir space station. Well, there was an accident, and the PC got stuck in a part of the space station that got depressurized. Meanwhile, the space station was tumbling, and Mike was trying to debug it—and wanted to use Mathematica to do it. So he got a new copy on the next supply mission—and installed it on a Russian PC.

But there was a problem. Because our DRM system immediately said, “That’s a Russian PC; you can’t run a US-licensed Mathematica there!” And that led to what might be our all-time most exotic customer service call: “The user is in a tumbling space station.” But fortunately we could just issue a different password—Mike solved the equations, and the space station was stabilized.

Well, after more than a decade—in 2002—I finally finished my science project and my big book:

During my “science decade” the company had been steadily growing, and we’d built up a terrific team. But not least because of things I’d learned from my science, I thought it could do more. It was refreshing coming back to focus on it again. And I rather quickly realized that the structure we’d built could be applied to lots of new things.

Math had been the first big application of Mathematica, but the symbolic language I’d built was much more general than that. And it was pretty exciting seeing what we could do with it. One of the things in 2006 was representing user interfaces symbolically, and being able to create them computationally. And that led for example to CDF (our Computable Document Format), and things like our Wolfram Demonstrations Project.

We started doing all sorts of experiments. Many went really well. Some went a bit off track. We wanted to make a poster with all the facts we knew about mathematical functions. First it was going to be a small poster, but then it became 36 feet of poster… and eventually The Wolfram Functions Site, with 300,000+ formulas:

It was the time of the cell-phone ringtone craze, and I wanted a personal ringtone. So we came up with a way to use cellular automata to compose an infinite variety of ringtones, and we put it on the web. It was actually an interesting AI-creativity experience, and music people liked it. But after messing around with phone carriers for six months, we pretty much didn’t sell a single ringtone.

(Yes, if you go to that site, it’s currently a bit embarrassing, because it’s not working with the current browser releases. It’ll be fixed soon… but what happened is that the webMathematica server behind it was just running unattended for a decade—and now nobody knows how it works…)

But, anyway, having for many years been a one-product company making Mathematica, we were starting to get the idea that we could not only add new things to Mathematica—but also invent all kinds of other stuff.

Well, I mentioned that back when I was a kid I was really interested in trying to do what I’d now call “making knowledge computable”: take the knowledge of our civilization and build something that could automatically compute answers to questions from it. For a long time I’d assumed that to do that would require making some kind of brain-like AI. So, like, in 1980 I worked on neural networks—and didn’t get them to do anything interesting. And every few years after that I would think some more about the computable knowledge problem.

But then I did the science in *A New Kind of Science*—and I discovered this thing I call the Principle of Computational Equivalence. Which says many things. But one of them is that there can’t be a bright line between the “intelligent” and the “merely computational”. So that made me start to think that maybe I didn’t need to build a brain to solve the computable knowledge problem.

Meanwhile, my younger son, who I think was about six at the time, was starting to use Mathematica a bit. And he asked me, “Why can’t I just tell it what I want to in plain English?” I started explaining how hard that was. But he persisted with, “Well, there just aren’t that many different ways to say any particular thing,” etc. And that got me thinking—particularly about using the science I’d built to try to solve the problem of understanding natural language.

Meanwhile, I’d started a project to curate lots of data of all kinds. It was an interesting thing going into a big reference library and figuring out what it would take to just make all of that computable. Alan Turing had done some estimates of things like that, which were a bit daunting. But anyway, I started getting all kinds of experts on all kinds of topics that tech companies usually don’t care about. And I started building technology and a management system for making data computable.

It was not at all clear this was all going to work, and even a lot of my management team was skeptical. “Another WolframTones” was a common characterization. But the good news was that our main business was strong. And—even though I’d considered it in the early 1990s—I’d never taken the company public, and I didn’t have any investors at all, except I guess myself. So I wasn’t really answering to anyone. And so I could just do Wolfram|Alpha—as I have been able to do all kinds of long-term stuff throughout the history of our company.

And despite the concerns, Wolfram|Alpha did work. And I have to say that when it was finally ready to demo, it took only one meeting for my management team to completely come around, and be enthusiastic about it.

One problem, of course, with Wolfram|Alpha is that—like Mathematica and the Wolfram Language—it’s really an infinite project. But there came a point at which we really couldn’t do much more development without seeing what would happen with real users, asking real questions, in real natural language.

So we picked May 15, 2009 as the date to go live. But there was a problem: we had no idea how high the traffic would spike. And back then we couldn’t use Amazon or anything: to get performance we had to do fancy parallel computations right on the bare metal.

Michael Dell was kind enough to give us a good deal on getting lots of computers for our colos. But I was pretty concerned when I talked to some people who’d had services that had crashed horribly on launch. So I decided on a kind of hack. I decided that we’d launch on live internet TV—so if something horrible happened, at least people would know what was going on, and might have some fun with it. So I contacted Justin Kan, who was then doing justin.tv, and whose first company I’d failed to invest in at the very first Y Combinator—and we arranged to “launch live”.

It was fun building our “mission control”—and we made some very nice dashboards, many of which we actually still use today. But the day of the launch I was concerned that this was going to be the most boring TV ever: that basically at the appointed hour, I’d just click a mouse and we’d be live, and that’d be the end of it.

Well, that was not to be. You know, I’ve never watched the broadcast. I don’t know how much it captures of some of the horrible things that went wrong—particularly with last-minute network configuration issues.

But perhaps the most memorable thing had to do with the weather. We were in central Illinois. And about an hour before our grand launch, there was a weather report—that a tornado was heading straight for us! You can see the wind speed spike in the Wolfram|Alpha historical weather data:

Well, fortunately, the tornado missed. And sure enough, at 9:33:50pm central time on May 15, 2009, I pressed the button, and Wolfram|Alpha went live. Lots of people started using it. Some people even understood that it wasn’t a search engine: it was computing things.

The early bug reports then started flowing in. This was the thing Wolfram|Alpha used to do at the very beginning, when something failed:

And one of the bug reports was someone saying, “How did you know my name was Dave?!” All kinds of bug reports came in the first night—here are a couple:

Well, not only did people start using Wolfram|Alpha; companies did too. Through Bill Gates, Microsoft hooked up Wolfram|Alpha to Bing. And a little company called Siri hooked it up to its app. And some time later Apple bought Siri, and through Steve Jobs, who was by then very sick, Wolfram|Alpha ended up powering the knowledge part of Siri.

OK, so we’re getting to modern times. And the big thing now is the Wolfram Language. Actually, it’s not such a modern thing for us. Back in the early 1990s I was going to break off the language component of Mathematica—we were thinking of calling it the M Language. And we even had people working on it, like Sergey Brin when he was an intern with us in 1993. But we hadn’t quite figured out how to distribute it, or what it should be called.

And in the end, the idea languished. Until we had Wolfram|Alpha, and the cloud existed, and so on. And also I must admit that I was really getting fed up with people thinking of Mathematica as being a “math thing”. It had been growing and growing:

And although we kept on strengthening the math, 90% of it wasn’t math at all. We had kind of a “let’s just implement everything” approach. And that had gone really well. We were really on a roll inventing all those meta-algorithms, and automating things. And combined with Wolfram|Alpha I realized that what we had was a new, very general kind of thing: a knowledge-based language that built in as much knowledge about computation and about the world as possible.

And there was another piece too: realizing that our symbolic programming paradigm could be used to represent not just computation, but also deployment, particularly in the cloud.

Mathematica has been very widely used in R&D and in education—but with notable exceptions, like in the finance industry, it’s not been so widely used for deployed production systems. And one of the ideas of the Wolfram Language—and our cloud—is to change that, and to really make knowledge-based programming something that can be deployed everywhere, from supercomputers to embedded devices. There’s a huge amount to say about all this…

And we’ve done lots of other things too. This shows function growth over the first 10,000 days of Mathematica, what kinds of things were in it over the years.

We’ve done all kinds of different things with our technology. I don’t know why I have this picture here, but I have to show it anyway; this was a picture on the commemorative T-shirt for our Image Identification Project that we did a year ago. Maybe you can figure out what the caption on this means with respect to debugging the image identifier: it was an anteater in the image identifier because we lost the aardvark, who is pictured here:

And just in the last few weeks, we’ve opened up our Wolfram Open Cloud to let anyone use the Wolfram Language on the web. It’s really the culmination of 30, perhaps 40, years of work.

You know, for nearly 30 years I’ve been working hard to make sure the Wolfram Language is well designed—that as it gets bigger and bigger all the pieces fit nicely together, so you can build on them as well as possible. And I have to say it’s nice to see how well this has paid off now.

It’s pretty cool. We’ve got a very different kind of language—something that’s useful for communicating not just about computation, but about the world, with computers and with humans. You can write tiny programs. There’s Tweet-a-Program for example:

Or you can write big programs—like Wolfram|Alpha, which is 15 million lines of Wolfram Language code.

It’s pretty nice to see companies in all sorts of industries starting to base their technology on the Wolfram Language. And another thing I’m really excited about right now is that with the Wolfram Language I think we finally have a great way to teach computational thinking to kids. I even wrote a book about that recently:

And I can’t help wondering what would have happened if the 12-year-old me had had this—and if my first computer language had been the Wolfram Language rather than the machine code of the Elliott 903. I could certainly have made some of my favorite science discoveries with one-liners. And a lot of my questions about things like AI would already have been answered.

But actually I’m pretty happy to have been living at the time in history I have, and to have been able to be part of these decades in the evolution of the incredibly important idea of computation—and to have had the privilege of being able to discover and invent a few things relevant to it along the way.

The equation that Albert Einstein wrote down for the gravitational field in 1915 is simple enough:
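
In one common textbook form (an assumption about notation on my part, and leaving out the cosmological constant term), the equation reads:

```latex
R_{\mu\nu} - \tfrac{1}{2} R \, g_{\mu\nu} = \frac{8 \pi G}{c^{4}} \, T_{\mu\nu}
```

Here $g_{\mu\nu}$ is the metric that plays the role of the gravitational field, $R_{\mu\nu}$ and $R$ are curvature quantities built from its derivatives, and $T_{\mu\nu}$ describes the energy and momentum of matter.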

But working out its consequences is not. And in fact even after 100 years we’re still just at the beginning of the process.

Millions of lines of algebra have been done along the way (often courtesy of Mathematica and the Wolfram Language). And there have been all sorts of predictions. Like that if two black holes merge, there should be a burst of gravitational radiation generated, with a particular form. And a little more than a week ago—in a triumph of theoretical and experimental science—it was announced that just such gravitational radiation had been detected.

I’ve followed General Relativity and gravitation theory for more than 40 years now—and it’s been inspiring to see how the small community that’s pursued it has progressively increased its theoretical prowess, and how the discussions I saw at Caltech in the late 1970s finally led to a successful detector of gravitational waves.

General Relativity is surely not the whole story of how spacetime and gravity work. But we’ve now just got some spectacular new evidence of how far the theory can be taken. For a long time I myself was a bit skeptical about black holes—and for example about whether true General-Relativity-style ones would actually form in real physical processes. But as of a little more than a week ago I’m finally convinced that black holes exist, just as General Relativity suggests.

OK, so we’ve observed one pair of black holes, a billion light years away. And no doubt now—quite amazingly—we’ll get evidence for a steady stream of others around the universe. But what if somehow we could get our hands on our very own black holes, and maybe even lots of them? What could we—or, for that matter, any putative extraterrestrials—do with them? What kind of perhaps extremely exotic structures or technology could eventually be made with them?

It’s always the same story with technology. We have to take the raw material that our universe provides, and somehow find ways to organize it for purposes we want. It’s remarkable to look through the list of chemical elements, or a list of physics effects that have been discovered, and to realize that—though it sometimes took a while—almost all those that can be readily realized on the time and energy scales of today’s technology have found real applications. So what about black holes? Given how hard it’s been to detect our very first pair of black holes, it might seem almost irreverent to ask. And perhaps our universe just isn’t big enough for the question to be sensible. But as a kind of celebration of the detection of gravitational waves I thought it might be fun to try fast-forwarding a long way—and seeing what one can figure out about technology that black holes could make possible.

It seems inconceivable that we ourselves will ever get to try out anything like this for real—unless we find a way to locally make tiny stable black holes. But if something is possible to do, perhaps some more-advanced civilization out there in the universe has already done it—but we likely couldn’t recognize evidence of it without having more idea of what’s possible.

But before we can get to speculating about black hole technology, we’re going to have to talk a bit about what’s known about black holes, General Relativity and gravitation. There are lots of complicated issues—that are probably most easily explained using some fairly mathematically sophisticated concepts (Riemann tensors, covariant derivatives, spacelike hypersurfaces, Penrose diagrams, etc. etc.). But for the sake of writing a general blog post, I’m going to try to do without these, while still, I hope, correctly communicating what’s known and what’s not. I won’t be able to do it perfectly, and might lapse unwittingly into physics-speak from time to time, but here goes…

General Relativity is often discussed in terms of the geometry of spacetime. But one can also think of it as just saying that gravity is associated with a field that has a certain strength or value at every point. This idea of a field is basically just like in electromagnetism, with its electric and magnetic fields. It’s also like in fluid mechanics, where there’s a velocity field that gives the velocity of the fluid at every point (like a wind velocity map for the weather).

What Einstein did in 1915 was to suggest particular equations that should be satisfied by the gravitational field. Mathematically, they’re partial differential equations, which means that they say how values of the field relate to rates of change (partial derivatives) of these values. They’re the same general kind of equations that we know work for electromagnetic fields, or for the velocity field in a fluid.

So what does one do with these equations? Well, one solves them to find out what the field is in any particular case. It turns out that for electromagnetism, the structure of the equations makes this in principle straightforward. But for fluid mechanics, it’s considerably more complicated—and for Einstein’s equations it’s much more complicated still.

In electromagnetism, one can just think of charges and currents as being sources of electromagnetic field, and there’s no “internal effect” of the field on itself (unless one considers quantum effects). But for fluid mechanics and Einstein’s equations, it’s a different story. In a first approximation, the velocity of a fluid is determined by whatever pressure is applied to it. But what complicates things greatly is that within the fluid there’s an internal effect of each part of the velocity field on others. And it’s similar with the gravitational field: In a first approximation, the field is just determined by whatever configuration of masses exists. But there’s also an “internal effect” of the field on itself. Physically, this is because the gravitational field can be thought of as having energy and momentum, which behave like mass in effectively being a source of the field. (The electromagnetic field has energy and momentum too, but it doesn’t itself have charge, so doesn’t act as a source for itself. In QCD, the color field itself has color, so it has the same general kind of nonlinear character as fluid mechanics or Einstein’s equations.)

In electromagnetism, with its simpler structure, one can’t have any region of static nonzero field unless one has charges or currents explicitly producing it. But when fields can act on themselves it’s a different story, and there can be structures that exist purely in the field, without any external sources being present. For example, in a fluid there can be a vortex that just exists within the fluid—because this happens to be a possible solution to the pure equations for the velocity field of the fluid, without any external forces.

What about the Einstein equations? Well, it’s somewhat the same story, though the details are considerably more complicated. There are nontrivial solutions to the Einstein equations even in the case of “pure gravity”, without any matter or external configuration of masses being present. And that’s exactly what black holes are. They’re examples of solutions to the Einstein equations that correspond to structures that can just exist independently in a gravitational field, a bit like vortices can just exist in the velocity field of a fluid.

From everyday experience and from seeing the operation of programs, we tend to be used to the idea that the way to work out what something will do is to start from the beginning and then go forwards step by step. But in mathematically based science the setup is often much less direct and constructive, and instead is basically “the system obeys such-and-such an equation; whatever the system does must correspond to some solution or another to the equation”. And that’s ultimately the setup with Einstein’s equations.

There can be some serious complications. For example, given particular constraints it’s far from obvious that any solutions to the equations will exist, or be unique. And indeed we’ll encounter difficulties along these lines later. But let’s start off by trying to get some rough idea of the physics of how black holes can be made.

The classic way one imagines a black hole being made is from the collapse of a massive star. And that’s presumably where the two black holes just detected came from.

For the Earth, with its particular mass and radius, we can work out that something launched from the surface must have a velocity of about 25,000 miles per hour to escape Earth’s gravity. But for a body whose mass is larger or whose radius is smaller, the escape velocity will be larger. And what General Relativity (like Newtonian gravity before it) says is that eventually the escape velocity will exceed the speed of light—so that neither light nor anything else will be able to escape, so the object will always seem black: a black hole.
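
The escape-velocity argument above can be checked with a few lines of arithmetic. This is only a sketch using the Newtonian formula sqrt(2GM/R) and rounded textbook constants; the function and variable names are made up for illustration. The radius at which the Newtonian escape velocity reaches the speed of light, 2GM/c^2, happens to coincide with the Schwarzschild radius of General Relativity.

```python
import math

# Physical constants and body data (rounded textbook values, SI units).
G = 6.674e-11           # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8             # speed of light, m/s
EARTH_MASS = 5.972e24   # kg
EARTH_RADIUS = 6.371e6  # m
SUN_MASS = 1.989e30     # kg
MPH_PER_M_PER_S = 2.23694  # miles per hour in one metre per second

def escape_velocity(mass, radius):
    """Newtonian escape velocity sqrt(2GM/R), in m/s."""
    return math.sqrt(2 * G * mass / radius)

def critical_radius(mass):
    """Radius at which the Newtonian escape velocity equals c: 2GM/c^2, in m."""
    return 2 * G * mass / C**2

earth_escape_mph = escape_velocity(EARTH_MASS, EARTH_RADIUS) * MPH_PER_M_PER_S
earth_critical_radius = critical_radius(EARTH_MASS)  # metres
sun_critical_radius = critical_radius(SUN_MASS)      # metres

print(f"Earth escape velocity: {earth_escape_mph:,.0f} mph")
print(f"Earth critical radius: {earth_critical_radius * 1000:.1f} mm")
print(f"Sun critical radius:   {sun_critical_radius / 1000:.2f} km")
```

With these inputs the Earth's escape velocity comes out close to the 25,000 miles per hour quoted above, while the Earth would have to be squeezed to under a centimeter, and the Sun to about 3 kilometers, before light could no longer escape.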

When this happens, there’s inevitably also a strong gravitational field. And this gravitational field effectively has mass, which itself serves as a source of gravitational field. And in the end, it’s actually irrelevant if there’s matter there at all: the black hole is in effect a self-sustaining configuration of the gravitational field that exists as a solution to Einstein’s equations. It’s a bit like a vortex in a fluid, which you can start by stirring, but which, once it’s there, effectively just perpetuates itself (though in a real fluid with viscosity it’ll eventually damp out).

It’s not obvious of course that the mass and radius needed to get a black hole would actually occur. It’s known that stars like the Sun will never collapse far enough. But above about 3 or 4 solar masses, there’s at least no known physical process that will prevent a star from collapsing enough to form a black hole. And the 36- and 29-solar-mass black holes recently observed presumably formed this way.

Let’s for a moment ignore how black holes might be formed, and just ask what they can be like. This is really a question about possible solutions to Einstein’s equations. And if we want something that doesn’t change with time, and that’s localized in space, then there are mathematical theorems that say the choices are very limited.

There could have been a whole zoo of possible black hole structures—and in higher dimensions, there are at least a few more. But for 4D spacetime, it actually turns out that all stationary black hole solutions are mathematically similar, and are determined by just two parameters: their overall mass and angular momentum. (If one includes electromagnetism as well, then they’re also determined by charge—and it’d be the same story with any other long-range gauge fields.)

The case of non-rotating black holes (zero angular momentum) is simplest. The relevant solution to the Einstein equations was found already by Karl Schwarzschild in 1915. But it took nearly 50 years for the interpretation of the solution to become clear.

One crucial feature of the Schwarzschild solution is that it has an event horizon. This means that any light rays (or anything else) that originate inside a certain sphere (the event horizon) are trapped forever, and can’t escape. There was confusion for quite a while, because the original formula for the Schwarzschild solution has a singularity at the event horizon. But actually this is just a mathematical artifact that can be removed by using a different coordinate system, and isn’t relevant to anything physically observable.

But even though there’s no real singularity at the event horizon, there is a singularity at the very center of the black hole—where the curvature of spacetime, and thus the effective strength of the gravitational field, is infinite. And it turns out that this singularity is in effect where the whole mass of the black hole is concentrated. It’s a pretty pathological situation. If this were happening in fluid mechanics, for example, we’d just assume that the continuum differential equations we’re using must break down, and that instead we’d have to work at the level of molecules. But for General Relativity we don’t yet have any established lower-level theory to use (though I certainly have ideas, and string theory has claims of being able to come to the rescue). There’s also elegant mathematics that’s developed around black holes and their singularities—and anyway at least in this case one can say that “It’s all happening inside the event horizon so nobody outside will ever find out about it”. So the current state of the art is just to work with the theory assuming the singularity is real—and what’s interesting now is that calculations based on this seem to have given correct answers for the recent gravitational wave discovery.
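
The difference between the two kinds of singularity shows up in the standard textbook formulas. They are quoted here only as an illustration, in Schwarzschild coordinates $t, r, \theta, \phi$ with $r_s = 2GM/c^{2}$; the notation is an assumption, not something taken from the discussion above. The metric, and a coordinate-independent measure of curvature (the Kretschmann scalar), are:

```latex
ds^{2} = -\left(1 - \frac{r_s}{r}\right) c^{2}\, dt^{2}
         + \left(1 - \frac{r_s}{r}\right)^{-1} dr^{2}
         + r^{2}\left(d\theta^{2} + \sin^{2}\theta \, d\phi^{2}\right),
\qquad
R_{\mu\nu\rho\sigma} R^{\mu\nu\rho\sigma} = \frac{12\, r_s^{2}}{r^{6}}
```

The coefficient of $dr^{2}$ blows up at the event horizon $r = r_s$, but the curvature invariant is perfectly finite there. It diverges only at $r = 0$.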

I just talked a bit about the mathematical structure of a black hole solution to Einstein’s equations. But how does this correspond to an actual black hole that could form from the collapse of a massive star?

The truest way to find out would be to start from an accurate model of the star and then simulate the whole process of forming the black hole. And at least in some approximation, it’s possible these days to do this. But let’s try a more lightweight approach.

Let’s assume that there’s a black hole solution to Einstein’s equations that exists. Then let’s ask what happens when small things fall into it. Well, there’s already an issue here. Think about an observer far from the black hole. In order to “get the news” that something crossed the event horizon of the black hole, the observer would have to get some signal—say a light pulse. But as the thing gets closer to the event horizon, it’ll take longer and longer for the signal to escape. And the result is that the observer will never see things cross the event horizon: they’ll appear to get closer and closer (and darker and darker), but never actually cross.

And that’ll be true even when it comes to the formation of the black hole. The star will be seen to be collapsing, but it’ll look as if it’s just freezing when it gets to the point where an event horizon would form.

OK, but what if the observer is also falling into a black hole? Here the experience is completely different. They probably wouldn’t even notice when they cross the event horizon, except that “handshake” signals to the outside world will stop getting responses. But then they’ll get pulled in towards the singularity at the center of the black hole. The gravitational field will steadily increase, and the fact that it’s stronger further in will inevitably stretch any object (or observer!) out. But eventually, splat, they’ll hit the singularity—and in some sense be sucked into it.

Is that really how things will work? Well, it’s hard to tell, but probably not. Outside the event horizon it’s known that small perturbations in the structure of the gravitational field—say associated with the presence of matter—will tend to get damped out, so that what emerges is exactly the official Schwarzschild black hole solution to the Einstein equations.

But inside the event horizon it’s much less clear what happens. As soon as there are perturbations, there’ll be time variations in the gravitational field, and one’s no longer dealing with a static solution to the Einstein equations. The result is that the known theorems no longer apply—and quite possibly there’ll be instabilities that change the structure or even existence of the singularity. But at least in this case, in some sense it doesn’t matter—because none of what happens will ever be visible outside of the event horizon.

In 1963 Roy Kerr found a solution to Einstein’s equations that corresponds to a black hole with angular momentum. Like the solution for a non-rotating black hole, it has a singularity in the middle. But now the singularity is not a point; instead it forms a ring.

And at least so long as the angular momentum *J* is (in suitable units) less than the square of the mass, *M*^{2}, the rotating black hole solution has an event horizon. And outside the event horizon, perturbations tend to get damped, just like in the non-rotating case. But inside, things are different.

In a non-rotating black hole anything that goes inside the event horizon will eventually hit the singularity, but won’t “see it coming”. And if light or anything else originates at the singularity it’ll just stay there, and never “get out”.

But the same isn’t true in a rotating black hole. Here, not everything will hit the singularity, and things that originate at the singularity can “get out”. This latter point is quite a problem—because it means that to know the behavior inside the black hole, you have to know what happens at the singularity. But at the singularity, Einstein’s equations can’t tell one anything: they essentially just say infinity=infinity. So the conclusion is that at least based on Einstein’s equations, one simply can’t predict what will happen.

At least with *J* < *M*^{2}, this failure of prediction occurs only inside the so-called inner horizon of the black hole. But even outside this, something weird happens. To an observer falling into the black hole, it’ll seem like a finite time elapses between when they cross the event horizon and the inner horizon. But to an observer outside the black hole, this will seem like an infinite time. And that means that any signals that come from outside the black hole—into the infinite future—could be collected by the observer inside the black hole, in finite time.

Most likely this is a sign that in practice unbounded amounts of energy will accumulate near the inner horizon, making it unstable. But if somehow stability were maintained, there’d be a really weird effect going on: the observer inside the black hole would get to see, in finite time, the whole infinite future unfolding outside the black hole. And if that future happened to include Turing machines doing computations, then in finite time the observer would get to see computations—like solving the halting problem—that can’t necessarily be done by Turing machines in any finite time.

This might be billed as evidence for “physics going beyond the Turing limit”, but it’s not really convincing, first because the whole theoretical internal structure of rotating black holes probably gets modified in practice; and second, because to really talk about the infinite future we have to consider the structure of the whole universe, not just one specific black hole.

But despite all this complexity about what happens inside the event horizon, General Relativity has clear predictions for outside—and these are what were needed for the pair of black holes just detected.

In a rotating black hole with *J* < *M*^{2}, there’s a nasty singularity—but it’s safely inside an event horizon. But for *J* > *M*^{2}, there’s the same kind of singularity, but now it’s no longer inside an event horizon, and instead it’s “naked” and exposed to the outside universe.
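
The role of the condition *J* < *M*^{2} can be made concrete with the standard formula for the horizon radii of the Kerr solution. This is a minimal sketch assuming geometric units (G = c = 1) and the Boyer-Lindquist radial coordinate; the function name `kerr_horizons` is made up for illustration.

```python
import math

def kerr_horizons(mass, angular_momentum):
    """Return (outer, inner) horizon radii of a Kerr black hole, or None.

    Assumes geometric units (G = c = 1) and uses the standard result
    r = M +/- sqrt(M^2 - a^2), where a = J / M.
    None means J > M^2: there is no horizon, i.e. a naked singularity.
    """
    a = angular_momentum / mass
    discriminant = mass**2 - a**2
    if discriminant < 0:
        return None
    root = math.sqrt(discriminant)
    return (mass + root, mass - root)

# Scan a few spins for a unit-mass black hole, so J is in units of M^2.
for j in (0.0, 0.7, 1.0, 1.1):
    print(f"J = {j:.1f} M^2 ->", kerr_horizons(1.0, j))
```

For *J* = 0 this reduces to the non-rotating case, with the event horizon at r = 2M and the inner horizon shrunk onto the center. As *J* approaches *M*^{2} the two horizons move towards each other, and beyond that no horizon remains.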

If there’s a naked singularity like this, the consequence is simple: General Relativity alone isn’t sufficient to describe what happens in the universe; some additional theory is needed.

Encountering something like this is one of the hazards of using a theory—like General Relativity—that’s based on solving equations (rather than, say, running a program) to deduce how systems behave.

And in fact, it’s still quite possible that something similar happens in the Navier–Stokes equations for fluid mechanics. There are lots of partial results, but it’s still not known whether starting from smooth initial conditions, the Navier–Stokes equations can generate singularities.

From a physics point of view, though, there’s something to say: the Navier–Stokes equations for fluids are derived by assuming that the velocity field doesn’t change too rapidly in space or time. And that’s a fine assumption when the velocities are small. But as soon as there’s supersonic flow, there are shocks where the velocity changes rapidly. Viscosity smooths out the shocks a bit, but by the time one’s in the hypersonic regime, at Mach 4 or so, the shocks get very sharp—in fact, so sharp that their width is less than the typical distance between collisions for molecules in the fluid. And the result of this is that the continuum description of the fluid necessarily breaks down, and one has to start looking at the underlying molecular structure.

OK, so can naked singularities actually occur in practice in General Relativity? We know they occur if you somehow have a *J* > *M*^{2} object. But what if you start from a realistic star, or some other distribution of matter? Can it spontaneously evolve to produce a naked singularity?

It was proved a few decades ago that if you start with something that’s close to ordinary flat spacetime, it can’t spontaneously make singularities. But if you start putting matter in, then the story changes. And in fact there are now several examples known where a smooth initial distribution of matter can evolve to make a naked singularity—though the singularity only shows up if the initial conditions are very carefully arranged and as soon as there’s any perturbation, it goes away.

Can one get a stable naked singularity without this kind of special setup? So far, nobody knows.

And nobody knows whether *J* > *M*^{2} objects can be formed. If one looks at candidate black holes around the universe, most of them are rotating. The final black hole from the merger announced the week before last had *J* ≃ 0.7 *M*^{2}. And it’s certainly interesting to note that while many have *J* close to *M*^{2}, none seen so far have *J* > *M*^{2}. It’s also interesting that in numerical simulations of pairs of rotating black holes, they always eventually merge. But if the result would have *J* > *M*^{2} they seem to “delay” their merger, and emit lots of gravitational radiation that gets rid of angular momentum, before merging to produce a black hole with *J* < *M*^{2}.

People have been talking about gravitational waves for almost a century, and there’s been indirect evidence of them for a while. But the recent announcement of direct detection of gravitational waves is pretty exciting.

So what are gravitational waves? They’re a fairly direct analog of electromagnetic waves. If you take a charge and wiggle it around, it’ll radiate electromagnetic waves—for example, radio waves. And in a directly analogous way, if you take a mass and wiggle it around, it’ll radiate gravitational waves. Usually they’ll be incredibly weak. But if the mass is very big and concentrated, like a black hole, the gravitational waves can be stronger—and, as we’ve now seen, even strong enough to detect.

Why is there radiation when you wiggle something around? It’s not hard to see. Imagine, say, that there’s a charge sitting somewhere, and you’re some distance away. There’ll be electric field from the charge—that’s, say, pointing towards the charge. Now suddenly move the charge. After things have stabilized again, there’d better be a new version of the electric field, say pointing to the new position of the charge. But how does the transition happen? The answer is that the change somehow has to propagate outward from the charge—and the process of that happening is electromagnetic radiation, which (in a vacuum) moves at the speed of light.

In general, the amount of electromagnetic radiation that’s produced is proportional to (the square of) the acceleration of the charge. (Actually, there’s considerable subtlety to this, particularly in the relativistic case—and the details of the globally correct formula are still somewhat debated.) It’s similar for gravitational radiation.

There are some differences though. A minimal antenna for electromagnetic radiation is a straight wire along which electrons can move up and down. For gravitational radiation, the minimal “antenna” has to be something that effectively has motion in two perpendicular directions (more technically, a changing quadrupole moment). In practice, two bodies orbiting each other will emit gravitational radiation, more or less as a result of the acceleration necessary to keep them in their orbits. More or less any mass that “blobs around” without being spherically symmetric will also emit gravitational waves.
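
For reference, the standard leading-order textbook formulas behind these statements are the Larmor formula for a slowly moving accelerated charge and the quadrupole formula for gravitational radiation. They are quoted here as an illustration; SI units and the symbols $q$, $a$ and $Q_{ij}$ are notational assumptions.

```latex
P_{\text{EM}} = \frac{q^{2} a^{2}}{6 \pi \varepsilon_{0} c^{3}},
\qquad
P_{\text{GW}} = \frac{G}{5 c^{5}}
  \left\langle \dddot{Q}_{ij} \, \dddot{Q}_{ij} \right\rangle,
\qquad
Q_{ij} = \int \rho \left( x_i x_j - \tfrac{1}{3} \delta_{ij} r^{2} \right) d^{3}x
```

The first depends on the square of the acceleration $a$ of the charge $q$. The second depends on the third time derivative of the mass quadrupole moment $Q_{ij}$, which vanishes for any spherically symmetric distribution of mass.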

When something emits gravitational waves, it’s radiating away some of its energy. And in general the emission of gravitational radiation tends to have a damping effect on the motion of things. For example, the emission of gravitational radiation will make orbits decay, so that orbiting bodies progressively spiral in towards each other.

For something like the Earth and the Sun, this is an absolutely infinitesimal effect. But for a pair of neutron stars orbiting each other, it’s more significant. And indeed, starting in 1974 such an effect was observed in a binary pulsar. And now, this is what caused two black holes eventually to spiral in so far that they hit each other—and produce the event just announced.
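
To get a feel for the numbers, here is a rough sketch based on the standard leading-order (quadrupole) results for two point masses in a circular orbit. The constants are rounded textbook values, and the neutron-star pair is a hypothetical example chosen for illustration, not a model of any observed binary pulsar.

```python
# Rounded SI constants and masses (textbook values).
G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8            # speed of light, m/s
SUN_MASS = 1.989e30    # kg
EARTH_MASS = 5.972e24  # kg
AU = 1.496e11          # mean Earth-Sun distance, m
SECONDS_PER_YEAR = 3.156e7

def gw_power(m1, m2, separation):
    """Gravitational-wave luminosity of a circular binary, in watts."""
    return (32 / 5) * G**4 * (m1 * m2)**2 * (m1 + m2) / (C**5 * separation**5)

def inspiral_time(m1, m2, separation):
    """Time for a circular binary to spiral in to zero separation, in seconds."""
    return (5 / 256) * C**5 * separation**4 / (G**3 * m1 * m2 * (m1 + m2))

earth_sun_power = gw_power(SUN_MASS, EARTH_MASS, AU)
earth_sun_years = inspiral_time(SUN_MASS, EARTH_MASS, AU) / SECONDS_PER_YEAR

# Hypothetical neutron-star pair: two 1.4-solar-mass stars, 1e9 m apart.
ns_mass = 1.4 * SUN_MASS
ns_years = inspiral_time(ns_mass, ns_mass, 1.0e9) / SECONDS_PER_YEAR

print(f"Earth-Sun radiated power: {earth_sun_power:.0f} W")
print(f"Earth-Sun inspiral time:  {earth_sun_years:.1e} years")
print(f"Neutron-star pair:        {ns_years:.1e} years")
```

With these inputs the Earth-Sun system radiates only a couple of hundred watts in gravitational waves and would take something like 10^23 years to spiral in, while the hypothetical neutron-star pair merges in roughly a hundred million years.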

Once two black holes hit, there’s a tremendous amount of gravitational radiation emitted as the resulting object “blobs around” before assuming its final single-black-hole shape. For stellar-sized black holes it all happens in a few hundred milliseconds. And in the case of the event just announced, the total energy in gravitational radiation was a whopping 3 solar masses—big enough that we’re able to detect it a significant fraction of the way across the universe.
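
To put 3 solar masses of radiated energy in more familiar units, one can just apply E = mc^2. This is a small sketch with rounded constants; the comparison with the Sun's present light output is added here purely for scale.

```python
SUN_MASS = 1.989e30         # kg
C = 2.998e8                 # speed of light, m/s
SUN_LUMINOSITY = 3.828e26   # watts, the Sun's present light output
SECONDS_PER_YEAR = 3.156e7

# Energy equivalent of 3 solar masses, from E = m c^2.
radiated_energy_joules = 3 * SUN_MASS * C**2

# How long the Sun would need to shine to emit the same energy as light.
equivalent_sun_years = radiated_energy_joules / SUN_LUMINOSITY / SECONDS_PER_YEAR

print(f"Energy radiated: {radiated_energy_joules:.2e} J")
print(f"Sun-years of sunlight: {equivalent_sun_years:.1e}")
```

That is about 5 × 10^47 joules, or tens of trillions of years of sunlight at the Sun's present output, released in a fraction of a second.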

Pretty much any kind of field or continuous material supports some kind of waves. Start from whatever the stable state of the system is, then perturb it just a little by periodically changing something, and you’ll get waves. When the amplitude of the waves is small enough, the math tends to be fairly straightforward. For example, in a first approximation, the amplitudes of different waves at a particular point will just add linearly.

But when the amplitudes of the waves get bigger, things can get much more complicated. In electromagnetism, everything stays linear however big the amplitude is (well, until one runs into quantum effects). But for pretty much any other kind of waves (including, say, water waves, as well as gravitational waves) nonlinear effects start to appear as soon as the amplitude gets large enough.

When there’s linearity, one can effectively break down any field configuration into a sequence of non-interacting waves of different frequencies. But that’s no longer true for something nonlinear, and eventually it usually doesn’t make sense to talk about waves at all: one’s just dealing with some field configuration or another.
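The difference between the linear and nonlinear cases can be seen in a toy experiment. The sketch below evolves a wave on a ring of cells, first with the ordinary linear wave equation and then with an added cubic term. The discretization, the cubic term and all parameter values are assumptions chosen purely for illustration; this is not a model of gravitational waves. In the linear case, evolving the sum of two waves gives the same result as summing the two separate evolutions; with the cubic term it does not.

```python
import math


def evolve(u0, steps, nonlinearity=0.0, courant=0.5):
    """Leapfrog update of a 1D wave on a ring, starting at rest.

    nonlinearity = 0 gives the discretized linear wave equation; a nonzero
    value adds a cubic restoring term -nonlinearity * u**3 (a toy choice).
    """
    n = len(u0)
    prev, cur = list(u0), list(u0)
    for _ in range(steps):
        nxt = []
        for i in range(n):
            # Discrete Laplacian with periodic boundaries (cur[-1] wraps around).
            lap = cur[(i + 1) % n] - 2 * cur[i] + cur[i - 1]
            nxt.append(2 * cur[i] - prev[i] + courant**2 * lap
                       - nonlinearity * cur[i] ** 3)
        prev, cur = cur, nxt
    return cur


def superposition_error(nonlinearity, n=64, steps=200):
    """Largest difference between evolve(a + b) and evolve(a) + evolve(b)."""
    a = [math.sin(2 * math.pi * i / n) for i in range(n)]
    b = [0.5 * math.cos(6 * math.pi * i / n) for i in range(n)]
    both = evolve([x + y for x, y in zip(a, b)], steps, nonlinearity)
    separate = [x + y for x, y in zip(evolve(a, steps, nonlinearity),
                                      evolve(b, steps, nonlinearity))]
    return max(abs(p - q) for p, q in zip(both, separate))


print("linear:   ", superposition_error(0.0))
print("nonlinear:", superposition_error(0.05))
```

In the linear run the mismatch is just floating-point roundoff; in the nonlinear run the two waves have visibly interacted.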

In the case of gravitational waves, one of the notable features is that one can in principle arrange waves to combine so that they’ll form black holes. Indeed, one can potentially start with low-amplitude waves, but somehow make them converge to a point where they’ll generate a black hole (think “gravitational implosion lens”, etc.).

A single static black hole in an infinite universe is a possible solution to Einstein’s equations. So what about two black holes orbiting each other? Well, there’s no known exact solution to the equations for this case, and it’s only fairly recently that it’s become possible to calculate with any reliability what happens.

Roughly, there are three regimes. First, the black holes are peacefully orbiting, and emitting gravitational radiation. When the black holes are far apart, and have velocities small compared to the speed of light, it’s fairly straightforward. But as they get closer and speed up, it becomes more complicated. Each black hole perturbs the other, but with a lot of algebra it’s possible to calculate the effects (as a power series in v/c).

Eventually, though, this breaks down, and the only choice is to solve the Einstein equations numerically using many of the same methods traditionally used for fluid mechanics. (There’ve been various efforts to use the same kind of cellular automaton approach on the Einstein equations that I used for the Navier–Stokes equations, but I think what’s more promising is to try something like my network-rewriting models for gravity.)

It’s only in recent years that computers have become fast enough to get sensible answers from computations like this involving high gravitational fields as well as velocities close to the speed of light. And in these computations, the result is that something like a single black hole is formed. Inevitably it’s a deformed black hole, and the third regime is one where—a bit like a bell—the black hole “rings down” these deformations (either by emitting gravitational radiation, or by absorbing them into the black hole itself).

It’s a pretty complicated stack of computations, requiring a variety of different methods. But the impressive thing is that—judging from the recent announcement—it seems to correctly capture what goes on in the interaction between two black holes.

There are plenty of detailed issues, however. One of them is that you can’t just set up some elaborate initial state with two black holes and expect that it will be a solution to the Einstein equations, even for an instant. So in addition to working out the time evolution, one also has to somehow progressively modify the initial conditions one specifies, so that they actually correspond to a possible configuration of the gravitational field according to Einstein’s equations.

If we want to start thinking about black hole configurations for purposes of technology, it would help to devise a simplified summary of interactions between two—or more—black holes. For example, one might want to have a summary of the effects of the direction of rotation (or “spin”) and of orbiting on black holes’ interactions, organized (in analogy with quantum systems) into spin-orbit, spin-spin, etc. components.

It’s a general feature of fluids that when they flow rapidly, they tend to show turbulence and behave in seemingly random ways. It’s still not completely clear what the origin of this apparent randomness is. It could be that somehow one is seeing an amplified version of small-scale random molecular motions. Or it could be there is enough instability that one is progressively exploring random details of initial conditions (as in chaos theory). I’ve spent a long time studying this, and my conclusion is that the randomness mostly isn’t coming from things that are essentially outside of the fluid; it’s instead coming from the actual dynamics of the fluid, as if the fluid were computing my rule 30 cellular automaton, or running a pseudorandom number generator.
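For readers who haven't seen rule 30, here is a minimal sketch of it. Each cell's new color is its left neighbor XOR (itself OR its right neighbor), and the sequence of values of the center cell, starting from a single black cell, is the part that looks random. The function name and the fixed-width array are just conveniences for this illustration.

```python
def rule30_center_column(steps):
    """Return the center cell's value at steps 0 .. steps-1 of rule 30,
    starting from a single black cell."""
    width = 2 * steps + 1          # wide enough that the pattern never reaches the edges
    cells = [0] * width
    cells[steps] = 1               # single black cell in the middle
    column = []
    for _ in range(steps):
        column.append(cells[steps])
        cells = [
            (cells[i - 1] if i > 0 else 0)
            ^ (cells[i] | (cells[i + 1] if i < width - 1 else 0))
            for i in range(width)
        ]
    return column


print("".join(str(bit) for bit in rule30_center_column(64)))
```

Nothing random is being fed in: the rule and the initial condition are both completely simple, and the apparent randomness comes from the evolution itself.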

If one works with the standard Navier–Stokes equations for fluid mechanics, it’s not very clear what’s going on—because one ends up having to solve the equations numerically, and whenever something complicated happens, it’s almost impossible to tell if it’s a consequence of the numerical analysis one’s done, or a genuine feature of the equations. I sidestepped these issues by using cellular automaton models for fluids rather than differential equations—and from that it’s pretty clear that intrinsic randomness generation is at least a large part of what’s going on. And having seen this, my expectation would be that if one could solve the equations well enough, one would see exactly the same behavior in the Navier–Stokes equations.

So what about the Einstein equations? Can they show turbulence? I’ve long thought that they should be able to, although to establish this will run into the same kinds of numerical-analysis issues as with the Navier–Stokes equations, though probably in an even more difficult form.

In a fluid the typical pattern is that one starts with a large-scale motion (say induced by an airplane going through the air). Then what roughly happens (at least in 3D) is that this motion breaks down into a cascade of smaller and smaller eddies, until the eddies are so small that they are damped out by viscosity in the fluid.

Would something similar happen with turbulence in the gravitational field? It can’t be quite the same, because unlike fluids, which dissipate small-scale motion by turning it into heat, the gravitational field has no such dissipation mechanism, at least according to Einstein’s equations (without adding matter, quantum effects, etc.). (Note that even with ordinary fluid mechanics, things are very different in 2D: there eddies tend not to break into smaller ones, but instead to combine into larger ones, perhaps like the Great Red Spot on Jupiter.)

My guess is that a phenomenon akin to turbulence is endemic in systems that have fields which can interact with themselves. Another potential example is the classical analog of QCD—or, more simply, classical Yang–Mills theory (the theory of a classical self-interacting color field). Yang–Mills theory shares with gravity the feature that it exhibits no dissipation, but is mathematically perhaps simpler. For years I’ve been asking people who do lattice-gauge-theory simulations whether they see any analog of turbulence. But with the randomized sampling (as opposed to evolution) approach they typically use, it’s hard to tell. (There are mathematical connections between versions of gravity and versions of Yang–Mills theory that have been extensively explored in recent years, but I don’t know what implications they have for questions of turbulence.)

In Newton’s theory of gravity, there’s an inverse square law for the force of gravity. Sufficiently far away from a massive object, the same law holds in General Relativity too. With an inverse square law for gravity, the orbit of a pointlike object around any spherical mass will always be an ellipse (just like Newton said it should be for Halley’s Comet). And every time the object goes around its orbit, it will just retrace the exact same ellipse, keeping the long axis of the ellipse in the same direction.

But what happens in General Relativity, and with black holes? The first important fact is that if something is spherically symmetric, then the gravitational field it produces outside itself must always be given exactly by the Schwarzschild solution to Einstein’s equations. That’s true for a perfectly spherical star, and it’s also true for a non-rotating black hole. And in fact that’s why it was often hard to tell if you were dealing with a genuine black hole: because the gravitational field outside it would be the same as for a star of the same mass.

So what happens according to General Relativity if you’re in orbit around something spherical? In a first approximation, the orbit is still elliptical, but the axis of the ellipse can change (“precess”)—and in fact one of the early successes of General Relativity was to explain an effect like this that had been seen for the orbit of the planet Mercury (the “advance of the perihelion”).
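The size of the Mercury effect can be checked with the standard lowest-order formula for the perihelion advance per orbit, 6πGM/(c²a(1−e²)). The sketch below is only that formula with approximate orbital elements for Mercury plugged in; the constants and variable names are my own choices for illustration.

```python
import math

GM_SUN = 1.32712e20       # gravitational parameter of the Sun, m^3/s^2
C = 2.99792458e8          # speed of light, m/s


def perihelion_advance_per_orbit(semi_major_axis, eccentricity, gm=GM_SUN):
    """Lowest-order General Relativity perihelion advance, radians per orbit."""
    return 6 * math.pi * gm / (C**2 * semi_major_axis * (1 - eccentricity**2))


# Approximate orbital elements for Mercury.
MERCURY_A = 5.7909e10         # semi-major axis, m
MERCURY_E = 0.2056            # eccentricity
MERCURY_PERIOD_DAYS = 87.969

orbits_per_century = 36525 / MERCURY_PERIOD_DAYS
per_orbit = perihelion_advance_per_orbit(MERCURY_A, MERCURY_E)
arcsec_per_century = math.degrees(per_orbit * orbits_per_century) * 3600

print(f"{arcsec_per_century:.1f} arcseconds per century")
```

This comes out at about 43 arcseconds per century, which is the residual that Newtonian calculations could not account for.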

Here’s what actually happens as the orbital distance goes down:

The object in the middle looks larger and larger relative to the orbit. In the final picture, there’s no orbit at all, and one just spirals into the object in the middle. In the other cases, there are roughly elliptical orbits, but the precession effect gets larger and larger, and typically one ends up eventually visiting a whole ring of possible positions. (There’s an interactive version of this on the Wolfram Demonstrations Project.)

But does this always happen? The answer is no: one can pick special initial conditions that instead give a variety of closed orbits with various patterns:

So what about a rotating object, or specifically a rotating black hole? One notable feature is a phenomenon called “frame dragging”, which causes orbits to be pulled towards rotating along with the object. A consequence of this is that unless the orbit precisely follows the direction of rotation, it won’t stay in a single plane, and—in a seemingly quite random way—will typically fill up not a ring but a whole 3D torus. (Try out the interactive demonstration to see this.)

Although it eventually fills in a torus, the pattern of the orbit can be fairly different depending on what initial “latitude” one starts from (all these are shown for the same total time):

If you’re sufficiently far away from the black hole, then it turns out that even though you’re pulled by frame dragging, you can in principle overcome the force (say with a powerful enough rocket). But if you’re inside a region called the ergosphere (indicated by the gray region in the pictures), you’d have to be going faster than the speed of light to do that. So the result is that any object that gets into the ergosphere (which extends outside of the event horizon) will inevitably be made to co-rotate with the black hole, just through frame dragging.

And this means that if you can put something into the ergosphere, it can gain energy—ultimately by reducing the angular momentum of the black hole. One could imagine using this as a way to harvest the energy of a black hole—and indeed astronomical phenomena like high-energy gamma ray bursts are thought to be possibly related.
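To make the geometry a little more concrete, here is a sketch of the standard Kerr-solution formulas for the event horizon, the outer edge of the ergosphere, and the fraction of the black hole's mass-energy that is tied up in rotation (and so in principle extractable). It uses units with G = c = 1 and Boyer–Lindquist coordinates; the function names are hypothetical, and the spin parameter is angular momentum per unit mass.

```python
import math


def horizon_radius(mass, spin):
    """Outer event horizon radius, r+ = M + sqrt(M^2 - a^2)."""
    return mass + math.sqrt(mass**2 - spin**2)


def ergosphere_radius(mass, spin, polar_angle):
    """Outer boundary of the ergosphere at a given angle from the rotation axis."""
    return mass + math.sqrt(mass**2 - (spin * math.cos(polar_angle)) ** 2)


def extractable_fraction(mass, spin):
    """Fraction of the mass-energy that is rotational: 1 - M_irr / M."""
    irreducible_mass = math.sqrt(mass * horizon_radius(mass, spin) / 2)
    return 1 - irreducible_mass / mass


for spin in (0.0, 0.5, 0.9, 1.0):
    print(f"spin {spin}: horizon {horizon_radius(1.0, spin):.3f}, "
          f"equatorial ergosphere {ergosphere_radius(1.0, spin, math.pi / 2):.3f}, "
          f"extractable {extractable_fraction(1.0, spin):.1%}")
```

The ergosphere touches the horizon at the poles and bulges out to twice the mass (in these units) at the equator, and even for a maximally spinning black hole the extractable fraction stays below about 29%.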

OK, so we’ve talked about orbiting a black hole, and earlier about what happens with two black holes. But what about with more black holes? Well, we can start by asking that question just for simple point masses following Newton’s law of gravity—and it turns out that even there things are already extremely complicated.

The pictures below show a bunch of possible trajectories for three equal-mass pointlike objects interacting through ordinary Newtonian gravity. The only difference between the setup for the different pictures is where the objects were started. But one can see that just changing this initial condition leads to an incredible diversity of behavior:

Here are some animated versions:

Solving the necessary differential equations is fast enough these days in the Wolfram Language that one can actually generate these interactively. Here’s a version in 2D where you can interactively move around the initial positions and velocities:

And here’s a version in 3D where you can set all the positions and velocities in 3D:

If we just had two objects (a “two-body problem”), all that would ever happen is that they’d orbit each other in a simple ellipse. But adding a third object (“three-body problem”) immediately allows dramatically more complexity. Sometimes in the end all three objects just go their separate ways. Sometimes two form a binary system and the third goes separately. And sometimes all three make anything from an orderly arrangement to a complicated tangled mess.

The three-body problem turns out to be a classic example of the chaos-theory idea of sensitive dependence on initial conditions: in many situations, even the tiniest change in, say, the initial position of an object will be progressively amplified. And the result is that if one specifies the initial conditions by numbers (say, for coordinate positions), then the evolution of the system will effectively “excavate” more and more digits in these numbers.

Here’s a particularly simple example. Imagine having a pair of objects in a simple elliptical orbit. Then a third object (assumed to have infinitesimally small mass) is started a certain distance above the plane of the ellipse. Gravity will make the third object oscillate back and forth through this plane forever. But the tricky thing is that the details of these oscillations depend arbitrarily sensitively on the details of the initial conditions.

This picture shows what happens when one starts that third object at one of four different coordinate positions that differ by one part in a billion. For a while, all of them follow what looks like exactly the same trajectory. But then they start to diverge, and eventually each of them does something completely different:

Plotting this in 3D (with the initial position *z*(0) shown going into the page) we can see just how random things can get—even though each specific trajectory is precisely determined by the sequence of digits in the real number that represents its initial condition. (It’s not trivial, by the way, to compute these pictures correctly; it requires using the arbitrary-precision number arithmetic of the Wolfram Language—and as time goes on more and more digits are needed.)

Not surprisingly, there’s no simple formula that represents these results. But a few interesting things have been proved—for example that if one measures each oscillation by how many orbits are completed while it is happening, then one can get any sequence of integers one wants by choosing the initial conditions appropriately.

The two-body problem was solved in terms of mathematical formulas by Isaac Newton in 1687—as a highlight of his introduction of calculus. And in the 1700s and 1800s it was assumed that eventually someone would find the same kind of solution for the three-body problem. But by the end of the 1800s there were results (notably by Henri Poincaré) that suggested there couldn’t be a solution in terms of at least certain kinds of functions.

It’s still not proved that there can’t be solutions in terms of any kind of known functions (much as even though there aren’t algebraic solutions to quintic equations, there are ones in terms of elliptic or hypergeometric functions). But I strongly suspect that there can never, even in principle, be a complete solution to the three-body problem as an explicit formula.

One can think of the time evolution of a system of masses interacting according to gravity as being a computation: you put in the initial conditions, and then you get out where the masses are after a certain time. But how sophisticated is this computation? For the two-body problem, it’s fairly simple. In fact, however long the actual two-body system runs, one can always find the outcome just by plugging numbers into a straightforward formula.
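Here is a sketch of what "plugging numbers into a formula" amounts to for a bound two-body orbit. One reduces the time to a mean anomaly, solves Kepler's equation for the eccentric anomaly, and reads off the position. Strictly, Kepler's equation is solved here by a few Newton iterations rather than a closed form, but the point is that the work does not grow with the elapsed time. The function name, default parameters and fixed iteration count are illustrative assumptions.

```python
import math


def kepler_position(t, semi_major_axis=1.0, eccentricity=0.5, period=2 * math.pi):
    """Position on a bound Kepler orbit at time t, with the attracting focus
    at the origin and periapsis passage at t = 0."""
    mean_anomaly = 2 * math.pi * (t / period % 1.0)
    ecc_anomaly = mean_anomaly if eccentricity < 0.8 else math.pi
    for _ in range(50):   # Newton iteration on M = E - e sin(E)
        residual = ecc_anomaly - eccentricity * math.sin(ecc_anomaly) - mean_anomaly
        ecc_anomaly -= residual / (1 - eccentricity * math.cos(ecc_anomaly))
    x = semi_major_axis * (math.cos(ecc_anomaly) - eccentricity)
    y = semi_major_axis * math.sqrt(1 - eccentricity**2) * math.sin(ecc_anomaly)
    return x, y


# The same amount of work whether we ask about half an orbit or a million and a half.
print(kepler_position(math.pi))
print(kepler_position(1_000_000 * 2 * math.pi + math.pi))
```

No step-by-step simulation of the intervening orbits is needed, which is exactly what one would not expect to be possible for a computationally irreducible system.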

But what about the three-body problem? The pictures above suggest a very different story. And indeed my guess is that the evolution of a three-body system can correspond to an arbitrarily sophisticated computation—and that with suitable initial conditions it should in fact be able, for example, to emulate any Turing machine, and thus act as a universal computer.

I’ve suspected computational universality in the three-body problem for about 35 years now. But it’s a technically complicated thing to prove. Usually in studying computation we look at fundamentally discrete systems, like Turing machines or cellular automata. But the three-body problem is fundamentally continuous—and can for example make use of arbitrarily many digits in the real numbers it’s given as initial conditions.

Still, at least from a formal point of view, one can set up initial conditions that have, say, a finite sequence of nonzero digits. Then one can look at the output from the evolution of the system, binning the results to get a sequence of discrete data (e.g. using ideas of symbolic dynamics). And then the question is whether by changing the initial conditions we can have the output sequence correspond to the result from any program we want—say one that shows which successive numbers are prime, or computes the digits of pi.
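The ideas of "excavating digits" and of binning a continuous evolution into discrete symbols are easiest to see in a much simpler system than the three-body problem. The sketch below uses the doubling map (x goes to 2x mod 1) purely as a stand-in; it is not a gravitational system. Exact rational arithmetic is used so that no digits are lost to roundoff. Binning x into the lower or upper half of the interval at each step reads off the binary digits of the initial condition one by one, and a tiny difference in initial conditions doubles at every step.

```python
from fractions import Fraction


def doubling_map_symbols(x0, steps):
    """Iterate x -> 2x mod 1 exactly, recording at each step whether x is in
    the lower half (0) or upper half (1) of the unit interval."""
    x, symbols = Fraction(x0), []
    for _ in range(steps):
        symbols.append(1 if x >= Fraction(1, 2) else 0)
        x = (2 * x) % 1
    return symbols, x


def circular_distance(a, b):
    """Distance between two points on the unit interval with ends identified."""
    d = (a - b) % 1
    return min(d, 1 - d)


# 11/16 is 0.1011 in binary; the symbol sequence reproduces those digits.
print(doubling_map_symbols(Fraction(11, 16), 5)[0])

# Two initial conditions differing by 2^-30 are 2^-10 apart after 20 steps.
_, xa = doubling_map_symbols(Fraction(1, 3), 20)
_, xb = doubling_map_symbols(Fraction(1, 3) + Fraction(1, 2**30), 20)
print(circular_distance(xb, xa))
```

In this toy case one can obviously get any symbol sequence one wants by choosing the digits of the initial condition. The much harder question for the three-body problem is what class of computations the analogous sequences can represent.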

So what would it mean if we could prove this kind of computational universality? One thing it would mean is that the three-body problem must be computationally irreducible, so there couldn’t ever be a way to “shortcut” (say with a formula) the actual computation it does in getting a result. And another thing it would mean is that certain infinite-time questions, like whether a particular body can ever escape for any of a particular range of initial conditions, could in general be undecidable.

(There’s a whole discussion about whether the three-body problem, because it works with real numbers, can compute more than a standard universal computer like a Turing machine, which only works with integers. Suffice it here to say that my strong suspicion is that it can’t, at least if one insists that the initial conditions and the results can be expressed in finite symbolic terms.)

How stable are the seemingly random trajectories in the three-body problem? Some are very sensitive to the details of the initial conditions, but others are quite robust. And for example, if one were designing a trajectory for a spacecraft, it seems perfectly possible that one could find a complex and seemingly random trajectory that would achieve some purpose one wants.

Are there cases where actual star or planetary systems will exhibit apparent randomness? There were undoubtedly examples even in the history of our own solar system. But because randomness tends to bring bodies into regions where they haven’t been before, there’s a higher chance of disruption by external effects—such as collisions—and so the apparent randomness probably doesn’t typically last under “natural selection for solar systems” when there are many bodies in the system.

In the ever-difficult problem of working out whether something is of “intelligent origin”, the three-body problem adds another twist—because it allows astronomical processes to show complexity just as a consequence of their intrinsic dynamics. If it is indeed possible to do arbitrary computation with a three-body system, then such a system could in principle be programmed to, say, generate the digits of pi, and perhaps make them visible in the light curve of a star. But often the system will show just as complex behavior from many different initial conditions—and one won’t be able to tell whether the behavior has any element of “purpose”.

Can one pick initial conditions for the three-body problem to achieve particular kinds of behavior? The answer is certainly yes. One example (already found by Lagrange in 1772) is to have the bodies on the corners of an equilateral triangle—which produces stable periodic behavior.
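One can check the triangle configuration directly for the simplest case of three equal masses. For the triangle to rotate rigidly about its center, the gravitational acceleration on each body must equal the centripetal acceleration, which requires an angular velocity squared of 3Gm/s³ for masses m and side length s. The sketch below verifies only that this is an exact periodic solution; it does not test what happens when the configuration is perturbed. Function and parameter names are illustrative.

```python
import math


def lagrange_triangle_residual(mass=1.0, side=1.0, g=1.0):
    """Largest mismatch between the gravitational acceleration and the
    centripetal acceleration needed for three equal masses to rotate rigidly
    as an equilateral triangle about their common center."""
    radius = side / math.sqrt(3)                 # corner-to-center distance
    omega_sq = 3 * g * mass / side**3            # required angular velocity squared
    corners = [(radius * math.cos(2 * math.pi * k / 3),
                radius * math.sin(2 * math.pi * k / 3)) for k in range(3)]
    worst = 0.0
    for i, (xi, yi) in enumerate(corners):
        ax = ay = 0.0
        for j, (xj, yj) in enumerate(corners):
            if i != j:
                dx, dy = xj - xi, yj - yi
                r = math.hypot(dx, dy)
                ax += g * mass * dx / r**3
                ay += g * mass * dy / r**3
        # Uniform circular motion about the center needs a = -omega^2 * position.
        worst = max(worst, abs(ax + omega_sq * xi), abs(ay + omega_sq * yi))
    return worst


print(lagrange_triangle_residual())
```

The residual is zero up to roundoff, for any choice of mass and side length.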

One can find other periodic configurations too:

And indeed, particularly if one allows more bodies, given some specified periodic trajectory, one can probably find (by fairly traditional gradient descent methods) initial conditions that will reproduce it, at least to some accuracy. (A notable example found in 1993 is just three bodies following a figure-eight orbit.)
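The figure-eight orbit is easy to reproduce numerically. The sketch below uses a simple velocity-Verlet integrator with unit masses and G = 1, together with the commonly quoted initial conditions and period for the figure-eight (these values come from the later literature on the orbit, not from this essay). The integrator, step size and names are assumptions for illustration; anything with close encounters would need an adaptive or higher-precision method.

```python
import math


def accelerations(pos):
    """Newtonian accelerations for three unit masses with G = 1."""
    acc = [[0.0, 0.0] for _ in pos]
    for i in range(3):
        for j in range(3):
            if i != j:
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                r3 = (dx * dx + dy * dy) ** 1.5
                acc[i][0] += dx / r3
                acc[i][1] += dy / r3
    return acc


def integrate(pos, vel, total_time, dt=1e-3):
    """Velocity-Verlet integration; returns final positions and velocities."""
    pos = [list(p) for p in pos]
    vel = [list(v) for v in vel]
    steps = round(total_time / dt)
    dt = total_time / steps            # adjust so the steps land exactly on total_time
    acc = accelerations(pos)
    for _ in range(steps):
        for i in range(3):
            for k in range(2):
                vel[i][k] += 0.5 * dt * acc[i][k]
                pos[i][k] += dt * vel[i][k]
        acc = accelerations(pos)
        for i in range(3):
            for k in range(2):
                vel[i][k] += 0.5 * dt * acc[i][k]
    return pos, vel


# Commonly quoted figure-eight initial conditions (unit masses, G = 1).
X1 = (0.97000436, -0.24308753)
V3 = (-0.93240737, -0.86473146)
START_POS = [X1, (-X1[0], -X1[1]), (0.0, 0.0)]
START_VEL = [(-V3[0] / 2, -V3[1] / 2), (-V3[0] / 2, -V3[1] / 2), V3]
PERIOD = 6.32591398

end_pos, _ = integrate(START_POS, START_VEL, PERIOD)
closure_error = max(math.dist(p, q) for p, q in zip(START_POS, end_pos))
print("distance from starting positions after one period:", closure_error)
```

After one period all three bodies are back very close to where they started, which is the numerical signature of a closed orbit.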

But what about more-complex trajectories? Clearly, each set of initial conditions gives some kind of behavior. The question is whether it’s useful.

The situation is similar to what I’ve encountered for a long time in studying simple programs like cellular automata: out there in the computational universe of possible programs, there’s all kinds of rich and complex behavior. Now the issue is to “mine” those examples that are actually useful for something.

In practice, I’ve done lots of “algorithm discovery” in the computational universe, setting up criteria and then searching huge numbers of possible programs to find ones that are useful. And I expect exactly the same can be done for gravitational systems like the three-body problem. It’s really a question of formulating some purpose one’s trying to achieve with the system; then one can just start searching, often quite exhaustively, for a case that achieves that purpose.

So how do black holes work in things like the three-body problem? The basic story is simple: so long as the bodies stay far enough apart, it doesn’t matter whether they’re black holes or just generic masses. But if they get close, there’ll start to be relativistic effects, and that’s where black holes will be important. Presumably, however, one can just set up a constraint that there should be no close approaches, and one will still be able to do plenty of gravitational engineering—with black holes or any other massive objects.

If we’re going to be able to do serious black hole engineering, we’d better have a serious source of black holes. It’s not clear that our universe is going to cooperate on this. There are probably big black holes at the centers of galaxies (and that may be the rather unsatisfying answer to “what’s the ‘equilibrium’ state” of a large number of self-gravitating objects). There’s probably a decent population of black holes from collapsed massive stars—perhaps one per thousand stars or so, which means 100 million spread across our galaxy.

There’s an important other point to mention about black holes: if current theories correctly graft certain aspects of quantum mechanics onto the classical physics of the Einstein equations, then any black hole will emit Hawking radiation, and will eventually evaporate away as a result. Star-sized black holes would have huge lifespans, but for less-massive black holes, the lifespan goes down, and for a black hole the mass of Halley’s comet, the lifespan would be about a billion years.

What about tiny black holes? Hawking radiation suggests they should evaporate almost instantly: an electron-mass one should be gone in well under 10^{-100} seconds. (When I was 15 or so, I remember asking a distinguished physicist whether electrons could actually be black holes. He said it was a stupid idea, which probably it was. But in writing this blog I discovered that Einstein also considered this idea—though about 50 years before I did. And as it happens, in my network-based models, electrons do end up being made of “pure space”, not so unlike black holes.)
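The steep dependence of lifespan on mass can be seen from the standard order-of-magnitude estimate for the evaporation time, 5120πG²M³/(ħc⁴). This formula is an assumption brought in for illustration: it counts photon emission only and ignores details such as other particle species, so it should be read as a rough scaling rather than a precise prediction.

```python
import math

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
HBAR = 1.0546e-34      # reduced Planck constant, J s
C = 2.998e8            # speed of light, m/s
YEAR = 3.156e7         # seconds in a year


def evaporation_time(mass_kg):
    """Rough Hawking evaporation time in seconds (photon-only estimate)."""
    return 5120 * math.pi * G**2 * mass_kg**3 / (HBAR * C**4)


electron_seconds = evaporation_time(9.109e-31)    # electron-mass black hole
solar_years = evaporation_time(1.989e30) / YEAR   # solar-mass black hole

print(f"electron mass: {electron_seconds:.1e} seconds")
print(f"solar mass:    {solar_years:.1e} years")
```

Because the time scales as the cube of the mass, going from a star down to an electron changes the lifespan by something like 180 orders of magnitude.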

Even if it’s hard to get genuine gravitational black holes, one might wonder if there could at least be analogs that are easier to get. And in recent years there’s been some success with making “sonic black holes”—that are at least a rough analog of gravitational black holes, but where it’s sound, rather than light, that’s trapped.

OK, so we’re now finally ready to talk about creating technology with black holes. I should say at the outset that I’m not at all happy with what I’ve managed to figure out. Lots of things I thought might work turn out simply to be impossible when one looks at them in the light of actual black hole physics. And some others, while perhaps interesting, require assembling large numbers of black holes, which seems almost absurdly infeasible in our universe—given how sparse at least larger black holes seem to be, with only perhaps 10^19 spread across our whole universe.

But let’s say we just have one black hole. What can we do with it? One answer is to “bask in its time dilation”—or in some sense to use it to do “time travel to the future”.

Special Relativity already exhibits the phenomenon of time dilation, in which time runs more slowly for an object that’s moving quickly. General Relativity also messes around with the rate at which time runs. In particular, in a place with stronger gravity, time runs slower than in a place with weaker gravity. And so this means, for example, that as one goes further from the Earth, time runs slightly faster. (The clocks on GPS satellites are back-corrected for this—making them at least naively appear to “violate General Relativity”.)
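For GPS the two effects pull in opposite directions, and one can estimate both with lowest-order formulas. The sketch below compares a clock on the ground with one in a circular orbit at roughly the GPS radius. It ignores the Earth's rotation and the slight eccentricity of real orbits, and the constants and names are my own choices for illustration.

```python
GM_EARTH = 3.986004e14      # gravitational parameter of the Earth, m^3/s^2
C = 2.99792458e8            # speed of light, m/s
EARTH_RADIUS = 6.371e6      # m
GPS_ORBIT_RADIUS = 2.656e7  # m, measured from the center of the Earth
SECONDS_PER_DAY = 86400


def gps_clock_offsets_microseconds_per_day():
    """Return (gravitational, velocity) clock-rate offsets of a GPS satellite
    relative to the ground, in microseconds per day (positive = runs fast)."""
    gravitational = GM_EARTH / C**2 * (1 / EARTH_RADIUS - 1 / GPS_ORBIT_RADIUS)
    orbital_speed_sq = GM_EARTH / GPS_ORBIT_RADIUS       # circular orbit
    velocity = -orbital_speed_sq / (2 * C**2)            # moving clocks run slow
    to_microseconds = SECONDS_PER_DAY * 1e6
    return gravitational * to_microseconds, velocity * to_microseconds


gravitational_us, velocity_us = gps_clock_offsets_microseconds_per_day()
print(f"gravitational: {gravitational_us:+.1f} microseconds/day")
print(f"velocity:      {velocity_us:+.1f} microseconds/day")
print(f"net:           {gravitational_us + velocity_us:+.1f} microseconds/day")
```

The gravitational effect wins, so the satellite clocks run fast by a few tens of microseconds per day, which would translate into kilometers of position error if left uncorrected.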

Near a black hole, strong gravity can make time run significantly more slowly. There’s a nice example in the movie *Interstellar*, in which there’s a planet orbiting at exactly the right distance from a black hole with exactly the right parameters—so that time runs much more slowly on the planet, but other gravitational effects there aren’t too extreme.

In a sense, as soon as one has a way to make time locally run slower, one can do “time travel to the future”. For the “traveler” a month might have elapsed—but outside it could have been a century. (It’s worth mentioning that one can achieve the same kind of effect without gravity just by doing a trip in which one accelerates to close to the speed of light.)

Of course, even though this would allow “time travel to the future”, it would give no way to get back. For that, one would need so-called closed timelike curves, which do in principle exist in solutions to the Einstein equations (notably, the one found by Kurt Gödel), but which don’t seem to appear in any physically realizable case. (In a system determined by equations, a closed timelike curve is really less about “traveling in time” than it is about defining a consistency condition between what happens in the past and the future.)

In science fiction, black holes and related phenomena tend to be a staple of faster-than-light travel. At a more mundane level, the kind of “gravity assist” maneuvers that real spacecraft do by swinging, say, around Jupiter could be done on a much larger scale if one could swing around a black hole—where the maximum achievable velocity would be essentially the speed of light.

In General Relativity, the only way to effectively go faster than light is to modify the structure of spacetime. For example, one can imagine a “wormhole” or tube that directly connects different places in space. In General Relativity there’s no way to form such a wormhole if it doesn’t already exist—but there’s nothing to say such wormholes couldn’t already have existed at the beginning of the universe. There is a problem, though, in maintaining an “open wormhole”: the curvature of spacetime at the end would tend to create gravity that would make it collapse.

I don’t know if it can be proved that there’s no configuration of, say, orbiting black holes that would keep the wormhole open. One known way to keep it open is to introduce matter with special properties like negative energy density—which sounds implausible until you consider vacuum fluctuations in quantum field theory, inflationary-universe scenarios or dark-energy ideas.

Introducing exotic matter makes all sorts of new solutions possible for the Einstein equations. A notable example is the Alcubierre solution, which in some sense provides a different way to traverse space at any speed, effectively by warping the space.

Could there be a solution to the Einstein equations that allows something similar, without exotic matter? It hasn’t been proved that it’s impossible. And I suppose one could imagine some configuration of judiciously placed black holes that would make it possible.

It’s perhaps worth mentioning that in the models I’ve studied where the underlying structure of spacetime is a network with no predefined number of space dimensions, wormhole-like phenomena seem more natural—though insofar as the models reproduce General Relativity on large scales, this means such phenomena can’t originate on those scales.

It’s easy to generate high energies with a black hole. Matter that spirals in towards the black hole will gain energy—and indeed, around stellar and larger black holes there’s potentially an accretion disk that contains high-energy matter.

With rotating black holes, there are some additional energy phenomena. In the ergosphere, objects can gain energy at the expense of the black hole itself. This is relevant both in accelerating ordinary matter, and in producing “superradiance” where energy is added to waves, say of light, that pass through the ergosphere.

Can one do better with multiple black holes than a single one? I don’t know. Maybe there’s a configuration of orbiting black holes that’s somehow optimized for imparting energy to matter—like a kind of particle accelerator made from black holes.

We saw earlier some of the complex trajectories that three bodies interacting through gravity can follow. But what kind of trajectories can we potentially “engineer”, particularly with more bodies?

It’s not too difficult to start with approximate trajectories and then do gradient descent (e.g. in Fourier space) to try to find trajectories that actually correspond, for example, to closed orbits. So can one for example find a “gravitational crystal” that consists of an infinite regular array of interacting gravitational bodies?

There are some mathematical tricks to apply—and one ends up having to use randomized search more than systematic gradient descent—but there do seem to be gravitational crystals to be found. Here are two potential examples that show a kind of checkerboard symmetry:

I suppose a “gravitational wall” like this might be good for stopping things that approach it. With the right parameters, it might be able to capture anything (perhaps up to some speed) that tries to cross it.

Given a “gravitational crystal”, one can ask about implementing things like cellular automata on it. I don’t know how to store “bits” for cellular automaton cells in lattices like these without disrupting the lattice too much, but I suspect there’s a way. (Yes, classical gravity is reversible, so one would have to have reversible cellular automata, but there are plenty of those.)
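One standard way to get a reversible cellular automaton is the second-order construction: take any ordinary rule, and XOR its output with the state from two steps back. The sketch below does this with rule 30 as the underlying rule on a ring of cells. The choice of rule, the ring size and the names are arbitrary for this illustration. Running the same update with the last two states swapped retraces the evolution exactly.

```python
import random


def step(current, previous):
    """One step of a second-order automaton: rule 30 applied to the current
    row, XORed with the row before it. Returns (new current, new previous)."""
    n = len(current)
    nxt = [(current[i - 1] ^ (current[i] | current[(i + 1) % n])) ^ previous[i]
           for i in range(n)]
    return nxt, current


def run(current, previous, steps):
    """Apply `step` repeatedly and return the final (current, previous) pair."""
    for _ in range(steps):
        current, previous = step(current, previous)
    return current, previous


rng = random.Random(0)
row0 = [rng.randint(0, 1) for _ in range(41)]
row1 = [rng.randint(0, 1) for _ in range(41)]

# Run forward 100 steps, then swap the last two rows and run 100 more.
later, before_later = run(row1, row0, 100)
recovered0, recovered1 = run(before_later, later, 100)

print("initial rows recovered:", recovered0 == row0 and recovered1 == row1)
```

The forward evolution looks complicated, yet no information is lost, which is the property one would need to match the reversibility of classical gravity.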

What’s shown here is something that’s intended to be a regular, periodic “crystal”. One can also potentially imagine creating a “random crystal” in which there’s overall regularity, but at a small scale there’s seemingly random motion. If one could make such a random crystal work, then it might provide a more robust “wall”, less affected by outside perturbations.

Modularization is an important general technique in engineering because it lets one break a problem into parts and then solve each one separately. But for gravitational systems, it’s hard to do modularization, because gravity is a long-range force, dropping off only gradually with distance.

And even with spinning black holes and the like, I don’t know of any way to achieve the analog of gravitational shielding—though this changes if one introduces exotic matter that effectively has negative mass, or if, for example, every black hole has electric charge.

And without modularization, it’s surely more difficult to create something technologically useful—because in effect one has to figure out everything at once. But it’s certainly conceivable that by searching a space of possibilities one could find something—though without modularization it might look very complicated (as long-range simple programs, like combinators, tend to do), and it could be difficult even to tell what the system achieves without looking for specific properties one already knows.

Having said all this, I suspect that there are big things I am missing—and that with the right ways of thinking, there’ll end up being some spectacular kinds of technology that black holes make possible. And for all we know, once we figure this out we’ll realize that an example of it has already existed in our universe for a billion years, whether of “natural” origin or not.

But for now, the discovery of gravitational radiation from merging black holes is a remarkable example of how something like the small equation Einstein wrote down for the gravitational field a hundred years ago can lead to such elaborate consequences. It’s an impressive endorsement of the strength of theoretical science—and perhaps an inspiration to see just how small the rules might be to generate everything we see in our universe.

It’s been very satisfying to see how successfully Wolfram|Alpha has democratized computational knowledge and how its effects have grown over the years. Now I want to do the same thing with knowledge-based programming—through the Wolfram Open Cloud.

Last week we released Wolfram Programming Lab as an environment for people to learn knowledge-based programming with the Wolfram Language. Today I’m pleased to announce that we’re making Wolfram Programming Lab available for free use on the web in the Wolfram Open Cloud.

Go to wolfram.com, and you’ll see buttons labeled “Immediate Access”. One is for Wolfram|Alpha. But now there are two more: Programming Lab and Development Platform.

Wolfram Programming Lab is for learning to program. Wolfram Development Platform (still in beta) is for doing professional software development. Go to either of these in the Wolfram Open Cloud and you’ll immediately be able to start writing and executing Wolfram Language code in an active Wolfram Language notebook.

Just as with Wolfram|Alpha, you don’t have to log in to use the Wolfram Open Cloud. And you can go pretty far like that. You can create notebook documents that involve computations, text, graphics, interactivity—and all the other things the Wolfram Language can do. You can even deploy active webpages, web apps and APIs that anyone can access and use on the web.

If you want to save things then you’ll need to set up a (free) Wolfram Cloud account. And if you want to get more serious—about computation, deployments or storage—you’ll need to have an actual subscription for Wolfram Programming Lab or Wolfram Development Platform.

But the Wolfram Open Cloud gives anyone a way to do “casual” programming whenever they want—with access to all the core computation, interface, deployment and knowledge capabilities of the Wolfram Language.

In Wolfram|Alpha, you give a single line of natural language input to get your computational knowledge output. In the Wolfram Open Cloud, the power and automation of the Wolfram Language make it possible to give remarkably small amounts of Wolfram Language code to get remarkably sophisticated operations done.

The Wolfram Open Cloud is set up for learning and prototyping and other kinds of casual use. But a great thing about the Wolfram Language is that it’s fully scalable. Start in the Wolfram Open Cloud, then scale up to the full Wolfram Cloud, or to a Wolfram Private Cloud—or instead run in Wolfram Desktop, or, for that matter, in the bundled version for Raspberry Pi computers.

I’ve been working towards what’s now the Wolfram Language for nearly 30 years, and it’s tremendously exciting now to be able to deliver it to anyone anywhere through the Wolfram Open Cloud. It takes a huge stack of technology to make this possible, but what matters most to me is what can be achieved with it.

With Wolfram Programming Lab now available through the Wolfram Open Cloud, anyone anywhere can learn and start doing the latest knowledge-based programming. Last month I published *An Elementary Introduction to the Wolfram Language* (which is free on the web); now there’s a way anyone anywhere can do all the things the book describes.

Ever since the web was young, our company has been creating large-scale public resources for it, from Wolfram MathWorld to the Wolfram Demonstrations Project to Wolfram|Alpha. Today we’re adding what may ultimately be the most significant of all: the Wolfram Open Cloud. In a sense it’s making the web into a true computing environment—in which anyone can use the power of knowledge-based programming to create whatever they want. And it’s an important step towards a world of ubiquitous knowledge-based programming, with all the opportunities that brings for so many people.

*To comment, please visit the copy of this post at the Wolfram Blog »*

That afternoon we were driving through Pasadena, California, and with no apparent concern for the actual process of driving, Feynman’s visitor was energetically pointing out all sorts of things an AI would have to figure out if it was to be able to do the driving. I was a bit relieved when we arrived at our destination, but soon the visitor was on to another topic, talking about how brains work, and then saying that as soon as he’d finished his next book he’d be happy to let someone open up his brain and put electrodes inside, if they had a good plan to figure out how it worked.

Feynman often had eccentric visitors, but I was really wondering who this one was. It took a couple more encounters, but then I got to know that eccentric visitor as Marvin Minsky, pioneer of computation and AI—and was pleased to count him as a friend for more than three decades.

This essay is in *Idea Makers: Personal Perspectives on the Lives & Ideas of Some Notable People* »

Just a few days ago I was talking about visiting Marvin—and I was so sad when I heard he died. I started reminiscing about all the ways we interacted over the years, and all the interests we shared. Every major project of my life I discussed with Marvin, from SMP, my first big software system back in 1981, through Mathematica, *A New Kind of Science*, Wolfram|Alpha and most recently the Wolfram Language.

This picture is from one of the last times I saw Marvin. His health was failing, but he was keen to talk. Having watched more than 35 years of my life, he wanted to tell me his assessment: “You really did it, Steve.” Well, so did you, Marvin! (I’m always “Stephen”, but somehow Americans of a certain age have a habit of calling me “Steve”.)

The Marvin that I knew was a wonderful mixture of serious and quirky. About almost any subject he’d have something to say, most often quite unusual. Sometimes it’d be really interesting; sometimes it’d just be unusual. I’m reminded of a time in the early 1980s when I was visiting Boston and subletting an apartment from Marvin’s daughter Margaret (who was in Japan at the time). Margaret had a large and elaborate collection of plants, and one day I noticed that some of them had developed nasty-looking spots on their leaves.

Being no expert on such things (and without the web to look anything up!), I called Marvin to ask what to do. What ensued was a long discussion about the possibility of developing microrobots that could chase mealybugs away. Fascinating though it was, at the end of it I still had to ask, “But what should I *actually* do about Margaret’s plants?” Marvin replied, “Oh, I guess you’d better talk to my wife.”

For many decades, Marvin was perhaps the world’s greatest energy source for artificial intelligence research. He was a fount of ideas, which he fed to his long sequence of students at MIT. And though the details changed, he always kept true to his goal of figuring out how thinking works, and how to make machines do it.

By the time I knew Marvin, he tended to talk mostly about theories where things could be figured out by what amounts to common sense, perhaps based on psychological or philosophical reasoning. But earlier in his life, Marvin had taken a different approach. His 1954 PhD thesis from Princeton was about artificial neural networks (“Theory of Neural-Analog Reinforcement Systems and Its Application to the Brain Model Problem”) and it was a mathematics thesis, full of technical math. And in 1956, for example, Marvin published a paper entitled “Some Universal Elements for Finite Automata”, in which he talked about how “complicated machinery can be constructed from a small number of basic elements”.

This particular paper considered only essentially finite machines, based directly on specific models of artificial neural networks. But soon Marvin was looking at more general computational systems, and trying to see what they could do. In a sense, Marvin was beginning just the kind of exploration of the computational universe that years later I would also do, and eventually write *A New Kind of Science* about. And in fact, as early as 1960, Marvin came extremely close to discovering the same core phenomenon I eventually did.

In 1960, as now, Turing machines were used as a standard basic model of computation. And in his quest to understand what computation (and potentially brains) could be built from, Marvin started looking at the very simplest Turing machines (with just 2 states and 2 colors) and using a computer to find out what all 4096 of them actually do. Most, he discovered, just have repetitive behavior, and a few have what we’d now call nested or fractal behavior. But none do anything more complicated, and indeed Marvin based the final exercise in his classic 1967 book *Computation: Finite and Infinite Machines* on this, noting that “D. G. Bobrow and the author did this for all (2,2) machines [1961, unpublished] by a tedious reduction to thirty-odd cases (unpublishable).”

Years later, Marvin told me that after all the effort he’d spent on the (2,2) Turing machines he wasn’t inclined to go further. But as I finally discovered in 1991, if one just looks at (2,3) Turing machines, then among the 3 million or so of them, there are a few that don’t just show simple behavior any more—and instead generate immense complexity even from their very simple rules.
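The counts quoted here (4096 machines for 2 states and 2 colors, about 3 million for 2 states and 3 colors) follow from a simple formula, sketched below. The sketch assumes the convention that every (state, color) case specifies a new state, a new color and a move left or right, with no separate halt state and no "stay put" move. Under that assumption there are 2·s·k choices for each of the s·k cases. The function name is hypothetical.

```python
def count_turing_machines(states, colors):
    """Number of distinct rule tables for an (states, colors) Turing machine.

    Assumes each (state, color) case picks a new state, a new color,
    and one of two head directions (left or right).
    """
    cases = states * colors                 # entries in the rule table
    choices_per_case = 2 * states * colors  # direction * new state * new color
    return choices_per_case ** cases


if __name__ == "__main__":
    for s, k in [(2, 2), (2, 3)]:
        print((s, k), count_turing_machines(s, k))
```

For (2,2) this gives 8^4 = 4096, and for (2,3) it gives 12^6 = 2,985,984, which is the "3 million or so". The jump between the two is why an exhaustive study that was tedious but feasible for (2,2) machines was a much bigger undertaking for (2,3).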

Back in the early 1960s, even though he didn’t find complexity just by searching simple “naturally occurring” Turing machines, Marvin still wanted to construct the simplest one he could that would exhibit it. And through painstaking work, he came up in 1962 with a (7,4) Turing machine that he proved was universal (and so, in a sense, capable of arbitrarily complex behavior).

At the time, Marvin’s (7,4) Turing machine was the simplest known universal Turing machine. And it kept that record essentially unbroken for 40 years—until I finally published a (2,5) universal Turing machine in *A New Kind of Science*. I felt a little guilty taking the record away from Marvin’s machine after so long. But Marvin was very nice about it. And a few years later he enthusiastically agreed to be on the committee for a prize I put up to establish whether a (2,3) Turing machine that I had identified as the simplest possible candidate for universality was in fact universal.

It didn’t take long for a proof of universality to be submitted, and Marvin got quite involved in some of the technical details of validating it, noting that perhaps we should all have known something like this was possible, given the complexity that Emil Post had observed with the simple rules of what he called a tag system—back in 1921, before Marvin was even born.

When it came to science, it sometimes seemed as if there were two Marvins. One was the Marvin trained in mathematics who could give precise proofs of theorems. The other was the Marvin who talked about big and often quirky ideas far away from anything like mathematical formalization.

I think Marvin was ultimately disappointed with what could be achieved by mathematics and formalization. In his early years he had thought that with simple artificial neural networks—and maybe things like Turing machines—it would be easy to build systems that worked like brains. But it never seemed to happen. And in 1969, with his long-time mathematician collaborator Seymour Papert, Marvin wrote a book that proved that a certain simple class of neural networks known as perceptrons couldn’t (in Marvin’s words) “do anything interesting”.

To Marvin’s later chagrin, people took the book to show that no neural network of any kind could ever do anything interesting, and research on neural networks all but stopped. But a bit like with the (2,2) Turing machines, much richer behavior was actually lurking just out of sight. It started being noticed in the 1980s, but it’s only been in the last couple of years—with computers able to handle almost-brain-scale networks—that the richness of what neural networks can do has begun to become clear.

And although I don’t think anyone could have known it then, we now know that the neural networks Marvin was investigating as early as 1951 were actually on a path that would ultimately lead to just the kind of impressive AI capabilities he was hoping for. It’s a pity it took so long, and Marvin barely got to see it. (When we released our neural-network-based image identifier last year, I sent Marvin a pointer saying “I never thought neural networks would actually work… but…” Sadly, I never ended up talking to Marvin about it.)

Marvin’s earliest approaches to AI were through things like neural networks. But perhaps through the influence of John McCarthy, the inventor of LISP, with whom Marvin started the MIT AI Lab, Marvin began to consider more “symbolic” approaches to AI as well. And in 1961 Marvin got a student of his to write a program in LISP to do symbolic integration. Marvin told me that he wanted the program to be as “human like” as possible—so every so often it would stop and say “Give me a cookie”, and the user would have to respond “A cookie”.

By the standards of Mathematica or Wolfram|Alpha, the 1961 integration program was very primitive. But I’m certainly glad Marvin had it built. Because it started a sequence of projects at MIT that led to the MACSYMA system that I ended up using in the 1970s—that in many ways launched my efforts on SMP and eventually Mathematica.

Marvin himself, though, didn’t go on thinking about using computers to do mathematics, but instead started working on how they might do the kind of tasks that all humans—including children—routinely do. Marvin’s collaborator Seymour Papert, who had worked with developmental psychologist Jean Piaget, was interested in how children learn, and Marvin got quite involved in Seymour’s project of developing a computer language for children. The result was Logo—a direct precursor of Scratch—and for a brief while in the 1970s Marvin and Seymour had a company that tried to market Logo and a hardware “turtle” to schools.

For me there was always a certain mystique around Marvin’s theories about AI. In some ways they seemed like psychology, and in some ways philosophy. But occasionally there’d actually be pieces of software—or hardware—that claimed to implement them, often in ways that I didn’t understand very well.

Probably the most spectacular example was the Connection Machine, developed by Marvin’s student Danny Hillis and his company Thinking Machines (for which Richard Feynman and I were both consultants). It was always in the air that the Connection Machine was built to implement one of Marvin’s theories about the brain, and might be seen one day as like the “transistor of artificial intelligence”. But I, for example, ended up using its massively parallel architecture to implement cellular automaton models of fluids, and not anything AI-ish at all.

Marvin was always having new ideas and theories. And even as the Connection Machine was being built, he was giving me drafts of his book *The Society of Mind*, which talked about new and different approaches to AI. Ever one to do the unusual, Marvin told me he thought about writing the book in verse. But instead the book is structured a bit like so many conversations I had with Marvin: with one idea on each page, often good, but sometimes not—yet always lively.

I think Marvin viewed *The Society of Mind* as his magnum opus, and I think he was disappointed that more people didn’t understand and appreciate it. It probably didn’t help that the book came out in the 1980s, when AI was at its lowest ebb. But somehow I think to really appreciate what’s in the book one would need Marvin there, presenting his ideas with his characteristic personal energy and responding to any objections one might have about them.

Marvin was used to having theories about thinking that could be figured out just by thinking—a bit like the ancient philosophers had done. But Marvin was interested in everything, including physics. He wasn’t an expert on the formalism of physics, though he did make contributions to physics topics (notably patenting a confocal microscope). And through his long-time friend Ed Fredkin, he had already been introduced to cellular automata in the early 1960s. He really liked the philosophy of having physics based on them—and ended up for example writing a paper entitled “Nature Abhors an Empty Vacuum” that talked about how one might in effect engineer certain features of physics from cellular automata.

Marvin didn’t do terribly much with cellular automata, though in 1970 he and Fredkin used something like them in the Triadex Muse digital music synthesizer that they patented and marketed—an early precursor of cellular-automaton-based music composition.

Marvin was very supportive of my work on cellular automata and other simple programs, though I think he found my orientation towards natural science a bit alien. During the decade that I worked on *A New Kind of Science* I interacted with Marvin with some regularity. He was starting work on a book then too, about emotions, that he told me in 1992 he hoped “might reform how people think about themselves”. I talked to him occasionally about his book, trying I suppose to understand the epistemological character of it (I once asked if it was a bit like Freud in this respect, and he said yes). It took 15 years for Marvin to finish what became *The Emotion Machine*. I know he had other books planned too; in 2006, for example, he told me he was working on a book on theology that was “a couple of years away”—but which sadly never saw the light of day.

It was always a pleasure to see Marvin. Often it would be at his big house in Brookline, Massachusetts. As soon as one entered, Marvin would start saying something unusual. It could be, “What would we conclude if the sun didn’t set today?” Or, “You’ve got to come see the actual binary tree in my greenhouse.” Once someone told me that Marvin could give a talk about almost anything, but if one wanted it to be good, one should ask him an interesting question just before he started, and then that’d be what he would talk about. I realized this was how to handle conversations with Marvin too: bring up a topic and then he could be counted on to say something unusual and often interesting about it.

I remember a few years ago bringing up the topic of teaching programming, and how I was hoping the Wolfram Language would be relevant to it. Marvin immediately launched into talking about how programming languages are the only ones that people are expected to learn to write before they can read. He said he’d been trying to convince Seymour Papert that the best way to teach programming was to start by showing people good code. He gave the example of teaching music by giving people *Eine kleine Nachtmusik*, and asking them to transpose it to a different rhythm and see what bugs occur. (Marvin was a long-time enthusiast of classical music.) In just this vein, one way the Wolfram Programming Lab that we launched just last week lets people learn programming is by starting with good code, and then having them modify it.

There was always a certain warmth to Marvin. He liked and supported people; he connected with all sorts of interesting people; he enjoyed telling nice stories about people. His house always seemed to buzz with activity, even as, over the years, it piled up with stuff to the point where the only free space was a tiny part of a kitchen table.

Marvin also had a great love of ideas. Ones that seemed important. Ones that were strange and unusual. But I think in the end Marvin’s greatest pleasure was in connecting ideas with people. He was a hacker of ideas, but I think the ideas became meaningful to him when he used them as a way to connect with people.

I shall miss all those conversations about ideas—both ones I thought made sense and ones I thought didn’t. Of course, Marvin was always a great enthusiast of cryonics, so perhaps this isn’t the end of the story. But at least for now, farewell, Marvin, and thank you.


I’ve long wanted to have a way to let anybody—kids, adults, whoever—get a hands-on introduction to the Wolfram Language and everything it makes possible, even if they’ve had no experience with programming before. Now we have a way!

The startup screen gives four places to go. First, there’s a quick video. Then it’s hands on, with “Try It Yourself”—going through some very simple but interesting computations.

Then there are two different paths. Either start learning systematically—or jump right in and explore. My new book *An Elementary Introduction to the Wolfram Language* is the basis for the systematic approach.

The whole book is available inside Wolfram Programming Lab. And the idea is that as you read the book, you can immediately try things out for yourself—whether you’re making up your own computations, or doing the exercises given in the book.

But there’s also another way to use Wolfram Programming Lab: just jump right in and explore. Programming Lab comes with several dozen Explorations—each providing an activity with a different focus. When you open an Exploration, you see a series of steps with code ready to run.

Press Shift+Enter (or the button) to run each piece of code and see what it does—or edit the code first and then run your own version. The idea is always to start with a piece of code that works, and then modify it to do different things. It’s like you’re starting off learning to read the language; then you’re beginning to write it. You can always press the “Show Details” button to open up an explanation of what’s going on.

Each Exploration goes through a series of steps to build to a final result. But then there’s usually a “Go Further” button that gives you suggestions for free-form projects to do based on the Exploration.

When you create something neat, you can share it with your friends, teachers, or anyone else. Just press the button to create a webpage of what you’ve made.

I first started thinking about making something like Wolfram Programming Lab quite a while ago. I’d had lots of great experiences showing the Wolfram Language in person to people from middle-school-age on up. But I wanted us to find a way for people to get started with the Wolfram Language on their own.

We used our education expertise to put together a whole series of what seemed like good approaches, building prototypes and testing them with groups of kids. It was often a sobering experience—with utter failure in a matter of minutes. Sometimes the problem was that there was nothing the kids found interesting. Sometimes the kids were confused about what to do. Sometimes they’d do a little, but clearly not understand what they were doing.

At first we thought that it was just a matter of finding the one “right approach”: immersion language learning, systematic exercise-based learning, project-based learning, or something else. But gradually we realized we needed to allow not just one approach, but instead several that could be used interchangeably on different occasions or by different people. And once we did this, our tests started to be more and more successful—leading us in the end to the Wolfram Programming Lab that we have today.

I’m very excited about the potential of Wolfram Programming Lab. In fact, we’ve already started developing a whole ecosystem around it—with online and offline educational and community programs, lots of opportunities for students, educators, volunteers and others, and a whole variety of additional deployment channels.

Wolfram Programming Lab can be used by people on their own—but it can also be used by teachers in classrooms. Explain things through a demo based on an Exploration. Do a project based on a Go Further suggestion (with live coding if you’re bold). Use the *Elementary Introduction* book as the basis for lectures or independent reading. Use exercises from the book as class projects or homework.

Wolfram Programming Lab is something that’s uniquely made possible by the Wolfram Language. Because it’s only with the whole knowledge-based programming approach—and all the technology we’ve built—that one gets to the point where simple code can routinely do really interesting and compelling things.

It’s a very important—and in fact transformative—moment for programming education.

In the past one could use a “toy programming language” like Scratch, or one could use a professional low-level programming language like C++ or Java. Scratch is easy to use, but is very limited. C++ or Java can ultimately do much more (though they don’t have built-in knowledge), but you need to put in significant time—and get deep into the engineering details—to make programs that get beyond a toy level of functionality.

With the Wolfram Language, though, it’s a completely different story. Because now even beginners can write programs that do really interesting things. And the programs don’t have to just be “computer science exercises”: they can be programs that immediately connect to the real world, and to what students study across the whole curriculum.

Wolfram Programming Lab gives people a broad way to learn modern programming—and to acquire an incredibly valuable career-building practical skill. But it also helps develop the kind of computational thinking that’s increasingly central to today’s world.

For many students (and others) today, Wolfram|Alpha serves as a kind of “zeroth” programming language. The Wolfram Language is not only an incredibly powerful professional programming language, but also a great first programming language. Wolfram Programming Lab lets people learn the Wolfram Language—and computational thinking—while preserving as much as possible the accessibility and simplicity of Wolfram|Alpha.

I’m excited to see how Wolfram Programming Lab is used. I think it’s going to open up programming like never before—and give all sorts of people around the world the opportunity to join the new generation of programmers who turn ideas into reality using computational thinking and the Wolfram Language.


Ada Lovelace was born 200 years ago today. To some she is a great hero in the history of computing; to others an overestimated minor figure. I’ve been curious for a long time what the real story is. And in preparation for her bicentennial, I decided to try to solve what for me has always been the “mystery of Ada”.

It was much harder than I expected. Historians disagree. The personalities in the story are hard to read. The technology is difficult to understand. The whole story is entwined with the customs of 19th-century British high society. And there’s a surprising amount of misinformation and misinterpretation out there.

But after quite a bit of research—including going to see many original documents—I feel like I’ve finally gotten to know Ada Lovelace, and gotten a grasp on her story. In some ways it’s an ennobling and inspiring story; in some ways it’s frustrating and tragic.

It’s a complex story, and to understand it, we’ll have to start by going over quite a lot of facts and narrative.

This essay is in *Idea Makers: Personal Perspectives on the Lives & Ideas of Some Notable People* »

Let’s begin at the beginning. Ada Byron, as she was then called, was born in London on December 10, 1815 to recently married high-society parents. Her father, Lord Byron (George Gordon Byron) was 27 years old, and had just achieved rock-star status in England for his poetry. Her mother, Annabella Milbanke, was a 23-year-old heiress committed to progressive causes, who inherited the title Baroness Wentworth. Her father said he gave her the name “Ada” because “It is short, ancient, vocalic”.

Ada’s parents were something of a study in opposites. Byron had a wild life—and became perhaps the top “bad boy” of the 19th century—with dark episodes in childhood, and lots of later romantic and other excesses. In addition to writing poetry and flouting the social norms of his time, he was often doing the unusual: keeping a tame bear in his college rooms in Cambridge, living it up with poets in Italy and “five peacocks on the grand staircase”, writing a grammar book of Armenian, and—had he not died too soon—leading troops in the Greek war of independence (as celebrated by a big statue in Athens), despite having no military training whatsoever.

Annabella Milbanke was an educated, religious and rather proper woman, interested in reform and good works, and nicknamed by Byron “Princess of Parallelograms”. Her very brief marriage to Byron fell apart when Ada was just 5 weeks old, and Ada never saw Byron again (though he kept a picture of her on his desk and famously mentioned her in his poetry). He died at the age of 36, at the height of his celebrityhood, when Ada was 8. There was enough scandal around him to fuel hundreds of books, and the PR battle between the supporters of Lady Byron (as Ada’s mother styled herself) and of him lasted a century or more.

Ada led an isolated childhood on her mother’s rented country estates, with governesses and tutors and her pet cat, Mrs. Puff. Her mother, often absent for various (quite wacky) health cures, enforced a system of education for Ada that involved long hours of study and exercises in self control. Ada learned history, literature, languages, geography, music, chemistry, sewing, shorthand and mathematics (taught in part through experiential methods) to the level of elementary geometry and algebra. When Ada was 11, she went with her mother and an entourage on a year-long tour of Europe. When she returned she was enthusiastically doing things like studying what she called “flyology”—and imagining how to mimic bird flight with steam-powered machines.

But then she got sick with measles (and perhaps encephalitis)—and ended up bedridden and in poor health for 3 years. She finally recovered in time to follow the custom for society girls of the period: on turning 17 she went to London for a season of socializing. On June 5, 1833, 26 days after she was “presented at Court” (i.e. met the king), she went to a party at the house of 41-year-old Charles Babbage (whose oldest son was the same age as Ada). Apparently she charmed the host, and he invited her and her mother to come back for a demonstration of his newly constructed Difference Engine: a 2-foot-high hand-cranked contraption with 2000 brass parts, now to be seen at the Science Museum in London:

Ada’s mother called it a “thinking machine”, and reported that it “raised several Nos. to the 2nd & 3rd powers, and extracted the root of a Quadratic Equation”. It would change the course of Ada’s life.

What was the story of Charles Babbage? His father was an enterprising and successful (if personally distant) goldsmith and banker. After various schools and tutors, Babbage went to Cambridge to study mathematics, but soon was intent on modernizing the way mathematics was done there, and with his lifelong friends John Herschel (son of the discoverer of Uranus) and George Peacock (later a pioneer in abstract algebra), founded the Analytical Society (which later became the Cambridge Philosophical Society) to push for reforms like replacing Newton’s (“British”) dot-based notation for calculus with Leibniz’s (“Continental”) function-based one.

Babbage graduated from Cambridge in 1814 (a year before Ada Lovelace was born), went to live in London with his new wife, and started establishing himself on the London scientific and social scene. He didn’t have a job as such, but gave public lectures on astronomy and wrote respectable if unspectacular papers about various mathematical topics (functional equations, continued products, number theory, etc.)—and was supported, if modestly, by his father and his wife’s family.

In 1819 Babbage visited France, and learned about the large-scale government project there to make logarithm and trigonometry tables. Mathematical tables were of considerable military and commercial significance in those days, being used across science, engineering and finance, as well as in areas like navigation. It was often claimed that errors in tables could make ships run aground or bridges collapse.

Back in England, Babbage and Herschel started a project to produce tables for their new Astronomical Society, and it was in the effort to check these tables that Babbage is said to have exclaimed, “I wish to God these tables had been made by steam!”—and began his lifelong effort to mechanize the production of tables.

There were mechanical calculators long before Babbage. Pascal made one in 1642, and we now know there was even one in antiquity. But in Babbage’s day such machines were still just curiosities, not reliable enough for everyday practical use. Tables were made by human computers, with the work divided across a team, and the lowest-level computations being based on evaluating polynomials (say from series expansions) using the method of differences.

What Babbage imagined is that there could be a machine—a Difference Engine—that could be set up to compute any polynomial up to a certain degree using the method of differences, and then automatically step through values and print the results, taking humans and their propensity for errors entirely out of the loop.
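To make the method of differences concrete, here is a minimal Python sketch of the idea, not a model of Babbage's actual mechanism. The function names `difference_table` and `tabulate` are hypothetical, chosen only for illustration. The point it shows is that once the initial value and its differences are set up, every further value of the polynomial comes from additions alone, which is the kind of operation a train of wheels can carry out.

```python
def difference_table(f, start, order):
    """Return [f(start), first difference, second difference, ...] for step size 1.

    `order` should be the degree of the polynomial f, so that the last
    difference is constant.
    """
    values = [f(start + i) for i in range(order + 1)]
    column = []
    while values:
        column.append(values[0])
        # Replace the list of values by the list of their successive differences.
        values = [b - a for a, b in zip(values, values[1:])]
    return column


def tabulate(registers, count):
    """Produce `count` successive values using only additions."""
    regs = list(registers)
    results = []
    for _ in range(count):
        results.append(regs[0])
        # Add each higher-order difference into the register below it.
        # Going from low to high order means each addition uses the
        # not-yet-updated value of the next register, as the method requires.
        for i in range(len(regs) - 1):
            regs[i] += regs[i + 1]
    return results


if __name__ == "__main__":
    cubes = difference_table(lambda x: x**3, 0, 3)
    print(cubes)               # initial value and differences for x^3
    print(tabulate(cubes, 6))  # successive cubes, by addition only
```

For a polynomial of degree n, only n + 1 numbers need to be set up by hand; everything after that is repeated addition, which is why the human "computers" of the time could already work this way and why it was a natural target for mechanization.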

(Museum of the History of Science)

By early 1822, the 30-year-old Babbage was busy studying different types of machinery, and producing plans and prototypes of what the Difference Engine could be. The Astronomical Society he’d co-founded awarded him a medal for the idea, and in 1823 the British government agreed to provide funding for the construction of such an engine.

Babbage was slightly distracted in 1824 by the prospect of joining a life insurance startup, for which he did a collection of life-table calculations. But he set up a workshop in his stable (his “garage”), and kept on having ideas about the Difference Engine and how its components could be made with the tools of his time.

In 1827, Babbage’s table of logarithms—computed by hand—was finally finished, and would be reprinted for nearly 100 years. Babbage had them printed on yellow paper on the theory that this would minimize user error. (When I was in elementary school, logarithm tables were still the fast way to do multiplication.)

Also in 1827, Babbage’s father died, leaving him about £100K, or perhaps $14 million today, setting up Babbage financially for the rest of his life. The same year, though, his wife died. She had had eight children with him, but only three survived to adulthood.

Dispirited by his wife’s death, Babbage took a trip to continental Europe, and being impressed by what he saw of the science being done there, wrote a book entitled *Reflections on the Decline of Science in England*, which ended up being mainly a diatribe against the Royal Society (of which he was a member).

Though often distracted, Babbage continued to work on the Difference Engine, generating thousands of pages of notes and designs. He was quite hands on when it came to personally drafting plans or doing machine-shop experiments. But he was quite hands off in managing the engineers he hired—and he did not do well at managing costs. Still, by 1832 a working prototype of a small Difference Engine (without a printer) had successfully been completed. And this is what Ada Lovelace saw in June 1833.

(Science Museum / Science & Society Picture Library)

Ada’s encounter with the Difference Engine seems to be what ignited her interest in mathematics. She had gotten to know Mary Somerville, translator of Laplace and a well-known expositor of science—and partly with her encouragement, was soon, for example, enthusiastically studying Euclid. And in 1834, Ada went along on a philanthropic tour of mills in the north of England that her mother was doing, and was quite taken with the then-high-tech equipment they had.

On the way back, Ada taught some mathematics to the daughters of one of her mother’s friends. She continued by mail, noting that this could be “the commencement of ‘A Sentimental Mathematical Correspondence carried on for years between two ladies of rank’ to be hereafter published no doubt for the edification of mankind, or womankind”. It wasn’t sophisticated math, but what Ada said was clear, complete with admonitions like “You should never select an *indirect* proof, when a *direct* one can be given.” (There’s a lot of underlining, here shown as italics, in all Ada’s handwritten correspondence.)

Babbage seems at first to have underestimated Ada, trying to interest her in the Silver Lady automaton toy that he used as a conversation piece for his parties (and noting his addition of a turban to it). But Ada continued to interact with (as she put it) Mr. Babbage and Mrs. Somerville, both separately and together. And soon Babbage was opening up to her about many intellectual topics, as well as about the trouble he was having with the government over funding of the Difference Engine.

In the spring of 1835, when Ada was 19, she met 30-year-old William King (or, more accurately, William, Lord King). He was a friend of Mary Somerville’s son, had been educated at Eton (the same school where I went 150 years later) and Cambridge, and then had been a civil servant, most recently at an outpost of the British Empire in the Greek islands. William seems to have been a precise, conscientious and decent man, if somewhat stiff. But in any case, Ada and he hit it off, and they were married on July 8, 1835, with Ada keeping the news quiet until the last minute to avoid paparazzi-like coverage.

The next several years of Ada’s life seem to have been dominated by having three children and managing a large household—though she had some time for horse riding, learning the harp, and mathematics (including topics like spherical trigonometry). In 1837, Queen Victoria (then 18) came to the throne, and as a member of high society, Ada met her. In 1838, William was made an earl for his government work, and Ada became the Countess of Lovelace.

Within a few months of the birth of her third child in 1839, Ada decided to get more serious about mathematics again. She told Babbage she wanted to find a “mathematical Instructor” in London, though asked that in making enquiries he not mention her name, presumably for fear of society gossip.

The person identified was Augustus De Morgan, first professor of mathematics at University College London, noted logician, author of several textbooks, and not only a friend of Babbage’s, but also the husband of the daughter of Ada’s mother’s main childhood teacher. (Yes, it was a small world. De Morgan was also a friend of George Boole’s—and was the person who indirectly caused Boolean algebra to be invented.)

In Ada’s correspondence with Babbage, she showed interest in discrete mathematics, and wondered, for example, if the game of solitaire “admits of being put into a mathematical Formula, and solved”. But in keeping with the math education traditions of the time (and still today), De Morgan set Ada on studying calculus.

Her letters to De Morgan about calculus are not unlike letters from a calculus student today—except for the Victorian English. Even many of the confusions are the same—though Ada was more sensitive than some to the bad notations of calculus (“why can’t one multiply by dx?”, etc.). Ada was a tenacious student, and seems to have had a great time learning more and more about mathematics. She was pleased by the mathematical abilities she discovered in herself, and by De Morgan’s positive feedback about them. She also continued to interact with Babbage, and in connection with one of his visits to her estate (in January 1841, when she was 25), she charmingly told the then-49-year-old Babbage, “If you are a *Skater*, pray bring *Skates* to Ockham; that being the fashionable occupation here now, & one *I* have much taken to.”

Ada’s relationship with her mother was a complex one. Outwardly, Ada treated her mother with great respect. But in many ways she seems to have found her controlling and manipulative. Ada’s mother was constantly announcing that she had medical problems and might die imminently (she actually lived to age 67). And she increasingly criticized Ada for her child rearing, household management and deportment in society. But by February 6, 1841, Ada was feeling good enough about herself and her mathematics to write a very open letter to her mother about her thoughts and aspirations.

She wrote: “I believe myself to possess a most singular combination of qualities exactly fitted to make me pre-eminently a discoverer of the hidden realities of nature.” She talked of her ambition to do great things. She talked of her “insatiable & restless energy” which she believed she finally had found a purpose for. And she talked about how after 25 years she had become less “secretive & suspicious” with respect to her mother.

But then, three weeks later, her mother dropped a bombshell, claiming that before Ada was born, Byron and his half-sister had had a child together. Incest like that wasn’t actually illegal in England at the time, but it was scandalous. Ada took the whole thing very hard, and it derailed her from mathematics.

Ada had had intermittent health problems for years, but in 1841 they apparently worsened, and she started systematically taking opiates. She was very keen to excel in something, and began to get the idea that perhaps it should be music and literature rather than math. But her husband William seems to have talked her out of this, and by late 1842 she was back to doing mathematics.

What had Babbage been up to while all this had been going on? He’d been doing all sorts of things, with varying degrees of success.

After several attempts, he’d rather honorifically been appointed Lucasian Professor of Mathematics at Cambridge—but never really even spent time in Cambridge. Still, he wrote what turned out to be a fairly influential book, *On the Economy of Machinery and Manufactures*, dealing with such things as how to break up tasks in factories (an issue that had actually come up in connection with the human computation of mathematical tables).

In 1837, he weighed in on the then-popular subject of natural theology, appending his *Ninth Bridgewater Treatise* to the series of treatises written by other people. The central question was whether there is evidence of a deity from the apparent design seen in nature. Babbage’s book is quite hard to read, opening for example with, “The notions we acquire of contrivance and design arise from comparing our observations on the works of other beings with the intentions of which we are conscious in our own undertakings.”

In apparent resonance with some of my own work 150 years later, he talks about the relationship between mechanical processes, natural laws and free will. He makes statements like “computations of great complexity can be effected by mechanical means”, but then goes on to claim (with rather weak examples) that a mechanical engine can produce sequences of numbers that show unexpected changes that are like miracles.

Babbage tried his hand at politics, running for parliament twice on a manufacturing-oriented platform, but failed to get elected, partly because of claims of misuse of government funds on the Difference Engine.

Babbage also continued to have upscale parties at his large and increasingly disorganized house in London, attracting such luminaries as Charles Dickens, Charles Darwin, Florence Nightingale, Michael Faraday and the Duke of Wellington—with his aged mother regularly in attendance. But even though the degrees and honors that he listed after his name ran to 6 lines, he was increasingly bitter about his perceived lack of recognition.

Central to this was what had happened with the Difference Engine. Babbage had hired one of the leading engineers of his day to actually build the engine. But somehow, after a decade of work—and despite lots of precision machine tool development—the actual engine wasn’t done. Back in 1833, shortly after he met Ada, Babbage had tried to rein in the project—but the result was that his engineer quit, and insisted on keeping all the plans for the Difference Engine, even the ones that Babbage himself had drawn.

But right around this time, Babbage decided he’d had a better idea anyway. Instead of making a machine that would just compute differences, he imagined an “Analytical Engine” that supported a whole list of possible kinds of operations, that could in effect be done in an arbitrarily programmed sequence. At first, he just thought about having the machine evaluate fixed formulas, but as he studied different use cases, he added other capabilities, like conditionals—and figured out often very clever ways to implement them mechanically. But, most important, he figured out how to control the steps in a computation using punched cards of the kind that had been invented in 1801 by Jacquard for specifying patterns of weaving on looms.

(Museum of the History of Science)

Babbage created some immensely complicated designs, and today it seems remarkable that they could work. But back in 1826 Babbage had invented something he called Mechanical Notation—that was intended to provide a symbolic representation for the operation of machinery in the same kind of way that mathematical notation provides a symbolic representation for operations in mathematics.

Babbage was disappointed already in 1826 that people didn’t appreciate his invention. Undoubtedly people didn’t understand it, since even now it’s not clear how it worked. But it may have been Babbage’s greatest invention—because apparently it’s what let him figure out all his elaborate designs.

Babbage’s original Difference Engine project had cost the British government £17,500 or the equivalent of perhaps $2 million today. It was a modest sum relative to other government expenditures, but the project was unusual enough to lead to a fair amount of discussion. Babbage was fond of emphasizing that—unlike many of his contemporaries—he hadn’t taken government money himself (despite chargebacks for renovating his stable as a fireproof workshop, etc.). He also claimed that he eventually spent £20,000 of his own money—or the majority of his fortune (no, I don’t see how the numbers add up)—on his various projects. And he kept on trying to get further government support, and created plans for a Difference Engine No. 2, requiring only 8000 parts instead of 25,000.

By 1842, the government had changed, and Babbage insisted on meeting with the new prime minister (Robert Peel), but ended up just berating him. In parliament the idea of funding the Difference Engine was finally killed with quips like that the machine should be set to compute when it would be of use. (The transcripts of debates about the Difference Engine have a certain charm—especially when they discuss its possible uses for state statistics that strangely parallel computable-country opportunities with Wolfram|Alpha today.)

Despite the lack of support in England, Babbage’s ideas developed some popularity elsewhere, and in 1840 Babbage was invited to lecture on the Analytical Engine in Turin, and given honors by the Italian government.

Babbage had never published a serious account of the Difference Engine, and had never published anything at all about the Analytical Engine. But he talked about the Analytical Engine in Turin, and notes were taken by a certain Luigi Menabrea, who was then a 30-year-old army engineer—but who, 27 years later, became prime minister of Italy (and also made contributions to the mathematics of structural analysis).

In October 1842, Menabrea published a paper in French based on his notes. When Ada saw the paper, she decided to translate it into English and submit it to a British publication. Many years later Babbage claimed he suggested to Ada that she write her own account of the Analytical Engine, and that she had responded that the thought hadn’t occurred to her. But in any case, by February 1843, Ada had resolved to do the translation but add extensive notes of her own.

Over the months that followed she worked very hard—often exchanging letters almost daily with Babbage (despite sometimes having other “pressing and unavoidable engagements”). And though in those days letters were sent by post (which did come 6 times a day in London at the time) or carried by a servant (Ada lived about a mile from Babbage when she was in London), they read a lot like emails about a project might today, apart from being in Victorian English. Ada asks Babbage questions; he responds; she figures things out; he comments on them. She was clearly in charge, but felt she was first and foremost explaining Babbage’s work, so wanted to check things with him—though she got annoyed when Babbage, for example, tried to make his own corrections to her manuscript.

It’s charming to read Ada’s letter as she works on debugging her computation of Bernoulli numbers: “My Dear Babbage. I am in much dismay at having got into so amazing a quagmire & botheration with these *Numbers*, that I cannot possibly get the thing done today. …. I am now going out on horseback. Tant mieux.” Later she told Babbage: “I have worked incessantly, & most successfully, all day. You will admire the Table & Diagram extremely. They have been made out with extreme care, & all the indices most minutely & scrupulously attended to.” Then she added that William (or “Lord L.” as she referred to him) “is at this moment kindly inking it all over for me. I had to do it in pencil…”

William was also apparently the one who suggested that she sign the translation and notes. As she wrote to Babbage: “It is not my wish to *proclaim* who has written it; at the same time I rather wish to append anything that may tend hereafter to *individualize*, & *identify* it, with the other productions of the said A.A.L.” (for “Ada Augusta Lovelace”).

By the end of July 1843, Ada had pretty much finished writing her notes. She was proud of them, and Babbage was complimentary about them. But Babbage wanted one more thing: he wanted to add an anonymous preface (written by him) that explained how the British government had failed to support the project. Ada thought it a bad idea. Babbage tried to insist, even suggesting that without the preface the whole publication should be withdrawn. Ada was furious, and told Babbage so. In the end, Ada’s translation appeared, signed “AAL”, without the preface, followed by her notes headed “Translator’s Note”.

Ada was clearly excited about it, sending reprints to her mother, and explaining that “No one can estimate the trouble & *interminable* labour of having to revise the printing of *mathematical* formulae. This is a pleasant prospect for the future, as I suppose many hundreds & thousands of such formulae will come forth from my pen, in one way or another.” She said that her husband William had been excitedly giving away copies to his friends too, and Ada wrote, “William especially conceives that it places me in a much *juster* & *truer* position & light, than anything else can. And he tells me that it has already placed *him* in a far more agreeable position in this country.”

Within days, there was also apparently society gossip about Ada’s publication. She explained to her mother that she and William “are by no means desirous of making it a secret, altho’ I do not wish the *importance* of the thing to be exaggerated and overrated”. She saw herself as being a successful expositor and interpreter of Babbage’s work, setting it in a broader conceptual framework that she hoped could be built on.

There’s lots to say about the actual content of Ada’s notes. But before we get to that, let’s finish the story of Ada herself.

While Babbage’s preface wasn’t itself a great idea, one good thing it did for posterity was to cause Ada on August 14, 1843 to write Babbage a fascinating, and very forthright, 16-page letter. (Unlike her usual letters, which were on little folded pages, this was on large sheets.) In it, she explains that while he is often “implicit” in what he says, she is herself “always a very ‘explicit function of x’”. She says that “Your affairs have been, & are, deeply occupying both myself and Lord Lovelace…. And the result is that I have plans for you…” Then she proceeds to ask, “If I am to lay before you in the course of a year or two, explicit & honorable propositions for *executing your engine* … would there be any chance of allowing myself … to conduct the business for you; your own *undivided* energies being devoted to the execution of the work …”

In other words, she basically proposed to take on the role of CEO, with Babbage becoming CTO. It wasn’t an easy pitch to make, especially given Babbage’s personality. But she was skillful in making her case, and as part of it, she discussed their different motivation structures. She wrote, “My own uncompromising principle is to endeavour to love *truth & God before fame & glory* …”, while “Yours is to love truth & God … but to love *fame, glory, honours, yet more*.” Still, she explained, “Far be it from me, to disclaim the influence of ambition & fame. No living soul ever was more imbued with it than myself … but I certainly would not deceive myself or others by pretending it is other than a very important motive & ingredient in my character & nature.”

She ended the letter, “I wonder if you will choose to retain the lady-fairy in your service or not.”

At noon the next day she wrote to Babbage again, asking if he would help in “the *final* revision”. Then she added, “You will have had my long letter this morning. Perhaps you will not choose to have anything more to do with me. But I hope the best…”

At 5 pm that day, Ada was in London, and wrote to her mother: “I am uncertain as yet how the Babbage business will end…. I have written to him … very explicitly; stating my own *conditions* … He has so strong an idea of the *advantage* of having *my* pen as his servant, that he will probably yield; though I demand very strong concessions. If he *does* consent to what I propose, I shall probably be enabled to keep him out of much hot water; & to bring his engine to *consummation*, (which all I have seen of him & his habits the last 3 months, makes me scarcely anticipate it ever *will* be, unless someone really exercises a strong coercive influence over him). He is beyond measure *careless* & *desultory* at times. — I shall be willing to be his Whipper-in during the next 3 years if I see fair prospect of success.”

But on Babbage’s copy of Ada’s letter, he scribbled, “Saw A.A.L. this morning and refused all the conditions”.

Yet on August 18, Babbage wrote to Ada about bringing drawings and papers when he would next come to visit her. The next week, Ada wrote to Babbage that “We are quite delighted at your (somewhat *unhoped* for) proposal” [of a long visit with Ada and her husband]. And Ada wrote to her mother: “Babbage & I are I think more friends than ever. I have never seen him so agreeable, so reasonable, or in such good spirits!”

Then, on Sept. 9, Babbage wrote to Ada, expressing his admiration for her and (famously) describing her as “Enchantress of Number” and “my dear and much admired Interpreter”. (Yes, despite what’s often quoted, he wrote “Number” not “Numbers”.)

The next day, Ada responded to Babbage, “You are a brave man to give yourself wholly up to Fairy-Guidance!”, and Babbage signed off on his next letter as “Your faithful Slave”. And Ada described herself to her mother as serving as the “High-Priestess of Babbage’s Engine”.

But unfortunately that’s not how things worked out. For a while it was just that Ada had to take care of household and family things that she’d neglected while concentrating on her Notes. But then her health collapsed, and she spent many months going between doctors and various “cures” (her mother suggested “mesmerism”, i.e. hypnosis), all the while watching their effects on, as she put it, “that portion of the material forces of the world entitled the body of A.A.L.”

She was still excited about science, though. She’d interacted with Michael Faraday, who apparently referred to her as “the *rising star* of Science”. She talked about her first publication as her “first-born”, “with a colouring & undercurrent (rather *hinted at* & *suggested* than definitely expressed) of *large, general, & metaphysical views*”, and said that “He [the publication] will make an excellent head (I hope) of a large family of brothers & sisters”.

When her notes were published, Babbage had said “You should have written an original paper. The postponement of that will however only render it more perfect.” But by October 1844, it seemed that David Brewster (inventor of the kaleidoscope, among other things) would write about the Analytical Engine, and Ada asked if perhaps Brewster could suggest another topic for her, saying “I rather think some physiological topics would suit me as well as any.”

And indeed later that year, she wrote to a friend (who was also her lawyer, as well as being Mary Somerville’s son): “It does not appear to me that cerebral matter need be more unmanageable to mathematicians than *sidereal* & *planetary* matter & movements; if they would but inspect it from the *right point of view*. I hope to bequeath to the generations a *Calculus of the Nervous System*.” An impressive vision—10 years before, for example, George Boole would talk about similar things.

Both Babbage and Mary Somerville had started their scientific publishing careers with translations, and she saw herself as doing the same, saying that perhaps her next works would be reviews of Whewell and Ohm, and that she might eventually become a general “prophet of science”.

There were roadblocks, to be sure. Like that, at that time, as a woman, she couldn’t get access to the Royal Society’s library in London, even though her husband, partly through her efforts, was a member of the society. But the most serious issue was still Ada’s health. She had a whole series of problems, though in 1846 she was still saying optimistically “Nothing is needed but a year or two more of patience & *cure*”.

There were also problems with money. William had a never-ending series of elaborate—and often quite innovative—construction projects (he seems to have been particularly keen on towers and tunnels). And to finance them, they had to turn to Ada’s mother, who often made things difficult. Ada’s children were also approaching teenage-hood, and Ada was exercised by many issues that were coming up with them.

Meanwhile, she continued to have a good social relationship with Babbage, seeing him with considerable frequency, though in her letters talking more about dogs and pet parrots than the Analytical Engine. In 1848 Babbage developed a hare-brained scheme to construct an engine that played tic-tac-toe, and to tour it around the country as a way to raise money for his projects. Ada talked him out of it. The idea was raised for Babbage to meet Prince Albert to discuss his engines, but it never happened.

William also dipped his toe into publishing. He had already written short reports with titles like “Method of growing Beans and Cabbages on the same Ground” and “On the Culture of Mangold-Wurzel”. But in 1848 he wrote one more substantial piece, comparing the productivity of agriculture in France and England, based on detailed statistics, with observations like “It is demonstrable, not only that the Frenchman is much worse off than the Englishman, but that he is less well fed than during the devastating exhaustion of the empire.”

1850 was a notable year for Ada. She and William moved into a new house in London, intensifying their exposure to the London scientific social scene. She had a highly emotional experience visiting for the first time her father’s family’s former estate in the north of England—and got into an argument with her mother about it. And she got more deeply involved in betting on horseracing, and lost some money doing it. (It wouldn’t have been out of character for Babbage or her to invent some mathematical scheme for betting, but there’s no evidence that they did.)

In May 1851 the Great Exhibition opened at the Crystal Palace in London. (When Ada visited the site back in January, Babbage advised, “Pray put on worsted stockings, cork soles and every other thing which can keep you warm.”) The exhibition was a high point of Victorian science and technology, and Ada, Babbage and their scientific social circle were all involved (though Babbage less so than he thought he should be). Babbage gave out many copies of a flyer on his Mechanical Notation. William won an award for brick-making.

But within a year, Ada’s health situation was dire. For a while her doctors were just telling her to spend more time at the seaside. But eventually they admitted she had cancer (from what we know now, probably cervical cancer). Opium no longer controlled her pain; she experimented with cannabis. By August 1852, she wrote, “I begin to understand Death; which is going on quietly & gradually every minute, & will never be a thing of one particular moment”. And on August 19, she asked Babbage’s friend Charles Dickens to visit and read her an account of death from one of his books.

Her mother moved into her house, keeping other people away from her, and on September 1, Ada made an unknown confession that apparently upset William. She seemed close to death, but she hung on, in great pain, for nearly 3 more months, finally dying on November 27, 1852, at the age of 36. Florence Nightingale, nursing pioneer and friend of Ada’s, wrote: “They said she could not possibly have lived so long, were it not for the tremendous vitality of the brain, that would not die.”

Ada had made Babbage the executor of her will. And—much to her mother’s chagrin—she had herself buried in the Byron family vault next to her father, who, like her, died at age 36 (Ada lived 266 days longer). Her mother built a memorial that included a sonnet entitled “The Rainbow” that Ada wrote.

Ada’s funeral was small; neither her mother nor Babbage attended. But the obituaries were kind, if Victorian in their sentiments:

William outlived her by 41 years, eventually remarrying. Her oldest son—with whom Ada had many difficulties—joined the navy several years before she died, but deserted. Ada thought he might have gone to America (he was apparently in San Francisco in 1851), but in fact he died at 26 working in a shipyard in England. Ada’s daughter married a somewhat wild poet, spent many years in the Middle East, and became the world’s foremost breeder of Arabian horses. Ada’s youngest son inherited the family title, and spent most of his life on the family estate.

Ada’s mother died in 1860, but even then the gossip about her and Byron continued, with books and articles appearing, including Harriet Beecher Stowe’s 1870 *Lady Byron Vindicated*. In 1905, a year before he died, Ada’s youngest son—who had been largely brought up by Ada’s mother—published a book about the whole thing, with such choice lines as “Lord Byron’s life contained nothing of any interest except what ought not to have been told”.

When Ada died, there was a certain air of scandal that seemed to hang around her. Had she had affairs? Had she run up huge gambling debts? There’s scant evidence of either. Perhaps it was a reflection of her father’s “bad boy” image. But before long there were claims that she’d pawned the family jewels (twice!), or lost, some said, £20,000, or maybe even £40,000 (equivalent to about $7 million today) betting on horses.

It didn’t help that Ada’s mother and her youngest son both seemed against her. On September 1, 1852—the same day as her confession to William—Ada had written, “It is my earnest and dying request that all my friends who have letters from me will deliver them to my mother Lady Noel Byron after my death.” Babbage refused. But others complied, and, later on, when her son organized them, he destroyed some.

But many thousands of pages of Ada’s documents still remain, scattered around the world. Back-and-forth letters that read like a modern text stream, setting up meetings, or mentioning colds and other ailments. Charles Babbage complaining about the postal service. Three Greek sisters seeking money from Ada because their dead brother had been a page for Lord Byron. Charles Dickens talking about chamomile tea. Pleasantries from a person Ada met at Paddington Station. And household accounts, with entries for note paper, musicians, and ginger biscuits. And then, mixed in with all the rest, serious intellectual discussion about the Analytical Engine and many other things.

So what happened to Babbage? He lived 18 more years after Ada, dying in 1871. He tried working on the Analytical Engine again in 1856, but made no great progress. He wrote papers with titles like “On the Statistics of Light-Houses”, “Table of the Relative Frequency of Occurrences of the Causes of Breaking Plate-Glass Windows”, and “On Remains of Human Art, mixed with the Bones of Extinct Races of Animals”.

Then in 1864 he published his autobiography, *Passages from the Life of a Philosopher*—a strange and rather bitter document. The chapter on the Analytical Engine opens with a quote from a poem by Byron—“Man wrongs, and Time avenges”—and goes on from there. There are chapters on “Theatrical experience”, “Hints for travellers” (including advice on how to get an RV-like carriage in Europe), and, perhaps most peculiar, “Street nuisances”. For some reason Babbage waged a campaign against street musicians who he claimed woke him up at 6 am, and caused him to lose a quarter of his productive time. One wonders why he didn’t invent a sound-masking solution, but his campaign was so notable, and so odd, that when he died it was a leading element of his obituary.

Babbage never remarried after his wife died, and his last years seem to have been lonely ones. A gossip column of the time records impressions of him:

Apparently he was fond of saying that he would gladly give up the remainder of his life if he could spend just 3 days 500 years in the future. When he died, his brain was preserved, and is still on display…

Even though Babbage never finished his Difference Engine, a Swedish company did, and had even displayed part of it at the Great Exhibition. When Babbage died, many documents and spare parts from his Difference Engine project passed to his son Major-General Henry Babbage, who published some of the documents, and privately assembled a few more devices, including part of the Mill for the Analytical Engine. Meanwhile, the fragment of the Difference Engine that had been built in Babbage’s time was deposited at the Science Museum in London.

After Babbage died, his life work on his engines was all but forgotten (though it did, for example, get a mention in the 1911 Encyclopaedia Britannica). Mechanical computers nevertheless continued to be developed, gradually giving way to electromechanical ones, and eventually to electronic ones. And when programming began to be understood in the 1940s, Babbage’s work—and Ada’s Notes—were rediscovered.

People knew that “AAL” was Ada Augusta Lovelace, and that she was Byron’s daughter. Alan Turing read her Notes, and coined the term “Lady Lovelace’s Objection” (“an AI can’t originate anything”) in his 1950 Turing Test paper. But Ada herself was still largely a footnote at that point.

It was a certain Bertram Bowden—a British nuclear physicist who went into the computer industry and eventually became Minister of Science and Education—who “rediscovered” Ada. In researching his 1953 book *Faster Than Thought* (yes, about computers), he located Ada’s granddaughter Lady Wentworth (the daughter of Ada’s daughter), who told him the family lore about Ada, both accurate and inaccurate, and let him look at some of Ada’s papers. Charmingly, Bowden notes that in Ada’s granddaughter’s book *Thoroughbred Racing Stock*, there is use of binary in computing pedigrees. Ada, and the Analytical Engine, of course, used decimal, with no binary in sight.

But even in the 1960s, Babbage—and Ada—weren’t exactly well known. Babbage’s Difference Engine prototype had been given to the Science Museum in London, but even though I spent lots of time at the Science Museum as a child in the 1960s, I’m pretty sure I never saw it there. Still, by the 1980s, particularly after the US Department of Defense named its ill-fated programming language after Ada, awareness of Ada Lovelace and Charles Babbage began to increase, and biographies began to appear, though sometimes with hair-raising errors (my favorite is that the mention of “the problem of three bodies” in a letter from Babbage indicated a romantic triangle between Babbage, Ada and William—while it actually refers to the three-body problem in celestial mechanics!).

As interest in Babbage and Ada increased, so did curiosity about whether the Difference Engine would actually have worked if it had been built from Babbage’s plans. A project was mounted, and in 2002, after a heroic effort, a complete Difference Engine was built, with only one correction in the plans being made. Amazingly, the machine worked. Building it cost about the same, inflation adjusted, as Babbage had requested from the British government back in 1823.

What about the Analytical Engine? So far, no real version of it has ever been built—or even fully simulated.

OK, so now that I’ve talked (at length) about the life of Ada Lovelace, what about the actual content of her Notes on the Analytical Engine?

They start crisply: “The particular function whose integral the Difference Engine was constructed to tabulate, is …”. She then explains that the Difference Engine can compute values of any 6th degree polynomial—but the Analytical Engine is different, because it can perform any sequence of operations. Or, as she says: “The Analytical Engine is an *embodying of the science of operations*, constructed with peculiar reference to abstract number as the subject of those operations. The Difference Engine is the embodying of one particular and very limited set of operations…”

Charmingly, at least for me, considering the years I have spent working on Mathematica, she continues at a later point: “We may consider the engine as the *material and mechanical representative of analysis*, and that our actual working powers in this department of human study will be enabled more effectually than heretofore to keep pace with our theoretical knowledge of its principles and laws, through the complete control which the engine gives us over the executive manipulation of algebraical and numerical symbols.”

A little later, she explains that punched cards are how the Analytical Engine is controlled, and then makes the classic statement that “the Analytical Engine *weaves algebraical patterns* just as the Jacquard-loom weaves flowers and leaves”.

Ada then goes through how a sequence of specific kinds of computations would work on the Analytical Engine, with “Operation Cards” defining the operations to be done, and “Variable Cards” defining the locations of values. Ada talks about “cycles” and “cycles of cycles, etc”, now known as loops and nested loops, giving a mathematical notation for them:

There’s a lot of modern-seeming content in Ada’s notes. She comments that “There is in existence a beautiful woven portrait of Jacquard, in the fabrication of which 24,000 cards were required.” Then she discusses the idea of using loops to reduce the number of cards needed, and the value of rearranging operations to optimize their execution on the Analytical Engine, ultimately showing that just 3 cards could do what might seem like it should require 330.

Ada talks about just how far the Analytical Engine can go in computing what was previously not computable, at least with any accuracy. And as an example she discusses the three-body problem, and the fact that in her time, of “about 295 coefficients of lunar perturbations” there were many on which different people’s computations didn’t agree.

Finally comes Ada’s Note G. Early on, she states: “The Analytical Engine has no pretensions whatever to *originate* anything. It can do whatever we *know how to order it* to perform…. Its province is to assist us in making available what we are already acquainted with.”

Ada seems to have understood with some clarity the traditional view of programming: that we engineer programs to do things we know how to do. But she also notes that in actually putting “the truths and the formulae of analysis” into a form amenable to the engine, “the nature of many subjects in that science are necessarily thrown into new lights, and more profoundly investigated.” In other words—as I often point out—actually programming something inevitably lets one do more exploration of it.

She goes on to say that “in devising for mathematical truths a new form in which to record and throw themselves out for actual use, views are likely to be induced, which should again react on the more theoretical phase of the subject”, or in other words—as I have also often said—representing mathematical truths in a computable form is likely to help one understand those truths themselves better.

Ada seems to have understood, though, that the “science of operations” implemented by the engine would not only apply to traditional mathematical operations. For example, she notes that if “the fundamental relations of pitched sounds in the science of harmony” were amenable to abstract operations, then the engine could use them to “compose elaborate and scientific pieces of music of any degree of complexity or extent”. Not a bad level of understanding for 1843.

What’s become the most famous part of what Ada wrote is the computation of Bernoulli numbers, in Note G. This seems to have come out of a letter she wrote to Babbage, in July 1843. She begins the letter with “I am working very hard for you; like the Devil in fact; (which perhaps I am)”. Then she asks for some specific references, and finally ends with, “I want to put in something about Bernoulli’s Numbers, in one of my Notes, as an example of how an implicit function may be worked out by the engine, without having been worked out by human head & hands first…. Give me the necessary data & formulae.”

Ada’s choice of Bernoulli numbers to show off the Analytical Engine was an interesting one. Back in the 1600s, people spent their lives making tables of sums of powers of integers—in other words, tabulating values of $\sum_{k=1}^{m} k^{n}$ for different *m* and *n*. But Jakob Bernoulli pointed out that all such sums can be expressed as polynomials in *m*, with the coefficients being related to what are now called Bernoulli numbers. And in 1713 Bernoulli was proud to say that he’d computed the first 10 Bernoulli numbers “in a quarter of an hour”—reproducing years of other people’s work.
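As a rough illustration of Bernoulli’s observation, here is a minimal Python sketch. Python is used purely for illustration, and the function names `bernoulli_numbers` and `power_sum` are made up for this example. It assumes the modern convention in which *B*_{1} = -1/2, computes the numbers from the standard recurrence, and then uses them as polynomial coefficients to sum powers without adding up the terms one by one.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(count):
    """Return [B_0, ..., B_{count-1}] as exact fractions (convention B_1 = -1/2)."""
    numbers = [Fraction(1)]
    for m in range(1, count):
        # Standard recurrence: sum_{j=0}^{m} C(m+1, j) * B_j = 0 for m >= 1.
        partial = sum(comb(m + 1, j) * numbers[j] for j in range(m))
        numbers.append(-partial / (m + 1))
    return numbers


def power_sum(m, n):
    """Return 1^n + 2^n + ... + m^n via a polynomial in m with Bernoulli coefficients."""
    b = bernoulli_numbers(n + 1)
    total = sum(
        (-1) ** j * comb(n + 1, j) * b[j] * Fraction(m) ** (n + 1 - j)
        for j in range(n + 1)
    )
    return total / (n + 1)


if __name__ == "__main__":
    print(bernoulli_numbers(11))
    # Compare the polynomial against the brute-force sum of tenth powers.
    print(power_sum(1000, 10) == sum(k ** 10 for k in range(1, 1001)))
```

Exact fractions are used throughout because Bernoulli numbers are rationals whose numerators and denominators grow quickly, so floating point would lose precision after the first few.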

Today, of course, it’s instantaneous to do the computation in the Wolfram Language:

And, as it happens, a few years ago, just to show off new algorithms, we even computed 10 million of them.

But, OK, so how did Ada plan to do it? She started from the fact that Bernoulli numbers appear in the series expansion

Then by rearranging this and matching up powers of *x*, she got a sequence of equations for the Bernoulli numbers *B*_{n}—which she then “unravelled” to give a recurrence relation of the form:
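In modern notation, and assuming today’s indexing (with $B_1 = -\tfrac{1}{2}$) rather than Ada’s own numbering, the expansion and the kind of recurrence it leads to can be sketched as follows. Start from the generating function and clear the denominator:

$$
\frac{x}{e^{x}-1}=\sum_{n=0}^{\infty} B_n \frac{x^{n}}{n!}
\quad\Longrightarrow\quad
x=\Bigl(\sum_{k=1}^{\infty}\frac{x^{k}}{k!}\Bigr)\Bigl(\sum_{n=0}^{\infty} B_n \frac{x^{n}}{n!}\Bigr).
$$

Matching the coefficient of $x^{m+1}$ on both sides for $m \ge 1$, and multiplying through by $(m+1)!$, gives each number in terms of the earlier ones:

$$
\sum_{j=0}^{m}\binom{m+1}{j}B_j = 0
\quad\Longrightarrow\quad
B_m = -\frac{1}{m+1}\sum_{j=0}^{m-1}\binom{m+1}{j}B_j,
\qquad B_0 = 1.
$$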

Now Ada had to specify how to actually compute this on the Analytical Engine. First, she used the fact that odd Bernoulli numbers (other than *B*_{1}) are zero, then computed *B*_{n}, which is our modern

On the Analytical Engine, the idea was to have a sequence of operations (specified by “Operation Cards”) performed by the “Mill”, with operands coming from the “Store” (with addresses specified by “Variable Cards”). (In the Store, each number was represented by a sequence of wheels, each turned to the appropriate value for each digit.) To compute Bernoulli numbers the way Ada wanted takes two nested loops of operations. With the Analytical Engine design that existed at the time, Ada had to basically unroll these loops. But in the end she successfully produced a description of how *B*_{8} (which she called *B*_{7}) could be computed:

This is effectively the execution trace of a program that runs for 25 steps (plus a loop) on the Analytical Engine. At each step, the trace shows what operation is performed on which Variable Cards, and which Variable Cards receive the results. Lacking a symbolic notation for loops, Ada just indicated loops in the execution trace using braces, noting in English that parts are repeated.
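To make the idea of an execution trace concrete, here is a toy Python sketch of a Mill acting on a Store. It is not a simulation of the real mechanism: the tuple encoding of the cards, the names `MILL` and `run_cards`, and the Store layout are all assumptions made for illustration. The six cards are loosely modelled on how the Note G computation begins, forming the term -(1/2)(2n-1)/(2n+1) for n = 4.

```python
from fractions import Fraction
import operator

# The Mill's four arithmetic operations, keyed by the symbol on an "Operation Card".
MILL = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def run_cards(cards, store):
    """Execute cards in order against the Store, returning an execution trace."""
    trace = []
    for step, (op, left, right, targets) in enumerate(cards, start=1):
        result = MILL[op](store[left], store[right])
        for name in targets:  # one result may be written to several columns
            store[name] = result
        trace.append((step, op, left, right, targets, result))
    return trace


# Hypothetical Store layout: V1 = 1, V2 = 2, V3 = n, everything else starts at 0.
store = {f"V{i}": Fraction(0) for i in range(1, 14)}
store.update(V1=Fraction(1), V2=Fraction(2), V3=Fraction(4))

cards = [
    ("*", "V2", "V3", ("V4", "V5", "V6")),  # 2n
    ("-", "V4", "V1", ("V4",)),             # 2n - 1
    ("+", "V5", "V1", ("V5",)),             # 2n + 1
    ("/", "V4", "V5", ("V11",)),            # (2n - 1) / (2n + 1)
    ("/", "V11", "V2", ("V11",)),           # half of that
    ("-", "V13", "V11", ("V13",)),          # running total: -(1/2)(2n - 1)/(2n + 1)
]

for row in run_cards(cards, store):
    print(row)
```

Each printed row plays the role of one line of the trace: the step number, the operation, the two operand locations, the locations receiving the result, and the value produced.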

And in the end, the final result of the computation appears in location 24:

As it’s printed, there’s a bug in Ada’s execution trace on line 4: the fraction is upside down. But if you fix that, it’s easy to get a modern version of what Ada did:

And here’s what the same scheme gives for the next two (nonzero) Bernoulli numbers. As Ada figured out, it doesn’t ultimately take any more storage locations (specified by Variable Cards) to compute higher Bernoulli numbers, just more operations.
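A minimal Python sketch of the same scheme, assuming the recurrence as it is usually quoted from Note G, in Ada’s numbering (where her *B*_{2k-1} is the modern *B*_{2k}): 0 = -(1/2)(2n-1)/(2n+1) + *B*_{1}·(2n/2) + *B*_{3}·(2n(2n-1)(2n-2)/(2·3·4)) + … + *B*_{2n-1}. The function name `ada_bernoulli` and the binomial shorthand for the coefficients are choices made for this illustration, not Ada’s notation, and earlier results are simply kept in a Python list rather than in numbered Store locations.

```python
from fractions import Fraction
from math import comb


def ada_bernoulli(count):
    """Return Ada's B_1, B_3, ..., B_{2*count-1} (modern B_2, B_4, ..., B_{2*count})."""
    values = []  # values[k - 1] holds Ada's B_{2k-1}
    for n in range(1, count + 1):
        # Leading term of the recurrence: -(1/2) * (2n - 1) / (2n + 1).
        total = Fraction(-1, 2) * Fraction(2 * n - 1, 2 * n + 1)
        # Inner loop over already-known numbers; the coefficient
        # 2n(2n-1)...(2n-2k+2) / (2k)! equals C(2n, 2k-1) / (2k).
        for k in range(1, n):
            total += values[k - 1] * Fraction(comb(2 * n, 2 * k - 1), 2 * k)
        # The new number enters with coefficient 1, so it must cancel the total.
        values.append(-total)
    return values


if __name__ == "__main__":
    for k, value in enumerate(ada_bernoulli(6), start=1):
        print(f"Ada's B_{2 * k - 1} (modern B_{2 * k}) = {value}")
```

The outer loop over `n` and the inner loop over `k` correspond to the two nested loops of operations; going to higher Bernoulli numbers only lengthens the loops.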

The Analytical Engine, as it was designed in 1843, was supposed to store 1000 40-digit numbers, which would in principle have allowed it to compute up to perhaps *B*_{50} (=495057205241079648212477525/66). It would have been reasonably fast too; the Analytical Engine was intended to do about 7 operations per second. So Ada’s *B*_{8} would have taken about 5 seconds and *B*_{50} would have taken perhaps a minute.

Curiously, even in our record-breaking computation of Bernoulli numbers a few years ago, we were basically using the same algorithm as Ada—though now there are slightly faster algorithms that effectively compute Bernoulli number numerators modulo a sequence of primes, then reconstruct the full numbers using the Chinese Remainder Theorem.

The Analytical Engine and its construction were all Babbage’s work. So what did Ada add? Ada saw herself first and foremost as an expositor. Babbage had shown her lots of plans and examples of the Analytical Engine. She wanted to explain what the overall point was—as well as relate it, as she put it, to “large, general, & metaphysical views”.

In the surviving archive of Babbage’s papers (discovered years later in his lawyer’s family’s cowhide trunk), there are a remarkable number of drafts of expositions of the Analytical Engine, starting in the 1830s, and continuing for decades, with titles like “Of the Analytical Engine” and “The Science of Number Reduced to Mechanism”. Why Babbage never published any of these isn’t clear. They seem like perfectly decent descriptions of the basic operation of the engine—though they are definitely more pedestrian than what Ada produced.

When Babbage died, he was writing a “History of the Analytical Engine”, which his son completed. In it, there’s a dated list of “446 Notations of the Analytical Engine”, each essentially a representation of how some operation—like division—could be done on the Analytical Engine. The dates start in the 1830s, and run through the mid-1840s, with not much happening in the summer of 1843.

Meanwhile, in the collection of Babbage’s papers at the Science Museum, there are some sketches of higher-level operations on the Analytical Engine. For example, from 1837 there’s “Elimination between two equations of the first degree”—essentially the evaluation of a rational function:
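In modern terms, eliminating one unknown between two first-degree equations comes down to evaluating a ratio of products of the coefficients. A minimal Python sketch, assuming the equations are written as a1·x + b1·y = c1 and a2·x + b2·y = c2 (this form, the coefficient names and the function name `eliminate` are assumptions for illustration, not Babbage’s notation):

```python
from fractions import Fraction


def eliminate(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by elimination."""
    det = Fraction(a1) * b2 - Fraction(a2) * b1
    if det == 0:
        raise ValueError("equations are not independent")
    # Each unknown is a rational function of the six coefficients.
    x = (Fraction(c1) * b2 - Fraction(c2) * b1) / det
    y = (Fraction(a1) * c2 - Fraction(a2) * c1) / det
    return x, y


if __name__ == "__main__":
    # 2x + 3y = 8 and x - y = -1
    print(eliminate(2, 3, 8, 1, -1, -1))
```

Only multiplications, subtractions and a final division are needed, which is the kind of short, fixed sequence of operations a set of cards could specify.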

There are a few very simple recurrence relations:

Then from 1838, there’s a computation of the coefficients in the product of two polynomials:
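Computing the coefficients of a product of two polynomials is what would now be called a convolution of the two coefficient lists. A minimal Python sketch, assuming coefficients are stored lowest power first (the function name `polynomial_product` and this ordering are assumptions for illustration):

```python
def polynomial_product(p, q):
    """Coefficients of p(x) * q(x); index i holds the coefficient of x**i."""
    if not p or not q:
        return []
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            # The x**i term times the x**j term contributes to x**(i + j).
            result[i + j] += a * b
    return result


if __name__ == "__main__":
    # (1 + 2x)(1 + 3x)
    print(polynomial_product([1, 2], [1, 3]))
```

Like the Bernoulli computation, this takes two nested loops, but each step is a single multiply-and-add with no dependence on earlier results.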

But there’s nothing as sophisticated—or as clean—as Ada’s computation of the Bernoulli numbers. Babbage certainly helped and commented on Ada’s work, but she was definitely the driver of it.

So what did Babbage say about that? In his autobiography written 26 years later, he had a hard time saying anything nice about anyone or anything. About Ada’s Notes, he writes: “We discussed together the various illustrations that might be introduced: I suggested several, but the selection was entirely her own. So also was the algebraic working out of the different problems, except, indeed, that relating to the numbers of Bernoulli, which I had offered to do to save Lady Lovelace the trouble. This she sent back to me for an amendment, having detected a grave mistake which I had made in the process.”

When I first read this, I thought Babbage was saying that he basically ghostwrote all of Ada’s Notes. But reading what he wrote again, I realize it actually says almost nothing, other than that he suggested things that Ada may or may not have used.

To me, there’s little doubt about what happened: Ada had an idea of what the Analytical Engine should be capable of, and was asking Babbage questions about how it could be achieved. If my own experiences with hardware designers in modern times are anything to go by, the answers will often have been very detailed. Ada’s achievement was to distill from these details a clear exposition of the abstract operation of the machine—something which Babbage never did. (In his autobiography, he basically just refers to Ada’s Notes.)

For all his various shortcomings, the very fact that Babbage figured out how to build even a functioning Difference Engine—let alone an Analytical Engine—is extremely impressive. So how did he do it? I think the key was what he called his Mechanical Notation. He first wrote about it in 1826 under the title “On a Method of Expressing by Signs the Action of Machinery”. His idea was to take a detailed structure of a machine and abstract a kind of symbolic diagram of how its parts act on each other. His first example was a hydraulic device:

Then he gave the example of a clock, showing on the left a kind of “execution trace” of how the components of the clock change, and on the right a kind of “block diagram” of their relationships:

It’s a pretty nice way to represent how a system works, similar in some ways to a modern timing diagram—but not quite the same. And over the years that Babbage worked on the Analytical Engine, his notes show ever more complex diagrams. It’s not quite clear what something like this means:

But it looks surprisingly like a modern Modelica representation—say in Wolfram SystemModeler. (One difference in modern times is that subsystems are represented much more hierarchically; another is that everything is now computable, so that actual behavior of the system can be simulated from the representation.)

But even though Babbage used his various kinds of diagrams extensively himself, he didn’t write papers about them. Indeed, his only other publication about “Mechanical Notation” is the flyer he had printed up for the Great Exhibition in 1851—apparently a pitch for standardization in drawings of mechanical components (and indeed these notations appear on Babbage’s diagrams like the one above).

I’m not sure why Babbage didn’t do more to explain his Mechanical Notation and his diagrams. Perhaps he was just bitter about people’s failure to appreciate it in 1826. Or perhaps he saw it as the secret that let him create his designs. And even though systems engineering has progressed a long way since Babbage’s time, there may yet be inspiration to be had from what Babbage did.

OK, so what’s the bigger picture of what happened with Ada, Babbage and the Analytical Engine?

Charles Babbage was an energetic man who had many ideas, some of them good. At the age of 30 he thought of making mathematical tables by machine, and continued to pursue this idea until he died 49 years later, inventing the Analytical Engine as a way to achieve his objective. He was good—even inspired—at the engineering details. He was bad at keeping a project on track.

Ada Lovelace was an intelligent woman who became friends with Babbage (there’s zero evidence they were ever romantically involved). As something of a favor to Babbage, she wrote an exposition of the Analytical Engine, and in doing so she developed a more abstract understanding of it than Babbage had—and got a glimpse of the incredibly powerful idea of universal computation.

The Difference Engine and things like it are special-purpose computers, with hardware that’s built to do only one kind of thing. One might have thought that to do lots of different kinds of things would necessarily require lots of different kinds of computers. But this isn’t true. And instead it’s a fundamental fact that it’s possible to make general-purpose computers, where a single fixed piece of hardware can be programmed to do any computation. And it’s this idea of universal computation that for example makes software possible—and that launched the whole computer revolution in the 20th century.

Gottfried Leibniz had already had a philosophical concept of something like universal computation back in the 1600s. But it wasn’t followed up. And Babbage’s Analytical Engine is the first explicit example we know of a machine that would have been capable of universal computation.

Babbage didn’t think of it in these terms, though. He just wanted a machine that was as effective as possible at producing mathematical tables. But in the effort to design this, he ended up with a universal computer.

When Ada wrote about Babbage’s machine, she wanted to explain what it did in the clearest way—and to do this she looked at the machine more abstractly, with the result that she ended up exploring and articulating something quite recognizable as the modern notion of universal computation.

What Ada did was lost for many years. But as the field of mathematical logic developed, the idea of universal computation arose again, most clearly in the work of Alan Turing in 1936. Then when electronic computers were built in the 1940s, it was realized they too exhibited universal computation, and the connection was made with Turing’s work.

There was still, though, a suspicion that perhaps some other way of making computers might lead to a different form of computation. And it actually wasn’t until the 1980s that universal computation became widely accepted as a robust notion. And by that time, something new was emerging—notably through work I was doing: that universal computation was not only something that’s possible, but that it’s actually common.

And what we now know (embodied for example in my Principle of Computational Equivalence) is that beyond a low threshold a very wide range of systems—even of very simple construction—are actually capable of universal computation.

A Difference Engine doesn’t get there. But as soon as one adds just a little more, one will have universal computation. So in retrospect, it’s not surprising that the Analytical Engine was capable of universal computation.

Today, with computers and software all around us, the notion of universal computation seems almost obvious: of course we can use software to compute anything we want. But in the abstract, things might not be that way. And I think one can fairly say that Ada Lovelace was the first person ever to glimpse with any clarity what has become a defining phenomenon of our technology and even our civilization: the notion of universal computation.

What if Ada’s health hadn’t failed—and she had successfully taken over the Analytical Engine project? What might have happened then?

I don’t doubt that the Analytical Engine would have been built. Maybe Babbage would have had to revise his plans a bit, but I’m sure he would have made it work. The thing would have been the size of a train locomotive, with maybe 50,000 moving parts. And no doubt it would have been able to compute mathematical tables to 30- or 50-digit precision at the rate of perhaps one result every 4 seconds.

Would they have figured that the machine could be electromechanical rather than purely mechanical? I suspect so. After all, Charles Wheatstone, who was intimately involved in the development of the electric telegraph in the 1830s, was a good friend of theirs. And by transmitting information electrically through wires, rather than mechanically through rods, the hardware for the machine would have been dramatically reduced, and its reliability (which would have been a big issue) would have been dramatically increased.

Another major way that modern computers reduce hardware is by dealing with numbers in binary rather than decimal. Would they have figured that idea out? Leibniz knew about binary. And if George Boole had followed up on his meeting with Babbage at the Great Exhibition, maybe that would have led to something. Binary wasn’t well known in the mid-1800s, but it did appear in puzzles, and Babbage, at least, was quite into puzzles: a notable example being his question of how to make a square of words with “bishop” along the top and side (which now takes just a few lines of Wolfram Language code to solve).

Babbage’s primary conception of the Analytical Engine was as a machine for automatically producing mathematical tables—either printing them out by typesetting, or giving them as plots by drawing onto a plate. He imagined that humans would be the main users of these tables—although he did think of the idea of having libraries of pre-computed cards that would provide machine-readable versions.

Today—in the Wolfram Language for example—we never store much in the way of mathematical tables; we just compute what we need when we need it. But in Babbage’s day—with the idea of a massive Analytical Engine—this way of doing things would have been unthinkable.

So, OK: would the Analytical Engine have gotten beyond computing mathematical tables? I suspect so. If Ada had lived as long as Babbage, she would still have been around in the 1890s when Herman Hollerith was doing card-based electromechanical tabulation for the census (and founding what would eventually become IBM). The Analytical Engine could have done much more.

Perhaps Ada would have used the Analytical Engine—as she began to imagine—to produce algorithmic music. Perhaps they would have used it to solve things like the three-body problem, maybe even by simulation. If they’d figured out binary, maybe they would even have simulated things like cellular automata.

Neither Babbage nor Ada ever made money commercially (and, as Babbage took pains to point out, his government contracts just paid his engineers, not him). If they had developed the Analytical Engine, would they have found a business model for it? No doubt they would have sold some engines to governments. Maybe they would even have operated a kind of cloud computing service for Victorian science, technology, finance and more.

But none of this actually happened, and instead Ada died young, the Analytical Engine was never finished, and it took until the 20th century for the power of computation to be discovered.

If one had met Charles Babbage, what would he have been like? He was, I think, a good conversationalist. Early in life he was idealistic (“do my best to leave the world wiser than I found it”); later he was almost a Dickensian caricature of a bitter old man. He gave good parties, and put great value in connecting with the highest strata of intellectual society. But particularly in his later years, he spent most of his time alone in his large house, filled with books and papers and unfinished projects.

Babbage was never a terribly good judge of people, or of how what he said would be perceived by them. And even in his eighties, he was still quite child-like in his polemics. He was also notoriously poor at staying focused; he always had a new idea to pursue. The one big exception to this was his almost-50-year persistence in trying to automate the process of computation.

I myself have shared a modern version of this very goal in my own life (…, Mathematica, Wolfram|Alpha, Wolfram Language, …)—though so far only for 40 years. I am fortunate to have lived in a time when ambient technology made this much easier to achieve, but in every large project I have done it has still taken a certain singlemindedness and gritty tenacity—as well as leadership—to actually get it finished.

So what about Ada? From everything I can tell, she was a clear-speaking, clear-thinking individual. She came from the upper classes, but didn’t wear especially fashionable clothes, and carried herself much less like a stereotypical countess than like an intellectual. As an adult, she was emotionally quite mature—probably more so than Babbage—and seems to have had a good practical grasp of people and the world.

Like Babbage, she was independently wealthy, and had no need to work for a living. But she was ambitious, and wanted to make something of herself. In person, beyond the polished Victorian upper-class exterior, I suspect she was something of a nerd, complete with math jokes and everything. She was also capable of great and sustained focus, for example over the months she spent writing her Notes.

In mathematics, she successfully learned up to the state of the art in her time—probably about the same level as Babbage. Unlike Babbage, we don’t know of any specific research she did in mathematics, so it’s hard to judge how good she would have been; Babbage was respectable though unremarkable.

When one reads Ada’s letters, what comes through is a smart, sophisticated person, with a clear, logical mind. What she says is often dressed in Victorian pleasantries—but underneath, the ideas are clear and often quite forceful.

Ada was very conscious of her family background, and of being “Lord Byron’s daughter”. At some level, his story and success no doubt fueled her ambition, and her willingness to try new things. (I can’t help thinking of her leading the engineers of the Analytical Engine as a bit like Lord Byron leading the Greek army.) But I also suspect his troubles loomed over her. For many years, partly at her mother’s behest, she eschewed things like poetry. But she was drawn to abstract ways of thinking, not only in mathematics and science, but also in more metaphysical areas.

And she seems to have concluded that her greatest strength would be in bridging the scientific with the metaphysical—perhaps in what she called “poetical science”. It was likely a correct self-perception. For that is in a sense exactly what she did in the Notes she wrote: she took Babbage’s detailed engineering, and made it more abstract and “metaphysical”—and in the process gave us a first glimpse of the idea of universal computation.

The story of Ada and Babbage has many interesting themes. It is a story of technical prowess meeting abstract “big picture” thinking. It is a story of friendship between old and young. It is a story of people who had the confidence to be original and creative.

It is also a tragedy. A tragedy for Babbage, who lost so many people in his life, and whose personality pushed others away and prevented him from realizing his ambitions. A tragedy for Ada, who was just getting started in something she loved when her health failed.

We will never know what Ada could have become. Another Mary Somerville, famous Victorian expositor of science? A Steve-Jobs-like figure who would lead the vision of the Analytical Engine? Or an Alan Turing, understanding the abstract idea of universal computation?

That Ada touched what would become a defining intellectual idea of our time was good fortune. Babbage did not know what he had; Ada started to see glimpses and successfully described them.

For someone like me the story of Ada and Babbage has particular resonance. Like Babbage, I have spent much of my life pursuing particular goals—though unlike Babbage, I have been able to see a fair fraction of them achieved. And I suspect that, like Ada, I have been put in a position where I can potentially see glimpses of some of the great ideas of the future.

But the challenge is to be enough of an Ada to grasp what’s there—or at least to find an Ada who does. But at least now I think I have an idea of what the original Ada born 200 years ago today was like: a fitting personality on the road to universal computation and the present and future achievements of computational thinking.

It’s been a pleasure getting to know you, Ada.

*Quite a few organizations and people helped in getting information and material for this post. I’d like to thank the British Library; the Museum of the History of Science, Oxford; the Science Museum, London; the Bodleian Library, Oxford (with permission from the Earl of Lytton, Ada’s great-great-grandson, and one of her 10 living descendants); the New York Public Library; St. Mary Magdalene Church, Hucknall, Nottinghamshire (Ada’s burial place); and Betty Toole (author of a collection of Ada’s letters); as well as two old friends: Tim Robinson (re-creator of Babbage engines) and Nathan Myhrvold (funder of Difference Engine #2 re-creation).*

I wasn’t sure if I was ever going to write another book. My last book—*A New Kind of Science*—took me more than a decade of intensely focused work, and is the largest personal project I’ve ever done.

But a little while ago, I realized there was another book I had to write: a book that would introduce people with no knowledge of programming to the Wolfram Language and the kind of computational thinking it allows.

The result is *An Elementary Introduction to the Wolfram Language*, published today in print, free on the web, etc.

The goal of the book is to take people from zero to the point where they know enough about the Wolfram Language that they can routinely use it to create programs for things they want to do. And when I say “zero”, I really mean “zero”. This is a book for everyone. It doesn’t assume any knowledge of programming, or math (beyond basic arithmetic), or anything else. It just starts from scratch and explains things. I’ve tried to make it appropriate for both adults and kids. I think it’ll work for typical kids aged about 12 and up.

In the past, a book like this would have been inconceivable. The necessary underlying technology just didn’t exist. Serious programming was always difficult, and there wasn’t a good way to connect with real-world concepts. But now we have the Wolfram Language. It’s taken three decades. But now we’ve built in enough knowledge and automated enough of the process of programming that it’s actually realistic to take almost anyone from zero to the frontiers of what can be achieved with computation.

But how should one actually do it? What should one explain, in what order? Those were challenges I had to address to write this book. I’d written a Fast Introduction for Programmers that in 30 pages or so introduces people who already know about modern programming to the core concepts of the Wolfram Language. But what about people who don’t start off knowing anything about programming?

For many years I’ve found various opportunities to show what’s now the Wolfram Language to people like that. And now I’ve used my experience to figure out what to do in the book.

In essence, the book brings the reader into a conversation with the computer. There are two great things about the Wolfram Language that make this really work. First, that the language is symbolic, so that anything one’s dealing with—a color, an image, a graph, whatever—can be right there in the dialog. And second, that the language can be purely functional, so that everything is stateless, and every input can be self contained.

It’s also very important that the Wolfram Language has built-in knowledge that lets one immediately compute with real-world things.

Oh, and visualization is really important too—so it’s easy to see what one’s computing.

OK, but where should one start? The very first page is about arithmetic, just because that’s a place where everyone can see that a computation is actually happening.

There’s a section called Vocabulary because that’s what it is: one’s learning some “words” in the Wolfram Language. Then there are exercises, which I’ll talk about soon.

OK, but once one’s done arithmetic, where should one go next? What I decided to do was to go immediately to the idea of functions—and to first introduce them in terms of arithmetic. The advantage of this is that while the concept of a function may be new, the operation it’s doing (namely arithmetic) is familiar.

And once one’s understood the function Plus, one can immediately go to functions like Max that don’t have a special operator form the way Plus has +. What Max does isn’t that exciting, though. So as a slightly more exciting function, what I introduce next is RandomInteger, which people often like to run over and over again, to see what it produces.
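As a rough sketch of that progression (these particular inputs are illustrative choices, not necessarily the ones printed in the book):

```wolfram
(* Plus is the function form of ordinary addition *)
Plus[2, 2]            (* same as 2 + 2; gives 4 *)

(* Max is written only in function form *)
Max[2, 7, 3]          (* gives 7 *)

(* RandomInteger typically gives a different result on each run *)
RandomInteger[100]    (* some integer between 0 and 100 *)
```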

OK, so what next? The obvious answer is that we have to introduce lists. But what should one do with lists? Doing something like picking elements out of them isn’t terribly exciting, and it’s hard immediately to see why it’s important. So instead what I decided was to make the very first function I show for lists be ListPlot. It’s nice to start getting in the idea of visualization—and it’s also a good example of how one can type in a tiny piece of code, and get something bigger and more interesting out.

Actually, the best extremely simple example of that is Range, which I also show at this point. Range is a great way to show the computer actually computing something, with a result that’s easy to understand.
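A minimal illustration of the "tiny input, bigger output" idea, using inputs chosen here for the purpose rather than taken from the book:

```wolfram
(* A very short input that produces a whole list *)
Range[10]               (* gives {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} *)

(* The same list, shown as a plot of points *)
ListPlot[Range[10]]

(* ListPlot works on any list of numbers *)
ListPlot[{1, 1, 2, 2, 3, 4, 4}]
```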

But OK, so now we want to reinforce the idea of functions, and functions working together. The function Reverse isn’t incredibly common in practice, but it’s very easy to understand, so I introduce it next, followed by Join.

What’s nice then is that between Reverse, Range and Join we have a little microlanguage that’s completely self-contained, but lets us do a variety of computations. And, of course, whatever computations one does, one can immediately see the results, either symbolically or visually.
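For instance, the three functions can be combined along these lines (a sketch with made-up inputs):

```wolfram
Reverse[Range[5]]                    (* gives {5, 4, 3, 2, 1} *)

Join[Range[3], Reverse[Range[3]]]    (* gives {1, 2, 3, 3, 2, 1} *)

(* Seeing the result visually: values that rise and then fall *)
ListPlot[Join[Range[20], Reverse[Range[20]]]]
```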

The next couple of sections talk about displaying and operating on lists, reinforcing what’s already been said, and introducing a variety of functions that are useful in practice. Then it’s on to Table—a very common and powerful function, that in effect packages up a lot of what might otherwise need explicit loops and so on.

I start with trivial versions of Table, without any iteration variable. I take it for granted (as people who don’t know “better” do!) that Table can produce a list of graphics just like it can produce a list of numbers. (Of course, the fact that it can do this is a consequence of the fundamentally symbolic character of the Wolfram Language.)

The next big step is to introduce a variable into Table. I thought a lot about how to do this, and decided that the best thing to show first is the purely symbolic version. After all, we’ve already introduced functions, and with the symbolic version, one can immediately see where the variable goes. But now that we’ve got Table with variables, we can really go to town and start doing what people will think of as “real computations”.
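The steps described above might look something like this. Here `f` and `n` are just undefined symbols picked for illustration, and the specific inputs are assumptions rather than the book's own examples:

```wolfram
(* Table with no iteration variable: just repeat something *)
Table[5, 3]            (* gives {5, 5, 5} *)

(* Purely symbolic version: one can see exactly where n goes *)
Table[f[n], {n, 5}]    (* gives {f[1], f[2], f[3], f[4], f[5]} *)

(* The same structure doing a "real computation" *)
Table[n^2, {n, 5}]     (* gives {1, 4, 9, 16, 25} *)
```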

In the first few sections of the book, the raw material for our computations is basically numbers and lists. What I wanted to do next was to show that there are other things to compute with. I chose colors as the first example. Colors are good because (a) everyone knows what they are, (b) you can actually compute with them and (c) they make colorful output (!).

After colors we’re ready for some graphics. I haven’t talked about coordinates yet, so I can only show individual graphical objects, without placement information.

There’s absolutely no reason not to go to 3D, and I do.

Now we’re all set up for something “advanced”: interactive manipulation. It’s pretty much like Table, except that one gets out a complete interactive user interface. And since we’ve introduced graphics, those can be part of the interface. People have seen interactive interfaces in lots of consumer software. My experience is that they’re pretty excited to be able to create them from scratch themselves.
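To show the parallel with Table, here is a small sketch (the variable names and ranges are arbitrary choices made for this illustration):

```wolfram
(* Table produces all the results at once *)
Table[Range[n], {n, 1, 4}]    (* gives {{1}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4}} *)

(* Manipulate takes the same kind of specification, but gives a slider for n *)
Manipulate[Range[n], {n, 1, 4, 1}]

(* Graphics can be part of the interface too: a disk whose hue follows a slider *)
Manipulate[Graphics[{Hue[h], Disk[]}], {h, 0, 1}]
```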

The next, perhaps surprising thing I introduce in the book is image processing. Yes, there’s a lot of sophisticated computation behind image processing. But in the Wolfram Language that’s all internal. And what people see are just functions—like Blur and ColorNegate—whose purposes are easy to understand.

It’s also nice that people—especially kids—can compute with images they take, or drag in. And this is actually the first example in the book where there’s rich data coming into a computation from outside. (I needed a sample image for the section, so, yes, I just snapped one right there—of me working on the book.)

Next I talk about strings and text. String operations on their own are pretty dry. But in the Wolfram Language there’s lots of interesting stuff that’s easy to do with them—like visualizing word clouds from Wikipedia, or looking at common words in different languages.

Next I cover sound, and talk about how to generate sequences of musical notes. In the printed book you can’t hear them, of course, though the little score icons give some sense of what’s there.

One might wonder, “Why not talk about sound right after graphics?” Well, first of all, I thought it wasn’t bad to mix things up a bit, to help keep the flow interesting. But more than that, there’s a certain chain of dependencies between different areas. For example, the names of musical notes are specified as strings—so one has to have talked about strings before musical notes.

Next it’s “Arrays, or Lists of Lists”. Then it’s “Coordinates and Graphics”. At first, I worried that coordinates were too “mathy”. But particularly after one’s seen arrays, it’s not so difficult to understand coordinates. And once one’s got the idea of 2D coordinates, it’s easy to go to 3D.

By this point in the book, people already know how to do some useful and real things with the Wolfram Language. So I made the next section a kind of interlude—a meta-section that gives a sense of the overall scope of the Wolfram Language, and also shows how to find information on specific topics and functions.

Now that people have seen a bit about abstract computation, it’s time to talk about real-world data, and to show how to access the vast amount of data that the Wolfram Language shares with Wolfram|Alpha.

Lots of real-world data involves units—so the next section is devoted to working with units. Once that’s done, we can talk about geocomputation: things like finding distances on the Earth, and drawing maps.

After that I talk about dates and times. One might think this wouldn’t be an interesting or useful topic. But it’s actually a really good example of real-world computation, and it’s also something one uses all over the place.

The Wolfram Language is big. But it’s based on a small number of ideas that are consistently used over and over again. One of the important objectives in the book is to cover these ideas. And the next section—on options—covers one such simple idea that’s widely used in practice.

After covering options, we’re set to talk about something that’s often viewed as a quite advanced topic: graphs and networks. But my experience is that in modern times, people have seen enough graphs and networks in their everyday life that they don’t have much trouble understanding them in the Wolfram Language. Of course, it helps a lot that the language can manipulate them directly, as just another example of symbolic objects.

After graphs and networks, we’re ready for another seemingly very advanced topic: machine learning. But even though the internal algorithms for machine learning are complicated, the actual functions that do it in the Wolfram Language are perfectly easy to understand. And what’s nice is that by doing a bunch of examples with them, one can start to get pretty good high-level intuition about the core ideas of machine learning.

Throughout the book, I try to keep things as simple as possible. But sometimes that means I have to go back for a deeper view of a topic I’ve already covered. “More about Numbers” and “More Forms of Visualization” are two examples of doing this—covering things that would have gotten in the way when numbers and visualization were first introduced, but that need to be said to get a full understanding of these areas.

The next few sections tackle the important and incredibly powerful topic of functional programming. In the past, functional programming tended to be viewed as a sophisticated topic—and certainly not something to teach people who are first learning about programming. But I think in the Wolfram Language the picture has changed—and it’s now possible to explain functional programming in a way that people will find easy to understand. I start by just talking more abstractly about the process of applying a function.

The big thing this does is set me up to talk about pure anonymous functions. In principle I could have talked about these much sooner, but I think it’s important for people to have seen many different kinds of examples of how functions are used in general—because that’s what’s needed to motivate pure functions.

The next section is where some of the real power of functional programming starts to shine through. In the abstract, functions like NestList and NestGraph sound pretty complicated. But by this point in the book, we’ve covered enough of the Wolfram Language that there are plenty of concrete examples to give that are quite easy to understand.
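As an illustration of how pure functions and nesting fit together (a sketch with inputs chosen here; `f` and `x` are undefined symbols):

```wolfram
(* A pure function: # stands for the argument, & marks the end *)
Map[#^2 &, {1, 2, 3}]        (* gives {1, 4, 9} *)

(* NestList applied symbolically shows what it is doing *)
NestList[f, x, 3]            (* gives {x, f[x], f[f[x]], f[f[f[x]]]} *)

(* The same thing with a concrete pure function: repeated doubling *)
NestList[2 # &, 1, 5]        (* gives {1, 2, 4, 8, 16, 32} *)

(* NestGraph makes a graph from a function that returns a list of successors *)
NestGraph[{2 #, # + 1} &, 1, 3]
```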

The next several sections cover areas of the language that are unlocked as soon as one understands pure functions. There are lots of powerful programming techniques that emerge from a smaller number of ideas.

After functional programming, the next big topics are patterns and pattern-based programming. I could have chosen to talk about patterns earlier in the book, but they weren’t really needed until now.

What makes patterns so powerful in the Wolfram Language is something much more fundamental: the uniform structure of everything in the language, based on symbolic expressions. If I were writing a formal specification of the Wolfram Language, I would start with symbolic expressions. And I might do the same if I were writing a book for theoretical computer scientists or pure mathematicians.

It’s not that symbolic expressions are a difficult concept to understand. It’s just that without seeing how things actually work in practice in the Wolfram Language, it’s difficult to motivate abstractly studying them. But now it makes sense to talk about them, not least because they let one see the full power of what’s possible with patterns.
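A small sketch of how the uniform structure and patterns connect (the symbols `a`, `b`, `f`, `g` and `x` are arbitrary and assumed to have no values assigned):

```wolfram
(* Everything is an expression with a head *)
Head[{a, b, c}]                          (* gives List *)
FullForm[a + b]                          (* displays as Plus[a, b] *)

(* Because the structure is uniform, patterns can pick out pieces of anything *)
Cases[{f[1], g[2], f[3]}, f[_]]          (* gives {f[1], f[3]} *)

(* ...and transform them *)
{f[1], g[2], f[3]} /. f[x_] :> x + 10    (* gives {11, g[2], 13} *)
```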

At this point in the book, we’re getting ready to see how to actually deploy things like web apps. There are a few more pieces to put in place to get there. I talk about associations—and then I talk about natural language understanding. Internally, the way natural language understanding works is complex. But at the level of the Wolfram Language, it’s easy to use—though to see how to connect it into things, it’s helpful to know about pure functions.

OK, so now everything is ready to talk about deploying things to the web. And at this point, people will be able to start creating useful, practical pieces of software that they can share with the world.
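As a hedged sketch of what such a deployment can look like: the form field name `"n"` is hypothetical, and running this assumes being signed in to the Wolfram Cloud.

```wolfram
(* Deploy a web form that asks for an integer n and returns Range[n].
   The result is a CloudObject containing the URL of the deployed form. *)
CloudDeploy[FormFunction[{"n" -> "Integer"}, Range[#n] &]]
```

Note that the function being deployed here is a pure function, and that nothing in it assigns a value to a variable.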

It’s taken 220 pages or so. But to me that’s an amazingly small number of pages to go from zero to what are essentially professional-grade web apps. If we’d just been talking about some very specific kind of app, it wouldn’t be so impressive. But we’re talking about extremely general kinds of apps, that do pretty much any kind of computation.

If you open a book about a traditional programming language like C++ or Java, one of the first things you’re likely to see is a discussion of assigning values to variables. But in my book I don’t do this until Section 38. At some level, this might seem bizarre—but it really isn’t. Because in the Wolfram Language you can do an amazing amount—including for example deploying a complete web app—without ever needing to assign a value to a variable.

And this is actually one of the reasons why it’s so easy to learn the Wolfram Language. Because if you don’t assign values to variables, every piece of code in the language stands alone, and will do the same thing whenever it’s run. But as soon as you’re assigning values to variables, there’s hidden state, and your code will do different things depending on what values variables happen to have.
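A minimal illustration of the difference (the variable name `x` is arbitrary):

```wolfram
(* Stands alone: gives the same result whenever it is run *)
Table[n^2, {n, 3}]     (* gives {1, 4, 9} *)

(* With assignment, the same input x^2 depends on hidden state *)
x = 3;
x^2                    (* gives 9 *)

x = 10;
x^2                    (* the identical input now gives 100 *)

(* Remove the value so later inputs are not affected *)
Clear[x]
```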

Still, having talked about assigning values to variables—as well as about patterns—we’re ready to talk about defining your own functions, which is the way to build up more and more sophisticated functionality in the Wolfram Language.
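For example, a definition combines a pattern on the left with what it should turn into on the right. The name `square` is made up for this sketch:

```wolfram
(* x_ is a pattern that matches any argument and names it x *)
square[x_] := x^2

square[4]                   (* gives 16 *)

(* A user-defined function works with everything else in the language *)
Table[square[n], {n, 5}]    (* gives {1, 4, 9, 16, 25} *)
```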

At this point, you’re pretty much set in terms of the basic concepts of the Wolfram Language. But the last few sections of the book cover some important practical extensions. There’s a section on string patterns and templates. There’s a section on storing things, locally and in the cloud. There’s a section on importing and exporting. And there’s a section on datasets. Not everyone who uses the Wolfram Language will ever need datasets, but when you’re dealing with large amounts of structured data they’re very useful. And they provide an interesting example that makes use of many different ideas from the Wolfram Language.

At the end of the book, I have what are basically essay sections: about writing good code, about debugging and about being a programmer. My goal in these sections is to build on the way of thinking that I hope people have developed from reading the rest of the book, and then to communicate some more abstract principles.

I said at the beginning of this post that the book is essentially written as a conversation. In almost every section, I found it convenient to add two additional parts: Q&A and Tech Notes. The goal of Q&A is to have a place to answer obvious questions people might have, without distracting from the main narrative.

There are several different types of questions. Some are about extensions to the functionality that’s been discussed. Some are about the background to it. And some are questions (“What does ‘raised to the power’ mean?”) that will be trivial to some readers but not to others.

In addition to Q&A, I found it useful to include what I call Tech Notes. Their goal is to add technical information—and to help people who already have sophisticated technical knowledge in some particular area to connect it to what they’re reading in this book.

Another part of most sections is a collection of exercises. The vast majority are basically of the form “write a piece of code to do X”—though a few are instead “find a simpler version of this piece of code”.

Answers to all the exercises are at the back of the printed book, and the web version has additional exercises. Of course, the answers that are given are just possible answers, and they’re almost never the only possible ones.

Writing the exercises was an interesting experience for me, and one that was actually quite important in my thinking about topics like how to talk to AIs. Because what most of the exercises effectively say is, “Take this description that’s written in English, and turn it into Wolfram Language code.” If what one’s doing is simple enough, then English works quite well as a description language. But when what one’s doing gets more complicated, English doesn’t do so well. And by later in the book, I was often finding it much easier to write the Wolfram Language answer for an exercise than to create the actual exercise in English.

In a sense this is very satisfying, because it means we really need the Wolfram Language to be able to express ideas. Some things we can express easily in English—and eventually expect Wolfram|Alpha to be able to understand. But there’s plenty that requires the greater structure and precision of the Wolfram Language.

At some level it might seem odd in this day and age to be writing a book that can actually be printed on paper, rather than creating some more flexible online structure. But what I’ve found is that the concept of a book is very useful. Yes, one can have a website where one can reach lots of information by following links. But when people are trying to systematically learn a subject, I think it’s good to have a definite, finite container of information, where there’s an expectation of digesting it sequentially, and where one can readily see the overall structure.

That’s not to say that it’s not useful to have the book online. Right now the book is available as a website, and for many purposes this web version works very well. But somewhat to my surprise, I still find the physical book, with its definite pagination and browsable pages, better for many things.

Of course, if you’re going to learn the Wolfram Language, you actually need to run it. So even if you’re using a physical book, it’s best to have a computer (or tablet) by your side, so you can try the examples, do the exercises, etc. You can do this immediately if you’re reading the book on the web or in the cloud. But some people have told me that they actually find it helpful to retype the examples: they internalize them better that way, and with all the autocompletion and other features in the Wolfram Language, it’s very fast to type in code.

I call the book an “elementary introduction”. And that’s what it is. It’s not a complete book about the Wolfram Language—far from it. It’s intended to be a basic introduction that gets people to the point where they can start writing useful programs. It covers a lot of the core principles of the language—but only a small fraction of the very large number of specific areas of functionality.

Generally I tried to include areas that are either very commonly encountered in practice, or easy for people to understand without external knowledge—and good for illuminating principles. I’m very happy with the sequence of areas I was able to cover—but another book could certainly pick quite different ones.

Of course, I was a little disappointed to have to leave out all sorts of amazing things that the Wolfram Language can do. And at the end of the book I decided to include a short section that gives a taste of what I wasn’t able to talk about.

I see my new book as part of the effort to launch the Wolfram Language. And back in 1988, when we first launched Mathematica, I wrote a book for that, too. But it was a different kind of book: it was a book that was intended to provide a complete tutorial introduction and reference guide to the whole system. The first edition was 767 pages. But by the 5th edition a decade and a half later, the book had grown to 1488 pages. And at that point we decided a book just wasn’t the correct way to deliver the information, and we built a whole online system instead.

It’s just as well we did that, because it allowed us to greatly expand the depth of coverage, particularly in terms of examples. And of course the actual software system grew a lot—with the result that today the full Documentation Center contains more than 50,000 pages of content.

Many people have told me that they liked the original Mathematica book—and particularly the fact that it was short enough to realistically read from cover to cover. My goal with *An Elementary Introduction to the Wolfram Language* was again to have a book that’s short enough that people can actually read all of it.

Looking at the book, it’s interesting to see how much of it is about things that simply didn’t exist in the Wolfram Language—or Mathematica—until very recently. Of course it’s satisfying to me to see that things we’re adding now are important enough to make it into an elementary introduction. But it also means that even people who’ve known parts of the Wolfram Language through Mathematica for many years should find the book interesting to read.

I’ve thought for a while that there should be a book like the one I’ve now written. And obviously there are plenty of people who know the Wolfram Language well and could in principle have written an introduction to it. But I’m happy that I’ve been the one to write this book. It’s reduced my productivity on other things—like writing blogs—for a little while. But it’s been a fascinating experience.

It’s been a bit like being back hundreds of years and asking, “How should one approach explaining math to people?” And working out that first one should talk about arithmetic, then algebra, and so on. Well, now we have to do the same kind of thing for computational thinking. And I see the book as a first effort at communicating the tools of computational thinking to a broad range of people.

It’s been fun to write. I hope people find it fun to read—and that they use what they learn from it to create amazing things with the Wolfram Language.

]]>